The “Target and Tailor” Approach
During my very first job out of graduate school, I was charged with building our institutional model of student success. My boss, the VP of Student Affairs, insisted: “We don’t need new programs; we need to better connect students with the ones we already have.”
For some reason, this concept not only stuck with me but burrowed deep into my professional being. Having spent my career working with and within colleges and universities, I’ve always wondered why student success isn’t built on two fundamental strategies:
1. Targeting outreach to students based on their probability of success.
2. Tailoring interventions based on student strengths and challenges.
To me, it always seemed like a blinding flash of the obvious. After all, that’s what a background in assessment trains you to do: use data to identify areas of need, target action, and – hopefully – observe improvement. If we’re focusing on student success, it’s really the only approach that makes sense. Sure, given my background in developing assessments, I may be biased in suggesting that using student-level measures should be the centerpiece of that plan of action, but as I often tell my wife, “Just because I’m biased doesn’t mean that I’m wrong.”
Having spent the last fifteen years trying to build, demonstrate, and apply this approach with colleges and universities across the country, I’ve learned that it’s obviously not as clear to others as it is to me. Particularly since starting DIA almost six years ago, a lot of my time has gone toward exploring different ways of explaining these concepts, in hopes that maybe a different hook will help people understand. Well, let’s consider this another trip to the plate…
I wanted to use this blog post to give some really quick and powerful applied examples of that first tenet: targeting outreach to students.
So many schools are trying to use analytics to improve student success, and indeed, there are many powerful platforms to give insight into the various predictors of success, groups of students who require support, etc. Yet these platforms have yet to improve student success to an extent commensurate with the power of the methods, analytics, and software being employed. I think this happens for two reasons: (1) structurally, our institutions are not able to respond to these data, and (2) culturally, our perceptions about support are misaligned.
In this post, I’ll first share with you some models we’ve run recently to predict student success within two very different institutions. Rather than dive into the data and methods, I simply want to share a quick tidbit about these institutions, demonstrate the efficacy of these models, and show you what the results look like. Second, I will share our approach to student support and the challenges that many institutions face in applying such a mindset.
A Predictive Analytics Preamble
Before we get started, a few important notes about predicting student success. There are about as many methods and approaches as there are data points themselves, but I’ve always found a few principles central to effective and impactful predictive analytics.
1. The things we measure must be observable before we want to intervene. After all, this is the whole point of proactive student support: helping students before major issues arise, rather than responding to an issue after it’s too late.
2. The things we measure should be malleable. Certainly, we need to understand student characteristics like identity, socioeconomic status, and other demographic markers, but if that’s all our model has, we’re going to be left with little information about what exactly to do. After all, we’ve known for decades that first-generation students are less likely to succeed than their peers, but simply having that knowledge has done little to help us close that success gap.
3. The results of a model must suggest action. No matter how powerful or accurate analytics might be, if they don’t help guide institutional action, what good are they?
There are a lot of consequences of these principles, but that level of depth and explanation is beyond our purposes here. I would like to say, though, that these guidelines have led me to a pretty common and simple approach to analytics. I tend to use a pretty limited set of data and run a rather simple (multiple or logistic regression) model to estimate a probability of success. These results are then used to create categories of students that recommend action. Again, I’ll spare you the “which data?” and “how do we run the models?” discussions for the time being (those are posts all their own); instead, I’ll focus first on the results, then talk about our approach to action.
Different Schools, One Approach, Similar Results
Both of our schools are users of our ISSAQ platform. ISSAQ starts with a broad-based assessment of noncognitive skills: twelve behavioral, motivational, emotional, and social factors related to student success in higher education. We also ask students to report some indicators of academic preparation (i.e., HSGPA, ACT, SAT), though we can estimate success regardless of their availability. Ultimately, those factors are standardized and consolidated into a single academic preparation indicator. That means that, at most, any model we run has 13 potential variables.
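(For those who like to see the mechanics, here is a minimal sketch of what that consolidation could look like. The column names are hypothetical, and a real implementation would likely put ACT and SAT on a common scale via a concordance table before combining them; this is an illustration, not the actual ISSAQ calculation.)

```python
import pandas as pd

# Hypothetical academic preparation indicators; any of them may be missing.
students = pd.DataFrame({
    "hs_gpa": [3.9, 3.2, None, 3.6],
    "act":    [28, None, 22, None],
    "sat":    [None, 1150, 1040, 1300],
})

# Standardize each indicator (z-scores), then average whatever is available
# for each student into a single academic preparation index.
z = (students - students.mean()) / students.std()
students["acad_prep"] = z.mean(axis=1, skipna=True)

print(students)
```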
Those variables are compared to student outcomes to build a predictive model. Typically, one model is built to predict academic success (i.e., GPA) and another for retention, as previous studies have shown that the composition of predictive models varies significantly across academic and enrollment outcomes (Robbins et al., 2004; Markle et al., 2013).
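To make that a little more concrete, here is a hedged sketch of what such a two-model setup might look like in Python with scikit-learn. The variable names and the simulated data are purely illustrative; this is a sketch of the general technique, not the models we actually ran.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)
n = 500

# Hypothetical data: 12 noncognitive factor scores plus the consolidated
# academic preparation index, for 13 potential predictors in all.
predictors = [f"factor_{i}" for i in range(1, 13)] + ["acad_prep"]
df = pd.DataFrame(rng.normal(size=(n, len(predictors))), columns=predictors)

# Simulated outcomes so the example runs end to end.
signal = 0.4 * df["acad_prep"] + 0.3 * df["factor_1"]
df["first_year_gpa"] = (3.0 + 0.4 * signal + rng.normal(0, 0.3, n)).clip(0, 4)
df["retained"] = (signal + rng.normal(0, 1, n) > -1.4).astype(int)

# Model 1: multiple regression predicting first-year GPA.
gpa_model = LinearRegression().fit(df[predictors], df["first_year_gpa"])
df["predicted_gpa"] = gpa_model.predict(df[predictors])

# Model 2: logistic regression predicting retention (1 = returned).
ret_model = LogisticRegression(max_iter=1000).fit(df[predictors], df["retained"])
df["p_retained"] = ret_model.predict_proba(df[predictors])[:, 1]
```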
Let’s start with SPSU (a Small, Private, Selective University). For SPSU, as with several schools with which we work, one of the challenges we face is the restriction of range. For example, the average first-year GPA in the sample we examined (n=1,828) is 3.4, and the retention rate among that sample is 92%. This means that predicting these outcomes can be challenging.
From a statistical perspective, I’ll just say simply that our models are quite predictive, with both academic and noncognitive factors contributing to the prediction of each outcome in a statistically and practically significant manner. Rather than emphasize uninterpretable metrics around significance, I’ll instead show you some results.
One of the advantages of using a regression-based approach to building these models is that they give you a predicted level of the outcome to be compared to the actual outcome (which is essentially how the models evaluate statistical efficacy). From a practical perspective, however, it’s a nice feature because you can look at the relationship between predicted and actual results. In the figures below, you’ll see four groups, each representing a quartile based on the predicted probability of success. In other words, Q1 is the 25% of students with the lowest predicted GPA/probability of retention, while Q4 is the 25% of students with the highest predicted level of each outcome.
As you can see, both models show a strong relationship between the predictive model results and actual student outcomes. Even at a university with an incredibly high retention rate, we were able to produce two sizable groups with a 15% difference in the probability of retention.
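If you wanted to build this kind of quartile view from your own predictions, the mechanics amount to a quantile cut and a group-wise mean. Here is a minimal sketch with made-up predicted probabilities standing in for real model output:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 500

# Made-up predicted retention probabilities and actual outcomes, standing in
# for the output of a model like the one sketched above.
df = pd.DataFrame({"p_retained": rng.uniform(0.7, 1.0, n)})
df["retained"] = (rng.uniform(size=n) < df["p_retained"]).astype(int)

# Split students into quartiles of predicted probability (Q1 = lowest).
df["quartile"] = pd.qcut(df["p_retained"], 4, labels=["Q1", "Q2", "Q3", "Q4"])

# Compare each predicted group to what actually happened.
print(df.groupby("quartile", observed=True)["retained"].mean())
```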
As a drastically different example, I also present to you some results from STC (a Small, Technical College). STC is an interesting example because, rather than retention, they wanted to examine attendance. Given an accreditation regulation, STC is required to track each hour of instruction that students attend. Thus, this measure is important not only for persistence but also for instructional quality.
Once again, you can see significant discrimination across the two models, particularly for the number of hours missed, where students in the bottom 20% missed, on average, more than seven times as many hours as students in the top group. (Note that the 20/60/20 split was chosen by STC strategically, based on the number of students to whom they intended to provide various levels of support.)
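The same mechanics handle an asymmetric split; you simply supply the cut points. Here is a small, hypothetical sketch with simulated predicted hours missed, using 20/60/20 boundaries to mirror STC’s capacity-based tiers:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)

# Made-up predicted hours of instruction missed for 300 students.
df = pd.DataFrame({"predicted_hours_missed": rng.gamma(2.0, 3.0, 300)})

# A 20/60/20 split on predicted hours missed, ordered from students predicted
# to miss the fewest hours (lowest risk) to those predicted to miss the most.
df["tier"] = pd.qcut(
    df["predicted_hours_missed"],
    q=[0.0, 0.2, 0.8, 1.0],
    labels=["lowest risk", "middle", "highest risk"],
)

print(df["tier"].value_counts())
```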
So why have I shared these results? The short and quick answer is that I want you to get a tangible feel for what practical predictive analytics looks like. Yes, our ISSAQ platform is a powerful tool for doing this, given the value of holistic data in predicting student success, but these types of simple models can be run at many institutions. Mostly, I’m hopeful that, upon seeing these results, you begin to realize that you could easily segment your students into similar groups early in their careers.
The real question is this: what would you do if you had that understanding of your own students?
A Different Mindset of Action
One of the major challenges to our student success strategy is that it relies on a quantitative approach to support. We assume that those students with a low probability of success need large amounts of support and those students with a high probability of success need less. This creates two problems.
First, it’s a drastic misstep for those students predicted to be high performers. Frankly, we tend to leave those students alone, assume they’ll be fine, and anticipate that, if they do run into a problem, they’ll let us know. This can lead to a lot of “false positives” in our prediction, but it also misses a massive chance to engage these students. Students who come in with strong profiles shouldn’t just be left to their own devices; they should be encouraged to maximize their learning and development.
Are you super organized and academically prepared? Why not become a tutor or peer mentor? Or, we could get you involved in undergraduate research, connected with the career center so you can find an internship, or otherwise engage you with those resources on campus that don’t just overcome challenges but maximize learning and development. Similarly, those students who are motivated and socially engaged could join a campus organization, become a tour guide or resident assistant, or otherwise become an even more involved member of our campus community.
But when it comes to students with a low probability of success, I do not think that “more of the same” is the appropriate action. For most institutions, the primary step for students who are at risk of attrition is to increase the outreach from advising or similar offices. Yet this is often insufficient. Indeed, a holistic model helps us better understand if and when students are feeling disconnected, unengaged, or unmotivated. For these students, a qualitatively different type of outreach is needed.
What that looks like varies across institutions. In some cases, it might be a dedicated coach or advisor. In others, a mandated student success course. In still other cases, engaging a student with a living-learning community could provide the type of wrap-around support the student needs. The bottom line is that simply providing “more advising” is probably not sufficient.
“What about the students in the middle?” you might ask. In this case, we’d recommend a strategy of ensuring that students connect with those extant resources at your institution – all those “high-impact” and otherwise promising co-curricular practices that we all know nearly any student could benefit from. Here, it’s more about identifying and addressing a smaller number of potential challenges rather than thinking of a drastically different type or level of support.
While our estimation of a student’s likely outcomes is quantitative, we encourage institutions to think about these qualitative differences in support. And it’s this differential approach to support where most institutions struggle. We simply do not have the capacity to separate a student’s path in a meaningful, intentional, and prescribed way. Resources like living-learning communities and student success courses are decent examples, but I think we can do better.
One last note for those of you worried about the zero-sum game. The beauty of an analytics-based approach to student support is that we can use the data to spend our resources most wisely. If our advising loads are overburdened, we can ensure that the earliest and most frequent outreach goes to those students who need it. Those strong performers can likely be reached through an email, newsletter, or even direct outreach from those programs and resources outside of advising. Thus, the strategy, as much as the mindset, is about quality over quantity. It helps us spend our resources not only where they’re needed most but also in the most appropriate way.