Sports

NCAA tournament predictions: Matthew Record’s statistical analysis offers insights for your March Madness brackets

Advanced model says Iowa, Tennessee, Louisville, Pitt and Oklahoma State are undervalued, while the selection committee overrated North Carolina, Syracuse, Iowa State, Kansas and Michigan.

by Matthew Record

Unlike college football, basketball has a much more egalitarian nature to it. Whereas in football the talking heads rehashing the same bullshit about the same teams in the same conferences get all the play, in basketball the best teams actually have to beat the other best teams to win a championship. It’s a wonderful system (how about that, BCS?). However, between the tired axioms of analysts on TV and the opaque (and often needlessly complex) statistical models of metric communities online, I thought it would be fun to see if I could carve out a more parsimonious, predictive path of my own.

It’s worth pointing out that on average, each team (after the 4 play-in games) only has about a 1.6% chance of winning. In practice, obviously, Cal Poly is a lot closer to 0. Even the most likely team to win it all only has, in my estimation, somewhere between a 4% and 10% chance in the final analysis. That doesn’t bode well for those of you hoping to look like geniuses with your predictions. I’m no different, but I definitely like my model.
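The baseline figure above is simple arithmetic: 68 teams enter, the four play-in games trim the field to 64, and an even split of the title odds gives 1 in 64. A minimal sketch of that calculation follows; the function name is illustrative only and not part of the model.

```python
def uniform_title_chance(field_size: int) -> float:
    """Chance of winning the title if every team in the field were equally likely."""
    if field_size <= 0:
        raise ValueError("field_size must be positive")
    return 1.0 / field_size

# 68 teams enter; the 4 play-in games eliminate 4 of them, leaving 64.
main_bracket = 68 - 4
baseline = uniform_title_chance(main_bracket)
print(f"{baseline:.1%}")  # 1.6%
```

Against that 1.6% baseline, a 4% to 10% favorite is only a few times more likely to win than a randomly drawn team.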

Speaking of which, this is how I constructed the model. [Note: If you’d rather just get to the predictions, you can skip to the next section.] I began with offensive and defensive efficiency rankings from teamrankings.com and utilized those and a couple small controls to run a regression, using wins as an outcome variable. The coefficients generated helped to construct a weighted overall efficiency.
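For readers who want to see the mechanics, here is a rough sketch of that kind of regression, not the actual model. It assumes raw efficiency values rather than rankings, leaves out the small controls, and uses made-up numbers instead of teamrankings.com data. The helper names (`solve`, `fit_ols`, `weighted_efficiency`) are hypothetical.

```python
# Hypothetical rows: (offensive efficiency, defensive efficiency, wins).
# Defensive efficiency here means points allowed per possession (lower is better).
# The numbers are invented for illustration; the fitted weights carry no meaning.
TEAMS = [
    (1.15, 0.92, 29),
    (1.10, 0.95, 26),
    (1.08, 1.01, 21),
    (1.02, 0.97, 20),
    (0.99, 1.03, 15),
    (0.96, 1.06, 11),
]

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_ols(rows):
    """Ordinary least squares of wins on [1, offense, defense] via the normal equations."""
    X = [[1.0, off, dfn] for off, dfn, _ in rows]
    y = [float(w) for _, _, w in rows]
    k = 3
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)  # [intercept, offense slope, defense slope]

def weighted_efficiency(off, dfn, coefs):
    """Combine offense and defense, using the fitted slopes as weights."""
    _, b_off, b_def = coefs
    return b_off * off + b_def * dfn

coefs = fit_ols(TEAMS)
scores = [weighted_efficiency(off, dfn, coefs) for off, dfn, _ in TEAMS]
for (off, dfn, wins), score in zip(TEAMS, scores):
    print(f"off={off:.2f} def={dfn:.2f} wins={wins} weighted={score:.1f}")
```

The point of the sketch is only the shape of the calculation: the regression supplies the relative weights, and those weights turn two efficiency numbers into one.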

As it turns out, in the NCAA (at least this year), a team’s ability to score a basket is just a touch more valuable than that same team’s ability to prevent one from being scored. That weighted overall average was then multiplied by conference difficulty, proxied by each conference’s non-conference winning percentage. In other words, the Big 12 was the most powerful conference in this model because their teams did the best against non-Big 12 opponents. This is obviously not perfect but it was quick and dirty and generated some promising-looking results. The R-squared for these analyses (the measure of the overall model fit) was a little less than .87. This is very, very good for any econometric analysis but it does mean that 13% of what causes a team to win a game – coaching, home court advantage, luck, confidence, information, preparation, etc. – is not accounted for. Thus, this 13% is something akin to a margin of error when predicting individual games and any prediction made where the percentages are within 13% of each other is a prediction made with diminished confidence. Filling out your bracket is all about finding the actual value relative to the perceived value, so let’s start by identifying value propositions amongst the teams in the tournament.
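The conference adjustment and the 13% rule of thumb can be sketched in a few lines. This is an illustration under stated assumptions rather than the actual implementation: the function names and the sample conference records are hypothetical, and the only inputs taken from this article are the 13% margin and the win percentages quoted in the upset sections.

```python
def conference_strength(nonconf_wins: int, nonconf_losses: int) -> float:
    """Proxy for conference difficulty: non-conference winning percentage."""
    games = nonconf_wins + nonconf_losses
    if games == 0:
        raise ValueError("conference has no non-conference games")
    return nonconf_wins / games

def adjusted_rating(weighted_efficiency: float, strength: float) -> float:
    """Weighted overall efficiency multiplied by the conference proxy."""
    return weighted_efficiency * strength

def low_confidence(p_a: float, p_b: float, margin: float = 0.13) -> bool:
    """True when two win probabilities sit within the margin of each other."""
    return abs(p_a - p_b) <= margin

# Hypothetical inputs: identical efficiency, different conference records.
strong = adjusted_rating(10.0, conference_strength(100, 25))  # strength 0.80
weak = adjusted_rating(10.0, conference_strength(60, 60))     # strength 0.50
print(strong, weak)

# Win percentages quoted in the first-round sections of this article.
print(low_confidence(0.48, 0.38))    # Saint Louis vs. Xavier
print(low_confidence(0.437, 0.432))  # Memphis vs. GW
```

Both quoted match-ups fall inside the 13% band, which is why they are treated as shaky calls rather than confident picks.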

Underrated Teams

Iowa

This is the hardest team in the tournament to understand and thus the hardest team to predict. One thing is certain: their excessively low seeding is purely a function of having had a rough finish to the season. The Hawkeyes are a team with Final Four-level talent but they’re stuck with a play-in game thanks to a big late collapse. Looking through their schedule, we see that they beat many of the toughest teams in the Big Ten only to lose to the likes of Minnesota and Indiana. Iowa is seeded 45 overall in the tournament and my model has them ranked 6. Admittedly, 6th is probably too high but this team was ranked the 12th most efficient offense, 53rd most efficient defense and had the 11th best point differential in Division 1. Iowa had only two non-conference losses while playing in basketball’s second-toughest conference. They deserve better than a play-in game.

Tennessee

The biggest knock against the Vols is their very tough play-in game against the aforementioned Iowa. Seeded 44 in the tournament, my model has them at 23 based in part on their 29th best point differential in basketball this year. This is a very well balanced team – within the top third of D1 in both offense and defense. However, their conference argument is weaker than Iowa’s: they have more losses than the Hawks in a weaker SEC. In the end, there isn’t nearly the mystery there is with Iowa – they win the games they should win and tend to lose games against better teams. In my opinion, they likely won’t make the kind of run Iowa could if they make it past the play-in game but this is still a very solid team.

Oklahoma State

This is a very good team ranked 26 spots too low according to my model (35 vs. 9). They had a very impressive 11.2 point differential (ranking them 14th overall) in the single toughest conference in basketball. The Cowboys had a solid top 50 defense and a top 20 offense. Of their 12 losses, 11 came against teams in the tournament. However, while I will call them underrated, I’m not quite predicting them to go on much of a run given their insanely tough draw in the first two rounds against Gonzaga (19th best in my model) and Arizona (2nd). If they do somehow find their way through the morass of the first two rounds, they are a real threat to make the Final Four.

Pittsburgh

This is another team way underrated by the tournament masters – by 19 spots according to my model (17 vs. 36 seed). Pittsburgh continues our trend of teams over-penalized for losses (7 of their 8 came in conference play) in tough conferences (in Pitt’s case a tough – but in fairness, not elite – ACC). The Panthers played tough against a phenomenal Virginia team twice, holding them to 48 and 51 points in their meetings. All their losses came against tournament teams except a tight game against Florida State on 2/23. Similar to Oklahoma State, Pitt is underrated and thus a threat in the early rounds, but I would not call them elite in their own right. The biggest compliment you can pay Pitt is also the biggest knock against them – solid in all phases, exceptional in none: top 30 offense; top 40 defense.

Louisville

Louisville is third in the Vegas odds to win it all and I still consider them underrated. By efficiency rates, the Cardinals are the best team in the tournament and it’s not close. The only knock against Louisville is their weak (transitional) conference. Honestly, this is the reigning national champion – are we really going to doubt them based on their conference? In fact, my model, if anything, over-emphasizes the best conferences and still has Louisville as by far the best team. Furthermore, thanks to their unbelievably low seeding, they have a cakewalk to the Sweet 16 where they likely run into Wichita State or Michigan. The Cards are a full standard deviation better than the second place team in point differential: Louisville doesn’t beat their opponents, they destroy them. Louisville is top 3 in both offensive and defensive efficiency. Besides Wichita State, no other team is even top 40 in both categories. No one can hope to touch Louisville for its elite play in both phases of the game.

Overrated Teams

North Carolina

Team A: 23-8, 5 conference losses in the ACC, 7.2 point differential, 84th offense, 47th defense

Team B: 23-8, 7 conference losses in the ACC, 10.1 point differential, 28th offense, 39th defense

You guessed it, Team A is North Carolina, currently seeded 21 overall, and Team B is Pittsburgh, seeded 39th. The cherry on top, of course, is that Pittsburgh just beat North Carolina in the ACC tournament. Vegas has done a slightly better job of handicapping these teams properly, having Pittsburgh 23rd overall to win it to North Carolina’s 16th. Without the name and the baby blue unis, North Carolina is probably somewhere between an 8 and a 12 seed in its region. This isn’t to say that North Carolina couldn’t make a little bit of a run. Their placement in the East means that Carolina is in the weakest region by far. North Carolina only needs to get past Providence and a very beatable Iowa State team in order to make the Sweet 16.

Speaking of beatable Iowa State teams:

Iowa State/Syracuse

We’re going to discuss these two together as they are the weakest of all the 3 and 4 seeds in the tournament.

While their ultimate fate is similar, the specific nature of Iowa State’s and Syracuse’s deficiencies is very different. Syracuse is a team without star players, well coached and very well balanced, ranked 41 in offensive efficiency and 42 in defense. They simply over-performed their talent and level of play in the ACC. It could be that Jim Boeheim has turned into the Gregg Popovich of college basketball and built a team that will out-perform its talent level into the Final Four. More likely, though, this is just a 7 seed masquerading as a 3 seed.

Iowa State, on the other hand, is a two-man show with Melvin Ejim and DeAndre Kane both top 50 efficiency players in Division 1. Most teams don’t even have one player ranked that high. So with that talent on the team, what’s the problem? It’s difficult to say exactly – it could be that, with the NBA as a likely end for Ejim and Kane, the team dynamic focuses unduly on those two players while they get their STATS, or it could be that the talent level on the rest of the team simply isn’t there. In any event, the list of teams with a better defense in this tournament includes American, Manhattan, Tulsa and Mercer. Iowa State’s excellent offense carried them through a tough Big 12 but to get past the first weekend, they’re going to have to prevent some baskets, too.

Kansas

Kansas’ problem is more or less an amplified version of what is wrong with Iowa State. Compared to the Cyclones, Kansas has a higher ceiling, a lower floor and a lot more uncertainty. Over the last 3 games, with center Joel Embiid on the bench, Kansas has the fourth worst defense among teams in the tournament. That’s play-in game bad. The larger problem is that we can’t all breathlessly watch the injury reports and then make a solid determination. It could be that Embiid is hampered if and when he returns but that’s not what I mean. Kansas is still a below-average defense (worse, in fact, than Iowa State) even when Embiid is 100%. That, above all, is why I would consider them overrated. Many elite offenses in this year’s tournament don’t have anywhere near Kansas’ problems on the other side of the ball. If Embiid comes back in time for the Sweet 16, I still like Ohio State or Syracuse against them. Moreover, it may not come to that since a second round upset by a solid New Mexico team is very possible.

Michigan

It’s debatable whether Michigan should even be in this section, not because they aren’t ranked too high (they are) but because they have an unbelievably easy path to the Sweet 16. With Wofford in the first round and equally weak Arizona State/Texas in the second, Michigan might make the Final Four just by virtue of their freshness in a very tough Midwest bracket. Wichita St, Kentucky and Louisville are going to rough each other up on the other side of the bracket before they even see Michigan. Don’t get me wrong, Michigan is a solid team with a 9 point differential in the second best conference in basketball, but their 2 seed should really be more like a 4 or 5 and stands in contrast to Louisville who should really be a 1. Due to the peculiarities of the selection committee, however, Michigan gets to breeze through the early rounds while Wichita State is forced to play 3 Final Four-level games just to get to the Final Four. They may be overrated, but a Michigan run isn’t just possible, it’s likely.

Round 1 Upsets

We’re going to define an upset as any time a lower seed beats a higher seed, irrespective of ranking. Obviously, a 9 beating an 8 isn’t anything crazy but it is still technically an upset. Since 2000, the median number of first round upsets (defined broadly) is 8.5. There ought to be about 19 upsets overall with most of those, obviously, occurring in the first two rounds. Thus, in order to keep consistent with the historical numbers, I will be considering any lower seed winning to be an upset while acknowledging that a 9 beating an 8 is a lot less fun than a 15 beating a 2.
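The broad upset definition is easy to make concrete. The sketch below uses hypothetical results and hypothetical per-season counts, not the historical data behind the 8.5 figure; it only shows how such a count and median would be computed. Note that a "lower seed" here means the team carrying the larger seed number.

```python
from statistics import median

def is_upset(winner_seed: int, loser_seed: int) -> bool:
    """Broad definition: the winner carried the numerically larger (worse) seed."""
    return winner_seed > loser_seed

def count_upsets(results) -> int:
    """Count upsets in a list of (winner seed, loser seed) pairs."""
    return sum(is_upset(w, l) for w, l in results)

# Hypothetical first-round results for one region.
round_one = [(1, 16), (9, 8), (12, 5), (4, 13), (11, 6), (3, 14), (7, 10), (2, 15)]
print(count_upsets(round_one))  # 3

# Hypothetical per-season upset counts. With an even number of seasons the
# median is the mean of the two middle values, hence a figure like 8.5.
seasons = [6, 8, 9, 10]
print(median(seasons))  # 8.5
```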

Possible but unlikely

(11) Stanford over (7) New Mexico

The only reason to consider Stanford is relative conference strength. The Mountain West is obviously not in the same class as the Pac-12 but Stanford is obviously not in the same class as New Mexico. Only real mid-major haters ought to take Stanford.

(12) Xavier/NC State over (5) Saint Louis

This is the first of a few possible upsets by play-in teams. Such upsets are obviously very tough to predict since the play-in adds an entire other team to account for as a counterfactual. Against NC State, Saint Louis is a safe bet and I wouldn’t consider an upset to be much of a possibility. Against Xavier, though, I only have Saint Louis at 48% to win vs. 38% for Xavier, well within the margin of error. Saint Louis has one of the best defenses in the tournament but is almost as bad on offense as Cal Poly. I wouldn’t say it’s likely but Xavier in a low scoring game is not unthinkable.

(12) Harvard over (5) Cincinnati

Cincinnati got hosed on this match-up, plain and simple. They are by far the best 5 seed and with whom do they get matched up? By far the best 12 seed. This is still Cincinnati’s game to lose: their defense is elite at a level that Harvard has never seen this year, but Harvard can play a little D, too. This one will likely be a war of attrition but Cincinnati is still odds-on to win.

(12) N. Dakota St. over (5) Oklahoma

If Oklahoma were playing Harvard, they would probably lose. They are downright bad on defense but N. Dakota St simply doesn’t have the weapons to exploit the Sooners’ weakness. It’s a tight call and I’d have no problem justifying a 12 over a 5 here, but either team loses in the next round anyway.

(12) Stephen F. Austin over (5) VCU

This is a popular choice among talking heads on TV for an upset but VCU is a really good team and Stephen F. Austin represents perhaps the most unknowable quantity in the tournament this year (at least among teams that might make it out of the first round). Stephen F. Austin feels like a “discovery” so it is fun to pick them. They might win, but I tend to lean toward no.

Toss-ups

(9) GWU over (8) Memphis

Toss a coin. Literally, it’s about the same odds. I have Memphis 43.7% to win and GW at 43.2%. Memphis has more tournament bona fides, so you might want to lean that way.

(9) Oklahoma St. over (8) Gonzaga

Again, this barely qualifies as an “upset” but this is the first game where I have the underdog as more likely to win. Although, again, not by much.

Likely Upsets

(11) Iowa or (11) Tennessee over (6) Massachusetts

The fans of SMU and Louisville and Wichita State can complain all they want (and they will), but no one got screwed more than Iowa and Tennessee. Why we have 11 seeds in a play-in game I will never understand. Why Iowa and Tennessee aren’t, like, 5 or 6 seeds I’ll never understand. The fact that one of these two teams gets eliminated before the tournament really even starts is pretty goddamned unfair. These are both good teams. So good, in fact, that I have either of them as a healthy favorite against a way overmatched UMass team. Iowa is the better of the two teams so I’d ride them but I think either team is likely to dispatch Massachusetts.

(9) Pittsburgh over (8) Colorado

Again, not much of an upset but very, very likely. I’ve sung Pittsburgh’s praises enough already but suffice it to say this is more like an 8 vs. a 4 than an 8 vs. a 9.

(10) Arizona St. over (7) Texas

People seem to like Texas and I can’t really understand why. They had a worse point differential this year than Western Michigan, Xavier, St. Joseph’s, Tulsa, Eastern Kentucky and Dayton. They had a couple squeakers against North Carolina and Texas Tech. Turn those into losses and the Longhorns are likely in the NIT instead of enjoying the 25th seed in the NCAA Tournament. Arizona St. isn’t an amazing team, but they’re definitely better than Texas. I see this as a comfortable Arizona State win.

Round 2 Upsets

Possible but unlikely

(5) VCU over (4) UCLA

This is a much more likely upset than it looks based on the seedings, though I’d still call UCLA the favorite. However, VCU is a strong, strong team and if they make it past this game, a real run or even the Final Four is not out of the question, especially since I’m not nearly as high on Florida as the rest of the world is.

(7) New Mexico over (2) Kansas

7 seeds UConn and Oregon would both be likely to give Kansas a lot of trouble. If the Jayhawks escape the first weekend, they can thank New Mexico’s relatively weak offense.

(7) Oregon over (2) Wisconsin

Speaking of 7 seeds, there is no one Oregon wanted to see less in the second round than Wisconsin. They’re very unlikely to outscore an offense like Wisconsin’s and they don’t have another style of play to fall back on.

Toss-ups

(11) Iowa over (3) Duke

If Iowa makes it past the play-in and UMass, this could be the single most exciting match-up of the first weekend. An over-under of 170 combined points would not be far-fetched. Iowa’s defense is far superior to the Blue Devils’ from an efficiency standpoint and as such, this match represents the best possibility of a double-digit seed making it to the Sweet 16.

(5) Cincinnati over (4) Michigan State

Did I mention how screwed UC was by the tournament seeders? Michigan State will likely pull this one out but you could justify going either way on this one.

Likely Upsets

(6) Ohio State over (3) Syracuse

This looks like a homer pick but it really isn’t. As outlined above, Syracuse has been playing over their heads all year and the Bucks started the season by winning 14 straight. Ohio State’s shutdown defense is going to run into a steamrolling offense at some point in this tournament, but I don’t think it will be Syracuse.

Any predictions beyond the first two rounds would be too speculative to be useful. But my overall predictions for the Final Four are: Florida, Villanova, Arizona, Louisville. Champion: Louisville.

Good luck everyone!

Matthew Record is a PhD student studying Public Policy and Management in the John Glenn School of Public Affairs at Ohio State University. He is the drummer and driving force behind the indie pop sextet Fortune & Spirits and the sole contributor to his own blog.

Matthew’s favorite board game is Power Grid and he suggests you all go to boardgamegeek.com right now to discover one of your own. He currently resides in Columbus, OH.

4 replies »

  1. I’m a big fan of statistical systems and their relationship to sports, and I applaud you for making your own. Having said that, I think it’s very difficult, if not impossible, to include some of the most important factors in them. I’m not talking about one team being “up” and the other being “flat” for a game, though I suppose it would be possible to include such a thing based on circumstances of a particular game (coming off a big win, coming off a big loss, important for tournament consideration or irrelevant for such, etc.)

    One thing that tends to be missing from such systems is in accounting for match-ups. For instance, you believe that UNC is overrated, and I tend to agree. But I’ll also bet that if you check your data, you’ll find a very high standard deviation on their results. The Heels are very, very good in transition, and very good in the front court. They are not very good in the half court game and, while Marcus Paige is a very fine three-point shooter and can drive the basket and finish, the back court otherwise is quite weak. So, if you match the Heels against a team that loves to run, they have a chance of obliterating that team. If you match them against a team that denies transition, is strong on the inside, and forces more three-point shots, the Heels fall flat. You can see this in their results with losses to defense-minded Pitt, Virginia, and Syracuse, and wins against Louisville, Michigan State, Kentucky, and Duke. Yes, they sometimes beat teams with good defenses, and sometimes lose to teams that like to run, but there’s definitely a pattern there. If I were betting UNC games, I’d bet them to win against most transition teams and to lose against most half-court, defensive teams.

    Another factor that gets left out is if a team runs an unusual offense or defense. The Princeton Offense, for instance, befuddled some teams the first time they saw it, and that’s an advantage in a national tournament when teams are often seeing each other for the first time. This year, Arizona and Virginia run a very effective Pack Line defense, which should give them an advantage over teams not in the ACC or Pac 12. Syracuse runs its own version of the 2/3 zone, which has given them an advantage in the past and should, again. VCU’s Havoc defense is a more known commodity, but it has variations that this team is just getting comfortable with at this point in the season, and may also take them a bit farther than a more familiar defense would. BTW, this advantage is more pronounced, I think, on second-games at any level (sub-regional, regional, final four) because the other team has less time to prepare for them.

    Most statistical systems (but not all) also give too little weight to recent results over early results. Teams tend to get better or worse over a season. Sometimes, a team that’s young at the beginning of a season is hardly even the same team at the end. They gel. They better comprehend the coach’s system. They improve individually. Other teams tank. They get tired. They lose faith in the system and/or their teammates and start taking bad shots on their own. They quit fighting for position down low because they rarely get a pass into the post. They had some close losses and lost confidence in themselves. Who knows why, but it seems to me that taking season-long statistics and weighting early results the same as late results is probably going to lead to less accuracy in the predictions.

    My picks to go farther this year than their seeds would suggest include:

    VCU – getting better as the season goes along, and they have an unusual defense.

    Syracuse – Yeah. I know. Their seeding may seem too high. I agree with you on that. On the other hand, they are not deep, so they got tired down the stretch, and an injury to Jerami Grant drastically reduced their effectiveness. They should show up rested and healthy, and their 2/3 zone will be a huge problem for many teams. I have them as a dark horse Final Four team.

    Michigan State – A popular pick with the talking heads, but for the right reasons. They’re mostly healthy, talented, and well-coached, and they’re peaking. That’s tough to beat.

  2. I agree with many of your larger points. For example, I have called Michigan “overrated” in an absolute sense but personally find their early round draws to be weak. I don’t consider them a top 20 team in this tournament but they could (and likely will) end up finishing in the top 20. Same goes for North Carolina who may have, pound for pound, the easiest path to the Elite Eight of any of the top 24 seeds. The overrated/underrated designation has as much to do with the seeding as it does with how far I think they should go.

    As far as your final picks go, I like VCU a lot. I don’t see them in the Final Four but it’s a strong team. Not only does my model think Michigan State is weaker than everyone else in the world seems to, it’s a crowded bandwagon so it’s no fun to pick them anyway. Syracuse is the one on which I most disagree with you. They’re really not elite in any phase of the game (including their much vaunted defense). They get a lot of play on TV and Jim Boeheim is probably the third or fourth most famous/popular coach in the league, so in my opinion Syracuse is almost always overrated.

  3. One last point – you put a lot of stock in the novelty of defenses and their effectiveness against tournament teams that have never played against them before and there is some evidence to suggest that that’s true. Syracuse, for example, didn’t lose a single game until conference play. They beat Duke the first time and lost to them the second time.

    However, if Syracuse makes it out of the first two rounds, other teams will have a solid week to prep. The overall novelty of a defensive scheme and the effectiveness of that novelty with regard to winning a game should be captured, for the most part, by this model. What WOULDN’T be captured would be if Syracuse had an equally (or near equally) effective half-court trap offense they could transition to in case the other team was unusually successful from the outside. This is something that happens with the Spurs very often in the NBA – they have multiple offensive and defensive configurations they can switch to. If Boeheim actually does have some tricks up his sleeve we haven’t yet seen in Syracuse then indeed, they should probably outperform my model. I’m skeptical, however.

    It’s worth emphasizing that my model doesn’t have Syracuse as a mortal lock to drop in the second round. They are simply a team that’s over-seeded and more likely than many others to lose early.