This is the final part of our three-part interview with Professor Sokol of Georgia Tech. Professor Sokol explained his system in Part I and told us about the statistical differences between college hoops and college football in Part II. In today's segment, Professor Sokol goes into greater depth about his system and how it has changed since its inception. He also answers the age-old question about predicting Cinderellas.
FTRS: How accurate was your first model and what tweaks have you made since then? How accurate was it for the 2009 Tournament?
Sokol: Our initial model was (statistically) significantly better than competing rankings like the AP and ESPN/USA Today polls, the RPI, and Sagarin and Massey's ratings. We've made some tweaks since then, including rolling out a Bayesian version of LRMC this year (we now report both the original and Bayesian methods on our page). The Bayesian version seems to do a better job of valuing some of the teams whose results give less helpful information.
For 2009, our methods were actually less successful than our competitors'. In 2008, the opposite was true (in fact, in 2008 LRMC correctly predicted the whole Final Four, final two, and winner). Statistically, a single year's results are almost never enough to be significant -- but we now have 10 years of tracking data, so we can say that over the long run LRMC is statistically significantly better.
FTRS: Can you give us a layman's definition of the Bayesian model you referred to?
Sokol: Bayesian models are a different sort of statistical methodology than standard parameter estimation. In most basic statistical models, we assume there's some "true" value of something (for example, how good a basketball team is), and using the data we have, we try to find an accurate estimate of that unknown true value.
In a Bayesian model, we start with a pretty generic guess, and use the data we observe to update that guess.
(Editor's Note: Many probability and statistics students use Bayes' Theorem to deal with conditional probability. If you would like a more detailed explanation of Bayes' methods, click here.)
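(Editor's Note: For readers who like to see the idea in code, here is a minimal sketch of "start with a generic guess and update it with data." It is not the Bayesian LRMC model. It assumes a simple Beta prior on one team's chance of winning a game, and the function names `update_beta` and `beta_mean` as well as the 8-2 record are made up for illustration.)

```python
from fractions import Fraction


def update_beta(prior_a, prior_b, wins, losses):
    """Return the Beta parameters after observing some wins and losses.

    Assumption: the team's win probability has a Beta(prior_a, prior_b) prior.
    """
    return prior_a + wins, prior_b + losses


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution: the current best guess."""
    return Fraction(a, a + b)


# Generic starting guess: Beta(1, 1), i.e. every win probability equally likely.
a, b = 1, 1
print("guess before any games:", float(beta_mean(a, b)))

# Hypothetical data: the team goes 8-2. The guess moves toward the observed
# record, but not all the way, because the prior still carries some weight.
a, b = update_beta(a, b, wins=8, losses=2)
print("guess after an 8-2 record:", float(beta_mean(a, b)))
```

The more games that are observed, the less the starting guess matters, which is one reason a Bayesian approach can help with teams whose results carry little information.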
FTRS: You refer to statistical significance in the same answer. How many years of NCAA data is required for the results to be statistically significant?
Sokol: Actually, the amount of data needed depends on how big a difference there is between the methods we're comparing. For example, suppose you want to see which of two basketball teams is better. So, you have them play each other every day until you're sure you know the answer.
If it's Duke vs. NJIT, Duke will probably win the first 10 games (and they'd all probably be blowouts). It's likely that you'll just end the experiment there, and declare that Duke is a better team.
But if it's Duke vs. Kansas, the first 10 games might be split 5-5 or 6-4, so it's less clear who's really better. (Even 6-4 isn't convincing; if one close game went the other way it would've been a 5-5 split.) So you might make them play 10 more, and now it's 11-9, still pretty close. So you make them play more, etc. Eventually, you'll get to the point where you can say okay, it's now (say) 115-95, and that's convincing enough that the team with 115 wins is better (though not by so much) than the team with 95 wins.
The question is how many games you need, and that's easy to answer using standard statistical techniques.
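(Editor's Note: One such standard technique is a sample-size calculation. The sketch below is only a rough normal approximation, and it rests on assumptions that are ours, not Professor Sokol's: a two-sided 5% significance level, 80% power, and made-up "true" win probabilities of 0.90 for a lopsided matchup and 0.55 for a close one. The function name `games_needed` is hypothetical.)

```python
from math import ceil, sqrt
from statistics import NormalDist


def games_needed(true_win_prob, alpha=0.05, power=0.80):
    """Approximate number of games needed to show one team is better.

    Tests the null hypothesis "the teams are even" (win probability 0.5)
    against an assumed true win probability, using the usual normal
    approximation for a single proportion.
    """
    if true_win_prob == 0.5:
        raise ValueError("evenly matched teams can never be separated")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = NormalDist().inv_cdf(power)
    spread_null = 0.5  # sqrt(0.5 * 0.5), the spread if the teams are even
    spread_true = sqrt(true_win_prob * (1 - true_win_prob))
    n = ((z_alpha * spread_null + z_power * spread_true)
         / (true_win_prob - 0.5)) ** 2
    return ceil(n)


# Hypothetical matchups: a mismatch needs far fewer games than a close one.
print("lopsided matchup (p = 0.90):", games_needed(0.90))
print("close matchup (p = 0.55):", games_needed(0.55))
```

The approximation is crude for very small numbers of games, but it shows the pattern described above: the smaller the gap between the teams, the more games it takes to be sure.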
In our case we're going the other way. We have a certain number of games (in our data set, we have 10 years, or 630 total tournament games), and we want to know whether it's enough to show statistical significance. Normally, 630 games would be enough -- but not necessarily here, because so many games give no information. For example, any reasonable prediction method will have the 1-seed beating the 16-seed in the first round, so those games don't tell us anything about which prediction method is better. In fact, any game where two methods predict the same winner doesn't give us any useful information about comparing the two methods.
So, we have to use something called McNemar's test, which compares two methods only on the games where they disagree on the predicted winner. With dissimilar methods, there could be a hundred or more disagreements over the 10 years, but with similar methods (like when we compare two versions of LRMC) there might be only 2-4 disagreements per year. So for some comparisons we don't yet have enough data to claim statistical significance. [In fact, we actually have two Bayesian improvements to LRMC, and not enough data to be sure which one is better than the other, but we do have enough data to show that they're both statistically significantly better than the original LRMC.]
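(Editor's Note: Here is a small sketch of the exact form of McNemar's test, which treats the disagreement games as coin flips under the assumption that neither method is better. This is our illustration, not the LRMC team's code. The helper names `count_disagreements` and `mcnemar_exact_p`, and the example picks, are hypothetical.)

```python
from math import comb


def count_disagreements(picks_a, picks_b, winners):
    """Count games where the methods disagree and exactly one was right."""
    only_a = only_b = 0
    for pick_a, pick_b, winner in zip(picks_a, picks_b, winners):
        if pick_a == pick_b:
            continue  # same pick: the game says nothing about which is better
        if pick_a == winner:
            only_a += 1
        elif pick_b == winner:
            only_b += 1
    return only_a, only_b


def mcnemar_exact_p(only_a_correct, only_b_correct):
    """Two-sided exact McNemar p-value from the two disagreement counts."""
    n = only_a_correct + only_b_correct
    if n == 0:
        return 1.0  # no disagreements, so no evidence either way
    k = min(only_a_correct, only_b_correct)
    # Probability of a split at least this lopsided if each disagreement
    # were a fair coin flip, doubled to make the test two-sided.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Made-up picks for three games: the methods agree on the first game only.
a_wins, b_wins = count_disagreements(
    ["Duke", "Kansas", "UNC"], ["Duke", "Memphis", "UCLA"],
    ["Duke", "Kansas", "UCLA"])
print(a_wins, b_wins, mcnemar_exact_p(a_wins, b_wins))
```

With only a handful of disagreements the p-value stays large no matter how the games turn out, which is why comparing two similar methods takes many seasons of data.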
FTRS: Once the brackets are announced, how long does it take for your model to run a simulation? Do you run multiple simulations? Can you tell our readers why it is a good idea to run more than one simulation?
Sokol: We actually don't run a simulation; instead, we just rank all of the teams and assume that the better team is our predicted winner in each round.
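(Editor's Note: The "rank the teams and advance the better one" approach can be sketched in a few lines. The ratings below are invented placeholders, not LRMC output, and `predict_bracket` is a hypothetical helper that assumes the field size is a power of two and that teams are listed in bracket order.)

```python
def predict_bracket(bracket, rating):
    """Advance the higher-rated team in every game, round by round.

    bracket: team names in bracket order (adjacent teams play each other).
    rating:  dict mapping team name to a number, higher meaning better.
    Returns a list with the predicted survivors of each round.
    """
    rounds = []
    field = list(bracket)
    while len(field) > 1:
        # Pair up neighbours and keep whichever team has the higher rating.
        field = [a if rating[a] >= rating[b] else b
                 for a, b in zip(field[0::2], field[1::2])]
        rounds.append(field)
    return rounds


# Placeholder ratings for a four-team pod.
ratings = {"A": 0.90, "B": 0.89, "C": 0.88, "D": 0.40}
print(predict_bracket(["A", "B", "C", "D"], ratings))
```

Because every pick follows directly from the ranking, there is nothing random to simulate: the same ratings always produce the same bracket.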
That's not always technically the best prediction though -- for example, consider a 4-team pod where A, B, and C are ranked in that order, but are very close, and D is much worse. A plays B in the first round, and C plays D. So A vs. B is almost 50/50, but C is very likely




















