
Engineering a Statistically Average Bracket
Hey guys. This is a tad long, but I think it revealed some interesting stuff to use on your brackets this week. I hope you'll take a few minutes to read through!
Since I'm a sucker for Cinderella, my brackets usually have WAY too many upsets. With this in mind, I wanted to gather some data, present it in a digestible way, and use it to write a computer program that picks statistically non-absurd brackets, then see how the computer does against my friends.
I needed to figure out, on average, which seeds tend to be upset the most, and how often. So I used bracket information from 2000-2011 and counted the number of upsets in each first-round matchup (1/16, 2/15, … 8/9). Since there are 4 regions over 12 years, that gave me 48 total games at each seeding.
Dividing the total number of upsets in each matchup by 48 gave me the average likelihood of an upset in that particular matchup.
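For anyone who wants to redo the arithmetic, here is a minimal Python sketch. The upset counts are made-up placeholders, not the actual 2000-2011 tallies; only the 48-game denominator and the 4 games per matchup per tournament come from the post. The function names are just illustrative.

```python
# 4 regions x 12 tournaments = 48 games at each first-round matchup
GAMES_PER_MATCHUP = 4 * 12

# PLACEHOLDER upset counts per matchup (assumed for illustration, NOT real data).
hypothetical_upsets = {
    "1/16": 0, "2/15": 2, "3/14": 6, "4/13": 12,
    "5/12": 16, "6/11": 16, "7/10": 19, "8/9": 23,
}

def upset_rate(upsets, games=GAMES_PER_MATCHUP):
    """Fraction of games at this matchup that ended in an upset."""
    return upsets / games

def expected_per_tournament(upsets, games=GAMES_PER_MATCHUP, regions=4):
    """Average number of upsets at this matchup in a single tournament."""
    return regions * upset_rate(upsets, games)

for matchup, count in hypothetical_upsets.items():
    print(f"{matchup}: rate={upset_rate(count):.3f}, "
          f"per tournament={expected_per_tournament(count):.2f}")
```

With 12 upsets in 48 games, for example, the rate is 0.25, which works out to one upset per tournament at that matchup.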
The upsets in 12 years with some useful averages:
So pause here and consider this: on average there will be maybe one 4/13 upset per tournament. Also, on average you should predict AT LEAST one upset in the 5/12, 6/11, 7/10 and 8/9 games.
In order to use this data to pick my bracket unemotionally, I needed to fit the data to create a usable model for the likelihood of an upset. Cue Excel and some neato graphs.
Here's a graph showing the percentages from above with a linear fit. As you can see, it isn't terrible for 6, 7 and 8, but it predicts far too many upsets in games for the 2 and 3 seeds.
So, I thought this data looked kind of like an erf (error function) curve. I used Excel to fit the erf function to the data and the result was MUCH better.
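The fit itself was done in Excel and the exact formula isn't given, so what follows is only a sketch of one way to fit an erf-shaped curve, in plain Python. The three-parameter form (scale, center, width), the brute-force grid search, and the data points are all assumptions for illustration; the data points are placeholders, not the real percentages.

```python
import math

def erf_model(seed, scale, center, width):
    """Upset probability for a game whose favorite has the given seed (1-8)."""
    return scale * 0.5 * (1.0 + math.erf((seed - center) / width))

def sum_sq_error(params, data):
    """Sum of squared differences between the model and the observed rates."""
    scale, center, width = params
    return sum((erf_model(s, scale, center, width) - p) ** 2 for s, p in data)

def grid_fit(data):
    """Brute-force least-squares fit over a coarse parameter grid."""
    scales = [i / 20 for i in range(6, 15)]   # 0.30 .. 0.70
    centers = [i / 4 for i in range(8, 33)]   # 2.0 .. 8.0
    widths = [i / 4 for i in range(2, 17)]    # 0.5 .. 4.0
    candidates = [(a, c, w) for a in scales for c in centers for w in widths]
    return min(candidates, key=lambda params: sum_sq_error(params, data))

# PLACEHOLDER (favorite seed, upset rate) points, NOT the real 2000-2011 numbers.
hypothetical_data = [(1, 0.00), (2, 0.04), (3, 0.12), (4, 0.25),
                     (5, 0.33), (6, 0.33), (7, 0.40), (8, 0.48)]

scale, center, width = grid_fit(hypothetical_data)
print(f"scale={scale:.2f}, center={center:.2f}, width={width:.2f}")
```

A grid search is crude next to a proper solver, but it needs nothing beyond the standard library and makes it easy to see what is being minimized.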
I shoved the formula for this fit into MATLAB and used a fairly simple algorithm to simulate upsets and output results. Think of this like the pre-tourney "Eye Tests" we love so much: No names, no brands, no emotion. Just numbers.
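The MATLAB code isn't posted here, so this is only a guess at what a "fairly simple algorithm" could look like, written in Python: draw a random number for each first-round game and call it an upset if the number falls below that matchup's upset probability. The probabilities below are placeholders rather than the fitted values, and the function names are hypothetical.

```python
import random

# PLACEHOLDER upset probabilities keyed by the favorite's seed (NOT fitted values).
hypothetical_probs = {1: 0.01, 2: 0.04, 3: 0.12, 4: 0.22,
                      5: 0.32, 6: 0.38, 7: 0.43, 8: 0.48}

def simulate_region(probs, rng):
    """Return the favorite seeds that get upset in one simulated region."""
    return [seed for seed, p in sorted(probs.items()) if rng.random() < p]

def simulate_first_round(probs, rng, regions=4):
    """Simulate every first-round game; one list of upset seeds per region."""
    return [simulate_region(probs, rng) for _ in range(regions)]

rng = random.Random(2012)  # fixed seed so a run can be repeated
for number, upsets in enumerate(simulate_first_round(hypothetical_probs, rng), 1):
    # In the first round, seed s plays seed 17 - s.
    picks = ", ".join(f"{17 - s} over {s}" for s in upsets) or "no upsets"
    print(f"Region {number}: {picks}")
```

Each run produces one bracket's worth of first-round picks with no team names involved, which is the "just numbers" idea.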
I'll let you guys know how my brackets do and if anyone is interested I'll share the MATLAB code.
Last edited by driegner; 03-12-2012 at 03:04 AM.

Re: Engineering a Statistically Average Bracket
nice work, thanks for sharing...looks like I'm going to update my picks a little.
**2014 CRZZZFFL REGULAR SEASON CHAMPION**
**2014 CF Fantasy Basketball Regular Season Champion**
"I’d rather f work at McDonald’s than work with some of those guys. Not that there is anything bad about working at McDonald’s.”  Bo Pelini

Re: Engineering a Statistically Average Bracket
looking for some feedback...so...bump

Re: Engineering a Statistically Average Bracket
Interesting - thanks for sharing. It would also be interesting to take this analysis into subsequent rounds: what percentage of time do various seeds advance to Sweet Sixteen, Elite Eight, etc.
As we all know, it's nice to get the first-round games right, but the big payoff for NCAA brackets comes with picking the following rounds.

Re: Engineering a Statistically Average Bracket
Originally Posted by driegner
looking for some feedback...so...bump
My feedback: You are a nerd.
But seriously, I love analysis like this and will take a look at it.

Re: Engineering a Statistically Average Bracket
How about the second round match ups?

Re: Engineering a Statistically Average Bracket
Originally Posted by jaretac
How about the second round match ups?
I have a pretty solid idea about how to do it, but I have a final tomorrow morning. After my exam I may put some time into it, but because far more matchups are possible, it will add complexity to the model.

Re: Engineering a Statistically Average Bracket
Have you figured out yet how you are going to pick your upsets? Obviously it's not enough to get the right number of upsets, you need to pick the actual upsets as well. I've messed around some with football score analysis and gotten good correlations with conference records, but never anything that was predictive (other than a rough estimate of odds to win, but most of the time that's doable without a whole lotta math).
You can spend a lot of time and money picking out the perfect floral bouquet for your date ... but you're probably better off checking if you have bad breath and taking the porn out of the glove compartment.
The moral: you gain more by not being stupid, than you do by being smart. Smart gets neutralized by other smart people. Stupid does not.

Re: Engineering a Statistically Average Bracket
Originally Posted by GoCubsGo
Interesting - thanks for sharing. It would also be interesting to take this analysis into subsequent rounds: what percentage of time do various seeds advance to Sweet Sixteen, Elite Eight, etc.
As we all know, it's nice to get the first-round games right, but the big payoff for NCAA brackets comes with picking the following rounds.
I too would be very interested in seeing an analysis like this of later rounds.
I have a theory that picking no upsets whatsoever will generally result in a well above-average bracket, but often not a winning one. While there will be upsets, the chances of correctly picking the upsets are not favorable. As your numbers bear out, for any given game, you are more likely to pick correctly if you pick the higher seed. For the past several years I have submitted a number of brackets in which I pick no upsets. I find that such brackets are usually above-average and I am usually in the running until the Final Four is completed, but that the no-upset bracket usually loses, as someone often correctly picks a few later-round upsets.

Re: Engineering a Statistically Average Bracket
Originally Posted by besserheimerphat
Have you figured out yet how you are going to pick your upsets? Obviously it's not enough to get the right number of upsets, you need to pick the actual upsets as well. I've messed around some with football score analysis and gotten good correlations with conference records, but never anything that was predictive (other than a rough estimate of odds to win, but most of the time that's doable without a whole lotta math).
That's the tricky part, I think.
I've done brackets long enough and seen statistics and so I already use an approach similar to what OP is trying (mine is less number-reliant, obviously, since I don't have firm data at hand). So I know to find a 10-2 upset, advance at least one double-digit to Sweet 16, look for 12-5's and maybe a 13-4.
Trouble is always, which?
Even though that part is the challenge, and history can't predict anything, it's generally good to at least have a starting point, and not go too far w/ upsets (or not far enough) in first few rounds.

Re: Engineering a Statistically Average Bracket
Originally Posted by jaretac
How about the second round match ups?
Good point.
I think most people who do brackets for many years zero in on where to find 3-4-5-6 upsets, then ignore the next step.
Where might we get a 12/13 matchup in Roundof32? Where would you be willing to 'waste' both a 4 and 5 to move a 12 to the Sweet 16? It doesn't happen every year, but has occurred far more often than seems reasonable.
One 2 and one 3 are almost certain to bounce by the end of the first weekend. Who is it? Do you go with perceived vulnerability factor? Or which of the 6-11/7-10 teams has the best shot at surviving first two games?
And so on.
It's even further risk (but higher reward) to try an outside-the-box Elite 8. One of my brackets last season managed to include (8) Butler vs. (2) Florida. Of course, I whiffed by advancing Florida to Final Four. But at least I got that close.

Re: Engineering a Statistically Average Bracket
If you want a more robust data set (particularly for later rounds analysis) you can use this site, which has the records of all 1-16 seeds since the tourney expanded to 64 in 1985:
mcubed.net : Men's NCAA Basketball Tournament : Records per seed

Re: Engineering a Statistically Average Bracket
Originally Posted by Kyle
I too would be very interested in seeing an analysis like this of later rounds.
I have a theory that picking no upsets whatsoever will generally result in a well above-average bracket, but often not a winning one. While there will be upsets, the chances of correctly picking the upsets are not favorable. As your numbers bear out, for any given game, you are more likely to pick correctly if you pick the higher seed. For the past several years I have submitted a number of brackets in which I pick no upsets. I find that such brackets are usually above-average and I am usually in the running until the Final Four is completed, but that the no-upset bracket usually loses, as someone often correctly picks a few later-round upsets.
This. The OP's data actually say that you shouldn't pick any upsets. In fact, trying to pick one upset (his data suggest there is usually about one upset each in the 12 and 13 seed games) will probably actually reduce your chances of getting them all right. Of course, this will just lead to a better-than-average bracket, but will be unlikely to win any decent-sized tournament pool. Unfortunately, I think you've gotta just roll the dice and hope you get lucky....
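To put rough numbers on that claim, here is a small Python sketch comparing two ways of filling in the four games at one seed line: pick all four favorites, or flag one specific game as the upset. It assumes the four games are independent and share the same upset probability p; the value p = 0.25 is an assumed stand-in for "about one upset per year", not a figure from the OP's table.

```python
def chalk_all_correct(p, games=4):
    """Chance that picking every favorite gets all `games` right."""
    return (1 - p) ** games

def one_upset_all_correct(p, games=4):
    """Chance that the one game you flagged is an upset and the rest are not."""
    return p * (1 - p) ** (games - 1)

p = 0.25  # ASSUMED upset rate, roughly one upset per four games
print(f"{chalk_all_correct(p):.3f}")      # 0.316
print(f"{one_upset_all_correct(p):.3f}")  # 0.105
```

The ratio of the two is p / (1 - p), so under these assumptions naming a specific upset only beats picking all favorites when p is above 0.5.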

Re: Engineering a Statistically Average Bracket
Originally Posted by Clone9
This. The OP's data actually say that you shouldn't pick any upsets. In fact, trying to pick one upset (his data suggest there is usually about one upset each in the 12 and 13 seed games) will probably actually reduce your chances of getting them all right. Of course, this will just lead to a better-than-average bracket, but will be unlikely to win any decent-sized tournament pool. Unfortunately, I think you've gotta just roll the dice and hope you get lucky....
As I said, I tend to pick too many upsets. This data has given me guidelines as to how many actually tend to happen in a given year. The answer to that question is ~1 in the 4/13 and almost 2 in the 8/9.
Moving forward my next 2 steps are to build in the next 2 rounds, and to determine actual bracket configurations with a realistic number of upsets in the "best" positions.
I want to see if I can get MATLAB to pick a better bracket than an average person. Can I get a model that will actually do better than someone who picks the favorite every time?
I can't wait to play around with this more.
P.S. Anyone familiar with the subject may realize that this is very similar to a statistical thermodynamics question. We have a list of configurations and their relative likelihood of occurring. The "energy" in this situation is analogous to the probability of a given configuration occurring. It's unlikely that the system will be perfect (No upsets) but it's also unlikely to be completely disordered (all upsets). "Equilibrium" is somewhere in between.
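One way to make the analogy concrete, sketched in Python: treat each of the 8 first-round games in a region as independent, enumerate all 2^8 = 256 upset patterns, and score each pattern by -ln(probability) so that likelier patterns sit at lower "energy". That mapping is one reading of the analogy, not something stated in the thread, and the upset probabilities are placeholders rather than the fitted values.

```python
import math
from itertools import product

# PLACEHOLDER upset probabilities for the 8 games in one region (NOT fitted values).
hypothetical_probs = [0.01, 0.04, 0.12, 0.22, 0.32, 0.38, 0.43, 0.48]

def config_probability(config, probs):
    """Probability of one upset pattern; config[i] is True if game i is an upset."""
    return math.prod(p if upset else 1 - p for upset, p in zip(config, probs))

def config_energy(config, probs):
    """Energy-like score, -ln(probability): likelier patterns score lower."""
    return -math.log(config_probability(config, probs))

configs = list(product([False, True], repeat=len(hypothetical_probs)))
no_upsets, all_upsets = configs[0], configs[-1]
print(f"P(no upsets)  = {config_probability(no_upsets, hypothetical_probs):.4f}")
print(f"P(all upsets) = {config_probability(all_upsets, hypothetical_probs):.2e}")

# Distribution of the number of upsets in a region, summed over all 256 patterns.
by_count = [0.0] * (len(hypothetical_probs) + 1)
for config in configs:
    by_count[sum(config)] += config_probability(config, hypothetical_probs)
print([round(x, 3) for x in by_count])
```

The last line shows where the "equilibrium" sits: the most probable upset count for a region lands between the no-upset and all-upset extremes.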

Re: Engineering a Statistically Average Bracket
Originally Posted by driegner
I want to see if I can get MATLAB to pick a better bracket than an average person. Can I get a model that will actually do better than someone who picks the favorite every time?
The first question is not all that interesting. As discussed, a no-upset bracket is almost always going to do better than average.
If you could build a formula such that the answer to the second question was a definitive yes, you could make a fortune as a Vegas oddsmaker. That would undoubtedly require the input of far more information than you plan to use though, and I'm sure you are not the only one that has tried it. Here's a short article on predictive statistics.
The Secret Formula for Picking NCAA Basketball Tournament Winners  Wall St. Cheat Sheet
It seems that the most interesting thing for you to try and accomplish is to use the statistical data to create a system for creating brackets that is more likely to produce winning brackets than other methods. Using the numbers you provided for the first-round games, it seems that a bracket that picks upsets in the ratio your numbers suggested is more likely to be a "perfect bracket." I suspect that as the pool size increases it becomes more necessary to get closer to a perfect bracket in order to have the top bracket in the pool. For example, if I am competing against only one other person, I suspect that a no-upset bracket would be the odds-on favorite to win, as the chances that the one other person correctly picked the upsets are small. It would seem that as the pool size gets larger, the odds that a no-upset bracket will be the best should get smaller faster than the odds that would be predicted by simply increasing the size of the pool. I'd be curious at what point the odds of a no-upset bracket are reduced to less than one over the pool size (e.g. a 20% win probability for a five-person pool). It seems that as the pool gets larger, one must pick more upsets in order to win. Here's an article that suggests the same thing.
March Madness: Using Game Theory To Win Your Upset Picks  Sports Business  Minyanville.com
The basic technique you are using seems best suited to determining the number of upsets that should be picked when in an extremely large pool. For smaller pools, you are probably better off picking fewer upsets than are suggested by your numbers.
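The pool-size question can be explored with a quick Monte Carlo sketch in Python. It is heavily simplified: first round only, one point per correct game, every opponent picks upsets at random at the same rates that drive the actual results, and a tie for first counts as finishing on top. The probabilities are placeholders, not the OP's fitted values, and real opponents do not pick this way.

```python
import random

# PLACEHOLDER first-round upset probabilities by favorite seed (NOT fitted values).
hypothetical_probs = {1: 0.01, 2: 0.04, 3: 0.12, 4: 0.22,
                      5: 0.32, 6: 0.38, 7: 0.43, 8: 0.48}

def draw_upsets(probs, rng, regions=4):
    """One True/False upset flag per first-round game (32 games by default)."""
    return [rng.random() < probs[seed]
            for _ in range(regions) for seed in sorted(probs)]

def score(picks, results):
    """Number of games where the pick matches what happened."""
    return sum(pick == result for pick, result in zip(picks, results))

def chalk_top_rate(probs, opponents, trials, rng):
    """Fraction of simulated pools where the no-upset bracket wins or ties for first."""
    chalk = [False] * (4 * len(probs))
    top = 0
    for _ in range(trials):
        results = draw_upsets(probs, rng)
        best_rival = max(score(draw_upsets(probs, rng), results)
                         for _ in range(opponents))
        if score(chalk, results) >= best_rival:
            top += 1
    return top / trials

rng = random.Random(2012)  # fixed seed so a run can be repeated
for pool_size in (2, 5, 20, 100):
    rate = chalk_top_rate(hypothetical_probs, pool_size - 1, 500, rng)
    print(f"pool of {pool_size}: chalk on top {rate:.1%}, 1/N = {1 / pool_size:.1%}")
```

Comparing the printed rate against 1/N for each pool size is the comparison asked about above; later rounds and round-weighted scoring would need to be added before reading much into the numbers.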
Last edited by Kyle; 03-12-2012 at 06:54 PM.
Seneca Wallace.