Sunday, December 16, 2007

Top NCAA Women's Programs 2003-07, Relative to National Tourney Seedings

With another women's NCAA Division I season on the books -- Penn State having defeated Stanford in a five-game final -- now is a good time to take stock of how the nation's leading programs have been doing in NCAA play in recent years.

As one option, we could look at which schools have been winning championships and making the Final Four. Looking at the last five years, we would find many "usual suspects," such as Stanford, Nebraska, Washington, USC, and Penn State; even Minnesota, whose icy cold locale doesn't necessarily suggest volleyball greatness, has made two Final Fours in this timeframe.

A more subtle approach, however, is to look at which teams have done best relative to their seedings. Such an analysis can tell us which teams raise their games come tournament time, compared to what their regular-season performance would have suggested. A team that comes into the NCAA tourney as a No. 16 national seed, for example, would be favored to win two matches, until becoming an underdog against the No. 1 seed. If the No. 16 team were instead to win four matches, it would receive a +2 difference score (four actual wins minus two expected), reflecting the magnitude of its "overachievement." A positive difference score would also be compatible with a team's having underperformed during the regular season and thus been seeded too low.

Conversely, a negative difference score (i.e., winning fewer matches than expected from the seeding) would suggest underachievement in the post-season.

Because weird things can happen in any given tournament -- witness this year's early exits by Nebraska, Washington, and Wisconsin -- I felt that aggregation over the past five years would be helpful.

The difference-score approach is not original to me. In the mid-1990s, I saw someone apply it to NCAA men’s basketball. Also, it follows the logic of the chi-square statistic, in terms of taking the difference between actual (observed) and expected counts. Let's start by going over how many matches a team is expected to win in the NCAA tournament, based on its seeding:

The No. 1 national seed is expected to win 6 matches, thus capturing the title...
The No. 2 national seed is expected to win 5 matches, thus making the final...
Seeds 3 and 4 are each expected to win 4 matches, thus getting to the Final Four...
Seeds 5-8 are each expected to win 3 matches, and...
Seeds 9-16 are each expected to win 2 matches.

(If seeding were done down to No. 32, we would know who was expected to win 1 match and who was expected to win none.)
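The seed-to-wins mapping above, and the difference score built on it, can be sketched in a few lines of Python. This is only an illustrative sketch: the function names expected_wins and difference_score are my own, and the code covers only the 16 national seeds described here.

```python
def expected_wins(seed):
    """Matches a national seed (1-16) is expected to win, per the list above."""
    if not 1 <= seed <= 16:
        raise ValueError("only national seeds 1-16 are covered")
    if seed == 1:
        return 6  # wins the title
    if seed == 2:
        return 5  # reaches the final
    if seed <= 4:
        return 4  # reaches the Final Four
    if seed <= 8:
        return 3
    return 2      # seeds 9-16


def difference_score(seed, actual_wins):
    """Actual wins minus expected wins; positive means overachievement."""
    return actual_wins - expected_wins(seed)


# The example from the text: a No. 16 seed that wins four matches.
print(difference_score(16, 4))  # 2
```

A team's five-year value would then simply be the sum of its yearly difference scores over the seasons in which it was seeded.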

The following figure -- which you may click to enlarge -- summarizes how different schools have fared under this metric during the past five years (limited to schools that have been seeded at least three times in this period).

As with any statistical analysis, some cautions are in order in interpreting this one:

*Unseeded teams that have had great runs -- such as Santa Clara making the 2005 Final Four -- are not depicted in the chart.

*Teams are, in a sense, "penalized" in these analyses for receiving high seeds. In any given year, the No. 1 national seed cannot exceed its expected number of wins (6), and can only break even, at best. When a measurement system imposes this kind of artificial constraint on how high something can be rated, the result is known as a “ceiling effect.”

No team has faced this situation in recent years any more than Nebraska; the Cornhuskers have received three national No. 1 seedings plus a No. 2 during the five-year span, thus making it impossible or nearly impossible for them to exceed their expected number of wins. On the other hand, there was a lot of room for Nebraska to win fewer matches than expected in a given year. Thus, even with a national championship and national runner-up finish during the years examined, the Huskers still finished with an aggregate minus-6 value. Such a result should be looked at in context and taken with a grain of salt (or in the case of Nebraska corn, with a pat of butter).
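To make the ceiling effect concrete, here is a small Python sketch of the best and worst difference scores available to each seed in a single year. It assumes only what is stated above: a team can win anywhere from zero to six matches, and expected wins follow the seeding list. The names EXPECTED_WINS and score_range are hypothetical, chosen for illustration.

```python
MAX_WINS = 6  # the champion wins six matches

# Expected wins for each national seed, per the list earlier in the post.
EXPECTED_WINS = {1: 6, 2: 5, 3: 4, 4: 4}
EXPECTED_WINS.update({seed: 3 for seed in range(5, 9)})
EXPECTED_WINS.update({seed: 2 for seed in range(9, 17)})


def score_range(seed):
    """Return (worst, best) possible single-year difference scores for a seed."""
    expected = EXPECTED_WINS[seed]
    return (0 - expected, MAX_WINS - expected)


print(score_range(1))   # (-6, 0)
print(score_range(2))   # (-5, 1)
print(score_range(16))  # (-2, 4)
```

The asymmetry is plain: a No. 1 seed has six ways to fall short and no way to come out ahead, while a No. 16 seed can gain as many as four.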

To provide some context, I have listed teams' average seedings over the past five years. One suggestion is to compare “+/-” values of teams with similar average seedings. As shown in the top panel of the figure, three teams that have been dealt similar seedings over the five years are USC, Washington, and Hawaii (all of whose average seeds are between 6.2 and 7.0). Clearly, the Huskies have done the best from their starting positions in the field, and the Rainbow Wahine, the worst.

The small sample sizes are another reason for caution, but a five-year window is still preferable to a one-year snapshot.


Pablo said...

Given the approach of seeding only the top 16 teams, this analysis is going to have to be very limited. As you note, for example, you can't account for Santa Clara's great run to the Final Four.

Instead of using seeds, why not use something that covers more teams, like Pablo or RPI or even the AVCA poll? For starters, there is no reason to treat the seedings as if they are gospel. When I've examined other divisions, Pablo tends to outperform seedings in terms of predicting winners, or is at least comparable. I've never really examined D1, but I do know that Pablo had Penn St ranked over Stanford (and Michigan over Colo St, and Mich St over Dayton, I think).

Jill said...

This analysis takes the seedings as given; however, to be honest, I am often at a loss to understand how seedings are determined -- they certainly don't correlate perfectly with the national rankings (which may be a better measure?). For example, this year, Cal received the #10 seed even though they were ranked #7. Moreover, St. John's received the #12 seed, but were ranked #18 in the national polls. (Cal ended up reaching the Final Four, and St. John's the Sweet 16.)
Oh, and that Penn State team? Seeded #3 but #1 in the national rankings.

Additionally, the remaining 48 teams are unseeded and are distributed according to some rule based on traveling distances, etc. (and specifically, not rankings/seedings) which ends up leaving some brackets tough, and others soft, which no doubt has an impact on the outcome of the tournament.

Taking into account such considerations should have an effect on the analysis you propose...