Monday, September 15, 2008
The women's college volleyball season has now been going on for a few weeks, so it's time to jump in with some statistical analysis. To mark the occasion, I've scanned some schedule posters I've collected and displayed in my office over the years and edited them into a collage. I hope you like it!
Competition thus far has been exclusively of the nonconference variety, so lacking conference standings, we have only the national polls to judge which teams are doing well.
Today's entry seeks to get inside the heads -- indirectly, of course -- of voters in the September 8 poll of the American Volleyball Coaches Association. The poll presents a Top 25, but also reports voting points (i.e., 25 points for a first-place vote, 24 for second, etc.) for additional teams ("Others Receiving Votes and appearing on two or more ballots").
We thus have voting point totals for 37 teams, from No. 1 Penn State's 1,500 points down to unranked Georgia Tech and Arizona, each with 3 points. What factors might voters be using when they submit their ballots? Using the technique of multiple regression, I examined how well the poll rankings could be reproduced from knowledge of three factors:
*Success in last year's NCAA tournament (1 point for each match won, from 0 for a team that lost in the first round to 6 for the championship team, Penn State; teams that did not make the tournament last year received a -1). Many voters may subscribe to the idea that a championship team (or another historical powerhouse) should continue to be ranked highly until displaced by other teams. Using last year's NCAA success as a predictor reflects, in part, this philosophy.
*Number of returning starters. To the extent that a team has solid, experienced players, that is a plus. I should note that on the volleyball pages of many schools' athletic websites, the reported number of returning starters was not that easy to interpret. The main ambiguities are whether a team's libero should be counted (I did not count liberos) and what to do about teams that may have had more than six players start last year. One policy I adopted was not to give a team a value greater than six returning starters.
*Number of wins vs. Top 25 opponents this season. One obvious way to attain (or retain) a high position in the rankings is to play against and defeat top competition. This measure is somewhat imprecise, as a win over the No. 5 club (for example) is treated the same as a win over the No. 23 squad. Still, wins over Top 25 teams constitute a simple measure, whose usefulness will be determined by the analysis. Anecdotally, St. Louis University's win over then-No. 3 Stanford has propelled the Billikens from 7 voting points in the September 1 AVCA poll to 134 points and just outside the Top 25 in the September 8 poll.
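The three coding rules above can be sketched as small functions. This is a minimal sketch, not the author's actual procedure: the function names are hypothetical, the returning-starters input is assumed to already exclude liberos, and counting an opponent as "Top 25" by its ranking at the time of the match is an assumption (the "then-No. 3 Stanford" example suggests it, but the post does not say so explicitly).

```python
from __future__ import annotations

from typing import Optional


def ncaa_2007_score(made_tournament: bool, matches_won: int = 0) -> int:
    """Hypothetical helper: 1 point per 2007 NCAA match won (0 for a
    first-round loss, 6 for the champion); -1 if the team missed the field."""
    if not made_tournament:
        return -1
    if not 0 <= matches_won <= 6:
        raise ValueError("a tournament team wins between 0 and 6 matches")
    return matches_won


def returning_starters_score(reported: int) -> int:
    """Hypothetical helper: returning starters, capped at six.
    Assumes liberos were already left out of the reported count."""
    return min(reported, 6)


def top25_wins_score(opponent_ranks: list[Optional[int]]) -> int:
    """Hypothetical helper: count 2008 wins over opponents ranked 1-25
    (assumed: ranking at the time of the match; None = unranked).
    Every such win counts the same, whether over No. 5 or No. 23."""
    return sum(1 for rank in opponent_ranks if rank is not None and 1 <= rank <= 25)
```

Each team then contributes one row of three numbers to the regression, alongside its voting-point total.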
Multiple regression provides several pieces of information. First, the three independent variables (2007 NCAA wins, number of returning starters, and number of 2008 wins over Top 25 opponents) collectively accounted for 70% of what's known as the variance in the dependent variable, voter points in the September 8, 2008 AVCA poll (R-square = .699, adjusted R-square = .672). In other words, of the variation in vote totals from Penn State down to Georgia Tech and Arizona, about 70% can be accounted for by the present statistical model.
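For readers who want to see what R-square and adjusted R-square measure, here is a minimal ordinary-least-squares sketch using NumPy. The function names are hypothetical, and this is not necessarily how the original analysis was run (the post does not say which software was used). X would hold one row per team with the three predictors, and y the September 8 voting points.

```python
import numpy as np


def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R-square for n observations and p predictors
    (intercept not counted in p)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def fit_ols(X: np.ndarray, y: np.ndarray):
    """Ordinary least squares with an intercept.

    X: (n, p) predictors; y: (n,) dependent variable.
    Returns (coefficients, r2, adj_r2); coefficients[0] is the intercept.
    """
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])          # prepend intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)  # least-squares solution
    residuals = y - design @ coef
    ss_res = residuals @ residuals                     # variation left unexplained
    ss_tot = ((y - y.mean()) ** 2).sum()               # total variation around the mean
    r2 = 1.0 - ss_res / ss_tot
    return coef, r2, adjusted_r2(r2, n, p)
```

Plugging in the reported figures, `adjusted_r2(0.699, 37, 3)` is 1 - .301 × 36/33, or about .672, which agrees with the adjusted R-square above. The adjustment shrinks R-square slightly because it charges for each predictor relative to the 37 teams available.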
We can also look at how useful each of the independent variables was individually in helping reproduce the points in the ranking poll. Two of the three variables, 2007 NCAA tournament wins and 2008 wins over Top 25 opponents, were statistically significant (i.e., substantial enough in their relationship to ranking points that the results would be unlikely to be due to chance, with p < .001 in both cases).
Because of the small sample size and crudeness of some of the measures, I wouldn't take the following numbers from the regression equation overly seriously, but here they are. For each round a team advanced in last year's NCAA tournament, it would receive about 157 extra points in the ranking-poll voting. Further, for each win over a Top 25 opponent, a team would receive an added 285 points. For those readers with a background in regression analysis, the standardized Betas were .547 for 2007 NCAA tournament matches won, and .460 for number of 2008 wins against Top 25 opponents.
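The two kinds of numbers above are related. A raw coefficient is expressed in ranking points per unit of its predictor, while a standardized Beta rescales it by the standard deviations of the predictor and of the points, so predictors measured on different scales can be compared. The sketch below assumes the conventional definition beta_j = b_j × sd(x_j) / sd(y); the function and constant names are hypothetical, and the constants are simply the rounded slopes quoted above.

```python
import numpy as np


def standardized_betas(slopes: np.ndarray, X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Convert raw slopes (intercept excluded) to standardized Betas:
    beta_j = b_j * sd(x_j) / sd(y), using sample standard deviations."""
    return slopes * X.std(axis=0, ddof=1) / y.std(ddof=1)


# Rounded raw slopes quoted in the post (voting points per unit of predictor).
POINTS_PER_NCAA_MATCH_WON = 157
POINTS_PER_TOP25_WIN = 285


def predicted_point_gap(extra_ncaa_wins: int, extra_top25_wins: int) -> int:
    """Model-implied difference in voting points between two teams that
    differ only on the two significant predictors (returning starters held fixed)."""
    return (extra_ncaa_wins * POINTS_PER_NCAA_MATCH_WON
            + extra_top25_wins * POINTS_PER_TOP25_WIN)
```

For example, under the model a team that won two more NCAA matches last year and has one more Top 25 win this season would be expected to collect roughly 2 × 157 + 285 = 599 more voting points. Given the caveats above, such figures are rough at best.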
Number of returning starters was not a significant predictor, so we cannot reject the null hypothesis of zero impact for that factor. Because most of the teams returned most or all of last year's starters, there was little variation on this item, which reduces its predictiveness.
If you're a fan of a particular team, there's nothing you can do about last year's NCAA tournament. However, with all appropriate cautions about drawing causal inferences, you should try to root your team on to victories over Top 25 opponents over the next couple of months, as that is likely to move your team up in the national rankings and keep it there.