Monday, December 24, 2007

Does Momentum from Previous Games Predict Who Wins Game 5?

Happy holidays to everyone!

Today's entry looks at whether, in a five-game match, the winner of the fifth game can be predicted by the pattern of how the first four games have gone. For example, if one team wins the first two games, but the other team wins the third and fourth games, one might expect the latter team to win the fifth game, owing to its momentum from Games 3 and 4. This line of reasoning led many to expect a Stanford win in Game 5 of the recent NCAA women's final against Penn State, but the Nittany Lions prevailed.

In the analysis below, I looked at 2007 within-conference matches from four major women's conferences: the Big 10, Big 12, Pac 10, and SEC. As can be seen, in the 34 total matches in which one team (represented by "A") had won the first two games and the other team ("B") had rebounded to win Games 3 and 4, Team A won Game 5 -- and the match -- somewhat more often than did Team B, 19 compared to 15.

Another scenario involves single-game alternation, that is, Team A wins a game, then Team B, then A, and then B. In this case, Teams A and B had virtually identical probabilities of winning the decisive fifth game.

Lastly, a situation can arise in which Team A wins Games 1 and 4, and Team B wins Games 2 and 3. Here, Team A was nearly twice as likely to win Game 5 (15 occurrences) as was Team B (8 occurrences). If the underlying probability of either team winning the fifth game were .50, the probability of a given team winning 15 (or more) times out of 23 would be .105, assuming independence of observations, like coin-flipping (see here for an online binomial calculator). A 15-of-23 split is thus somewhat unlikely to arise from an underlying 50/50 distribution, but it does not reach the conventional .05 statistical significance level necessary for rejecting the null hypothesis of a 50/50 underlying distribution.
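For readers who would rather check the .105 figure than rely on an online calculator, here is a minimal sketch in Python. The function name is my own invention for illustration, and the calculation assumes, as above, independent matches and a .50 chance for each team.

```python
from math import comb

def binomial_upper_tail(successes: int, trials: int, p: float = 0.5) -> float:
    """Probability of `successes` or more wins in `trials` independent tries."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )

# Team A won Game 5 in 15 of the 23 A-B-B-A matches.
tail = binomial_upper_tail(15, 23)
print(round(tail, 3))  # 0.105
```

Note that this is a one-tailed probability. Counting equally lopsided results in either team's favor would double it, to about .21.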

The number of matches studied above comprised a relatively small sample, so additional data from the upcoming men's collegiate season and from next year's women's season will be useful for strengthening the analyses.

Another place to look is men's professional tennis, the major tournaments of which use a 3-out-of-5-set format. There are differences, to be sure, between tennis and volleyball, including one being an individual sport and the other, a team sport. Still, momentum-related phenomena may transcend particular sports.

I found an online article that looked at all matches from 1995-2004 in the four Grand Slam tournaments (Australian Open, French Open, Wimbledon, and the U.S. Open).

The tennis article used a different notation than I did, but the formats are analogous. Listed below are the numbers of occurrences of each outcome:

WWLLW (like AABB-A) 151
LLWWW (like AABB-B) 188
p = .025

The first of the three comparisons was significant, leading us to reject the null hypothesis of a 50/50 distribution of fifth-set outcomes, when one player has won the first two sets and the other, the next two. The player coming back from 0-2 won five-set matches significantly more than 50% of the time. This result is consistent with the "Stanford momentum" line of thinking in the context of the NCAA volleyball final.

WLWLW (like ABAB-A) 135
LWLWW (like ABAB-B) 156
p = .120

Under the single-set alternation scenario, we cannot reject a 50/50 distribution of outcomes, as was the case for volleyball.

WLLWW (like ABBA-A) 186
LWWLW (like ABBA-B) 138
p = .004

As with the volleyball analysis, in tennis the player who has won the first and fourth sets is substantially more likely to take the fifth than is the player who has won the second and third sets.
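The three tennis comparisons can be rechecked from the counts alone. The sketch below is only that, a sketch: I am assuming an exact one-tailed binomial test against a 50/50 split, which may or may not be the method behind the p-values listed above, and the function and variable names are mine.

```python
from math import comb

def one_tailed_p(larger: int, smaller: int) -> float:
    """Exact chance that a fair 50/50 process yields the more common
    outcome at least `larger` times out of `larger + smaller` matches."""
    n = larger + smaller
    return sum(comb(n, k) for k in range(larger, n + 1)) / 2**n

# (more frequent fifth-set winner, less frequent), from the counts above
tennis_counts = {
    "AABB": (188, 151),
    "ABAB": (156, 135),
    "ABBA": (186, 138),
}

p_values = {pattern: one_tailed_p(*counts) for pattern, counts in tennis_counts.items()}
for pattern, p in p_values.items():
    print(pattern, round(p, 3))
```

If the one-tailed assumption is right, the printed values should land close to the ones listed above; a two-tailed test would give probabilities twice as large.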

I don't know about anyone else, but staring at these notations makes me want to listen to some music by the group ABBA.

UPDATE December 28: After I published the above write-up, I posted a message at the VolleyTalk discussion site to let people know about my analysis. Among the string of messages, a few VolleyTalk readers posted the results of volleyball analyses they had done previously. The following tabulation, by "p-dub," was most on-point (you can click on the chart to enlarge it):

The chart shows what percent of the time Team A wins, under each of the three distributions of wins in Games 1-4. For any given Team-A winning proportion p, you can take (1 - p) to see how often Team B wins under the relevant configuration.

Although the deviations from .500 are small in p-dub's extensive samples, once again the "A" team in the "ABBA" sequence has an increased chance of winning.

Friday, December 21, 2007

Subtlety and Context in Interpreting Volleyball Stats

Over at the VolleyTalk discussion website, user “38 Skynyrd” started a topic the evening of December 15, after the Penn State-Stanford NCAA title match, about volleyball statistics. The initial salvo in the discussion essentially argued for more subtlety and context in interpreting volleyball statistics, given that:

“…absolute raw numbers in the box score do not always tell the true story of how good or bad a player did in a given game or match.”

The full set of messages is available here. A number of suggestions were made by the discussants for new statistics. Given the obvious relevance of this discussion for our mission here at VolleyMetrics, I have excerpted a number of these ideas (with the author of each one credited in parentheses). These are shown below:

“…a hitter may have hit for good numbers overall, but if they made 5-6 hitting errors at critical points in the match, then their overall hitting percentage may still look good in the box score, but they still had a crappy match.

There are also a lot of stats that aren't tracked in the box score. Missed blocking or defensive assignments, poor choices in out-of-system plays, crappy free ball passes. A blocker could not block a single ball during an entire match, yet that player's coach could praise her for having the most incredible blocking match of her career if she made every single blocking assignment and move correctly, and took away what she was assigned to take away, and the back row defenders end up with 300 digs” (38 Skynyrd)

“…a blocked attack is not the same error as an outright hitting error. And there's no way to tell what a "0" attack led to. Did it throw the opposition out of system or did it not stress them at all? Was that hitting error caused by the block?” ([R]uffda!)

“One of the easiest ways to stat an individual player's performance during a match is to assign a +1, 0, -1 scale to each touch that they get on a ball. For instance,

+1 = excellent pass
0 = marginal pass (i.e. passed to 10-foot line)
-1 = bad pass, aced or shank

+1 = kill
0 = ball kept in play by opponent
-1 = hitting error (blocked, out-of-bounds, into net)

If [you] grade each touch a player makes [and average the scores], by the end of the match you'll have a score that is between -1.0 and +1.0.

It can help judge an individual players performance better, but it's not weighted for situational dynamics (i.e. a +1 contact when the score is 0-0 should not be weighted the same as a -1 contact when the team is down 23-29). It also doesn't address bad plays when the player is not touching the ball - such as out-of-position on defense, missed blocking reads, getting in another player's way on the court, etc.” (38 Skynyrd)
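The +1/0/-1 grading scheme described in the quote above is simple enough to sketch in a few lines of Python. The function name and the sample grades are hypothetical, and, as the quote itself cautions, nothing here weights a touch by the score at the time.

```python
from statistics import mean

def touch_score(grades):
    """Average a player's per-touch grades, each of which is -1, 0, or +1."""
    if not grades:
        raise ValueError("need at least one graded touch")
    if any(g not in (-1, 0, 1) for g in grades):
        raise ValueError("each grade must be -1, 0, or +1")
    return mean(grades)

# Hypothetical match: 6 touches graded +1, 3 graded 0, and 1 graded -1.
grades = [1] * 6 + [0] * 3 + [-1]
print(touch_score(grades))  # 0.5
```

Because every grade lies between -1 and +1, so does the average, which matches the range given in the quote.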

“…there are still important many areas of play which are not covered in the stats for volleyball. IMO [In my opinion], the most important of these are passes which do not hit the target, missed blocking assignments (i.e., failure to close the block) and missed digs in the back row” (Traveler5)

“One stat that is completely objective and I think could be useful would be what % of a player’s attacks are handled and then killed for a point by the other team. So if the other team digs your attack, and winds up getting a kill on that 3-touch sequence immediately following your attack, that would count negative, and if they don’t score, either by sending over a freeball or having a legit attack dug up, it counts towards you…

You could also… break down how many blockers were on each of a player’s attacks, and how well they hit against 0, 1, 2, and 3 blockers. This could also be used by setters to see the average number of blockers their hitters had to go against.

Front row vs. back row would also be nice, since a player good enough to get a lot of back row swings is probably going to hurt their % some since that is a lower percentage shot” (Chance)

Suggestions for refinements of dig statistics: “…opportunities matter. Thus, I have calculated the "dig %", which is digs/non-error attacks (it doesn't make sense to penalize a team for not digging a ball that is blocked or out)… [and] digging compared to opponents… difference between digs and opponents’ ” (p-dub)
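As a rough sketch of p-dub's two dig measures, the Python below computes a dig percentage and a dig margin. I am assuming that "non-error attacks" means the opponent's total attack attempts minus its attack errors (with blocked balls counted among the errors); the function names and the box-score totals are made up for illustration.

```python
def dig_percentage(digs: int, opp_attacks: int, opp_attack_errors: int) -> float:
    """Digs divided by the opponent's non-error attacks (assumed definition)."""
    diggable = opp_attacks - opp_attack_errors
    if diggable <= 0:
        raise ValueError("opponent must have at least one non-error attack")
    return digs / diggable

def dig_margin(digs: int, opp_digs: int) -> int:
    """Difference between a team's digs and its opponent's digs."""
    return digs - opp_digs

# Hypothetical totals: 60 digs against 150 opponent attacks, 30 of them errors.
print(dig_percentage(60, 150, 30))  # 0.5
print(dig_margin(60, 52))           # 8
```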

Another user, Mike Garrison, and I both alluded to improvements in baseball statistics that have been fueled by the “sabermetric” movement. Some responses:

“Baseball is helped by the fact that it is such a discrete game. You are either out or safe. You are either at first, second, or third base. A pitch is either a ball or a strike. Etc.

The weakest part of SABRmetrics is the fielding stats… A player can just barely miss a ball or not really come close to it at all…

Most other sports, including volleyball, are more like baseball fielding than hitting or pitching. So I would start by looking more at fielding stat methods than at hitting or pitching methods.” (Mike Garrison)

“Even more important is the independence of events. Whereas the events in baseball are not completely independent, they are far closer than any sport. If a batter is batting with a runner on first, you generally know where that runner is going to be at any time... None of this can be said for volleyball (or most other team sports). Players are all over the place, and offensive sets can vary significantly.

The best example of baseball's stats independence is in offense and defense. Baseball (and like sports) is the only sport where a great defensive play does not improve offensive opportunities. In all other sports, the defense can help the offense. In volleyball, great defense can slow the opponents attack, and we always hear about "transitioning." Shoot, the defense can even score directly on a block...” (p-dub)

“[Unlike baseball, where the pitching rubber and batter’s box constrain the positions of the key players...] Volleyball is a game where the ball is struck from almost anywhere on the court, and there isn't enough information available…

My biggest pet peeve of the [n]umber crunchers are those who tout ridiculous stats or won-loss records without regard for the average competition faced” (Bear Clause)

Where feasible, I will consider gathering the data to produce the statistics suggested above. I invite you readers to do the same!

Sunday, December 16, 2007

Top NCAA Women's Programs 2003-07, Relative to National Tourney Seedings

With another women's NCAA Division I season on the books -- Penn State having defeated Stanford in a five-game final -- now is a good time to take stock of how the nation's leading programs have been doing in NCAA play in recent years.

As one option, we could look at which schools have been winning championships and making the Final Four. Looking at the last five years, we would find many "usual suspects," such as Stanford, Nebraska, Washington, USC, and Penn State; even Minnesota, whose icy cold locale doesn't necessarily suggest volleyball greatness, has made two Final Fours in this timeframe.

A more subtle approach, however, is to look at which teams have done best relative to their seedings. Such an analysis can tell us which teams raise their games come tournament time, compared to what their regular-season performance would have suggested. A team that comes into the NCAA tourney as a No. 16 national seed, for example, would be favored to win two matches, until becoming an underdog against the No. 1 seed. If the No. 16 team were instead to win four matches, it would receive a +2 difference score, reflecting the magnitude of its "overachievement" (that a team may have underperformed during the regular season would also be compatible with a positive difference score).

Conversely, a negative difference score (i.e., winning fewer matches than expected from the seeding) would suggest underachievement in the post-season.

Because weird things can happen in any given tournament -- witness this year's early exits by Nebraska, Washington, and Wisconsin -- I felt that aggregation over the past five years would be helpful.

The difference-score approach is not original to me. In the mid-1990s, I saw someone apply it to NCAA men’s basketball. Also, it follows the logic of the chi-square statistic, in terms of taking the difference between actual (observed) and expected counts. Let's start by going over how many matches a team is expected to win in the NCAA tournament, based on its seeding:

The No. 1 national seed is expected to win 6 matches, thus capturing the title...
The No. 2 national seed is expected to win 5 matches, thus making the final...
Seeds 3 and 4 are each expected to win 4 matches, thus getting to the Final Four...
Seeds 5-8 are each expected to win 3 matches, and...
Seeds 9-16 are each expected to win 2 matches.

(If seeding were done down to No. 32, we would know who was expected to win 1 match and who was expected to win none.)
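The expected-wins rule above, and the difference score built on it, can be sketched as follows in Python. The function names and the three-season example are hypothetical; only the seed-to-wins mapping comes from the list above.

```python
def expected_wins(seed: int) -> int:
    """Matches a national seed is expected to win, per the list above."""
    if seed == 1:
        return 6
    if seed == 2:
        return 5
    if 3 <= seed <= 4:
        return 4
    if 5 <= seed <= 8:
        return 3
    if 9 <= seed <= 16:
        return 2
    raise ValueError("only national seeds 1-16 have an expected win total")

def difference_score(seed: int, actual_wins: int) -> int:
    """Actual tournament wins minus the wins expected from the seeding."""
    return actual_wins - expected_wins(seed)

# A No. 16 seed that wins four matches overachieves by two.
print(difference_score(16, 4))  # 2

# Hypothetical multi-year record as (seed, wins) pairs, summed into one value.
seasons = [(1, 6), (1, 3), (2, 5)]
print(sum(difference_score(seed, wins) for seed, wins in seasons))  # -3
```

In the hypothetical record, the title-winning No. 1 seed only breaks even, because a top seed can never win more than its expected six matches.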

The following figure -- which you may click to enlarge -- summarizes how different schools have fared under this metric, during the past five years (limited to schools that have been seeded at least three times in this period).

As with any statistical analysis, some cautions are in order in interpreting this one:

*Unseeded teams that have had great runs -- such as Santa Clara making the 2005 Final Four -- are not depicted in the chart.

*Teams are, in a sense, "penalized" in these analyses for receiving high seeds. In any given year, the No. 1 national seed cannot exceed its expected number of wins (6), and can only break even, at best. An artificial constraint of this kind on how high something can be rated is known as a “ceiling effect.”

No team has faced this situation in recent years any more than Nebraska; the Cornhuskers have received three national No. 1 seedings plus a No. 2 during the five-year span, thus making it impossible or nearly impossible for them to exceed their expected number of wins. On the other hand, there was a lot of room for Nebraska to win fewer matches than expected in a given year. Thus, even with a national championship and national runner-up finish during the years examined, the Huskers still finished with an aggregate minus-6 value. Such a result should be looked at in context and taken with a grain of salt (or in the case of Nebraska corn, with a pat of butter).

To provide some context, I have listed teams' average seedings over the past five years. One suggestion is to compare “+/-” values of teams with similar average seedings. As shown in the top panel of the figure, three teams that have been dealt similar seedings over the five years are USC, Washington, and Hawaii (all of whose average seeds are between 6.2 and 7.0). Clearly, the Huskies have done the best from their starting positions in the field, and the Rainbow Wahine, the worst.

The small sample sizes are another reason for caution, but a five-year window is still preferable to a one-year snapshot.