Wednesday, September 24, 2014

"What Wins in the Big 12?"

As some long-term readers of this blog may know, I'm a professor at Texas Tech University and I meet occasionally with Red Raider volleyball coach Don Flora to discuss statistical aspects of the sport and find out what kind of analyses he might be interested in at a given time. We last met this past spring and he told me his big question: "What wins in the Big 12?" I took the meaning of the question to be: what combinations of success at hitting, blocking, digging, serving, etc., were associated with winning conference matches in the Big 12? I told Coach Flora I would have something for him, and proceeded to start thinking about how I would conduct analyses.

Now, with the Red Raiders opening their Big 12 portion of the schedule by hosting TCU tonight, I have the fruits of my inquiry. I first created a database of all 72 conference matches played a year ago (despite its name, the Big 12 has only 10 schools and one, Oklahoma State, doesn't field a women's volleyball team; nine teams playing a double round-robin schedule of 16 matches yields 72 total matches). For each team in a given match, I recorded its hitting percentage; blocks, digs, aces, and service errors per game; whether the team won or lost the match; and the number of games it took. Note that with 72 matches and two teams per match, there were 144 records or "stat-lines" possible.

One very basic comparison, among the techniques used by Penn State coach Russ Rose in his 1978 Master's thesis at Nebraska, is to see how often the team that outperformed its opponent on a given statistic won the match. As shown in the following chart, the team with the better hitting percentage in a match won nearly all the time (68 out of 72 matches). Having more blocks, digs, and aces also conferred sizable advantages, but not as powerfully as out-hitting one's opponent.
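For readers who want to try this kind of comparison on their own data, here is a minimal Python sketch of the tally. The `StatLine` layout, the field names, and the demo numbers are all hypothetical (they are not the 2013 Big 12 data), and the sketch assumes exactly two stat-lines per match. Matches in which the two teams tied on a statistic are skipped, which is one reasonable choice among several.

```python
from collections import namedtuple

# Hypothetical record layout: one row per team per match (per-game rates).
StatLine = namedtuple("StatLine", "match_id team won hitting_pct blocks digs aces")

def better_stat_win_rate(stat_lines, stat):
    """Return (wins, decided): how often the team with the higher `stat` won.

    Matches in which both teams tied on `stat` are left out of `decided`.
    """
    by_match = {}
    for line in stat_lines:
        by_match.setdefault(line.match_id, []).append(line)

    wins = decided = 0
    for a, b in by_match.values():  # assumes exactly two stat-lines per match
        if getattr(a, stat) == getattr(b, stat):
            continue  # neither team outperformed the other
        leader = a if getattr(a, stat) > getattr(b, stat) else b
        decided += 1
        wins += leader.won  # True counts as 1, False as 0
    return wins, decided

# Made-up numbers for illustration only.
demo = [
    StatLine(1, "A", True, .250, 2.0, 15.0, 1.0),
    StatLine(1, "B", False, .180, 2.5, 14.0, 1.5),
    StatLine(2, "C", False, .210, 3.0, 16.0, 0.5),
    StatLine(2, "D", True, .230, 1.5, 16.0, 1.0),
]
for stat in ("hitting_pct", "blocks", "digs"):
    print(stat, better_stat_win_rate(demo, stat))
```

Run on a full season of box scores, the same function would produce counts like the 68-of-72 figure reported for hitting percentage.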


Hitting does not occur in isolation, however. Some teams might hit well, but not block well; or hit well and not dig well; etc. To probe this issue, I looked at the 144 stat-lines referred to above. To illustrate a stat-line, let's look at Texas Tech's (focal team) when it visited Kansas:

Hitting Percentage = .192; Blocks/Game = .67; Digs/Game = 17.67; Aces/Game = 1.33; Service Errors/Game = 2.67.

I then submitted the 144 stat-lines to a cluster analysis, a technique that attempts to sort cases (or stat-lines) into groups with other similar cases. In other words, the stat-lines within a group should end up relatively similar to each other (i.e., within-group homogeneity), but the different clusters of stat-lines will be dissimilar from each other (i.e., between-group heterogeneity). I obtained 10 clusters, but two of them had only four cases each, which is too few for statistical analysis. Ultimately, our interest will be in seeing the win-loss records of the eight viable clusters, but let's review some basics first. The following graphic illustrates the membership of Cluster 9, as an example (unless you have exceptionally strong eyesight, you'll want to click on the chart to enlarge it).


Seventeen stat-lines ended up in Cluster 9. Each focal team (to whom the stat-line belongs) is highlighted in yellow, with its opponent for the particular match appearing in the second column. Note that many different schools can appear in the same cluster. We're grouping performances, not teams per se. Averages for the complete sample of 144 stat-lines on the various volleyball performance measures are shown in red above each column.

Probably the funnest aspect of conducting cluster analyses is that you get to make up names for the clusters, based on their statistical properties. As seen in the above chart, I named Cluster 9 "Slightly Above-Average Hitting, Below Average Blocking, VERY GOOD DIGGING, HIGH ACES." The digs/game for the 17 cases are shown above in red outline; they range from 17.33 (Baylor, playing at West Virginia) to 20.50 (Texas Tech, hosting Oklahoma). All of these dig statistics exceeded the complete-sample average of 14.92, illustrating why a major part of this cluster's "identity" would consist of "very good digging." Apparently as a result of the digging, the teams in this cluster went 11-6 in the relevant matches, despite hitting only slightly above average in them (the average hitting percentage for the Cluster-9 teams was .223, compared to a complete-sample average of .214).
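To give a feel for how stat-lines get sorted into clusters, here is a small Python sketch. It is not a reproduction of the analysis above: it assumes k-means on standardized (z-scored) variables, which is just one common clustering approach, and the function names and demo numbers are invented for illustration. Standardizing first keeps digs (around 15 per game) from swamping hitting percentage (around .2) in the distance calculation.

```python
import math

def standardize(rows):
    """Convert each column to z-scores so all five measures share a scale."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # `or 1.0` avoids dividing by zero if a column is constant.
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(r, means, sds)] for r in rows]

def dist2(p, q):
    """Squared Euclidean distance between two stat-lines."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iterations=100):
    """Plain k-means with deterministic farthest-point seeding (an assumption,
    chosen so the sketch gives the same answer on every run)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    labels = None
    for _ in range(iterations):
        new_labels = [min(range(k), key=lambda j: dist2(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels

# Made-up stat-lines: hitting pct, blocks, digs, aces, service errors (per game).
demo = [
    [.250, 3.0, 18.0, 1.5, 1.5],
    [.260, 3.2, 17.5, 1.4, 1.6],
    [.240, 2.9, 18.5, 1.6, 1.4],
    [.120, 1.0, 11.0, 0.7, 2.6],
    [.130, 1.1, 11.5, 0.8, 2.5],
    [.110, 0.9, 10.5, 0.6, 2.7],
]
labels = kmeans(standardize(demo), k=2)
print(labels)
```

In this toy example the three strong stat-lines land in one group and the three weak ones in the other. With real data, the interesting part comes afterward: comparing each cluster's averages to the full-sample averages and giving the cluster a name.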

I've placed all the detailed statistics on the clusters below in an Appendix, for anyone who is interested (once again, please click on the graphic to enlarge it). In the remainder of this posting, I provide brief summaries of the clusters:

Cluster 1. Plagued by below average digging (12.92/game) and a high rate of service errors (2.71, compared to the complete-sample average of 1.73), teams whose stat-lines were in this cluster went 5-12 in the relevant matches.

Cluster 2. Cases in this group displayed great hitting on average (.253, compared to the full-sample mean of .214). They also served aces at a higher-than-average rate (1.93/game, compared to 1.14 for the full sample), but also committed more service errors (2.20/game) than the overall average (1.73). The kind of seemingly powerful/aggressive play exhibited in this cluster produced a 12-6 record.

Cluster 3. Characterized by below-average blocking (1.49/game, compared to the overall average of 2.19) and few aces (.74/game), cases in this cluster went 6-13.

Cluster 4. Though this cluster contained only four cases, the signs of poor play were quite vivid (e.g., .031 average hitting percentage, 1.08 blocks/game, a paltry 8.42 digs/game). Although caution is warranted due to the small size of this cluster, the results are just as one would expect: 0-4.

Cluster 5. This cluster excelled in most every way (.256 hitting percentage, 3.12 blocks/game, 1.36 aces/game with only 1.56 service errors), except for digging (12.02/game). The focal teams went 10-5 in the relevant matches.

Cluster 6. This group combined weak hitting (.130), blocking (1.04/game), and digging (11.78/game), with apparent caution from the service line (only .75 aces and 1.27 service errors, per game). This is not a pattern to emulate, as the teams went 1-9.

Cluster 7. Cases in this cluster hit at the overall average (.214), blocked (2.47) and dug (16.68) somewhat above average, but also showed caution when serving (.89 aces and 1.32 errors, per game). I would have expected these cases to have a winning record, but they didn't, going 15-16.

Cluster 8. Cases here hit (.268) and blocked (3.58) extremely well, rarely served aces (.58/game), and were pretty average on the other metrics. Dominating the net paid off big, as these cases went 8-1.

Cluster 9. Discussed above.

Cluster 10. The other cluster with only four cases, the teams here played great defense (22.31 digs, and 2.60 blocks, per game) and went 4-0.
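As a quick consistency check on the ten summaries, the short Python snippet below tallies the win-loss records exactly as reported above and ranks the clusters by winning percentage. The only inputs are the records from this post; the variable names are my own. Because every match produces one winning and one losing stat-line, the records should total 72 wins and 72 losses across 144 stat-lines.

```python
# Win-loss records as reported in the cluster summaries (cluster: (wins, losses)).
records = {1: (5, 12), 2: (12, 6), 3: (6, 13), 4: (0, 4), 5: (10, 5),
           6: (1, 9), 7: (15, 16), 8: (8, 1), 9: (11, 6), 10: (4, 0)}

total_wins = sum(w for w, _ in records.values())
total_losses = sum(l for _, l in records.values())
print(f"Total: {total_wins}-{total_losses} "
      f"({total_wins + total_losses} stat-lines)")

# Winning percentage for each cluster, highest first.
win_pct = {c: w / (w + l) for c, (w, l) in records.items()}
for c in sorted(win_pct, key=win_pct.get, reverse=True):
    w, l = records[c]
    print(f"Cluster {c:2d}: {w:2d}-{l:<2d}  win pct = {win_pct[c]:.3f}")
```

Keep in mind that Clusters 4 and 10 contain only four cases each, so their perfect (and perfectly bad) percentages deserve less weight than the rest.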

In conclusion, to answer Coach Flora's question, there are multiple ways to win in the Big 12 (see Clusters 2, 5, and 8), but they all seem to revolve around great hitting. One way to increase the sample size and achieve greater precision in a future study would be to look at win-loss records of games rather than matches. Box scores typically include team hitting percentages by game (to correlate with the winning of games), but blocks, digs, and serving statistics are only reported for the match as a whole. One final issue is that the present analysis tells us nothing about whether the findings are in any way unique to the Big 12; the same relationship between performance metrics and winning might emerge for other conferences, as well. We just don't know.

Appendix


1 comment:

thevolleyballanalyst said...

I like this post. If you had written it two years earlier, I would probably have referenced it, as my whole Master's dissertation was based on this very question.

I did find hitting % to be one of the best performance indicators critical to success in volleyball. This matched pretty much every academic research article on the subject, which identified the attack as critical to success. I concluded that this makes sense: to win matches you need to win games, to win games you need to score 25 points before the opponent does, and most points are won and lost on the attack. This is why hitting %, which takes into account kills AND attack errors, is one of the best performance indicators related to success. You can read the abridged version at this link: http://thevolleyballanalyst.blogspot.co.uk/2014/07/dissertation-title-identification-of.html

I would also like to note that I think per set averages are not good performance indicators. A team may have a high per set average simply because they had more opportunities to do so. For example, teams that engage in long rallies will likely have a high digs per set average because there are more opportunities to dig. Anyway, here are other reasons why I don't like per set averages, especially when comparing players.

http://thevolleyballanalyst.blogspot.co.uk/2013/08/on-why-i-dont-like-per-set-averages.html
