Friday, August 25, 2017

spreadsheet blues: how i wish the "rank" function worked

Sometimes I wish the "rank" function in Excel allowed for multiple dimensions (i.e., a tiebreaker).

Figure 1: A simple use of the rank function
Excel's rank function is used to indicate where a number ranks within a range of numbers. The syntax is "RANK(item,range)." Technically, there's a third parameter to indicate whether the rank is based on descending or ascending order -- but that's neither here nor there. A simple application of this function is illustrated in Figure 1. There are four people, listed with their height. Column C indicates each one's rank within the group. The formula in C2 will produce the number 4 when you enter it.
Figure 2: The rank function when there's a tie
But what if there's tie? In that case the two tied items share the higher rank, as shown in Figure 2. Here there are two people who have the same height. Excel assigns them the same rank (in this case 1). The next tallest is ranked 3, so 2 is skipped altogether. If I recall correctly, I used to use a version of Lotus 1-2-3 that would have assigned the two items a rank of 1.5. Either approach makes sense.

Figure 3: Rank determined with weight as a tiebreaker
Ranks are hardcoded, not calculated
Figure 4: Rank determined by a function of two items
But suppose I don't want to have a tie. Suppose I want ties to be broken on the basis of a second measure, say height. See, for example Figure 3. I added another column for weight, which I've decided is the tiebreaking criterion because of reasons. But I had to hardcode the rank because the rank function doesn;t allow me to include a secondary item to settle ties. I'd love to have rank use a syntax such as "RANK(item1,range1,item2,range2...itemx,rangex)." In the example of Figure 3, the formula in cell D2 would be "RANK(B2,B$2:B$6,C2,C$2:C$6). Heck, let it require parameters to indicate order within each range, if that'll make it work.

The fact is, there is a way around it -- create a new item that's a function of the primary item and the tiebreaking item. In figure 4, I created an item called "rank determiner." That's calculated by multiplying the height by 1000 and adding the weight. Then rank is calculated based on the "rank determiner." If you're worried about aesthetics, you can hide the column. It works, but there are drawbacks. For starters, you have to be careful with the function you create;he function I used may not work right if some of the people weigh more than 1000 pounds.

I have come across another drawback since I am using the rank function on several years of data, to see how ranks change over time. Specifically, I am looking at the cumulative win totals for every major league baseball team ever. (spoiler alert: The Marlins aren't on top). As a tiebreaker, I am using cumulative games above .500 (which is the equivalent of using losses in descending order). But since I am graphin the ranks over time, I have a grid of ranks, which is based on a grid of win totals. In order to use my tiebreaker, I need to have a grid of games over .500, and a grid of a "ranl determiner" function, which I've defined as wins×1,000,000 + GamesOver500. It's doable, but it makes things less wieldy.





No comments:

Post a Comment