A Discrete Choice Face-Off: Q-Sort vs. MaxDiff
It is common for product managers, developers, engineers, marketers, owners (you name it) to want to know how their users, customers, or shoppers prioritize a list of items. That list might include potential features for the next product release, benefits of a product or service, or maybe even flavors of cookies.
If you’ve been to a shopping mall or grocery store lately, it is likely that you have seen our friends, the Girl Scouts. The cookie selling program is the ‘largest girl-led entrepreneurial program in the world’, says the organization. And as a female entrepreneur myself, I dedicate this week’s discrete choice take to them. You go girls!
Now, while I talk a lot about the benefits of MaxDiff, or Best-Worst Scaling (Finn & Louviere 1992), when it comes to prioritizing lists, there are other methods that can work well too! One method that doesn’t get enough attention is the Q-Sort methodology (Stephenson 1953). While there are pros and cons to each, I wanted to share an overview and let you decide. And what better topic to do this with than Girl Scout Cookies!
The Q-Sort methodology works very well on a mobile device, and no advanced analytics are needed to analyze the results. In our example, we have nine Girl Scout cookie flavors. A respondent goes through multiple steps where they:
Select the cookie they would like the Most (In a list of 9, this is #1)
Of the remaining cookies, select the next two they would like the Most (In a list of 9, these become #2/#3)
Then we shift gears and ask, of the remaining cookies, which one would they like the Least (#9)
Of the remaining, select the next two cookies they would like the Least (#7/#8)
Then, the remaining three cookies are auto-coded as #4/#5/#6
Need a visual? Take the survey here or see the images below.
We can then assign points to each of the items in the five buckets, resulting in clear differentiation among the items. For example, we might code #1 as a 5, #2/#3 as a 4, the three non-chosen items #4/#5/#6 as a 3, the two next least #7/#8 as a 2 and the very least #9 as a 1. Then we can compare simple statistics for each of the nine items across the sample.
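The scoring described above can be sketched in a few lines of Python. This is a minimal illustration, not the code behind the survey: the function names (`score_q_sort`, `mean_scores`), the placeholder flavor labels (`Cookie A` through `Cookie I`), and the two respondents are all made up for the example.

```python
from statistics import mean

# Placeholder labels (hypothetical); the nine real flavors would go here.
ITEMS = [f"Cookie {letter}" for letter in "ABCDEFGHI"]


def score_q_sort(items, most, next_most, least, next_least):
    """Score one respondent's 1-2-3-2-1 Q-Sort using the 5/4/3/2/1 coding."""
    if len(next_most) != 2 or len(next_least) != 2:
        raise ValueError("expected two 'next most' and two 'next least' picks")
    picked = [most, *next_most, least, *next_least]
    if len(set(picked)) != len(picked) or not set(picked) <= set(items):
        raise ValueError("picks must be distinct items from the list")

    scores = {item: 3 for item in items}  # never picked: auto-coded to the middle
    scores[most] = 5
    scores[least] = 1
    for item in next_most:
        scores[item] = 4
    for item in next_least:
        scores[item] = 2
    return scores


def mean_scores(respondent_scores):
    """Average each item's points across a list of per-respondent score dicts."""
    items = respondent_scores[0].keys()
    return {item: mean(r[item] for r in respondent_scores) for item in items}


# Two made-up respondents, purely for illustration.
r1 = score_q_sort(ITEMS, "Cookie A", ["Cookie B", "Cookie C"],
                  "Cookie I", ["Cookie G", "Cookie H"])
r2 = score_q_sort(ITEMS, "Cookie C", ["Cookie A", "Cookie D"],
                  "Cookie H", ["Cookie I", "Cookie B"])
averages = mean_scores([r1, r2])

# Print items from highest to lowest average score.
for item, avg in sorted(averages.items(), key=lambda pair: -pair[1]):
    print(f"{item}: {avg}")
```

Because every respondent hands out the same 27 points, the averages are directly comparable across items, and they can be broken out by segment just like any other survey mean.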
PROS of Q-Sort
Easy for a respondent
Faster than MaxDiff
Works well when the list has between 7 and 20 items
CONS of Q-Sort
Because we force respondents to put a specific number of items into a specific number of categories (1-2-3-2-1 in our case), Q-Sort imposes a quasi-normal distribution on the responses, instead of a uniform distribution.
Becomes more difficult as the list of items grows longer.
Which brings us to our friend, MaxDiff. In a traditional MaxDiff exercise, a respondent sees three to five items at a time and picks the item they like the Most and the one they like the Least. They complete multiple questions like this, each showing a different subset of items that conforms to a well-balanced experimental design where, preferably, every respondent sees every item at least three times. In our Girl Scout example, respondents complete eight MaxDiff questions, each with three Girl Scout cookie flavors. See the image below, or again, take the survey here.
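To make the idea of a balanced design concrete, here is a rough Python sketch that builds the questions for one respondent by always showing the items that have appeared least often so far. It is a simplified stand-in, not how any particular survey platform generates its designs; the function name `build_maxdiff_design`, its parameters, and the placeholder flavor labels are assumptions for illustration.

```python
import random

# Placeholder labels (hypothetical); the nine real flavors would go here.
ITEMS = [f"Cookie {letter}" for letter in "ABCDEFGHI"]


def build_maxdiff_design(items, n_questions, items_per_question, seed=0):
    """Greedy sketch: each question shows the items seen least often so far."""
    if items_per_question > len(items):
        raise ValueError("cannot show more items than the list contains")
    rng = random.Random(seed)
    times_shown = {item: 0 for item in items}
    design = []
    for _ in range(n_questions):
        candidates = list(items)
        rng.shuffle(candidates)  # random order so ties are broken at random
        # Stable sort: least-shown items first, random order kept within ties.
        candidates.sort(key=lambda item: times_shown[item])
        question = candidates[:items_per_question]
        rng.shuffle(question)  # randomize on-screen position
        for item in question:
            times_shown[item] += 1
        design.append(question)
    return design, times_shown


design, times_shown = build_maxdiff_design(ITEMS, n_questions=8, items_per_question=3)
for number, question in enumerate(design, start=1):
    print(number, question)
print(times_shown)
```

With nine items, eight questions of three fill 24 slots, so in this sketch six items appear three times and three appear twice; a ninth question would bring every item to three showings. Production designs typically also balance how often pairs of items appear together, which this sketch does not attempt.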
PROS of MaxDiff
Complete ranking of all items at the individual level
Easy for respondents to complete versus rating/ranking large lists
Handles large lists of items (even up to 2,000 in Sawtooth Software’s MaxDiff module)
CONS of MaxDiff
Usually requires advanced statistical modeling like hierarchical Bayes (HB) (however, comparing counts of Best to Worst can work well too!)
Takes longer for a respondent to complete
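The counting alternative to HB mentioned above can be sketched as follows. This is one common way to do it, assuming each answer is stored as the items shown plus the Best and Worst picks; the function name `best_minus_worst` and the sample answers are hypothetical.

```python
from collections import Counter


def best_minus_worst(responses):
    """Return {item: (times Best - times Worst) / times shown}.

    `responses` is an iterable of (shown_items, best, worst) tuples.
    Scores range from -1 (always picked Worst) to +1 (always picked Best).
    """
    best, worst, shown = Counter(), Counter(), Counter()
    for shown_items, best_pick, worst_pick in responses:
        if best_pick == worst_pick:
            raise ValueError("Best and Worst must be different items")
        if best_pick not in shown_items or worst_pick not in shown_items:
            raise ValueError("picks must come from the items shown")
        shown.update(shown_items)
        best[best_pick] += 1
        worst[worst_pick] += 1
    return {item: (best[item] - worst[item]) / shown[item] for item in shown}


# Three made-up answers, purely for illustration.
answers = [
    (("Cookie A", "Cookie B", "Cookie C"), "Cookie A", "Cookie C"),
    (("Cookie A", "Cookie D", "Cookie E"), "Cookie A", "Cookie E"),
    (("Cookie B", "Cookie C", "Cookie D"), "Cookie B", "Cookie C"),
]
scores = best_minus_worst(answers)
for item, score in sorted(scores.items(), key=lambda pair: -pair[1]):
    print(f"{item}: {score:+.2f}")
```

Dividing by the number of times each item was shown keeps the scores comparable when the design does not show every item equally often. Pooling answers across respondents gives sample-level scores; running the function per respondent gives a rougher individual-level ranking than HB would.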
So what do the experts say? Well, Chrzan and Golovashkina (2006) found that Q-Sort and MaxDiff produced more highly discriminating measures than other methods like importance ratings, unbounded ratings, magnitude estimation, and constant sum. This is likely because both Q-Sort and MaxDiff build trade-offs into the prioritization process. Another great feature of both methods is that we can conduct them with smaller sample sizes than the others and still draw meaningful conclusions. However, when forced to choose between the two, Chrzan and Golovashkina found that MaxDiff outperformed Q-Sort in terms of predictive validity and its ability to uncover more differentiation among items. You can read their full paper here.
At the end of the day, it’s up to you to decide, as there are pros and cons to both methods and different levels of comfort with the analysis. However, here are a few key recommendations for your next project:
If you use a Q-Sort approach, make sure your items appear in random order across respondents, but keep that order consistent across all of the steps that a single respondent sees.
Both Q-Sort and traditional MaxDiff assume that all items are actually relevant and important (or in our case, appetizing) to all respondents. We know that isn’t always the case, so consider an anchored MaxDiff approach, or a constructed list within the Q-Sort that removes the items that have no value before the exercise begins. Either option gives you a better line of sight into which items are actually important, preferred, or appetizing.
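For the first recommendation, one simple way to get an order that differs across respondents but stays fixed for each respondent is to seed the shuffle with the respondent's ID. The sketch below assumes each respondent has a stable ID; the function name `item_order_for`, the seed prefix, and the IDs are hypothetical, and most survey platforms offer a built-in setting that does the same job.

```python
import random

# Placeholder labels (hypothetical); the nine real flavors would go here.
ITEMS = [f"Cookie {letter}" for letter in "ABCDEFGHI"]


def item_order_for(respondent_id, items):
    """Return a shuffled copy of `items` that is always the same for this ID."""
    # A string seed makes the shuffle repeatable for a given respondent.
    rng = random.Random(f"qsort-order:{respondent_id}")
    order = list(items)
    rng.shuffle(order)
    return order


# The same respondent gets the same order on every step of the Q-Sort.
print(item_order_for("resp-001", ITEMS))
print(item_order_for("resp-001", ITEMS))
# A different respondent will usually get a different order.
print(item_order_for("resp-002", ITEMS))
```

Each step of the Q-Sort then shows whatever items remain, in the order returned for that respondent, so the list never reshuffles between screens.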
Let us know which method you prefer in the comments below!
Chrzan, K., & Golovashkina, N. (2006). An Empirical Test of Six Stated Importance Measures. International Journal of Market Research, 48(6), 717–740. sci-hub.se/10.1177/147078530604800607