Your Users Aren’t You

As with any review site, we had to deal with user ratings. In our case, our users rated a place on a scale from 0 (bad) to 10 (great!).

One straightforward approach to aggregating ratings is to use a simple average. But that comes with an inherent problem: the moment something new is given a single 10, it shoots right to the top.

So we naively decided to use a rating system based on a Bayesian average instead.

To paraphrase Wikipedia:

In a calculation of an average review score of a book where only two reviews are available, both giving scores of 10, a normal average score would be 10. However, as only two reviews are available, 10 may not represent the true average had more reviews been available.

The review site may instead calculate a Bayesian average of this score by adding the average review score of all books in the store to the calculation. For example, by adding five scores of 7 each, the Bayesian average becomes 7.86 instead of 10.
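
The arithmetic in that example can be sketched in a few lines of Python. This is only an illustration of the quoted calculation, not the code we ran in production; the function name and the `prior_mean` and `prior_weight` parameters are made up for this sketch.

```python
def bayesian_average(ratings, prior_mean, prior_weight):
    """Blend real ratings with `prior_weight` pretend ratings of `prior_mean`.

    Hypothetical helper for illustration: `prior_mean` stands in for the
    site-wide average, `prior_weight` for how many pretend ratings we add.
    """
    total = sum(ratings) + prior_mean * prior_weight
    count = len(ratings) + prior_weight
    return total / count


# Two reviews of 10, plus five pretend reviews of 7 (the Wikipedia example).
simple = sum([10, 10]) / 2
bayesian = bayesian_average([10, 10], prior_mean=7, prior_weight=5)
print(simple, round(bayesian, 2))  # 10.0 7.86
```

The more pretend ratings you add, the harder it is for a handful of real ones to move the score.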

The results were great. Our top 20 list was laid out exactly as we expected.

After all, a rating is a relative representation of quality. It only makes sense among other ratings. The Bayesian average gave our users a more accurate indication of a place’s quality.

Then suddenly, and almost unexpectedly, our users started to rage. Some questioned our integrity. Some lobbed conspiracies at us.

Why were we on the receiving end of a backlash, when all we did was to improve our users’ experience with more accurate ratings?

It turns out we didn’t. Our users were wondering why that amazing new place was still only a “7” when they were giving it “10s”. Or why that crappy one was rated a “4” despite having received ten “0s”.

A Bayesian average makes sense conceptually on a macro level, but not at first glance. When you rate an unrated place a “10”, you expect to see a “10”, not a “7”. Our rating system just didn’t fit the typical mental model of ratings.

The system works better once a place has a larger number of ratings, as users don’t expect their contribution to nudge the overall rating by much. New places typically have a low number of ratings, so the effect of a Bayesian average becomes too obvious for comfort. In the end, we tweaked our rating system to better suit our users’ mental model.
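
To see how the gap shrinks as ratings pile up, here is a rough sketch. It assumes a place that only ever receives 10s and borrows the numbers from the Wikipedia example (five pretend ratings of 7); those values and the function name are assumptions for illustration, not the ones our site used.

```python
def shown_score(n_tens, prior_mean=7.0, prior_weight=5):
    """Bayesian average for a place that has received `n_tens` ratings of 10.

    Hypothetical helper: prior_mean and prior_weight are illustrative values.
    """
    return (10 * n_tens + prior_mean * prior_weight) / (n_tens + prior_weight)


# Gap between what raters expect (10) and what the site would display.
for n in (1, 2, 10, 100, 1000):
    score = shown_score(n)
    print(f"{n:>4} ratings of 10 -> shown {score:.2f} (gap {10 - score:.2f})")
```

With one rating the place shows 7.50, which is the jarring case described above. With ten ratings it shows 9.00, and from there the displayed score keeps creeping toward 10.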

People always expect feedback with any action they perform. If the feedback strays too far from expectation, it affects their understanding of what they just did. And that, in turn, degrades their experience.

In product design, perceived logic often trumps conceptual logic. No matter how conceptually awesome an esoteric feature may be, you have to always keep your users’ perception in mind. Your users aren’t you.

 