Rating the rating system 9 0

The problem

The jam was good, but the rating system left me cold. I have a total of three issues with it; let's see if you can spot them.

10
9
8
7
6
5
4
3
2
1

It is obvious right? Ok, I'll give you the first one for free: there are no descriptive words to go with the numbers. Let's fix that.

10 The best ever
9 Amazing
8 Great
7 Good
6 Above average
5 Below average
4 Bad
3 Terrible
2 Abysmal
1 The worst ever

Better. Now everyone has similar expectations of each grade, more so than before. Do keep in mind this is just an example, there are other fitting descriptions and they could even change based on each category.

Second problem? There is no average! In a big jam there are a whole lot of average entries, so-so stuff. With this rating system I'll have to score them slightly better or slightly worse every time. How annoying.

The last problem is more personal than the prior two. In my opinion a scale of 10 has too much granularity. What is the difference between Amazing and Great? How about Terrible and Abysmal? It is not clear and as such we can expect different people to use these grades in fuzzy ways, which muddles the results. In the best case scenario the grade should reflect the (hopefully written) opinions of each reviewer exactly, not close enough.

Suggestions

In light of the prior explanations some propositions are in order. The first one is obvious, simply introduce descriptive words and an average grade to the current system. I'd be fine with that. Next, two more ideas.

7-grade, zero centered model

+3 Amazing
+2 Good
+1 Above average
0 Average
-1 Below average
-2 Bad
-3 Terrible

The old 5-grade model

5 Amazing
4 Good
3 Average
2 Bad
1 Terrible

Conclusions

I rate the rating system 2 out of 5, bad.

Cheers

Comments (9)

DaFluffyPotato • 7 years ago •

Or we could rework the result algorithm to handle the differences in the way people vote. .-.

Wan • 7 years ago •

Thanks for the review! It's interesting how it shows every person has his own approach to rating games.

I like your first suggestion for labeling grades, since it might help people have a more consistent rating scale (this could resolve in part @CMLSC 's issue). I'd also be alright with introducing a usable "zero" grade, to help 5 feel like a real middle ground.

I'm less a fan of the other suggestions, and the main reason why is the problem of granularity. With the Ludum Dare/itch.io 5-star system, most people rarely use the too harsh 1 star grade, and reserve 5 stars to few excellent entries. For the bulk of the games, that leaves us with 3 options, i.e. "Bad"/"Average"/"Good". The first problem is, in an event as large as LD, this was hardly enough to get reliable rankings (i.e. rating 1000 games with most within a range of 25x3=75 stars means a lot of ties).

Even if we decide not too care about the reliability of rankings (which admittedly can never be perfect), there's a lot of wasted potential, because for two games a user has rated "good", there's probably one that he could have said "I liked it more". With a 5 star system, you don't get that information. With 10 stars, you get it a little bit more (see this rating distribution). (Note: I think Ludum Dare is actually planning for half-stars for the next jam)

So, the idea is that finer grades lets people express more accurately which game they preferred. This idea is also the reason why we have that "Manage ratings" page, which we'd actually like to expand to let you edit ratings directly in it (by reordering games with drag'n'drop, or with manual entry). We're even considering letting people finetune ratings as decimal numbers to let the bigger nerds make their own "definitive rankings". What do you think about the idea?

Oh and to react on the 7-grade/0-centered idea, I see the interest it, but feel like it would be less intuitive than scales we knew at school (5/10/20/100 depending on the country…). Letting your friends know that your game rated a "+1.51" average in graphics would just make them raise eyebrows. Though maybe other people would like the idea? I'm open to more dicussion.

HuvaaKoodia • 7 years ago •

@CMLSC
And making it transparent. Nothing worse than an obscure algoritm "correcting" data. In other words: if done right, then worth a shot.

@Wan
Yeah, I thought the other suggestions might not fly that well (I like the 7 grader exactly for its oddity, it makes people think!) I prefer punchy grades as a reviewer, although I did find use for all the 10-grades when calculating overall scores. If low granularity results in many ties in the top entries then, sure, it is not proper. As long as the system is clear about the difference between grades then we are golden.

I did use the Manage ratings page a few times to checks ratings I'd given in the past to keep things consistent. Didn't change any ratings afterwards as that would not have worked well with my visible ratings policy. I can see it being a useful tool for people who review and rate differently though, so it is a good addition.

Decimal ratings on the other hand… I reckon simply ordering entries in the rating manager (drag and drop sounds nice!) would be good enough. Basically a backend preference value is stored when the scores are the same. Then if the end result scores for two or more entries are the exact same, an unlikely event, the backend preference values can be used as a tie breaker (transparently, of course).

If you do push for a visible decimal system, then it has to be a relatively hidden feature indeed. Otherwise everyone and their dog is going to start using it from the get-go resulting in a world of hurt! 100-grade systems… shudders

Way back in elementary we had a weirdo 4 to 10 system. It was never explained why. (Apparently for historical reasons only, go figure).

Thanks for responding.

Wan • 7 years ago •

that would not have worked well with my visible ratings policy

I like this policy, if I wasn't too lazy I would probably do the same. This makes me think… we could probably set up an opt-in feature to let users make their votes public. Hmmm.

Also we're on the same page for the decimal ratings, if present it should not make things too complicated. Reading you inspired me possible implementations, like making drag'n'drop apply hidden, super small decimal changes just to break ties.

HuvaaKoodia • 7 years ago •

@Wan
A public ratings opt-in would be great!

Glad the idea inspired you. Applying small changes to the grades themselves sure would work. Having thought about it some more, a separate order value used only in the case of a tie would simply create silly problems. It sounded good on paper; time to shred that paper.

TimBeaudet • 7 years ago •

I'm not convinced public ratings is a great idea. While I get the idea, since a lot seem to rate publicly anyway; It doesn't solve anything and I always feel bad knowing how other people rated me. It may also add unplanned baises, "they rated me well" will lead to higher than actual scores. Say normally you'd have given a two, that thought may push you into a three, just to spare them. (With or without your own ratings being opted in for public).

Thrainsa • 7 years ago •

I really don't like the public rating idea.
And for the rating, I don't have the same signification, for me a 5 doesn't mean that you are on the average of the other entries but that you made just the minimum of what was expected. So every one can have more than 5 if they have a good game.

Wan • 7 years ago •

@timbeaudet That's a good point. After some thinking I felt that public ratings should not be revealed during the voting phase anyway, because if the practice becomes widespread 1. it could influence other's people ratings a lot 2. it would spoil approximate game rankings. Holding the detailed ratings until the end of the jam would limit the impact of the "rub my back I'll rub yours" thing, but not completely I admit.

@thrainsa Each person has a different scale in mind when rating games, so I'm not that surprised to see Huvaa's differ a lot from yours. You know I'm often even nicer than you ;) Everyone still gives better grades to better games so I tend to think all approaches are valid… But the luck aspect of people getting more or less harsh voters is indeed a concern. Maybe giving labels to grades (Huvaa's first proposal) would help?

HuvaaKoodia • 7 years ago •

@Thrainsa

I really don't like the public rating idea.

As an option no-one is forcing you to use it. The potential for bias is there, if you do have ratings on your own entry of course. I only talk from the perspective of someone making unranked entries.

for me a 5 doesn't mean that you are on the average of the other entries

In any big jam it is probable due to the massive amount of entries that most of them are average. Obviously if I only played, say, 3 great entries, then all of them would receive great grades too. Middle of the road projects get average ratings, regardless of the other entries in the specific jam. Sure enough, I've played thousands of titles so maybe that is easy for me to say.

In any case, people having different signification for each grade is a big problem as ratings between 0 and 10 then lose meaning. You might get lucky and receive a bunch of ratings from people who never rate below 5, or you might not get so lucky. If everyone did use 5 as an average then those in-between ratings would be more specific accross the board as a result.