In defense of the number: a note on video game review scores


Many people dislike the idea of assigning numerical ratings to video games. The prevailing attitude among games industry pundits and developers alike seems to range from indifference to outright hostility. Popular game review sites such as Kotaku, Rock Paper Shotgun, Ars Technica and The Verge forgo numerical ratings entirely. Eurogamer has recently decided to dispense with the traditional 10-point scoring system [1], as did Joystiq shortly before closing its doors [2]. Chief among the complaints about review scores are that they discourage meaningful discussion of games, and that they aren't sufficiently nuanced to be an effective tool for reviewers [3]. Other industry professionals prefer to direct their ire towards the practice of review score aggregation on websites such as Metacritic and GameRankings. Have a listen to old Sessler rave about how evil Metacritic is tearing the industry apart [4]. TotalBiscuit also isn't a fan, to put it mildly [5].

On the other side of the table, plenty of game consumers don't see a problem with including scores as a component of written reviews. Aggregated ratings pages are viewed as a helpful, if not completely definitive, source of information about present and past titles. Games released with serious technical deficiencies will almost certainly find themselves on the business end of a numerical beatdown from critics and users alike. In fact, if attempts at coercion and bribery by publishers are even half as pervasive as games journalists tell us, turning to aggregate ratings for a reliable appraisal of game quality is a perfectly sensible course of action [6]. I've also seen many comments on discussion boards that boil down to "find a critic that works for you", which suggests it isn't so much a matter of the particular review format being employed as it is the tastes of the reviewer. Currently, it seems there is still some degree of support for review scores among game critics [7] [8], while for others it comes across as a calculated business decision [9].

So what exactly is the problem with review scores? I find the artsy-fartsy answer of "video games, like any form of art, are far too complex for their merits to be reduced to a number" less than convincing [10]. A more cynical view is that taking a non-committal approach to reviews merely serves as a way for professional critics to absolve themselves of the embarrassment of recommending the next big stinker, or perhaps as a way to make criticism more palatable to site advertisers with a vested interest in game sales [11]. This isn't quite right though; many game reviewers adopt an even more committal approach than writing down a number, as explained below.

I think it's misguided to point fingers at neutral aggregators like Metacritic, but I can understand why some eyebrows were raised at the discovery that game developer bonuses had, at least in one instance [12], been tied to critic review scores. What I find indefensible is the idea that one can take a principled stand against review scores while at the same time being perfectly happy to issue binary recommendations of the form 'buy / don't buy' or 'play / don't play'. Kotaku, Ars Technica, and now Eurogamer explicitly engage in this practice, to say nothing of the propensity of unscored reviews to all but club the reader over the head with a final verdict. The reason I consider this position absurd is simple: a score conveys the relative weight of the pros and cons discussed by a reviewer far more precisely than a binary or ternary assessment can. That is to say, if we gauge various forms of review by the amount of information they convey, the 'yes / no' recommendation ranks at the bottom. It's about as nuanced as a chainsaw-wielding medical intern who complains that the surgical instruments aren't sharp enough.

image

Meanwhile, back in the camp where nuance isn't merely a philosophical construct for peddling opinions that are anything but, there are those who aren't as dogmatically opposed to numerical ratings but feel as though the scoring system is flawed. The main point of contention is that scores on a 10-point or 100-point scale are artificially skewed towards the 70%-100% region [13] [14], and any game rated below 70% is more likely to be found in a GameStop bargain bin than a console disc tray. Thus, there is a perceived incompatibility with the traditional 4-star system in which scores are (presumably) more evenly distributed across the scale, and where 2/4 truly does stand for average quality [15].

Well, if this 'tight scale makes for honest scores' argument holds water, surely we should be able to find some evidence to support it. The problem is that video game reviewers who employ a strict 4-star or 5-star grading system (strict = no half-stars) are a rare commodity. Among sites that could be classified as well-established, I count only Giant Bomb. So, while I'll grant that a single data point doesn't prove the general case, this particular data point might be viewed as an important one by those familiar with the origins of Giant Bomb [16]. Illustrated in the bar plot below is the probability distribution of review scores for Giant Bomb and The Escapist based on console and PC games reviewed over the past five years [17]. Note that Metacritic uses proportional scaling to convert to a 100-point score (e.g. 3/5 = 60/100, 9/10 = 90/100). Rather than the vaunted uniform distribution across the scale, it can be observed that Giant Bomb's scores are about as heavily concentrated towards the high end as The Escapist's. The average ratings of 72 and 74 are also very similar. Even if, in a foolish attempt to appease the "five stars doesn't translate to a perfect score!!!" mouth breathers, we shift the score distributions left by a half-interval (subtract 10 points for Giant Bomb and 5 points for Escapist), what remains is that Giant Bomb rates 61% of games above the midpoint of its scale (3/5 stars) while just 13% fall below this mark.

[Figure: bar plot of the review score distributions of Giant Bomb and The Escapist]
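For concreteness, the proportional scaling used by Metacritic and the half-interval shift applied above can be sketched in a few lines of Python. The function names are my own, not Metacritic's, and this is only an illustration of the arithmetic:

```python
def to_100_point(score, scale_max):
    """Metacritic-style proportional rescaling (e.g. 3/5 -> 60, 9/10 -> 90)."""
    return score * 100 / scale_max

def shift_left_half_interval(score_100, scale_max):
    """Shift a rescaled score down by half of one original scale interval:
    10 points on a 5-star scale, 5 points on a 10-point scale."""
    return score_100 - 100 / (2 * scale_max)

print(to_100_point(3, 5))                 # 60.0
print(to_100_point(9, 10))                # 90.0
print(shift_left_half_interval(60.0, 5))  # 50.0
```

Note that even after the shift, a 3/5 review only drops to the midpoint of the 100-point scale, which is why the shifted distributions remain top-heavy.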

There's no denying the fact that video game ratings lean heavily towards the upper end of the scale. You need only browse the publication lists on Metacritic to discover an industry-wide scoring average currently sitting at 74, a figure that the vast majority of individual publications fall within +/-10 points of. Compare that to an overall average score of 62 for movies, 65 for television and 72 for music.

To gain a better appreciation of the status quo, the table below contains a statistical summary of review scores from some prominent gaming critics for a large selection of console and PC titles [18]. Average scores are in the range of 68-78. The cumulative probability data conveys the distribution of each critic's review scores across the scale [19]. Some useful observations can be made. For example, the probability of scoring a game at 70 or lower is between 0.33 and 0.52 for most publications, but it goes as low as 0.24 (Game Informer and GameTrailers) and as high as 0.65 (Edge Magazine). The probability of scoring at 50 or below is typically between 0.09 and 0.17, but this can be as low as 0.05 in the most extreme case. Perhaps most alarming is the inclination of certain critics to use the 81-100 region of the scale for half of all games they rate (take 1.0 minus the cumulative probability value in the 80 column), whereas most gamers would agree that 81-100 territory should be reserved for truly top notch efforts [20]. All told, the results serve as further confirmation that nearly all of the action takes place in the top half of the scale.

Critic          Samples   Average   Cumulative probability at score:
                                    30     40     50     60     70     80     90     100
Destructoid        747      74      0.04   0.08   0.13   0.20   0.37   0.63   0.90   1.00
Edge Magazine      548      68      0.04   0.08   0.19   0.40   0.65   0.88   0.99   1.00
Eurogamer          767      71      0.04   0.09   0.16   0.29   0.52   0.79   0.96   1.00
Game Informer     1090      78      0.01   0.02   0.06   0.13   0.24   0.54   0.89   1.00
GameSpot          1334      72      0.01   0.06   0.12   0.24   0.45   0.79   0.99   1.00
GamesRadar         746      73      0.03   0.06   0.14   0.26   0.47   0.76   0.95   1.00
GameTrailers       641      78      0.00   0.01   0.05   0.11   0.24   0.47   0.88   1.00
Giant Bomb         506      72      0.02   0.13   0.13   0.39   0.39   0.87   0.87   1.00
IGN               1476      75      0.02   0.04   0.09   0.19   0.33   0.59   0.92   1.00
Joystiq            638      73      0.03   0.10   0.17   0.29   0.45   0.73   0.92   1.00
PC Gamer           347      74      0.01   0.03   0.09   0.18   0.33   0.59   0.94   1.00
Polygon            386      71      0.04   0.08   0.17   0.27   0.48   0.72   0.95   1.00
The Escapist       488      74      0.02   0.07   0.12   0.28   0.43   0.77   0.91   1.00
VideoGamer         672      73      0.03   0.06   0.12   0.23   0.48   0.78   0.97   1.00

How to explain these findings? Has the sphere of professional game critics gone so thoroughly mad with corruption and fanboyism as to be incapable of delivering anything resembling an honest appraisal? I think not. At least not entirely, and allow me to explain why. First of all, there is a positive bias with respect to what quality of game even registers as a blip on the radar of reviewers. If it isn't a big studio release backed by marketing or an indie title blessed by IGF or IndieCade, it generally doesn't receive a mention let alone a dedicated review. There isn't necessarily anything insidious about this state of affairs; I'd wager that at least a few critics regularly sample low budget offerings only to be reminded of why they don't more often. Mind you, I don't assert that marketing buzz is an accurate predictor of game quality, only that the subset of games with enough traction to garner attention from reviewers is, statistically speaking, of above average quality [21]. As support for this claim, consider the following graph of average critic rating (Metascore) as a function of the number of critics. The best linear fit to the data suggests that Metascore increases by an average of 11.7 points as the number of critic reviews goes from 4 (little attention) to 100 (massive attention) [22].

[Figure: average critic rating (Metascore) as a function of the number of critic reviews, with best linear fit]
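The slope estimate quoted above comes from an ordinary least-squares fit. As a rough sketch of the method only, here is how such a fit could be computed with NumPy; the data points below are made up for illustration, while the real dataset is the Metacritic sample described in reference [17]:

```python
import numpy as np

# Hypothetical (critic count, Metascore) pairs standing in for the real
# Metacritic data; the shape of the trend is what matters here.
critics   = np.array([4, 10, 20, 35, 50, 70, 85, 100])
metascore = np.array([66, 68, 70, 73, 74, 76, 77, 78])

# Best linear fit: Metascore ~ slope * critics + intercept
slope, intercept = np.polyfit(critics, metascore, 1)

# Predicted change in Metascore going from 4 reviews (little attention)
# to 100 reviews (massive attention).
delta = slope * (100 - 4)
print(f"slope = {slope:.3f} points per critic, delta(4 -> 100) = {delta:.1f}")
```

On these invented points the fitted increase lands in the same ballpark as the 11.7-point figure from the article, but the number itself depends entirely on the underlying data.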

Which brings me to a second, albeit related, point of discussion. A robust rating system can accommodate not only the good and the bad, but the extremes thereof. As a former dabbler with Microsoft XNA which led to a minor obsession with XBLIG games, I discovered a degree of awful that simply can't be overcome by normalizing to a price point approaching zero. However, much like the stereotypical dumb jock who brings home a report card of failing grades, an overhyped video game featuring insipid gameplay in a drab universe still manages to answer half the questions correctly [23]. The gamer who encounters a review score of 5/10 or 50/100 is no more deceived about the game's quality than the parents of the 50%-average student are deceived about his intellectual prowess. That's because they have a feel for the scale and don't need it excessively dumbed down into compartments labeled 'bad / fair / good'.

In conclusion, I can't help but think the fixation of certain game critics with compactness of scale is misguided. And I'm not the only one to recognize that trading a high resolution scale for one with fewer notches doesn't immediately solve the problems inherent in judging video game quality [24]. Major games review sites, operating under the guise of addressing the needs of their readers, are all too eager to trumpet their own special flavor of review format, apparently unaware that assigning different names to two or three rating categories doesn't suddenly make you innovative. Coupled with a growing aversion to the 10-point system, you might say that professional game critics are collectively struggling with their own variant of 'not invented here' syndrome, a phenomenon well known to those in the software industry. Barring a widespread outbreak of sanity, I'll just be sitting here waiting for the next super-duper-friend-of-consumer recommendation system to come along and caress my prefrontal cortex into submission with all the finesse of a day-one DLC offering.

[1] Eurogamer has dropped review scores - Eurogamer - Oli Welsh - Feb 10, 2015
http://www.eurogamer.net/articles/2015-02-10-eurogamer-has-dropped-review-scores
The article lays out Eurogamer's rationale for dropping the 10-point scoring system in favor of a 3-level 'Essential / Recommended / Avoid' system. Says Oli Welsh: "This hasn't been the first time we've discussed dropping review scores. In the past, the case we've made for it internally has always been that a number is a very reductive way to represent a nuanced, subjective opinion, and that the arguments started by scores aren't productive." Welsh goes on to state: "The counter-argument was simple but powerful: as an at-a-glance guide to what we think, scores were very useful to readers. We no longer think that's true. In the present environment, scores are struggling to encompass the issues that are most important to you." In summary: "Scores are failing us, they're failing you, and perhaps most importantly, they are failing to fairly represent the games themselves." For all the talk in the article about how modern games are continuously evolving through patches and feature updates, and how this has made reviewing games more difficult, it's not made clear exactly how the proposed 3-level system is better equipped to handle these challenges. Eurogamer's manifesto becomes even more puzzling when it's revealed that the proposed rating system will only be applied to selected games, and that, barring extraordinary circumstances, they will not update reviews as games evolve over time.
[2] Joystiq isn't scoring reviews anymore, and here's why - Joystiq - Richard Mitchell - Jan 13, 2015
http://www.joystiq.com/2015/01/13/joystiq-isnt-scoring-reviews-anymore-and-heres-why/
Here Joystiq explains their decision to stop scoring reviews. Unfortunately, they did not get the opportunity to gauge the decision's impact as the site was shuttered by AOL just two weeks later. Mitchell states: "The very purpose of a score is to define something entirely nebulous and subjective - fun - as narrowly as possible. The problem is that narrowing down something as broad and fluid as a video game isn't truly useful, especially in today's industry. Between pre-release reviews, post-release patching, online connectivity, server stability and myriad other unforeseeable possibilities, attaching a concrete score to a new game just isn't practical. More importantly, it's not helpful to our readers." Later in the article it is claimed that Joystiq felt compelled to modify their original five star rating system to better align with the typical distribution of scores on Metacritic, an apparent "capitulation to the industry" they weren't happy with. However, as the only changes implemented were to start using half-stars and to add a half-star to a few old scores, it's not entirely convincing that there was ever any serious incompatibility between Joystiq's rating scale and that of the typical reviewer listed on Metacritic.
[3] The Spotty Death and Eternal Life of Gaming Review Scores - Ars Technica - Kyle Orland - Feb 15, 2015
http://arstechnica.com/gaming/2015/02/the-spotty-death-and-eternal-life-of-gaming-review-scores/
The article contains some comments from Jason Schreier of Kotaku regarding his opinion of game review scores. Says Schreier: "When I read through the comments on an IGN review, for example, all I see is people talking about the score ... compare that to, say, comments on an [unscored] review from Kotaku or Rock Paper Shotgun, and it's night and day". Schreier elaborates further: "Scores strip the nuance from video game criticism, forcing reviewers to stuff games into neat little boxes labeled 'good' and 'bad'". Well, it's perhaps not surprising that Schreier believes he and his colleagues at Kotaku are cultivating a superior audience, but it remains unclear as to whether reality subscribes to the same theory. What's considerably more likely is that reality maintains a subscription to the eponymous subreddit known for mocking the outlet's tawdry editorials.
[4] Adam Sessler's rant about Metacritic at GDC 2009 - Youtube video - Simon LeMonkey - Mar 28, 2009
https://www.youtube.com/watch?v=0QsXrswJ-yM#t=25s
This video is of Adam Sessler giving a talk at the Game Developers Conference (GDC) 2009. By way of introduction, Sessler proclaims: "Fuck Metacritic. Who the hell made you king?" Sessler relates an anecdote of a developer he was acquainted with approaching him, upset that Sessler's 2/5 rating for his game had been translated to a score of 40/100 on Metacritic. He later clarifies: "It's just kind of odious when we in the press are seeing our work retranslated and recalibrated ... where we're really not claiming ownership and suddenly there's this number attached to it." At this point, I'm wondering in what alternative universe has a 2/5 rating ever been indicative of anything other than a steamy mound of canine feces? It should go without saying that a straight multiplicative scaling of 5-point or 10-point scores to a 100-point score is the most natural method of conversion. The talk also touches on the issue of publishers compensating developers based on Metacritic scores. Here Sessler displays an astounding proficiency at mental gymnastics when he suggests that professional reviewers should not in any way be held responsible for poor game sales, but that Metacritic (an aggregation of professional reviews) should definitely be held responsible: "You know what? If it's a good game and you know it's a good game, but it doesn't sell well, go talk to your marketing staff, all right? Don't put it on us [game critics]. I'm sorry this has happened to you [game developers]. I want to put a stop to it. Maybe somehow we'll all get together, we'll march down to CNET [Metacritic's owner], we'll flip 'em the bird, and maybe somebody in that building will take a long hard look at something they're putting online that is odious, pernicious and needs to stop now."
[5] Content Patch : January 3rd, 2013 - Ep. 025 [Tomb Raider, Gamestick, Elite] - Youtube video - TotalBiscuit, The Cynical Brit - Jan 3, 2013
https://www.youtube.com/watch?v=mQqzqHgvB90#t=521s
Says TotalBiscuit: "It's no secret that I'm very much a critic of Metacritic. I very much dislike the notion of the website, and I very much dislike what it has done to the industry". The commentary devolves into a meandering rant containing several dubious arguments, such as (1) Metacritic is lambasted for refusing to remove a GameSpot review score after the reviewers later admitted to doing a terrible job, (2) the use of background colors to improve readability of review score text is criticized as pandering to lazy dimwits, (3) Metacritic is accused of operating on a "business model of manipulating ratings" because they dare to translate review scores of 10/10 or A+ to 100/100. The single coherent point of the tirade revolves around Metacritic's lack of transparency in the way critic scores are weighted to determine a game's Metascore, a legitimate concern that has been raised elsewhere. However, it's hard to take this too seriously when the sinister allegations of targeted rating manipulation that follow aren't backed by any evidence or even reasonable suspicion. TotalBiscuit concludes with a strongly worded condemnation: "it just proves once again that Metacritic, in its current form, is a dangerously useless site that is actively stomping around like a bull in a china shop when it comes down to game and media reviews. And its limited usefulness is not enough to counteract the potential damage that Metacritic could actually do to the industry and is continuing to do to this very day."
[6] The basic premise is that a minority of deliberately inflated (or deflated) review scores will not have a great impact on the overall average score. Fortunately, the premise still holds when outliers are merely a result of overzealousness rather than outright dishonesty.
[7] Review Score Guide - The Jimquisition - Jim Sterling - Nov 22, 2014
http://www.thejimquisition.com/review-score-guide/
This guide was posted shortly after Jim Sterling left The Escapist to become an independent game critic funded directly by readers. There isn't anything especially remarkable here, other than the fact that Sterling elects to continue using review scores of his own accord: "Some people don't like review scores, and do not want to see them in reviews - that is okay! Scores here are subtle in their application, casually included at the end of the review, and you can always ignore it if you don't think it's useful. I personally like using scores, and intend to continue doing so until such time as I don't."
[8] The Official Destructoid Review Guide - Destructoid - Chris Carter - June 16, 2011
http://www.destructoid.com/the-official-destructoid-review-guide-2011-203909.phtml
This article explains Destructoid's system of review scoring. Yanier 'Niero' Gonzalez, founder of Destructoid, is quoted as saying: "Ad companies we've worked with have called us crazy for publishing scores. It really is like deciding to go to war. The only reason a site does not publish review scores is to sell more advertising. We have lost ad campaigns because we've given bad review scores, and frankly my dear, I don't give a damn. I'm not compromising our voice. Still, we understand the danger of a bad score. For example, some publishers giving their employees pay cuts due to scores, but in that case we push it back on them. It's not our fault you choose this method to compensate your employees. Grow a backbone, stand behind your work, make better games, and stop blaming the gaming press for having an honest opinion." Blunt and logical. A sincere defense of the review score is a rare sight to behold.
[9] The Spotty Death and Eternal Life of Gaming Review Scores - Ars Technica - Kyle Orland - Feb 15, 2015
http://arstechnica.com/gaming/2015/02/the-spotty-death-and-eternal-life-of-gaming-review-scores/
The article contains some comments from Arthur Gies of Polygon regarding his opinion of game review scores. Says Gies: "The anecdotal accounts and experiences we had suggested that readers want them [scores], whether they admit to it or not", adding that "I think there are people who are interested in arguing numbers and people who are more interested in discussing points raised in review text, and that neither are mutually exclusive." Damn Arthur, could you be any less enthusiastic about it? Not even a respectful nod towards the numeric tramp stamp you brandished to hit back at that pixelated perpetrator of psychosexual trauma known as Bayonetta?
[10] Listening to some of the unabashed arrogance on display from professional games journalists over the past six months, both on forums and in the Twitterverse, one begins to wonder if it isn't in fact their creations we're expected to regard with the sort of reverence usually reserved for fine art. And there's more than a sneaking suspicion that certain critics, unable to conceal a stunning lack of awareness of their own position on the food chain of gaming content production, are deeply resentful of Metacritic for continuing to exist on the back of their hard work. As Escapist reader Thanatos2K aptly puts it: "The real reason reviewers try to argue that "averages are useless" and "averages flatten out stuff" is that they're afraid of what averages really do - render them just a voice in a sea of other reviews, all equal. They don't want to be equal though. They want their review to mean more than everyone else's. They think they know better after all, and they want you to read THEIR review (and please click through to our site while you're at it). When I can see 50 reviews at a glance, their scores, and the average of their scores, I only need to fully read a few reviews to get the gist of why the scores fell where they did. This is poison to reviewer ego."
[11] See reference [8].
[12] Obsidian missed Fallout: New Vegas Metacritic bonus by one point - Joystiq - Ben Gilbert - Mar 15, 2012
http://www.joystiq.com/2012/03/15/obsidian-missed-fallout-new-vegas-metacritic-bonus-by-one-point/
A report on Fallout: New Vegas developer Obsidian missing out on royalties because the game achieved a (critic) Metascore of 84 instead of 85 or higher. This actually seems like a generous score considering the reviews of the PC version make universal reference to a buggy experience and gameplay all too similar to its predecessor. Curiously, even though Joystiq themselves only awarded the Xbox 360 version of the game a score of 70 (tied for 4th lowest out of 81 ratings), they can't resist taking a swing at Metacritic for giving smaller outlets a seat at the table: "Leaving aside the fact that Metacritic is a woefully unbalanced aggregation of review scores from both vetted and unvetted publications, agreements like this can leave indie studios -- like Obsidian -- in the lurch should that Metacritic score just barely miss the mark." Sorry Ben Gilbert, but the gaseous emissions of major gaming publications aren't quite as fragrant as you seem to think.
[13] Video Game Reviews: A Discussion Of The Ten-Point Scale And Inflated Scores - Forbes - Erik Kain - June 14, 2013
http://www.forbes.com/sites/erikkain/2013/06/14/how-video-game-reviews-work/
In this article, Erik Kain begins by explaining how the compression of school grades into the high end of the percentage scale is mirrored by video game review scores: "First of all, the 10-point scale is deceptive. Here's what I mean by that: First, take the numbers 1-10 and graft them over to the traditional letter-grading we use at school. There are just five letter grades [which] translate roughly to A = 90-100%, B = 80-89%, C = 70-79%, D = 60-69%, F = 00-59%. Only truly awful grades would get an F even though F comprises 59% of the total scale ... the same is true with video game reviews. Only truly awful games are given an F while most games fall somewhere between a 7 and a 9." Later Kain reveals his personal preference of a 3-tier rating system ("Buy / Hold / Sell"), stating: "To me, only two scores count: ones above 9 and ones below 7. This indicates something that might be special on the one hand, and something that might be truly terrible on the other or at the very least not worth buying. Everything else just means it's okay-to-good with a margin of error based on personal taste."
[14] Review Score Guide - The Jimquisition - Jim Sterling - Nov 22, 2014
http://www.thejimquisition.com/review-score-guide/
Here Jim Sterling advocates the full range of the 10-point scale: "In my prior work at Destructoid, I always aimed to use the full ten-point scale, rather than simply the higher end of it. There's a popular belief that reviews are rated from 7-10 by major outlets, instead of 1-10, and while that's an exaggeration, I certainly feel more publications could stand to utilize all the numbers a bit more readily." Sterling also recommends the use of half-points on the 10-point scale, stating "[it's] useful to have that bit of wiggle room". The post goes on to describe the 10-point system employed by The Jimquisition.
[15] #GamerGate Wants Objective Video Game Reviews: What Would Roger Ebert Do? - Forbes - Erik Kain - Dec 28, 2014
http://www.forbes.com/sites/erikkain/2014/12/28/gamergate-wants-objective-video-game-reviews-what-would-roger-ebert-do/
After dismissing various complaints made by Gamergate as "paranoid" and "silly", Kain moves on to the general topic of objectivity in game reviews, saying: "readers need to accept that each critic will weight his or her review differently, and that the search for the 'objective' reviewer is futile. A reviewer who ignores politics or gender issues in their review entirely is simply biased in another direction. Balance is crucial." I'd argue that leaving politics and gender issues out of reviews entirely is a far cry from hamfisting them to the forefront in fictional works where they aren't remotely a main theme. Citing famed movie critic Roger Ebert's review of The Last Boy Scout as the epitome of balance, Kain states: "Part of the problem may be our scoring system for video games. There's something about the four-star system that's simpler and more honest than a ten point scale. Gone are the weird decimals. Gone is the tendency to weight scores toward the upper end of the scale. A great movie or game simply gets four stars. A good movie or game gets three. A mediocre movie or game gets two. And a bad movie or game gets one. It's nice and tidy, and it allows reviewers to give a 'good' review score to a good game while still criticizing its less savory aspects, much as Ebert does with The Last Boy Scout." I find this reasoning extremely flimsy. The 10-point scale allows equivalent penalties to be levied for "less savory aspects" while affording greater flexibility in just how large that penalty should be. Flexibility ought to be a welcome ally to the "it's just a subjective opinion, no worse than any other" crowd, a group that many game critics count themselves as proud members of and which Kain seems intent on joining.
[16] Jeff Gerstmann Explains His Departure From Gamespot - The Escapist - Earnest 'Nex' Cavalli - Mar 15, 2012
http://www.escapistmagazine.com/news/view/116360-Jeff-Gerstmann-Explains-His-Departure-From-Gamespot
A synopsis of the infamous dismissal of Jeff Gerstmann from GameSpot in 2007 and the subsequent formation of Giant Bomb. The conflict between Gerstmann and GameSpot management arose primarily over a low review score he awarded to Kane & Lynch: Dead Men, a game that was being heavily advertised on the site. Once the details of the affair became known, Gerstmann was lauded for refusing to cave to pressure from advertisers and he became something of a symbol of ethical games journalism. Ironically, Giant Bomb was later sold to CBS Interactive, the parent company of GameSpot. More recently, Gerstmann's spotless reputation has been called into question for admitting indie game marketing baronesses onto Giant Bomb to hawk their wares under the pretext of 'Top 10 Games' lists.
[17] A collection of data for the analysis of video game ratings - Blog post - Slandered Gamer - Dec 30, 2014
http://slanderedgamer.blogspot.com/2014/12/a-collection-of-data-for-analysis-of.html
This blog post details a software application for downloading and viewing Metacritic game reviews. It also provides a sizeable collection of review data. The collection of data used in this article includes all console and PC games reviewed by either Polygon, Joystiq, Giant Bomb or The Escapist from January 2010 to mid-December 2014. Mobile and handheld titles were excluded completely. Why does this matter? Well, I suspect this selection of games slightly inflates the score statistics of other publications, i.e., anyone who isn't Polygon, Joystiq, Giant Bomb or The Escapist, by discounting some of the lesser known (and lower scoring) titles they've reviewed. This is because game reviewers tend to cover all the same major titles while randomly picking and choosing among minor titles with less overlap. However, disparities between statistics given here and those listed on Metacritic - typically 1 to 5 points in average score, for example - are also influenced by the exclusion of mobile and handheld games as well as any games released prior to 2010. These exclusions were viewed as desirable in order to obtain a selection of games that is both relevant and recent.
[18] See reference [17].
If you've never heard of a cumulative distribution function (CDF) before, here is a brief explanation sufficient for our purposes. I have a bunch of review scores for different games, say {55, 70, 75, 90, 100}. If you name a particular score threshold you're interested in, I can calculate what fraction of my scores are less than or equal to that threshold. For example, if you say 70, I count 2 of my 5 scores that are less than or equal to 70, which gives a cumulative probability of 2/5 = 0.4. If you then ask about 85, I find that 3 of my 5 scores are less than or equal to 85, giving a cumulative probability of 3/5 = 0.6. The nice thing about this system is that it allows us to efficiently summarize thousands of scores by calculating the cumulative probability at a small number of preselected thresholds, for example at 30, 40, 50, 60, 70, 80, 90, 100 like in the article.
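The worked example above translates directly into code. A minimal sketch in Python, using the same five scores:

```python
def cumulative_probability(scores, threshold):
    """Fraction of scores less than or equal to the threshold."""
    return sum(1 for s in scores if s <= threshold) / len(scores)

scores = [55, 70, 75, 90, 100]
print(cumulative_probability(scores, 70))   # 0.4
print(cumulative_probability(scores, 85))   # 0.6

# Summarize at the same thresholds used in the article's table.
thresholds = [30, 40, 50, 60, 70, 80, 90, 100]
summary = {t: cumulative_probability(scores, t) for t in thresholds}
print(summary)
```

The same handful of numbers summarizes a critic's entire scoring history, which is exactly how the table in the article was built.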
[20] Downfall of Gaming Journalism #9: GAME INFORMER - Youtube video - The Rageaholic (RazorFist) - Feb 15, 2015
https://www.youtube.com/watch?v=pss0hJkmLBA
I doubt the creator of this video would think much of the current article as he unequivocally condemns video game rating inflation instead of seeking rationale for it. But if we agree on one thing, it's who to point the finger at when the worst offenders are lined up. Says RazorFist: "Polygon, Kotaku and Rock Paper Shotgun didn't just wake up one day and decide "hey, let's be a cunt lapping cabal of bought out bitches." ... Long before there were URLs and mailing lists, there were SKUs and mailing lists. Print motherfucking journalism, folks. And no institution is more steeped in, or emblematic of, the omnipresent orgy of corruption in gaming journalism than fucking Game Informer." Among various bombs unloaded during this blistering rant, an interesting theory is put forward concerning a tipping point in the history of game review scoring: "I'm of the opinion that Game Informer almost single-handedly skewed review scores across all websites and publications for all time ... I hold an issue [of Game Informer] in my hand from 2009 - far from a banner year for gaming - that in 25 reviews boasts not one that ranks below a 5.5 [out of 10]". RazorFist later adds: "A bad game isn't a 6 or a 7 you colluding cunt flaps [reviewers], a bad game is a 1 or a 2.", going on to enumerate the many faults of the December 2009 edition of Game Informer magazine.
[21] Video Game Reviews: A Discussion Of The Ten-Point Scale And Inflated Scores - Forbes - Erik Kain - June 14, 2013
http://www.forbes.com/sites/erikkain/2013/06/14/how-video-game-reviews-work/
Returning to this article, I draw attention to a comment in which Erik Kain volunteers the following: "For instance, I tend to review games I want to play and play games that I think I will enjoy. So my scores tend to be a bit high, often hovering between 7 and 9 [out of 10] and rarely dropping to a 6 or below." Perhaps without fully appreciating it, Kain has supplied a partial answer to his own inquiries into why video game review scores are clustered in the top half of the scale. It isn't so much that reviewers only play games they think they'll enjoy, it's that they don't waste time formally reviewing games that their audiences won't know or care about. Pageviews pay the bills. Niche games don't attract nearly as many eyeballs as promoted titles, and they also happen to fall on the low end of the quality spectrum with greater regularity.
[22] The linear function is y = ax + b where y denotes score, x is the logarithm of the number of critic reviews, and a, b are the fitted parameters. The power function is y = cx^a + b where a, b, c are parameters. Whereas the linear function indicates an increase in average score from 66.5 to 78.2 (+11.7 points) as the number of reviews increases from 4 to 100, the power function produces a score increase from 65.6 to 76.0 (+10.4 points) over the same range. Some might consider 4 review scores a precariously low sample size, and you could certainly make the argument that it isn't in keeping with the perception of a Metascore as a consensus among a significant number of critics. To address this problem, I recomputed the best fits after discarding all observations with fewer than 10 critic reviews. The results showed a slightly stronger trend: going from 10 to 100 critic reviews yielded +12.3 points for the linear function and +10.6 points for the power function. Let's put things into perspective though - the data exhibits wide deviation around the fitted curves. The correlation coefficient between x and y is just 0.22 (or 0.24 if the 10 review minimum is enforced), which is a statistically significant positive correlation but by no means a dominant one. This is a good sign because it suggests that other factors (such as quality) are more closely connected to a game's overall score than the number of critics who deem it worthy of attention. Coming back around to the original point, in my opinion these results make a reasonable case for the aforementioned selection bias among games reviewers. But I imagine that one of the internet's myriad causation experts will be along to correct me shortly, arguing that all this really proves is professional games reviewers are under the influence of marketing hype if not outright bribery.
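For the curious, fits like these can be reproduced with standard least-squares tooling. The sketch below uses synthetic, noise-free data constructed to match the fitted values quoted above (66.5 at 4 reviews, 78.2 at 100), since the underlying review data isn't reproduced here:

```python
import numpy as np
from scipy.optimize import curve_fit

# y = a*x + b, where x is the base-10 logarithm of the critic review count.
def linear(x, a, b):
    return a * x + b

# Synthetic stand-in data (slope/intercept chosen to match the quoted fit).
n_reviews = np.arange(4, 101)
x = np.log10(n_reviews)
y = 8.37 * x + 61.5

(a, b), _ = curve_fit(linear, x, y)
lo, hi = linear(np.log10(4), a, b), linear(np.log10(100), a, b)
print(f"{lo:.1f} -> {hi:.1f} (+{hi - lo:.1f} points)")  # roughly 66.5 -> 78.2
```

The power-law variant works the same way, just with a three-parameter model function passed to curve_fit.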
[23] My argument is straightforward: games that manage to succeed at the fundamentals - competent modelling and animation, functional game mechanics, controls that aren't unintuitive or horribly laggy, free of game-breaking bugs, price point in line with quality and quantity of gameplay - have done enough things correctly to warrant a score in the neighborhood of 50%. Even if the overall package isn't so attractive, I don't see a compelling reason to assign 1/5 or 3/10 just to satisfy some contrived quota of low scores to prove how 'honest' your opinions are. There's a useful distinction to be made between 'difficult to recommend' and 'complete and utter trash'. There isn't anything wrong with a scoring standard where 50% is regarded as a marker for acceptable product quality, as opposed to being treated as a target average score for the particular subset of (mostly high-end) products appraised by a reviewer. My general feeling is that game review scores are somewhat inflated at the moment, but perhaps only by 1 point out of 10 for most publications rather than the 2-2.5 points deduced by comparing current scoring averages to a 5/10 target. People who bleat about "average game quality" are invariably full of shit because they never elaborate on their personal selection criteria from which that nebulous average is derived.
[24] A Review Scoring System That Would Work - The Escapist - Ben 'Yahtzee' Croshaw - Feb 17, 2015
http://www.escapistmagazine.com/articles/view/video-games/columns/extra-punctuation/12989-Video-Game-Review-Scores-A-System-That-Would-Work
In this article, Ben Croshaw attempts to demonstrate the futility of game review scoring by emphasizing the subjective experience of the player. The introduction offers some commentary on the recent changes to Eurogamer's scoring system: "what Eurogamer doesn't seem to have realized is that it's not cured, it's just dropping from heroin to methadone. In the article, they state that they are switching from scores to awards. So rather than ditching scores entirely, they've merely switched from a 10-point system to a 4-point one. Essential, Recommended, Avoid, and presumably one that goes between the last two indicated by there being no award at all." Later, Croshaw argues that the multifaceted nature of games means they can't be adequately characterized by anything less than a detailed account of the individual experiences of 25 players, to be collected through a series of questionnaires filled out at regular intervals during the playing session. The reader would then decide which playtester(s) to heed based on the results of personality tests. Just when you're beginning to appreciate the article as a satirical take on the sort of game review system that might be devised by a drug-addled social sciences dropout from San Francisco, you're reminded that the author is trying to make a serious point against quantitative scoring. However, rather than supporting its conclusion of "perhaps alternatively you could just [ignore scores and] read the cunting review", the main text of the article contradicts it by showing exactly why no single reviewer ought to be trusted, least of all one who neglects to divulge the results of a mental status examination taken at the time of writing. Instead, the savvy game consumer must seek counsel from some 25 reviewers at minimum, which can be interpreted as a tacit endorsement of the following strategy: browse the aggregated review blurbs on Metacritic until you find something that resonates.
I always suspected a measure of grudging support for review aggregation lurking beneath the tough guy facade of the games press.

Holy crap, looks like you put a lot of work into this.

Personally, I have never been a fan of review scores. When I worked for my college paper I included them because that was the paper's policy, but once I began writing reviews on my own time I opted not to use them. I don't like them for a number of reasons you could probably guess right off the top of your head, but I am not wholly against their use when done right.

Though the sample size is rather narrow for the strict 4 to 5 star rating system, I am surprised to see it fall so neatly in line with those that use broader systems. I've always thought that using a tighter rating system would make the scoring far less arbitrary and result in more honest scoring. Just look at GameInformer if you want to see the most arbitrary review scores of all time (what the fuck differentiates 97.3% from 97.5%?!).

I kind of think the age of review scores is beginning to wane, however. More and more people are beginning to get their information from previously unconventional sources (mostly Youtube) and many more are suckers for pre-order culture, both of which circumvent the review process entirely. Outside of parents looking for Christmas gifts for their kids, I don't see it as a very useful consumer tool these days. All it really seems to do is give fanboys something to rage about.

OP, I can tell you put a lot of work into this. I have to apologise and admit that I only read about half of it.

Anyway, pretty much the only time I give scores credence is when they're universally bad. If a game has a 20 on Metacritic it's a game I'll be avoiding. But if a game gets a 95 I'm still not going to buy it without doing research. And that's why I don't find scores useful, because I do research before buying a game (unless it's one of my many impulse buys, but I only do that with cheap games).

Here's what happens (or what would happen if I read reviews any more): a critic gives a game a number, let's say 7. What does that 7 tell me? Almost nothing. I need to know what was good and what was bad. Was the game worth a 9 but a couple of points got knocked off because multiplayer was bad? Fine by me, I almost always play single player games anyway. Or did they give it a 7 because the story was really good but the gameplay was bland? If that's the case I'll probably avoid it.

I find out why they gave it a 7 by reading the review. But now that I've read the review I don't need the number. I have all of the information, so the score of 7 is irrelevant.

After saying all that, I'm not going to buy a game off one reviewer's recommendation anyway. My process usually involves skimming a bunch of reviews and forums for 10 minutes or so to get a general consensus of the good and bad points, then I'll watch a couple of minutes of footage on youtube the get a proper idea on how the game plays.

Very impressive post, OP!

From my experience as a reviewer it's spot-on. Especially this:

First of all, there is a positive bias with respect to what quality of game even registers as a blip on the radar of reviewers. If it isn't a big studio release backed by marketing or an indie title blessed by IGF or IndieCade, it generally doesn't receive a mention let alone a dedicated review. There isn't necessarily anything insidious about this state of affairs; I'd wager that at least a few critics regularly sample low budget offerings only to be reminded of why they don't more often. Mind you, I don't assert that marketing buzz is an accurate predictor of game quality, only that the subset of games with enough traction to garner attention from reviewers is, statistically speaking, of above average quality.

Many publications are running on low budgets and rely on review copies from publishers to fill their (web)pages. While publishers of 'lesser' games do send in copies and keys (a Nintendo publication I worked at kept getting Barbie and other kids games even though we rarely reviewed them), it's unlikely their games will be picked up for review. Publications have a limited budget and/or page count for reviews and tend to pick games readers are already interested in, which are games being developed by competent, established studios. Once in a while they'll throw in a really bad game for laughs/as a reminder of what really bad actually is. Other spots are filled with games reviewers bought themselves and wanted to share with their readers.

As for whether or not we should do away with review scores... I think they can have a place in review systems, as long as we use them responsibly:

1. Tight scales are a must. There's no good reason to say a game with an 85% rating is better than a game with an 84% rating. It's better to go with something like a five-star system and see the scores (1-2-3-4-5) as categories of quality rather than points on some arbitrary scale to rank games by.

2. Publications must explain what the scores mean. A 9 from Edge is different from a 9 from IGN, so it's important to provide context.

3. Readers must be reminded that a review is simply one opinion from one person given at one time. The number that accompanies the review is not meant as an absolute or everlasting judgment; it's just the reviewer's thoughts on the game's quality in the form of a number.

Fappy:
Though the sample size is rather narrow for the strict 4 to 5 star rating system, I am surprised to see it fall so neatly in line with those that use broader systems. I've always thought that using a tighter rating system would make the scoring far less arbitrary and result in more honest scoring.

Movie reviewers are notorious sticklers for a strict 4-star system, although it's far from universal. As the average movie rating on Metacritic is 62 - roughly 10 points lower than music and games - there might be some substance to the 'tight scale' argument. It isn't the magical cure-all that some would have us believe, though.

Fappy:
Just look at GameInformer if you want to see the most arbitrary review scores of all time (what the fuck differentiates 97.3% from 97.5%?!).

Yea, about those decimals. I've heard the "how is there any tangible difference between a score of 76 or 77?!?" trotted out many times as a way to make the 100-point scale seem ridiculous. Obviously, the same thing applies to differences of 0.1 on the 10-point scale. I find this argument completely dishonest.

First, consider just how you might end up with decimals in any number of perfectly valid scoring systems. Let's say you devise a very simple system in which the final score is determined as follows: 20% for Aesthetics, 30% for Performance, 50% for Gameplay. Each of these three individual categories are scored on a strict 5-point system, say for example 2/5, 4/5, and 3/5. The final score out of 100 then becomes:

Score = 20*2/5 + 30*4/5 + 50*3/5 = 62

Now, you might argue that this number ought to be rounded down to 60. Otherwise we're perceived as claiming a level of precision that just isn't possible (or meaningful). OK, but eventually you'll end up reviewing two games that score at 64 and 65. The first rounds down to 60 and the second rounds up to 70, presenting the illusion of a meaningful quality difference between two games that were actually just 1 point apart.
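To make the arithmetic explicit, here's a minimal sketch of the hypothetical category system above (function names are mine):

```python
def weighted_score(categories, weights):
    """Combine 5-point category scores into a 100-point score.
    Weights are percentages summing to 100."""
    return sum(w * c / 5 for w, c in zip(weights, categories))

def round_to_ten(score):
    """Round half up to the nearest multiple of 10."""
    return int((score + 5) // 10) * 10

# Aesthetics 2/5 at 20%, Performance 4/5 at 30%, Gameplay 3/5 at 50%:
weighted_score([2, 4, 3], [20, 30, 50])   # -> 62.0

# Rounding turns a 1-point gap into an apparent 10-point gap:
round_to_ten(64), round_to_ten(65)        # -> (60, 70)
```

(As an aside, Python's built-in round() uses banker's rounding, so round(65, -1) gives 60; the explicit helper avoids that surprise.)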

In my view, single points on the 100-point scale are fine. Leave them in instead of rounding. The people who shout about how ridiculous they are only make themselves look ignorant.

Fappy:
I kind of think the age of review scores is beginning to wane, however. More and more people are beginning to get their information from previously unconventional sources (mostly Youtube) and many more are suckers for pre-order culture, both of which circumvent the review process entirely. Outside of parents looking for Christmas gifts for their kids, I don't see it as a very useful consumer tool these days. All it really seems to do is give fanboys something to rage about.

Possibly. I still enjoy written reviews even though I rarely base purchasing decisions on them. The same can be said of formal video reviews on youtube. When I want to see how a game actually plays, usually I'll just hop on twitch for an hour.

I don't have a problem rating ANYTHING based on a number scale, I think it's very helpful in fact. It allows the person reading/watching/hearing the review to get an idea of how much more or less a reviewer liked something over something else. I understand you can't 100% accurately equate a person's feeling to a number but that doesn't mean rating something on a scale is useless either.

The following is the REAL problem with game reviews:

StreamerDarkly:
There's no denying the fact that video game ratings lean heavily towards the upper end of the scale. You need only browse the publication lists on Metacritic to discover an industry-wide scoring average currently sitting at 74, a figure that the vast majority of individual publications fall within +/-10 points of. Compare that to an overall average score of 62 for movies, 65 for television and 72 for music.

To gain a better appreciation of the status quo, the table below contains a statistical summary of review scores from some prominent gaming critics for a large selection of console and PC titles. Average scores are in the range of 68-78. The cumulative probability data conveys the distribution of each critic's review scores across the scale. Some useful observations can be made. For example, the probability of scoring a game at 70 or lower is between 0.33 and 0.52 for most publications, but it goes as low as 0.24 (Game Informer and GameTrailers) and as high as 0.65 (Edge Magazine). The probability of scoring at 50 or below is typically between 0.09 and 0.17, but this can be as low as 0.05 in the most extreme case. Perhaps most alarming is the inclination of certain critics to use the 81-100 region of the scale for half of all games they rate (take 1.0 minus the cumulative probability value in the 80 column), whereas most gamers would agree that 81-100 territory should be reserved for truly top notch efforts. All told, the results serve as further confirmation that nearly all of the action takes place in the top half of the scale.

                                 Cumulative probability at score:
Critic           Samples  Average    30    40    50    60    70    80    90   100
Destructoid          747       74  0.04  0.08  0.13  0.20  0.37  0.63  0.90  1.00
Edge Magazine        548       68  0.04  0.08  0.19  0.40  0.65  0.88  0.99  1.00
Eurogamer            767       71  0.04  0.09  0.16  0.29  0.52  0.79  0.96  1.00
Game Informer       1090       78  0.01  0.02  0.06  0.13  0.24  0.54  0.89  1.00
GameSpot            1334       72  0.01  0.06  0.12  0.24  0.45  0.79  0.99  1.00
GamesRadar           746       73  0.03  0.06  0.14  0.26  0.47  0.76  0.95  1.00
GameTrailers         641       78  0.00  0.01  0.05  0.11  0.24  0.47  0.88  1.00
Giant Bomb           506       72  0.02  0.13  0.13  0.39  0.39  0.87  0.87  1.00
IGN                 1476       75  0.02  0.04  0.09  0.19  0.33  0.59  0.92  1.00
Joystiq              638       73  0.03  0.10  0.17  0.29  0.45  0.73  0.92  1.00
PC Gamer             347       74  0.01  0.03  0.09  0.18  0.33  0.59  0.94  1.00
Polygon              386       71  0.04  0.08  0.17  0.27  0.48  0.72  0.95  1.00
The Escapist         488       74  0.02  0.07  0.12  0.28  0.43  0.77  0.91  1.00
VideoGamer           672       73  0.03  0.06  0.12  0.23  0.48  0.78  0.97  1.00

It's funny that you say there's no denying the fact that game ratings lean heavily towards good scores because I've been saying that for what feels like forever and everyone here just disagrees with me. I didn't feel the need to do a data analysis on it because it's so fucking obvious. The problem isn't just the higher review scores than other mediums, it's that the vast majority of a game's score will all fall into a very narrow range like say between 8.5-9.5. You can't be serious that almost every game critic feels game XYZ is of basically the same exact quality. It's unheard of in any other medium for a work of art to score so high like say a GTAV, TLOU, etc. Even Best Picture winning movies don't come close to the kind of aggregate scores GOTY titles score. You start a thread here on this very site about The Witcher, Skyrim, GTAV, Mass Effect, etc. and you'll find lots of people genuinely rating these very highly rated games all over the 1-10 scale. Just read through this thread here about Dragon Age Inquisition and you'll find many very reasoned and valid opinions of the game, from the game being bad to great. You'll get way way more information about DA:I just reading a page of that thread vs reading every professional DA:I review. But that doesn't happen with game critics; most games are rated basically the same score across 50+ reviewers, and there's something wrong with that.

How is there only 1 negative review of FFXIII for example? I haven't even played the game but I do know that FFXIII is very much a love/hate it game yet the "professional" reviews are completely different. This is exactly why I think gamers have so many issues with game reviews because there is really no way to even find your "go-to critic" because if you didn't like FFXIII (or GTA or TLOU or Skyrim), there's very very few critics who gave those games a mixed review let alone a negative review. How are you supposed to find a critic you align with? I don't have a problem finding a movie critic that aligns with my tastes. I stopped even looking at professional reviews a long time ago already because they are rather worthless outside of a select few critics like Jim Sterling or Yahtzee (and he doesn't even do real reviews) that actually criticize games. Reading game reviews is basically just reading how awesome a game is.

I think the core issue that has caused this is game reviewers think (unconsciously at this point) they are supposed to objectively rate a game. Say with FFXIII (again), if a reviewer disliked the battle system but felt the battle system worked, he/she would count that as a positive instead of a negative because the battle system is objectively good and functional regardless of how much or little they liked it. This kind of stuff even goes into how a reviewer rates a game's story and characters. Heavy Rain got 99 positive reviews out of 107 reviews. The enjoyment of the game completely hinges on how much you enjoy the characters and story (exactly like a movie); you're going to tell me 99 out of 107 people thought the writing was 7/10 good or better? If Heavy Rain were a movie, it so wouldn't have averaged an 87. Writing is so bad in gaming yet game reviewers hardly ever mark games down for bad writing (even in story-focused games).

Lastly, 5/10 is fucking average, not 7/10.

Phoenixmgs:
It's funny that you say there's no denying the fact that game ratings lean heavily towards good scores because I've been saying that for what feels like forever and everyone here just disagrees with me.

Really? All I've seen is near universal agreement on this point.

Phoenixmgs:
I didn't feel the need to do a data analysis on it because it's so fucking obvious.

Yes, but the idea was to take a closer look at the severity of the issue, and also to explore the causes of it. The world is full of self-proclaimed know-it-alls who don't think anything is worth doing because they already have the answers. What you often find is that the answers they're most firmly in possession of are to questions like "How much does the double cheeseburger combo cost?" or "What is the fare to Sixth and Main?".

Phoenixmgs:
The problem isn't just the higher review scores than other mediums, it's that the vast majority of a game's score will all fall into a very narrow range like say between 8.5-9.5.

At this point I'm beginning to think you didn't take the time to read the table. Because it certainly doesn't support the claim of a vast majority of scores being between 8.5 and 9.5. If it seems that way, you're probably only reading reviews of the top tier games.

Phoenixmgs:
You can't be serious that almost every game critic feels game XYZ is of basically the same exact quality. It's unheard of in any other medium for a work of art to score so high like say a GTAV, TLOU, etc. Even Best Picture winning movies don't come close to the kind of aggregate scores GOTY titles score.

The listings of top rated movies on Metacritic say otherwise.

Phoenixmgs:
You'll get way way more information about DA:I just reading a page of that thread vs reading every professional DA:I review.

No argument here. The people who know the most about the game are the ones who play it for more than 5 hours. And I agree, random forum posts can be very insightful at times.

Phoenixmgs:
This is exactly why I think gamers have so many issues with game reviews because there is really no way to even find your "go-to critic" because if you didn't like FFXIII (or GTA or TLOU or Skyrim), there's very very few critics who gave those games a mixed review let alone a negative review. How are you supposed to find a critic you align with? I don't have a problem finding a movie critic that aligns with my tastes. I stopped even looking at professional reviews a long time ago already because they are rather worthless outside of a select few critics like Jim Sterling or Yahtzee (and he doesn't even do real reviews) that actually criticize games. Reading game reviews is basically just reading how awesome a game is.

Why do you need a go-to critic? I understand that natural preferences will emerge based on the presentation style and tastes of individual critics, but listening to a single authority all the time just leads to gradual brainwashing.

Phoenixmgs:
Lastly, 5/10 is fucking average, not 7/10.

Well, I can't force anyone to read the points made in the article.

SmallHatLogan:

Here's what happens (or what would happen if I read reviews any more): a critic gives a game a number, let's say 7. What does that 7 tell me? Almost nothing. I need to know what was good and what was bad. Was the game worth a 9 but a couple of points got knocked off because multiplayer was bad? Fine by me, I almost always play single player games anyway. Or did they give it a 7 because the story was really good but the gameplay was bland? If that's the case I'll probably avoid it.

I find out why they gave it a 7 by reading the review. But now that I've read the review I don't need the number. I have all of the information, so the score of 7 is irrelevant.

I agree with most of what you say here. Realize though that no one is asking for scores to replace the review text. The score is only ever intended to be part of the review.

The common argument you hear from game critics who dislike scores is that readers end up fixating on the number and arguing endlessly about it, thus paying less attention to the review text.

My opinion is that the scores serve as a useful clarification of the reviewer's overall opinion. The summaries of good and bad parts of a game often feel like quota driven checklists - you need to add a criticism or two, no matter how trivial, just to balance out the effusive praise. A score is a concrete object that can more easily be identified as bullshit than some of the vague and weasel-worded statements found in reviews.

Honestly, I am very much opposed to number scores. Firstly, they are a completely arbitrary unit for measuring something far more complex. Secondly, idiots use them as a platform to shout down other people, trotting out useless facts with no context because a game got a 7 so it's the worst thing ever while their precious 9 is 'the bestest'. Thirdly, those same numbers have been used to negatively affect people within the industry, and the sooner Metacritic dies out the better [I am not a fan of a review aggregate site changing the final total on a whim just because they can] - especially since, as you say, a 4/5 does not equal 80/100, and with no standardized review scale the whole thing is meaningless. Fourthly, because the scale is so meaningless, a 7 is now set as average. And lastly, as Yahtzee put it, I will never believe that something as complex as critique can somehow be boiled down to a single defining number; I would rather people spent more time actually understanding a review than being so lazy as to just stare at an arbitrary number.

StreamerDarkly:

Yea, about those decimals. I've heard the "how is there any tangible difference between a score of 76 or 77?!?" trotted out many times as a way to make the 100-point scale seem ridiculous. Obviously, the same thing applies to differences of 0.1 on the 10-point scale. I find this argument completely dishonest.

First, consider just how you might end up with decimals in any number of perfectly valid scoring systems. Let's say you devise a very simple system in which the final score is determined as follows: 20% for Aesthetics, 30% for Performance, 50% for Gameplay. Each of these three individual categories are scored on a strict 5-point system, say for example 2/5, 4/5, and 3/5. The final score out of 100 then becomes:

Score = 20*2/5 + 30*4/5 + 50*3/5 = 62

Please, please, please, don't do that! You're not rating a maths test but entertainment or even art. There are good games in which the gameplay takes a backseat to the storyline, or where very simple aesthetics are used to go for a retro atmosphere. There are complex games in which a lot of things can go wrong, and there is simple stuff like Tetris.

I'm against numerical scores for a very simple reason.

It facilitates laziness. That is, it permits readers to forego reading the review itself to immediately form an opinion of the product based on the abstract measure of (Subjective) quality that is the numerical score.

Anything that stops people from actually reading and understanding is negative.

Has The Escapist even had review scores for 5 years?

I know they were famously one of the few places that didn't and insisted they shouldn't use them.
At some point, overwhelming pressure from... Presumably the readers (we hope), led them to change their mind.

Honestly, I don't really know if review scores are a good thing or a bad thing. It's just one of those things you get used to as some kind of summary of the text of a review...

NPC009:

StreamerDarkly:
Score = 20*2/5 + 30*4/5 + 50*3/5 = 62

Please, please, please, don't do that! You're not rating a maths test but entertainment or even art. There are good games in which the gameplay takes a backseat to the storyline, or where very simple aesthetics are used to go for a retro atmosphere. There are complex games in which a lot of things can go wrong, and there is simple stuff like Tetris.

You don't like my shiny new scoring system?!? Admittedly, it's a bit rough around the edges, but I believe the next iteration will take the world of games reviewing by storm. Still waiting to hear back from the Patent Office.

It was just an example for the sake of argument. I absolutely agree that the set of criteria changes depending on whether you're rating a story-driven game, a platformer, a twitch shooter, etc.

But listen, the main point is that using a formula is a perfectly reasonable way to operate. Reviewers already do this, it's just a sort of mental equation they keep in their heads instead of writing out on paper. The pros and cons must be weighed. Doing so in a more transparent fashion might actually introduce a bit of much needed discipline into review scoring. "No no no" most critics would say to that - "I like the freedom to pull a number out of my ass that is unencumbered by what I wrote in the review body ... formulas are cold and impersonal". And this is the point where you get the feeling the critic actually views himself as an artist on the same plane as the creator of the work.

In my view, the catch-all defense of "everything is subjective" has been stretched beyond all reason. Critics are almost celebrated for their inconsistencies in appraising games. If it were possible to conduct the following experiment - obtain 3 completely independent reviews of the same game by the same reviewer - some readers would be fucking applauding the fact that the experiment produced 3 very different scores.

Here's the thing - Everything is subjective. The purpose of a review is to determine if the subjective impression of the reviewer coincides with yours.

There is no formula. It's about what the reviewer liked or didn't like and whether the reader likes or dislikes the same things. Numerical abstractions of subjective impressions need not apply.

Is it technically inept? Then write so. Is it equal parts entertaining and frustrating? Then write so. Is there one thing in it that elevates its overall quality past all its other elements? Then write so. Is there a single thing that brings down the quality of the entire experience? Then write so.

Then the person interested in the game can READ what is written, to understand why this game is or isn't a good fit for themselves.

StreamerDarkly:

Phoenixmgs:
The problem isn't just the higher review scores than other mediums, it's that the vast majority of a game's scores will fall into a very narrow range, say between 8.5 and 9.5.

At this point I'm beginning to think you didn't take the time to read the table. Because it certainly doesn't support the claim of a vast majority of scores being between 8.5 and 9.5. If it seems that way, you're probably only reading reviews of the top tier games.

I understand the table; I was going on about other things that start from the table. The fact that 7/10 is average is a problem. It causes game scores to bunch together, because reviewers effectively only have 7-10 to rate a game if they found it anywhere from average to great.

There are only 2 Uncharted 2 reviews that fall outside the 9.0-10 range. No other medium sees reviewers in that much agreement over works of art.

Phoenixmgs:
You can't be serious that almost every game critic feels game XYZ is of basically the same exact quality. It's unheard of in any other medium for a work of art to score so high, like say GTAV, TLOU, etc. Even Best Picture-winning movies don't come close to the kind of aggregate scores GOTY titles get.

The listings of top rated movies on Metacritic say otherwise.

I don't use Metacritic for movies because a game there gets more reviews than a movie, even though there are more movie reviewers than game reviewers; that doesn't make sense. On RottenTomatoes, Birdman is averaging an 8.5 (Best Picture 2014) and 12 Years a Slave is averaging a 9/10 (Best Picture 2013). Whereas games can see average scores as high as 97/100.

Why do you need a go-to critic? I understand that natural preferences will emerge based on the presentation style and preferences of individual critics, but listening to a single authority all the time just leads to gradual brainwashing.

You never agree completely with one person. But it is nice to have a go-to critic (or a couple) to see if you should go watch that movie or play that game when it comes out, or wait till later to check it out. I'm waiting to see if there's going to be a director's cut of Jupiter Ascending released on Blu-ray before checking it out, as Moviebob said the movie felt like it was re-cut to be shorter, and he's not the only one who said that. Some things I do want to experience completely fresh without reading any reviews.

With gaming, I have pretty much no go-to critics. I base what games I buy at release entirely on watching, say, a 15+ minute unedited gameplay demo. I knew exactly what Watch Dogs was going to be like just by watching full unedited missions: it was basically third-person Far Cry in a city with hacking, and that's exactly what I got and wanted, in fact. Yeah, I knew the story would be lame, but that's probably true of 90% of games since writing is so bad in the gaming industry. That's something critics rarely knock games for, even story-heavy games. Look at Mass Effect 3: not a single mixed review. I didn't even hate the ending, but I wouldn't rate any of the games as high as the Metacritic average. I really did enjoy the trilogy, but the writing was never that good. Mass Effect's writing is good for games, and that's the problem right there: video game writing should be as good as other mediums', but it's not even close. The status quo that games have shitty writing is not OK and it should be criticized as such.

CrystalShadow:
Has the escapist even had review scores for 5 years?

Yes, they have been scoring that long. For example, their review of Alan Wake from 2010 by Susan Arendt:
http://www.escapistmagazine.com/articles/view/video-games/editorials/reviews/7524-Review-Alan-Wake

A more recent example by Stew Shearer:
http://www.escapistmagazine.com/articles/view/video-games/editorials/reviews/13613-Hotline-Miami-2-Review-Offers-More-Game-But-Less-Satifaction

CrystalShadow:
I know they were famously one of the few places that didn't and insisted they shouldn't use them. At some point, overwhelming pressure from... Presumably the readers (we hope), led them to change their mind.

I believe it's only Yahtzee who really dislikes the use of scores.

This is one of the reasons I watch Zero Punctuation. Yahtzee is more of a critic than a reviewer; he doesn't even use numbers (but I wouldn't mind if he did either). I like him.

He just tears a game apart bit by bit, usually exposing the negatives first. He is the most honest and fair critic on the internet, in my opinion.

Mutant1988:
It facilitates laziness. That is, it permits readers to forego reading the review itself to immediately form an opinion of the product based on the abstract measure of (Subjective) quality that is the numerical score.

Anything that stops people from actually reading and understanding is negative.

You might consider the possibility that those who come only to read the score aren't going to read the article just because the score is taken away.

Mutant1988:
Here's the thing - Everything is subjective. The purpose of a review is to determine if the subjective impression of the reviewer coincides with yours.

There is no formula. It's about what the reviewer liked or didn't like and whether the reader likes or dislikes the same things. Numerical abstractions of subjective impressions need not apply.

Is it technically inept? Then write so. Is it equal parts entertaining and frustrating? Then write so. Is there one thing in it that elevates its overall quality past all its other elements? Then write so. Is there a single thing that brings down the quality of the entire experience? Then write so.

Then the person interested in the game can READ what is written, to understand why this game is or isn't a good fit for themselves.

Yet, the idea of a game being horrible, bad, mediocre, good, or amazing exists. Not only does it exist, but you can often get a consensus on such things, which decidedly shows that NOT everything is subjective. A basic way to capture the spectrum of opinions by critics (or players) is to look at a large number of such opinions. To assert that 90% agreement on a game being amazing is meaningless because 10% don't agree ... that's just madness.

If you start with an absolute conviction that only words carry any validity, you'll end up in the same position no matter what. Fortunately, your opinion is entirely subjective, as I have irrefutably proved by banging down some seriously powerful words on this page with the full weight of my angry fingers.

StreamerDarkly:

Mutant1988:
It facilitates laziness. That is, it permits readers to forego reading the review itself to immediately form an opinion of the product based on the abstract measure of (Subjective) quality that is the numerical score.

Anything that stops people from actually reading and understanding is negative.

You might consider the possibility that those who come only to read the score aren't going to read the article just because the score is taken away.

Mutant1988:
Here's the thing - Everything is subjective. The purpose of a review is to determine if the subjective impression of the reviewer coincides with yours.

There is no formula. It's about what the reviewer liked or didn't like and whether the reader likes or dislikes the same things. Numerical abstractions of subjective impressions need not apply.

Is it technically inept? Then write so. Is it equal parts entertaining and frustrating? Then write so. Is there one thing in it that elevates its overall quality past all its other elements? Then write so. Is there a single thing that brings down the quality of the entire experience? Then write so.

Then the person interested in the game can READ what is written, to understand why this game is or isn't a good fit for themselves.

Yet, the idea of a game being horrible, bad, mediocre, good, or amazing exists. Not only does it exist, but you can often get a consensus on such things, which decidedly shows that NOT everything is subjective. A basic way to capture the spectrum of opinions by critics (or players) is to look at a large number of such opinions. To assert that 90% agreement on a game being amazing is meaningless because 10% don't agree ... that's just madness.

If you start with an absolute conviction that only words carry any validity, you'll end up in the same position no matter what. Fortunately, your opinion is entirely subjective, as I have irrefutably proved by banging down some seriously powerful words on this page with the full weight of my angry fingers.

Consensus is not the same thing as objective. A consensus is still subjective.

Everything that isn't scientifically proven to be fact is subjective, be it supported by a majority opinion or not.

And the people who go to a review without the intent to read it should not bother looking for a review in the first place, seeing how they seem to have no interest in a detailed assessment of a product.

I never said that a consensus is meaningless. Nor did I bring up the concept of consensus in the first place. You did, in a vain attempt to discredit my critique of your inane justification for a numerical abstract of quality.

A number is pointless without the words to explain how it was reached, and given the words, the number is not needed.

The only sure way to determine the subjective quality of a product is to use it. Lacking the ability to do so, the second best way is to read the accounts of others that have used it. Based on what they say, you can form a more informed opinion on the subjective quality of the product, as it applies to your own likes and dislikes.

StreamerDarkly:

CrystalShadow:
Has the escapist even had review scores for 5 years?

Yes, they have been scoring that long. For example, their review of Alan Wake from 2010 by Susan Arendt:
http://www.escapistmagazine.com/articles/view/video-games/editorials/reviews/7524-Review-Alan-Wake

A more recent example by Stew Shearer:
http://www.escapistmagazine.com/articles/view/video-games/editorials/reviews/13613-Hotline-Miami-2-Review-Offers-More-Game-But-Less-Satifaction

CrystalShadow:
I know they were famously one of the few places that didn't and insisted they shouldn't use them. At some point, overwhelming pressure from... Presumably the readers (we hope), led them to change their mind.

I believe it's only Yahtzee who really dislikes the use of scores.

Ok, 5 years back. Wow. Not sure I agree on Yahtzee being the only one that doesn't like them, though. The Escapist made a very big deal out of their decision to start using them. It may, as it turns out, have been more than 5 years ago, but the Escapist didn't start out with them.

This is one from 2009 - note there's no score on it:
http://www.escapistmagazine.com/articles/view/video-games/editorials/reviews/6321-Review-Abe-s-Oddysee

From 2007. No score:
http://www.escapistmagazine.com/articles/view/video-games/editorials/reviews/2739-Review-Alpha-Prime-and-Shadowgrounds-Survivor-PC

Another from 2007; again, no score:
http://www.escapistmagazine.com/articles/view/video-games/editorials/reviews/1239-Review-Alter-Ego

2008, no score:
http://www.escapistmagazine.com/articles/view/video-games/editorials/reviews/5241-Review-American-McGee-s-Grimm

Anyway, you get the idea.
Interestingly, this one from July 2009 has a score, so that tells us approximately when things changed...
http://www.escapistmagazine.com/articles/view/video-games/editorials/reviews/6320-Review-Splosion-Man

Given we have one on the 30th of July 2009 without a score, and one on the 31st WITH a score, the switch must have happened right around then...

Ah, and finally the key article itself. From February 2010, Russ Pitts, then editor-in-chief, explaining the change in policy and the introduction of review scores.
http://www.escapistmagazine.com/articles/view/video-games/editorials/7148-Why-We-re-Using-Review-Scores

As you can see, this was a big deal at the time, and they argued about it for a long time... And certainly quite a few of the staff were against it...

Still, things change, and they've now had scores longer than not. (Also looks like Susan Arendt must have been using them before it was official policy...)

Mutant1988:
Consensus is not the same thing as objective. A consensus is still subjective.
Everything that isn't scientifically proven to be fact, is subjective. Be it supported by a majority opinion or not.

Is that so? Well, kindly remind everyone that gravity is still a theory. Supported by a wide consensus of experimental evidence, to be sure, but nonetheless not a fact established with absolute certainty. Science deals in the best available theories for explaining observable reality, not in absolute facts. Picking something that's less settled, how subjective would you say the current theories on man-made climate change are?

Mutant1988:
I never said that a consensus is meaningless. Nor did I bring up the concept of consensus in the first place. You did, in a vain attempt to discredit my critique of your inane justification for a numerical abstract of quality.

That you didn't bring up the notion of a consensus is exactly the problem. It's impossible to have a meaningful discussion on this topic unless you acknowledge degrees of subjectivity instead of carelessly throwing around the term as a shield. Notice also that in the previous post I specifically avoided numbers by sticking with basic quality descriptors of the kind gamers use every day.

Mutant1988:
A number is pointless without the words to explain how it was reached, and given the words, the number is not needed.

It's not about necessity, it's about utility. To say it a second time: there's no argument that the written component of a review should be removed in favor of a numerical rating.

Humans love assigning numbers to everything, but if there is no well-defined universal scale then they are just arbitrary and meaningless. If you want more information than a yes/no recommendation, an imaginary number won't help you. That's like claiming the temperature was exactly 36.7 C when your thermometer has an accuracy of +/- 3 C.
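To make that thermometer analogy concrete, here's a toy simulation (the +/- 1.5 point noise figure is invented purely for illustration): if repeated readings of the same "true" quality scatter that much, a decimal like 7.7 carries false precision.

```python
import random

def noisy_score(true_quality, noise=1.5):
    """One hypothetical reading of the same game by the same reviewer."""
    return round(true_quality + random.uniform(-noise, noise), 1)

random.seed(0)
readings = [noisy_score(7.7) for _ in range(5)]
# Every reading is guaranteed to land somewhere in [6.2, 9.2],
# so the reported decimal says more about the noise than the game.
print(readings, max(readings) - min(readings))
```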

CrystalShadow:

interestingly, this one from july 2009 has a score, so that tells us approximately when things changed...
http://www.escapistmagazine.com/articles/view/video-games/editorials/reviews/6320-Review-Splosion-Man

given we have one on the 30th of july 2009 without a score, and one on the 31st WITH a score...

Ah, and finally the key article itself. From February 2010, Russ Pitts, then editor-in-chief, explaining the change in policy and the introduction of review scores.
http://www.escapistmagazine.com/articles/view/video-games/editorials/7148-Why-We-re-Using-Review-Scores

As you can see, this was a big deal at the time, and they argued about it for a long time... And certainly quite a few of the staff were against it...

Still, things change, and they've now had scores longer than not. (Also looks like Susan Arendt must have been using them before it was official policy...)

Thank you for taking the time to find this! I wasn't aware the Escapist had had a huge internal debate on the topic and written up that article explaining the decision. Guess I missed out on a potential reference for the article :-(

StreamerDarkly:

CrystalShadow:

interestingly, this one from july 2009 has a score, so that tells us approximately when things changed...
http://www.escapistmagazine.com/articles/view/video-games/editorials/reviews/6320-Review-Splosion-Man

given we have one on the 30th of july 2009 without a score, and one on the 31st WITH a score...

Ah, and finally the key article itself. From February 2010, Russ Pitts, then editor-in-chief, explaining the change in policy and the introduction of review scores.
http://www.escapistmagazine.com/articles/view/video-games/editorials/7148-Why-We-re-Using-Review-Scores

As you can see, this was a big deal at the time, and they argued about it for a long time... And certainly quite a few of the staff were against it...

Still, things change, and they've now had scores longer than not. (Also looks like Susan Arendt must have been using them before it was official policy...)

Thank you for taking the time to find this! I wasn't aware the Escapist had had a huge internal debate on the topic and written up that article explaining the decision. Guess I missed out on a potential reference for the article :-(

It happens. You'd be forgiven for thinking this was never the case if you look at the site now; the only reason I knew this article existed is because I've been here so long that I witnessed the transition first-hand.
Turns out to be unreasonably difficult to find an old article on the Escapist, though.
Especially one like that, which is an editorial from a specific month...

I knew the article existed, and yet I still struggled to find it. I can imagine someone unaware of the article even having been written would have had no chance of ever finding it.

I'm against numerical scores in game reviews for a simple reason - I have worked as a movie critic for 5 years and not a day has gone by that I haven't rued the numerical score system. It belittles everything I write by reducing it to a simple number. And honestly, every time I put a score up it's mostly based on gut feeling. I ask myself, "Does this review read like a 9, or is it leaning towards 8?". I flip a coin, stamp a number and move on. I'm trying to translate everything I've argued about - which takes a considerable amount of time, by the way - into a number for lazy people who don't care much for readin' around these parts, mister.

People give numerical scores way too much credit.

Johnny Novgorod:
I'm against numerical scores in game reviews for a simple reason - I have worked as a movie critic for 5 years and not a day has gone by that I haven't rued the numerical score system. It belittles everything I write by reducing it to a simple number. And honestly, every time I put a score up it's mostly based on gut feeling. I ask myself, "Does this review read like a 9, or is it leaning towards 8?". I flip a coin, stamp a number and move on. I'm trying to translate everything I've argued about - which takes a considerable amount of time, by the way - into a number for lazy people who don't care much for readin' around these parts, mister.

People give numerical scores way too much credit.

I've worked as a game critic for nearly ten, and while gut feelings are surprisingly important when assigning scores, most publications I wrote for actually explained what the scores meant. Something like this:

1-3 - So bad it's (almost) funny. But seriously, stay the hell away.
4-5 - Nice try, but it's hard to find something worthwhile in here.
6-7 - Decent game, but won't appeal to everyone.
8-9 - Good game, safe buy for most people.
10 - Instant classic. You need to play this!

Made things a lot easier and more transparent.
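A published legend like that is easy to make mechanical. A minimal sketch, using the (paraphrased) bands from the table above:

```python
def score_band(score):
    """Translate a 1-10 review score into the legend's plain-language meaning."""
    if not 1 <= score <= 10:
        raise ValueError("score must be between 1 and 10")
    if score <= 3:
        return "So bad it's (almost) funny - stay away."
    if score <= 5:
        return "Nice try, but little worthwhile here."
    if score <= 7:
        return "Decent game, but won't appeal to everyone."
    if score <= 9:
        return "Good game, safe buy for most people."
    return "Instant classic - you need to play this!"

print(score_band(7))  # Decent game, but won't appeal to everyone.
```

The point is transparency: the same score always maps to the same statement, so readers know what an 8 is claiming.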

Any professional critic or reviewer who cannot even decide whether the experience he had was good, bad or somewhere in between is unfit for his job.

If he or she can, then he/she can also assign a score very easily, even if it is only a 1, a 2 or a 3, out of 3.
An experienced gamer, with some more games for comparison, could be much more precise, using a much wider scale, but there's always a like-o-meter in that head.

The only question that remains then is, as a reviewer, do I want to share my overall impression, or final recommendation with the reader?
If yes, then scores: sure why not.
If no, then I shouldn't make any subjective conclusion or recommendation, so I might as well not bother with the review at all.

There is no fundamental difference between using words like good or bad and using numbers that are assigned to those words in a pre-defined table.
All that's left then is foolish people whining about too much precision in other people's reviews, to whom I say: in your mind, round that 7.7 out of 10 to an 8/10 or a 4/5 or a 3/3 or maybe even a 1/1, and everybody is happy.

As for metacritic: don't like it? Then just don't go there. Another problem solved.
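The mental rounding veloper suggests is just a change of scale. A minimal sketch of that projection (the target scales are the ones named in the post; proportional rounding of 7.7 gives 2 on a 3-point scale):

```python
def rescale(score, old_max=10, new_max=5):
    """Project a score onto a coarser scale by proportional rounding."""
    return round(score * new_max / old_max)

# 7.7/10 projected onto progressively coarser scales:
print(rescale(7.7, 10, 10))  # 8  -> 8/10
print(rescale(7.7, 10, 5))   # 4  -> 4/5
print(rescale(7.7, 10, 3))   # 2  -> 2/3
print(rescale(7.7, 10, 1))   # 1  -> a simple thumbs up
```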

StreamerDarkly:

Is that so? Well, kindly remind everyone that gravity is still a theory. Supported by a wide consensus of experimental evidence, to be sure, but nonetheless not a fact established with absolute certainty. Science deals in the best available theories for explaining observable reality, not in absolute facts. Picking something that's less settled, how subjective would you say the current theories on man-made climate change are?

Gravity is still a theory.

http://en.wikipedia.org/wiki/Scientific_theory

That is to say, the extent to which we understand it has been proven by experiment. Knowing absolutely everything about something is not a requirement to understand a part of it.

I fail to see how the lobby-impeded scientific study of climate change is of any relevance. In fact, I would compare that to the video game industry, in the sense of publishers trying to buy review scores. That is to say, compare it to oil and other polluting industries pouring millions into disproving anything that would impact their operations, fact or not.

Confirmation bias with a strong element of willful ignorance and callous, uncaring self-interest.

Kinda like the people looking for positive review scores to "prove" that their game is better than everyone else's. Rather than, you know, explaining by which merits it is.

StreamerDarkly:

That you didn't bring up the notion of a consensus is exactly the problem. It's impossible to have a meaningful discussion on this topic unless you acknowledge degrees of subjectivity instead of carelessly throwing around the term as a shield. Notice also that in the previous post I specifically avoided numbers by sticking with basic quality descriptors of the kind gamers use every day.

Oh, so now it's called degrees of subjectivity? That's different from the "not subjective" you insisted on before.

Is there a general consensus on what constitutes quality in any given context? Yes.

Can this be mathematically formulated and should it be? No.

StreamerDarkly:

It's not about necessity, it's about utility. To say it a second time: there's no argument that the written component of a review should be removed in favor of a numerical rating.

It's not about utility, it is about enabling readers to ignore the reasoning for the abstract. It facilitates laziness and nothing else.

My argument is that the numerical rating is not needed in the slightest. The only thing a review should communicate is whether it lives up to the expectations of quality the reviewer has and explain why it does or doesn't do that.

Does the writer like this product or not? And why?

The text is the only important part of the review. If the text contradicts the general consensus of what constitutes quality, or fails to address your concerns, then you read another review.

The more reviews you read, the greater your understanding of the product and the more informed your purchasing decision is.

Numbers not necessary.

In fact, I'd argue that the Steam Review format is one of the best out there. A simple "Do I recommend this Yes/No" rating system and as much text as you need to explain why you would or wouldn't.

Especially useful to read several recommendations as well as non-recommendations, as it gives a more complete picture of the advantages and disadvantages of the product. The latter being especially important for PC games, which aren't guaranteed to work at all.

NPC009:

Johnny Novgorod:
I'm against numerical scores in game reviews for a simple reason - I have worked as a movie critic for 5 years and not a day has gone by that I haven't rued the numerical score system. It belittles everything I write by reducing it to a simple number. And honestly, every time I put a score up it's mostly based on gut feeling. I ask myself, "Does this review read like a 9, or is it leaning towards 8?". I flip a coin, stamp a number and move on. I'm trying to translate everything I've argued about - which takes a considerable amount of time, by the way - into a number for lazy people who don't care much for readin' around these parts, mister.

People give numerical scores way too much credit.

I've worked as a game critic for nearly ten, and while gut feelings are surprisingly important when assigning scores, most publications I wrote for actually explained what the scores meant. Something like this:

1-3 - So bad it's (almost) funny. But seriously, stay the hell away.
4-5 - Nice try, but it's hard to find something worthwhile in here.
6-7 - Decent game, but won't appeal to everyone.
8-9 - Good game, safe buy for most people.
10 - Instant classic. You need to play this!

Made things a lot easier and more transparent.

Everybody has their lil' slider going from 1 to 10, good to bad, red to green. Of course there's always some sort of manual to follow when scoring something. I find it easier to just deduce the number from what I just wrote. Does the review feel like a 6, 7, 8, what? 1 and 10 are the easiest numbers to apply. Everything in the middle is open to discussion.

7.8 too much data

Now on a more serious note: this needs more focus on the other side of the score, the readers. After all, a score is useless if the reader doesn't interpret its meaning.

StreamerDarkly:

...most gamers would agree that 81-100 territory should be reserved for truly top notch efforts.

It's ironic that I ask this: do you have more information to support this statement?

veloper:
Any professional critic or reviewer who cannot even decide whether the experience he had was good, bad or somewhere in between is unfit for his job.

If he or she can, then he/she can also assign a score very easily, even if it is only a 1, a 2 or a 3, out of 3.
An experienced gamer, with some more games for comparison, could be much more precise, using a much wider scale, but there's always a like-o-meter in that head.

The only question that remains then is, as a reviewer, do I want to share my overall impression, or final recommendation with the reader?
If yes, then scores: sure why not.
If no, then I shouldn't make any subjective conclusion or recommendation, so I might as well not bother with the review at all.

There is no fundamental difference between using words like good or bad and using numbers that are assigned to those words in a pre-defined table.
All that's left then is foolish people whining about too much precision in other people's reviews, to whom I say: in your mind, round that 7.7 out of 10 to an 8/10 or a 4/5 or a 3/3 or maybe even a 1/1, and everybody is happy.

As for metacritic: don't like it? Then just don't go there. Another problem solved.

Some space in the article was spent explaining why I think coarse two-level and three-level scales are bad. Perhaps some clarification can be added now.

I can see the appeal of binary value judgments. Nothing could be simpler and more direct than to conclude a review with a "play / don't play" decision. It's firm. It shows some balls. It might even be respected as consumer-friendly or 'honest'. It's the ultimate distillation of a wall of text.

The problem is that it's too unambiguous. It presumes to make up the reader's mind for them. Let's assume that I find Kotaku's reviews insightful and would like to act on their advice, but it just so happens that I, the reader, don't have the means, inclination or spare time to buy every single game they recommend. What to do? Perhaps I can read back over their recent reviews and try to figure out which game(s) they liked best. But they recommended quite a few, so this will take significant effort and is also prone to misinterpretation.

A finer scale such as the 10-point system solves this problem nicely. I can easily identify my preferred reviewer's top-rated games over the last few months, read the associated reviews if I haven't done so already, and decide which one(s) to spend my limited funds/time on. You need at least a few notches on the scale between 'playable' and 'amazing' to provide this sort of selectivity. It also allows review score aggregation sites to be consulted as a verification that the critic's score isn't completely out to lunch, thereby avoiding bad purchases without needing to read five or more complete reviews.

CaitSeith:
7.8 too much data

Now on a more serious note: this needs more focus on the other side of the score, the readers. After all, a score is useless if the reader doesn't interpret its meaning.

StreamerDarkly:

...most gamers would agree that 81-100 territory should be reserved for truly top notch efforts.

It's ironic that I ask this: do you have more information to support this statement?

Yes, I do. Unfortunately it's going to involve more data.

Exhibit #1 would be the wealth of comments by gamers stating that AAA titles are shamelessly overrated by popular critics. You don't hear anyone challenging this claim.

Exhibit #2 is based on the analysis of critic and user Metacritic ratings. User scores are typically 5-10 points lower than critic scores, on average. For the selection of games discussed in the article, 38% of critic Metascores fall in the 81-100 range compared to only 22% of UserScores (average user score). If you adjust that to the 85-100 range, it becomes 21% for critics versus just 7% for users.

(Stats computed based on a minimum of 10 critic reviews and 30 user reviews.)
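For the curious, range percentages like those quoted above come from a tally of this shape (the score lists here are invented stand-ins, not the article's actual dataset):

```python
def share_in_range(scores, lo, hi):
    """Percentage of scores falling within [lo, hi], inclusive."""
    return round(100 * sum(lo <= s <= hi for s in scores) / len(scores), 1)

# Invented example data - NOT the Metacritic numbers from the article.
critic_metascores = [92, 88, 76, 85, 70, 95, 83, 66, 90, 81]
user_scores = [78, 82, 65, 70, 74, 88, 60, 72, 80, 55]

print(share_in_range(critic_metascores, 81, 100))  # 70.0
print(share_in_range(user_scores, 81, 100))        # 20.0
```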

That's quite an impressive article. Shouldn't that be submitted to a journal or magazine or something?

I've really never seen the problem with a scoring system. Maybe this is down to the fact that I'm from a very sciencey background and most journalists aren't, so they don't have that natural tendency to quantify and systematise. But giving my own experience with a game a number out of 10 feels incredibly natural to me.

What really makes me laugh is when sites say "oh, we think our 5-star rating system is a bit vulgar and reductionist, so we've come up with 3 different awards of 'don't bother', 'try it' or 'buy it'".
Lol, well, that's functionally indistinguishable from a 3-star system. You're clearly just not comfortable with numbers.

Tilly:
That's quite an impressive article. Shouldn't that be submitted to a journal or magazine or something?

Thank you, but I don't think it's quite at that level of quality. And if it was, I wouldn't be able to sling petty insults at Jason Schreier and Arthur Gies in the references.

Tilly:
I've really never seen the problem with a scoring system. Maybe this is down to the fact that I'm from a very sciencey background and most journalists aren't, so they don't have that natural tendency to quantify and systematise. But giving my own experience with a game a number out of 10 feels incredibly natural to me.

What really makes me laugh is when sites say "oh, we think our 5-star rating system is a bit vulgar and reductionist, so we've come up with 3 different awards of 'don't bother', 'try it' or 'buy it'".
Lol, well, that's functionally indistinguishable from a 3-star system. You're clearly just not comfortable with numbers.

This is exactly the case for me as well. I've gradually come to realize that many gamers don't like numerical ratings, whereas in the beginning I thought it was mainly celebrity game journalists intoxicated on the scent of their own farts.

NPC009:
Very impressive post, OP!

From my experience as a reviewer it's spot-on. Especially this:

First of all, there is a positive bias with respect to what quality of game even registers as a blip on the radar of reviewers. If it isn't a big studio release backed by marketing or an indie title blessed by IGF or IndieCade, it generally doesn't receive a mention, let alone a dedicated review. There isn't necessarily anything insidious about this state of affairs; I'd wager that at least a few critics regularly sample low budget offerings only to be reminded of why they don't more often. Mind you, I don't assert that marketing buzz is an accurate predictor of game quality, only that the subset of games with enough traction to garner attention from reviewers is, statistically speaking, of above average quality.

Many publications are running on low budgets and rely on review copies from publishers to fill their (web)pages. While publishers of 'lesser' games do send in copies and keys (a Nintendo publication I worked at kept getting Barbie and other kids' games even though we rarely reviewed them), it's unlikely their game will be picked up for review. Publications have a limited budget and/or page count for reviews and tend to pick games readers are already interested in, which are games being developed by competent, established studios. Once in a while they'll throw in a really bad game for laughs/as a reminder of what really bad actually is. Other spots are filled with games reviewers bought themselves and wanted to share with their readers.

As for whether or not we should do away with review scores... I think they can have a place in review systems, as long as we use them responsibly:

1. Tight scales are a must. There's no good reason to say a game with an 85% rating is better than a game with an 84% rating. It's better to go with something like a five-star system and see the scores (1-2-3-4-5) as categories of quality rather than points on some arbitrary scale to rank games by.

2. Publications must explain what the scores mean. A 9 from Edge is different from a 9 from IGN, so it's important to provide context.

3. Readers must be reminded that a review is simply one opinion from one person given at one time. The number that accompanies the review is not meant as an absolute or everlasting judgment; it's just the reviewer's thoughts on the game's quality in the form of a number.

I apologize for taking so long to respond, as this is one of the more insightful posts in the thread, in my opinion.

1. In general I agree that tiny differences on a 100-point scale mean nothing, but my earlier post illustrates the danger of rounding error. I'm still not totally convinced of the merits of a tight scale ... there might be some psychological component involved, but to the technically minded person it just feels like using 8-bit integers when you could be using double-precision floats.

2. Most of them do. Despite all the talk of widely varying standards, it's remarkable how similar the scales are. I contend that the bigger issue is reviewers failing to adhere to their own scales.

3. This is true. On the other hand, I don't think reviewers should feel emboldened to say (or score) whatever the hell they please by hiding behind "it's just an opinion". They should invest significant time in a game, learn its intricacies, and feel a degree of personal responsibility when explaining its merits and weaknesses.
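To make the rounding-error worry from point 1 concrete, here's a minimal toy sketch (my own made-up 20-point binning, not any site's actual formula) of what happens when 100-point scores are quantized to a tight five-star scale:

```python
def to_stars(score: int) -> int:
    """Map a 0-100 score to 1-5 stars using hypothetical 20-point bins."""
    return min(5, score // 20 + 1)

# A 1-point gap straddling a bin boundary shows up as a full star...
print(to_stars(79), to_stars(80))  # 4 5

# ...while a 19-point gap inside one bin vanishes entirely.
print(to_stars(80), to_stars(99))  # 5 5
```

The same information loss occurs for any tight scale, wherever the bin edges are placed; the binning only changes which pairs of games it mangles.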
