Our goal is to use an expectation-maximization algorithm to learn the parameters of a model that we will then use to forecast, without peeking at the tags, whether a given game is likely to be considered by Steam users as "indie", or not.
A recent Master's graduate from the school of Aerospace Engineering at the University of Colorado Boulder decided to use his years of advanced maths training to answer the question that really matters: what is an indie game? The debate about how to define "indie" has been going on for years, and all that talk has brought us no closer to a conclusive answer. So Jackson Wagner developed a learning algorithm and fed it metrics about a whole slew of Steam games, such as tags, scores, and pricing. Eventually, the algorithm learned enough to spot indie games about two times out of three based on those metrics alone, with the "indie" tag itself removed, since leaving it in would obviously be cheating.
I'll spare you my hazy recollections of Gaussian mathematics, but if you're interested in the nitty-gritty, the entire paper is available for download here!
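For a rough flavor of what "expectation-maximization" means, here is a minimal sketch in Python. To be clear, this is not the model from the paper: it assumes a single made-up feature (price), invented synthetic data, and just two Gaussian clusters. The names `fit_two_gaussians` and `prob_first_cluster` are hypothetical helpers written for this illustration. The algorithm alternates between guessing which cluster each game belongs to (the E-step) and re-estimating each cluster's parameters from those guesses (the M-step).

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of a 1-D normal distribution at x."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_two_gaussians(values, n_iter=100):
    """EM for a two-component 1-D Gaussian mixture (illustrative only)."""
    lo, hi = min(values), max(values)
    means = [lo, hi]                      # crude initial guess: the two extremes
    spread = max((hi - lo) ** 2 / 4, 1e-6)
    variances = [spread, spread]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: how responsible is each cluster for each value?
        resp = []
        for x in values:
            p = [weights[k] * gaussian_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(p) or 1e-300      # guard against numerical underflow
            resp.append([pk / total for pk in p])
        # M-step: re-estimate each cluster from its weighted members
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue                  # empty cluster: leave its parameters alone
            weights[k] = nk / len(values)
            means[k] = sum(r[k] * x for r, x in zip(resp, values)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, values)) / nk
            variances[k] = max(var, 1e-6)  # keep the variance strictly positive
    return weights, means, variances


def prob_first_cluster(x, weights, means, variances):
    """Posterior probability that x belongs to the first (lower-mean) cluster."""
    p = [weights[k] * gaussian_pdf(x, means[k], variances[k]) for k in range(2)]
    return p[0] / (sum(p) or 1e-300)


# Synthetic, invented "prices": a cheap cluster and an expensive cluster.
random.seed(0)
prices = [random.gauss(10, 3) for _ in range(200)] + \
         [random.gauss(50, 8) for _ in range(200)]

weights, means, variances = fit_two_gaussians(prices)
print("estimated cluster means:", means)
print("P(cheap cluster | price = 8):", prob_first_cluster(8, weights, means, variances))
```

Note that nothing here tells the algorithm which cluster is "indie"; it only finds groups in the numbers. Mapping those groups onto a label, and using many metrics instead of one, is where the real work in the paper would lie.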
Generally speaking, these metrics seem to work like social demographics or internet advertising: take some data about a person, compare it to the data of millions of others, and make an educated guess about how that person will behave. Just think of Steam stats as the browser cookies of game genres.