Keith Shubeck first became interested in memes while losing at StarCraft. As his defenses crumbled, his opponents would taunt him with “AYBABTU,” or “All Your Base Are Belong To Us,” one of the earliest blockbuster memes of the online era.
Shubeck, now doing his Ph.D. in psychology at the University of Memphis, wondered what it was about some memes that made them so popular. “There are all these memes out there competing for our limited cognitive resources. If memes are competing with each other, those that are easier to remember should have an advantage.”
While double rainbows and hot dog legs can boost any meme’s chances, Shubeck realized that a much more basic mechanism influences meme memorability, and therefore success: how we process language. As Shubeck’s collaborator Stephanie Huette, a professor of psychology at the University of Memphis, explains, “Every word, every syllable matters.”
Armed with insights from research into language processing and working memory, Shubeck and Huette are now among a small group of scientists creating machine learning models to predict which memes are more likely to succeed.
Many linguistic factors determine how easily people recall words and sentences. Some factors are straightforward. Shorter words and sentences, for instance, are easier to remember than longer ones. Other factors, however, are far from intuitive but reflect fundamental aspects of how human cognition works. Emotional arousal improves recall, so the presence of words expressing positive or negative sentiment, like nice and ugly, has some impact on memorability. Similarly, concrete words like house are generally easier to recall than abstract words like proof in short-term memory tasks.
Shubeck and Huette set out to test whether these same linguistic factors shown to influence recall in the lab could account for a meme’s popularity online. They selected a set of 268 memes from knowyourmeme.com and ran them through linguistic analysis tools to determine the presence of a number of linguistic features. In addition to length, emotional arousal and concreteness, they tracked other memorable features, including swear words and purposeful misspellings. They then fed these memes into a neural network – a kind of machine learning model that approximates connections in the brain – that learned how these features correlated with success, defined as 37,400 or more verbatim Google search results.
Though Shubeck cautions that this work is still at the proof-of-concept stage, after the neural network was fully trained, it could predict success with 80% accuracy when exposed to new memes.
For example, the model correctly predicted the success of the meme “Banana for scale.” This meme is short and benefits from the presence of a concrete word, “banana.” The model likewise correctly predicted a lack of success for the meme “Does this look like the face of mercy?” This meme also contains a concrete word, “face,” but because the meme is over four words in length, the model treated it as long.
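To make the idea of linguistic features concrete, here is a minimal sketch of how a meme's text might be turned into inputs for such a model. It is not the researchers' code: the word lists, function names and feature names are hypothetical, and the tools they actually used are not named in this article. Only the four-word cutoff and the 37,400-result threshold come from the description above.

```python
import re

# Hypothetical mini-lexicons for illustration only; the study's real
# linguistic analysis tools and word lists are not given in the article.
CONCRETE_WORDS = {"banana", "face", "house"}
EMOTION_WORDS = {"nice", "ugly"}

LONG_THRESHOLD = 4          # a meme over four words was treated as long
SUCCESS_THRESHOLD = 37_400  # verbatim Google search results


def extract_features(meme: str) -> dict:
    """Turn a meme's text into a small dictionary of linguistic features."""
    words = re.findall(r"[a-z']+", meme.lower())
    return {
        "word_count": len(words),
        "is_long": len(words) > LONG_THRESHOLD,
        "has_concrete": any(w in CONCRETE_WORDS for w in words),
        "has_emotion": any(w in EMOTION_WORDS for w in words),
    }


def is_successful(hit_count: int) -> bool:
    """Label used for training: success means 37,400 or more search results."""
    return hit_count >= SUCCESS_THRESHOLD


for meme in ("Banana for scale", "Does this look like the face of mercy?"):
    print(meme, extract_features(meme))
```

Both example memes contain a concrete word, but only the first falls under the length cutoff. A classifier trained on many such feature dictionaries, each paired with a success label, could then learn how the features interact.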
Shubeck admits there is an element of mystery to how the variables interact, but they did find a few predictors that were statistically significant in determining success. Shorter memes were 2.8 times more likely to be successful. More unexpectedly, the researchers found that memes containing swear words, which the scientific literature shows have a memory advantage over neutral words, were 1.77 times less likely to be successful. “We were a little surprised,” Huette says, “but then figured that people communicating with friends and family may not want to use taboo words.”
Oren Tsur, a post-doctoral researcher at Harvard and Northeastern Universities, has used a similar approach to determine which hashtags become popular on Twitter. His 2012 paper described one of the earliest attempts to use a machine learning algorithm to predict the spread of memes based on linguistic content. In addition to linguistic factors such as emotion words and pronouns, his work also shows the importance of orthographic differences, such as capitalization, which helps explain, for instance, why the hashtag #saveTheNHS was more popular than #savethenhs, despite the extra physical effort of typing capital letters. “People are lazy and want to use the simplest hashtag,” Tsur explains. “On the other hand, if a hashtag is all lower case, it is too hard to read. We don’t know what people prefer, but the algorithm finds a balance.”
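Orthographic differences like these are easy to express as numbers a model can weigh. The sketch below is an illustration only, not Tsur's method: the function and feature names are hypothetical, and the article does not say how his algorithm actually encodes capitalization.

```python
def hashtag_case_features(tag: str) -> dict:
    """Compute simple capitalization features for a hashtag (illustrative)."""
    body = tag.lstrip("#")
    letters = [c for c in body if c.isalpha()]
    upper = sum(c.isupper() for c in letters)
    return {
        "length": len(body),
        "uppercase_count": upper,
        # Share of letters that are capitals; 0.0 if there are no letters.
        "uppercase_ratio": upper / len(letters) if letters else 0.0,
        "is_all_lowercase": bool(letters) and upper == 0,
    }


for tag in ("#saveTheNHS", "#savethenhs"):
    print(tag, hashtag_case_features(tag))
```

The two hashtags have the same length and differ only in their capitalization features, which is the kind of contrast a model can pick up on when balancing typing effort against readability.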
The ability of these machine learning models to find a balance between multiple factors is one of the most promising aspects of this work. While no single linguistic feature can determine a meme’s success, and scientists have not yet articulated a winning formula, these models are able to learn how several psycholinguistic elements, taken together and interacting in complex ways, improve a meme’s chances. “It is likely that some features contribute very little and some contribute a lot more,” says Huette. “It is really about the interaction between them that is driving the success so well.”
So what are the implications of machines that can learn to speak in meme? Shubeck envisions a day when his model might be flipped upside down, so that instead of predicting successful memes, it will be creating them, citing the recent case of the neural network that learned how to generate new cards for Magic: The Gathering. “I think people are tickled at the idea of neural networks generating new content,” he says.
That sounds like good fun, but it could be that the first to adopt this technology will have a more self-serving agenda. Tsur points out that companies and political parties are already using data analysts to personalize advertising on sites like Facebook. “If you want to divide into subgroups, you could take the algorithm and train it differently on different subsets of users,” he says. With enough data, his model could potentially show which hashtag is more likely to resonate with a 40-year-old woman who owns her own home versus a 30-year-old man who still lives in his parents’ basement.
Unbeknownst to us, our subconscious cognitive processes may determine much of our online behavior. Once marketers figure out how to play to our psycholinguistic reflexes, we could become defenseless, instinctively sharing their hilarious corporate memes, unaware that somebody set up us the bomb.