Science Discovers the Secret to Successful Writing


Researchers from Stony Brook University have created an algorithm that can predict successful books with 84 percent accuracy.

Writing isn't always the easiest thing to do. Granted, you can say that about almost any job or activity, but sometimes the art of putting pens to paper and fingers to keys does feel like a perplexing thing to get a handle on. This is especially the case when you're talking about writing something as lengthy and complex as a novel. Knowing where to start, where to go and how to get there can be incredibly frustrating; even more so when you consider that there's no way to know if anyone will even like the finished product. Except maybe there is.

Researchers at Stony Brook University in New York have apparently developed an algorithm that can predict whether or not a book is going to be commercially successful. Utilizing statistical stylometry, a mathematical method of looking at words and grammar, the algorithm can be used to determine whether a book is destined for the bestseller list or the bargain bin. How accurate, you ask? It gets it right 84 percent of the time.

The algorithm was developed by analyzing more than 800 books, many taken from the Project Gutenberg archive of classic literature, and then comparing them to the real-world success they enjoyed. The researchers also analyzed unsuccessful books to see if they could find a common factor that led to their failure. These included both commercial failures and critical failures, which they found by looking at Amazon's low-selling books and by researching critical reviews.

All of this said, you're probably wondering what key elements they determined drive a novel toward success or failure. The research found that books primed to fail had a tendency to overuse verbs and adverbs and included more language describing explicit actions. Successful books, meanwhile, spent more time describing thought processes and also made a habit of using conjunctions more heavily. There were, of course, other factors involved (luck, for instance), which can be found in the team's official report.
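To make the conjunction finding concrete, here is a minimal Python sketch of one stylometric measurement: the average number of coordinating conjunctions per sentence. It is not the researchers' method. The word list, the naive sentence splitter, and the function name `conjunctions_per_sentence` are all assumptions made for illustration.

```python
import re

# Assumed word list: the seven English coordinating conjunctions.
# Note that "for" and "so" are also counted when used as other parts of speech.
CONJUNCTIONS = {"and", "but", "or", "nor", "for", "so", "yet"}

def conjunctions_per_sentence(text: str) -> float:
    """Average number of coordinating conjunctions per sentence."""
    # Naive split on runs of ., ! or ?; a real tool would use a proper tokenizer.
    sentences = [s for s in re.split(r"[.!?]+\s*", text) if s.strip()]
    if not sentences:
        return 0.0
    total = 0
    for sentence in sentences:
        words = re.findall(r"[a-z']+", sentence.lower())
        total += sum(1 for w in words if w in CONJUNCTIONS)
    return total / len(sentences)

sample = "He rode and he slept and he woke. The sun rose."
print(conjunctions_per_sentence(sample))
```

A count like this would be only one of many features; on its own it says nothing about whether a book succeeds.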

Source: Telegraph

Sad as that is, the thing with conjunctions explains the success of "No Country For Old Men" to me. The author really loves the word "and" - the average usage per sentence is around 3-4 and I've seen one sentence have 9.

Best-sellers aren't well written per se, just as successful movies don't necessarily have to have good cinematography (medium-specific aspects).

Also, 50 Shades of Grey was a best-seller, let's not forget that.

In addition, the metrics by which a book will be successful seem nonexistent to me; it's all about speaking to the zeitgeist, be it criticising it (The Picture of Dorian Gray, Jane Eyre), referencing it (Canterbury Tales, World War Z) or exploring specific aspects of it (Game of Thrones: criticism of human nature, in line with the self-analysis and critique that seems to be frequent in 21st century western/northern hemisphere culture), etc.

I feel that throwing maths into the mix will be nothing more than a meta-analysis, which is unreliable.

Akichi Daikashima:
snip

Note that the article isn't titled "Scientist Discovers the Secret to Good Writing". Nobody is saying that these metrics will turn you into Hemingway; it's tracking trends in popularity and taste, which pretty much defines "success". If the best writer in the world can't sell his book, he isn't successful, and there are a multitude of reasons why that might happen.

84% accuracy? I can't say I buy it. I think too many factors beyond the writing itself determine success for that kind of ratio to be realistic.

I find it dubious in the extreme that someone would try to reduce that phenomenon known as creative writing to a statistic. People are unpredictable, ergo so is their response to books. What you have with the accrued knowledge here is an educated guess, but I could make such a guess too.

JamesBr:

Akichi Daikashima:
snip

Note that the article isn't titled "Scientist Discovers the Secret to Good Writing". Nobody is saying that these metrics will turn you into Hemingway; it's tracking trends in popularity and taste, which pretty much defines "success". If the best writer in the world can't sell his book, he isn't successful, and there are a multitude of reasons why that might happen.

Yes, and I said that successful writing is also a bit trend-defiant in and of itself, and is usually reflective of the zeitgeist in a way (from my experience, anyhow).

I'm curious how they tested it. After a semester of Machine Learning I think I understand the basic theory behind it, but if they were just attempting to develop an algorithm for sorting existing data (the 800 books mentioned) it does nothing beyond describe the data you put into it.

Skimming the report, they are indeed using a Support Vector Machine on sentences taken from the books in question. Although the method is valid as a machine learning technique I'm not certain that the results are at all meaningful.

Moreover, the method they are using as a measure of success is the number of downloads the file has from Project Gutenberg. Given the nature of the books found on Project Gutenberg, I think they might skew too heavily towards just describing properties of books likely to be used in school research projects.
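The worry about merely sorting existing data can be illustrated with a toy Python sketch. It uses made-up random features and labels, not the study's data, and a 1-nearest-neighbour rule rather than the Support Vector Machine the report describes; every name in it is hypothetical. A model that memorizes its training set scores perfectly on that set, and only the held-out books say anything about prediction.

```python
import random

def nearest_label(point, examples):
    """Label of the training example closest to `point` (1-nearest-neighbour)."""
    return min(examples, key=lambda ex: abs(ex[0] - point))[1]

def accuracy(test_set, train_set):
    """Fraction of test_set whose label matches the nearest training example."""
    hits = sum(1 for x, label in test_set if nearest_label(x, train_set) == label)
    return hits / len(test_set)

rng = random.Random(0)
# Synthetic "books": one random feature and a random success label (pure noise).
books = [(rng.random(), rng.choice([True, False])) for _ in range(800)]
train, held_out = books[:600], books[600:]

train_acc = accuracy(train, train)        # evaluated on data the model has seen
held_out_acc = accuracy(held_out, train)  # evaluated on unseen data
print(train_acc, held_out_acc)
```

Because the labels here are noise, the held-out accuracy should hover near chance even though the training accuracy is perfect, which is why the evaluation protocol matters more than the headline number.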

Now if only we can apply it to fanfiction, the cycle will be complete.

FalloutJack:
I find it dubious in the extreme that someone would try to reduce that phenomenon known as creative writing to a statistic. People are unpredictable, ergo so is their response to books. What you have with the accrued knowledge here is an educated guess, but I could make such a guess too.

Normally I would agree, but here is a quote from the abstract: "We examine the quantitative connection, if any, between writing style and successful literature." So they are not trying to look at the creative aspect so much as the tone and some of the technical style. Their conclusion is basically that books that are more conversational and focus more on the things people commonly care about in their descriptions sell better, which is not a greatly outlandish claim.

zerragonoss:

FalloutJack:
I find it dubious in the extreme that someone would try to reduce that phenomenon known as creative writing to a statistic. People are unpredictable, ergo so is their response to books. What you have with the accrued knowledge here is an educated guess, but I could make such a guess too.

Normally I would agree, but here is a quote from the abstract: "We examine the quantitative connection, if any, between writing style and successful literature." So they are not trying to look at the creative aspect so much as the tone and some of the technical style. Their conclusion is basically that books that are more conversational and focus more on the things people commonly care about in their descriptions sell better, which is not a greatly outlandish claim.

But that isn't the whole picture, though. You can't render a statistic with half of the information. No wait, let me amend that, because people do that all the time. You can't render an accurate statistic with half of the information. You can determine what colors and shapes will be the most pleasing to people, add some sound effects, and make Tetris...but it won't be a viable means of capturing art in a theorem.

zerragonoss:

FalloutJack:
I find it dubious in the extreme that someone would try to reduce that phenomenon known as creative writing to a statistic. People are unpredictable, ergo so is their response to books. What you have with the accrued knowledge here is an educated guess, but I could make such a guess too.

Normally I would agree, but here is a quote from the abstract: "We examine the quantitative connection, if any, between writing style and successful literature." So they are not trying to look at the creative aspect so much as the tone and some of the technical style. Their conclusion is basically that books that are more conversational and focus more on the things people commonly care about in their descriptions sell better, which is not a greatly outlandish claim.

Their claim for failing books also seems fairly accurate.

If you've ever read pieces of utterly horrible fan-fiction you'll be fairly familiar with overuse of verbs, adverbs and descriptions of explicit situations.

Even if you have the most creatively wonderful storyline in mind it'll still end up being disliked by many people if you lack the writing style to describe it.

I'm dumb, somebody give me an example.

"The research found that books primed to fail had a tendency to overuse verbs and adverbs and included more language describing explicit actions. Successful books meanwhile spent more time describing thought processes and also made a habit of using conjunctions more heavily."

So basically they're saying that well-written books are more successful. Sorry scientists, it looks like you just learned something the rest of us learned years ago. Five minutes in a creative writing class could have taught you this much. Someone post a Slowpoke meme.

Gary Thompson:
I'm dumb, somebody give me an example.

I just watched that episode of DS9 that your avatar came from yesterday.

But I can't help you with an example.. :)

Gary Thompson:
I'm dumb, somebody give me an example.

1. Another forum regular thought carefully about this request. He then joyfully typed a carefully-worded example to illustrate the described differences.
2. I read your post and posted an example.

You have to be smart and creative to write response #1. The responses say the same thing. Response #2 is still better. Nobody wants all of the useless adjectives and adverbs. Too flowery = bad novel.

A 16% margin of error? That's actually quite a large chunk. Not to mention the fact that as someone said, the only reason why some books are successful is because of certain factors that have nothing to do with the actual quality of the writing itself.

JamesBr:

Akichi Daikashima:
snip

Note that the article isn't titled "Scientist Discovers the Secret to Good Writing". Nobody is saying that these metrics will turn you into Hemingway; it's tracking trends in popularity and taste, which pretty much defines "success". If the best writer in the world can't sell his book, he isn't successful, and there are a multitude of reasons why that might happen.

Yes. Also, opinions tend to be divided about what constitutes quality or good writing. This experiment might be more interesting and useful if they based it not on sales, but on ratings and reviews, and separated books into categories, calculating book scores for each individual category.

FalloutJack:
I find it dubious in the extreme that someone would try to reduce that phenomenon known as creative writing to a statistic. People are unpredictable, ergo so is their response to books. What you have with the accrued knowledge here is an educated guess, but I could make such a guess too.

People aren't as unpredictable as you might hope.

Clankenbeard:

Gary Thompson:
I'm dumb, somebody give me an example.

1. Another forum regular thought carefully about this request. He then joyfully typed a carefully-worded example to illustrate the described differences.
2. I read your post and posted an example.

You have to be smart and creative to write response #1. The responses say the same thing. Response #2 is still better. Nobody wants all of the useless adjectives and adverbs. Too flowery = bad novel.

Or in short: It's in poor taste to use more words than are needed; part of the appeal of reading is letting your imagination fill in some of the blanks.

FalloutJack:
I find it dubious in the extreme that someone would try to reduce that phenomenon known as creative writing to a statistic. People are unpredictable, ergo so is their response to books. What you have with the accrued knowledge here is an educated guess, but I could make such a guess too.

That's not entirely true. Yes on an individual level you might find quite a few differences, but as a whole we are pretty homogenous on a lot of topics and trends. You see it in both culture and nature. There are just certain appearances and trends that resonate with the vast majority of people.

Off the top of my head (and from a few years back, so there might be a few inaccuracies), there is a reason most people find baby animals cute. It has something to do with a small body and out-of-proportion features being aesthetically pleasing. I believe it's the same for human children as well.

Key point of note here. This is in terms of financial success, i.e. sales: not whether or not the story is good, just whether or not it will sell. To illustrate, McDonald's sells well...but I don't think anyone would call McDonald's great food.

vid87:
Sad as that is, the thing with conjunctions explains the success of "No Country For Old Men" to me. The author really loves the word "and" - the average usage per sentence is around 3-4 and I've seen one sentence have 9.

YES, Cormac McCarthy's simplistic style of writing inspired me to start writing myself... My writing is terrible, but Cormac's spare prose makes me think that you don't need to be fancy to write a great book, and I certainly have a great story to tell.

That being said, the posters are right: the data collection for this study seems odd, especially since it's based purely on internet sales (and just from specific sites) and not on reviews.

Reminds me of that Roald Dahl story "The Great Automatic Grammatizator". A man in the story believes that the rules of grammar are fixed to certain mathematical principles, and he creates a machine that can create award-winning novels with ease, and with it, he begins to destroy human creativity.

I'm not telling them they need to go broke making this, but 800 seems like a puny sample size. No comment about the rest...

Falterfire:
I'm curious how they tested it. After a semester of Machine Learning I think I understand the basic theory behind it, but if they were just attempting to develop an algorithm for sorting existing data (the 800 books mentioned) it does nothing beyond describe the data you put into it.

Skimming the report, they are indeed using a Support Vector Machine on sentences taken from the books in question. Although the method is valid as a machine learning technique I'm not certain that the results are at all meaningful.

Moreover, the method they are using as a measure of success is the number of downloads the file has from Project Gutenberg. Given the nature of the books found on Project Gutenberg, I think they might skew too heavily towards just describing properties of books likely to be used in school research projects.

Bang on what I was going to say. Is PG strictly public domain books? So if you wanted to write a book that was successful 100 years ago, feel free to take this advice.

WhiteTigerShiro:
People aren't as unpredictable as you might hope.

NightHawk21:

That's not entirely true. Yes on an individual level you might find quite a few differences, but as a whole we are pretty homogenous on a lot of topics and trends. You see it in both culture and nature. There are just certain appearances and trends that resonate with the vast majority of people.

Off the top of my head (and from a few years back, so there might be a few inaccuracies), there is a reason most people find baby animals cute. It has something to do with a small body and out-of-proportion features being aesthetically pleasing. I believe it's the same for human children as well.

Look, I can understand where you're coming from, guys, but honestly the mob mentality doesn't cover it all. It proves that you can get a general broad-strokes notion of human behavior, but if it were totally predictable, we wouldn't have companies wondering why their latest cool product is a flop, even when you talk to random focus groups to get in on what people think. How bad must it be when a fairly logical approach actually gets you into hot water? Basing products off of the opinions of actual customers to supply a demand should be quite natural...but in practice it will always be much harder than in theory. It is my belief that you can do everything right and still get nothing to show for it because of the random moods of people. We may be the ones who HAVE logic, but we are not a logical people for the most part.

Akichi Daikashima:
Also, 50 Shades of Grey was a best-seller, let's not forget that.

Sex doesn't sell porn books to women, obsession sells porn books to women.

There have been other studies of successful books and successful Literotica stories (which is a massive porn story site women use if you didn't know) and the most successful stories always contain one human obsessed with another or at least something they do that human.

Women are apparently obsessed with obsession. And 50 Shades of Grey contains a whole bunch of that.

Of course, if you were trying to sell porn books to men then you would need lots of sex in them, but men don't really read porn books; we prefer visual porn.

Tiamat666:

Yes. Also, opinions tend to be divided about what constitutes quality or good writing. This experiment might be more interesting and useful if they based it not on sales, but on ratings and reviews, and separated books into categories, calculating book scores for each individual category.

Critical opinion is so highly subjective. One person's masterpiece is another's piece of garbage. Even the great writers of the past have literary lovers who despise them and find their work to be crap.

Look to Goodreads for an example: it's the largest community of book lovers out there who rate everything. Over time, every author's score tends to stabilize at about 3. Why 3? 3 is the midpoint between the lowest and highest scores.

A book/author might start off high or low, but as more readers come in with their opinion it invariably levels off.

My opinion is that once you leave a very basic level of grammar/spelling behind (and even then, for most readers according to surveys I've read...) not a single book rises above the subjective "Some will like it, some won't."

In the end, my partner hates the Grapes of Wrath, so I'm forced to either assume literary critique is fully subjective and arbitrary, or get a separation.

Falterfire:
I'm curious how they tested it. After a semester of Machine Learning I think I understand the basic theory behind it, but if they were just attempting to develop an algorithm for sorting existing data (the 800 books mentioned) it does nothing beyond describe the data you put into it.

Skimming the report, they are indeed using a Support Vector Machine on sentences taken from the books in question. Although the method is valid as a machine learning technique I'm not certain that the results are at all meaningful.

Moreover, the method they are using as a measure of success is the number of downloads the file has from Project Gutenberg. Given the nature of the books found on Project Gutenberg, I think they might skew too heavily towards just describing properties of books likely to be used in school research projects.

You got it right here.

This is all postdiction rather than prediction; they did only a little bit of further analysis on books outside that data set.

Also, did you see later on that they counted Dan Brown's "Lost Symbol" as unsuccessful! Their defense is that they mean literary success and not financial success... but it's no secret that metrics for finding better-written books will identify more respected literary stories.

The fact that an author knows correctly how to use the word "whom" probably puts them in the top 84% on its own!

You're also right that Project Gutenberg is not a good sample. They're free for a start!

Anyway still interesting.

StewShearer:
Researchers from Stony Brook University have created an algorithm that can predict successful books with 84 percent accuracy.

OK, firstly from the study:

"We use the download counts in Gutenberg-catalog as a surrogate to measure the degree of success of novels."

Real world success had absolutely nothing to do with the study. Their measure of success was solely how many times each book was downloaded from Gutenberg. I would say this is a fatal flaw in the study that makes the result essentially meaningless. What they are looking at is not how successful a book was, either in sales or critical reception and public opinion, but simply how well known its name and/or author later became. What they call a successful book is actually just a book that more people go looking for on Project Gutenberg.

After this, they also looked at a couple of prize winning books and a few with low Amazon sales rankings, making that three entirely different measures of success that are in no way comparable.

Secondly, about that "84%" result. They absolutely cannot predict a book's success with 84% accuracy. Out of 15 different ways of classifying text that they tried, one of them managed 84% for one of the eight genres they had split books into. The worst result for that same classification was 57% for a different genre, and its average across all genres was right in the middle (see table 2).
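To see why quoting a best-case figure overstates things, here is a small Python simulation. It is purely hypothetical and does not reproduce the study: it generates 15 x 8 coin-flip "classifiers" with no predictive power at all (the 15 and 8 match the counts above; 100 books per genre is an assumed number) and compares the single best accuracy with the average.

```python
import random

rng = random.Random(42)
METHODS, GENRES = 15, 8      # counts taken from the post above
BOOKS_PER_GENRE = 100        # assumed, for illustration only

def coin_flip_accuracy(n_books):
    """Accuracy of a classifier that guesses at random on n_books."""
    return sum(rng.random() < 0.5 for _ in range(n_books)) / n_books

# One accuracy score per (method, genre) combination.
scores = [coin_flip_accuracy(BOOKS_PER_GENRE) for _ in range(METHODS * GENRES)]
best = max(scores)
mean = sum(scores) / len(scores)
print(f"best: {best:.2f}, mean: {mean:.2f}")
```

Even with guessing alone, the best of 120 tries will sit above the average, so the fair number to report is the average (or a result on fresh data), not the maximum.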

So overall, a poor study that uses a skewed sample and multiple incompatible success criteria, followed by a poor article that doesn't even report what the study said accurately.

vid87:
Sad as that is, the thing with conjunctions explains the success of "No Country For Old Men" to me. The author really loves the word "and" - the average usage per sentence is around 3-4 and I've seen one sentence have 9.

That's a Cormac McCarthy thing. Check Blood Meridian and The Road: he does the same thing, it's his style, and in my opinion the writing makes up for how much it irritates me. I loved No Country for Old Men and The Road, but quit on Blood Meridian. What annoys me the most is how he doesn't indicate when a character is speaking, so you start the line reading it as "just another phrase" when it's actually dialogue, and in Blood Meridian, most of the time, there's not even any ID as to who's saying what... The main character is a kid he calls simply "kid". Now that might be cool until you get to passages where there's a mention of a "young man" or a "child" or whatever; then you start having problems telling whether "kid" refers to the protagonist or the child... Well, I like him and I'm looking forward to The Counselor... All I can say is that reading his stuff requires a clear mind. I quit Blood Meridian and jumped into Doctor Sleep (Stephen King), and it's like a ride in a cozy Disney theme attraction...

MASTACHIEFPWN:
Reminds me of that Roald Dahl story "The Great Automatic Grammatizator". A man in the story believes that the rules of grammar are fixed to certain mathematical principles, and he creates a machine that can create award-winning novels with ease, and with it, he begins to destroy human creativity.

I didn't know Roald Dahl did a biography of Stephen King.
But seriously, this program could've only done so much. Computers don't yet have souls.

800 books is a really small sample size especially when you consider how many books are written each year.

Milanezi:

vid87:
Sad as that is, the thing with conjunctions explains the success of "No Country For Old Men" to me. The author really loves the word "and" - the average usage per sentence is around 3-4 and I've seen one sentence have 9.

What annoys me the most is how he doesn't indicate when a character is speaking, so you start the line reading it as "just another phrase" when it's actually dialogue, and in Blood Meridian, most of the time, there's not even any ID as to who's saying what...

I actually remember having that same problem - I would read a few lines, then lose the pattern of who was speaking and have to re-read everything just to set myself straight.
