A second book is notoriously difficult to write. The internet is full of blogs about so-called ‘second novel syndrome.’ Beyond the difficulty of writing them, we also tend to think of sequels as worse in quality than the originals.
But this is often not reflected in Goodreads ratings. I have often found that a sequel scores better on Goodreads than the original, so I decided to investigate. I used a sample of ratings data from 53,000 users on the top 10,000 books on Goodreads. This backed up my hunch: amongst the 1,100 authors with two books in the dataset, the average rating for authors’ second books (3.95) is higher than for their first (3.93). Maybe ‘second novel syndrome’ is just a myth and authors get better with practice?
Potentially, but the headline numbers here can be misleading. The groups of people who rate an author’s first and second novels are different. If you read an author’s first book and did not like it, you are less likely to read their follow-up. We can see this in the data: 31% of people who rated the first book in a series 5 out of 5 on Goodreads read (and rated) the second book, whereas only 8% of people who rated it 1 out of 5 did so.
This can artificially increase the rating of sequels relative to an author’s first book. The group who read the second book is more favourable to the author’s writing than the group who read the first book, and therefore pushes the rating of the second book up.
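The read-through rates quoted above can be computed along the following lines. This is only a minimal sketch on invented toy data, not the code behind the figures in this post: the `(user_id, book_position, rating)` layout and the `follow_through_rates` helper are assumptions made for illustration, and it covers a single author rather than the full dataset.

```python
from collections import defaultdict

# Toy data, invented for illustration: (user_id, book_position, rating),
# where book_position is 1 for the author's first book and 2 for the second.
ratings = [
    ("u1", 1, 5), ("u1", 2, 4),
    ("u2", 1, 5), ("u2", 2, 5),
    ("u3", 1, 5),
    ("u4", 1, 1),
    ("u5", 1, 1),
    ("u6", 1, 3), ("u6", 2, 3),
    ("u7", 1, 3),
]


def follow_through_rates(ratings):
    """Share of first-book raters who also rated the second book,
    grouped by the rating they gave the first book."""
    first = {user: r for user, pos, r in ratings if pos == 1}
    second_readers = {user for user, pos, _ in ratings if pos == 2}

    totals = defaultdict(int)
    continued = defaultdict(int)
    for user, r in first.items():
        totals[r] += 1
        if user in second_readers:
            continued[r] += 1

    return {r: continued[r] / totals[r] for r in totals}


rates = follow_through_rates(ratings)
for rating in sorted(rates):
    print(f"rated first book {rating}: {rates[rating]:.0%} went on to rate the second")
```

With several authors in the data, the grouping key would need to include the author (or series) as well as the user.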
We can adjust for this composition effect by comparing ratings just amongst the group who read both books. When we do this, we see that second novels do not quite live up to authors’ first efforts. The ratings for both books increase, as the people who decide to read both books tend to be more positive towards the author than those who read just one of the two. But the average rating of the first book goes up considerably more. With the adjustment, the average rating for first books is 4.17, whereas it is only 4.03 for second books.
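To make the adjustment concrete, here is a small sketch of the comparison on invented toy data. The data layout and the `average_ratings` helper are hypothetical, chosen for illustration rather than taken from the analysis behind this post. It computes the average rating of each book twice: once over everyone who rated it, and once restricted to users who rated both.

```python
from statistics import mean

# Toy data, invented for illustration: (user_id, book_position, rating),
# where book_position is 1 for the author's first book and 2 for the second.
ratings = [
    ("u1", 1, 5), ("u1", 2, 4),
    ("u2", 1, 4), ("u2", 2, 4),
    ("u3", 1, 2),
    ("u4", 1, 1),
    ("u5", 2, 5),
]


def average_ratings(ratings, both_only=False):
    """Return (mean first-book rating, mean second-book rating).

    With both_only=True, only users who rated both books are counted,
    which removes the composition effect described above.
    """
    first = {u: r for u, pos, r in ratings if pos == 1}
    second = {u: r for u, pos, r in ratings if pos == 2}

    if both_only:
        keep = first.keys() & second.keys()
        first = {u: first[u] for u in keep}
        second = {u: second[u] for u in keep}

    return mean(first.values()), mean(second.values())


raw_first, raw_second = average_ratings(ratings)
adj_first, adj_second = average_ratings(ratings, both_only=True)

print(f"all raters:       first {raw_first:.2f}, second {raw_second:.2f}")
print(f"rated both books: first {adj_first:.2f}, second {adj_second:.2f}")
```

In this toy sample the second book looks better when all raters are counted, but the first book comes out ahead once only users who rated both are compared, which mirrors the pattern described above.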
There are some exceptions to this rule. Dan Brown’s ‘Angels & Demons’ and ‘The Da Vinci Code’ have approximately the same rating before adjustment. When looking just at users who rated both, ‘The Da Vinci Code’ (the second book) is rated more highly than ‘Angels & Demons.’ Unlike in most series, more people read ‘The Da Vinci Code’ but not ‘Angels & Demons’ than vice versa, so in this case it is ‘The Da Vinci Code’ that is dragged down more by the composition effect.
On average, though, users do tend to prefer authors’ first books to their second. This is hidden by the fact that users who already know they like an author are more likely to read the second novel. So the next time you consider skipping the first book in a series because the sequel has a higher rating on Goodreads, think again.
Thanks to Zygmuntz, who scraped the dataset used in this blog. It can be found here. Also, I’ve started an Instagraph here.