Sunday, 1 February 2015

Body Breakdown

For our assignment this week, we were asked to play around with a few tools which can help quantify the written word. I chose to work with the Type-Token Ratio (TTR) analysis tools, and used them for a small comparison of some well-known novels from two genres - Fantasy and Science Fiction. Essentially, this means I am looking at how often the author repeats words. A set of characters between spaces (usually a word) is recognized as a token, and the first time a given token appears, it is also recorded as a type. Repeated words count as tokens, but do not create new types, so the ratio of types to tokens gives you some idea of how often words are repeated.
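To make the counting concrete, here is a minimal sketch of the ratio of types to tokens described above. It is not the tool I used for the assignment; the function name type_token_ratio and the max_tokens parameter are my own inventions for illustration, and it assumes the simplest possible rules: split on whitespace only, with no lowercasing and no stripping of punctuation. A real tool may well normalize the text differently and so give different counts.

```python
def type_token_ratio(text, max_tokens=None):
    """Return (types, tokens, ratio) for a piece of text.

    Assumption: a token is any run of characters between whitespace,
    kept exactly as written (case and punctuation included).
    """
    tokens = text.split()
    # Optionally look only at the first max_tokens tokens, so that
    # books of different lengths can be compared on equal footing.
    if max_tokens is not None:
        tokens = tokens[:max_tokens]
    if not tokens:
        return 0, 0, 0.0
    # Each distinct token counts once as a type.
    types = set(tokens)
    return len(types), len(tokens), len(types) / len(tokens)


sample = "the cat sat on the mat and the dog sat too"
n_types, n_tokens, ratio = type_token_ratio(sample)
print(n_types, n_tokens, f"{ratio:.1%}")  # 8 11 72.7%
```

With a whole novel loaded into a string, passing max_tokens=20000 would restrict the count to the first 20,000 tokens, which is the sample size used in the table at the end of this post.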

I did not deliberately select these novels to try to make any particular point. They were essentially the first novels that came to mind and were easily obtainable... otherwise, Dune would have made the Sci-Fi list, and I would have worked with A Game of Thrones instead of A Dance with Dragons. Though I can't imagine there being any legal issues with this analysis, it's worth noting that I actually own physical copies of all these books.

After selecting these six, I predicted that Tolkien and Martin would have the highest TTR, especially when working with a limited set of tokens so that the length of their books does not work against them. It seems that I could hardly have been more wrong! The two of them are both closer to Card's relatively easy Ender's Game than they are to Snow Crash or Neuromancer. It is difficult to fathom why this might be the case. After all, both genres are more than happy to indulge in making up new words. Perhaps the length of a book affects more than just the TTR consideration for the whole piece - a longer book is often paced quite differently. The author may spend more time detailing any given location when there are 800 pages to go, while an author trying to fit their whole story in under 300 won't spend precious time informing the reader about dust motes floating in a sunbeam.

It is, of course, worth noting that TTR is in no way an indication of the quality of a book. It can give some idea regarding the reading level of a particular book, but I would certainly not say that Neuromancer is the most difficult book in this list. Really, they're all fantastic and you should read them all.

Novel and Author                                      Types    Tokens    TTR
Fantasy
C.S. Lewis' The Lion, The Witch, and the Wardrobe     2,129    20,000    10.6%
J.R.R. Tolkien's The Fellowship of the Ring           3,040    20,000    15.2%
George R.R. Martin's A Dance with Dragons             3,430    20,000    17.2%
Science Fiction
Orson Scott Card's Ender's Game                       2,647    20,000    13.2%
Neal Stephenson's Snow Crash                          4,313    20,000    21.6%
William Gibson's Neuromancer                          4,401    20,000    22.0%
