Using Google’s Books Ngram Viewer, you can now visualize how language and literature have changed over time by searching a subset of the more than 15 million books that Google has digitized since 2004. All told, today’s datasets contain more than 500 billion words from 5.2 million books in Chinese, English, French, German, Russian, and Spanish.
The datasets contain phrases of up to five words, along with counts of how often each phrase occurred in each year. That gives scholars and casual word hounds alike a great deal of insight into how language usage changes over time. The datasets formed the basis of a research project, led by Harvard University’s Jean-Baptiste Michel and Erez Lieberman Aiden and published today in Science, which demonstrates how quantitative analysis of texts can offer new insights into areas including censorship, technology adoption, and cultural memory.
And now Google has put that visualization tool into everyone’s hands, along with the ability to download the raw data.
Uses so far? Computer. Seven dirty words. Media types, authors, vegetables, vices, and religions. Political ideologies. Main Street vs. Wall Street, inflation vs. deflation, gold vs. oil, and global superpowers. Oh, and there’s a Tumblr for that (and a Twitter hashtag, #ngrams).