Text Mining

Installing MeCab and RMeCab on a Mac

Since my last post, you may have played around with WebChamame a bit. The site is a powerful way to see what’s possible in natural language processing, a set of methods used in linguistics and other social sciences to produce statistics from texts. WebChamame lets you adjust a number of settings and export your data as downloadable files, but the site has its limitations. Most prominent among them: it takes a long time to find the bits of information you might be interested in.

Statisticians created the programming language R to overcome that problem, because they need to wrangle data quickly to produce useful information; the editor RStudio makes working in R more convenient. It is also possible to tokenize premodern Japanese texts in R and RStudio, but doing so requires installing a program for morphological analysis on your own computer. As you might have noticed in the last post, WebChamame uses MeCab for its morphological analyses.

This is an illustration from Tsutsumi and Ogiso (2015) of WebChamame’s workflow.