Since my last post, you’ve maybe played around with WebChamame a bit. That site is a powerful way to see what’s possible in natural language processing, a set of methods used in linguistics and other social sciences to create statistics from texts. With WebChamame you can adjust a number of settings and output your data into downloadable files, but the site has its limitations. Most prominent among them: it takes a long time to find the bits of information you might be interested in.
Statisticians created the programming language R to overcome that kind of problem, because they need to wrangle data quickly to produce useful information, and the editor RStudio makes working in R more convenient. It is also possible to tokenize premodern Japanese texts in R and RStudio, but doing so requires installing a program for morphological analysis on your own computer. As you might have noticed in the last post, WebChamame uses MeCab for its morphological analyses.
