Announcement: Text Analysis Tools

Text and content analysis tools.

History Flow

History Flow is a tool created by Fernanda Viégas and Martin Wattenberg as part of the IBM Collaborative User Experience Research Group. It visualizes “dynamic, evolving documents and the interactions of multiple collaborating authors. In its current implementation, history flow is being used to visualize the evolutionary history of wiki* pages on Wikipedia” (history flow home page). The tool works by color coding edits according to the user who made them. The result is a richly detailed visual overview of the life of a page. History Flow’s outputs allow visual analysis of issues critical to the credibility of Wikipedia, such as collaboration, vandalism, and edit wars.
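The idea of crediting each part of a page to the user who last changed it can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not History Flow's actual algorithm: it works line by line using the standard-library difflib module, and the function name attribute_lines and the sample revisions are invented for the example.

```python
from difflib import SequenceMatcher


def attribute_lines(revisions):
    """Return [(line, author)] for the latest revision.

    `revisions` is a list of (author, text) pairs, oldest first.
    Unchanged lines keep their earlier author; new or rewritten lines
    are credited to the author of the revision that introduced them.
    """
    attributed = []  # [(line, author)] for the revision processed so far
    for author, text in revisions:
        new_lines = text.splitlines()
        old_lines = [line for line, _ in attributed]
        matcher = SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
        updated = []
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                # Surviving lines keep their original author.
                updated.extend(attributed[i1:i2])
            elif tag in ("replace", "insert"):
                # New or rewritten lines belong to the current author.
                updated.extend((line, author) for line in new_lines[j1:j2])
            # "delete": the lines are gone, so nothing is carried forward.
        attributed = updated
    return attributed


# Hypothetical revision history of a short wiki page.
revisions = [
    ("alice", "Wikis are editable.\nAnyone can contribute."),
    ("bob", "Wikis are editable.\nAnyone can contribute.\n"
            "Vandalism is usually reverted."),
    ("carol", "Wikis are collaboratively editable.\nAnyone can contribute.\n"
              "Vandalism is usually reverted."),
]

result = attribute_lines(revisions)
for line, author in result:
    print(f"{author:>6} | {line}")
```

Mapping each author to a color and drawing one column per revision would give a rough, text-only analogue of the bands that History Flow displays.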

WordNet

Accessed online or in downloadable form, WordNet allows users to tap intelligently into “a large lexical database of English” for the purpose of exploring concepts and their interrelations.

“Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. Synsets are interlinked by means of conceptual-semantic and lexical relations. The resulting network of meaningfully related words and concepts can be navigated…. WordNet’s structure makes it a useful tool for computational linguistics and natural language processing.” In essence, WordNet can be conceived of as an extremely high-powered, interactive thesaurus that facilitates the rapid pursuit of conceptual relations and affiliations, a kind of “rapid prototyping” of language-based concepts. While reading a poem, for instance, one might use WordNet to explore the author’s choice of a particular word by seeing the word cocooned within a structured universe of alternative and related “synsets.” WordNet was developed at Princeton University by a team led by George A. Miller, Professor of Psychology, Emeritus.
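To make the notion of interlinked synsets concrete, here is a minimal sketch in Python. It does not use the WordNet database itself: the handful of synsets, their glosses, and the helper names synsets_for and hypernym_path are all toy assumptions made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    name: str      # identifier for the concept (toy naming scheme)
    lemmas: list   # words that can express this concept
    gloss: str     # short definition
    hypernyms: list = field(default_factory=list)  # names of more general synsets


# Toy data, NOT taken from the real WordNet database.
SYNSETS = {
    s.name: s
    for s in [
        Synset("entity.n", ["entity"], "something that exists"),
        Synset("animal.n", ["animal", "beast", "creature"],
               "a living organism that can move", ["entity.n"]),
        Synset("bird.n", ["bird"], "a feathered, egg-laying vertebrate",
               ["animal.n"]),
        Synset("raven.n", ["raven"], "a large black bird", ["bird.n"]),
        Synset("raven.v", ["raven", "prey", "devour"],
               "seize or eat greedily"),
    ]
}


def synsets_for(word):
    """Return every synset in which `word` appears as a lemma."""
    return [s for s in SYNSETS.values() if word in s.lemmas]


def hypernym_path(name):
    """Follow the first hypernym link upward until none remains."""
    path = [name]
    while SYNSETS[path[-1]].hypernyms:
        path.append(SYNSETS[path[-1]].hypernyms[0])
    return path


for synset in synsets_for("raven"):
    print(synset.name, "|", ", ".join(synset.lemmas), "|", synset.gloss)
print(" -> ".join(hypernym_path("raven.n")))
```

A reader puzzling over a poet's use of "raven" could, in the same spirit, list every sense the word belongs to and then walk outward to neighboring and more general concepts.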

Text Analysis Portal for Research (TAPoR)

TAPoR is a web portal for analyzing digital text. The project is based at McMaster University and includes a project team drawn from six Canadian universities.

Created by a consortium of Canadian universities, TAPoR is a collection of online text-analysis tools, ranging from the basic to the sophisticated, that allows users to run search, statistical, collocation, extraction, aggregation, visualization, hypergraph, transformation, and other “tools” on texts. (The site comes seeded with prepared texts, but users can sign up for a free account and input their own.) TAPoR allows tools to be mixed and matched in a mashup-style “workbench.” Particularly impressive is the “recipes” page, which in step-by-step fashion suggests ways that tools can be combined for particular purposes (e.g., identify themes, analyze colloquial word use, visualize text, explore changes in language use by a writer, create an online interactive bibliography, build a social network map from text, or create a chronological timeline from bibliographical text). As regards the general philosophy of TAPoR, which descends from the mature computational-linguistics side of humanities computing (the oldest use of computers for the humanities), project developer Geoffrey Rockwell says at the beginning of his article for the Text Analysis Developers Alliance (TADA) entitled “What is Text Analysis?”:

“Text analysis tools aide the interpreter asking questions of electronic texts: Much of the evidence used by humanists is in textual form. One way of interpreting texts involves bringing questions to these texts and using various reading practices to reflect on the possible answers. Simple text analysis tools can help with this process of asking questions of a text and retrieving passages that help one think through the questions. The computer does not replace human interpretation, it enhances it…. (1) Text-analysis tools break a text down into smaller units like words, sentences, and passages, and then (2) Gather these units into new views on the text that aide interpretation.”
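The two steps Rockwell describes, breaking a text into units and gathering those units into new views, can be illustrated with a short Python sketch. It is not TAPoR code: the function names, the simple regular-expression tokenizer, and the sample sentence are assumptions chosen for the example. The two views shown are a word-frequency count and a keyword-in-context concordance.

```python
import re
from collections import Counter


def tokenize(text):
    """Step 1: break the text into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(tokens):
    """Step 2a: gather the tokens into a frequency view."""
    return Counter(tokens)


def concordance(tokens, keyword, width=2):
    """Step 2b: gather each occurrence of `keyword` with its context."""
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{token}] {right}".strip())
    return lines


# Hypothetical sample text.
text = "The sea was calm. The sea was grey. We watched the sea until dark."
tokens = tokenize(text)
frequencies = word_frequencies(tokens)

print(frequencies.most_common(3))
for line in concordance(tokens, "sea"):
    print(line)
```

Neither view interprets anything on its own; each simply rearranges the text so that a reader can bring a question to it, such as how "sea" is described each time it appears.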

Coh-Metrix

The Coh-Metrix Project is run by the Institute for Intelligent Systems at the University of Memphis. The project uses two computer programs, Coh-Metrix and CohGIT, to assess the difficulty of a given text. Coh-Metrix analyzes a text for its overall “cohesion,” a major factor in textual coherence. CohGIT pinpoints the areas of a text where gaps in cohesion occur. The goal of the project is to give writers and educators the ability to match texts with appropriate target audiences.

“How do you know if something you’ve written is too difficult for your intended audience? How can you tell if your writing makes sense – for the reader you have in mind? Recent advances in the areas of cognitive science, computational linguistics, educational research, and computer science are guiding us toward answers to these questions. These answers are coming to life within a web-based text analysis tool called Coh-Metrix. Using advanced technologies, Coh-Metrix will allow readers, writers, educators, and researchers to instantly gauge the difficulty of written material, based on the target audience. Moreover, CohGIT, our cohesion gap identification tool, will pinpoint where potential problems are hiding within a text.

“The potential contributions of Coh-Metrix and Coh-GIT are innumerable. This project will benefit writers, editors, researchers, and policy makers. Our overarching goal is to develop methods and standards for improving academic textbooks, thus improving students’ ability to understand and learn difficult course material” (from Coh-Metrix Project website).
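As a rough intuition for what measuring cohesion and locating cohesion gaps might involve, the Python sketch below scores how many content words each sentence shares with the one before it and flags low-scoring transitions. This is a deliberately crude proxy and an assumption of this example, not the method used by Coh-Metrix or CohGIT; the stopword list, the 0.1 threshold, and all function names are invented for illustration.

```python
import re

# Tiny illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"a", "an", "and", "in", "is", "it", "of", "the", "these",
             "this", "to", "when"}


def split_sentences(text):
    """Split on whitespace that follows sentence-ending punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(sentence):
    """Lowercase words in the sentence, minus stopwords."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return {w for w in words if w not in STOPWORDS}


def adjacent_overlap(text):
    """Jaccard overlap of content words for each adjacent sentence pair."""
    sentences = split_sentences(text)
    scores = []
    for previous, current in zip(sentences, sentences[1:]):
        a, b = content_words(previous), content_words(current)
        union = a | b
        scores.append(len(a & b) / len(union) if union else 0.0)
    return sentences, scores


def cohesion_gaps(text, threshold=0.1):
    """Return (sentence_index, sentence) where overlap falls below threshold."""
    sentences, scores = adjacent_overlap(text)
    return [(i + 1, sentences[i + 1])
            for i, score in enumerate(scores) if score < threshold]


# Hypothetical sample passage.
sample = ("Cells store energy in molecules. "
          "These molecules release energy when the cells need it. "
          "Volcanoes erupt along plate boundaries.")

sentences, scores = adjacent_overlap(sample)
gaps = cohesion_gaps(sample)
print(scores)
print(gaps)
```

Word overlap misses pronouns, synonyms, and causal connectives, all of which contribute to cohesion, so a real tool would need far richer linguistic analysis than this.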

ConceptNet

ConceptNet is a software program that culls meaning from searchable text.

“ConceptNet focuses on semantic meaning in a text, analyzing concepts and the contexts in which they are found, offering a unique approach compared to traditional keyword or statistical evaluations of texts.

“ConceptNet has been used as the basis for several programs designed to distill particular meanings from texts (affect, for example) and provide intelligent feedback about the text’s content.” (From Katrina Kimport’s research report.)

