
Text Visualization

History Flow

History Flow is a tool created by Fernanda Viegas and Martin Wattenberg as part of the IBM Collaborative User Experience Research Group. Viegas and Wattenberg’s creation visualizes “dynamic, evolving documents and the interactions of multiple collaborating authors. In its current implementation, history flow is being used to visualize the evolutionary history of wiki* pages on Wikipedia” (history flow home page). The tool works by color coding edits according to the user who makes the changes. The result is a richly detailed visual overview of the life of a page. History Flow’s outputs allow visual analysis of issues critical to the credibility of Wikipedia, such as collaboration, vandalism, and edit wars.

Starter Links: History Flow home page | IBM Collaborative User Experience home page | Wikipedia


CollageMachine

Web-recombiner program that applies Andruid Kerne’s theory of “interface ecology.”

Created by Andruid Kerne and the Interface Ecology Lab at Texas A&M, CollageMachine allows users to explore a recombinant information space, where different web elements surface, blend, and adapt to their browsing. The program automatically seeks out and imports media elements of interest and continuously streams these elements into the user’s field of view. Thus, the user is able to locate information and to generate conceptual links that may not have been possible with a traditional web browser. CollageMachine has been further developed in the Interface Ecology Lab as combinFormation, an agent-driven tool that can be used online to build collage-style combinations of visual and textual scraps from web sites, allowing the user to then rearrange and reprioritize the found data to facilitate the discovery of relations.

Starter Links: Andruid Kerne’s home page | combinFormation | Interface Ecology Website (Texas A&M)

Transliteracies Research Report By Nicole Starosielski

Text Analysis Portal for Research (TAPoR)

TAPoR is a web portal for analyzing digital text. The project is based out of McMaster University and includes a project team from six different Canadian universities.

Created by a consortium of Canadian universities, TAPoR is a collection of online text-analysis tools, ranging from the basic to the sophisticated, that allows users to run search, statistical, collocation, extraction, aggregation, visualization, hypergraph, transformation, and other “tools” on texts. (The site comes seeded with prepared texts, but users can sign up for a free account and input their own.) TAPoR allows tools to be mixed and matched in a mashup-style “workbench.” Particularly impressive is the “recipes” page, which suggests step by step how tools can be combined for particular purposes: for example, to identify themes, analyze colloquial word use, visualize text, explore changes in language use by a writer, create an online interactive bibliography, build a social network map from text, or create a chronological timeline from bibliographical text. As regards the general philosophy of TAPoR, which descends from the mature computational-linguistics side of humanities computing (the oldest use of computers for the humanities), project developer Geoffrey Rockwell says at the beginning of his article for the Text Analysis Developers Alliance (TADA) entitled “What is Text Analysis?”:

“Text analysis tools aid the interpreter asking questions of electronic texts: Much of the evidence used by humanists is in textual form. One way of interpreting texts involves bringing questions to these texts and using various reading practices to reflect on the possible answers. Simple text analysis tools can help with this process of asking questions of a text and retrieving passages that help one think through the questions. The computer does not replace human interpretation, it enhances it…. (1) Text-analysis tools break a text down into smaller units like words, sentences, and passages, and then (2) Gather these units into new views on the text that aid interpretation.”

Starter Links: TAPoR | TAPoR Recipes | Text Analysis Developers Alliance (TADA)

Peter Cho, “Typotopo”

Typotopo is a website that collects the typographical and topographical work of Peter Cho.

This site represents the space where typography and topography overlap: explorations of type in virtual environments, experiments in mapping, and innovations in textual display. TYPOTOPO examines how the act of reading evolves when letters and words, viewed both as text and image, are placed in interactive and dynamic environments. TYPOTOPO explores typographic information spaces and the possibilities for playful, expressive letterforms (Typotopo).

Starter Links: Typotopo | Peter Cho’s home page

Transliteracies Research Report By Kate Marshall

Self Organizing Maps

“The SOM is an algorithm used to visualize and interpret large high-dimensional data sets. Typical applications are visualization of process states or financial results by representing the central dependencies within the data on the map.

The map consists of a regular grid of processing units, “neurons”. A model of some multidimensional observation, eventually a vector consisting of features, is associated with each unit. The map attempts to represent all the available observations with optimal accuracy using a restricted set of models. At the same time the models become ordered on the grid so that similar models are close to each other and dissimilar models far from each other.”
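The training loop behind the quoted description can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code behind any map mentioned here: the names train_som and best_matching_unit, the linearly decaying learning rate, and the Gaussian neighbourhood are choices made for brevity.

```python
import math
import random


def best_matching_unit(grid, x):
    """Return the grid position whose model vector is closest to observation x."""
    return min(grid, key=lambda u: sum((a - b) ** 2 for a, b in zip(grid[u], x)))


def train_som(data, rows, cols, epochs=200, lr0=0.5, seed=0):
    """Train a small SOM; returns a dict mapping (row, col) -> model vector."""
    rng = random.Random(seed)
    dim = len(data[0])
    # One randomly initialised model vector per grid unit ("neuron").
    grid = {(r, c): [rng.random() for _ in range(dim)]
            for r in range(rows) for c in range(cols)}
    radius0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                      # assumed linear decay
        radius = max(radius0 * (1.0 - frac), 0.5)    # neighbourhood shrinks over time
        for x in rng.sample(data, len(data)):        # visit observations in random order
            bmu = best_matching_unit(grid, x)
            for (r, c), w in grid.items():
                d2 = (r - bmu[0]) ** 2 + (c - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * radius ** 2))  # units near the BMU move most
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
    return grid


if __name__ == "__main__":
    observations = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                    [1.0, 1.0], [0.9, 1.0], [1.0, 0.9]]
    som = train_som(observations, rows=3, cols=3)
    for obs in observations:
        print(obs, "->", best_matching_unit(som, obs))
```

Because neighbouring units are pulled toward the same observations, similar models end up close together on the grid, which is the ordering property the quotation describes.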

The map at the following link, made by Andre Skupin, is a strong application of the SOM idea:

In Terms of Geography

“Description of Content:
Visualization of the geographic knowledge domain based on more than 22,000 conference abstracts submitted to the Annual Meeting of the Association of American Geographers (1993-2002). Landscape features express the degree of topical focus, with elevated areas corresponding to more well-defined, topical regions and low-lying areas corresponding to a mingling of various topics. Dominant terms are used as labels for topical regions.
Description of Unique Features:
The most unique aspect of this visualization is its combination of intense computation with geographic metaphors and cartographic design considerations. From a computational perspective, the use of a self-organizing map consisting of a large number of neurons (10,000) is fairly unique. The final map presented here aims to explore how far we can go in the design of map-like information visualizations. Its use of a range of label sizes (from very large to very small) on a large-format map and the omission of a legend are aimed at challenging traditional notions of interactivity, by encouraging viewers to vary their distance from the map and instigating discussion.”


TokenX

Online tool by Brian L. Pytlik Zillig.

“TokenX: a text visualization, analysis, and play tool” (from the project web site), is an online interface based out of the University of Nebraska’s Center for Digital Research in the Humanities that allows the user to view web page components or file components in alternative organizational formats.

Starter Links: TokenX

US Presidential Speeches Tag Cloud

Program by Chirag Mehta that tracks the frequency of word usage in presidential speeches dating back to 1776.

“The above tag cloud shows the popularity, frequency, and trends in the usages of words within speeches, official documents, declarations, and letters written by the Presidents of the US between 1776 – 2006 AD. The dataset consists of over 360 documents downloaded from Encyclopedia Britannica and ThisNation.com. Once the documents have been dated and converted to plain-text, my tag-cloud-generation script goes through every text chronologically and makes a list of all the unique words that have been used and counts how many times each word is used.” (From the project web site.)
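The counting step described in the quotation might look roughly like the following Python sketch. It is not Mehta's script: the function names, the tokenizing pattern, and the linear font-size scaling are assumptions made for illustration.

```python
import re
from collections import Counter


def word_counts_by_year(documents):
    """documents: iterable of (year, plain_text). Returns {year: Counter}, oldest first."""
    counts = {}
    for year, text in sorted(documents, key=lambda d: d[0]):
        words = re.findall(r"[a-z']+", text.lower())  # crude tokenizer (assumption)
        counts.setdefault(year, Counter()).update(words)
    return counts


def tag_sizes(counter, min_pt=10, max_pt=40, top=5):
    """Map the `top` most frequent words to font sizes between min_pt and max_pt."""
    common = counter.most_common(top)
    if not common:
        return {}
    hi, lo = common[0][1], common[-1][1]
    span = max(hi - lo, 1)  # avoid dividing by zero when all counts are equal
    return {w: min_pt + (n - lo) * (max_pt - min_pt) // span for w, n in common}


if __name__ == "__main__":
    docs = [(1863, "of the people, by the people, for the people"),
            (1776, "We hold these truths to be self-evident")]
    by_year = word_counts_by_year(docs)
    for year, counter in by_year.items():
        print(year, tag_sizes(counter, top=3))
```

A real tag cloud would normally also drop very common function words before sizing; that filtering is left out here to keep the sketch short.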

Starter Links: Home Page

Thinkmap Visual Thesaurus

Thinkmap is a unique implementation of a dictionary and thesaurus as an interactive visual display.

“The Visual Thesaurus is a dictionary and thesaurus with an intuitive interface that encourages exploration and learning. Available in both a Desktop Edition and an Online Edition, the Visual Thesaurus is a marvelous way to improve your vocabulary and your understanding of the English language.” (from The Visual Thesaurus Product Overview)

Starter Links: Visual Thesaurus Homepage | Thinkmap (Designers)

Turning the Pages

British Library project that allows the online user to view items held in the Library’s special collections.

“Turning the Pages is the award-winning interactive program that allows museums and libraries to give members of the public access to precious books while keeping the originals safely under glass. Initially developed by and for the British Library, it is now available as a service for institutions and private collectors around the world.

Turning the Pages allows visitors to virtually ‘turn’ the pages of manuscripts in a realistic way, using touch-screen technology and interactive animation. They can zoom in on the high-quality digitised images and read or listen to notes explaining the beauty and significance of each page. There are other features specific to the individual manuscripts. In a Leonardo da Vinci notebook, for example, a button turns the text round so visitors can read his famous ‘mirror’ handwriting.” (From the Project’s web site.)

Starter Links: Turning the Pages | BBC article on their digitization of Mozart’s diaries as part of the project | BBC article about their digitization of what was to become Alice in Wonderland


RefViz

Software program that organizes large lists of citations to help researchers sort the terrain of the literature.

“With this powerful text analysis and visualization software program, you get an intuitive framework for exploring reference collections based on content. RefViz provides an at-a-glance overview and reveals trends and associations in references—now you can retain important references otherwise lost when narrowing a search or skimming a list.” (from the Product Info page.)

Starter Links: RefViz


txtkit

Application that attempts to visualize reading practices through an analysis of reader data.

txtkit is an Open Source visual text mining tool for exploring large amounts of multilingual texts. It’s a multiuser-application which mainly focuses on the process of reading and reasoning as a series of decisions and events. To expand this single perspective activity txtkit collects all of the users’ mining data and uses them to create content recommendations through collaborative filtering. The software requires Mac OS X 10.3 and Internet access.

“...The txtkit interface is divided into two parts: txtshell (shell interface) and txtvbot (visual bot). txtshell provides several commands to browse, to read and to select text, whilst txtvbot displays the user activity in real-time. The visualization is based on the users actions, statistical information about the data as well as collaborative filtering schemes. That is the reason why the complexity of its visual output is according to the increasing number of users! You can use txtvbot and txtshell individually, but through an alternating perception you will merge visual and textual cognition processes in order to empower abductive reasoning in digital contexts.” (from the txtkit website).

Starter Links: txtkit | Do-It-Yourself Parsing

W. Bradford Paley, TextArc

Text-visualization and -analysis tool that processes texts (e.g., a novel) to give an overview of networks of repeated words and where repetitions occur; also retains the original text in readable form:

“A TextArc is a visual representation of a text—the entire text (twice!) on a single page. A funny combination of an index, concordance, and summary; it uses the viewer’s eye to help uncover meaning. Here are more detailed overviews of the interactive work and the prints.” (from the TextArc web site.)

Starter Links: TextArc | W. Bradford Paley’s Home Page

Transliteracies Research Report By Katrina Kimport

Marumushi.com, Newsmap

A visual representation of the Google News aggregator’s shifting status.

Newsmap is an application that visually reflects the constantly changing landscape of the Google News news aggregator. A treemap visualization algorithm helps display the enormous amount of information gathered by the aggregator. Treemaps are traditionally space-constrained visualizations of information. Newsmap’s objective takes that goal a step further and provides a tool to divide information into quickly recognizable bands which, when presented together, reveal underlying patterns in news reporting across cultures and within news segments in constant change around the globe.

“Newsmap does not pretend to replace the googlenews aggregator. Its objective is to simply demonstrate visually the relationships between data and the unseen patterns in news media. It is not thought to display an unbiased view of the news, on the contrary it is thought to ironically accentuate the bias of it.” (from the Newsmap web site.)
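To make the treemap idea concrete, here is a minimal Python sketch of the simplest layout variant, usually called slice-and-dice, in which a rectangle is divided into strips whose areas are proportional to item weights. It is not Newsmap's own layout code; the function name, the tuple format, and the sample section weights are assumptions for illustration, and weights are assumed to be positive.

```python
def slice_and_dice(items, x, y, w, h, horizontal=True):
    """Split the rectangle (x, y, w, h) into strips proportional to item weights.

    items: list of (label, weight) with positive weights.
    Returns a list of (label, x, y, w, h) rectangles.
    """
    total = sum(weight for _, weight in items)
    rects = []
    offset = 0.0
    for label, weight in items:
        share = weight / total
        if horizontal:
            # Strips are laid side by side along the x axis.
            rects.append((label, x + offset, y, w * share, h))
            offset += w * share
        else:
            # Strips are stacked along the y axis.
            rects.append((label, x, y + offset, w, h * share))
            offset += h * share
    return rects


if __name__ == "__main__":
    # Hypothetical story counts per news section.
    sections = [("world", 6), ("sports", 3), ("technology", 1)]
    for rect in slice_and_dice(sections, 0, 0, 100, 50):
        print(rect)
```

Calling the function again on each resulting rectangle with the opposite orientation yields nested bands, for example sections subdivided into individual stories.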

Starter Links: newsmap | Marumushi.com | Google News

Zachary Lieberman, Intersection: a Study in Typographic Space

Interactive online site that allows the user to visualize representations of letter conglomerations in three-dimensional space.

“This project is an exploration in how type forms intersect when projected along the x, y, and z axis in 3D space. The intersections of those projections are used to define new 3D shapes and hybrid letterforms.” (from the intersection web site.)

Starter Links: intersection | thesystemis

David Link, Poetry Machine 1.0

An interactive installation that generates texts through a combination of user input and autonomous web crawlers (web bots).

“The interactive installation operates with a keyboard as interface, an Internet connection and two video displays. Poetry Machine is a word processor that extracts associations. The sources of information for this self-composing poetry machine are the gigantic pools of information on the Internet. When a word is typed that is as yet unknown to the poetry machine, the program will send out autonomous “bots” to the Internet to collect texts in which the word in question occurs. This action of the bots, searching sites and documents, can be watched on a plasma screen by the side of the installation. In this interaction of machine words and human text, Poetry Machine creates a new écriture automatique, where language is no longer the exclusive domain of human thought but also that of the internal logic of computers.” (from the project description page on the Media Art Net web site).

Starter Links: Poetry Machine 1.0 | Medien Kunst Netz / Media Art Net | Poetry Machine 1.5

PieSpy Social Network Bot: Inferring and Visualizing Social Networks on IRC

Software from Jibble.org to visualize the social network created in an IRC channel:

“PieSpy is an IRC bot that monitors a set of IRC channels. It uses a simple set of heuristics to infer relationships between pairs of users. These inferences allow PieSpy to build a mathematical model of a social network for any channel. These social networks can be drawn and used to create animations of evolving social networks. PieSpy has also been used to visualize Shakespearean social networks.” (from Jibble.org’s PieSpy site)
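One plausible heuristic of the kind the quotation mentions is direct addressing: when a message begins with another user's nickname, the two users are assumed to be related. The Python sketch below illustrates that single heuristic only; it is not PieSpy's code, and the log format, the function name infer_edges, and the regular expression are assumptions.

```python
import re
from collections import Counter

# Assumed log format: "<sender> target: message" or "<sender> target, message".
ADDRESS = re.compile(r"^<(\w+)>\s+(\w+)[:,]")


def infer_edges(log_lines, nicks):
    """Return a Counter mapping frozenset({a, b}) to the number of direct addresses."""
    edges = Counter()
    for line in log_lines:
        match = ADDRESS.match(line)
        if not match:
            continue
        sender, target = match.group(1), match.group(2)
        # Count only known nicknames, and ignore users addressing themselves.
        if target in nicks and target != sender:
            edges[frozenset((sender, target))] += 1
    return edges


if __name__ == "__main__":
    log = [
        "<alice> bob: did you see the new build?",
        "<bob> alice, yes, it works",
        "<carol> anyone here?",
        "<alice> bob: great",
    ]
    for pair, weight in infer_edges(log, {"alice", "bob", "carol"}).items():
        print(sorted(pair), weight)
```

The resulting weighted pairs form an undirected graph that could be handed to any graph-drawing routine; edge weights here simply count how often two users addressed each other.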

Starter Links: Jibble.org PieSpy page | Visualization of the social network implied in Shakespeare’s Antony and Cleopatra, treated as a communicational network

Nora Project

Project to create text-mining, pattern-recognition, and visualization software to enable the discovery of significant patterns across large digital text archives:

“In search-and-retrieval, we bring specific queries to collections of text and get back (more or less useful) answers to those queries; by contrast, the goal of data-mining (including text-mining) is to produce new knowledge by exposing unanticipated similarities or differences, clustering or dispersal, co-occurrence and trends. Over the last decade, many millions of dollars have been invested in creating digital library collections: at this point, terabytes of full-text humanities resources are publicly available on the web. Those collections, dispersed across many different institutions, are large enough and rich enough to provide an excellent opportunity for text-mining, and we believe that web-based text-mining tools will make those collections significantly more useful, more informative, and more rewarding for research and teaching.” (from Nora project description)

Starter Links: Nora Project home page

pStruct: The Social Life of Data

Self-organizing graphing program for visualizing large bodies of data, including Web forum posts; being developed at UCSB’s Four Eyes Lab:

“pStruct enables content to organize itself dynamically, based on similarities to other pieces of data, as well as users’ interaction with the forum. The result is an unstructured graph that responds in life-like ways to the interaction of data and users…. pStruct is built on a multithreaded Java architecture designed to maintain system responsiveness when faced with hundreds of users and millions of pieces of content. Every post to the forum is stored in a database for archival purposes. A subset of the posts are kept in memory as ‘live’ content which users are presented with and can interact with. When a post is no longer live, it is saved to the database for later retrieval. Each live entity runs as a separate thread, maintaining connections to other entities (posts, users, etc.), responding to requests and seeking out new relationships. While pStruct is currently built to act as a web forum backend, the architecture is general enough to allow for management of any data storage and content retrieval system.” (from UCSB Four Eyes Lab site)

Starter Links: UCSB Four Eyes Lab description of pStruct

Vectors

Experimental online journal from the Annenberg Center for Multimedia Literacy:

“This investigation at the intersection of technology and culture is not simply thematic. Rather, Vectors is realized in multimedia, melding form and content to enact a second-order examination of the mediation of everyday life. Utilizing a peer-reviewed format and under the guidance of an international board, Vectors will feature submissions and specially-commissioned works comprised of moving- and still-images; voice, music, and sound; computational and interactive structures; social software; and much more. Vectors doesn’t seek to replace text; instead, we encourage a fusion of old and new media in order to foster ways of knowing and seeing that expand the rigid text-based paradigms of traditional scholarship. In so doing, we aim to explore the immersive and experiential dimensions of emerging scholarly vernaculars. ” (from Vectors site)

Starter Links: Vectors

Transliteracies Research Report By Jessica Pressman


Juxta

Java-based tool from the NINES project and Applied Research in Patacriticism for the comparison of texts:

“Juxta is a cross-platform tool for collating and analyzing any kind or number of textual objects. The tool can set any textual witness as the base text and can filter white space and/or punctuation. It has several kinds of visualizations, including a heat map of textual differences and a histogram that can expose the filtering results. When collations are being executed, Juxta keeps the textual transcriptions keyed to any digital images that may stand behind the transcriptions as their documentary base. Juxta also allows the collations and analyses to be annotated and saved for further use.” (from Juxta site)
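The core collation step described in the quotation, comparing a witness against a base text while optionally filtering white space and punctuation, can be approximated with Python's standard difflib module. This is a rough sketch only, not Juxta's Java implementation; the function names normalize and collate and the word-level granularity are assumptions.

```python
import difflib
import string


def normalize(text, ignore_punctuation=True):
    """Split text into word tokens; split() also collapses runs of white space."""
    if ignore_punctuation:
        text = text.translate(str.maketrans("", "", string.punctuation))
    return text.split()


def collate(base, witness, ignore_punctuation=True):
    """Return (tag, base_words, witness_words) for each difference from the base text."""
    a = normalize(base, ignore_punctuation)
    b = normalize(witness, ignore_punctuation)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [(tag, a[i1:i2], b[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]


if __name__ == "__main__":
    base_text = "The quick, brown fox jumps."
    witness_text = "The quick brown   dog jumps"
    for difference in collate(base_text, witness_text):
        print(difference)
```

Counting the differences that fall in each passage of the base text would give the raw numbers behind a heat map or histogram of the kind the quotation mentions.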

Starter Links: Juxta home | NINES Tools & Interfaces

Franco Moretti, Graphs, Maps, Trees: Abstract Models for a Literary History (Verso, 2005)

Book demonstrating Moretti’s quantitative, “distant reading” (rather than close reading) approach to novels:

“Professor Franco Moretti argues heretically that literature scholars should stop reading books and start counting, graphing, and mapping them instead. He insists that such a move could bring new luster to a tired field, one that in some respects is among “the most backwards disciplines in the academy.” Literary study, he argues, has been random and unsystematic. For any given period scholars focus on a select group of a mere few hundred texts: the canon. As a result, they have allowed a narrow distorting slice of history to pass for the total picture. Moretti offers bar charts, maps, and time lines instead, developing the idea of “distant reading,” set forth in his path-breaking essay “Conjectures on World Literature,” into a full-blown experiment in literary historiography, where the canon disappears into the larger literary system. Charting entire genres—the epistolary, the gothic, and the historical novel—as well as the literary output of countries such as Japan, Italy, Spain, and Nigeria, he shows how literary history looks significantly different from what is commonly supposed and how the concept of aesthetic form can be radically redefined.” (from publisher’s blurb)

Starter Links & References: Verso, 2005 (ISBN: 1844670260) | Publisher’s blurb for the book | Inside Higher Ed review