Announcement: Search & Data Mining Innovations

Search and data mining technology innovations with implications for the future of online reading.

CollageMachine/combinFormation

Web-recombiner program that applies Andruid Kerne’s theory of “interface ecology.”

Created by Andruid Kerne and the Interface Ecology Lab at Texas A&M, CollageMachine allows users to explore a recombinant information space in which different web elements surface, blend, and adapt to their browsing. The program automatically seeks out and imports media elements of interest and continuously streams them into the user’s field of view. The user is thus able to locate information and to generate conceptual links that might not have been possible with a traditional web browser. CollageMachine has been further developed in the Interface Ecology Lab as combinFormation, an agent-driven online tool for building collage-style combinations of visual and textual scraps from web sites; the user can then rearrange and reprioritize the found data to facilitate the discovery of relations.

Starter Links: Andruid Kerne’s home page | combinFormation | Interface Ecology Website (Texas A&M)

Transliteracies Research Report By Nicole Starosielski

Text Analysis Portal for Research (TAPoR)

TAPoR is a web portal for analyzing digital texts. The project is based at McMaster University and includes a project team drawn from six Canadian universities.

Created by a consortium of Canadian universities, TAPoR is a collection of online text-analysis tools—ranging from the basic to the sophisticated—that allows users to run search, statistical, collocation, extraction, aggregation, visualization, hypergraph, transformation, and other “tools” on texts. (The site comes seeded with prepared texts, but users can sign up for a free account and input their own.) TAPoR allows tools to be mixed and matched in a mashup-style “workbench.” Particularly impressive is the “recipes” page, which in step-by-step fashion suggests ways that tools can be combined for particular purposes—e.g., identify themes, analyze colloquial word use, visualize text, explore changes in language use by a writer, create an online interactive bibliography, build a social network map from text, create a chronological timeline from bibliographical text, etc. As regards the general philosophy of TAPoR, which descends from the mature computational-linguistics side of humanities computing (the oldest use of computers for the humanities), project developer Geoffrey Rockwell says at the beginning of his article for the Text Analysis Developers Alliance (TADA) entitled “What is Text Analysis?”:

“Text analysis tools aid the interpreter asking questions of electronic texts. Much of the evidence used by humanists is in textual form. One way of interpreting texts involves bringing questions to these texts and using various reading practices to reflect on the possible answers. Simple text analysis tools can help with this process of asking questions of a text and retrieving passages that help one think through the questions. The computer does not replace human interpretation, it enhances it…. (1) Text-analysis tools break a text down into smaller units like words, sentences, and passages, and then (2) gather these units into new views on the text that aid interpretation.”
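Rockwell’s two-step model—break a text into units, then gather those units into new interpretive views—can be sketched in a few lines of Python. (The sample text and the concordance view below are illustrative only, not TAPoR’s actual implementation.)

```python
import re
from collections import Counter

def tokenize(text):
    """Step 1: break the text down into smaller units (words)."""
    return re.findall(r"[a-z']+", text.lower())

def concordance(tokens, keyword, width=2):
    """Step 2: gather units into a new view -- keyword in context."""
    views = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            lo, hi = max(0, i - width), i + width + 1
            views.append(" ".join(tokens[lo:hi]))
    return views

text = "The computer does not replace human interpretation, it enhances it."
tokens = tokenize(text)
print(Counter(tokens).most_common(2))          # a word-frequency view
print(concordance(tokens, "interpretation"))   # a keyword-in-context view
```

A frequency list and a concordance are the two oldest such “views”; TAPoR’s workbench chains many more of them together.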

Starter Links: TAPoR | TAPoR Recipes | Text Analysis Developers Alliance (TADA)

Semantic Web

Innovative method for creating organizational structures and ontologies online.

“The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries. It is a collaborative effort led by W3C with participation from a large number of researchers and industrial partners. It is based on the Resource Description Framework (RDF)” (W3C).
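RDF, the framework the W3C description mentions, models all data as subject–predicate–object triples; a graph is simply a set of triples that can be queried by pattern. A minimal stdlib sketch (the example resources are invented, and a real application would use a proper RDF library):

```python
# An RDF-style graph as a set of (subject, predicate, object) triples.
graph = {
    ("ex:Collex", "ex:developedBy", "ex:ARP"),
    ("ex:Collex", "ex:partOf", "ex:NINES"),
    ("ex:NINES", "ex:focus", "ex:NineteenthCentury"),
}

def match(graph, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]

# "What do we know about ex:Collex?"
print(sorted(match(graph, s="ex:Collex")))
```

Because every statement has the same three-part shape, data from different applications and communities can be merged into one graph and queried uniformly—which is the point of the “common framework.”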

Starter Links:
Glossary definition | W3C

Transliteracies Research Report By Angus Forbes

Collex

Developed by ARP (Applied Research in Patacriticism) in collaboration with NINES (Networked Interface for Nineteenth-century Electronic Scholarship), Collex allows a user to access materials from nine different online scholarly resources. Using semantic web technologies, Collex facilitates collaborative research and access to a variety of sources while retaining the unique characteristics of each source. Resources are added on an ongoing basis as they are evaluated by the NINES editorial team.

“Users of the web-based NINES aggregation can now, through Collex 1.0:

  • perform text searches on finding aids for all 45,000 digital objects in the system;

  • search full-text content across participating sites (currently Rossetti, Swinburne, and Poetess);

  • browse common metadata fields (dates, genres, names, etc.) across all objects in a non-hierarchical, faceted manner;

  • constrain their search and browse operations to generate highly individualized results;

  • create personal accounts on the system to save and share their research work;

  • publicly tag, privately annotate, and ultimately “collect” digital objects located through Collex or in browsing NINES-affiliated sites;

  • browse their own and others’ collections in an integrated sidebar interface;

  • and discover new, related objects of interest through the Collex “more like this” feature.”
(from the NINES Collex press release)
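The non-hierarchical, faceted browsing the press release describes amounts to intersecting metadata filters in any order the user chooses. A toy sketch with a few invented records (not Collex’s actual data model):

```python
# Invented sample records with Collex-style metadata facets.
objects = [
    {"title": "The Blessed Damozel", "genre": "poetry", "date": 1850, "name": "Rossetti"},
    {"title": "Atalanta in Calydon", "genre": "drama",  "date": 1865, "name": "Swinburne"},
    {"title": "Goblin Market",       "genre": "poetry", "date": 1862, "name": "Rossetti"},
]

def facet_browse(objects, **constraints):
    """Constrain browsing by any combination of facets, in any order."""
    return [o for o in objects
            if all(o.get(k) == v for k, v in constraints.items())]

print([o["title"] for o in facet_browse(objects, genre="poetry", name="Rossetti")])
```

Because each facet is just another filter, no single hierarchy (author first, then date, then genre) is privileged—users drill in along whichever dimension suits their question.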

Starter Links: Collex | NINES website | ARP Collex Blog | ARP Webpage

Transliteracies Research Report By Kim Knight

ConceptNet

Software program that culls meaning from searchable text.

“ConceptNet focuses on semantic meaning in a text, analyzing concepts and the contexts in which they are found, offering a unique approach compared to traditional keyword or statistical evaluations of texts.

ConceptNet has been used as the basis for several programs designed to distill particular meanings from texts (affect, for example) and provide intelligent feedback about the text’s content.” (From Katrina Kimport’s research report.)
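One of the applications mentioned—distilling affect from a text—can be caricatured as looking up each concept’s emotional valence and aggregating. The lexicon and scores below are invented for illustration; ConceptNet-based tools derive such judgments from a large commonsense knowledge network, not a lookup table.

```python
# Invented toy affect lexicon: concept -> valence in [-1, 1].
AFFECT = {"love": 0.9, "gift": 0.6, "rain": -0.2, "funeral": -0.8}

def affect_of(text):
    """Average the valence of known concepts; 0.0 means neutral/unknown."""
    hits = [AFFECT[w] for w in text.lower().split() if w in AFFECT]
    return sum(hits) / len(hits) if hits else 0.0

print(affect_of("a gift of love"))          # positive
print(affect_of("the rain at the funeral")) # negative
```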

Starter Links: ConceptNet

Transliteracies Research Report By Katrina Kimport

Blogdex

Blogdex is a research project from the MIT Media Laboratory that traces the diffusion of content, represented in the form of hypertext links, over time, through blogs.

“Programs such as Blogdex offer a window into the networking structure of the blogging community, an opportunity to systematically analyze large textual datasets, and a way to think about meaning in the online environment.” (From Katrina Kimport’s research report.)
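Blogdex’s core measurement—how a hyperlink diffuses through blogs over time—reduces to counting which posts cite a URL on which days. A minimal sketch with invented posts:

```python
from collections import Counter

# Invented blog posts; each carries the hypertext links it contains.
posts = [
    {"blog": "a.example", "day": 1, "links": ["http://story.example/1"]},
    {"blog": "b.example", "day": 2, "links": ["http://story.example/1"]},
    {"blog": "c.example", "day": 2, "links": ["http://other.example"]},
    {"blog": "d.example", "day": 3, "links": ["http://story.example/1"]},
]

def diffusion(posts, url):
    """Day-by-day count of posts citing a URL -- its diffusion curve."""
    return Counter(p["day"] for p in posts if url in p["links"])

print(sorted(diffusion(posts, "http://story.example/1").items()))
# -> [(1, 1), (2, 1), (3, 1)]
```

Ranking URLs by how fast and how widely these curves grow is what produces a Blogdex-style list of the most contagious content.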

Starter Links: Blogdex

Transliteracies Research Report By Katrina Kimport

StumbleUpon

Online search engine that provides an innovative method for searching the web.

“StumbleUpon uses [thumbs-up/thumbs-down] ratings to form collaborative opinions on website quality. When you stumble, you will only see pages which friends and like-minded stumblers have liked.” (From StumbleUpon.)
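The thumbs-up/thumbs-down mechanism can be caricatured as simple collaborative filtering: recommend a page that your most like-minded raters liked and you haven’t seen. The users, pages, and scoring below are invented; StumbleUpon’s actual algorithm is not public.

```python
ratings = {  # user -> {page: +1 thumbs-up / -1 thumbs-down}
    "ann": {"p1": 1, "p2": 1, "p3": -1},
    "bob": {"p1": 1, "p2": 1, "p4": 1},
    "cat": {"p1": -1, "p3": 1},
}

def similarity(a, b):
    """Count agreements on pages both users rated."""
    shared = ratings[a].keys() & ratings[b].keys()
    return sum(ratings[a][p] == ratings[b][p] for p in shared)

def stumble(user):
    """Next page: liked by the most like-minded rater, unseen by the user."""
    peers = sorted((u for u in ratings if u != user),
                   key=lambda u: similarity(user, u), reverse=True)
    seen = ratings[user].keys()
    for peer in peers:
        for page, vote in ratings[peer].items():
            if vote > 0 and page not in seen:
                return page

print(stumble("ann"))   # bob agrees with ann most, and he liked p4
```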

Starter Links: StumbleUpon

KartOO

Online search engine that visualizes search results.

“KartOO is a metasearch engine with visual display interfaces. When you click on OK, KartOO launches the query to a set of search engines, gathers the results, compiles them and represents them in a series of interactive maps through a proprietary algorithm.” (From KartOO.)
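A metasearch engine fans one query out to several engines, then pools and re-ranks the results before display. A stdlib sketch with stubbed engines (KartOO’s real aggregation and map-drawing algorithm is proprietary; the rank-sum scoring here is a generic stand-in):

```python
# Stub engines standing in for real search backends.
def engine_a(query):
    return ["site1.example", "site2.example", "site3.example"]

def engine_b(query):
    return ["site2.example", "site4.example", "site1.example"]

def metasearch(query, engines):
    """Merge ranked lists: a result scores higher the nearer the top it
    appears in each engine, and agreement across engines accumulates."""
    scores = {}
    for engine in engines:
        results = engine(query)
        for rank, url in enumerate(results):
            scores[url] = scores.get(url, 0) + len(results) - rank
    return sorted(scores, key=scores.get, reverse=True)

print(metasearch("kartoo", [engine_a, engine_b]))
```

KartOO’s distinctive step comes after this merge: laying the ranked results out as an interactive map rather than a list.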

Starter Links: KartOO | article about KartOO from The State

Inform.com

News portal site from Inform Technologies LLC that uses advanced algorithms to sift news, blogs, audio, and video; analyzes them according to structure and relationships through “polytope” mathematical/geometrical relations; and then “channels” the results adaptively (according to evolving “discovery paths”) for particular readers:

“Inform is creating a free online tool that we believe will revolutionize how people read news on the web. We not only provide thousands of news sources, including blogs, video, and audio, in a convenient single interface, we process the news for you, allowing you to get at what you’re interested in more quickly, intelligently, and comprehensively.

Inform’s differentiating technology uses a series of information structuring techniques and natural-language interpretation to auto-categorize and group news stories into thousands of categories, and then shreds the text of the stories to isolate the important elements of each. Once the elements have been identified, you can easily connect and read news on any person, place, organization, topic, industry or product quickly, successfully, and easily right from the article you’re reading, or by utilizing a custom news channel you create, all for free.” (from “About Us” on Inform.com site)
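Stripped of the polytope mathematics, the pipeline the “About Us” text describes—isolate the important elements of each story, then connect coverage through them—might be sketched like this. The entity gazetteer and stories are invented; a real system uses statistical natural-language processing, not a lookup table.

```python
# Invented gazetteer of known entities by type.
ENTITIES = {
    "Acme Corp": "organization",
    "Jane Doe": "person",
    "Springfield": "place",
}

stories = [
    "Jane Doe joins Acme Corp board",
    "Acme Corp opens plant in Springfield",
]

def extract(story):
    """Isolate the important elements (known entities) of a story."""
    return [e for e in ENTITIES if e in story]

def channel(stories, entity):
    """A 'custom news channel': every story mentioning the entity."""
    return [s for s in stories if entity in extract(s)]

print(channel(stories, "Acme Corp"))
```

Once every story is indexed by its entities, “read news on any person, place, organization…” is just such a channel query run on demand.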

Starter Links: Inform.com | Business Week article discussing Inform.com and related, math-driven information and business technologies (.pdf)

Transliteracies Research Report By Lisa Swanstrom

Internet Archive, Wayback Machine

Online archive of past web sites, including defunct or no longer operable pages. Takes its name from “Peabody’s Improbable History,” a frequent short on The Rocky and Bullwinkle Show.

“Browse through 40 billion web pages archived from 1996 to a few months ago. To start surfing the Wayback, type in the web address of a site or page where you would like to start, and press enter. Then select from the archived dates available. The resulting pages point to other archived pages at as close a date as possible. Keyword searching is not currently supported.” (from Archive.org.)
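The “as close a date as possible” behavior is a nearest-neighbor lookup over an index of capture dates keyed by URL. A minimal sketch (the snapshot data is invented):

```python
from datetime import date

# Invented index: URL -> sorted list of archived capture dates.
snapshots = {
    "example.com": [date(1996, 12, 1), date(1999, 6, 15), date(2004, 3, 2)],
}

def nearest_capture(url, wanted):
    """Pick the archived capture whose date is closest to the request."""
    return min(snapshots[url], key=lambda d: abs(d - wanted))

print(nearest_capture("example.com", date(2000, 1, 1)))   # -> 1999-06-15
```

The same rule is applied to every link inside an archived page, which is why browsing the Wayback keeps you near your chosen date rather than jumping to the live web.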

Starter Links: Internet Archive

Transliteracies Research Report By Lisa Swanstrom

Memories for Life Grand Challenge Proposal

Project devoted to considering strategies for information storage and retrieval.

“People are capturing and storing an ever-increasing amount of information about themselves, including emails, web browsing histories, digital images, and audio recordings. This tsunami of data presents numerous challenges to computer science, including: how to physically store such ‘digital memories’ over decades; how to protect privacy, especially when data such as photos may involve more than one person; how to extract useful knowledge from this rich library of information; how to use this knowledge effectively, for example in knowledge-based systems; and how to effectively present memories and knowledge to different kinds of users. The unifying grand challenge is to manage this data, these digital memories, for the benefit of human life and for a lifetime.” (from the Memories for Life project proposal.)

Starter Links: Memories for Life (.pdf file)

Nora Project

Project to create text-mining, pattern-recognition, and visualization software to enable the discovery of significant patterns across large digital text archives:

“In search-and-retrieval, we bring specific queries to collections of text and get back (more or less useful) answers to those queries; by contrast, the goal of data-mining (including text-mining) is to produce new knowledge by exposing unanticipated similarities or differences, clustering or dispersal, co-occurrence and trends. Over the last decade, many millions of dollars have been invested in creating digital library collections: at this point, terabytes of full-text humanities resources are publicly available on the web. Those collections, dispersed across many different institutions, are large enough and rich enough to provide an excellent opportunity for text-mining, and we believe that web-based text-mining tools will make those collections significantly more useful, more informative, and more rewarding for research and teaching.” (from Nora project description)
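The contrast the description draws—retrieval answers a specific query, while mining exposes unanticipated co-occurrence and clustering—can be illustrated with a tiny co-occurrence count across documents. (The three-word “documents” are invented; Nora operates over large full-text archives.)

```python
from collections import Counter
from itertools import combinations

docs = [
    "dream sleep night",
    "dream vision night",
    "sermon faith vision",
]

def cooccurrences(docs):
    """Count word pairs that appear together in the same document."""
    pairs = Counter()
    for doc in docs:
        words = sorted(set(doc.split()))
        pairs.update(combinations(words, 2))
    return pairs

print(cooccurrences(docs).most_common(2))
```

No query mentioned “dream” and “night,” yet their repeated pairing surfaces from the counts alone—the kind of unanticipated similarity text-mining is meant to expose at archive scale.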

Starter Links: Nora Project home page

pStruct: The Social Life of Data

Self-organizing graphing program for visualizing large bodies of data, including Web forum posts; being developed at UCSB’s Four Eyes Lab:

“pStruct enables content to organize itself dynamically, based on similarities to other pieces of data, as well as users’ interaction with the forum. The result is an unstructured graph that responds in life-like ways to the interaction of data and users…. pStruct is built on a multithreaded Java architecture designed to maintain system responsiveness when faced with hundreds of users and millions of pieces of content. Every post to the forum is stored in a database for archival purposes. A subset of the posts are kept in memory as ‘live’ content which users are presented with and can interact with. When a post is no longer live, it is saved to the database for later retrieval. Each live entity runs as a separate thread, maintaining connections to other entities (posts, users, etc.), responding to requests and seeking out new relationships. While pStruct is currently built to act as a web forum backend, the architecture is general enough to allow for management of any data storage and content retrieval system.” (from UCSB Four Eyes Lab site)
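The self-organizing behavior described—each post seeking relationships to similar content—can be sketched, minus the threading and liveness machinery, as pairwise similarity plus a link threshold. The posts, similarity measure (Jaccard overlap), and threshold below are all illustrative assumptions, not pStruct’s implementation.

```python
def similarity(a, b):
    """Jaccard overlap of the two posts' word sets."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

posts = [
    "java threads and responsiveness",
    "threads in java forums",
    "gardening tips for spring",
]

def organize(posts, threshold=0.2):
    """Link each pair of sufficiently similar posts, yielding an
    unstructured graph that grows as new content arrives."""
    edges = []
    for i in range(len(posts)):
        for j in range(i + 1, len(posts)):
            if similarity(posts[i], posts[j]) >= threshold:
                edges.append((i, j))
    return edges

print(organize(posts))   # the two Java posts link; the gardening post floats free
```

In pStruct each live post runs as its own thread and performs this relationship-seeking continuously, so the graph keeps rearranging as users and content interact.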

Starter Links: UCSB Four Eyes Lab description of pStruct