ConceptNet is a software program that culls meaning from searchable text. ConceptNet focuses on semantic meaning in a text, analyzing concepts and the contexts in which they are found, offering a unique approach compared to traditional keyword or statistical evaluations of texts.
ConceptNet has been used as the basis for several programs designed to distill particular meanings from texts (affect, for example) and provide intelligent feedback about the text’s content.
Developed in 2002, ConceptNet is a natural language processing tool that studies associations between words in a sentence. In contrast to other textual analysis programs, ConceptNet does not look at texts purely on the word level. Instead, ConceptNet looks at the sentence as its unit and parses the sentence into nodes and relations.
Nodes are semi-structured English fragments, the central components of a sentence. ConceptNet has over 300,000 nodes in its knowledge database. Unlike similar programs, ConceptNet allows compound nodes, which lets it capture a larger range of events and activities. For example, “buy food” is interpreted as a single node, as is “food.”
Nodes are related to each other through the twenty types of semantic relation logged by ConceptNet. Some examples of semantic relations include “is a,” “effect of,” and “capable of.”
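As a rough illustration of the node-and-relation structure described above, the following Python sketch stores each assertion as a (relation, left node, right node) triple. The `Assertion` type, the `relations_from` helper, the CamelCase relation labels, and the sample assertions are all invented for this example; they are assumptions, not ConceptNet's actual storage format or API.

```python
from collections import namedtuple

# Hypothetical representation: one assertion links two nodes by a relation.
Assertion = namedtuple("Assertion", ["relation", "left", "right"])

# Toy assertions invented for illustration. Note that the compound
# node "buy food" sits alongside the simple node "food".
ASSERTIONS = [
    Assertion("IsA", "apple", "food"),
    Assertion("CapableOf", "person", "buy food"),
    Assertion("EffectOf", "buy food", "have food"),
]


def relations_from(node, assertions=ASSERTIONS):
    """Return every (relation, right node) pair whose left node is `node`."""
    return [(a.relation, a.right) for a in assertions if a.left == node]


print(relations_from("buy food"))
```

Treating “buy food” as one key, rather than splitting it into “buy” and “food,” is what lets a lookup return knowledge about the activity as a whole.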
The knowledge base of nodes and relations was developed from over 700,000 sentences generated through the Open Mind Common Sense Project (OMCS). OMCS asked users to complete sentences such as “A knife is used for…” With contributions from over 14,000 collaborators, OMCS generated a base of nodes and relations to build a database of commonsense associations between words.
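One way to picture how a completed fill-in-the-blank sentence could become an assertion is a pattern match against the sentence template. The sketch below is a minimal Python example under stated assumptions: the two regular expressions, the relation labels, the `extract` function, and the completed sample sentences are hypothetical and are not the extraction rules ConceptNet actually uses.

```python
import re

# Hypothetical templates: each pairs a sentence pattern with a relation label.
# The more specific "is used for" pattern is tried before the general "is a".
TEMPLATES = [
    (re.compile(r"^an? (?P<left>.+?) is used for (?P<right>.+?)\.?$", re.I), "UsedFor"),
    (re.compile(r"^an? (?P<left>.+?) is an? (?P<right>.+?)\.?$", re.I), "IsA"),
]


def extract(sentence):
    """Return (relation, left node, right node) for the first matching template."""
    text = sentence.strip()
    for pattern, relation in TEMPLATES:
        match = pattern.match(text)
        if match:
            return (relation,
                    match.group("left").lower(),
                    match.group("right").lower())
    return None  # the sentence fits none of the toy templates


print(extract("A knife is used for cutting bread."))
print(extract("A dog is an animal."))
```

Because contributors complete a fixed sentence frame, the frame itself signals which relation holds between the two fragments, so no deep parsing is needed for this step of the sketch.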
Nodes are further assigned to realms: temporal, spatial, or action. For a given text, ConceptNet can analyze the associations between concepts (termed concept neighborhoods), determine the forms of semantic relation, and separate these associations into the realms listed above. ConceptNet can perform a number of additional actions, including identifying novel concepts, determining analogous concepts, and sensing affect.
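The idea of a concept neighborhood can be sketched as a walk outward from one node through its associations. The Python example below assumes a tiny hand-made graph and a plain breadth-first traversal with a hop limit; the `EDGES` data and the `neighborhood` function are hypothetical, and the source does not say how ConceptNet itself ranks or weights neighbors.

```python
from collections import deque

# Toy directed associations, invented for illustration.
EDGES = {
    "knife": ["cut", "kitchen"],
    "cut": ["bread"],
    "kitchen": ["cook"],
    "bread": ["eat"],
}


def neighborhood(start, max_hops=2, edges=EDGES):
    """Map each concept reachable within `max_hops` to its hop distance."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in edges.get(node, []):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    del distance[start]  # the start concept is not its own neighbor
    return distance


print(neighborhood("knife", max_hops=2))
```

Concepts at a smaller hop distance can be read as more closely associated with the starting concept; “eat” is three hops from “knife” and so falls outside this two-hop neighborhood.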
ConceptNet has been used in a variety of other programs to distill meaning from texts and make inferences about these texts.
Linguists have long found that language is far more complex than a simple combination of words. For this reason, technologies that study meaning through keyword or statistically based systems of analysis offer little insight into textual meaning. ConceptNet, however, departs from formal, linguistic, logic-based analyses of words and meaning in an important way. By capturing relations between words instead of simply the words themselves, ConceptNet offers a better approximation of textual meaning.
In its applications, ConceptNet can sort, categorize, and label texts based on the concepts they include. It was initially designed to help scholars interested in artificial intelligence, specifically, to help machines learn to reason.
ConceptNet is a Python API, integrated and distributed with the MontyLingua natural language toolkit. It is also available in other programming languages.
ConceptNet first applies extraction rules to a given text, separating the text into sentence-long segments. MontyLingua, a natural language tool, then parses each sentence to identify nodes and connectors. During this phase, verbs are stripped of their tense. The text is then put through a relaxation phase in which a number of inferences are made: a hierarchy of nodes is implemented, duplicate assertions are eliminated, synonyms are merged, and adjectival modifiers are lifted. Finally, the nodes and their relations are mapped.
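Three of the steps described above (tense stripping, lifting adjectival modifiers, and eliminating duplicate assertions) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the lookup tables, the `PropertyOf` label, and the function names are invented here, the sketch does not call MontyLingua, and it leaves out the node hierarchy and synonym merging.

```python
# Toy lookup tables, invented for illustration; a real system would rely
# on a proper morphological analyzer and part-of-speech tagger.
BASE_FORM = {"bought": "buy", "buys": "buy", "ate": "eat", "eats": "eat"}
ADJECTIVES = {"fresh", "sour", "red"}


def normalize(node):
    """Lowercase a node and reduce each known verb form to its base form."""
    return " ".join(BASE_FORM.get(word, word) for word in node.lower().split())


def lift_modifiers(node):
    """Split a node into (core node, list of adjectival modifiers)."""
    words = node.split()
    core = [w for w in words if w not in ADJECTIVES]
    mods = [w for w in words if w in ADJECTIVES]
    if not core:  # a node made only of adjectives is kept whole
        return node, []
    return " ".join(core), mods


def relax(assertions):
    """Normalize nodes, lift adjectives into separate assertions, drop duplicates."""
    seen, result = set(), []
    for relation, left, right in assertions:
        left_core, left_mods = lift_modifiers(normalize(left))
        right_core, right_mods = lift_modifiers(normalize(right))
        candidates = [(relation, left_core, right_core)]
        candidates += [("PropertyOf", left_core, m) for m in left_mods]
        candidates += [("PropertyOf", right_core, m) for m in right_mods]
        for item in candidates:
            if item not in seen:
                seen.add(item)
                result.append(item)
    return result


raw = [
    ("CapableOf", "person", "bought food"),
    ("CapableOf", "person", "buys food"),
    ("IsA", "red apple", "fruit"),
]
for assertion in relax(raw):
    print(assertion)
```

In this toy run the two “person” assertions collapse into one once tense is removed, and “red apple” is stored as the node “apple” plus a separate assertion recording that it is red.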
Evaluation of Opportunities/Limitations for the Transliteracies Topic:
As opposed to keyword or statistical analyses of texts, ConceptNet’s common sense approach depends on semantic understanding of text. ConceptNet represents an interesting modification of standard approaches to culling meaning from texts. It moves beyond standard word analysis, particularly keyword or word-count based approaches, to embrace the context in which words are found. As any search engine user has discovered, keyword matching often draws unintended search results (think of how a search about breast cancer might be misled by using just the search word “breast”). More elaborate strings of words do not guarantee that a user will find a site with matching tone. There are, after all, many online texts that discuss the Iraq war, but they do not, by any means, all agree.
ConceptNet moves beyond word counts to tease out the use of words in particular contexts. Specifically, ConceptNet seeks to capture the commonsense underpinnings in text. Commonsense, being just that, is generally unstated in texts. A writer may say that she ate a lemon and made a face. As readers, we know that lemons are sour and can thus connect the lemon eating to the reaction it provoked. Likewise, a statement that John ran into his friend in the store depends on a different “running into” than the observation that a car ran into a pole. Because ConceptNet looks at context, it can bridge the components of the first example and see the difference between the two statements in the second example.
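The lemon example can be read as a search for a chain of commonsense assertions that links two concepts. The Python sketch below assumes three invented assertions and a simple breadth-first search; `find_chain`, the relation labels, and the data are hypothetical, and the sketch is not a description of ConceptNet's own inference procedure.

```python
from collections import deque

# Toy commonsense assertions as (relation, left node, right node),
# invented for illustration.
ASSERTIONS = [
    ("PropertyOf", "lemon", "sour"),
    ("EffectOf", "sour", "make a face"),
    ("IsA", "lemon", "fruit"),
]


def find_chain(start, goal, assertions=ASSERTIONS):
    """Return the shortest list of assertions leading from start to goal, or None."""
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, chain = queue.popleft()
        if node == goal:
            return chain
        for relation, left, right in assertions:
            if left == node and right not in visited:
                visited.add(right)
                queue.append((right, chain + [(relation, left, right)]))
    return None  # no chain of stored assertions connects the two concepts


print(find_chain("lemon", "make a face"))
```

The unstated link (“lemons are sour”) is exactly the kind of assertion a reader supplies without noticing; here it has to be present in the knowledge base for the two halves of the sentence to connect.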
For these reasons, ConceptNet is a useful tool in analyzing texts. Its ability to usefully sort large quantities of text by concept is promising for the Transliteracies Project. ConceptNet has been used as the basis of several other applications designed to assess meaning and generate new texts. For example, in GloBuddy, ConceptNet is built into a program that operates like a foreign phrase book. In Emotus Ponens, ConceptNet works to determine the affect of incoming email messages and then assigns them an emoticon.
Although ConceptNet breaks new ground in capturing meaning in large quantities of textual data, its reliance on the sentence unit is a weakness. The program expands interpretation of word use from the individual word to the sentence level but, without a larger context, is unable to fully account for sarcasm, parody, or irony.
Resources for Further Study:
- The ConceptNet Project.
- Liu, H. and P. Singh. “ConceptNet: A Practical Commonsense Reasoning Tool-Kit.”
Points for Expansion:
- Cyc (http://www.cyc.com/cyc/technology/whatiscyc): “multi-contextual knowledge base and inference engine.” Meaning is structured as a logic-based system.
- Wordnet (http://wordnet.princeton.edu/): “lexical database of the English language.”
Additional topics of interest:
- Open Mind Common Sense Project: online collaboration of over 14,000 authors.
Papers of interest:
- S. A. Inverso, N. Hawes, J. Kelleher, R. Allen, and K. Haase. “Think And Spell: Context-Sensitive Predictive Text for an Ambiguous Keyboard Brain-Computer Interface Speller,” to appear in the Journal of Biomedizinische Technik special issue Proceedings of the 2nd International Brain-Computer Interface Workshop and Training Course, Graz, Austria, September 16-18, 2004 (paper).
- Ashwani Kumar, Sharad C. Sundararajan, Henry Lieberman (2004). “Common Sense Investing: Bridging the Gap Between Expert and Novice.” Conference on Human Factors in Computing Systems (CHI 04), Vienna, Austria. (paper).
- Tom Stocky, Alexander Faaborg, Henry Lieberman (2004). “A Commonsense Approach to Predictive Text Entry.” Conference on Human Factors in Computing Systems (CHI 04), Vienna, Austria. (website).
- Hugo Liu (2003). “Unpacking Meaning from Words: A Context-Centered Approach to Computational Lexicon Design.” CONTEXT 2003: 218-232 (paper).
- Rami Musa, Madleina Scheidegger, Andrea Kulas, Yoan Anguilet (2003). “GloBuddy, a Dynamic Broad Context Phrase Book.” Proceedings of the International and Interdisciplinary Conference on Modeling Context. pp. 467-474 (paper).