
The Coh-Metrix Project

Research Report by Kim Knight
(created 8/22/06; version 1.1 updated 9/15/06)

Related Categories: Cognitive Approaches to Reading

Original Object for Study description

Summary:
The Coh-Metrix Project is a research project concerned with predicting the readability of texts in order to facilitate textual comprehension. The underlying assumption of the project is that current “readability” tests, based upon word and sentence length, are inadequate to truly predict textual coherence. Coherence in this context is defined as a mental representation that results from an interaction between the reader’s skills and goals, and textual cohesion. The Coh-Metrix project proposes the creation of two tools that will provide a more nuanced prediction of textual cohesion than current indices allow: 1. Coh-Metrix computes the cohesion of a text based on complex cohesion metrics. 2. CohGIT locates where gaps in textual cohesion occur, facilitating textual improvement. The project relies upon an interdisciplinary approach to reading practices, drawing upon “psychology, linguistics, education, literary theory, cognitive science, mathematics, and artificial intelligence” (McNamara, Louwerse, & Graesser 5).

Description:
Developed by the Institute for Intelligent Systems, and funded by a grant from the Institute of Education Sciences, The Coh-Metrix Project is housed at the University of Memphis. Other projects at the Institute include AutoTutor, a web-based tutoring program “that simulates the dialog moves of effective human tutors,” and QUAID, a program that assists social scientists and others in creating effective questions for surveying purposes (Institute for Intelligent Systems website).

The Coh-Metrix Project developed out of the hypothesis that readability formulas, widely used to evaluate textbooks, are painfully inadequate because they base scores only upon word length and sentence length. The project instead promotes evaluations based upon textual coherence, which arises from an interaction between the reader (skill level, background knowledge, and motivation) and the cohesion of the text in question. Cohesion is defined as connections between linguistic elements. The research team uses a set of sample sentences to demonstrate the discord between readability measures and textual cohesion:

For example, “One part of the cloud develops a downdraft. Rain begins to fall.” has lower causal cohesion than “One part of the cloud develops a downdraft, which causes rain to fall,” but a lower Flesch-Kincaid grade level (3.4 and 4.9 respectively). Similar patterns are found for passages with empirically documented comprehension effects. For instance, a high-cohesion text about cell mitosis in McNamara (2001) resulted in better comprehension but had a Flesch-Kincaid grade level of 11.2 compared to 9.3 for the low-cohesion version. Many more examples are available, but the bottom line is that increasing cohesion often requires adding words. Longer sentences result in increased grade level predictions. (McNamara, Louwerse, & Graesser 11).

The overall goals of the project are three-fold: to measure textual coherence, to study the effects of textual cohesion on readers, and to fine-tune the cohesion metrics developed for the project.

In the original grant proposal, McNamara et al. proposed a series of experiments on young readers (3rd to 5th grade) and college readers, the results of which would lead to the development of two computational tools: Coh-Metrix and CohGIT. Coh-Metrix scores texts based upon sixty cohesion metrics (see Technical Analysis section). CohGIT (Cohesion Gap Identification Tool) will pinpoint cohesion gaps in texts, which will give writers and publishers tangible feedback on how to improve cohesion. The remainder of this report will focus on the Coh-Metrix tool, as most of the currently available information and product demos relate to this aspect of the project.

A web-based tool, Coh-Metrix is currently available to the general public. The original grant proposal indicated that users would “input a text to be scaled and set optional parameters, such as the amount of background knowledge in the reader and the language skills of the targeted reader” (McNamara, Louwerse, & Graesser 14). However, the current version of Coh-Metrix (2.0) does not include this type of information. Instead, users enter some preliminary information about the text and cut and paste texts into a web form, pictured below:

Results are then returned on sixty different cohesion metrics, of six different types: “(1) general identification and reference information, (2) readability indices, (3) general word and text information, (4) syntactic indices, (5) referential and semantic indices, and (6) situational model dimensions” (Coh-Metrix 2.0 Demo). Sample results are pictured below:

The original grant proposal indicated that the computer tool would deliver an overall cohesion density score in addition to analysis of the different types of cohesion (see below). The research team speculates that this might allow them to identify whether different readers rely on certain kinds of cohesion. Additionally, the proposal indicated that the tool would provide overall scores for “readability, comprehension, learning, and appropriateness” (McNamara, Louwerse, & Graesser 16) in relation to different types of readers. The tool does not currently perform these functions and there is no indication whether these features are forthcoming in subsequent versions of Coh-Metrix.

Research Context:
The Coh-Metrix Project is situated within the field of cognitive approaches to reading. Many of the concepts in the grant proposal seem to echo concepts in the other cognitive objects for study in the Transliteracies Project. The web interface of Coh-Metrix extends the audience for cognitive data beyond specialists to a wider community of users.

Technical Analysis:
Types of Cohesion
The grant proposal identifies three kinds of cohesion, which form twenty different possible combinations. One of the stated goals of the research proposal is to compute an overall cohesion density score as well as produce information in relation to the various types of cohesion.

  1. Local vs. Global (McNamara, Louwerse, & Graesser 11)
    o Local: “relations between adjacent clauses in the text”
    o Global: “links between groups of clauses and groups of paragraphs”

  2. Grammar-driven vs. Vocabulary-driven (McNamara, Louwerse, & Graesser 12)
    o Grammar-driven: “information in the text that cues grammar-based inferences”
    o Vocabulary-driven: “words that cue knowledge-based inferences”

  3. Referential, temporal, locational, causal, structural (McNamara, Louwerse, & Graesser 12–13)
    o Referential: “established with the use of anaphora, repeated phrases, definite articles, and conceptual overlap”
    o Temporal: “refers to continuity in time, which is established by connectives (before and then), prepositional phrases (Later on that day), verb tense and aspect, or with order of mention”
    o Locational: “cued by adverbs (here, there), adverbial phrases, prepositions (above, near) and verbs that reflect the narrator’s point-of-view (come versus go)”
    o Causal: “established by marking the causal relations between two events with connectives”
    o Structural: “refers to the continuity in syntactic and conceptual form of clauses”
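
The count of twenty follows from multiplying the options in each of the three groups above (2 x 2 x 5). The short Python sketch below simply enumerates the combinations; the list names and the function name are illustrative labels chosen for this report, not identifiers from the Coh-Metrix software.

```python
from itertools import product

# The three groupings of cohesion named in the grant proposal.
# Variable names are hypothetical labels, not Coh-Metrix identifiers.
SCOPE = ["local", "global"]
DRIVER = ["grammar-driven", "vocabulary-driven"]
RELATION = ["referential", "temporal", "locational", "causal", "structural"]


def cohesion_combinations():
    """Return every (scope, driver, relation) triple."""
    return list(product(SCOPE, DRIVER, RELATION))


if __name__ == "__main__":
    combos = cohesion_combinations()
    print(len(combos))  # 2 * 2 * 5 combinations
    for scope, driver, relation in combos:
        print(f"{scope} / {driver} / {relation}")
```

Each triple names one cell in which a cohesion score could, in principle, be reported separately, for example local, vocabulary-driven, causal cohesion.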

Sixty Metrics
Each text run through Coh-Metrix is analyzed based upon sixty different metrics that “consider syntactic and semantic aspects of texts at various levels” (McNamara, Louwerse, & Graesser 14). Currently there is no information available on how the various metrics relate to the different categories of cohesion outlined above. Metrics range from counts of words and sentences to Flesch-Kincaid grade level and readability scores. For in-depth descriptions of the Coh-Metrix indices, see the Coh-Metrix Demo.

The Web Tool

Coh-Metrix is a web-based tool, best utilized in IE 5.0 or above. The tool also functions in Safari, but does not seem to work in Firefox. “The software [relies] on a variety of computer languages (JAVA, LISP, and C++) in Linux and Windows XP operating systems, whereas the hardware [is] a configuration of Dell Pentium servers” (McNamara, Louwerse, & Graesser 14). The overall goal in designing the tool is to keep the human-computer interface as simple as possible to encourage widespread use.

Other Tools/Theories Utilized

  • Latent Semantic Analysis (LSA) — “a mathematical, statistical technique for representing world knowledge, based on a large corpus of texts.”

  • Flesch Reading Ease Score – a readability test that scores texts from 0 – 100 based upon the average number of syllables per word and the average sentence length in the text. The higher the Flesch Reading Ease score, the easier a text should be to read (Wikipedia).

  • Flesch-Kincaid Grade Level Score – a translation of the Flesch Reading Ease score into a grade level. Although one would expect that reading ease and grade level would always correspond, there is sometimes a discrepancy between the two (Wikipedia).

  • MRC Psycholinguistic Database – “contains 150,837 words and provides information of up to 26 different linguistic properties of these words”

  • WordNet — “an online lexical reference system”
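
To make the two Flesch measures listed above concrete, the sketch below applies the standard published formulas to the downdraft sentences quoted in the Description. It is a minimal illustration, not the Coh-Metrix implementation: the function names are chosen for this report, and the word, sentence, and syllable counts are hand counts, so the resulting scores need not match the 3.4 and 4.9 reported in the grant proposal (different tools count syllables differently).

```python
def flesch_reading_ease(words, sentences, syllables):
    """Standard Flesch Reading Ease formula; higher means easier."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words, sentences, syllables):
    """Standard Flesch-Kincaid Grade Level formula; higher means harder."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Hand counts (an assumption of this sketch, not figures from the proposal).
# "One part of the cloud develops a downdraft. Rain begins to fall."
LOW_COHESION = {"words": 12, "sentences": 2, "syllables": 16}
# "One part of the cloud develops a downdraft, which causes rain to fall."
HIGH_COHESION = {"words": 13, "sentences": 1, "syllables": 17}

if __name__ == "__main__":
    for label, counts in [("low cohesion", LOW_COHESION),
                          ("high cohesion", HIGH_COHESION)]:
        ease = flesch_reading_ease(**counts)
        grade = flesch_kincaid_grade(**counts)
        print(f"{label}: ease={ease:.1f}, grade={grade:.1f}")
```

Both formulas depend only on sentence length and syllables per word, so the single longer sentence receives the higher grade level and the lower ease score even though it states the causal link explicitly. This is the discrepancy the research team describes.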


Evaluation of Opportunities/Limitations for the Transliteracies Topic:

The Coh-Metrix tool produces a report with a vast amount of data that is potentially helpful to the Transliteracies project. However, at present, the data is only valuable if we are able to employ some sort of intermediary, be it another software program or a person who specializes in this area, to turn the data into information. It could be quite useful to understand which elements of a text facilitate coherence in readers with particular standards of coherence. This could potentially assist in facilitating offline reading processes and increasing metacognitive awareness while reading online content.

In addition, it might be useful to consider the question of genre, emphasized in the original project grant proposal. The research team stresses the importance of genre and has decided to develop their tool specifically for textbooks. Certainly many of the standards of coherence would apply to other types of text, but the question eventually arises, “how is this data affected when incoherence or lack of cohesion is purposefully employed as a strategy?” This question has varying implications depending upon the genre of texts that a Transliteracies tool might support. For example, the strategic use of “cognitive disequilibrium” (McNamara, Louwerse, & Graesser 9) is likely to be more commonly found among literary texts than textbooks. The grant proposal gestures in this direction when the researchers acknowledge the potential benefits of low-cohesion texts for readers who already have a minimum level of skills: the additional work required to form a textual representation of a low-cohesion text may increase comprehension and the skill sets of already advanced readers.

Resources for Further Study:

  tl, 08.22.06

4 Responses to The Coh-Metrix Project

  1. Jafar Toolabi says:

    Hi, I am a Ph.D. candidate at UPM in Malaysia. My research is closely related to this software (Coh-Metrix). I want to know whether this software will remain freely available for a long time or whether it may be closed. If it may be closed, how can we apply for the software? Thank you

  2. Kimberly Knight says:

    I’ve fixed the link to the Coh-Metrix website. You should be able to use the software online for the foreseeable future.

    Best,
    KK

  3. reham says:

    Good night
    Hi, my name is reham. I’m preparing my master’s nowadays and
    I want to use the Coh-Metrix program soon, but I do not know how to get a version or a copy of it.
    Please tell me how, quickly.
    thanks very much

  4. Masayu says:

    Hi, I am Masayu, a PhD student at Bandung Institute of Technology, Indonesia. My research also needs Coh-Metrix.
    I want to know whether there’s a Coh-Metrix API or library that can be used to process our corpus. Thank you very much.