Document Database Integration for the Professional Social Environment (ProSE)

Research Report by Salman Bakht

(created 10/6/09)

ProSE (Professional Social Environment) is a social network environment developed by the Bluesky Group of the Transliteracies Project. While online reading interfaces such as the Professional Reading Environment (PReE), under development at the Electronic Textual Cultures Laboratory (ETCL), provide sophisticated access to data derived from documents in a professional or scholarly field, ProSE provides access to the social network connected to that field. ProSE models social networks in a way that seamlessly combines professional readers and writers, both contemporary and historical. Consequently, ProSE is designed to populate its social network database from existing databases within one or more fields of study, supplementing user-created entries. This report describes the following databases, which may be integrated into ProSE:

English Broadside Ballad Archive (EBBA): a database of seventeenth-century broadside ballads, created by the Early Modern Center at UC Santa Barbara.

Early English Books Online (EEBO): a database of citations and scanned page images of texts from 1475-1700.

Early English Books Online-Text Creation Partnership (EEBO-TCP): structurally marked-up full-text transcriptions of 25,000 works in EEBO.

The Renaissance English Knowledgebase (REKn): a database developed at ETCL consisting of primary and secondary sources related to the Renaissance.

The Iter Bibliography: a bibliographical database for articles, essays, books, dissertations, encyclopedia entries, and reviews pertaining to the Middle Ages and Renaissance (400-1700).

EBBA

The English Broadside Ballad Archive (EBBA), created by the Early Modern Center at UC Santa Barbara, is an online database of printed English-language ballads, with priority given to seventeenth-century broadsides. The archive can be accessed at http://ebba.english.ucsb.edu. Currently, EBBA includes ballads from the Pepys Collection at Magdalene College, Cambridge, and the Roxburghe Collection at the British Library, which together include over 3,000 ballads. For each ballad, EBBA provides a high-quality facsimile, a full-text transcription, an audio recording of the sung ballad whenever a tune is extant, citations, and TEI-XML and MARC records. EBBA record citations have over a dozen catalogue headings, including title, ID numbers, publication date, and author, although the author and a definite publication date are often unavailable.

Figure 1: An EBBA Record

XML records, which contain this metadata in addition to the full-text transcription, are available for each ballad. Each XML record also contains data archived for search features, such as modernized ballad titles, and archive development data, such as editorial credits. EBBA conforms to the Text Encoding Initiative (TEI) standards. The TEI consortium describes the TEI standard as “an international and interdisciplinary standard that helps libraries, museums, publishers, and individual scholars represent all kinds of literary and linguistic texts for online research and teaching, using an encoding scheme that is maximally expressive and minimally obsolescent.”
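As a rough illustration of how such a record might be read into ProSE, the following Python sketch pulls a title, author, date, and verse lines out of a TEI-style XML string. The sample record is invented for this example, and the element paths follow common TEI header conventions (teiHeader, fileDesc, titleStmt); they are assumptions rather than EBBA's actual schema, which may also use XML namespaces. The author element is deliberately omitted to mirror the missing-author case noted above.

```python
import xml.etree.ElementTree as ET

# Hypothetical TEI-style record. The element layout is an assumption based on
# common TEI header conventions, not a copy of a real EBBA record.
SAMPLE_RECORD = """\
<TEI>
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>An Example Ballad Title</title>
      </titleStmt>
      <publicationStmt>
        <date>1675?</date>
      </publicationStmt>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <lg><l>First line of the ballad,</l><l>second line of the ballad.</l></lg>
    </body>
  </text>
</TEI>
"""


def extract_metadata(xml_text):
    """Return title, author, date, and verse lines from a TEI-style record.

    Missing fields come back as None, since author and date are often absent.
    """
    root = ET.fromstring(xml_text)

    def first_text(path):
        node = root.find(path)
        if node is None or node.text is None:
            return None
        return node.text.strip() or None

    return {
        "title": first_text("./teiHeader/fileDesc/titleStmt/title"),
        "author": first_text("./teiHeader/fileDesc/titleStmt/author"),
        "date": first_text("./teiHeader/fileDesc/publicationStmt/date"),
        # Collect every verse line (<l>) anywhere in the transcription.
        "lines": [l.text.strip() for l in root.iter("l") if l.text],
    }


if __name__ == "__main__":
    print(extract_metadata(SAMPLE_RECORD))
```

Uncertain dates such as "1675?" are kept as strings here rather than coerced to integers, so that the uncertainty recorded by the archive is not silently lost.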

EEBO and EEBO-TCP

Early English Books Online (EEBO) is an online collection of citations and scanned page images of texts. EEBO describes the collection as follows:

From the first book published in English through the age of Spenser and Shakespeare, this incomparable collection now contains about 100,000 of over 125,000 titles listed in Pollard & Redgrave’s Short-Title Catalogue (1475-1640) and Wing’s Short-Title Catalogue (1641-1700) and their revised editions, as well as the Thomason Tracts (1640-1661) collection and the Early English Books Tract Supplement. Libraries possessing this collection find they are able to fulfill the most exhaustive research requirements of graduate scholars — from their desktop! — in many subject areas, including: English literature, history, philosophy, linguistics, theology, music, fine arts, education, mathematics, and science.[1]

The EEBO-TCP project is a collection of structurally marked-up full-text transcriptions of 25,000 of the texts in EEBO. The EEBO texts, including the EEBO-TCP transcriptions where available, are accessible from the EEBO search engine interface. All EEBO records are searchable by citation information, and the page images of the texts are downloadable. Additionally, the EEBO-TCP transcriptions are text-searchable and available for download in ASCII format.

The EEBO interface, accessible at http://eebo.chadwyck.com, allows the user to perform keyword searches and browsing by author or subject. Each record links to the page images, ASCII text if available, and the citation information, which can be accessed in several formats including ProCite, EndNote, Reference Manager, RefWorks, and plain text.

Figure 2: An EEBO Record

The record also provides a link to the author page in Literature Online (http://lion.chadwyck.com), which provides biographical information about the author, including birth and death dates, gender, associated literary movements, nationality, and a portrait image when available.

Figure 3: Literature Online Author Page

REKn

The Renaissance English Knowledgebase (REKn) is an electronic field-specific knowledgebase created by the Electronic Textual Cultures Lab (ETCL) at the University of Victoria. REKn contains over 13,000 primary sources (texts, images, audio) and 100,000 secondary sources (articles, e-books, etc.) related to the Renaissance period. Unlike the other databases described here, REKn is a research-only database and is not publicly available. REKn has been designed to serve as a prototype for the development of field-specific knowledgebases and has been used in the development of the Professional Reading Environment (PReE), the online reading interface developed by ETCL.

The Iter Bibliography

Iter is a not-for-profit partnership of the Arizona Center for Medieval and Renaissance Studies (ACMRS) at Arizona State University, the Centre for Reformation and Renaissance Studies (CRRS) at Victoria University in the University of Toronto, the Faculty of Information at the University of Toronto, the Renaissance Society of America, the Sixteenth Century Society and Conference, and the University of Toronto Library. It was created “for the advancement of learning in the study and teaching of the Middle Ages and Renaissance (400-1700) through the development of online resources.”[2] Iter offers several online resources, one of which is the Iter Bibliography. This online bibliographical database for articles, essays, books, dissertations, encyclopedia entries, and reviews pertaining to the Middle Ages and Renaissance includes more than 1,070,000 records as of July 2009. The Iter Bibliography can be accessed at http://www.IterGateway.org/.

Figure 4: Iter Search Interface

The Iter search engine interface allows the user to search for entries based on text fields (author, title, subject, publication title, series title, and Dewey class number) combined with Boolean operators. The search can also be constrained by language (of which 14 are listed), type (abstract, article, book, etc.), and publication year. The records in the Iter Bibliography have several cross-referenced fields. Of particular relevance to ProSE are the “personal author,” “added author,” and “personal subject” fields, which may be used as entries in the social network. A birth date and death date are associated with each person, although the values are often missing. The “personal subject” fields contain the people on whom the document is focused. The “geographic term” field may likewise be used in ProSE to form geographical entries. The Iter Bibliography is OpenURL-enabled, allowing for access to the full text of a work.

Figure 5: An Iter Bibliography Record
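The mapping from Iter's person fields to social network entries could be sketched roughly as follows. This is a minimal Python illustration under stated assumptions: the record is represented as a plain dictionary keyed by the field names quoted above, the role labels ("author", "contributor", "subject") are hypothetical, and the sample record is invented. Neither Iter's export format nor ProSE's data model is specified in this report.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Person:
    """A candidate social network entry; dates are optional because
    Iter records often lack them."""
    name: str
    birth: Optional[int] = None
    death: Optional[int] = None


# Hypothetical mapping from Iter field names to roles in the network.
FIELD_ROLES = {
    "personal author": "author",
    "added author": "contributor",
    "personal subject": "subject",
}

# Invented record for illustration only.
SAMPLE_RECORD = {
    "title": "A Hypothetical Study",
    "personal author": [{"name": "Doe, Jane"}],
    "personal subject": [{"name": "Example, John", "birth": 1500, "death": 1560}],
}


def record_to_links(record):
    """Return (person, role, document title) triples for one record."""
    links = []
    for field, role in FIELD_ROLES.items():
        for entry in record.get(field, []):
            person = Person(entry["name"], entry.get("birth"), entry.get("death"))
            links.append((person, role, record["title"]))
    return links


if __name__ == "__main__":
    for person, role, title in record_to_links(SAMPLE_RECORD):
        print(f"{person.name} is {role} of '{title}'")
```

Keeping the document title in each triple preserves the link between a person and the work, which is what would let an author and a personal subject of the same document be connected in the network.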

Overall Conclusions

Although the databases described above are designed primarily for online document access, their records can be integrated into a social network-centric model such as that used in ProSE. The minimum data for author identification, including name, birth date, and death date, can be found in all records, although cases where definite dates are unknown or names have multiple spellings must be considered. Specific databases offer information beyond this that may be integrated. Literature Online provides gender, associated literary movements, nationality, and occasionally a portrait, while Iter associates each document with personal subjects in addition to the authors. Additional information could potentially be drawn indirectly from the records. For example, geographic information associated with an author could be derived from the geographic information associated with a document.
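One way to handle unknown dates and variant spellings when merging person entries from several databases is sketched below in Python. This is an assumption-laden illustration, not a method described by any of the databases: names are normalized for case, accents, punctuation, and word order, then compared with the standard library's difflib similarity ratio; a missing date is treated as compatible with any date. The 0.85 threshold is an arbitrary placeholder, and matches found this way are better treated as candidates for review than as confirmed identities.

```python
import difflib
import re
import unicodedata


def normalize_name(name):
    """Lowercase, strip accents and punctuation, and sort the name parts so
    that 'Doe, Jane' and 'Jane Doe' normalize to the same string."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    letters_only = re.sub(r"[^a-z ]", " ", no_accents.lower())
    return " ".join(sorted(letters_only.split()))


def dates_compatible(a, b):
    """A missing date (None) never rules out a match."""
    return a is None or b is None or a == b


def likely_same_person(p, q, threshold=0.85):
    """Flag two person entries as a candidate match.

    `threshold` is an illustrative value, not a tuned one.
    """
    if not dates_compatible(p.get("birth"), q.get("birth")):
        return False
    if not dates_compatible(p.get("death"), q.get("death")):
        return False
    similarity = difflib.SequenceMatcher(
        None, normalize_name(p["name"]), normalize_name(q["name"])
    ).ratio()
    return similarity >= threshold


if __name__ == "__main__":
    a = {"name": "Example, John", "birth": 1500}
    b = {"name": "John Exampel", "death": 1560}
    print(likely_same_person(a, b))
```

String similarity alone will both miss distant spelling variants and merge distinct people with similar names, so an authority list of known name variants would be a natural complement where one is available.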

Because this information already exists within databases, it is easy to access. In particular, since many of these databases export formatted ASCII records (in XML or other formats), the information can be gathered quickly with a text parser when direct access to the database back-end is not available. Additional biographical information could potentially be drawn from the text of the documents themselves, for example by performing a text search for proper nouns. However, this would likely be performed by dedicated text-analysis software, such as PReE, as it is beyond the current scope of ProSE.
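A text parser of the kind mentioned above might look like the following Python sketch. It assumes a tagged, line-oriented export in the style of RIS (the format associated with Reference Manager), where each line is a two-character tag followed by two spaces, a hyphen, and a value, and "ER" ends a record. The sample export is invented for illustration; the actual exports of the databases discussed here may differ in tags and layout.

```python
import re
from collections import defaultdict

# Matches lines such as "AU  - Doe, Jane"; the value may be empty (as on "ER").
TAG_LINE = re.compile(r"^([A-Z][A-Z0-9])  - ?(.*)$")

# Invented RIS-style export, for illustration only.
SAMPLE_EXPORT = """\
TY  - BOOK
AU  - Doe, Jane
AU  - Example, John
TI  - A Hypothetical Title
PY  - 1650
ER  -
TY  - JOUR
AU  - Sample, Ann
TI  - Another Hypothetical Title
ER  -
"""


def parse_tagged_records(text):
    """Split a tagged export into records, each a dict of tag -> list of values.

    Repeated tags (for example several AU lines) accumulate in order.
    Lines that do not look like tagged lines are skipped.
    """
    records = []
    current = defaultdict(list)
    for line in text.splitlines():
        match = TAG_LINE.match(line)
        if not match:
            continue
        tag, value = match.groups()
        if tag == "ER":  # end of record
            if current:
                records.append(dict(current))
            current = defaultdict(list)
        else:
            current[tag].append(value.strip())
    if current:  # tolerate a final record with no ER line
        records.append(dict(current))
    return records


if __name__ == "__main__":
    for record in parse_tagged_records(SAMPLE_EXPORT):
        print(record.get("AU"), record.get("TI"))
```

Storing every tag as a list keeps multiple authors per document intact, which matters for a social network model where each author becomes a separate entry.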

Bibliography

Early English Books Online:

Electronic Textual Cultures Laboratory:

English Broadside Ballad Archive:

Iter:

Literature Online:

Text Creation Partnership:

[1] “About EEBO,” 15 Sept. 2009 .

[2] “Iter: Overview,” 15 Sept. 2009 .


chagenah, 10.06.09
