PhiloLogic User Manual

Table of Contents


1. Introduction

About PhiloLogic:
PhiloLogic, developed by the ARTFL Project at the University of Chicago in collaboration with the University of Chicago Library's Electronic Text Services, provides sophisticated searching of a wide variety of large encoded databases on the World Wide Web. It is an easy-to-use yet powerful full-text search, retrieval, and reporting system for large multimedia databases (texts, images, sound) that can handle complex text structures (e.g., SGML, BetaCode) with rich metadata. Its functions were originally designed, and continue to be developed, for scholarly research in databases of literary, religious, philosophical, and historical texts. Important historical encyclopedias and dictionaries are also well suited for development under PhiloLogic. The Encyclopédie Project, for example, is an implementation of a full hypermedia system supporting full-text retrieval and navigation with hypertextual cross-referencing and full digital imaging support in a single, easy-to-use system.

PhiloLogic in its simplest form can serve as a document retrieval or lookup mechanism whereby users search a relational database to retrieve given titles and, in some implementations, portions of texts such as acts, scenes, articles, or head-words. This same document retrieval mechanism serves as the basis for defining a corpus in a full-text search. The typical PhiloLogic search is broken down into five distinct stages: defining a corpus (i.e., limiting a search), word expansion, word index searching, text extraction, and link resolution and formatting (e.g., SGML to HTML conversion). In other words, after defining a corpus (or choosing to search an entire database), one can execute a single-term, phrase, or proximity search. By looking up indices of the word(s) in a relational database, PhiloLogic extracts blocks of text containing the search term(s) with links to larger blocks of text. These extracts are formatted for display in a Web browser and sometimes include links to images, other texts, and other databases.

In addition to simple word and phrase searches, users can perform more sophisticated searches by using extended UNIX-style regular expressions and, in some implementations, morphological and orthographical expansion. All of these mechanisms to expand words can be combined using Boolean operators such as OR (the vertical bar "|") and AND (a space) within a variety of searching contexts.

The Text Collection Version of PhiloLogic:
This version of PhiloLogic has been developed by the ARTFL Project in collaboration with the University of Chicago Library's Electronic Text Services (ETS). Other versions of PhiloLogic, including those developed for dictionaries and encyclopedias, have somewhat different functionality. Accordingly, the documentation which follows outlines elements of PhiloLogic that are specific to databases of collections of texts. This manual provides general user documentation for the main functions and features of PhiloLogic.


2. Searching the Bibliography

Bibliographic searching in PhiloLogic has two distinct purposes: 1) to allow the user to locate particular documents and read them online ("document retrieval") and 2) to allow the user to select one or more documents in which to search ("defining a corpus" or "limiting one's search"). If the user does not enter search term(s) into the Search Text(s) For: box, PhiloLogic automatically acts as a document retrieval system, providing a bibliography with links to the digital table of contents of each document retrieved. If, on the other hand, term(s) are entered into the Search Text(s) For: box, then PhiloLogic goes into full-text searching mode, looking for the entered term(s) in the document(s) specified in the bibliographic fields of the search-form.

In PhiloLogic the most common bibliographic fields for searching are Author, Title, and Date (see section 2.1).

Some databases, nonetheless, offer many more fields and some offer fewer. To select all documents in a database, simply leave the bibliographic fields blank and press SEARCH. (If performing a bibliographic search to retrieve documents, keep in mind that this can generate a very large number of titles.) The documents are sorted by date, with the earliest published listed first. In PhiloLogic, fields can be combined to refine a search further. Thus, for example, entering smollett in the author field and 1750-1765 in the date field selects only those works by Tobias George Smollett which were published between 1750 and 1765.

On systems with a MySQL backend installed, you can narrow your search using the dynamic search terms buttons. Clicking on one of the buttons will pull up a list with all of the possible terms for that bibliographic field. You can highlight and copy one of these terms, hit the back button in your browser and paste it back into the appropriate bibliographic field search box. You can gradually narrow down your search by adding more bibliographic criteria.

2.1 Bibliographic Fields

A. Searching by Author:
Bibliographic entries in the author field must match the name exactly as given in a database's online bibliography. (One may, however, use upper or lower case letters; author searches are case insensitive.) Searches are on "strings" of characters; in fact, punctuation, spaces, and diacritics must be entered or one receives a "No documents found" message. Nonetheless, author searching has been designed so that a user can enter the fewest possible terms. Typically, it will suffice to enter an author's last name if the author's name is unique within the online bibliography. Thus, entering smollett is likely to select titles only by Tobias George Smollett in the Eighteenth-Century Fiction Database. If entering the author's full name, one must type smollett, tobias george. Author searching also works on "sub-strings" so that entering smoll also selects works by Smollett. PhiloLogic's wildcard characters may also be employed to match many forms.

Note: At this time brackets ([ ]), double quotes ("), and parentheses (( )) are not searchable in the author field. Thus, block-copying an author listing such as Eliot, T. S. (Thomas Stearns) will produce a "No documents found" message. Try the most distinctive sub-string or a wildcard character (period) for the mark of punctuation.

To search the works of more than one author type the authors' names separated by a vertical bar (|) which serves as the OR operator (with no spaces intervening). Thus, smollett|fielding|sterne searches the works of Tobias George Smollett, Henry Fielding, and Laurence Sterne as would smollett, tobias george|fielding, henry|sterne, laurence. To select all the authors in a database, leave the "Author:" field, as well as the other fields, blank.
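The author-field behavior described above (case-insensitive sub-string matching, the vertical bar as OR, the period as a single-character wildcard) can be approximated with an ordinary regular-expression search. The following is only an illustrative sketch, not PhiloLogic's implementation: the function name select_authors and the sample entries are hypothetical, and accent handling is not covered.

```python
import re

# Hypothetical bibliography entries in the inverted "Surname, Given names"
# form that the manual describes.
BIBLIOGRAPHY = [
    "Smollett, Tobias George",
    "Fielding, Henry",
    "Sterne, Laurence",
    "D'Urfey, Thomas",
]

def select_authors(query, entries=BIBLIOGRAPHY):
    """Return the entries that contain any '|'-separated alternative.

    Assumption: the query is treated as a case-insensitive regular
    expression searched anywhere in the entry (sub-string matching).
    """
    pattern = re.compile(query, re.IGNORECASE)
    return [entry for entry in entries if pattern.search(entry)]

print(select_authors("smollett|fielding|sterne"))
print(select_authors("d.urfey, thomas"))
```

Because the search is not anchored, a short sub-string such as smoll selects the same entry as the full inverted name.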

Please note that PhiloLogic now requires the user to take into account accented characters in bibliographic searching when accents appear in the online bibliography. Accents in bibliographic fields are to be represented in the same way as in full-text searching, described in detail in section 3.1 Accents and Special Characters. Thus one may 1) enter the accented character as such from one's browser, 2) use a two-character sequence (e.g., e^) or 3) use an uppercase letter (e.g., E) to match any form of that letter. Thus, entering calderOn or caldero/n finds works by Pedro Calderón de la Barca in the Teatro español del siglo de oro database. Depending on the operating system, one may also choose to key in multi-byte characters directly. In this case, make sure the character set or text encoding specified by the browser corresponds to that of the database. Tip: In order to enter search terms without having to pay attention to diacritics, simply turn on "Caps Lock" and type in all uppercase.

B. Searching by Title:
Bibliographic entries in the title field must match the title exactly as given in a database's online bibliography. (One may, however, use upper or lower case letters; title searches are case insensitive.) Searches are on "strings" of characters; in fact, punctuation, spaces, and diacritics must be entered or one receives a "No documents found" message. Nonetheless, title searching has been designed so that a user can enter the fewest possible terms. Complete titles' names are rarely required to compose well-defined queries. Typically, it will suffice to enter an uncommon word or phrase from a title if the word or phrase is unique within the online bibliography. Thus, entering jones or tom jones is likely to select only The History of Tom Jones, a Foundling by Henry Fielding in the Eighteenth-Century Fiction Database. If entering a title's full name, one must type history of tom jones, a foundling with comma and spaces. Title searching also works on "sub-strings" so that entering jon also selects Fielding's The History of Tom Jones, a Foundling. PhiloLogic's wildcard characters may also be employed to match many forms.

Note: At this time entering the following punctuation marks and symbols into the title field produces a "No documents found" message: parentheses, ampersand (&), double quotes, and brackets ([ ]). In all cases, punctuation and spacing must match exactly that in the bibliography.

To search more than one title at a time type the titles separated by a vertical bar (|) which acts as the OR operator (with no spaces intervening). Thus, jones|amelia selects both The History of Tom Jones, a Foundling and Amelia as would history of tom jones, a foundling|amelia. To select all titles in a database, leave the "Title:" box, as well as the "Author:" and "Date:" boxes, blank.

Please note that PhiloLogic now requires the user to take into account accented characters in bibliographic searching when accents appear in the online bibliography. Accents in bibliographic fields are to be represented in the same way as in full-text searching, described in detail in section 3.1 Accents and Special Characters. Thus one may 1) enter the accented character as such from one's browser, 2) use a two-character sequence (e.g., o/) or 3) use a capitalized letter (e.g., O) to match any form of that letter. Thus, entering nin~a de go/mez arias or niNa de gOmez arias finds La niña de Gómez Arias by Pedro Calderón de la Barca in the Teatro español del siglo de oro database. Tip: In order to enter search terms without having to pay attention to diacritics, simply turn on "Caps Lock" and type in all uppercase.

C. Searching by Date:
To define a corpus by date or a range of dates enter a single year (e.g., 1880) or a range of years (e.g., 1865-1875). Since some works cannot be dated to an exact year, it is often best to adopt a range of dates strategy. Always check a database's online bibliography to confirm dates.
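The two accepted forms of a date entry (a single year or a hyphen-separated range) can be read as an inclusive interval. The sketch below assumes that interpretation; the function names parse_date_query and in_date_range are hypothetical and are not part of PhiloLogic.

```python
def parse_date_query(query):
    """Turn '1880' or '1865-1875' into an inclusive (start, end) pair."""
    parts = [part.strip() for part in query.split("-")]
    if len(parts) == 1:
        start = end = int(parts[0])
    elif len(parts) == 2:
        start, end = int(parts[0]), int(parts[1])
    else:
        raise ValueError("expected a single year or a start-end range")
    if start > end:
        raise ValueError("range start must not be later than range end")
    return start, end

def in_date_range(year, query):
    """True if a document dated `year` falls inside the queried range."""
    start, end = parse_date_query(query)
    return start <= year <= end

print(parse_date_query("1865-1875"))
print(in_date_range(1751, "1750-1765"))
```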

Note: At this time searching by date in several ETS databases is not always productive since in some cases the publisher has entered only the date of the printed edition from which the data have been drawn, not the date of the first edition, composition, or first performance. In the African-American Poetry Database, for example, only by searching for works published in 1993 is one able to search the poems of Paul Laurence Dunbar (1872-1906) by date, since the data come from the 1993 edition. The Database-Specific Searching Tips on individual database search-forms warn users if dates of first publication, composition, and/or performance have not been entered.


2.2 Retrieving and Navigating Documents:

PhiloLogic displays bibliographic citations, which are linked to a work's digital table of contents, in a number of places:


Clicking on the title of a document automatically generates a "digital table of contents", showing the bibliographic entry of the document and all of the parts that have been identified in that document. The parts reflect the logical organization of the document in up to three levels of hierarchy (not all documents contain three levels). The top level of a hierarchy is not indented and is shown in bold. The second level is indented several spaces. The third level is indented further and is shown in italics. Any part at any level may be selected by simply clicking on it (unless the links have been disabled because of copyright restrictions). Notice the structure in the following example taken from Eighteenth-Century Fiction (links to the text have been disabled).


Fielding, Sarah [1759], The History of the Countess of Dellwyn. In Two Volumes: By the Author of David Simple. [etc.] (Cambridge: Chadwyck - Healey, 1996) [FieSar,ThHiOfT3].

   [Fielding, S./The Countess of Dellwyn, Vol. 1]
      [Fielding, S./The Countess of Dellwyn, Vol. 1, Title Page]
      [Fielding, S./The Countess of Dellwyn, Vol. 1, Preface]
      [Fielding, S./The Countess of Dellwyn, Vol. 1, Book 1]
         [Fielding, S./The Countess of Dellwyn, Vol. 1, Book 1, Chap. 1]
         [Fielding, S./The Countess of Dellwyn, Vol. 1, Book 1, Chap. 2]

         [...material omitted...]

   [Fielding, S./The Countess of Dellwyn, Vol. 2]
      [Fielding, S./The Countess of Dellwyn, Vol. 2, Title Page]
      [Fielding, S./The Countess of Dellwyn, Vol. 2, Book 3]
         [Fielding, S./The Countess of Dellwyn, Vol. 2, Book 3, Chap. 1]
         [...material omitted...]


When a part is selected, PhiloLogic displays the bibliographic citation at the top and bottom of the text with a link back to the digital table of contents. It also allows one to go to previous and next sections at the same level of the hierarchy if they should exist.

When one selects a document part from a hierarchy or a page, PhiloLogic provides links, when available, to additional material such as images or cross-references (e.g., notes). In some documents, note references are displayed at the bottom of textual units with the notes themselves available through these links from a database note server. Specific details on the location of notes and other types of material are found on individual database search-forms under Database-Specific Searching Tips.


3. Character Representation for Search Terms

The term(s) to be searched in selected documents are entered into the Search Text(s) For: box on the search-form. Word searches in PhiloLogic are by default case insensitive, so that a search finds both lower and upper case representations of words. The user must, however, take into account diacritics when searching databases that have accented characters. PhiloLogic's wildcard characters may also be employed to match many forms. The simplest search in PhiloLogic is a single term search without wildcards. If searching for a term such as "magic" in a database, simply type the word magic into the Search Text(s) For: box and press the SEARCH button.

3.1 Accents and Special Characters:
PhiloLogic requires that one take into account diacritics when searching documents with accented characters in both bibliographic and full-text searching. The system provides three ways to search for accented characters: 1) simply type the required accented character from the keyboard; 2) use a capital letter to match all accented and non-accented forms of a letter; or 3) enter the two character representations listed below. Tip: If you do not want to have to think about accents, turn on "Caps Lock" and type in all uppercase.

capital letter = any form of the letter
(e.g., E matches é ê è ë and e (no accent) as well as É Ê È Ë and E (no accent)).
grave = (\) back slash
(e.g., a\ matches à).
acute = (/) forward slash
(e.g., e/ matches é).
circumflex = (^) caret
(e.g., e^ matches ê).
cedilla = (,) comma
(e.g., c, matches ç).
umlaut/dieresis = (") double quote
(e.g., u" matches ü).
tilde = (~) tilde
(e.g., n~ matches ñ).
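The two-character representations listed above can be thought of as a base letter followed by a combining mark. The following sketch converts such sequences to accented characters using Unicode normalization. It is an illustration under stated assumptions, not PhiloLogic's own code: the function name decode_accents is hypothetical, and the set of base letters allowed for each mark is an assumption introduced here so that an ordinary comma after a letter such as "t" is left untouched. The capital-letter form and the s^ notation for ß are not covered.

```python
import unicodedata

# Mark character -> (Unicode combining mark, base letters it may follow).
# The allowed base letters are an assumption made for this sketch.
MARKS = {
    "\\": ("\u0300", "aeiou"),   # grave
    "/": ("\u0301", "aeiouy"),   # acute
    "^": ("\u0302", "aeiou"),    # circumflex
    ",": ("\u0327", "c"),        # cedilla
    '"': ("\u0308", "aeiouy"),   # umlaut/dieresis
    "~": ("\u0303", "ano"),      # tilde
}

def decode_accents(term):
    """Replace sequences such as 'e/' or 'n~' with the accented letter."""
    out = []
    for ch in term:
        mark = MARKS.get(ch)
        if mark and out and out[-1].lower() in mark[1]:
            out.append(mark[0])  # attach the combining mark to the base
        else:
            out.append(ch)       # ordinary character, keep as typed
    # NFC composes each base letter and combining mark into one character.
    return unicodedata.normalize("NFC", "".join(out))

print(decode_accents("caldero/n"))
print(decode_accents("nin~a de go/mez arias"))
```

A mark that follows a letter outside its allowed set is kept as typed, which is why the comma in smollett, tobias george survives unchanged.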

Special Characters and Symbols

ae-ligature (æ) = ae
the ligature is resolved into two letters. (e.g., to search æther type in aether).
oe-ligature (œ) = oe
the ligature is resolved into two letters. (e.g., to search œconomy type in oeconomy).
sz-ligature or sharp S (ß) = s^
Always check Database-Specific Searching Tips to see whether the German sz-ligatures have been resolved into two esses or not.
ampersand (&)
is not a searchable character. Avoid Phrase Searches where an ampersand could be used as a conjunction.
mathematical symbols = to be determined
the equal sign (=) and minus sign (-) will produce a "Nothing found" message. The plus sign (+) is not a searchable character, but, if entered, will be ignored.

In order to handle words properly that have italics, bold, underlining, superscripts, and subscripts, PhiloLogic does not treat the following tags as word separators:

3.2 Wildcard Characters and Boolean Operators:
Wildcard characters allow the user to enter a single search entry that may find many forms. This is in contrast to a simple word search which requires an exact match in order to find a word. The following describes the most commonly used wildcard characters in full-text searching and in bibliographic searching.

3.2.1 Full-Text Searching: PhiloLogic supports wildcard characters and Boolean (logical) operators, which are modeled on UNIX regular expressions to perform "pattern matching" in full-text searching. Pattern matching allows identification of a large number of words corresponding to a defined pattern. Wildcard characters can be useful, for example, in identifying cognates made obscure by affixes and vowel weakening, inconsistencies due to irregular orthography, and variations on account of word inflection as well as for discovering potential emendations for uncertain readings. The most commonly used regular expression operators (wildcard and Boolean) are listed below.

Wildcard Characters

. (period):
matches any single character (e.g., gentlem.n will retrieve gentleman and gentlemen).
.* (period asterisk "dot-star"):
matches any string of characters, anchoring the match at the beginning of a word (e.g., cigar.* will match cigar, cigars, cigarette, etc.), anchoring the match at the end of a word (e.g., .*habit will retrieve habit, cohabit, and inhabit), or in the middle (e.g., c.*eers matches compeers, cheers, and careers).
.? (period question mark):
matches the characters entered or the characters entered plus one more character in place of the question mark (e.g., hono.?r matches both honor and honour and cat.? matches cat and cats, but not cathedral, Catherine, etc.).
[a-z] (brackets):
matches a single character found in the specified range (e.g., [c-f]at will match cat, dat, eat, and fat) or any letters within the brackets (e.g., civili[zs]e will match both civilize and civilise).
# (hash mark):
matches capitalized words only (e.g., #bacon will retrieve Bacon, but not bacon). Otherwise word searches are case insensitive. Please note that this operator does not work properly in conjunction with the vertical bar (e.g., searching #hamlet|#bacon will not retrieve accurate results).
E (capital letter):
matches all accented and non-accented forms (e.g., to search naïveté regardless of accents type naIvetE).
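Most of the wildcard characters listed above behave like their counterparts in ordinary regular expressions, applied to whole words. The sketch below expands a pattern against a small word list to show which forms each pattern would select. It is a hedged illustration only: the function name expand and the vocabulary are hypothetical, Python's re module stands in for whatever engine PhiloLogic uses, and the capital-letter accent wildcard is not handled.

```python
import re

# Hypothetical index of word forms.
VOCABULARY = [
    "gentleman", "gentlemen", "gentlewoman", "honor", "honour",
    "cat", "cats", "cathedral", "Bacon", "bacon",
    "habit", "cohabit", "inhabit", "civilize", "civilise",
]

def expand(pattern, vocabulary=VOCABULARY):
    """Return the word forms that a wildcard pattern matches in full."""
    capitalized_only = pattern.startswith("#")
    body = pattern[1:] if capitalized_only else pattern
    regex = re.compile(body, re.IGNORECASE)  # searches are case insensitive
    hits = []
    for word in vocabulary:
        if capitalized_only and not word[:1].isupper():
            continue  # '#' keeps capitalized forms only
        if regex.fullmatch(word):  # the whole word must match
            hits.append(word)
    return hits

print(expand("gentlem.n"))
print(expand("cat.?"))
print(expand("#bacon"))
```

Matching against the whole word is what keeps cat.? from selecting cathedral.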

Note: If you are using wildcard characters and would like to see a full list of the words matching your search-term, then run your search as a "Frequency by Title" search. The results page of a "Frequency by Title" search lists all the terms found in a database that match your search-term.

Boolean Operators

| (vertical bar):
serves as the OR operator (e.g., freedom|liberty retrieves instances of either).
Space:
serves as the AND operator in sentence and paragraph Proximity Searching (e.g., church state retrieves all cases where church and state appear in the same specified context; this is not the case in phrase searching).

These expressions can be combined for more sophisticated searches; for example, searching old|aged|ancient m.n|fellow.* finds any of the three adjectives together with the nouns man or fellow in the singular or plural.

3.2.2 Bibliographic Searching (Corpus Definition and Document Retrieval):
PhiloLogic also supports certain Boolean and wildcard operators, which are modeled on UNIX regular expressions, for "pattern matching" in bibliographic searching; however, there are important differences. Only the Boolean operator OR may be used and not AND since all bibliographic searches are by default consecutive searches. Furthermore, since bibliographic searches are also by default searching for "strings" of characters, the wildcard operator (.*) is not needed. Thus, typing habit in a bibliographic field is the same as typing .*habit.* in full-text searching. Names of authors and titles bearing diacritics must be entered with accented characters or with the use of a capital letter for the accented character. Bibliographic searching is otherwise also case insensitive. In bibliographic searching, unlike in full-text searching, marks of punctuation are not only permitted, but in most cases required, when found in the online bibliography. If the bibliography, for example, reads Poe, Edgar Allen, the name must be entered in the author field in the same inverted order with comma separating surname from given names. Otherwise, one receives a "No documents found!" message. One must also avoid unwanted spaces. Typically, it will suffice to enter an uncommon word or phrase from a title or author's name if the word or phrase is unique within the online bibliography.

3.3 Punctuation Marks and Searching

Punctuation and Full-Text Searching: In full-text searching, entering marks of punctuation (for example, a period used as a full stop) in the search box often produces a "nothing found" message. All punctuation marks such as the comma, question mark, exclamation mark, vertical bar (|), forward and backward slashes, colons, and semicolons as well as quotation marks, ampersands (&), the asterisk (*), the percentage sign (%), the dollar sign, and the number sign (#) should be stripped from an entry, especially if one is block-copying text. (Many of the symbols traditionally used for punctuation are used instead for accent representation or wildcard characters.) Some marks of punctuation are especially problematic and may be dealt with slightly differently:

Apostrophe: The only punctuation that PhiloLogic regularly supports in full-text searching is the apostrophe. Entering sister's retrieves "sister's" in most databases, but typing in sisters does not retrieve the possessives sister's or sisters', only the plural sisters. Always check Database-Specific Searching Tips on individual database search-forms to be sure since punctuation marks are treated differently because of a given language's needs. (In French and Italian, for example, the apostrophe separates words and thus must be entered with a space following it, e.g., l' histoire and d' Italia.)

Hyphen: Hyphens act as word separators in most databases. Thus, if looking for all occurrences of the word "valiant," one may enter only valiant and still find "ever-valiant." Always check Database-Specific Searching Tips on individual database search-forms to be sure since punctuation marks may be treated differently because of a given language's needs.

Brackets: Although brackets usually act as word-separators, they will not always, for example, when they indicate uncertain readings (Agr[ipp]ina). In the near future, PhiloLogic will support MSS punctuation for some databases, in which cases brackets will not be word-separators. Other marks of punctuation will be part of the MSS implementation. Always check individual search-forms under Database-Specific Searching Tips to know for sure.

Ampersand: The ampersand (&) is not a searchable character. Avoid Phrase Searches where an ampersand could be used as a conjunction.

Period: The period is not a searchable character (it serves as a wildcard operator). Please note that most databases are not tagged for sentence termination and therefore PhiloLogic must rely on marks of punctuation in combination with capitalization to identify sentence termination. This is especially problematic for combinations such as St. Ambrose. If you suspect that a period in an abbreviation may be splitting a phrase, switch to a Proximity Search in the same paragraph.

Punctuation and Bibliographic Searching: At this time entering the following marks of punctuation and symbols into bibliographic fields produces a "No documents found" message: parentheses (( )), semi-colons (;), colons (:), ampersand (&), apostrophes ('), single and double quotes, braces ({ }), brackets ([ ]), and angle brackets (< >) as well as the dollar sign ($). Thus block-copying a name such as D'Urfey, Thomas will produce a "No documents found" message. Try the most distinctive sub-string such as urfey or a wildcard character (period) for the mark of punctuation (e.g., d.urfey, thomas). The following punctuation marks have no adverse effect on an author or title search and, if appearing within a string, must be entered: period (.), hyphen (-), question mark (?), exclamation mark (!), forward slash (/), and comma (,).


4. Selecting a Results Format

PhiloLogic at this time offers two kinds of searches: "Single Term and Phrase Search," which is set up as the default, and "Proximity Searching in the Same Sentence or Paragraph." One may select and deselect a search option by clicking on the "radio" buttons.

4.1. Similarity searches

Similarity searches allow you to check for similar or alternative spellings of your search query that might exist within a collection of texts. To execute a similarity search, click the box labelled Similar Word Search immediately following the main search box. No numbers, textual punctuation, or wildcards are allowed when performing similarity searches. After you enter your search term and submit your search, if your search string sufficiently resembles (as defined by AGREP) strings that exist in the indices, you will be returned a list of potential search terms with checkboxes. The resulting search is an OR search incorporating all of your selected search terms.

4.2. Single Term and Phrase Search (Default):

To search a single term in the entire database or a defined corpus, make sure that the Single Term and Phrase Search radio button is highlighted, enter the term into the Search Text(s) For: box, and press the SEARCH button. (One may use upper or lower case letters; searches are case insensitive.) Single Term searching supports wildcard characters and the Boolean operator OR, which is the vertical bar (|). Entering, for example, freedom|liberty retrieves all occurrences of the word "freedom" or "liberty" in the entire database or a specified corpus.

Similarly, to search a phrase, make sure that the Single Term and Phrase Search radio button is highlighted, type the phrase into the Search Text(s) For: box, and press the SEARCH button. Phrase searching restricts the search to adjacent words in a particular order (punctuation in the text is ignored). Thus, for example, the search church state would not retrieve "church and state," but only cases where the word "church" is next to the word "state" with the word "church" preceding. To retrieve occurrences of the phrase "church and state" one must type in church and state. Phrase searching supports wildcard characters and the Boolean operator OR. Note: one cannot search for two separate phrases using the OR operator. Two separate searches must be run. One may, however, use the OR operator within a phrase; medieval|mediaeval age retrieves, for example, instances of both "medieval age" and "mediaeval age."
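The adjacency rule for phrase searching can be pictured as sliding a window of query terms over the sequence of words in a text. The sketch below is a minimal illustration, not PhiloLogic's index-based method: the function name phrase_hits is hypothetical, and it assumes that words consist of ASCII letters and apostrophes and that all other punctuation is ignored.

```python
import re

def phrase_hits(query, text):
    """Return each run of adjacent words that matches the query terms in order."""
    terms = [re.compile(term, re.IGNORECASE) for term in query.split()]
    if not terms:
        return []
    # Assumption: a word is a run of ASCII letters and apostrophes.
    words = re.findall(r"[A-Za-z']+", text)
    hits = []
    for i in range(len(words) - len(terms) + 1):
        window = words[i:i + len(terms)]
        # Each term must match the word in the same position of the window.
        if all(term.fullmatch(word) for term, word in zip(terms, window)):
            hits.append(" ".join(window))
    return hits

sample = "The church and state; later, the church state."
print(phrase_hits("church state", sample))
print(phrase_hits("church and state", sample))
```

Each query term may itself contain wildcards or the vertical bar, so medieval|mediaeval age is simply a two-term phrase whose first position accepts either spelling.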

4.3 Proximity Searching in the Same Sentence or Paragraph:

Searching for more than one term in a single sentence or paragraph without regard to adjacency or word order constitutes Proximity Searching. Simply type the terms in question into the Search Text(s) For: box, indicate whether they are to be found in the same sentence or paragraph by highlighting the appropriate radio button, and press SEARCH. (One may use upper or lower case letters; searches are case insensitive.) Proximity Searching supports wildcard characters, the Boolean operator OR, which is the vertical bar (|), and the Boolean operator AND, which is a space. If looking for occurrences of the words "church" and "state" within the same sentence or paragraph in any order, enter church state. Entering church state|throne retrieves instances of "church" and "state" or "church" and "throne" in the same sentence or paragraph. Note: at this time one cannot perform a proximity search with a phrase and another phrase or a phrase and another single term in the same sentence or paragraph. Remember: a space acts as the AND operator in proximity searching.
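In proximity searching every space-separated term must be satisfied somewhere in the same sentence or paragraph, in any order. A minimal sketch of that rule follows. It assumes the text has already been divided into sentences or paragraphs (called units here); the function name proximity_hits and the sample sentences are hypothetical, and this is not how PhiloLogic itself consults its indices.

```python
import re

def proximity_hits(query, units):
    """Return the units in which every query term matches at least one word."""
    terms = [re.compile(term, re.IGNORECASE) for term in query.split()]
    hits = []
    for unit in units:
        # Assumption: a word is a run of ASCII letters and apostrophes.
        words = re.findall(r"[A-Za-z']+", unit)
        # A space acts as AND: every term needs some matching word.
        if terms and all(any(t.fullmatch(w) for w in words) for t in terms):
            hits.append(unit)
    return hits

sentences = [
    "The state supports the church.",
    "The church stood empty.",
    "Church and throne were allied.",
]
print(proximity_hits("church state", sentences))
print(proximity_hits("church state|throne", sentences))
```

The first sentence is found even though "state" precedes "church", which a phrase search would not allow.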


4.4 The Orthography Option

Use the Orthography options to specify how the search term is interpreted:


4.5 The Sampling Option

To see only a sampling from the results of a search, click the checkbox named Sample to turn it on, and type the sample size in the field. This option can be useful with searches that return a large number of results. For example, in the Shakespeare Sources database, a search for the lemma "appear" returns 1840 results, which is probably too many to examine individually in detail. To see just a sample of 100 of these results, request a sample size of 100. In this case, you are shown only every 18th item from the full result set, for a total of 100 samples.
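The arithmetic in this example can be reproduced by dividing the number of results by the requested sample size and keeping every item at that interval. The sketch below assumes that the interval is the integer quotient (1840 // 100 = 18) and that the selection is cut off once the requested size is reached; the manual does not state the exact rule, and the function name sample_results is hypothetical.

```python
def sample_results(results, sample_size):
    """Keep every n-th result, where n = len(results) // sample_size."""
    if sample_size <= 0:
        raise ValueError("sample size must be positive")
    # Assumption: integer division gives the interval; never less than 1.
    step = max(1, len(results) // sample_size)
    # Slicing by step can overshoot slightly, so trim to the requested size.
    return results[::step][:sample_size]

results = list(range(1840))           # stand-in for 1840 occurrences
sample = sample_results(results, 100)
print(len(sample), sample[:3])
```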


5. Refining Search Results

Except in the case of Frequency by Title and Author reports, references for occurrences are numbered from one and sorted by date, with the earliest published works listed first. A Results Bibliography can be found at the bottom of the report. Bibliographic citations generally take the following form:

Sterne, Laurence [1760], The Life and Opinions of Tristram Shandy, Gentleman ... The Second Edition (Cambridge: Chadwyck - Healey, 1996) [SteLau,ThLiAnO].

Each typically shows the author's name, the date of first publication or composition, the title, information on the digital publication, and finally the short citation code in brackets. The short citation code is displayed with the reference for each occurrence in Concordance and KWIC reports. All full titles are linked to their digital table of contents (disabled in the example above). In some cases, you will find "[n.d.]" in the place of a date which means that no date was provided or that it was not tagged according to the accepted encoding specs and the metadata extractor was unable to locate it.

A user can switch to another display format at any time while viewing results without having to resubmit a search. Simply click on the appropriate link ("Click here for a KWIC Report" or "Click here for a Concordance Report"), which is always provided at the bottom of any given results page (and usually at the top, unless the report is still in progress when the first 25 occurrences are initially displayed).

Note: PhiloLogic will not complete a search that yields more than 10,000 occurrences. Only the first 10,000 will be retrieved. In addition, users are currently limited to 500 unique forms in a single search. By using wildcard characters and Boolean operators one can sometimes submit a query for a very large set of terms, especially in highly inflected languages. If a search exceeds the limit of unique forms, PhiloLogic will provide a list of all 500-plus unique forms so that the user can devise an alternate strategy for searching. Some databases such as the PLD have higher limits set. Research is underway to find ways to increase this limit substantially.

5.1.1 Frequency by Title Sorted by Raw Frequencies

A Frequency by Title report sorted by raw frequencies indicates the bibliographic criteria entered, the number of documents searched, the search term(s) entered, the number of unique forms derived from the search term(s) within the database, a list of those unique forms, and the total number of occurrences found in the defined corpus. Following this information, the report indicates the number of occurrences by title in descending order of frequency with a link to the digital table of contents for each title and a link to the occurrences found within that title. See below for an example (links to the table of contents and occurrences have been disabled).


Bibliographic criteria: author=robinson
Searching 6 documents for eft.?|newt.
Number of Unique Forms: 5

Search Terms: newt | Newt | eft | Eft | efts

Your search found 3 occurrences.


Frequency by title in descending numeric order:

1. 2 The Collected Poetry of Robinson Jeffers: Edited by Tim Hunt: Volume 3 1938-1962, Jeffers, Robinson [Occurrences]
2. 1 Collected poems of Edwin Arlington Robinson, Robinson, Edwin Arlington [Occurrences]


The Frequency by Title Report is useful if one is curious how frequently an author uses term(s) in one work as compared to his/her other works or in his/her works as compared to others' works. It can also be enlightening to see for what terms within a database one's search criteria are searching (for example, one can discover that entering the search term magic.* in Early English Prose Fiction searches for the following unique forms: magic, magical, magicall, magician, magicians, magick, magicke, and magicks).
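As a rough illustration of how a search pattern expands into unique forms, the following Python sketch matches a pattern against a small word list. The word list and the `expand` helper are hypothetical, and Python's `re` module is only a stand-in here; PhiloLogic's own pattern matching (for example, its handling of case and accents) may behave differently.

```python
import re

def expand(pattern, vocabulary):
    """Return the vocabulary entries that match the whole pattern."""
    compiled = re.compile(pattern)
    return sorted(w for w in vocabulary if compiled.fullmatch(w))

# Hypothetical word list standing in for a database's word index.
vocabulary = ["magic", "magical", "magicall", "magician", "magicians",
              "magick", "magicke", "magicks", "magistrate", "tragic"]

forms = expand("magic.*", vocabulary)
print(len(forms), "unique forms:", " | ".join(forms))
```

Words such as "magistrate" and "tragic" are left out because the pattern must match the entire word, not just part of it.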

Any definable corpus or search can be used in generating this report. Unlike Concordance and KWIC reports, this report does not display text, only frequency statistics with links to occurrences displayed in Concordance Report format. Note: the sets of occurrences linked to from the frequency report are numbered in chronological order, not by frequency. In other words, clicking on the [Occurrences] link for a title at the top of the list could, for example, bring up occurrences numbered 21-28 instead of 1-8 because that title while ranked first in frequency is not first chronologically.
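The difference between frequency ranking and chronological numbering can be sketched in a few lines of Python. The titles, years, and counts below are invented for illustration, and the code is not PhiloLogic's implementation; it only shows why the top-ranked title need not start at occurrence 1.

```python
from collections import Counter

# Hypothetical hits, already in chronological order: (title, year).
hits = ([("Early Work", 1790)] * 2
        + [("Middle Work", 1801)] * 3
        + [("Late Work", 1815)] * 5)

# Occurrence numbers follow chronological order, starting at 1.
numbered = [(n, title) for n, (title, _year) in enumerate(hits, start=1)]

# The frequency report ranks titles by their raw counts.
counts = Counter(title for _n, title in numbered)
for rank, (title, count) in enumerate(counts.most_common(), start=1):
    numbers = [n for n, t in numbered if t == title]
    print(f"{rank}. {count} {title} (occurrences {numbers[0]}-{numbers[-1]})")
```

Here the latest title ranks first in frequency, yet its occurrences are numbered 6-10 because two earlier titles precede it chronologically.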

5.1.2 Frequency by Title Sorted by Rate per 10000 Words

A Frequency by Title report sorted by rate per 10000 words indicates the bibliographic criteria entered, the number of documents searched, the search term(s) entered, the number of unique forms derived from the search term(s) within the database, a list of those unique forms, and the total number of occurrences found in the defined corpus. Following this information, the report indicates the number of occurrences by title in descending order of rate per 10000 words. See below for an example (links to the table of contents and occurrences have been disabled).

Bibliographic criteria: author=Curry
Searching 1 documents for love.
Your search found 2 occurrences


1. Curry, James. Narrative of James Curry, A... [Occurrences]


2. Curry, James. Narrative of James Curry, A... [Occurrences]


Results Bibliography

Curry, James [1840], Narrative of James Curry, A Fugitive Slave: Electronic Edition. (Academic Affairs Library, UNC-CH, 10 January 1840), 1 p. [narrativeo].


The Frequency by Title report sorted by rate per 10000 words is useful if one is curious about the relative frequency with which an author uses certain term(s) in one work as compared to his/her other works or in his/her works as compared to others' works. It can also be enlightening to see for what terms within a database one's search criteria are searching (for example, one can discover that entering the search term magic.* in Early English Prose Fiction searches for the following unique forms: magic, magical, magicall, magician, magicians, magick, magicke, and magicks).



5.2.1 Frequency by Author Sorted by Raw Frequencies

A Frequency by Author report indicates the bibliographic criteria entered, the number of documents searched, the search term(s) entered, the number of unique forms derived from the search term(s) within the database, a list of those unique forms, and the total number of occurrences found in the defined corpus. Following this information, the report indicates the number of occurrences by author in descending order of frequency with individual titles listed with a link to the digital table of contents for each title and a link to the occurrences found within that title. See below for an example (links to the table of contents and occurrences have been disabled).


Bibliographic criteria: none
Searching Entire Database for newt|eft.?.
Number of Unique Forms: 3

Search Terms: eft | efts | newt

Your search found 4 occurrences.


Frequency by Author in descending numeric order:

1. Scott, Walter, Sir, 1771--1832: 2
      1: A Legend of Montrose [in, the Waverley Novels]  [Occurrences]
      1: Guy Mannering; Or, The Astrologer [in, the Waverley Novels]  [Occurrences]
2. Lytton, Edward Bulwer Lytton, Baron, 1803--1873: 2
      2: Pelham; Or, The Adventures Of A Gentleman  [Occurrences]


Any definable corpus or search can be used in generating this report. Unlike Concordance and KWIC reports, this report does not display text, only frequency statistics with links to occurrences displayed in Concordance Report format. Note: the sets of occurrences linked to from the frequency report are numbered in chronological order, not by frequency. In other words, clicking on the [Occurrences] link for a title at the top of the list could, for example, bring up occurrences numbered 21-28 instead of 1-8 because that author's title while ranked first in frequency is not first chronologically.



5.2.2 Frequency by Author Sorted by Rate per 10000 Words

A Frequency by Author report sorted by rate per 10000 words indicates the bibliographic criteria entered, the number of documents searched, the search term(s) entered, the number of unique forms derived from the search term(s) within the database, a list of those unique forms, and the total number of occurrences found in the defined corpus. Following this information, the report indicates the number of occurrences per 10000 words for each author in descending order of rate per 10000 words. See below for an example (links to the table of contents and occurrences have been disabled).


Bibliographic criteria: keywords=Slavery
Searching 178 documents for freedom.
Number of Unique Forms: 1

Search Terms: freedom

Your search found 3436 occurrences.


Frequency by Author in descending order of rate per 10,000 with [frequency] in brackets (e.g., 4.72 [4] means 4.72 occurrences in 10,000 words with a total of 4 occurrences in that author's works.):

1. Miles, James Warley, 1818-1875: 19.87 [17] 17 God in History. A Discourse Delivered before the Graduating Class of the College of Charleston on Sunday Evening, March 29, 1863: Electronic Edition. [Occurrences]
2. Gallaudet, T. H. Thomas Hopkins, 1787-1851: 18.69 [5] 5 A Statement with Regard to the Moorish Prince, Abduhl Rahhahman: Electronic Edition. [Occurrences]
3. Eliot, William Greenleaf, 1811-1887: 18.54 [43] 43 The Story of Archer Alexander.From Slavery to Freedom, March 30, 1863: Electronic Edition. [Occurrences]


The Frequency by Author report sorted by rate per 10000 words is useful if one is curious about the relative frequency with which an author uses certain term(s) compared to other authors in the same collection.
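The rate itself is simple arithmetic: occurrences divided by the total number of words, multiplied by 10,000. The Python sketch below uses invented authors and word counts (they are not taken from any PhiloLogic database) to show how sorting by rate can reorder a list that was sorted by raw frequency.

```python
def rate_per_10000(occurrences, total_words):
    """Occurrences per 10,000 words."""
    return occurrences / total_words * 10000

# Hypothetical authors: (name, occurrences, total words in the author's works).
authors = [("Author A", 4, 8475), ("Author B", 10, 50000), ("Author C", 3, 5000)]

by_rate = sorted(authors, key=lambda a: rate_per_10000(a[1], a[2]), reverse=True)
for rank, (name, occ, words) in enumerate(by_rate, start=1):
    # Same layout as the report: rate first, raw frequency in brackets.
    print(f"{rank}. {name}: {rate_per_10000(occ, words):.2f} [{occ}]")
```

In this made-up data, Author B has the most occurrences but the lowest rate, so that author moves from first place in a raw-frequency report to last place in a rate report.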



5.3.1 Frequency by Year Group Sorted by Raw Frequencies

A Frequency by Year Group report sorted by raw frequencies indicates the bibliographic criteria entered, the number of documents searched, the search term(s) entered, the number of unique forms derived from the search term(s) within the database, a list of those unique forms, and the total number of occurrences found in the defined corpus. Following this information, the report indicates the number of occurrences by year group in descending order of frequency, with rate per 10000 words in brackets. Each work within a year group is listed with a link to its digital table of contents and a link to the occurrences found within it. See below for an example (links to the table of contents and occurrences have been disabled).


Bibliographic criteria: none
Searching Entire Database for amber.
Number of Unique Forms: 1

Search Terms: amber

Your search found 30 occurrences.


Frequency by Years in descending numeric order with frequency in bold and [rate per 10,000] in brackets:

1. 1650-59: 26 [1.96]
      25 Grey, Elizabeth, Countess of Kent, A Choice Manual, or Rare and Select Secrets, 1653 [Occurrences]
      1 Bradstreet, Anne, The Tenth Muse, 1650 [Occurrences]
2. 1790-99: 2 [0.81]
      2 Cristall, Ann Batten, Poetical Sketches, 1795 [Occurrences]
3. 1620-29: 1 [0.47]
      1 [unknown], Swetnam, the Woman-Hater, Arraigned by Women, 1620 [Occurrences]
4. 1660-69: 1 [0.17]
      1 Philips, Katherine (Fowler), Poems, 1664 [Occurrences]


The Frequency by Year Group sorted by raw frequencies report is useful for comparing the overall usage of certain terms across entire historical periods.
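The grouping step can be sketched in Python as follows. The sketch assumes that year groups are decades, as in the example report above; other groupings may be possible, and the `decade_label` helper is hypothetical rather than part of PhiloLogic. The year and occurrence pairs are copied from the example.

```python
from collections import defaultdict

def decade_label(year):
    """Label a year with its decade, e.g. 1653 -> '1650-59'."""
    start = year - year % 10
    return f"{start}-{(start + 9) % 100:02d}"

# (year, occurrences) pairs taken from the example report above.
works = [(1653, 25), (1650, 1), (1795, 2), (1620, 1), (1664, 1)]

totals = defaultdict(int)
for year, occurrences in works:
    totals[decade_label(year)] += occurrences

ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
for rank, (label, total) in enumerate(ranked, start=1):
    print(f"{rank}. {label}: {total}")
```

The two works from the 1650s are combined into a single group of 26 occurrences, and the group totals add up to the 30 occurrences reported for the search.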



5.3.2 Frequency by Year Group Sorted by Rate per 10000 Words

A Frequency by Year Group report sorted by rate per 10000 words indicates the bibliographic criteria entered, the number of documents searched, the search term(s) entered, the number of unique forms derived from the search term(s) within the database, a list of those unique forms, and the total number of occurrences found in the defined corpus. Following this information, the report lists the year groups in descending order of rate per 10000 words, with the number of occurrences in brackets. Each work within a year group is listed with a link to its digital table of contents and a link to the occurrences found within it. See below for an example (links to the table of contents and occurrences have been disabled).


Bibliographic criteria: none
Searching Entire Database for amber.
Number of Unique Forms: 1

Search Terms: amber

Your search found 30 occurrences.


Frequency by Years in descending order of rate per 10,000 with [frequency] in brackets (e.g., 3.09 [8] means 3.09 occurrences in 10,000 words with a total of 8 occurrences for that period of years.):

1. 1650-59: 1.96 [26]
      25 Grey, Elizabeth, Countess of Kent, A Choice Manual, or Rare and Select Secrets, 1653 [Occurrences]
      1 Bradstreet, Anne, The Tenth Muse, 1650 [Occurrences]
2. 1790-99: 0.81 [2]
      2 Cristall, Ann Batten, Poetical Sketches, 1795 [Occurrences]
3. 1620-29: 0.47 [1]
      1 [unknown], Swetnam, the Woman-Hater, Arraigned by Women, 1620 [Occurrences]
4. 1660-69: 0.17 [1]
      1 Philips, Katherine (Fowler), Poems, 1664 [Occurrences]


The Frequency by Year Group report sorted by rate per 10000 words is useful for comparing the relative usage of certain terms across entire historical periods.



5.4 Collocation Table

The collocation table report provides users with a simple way of seeing the words with which the search terms most often co-occur. By default, it filters out short and very high-frequency words; the list of filtered words can be viewed by clicking on the "Filtered Words" link. The filter can be turned off by selecting the "Turn Filter Off" check box. The user also has the option of expanding or narrowing the word span on each side of the search term(s) using the numeric drop-down menu "Spanning [5] Words". A collocation table indicates the bibliographic criteria entered, the number of documents searched, the search term(s) entered, the number of unique forms derived from the search term(s) within the database, a list of those unique forms, and the total number of occurrences found in the defined corpus. Following this information, the report lists, in descending order of frequency, the words that most often occur within the chosen span to the left, to the right, and on either side of the specified search term(s). See below for an example.


Bibliographic criteria: none
Searching Entire Database for tradition.
Number of Unique Forms: 1

Search Terms: tradition

Your search found 459 occurrences.


Keywords found (with occurrences): tradition (459)

The 120 most common words are being filtered from this report. To include filtered words select "Turn Filter Off" on the search-form.

Ranking Within 5 Words

Rank  On Either Side   To Left Only     To Right Only
1     says (38)        according (16)   says (38)
2     family (23)      history (14)     among (16)
3     history (20)     family (13)      family (10)
4     among (19)       way (11)         tells (7)
5     according (16)   another (7)      history (6)
6     way (13)         tradition (5)    authority (6)
7     authority (11)   oral (5)         tradition (5)
8     tradition (10)   name (5)         three (5)
9     down (9)         left (5)         north (5)
10    indians (8)      indians (5)      handed (5)
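A collocation count of this kind can be approximated with the Python sketch below. It is a minimal illustration under stated assumptions, not PhiloLogic's implementation: the `collocates` function, the sample text, and the stop list are all hypothetical, and the text is simply split on spaces.

```python
from collections import Counter

def collocates(tokens, keyword, span=5, stopwords=frozenset()):
    """Count words within `span` tokens to the left and right of each keyword hit."""
    left, right = Counter(), Counter()
    for i, token in enumerate(tokens):
        if token != keyword:
            continue
        # Words before the hit, then words after it, skipping filtered words.
        left.update(w for w in tokens[max(0, i - span):i] if w not in stopwords)
        right.update(w for w in tokens[i + 1:i + 1 + span] if w not in stopwords)
    return left, right, left + right

# Hypothetical text and stop list, for illustration only.
text = "according to family tradition says the history and tradition says so"
stop = frozenset({"to", "the", "and", "so"})
left, right, either = collocates(text.split(), "tradition", span=5, stopwords=stop)
print("either side:", either.most_common(1))
```

In this sketch the "either side" count is the sum of the left and right counts, and the search term can appear as its own collocate when two hits fall within the span of each other, as "tradition" does in the example table.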



5.5 Word in Clause Position Analysis (Theme-Rheme)

The word in clause position analysis report is a highly experimental report inspired by Michael Halliday's theme-rheme analysis. The main idea is that the contextual significance of a word can often be determined by its position within a clause: the front of the clause (theme) or the end of the clause (rheme). By default, only front-of-clause hits are returned because we assume these to be more significant, but after choosing to refine your search results using this report, you can select several options from the "Display Options" drop-down menu.

A word in clause position analysis indicates the bibliographic criteria entered, the number of documents searched, the search term(s) entered, the number of unique forms derived from the search term(s) within the database, a list of those unique forms, and the total number of occurrences found in the defined corpus. Following this information, the report returns the occurrences in context and indicates within what percentage of the length of the clause the word falls. Clauses are identified by punctuation. See below for an example.


Bibliographic criteria: none
Searching Entire Database for tradition.
Number of Unique Forms: 1

Search Terms: tradition

Your search found 459 occurrences.


Clause Position Analysis

Positions are calculated on within what percentage of the length of the clause the word falls. Front of Clause (first 35%); Last (last 10%), Remainder (middle 55%), Too Short (clause length 3 words or less). Words of 2 letters or fewer and numbers are excluded in calculating clause length. Clauses are identified with punctuation as the primary determining factor.

Go to Statistical Summary Below

Go to Front of Clause (hits)


Front of Clause (Theme)


Front: 1. [2/6 = 33.33%] Tucker, John... . The Bible or Atheism: Electronic... [page 7 | Paragraph | Section]

marts of Christendom, and amid the Islands of the sea; within the barred gates of Japan or the walls of China; with every other variety, there is one unity of thought-Humanity everywhere believes there is a God! Nay, more-in the morning of its birth, as far as tradition or history tells its story; in its infancy; in its heyday of glory; in the dark age of barbarism; from its cradle to its meridian prime; amid all other changes and revolutions; in religion, with an unbroken unity of expression-Humanity still declares there is a


Front: 2. [3/13 = 23.07%] Tucker, John... . The Bible or Atheism: Electronic... [page 8 | Paragraph | Section]

of the race, that its very mention thrills every fibre of humanity now, and must do so eternally. It is, indeed, no dream of human fancy-no conclusion from the terms of a human syllogism, but a fact manifested by divinity, in such a manner that, from age to age, history and tradition have handed it down to fill the wicked with terror and fear: the pious with devotion, reverence, and love. 4. But I argue, there must have been such a revelation, because the non-existence of it is so improbable-and if there was such a revelation, it was obviously the


Front: 3. [2/7 = 28.57%] Brinch,... . The Blind African Slave, or... [page 62 | Paragraph | Section]

the government are appointed and installed, or sworn into office, the pleasures varying from day to day. One day, combats are performed; next, feats of agility; on another, acts of strength &c. until the feast closes, which continues generally about seven days. There is a tradition which was handed down among us, that this custom was anciently introduced by a great high priest of a foreign land, whose name was Ziphia; and here I will observe thet there are certain societies, as I was informed by my Grandmother, Whryn Dooden Wrogan, which had


The clause position analysis report is highly experimental. We hope that it can be useful in filtering out less significant occurrences of the search term(s) based upon their position within a clause.
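The position calculation described in the example report can be sketched in Python as follows. The function name and the exact handling of boundary cases (for instance, whether exactly 35% counts as Front) are assumptions, and the sketch takes a clause that has already been split off by punctuation; it is not PhiloLogic's code.

```python
import re

def clause_position(clause, target):
    """Return (position, clause_length, percent, label) for `target` in `clause`."""
    # Alphabetic tokens only, so numbers drop out; words of 2 letters or fewer are skipped.
    words = [w for w in re.findall(r"[A-Za-z]+", clause) if len(w) > 2]
    position = [w.lower() for w in words].index(target.lower()) + 1
    length = len(words)
    percent = position / length * 100
    if length <= 3:
        label = "Too Short"
    elif percent <= 35:
        label = "Front"
    elif percent > 90:
        label = "Last"
    else:
        label = "Remainder"
    return position, length, percent, label

pos, length, percent, label = clause_position(
    "as far as tradition or history tells its story", "tradition")
print(f"{label}: [{pos}/{length} = {percent:.2f}%]")
```

Applied to the clause from the first example above, the sketch counts six words of three or more letters with "tradition" in second place, which matches the 2/6 shown in the report.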



5.6 Concordance Report (300 Characters Plus)

Concordance reporting is the default results format option. This report indicates the number of texts searched, the search term(s) entered in a defined corpus, and the total number of occurrences found. (The number of occurrences displays at the top of the report if PhiloLogic has detected the number before generating the first 25 occurrences. If not, the total number of occurrences displays at the bottom of the report.) Following this general information is a list of occurrences. Each occurrence is represented by a short citation consisting of abbreviations for the author's name and the title of the work with a reference to where the term(s) in question occur within the document. (Full entries for the short citations are listed in the Results Bibliography at the bottom of the report.) References may be page numbers, acts and scenes, chapters and verses, columns, and the like. Alongside the citation are listed several levels of context (e.g., page, paragraph, or levels of hierarchy designated by h3, h2, and h1). Below the short citation is a passage of text consisting of some forty words on either side of the key word, which is highlighted. In a multi-term search, however, PhiloLogic displays as much text as needed to capture all of the search words, and all of them are highlighted. The reference listed with the short citation is linked to the text. Clicking on the page number retrieves the full page with key words still highlighted. The same is true for paragraph and the three other levels of hierarchy. Links to the previous and next page, paragraph, or level, if they exist, are provided.
Note: remember that, when searching for two or more terms within the same paragraph, the concordance report expands the amount of text displayed to include all of the search terms in the paragraph. In a proximity search, the text displayed to accommodate all the search terms may at times be several screens in length, since paragraph divisions in some documents in some databases are very far apart.

In cases where a search finds more than 25 occurrences, PhiloLogic provides the first 25 occurrences with links at the bottom of the report to the remaining occurrences of the search in sets of one hundred. One may also retrieve a full list of occurrences, which can be useful for downloading or printing but may take some time to retrieve. Note: when results number in the hundreds or thousands, the report may not be complete when one first starts to view results. In this case, one sees the message "The search is still in progress. 908 occurrences have been generated so far. (please follow the link(s) below to check on the progress)". The server continues to append results until it has completed the entire report and, by clicking on any of the sets of one hundred, one can retrieve the full report.



5.7 Line by Line (KWIC - Key Word In Context) Report (A Single Line of Text)

As in a Concordance Report, a KWIC (pronounced "quick") report indicates the number of texts searched, the search term(s) entered in a defined corpus, and the total number of occurrences found. (The number of occurrences displays at the top of the report if PhiloLogic has detected the number before generating the first 25 occurrences. If not, the total number of occurrences displays at the bottom of the report.) Following this general information is a list of occurrences. Each occurrence is represented by a short citation consisting of abbreviations for the author's name and the title of the work with a reference to where the term(s) in question occur within the document. References may be page numbers, acts and scenes, chapters and verses, columns, or the like. A KWIC Report differs from a Concordance Report in that it limits the text displayed to only a single line of text. The search term, which is highlighted, is centered in the line so that a user can quickly scan the results. At the bottom of the report one finds the Results Bibliography, which lists the full references for the short citations above. Unlike the Concordance report, a KWIC report only offers one level of linked context (typically a page reference or scene number) with search terms still highlighted and the next and previous pages (or scenes) available, if they should exist.

In cases where a search finds more than 25 occurrences, PhiloLogic provides the first 25 occurrences with links at the bottom of the report to the remaining occurrences of the search in sets of one hundred. One may also retrieve a full list of occurrences, which can be useful for downloading or printing but may take some time to retrieve. Note: when results number in the hundreds or thousands, the report may not be complete when one first starts to view results. In this case, one sees the message "The search is still in progress. [908] occurrences have been generated so far. (please follow the link(s) below to check on the progress)". The server continues to append results until it has completed the entire report and, by clicking on any of the sets of one hundred, one can retrieve the full report.

Note: when executing a "Proximity Search," especially with paragraph set as the searching parameter, it is best to avoid the KWIC format, since not all search terms are likely to appear in the single line of text displayed. The term that is located first in the paragraph is the one that is centered in the single line of text. Using the Concordance results format ensures that all terms are included in the display even if the paragraph should happen to run for several pages. One can switch from a KWIC format to a Concordance Report format at any time while viewing results and switch back. PhiloLogic takes the user to the same set of results being viewed at the time of the switch.
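To make the single-line KWIC layout concrete, the Python sketch below centers a search term in a fixed-width line. The `kwic_line` function, the line width, and the use of character offsets are assumptions made for illustration; the passage is borrowed from one of the examples in this manual.

```python
def kwic_line(text, start, end, width=80):
    """Return one line of `width` characters with text[start:end] centered."""
    keyword = text[start:end]
    side = (width - len(keyword)) // 2
    # Pad short context with spaces so the keyword always sits in the same columns.
    left = text[max(0, start - side):start].rjust(side)
    right = text[end:end + side].ljust(side)
    return f"{left}{keyword}{right}"

# The offsets locate the search term within the passage.
passage = "There is a tradition which was handed down among us"
start = passage.index("tradition")
line = kwic_line(passage, start, start + len("tradition"), width=41)
print(repr(line))
```

Because every line places the keyword in the same columns, a stack of such lines can be scanned quickly, which is the point of the KWIC format. It also shows why a second search term further along in the paragraph may fall outside the displayed line.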



5.8 Navigating Documents from Word Searches

In a Concordance report one finds several options for viewing more context around one's matched term(s). In addition to "page" and paragraph, one finds other levels of context. The parts of a document in up to three levels of hierarchy are indicated by h3, h2, and h1 and reflect the logical organization of the document from smaller parts (h3) to larger parts (h1). In other words, the top level part of a hierarchy is h1; the second level is h2; and the third level of a hierarchy is h3. What each level represents depends upon how each text was encoded and so in some cases there may not be an h3 (e.g., Volume/Book/Chapter or Act/Scene). Any part of any level may be selected by simply clicking on it. Once a user goes to a second level of context, he/she will find the search term(s) still highlighted. One may also find the next and previous sections for each level if one should wish to "flip through" the document by sections (provided that a next or previous section exists for a given level). As always, the linked table of contents for the entire work is available by clicking on the title of the work as listed in the Results Bibliography at the bottom of a report or in the reference citation, when within sections, listed at the top and bottom of any level of sections.
Please note that some databases have limited navigation because of copyright restrictions, in which case only a few pages of context are allowed and the links from the digital table of contents are disabled.

Notes: In PhiloLogic, notes never interfere with searches of the text to which they refer. Note references are linked to notes and, in recently acquired databases, text from notes is linked to page references. Note references can be found on any level of context (e.g., page, paragraph, h3, h2, or h1), but not from a first-level results screen.

Images: Most images are displayed as inline images once the user pulls up any level of context (e.g., page, paragraph, h3, h2, or h1), but not from a first-level results screen.

Sound: In databases for which there are recordings, one finds links to RealAudio files from any level of context (e.g., page, paragraph, h3, h2, or h1), but not from a first-level results screen.
