ENCODING A "LEGACY" FINDING AID IN MICROSOFT WORD
Forgetting, for the moment, intellectual reordering, there are two major practical stumbling blocks in retrospective tagging: special characters and layout. EAD files are more or less long streams of ASCII characters. They can contain hard returns between some elements to appear more readable, but any effects, styles, or layout within the output are strictly a result of the tags. Although the EAD file itself has no particular page layout, it is integral to efficient tagging that, in the legacy file, one lines up the elements of each unit of description in a standard format as an intermediate step.
Since EAD files are .txt files consisting of plain characters, and extant finding aids often incorporate word-processor-specific formatting, the first step in conversion is to look for formatting that is important to the understanding of the finding aid and replace it with EAD tags. Examples would be italicized titles or subscript and superscript scientific annotations. Everything in this section can be applied to front matter as well as container lists. As Microsoft Word is by far the most popular word processing application, it is used in these examples. Readers who use other word processing software should consult their documentation for analogous special characters and key sequences. The principles remain the same.
From Special Characters to Tags in Your Word Processor
Italics seem to be the most common special formatting, and so will illustrate this example, but the principles are the same for any special formatting. Make sure that only the words you want to remain italicized are italicized, and do not include any extraneous spaces or punctuation [1]. Call up the "Find & Replace" box by simultaneously pressing the [Ctrl] key and the [H] key or by pulling down the "Replace" option from the "Edit" menu. In "Find what:" choose "Format: Italic" by simultaneously pressing the [Ctrl] key and the [I] key. In "Replace with:" type <emph render="italic">^&</emph> OR <title render="italic">^&</title>, depending on the requirements of the finding aid, and choose "No Formatting" or, holding down the [Ctrl] key, press [I] once or twice until "Format: Not Italic" appears.
The ^& inserts the "found" text between the tags whenever italicized text (in this case) occurs. The same technique can be used for any type of formatting that needs to be tagged.
[Ctrl] and [B] is equivalent to EAD's bold
[Ctrl] and [I] is equivalent to EAD's italic
[Ctrl] and [U] is equivalent to EAD's underline
[Ctrl] and [B] then [I] is equivalent to EAD's bolditalic
[Ctrl] and [B] then [U] is equivalent to EAD's boldunderline
For tagging some formats, such as subscript and superscript, it is necessary to open the "Find & Replace" box and manually choose the correct format for the "Find what:" box. If the [Format] button is not visible, click the [More] button. Click Format, and select "Font..." In the "Find Font" menu that pops up, choose the "Effect" you would like to tag, e.g., Superscript.
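The wrap-the-match idea behind ^& can also be sketched outside Word. The Python below is only an illustration: it assumes the text has already been extracted into a list of (text, is_italic) runs, and both that run model and the name wrap_italic_runs are hypothetical, not part of Word or of any EAD tool.

```python
def wrap_italic_runs(runs, tag="emph"):
    """Join (text, is_italic) runs, wrapping italic runs in an EAD tag.

    `runs` is a hypothetical stand-in for formatted text pulled out of a
    word-processor file; `tag` would be "emph" or "title".
    """
    pieces = []
    for text, is_italic in runs:
        # Italic runs that hold only spaces or returns are passed through
        # untagged, which mirrors the cleanup advice in note [1].
        if is_italic and text.strip():
            pieces.append(f'<{tag} render="italic">{text}</{tag}>')
        else:
            pieces.append(text)
    return "".join(pieces)


runs = [("See ", False), ("Moby-Dick", True), (", 1851", False)]
print(wrap_italic_runs(runs))
```

Passing tag="title" produces the <title render="italic"> form instead.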
Lining It Up
The next step in "automated" tagging of a container list is getting the list into a consistent format. Your goal is to have all of the information for each smallest unit (box, folder, or item, depending on the level to which the collection is processed) on one line, with a tab dividing the elements (box, folder, title, date, notes, etc.) and a hard return at the end of the line. This may create "lines" so long that they wrap around and take up several rows of text. This is irrelevant as long as the paragraph symbol occurs, when visible, only at the end of the final element each time. Take series divisions into account at this point. If the container list is divided into multiple series but the format is the same from series to series, it may be easiest to temporarily remove the series identification, tag the data as one, and add the series headers back later. If each series has a different format, e.g., Box, Folder, Contents, Notes for one series and Box, Contents, Span Date for another, then you will want to work with each series separately. In the latter case you will need to save each series as a separate file.
Save your document as a text file and show nonprinting characters (by clicking the paragraph symbol, in Word) in order to line up the elements as well as possible. You may not have a box number on each line. If this is the case, select the range of folders for each box and use this search and replace: Find: ^p, Replace with: ^p1 [if box number is 1]. Be sure to use "Search: Down" instead of "Search: All" if you are working with one block of text at a time. In the same way, series and subseries names can be added to the list. If the information is to be migrated to a database later, these elements will allow more accurate sorting. If, for whatever reason, the box and folder numbering sequence does not flow sequentially from series to series, this step is vital.
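Once the list is saved as a text file, the box-number step can also be sketched in Python. This is a minimal illustration, assuming you have already split out the lines belonging to one box; the function name add_box_number is hypothetical, and the tab placed after the box number is an assumption made to match the tab-delimited layout described above.

```python
def add_box_number(lines, box):
    """Prefix every folder line in one box's block with the box number.

    Assumption: a tab separates the new box number from the rest of the
    line, in keeping with the one-tab-per-element layout.
    """
    return [f"{box}\t{line}" for line in lines]


block = ["1\tReport cards, 1975-1987", "2\tDetention notices, 1983, 1985"]
for line in add_box_number(block, 1):
    print(line)
```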
You should strive for one and only one [Tab] between elements. Remove multiple [Tab]s like this: find: ^t^t, replace with: ^t. Repeat this operation until you get 0 replacements.
If the container list has been formatted into columns using the space key rather than the [Tab] key, you can count the number of spaces between each "field" and replace them with a [Tab]. If the number of spaces varies, find the minimum number of spaces used, replace that number with a [Tab] through Search and Replace, and then do another Search and Replace for ^t followed by one space replacing that with just ^t. Do this until you have 0 replacements.
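The tab and space cleanup described above can be approximated in one pass on the saved text file. The following Python is a sketch under stated assumptions: columns are separated by at least min_spaces spaces, single spaces occur only inside a field, and the function name normalize_separators is hypothetical.

```python
import re


def normalize_separators(line, min_spaces=2):
    """Reduce the gaps between fields to exactly one tab each."""
    # Runs of at least `min_spaces` spaces are treated as column gaps.
    line = re.sub(r" {%d,}" % min_spaces, "\t", line)
    # Remove stray spaces next to a tab and collapse repeated tabs.
    line = re.sub(r" *\t[ \t]*", "\t", line)
    return line


print(repr(normalize_separators("1   3     Report cards, 1975-1987")))
```

Because the regular expressions consume a whole run at once, there is no need to repeat the operation until it reports 0 replacements, as there is in Word.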
Although it is not required within the EAD data type, it is possible to tag dates quickly if they are in a regular format. For example, if dates are always in YYYY format, one year per folder, and no other digits appear four in a row: find: ^#^#^#^#, replace with: <unitdate>^&</unitdate>. An advanced, multiple-step version of this operation can tag multiple date styles.
Suppose you have a container list with several hundred folders and four styles of dating them:
Report cards, 1975-1987
Assorted punk ephemera, ca. 1982-1987
Short story ("Plague of 1976"), 1983
Detention notices, 1983, 1985
Since all lines end in a unit date, find: ^p and replace with: </unitdate>^p to place the end tag. (This ensures that only the actual unit dates that end each entry get tagged, not stray dates such as in "Plague of 1976," above.) Then, starting with the longest format:
Find: ca.[space]^#^#^#^#-^#^#^#^#</unitdate>^p Replace with: <unitdate>^&
Find: ,[space]^#^#^#^#, ^#^#^#^#</unitdate>^p Replace with: <unitdate>^&
Find: ,[space]^#^#^#^#-^#^#^#^#</unitdate>^p Replace with: <unitdate>^&
Find: ,[space]^#^#^#^#</unitdate> Replace with: <unitdate>^&
NB: the comma and [space] beginning these examples are imperative; without them, previously tagged date formats would have more tags added to them. Even so, with the last date style you will have to approve the changes one-by-one, to avoid adding extra tags to the "1983, 1985"-style format.
Finally, to move the commas and spaces that may have been stuck inside the <unitdate> tags back where they belong:
Find: <unitdate>,[space]
Replace with: ,[space]<unitdate>
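For comparison, the same four date styles can be tagged on the saved text file with a single regular expression anchored to the end of each line. This Python is a sketch, not a replacement for checking the results: it assumes every line ends in one of the four styles listed above, and the names DATE_AT_END and tag_unitdate are hypothetical.

```python
import re

# The four styles from the example list, tried only at the end of a line:
# "ca. 1982-1987", "1975-1987", "1983, 1985", and "1983".
DATE_AT_END = re.compile(r"(ca\. \d{4}-\d{4}|\d{4}-\d{4}|\d{4}, \d{4}|\d{4})$")


def tag_unitdate(line):
    """Wrap the date that ends the line in <unitdate> tags."""
    return DATE_AT_END.sub(r"<unitdate>\1</unitdate>", line, count=1)


for entry in [
    "Report cards, 1975-1987",
    "Assorted punk ephemera, ca. 1982-1987",
    'Short story ("Plague of 1976"), 1983',
    "Detention notices, 1983, 1985",
]:
    print(tag_unitdate(entry))
```

Because the pattern must reach the end of the line, a stray year such as the one in "Plague of 1976" is left alone, and the leading comma and space never end up inside the tags.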
Now that you have a lined-up container list free of special font styles, the list can be marked up. There are two quick ways to do this: with a word processing macro and by importing the list into a database. The latter option has the added benefit of creating a sortable table, and is the option preferred by the author, but a macro, once set up, is quicker.
Tagging Container Lists with Word Processing Software
The principle behind tagging a container list with word processing software is the repeating macro. If a list is standardized, then actions performed once can be automated. In Word, a macro is recorded by pulling down the "Tools" menu and selecting "Macros-->Record New Macro..."
For example, this recordable sequence, begun with the cursor at the beginning of the container list, would build tags around a line with the format [Box #][Tab][Folder#][Tab][Folder Contents][Paragraph Mark]:
1. Find: ^t
2. Press "Cancel"
3. Hit the [Delete] key.
4. Type: </container></dentry><dentry spanname="c4-6"><container type="folder">
5. Find: ^t
6. Press "Cancel"
7. Hit the [Delete] key.
8. Type: </container></dentry><dentry spanname="c7-20"><unittitle>
9. Find: ^p
10. Press "Cancel"
11. Hit the [Delete] key.
12. Type: </unittitle></dentry></drow></c02>
13. Hit [Enter]
14. Type: <c02><drow><dentry spanname="c1-3"><container type="box">
Stop recording, then repeat this macro until all of the list is tagged, and move the final <c02><drow><dentry spanname="c1-3"><container type="box"> to the top of the list. Note that <c02> will not always be the appropriate component level.
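The result of the recorded macro can also be produced from the saved text file. The Python below is a minimal sketch, assuming each line has exactly the [Box #][Tab][Folder#][Tab][Folder Contents] layout; the function names are hypothetical, and the tags and spanname values are copied from the macro steps above.

```python
def tag_container_line(line, level="c02"):
    """Wrap one box/folder/title line in the markup typed by the macro.

    Raises ValueError if the line has fewer than three tab-separated fields.
    """
    box, folder, title = line.rstrip("\n").split("\t", 2)
    return (
        f'<{level}><drow><dentry spanname="c1-3"><container type="box">{box}'
        f'</container></dentry><dentry spanname="c4-6">'
        f'<container type="folder">{folder}'
        f'</container></dentry><dentry spanname="c7-20"><unittitle>{title}'
        f'</unittitle></dentry></drow></{level}>'
    )


def tag_container_list(text, level="c02"):
    """Tag every non-blank line of a lined-up container list."""
    return "\n".join(
        tag_container_line(line, level)
        for line in text.splitlines()
        if line.strip()
    )


print(tag_container_list("1\t3\tReport cards, 1975-1987\n1\t4\tDetention notices, 1983, 1985\n"))
```

Each line is tagged as a complete unit, so no leftover opening tag needs to be moved to the top of the list, and the level argument can be changed when <c02> is not the appropriate component level.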
If your container list is long, you can specify how many times you would like the macro to repeat by adding two lines in the Visual Basic Editor. From the Tools menu, open "Macro-->Macros..." and double-click the macro you have created. Within the Sub statement, add a "For...Next" loop where x equals 1 to the number of repetitions of the macro you would like. For a macro named "TagList," you would add the "For..." and "Next x" lines thus:
Sub TagList()
    For x = 1 To [number of repetitions]
        [all other statements]
    Next x
End Sub
[1] If the document has a great deal of italics, it may be a good idea to search for inadvertently italicized hard returns (^p in Word) and replace them with non-italicized versions.