Introducing Houghton Library’s New Digital Archivist

Front facade of Houghton Library

By Monique Lassere, Digital Archivist, Houghton Library

Hi, everyone. My name is Monique, and I am Houghton Library’s new Digital Archivist! I started working at Houghton in May 2020. My position sits within the Manuscript Section and centers on the born-digital collections Houghton acquires, whether on physical media such as hard drives and floppy disks or as networked content such as websites. Although I haven’t yet visited Houghton’s physical space because of the pandemic, there has been no shortage of work to dive into while working remotely.

Over the last few months, I have spent much of my time digging into the born-digital archival materials we hold in cloud storage. I’m able to do this because of earlier work by Accessioning Archivist Melanie Wisner and former Administrative Fellow and Project Archivist Magee Lawhorn to get born-digital work off the ground at Houghton. Before I arrived, Melanie and Magee captured files from born-digital removable media, such as the 3.5” floppy disks in the John Updike, John Ashbery, and Jerry Schatzberg papers. As a result, I can begin working with these files in cloud storage to determine how best to provide researchers with access to them.

These born-digital objects are still being processed. Before they can be made available, we have to run them through workflows that identify sensitive information and extract the actual files (such as .DOC files) from the digital objects stored in the cloud. While they aren’t ready for public access yet, I am currently assessing the materials to better understand what these files need in order to be properly accessible.

Many of the files are in older text-based formats. Some can be opened in current versions of Microsoft Word, but many others require older, obsolete software; without it, they can only be opened in plain-text editors. Opened that way, these files display a great deal of data the reader doesn’t need, because current software can’t render the file as it was originally intended.

Below is a screenshot of a word-processing file from the John Updike papers: the essay “My Cartooning,” written for Hogan’s Alley. The file is in an older, text-based format with the .SAM file extension.

Lines of code in white, on a black background, showing many rows of number strings before the text content.
Screenshot of John Updike’s “My Cartoonage!” in plain text editor. Image courtesy of Monique Lassere. 

While Updike’s text is readable, roughly 700 lines of encoding data appear before his actual writing begins:

Lines of code in white, on a black background.
Screenshot detail of John Updike’s “My Cartoonage!” in plain text editor. Image courtesy of Monique Lassere.

Formatting information, such as font and font size, is also visible and interwoven with the author’s text. This is not ideal, and it raises questions for me: Is it possible to migrate this file to a more readable format that doesn’t retain so much encoding and formatting data? What format would let users see and read the original content?

If this seems complicated, it is! That complexity is part of why many of my other projects are aimed at helping Houghton provide access to these born-digital files. This work includes training to deposit files into Harvard Library’s Digital Repository Service (DRS), which will ensure the long-term preservation of, and access to, our born-digital archives; creating policy documents that will guide our stewardship decisions and make them public; and organizing across repositories with other Digital Archivists and digital preservation colleagues on campus to share workflows, strategize, and support one another on common issues, such as providing access to outdated files or software.

All of this work supports Houghton’s overarching goal of providing users and researchers around the world with access to letters, manuscripts, and other born-digital archival content. We recently joined the Software Preservation Network, an organization dedicated to preserving software and helping researchers and cultural heritage institutions build the resources to do so. As part of our membership, we have the chance to participate in the 2021 Hosted Emulation Services Pilot, where we will work toward providing virtual access to older files, such as the .SAM files in the Ashbery papers, in their native computing environments. As we move further into our born-digital preservation work, watch this space for updates and developments!