From 2015 to 2018, North Carolina, Kansas, and Utah collaborated to develop “Transforming Online Mail with Embedded Semantics” (TOMES) with the support of a grant from the National Historical Publications and Records Commission (NHPRC). TOMES “seeks to identify email accounts of public officials with enduring value in order to capture, preserve, and provide access to important government records.” As TOMES concluded in December 2018, the North Carolina State Archives and the School of Information and Library Science at UNC Chapel Hill received a grant from the Mellon Foundation to start “Review, Appraisal, and Triage of Mail” (RATOM).
The following interview transcript discusses both projects. Taylor de Klerk (TdK), Acquisitions and Appraisal Section intern, conducted this interview on February 7, 2019, with Camille Tyndall Watson (CTW), Digital Services Section Head for the North Carolina State Archives.
TdK: Please explain your position at the North Carolina State Archives.
CTW: The State Archives is made up of different sections and we all kind of have a different focus. In my position as Digital Services Section Head I oversee a team that covers two different areas of the State Archives’ work. The first is the Digital Access branch, and that’s headed up by our amazing branch manager Ashley Yandle. Her team handles our scanning projects, our digital collections, our website, our social media, and our online catalog–pretty much anything that’s forward facing as an access point that’s digital. The rest of my staff, and a lot of my time, is spent working on digital preservation and our digital repository. That includes work such as day-to-day maintenance of the repository, developing policies and best practices for electronic records management, both internally and as guidance for state agencies, and consulting both in-house and external to the state archives with folks about electronic records management. We work really closely with our records analysis unit within the government records section to discuss transfers of records and all that. We also work with state agencies to help them figure out their electronic records management programs. We’ve even done consultations with local and municipal governments to figure out the best ways for them to manage their electronic records.
TdK: How did you come to have that position?
CTW: It’s been kind of a roundabout circle. When I was in graduate school at UNC getting my MLS, I wanted to learn everything I could. I kind of wanted to have a very broad focus, one, because I knew I was going to be applying for jobs, but two, just to really have an understanding of what the entire field looks like. One of the jobs that I was lucky enough to get was a graduate assistantship with the Southern Folklife Collection at UNC, and part of that was doing some audio digitization and editing of audio digital files. That was my first job out of grad school, working on a grant project with them. From there, I had a little bit of a lull after that grant ended before I found my next position, and so I decided to expand my understanding of digital materials. I took some online classes and some community college classes in very basic coding. I’m certainly not a developer by any means but at least it helped me get a better understanding of what was going on there. And then I was lucky enough to get a job at the North Carolina Railroad Company, which is also in Raleigh, as an archives assistant doing some reprocessing of their collections. They had just gotten a new content management system, so doing a lot of entry into that and helping maintain that system. Then a position for a digital archivist job came up at the State Archives and it just seemed like a really great fit. It was going to allow me to work with a lot of different types of materials and different types of people and stay in a digital space which I had kind of realized was really something I was interested in because it was always changing. It always gives us archivists a chance to be kind of entrepreneurial in a way, because it’s constantly developing. So that’s how I ended up at the State Archives and then from there after a few years my current position opened up and here I am.
TdK: I know you’re working on two projects, TOMES and RATOM. The TOMES one has been going for a couple of years, right, and the RATOM one was just recently funded? Could you explain the TOMES project?
CTW: TOMES was a project that started in 2015 and we actually recently wrapped up the first round of funding through NHPRC in December of 2018. Essentially what we were setting up to do with TOMES was twofold. First, we wanted to figure out a way to handle the appraisal of email within state government, and what we decided on was basically an adapted capstone approach using NARA’s capstone guidance and figured out how to make that work for state government. We also hoped to, and I think we’re getting fairly close to, figure out a way to at least quasi-automate that process by working with our Department of Information Technology (DIT) and our Office of State Human Resources (OSHR) to identify positions by position number and then to run reports on who has left capstone positions and put legal holds on people who enter capstone positions. That’s still kind of in the works but we’re getting closer so I think we’ll have something at least approximating that goal in the coming year. So that was the first half and we worked very closely with DIT and OSHR but also our government records section, especially our records analysts, to develop appraisal criteria for that working with state agencies to help figure out what those capstone positions are and nail that down. The second half of TOMES was a development project which was, basically we created a tool that is a microservices tool. So it can take you from having a PST all the way to having a tagged XML file. It goes from PST to a MIME format email, into an EAXS file, which acts as an XML schema for email that the state developed as part of a different grant. And then we built out an NLP tagging tool that has dictionaries that are specific to state government format as well as PII and sensitive information tagging. So that’s what would get spit out at the very end is an XML file that has the embedded tags in it to improve the time it might take to process an email account.
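The last stage described above, dictionary-based tagging that ends in an XML file with embedded tags, can be illustrated with a minimal sketch. This is not the TOMES tool or the EAXS schema: the element names (`TaggedContent`, `Tag`), the labels, and the two patterns are hypothetical stand-ins chosen only to show how matched spans might be wrapped in tags while the message text is left intact.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical tagging dictionary (label -> pattern). A real dictionary
# would be far larger and tailored to state government terminology.
PATTERNS = {
    "PII.ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "GOV.agency": re.compile(r"\bDepartment of Information Technology\b"),
}


def tag_text(text):
    """Wrap every dictionary match in a <Tag> element inside <TaggedContent>."""
    matches = []
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            matches.append((m.start(), m.end(), label))
    matches.sort()

    root = ET.Element("TaggedContent")  # hypothetical element name
    pos = 0       # index of the first character not yet emitted
    last = None   # most recently created <Tag>, if any
    for start, end, label in matches:
        if start < pos:
            continue  # skip a match that overlaps one already tagged
        untagged = text[pos:start]
        if last is None:
            root.text = untagged   # text before the first tag
        else:
            last.tail = untagged   # text between two tags
        last = ET.SubElement(root, "Tag", {"label": label})
        last.text = text[start:end]
        pos = end

    # Emit whatever follows the final tag (or the whole text if no matches).
    if last is None:
        root.text = text[pos:]
    else:
        last.tail = text[pos:]
    return root


if __name__ == "__main__":
    body = "Send 123-45-6789 to the Department of Information Technology today."
    print(ET.tostring(tag_text(body), encoding="unicode"))
```

Because the tags are embedded around the original characters rather than replacing them, the untagged message can be recovered by concatenating the element's text content.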
TdK: So you identify the public officials who fit into that capstone description, and you acquire their emails, and then you appraise the contents of those emails?
CTW: Well, so the major appraisal is happening at the beginning of figuring out what positions are capstone. We’re looking at it from a functional perspective, what is the function of this position in government, as opposed to the individual person, because what we found when we were very very early on testing and taking a look at preliminary capstone lists that we were getting from agencies is that we couldn’t really look at the account to tell if it was archival or not right off the bat because people use email in very different ways. If we did it by function in state government, that at least helped us get a list of what email accounts we need to take a look at, or what email accounts we needed to bring in to process and have in our collections. The development part of the tool will, in our next stage of development which is what RATOM is, will allow us to do more appraisal of the email itself and do appraisal of the messages.
TdK: So as a result of TOMES you’re just taking the lump sum of those emails and then you’re going to appraise them more minutely as the RATOM project develops? It’s very complicated.
CTW: Yes, it is. So RATOM is still part of the TOMES team, the collaboration between the TOMES team here at the State Archives of North Carolina and the BitCurator team over at UNC. What we found in the first part of TOMES is that our usual models of processing electronic records, or at least what we were able to find as published guidance, wasn’t particularly useful for processing emails. At least for the size of the accounts that we’re bringing in because it’s just such a large bulk of materials. So we worked with our records description unit over at government records and what they were telling us is that when they get big transfers they can’t necessarily process a collection all at once, it’s just not realistic, so they process on demand. What we’re hoping to build out functionality with in TOMES is the ability to do something similar with email accounts, we call it iterative processing, and so as emails are requested they can be searched and processed and we can do appraisal of record or non-record, restricted, open, redacted, all of that as part of the iterative processing process.
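The idea of iterative processing described above, where messages are appraised only when a request touches them, can be sketched as follows. This is an illustration under stated assumptions, not RATOM's design: the `Message` fields, the `Access` values, the `handle_request` function, and the toy appraisal rule are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional, Tuple


class Access(Enum):
    OPEN = "open"
    RESTRICTED = "restricted"
    REDACTED = "redacted"


@dataclass
class Message:
    subject: str
    body: str
    is_record: Optional[bool] = None   # None means not yet appraised
    access: Optional[Access] = None


Appraiser = Callable[[Message], Tuple[bool, Optional[Access]]]


def handle_request(messages: List[Message], term: str,
                   appraise: Appraiser) -> List[Message]:
    """Appraise only the messages matching a request; return releasable ones."""
    needle = term.lower()
    hits = [m for m in messages
            if needle in (m.subject + " " + m.body).lower()]
    for m in hits:
        if m.is_record is None:          # appraise each message at most once
            m.is_record, m.access = appraise(m)
    return [m for m in hits
            if m.is_record and m.access is not Access.RESTRICTED]


def toy_appraise(m: Message) -> Tuple[bool, Optional[Access]]:
    """Stand-in for an archivist's decision; a real review is a human judgment."""
    if "donuts" in m.body.lower():
        return False, None               # non-record
    return True, Access.OPEN


if __name__ == "__main__":
    account = [
        Message("Budget draft", "Attached is the FY budget proposal."),
        Message("Break room", "Hey guys, there's donuts in the break room."),
    ]
    released = handle_request(account, "budget", toy_appraise)
    print([m.subject for m in released])
```

Messages that no request has touched keep `is_record = None`, which is the point of the approach: the account is available for search without requiring that every message be reviewed up front.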
TdK: Once you’ve identified those capstone positions, do you require them to do anything to their emails before they’re transferred?
CTW: No, we don’t. What happens is once they’re identified as capstone, they don’t have to do anything else. It’s all us behind-the-scenes working with our human resources office and DIT to place legal holds on their email accounts so that they can be transferred once they leave that capstone position. If people work in an email account outside of DIT, they might have to do more to work with their in-house IT to do similar work, but there’s nothing special they have to do with the management of their inbox.
TdK: What kind of reactions do you get from people as you acquire their emails? Do you ever get any pushback?
CTW: Oh sure, email kind of occupies a funny space in how we do our business, right? It’s somewhere between writing letters and correspondence and voicemail. For every important email I get, I also get three “hey guys, there’s donuts in the break room.” So depending on the agency and what they do kind of depends on what the buy-in is. For agencies that do a lot of work that is restricted or confidential, they’ve been pretty concerned about us taking their email because they don’t want it to just be widely released without review, which is completely understandable and we would never do that. Some of that work has been outreach, educating about how we take care of records, how we go about providing access, what we’re doing to review and process it, to help them feel more comfortable with the concept of transferring to us. Other agencies question whether or not email is public record, which again just involves a lot of outreach and conversations and education. But at the same time, we never transfer records without the permission of the agency. It’s not like somebody says, “this position’s a capstone position” and we take it without asking. The way we work is that at the point somebody would leave a capstone position, we would notify the agency and say, “hey, we see that this person left this position, this position is designated as capstone, it’s eligible for transfer.” Then we would go through the process of helping them with that transfer, working with DIT and the agency to get all of our paperwork signed so it’s a properly documented transfer and if they opt not to transfer at that time, we’re not going to force their hand.
TdK: I’m guessing that it’s part of a retention policy that it will get transferred eventually though.
CTW: Yes, capstone positions are written into our retention schedules. Within the past year we actually redid our schedules to be a functional schedule, so capstone is written into those schedules now.
TdK: I know RATOM is just starting, but what kinds of attributes will likely factor into that email appraisal? Earlier you mentioned the “donuts in the break room” emails. Are you going to keep those, or will they be deaccessioned?
CTW: So the funny thing about emails is that because we’re exporting an entire PST it’s not made up of individual message files, it’s one large file. Some of that is going to depend on the development roadmap, if it’s not deaccessioned or actually removed from the account, it will at least be hidden. It will be classified as a non-record and just won’t show up in an ideal world. How that actually works in practice will depend on our developers and the solutions we find.
TdK: So they’ll still be preserved?
CTW: Probably. Right now that’s how it looks but if we could find a way to remove it in a way that doesn’t break the file, I think that would be great because it would save some storage space. At the same time, within government records we also have this issue, and this is kind of where iterative processing comes in and where we’re investigating what iterative processing does with things like OAIS where you’re supposed to get something into an archive and it doesn’t change, but we’re getting emails and then we’re putting them through several migrations and tagging them and doing this iterative processing process, and there’s no way I could even say from PST to MIME format it’s the same. We’re already having to keep the original PST in these migrations to vouch for the authenticity of these records.
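One common way to support the kind of authenticity claim described above is to record a checksum of the original PST at transfer and re-check it later. The sketch below shows that general fixity technique using SHA-256; the interview does not say which method the State Archives uses, and the function names and file path are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks so that
    a very large PST does not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path, recorded_digest):
    """Return True if the file still matches the digest recorded at transfer."""
    return sha256_of_file(path) == recorded_digest.lower()


if __name__ == "__main__":
    original = Path("account.pst")  # hypothetical path to the transferred PST
    if original.exists():
        recorded = sha256_of_file(original)   # store alongside the accession
        print(recorded, verify_fixity(original, recorded))
```

A matching digest shows only that the original file is unchanged. It says nothing about whether a derived MIME or XML version is equivalent to it, which is why the original is kept alongside the migrated copies.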