Testing the Waters: Working With CSS Data in Congressional Collections

Electronic Records Case Studies Series
Congressional Papers Roundtable, Society of American Archivists

Natalie Bond, University of Montana
natalie.bond@montana.edu

Date Published: August 2015
Case Study #: ERC004

Abstract: Senator Max Baucus deposited his papers with the Maureen and Mike Mansfield Library in April of 2014, a multi-format collection that included 1.4 TB of electronic records. In this case study, I discuss how we managed and preserved the CSS data contained within these electronic records, data that span Baucus's Senatorial career from 1979 to 2014. Specifically, I review the structure and content of the data that we received, the history of CSS/CMS use within the Senate, and our workflow for accessioning and viewing the data. Finally, I reflect on considerations for moving forward with long-term preservation, exploitation of the data, and future advocacy and collaboration opportunities for archivists and repositories.

Keywords: CSS/CMS, databases, digital preservation, electronic records, migration, Microsoft Access

Created 2015-07, CPR Electronic Records Committee

Case Study: Working with CSS data in Congressional collections
Natalie Bond, Adjunct Political Papers Archivist
Mansfield Library, University of Montana
July 2015

Introduction

In April of 2014, then-senator Max Baucus signed an agreement with the University of Montana to deposit his Congressional papers with the Mansfield Library's Archives and Special Collections (A&SC). The collection numbered approximately 959 boxes of manuscript and audiovisual material and 1.4 TB of electronic records.

Baucus began his political career in 1972 when he was elected to the Montana House of Representatives. He subsequently served two terms in the U.S. House of Representatives and was elected to the Senate in 1978, where he went on to serve six full terms. Baucus retired from the Senate in 2014 to become the U.S. Ambassador to China, a position in which he continues to serve today.

Per the University's agreement with Senator Baucus, A&SC staff received electronic records via network transfer, as well as via external hard drives, CDs, DVDs, thumb drives, and floppy disks. In this case study, I will focus on how we managed and continue to manage the data we received from Senator Baucus's constituent services systems (CSS). We received the data in two batches: the first, CSS data from 1979-1990, arrived as a .dat file; the second batch, dated 1983-2014, arrived as a .tab file. Through trial and error, A&SC staff imported this data into a Microsoft Access database; we are currently working through next steps for managing the data.[1]

[1] I came on board with the Max Baucus Papers project on December 1, 2014, so was not on hand during the transfer/accessioning processes, which were overseen by Head of Archives and Special Collections Donna McCrea and Digital Archivist Sam Meister. Sam was also very much involved in conversations with the Senate Sergeant at Arms during the electronic records export process prior to my arrival.

CMS Overview

Constituent services systems, also known as constituent management systems (CMS), are general terms that refer to the large-scale databases that manage the relationship between a Senator or Representative and his or her constituency. Within the CSS, there can be a variety of components facilitating different kinds of activity: scheduling, correspondence, casework, possibly even document management.

These systems have their roots in the Senate of the 1970s, when a pressing need for more efficient workflows and processes (particularly relating to the handling of constituent correspondence) resulted in the establishment of the Automated Indexing System (AIS), a database system developed by the Senate Computer Center to streamline constituent correspondence activities. Senate offices used AIS in conjunction with the Senate Mail File (SMF) until 1991, when the Senate Mail System was developed for use as a single database. In 1994, the Senate Computer Center stopped supporting SMF and began moving all Senate offices towards adopting proprietary CSS.[2]

Senator Baucus's office utilized the same CSS as other Congressional offices in the late 1970s and early 1980s. In the mid-1980s, the office became one of six pilot Senate offices for the implementation of servers using a Prime Computer. The server provided word processing and other administrative functionalities, and was created by Lincoln National Information Systems and their partner, LSW. In the mid-1990s, the Baucus office began using a proprietary program called Intranet Quorum, or IQ, developed by Lockheed Martin. The office used IQ until 2008, when it switched over to another system called Voice, developed by a vendor named Symplicity. All data from IQ was transferred to Voice at the time of system transition.

[2] When I began working with the project, I was fairly new to working with Congressional electronic records and had no prior experience working with CSS data. I reached out to Brittany Durell, a former Baucus staffer who facilitated the transfer of the electronic records from Baucus's Washington office, as well as Senate Archivist Karen Paul. Both were extremely helpful in breaking down the nuts and bolts of how CSS are utilized, as well as the general timeline of CSS usage within the Senate from the mid-1970s to present day. Another excellent resource, recommended by Karen Paul, is Naomi Nelson's "Taking a Byte Out of the Senate: Reconsidering the Research Use of Correspondence and Casework Files."

What We Have

A&SC received two batches of CSS data, as mentioned previously. One batch, the .dat file, arrived on a CD and contained AIS data from 1979-1990. The second batch, the .tab file, arrived on an external hard drive and contained IQ and Voice data dating from 1983-2014.[3]

These files were accessioned according to our established procedures, as part of the 1.4 TB of electronic records from the Baucus office. A&SC staff generated checksums, created disk images, secured preservation copies, performed virus scans, exported files to a server reserved for working files, extracted file system metadata, and scanned files for personally identifiable information. Accession information and media characteristics were entered into the borndigital log,[4] a Microsoft Access database containing all accession information about the Archives' born-digital collections.

Figure 1: The two CSS data files were part of larger accessions received from the Senate Sergeant at Arms and documented in A&SC's borndigital log. See Appendix for more screenshots.

[3] Dates are approximate, as most ex-staffers I have spoken with are unsure of exact transition dates. We are working to figure out specific date ranges of what we have.

[4] This is the name of the Access database for accessions. Our digital archivist maintains a no-spaces file-naming convention for command-line reasons.

The .DAT file, comprising CSS data from 1979-1990, contained thousands of correspondence records from AIS, the early Senate office correspondence system (see Figure 2). This file consisted of names, addresses, issues, and other constituent metadata entered by staffers into the CSS. It contained 32 fields of data and was operable in Excel as well as text editors and word processors (although the latter programs did not format the data properly as a table, as it was tab delimited). This was accompanied by a note from the U.S. Senate Historical Office including full descriptions of the 32 fields that we received.

Figure 2: .DAT file, 1979-1990, as received.

The second batch of CSS data, from both IQ and Voice and cumulatively dating from 1983-2014, arrived as a .tab file (see Figure 3), also accompanied by record layout notes from the U.S. Senate Historical Office.

Figure 3: .TAB file, 1983-2014, as received.

In addition to the main correspondence file, we received library data relating to the office's form letter library as well as the corresponding form letters; incoming/outgoing correspondence correlating to the data in the correspondence.tab file; and email attachments (see Figures 4-7).

Figure 4: Correspondence folder & incoming, partial. Received as part of .tab file data.
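Because the files are tab delimited and came with record layout notes, one quick sanity check before any import is to tally how many fields each line actually carries. The sketch below is illustrative only and was not part of the A&SC workflow. The file encoding and the default expected count of 32 (which the text states only for the .dat file) are assumptions.

```python
import csv
from collections import Counter


def field_count_profile(path, encoding="latin-1"):
    """Return a Counter mapping number-of-fields -> number of lines.

    The encoding is an assumption; latin-1 never fails to decode, which
    makes it a forgiving choice for a first look at legacy exports.
    """
    counts = Counter()
    with open(path, newline="", encoding=encoding) as handle:
        # QUOTE_NONE treats quote characters as ordinary data, so stray
        # quotation marks in constituent text cannot merge fields.
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            counts[len(row)] += 1
    return counts


def rows_off_layout(path, expected_fields=32, encoding="latin-1"):
    """Count lines whose field count differs from the documented layout."""
    profile = field_count_profile(path, encoding)
    return sum(n for width, n in profile.items() if width != expected_fields)
```

A profile with a single entry would suggest that every line matches the layout; several entries would point to embedded tabs, blank lines, or truncated records worth inspecting before import.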

Figure 5: Incoming correspondence received as part of .tab file data.

Figure 6: Incoming correspondence in .txt format, received as part of .tab file data.

Incoming correspondence came in a few different formats, mostly .txt, but also including .tiff, .pdf, and .html.

Figure 7: Outgoing letter from the CSS library file, in .txt format, received as part of .tab file data.

We have the files: Now what?

Both batches of .tab correspondence data, "Archive" and "correspondence", were unable to open fully in a Microsoft Excel spreadsheet, as they exceeded the maximum number of rows Excel is able to display (Excel 2007, 2010, and 2013 support 1,048,576 rows). I experimented with opening correspondence in a text editor and then dividing that text into smaller, more manageable batches of data, but this was extremely time-consuming and carried a significant risk of losing data in the process due to the copying and pasting of large quantities of data.

Microsoft Access turned out to be the solution. Sam and I saved the .dat and .tab files in .txt format, and were then able to fully import both converted text files into an Access 2013 database, along with library data correlating to the 1983-2014 CSS data[5] (see Figures 8-9).

[5] Microsoft Office Support. "Import data into an Access database." https://support.office.com/en-IE/article/Import-data-into-an-Access-database-782703aa-6b21-4458-9429-480eaf0c71d6. Accessed July 28, 2015.
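The row limit described above arises when a whole file is loaded into a spreadsheet grid; streaming the file line by line into a database sidesteps both the limit and the copy-and-paste step. What follows is a minimal sketch of that idea using Python's built-in sqlite3 module. It is not the Access import described above, and the table name, field names, and encoding are hypothetical. Padding short rows with NULLs and rejecting over-long rows are assumptions about how ragged lines should be handled.

```python
import csv
import sqlite3


def load_tab_file(conn, path, table, field_names,
                  encoding="latin-1", batch_size=10_000):
    """Stream a tab-delimited text file into a SQLite table.

    Returns the number of rows loaded. Field names are assumed not to
    contain double-quote characters, since they are quoted as identifiers.
    """
    columns = ", ".join(f'"{name}" TEXT' for name in field_names)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
    placeholders = ", ".join("?" for _ in field_names)
    insert_sql = f'INSERT INTO "{table}" VALUES ({placeholders})'

    width = len(field_names)
    total = 0
    batch = []
    with open(path, newline="", encoding=encoding) as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            if not row:
                continue  # skip blank lines
            if len(row) > width:
                raise ValueError(
                    f"line {reader.line_num}: {len(row)} fields, expected {width}"
                )
            # Pad short rows with None so they are stored as NULL.
            batch.append(row + [None] * (width - len(row)))
            if len(batch) >= batch_size:
                conn.executemany(insert_sql, batch)
                total += len(batch)
                batch.clear()
    if batch:
        conn.executemany(insert_sql, batch)
        total += len(batch)
    conn.commit()
    return total
```

Comparing the returned count with a line count of the source file gives a simple check that nothing was dropped, which is the risk the manual batching approach carried.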

Figure 8: CSS data tables in Microsoft Access (data redacted).

Figure 9: CSS data tables in Microsoft Access (data redacted).

It was a major success to be able to open both files in full. I went ahead and filled in the field names for each column, so we have an organized, searchable database.

Looking Forward

This is where we are currently, and we are discussing how to move forward, as there are several things to consider with regard to future plans for the CSS data. Primary on our minds is access: What will that look like for researchers? The collection currently has a 30-year blanket restriction in place, in addition to further potential restrictions due to sensitive information inherent in the data, so developing a good-looking front end was deemed a low priority for the time being.

Long-term preservation of the original data is another high priority; in addition to the source .tab and .dat files, preservation now encompasses the data currently housed within the Access database. Given the long period of dormancy in the use of the collection, we will need to start considering migration policies and, subsequently, open-source solutions. Incorporating the correlating form letters and incoming/outgoing correspondence poses a similar problem. Building a robust database capable of linking the above data to these text and PDF files requires specialized knowledge and skills that we will likely need to outsource; there is much future potential for more capable and streamlined tools for manipulating CSS data, and having the data contained within open-source database software will more easily facilitate migration.

Users can, for now, identify file names within a field named In Correspondence Document Name(s), and subsequently find that particular incoming letter by searching for that file name within the file directory (see Figures 10-11). The same goes for outgoing correspondence, which can be identified with a field named Out Correspondence ID.

Figure 10: The highlighted file name is associated with a letter stored in the Incoming Correspondence file directory.
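The manual lookup described above (read a file name from the In Correspondence Document Name(s) field, then search the file directory for it) could in principle be scripted. A minimal sketch follows. It assumes the field may hold several names separated by semicolons and that matching should ignore case; neither assumption is confirmed by the record layout notes, and any directory path passed in is hypothetical.

```python
import os


def build_file_index(root):
    """Walk a directory tree once and map lower-cased file names to paths."""
    index = {}
    for folder, _subdirs, files in os.walk(root):
        for name in files:
            index.setdefault(name.lower(), []).append(os.path.join(folder, name))
    return index


def find_letters(field_value, index, separator=";"):
    """Resolve the file name(s) in one field value to paths on disk.

    Returns a dict of file name -> list of matching paths; an empty list
    means the name was not found under the indexed directory.
    """
    found = {}
    for raw in field_value.split(separator):
        name = raw.strip()
        if name:
            found[name] = index.get(name.lower(), [])
    return found
```

Building the index once and reusing it avoids re-scanning the correspondence directory for every record, and names that resolve to an empty list flag records whose letters are missing from the transfer.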

Figure 11: The file name identified in Figure 10 above can be searched and the letter identified via Windows Explorer.

Finally, momentum has already begun to gather in our field for requesting the full range of fields from proprietary CSS entities, which would result in vastly more robust caches of data. I have not reached out to Lockheed Martin or Symplicity, the respective proprietors of IQ and Voice, to request this data, but I imagine they would ask for a significant fee to export these additional fields, as Adriane Hanson experienced. I am hopeful, however, that current and future lobbying will gain momentum such that Congressional offices will begin to advocate for the full export of CSS data as standard practice.

The conversation continues around future preservation and manipulation of the CSS data/Access database here at the Mansfield Library's A&SC. For now, I am comfortable having the totality of the CSS metadata temporarily housed in Microsoft Access, and we will soon begin exploring the construction of a MySQL-based database for long-term storage as well as continuing our other work with electronic records. I would love to see conversations around the research potential of combining multiple Congressional CSS datasets; Middle Tennessee State University's Gore Center has already initiated the development of software enabling the ingest of CSS data from Intranet Quorum, in addition to proposing an IQ dataset consortium.[6] Finally, I hope to see repositories continue to lobby for the release of the full range of data fields from proprietary CSS producers, ideally before the final transfer from Congressional office to repository. If anyone has had success in receiving more than 32 fields, I would love to hear about it.

Please do not hesitate to contact me with questions, suggestions, or for a general chat about working with Congressional electronic records.

Natalie Bond
Adjunct Political Papers Archivist
Mansfield Library, University of Montana
natalie.bond@mso.umt.edu
(406) 243-2053

[6] Williams, Jim. "Recreating the Intranet Quorum Interface for Archival Retrieval and Research." August 2014. [PowerPoint slides]

APPENDIX

This appendix consists of additional screenshots of A&SC's management of CSS data.

Figure 1: In the borndigital log, staff enter metadata and document all relevant accessioning processes. This is the record for the external hard drive on which the .dat file arrived.

Figure 2: Same as Figure 1, but for the CD on which the .tab file arrived.

Figure 3: Incoming .TIFF correspondence, received as part of the .tab data file. Incoming correspondence came in a few different formats, mostly .txt, but also including .tiff, .pdf, and .html.

Figure 4: What the .tab file data looks like when opened in WordPad. Note that the file did not open fully in WordPad; this is just how the data appears in a word processor.