- Created by Emily K. Porter, last modified on Jan 24, 2023
Emory’s Cor preservation repository provides preservation support for Emory’s unique and rare digital assets. The repository provides the following preservation services:
- Fixity checking
- Extraction of technical and administrative metadata
- Virus checking
- File format identification and validation
- Replication of files
Works ingested to the repository also receive preservation events and workflow metadata to document major lifecycle activities such as Accessioning, Ingest, Decommissioning, and Deletion.
Content files submitted to the Cor repository are first transferred to a pre-ingest space, where they are retained for a minimum of 6 months after a Collection is fully ingested. Once quality assurance testing is completed, the files are transferred to Glacier storage and retained permanently.
Ingested content files are stored in Amazon S3 in the US-East region (Virginia) and are retained permanently.
Every 48 hours, newly ingested content files are replicated to a second S3 bucket located in the US-West region (Oregon).
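The 48-hour replication cycle can be pictured as selecting every content file ingested since the previous run. The following is a minimal sketch of that selection logic only, not the repository's actual implementation. The record shape (`key`, `ingested_at`) and the function names are assumptions made for illustration.

```python
from datetime import datetime, timedelta

# Interval stated in the text: replication runs every 48 hours.
REPLICATION_INTERVAL = timedelta(hours=48)


def next_run(last_run):
    """Time at which the following replication cycle would start."""
    return last_run + REPLICATION_INTERVAL


def files_due_for_replication(ingested, last_run, now):
    """Return keys of files ingested after the last run and up to `now`.

    `ingested` is a hypothetical list of (key, ingested_at) tuples; the
    repository's real inventory format is not described in the source.
    """
    return [key for key, ingested_at in ingested if last_run < ingested_at <= now]
```

Using a half-open window (after the last run, up to and including the current run) means that a file ingested exactly at a run boundary is picked up once and never twice.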
Backups and Restoration
The Cor repository receives regular backups of the following preservation data, which are retained for a minimum of 6 months:
- Application databases: backed up daily
- Fedora (preservation metadata): backed up daily
- SOLR index: backed up daily
- S3 content storage: replicated every 48 hours
A full backup and restoration of production data was last tested in June 2020.
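The 6-month minimum retention for backups can be expressed as a simple date check. This is a sketch under stated assumptions: "6 months" is interpreted here as six calendar months, and the helper names `add_months` and `may_prune` are hypothetical rather than part of the repository.

```python
import calendar
from datetime import date

# Minimum retention stated in the text.
MIN_RETENTION_MONTHS = 6


def add_months(d, months):
    """Shift `d` forward by whole calendar months, clamping the day to the
    last day of the target month (for example, Aug 31 becomes Feb 28 or 29)."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def may_prune(backup_date, today):
    """True once a backup has been kept for at least the minimum period."""
    return today >= add_months(backup_date, MIN_RETENTION_MONTHS)
```

Because the policy states a minimum, this check only answers when a backup becomes eligible for removal; it does not imply that backups are removed at that point.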
Implemented Preservation Events and Workflows: Summary
The Cor repository supports both major preservation lifecycle workflows and specific preservation events and actions. More information about requirements identified by the Preservation Functional Requirements Group is available on our wiki.
System-generated Preservation Events
Preservation Master Files
Works: records the Visibility/access control assigned at time of Ingest
Modification (At-rest/monitoring): for Works, records the initiating user and timestamp when a work is modified after ingest
Works: validates that SIP includes all required components
Files: FITS validation for the identified format
Only the master file is scanned at time of ingest
Message digest calculation
All files: sha1
sha1, md5, sha256
*Derivatives receive minimal characterization
All files are submitted to preservation storage, and a second copy is then replicated
Files transferred to AWS in bulk receive fixity checking, but events are not recorded until ingest
Fixity services check all files using sha1. Both copies of files in S3 are checked every six months.
Fixity checks can also be run on demand in the Curate product
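The message digest calculation and sha1 fixity check listed above can be sketched with Python's standard `hashlib` module. This is an illustrative sketch, not the repository's own code: the function names `compute_digests` and `fixity_ok`, the chunk size, and the idea of passing in a digest recorded at ingest are all assumptions.

```python
import hashlib

# Algorithms named in the text; sha1 is the one used for fixity checks.
ALGORITHMS = ("sha1", "md5", "sha256")


def compute_digests(path, algorithms=ALGORITHMS, chunk_size=1024 * 1024):
    """Read the file once and return a dict of {algorithm: hex digest}."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as handle:
        # Read in fixed-size chunks so large files are never held in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


def fixity_ok(path, recorded_sha1):
    """Compare the file's current sha1 with the value recorded earlier."""
    current = compute_digests(path, algorithms=("sha1",))["sha1"]
    return current == recorded_sha1.lower()
```

Feeding each chunk to every hasher computes all three digests in a single pass over the file, which matters when the same check is repeated across two stored copies.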
The following major lifecycle workflows were initially identified through the Digital Preservation Functional Requirements Group and have been further refined during the implementation of the repository system. In version 1 of the Cor repository, some workflows are not fully automated, and some workflows are not yet implemented. Future releases of the repository will expand on this initial functionality.
Process by which depositors prepare the components of a digital object for submission to Emory’s preservation repository; most activities occur outside of the repository system
Manual processes for appraisal and preparation of material
Fixity checking during file transfers to pre-ingest storage
Automated processes for generating ingest-ready submission packages
Repository support for Accession workflow metadata
Process in which the repository software collects or generates the components of a digital object and transfers it to the preservation environment
Repository performs highest priority preservation events and provides support for Ingest workflow metadata
Ongoing monitoring of ingested objects and files
Repository provides fixity checking (ongoing and on-demand) with basic reporting
Repository provides storage replication and monitoring
Formal capture of modifications to an ingested object and its files so that an actionable version history is created
Repository enables basic audit trails for FileSets
Large-scale dissemination of objects to third parties for preservation or discovery
Long-term or permanent removal of an object from public access
Manual review processes with support for Decommission workflow metadata
Permanent removal of content files from public access and from the repository
Manual review and deletion processes with support for Deletion workflow metadata