Digital preservation recommendations for small museums
Table of contents
- List of abbreviations
- Introduction
- Recommendation 1: use Canadian Heritage Information Network's Digital Preservation Toolkit
- Recommendation 2: use shortcuts and existing tools to simplify the inventory process
- Recommendation 3: use additional storage space rather than an open archival information system
- 3(a) Centralize and organize digital assets
- 3(b) Locate and identify database content
- 3(c) Immediately create preservation metadata for master copies
- 3(d) Create preservation copies annually
- 3(e) Maintain the content
- 3(f) Make regular backups
- 3(g) When accessing content, refer to the shared drive first, then to both backups and preservation copies simultaneously
- 3(h) Maintain the hard drives
- 3(i) Limit the use of other storage media
- 3(j) Name files in a structured and consistent manner
- 3(k) Consider switching to an open archival information system preservation model should capacity and requirements increase
- Hardware architecture of a typical system using these recommendations
- Known risks associated with these recommendations
- Summary
- Appendix: Running the robocopy command
- Glossary
List of abbreviations
- CHIN: Canadian Heritage Information Network
- IT: information technology
- OAIS: open archival information system
- SSD: solid state drive
- TB: terabyte (2^40 bytes, or roughly 1000 gigabytes)
- TDR: trusted digital repository
Introduction
Digital preservation standards exist and are well established. But some of the requirements laid out in these standards are beyond the resources of smaller cultural heritage institutions, such as community museums. Digital archiving systems, for instance, cost money and require time investments that volunteer-run or single-employee organizations either don't have or can't make. Obtaining a trusted digital repository (TDR) certification, or even maintaining a digital archive that conforms to the open archival information system (OAIS) framework, is simply not possible. Yet the alternative of doing nothing, or doing nothing more than backups, leaves these organizations exposed to unnecessary risks.
To assist small and medium-sized institutions, the Canadian Heritage Information Network (CHIN) created a set of digital preservation recommendations. These recommendations are based on CHIN's experience in three areas:
- implementing digital preservation plans and policies in smaller cultural heritage institutions,
- delivering digital preservation workshops to small organizations and
- collecting feedback from the presentation of its work to the Digitization and Digital Preservation Discussion Group, an informal group of digitization and digital preservation experts from across Canada.
Recommendation 1: use Canadian Heritage Information Network's Digital Preservation Toolkit
This Digital Preservation Toolkit can be used by institutions of any size. It helps cultural heritage institutions:
- Take stock (create an inventory and assess risks and the impact of losses) of an organization's digital assets;
- Produce a digital preservation policy; and then
- Produce a digital preservation plan and procedures.
No step should be omitted, as each is important to institutions of any size. However, the last two steps (production of a policy and a plan) are likely to be much simpler for smaller institutions. Even in the case studies conducted by CHIN (8th Hussars Museum and Medalta Museum), the policy and plan documents were, in retrospect, more detailed than necessary.
The policy document doesn't need to be longer than two pages. Focus on what to preserve and why, and use the development of this document as a change management tool in order to generate buy-in for resources as well as the ongoing commitment of all who will be involved.
Likewise, the plan document will involve far less solution hunting and comparison than what is done in larger institutions with greater budgets and requirements. Indeed, this resource recommends basic preservation solutions for specific situations.
Recommendation 2: use shortcuts and existing tools to simplify the inventory process
Taking an inventory (or survey) of an institution's digital assets is the first step in the preservation process (consult How to use the Digital Preservation Toolkit). The following is a list of tips and shortcuts that will help make the task easier:
- Use the DROID file format identification tool (provided for free by The National Archives in the UK) to quickly identify file types, file quantities and storage requirements.
- In conjunction with DROID, use The National Archives' PRONOM technical registry to identify technical details of any particular file format.
- Estimate the number of digital assets found on loose physical carriers such as optical discs and their storage requirements. It is not necessary to be exact.
- Ensure that any preservation plan you implement is flexible to future needs by acknowledging digital assets currently not held but which are likely to be held in the near future, e.g. if there is a large digitization project planned or underway. Include an estimation of storage requirements for these as well (when in doubt, round up in your estimation).
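The storage estimate described in the last tip amounts to simple arithmetic, sketched below. The function name, the 25% margin and the sample figures are illustrative assumptions, not values taken from the toolkit.

```python
import math


def estimate_storage_gb(current_gb, planned_items, avg_item_mb, margin=0.25):
    """Estimate total storage needs in gigabytes, rounded up.

    current_gb    -- space used by assets already held (e.g. as reported by DROID)
    planned_items -- number of assets expected from planned digitization work
    avg_item_mb   -- assumed average size of one new asset, in megabytes
    margin        -- extra headroom; 0.25 (25%) is an arbitrary illustrative value
    """
    planned_gb = planned_items * avg_item_mb / 1000  # 1 GB taken as 1000 MB
    total = (current_gb + planned_gb) * (1 + margin)
    return math.ceil(total)  # when in doubt, round up
```

For example, 400 GB of existing assets plus a planned project of 10,000 images at roughly 50 MB each gives `estimate_storage_gb(400, 10000, 50)`, or 1125 GB with the assumed margin.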
Recommendation 3: use additional storage space rather than an open archival information system
A common best practice is to use an archiving system that is compliant with the OAIS reference model. This type of system ingests content elements (typically as individual digital assets), each of which must be documented with preservation metadata and then managed (refreshed and migrated) as the asset ages. The benefit to this method is that ingested assets are better managed, searched and retrieved as authoritative copies.
For smaller institutions, however, there are problems with this model. Firstly, the resources needed (financial, labour and skill sets) are often not available to implement or sustain a system for which a dedicated digital archivist is often required. Secondly, the quantity and nature of digital content may not justify an OAIS system; small amounts of static content with similar metadata for multiple assets can easily be documented without a formal archive. Thirdly, the number and nature of anticipated queries on the archive may not justify an OAIS either. A small institution with just one worker, who is preserving content which has only a remote likelihood of being accessed in the future, may wish to preserve that content but not invest heavily in doing so.
Where the above conditions hold true, CHIN recommends the use of regular backups similar to those done by information technology (IT) staff, as well as the regular creation of preservation copies, which are never overwritten and which are accompanied by basic preservation metadata.
While many of the following recommendations will apply when using other operating systems, they are written with a Windows environment in mind.
3(a) Centralize and organize digital assets
Centralize all master copies of the institution's locally stored digital assets onto a single shared hard drive, with the following considerations in mind:
- Make this drive shareable only to staff who require access.
- If the content is already on other physical media (e.g. optical discs), these copies may be kept, particularly if they are of archival quality. However, the content should also be centralized on hard drives as described here. Note that the Copyright Modernization Act prevents the migration of information from media containing a technological protection measure (sometimes found on commercially produced compact discs).
- Organize the directory structure of the shared drive according to asset groups identified in the inventory process, which is step 1 in the Digital Preservation Toolkit.
- Clearly denote that these assets are master copies. This can be done by naming directories accordingly (e.g. by using the words "master copies" in a root directory) and by including standard text in each resource's filename (e.g. "mstr_").
- Working copies of these digital assets may also be stored on the shared drive, but these should be clearly distinguished from the master files by using separate directories and a distinct naming convention.
- Where possible, benefit from existing databases to document digital assets. A museum can include an object's accession number in the filename of any digital copy of a physical holding, thereby linking the digital asset to the original (physical) object and its record. Likewise, digitally born objects (such as an audio track of an interview) can be recorded as a museum holding by creating a new record in the museum's collections management system and by including the object's accession number in the filename of that digital asset.
3(b) Locate and identify database content
Some digital assets (or asset groups) are fluid in nature. A collections management database, for example, is constantly updated. Consider the following for such content:
- If possible, any such database should be stored on the shared drive (if it is installed and running locally), and in spite of the constant changes, the directory in which it is located will be deemed part of the master copies.
- If it is not possible or practical to move the database to the shared drive, keep it where it is located and perform all the tasks that follow in this list separately for the drive upon which the database is located.
- Do not modify the directory or filenames within such a system, as some applications may require specific pathnames to function properly.
- Any database stored online (e.g. an online collections management system or a digital asset management system) can also be counted as a master copy of that resource. However, copies of this information should be periodically downloaded and saved on the shared drive in an accessible format, in the event that access to the online service is lost. Textual or numeric information from such sources should be saved as tab-delimited text, comma-separated values (CSV) or a similarly accessible text format. Only the most recent copy of this downloaded information needs to be retained on the shared drive at any given time. Prior versions will be captured in the backups and preservation copies discussed below.
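As a rough illustration of saving downloaded records in an accessible text format, the sketch below writes a list of records to a comma-separated (or tab-delimited) file. How the records are obtained from an online service depends entirely on that service and is not shown; the function name and field names are hypothetical.

```python
import csv


def save_export(records, path, delimiter=","):
    """Write a list of dict records to a delimited UTF-8 text file.

    Pass delimiter="\t" for a tab-delimited file instead of CSV.
    The column order follows the keys of the first record.
    """
    if not records:
        raise ValueError("no records to save")
    fieldnames = list(records[0].keys())
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames, delimiter=delimiter)
        writer.writeheader()
        writer.writerows(records)
    return path
```

Because only the most recent download needs to stay on the shared drive, the same path can simply be overwritten each time.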
3(c) Immediately create preservation metadata for master copies
As soon as a shared drive has been organized, and immediately prior to making any preservation copy, preservation metadata should be created or identified for master copies.
Where possible, add preservation metadata to digital asset groups (as defined in the inventory process, step 1 of the Digital Preservation Toolkit Workflow) rather than individual assets. If no other method is available, record this information using a simple text editor in a file entitled "preservation_metadata", and store this file on the shared drive in the same directory as the digital asset groups.
Key components of preservation metadata include:
Provenance
This is information regarding the origin of the digital asset or asset group.
- If the asset was tied to a record in a collections management system, a digital asset management system or a similar database, some provenance information will already be recorded, and no further action is necessary.
- There is also metadata embedded in the header of some file formats (such as image files), which is produced at the time of creation. A photograph's metadata may include the date, location and type of equipment used to produce the image. Again, no further action is necessary to retain this information.
- If further information regarding provenance can be added for an entire asset group or sub-group, do so by recording it in the directory's preservation metadata text file. There is no prescribed format for this, other than to record information that you feel is pertinent.
Context
This is information about how the digital asset relates to other assets, institution holdings, digitization projects, installations, etc.
- Some of this information may be stored in a database to which the asset is tied, and thus, no further action is necessary.
- If further contextual information exists regarding an asset group or sub-group, it can be added to the directory's preservation metadata text file. Do this by adding a "Context" line entry, then by following this entry with a free text description of the context. Include any information that you feel is relevant.
Preservation activity
This refers to a history or audit trail of all actions performed on a digital asset during the preservation process.
- While the process described here provides no provisions for recording preservation activity on individual digital assets, actions performed on entire asset groups (or sub-groups) can be added to the preservation metadata text file in the group's directory. Include any file conversion activity necessary to create master copies (include dates, file formats and software used) as well as any file migration activity over the course of the digital asset's life (again, include dates, file formats and software used). Consult the section Content migration.
- Further to this, the root directory of a preservation copy (consult section 3(d) Create preservation copies annually) for all content can be named with the date on which the copy was made (e.g. "preservation_copy_yyyymmdd").
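The provenance, context and preservation activity entries described above can be kept in the plain text file with very little tooling. The sketch below appends dated, free-text entries; the ".txt" extension, the entry layout and the function name are assumptions, since no format is prescribed.

```python
from datetime import date
from pathlib import Path


def add_metadata_entry(directory, heading, text, when=None):
    """Append one dated free-text entry to the directory's metadata file.

    heading -- e.g. "Provenance", "Context" or "Preservation activity"
    text    -- free-text description (dates, formats, software used, etc.)
    when    -- date of the entry; defaults to today
    """
    when = when or date.today()
    # File name follows the text above; the .txt extension is an assumption.
    path = Path(directory) / "preservation_metadata.txt"
    entry = f"{heading} ({when.isoformat()}): {text}\n"
    with open(path, "a", encoding="utf-8") as handle:  # "a" never overwrites
        handle.write(entry)
    return path
```

Appending rather than rewriting keeps earlier entries intact, so the file doubles as a simple audit trail.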
Authenticity
This refers to all activities necessary to ensure that the digital asset accessed is what it is supposed to be, and it is a combination of many of the preservation metadata elements described here. Typically, authenticity is managed by a digital archivist, a process for which these recommendations do not allow. However, the recommendations provide sufficient alternate information so as to permit someone conducting basic research of the master copies or the preservation copies to reasonably establish a digital asset's authenticity.
Integrity
In addition to authenticity, integrity refers to evidence that a digital asset has not inadvertently changed during the preservation process. Such changes can occur during migration (which is further discussed in the section Content migration) or during a mistaken edit or overwrite of a master copy. Ensuring that a file has not been inadvertently changed is sometimes referred to as establishing "fixity."
- The simplest method of establishing fixity is to generate and store checksums for all of the resources being preserved.
- In the context of digital preservation, a checksum is a sequence of data that is produced by applying a specific algorithm to the detailed contents of a file. Applying the same algorithm to the same file contents will always yield the same checksum. Conversely, any change to the original file will yield (with overwhelming probability) a different checksum.
- When a directory with master copies is first made, checksums are produced for all files. They are subsequently stored in an easily accessible location, typically a text file located in the same directory.
- By applying the same algorithm to the preserved files at a future date, and then comparing the results to the original checksums, it can be determined if master files have changed with time. If they have, then backups (consult section 3(f) Make regular backups) or preservation copies can be used to restore master files to their correct state.
- One algorithm commonly used for preservation is MD5. There are several software applications that can be used to produce the MD5 checksum, one being MD5 Summer (freeware, supported by donations).
- Regardless of the package you choose, ensure that the results are stored in an easily accessible text document, so that any MD5 checksum software may be used in the future to verify fixity.
- Note that some backup software may also use MD5 checksum algorithms, usually for the purpose of determining what files have changed since a previous backup. Unfortunately, these packages often store the checksums in a proprietary manner that is inaccessible to anything but that backup software. This is not practical for long-term fixity checking since specific backup software may become obsolete or unusable. It is fine to select such software for your backup purposes, but you will still need a separate checksum generator if that backup software does not store MD5 checksums in an easily accessible text file.
- When to use checksums:
  - For digital assets that typically are not modified or that have master copies (images, audio, video, etc.), MD5 checksums should be used in the following situations:
    - Upon initial centralization of master copies on a shared drive (done once), new checksum information should be created and saved.
    - Following intentional changes to a directory containing master copies on the shared drive. This would include the addition of new files or the migration of existing files to newer formats. In both of these cases, existing checksum information for older files in the directory should first be reviewed to verify if it is correct. Once the desired changes are made (e.g. addition of new files or migration of existing ones), new checksum information should replace the old.
    - Immediately prior to the creation of an annual preservation copy (consult section 3(d) Create preservation copies annually), checksum information for all directories with master files on the shared drive should be verified and this checksum file should be saved as part of the preservation copy.
    - Upon accessing a preservation copy or a backup (consult section 3(f) Make regular backups) for the purposes of restoring corrupted or lost master files on the shared drive. In such cases, checksum information that was saved with the preservation copy or backup should be verified.
  - For databases and other digital asset groups that are constantly being updated, there is no value in creating checksum information on the shared drive, as the asset is constantly in flux. Instead, checksum information is restricted to preservation copies, as follows:
    - Upon the annual creation of preservation copies, checksum information is generated directly on the preservation copy for the newly copied database.
    - Prior to the retrieval of records from a database on a preservation copy (in the event that records on the working copy were inadvertently lost or overwritten), the checksum information on the preservation copy should be verified to ensure that the database retained fixity.
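For readers who prefer a script to a dedicated checksum package, the fixity workflow above can be sketched with Python's standard hashlib module. This is a minimal sketch, not the method used by MD5 Summer or any other tool; the manifest filename and the "checksum, two spaces, relative path" layout are assumptions chosen so the result stays a plain, readable text file.

```python
import hashlib
from pathlib import Path

MANIFEST = "checksums_md5.txt"  # hypothetical name for the checksum text file


def md5_of(path, chunk_size=1024 * 1024):
    """Return the MD5 checksum of one file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(directory):
    """Record a checksum for every file under the directory; return the count."""
    root = Path(directory)
    lines = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.name != MANIFEST:
            rel = path.relative_to(root).as_posix()
            lines.append(f"{md5_of(path)}  {rel}")
    (root / MANIFEST).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)


def verify_manifest(directory):
    """Compare files against the stored checksums; return a list of problems."""
    root = Path(directory)
    problems = []
    for line in (root / MANIFEST).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        expected, rel = line.split("  ", 1)
        target = root / rel
        if not target.is_file():
            problems.append((rel, "missing"))
        elif md5_of(target) != expected:
            problems.append((rel, "changed"))
    return problems
```

Running `write_manifest` once after centralizing master copies, and `verify_manifest` before each annual preservation copy or restoration, mirrors the situations listed above. An empty list from `verify_manifest` means no recorded file is missing or changed.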
Technical environment
This refers to information about the hardware, software and operating system that was used to produce the digital asset or that may be necessary to render access to it.
- Some metadata about the technical environment may already be contained in the header of some file formats (e.g. information about the camera that was used to generate an image). No further action is necessary to preserve this information.
- Additional information about technical environments for a given file format may be also found in The National Archives' PRONOM technical registry.
- Finally, if any other specific information is deemed important, it can be added to the preservation metadata file.
Rights management
This refers to information regarding copyright and usage permissions for a given digital asset.
- Some of this information may exist in a database to which the digital asset is linked. Further information may exist in administrative documentation on the shared drive.
3(d) Create preservation copies annually
As a first step after organizing the shared drive, make annual preservation copies to external hard drives by taking a snapshot of all content in master directories. The entire directory structure must be included in the preservation copy. Unlike backups, these copies include all the file attributes of the original files and folders (i.e. creation date, modification date, etc.). They also contain fixity information for the databases, and above all, they must never be overwritten. Other considerations for preservation copies include:
- On Windows operating systems, use the robocopy command to copy all content and directory structures en masse and to preserve the time-stamps of files and folders.
- Any fixity information (i.e. checksums) for digital assets on the shared drive should also be copied as part of the preservation copy.
- Preservation copies of databases, or other files that are in constant flux, will have no fixity information (checksum information) attributed to them on the shared drive. Instead, checksum information for these files should be created as part of the preservation copy immediately following the robocopy command.
- Keep two sets of these preservation copies on two separate external drives, with one off-site, and never overwrite them.
- Preservation copies should be kept indefinitely. While content could, in theory, be removed from a preservation copy if it has exceeded its disposition date, there is generally no advantage to doing so. Instead, content that has exceeded that date should be removed from the shared drive. Consult section 3(h) Maintain the hard drives for more information on exceeding drive capacity.
- Cloud storage may also be used for preservation copies, but how this is done will depend on the nature of the storage service, and it should never be used as the sole source of preservation copies. Always use hard drives for at least one preservation copy.
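As an illustration of the robocopy step, the sketch below only builds the command for a dated preservation copy; it does not run it. The chosen switches (/E, /COPY:DAT, /DCOPY:T, /R, /W) are standard robocopy options but are one plausible selection, not necessarily the ones given in the appendix, and the function name and paths are hypothetical.

```python
from datetime import date


def preservation_copy_command(source, drive_root, when=None):
    """Build a robocopy command that copies `source` into a dated root folder.

    source     -- directory holding the master copies, e.g. r"D:\master_copies"
    drive_root -- root of the external drive, e.g. "E:" (hypothetical)
    when       -- date of the copy; defaults to today
    """
    when = when or date.today()
    target = drive_root.rstrip("\\") + f"\\preservation_copy_{when:%Y%m%d}"
    return [
        "robocopy", source, target,
        "/E",         # copy all subdirectories, including empty ones
        "/COPY:DAT",  # copy file data, attributes and time-stamps
        "/DCOPY:T",   # keep directory time-stamps as well
        "/R:2",       # retry a failed file twice (illustrative value)
        "/W:5",       # wait 5 seconds between retries (illustrative value)
    ]
```

On a Windows machine the resulting list could be passed to `subprocess.run`. Note that robocopy reports success with exit codes below 8, so a non-zero code does not by itself indicate a failure.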
3(e) Maintain the content
Content refreshing
This is the copying of content to new physical media, primarily to prevent the consequences of media degradation.
- On individual hard drives, refreshing to new drives should be done at least once every five years. Use the robocopy command for this purpose. Consult 3(h) Maintain the hard drives for further details.
- In general, content on optical media should be migrated to hard drives (although the original may be kept).
- Do not refresh optical media to the same type of media, unless your organization has a specific reason for doing so (e.g. long-term disaster planning).
Content migration
This refers to the conversion of digital assets from one format to another to ensure older files avoid obsolescence and remain accessible with the passage of time.
- As part of a regular digital asset inventory process (consult the Digital Preservation Toolkit), identify content on the shared drive that requires migration to new file formats, and migrate this content while keeping it on the shared drive. Keep the original as well as any files in the migration path in a sub-folder, and label these accordingly.
- DROID and PRONOM can both be used to help map a migration plan.
- Do not migrate content on the existing preservation copies. These copies should never be overwritten or modified.
- As with any migration process, the tools used should (ideally) allow batch migration of entire digital asset groups and should ensure that the migration path (i.e. the conversion process from older formats to newer formats) retains as much relevant detail as possible, including metadata.
- All migration activity (e.g. what was migrated, by whom, using what tools and any further comments) should be recorded in the preservation metadata text file, in the directory for that asset group.
Digital asset retention and disposition
- If an asset is to be retained according to the digital asset retention and disposition schedule, do not delete it from the shared drive. If there are multiple versions of the asset, always ensure the master copy, on the shared drive, is clearly identified so as to be easily accessible.
- Once an asset is to be disposed of (according to the asset retention and disposition schedule), all copies of it can be removed from the shared drive.
- Keep all preservation copies for as long as it is required for any single digital asset in that preservation copy (as per the digital asset retention and disposition schedule). While it is possible to delete content that is no longer required from a preservation copy, doing so will leave a partial image of what was stored (including checksums showing missing files). It is generally easier to keep the copy in its entirety.
3(f) Make regular backups
In addition to annual preservation copies, regular backups of all digital assets on the shared drive should be made on a more frequent basis, and ideally, backup information should be stored off-site. However, this may not be practical if only one backup drive is being used. Fixity information is not necessary for this procedure, nor is using the robocopy command to preserve all file attributes. Cloud storage or external, portable drives may be used for these backups. Note that these sorts of backups are often already being performed by a museum's IT support service.
- If using external hard drives:
  - Perform backups in cycles (e.g. once a week and once a month). The backups resulting from those cycles can be overwritten after a specified time period to save drive space.
  - Database files constantly change, and the multiple copies of these will be the primary reason for backing up in weekly and monthly cycles.
  - Other files will likely change less frequently. For this reason, selecting backup software that saves only changed files since prior backups (known as incremental copying) will save drive space.
- If using cloud storage:
  - Select a storage service that specializes in backups rather than file sharing or online storage services that serve as a primary location for content.
  - Some cloud services store content in a way that allows one to browse a timeline (to see an image of the hard drive's contents at any point in time). These are convenient, but unless file metadata is also preserved (file creation date, modification dates, etc.), these services should be used for backups only, not for preservation copies.
  - Be aware of any sensitive data that may be uploaded and be familiar with the Internet privacy legislation and practices of the country where storage servers are located.
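To make the idea of incremental copying concrete, the sketch below copies only files that are new or that differ in size or modification time from what is already in the backup folder. It is a deliberately simplified stand-in for real backup software, which typically also keeps separate weekly and monthly sets; the function name is hypothetical.

```python
import shutil
from pathlib import Path


def incremental_backup(source, destination):
    """Copy new or changed files from source to destination.

    Returns the relative paths of the files that were copied.
    """
    src_root, dst_root = Path(source), Path(destination)
    copied = []
    for src in sorted(src_root.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(src_root)
        dst = dst_root / rel
        if dst.exists():
            src_info, dst_info = src.stat(), dst.stat()
            same_size = src_info.st_size == dst_info.st_size
            not_newer = int(src_info.st_mtime) <= int(dst_info.st_mtime)
            if same_size and not_newer:
                continue  # unchanged since the last backup, skip it
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also carries over the modification time
        copied.append(rel.as_posix())
    return copied
```

A second run immediately after the first copies nothing, which is where the savings in time and drive space come from.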
3(g) When accessing content, refer to the shared drive first, then to both backups and preservation copies simultaneously
When accessing a digital asset, seek it first on the shared drive. Because nothing is ever deleted, unless dictated by an asset retention and disposition schedule, it should be there. If it is not, consult both the weekly or monthly backups and the annual preservation copies: the backups will be more recent, but the date stamps on the preservation copies will be correct. Where the asset exists in a preservation copy, restore it from there.
If an asset that should have been on the shared drive is missing, take the time to determine why that asset is not available and identify what other content, if any, is missing from the shared drive. Restore missing digital assets (again, from the preservation copies, if possible) to prevent the problem from propagating in the future. The robocopy command can be used to keep date stamps on restored files, but be diligent so as not to overwrite materials on the shared drive.
If content is missing from a preservation copy where it should have existed, leave the preservation copy as is. Never overwrite an old preservation copy. If large volumes of content are missing from a recent preservation copy, it may be worthwhile to create an additional preservation copy once missing content on the shared drive has been restored. Document all work in the preservation metadata text files on the shared drive.
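The lookup order described in this section can be expressed as a small helper: check the shared drive first, and only if the asset is absent, list every backup and preservation copy that holds it. The function name and the idea of passing drive locations as lists are assumptions made for illustration.

```python
from pathlib import Path


def locate_asset(relative_path, shared_drive, backups, preservation_copies):
    """Return (label, path) pairs for the copies of an asset that exist.

    If the asset is on the shared drive, only that copy is returned.
    Otherwise all matching backups and preservation copies are listed,
    so that both can be compared before restoring.
    """
    shared = Path(shared_drive) / relative_path
    if shared.is_file():
        return [("shared drive", shared)]
    found = []
    groups = (("backup", backups), ("preservation copy", preservation_copies))
    for label, roots in groups:
        for root in roots:
            candidate = Path(root) / relative_path
            if candidate.is_file():
                found.append((label, candidate))
    return found
```

An empty result means the asset exists in none of the locations searched, which is itself worth recording in the preservation metadata text file.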
3(h) Maintain the hard drives
Management of all hard drives involves the following:
- Label every external hard drive with a replacement date five years from its date of purchase, and refresh content to new drives accordingly using the robocopy command. The internal shared drive is likely to be replaced with its host computer once every five years or so. In such a case, add the new machine to the network, and use the robocopy command to move content to the new shared drive.
- For copies (either backups or preservation copies) in Windows environments, consider using parity-based storage spaces, which is a simple way of pooling drives into a large virtual drive for data archiving, if storage requirements exceed commercially available hard drive capacities. This process has become more stable and accessible in recent versions of the operating system. The process varies with each subsequent version of Windows, and users are asked to consult Microsoft documentation for the correct procedures. In 2018, a storage space of up to 63 terabytes (TB) could be created in this manner.
- For shared drives in Windows environments, consider a mirror-based storage space if space requirements exceed commercially available drive capacities. Both mirror and parity-based storage spaces offer redundancy, should a drive fail, but parity-based systems make more effective use of disk space, whereas mirror-based systems offer faster access times.
- If using Windows storage spaces, be sure that the drives added to the pool are of a similar capacity, as only the smallest capacity will be recognized for all drives.
- When replacing a drive in a Windows storage space disk array, do not use the robocopy command. Instead, follow Windows storage space instructions on how to replace a drive; these will vary with each successive version of Windows.
- Whether using external hard drives in a storage space pool or as standalone devices, use different known and trusted brands to reduce the chance of simultaneous drive failures.
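The five-year replacement date for a drive label is a simple date calculation, sketched below. The function name is hypothetical, and the handling of a 29 February purchase date is an arbitrary choice.

```python
from datetime import date


def replacement_date(purchased, years=5):
    """Return the date on which a drive bought on `purchased` is due for replacement."""
    try:
        return purchased.replace(year=purchased.year + years)
    except ValueError:
        # Purchased on 29 February and the target year is not a leap year.
        return purchased.replace(year=purchased.year + years, day=28)
```

For a drive bought on 1 March 2018, `replacement_date(date(2018, 3, 1))` gives 1 March 2023.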
3(i) Limit the use of other storage media
Keep optical discs on which any content already exists if and only if they are known to be of archival quality. Otherwise, do not count this media when determining the number of copies on hand. It is generally better to choose hard drives over optical discs when preserving new content.
Solid state drives (SSDs) are an acceptable substitute for hard disk drives when serving as the shared drive. However, at the time of writing (2018), using an SSD to store preservation copies was not recommended.
3(j) Name files in a structured and consistent manner
Observe the following practices and file naming conventions:
- Name master copies using a prefix in the filename (e.g. "mstr_").
- Avoid the use of spaces or special characters in filenames, as this sometimes causes problems with scripts and has been known to cause difficulties when transferring files across operating systems.
- Working copies of multimedia should also be identified as such in the filename, and version control should be maintained by integrating into the filename the date on which the version was created (in "YYYYMMDD" format).
- Use subdirectories to organize master copies, older versions and the most current working copies.
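The naming conventions above can be applied consistently with a few helper functions. This is only a sketch: the "mstr_" prefix and the YYYYMMDD date come from the text, while the "work_" prefix for working copies, the function names and the sample accession numbers are assumptions.

```python
import re
from datetime import date

# Anything other than letters, digits, dot, underscore or hyphen is replaced.
UNSAFE = re.compile(r"[^A-Za-z0-9._-]+")


def safe_name(text):
    """Replace spaces and special characters with underscores."""
    return UNSAFE.sub("_", text).strip("_")


def master_name(accession_number, extension):
    """Filename for a master copy, tied to the object's accession number."""
    ext = extension.lstrip(".").lower()
    return f"mstr_{safe_name(accession_number)}.{ext}"


def working_name(accession_number, extension, created):
    """Filename for a working copy, versioned by creation date (YYYYMMDD)."""
    ext = extension.lstrip(".").lower()
    return f"work_{safe_name(accession_number)}_{created:%Y%m%d}.{ext}"
```

For instance, `master_name("1998.12.3", "TIF")` yields "mstr_1998.12.3.tif", and `working_name("1998.12.3", "jpg", date(2018, 6, 4))` yields "work_1998.12.3_20180604.jpg".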
3(k) Consider switching to an open archival information system preservation model should capacity and requirements increase
An institution should consider switching over to a full OAIS-compliant solution when it is able to afford doing so and:
- if searches for preserved content cannot be performed to the satisfaction of the institution;
- if authenticity of accessed content cannot be guaranteed to the satisfaction of the institution; or
- if the total time and resources spent preserving, managing and accessing content with the above recommendations exceeds those that would be invested with a traditional OAIS model.
Hardware architecture of a typical system using these recommendations
In the simplest configuration, it is possible that only one computer would exist, and it should contain as many centralized digital assets as possible on its working hard drive. If the museum has additional computers, these can share access to the hard drive via a wireless router (preferred) or an Ethernet connection. If the museum is using an online database, such as a collections management system, a regularly updated copy of this database should be stored in an accessible format on the shared drive. The museum may also wish to use an online backup service, which is more expensive than external hard drives but can be more convenient. Preservation copies should always be saved to external hard drives, and at least one copy should be saved off-site. Should storage requirements make it necessary, any hard drive in this diagram (shared drive or external hard drive) can be replaced by a disk array (e.g. a Windows storage space disk pool).
Description of Figure 1
This figure depicts a typical hardware configuration in a small museum, as per the recommendations described in this document. All digital assets have been centralized to a shared drive (typically, but not necessarily, an internal drive) hosted by a single computer. This drive is then made accessible to other machines on the premises, typically through a wireless router. An external drive is generally left connected to this central computer and is used to hold weekly backups of the shared drive's digital assets. A second external drive is connected to the central computer on an annual basis and is used to hold annual preservation copies of the shared drive's digital assets. An additional set of preservation copies of all digital assets is made to yet another (third) drive, and this is stored off-site. The central computer may have Internet access (through a wireless router, for instance). In such a case, the Internet connection may be used to access an optional online collections management system (CMS) or an optional online backup system.
Known risks associated with these recommendations
The recommendations made in lieu of an OAIS model (i.e. those made under Recommendation 3: use additional storage space rather than an open archival information system) are meant to provide a measure of preservation in light of the limited resources and requirements in smaller institutions. It is understood that these recommendations are not without risk.
Risk – violation of the 3-2-1 rule
This generally accepted preservation rule states that at least three copies of any digital asset should be kept, that two separate types of carriers (i.e. physical media) should be used and that at least one copy should be stored off-site. Yet the above process permits the use of hard drives for all copies. This was a topic of discussion in the Digitization and Digital Preservation Discussion Group. Archival quality optical storage media, which are considered to have the longest lifespan of any storage media, are still considered to be acceptable if digital assets are already recorded to them. However, hard drive storage in new installations was considered preferable for most preservation plans due to its superior storage capacity, flexibility and affordability. To compensate for the violation of the rule, then, it is advised that hard drives be replaced every five years, that copies be retained on multiple drives of varying known brands and that at least one copy be stored off-site.
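The three conditions of the 3-2-1 rule can be expressed as a simple check. The sketch below is illustrative only; the StoredCopy class, the check_3_2_1 function and the carrier labels are hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass
class StoredCopy:
    label: str      # e.g. "shared drive", "weekly backup"
    carrier: str    # type of physical media, e.g. "hard drive", "optical disc"
    off_site: bool  # True if the copy is stored away from the museum


def check_3_2_1(copies):
    """Return the list of 3-2-1 conditions that the given copies do not meet."""
    problems = []
    if len(copies) < 3:
        problems.append("fewer than three copies")
    if len({copy.carrier for copy in copies}) < 2:
        problems.append("fewer than two carrier types")
    if not any(copy.off_site for copy in copies):
        problems.append("no off-site copy")
    return problems


# The configuration described in this document: every copy is on a hard drive.
museum_copies = [
    StoredCopy("shared drive", "hard drive", off_site=False),
    StoredCopy("weekly backup", "hard drive", off_site=False),
    StoredCopy("annual preservation copy", "hard drive", off_site=False),
    StoredCopy("off-site preservation copy", "hard drive", off_site=True),
]
```

Calling check_3_2_1(museum_copies) reports only the carrier-type condition as unmet, which is the violation this risk describes.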
Risk – no formal archiving of an authoritative copy
The above recommendations do not provide for a clearly documented audit trail to confirm the authenticity of authoritative copies of a digital asset, nor do they allow for the preserved assets to be as easily searched and retrieved as they may be in an archival system. Instead, it is presumed that the master copy on the shared drive is the authoritative version and that sleuthing through related systems (such as collections management databases, fixity information and sporadic preservation metadata) will be necessary to confirm the copy's authenticity. Without a rigorous archiving system in place to improve searchability or to enforce the correct use of preservation metadata, this will always be a risk. Given the absence of the resources required to implement an OAIS system, this risk is accepted so that the greatest practical degree of preservation can be implemented in institutions with smaller budgets and requirements. Moreover, this risk can be mitigated by tying digital assets to existing database systems (using the accession number in a file name, for instance) and by diligently updating the preservation metadata file (a text file stored with master copies) when any activity is performed on the asset.
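Keeping the preservation metadata file up to date is easier if each activity is recorded the same way every time. The following sketch appends one dated line per activity to a text file. The function name log_activity, the pipe-separated layout and the example file names are assumptions made for illustration; the recommendations do not prescribe a format for these entries.

```python
from datetime import date
from pathlib import Path


def log_activity(metadata_file, asset_name, activity, performed_by):
    """Append one dated entry to the preservation metadata text file."""
    # Hypothetical entry layout: date | asset | activity | person
    entry = "{} | {} | {} | {}\n".format(
        date.today().strftime("%Y%m%d"), asset_name, activity, performed_by
    )
    # Mode "a" adds to the end of the file, so earlier entries are never overwritten.
    with Path(metadata_file).open("a", encoding="utf-8") as handle:
        handle.write(entry)
    return entry
```

A call such as log_activity("preservation_metadata.txt", "1998_12_3_parade.tif", "annual preservation copy created", "J. Smith") adds a single line to the file and leaves existing entries untouched, which gives a basic record of activity even without an archival system.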
Summary
These recommendations are provided to streamline the preservation activities of small to medium-sized museums and similar cultural heritage institutions. They are offered with the understanding that any effort to implement preservation measures is a step in the right direction and that such measures will be more effective if those carrying them out are well informed.
These recommendations are intended to be flexible to the needs and abilities of the institution using them. As such, they can be applied to various technical environments. As an institution grows, so can a system implemented under these recommendations (e.g. migration to disk arrays or online storage). The recommendations also encourage the creation and maintenance of preservation metadata, which will help enable long-term access to your digital assets and, should the needs and resources of your institution grow, help simplify the implementation of an OAIS-compliant system at a future date.
Finally, these are only recommendations. If they do not suit the needs of your institution, they can and should be modified. For further information about implementing these recommendations, do not hesitate to contact CHIN.
Appendix: Running the robocopy command
Robocopy can be used to quickly copy large amounts of content, including all file attributes, which is something that the standard Windows "drag and drop" or even many backup software systems fail to do. The robocopy command is run from the DOS Shell command line and is included with all current Windows operating systems. No additional software is required to use it.
The DOS Shell command prompt is invoked by slightly different means in various Windows operating systems (typically by right-clicking the "start" button or "start orb" in the bottom left corner of the desktop screen, then selecting the "Run" menu option, then, from the "Open" dialogue box, entering "cmd" and pressing the Enter key). The format for the command necessary to create preservation copies would appear as:
robocopy c:\source_directory\ e:\destination_directory\ /MIR /DCOPY:T
where:
- c:\source_directory\ is the drive and path where original files are located (note that paths with spaces in them will require quotes)
- e:\destination_directory\ is the drive and path where files are to be copied (be sure this path does not yet exist)
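- /MIR causes robocopy to mirror the source directory tree: all files and subdirectories (including empty ones) are copied, and anything in the destination that is not in the source is removed
- /DCOPY:T causes robocopy to preserve time-stamps on directories and subdirectories (time-stamps on files are preserved by default)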
When the copy is complete, type "exit" and press the Enter key to close the DOS Shell window.
Note that when using the /MIR switch, always copy files to a destination directory that does not yet exist, as this switch deletes any files in a pre-existing destination directory that are not also present in the source directory.
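The precaution about the destination directory can be enforced automatically. The sketch below wraps the same robocopy command in Python and refuses to run when the destination already exists, which is the situation in which /MIR can delete files. The function names are hypothetical, and the script assumes that Python is installed on a Windows computer. It relies on robocopy's documented convention that exit codes of 8 or higher indicate a failure.

```python
import subprocess
from pathlib import Path


def build_robocopy_command(source, destination):
    """Build the robocopy argument list, refusing a destination that already exists."""
    if Path(destination).exists():
        raise FileExistsError("Destination already exists: {}".format(destination))
    # Same switches as the command shown above.
    return ["robocopy", str(source), str(destination), "/MIR", "/DCOPY:T"]


def run_preservation_copy(source, destination):
    """Run robocopy (Windows only) and return True if it reported no failures."""
    command = build_robocopy_command(source, destination)
    # robocopy uses non-zero exit codes below 8 to report successful copies,
    # so the exit code is inspected directly instead of using check=True.
    result = subprocess.run(command)
    return result.returncode < 8
```

Passing the paths as a list means that paths containing spaces do not need to be quoted by hand.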
Glossary
- checksum
- A small piece of data generated by applying a function to a larger block of data. In the context of digital preservation, checksums are used to detect inadvertent changes to file contents.
- content migration
- The act of converting content to a new file format. This is done to ensure that content remains accessible with current software, operating systems and hardware.
- content refreshing
- The act of copying digital content to a new physical carrier, typically of the same media type. This is done to prevent the loss of content due to media degradation.
- digital asset retention and disposition schedule
- A document that identifies digital assets (typically by asset groups) and the date on which their disposal must take place. Often, the date will be recorded as "indefinitely," but this can be revised as an institution's inventory is reviewed.
- Digital Preservation Toolkit
- A resource provided by CHIN to help Canadian cultural heritage institutions preserve their digital assets. It consists of templates, decision trees, guidelines and case studies to help museums produce digital preservation policies, plans and procedures.
- disk pool
- A Microsoft Windows term for an array of hard drives that are logically grouped to be used as a single virtual hard drive. Disk pools can be configured into storage spaces in a number of ways, depending on the function they are to serve.
- DROID
- A software tool developed by The National Archives (UK) to perform automated batch identification of file formats.
- Ethernet
- The most commonly applied system for implementing local area networks in which a physical cable connection is involved. It is commonly identified by a wide "telephone" (RJ45) jack plugged into the networked computers.
- master copy
- The copy of a digital asset from which all other copies are derived. Once designated as such, a master copy should never be modified.
- MD5
- A checksum algorithm used in digital preservation to ensure file fixity. When applied to a file, the MD5 function yields a 32-character hexadecimal result (the checksum), which can be stored in a text file. By applying the same function to the same file at a later date and comparing the two checksums, one can determine if the file has changed.
- mirror-based storage space
- A way in which a Windows disk pool may be configured to ensure redundancy of information across a disk array. Using mirroring for redundancy provides less storage space than the parity method but faster access times.
- OAIS
- Open archival information system and the reference model referring to such a system. OAIS is an organization of people, technology, workflows and procedures, which adheres to a specific set of standards defined by the OAIS model.
- optical disc
- A form of storage media that retains information on a reflective surface. Common optical disc formats include the compact disc (CD) and the digital video disc (DVD).
- parity-based storage space
- A way in which a Windows disk pool may be configured to ensure redundancy of information across a disk array. Using parity for redundancy provides more storage space but slower access times than the mirror-based method.
- preservation copy
- In the context of CHIN's recommendations for small museums, the term "preservation copy" is intended to refer to a copy of a master file for a given digital resource, complete with all metadata. Unlike archival copies (which are ingested only once into an archive, then managed), preservation copies are saved annually.
- PRONOM
- A technical registry and resource provided by The National Archives (UK) for anyone requiring impartial and definitive information about the file formats, software products and other technical components required to support long-term access to electronic records and other digital objects of cultural, historical or business value.
- robocopy command
- A command available in all current Microsoft Windows operating systems. The command is accessible through the DOS Shell and can be used to copy entire directories and subdirectories and their associated metadata, including time-stamps for folders and files.
- solid state drive (SSD)
- A mass storage device similar in functionality to a hard drive but without moving parts. Instead, solid state drives use semiconductor memory.
- storage space
- A Microsoft Windows term for a configured disk pool. Configuration can be done in a number of ways to suit requirements (no redundancy, redundancy using mirroring or redundancy using parity).
- technological protection measure
- A technical means (hardware, software or both) of preventing a digital resource (usually one sold commercially) from being copied.
- working copy
- A digital asset derived from a master copy. Working copies can be modified to suit the needs of the project at hand.
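As an illustration of the checksum and MD5 entries in this glossary, the following sketch computes a file's MD5 checksum with Python's standard hashlib module and compares it with a previously recorded value. The function names md5_checksum and has_changed are hypothetical, and how the recorded checksum is stored is left to the institution.

```python
import hashlib


def md5_checksum(path, chunk_size=1024 * 1024):
    """Return the MD5 checksum of a file as 32 hexadecimal characters."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read the file in chunks so that large multimedia files
        # do not have to fit in memory all at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def has_changed(path, recorded_checksum):
    """Return True if the file no longer matches its recorded checksum."""
    return md5_checksum(path) != recorded_checksum.strip().lower()
```

A result of True from has_changed means that the file's contents differ from those that produced the recorded checksum, so the copy should be compared against a backup or preservation copy.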