
Parsimonious Preservation – (another) different approach to digital information management

We have been featuring various theories about digital information management on this blog in order to highlight some of the debates involved in this complex and evolving field.

To offer a different perspective from those we have focused on so far, take a moment to consider the principles of Parsimonious Preservation developed by the National Archives and advocated in particular by Tim Gollins, Head of Preservation at the institution.

racks of servers storing digital information

In some senses the National Archives seem to be bucking the trend of panic, hysteria and (sometimes) confusion found in other literature on digital information management. The report ‘Putting Parsimonious Preservation into Practice’ advocates a hands-off rather than a hands-on approach, whereas many other institutions, including the British Library, recommend the latter.

Parsimonious preservation rejects wholesale the idea that digital information requires continual interference and management throughout its life cycle. Instead, it argues that minimal intervention is preferable because it entails ‘minimal alteration, which brings the benefits of maximum integrity and authenticity’ of the digital data object.
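The emphasis on ‘minimal alteration’ sits naturally alongside fixity checking, a widely used preservation practice in which a checksum recorded when a file is received is recomputed later to confirm that the file has not changed. The report, as summarised here, does not describe the National Archives’ own tooling, so the following is only a generic Python sketch; the file name, its contents and the chunk size are illustrative assumptions.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path, expected_digest):
    """True if the file is bit-for-bit identical to when the digest was recorded."""
    return sha256_of(path) == expected_digest


with tempfile.TemporaryDirectory() as tmp:
    record = Path(tmp) / "record.doc"  # hypothetical file
    record.write_bytes(b"example content")
    at_ingest = sha256_of(record)  # checksum stored on receipt

    untouched = verify_fixity(record, at_ingest)  # file left alone
    record.write_bytes(b"example content, re-saved")  # any alteration
    after_change = verify_fixity(record, at_ingest)

print(untouched, after_change)  # True False
```

A file that is simply left alone keeps passing this check indefinitely; any intervention that rewrites the bytes, however well-intentioned, produces a new checksum that has to be justified and documented.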

As detailed in our previous posts, repeated cycles of encoding and decoding pose a very real threat to digital data, because each cycle can change the structure of the files and, in the long run, compromise the quality of the data object.

In practice, minimal intervention seems like a good idea: if you leave something alone in a safe place, rather than continually moving it from pillar to post, it is less likely to suffer from everyday wear and tear. With digital data, however, obsolescence is the main factor that stands in the way of a hands-off approach. This too is downplayed by the National Archives report, which suggests that obsolescence, although undeniably a threat to digital information, is not as big a worry as it is often presented to be.

Gollins draws on over ten years of experience at the National Archives, as well as research conducted by David Rosenthal, to offer a different approach to obsolescence, one that takes account of the ‘common formats’ used worldwide (such as PDF, .xls and .doc). The report therefore concludes ‘that without any action from even a national institution the data in these formats will be accessible for another 10 years at least.’

Ten years may seem like a short period of time, but this is the timescale the report cites as practical and realistic for the management of digital data. Gollins writes:

‘While the overall aim may be (or in our case must be) for “permanent preservation” […] the best we can do in our (or any) generation is to take a stewardship role. This role focuses on ensuring the survival of material for the next generation – in the digital context the next generation of systems. We should also remember that in the digital context the next generation may only be 5 to 10 years away!’

It is worth mentioning here that the Parsimonious Preservation report only refers to file formats for documents and images, not sound or moving images, so it would be a mistake to assume that the principle of minimal intervention applies equally to these kinds of digital data objects. Furthermore, .doc files used in Microsoft Office are not always consistent over time: have you ever tried to open a Word file from 1998 in an Office package from 2008? You might have a few problems. This is not to say that Gollins doesn’t know his stuff; he clearly must do to be Head of Preservation at the National Archives! It is just that this ‘hands-off, don’t worry about it’ approach seems odd in relation to other literature on digital information management from reputable sources such as the British Library and the Digital Preservation Coalition. Perhaps there is a middle ground to be struck between active intervention and leaving things alone, but it isn’t suggested here!

For Gollins, ‘the failure to capture digital material is the biggest single risk to its preservation,’ far greater than obsolescence. He goes on to state that ‘this is so much a matter of common sense that it can be overlooked; we can only preserve and process what is captured!’ Another issue here is the quality of the capture: it is far easier to preserve good quality files if they are captured at appropriate bit rates and resolution. In other words, there is little point in making low resolution copies, because they are less likely to survive the rapid succession of digital generations. As Gollins writes in a different article exploring the same theme, ‘some will argue that there is little point in preservation without access; I would argue that there is little point in access without preservation.’
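For audio, the storage cost of choosing ‘appropriate bit rates and resolution’ at capture is simple arithmetic: uncompressed PCM needs sample rate × bit depth × channels bits per second. The sketch below compares two example settings, 44.1 kHz/16-bit (CD audio) and 96 kHz/24-bit (a setting often used for archival audio transfers); these are illustrative assumptions, not figures from the report.

```python
def pcm_bytes_per_second(sample_rate_hz: int, bit_depth: int, channels: int) -> int:
    """Data rate of uncompressed PCM audio in bytes per second."""
    return sample_rate_hz * bit_depth * channels // 8


def pcm_megabytes_per_hour(sample_rate_hz: int, bit_depth: int, channels: int) -> float:
    """Storage needed for one hour of uncompressed PCM, in decimal megabytes."""
    return pcm_bytes_per_second(sample_rate_hz, bit_depth, channels) * 3600 / 1_000_000


# Example settings (assumptions for illustration, not taken from the report)
cd_quality = pcm_bytes_per_second(44_100, 16, 2)  # 176,400 bytes/s
hi_res = pcm_bytes_per_second(96_000, 24, 2)      # 576,000 bytes/s

print(f"44.1 kHz / 16-bit stereo: {pcm_megabytes_per_hour(44_100, 16, 2):.2f} MB per hour")
print(f"96 kHz / 24-bit stereo:   {pcm_megabytes_per_hour(96_000, 24, 2):.2f} MB per hour")
```

At these settings an hour of stereo audio takes roughly 635 MB versus about 2.07 GB, so a higher-quality capture costs more than three times the storage, but the detail it records cannot be added back later.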

Diagram explaining how emulation works to make obsolete computers available on new machines

This has been a bit of a whirlwind tour through a very interesting and thought-provoking report that explains how a large memory institution has put a very different kind of digital preservation strategy into practice. As Gollins concludes:

‘In all of the above discussion readers familiar with digital preservation literature will perhaps be surprised not to see any mention or discussion of “Migration” vs. “Emulation” or indeed of “Significant Properties”. This is perhaps one of the greatest benefits we have derived from adopting our parsimonious approach – no such capability is needed! We do not expect that any data we have or will receive in the foreseeable future (5 to 10 years) will require either action during the life of the system we are building.’

Whether or not such an approach is naïve, neglectful or very wise, only time will tell.

Posted by debra in audio tape, 2 comments

Measuring signals – challenges for the digitisation of sound and video

In a 2012 report entitled ‘Preserving Sound and Moving Pictures’ for the Digital Preservation Coalition’s Technology Watch Report series, Richard Wright outlines the unique challenges involved in digitising audio and audiovisual material. He argues that ‘preserving the quality of the digitized signal’ through migration processes involving ‘cycles of lossy encoding, decoding and reformatting is one major digital preservation challenge for audiovisual files’ (1).

Wright highlights a key issue: understanding how data changes as it is played back, or moved from location to location, is important when thinking about digitisation as a long-term project. When data is encoded, decoded or reformatted its structure changes, which can compromise its quality. This is a technical way of describing how elements of a data object are added, taken away or otherwise transformed when it is played back on systems and software that differ from those used to create the original.

Time base corrector

To think about this in terms familiar to people today, imagine converting an uncompressed WAV file into an MP3. You then convert the MP3 back into a WAV so you can burn it onto a CD that will play on your friend’s CD player. The WAV file you end up with is not the same as the WAV file you started with. Along the way it was squished and squashed into an MP3 that took up far less storage space, and the information discarded to achieve that cannot be restored when it is turned back into a WAV. While a smaller file size may be a bonus, the loss of quality isn’t. But this is what happens when files are encoded, decoded and reformatted.
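A rough way to see why the final WAV differs from the original is to simulate a lossy round trip. Real MP3 encoding is a perceptual codec and far more sophisticated than this; the sketch below simply zeroes the low-order bits of each 16-bit sample as a crude stand-in for lossy coding, then writes both versions as WAV data with Python’s standard `wave` module. The tone, sample rate and number of bits kept are arbitrary assumptions.

```python
import hashlib
import io
import math
import struct
import wave

SAMPLE_RATE = 8000  # arbitrary, kept low so the example runs quickly


def make_tone(n_samples, freq=440.0, amplitude=20000):
    """Generate a sine tone as signed 16-bit integer samples."""
    return [
        int(amplitude * math.sin(2 * math.pi * freq * i / SAMPLE_RATE))
        for i in range(n_samples)
    ]


def lossy_round_trip(samples, keep_bits=8):
    """Crude stand-in for lossy coding: zero the low-order bits of each sample."""
    shift = 16 - keep_bits
    return [(s >> shift) << shift for s in samples]


def to_wav_bytes(samples):
    """Write mono 16-bit samples as an in-memory WAV file and return its bytes."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(SAMPLE_RATE)
        w.writeframes(struct.pack(f"<{len(samples)}h", *samples))
    return buf.getvalue()


original = make_tone(SAMPLE_RATE)  # one second of audio
degraded = lossy_round_trip(original)

wav_a = to_wav_bytes(original)
wav_b = to_wav_bytes(degraded)
max_error = max(abs(a - b) for a, b in zip(original, degraded))

print("same file size:", len(wav_a) == len(wav_b))  # True
print("identical content:", hashlib.sha256(wav_a).digest() == hashlib.sha256(wav_b).digest())  # False
print("largest per-sample error:", max_error)
```

Both WAV files are exactly the same size, yet their contents differ, and running the degraded audio through the same process again changes nothing: the discarded detail is simply gone.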

Subjecting data to multiple layers of encoding and decoding does not only apply to digital data. Take Betacam video, for instance, a component analogue video format introduced by Sony in 1982. If your video was played back using the composite output, the circuitry within the Betacam machine would have had to encode the component signal as a composite one. The difference may have looked subtle, and you may not even have noticed any change, but the structure of the signal would have been altered in a ‘lossy’ way and cannot be restored to its original form. Encoding a component signal, which is split into two or more channels, into a composite signal, which essentially squashes the channels together, is comparable to the lossy compression applied to digital formats such as MP3 audio and MPEG-2 video.

U-matic time base corrector

A central part of the work we do at Greatbear is to understand the changes that may have occurred to the signal over time, and to minimise further losses in the digitisation process. We use a range of specialist equipment to measure the quality of the analogue signal carefully, including external time base correctors and waveform monitors. We also make informed decisions about which machine to play each tape on, based on the machine we expect the original recording was made on.

If we take for granted that any recording, whether analogue or digital, will have been altered in some way during its lifetime, through changes to the signal, changes to the file structure or poor storage, an important question arises from an archival point of view. How should we treat the quality of the material customers send us to digitise? If the signal of a video tape is fuzzy, should we try to stabilise the image? If there is hiss or other noise on a tape, should we reduce it? Should we apply the same conservation values to audio and film as we do to historic buildings, such as ruins, or great works of art? Should we practise minimal intervention and use appropriate materials and methods that aim to be reversible, while ensuring that all work undertaken is fully documented, creating a trail of endless metadata as we go along?
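The last of these questions, about reversible methods and full documentation, can be made concrete. One approach, sketched below purely as a hypothetical illustration (not a description of any particular studio’s workflow), is to keep the unaltered transfer as a master, apply treatments such as noise reduction only to copies, and log each step with its settings and before/after checksums. The field names, item identifier and tool name are invented for the example; a real archive would more likely adopt an established preservation metadata standard such as PREMIS.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


def _now_utc():
    return datetime.now(timezone.utc).isoformat()


@dataclass
class Intervention:
    """One documented treatment applied to a derivative copy (hypothetical schema)."""
    step: str          # what was done, e.g. "noise reduction"
    tool: str          # equipment or software used
    parameters: dict   # settings, so the step can be judged or repeated later
    reversible: bool   # True if the unaltered master is kept, so the step can be undone
    input_sha256: str  # checksum of the material before the step
    output_sha256: str # checksum of the material after the step
    timestamp: str = field(default_factory=_now_utc)


@dataclass
class ProcessingLog:
    item_id: str
    interventions: list = field(default_factory=list)

    def record(self, step, tool, parameters, reversible, before: bytes, after: bytes):
        entry = Intervention(
            step=step,
            tool=tool,
            parameters=parameters,
            reversible=reversible,
            input_sha256=hashlib.sha256(before).hexdigest(),
            output_sha256=hashlib.sha256(after).hexdigest(),
        )
        self.interventions.append(entry)
        return entry

    def to_json(self):
        # asdict() also converts the nested Intervention records
        return json.dumps(asdict(self), indent=2)


# Hypothetical usage: the byte strings stand in for real transfer files
log = ProcessingLog(item_id="tape-0001")
raw_transfer = b"raw transfer with hiss"
cleaned = b"transfer after noise reduction"
log.record("noise reduction", "example-denoiser", {"reduction_db": 6}, True, raw_transfer, cleaned)
print(log.to_json())
```

Recording checksums of both input and output ties each entry to specific files, so the documentation trail can later be checked against what is actually held.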

Do we need to preserve the ways magnetic tape, optical media and digital files degrade and deteriorate over time, or are the rules different for media objects that store information which is not necessarily exclusive to them (the same recording can be issued on a vinyl record, a cassette tape, a CD, an 8-track cartridge or an MP3 file, for example)? Or should we ensure that we can hear and see clearly, and risk altering the original recording so that we can watch a digitised VHS on a flat-screen HD television, in line with our current expectations of media quality?

Time base correctors

Richard Wright suggests that it is the data, rather than the original carrier, that matters most in the digital preservation of audio and audiovisual media.

‘These patterns (for film) and signals (for video and audio) are more like data than like artefacts. The preservation requirement is not to keep the original recording media, but to keep the data, the information, recovered from that media’ (3).

Yet it is not always easy to know which parts of the data should be discarded and which should be kept. Audio and audiovisual data are a product of both form and content, and it is worth taking care over the practices we use to preserve our collections, in case we overlook the significance of this point and lose something valuable – culturally, historically and technologically.

Posted by debra in audio tape, digitisation expertise, video tape, 0 comments