
Going CD-R-less – digital file-based delivery

Often customers ask us to deliver their transferred sound files on CD, in effect an audio CD-R of the transfer.

Although these transfers can still be high resolution, there remains a world of difference, in an archival sense, between a CD-R burnt on a computer drive (however high the quality of drive and disc) and CD recordings made in the context of the professional music industry.

The CD format is far from ‘obsolete’, and recent history has shown us repeatedly that formats deemed ‘dead’, such as vinyl or the audio cassette, can become fashionable again.

Yet when it comes to the preservation of your audio and video archives, it is a good idea to think about this material differently. It is one thing to listen to your favourite artist on CD, in other words, but that precious family recording of your Grandfather discussing his life history on a CD-R is different.

Because of this, we believe that supplying customers with digital files, on hard drive or USB stick, is, in 2016 and beyond, a much better option. Holding a recording in physical form in the palm of your hand can be reassuring. Yet if you’ve already transferred your valuable recordings once…

Why risk having to do it again?

CD-Rs are, quite simply, not a reliable archival medium. Even optical media that claim spectacular longevity, such as the supposedly 1000-year-proof M-Disc, are unlikely to survive the warp and weft of technological progress.

Exposure to sunlight can render CD-Rs and DVDs unreadable. If the surface of a CD-R becomes scratched, its readability is severely compromised.

There is also the issue of compatibility between burners and readers, as pointed out in the ARSC Guide to Audio Preservation:

There are standards for CD-R discs to facilitate the interchange of discs between burners and readers. However, there are no standards covering the burners or readers themselves, and the disc standards do not take preservation or longevity into consideration. Several different burning and reading speeds were developed, and earlier discs or burners are not compatible with later, faster speeds. As a result, there is considerable variability in whether any given disc can be read by any given reader (30).

Furthermore, disc drives on computers are becoming less common. It would therefore be unwise to store valuable recordings exclusively on this medium if you want them to have the best chance of long-term survival.

In short, the CD-R is just another obsolescent format (and an unreliable one at that). Of course, once you have the digital files there is nothing stopping you from making access copies on CD-R for friends and family. Having the digital files as the source format gives you greater flexibility to share, store and duplicate your archival material.

File-based preservation

The threat of obsolescence haunts all digital media, to a degree. There is no one easy, catchall solution to preserve the media we produce now which is, almost exclusively, digital.

Yet given the reality of the situation, and the desire people harbour to return to recordings that are important to them, it makes sense that non-experts gain a basic understanding of what digital preservation may entail for them.

There is a growing number of online resources for people who want to get familiar with the rudiments of personal digital archiving. It would be very difficult to cover all the issues below, so comments are limited to a few observations.

It is true that managing a digital collection requires a different kind of attitude – and skill set – to analogue archiving, which is far less labour intensive. You cannot simply transfer your digital files onto a hard drive, put it on the shelf and forget about it for ten to fifteen years. If you were to do this, there is a very real possibility the files could not be opened when you return to them.


Screenshot taken from the DPC guide to Personal Digital Archiving

As Gabriela Redwine explains in the Digital Preservation Coalition’s Technology Watch Report on Personal Digital Archiving, ‘the reality of ageing hardware and software requires us to be actively attuned to the age and condition of the digital items in our care.’ The emerging personal digital archivist therefore needs to learn to engage actively with their collections if their digital files are to survive in the long term.

Getting to grips with digital preservation, even at a basic level, will undoubtedly involve learning a variety of new skills, terms and techniques. Yet there are some simple, and fairly non-technical, things you can do to get started.

The first point to emphasise is the importance of saving files in more than one location. This is probably the most basic principle of digital preservation.

The good news about digital files is that they can be moved, copied and shared with family and friends all over the world with relative ease. So if there is a fire in one location, or a computer fails in another, the files will still be safe wherever else they are stored.
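For those comfortable with a command line, a tool like rsync makes this routine easy to repeat. Below is a minimal sketch, assuming an archive folder in your home directory and an external drive mounted at /media/backup (both paths are illustrative):

```bash
# Mirror the archive to a second location; rsync only copies what has
# changed since the last run, so repeat visits are quick.
rsync -av ~/archive/ /media/backup/archive/
```

Run the same command against a second drive, or a networked machine, and you have copies in more than one place.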

Employing consistent and clear file naming is also very important, as this enables files to be searched for and found easily.
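There is no single correct scheme; consistency is what counts. A hypothetical example, with invented filenames, might put the date first so recordings sort chronologically:

```bash
# Rename vaguely titled files into a date-first, descriptive scheme.
mv "granddads tape side 1.wav" "1987-06-21_grandfather-interview_side1.wav"
mv "granddads tape side 2.wav" "1987-06-21_grandfather-interview_side2.wav"
```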

Beyond this, things get a little more complicated and a whole lot more computer-based. We move into the more specialist area of digital preservation, with its heady language of metadata, checksums and emulation, among other terms.

The need for knowledge and competencies

At present it can feel like there is a chasm between the world of private digital archiving, where people rely on third party solutions such as Google or Amazon to store and manage their files, and the professional field of digital preservation, which is populated by tech-specialists and archival whizz-kids.

The reality is that as we move deeper into the digital, file-based future, ordinary people will need to adopt existing preservation tools if they are to learn how to manage their digital collections in a more direct and informed way.

Take, for example, the often-cited recommendation that people migrate or back up their collections onto different media at annual or bi-annual intervals. While this advice may be sound, should people be doing this without first checking the integrity of the files in their collections? What’s the point in migrating a collection of files, in other words, if half of those files are already corrupted?
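One simple, non-specialist way to do this is with a checksum manifest. The sketch below assumes a collection of WAV files under ~/archive (the path and file type are illustrative): it records a SHA-256 checksum for every file, which can then be verified before any migration.

```bash
# Build a manifest of checksums (run once, keep it with the collection).
find ~/archive -type f -name '*.wav' -exec sha256sum {} + > manifest.sha256
# Before migrating, verify it: any file that has changed reports FAILED.
sha256sum -c manifest.sha256
```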

In such instances as these, the everyday person may wish to familiarise themselves with existing software tools that can be used to assess and identify potential problems with their personal collections.

DROID (Digital Record Object IDentification), for example, a software tool developed by the UK National Archives, profiles files in your collection in order to facilitate ‘digital continuity’, ‘the ability to use digital information in the way that you need, for as long as you need.’

The open source software can identify over 200 of the most common document, image, audio and video file formats. It can tell you what versions you have, their age and size, and when they were last changed. It can also help you find duplicates and manage your file space more efficiently. DROID can be used to scan individual files or directories, and it produces this information in a summary report. If you have never assessed your files before it may prove particularly useful, as it can give a detailed overview of your collection.
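DROID also has a command-line mode for those comfortable with a terminal. The sketch below is based on the DROID 6 user guide, but flags can vary between versions, and the paths and profile names here are invented for illustration, so do check the documentation for your release:

```bash
# Profile a directory recursively, then export the results as a CSV report.
./droid.sh -R -a ~/archive -p myarchive.droid
./droid.sh -p myarchive.droid -e myarchive-report.csv
```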

A big drawback of DROID is that it requires some technical knowledge to install and run, so it is not immediately accessible to those without such skills. Fixity is a more user-friendly open source tool that enables people to monitor their files, tracking changes and corruption. Tools like Fixity and DROID do not preserve digital files on their own; they help people to identify and manage problems within their collections. A list of other digital preservation software tools can be found here.

For customers of Greatbear, who are more than likely to be interested in preserving audiovisual archives, AV Preserve have collated a fantastic list of tools that can help people both manage and practice audiovisual preservation. For those interested in the different scales of digital preservation that can be employed, the NDSA (National Digital Stewardship Alliance) Levels of Preservation offers a good overview of how a large national institution envisions best practice.

Tipping Points

We are, perhaps, at a tipping point in how we play back and manage our digital data. The 21st century has been characterised by the proliferation of digital artefacts and memories. The archive, as a fundamental shaper of individual and community identities, has taken centre stage in our lives.

In this unparalleled situation, new competencies and confidence certainly need to be gained if the personal archiving of digital files is to become an everyday reality at a far more granular and empowered level than is currently the norm.

Maybe, one day, checking the file integrity of one’s digital collection will be seen as comparable to other annual or bi-annual activities, such as going to the dentist or taking the car for its MOT.

We are not quite there yet, that much is certain. This is largely because companies such as Google make it easy for us to store and efficiently organise personal information in ways that feel secure and manageable. These services stand in stark contrast to the relative complexity of digital preservation software, and the computational knowledge required to install and maintain it (not to mention the amount of time it could take to manage one’s digital records, if you really dedicated yourself to it).

Growing public knowledge about digital archiving, the desire for new knowledge and competencies, and the pragmatic fact that digital archives are easier to manage in file-based systems may encourage the gap between professional digital preservation practice and the interests of everyday digital citizens to close gradually over time. Dialogue and greater understanding are most certainly needed if we are to move forward from the current context.

Greatbear want to be part of this process by helping customers have confidence in file-based delivery, rather than relying on formats that are obsolescent, of poorer quality and ill-suited to the long-term preservation of audiovisual archives.

We are, as ever, happy to explain the issues in more detail, so please do contact us if there are issues you want to discuss.

We also provide a secure CD to digital file transfer service, covering digital audio (CD-DA), data (CD-ROM), write-once (CD-R) and rewritable (CD-RW) discs.

Posted by debra in audio tape, digitisation expertise

Significant properties – technical challenges for digital preservation

A consistent focus of our blog is the technical and theoretical issues that emerge in the world of digital preservation. For example, we have explored the challenges archivists face when appraising collections in order to select which materials are kept and which are thrown away. Such complex questions take on specific dimensions in the digital realm.

If you work in digital preservation then the term ‘significant properties’ will no doubt be familiar to you. The concept has been viewed as a hindrance, shrouded as it is in foggy terminology, and even as a distinct impossibility, given the diversity of digital objects in the world which, like their analogue counterparts, cannot be universally generalised or reduced to a series of measurable characteristics.

Cleaning an open reel-to-reel tape

In a technical sense, establishing a set of core characteristics for file formats has been important for initiatives like Archivematica, ‘a free and open-source digital preservation system that is designed to maintain standards-based, long-term access to collections of digital objects.’ Archivematica implements ‘default format policies based on an analysis of the significant characteristics of file formats.’ The system manages digital information using an ‘agile software development methodology’ which ‘is focused on rapid, iterative release cycles, each of which improves upon the system’s architecture, requirements, tools, documentation, and development resources.’

Such a philosophy may elicit groans of frustration from information managers who would rather leave their digital collections alone and practice a culture of non-intervention. Yet this adaptive style of project management, designed to respond rapidly to change, is often contrasted with predictive development, which focuses on risk assessment and the planning of long-term projects. The argument against predictive methodologies is that, as a management model, they can be unwieldy and unresponsive to change. This can have damaging financial consequences, particularly when investing in expensive, risky and large-scale digital preservation projects, as the BBC’s failed DMI initiative demonstrates.

Indeed, agile software development methodology may well be an important key to the sustainability of digital preservation systems, which need to find practical ways of navigating technological innovation and the culture of perpetual upgrade. Agility in this context is synonymous with resilience, and the practical application of significant properties as a means to align file format interoperability offers a welcome anchor in a technological environment structured by persistent change.

Significant properties vs the authentic digital object

What significant properties imply, as archival concept and practice, is that desiring authenticity for the digitised and born-digital objects we create is likely to end in frustration. Simply put, preserving all the information that makes up a digital object is a hugely complex affair, and is a procedure that will require numerous and context-specific technical infrastructures.

As Trevor Owens explains: ‘you can’t just “preserve it” because the essence of what matters about “it” is something that is contextually dependent on the way of being and seeing in the world that you have decided to privilege.’ Owens uses the example of the Geocities web archiving project to demonstrate that if you don’t have the correct, let’s say ‘authentic’, tools to interpret a digital object (in this case, a website that is only discernible on certain browsers), you simply cannot see the information accurately. Part of the signal is always missing, even if something ‘significant’ remains (the text or parts of the graphics).

It may be desirable ‘to preserve all aspects of the platform in order to get at the historicity of the media practice’, suggests Jonathan Sterne, author of MP3: The Meaning of a Format, but in a world that constantly displaces old technological knowledge with new, settling for the preservation of significant properties may be a pragmatic rather than ideal solution.

Analogue to digital issues

To bring these issues back to the tape we work with at Great Bear: there are of course times when it is important to use the appropriate hardware to play tapes back, and a certain amount of historically specific technical knowledge is required to make the machines work in the first place. We often wonder what will happen to the specialised knowledge of the media engineers who, in the 70s, 80s and 90s, operated tape machines that are now obsolete. There is a risk that when those people die, the knowledge will die with them. It is of course possible to get hold of operating manuals, but this is by no means a guarantee that the mechanical techniques will be understood within a historical context that is increasingly tape-less and software-based. By keeping our wide selection of audio and video tape machines purring, we are sustaining a machinic-industrial folk knowledge which ultimately helps to keep our customers’ magnetic tape-based media memories alive.

Of course a certain degree of historical accuracy is required in the transfers because, very obviously, you can’t play a V2000 tape on a VHS machine, no matter how hard you try!

Yet the need to play back tapes on exactly the same machine becomes less important in cases where the original tape was recorded on a domestic reel-to-reel recorder, such as the Grundig TK series, which may not have been of the greatest quality in the first place. To get the best digital transfer it is desirable to play tapes back on a machine with higher specifications, one that can read the magnetic information on the tape as fully as possible. Playing a tape back on a lower-quality machine adds further errors in the transfer process, and these would then become part of the digitised signal.

It is actually very difficult to remove things like wow and flutter after a tape has been digitised, so it is far better to ensure machines are calibrated appropriately before the tape is migrated, even if the tape was not originally recorded on a machine with professional specifications. What is ultimately at stake in transferring analogue tape to digital formats is the quality of the signal. Absolute authenticity is incidental here, particularly if things sound bad.

The moral of this story, if there can be one, is that with any act of transmission, the recorded signal is liable to change. These can be slight alterations or huge drop-outs and everything in-between. The agile software developers know that given the technological conditions in which current knowledge is produced and preserved, transformation is inevitable and must be responded to. Perhaps it is realistic to assume this is the norm in society today, and creating digital preservation systems that are adaptive is key to the survival of information, as well as accepting that preserving the ‘full picture’ cannot always be guaranteed.

Posted by debra in audio / video heritage, audio tape, video tape

Open Source Solutions for Digital Preservation

In a technological world that is rapidly changing, how can digital information remain accessible?

One answer to this question lies in the use of open source technologies. As a digital preservation strategy it makes little sense to use codecs owned by Apple or Microsoft to save data in the long term. Proprietary software essentially operates as a closed system and risks compromising access to data in years to come.

Linux Operating System

It is vital, therefore, that the digitisation work we do at Great Bear is done within the wider context of digital preservation. This means making informed decisions about the hardware and software we use to migrate your tape-based media into digital formats. We use a mixture of proprietary and open source software, simply because it makes our life a bit easier. Customers also ask us to deliver their files in proprietary formats. Apple ProRes, for example, is a really popular codec that doesn’t take up a lot of storage space, so our customers often request it, and of course we are happy to provide it.
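Even a proprietary delivery format like ProRes can be produced with open source tools. A minimal sketch using ffmpeg’s prores_ks encoder follows; the filenames are illustrative, and profile 3 corresponds to ProRes 422 HQ:

```bash
# Encode a capture to Apple ProRes 422 HQ with uncompressed 24-bit audio.
ffmpeg -i capture.avi -c:v prores_ks -profile:v 3 -c:a pcm_s24le delivery.mov
```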

Using open systems definitely has benefits. The flexibility of Linux, for example, enables us to customise our digitisation system according to what we need to do. As with the rest of our work, we are keen to find ways to keep using old technologies if they work well, rather than simply throwing things away when shiny new devices come on the market. There is a misconception that to ingest vast amounts of audio data you need the latest hardware. All you need, in fact, is a big hard drive; flexible yet reliable software; and an operating system that doesn’t crash, so it can be left to ingest for 8 hours or more. Simple! An example of the open source software we use is the sound processing program SoX. It saves us a lot of time because we can write scripts that batch process audio data according to project specifications.
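A minimal sketch of the kind of batch script we mean is below. It assumes a folder of 24-bit WAV masters that need 16-bit/44.1 kHz access copies; the directory names and settings are invented for illustration, not our actual project specifications:

```bash
# Make 16-bit/44.1 kHz access copies of every master, with dithering.
mkdir -p access
for f in masters/*.wav; do
  sox "$f" -b 16 "access/$(basename "$f")" rate -v 44100 dither
done
```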

Openness in the digital preservation world

Within the wider digital preservation world, open source technologies are also widely used. With digital preservation tools developed by projects such as SCAPE and the Open Planets Foundation, there are plenty of software resources available for individuals and organisations who need to manage their digital assets. It would be naïve, however, to assume that the practice of openness here, and in other realms of the information economy, is born from the same techno-utopian impulse that propelled the open software movement from the 1970s onwards. The SCAPE website makes it clear that the development of open source preservation tools is ‘the best approach given the substantial public investment made at the European and national levels, and because it is the most effective way to encourage commercial growth.’

What would make projects like SCAPE and Open Planets even better is if they thought about ways to engage non-specialist users who may be curious about digital preservation tools but have little experience of navigating complex software. The tools may well be open, but the knowledge of how to use them is not.

Openness, as a means of widening access to technical skills and knowledge, is the impulse behind the AV Artifact Atlas (AVAA), an initiative developed in conjunction with the community media archive project Bay Area Video Coalition. In a recent interview on the Library of Congress’ Digital Preservation Blog, Hannah Frost, Digital Library Services Manager at Stanford Libraries and Manager, Stanford Media Preservation Lab explains the idea behind the AVAA.

‘The problem is most archivists, curators and conservators involved in media reformatting are ill-equipped to detect artifacts, or further still to understand their cause and ensure a high quality job. They typically don’t have deep training or practical experience working with legacy media. After all, why should we? This knowledge is by and large the expertise of video and audio engineers and is increasingly rare as the analogue generation ages, retires and passes on. Over the years, engineers sometimes have used different words or imprecise language to describe the same thing, making the technical terminology even more intimidating or inaccessible to the uninitiated. We need a way capture and codify this information into something broadly useful. Preserving archival audiovisual media is a major challenge facing libraries, archives and museums today and it will challenge us for some time. We need all the legs up we can get.’

The promise of openness can be a fraught terrain. In some respects we live in a hyper-networked reality, where ideas, information and tools are shared openly at lightning pace, and there is the expectation that we can have whatever we want, when we want it, which is usually now. On the other side of openness are questions of ownership and regulation – who controls information, and to what ends?

Perhaps the emphasis placed on the value of information within this context will ultimately benefit digital archives, because there will be significant investment, as there already has been, in the development of open resources that will help to take care of digital information in the long term.

Posted by debra in audio tape, digitisation expertise, video tape

Convert, join and re-encode AVCHD .MTS files in Ubuntu Linux


One of our audio and video archive customers has a large collection of AVCHD video files that are stored in 1.9GB ‘chunks’ as xxxxx.MTS files. All these files are 60 minutes or longer in duration, and they must be joined, deinterlaced, re-encoded to a suitable size and bitrate, then uploaded for online access.

This is quite a task in computer time and file handling. These small domestic cameras produce good HD movies at low cost, but the compression needed to achieve this is very high and does not produce a file that is easily edited. The .MTS files are MPEG transport stream containers holding H.264-encoded video.

There are some proprietary solutions for Mac OS X and Windows that will repackage the .MTS files into .MOV QuickTime containers, which can then be accessed on Mac OS X or re-encoded to a less compressed format for editing in Final Cut Pro or Premiere. We didn’t need this though, just a reliable and quick open source workflow.

  1. The first and most important step is to rejoin the camera’s split files.
    These cameras use FAT32 file systems, which cannot handle large individual files, so the camera records the .MTS video in chunks. As each chunk in a continuous sequence references the others, they must be joined in the correct order. This is easily achieved with the cat command (see the sketch after this list).
  2. The rejoined .MTS files can then be re-encoded to a more manageable size using open source software such as Handbrake. We also needed to deinterlace our footage, as it was shot interlaced but would be accessed on progressive displays. Deinterlacing increases the encoding time, but without it any movement looks odd, with visible artifacts.
  3. Finding the ‘sweet spot’ for encoding can be time consuming, but in this case it was important: projected text needed to remain legible while file sizes were kept manageable for reasonable upload times!
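For illustration, here is a sketch of the whole workflow with hypothetical filenames (your chunk names and quality settings will differ):

```bash
# Join the camera's split chunks in the correct order...
cat 00000.MTS 00001.MTS 00002.MTS > joined.MTS
# ...then deinterlace and re-encode at constant quality 20 with x264.
HandBrakeCLI -i joined.MTS -o joined.mp4 -e x264 -q 20 --deinterlace
```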


Posted by greatbear in digitisation expertise, video tape