Big Data

Capitalising on the archival market: SONY’s 185 TB tape cartridge

In Trevor Owens’s excellent blog post ‘What Do you Mean by Archive? Genres of Usage for Digital Preservers’, he outlines the different ways ‘archive’ is used to describe data sets and information management practices in contemporary society. While the article shows it is important to distinguish between tape archives, archives as records management, personal papers and computational archives, Owens does not include an archival ‘genre’ that will become increasingly significant in the years to come: the archival market.

The announcement in late April 2014 that SONY has developed a tape cartridge capable of storing 185 TB of data was greeted with much excitement throughout the techie world. The invention, developed with IBM, is ‘able to achieve the high storage capacity by utilising a “nano-grained magnetic layer” consisting of tiny nano-particles’ and boasts the world’s highest areal recording density of 148 Gb/in².

The news generated such surprise because it signalled the curious durability of magnetic tape in a world thought to have ‘gone tapeless’. For companies that need to store large amounts of data, however, tape storage, usually in the form of Linear Tape Open cartridges, has remained an economically sound solution despite the availability of file-based alternatives. Imagine the amount of energy required to power up the zettabytes of data that exist in the world today. Whatever the benefits of random access, that would be a gargantuan electricity bill.

Indeed, tape cartridges are being used more and more to store large amounts of data. According to the Tape Storage Council industry group, tape capacity shipments grew by 13 percent in 2012 and were projected to grow by 26 percent in 2013. SONY’s announcement is therefore symptomatic of the growing archival market, which has created demand for cost-effective data storage solutions.

It is not just magnetic tape that is part of this expanding market. Sony, Panasonic and Fuji are developing optical ‘Archival discs’ capable of storing 300 GB (available in summer 2015), with plans to develop 500 GB and 1 TB discs.

Why is there such a demand for data storage?

Couldn’t we just throw it all away?

The Tape Storage Council explain:

‘This demand is being driven by unrelenting data growth (that shows no sign of slowing down), tape’s favourable economics, and the prevalent data storage mindset of “save everything, forever,” emanating from regulatory, compliance or governance requirements, and the desire for data to be repurposed and monetized in the future.’

The radical possibilities of data-based profit-making abound in the ‘buzz’ that surrounds big data, an ambitious form of data analytics that has been embraced by academic research councils, security forces and multi-national companies alike.

Presented by proponents as the way to gain insights into consumer behaviour, big data apparently enables companies to unlock the potential of ‘data-driven decision making.’ For example, an article in Computer Weekly describes how eBay is using big data analytics to better understand the ‘customer journey’ through its website.

eBay’s initial forays into analysing big data were in fact relatively small: in 2002 the company kept around 1% of customer data and discarded the rest. In 2007 the company changed its policy and worked with an established company to develop a custom data warehouse which can now run ad-hoc queries in just 32 seconds.

It is not just eBay that is storing massive amounts of customer data. According to the BBC, ‘Facebook has begun installation of 10,000 Blu-ray discs in a prototype storage cabinet as back-ups for users’ photos and videos’. While for many years the internet was assumed to be a virtual, almost disembodied space, the desire from companies to monetise information assets means that the incidental archives created through years of internet searches have all this time been stored, backed up and analysed.

Amid all the excitement and promotion of big data, the lack of critical voices raising concern about social control, surveillance and ethics is surprising. Are people happy that the data we create is stored, analysed and re-sold, often without our knowledge or permission? What about civil liberties and democracy? What power do we have to resist this subjugation to the irrepressible will of the data-driven market?

These questions are pressing, and need to be widely discussed throughout society. Current predictions are that the archive market will keep growing and growing.

‘A recent report from the market intelligence firm IDC estimates that in 2009 stored information totalled 0.8 zettabytes, the equivalent of 800 billion gigabytes. IDC predicts that by 2020, 35 zettabytes of information will be stored globally. Much of that will be customer information. As the store of data grows, the analytics available to draw inferences from it will only become more sophisticated.’

The development of SONY’s 185 TB tape indicates that the company is well placed to capitalise on these emerging markets.

The kinds of data stored on the tapes when they become available for professional markets (these tapes are not aimed at consumers) will really depend on the legal regulations placed on companies doing the data collecting. As the case of eBay discussed earlier makes clear, companies will collect all the information if they are allowed to. But should they be? As citizens in the internet society, how can we ensure we have a ‘right to be forgotten’? How are the shackles of data-driven control societies broken?

Posted by debra in audio tape, 0 comments

Software Across Borders? The European Archival Records and Knowledge Preservation (E-Ark) Project

The latest big news from the digital preservation world is that European Archival Records and Knowledge Preservation (E-Ark), a three-year, multinational research project, has received a £6M award from the European Commission ‘to create a revolutionary method of archiving data, addressing the problems caused by the lack of coherence and interoperability between the many different systems in use across Europe,’ as the Digital Preservation Coalition, who are partners in the project, report.

What is particularly interesting about the consortium E-Ark has brought together is that commercial partners will be part of a conversation that aims to establish long-term solutions for digital preservation across Europe. More often than not, commercial interests have driven technological innovations used within digital preservation. This has made digital data difficult to manage for institutions both large and small, as the BBC’s Digital Media Initiative demonstrates, because the tools and protocols are always in flux. A lack of policy-level standards and established best practices has meant that the norm within digital information management has very much been permanent change.

Such a situation poses great risks for both digitised and born-digital collections because information may have to be regularly migrated in order to remain accessible and ‘open’. As stated on the E-Ark website, ‘the practices developed within the project will reduce the risk of information loss due to unsuitable approaches to keeping and archiving of records. The project will be public facing, providing a fully operational archival service, and access to information for its users.’

The E-Ark project will hopefully contribute to the creation of compatible systems that can respond to the different needs of groups working with digital information. Which is, of course, just about everybody right now: as the world economy becomes increasingly defined by information and ‘big data’, efficient and interoperable access to commercial and non-commercial archives will be an essential part of a vibrant and well-functioning economic system. Establishing data systems that can communicate and co-operate across software borders, as well as geographical ones, will become an economic necessity in years to come.

The task facing E-Ark is huge, but it is a crucial one if digital data is to survive and thrive in this brave new datalogical world of ours. As E-Ark explain: ‘Harmonisation of currently fragmented archival approaches is required to provide the economies of scale necessary for general adoption of end-to-end solutions. There is a critical need for an overarching methodology addressing business and operational issues, and technical solutions for ingest, preservation and re-use.’

Maybe 2014 will be the year when digital preservation standards start to become a reality. As we have already discussed on this blog, the US-based National Agenda for Digital Stewardship 2014 outlined the negative impact of continuous technological change and the need to create dialogue among technology makers and standards agencies. It looks like things are changing and much-needed conversations are soon to take place. We will of course reflect on developments on the Great Bear blog.

 

Posted by debra in audio tape, video tape, 0 comments