
It was named after a child’s toy: software developer Doug Cutting borrowed the word from his son for “a way to take a bunch of computers and make them appear as one computer to software”, according to a video he recorded for the big data management software’s recent 10th anniversary. But Hadoop’s first public sector users were far from cuddly.
In 2009, the US National Security Agency (NSA) acknowledged it was using Hadoop, and in 2011 it transferred control of Accumulo, its data storage and retrieval system built on top of Hadoop, to the Apache Software Foundation as an open source project, giving it the same status as Hadoop itself.
GCHQ also lists Hadoop as a technology used by staff working in a wide variety of areas, including data access, data science, networks, high-performance technology, the internet and data mining. The agency has joined the NSA in contributing to the Hadoop ecosystem by making Gaffer, a framework for building systems to store and process graphs on a Hadoop cluster running Accumulo, available through GitHub.
The use of Hadoop is now spreading across the UK public sector, with HM Revenue and Customs (HMRC) and the Home Office involved in significant projects. Both are working with commercial distributors of the software: Cloudera for HMRC, and Hortonworks for the Home Office.
Cloudera at HMRC
“HMRC has built an enterprise data hub, a powerful central repository for all of its data, which will help it to personalise services to customers and strengthen its compliance work,” says a spokesperson for the tax agency.
“HMRC will be able to store and analyse data using a mix of open source and closed source tools, and commodity hardware, representing better value for money for taxpayers. With HMRC’s data in Hadoop it will deliver new operational efficiency combined with the ability to analyse the data and gain insights that were previously extremely difficult to discover.”
HMRC spent £7.4m on Cloudera in January 2015, and a further £860,000 in February 2016. The Cloudera project represented what a document released in July 2015 called “further spend” on the enterprise data hub. The supplier says the project has been running for two years.
According to Cloudera, the National Crime Agency’s National Cyber Crime Unit is also “developing a Cloudera platform to gain new insights and greater intelligence from the varied types of data that it uses”.
Meanwhile, the Office for National Statistics reported it had spent £242,000 with Cloudera in May 2016, and has been advertising for a supplier with experience of Cloudera’s software stack to work on its address index referencing service, which will be used as part of the 2021 Census.
Untapped Hadoop potential
“Government, like a lot of old industry, has to go through that digital transformation, modernising its digital architecture, breaking down those silos,” says Stephen Line, Cloudera vice-president for northern Europe. “The UK is not necessarily behind or ahead particularly.”
But he adds there is untapped potential for smaller parts of the public sector to make use of Hadoop. Local authorities looking for fraud or NHS organisations working to improve their performance could benefit.
Line points to an unnamed US hospital group using Cloudera to reduce re-admission rates by predicting which patients have a high risk of re-admission and giving them additional medical care. Cloudera says its customer avoids 6,000 re-admissions annually and saves $76m in costs and Medicare penalties through use of the system. At present, the supplier does not have any NHS customers, although Line says there are “ongoing conversations”.
Hortonworks at the Home Office
Spending data from the Home Office records two payments to Cloudera’s rival Hortonworks: £53,000 in August 2014 and a further £61,000 in October of that year.
Hortonworks refused to contribute to this article, but in September 2015 the Home Office’s senior enterprise architect, Wayne Horkan, told a company event that its attraction was its “alignment to open source… it protects us from vendor change and lock-in, which we are not too keen on at the moment”. In comments published by Diginomica, he added: “The other piece is that you roll up everything together, get consistent build and delivery that’s really useful to us. There is also a maturity in the ecosystem.”
The Home Office is using Hadoop as part of its Technology Platforms for Tomorrow programme to link up numerous databases managed by the department, according to a presentation made at a Hadoop Users Group UK meeting earlier this year and reported by The Register. This could allow policing and security data on individuals to be joined up for use by border officials and police officers, and also for machine learning.
The Home Office’s work with Hadoop was criticised by Liberal Democrat leader Tim Farron, who said: “Trying to get away with a substantial change simply by labelling it as IT replatforming is simply unacceptable.” He told The Register: “Trying to bypass parliament is not an option and the home secretary must come clean about her real intentions,” referring to Theresa May, who has since become prime minister.
MapR in India and t