In the last three years, customer demands have grown exponentially. Like many companies, Yahoo, Inc. is adapting to better serve its customers and provide a better user experience. In his keynote today during Hadoop Summit 2016 in San Jose, CA, Mark Holderbaugh, senior director of Hadoop Engineering at Yahoo, discussed three major highlights of what Yahoo has done to meet these growing demands.
1. YARN
The first thing Yahoo did was look at YARN, a cluster management technology, as a tool to increase utilization. Because of the way YARN scheduled work, clusters were reaching only 40 percent utilization, so Yahoo needed to find a way to increase that figure. Based on feedback from the nodes, the team was able to make adjustments and work on raising that percentage to a more favorable number.
YARN turned out to be a worthy investment that returned better utilization.
2. Migration to Tez
The second action Yahoo took was to migrate to Apache Tez, which is aimed at building an application framework that allows for a complex directed-acyclic-graph of tasks for processing data.
“Tez is the key that gives us up to zero minimal changes to jobs,” said Holderbaugh. This allowed the company to run millions of jobs and raise utilization 50 percent, just by switching to Tez. However, it was not a dynamic knife switch on each cluster, Holderbaugh emphasized. Each job has different specifications and changes every day, so the migration has to be done individually, on a case-by-case basis. The switch reduced runtime hours and memory use. Yahoo had a 30 percent gain just from switching one pipeline.
It also started migrating Apache Hive jobs to Tez. “Hive gave us better utilization and improved latencies, allowing us to do more demand-type latent jobs,” said Holderbaugh. It shows which node these latencies occur on and makes it easier to remove them from jobs. This avoids the need for extra clusters, and ultimately saves money.
3. Apache Storm
The third area Yahoo focused on was Apache Storm utilization. According to Holderbaugh, Yahoo has embraced Storm (an open-source distributed real-time computation system) since its creation in 2012. Storm is now being used in every part of Yahoo. It’s being used for data analysis, as well as for monitoring clusters, and the utilization is even lower than in Hadoop clusters. Yahoo’s reasoning for doing this was a mission to keep the fine-grained stuff while improving utilization.
Holderbaugh also emphasized that these topics were the subject of many panels already held during the Hadoop Summit, with even more planned for today. The schedule is rife with opportunities to learn more about Yahoo’s efforts at optimization and much more. He encouraged the audience to go back and watch these talks or attend what they could for inspiration.
Hadoop in the cloud
After Holderbaugh finished his talk, Sanjay Radia, founder and architect at Hortonworks, Inc., took the stage. Radia’s talk focused on why you would want to put Hadoop in the cloud. First of all, it’s not actually a new idea, he said. Companies have been doing it for years. One major reason for that is the time and money you can save. There are no hardware costs involved, you don’t need an expert on staff and the cloud offers more elasticity. A cluster can even be created in minutes. In addition, the cloud takes away some of the complexity by offering pre-tuned clusters.
Radia also emphasized that having shared data “fundamentally means we need shared management.” He talked about how important it is to have shared metadata so that the data isn’t replicated and taking up needless space, which translates easily into a waste of money. You can accomplish this by having a shared database server, Radia said.
Collaboration and the cloud are certainly themes at this year’s Hadoop Summit, as is using emerging technology to help your company run more smoothly and cost-effectively than ever before.
Stay tuned for the full video interview, and be sure to check out more of SiliconANGLE and theCUBE’s coverage of the Hadoop Summit US.
Brittany Greaner
A graduate of UW Eau Claire, Brittany Greaner is a SiliconANGLE writer covering live news events with theCUBE. Telling stories has always been her passion, and she has written everything from short stories to haikus. An interest in international issues and human rights drove her to join AmeriCorps VISTA for a year of service, where she worked with a local nonprofit to support refugees and immigrants in Pennsylvania.
She has also lived in Japan for a year, where she learned how to shoot a bow like a pro and navigate trains like she wasn’t someone from small-town Wisconsin.