Objective
![Comparison Between Hadoop 2.x vs Hadoop 3.x]()
In this tutorial we compare Hadoop 2.x with Hadoop 3.x: which features are new in Hadoop 3, whether Hadoop 2 programs are compatible with Hadoop 3, and how the two versions differ. The comparison is presented feature by feature.

Comparison Between Hadoop 2 vs Hadoop 3

| Features | Hadoop 2.x | Hadoop 3.x |
| --- | --- | --- |
| License | Apache 2.0, open source | Apache 2.0, open source |
| Minimum supported version of Java | Java 7 | Java 8 |
| Fault tolerance | Handled by replication, which wastes storage space. | Handled by erasure coding (see the erasure coding tutorial for more information). |
| Data balancing | Uses the HDFS balancer. | Uses the intra-DataNode balancer, which is invoked via the `hdfs diskbalancer` CLI. |
| Storage scheme | Uses a 3x replication scheme. | Supports erasure coding in HDFS. |
| Storage overhead | HDFS has 200% overhead in storage space. | Storage overhead is only 50%. |
| Storage overhead example | 6 blocks occupy the space of 18 blocks because of the replication scheme. | 6 blocks occupy the space of 9 blocks: 6 data blocks and 3 parity blocks. |
| YARN timeline service | Uses an old timeline service that has scalability issues. | Timeline Service v2 improves the scalability and reliability of the timeline service. |
| Default port range | In Hadoop 2.0 some default ports fall in the Linux ephemeral port range, so they can fail to bind at startup. | In Hadoop 3.0 these ports have been moved out of the ephemeral range. |
| Tools | Hive, Pig, Tez, Hama, Giraph and other Hadoop tools. | Hive, Pig, Tez, Hama, Giraph and other Hadoop tools are available. |
| Compatible file systems | HDFS (default FS); FTP file system, which stores all its data on remotely accessible FTP servers; Amazon S3 (Simple Storage Service) file system; Windows Azure Storage Blobs (WASB) file system. | Supports all the previous ones as well as the Microsoft Azure Data Lake filesystem. |
| DataNode resources | DataNode resources are not dedicated to MapReduce; they can be used by other applications. | DataNode resources can likewise be used by other applications. |
| MR API compatibility | The MR API is compatible, so Hadoop 1.x programs can execute on Hadoop 2.x. | The MR API is compatible, so Hadoop 1.x programs can execute on Hadoop 3.x. |
| Support for Microsoft Windows | Can be deployed on Windows. | Also supports Microsoft Windows. |
| Slots / containers | Hadoop 1 works on the concept of slots, but Hadoop 2.x works on the concept of containers, in which generic tasks can run. | Also works on the concept of containers. |
| Single point of failure | Has features to overcome the SPOF, so whenever the NameNode fails it recovers automatically. | Has features to overcome the SPOF, so whenever the NameNode fails it recovers automatically, with no manual intervention needed. |
| HDFS Federation | Hadoop 1.0 has only a single NameNode to manage all namespaces; Hadoop 2.0 has multiple NameNodes for multiple namespaces. | Hadoop 3.x also has multiple NameNodes for multiple namespaces. |
| Scalability | Scales up to 10,000 nodes per cluster. | Better scalability: scales to more than 10,000 nodes per cluster. |
| Faster access to data | DataNode caching gives fast access to data. | DataNode caching likewise gives fast access to data. |
| HDFS snapshot | Hadoop 2 adds support for snapshots, which provide disaster recovery and protection against user error. | Hadoop 3 also supports the snapshot feature. |
| Platform | Can serve as a platform for a wide variety of data analytics; it is possible to run event processing, streaming and real-time operations. | It is likewise possible to run event processing, streaming and real-time operations on top of YARN. |
| Cluster resource management | Uses YARN, which improves scalability, high availability and multi-tenancy. | Uses YARN, with all of these features. |
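The storage overhead figures in the comparison (200% versus 50%, and 18 versus 9 blocks for 6 blocks of data) can be checked with a little arithmetic. The sketch below is only an illustration: the names `StorageFootprint`, `replication_footprint` and `erasure_coding_footprint` are hypothetical, and the erasure coding layout is assumed to be 6 data blocks plus 3 parity blocks per group, as in the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageFootprint:
    """Blocks needed to store some data (hypothetical helper type)."""
    data_blocks: int
    extra_blocks: int  # replica copies or parity blocks

    @property
    def total_blocks(self) -> int:
        return self.data_blocks + self.extra_blocks

    @property
    def overhead_percent(self) -> float:
        # Overhead is the extra storage relative to the original data.
        return 100.0 * self.extra_blocks / self.data_blocks


def replication_footprint(data_blocks: int, replication_factor: int = 3) -> StorageFootprint:
    # Every block is stored replication_factor times;
    # all copies beyond the first one count as overhead.
    return StorageFootprint(data_blocks, data_blocks * (replication_factor - 1))


def erasure_coding_footprint(data_blocks: int, data_units: int = 6,
                             parity_units: int = 3) -> StorageFootprint:
    # Assumption: blocks are grouped data_units at a time, and every group
    # (even a partially filled one) gets parity_units parity blocks.
    groups = -(-data_blocks // data_units)  # ceiling division
    return StorageFootprint(data_blocks, groups * parity_units)


if __name__ == "__main__":
    for name, fp in [("3x replication", replication_footprint(6)),
                     ("erasure coding (6 data + 3 parity)", erasure_coding_footprint(6))]:
        print(f"{name}: {fp.total_blocks} blocks, {fp.overhead_percent:.0f}% overhead")
```

Note that under the grouping assumption the 50% figure only holds when groups are full; a file with fewer than 6 blocks would still pay for 3 parity blocks in this simplified model.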
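The default port row can also be made concrete. The following sketch checks whether a port falls in the Linux ephemeral range, reading the range from `/proc/sys/net/ipv4/ip_local_port_range` when that file exists and otherwise falling back to 32768-60999, which is assumed here as a typical Linux default. The port numbers in `SAMPLE_PORTS` are not taken from the table above; they are given as commonly cited defaults (for example the NameNode web UI moving from 50070 to 9870) and should be verified against the `hdfs-default.xml` of the release in use.

```python
from pathlib import Path
from typing import Tuple

# Assumed typical Linux default; the live value is read from /proc when possible.
DEFAULT_EPHEMERAL_RANGE: Tuple[int, int] = (32768, 60999)
PROC_FILE = Path("/proc/sys/net/ipv4/ip_local_port_range")

# Illustrative (service, Hadoop 2 default, Hadoop 3 default) triples.
# These values are an assumption of this sketch; verify them for your release.
SAMPLE_PORTS = [
    ("NameNode web UI", 50070, 9870),
    ("DataNode data transfer", 50010, 9866),
    ("DataNode web UI", 50075, 9864),
]


def ephemeral_range() -> Tuple[int, int]:
    """Return the kernel's ephemeral port range, or the assumed default."""
    try:
        low, high = PROC_FILE.read_text().split()
        return int(low), int(high)
    except (OSError, ValueError):
        return DEFAULT_EPHEMERAL_RANGE


def in_ephemeral_range(port: int, port_range: Tuple[int, int]) -> bool:
    """True if the port may be handed out by the OS to outgoing connections."""
    low, high = port_range
    return low <= port <= high


if __name__ == "__main__":
    current = ephemeral_range()
    print(f"ephemeral range: {current[0]}-{current[1]}")
    for service, old_port, new_port in SAMPLE_PORTS:
        print(f"{service}: {old_port} at risk={in_ephemeral_range(old_port, current)}, "
              f"{new_port} at risk={in_ephemeral_range(new_port, current)}")
```

A port inside the ephemeral range may already be in use by an unrelated outgoing connection when the daemon starts, which is why the bind can fail.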