| Hadoop 1.x | Hadoop 2.x | Hadoop 3.x |
| --- | --- | --- |
| Limited to 4,000 nodes per cluster | Supports up to 10,000 nodes per cluster | Supports more than 10,000 nodes per cluster |
| Does not support Microsoft Windows | Added support for Microsoft Windows | Supports Microsoft Windows |
| Works on the concept of slots | Works on the concept of containers | Works on the concept of containers |
| No standby NameNode | Supports only one standby NameNode | Supports two or more standby NameNodes |
| Has a single point of failure, the NameNode | Has a standby NameNode to overcome the single point of failure, so the cluster recovers automatically whenever the active NameNode fails | Has features to overcome the single point of failure, so the cluster recovers automatically whenever the active NameNode fails |
| No backward-compatibility concerns; MapReduce jobs run natively | The MR API is compatible, so Hadoop 1.x programs execute on Hadoop 2.x without any additional files | The MR API remains compatible, so Hadoop 1.x programs also execute on Hadoop 3.x |
| Data processing was a problem, as MapReduce handled both processing and resource management | YARN (Yet Another Resource Negotiator) provides a central resource manager that lets multiple applications share a common pool of resources, but it has some scalability issues | Makes more optimal use of resources with a newer version of YARN, which improves the scalability and reliability of the timeline service |
| Multi-tenancy was not supported | Multi-tenancy was introduced | Multi-tenancy is also supported |
| The replication factor was 3 | Uses a 3x replication scheme | The default replication factor is still 3, but erasure coding offers a lower-overhead alternative |
| Fault tolerance was achieved via replication | Fault tolerance was likewise achieved via replication | Fault tolerance is improved via erasure coding |
| The default HDFS block size is 64 MB | The default HDFS block size is 128 MB | The default HDFS block size is 128 MB |
| Released in 2008 | Released in 2011–12 | Released in 2016–17 |
| – | HDFS replication incurs 200% storage overhead | HDFS with erasure coding incurs only 50% storage overhead |
| MapReduce performed all the tasks | The HDFS balancer is used for data balancing | An intra-DataNode balancer, invoked via the HDFS disk balancer CLI, is used for data balancing |
| Manual intervention is needed | Manual intervention is needed | Manual intervention is not needed |
| The user needs to configure HADOOP_HEAPSIZE | The user needs to configure HADOOP_HEAPSIZE | Provides auto-tuning of heap sizes |
| Licensed under the Apache 2.0 License | Licensed under the Apache 2.0 License | Licensed under the Apache 2.0 License |
| Supports only one namespace per cluster for managing the HDFS filesystem | Supports multiple namespaces per cluster for managing the HDFS filesystem | Supports multiple namespaces per cluster for managing the HDFS filesystem |
| Supports only one programming model, MapReduce | Supports multiple programming models with the YARN component, such as MapReduce, Hive, Pig, Giraph, HBase, and other Hadoop tools | Supports multiple programming models with the YARN component, such as MapReduce, Hive, Pig, Tez, Hama, Giraph, HBase, Spark, Storm, and other Hadoop tools |
| – | Some default ports lie within the Linux ephemeral port range | No default ports lie within the ephemeral port range |
| Supported file systems are HDFS (the default) and the FTP file system | Supported file systems are HDFS (the default), the FTP file system, the Amazon S3 file system, and the Windows Azure storage blobs file system | Supported file systems are HDFS (the default), the FTP file system, the Amazon S3 file system, the Windows Azure storage blobs file system, and the Microsoft Azure Data Lake file system |
| DataNode resources are dedicated to MapReduce | DataNode resources are not dedicated to MapReduce and can be used by other applications | DataNode resources are not dedicated to MapReduce and can be used by other applications |
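The replication vs. erasure-coding overhead figures above follow from simple arithmetic: 3x replication keeps two redundant copies of every data block, while a Reed-Solomon RS(6,3) policy (one of the erasure-coding policies shipped with Hadoop 3.x) stores three parity blocks for every six data blocks. A quick sketch of the calculation:

```python
def storage_overhead(data_units: int, redundant_units: int) -> float:
    """Extra storage consumed, as a fraction of the raw data size."""
    return redundant_units / data_units

# 3x replication: each block is stored 3 times, i.e. 2 extra copies per block.
replication_overhead = storage_overhead(1, 2)    # 2.0, i.e. 200% overhead

# Reed-Solomon RS(6,3): 6 data blocks protected by 3 parity blocks.
ec_overhead = storage_overhead(6, 3)             # 0.5, i.e. 50% overhead
```

Both schemes tolerate the loss of any 2 (replication) or any 3 (RS(6,3)) units, which is why erasure coding delivers comparable fault tolerance at a quarter of the redundant storage.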
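The block-size row determines how many blocks a given file occupies in HDFS, which in turn drives NameNode metadata load and MapReduce split counts. A small illustration, using a hypothetical 1 GB file:

```python
import math

def num_blocks(file_size_bytes: int, block_size_bytes: int) -> int:
    """Number of HDFS blocks needed to store a file (the last block may be partial)."""
    return math.ceil(file_size_bytes / block_size_bytes)

MB = 1024 ** 2
ONE_GB = 1024 ** 3

blocks_64mb = num_blocks(ONE_GB, 64 * MB)     # 16 blocks with the Hadoop 1.x default
blocks_128mb = num_blocks(ONE_GB, 128 * MB)   # 8 blocks with the Hadoop 2.x/3.x default
```

Doubling the default block size from 64 MB to 128 MB halves the number of block records the NameNode must track for large files.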