How To Differentiate Between Big Data Hadoop Old API and New API

This article explains the difference between the old Hadoop MapReduce API (0.20) and the new API (1.x and 2.x). We compare the two APIs on the basis of class and location, control, configuration, and communication, and execution and output.


Class & Location

Package

Old API: lives in the org.apache.hadoop.mapred package. It is still present in every newer Hadoop version.

New API: lives in the org.apache.hadoop.mapreduce package.

Mapper & Reducer

Old API: Mapper and Reducer are defined as interfaces.

New API: Mapper and Reducer are defined as classes, so you can extend them and write custom Mappers and Reducers without breaking existing implementations.
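To make this concrete, here is a minimal word-count sketch against the new API, assuming the Hadoop libraries are on the classpath. We extend the Mapper and Reducer classes and override only the methods we need; the class names (TokenMapper, SumReducer) are our own illustrative choices.

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// New-API sketch: Mapper and Reducer are classes under
// org.apache.hadoop.mapreduce, so we subclass them.
public class WordCountSketch {

  public static class TokenMapper
      extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      // Emit (token, 1) for every whitespace-separated token in the line.
      for (String token : value.toString().split("\\s+")) {
        word.set(token);
        context.write(word, ONE);
      }
    }
  }

  public static class SumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      // Sum the counts for each token; values arrive as an Iterable.
      int sum = 0;
      for (IntWritable v : values) {
        sum += v.get();
      }
      context.write(key, new IntWritable(sum));
    }
  }
}
```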

Control, Configuration and Communication

Job Control

Old API: job control is done through the JobClient class (JobClient does not exist in the new API).

New API: job control is done through the Job class.

Job Configuration

Old API: a JobConf object is used for job configuration.

New API: job configuration is done through a Configuration object, with the help of helper methods on the Job class.
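A driver sketch shows how Configuration and Job together replace the old JobConf/JobClient pair, assuming Hadoop 2.x on the classpath (Job.getInstance; on older releases the new Job(conf) constructor is used instead). The base Mapper and Reducer classes are set directly here as identity stages, purely to keep the sketch self-contained; in a real job you would substitute your own subclasses.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// New-API driver: one Job object handles both configuration and control.
public class IdentityJobDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "identity job");
    job.setJarByClass(IdentityJobDriver.class);
    // The base classes act as identity mapper/reducer; replace with your own.
    job.setMapperClass(Mapper.class);
    job.setReducerClass(Reducer.class);
    job.setOutputKeyClass(LongWritable.class);
    job.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    // Job submits and monitors itself, as JobClient.runJob once did.
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```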

Mapper-Reducer Communication

Old API: the mapper passes values to the reducer as a java.util.Iterator.

New API: the mapper passes values to the reducer as a java.lang.Iterable.
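The practical difference can be shown in plain Java, with no Hadoop dependency: an Iterable plugs directly into a for-each loop, while an Iterator must be drained manually with hasNext()/next(), which is what old-API reducers had to do.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Plain-Java illustration of the reducer-side difference between the APIs.
public class IterableVsIterator {

  // New-API style: Iterable works directly in a for-each loop.
  public static int sumIterable(Iterable<Integer> values) {
    int sum = 0;
    for (int v : values) {
      sum += v;
    }
    return sum;
  }

  // Old-API style: the Iterator must be drained by hand.
  public static int sumIterator(Iterator<Integer> values) {
    int sum = 0;
    while (values.hasNext()) {
      sum += values.next();
    }
    return sum;
  }

  public static void main(String[] args) {
    List<Integer> counts = Arrays.asList(1, 1, 2);
    System.out.println(sumIterable(counts));            // 4
    System.out.println(sumIterator(counts.iterator())); // 4
  }
}
```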

User Code Communication with the MapReduce System

Old API: JobConf, OutputCollector, and Reporter objects are used for communication with the MapReduce system.

New API: a single Context object is used for communication with the MapReduce system.

Execution & Output

Output File Names in HDFS

Old API: Mapper and Reducer output files are both named part-nnnnn in HDFS.

New API: Mapper outputs are named part-m-nnnnn, and Reducer outputs are named part-r-nnnnn (where nnnnn is an integer designating the part number, starting from zero).

Distributed Cache File Addition

Old API: files and archives are added to the distributed cache through the DistributedCache class and a JobConf (addXXX stands for methods such as addCacheFile and addCacheArchive):

JobConf jobConf = new JobConf();
DistributedCache.addXXX(new URI("URI Location"), jobConf);

New API (Hadoop 1.0.4): the same DistributedCache methods are used with a Configuration object:

Configuration configuration = new Configuration();
DistributedCache.addXXX(Path, configuration);

New API (Hadoop 2.6.0): the helper methods are available directly on the Job class:

Job job = new Job(configuration);
job.addArchiveToClassPath(new Path("Path to Archive"));
 

Hopefully the difference between the Hadoop old API and new API is now clear. You can subscribe to our blog to get the latest updates about Hadoop application development, and share your feedback on this post.