How To Differentiate Between Big Data Hadoop Old API and New API

This article explains the difference between the old Hadoop MapReduce API (0.20) and the new API (1.x and 2.x). We compare the two APIs on the basis of class and location, control, configuration, and communication, and execution and output.


Class & Location

Package

Old API: lives in the org.apache.hadoop.mapred package. It is still present in every newer Hadoop version.

New API: lives in the org.apache.hadoop.mapreduce package.

Mapper & Reducer

Old API: Mapper and Reducer are defined as interfaces.

New API: Mapper and Reducer are defined as classes, so you can extend them and write custom Mappers and Reducers without breaking existing implementations.
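To make this concrete, here is a minimal word-count sketch against the new API, assuming the Hadoop libraries are on the classpath. We extend the Mapper and Reducer classes and override only the methods we need; the class names (TokenMapper, SumReducer) are our own illustrative choices.

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// New-API sketch: Mapper and Reducer are classes under
// org.apache.hadoop.mapreduce, so we subclass them.
public class WordCountSketch {

  public static class TokenMapper
      extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      // Emit (token, 1) for every whitespace-separated token in the line.
      for (String token : value.toString().split("\\s+")) {
        word.set(token);
        context.write(word, ONE);
      }
    }
  }

  public static class SumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      // Sum the counts for each token; values arrive as an Iterable.
      int sum = 0;
      for (IntWritable v : values) {
        sum += v.get();
      }
      context.write(key, new IntWritable(sum));
    }
  }
}
```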

Control, Configuration and Communication

Job Control

Old API: job control is done through the JobClient class (JobClient does not exist in the new API).

New API: job control is done through the Job class.

Job Configuration

Old API: a JobConf object is used for job configuration.

New API: job configuration is done through a Configuration object, with the help of helper methods on the Job class.
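A driver sketch shows how Configuration and Job together replace the old JobConf/JobClient pair, assuming Hadoop 2.x on the classpath (Job.getInstance; on older releases the new Job(conf) constructor is used instead). The base Mapper and Reducer classes are set directly here as identity stages, purely to keep the sketch self-contained; in a real job you would substitute your own subclasses.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// New-API driver: one Job object handles both configuration and control.
public class IdentityJobDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "identity job");
    job.setJarByClass(IdentityJobDriver.class);
    // The base classes act as identity mapper/reducer; replace with your own.
    job.setMapperClass(Mapper.class);
    job.setReducerClass(Reducer.class);
    job.setOutputKeyClass(LongWritable.class);
    job.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    // Job submits and monitors itself, as JobClient.runJob once did.
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```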

Mapper-Reducer Communication

Old API: the mapper passes values to the reducer as a java.util.Iterator.

New API: the mapper passes values to the reducer as a java.lang.Iterable.
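The practical difference can be shown in plain Java, with no Hadoop dependency: an Iterable plugs directly into a for-each loop, while an Iterator must be drained manually with hasNext()/next(), which is what old-API reducers had to do.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Plain-Java illustration of the reducer-side difference between the APIs.
public class IterableVsIterator {

  // New-API style: Iterable works directly in a for-each loop.
  public static int sumIterable(Iterable<Integer> values) {
    int sum = 0;
    for (int v : values) {
      sum += v;
    }
    return sum;
  }

  // Old-API style: the Iterator must be drained by hand.
  public static int sumIterator(Iterator<Integer> values) {
    int sum = 0;
    while (values.hasNext()) {
      sum += values.next();
    }
    return sum;
  }

  public static void main(String[] args) {
    List<Integer> counts = Arrays.asList(1, 1, 2);
    System.out.println(sumIterable(counts));            // 4
    System.out.println(sumIterator(counts.iterator())); // 4
  }
}
```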

User Code Communication with the MapReduce System

Old API: JobConf, OutputCollector, and Reporter objects are used for communication with the MapReduce system.

New API: a single Context object is used for communication with the MapReduce system.

Execution & Output

Output File Names in HDFS

Old API: Mapper and Reducer output files are both named part-nnnnn in HDFS.

New API: Mapper outputs are named part-m-nnnnn, and Reducer outputs are named part-r-nnnnn (where nnnnn is an integer designating the part number, starting from zero).

Distributed Cache File Addition

Old API: files and archives are added to the distributed cache through the DistributedCache class and a JobConf (addXXX stands for methods such as addCacheFile and addCacheArchive):

JobConf jobConf = new JobConf();
DistributedCache.addXXX(new URI("URI Location"), jobConf);

New API (Hadoop 1.0.4): the same DistributedCache methods are used with a Configuration object:

Configuration configuration = new Configuration();
DistributedCache.addXXX(Path, configuration);

New API (Hadoop 2.6.0): the helper methods are available directly on the Job class:

Job job = new Job(configuration);
job.addArchiveToClassPath(new Path("Path to Archive"));
 

Hopefully the difference between the Hadoop old API and new API is now clear. You can subscribe to our blog to get the latest updates about Hadoop application development, and share your feedback on this post.