How To Differentiate Between Big Data Hadoop Old API and New API
This article explains the differences between the Hadoop old MapReduce API and the new MapReduce API. We compare the two on the basis of class and location; control, configuration, and communication; and execution and output.
We are comparing the old API (`org.apache.hadoop.mapred`, the original MapReduce interface) with the new API (`org.apache.hadoop.mapreduce`, introduced in Hadoop 0.20 and standard in the 1.x and 2.x releases).
- Convention followed in this article: file names, class names, and package names are shown in `code` style.
**Class & Location**

| Difference | Old API | New API |
| --- | --- | --- |
| Package | The old API is still present in newer Hadoop versions and can be found under the `org.apache.hadoop.mapred` package. | The new API can be found under the `org.apache.hadoop.mapreduce` package. |
| Mapper & Reducer | `Mapper` and `Reducer` are defined as interfaces. | `Mapper` and `Reducer` are defined as classes, so we can extend them to write custom mappers and reducers without breaking existing implementations. |
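To illustrate the class-based style, here is a minimal new-API word-count sketch. The class names (`WordCount`, `TokenMapper`, `SumReducer`) and the tokenizing logic are our own choices, not from the original article, and the code assumes the `hadoop-mapreduce-client` libraries are on the classpath:

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCount {

    // New API: Mapper is a class, so we extend it and override only map()
    public static class TokenMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                word.set(token);
                context.write(word, ONE); // Context replaces OutputCollector/Reporter
            }
        }
    }

    // New API: Reducer is also a class; values arrive as an Iterable
    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }
}
```

Because `Mapper` and `Reducer` are concrete classes with no-op defaults, a subclass only needs to override the methods it actually cares about.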
**Control, Configuration & Communication**

| Difference | Old API | New API |
| --- | --- | --- |
| Job control | Job control is done through `JobClient` (`JobClient` does not exist in the new API). | Job control is done through the `Job` class. |
| Job configuration | A `JobConf` object is used for job configuration. | Job configuration is done through a `Configuration` object, with the help of helper methods on `Job`. |
| Mapper-Reducer communication | The mapper passes values to the reducer as a `java.util.Iterator`. | The mapper passes values to the reducer as a `java.lang.Iterable`. |
| User code communication with the MapReduce system | `JobConf`, `OutputCollector`, and `Reporter` objects are used for communication with the MapReduce system. | A `Context` object is used for communication with the MapReduce system. |
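The `Iterator`-versus-`Iterable` change is easy to see in plain Java, outside Hadoop. The class and method names below are hypothetical, chosen only to mimic the shape of an old-API and a new-API reduce over the counts for one key:

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class IterDemo {

    // Old-API style: values arrive as a one-shot java.util.Iterator
    public static int sumOld(Iterator<Integer> values) {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next();
        }
        return sum;
    }

    // New-API style: values arrive as a java.lang.Iterable,
    // so the enhanced for-each loop can be used directly
    public static int sumNew(Iterable<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> counts = Arrays.asList(1, 1, 2);
        System.out.println(sumOld(counts.iterator())); // 4
        System.out.println(sumNew(counts));            // 4
    }
}
```

Both compute the same sum; the `Iterable` form simply makes reducer code shorter and lets it use the for-each loop.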
**Execution & Output**

| Difference | Old API | New API |
| --- | --- | --- |
| Output file names in HDFS | Mapper and reducer output files are both named `part-nnnnn`. | Mapper outputs are named `part-m-nnnnn` and reducer outputs `part-r-nnnnn` (where `nnnnn` is an integer task number). |
| Distributed cache file addition | Files and archives are added with `DistributedCache.addCacheFile()` / `DistributedCache.addCacheArchive()`, passing a `URI` and a `JobConf`. | In Hadoop 1.x (e.g. 1.0.4) the `DistributedCache` helpers are still used, with a `Configuration` object; from Hadoop 2.x (e.g. 2.6.0), files can be added directly with `Job.addCacheFile()` / `Job.addCacheArchive()`. |
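Putting the new-API pieces together, here is a hedged driver sketch for Hadoop 2.x. `MyMapper`, `MyReducer`, the input/output paths, and the cache-file location are all placeholders, and the code assumes the Hadoop 2.x client libraries:

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // New API: the Job class replaces JobClient/JobConf for job control
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(MyMapper.class);   // hypothetical mapper class
        job.setReducerClass(MyReducer.class); // hypothetical reducer class
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // Hadoop 2.x: add a cache file directly on Job
        // (replaces DistributedCache.addCacheFile(uri, jobConf))
        job.addCacheFile(new URI("/shared/stopwords.txt")); // placeholder path
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

On success the output directory will contain the `part-r-nnnnn` files described in the table above.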
We hope the differences between the Hadoop old API and new API are now clear to you. You can subscribe to our blog for the latest updates on Hadoop application development, and share your feedback on this post.