Hadoop for Java Professionals

With the surge in Big Data, organizations have begun to implement Big Data technologies as part of their systems. This has led to a strong need to extend existing skill sets with Hadoop. Java professionals are one group that needs to add Hadoop to their skills.

Published in: Technology, Education

Transcript of "Hadoop for Java Professionals"

  1. Hadoop for Java Professionals
     View Hadoop Courses at: www.edureka.in/hadoop
     Twitter @edurekaIN, Facebook /edurekaIN, use #AskEdureka for Questions
  2. Objectives of this Session
     • Un
     • Big Data and Hadoop
     • Why Hadoop?
     • Job Trends: Hadoop and Java
     • Hadoop ecosystem
     • MapReduce Programming and Java
     • User Defined Functions (UDF) in Pig and Hive
     • HBase and Java
     For queries during the session and class recording:
     Post on Twitter @edurekaIN: #askEdureka
     Post on Facebook /edurekaIN
  3. Big Data
     • Lots of data (terabytes or petabytes)
     • Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.
     • The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization.
  4. Unstructured Data is Exploding
     • 2,500 exabytes of new information in 2012, with the internet as the primary driver
     • “Digital universe grew by 62% last year to 800K petabytes and will grow to 1.2 zettabytes” this year
  5. Big Data - Challenges
     • Increasing data volumes
     • New data sources and types:
       • Email and documents
       • Social media, web logs
       • Machine Device (Scientific)
       • Transactions, OLTP, OLAP
  6. Job Trends: Hadoop and Java
  7. Job Trends: Hadoop and Java
  8. Jobs in Hadoop
     Big Data has opened the door to new job opportunities, to name a few:
     • Hadoop Developer
     • Hadoop Architect
     • Hadoop Engineer
     • Hadoop Application Developer
     • Data Analyst
     • Data Scientist
     • Business Intelligence (BI) Architect
     • Big Data Engineer
  9. Hadoop for Java Professionals
     Hadoop is red-hot as it:
     • Allows distributed processing of large data sets across clusters of computers using a simple programming model.
     • Has become the de facto standard for storing, processing, and analyzing hundreds of terabytes and petabytes of data.
     • Is cheaper to use than traditional proprietary technologies such as those from Oracle and IBM. It can run on low-cost commodity hardware.
     • Can handle all types of data from disparate systems, such as server logs, emails, sensor data, pictures, and videos.
  10. Hadoop for Java Professionals (Contd.)
      Hadoop is a natural career progression for Java professionals.
      • It is a Java-based framework, written primarily in Java.
      • The combination of Hadoop and Java skills is the number one combination in demand among all Hadoop jobs.
      • Java skills come in handy when writing code for the following in Hadoop:
        • MapReduce programming using Java
        • User Defined Functions (UDFs) in Pig and Hive scripts of Hadoop applications
        • Client applications in HBase
  11. Hadoop for Big Data
      • Apache Hadoop is a framework that allows for the distributed processing of large data sets across clusters of commodity computers using a simple programming model.
      • It is an open-source data management framework with scale-out storage and distributed processing.
  12. Hadoop and MapReduce
      Hadoop is a system for large-scale data processing. It has two main components:
      • HDFS: Hadoop Distributed File System (Storage)
        • Highly fault-tolerant
        • High-throughput access to application data
        • Suitable for applications that have large data sets
        • Natively redundant
      • MapReduce (Processing)
        • Software framework for easily writing applications which process vast amounts of data (multi-terabyte data sets) in parallel on large clusters (thousands of nodes) in a reliable, fault-tolerant manner
        • Splits a task across processors
  13. Important Hadoop Eco-System components
      • HDFS (Hadoop Distributed File System)
      • MapReduce Framework
      • HBase
      • Pig Latin (Data Analysis)
      • Hive (DW System)
  14. What is Map - Reduce?
      • Map - Reduce is a programming model
        • It is neither platform- nor language-specific
        • Record-oriented data processing (key and value)
        • Task distributed across multiple nodes
      • Where possible, each node processes data stored on that node
      • Consists of two phases
        • Map
        • Reduce
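The two phases listed above, plus the grouping step that sits between them, can be sketched in plain Java without any Hadoop dependency. This is only an in-memory illustration under stated assumptions: the word-count task, the class name `MapReducePhasesSketch`, and the method names are hypothetical and are not taken from the slides or from the Hadoop API. A real Hadoop job distributes these steps across nodes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** In-memory illustration of map, shuffle/sort, and reduce. Not Hadoop code. */
final class MapReducePhasesSketch {

    /** Map phase: turn one input record (a line of text) into (word, 1) pairs. */
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(Map.entry(word, 1));
            }
        }
        return out;
    }

    /** Shuffle/sort: group all emitted values by key; TreeMap keeps keys sorted. */
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    /** Reduce phase: collapse all values for one key into a single result. */
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    /** Runs the three steps in sequence over the given lines. */
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
        for (String line : lines) {
            mapped.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        shuffle(mapped).forEach((key, values) -> result.put(key, reduce(key, values)));
        return result;
    }

    public static void main(String[] args) {
        // prints {big=3, clusters=1, data=2, process=1}
        System.out.println(wordCount(List.of("big data", "big clusters process big data")));
    }
}
```

Because `map` looks at one record at a time and `reduce` looks at one key at a time, both steps could run on different nodes in parallel, which is the property the slide points to.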
  15. What is Map - Reduce? (Contd.)
      The process can be considered similar to a Unix pipeline:

          cat /my/log | grep '.html' | sort | uniq -c > /my/outfile

      Stages, in order: MAP (grep), SORT (sort), REDUCE (uniq -c)
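The same pipeline shape can be sketched with Java streams. This is a rough analogue, not a translation of the shell command: the class name `PipelineSketch` and the sample log lines are hypothetical, and the filter matches the literal text ".html", whereas `grep '.html'` treats the dot as a regular-expression wildcard.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Stream-based analogue of: grep '.html' | sort | uniq -c */
final class PipelineSketch {

    /** Counts how often each distinct line containing ".html" occurs, keys sorted. */
    static Map<String, Long> countHtmlLines(List<String> logLines) {
        return logLines.stream()
                // MAP: keep only the matching records, as grep does
                .filter(line -> line.contains(".html"))
                .collect(Collectors.groupingBy(
                        Function.identity(),
                        TreeMap::new,            // SORT: TreeMap keeps keys in sorted order
                        Collectors.counting())); // REDUCE: count per key, as uniq -c does
    }

    public static void main(String[] args) {
        List<String> log = List.of("/index.html", "/about.html", "/logo.png", "/index.html");
        // prints {/about.html=1, /index.html=2}
        System.out.println(countHtmlLines(log));
    }
}
```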
  16. A Sample MapReduce program in Java
  17. Problem - Data Processing
  18. Problem - Data Processing
      • Input: huge raw XML files with unstructured data line reviews
      • Processing: MapReduce on HDFS
      • Output: Category, hash, url, +tive, -tive, total
  19. Other Applications of Java Skills in Hadoop: UDFs
  20. User Defined Functions (UDFs) in Pig
      Pig is a high-level, declarative data flow language.
      • It runs on top of Hadoop and makes it possible to create complex jobs that process large volumes of data quickly and efficiently.
      • It is similar to a SQL query, where the user specifies the “what” and leaves the “how” to the underlying processing engine.
  21. Pig Latin - Creating UDF
      • A program to create a UDF:

          import java.io.IOException;

          import org.apache.pig.FilterFunc;
          import org.apache.pig.backend.executionengine.ExecException;
          import org.apache.pig.data.Tuple;

          // Filter UDF: keeps a record only if its first field is one of the listed ages.
          public class IsOfAge extends FilterFunc {
              @Override
              public Boolean exec(Tuple tuple) throws IOException {
                  // Reject empty input rather than failing.
                  if (tuple == null || tuple.size() == 0) {
                      return false;
                  }
                  try {
                      Object object = tuple.get(0);
                      if (object == null) {
                          return false;
                      }
                      int i = (Integer) object;
                      // Only these specific ages pass the filter.
                      return i == 18 || i == 19 || i == 21 || i == 23 || i == 27;
                  } catch (ExecException e) {
                      throw new IOException(e);
                  }
              }
          }
  22. Pig and UDF
      How to call a UDF?

          register myudf.jar;
          X = filter A by IsOfAge(age);
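To see what the `filter ... by IsOfAge(age)` statement does to a relation without installing Pig, the same decision can be mimicked on a plain Java list. This is a hedged sketch only: the class `IsOfAgeFilterSketch`, its methods, and the sample ages are hypothetical stand-ins, and no Pig classes are involved. The accepted ages are copied from the UDF shown on the previous slide.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

/** Plain-Java stand-in for filtering a relation with the IsOfAge UDF. */
final class IsOfAgeFilterSketch {

    /** The ages the slide's UDF accepts. */
    private static final Set<Integer> ACCEPTED_AGES = Set.of(18, 19, 21, 23, 27);

    /** Same decision as the UDF's exec(): null is rejected, listed ages pass. */
    static boolean isOfAge(Integer age) {
        return age != null && ACCEPTED_AGES.contains(age);
    }

    /** Analogue of: X = filter A by IsOfAge(age); */
    static List<Integer> filterByAge(List<Integer> ages) {
        return ages.stream()
                .filter(IsOfAgeFilterSketch::isOfAge)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Arrays.asList is used because the sample input contains a null field.
        List<Integer> a = Arrays.asList(17, 18, null, 21, 30, 27);
        // prints [18, 21, 27]
        System.out.println(filterByAge(a));
    }
}
```

As in Pig, records for which the function returns false are simply dropped from the result; the order of the surviving records is unchanged.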
  23. Questions?
      Buy Complete Course at: www.edureka.in/hadoop
      Interested in learning “Big-Data & Hadoop”? Let us know by mailing us at sales@edureka.in
