Hadoop for Java Professionals

With the surge in Big Data, organizations have begun to implement Big Data-related technologies as part of their systems. This has led to a huge need to update existing skill sets with Hadoop. Java professionals are one such group who need to add Hadoop skills.

Transcript

  • 1. Hadoop for Java Professionals. View Hadoop courses at www.edureka.in/hadoop. Twitter @edurekaIN, Facebook /edurekaIN; use #AskEdureka for questions.
  • 2. Objectives of this Session: • Big Data and Hadoop • Why Hadoop? • Job Trends: Hadoop and Java • The Hadoop Ecosystem • MapReduce Programming and Java • User Defined Functions (UDFs) in Pig and Hive • HBase and Java. For queries during the session and for the class recording, post on Twitter @edurekaIN with #askEdureka or on Facebook /edurekaIN.
  • 3. Big Data:  Lots of data (terabytes or petabytes).  Big Data is the term for a collection of data sets so large and complex that it becomes difficult to process them using on-hand database management tools or traditional data processing applications.  The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization.
  • 4. Unstructured Data is Exploding:  2,500 exabytes of new information in 2012, with the internet as the primary driver.  “The digital universe grew by 62% last year to 800K petabytes and will grow to 1.2 zettabytes” this year.
  • 5. Big Data Challenges: increasing data volumes; new data sources and types, including email and documents, social media, web logs, machine/device (scientific) data, and transactions (OLTP, OLAP).
  • 6. Job Trends: Hadoop and Java (job-trend chart).
  • 7. Job Trends: Hadoop and Java (job-trend chart, continued).
  • 8. Jobs in Hadoop: Big Data has opened the door to new job opportunities, to name a few:  Hadoop Developer  Hadoop Architect  Hadoop Engineer  Hadoop Application Developer  Data Analyst  Data Scientist  Business Intelligence (BI) Architect  Big Data Engineer
  • 9. Hadoop for Java Professionals: Hadoop is red-hot because it:  allows distributed processing of large data sets across clusters of computers using a simple programming model;  has become the de facto standard for storing, processing, and analyzing hundreds of terabytes and petabytes of data;  is cheaper to use than traditional proprietary technologies such as Oracle and IBM, and can run on low-cost commodity hardware;  can handle all types of data from disparate systems, such as server logs, emails, sensor data, pictures, and videos.
  • 10. Hadoop for Java Professionals (Contd.): Hadoop is a natural career progression for Java professionals.  It is a Java-based framework, written entirely in Java.  The combination of Hadoop and Java skills is the most in-demand combination across Hadoop jobs.  Java skills come in handy when writing code for the following in Hadoop:  MapReduce programming in Java  User Defined Functions (UDFs) in Pig and Hive scripts  Client applications for HBase
  • 11. Hadoop for Big Data:  Apache Hadoop is a framework that allows the distributed processing of large data sets across clusters of commodity computers using a simple programming model.  It is an open-source data management framework with scale-out storage and distributed processing.
  • 12. Hadoop and MapReduce: Hadoop is a system for large-scale data processing. It has two main components:  HDFS, the Hadoop Distributed File System (storage): highly fault-tolerant, provides high-throughput access to application data, suitable for applications with large data sets, natively redundant.  MapReduce (processing): a software framework for easily writing applications that process vast amounts of data (multi-terabyte data sets) in parallel on large clusters (thousands of nodes) in a reliable, fault-tolerant manner; it splits a task across processors.
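Since HDFS exposes a Java API, a small sketch can make the storage side concrete. The following is a minimal, illustrative example of reading a file from HDFS with the org.apache.hadoop.fs.FileSystem API; the class name HdfsRead and the path /user/demo/sample.txt are assumptions, not taken from the slides.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsRead {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();       // picks up core-site.xml / hdfs-site.xml
            FileSystem fs = FileSystem.get(conf);           // handle to the configured file system
            Path file = new Path("/user/demo/sample.txt");  // hypothetical HDFS path
            try (FSDataInputStream in = fs.open(file);
                 BufferedReader reader = new BufferedReader(new InputStreamReader(in))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    System.out.println(line);               // print each line of the HDFS file
                }
            }
        }
    }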
  • 13. Important Hadoop ecosystem components: HDFS (Hadoop Distributed File System), the MapReduce framework, Pig Latin (data analysis), Hive (data warehouse system), and HBase.
  • 14. What is MapReduce?  MapReduce is a programming model.  It is neither platform- nor language-specific.  Record-oriented data processing (keys and values).  Tasks are distributed across multiple nodes.  Where possible, each node processes data stored on that node.  It consists of two phases: Map and Reduce.
  • 15. What is MapReduce? (Contd.) The process can be thought of as similar to a Unix pipeline: cat /my/log | grep '.html' | sort | uniq -c > /my/outfile. Here grep plays the role of the Map phase, sort corresponds to the shuffle/sort between phases, and uniq -c corresponds to the Reduce phase.
  • 16. A Sample MapReduce Program in Java (a representative listing follows below).
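The slide's listing is an image and does not survive in this transcript. As a stand-in, here is a minimal sketch of the classic WordCount job written against the org.apache.hadoop.mapreduce API; it is illustrative rather than the exact program shown on the slide, and the class names and input/output paths are assumptions.

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

        // Map phase: emit (word, 1) for every word in the input line
        public static class TokenizerMapper
                extends Mapper<Object, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            @Override
            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    context.write(word, ONE);
                }
            }
        }

        // Reduce phase: sum the counts for each word
        public static class IntSumReducer
                extends Reducer<Text, IntWritable, Text, IntWritable> {
            private final IntWritable result = new IntWritable();

            @Override
            public void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable val : values) {
                    sum += val.get();
                }
                result.set(sum);
                context.write(key, result);
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class);
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));    // e.g. an HDFS input directory
            FileOutputFormat.setOutputPath(job, new Path(args[1]));  // e.g. an HDFS output directory
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

Packaged into a jar, such a job would typically be submitted with something like hadoop jar wordcount.jar WordCount /input /output (paths illustrative).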
  • 17. Problem: Data Processing.
  • 18. Problem: Data Processing (Contd.). Huge raw XML files with unstructured review data are stored in HDFS and processed with MapReduce; the output lists, per category, a hash and URL together with the positive (+tive), negative (-tive), and total review counts (a simplified mapper sketch follows below).
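A rough sketch of what the Map side of such a job could look like, assuming, purely for illustration, that each input record has already been flattened to "category<TAB>sentiment" where sentiment is "positive" or "negative"; the slide itself does not show this code, and the class name is hypothetical.

    import java.io.IOException;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class ReviewSentimentMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {

        private static final IntWritable POSITIVE = new IntWritable(1);
        private static final IntWritable NEGATIVE = new IntWritable(-1);
        private final Text category = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split("\t");
            if (fields.length < 2) {
                return;                                    // skip malformed records
            }
            category.set(fields[0]);
            // emit +1 for a positive review and -1 for a negative one;
            // a reducer can count the +1s and -1s per category to produce the
            // positive, negative, and total figures shown in the slide's output
            context.write(category,
                    "positive".equalsIgnoreCase(fields[1]) ? POSITIVE : NEGATIVE);
        }
    }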
  • 19. Other Applications of Java Skills in Hadoop: UDFs.
  • 20. User Defined Functions (UDFs) in Pig:  Pig is a high-level data flow language.  It sits on top of Hadoop and makes it possible to create complex jobs that process large volumes of data quickly and efficiently.  As in an SQL query, the user specifies the “what” and leaves the “how” to the underlying processing engine.
  • 21. Pig Latin: Creating a UDF.  A program to create a UDF:

    import java.io.IOException;

    import org.apache.pig.FilterFunc;
    import org.apache.pig.backend.executionengine.ExecException;
    import org.apache.pig.data.Tuple;

    // Pig filter UDF: keeps a record only when the age field has one of the listed values
    public class IsOfAge extends FilterFunc {
        @Override
        public Boolean exec(Tuple tuple) throws IOException {
            if (tuple == null || tuple.size() == 0) {
                return false;
            }
            try {
                Object object = tuple.get(0);
                if (object == null) {
                    return false;
                }
                int i = (Integer) object;
                return i == 18 || i == 19 || i == 21 || i == 23 || i == 27;
            } catch (ExecException e) {
                throw new IOException(e);
            }
        }
    }
  • 22. Pig and UDFs: how to call a UDF?

    register myudf.jar;
    X = filter A by IsOfAge(age);
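The session objectives also list UDFs in Hive, which the transcript does not show. Below is a minimal, illustrative sketch of a Hive UDF written against the classic org.apache.hadoop.hive.ql.exec.UDF base class; the class name ToUpper is an assumption, not taken from the slides.

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // Hive calls evaluate() once per row; returning null for null input is conventional
    public class ToUpper extends UDF {
        public Text evaluate(Text input) {
            if (input == null) {
                return null;
            }
            return new Text(input.toString().toUpperCase());
        }
    }

Once packaged in a jar, such a UDF would typically be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before being used in a query (names illustrative).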
  • 23. Questions? Buy the complete course at www.edureka.in/hadoop. Twitter @edurekaIN, Facebook /edurekaIN; use #askEdureka for questions. Interested in learning “Big Data & Hadoop”? Let us know by emailing sales@edureka.in.