
What is Big Data and Why Learn Hadoop



With the increasing amount of digital data, it has become essential to find a technology that can be used to analyse it. The real question is: why should Big Data matter to you?
You can watch the video for more information on this.

Topics Included in this Presentation:
Big Data
Big Data and Hadoop
Why Hadoop?
Hadoop: The future of Data Management
Hadoop: Job Roles
Hadoop: Growth and Job Opportunities

For more information:

Experience instructor-led online training with 24x7 support at Edureka.
Edureka provides online training courses for Big Data and Hadoop, Hadoop Admin, Cassandra, Data Science, Cloud Computing, and Android Development.
Please write back to us at or call us at +91-8880862004 for more information.

Published in: Technology, Business


  1. What is Big Data and Why Learn Hadoop
     View Hadoop Courses at :
     Twitter @edurekaIN, Facebook /edurekaIN, use #AskEdureka for Questions
  2. Objectives of this Session
     • Un
     • What is Big Data
     • Traditional Warehouse vs. Hadoop – Sears Case Study
     • Why Should I Learn Hadoop & Related Technologies
     • Jobs and Trends in Big Data
     • Hadoop Architecture and Eco-System
     For queries during the session and class recording: post on Twitter @edurekaIN with #askEdureka, or post on Facebook /edurekaIN
  3. Big Data
     • Lots of data (terabytes or petabytes)
     • Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications
     • The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization
  4. Unstructured Data is Exploding
     • 2,500 exabytes of new information in 2012, with the internet as the primary driver
     • “Digital universe grew by 62% last year to 800K petabytes and will grow to 1.2 zettabytes” this year
  5. Big Data - Challenges
     • Increasing Data Volumes
     • New data sources and types
     • Email and documents
     • Social Media, Web Logs
     • Machine Device (Scientific)
     • Transactions, OLTP, OLAP
  6. Big Data - Challenges (Contd.)
     • Good News: Big Data is here.
     • Bad News: We are struggling to store, process and analyze it.
  7. Common Big Data Customer Scenarios
     • Banks and Financial services
         • Modeling True Risk
         • Threat Analysis
         • Fraud Detection
         • Trade Surveillance
         • Credit Scoring and Analysis
     • Retail
         • Point of Sales Transaction Analysis
         • Customer Churn Analysis
         • Sentiment Analysis
  8. Hidden Treasure – Case Study
     Case Study: Sears Holding Corporation
     • Insight into data can provide a business advantage.
     • Some key early indicators can mean fortunes to a business.
     • More data allows more precise analysis.
     * Sears was using traditional systems such as Oracle Exadata, Teradata and SAS to store and process customer activity and sales data.
  9. Limitations of Existing Data Analytics Architecture
     Diagram components: Instrumentation; Collection; Storage only Grid (original Raw Data, Mostly Append); ETL Compute Grid; RDBMS (Aggregated Data); BI Reports + Interactive Apps. Layer labels: Storage, Processing.
     • 90% of the ~2PB archived
     • A meagre 10% of the ~2PB data is available for BI
     1. Can’t explore original high fidelity raw data
     2. Moving data to compute doesn’t scale
     3. Premature data death
  10. Solution: A Combined Storage Computer Layer
      Diagram components: Instrumentation; Collection; Hadoop : Storage + Compute Grid (Mostly Append, Both Storage And Processing, No Data Archiving); RDBMS (Aggregated Data); BI Reports + Interactive Apps.
      • Entire ~2PB data is available for processing
      1. Data Exploration & Advanced analytics
      2. Scalable throughput for ETL & aggregation
      3. Keep data alive forever
      * Sears moved to a 300-node Hadoop cluster to keep 100% of its data available for processing, rather than a meagre 10% as was the case with existing non-Hadoop solutions.
  11. Why move to Hadoop?
      Hadoop is red-hot as it:
      • allows distributed processing of large data sets across clusters of computers using a simple programming model.
      • has become the de facto standard for storing, processing, and analyzing hundreds of terabytes and petabytes of data.
      • is cheaper to use than traditional proprietary technologies such as Oracle and IBM. It can run on low-cost commodity hardware.
      • can handle all types of data from disparate systems, such as server logs, emails, sensor data, pictures and videos.
  12. Hadoop: Growth and Job Opportunities (Contd.)
      As per the 2012-13 Salary Survey by Dice, a leading career site for technology and engineering professionals:
      • Out of the big three, mobile, cloud and data, there’s one that is having a disproportionate impact on salaries – it’s big data.
      • Salaries reported by those who regularly use Hadoop, NoSQL, and MongoDB are all north of $100,000. By comparison, average salaries for technologies closely associated with cloud and virtualization are just under $90,000.
      “We’ve heard it’s a fad, heard it’s hyped and heard it’s fleeting, yet it’s clear that data professionals are in demand and well paid. Tech professionals who analyse large data streams and strategically impact the overall business goals of a firm have an opportunity to write their own ticket,” said Alice Hill, Managing Director of
  13. Hadoop is in Demand!
      Filled jobs vs. unfilled jobs in big data (Vacancy/Filled, %):
        Role                         Filled   Unfilled
        Big Data Analyst               50        50
        Big Data Architect             43        57
        Big Data Engineer              44        56
        Big Data Research Analyst      31        69
        Big Data Visualizer            23        77
        Data Scientist                 18        82
      Gartner Says Big Data Creates Big Jobs: 4.4 Million IT Jobs Globally to Support Big Data By 2015
  14. Hadoop: Growth and Job Opportunities (Contd.)
      Chart: Salary – Other Technologies vs Hadoop. Axis: Salaries (USD), 60,000 to 110,000.
  15. Hadoop for Big Data
      • Apache Hadoop is a framework that allows for the distributed processing of large data sets across clusters of commodity computers using a simple programming model.
      • It is an open-source data management framework with scale-out storage and distributed processing.
  16. Hadoop Eco-System
      • Apache Oozie (Workflow)
      • Pig Latin (Data Analysis)
      • Mahout (Machine Learning)
      • Hive (DW System)
      • HBase
      • MapReduce Framework
      • HDFS (Hadoop Distributed File System)
      • Flume (import or export of unstructured or semi-structured data)
      • Sqoop (import or export of structured data)
      Audience: ETL/DW Professionals, Developers / Programmers, DBA / Administrators
  17. Hadoop and MapReduce
      Hadoop is a system for large scale data processing. It has two main components:
      • HDFS – Hadoop Distributed File System (Storage)
          • highly fault-tolerant
          • high throughput access to application data
          • suitable for applications that have large data sets
          • natively redundant
      • MapReduce (Processing)
          • software framework for easily writing applications which process vast amounts of data (multi-terabyte data-sets) in parallel on large clusters (thousands of nodes) in a reliable, fault-tolerant manner
          • splits a task across processors
      Diagram labels: Map-Reduce, Key, Value
  18. Hadoop 2.0 : Much More is Possible
      • BATCH (MapReduce)
      • INTERACTIVE (Tez)
      • ONLINE (HBase)
      • STREAMING (Storm, S4, …)
      • GRAPH (Giraph)
      • IN-MEMORY (Spark)
      • HPC MPI (OpenMPI)
      • OTHER (Search) (Weave..)
  19. Further Reading
      • Big Prospects for Big Data
      • Hadoop Learners Profile
      • Big Bucks for Big Data
      • 5 Reasons to Learn Hadoop
      • Increasing Demand for ‘Hadoop and NoSQL skills’
  20. Questions?
      Enroll for the Complete Course at :
      Type Enroll in the questions window if you want edureka to contact you
      Class Recording and Presentation will be available in 24 hours at:
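
Slide 17 describes MapReduce as a framework that splits a task across processors and works on key/value pairs. The idea can be sketched in a few lines of plain Python. This is only an illustration of the programming model on a single machine, not Hadoop's API: the function names (map_phase, shuffle, reduce_phase, word_count) and the sample input are assumptions made up for this example, and a real Hadoop job would run the map and reduce steps in parallel on many nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) key/value pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all values by key, the step a framework performs between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence (hypothetical single-process driver)."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input; each string stands in for one record of a large file.
    sample = ["big data is big", "hadoop stores big data"]
    print(word_count(sample))
```

Because each call to map_phase depends only on its own line, and each call to reduce_phase only on its own key, both steps could be handed to different machines, which is what lets the model scale out on commodity hardware.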