Webinar: Big Data & Hadoop - When not to use Hadoop

A free webinar on Big Data & Hadoop titled "When not to use Hadoop", conducted by Edureka on 11th December 2014.

Published in: Technology

  1. When Not to Use Hadoop. For more details, contact Edureka: US 1800 275 9730 (toll free), India +91 88808 62004. For queries, post on Twitter @edurekaIN with #askEdureka, or on Facebook /edurekaIN.
  2. Objectives: At the end of this module, you will be able to understand when not to use Hadoop (Real Time Analytics; Not a Replacement; Dataset Size; Complexity; Security) and when to use Hadoop (Huge Unstructured Datasets; Response Time is Not an Issue; Future Planning; Multiple Frameworks for Big Data; Lifetime Data Availability).
  3. Hadoop Mania
  4. When Not To Use Hadoop
  5. 5. Slide5 If you want to do some Real Time Analytics, where you are expecting the result quickly, Hadoop should not be used directly Hadoop works on Batch processing, hence the response time is high Day1 Day2 Day 3 Day 4 ......... ………. ………. Day n Day1 Day2 Day 3 Day 4 ......... ………. ………. Day n Input Data Processing Data Input Data Processing Data Input Data Processing Data Input Data Processing Data using MR Time Lag Real Time Analytics
  6. 6. Slide6 Real Time Analytics –Accepted Way Streaming Data Storing
  7. 7. Slide7 14 sec 0.6 sec Real Time Analytics –Accepted Way (Contd.)
  8. 8. Slide8 Hadoop is not a replacement for your existing data processing infrastructure After processing the data in Hadoop you need to send the output to relational database technologies for BI, decision support, reporting etc. It is not going to replace your database, but your database isn’t likely to replace Hadoop either Different tools for different jobs Not a Replacement for Existing Infrastructure
  9. 9. Slide9 Hadoop framework is not recommendable for small structured datasets as you have other tools available in the market which can do this work quite easily and at a fast pace than Hadoop like MS excel, RDBMS etc. For a small data analytics, Hadoop can be costlier than other tools Merge all the small files into one Multiple Smaller Datasets –Accepted Way
  10. 10. Slide10 Multiple Smaller Datasets –Accepted Way4225284 EachfileofxMB Slow Execution –10400 ms4225284 Alltheabovefilesmergedintoonefile(9xMB) Fast Execution –6140 ms Same Output Same Input
  11. 11. Slide11 Unless you have a better understanding of the Hadoop framework, its not suggested to use Hadoop for production Learning Hadoop and its eco-system tools and deciding which technology suits your need is again a different level of complexity Novice Hadoopers
  12. 12. Slide12 Many enterprises -especially within the highly regulated industries dealing with sensitive data -aren’t able to move as quick as they would like, towards implementing Big Data projects and Hadoop “Example Health-care data used by Insurance companies to calculate premium” Where Security is the Primary Concern? They don’t have to hesitate though, as many of the security and compliance challenges are being continuously worked upon and can be surmountable (for example, by using Apache Accumulo on top of Hadoop).
  13. 13. Slide13 Where Security is the Primary Concern –Accepted way Healthcare Data Hadoop Analytic Integration Healthcare Data Hadoop Analytic Integration
  14. When To Use Hadoop
  15. Data Size and Data Diversity: You have different types of data (structured, semi-structured and unstructured). The data set is huge, i.e. several terabytes or petabytes. You are not in a hurry for answers.
  16. 16. Slide16 To implement Hadoop on your data you should first understand the level of complexity of data and the rate in which it is going to grow So we need a cluster planning, it may begin with building a small or medium cluster in your industry as per data (in GBs or few TBs ) available at present and scale up your cluster in future depending on the growth of your data Future Planning
  17. 17. Slide17 Hadoop can be integrated with multiple analytic tools to get the best out of it, like M-Learning, R , Python, Spark, MongoDB etc. Multiple Frameworks for Big Data
  18. 18. Slide18 When you want your data to be live and running forever, it can be achieved using Hadoop’s scalability Lifetime Data Availability
  19. [Slide without text]
  20. How it Works? LIVE online class; class recording in LMS; 24/7 post-class support; module-wise quiz; project work; verifiable certificate.
  21. Course Topics: Module 1: Understanding Big Data and Hadoop; Module 2: Hadoop Architecture and HDFS; Module 3: Hadoop MapReduce Framework - I; Module 4: Hadoop MapReduce Framework - II; Module 5: Advance MapReduce; Module 6: PIG; Module 7: HIVE; Module 8: Advance HIVE and HBase; Module 9: Advance HBase; Module 10: Oozie and Hadoop Project.
  22. Questions: Twitter @edurekaIN, Facebook /edurekaIN; use #askEdureka for questions.
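
Slides 9 and 10 recommend merging many small files into one before processing, and report 10400 ms for the separate files against 6140 ms for the merged file. The sketch below is not from the webinar. It shows one possible way to do the merge on a local file system before loading the data, using only the Python standard library. The function names `merge_small_files` and `speedup`, the `*.txt` pattern, and the assumption that the inputs are line-oriented text files ending in a newline are all illustrative.

```python
import shutil
from pathlib import Path


def merge_small_files(src_dir: Path, merged_path: Path, pattern: str = "*.txt") -> int:
    """Concatenate every file in src_dir matching pattern into merged_path.

    Hypothetical helper: assumes line-oriented inputs that each end with a
    newline, so plain concatenation keeps the records intact.
    Returns the number of files merged.
    """
    count = 0
    with merged_path.open("wb") as out:
        # Sorted order makes the merged file deterministic.
        for part in sorted(src_dir.glob(pattern)):
            # Skip the output file if it lives in the same directory.
            if part.resolve() == merged_path.resolve():
                continue
            with part.open("rb") as src:
                shutil.copyfileobj(src, out)
            count += 1
    return count


def speedup(slow_ms: float, fast_ms: float) -> float:
    """Ratio of the slow run time to the fast run time."""
    return slow_ms / fast_ms
```

With the timings quoted on slide 10, `speedup(10400, 6140)` is about 1.69, so in that demonstration the merged run took roughly 41% less time. The slide does not describe the cluster or the job, so the ratio should not be read as a general rule.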
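
Slide 16 advises starting with a small or medium cluster and scaling it as the data grows. As a rough illustration of that planning step, the sketch below projects data volume under compound monthly growth and converts it to a node count. None of the numbers or names come from the webinar: `projected_tb`, `nodes_needed`, the growth rate and the usable capacity per node are hypothetical. The default replication factor of 3 matches the usual HDFS default. The estimate ignores headroom for intermediate data, compression and operating system overhead.

```python
import math


def projected_tb(current_tb: float, monthly_growth: float, months: int) -> float:
    """Data volume after `months` of compound growth (0.10 means 10% per month)."""
    return current_tb * (1.0 + monthly_growth) ** months


def nodes_needed(
    current_tb: float,
    monthly_growth: float,
    months: int,
    replication: int = 3,              # usual HDFS default, adjust to your cluster
    usable_tb_per_node: float = 10.0,  # hypothetical usable disk per data node
) -> int:
    """Rough count of data nodes needed to hold the projected, replicated data."""
    raw_tb = projected_tb(current_tb, monthly_growth, months) * replication
    return max(1, math.ceil(raw_tb / usable_tb_per_node))


if __name__ == "__main__":
    # Example: 2 TB today, growing 10% per month, planned over 12 months.
    for horizon in (0, 6, 12):
        print(horizon, "months:", nodes_needed(2.0, 0.10, horizon), "node(s)")
```

Running the estimate for several planning horizons gives a simple view of when the cluster would need to grow, which is the kind of check the slide suggests doing before sizing hardware.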