Big Data Ecosystem- Impetus Technologies

A presentation giving an overview of Big Data.

By Impetus Technologies

Published in: Technology

  1. Big Data. Presented by: Sanjay Sharma, Impetus Technologies Inc. © 2014 Impetus Technologies
  2. Outline • About • Big Data: Recap • Big Data Technologies Landscape • Hadoop Overview • Other Big Data Tools Overview • Impact on IT and us
  3. About • Big Data Solution Architect • Work for Impetus Technologies • Based out of San Jose, Atlanta & India (1300+ Engineers) • Thought Leaders in Big Data consulting • Started Big Data Labs 4 years ago
  4. Big Data • Volume • Velocity • Variety
  5. Big Data Opportunity • Visualization • Personalization • Optimization • Advanced Predictive Analytics • Time to Market • Business Opportunity
  6. Big Data Users – 2009
  7. Big Data Users – 2010
  8. Big Data Job Trends • Top Job Trends (Indeed.com, July 2012): HTML5, MongoDB, iOS, Android, Mobile app, Puppet, Hadoop, jQuery, PaaS, Social Media
  9. Big Data Everywhere • Search: Atlanta, GA; Date: 10/16/2012; “Big Data” = 82; “Hadoop” = 78; “NoSQL” = 117; MPP DBs = 192 • Search: USA; Date: 10/16/2012; “Big Data” = 5169; “Hadoop” = 5174; “NoSQL” = 3820; MPP DBs = 4581
  10. Big Data Future • Source: McKinsey, http://www.mckinsey.com/Insights/MGI/Research/Technology_and_Innovation/Big_data_The_next_frontier_for_innovation
  11. Big Data Landscapes • Hadoop • MPP • NewSQL • NoSQL • Scalability Limits • Online vs. Batch • Open Source • Writes/Reads • Petabyte Scale • Commodity • DW Vendors • Appliances
  12. Big Data Vendor Galore
  13. Hadoop: Glory to the Elephant
  14. Hadoop • Distributed File System: Petabyte Scale; Thousands of Commodity Servers; High Availability; High Fault Tolerance • Distributed Processing System: Simple, easy-to-code algorithm; Code once, run on PBs; High Fault Tolerance; Data Locality
  15. Map Concept: RDBMS
      BIG COMBINED TABLE (columns: Id, Name, Other Columns): 1 Scott, 2 Bob, 3 Lisa, 4 Sanjay, … 256 million Bob
      Select count(*), 'Bob' from table where name="Bob"; -> 12,'Bob'
      Select count(*), 'Scott' from table where name="Scott"; -> 4,'Scott'
      TABLE 1 on m/c 1 (columns: Id, Name): 1 Scott, 2 Bob, 3 Lisa, 4 Sanjay, … 64 million Bob
      Select 'Bob', count(*) from table where name="Bob"; (same queries for 'Scott' & 'Sanjay') -> "Bob",3 "Scott",1 "Sanjay",2
      TABLE 2 on m/c 2 (columns: Id, Name): 64 million and 1 Scott, 64 million and 2 Bob, 64 million and 3 Lisa, 64 million and 4 Sanjay, … 128 million Bob
      Select count(*), 'Bob' from table where name="Bob"; -> "Bob",3 "Scott",1 "Sanjay",0
      TABLE 3 on m/c 3 (columns: Id, Name): 128 million and 1 Scott, 128 million and 2 Bob, 128 million and 3 Lisa, 128 million and 4 Sanjay, … 192 million Bob
      Select count(*), 'Bob' from table where name="Bob"; -> "Bob",3 "Scott",1 "Sanjay",1
      TABLE 4 on m/c 4 (columns: Id, Name): 192 million and 1 Scott, 192 million and 2 Bob, 192 million and 3 Lisa, 192 million and 4 Lisa, … 256 million Bob
      Select count(*), 'Bob' from table where name="Bob"; -> "Bob",3 "Scott",1 "Sanjay",1
  16. Reduce Concept: RDBMS
      "Scott", list([1,1,1,1]): List[1,1,1,1].iterate -> Sum(EACH)
      "Bob", list([3,3,3,3]): List[3,3,3,3].iterate -> Sum(EACH)
      "Sanjay", list([2,0,1,1]): List[2,0,1,1].iterate -> Sum(EACH)
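The map and reduce slides can be sketched as a small single-process Python example. This is only an illustration under stated assumptions: the four lists stand in for the four per-machine tables, their contents are invented so that the per-machine counts match the slide, and the helper names (`make_partition`, `map_partition`, `shuffle`, `reduce_counts`) are hypothetical, not part of Hadoop.

```python
from collections import Counter, defaultdict

NAMES_OF_INTEREST = ("Bob", "Scott", "Sanjay")


def make_partition(bob, scott, sanjay):
    """Build a tiny stand-in for one machine's table (Name column only)."""
    return ["Bob"] * bob + ["Scott"] * scott + ["Sanjay"] * sanjay + ["Lisa"]


# Assumed sample data: one list per machine, sized to match the slide's counts.
partitions = [
    make_partition(3, 1, 2),  # TABLE 1 on m/c 1
    make_partition(3, 1, 0),  # TABLE 2 on m/c 2
    make_partition(3, 1, 1),  # TABLE 3 on m/c 3
    make_partition(3, 1, 1),  # TABLE 4 on m/c 4
]


def map_partition(rows):
    """Map step: each machine counts the names locally, like the per-table SELECT."""
    counts = Counter(rows)
    return [(name, counts[name]) for name in NAMES_OF_INTEREST]


def shuffle(mapped):
    """Group the per-machine counts by name, e.g. "Bob" -> [3, 3, 3, 3]."""
    grouped = defaultdict(list)
    for pairs in mapped:
        for name, count in pairs:
            grouped[name].append(count)
    return dict(grouped)


def reduce_counts(grouped):
    """Reduce step: sum each name's list of partial counts."""
    return {name: sum(counts) for name, counts in grouped.items()}


mapped = [map_partition(rows) for rows in partitions]
grouped = shuffle(mapped)
totals = reduce_counts(grouped)
print(grouped)
print(totals)
```

The grouped lists correspond to the inputs shown on the reduce slide, and the totals for "Bob" and "Scott" correspond to the results of the queries against the big combined table.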
  17. Hadoop DFS
  18. Hadoop Map Reduce • map (k1,v1) -> list(k2,v2) • reduce (k2,list(v2)) -> list(v2)
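The two signatures on this slide say that map turns one (k1, v1) pair into a list of (k2, v2) pairs, and reduce turns one k2 with the list of all its v2 values into a list of output values. A minimal in-memory sketch of that contract follows. It is not the Hadoop API: `run_map_reduce`, `name_mapper`, and `sum_reducer` are hypothetical names, and the sample rows are invented for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple


def run_map_reduce(
    records: Iterable[Tuple[object, object]],
    mapper: Callable[[object, object], List[Tuple[object, object]]],
    reducer: Callable[[object, List[object]], List[object]],
) -> Dict[object, List[object]]:
    """Apply mapper to every (k1, v1), group by k2, then apply reducer per k2."""
    groups = defaultdict(list)
    for k1, v1 in records:
        for k2, v2 in mapper(k1, v1):   # map (k1, v1) -> list(k2, v2)
            groups[k2].append(v2)       # grouping stands in for the shuffle phase
    # reduce (k2, list(v2)) -> list(v2)
    return {k2: reducer(k2, values) for k2, values in groups.items()}


def name_mapper(row_id, name):
    """Emit (name, 1) for each row; row_id is the input key and is ignored."""
    return [(name, 1)]


def sum_reducer(name, ones):
    """Collapse the list of 1s for a name into a single count."""
    return [sum(ones)]


# Assumed sample input: (id, name) rows like the tables in the map slide.
rows = [(1, "Scott"), (2, "Bob"), (3, "Lisa"), (4, "Sanjay"), (5, "Bob")]
result = run_map_reduce(rows, name_mapper, sum_reducer)
print(result)
```

In a real cluster the grouping step is distributed across machines; here it is a plain dictionary so that the shape of the two functions stays visible.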
  19. Hadoop Map Reduce
  20. Hadoop Ecosystem
  21. NoSQL: “No to SQL” OR “Not Only SQL”
  22. NoSQL Overview • NoSQL Models • Volatile Storage: Memcached, Ehcache • Persistent Storage: Key/Value Databases (Voldemort, Redis, Scalaris); Columnar Databases (HBase, Cassandra, Hypertable); Document Databases (MongoDB, CouchDB); Graph Databases (InfoGrid, Neo4j); Other Databases (Kyoto Cabinet, Berkeley DB)
  23. NoSQL Characteristics • Typical Benefits: Scalability; Availability; Near-Real-Time Performance; Modeling flexibility; Deployment flexibility • Auto-Sharding • Failover • Schema-less • Intelligent client • In-memory flush to disk • Dynamic clustering
  24. MPP: Massively Parallel Processing DW
  25. MPP/Columnar Stores • Oracle Exadata • IBM Netezza • Teradata • EMC Greenplum • HP Vertica • ParAccel • Microsoft SQL Server PDW
  26. (No text on this slide)
  27. Big Data: Microsoft
  28. NewSQL: New Generation DB
  29. NewSQL / Cloud DB • SimpleDB • DynamoDB • NuoDB • Tokutek • VoltDB • NimbusDB • Clustrix • Xeround
  30. ETL, BI & Reporting
  31. ETL, BI & Reporting • Hadoop/MPP/NoSQL support in: • Informatica, DataStage • Talend, Pentaho • MicroStrategy, SAS • Tableau, QlikView, Intellicus
  32. Big Data & Cloud
  33. Big Data & Cloud • Marriage made in heaven • Big data demands met by Cloud scalability • IaaS, PaaS and DaaS offerings • AWS: EMR, SimpleDB, RDS • Azure: SQL Server, Hadoop • Google
  34. Real Time Analytics
  35. Real Time Analytics • Storm • HStreaming, StreamBase • Microsoft StreamInsight • IBM Streams • Oracle SQLstream • Complex Event Processing engines: Esper etc.
  36. Big Data Impact on us
  37. Big Data Careers: Enhance OR Extend, NOT Replace • ETL Developers • Database Administrators • Database SQL Developers • Solution/Technical Architects • Data Scientists
  38. Some Big Data Careers • Hadoop/Hive Developers: Java, Hive • Hadoop Architects: Java, DW, ETL • Hadoop Administrators: Linux, Java • NoSQL Developers: Java/Python/Ruby • MPP DW Developers: SQL, Data Modeling • MPP DW Admin: Linux, SQL • Data Scientist: Machine Learning • Big Data Architect: Solution/Technical Architecture
  39. Typical Big Data Architecture
  40. Credits & Acknowledgements • Company Logos: Creative Commons/Company Copyrighted/Trademarked • Hadoop Elephant Images: Apache Trademarked • Cloudera.com, hadoop.apache.org, Oracle big data web site, Indeed.com, McKinsey report, dzone.org, microsoft.com • The awesome team of Big Data architects and practitioners at Impetus
  41. Thank You • Write to us at inquiry@impetus.com • Follow us on Twitter @impetustech
