NYC Meetup: Introduction to the Hadoop Ecosystem

  • Hi, my name is Abhijit Lele. I am a Solutions Engineer at Hortonworks. I help our customers understand and achieve their business and technical goals with Hadoop and the Big Data ecosystem in general.
  • So if we were to turn our original assumptions on their heads, we might be able to come up with an alternate set of rules that allows for a new way of thinking about large data stores.

    1. The Hadoop Ecosystem © Hortonworks Inc. 2012
    2. What is Big Data?
       • "Big" does not always have to mean petabytes
       • "Big" means big enough that traditional systems cannot handle the data efficiently
    3. Big Data Facts
       • Twitter generates 8 TB of data every day
       • eBay's data warehouse is 10+ PB
       • Facebook's data warehouse is 36+ PB
       • Yahoo! has 100+ PB of data
       • Google scans and indexes 500+ PB of data
    4. Data Types
       • Structured
         – Pre-defined schema
         – Example: relational database systems
       • Semi-structured
         – No rigid, pre-defined schema
         – Does not fit neatly into the rows and columns of a relational table
         – Examples: logs, tweets
       • Unstructured
         – Irregular structure, or no structure at all
         – Examples: free-form text, reports, customer feedback forms
    5. Characteristics of Big Data
       • Volume
       • Velocity
       • Variety
       • Value
    6. Problems with Legacy Solutions
       • Expensive
         – Scaling up costs a lot of money
       • Rigid
       • Stale data
    7. Hadoop Approach
       • Process data locally
       • Expect hardware failures
       • Handle failover elegantly
       • Duplicate each small piece of the data to a small group of nodes (instead of replicating the entire database)
    8. Comparison with RDBMS
    9. Hadoop Core Components
    10. Hadoop Cluster – Basic Configuration
    11. MapReduce in Action (diagram: logical and physical views)
    12. Hadoop Ecosystem (Hortonworks Data Platform)
        Diagram labels: Develop, Analyze, Visualize, Operate, Integrate
        • Scripting (Pig)
        • Query (Hive)
        • Data Extraction & Load (Sqoop, Talend, WebHDFS, WebHCatalog)
        • NoSQL Column DB (HBase)
        • Workflow & Scheduling (Oozie)
        • Management & Monitoring (Ambari, ZooKeeper)
        • Metadata Management (HCatalog)
        • Distributed Processing (MapReduce)
        • Distributed Storage (HDFS)
    13. What Next?
        1. Download Hortonworks Data Platform: hortonworks.com/download
        2. Use the getting started guide: hortonworks.com/get-started
        3. Learn more and get support
           Hortonworks Training (hortonworks.com/training)
           • Expert role-based training
           • Courses for admins, developers and operators
           • Certification program
           • Custom onsite options
           Hortonworks Support (hortonworks.com/support)
           • Full lifecycle technical support across four service levels
           • Delivered by Apache Hadoop experts/committers
           • Forward-compatible
    14. Hortonworks Support Subscriptions
        Objective: help organizations successfully develop and deploy solutions based on Apache Hadoop
        • Full-lifecycle technical support available
          – Developer support for design, development and POCs
          – Production support for staging and production environments
          – Up to 24x7 with 1-hour response times
        • Delivered by the Apache Hadoop experts
          – Backed by the development team that has released every major version of Apache Hadoop since 0.1
        • Forward-compatibility
          – Hortonworks' leadership role helps ensure that bug fixes and patches can be included in future versions of Hadoop projects
    15. Hortonworks Training
        Objective: help organizations overcome Hadoop knowledge gaps
        • Expert role-based training for developers, administrators & data analysts
          – Heavy emphasis on hands-on labs
          – Extensive schedule of public training courses available (hortonworks.com/training)
        • Comprehensive certification programs
        • Customized, on-site courses available
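
Slide 11 ("MapReduce in Action") originally showed the logical and physical flow of a job as a diagram that did not survive extraction. The logical flow can be sketched in plain Python, assuming the classic word-count job as the example (it is not taken from the slides) and simulating the map, shuffle and reduce phases in a single process. The function names are hypothetical and are not part of Hadoop's API; a real job would implement Hadoop's Java `Mapper` and `Reducer` classes and run across a cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Hypothetical mapper: emit (word, 1) for every whitespace-separated word."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort: group all intermediate values by key, sorted by key,
    which the framework does between the map and reduce phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Hypothetical reducer: combine all values for one key into one record."""
    return key, sum(values)


def word_count(splits: List[List[str]]) -> Dict[str, int]:
    """Run the simulated job. Each inner list stands in for one input split,
    which a real cluster would hand to a separate map task."""
    intermediate = [pair for split in splits for line in split for pair in map_phase(line)]
    grouped = shuffle_phase(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    splits = [["the quick brown fox", "the lazy dog"], ["the fox"]]
    print(word_count(splits))
    # {'brown': 1, 'dog': 1, 'fox': 2, 'lazy': 1, 'quick': 1, 'the': 3}
```

The shuffle step is the part the framework performs on the programmer's behalf: it gathers every value emitted for the same key so that a single reducer call sees all of them, which is why the mapper and reducer can stay this small.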
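
Slide 7 ("Hadoop Approach") lists the ideas behind HDFS storage: expect hardware failures, and copy each small piece of data to a few nodes instead of duplicating the whole database. A minimal sketch follows, assuming a 128 MB block size (the Hadoop 2.x default; Hadoop 1.x, current when these slides were written, defaulted to 64 MB) and HDFS's default replication factor of 3. The round-robin placement and all function names are hypothetical simplifications; real HDFS places replicas with a rack-aware policy decided by the NameNode.

```python
from typing import Dict, List


def split_into_blocks(file_size: int, block_size: int) -> List[int]:
    """Return the size of each block a file of `file_size` bytes is cut into."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def place_replicas(num_blocks: int, nodes: List[str], replication: int) -> Dict[int, List[str]]:
    """Assign each block to `replication` distinct nodes, round-robin.
    Simplification: not HDFS's actual rack-aware placement policy."""
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    return {
        block: [nodes[(block + i) % len(nodes)] for i in range(replication)]
        for block in range(num_blocks)
    }


def readable_after_failure(placement: Dict[int, List[str]], failed: str) -> bool:
    """True if every block still has at least one replica on a surviving node."""
    return all(any(node != failed for node in replicas) for replicas in placement.values())


if __name__ == "__main__":
    MB = 1024 * 1024
    blocks = split_into_blocks(file_size=300 * MB, block_size=128 * MB)
    nodes = ["node1", "node2", "node3", "node4"]
    placement = place_replicas(len(blocks), nodes, replication=3)
    print([b // MB for b in blocks])                    # [128, 128, 44]
    print(placement)
    print(readable_after_failure(placement, "node2"))   # True
```

Because each block lives on three nodes, a map task for block 0 could be scheduled on node1, node2 or node3 and read its input from local disk, which is the "process data locally" idea; losing any single node still leaves at least two copies of every block.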
