Boston HUG - Cloudera presentation

Published in: Technology

  • Financial services companies are realizing that they need to understand the fundamental risk in their customer base. All of a bank's working capital originates with customers. Being able to better predict fluctuations can help them optimize how they put that capital to work.
  • Much of the discussion about brands today happens on social media. This not only affects how the company is perceived but can directly influence relationships with customers and the ability to sell. Hadoop is a natural solution for gathering and contextualizing discussions about company brands and products.

    1. PROFIT FROM ALL OF YOUR DATA
       February 2012
       Hadoop in the Enterprise
       Adam Smieszny | Systems Engineer
    2. Agenda
       • Hadoop Overview
       • History of Hadoop
       • What is Hadoop
       • Hadoop in the Enterprise
       ©2011 Cloudera, Inc. All Rights Reserved.
    3. Existing Data Management
       Current Database Solutions are designed for structured data.
       • Optimized to answer known questions quickly
       • Schemas dictate form/context
       • Difficult to adapt to new data types and new questions
       • Expensive at Petabyte scale
       [Chart: gigabytes of data created (in billions), 2005 to 2015, structured vs. unstructured data; chart label "10%"]
    4. Why the Need for Hadoop?
       1.8 trillion gigabytes of data was created in 2011…
       • More than 90% is unstructured data
       • Approx. 500 quadrillion files
       • Quantity doubles every 2 years
       Drivers: More Content, More Devices, New & Better Info, New Sources
       [Chart: gigabytes of data created (in billions), 2005 to 2015, structured vs. unstructured data]
       Source: IDC 2011
    5. The Origins of Hadoop
       [Timeline, 2002 to 2012; the organizations behind most events appeared as logos and are not in the transcript]
       • Open source web crawler project created by Doug Cutting
       • Publishes MapReduce and GFS Paper
       • Open Source MapReduce and HDFS project created by Doug Cutting
       • Runs 4,000-node Hadoop cluster
       • Hadoop wins Terabyte sort benchmark
       • Launches SQL support for Hadoop
       • Releases CDH and Cloudera Enterprise
    6. What is Apache Hadoop?
       Hadoop is a platform for data storage and processing that is…
       • Scalable
       • Fault tolerant
       • Open source
       CORE HADOOP COMPONENTS
       • Hadoop Distributed File System (HDFS): File Sharing & Data Protection Across Physical Servers
       • MapReduce: Distributed Computing Across Physical Servers
       Flexibility
       • A single repository for storing, processing & analyzing any type of data
       • Not bound by a single schema
       Scalability
       • Scale-out architecture divides workloads across multiple nodes
       • Flexible file system eliminates ETL bottlenecks
       Low Cost
       • Can be deployed on commodity hardware
       • Open source platform guards against vendor lock
    7. What is CDH?
       Cloudera’s Distribution Including Apache Hadoop (CDH) is an enterprise-ready distribution of Hadoop that is…
       • 100% Apache open source
       • Contains all components needed for deployment
       • Fully documented and supported
       • Released on a reliable schedule
       Fastest Path to Success
       • No need to write your own scripts or do integration testing on different components
       • Works with a wide range of operating systems, hardware, databases and data warehouses
       Stable and Reliable
       • Extensive Cloudera QA systems, software & processes
       • Tested & run in production at scale
       • Proven at scale in dozens of enterprise environments
       Community Driven
       • Incorporates only main-line components from the Apache Hadoop ecosystem – no forks or proprietary underpinnings
       • FREE
    8. CDH & Enterprise Ecosystem
       • File System Mount: FUSE-DFS
       • UI Framework: HUE
       • SDK: HUE SDK
       • Workflow: APACHE OOZIE
       • Scheduling: APACHE OOZIE
       • Metadata: APACHE HIVE
       • Languages / Compilers: APACHE PIG, APACHE HIVE
       • Data Integration: APACHE FLUME, APACHE SQOOP
       • Fast Read/Write Access: APACHE HBASE
       • Coordination: APACHE ZOOKEEPER
       Diagram annotations: Drivers, language enhancements, testing; Sqoop framework, more adapters coming…; Packaging, testing
    9. Hadoop / RDBMS Use Cases
       • Unstructured data: create context (classification, text mining); analyze
       • Semi-structured data: parse, aggregate; analyze, report
       • Structured data: active archival, long running queries; analyze, report
       Slide borrowed from Krishnan Parasuraman presentation at Enzee’11
    10. Hadoop in Production
       How Apache Hadoop fits into your existing infrastructure.
       • OPERATORS: Management Tools
       • ENGINEERS: IDEs
       • ANALYSTS: BI / Analytics
       • BUSINESS USERS: Enterprise Reporting
       • CUSTOMERS: Web Application
       Connected systems: Enterprise Data Warehouse, Low-Latency Serving Systems
       Data sources: Logs, Files, Web Data, Relational Data
    11. Hadoop Use Cases
       Industry: ADVANCED ANALYTICS use case / DATA PROCESSING use case
       • Web: Social Network Analysis / Clickstream Sessionization
       • Media: Content Optimization / Clickstream Sessionization
       • Telco: Network Analytics / Mediation
       • Retail: Loyalty & Promotions Analysis / Data Factory
       • Financial: Fraud Analysis / Trade Reconciliation
       • Federal: Entity Analysis / SIGINT
       • Bioinformatics: Sequencing Analysis / Genome Mapping
    12. Use Case: Customer Risk
       • Build comprehensive data picture of customer side risk
       • Publish a consolidated set of attributes for analysis
       • Map ratings across products
       • Parse and aggregate data from different sources
       • Credit and debit cards, product payments, deposits and savings
       • Banking activity, browsing behavior, call logs, e-mails and chats
       • Merge data into a single view
       • A “fuzzy join” among data sources
       • Structure and normalize attributes
       • Sentiment analysis, pattern recognition
    13. Use Case: Sentiment Analysis
       • Internet generates a lot of chatter about brands
       • Understanding what’s being said is crucial to protecting brand value
       • Facebook, Twitter generate a lot of data for a global top brand
       • Capturing and processing direct feedback
       • Better engagement and alerting via Sentiment Analysis
       • Not yet ready for fully automated customer service
       • Hadoop handles the diverse data types and processing
       • Sources of data changing and semantics continuously evolving
       • Sophistication of algorithms is improving daily
    14. Journey of CDH Users
       Discover the Benefits of Apache Hadoop
       • Gain the flexibility to store and mine all types of data
       • Leverage the scale-out architecture for complex data analysis
       • Easily scale to meet growing data requirements
       • Avoid vendor lock-in with an open source technology
       Deploy CDH
       • The fastest, surest path to success with Apache Hadoop
       • Stable, reliable version of Apache Hadoop without the vendor lock-in imposed by proprietary vendors
       • Integrates with your other technology platforms ensuring investment protection
       Subscribe to Cloudera Enterprise
       • Simplify and accelerate Apache Hadoop deployment
       • Reduce adoption costs and risks
       • More effectively manage cluster resources
       • Leverage the experience of our experts
    15. Get Hadoop
       http://www.cloudera.com/hadoop/
       cloudera.com
       twitter.com/cloudera
       facebook.com/cloudera
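
Slide 6 describes MapReduce as distributed computing across physical servers. As a rough illustration of the programming model only, here is a minimal single-process sketch in Python of a word count split into map, shuffle, and reduce steps. It does not use the Hadoop API; the function names (map_phase, shuffle, reduce_phase, run_job) are hypothetical and chosen for this example. On a real cluster the map and reduce steps would run in parallel on many nodes, with input read from HDFS.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single count."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce in order, all in one process (illustrative only)."""
    mapped = chain.from_iterable(map_phase(record) for record in records)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    lines = ["hadoop stores data", "Hadoop processes data"]
    print(run_job(lines))
```

Because each map call sees only one record and each reduce call sees only one key, the work can in principle be divided across nodes, which is the scale-out property the slides emphasize.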