Trends and usage of
Apache Hadoop
Eric Baldeschwieler
CEO Hortonworks
Twitter: @jeric14, @hortonworks



January 2012




© Hortonworks Inc. 2011           Page 1
Agenda
• Define terms
  – What is Hadoop? Why does Hadoop matter?


• What drives Hadoop adoption?

• Observed Trends




     Architecting the Future of Big Data
Hortonworks Vision


 We believe that by 2015, more than
    half the world's data will be
   processed by Apache Hadoop


                         How to achieve that vision?
                                 Enable an ecosystem around an
                                 enterprise-viable platform.




What is Apache Hadoop?
•  Solution for big data
    –  Deals with complexities of high
       volume, velocity & variety of data

•  Set of open source projects

•  Transforms commodity hardware
   into a service that:
    –  Stores petabytes of data reliably
    –  Allows huge distributed computations

•  Key attributes:
    –  Redundant and reliable (no data loss)
    –  Extremely powerful
    –  Batch processing centric
    –  Easy to program distributed apps
    –  Runs on commodity hardware

•  One of the best examples of open source driving innovation
   and creating a market



Hortonworks Data Platform (HDP)
Key Components of “Standard Hadoop” Open Source Stack


•  Core Apache Hadoop
    –  HDFS (Hadoop Distributed File System)
    –  MapReduce (Distributed Programming Framework)

•  Related Hadoop Projects
    –  Pig (Data Flow)
    –  Hive (SQL)
    –  HBase (Columnar NoSQL Store)
    –  Zookeeper (Coordination)
    –  HCatalog (Table & Schema Management)

•  Open APIs for:
    –  Data Integration
    –  Data Movement
    –  App Job Management
    –  System Management
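
The stack above names MapReduce as the distributed programming framework. As a rough illustration of the programming model only, here is a single-process Python sketch of a word count. It does not use the Hadoop API; the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and chosen for this example, and a real job would run the map and reduce steps in parallel across many machines.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group all values by key, standing in for the framework's shuffle."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce step: collapse all counts for one word into a total."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Run map, shuffle, and reduce in sequence on in-memory lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

Because each map call sees one line and each reduce call sees one key, both steps can in principle be spread over many machines, which is the property the "huge distributed computations" bullet relies on.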




Big Data Trailblazers and Use Cases


•  Analyzing web logs
•  Data analytics
•  Advertising optimization
•  Machine learning
•  Mail anti-spam
•  Text mining
•  Web search
•  Content optimization
•  Customer trend analysis
•  Ad selection
•  Video & audio processing
•  Data mining
•  User interest prediction
•  Social media




Yahoo!, Apache Hadoop & Hortonworks
http://www.wired.com/wiredenterprise/2011/10/how-yahoo-spawned-hadoop

      Yahoo! embraced Apache Hadoop, an open source platform, to
   crunch epic amounts of data using an army of dirt-cheap servers

                                         2006




                                  Hadoop at Yahoo!
                                    40K+ Servers
                                    170PB Storage
                                  5M+ Monthly Jobs
                                  1000+ Active Users



                                         2011




  Yahoo! spun off 22+ engineers into Hortonworks, a company focused on
    advancing open source Apache Hadoop for the broader market

What drives Hadoop adoption?




Market Drivers for Apache Hadoop
• Business drivers
  – High-value projects that require use of more data
  – Belief that there is great ROI in mastering big data

• Financial drivers
  – Growing cost of data systems as percentage of IT spend
  – Cost advantage of commodity hardware + open source
  – Enables departmental-level big data strategies

• Technical drivers
  – Existing solutions failing under growing requirements
       – 3Vs - Volume, velocity, variety
  – Proliferation of unstructured data

• Gartner predicts 800% data growth over next 5 years
• 80-90% of data produced today is unstructured

Every Market has Big Data
       Digital data is personal, everywhere, increasingly
      accessible, and will continue to grow exponentially




Source: McKinsey & Company report. Big data: The next frontier for innovation, competition, and productivity. May 2011.


Broader Use Case Opportunities
Financial Services                            Healthcare
•  Detect/prevent fraud                       •  Patient monitoring
•  Model and manage risk                      •  Predictive modeling
•  Personalize banking/insurance products     •  Compliance, Archival, text search
•  Compliance, Archival, …                    •  Data driven research
Retail                                        Web / Social / Mobile
•  Behavior analysis                          •  Sentiment analysis
•  Cross selling, recommendation engines      •  Web log, image, and video analysis
•  Optimize pricing, placement, design        •  Personalization
•  Optimize inventory and distribution        •  Billing, Reporting, Network Analysis

Manufacturing                                 Government
•  Simulation, Analysis, Design               •  Detect/prevent fraud
•  Improve service via product sensor data    •  Security & Intelligence
•  “Digital factory” for lean manufacturing   •  Support open data initiatives



Observed Trends




Trend: Agile Data
• The old way
  – Operational systems keep only current records, short history
  – Analytics systems keep only conformed / cleaned / digested data
  – Unstructured data locked away in operational silos
  – Archives offline
       – Inflexible, new questions require system redesigns

• The new trend
  – Keep raw data in Hadoop for a long time
  – Able to produce a new analytics view on-demand
  – Keep a new copy of data that was previously only in silos
  – Can directly do new reports, experiments at low incremental cost
  – New products / services can be added very quickly
  – Agile outcome justifies new infrastructure
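
The "keep raw data, produce a new view on demand" idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a Hadoop job: the log format (timestamp, user, path, status), the sample records in RAW_LOGS, and the function names are all hypothetical. The point is that a question asked later is answered from the same retained raw lines, with no reload or redesign.

```python
from collections import Counter

# Hypothetical raw web-log lines, kept unmodified: "timestamp user path status"
RAW_LOGS = [
    "2012-01-10T09:00:01 u1 /home 200",
    "2012-01-10T09:00:05 u2 /search 200",
    "2012-01-10T09:01:12 u1 /search 500",
]


def parse(line):
    """Apply structure at read time instead of at load time."""
    ts, user, path, status = line.split()
    return {"ts": ts, "user": user, "path": path, "status": int(status)}


def hits_per_path(raw_lines):
    """The view that was planned from the start."""
    return Counter(parse(line)["path"] for line in raw_lines)


def errors_per_user(raw_lines):
    """A view thought of later, computed from the same raw lines."""
    records = map(parse, raw_lines)
    return Counter(r["user"] for r in records if r["status"] >= 500)
```

Had only the per-path totals been stored, the per-user error view could not be derived afterwards, which is the inflexibility the "old way" bullets describe.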

Traditional Enterprise Data Architecture
  Data Silos

• Serving Applications
  – Web Serving, NoSQL, RDBMS, …
• Traditional ETL & Message buses
• Traditional Data Warehouses, BI & Analytics
  – EDW, Data Marts, BI / Analytics
• Unstructured Systems
  – Serving Logs, Social Media, Sensor Data, Text Systems, …
Agile Data Architecture w/Hadoop
  Connecting All of Your Big Data

• Serving Applications
  – Web Serving, NoSQL, RDBMS, …
• Traditional ETL & Message buses
• Traditional Data Warehouses, BI & Analytics
  – EDW, Data Marts, BI / Analytics
• Central layer of the diagram
  – EsTsL (s = Store)
  – Custom Analytics
• Unstructured Systems
  – Serving Logs, Social Media, Sensor Data, Text Systems, …
Trend: Data driven development
• Limited runtime logic driven by huge lookup tables

• Data computed offline on Hadoop
  – Machine learning, other expensive computation offline
  – Personalization, classification, fraud, value analysis…


• Application development requires data science
  – Huge amounts of actually observed data key to modern services
  – Hadoop used as the science platform
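
A minimal sketch of "limited runtime logic driven by lookup tables", assuming a click log of (user, category) pairs. The names build_interest_table and choose_module, the categories, and the "top_news" default are hypothetical. In practice the offline step would be a batch job on Hadoop over far more data; here it is an in-memory function so the split between the two steps stays visible.

```python
from collections import Counter, defaultdict


def build_interest_table(click_log):
    """Offline batch step: map each user to their most-clicked category."""
    counts = defaultdict(Counter)
    for user, category in click_log:
        counts[user][category] += 1
    # Keep only the top category per user; this table is what gets served.
    return {user: c.most_common(1)[0][0] for user, c in counts.items()}


def choose_module(user, table, default="top_news"):
    """Runtime step: one lookup, with a fallback for users not in the table."""
    return table.get(user, default)
```

For example, a log of two "sports" clicks and one "finance" click for user u1 yields a table entry of "sports" for u1, and a user absent from the table falls back to the default. All the expensive computation sits in the offline step, so the serving path stays simple.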




CASE STUDY
     YAHOO! HOMEPAGE

  •  Serving Maps
       –  Users - Interests
  •  Five Minute Production
  •  Weekly Categorization models

  •  SCIENCE HADOOP CLUSTER
       »  Machine learning to build ever better categorization models
       –  USER BEHAVIOR in, CATEGORIZATION MODELS out (weekly)
  •  PRODUCTION HADOOP CLUSTER
       »  Identify user interests using Categorization models
       –  USER BEHAVIOR in, SERVING MAPS out (every 5 minutes)
  •  SERVING SYSTEMS and ENGAGED USERS
       –  Build customized home pages with latest data (thousands / second)

Copyright Yahoo 2011
CASE STUDY
     YAHOO! HOMEPAGE


      Personalized
      for each visitor


      Result:
      twice the engagement

      Recommended links, News Interests, Top Searches

      •  +79% clicks vs. randomly selected
      •  +160% clicks vs. one size fits all
      •  +43% clicks vs. editor selected




Trend: Specialization of Data Systems
• Hadoop does not replace existing systems
  – It adds new capabilities to the enterprise
  – It can offload things that are not done efficiently in current systems
       – Especially in scale out situations


• Specialization of traditional data components
  – Use OLTP systems just for transactions
  – Use OLAP systems for interactive analysis


• Hadoop has LOTS of bandwidth to storage and CPU
  – Pull reporting out of OLTP systems
  – Pull ELT out of OLAP systems


Hadoop and OLTP Systems
• MPP Processing of Online Transactions
  – Mission critical
  – Manages transactions & serves reports

• Hadoop used to Process Reports
  – Free up 50+% processing power for transaction processing system
  – Significant cost savings due to commodity nature of Hadoop

[Diagram labels: Web Sites, Transaction Processing Systems ($$$), Transaction Logs, Reports]




Hadoop and OLAP Systems
• Hadoop: fast loading, raw data staging, ELT & long-term archival
  (The Agile Data Zone)
• EDW: allow analysts to use tools they know
  (Take advantage of huge ecosystem of BI and Analytics tooling)

[Diagram labels: Web, Mobile, Social, Other logs; Hadoop; Online Archival; EDW]


Trend: Instrument Clouds of Things
• Clouds of things logging to Hadoop
  – Websites
  – Mobile phones, Enterprise devices…
• (HDFS + Map-Reduce, or HBase) + Analysis

[Diagram: several clouds of "Things"]




Trend: Many POCs, Few Production Systems

• The problem
  – Hadoop is still a young technology
  – Hard to find knowledgeable staff
  – Integration with existing systems


• Hadoop market is maturing at speed
  – Emerging ecosystem of Hadoop platform solutions providers
  – Apache Hadoop continues to get better
  – Hadoop training and support available from several vendors




Growth in Hadoop Ecosystem
• Hardware vendors, Public Cloud (IaaS, PaaS)
  – Storage, Appliances, Preloaded commodity boxes, cloud

• Data Systems
  – All the major vendors announced Hadoop plans / products in 2011

• BI, Analytics and ETL
  – Hadoop integrations emerging

• Dedicated Hadoop Applications
  – Datameer, Karmasphere, Platfora, …

• Systems Integrators
  – Regional and Global providers available

Hadoop Continues to Improve
Apache community, including Hortonworks, is investing to improve Hadoop:
•  Make Hadoop an Open, Extensible, and Enterprise Viable Platform
•  Enable More Applications to Run on Apache Hadoop

•  “Hadoop.Now” (Hadoop 1.0)
   –  Most stable version ever
   –  HBase, security, WebHDFS
•  “Hadoop.Next” (Hadoop 0.23)
   –  HA, Next-gen HDFS & MapReduce
   –  Extension & Integration APIs
•  “Hadoop.Beyond”
   –  Platform actively evolving




Hortonworks – Approachable Hadoop
•  Apache Hadoop Leadership
   –  Delivered every major release since 0.1
   –  Driving innovation across entire stack
   –  Experience managing world’s largest deployment
   –  Access to Yahoo’s 1,000+ Hadoop users and 40k+ nodes for testing, QA, etc.

•  Business Focus
   –  Provide 100% open source product
        –  Hortonworks Data Platform
   –  Help customers and partners overcome Hadoop knowledge gaps
   –  Help organizations successfully develop and deploy solutions based on Hadoop

•  Expert Role-based Training
•  Full Lifecycle Support and Services
   –  Evaluate, Pilot, Production


Trend: Finding More Value Over Time
• Hadoop is usually brought in to solve a specific
  problem
  – Build search indexes for Yahoo
  – Manage web site logs for Facebook
  – Users using EC2 to do data processing at Amazon
  – Simple reporting when existing tools don’t scale


• Once your data is in Hadoop more users find value

• Once you have Hadoop, folks add more data




Thank You! Questions?
Eric Baldeschwieler
@jeric14 @hortonworks





More Related Content

What's hot

Data Analytics Meetup: Introduction to Azure Data Lake Storage
Data Analytics Meetup: Introduction to Azure Data Lake Storage Data Analytics Meetup: Introduction to Azure Data Lake Storage
Data Analytics Meetup: Introduction to Azure Data Lake Storage
CCG
 
Ozone: scaling HDFS to trillions of objects
Ozone: scaling HDFS to trillions of objectsOzone: scaling HDFS to trillions of objects
Ozone: scaling HDFS to trillions of objects
DataWorks Summit
 
Data Lake Overview
Data Lake OverviewData Lake Overview
Data Lake Overview
James Serra
 
Sharding
ShardingSharding
Sharding
MongoDB
 
Data ingestion and distribution with apache NiFi
Data ingestion and distribution with apache NiFiData ingestion and distribution with apache NiFi
Data ingestion and distribution with apache NiFi
Lev Brailovskiy
 
File Format Benchmarks - Avro, JSON, ORC, & Parquet
File Format Benchmarks - Avro, JSON, ORC, & ParquetFile Format Benchmarks - Avro, JSON, ORC, & Parquet
File Format Benchmarks - Avro, JSON, ORC, & Parquet
Owen O'Malley
 
Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing
DataWorks Summit
 
Modern Data architecture Design
Modern Data architecture DesignModern Data architecture Design
Modern Data architecture Design
Kujambu Murugesan
 
Schema-on-Read vs Schema-on-Write
Schema-on-Read vs Schema-on-WriteSchema-on-Read vs Schema-on-Write
Schema-on-Read vs Schema-on-Write
Amr Awadallah
 
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
DataWorks Summit/Hadoop Summit
 
Hive: Loading Data
Hive: Loading DataHive: Loading Data
Hive: Loading Data
Benjamin Leonhardi
 
Managing your Hadoop Clusters with Apache Ambari
Managing your Hadoop Clusters with Apache AmbariManaging your Hadoop Clusters with Apache Ambari
Managing your Hadoop Clusters with Apache Ambari
DataWorks Summit
 
Databricks Platform.pptx
Databricks Platform.pptxDatabricks Platform.pptx
Databricks Platform.pptx
Alex Ivy
 
A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0 A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0
DataWorks Summit
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)
James Serra
 
MongoDB
MongoDBMongoDB
MongoDB
nikhil2807
 
How to boost your datamanagement with Dremio ?
How to boost your datamanagement with Dremio ?How to boost your datamanagement with Dremio ?
How to boost your datamanagement with Dremio ?
Vincent Terrasi
 
Making Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta LakeMaking Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta Lake
Databricks
 
Hotels.com’s Journey to Becoming an Algorithmic Business… Exponential Growth ...
Hotels.com’s Journey to Becoming an Algorithmic Business… Exponential Growth ...Hotels.com’s Journey to Becoming an Algorithmic Business… Exponential Growth ...
Hotels.com’s Journey to Becoming an Algorithmic Business… Exponential Growth ...
Databricks
 
(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per Second(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per Second
Amazon Web Services
 

What's hot (20)

Data Analytics Meetup: Introduction to Azure Data Lake Storage
Data Analytics Meetup: Introduction to Azure Data Lake Storage Data Analytics Meetup: Introduction to Azure Data Lake Storage
Data Analytics Meetup: Introduction to Azure Data Lake Storage
 
Ozone: scaling HDFS to trillions of objects
Ozone: scaling HDFS to trillions of objectsOzone: scaling HDFS to trillions of objects
Ozone: scaling HDFS to trillions of objects
 
Data Lake Overview
Data Lake OverviewData Lake Overview
Data Lake Overview
 
Sharding
ShardingSharding
Sharding
 
Data ingestion and distribution with apache NiFi
Data ingestion and distribution with apache NiFiData ingestion and distribution with apache NiFi
Data ingestion and distribution with apache NiFi
 
File Format Benchmarks - Avro, JSON, ORC, & Parquet
File Format Benchmarks - Avro, JSON, ORC, & ParquetFile Format Benchmarks - Avro, JSON, ORC, & Parquet
File Format Benchmarks - Avro, JSON, ORC, & Parquet
 
Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing
 
Modern Data architecture Design
Modern Data architecture DesignModern Data architecture Design
Modern Data architecture Design
 
Schema-on-Read vs Schema-on-Write
Schema-on-Read vs Schema-on-WriteSchema-on-Read vs Schema-on-Write
Schema-on-Read vs Schema-on-Write
 
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
 
Hive: Loading Data
Hive: Loading DataHive: Loading Data
Hive: Loading Data
 
Managing your Hadoop Clusters with Apache Ambari
Managing your Hadoop Clusters with Apache AmbariManaging your Hadoop Clusters with Apache Ambari
Managing your Hadoop Clusters with Apache Ambari
 
Databricks Platform.pptx
Databricks Platform.pptxDatabricks Platform.pptx
Databricks Platform.pptx
 
A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0 A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)
 
MongoDB
MongoDBMongoDB
MongoDB
 
How to boost your datamanagement with Dremio ?
How to boost your datamanagement with Dremio ?How to boost your datamanagement with Dremio ?
How to boost your datamanagement with Dremio ?
 
Making Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta LakeMaking Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta Lake
 
Hotels.com’s Journey to Becoming an Algorithmic Business… Exponential Growth ...
Hotels.com’s Journey to Becoming an Algorithmic Business… Exponential Growth ...Hotels.com’s Journey to Becoming an Algorithmic Business… Exponential Growth ...
Hotels.com’s Journey to Becoming an Algorithmic Business… Exponential Growth ...
 
(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per Second(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per Second
 

Similar to Hadoop Trends

Apache hadoop bigdata-in-banking
Apache hadoop bigdata-in-bankingApache hadoop bigdata-in-banking
Apache hadoop bigdata-in-banking
m_hepburn
 
Keynote from ApacheCon NA 2011
Keynote from ApacheCon NA 2011Keynote from ApacheCon NA 2011
Keynote from ApacheCon NA 2011
Hortonworks
 
Hadoop - Now, Next and Beyond
Hadoop - Now, Next and BeyondHadoop - Now, Next and Beyond
Hadoop - Now, Next and Beyond
Teradata Aster
 
Hadoop as data refinery
Hadoop as data refineryHadoop as data refinery
Hadoop as data refinery
Steve Loughran
 
Hadoop as Data Refinery - Steve Loughran
Hadoop as Data Refinery - Steve LoughranHadoop as Data Refinery - Steve Loughran
Hadoop as Data Refinery - Steve Loughran
JAX London
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
POSSCON
 
Introduction to Hortonworks Data Platform for Windows
Introduction to Hortonworks Data Platform for WindowsIntroduction to Hortonworks Data Platform for Windows
Introduction to Hortonworks Data Platform for Windows
Hortonworks
 
Supporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big DataSupporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big Data
Hortonworks
 
Create a Smarter Data Lake with HP Haven and Apache Hadoop
Create a Smarter Data Lake with HP Haven and Apache HadoopCreate a Smarter Data Lake with HP Haven and Apache Hadoop
Create a Smarter Data Lake with HP Haven and Apache Hadoop
Hortonworks
 
Why hadoop for data science?
Why hadoop for data science?Why hadoop for data science?
Why hadoop for data science?
Hortonworks
 
Enterprise Apache Hadoop: State of the Union
Enterprise Apache Hadoop: State of the UnionEnterprise Apache Hadoop: State of the Union
Enterprise Apache Hadoop: State of the Union
Hortonworks
 
Hadoop for shanghai dev meetup
Hadoop for shanghai dev meetupHadoop for shanghai dev meetup
Hadoop for shanghai dev meetup
Roby Chen
 
Hortonworks Data Platform for Systems Integrators Webinar 9-5-2012.pptx
Hortonworks Data Platform for Systems Integrators Webinar 9-5-2012.pptxHortonworks Data Platform for Systems Integrators Webinar 9-5-2012.pptx
Hortonworks Data Platform for Systems Integrators Webinar 9-5-2012.pptx
Hortonworks
 
Building a Modern Data Architecture with Enterprise Hadoop
Building a Modern Data Architecture with Enterprise HadoopBuilding a Modern Data Architecture with Enterprise Hadoop
Building a Modern Data Architecture with Enterprise Hadoop
Slim Baltagi
 
Rescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
Rescue your Big Data from Downtime with HP Operations Bridge and Apache HadoopRescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
Rescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
Hortonworks
 
201305 hadoop jpl-v3
201305 hadoop jpl-v3201305 hadoop jpl-v3
201305 hadoop jpl-v3
Eric Baldeschwieler
 
Hortonworks and Platfora in Financial Services - Webinar
Hortonworks and Platfora in Financial Services - WebinarHortonworks and Platfora in Financial Services - Webinar
Hortonworks and Platfora in Financial Services - Webinar
Hortonworks
 
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Innovative Management Services
 
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big DataCombine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
Hortonworks
 
Hortonworks and Red Hat Webinar_Sept.3rd_Part 1
Hortonworks and Red Hat Webinar_Sept.3rd_Part 1Hortonworks and Red Hat Webinar_Sept.3rd_Part 1
Hortonworks and Red Hat Webinar_Sept.3rd_Part 1
Hortonworks
 

Hadoop Trends

  • 1. Trends and usage of Apache Hadoop Eric Baldeschwieler CEO Hortonworks Twitter: @jeric14, @hortonworks January 2012 © Hortonworks Inc. 2011 Page 1
  • 2. Agenda • Define terms – What is Hadoop? Why does Hadoop matter? • What drives Hadoop adoption? • Observed Trends Architecting the Future of Big Data
  • 3. Hortonworks Vision: We believe that by 2015, more than half the world's data will be processed by Apache Hadoop. How to achieve that vision??? Enable ecosystem around enterprise-viable platform.
  • 4. What is Apache Hadoop? • Solution for big data – Deals with complexities of high volume, velocity & variety of data • Set of open source projects • Transforms commodity hardware into a service that: – Stores petabytes of data reliably – Allows huge distributed computations • Key attributes: – Redundant and reliable (no data loss) – Extremely powerful – Batch processing centric – Easy to program distributed apps – Runs on commodity hardware • One of the best examples of open source driving innovation and creating a market
  • 5. Hortonworks Data Platform (HDP) Key Components of “Standard Hadoop” Open Source Stack • Core Apache Hadoop: HDFS (Hadoop Distributed File System), MapReduce (Distributed Programming Framework) • Related Hadoop Projects: Pig (Data Flow), Hive (SQL), HBase (Columnar NoSQL Store), Zookeeper (Coordination), HCatalog (Table & Schema Management) • Open APIs for: Data Integration, Data Movement, App Job Management, System Management
  • 6. Big Data Trailblazers and Use Cases: data analytics, analyzing web logs, advertising optimization, machine learning, mail anti-spam, text mining, web search, content optimization, customer trend analysis, ad selection, video & audio processing, data mining, user interest prediction, social media
  • 7. Yahoo!, Apache Hadoop & Hortonworks http://www.wired.com/wiredenterprise/2011/10/how-yahoo-spawned-hadoop • 2006: Yahoo! embraced Apache Hadoop, an open source platform, to crunch epic amounts of data using an army of dirt-cheap servers • Hadoop at Yahoo!: 40K+ Servers, 170PB Storage, 5M+ Monthly Jobs, 1000+ Active Users • 2011: Yahoo! spun off 22+ engineers into Hortonworks, a company focused on advancing open source Apache Hadoop for the broader market
  • 8. What drives Hadoop adoption?
  • 9. Market Drivers for Apache Hadoop • Business drivers – High-value projects that require use of more data – Belief that there is great ROI in mastering big data • Financial drivers – Growing cost of data systems as percentage of IT spend – Cost advantage of commodity hardware + open source – Enables departmental-level big data strategies • Technical drivers – Existing solutions failing under growing requirements – 3Vs - Volume, velocity, variety – Proliferation of unstructured data • Gartner predicts 800% data growth over next 5 years • 80-90% of data produced today is unstructured
  • 10. Every Market has Big Data: Digital data is personal, everywhere, increasingly accessible, and will continue to grow exponentially. Source: McKinsey & Company report. Big data: The next frontier for innovation, competition, and productivity. May 2011.
  • 11. Broader Use Case Opportunities • Financial Services – Detect/prevent fraud – Model and manage risk – Personalize banking/insurance products – Compliance, Archival, … • Healthcare – Patient monitoring – Predictive modeling – Compliance, Archival, text search – Data driven research • Retail – Behavior analysis – Cross selling, recommendation engines – Optimize pricing, placement, design – Optimize inventory and distribution • Web / Social / Mobile – Sentiment analysis – Web log, image, and video analysis – Personalization – Billing, Reporting, Network Analysis • Manufacturing – Simulation, Analysis, Design – Improve service via product sensor data – “Digital factory” for lean manufacturing • Government – Detect/prevent fraud – Security & Intelligence – Support open data initiatives
  • 12. Observed Trends
  • 13. Trend: Agile Data • The old way – Operational systems keep only current records, short history – Analytics systems keep only conformed / cleaned / digested data – Unstructured data locked away in operational silos – Archives offline – Inflexible, new questions require system redesigns • The new trend – Keep raw data in Hadoop for a long time – Able to produce a new analytics view on-demand – Keep a new copy of data that was previously only in silos – Can directly do new reports, experiments at low incremental cost – New products / services can be added very quickly – Agile outcome justifies new infrastructure
  • 14. Traditional Enterprise Data Architecture: Data Silos [Diagram labels: Serving Applications (Web Serving, NoSQL, RDBMS, …); Unstructured Systems (Serving Logs, Social Media, Sensor Data, Text Systems, …); Traditional ETL & Message buses; Traditional Data Warehouses, BI & Analytics (EDW, Data Marts, BI / Analytics)]
  • 15. Agile Data Architecture w/Hadoop: Connecting All of Your Big Data [Diagram labels: Serving Applications (Web Serving, NoSQL, RDBMS, …); Unstructured Systems (Serving Logs, Social Media, Sensor Data, Text Systems, …); Traditional ETL & Message buses; EsTsL (s = Store); Custom Analytics; Traditional Data Warehouses, BI & Analytics (EDW, Data Marts, BI / Analytics)]
  • 16. Trend: Data driven development • Limited runtime logic driven by huge lookup tables • Data computed offline on Hadoop – Machine learning, other expensive computation offline – Personalization, classification, fraud, value analysis… • Application development requires data science – Huge amounts of actually observed data key to modern services – Hadoop used as the science platform
  • 17. CASE STUDY: YAHOO! HOMEPAGE • Science Hadoop cluster: machine learning on user behavior to build ever better categorization models (weekly) • Production Hadoop cluster: identify user interests using categorization models, turning user behavior into serving maps (every 5 minutes) • Serving systems: build customized home pages with latest data (thousands / second) for engaged users • Serving Maps • Users - Interests • Five Minute Production • Weekly Categorization models Copyright Yahoo 2011
  • 18. CASE STUDY: YAHOO! HOMEPAGE Personalized for each visitor. Result: twice the engagement • Recommended links: +79% clicks vs. randomly selected • News Interests: +160% clicks vs. one size fits all • Top Searches: +43% clicks vs. editor selected Copyright Yahoo 2011
  • 19. Trend: Specialization of Data Systems • Hadoop does not replace existing systems – It adds new capabilities to the enterprise – It can offload things that are not done efficiently in current systems – Especially in scale out situations • Specialization of traditional data components – Use OLTP systems just for transactions – Use OLAP systems for interactive analysis • Hadoop has LOTS of bandwidth to storage and CPU – Pull reporting out of OLTP systems – Pull ELT out of OLAP systems
  • 20. Hadoop and OLTP Systems • MPP Processing of Online Transactions – Mission critical – Manages transactions & serves reports • Hadoop used to Process Reports – Free up 50+% processing power for transaction processing system – Significant cost savings due to commodity nature of Hadoop [Diagram labels: Web Sites; Transaction Processing Systems ($$$); Transaction Logs; Reports]
  • 21. Hadoop and OLAP Systems • Hadoop: Fast loading, raw data staging, ELT & long-term archival (The Agile Data Zone) • EDW: Allow analysts to use tools they know (Take advantage of huge ecosystem of BI and Analytics tooling) [Diagram labels: Web, Mobile, Social, Other logs; Hadoop; Online Archival; EDW]
  • 22. TRENDS: Instrument Clouds of Things • Clouds of things logging to Hadoop: Websites, Mobile phones, Enterprise devices… • HDFS + Map-Reduce or HBase + Analysis
  • 23. Trend: Many POCs, Few Production Systems
      • The problem
        – Hadoop is still a young technology
        – Hard to find knowledgeable staff
        – Integration with existing systems
      • Hadoop market is maturing at speed
        – Emerging ecosystem of Hadoop platform solutions providers
        – Apache Hadoop continues to get better
        – Hadoop training and support available from several vendors
  • 24. Growth in Hadoop Ecosystem
      • Hardware vendors, Public Cloud (IaaS, PaaS)
        – Storage, appliances, preloaded commodity boxes, cloud
      • Data Systems
        – All the major vendors announced Hadoop plans / products in 2011
      • BI, Analytics and ETL
        – Hadoop integrations emerging
      • Dedicated Hadoop Applications
        – Datameer, Karmasphere, Platfora, …
      • Systems Integrators
        – Regional and global providers available
  • 25. Hadoop Continues to Improve
      Apache community, including Hortonworks, investing to improve Hadoop:
      • Make Hadoop an Open, Extensible, and Enterprise Viable Platform
      • Enable More Applications to Run on Apache Hadoop
      Roadmap:
      • “Hadoop.Now” (Hadoop 1.0): most stable version ever; HBase, security, WebHDFS
      • “Hadoop.Next” (Hadoop 0.23): HA, next-gen HDFS & MapReduce; extension & integration APIs
      • “Hadoop.Beyond”: platform actively evolving
  • 26. Hortonworks – Approachable Hadoop
      • Apache Hadoop Leadership
        – Delivered every major release since 0.1
        – Driving innovation across entire stack
        – Experience managing world’s largest deployment
        – Access to Yahoo’s 1,000+ Hadoop users and 40k+ nodes for testing, QA, etc.
      • Business Focus
        – Provide 100% open source product: Hortonworks Data Platform
        – Expert role-based training: help customers and partners overcome Hadoop knowledge gaps
        – Full lifecycle support and services: help organizations successfully develop and deploy solutions based on Hadoop
      [Lifecycle stages: Evaluate, Pilot, Production]
  • 27. Trend: Finding More Value Over Time
      • Hadoop is usually brought in to solve a specific problem
        – Build search indexes for Yahoo
        – Manage web site logs for Facebook
        – Users using EC2 to do data processing at Amazon
        – Simple reporting when existing tools don’t scale
      • Once your data is in Hadoop, more users find value
      • Once you have Hadoop, folks add more data
  • 28. Thank You! Questions? Eric Baldeschwieler @jeric14 @hortonworks