Powering Next-Generation Data Architectures with Apache Hadoop
Shaun Connolly, Hortonworks
@shaunconnolly

September 25, 2012




© Hortonworks Inc. 2012        Page 1
Big Data: Changing The Game for Organizations

[Figure: data volume (Megabytes: ERP; Gigabytes: CRM; Terabytes: Web; Petabytes: Big Data) versus increasing data variety and complexity, with examples ranging from purchase and payment records through web logs, A/B testing and click streams to social feeds, sensor/RFID data, GPS coordinates and HD video, audio and images. Transactions + Interactions + Observations = BIG DATA.]


Connecting Transactions + Interactions + Observations
[Figure: multi-structured sources (audio, video, images; docs, text, XML; web logs, clicks; social, graph, feeds; sensors, devices, RFID; spatial, GPS; events, other) feed a Big Data Platform, which connects to Business Transactions & Interactions (Web, Mobile, CRM, ERP, SCM, …) and to Business Intelligence & Analytics (dashboards, reports, visualization, …).]
1. Classic ETL processing
2. Capture and exchange multi-structured data to unlock value
3. Deliver refined data and runtime models
4. Retain runtime models and historical data for ongoing refinement & analysis
5. Retain historical data to unlock additional value

Goal: Optimize Outcomes at Scale
•  Media: optimize content
•  Intelligence: optimize detection
•  Finance: optimize algorithms
•  Advertising: optimize performance
•  Fraud: optimize prevention
•  Retail / Wholesale: optimize inventory turns
•  Manufacturing: optimize supply chains
•  Healthcare: optimize patient outcomes
•  Education: optimize learning outcomes
•  Government: optimize citizen services
Source: Geoffrey Moore, Hadoop Summit 2012 keynote presentation.

Customer: UC Irvine Medical Center
Optimizing patient outcomes while lowering costs
•  UC Irvine Medical Center is ranked among the nation's best hospitals by U.S. News & World Report for the 12th year
•  More than 400 specialty and primary care physicians
•  Opened in 1976
•  422-bed medical facility

The current system, Epic, holds 22 years of patient data across admissions and clinical information
   –  Significant cost to maintain and run the system
   –  Difficult to access; stand-alone and not integrated with any other system

Apache Hadoop sunsets the legacy system and augments new electronic medical records (EMR)
   1.  Migrate all legacy Epic data to Apache Hadoop
       –  Replaced existing ETL and temporary databases with Hadoop, resulting in faster, more reliable transforms
       –  Captures all legacy data, not just a subset, and exposes it to the EMR and other applications
   2.  Eliminate maintenance of the legacy system and database licenses
       –  $500K in annual savings
   3.  Integrate data with the EMR and clinical front end
       –  Better service, with the complete patient history provided to admissions and doctors
       –  Improved research enabled by complete information

Emerging Patterns of Use

Big Data
Transactions + Interactions + Observations

Refine          Explore          Enrich

$ Business Case $

Operational Data Refinery
Hadoop as platform for ETL modernization
                                                                             Refine   Explore      Enrich



[Figure: unstructured data, log files and DB data flow into the Hadoop refinery (capture and archive; parse & cleanse; structure and join), which uploads results to the Enterprise Data Warehouse.]
Capture
•  Capture new unstructured data and log files alongside existing sources
•  Retain inputs in raw form for audit and continuity purposes
Process
•  Parse and cleanse the data
•  Apply structure and definition
•  Join datasets across disparate data sources
Exchange
•  Push to the existing data warehouse for downstream consumption
•  Feed operational reporting and online systems
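The Process steps above (parse, cleanse, apply structure, join) can be sketched in miniature. This Python is a single-machine illustration, not Hadoop code: the log line format, the field names and the `customers` lookup table are hypothetical, and in the refinery the same logic would typically run as a Pig, Hive or MapReduce job over files in HDFS.

```python
import re
from typing import Dict, Iterable, List

# Hypothetical raw log format, assumed for illustration only:
#   "<ISO-8601 timestamp> user=<numeric id> action=<verb> item=<sku>"
LOG_PATTERN = re.compile(
    r"^(?P<ts>\S+) user=(?P<user>\d+) action=(?P<action>\w+) item=(?P<item>\S+)$"
)


def parse_and_cleanse(lines: Iterable[str]) -> List[Dict]:
    """Parse raw lines into structured records, skipping malformed input."""
    records = []
    for line in lines:
        match = LOG_PATTERN.match(line.strip())
        if match is None:
            continue  # cleanse: drop lines that do not fit the expected structure
        record = match.groupdict()
        record["user"] = int(record["user"])  # apply a type to the join key
        record["action"] = record["action"].lower()  # normalize casing
        records.append(record)
    return records


def join_with_customers(records: List[Dict], customers: Dict[int, Dict]) -> List[Dict]:
    """Inner-join parsed events with customer rows (DB data) on user id."""
    joined = []
    for record in records:
        customer = customers.get(record["user"])
        if customer is not None:
            joined.append({**record, **customer})
    return joined


if __name__ == "__main__":
    raw = [
        "2012-09-25T10:15:00 user=42 action=VIEW item=shirt-01",
        "not a log line",
        "2012-09-25T10:16:00 user=7 action=buy item=pants-02",
    ]
    customers = {42: {"name": "Ada", "segment": "retail"}}
    for row in join_with_customers(parse_and_cleanse(raw), customers):
        print(row)
```

Malformed lines are skipped rather than failing the batch, which mirrors the "retain raw inputs, refine what parses" split on the slide: the raw file stays archived, so rejected lines can be revisited later.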



“Big Bank” Key Benefits
• Capture and archive
  – Retain 3 – 5 years instead of 2 – 10 days
  – Lower costs
  – Improved compliance
• Transform, change, refine
  – Turn upstream raw dumps into a small list of “new, update, delete”
    customer records
  – Convert fixed-width EBCDIC to UTF-8 (Java and DB compatible)
  – Turn raw weblogs into sessions and behaviors
• Upload
  – Insert into Teradata for downstream “as-is” reporting and tools
  – Insert into new exploration platform for scientists to play with
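The “fixed-width EBCDIC to UTF-8” step can be illustrated with Python's built-in EBCDIC codecs. This is a minimal sketch assuming code page 037 (`cp037`) and a hypothetical 17-byte record layout; real mainframe extracts are usually described by a COBOL copybook, may use another code page (for example `cp500`), and may contain packed-decimal fields that a plain text decode does not handle.

```python
from typing import Dict, List

# Hypothetical fixed-width layout: (field name, start offset, length) in bytes.
LAYOUT = [("customer_id", 0, 6), ("name", 6, 10), ("status", 16, 1)]
RECORD_LENGTH = sum(length for _, _, length in LAYOUT)  # 17 bytes per record


def decode_record(raw: bytes, codec: str = "cp037") -> Dict[str, str]:
    """Decode one EBCDIC record and slice it into trimmed fields."""
    # cp037 is single-byte, so character offsets equal byte offsets here.
    text = raw.decode(codec)
    return {name: text[start:start + length].strip() for name, start, length in LAYOUT}


def ebcdic_to_utf8(data: bytes, codec: str = "cp037", sep: str = "|") -> bytes:
    """Convert a block of fixed-width EBCDIC records to delimited UTF-8 lines."""
    if len(data) % RECORD_LENGTH != 0:
        raise ValueError("input is not a whole number of fixed-width records")
    lines: List[str] = []
    for offset in range(0, len(data), RECORD_LENGTH):
        fields = decode_record(data[offset:offset + RECORD_LENGTH], codec)
        lines.append(sep.join(fields[name] for name, _, _ in LAYOUT))
    return "\n".join(lines).encode("utf-8")


if __name__ == "__main__":
    sample = ("000042" + "Ada".ljust(10) + "A").encode("cp037")
    print(sample[:2].hex())  # EBCDIC digit bytes, not ASCII
    print(ebcdic_to_utf8(sample).decode("utf-8"))
```

Emitting a delimited text line per record is one convenient target for Java and database loaders; the delimiter is an arbitrary choice for the sketch.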



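Turning raw weblogs into sessions is commonly done by grouping hits per user, sorting them by time and starting a new session after a period of inactivity. The sketch below assumes events already parsed into `(user_id, epoch_seconds, url)` tuples and a 30-minute timeout; both are illustrative choices, not details of the “Big Bank” deployment. In a MapReduce job the grouping would correspond to the shuffle on user ID, and the per-user loop to the reducer.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

SESSION_GAP_SECONDS = 30 * 60  # assumed inactivity timeout

Hit = Tuple[int, str]  # (epoch_seconds, url)


def sessionize(
    events: Iterable[Tuple[str, int, str]], gap: int = SESSION_GAP_SECONDS
) -> Dict[str, List[List[Hit]]]:
    """Group (user, epoch_seconds, url) hits into per-user sessions."""
    by_user: Dict[str, List[Hit]] = defaultdict(list)
    for user, ts, url in events:
        by_user[user].append((ts, url))

    sessions: Dict[str, List[List[Hit]]] = {}
    for user, hits in by_user.items():
        hits.sort()  # order each user's hits by timestamp
        user_sessions = [[hits[0]]]
        for prev, cur in zip(hits, hits[1:]):
            if cur[0] - prev[0] > gap:
                user_sessions.append([])  # inactivity gap: start a new session
            user_sessions[-1].append(cur)
        sessions[user] = user_sessions
    return sessions


def summarize(sessions: Dict[str, List[List[Hit]]]) -> Dict[str, List[Tuple[int, int]]]:
    """Derive simple behaviors: (page count, duration in seconds) per session."""
    return {
        user: [(len(s), s[-1][0] - s[0][0]) for s in user_sessions]
        for user, user_sessions in sessions.items()
    }
```

Page count and duration are only two example "behaviors"; richer features (entry page, conversion) would be derived from the same per-session lists.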
Big Data Exploration & Visualization
  Hadoop as agile, ad-hoc data mart
                                                                                   Refine   Explore      Enrich



[Figure: unstructured data, log files and DB data are captured and archived, structured and joined, and categorized into tables in Hadoop, then explored over JDBC / ODBC from visualization tools, with an optional upload to an EDW / datamart.]
Capture
•  Capture multi-structured data and retain inputs in raw form for iterative analysis
Process
•  Parse the data into a queryable format
•  Explore and analyze using Hive, Pig, Mahout and other tools to discover value
•  Label data and type information for compatibility and later discovery
•  Pre-compute stats, groupings and patterns in the data to accelerate analysis
Exchange
•  Use visualization tools to facilitate exploration and find key insights
•  Optionally move actionable insights into the EDW or datamart
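Pre-computing stats and groupings amounts to materializing aggregates once, so that exploration tools query a small summary instead of the raw data. A minimal in-memory sketch follows; the row shape and the `region`/`score` fields are hypothetical, and on the cluster the equivalent would be roughly a Hive `SELECT region, COUNT(*), AVG(score), MIN(score), MAX(score) ... GROUP BY region` written to a summary table.

```python
from collections import defaultdict
from statistics import mean
from typing import Any, Dict, Iterable, List


def precompute_group_stats(
    rows: Iterable[Dict[str, Any]], key: str, value: str
) -> Dict[Any, Dict[str, float]]:
    """Materialize count/mean/min/max of `value` for each distinct `key`."""
    groups: Dict[Any, List[float]] = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return {
        group: {"count": len(vals), "mean": mean(vals), "min": min(vals), "max": max(vals)}
        for group, vals in groups.items()
    }


if __name__ == "__main__":
    surveys = [
        {"region": "west", "score": 4},
        {"region": "west", "score": 2},
        {"region": "east", "score": 5},
    ]
    print(precompute_group_stats(surveys, key="region", value="score"))
```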
“Hardware Manufacturer” Key Benefits
• Capture and archive
  – Store 10M+ survey forms/year for > 3 years
  – Capture text, audio, and systems data in one platform
• Structure and join
  – Unlock freeform text and audio data
  – Un-anonymize customers
• Categorize into tables
  – Create HCatalog tables “customer”, “survey”, “freeform text”
• Upload, JDBC
  – Visualize natural satisfaction levels and groups
  – Tag customers as “happy” and report back to CRM database




Application Enrichment
Deliver Hadoop analysis to online apps
                                                                                  Refine   Explore       Enrich



[Figure: unstructured data, log files and DB data are captured, parsed and derived/filtered in Hadoop on scheduled and near-real-time cycles, then enriched into a low-latency NoSQL store (HBase) that serves online applications.]
Capture
•  Capture data that was once too bulky and unmanageable
Process
•  Uncover aggregate characteristics across data
•  Use Hive, Pig and MapReduce to identify patterns
•  Filter useful data from mass streams (Pig)
•  Run on micro- or macro-batch schedules
Exchange
•  Push results to HBase or another NoSQL alternative for real-time delivery
•  Use patterns to deliver the right content/offer to the right person at the right time
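The Process and Exchange steps follow the classic map, shuffle and reduce shape: filter and emit key/value pairs, group by key, aggregate, then publish results for low-latency lookup. The sketch simulates that flow in plain Python; the `user,item,action` input format, the threshold and the `popular:<item>` row-key scheme are hypothetical, and a dict stands in for an HBase table.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Filter the raw stream and emit (item, 1) for each purchase event."""
    for line in lines:
        parts = line.strip().split(",")
        if len(parts) != 3:
            continue  # drop malformed lines
        _user, item, action = parts
        if action == "purchase":  # keep only the useful events
            yield item, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, as the framework does between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Aggregate each key's values into a total count."""
    return {key: sum(values) for key, values in grouped.items()}


def publish(counts: Dict[str, int], store: Dict[str, int], min_count: int = 2) -> None:
    """Write qualifying results to a key-value store (a dict standing in for HBase)."""
    for item, count in counts.items():
        if count >= min_count:
            store[f"popular:{item}"] = count
```

Keeping the batch computation separate from `publish` reflects the split on the slide: heavy aggregation runs on a schedule, while the online application only performs cheap key lookups.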


“Clothing Retailer” Key Benefits
• Capture
  – Capture weblogs together with sales order history, customer
    master
• Derive useful information
  – Compute relationships between products over time
      – “people who buy shirts eventually need pants”
  – Score customer web behavior / sentiment
  – Connect product recommendations to customer sentiment
• Share
  – Load customer recommendations into HBase for rapid website
    service
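“People who buy shirts eventually need pants” suggests counting ordered product pairs across each customer's purchase history. One simple way to do that is sketched below, assuming orders arrive as `(customer_id, timestamp, product)` tuples; this pair-counting approach is an illustrative choice, not necessarily the retailer's method, and its output is the kind of per-product list that could be loaded into HBase for website lookups.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def follow_on_counts(orders: Iterable[Tuple[str, int, str]]) -> Counter:
    """Count (earlier, later) product pairs, at most once per customer."""
    history: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for customer, ts, product in orders:
        history[customer].append((ts, product))

    pairs: Counter = Counter()
    for purchases in history.values():
        purchases.sort()  # chronological order per customer
        products = [product for _, product in purchases]
        seen: Set[Tuple[str, str]] = set()
        for i, earlier in enumerate(products):
            for later in products[i + 1:]:
                if later != earlier:
                    seen.add((earlier, later))
        pairs.update(seen)  # each distinct pair adds 1 for this customer
    return pairs


def recommend_next(pairs: Counter, product: str, top_n: int = 3) -> List[str]:
    """Return the products most often bought after `product`."""
    candidates = [(later, n) for (earlier, later), n in pairs.items() if earlier == product]
    candidates.sort(key=lambda kv: (-kv[1], kv[0]))  # by count, then name
    return [later for later, _ in candidates[:top_n]]
```

The inner loop is quadratic in the length of each customer's history, which is acceptable for a sketch; at scale the pair generation would run in parallel per customer.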




Hadoop in Enterprise Data Architectures
[Figure: Hadoop platform components connect Big Data Sources to existing and new consuming layers.
Existing Business Infrastructure: IDE & Dev Tools; ODS & Datamarts; Applications & Spreadsheets; Visualization & Intelligence; Discovery Tools; EDW
Web: Web Applications; Low Latency/NoSQL
New Tech: Datameer, Tableau, Karmasphere, Splunk
Operations: Custom, Existing
Hadoop platform: Templeton, WebHDFS, Sqoop, Flume, HCatalog, HBase, Pig, Hive, MapReduce, HDFS, Ambari, Oozie, HA, ZooKeeper
Big Data Sources (transactions, observations, interactions): CRM, ERP, financials, social media, exhaust data, logs, files]



Hortonworks Vision & Role

                                We believe that by the end of 2015,
                                more than half the world's data will be
                                processed by Apache Hadoop.



  1       Be diligent stewards of the open source core

  2       Be tireless innovators beyond the core

  3       Provide robust data platform services & open APIs

  4       Enable vibrant ecosystem at each layer of the stack

  5       Make Hadoop platform enterprise-ready & easy to use


What’s Needed to Drive Success?
•  Enterprise tooling to become
   a complete data platform
   –  Open deployment & provisioning
   –  Higher quality data loading
   –  Monitoring and management
   –  APIs for easy integration
                                                   www.hortonworks.com/moore

•  Ecosystem needs support & development
   –  Existing infrastructure vendors need to continue to integrate
   –  Apps need to continue to be developed on this infrastructure
   –  Well defined use cases and solution architectures need to be promoted



•  Market needs to rally around core Apache Hadoop
   –  To avoid splintering/market distraction
   –  To accelerate adoption


Next Steps?

1   Download Hortonworks Data Platform
    hortonworks.com/download




2   Use the getting started guide
    hortonworks.com/get-started



3   Learn more… get support

    •  Expert role-based training
    •  Courses for admins, developers and operators
    •  Certification program
    •  Custom onsite options
    hortonworks.com/training

    Hortonworks Support
    •  Full lifecycle technical support across four service levels
    •  Delivered by Apache Hadoop experts/committers
    •  Forward-compatible
    hortonworks.com/support


Thank You!
Questions & Answers

Follow: @hortonworks & @shaunconnolly





2012 06 hortonworks paris hug
 
Break Through the Traditional Advertisement Services with Big Data and Apache...
Break Through the Traditional Advertisement Services with Big Data and Apache...Break Through the Traditional Advertisement Services with Big Data and Apache...
Break Through the Traditional Advertisement Services with Big Data and Apache...
 
Hadoop's Role in the Big Data Architecture, OW2con'12, Paris
Hadoop's Role in the Big Data Architecture, OW2con'12, ParisHadoop's Role in the Big Data Architecture, OW2con'12, Paris
Hadoop's Role in the Big Data Architecture, OW2con'12, Paris
 
The Next Generation of Big Data Analytics
The Next Generation of Big Data AnalyticsThe Next Generation of Big Data Analytics
The Next Generation of Big Data Analytics
 
Hortonworks roadshow
Hortonworks roadshowHortonworks roadshow
Hortonworks roadshow
 
vBACD July 2012 - Apache Hadoop, Now and Beyond
vBACD July 2012 - Apache Hadoop, Now and BeyondvBACD July 2012 - Apache Hadoop, Now and Beyond
vBACD July 2012 - Apache Hadoop, Now and Beyond
 
Unified big data architecture
Unified big data architectureUnified big data architecture
Unified big data architecture
 
Hortonworks Data Platform for Systems Integrators Webinar 9-5-2012.pptx
Hortonworks Data Platform for Systems Integrators Webinar 9-5-2012.pptxHortonworks Data Platform for Systems Integrators Webinar 9-5-2012.pptx
Hortonworks Data Platform for Systems Integrators Webinar 9-5-2012.pptx
 
Introduction to Hortonworks Data Platform for Windows
Introduction to Hortonworks Data Platform for WindowsIntroduction to Hortonworks Data Platform for Windows
Introduction to Hortonworks Data Platform for Windows
 
Scaling MySQL: Catch 22 of Read Write Splitting
Scaling MySQL: Catch 22 of Read Write SplittingScaling MySQL: Catch 22 of Read Write Splitting
Scaling MySQL: Catch 22 of Read Write Splitting
 
Big dataforcf os1_23_12_final
Big dataforcf os1_23_12_finalBig dataforcf os1_23_12_final
Big dataforcf os1_23_12_final
 
Scaling MySQL: Benefits of Automatic Data Distribution
Scaling MySQL: Benefits of Automatic Data DistributionScaling MySQL: Benefits of Automatic Data Distribution
Scaling MySQL: Benefits of Automatic Data Distribution
 
ScaleBase Webinar 8.16: ScaleUp vs. ScaleOut
ScaleBase Webinar 8.16: ScaleUp vs. ScaleOutScaleBase Webinar 8.16: ScaleUp vs. ScaleOut
ScaleBase Webinar 8.16: ScaleUp vs. ScaleOut
 
Cloudera Sessions - Cloudera Keynote: A Blueprint for Data Management
Cloudera Sessions - Cloudera Keynote: A Blueprint for Data ManagementCloudera Sessions - Cloudera Keynote: A Blueprint for Data Management
Cloudera Sessions - Cloudera Keynote: A Blueprint for Data Management
 
Search2012 ibm vf
Search2012 ibm vfSearch2012 ibm vf
Search2012 ibm vf
 
Talk IT_ Oracle_김태완_110831
Talk IT_ Oracle_김태완_110831Talk IT_ Oracle_김태완_110831
Talk IT_ Oracle_김태완_110831
 

More from Hortonworks

Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next LevelHortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next LevelHortonworks
 
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT StrategyIoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT StrategyHortonworks
 
Getting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with CloudbreakGetting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with CloudbreakHortonworks
 
Johns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log EventsJohns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log EventsHortonworks
 
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad GuysCatch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad GuysHortonworks
 
HDF 3.2 - What's New
HDF 3.2 - What's NewHDF 3.2 - What's New
HDF 3.2 - What's NewHortonworks
 
Curing Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging ManagerCuring Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging ManagerHortonworks
 
Interpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical EnvironmentsInterpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical EnvironmentsHortonworks
 
IBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data LandscapeIBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data LandscapeHortonworks
 
Premier Inside-Out: Apache Druid
Premier Inside-Out: Apache DruidPremier Inside-Out: Apache Druid
Premier Inside-Out: Apache DruidHortonworks
 
Accelerating Data Science and Real Time Analytics at Scale
Accelerating Data Science and Real Time Analytics at ScaleAccelerating Data Science and Real Time Analytics at Scale
Accelerating Data Science and Real Time Analytics at ScaleHortonworks
 
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATATIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATAHortonworks
 
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...Hortonworks
 
Delivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: ClearsenseDelivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: ClearsenseHortonworks
 
Making Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with EaseMaking Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with EaseHortonworks
 
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World PresentationWebinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World PresentationHortonworks
 
Driving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data ManagementDriving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data ManagementHortonworks
 
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming FeaturesHDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming FeaturesHortonworks
 
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...Hortonworks
 
Unlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDCUnlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDCHortonworks
 

More from Hortonworks (20)

Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next LevelHortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
 
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT StrategyIoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
 
Getting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with CloudbreakGetting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with Cloudbreak
 
Johns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log EventsJohns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log Events
 
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad GuysCatch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
 
HDF 3.2 - What's New
HDF 3.2 - What's NewHDF 3.2 - What's New
HDF 3.2 - What's New
 
Curing Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging ManagerCuring Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging Manager
 
Interpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical EnvironmentsInterpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical Environments
 
IBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data LandscapeIBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data Landscape
 
Premier Inside-Out: Apache Druid
Premier Inside-Out: Apache DruidPremier Inside-Out: Apache Druid
Premier Inside-Out: Apache Druid
 
Accelerating Data Science and Real Time Analytics at Scale
Accelerating Data Science and Real Time Analytics at ScaleAccelerating Data Science and Real Time Analytics at Scale
Accelerating Data Science and Real Time Analytics at Scale
 
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATATIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
 
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
 
Delivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: ClearsenseDelivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: Clearsense
 
Making Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with EaseMaking Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with Ease
 
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World PresentationWebinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
 
Driving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data ManagementDriving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data Management
 
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming FeaturesHDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
 
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
 
Unlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDCUnlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDC
 

Recently uploaded

CAULIFLOWER BREEDING 1 Parmar pptx
CAULIFLOWER BREEDING 1 Parmar pptxCAULIFLOWER BREEDING 1 Parmar pptx
CAULIFLOWER BREEDING 1 Parmar pptxSaurabhParmar42
 
AUDIENCE THEORY -- FANDOM -- JENKINS.pptx
AUDIENCE THEORY -- FANDOM -- JENKINS.pptxAUDIENCE THEORY -- FANDOM -- JENKINS.pptx
AUDIENCE THEORY -- FANDOM -- JENKINS.pptxiammrhaywood
 
Drug Information Services- DIC and Sources.
Drug Information Services- DIC and Sources.Drug Information Services- DIC and Sources.
Drug Information Services- DIC and Sources.raviapr7
 
Ultra structure and life cycle of Plasmodium.pptx
Ultra structure and life cycle of Plasmodium.pptxUltra structure and life cycle of Plasmodium.pptx
Ultra structure and life cycle of Plasmodium.pptxDr. Asif Anas
 
The Stolen Bacillus by Herbert George Wells
The Stolen Bacillus by Herbert George WellsThe Stolen Bacillus by Herbert George Wells
The Stolen Bacillus by Herbert George WellsEugene Lysak
 
General views of Histopathology and step
General views of Histopathology and stepGeneral views of Histopathology and step
General views of Histopathology and stepobaje godwin sunday
 
How to Add Existing Field in One2Many Tree View in Odoo 17
How to Add Existing Field in One2Many Tree View in Odoo 17How to Add Existing Field in One2Many Tree View in Odoo 17
How to Add Existing Field in One2Many Tree View in Odoo 17Celine George
 
UKCGE Parental Leave Discussion March 2024
UKCGE Parental Leave Discussion March 2024UKCGE Parental Leave Discussion March 2024
UKCGE Parental Leave Discussion March 2024UKCGE
 
Diploma in Nursing Admission Test Question Solution 2023.pdf
Diploma in Nursing Admission Test Question Solution 2023.pdfDiploma in Nursing Admission Test Question Solution 2023.pdf
Diploma in Nursing Admission Test Question Solution 2023.pdfMohonDas
 
CapTechU Doctoral Presentation -March 2024 slides.pptx
CapTechU Doctoral Presentation -March 2024 slides.pptxCapTechU Doctoral Presentation -March 2024 slides.pptx
CapTechU Doctoral Presentation -March 2024 slides.pptxCapitolTechU
 
Philosophy of Education and Educational Philosophy
Philosophy of Education  and Educational PhilosophyPhilosophy of Education  and Educational Philosophy
Philosophy of Education and Educational PhilosophyShuvankar Madhu
 
P4C x ELT = P4ELT: Its Theoretical Background (Kanazawa, 2024 March).pdf
P4C x ELT = P4ELT: Its Theoretical Background (Kanazawa, 2024 March).pdfP4C x ELT = P4ELT: Its Theoretical Background (Kanazawa, 2024 March).pdf
P4C x ELT = P4ELT: Its Theoretical Background (Kanazawa, 2024 March).pdfYu Kanazawa / Osaka University
 
How to Add a many2many Relational Field in Odoo 17
How to Add a many2many Relational Field in Odoo 17How to Add a many2many Relational Field in Odoo 17
How to Add a many2many Relational Field in Odoo 17Celine George
 
Quality Assurance_GOOD LABORATORY PRACTICE
Quality Assurance_GOOD LABORATORY PRACTICEQuality Assurance_GOOD LABORATORY PRACTICE
Quality Assurance_GOOD LABORATORY PRACTICESayali Powar
 
DUST OF SNOW_BY ROBERT FROST_EDITED BY_ TANMOY MISHRA
DUST OF SNOW_BY ROBERT FROST_EDITED BY_ TANMOY MISHRADUST OF SNOW_BY ROBERT FROST_EDITED BY_ TANMOY MISHRA
DUST OF SNOW_BY ROBERT FROST_EDITED BY_ TANMOY MISHRATanmoy Mishra
 
CHUYÊN ĐỀ DẠY THÊM TIẾNG ANH LỚP 11 - GLOBAL SUCCESS - NĂM HỌC 2023-2024 - HK...
CHUYÊN ĐỀ DẠY THÊM TIẾNG ANH LỚP 11 - GLOBAL SUCCESS - NĂM HỌC 2023-2024 - HK...CHUYÊN ĐỀ DẠY THÊM TIẾNG ANH LỚP 11 - GLOBAL SUCCESS - NĂM HỌC 2023-2024 - HK...
CHUYÊN ĐỀ DẠY THÊM TIẾNG ANH LỚP 11 - GLOBAL SUCCESS - NĂM HỌC 2023-2024 - HK...Nguyen Thanh Tu Collection
 
Practical Research 1 Lesson 9 Scope and delimitation.pptx
Practical Research 1 Lesson 9 Scope and delimitation.pptxPractical Research 1 Lesson 9 Scope and delimitation.pptx
Practical Research 1 Lesson 9 Scope and delimitation.pptxKatherine Villaluna
 
Benefits & Challenges of Inclusive Education
Benefits & Challenges of Inclusive EducationBenefits & Challenges of Inclusive Education
Benefits & Challenges of Inclusive EducationMJDuyan
 
Prescribed medication order and communication skills.pptx
Prescribed medication order and communication skills.pptxPrescribed medication order and communication skills.pptx
Prescribed medication order and communication skills.pptxraviapr7
 
In - Vivo and In - Vitro Correlation.pptx
In - Vivo and In - Vitro Correlation.pptxIn - Vivo and In - Vitro Correlation.pptx
In - Vivo and In - Vitro Correlation.pptxAditiChauhan701637
 

Recently uploaded (20)

CAULIFLOWER BREEDING 1 Parmar pptx
CAULIFLOWER BREEDING 1 Parmar pptxCAULIFLOWER BREEDING 1 Parmar pptx
CAULIFLOWER BREEDING 1 Parmar pptx
 
AUDIENCE THEORY -- FANDOM -- JENKINS.pptx
AUDIENCE THEORY -- FANDOM -- JENKINS.pptxAUDIENCE THEORY -- FANDOM -- JENKINS.pptx
AUDIENCE THEORY -- FANDOM -- JENKINS.pptx
 
Drug Information Services- DIC and Sources.
Drug Information Services- DIC and Sources.Drug Information Services- DIC and Sources.
Drug Information Services- DIC and Sources.
 
Ultra structure and life cycle of Plasmodium.pptx
Ultra structure and life cycle of Plasmodium.pptxUltra structure and life cycle of Plasmodium.pptx
Ultra structure and life cycle of Plasmodium.pptx
 
The Stolen Bacillus by Herbert George Wells
The Stolen Bacillus by Herbert George WellsThe Stolen Bacillus by Herbert George Wells
The Stolen Bacillus by Herbert George Wells
 
General views of Histopathology and step
General views of Histopathology and stepGeneral views of Histopathology and step
General views of Histopathology and step
 
How to Add Existing Field in One2Many Tree View in Odoo 17
How to Add Existing Field in One2Many Tree View in Odoo 17How to Add Existing Field in One2Many Tree View in Odoo 17
How to Add Existing Field in One2Many Tree View in Odoo 17
 
UKCGE Parental Leave Discussion March 2024
UKCGE Parental Leave Discussion March 2024UKCGE Parental Leave Discussion March 2024
UKCGE Parental Leave Discussion March 2024
 
Diploma in Nursing Admission Test Question Solution 2023.pdf
Diploma in Nursing Admission Test Question Solution 2023.pdfDiploma in Nursing Admission Test Question Solution 2023.pdf
Diploma in Nursing Admission Test Question Solution 2023.pdf
 
CapTechU Doctoral Presentation -March 2024 slides.pptx
CapTechU Doctoral Presentation -March 2024 slides.pptxCapTechU Doctoral Presentation -March 2024 slides.pptx
CapTechU Doctoral Presentation -March 2024 slides.pptx
 
Philosophy of Education and Educational Philosophy
Philosophy of Education  and Educational PhilosophyPhilosophy of Education  and Educational Philosophy
Philosophy of Education and Educational Philosophy
 
P4C x ELT = P4ELT: Its Theoretical Background (Kanazawa, 2024 March).pdf
P4C x ELT = P4ELT: Its Theoretical Background (Kanazawa, 2024 March).pdfP4C x ELT = P4ELT: Its Theoretical Background (Kanazawa, 2024 March).pdf
P4C x ELT = P4ELT: Its Theoretical Background (Kanazawa, 2024 March).pdf
 
How to Add a many2many Relational Field in Odoo 17
How to Add a many2many Relational Field in Odoo 17How to Add a many2many Relational Field in Odoo 17
How to Add a many2many Relational Field in Odoo 17
 
Quality Assurance_GOOD LABORATORY PRACTICE
Quality Assurance_GOOD LABORATORY PRACTICEQuality Assurance_GOOD LABORATORY PRACTICE
Quality Assurance_GOOD LABORATORY PRACTICE
 
DUST OF SNOW_BY ROBERT FROST_EDITED BY_ TANMOY MISHRA
DUST OF SNOW_BY ROBERT FROST_EDITED BY_ TANMOY MISHRADUST OF SNOW_BY ROBERT FROST_EDITED BY_ TANMOY MISHRA
DUST OF SNOW_BY ROBERT FROST_EDITED BY_ TANMOY MISHRA
 
CHUYÊN ĐỀ DẠY THÊM TIẾNG ANH LỚP 11 - GLOBAL SUCCESS - NĂM HỌC 2023-2024 - HK...
CHUYÊN ĐỀ DẠY THÊM TIẾNG ANH LỚP 11 - GLOBAL SUCCESS - NĂM HỌC 2023-2024 - HK...CHUYÊN ĐỀ DẠY THÊM TIẾNG ANH LỚP 11 - GLOBAL SUCCESS - NĂM HỌC 2023-2024 - HK...
CHUYÊN ĐỀ DẠY THÊM TIẾNG ANH LỚP 11 - GLOBAL SUCCESS - NĂM HỌC 2023-2024 - HK...
 
Practical Research 1 Lesson 9 Scope and delimitation.pptx
Practical Research 1 Lesson 9 Scope and delimitation.pptxPractical Research 1 Lesson 9 Scope and delimitation.pptx
Practical Research 1 Lesson 9 Scope and delimitation.pptx
 
Benefits & Challenges of Inclusive Education
Benefits & Challenges of Inclusive EducationBenefits & Challenges of Inclusive Education
Benefits & Challenges of Inclusive Education
 
Prescribed medication order and communication skills.pptx
Prescribed medication order and communication skills.pptxPrescribed medication order and communication skills.pptx
Prescribed medication order and communication skills.pptx
 
In - Vivo and In - Vitro Correlation.pptx
In - Vivo and In - Vitro Correlation.pptxIn - Vivo and In - Vitro Correlation.pptx
In - Vivo and In - Vitro Correlation.pptx
 

Powering Next Generation Data Architecture With Apache Hadoop

  • 1. Powering Next-Generation Data Architectures with Apache Hadoop Shaun Connolly, Hortonworks @shaunconnolly September 25, 2012 © Hortonworks Inc. 2012 Page 1
  • 2. Big Data: Changing The Game for Organizations
     [Figure: data volume, rising from megabytes (ERP) through gigabytes (CRM) and terabytes (Web) to petabytes (Big Data), plotted against increasing data variety and complexity.]
     Transactions + Interactions + Observations = BIG DATA
     Data types shown: purchase detail, purchase record, payment record, offer details, offer history, segmentation, customer touches, support contacts, web logs, A/B testing, behavioral targeting, dynamic pricing, dynamic funnels, search marketing, affiliate networks, product/service logs, mobile web, sentiment, user click stream, SMS/MMS, speech to text, social interactions & feeds, spatial & GPS coordinates, sensors / RFID / devices, business data feeds, external demographics, user-generated content, HD video, audio, images
  • 3. Connecting Transactions + Interactions + Observations
     [Diagram: a Big Data Platform sits between Business Transactions & Interactions (Web, Mobile, CRM, ERP, SCM, …) and Business Intelligence & Analytics (Dashboards, Reports, Visualization, …). It is fed by multi-structured sources: audio, video, images; docs, text, XML; web logs, clicks; social, graph, feeds; sensors, devices, RFID; spatial, GPS; events, other.]
     Flows shown:
     – Classic ETL processing
     – Capture and exchange multi-structured data to unlock value
     – Retain historical data to unlock additional value
     – Deliver refined data and runtime models
     – Retain runtime models and historical data for ongoing refinement & analysis
  • 4. Goal: Optimize Outcomes at Scale
     – Media: optimize content
     – Intelligence: optimize detection
     – Finance: optimize algorithms
     – Advertising: optimize performance
     – Fraud: optimize prevention
     – Retail / Wholesale: optimize inventory turns
     – Manufacturing: optimize supply chains
     – Healthcare: optimize patient outcomes
     – Education: optimize learning outcomes
     – Government: optimize citizen services
     Source: Geoffrey Moore, Hadoop Summit 2012 keynote presentation.
  • 5. Customer: UC Irvine Medical Center
     Optimizing patient outcomes while lowering costs
     – Ranked among the nation's best hospitals by U.S. News & World Report for the 12th year
     – More than 400 specialty and primary care physicians
     – Opened in 1976
     – 422-bed medical facility
     Current system: Epic holds 22 years of patient data across admissions and clinical information
     – Significant cost to maintain and run the system
     – Difficult to access, not integrated into any systems, stand-alone
     Apache Hadoop sunsets the legacy system and augments new electronic medical records (EMR)
     1. Migrate all legacy Epic data to Apache Hadoop
        – Replaced existing ETL and temporary databases with Hadoop, resulting in faster, more reliable transforms
        – Captures all legacy data, not just a subset, and exposes it to EMR and other applications
     2. Eliminate maintenance of legacy system and database licenses
        – $500K in annual savings
     3. Integrate data with EMR and clinical front-end
        – Better service, with complete patient history provided to admissions and doctors
        – Enable improved research through complete information
  • 6. Emerging Patterns of Use
     Big Data (Transactions + Interactions + Observations) → Refine, Explore, Enrich → Business Case ($)
  • 7. Operational Data Refinery: Hadoop as platform for ETL modernization (pattern: Refine)
     Diagram: unstructured data, log files and DB data flow into the refinery (capture and archive; parse & cleanse; structure and join), which uploads results to the enterprise data warehouse.
     – Capture
        – Capture new unstructured data and log files alongside existing sources
        – Retain inputs in raw form for audit and continuity purposes
     – Process
        – Parse and cleanse the data
        – Apply structure and definition
        – Join datasets across disparate data sources
     – Exchange
        – Push to the existing data warehouse for downstream consumption
        – Feed operational reporting and online systems
  • 8. “Big Bank” Key Benefits
     – Capture and archive
        – Retain 3–5 years instead of 2–10 days
        – Lower costs
        – Improved compliance
     – Transform, change, refine
        – Turn upstream raw dumps into a small list of “new, update, delete” customer records
        – Convert fixed-width EBCDIC to UTF-8 (Java and DB compatible)
        – Turn raw weblogs into sessions and behaviors
     – Upload
        – Insert into Teradata for downstream “as-is” reporting and tools
        – Insert into new exploration platform for scientists to play with
  • 9. Big Data Exploration & Visualization: Hadoop as agile, ad-hoc data mart (pattern: Explore)
     Diagram: unstructured data, log files and DB data are captured and archived, structured and joined, and categorized into tables. Results reach visualization tools via upload or JDBC / ODBC, and can optionally be exchanged with an EDW / datamart.
     – Capture
        – Capture multi-structured data and retain inputs in raw form for iterative analysis
     – Process
        – Parse the data into a queryable format
        – Explore & analyze using Hive, Pig, Mahout and other tools to discover value
        – Label data and type information for compatibility and later discovery
        – Pre-compute stats, groupings and patterns in data to accelerate analysis
     – Exchange
        – Use visualization tools to facilitate exploration and find key insights
        – Optionally move actionable insights into an EDW or datamart
  • 10. “Hardware Manufacturer” Key Benefits
     – Capture and archive
        – Store 10M+ survey forms/year for > 3 years
        – Capture text, audio, and systems data in one platform
     – Structure and join
        – Unlock freeform text and audio data
        – Un-anonymize customers
     – Categorize into tables
        – Create HCatalog tables “customer”, “survey”, “freeform text”
     – Upload, JDBC
        – Visualize natural satisfaction levels and groups
        – Tag customers as “happy” and report back to CRM database
  • 11. Application Enrichment: Deliver Hadoop analysis to online apps (pattern: Enrich)
     Diagram: unstructured data, log files and DB data are captured, parsed, and derived/filtered on scheduled and near-real-time runs. The results enrich online applications through a low-latency NoSQL store such as HBase.
     – Capture
        – Capture data that was once too bulky and unmanageable
     – Process
        – Uncover aggregate characteristics across data
        – Use Hive, Pig and MapReduce to identify patterns
        – Filter useful data from mass streams (Pig)
        – Run on micro- or macro-batch schedules
     – Exchange
        – Push results to HBase or another NoSQL alternative for real-time delivery
        – Use patterns to deliver the right content/offer to the right person at the right time
  • 12. “Clothing Retailer” Key Benefits
     – Capture
        – Capture weblogs together with sales order history and customer master
     – Derive useful information
        – Compute relationships between products over time: “people who buy shirts eventually need pants”
        – Score customer web behavior / sentiment
        – Connect product recommendations to customer sentiment
     – Share
        – Load customer recommendations into HBase for rapid website service
  • 13. Hadoop in Enterprise Data Architectures
     [Diagram: layered stack, from data sources at the bottom to business infrastructure at the top]
     – Existing business infrastructure: IDE & Dev Tools, ODS & Datamarts, Applications & Spreadsheets, Visualization & Intelligence, Discovery Tools, EDW, Low Latency/NoSQL, Custom, Existing
     – New tech: Datameer, Tableau, Karmasphere, Splunk
     – Web: Web Applications, Operations
     – Apache Hadoop components: Templeton, WebHDFS, Sqoop, Flume, HCatalog, HBase, Pig, Hive, MapReduce, HDFS, Ambari, Oozie, HA, ZooKeeper
     – Big Data sources (transactions, observations, interactions): social media, exhaust data, logs, files, CRM, ERP, financials
  • 14. Hortonworks Vision & Role
     We believe that by the end of 2015, more than half the world's data will be processed by Apache Hadoop.
     1. Be diligent stewards of the open source core
     2. Be tireless innovators beyond the core
     3. Provide robust data platform services & open APIs
     4. Enable a vibrant ecosystem at each layer of the stack
     5. Make the Hadoop platform enterprise-ready & easy to use
  • 15. What’s Needed to Drive Success?
     – Enterprise tooling to become a complete data platform
        – Open deployment & provisioning
        – Higher quality data loading
        – Monitoring and management
        – APIs for easy integration
     – Ecosystem needs support & development
        – Existing infrastructure vendors need to continue to integrate
        – Apps need to continue to be developed on this infrastructure
        – Well-defined use cases and solution architectures need to be promoted
     – Market needs to rally around core Apache Hadoop
        – To avoid splintering/market distraction
        – To accelerate adoption
     www.hortonworks.com/moore
  • 16. Next Steps?
     1. Download Hortonworks Data Platform: hortonworks.com/download
     2. Use the getting started guide: hortonworks.com/get-started
     3. Learn more… get support
        – Expert role-based training (hortonworks.com/training): courses for admins, developers and operators; certification program; custom onsite options
        – Hortonworks Support (hortonworks.com/support): full lifecycle technical support across four service levels; delivered by Apache Hadoop experts/committers; forward-compatible
  • 17. Thank You! Questions & Answers. Follow: @hortonworks & @shaunconnolly
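The operational data refinery steps on slide 7 (parse & cleanse, apply structure, join across sources) can be sketched in Python. This is a minimal sketch that runs in memory on toy data rather than on a Hadoop cluster. The log layout, field names and sample records are hypothetical. On a real cluster the same steps would typically be written in Pig, Hive or MapReduce.

```python
import csv
import io
import re

# Hypothetical raw inputs: an access-log excerpt and a CSV export from an existing database.
RAW_LOG = """\
10.0.0.1 - c42 [25/Sep/2012:10:00:01] "GET /shirts HTTP/1.1" 200
garbage line that does not parse
10.0.0.2 - c77 [25/Sep/2012:10:00:05] "GET /pants HTTP/1.1" 404
"""
RAW_DB = "customer_id,name,segment\nc42,Ann,gold\nc77,Bob,silver\n"

# Hypothetical log layout: ip - customer_id [timestamp] "METHOD path PROTOCOL" status
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) - (?P<customer_id>\S+) \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3})'
)


def parse_and_cleanse(raw_log):
    """Parse log lines into dicts, dropping lines that do not match the layout."""
    records = []
    for line in raw_log.splitlines():
        match = LOG_PATTERN.match(line.strip())
        if match is None:
            continue  # cleanse: skip malformed lines (the refinery keeps raw input separately)
        record = match.groupdict()
        record["status"] = int(record["status"])  # apply a type to the field
        records.append(record)
    return records


def join_with_customers(records, raw_db):
    """Inner-join parsed log records to database rows on customer_id."""
    customers = {row["customer_id"]: row for row in csv.DictReader(io.StringIO(raw_db))}
    joined = []
    for record in records:
        customer = customers.get(record["customer_id"])
        if customer is not None:
            joined.append({**record, "name": customer["name"], "segment": customer["segment"]})
    return joined


refined = join_with_customers(parse_and_cleanse(RAW_LOG), RAW_DB)
for row in refined:
    print(row["customer_id"], row["segment"], row["path"], row["status"])
```

The joined rows are what the exchange step would push to the warehouse.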
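Slide 8 lists two “Big Bank” transforms that can be sketched in Python. The first decodes fixed-width EBCDIC records into Unicode text and emits UTF-8. The second reduces two full customer dumps to “new, update, delete” records. The record layout, the use of EBCDIC code page 037 (Python's `cp037` codec) and the sample data are assumptions for illustration. Real mainframe extracts often use other code pages and packed-decimal fields that need extra handling.

```python
# Hypothetical fixed-width layout: (field, start, end) character offsets in a 20-byte record.
LAYOUT = [("customer_id", 0, 4), ("name", 4, 14), ("balance", 14, 20)]
RECORD_LEN = 20
CODEC = "cp037"  # EBCDIC code page 037, assumed for this example


def make_record(customer_id, name, balance):
    """Build one fixed-width record as text (used only to create sample input)."""
    return f"{customer_id:<4}{name:<10}{balance:06d}"


def decode_fixed_width(raw):
    """Split EBCDIC bytes into fixed-width records and decode each into a dict of str."""
    rows = []
    for offset in range(0, len(raw), RECORD_LEN):
        text = raw[offset:offset + RECORD_LEN].decode(CODEC)
        rows.append({name: text[start:end].strip() for name, start, end in LAYOUT})
    return rows


def diff_snapshots(old_rows, new_rows, key="customer_id"):
    """Reduce two full dumps to a short list of (change_type, row) tuples."""
    old = {row[key]: row for row in old_rows}
    new = {row[key]: row for row in new_rows}
    changes = [("new", new[k]) for k in sorted(new.keys() - old.keys())]
    changes += [("update", new[k]) for k in sorted(new.keys() & old.keys()) if new[k] != old[k]]
    changes += [("delete", old[k]) for k in sorted(old.keys() - new.keys())]
    return changes


# Sample input: encode known text as EBCDIC so the example is self-contained.
yesterday = (make_record("C001", "Ann", 100) + make_record("C002", "Bob", 250)).encode(CODEC)
today = (make_record("C001", "Ann", 175) + make_record("C003", "Cy", 10)).encode(CODEC)

changes = diff_snapshots(decode_fixed_width(yesterday), decode_fixed_width(today))

# Emit UTF-8 encoded, pipe-delimited lines for Java or database loaders.
utf8_lines = [
    "|".join((kind, row["customer_id"], row["name"], row["balance"])).encode("utf-8")
    for kind, row in changes
]
for line in utf8_lines:
    print(line.decode("utf-8"))
```

The full dumps are compared by key, so only changed customers flow downstream. This matches the slide's point of turning large raw dumps into a small change list.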
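The “pre-compute stats, groupings” step on slide 9 amounts to a grouped aggregate. On the cluster this would usually be written as a Hive `GROUP BY` or a Pig `GROUP` followed by `FOREACH ... GENERATE`. Below is a minimal in-memory Python equivalent. It assumes a hypothetical survey table with `product`, `region` and `score` columns.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows produced by the "categorize into tables" step.
surveys = [
    {"product": "laptop", "region": "EU", "score": 4},
    {"product": "laptop", "region": "EU", "score": 2},
    {"product": "laptop", "region": "US", "score": 5},
    {"product": "printer", "region": "EU", "score": 3},
]


def precompute(rows, group_keys, value_key):
    """Group rows by group_keys and compute count/mean/min/max of value_key per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in group_keys)].append(row[value_key])
    return {
        group: {"count": len(values), "mean": mean(values), "min": min(values), "max": max(values)}
        for group, values in groups.items()
    }


summary = precompute(surveys, ("product", "region"), "score")
for group, stats in sorted(summary.items()):
    print(group, stats)
```

Visualization tools can then read this small summary table instead of rescanning the raw inputs.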
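The enrichment loop on slide 11 can be sketched as follows. Events are processed in micro-batches, filtered, aggregated per customer, and upserted into a key-value store, from which an online application reads an offer. A plain Python dict stands in for HBase (row key = customer id, columns = categories). This is not the HBase client API, and both the filter rule and the offer rule are hypothetical.

```python
from collections import Counter
from itertools import islice


def micro_batches(events, size):
    """Yield successive lists of at most `size` events."""
    it = iter(events)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def enrich(events, store, batch_size=3):
    """Filter and aggregate each micro-batch, then upsert running counts into the store."""
    for batch in micro_batches(events, batch_size):
        views = [e for e in batch if e["action"] == "view"]  # hypothetical filter rule
        counts = Counter((e["customer"], e["category"]) for e in views)
        for (customer, category), n in counts.items():
            row = store.setdefault(customer, {})  # row key = customer id
            row[category] = row.get(category, 0) + n  # column = category, value = running count


def best_offer(store, customer):
    """Low-latency lookup: offer the category the customer viewed most (hypothetical rule)."""
    row = store.get(customer)
    if not row:
        return None
    return max(row, key=row.get)


events = [
    {"customer": "c1", "action": "view", "category": "shirts"},
    {"customer": "c1", "action": "view", "category": "shirts"},
    {"customer": "c2", "action": "click_ad", "category": "pants"},
    {"customer": "c1", "action": "view", "category": "pants"},
    {"customer": "c2", "action": "view", "category": "pants"},
    {"customer": "c1", "action": "view", "category": "shirts"},
]
store = {}  # stand-in for an HBase table
enrich(events, store)
print(store)
print(best_offer(store, "c1"), best_offer(store, "c2"), best_offer(store, "c3"))
```

The heavy aggregation runs in batches, while the online application performs only a single keyed lookup per request. That split is why the slide pairs Hadoop processing with a low-latency store.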
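The slide 12 relationship “people who buy shirts eventually need pants” can be computed by counting, per customer, ordered pairs of products where one was bought strictly before the other. Below is a minimal Python sketch with hypothetical order history. Here `confidence` is an illustrative measure: the share of customers who bought product A and later bought product B. Scores like these are the kind of output that would then be loaded into HBase for the website.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical order history: (customer, ISO order date, product category).
orders = [
    ("c1", "2012-01-05", "shirts"), ("c1", "2012-03-10", "pants"),
    ("c2", "2012-02-01", "shirts"), ("c2", "2012-02-20", "pants"),
    ("c3", "2012-04-02", "pants"), ("c3", "2012-05-01", "shirts"),
]


def followed_by_counts(orders):
    """Count customers who bought product A strictly before product B, once per customer."""
    history = defaultdict(list)
    for customer, date, product in orders:
        history[customer].append((date, product))
    pairs = Counter()
    for purchases in history.values():
        purchases.sort()  # ISO dates sort chronologically as strings
        seen = set()
        for (d1, a), (d2, b) in combinations(purchases, 2):
            if a != b and d1 < d2 and (a, b) not in seen:
                seen.add((a, b))
                pairs[(a, b)] += 1
    return pairs


def confidence(pairs, orders, a, b):
    """Share of customers who bought `a` and later bought `b`."""
    buyers_of_a = {customer for customer, _, product in orders if product == a}
    return pairs[(a, b)] / len(buyers_of_a) if buyers_of_a else 0.0


pairs = followed_by_counts(orders)
print(pairs[("shirts", "pants")], pairs[("pants", "shirts")])
print(round(confidence(pairs, orders, "shirts", "pants"), 2))
```

Each pair is counted once per customer, so one heavy buyer cannot dominate a relationship.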