Powering Next Generation Data Architecture With Apache Hadoop
Hortonworks
Shaun Connolly presentation at Strata_London, October 1-2
1.
Powering Next-Generation Data Architectures with Apache Hadoop
Shaun Connolly, Hortonworks
@shaunconnolly
September 25, 2012
© Hortonworks Inc. 2012
2.
Big Data: Changing the Game for Organizations
Transactions + Interactions + Observations = BIG DATA
[Figure: nested tiers ERP, CRM, WEB, and BIG DATA, plotted against data volume (megabytes, gigabytes, terabytes, petabytes) and increasing data variety and complexity.]
Data types labeled in the figure: Mobile Web, Sentiment, User Click Stream, SMS/MMS, Speech to Text, Social Interactions & Feeds, Web logs, Spatial & GPS Coordinates, A/B testing, Sensors / RFID / Devices, Behavioral Targeting, Business Data Feeds, Dynamic Pricing, Segmentation, External Demographics, Search Marketing, Customer Touches, User Generated Content, Affiliate Networks, Purchase detail, Support Contacts, HD Video, Audio, Images, Dynamic Funnels, Purchase record, Offer details, Offer history, Product/Service Logs, Payment record.
3.
Connecting Transactions + Interactions + Observations
[Figure: multi-structured data sources feed a Big Data Platform, which sits alongside Business Transactions & Interactions, classic ETL processing, and Business Intelligence & Analytics.]
Inputs to the Big Data Platform:
• Audio, Video, Images
• Docs, Text, XML
• Web Logs, Clicks
• Social, Graph, Feeds
• Sensors, Devices, RFID
• Spatial, GPS
• Events, Other
Business Transactions & Interactions: Web, Mobile, CRM, ERP, SCM, …
Business Intelligence & Analytics: Dashboards, Reports, Visualization, …
Numbered flows in the figure:
1 Classic ETL processing
2 Capture and exchange multi-structured data to unlock value
3 Deliver refined data and runtime models
4 Retain runtime models and historical data for ongoing refinement & analysis
5 Retain historical data to unlock additional value
4.
Goal: Optimize Outcomes at Scale
• Media: optimize Content
• Intelligence: optimize Detection
• Finance: optimize Algorithms
• Advertising: optimize Performance
• Fraud: optimize Prevention
• Retail / Wholesale: optimize Inventory turns
• Manufacturing: optimize Supply chains
• Healthcare: optimize Patient outcomes
• Education: optimize Learning outcomes
• Government: optimize Citizen services
Source: Geoffrey Moore. Hadoop Summit 2012 keynote presentation.
5.
Customer: UC Irvine Medical Center
Optimizing patient outcomes while lowering costs
• UC Irvine Medical Center is ranked among the nation's best hospitals by U.S. News & World Report for the 12th year
• More than 400 specialty and primary care physicians
• Opened in 1976
• 422-bed medical facility
Current system, Epic, holds 22 years of patient data across admissions and clinical information
– Significant cost to maintain and run the system
– Difficult to access, standalone, not integrated into any other systems
Apache Hadoop sunsets the legacy system and augments new electronic medical records
1. Migrate all legacy Epic data to Apache Hadoop
– Replaced existing ETL and temporary databases with Hadoop, resulting in faster, more reliable transforms
– Captures all legacy data, not just a subset, and exposes this data to EMR and other applications
2. Eliminate maintenance of legacy system and database licenses
– $500K in annual savings
3. Integrate data with EMR and clinical front-end
– Better service with complete patient history provided to admissions and doctors
– Enable improved research through complete information
6.
Emerging Patterns of Use
Big Data: Transactions + Interactions + Observations
Patterns: Refine, Explore, Enrich
$ Business Case $
7.
Operational Data Refinery
Hadoop as platform for ETL modernization
Pattern: Refine (of Refine, Explore, Enrich)
[Figure: unstructured data, log files, and DB data flow into the refinery (capture and archive, parse & cleanse, structure and join, upload) and on to the Enterprise Data Warehouse.]
Capture
• Capture new unstructured data and log files alongside existing sources
• Retain inputs in raw form for audit and continuity purposes
Process
• Parse the data & cleanse
• Apply structure and definition
• Join datasets together across disparate data sources
Exchange
• Push to existing data warehouse for downstream consumption
• Feeds operational reporting and online systems
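The process steps on this slide (parse and cleanse, apply structure, join across sources) can be illustrated with a minimal in-memory sketch. It is not the Hadoop implementation the slide refers to; the log line format, the field names, and the `customer_table` lookup are all assumptions made up for illustration.

```python
import re
from datetime import datetime

# Hypothetical log line format (an assumption): "<ISO timestamp> <customer_id> <url>"
LOG_PATTERN = re.compile(r"^(\S+)\s+(\S+)\s+(\S+)$")


def parse_and_cleanse(raw_lines):
    """Parse raw log lines into structured dicts, dropping malformed lines."""
    records = []
    for line in raw_lines:
        match = LOG_PATTERN.match(line.strip())
        if match is None:
            continue  # cleanse: skip lines that do not fit the expected shape
        timestamp, customer_id, url = match.groups()
        try:
            parsed_time = datetime.fromisoformat(timestamp)
        except ValueError:
            continue  # cleanse: skip lines with an unparseable timestamp
        records.append({"time": parsed_time, "customer_id": customer_id, "url": url})
    return records


def join_with_customers(records, customers):
    """Inner-join parsed log records to DB rows keyed by customer_id."""
    joined = []
    for record in records:
        customer = customers.get(record["customer_id"])
        if customer is not None:
            joined.append({**record, **customer})
    return joined


# Raw inputs are kept untouched; the refined output is derived from them.
raw_log = [
    "2012-09-25T10:00:00 c1 /home",
    "garbage line",
    "2012-09-25T10:05:00 c2 /checkout",
    "2012-09-25T10:06:00 c9 /home",
]
customer_table = {"c1": {"segment": "retail"}, "c2": {"segment": "business"}}

refined = join_with_customers(parse_and_cleanse(raw_log), customer_table)
```

The raw list is never modified, which mirrors the "retain inputs in raw form" point: the refined rows can always be regenerated if the parsing rules change.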
8.
“Big Bank” Key Benefits
• Capture and archive
– Retain 3 – 5 years instead of 2 – 10 days
– Lower costs
– Improved compliance
• Transform, change, refine
– Turn upstream raw dumps into small list of “new, update, delete” customer records
– Convert fixed-width EBCDIC to UTF-8 (Java and DB compatible)
– Turn raw weblogs into sessions and behaviors
• Upload
– Insert into Teradata for downstream “as-is” reporting and tools
– Insert into new exploration platform for scientists to play with
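As an illustration of the "convert fixed-width EBCDIC to UTF-8" step, here is a minimal sketch using Python's built-in `cp037` codec (one common EBCDIC code page). The record layout in `FIELDS` is hypothetical, the bank's actual code page and layout are not given in the slides, and binary fields such as packed decimals are not handled.

```python
# Hypothetical record layout (an assumption): field name and width in bytes.
FIELDS = [("customer_id", 6), ("name", 10), ("action", 1)]
RECORD_LENGTH = sum(width for _, width in FIELDS)


def decode_records(ebcdic_bytes, codec="cp037"):
    """Split a fixed-width EBCDIC byte string into dicts of stripped text fields."""
    rows = []
    for start in range(0, len(ebcdic_bytes), RECORD_LENGTH):
        chunk = ebcdic_bytes[start:start + RECORD_LENGTH]
        if len(chunk) < RECORD_LENGTH:
            break  # ignore a trailing partial record
        # cp037 is a single-byte code page, so character offsets equal byte offsets.
        text = chunk.decode(codec)
        row, offset = {}, 0
        for name, width in FIELDS:
            row[name] = text[offset:offset + width].strip()
            offset += width
        rows.append(row)
    return rows


def to_utf8_lines(rows):
    """Serialize rows as pipe-delimited UTF-8 bytes, one record per line."""
    lines = ["|".join(row[name] for name, _ in FIELDS) for row in rows]
    return "\n".join(lines).encode("utf-8")


# Build a two-record sample by encoding text to EBCDIC, then convert it back.
sample = (
    "000042" + "ALICE".ljust(10) + "N" + "000043" + "BOB".ljust(10) + "U"
).encode("cp037")
rows = decode_records(sample)
utf8_output = to_utf8_lines(rows)
```

Decoding per record rather than per file keeps the field boundaries aligned with the fixed-width layout, which is what makes the output usable by Java and database tooling downstream.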
9.
Big Data Exploration & Visualization
Hadoop as agile, ad-hoc data mart
Pattern: Explore (of Refine, Explore, Enrich)
[Figure: unstructured data, log files, and DB data flow through capture and archive, structure and join, categorize into tables, and upload via JDBC / ODBC to visualization tools and, optionally, an EDW / datamart.]
Capture
• Capture multi-structured data and retain inputs in raw form for iterative analysis
Process
• Parse the data into queryable format
• Explore & analyze using Hive, Pig, Mahout and other tools to discover value
• Label data and type information for compatibility and later discovery
• Pre-compute stats, groupings, patterns in data to accelerate analysis
Exchange
• Use visualization tools to facilitate exploration and find key insights
• Optionally move actionable insights into EDW or datamart
10.
“Hardware Manufacturer” Key Benefits
• Capture and archive
– Store 10M+ survey forms/year for > 3 years
– Capture text, audio, and systems data in one platform
• Structure and join
– Unlock freeform text and audio data
– Un-anonymize customers
• Categorize into tables
– Create HCatalog tables “customer”, “survey”, “freeform text”
• Upload, JDBC
– Visualize natural satisfaction levels and groups
– Tag customers as “happy” and report back to CRM database
11.
Application Enrichment
Deliver Hadoop analysis to online apps
Pattern: Enrich (of Refine, Explore, Enrich)
[Figure: unstructured data, log files, and DB data flow through capture, parse, and derive/filter (scheduled & near real time) into NoSQL / HBase (low latency), which serves online applications.]
Capture
• Capture data that was once too bulky and unmanageable
Process
• Uncover aggregate characteristics across data
• Use Hive, Pig and MapReduce to identify patterns
• Filter useful data from mass streams (Pig)
• Micro or macro batch oriented schedules
Exchange
• Push results to HBase or other NoSQL alternative for real time delivery
• Use patterns to deliver right content/offer to the right person at the right time
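The "filter useful data from mass streams" and "micro batch" points can be sketched in plain Python. This is only an illustration of the pattern: the slide names Pig for the filtering, while here an in-memory dict stands in for HBase or another NoSQL store, and the event fields (`user`, `action`) are hypothetical.

```python
from itertools import islice


def micro_batches(events, batch_size):
    """Yield successive lists of at most batch_size events from an iterable."""
    iterator = iter(events)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def filter_and_aggregate(batch, wanted_actions):
    """Keep only events of interest and count them per (user, action)."""
    counts = {}
    for event in batch:
        if event["action"] in wanted_actions:
            key = (event["user"], event["action"])
            counts[key] = counts.get(key, 0) + 1
    return counts


# In-memory dict standing in for a low-latency store (an assumption, not HBase).
serving_store = {}

events = [
    {"user": "u1", "action": "view"},
    {"user": "u1", "action": "purchase"},
    {"user": "u2", "action": "heartbeat"},
    {"user": "u1", "action": "view"},
    {"user": "u2", "action": "view"},
]

# Process the stream two events at a time and merge each batch into the store.
for batch in micro_batches(events, batch_size=2):
    batch_counts = filter_and_aggregate(batch, {"view", "purchase"})
    for key, count in batch_counts.items():
        serving_store[key] = serving_store.get(key, 0) + count
```

Merging small per-batch results into the serving store is what lets an online application read fresh aggregates without scanning the raw stream.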
12.
“Clothing Retailer” Key Benefits
• Capture
– Capture weblogs together with sales order history, customer master
• Derive useful information
– Compute relationships between products over time
– “people who buy shirts eventually need pants”
– Score customer web behavior / sentiment
– Connect product recommendations to customer sentiment
• Share
– Load customer recommendations into HBase for rapid website service
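One simple way to "compute relationships between products over time" is to count how many customers bought one product and later bought another. The sketch below assumes order history is available as `(customer_id, order_date, product)` tuples with ISO-formatted dates. That data shape and the counting rule are illustrative assumptions, not the retailer's actual method.

```python
from collections import Counter, defaultdict


def later_purchase_counts(orders):
    """Count customers who bought `earlier` and, on a later date, `later`.

    `orders` is an iterable of (customer_id, order_date, product) tuples.
    Each (earlier, later) pair is counted at most once per customer.
    """
    by_customer = defaultdict(list)
    for customer_id, order_date, product in orders:
        by_customer[customer_id].append((order_date, product))

    pair_counts = Counter()
    for purchases in by_customer.values():
        purchases.sort()  # ISO date strings sort chronologically
        seen_pairs = set()
        for i, (first_date, first_product) in enumerate(purchases):
            for later_date, later_product in purchases[i + 1:]:
                if later_date > first_date and later_product != first_product:
                    seen_pairs.add((first_product, later_product))
        pair_counts.update(seen_pairs)
    return pair_counts


orders = [
    ("c1", "2012-01-05", "shirt"),
    ("c1", "2012-03-10", "pants"),
    ("c2", "2012-02-01", "shirt"),
    ("c2", "2012-02-20", "pants"),
    ("c3", "2012-04-01", "pants"),
]
counts = later_purchase_counts(orders)
top_follow_up = counts.most_common(1)[0]
```

In this sample, two customers bought a shirt and then pants, so that pair ranks first. Counts like these could then be loaded into a low-latency store keyed by the earlier product to serve recommendations.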
13.
Hadoop in Enterprise Data Architectures
[Figure: Hadoop platform positioned between big data sources and existing business infrastructure, web, and new technology.]
• Existing Business Infrastructure: IDE & Dev Tools, ODS & Datamarts, Applications & Spreadsheets, Visualization & Intelligence, Operations, Discovery Tools, EDW, Low Latency/NoSQL, Custom, Existing
• Web: Web Applications
• New Tech: Datameer, Tableau, Karmasphere, Splunk
• Hadoop platform components: Templeton, WebHDFS, Sqoop, Flume, HCatalog, HBase, Pig, Hive, MapReduce, HDFS, Ambari, Oozie, HA, ZooKeeper
• Big Data Sources (transactions, observations, interactions): Social Media, Exhaust Data, logs, files, CRM, ERP, financials
14.
Hortonworks Vision & Role
We believe that by the end of 2015, more than half the world's data will be processed by Apache Hadoop.
1 Be diligent stewards of the open source core
2 Be tireless innovators beyond the core
3 Provide robust data platform services & open APIs
4 Enable vibrant ecosystem at each layer of the stack
5 Make Hadoop platform enterprise-ready & easy to use
15.
What’s Needed to Drive Success?
• Enterprise tooling to become a complete data platform
– Open deployment & provisioning
– Higher quality data loading
– Monitoring and management
– APIs for easy integration
• Ecosystem needs support & development
– Existing infrastructure vendors need to continue to integrate
– Apps need to continue to be developed on this infrastructure
– Well defined use cases and solution architectures need to be promoted
• Market needs to rally around core Apache Hadoop
– To avoid splintering/market distraction
– To accelerate adoption
www.hortonworks.com/moore
16.
Next Steps?
1 Download Hortonworks Data Platform
hortonworks.com/download
2 Use the getting started guide
hortonworks.com/get-started
3 Learn more… get support
Learn more: hortonworks.com/training
• Expert role based training
• Course for admins, developers and operators
• Certification program
• Custom onsite options
Hortonworks Support: hortonworks.com/support
• Full lifecycle technical support across four service levels
• Delivered by Apache Hadoop Experts/Committers
• Forward-compatible
17.
Thank You! Questions & Answers
Follow: @hortonworks & @shaunconnolly