HADOOP IN A RELATIONAL DATA WAREHOUSE
Data and Analytics/Enterprise DW, Expedia
June 2013
Arek Kaczmarek
Background
- Expedia
- Site
- Competitors
- DW
- Legacy
- EDW
- DNA
- Hadoop at Expedia
- Original Purpose
- Early expectations
A case study
- Project objective
- Datasets
- Competitive shopping comparisons
- Properties
- Bookings
- Clickstream demand
- Forecast
DW architecture – what’s different?
- Normalized vs denormalized tables
- Does it matter?
- Performance
- Ingestion speed
- Analytical flexibility
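The normalized vs denormalized question above can be made concrete with a small sketch. It uses Python's built-in `sqlite3` as a stand-in engine; the `property`, `booking` and `booking_flat` tables and their columns are hypothetical and do not come from the slides. It only shows that the two layouts answer the same question with and without a join; it does not measure performance or ingestion speed.

```python
import sqlite3

# Hypothetical schema for illustration only; none of these names are from the slides.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE property (property_id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE booking  (booking_id  INTEGER PRIMARY KEY,
                           property_id INTEGER REFERENCES property(property_id),
                           amount      REAL);
    INSERT INTO property VALUES (1, 'Seattle'), (2, 'Chicago');
    INSERT INTO booking  VALUES (10, 1, 200.0), (11, 1, 150.0), (12, 2, 300.0);

    -- Denormalized copy: the city is repeated on every booking row.
    CREATE TABLE booking_flat AS
        SELECT b.booking_id, p.city, b.amount
        FROM booking b JOIN property p USING (property_id);
""")

def revenue_normalized(conn):
    # The normalized layout needs a join at query time.
    return conn.execute("""
        SELECT p.city, SUM(b.amount)
        FROM booking b JOIN property p USING (property_id)
        GROUP BY p.city ORDER BY p.city
    """).fetchall()

def revenue_denormalized(conn):
    # The flat layout answers the same question from a single table.
    return conn.execute("""
        SELECT city, SUM(amount)
        FROM booking_flat
        GROUP BY city ORDER BY city
    """).fetchall()

print(revenue_normalized(conn))
print(revenue_denormalized(conn))
```

Both functions return the same per-city totals. The flat table trades repeated values for a join-free query, which is the kind of trade-off the bullets above ask about.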
DEV work – do you need different skills?
- Data files: csv, tsv, txt or xml – which work best?
- Hive: HQL UDFs for analytic functions – do you need them?
- Optimization – reuse your knowledge?
- Architecture (temp tables, partitions)
- HQL (set parameters)
- Load_tags: partitioning, appending, syncing
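As a rough illustration of partitioning and appending by load tag, the sketch below groups rows into Hive-style partition directories of the form `column=value`. The slides do not define `load_tag`; here it is assumed to be a label identifying one load batch. The `partition_rows` function, the `/warehouse/bookings` path and the column names are all hypothetical.

```python
from collections import defaultdict

def partition_rows(rows, table_dir="/warehouse/bookings"):
    """Group rows into Hive-style partition directories keyed by load_tag.

    Assumption: load_tag labels one load batch. As in Hive partitioned
    tables, the partition value lives in the directory name and is not
    repeated inside the data lines.
    """
    partitions = defaultdict(list)
    for row in rows:
        directory = f"{table_dir}/load_tag={row['load_tag']}"
        # Tab-separated data line holding only the non-partition columns.
        line = "\t".join(str(row[col]) for col in ("booking_id", "amount"))
        partitions[directory].append(line)
    return dict(partitions)

rows = [
    {"booking_id": 10, "amount": 200.0, "load_tag": "2013-06-01"},
    {"booking_id": 11, "amount": 150.0, "load_tag": "2013-06-01"},
    {"booking_id": 12, "amount": 300.0, "load_tag": "2013-06-02"},
]

layout = partition_rows(rows)
for directory in sorted(layout):
    print(directory, layout[directory])
```

Under this assumption, appending a new batch only adds a new `load_tag=...` directory, and existing partitions stay untouched.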
RDBMSes and Hadoop – what’s their relationship?
- Syncing from DB2
- Importing from SQLServer
- Exporting into HBase
- Exporting into SQLServer
- Exporting into DB2
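The slides do not say which tool moved data between the relational databases and Hadoop. As a minimal sketch of the general idea, the code below dumps a relational table to tab-separated text, a format that can be placed in HDFS and read by Hive. It uses `sqlite3` as a stand-in for DB2 or SQL Server; the `export_table_as_tsv` function and the `property` table are hypothetical.

```python
import csv
import io
import sqlite3

def export_table_as_tsv(conn, table):
    """Return the rows of `table` as tab-separated text.

    The table name is interpolated directly into the SQL, which is only
    acceptable here because the sketch passes a trusted, hard-coded name.
    """
    cursor = conn.execute(f"SELECT * FROM {table} ORDER BY 1")
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerows(cursor)  # each row is a tuple of column values
    return buffer.getvalue()

# sqlite3 stands in for the source RDBMS in this illustration.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE property (property_id INTEGER, city TEXT)")
source.executemany("INSERT INTO property VALUES (?, ?)",
                   [(1, "Seattle"), (2, "Chicago")])

tsv = export_table_as_tsv(source, "property")
print(tsv)
```

In practice a dedicated transfer tool such as Apache Sqoop is commonly used for this kind of movement; the sketch only shows the shape of the data handed from one side to the other.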
Place of Hadoop in a Relational Data Warehouse?
- Conflicting
- Mutually exclusive
- Coexisting
- Complementing
What’s the new Data Warehouse for data and analytics?
- Complementing: Polyglot Persistence
Questions?
akaczmarek@expedia.com
