SlideShare a Scribd company logo
1 of 19
Big Data redefines
   Enterprise Data Warehouse
Big Data Innovation, Unicom - Bangalore
                        February 2013
                       Raghu Kashyap
About Raghu Kashyap
Personal

■ Director – Data Insights Group @ Orbitz Worldwide
■ eMail: raghu.kashyap@orbitz.com
■ Twitter: @ragskashyap
■ Blog: http://kashyaps.com
■ LinkedIn: http://www.linkedin.com/in/raghukashyap/




                                                 Areas of Responsibility

                                                 ■ Orbitz Services Bangalore Center Head

                                                 ■ Lead Big Data team that builds out Global Data Infrastructure for
                                                   Orbitz Worldwide and provides business insights.

                                                 ■ US, Europe, Australia(APAC)



 page 2
Orbitz in a nutshell




page 3
Orbitz Worldwide




page 4
Back to the future.




page 5
Vendor evaluation

     • KARMAsphere
     • Datameer
     • Aster Data




page 6
Traditional Data warehouse
                                Greenplum
         Raw
         logs


                                 ETL
                Staging
                 table


                                            Temp tables




                                                  ETL



                    Data Mart
page 7
Hadoop Infrastructure




page 8
Redefine Enterprise Data warehouse
    ETL only approach
    2:12 seconds


    Run map reduce job


    1m 14.298s


    Port flat file to Greenplum using GP connector


    Time: 5.077 s


page 9
Approach with Hadoop and ETL
          Raw                                 Greenplum
          logs




                                               Event Model

                  Map Reduce


                                                             ETL
     Flat files
                     GP Connector
                                    External Tables

page 10
Resolving database keys
                tag_value_dim                    Greenplum
    id        tag      value                    tag_value_dim
    1         pos      ORB               id      tag          value
    2         pos      ORBC              200     pos          ORB
    3         pos      ORB               157     pos          ORBC


                                          ETL
                    fact
                                                       fact
   id     tag value id fact
                                    id         Tag value id     value
                       value
                                               200              $ 5600
          1                $ 5600
                                               200              $ 7500
          3                $ 7500



page 11
Hadoop Configuration

     • 74 Nodes
     • >1PB
     • Hive
     • Flume
     • HBase
     •R
     • Cloudera Distribution
     • Greenplum Connector




page 12
Hadoop Applications

     Site Analytics


     Machine Learning


     Multi Variate Testing Analysis


     Production Logs


     Hotel Rate Cache TTL


page 13
Hadoop Usage




page 14
Business Performance Monitoring

     • EFX
     • Marketing channels
     • Shopper patterns
     • Recommendation Module




page 15
Multi channel attribution




page 16
MVT

     Analyze behavioral and Test data from our MVT testing




page 17
Lessons Learnt




   Analytics using Big Data comes with a price.

   Data Governance

   Senior Leadership buy in

   I can't tell you the key to success, but the key to failure is
    trying to please everyone." -Ed Sheeran
page 18
Thank you




page 19

More Related Content

What's hot

Introduction to Apache Hive(Big Data, Final Seminar)
Introduction to Apache Hive(Big Data, Final Seminar)Introduction to Apache Hive(Big Data, Final Seminar)
Introduction to Apache Hive(Big Data, Final Seminar)Takrim Ul Islam Laskar
 
Pig and Pig Latin - Module 5
Pig and Pig Latin - Module 5Pig and Pig Latin - Module 5
Pig and Pig Latin - Module 5Rohit Agrawal
 
Big Data Hadoop Training in Pune-Course Content Advanto Software
Big Data Hadoop Training in Pune-Course Content Advanto SoftwareBig Data Hadoop Training in Pune-Course Content Advanto Software
Big Data Hadoop Training in Pune-Course Content Advanto SoftwareAdvanto Software
 
Hadoop 2 cluster architecture
Hadoop 2 cluster architectureHadoop 2 cluster architecture
Hadoop 2 cluster architectureSandeep Patil
 
IBM InterConnect 2016 - 3505 - Cloud-Based Analytics of The Weather Company i...
IBM InterConnect 2016 - 3505 - Cloud-Based Analytics of The Weather Company i...IBM InterConnect 2016 - 3505 - Cloud-Based Analytics of The Weather Company i...
IBM InterConnect 2016 - 3505 - Cloud-Based Analytics of The Weather Company i...Torsten Steinbach
 
Best Practices for Migrating Legacy Data Warehouses into Amazon Redshift
Best Practices for Migrating Legacy Data Warehouses into Amazon RedshiftBest Practices for Migrating Legacy Data Warehouses into Amazon Redshift
Best Practices for Migrating Legacy Data Warehouses into Amazon RedshiftAmazon Web Services
 
Future of HCatalog - Hadoop Summit 2012
Future of HCatalog - Hadoop Summit 2012Future of HCatalog - Hadoop Summit 2012
Future of HCatalog - Hadoop Summit 2012Hortonworks
 
Facebooks Petabyte Scale Data Warehouse using Hive and Hadoop
Facebooks Petabyte Scale Data Warehouse using Hive and HadoopFacebooks Petabyte Scale Data Warehouse using Hive and Hadoop
Facebooks Petabyte Scale Data Warehouse using Hive and Hadooproyans
 
Pig Tutorial | Twitter Case Study | Apache Pig Script and Commands | Edureka
Pig Tutorial | Twitter Case Study | Apache Pig Script and Commands | EdurekaPig Tutorial | Twitter Case Study | Apache Pig Script and Commands | Edureka
Pig Tutorial | Twitter Case Study | Apache Pig Script and Commands | EdurekaEdureka!
 
HCatalog Hadoop Summit 2011
HCatalog Hadoop Summit 2011HCatalog Hadoop Summit 2011
HCatalog Hadoop Summit 2011Hortonworks
 
Hive vs Pig for HadoopSourceCodeReading
Hive vs Pig for HadoopSourceCodeReadingHive vs Pig for HadoopSourceCodeReading
Hive vs Pig for HadoopSourceCodeReadingMitsuharu Hamba
 
Big Data Hadoop Training
Big Data Hadoop TrainingBig Data Hadoop Training
Big Data Hadoop Trainingstratapps
 
Lessons learned mongodb to redhsift - meetup July 1st Tel Aviv
Lessons learned   mongodb to redhsift - meetup July 1st Tel AvivLessons learned   mongodb to redhsift - meetup July 1st Tel Aviv
Lessons learned mongodb to redhsift - meetup July 1st Tel AvivRoie Shavit
 

What's hot (20)

Introduction to Apache Hive(Big Data, Final Seminar)
Introduction to Apache Hive(Big Data, Final Seminar)Introduction to Apache Hive(Big Data, Final Seminar)
Introduction to Apache Hive(Big Data, Final Seminar)
 
Pig and Pig Latin - Module 5
Pig and Pig Latin - Module 5Pig and Pig Latin - Module 5
Pig and Pig Latin - Module 5
 
Hadoop pig
Hadoop pigHadoop pig
Hadoop pig
 
Big Data Hadoop Training in Pune-Course Content Advanto Software
Big Data Hadoop Training in Pune-Course Content Advanto SoftwareBig Data Hadoop Training in Pune-Course Content Advanto Software
Big Data Hadoop Training in Pune-Course Content Advanto Software
 
Hadoop 2 cluster architecture
Hadoop 2 cluster architectureHadoop 2 cluster architecture
Hadoop 2 cluster architecture
 
IBM InterConnect 2016 - 3505 - Cloud-Based Analytics of The Weather Company i...
IBM InterConnect 2016 - 3505 - Cloud-Based Analytics of The Weather Company i...IBM InterConnect 2016 - 3505 - Cloud-Based Analytics of The Weather Company i...
IBM InterConnect 2016 - 3505 - Cloud-Based Analytics of The Weather Company i...
 
Best Practices for Migrating Legacy Data Warehouses into Amazon Redshift
Best Practices for Migrating Legacy Data Warehouses into Amazon RedshiftBest Practices for Migrating Legacy Data Warehouses into Amazon Redshift
Best Practices for Migrating Legacy Data Warehouses into Amazon Redshift
 
Future of HCatalog - Hadoop Summit 2012
Future of HCatalog - Hadoop Summit 2012Future of HCatalog - Hadoop Summit 2012
Future of HCatalog - Hadoop Summit 2012
 
RTree Spatial Indexing with MongoDB - MongoDC
RTree Spatial Indexing with MongoDB - MongoDC RTree Spatial Indexing with MongoDB - MongoDC
RTree Spatial Indexing with MongoDB - MongoDC
 
Facebooks Petabyte Scale Data Warehouse using Hive and Hadoop
Facebooks Petabyte Scale Data Warehouse using Hive and HadoopFacebooks Petabyte Scale Data Warehouse using Hive and Hadoop
Facebooks Petabyte Scale Data Warehouse using Hive and Hadoop
 
Pilot Project for HDF5 Metadata Structures for SWOT
Pilot Project for HDF5 Metadata Structures for SWOTPilot Project for HDF5 Metadata Structures for SWOT
Pilot Project for HDF5 Metadata Structures for SWOT
 
Vertica
VerticaVertica
Vertica
 
May 2013 HUG: HCatalog/Hive Data Out
May 2013 HUG: HCatalog/Hive Data OutMay 2013 HUG: HCatalog/Hive Data Out
May 2013 HUG: HCatalog/Hive Data Out
 
Pig Tutorial | Twitter Case Study | Apache Pig Script and Commands | Edureka
Pig Tutorial | Twitter Case Study | Apache Pig Script and Commands | EdurekaPig Tutorial | Twitter Case Study | Apache Pig Script and Commands | Edureka
Pig Tutorial | Twitter Case Study | Apache Pig Script and Commands | Edureka
 
HCatalog Hadoop Summit 2011
HCatalog Hadoop Summit 2011HCatalog Hadoop Summit 2011
HCatalog Hadoop Summit 2011
 
Hive vs Pig for HadoopSourceCodeReading
Hive vs Pig for HadoopSourceCodeReadingHive vs Pig for HadoopSourceCodeReading
Hive vs Pig for HadoopSourceCodeReading
 
Big Data Hadoop Training
Big Data Hadoop TrainingBig Data Hadoop Training
Big Data Hadoop Training
 
Lessons learned mongodb to redhsift - meetup July 1st Tel Aviv
Lessons learned   mongodb to redhsift - meetup July 1st Tel AvivLessons learned   mongodb to redhsift - meetup July 1st Tel Aviv
Lessons learned mongodb to redhsift - meetup July 1st Tel Aviv
 
Hive hcatalog
Hive hcatalogHive hcatalog
Hive hcatalog
 
ICESat-2 Metadata and Status
ICESat-2 Metadata and StatusICESat-2 Metadata and Status
ICESat-2 Metadata and Status
 

Similar to Big Data redefines Enterprise Data Warehouse @Bangalore

Pivotal Greenplum 次世代マルチクラウド・データ分析プラットフォーム
Pivotal Greenplum 次世代マルチクラウド・データ分析プラットフォームPivotal Greenplum 次世代マルチクラウド・データ分析プラットフォーム
Pivotal Greenplum 次世代マルチクラウド・データ分析プラットフォームMasayuki Matsushita
 
Hadoop and the Data Warehouse: Point/Counter Point
Hadoop and the Data Warehouse: Point/Counter PointHadoop and the Data Warehouse: Point/Counter Point
Hadoop and the Data Warehouse: Point/Counter PointInside Analysis
 
Webinar: How Banks Manage Reference Data with MongoDB
 Webinar: How Banks Manage Reference Data with MongoDB Webinar: How Banks Manage Reference Data with MongoDB
Webinar: How Banks Manage Reference Data with MongoDBMongoDB
 
Not Your Father’s Data Warehouse: Breaking Tradition with Innovation
Not Your Father’s Data Warehouse: Breaking Tradition with InnovationNot Your Father’s Data Warehouse: Breaking Tradition with Innovation
Not Your Father’s Data Warehouse: Breaking Tradition with InnovationInside Analysis
 
PayPal merchant ecosystem using Apache Spark, Hive, Druid, and HBase
PayPal merchant ecosystem using Apache Spark, Hive, Druid, and HBase PayPal merchant ecosystem using Apache Spark, Hive, Druid, and HBase
PayPal merchant ecosystem using Apache Spark, Hive, Druid, and HBase DataWorks Summit
 
New enhancements for security and usability in EDB 13
New enhancements for security and usability in EDB 13New enhancements for security and usability in EDB 13
New enhancements for security and usability in EDB 13EDB
 
EDB Postgres Platform 11 Webinar
EDB Postgres Platform 11 WebinarEDB Postgres Platform 11 Webinar
EDB Postgres Platform 11 WebinarEDB
 
Game Changed – How Hadoop is Reinventing Enterprise Thinking
Game Changed – How Hadoop is Reinventing Enterprise ThinkingGame Changed – How Hadoop is Reinventing Enterprise Thinking
Game Changed – How Hadoop is Reinventing Enterprise ThinkingInside Analysis
 
Big Data - HDInsight and Power BI
Big Data - HDInsight and Power BIBig Data - HDInsight and Power BI
Big Data - HDInsight and Power BIPrasad Prabhu (PP)
 
GPU-Accelerating UDFs in PySpark with Numba and PyGDF
GPU-Accelerating UDFs in PySpark with Numba and PyGDFGPU-Accelerating UDFs in PySpark with Numba and PyGDF
GPU-Accelerating UDFs in PySpark with Numba and PyGDFKeith Kraus
 
HKOSCon18 - Chetan Khatri - Scaling TB's of Data with Apache Spark and Scala ...
HKOSCon18 - Chetan Khatri - Scaling TB's of Data with Apache Spark and Scala ...HKOSCon18 - Chetan Khatri - Scaling TB's of Data with Apache Spark and Scala ...
HKOSCon18 - Chetan Khatri - Scaling TB's of Data with Apache Spark and Scala ...Chetan Khatri
 
Webinar: How Banks Use MongoDB as a Tick Database
Webinar: How Banks Use MongoDB as a Tick DatabaseWebinar: How Banks Use MongoDB as a Tick Database
Webinar: How Banks Use MongoDB as a Tick DatabaseMongoDB
 
World Domination with Pentaho EE?
World Domination with Pentaho EE?World Domination with Pentaho EE?
World Domination with Pentaho EE?Jos van Dongen
 
Practical introduction to hadoop
Practical introduction to hadoopPractical introduction to hadoop
Practical introduction to hadoopinside-BigData.com
 
Python in an Evolving Enterprise System (PyData SV 2013)
Python in an Evolving Enterprise System (PyData SV 2013)Python in an Evolving Enterprise System (PyData SV 2013)
Python in an Evolving Enterprise System (PyData SV 2013)PyData
 
Postgres Databases in Minutes with the EDB Postgres Cloud Database Service
Postgres Databases in Minutes with the EDB Postgres Cloud Database ServicePostgres Databases in Minutes with the EDB Postgres Cloud Database Service
Postgres Databases in Minutes with the EDB Postgres Cloud Database ServiceEDB
 

Similar to Big Data redefines Enterprise Data Warehouse @Bangalore (20)

Pivotal Greenplum 次世代マルチクラウド・データ分析プラットフォーム
Pivotal Greenplum 次世代マルチクラウド・データ分析プラットフォームPivotal Greenplum 次世代マルチクラウド・データ分析プラットフォーム
Pivotal Greenplum 次世代マルチクラウド・データ分析プラットフォーム
 
Hadoop and the Data Warehouse: Point/Counter Point
Hadoop and the Data Warehouse: Point/Counter PointHadoop and the Data Warehouse: Point/Counter Point
Hadoop and the Data Warehouse: Point/Counter Point
 
Webinar: How Banks Manage Reference Data with MongoDB
 Webinar: How Banks Manage Reference Data with MongoDB Webinar: How Banks Manage Reference Data with MongoDB
Webinar: How Banks Manage Reference Data with MongoDB
 
Not Your Father’s Data Warehouse: Breaking Tradition with Innovation
Not Your Father’s Data Warehouse: Breaking Tradition with InnovationNot Your Father’s Data Warehouse: Breaking Tradition with Innovation
Not Your Father’s Data Warehouse: Breaking Tradition with Innovation
 
PayPal merchant ecosystem using Apache Spark, Hive, Druid, and HBase
PayPal merchant ecosystem using Apache Spark, Hive, Druid, and HBase PayPal merchant ecosystem using Apache Spark, Hive, Druid, and HBase
PayPal merchant ecosystem using Apache Spark, Hive, Druid, and HBase
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
Spark Worshop
Spark WorshopSpark Worshop
Spark Worshop
 
New enhancements for security and usability in EDB 13
New enhancements for security and usability in EDB 13New enhancements for security and usability in EDB 13
New enhancements for security and usability in EDB 13
 
EDB Postgres Platform 11 Webinar
EDB Postgres Platform 11 WebinarEDB Postgres Platform 11 Webinar
EDB Postgres Platform 11 Webinar
 
Game Changed – How Hadoop is Reinventing Enterprise Thinking
Game Changed – How Hadoop is Reinventing Enterprise ThinkingGame Changed – How Hadoop is Reinventing Enterprise Thinking
Game Changed – How Hadoop is Reinventing Enterprise Thinking
 
Dancing with the Elephant
Dancing with the ElephantDancing with the Elephant
Dancing with the Elephant
 
Big Data - HDInsight and Power BI
Big Data - HDInsight and Power BIBig Data - HDInsight and Power BI
Big Data - HDInsight and Power BI
 
GPU-Accelerating UDFs in PySpark with Numba and PyGDF
GPU-Accelerating UDFs in PySpark with Numba and PyGDFGPU-Accelerating UDFs in PySpark with Numba and PyGDF
GPU-Accelerating UDFs in PySpark with Numba and PyGDF
 
HKOSCon18 - Chetan Khatri - Scaling TB's of Data with Apache Spark and Scala ...
HKOSCon18 - Chetan Khatri - Scaling TB's of Data with Apache Spark and Scala ...HKOSCon18 - Chetan Khatri - Scaling TB's of Data with Apache Spark and Scala ...
HKOSCon18 - Chetan Khatri - Scaling TB's of Data with Apache Spark and Scala ...
 
Webinar: How Banks Use MongoDB as a Tick Database
Webinar: How Banks Use MongoDB as a Tick DatabaseWebinar: How Banks Use MongoDB as a Tick Database
Webinar: How Banks Use MongoDB as a Tick Database
 
World Domination with Pentaho EE?
World Domination with Pentaho EE?World Domination with Pentaho EE?
World Domination with Pentaho EE?
 
Practical introduction to hadoop
Practical introduction to hadoopPractical introduction to hadoop
Practical introduction to hadoop
 
Python in an Evolving Enterprise System (PyData SV 2013)
Python in an Evolving Enterprise System (PyData SV 2013)Python in an Evolving Enterprise System (PyData SV 2013)
Python in an Evolving Enterprise System (PyData SV 2013)
 
Ibm db2 big sql
Ibm db2 big sqlIbm db2 big sql
Ibm db2 big sql
 
Postgres Databases in Minutes with the EDB Postgres Cloud Database Service
Postgres Databases in Minutes with the EDB Postgres Cloud Database ServicePostgres Databases in Minutes with the EDB Postgres Cloud Database Service
Postgres Databases in Minutes with the EDB Postgres Cloud Database Service
 

More from Raghu Kashyap

Agile 2017 Lean Product Development
Agile 2017 Lean Product DevelopmentAgile 2017 Lean Product Development
Agile 2017 Lean Product DevelopmentRaghu Kashyap
 
Is BI/Analytics and Agile an Oxymoron?
Is BI/Analytics and Agile an Oxymoron?Is BI/Analytics and Agile an Oxymoron?
Is BI/Analytics and Agile an Oxymoron?Raghu Kashyap
 
Traditional BI or Disruptive BI?
Traditional BI or Disruptive BI?Traditional BI or Disruptive BI?
Traditional BI or Disruptive BI?Raghu Kashyap
 
Idiots guide to stocks
Idiots guide to stocksIdiots guide to stocks
Idiots guide to stocksRaghu Kashyap
 
Orbitz fifth elephant_2015_conference_orbitz_presentation
Orbitz fifth elephant_2015_conference_orbitz_presentationOrbitz fifth elephant_2015_conference_orbitz_presentation
Orbitz fifth elephant_2015_conference_orbitz_presentationRaghu Kashyap
 
Big Data Analytics from a Practitioners View
Big Data Analytics from a Practitioners ViewBig Data Analytics from a Practitioners View
Big Data Analytics from a Practitioners ViewRaghu Kashyap
 
Accelerate 2012 chicago - orbitz
Accelerate   2012 chicago - orbitzAccelerate   2012 chicago - orbitz
Accelerate 2012 chicago - orbitzRaghu Kashyap
 
Gartner peer forum sept 2011 orbitz
Gartner peer forum sept 2011   orbitzGartner peer forum sept 2011   orbitz
Gartner peer forum sept 2011 orbitzRaghu Kashyap
 
Web analyticsandbigdata techweek2011
Web analyticsandbigdata techweek2011Web analyticsandbigdata techweek2011
Web analyticsandbigdata techweek2011Raghu Kashyap
 

More from Raghu Kashyap (9)

Agile 2017 Lean Product Development
Agile 2017 Lean Product DevelopmentAgile 2017 Lean Product Development
Agile 2017 Lean Product Development
 
Is BI/Analytics and Agile an Oxymoron?
Is BI/Analytics and Agile an Oxymoron?Is BI/Analytics and Agile an Oxymoron?
Is BI/Analytics and Agile an Oxymoron?
 
Traditional BI or Disruptive BI?
Traditional BI or Disruptive BI?Traditional BI or Disruptive BI?
Traditional BI or Disruptive BI?
 
Idiots guide to stocks
Idiots guide to stocksIdiots guide to stocks
Idiots guide to stocks
 
Orbitz fifth elephant_2015_conference_orbitz_presentation
Orbitz fifth elephant_2015_conference_orbitz_presentationOrbitz fifth elephant_2015_conference_orbitz_presentation
Orbitz fifth elephant_2015_conference_orbitz_presentation
 
Big Data Analytics from a Practitioners View
Big Data Analytics from a Practitioners ViewBig Data Analytics from a Practitioners View
Big Data Analytics from a Practitioners View
 
Accelerate 2012 chicago - orbitz
Accelerate   2012 chicago - orbitzAccelerate   2012 chicago - orbitz
Accelerate 2012 chicago - orbitz
 
Gartner peer forum sept 2011 orbitz
Gartner peer forum sept 2011   orbitzGartner peer forum sept 2011   orbitz
Gartner peer forum sept 2011 orbitz
 
Web analyticsandbigdata techweek2011
Web analyticsandbigdata techweek2011Web analyticsandbigdata techweek2011
Web analyticsandbigdata techweek2011
 

Recently uploaded

"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 

Recently uploaded (20)

"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 

Big Data redefines Enterprise Data Warehouse @Bangalore

  • 1. Big Data redefines Enterprise Data Warehouse Big Data Innovation, Unicom - Bangalore February 2013 Raghu Kashyap
  • 2. About Raghu Kashyap Personal ■ Director – Data Insights Group @ Orbitz Worldwide ■ eMail: raghu.kashyap@orbitz.com ■ Twitter: @ragskashyap ■ Blog: http://kashyaps.com ■ LinkedIn: http://www.linkedin.com/in/raghukashyap/ Areas of Responsibility ■ Orbitz Services Bangalore Center Head ■ Lead Big Data team that builds out Global Data Infrastructure for Orbitz Worldwide and provides business insights. ■ US, Europe, Australia(APAC) page 2
  • 3. Orbitz in a nutshell page 3
  • 5. Back to the future. page 5
  • 6. Vendor evaluation • KARMAsphere • Datameer • Aster Data page 6
  • 7. Traditional Data warehouse Greenplum Raw logs ETL Staging table Temp tables ETL Data Mart page 7
  • 9. Redefine Enterprise Data warehouse ETL only approach 2:12 seconds Run map reduce job 1m 14.298s Port flat file to Greenplum using GP connector Time: 5.077 s page 9
  • 10. Approach with Hadoop and ETL Raw Greenplum logs Event Model Map Reduce ETL Flat files GP Connector External Tables page 10
  • 11. Resolving database keys tag_value_dim Greenplum id tag value tag_value_dim 1 pos ORB id tag value 2 pos ORBC 200 pos ORB 3 pos ORB 157 pos ORBC ETL fact fact id tag value id fact id Tag value id value value 200 $ 5600 1 $ 5600 200 $ 7500 3 $ 7500 page 11
  • 12. Hadoop Configuration • 74 Nodes • >1PB • Hive • Flume • HBase •R • Cloudera Distribution • Greenplum Connector page 12
  • 13. Hadoop Applications Site Analytics Machine Learning Multi Variate Testing Analysis Production Logs Hotel Rate Cache TTL page 13
  • 15. Business Performance Monitoring • EFX • Marketing channels • Shopper patterns • Recommendation Module page 15
  • 17. MVT Analyze behavioral and Test data from our MVT testing page 17
  • 18. Lessons Learnt  Analytics using Big Data comes with a price.  Data Governance  Senior Leadership buy in  I can't tell you the key to success, but the key to failure is trying to please everyone." -Ed Sheeran page 18