The RDBMS and DW Blender.
Pat Wright
About Me
• 12+ years working with data
• 4 failed DW attempts
• 2 failed Data architectures
• Avid Volunteer
Blog: www.Sqlasylum.com
Twitter: @SQLAsylum
LinkedIn:
https://www.linkedin.com/in/patwright
Email: Sqlasylum@gmail.com
Speaker Rate Link:
What is Big Data?
• What is #BigData?
• Volume
• Variety
• Velocity
• Value
• Old idea and concept, new tools.
“Big data is like teen sex. Everybody is talking about
it, everyone thinks everyone else is doing it, so
everyone claims they are doing it.”
--Dan Ariely
Why are we talking about Big Data?
• Right Tool for the Job!
• DW took too long
• Cost too much
• Was not flexible
• Couldn’t scale without lots more $$$$$
• VOCI – Voice of the Customer Intelligence
• Manage and improve the customer experience to retain more customers
• SaaS-based, so it must have a repeatable process for all customers.
Problem
• Slow site performance when querying data
• Not scalable
• Repeatable except for large-scale customers
• Dynamic SQL generating all the filtering/SQL statements
• Querying from the read/write OLTP store.
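To make the "dynamic SQL" bullet concrete, here is a minimal sketch of how a filter-driven query might be assembled. The deck does not show the real generator; the `responses` table, its columns, and the `ALLOWED_FILTERS` whitelist are hypothetical, and SQLite stands in for the production RDBMS.

```python
import sqlite3

# Hypothetical whitelist of filterable columns; not taken from the deck.
ALLOWED_FILTERS = {"customer_id", "channel", "score"}


def build_filter_query(filters):
    """Return (sql, params) for a search over 'responses' with optional filters."""
    clauses, params = [], []
    for column, value in sorted(filters.items()):
        # Column names cannot be bound as parameters, so check them explicitly.
        if column not in ALLOWED_FILTERS:
            raise ValueError(f"unsupported filter: {column}")
        clauses.append(f"{column} = ?")
        params.append(value)
    sql = "SELECT id, customer_id, channel, score FROM responses"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql + " ORDER BY id", params


def demo():
    """Build a tiny in-memory table and run one generated query against it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE responses ("
        "id INTEGER PRIMARY KEY, customer_id INTEGER, channel TEXT, score INTEGER)"
    )
    conn.executemany(
        "INSERT INTO responses (customer_id, channel, score) VALUES (?, ?, ?)",
        [(1, "web", 9), (1, "phone", 4), (2, "web", 7)],
    )
    sql, params = build_filter_query({"customer_id": 1, "channel": "web"})
    return sql, conn.execute(sql, params).fetchall()
```

Every filter combination produces a different statement, and each one runs against the same store that takes the writes, which is the contention the Problem list describes.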
Architecture v1 – 2008-2013
Architecture v2 – 2013-2013 Fail
Solution Architecture v3 – 2013-2015
[Diagram components: Transactions (RDBMS); Apache Sqoop; RDBMS->HDFS Transfers; Incremental Load (Rebuilt each Cycle); HiveQL (SQL-ish ETL); Load _ES (Custom Java DLL); Search; Internal Workflow Control]
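The v3 diagram names an incremental load from the RDBMS but does not say how new rows were picked up. One common approach, similar in spirit to Sqoop's incremental append mode, is a high-water mark on an ever-increasing column. The sketch below assumes that approach; the `transactions` table and its columns are hypothetical, and SQLite stands in for the source database.

```python
import sqlite3


def extract_increment(conn, last_value):
    """Return rows with id greater than last_value, plus the new watermark."""
    rows = conn.execute(
        "SELECT id, payload FROM transactions WHERE id > ? ORDER BY id",
        (last_value,),
    ).fetchall()
    # If nothing new arrived, keep the old watermark unchanged.
    new_watermark = rows[-1][0] if rows else last_value
    return rows, new_watermark


def demo():
    """Run three extraction cycles against a small in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE transactions (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.executemany(
        "INSERT INTO transactions (payload) VALUES (?)", [("a",), ("b",)]
    )
    first, mark = extract_increment(conn, 0)        # initial load
    conn.execute("INSERT INTO transactions (payload) VALUES (?)", ("c",))
    second, mark = extract_increment(conn, mark)    # only the new row
    third, mark = extract_increment(conn, mark)     # nothing new
    return first, second, third, mark
```

A watermark on an insert-only key misses updates to existing rows, which may be one reason a pipeline ends up rebuilding downstream data each cycle.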
Solution Architecture v4 – 2016-
[Diagram components: Transactions (RDBMS); Apache Spark; YARN; Topology; Search]
Lessons Learned… why you really came to this session
• Spark is awesome… sort of.
• HBase is less than awesome.
• Scala is just a teeny bit faster than Python… according to Netflix.
• Scala developers are hard to find.
• Memory is crucial for Spark.
• Versions suck!
• Kafka is pretty awesome.
• This just in! Versions suck (Ambari 2.4 is your best friend).
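A rough bit of arithmetic shows why memory matters so much for Spark. The sketch assumes the unified memory manager with Spark 2.x defaults (300 MB reserved, `spark.memory.fraction` of 0.6, `spark.memory.storageFraction` of 0.5). The deck does not state which settings were used, and Spark 1.6 shipped a different default fraction, so treat the numbers as illustrative only.

```python
RESERVED_MB = 300  # fixed reservation in Spark's unified memory manager


def unified_memory_mb(executor_heap_mb, memory_fraction=0.6, storage_fraction=0.5):
    """Estimate how an executor heap splits into execution and storage memory.

    The default fractions are assumed Spark 2.x defaults, not values from the deck.
    """
    usable = (executor_heap_mb - RESERVED_MB) * memory_fraction
    storage = usable * storage_fraction
    return {
        "unified": usable,               # shared by execution and storage
        "storage": storage,              # cached data protected from eviction
        "execution": usable - storage,   # shuffles, joins, sorts, aggregations
    }


if __name__ == "__main__":
    for name, megabytes in unified_memory_mb(4096).items():
        print(f"{name}: {megabytes:.1f} MB")
```

Under these assumptions, a 4 GB executor heap leaves only a little over 2.2 GB for Spark's own execution and caching.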
Questions?
• Sqlasylum@gmail.com
• www.sqlasylum.com
• Twitter: @sqlasylum


Integrating RDBMS with Big Data V3.0 now with SPARK!


Editor's Notes

  1. Great explanation of MapReduce: http://ksat.me/map-reduce-a-really-simple-introduction-kloudo/ Dan Ariely quote taken from: http://medcitynews.com/2013/11/big-data-like-teen-sex-memorable-quotes-digital-health-innovation-summit/#ixzz35bDHaMv7
  2. Various data coming in; a normalized DB with lots of tables and procedures; read/write all in one place. Steps to make it faster: lots of indexes to support the reporting platform.
  3. Cons to replication: overhead with replication; maintenance aspect with monthly releases; changes to production systems that would be needed (tables without PK); dependency on SQL Server and licensing costs. Cons to cubes: cost; scalability; time to process/delay/hardware costs.