Integrating RDBMS with Big Data V3.0 now with SPARK!


Our company built a system that combines Big Data technologies (Hadoop and Elasticsearch) with SQL Server to be both highly scalable and cost effective. In this session I will discuss why we went this route, the pros and cons of this path, and what we learned moving from Hive to Spark. We have been running our platform in the Big Data space for 2+ years and have lots of failures to share with others.


About Me
• 12+ years working with data
• 4 failed DW attempts
• 2 failed Data architectures
• Avid Volunteer
Blog: www.Sqlasylum.com
Twitter: @SQLAsylum
LinkedIn: https://www.linkedin.com/in/patwright
Email: Sqlasylum@gmail.com
Speaker Rate Link:

What is Big Data?
• What is #BigData?
• Volume
• Variety
• Velocity
• Value
• Old idea and concept, new tools.
“Big data is like teen sex. Everybody is talking about
it, everyone thinks everyone else is doing it, so
everyone claims they are doing it.”
--Dan Ariely

Why are we talking about Big Data?
• Right tool for the job!
• DW took too long
• Cost too much
• Was not flexible
• Couldn't scale without lots more $$$$$

• VOCI – Voice of the Customer Intelligence
• Manage and improve the customer experience to retain more customers
• SaaS-based; must have a repeatable process for all customers
Problem
• Slow site performance when querying data
• Not scalable
• Repeatable except for large-scale customers
• Dynamic SQL generating all the filtering/SQL statements
• Querying from the read/write OLTP store
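The slide names dynamic SQL for filtering as one of the pain points but does not show any of it. As a rough illustration of what such generated filtering tends to look like, here is a minimal sketch that builds a parameterized query from a set of user-selected filters. The table name, column names, and the whitelist are all hypothetical and are not taken from the talk; SQLite stands in for the OLTP store only so the sketch is runnable.

```python
import sqlite3

# Hypothetical whitelist of filterable columns (an assumption, not from the talk).
ALLOWED_COLUMNS = {"customer_id", "region", "score"}


def build_filter_query(table, filters):
    """Build a parameterized SELECT from a dict of column -> value filters.

    The table name is supplied by code, never by the end user; column names
    are checked against the whitelist and values are bound as parameters.
    """
    clauses, params = [], []
    for column, value in sorted(filters.items()):
        if column not in ALLOWED_COLUMNS:
            raise ValueError(f"unsupported filter column: {column}")
        clauses.append(f"{column} = ?")
        params.append(value)
    where = " AND ".join(clauses) if clauses else "1 = 1"
    return f"SELECT * FROM {table} WHERE {where}", params


def demo():
    """Run one generated query against a tiny in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE responses (customer_id INTEGER, region TEXT, score INTEGER)"
    )
    conn.executemany(
        "INSERT INTO responses VALUES (?, ?, ?)",
        [(1, "west", 9), (1, "east", 4), (2, "west", 7)],
    )
    sql, params = build_filter_query(
        "responses", {"customer_id": 1, "region": "west"}
    )
    return conn.execute(sql, params).fetchall()
```

Every distinct combination of filters produces a different statement, and all of them run against the same store that is taking writes, which is consistent with the performance and scalability complaints listed on the slide.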

Solution Architecture v3, 2013-2015
• Transactions (RDBMS)
• Apache Sqoop: RDBMS->HDFS transfers
• Incremental Load (rebuilt each cycle)
• HiveQL (SQL-ish ETL)
• Load_ES (custom Java DLL)
• Search
• Internal workflow control
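The diagram labels mention an incremental load out of the RDBMS and a loader that feeds the search tier. The talk does not show either piece, so the following is only a sketch of the general idea: pull rows past a high-water mark (the same idea as Sqoop's `--incremental append` mode with `--check-column` and `--last-value`), then format them as newline-delimited JSON in the shape Elasticsearch's `_bulk` endpoint accepts. The `transactions` table, its columns, and the `feedback` index name are hypothetical, and SQLite stands in for the RDBMS.

```python
import json
import sqlite3


def extract_new_rows(conn, last_id):
    """Return rows whose id is greater than the previous cycle's high-water mark."""
    cursor = conn.execute(
        "SELECT id, customer, comment FROM transactions WHERE id > ? ORDER BY id",
        (last_id,),
    )
    return cursor.fetchall()


def to_bulk_payload(rows, index_name):
    """Format rows as NDJSON: one action line followed by one document line per row."""
    lines = []
    for row_id, customer, comment in rows:
        lines.append(json.dumps({"index": {"_index": index_name, "_id": str(row_id)}}))
        lines.append(json.dumps({"customer": customer, "comment": comment}))
    # The bulk format expects every line, including the last, to end with a newline.
    return "".join(line + "\n" for line in lines)


def run_cycle(conn, last_id, index_name="feedback"):
    """Extract rows added since last_id; return the payload and the new high-water mark."""
    rows = extract_new_rows(conn, last_id)
    new_last_id = rows[-1][0] if rows else last_id
    return to_bulk_payload(rows, index_name), new_last_id
```

The payload would be sent to the cluster's `_bulk` endpoint by whatever HTTP client is in use; older Elasticsearch versions also expect a document type in the action line or the URL. The caller has to persist the returned high-water mark between cycles, which is one more piece of state the workflow control has to own.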

Solution Architecture v4, 2016-
• Transactions (RDBMS)
• Search
• Topology
• Apache Spark
• YARN

Lessons Learned… why you really came to this session
• Spark is awesome… sort of.
• HBase is less than awesome.
• Scala is just a teeny bit faster than Python… according to Netflix.
• Scala developers are hard to find.
• Memory is crucial for Spark.
• Versions suck!
• Kafka is pretty awesome.
• This just in! Versions suck (Ambari 2.4 is your best friend).
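One concrete reason memory matters for Spark on YARN is that each executor's container is larger than the executor heap you ask for, because YARN also reserves off-heap overhead. The arithmetic below is a sketch that assumes the commonly documented defaults (overhead of 10% of executor memory, with a 384 MiB floor); the exact factor and the property name (`spark.yarn.executor.memoryOverhead` in older releases, `spark.executor.memoryOverhead` in newer ones) depend on the Spark version, so check the "Running on YARN" page for the version you run. The node size used in the usage note is made up.

```python
# Assumed defaults; verify against the documentation for your Spark version.
MIN_OVERHEAD_MIB = 384
OVERHEAD_FACTOR = 0.10


def yarn_container_mib(executor_memory_mib, overhead_mib=None):
    """Approximate memory YARN must grant for one executor (heap plus overhead)."""
    if overhead_mib is None:
        overhead_mib = max(MIN_OVERHEAD_MIB, int(executor_memory_mib * OVERHEAD_FACTOR))
    return executor_memory_mib + overhead_mib


def executors_per_node(node_memory_mib, executor_memory_mib):
    """How many such executors fit in the memory YARN can allocate on one node."""
    return node_memory_mib // yarn_container_mib(executor_memory_mib)
```

For example, with 64 GiB (65536 MiB) available to YARN on a node and 8 GiB executors, dividing 65536 by 8192 suggests eight executors, but `executors_per_node(65536, 8192)` gives seven once the overhead is counted. The sketch ignores YARN rounding container sizes up to its allocation increment, which can reduce the count further.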

Questions?
• Sqlasylum@gmail.com
• www.sqlasylum.com
• Twitter: @sqlasylum


Integrating RDBMS with Big Data V3.0 now with SPARK!

1. The RDBMS and DW Blender. Pat Wright


6. Architecture v1 – 2008-2013

7. Architecture v2 – 2013-2013 Fail


Editor's Notes

Great explanation of MapReduce: http://ksat.me/map-reduce-a-really-simple-introduction-kloudo/
Dan Ariely quote taken from: http://medcitynews.com/2013/11/big-data-like-teen-sex-memorable-quotes-digital-health-innovation-summit/#ixzz35bDHaMv7
Various data coming in. Normalized DB with lots of tables/procedures. Read/write all in one place. Steps to make it faster: lots of indexes to support the reporting platform.
Cons to replication
• Overhead with replication
• Maintenance aspect with monthly releases
• Changes to production systems that would be needed (tables without PK)
• Dependency on SQL Server and licensing costs
Cons to cubes
• Cost
• Scalability
• Time to process/delay/hardware costs
