Big Data 2.0 
Milwaukee Big Data Users Group 
12.1.2014
DBMS Technology Overview 
Goal 
• Provide a technology recommendation for 
serving reporting needs for the next 3 – 5 
years 
• Explore different technologies for their 
suitability for a strategic reporting data 
platform
DBMS Technology Overview 
Vendor agnostic approach 
• Vendor agnostic DBMS technologies evaluated 
– Categories 
• RDBMS 
– Row based vs column based 
• In-memory database 
– Row based vs column based 
• NoSQL 
– Document based (Disk) 
– Key value based (IMDG) 
– Graph based (IMDG) 
– Criteria 
• Overall design 
• Pros/cons
DBMS Technology Overview 
Representative vendor evaluation criteria 
• …followed by quick evaluation of two vendors 
representing each technology 
– Thought leadership 
– Market share / # of production customers 
– Capacity / scalability 
– Functionality 
– Expertise availability 
– Resilience 
– Cost (license, infrastructure & expertise) 
– Interface compatibility (drop-in-ability)
DBMS Technology Overview 
Open Discussion on drop-in-ability 
• Re-tooling interfaces is expensive 
– Focus is on query/reporting tools (in my evaluation) 
– List of possible solutions drastically reduced by this 
criterion 
– SQL compatibility (very important syntactic sugar) 
– ACID compliance (dual use technology for OLTP needs) 
• A cost-effective, performant, resilient solution 
that requires interface re-tooling is DOA for my 
client’s environment
Striking phrases 
• Disk is the new tape, memory is the new disk 
• IMDGs (in-memory data grids) are increasingly being referred to as 
Big Data 2.0
RDBMS 
row based 
• OLAP needs typically serviced by partitioning (row & 
column) 
• 30 years old (proven technology) 
• IMDB implementations typically have same pros/cons, 
although cost and performance characteristics are different 
Row-based layout (each block holds one complete row):
  Block     Time        Location   Product   Vendor
  Block 1   2/23 0900   IL023      Gown112   ML
  Block 2   2/23 0423   OH12       Mask221   123
  Block 3   2/24 1543   CN881      Swab993   456
• Pros 
– Great OLTP performance 
– Efficient at whole-row operations 
• Cons 
– Inefficient at data set operations 
– Scalability is typically not linear
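
The row-based trade-off can be sketched in a few lines of Python. This is only an illustration, assuming each block is a tuple holding one whole row of the sample data above; the names `rows`, `fetch_row`, and `distinct_vendors` are hypothetical and do not come from any particular DBMS.

```python
# Row-based layout: each block stores one complete row.
# Column order: time, location, product, vendor (sample data from the slide).
rows = [
    ("2/23 0900", "IL023", "Gown112", "ML"),
    ("2/23 0423", "OH12", "Mask221", "123"),
    ("2/24 1543", "CN881", "Swab993", "456"),
]

VENDOR = 3  # position of the vendor column inside each row


def fetch_row(block_index):
    """Whole-row operation: a single block read returns every column."""
    return rows[block_index]


def distinct_vendors():
    """Data set operation: every block must be read to inspect one column."""
    blocks_read = 0
    vendors = set()
    for row in rows:
        blocks_read += 1
        vendors.add(row[VENDOR])
    return vendors, blocks_read
```

Fetching one row touches one block, while a question about a single column touches all of them, which is the "inefficient at data set operations" point in miniature.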
RDBMS 
column based 
• Optimized for OLAP needs: designed to answer questions about data 
characteristics 
• Great performance on aggregate functions (avg, count, sum, min, max) 
• IMDB implementations typically have same pros/cons, although cost and 
performance characteristics are different 
Column-based layout (each block holds one complete column):
  Block 0   Time       2/23 0900   2/23 0423   2/24 1543
  Block 1   Location   IL023       OH12        CN881
  Block 2   Product    Gown112     Mask221     Swab993
  Block 3   Vendor     ML          123         456
• Pros 
– Aggregate functions are very fast as entire column can be fetched quickly 
– Efficient at data set operations 
– Easily compressed, especially for data that is sparsely populated 
• Cons 
– Inefficient at retrieving many columns of a single row 
– Row functions are slower
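
A comparable sketch for the column-based layout, again purely illustrative. It assumes each block is a Python list holding one column of the sample data. Run-length encoding is used here as one simple example of column compression; the slides do not name a specific scheme, so treat that choice, and all function names, as assumptions.

```python
# Column-based layout: each block stores all values of one column.
columns = {
    "time": ["2/23 0900", "2/23 0423", "2/24 1543"],
    "location": ["IL023", "OH12", "CN881"],
    "product": ["Gown112", "Mask221", "Swab993"],
    "vendor": ["ML", "123", "456"],
}


def count_values(column_name):
    """Aggregate: only the one block holding the column is read."""
    return len(columns[column_name])


def fetch_row(row_index):
    """Row retrieval: one read per column block, so every block is touched."""
    return tuple(block[row_index] for block in columns.values())


def run_length_encode(values):
    """Collapse runs of repeated values into [value, run_length] pairs."""
    encoded = []
    for value in values:
        if encoded and encoded[-1][0] == value:
            encoded[-1][1] += 1
        else:
            encoded.append([value, 1])
    return encoded
```

A sparsely populated column such as `[None, None, None, "ML", None, None]` collapses to three pairs under `run_length_encode`, which hints at why columns holding mostly empty or repeated values compress well.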
NoSQL 
document/XML based 
• Focus is typically on sharding strategy as opposed to up-front data modeling 
(models typically evolve greatly during construction) 
• Similar to key-value stores, where values are stored in a standardized structure 
(although document stores keep metadata as well) 
• An example of data in a document database: 
– {"officeName": "3Pillar Noida", 
"address": {"Street": "B-25", "City": "Noida", "State": "UP", "Pincode": "201301"} 
} 
– {"officeName": "3Pillar Timisoara", 
"address": {"Boulevard": "Coriolan Brediceanu No. 10", "Block": "B, 1st Floor", "City": "Timisoara", "Pincode": 
"300011"} 
} 
– {"officeName": "3Pillar Cluj", 
"address": {"Latitude": "40.748328", "Longitude": "-73.985560"} 
} 
• Pros 
– Not limited to querying by keys (can query inside documents using JSON/XML query 
mechanisms) 
– Maps well to semi-structured or variable structured data 
• Cons 
– Sharding strategy can be challenging 
– Doesn’t support relations (no RI), as opposed to key value or graph stores
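
To make "not limited to querying by keys" concrete, here is a minimal Python sketch that searches inside documents of varying structure. It models documents as plain dicts and is not the query API of any real document database; the `address` key and the `find_by_path` helper are assumptions introduced for illustration.

```python
# Documents with differing structure, modelled as plain dicts.
# The "address" key is an assumption so that the nested object has a name.
offices = [
    {"officeName": "3Pillar Noida",
     "address": {"Street": "B-25", "City": "Noida", "State": "UP",
                 "Pincode": "201301"}},
    {"officeName": "3Pillar Timisoara",
     "address": {"Boulevard": "Coriolan Brediceanu No. 10",
                 "City": "Timisoara", "Pincode": "300011"}},
    {"officeName": "3Pillar Cluj",
     "address": {"Latitude": "40.748328", "Longitude": "-73.985560"}},
]

_MISSING = object()  # marks a field that a document does not have


def find_by_path(documents, path, expected):
    """Return documents whose nested field at `path` equals `expected`.

    Documents lacking the field are skipped rather than raising an error,
    which is what makes variable structure workable.
    """
    matches = []
    for doc in documents:
        node = doc
        for key in path:
            if isinstance(node, dict) and key in node:
                node = node[key]
            else:
                node = _MISSING
                break
        if node is not _MISSING and node == expected:
            matches.append(doc)
    return matches
```

For example, `find_by_path(offices, ["address", "City"], "Noida")` returns only the Noida document, even though the Cluj document has no `City` field at all.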
NoSQL 
IMDG (common to key-value / graph) 
• IMDGs are referred to as “Big Data 2.0” 
• Host data in memory and distribute across cluster of commodity 
servers 
• Employ an object-oriented data model that provides read/write 
times << 1 ms 
• As data is stored in virtual memory pool, parallel data computations 
are easily performed 
• As in document databases, focus is on sharding strategy as opposed 
to up-front physical data modeling 
• Majority of implementations utilize JVMs (although a handful of 
.NET implementations are out there) 
• GC (garbage collection), specifically the unpredictability of GC pauses, is a major concern 
– Vendors utilize off-heap storage to alleviate this by moving LRU (least recently used) data to 
off-heap JVMs, relying on high-speed messaging for data transport
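
The sharding and parallel-computation points can be sketched as follows. This is a toy model, not how any specific IMDG product works: threads stand in for cluster nodes, a CRC32 hash stands in for the product's partitioning scheme, and `NODE_COUNT`, `node_for`, `build_grid`, and `parallel_sum` are hypothetical names.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

NODE_COUNT = 3  # hypothetical cluster size


def node_for(key):
    """Pick the owning node with a stable hash of the key."""
    return zlib.crc32(key.encode("utf-8")) % NODE_COUNT


def build_grid(entries):
    """Distribute key/value pairs across per-node in-memory maps."""
    grid = [dict() for _ in range(NODE_COUNT)]
    for key, value in entries.items():
        grid[node_for(key)][key] = value
    return grid


def parallel_sum(grid):
    """Each node sums its own partition; partial results are then combined."""
    with ThreadPoolExecutor(max_workers=NODE_COUNT) as pool:
        partials = pool.map(lambda partition: sum(partition.values()), grid)
        return sum(partials)
```

Because every key maps deterministically to one node, a lookup goes straight to the owning partition, and an aggregate can run on all partitions at once before the partial results are merged.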
NoSQL 
IMDG (key-value) 
• Typically stored as a set of distributable maps 
• Pros 
– Data distribution is designed from the ground up 
– Keys and values are Java (or .NET) objects 
– No bias between OLTP and OLAP 
• Cons 
– Alternate lookup mechanisms require a map with an 
alternate key (although main data payload can be 
shared as values are objects that support multiple 
pointers) 
– Expertise is typically harder to find (on characteristics 
of memory structure behavior at larger sizes)
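
The alternate-key point can be illustrated with two ordinary Python dicts, under the assumption that both maps live in the same process. The `Product` class, its fields, and the product names are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Product:
    sku: str
    name: str
    vendor: str


# Primary map, keyed by SKU.
by_sku = {
    "Gown112": Product("Gown112", "Gown", "ML"),
    "Mask221": Product("Mask221", "Mask", "123"),
}

# Alternate lookup: a second map keyed by vendor that points at the SAME
# objects, so the payload is shared rather than copied.
by_vendor = {product.vendor: product for product in by_sku.values()}
```

An update made through one map (for instance `by_sku["Mask221"].name = "Surgical mask"`) is visible through the other, because both hold references to a single object. The cost is that the second map must be kept in step whenever entries are added or removed.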
NoSQL 
IMDG (graph) 
• Allow a set of nodes (object instances) with dynamic 
properties (cols/attributes) to be arbitrarily linked to 
other nodes through edges (associations) 
• Each node only knows its adjacent nodes 
• As the number of nodes increases, cost of a local hop 
remains constant 
• Whereas an RDBMS is optimized for aggregation, a 
graph database is optimized for connections 
• Fastest growth area in NoSQL in the last year – 250%
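
A minimal sketch of the node-and-edge model described above, in plain Python rather than any graph database's API. The `Node` class, the edge labels, and the `region` property are illustrative assumptions.

```python
class Node:
    """A graph node with dynamic properties that knows only its neighbours."""

    def __init__(self, **properties):
        self.properties = dict(properties)
        self.edges = []  # list of (label, neighbour) pairs

    def link(self, label, other):
        """Add a directed, labelled edge from this node to `other`."""
        self.edges.append((label, other))

    def neighbours(self, label):
        """One local hop: cost depends on this node's edges, not graph size."""
        return [node for edge_label, node in self.edges if edge_label == label]


supplier = Node(name="ML", kind="vendor")
gown = Node(name="Gown112", kind="product")
# Nodes need not share a schema: this one carries an extra property.
site = Node(name="IL023", kind="location", region="Midwest")

supplier.link("SUPPLIES", gown)
gown.link("STOCKED_AT", site)
```

Following `supplier.neighbours("SUPPLIES")` inspects only the supplier's own edge list, so adding a million unrelated nodes would not change the cost of that hop.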
NoSQL 
IMDG (graph cont’d) 
• 60% of Facebook graph is hosted on one instance 
of Neo4J 
• Pros 
– Powerful general purpose (reusable) data model 
– Connected data locally indexed 
– Easy to query 
– Optimized for recursive structures (think BoM) 
– Great at use cases with complex relationships (supply 
chain management) 
• Cons 
– Sharding strategy is difficult 
– Requires re-wiring your brain (think object model 
instead of data model)
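
The bill-of-materials (BoM) case mentioned above is a natural fit for recursive traversal. The sketch below uses a plain dict as the graph; the part names and quantities are invented for illustration, and `explode` is a hypothetical helper, not a feature of any product.

```python
# Bill of materials as a graph: part -> list of (component, quantity per parent).
# Part names and quantities are made up for illustration.
bom = {
    "kit": [("gown", 2), ("mask_pack", 1)],
    "mask_pack": [("mask", 10), ("box", 1)],
    "gown": [],
    "mask": [],
    "box": [],
}


def explode(part, quantity=1, totals=None):
    """Recursively total the leaf components needed for `quantity` of `part`."""
    if totals is None:
        totals = {}
    components = bom[part]
    if not components:
        # Leaf part: accumulate the required quantity.
        totals[part] = totals.get(part, 0) + quantity
        return totals
    for component, per_parent in components:
        explode(component, quantity * per_parent, totals)
    return totals
```

Each recursive call follows edges from one part to its direct components, the same hop-by-hop pattern a graph store is built around. The relational equivalent typically needs a recursive query or repeated self-joins.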
Particular vendor evaluations 
<<Vendor evaluation.xls>>
Recap 
• In addition to normal criteria (scalability, 
functionality, cost, etc.), drop-in-ability should 
be considered as well 
• Niche technologies are becoming available for more 
mainstream use cases, due to falling hardware 
prices
Questions/Comments 
?
Thank you 
… for your time 
Michael Vogt 
Director, Data Management 
NVISIA 
mvogt@nvisia.com 
(o) 414.347.1303 or 312.985.8100 
(c) 312.772.4762

