Big Data Strategy
for the Relational World
Embracing Disruption, Avoiding Regression
Andrew J. Brust
Founder & CEO, Blue Badge Insights
Big Data correspondent, ZDNet
Big Data Analyst, GigaOM Research
Bio
• CEO and Founder, Blue Badge Insights
• Big Data blogger for ZDNet
• Microsoft Regional Director, MVP
• Co-chair, Visual Studio Live! and 18 years as a speaker
• Founder, Microsoft BI User Group of NYC
– http://www.msbinyc.com
• Co-moderator, NYC .NET Developers Group
– http://www.nycdotnetdev.com
• “Redmond Review” columnist for
Visual Studio Magazine and Redmond Developer News
• Twitter: @andrewbrust
Andrew on ZDNet (bit.ly/bigondata)
Read all about it!
Big Data: Why Should You Care?
• Because analytics (i.e. BI) has always been
important, but it was expensive and obscure
• Because the economics of processing and
storage make Big Data feasible
Big Data: Why Should You Be Cautious?
• Too many vendors; too much churn
• Designed for the lab, not for mainstream
business
• Immature technology and tooling
– Results in serious recruiting and dev costs
• So, you can’t ignore Big Data, but you can’t
just pursue with abandon, either.
– That’s hard!
Agenda
• Trends
• Technologies
– NoSQL
– Hadoop
– SQL Convergence
– NewSQL
– In-Memory
• Forecasts
• Risks
• Recommendations
Database Trends
• NoSQL
– Mongo and Cassandra, primarily
• Late-bound schema
– aka “unstructured data”
• File-based table handling
– Especially HDFS
• Columnar storage
– And Massively Parallel Processing
• Co-existence with RDBMS, OLAP databases
– Very few throwing them away
• Little change in tools/clients
– Still expect tables or cubes
NoSQL
Key-Value Store
• Couchbase
• Riak
• Redis
• Voldemort
• DynamoDB
• Azure tables
Document Store
• MongoDB
• CouchDB
• Cloudant
• Couchbase
Wide Column Store
• HBase
• Cassandra
Graph Database
• Neo4j
Consistency
• CAP Theorem
– A distributed database can guarantee at most two of the
following three attributes at once: consistency, availability
and partition tolerance
• Most NoSQL databases do not offer full “ACID” guarantees
– Atomicity, consistency, isolation and durability
• Instead, they offer “eventual consistency”
– Similar to DNS propagation
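To make "eventual consistency" concrete, here is a minimal sketch of two replicas that accept writes independently and agree only after they exchange state. The `Replica` class, its last-write-wins merge rule and the integer timestamps are assumptions chosen for illustration; they do not describe how any particular NoSQL product replicates data.

```python
class Replica:
    """Toy replica (hypothetical): keeps (timestamp, value) per key."""

    def __init__(self):
        self.data = {}

    def write(self, key, value, ts):
        # Last write wins: keep the entry with the newest timestamp.
        current = self.data.get(key)
        if current is None or ts > current[0]:
            self.data[key] = (ts, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None

    def sync_from(self, other):
        # Pull every entry from the other replica and merge it in.
        for key, (ts, value) in other.data.items():
            self.write(key, value, ts)


a, b = Replica(), Replica()
a.write("price", 10, ts=1)
b.sync_from(a)               # both replicas now agree on 10
b.write("price", 12, ts=2)   # the update lands on replica b only

stale = a.read("price")      # replica a still serves the old value
a.sync_from(b)               # replicas exchange state later
converged = a.read("price")  # replica a has caught up
```

Between the write on `b` and the later sync, a reader of `a` sees the old value. That window is the trade-off the slide compares to DNS propagation.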
CAP Theorem
[Diagram: Consistency, Availability and Partition Tolerance, with Relational and NoSQL databases placed among them]
NoSQL Upside
• Distributed by default
• Open source lets you peg costs to personnel,
more than to customers
• Developer enthusiasm
Hadoop
• Open source, petabyte-scale data analysis and
processing framework
• Runs on commodity hardware
• Lots of ecosystem
• Two main components:
– Hadoop Distributed File System (HDFS)
– MapReduce engine
Why MapReduce is Cool
• Extremely flexible – full power of a procedural
programming language
• Map step, essentially, allows ad hoc ETL
• With Reduce step, aggregation is a first-class
concept
• Growing ecosystem of tools that generate
MapReduce code
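The Map and Reduce steps above can be sketched in a few lines of plain Python. This is a single-process illustration of the programming model, not Hadoop's API: the function names, the comma-separated input format and the in-memory "shuffle" are all assumptions made for the example.

```python
from collections import defaultdict


def map_step(line):
    """Ad hoc ETL: parse one raw line, emit (key, value) pairs, skip bad input."""
    parts = line.split(",")
    if len(parts) != 3:
        return []
    region, _product, amount = parts
    try:
        return [(region.strip(), float(amount))]
    except ValueError:
        return []


def reduce_step(key, values):
    """Aggregation: collapse all values for one key into a single result."""
    return key, sum(values)


def run_job(lines):
    # Stand-in for the shuffle phase: group mapped values by key.
    groups = defaultdict(list)
    for line in lines:
        for key, value in map_step(line):
            groups[key].append(value)
    return dict(reduce_step(k, v) for k, v in groups.items())


# Hypothetical input: region, product, amount
raw = ["east,widget,10.0", "west,gadget,5.5", "garbage line", "east,gadget,2.5"]
totals = run_job(raw)
```

The map function cleans and reshapes raw input on the fly, and the reduce function is where aggregation happens. On a real cluster the framework, not a local dictionary, handles the grouping across machines.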
Why MapReduce Sucks
• It’s a batch mode technology
• It’s not declarative
• Most BI products don’t work with MR natively
– They connect via Hive instead (by and large)
• It’s good for a group of use cases, but it’s not a
good general framework
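For contrast, here is what "declarative" looks like: the query states the result wanted and leaves the execution plan to the engine. The sketch uses Python's built-in `sqlite3` module purely as a stand-in for any SQL engine; the `sales` table and its rows are made up for illustration and have nothing to do with Hive or Hadoop.

```python
import sqlite3

# In-memory database with a hypothetical sales table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10.0), ("west", 5.5), ("east", 2.5)],
)

# One declarative statement: what to compute, not how to compute it.
totals_by_region = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
conn.close()
```

This is the interaction style most BI tools expect, which is why they tend to reach Hadoop through a SQL layer rather than through hand-written MapReduce jobs.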
The Google DNA
• Hadoop and HBase grew out of designs published by Google
– MapReduce, GFS
– BigTable
• Those designs were built for Google’s use cases, and
Google doesn’t use them as extensively now
• So why is the world going Hadoop-crazy?
Benefits of Schema-Free
• Variable schema is accommodated
– Great for product catalogs, content management
and the like
• Simple for archival storage
• For analysis:
– Avoids politics of achieving consensus on
structure
– Allows different schema for different applications
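A small sketch of "different schema for different applications": the documents are stored as-is, and each consumer applies its own view at read time. The product records, field names and the two view functions are hypothetical, and plain JSON strings stand in for a document store.

```python
import json

# Hypothetical product catalog: each document carries whatever fields it needs.
raw_docs = [
    '{"sku": "A1", "name": "Widget", "price": 9.99}',
    '{"sku": "B2", "name": "E-book", "price": 4.5, "format": "epub"}',
    '{"sku": "C3", "name": "Shirt", "sizes": ["S", "M"]}',
]


def pricing_view(doc):
    """Schema applied by a pricing application: only sku and price matter."""
    return {"sku": doc["sku"], "price": doc.get("price")}


def catalog_view(doc):
    """Schema applied by a catalog application: core fields plus any extras."""
    known = {"sku", "name", "price"}
    return {
        "sku": doc["sku"],
        "name": doc["name"],
        "extras": sorted(set(doc) - known),
    }


docs = [json.loads(d) for d in raw_docs]
prices = [pricing_view(d) for d in docs]
catalog = [catalog_view(d) for d in docs]
```

Neither application had to agree on a shared table definition up front. The cost is that each reader must cope with missing fields, as `pricing_view` does for the document with no price.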
Cloud Effect
• Database as a service and SaaS BI/Analytics gets
companies excited
– Cloudant
– Amazon: DynamoDB, RDS, Redshift, Jaspersoft
• Elastic capabilities of cloud provide small customers
with access to huge clusters
– Amazon EMR, Microsoft Windows Azure HDInsight now
– Google Compute Engine, Rackspace/Hortonworks to come
• Cloud-borne reference data adds value
• But casualties emerging: e.g. Xeround
SQL Skillset and Ecosystem
• DBAs, most devs know it
– Making recruiting faster and cheaper
• ORMs expect it
• Reporting/analysis tools are premised on it
– Even if they also talk to MDX and NoSQL sources
• Companies are invested in it
• Abandoning it is naive
MPP is Big Data
(via acquisition)
• Teradata
– Acquired Aster Data
• Netezza
– IBM
• Vertica
– HP
• Pivotal/Greenplum
– EMC
• ParAccel
– Actian
• SQL Server Parallel Data Warehouse
– Microsoft-DATAllegro acquisition
SQL – BD Convergence
• Brings the SQL language and data warehouse
products, on one side, together with Hadoop, on
the other
• Goal is to make Hadoop interactive, non-batch
• May involve Hive and its APIs
• May involve direct access to HDFS
– Bypassing MapReduce
• Think of the “database” as HDFS, and MapReduce
as merely an access method.
One Repository, Multiple Access Methods
SQL – BD Convergence
• HCatalog
• Cloudera Impala (v1.0 shipped April 30)
• Hortonworks “Stinger” initiative
– Make Hive 100x faster
• EMC Pivotal
• Microsoft PolyBase, Data Explorer
• Teradata Aster SQL-H
• ParAccel (Actian) ODI
NewSQL Entrants
• NuoDB
• VoltDB
• Clustrix
• TransLattice
Dremel and Drill
• Dremel is Google’s column store analytical database
– Proprietary; available publicly as BigQuery
• Hierarchical/nested too
– Allows schema variance without anarchy
• “…scales to thousands of CPUs and petabytes of data,
and has thousands of users at Google.”
• Uses SQL, has growing BI tool support
• Petabyte scale
• Drill is to Dremel as Hadoop is to MapReduce + GFS:
an open source take on Google’s design
• And then there’s Spanner
In-Memory
• SAP HANA
– And Sybase IQ
• Data Warehouse Appliances
• VoltDB
• Oracle TimesTen
• IBM solidDB
– Also TM1 (in-memory OLAP)
• Coming: SQL Server’s “Hekaton” engine
The Truth About In-Memory
• Judicious use of in-memory database technology can
speed analytical queries
– Combine with columnar technology, rinse, repeat
• Can also eliminate need for deferred writes
• A RAM-only strategy like HANA’s seems impractical
• Keep in mind:
– SSD is memory too. It’s slower, but it’s memory.
– Conversely, L1, L2 and L3 cache is faster than RAM. Single
Instruction, Multiple Data (SIMD) makes things faster still.
• Hybrid approaches are most sensible
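The pairing of in-memory and columnar technology comes down to data layout. The sketch below shows the same hypothetical order data held row-wise and column-wise. Python lists do not deliver the cache or SIMD benefits mentioned above, so treat this only as an illustration of the layout, not as a performance demonstration.

```python
# Row-oriented layout: each record keeps all of its fields together.
orders = [
    {"order_id": 1, "region": "east", "amount": 10.0},
    {"order_id": 2, "region": "west", "amount": 5.5},
    {"order_id": 3, "region": "east", "amount": 2.5},
]

# Column-oriented layout: one contiguous sequence per field.
columns = {name: [row[name] for row in orders] for name in orders[0]}

# A row store walks every record to aggregate one field...
row_total = sum(row["amount"] for row in orders)

# ...while a column store reads only the column the query needs.
col_total = sum(columns["amount"])
```

An analytical query that aggregates one column touches far less data in the columnar layout. That is why it combines well with keeping the hot columns in RAM while colder data stays on SSD or disk.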
What’s Ahead?
• Consolidation! We can’t have this many vendors:
– Some will go out of business
– Some will get acquired
– A few will stay independent (but may merge with each
other)
• Hadoop recedes into the service layer
• NoSQL shakes out, matures, coexists
• NewSQL gets adopted or acquired
• In-memory becomes a standard option
Risks and Considerations
• Pick an esoteric database now and you may be
forced to migrate later
• SQL Server and Oracle could add features that
make the specialty products superfluous
– Or new products
• Conversely, NoSQL products may acquire
ACID-like features themselves
• More convergence
Recommendations
• NoSQL has its use cases. But it also has its
abuses.
• Look carefully at the number of customers
• Look also at how widely deployed the product
is within those customer companies
Recommendations
• If you haven’t looked seriously at Hadoop, do so.
But remember, it’s infrastructure.
• You can reach out to Big Data now, or you can
wait for it to reach out to you
– Cost/benefit of earlier adoption vs. late following
• For repeatable big problems, MapReduce works
well; for iterative query, “SQL” technologies are
much better
– akin to standard reports versus ad hoc queries
Parting Thoughts
• NoSQL and Big Data are disruptive
• You ignore them at your peril
• But if they can’t, ultimately, blend into current
technology environments then they’re
destined to fail
• You can embrace the change without being
sacrificed. Just watch your back.
Thank You!
• Email
• andrew.brust@bluebadgeinsights.com
• Blog:
• http://www.zdnet.com/blog/big-data
• Twitter
• @andrewbrust on twitter

"Choosing proper type of scaling", Olena Syrota"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
 
A Deep Dive into ScyllaDB's Architecture
A Deep Dive into ScyllaDB's ArchitectureA Deep Dive into ScyllaDB's Architecture
A Deep Dive into ScyllaDB's Architecture
 

Big Data Strategy for the Relational World

  • 1. Big Data Strategy for the Relational World Embracing Disruption, Avoiding Regression Andrew J. Brust Founder & CEO, Blue Badge Insights Big Data correspondent, ZDNet Big Data Analyst, GigaOM Research
  • 2. Bio • CEO and Founder, Blue Badge Insights • Big Data blogger for ZDNet • Microsoft Regional Director, MVP • Co-chair, Visual Studio Live! and 18 years as a speaker • Founder, Microsoft BI User Group of NYC – http://www.msbinyc.com • Co-moderator, NYC .NET Developers Group – http://www.nycdotnetdev.com • “Redmond Review” columnist for Visual Studio Magazine and Redmond Developer News • Twitter: @andrewbrust
  • 3. Andrew on ZDNet (bit.ly/bigondata)
  • 5. Big Data: Why Should You Care? • Because analytics (i.e. BI) has always been important, but it was expensive and obscure • Because the economics of processing and storage make Big Data feasible
  • 6. Big Data: Why Should You be Cautious? • Too many vendors; too much churn • Designed for the lab, not for mainstream business • Immature technology and tooling – Results in serious recruiting and dev costs • So, you can’t ignore Big Data, but you can’t just pursue with abandon, either. – That’s hard!
  • 7. Agenda • Trends • Technologies – NoSQL – Hadoop – SQL Convergence – NewSQL – In-Memory • Forecasts • Risks • Recommendations
  • 8. Database Trends • NoSQL – Mongo and Cassandra, primarily • Late-bound schema – aka “unstructured data” • File-based table handling – Especially HDFS • Columnar storage – And Massively Parallel Processing • Co-existence with RDBMS, OLAP databases – Very few throwing them away • Little change in tools/clients – Still expect tables or cubes
  • 9. NoSQL • Key-Value Store – Couchbase – Riak – Redis – Voldemort – DynamoDB – Azure tables • Document Store – MongoDB – CouchDB – Cloudant – Couchbase • Wide Column Store – HBase – Cassandra • Graph Database – Neo4J
  • 10. Consistency • CAP Theorem –Databases may only excel at two of the following three attributes: consistency, availability and partition tolerance • NoSQL does not offer “ACID” guarantees –Atomicity, consistency, isolation and durability • Instead offers “eventual consistency” –Similar to DNS propagation
  • 12. NoSQL Upside • Distributed by default • Open source lets you peg costs to personnel, more than to customers • Developer enthusiasm
  • 13. Hadoop • Open source, petabyte-scale data analysis and processing framework • Runs on commodity hardware • Lots of ecosystem • Two main components: – Hadoop Distributed File System (HDFS) – MapReduce engine
  • 15. Why MapReduce is Cool • Extremely flexible – full power of a procedural programming language • Map step, essentially, allows ad hoc ETL • With Reduce step, aggregation is a first-class concept • Growing ecosystem of tools that generate MapReduce code
  • 16. Why MapReduce Sucks • It’s a batch mode technology • It’s not declarative • Most BI products don’t work with MR natively – They connect via Hive instead (by and large) • It’s good for a group of use cases, but it’s not a good general framework
  • 17. The Google DNA • Hadoop and HBase grew out of designs published by Google – MapReduce, GFS – BigTable • Hadoop was built for their use cases, and they don’t use it as extensively now • So why is the world going Hadoop-crazy?
  • 18. Benefits of Schema-Free • Variable schema is accommodated – Great for product catalogs, content management and the like • Simple for archival storage • For analysis: – Avoids politics of achieving consensus on structure – Allows different schema for different applications
  • 19. Cloud Effect • Database as a service and SaaS BI/Analytics gets companies excited – Cloudant – Amazon: DynamoDB, RDS, RedShift, Jaspersoft • Elastic capabilities of cloud provide small customers with access to huge clusters – Amazon EMR, Microsoft Windows Azure HDInsight now – Google Compute Engine, Rackspace/Hortonworks to come • Cloud-borne reference data adds value • But casualties emerging: e.g. Xeround
  • 20. SQL Skillset and Ecosystem • DBAs, most devs know it – Making recruiting faster and cheaper • ORMs expect it • Reporting/analysis tools are premised on it – Even if they also talk to MDX and NoSQL sources • Companies are invested in it • Abandoning it is naive
  • 21. MPP is Big Data (via acquisition) • Teradata – Acquired Aster Data • Netezza – IBM • Vertica – HP • Pivotal/Greenplum – EMC • ParAccel – Actian • SQL Server Parallel Data Warehouse – Microsoft-DATAllegro acquisition
  • 22. SQL – BD Convergence • Brings the SQL language and data warehouse products, on one side, together with Hadoop, on the other • Goal is to make Hadoop interactive, non-batch • May involve Hive and its APIs • May involve direct access to HDFS – Bypassing MapReduce • Think of the “database” as HDFS, and MapReduce as merely an access method.
  • 23. One Repository, Multiple Access Methods HCatalog
  • 24. SQL – BD Convergence • Cloudera Impala (v1.0 shipped April 30) • Hortonworks “Stinger” initiative – Make Hive 100x faster • EMC Pivotal • Microsoft PolyBase, Data Explorer • Teradata Aster SQL-H • ParAccel (Actian) ODI
  • 27. Dremel and Drill • Dremel is Google’s column store analytical database – Proprietary; available publicly as BigQuery • Hierarchical/nested too – Allows schema variance without anarchy • “…scales to thousands of CPUs and petabytes of data, and has thousands of users at Google.” • Uses SQL, has growing BI tool support • Petabyte scale • Drill:Dremel as Hadoop:MapReduce+GFS • And then there’s Spanner
  • 28. In-Memory • SAP HANA – And Sybase IQ • Data Warehouse Appliances • VoltDB • Oracle TimesTen • IBM solidDB – Also TM1 (in-memory OLAP) • Coming: SQL Server’s “Hekaton” engine
  • 29. The Truth About In-Memory • Judicious use of in-memory database technology can speed analytical queries – Combine with columnar technology, rinse, repeat • Can also eliminate need for deferred writes • A RAM-only strategy like HANA’s seems impractical • Keep in mind: – SSD is memory too. It’s slower, but it’s memory. – Conversely, L1, L2 and L3 caches are faster than RAM. Single Instruction, Multiple Data (SIMD) makes things faster still. • Hybrid approaches are most sensible
  • 30. What’s Ahead? • Consolidation! We can’t have this many vendors: – Some will go out of business – Some will get acquired – A few will stay independent (but may merge with each other) • Hadoop recedes into the service layer • NoSQL shakes out, matures, coexists • NewSQL gets adopted or acquired • In-memory becomes a standard option
  • 31. Risks and Considerations • Pick an esoteric database now and you may be forced to migrate later • SQL Server and Oracle could add features that make the specialty products superfluous – Or new products • Conversely, NoSQL products may acquire ACID-like features themselves • More convergence
  • 32. Recommendations • NoSQL has its use cases. But it also has its abuses. • Look carefully at the number of customers • Look also at how widely deployed the product is within those customer companies
  • 33. Recommendations • If you haven’t looked seriously at Hadoop, do so. But remember, it’s infrastructure. • You can reach out to Big Data now, or you can wait for it to reach out to you – Cost/benefit of earlier adoption vs. late following • For repeatable big problems, MapReduce works well; for iterative query, “SQL” technologies are much better – akin to standard reports versus ad hoc queries
  • 34. Parting Thoughts • NoSQL and Big Data are disruptive • You ignore them at your peril • But if they can’t, ultimately, blend into current technology environments then they’re destined to fail • You can embrace the change without being sacrificed. Just watch your back.
  • 35. Thank You! • Email • andrew.brust@bluebadgeinsights.com • Blog: • http://www.zdnet.com/blog/big-data • Twitter • @andrewbrust on twitter
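
Slide 15 describes the Map step as ad hoc ETL and the Reduce step as first-class aggregation. A minimal single-process sketch of that idea follows. It is not Hadoop code: the function names (map_step, shuffle, reduce_step, run_job) and the "product,amount" input format are assumptions made up for illustration, and a real cluster would distribute each phase across many nodes.

```python
from collections import defaultdict


def map_step(line):
    """Ad hoc ETL: parse one raw line into (key, value) pairs.

    Assumes a hypothetical "product,amount" format; malformed lines are skipped.
    """
    parts = line.strip().split(",")
    if len(parts) != 2:
        return []
    product, amount = parts
    try:
        return [(product.strip().lower(), float(amount))]
    except ValueError:
        return []


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle/sort phase."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_step(key, values):
    """Aggregation: collapse all values for one key into a single total."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle and reduce in sequence over an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_step(line)]
    return dict(reduce_step(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    sample = ["Widget, 2.5", "widget,1.5", "bad line", "Gadget,4"]
    print(run_job(sample))
```

The cleanup logic lives entirely in map_step and the aggregation entirely in reduce_step, which is why slide 16's complaint also holds: the job is procedural and batch-oriented rather than declarative, so the equivalent SQL GROUP BY has to be written out by hand.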