Grab some coffee and enjoy the pre-show banter before the top of the hour!
The Briefing Room
Framing the Argument: How to Scale Faster with NoSQL
Twitter Tag: #briefr The Briefing Room
Welcome
Host:
Eric Kavanagh
eric.kavanagh@bloorgroup.com
@eric_kavanagh
Mission
•  Reveal the essential characteristics of enterprise software, good and bad
•  Provide a forum for detailed analysis of today's innovative technologies
•  Give vendors a chance to explain their product to savvy analysts
•  Allow audience members to pose serious questions... and get answers!
Topics
March: BI/ANALYTICS
April: BIG DATA
May: CLOUD
More Than One Way to Skin a Cat
NoSQL engines provide escape hatches
•  Force-fitting all data into relational will fail, because:
Performance is ALWAYS important, now more than ever
Analyst: Robin Bloor
Robin Bloor is Chief Analyst at The Bloor Group
robin.bloor@bloorgroup.com
@robinbloor
IBM Cloudant
•  IBM Cloudant offers a non-relational, cloud-based distributed database
•  The product is based on Apache CouchDB and provides data management, search, hosting, admin tools and analytics
•  Cloudant’s database-as-a-service is often used for web or mobile application development
Guest: Ryan Millay
Ryan Millay started with IBM® Cloudant® in May 2014 after three years as a software engineer. Now he is part of the Field Engineering team working on both pre- and post-sales opportunities with a variety of different accounts. He is also a member of the Cloudant Local Services team to help customers scope and install Cloudant’s on-premises software. When not at Cloudant, Ryan enjoys travelling, playing a round of golf, or binging on the latest show on Netflix.
SQL to NoSQL: Top 5 Questions
Mike Broberg
Marketing Communications, Cloudant, IBM Cloud Data Services
Ryan Millay
Field Engineer, Cloudant, IBM Cloud Data Services
Agenda
•  About Cloudant
•  Top 5 Questions When Moving to NoSQL
•  Live Q&A
Housekeeping Notes
•  Today’s webcast is being recorded. We
will send you a link to the recording, a
link to the library and its code examples,
and a copy of the slide deck after the
presentation.
•  The webcast recording will be available
on our website: https://cloudant.com
•  If you would like to ask a question during
today’s presentation, please type in your
question using the GoToWebinar tool bar.
1. Why NoSQL?
But, What Is NoSQL, Really?
•  Umbrella term for databases using non-SQL query languages
•  Key-Value stores
•  Wide column stores
•  Document stores
•  Graph stores
•  Some also say "non-relational," because data is not
decomposed into separate tables, rows, and columns
•  As we’ll see, it’s still possible to represent relationships in NoSQL
•  The question is, are these relationships always necessary?
Schema Flexibility
•  Cloudant uses JavaScript Object Notation (JSON) as its data format
•  Cloudant is based on Apache CouchDB. In both systems, a "database" is simply
a collection of JSON documents
{
  "docs": [
    {
      "_id": "df8cecd9809662d08eb853989a5ca2f2",
      "_rev": "1-8522c9a1d9570566d96b7f7171623270",
      "Movie_runtime": 162,
      "Movie_rating": "PG-13",
      "Person_name": "Zoe Saldana",
      "Actor_actor_id": "0757855",
      "Movie_genre": "AVYS",
      "Movie_name": "Avatar",
      "Actor_movie_id": "0499549",
      "Movie_earnings_rank": "1",
      "Person_pob": "New Jersey, USA",
      "Person_id": "0757855",
      "Movie_id": "0499549",
      "Movie_year": 2009,
      "Person_dob": "1978-06-19"
    }
  ]
}
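As a rough illustration of schema flexibility, the sketch below treats a "database" as a plain Python list of JSON-like documents. The second document, its field names, and the helper `fields_in_use` are hypothetical and invented for illustration. Nothing here calls Cloudant.

```python
import json

# A "database" modeled as a plain list of JSON documents (illustration only).
database = [
    {
        "_id": "df8cecd9809662d08eb853989a5ca2f2",
        "Movie_name": "Avatar",
        "Movie_year": 2009,
        "Person_name": "Zoe Saldana",
    },
    # Hypothetical document with a different shape: it sits next to the
    # first one without any schema migration.
    {
        "_id": "example-person-1",
        "Person_name": "Example Person",
        "Person_awards": ["Example Award"],
    },
]


def fields_in_use(docs):
    """Return the sorted set of field names that appear in any document."""
    return sorted({field for doc in docs for field in doc})


print(json.dumps(fields_in_use(database)))
```

The point of the sketch is that the set of fields is discovered from the data rather than declared up front.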
Horizontal Scaling
•  Many commodity servers vs. few expensive ones
•  Cost grows roughly linearly with performance, not exponentially
Master-Master Replication
•  Or "masterless replica architecture"
•  Minimize latency by putting data close to users
•  Replicate data widely to mitigate disasters
•  Cloudant excels at data movement
2. Rows and Tables Become ... What?
... This!
SQL Terms/Concepts        -->  Document Store Terms/Concepts
database                  -->  database
table                     -->  bunch of documents
row                       -->  document
column                    -->  field
materialized view         -->  index/database view/secondary index
primary key               -->  "_id":
table JOIN operations     -->  entity relations
Rows --> Documents
•  Use some field to group documents by schema
•  Example: "type":"user" or "type":"edge:follower"
Tables --> Databases
•  Put all tables in one database; use "type": to distinguish
•  Model entity relationships with secondary indexes
•  More on this later in the webinar
•  If you're curious, we're talking about concepts described in the
CouchDB documentation on entity relations
•  http://wiki.apache.org/couchdb/EntityRelationship
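To make the "type" convention concrete, here is a minimal sketch that keeps user documents and follower "edge" documents in one collection and separates them by their "type" field. The document contents and the `group_by_type` helper are assumptions made for illustration, not part of any Cloudant API.

```python
from collections import defaultdict

# Former "tables" live side by side in one database, told apart by "type".
docs = [
    {"_id": "user:alice", "type": "user", "name": "Alice"},
    {"_id": "user:bob", "type": "user", "name": "Bob"},
    {
        "_id": "follow:alice:bob",
        "type": "edge:follower",
        "user": "user:alice",
        "follows": "user:bob",
    },
]


def group_by_type(documents):
    """Group document ids by the value of their "type" field."""
    groups = defaultdict(list)
    for doc in documents:
        groups[doc.get("type", "unknown")].append(doc["_id"])
    return dict(groups)


print(group_by_type(docs))
```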
Indexes and Queries
•  An "index" in Cloudant is not strictly a performance optimization
•  Instead, more akin to "materialized view" in RDBMS terms
•  Index also called a "database view" in Cloudant
•  Index, then query.
•  You need one before you can do the other
•  Create index, then query by URL
•  Can create a secondary index on any field within a document
•  You get primary index (based on reserved "_id": field) by default
•  Indexes precomputed, updated in real time
•  Performant at big-honkin' scale
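The "index, then query" idea can be sketched in memory: precompute a sorted list of (key, "_id") pairs, then answer lookups with a binary search instead of scanning documents. This is only an analogy for a materialized secondary index. The sample documents and the function names `build_index` and `query` are hypothetical.

```python
import bisect

docs = [
    {"_id": "a1", "type": "user", "city": "Boston"},
    {"_id": "b2", "type": "user", "city": "Austin"},
    {"_id": "c3", "type": "user", "city": "Boston"},
]


def build_index(documents, field):
    """Precompute sorted (field value, _id) pairs, like a materialized view."""
    return sorted((doc[field], doc["_id"]) for doc in documents if field in doc)


def query(index, key):
    """Return the _ids whose indexed value equals key, using binary search."""
    keys = [indexed_key for indexed_key, _ in index]
    lo = bisect.bisect_left(keys, key)
    hi = bisect.bisect_right(keys, key)
    return [doc_id for _, doc_id in index[lo:hi]]


city_index = build_index(docs, "city")  # step 1: index
print(query(city_index, "Boston"))      # step 2: query
```

A real view engine updates the index incrementally as documents change. This sketch rebuilds it from scratch for brevity.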
3. Will I Have to Rebuild My App?
Yes
By ripping out the bad parts:
•  Extract, Transform, Load
•  Schema migrations
•  JOINs that don't scale
A little more work up-front, but your application will adapt to scale
much better
4. So Each of My Tables Becomes a Different Type of JSON Document?
No
•  Fancy explanation:
•  Best practice is to denormalize data out of 3rd normal form
•  Or, less fancy:
•  Smoosh relationships for each entry all together into one JSON doc
•  Denormalization
•  Approach to data modeling that shards well and scales well
•  Works well with data that is somewhat static, or infrequently updated
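A small sketch of what "smooshing" looks like in practice: two normalized row sets (a show table and a cast table) are folded into one self-contained JSON document per show. The table layout, sample values, and the `denormalize` helper are assumptions for illustration only.

```python
# Normalized input, as it might come out of two relational tables.
shows = [{"show_id": 1, "title": "Example Show"}]
cast = [
    {"show_id": 1, "actor": "Actor One", "role": "Lead"},
    {"show_id": 1, "actor": "Actor Two", "role": "Support"},
]


def denormalize(show_rows, cast_rows):
    """Fold each show and its cast rows into a single document."""
    documents = []
    for show in show_rows:
        documents.append(
            {
                "_id": f"show:{show['show_id']}",
                "type": "show",
                "title": show["title"],
                "cast": [
                    {"actor": row["actor"], "role": row["role"]}
                    for row in cast_rows
                    if row["show_id"] == show["show_id"]
                ],
            }
        )
    return documents


print(denormalize(shows, cast))
```

Reading a show and its cast then takes one document fetch instead of a JOIN, which is why this shape suits data that rarely changes.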
Static Data Example: TV Cast Members
http://www.sarahmei.com/blog/2013/11/11/why-you-should-never-use-mongodb/
What Doesn't Scale
•  RDBMS JOINs across shards
•  Presumably across different machines
•  Common pain point when scaling RDBMS
What Does Scale
•  Denormalized data models + modern
distributed systems
•  More efficient to distribute data if it's already
in one compact unit
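To see why a compact document distributes well, consider a sketch in which each document is routed to a shard by hashing its "_id". The hashing scheme and the `shard_for` helper are assumptions for illustration and are not a description of how Cloudant actually places data.

```python
import hashlib


def shard_for(doc_id, shard_count):
    """Pick a shard deterministically from a hash of the document id."""
    digest = hashlib.sha256(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


# A denormalized document is fetched from exactly one shard.
print(shard_for("show:1", 4))

# A JOIN over normalized rows may need every shard that holds a matching row.
row_ids = ["show:1", "cast:1:1", "cast:1:2"]
print(sorted({shard_for(row_id, 4) for row_id in row_ids}))
```

The first lookup always touches one shard. The second can touch several, which is the cross-machine cost the slide refers to.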
5. But What if I Need Relationships? Can Cloudant Do JOINs?
Yes ... But First, Don't Do This
Relationships as single documents
http://www.sarahmei.com/blog/2013/11/11/why-you-should-never-use-mongodb/
Some "Key" Concepts
•  Inject logic into "_id": field to enforce uniqueness
•  Example: "_id":"<course>-<student>" ensures at most one
document per course per student
•  Give your documents a "type": field
•  Add relations as separate "edge" documents
•  Exploit powerful materialized view engine
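The "_id" uniqueness trick can be sketched with an in-memory stand-in for a database that rejects a second document with the same "_id" (CouchDB-style stores report this as a conflict). `TinyStore`, `ConflictError`, `enrollment_id`, and `enroll` are hypothetical names used only for this illustration.

```python
class ConflictError(Exception):
    """Raised when a document with the same _id already exists."""


class TinyStore:
    """In-memory stand-in for a database that enforces unique _id values."""

    def __init__(self):
        self._docs = {}

    def insert(self, doc):
        if doc["_id"] in self._docs:
            raise ConflictError(doc["_id"])
        self._docs[doc["_id"]] = doc

    def __len__(self):
        return len(self._docs)


def enrollment_id(course, student):
    """Build the "<course>-<student>" key described above."""
    return f"{course}-{student}"


def enroll(store, course, student):
    """Return True if the enrollment was created, False if it already existed."""
    try:
        store.insert({"_id": enrollment_id(course, student), "type": "enrollment"})
        return True
    except ConflictError:
        return False


store = TinyStore()
print(enroll(store, "math101", "alice"))  # first enrollment is stored
print(enroll(store, "math101", "alice"))  # duplicate is rejected
```

Because the key is derived from the data, the database itself guarantees at most one enrollment per course per student, with no separate uniqueness check.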
Preview: Defining an Index/View
•  This design document (built in Cloudant Web dashboard)
encapsulates everything that follows
•  It builds our secondary index/database view, which we will soon query
•  It's the incremental MapReduce view engine we cited earlier
•  https://webinar.cloudant.com/relational/_design/join
Sample Related Data: Twitter
User documents flexible & straightforward
How Do We Deal With Followers?
a.  Update each user document with a list
b.  Create relation documents and "join"
E.g., Follower Graph
Relationships as Documents
Goal: Materialize Users & Following List
"join" by selecting rows at lines 103–105
Index Sorting Rules
http://wiki.apache.org/couchdb/View_collation
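The sorting rules matter because they decide which rows fall between a start key and an end key. The sketch below approximates the CouchDB type ordering (null, false, true, numbers, strings, arrays, objects) with a Python sort key. It is a simplification: real view collation compares strings with ICU rules, whereas this uses plain code-point order. The `collation_key` helper and the sample keys other than "user:kocolosk" are hypothetical.

```python
def collation_key(value):
    """Map a JSON value to a tuple that sorts in simplified collation order."""
    if value is None:
        return (0,)
    if value is False:
        return (1,)
    if value is True:
        return (2,)
    if isinstance(value, (int, float)):
        return (3, value)
    if isinstance(value, str):
        return (4, value)  # simplification: code-point order, not ICU
    if isinstance(value, list):
        return (5, tuple(collation_key(item) for item in value))
    if isinstance(value, dict):
        return (6, tuple((k, collation_key(v)) for k, v in value.items()))
    raise TypeError(f"not a JSON value: {value!r}")


keys = [
    ["user:kocolosk", {}],
    ["user:kocolosk", "user:alice"],
    ["user:kocolosk"],
    ["user:aaa"],
]

for key in sorted(keys, key=collation_key):
    print(key)
```

Since an object sorts after every string, a key of the form [user, {}] works as an upper bound for all rows whose key starts with that user.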
Materialize Users, With All Followed
Let's Query That View
https://webinar.cloudant.com/relational/_design/join/_view/follows?startkey=["user:kocolosk"]&endkey=["user:kocolosk",{}]
Callouts on the result rows:
•  System-generated unique doc "_id":
•  Sort key
•  Pointer to related followed user's doc "_id":
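The start and end keys in that URL are JSON values, so in client code they are usually serialized and URL-encoded rather than typed by hand. A minimal sketch, assuming the view URL shown above. The `view_url` helper is hypothetical and the code only builds the string, it does not send a request.

```python
import json
from urllib.parse import urlencode

BASE = "https://webinar.cloudant.com/relational/_design/join/_view/follows"


def view_url(user_id):
    """Build a key-range query covering one user and everything filed under it."""
    params = {
        "startkey": json.dumps([user_id]),
        # {} sorts after any string, so it closes the range for this user.
        "endkey": json.dumps([user_id, {}]),
    }
    return f"{BASE}?{urlencode(params)}"


print(view_url("user:kocolosk"))
```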
Let's Query That View, and Follow Pointers
https://webinar.cloudant.com/relational/_design/join/_view/follows?startkey=["user:kocolosk"]&endkey=["user:kocolosk",{}]&include_docs=true
Wait. What Did We Get?
•  kocolosk’s USER document
•  list of all USERs kocolosk FOLLOWS
•  full USER document for all USERs that kocolosk FOLLOWS
•  In a fast, single query
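The whole "join" can be emulated in memory to show why one range query returns all three things. In this sketch, a map step emits one row per user and one row per follow edge, the rows are sorted by key, and the query selects a key range and follows each row's pointer to a full document. The field names ("user", "follows", "name"), the sample documents, and the helper functions are assumptions. The actual design document from the webinar is not reproduced here.

```python
docs = {
    "user:kocolosk": {"_id": "user:kocolosk", "type": "user", "name": "Example A"},
    "user:alice": {"_id": "user:alice", "type": "user", "name": "Example B"},
    "user:bob": {"_id": "user:bob", "type": "user", "name": "Example C"},
    "f1": {"_id": "f1", "type": "follows", "user": "user:kocolosk", "follows": "user:alice"},
    "f2": {"_id": "f2", "type": "follows", "user": "user:kocolosk", "follows": "user:bob"},
    "f3": {"_id": "f3", "type": "follows", "user": "user:alice", "follows": "user:kocolosk"},
}


def map_doc(doc):
    """Emit (key, value) rows; each value's "_id" points at a user document."""
    if doc["type"] == "user":
        yield [doc["_id"]], {"_id": doc["_id"]}
    elif doc["type"] == "follows":
        yield [doc["user"], doc["follows"]], {"_id": doc["follows"]}


def build_view(all_docs):
    """Run the map step over every document and sort the rows by key."""
    rows = [row for doc in all_docs.values() for row in map_doc(doc)]
    return sorted(rows, key=lambda row: row[0])


def query_user(view, all_docs, user_id):
    """Select rows whose key starts with user_id, then follow each pointer.

    This mirrors a startkey=[user_id], endkey=[user_id, {}] range combined
    with include_docs=true.
    """
    selected = [value for key, value in view if key[0] == user_id]
    return [all_docs[value["_id"]] for value in selected]


view = build_view(docs)
for doc in query_user(view, docs, "user:kocolosk"):
    print(doc["_id"])
```

The user's own row sorts first because its key is a prefix of the edge keys, so the result starts with the user document and continues with the documents of everyone that user follows.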
Legal Slide #1
© "Apache", "CouchDB", "Apache CouchDB", "Apache Lucene," "Lucene", and the CouchDB logo are trademarks or registered
trademarks of The Apache Software Foundation. All other brands and trademarks are the property of their respective owners.
Legal Slide #2
© Copyright IBM Corporation 2015.
IBM and the IBM Cloudant logo are trademarks of International Business Machines Corp., registered in many
jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A
current list of IBM trademarks is available on the Web at "Copyright and trademark information" at ibm.com/legal/copytrade.shtml
Thank You
@cloudant
mbroberg@us.ibm.com
rmillay@us.ibm.com
Perceptions & Questions
Analyst:
Robin Bloor
Robin Bloor, PhD
Database is Being Disrupted
u  Data volumes
u  Speed of arrival
u  Content data (JSON)
u  IOT data
u  Cloud deployment
u  Schema on read
u  Memory for disk
u  Analytic workloads
THIS IS A PERFECT
STORM OF A KIND
What Is a Database?
A database is software that presides over a heap
of data that:
u  Implements a data model
u  Manages multiple concurrent requests for data
u  Implements a security model
u  Is ACID compliant (?)
u  Is resilient
RDBMS
Databases that:
u  Assume you can represent all data in related
tables
u  Assume that you want to process data in a set-wise
manner
u  Can be used for many problems
u  Are absolutely not universal, hence:
•  The Null kluge
•  The impedance mismatch
•  BLOBS
•  OR Databases
Another Couple of Issues…
Programmers prefer JSON
The SEMANTICS of data
u  It is already beginning to look as though
graph databases are a separate category of
engine
u  The triple store tactic (representing data in
triples) is required for semantics, otherwise
meaning is limited
Data Access
In reality there is no
DATA ACCESS STANDARD
There are several different
approaches according to the
data model
u  How much evangelizing of JSON do you find it
necessary to do?
u  How swiftly do SQL developers adjust to JSON?
u  JOINs are performance hogs in all database
systems. Please explain why you think they are
more economic with Cloudant.
u  Does Cloudant scale better than, say, a column
store SQL model?
u  Can you explain the tuning and other DBA
activities with Cloudant?
u  Is recovery the same as with RDBMS?
u  What is the database size of your largest
customer (users, data volume)?
Upcoming Topics
www.insideanalysis.com
March: BI/ANALYTICS
April: BIG DATA
May: CLOUD
THANK YOU for your ATTENTION!
Some images provided courtesy of
Wikimedia Commons

More Related Content

What's hot

Big Data Strategy for the Relational World
Big Data Strategy for the Relational World Big Data Strategy for the Relational World
Big Data Strategy for the Relational World
Andrew Brust
 

What's hot (20)

Optimizing DITA Content for Search Engine Optimization tekom tcworld 2016
Optimizing DITA Content for Search Engine Optimization tekom tcworld 2016Optimizing DITA Content for Search Engine Optimization tekom tcworld 2016
Optimizing DITA Content for Search Engine Optimization tekom tcworld 2016
 
Inside the mind of a SharePoint Solutions Architect
Inside the mind of a SharePoint Solutions ArchitectInside the mind of a SharePoint Solutions Architect
Inside the mind of a SharePoint Solutions Architect
 
Facebook Architecture - Breaking it Open
Facebook Architecture - Breaking it OpenFacebook Architecture - Breaking it Open
Facebook Architecture - Breaking it Open
 
SEF2013 - Create a Business Solution, Step by Step, with No Managed Code
SEF2013 - Create a Business Solution, Step by Step, with No Managed CodeSEF2013 - Create a Business Solution, Step by Step, with No Managed Code
SEF2013 - Create a Business Solution, Step by Step, with No Managed Code
 
Out With the Old, in With the Open-source: Brainshark's Complete CMS Migration
Out With the Old, in With the Open-source: Brainshark's Complete CMS MigrationOut With the Old, in With the Open-source: Brainshark's Complete CMS Migration
Out With the Old, in With the Open-source: Brainshark's Complete CMS Migration
 
Stop SharePoint Project Failure
Stop SharePoint Project FailureStop SharePoint Project Failure
Stop SharePoint Project Failure
 
RDBMS vs NoSQL
RDBMS vs NoSQLRDBMS vs NoSQL
RDBMS vs NoSQL
 
Big Data Strategy for the Relational World
Big Data Strategy for the Relational World Big Data Strategy for the Relational World
Big Data Strategy for the Relational World
 
OVERVIEW OF FACEBOOK SCALABLE ARCHITECTURE.
OVERVIEW  OF FACEBOOK SCALABLE ARCHITECTURE.OVERVIEW  OF FACEBOOK SCALABLE ARCHITECTURE.
OVERVIEW OF FACEBOOK SCALABLE ARCHITECTURE.
 
Web Services PHP Tutorial
Web Services PHP TutorialWeb Services PHP Tutorial
Web Services PHP Tutorial
 
A SharePoint File Migration Framework
A SharePoint File Migration FrameworkA SharePoint File Migration Framework
A SharePoint File Migration Framework
 
Contours of DITA 2.0
Contours of DITA 2.0Contours of DITA 2.0
Contours of DITA 2.0
 
Your Future HTML: The Evolution of Site Design with Web Components
Your Future HTML: The Evolution of Site Design with Web ComponentsYour Future HTML: The Evolution of Site Design with Web Components
Your Future HTML: The Evolution of Site Design with Web Components
 
SQLCAT: A Preview to PowerPivot Server Best Practices
SQLCAT: A Preview to PowerPivot Server Best PracticesSQLCAT: A Preview to PowerPivot Server Best Practices
SQLCAT: A Preview to PowerPivot Server Best Practices
 
Getting started with SharePoint REST API in custom SharePoint workflows Resto...
Getting started with SharePoint REST API in custom SharePoint workflows Resto...Getting started with SharePoint REST API in custom SharePoint workflows Resto...
Getting started with SharePoint REST API in custom SharePoint workflows Resto...
 
Portal / BI 2008 Presentation by Ted Tschopp
Portal / BI 2008 Presentation by Ted TschoppPortal / BI 2008 Presentation by Ted Tschopp
Portal / BI 2008 Presentation by Ted Tschopp
 
Getting Everything You want Out of SharePoint
Getting Everything You want Out of SharePointGetting Everything You want Out of SharePoint
Getting Everything You want Out of SharePoint
 
To SQL or NoSQL, that is the question
To SQL or NoSQL, that is the questionTo SQL or NoSQL, that is the question
To SQL or NoSQL, that is the question
 
Deploying and Managing PowerPivot for SharePoint
Deploying and Managing PowerPivot for SharePointDeploying and Managing PowerPivot for SharePoint
Deploying and Managing PowerPivot for SharePoint
 
 Active Storage - Modern File Storage? 
 Active Storage - Modern File Storage?  Active Storage - Modern File Storage? 
 Active Storage - Modern File Storage? 
 

Viewers also liked

Viewers also liked (17)

Crawl, Walk, Run: How to Get Started with Hadoop
Crawl, Walk, Run: How to Get Started with HadoopCrawl, Walk, Run: How to Get Started with Hadoop
Crawl, Walk, Run: How to Get Started with Hadoop
 
DisrupTech 2015ek
DisrupTech 2015ekDisrupTech 2015ek
DisrupTech 2015ek
 
Goodbye, Bottlenecks: How Scale-Out and In-Memory Solve ETL
Goodbye, Bottlenecks: How Scale-Out and In-Memory Solve ETLGoodbye, Bottlenecks: How Scale-Out and In-Memory Solve ETL
Goodbye, Bottlenecks: How Scale-Out and In-Memory Solve ETL
 
Deeper Questions: How Interactive Visualization Empowers Analysts
Deeper Questions: How Interactive Visualization Empowers AnalystsDeeper Questions: How Interactive Visualization Empowers Analysts
Deeper Questions: How Interactive Visualization Empowers Analysts
 
The Hadoop Guarantee: Keeping Analytics Running On Time
The Hadoop Guarantee: Keeping Analytics Running On TimeThe Hadoop Guarantee: Keeping Analytics Running On Time
The Hadoop Guarantee: Keeping Analytics Running On Time
 
Moving Targets: Harnessing Real-time Value from Data in Motion
Moving Targets: Harnessing Real-time Value from Data in Motion Moving Targets: Harnessing Real-time Value from Data in Motion
Moving Targets: Harnessing Real-time Value from Data in Motion
 
A Connected Data Landscape: Virtualization and the Internet of Things
A Connected Data Landscape: Virtualization and the Internet of ThingsA Connected Data Landscape: Virtualization and the Internet of Things
A Connected Data Landscape: Virtualization and the Internet of Things
 
DisrupTech - Dave Duggal
DisrupTech - Dave DuggalDisrupTech - Dave Duggal
DisrupTech - Dave Duggal
 
Big Data Enabled: How YARN Changes the Game
Big Data Enabled: How YARN Changes the GameBig Data Enabled: How YARN Changes the Game
Big Data Enabled: How YARN Changes the Game
 
The Biggest Picture: Situational Awareness on a Global Level
The Biggest Picture: Situational Awareness on a Global LevelThe Biggest Picture: Situational Awareness on a Global Level
The Biggest Picture: Situational Awareness on a Global Level
 
The Perfect Fit: Scalable Graph for Big Data
The Perfect Fit: Scalable Graph for Big DataThe Perfect Fit: Scalable Graph for Big Data
The Perfect Fit: Scalable Graph for Big Data
 
Time Difference: How Tomorrow's Companies Will Outpace Today's
Time Difference: How Tomorrow's Companies Will Outpace Today'sTime Difference: How Tomorrow's Companies Will Outpace Today's
Time Difference: How Tomorrow's Companies Will Outpace Today's
 
Structurally Sound: How to Tame Your Architecture
Structurally Sound: How to Tame Your ArchitectureStructurally Sound: How to Tame Your Architecture
Structurally Sound: How to Tame Your Architecture
 
Presumption of Abundance: Architecting the Future of Success
Presumption of Abundance: Architecting the Future of SuccessPresumption of Abundance: Architecting the Future of Success
Presumption of Abundance: Architecting the Future of Success
 
Achieving Business Value by Fusing Hadoop and Corporate Data
Achieving Business Value by Fusing Hadoop and Corporate DataAchieving Business Value by Fusing Hadoop and Corporate Data
Achieving Business Value by Fusing Hadoop and Corporate Data
 
Modus Operandi
Modus OperandiModus Operandi
Modus Operandi
 
Data Wrangling and the Art of Big Data Discovery
Data Wrangling and the Art of Big Data DiscoveryData Wrangling and the Art of Big Data Discovery
Data Wrangling and the Art of Big Data Discovery
 

Similar to Framing the Argument: How to Scale Faster with NoSQL

NoSQLDatabases
NoSQLDatabasesNoSQLDatabases
NoSQLDatabases
Adi Challa
 
The View - Leveraging Lotuscript for Database Connectivity
The View - Leveraging Lotuscript for Database ConnectivityThe View - Leveraging Lotuscript for Database Connectivity
The View - Leveraging Lotuscript for Database Connectivity
Bill Buchan
 

Similar to Framing the Argument: How to Scale Faster with NoSQL (20)

SQL to NoSQL: Top 6 Questions
SQL to NoSQL: Top 6 QuestionsSQL to NoSQL: Top 6 Questions
SQL to NoSQL: Top 6 Questions
 
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at DatabricksLessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
 
Uklug 2014 connections dev faq
Uklug 2014  connections dev faqUklug 2014  connections dev faq
Uklug 2014 connections dev faq
 
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & TableauBig Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
 
SQL To NoSQL - Top 6 Questions Before Making The Move
SQL To NoSQL - Top 6 Questions Before Making The MoveSQL To NoSQL - Top 6 Questions Before Making The Move
SQL To NoSQL - Top 6 Questions Before Making The Move
 
SmugMug: From MySQL to Amazon DynamoDB (DAT204) | AWS re:Invent 2013
SmugMug: From MySQL to Amazon DynamoDB (DAT204) | AWS re:Invent 2013SmugMug: From MySQL to Amazon DynamoDB (DAT204) | AWS re:Invent 2013
SmugMug: From MySQL to Amazon DynamoDB (DAT204) | AWS re:Invent 2013
 
IBM - Introduction to Cloudant
IBM - Introduction to CloudantIBM - Introduction to Cloudant
IBM - Introduction to Cloudant
 
0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2
0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part20812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2
0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2
 
CQRS recipes or how to cook your architecture
CQRS recipes or how to cook your architectureCQRS recipes or how to cook your architecture
CQRS recipes or how to cook your architecture
 
Software Architecture and Architectors: useless VS valuable
Software Architecture and Architectors: useless VS valuableSoftware Architecture and Architectors: useless VS valuable
Software Architecture and Architectors: useless VS valuable
 
NoSQLDatabases
NoSQLDatabasesNoSQLDatabases
NoSQLDatabases
 
DB2 and PHP in Depth on IBM i
DB2 and PHP in Depth on IBM iDB2 and PHP in Depth on IBM i
DB2 and PHP in Depth on IBM i
 
"It’s not only Lambda! Economics behind Serverless" at Serverless Architectur...
"It’s not only Lambda! Economics behind Serverless" at Serverless Architectur..."It’s not only Lambda! Economics behind Serverless" at Serverless Architectur...
"It’s not only Lambda! Economics behind Serverless" at Serverless Architectur...
 
How to Survive as a Data Architect in a Polyglot Database World
How to Survive as a Data Architect in a Polyglot Database WorldHow to Survive as a Data Architect in a Polyglot Database World
How to Survive as a Data Architect in a Polyglot Database World
 
Untangling fall2017 week1
Untangling fall2017 week1Untangling fall2017 week1
Untangling fall2017 week1
 
Architecture by Accident
Architecture by AccidentArchitecture by Accident
Architecture by Accident
 
Untangling the web11
Untangling the web11Untangling the web11
Untangling the web11
 
Building FoundationDB
Building FoundationDBBuilding FoundationDB
Building FoundationDB
 
Building your first Analysis Services Tabular BI Semantic model with SQL Serv...
Building your first Analysis Services Tabular BI Semantic model with SQL Serv...Building your first Analysis Services Tabular BI Semantic model with SQL Serv...
Building your first Analysis Services Tabular BI Semantic model with SQL Serv...
 
The View - Leveraging Lotuscript for Database Connectivity
The View - Leveraging Lotuscript for Database ConnectivityThe View - Leveraging Lotuscript for Database Connectivity
The View - Leveraging Lotuscript for Database Connectivity
 

More from Inside Analysis

Rethinking Data Availability and Governance in a Mobile World
Rethinking Data Availability and Governance in a Mobile WorldRethinking Data Availability and Governance in a Mobile World
Rethinking Data Availability and Governance in a Mobile World
Inside Analysis
 

More from Inside Analysis (20)

An Ounce of Prevention: Forging Healthy BI
An Ounce of Prevention: Forging Healthy BIAn Ounce of Prevention: Forging Healthy BI
An Ounce of Prevention: Forging Healthy BI
 
Agile, Automated, Aware: How to Model for Success
Agile, Automated, Aware: How to Model for SuccessAgile, Automated, Aware: How to Model for Success
Agile, Automated, Aware: How to Model for Success
 
First in Class: Optimizing the Data Lake for Tighter Integration
First in Class: Optimizing the Data Lake for Tighter IntegrationFirst in Class: Optimizing the Data Lake for Tighter Integration
First in Class: Optimizing the Data Lake for Tighter Integration
 
Fit For Purpose: Preventing a Big Data Letdown
Fit For Purpose: Preventing a Big Data LetdownFit For Purpose: Preventing a Big Data Letdown
Fit For Purpose: Preventing a Big Data Letdown
 
To Serve and Protect: Making Sense of Hadoop Security
To Serve and Protect: Making Sense of Hadoop Security To Serve and Protect: Making Sense of Hadoop Security
To Serve and Protect: Making Sense of Hadoop Security
 
Introducing: A Complete Algebra of Data
Introducing: A Complete Algebra of DataIntroducing: A Complete Algebra of Data
Introducing: A Complete Algebra of Data
 
The Role of Data Wrangling in Driving Hadoop Adoption
The Role of Data Wrangling in Driving Hadoop AdoptionThe Role of Data Wrangling in Driving Hadoop Adoption
The Role of Data Wrangling in Driving Hadoop Adoption
 
Ahead of the Stream: How to Future-Proof Real-Time Analytics
Ahead of the Stream: How to Future-Proof Real-Time AnalyticsAhead of the Stream: How to Future-Proof Real-Time Analytics
Ahead of the Stream: How to Future-Proof Real-Time Analytics
 
All Together Now: Connected Analytics for the Internet of Everything
All Together Now: Connected Analytics for the Internet of EverythingAll Together Now: Connected Analytics for the Internet of Everything
All Together Now: Connected Analytics for the Internet of Everything
 
SQL In Hadoop: Big Data Innovation Without the Risk
SQL In Hadoop: Big Data Innovation Without the RiskSQL In Hadoop: Big Data Innovation Without the Risk
SQL In Hadoop: Big Data Innovation Without the Risk
 
A Revolutionary Approach to Modernizing the Data Warehouse
A Revolutionary Approach to Modernizing the Data WarehouseA Revolutionary Approach to Modernizing the Data Warehouse
A Revolutionary Approach to Modernizing the Data Warehouse
 
The Maturity Model: Taking the Growing Pains Out of Hadoop
The Maturity Model: Taking the Growing Pains Out of HadoopThe Maturity Model: Taking the Growing Pains Out of Hadoop
The Maturity Model: Taking the Growing Pains Out of Hadoop
 
Rethinking Data Availability and Governance in a Mobile World
Rethinking Data Availability and Governance in a Mobile WorldRethinking Data Availability and Governance in a Mobile World
Rethinking Data Availability and Governance in a Mobile World
 
Phasic Systems - Dr. Geoffrey Malafsky
Phasic Systems - Dr. Geoffrey MalafskyPhasic Systems - Dr. Geoffrey Malafsky
Phasic Systems - Dr. Geoffrey Malafsky
 
Red Hat - Sarangan Rangachari
Red Hat - Sarangan RangachariRed Hat - Sarangan Rangachari
Red Hat - Sarangan Rangachari
 
WebAction-Sami Abkay
WebAction-Sami AbkayWebAction-Sami Abkay
WebAction-Sami Abkay
 
DisrupTech - Robin Bloor (2)
DisrupTech - Robin Bloor (2)DisrupTech - Robin Bloor (2)
DisrupTech - Robin Bloor (2)
 
DisrupTech - Robin Bloor (1)
DisrupTech - Robin Bloor (1)DisrupTech - Robin Bloor (1)
DisrupTech - Robin Bloor (1)
 
Big Data Refinery: Distilling Value for User-Driven Analytics
Big Data Refinery: Distilling Value for User-Driven AnalyticsBig Data Refinery: Distilling Value for User-Driven Analytics
Big Data Refinery: Distilling Value for User-Driven Analytics
 
Understanding What’s Possible: Getting Business Value from Big Data Quickly
Understanding What’s Possible: Getting Business Value from Big Data QuicklyUnderstanding What’s Possible: Getting Business Value from Big Data Quickly
Understanding What’s Possible: Getting Business Value from Big Data Quickly
 

Recently uploaded

Recently uploaded (20)

MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 

Framing the Argument: How to Scale Faster with NoSQL

  • 1. Grab some coffee and enjoy the pre-show banter before the top of the hour!
  • 2. The Briefing Room Framing the Argument: How to Scale Faster with NoSQL
  • 3. Twitter Tag: #briefr The Briefing Room Welcome Host: Eric Kavanagh eric.kavanagh@bloorgroup.com @eric_kavanagh
  • 4. Mission •  Reveal the essential characteristics of enterprise software, good and bad •  Provide a forum for detailed analysis of today's innovative technologies •  Give vendors a chance to explain their product to savvy analysts •  Allow audience members to pose serious questions... and get answers!
  • 5. Topics •  March: BI/ANALYTICS •  April: BIG DATA •  May: CLOUD
  • 6. More Than One Way to Skin a Cat •  NoSQL engines provide escape hatches •  Force-fitting all data into relational will fail, because: •  Performance is ALWAYS important, now more than ever
  • 7. Analyst: Robin Bloor •  Robin Bloor is Chief Analyst at The Bloor Group •  robin.bloor@bloorgroup.com •  @robinbloor
  • 8. IBM Cloudant •  IBM Cloudant offers a non-relational, cloud-based distributed database •  The product is based on Apache CouchDB and provides data management, search, hosting, admin tools and analytics •  Cloudant’s database-as-a-service is often used for web or mobile application development
  • 9. Guest: Ryan Millay •  Ryan Millay started with IBM® Cloudant® in May 2014 after three years as a software engineer. Now he is part of the Field Engineering team working on both pre- and post-sales opportunities with a variety of different accounts. He is also a member of the Cloudant Local Services team to help customers scope and install Cloudant’s on-premises software. When not at Cloudant, Ryan enjoys travelling, playing a round of golf, or binging on the latest show on Netflix.
  • 10. SQL to NoSQL: Top 5 Questions •  Mike Broberg, Marketing Communications, Cloudant, IBM Cloud Data Services •  Ryan Millay, Field Engineer, Cloudant, IBM Cloud Data Services
  • 11. Agenda •  About Cloudant •  Top 5 Questions When Moving to NoSQL •  Live Q&A
  • 12. Housekeeping Notes •  Today’s webcast is being recorded. We will send you a link to the recording, a link to the library and its code examples, and a copy of the slide deck after the presentation. •  The webcast recording will be available on our website: https://cloudant.com •  If you would like to ask a question during today’s presentation, please type in your question using the GoToWebinar toolbar.
  • 14. But, What Is NoSQL, Really? •  Umbrella term for databases using non-SQL query languages •  Key-Value stores •  Wide column stores •  Document stores •  Graph stores •  Some also say "non-relational," because data is not decomposed into separate tables, rows, and columns •  As we’ll see, it’s still possible to represent relationships in NoSQL •  The question is, are these relationships always necessary?
  • 15. Schema Flexibility •  Cloudant uses JavaScript Object Notation (JSON) as its data format •  Cloudant is based on Apache CouchDB. In both systems, a "database" is simply a collection of JSON documents •  Example: { "docs": [ { "_id": "df8cecd9809662d08eb853989a5ca2f2", "_rev": "1-8522c9a1d9570566d96b7f7171623270", "Movie_runtime": 162, "Movie_rating": "PG-13", "Person_name": "Zoe Saldana", "Actor_actor_id": "0757855", "Movie_genre": "AVYS", "Movie_name": "Avatar", "Actor_movie_id": "0499549", "Movie_earnings_rank": "1", "Person_pob": "New Jersey, USA", "Person_id": "0757855", "Movie_id": "0499549", "Movie_year": 2009, "Person_dob": "1978-06-19" } ] }
  • 16. Horizontal Scaling •  Many commodity servers vs. few expensive ones •  Cost grows linearly with performance, not exponentially Master-Master Replication •  Or "masterless replica architecture" •  Minimize latency by putting data close to users •  Replicate data widely to mitigate disasters •  Cloudant excels at data movement
  • 17. 2. Rows and Tables Become ... What?
  • 18. ... This! SQL Terms/Concepts --> Document Store Terms/Concepts •  database --> database •  table --> bunch of documents •  row --> document •  column --> field •  materialized view --> index/database view/secondary index •  primary key --> "_id": •  table JOIN operations --> entity relations
  • 19. Rows --> Documents •  Use some field to group documents by schema •  Example: "type":"user" or "type":"edge:follower" Tables --> Databases •  Put all tables in one database; use "type": to distinguish •  Model entity relationships with secondary indexes •  More on this later in the webinar •  If you're curious, we're talking about concepts described in the CouchDB documentation on entity relations •  http://wiki.apache.org/couchdb/EntityRelationship
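The row-to-document mapping on this slide can be sketched in a few lines of Python. This is a minimal illustration, not Cloudant's API: the `row_to_doc` and `docs_of_type` helpers, the sample rows, and the `follower`/`followed` field names are assumptions made for the example. Only the `"type"` convention and the `"_id"` field come from the slides.

```python
def row_to_doc(table, row, key_field="id"):
    """Turn one relational row into a JSON-style document.

    The table name becomes the "type" field, and the primary key is
    folded into "_id" so rows from different tables cannot collide.
    """
    doc = {k: v for k, v in row.items() if k != key_field}
    doc["_id"] = f"{table}:{row[key_field]}"
    doc["type"] = table
    return doc


def docs_of_type(database, doc_type):
    """Select the documents that came from one former table."""
    return [d for d in database if d["type"] == doc_type]


# Hypothetical rows from two relational tables.
user_rows = [
    {"id": "kocolosk", "name": "Example User A"},
    {"id": "alice", "name": "Example User B"},
]
follower_rows = [
    {"id": 1, "follower": "user:kocolosk", "followed": "user:alice"},
]

# One "database" holds every former table; "type" tells them apart.
database = [row_to_doc("user", r) for r in user_rows]
database += [row_to_doc("edge:follower", r) for r in follower_rows]

print([d["_id"] for d in docs_of_type(database, "user")])
```

Prefixing `"_id"` with the type is one possible convention; a system-generated `"_id"` plus the `"type"` field would work as well.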
  • 20. Indexes and Queries •  An "index" in Cloudant is not strictly a performance optimization •  Instead, more akin to "materialized view" in RDBMS terms •  Index also called a "database view" in Cloudant •  Index, then query. •  You need one before you can do the other •  Create index, then query by URL •  Can create a secondary index on any field within a document •  You get primary index (based on reserved "_id": field) by default •  Indexes precomputed, updated in real time •  Performant at big-honkin' scale
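As a rough analogy for "index, then query", the sketch below precomputes a sorted secondary index over one field and then answers lookups from the index alone. It is an in-memory illustration only: Cloudant builds its indexes with an incremental MapReduce engine, and the `build_index`/`query_index` helpers and all documents except the "Avatar" values are hypothetical.

```python
import bisect


def build_index(docs, field):
    """Precompute a sorted list of (field value, _id) pairs.

    Documents that lack the field are simply left out of the index.
    """
    return sorted((doc[field], doc["_id"]) for doc in docs if field in doc)


def query_index(index, key):
    """Return the _ids of all documents whose indexed field equals key."""
    # (key,) sorts just before every (key, _id) pair with the same key.
    start = bisect.bisect_left(index, (key,))
    ids = []
    for value, doc_id in index[start:]:
        if value != key:
            break
        ids.append(doc_id)
    return ids


docs = [
    {"_id": "movie:1", "Movie_name": "Avatar", "Movie_year": 2009},
    {"_id": "movie:2", "Movie_name": "Example Film A", "Movie_year": 2009},
    {"_id": "movie:3", "Movie_name": "Example Film B", "Movie_year": 2012},
    {"_id": "note:1", "text": "no Movie_year field, so not indexed"},
]

by_year = build_index(docs, "Movie_year")   # index first...
print(query_index(by_year, 2009))           # ...then query: ['movie:1', 'movie:2']
```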
  • 21. 3. Will I Have to Rebuild My App?
  • 22. Yes •  By ripping out the bad parts: •  Extract, Transform, Load •  Schema migrations •  JOINs that don't scale •  A little more work up-front, but your application will adapt to scale much better
  • 23. 4. So Each of My Tables Becomes a Different Type of JSON Document?
  • 24. No •  Fancy explanation: •  Best practice is to denormalize data rather than keep it in 3rd normal form •  Or, less fancy: •  Smoosh relationships for each entry all together into one JSON doc •  Denormalization •  Approach to data modeling that shards well and scales well •  Works well with data that is somewhat static, or infrequently updated
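To make "smoosh relationships into one JSON doc" concrete, here is a small sketch that folds three normalized tables into a single document per show. The table layouts, field names, and the `denormalize_show` helper are hypothetical, chosen only to mirror the TV cast example the talk refers to.

```python
def denormalize_show(show, cast_rows, actors_by_id):
    """Fold a show row and its related cast/actor rows into one document."""
    cast = [
        {"actor": actors_by_id[row["actor_id"]]["name"], "role": row["role"]}
        for row in cast_rows
        if row["show_id"] == show["id"]
    ]
    return {
        "_id": f"show:{show['id']}",
        "type": "show",
        "title": show["title"],
        "cast": cast,  # the former JOIN result now lives inside the document
    }


# Hypothetical normalized data: three tables linked by foreign keys.
shows = [{"id": 1, "title": "Example Show"}]
actors_by_id = {
    10: {"name": "Example Actor A"},
    11: {"name": "Example Actor B"},
}
cast_rows = [
    {"show_id": 1, "actor_id": 10, "role": "Lead"},
    {"show_id": 1, "actor_id": 11, "role": "Sidekick"},
    {"show_id": 2, "actor_id": 10, "role": "Guest"},
]

show_doc = denormalize_show(shows[0], cast_rows, actors_by_id)
print(show_doc)
```

The trade-off noted on the slide applies: if an actor's name changes, every document that embeds it must be rewritten, which is why this shape suits data that is static or rarely updated.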
  • 25. Static Data Example: TV Cast Members http://www.sarahmei.com/blog/2013/11/11/why-you-should-never-use-mongodb/
  • 26. What Doesn't Scale •  RDBMS JOINs across shards •  Presumably across different machines •  Common pain point when scaling RDBMS What Does Scale •  Denormalized data models + modern distributed systems •  More efficient to distribute data if it's already in one compact unit
  • 27. 5. But What if I Need Relationships? Can Cloudant Do JOINs?
  • 28. Yes ... But First, Don't Do This •  Relationships as single documents •  http://www.sarahmei.com/blog/2013/11/11/why-you-should-never-use-mongodb/
  • 29. Some "Key" Concepts •  Inject logic into "_id": field to enforce uniqueness •  Example: "_id":"<course>-<student>" ensures at most one document per course per student •  Give your documents a "type": field •  Add relations as separate "edge" documents •  Exploit powerful materialized view engine
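The `"_id":"<course>-<student>"` trick works because a document store refuses to create a second document with an `"_id"` that already exists (CouchDB answers such a request with a 409 Conflict). The sketch below imitates that behavior with an in-memory dictionary; `TinyStore`, `Conflict`, `enrollment_doc`, and the sample course and student names are all hypothetical.

```python
class Conflict(Exception):
    """Raised when a document with the same _id already exists."""


class TinyStore:
    """In-memory stand-in for a document database keyed by _id."""

    def __init__(self):
        self._docs = {}

    def create(self, doc):
        if doc["_id"] in self._docs:
            raise Conflict(doc["_id"])
        self._docs[doc["_id"]] = doc

    def __len__(self):
        return len(self._docs)


def enrollment_doc(course, student):
    """Build an enrollment whose _id encodes the uniqueness rule."""
    return {
        "_id": f"{course}-{student}",
        "type": "enrollment",
        "course": course,
        "student": student,
    }


store = TinyStore()
store.create(enrollment_doc("cs101", "alice"))
store.create(enrollment_doc("cs101", "bob"))

try:
    store.create(enrollment_doc("cs101", "alice"))  # same course, same student
    duplicate_rejected = False
except Conflict:
    duplicate_rejected = True

print(len(store), duplicate_rejected)
```

No separate unique constraint is needed: the primary index on `"_id"` does the enforcement.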
  • 30. Preview: Defining an Index/View •  This design document (built in Cloudant Web dashboard) encapsulates everything that follows •  It builds our secondary index/database view, which we will soon query •  It's the incremental MapReduce view engine we cited earlier •  https://webinar.cloudant.com/relational/_design/join
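The transcript does not include the design document's contents, so the following is only a guess at its general shape. The names `join` and `follows` are taken from the URLs in the slides, and the `_design/` prefix with a `views` object holding a JavaScript `map` function is standard CouchDB structure. The body of the map function, the `"edge:follower"` document layout, and the `follower`/`followed` fields are assumptions.

```python
import json

# Assumed map function: one row per user, plus one row per "follows" edge.
# Emitting {"_id": ...} as the value lets include_docs=true fetch the
# followed user's document instead of the edge document.
MAP_FOLLOWS = """
function (doc) {
  if (doc.type === "user") {
    emit([doc._id], null);
  } else if (doc.type === "edge:follower") {
    emit([doc.follower, doc.followed], {"_id": doc.followed});
  }
}
""".strip()

design_doc = {
    "_id": "_design/join",
    "views": {
        "follows": {"map": MAP_FOLLOWS},
    },
}

# The JSON body that would be saved into the database as the design document.
design_json = json.dumps(design_doc, indent=2)
print(design_json)
```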
  • 31. Sample Related Data: Twitter •  User documents flexible & straightforward
  • 32. How Do We Deal With Followers? a.  Update each user document with a list b.  Create relation documents and "join"
  • 35. Goal: Materialize Users & Following List •  "join" by selecting rows at lines 103–105
  • 37. Materialize Users, With All Followed
  • 38. Materialize Users, With All Followed
  • 39. Let's Query That View https://webinar.cloudant.com/relational/_design/join/_view/follows?startkey=["user:kocolosk"]&endkey=["user:kocolosk",{}] •  System-generated unique doc "_id": •  Sort key •  Pointer to related followed user's doc "_id":
  • 40. Let's Query That View, and Follow Pointers https://webinar.cloudant.com/relational/_design/join/_view/follows?startkey=["user:kocolosk"]&endkey=["user:kocolosk",{}]&include_docs=true
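The two query URLs above can be assembled programmatically. In this sketch the base URL, `startkey`, `endkey`, and `include_docs` come from the slides; the `view_url` helper is hypothetical. Note that `urlencode` percent-encodes the JSON brackets and quotes, so the result looks noisier than the slide version but carries the same parameters.

```python
import json
from urllib.parse import urlencode, urlsplit, parse_qs

BASE = "https://webinar.cloudant.com/relational/_design/join/_view/follows"


def view_url(user_id, include_docs=False):
    """Build a view query covering every key that starts with user_id."""
    params = {
        # Keys are JSON values, so they are JSON-encoded before URL-encoding.
        "startkey": json.dumps([user_id]),
        # {} collates after any other JSON type, so [user_id, {}] closes the range.
        "endkey": json.dumps([user_id, {}]),
    }
    if include_docs:
        params["include_docs"] = "true"
    return BASE + "?" + urlencode(params)


url = view_url("user:kocolosk", include_docs=True)
print(url)

# Decoding the query string recovers the original JSON keys.
decoded = parse_qs(urlsplit(url).query)
print(json.loads(decoded["startkey"][0]), json.loads(decoded["endkey"][0]))
```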
  • 41. Wait. What Did We Get? •  kocolosk’s USER document •  list of all USERs kocolosk FOLLOWS •  full USER document for all USERs that kocolosk FOLLOWS •  In a fast, single query
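The result described above can be reproduced with a small in-memory simulation of the view. Everything here is a sketch under stated assumptions: the map logic and the `follower`/`followed` fields are guesses at what the webinar's view does, the sample users other than `user:kocolosk` are invented, and `collate` only approximates CouchDB's key ordering (null, false, true, numbers, strings, arrays, objects), using plain string comparison where CouchDB uses ICU collation.

```python
def collate(value):
    """Map a JSON value to a tuple that sorts roughly like a CouchDB view key."""
    if value is None:
        return (0,)
    if value is False:
        return (1,)
    if value is True:
        return (2,)
    if isinstance(value, (int, float)):
        return (3, value)
    if isinstance(value, str):
        return (4, value)
    if isinstance(value, list):
        return (5, tuple(collate(v) for v in value))
    if isinstance(value, dict):
        return (6, tuple((k, collate(v)) for k, v in sorted(value.items())))
    raise TypeError(f"not a JSON value: {value!r}")


def map_follows(doc):
    """Assumed map function: one row per user, one row per follow edge."""
    if doc.get("type") == "user":
        yield [doc["_id"]], None
    elif doc.get("type") == "edge:follower":
        yield [doc["follower"], doc["followed"]], {"_id": doc["followed"]}


def build_view(docs):
    """Run the map function over every document and sort the emitted rows."""
    rows = [
        {"id": doc["_id"], "key": key, "value": value}
        for doc in docs
        for key, value in map_follows(doc)
    ]
    rows.sort(key=lambda r: (collate(r["key"]), r["id"]))
    return rows


def query(rows, docs_by_id, startkey, endkey, include_docs=False):
    """Return rows with startkey <= key <= endkey, optionally following pointers."""
    low, high = collate(startkey), collate(endkey)
    result = []
    for row in rows:
        if low <= collate(row["key"]) <= high:
            row = dict(row)
            if include_docs:
                value = row["value"]
                # A value carrying "_id" points at another document to include.
                linked = isinstance(value, dict) and "_id" in value
                row["doc"] = docs_by_id[value["_id"] if linked else row["id"]]
            result.append(row)
    return result


docs = [
    {"_id": "user:kocolosk", "type": "user"},
    {"_id": "user:alice", "type": "user"},
    {"_id": "user:bob", "type": "user"},
    {"_id": "e1", "type": "edge:follower", "follower": "user:kocolosk", "followed": "user:alice"},
    {"_id": "e2", "type": "edge:follower", "follower": "user:kocolosk", "followed": "user:bob"},
    {"_id": "e3", "type": "edge:follower", "follower": "user:alice", "followed": "user:kocolosk"},
]
docs_by_id = {d["_id"]: d for d in docs}

rows = build_view(docs)
result = query(rows, docs_by_id, ["user:kocolosk"], ["user:kocolosk", {}], include_docs=True)
print([r["doc"]["_id"] for r in result])  # ['user:kocolosk', 'user:alice', 'user:bob']
```

The first row is the user's own document and the remaining rows are the followed users, all returned from one sorted range scan, which is the "join" the slides describe.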
  • 42. Legal Slide #1 © "Apache", "CouchDB", "Apache CouchDB", "Apache Lucene", "Lucene", and the CouchDB logo are trademarks or registered trademarks of The Apache Software Foundation. All other brands and trademarks are the property of their respective owners.
  • 43. Legal Slide #2 © Copyright IBM Corporation 2015. IBM and the IBM Cloudant logo are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at ibm.com/legal/copytrade.shtml
  • 45. Perceptions & Questions •  Analyst: Robin Bloor
  • 47. Database is Being Disrupted •  Data volumes •  Speed of arrival •  Content data (JSON) •  IoT data •  Cloud deployment •  Schema on read •  Memory for disk •  Analytic workloads THIS IS A PERFECT STORM OF A KIND
  • 48. What Is a Database? A database is software that presides over a heap of data that: •  Implements a data model •  Manages multiple concurrent requests for data •  Implements a security model •  Is ACID compliant (?) •  Is resilient
  • 49. RDBMS Databases that: •  Assume you can represent all data in related tables •  Assume that you want to process data in a set-wise manner •  Can be used for many problems •  Are absolutely not universal, hence: •  The Null kluge •  The impedance mismatch •  BLOBS •  OR Databases
  • 50. Another Couple of Issues… •  Programmers prefer JSON •  The SEMANTICS of data •  It is already beginning to look as though graph databases are a separate category of engine •  The triple store tactic (representing data in triples) is required for semantics, otherwise meaning is limited
  • 51. Data Access •  In reality there is no DATA ACCESS STANDARD •  There are several different approaches according to the data model
  • 52. u  How much evangelizing of JSON do you find it necessary to do? u  How swiftly do SQL developers adjust to JSON? u  JOINs are performance hogs in all database systems. Please explain why you think they are more economic with Cloudant. u  Does Cloudant scale better than, say, a column store SQL model?
  • 53. u  Can you explain the tuning and other DBA activities with Cloudant? u  Is recovery the same as with RDBMS? u  What is the database size of your largest customer (users, data volume)?
  • 55. Upcoming Topics •  www.insideanalysis.com •  March: BI/ANALYTICS •  April: BIG DATA •  May: CLOUD
  • 56. THANK YOU for your ATTENTION! Some images provided courtesy of Wikimedia Commons