Data Analytics with NOSQL
Mukundan Agaram
Chris Weiss
Some initial thoughts about data...
Continual issues with large-scale web apps
– Data growth + query response time
● Data growth => performance degradation
● Explosion of big data “analytics” use cases
– Increase in unstructured data
● More interconnectivity, more formats, lack of structure...
● Document-oriented data (XML/JSON) is difficult to
manage and search
– Distributed server configurations
● Large systems, more distribution and HA
Cloud services have aggravated these issues
Agenda for the night
● What is NOSQL?
● Varieties of NOSQL
● Key Industry Use Cases
● Applications for Data Analytics
● Landscape
● Demos/Walkthroughs
● Closing Discussions
What is NOSQL?
● “...mechanism for storage and retrieval of data
that is modeled in means other than tabular
relations used in relational databases.”
Wikipedia
● Non SQL or Non-relational
● Not Only SQL
● Technically around since the late 1960s...
– E.g. IDMS, IMS, MUMPS, Cache, BerkeleyDB
What is NOSQL?
● Drivers for modern day NOSQL
– Web 2.0
– Big Data
– Facebook, Google, Amazon, Expedia etc.
– Horizontal scaling to clusters of computers
● Achilles heel for RDBMS
– Cost
– Difficulty providing
● HA
● Horizontal partitioning (a.k.a. sharding)
● Speed
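The sharding idea on this slide can be illustrated with a minimal sketch. It assumes simple hash-based routing over in-memory dictionaries; the names `shard_for` and `ShardedStore` are hypothetical and do not come from any product mentioned in the deck.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index with a stable hash (Python's hash() is salted per process)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


class ShardedStore:
    """Toy key-value store that spreads keys over several in-memory 'nodes'."""

    def __init__(self, num_shards: int) -> None:
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, key: str, value) -> None:
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str):
        return self.shards[shard_for(key, len(self.shards))].get(key)


store = ShardedStore(num_shards=4)
for i in range(1000):
    store.put(f"user:{i}", {"id": i})

# Each shard holds only part of the data set.
shard_sizes = [len(shard) for shard in store.shards]
```

Real systems usually prefer consistent hashing or range partitioning, so that adding a node does not remap most keys; plain modulo routing is used here only to keep the sketch short.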
NOSQL - Drawbacks and Barriers
● Compromise on consistency (CAP Theorem)
● Custom query languages vs. SQL
● Lack of standardized interfaces
● Existing investments in RDBMS
● Most lack true ACID transactions.
– Use an “eventually” consistent model
– Data is replicated with a conflict resolution algorithm
– Methods for conflict resolution and distribution vary
significantly
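As one illustration of replication with conflict resolution, the sketch below uses a last-write-wins rule. This is only one of many strategies (the slide notes that methods vary significantly), and the `Versioned` type, its fields and the `sync` helper are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    value: str
    timestamp: int   # time of the write; assumed comparable across replicas
    replica_id: str  # tie-breaker when two writes carry the same timestamp


def lww_merge(a: Versioned, b: Versioned) -> Versioned:
    """Last-write-wins: keep the version with the highest (timestamp, replica_id)."""
    return max(a, b, key=lambda v: (v.timestamp, v.replica_id))


def sync(replica_a: dict, replica_b: dict) -> None:
    """Exchange state between two replicas so that both end up with the same data."""
    for key in set(replica_a) | set(replica_b):
        candidates = [r[key] for r in (replica_a, replica_b) if key in r]
        winner = candidates[0] if len(candidates) == 1 else lww_merge(*candidates)
        replica_a[key] = winner
        replica_b[key] = winner


a = {"cart:7": Versioned("book", 10, "a")}
b = {"cart:7": Versioned("book,pen", 12, "b"), "cart:9": Versioned("lamp", 5, "b")}
sync(a, b)  # until this runs, readers of a and b can see different carts
```

Last-write-wins silently discards the older of two concurrent writes, which is why some stores use vector clocks or mergeable data types instead.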
CAP Theorem
● a.k.a Brewer's theorem
● Impossible for a distributed computer system to
simultaneously provide all three of
– Consistency
● all nodes see same data at same time
– Availability
● Every request receives a response (success or failure)
– Partition Tolerance
● Fault tolerance to partitioning because of network failures
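The trade-off can be made concrete with a toy model. The sketch below assumes a two-node replicated register that either refuses writes during a network partition ("CP") or accepts them locally ("AP"); the `Cluster` class is hypothetical and far simpler than any real system.

```python
class Node:
    def __init__(self, name: str) -> None:
        self.name = name
        self.value = None


class Cluster:
    """Two replicas of one value, with a switch that simulates a network partition."""

    def __init__(self, mode: str) -> None:
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.nodes = [Node("n1"), Node("n2")]
        self.partitioned = False

    def write(self, node_index: int, value: str) -> bool:
        """Return True if the write was accepted."""
        if self.partitioned:
            if self.mode == "CP":
                return False  # stay consistent by giving up availability
            self.nodes[node_index].value = value  # stay available, replicas diverge
            return True
        for node in self.nodes:  # healthy network: replicate to every node
            node.value = value
        return True

    def consistent(self) -> bool:
        return len({node.value for node in self.nodes}) == 1


cp = Cluster("CP")
cp.write(0, "v1")
cp.partitioned = True
cp_accepted = cp.write(0, "v2")

ap = Cluster("AP")
ap.write(0, "v1")
ap.partitioned = True
ap_accepted = ap.write(0, "v2")
```

During the partition the CP cluster rejects the write and stays consistent, while the AP cluster accepts it and must reconcile the two replicas later.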
CAP alignment for NOSQL
Source: http://blog.nahurst.com/visual-guide-to-nosql-systems
NOSQL direction
The landscape is morphing...
● Current NOSQL industry focus
– Address large distributed systems in reaction to the
CAP theorem
● The newer breed of NOSQL addresses important
aspects such as ACID
● There is a new buzzword …
– NewSQL
Database Evolution
NOSQL Model Classification
● Key Value Stores & Caches
– Data is represented as a collection of (K,V) pairs. In-memory,
persistent or eventually persistent.
● Document Databases
– Data is stored in JSON document structures.
● RDF, OWL & Triple Stores
– Meaningful way to connect information. Can infer over
triples (S,P,O). Can be represented graphically. Queried with SPARQL.
● Wide Column Databases
– Extensible record set. Stores data tables as sections of
columns. Great for EDW.
● Graph Databases
– Stores data as a graph G(V,E). Great for correlation analysis,
recommendation engines and fraud detection.
● Multi-model Databases
– Combination of two or more varieties of the above.
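To compare the models side by side, the sketch below holds the same small fact (a voter who lives in Ingham county) in each shape, using plain Python structures. The identifiers, field names and values are made up for illustration and are not taken from the demo data.

```python
import json

# Key-value: the store sees an opaque value behind a key.
kv_store = {"voter:1001": json.dumps({"first_name": "Ada", "county": "Ingham"})}

# Document: nested, self-describing structure.
document = {"_id": 1001, "first_name": "Ada",
            "address": {"city": "Lansing", "county": "Ingham"}}

# Triple store: (subject, predicate, object) statements.
triples = [
    ("voter:1001", "firstName", "Ada"),
    ("voter:1001", "livesIn", "county:Ingham"),
]

# Wide column: row key -> column family -> column -> value.
wide_row = {"1001": {"name": {"first": "Ada"}, "geo": {"county": "Ingham"}}}

# Graph: vertices plus labeled edges, G(V, E).
vertices = {"v1": {"label": "Voter", "first_name": "Ada"},
            "c1": {"label": "County", "name": "Ingham"}}
edges = [("v1", "LIVES_IN", "c1")]


def county_from_kv() -> str:
    return json.loads(kv_store["voter:1001"])["county"]


def county_from_document() -> str:
    return document["address"]["county"]


def county_from_triples() -> str:
    obj = next(o for s, p, o in triples if s == "voter:1001" and p == "livesIn")
    return obj.split(":", 1)[1]


def county_from_wide_row() -> str:
    return wide_row["1001"]["geo"]["county"]


def county_from_graph() -> str:
    target = next(dst for src, label, dst in edges if src == "v1" and label == "LIVES_IN")
    return vertices[target]["name"]


counties = [county_from_kv(), county_from_document(), county_from_triples(),
            county_from_wide_row(), county_from_graph()]
```

The answer is the same in every case; what differs is how much structure the store itself understands and therefore what it can index and query.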
NOSQL Models
● Key-Value
– Cache (EHCache, BigMemory, Coherence, Memcached)
– Store (Redis, Riak, AeroSpike, Oracle NoSQL)
● Document (MongoDB, CouchDB, Amazon DynamoDB)
● Wide Column (Cassandra, HBase, Vertica)
● Graph (Neo4j, Titan, Giraph)
● Multi-model (OrientDB, ArangoDB, Sqrrl)
Source: www.db-engines.com
Consider NOSQL for...
● Enabling “big data” and “web” scale
– Massive distribution through horizontal scaling
● Performant queries (alternatives to RDBMS)
– Denormalization and large horizontal scalability
● Massive write volumes (Facebook, Twitter)
● Fast and dynamic access to key data
● Flexible schemas and data types
● Data/Schema Migration
● Developer centric environments
Consider NOSQL for...
● Diverse data organization options
– Hierarchical correlation
– Graph correlation
– Semantic relationships
– Set based analytics
● Caching in end usage format
● Data Archival
● Big Data Analytics
– Cumulative metrics and insights
– Correlation
Where RDBMS/SQL is better..
● OLTP
● Data Integrity
● SQL centricity
● Complex relationships
– With the exception of graph NOSQL
● Maturity, stability and standardization
Use Cases
● Log management (unstructured data)
● Data synchronization (online vs. offline sources)
– Shopping cart, Field sales/services, PoS, Gaming,
Transportation/telemetry
● User profile management
● Customer 360 degree view
● Fraud detection
● Medical/Healthcare diagnosis
● Data Archival
● Recommendation Engines
Applications for Data Analytics
● Complements (part of) Hadoop and Big Data
● Acts as the persistence infrastructure for larger
machine learning use cases
– Predictive Analytics
– Fraud/Anomaly/Outlier Detection
– Recommendation engines
● Provides a backdrop for interesting data
visualization initiatives
– Integrate with visualization packages such as
Tableau
Interesting links
● Redis in Practice: Who's online?
www.lukemelia.com/blog/archives/2010/01/17/redis-in-practice-whos-online/
● Inventory list of NOSQL systems
www.nosql-database.org
● Database Engine ranking and analytics
www.db-engines.com
● Visual guide to NOSQL systems
www.blog.nahurst.com/visual-guide-to-nosql-systems
Case Studies / Demos
● Retail fraud detection
– Neo4j
– Contrasting with OrientDB
– TinkerPop/Gremlin/Blueprints
● 360 degree single view of voter information
– MongoDB
● Schema on read
– Hadoop
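The retail fraud detection case rests on a graph idea that can be sketched without a graph database: accounts that share a card or phone number end up connected, and a traversal finds the whole group. The code below is plain Python rather than Neo4j, OrientDB or Gremlin code, and the edge data is hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical edges linking an account to an attribute it has used.
edges = [
    ("acct:A", "card:111"),
    ("acct:B", "card:111"),
    ("acct:B", "phone:555"),
    ("acct:C", "phone:555"),
    ("acct:D", "card:999"),
]


def build_graph(pairs):
    """Build an undirected adjacency map from (node, node) pairs."""
    graph = defaultdict(set)
    for u, v in pairs:
        graph[u].add(v)
        graph[v].add(u)
    return graph


def linked_accounts(graph, start):
    """Breadth-first search for every other account reachable through shared attributes."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return sorted(n for n in seen if n.startswith("acct:") and n != start)


graph = build_graph(edges)
ring = linked_accounts(graph, "acct:A")
isolated = linked_accounts(graph, "acct:D")
```

Account A never shares anything directly with account C, yet the traversal links them through account B. In a relational schema the same question needs a chain of self-joins whose depth is not known in advance.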
Gremlin Blueprints Architecture
Neo4j OrientDB TitanGraph ArangoDB
Qualified Voter – Use Case
● Tracks registration information for all voters in
Michigan
● Uses a tabular geography model
● Highly normalized schema
– Data partitioned into subsets
● Enable local application instances and row level security
● Expensive queries when doing reporting
● Expensive queries for performing “single view”
of voter
● Several tables with tens of millions of records
Voter Schema
Find the first 100 voters in Ingham county with
status and school district
SELECT V.VOTER_IDENTIFICATION_NUMBER, V.FIRST_NAME, V.LAST_NAME, G.CODE AS GENDER,
       IDS.NAME AS ID_STATUS, UST.NAME AS UOCAVA_STATUS,
       VA.ADDRESS_LINE_ONE, VA.CITY, VA.ZIP_CODE,
       DIS.NAME AS SCHOOL_DISTRICT
FROM VOTER V, VOTER_ADDRESS VA, GENDER G,
     IDENTIFICATION_STATUS IDS, UOCAVA_STATUS UST, VOTER_STATUS_TYPE VST,
     STREET_RANGE SI, DISTINCT_POLITICAL_AREA DPA, DISTINCT_POLITICAL_AREA_DIS DPAD,
     DISTRICT DIS, DISTRICT_TYPE DT, COUNTY CO
WHERE V.ID = VA.VOTER_ID
  AND V.GENDER_ID = G.ID
  AND V.IDENTIFICATION_STATUS_ID = IDS.ID
  AND V.UOCAVA_STATUS_ID = UST.ID
  AND V.VOTER_STATUS_TYPE_ID = VST.ID
  AND VST.NAME = 'Active'
  AND VA.STREET_RANGE_ID = SI.ID
  AND SI.DISTINCT_POLITICAL_AREA_ID = DPA.ID
  AND VA.IS_ACTIVE = 'Y'
  AND DPA.COUNTY_ID = CO.ID
  AND CO.NAME = 'Ingham'
  AND DPA.ID = DPAD.DISTINCT_POLITICAL_AREA_ID
  AND DPAD.DISTRICT_ID = DIS.ID
  AND DIS.DISTRICT_TYPE_ID = DT.ID
  AND DT.NAME = 'School'
  AND ROWNUM <= 100;
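For contrast with the twelve-table join above, here is a rough sketch of how the same "single view" question might look once each voter is stored as one denormalized document. It uses plain Python dictionaries rather than a real MongoDB collection, and every field name and value is an assumption made for illustration, not the demo schema.

```python
# Hypothetical denormalized voter documents; all names and values are made up.
voters = [
    {
        "voter_id": "V1", "first_name": "Ada", "last_name": "Stone", "status": "Active",
        "address": {"line_one": "1 Main St", "city": "Lansing",
                    "county": "Ingham", "is_active": True},
        "districts": [{"type": "School", "name": "Example School District"},
                      {"type": "Library", "name": "Example Library District"}],
    },
    {
        "voter_id": "V2", "first_name": "Bo", "last_name": "Reed", "status": "Cancelled",
        "address": {"line_one": "2 Oak Ave", "city": "Lansing",
                    "county": "Ingham", "is_active": True},
        "districts": [{"type": "School", "name": "Example School District"}],
    },
    {
        "voter_id": "V3", "first_name": "Cy", "last_name": "Lane", "status": "Active",
        "address": {"line_one": "3 Elm Rd", "city": "Charlotte",
                    "county": "Eaton", "is_active": True},
        "districts": [{"type": "School", "name": "Another School District"}],
    },
]


def single_view(docs, county, limit=100):
    """Return up to `limit` rows for active voters in `county`, one row per school district."""
    rows = []
    for doc in docs:
        address = doc["address"]
        if doc["status"] != "Active" or not address["is_active"] or address["county"] != county:
            continue
        for district in doc["districts"]:
            if district["type"] != "School":
                continue
            rows.append({
                "voter_id": doc["voter_id"],
                "name": f'{doc["first_name"]} {doc["last_name"]}',
                "city": address["city"],
                "school_district": district["name"],
            })
            if len(rows) == limit:
                return rows
    return rows


rows = single_view(voters, "Ingham")
```

Every filter from the SQL WHERE clause is still present, but each one reads a field of a single document instead of joining another table. The cost moves to write time, since the document has to be kept up to date when an address or district changes.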
Expensive in terms of IO
● Multiple objects read
● Two-stage IO:
– Read index
– Read entire table row
● Selected and WHERE clause columns
assembled and then filtered
● Resources for larger volume query would be
high – memory, CPU, fast disk
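The two-stage IO pattern can be approximated with a toy read counter. The sketch assumes one logical read per index probe and one per row fetched, and ignores caching, page sizes and query planning; `CountingTable` and the sample rows are hypothetical.

```python
class CountingTable:
    """In-memory table with one index that counts logical reads."""

    def __init__(self, rows: dict, indexed_column: str) -> None:
        self.rows = rows   # row_id -> full row
        self.index = {}    # indexed column value -> list of row_ids
        for row_id, row in rows.items():
            self.index.setdefault(row[indexed_column], []).append(row_id)
        self.reads = 0

    def lookup(self, value):
        self.reads += 1  # stage 1: read the index entry
        result = []
        for row_id in self.index.get(value, []):
            self.reads += 1  # stage 2: read the entire row
            result.append(self.rows[row_id])
        return result


# Normalized layout: three tables must be visited to learn one voter's county.
voter = CountingTable({1: {"id": 1, "name": "Ada"}}, "id")
address = CountingTable({10: {"id": 10, "voter_id": 1, "county_id": 7}}, "voter_id")
county = CountingTable({7: {"id": 7, "name": "Ingham"}}, "id")

v = voter.lookup(1)[0]
a = address.lookup(v["id"])[0]
c = county.lookup(a["county_id"])[0]
normalized_reads = voter.reads + address.reads + county.reads

# Denormalized layout: the same answer sits in a single record.
docs = CountingTable({1: {"id": 1, "name": "Ada", "county": "Ingham"}}, "id")
d = docs.lookup(1)[0]
denormalized_reads = docs.reads
```

In this toy model the three-table path costs six logical reads against two for the single record, and the gap grows with each additional joined table.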
Parting conclusions
● NOSQL is a mixed bag of fruit
● This space is growing
● There are hundreds of products
● Best value is realized from identifying the
correct use case
– Functional requirements
– Non-functional requirements
Finally you can use NOSQL for...
Thank You!!
Questions?
