BIG DATA
Kaushal Amin, Chief Technology Officer
KMS Technology – Atlanta, GA, USA
AGENDA
• What is Big Data
• Why not RDBMS
• NoSQL
• NewSQL
• Performance Comparison
• Case Studies
WHAT IS BIG DATA
WHAT IS BIG DATA?
“Big data exceeds the reach of commonly used
hardware environments and software tools to
capture, manage, and process it within a tolerable
elapsed time for its user population.” - Teradata
Magazine article, 2011
“Big data refers to data sets whose size is beyond the
ability of typical database software tools to
capture, store, manage and analyze.” - The McKinsey
Global Institute, 2011
Volume and Variety of Data that is difficult to manage
using traditional data management technology
WHAT IS GENERATING BIG DATA?
Homeland Security
Real Time Search
Social
eCommerce
User Tracking &
Engagement
Financial Services
HOW MUCH DATA?
• 7 billion people
• Google processes 100 PB/day; 3 million servers
• Facebook has 300 PB + 500 TB/day; 35% of world’s
photos
• YouTube 1000 PB video storage; 4 billion views/day
• Twitter processes 124 billion tweets/year
• SMS messages – 6.1T per year
• US Cell Calls – 2.2T minutes per year
• US Credit cards - 1.4B Cards; 20B transactions/year
LOWER COST OF STORAGE
What can I buy for $100 (USD)?
(not adjusted for inflation)
Memory Capacity =
128 GB by 2020
x1420 in 20 years
Disk Capacity =
10 TB by 2020
x1000 in 20 years
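As a rough check on the multipliers above, the implied yearly growth can be worked out under the assumption that capacity per dollar compounds at a constant annual rate. That assumption is not stated on the slide, which only gives the 20-year totals; the function name is illustrative.

```python
def annual_growth_factor(total_multiplier: float, years: int) -> float:
    """Constant yearly factor that compounds to total_multiplier over `years`."""
    return total_multiplier ** (1.0 / years)


# 20-year multipliers taken from the slide.
memory_factor = annual_growth_factor(1420, 20)
disk_factor = annual_growth_factor(1000, 20)

print(f"memory per $100 grows about x{memory_factor:.2f} per year")
print(f"disk per $100 grows about x{disk_factor:.2f} per year")
```

Under this assumption both figures correspond to capacity per dollar growing by a bit over 40% each year.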
HOW IS BIG DATA DIFFERENT?
• Automatically generated by a machine
– (e.g. Sensor embedded in an engine)
• Typically an entirely new source of data
– (e.g. Use of the internet)
• Not designed to be friendly
– (e.g. Text streams)
• May not have much value
– Need to focus on the important part
WHO UTILIZES IT?
• Companies and organizations that can leverage large-scale,
consumer-produced data
– Marketing
– Consumer Markets (retail, airlines, hotels, Amazon, Netflix)
– Social Media (Facebook, Twitter, YouTube, LinkedIn)
– Search Providers (Google, Yahoo, Microsoft)
– People Data Aggregators (LexisNexis, Equifax, Acxiom)
• Other Enterprises are slowly getting into it
– Healthcare
– Financial Institutions
WHY NOT RDBMS?
TYPE OF DATA
• Structured Data (Transactions)
• Text Data (Web Content)
• Semi-structured Data (XML)
• Unstructured Data
– Social Network, SMS, Audio, Video
• Streaming Data
– You can only scan the data once as it travels on network
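The single-scan constraint on streaming data can be illustrated with a one-pass statistic. The sketch below uses Welford's running update to compute a mean and variance without storing the stream; the sensor feed and its values are hypothetical, chosen only for illustration.

```python
from typing import Iterable, Tuple


def running_mean_variance(stream: Iterable[float]) -> Tuple[int, float, float]:
    """One pass over the stream with constant memory (Welford's update)."""
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in stream:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    variance = m2 / count if count else 0.0  # population variance
    return count, mean, variance


def sensor_readings():
    """Hypothetical sensor feed; a generator can be consumed only once."""
    for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
        yield value


count, mean, variance = running_mean_variance(sensor_readings())
print(count, mean, variance)
```

Each reading is seen once and then discarded, which is the property a streaming workload requires.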
WHAT TO DO WITH THESE DATA?
• Aggregation and Statistics
– Data warehouse and OLAP
• Indexing, Searching, and Querying
– Keyword based search
– Pattern matching (XML/RDF)
• Knowledge discovery
– Data Mining
– Statistical Modeling
RDBMS LIMITATIONS
• Very difficult to scale horizontally (more boxes); the usual
way to scale is vertically, by moving to a bigger box
– Physically limited by CPUs, disk storage, and memory
– Large servers are too expensive and still can’t scale
• Requires structure of tables with rows and columns
– Does not deal well with unstructured data
• Relationships have to be pre-defined through schema
– Difficult to add newly discovered data quickly
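The point about pre-defined schemas can be seen in a small experiment. This sketch uses Python's built-in sqlite3 module as a stand-in for an RDBMS; the table and column names (`customer`, `twitter_handle`) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Ada')")

# A newly discovered attribute is rejected until the schema is changed.
try:
    conn.execute(
        "INSERT INTO customer (id, name, twitter_handle) VALUES (2, 'Bob', '@bob')"
    )
    insert_failed = False
except sqlite3.OperationalError:
    insert_failed = True

# The schema has to be altered first; existing rows get NULL for the new column.
conn.execute("ALTER TABLE customer ADD COLUMN twitter_handle TEXT")
conn.execute(
    "INSERT INTO customer (id, name, twitter_handle) VALUES (2, 'Bob', '@bob')"
)
rows = conn.execute(
    "SELECT id, name, twitter_handle FROM customer ORDER BY id"
).fetchall()
print(insert_failed, rows)
```

On a small table the change is trivial; the slide's concern is that on large production tables a schema change of this kind is slower and riskier.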
NOSQL
NOSQL CHARACTERISTICS
• Cheap, easy to implement (open source)
– Cluster of cheap commodity servers with cheap storage
• Data are replicated to multiple nodes (therefore
identical and fault-tolerant) and can be partitioned
– Down nodes can easily be replaced while cluster is operational
– No single point of failure
• Easy to distribute
• Don't require a schema
• Massive Scalability
• Relaxed data consistency requirement (CAP), which means
less locking and resource contention
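The replication and fault-tolerance bullets above can be sketched as follows. This is a toy in-memory model, not the mechanism of any particular product; the class names, the placement rule, and the replica count are assumptions made for illustration.

```python
class Node:
    """One commodity server, modeled as a dict plus an up/down flag."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.up = True


class ReplicatedStore:
    """Writes every key to `replicas` nodes so one node failure loses nothing."""

    def __init__(self, node_names, replicas=2):
        self.nodes = [Node(name) for name in node_names]
        self.replicas = replicas

    def _replica_nodes(self, key):
        # Deterministic toy placement: start at a node derived from the key
        # and take the next `replicas` nodes in ring order.
        start = sum(key.encode("utf-8")) % len(self.nodes)
        return [self.nodes[(start + i) % len(self.nodes)] for i in range(self.replicas)]

    def put(self, key, value):
        written = 0
        for node in self._replica_nodes(key):
            if node.up:
                node.data[key] = value
                written += 1
        return written  # number of copies actually stored

    def get(self, key):
        # Any live replica can answer the read.
        for node in self._replica_nodes(key):
            if node.up and key in node.data:
                return node.data[key]
        raise KeyError(key)


store = ReplicatedStore(["node-a", "node-b", "node-c"], replicas=2)
copies = store.put("user:1", "Ada")
store._replica_nodes("user:1")[0].up = False  # simulate a node going down
value = store.get("user:1")  # still served by the surviving replica
print(copies, value)
```

The sketch ignores what happens when replicas disagree, which is where the relaxed consistency trade-off comes in.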
NOSQL – SEVERAL OPTIONS
• Currently 150 implementations and growing
(http://nosql-database.org/)
• Multiple Types based on storage architecture
– Key-Value
– Document
– Column Family
– Graph
KEY-VALUE STORE
• Values stored in Key-Value Pairs in hashmap
• Distributed across nodes based on key
• Simple Operations: insert, fetch, update, and delete
• Best for storing high volume dataset with low
complexity (simple data model)
• Some of the market leaders:
– Riak
– Amazon Dynamo
– Voldemort
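A minimal sketch of the ideas above: values live in hashmaps, the key decides which node holds the pair, and only four simple operations are exposed. The class and the modulo placement rule are illustrative assumptions and do not reflect how Riak, Dynamo, or Voldemort are implemented.

```python
import hashlib


class KeyValueCluster:
    """Toy key-value store whose keys are spread across in-memory nodes."""

    def __init__(self, node_count):
        # Each node is simply a Python dict (a hashmap).
        self.nodes = [{} for _ in range(node_count)]

    def _node_for(self, key):
        # Hash the key and pick a node; a given key always maps to the same node.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self.nodes[int(digest, 16) % len(self.nodes)]

    def insert(self, key, value):
        node = self._node_for(key)
        if key in node:
            raise KeyError(f"key already exists: {key}")
        node[key] = value

    def fetch(self, key):
        return self._node_for(key)[key]

    def update(self, key, value):
        node = self._node_for(key)
        if key not in node:
            raise KeyError(f"no such key: {key}")
        node[key] = value

    def delete(self, key):
        del self._node_for(key)[key]


cluster = KeyValueCluster(node_count=3)
cluster.insert("session:42", "logged-in")
cluster.update("session:42", "expired")
current = cluster.fetch("session:42")
cluster.delete("session:42")
remaining = sum(len(node) for node in cluster.nodes)
print(current, remaining)
```

Modulo placement keeps the example short; distributed stores commonly use consistent hashing instead so that adding a node relocates fewer keys.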
KEY-VALUE STORE
COLUMN FAMILY STORE
• Stores family of columns
• Columns are stored as Key-Value pair
• A super column is like a catalogue or a collection of other
columns
• Columns within a family can be distributed across nodes
• Supports semi-structured data with high scalability
• Some of the market leaders:
– HBase
– Cassandra
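One way to picture the data model described above is a nested map: row key, then column family, then column name, then value. The sketch below is a simplified in-memory illustration with hypothetical family and column names; it does not use the HBase or Cassandra APIs and omits super columns and distribution across nodes.

```python
from collections import defaultdict


class ColumnFamilyTable:
    """Toy table: row key -> column family -> column name -> value."""

    def __init__(self, families):
        # Column families are fixed up front; columns inside them are not.
        self.families = set(families)
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, family, column, value):
        if family not in self.families:
            raise ValueError(f"unknown column family: {family}")
        self.rows[row_key][family][column] = value

    def get(self, row_key, family, column=None):
        columns = self.rows.get(row_key, {}).get(family, {})
        if column is None:
            return dict(columns)  # every column stored in this family
        return columns.get(column)  # None when the row lacks this column


table = ColumnFamilyTable(["profile", "activity"])
table.put("user1", "profile", "name", "Ada")
table.put("user1", "profile", "city", "Atlanta")
table.put("user2", "profile", "name", "Bob")
table.put("user2", "activity", "last_login", "2013-01-01")

user1_profile = table.get("user1", "profile")
user2_city = table.get("user2", "profile", "city")
print(user1_profile, user2_city)
```

Rows need not share the same columns, which is what makes the model suit semi-structured data.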
COLUMN FAMILY STORE (HBASE)
DOCUMENT STORE
• Supports more complex data model than Key-Value
• Collection of Documents – JSON, XML, other semi-
structured formats
• A document is a key value collection
• Multi-Index support
• Best for storing complex data model but less scalable
• Some of the market leaders:
– MongoDB
– CouchDB
– SimpleDB
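The document model and multi-index support can be sketched with plain dictionaries. The collection class, field names, and sample documents below are hypothetical and are not the API of MongoDB, CouchDB, or SimpleDB.

```python
from collections import defaultdict


class DocumentCollection:
    """Toy document store with optional secondary indexes on chosen fields."""

    def __init__(self, indexed_fields):
        self.docs = {}  # document id -> document (a dict of keys and values)
        self.indexes = {field: defaultdict(set) for field in indexed_fields}

    def insert(self, doc_id, document):
        # Sketch only: re-inserting an existing id would leave stale index entries.
        self.docs[doc_id] = document
        for field, index in self.indexes.items():
            if field in document:
                index[document[field]].add(doc_id)

    def find(self, field, value):
        if field in self.indexes:
            ids = self.indexes[field].get(value, set())  # index lookup
        else:
            # No index on this field: fall back to scanning every document.
            ids = {i for i, doc in self.docs.items() if doc.get(field) == value}
        return [self.docs[i] for i in sorted(ids)]


people = DocumentCollection(indexed_fields=["city", "role"])
people.insert(1, {"name": "Ada", "city": "Atlanta", "role": "engineer"})
people.insert(2, {"name": "Bob", "city": "Atlanta"})  # no "role" key at all
people.insert(3, {"name": "Cy", "city": "Boston", "role": "engineer", "team": "data"})

in_atlanta = [doc["name"] for doc in people.find("city", "Atlanta")]
on_data_team = [doc["name"] for doc in people.find("team", "data")]
print(in_atlanta, on_data_team)
```

Documents in the same collection carry different keys, and a lookup on an indexed field avoids the full scan that an unindexed field requires.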
DOCUMENT STORE
GRAPH DATABASE
• Social Graph with Relationship between Entities
• Great for Social Networks
– Facebook friends network
– LinkedIn connections network
• Some of the market leaders:
– Neo4j
– FlockDB
– Pregel
GRAPH DATABASE - EXAMPLE
• Nodes represent entities such as people, businesses, accounts,
or any other item you might want to keep track of.
• Properties are pertinent information that relates to nodes,
such as name, age, DOB, gender.
• Edges are the lines that connect nodes to nodes or nodes to
properties, and they represent the relationship between the two.
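The three concepts can be sketched in a few lines. As a simplification, properties are stored as a dictionary on each node rather than as separate vertices, and the relationship name `FRIEND` and the sample people are hypothetical; this is not the API of Neo4j or any other product.

```python
class Graph:
    """Toy property graph: nodes with properties, plus labeled directed edges."""

    def __init__(self):
        self.properties = {}  # node id -> dict of properties
        self.edges = {}       # node id -> list of (relationship, target id)

    def add_node(self, node_id, **props):
        self.properties[node_id] = props
        self.edges.setdefault(node_id, [])

    def add_edge(self, source, relationship, target):
        self.edges[source].append((relationship, target))

    def neighbors(self, node_id, relationship):
        return [t for rel, t in self.edges[node_id] if rel == relationship]

    def friends_of_friends(self, node_id):
        direct = set(self.neighbors(node_id, "FRIEND"))
        reachable = set()
        for friend in direct:
            reachable.update(self.neighbors(friend, "FRIEND"))
        # Exclude the starting node and anyone already a direct friend.
        return sorted(reachable - direct - {node_id})


g = Graph()
g.add_node("alice", name="Alice", age=34)
g.add_node("bob", name="Bob", age=29)
g.add_node("carol", name="Carol", age=41)
for a, b in [("alice", "bob"), ("bob", "carol")]:
    g.add_edge(a, "FRIEND", b)  # friendship is stored in both directions
    g.add_edge(b, "FRIEND", a)

suggestions = g.friends_of_friends("alice")
print(suggestions)
```

A friends-of-friends query is a two-hop traversal, the kind of operation that needs repeated self-joins in a relational schema.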
NEWSQL
NEWSQL
• The argument is that the relational model itself is not the cause of poor
scalability; the limitations of its physical implementation are
• Development of new relational database products and services
designed to bring the benefits of the relational model to distributed
architectures
• Three Approaches:
– Optimized MySQL storage engines (ScaleDB, MemSQL, Akiban)
– New SQL databases (Clustrix, VoltDB, NuoDB)
– Sharding Middleware to split RDBMS across nodes
(ScaleBase, ScaleArc, dbShards)
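The third approach, sharding middleware, can be sketched as a router that sits in front of several ordinary SQL databases. Here each shard is an in-memory SQLite database; the `ShardRouter` class, the `users` table, and the modulo routing rule are assumptions for illustration and say nothing about how ScaleBase, ScaleArc, or dbShards work internally.

```python
import sqlite3


class ShardRouter:
    """Toy middleware that routes each row to one of several SQL databases."""

    def __init__(self, shard_count):
        # Every connection to ":memory:" is its own independent database.
        self.shards = [sqlite3.connect(":memory:") for _ in range(shard_count)]
        for db in self.shards:
            db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

    def _shard_for(self, user_id):
        return self.shards[user_id % len(self.shards)]

    def insert_user(self, user_id, name):
        self._shard_for(user_id).execute(
            "INSERT INTO users (id, name) VALUES (?, ?)", (user_id, name)
        )

    def get_user(self, user_id):
        # A lookup by shard key touches exactly one database.
        row = self._shard_for(user_id).execute(
            "SELECT name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        return row[0] if row else None

    def count_users(self):
        # A query without the shard key fans out to every shard.
        return sum(
            db.execute("SELECT COUNT(*) FROM users").fetchone()[0]
            for db in self.shards
        )


router = ShardRouter(shard_count=3)
for user_id, name in [(1, "Ada"), (2, "Bob"), (3, "Cy"), (4, "Di")]:
    router.insert_user(user_id, name)

name_of_3 = router.get_user(3)
total = router.count_users()
print(name_of_3, total)
```

The application keeps speaking SQL while the router spreads the data; the cost is that cross-shard queries and transactions need extra coordination.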
PERFORMANCE COMPARISON
SOURCE AND APPROACH
• Independent testing done by Altoros Systems Inc.
• More details at
http://www.networkworld.com/news/tech/2012/102212-nosql-263595.html?page=1
• Using Amazon virtual machines to ensure verifiable results and
research transparency (which also helped minimize errors due to
hardware differences)
• Databases tested:
– Riak, a key-value store
– Cassandra, a column family store
– HBase, a column family store
– MongoDB, a document-oriented database
– MySQL Cluster, a NewSQL database
– Sharded MySQL, a NewSQL database
PERFORMANCE ON WRITE
PERFORMANCE ON READ
CASE STUDIES
EXAMPLE: HEALTHCARE
A health care consultancy has made the data coming out of medical practices
the focus of its thriving business. The company collects billing and diagnostic
code data from 10,000 doctors on a daily, weekly and monthly basis to create
a virtual clinical integration model. The consulting company analyzes the data
to help the groups understand how well they are meeting the FTC guidelines
for negotiating with health plans and whether they qualify for enhanced
reimbursement based on offering a more cost-effective standard of care.
It also sends them automated information to help them take better care of
patients, for example through an automated outbound calling system for
pediatric patients who weren’t up to date on their vaccinations.
EXAMPLE: RETAIL
Walmart handles more than 1 million customer transactions every
hour, which are imported into databases estimated to contain more than 2.5
petabytes of data, the equivalent of 167 times the information
contained in all the books in the US Library of Congress.
EXAMPLE: UTILITY
With a smart meter, a utility company goes from collecting one data point
a month per customer (using a meter reader in a truck or car) to receiving
roughly 3,000 data points for each customer each month, because smart
meters send usage information up to four times an hour.
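The monthly figure follows from the reporting rate. A quick check, assuming a 30-day month and the maximum rate of four readings an hour (both assumptions made here for the arithmetic):

```python
READINGS_PER_HOUR = 4   # "up to four times an hour"
HOURS_PER_DAY = 24
DAYS_PER_MONTH = 30     # assumption: a 30-day month

readings_per_month = READINGS_PER_HOUR * HOURS_PER_DAY * DAYS_PER_MONTH
print(readings_per_month)  # 2880
```

That is 2,880 readings per customer per month, which the slide rounds to 3,000.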
One small Midwestern utility is using smart meter data to structure
conservation programs that analyze existing usage to forecast future
use, price usage based on demand and share that information with
customers who might decide to forestall doing that load of wash until
they can pay for it at the nonpeak price.
GROWTH FORECAST
© 2013 KMS Technology
Q&A

GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 

Recently uploaded (20)

The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 

Big Data Overview 2013-2014

  • 1. BIG DATA Kaushal Amin, Chief Technology Officer KMS Technology – Atlanta, GA, USA
  • 2. AGENDA • What is Big Data • Why not RDBMS • NoSQL • NewSQL • Performance Comparison • Case Studies
  • 3. WHAT IS BIG DATA
  • 4. WHAT IS BIG DATA? “Big data exceeds the reach of commonly used hardware environments and software tools to capture, manage, and process it within a tolerable elapsed time for its user population.” - Teradata Magazine article, 2011 “Big data refers to data sets whose size is beyond the ability of typical database software tools to capture, store, manage and analyze.” - The McKinsey Global Institute, 2011 Volume and Variety of Data that is difficult to manage using traditional data management technology
  • 5. WHAT IS GENERATING BIG DATA? • Homeland Security • Real Time Search • Social • eCommerce • User Tracking & Engagement • Financial Services
  • 6. HOW MUCH DATA? • 7 billion people • Google processes 100 PB/day; 3 million servers • Facebook has 300 PB + 500 TB/day; 35% of world’s photos • YouTube 1000 PB video storage; 4 billion views/day • Twitter processes 124 billion tweets/year • SMS messages – 6.1T per year • US Cell Calls – 2.2T minutes per year • US Credit cards - 1.4B Cards; 20B transactions/year
  • 7. LOWER COST OF STORAGE What can I buy for $100 (USD)? (not adjusted for inflation) • Memory Capacity = 128 GB by 2020, x1420 in 20 years • Disk Capacity = 10 TB by 2020, x1000 in 20 years
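The 20-year multipliers on the storage-cost slide can be restated as a yearly growth rate. The following is a minimal sketch, assuming growth compounds at a constant rate over the whole period (the slide gives only the end-to-end factors); the function name is illustrative and not from the deck.

```python
def annual_growth_rate(total_factor: float, years: int) -> float:
    """Return the constant yearly rate that compounds to total_factor."""
    return total_factor ** (1.0 / years) - 1.0


# End-to-end factors taken from the slide; smooth growth is an assumption.
memory_rate = annual_growth_rate(1420, 20)  # memory capacity per $100
disk_rate = annual_growth_rate(1000, 20)    # disk capacity per $100

print(f"memory: {memory_rate:.1%} per year")
print(f"disk:   {disk_rate:.1%} per year")
```

Under that assumption both rates come out a little above 40% per year, which corresponds to capacity per dollar roughly doubling every two years.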
  • 8. HOW IS BIG DATA DIFFERENT? • Automatically generated by a machine – (e.g. Sensor embedded in an engine) • Typically an entirely new source of data – (e.g. Use of the internet) • Not designed to be friendly – (e.g. Text streams) • May not have much value – Need to focus on the important part
  • 9. WHO UTILIZES IT? • Companies and organizations that can leverage large-scale, consumer-produced data – Marketing – Consumer Markets (retail, airlines, hotels, Amazon, Netflix) – Social Media (Facebook, Twitter, YouTube, LinkedIn) – Search Providers (Google, Yahoo, Microsoft) – People Data Aggregators (LexisNexis, Equifax, Acxiom) • Other Enterprises are slowly getting into it – Healthcare – Financial Institutions
  • 11. TYPE OF DATA • Structured Data (Transactions) • Text Data (Web Content) • Semi-structured Data (XML) • Unstructured Data – Social Network, SMS, Audio, Video • Streaming Data – You can only scan the data once as it travels over the network
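The "scan once" constraint on streaming data means statistics have to be updated incrementally instead of being computed over a stored copy. The sketch below illustrates this with Welford's single-pass update for mean and variance; the technique and the sample readings are chosen here for illustration and are not taken from the deck.

```python
from typing import Iterable, Iterator, Tuple


def running_stats(stream: Iterable[float]) -> Iterator[Tuple[int, float, float]]:
    """Yield (count, mean, population variance) after each item, in one pass."""
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in stream:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
        yield count, mean, m2 / count


# An iterator can be consumed only once, like data passing over a network.
readings = iter([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
final = None
for final in running_stats(readings):
    pass  # each item is seen exactly once and never stored

print(final)
```

Only three numbers are kept in memory regardless of how long the stream runs, which is the property that matters when the data cannot be replayed.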
  • 12. WHAT TO DO WITH THESE DATA? • Aggregation and Statistics – Data warehouse and OLAP • Indexing, Searching, and Querying – Keyword based search – Pattern matching (XML/RDF) • Knowledge discovery – Data Mining – Statistical Modeling
  • 13. RDBMS LIMITATIONS • Very difficult to scale horizontally (more boxes); the usual way to scale is vertically, by using a bigger box – Physically limited by CPUs, disk storage, and memory – Large servers are too expensive and still can’t scale • Requires structure of tables with rows and columns – Does not deal well with unstructured data • Relationships have to be pre-defined through schema – Difficult to add newly discovered data quickly
  • 14. NOSQL
  • 15. NOSQL CHARACTERISTICS • Cheap, easy to implement (open source) – Cluster of cheap commodity servers with cheap storage • Data are replicated to multiple nodes (therefore identical and fault-tolerant) and can be partitioned – Down nodes can easily be replaced while cluster is operational – No single point of failure • Easy to distribute • Don't require a schema • Massive Scalability • Relaxed data consistency requirement (CAP) – less locking and resource contention
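The replication and partitioning bullets above can be sketched with a toy in-memory cluster. This is a simplified illustration, not how any particular NoSQL product works: the class, the node names, and the "next nodes in the list" placement rule are all assumptions, and there is no repair of replicas that missed a write while down.

```python
import hashlib


class ReplicatedCluster:
    """Toy cluster: each key is partitioned by hash and copied to several nodes."""

    def __init__(self, node_names, replicas=3):
        self.order = list(node_names)
        self.nodes = {name: {} for name in self.order}
        self.replicas = min(replicas, len(self.order))
        self.down = set()

    def _owners(self, key):
        # Hash the key to pick a starting node, then take the next nodes in order.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        start = int(digest, 16) % len(self.order)
        return [self.order[(start + i) % len(self.order)]
                for i in range(self.replicas)]

    def put(self, key, value):
        for name in self._owners(key):
            if name not in self.down:
                self.nodes[name][key] = value

    def get(self, key):
        # Any live replica can answer, so one failed node does not lose the key.
        for name in self._owners(key):
            if name not in self.down and key in self.nodes[name]:
                return self.nodes[name][key]
        raise KeyError(key)

    def fail(self, name):
        self.down.add(name)


cluster = ReplicatedCluster(["node-a", "node-b", "node-c", "node-d"], replicas=2)
cluster.put("user:42", {"name": "Ada"})
cluster.fail(cluster._owners("user:42")[0])  # take the first replica offline
print(cluster.get("user:42"))                # served by the remaining replica
```

Because reads accept any replica and writes skip unavailable ones, copies can temporarily disagree; that is the relaxed consistency the last bullet refers to.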
  • 16. NOSQL – SEVERAL OPTIONS • Currently 150 implementations and growing (http://nosql-database.org/) • Multiple Types based on storage architecture – Key-Value – Document – Column Family – Graph
  • 17. KEY-VALUE STORE • Values stored as key-value pairs in a hash map • Distributed across nodes based on key • Simple Operations: insert, fetch, update, and delete • Best for storing high volume dataset with low complexity (simple data model) • Some of the market leaders: – Riak – Amazon Dynamo – Voldemort
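The four operations on the key-value slide map directly onto a hash map per node. The sketch below is a hypothetical in-memory model, assuming a CRC32 hash of the key selects the node; it does not reproduce the API of Riak, Dynamo, or Voldemort.

```python
import zlib


class KeyValueStore:
    """Toy key-value store: one dict per node, node chosen by hashing the key."""

    def __init__(self, node_count=4):
        self.nodes = [dict() for _ in range(node_count)]

    def _node_for(self, key: str) -> dict:
        return self.nodes[zlib.crc32(key.encode("utf-8")) % len(self.nodes)]

    def insert(self, key, value):
        node = self._node_for(key)
        if key in node:
            raise KeyError(f"{key!r} already exists")
        node[key] = value

    def fetch(self, key):
        return self._node_for(key)[key]  # raises KeyError if absent

    def update(self, key, value):
        node = self._node_for(key)
        if key not in node:
            raise KeyError(key)
        node[key] = value

    def delete(self, key):
        del self._node_for(key)[key]  # raises KeyError if absent


store = KeyValueStore()
store.insert("session:1", "logged-in")
store.update("session:1", "expired")
print(store.fetch("session:1"))
store.delete("session:1")
```

Every operation touches exactly one node and needs no knowledge of what the value contains, which is why this model suits high-volume data with a simple structure.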
  • 19. COLUMN FAMILY STORE • Stores families of columns • Columns are stored as key-value pairs • A super column is like a catalogue or a collection of other columns • Columns within a family can be distributed across nodes • Supports semi-structured data with high scalability • Some of the market leaders: – HBase – Cassandra
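One way to picture the column-family layout is as nested maps: row key, then family, then column name to value. The sketch below is a conceptual model only, with hypothetical class and family names; it ignores distribution, timestamps, and super columns, and is not the HBase or Cassandra API.

```python
from collections import defaultdict


class ColumnFamilyTable:
    """Toy table: row key -> column family -> column name -> value."""

    def __init__(self, families):
        self.families = set(families)  # families are fixed up front
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, family, column, value):
        if family not in self.families:
            raise ValueError(f"unknown column family: {family}")
        self.rows[row_key][family][column] = value  # columns are free-form

    def get_family(self, row_key, family):
        # .get() avoids creating empty rows as a side effect of reading.
        return dict(self.rows.get(row_key, {}).get(family, {}))


users = ColumnFamilyTable(["profile", "activity"])
users.put("user1", "profile", "name", "Ada")
users.put("user1", "profile", "city", "Atlanta")
users.put("user2", "profile", "name", "Lin")       # no city column for this row
users.put("user2", "activity", "last_login", "2013-05-01")

print(users.get_family("user1", "profile"))
print(users.get_family("user2", "profile"))
```

Rows in the same family carry different columns here, which is the semi-structured flexibility the slide mentions.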
  • 20. COLUMN FAMILY STORE (HBASE)
  • 21. DOCUMENT STORE • Supports more complex data model than Key-Value • Collection of Documents – JSON, XML, other semi-structured formats • A document is a key-value collection • Multi-Index support • Best for storing complex data models, but less scalable • Some of the market leaders: – MongoDB – CouchDB – SimpleDB
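The "multi-index" bullet is what most separates a document store from a plain key-value store: documents can be looked up by fields inside them, not only by their key. The following sketch models that with one secondary index per chosen field. All names are hypothetical, indexed values are assumed to be hashable, and this is not the MongoDB or CouchDB API.

```python
from collections import defaultdict


class DocumentCollection:
    """Toy document collection with secondary indexes on selected fields."""

    def __init__(self, indexed_fields):
        self.docs = {}  # document id -> document (a dict)
        self.indexes = {field: defaultdict(set) for field in indexed_fields}

    def save(self, doc_id, document):
        self.delete(doc_id)  # drop stale index entries when replacing
        self.docs[doc_id] = document
        for field, index in self.indexes.items():
            if field in document:
                index[document[field]].add(doc_id)

    def delete(self, doc_id):
        old = self.docs.pop(doc_id, None)
        if old is None:
            return
        for field, index in self.indexes.items():
            if field in old:
                index[old[field]].discard(doc_id)

    def find(self, field, value):
        ids = self.indexes[field].get(value, set())
        return [self.docs[doc_id] for doc_id in sorted(ids)]


people = DocumentCollection(indexed_fields=["city", "role"])
people.save("p1", {"name": "Ada", "city": "Atlanta", "role": "CTO"})
people.save("p2", {"name": "Lin", "city": "Atlanta"})  # no role field
print(people.find("city", "Atlanta"))
print(people.find("role", "CTO"))
```

Each write now has to maintain every index as well as the document itself, which hints at why the slide describes this model as less scalable than key-value storage.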
  • 23. GRAPH DATABASE • Social Graph with Relationship between Entities • Great for Social Networks – Facebook friends network – LinkedIn connections network • Some of the market leaders: – Neo4j – FlockDB – Pregel
  • 24. GRAPH DATABASE - EXAMPLE • Nodes represent entities such as people, businesses, accounts, or any other item you might want to keep track of. • Properties are pertinent information that relates to nodes such as name, age, DOB, gender. • Edges are the lines that connect nodes to nodes or nodes to properties and they represent the relationship between the two.
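The node, property, and edge vocabulary above can be made concrete with a small in-memory graph and a "friends of friends" query, the kind of traversal social networks rely on. This is a sketch under simplifying assumptions: properties are stored directly on nodes instead of being linked by edges, edges are undirected, and the class and relationship names are invented for the example.

```python
class Graph:
    """Toy property graph with labeled, undirected edges."""

    def __init__(self):
        self.properties = {}  # node id -> dict of properties
        self.edges = {}       # node id -> set of (relationship, other node id)

    def add_node(self, node_id, **properties):
        self.properties[node_id] = properties
        self.edges.setdefault(node_id, set())

    def add_edge(self, a, relationship, b):
        """Connect two existing nodes in both directions."""
        self.edges[a].add((relationship, b))
        self.edges[b].add((relationship, a))

    def neighbors(self, node_id, relationship):
        return {other for rel, other in self.edges[node_id] if rel == relationship}

    def friends_of_friends(self, node_id):
        direct = self.neighbors(node_id, "FRIEND")
        second = set()
        for friend in direct:
            second |= self.neighbors(friend, "FRIEND")
        return second - direct - {node_id}  # exclude self and existing friends


g = Graph()
g.add_node("alice", age=34)
g.add_node("bob", age=29)
g.add_node("carol", age=41)
g.add_edge("alice", "FRIEND", "bob")
g.add_edge("bob", "FRIEND", "carol")
print(g.friends_of_friends("alice"))
```

The query follows edges from node to node without scanning unrelated data, whereas the same question in a relational schema would typically need a self-join per hop.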
  • 26. NEWSQL • The argument is that the relational model is not the cause of poor scalability; the limitations of its physical implementations are • Development of new relational database products and services designed to bring the benefits of the relational model to distributed architectures • Three Approaches: – Optimized MySQL storage engines (ScaleDB, MemSQL, Akiban) – New SQL databases (Clustrix, VoltDB, NuoDB) – Sharding Middleware to split RDBMS across nodes (ScaleBase, ScaleArc, dbShards)
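The third approach, sharding middleware, can be illustrated with a small router that sits in front of several ordinary SQL databases. The sketch uses in-memory SQLite databases from Python's standard library as stand-ins for separate RDBMS nodes; the table, the class, and the hash-based routing rule are assumptions made for the example and do not describe how ScaleBase, ScaleArc, or dbShards work internally.

```python
import sqlite3
import zlib


class ShardRouter:
    """Toy sharding layer: routes each customer's rows to one SQL database."""

    def __init__(self, shard_count=3):
        # Each in-memory database stands in for a separate RDBMS node.
        self.shards = [sqlite3.connect(":memory:") for _ in range(shard_count)]
        for db in self.shards:
            db.execute("CREATE TABLE orders (customer TEXT, amount REAL)")

    def _shard_for(self, customer):
        index = zlib.crc32(customer.encode("utf-8")) % len(self.shards)
        return self.shards[index]

    def add_order(self, customer, amount):
        db = self._shard_for(customer)
        db.execute("INSERT INTO orders VALUES (?, ?)", (customer, amount))
        db.commit()

    def customer_total(self, customer):
        # Single-shard query: all rows for one customer live together.
        query = "SELECT COALESCE(SUM(amount), 0) FROM orders WHERE customer = ?"
        return self._shard_for(customer).execute(query, (customer,)).fetchone()[0]

    def grand_total(self):
        # Cross-shard query: ask every shard, then combine the partial sums.
        query = "SELECT COALESCE(SUM(amount), 0) FROM orders"
        return sum(db.execute(query).fetchone()[0] for db in self.shards)


router = ShardRouter()
router.add_order("alice", 10.0)
router.add_order("alice", 5.5)
router.add_order("bob", 4.5)
print(router.customer_total("alice"))
print(router.grand_total())
```

The application keeps writing plain SQL, while the router decides where each statement runs; queries that span shards have to be fanned out and merged, which is where such middleware does most of its work.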
  • 28. SOURCE AND APPROACH • Independent testing done by Altoros Systems Inc. • More details at http://www.networkworld.com/news/tech/2012/102212-nosql-263595.html?page=1 • Using Amazon virtual machines to ensure verifiable results and research transparency (which also helped minimize errors due to hardware differences) – Riak, a key-value store – Cassandra, a column family store – HBase, a column family store – MongoDB, a document-oriented database – MySQL Cluster, a NewSQL – Sharded MySQL, a NewSQL
  • 32. EXAMPLE: HEALTHCARE A health care consultancy has made the data coming out of medical practices the focus of its thriving business. The company collects billing and diagnostic code data from 10,000 doctors on a daily, weekly and monthly basis to create a virtual clinical integration model. The consulting company analyzes the data to help the groups understand how well they are meeting the FTC guidelines for negotiating with health plans and whether they qualify for enhanced reimbursement based on offering a more cost-effective standard of care. It also sends them automated information to better take care of patients, like creating an automated outbound calling system for pediatric patients who weren’t up to date on their vaccinations.
  • 33. EXAMPLE: RETAIL Walmart handles more than 1 million customer transactions every hour, which is imported into databases estimated to contain more than 2.5 petabytes of data, the equivalent of 167 times the information contained in all the books in the US Library of Congress.
  • 34. EXAMPLE: UTILITY With a smart meter, a utility company goes from collecting one data point a month per customer (using a meter reader in a truck or car) to receiving 3,000 data points for each customer each month, while smart meters send usage information up to four times an hour. One small Midwestern utility is using smart meter data to structure conservation programs that analyze existing usage to forecast future use, price usage based on demand and share that information with customers who might decide to forestall doing that load of wash until they can pay for it at the nonpeak price.
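The figure of 3,000 data points per customer per month in the utility example follows from the stated reporting rate. A quick check, assuming a reading every 15 minutes around the clock (the slide says "up to four times an hour"); the names are illustrative.

```python
READINGS_PER_HOUR = 4  # "up to four times an hour", taken at its maximum
HOURS_PER_DAY = 24


def readings_per_month(days: int) -> int:
    """Number of meter readings in a month of the given length."""
    return READINGS_PER_HOUR * HOURS_PER_DAY * days


for days in (28, 30, 31):
    print(days, readings_per_month(days))
```

A 30-day month gives 2,880 readings and a 31-day month 2,976, so the slide's 3,000 is a rounded upper figure, up from a single manual reading per month.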
  • 37. © 2013 KMS Technology Q&A
