Purpose-built NoSQL Database for IoT by Basavaraj Soppannavar

Basavaraj Soppannavar
Sr. Strategist, IoT
Toshiba America Research Inc.
Purpose-built In-Memory NoSQL Database for Internet of Things
5th Aug 2017
Los Angeles

Agenda
• Internet of Things
• IoT Data & its properties
• GridDB
• Real Use Cases
GridDB by Toshiba 2

Internet of Things

Internet of Things Predictions
Number of Connected Devices
By 2020, the number of connected devices is projected to be:
• 50 Billion – Cisco
• 28.1 Billion* – IDC
• 20.8 Billion* – Gartner
*not including smartphones & computers
Most IoT smart devices aren’t in your home or phone; they are in factories, businesses, and healthcare (Intel infographic):
• 40.2 % in Business and Manufacturing
• 30.3 % in Healthcare
IoT Revenue projections
• $300 Billion – Gartner
• $470 Billion – Bain
IoT Economics

Technology Stack of IoT
• Applications: Mobile, Web, Business Apps
• Analytics & AI: BI, Visualization, Data Mining, DPP* Analytics, Machine Learning
• Data Storage and Retrieval: GridDB, HBase, Cassandra, MongoDB, MS-SQL, Hadoop
• Data Aggregation / Processing: Storm, Kafka, Fluentd, RabbitMQ
• Session / Communication: CoAP, MQTT, DDS, XMPP, AMQP, HTTP
• Transport: IPv4, IPv6
• Link: Ethernet, WiFi, Bluetooth, BLE, Zigbee, Z-Wave, RFID, 2G, 3G, LTE
• Connectivity: Wireless, USB, RJ45 (Ethernet), DSL
• Device: Sensors, Embedded chips, Cameras, Wearables
Spanning the stack: Device and Data Management; Security and Privacy
*Descriptive, Predictive, Prescriptive

Toshiba’s Full Stack Solution for IoT & Big Data
Diagram: GridDB NoSQL Database

IoT Data & Databases

Properties of IoT Data
• Periodic
• Large volume but small record size
• Structured
• Time-stamped
Example readings:
Timestamp            Voltage  Current  Temperature
2017/05/03 10:45:00  100      0.64     20.5
2017/05/03 10:45:30  101      0.63     20.4
2017/05/03 10:46:00  99       0.65     20.5
...
Single record (size less than 100 bytes); millions of records
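To make "small record, large volume" concrete, here is a minimal Java sketch (Java 16+ for records) that models one row of the sample table and estimates its raw payload size and daily row count. It assumes fixed-width Java primitives (an 8-byte epoch-millisecond timestamp, a 4-byte int voltage, 8-byte doubles for current and temperature) and the 30-second interval visible in the sample; a real database adds row headers and indexes, so the figures are illustrative only. The class and method names are hypothetical.

```java
/** Illustrative sizing of one sensor row from the sample table (hypothetical names). */
public class IotRecordSizing {

    // One reading, mirroring the Timestamp / Voltage / Current / Temperature columns.
    record Reading(long epochMillis, int voltage, double current, double temperature) {}

    // Raw payload bytes of the four fields, ignoring any storage-engine overhead.
    static int rawPayloadBytes() {
        return Long.BYTES + Integer.BYTES + Double.BYTES + Double.BYTES; // 8 + 4 + 8 + 8
    }

    // Rows produced per sensor per day at a fixed sampling interval.
    static long rowsPerDay(int intervalSeconds) {
        return 24L * 60 * 60 / intervalSeconds;
    }

    public static void main(String[] args) {
        // 2017/05/03 10:45:00, treated as UTC for this example.
        Reading r = new Reading(1493808300000L, 100, 0.64, 20.5);
        System.out.println("Sample row: " + r);
        System.out.println("Raw payload bytes per row: " + rawPayloadBytes()); // 28
        System.out.println("Rows per sensor per day at 30 s: " + rowsPerDay(30)); // 2880
    }
}
```

Even at 28 raw bytes per row, a single sensor sampled every 30 seconds produces 2,880 rows a day, which is why row count rather than row size dominates IoT storage design.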

Database Requirements of IoT
• Highly available & fault tolerant
• Great read and write performance for millions of records
• Time series data & operations support
• Fast search and range queries
• Spatial and geo-location support
• Real-time streaming support
• Support for ever-increasing data (scale out)

Evolution of Database Management Systems
(Timeline: 90s, 2000s, Today)
• RDBMS (MySQL, Postgres): Operational / Transactional Database
• OLAP / DW (Teradata, Vertica, Greenplum): Data Warehouse for BI and Analytics
• NoSQL DBs
  – Key Value Store: Riak, Aerospike
  – Wide Column Store: Cassandra, HBase
  – Document Store: MongoDB, Couchbase
  – Graph Store: Neo4j
• Hadoop: Cloudera, Hortonworks
OLAP – Online Analytical Processing
DW – Data Warehouse
Inspired by: https://practicalanalytics.co/2015/06/02/the-maturing-nosql-ecoystem-a-c-level-guide/

GridDB
A Purpose-built In-Memory NoSQL Database for IoT

What is GridDB?
Highly Scalable
In Memory
Distributed
Key-Value
IoT Database

GridDB – Highly Scalable Database for IoT

Highly Scalable Distributed Key-Container Database

NoSQL Data Models
• GridDB has a unique Key-Container data model
• A container is analogous to a table in a relational database
• Each container has a fixed schema

Key Container Data Model
• A container is a set of records that share a schema
• GridDB supports 2 types of containers
  – Collection container: for general-purpose record management
  – Time-series container: for time-series record management
• The Key-Container model provides
  – Data consistency within a container (ACID is guaranteed within the container)
  – Faster data retrieval and search because of the fixed schema
  – TQL, an SQL-like query language for reading data from containers
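The key-container idea can be pictured with a toy in-memory model: a container name acts as the key, and each container holds rows of one fixed schema kept sorted by their row key, which is what makes range scans cheap. This is a hypothetical sketch for intuition only (all class and method names are made up); it is not GridDB's API, storage layout, or per-container ACID transaction handling.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Toy key-container store: a container name (the key) maps to rows sorted by row key. */
public class ToyKeyContainerStore {

    // Fixed schema shared by every row of a container: timestamp row key plus three columns.
    record Row(long timestamp, int voltage, double current, double temperature) {}

    // Time-series-style container: rows ordered by their timestamp row key.
    static final class Container {
        private final NavigableMap<Long, Row> rows = new TreeMap<>();

        void put(Row row) {
            rows.put(row.timestamp(), row); // an existing row key is overwritten
        }

        // Range query over [fromInclusive, toExclusive), served directly by the sorted keys.
        Collection<Row> range(long fromInclusive, long toExclusive) {
            return rows.subMap(fromInclusive, true, toExclusive, false).values();
        }

        int size() {
            return rows.size();
        }
    }

    private final Map<String, Container> containers = new HashMap<>();

    // Returns the container stored under this key, creating an empty one on first use.
    Container putContainer(String name) {
        return containers.computeIfAbsent(name, k -> new Container());
    }

    Container getContainer(String name) {
        return containers.get(name);
    }

    public static void main(String[] args) {
        ToyKeyContainerStore store = new ToyKeyContainerStore();
        Container sm101 = store.putContainer("SM101");
        sm101.put(new Row(0L, 100, 0.64, 20.5));      // 10:45:00 as t = 0 ms
        sm101.put(new Row(30_000L, 101, 0.63, 20.4)); // 10:45:30
        sm101.put(new Row(60_000L, 99, 0.65, 20.5));  // 10:46:00

        // Only the rows at 0 ms and 30,000 ms fall in [0, 60000).
        sm101.range(0L, 60_000L).forEach(System.out::println);
    }
}
```

Keeping rows ordered by row key inside each container means a time-range query touches only the matching slice instead of scanning everything stored under that key.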

Key Container Data Model - Example
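import java.util.Date;
import com.toshiba.mwcloud.gs.RowKey;
import com.toshiba.mwcloud.gs.TimeSeries;

// Schema definition: one row of smart-meter data
static class SMData {
    @RowKey Date timestamp;   // row key of a time-series container must be a timestamp
    int voltage;
    double current;
    double temp;              // double so readings such as 20.5 are not truncated
}

// Creating a TS container: "SM101" is the container name (the "Key"), SMData.class the schema;
// store is a GridStore obtained from GridStoreFactory.getInstance().getGridStore(props)
TimeSeries<SMData> ts = store.putTimeSeries("SM101", SMData.class);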

High Performance
GridDB’s hybrid composition of In-Memory and Disk architecture is optimized for maximum performance
Diagram: GridDB 4-node cluster, each node (server) backed by SSD/HDD, combining memory from multiple nodes; new nodes can be added
In-Memory + Disk Hybrid: excess data that does not fit in memory is saved to SSD/disk
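The general idea of an in-memory + disk hybrid can be sketched as a bounded in-memory map that spills its least recently used entries to a slower tier. In this toy version the "disk" tier is just another map, the memory capacity is a hypothetical constructor parameter, and eviction is plain LRU; GridDB's actual memory management and data placement (including TDPA for time-series data) are not reproduced here.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy two-tier store: hot entries in memory (LRU), overflow spilled to a disk stand-in. */
public class HybridStoreSketch<K, V> {

    private final Map<K, V> disk = new HashMap<>(); // stand-in for SSD/HDD storage
    private final LinkedHashMap<K, V> memory;

    public HybridStoreSketch(int memoryCapacity) {
        // accessOrder = true keeps the least recently used entry at the head.
        this.memory = new LinkedHashMap<>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                if (size() > memoryCapacity) {
                    disk.put(eldest.getKey(), eldest.getValue()); // spill instead of dropping
                    return true;
                }
                return false;
            }
        };
    }

    public void put(K key, V value) {
        disk.remove(key); // the in-memory copy becomes the authoritative one
        memory.put(key, value);
    }

    public V get(K key) {
        V v = memory.get(key);
        if (v == null && disk.containsKey(key)) {
            v = disk.remove(key);
            memory.put(key, v); // promote back to memory; this may spill another entry
        }
        return v;
    }

    public boolean inMemory(K key) { return memory.containsKey(key); }

    public boolean onDisk(K key) { return disk.containsKey(key); }

    public static void main(String[] args) {
        HybridStoreSketch<String, Integer> store = new HybridStoreSketch<>(2);
        store.put("a", 1);
        store.put("b", 2);
        store.put("c", 3); // "a" is least recently used, so it spills to disk
        System.out.println(store.onDisk("a") + " " + store.inMemory("c")); // true true
        System.out.println(store.get("a")); // 1, promoted back; "b" now spills
    }
}
```

The design choice illustrated is that data is never dropped when memory fills: hot rows stay in RAM for speed while colder rows remain reachable from the slower tier.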

YCSB Performance Results
• Tests performed on identical hardware (MS Azure Standard_D2: dual-core CPU, 7 GB RAM per node)
• 1 client per core; 128 threads per client
*Tests performed by Fixstars
Charts: Throughput, 16 nodes and Throughput, 32 nodes (average throughput in thousands of ops/sec for YCSB workloads A, B, C, D, F; GridDB vs Cassandra); Read Latency, 16 nodes (latency in microseconds for workloads A, B, C, D, F; GridDB vs Cassandra)
The Yahoo! Cloud Serving Benchmark (YCSB) comparison of GridDB and Cassandra shows that*
• Average throughput of GridDB is 4x-5x higher than that of Cassandra
• Average latency of GridDB is 3x-4x lower than that of Cassandra

Superior Stability
Chart: YCSB Workload A 24-hour stability test; throughput (ops) vs. elapsed time (seconds, up to about 25 hours); GridDB vs Cassandra
Tests performed by Fixstars

High Availability
Advanced Master-Slave Model: Hybrid Cluster Management
• No Single Point of Failure (SPOF): the master node is selected automatically
• No split brain: a quorum policy is applied
Autonomous Data Distribution
• Data distribution and failover are handled automatically
Diagram: a 5-node cluster (Node 1 to Node 5) with one master; each node holds original and replica data; clients use a cached data distribution table; data replication, hybrid cluster management failover, and adding new nodes are shown
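Why a quorum policy rules out split brain can be shown with simple arithmetic: a group of nodes keeps serving only if it still sees a strict majority of the configured nodes, and two disjoint groups can never both hold a majority. The sketch also models a hypothetical owner/backup table in which a backup replica takes over when its owner node fails. The majority rule, the round-robin placement, and the lowest-id master choice are illustrative assumptions, not GridDB's actual cluster-management protocol.

```java
import java.util.Set;
import java.util.TreeSet;

/** Toy cluster model: majority quorum plus owner/backup failover (illustrative only). */
public class QuorumFailoverSketch {

    // A group of nodes may act only if it sees a strict majority of all configured nodes.
    static boolean hasQuorum(int reachableNodes, int totalNodes) {
        return reachableNodes > totalNodes / 2;
    }

    // Hypothetical master choice: lowest live node id, and only when a quorum exists.
    static int electMaster(Set<Integer> liveNodes, int totalNodes) {
        if (!hasQuorum(liveNodes.size(), totalNodes)) {
            return -1; // no master: this side stops serving instead of diverging
        }
        return new TreeSet<>(liveNodes).first();
    }

    // Round-robin placement: partition p is owned by node p % n, backed up on (p + 1) % n.
    static int[][] buildTable(int partitions, int nodes) {
        int[][] table = new int[partitions][2];
        for (int p = 0; p < partitions; p++) {
            table[p][0] = p % nodes;       // owner ("original")
            table[p][1] = (p + 1) % nodes; // backup ("replica")
        }
        return table;
    }

    // Failover: each partition owned by the failed node is taken over by its backup.
    static void failNode(int[][] table, int failedNode) {
        for (int[] entry : table) {
            if (entry[0] == failedNode) {
                entry[0] = entry[1];
                entry[1] = -1; // backup slot empty until re-replication
            } else if (entry[1] == failedNode) {
                entry[1] = -1;
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(hasQuorum(3, 5) + " " + hasQuorum(2, 5)); // true false
        System.out.println(electMaster(Set.of(1, 3, 4), 5));          // 1
        int[][] table = buildTable(5, 5);
        failNode(table, 1);
        System.out.println("Partition 1 now owned by node " + table[1][0]); // 2
    }
}
```

With 4 nodes split 2/2, neither side has a quorum, so both stop accepting writes rather than diverging, which is the split-brain case the slide refers to.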

Time Series Features
• TDPA: GridDB implements a Time Series Data Placement Algorithm for high-frequency data to maximize memory utilization
• Expiry Release Function: a data retention period can be set so that old data is released and storage is freed
• Aggregate Functions: MIN, MAX, AVG, VARIANCE, STDDEV
• Sampling and Interpolation Functions: TIME_INTERPOLATED, TIME_SAMPLING, TIME_NEXT, TIME_PREVIOUS
• Trigger Functions: JMS and REST notifications
GridDB is optimized for Time-Series operations
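To illustrate what the aggregate, interpolation, next/previous, and expiry functions compute, here is a self-contained Java sketch over a timestamp-ordered map, using the voltage values from the sample table with time in seconds since 10:45:00. It is not GridDB's implementation or API: the names are hypothetical, VARIANCE and STDDEV are computed as population statistics (GridDB's exact definitions may differ), interpolation is linear, and expiry simply drops rows older than a cutoff.

```java
import java.util.DoubleSummaryStatistics;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Illustrative time-series helpers over a sorted map of timestamp (s) -> value. */
public class TimeSeriesSketch {

    final NavigableMap<Long, Double> series = new TreeMap<>();

    // MIN, MAX, AVG over [from, to] inclusive.
    DoubleSummaryStatistics stats(long from, long to) {
        DoubleSummaryStatistics s = new DoubleSummaryStatistics();
        series.subMap(from, true, to, true).values().forEach(s::accept);
        return s;
    }

    // Population variance over [from, to] inclusive.
    double variance(long from, long to) {
        var window = series.subMap(from, true, to, true).values();
        double mean = window.stream().mapToDouble(Double::doubleValue).average().orElse(Double.NaN);
        return window.stream().mapToDouble(v -> (v - mean) * (v - mean)).average().orElse(Double.NaN);
    }

    double stddev(long from, long to) {
        return Math.sqrt(variance(from, to));
    }

    // Linear interpolation at time t between the nearest earlier and later rows.
    double interpolate(long t) {
        Map.Entry<Long, Double> before = series.floorEntry(t);
        Map.Entry<Long, Double> after = series.ceilingEntry(t);
        if (before == null || after == null) {
            throw new IllegalArgumentException("t is outside the stored range");
        }
        if (before.getKey().equals(after.getKey())) {
            return before.getValue(); // exact hit
        }
        double fraction = (double) (t - before.getKey()) / (after.getKey() - before.getKey());
        return before.getValue() + fraction * (after.getValue() - before.getValue());
    }

    // TIME_NEXT / TIME_PREVIOUS style lookups: first row at or after / at or before t.
    Map.Entry<Long, Double> next(long t) { return series.ceilingEntry(t); }

    Map.Entry<Long, Double> previous(long t) { return series.floorEntry(t); }

    // Expiry release: drop every row strictly older than the cutoff.
    void expireBefore(long cutoff) {
        series.headMap(cutoff, false).clear();
    }

    public static void main(String[] args) {
        TimeSeriesSketch ts = new TimeSeriesSketch();
        ts.series.put(0L, 100.0);  // 10:45:00
        ts.series.put(30L, 101.0); // 10:45:30
        ts.series.put(60L, 99.0);  // 10:46:00

        DoubleSummaryStatistics s = ts.stats(0, 60);
        System.out.println(s.getMin() + " " + s.getMax() + " " + s.getAverage()); // 99.0 101.0 100.0
        System.out.println(ts.variance(0, 60)); // population variance, 2/3
        System.out.println(ts.interpolate(15)); // 100.5
        ts.expireBefore(30);
        System.out.println(ts.series.firstKey()); // 30
    }
}
```

All of these operations reduce to ordered lookups (floor, ceiling, sub-range, head-range) on the time key, which is why keeping time-series rows sorted by timestamp matters.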

Real Use Cases
1. Building Energy Management Systems
2. Smart Meters – Electric Power Company
3. Smart City – Ishinomaki City

1. Building Energy Management Systems
• 100+ buildings are managed by the BEMS company in Kawasaki, Japan
• The BEMS company manages over 1 petabyte (a million gigabytes) of sensor data each year
• An average of 5 MB of data per sensor per day, or approximately 2 GB per sensor per year
• 100 to 1,000 sensors per building depending on floor area, amounting to about 1 TB of sensor data per building per year
GridDB was used for its easy scalability, simple data model, and time-series querying and functions
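The per-sensor and per-building figures above can be cross-checked with a short calculation, assuming decimal units (1 GB = 1,000 MB, 1 TB = 1,000 GB):

```latex
\begin{aligned}
5\ \text{MB/day} \times 365\ \text{days} &= 1{,}825\ \text{MB} \approx 1.8\ \text{GB per sensor per year}\ (\approx 2\ \text{GB}) \\
\frac{1\ \text{TB per building per year}}{2\ \text{GB per sensor per year}} &= \frac{1{,}000\ \text{GB}}{2\ \text{GB}} = 500\ \text{sensors per building}
\end{aligned}
```

So the 1 TB per-building figure corresponds to roughly 500 sensors, which lies within the stated range of 100 to 1,000 sensors per building.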

2. Smart Meters – Electric Power Company
• One of Japan’s top electric power companies switched from a relational database to GridDB
• The company saw a 2,250-fold increase in throughput over the old system
• Overall processing time went down considerably
• Data center costs were reduced significantly
GridDB was used for its high performance, large data handling and reduced cost

• Running as a production system since April 2016
• Data from 3 million smart meters is collected every 30 minutes and stored for 3 months
• Data size is approximately 2.6 TB
• 13 billion records
• Record size of 200 bytes
Diagram components: 3 million smart meters (SM); Data Input; GridDB; MDMS; MapReduce jobs for Charge Cal., Imbalance Cal. and 30 Min. Balancing; Read Value App; AppServer; RDB; Preliminary Results and Usage delivered to Power Retailers. Clusters shown: several 3-node clusters, a 4-node cluster, a 5-node cluster, and an Active-Standby Cluster.
SM – Smart Meter
MDMS – Meter Data Management System
RDB – Relational Database
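The record count and data size quoted for this smart-meter system follow from simple arithmetic. The Java sketch below reproduces them, assuming 48 readings per meter per day (one every 30 minutes), a 90-day approximation of 3 months, and decimal terabytes; the class and method names are hypothetical.

```java
/** Back-of-the-envelope sizing for periodic meter data (illustrative assumptions). */
public class MeterDataSizing {

    // Readings per meter per day at a fixed interval in minutes.
    static long readingsPerDay(int intervalMinutes) {
        return 24L * 60 / intervalMinutes;
    }

    // Total rows retained for a fleet of meters over a retention window.
    static long totalRecords(long meters, int intervalMinutes, int retentionDays) {
        return meters * readingsPerDay(intervalMinutes) * retentionDays;
    }

    // Raw size in decimal terabytes (1 TB = 10^12 bytes).
    static double sizeTerabytes(long records, int recordBytes) {
        return records * (double) recordBytes / 1e12;
    }

    public static void main(String[] args) {
        long records = totalRecords(3_000_000L, 30, 90);
        System.out.println(records);                     // 12960000000, about 13 billion
        System.out.println(sizeTerabytes(records, 200)); // about 2.6 TB
    }
}
```

Because the meters report on a fixed schedule, total volume is fully determined by fleet size, interval, and retention, which makes capacity planning for this kind of workload predictable.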

3. Smart City – Disaster-tolerant Ishinomaki City
GridDB was used for its high-speed processing of large data, long-term data retention, and consistency
Ishinomaki City's post-2011 disaster recovery plan

PoC of Consignment Charge Calculation System
• Data from 30 million smart meters is collected every 30 minutes and stored for 1 month
• Data size is approximately 8.6 TB
• 43 billion records
• Record size of 200 bytes
• One month of charge calculation for 30 million meters was executed in 96 minutes
Diagram: 30 million smart meters (SM) feed data into GridDB (8.6 TB) within the MDMS; processing runs with MapReduce; 5-node and 6-node clusters are shown
Processing stages: Data Input (30M data), Associating Contract Info. (30M data), Imbalance (43G records), Charge Calculation (43G records)
Execution times shown: 1 min 47 secs, 9 mins, 30 mins, 55 mins
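Three figures on this slide can be cross-checked with arithmetic, assuming 48 readings per meter per day, a 30-day month, and decimal units: the record count, the data size, and the 96-minute total as the sum of the four execution times shown.

```latex
\begin{aligned}
3\times10^{7}\ \text{meters} \times 48\ \tfrac{\text{readings}}{\text{day}} \times 30\ \text{days} &= 4.32\times10^{10} \approx 43\ \text{billion records} \\
4.32\times10^{10} \times 200\ \text{B} &= 8.64\times10^{12}\ \text{B} \approx 8.6\ \text{TB} \\
1\ \text{min}\ 47\ \text{s} + 9\ \text{min} + 30\ \text{min} + 55\ \text{min} &= 95\ \text{min}\ 47\ \text{s} \approx 96\ \text{min}
\end{aligned}
```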

GridDB
Editions, Languages, Connectors

GridDB Editions

GridDB on Amazon AWS Marketplace

Languages and Connectors
• GridDB Community Edition is open source and available on GitHub
  – https://github.com/griddb
• Currently supports Java, C/C++, REST, Python and Ruby interfaces
• Go, PHP, Perl and JavaScript drivers will be added in the coming months
• A MapReduce connector is available on GitHub
  – https://github.com/griddb/griddb_hadoop_mapreduce
• A KairosDB connector is available on GitHub
  – https://github.com/griddb/griddb_kairosdb
• A Spark connector has recently been released on GitHub
  – https://github.com/griddb/griddb_spark
• A Kafka-GridDB integration blog post is available on the www.griddb.net website

GridDB feature set
Horizontal scaling is near-linear and works great on commodity hardware
• Tested on 100 nodes per cluster, can scale up to 1000 nodes
GridDB's advanced master-slave model eliminates SPOF and split brain
Autonomous data distribution prevents data loss
ACID transactions are guaranteed at the container level
TQL, an SQL-like language for fast querying and analytics
GridDB’s hybrid composition of In-Memory and Disk architecture is optimized for maximum performance
GridDB is custom designed for IoT and other use cases that involve Time Series operations
• TS data types, temporal based querying, geometry type and BLOB types are supported
• Vector sets data type support is in development

Useful Links
• Developers’ website - www.griddb.net
• Toshiba GridDB website - http://solutions.toshiba.com/overview.html
• GitHub repository - https://github.com/griddb
• Quick Start Guide - http://www.griddb.net/en/docs/GridDB_QuickStartGuide.html
• Technical Reference - http://www.griddb.net/en/docs/GridDB_TechnicalReference.pdf
• API Reference - http://www.griddb.net/en/docs/GridDB_API_Reference.html
Contact
Basavaraj Soppannavar
Sr. Strategist, IoT
Basavaraj.Soppannavar@toshiba.com
@griddbcommunity
Follow GridDB

THANK YOU

ADDITIONAL INFO

Yahoo! Cloud Serving Benchmark (YCSB)

YCSB
Yahoo! Cloud Serving Benchmark is an open-source benchmarking suite designed by Yahoo Labs for comparative performance evaluation of NoSQL database management systems
• DBMS vendors use YCSB for benchmark comparisons
• Traditional benchmarks, such as those defined by the TPC (Transaction Processing Performance Council), are used to compare RDBMSs
• YCSB measures and compares various attributes of a DBMS such as latency, throughput, durability, scalability, availability, read/write optimization, sync/async replication, etc.
YCSB has 2 main parts
• YCSB Client: an extensible workload generator
  – Beyond the standard workloads, the client can be extended to generate user-defined workloads to run against the DBMS
• YCSB Core Workloads: a set of scenarios generated by the client to run against the system under test
  – The core workloads give a well-rounded picture of the performance of the system under test

YCSB Workloads
YCSB has 6 core workloads
• Workload A – Update heavy: a 50/50 mix of reads and writes. Application example: a session store recording recent actions
• Workload B – Read mostly: a 95/5 mix of reads and writes. Application example: photo tagging; adding a tag is an update, but most operations read tags
• Workload C – Read only: 100% reads. Application example: a user profile cache, where profiles are constructed elsewhere (e.g., Hadoop)
• Workload D – Read latest: new records are inserted, and the most recently inserted records are the most popular. Application example: user status updates; people want to read the latest
• Workload E – Short ranges: short ranges of records are queried instead of individual records. Application example: threaded conversations, where each scan is for the posts in a given thread (assumed to be clustered by thread id)
• Workload F – Read-modify-write: the client reads a record, modifies it, and writes back the changes. Application example: a user database, where user records are read and modified by the user or to record user activity
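The operation mixes of the core workloads can be made concrete with a small generator. This Java sketch draws operations for workloads A, B, C and F using the read/update proportions listed above; for F it assumes a 50/50 split between plain reads and read-modify-writes, as in YCSB's stock workloadf. It is a simplified illustration with hypothetical names, not YCSB's own workload classes, and it ignores key distributions, inserts (D) and scans (E).

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.Random;

/** Simplified operation-mix generator for some YCSB-style core workloads (illustrative). */
public class WorkloadMixSketch {

    enum Op { READ, UPDATE, READ_MODIFY_WRITE }

    // Probability that a generated operation is a plain read, per workload letter.
    static double readProportion(char workload) {
        switch (workload) {
            case 'A': return 0.50; // update heavy: 50/50 reads and writes
            case 'B': return 0.95; // read mostly: 95/5
            case 'C': return 1.00; // read only
            case 'F': return 0.50; // assumed: the other half are read-modify-writes
            default: throw new IllegalArgumentException("unsupported workload " + workload);
        }
    }

    // Draws n operations and counts each kind; a fixed seed keeps runs repeatable.
    static Map<Op, Integer> generate(char workload, int n, long seed) {
        Random rnd = new Random(seed);
        double readP = readProportion(workload);
        Op nonRead = (workload == 'F') ? Op.READ_MODIFY_WRITE : Op.UPDATE;
        Map<Op, Integer> counts = new EnumMap<>(Op.class);
        for (int i = 0; i < n; i++) {
            // nextDouble() is in [0, 1), so readP = 1.0 always yields READ.
            Op op = rnd.nextDouble() < readP ? Op.READ : nonRead;
            counts.merge(op, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        for (char w : new char[] {'A', 'B', 'C', 'F'}) {
            System.out.println("Workload " + w + ": " + generate(w, 10_000, 42L));
        }
    }
}
```

Seeding the generator makes a run repeatable, which matters when the same operation stream should be replayed against two databases being compared.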
