
InfiniFlux vs RDBMS
www.infiniflux.com

Overview
• InfiniFlux is an ultra-high-speed database that stores and processes time series data.
• To provide high-speed processing, InfiniFlux has characteristics very different from those of conventional DBMSs such as Oracle and DB2.
• It is therefore important to understand its characteristics and architecture for time series log data.
• Technical characteristics: stores hundreds of thousands of records per second in real time and analyzes the data using SQL.
• This document describes the differences between InfiniFlux and conventional technology and explains each item in detail.

Comparison Chart
Characteristic | InfiniFlux DB | RDBMS
Transaction | Implicit transactions based on snapshots | Explicit transactions based on log files
INSERT speed | 300,000 to 3,000,000 records per second | Fewer than 10,000 records per second
SELECT performance | Optimized for search and statistical analysis queries | Optimized for OLTP
Characteristics of data | Time series log data | Non-log transactional data
Updatable data | Append only (Write Once, Read Many) | Updatable
DELETE operation | Deletes the oldest data only | Deletes arbitrary data
Real-time index | Real-time bitmap index | B+Tree (not real-time)
Data compression | High-performance real-time compression | Not supported
Real-time text search | Supported (real-time inverted index) | Not supported (even where supported, not in real time)
Time series data support | Data partitioning by sharding | Data partitioning based on a general timestamp column
Gap between data and index | A momentary gap can occur | No gap

Transaction Support
RDBMS
• Provides explicit transactions for all data operations
• Transaction: a set of operations that satisfies the ACID properties
• Atomicity
• Consistency
• Isolation
• Durability
• Savepoint, Commit, and Rollback
InfiniFlux
• Does not provide explicit transactions
• No transactions for data operations (input)
• Provides implicit transactions over internal metadata
• Table structure, index structure, and data file structure
Reasons for not supporting transactions
• Time series data does not need to be stored transactionally
• Fast storage and processing are much more valuable than transactions
• Avoids the cost and complexity of transaction processing

Input Performance
RDBMS
• Difficult to insert more than 5,000 records per second
• Transaction logging is costly and increases I/O.
• Index updates are costly, because B+Tree is suited to data search rather than to rapid insertion.
• To maintain consistency, all index updates for a record are performed sequentially.
• System performance degrades as the index volume grows larger than the data itself.
InfiniFlux
• Able to insert hundreds of thousands of records per second
• Builds indexes efficiently with real-time bitmap indexes
• No logging cost
• Indexes can be built in parallel using multiple threads
• Real-time compression reduces the amount of I/O, improving overall system performance.
• Performance can be greatly improved by creating tablespaces across multiple disks.

Query Performance
RDBMS
• Row-oriented databases have an advantage in online transaction processing (OLTP).
• High query performance on high-cardinality columns
• Reason: a B+Tree performs well when the search range is small
• Operates mainly on B+Trees, selecting the most efficient index and using it
• Efficient for searching a small range even in a large data set
• Slow for statistical analysis queries
InfiniFlux
• Column-oriented databases have an advantage in online analytical processing (OLAP).
• High query performance on low-cardinality columns
• Most time series log data is low-cardinality because values are heavily duplicated.
• Indexes are mostly bitmap indexes, which are efficient because two or more indexes can be used at the same time.
• Fast statistical queries over massive data (hundreds of millions of records)
• Relatively slow at OLTP-style searches for a specific record across the whole database
• In this case, a global index must be created.
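A minimal Python sketch of the idea, assuming plain Python integers as bit sets: each distinct value of a low-cardinality column owns one bitmap, and a query that filters on two columns combines two indexes with a single bitwise AND. `BitmapIndex`, `rows_of`, and the sample log rows are hypothetical illustrations, not InfiniFlux's internal structures.

```python
from collections import defaultdict


class BitmapIndex:
    """Hypothetical bitmap index: one bit set per distinct column value.

    Bit i of a value's bitmap is 1 when row i holds that value.
    Python ints serve as arbitrary-length bit sets.
    """

    def __init__(self) -> None:
        self.bitmaps: dict[str, int] = defaultdict(int)

    def add(self, row_id: int, value: str) -> None:
        # Indexing a new row only sets one bit; no tree rebalancing is needed.
        self.bitmaps[value] |= 1 << row_id

    def lookup(self, value: str) -> int:
        # .get() does not create entries in the defaultdict.
        return self.bitmaps.get(value, 0)


def rows_of(bitmap: int) -> list[int]:
    """Decode a bitmap back into the list of matching row ids."""
    rows = []
    row_id = 0
    while bitmap:
        if bitmap & 1:
            rows.append(row_id)
        bitmap >>= 1
        row_id += 1
    return rows


# Low-cardinality log columns: few distinct values, many duplicates.
logs = [
    ("web01", "ERROR"),
    ("web02", "INFO"),
    ("web01", "INFO"),
    ("web01", "ERROR"),
    ("web02", "ERROR"),
]

host_index = BitmapIndex()
level_index = BitmapIndex()
for row_id, (host, level) in enumerate(logs):
    host_index.add(row_id, host)
    level_index.add(row_id, level)

# WHERE host = 'web01' AND level = 'ERROR': both indexes are used at once.
matches = host_index.lookup("web01") & level_index.lookup("ERROR")
print(rows_of(matches))  # [0, 3]
```

An OR condition maps to `|` in the same way. The more distinct values a column has, the more (and sparser) bitmaps it needs, which is why this approach favors low cardinality.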

Characteristics of Data
RDBMS
• Optimized for storing data through transactions
• e.g., financial information and personal identification information for banking transactions
• Important data that must be stored safely in a conventional database such as Oracle
• Data in an RDBMS is not tied to the passage of time and can be updated or deleted.
InfiniFlux
• Target data is time series log data.
• Hundreds of thousands of records are generated per second.
• Updates are not required, and the data is highly duplicated.
• Typical targets are log files or similar data that were previously stored as text files.
• The data continuously describes the status of a given target over time.

Updatable Data
RDBMS
• Updatable model
• Data can be updated or modified at any time.
• Any record can be deleted at any moment.
• The database is designed so that all data can be accessed and modified easily.
InfiniFlux
• Write Once, Read Many (WORM) model
• Because the data is time series log data, it cannot be modified once stored.
• Read-oriented; UPDATE is not supported at all
• What about deleting data?
• A record from an arbitrary point in time cannot be deleted
• The oldest records can be deleted in sequential order
* InfiniFlux provides this delete feature to keep disk usage at a fixed level on embedded devices.
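The write-once model can be sketched as a table that exposes only append, scan, and delete-oldest operations. This is a hypothetical Python illustration (`AppendOnlyTable` is not an InfiniFlux API); an in-memory `deque` stands in for the on-disk partitions.

```python
from collections import deque


class AppendOnlyTable:
    """Hypothetical WORM table: append, scan, and delete-oldest only.

    There is deliberately no update method and no way to delete an
    arbitrary row; rows leave the table strictly oldest-first.
    """

    def __init__(self) -> None:
        self._rows: deque = deque()

    def append(self, row: tuple) -> None:
        # New rows always go to the tail (the newest end).
        self._rows.append(row)

    def delete_oldest(self, count: int) -> int:
        """Remove up to `count` rows from the head; return how many were removed."""
        removed = 0
        while self._rows and removed < count:
            self._rows.popleft()
            removed += 1
        return removed

    def scan(self) -> tuple:
        # Return an immutable snapshot so callers cannot modify stored rows.
        return tuple(self._rows)

    def __len__(self) -> int:
        return len(self._rows)


table = AppendOnlyTable()
for i in range(5):
    table.append((i, f"event-{i}"))

table.delete_oldest(2)  # e.g. to keep disk usage bounded on an embedded device
print(table.scan())     # ((2, 'event-2'), (3, 'event-3'), (4, 'event-4'))
```

Restricting deletion to the head is what keeps the structure simple: storage can be reclaimed in whole blocks from the old end without touching newer data.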

Index Technology
RDBMS
• Generally uses B+Tree indexes
• A B+Tree is a global index that stores information about every record.
• The logging and recovery operations needed to support transactions are expensive.
* For these reasons, an RDBMS processes only thousands of records per second.
• Disk usage grows rapidly with the number of indexes because raw data values are stored in each index.
• Compression is not provided because of performance concerns.
InfiniFlux
• Supports real-time bitmap indexes
• Local index structure per partition
• Composed of various indexes covering all records
• Indexes are created quickly because no transaction support or logging is required.
• Able to create millions of index entries per second
• The cost of index creation grows only slightly as the amount of data increases.
• Minimizes disk usage through compression algorithms
• OLTP queries become relatively slow as the amount of data grows without bound.
• Supports semi-global indexes (4th quarter 2015)
• Able to provide high performance on OLTP queries
• Able to provide high performance on compression and statistical queries
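A rough sketch of partition-local indexing, under the assumption that each partition gets its own small bitmap index. Partitions are independent, so their indexes can be built in parallel; however, a point lookup has to probe every local index, which is why such lookups slow down as data grows. The partition size, the function names, and the use of `ThreadPoolExecutor` are illustrative assumptions. Because of CPython's global interpreter lock, the threads here show the structure of a parallel index build rather than a real CPU speedup.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

PARTITION_SIZE = 4  # hypothetical; real partitions hold far more rows


def partition_column(values: list[str]) -> list[list[str]]:
    """Split a column into fixed-size partitions in insertion order."""
    return [values[i:i + PARTITION_SIZE] for i in range(0, len(values), PARTITION_SIZE)]


def build_local_index(partition: list[str]) -> dict[str, int]:
    """Build a bitmap index that covers only this partition (a local index)."""
    bitmaps: defaultdict[str, int] = defaultdict(int)
    for offset, value in enumerate(partition):
        bitmaps[value] |= 1 << offset
    return dict(bitmaps)


def search(local_indexes: list[dict[str, int]], value: str) -> list[int]:
    """Probe every local index and map partition offsets back to global row ids."""
    hits = []
    for part_no, index in enumerate(local_indexes):
        bitmap = index.get(value, 0)
        offset = 0
        while bitmap:
            if bitmap & 1:
                hits.append(part_no * PARTITION_SIZE + offset)
            bitmap >>= 1
            offset += 1
    return hits


levels = ["INFO", "ERROR", "INFO", "WARN", "INFO", "INFO", "ERROR", "INFO", "WARN"]
partitions = partition_column(levels)

# Partitions are independent, so their local indexes can be built concurrently.
with ThreadPoolExecutor(max_workers=4) as pool:
    local_indexes = list(pool.map(build_local_index, partitions))

print(search(local_indexes, "ERROR"))  # [1, 6]
```

A global (or semi-global) index would avoid the per-partition probe loop in `search`, at the cost of maintaining one large shared structure on every insert.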

Compression Technology
RDBMS
• Data compression is generally not a concern.
* This is because the number of target records is generally kept at a roughly constant level, so disk usage is not a major issue.
• The basic idea is to increase search performance at the expense of disk space.
• In many cases, customers are not interested in data compression itself.
• Even where compression is available, it faces the following challenges:
• The B+Tree structure duplicates data, increasing disk usage
• The row-oriented layout makes it difficult to exploit duplication in the data
InfiniFlux
• Disk usage is very important because InfiniFlux assumes an environment where large amounts of data are generated.
• Compresses data twice in real time, which makes efficient use of disk space
• Logical compression
• Compresses duplicated column values using a dictionary structure
• Logically compresses duplicated bit strings in bitmap indexes at a high ratio
• Physical compression
• Physically compresses partition pages in real time as they are written to disk
• As the number of indexes increases, disk usage grows slowly rather than linearly.
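The two compression stages can be sketched in Python as follows, assuming dictionary encoding for the logical stage and zlib (DEFLATE) standing in for whatever codec InfiniFlux actually applies to pages. `dictionary_encode`, `dictionary_decode`, and `compress_page` are hypothetical names, and compression of bitmap bit strings is not shown.

```python
import zlib


def dictionary_encode(column: list[str]) -> tuple[list[str], list[int]]:
    """Logical compression: replace each value with a small integer code."""
    dictionary: list[str] = []
    codes_by_value: dict[str, int] = {}
    codes: list[int] = []
    for value in column:
        if value not in codes_by_value:
            codes_by_value[value] = len(dictionary)
            dictionary.append(value)
        codes.append(codes_by_value[value])
    return dictionary, codes


def dictionary_decode(dictionary: list[str], codes: list[int]) -> list[str]:
    return [dictionary[code] for code in codes]


def compress_page(codes: list[int]) -> bytes:
    """Physical compression: pack codes into a page and deflate it with zlib.

    Assumes fewer than 256 distinct values so each code fits in one byte;
    bytes() raises ValueError otherwise.
    """
    page = bytes(codes)
    return zlib.compress(page)


# A highly duplicated log column, typical of time series log data.
column = ["GET /index.html", "GET /login", "GET /index.html"] * 1000
dictionary, codes = dictionary_encode(column)
compressed = compress_page(codes)

raw_size = sum(len(v.encode("utf-8")) for v in column)
print(len(dictionary), raw_size, len(compressed))
```

Because log columns repeat a handful of values, the dictionary stays tiny and the code stream is highly repetitive, which is what lets the physical stage shrink it further.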

Full Text Search
RDBMS
• In general, databases do not support full-text search.
• They provide alternatives such as the LIKE operator.
• Partial matches within a column are found with LIKE '%pattern%'. This requires a full scan of all records and therefore performs poorly.
• An RDBMS cannot practically be used for full-text search.
InfiniFlux
• Provides a keyword index, making full-text search available on designated VARCHAR columns.
• "SEARCH" can be used instead of "LIKE".
• SELECT id FROM user_table WHERE address SEARCH 'Texas';
• Supports searching text in UTF-8.
• English characters and multibyte characters (Chinese/Japanese/Korean) are processed differently.
• English
• Words are separated at special characters (e.g., "boy" and "boys" are different words)
• CJK (Chinese, Japanese, and Korean)
• Searched using the 2-gram method
• e.g., the word '강남구' is split into '강남' and '남구' for processing
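The tokenizing rules above can be sketched as a small inverted index in Python. The CJK Unicode ranges, the AND semantics for multi-token queries, and the names `is_cjk`, `tokenize`, and `InvertedIndex` are assumptions for illustration; InfiniFlux's actual keyword index may differ.

```python
import re
from collections import defaultdict


def is_cjk(ch: str) -> bool:
    """Rough check for Chinese, Japanese, or Korean characters (assumed ranges)."""
    code = ord(ch)
    return (
        0x4E00 <= code <= 0x9FFF      # CJK Unified Ideographs
        or 0x3040 <= code <= 0x30FF   # Hiragana and Katakana
        or 0xAC00 <= code <= 0xD7A3   # Hangul syllables
    )


def tokenize(text: str) -> list[str]:
    tokens: list[str] = []
    # Split on anything that is not a letter or digit ("special characters").
    for word in re.split(r"[\W_]+", text):
        if not word:
            continue
        if len(word) > 1 and any(is_cjk(ch) for ch in word):
            # 2-gram: '강남구' -> ['강남', '남구']
            tokens.extend(word[i:i + 2] for i in range(len(word) - 1))
        else:
            # English words are kept whole, so 'boy' and 'boys' stay distinct.
            tokens.append(word)
    return tokens


class InvertedIndex:
    def __init__(self) -> None:
        self.postings: dict[str, set[int]] = defaultdict(set)

    def add(self, row_id: int, text: str) -> None:
        for token in tokenize(text):
            self.postings[token].add(row_id)

    def search(self, query: str) -> set[int]:
        """Rows containing every token of the query (AND semantics)."""
        tokens = tokenize(query)
        if not tokens:
            return set()
        result = set(self.postings.get(tokens[0], set()))
        for token in tokens[1:]:
            result &= self.postings.get(token, set())
        return result


index = InvertedIndex()
index.add(1, "Austin, Texas")
index.add(2, "서울시 강남구")
index.add(3, "Dallas-Texas boys club")

print(sorted(index.search("Texas")))   # [1, 3]
print(sorted(index.search("강남구")))  # [2]
print(sorted(index.search("boy")))     # []
```

Matching every 2-gram does not guarantee that the grams are adjacent in the original text, so a production engine would typically verify the candidate rows.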

Concurrency Level
RDBMS
• Generally provides record-level locking
• Two options, depending on how records that are being updated are read:
• Consistent read
• Reads the previous value of a record that is being updated.
• No lock conflicts, so no waiting
• Non-consistent read (record lock conflicts occur)
• Waits until the transaction updating the record commits or rolls back.
• Lock conflicts occur, so a reader may wait indefinitely, depending on the preceding transaction.
InfiniFlux
• Lock-free (lockless) structure
• No lock conflicts, because data is never updated
• No locks on records, which maximizes data search and analysis performance

Time Series Data Analysis
[Figure: sliding memory windows moving over data files File-1 through File-7 as data is inserted; axis from current time to old time]
RDBMS
• Time series data is stored using temporal data types (date, time, timestamp, and interval).
• The RDBMS does not recognize such data as time series; it treats it like any other general data (number, varchar, etc.).
• For this reason, no dedicated facilities are provided for analyzing data along the time axis.
• Analysis using a time index is slow.
InfiniFlux
• As data is inserted, physical partitions are created in sequential order.
• Because of this sequential order, records in a given time range can be accessed directly.
• On insertion, each record's timestamp is automatically stored with nanosecond precision in the hidden column _arrival_time.
• This hidden column allows time-based operations on the data to be performed freely.
• e.g., Select data from table where DURATION 10 minute (outputs the data within 10 minutes)
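The effect of the hidden `_arrival_time` column and a `DURATION`-style query can be sketched in Python. Here `DURATION n minute` is assumed to mean "arrived within the last n minutes". The class `TimeSeriesTable`, its `duration` method, and the injectable clock are hypothetical, and a single sorted list stands in for InfiniFlux's sequential physical partitions.

```python
import bisect
import time
from typing import Callable, Optional

NANOS_PER_MINUTE = 60 * 1_000_000_000


class TimeSeriesTable:
    """Hypothetical table that stamps each row with a hidden _arrival_time."""

    def __init__(self, clock: Callable[[], int] = time.time_ns) -> None:
        self._clock = clock                  # injectable so the example is deterministic
        self._arrival_times: list[int] = []  # hidden column, nanoseconds, kept sorted
        self._rows: list[tuple] = []

    def insert(self, row: tuple) -> None:
        stamp = self._clock()
        # Rows are appended in arrival order; never let a clock step backwards
        # break the sorted order that range lookups rely on.
        if self._arrival_times and stamp < self._arrival_times[-1]:
            stamp = self._arrival_times[-1]
        self._arrival_times.append(stamp)
        self._rows.append(row)

    def duration(self, minutes: int, now: Optional[int] = None) -> list[tuple]:
        """Return rows whose _arrival_time lies in [now - minutes, now]."""
        now = self._clock() if now is None else now
        start = now - minutes * NANOS_PER_MINUTE
        first = bisect.bisect_left(self._arrival_times, start)
        last = bisect.bisect_right(self._arrival_times, now)
        return self._rows[first:last]


# Deterministic fake clock: insert events at minutes 0, 5, 12 and 18.
fake_now = [0]
table = TimeSeriesTable(clock=lambda: fake_now[0])
for minute, message in [(0, "boot"), (5, "warn"), (12, "error"), (18, "ok")]:
    fake_now[0] = minute * NANOS_PER_MINUTE
    table.insert((message,))

fake_now[0] = 20 * NANOS_PER_MINUTE
print(table.duration(10))  # [('error',), ('ok',)]
```

Because rows arrive in time order, locating a time range costs two binary searches instead of a full scan.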

Backup and Restore
RDBMS
Backup: backs up data in a specific table or database area to external storage
• Online backup is the basic strategy for storing data
• Incremental backup is provided separately to reduce the time and cost of backup
Restore: the process of returning backed-up data to the original database
• Backup files cannot be used directly; they must go through the restore process
• Restoring large backup files takes a long time
• Data can be backed up as of a certain point in time, but changing the starting point is complicated
InfiniFlux
Backup
• Backs up the whole database as of a certain point in time
• Cannot back up a specific table or record
• Backup is written as a single file or as a directory of multiple files
Restore
• Restores the whole set of backup files
• Overwrites the existing database

Backup and Mount
What is mount?
• Instead of restoring the backup database files, mounting loads only the metadata and then makes the database contents available for search.
• Similar in concept to mounting a disk in UNIX
• Backup data from a given point in time can be accessed directly, quickly, and efficiently.
RDBMS
• Not supported
InfiniFlux
• MOUNT
• Mounts the backup data from 31 December 2014
e.g.) MOUNT DATABASE '/home/data/2014-12-31'
• UNMOUNT
• Unmounts the mounted data
e.g.) UNMOUNT DATABASE '/home/data/2014-12-31'

The World's Fastest
Time Series DBMS
for IoT and Big Data
www.infiniflux.com
info@infiniflux.com
InfiniFlux
