Real-time in Big Data
Big Data
“Every two days now we create as much information
as we did from the dawn of civilization up until 2003.”
Eric Schmidt, former Google CEO

Real Time
“85% of respondents say the issue is not about volume
but the ability to analyze and act on data in real time”
Capgemini study on Big Data, 2012

Fast Data
“It’s About Fast (not just Big) Data”
Karl Keirstead, BMO Capital Markets 2013
Real-time analytics on Big Data is becoming
essential for business survival
Fraud prevention
Algo trading

A/B-Testing

Campaign steering

Interactive Analytics
App analytics
Recommendation engine
Trading risk analytics

Algorithmic decisions

Network monitoring

Realtime

Network Data
Web Logs

M2M

Sensors

Shopping Cart

Programmatic ad-serving

Big Data
Twitter

Point of Sale Data

Stock Data

Logistics

Locations

Car Data

Financial TX
Real Time?
Immediate Answers
Immediate Availability
Immediate Answers & Availability
[Chart: use cases plotted by response time (answers) against lag time (availability)]
Response time (answers): < 1..10 ms, 10..100 ms, 1 sec, 10 sec, 1 min, 10 min, 1 h
Lag time (availability): weekly, daily, hourly, every minute, every second
Regions: Batch Import, Continuous Import, Post-mortem Analytics, Online Investigation, Interactive Analytics, Real-Time, Automatic response systems
Use cases shown:
● Offer caches
● Ad serving
● Re-targeting
● Trading analytics
● Recommendation / promotional items
● Smart grids
● Guided shopping
● SEO analytics
● Fraud detection
● Investment risk analytics
● Campaign control
● Application monitoring
● Geo-spatial analytics
● Trend spotting
● Web analytics
● Geo-steering
● Customer account analytics
● Revenue assurance
● Prepaid accounts
● Customer churn rate reduction
USE CASES IN ALL INDUSTRIES
Many Applications, All Industries

eCommerce Services
• Facetted search
• Web analytics
• SEO analytics
• Online advertising

Social Networks
• Ad serving
• Profiling
• Targeting

Telco
• Customer attrition prevention
• Network monitoring
• Targeting
• Prepaid account management

Finance
• Trend analysis
• Fraud detection
• Automatic trading
• Risk analysis

Energy, Oil and Gas
• Smart metering
• Smart grids
• Wind parks
• Mining
• Solar panels

Many More
• Production
• Mining
• M2M
• Sensors
• Genetics
• Intelligence
• Weather

Real-time Requires New Technology
1. Immediate Availability
2. Billion Records
3. Immediate Answers
4. Interactive Analytics
5. Geo-Distributed Processing
6. Low TCO
[Diagram labels: Continuous Data Import (Any Stream, Any Bus, Any File); Realtime Big Data Engine; Ultra-fast Querying; Real-Time Monitoring; Real-Time Dashboarding; Interactive Analytics]
Web-Analytics
etracker is a leading web-analytics and campaign
steering company in Europe
• Real-time web analytics for 50,000 domains delivering 10 billion web clicks
• Continuous data import with a maximum latency of 30 seconds
• Complex interactive analytics for live segmentation of customer groups
• < 2 sec query response time for > 100 concurrent interactive users
• Campaign steering: moving from trial and error to continuous multidimensional optimization
Gas Turbines
ParStream imports 500,000 sensor readings per second,
delivering real-time monitoring and long-term analytics
• 5,000 sensors deliver 1,800,000,000 measurements per hour
• ParStream immediately imports and stores all sensor readings
• Real-time monitoring with ParStream ensures early issue identification
• Long-term analytics for predictive maintenance reduces downtime
• Maintenance of gas turbines is a more lucrative business than the initial build
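The two throughput figures on this slide describe the same load, and a short check makes that explicit. The sketch below uses only the numbers quoted above; the variable names are illustrative, and the per-sensor rate is simply derived from them rather than stated in the slides.

```python
# Figures quoted on the slide.
READINGS_PER_SECOND = 500_000
SENSORS = 5_000
SECONDS_PER_HOUR = 3_600

# 500,000 readings/sec sustained for one hour.
readings_per_hour = READINGS_PER_SECOND * SECONDS_PER_HOUR

# Derived value (not stated on the slide): average rate per sensor.
readings_per_sensor_per_second = READINGS_PER_SECOND // SENSORS

print(f"{readings_per_hour:,} readings per hour")
print(f"{readings_per_sensor_per_second} readings per sensor per second")
```

Under these figures each sensor reports on average 100 times per second, which is the rate the import path has to sustain continuously.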
FMCG Retailer
ParStream extends use of a QlikView installation
from 400M to 6B records for interactive analytics
• The customer is the leading retail chain in Austria and a long-term QlikView customer
• POS data analytics is heavily used for price negotiations with vendors
• QlikView is easy to use and ultra-fast, but limits data volume to 400M records
• Limited volume, time range and granularity of data hinder negotiations
• ParStream extends the QlikView time range from 2 weeks to 6 months of data
• Further extension to 30 billion records is planned, to cover 2.5 years of data
Telecom
End-to-end network monitoring at packet-level detail
reveals bottlenecks unseen for decades
• Continuous import with > 1 million rows per second per node
• Packet-level granularity delivers previously impossible insights
• Field trial discovered a bottleneck nobody expected, saving a billion-dollar investment
• Decentralized architecture capturing, storing and analyzing data at the source
• Massive reduction in network traffic due to decentralized storage
• Solution is a blueprint for Internet-of-Things use cases
[Diagram labels: Network Analytics, NPI Analytics, Analytics, CRM/CEM Analytics, M2M Analytics; Ad-hoc integration; Federation Server; Cache, Logical data warehouse, NoSQL; Decentralized storage & analytics; five Local NDC nodes]
SEO Analytics at Searchmetrics
Interactive domain traffic competitor report & analysis
• Source: Google Search, first 100 domains for 10 million keywords in 10 countries
• 10,000,000,000 domain-keyword relations (10 billion records, 7 TB import)
• Keyword analysis of competitor domains
• Complex correlative SQL queries from many concurrent users, in real time
• < 1 sec response time
• Reduction from 150 to 4 servers
[Diagram labels: Google Search, Application Server]
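The record count on this slide follows directly from the crawl dimensions, which a short calculation can confirm. This is a sketch using only the quoted figures; it assumes the top 100 domains are stored per keyword and per country, and that 7 TB means 7 × 10^12 bytes. The average record size is a derived value, not a number from the slides.

```python
# Crawl dimensions quoted on the slide.
DOMAINS_PER_KEYWORD = 100
KEYWORDS = 10_000_000
COUNTRIES = 10

# Assumption: one relation per (domain, keyword, country) combination.
relations = DOMAINS_PER_KEYWORD * KEYWORDS * COUNTRIES

# Assumption: "7 Tbyte" is read as decimal terabytes.
IMPORT_BYTES = 7 * 10**12
bytes_per_record = IMPORT_BYTES / relations

print(f"{relations:,} domain-keyword relations")
print(f"{bytes_per_record:.0f} bytes per record on average")
```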
Bio-Technology
INRA MetaGenoPolis (MGP) analyzes 17 billion
records interactively, growing 100x per year
• INRA is the world leader in metagenomic research
• Up to 50 million different bacteria are identified per stool sample
• Sample size will grow by 100x over the next 12 months
• Data volume will grow from 17 billion to 2 trillion records
• Researchers analyze the correlation of bacteria presence with illnesses
• ParStream is used to interactively discover and analyze correlations
Science: Climate Research
Detection of Hurricane Risk Areas
• Interactive analytics of weather simulation data
• Response time 0.1 sec on 3 billion data records
• Multi-dimensional querying on geo-location data
• Run complex queries in-database at very high speed
• No need for cubes: up-to-date data at full granularity
• Continuously import new data with low latency
Facetted Search
Coface Services is the Innovation Leader
in reliable Business Information
• Interactive guided selection process delivers better conversion rate
• Multi-lingual text search and numeric multiple-choice filters
• 15 billion data points
• 1,000 Coface columns + 10,000 customer columns
• > 100 concurrent users
• < 100 ms response time
Needs vs. Reality
You want…                                  What you get…
Scales on big data and big streams         Does not scale (traditional DBMS)
Sub-second queries, high-speed import      Too slow (Hadoop, MapReduce)
Fully flexible, fully granular             Inflexible (Cassandra, KVS)
ParStream Is Built for Fast Data
ParStream is the fastest real-time database for smart data
• Continuous import
• Ultra-fast querying
• High query throughput
• Billions of records
• Thousands of columns
Unique combination of continuous high-speed import and
ultra-fast query response times
Outstanding Technology with USP:
High Performance Compressed Index
• Patented high performance compressed index (USP)
• Built from scratch in C++
• 100% own patented IP
• Leading-edge DB architecture
• Massively parallel shared-nothing cluster architecture
• Optimized for standard hardware and many Linux distributions
• Runs on a single server, a cluster and all clouds
[Architecture diagram labels: Front-End, Application, Tool; C++ UDF API; SQL API / JDBC / ODBC; Real-Time Analytics Engine; In-Memory and Disk Technology; Massively Parallel Processing (MPP); High Performance Compressed Index (HPCI); Multi-Dimensional Partitioning; Shared Nothing Architecture; 3rd generation Columnar Storage; High Speed Loader with Low Latency; Map-Reduce, RDBMS, Raw-Data]
High Performance Compressed Index (HPCI)
Massive Performance Gain On Analytical Operations –
Major Technological Innovation and Differentiation
Standard index architecture

– High Memory Requirements
– High Load on CPUs
– Latency due to Decompression
– Not Suitable for Big Data
Superior ParStream index architecture

+ Immediate Query Processing
+ No Need for Decompression
+ Massively reduced memory and I/O load
+ Ultra-high Throughput
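The slides do not describe how HPCI works internally, so the following is only a generic illustration of the claim "no need for decompression", not ParStream's patented index. It assumes a run-length-encoded bitmap index: one bitmap per column value, stored as (bit, run_length) pairs, with a filter such as DestState='NY' AND DayOfWeek=4 evaluated by ANDing the runs directly. All function names and sample rows are hypothetical.

```python
from itertools import groupby


def rle_encode(bits):
    """Compress a sequence of 0/1 flags into (bit, run_length) pairs."""
    return [(bit, len(list(group))) for bit, group in groupby(bits)]


def rle_and(a, b):
    """AND two run-length-encoded bitmaps of equal total length, run by run.

    The inputs are never expanded back into one flag per row.
    """
    result = []
    i = j = 0
    rem_a = a[0][1] if a else 0
    rem_b = b[0][1] if b else 0
    while i < len(a) and j < len(b):
        step = min(rem_a, rem_b)          # rows covered by both current runs
        bit = a[i][0] & b[j][0]
        if result and result[-1][0] == bit:
            result[-1] = (bit, result[-1][1] + step)  # extend the last run
        else:
            result.append((bit, step))
        rem_a -= step
        rem_b -= step
        if rem_a == 0:
            i += 1
            rem_a = a[i][1] if i < len(a) else 0
        if rem_b == 0:
            j += 1
            rem_b = b[j][1] if j < len(b) else 0
    return result


def rle_count(runs):
    """Number of matching rows, computed on the compressed form."""
    return sum(length for bit, length in runs if bit)


# Hypothetical sample rows (not from the OTP data set).
dest_state = ["NY", "NY", "CA", "NY", "NY", "NY", "CA", "CA"]
day_of_week = [4, 4, 4, 1, 4, 4, 4, 1]

dest_ny = rle_encode([int(s == "NY") for s in dest_state])
dow_4 = rle_encode([int(d == 4) for d in day_of_week])

matches = rle_and(dest_ny, dow_4)
print(matches, rle_count(matches))
```

The work done by `rle_and` scales with the number of runs rather than the number of rows, which is one plausible reading of the memory and I/O savings listed above.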
Highly Scalable
Standard Hardware + Standard Linux
Embedded Systems, Single Server, Cluster, Cloud
Real-time Query Performance
Query Response Time: ParStream (PS) vs. Redshift (RS)
Q#    RS (ms)    PS (ms)    Factor
1     7086       129        55
2     7797       264        29
3     8036       313        25
4     7949       381        20
[Bar chart: query response time in ms (0 to 9000) per query #, ParStream vs. Redshift]

QUERY

1

select count(distinct AirlineID) as airlines, count(distinct FlightNum) from otp
where YearD BETWEEN 1997 AND 2012 AND DestState='NY' AND Quarter=3 AND DayOfWeek=4 AND OriginState='FL'

2

select count(distinct AirlineID) as airlines, count(distinct FlightNum), sum(Distance) from otp
where YearD BETWEEN 1997 AND 2012 AND DestState='NY' AND Quarter=3 AND DayOfWeek=4 AND OriginState='FL'

3

select count(distinct AirlineID) as airlines, count(distinct FlightNum), count(distinct Distance), sum(Distance) from otp
where YearD BETWEEN 1997 AND 2012 AND DestState='NY' AND Quarter=3 AND DayOfWeek=4 AND OriginState='FL'

4

select max(TaxiIn), sum(DepDelayMinutes), min(TaxiIn), avg(ArrDelayMinutes) from otp
where YearD BETWEEN 1997 AND 2012 AND DestState='NY' AND Quarter=3 AND DayOfWeek=4 AND OriginState='FL'

Environment: single EC2 XL node with 15 GB RAM and 2 TB disk on Amazon AWS.
OTP data set with about 150 million records.
Comparisons with leading analytical databases are available on request.
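To see what the benchmark queries compute, the shape of query 1 can be tried on a tiny table. This is a minimal sketch, not a reproduction of the benchmark: SQLite stands in for the database, the table holds only the columns query 1 touches, and the six rows are synthetic values invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE otp (
        YearD INTEGER, Quarter INTEGER, DayOfWeek INTEGER,
        AirlineID INTEGER, FlightNum INTEGER,
        OriginState TEXT, DestState TEXT
    )
""")

# Synthetic rows (assumption, not OTP data).
rows = [
    (2001, 3, 4, 100, 11, "FL", "NY"),  # matches
    (2005, 3, 4, 100, 12, "FL", "NY"),  # matches, same airline as above
    (2010, 3, 4, 200, 11, "FL", "NY"),  # matches, flight number seen before
    (2010, 1, 4, 300, 13, "FL", "NY"),  # filtered out: wrong quarter
    (1995, 3, 4, 400, 14, "FL", "NY"),  # filtered out: year out of range
    (2010, 3, 4, 500, 15, "CA", "NY"),  # filtered out: wrong origin state
]
conn.executemany("INSERT INTO otp VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

# Query 1 from the slide, unchanged apart from layout.
airlines, flights = conn.execute("""
    SELECT COUNT(DISTINCT AirlineID) AS airlines, COUNT(DISTINCT FlightNum)
    FROM otp
    WHERE YearD BETWEEN 1997 AND 2012 AND DestState = 'NY' AND Quarter = 3
      AND DayOfWeek = 4 AND OriginState = 'FL'
""").fetchone()

print(airlines, flights)
conn.close()
```

All four benchmark queries share the same five-column filter and differ only in the aggregates, so they mainly stress filtering and distinct counting over the roughly 150 million rows.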
ParStream – real-time demo

Try out the interactive ParStream demo on https://www.parstream.com/product/demos/
ParStream – The Company
• Founded 2008 in Cologne
• 50 employees in Cologne, Paris, Silicon Valley, Boston

• International Customers
• Running 24x7 in production for more than 3 years
• $15.6M funding: Khosla Ventures (lead), Andy Bechtolsheim,
CrunchFund, Data Collective, Baker Capital, Tola Capital, and others
Thank you
Yes, we are hiring

Joerg.bienert@parstream.com

ParStream - Big Data for Business Users
