A 10,000 ft overview of the options for storing the values of 250,000 sensors from a paint shop in the automotive sector; a presentation for the management of one of our customers.
Big Data in Production Environments
Copyright by Brockhaus GmbH, all rights reserved, unauthorized reproduction prohibited
Proposal for applying modern concepts of data storage and analytics to production data
Current situation
Current situation
Just the Numbers
✔ Approx. 270,000 sensors in one installation (AUDI Györ)
(but only 17,000 sensors are currently tracked)
✔ Many 'unsynchronized' control desks, or rather their data
✔ A lot of duplicated data
(because of the 'home-grown' failover/replication concept)
✔ No historical data
(because the amount of data is overwhelming and can't be handled)
✔ Problems with scalability
Current situation
Outdated technologies
✔ Trend Server is developed in Delphi: Who develops in that?
✔ Microsoft SQL Server: not fast enough
✔ Mismatches between the several technologies in use
Bottlenecks
✔ Queries are slow for more than 750k events
✔ No more than 7500 CSV files
✔ CSV and SQL Server used for the same tasks
Current situation
Scalability and fault tolerance
✔ Only a few sensors can be saved
✔ IOM synchronization problems
✔ Buffered data is saved with the same timestamp
✔ Different IOMs save the same data with different timestamps
No integration / standalone application
✔ Data cannot be accessed from every place (control desk)
✔ Data cannot be recorded in case of failure
Big data and NoSQL
Why the relational model
… sometimes isn't enough
✔ Cannot handle extremely large amounts of data (in the extreme: 15 petabytes of data at the Government of India)
✔ Hard to scale (esp. scaling out: adding nodes to handle the load)
✔ Hard to deal with 'unstructured' data due to the strict data model
✔ The valuable transactional model is sometimes overkill
Dealing with data
… an awful lot of data
… petabytes
… and even beyond that
plus NoSQL
Dealing with data
Characteristics and value proposition
✔ Big Data gains momentum
(data generates value)
➢ High data velocity
➢ Data variety
➢ Data volume
➢ Data complexity
✔ Continuous availability
✔ Data location independence
✔ Flexible data model (schemaless databases)
✔ Improved architecture and enhanced analytics
Problems with NoSQL
Document-oriented: MongoDB, CouchDB
Column store: Bigtable, HBase
Key-value: Cassandra, DynamoDB, Azure Table Storage, Riak, BerkeleyDB
Graph: Neo4j
Many players, several concepts, no one-size-fits-all approach and no standards
Why Cassandra?
Because people with the same problems have chosen it ...
“I can create a Cassandra cluster
in any region of the world in 10 minutes.
When marketing decides we want to move
into a certain part of the world, we’re ready.”
Why Cassandra
Scalability
✔ Add nodes to scale
✔ Millions of operations
✔ Low latency in read/write operations
Why Cassandra
Availability
✔ Created to be distributed
✔ Resilient to failures
✔ Different data centers (possibly in different parts of the world)
Why Cassandra
Replication
Why Cassandra
Sometimes things go wrong:
✔ Hardware failures
✔ Bugs
✔ Power outages
✔ Natural disasters
and then...
✔ Fast node recovery
✔ Automatic rebalancing when a node fails
✔ Transparent to the client
Why Cassandra
Easy to use
✔ Large ecosystem
✔ Well documented
✔ Full Java support
✔ SQL-like syntax
INSERT INTO sensor_by_day
(sensor_id, date, event_time, value)
VALUES
('1234ABCD', '2013-04-03', '2013-04-03 07:01:00', '72F');
Time Series in Cassandra
Cassandra can store up to 2 billion columns per row,
but if we store a value every millisecond, a single row
would not even hold a month's worth of data.
The solution is a pattern called row partitioning:
adding data (such as the date) to the row key limits
the number of columns per device.
Almost no limits!
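The arithmetic behind this slide can be checked with a minimal sketch. It is plain Python, not Cassandra driver code; the function names are hypothetical, and the choice of a daily bucket is an assumption taken from the `sensor_by_day` example.

```python
from datetime import datetime, timezone
from typing import Tuple

MAX_COLUMNS_PER_ROW = 2_000_000_000  # limit quoted on the slide
MS_PER_DAY = 24 * 60 * 60 * 1000     # one column per millisecond


def days_until_row_is_full() -> float:
    """Days of millisecond data that fit into one unpartitioned row."""
    return MAX_COLUMNS_PER_ROW / MS_PER_DAY


def row_key(sensor_id: str, event_time: datetime) -> Tuple[str, str]:
    """Hypothetical partitioned row key: the sensor id plus the calendar day."""
    return (sensor_id, event_time.strftime("%Y-%m-%d"))


days = days_until_row_is_full()
key = row_key("1234ABCD", datetime(2013, 4, 3, 7, 1, 0, tzinfo=timezone.utc))
print(f"unpartitioned row is full after {days:.1f} days")
print(f"partitioned row key: {key}, at most {MS_PER_DAY} columns per row")
```

With a daily bucket, each row holds at most 86,400,000 columns, well below the limit, and a new row starts automatically every day.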
Data analysis goals
✔ Low latency (interactive) queries on historical data: enable
faster decisions
✔ Low latency queries on live data (streaming): enable
decisions on real-time data
✔ Sophisticated data processing:
enable “better” decisions
(e.g. anomaly detection,
trend analysis)
Spark ecosystem
Well integrated with Cassandra and includes:
✔ SQL-like interface
✔ Machine learning:
Algorithms that can learn from data, used for predictions
(predictive maintenance: exploit patterns found in historical and
transactional data to identify risks and opportunities)
✔ Streaming:
Processing of real-time streaming data, e.g. with sliding windows
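To make the sliding-window idea concrete, here is a minimal sketch in plain Python. It does not use the Spark Streaming API; the function name, the window size and the sample readings are all assumptions for illustration.

```python
from collections import deque
from typing import Iterable, Iterator


def sliding_window_mean(values: Iterable[float], window: int) -> Iterator[float]:
    """Yield the mean of the last `window` values once the window is full."""
    buf = deque(maxlen=window)
    total = 0.0
    for v in values:
        if len(buf) == window:
            total -= buf[0]  # this value is evicted by the append below
        buf.append(v)
        total += v
        if len(buf) == window:
            yield total / window


# Hypothetical sensor readings arriving one after another
readings = [70.0, 72.0, 74.0, 73.0, 71.0]
means = list(sliding_window_mean(readings, window=3))
print(means)
```

A streaming engine applies the same principle to time-based windows over an unbounded stream rather than to a fixed list.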
Use Cases
Use Case
✔ Data from the oven is collected
✔ Cassandra stores it sequentially
✔ TrendPage reads it sequentially for faster chart creation
Use Case
Data model to support queries
✔ Store data per oven
✔ Store time series in order: first to last
Queries needed
✔ Get all data for one oven
✔ Get data for a single date and time
✔ Get data for a range of dates and times
Cassandra is really good for time-series data
because you can write one column for each period
in your series and then query across a range of time
using sub-string matching.
This is best done using columns for each period
rather than rows, as you get huge IO efficiency
wins from loading only a single row per query.
– MyDrive Telemetry (15 billion records on average)
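The access pattern behind these queries can be sketched with an in-memory stand-in. This is plain Python rather than Cassandra; the class `OvenTimeSeries`, its method names and the sample values are hypothetical. The point is only that one time-ordered row per oven turns a range query into a single contiguous slice.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime


class OvenTimeSeries:
    """Hypothetical in-memory stand-in: one time-ordered row per oven."""

    def __init__(self):
        self._times = {}   # oven_id -> sorted list of event times
        self._values = {}  # oven_id -> values, parallel to _times

    def store(self, oven_id, event_time, value):
        times = self._times.setdefault(oven_id, [])
        values = self._values.setdefault(oven_id, [])
        i = bisect_right(times, event_time)  # keep the row ordered first to last
        times.insert(i, event_time)
        values.insert(i, value)

    def all_for(self, oven_id):
        """Get all data for one oven."""
        return list(zip(self._times.get(oven_id, []), self._values.get(oven_id, [])))

    def between(self, oven_id, start, end):
        """Get data for a range of dates and times (both ends inclusive)."""
        times = self._times.get(oven_id, [])
        values = self._values.get(oven_id, [])
        lo = bisect_left(times, start)
        hi = bisect_right(times, end)
        return list(zip(times[lo:hi], values[lo:hi]))

    def at(self, oven_id, event_time):
        """Get data for a single date and time."""
        return self.between(oven_id, event_time, event_time)


series = OvenTimeSeries()
series.store("oven-1", datetime(2013, 4, 3, 7, 2), 181.5)
series.store("oven-1", datetime(2013, 4, 3, 7, 0), 180.0)
series.store("oven-1", datetime(2013, 4, 3, 7, 1), 180.7)
series.store("oven-2", datetime(2013, 4, 3, 7, 0), 95.0)

window = series.between("oven-1", datetime(2013, 4, 3, 7, 0), datetime(2013, 4, 3, 7, 1))
print(window)
```

Because the data for one oven is stored in time order, the range query reads neighbouring entries only and never touches the other ovens.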
Time Series in Cassandra
The data model
✔ Row key is a time identifier
✔ Column values are events
✔ Column values are measurements
✔ Rows can be very wide
1 s schema
✔ Faster data storage in the database
1 min schema
✔ Avoids network overload
✔ Data can be compressed (prior to sending)
✔ Extra data such as min, max and avg can be calculated before storing
✔ Increases data retrieval speed
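The 1 min schema can be illustrated with a minimal sketch, assuming the per-second samples are grouped on the sending side before they are written. It is plain Python; the function name, the summary fields and the sample values are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def summarize_per_minute(samples):
    """Group (event_time, value) samples by minute and precompute min, max, avg."""
    buckets = defaultdict(list)
    for event_time, value in samples:
        minute = event_time.replace(second=0, microsecond=0)
        buckets[minute].append(value)
    return {
        minute: {"min": min(vs), "max": max(vs), "avg": mean(vs), "count": len(vs)}
        for minute, vs in sorted(buckets.items())
    }


# Hypothetical per-second readings from one sensor
samples = [
    (datetime(2013, 4, 3, 7, 1, 0), 72.0),
    (datetime(2013, 4, 3, 7, 1, 1), 74.0),
    (datetime(2013, 4, 3, 7, 1, 59), 73.0),
    (datetime(2013, 4, 3, 7, 2, 0), 71.0),
]
summary = summarize_per_minute(samples)
for minute, stats in summary.items():
    print(minute, stats)
```

One write per minute replaces up to sixty writes per sensor, and a trend view can read the precomputed min, max and avg directly instead of scanning every sample.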
Architectural options
Architectural options
Unreplicated databases
Architectural options
Redundant and replicated databases
Architectural options
Replicated databases plus analytics
What is next
Discussion
Enough propaganda ... Get in touch!
Contact information:
Brockhaus Consulting GmbH
Gustav Stresemann Ring 1
D - 65189 Wiesbaden
Germany
Phone: +49-611-97774-332
Fax: +49-611-97774-432
Web: www.brockhaus-gruppe.de
Mail: office@brockhaus-gruppe.de