Big Data in Production Environments

A 10,000-foot overview of the options for storing the values of 250,000 sensors from a paint shop in the automotive sector; a presentation for the management of one of our customers.


  1. Proposal for establishing modern concepts of data storage and analytics for production data
     Copyright by Brockhaus GmbH, all rights reserved, unauthorized reproduction prohibited
  2. Current situation
  3. Current situation: just the numbers
     ✔ Approx. 270,000 sensors in one installation (AUDI Györ), but only 17,000 sensors are currently tracked
     ✔ Many 'unsynchronized' control desks and their respective data
     ✔ A lot of duplicated data (because of the 'home-grown' failover/replication concept)
     ✔ No historical data (because the amount of data is overwhelming and can't be handled)
     ✔ Problems with scalability
  4. Current situation
     Outdated technologies
        ✔ Trend Server is developed in Delphi: who still develops in that?
        ✔ Microsoft SQL Server: not fast enough
        ✔ Technology breaks between the several technologies in use
     Bottlenecks
        ✔ Queries are slow for more than 750k events
        ✔ No more than 7,500 CSV files
        ✔ CSV files and SQL Server are used for the same tasks
  5. Current situation
     Scalability and fault tolerance
        ✔ Only a few sensors can be saved
        ✔ IOM synchronization problems
        ✔ Buffered data is saved with the same timestamp
        ✔ Different IOMs save the same data with different timestamps
     No integration / standalone application
        ✔ Data cannot be accessed from every place (control desk)
        ✔ Data cannot be recorded in case of failure
  6. Big data and NoSQL
  7. Why the relational model … sometimes isn't enough
     ✔ Can't handle extremely large amounts of data (in the extreme case, 15 petabytes of data at the Gov. of India)
     ✔ Hard to scale (esp. scaling out → adding nodes to handle the load)
     ✔ Hard to deal with 'unstructured' data due to the strict data model
     ✔ The valuable transactional model is sometimes overkill
  8. Dealing with data … an awful lot of data … petabytes … and even beyond this, plus NoSQL
  9. Dealing with data: characteristics and value proposition
     ✔ Big Data gains momentum (data generates value)
        ➢ High data velocity
        ➢ Data variety
        ➢ Data volume
        ➢ Data complexity
     ✔ Continuous availability
     ✔ Data location independence
     ✔ Flexible data model (schemaless databases)
     ✔ Improved architecture and enhanced analytics
  10. Problems with NoSQL
      Document-oriented: MongoDB, CouchDB
      Column store: Big Table, HBase
      Key-value: Cassandra, DynamoDB, Azure Table Storage, Riak, BerkeleyDB
      Graph: Neo4J
      Many players, several concepts, no one-size-fits-all approach and no standards
  11. Why Cassandra? Because people with the same problems have chosen it ...
      “I can create a Cassandra cluster in any region of the world in 10 minutes. When marketing decides we want to move into a certain part of the world, we’re ready.”
  12. Why Cassandra: scalability
      ✔ Add nodes to scale
      ✔ Millions of operations
      ✔ Low latency in read/write operations
  13. Why Cassandra: availability
      ✔ Created to be distributed
      ✔ Resilient and flexible in the face of failures
      ✔ Different data centers (possibly in different parts of the world)
  14. Why Cassandra: replication
  15. Why Cassandra
      Sometimes things go wrong:
         ✔ Hardware fails
         ✔ Bugs
         ✔ Power outages
         ✔ Natural disasters
      and then...
         ✔ Fast node recovery
         ✔ Auto-balancing when a node fails
         ✔ Transparent to the client
  16. Why Cassandra: easy to use
      ✔ Large ecosystem
      ✔ Well documented
      ✔ Full Java support
      ✔ SQL-like syntax:
         INSERT INTO sensor_by_day (sensor_id, date, event_time, value) VALUES ('1234ABCD', '2013-04-03', '2013-04-03 07:01:00', '72F');
  17. Time series in Cassandra
      Cassandra can store up to 2 billion columns per row, but if we’re storing data every millisecond, you wouldn’t even get a month’s worth of data (2 billion milliseconds is about 23 days). The solution is to use a pattern called row partitioning: adding data to the row key to limit the number of columns you get per device.
      Almost no limits!
  18. Data analysis goals
      ✔ Low-latency (interactive) queries on historical data: enable faster decisions
      ✔ Low-latency queries on live data (streaming): enable decisions on real-time data
      ✔ Sophisticated data processing: enable “better” decisions (e.g. anomaly detection, trend analysis)
  19. Spark ecosystem
      Well integrated with Cassandra and includes:
      ✔ SQL-like interface
      ✔ Machine learning: algorithms that can learn from data, used for predictions (predictive maintenance: exploit patterns found in historical and transactional data to identify risks and opportunities)
      ✔ Streaming: processing of real-time data streams, e.g. with sliding windows
  20. Use Cases
  21. Use Case
      ✔ Data from the oven is collected
      ✔ Cassandra stores it sequentially
      ✔ TrendPage reads it sequentially for faster creation of graphics
  22. Use Case
      Data model to support queries
         ✔ Store data per oven
         ✔ Store time series in order: first to last
         ✔ Get all data for one oven
      Queries needed
         ✔ Get data for a single date and time
         ✔ Get data for a range of dates and times
      “Cassandra is really good for time-series data because you can write one column for each period in your series and then query across a range of time using sub-string matching. This is best done using columns for each period rather than rows, as you get huge IO efficiency wins from loading only a single row per query.” – MyDrive Telemetry (15 billion records on average)
  23. Time series in Cassandra: the data model
      ✔ Row key is the time identifier
      ✔ Column values are events
      ✔ Column values are measurements
      ✔ Rows can be very wide
      1 s schema
         ✔ Faster data storage in the database
      1 min schema
         ✔ Avoids network overload
         ✔ Data can be compressed (prior to sending)
         ✔ Extra data such as min, max and avg can be calculated before storing
         ✔ Increases data retrieval speed
  24. Architectural options
  25. Architectural options: unreplicated databases
  26. Architectural options: redundant and replicated databases
  27. Architectural options: replicated databases plus analytics
  28. What is next
  29. Discussion
  30. Enough propaganda ... Get in touch!
      Contact information:
      Brockhaus Consulting GmbH
      Gustav Stresemann Ring 1
      D - 65189 Wiesbaden
      Germany
      Phone: +49-611-97774-332
      Fax: +49-611-97774-432
      Web: www.brockhaus-gruppe.de
      Mail: office@brockhaus-gruppe.de
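
The row partitioning argument on slide 17 can be checked with a little arithmetic. The following is only a sketch for illustration and not part of the original deck: the function names `days_until_row_full`, `columns_per_day_row` and `partition_key` are hypothetical, the limit of 2 billion columns is the figure quoted on the slide, and the (sensor_id, date) key assumes that one row holds one sensor's readings for one day, as the table name `sensor_by_day` on slide 16 suggests.

```python
from datetime import datetime
from typing import Tuple

MAX_COLUMNS_PER_ROW = 2_000_000_000  # limit quoted on slide 17
SECONDS_PER_DAY = 24 * 60 * 60


def days_until_row_full(samples_per_second: int) -> float:
    """Days until one unpartitioned row per sensor reaches the column limit."""
    return MAX_COLUMNS_PER_ROW / (samples_per_second * SECONDS_PER_DAY)


def columns_per_day_row(samples_per_second: int) -> int:
    """Columns written to a row that only covers a single day."""
    return samples_per_second * SECONDS_PER_DAY


def partition_key(sensor_id: str, event_time: datetime) -> Tuple[str, str]:
    """Hypothetical row key: the sensor id plus the calendar day of the event."""
    return (sensor_id, event_time.strftime("%Y-%m-%d"))


if __name__ == "__main__":
    # One sample per millisecond fills a single row in roughly 23 days ...
    print(round(days_until_row_full(1000), 1))
    # ... while a per-day row stays far below the limit.
    print(columns_per_day_row(1000), "<", MAX_COLUMNS_PER_ROW)
    print(partition_key("1234ABCD", datetime(2013, 4, 3, 7, 1, 0)))
```

With one reading per millisecond, a per-day row holds 86.4 million columns, which is why adding the day to the row key keeps each row well under the limit regardless of how long the sensor runs.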

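Slide 23 mentions a 1 min schema in which extra values such as min, max and avg are calculated before the data is stored. A minimal sketch of such a pre-aggregation step is shown below, assuming readings arrive as (timestamp, value) pairs; the function name `aggregate_per_minute` and the layout of the result are hypothetical and are not taken from the presentation.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def aggregate_per_minute(
    readings: Iterable[Tuple[datetime, float]]
) -> Dict[datetime, Dict[str, float]]:
    """Group readings into one-minute buckets and summarize each bucket."""
    buckets: Dict[datetime, List[float]] = defaultdict(list)
    for timestamp, value in readings:
        # Truncate the timestamp to the start of its minute.
        minute = timestamp.replace(second=0, microsecond=0)
        buckets[minute].append(value)
    return {
        minute: {
            "min": min(values),
            "max": max(values),
            "avg": mean(values),
            "count": len(values),
        }
        for minute, values in sorted(buckets.items())
    }


if __name__ == "__main__":
    sample = [
        (datetime(2013, 4, 3, 7, 1, 0), 70.0),
        (datetime(2013, 4, 3, 7, 1, 20), 72.0),
        (datetime(2013, 4, 3, 7, 1, 40), 74.0),
        (datetime(2013, 4, 3, 7, 2, 5), 80.0),
    ]
    for minute, summary in aggregate_per_minute(sample).items():
        print(minute.isoformat(), summary)
```

Sending one summary per minute instead of every raw reading is one way to get the reduced network load the slide describes, at the cost of losing the individual per-second values unless they are stored separately.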