Copyright by Brockhaus GmbH, all rights reserved, unauthorized reproduction prohibited
Proposal for applying
modern concepts
of data storage and analytics
to production data
Current situation
Current situation
Just the Numbers
✔ Approx. 270,000 sensors in one installation (AUDI Györ)
(but only 17,000 sensors are currently tracked)
✔ Many 'unsynchronized' control desks and, with them, unsynchronized data
✔ A lot of duplicated data
(because of the 'home-grown' failover/replication concept)
✔ No historical data
(because the amount of data is
overwhelming and can't be
handled)
✔ Problems with scalability
Current situation
Outdated technologies
✔ Trend Server is developed in Delphi: Who develops in that?
✔ Microsoft SQL Server: not fast enough
✔ Technology breaks between the several technologies in use
Bottlenecks
✔ Queries are slow for more than 750k events
✔ No more than 7500 CSV files
✔ CSV files and SQL Server are used for the same tasks
Current situation
Scalability and fault tolerance
✔ Only a few sensors can be saved
✔ IOM synchronization problems
✔ Buffered data is saved with the same
timestamp
✔ Different IOMs save the same data with
different timestamps
No integration /
standalone application
✔ Data cannot be accessed from every
location (control desk)
✔ Data cannot be recorded in case of
failure
Big data and NoSQL
Why the relational model
… sometimes isn't enough
✔ Can't handle extremely large data volumes (in the extreme case,
15 petabytes of data at the Government of India)
✔ Hard to scale (esp. scaling out → adding nodes to handle the load)
✔ Hard to deal with 'unstructured' data due to the strict data model
✔ The valuable transactional model is sometimes overkill
Dealing with data
… an awful lot of data
… petabytes
… and even beyond that
plus NoSQL
Dealing with data
Characteristics and value proposition
✔ Big Data gains momentum
(data generates value)
➢ High data velocity
➢ Data variety
➢ Data volume
➢ Data complexity
✔ Continuous availability
✔ Data location independence
✔ Flexible data model (schemaless databases)
✔ Improved architecture and enhanced analytics
Problems with NoSQL
Document-oriented: MongoDB, CouchDB
Column store: Bigtable, HBase
Key-value: Cassandra, DynamoDB, Azure Table Storage, Riak, BerkeleyDB
Graph: Neo4j
Many players, several concepts, no one-size-fits-all approach
and no standards
Why Cassandra?
Because people with the same problems have chosen it ...
“I can create a Cassandra cluster
in any region of the world in 10 minutes.
When marketing decides we want to move
into a certain part of the world, we’re ready.”
Why Cassandra
Scalability
✔ Add nodes to scale
✔ Millions of operations
✔ Low latency in read/write
operations
Why Cassandra
Availability
✔ Created to be distributed
✔ Resilient to failures
✔ Multiple data centers
(possibly in different parts
of the world)
Why Cassandra
Replication
Why Cassandra
Sometimes things go wrong:
✔ Hardware failures
✔ Software bugs
✔ Power outages
✔ Natural disasters
and then...
✔ Fast node recovery
✔ Automatic rebalancing when a
node fails
✔ Transparent to the client
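Why a node failure can stay transparent to the client can be sketched with a toy token ring. This is a simplified Python illustration, not Cassandra code: the function names `token` and `replicas`, the node names, and the use of MD5 are assumptions made for the example, and a real cluster uses its own partitioner and replica placement strategy. Each key is hashed to a position on the ring and stored on the next few nodes clockwise.

```python
import hashlib
from bisect import bisect_right


def token(key: str) -> int:
    """Map a key to a position on the ring (MD5 is used only for illustration)."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


def replicas(key, nodes, replication_factor=3):
    """Return the nodes holding `key`: the first node clockwise plus its successors."""
    ring = sorted((token(n), n) for n in nodes)
    tokens = [t for t, _ in ring]
    # First node whose token is greater than the key's token, wrapping around.
    start = bisect_right(tokens, token(key)) % len(ring)
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


if __name__ == "__main__":
    cluster = ["node-a", "node-b", "node-c", "node-d"]
    before = replicas("sensor-1", cluster)
    # Simulate the failure of the first replica: the remaining copies are still found.
    after = replicas("sensor-1", [n for n in cluster if n != before[0]])
    print(before, after)
```

With a replication factor of 3, losing one replica still leaves two nodes that hold the data, so reads and writes can continue while the failed node recovers.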
Why Cassandra
Easy to use
✔ Large ecosystem
✔ Well documented
✔ Full Java support
✔ SQL-like syntax
INSERT INTO sensor_by_day
(sensor_id, date, event_time, value)
VALUES
('1234ABCD', '2013-04-03', '2013-04-03 07:01:00', '72F');
Time Series in Cassandra
Cassandra can store up to 2 billion columns per row,
but if we store one value every millisecond, a single row
fills up in about 23 days, less than a month's worth of data.
The solution is a pattern called row partitioning:
adding data (e.g. a date) to the row key limits the number of columns
stored per row for each device.
Almost no limits!
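The arithmetic behind the column limit and the row-partitioning idea can be sketched as follows. This is a minimal Python illustration, not Cassandra client code. The function name `partition_key` and the choice of one row per sensor per day are assumptions for illustration (the daily bucket mirrors the `sensor_by_day` example).

```python
from datetime import datetime, timezone

MAX_COLUMNS_PER_ROW = 2_000_000_000  # limit quoted on the slide
MS_PER_DAY = 24 * 60 * 60 * 1000     # 86,400,000 samples per day at one sample per ms

# Days until a single unpartitioned row is full at one sample per millisecond.
days_until_full = MAX_COLUMNS_PER_ROW / MS_PER_DAY


def partition_key(sensor_id, event_time):
    """Hypothetical row key: the sensor id plus the UTC day of the event."""
    day = event_time.astimezone(timezone.utc).strftime("%Y-%m-%d")
    return (sensor_id, day)


if __name__ == "__main__":
    print(f"unpartitioned row is full after {days_until_full:.1f} days")
    ts = datetime(2013, 4, 3, 7, 1, 0, tzinfo=timezone.utc)
    print(partition_key("1234ABCD", ts))
```

With a daily bucket, a row never holds more than 86,400,000 columns at one sample per millisecond, far below the 2 billion limit.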
Data analysis goals
✔ Low latency (interactive) queries on historical data: enable
faster decisions
✔ Low latency queries on live data (streaming): enable
decisions on real-time data
✔ Sophisticated data processing:
enable “better” decisions
(e.g. anomaly detection,
trend analysis)
Spark ecosystem
Well integrated with Cassandra and includes:
✔ SQL-like interface
✔ Machine learning:
Algorithms that can learn from data, used for predictions
(predictive maintenance: exploit patterns found in historical and
transactional data to identify risks and opportunities)
✔ Streaming:
Processing of real-time data streams, e.g. over
sliding windows
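What a sliding window does to a stream of sensor values can be sketched in plain Python. This is an illustration of the concept only and does not use the Spark API; the function name `sliding_mean` and the count-based window are assumptions chosen for the example.

```python
from collections import deque


def sliding_mean(values, window):
    """Yield the mean of the last `window` values once the window is full."""
    if window <= 0:
        raise ValueError("window must be positive")
    buf = deque(maxlen=window)  # the oldest value drops out automatically
    for v in values:
        buf.append(v)
        if len(buf) == window:
            yield sum(buf) / window


if __name__ == "__main__":
    readings = [71.0, 72.0, 73.0, 90.0, 74.0]
    for mean in sliding_mean(readings, 3):
        print(mean)
```

A smoothed series like this is a typical input for the trend analysis and anomaly detection named in the data analysis goals.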
Use Cases
Use Case
✔ Data from the oven is collected
✔ Cassandra stores it sequentially
✔ TrendPage reads it sequentially for
faster chart rendering.
Use Case
Data Model to support queries
✔ Store data per oven
✔ Store time series
in order: first to last
✔ Get all data for one oven
Queries needed
✔ Get data for a single date
and time
✔ Get data for a range
of dates and times
Cassandra is really good for time-series data
because you can write one column for each period
in your series and then query across a range of time
using sub-string matching.
This is best done using columns for each period
rather than rows, as you get huge IO efficiency
wins from loading only a single row per query.
– MyDrive Telemetry (15 billion records on average)
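The access pattern described on this slide (one wide row per oven, kept in time order, read by single timestamp or by time range) can be sketched with a toy in-memory model. This is a Python illustration of the idea, not Cassandra code; the class name `OvenSeries`, its methods, and the string timestamps are assumptions made for the example.

```python
from bisect import bisect_left, insort
from collections import defaultdict


class OvenSeries:
    """Toy stand-in for a table keyed by oven and ordered by time."""

    def __init__(self):
        # oven_id -> list of (timestamp, value), kept sorted by timestamp
        self._rows = defaultdict(list)

    def write(self, oven_id, ts, value):
        insort(self._rows[oven_id], (ts, value))

    def at(self, oven_id, ts):
        """Value recorded at exactly `ts`, or None if there is none."""
        row = self._rows.get(oven_id, [])
        i = bisect_left(row, (ts,))
        if i < len(row) and row[i][0] == ts:
            return row[i][1]
        return None

    def between(self, oven_id, start, end):
        """All (ts, value) pairs with start <= ts < end, in time order."""
        row = self._rows.get(oven_id, [])
        return row[bisect_left(row, (start,)):bisect_left(row, (end,))]


if __name__ == "__main__":
    series = OvenSeries()
    series.write("oven-1", "2013-04-03 07:02:00", 73.0)
    series.write("oven-1", "2013-04-03 07:01:00", 72.0)
    print(series.at("oven-1", "2013-04-03 07:01:00"))
    print(series.between("oven-1", "2013-04-03 07:00:00", "2013-04-03 08:00:00"))
```

Because each row is stored in time order, a range query touches one contiguous slice of a single row, which is the I/O advantage the quote refers to.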
Time Series in Cassandra
The data model
✔ Row key is a time identifier
✔ Column values are events
✔ Column values are measurements
✔ Rows can be very wide
1 s schema
Faster data storage in the database
1 min schema
Avoids network overload
Data can be compressed (prior to sending)
Extra data like min, max, avg can be calculated
before it is stored.
Faster data retrieval.
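The pre-aggregation step of the 1 min schema can be sketched as follows. This is a minimal Python illustration under stated assumptions: samples arrive as `(epoch_seconds, value)` pairs, and the function name `aggregate_per_minute` and the output layout are hypothetical.

```python
from collections import defaultdict


def aggregate_per_minute(samples):
    """Group (epoch_seconds, value) samples by minute and compute min, max, avg."""
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[ts - ts % 60].append(value)  # key is the start of the minute
    return {
        minute: {
            "min": min(vs),
            "max": max(vs),
            "avg": sum(vs) / len(vs),
            "count": len(vs),
        }
        for minute, vs in sorted(buckets.items())
    }


if __name__ == "__main__":
    one_second_samples = [(0, 1.0), (30, 3.0), (59, 2.0), (60, 10.0)]
    for minute, stats in aggregate_per_minute(one_second_samples).items():
        print(minute, stats)
```

Sending one summary record per minute instead of up to 60 raw samples is what reduces network load in this schema, at the cost of losing the individual 1 s values unless they are stored as well.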
Architectural options
Architectural Options
Unreplicated databases
Architectural Options
Redundant and replicated databases
Architectural Options
Replicated databases plus analytics
What is next
Discussion
Enough propaganda ... Get in touch!
Contact information:
Brockhaus Consulting GmbH
Gustav Stresemann Ring 1
D - 65189 Wiesbaden
Germany
Phone: +49-611-97774-332
Fax: +49-611-97774-432
Web: www.brockhaus-gruppe.de
Mail: office@brockhaus-gruppe.de

Big Data in Production Environments

  • 1. Copyright by Brockhaus GmbH, alle Rechte reserviert, unautorisierte Vervielfältigung untersagt 1 Proposal for establishing modern concepts of data storage and analytics to production data
  • 2. Copyright by Brockhaus GmbH, alle Rechte reserviert, unautorisierte Vervielfältigung untersagt 2 Current situation
  • 3. Copyright by Brockhaus GmbH, alle Rechte reserviert, unautorisierte Vervielfältigung untersagt 3 Current situation Just the Numbers ✔ Approx. 270.000 sensors in one installation (AUDI Györ) (but only 17.000 sensures are currently tracked) ✔ Lots of 'unsynchronized' control desks, respective their data ✔ Lot of duplicated data (because of the 'home-grown' failover/replication concept) ✔ No historical data (because the amount of data is overwhelming and can't be handeled) ✔ Problems with scalability
  • 4. Copyright by Brockhaus GmbH, alle Rechte reserviert, unautorisierte Vervielfältigung untersagt 4 Current situation Outdated technologies ✔ Trend Server is developed in Delphi: Who develops in that? ✔ Microsoft SQL Server: not fast enough ✔ Technological breaches between several technologies Bottlenecks ✔ Query slow for mor than 750k events ✔ No more than 7500 CSV files ✔ CSV & SQL server for the same tasks
  • 5. Copyright by Brockhaus GmbH, alle Rechte reserviert, unautorisierte Vervielfältigung untersagt 5 Current situation Scalability and fault tolerance ✔ Only few sensor can be saved ✔ IOM synchronization problems ✔ Buffered data saved with the same timestamp ✔ Different IOM saved same data with different timestamps No integration / standalone application ✔ Data can not be accessed from every place (control desk) ✔ Data can not be recorded in case of failure
  • 6. Copyright by Brockhaus GmbH, alle Rechte reserviert, unautorisierte Vervielfältigung untersagt 6 Big data and NoSQL
  • 7. Copyright by Brockhaus GmbH, alle Rechte reserviert, unautorisierte Vervielfältigung untersagt 7 Why the relational model … sometimes isn't enough ✔ Can't handle extremely large data amounts (in extreme 15 Petabyte data in Gov. Of India) ✔ Hard to scale (esp. scaleing out adding nodes to handle the load)→ ✔ Hard to deal with 'unstructured' data due to strict data model ✔ The valuable transactional model sometimes is an overkill
  • 8. Copyright by Brockhaus GmbH, alle Rechte reserviert, unautorisierte Vervielfältigung untersagt 8 Dealing with data … awfull lots of data … petabytes … and even behind this plus NoSQL
  • 9. Copyright by Brockhaus GmbH, alle Rechte reserviert, unautorisierte Vervielfältigung untersagt 9 Dealing with data Characteristics and value proposition ✔ Big Data gains momentum (data generates value) ➢ High data velocity ➢ Data variety ➢ Data Volume ➢ Data complexity ✔ Continuous availability ✔ Data location independence ✔ Flexible data model (schemaless databases) ✔ Improved architecture and enhanced analytics
  • 10. Problems with NoSQL Document-oriented: MongoDB, CouchDB. Column store: Bigtable, HBase. Key-value: Cassandra, DynamoDB, Azure Table Storage, Riak, BerkeleyDB. Graph: Neo4j. Many players, several concepts, no one-size-fits-all approach, and no standards
  • 11. Why Cassandra? Because people with the same problems have chosen it ... “I can create a Cassandra cluster in any region of the world in 10 minutes. When marketing decides we want to move into a certain part of the world, we’re ready.”
  • 12. Why Cassandra Scalability ✔ Add nodes to scale ✔ Millions of operations ✔ Low latency in read/write operations
  • 13. Why Cassandra Availability ✔ Designed to be distributed ✔ Resilient and flexible in the face of failures ✔ Different data centers (possibly in different parts of the world)
  • 14. Why Cassandra Replication
  • 15. Why Cassandra Sometimes things go wrong: ✔ Hardware fails ✔ Bugs ✔ Power outages ✔ Natural disasters and then... ✔ Fast node recovery ✔ Auto-balancing when a node fails ✔ Transparent to the client
  • 16. Why Cassandra Easy to use ✔ Large ecosystem ✔ Well documented ✔ Full Java support ✔ SQL-like syntax INSERT INTO sensor_by_day (sensor_id, date, event_time, value) VALUES ('1234ABCD', '2013-04-03', '2013-04-03 07:01:00', '72F');
  • 17. Time Series in Cassandra Cassandra can store up to 2 billion columns per row, but if we store data every millisecond, a single row would not even hold a month’s worth of data. The solution is a pattern called row partitioning: adding data (for example the date) to the row key to limit the number of columns per row for each device. Almost no limits!
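The arithmetic behind the "not even a month" remark, and one possible shape of a partitioned row key, can be sketched as follows. This is a minimal illustration only: the function names and the `sensor_id:YYYY-MM-DD` key layout are assumptions made for this example, not something the slides prescribe.

```python
from datetime import datetime

MAX_COLUMNS_PER_ROW = 2_000_000_000   # limit quoted on the slide
SECONDS_PER_DAY = 24 * 60 * 60
MS_PER_DAY = SECONDS_PER_DAY * 1000   # columns per day at one sample per millisecond


def days_until_row_full(samples_per_second: int) -> float:
    """Days until a single unpartitioned row reaches the column limit."""
    return MAX_COLUMNS_PER_ROW / samples_per_second / SECONDS_PER_DAY


def row_key(sensor_id: str, event_time: datetime) -> str:
    """Hypothetical partitioned row key: one row per sensor and calendar day."""
    return f"{sensor_id}:{event_time:%Y-%m-%d}"


# One sample per millisecond fills an unpartitioned row in about 23 days.
print(round(days_until_row_full(1000), 1))
# With a per-day key, each row holds at most 86,400,000 columns.
print(MS_PER_DAY)
print(row_key("1234ABCD", datetime(2013, 4, 3, 7, 1)))
```

With a per-day key, a row stays far below the column limit even at millisecond resolution, which is the point of the pattern.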
  • 18. Data analysis goals ✔ Low latency (interactive) queries on historical data: enable faster decisions ✔ Low latency queries on live data (streaming): enable decisions on real-time data ✔ Sophisticated data processing: enable “better” decisions (e.g. anomaly detection, trend analysis)
  • 19. Spark ecosystem Well integrated with Cassandra and includes: ✔ SQL-like interface ✔ Machine learning: Algorithms that can learn from data, used for predictions (predictive maintenance: exploit patterns found in historical and transactional data to identify risks and opportunities) ✔ Streaming: Processing of real-time data streams, e.g. with sliding windows
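To make the sliding-window idea concrete, here is a small plain-Python sketch that averages the most recent readings of a stream. It deliberately does not use the Spark API; the function name and the sample values are assumptions made for illustration.

```python
from collections import deque


def sliding_averages(values, window):
    """Return the mean of each full window of `window` consecutive values."""
    buf = deque(maxlen=window)  # holds at most the last `window` readings
    total = 0.0
    averages = []
    for v in values:
        if len(buf) == window:
            total -= buf[0]     # the oldest reading is about to be evicted
        buf.append(v)
        total += v
        if len(buf) == window:
            averages.append(total / window)
    return averages


# Hypothetical oven temperatures arriving one by one.
print(sliding_averages([1, 2, 3, 4], window=2))
```

A streaming engine applies the same principle continuously and across many sensors in parallel.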
  • 20. Use Cases
  • 21. Use Case ✔ Data from the oven is collected ✔ Cassandra stores it sequentially ✔ TrendPage reads it sequentially for faster chart creation.
  • 22. Use Case Data model to support queries ✔ Store data per oven ✔ Store time series in order: first to last ✔ Get all data for one oven Queries needed ✔ Get data for a single date and time ✔ Get data for a range of dates and times “Cassandra is really good for time-series data because you can write one column for each period in your series and then query across a range of time using sub-string matching. This is best done using columns for each period rather than rows, as you get huge IO efficiency wins from loading only a single row per query.” – MyDrive Telemetry (15 billion records on average)
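The access pattern described on this slide (one ordered series per oven, point lookups and range reads) can be mimicked in memory as a rough sketch. The class below is a stand-in for one wide row and is not Cassandra's API; the class name, method names and ISO-formatted timestamp strings are assumptions for this example.

```python
from bisect import bisect_left, bisect_right, insort


class OvenSeries:
    """In-memory stand-in for one wide row: readings of one oven, sorted by time."""

    def __init__(self):
        self._times = []    # sorted ISO timestamps, like ordered column names
        self._values = {}   # timestamp -> measurement

    def add(self, ts, value):
        if ts not in self._values:
            insort(self._times, ts)   # keep timestamps in order on insert
        self._values[ts] = value

    def at(self, ts):
        """Get data for a single date and time (None if absent)."""
        return self._values.get(ts)

    def between(self, start, end):
        """Get data for an inclusive range of dates and times, first to last."""
        lo = bisect_left(self._times, start)
        hi = bisect_right(self._times, end)
        return [(t, self._values[t]) for t in self._times[lo:hi]]


oven = OvenSeries()
oven.add("2013-04-03 07:02:00", 73)
oven.add("2013-04-03 07:01:00", 72)
oven.add("2013-04-03 07:03:00", 74)
print(oven.between("2013-04-03 07:01:00", "2013-04-03 07:02:00"))
```

Because the readings are kept in time order, a range query is a contiguous slice, which is why sequential storage helps the TrendPage-style reads.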
  • 23. Time Series in Cassandra The data model ✔ Row key is the time identifier ✔ Column values are events ✔ Column values are measurements ✔ Rows can be very wide 1 s schema: faster data storage in the database. 1 min schema: avoids network overload; data can be compressed (prior to sending); extra data like min, max, avg can be calculated before it is stored; increases data retrieval speed.
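As a rough sketch of the 1 min schema idea, the following groups per-second readings into one-minute buckets and computes min, max and avg before anything is stored. The input format (pairs of epoch seconds and value) and the function name are assumptions for this example.

```python
from collections import defaultdict
from statistics import mean


def aggregate_per_minute(readings):
    """Group (epoch_seconds, value) pairs into one-minute buckets with summary stats."""
    buckets = defaultdict(list)
    for ts, value in readings:
        buckets[ts - ts % 60].append(value)   # key = start of the minute
    return {
        minute: {
            "min": min(values),
            "max": max(values),
            "avg": mean(values),
            "count": len(values),
        }
        for minute, values in sorted(buckets.items())
    }


# Three hypothetical per-second readings spanning two minutes.
summary = aggregate_per_minute([(0, 10.0), (30, 20.0), (60, 5.0)])
print(summary)
```

Sending one summary record per minute instead of sixty raw readings is what reduces the network load mentioned on the slide, at the cost of losing per-second detail unless the raw values are also kept.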
  • 24. Architectural options
  • 25. Architectural options Unreplicated databases
  • 26. Architectural options Redundant and replicated databases
  • 27. Architectural options Replicated databases plus analytics
  • 28. What is next
  • 29. Discussion
  • 30. Enough propaganda ... Get in touch! Contact information: Brockhaus Consulting GmbH Gustav Stresemann Ring 1 D - 65189 Wiesbaden Germany Phone: +49-611-97774-332 Fax: +49-611-97774-432 Web: www.brockhaus-gruppe.de Mail: office@brockhaus-gruppe.de