© Copyright 2015 EMC Corporation. All rights reserved.
PREDICTIVE MAINTENANCE
EMC/VIRTUSTREAM SOLUTION
RICCARDO ROMANI
PREDICTIVE MAINTENANCE PROCESS
MOST COMMON ISSUES
1. Data acquisition and processing:
A. Data acquisition from raw data sources (e.g. railway sensors)
B. Long times for data preparation and for transformation into data sets to be used as a structured data model and historical database
C. Long times for model training and testing iterations
D. Some batch calculations required
2. Data storage: huge amounts of data generated at both the acquisition and the historical stages
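Issue B is largely about turning raw sensor text into structured rows. The Python sketch below shows the shape of that step under stated assumptions: the line layout ("engine_id cycle s1 ... sN", whitespace-separated) and the names `Reading` and `parse_raw_lines` are hypothetical, not taken from the project.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class Reading:
    """One structured row: engine, operating cycle, and sensor values."""
    engine_id: int
    cycle: int
    sensors: List[float]


def parse_raw_lines(lines: Iterable[str]) -> Iterator[Reading]:
    """Turn raw text lines into Reading records.

    Assumed (hypothetical) layout per line: "engine_id cycle s1 s2 ... sN".
    Blank, truncated or non-numeric lines are skipped instead of
    aborting the whole batch.
    """
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # blank or truncated record
        try:
            engine_id, cycle = int(fields[0]), int(fields[1])
            sensors = [float(v) for v in fields[2:]]
        except ValueError:
            continue  # corrupted record from the source
        yield Reading(engine_id, cycle, sensors)


raw = [
    "1 1 518.67 641.82",
    "1 2 518.67 642.15",
    "",
    "1 x corrupted",
    "2 1 518.67 642.35",
]
rows = list(parse_raw_lines(raw))
print(len(rows))  # 3
```

Because the parser is a generator, it can consume millions of lines without loading them all into memory at once.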
DIFFERENT TYPES OF DATA IN PREDICTIVE ANALYTICS
FROM RAW DATA SOURCE…
• Example of raw data gathered from a Stream Processing System
− Millions of txt files
− Unstructured/semi-structured data
…TO DATASET
• Example of data sets created starting from the raw data (structured data):
− Training data: the engine run-to-failure data.
− Testing data: the engine operating data, with no failure events recorded.
− Ground truth data: the true remaining cycles for each engine in the testing data.
• Predictive data: used to predict when an in-service machine will fail, so that maintenance can be planned in advance.
• It answers the question: “Given these aircraft engine operation and failure events history, can we predict when an in-service engine will fail?”
• Regression: predict the Remaining Useful Life (RUL), or Time to Failure (TTF).
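The RUL regression target can be derived mechanically from the training, testing and ground truth data sets described above. A minimal sketch, assuming each row is reduced to an (engine_id, cycle) pair and that the last recorded cycle of a training engine is its failure cycle; the function names are illustrative only.

```python
from typing import Dict, List, Tuple

Row = Tuple[int, int]  # (engine_id, cycle)


def last_cycles(rows: List[Row]) -> Dict[int, int]:
    """Highest cycle number observed for each engine."""
    last: Dict[int, int] = {}
    for engine_id, cycle in rows:
        last[engine_id] = max(cycle, last.get(engine_id, cycle))
    return last


def rul_for_training_data(rows: List[Row]) -> List[int]:
    """Run-to-failure data: each engine's last cycle is its failure,
    so RUL = last_cycle - current_cycle."""
    last = last_cycles(rows)
    return [last[e] - c for e, c in rows]


def rul_for_testing_data(rows: List[Row], ground_truth: Dict[int, int]) -> List[int]:
    """Testing data stops before failure; ground_truth[e] holds the true
    remaining cycles after engine e's last observed cycle."""
    last = last_cycles(rows)
    return [ground_truth[e] + last[e] - c for e, c in rows]


print(rul_for_training_data([(1, 1), (1, 2), (1, 3), (2, 1), (2, 2)]))  # [2, 1, 0, 1, 0]
print(rul_for_testing_data([(7, 1), (7, 2)], {7: 10}))  # [11, 10]
```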
HOW COMBINING SAP AND EMC/VIRTUSTREAM CAN HELP
Hot Data: SAP HANA | Warm Data: Sybase IQ | Cold Data: Sybase IQ
• Hot Data Model: analytics run on this data model
− Modern in-memory platform
− Transact and analyze in real time
− Native predictive, text, and spatial algorithms
• Warm Historical Data Model
− Disk-backed, smart column store (HANA on disk or IQ)
− Offloads large volumes of data from HANA by storing them dynamically on disk instead of in memory
− Excels at queries on structured data from terabyte to petabyte scale
• Cold Historical Data Model
− Less frequently accessed data is archived in time partitions on IQ
− The data is static and used primarily for read access
− Data resides in cost-efficient storage with fewer backups, to reduce operational costs
− Lower SLA requirements
Solution: Virtustream with SAP HANA Cloud IaaS/PaaS
HADOOP
Raw data
• Data acquired from sensors
− New type of User Defined Function for data federation
− Direct access to HDFS without the need to specify the package, mapper, and reducer
− Invoke custom MapReduce jobs
Solution: EMC with Isilon Hadoop-native storage, combined with Virtustream Cloud Storage (EMC ECS Object Storage)
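Deciding which tier a record belongs to can be expressed as a simple age rule, with cold data grouped into time partitions. The sketch below assumes a purely age-based policy with hypothetical 30-day and 365-day thresholds and monthly partitions; a real HANA/IQ deployment may also weigh access frequency and SLA.

```python
from datetime import date
from enum import Enum


class Tier(Enum):
    HOT = "SAP HANA, in memory"
    WARM = "disk-backed column store (HANA on disk or IQ)"
    COLD = "time-partitioned archive on IQ"


# Hypothetical age thresholds in days; real values depend on SLAs and query patterns.
HOT_DAYS = 30
WARM_DAYS = 365


def assign_tier(record_day: date, today: date,
                hot_days: int = HOT_DAYS, warm_days: int = WARM_DAYS) -> Tier:
    """Place a record in a tier purely by its age."""
    age = (today - record_day).days
    if age <= hot_days:
        return Tier.HOT
    if age <= warm_days:
        return Tier.WARM
    return Tier.COLD


def cold_partition(record_day: date) -> str:
    """Monthly time-partition key used when archiving to the cold tier."""
    return f"{record_day.year:04d}-{record_day.month:02d}"


today = date(2015, 6, 30)
print(assign_tier(date(2015, 6, 20), today))  # Tier.HOT
print(assign_tier(date(2015, 1, 1), today))   # Tier.WARM
print(assign_tier(date(2013, 1, 5), today), cold_partition(date(2013, 1, 5)))  # Tier.COLD 2013-01
```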
SCENARIO #1
SAP HANA, SAP IQ @ VIRTUSTREAM, DATA ACQUISITION ON ISILON @ HYPERCED
[Diagram labels: Rolling Stock, Gateway, Raw Data, Hot Data, Historic/Warm, Historic/Cold, Datacenter, Almaviva “Hyperced” Datacenter]
1. Raw data acquisition is done at the gateway level; serialization could become a bottleneck, as could the growth of raw data history.
2. Predictive model computation is done at the Hot Data level, in HANA.
3. Warm data is kept on SAP IQ.
4. Cold data is archived (possibly on SAP IQ) for compliance, for outlier data from faulty sensors, and for statistical purposes.
• EMC can control rolling stock data growth during acquisition with SAP-certified Isilon storage sitting at the gateway layer.
SCENARIO #2
ADDING HADOOP COMBINED WITH ISILON @ HYPERCED
[Diagram labels: Rolling Stock, Raw Data, Gateway/Data Lake, Intelligent Archiving for Raw Data and Historic, Historic/Warm, Hot Data, Almaviva “Hyperced” Datacenter]
1. Raw data acquisition is done in parallel streams. Hadoop speeds up parsing and the creation of a reduced dataset to be loaded and historicized.
2. Predictive model computation can draw on a wider dataset, accessing warm data on IQ and Hadoop.
3. Cold data is stored on the Hadoop filesystem as “intelligent” archiving for statistical analysis and retrieval.
• EMC can control and improve data growth management at both the Hadoop and the storage level:
− Hadoop for scale-out raw data processing capacity
− Apache Kafka (distributed messaging) and/or Apache Storm (distributed stream processing)
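Step 1 of Scenario #2 relies on a map/reduce pattern: parse raw chunks independently, then merge the partial results into a reduced dataset. A small Python sketch of that pattern, assuming a hypothetical "engine cycle value" line format. Threads stand in for Hadoop mappers here, so this illustrates the split/merge structure only, not Hadoop's API or its performance (CPU-bound parsing in Python threads is limited by the GIL).

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import Dict, List, Tuple

# engine_id -> (reading count, sum of first sensor, highest cycle)
Partial = Dict[int, Tuple[int, float, int]]


def map_chunk(lines: List[str]) -> Partial:
    """'Map' step: aggregate one chunk of raw lines ("engine cycle value ...")."""
    out: Partial = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or truncated lines
        engine, cycle, value = int(fields[0]), int(fields[1]), float(fields[2])
        n, total, top = out.get(engine, (0, 0.0, 0))
        out[engine] = (n + 1, total + value, max(top, cycle))
    return out


def merge_partials(a: Partial, b: Partial) -> Partial:
    """'Reduce' step: combine two partial aggregates."""
    merged = dict(a)
    for engine, (n, total, top) in b.items():
        n0, total0, top0 = merged.get(engine, (0, 0.0, 0))
        merged[engine] = (n0 + n, total0 + total, max(top0, top))
    return merged


def reduced_dataset(chunks: List[List[str]], workers: int = 4) -> Dict[int, Tuple[float, int]]:
    """Map chunks concurrently, reduce, keep (mean first sensor, last cycle) per engine."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    totals = reduce(merge_partials, partials, {})
    return {e: (total / n, top) for e, (n, total, top) in totals.items()}


chunks = [["1 1 10.0", "1 2 20.0"], ["1 3 30.0", "2 1 5.0"]]
print(reduced_dataset(chunks))  # {1: (20.0, 3), 2: (5.0, 1)}
```

The merge step is associative, which is what lets partial results be combined in any grouping, as a distributed reducer would.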
HADOOP AND SAP 1/3
HADOOP IN ACTION @ SAP PREDICTIVE MAINTENANCE BLUEPRINTS
Where combined SAP and EMC technologies fit in a predictive maintenance project
HADOOP AND SAP 2/3
Data acquisition
• Real-time event streams arrive at a rate of thousands of raw events per second.
• The stream processing system should be able to process those events in a fault-tolerant, distributed manner, with parallel processing.
• Stream processing systems should also keep a record of old data for a reasonable amount of time before it is archived or destroyed, in order to:
A. build pattern recognition and statistical models;
B. comply with local country laws;
C. identify any potential outliers in the data streamed in from the sensors. While the assets are monitored for faults, the sensor taking the readings, being a machine itself, could fail and start sending faulty records. Intelligent CBM (condition-based maintenance) management systems capable of detecting such outliers will try to isolate these faulty sensors and notify
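Point C can be approached by first learning what "normal" looks like for a sensor and flagging readings far outside it. The sketch below uses a rolling z-score, one simple technique chosen for illustration and not necessarily what a CBM system uses; the window size and threshold are hypothetical.

```python
from collections import deque
from statistics import mean, pstdev
from typing import Deque, Iterable, List, Tuple


def flag_outliers(readings: Iterable[float], window: int = 20,
                  z_max: float = 4.0) -> List[Tuple[int, float]]:
    """Return (index, value) for readings far outside the recent normal band.

    "Normal" is the mean/std of the last `window` accepted readings.
    Flagged values are NOT added to the window, so a burst of faulty
    records does not become the new baseline.
    """
    history: Deque[float] = deque(maxlen=window)
    flagged: List[Tuple[int, float]] = []
    for i, x in enumerate(readings):
        if len(history) >= window:
            mu, sigma = mean(history), pstdev(history)
            # A perfectly flat history gives no scale to judge against;
            # a separate stuck-sensor check would be needed for that case.
            if sigma > 0 and abs(x - mu) / sigma > z_max:
                flagged.append((i, x))
                continue
        history.append(x)
    return flagged


readings = [10.0, 10.5] * 10 + [50.0, 10.0]
print(flag_outliers(readings))  # [(20, 50.0)]
```

The trade-off of excluding flagged values is that a genuine level shift keeps being flagged until the baseline is reset, so a real system would need a re-baselining rule.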
We evaluated two popular open-source technologies: Apache Kafka, a distributed messaging system, and Storm, a distributed stream processing engine, both using Hadoop as the repository for managing data growth.
SAP provides a product called “ESP – Event Stream Processor” that integrates HANA and Hadoop.
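The retention requirement above amounts to a time window: keep recent events available, then pass expired ones on for archiving or deletion. A minimal sketch, assuming events arrive in timestamp order; `archive` is a plain callback standing in for a hand-off to Hadoop, and the `Event` fields and sensor name are hypothetical. Kafka offers a comparable knob at the broker level through the topic-level `retention.ms` setting.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List


@dataclass
class Event:
    sensor_id: str
    ts: float      # seconds since epoch
    value: float


class RetentionBuffer:
    """Keep events for `retention_s` seconds, then hand them to `archive`.

    Assumes events are added in timestamp order.
    """

    def __init__(self, retention_s: float, archive: Callable[[List[Event]], None]):
        self.retention_s = retention_s
        self.archive = archive
        self._events: Deque[Event] = deque()

    def add(self, event: Event) -> None:
        self._events.append(event)
        self._expire(now=event.ts)

    def _expire(self, now: float) -> None:
        expired: List[Event] = []
        while self._events and now - self._events[0].ts > self.retention_s:
            expired.append(self._events.popleft())
        if expired:
            self.archive(expired)  # e.g. write to the Hadoop repository

    def recent(self) -> List[Event]:
        return list(self._events)


archived: List[Event] = []
buf = RetentionBuffer(60.0, archived.extend)
for ts in (0.0, 30.0, 61.0, 125.0):
    buf.add(Event("axle-7", ts, 1.0))
print([e.ts for e in archived], [e.ts for e in buf.recent()])  # [0.0, 30.0, 61.0] [125.0]
```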
HADOOP AND SAP 3/3
Build, train and validate the model: creating data sets
• To detect failures in a given stream of sensor data, we first need to define normal behavior.
• To do this, we need to build a model from the historical sensor data.
• Predictive models analyze current and historical data on individual assets to produce metrics.
• A model is reusable: it is created by training an algorithm on historical data and saving the result, so that the common business rules it captures can be applied to similar data and results can be analyzed with the trained algorithm, without the historical data.
• The process involves running one or more algorithms on the data set on which predictions will be made. This is an iterative process that often involves training the model, applying multiple models to the same data set, and finally arriving at the best-fit model based on a business understanding of the data.
• Raw data, once prepared, is organized into data sets, which must be stored so they can be used by different types of models and algorithms.
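The iterative "train several models, keep the best fit, save it for reuse" loop can be sketched as follows. The two candidates (a mean baseline and one-feature least squares), the single degradation feature and the JSON persistence format are all assumptions made for illustration; they are not the algorithms used in the SAP blueprint.

```python
import json
from statistics import mean
from typing import Callable, Dict, List, Tuple

Sample = Tuple[float, float]   # (degradation feature, observed RUL)
Model = Dict[str, float]       # {"slope": ..., "intercept": ...}


def fit_mean_baseline(train: List[Sample]) -> Model:
    """Ignores the feature and always predicts the average RUL."""
    return {"slope": 0.0, "intercept": mean(y for _, y in train)}


def fit_linear(train: List[Sample]) -> Model:
    """Ordinary least squares with one feature."""
    mx = mean(x for x, _ in train)
    my = mean(y for _, y in train)
    sxx = sum((x - mx) ** 2 for x, _ in train)
    sxy = sum((x - mx) * (y - my) for x, y in train)
    slope = sxy / sxx if sxx else 0.0
    return {"slope": slope, "intercept": my - slope * mx}


def rmse(model: Model, data: List[Sample]) -> float:
    errors = [(model["slope"] * x + model["intercept"] - y) ** 2 for x, y in data]
    return mean(errors) ** 0.5


CANDIDATES: Dict[str, Callable[[List[Sample]], Model]] = {
    "mean_baseline": fit_mean_baseline,
    "linear": fit_linear,
}


def select_model(train: List[Sample], valid: List[Sample]):
    """Train every candidate, score on held-out data, keep the lowest RMSE."""
    models = {name: fit(train) for name, fit in CANDIDATES.items()}
    scores = {name: rmse(m, valid) for name, m in models.items()}
    best = min(scores, key=scores.get)
    return best, models[best], scores


train = [(float(x), 100.0 - 10.0 * x) for x in range(10)]
valid = [(2.5, 75.0), (7.5, 25.0)]
best, model, scores = select_model(train, valid)
saved = json.dumps({"name": best, **model})  # persist the trained model for reuse
reloaded = json.loads(saved)
print(best, reloaded["slope"], reloaded["intercept"])  # linear -10.0 100.0
```

Keeping a trivial baseline among the candidates makes it obvious when a more complex model is not actually adding value.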
EMC/VIRTUSTREAM HYBRID ARCHITECTURE
[Diagram labels: Virtustream, Almaviva Hyperced, Servers & Network, Gateway, Unstructured Storage, Raw Data]
• SAP HANA (with DT) data tiers: Hot, Extended Storage, IQ, NLS, Backup / Archive
• DATA STORAGE powered by EMC:
− HDFS or NFS: EMC Isilon, VNX
− FibreChannel or NFS: EMC VMAX, VNX or XtremIO
• Best-in-class cloud IaaS/PaaS for SAP workloads, through the Almaviva White Label option
• EMC scale-out unstructured storage with de-duplication features and native HDFS support
• From 15% to 35% data storage efficiency, depending on the specific nature of the data
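The storage efficiency quoted above comes from storing identical content only once. The sketch below makes the ratio concrete with fixed-size chunk de-duplication keyed by SHA-256 digests; it is a generic illustration, not how Isilon implements de-duplication, and the 4 KiB chunk size is an arbitrary choice.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple


def dedup_store(blobs: Iterable[bytes], chunk_size: int = 4096
                ) -> Tuple[Dict[str, bytes], List[List[str]], int, int]:
    """Fixed-size chunk de-duplication keyed by SHA-256 digest.

    Returns the chunk store, one 'recipe' (ordered digests) per blob,
    the logical bytes written and the physical bytes actually kept.
    """
    store: Dict[str, bytes] = {}
    recipes: List[List[str]] = []
    logical = 0
    for blob in blobs:
        recipe: List[str] = []
        for start in range(0, len(blob), chunk_size):
            chunk = blob[start:start + chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            store.setdefault(digest, chunk)  # keep only the first copy
            recipe.append(digest)
            logical += len(chunk)
        recipes.append(recipe)
    physical = sum(len(c) for c in store.values())
    return store, recipes, logical, physical


def restore(store: Dict[str, bytes], recipe: List[str]) -> bytes:
    """Rebuild a blob from its ordered chunk digests."""
    return b"".join(store[d] for d in recipe)


def savings(logical: int, physical: int) -> float:
    """Fraction of logical bytes that did not need to be stored."""
    return 1 - physical / logical if logical else 0.0


a = b"A" * 8192
b = b"A" * 4096 + b"B" * 4096
store, recipes, logical, physical = dedup_store([a, b])
print(logical, physical, savings(logical, physical))  # 16384 8192 0.5
```

Real sensor data rarely repeats this neatly, which is consistent with savings that vary with the nature of the data.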
HADOOP AND SAP
PDMS BLUEPRINT
[Diagram label: Gateway/Data Lake]
1. Acquire OT data in parallel streams
2. Clean the data and import only validated data into SAP HANA
3. Keep historic OT data in IQ and/or Hadoop
4. Use Hive to build a relational/historical view of the OT data at the Hadoop level as well
5. Unify the OT view with IT data, if needed
6. Reserve HANA resources for real-time predictions only
7. De-duplicate and store raw OT data for future use or statistical purposes
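Steps 2 and 7 together suggest a routing rule at the gateway: validated records go to SAP HANA, while everything else stays in the data lake together with the reason it was rejected. A minimal sketch; the `Record` fields and the plausibility range are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Record:
    sensor_id: str
    ts: float
    value: Optional[float]


# Hypothetical plausibility range for one sensor type.
VALID_RANGE: Tuple[float, float] = (-50.0, 150.0)


def validate(rec: Record, valid_range: Tuple[float, float] = VALID_RANGE) -> Optional[str]:
    """Return None if the record is valid, otherwise the rejection reason."""
    if not rec.sensor_id:
        return "missing sensor_id"
    if rec.value is None:
        return "missing value"
    lo, hi = valid_range
    if not lo <= rec.value <= hi:  # NaN also fails this check
        return "out of range"
    return None


def route(records: List[Record]) -> Tuple[List[Record], List[Tuple[Record, str]]]:
    """Split a batch into records for HANA and (record, reason) pairs for the lake."""
    to_hana: List[Record] = []
    to_lake: List[Tuple[Record, str]] = []
    for rec in records:
        reason = validate(rec)
        if reason is None:
            to_hana.append(rec)
        else:
            to_lake.append((rec, reason))
    return to_hana, to_lake


batch = [Record("s1", 0.0, 20.0), Record("s1", 1.0, None),
         Record("", 2.0, 10.0), Record("s2", 3.0, 999.0)]
to_hana, to_lake = route(batch)
print(len(to_hana), [reason for _, reason in to_lake])
# 1 ['missing value', 'missing sensor_id', 'out of range']
```

Keeping rejected records with their reason, rather than dropping them, preserves the raw OT data for the statistical and faulty-sensor analysis mentioned in step 7.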
HADOOP AND SAP
GATHERING REAL-TIME EVENTS AT SCALE
