When you wake up in the morning, you probably unlock your smartphone with your fingerprint, talk to it in your own language to open your email, agenda or weather apps, ask for a recommendation for a meeting later in the day and look for the shortest path to its location. Our lives are being reshaped by the amount of available data, by growing computing capabilities, and by Machine Learning (ML) and, more recently, Deep Learning (DL) algorithms.
How does an ML algorithm work? What are the steps to make an ML project succeed? What should one do to apply DL? Is ML hard to learn? Is it hard to apply?
This presentation is an introduction to the importance of Data Analytics in Product Management. During this talk Etugo Nwokah, former Chief Product Officer for WellMatch, covered how to define Data Analytics and why it should be a first-class citizen in any software organization.
Overview of Machine Learning and Feature Engineering - Turi, Inc.
Machine Learning 101 Tutorial at Strata NYC, Sep 2015
Overview of machine learning models and features. Visualization of feature space and feature engineering methods.
Defining Data Science
• What Does a Data Science Professional Do?
• Data Science in Business
• Use Cases for Data Science
• Installation of R and R studio
- A new Microsoft via AI and IoT
- Smart Manufacturing - quick steps and a ready-to-go preconfigured solution to jumpstart
- AI - computer vision and AOI in manufacturing
Creating ML models is just the start of a long journey. In this presentation, given as a talk in the e2e AI talks series, I discuss the various challenges in the machine learning life cycle.
Digital twins: the power of a virtual visual copy - Unite Copenhagen 2019 - Unity Technologies
From buildings and infrastructure to industrial machinery and factories, digital twins are becoming integral visualization tools across the industrial sector. Learn how Unit040, a company specializing in visualization and simulation, creates digital twins that combine real-time 3D technology with BIM, CAD and CAE systems to add value at all stages of the building and product lifecycle, from the early design phase to predictive maintenance using Internet of Things (IoT) data.
Speakers:
Pieter Weterings - Unit040
Guido van Gageldonk - Unit040
Watch the session on YouTube: https://youtu.be/j4i14p89h_s
IoT Architectures for a Digital Twin with Apache Kafka, IoT Platforms and Mac... - Kai Wähner
A digital twin is a digital replica of a living or non-living physical entity. This session discusses the benefits and IoT architectures of a Digital Twin in Industrial IoT (IIoT) and its relation to Apache Kafka, IoT frameworks and Machine Learning. Kafka is often used as central event streaming platform to build a scalable and reliable digital twin for real time streaming sensor data. A live demo shows a scalable digital twin infrastructure for condition monitoring and predictive maintenance in real time for a connected car infrastructure leveraging Kafka, MQTT and TensorFlow.
Key Take-Aways:
• Learn about use cases and characteristics of a digital twin in various industries
• Understand how to build a digital twin for every single (of tens of thousands) IoT device or machine
• See different IoT architectures with Kafka and other IoT technologies and products, including edge, hybrid and global deployments
• Understand the relation to Machine Learning and bring added value to your IoT infrastructure by enabling use cases like predictive maintenance
• Understand how Apache Kafka enables scalable and flexible end-to-end integration and processing from IIoT data to various backend applications
• Watch a live demo of an end-to-end integration, real time processing and analytics of thousands of IoT devices
More details:
https://www.kai-waehner.de/blog/2019/11/28/apache-kafka-industrial-iot-iiot-build-an-open-scalable-reliable-digital-twin/
https://www.kai-waehner.de/blog/2020/03/25/architectures-digital-twin-digital-thread-apache-kafka-iot-platforms-machine-learning/
https://youtu.be/Q3eKPEVwNVY
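Stripped of the messaging layer, the core of the digital twin idea described above is per-device state kept in sync with incoming sensor events. The following is a minimal Python sketch of that idea; the device IDs and field names are hypothetical, and in the architectures above the events would arrive via Kafka or MQTT rather than direct calls:

```python
from dataclasses import dataclass, field

@dataclass
class DeviceTwin:
    """In-memory digital twin: the latest known state of one physical device."""
    device_id: str
    state: dict = field(default_factory=dict)

    def apply_event(self, event: dict) -> None:
        # Each sensor event overwrites the corresponding fields of the twin.
        self.state.update(event)

class TwinRegistry:
    """One twin per device, created lazily as events arrive (e.g. from a Kafka topic)."""
    def __init__(self):
        self.twins = {}

    def on_event(self, device_id: str, event: dict) -> DeviceTwin:
        twin = self.twins.setdefault(device_id, DeviceTwin(device_id))
        twin.apply_event(event)
        return twin

registry = TwinRegistry()
registry.on_event("car-42", {"speed_kmh": 87, "engine_temp_c": 95})
registry.on_event("car-42", {"engine_temp_c": 102})
print(registry.twins["car-42"].state)  # latest merged state per device
```

Condition monitoring then reduces to reading the current twin state, and predictive maintenance to scoring it with a model.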
Data Lakes are early in the Gartner hype cycle, but companies are getting value from their cloud-based data lake deployments. Break through the confusion between data lakes and data warehouses and seek out the most appropriate use cases for your big data lakes.
Data Quality: A Rising Data Warehousing Concern - Amin Chowdhury
Characteristics of Data Warehouse
Benefits of a data warehouse
Designing of Data Warehouse
Extract, Transform, Load (ETL)
Data Quality
Classification Of Data Quality Issues
Causes of Data Quality Issues
Impact of Data Quality Issues
Cost of Poor Data Quality
Confidence and Satisfaction-based impacts
Impact on Productivity
Risk and Compliance impacts
Why Does Data Quality Matter?
Causes of Data Quality Problems
How to deal: Missing Data
Data Corruption
Data: Out of Range error
Techniques of Data Quality Control
Data warehousing security
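Several of the outline items above (missing data handling, out-of-range errors, imputation) can be sketched in a few lines of Python. The column values, bounds and choice of median imputation below are illustrative assumptions, not the deck's method:

```python
from statistics import median

def clean_column(values, lo, hi):
    """Basic data-quality control for one numeric column:
    flag out-of-range readings as missing, then impute all
    missing values with the median of the valid ones."""
    valid = [v for v in values if v is not None and lo <= v <= hi]
    fill = median(valid)
    cleaned = []
    for v in values:
        if v is None or not (lo <= v <= hi):   # missing or out of range
            cleaned.append(fill)
        else:
            cleaned.append(v)
    return cleaned

temps = [21.5, None, 22.0, 999.0, 20.5]   # 999.0 is a sensor error code
print(clean_column(temps, lo=-40.0, hi=60.0))  # → [21.5, 21.5, 22.0, 21.5, 20.5]
```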
Introduction to Data Science and Analytics - Srinath Perera
This webinar serves as an introduction to WSO2 Summer School. It will discuss how to build a pipeline for your organization and for each use case, and the technology and tooling choices that need to be made for the same.
This session will explore analytics under four themes:
Hindsight (what happened)
Oversight (what is happening)
Insight (why is it happening)
Foresight (what will happen)
Recording http://t.co/WcMFEAJHok
Big data is a term that describes the large volume of data, both structured and unstructured, that inundates a business on a day-to-day basis. But it’s not the amount of data that’s important; it’s what organizations do with the data that matters.
A Seminar Presentation on Big Data for Students.
Big data refers to a process that is used when traditional data mining and handling techniques cannot uncover the insights and meaning of the underlying data. Data that is unstructured or time sensitive or simply very large cannot be processed by relational database engines. This type of data requires a different processing approach called big data, which uses massive parallelism on readily-available hardware.
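The "massive parallelism on readily-available hardware" mentioned above is typically organized as a map step over data chunks followed by a reduce step that merges partial results. A toy single-machine sketch of that pattern (a real system would distribute the chunks across many nodes):

```python
from functools import reduce
from collections import Counter

def map_chunk(lines):
    """Map step: count words in one chunk (runs independently per worker)."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts

def reduce_counts(a, b):
    """Reduce step: merge the partial counts from two workers."""
    return a + b

# Each chunk could live on a different machine; here we just split a list.
log = ["error disk full", "ok", "error timeout", "ok ok"]
chunks = [log[:2], log[2:]]
partials = map(map_chunk, chunks)          # in a cluster: a parallel map
total = reduce(reduce_counts, partials)    # merge partial results
print(total["ok"], total["error"])         # → 3 2
```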
Why an AI-Powered Data Catalog Tool is Critical to Business Success - Informatica
Imagine a fast, more efficient business thriving on trusted data-driven decisions. An intelligent data catalog can help your organization discover, organize, and inventory all data assets across the org and democratize data with the right balance of governance and flexibility. Informatica's data catalog tools are powered by AI and can automate tedious data management tasks and offer immediate recommendations based on derived business intelligence. We offer data catalog workshops globally. Visit Informatica.com to attend one near you.
Data science is an interdisciplinary field that uses algorithms, procedures, and processes to examine large amounts of data in order to uncover hidden patterns, generate insights, and direct decision making.
Framework for understanding quantum computing use cases from a multidisciplin... - Anastasija Nikiforova
This presentation is supplementary material for the article "Framework for understanding quantum computing use cases from a multidisciplinary perspective and future research directions" (Ukpabi, D.C., Karjaluoto, H., Botticher, A., Nikiforova, A., Petrescu, D.I., Schindler, P., Valtenbergs, V., Lehmann, L., & Yakaryılmaz, A.) available at https://arxiv.org/ftp/arxiv/papers/2212/2212.13909.pdf. The presentation, however, was delivered at QWorld Quantum Science Days 2023 (May 29-31).
The fourth industrial revolution, Industry 4.0, represents a paradigm shift from “centralized” to “decentralized” production. It relies on cyber-physical automation, where sensors send data directly to the cloud and services such as monitoring, control and optimization automatically subscribe to the necessary data in real time. In the coming years, these technologies will be seen as a viable alternative to current manufacturing processes. According to a recent report by Markets and Markets, smart factory technology will have a global market size of USD 74.80 billion by 2022. The talk provides a comprehensive introduction to Industry 4.0 and the Smart Factory. Technical challenges and social implications of the smart factory will be discussed, and the applicability of these emerging technologies in developing economies is highlighted as well.
Big Data & Analytics in the Manufacturing Industry: The Vaasan Group - IBM Analytics
The Vaasan Group struggled to accurately forecast fluctuating sales orders across the Nordic region. As a result, it couldn't effectively plan its resource and production schedule. With IBM Big Data & Analytics, Vaasan gained the ability to predict production requirements and prepare for fluctuating orders, ultimately fulfilling 30% more orders. http://bit.ly/1bt5yGt
Real-time Predictive Analytics in Manufacturing - Impetus Webinar - Impetus Technologies
Impetus webcast "Real-time Predictive Analytics in Manufacturing" available at http://lf1.me/hqb/
This Impetus webcast talks about:
• The business value of predictive analytics
• How real-time analytics is enabling ‘intelligent-data’ driven manufacturing
• A Reference Architecture and real world examples based on the experiences of Impetus Big Data architects
• A step-by-step guide for successfully implementing a predictive analytics solution
Innovation is one of the key enablers for European enterprises to compete in global markets. The term ‘innovation’ is constantly used in the speeches of managers, politicians and public administrators. However, in the large majority of cases, the term is used as a generic placeholder, a sort of container whose actual content is left to intuition. For this reason it is important to elaborate specifically on the notion of Enterprise Innovation, to better understand the essence and meaning of innovation.
Innovation stems from a virtuous mix of intuition, creativity, and solid background knowledge. Each innovation endeavour has its own characteristics, largely different from previous experiences. It falls into the category of ‘wicked problems’, i.e., problems difficult to solve because of incomplete, fuzzy, changing requirements. Nevertheless, there are recurring patterns, and it is possible to conceive systematic methods, and supporting information systems, to promote and manage innovation without closing it in a ‘cage’ that would depress the fundamental creativity and imagination. This talk will present an innovative framework for enterprise innovation that includes a methodology and an innovation management platform based on a generic behavioural pattern (i.e., independent of the industrial sector), a strong knowledge orientation, and an innovation monitoring system founded on a number of Key Performance Indicators, to keep the progress of the innovation project constantly under control.
The FDA now defines process validation as “the collection and evaluation of data, from the process design stage throughout production, which establishes scientific evidence that a process is capable of consistently delivering quality products.” On-going process validation is therefore the most important practical outcome of any QbD program. This session helps make the connection between process validation and QbD, and explains why QbD starts in process development and doesn’t end in manufacturing.
Manufacturing Execution System for Industry - I am pleased to share details about our successfully working model and how we can provide you with innovative, industry-proven Plant Intelligence Solutions for an Automotive Manufacturing Plant like yours, giving you the following benefits in a Real Time Environment:
• Informed decisions based on Data Analytics
• Streamlined and Optimized Operations
• Improved Productivity
• Reduced Total Defects
• Reduced Inventory
• Lean, “Smart” MES approach and application coverage for low TCO
• Improved return on assets and investments (ROA/ ROI)
• Improved Equipment Up-Time
• Improved responsiveness, improved plant throughput time
• Enhanced Real Time visibility into production data
We have successfully served many end users in the Manufacturing, Food & Beverage, Pharma, Oil & Gas, Petrochemical, Cement, Power, and Metals industries. We have more than 5,000 software installations throughout India, with a proven track record in almost every industry vertical, and have delivered projects to 40+ countries on every continent, including the Americas, Europe, Asia, Africa, and Australia.
Predictive Data Analytics to Help Your Customers - Experian_US
The @ExperianDataLab hosts a #DataTalk on Thursdays at 5 p.m. ET on Twitter. Join us.
This week, we talked about data preparation, model evaluation, testing effectiveness of predictive analytics, challenges, and trends in predictive analytics.
We learned from Michael Beygelman, Co-founder and CEO of Joberate and Berry Diepeveen, Partner and Enterprise Intelligence Leader at EY in South Africa, and Chuck Robida, Chief Scientist for Experian Decision Analytics.
Learn about past and upcoming chats at:
http://experian.com/datatalk
Practical Advanced Process Control for Engineers and Technicians - Living Online
In today’s environment, the processing, refining and petrochemical business is becoming more and more competitive, and every plant manager is looking for the best quality products at minimum operating and investment costs. The traditional PID loop is used for much of the process control requirements of a typical plant. However, it has many drawbacks, including excessive dead time, which can make the PID loop very difficult (or indeed impossible) to apply.
Advanced Process Control (APC) is thus essential in the modern plant. Small differences in process parameters can have large effects on profitability: get it right and profits continue to grow; get it wrong and there are major losses. Many applications of APC have payback times well below one year. APC does require detailed knowledge of the plant to design a working system, and continual follow-up along the life of the plant to ensure it is working optimally. Considerable attention also needs to be given to the operator interface, to ensure that operators can apply these new technologies effectively as well.
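As a rough illustration of the PID loop discussed above, here is a textbook discrete PID driving a toy first-order plant. The gains, time step and plant model are illustrative assumptions only, not a production controller:

```python
class PID:
    """Textbook discrete PID loop (proportional + integral + derivative)."""
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, setpoint, measurement):
        error = setpoint - measurement
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Toy first-order plant: temperature responds slowly to the control output
# and loses heat toward a 20 °C ambient.
pid = PID(kp=0.5, ki=0.02, kd=0.05, dt=1.0)
temp = 20.0
for _ in range(300):
    u = pid.update(setpoint=50.0, measurement=temp)
    temp += 0.1 * u - 0.005 * (temp - 20.0)
print(round(temp, 1))  # settles near the 50 °C setpoint
```

The integral term is what removes the steady-state offset here; dead time, the drawback named above, would appear as a delay between `u` and its effect on `temp` and is exactly what makes plain PID hard to tune.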
WHO SHOULD ATTEND?
Automation engineers
Chemical engineers
Chemical plant technologists
Electrical engineers
Instrumentation and control engineers
Process control engineers
Process engineers
Senior technicians
System integrators
MORE INFORMATION: http://www.idc-online.com/content/practical-advanced-process-control-engineers-and-technicians-26
A data science observatory based on RAMP - rapid analytics and model prototyping - Akin Osman Kazakci
RAMP approach to analytics: Rapid Analytics and Model Prototyping; collaborative data challenges with in-built data science process management tools and analytics; An observatory of data science and scientists. Presented at the Design Theory Special Interest Group of International Design Society. Mines ParisTech and Centre for Data Science.
Presentation given to the BCS Data Management Specialist Group on 10th April 2018.
Data quality “tags” are a means of informing decision makers about the quality of the data they use within information systems. Unfortunately, these tags have not been successfully adopted because of the expense of maintaining them. This presentation will demonstrate an alternative approach that achieves improved decision making without the costly overheads.
From making your campaigns more effective to greater accuracy in attribution, Revolution Analytics shows you how data analysis and predictive analytics with Revolution R Enterprise will help you make your marketing budget work harder and last longer.
Big data is data sets that are so voluminous and complex that traditional data processing software is inadequate to deal with them. Big data challenges include capturing data, data storage, data analysis, search, sharing, transfer, visualization, querying, updating and information privacy.
BigDataTraining.IN, a leading global talent development corporation building a skilled manpower pool for global industry requirements, offers Hadoop Training in Chennai. BigDataTraining.IN has today grown to be among the world's leading talent development companies, offering learning solutions to individuals, institutions and corporate clients.
Choosing the right process improvement tool for your project.
Learn how an experienced engineer decides when simulation is the right tool for his projects, and when it isn't.
With the evolution of process improvement software, it can be difficult to decide the right tool for the job. Using something too powerful and complex can be a lengthy and unnecessary process, but underestimating the depth of analysis required and choosing something too simplistic early in a project can result in repeated work later.
Measure, Metrics, Indicators, Metrics of Process Improvement, Statistical Software Process Improvement, Metrics of Project Management, Metrics of the Software Product, 12 Steps to Useful Software Metrics
Adjusting primitives for graph : SHORT REPORT / NOTES - Subhajit Sahu
Notes on graph algorithms like PageRank. Compressed Sparse Row (CSR) is an adjacency-list based graph representation.
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ... - Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation of ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. Slowdown on the GPU is likely caused by a large submission of small workloads, and is expected to be a non-issue when the computation is performed on massive graphs.
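For contrast with the levelwise variant described above, plain (monolithic) power-iteration PageRank with the usual dead-end handling can be sketched as follows. The graph, damping factor and iteration count are illustrative; this is not the report's implementation:

```python
def pagerank(graph, d=0.85, iters=100):
    """Monolithic power-iteration PageRank. Rank mass of dead ends
    (vertices with no out-links) is redistributed uniformly, which is
    exactly the case Levelwise PageRank must eliminate up front."""
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        dangling = sum(rank[v] for v in nodes if not graph[v])
        new = {v: (1 - d) / n + d * dangling / n for v in nodes}
        for v in nodes:
            for w in graph[v]:                 # spread rank along out-links
                new[w] += d * rank[v] / len(graph[v])
        rank = new
    return rank

g = {"a": ["b"], "b": ["c"], "c": []}   # "c" is a dead end
r = pagerank(g)
print(max(r, key=r.get))                 # → c (it collects the chain's rank)
```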
Analysis insight about a Flyball dog competition team's performance - roli9797
Insights from my analysis of a Flyball dog competition team's last year's performance. Find more: https://github.com/rolandnagy-ds/flyball_race_analysis/tree/main
Learn SQL from basic queries to advanced queries - manishkhaire30
Dive into the world of data analysis with our comprehensive guide on mastering SQL! This presentation offers a practical approach to learning SQL, focusing on real-world applications and hands-on practice. Whether you're a beginner or looking to sharpen your skills, this guide provides the tools you need to extract, analyze, and interpret data effectively.
Key Highlights:
Foundations of SQL: Understand the basics of SQL, including data retrieval, filtering, and aggregation.
Advanced Queries: Learn to craft complex queries to uncover deep insights from your data.
Data Trends and Patterns: Discover how to identify and interpret trends and patterns in your datasets.
Practical Examples: Follow step-by-step examples to apply SQL techniques in real-world scenarios.
Actionable Insights: Gain the skills to derive actionable insights that drive informed decision-making.
Join us on this journey to enhance your data analysis capabilities and unlock the full potential of SQL. Perfect for data enthusiasts, analysts, and anyone eager to harness the power of data!
#DataAnalysis #SQL #LearningSQL #DataInsights #DataScience #Analytics
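The "Foundations of SQL" highlights above (retrieval, filtering, aggregation) can be tried directly with Python's built-in sqlite3 module; the sales table and numbers below are made up for illustration:

```python
import sqlite3

# Hypothetical sales table used to illustrate filtering and aggregation.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region TEXT, amount REAL)")
con.executemany("INSERT INTO sales VALUES (?, ?)",
                [("north", 120.0), ("north", 80.0), ("south", 250.0)])

# Filtering with WHERE happens per row; GROUP BY / HAVING filter aggregates.
rows = con.execute("""
    SELECT region, SUM(amount) AS total
    FROM sales
    GROUP BY region
    HAVING SUM(amount) > 100
    ORDER BY total DESC
""").fetchall()
print(rows)   # → [('south', 250.0), ('north', 200.0)]
```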
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf - Enterprise Wired
In this guide, we'll explore the key considerations and features to look for when choosing a trusted analytics platform that meets your organization's needs and delivers actionable intelligence you can trust.
Manufacturing Data Analytics
1. Data Analytics in Manufacturing
Gian Antonio Susto
Statwolf LTD
gianantonio.susto@statwolf.com
2. Outline
1. The Data Analytics Environment
2. Principles of Manufacturing Informatics
3. Machine (Statistical) Learning
4. Machine Learning in Manufacturing
a) Virtual Metrology
b) Root Cause Analysis
c) Predictive Maintenance
d) Fault Detection
4. The (Big) Data Era
• Data Explosion
– Increased storage capability (Moore’s Law)
– Internet of Things
Gartner: 26 billion IoT objects by 2020
Techradar.com
Data aggregated by Gongos Research
7. The (Big) Data Deluge
‘We are drowning in information and starving for knowledge’ - Rutherford D. Rogers
• Insights/learning
• Predictions
• Decision-making suggestions
• ...
8. Data Analytics: an Interdisciplinary Field
(Diagram: Data Analytics at the intersection of Statistical Learning and Software Engineering, with application domains such as Finance, Manufacturing, Biology, Robotics, ...)
10. (Big) Data in Manufacturing
• Manufacturing companies record enormous amounts of process data
• Example [1]: a Consumer Packaged Goods company that produces a personal care product generates:
[1] The rise of Industrial Big Data - General Electric
11. (Big) Data in Manufacturing
• ‘Leveraging big data is imperative as information is at the heart of competition and growth for industrial businesses. Data-driven strategies based on real-time and historical process information will help companies optimize performance’ [1]
• Possible improvements:
– Proving quality to trading partners/customers
– Maximizing yield
– Reducing downtime
– Recovering capacity
[1] The rise of Industrial Big Data - General Electric
12. The Manufacturing Data Analysis Process
Problem → Collection → Cleaning → Modelling → Roll-out
• Problem: definition; expected impact; evaluation metric
• Collection: conversion; parsing; aggregation; alignment
• Cleaning: quality; reconciliation; missing data handling; denoising; outlier detection
• Modelling: feature extraction; building; evaluation/comparison
• Roll-out: on-line implementation; business outcome; improvement
13. The Manufacturing Data Analysis Process
Problem → Collection → Cleaning → Modelling → Roll-out
• Machine Learning enters at the Modelling stage: modeling based on a historical dataset Z of
- n observations (samples)
- p variables (features)
15. Machine Learning Problems
• Two classes of modeling problems, depending on the type of data:
– Supervised if labeled data (Z = [X Y] - X input, Y output)
– Unsupervised if unlabeled data (Z = X)
(Diagram: Modeling Problem → Supervised {Regression, Classification} or Unsupervised)
16. Machine Learning Problems
• Two categories in the case of supervised learning, depending on the output type:
– Regression if Y is continuous
– Classification if Y is discrete/categorical
17. Supervised Learning: a Regression example
• Example: house pricing for real estate market [2]
• Historical dataset of n house transactions with
information regarding
– House price (output - Y)
– Land square footage (input - X)
– Living square feet (input - X)
– Effective year built (input - X)
– Mailing address (input - X)
[2] Machine Learning and the Spatial Structure of House Prices and Housing Returns - A. Caplin et al.
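The regression setting above can be sketched with a toy least-squares fit. The transactions below are invented, and only one input (living square feet) is used for simplicity:

```python
# Toy sketch of the house-pricing regression: fit price (Y) from
# living square feet (X) with ordinary least squares.
# All numbers are made up for illustration only.

def ols_fit(xs, ys):
    """Closed-form simple linear regression: y = a + b*x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b

# Historical transactions: (living square feet, sale price)
sqft  = [900, 1200, 1500, 1800, 2400]
price = [110_000, 150_000, 185_000, 220_000, 290_000]

a, b = ols_fit(sqft, price)
predicted = a + b * 2000   # estimate for an unseen 2000 sq ft house
```

In the full example of [2] all four inputs would enter a multivariate fit, but the mechanics are the same: learn coefficients from the n historical transactions, then predict Y for new houses.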
18. Supervised Learning: a Classification example
• Example: Shazam
• A ‘digital fingerprint’ (X) is extracted from a song sample and compared with a database of 11 million songs (classes - Y)
Tip 1 - Defining good features is generally half of the battle
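As a rough sketch of this kind of classification (Shazam's real system uses spectrogram hashing at far larger scale), a query fingerprint can be matched to its nearest database entry; the bit strings and song names here are invented:

```python
# Sketch of fingerprint classification: match a query fingerprint (X) to the
# closest entry in a song database (classes Y) by Hamming distance.
# Fingerprints and song names are invented for illustration.

def hamming(a, b):
    """Number of differing bits between two equal-length bit strings."""
    return sum(x != y for x, y in zip(a, b))

database = {
    "Song A": "101100110010",
    "Song B": "010011001101",
    "Song C": "111000111000",
}

def classify(query):
    """Return the song whose stored fingerprint is nearest to the query."""
    return min(database, key=lambda song: hamming(database[song], query))

match = classify("101100110110")   # one bit away from Song A's fingerprint
```

The quality of the fingerprint (the feature X) matters far more than the matching rule, which is the point of Tip 1.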
19. Unsupervised Learning
• Unlabeled data: quest for hidden structure in the data
– Market Basket/Affinity Analysis
• Patterns in the purchases: what is bought together?
• Amazon 2009 revenue $24.5B, $5B from recommended products
– Clustering
• Grouping of a set of ‘similar’ objects
• ‘You may also like’
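Clustering can be sketched with a minimal two-means loop; the ‘basket size’ data and starting centroids below are invented for illustration:

```python
# Minimal k-means sketch (k = 2, one feature) for grouping 'similar' objects,
# as in customer or basket segmentation. Data are invented.

def kmeans_1d(points, c0, c1, iters=10):
    """Two-means clustering on scalars; returns the two final centroids."""
    for _ in range(iters):
        # assign each point to its nearest centroid
        g0 = [p for p in points if abs(p - c0) <= abs(p - c1)]
        g1 = [p for p in points if abs(p - c0) > abs(p - c1)]
        # move each centroid to the mean of its group
        c0 = sum(g0) / len(g0)
        c1 = sum(g1) / len(g1)
    return c0, c1

# e.g. basket sizes: a cluster of small baskets and a cluster of large ones
baskets = [1, 2, 2, 3, 10, 11, 12]
c_small, c_large = kmeans_1d(baskets, c0=1.0, c1=12.0)
```

No labels are used anywhere: the structure (two groups of baskets) is discovered from the data alone, which is what distinguishes this from the supervised examples above.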
21. Manufacturing Data Analytics Examples
Four examples of manufacturing data analytics problems:
A. Regression – Virtual Metrology (Semiconductor)
B. Regression – Root Cause Analysis (Pharmaceutical)
C. Classification – Predictive Maintenance (Semiconductor)
D. Unsupervised Learning – Fault/Novelty Detection (Semiconductor/HVAC)
22. [A] Regression – Virtual Metrology (VM)
• Semiconductor Manufacturing
• Production based on wafers
• Organization in lots (25 wafers)
• Hundreds (thousands!) of processes:
- Etching
- Lithography
- Chemical Vapor Deposition (CVD)
- ...
• Goodness of a process is assessed by measuring one or more parameters (Y) on the wafer (for CVD, the thickness of the deposited layer)
• Unfortunately, measuring is costly and time-consuming
23. [A] Regression – Virtual Metrology (VM)
(Figure: wafer with metrology data vs wafer without metrology data)
• Common practice to save money/time: measuring just 1 wafer per lot
• Drawbacks:
- Delays in detecting drifts in production
- No quality check for unmeasured wafers
- The controller, if present, is updated just once every 25 process iterations
24. [A] Regression – Virtual Metrology (VM)
• Tool data X available for every iteration (temperatures, pressures, flows, …)
• Exploit tool/logistic/production data to estimate Y
• Each wafer now has at least an estimate for quality/control purposes
• i.e. from Lot-to-Lot to Run-to-Run control [3]
[3] ‘Virtual Metrology and Feedback Control for Semiconductor Manufacturing Processes using Recursive Partial Least Squares’ - Journal of Process Control, Khan, Moyne and Tilbury
25. [A] Regression – Virtual Metrology (VM)
• Modeling difficulties:
1. Data fragmentation: several multiple-chamber machines, multiple products/recipes
2. High-dimensionality: thousands of variables
3. ‘Skinny problem’ (p >> n): numerical problems for model estimation
• Example: prediction of thickness for CVD on a tool with 3 chambers, each with 2 sub-chambers - exploiting clustering for subset modeling
Tip 2 - ‘Visualize’/examine data before modeling
26. Dealing with high-dimensionality: Regularization methods
• Not all regression techniques are suitable for high-dimensional problems
• Simplest approach: Least Squares Regression
• Objective: minimization of the prediction error on the training data
• OLS solutions on high-dimensional datasets are often ill-conditioned: the predicted output can change drastically with small perturbations of the input, causing poor prediction performance
27. Dealing with high-dimensionality: Regularization methods
• Regularization methods overcome the issue
• Ridge Regression (RR) [L2]: stable (“easier”) solutions are encouraged by penalizing the L2 norm of the coefficients (ill-posed problems or over-fitting issues are generally resolved)
• Least Absolute Shrinkage and Selection Operator (LASSO) [L1]: penalizes the L1 norm of the coefficients, driving some of them exactly to zero
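The contrast between OLS and Ridge Regression can be sketched on two nearly collinear inputs. The data, the λ value, and the no-intercept simplification are all choices made for illustration:

```python
# Sketch of OLS ill-conditioning vs ridge stabilization on a tiny problem
# with two nearly collinear inputs. All data are invented.

def solve2(a, b, c, d, e, f):
    """Solve the 2x2 system [[a, b], [c, d]] @ [w1, w2] = [e, f] (Cramer's rule)."""
    det = a * d - b * c
    return (e * d - b * f) / det, (a * f - c * e) / det

def fit(x1, x2, y, lam=0.0):
    """OLS (lam = 0) or ridge (lam > 0) fit of y ~ w1*x1 + w2*x2, no intercept.
    Ridge adds lam to the diagonal of X'X (the L2 penalty)."""
    s12 = sum(u * v for u, v in zip(x1, x2))
    b1 = sum(u * v for u, v in zip(x1, y))
    b2 = sum(u * v for u, v in zip(x2, y))
    return solve2(sum(v * v for v in x1) + lam, s12,
                  s12, sum(v * v for v in x2) + lam, b1, b2)

x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [1.0, 2.0, 3.0, 4.001]          # nearly collinear with x1
y  = [2.1, 3.9, 6.05, 8.0]

w_ols   = fit(x1, x2, y)             # ill-conditioned: erratic coefficients
w_ridge = fit(x1, x2, y, lam=0.1)    # stable, shrunken coefficients

# a tiny perturbation of one output value moves OLS far more than ridge
y2 = [2.1, 3.9, 6.05, 8.01]
shift_ols   = abs(fit(x1, x2, y2)[0] - w_ols[0])
shift_ridge = abs(fit(x1, x2, y2, lam=0.1)[0] - w_ridge[0])
```

This is exactly the instability described on the previous slide: with X'X nearly singular, the OLS coefficients swing wildly under small perturbations, while the penalized solution stays close to a sensible answer.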
28. Dealing with high-dimensionality: Regularization methods
• A penalty on model complexity generally enhances performance
• Different behaviour: LASSO provides sparse results!
• E.g. diabetes data: p = 10, n = 367 [4]
• Sparsity provides interpretable models
‘Essentially, all models are wrong, but some are useful’ - George E.P. Box
[4] ‘The Elements of Statistical Learning: Data Mining, Inference, and Prediction’ - Hastie, Tibshirani, Friedman 2009
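Why LASSO yields sparse results can be sketched with the soft-thresholding rule, which is the exact LASSO solution only under an orthonormal design; the coefficients and the λ value below are invented:

```python
# Sketch of LASSO sparsity: under an orthonormal design, the LASSO solution
# is the OLS coefficient pushed toward zero by soft-thresholding, so small
# coefficients become exactly zero. Values are invented for illustration.

def soft_threshold(b, lam):
    """LASSO shrinkage of a single OLS coefficient (orthonormal design)."""
    if b > lam:
        return b - lam
    if b < -lam:
        return b + lam
    return 0.0          # coefficients inside [-lam, lam] are zeroed out

ols_coeffs   = [2.5, -0.3, 0.05, -1.1, 0.2]
lasso_coeffs = [soft_threshold(b, lam=0.4) for b in ols_coeffs]
n_selected   = sum(c != 0.0 for c in lasso_coeffs)   # surviving variables
```

Ridge, by contrast, rescales every coefficient toward zero without ever reaching it, which is why only the L1 penalty performs variable selection and yields the interpretable models mentioned above.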
29. Regularization methods: guidelines
• RR & LASSO: no a-priori guarantee of best prediction accuracy (cross-validation is always a necessary step to evaluate how well results generalize)
• LASSO is generally outperformed by RR when:
– p > n
– there are high correlations between predictors
• Elastic Nets combine the two techniques
• Kernel Methods: non-linear solutions embedded in a linear framework (augmented space)
From Chris Thornton, U. Sussex
30. Non-linear Regression: Neural Networks (NNs)
• NNs mimic the structure of the brain and how it learns from experience
• Example architecture: variables are associated with nodes and functions with arcs
(Diagram: a single neuron - inputs u1, u2, ..., un weighted by w1, w2, ..., wn, summed with a bias b, and passed through an activation a(x) to produce the output y)
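The single-neuron structure in the diagram can be sketched directly; the logistic activation and all weight values are illustrative choices, not taken from the slides:

```python
# A single neuron from the architecture above: inputs u1..un are weighted,
# summed with a bias b, and passed through an activation a(x).
# The weights, bias, and inputs below are arbitrary illustration values.
import math

def neuron(inputs, weights, bias):
    """y = a(sum_i w_i * u_i + b) with a logistic (sigmoid) activation."""
    x = sum(w * u for w, u in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-x))   # a(x): squashes output into (0, 1)

y = neuron(inputs=[0.5, -1.0, 2.0], weights=[0.8, 0.2, 0.1], bias=-0.1)
```

A full network stacks layers of such neurons; training then means adjusting the w_i and b values to reduce prediction error, which is where the time-consuming tuning mentioned on the next slide comes from.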
31. Non-linear Regression: Neural Networks (NNs)
• PROS:
- Great prediction accuracy
- Flexibility in modeling non-linearities
• CONS:
- Time-consuming tuning
- Not suitable for high-dimensional problems
• In case of high-dimensionality, a two-step procedure is applied:
1. Dimensionality reduction (correlation, PCA, etc.)
2. Modeling
Tip 3 - The choice between linear and non-linear approaches should be tailored to the problem at hand
32. [B] Regression – Root Cause Analysis (RCA)
• Pharmaceutical Manufacturing
• Slow-Release (Time-Release) technologies: capsules that dissolve over time for a controlled release of drug into the bloodstream
• Dissolution profiles (y1,2,3,4) over different time intervals (T1, T2, T3, T4) are required to fall within specified intervals
• Variability in the production: where does it come from? Root Cause Analysis
33. [B] Regression – Root Cause Analysis (RCA)
• Several production steps, each of which can be influenced by many factors (e.g. raw materials quality)
• All the available data sources (x0 and x1, ..., x4 from Process #1 to Process #4) are exploited for modeling the dissolution curves (y1,2,3,4)
• Modeling with sparse approaches to pinpoint the most influential parameters for variability
(Figure: process chain x0 → Process #1-#4 → y1,2,3,4, and a bar chart of RCA variable importances for X1-X4)
34. [C] Classification – Predictive Maintenance (PdM)
• Data analytics enables more sophisticated approaches to maintenance handling
• 3 groups of approaches in manufacturing for dealing with maintenance: R2F, PvM, PdM
1. Run-to-Failure (R2F)
• Repair or restore actions performed only after the occurrence of a failure
• ‘If it’s not broken, don’t fix it’
35. [C] Classification – Predictive Maintenance (PdM)
2. Preventive Maintenance (PvM)
• Planned schedule of maintenance actions with the aim of anticipating failures
• Failures are generally warded off
• Unnecessary maintenance actions are performed
36. [C] Classification – Predictive Maintenance (PdM)
3. Predictive Maintenance (PdM)
• Maintenance actions based on suggestions provided by a data analytics module
• The PdM module is based on data available on the tool/production
37. [C] Classification – Predictive Maintenance (PdM)
• Semiconductor Manufacturing
• Forecast of integral-type faults (caused by machine usage)
• Use case: breaking of the tungsten filament in ion implanters
• Goal: define an indicator (y) – health factor – of the current component status from process parameters (X)
38. [C] Classification – Predictive Maintenance (PdM)
• The health factor is a quantitative index; however, we treat this as a classification problem
• Observations divided into:
o ‘Non-Faulty’ (data of process iterations with a working component)
o ‘Faulty’ (data of process iterations with a broken component)
• Use of Support Vector Machines: the distance from the decision boundary is exploited as a ‘distance to fail’
(Figure: decision boundary separating the two classes - adapted from [4])
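The ‘distance to fail’ idea can be sketched with a hard-coded linear boundary standing in for a trained SVM; the weights, offset, and observations are all invented:

```python
# Sketch of the health-factor idea: given a (pre-trained, here hard-coded)
# linear decision boundary w.x + b = 0 separating 'non-faulty' from 'faulty'
# process iterations, the signed distance to the boundary serves as a
# 'distance to fail'. All numbers below are invented for illustration.
import math

w = [0.6, -0.8]          # hypothetical boundary normal (unit length)
b = -1.0                 # hypothetical offset

def health_factor(x):
    """Signed distance of observation x from the decision boundary:
    positive = non-faulty side, negative = faulty side."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

healthy_obs = [5.0, 1.0]     # far on the non-faulty side
degrading   = [2.5, 0.5]     # drifting toward the boundary
```

Watching this signed distance shrink over successive process iterations is what turns a plain faulty/non-faulty classifier into a degradation indicator.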
44. [C] Classification – Predictive Maintenance (PdM)
• Minimization of the overall costs
• Support Decision System: from process data and production/maintenance costs, the PdM module suggests when actions should be taken to minimize costs
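The cost trade-off the Support Decision System optimizes can be sketched with a toy wear model; every cost figure and the quadratic failure-probability curve below are invented assumptions:

```python
# Toy sketch of the PdM cost trade-off: maintaining too early wastes
# component life, too late risks a costly unplanned failure.
# All costs and the wear model are invented for illustration.

COST_MAINTENANCE = 1.0    # planned maintenance cost
COST_FAILURE     = 10.0   # unplanned failure cost (downtime, scrap, ...)

def failure_probability(cycles):
    """Hypothetical wear model: failure risk grows with machine usage."""
    return min(1.0, (cycles / 100.0) ** 2)

def expected_cost_per_cycle(maintain_at):
    """Expected cost per cycle if we always maintain after `maintain_at` cycles."""
    exp_cost = COST_MAINTENANCE + COST_FAILURE * failure_probability(maintain_at)
    return exp_cost / maintain_at

# sweep candidate maintenance intervals and pick the cheapest
best = min(range(10, 101, 10), key=expected_cost_per_cycle)
```

A real PdM module would replace the hard-coded wear curve with the data-driven health factor of the previous slides, but the decision logic - pick the action timing that minimizes expected cost - is the same.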
45. [D] Unsupervised Learning – Fault Detection
• Two classes of failure-related problems:
1) Prediction (breakings in the future)
2) Detection (breakings that have already happened)
• With thousands of variables, the detection of a breaking is not always a trivial task
• Univariate monitoring can be misleading
Tip 4 - Multivariate systems need multivariate approaches
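Tip 4 can be illustrated with a toy two-variable example: each coordinate of the faulty observation stays within its univariate limits, yet a multivariate view (here, the residual of the learned relation x2 ≈ x1) flags it. All data and thresholds below are invented:

```python
# Sketch of Tip 4: two correlated variables can each stay inside their
# univariate control limits while their combination is clearly anomalous.
# Data and limits are invented for illustration.

# Normal operation: x2 tracks x1 closely (strong positive correlation)
normal = [(1.0, 1.1), (2.0, 2.0), (3.0, 3.1), (4.0, 3.9), (5.0, 5.0)]

fault = (4.0, 1.5)   # each value is in range individually, but the pair is odd

def univariate_ok(obs, history):
    """Is each coordinate within the min/max seen in normal operation?"""
    lo1, hi1 = min(h[0] for h in history), max(h[0] for h in history)
    lo2, hi2 = min(h[1] for h in history), max(h[1] for h in history)
    return lo1 <= obs[0] <= hi1 and lo2 <= obs[1] <= hi2

def residual(obs):
    """Deviation from the learned relation x2 ~ x1 (the multivariate view)."""
    return abs(obs[1] - obs[0])

passes_univariate = univariate_ok(fault, normal)          # looks fine per variable
max_normal_residual = max(residual(h) for h in normal)    # spread under normal operation
is_fault = residual(fault) > 3 * max_normal_residual      # multivariate check fires
```

With thousands of variables the same idea is applied through multivariate statistics (e.g. PCA-based monitoring) rather than a hand-written residual, but the lesson is identical: the fault lives in the relation between variables, not in any single one.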
46. [D] Unsupervised Learning – Fault Detection
• Usage workflow:
1. Issue recognized by the system
2. Drill-down to the ‘guilty’ parameter(s)
3. Inspection of the original data