1. Making better use of data and AI in
Industry 4.0
Albert Y. C. Chen, Ph.D.
albert.y.c.chen@gmail.com
http://slideshare.net/albertycchen
2021/01
2. Albert Y. C. Chen, Ph.D. 陳彥呈博士
asking why → getting to the bottom of things → optimizing them all!
● Experience
○ GM @ Clobotics US (2018--2020)
○ VP of R&D @ Viscovery (2015--2018)
○ Consultant @ Cardinal Blue (US), Cinnamon AI (Japan), Skymizer (Taiwan), AIM (Taiwan)
○ Instructor @ Taiwan AI Academy (Taipei 1st-3rd class, Hsinchu 1st & Taichung 1st class)
○ Reviewer and External Expert @ MOST, MOEA, ITRI, III
○ Mentor @ Make NTU, Hack NTU, MOE Hackathon, Quanta d.-Contest, Women in AI
● Education
○ Ph.D. in Computer Science @ The State University of New York at Buffalo
○ B.S. in Computer Science @ National Tsing-Hua University
3. Optimizing all---not all are on a known distribution
● We simplified things to make problems solvable
○ Assuming data are in linear scale and normally distributed; in fact, many algorithms make
these assumptions intrinsically, e.g., PCA, K-means.
○ Assuming that functions/spaces are known, linear, convex or continuous.
○ Assuming that there is one globally optimal solution instead of having multiple mutually
conflicting objective functions.
[Figure: a discontinuous function; an unknown function; estimating an unknown distribution; Pareto-optimal solutions]
4. Optimizing all---even when the
distribution can’t be modeled,
there are ways to optimize it
● Example 1: Automatically search for
optimal parameters for the Sauvola &
Pietikäinen (2000) image binarization
algorithm, a local (adaptive) variant of
the classical Otsu (1979) algorithm.
● Example 2: Automatically search for
optimal parameters for the
Felzenszwalb & Huttenlocher (2004)
Image Segmentation algorithm.
6. Industry 4.0 solution classification (Deloitte, 2015)
1. vertical networking of smart production systems (e.g., IT integration, analytics and data management, operational efficiency 2.0)
2. horizontal integration (e.g., business model optimization, smart supply chains & logistics)
3. engineering across the entire value chain (e.g., cross-disciplinary innovation, efficient management innovation, efficient life-cycle management)
4. exponential technologies (e.g., AI, robots, nano materials, 3D printing)
[Value chain: R&D → Plan → Buy → Make → Move → Deliver → After Market]
Industry 4.0: Challenges and solutions for the digital transformation and use of exponential technologies. Deloitte, 2015.
7. Industry 4.0 value framework (KPMG, 2018)
● Enterprise Value (value range: 20-30%)
○ Product portfolio
○ Product customer segmentation
○ Product architecture
○ Product design
○ Technology
○ Innovation
● Network Value (value range: 10-20%)
○ Manufacturing footprint
○ Sourcing strategy
○ Supply chain design
○ Make-versus-buy decisions
● Planning Value (value range: 5-11%)
○ Operating systems and practices
○ Organizational policies and procedures
○ Production volumes
● Execution Value (value range: 3-6%)
○ Supplier costs
○ Operating efficiency
○ All costs paid: labor, taxes, production system refresh
← longer cycle … shorter cycle →
Enabling technology includes: robotics, 3D printing, sensor technology, AI, digital twin, AR, communications network infrastructure (5G), big-data platforms, digital supply chain, IoT.
A reality check for today’s C-suite on Industry 4.0. KPMG, 2018.
8. Bosch Digital Plant in Blaichach, Germany
● Makes ABS/ESP; 6.7M devices/year from 3.3k
associates. Industry 4.0 innovations include:
○ flexible shift scheduling with mobile apps
○ all raw material with RFID
○ transparency in entire value stream, in real time
○ aware of processing status of every order at any given time
○ demand driven material replenishment
○ smart climate control
○ status of every machine
○ failure/losses of production lines in real time
○ record potential error sources
○ recurring interferences are linked to known error patterns
○ trends and predictions are provided to operations personnel
However, these are mostly about “Execution Value” and “operating efficiency”, which only
account for a 3%-6% improvement in value.
The last two items do involve some AI.
https://www.bosch.com/stories/the-connected-factory/ https://www.bosch-industry-consulting.com/guided-tours/
[Image: Bosch ABS module]
9. Fujitsu Telecom Networks, Oyama Factory, Japan
● FJPS (Fujitsu Production System)
○ based on the TPS (Toyota Production System), with emphasis on
human-machine collaboration
○ “tenouchi-ka” (keeping technology within our grasp)
○ “Kaizen” (continuous and steady improvement)
● Using technology to improve enterprise value
○ CAD for product design and VPS (virtual product simulator).
○ Manufacturing processes are also optimized with the help of simulators.
● Using technology to improve execution value
○ Flexible production line that easily switches between products, where
assembly, testing, packaging are all in one production line
○ IoT sensors + AI to identify problems on the production line, with 10x
productivity growth in 6 months since initiative was launched in 2018.
https://www.meti.go.jp/english/publications/pdf/journal2015_05a.pdf
https://www.fujitsu.com/downloads/GLOBAL/vision/2020/download-center/FTSV2020_wp5_EN_1.pdf
10. As promising as it seems, why aren’t we there yet?
● Industry 4.0
○ Requires the buildup of connectivity and the accumulation of
information before the formation of knowledge.
● AI
○ AI (knowledge formed by machines) can identify obscure
patterns and learn complex concepts from noisier data.
○ AI doesn’t work out of the box; it is data driven.
○ However, garbage (data) in, garbage (AI) out.
● AI & Data Flywheel
○ AI isn’t a one-off effort; it’s a flywheel---the faster it turns, the
stronger the AI becomes in the same period of time.
○ The AI and data cycle is affected and constrained by product
design, UX, business constraints, and many other factors.
[Diagram: steps towards Industry 4.0 (connectivity → information → knowledge) and the data → AI → business flywheel; most companies are still stuck at data governance]
12. Data, AI & Business must be considered as a whole
● Data cost
○ Data that need to be digitized, structured, fused from multiple sensors, and annotated
separately are costlier to obtain. Furthermore, there are unlimited data to collect; pick wisely.
● Data speed
○ Even data that are already digitized and structured have widely varying update frequencies,
e.g., from shortest to longest cycle: production, revenue, operations, R&D.
● AI adaptability
○ Some AI models adapt to datasets/tasks better than others; the problem domain is a key factor in adaptability.
● AI tolerability
○ Some applications require extreme accuracy, while others can tolerate workarounds.
● Business impact & constraints
○ Even if the data and AI seem “low hanging”, the business impact might not be high; for
businesses with stringent constraints, those constraints should in turn affect our choice of AI and data.
13. Connecting AI with Business Impact
● AI, even with 99.99% accuracy, has no business value alone
○ Accuracy = [ True Positive (TP) + True Negative (TN) ] / [ True Positive (TP) + True Negative
(TN) + False Positive (FP) + False Negative (FN) ]
○ With both FPs and FNs present, we would still need to reinspect every single product.
○ Adjust the AI to emit only false positives, with a team set up to manually reinspect flagged
items. The false-positive to true-positive ratio, i.e., the total reinspection cost, should be the KPI.
● Oftentimes, we don’t have enough supporting measures in place
○ What to do with reduced QC headcount?
○ What to do with 30 minute advanced warning?
○ What to do with warning of labor shortage?
○ What to do with advanced demand prediction?
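To make the KPI concrete, here is a minimal sketch with hypothetical numbers: accuracy says little on its own, while the reinspection cost of a model tuned to emit no false negatives is directly actionable.

```python
# Hypothetical numbers for illustration: 100,000 units, 100 true defects.
# The model is tuned to emit no false negatives, so only flagged items
# (TP + FP) need manual reinspection; everything else ships unchecked.

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def reinspection_cost(tp, fp, cost_per_unit):
    # Every flagged item, true defect or not, goes through manual reinspection.
    return (tp + fp) * cost_per_unit

tp, tn, fp, fn = 100, 99_600, 300, 0
print(f"accuracy: {accuracy(tp, tn, fp, fn):.4f}")          # looks great...
print(f"reinspect {tp + fp} of {tp + tn + fp + fn} units")  # ...this is the real KPI
print(f"total reinspection cost: ${reinspection_cost(tp, fp, 2.0):,.0f}")
```

The FP:TP ratio here is 3:1; driving that ratio (and hence the cost) down is a far more meaningful target than chasing another decimal of accuracy.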
14. Business Constraints dictating the development of AI
● Some examples include:
○ For problems with a big “base number”, a small improvement leads to huge impact.
○ For problems with a fast cycle, small iterative improvements lead to big impact over time.
○ For problems with data that are expensive to acquire and label, design the AI to be not so
data-hungry (e.g., transfer learning, simpler models, break the bigger problem down to smaller
ones to reduce the combinations of free variables instead of using end-to-end models)
○ For problems with short lead time and fast cycle, either speedup the data collection and
annotation pipeline, or change the model itself to require less data with every change.
○ For problems with consistently dirty data, build team and SOP accordingly (e.g., data analyst +
data engineer + platform engineer + data scientist)
○ For problems that cannot allow proper A/B testing, design the AI product accordingly, s.t. new
models can run in the background and have their results verified before being put into production.
15. Optimizing data + AI + business is complicated...
How to simplify a complex optimization problem?
● Estimate better before we even start
○ Better confidence interval estimates
○ Use premortem analysis to get more precise estimates
● Optimize incrementally and iteratively
○ Getting feedback and refining our estimates
○ Learning from mistakes
○ Sharing the lessons learned
● Oftentimes, the devil is in the detail
○ Paying attention and responding to weak warning signs
○ Avoid the normalization of deviance
● AI helps with better estimates, iterative optimization and catching the details.
16. Conquering complexity---better confidence
interval estimates
● Consider a wider range of outcomes to help refine our estimates.
○ “Research on these types of forecasts finds that 90% confidence intervals, which, by definition,
should hit the mark 9 out of 10 times, tend to include the correct answer less than 50% of the
time.”
○ These bad estimates are usually based on just two endpoints: (1) a best-case outcome and
(2) a plausible worst-case outcome.
○ SPIES (“Subjective Probability Interval Estimates”) by Moore and Haran pushes us to consider
the entire range of possible values, split into several intervals. It has repeatedly been shown to
bring the 90% confidence interval hit rate above 75%. Using project planning as an example, the 90%
confidence interval should be 2-6 months using the SPIES method:
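The SPIES procedure can be sketched as follows; the bins and probability masses below are hypothetical illustration values, not figures from Moore and Haran's studies.

```python
# SPIES sketch: assign subjective probability mass to bins covering the ENTIRE
# plausible range, then read the central 90% interval off the cumulative sum.

def spies_interval(intervals, probs, confidence=0.90):
    total = sum(probs)
    tail = (1 - confidence) / 2
    cum, lo, hi = 0.0, None, None
    for (a, b), p in zip(intervals, probs):
        cum += p / total
        if lo is None and cum >= tail:
            lo = a        # lower bound: first bin crossing the lower tail
        if hi is None and cum >= 1 - tail:
            hi = b        # upper bound: first bin crossing the upper tail
    return lo, hi

# Project duration in months, binned over the whole plausible range
# (made-up masses for illustration):
bins = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 9), (9, 12)]
mass = [0.01, 0.03, 0.30, 0.35, 0.20, 0.08, 0.02, 0.01]
print(spies_interval(bins, mass))  # → (2, 6): the 2-6 month interval
```

Forcing an estimate for every bin, including the tails, is what widens the interval compared with the naive two-endpoint guess.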
17. Conquering complexity---estimates that are more
precise, with premortem analysis
● We always have a clearer view of
things in retrospect.
○ How do we make use of that?
○ Postmortem analysis seems to provide clear
insights. How do we do it before we actually
fail?
● Answer: premortem analysis.
○ Assume that we have failed: what would have
been the cause?
18. Premortem analysis example
Weakness & Threat from SWOT analysis Premortem analysis
Prospective hindsight boosts our ability to identify reasons why an outcome might occur,
providing a wider set of reasons that are more concrete and precise.
19. Conquering complexity---getting feedback and
refining our estimate
● Get in-time feedback to learn from mistakes and refine our estimates
○ Critical decision makers, without feedback on their decisions, won’t be much better than
trainees:
■ Immigration officers accepted photo IDs that didn’t match 1 in 7 times.
■ Judges’ parole acceptance rates are highest right after a lunch break and almost 0 right before it.
○ To make things worse, critical decision makers are expected to be reliable, and it’s hard for them
to admit, discuss, and learn from their mistakes.
● Don’t filter out observations that don’t fit our assumption
○ E.g., Flint water crisis, 2014, flushing pipes before sampling, limiting water flow while sampling,
sampling only newer neighborhoods, and throwing out filtered water samples that still exceed the
safety limit.
○ When we were calculating our labeler accuracy.
20. Conquering complexity---learning from mistakes
● Document close calls, share the fix, and make sure the fixed procedure is enforced
or the knowledge passed down.
● E.g., the 2009 DC train crash. The 1970s-era system started having short circuits. The problem
was identified and the fix documented in 2005, but that knowledge was lost in the org in just 4 years.
21. Conquering complexity---sharing the lessons
learned
● Humans vs. octopuses (whose knowledge is not shared between generations)
● The creation of NASA’s ASRS, in response to the 1974 TWA Flight 514 crash
22. Conquering complexity---paying attention and
responding to weak warning signs
● It is not possible to find every chain of
strange and rare interactions from which errors might
emerge.
● Complex systems do give off warning signs before
they fail, but we tend to ignore those signs as long as things
turn out to be OK.
● Create a culture that promotes the sharing of
misses and near-misses.
○ E.g., Novo Nordisk lost its FDA insulin license in 1993. In the
wake of the event, their newly created 20-person task force
talks to every unit once every few years, uncovers
concerns, and escalates them as needed.
23. Conquering complexity---avoid the
normalization of deviance
● “With each launch, problems that were once
unexpected become more expected---and ultimately
accepted.”
● NASA space shuttle Challenger, 1986
○ 9 years prior to the disaster, engineers recommended
redesigning booster joints.
○ 9 months before the disaster, primary and secondary o-rings
were severely damaged.
● NASA space shuttle Columbia, 2003
○ Foam insulation of the fuel tank broke off and was treated as a
routine maintenance issue, which eventually caused the
Columbia disaster.
24. Decisions, decisions, decisions
● Planning: Are we running on blind faith, or are we estimating well enough?
● Data & Models: Are we collecting too much or too little data? Are the models too problem-specific or too general?
● Alerts: Are we getting too few warnings, or are we overloaded with warnings? If the latter, how do we ensure critical ones are attended to with top priority?
● Learning: Are we learning from mistakes, or are we repeating them? Are we doing so manually, or systematically?
● Process: Are the processes designed to be sensitive to error signals?
● Organization: Is the organization designed to serve the aforementioned goals?
(Scope grows from I-4.0 in the factory to I-4.0 across the whole corporation.)
26. Oftentimes, we need to drill down to extreme details
● The fast track to becoming an AI expert comes at the cost of skipped details
○ How are signals captured and formed?
○ How do I know a straight line in an image is a straight line?
○ How do I know an edge I see is an edge indeed (to our algorithms)?
○ What features are recorded and how are the features extracted?
○ What happens when I mess with the feature extraction process?
○ What are the assumptions and what happens when they are violated?
○ The design of classifiers and the reason behind their designs.
● Sometimes, seemingly trivial and naive questions have much deeper roots
○ Why do we see color? (we have three cones, but how did that happen?)
○ Why is the sky blue? (hint: not due to the reflection of ocean)
27. Oftentimes, we need to reevaluate our decisions
● What is “truth”? Truth is a human invention.
○ We discover a phenomenon and make assumptions about it. We repeatedly verify an
assumption and start calling it the “truth”. The truth holds until it’s proven otherwise.
○ “Science is a system designed to search for falsifiable models of Nature.” David Helfand.
○ There’s no absolute truth nor authority in science. There’s no reason for a person to not ask
“why”.
● It’s OK to be wrong. Fail fast, fail often, fail forward
○ "The smartest people are constantly revising their understanding, reconsidering a problem
they thought they'd already solved. They're open to new points of view, new information, new
ideas, contradictions, and challenges to their own way of thinking." —Jeff Bezos
○ Octopuses may be smarter than humans, but every octopus starts its life from a blank slate.
Humans are dumb. Humans make mistakes. Humans learn from mistakes. Humans share
mistakes so that others don’t make the same ones.
28. How to filter out noise from all the latest research?
● The power of collective intelligence
○ Group paper discussion. When in doubt, write down questions and ask why.
○ Get quick (yet not necessarily correct nor comprehensive) answers from colleagues.
○ “If you can’t explain it simply, you don’t understand it well enough”
● Ask yourself “why does it work?”
○ When a new algorithm works, which part contributed to its success? (a) data + (b)
augmentation + (c) preprocessing + (d) feature extraction + (e) dimension reduction + (f)
classification + (g) explanation. Swap out the parts until results degrade significantly.
● Hacking research papers
○ Value consistent reproducibility over often short-lived fads
○ Don’t retrain/finetune right away. Take a trained model and test it thoroughly on similar
datasets (e.g., lighting & viewing angle change, position and size relative to the image.)
○ After finding out the failure point, dig into why it fails (e.g., data bias, feature not being
invariant, classifier not identifying important features, or optimizing the wrong objective.)
29. Tackling classifier performance issues:
training/testing set being biased
● How many samples do we have for each class in training/validation/testing?
Is it reflective of the real world? Would it still be reflective of the real world after a
while, given our update rules?
● What happens when the number of samples per class is imbalanced?
● Should we model how frequently each class appears inside the classifier, or
should we train the classifier independently of class frequency and weigh the outputs later
as needed (likelihood, prior, and posterior probabilities)?
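One way to decouple class frequency from the classifier, sketched below with made-up numbers: train on balanced data so the scores approximate per-class likelihoods, then apply Bayes' rule with the deployment-time priors afterwards.

```python
# Sketch: the classifier is trained on balanced classes (uniform prior), so its
# scores approximate per-class likelihoods; real-world class frequencies are
# applied afterwards via Bayes' rule. All numbers are illustrative.

def reweight_posterior(scores, priors):
    # p(c|x) ∝ p(x|c) * p(c), renormalized over classes
    unnorm = [s * p for s, p in zip(scores, priors)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

scores = [0.6, 0.4]    # [defect, ok] from the balanced-data classifier
priors = [0.01, 0.99]  # real-world frequencies: only 1% of units are defective
posterior = reweight_posterior(scores, priors)
print(posterior)  # the "defect" probability collapses once the prior is applied
```

If the deployment distribution drifts, only `priors` needs updating; the classifier itself stays untouched.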
30. Tackling classifier performance issues:
features not being learned properly
● It is possible to train a reasonably good classifier with weak features.
● When do we know we’re hitting a glass ceiling due to bad features?
○ Feature visualization doesn’t tell us what is contributing to the output.
○ The activation map doesn’t tell us exactly which features are activated.
○ Compare with the confusion matrix: frequently confused SKUs shouldn’t have indistinguishable
feature activations.
31. Tackling classifier performance issues:
classifier overfit/underfit
● Classifier underfit, due to bad features or classifier, is easily identified.
● Is our classifier supposed to be stable or unstable (small changes in the training
data produce large changes in the trained classifier)?
● Does the performance of the trained classifier vary significantly during k-fold
cross validation or bagging (bootstrap aggregating)?
○ E.g., accuracy of 86%+-0.5% vs 86%+-3%.
○ If so, we are likely to have overfit the training set.
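A minimal sketch of the check above, with made-up fold accuracies: same mean, very different spread.

```python
# Two hypothetical 5-fold cross-validation results with the same mean accuracy.
# A tight spread (86% +- ~0.5%) suggests a stable classifier; a wide spread
# (86% +- ~3%) suggests we have likely overfit parts of the training set.
import statistics

def fold_summary(fold_accuracies):
    return statistics.mean(fold_accuracies), statistics.stdev(fold_accuracies)

stable   = [0.860, 0.865, 0.858, 0.862, 0.855]
unstable = [0.890, 0.830, 0.880, 0.820, 0.880]

for name, accs in [("stable", stable), ("unstable", unstable)]:
    m, s = fold_summary(accs)
    print(f"{name}: {m:.3f} +- {s:.3f}")
```

The same summary can be computed over bagged models; the decision rule is identical.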
34. Data vs model complexity vs product design
● Design carefully to avoid a mismatch of data, model & product complexity
○ End-to-end methods are cool and trendy, but be careful when they add a great number of
unnecessary free parameters.
○ Modern CNNs are remarkably efficient at learning from little data; that doesn’t mean we should
keep bundling things together and increasing the number of free parameters, e.g., with a typical range
between 20M (Inception V3) and 60M (ResNet-152, EfficientNet-B7) parameters.
● Design the AI product and process s.t. feedback data can be retrieved after
deployment
● Design the R&D process to speed up the data cycle
○ data → AI model → production → data
○ Google runs through this cycle in 2 weeks for its search engine; how fast are we iterating?
35. 1. Speeding up data collection and storage
● Use a multi-tiered setup to store data from different sources with vastly different characteristics,
instead of forcing a one-size-fits-all design and optimizing storage space/query speed prematurely.
● Leverage/reference the elegant designs of different managed services, e.g., Snowflake.
36. 2. Speeding up data cleanup
● Selecting an annotation tool
○ Complete set of functions for task at hand.
○ Annotation team/task management.
○ Integration with labeling service outsourcing.
● Annotation task design
○ Easy yes/no annotations that can be
completed in a few seconds usually yield
better-quality labels at lower overall cost.
○ Regardless of how well the labeling team is
trained, there will be mislabeled samples,
which the labeling task design needs to take
into account.
37. 3. Speeding up dataset preparation
● Build mechanisms to auto-verify annotations and auto-update datasets
○ There will always be errors in data annotations. The more automated mechanisms that we
have in place to pick out these errors, the faster we’d be able to update datasets.
○ A good data team not only builds models, but also builds a better understanding of the data.
● Datasets should be version-controlled
○ For the purpose of reproducibility and speeding up the data→model→production cycle.
○ To make it easier for annotators to revisit and revise annotations.
○ Git-LFS, DVC, and Pachyderm all provide data-versioning capabilities. Pachyderm, like MLflow,
emphasizes automating the cycle, while DVC is simpler for just dataset versioning.
● Don’t obsess over having a unified and fixed label index
○ The annotation process itself might need many layers of label index transcription.
○ During the model training process, we often discover the need to further split or combine classes.
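The core idea behind dataset version control can be sketched in a few lines; this illustrates what tools like DVC do, not their actual implementation. Hash every file into a manifest, then hash the manifest into a version string that can be logged next to each training run.

```python
# Sketch: detect silent annotation changes by hashing a dataset directory.
# Any revised label file changes the version string, so every trained model
# can record exactly which dataset version it saw.
import hashlib
import json
import os

def dataset_manifest(root):
    manifest = {}
    for dirpath, _, files in os.walk(root):
        for name in sorted(files):
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            manifest[os.path.relpath(path, root)] = digest
    return manifest

def dataset_version(root):
    # A single hash over the sorted manifest identifies the dataset version.
    blob = json.dumps(dataset_manifest(root), sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()[:12]
```

Logging `dataset_version(...)` with each run makes the data→model→production cycle reproducible; a real tool additionally stores and restores the file contents themselves.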
38. 4. Speeding up model training
● Must have proper resource management infrastructure
○ 1 GPU per person * number of persons is the wrong setup. Virtualized development
environment + training task scheduling, e.g., Google Colab + virtual GPU
○ Cloud comes with additional benefits of scale-up, e.g., GCP Burst Training; private/hybrid
cloud should consider similar offering.
● Sometimes, the code/SW isn’t as parallelizable as one thinks
○ Is your GPU really busy? Really?
○ Is PyTorch/Keras Data Generator doing you a favor?
○ Preparing data properly for training tasks
○ Commonly accepted data augmentation methods might be neither necessary nor helpful
to your model performance
○ Additional configuration details.
39. 4.1 Is your GPU really busy? Really?
● Issue
○ nvidia-smi only gives you GPU utilization at a single instance
○ nvidia-smi -l 1 repeats every 1 second
○ top doesn’t give you GPU readings
● Solution: use nvtop
○ Simply sudo apt install nvtop
○ If that fails, enable the contrib and non-free
repositories, then try again.
○ If all else fails, build it from source:
https://github.com/Syllo/nvtop
● Solution: TF2.1+Keras checkpoints
○ However, after CUDA 10.2,
non-admin programs have limited
GPU access.
40. 4.2 Is PyTorch/Keras generator doing you a favor?
● keras.preprocessing.image.ImageDataGenerator is handy but slow
○ It takes input from a directory or a CSV, does data augmentation (rotate, shear, zoom, flip),
and serializes the images for TensorFlow training.
○ GPU utilization <30% on anything more powerful than a K80, even with an SSD.
● The PyTorch data generator seems to keep the GPU busy, but it trains much slower
than comparable TensorFlow code with serialized TFRecords.
● Serialize data beforehand whenever possible
○ The single-process (default) TFRecord writer takes 4 minutes to serialize 5000 224x224 images.
○ A hand-crafted multi-process TFRecord writer can reduce that to 39 seconds on an 8-core HDD VM
(throttled by the sustained-IOPS limit on GCP VMs), 14 seconds on an 8-core SSD VM, and 7.8
seconds on a 16-core SSD VM (throttled by the sustained-throughput limit on GCP VMs).
○ Use the TFRecord writer in the TFDS utils (https://tensorflow.org/datasets/api-docs/python/tfds) for
comparable speed but easier maintenance. Use image_label_folder to create custom datasets.
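A minimal sketch of the multi-process sharding idea: split the record list into shards and serialize one shard per process. To keep the sketch dependency-free, pickle stands in for `tf.io.TFRecordWriter`; the sharding and pooling logic is the part that carries over.

```python
# Multi-process serialization sketch: round-robin the records into shards and
# write each shard in its own process. In a real pipeline write_shard would
# open a tf.io.TFRecordWriter and emit one serialized Example per record;
# pickle is only a stand-in to avoid the TensorFlow dependency.
import concurrent.futures
import os
import pickle

def write_shard(args):
    shard_path, records = args
    with open(shard_path, "wb") as f:
        pickle.dump(records, f)
    return shard_path

def parallel_serialize(records, out_dir, num_shards=8):
    shards = [(os.path.join(out_dir, f"data-{i:05d}.rec"), records[i::num_shards])
              for i in range(num_shards)]
    with concurrent.futures.ProcessPoolExecutor(max_workers=num_shards) as pool:
        return list(pool.map(write_shard, shards))
```

Because each shard is an independent file, the write is embarrassingly parallel until the disk's IOPS or throughput limit is hit, which matches the HDD vs SSD numbers above.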
42. 4.4 To augment or not to augment?
● PyTorch/Keras have relatively complete sets of data augmentation routines.
● If you need more data augmentation than TensorFlow’s built-in routines:
○ TensorFlow Addons (https://github.com/tensorflow/addons)
○ Beware that not all routines implemented are GPU optimized; avoid a data-processing
flow that moves back and forth between CPU and GPU due to the PCIe bottleneck.
● However, rethink whether all the augmentations are really necessary.
43. 4.5 Aim for this
● It’s easy to get >90%
utilization on a single V100; this drops to
70%-80% on 8×V100.
● The same CNN on PyTorch
gets 100% all the time,
yet it trains slower.
● Due to the PCIe bus
bottleneck, multi-GPU
parallelization suffers
with >8 GPUs.
44. 4.6 Additional devil in the detail
● Not all GPU FLOPS are born equal; compute capability has a large influence
on training efficiency.
● Even with the aforementioned optimizations, we’d be lucky to hit 50% MXU
utilization on 8×TPUv2; that setup still easily beats 8×V100 at >90% utilization.
● Increasing the mini-batch size seems to reduce time/epoch, yet it comes with the
huge cost of requiring more epochs to converge, at a subpar local optimum.
● Some deep neural networks might train fast but are extremely bad at transfer
learning (e.g., EfficientNet, RNN, LSTM).
● Additional details, including training vs testing image scale (the FixResNet and
FixEfficientNet series from FAIR) and normalization (Big Transfer), have a
far larger impact than surface-level tweaks, e.g., augmentation.
45. 5. Speeding up model verification
● Biased sample and biased model
○ Our datasets are always somehow biased from the true underlying population.
○ AI model that performs well on a dataset doesn’t necessarily perform well in production.
● How to reduce bias?
○ Don’t trust models that miraculously perform better on one training run. Train multiple models
with the same architecture and hyperparameters and record their testing-set performance
(mean and standard deviation). Only models with statistically significant improvements (e.g.,
85%+-1% vs 80%+-3%) should be considered.
○ Sampling with replacement works well with the aforementioned practice.
○ The AI team should get used to measuring performance against a “moving target”. When the
performance drops, it’s a good indicator of model bias or population shift.
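A minimal sketch of this practice with made-up run accuracies; the "k sigma" rule below is a crude heuristic standing in for a formal hypothesis test.

```python
# Decide whether model B's improvement over model A is meaningful by training
# several runs of each and comparing the gap in means against the spread,
# rather than trusting one lucky run. Accuracies are illustrative.
import statistics

def significantly_better(accs_a, accs_b, k=2.0):
    # Crude rule of thumb: B beats A only if the gap between means exceeds
    # k times the larger standard deviation (not a formal t-test).
    mean_a, mean_b = statistics.mean(accs_a), statistics.mean(accs_b)
    spread = max(statistics.stdev(accs_a), statistics.stdev(accs_b))
    return (mean_b - mean_a) > k * spread

model_a = [0.79, 0.83, 0.78, 0.82, 0.78]   # ~80% +- 3%
model_b = [0.84, 0.86, 0.85, 0.86, 0.84]   # ~85% +- 1%
print(significantly_better(model_a, model_b))
```

The same comparison works when the runs come from bootstrap samples (sampling with replacement) instead of fresh training runs.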
46. 6. Speeding up model deployment
● Being evaluated on a moving target
○ ML team should get used to ML models being evaluated on a moving target instead of a fixed
val/test set.
○ The “moving target” should translate to actual business value instead of a vague score, e.g., a
cost of $30k over 500k units, instead of 99% accuracy in 2020/12.
● Model deployment
○ Should have version control and rollback mechanisms.
○ New model should be put into pre-production/staging and evaluated side-by-side for weeks;
only new models with statistically significant improvements should go into production.
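The side-by-side (shadow) evaluation described above can be sketched as follows; the function names, the win margin, and the toy models are all hypothetical.

```python
# Shadow-deployment sketch: the candidate model runs in the background on the
# same inputs as the production model; we log only who was right, and promote
# the candidate only after a decisive margin of verified wins.

def shadow_eval(inputs, prod_model, candidate_model, ground_truth):
    wins = losses = 0
    for x in inputs:
        y = ground_truth(x)
        prod_ok = prod_model(x) == y
        cand_ok = candidate_model(x) == y
        if cand_ok and not prod_ok:
            wins += 1
        elif prod_ok and not cand_ok:
            losses += 1
    return wins, losses

def should_promote(wins, losses, min_margin=20):
    # Require a clear margin before rollout; rollback stays trivial otherwise.
    return wins - losses >= min_margin

# Toy demo: production model never flags; the candidate flags every third
# unit, which here matches the ground truth exactly.
wins, losses = shadow_eval(range(100),
                           prod_model=lambda x: False,
                           candidate_model=lambda x: x % 3 == 0,
                           ground_truth=lambda x: x % 3 == 0)
print(wins, losses, should_promote(wins, losses))  # 34 0 True
```

In production, `ground_truth` would come from the delayed verification step (e.g., manual reinspection), and the margin would be tied to a business KPI rather than raw win counts.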
48. Running into roadblocks while expanding AI in I-4.0?
● Bad politics?
○ The conflict of interest between IT and OT never ends; choose data capturing and model
deployment solutions that are acceptable middle-ground.
● Bad design?
○ Still too many moving parts and too many total combinations; simplify the
combinations and develop standardized playbooks.
● Bad practice?
○ Still too many manual processes in “data collection → data cleanup → model training →
model deployment → data collection”
● Bad AI?
○ Oftentimes, it comes back down to the AI model being designed without the consideration of
product and business impact in mind.
49. Removing roadblocks towards expanding AI in I-4.0
● Balancing between nuts and bolts vs “black boxes”
○ We want to “open up” OT modules to appease IT; yet having too many
combinations results in unlimited SI requirements/expenses.
○ How much SI work is too much?
● Get people who work across the whole organization
○ Form teams of evangelists and solution architects to assist and customize
for different scenarios. Benchmark: Intel (<2%), Salesforce (10%)
○ Combine components into modules, and form “playbooks” for scenarios.
● Sometimes, a simple change of org structure does wonders
○ E.g., Micron sending data scientists into IT and different orgs.
50. When are we halfway there?
● Gauge the percentage of value created from AI and the “data cycle”
● If we are not measuring, we won’t know what we’re missing
● These discussions focused only on Industry 4.0’s application to factory
production; using AI for design and business decisions shall be covered in
future topics.