Industry 4.0, aka the "Fourth Industrial Revolution," refers to the computerization of manufacturing. One important aspect of Industry 4.0 is the ability to monitor the health and reliability of a physical manufacturing plant using low-cost IoT sensors. For example, machine learning models can be trained to predict the physical degradation of a manufacturing system as a function of acoustic measurements obtained from strategically placed microphones; however, the same acoustic measurements can be used to reverse engineer proprietary information about the manufacturing process and/or precisely what is being manufactured at the time of recording. Thus, improved reliability and fault tolerance are achieved at the cost of what appears to be an unprecedented new class of security vulnerabilities related to the acoustic side channel.
As a case study, we report a novel acoustic side channel attack against a commercial DNA synthesizer, a commonly used instrument in fields such as synthetic biology. Using a smartphone-quality microphone placed on or in the near vicinity of a DNA synthesizer, we were able to determine with 88.07% accuracy the sequence of DNA being produced; using a database of biologically relevant known sequences, we increased the accuracy of our model to 100%. An academic or industrial research project may use the synthetic DNA to engineer an organism with desired traits or functions; however, while the organism is still under development, prior to publication, patent, and/or copyright, the research remains vulnerable to academic intellectual property theft and/or industrial espionage. On the other hand, this attack could also be used for benevolent purposes, for example, to determine whether a suspected criminal or terrorist is engineering a harmful pathogen. Thus, it is essential to recognize both the benefits and risks inherent to the cyber-physical systems that will inevitably control Industry 4.0 manufacturing processes and to take steps to mitigate them whenever possible.
Acoustic Time Series in Industry 4.0: Improved Reliability and Cyber-Security Vulnerabilities
1. Acoustic Time Series in Industry 4.0
Improved Reliability and Cyber-Security Vulnerabilities
Philip Brisk
Professor
Department of Computer Science and Engineering
University of California, Riverside
3. Industry 4.0 IoT Capabilities
• Optimize industrial throughput and efficiency
• Increase product quality
• Increase manufacturing throughput
• Reduce manufacturing cost
• Reduce manufacturing variability
• Monitor the reliability/degradation of industrial equipment
• Accurately predict failures before they occur
4. Example: Anheuser-Busch InBev
• Ft. Collins, CO Brewery: ultrasonic wireless sensors can predict when
machines need maintenance
• Variations in ultrasonic acoustic signals observed days in advance
• Can accurately predict failure hours in advance
https://www.wsj.com/articles/beer-maker-uses-machine-learning-to-keep-beverages-flowing-11548239401
• Newark, NJ Brewery: optimized filtration process
• Increase length of each filter run by 40-50%
• Increase barrelage per filter run by 60%
• Optimize beer taste
https://cloud.google.com/customers/abinbev-pluto7/
5. IoT Cyber-Security in Industry 4.0
• Challenge: The same data analytics techniques that can assess
physical equipment can also reverse engineer its operation
• Solution: Secure access to physical equipment
• Ensure that only trusted parties may access the sensor
• Solution: Secure access to sensory data
• Ensure that only trusted parties may analyze sensory data at all stages of
analytics processing
9. I Bet You Didn’t See This Coming!!!
• I can recover with 100% accuracy what
DNA is being synthesized using acoustic
measurements and domain knowledge
exclusively.
Sina Faezi, Sujit Rokka Chhetri, Arnav Vaibhav Malawade, John
Charles Chaput, William H. Grover, Philip Brisk, Mohammad
Abdullah Al Faruque: Oligo-Snoop: A Non-Invasive Side Channel
Attack Against DNA Synthesis Machines. NDSS 2019
10. There is a Market for Synthetic DNA!
Drug Discovery
Crop Optimization
Medical Treatment
Archival Data Storage
$38.7 billion by 2020 [1]
[1] R. Singh (2014). Synthetic biology market by products (DNA synthesis, oligonucleotide synthesis, synthetic DNA, synthetic genes, synthetic cells, XNA) and technology (genome engineering, microfluidics technologies, DNA synthesis and sequencing technologies) - global opportunity analysis and industry forecast, 2013-2020
11. Intellectual Property Concerns
• The relevant intellectual property is rarely the synthetic DNA itself
• It is often an organism derived from the DNA
• Knowing the DNA sequence might allow an attacker to infer a
valuable property about your billion-dollar organism
13. From Oligos to DNA
• An AB 3400 can synthesize two
complementary oligo strands, not
the actual DNA
• Someone else has to “combine” the
complementary oligos to form the
double-helix structure of DNA
• Synthetic oligo length is 200-300
bases, which is much shorter than
naturally occurring DNA
https://en.wikipedia.org/wiki/Directionality_(molecular_biology)
14. Attack Model
• Adversary intent
• Outcome
• Target system and known vulnerabilities
• Attack medium
• Attacker capabilities
• Attacker resources
• Cost
15. Attacker Capabilities and Resources
• Has domain knowledge of DNA synthesis process
• Has access to the AB 3400 user manual
• Explains machine-specific procedures
• Has access to an AB 3400
• Profiling needed to build an accurate model
• Can place microphone close to the DNA synthesizer
• Wireless transmission: one-time-access
• Otherwise, a second physical visit is needed to retrieve the recording
17. AB 3400 Setup
• User guide
• Available online
• Findable via Google search
• Site preparation and safety guide
• Available online
• Findable via Google search
18. AB 3400 Acoustic Side-Channel
Each valve occupies a unique position in the AB 3400
• The surface area that causes reflections is unique for
each valve
• Reverberation time is unique for each valve
• Collected acoustic signal is unique for each valve
Acoustic sources
• Solenoid valves opening and closing
• Fluid flowing through pipes
• Cooling system fans
• Pressure regulators
20. Attack Model Design (Physics)
• Principle-based equation (ideal, but unrealistic): $A = f(S)$
• Inverse estimates the sequence (ideal, but unrealistic): $S = f^{-1}(A)$
$S$: System state
$A$: Acoustic side channel
22. Key Steps
Preprocessing: eliminate background noise
Preliminary feature extraction: isolate acoustics for each valve
Signal segmentation: isolate acoustics for base delivery
Feature extraction: convert acoustic signal to a set of features
Nucleotide base classifier: train a classifier that correlates a set of
features to one of the four nucleotide bases
Post-processing: Apply domain-knowledge to correct misprediction errors
$A_i = [A_{i1}, A_{i2}, \ldots, A_{ik}] \rightarrow f_i = [f_{i1}, f_{i2}, \ldots, f_{il}], \quad l \ll k$
$S_i = f(f_i, \theta)$
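The last steps on this slide (reduce a raw segment of k samples to l features, then map features to a base using learned parameters) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the log band-energy features, the nearest-centroid classifier, and the synthetic "one tone per base" signals are stand-ins, not the features or classifier used in the actual attack.

```python
import numpy as np

def extract_features(segment, n_features=16):
    """Reduce k raw samples to l log band energies (l << k)."""
    spectrum = np.abs(np.fft.rfft(segment)) ** 2
    bands = np.array_split(spectrum, n_features)
    return np.log10(np.array([band.sum() for band in bands]) + 1e-12)

def fit_centroids(features, labels):
    """Learn theta: one mean feature vector per base label."""
    return {lab: features[labels == lab].mean(axis=0) for lab in np.unique(labels)}

def classify(feature_vec, theta):
    """S_i = f(f_i, theta): pick the base whose centroid is closest."""
    return min(theta, key=lambda lab: np.linalg.norm(feature_vec - theta[lab]))

def synth_segment(base, rng, fs=8000, k=2048):
    """Hypothetical stand-in signal: one tone per base plus white noise."""
    tone_hz = {"A": 600.0, "C": 1100.0, "G": 1600.0, "T": 2100.0}[str(base)]
    t = np.arange(k) / fs
    return np.sin(2 * np.pi * tone_hz * t) + 0.5 * rng.standard_normal(k)

def demo_accuracy(seed=0, n_train=20, n_test=10):
    """Train on synthetic segments and report held-out accuracy."""
    rng = np.random.default_rng(seed)
    train_labels = np.array([b for b in "ACGT" for _ in range(n_train)])
    train_feats = np.array([extract_features(synth_segment(b, rng)) for b in train_labels])
    theta = fit_centroids(train_feats, train_labels)
    test_labels = [b for b in "ACGT" for _ in range(n_test)]
    preds = [classify(extract_features(synth_segment(b, rng)), theta) for b in test_labels]
    return float(np.mean([str(p) == t for p, t in zip(preds, test_labels)]))

print("held-out accuracy on synthetic data:", demo_accuracy())
```

Real valve acoustics are far less separable than these synthetic tones, which is why the pipeline above also needs preprocessing, segmentation, and post-processing.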
23. Experimental Setup
AB 3400 DNA Synthesizer
Record signals through three
simultaneous channels at 48 kHz
with 24 bits per sample resolution
Zoom H6 audio recorder
• Similar to iPhone 4 microphone
Also use a contact microphone to
record acoustic signals with almost
no environmental noise
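As a rough sense of scale, the recording parameters above imply the following raw data volume. This is a back-of-the-envelope sketch assuming uncompressed PCM with no file headers; the run duration used is the 7 h 29 min 53 s per-run time reported elsewhere in this deck.

```python
CHANNELS = 3
SAMPLE_RATE_HZ = 48_000
BYTES_PER_SAMPLE = 24 // 8  # 24-bit samples

def recording_bytes(duration_s):
    """Raw PCM size in bytes for a recording of the given duration."""
    return CHANNELS * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * duration_s

run_s = 7 * 3600 + 29 * 60 + 53  # one synthesis run, in seconds
print(recording_bytes(1), "bytes per second")
print(round(recording_bytes(run_s) / 1e9, 2), "GB per run")
```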
24. Training and Evaluation
• Synthesized seven synthetic oligos
• Each 60 bases long, with 15 each of A, C, G, and T in varying orders
• Each run took 7 hours, 29 minutes, 53 seconds
• Label acquired signal into stages, exploiting information from the AB
3400 user manual
• Initialization (787 seconds)
• Repetitive cycle (463 seconds)
• Base delivery (5 seconds within the repetitive cycle)
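The stage durations above suggest a simple way to index into a recording. The sketch below assumes the durations are exact and constant, which a real recording would not guarantee; `delivery_offset_s` (where the 5-second base delivery falls inside a cycle) is a hypothetical parameter, since the slides do not state it.

```python
SAMPLE_RATE_HZ = 48_000
INIT_S = 787      # initialization stage
CYCLE_S = 463     # one repetitive cycle
DELIVERY_S = 5    # base delivery within a cycle

def cycle_start_s(cycle_index):
    """Nominal start time (seconds) of the given zero-based cycle."""
    return INIT_S + cycle_index * CYCLE_S

def delivery_window_samples(cycle_index, delivery_offset_s=0.0):
    """Nominal (start, end) sample indices of the base delivery segment.

    delivery_offset_s is an assumed offset from the start of the cycle.
    """
    start_s = cycle_start_s(cycle_index) + delivery_offset_s
    end_s = start_s + DELIVERY_S
    return int(round(start_s * SAMPLE_RATE_HZ)), int(round(end_s * SAMPLE_RATE_HZ))

print(cycle_start_s(0), cycle_start_s(1))
print(delivery_window_samples(0))
```

In practice these nominal windows would only be a starting point, to be refined against the signal itself.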
25. [Figure: signal magnitude vs. time (minutes), annotated with the machine preparation steps and the nucleotide base addition cycles; three panels show magnitude vs. time (seconds) for delivery of Base A, Base C, and Base G]
Base Delivery
• Identify peak locations in signal using continuous wavelet transforms
• Use the cycle script to identify sequences of distances that
correspond to base delivery stage
• For each stage, extract the segment that corresponds to the base
delivery valve operation
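Wavelet-based peak detection can be sketched in a few lines. The version below is a simplified stand-in, not the authors' implementation: it convolves the signal with Ricker ("Mexican hat") wavelets at a few assumed widths, sums the responses, and keeps local maxima above a relative threshold. SciPy provides a fuller ridge-line method as `scipy.signal.find_peaks_cwt`; the NumPy-only code here just shows the idea.

```python
import numpy as np

def ricker(points, a):
    """Ricker wavelet of width a, sampled at `points` positions (odd length)."""
    t = np.arange(points) - (points - 1) / 2.0
    amp = 2.0 / (np.sqrt(3.0 * a) * np.pi ** 0.25)
    return amp * (1.0 - (t / a) ** 2) * np.exp(-0.5 * (t / a) ** 2)

def cwt_response(signal, widths):
    """One row per width: signal convolved with the wavelet at that scale."""
    return np.array([
        np.convolve(signal, ricker(10 * int(w) + 1, w), mode="same")
        for w in widths
    ])

def find_peaks_multiscale(signal, widths=(4, 8, 16), rel_threshold=0.5):
    """Indices of local maxima of the summed multi-scale response."""
    score = cwt_response(signal, widths).sum(axis=0)
    thresh = rel_threshold * score.max()
    is_max = (score[1:-1] > score[:-2]) & (score[1:-1] >= score[2:])
    return [int(i) + 1 for i in np.flatnonzero(is_max) if score[i + 1] >= thresh]

# Synthetic example: three bumps at assumed positions, with slight noise.
rng = np.random.default_rng(0)
x = np.arange(2000)
demo = sum(np.exp(-0.5 * ((x - c) / 8.0) ** 2) for c in (300, 900, 1500))
demo = demo + 0.01 * rng.standard_normal(len(x))
print(find_peaks_multiscale(demo))
```

The distances between consecutive detected peaks are what would then be compared against the timing pattern expected from the cycle script.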
26. Feature Extraction
• 57,018 time domain, frequency domain, and wavelet-based features
• Initial Feature selection
• Calculate significance of each feature and select the 75 most relevant features
with the lowest dependency scores
• Improved Feature selection
• Calculate the frequency components with 200 mHz (0.2 Hz) resolution at frequencies
above 300 Hz that have local peaks in the frequency transform
• Calculate significance of each feature and select the 310 most relevant
features with the lowest dependency scores
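The frequency-peak features can be illustrated as follows. A 5-second segment sampled at 48 kHz contains 240,000 samples, so its discrete Fourier transform has a bin spacing of 48,000 / 240,000 = 0.2 Hz. The sketch below is an assumption-laden illustration (Hann window, "strongest local maxima" as the selection rule, synthetic tones), not the feature extractor used in the study.

```python
import numpy as np

def frequency_resolution_hz(fs, n_samples):
    """Bin spacing of an n_samples-point DFT at sample rate fs."""
    return fs / n_samples

def spectral_peaks(segment, fs, min_hz=300.0, top_n=5):
    """Frequencies (Hz) of the strongest local spectral maxima above min_hz."""
    n = len(segment)
    mag = np.abs(np.fft.rfft(segment * np.hanning(n)))
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    is_peak = np.zeros(len(mag), dtype=bool)
    is_peak[1:-1] = (mag[1:-1] > mag[:-2]) & (mag[1:-1] > mag[2:])
    is_peak &= freqs > min_hz
    idx = np.flatnonzero(is_peak)
    strongest = idx[np.argsort(mag[idx])[::-1][:top_n]]
    return sorted(float(f) for f in freqs[strongest])

# Synthetic 5-second segment: one tone below the 300 Hz cutoff, two above it.
fs = 48_000
t = np.arange(5 * fs) / fs
segment = (np.sin(2 * np.pi * 120.0 * t)
           + np.sin(2 * np.pi * 440.2 * t)
           + 0.5 * np.sin(2 * np.pi * 1000.4 * t))
print(frequency_resolution_hz(fs, len(segment)))
print(spectral_peaks(segment, fs, top_n=2))
```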
27. Classifier Training Without Post-Processing
• 200 samples used to train each classifier
• 80% of data set for training, 20% for validation
• 10-fold cross validation
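For readers unfamiliar with the evaluation terms, here is a minimal index-level sketch of an 80/20 split and of 10-fold cross-validation over 200 samples. The slide does not say how the two are combined, so they are shown independently; the shuffling and seeds are assumptions.

```python
import numpy as np

def train_val_split(n_samples, val_fraction=0.2, seed=0):
    """Shuffle sample indices and hold out a fraction for validation."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_val = int(round(n_samples * val_fraction))
    return order[n_val:], order[:n_val]

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, val_idx) pairs; each sample is validated exactly once."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]

train_idx, val_idx = train_val_split(200)
print(len(train_idx), len(val_idx))
for train, val in k_fold_indices(200, k=10):
    pass  # fit a classifier on `train`, score it on `val`, then average the scores
```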
28. Experimental Tradeoffs
• Classifier accuracy degrades when fewer than 70 samples are used for
training
• Classifier accuracy is susceptible to noise
• White noise at 56 dB or higher
• People in the room talking at 65 dB or higher
• Classifier accuracy depends on microphone distance from the DNA
synthesizer
• Degradation occurs at 0.7 meters and beyond
29. Biologically Relevant DNA Sequences
• Assumptions:
• DNA sequence is to be implanted in a living organism to create a protein
• Every 3 bases translate to a specific amino acid
• Four DNA sequences synthesized
• Conotoxin: Translates to a lethal protein; highly regulated
• Human Insulin: Originally extracted from pig pancreases; in 1979, DNA encoding
human insulin added to bacteria to produce actual human insulin. Led to
founding of Genentech (multibillion-dollar pharmaceutical company)
• Two peptides: isolated by in vitro selection to bind the protein target streptavidin
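The "every 3 bases" assumption refers to codons. A minimal sketch of translating a DNA coding sequence into amino acids with the standard genetic code follows; the input sequences are short made-up examples, not any of the four sequences synthesized in the study.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W"
               "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR"
               "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(dna):
    """Translate a DNA coding sequence; '*' marks a stop codon.

    Trailing bases that do not fill a complete codon are ignored.
    """
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))

print(translate("ATGGCCTAA"))
```

Because several codons map to the same amino acid, some single-base misclassifications leave the translated protein unchanged, which is part of why domain knowledge helps the attacker.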
30. Domain Knowledge
• Extra Assumption:
• The attacker desires the intended purpose of a reconstructed DNA sequence
• This is more valuable than the actual sequence itself
• BLAST software
• Stores DNA sequences and their functionality
• Can determine the most similar known DNA sequence, along with its
application, for a given amino acid sequence
• An attacker is satisfied with a positive BLAST match
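The post-processing idea (snap a noisy prediction to the closest known sequence) can be sketched with a toy database. This is a deliberately simplified stand-in: it uses Hamming distance over equal-length strings, whereas BLAST performs local alignment with gaps and statistical scoring. The database entries and names below are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))

def best_match(predicted, database):
    """Return (name, sequence) of the database entry closest to the prediction."""
    name = min(database, key=lambda n: hamming(predicted, database[n]))
    return name, database[name]

# Hypothetical toy database of known sequences.
KNOWN = {
    "entry_1": "ATGGCCATTGTA",
    "entry_2": "ATGCGTAAAGGC",
    "entry_3": "TTGACCGGTACA",
}

# A classifier output with two mispredicted bases relative to entry_1.
noisy = "ATGGCAATTGTT"
print(hamming(noisy, KNOWN["entry_1"]), best_match(noisy, KNOWN))
```

This illustrates how a per-base accuracy below 100% can still yield an exact recovery when the target is known to come from a finite set of biologically relevant sequences.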
32. Open Issues
• Different microphones varying in cost/capabilities/distance
• Different AB 3400s used for training/attack
• Variability in acoustic emissions over the lifetime of an AB 3400
• Other DNA synthesis machines
• Possible countermeasures (e.g., internal acoustic padding)
• Similar attacks on other biological laboratory instruments
33. Conclusion
• Industry 4.0 integrates IoT sensing + machine learning into all scales
of manufacturing to improve reliability and efficiency
• While it’s easy to think about heavy industry, don’t ignore
manufacturing in biotech, nanotech, etc.
• Sensing opens up new side channel attack vectors, often coupled with
social engineering
• Successful attacks can benefit significantly from domain knowledge
• Must secure access to physical equipment and sensory data
34. Collaborators
Sujit R. Chhetri (UC Irvine)
Sina Faezi (UC Irvine)
Arnav V. Malawade (UC Irvine)
Mohammad Al Faruque (UC Irvine)
John C. Chaput (UC Irvine)
William H. Grover (UC Riverside)
36. Acknowledgment
This material is based upon work supported by the
National Science Foundation under Grant No.
1740052. Any opinions, findings, and
conclusions or recommendations expressed in
this material are those of the author(s) and do
not necessarily reflect the views of the National
Science Foundation