Materials Science in the Era of Knowledge Discovery and Artificial Intelligence
1. Materials Sciences in the Era of Knowledge
Discovery and Artificial Intelligence
Osvaldo N. Oliveira Jr
chu@ifsc.usp.br
University of São Paulo, Brazil
2. Language is so important that
we should teach it to our
children and to our machines
Osvaldo N. Oliveira Jr - 2021
3. Outline
• Knowledge Discovery
• Sensors and Biosensors
• Machine Learning and Natural Language Processing
• The Fifth Paradigm
5. Materials research of the future
Aykol M. et al., The Materials Research Platform: Defining the Requirements from User Stories, Matter, 1, 1433-1438 (2019).
Adaptive systems: active learning and beyond;
Automation of experiments;
Automation of simulations;
Collaboration;
Data ingestion and sharing;
Integration;
Knowledge discovery;
Machine learning for experiments;
Machine learning for simulations;
Multi-fidelity and uncertainty quantification;
Reproducibility and provenance;
Scale bridging;
Simulation tools;
Software infrastructure;
Text mining and natural language processing;
Visualization.
6. Data processing pipeline
Data collection → Visualization → Clustering (unsupervised ML) → Classification (supervised ML)
Data collection: planned experiments for balanced classes
Visualization: multiple methods, user interaction, attribute selection
Clustering: unsupervised machine learning, classes unknown a priori
Classification: supervised machine learning, classes are known. Care must be taken to avoid overfitting on small data sets (as in sensor data)
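The overfitting risk on small data sets can be illustrated with a minimal sketch. It compares the accuracy of a 1-nearest-neighbour classifier tested on its own training data with the accuracy from leave-one-out cross-validation. The classifier choice, the function names and the six two-feature readings are assumptions made for illustration; they are not data or methods from the works cited in this talk.

```python
import math

def nearest_neighbor_predict(train_X, train_y, x):
    """Return the label of the training point closest to x (1-NN)."""
    best = min(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    return train_y[best]

def resubstitution_accuracy(X, y):
    """Accuracy when the model is tested on the data it was trained on."""
    hits = sum(nearest_neighbor_predict(X, y, x) == label for x, label in zip(X, y))
    return hits / len(X)

def leave_one_out_accuracy(X, y):
    """Accuracy when each sample is predicted from all the other samples."""
    hits = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += nearest_neighbor_predict(train_X, train_y, X[i]) == y[i]
    return hits / len(X)

# Hypothetical two-feature sensor readings (illustrative values only).
X = [(1.0, 1.1), (1.2, 0.9), (0.9, 1.0), (2.0, 2.1), (2.2, 1.9), (1.5, 1.5)]
y = [0, 0, 0, 1, 1, 1]

print("resubstitution:", resubstitution_accuracy(X, y))  # every point is its own neighbour
print("leave-one-out:", leave_one_out_accuracy(X, y))    # lower, since the last sample is missed
```

The gap between the two numbers is the optimism that a held-out estimate removes. With only a handful of samples per class, a single borderline sample changes the estimate noticeably.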
7. Popolin et al., Bull. Chem. Soc. Jpn., 2021
Machine Learning Used to Create a Multidimensional Calibration
Space for Sensing and Biosensing Data
Multidimensional calibration space
• Calibration curve replaced by multidimensional space
• Equation replaced by rules from Decision Trees or Random Forests
• Number of dimensions is the number of features
• Minimum number of rules is number of classes
• Rule coverage – 1 if all instances are classified correctly
• Feature importance – percentage of samples explained
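A rule in the multidimensional calibration space can be sketched as an interval test on one feature. The sketch assumes that rule coverage is the fraction of instances of the rule's class that satisfy the rule, which matches "1 if all instances are classified correctly" but may differ in detail from the definition in the paper. The class name, the function name and the sample values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class IntervalRule:
    feature: str   # feature name, e.g. capacitance at one frequency
    low: float     # inclusive lower bound
    high: float    # exclusive upper bound
    label: int     # class assigned when the condition holds

    def matches(self, sample):
        """True if the sample's feature value lies in [low, high)."""
        return self.low <= sample[self.feature] < self.high

def rule_coverage(rule, samples, labels):
    """Fraction of the instances of rule.label that satisfy the rule (assumed definition)."""
    members = [s for s, l in zip(samples, labels) if l == rule.label]
    if not members:
        return 0.0
    return sum(rule.matches(s) for s in members) / len(members)

# Hypothetical readings for a single feature named "F1000" (illustrative values only).
samples = [{"F1000": 5.2}, {"F1000": 5.8}, {"F1000": 6.4}, {"F1000": 7.1}]
labels = [0, 0, 1, 1]

r1 = IntervalRule("F1000", 5.0, 6.0, label=0)
r2 = IntervalRule("F1000", 6.0, 7.0, label=1)

print(rule_coverage(r1, samples, labels))  # 1.0: both class-0 samples satisfy r1
print(rule_coverage(r2, samples, labels))  # 0.5: one class-1 sample falls outside r2
```

In a decision tree each root-to-leaf path yields one such rule, possibly with conditions on several features, which is why the number of features sets the number of dimensions of the space.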
8. Rule r1: Coverage 1.0 (supporting all instances)
IF 5.0 ≤ C (F) @ F1000 (Hz) < 6.0
THEN Class 0.0
1D MCS. Distinction: 4 samples at 10 Hz (higher feature importance); 2 samples at 1 MHz.
2D MCS. 6 rules (minimum); full coverage; same feature importance.
Multidimensional calibration space
9. Multidimensional calibration space (Popolin et al., Bull. Chem. Soc. Jpn., 2021)
Detection of phytic acid with a bad sensor. Capacitance at three frequencies was used to generate a 3D MCS.
Seven rules used to classify samples with 5 concentrations. Rule coverage was
usually lower than one, and the highest feature importance applied to F100
10. Multidimensional calibration space (Popolin et al., Bull. Chem. Soc. Jpn., 2021)
Rules from Decision Trees
11. Milk samples: S. aureus concentrations from 0 to 10^7 CFU/mL, discretized as classes. The MCS has 5 dimensions (F1000, F21, F46, F10000 and F464158). Most important feature: F1000, with an importance value of 0.33.
Soares et al. Detection of Staphylococcus aureus in milk samples using impedance spectroscopy and data
processing with information visualization and machine learning (Sensors & Actuators Reports, 2022)
Immunosensor to detect bacteria in milk
12. • MCS
• Nested K-Fold
Riul Jr et al. Analyst, 135, 2010.
• Salmonella
• Lactose
• Mucin
• NaOH
• H2O
Detection of S. aureus with immunosensors
A.C. Soares et al., Analyst, 2020
Mastitis Diagnosis
Diagnosis with an electronic tongue
6-dimensional MCS to detect bacteria in crude milk samples. Average accuracy: 94%.
A.C. Soares et al, Chem. Eng. J., 2022
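Nested k-fold cross-validation, listed on this slide, can be sketched as two loops: the inner loop selects a hyperparameter using training data only, and the outer loop estimates accuracy on folds never seen during selection. The k-nearest-neighbour classifier, the stratified splitter and the toy data below are assumptions chosen for illustration, not the setup used in the cited papers.

```python
import math
from collections import Counter

def knn_predict(train, x, k):
    """Majority label among the k training points nearest to x."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(train, test, k):
    """Fraction of test samples whose label is predicted correctly."""
    return sum(knn_predict(train, x, k) == y for x, y in test) / len(test)

def stratified_folds(data, n_folds):
    """Deal the samples of each class round-robin into n_folds parts; yield (train, test)."""
    parts = [[] for _ in range(n_folds)]
    for label in sorted({y for _, y in data}):
        members = [item for item in data if item[1] == label]
        for i, item in enumerate(members):
            parts[i % n_folds].append(item)
    for i, test in enumerate(parts):
        train = [item for j, part in enumerate(parts) if j != i for item in part]
        yield train, test

def nested_cv(data, k_values, outer=3, inner=2):
    """Outer loop estimates accuracy; inner loop picks k from the outer training set only."""
    scores = []
    for outer_train, outer_test in stratified_folds(data, outer):
        def inner_score(k):
            pairs = stratified_folds(outer_train, inner)
            return sum(accuracy(tr, va, k) for tr, va in pairs) / inner
        best_k = max(k_values, key=inner_score)
        scores.append(accuracy(outer_train, outer_test, best_k))
    return sum(scores) / len(scores)

# Hypothetical, well-separated two-class data (illustrative values only).
class0 = [((0.1 * i, 0.2 * i), 0) for i in range(6)]
class1 = [((5.0 + 0.1 * i, 5.0 - 0.2 * i), 1) for i in range(6)]
data = class0 + class1

print(nested_cv(data, k_values=[1, 3]))
```

Because the outer test folds play no part in choosing k, the averaged outer score is not inflated by the selection step, which matters most when only a few samples are available.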
13. Braz et al. Using machine learning and an electronic tongue for discriminating saliva
samples from cancer patients and healthy individuals (Talanta, 2022)
MCS: 26 dimensions (19 frequencies and 7 clinical features). Most important
features: the first 2 columns, frequency 215 Hz and "alcoholism_no".
E-tongue for cancer diagnosis
14. Genosensor to detect SARS-CoV-2
Gold electrodes coated with SAM
functionalized with EDC/NHS and a
layer of ssDNA sequences
Probe: cp DNA SARS-CoV-2: 5’-5AmMC6/-ATTTCGCTGATTTTGGGGTC-3’
Positive control: ssDNA SARS-CoV-2: 5’-TGATAATGGACCCCAAAATCAGCGAAATGCACCCCGCATTACGTTTGGTGGACCCTCAGATTCAACTGGCAGTAACCAGA-3’
Negative control: from TP53 gene: 5’-CCCATCCTCACCATCATCACACTGGAAGACTCCAGTGGTAATCTACTGGGACGGAACAGCTTTGAGGTGCGGTTTGTG-3’
Impedance spectroscopy (IS)
Electrochemical IS
Optical – LSPR
Image analysis
J.C. Soares et al, Materials Chemistry Frontiers, 2021
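The choice of controls for the genosensor can be checked with a short script. It computes the reverse complement of the probe sequence given on this slide and looks for it in the positive and negative control sequences. Exact string matching is a simplification of hybridization, used here only as a sketch. The sequences are copied from the slide; the variable and function names are illustrative.

```python
# Translation table mapping each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

PROBE = "ATTTCGCTGATTTTGGGGTC"
POSITIVE = (
    "TGATAATGGACCCCAAAATCAGCGAAATGC"
    "ACCCCGCATTACGTTTGGTGGACCCTCAGA"
    "TTCAACTGGCAGTAACCAGA"
)
NEGATIVE = (
    "CCCATCCTCACCATCATCACA"
    "CTGGAAGACTCCAGTGGTAATCTACTGGGA"
    "CGGAACAGCTTTGAGGTGCGGTTTGTG"
)

target_site = reverse_complement(PROBE)
print(target_site)                 # GACCCCAAAATCAGCGAAAT
print(POSITIVE.find(target_site))  # 8: the site starts at index 8 of the positive control
print(NEGATIVE.find(target_site))  # -1: no fully complementary site in the TP53 fragment
```

The 20-nucleotide probe is fully complementary to a stretch of the positive control and has no exact complementary site in the negative control, which is consistent with their roles on the slide.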
15. Image Analysis
(a) blank
(b) negative control
(c) HPV16
(d) PCA3
(e) 10^-18 mol L^-1
(f) 10^-16 mol L^-1
(g) 10^-14 mol L^-1
(h) 10^-12 mol L^-1
(i) 10^-10 mol L^-1
(j) 10^-8 mol L^-1
(k) 10^-6 mol L^-1
Scale bar: 50 µm.
Supervised machine learning
99.7% accuracy in binary classification with SVM
95.8% accuracy in multiclass classification with LDA
Soares et al, Materials Chemistry Frontiers, 2021
16. [Figure: images of functionalized AuNPs after exposure to 250 PFU and 6000 PFU; scale bars 100 nm and 200 nm.]
Functionalized AuNPs aggregate after exposure to 250 and 6000 PFU of inactivated SARS-CoV-2.
[Figure, panels (A) to (I): absorbance efficiency versus wavelength (500 to 700 nm) for configurations Conf_1 to Conf_5, their averages Avg_1 to Avg_3 and Gold_NP; maps with axes X (nm) and Z (nm).]
Computer simulations indicate that clustering of the
functionalized AuNPs is essential for detection
Colorimetric detection of SARS-CoV-2 virus using a smartphone app and a plasmonic biosensor
Materón et al, Unpublished
Detection in 5 min
17. [Figure, panels A and B: absorbance versus wavelength (400 to 800 nm); legend: AuNp, avg_1, avg_2, avg_3, avg_4, avg_8, avg_16, avg_32, avg_69; labels: 636, 526, 2981, 0.]
f-AuNPs with SARS-CoV-2 (0 to 2980 PFU mL^-1). Spectral
absorption efficiency clusters (FDTD simulations).
Inactivated SARS-CoV-2 and tests
with human saliva.
Materón et al, Unpublished
Distinction of SARS-CoV-2 at various concentrations.
No effects from interferents
18. Mechanochromic sensors
Elastic mechanochromic sensor
Color change is reversible
Color changes during stretching/releasing cycles
Works under sunlight and under water
20. Machine Learning prediction
The CV system predicts deformation based on color change
Castro et al, Expert Systems with Applications, 2022
21. Data analysis and diagnostics
Materials design and discovery
Knowledge discovery
Machine learning and materials
“machine learning and (chemistry or materials discovery)”
• Students trained to interact with AI experts, and identify opportunities. No need to write
computer programs, but they should understand the concepts, limitations, risks of misuse.
• Students could be trained to use the software packages
• Students trained to use the software packages and write programs implementing ML algorithms.
22. Starting point: clustering of atomic species and structures. ML model to group structures
according to the possible composition, crystal point group, and local distortions.
ML strategy to determine and predict magnetism (step
I) in a 2D compound and specific magnetic ordering
(step II).
23. Two glasses discovered with machine learning and genetic algorithms from a database of 45,032 compositions:
Refractive index for Glass 1 and Glass 2: 1.713(1) and 1.749(1), within predicted values (1.71(3) and 1.76(3)).
They met the design properties (refractive index above 1.7 and a glass transition temperature below 500 °C).
Designing optical glasses by machine learning coupled with a genetic algorithm
Daniel R. Cassar, Gisele G. Santos, Edgar D. Zanotto, Ceramics International 47 (2021)
[Figure: predicted refractive index and glass transition temperature.]
Materials Discovery - Glass
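The idea of coupling property predictors with a genetic algorithm can be sketched as follows. The two linear functions stand in for trained machine learning models and are entirely hypothetical, as are the three-component composition vector and all numerical settings. Only the design targets (refractive index above 1.7, glass transition temperature below 500 °C) come from the slide. This is not the implementation of Cassar et al.

```python
import random

# Hypothetical surrogates standing in for trained ML property models.
def predict_refractive_index(c):
    return 1.5 + 0.4 * c[0] + 0.1 * c[1]

def predict_tg(c):
    return 400.0 + 50.0 * c[0] + 200.0 * c[1]  # degrees Celsius

def normalize(c):
    """Rescale component fractions so that they sum to 1."""
    total = sum(c)
    return [v / total for v in c]

def fitness(c):
    """0.0 when both design targets are met; negative penalty otherwise."""
    index_shortfall = max(0.0, 1.7 - predict_refractive_index(c))
    tg_excess = max(0.0, predict_tg(c) - 500.0)
    return 0.0 - (100.0 * index_shortfall + tg_excess)

def evolve(generations=40, pop_size=30, seed=0):
    rng = random.Random(seed)  # fixed seed for a reproducible run
    pop = [normalize([rng.random() + 1e-9 for _ in range(3)]) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        parents = pop[: pop_size // 2]  # truncation selection; the best are kept unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [(x + y) / 2 for x, y in zip(a, b)]                 # blend crossover
            child = [max(1e-9, v + rng.gauss(0, 0.05)) for v in child]  # Gaussian mutation
            children.append(normalize(child))
        pop = parents + children
    return max(pop, key=fitness), history

best, history = evolve()
print(best, predict_refractive_index(best), predict_tg(best))
```

Because the better half of each generation is carried over unchanged, the best fitness never decreases from one generation to the next. In a real study the candidates proposed by the search would still have to be synthesized and measured, as was done for the two glasses reported on this slide.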
25. Knowledge Discovery in practice
10,975 scientific articles on carbon functional materials (CFM)
Identify:
• Precursors (sludge, agricultural waste)
• Synthesis and post-synthesis methods
• Synthesis conditions
Find correlations:
• Most used precursors for agriculture, fuel, adsorbents
• Most efficient precursors for CFM production depending on the synthesis method
• Optimized synthesis conditions depending on the precursors and method
• CFM properties and possible applications
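Finding correlations between precursors and applications in a corpus can be sketched as a co-occurrence count. The term lists and the three abstracts below are invented for illustration and are not taken from the 10,975 articles. A real text-mining pipeline would also need entity recognition, synonym handling and normalization.

```python
from collections import Counter
from itertools import product

# Hypothetical term lists (illustrative only).
PRECURSORS = ["sludge", "rice husk", "sawdust"]
APPLICATIONS = ["adsorbent", "fuel", "agriculture"]

def cooccurrence(abstracts, precursors, applications):
    """Count how many abstracts mention each (precursor, application) pair."""
    counts = Counter()
    for text in abstracts:
        lowered = text.lower()
        found_precursors = [p for p in precursors if p in lowered]
        found_applications = [a for a in applications if a in lowered]
        counts.update(product(found_precursors, found_applications))
    return counts

# Invented abstracts, used only to exercise the function.
abstracts = [
    "Biochar from sewage sludge was tested as an adsorbent for dyes.",
    "Hydrothermal carbonization of rice husk yields a solid fuel.",
    "Sludge-derived hydrochar as adsorbent and soil amendment in agriculture.",
]

counts = cooccurrence(abstracts, PRECURSORS, APPLICATIONS)
for pair, n in counts.most_common():
    print(pair, n)
```

Ranking the pairs by count gives a first answer to questions such as which precursors are most used for a given application. The same table, extended with synthesis methods and conditions, supports the other correlations listed on the slide.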
26. Holy Grail: Diagnostics in the future
[Diagram: INPUT (sensors, images, patient history, history of patients) → Repository → Preprocessing (cleaning, selection, transformation, discretization, binarization, ...) → Data Mining (clustering, classification, regression, ...) → Visualization → Knowledge (diagnosis, reports)]
Oliveira et al., Chem. Lett. Japan, 2014
27. The Fifth Paradigm
• 1st Empirical, descriptive
• 2nd Theory and experiment
• 3rd Theory, experiment, computer simulation
• 4th All of the above + Big Data
• 5th Machine-generated knowledge
29. Some Requirements
• Text analytics – large text databases
• Lots of data: experimental, theoretical (DFT, etc.) and simulation (MD, etc.)
• Internet of Things
• Machine Learning Methods (Deep Learning, etc)
Computer-assisted diagnosis as an example
30. Machine learning will change the landscape of science and
technology in the 21st century.
In a few decades, most intellectual tasks will be better
performed by machines.
Is society being prepared for that?
The machines of the future
Final Recommendation/Provocation
• How would an intelligent machine solve the scientific problem you
are addressing?
• Are you sure the problem could not be obviated by other means?
31. ACS Applied Materials & Interfaces
ACS Applied Nano Materials
ACS Applied Polymer Materials
ACS Applied Energy Materials
ACS Applied Bio Materials
ACS Applied Electronic Materials
ACS Applied Optical Materials
ACS Applied Engineering Materials
Available for free
32. Acknowledgments
Roberto M. Faria, Débora Gonçalves, Paulo B. Miranda, Gregório C. Faria, Débora T. Balogh, Rafael M. Maki, Robson R.
Silva, Maria Cristina F. Oliveira, Fernando V. Paulovich, José F. Rodrigues Jr., Tácito A. Neves, Alexandre Delbem, Valtencir
Zucolotto, Frank N. Crespilho, Andrey C. Soares, Flávio M. Shimizu, Juliana C. Soares, Nirav Joshi, Gustavo F. Nascimento,
Valquíria C. R. Barioto, Paulo A. R. Pereira, Nathália O. Gomes, Sérgio A.S. Machado, Cristiane M. Daikuzono, Giovana
Rosso, Deivy Wilson, Rafael O. Pedro, Olívia Carr, Gisela Ibañez-Redin, Beatriz Tirich, Elsa M. Materón, Anderson M.
Campos, Lorenzo Buscaglia, Eder Cavalheiro, Lucas Ribas, Leonardo Scabini, Odemir M. Bruno, Luciano F. Costa, Sandra M.
Aluísio, Graça Nunes, Thiago A. Pardo, Diego R. Amancio, Filipi N. Silva, Daniel C. Braz, Lucas C. Castro, Faustino Reyez-
Gómez, José Luiz Bott, Thiago S. Martins, André Ponce de Leon Carvalho, Emanuel Carrilho (USP)
Carlos J.L. Constantino, Priscila Aléssio, Sabrina A. Camacho (FCT-Unesp), Luciano Caseli (Unifesp-Diadema), Pedro Aoki
(Unesp-Assis), Marystela Ferreira, Fábio L. Leite, Carolina Bueno, Jéssica Ierich, Cléber Dantas (UFSCar – Sorocaba), Caio G.
Otoni, Ronaldo C. Faria (UFSCar), Marli L. Moraes (Unifesp-SJ Campos), José R. Siqueira Jr. (UFTM-Uberaba), Antonio Riul
Jr, Monara Kaelle, Pedro Vieira, Varlei Rodrigues (Unicamp), Luiz H. C. Mattoso, João M. Naime, Rejane Trombini, Ednaldo
J. Ferreira, Paulo S.P. Herrmann, Daniel S. Corrêa (Embrapa), Hernane S. Barud (Uniara), Rafael R. Domeneguetti, Sidney J.
L. Ribeiro (Unesp, Araraquara), Ângelo L. Gobbi, Carlos Costa, Maria Helena Piazzetta (LNNano), Matias Melendez, Ana
Carolina Carvalho, Alexandre C. Santos, Eliney F. Faria, Lídia Rebolho Arantes, André L. Carvalho, Rui M. Reis (HCB),
Ricardo Azevedo (UnB)
Martin Taylor (Bangor, UK), Ricardo F. Aroca (Windsor, Canada), Maria Bardosova (Cork, Ireland), Dermot Diamond, Larisa
Florea (Dublin, Ireland), Alexandre Brolo (Victoria, Canada), Ana Barros (Aveiro, Portugal), Maria Raposo, Paulo A. Ribeiro,
Elvira Fortunato, Rodrigo Martins (Lisbon, Portugal)