Velimir V Vesselinov (monty) 2019
Unsupervised machine learning methods for feature extraction
New Mexico Big Data & Analytics Summit, http://nmbdas.com, Albuquerque, February 2019.
http://tensors.lanl.gov
LA-UR-19-21450
Vesselinov 2018 Novel machine learning methods for extraction of features cha...Velimir (monty) Vesselinov
Vesselinov, V.V., Novel Machine Learning Methods for Extraction of Features Characterizing Complex Datasets and Models, Recent Advances in Machine Learning and Computational Methods for Geoscience, Institute for Mathematics and its Applications, University of Minnesota, 2018.
LA-UR-18-30987
Novel Machine Learning Methods for Extraction of Features Characterizing Data...Velimir (monty) Vesselinov
Vesselinov, V.V., Novel Machine Learning Methods for Extraction of Features Characterizing Datasets and Models, AGU Fall meeting, Washington D.C., 2018.
Novel Machine Learning Methods for Extraction of Features Characterizing Comp...Velimir (monty) Vesselinov
1) Unsupervised machine learning methods like non-negative matrix factorization (NMF) and non-negative tensor factorization (NTF) are used to extract hidden features from datasets without prior information or training.
2) NMF/NTF decompose datasets into core tensors and factor matrices to identify dominant patterns and compress large datasets for analysis.
3) The document provides an example of using NTF to analyze simulations of reactive mixing, extracting the main time/space features representing physical processes from over 200GB of model output data.
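The factorization idea in points 1) and 2) can be illustrated for the matrix case. The deck's own NMF/NTF solvers (e.g. the codes at tensors.lanl.gov) are not reproduced here; the following is a minimal numpy sketch of NMF via Lee-Seung multiplicative updates, with the function name, synthetic data, and iteration count chosen purely for illustration.

```python
import numpy as np

def nmf(X, k, n_iter=500, eps=1e-9, seed=0):
    """Factor a nonnegative matrix X (m x n) as X ~ W @ H with W (m x k),
    H (k x n), using Lee-Seung multiplicative updates (a sketch, not the
    deck's solver)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)   # update hidden features
        W *= (X @ H.T) / (W @ H @ H.T + eps)   # update mixing weights
    return W, H

# Synthetic data mixing two hidden nonnegative sources
rng = np.random.default_rng(1)
X = rng.random((50, 2)) @ rng.random((2, 30))

W, H = nmf(X, k=2)
rel_err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
print(round(rel_err, 4))  # small relative reconstruction error
```

Because the updates only multiply nonnegative quantities, W and H stay nonnegative throughout, which is what makes the extracted features interpretable as additive parts.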
This document discusses pattern recognition. It defines a pattern as a set of measurements describing a physical object and a pattern class as a set of patterns sharing common attributes. Pattern recognition involves relating perceived patterns to previously perceived patterns to classify them. The goals are to put patterns into categories and learn to distinguish patterns of interest. Examples of pattern recognition applications include optical character recognition, biometrics, medical diagnosis, and military target recognition. Common approaches to pattern recognition are statistical, neural networks, and structural. The process involves data acquisition, pre-processing, feature extraction, classification, and post-processing. An example of classifying fish into salmon and sea bass is provided.
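The acquisition, pre-processing, feature extraction, and classification stages above can be sketched with the same salmon/sea-bass example; every number, feature, and the decision threshold below are invented for illustration.

```python
# Toy sketch of the pattern-recognition pipeline:
# acquisition -> pre-processing -> feature extraction -> classification.

def preprocess(image):
    """Pretend segmentation step: isolate the fish pixels (here, a no-op)."""
    return image

def extract_features(image):
    """Reduce the raw input to a small feature vector: (length, lightness)."""
    pixels = [p for row in image for p in row]
    length = len(image[0])                  # crude proxy for fish length
    lightness = sum(pixels) / len(pixels)   # average pixel brightness
    return length, lightness

def classify(features, threshold=0.5):
    """Single-feature decision rule on lightness (an assumed threshold)."""
    _, lightness = features
    return "salmon" if lightness > threshold else "sea bass"

image = [[0.9, 0.8], [0.7, 0.9]]            # a tiny "acquired" image
label = classify(extract_features(preprocess(image)))
print(label)  # -> salmon (average lightness 0.825 > 0.5)
```

The point of the pipeline is that the classifier never sees raw pixels, only the extracted features, so feature quality dominates classification accuracy.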
Fundamentals of Machine Learning and Deep Learning ParrotAI
An introduction to machine learning and deep learning for beginners. Learn the applications of machine learning and deep learning and how they can solve different problems.
This course provides an introduction to machine learning techniques and methods. It covers machine learning paradigms such as supervised learning techniques including regression and classification algorithms, unsupervised learning techniques including clustering, and reinforcement learning. Students will learn how to apply machine learning algorithms to problems using programming tools like Matlab and Python. References listed provide additional resources for further learning on topics like neural networks, decision trees, naive Bayes classifiers, and more.
Pattern recognition involves the identification of recurring trends or structures within a given dataset, enabling us to recognize similarities and make predictions. These patterns provide insights into underlying concepts and facilitate informed decision-making based on observed regularities. In machine learning, pattern recognition employs advanced algorithms to detect and analyze regularities within data. This field has wide-ranging applications, particularly in technical domains such as computer vision, speech recognition, and face recognition. Pattern recognition utilizes statistical information, historical data, and the system's memory to recognize and classify events or entities.
One key attribute of pattern recognition is the ability to learn from data. It leverages available data to improve its performance continually. ML adapts and refines its algorithms through training and iterative processes, enhancing the accuracy and efficiency of pattern recognition. For instance, in the context of recommending books or movies, if a user consistently prefers black comedies, machine learning algorithms can recognize this pattern and suggest similar genre preferences, avoiding suggestions that do not align with the established pattern.
Machine learning involves systems learning from data rather than following explicitly programmed instructions. There are three main types of machine learning: supervised learning where labeled examples are provided, unsupervised learning where no labels are given, and reinforcement learning where an agent learns to achieve a goal in an environment through trial-and-error using rewards. The document provides examples of applications for each type and discusses algorithms commonly used for classification, clustering, and other machine learning tasks.
This document provides an overview of machine learning. It begins with an introduction and definitions, explaining that machine learning allows computers to learn without being explicitly programmed by exploring algorithms that can learn from data. The document then discusses the different types of machine learning problems including supervised learning, unsupervised learning, and reinforcement learning. It provides examples and applications of each type. The document also covers popular machine learning techniques like decision trees, artificial neural networks, and frameworks/tools used for machine learning.
This document provides an overview of machine learning. It begins with an introduction and discusses the basics, types (supervised, unsupervised, reinforcement learning), technologies, applications, and vision for the next few years. Key points covered include definitions of machine learning, examples of applications (search engines, spam filters, personalized recommendations), and descriptions of different problem types (classification, regression, clustering) and learning approaches (decision trees, neural networks, Bayesian methods).
This document provides information about an internship in artificial intelligence using Python. It includes definitions of common AI abbreviations and compares human organs to AI tools. It also discusses basics of AI, concepts in AI like machine learning and neural networks, qualities of humans and AI, important IDE software, useful Python packages, types of AI and machine learning, supervised and unsupervised machine learning algorithms, and the methodology for an image classification project including preprocessing data and extracting features from images.
This document provides information about an internship in artificial intelligence using Python. It includes abbreviations commonly used in AI and machine learning and compares human organs to AI tools. It also discusses basics of AI, concepts in AI like machine learning and neural networks, qualities of humans and AI, important software for AI like Anaconda and TensorFlow, and types of machine learning algorithms. The document provides an overview of the topics that will be covered in the internship.
The document provides an overview of machine learning, including a brief history noting pioneers like Alan Turing, Arthur Samuel, and Frank Rosenblatt. It describes different machine learning algorithms and applications in domains such as healthcare, banking, and retail. The document concludes by discussing current trends in machine learning research and careers involving machine learning skills.
The document discusses machine learning, defining it as using algorithms to automatically learn from labeled examples to create hypotheses that can predict labels for new examples. It provides examples of machine learning applications like spam filtering and autonomous vehicles, and covers different types of learning algorithms like decision trees and neural networks that are used to perform these tasks. The document also discusses why machine learning is useful and relevant disciplines like statistics, psychology, and computer science that contribute to its development.
Artificial intelligence is the ability of machines to learn and solve problems like humans. There are two main types of machine learning: supervised learning, where the machine is provided labeled input and output data to learn from, and unsupervised learning, where only unlabeled input data is provided. Supervised learning techniques include regression and classification, while unsupervised techniques include clustering and association. Some applications of AI include medical image analysis, autonomous vehicle control, games, robotic toys, bioinformatics, text analysis and natural language processing.
1. Machine learning involves using algorithms to learn from data without being explicitly programmed. It is an interdisciplinary field that draws from statistics, computer science, and many other areas.
2. There are massive amounts of data being generated every day from sources like Google, Facebook, YouTube, and more. This data provides opportunities for machine learning applications.
3. Machine learning tasks can be supervised, involving labeled example data, or unsupervised, involving unlabeled data. Supervised learning predicts labels for new data based on patterns in labeled training data, while unsupervised learning finds hidden patterns in unlabeled data.
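The supervised/unsupervised contrast in point 3 can be made concrete on one dataset: a classifier trained on labels versus k-means clustering that finds the same groups without them. The data, seeds, and the nearest-centroid rule below are illustrative assumptions, not from the document.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two well-separated 2-D clouds: the "true" classes
a = rng.normal(loc=(0.0, 0.0), scale=0.3, size=(20, 2))
b = rng.normal(loc=(3.0, 3.0), scale=0.3, size=(20, 2))
X = np.vstack([a, b])
y = np.array([0] * 20 + [1] * 20)   # labels, available only when supervised

# Supervised: nearest-centroid classifier built from the labeled data
centroids = np.array([X[y == c].mean(axis=0) for c in (0, 1)])

def predict(point):
    return int(np.argmin(np.linalg.norm(centroids - point, axis=1)))

# Unsupervised: k-means recovers the same two groups without any labels
centers = X[rng.choice(len(X), 2, replace=False)]
for _ in range(10):
    assign = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
    centers = np.array([X[assign == c].mean(axis=0) if np.any(assign == c)
                        else centers[c] for c in (0, 1)])

print(predict(np.array([2.9, 3.1])))  # -> 1, the cloud near (3, 3)
```

The supervised model predicts labels for new points; k-means only partitions the data, and its cluster indices carry no inherent meaning until a human interprets them.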
This document provides an overview of a Machine Learning course, including:
- The course is taught by Max Welling and includes homework, a project, quizzes, and a final exam.
- Topics covered include classification, neural networks, clustering, reinforcement learning, Bayesian methods, and more.
- Machine learning involves computers learning from data to improve performance and make predictions. It is a subfield of artificial intelligence.
This document provides an overview of machine learning, including definitions, key concepts, and different types of machine learning models. It defines machine learning as a field that gives computers the ability to learn without being explicitly programmed. The document discusses general machine learning terms like data, datasets, features, labels, and models. It also explains the differences between supervised learning techniques like regression and classification, unsupervised learning, and reinforcement learning. Examples of linear regression models and neural networks are provided.
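Linear regression, the simplest of the supervised models mentioned above, can be fit in a few lines by ordinary least squares; the five synthetic points below are chosen to lie exactly on y = 2x + 1 and are not from the document.

```python
import numpy as np

# Fit y = slope * x + intercept by ordinary least squares.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0                    # exact line: slope 2, intercept 1

A = np.column_stack([x, np.ones_like(x)])       # design matrix [x | 1]
slope, intercept = np.linalg.lstsq(A, y, rcond=None)[0]
print(slope, intercept)  # -> 2.0 1.0 (up to floating-point rounding)
```

With noisy data the same call returns the least-squares best fit rather than an exact recovery, which is the usual regression setting.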
1. Dr. R. Gunavathi of the PG and Research Department of Computer Applications at [institution name redacted] organized a seminar on IoT applications and machine learning.
2. The seminar featured a presentation by Assistant Professor Sushama of JECRC University on machine learning and its applications.
3. Machine learning involves using algorithms to improve performance on tasks based on experience. It is commonly used when human expertise is limited, models must be customized, or huge amounts of data are involved.
Pattern recognition and Machine Learning.Rohit Kumar
Machine learning involves using examples to generate a program or model that can classify new examples. It is useful for tasks like recognizing patterns, generating patterns, and predicting outcomes. Some common applications of machine learning include optical character recognition, biometrics, medical diagnosis, and information retrieval. The goal of machine learning is to build models that can recognize patterns in data and make predictions.
Machine learning is a method of data analysis that automates analytical model building. It is a branch of artificial intelligence based on the idea that systems can learn from data, identify patterns, and make decisions with minimal human intervention.
A tutorial on deep learning at ICML 2013Philip Zheng
This document provides an overview of deep learning presented by Yann LeCun and Marc'Aurelio Ranzato at an ICML tutorial in 2013. It discusses how deep learning learns hierarchical representations through multiple stages of non-linear feature transformations, inspired by the hierarchical structure of the mammalian visual cortex. It also compares different types of deep learning architectures and training protocols.
This document provides an introduction to machine learning and inductive inference. It discusses what machine learning is, common learning tasks like concept learning and function learning, different data representations, and example applications such as knowledge discovery and building adaptive systems. The course will cover generalizing from specific examples to broader concepts through inductive inference and different learning approaches.
Artificial Intelligence (AI) at the service of LaboratoriesYvon Gervaise
Share my conference talk given in Paris on February 6, 2024 at the 1st Green Analytical Chemistry Workshop with the theme: "Artificial Intelligence (AI) at the service of Laboratories".
Conférence Y. GervaiseEN1st Green Analytical Y. Gervaise.pdfYvonGervaise
Share my conference talk given in Paris on February 6, 2024 at the 1st Green Analytical Chemistry Workshop with the theme: "Artificial Intelligence at the service of Laboratories".
This document introduces machine learning algorithms. It discusses supervised and unsupervised learning problems and strategies. It provides examples of machine learning applications including neural networks for handwritten digit recognition, evolutionary algorithms for nozzle design, and Bayesian networks for gene expression analysis.
Physics-Informed Machine Learning Methods for Data Analytics and Model Diagno...Velimir (monty) Vesselinov
This document summarizes research on physics-informed machine learning methods for data and model analysis. Key points include:
1) The methods couple data and model analytics to extract common hidden features using techniques like nonnegative tensor factorization.
2) Physics constraints are incorporated to identify important processes in datasets and model outputs.
3) The methods have been applied to analyze climate model outputs from Europe to identify dominant patterns in air temperature and water tables over time.
Data and Model-Driven Decision Support for Environmental Management of a Chro...Velimir (monty) Vesselinov
Vesselinov, V.V., Katzman, D., Broxton, D., Birdsell, K., Reneau, S., Vaniman, D., Longmire, P., Fabryka-Martin, J., Heikoop, J., Ding, M., Hickmott, D., Jacobs, E., Goering, T., Harp, D., Mishra, P., Data and Model-Driven Decision Support for Environmental Management of a Chromium Plume at Los Alamos National Laboratory (LANL), Waste Management Symposium 2013, Session 109: ER Challenges: Alternative Approaches for Achieving End State, Phoenix, AZ, February 28, 2013.
More Related Content
Similar to Unsupervised machine learning methods for feature extraction
Environmental Management Modeling Activities at Los Alamos National Laborator...Velimir (monty) Vesselinov
Vesselinov, V.V., et al., Environmental Management Modeling Activities at Los Alamos National Laboratory (LANL), Department of Energy Technical Exchange Meeting, Performance Assessment Community of Practice, Hanford, April 13-14, 2010.
AGNI: Coupling Model Analysis Tools and High-Performance Subsurface Flow and T...Velimir (monty) Vesselinov
Vesselinov, V.V., et al., AGNI: Coupling Model Analysis Tools and High-Performance Subsurface Flow and Transport Simulators for Risk and Performance Assessments, XIX International Conference on Computational Methods in Water Resources (CMWR 2012), University of Illinois at Urbana-Champaign, June 17-22, 2012.
Tomographic inverse estimation of aquifer properties based on pressure varia...Velimir (monty) Vesselinov
Vesselinov, V.V., Harp, D., Koch, R., Birdsell, K., Katzman, K., Tomographic inverse estimation of aquifer properties based on pressure variations caused by transient water-supply pumping, AGU Meeting, San Francisco, CA, December 15-19, 2008.
Vesselinov, V.V., Uncertainties in Transient Capture-Zone Estimates, CMWR 2006 XVI International Conference on Computational Methods in Water Resources, Copenhagen, Denmark, 18-22 June 2006.
Model-driven decision support for monitoring network design based on analysis...Velimir (monty) Vesselinov
Vesselinov, V.V., Harp, D., Katzman, D., Model-driven decision support for monitoring network design based on analysis of data and model uncertainties: methods and applications, H32F: Uncertainty Quantification and Parameter Estimation: Impacts on Risk and Decision Making, AGU Fall meeting, San Francisco, December 3-7, 2012, LA-UR-13-20189, (invited).
Decision Analyses Related to a Chromium Plume at Los Alamos National LaboratoryVelimir (monty) Vesselinov
Vesselinov, V.V., O'Malley, D., Katzman, D., Model-Assisted Decision Analyses Related to a Chromium Plume at Los Alamos National Laboratory, Waste Management Symposium, Phoenix, AZ, 2015.
Decision Support for Environmental Management of a Chromium Plume at Los Alam...Velimir (monty) Vesselinov
Vesselinov, V.V., Katzman, D., Broxton, D., Birdsell, K., Reneau, S., Vaniman, D., Longmire, P., Fabryka-Martin, J., Heikoop, J., Ding, M., Hickmott, D., Jacobs, E., Goering, T., Harp, D., Mishra, P., Data and Model-Driven Decision Support for Environmental Management of a Chromium Plume at Los Alamos National Laboratory (LANL), Waste Management Symposium 2013, Session 109: ER Challenges: Alternative Approaches for Achieving End State, Phoenix, AZ, February 28, 2013.
Reduced Order Models for Decision Analysis and Upscaling of Aquifer Heterogen...Velimir (monty) Vesselinov
Vesselinov, V.V., O'Malley, D., Alexandrov, B., Moore, B., Reduced Order Models for Decision Analysis and Upscaling of Aquifer Heterogeneity, AGU Fall Meeting, San Francisco, CA, 2016, (invited).
ZEM: Integrated Framework for Real-Time Data and Model Analyses for Robust En...Velimir (monty) Vesselinov
Vesselinov, V.V., O'Malley, D., Katzman, D., ZEM: Integrated Framework for Real-Time Data and Model Analyses for Robust Environmental Management Decision Making, Waste Management Symposium, Phoenix, AZ, 2016.
Vesselinov, V.V., O'Malley, D., Katzman, D., Decision Analyses for Groundwater Remediation, Waste Management Symposium, Phoenix, AZ, 2017.
la ur-17-21909
Unsupervised machine learning methods for feature extraction
1. Unsupervised Machine Learning Methods for Feature Extraction
Velimir V. Vesselinov (monty) (vvv@lanl.gov)
Earth and Environmental Sciences Division, Los Alamos National Laboratory, NM, USA
http://tensors.lanl.gov
LA-UR-19-21450
Unsupervised ML Tucker Studies Climate Geochem Seismic LANSCE Mixing Summary
2. Why unsupervised Machine Learning (ML)?
Supervised ML: requires prior categorization (knowledge) of the processed data
random forest, neural networks, active/reinforcement learning, ...
Example: recognizes images of cats and dogs after extensive training, but cannot recognize horses if not trained on them
Cannot find something that we do not already know
Unsupervised ML: extracts hidden features (signals) in the processed data without any prior information (exploratory analysis for data-driven science)
Example: identifies features that distinguish images of animals (e.g., cats, dogs, horses) without prior information or training
AI + Unsupervised ML: coupling AI and unsupervised techniques is currently pursued by many groups
8. Why not supervised Machine Learning (ML)?
Supervised ML
can introduce subjectivity (through the labeling process)
does not provide insights into why horses differ from dogs and cats
cannot make predictions about categories it was not trained on
requires huge training (labeled) datasets
is impacted by “adversarial examples”
⇒ major limitations of the supervised methods for data-analytics and data-driven science applications
10. Unsupervised Machine Learning Applications
Data Analytics: Identify signals (features) in datasets
Model Analytics/Diagnostics: Identify processes (features) in model outputs
Physics-Informed Machine Learning: Coupled Data/Model Analytics (fusion)
11. Unsupervised Machine Learning Applications
Data Analytics: Identify signals (features) in datasets
Feature extraction (FE):
Blind source separation (BSS)
Detection of disruptions / anomalies
Image recognition
Guide development of physics / reduced-order models representing the data
Discover unknown dependencies and phenomena
Develop reduced-order/surrogate models
Make predictions
Optimize data acquisition
19. Unsupervised Machine Learning Applications
Model Analytics/Diagnostics: Identify processes (features) in model outputs
Separate processes (inseparable during modeling)
Model reduction (develop reduced-order/surrogate models)
Identify dependencies between model inputs and outputs
Discover unknown dependencies and phenomena
Make predictions
Optimize data acquisition
20. Unsupervised Machine Learning Applications
Physics-Informed Machine Learning: Coupled Data/Model Analytics (fusion)
Simultaneous analyses of data and model outputs
21. Nonnegative Matrix/Tensor Factorization
We have developed a series of novel unsupervised Machine Learning (ML) methods and computational techniques.
Our methods are based on matrix/tensor factorization coupled with custom k-means clustering and nonnegativity/sparsity constraints:
NMFk: Nonnegative Matrix Factorization
NTFk: Nonnegative Tensor Factorization
NMFk/NTFk can efficiently process large datasets (GBs/TBs) utilizing GPUs & TPUs (TensorFlow, PyTorch, MXNet)
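As a minimal illustration of the factorization step that NMFk builds on, the sketch below runs plain NMF with the standard Lee-Seung multiplicative updates (the custom k-means model-selection layer of NMFk is omitted, and this generic solver is an assumption for illustration, not the deck's implementation; NumPy is assumed):

```python
import numpy as np

def nmf(X, k, iters=500, eps=1e-9, seed=0):
    """Plain NMF by Lee-Seung multiplicative updates: X [n x m] ~ W [n x k] @ H [k x m]."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, k))
    H = rng.random((k, m))
    for _ in range(iters):
        H *= (W.T @ X) / (W.T @ W @ H + eps)   # update keeps H nonnegative
        W *= (X @ H.T) / (W @ H @ H.T + eps)   # update keeps W nonnegative
    return W, H

# Mix two hidden nonnegative sources into a synthetic dataset, then unmix blindly.
rng = np.random.default_rng(1)
S = rng.random((2, 50))          # hidden signals (unknown to the solver)
A = rng.random((30, 2))          # mixing matrix (unknown to the solver)
X = A @ S
W, H = nmf(X, k=2)
print(np.linalg.norm(X - W @ H) / np.linalg.norm(X))  # small relative error
```

The nonnegativity of the updates is what makes the recovered factors interpretable as additive parts, which is the point made on the "Why nonnegativity?" slide.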
22. Why nonnegativity?
Nonnegativity constraints provide meaningful and interpretable results (+sparsity)
23. Why tensors?
Tensors (multi-dimensional/multi-modal/multi-way datasets) are everywhere:
monitoring data are typically a 5-D tensor (x,y,z,t,attributes)
model outputs are typically a 5-D tensor (x,y,z,t,attributes)
24. Why tensor-based unsupervised machine learning?
SVD cannot be applied to multi-way (tensor) datasets
Tensor-based unsupervised machine learning can be applied to:
resolve interactions between dimensions (modes)
identify how controls (e.g., pressure, temperature, volume) impact outcomes (e.g., oil production)
identify how observables (e.g., pressure, temperature, volume) impact other observables (e.g., concentration)
identify how model inputs (e.g., conductivity, storage) impact model outputs (e.g., pressure)
perform simultaneous analyses of data and model outputs (data/model fusion; PIML)
33. Tucker Tensor Factorization (3D case): Tucker-3
X ≈ G ⊗ H ⊗ W ⊗ V
where X is the [K, M, N] data tensor, G is the [k, m, n] core tensor, and H [k, K], V [m, M], W [n, N] are the factor matrices
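In index form the Tucker-3 model reads X[i, j, l] = Σ G[a, b, c] · H[a, i] · V[b, j] · W[c, l], summed over the core indices. A sketch of reconstructing X from a core and factor matrices with illustrative sizes (NumPy assumed):

```python
import numpy as np

# Dimensions as on the slide: X is [K, M, N]; core G is [k, m, n];
# factor matrices H [k, K], V [m, M], W [n, N] map core modes back to data modes.
K, M, N = 10, 12, 14      # data sizes (illustrative)
k, m, n = 2, 3, 4         # numbers of features per mode (illustrative)

rng = np.random.default_rng(0)
G = rng.random((k, m, n))
H = rng.random((k, K))
V = rng.random((m, M))
W = rng.random((n, N))

# X = G x1 H x2 V x3 W (mode products), written as a single einsum
X = np.einsum('abc,ai,bj,ck->ijk', G, H, V, W)
print(X.shape)  # (10, 12, 14)
```

Because every factor is nonnegative here, the reconstructed tensor is nonnegative as well, matching the constraints NTFk imposes.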
34. Tucker Tensor Decomposition: Feature extraction
35. Tucker Tensor Decomposition: Feature extraction (graphical illustration: a data cube in x, y, t decomposed into a core and per-mode factors)
36. Tucker Tensor Decomposition: Tucker-2 (three possible alternatives)
X ≈ G ⊗ H ⊗ W
where X is [K, M, N], the core G is [k, M, n], and the factor matrices are H [k, K] and W [n, N] (one data mode is left unfactored)
37. Tucker Tensor Decomposition: Tucker-1 (three possible alternatives)
X ≈ G ⊗ H
where X is [K, M, N], the core G is [k, M, N], and the single factor matrix is H [k, K] (two data modes are left unfactored)
39. Tucker Tensor Decomposition: Feature extraction
Tucker decomposition is achieved through minimization
Nonnegativity and sparsity constraints help the feature extraction
The optimal number of features [k, m, n] is estimated through k-means clustering of a series of minimization solutions with random initial guesses
X [K, M, N] ≈ G [k, m, n] ⊗ H [k, K] ⊗ W [n, N] ⊗ V [m, M]
40. Tucker Tensor Decomposition: Example
V = [t, t^2, tanh(t − 10) + 1]
W = [x, x^3, e^x, sin(x) + 1]
H = [y, y^4, ln(y), sin(2y) + 1, cos(y) + 1]
X = G ⊗ H ⊗ W ⊗ V (12 elements)
X = xyt + xy^4 t + xt ln(y) + xt(sin(2y) + 1) + x^3 yt + xt(cos(y) + 1) + yt e^x + yt(sin(x) + 1) + xyt^2 + xy(1 + tanh(t − 10)) + t^2 e^x (sin(2y) + 1) + (1 + tanh(t − 10))(sin(x) + 1)(cos(y) + 1)
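The 12-term expansion above corresponds to a sparse core with 12 unit entries, one per selected (H, W, V) component triple. Sampling the slide's component functions on grids reproduces the example numerically (NumPy assumed; the grid ranges are illustrative choices):

```python
import numpy as np

x = np.linspace(0.1, 2, 20)
y = np.linspace(0.1, 2, 25)   # y > 0 so ln(y) is defined
t = np.linspace(0.0, 20, 30)

# Factor matrices sampled from the slide's component functions (rows = features).
H = np.stack([y, y**4, np.log(y), np.sin(2*y) + 1, np.cos(y) + 1])  # 5 x |y|
W = np.stack([x, x**3, np.exp(x), np.sin(x) + 1])                    # 4 x |x|
V = np.stack([t, t**2, np.tanh(t - 10) + 1])                         # 3 x |t|

# Sparse core: each (h, w, v) triple selects one of the 12 terms in the expansion.
G = np.zeros((5, 4, 3))
for h, w, v in [(0,0,0), (1,0,0), (2,0,0), (3,0,0), (0,1,0), (4,0,0),
                (0,2,0), (0,3,0), (0,0,1), (0,0,2), (3,2,1), (4,3,2)]:
    G[h, w, v] = 1.0

X = np.einsum('hwv,hj,wi,vk->ijk', G, H, W, V)   # X indexed as [x, y, t]

# Spot-check one entry against the closed-form sum of the 12 terms.
i, j, l = 3, 5, 7
xs, ys, ts = x[i], y[j], t[l]
ref = (xs*ys*ts + xs*ys**4*ts + xs*ts*np.log(ys) + xs*ts*(np.sin(2*ys)+1)
       + xs**3*ys*ts + xs*ts*(np.cos(ys)+1) + ys*ts*np.exp(xs)
       + ys*ts*(np.sin(xs)+1) + xs*ys*ts**2 + xs*ys*(1+np.tanh(ts-10))
       + ts**2*np.exp(xs)*(np.sin(2*ys)+1)
       + (1+np.tanh(ts-10))*(np.sin(xs)+1)*(np.cos(ys)+1))
print(abs(X[i, j, l] - ref))  # ~0
```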
41. Tucker Tensor Decomposition: Example
42. Tucker Tensor Decomposition: Example (truth vs. predictions)
43. Tucker Tensor Decomposition: Example (truth vs. predictions)
V = [t, t^2, tanh(t − 10) + 1]
W = [x, x^3, e^x, sin(x) + 1]
H = [y, y^4, ln(y), sin(2y) + 1, cos(y) + 1]
44. Tucker Tensor Decomposition: Example
45. Tucker Tensor Decomposition: Example
M = G ⊗ H ⊗ W
M1 = xy + xy^4 + x log(y + 1) + x(sin(2y) + 1) + y e^x + x(cos(y) + 1) + x^3 y + y(sin(x) + 1)
M2 = xy + e^x (cos(y) + 1)
M3 = xy + (sin(x) + 1)(cos(y) + 1)
46. Tucker Tensor Decomposition: Example
47. Tucker Tensor Decomposition: Example II
(50 × 50 × 50) tensor: 50 columns in x, 50 rows in y, 50 time frames
'ones' swimming in a sea of 'zeros'
48. Tucker Tensor Decomposition: Example II
Factorizing all 3 dimensions (50 × 50 × 50) → (6 × 44 × 48)
49. Tucker Tensor Decomposition: Example II
6 groups of swimmers (x); 44 lanes occupied (y); 48 time frames (first/last empty)
50. NMFk / NTFk Challenges
Identifying the number of unknown features:
resolved using custom k-means clustering and sparsity constraints on the core tensor
number of features identified based on the reconstruction quality (e.g., Frobenius norm) and cluster silhouettes
Solving a non-unique optimization problem:
addressed through multistarts, regularization, and nonnegativity constraints
applying diverse optimization techniques (Multiplicative/Alternating Least Squares algorithms, NLopt, Ipopt, Gurobi, MOSEK, GLPK, Clp, Cbc, ...)
Processing Big Data:
GPUs / TPUs / distributed computing
account for data sparsity and structure
Nonnegative Tensor Trains
Dealing with Noisy Data:
random noise impacts accuracy, but it can be accounted for
systematic noise is identified as separate signals
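The reconstruction-quality half of the feature-count criterion can be sketched as below: run a multistart factorization for a range of candidate feature counts and watch where the error curve flattens (the silhouette scoring of clustered multistart solutions is omitted here; the generic multiplicative-update NMF solver is an assumption for illustration, and NumPy is assumed):

```python
import numpy as np

def nmf(X, k, iters=500, eps=1e-9, rng=None):
    # Lee-Seung multiplicative updates (a standard NMF solver; NMFk adds
    # clustering of multistart solutions on top of a step like this).
    rng = rng or np.random.default_rng()
    W = rng.random((X.shape[0], k))
    H = rng.random((k, X.shape[1]))
    for _ in range(iters):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

rng = np.random.default_rng(0)
X = rng.random((40, 3)) @ rng.random((3, 60))    # data with 3 hidden features

for k in (1, 2, 3, 4, 5):
    errs = []
    for seed in range(5):                        # multistart with random guesses
        W, H = nmf(X, k, rng=np.random.default_rng(seed))
        errs.append(np.linalg.norm(X - W @ H) / np.linalg.norm(X))
    print(k, round(float(np.mean(errs)), 4))     # error flattens at the true k=3
```

Past the true feature count the error stops improving while the multistart solutions become unstable, which is what the silhouette criterion detects.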
53. NMFk / NTFk Analyses
Field Data:
Characterization of groundwater contaminant sources
US Climate data
Geothermal data
Seismic data
Lab Data:
X-ray Spectroscopy
UV Fluorescence Spectroscopy
Microbial population analyses
Operational Data:
LANSCE: Los Alamos Neutron Science Center
Oil/gas production
Model Data:
Reactive mixing A + B → C
Phase separation of co-polymers
Molecular Dynamics of proteins
EU Climate modeling (Helmholtz Institute, Germany)
54. NMFk / NTFk Publications
Stanev, Vesselinov, Kusne, Antoszewski, Takeuchi, Alexandrov, Unsupervised Phase Mapping of X-ray Diffraction Data by Nonnegative Matrix Factorization Integrated with Custom Clustering, npj Computational Materials, 2018.
Vesselinov, Mudunuru, Karra, O'Malley, Alexandrov, Unsupervised Machine Learning Based on Non-Negative Tensor Factorization for Analyzing Reactive-Mixing, Journal of Computational Physics, (in review), 2018.
Vesselinov, O'Malley, Alexandrov, Nonnegative Tensor Factorization for Contaminant Source Identification, Journal of Contaminant Hydrology, (accepted), 2018.
O'Malley, Vesselinov, Alexandrov, Alexandrov, Nonnegative/binary matrix factorization with a D-Wave quantum annealer, PLOS ONE, (accepted), 2018.
Vesselinov, O'Malley, Alexandrov, Contaminant source identification using semi-supervised machine learning, Journal of Contaminant Hydrology, 10.1016/j.jconhyd.2017.11.002, 2017.
Alexandrov, Vesselinov, Blind source separation for groundwater level analysis based on nonnegative matrix factorization, WRR, 10.1002/2013WR015037, 2014.
55. Climate model of Europe: air temperature
Fluctuations in the air temperature [°C]
(424 × 412 × 365) (columns × rows × days)
NTFk extracts spatial and temporal footprints of dominant features:
storm signal
winter seasonal signal
summer seasonal signal
...
Data compression: ∼ 4 GB → ∼ 0.5 MB
Compression ratio: ∼ 8000
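The element-count arithmetic behind that compression: a Tucker model stores only the small core plus one factor matrix per mode. The rank [3, 3, 3] below is an assumption based on the three reported signals; the slide's ∼8000 figure presumably reflects the actual on-disk file sizes, but the element-count ratio is the same order of magnitude.

```python
# Element-count compression of a Tucker model: the (424 x 412 x 365) daily
# temperature tensor is replaced by a small core plus one factor matrix per mode.
# Rank [3, 3, 3] is an assumption here (the slide reports 3 dominant signals).
K, M, N = 424, 412, 365
k = m = n = 3

dense = K * M * N                     # elements in the raw tensor
tucker = k*m*n + k*K + m*M + n*N      # core + factor matrices
print(dense, tucker, dense // tucker)
```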
56. Climate model of Europe: air temperature fluctuations represented by 3 signals
57. Climate model of Europe: air temperature fluctuations represented by 3 signals
58. Climate model of Europe: water-table depth
Fluctuations in the water-table depth [m]
(424 × 412 × 365) (columns × rows × days)
NTFk extracts spatial and temporal footprints of dominant signals:
spring snowmelt signal
summer rainfall signal
seasonal signal
...
59. Climate model of Europe: water-table fluctuations represented by 3 signals
60. Climate model of Europe: water-table fluctuations represented by 3 signals
61. Climate model of Europe: analyze all model outputs (>40) simultaneously
Find interconnections among model outputs
Evaluate impacts of different model setups
Find dominant processes impacting model predictions (e.g., climate impacts on groundwater resources, impacts of subsurface processes on atmospheric conditions)
62. ML for extracting contaminant plumes
63. ML for extracting contaminant plumes
64. ML for extracting contaminant plumes
66. Geochemistry: Nonnegative Tensor Factorization based on Tucker-1 decomposition
X: data tensor
W: source (groundwater type) matrix (unknown)
G: mixing tensor (unknown)
M: number of observation points (wells)
N: number of observation times (e.g., 2001, 2002, ..., 2017)
K: number of geochemical species observed (e.g., Cr⁶⁺, SO₄²⁻, NO₃⁻, etc.)
k: number of unknown groundwater types mixed at each well
Constraints:
all tensor/matrix elements ≥ 0
Σ_{i=1..k} G_{i,j,t} = 1 ∀ j, t
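One standard way to maintain these constraints during alternating updates is to clip negative entries and renormalize along the source axis, so the k mixing fractions at each well j and time t sum to 1. This is a minimal sketch with illustrative sizes, not NTFk's actual update scheme (which is in the cited papers); NumPy is assumed:

```python
import numpy as np

# Stand-in for an intermediate iterate of the mixing tensor G [k, wells, times].
rng = np.random.default_rng(0)
k, n_wells, n_times = 7, 4, 12           # 7 groundwater types, as on the slides
G = rng.random((k, n_wells, n_times))

G = np.clip(G, 0.0, None)                # enforce G >= 0
G /= G.sum(axis=0, keepdims=True)        # enforce sum_i G[i, j, t] = 1

print(bool(np.allclose(G.sum(axis=0), 1.0)))  # True
```

After this projection each column of G is a valid mixing composition, which is what lets the slides read the factors directly as fractions of groundwater types at each well.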
68. NTFk estimated concentrations at various wells
R-28 R-42 R-44#1 R-45#1 R-11 R-1 R-50#1
69. NTFk estimated time-dependent mixing of 7 groundwater types at various wells
R-28
R-42
R-44#1
R-45#1
70. NTFk identified sources (groundwater types) Jan-Dec 2016
Source 1: Cr, Cl⁻, NO₃, Ca, Mg, and SO₄
Source 2: ClO₄
Source 3: Cl⁻, Ca, Mg (background)
Source 4: NO₃
Source 5: ³H, Cr, Cl⁻, Ca, Mg, and SO₄
Source 6: Cl⁻, Ca, Mg, and SO₄
Source 7: (background)
71. NTFk identified sources (groundwater types) Jan-Dec 2005
72. NTFk identified sources (groundwater types) Jan-Dec 2006
73. NTFk identified sources (groundwater types) Jan-Dec 2007
74. NTFk identified sources (groundwater types) Jan-Dec 2008
75. NTFk identified sources (groundwater types) Jan-Dec 2009
76. NTFk identified sources (groundwater types) Jan-Dec 2010
77. NTFk identified sources (groundwater types) Jan-Dec 2011
78. NTFk identified sources (groundwater types) Jan-Dec 2012
79. NTFk identified sources (groundwater types) Jan-Dec 2013
80. NTFk identified sources (groundwater types) Jan-Dec 2014
81. NTFk identified sources (groundwater types) Jan-Dec 2015
82. NTFk identified sources (groundwater types) Jan-Dec 2016
Source 7: (background)
Source 1: Cr, Cl
−
, NO3, Ca, Mg, and SO4
Source 3: Cl
−
, Ca, Mg (background)
Source 2: ClO4
Source 6: Cl
−
, Ca, Mg, and SO4
Source 4: NO3
Source 5:
3
H, Cr, Cl
−
, Ca, Mg, and SO4
Unsupervised ML Tucker Studies Climate Geochem Seimic LANSCE Mixing Summary
83. NTFk analysis of LANL hydrogeochemical datasets
(18 × 8 × 12) tensor (wells × species × years)
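The decomposition idea behind the slide can be sketched in code. The following is a minimal illustration (not the authors' NTFk implementation, which adds custom clustering to estimate the number of features): a nonnegative CP tensor factorization via multiplicative updates, applied to a synthetic tensor with the same (wells × species × years) shape as the (18 × 8 × 12) one above.

```python
# Minimal sketch (not the NTFk code): nonnegative CP tensor factorization
# via multiplicative updates, on a synthetic (18 x 8 x 12) tensor shaped
# like the (wells x species x years) dataset on this slide.
import numpy as np

def khatri_rao(A, B):
    """Column-wise Kronecker product of A (I x R) and B (J x R) -> (I*J x R)."""
    return (A[:, None, :] * B[None, :, :]).reshape(-1, A.shape[1])

def ncp(X, rank, n_iter=500, eps=1e-12, seed=0):
    """Nonnegative CP: X ~ sum_r a_r (outer) b_r (outer) c_r."""
    rng = np.random.default_rng(seed)
    dims = X.shape
    F = [rng.random((d, rank)) for d in dims]          # factor matrices A, B, C
    for _ in range(n_iter):
        for n in range(3):
            # mode-n unfolding and the matching Khatri-Rao product
            Xn = np.moveaxis(X, n, 0).reshape(dims[n], -1)
            others = [F[m] for m in range(3) if m != n]
            KR = khatri_rao(others[0], others[1])
            num = Xn @ KR
            den = F[n] @ (KR.T @ KR) + eps
            F[n] *= num / den                          # update keeps F[n] >= 0
    return F

# Synthetic low-rank nonnegative tensor: 18 wells x 8 species x 12 years
rng = np.random.default_rng(1)
A, B, C = rng.random((18, 3)), rng.random((8, 3)), rng.random((12, 3))
X = np.einsum('ir,jr,kr->ijk', A, B, C)
Fa, Fb, Fc = ncp(X, rank=3)
Xhat = np.einsum('ir,jr,kr->ijk', Fa, Fb, Fc)
rel_err = np.linalg.norm(X - Xhat) / np.linalg.norm(X)
print("relative reconstruction error:", rel_err)
```

The recovered columns of the factor matrices play the role of the "sources" on the preceding slides: each groundwater type is a nonnegative pattern over wells, species, and years.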
84. Oklahoma seismicity
32,251 seismic events from 1989 to 2017
Tensor: total energy of events over a discretized domain
(118 × 97 × 520) (columns × rows × weeks)
NTFk applied to extract dominant hidden (latent) features based on spatial footprints and temporal characteristics
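Building such an energy tensor from an event catalog could look like the sketch below. The grid extents, the catalog fields, and the magnitude-to-energy relation (a Gutenberg-Richter-style proxy) are all assumptions for illustration, not the study's actual preprocessing.

```python
# Hedged sketch: bin an earthquake catalog into a (columns x rows x weeks)
# "total energy" tensor like the (118 x 97 x 520) one on this slide.
# Grid ranges and the energy proxy are assumptions, not the study's values.
import numpy as np

def energy_tensor(lons, lats, mags, weeks, grid=(118, 97), n_weeks=520,
                  lon_range=(-100.0, -95.0), lat_range=(34.0, 37.5)):
    """Sum a per-event energy proxy into a (cols x rows x weeks) tensor."""
    T = np.zeros((grid[0], grid[1], n_weeks))
    # Gutenberg-Richter-style proxy: log10 E = 1.5 M + 4.8 (Joules)
    energy = 10.0 ** (1.5 * np.asarray(mags) + 4.8)
    cols = np.clip(((np.asarray(lons) - lon_range[0]) /
                    (lon_range[1] - lon_range[0]) * grid[0]).astype(int),
                   0, grid[0] - 1)
    rows = np.clip(((np.asarray(lats) - lat_range[0]) /
                    (lat_range[1] - lat_range[0]) * grid[1]).astype(int),
                   0, grid[1] - 1)
    wk = np.clip(np.asarray(weeks), 0, n_weeks - 1)
    np.add.at(T, (cols, rows, wk), energy)   # accumulate energy per cell/week
    return T

# Tiny synthetic catalog: two events in the same cell and week, plus one more
lons = np.array([-97.5, -97.5, -96.0])
lats = np.array([35.5, 35.5, 36.0])
mags = np.array([3.0, 3.0, 4.0])
weeks = np.array([10, 10, 200])
T = energy_tensor(lons, lats, mags, weeks)
print(T.shape)  # (118, 97, 520)
```

NTFk then factorizes this tensor so that each extracted feature couples a spatial footprint (rows × columns) with a temporal signature (weeks).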
93. LANSCE: Problem
Numerous independent variables ... knobs controlling the accelerator performance (15 provided / analyzed)
Numerous dependent variables ... observations representing the accelerator performance (8 provided / analyzed)
312,326 × 23 = 7,183,498 values ≈ 100 MB (per month)
Full monthly dataset ≈ 2 GB; for 30 years ≈ 1 TB
Goal #1: ML a reduced-order model to simulate the performance (a physics model does not exist and cannot be built)
Goal #2: Use ML to control the knobs to optimize the performance
96. LANSCE: NMFk estimated signal #1 based on the dependent variables
97. LANSCE: NMFk estimated signal #2 based on the dependent variables
98. LANSCE: NMFk estimated signal #3 based on the dependent variables
99. LANSCE: NMFk reconstruction of one of the dependent variables
100. LANSCE: NMFk estimated signal #3 vs. one of the independent variables
101. LANSCE: Reduced-order (SVR) Model
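The reduced-order-model idea on slides 96-101 can be sketched as follows: learn a map from the accelerator "knobs" (independent variables) to the NMFk signal weights, then reconstruct the dependent variables as weights times signals. The slides use support vector regression (SVR); to keep this sketch dependency-free, a plain ridge regression stands in for the SVR step, and all data below are synthetic.

```python
# Hedged sketch of the reduced-order-model idea: knobs -> NMFk signal
# weights -> dependent variables. Ridge regression stands in for the SVR
# used on the slides; all data are synthetic.
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_knobs, n_obs, n_signals = 200, 15, 8, 3

# Synthetic "ground truth": knobs drive signal weights, weights mix signals
knobs = rng.random((n_samples, n_knobs))
true_map = rng.random((n_knobs, n_signals))
weights = knobs @ true_map                   # NMFk-style mixing weights W
signals = rng.random((n_signals, n_obs))     # NMFk-style signals H
observations = weights @ signals             # dependent variables X ~ W H

# Reduced-order model: ridge regression knobs -> weights (SVR stand-in)
lam = 1e-6
G = knobs.T @ knobs + lam * np.eye(n_knobs)
coef = np.linalg.solve(G, knobs.T @ weights)  # (n_knobs x n_signals)

# Simulate performance for new knob settings without a physics model
new_knobs = rng.random((5, n_knobs))
pred_obs = (new_knobs @ coef) @ signals
true_obs = (new_knobs @ true_map) @ signals
rel_err = np.linalg.norm(pred_obs - true_obs) / np.linalg.norm(true_obs)
print("relative prediction error:", rel_err)
```

Because the sketch regresses onto the low-dimensional signal weights instead of the raw observations, the surrogate stays small even when the full dataset is terabyte-scale.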
102. Reactive mixing
[Figure: species A and B initially separated by a barrier] A + B → C
> 2000 simulations of C concentrations in time/space with varying model inputs representing reactive mixing (5 input model parameters)
NTFk identifies the physical processes impacting C concentrations and their relationship to the model inputs
103. Reactive mixing: NTFk results
NTFk extracts the dominant time/space features (processes / vortices) and compresses the model outputs
Compression: > 200 GB → ∼ 70 MB (ratio ∼ 3000)
Here, (1000 × 81 × 81) → (3 × 12 × 13) (time × rows × columns)
[Figure panels: Advection, Dispersion, Diffusion]
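A back-of-the-envelope check of the Tucker compression on this slide: one simulation's (1000 × 81 × 81) output is replaced by a (3 × 12 × 13) core plus three factor matrices. The factor-matrix accounting below is an illustration, not the authors' storage layout.

```python
# Element counts for the Tucker compression of a single simulation:
# full tensor vs. core plus three factor matrices (an illustration only).
full = 1000 * 81 * 81                    # values in one (time x rows x cols) output
core = 3 * 12 * 13                       # Tucker core tensor
factors = 1000 * 3 + 81 * 12 + 81 * 13   # time, row, and column factor matrices
compressed = core + factors
print(full, compressed, full / compressed)
```

For a single tensor this gives roughly a 1200× reduction in stored values; the ∼ 3000 ratio quoted on the slide is for the full > 200 GB ensemble of > 2000 simulations, where factors can be shared across simulations, so the two figures are not directly comparable.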
105. Reactive mixing: NTFk results (κf L impacts)
106. Summary
Developed novel unsupervised ML methods and computational tools based on nonnegative factorization (matrices/tensors)
Our ML methods have been used to solve various real-world problems (including breakthrough discoveries related to human cancer research)
Our ML work already contributes to many research areas
[Diagram: Model data, Sensor data, and Experimental data feed into Robust Unsupervised Machine Learning]
107. Machine Learning (ML) Algorithms / Codes developed by our team
Codes:
NMFk MADS NTFk
Examples:
http://madsjulia.github.io/Mads.jl/Examples/blind_source_separation
http://tensors.lanl.gov
http://tensordecompositions.github.io