This talk discusses risk assessment in ML systems from the perspective of reliability, operations, and especially the causal aspects that can lead to outages in ML systems.
The Data Phoenix Events team invites everyone to the first webinar in "The A-Z of Data" series, dedicated to MLOps, on August 17 at 19:00. In this introductory webinar, we will look at what MLOps is, its core principles and practices, the best tools, and possible architectures. We will start with a simple lifecycle for developing ML solutions and finish with a complex, maximally automated cycle that MLOps allows us to implement.
https://dataphoenix.info/the-a-z-of-data/
https://dataphoenix.info/the-a-z-of-data-introduction-to-mlops/
Given at the MLOps Summit 2020 - I cover the origins of MLOps in 2018, how MLOps evolved from 2018 to 2020, and what I expect for the future of MLOps.
Vertex AI: Pipelines for your MLOps workflows - Márton Kodok
In recent years, one of the biggest trends in application development has been the rise of Machine Learning solutions, tools, and managed platforms. Vertex AI is a managed, unified ML platform for all your AI workloads. On the MLOps side, Vertex AI Pipelines lets you adopt experiment pipelining beyond the classic build, train, eval, and deploy cycle. It is engineered for data scientists and data engineers, and it is a tremendous help for teams that don't have DevOps or sysadmin engineers, as infrastructure management overhead has been almost completely eliminated.
Based on practical examples, we will demonstrate how Vertex AI Pipelines scores high in terms of developer experience, how it fits custom ML needs, and how to analyze the results. It is a toolset for a fully fledged machine learning workflow: a sequence of steps in the model development and deployment cycle, such as data preparation/validation, model training, hyperparameter tuning, model validation, and model deployment. Vertex AI comes with all the standard resources plus an ML metadata store, a fully managed feature store, and a fully managed pipelines runner.
Vertex AI Pipelines is a managed serverless toolkit, which means you don't have to fiddle with infrastructure or back-end resources to run workflows.
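The workflow idea described above, a pipeline as an ordered sequence of steps (data preparation, training, validation), can be sketched in a few lines of plain Python. This is a stdlib-only conceptual sketch with hypothetical step names, not the Vertex AI Pipelines SDK:

```python
# Toy illustration of an ML pipeline as an ordered sequence of steps.
# All names here are hypothetical; real pipeline runners add caching,
# retries, metadata tracking, and distributed execution.

from typing import Any, Callable, Dict, List, Tuple

Step = Tuple[str, Callable[[Dict[str, Any]], Dict[str, Any]]]

def run_pipeline(steps: List[Step]) -> Dict[str, Any]:
    """Run each step in order, threading a shared context dict through."""
    context: Dict[str, Any] = {}
    for name, fn in steps:
        context = fn(context)
        context.setdefault("completed", []).append(name)
    return context

def prepare_data(ctx):
    ctx["data"] = [1.0, 2.0, 3.0, 4.0]  # stand-in for real data loading
    return ctx

def train_model(ctx):
    data = ctx["data"]
    ctx["model"] = sum(data) / len(data)  # "model" = a mean predictor
    return ctx

def validate_model(ctx):
    ctx["valid"] = abs(ctx["model"] - 2.5) < 1e-9
    return ctx

result = run_pipeline([
    ("prepare_data", prepare_data),
    ("train_model", train_model),
    ("validate_model", validate_model),
])
print(result["completed"])  # ['prepare_data', 'train_model', 'validate_model']
```

A managed runner like Vertex AI Pipelines replaces the `run_pipeline` loop with serverless orchestration, which is exactly the infrastructure overhead the abstract says is eliminated.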
ML-Ops: From Proof-of-Concept to Production Application - Hunter Carlisle
Successfully deploying a working machine learning prototype to a production application is a challenging task, fraught with difficulties not experienced in traditional software deployments.
In this talk, you will learn techniques to successfully deploy ML applications in a scalable, maintainable, and automated way.
Machine Learning at Scale with MLflow and Apache Spark - Databricks
Societe Generale is one of the major banks in France and has many data science teams across the globe. After years of explorations and prototyping, it is time for the company to really deploy machine learning projects at scale to the production environment.
To achieve that goal, we have been working hard to define a standard process of collaboration between data engineers and data scientists. And we also designed and deployed an infrastructure for productionizing machine learning.
During this presentation, we will look at the following points of our adventure:
1. The difficulties we had putting ML applications into production, such as the lack of a model registry, the difficulty of deploying ML libraries to our Hadoop cluster, and collaboration between data scientists and data engineers.
2. How we deployed MLflow as a key technical component in our production Hadoop environment, given various security constraints.
3. How we built a CI/CD pipeline to deploy ML applications automatically; MLflow plays an important role in this pipeline.
4. A first, concrete production project developed on top of this infrastructure with MLflow, Spark Streaming, scikit-learn, and CI/CD.
The key takeaways of this presentation are:
1. To productionize machine learning in a large organization like Société Générale, a process of collaboration must be clearly defined.
2. An ML model registry is key to ML productionization; MLflow is the best solution we found.
3. A CI/CD pipeline is essential to the success of a machine learning application.
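The model registry in takeaway 2 boils down to a versioned store that maps a model name to immutable versions, each promotable through stages such as "Production". Here is a minimal stdlib-only toy sketch of that idea; the class and stage names are hypothetical and this is not MLflow's Model Registry API:

```python
# Toy model registry: versioned models per name, one version per stage.
# Hypothetical illustration of the concept, not MLflow's API.

class ToyModelRegistry:
    def __init__(self):
        self._models = {}  # name -> list of {"artifact": ..., "stage": ...}

    def register(self, name, artifact):
        """Add a new immutable version; returns the 1-based version number."""
        versions = self._models.setdefault(name, [])
        versions.append({"artifact": artifact, "stage": "None"})
        return len(versions)

    def promote(self, name, version, stage):
        """Move a version into `stage`, archiving whatever held it before."""
        for v in self._models[name]:
            if v["stage"] == stage:
                v["stage"] = "Archived"
        self._models[name][version - 1]["stage"] = stage

    def get(self, name, stage="Production"):
        """Return (version, artifact) currently serving in `stage`."""
        for i, v in enumerate(self._models[name], start=1):
            if v["stage"] == stage:
                return i, v["artifact"]
        raise LookupError(f"no {stage} version of {name}")

registry = ToyModelRegistry()
v1 = registry.register("fraud-scorer", artifact="model-v1.bin")
v2 = registry.register("fraud-scorer", artifact="model-v2.bin")
registry.promote("fraud-scorer", v1, "Production")
registry.promote("fraud-scorer", v2, "Production")  # v1 gets archived
print(registry.get("fraud-scorer"))  # (2, 'model-v2.bin')
```

The point of the registry in a CI/CD pipeline is exactly this indirection: deployment code asks for "the Production version of fraud-scorer" rather than a hard-coded artifact path.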
MLFlow: Platform for Complete Machine Learning Lifecycle - Databricks
Description
Data Science and ML development bring many new complexities beyond the traditional software development lifecycle. Unlike in traditional software development, ML developers want to try multiple algorithms, tools, and parameters to get the best results, and they need to track this information to reproduce work.
MLflow addresses some of these challenges during an ML model development cycle.
Abstract
ML development brings many new complexities beyond the traditional software development lifecycle. Unlike in traditional software development, ML developers want to try multiple algorithms, tools, and parameters to get the best results, and they need to track this information to reproduce work. In addition, developers need to use many distinct systems to productionize models. To address these problems, many companies are building custom “ML platforms” that automate this lifecycle, but even these platforms are limited to a few supported algorithms and to each company’s internal infrastructure.
In this session, we introduce MLflow, a new open source project from Databricks that aims to design an open ML platform where organizations can use any ML library and development tool of their choice to reliably build and share ML applications. MLflow introduces simple abstractions to package reproducible projects, track results, and encapsulate models that can be used with many existing tools, accelerating the ML lifecycle for organizations of any size.
With a short demo of a complete ML model lifecycle example, you will walk away with:
*MLflow concepts and abstractions for models, experiments, and projects
*How to get started with MLflow
*Using the tracking Python APIs during model training
*Using the MLflow UI to visually compare and contrast experimental runs with different tuning parameters and evaluate metrics
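The experiment-tracking idea behind those demos, logging each run's parameters and metrics so runs can be compared later, can be illustrated with a stdlib-only toy tracker. This is a hypothetical sketch of the concept; MLflow's real tracking API is different:

```python
# Toy experiment tracker: each run records its parameters and metrics
# so runs can be compared afterwards. NOT the MLflow API.

import uuid

class ToyTracker:
    def __init__(self):
        self.runs = {}  # run_id -> {"params": ..., "metrics": ...}

    def start_run(self, **params):
        run_id = uuid.uuid4().hex
        self.runs[run_id] = {"params": params, "metrics": {}}
        return run_id

    def log_metric(self, run_id, name, value):
        self.runs[run_id]["metrics"][name] = value

    def best_run(self, metric, maximize=True):
        """Return the run id with the best recorded value for `metric`."""
        key = lambda rid: self.runs[rid]["metrics"][metric]
        return max(self.runs, key=key) if maximize else min(self.runs, key=key)

tracker = ToyTracker()
for lr in (0.1, 0.01, 0.001):
    rid = tracker.start_run(learning_rate=lr)
    # Stand-in for training; pretend smaller learning rates score better here.
    tracker.log_metric(rid, "accuracy", 0.9 - lr)

best = tracker.best_run("accuracy")
print(tracker.runs[best]["params"])  # {'learning_rate': 0.001}
```

Comparing runs across parameters, as in the last line, is the core of what the MLflow UI does visually.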
Challenges of Operationalising Data Science in Production - iguazio
The presentation topic for this meet-up was covered in two sections without any break in between.
Section 1: Business Aspects (20 mins)
Speaker: Rasmi Mohapatra, Product Owner, Experian
https://www.linkedin.com/in/rasmi-m-428b3a46/
Once your data science application is in production, there are many typical operational challenges experienced today across business domains; we will cover a few of these challenges with example scenarios.
Section 2: Tech Aspects (40 mins, slides & demo, Q&A )
Speaker: Santanu Dey, Solution Architect, Iguazio
https://www.linkedin.com/in/santanu/
In this part of the talk, we will cover how these operational challenges can be overcome, e.g. automating data collection and preparation, making ML models portable, deploying them in production, and monitoring and scaling, with relevant demos.
MLflow and Azure Machine Learning—The Power Couple for ML Lifecycle Management - Databricks
The ML Lifecycle management process is quickly becoming the bottleneck for a lot of ML projects. With MLflow’s newest release, and its enhanced integration with Azure Machine Learning, this process is now showing the right promise and capabilities on Azure. In this talk, we intend to take a tour of the integration details and how MLOps is now becoming a strength of the platform. We’ll talk about versioning, maintaining run history, production pipeline automation, deployment to cloud and edge, and CI/CD pipelines with MLOps as the backdrop.
Be prepared for an interactive conversation as we intend to seek a lot of feedback on the integration and capabilities being lit up.
Spark Summit EU 2017 - Preventing revenue leakage and monitoring distributed systems with machine learning - Flavio Clesio
Our presentation at Spark Summit EU 2017 - spark-summit.org/eu-2017/events/preventing-revenue-leakage-and-monitoring-distributed-systems-with-machine-learning/
Managing the Machine Learning Lifecycle with MLflow - Databricks
ML development brings many new complexities beyond the traditional software development lifecycle. MLflow is an open-source project from Databricks aiming to solve some of these challenges such as experiment tracking, reproducibility, model packaging, deployment, and governance, in order to manage and accelerate the lifecycle of your ML projects.
Revolutionary container-based hybrid cloud solution for ML - Platform
Ness' data science platform, NextGenML, puts the entire machine learning process (modelling, execution, and deployment) in the hands of data science teams.
The whole paradigm is built around collaboration on AI/ML, implemented with full respect for best practices and a commitment to innovation.
Kubernetes (on-prem) + Docker, Azure Kubernetes Service (AKS), Nexus, Azure Container Registry (ACR), GlusterFS
Workflow
Argo->Kubeflow
DevOps
Helm, kSonnet, Kustomize, Azure DevOps
Code Management & CI/CD
Git, TeamCity, SonarQube, Jenkins
Security
MS Active Directory, Azure VPN, Dex (K8s) integrated with GitLab
Machine Learning
TensorFlow (model training, TensorBoard, serving), Keras, Seldon
Storage (Azure)
Storage Gen1 & Gen2, Data Lake, File Storage
ETL (Azure)
Databricks, Spark on K8s, Data Factory (ADF), HDInsight (Kafka and Spark), Service Bus (ASB)
Lambda functions & VMs, Cache for Redis
Monitoring and Logging
Grafana, Prometheus, Graylog
Forget becoming a Data Scientist, become a Machine Learning Engineer instead - Data Con LA
Data Con LA 2020
Description
Machine learning is an essential skill in today's job market. But when it comes to learning machine learning, beginners get a lot of conflicting advice. I have been teaching ML to software engineers for years. In this talk:
*I will dispel some of the myths surrounding machine learning
*give you a solid, tangible plan for how to go about learning ML
*give you good pointers to start from
*and steer you away from common mistakes
Speaker
Sujee Maniyam, Elephant Scale, Founder, Principal instructor
Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018 - Sri Ambati
This talk was recorded in London on Oct 30, 2018 and can be viewed here: https://youtu.be/p4iAnxwC_Eg
The good news is building fair, accountable, and transparent machine learning systems is possible. The bad news is it’s harder than many blogs and software package docs would have you believe. The truth is nearly all interpretable machine learning techniques generate approximate explanations, that the fields of eXplainable AI (XAI) and Fairness, Accountability, and Transparency in Machine Learning (FAT/ML) are very new, and that few best practices have been widely agreed upon. This combination can lead to some ugly outcomes!
This talk aims to make your interpretable machine learning project a success by describing the fundamental technical challenges you will face in building an interpretable machine learning system, defining the real-world value proposition of approximate explanations for exact models, and then outlining viable techniques for debugging, explaining, and testing machine learning models.
Mateusz is a software developer who loves all things distributed and machine learning, and hates buzzwords. His favourite hobby is data juggling.
He obtained his M.Sc. in Computer Science from AGH UST in Krakow, Poland, during which he did an exchange at ECE Paris in France and worked on distributed flight booking systems. After graduation he moved to Tokyo to work as a researcher at Fujitsu Laboratories on machine learning and NLP projects; he is still based there.
H2O Machine Learning with KNIME Analytics Platform - Christian Dietz - H2O AI World London 2018 - Sri Ambati
This talk was recorded in London on October 30, 2018.
KNIME Analytics Platform is an easy-to-use and comprehensive open-source data integration, analysis, and exploration platform that enables data scientists to visually compose end-to-end data analysis workflows. Over 2,000 available modules ("nodes") cover each step of the analysis workflow, including blending heterogeneous data types, data transformation, wrangling and cleansing, advanced data visualization, and model training and deployment.
Many of these nodes are provided through open source integrations (why reinvent the wheel?). This provides seamless access to large open source projects such as Keras and Tensorflow for deep learning, Apache Spark for big data processing, Python and R for scripting, and more. These integrations can be used in combination with other KNIME nodes meaning that data scientists can freely select from a vast variety of options when tackling an analysis problem.
The integration of H2O in KNIME offers an extensive number of nodes encapsulating functionalities of the H2O open-source machine learning libraries, making it easy to use H2O algorithms from a KNIME workflow without touching any code: each of the H2O nodes looks and feels just like a normal KNIME node, and the data scientist benefits from H2O's high-performance libraries and proven quality during execution. For prototyping, these algorithms are executed locally; however, training and deployment can easily be scaled up using a Sparkling Water cluster.
In our talk we give a short introduction to KNIME Analytics Platform and then demonstrate how data scientists benefit from using KNIME Analytics Platform and H2O Machine Learning in combination, using a real-world analysis example.
Bio: Christian received a Master’s degree in Computer Science from the University of Konstanz. Having gained experience as a research software engineer at the University of Konstanz, where he developed frameworks and libraries in the fields of bioimage analysis and machine learning, Christian moved on to become a software engineer at KNIME. He now focuses on developing new functionalities and extensions for KNIME Analytics Platform. Some of his recent projects include deep learning integrations built upon Keras and Tensorflow, extensions for image analysis and active learning, and the integration of H2O Machine Learning and H2O Sparkling Water in KNIME Analytics Platform.
Model Experiments Tracking and Registration using MLflow on Databricks - Databricks
Machine learning models are only as good as the quality of the data and the size of the datasets used to train them. Surveys have shown that data scientists spend around 80% of their time preparing and managing data for analysis, and that 57% of data scientists regard cleaning and organizing data as the least enjoyable part of their work. This further validates the idea of MLOps and the need for collaboration between data scientists and data engineers.
AI Modernization at AT&T and the Application to Fraud with Databricks - Databricks
AT&T has been involved in AI from the beginning, with many firsts: "first to coin the term AI", "inventors of R", "foundational work on convolutional neural nets", etc., and we have applied AI to hundreds of solutions. Today we are modernizing these AI solutions in the cloud with the help of Databricks and a variety of in-house developments. This talk will highlight our AI modernization effort along with its application to fraud, one of the applications that benefits most.
Scaling up Deep Learning by Scaling Down - Databricks
In the last few years, deep learning has achieved dramatic success in a wide range of domains, including computer vision, artificial intelligence, speech recognition, natural language processing and reinforcement learning.
Machine Learning Operations: Active Failures, Latent Conditions - Flavio Clesio
This talk will discuss risk assessment in ML Systems from the perspective of reliability, operations, and especially causal aspects that can lead to outages in ML Systems.
This presentation is a keynote at the AI4SE International Workshop, exploring the challenges and opportunities of bringing Systems Engineering to the development of AI/ML functions for safety-critical systems.
Machine learning algorithms are permeating our world. With applications in banking, investing, social media, advertising, and crime prevention, to name a few, these little black boxes are increasingly being used to inform and drive decisions about our lives and businesses. Machine Learning Risk Management is an often overlooked aspect of creating, deploying, and monitoring machine learning applications. Andrew will explain the dangers associated with an absence of controls during the machine learning process. He will then demonstrate how controls prevent modeling biases and suggest ways to develop and deploy machine learning applications with a control-centric, engineered approach.
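One concrete control of the kind described, guarding a deployed model against scoring inputs that have silently drifted away from its training data, can be sketched as a simple pre-scoring check. This is a hypothetical stdlib-only illustration; the threshold, names, and mean-shift statistic are illustrative assumptions, and real systems use richer drift measures:

```python
# Illustrative ML risk control: refuse to score live inputs whose
# distribution has drifted too far from the training data.
# Threshold and statistic are hypothetical simplifications.

from statistics import mean, stdev

def drift_guard(train_sample, live_sample, max_shift=2.0):
    """Flag drift when the live mean moves more than `max_shift`
    training standard deviations away from the training mean."""
    mu, sigma = mean(train_sample), stdev(train_sample)
    shift = abs(mean(live_sample) - mu) / sigma
    return shift <= max_shift  # True -> safe to score

train = [10.0, 11.0, 9.0, 10.5, 9.5]
ok_live = [10.2, 9.8, 10.1]      # close to the training distribution
drifted = [25.0, 26.0, 24.0]     # far outside it

print(drift_guard(train, ok_live))   # True
print(drift_guard(train, drifted))   # False
```

A control like this turns a silent modeling failure into an explicit, auditable decision, which is the essence of the control-centric, engineered approach described above.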
Ontonix Complexity Measurement and Predictive Analytics WP Oct 2013Datonix.it
Breakthrough analytics for your business. Ontonix model-free and patented technology is used for advanced BI, Risk and Business Governance Management. Discover the big picture from all structured business process and discover the hidden fragility an what your options are to fix it.
Do not measure the wrong KPI - we automatically discover the native and intrinsic key performance indicators for you.
Business is all about Numbers & Speed, Professionalism is all about realization of Commitments. How to make these two ends meet.. is by reducing Waste.
Is increasing entropy of information systems a fatalityRené MANDEL
The Information System Complexity is a natural trend.
Data Governance is a solution. Un other way is to act on reference composents (MDM, Data Wells), giving them capability of "perfect" integration.
The Quality “Logs”-Jam: Why Alerting for Cybersecurity is Awash with False Po...Mark Underwood
What happens when the (Observe) Plan-Do-Check-Adjust cycle is undermined by lapses in data integrity? Observations are questioned. Plans may be ill-conceived. Actions may be undertaken that undermine rather than enhance. “Checks” can fail. Adjustments may be guesswork. In cybersecurity, the results of poor data integrity can be expensive outages, ransom requests, breaches, fines -- even bankruptcy (think Cambridge Analytica). But data integrity issues take many forms, ranging from benign to malicious. The full range of these issues is surveyed from a cybersecurity perspective, where logs and alerts are critical for defenders -- as well as quality engineers . Techniques borrowed from model-based systems engineering and ontology AI to are identified that can mitigate these deleterious effects on PDCA.
Resilience Engineering & Human Error... in ITJoão Miranda
A system is resilient if it can adjust its functioning prior to, during, or following events (changes, disturbances, and opportunities), and thereby sustain required operations under both expected and unexpected conditions.
Also, in a world of complex systems, human error as an explanation for failure is somewhat a fallacy, an obstacle to learning and therefore, to create resilient systems.
This is the presentation of the paper about the integration of artificial intelligence and the systems engineering lifecycle.
You can find more information in the following link: https://event.conflr.com/IS2019/sessiondetail_395325
The objective of this presentation to present some challenges and opportunities in the integration of Systems Engineering and the Artificial Intelligence/Machine Learning model lifecycle.
Synergy of Human and Artificial Intelligence in Software EngineeringTao Xie
Keynote Talk by Tao Xie at International NSF sponsored Workshop on Realizing Artificial Intelligence Synergies in Software Engineering (RAISE 2013) http://promisedata.org/raise/2013/
Dr. Haden A. Land, CEO and CTO of Safely2Prosperity, graced the cover of World’s Leaders Magazine as one of the Worlds Most Influential Leaders Inspiring The Tech World, 2024
Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/1fjTxvB.
Trisha Gee and Todd Montgomery attack the technology industry’s sacred cows by exposing the motivations that hide behind them. They discuss how these motivations lead us into practices that hinder rather than help us deliver quality software. Also, they discuss why some organisations seem to be achieving things that the traditional corporate IT departments can only dream of. Filmed at qconnewyork.com.
Todd Montgomery is Ex-NASA researcher, Chief Architect at Kaazing. Trisha Gee is Java Champion and Engineer.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Connector Corner: Automate dynamic content and events by pushing a buttonDianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
JMeter webinar - integration with InfluxDB and GrafanaRTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
Elevating Tactical DDD Patterns Through Object CalisthenicsDorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
2. ABOUT ME
Flávio Clésio
● Machine Learning Engineer @ MyHammer AG
● MSc. in Production Engineering (Machine Learning in Credit Derivatives/NPL)
● Specialist in Database Engineering and Business Intelligence
● Blogger @ flavioclesio.com
● Speaker at venues such as Strata Hadoop World, Spark Summit, PAPIs.io, and The Developers Conference
flavioclesio
3. CURRENT STATE OF ML SYSTEMS
Machine learning systems play a huge role in many businesses, from the banking industry and recommender systems to the health domain.
When we talk about high-stakes machine learning in production, the era of "a-data-scientist-with-a-script-on-a-single-machine" is officially over.
4. This talk will discuss risk assessment in ML systems from the perspective of reliability, operations, and especially the causal aspects that can lead to outages in ML systems.
WHAT IS IT ABOUT?
5. SURVIVORSHIP BIAS
Most blog posts, conference talks, and papers present only what worked extremely well, how those solutions generated revenue for the company, and other happy cases.
6. SURVIVORSHIP BIAS
Almost no one discloses what went wrong during the development of these solutions.
This is a real problem: we see only the final outcome, not how that outcome was generated, nor the failures and errors made along the way.
7. ● Not sexy
● People can feel blamed, or feel silly talking about their errors
● It can turn into "bad personal/corporate branding"
● Those who died during the process cannot tell what went wrong
FAILURE: A NOT-SO-ROMANTIC TOPIC
9. ● Amazon: the data on a load balancer was deleted, causing a disruption across practically an entire AWS region at the time;
● GitLab: the deletion of a production database led to 18 hours of unavailability, with loss of customer data;
● Knight Capital: the lack of a code-review culture allowed an engineer to deploy 8-year-old code to production. Outcome: losses of $172,222 per second for 45 minutes (about $465 million).
SOME SPECIFIC FAILURE CASES
● European Space Agency: the conversion of a 64-bit floating-point number to a 16-bit integer caused an overflow in the rocket's steering system, triggering a chain of events that destroyed the rocket, at a loss of more than $370 million;
● NASA: the degradation of an engineering culture into a political/product culture led to a catastrophic failure that not only cost billions of dollars but also killed the crew of the Challenger space shuttle.
10. FAILURE = LEARNING = OPPORTUNITY TO IMPROVE
● There is always a lesson to be learned from what went wrong
● A good culture is not about blaming, or about ignoring problems, but about analysing them, learning, and improving
● With every lesson learned, the whole system becomes more reliable
11. The aviation industry is one example where reliability increases significantly after every incident or accident.
It is one of the few industries that has become more reliable even as the number of flights has grown over time; as a result, the number of fatalities has been falling year after year.
RELIABILITY BENCHMARK
12. This model was created by James Reason in the early 1990s as a general framework for understanding the dynamics of accident causation.
The idea is to identify latent conditions and active failures, and to put countermeasures in place that minimize unwanted variability in human behaviour in socio-technological systems.
Sources: "Human error: models and management" and "The contribution of latent human failures to the breakdown of complex systems"
SWISS CHEESE MODEL
13. DEFENSES, BARRIERS, SAFEGUARDS
[...] High-tech systems have many defensive layers: some are designed (alarms, physical barriers, automatic shutdowns, etc.), others rely on people (surgeons, anesthetists, pilots, control room operators, etc.) and others rely on procedures and administrative controls. [...]
Source: Human error: models and management
14. [...] In an ideal world, each defensive layer would be intact. In reality, however, they are more like slices of Swiss cheese, with many holes - although, unlike cheese, these holes are continually opening, closing and moving. The presence of holes in any "slice" does not normally cause a bad result. This can usually happen only when the holes in several layers line up momentarily to allow for an accident opportunity trajectory - bringing risks to harmful contact with victims [...]
Source: Human error: models and management
DEFENSES, BARRIERS, SAFEGUARDS
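The "holes lining up" intuition can be made concrete with a toy calculation. This is a minimal sketch (not from the original slides) that assumes each defensive layer has an independent probability of a hole being open at any moment; an accident trajectory requires all holes to be open at once:

```python
from functools import reduce

def outage_probability(hole_probs):
    """Probability that every layer's hole is open at the same moment,
    assuming the layers fail independently of each other."""
    return reduce(lambda acc, p: acc * p, hole_probs, 1.0)

# Four hypothetical layers: alarms, code-push locks, code review, rollback drills.
layers = [0.05, 0.10, 0.20, 0.10]
print(f"{outage_probability(layers):.6f}")  # prints 0.000100
```

Each additional independent layer multiplies the residual risk down; conversely, correlated holes (the same latent condition weakening several layers at once) make the real risk much higher than this simple product suggests.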
16. ● Local fixes are appealing because they sound productive and look good to other teams (e.g. "see how fast this person solved the problem?")
● They cultivate dormant problems and silent risks that can eventually cause harm, instead of eliminating them
● They promote a stagnant engineering culture instead of aiming for continuous reform (i.e. substantial reform, not merely cosmetic enhancements)
● Relying on local fixes is like having a mosquito problem and swatting them every day, instead of draining the swamp where they breed
WHY NOT JUST FIX THE PROBLEM AND MOVE ON?
17. In this framing, each slice of Swiss cheese is a line of defense: engineered layers (e.g. monitoring, alarms, code-push locks in production) and/or procedural layers that involve people (e.g. cultural aspects, training and qualification of committers in the repository, rollback mechanisms, unit and integration tests).
LATENT CONDITIONS AND ACTIVE FAILURES
18. LATENT CONDITIONS
[...] Latent conditions are situations intrinsically resident within the system; they are consequences of design and engineering decisions, of those who wrote the rules or procedures, and even of the highest hierarchical levels of an organization. Latent conditions can lead to two kinds of adverse effects: error-provoking situations and the creation of vulnerabilities. That is, the solution has a design that increases the likelihood of high-negative-impact events, acting as a causal or contributing factor. [...]
Source: Human error: models and management
19. ACTIVE FAILURES
[...] Active failures are unsafe acts or minor transgressions committed by people in direct contact with the system; these acts can be mistakes, lapses, distortions, omissions, errors, and procedural violations. [...]
Source: Human error: models and management
21. ● Absence of a code-review culture (e.g. the London Whale and Knight Capital)
● A culture of improvised technical arrangements (e.g. workarounds)
● Lack of observability
● Democracy-style decisions among less informed people, rather than consensus between experts and risk-takers
SOME LATENT CONDITIONS IN ML
22. ● Resumé-driven development
● Unreviewed code going into production
● Data leakage in model training
● Lack of reproducibility/replicability
● Glue code
SOME ACTIVE FAILURES IN ML
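Of the active failures above, data leakage is the easiest to commit silently. A hypothetical sketch (standard library only, invented numbers, not from the original slides) of the classic mistake: computing normalisation statistics over the full dataset before the train/test split, so test-set information leaks into training.

```python
from statistics import mean, stdev

data = [3.0, 5.0, 7.0, 9.0, 110.0]   # the last row ends up in the test set
train, test = data[:4], data[4:]

# Leaky: statistics computed over ALL rows, so the test-set outlier
# silently shifts how the training data is scaled.
leaky_mu, leaky_sd = mean(data), stdev(data)

# Correct: fit the scaler on the training split only.
mu, sd = mean(train), stdev(train)

scaled_leaky = [(x - leaky_mu) / leaky_sd for x in train]
scaled_clean = [(x - mu) / sd for x in train]
```

The leaky version tends to report optimistic offline metrics, because information about the test distribution has already influenced training; the model then underperforms once it faces genuinely unseen production data.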
25. ● Orchestration (e.g. Mesos, Airflow, Kubernetes, AWS ECS, Kubeflow)
● Observability (e.g. Elasticsearch, Kibana, Prometheus, Sentry, Grafana, Fluent Bit, Datadog)
● ML experiment management (e.g. ModelChimp, Randopt, Forge, Lore, Datmo, Studio.ml, Sacred, MLflow, Polyaxon)
● Data versioning and management (e.g. DVC, Pachyderm, Snorkel)
● ML SaaS (e.g. Algorithmia, Peltarion, Databricks, Seldon, Google AI Platform, AWS SageMaker, Azure ML Studio, Dotscience, Dataiku DSS, Domino, Polyaxon, Weights & Biases, Spell, Gradient by Paperspace, H2O.ai, Stack ML, Comet, Valohai, Neptune AI)
SOME TOOLS
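To make the experiment-management category concrete, here is a minimal, hypothetical sketch (standard library only; the function name and record fields are invented for illustration) of the kind of run record that tools such as MLflow or Sacred maintain for you: parameters, metrics, and a content digest, appended to a log so runs stay reproducible and auditable.

```python
import hashlib
import json
import os
import tempfile
import time

def log_run(params, metrics, path):
    """Append one experiment run to a JSON-lines log file."""
    payload = {"params": params, "metrics": metrics}
    record = {
        "timestamp": time.time(),
        **payload,
        # A content digest makes silent edits to the record detectable.
        "digest": hashlib.sha256(
            json.dumps(payload, sort_keys=True).encode()
        ).hexdigest(),
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
    return record

run = log_run({"lr": 0.01, "seed": 42}, {"auc": 0.91},
              os.path.join(tempfile.gettempdir(), "runs.jsonl"))
```

A dedicated tool adds artifact storage, UI, and comparison across runs, but the underlying discipline is the same: every run leaves a tamper-evident trace of what was tried and what came out.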
26. ● There is no silver bullet for risk management in ML platforms. The hard part is knowing how to perceive and manage those risks
● An outage never happens for a single reason; it is triggered by several latent conditions and active failures combining and aligning
● Human factors play a huge role in outages
● If possible, share your mistakes. When someone shares what went wrong, everyone learns and every ML system becomes more robust
FINAL REMARKS
28. ● Reason, James. “The contribution of latent human failures to the breakdown of complex systems.” Philosophical Transactions of the Royal Society of London. B, Biological Sciences 327.1241 (1990): 475-484.
● Reason, J. “Human error: models and management.” BMJ (Clinical Research Ed.) 320.7237 (2000): 768-770. doi:10.1136/bmj.320.7237.768
● Morgenthaler, J. David, et al. “Searching for build debt: Experiences managing technical debt at Google.” 2012 Third International Workshop on Managing Technical Debt (MTD). IEEE, 2012.
● Alahdab, Mohannad, and Gül Çalıklı. “Empirical Analysis of Hidden Technical Debt Patterns in Machine Learning Software.” International Conference on Product-Focused Software Process Improvement. Springer, Cham, 2019.
REFERENCES
29. ● Perneger, Thomas V. “The Swiss cheese model of safety incidents: are there holes in the metaphor?” BMC Health Services Research 5.71 (2005). doi:10.1186/1472-6963-5-71
● “Hot cheese: a processed Swiss cheese model.” JR Coll Physicians Edinb 44 (2014): 116-21.
● Breck, Eric, et al. “What’s your ML Test Score? A rubric for ML production systems.” (2016).
● SEC Charges Knight Capital With Violations of Market Access Rule
● Machine Learning Goes Production! Engineering, Maintenance Cost, Technical Debt, Applied Data Analysis Lab Seminar
REFERENCES
30. REFERENCES
● Nassim Taleb – Lectures on Fat Tails, (Anti)Fragility, Precaution, and Asymmetric Exposures
● Skybrary – Human Factors Analysis and Classification System (HFACS)
● CEFA Aviation – Swiss Cheese Model
● A List of Post-mortems
● Richard Cook – How Complex Systems Fail
● Airbus – Hull Losses
● Number of flights performed by the global airline industry from 2004 to 2020