In this presentation, drawing upon Thorogood’s experience with a customer’s global Data & Analytics division as their MLOps delivery partner, we share important learnings and takeaways from delivering productionized ML solutions and shaping MLOps best practices and organizational standards needed to be successful.
We open by providing high-level context & answering key questions such as “What is MLOps exactly?” & “What are the benefits of establishing MLOps Standards?”
The subsequent presentation focuses on our learnings & best practices. We start by discussing common challenges when refactoring experimentation use-cases & how to best get ahead of these issues in a global organization. We then outline an Engagement Model for MLOps addressing: People, Processes, and Tools. ‘Processes’ highlights how to manage the often siloed data science use case demand pipeline for MLOps & documentation to facilitate seamless integration with an MLOps framework. ‘People’ provides context around the appropriate team structures & roles to be involved in an MLOps initiative. ‘Tools’ addresses key requirements of tools used for MLOps, considering the match of services to use-cases.
MLflow is an MLOps tool that enables data scientist to quickly productionize their Machine Learning projects. To achieve this, MLFlow has four major components which are Tracking, Projects, Models, and Registry. MLflow lets you train, reuse, and deploy models with any library and package them into reproducible steps. MLflow is designed to work with any machine learning library and require minimal changes to integrate into an existing codebase. In this session, we will cover the common pain points of machine learning developers such as tracking experiments, reproducibility, deployment tool and model versioning. Ready to get your hands dirty by doing quick ML project using mlflow and release to production to understand the ML-Ops lifecycle.
The catalyst for the success of automobiles came not through the invention of the car but rather through the establishment of an innovative assembly line. History shows us that the ability to mass produce and distribute a product is the key to driving adoption of any innovation, and machine learning is no different. MLOps is the assembly line of Machine Learning and in this presentation we will discuss the core capabilities your organization should be focused on to implement a successful MLOps system.
MLOps (a compound of “machine learning” and “operations”) is a practice for collaboration and communication between data scientists and operations professionals to help manage the production machine learning lifecycle. Similar to the DevOps term in the software development world, MLOps looks to increase automation and improve the quality of production ML while also focusing on business and regulatory requirements. MLOps applies to the entire ML lifecycle - from integrating with model generation (software development lifecycle, continuous integration/continuous delivery), orchestration, and deployment, to health, diagnostics, governance, and business metrics.
To watch the full presentation click here: https://info.cnvrg.io/mlopsformachinelearning
In this webinar, we’ll discuss core practices in MLOps that will help data science teams scale to the enterprise level. You’ll learn the primary functions of MLOps, and what tasks are suggested to accelerate your teams machine learning pipeline. Join us in a discussion with cnvrg.io Solutions Architect, Aaron Schneider, and learn how teams use MLOps for more productive machine learning workflows.
- Reduce friction between science and engineering
- Deploy your models to production faster
- Health, diagnostics and governance of ML models
- Kubernetes as a core platform for MLOps
- Support advanced use-cases like continual learning with MLOps
Machine Learning operations brings data science to the world of devops. Data scientists create models on their workstations. MLOps adds automation, validation and monitoring to any environment including machine learning on kubernetes. In this session you hear about latest developments and see it in action.
MLflow is an MLOps tool that enables data scientist to quickly productionize their Machine Learning projects. To achieve this, MLFlow has four major components which are Tracking, Projects, Models, and Registry. MLflow lets you train, reuse, and deploy models with any library and package them into reproducible steps. MLflow is designed to work with any machine learning library and require minimal changes to integrate into an existing codebase. In this session, we will cover the common pain points of machine learning developers such as tracking experiments, reproducibility, deployment tool and model versioning. Ready to get your hands dirty by doing quick ML project using mlflow and release to production to understand the ML-Ops lifecycle.
The catalyst for the success of automobiles came not through the invention of the car but rather through the establishment of an innovative assembly line. History shows us that the ability to mass produce and distribute a product is the key to driving adoption of any innovation, and machine learning is no different. MLOps is the assembly line of Machine Learning and in this presentation we will discuss the core capabilities your organization should be focused on to implement a successful MLOps system.
MLOps (a compound of “machine learning” and “operations”) is a practice for collaboration and communication between data scientists and operations professionals to help manage the production machine learning lifecycle. Similar to the DevOps term in the software development world, MLOps looks to increase automation and improve the quality of production ML while also focusing on business and regulatory requirements. MLOps applies to the entire ML lifecycle - from integrating with model generation (software development lifecycle, continuous integration/continuous delivery), orchestration, and deployment, to health, diagnostics, governance, and business metrics.
To watch the full presentation click here: https://info.cnvrg.io/mlopsformachinelearning
In this webinar, we’ll discuss core practices in MLOps that will help data science teams scale to the enterprise level. You’ll learn the primary functions of MLOps, and what tasks are suggested to accelerate your teams machine learning pipeline. Join us in a discussion with cnvrg.io Solutions Architect, Aaron Schneider, and learn how teams use MLOps for more productive machine learning workflows.
- Reduce friction between science and engineering
- Deploy your models to production faster
- Health, diagnostics and governance of ML models
- Kubernetes as a core platform for MLOps
- Support advanced use-cases like continual learning with MLOps
Machine Learning operations brings data science to the world of devops. Data scientists create models on their workstations. MLOps adds automation, validation and monitoring to any environment including machine learning on kubernetes. In this session you hear about latest developments and see it in action.
ML development brings many new complexities beyond the traditional software development lifecycle. Unlike in traditional software development, ML developers want to try multiple algorithms, tools and parameters to get the best results, and they need to track this information to reproduce work. In addition, developers need to use many distinct systems to productionize models. To address these problems, many companies are building custom “ML platforms” that automate this lifecycle, but even these platforms are limited to a few supported algorithms and to each company’s internal infrastructure. In this talk, I present MLflow, a new open source project from Databricks that aims to design an open ML platform where organizations can use any ML library and development tool of their choice to reliably build and share ML applications. MLflow introduces simple abstractions to package reproducible projects, track results, and encapsulate models that can be used with many existing tools, accelerating the ML lifecycle for organizations of any size.
MLOps Bridging the gap between Data Scientists and Ops.Knoldus Inc.
Through this session we're going to introduce the MLOps lifecycle and discuss the hidden loopholes that can affect the MLProject. Then we are going to discuss the ML Model lifecycle and discuss the problem with training. We're going to introduce the MLFlow Tracking module in order to track the experiments.
Команда Data Phoenix Events приглашает всех, 17 августа в 19:00, на первый вебинар из серии "The A-Z of Data", который будет посвящен MLOps. В рамках вводного вебинара, мы рассмотрим, что такое MLOps, основные принципы и практики, лучшие инструменты и возможные архитектуры. Мы начнем с простого жизненного цикла разработки ML решений и закончим сложным, максимально автоматизированным, циклом, который нам позволяет реализовать MLOps.
https://dataphoenix.info/the-a-z-of-data/
https://dataphoenix.info/the-a-z-of-data-introduction-to-mlops/
MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...Databricks
ML development brings many new complexities beyond the traditional software development lifecycle. Unlike in traditional software development, ML developers want to try multiple algorithms, tools, and parameters to get the best results, and they need to track this information to reproduce work. In addition, developers need to use many distinct systems to productionize models. To address these problems, many companies are building custom “ML platforms” that automate this lifecycle, but even these platforms are limited to a few supported algorithms and to each company’s internal infrastructure. In this session, we introduce MLflow, a new open source project from Databricks that aims to design an open ML platform where organizations can use any ML library and development tool of their choice to reliably build and share ML applications. MLflow introduces simple abstractions to package reproducible projects, track results, and encapsulate models that can be used with many existing tools, accelerating the ML lifecycle for organizations of any size. In this deep-dive session, through a complete ML model life-cycle example, you will walk away with:
MLflow concepts and abstractions for models, experiments, and projects
How to get started with MLFlow
Understand aspects of MLflow APIs
Using tracking APIs during model training
Using MLflow UI to visually compare and contrast experimental runs with different tuning parameters and evaluate metrics
Package, save, and deploy an MLflow model
Serve it using MLflow REST API
What’s next and how to contribute
Given at the MLOps. Summit 2020 - I cover the origins of MLOps in 2018, how MLOps has evolved from 2018 to 2020, and what I expect for the future of MLOps
Using MLOps to Bring ML to Production/The Promise of MLOpsWeaveworks
In this final Weave Online User Group of 2019, David Aronchick asks: have you ever struggled with having different environments to build, train and serve ML models, and how to orchestrate between them? While DevOps and GitOps have made huge traction in recent years, many customers struggle to apply these practices to ML workloads. This talk will focus on the ways MLOps has helped to effectively infuse AI into production-grade applications through establishing practices around model reproducibility, validation, versioning/tracking, and safe/compliant deployment. We will also talk about the direction for MLOps as an industry, and how we can use it to move faster, with more stability, than ever before.
The recording of this session is on our YouTube Channel here: https://youtu.be/twsxcwgB0ZQ
Speaker: David Aronchick, Head of Open Source ML Strategy, Microsoft
Bio: David leads Open Source Machine Learning Strategy at Azure. This means he spends most of his time helping humans to convince machines to be smarter. He is only moderately successful at this. Previously, David led product management for Kubernetes at Google, launched GKE, and co-founded the Kubeflow project. David has also worked at Microsoft, Amazon and Chef and co-founded three startups.
Sign up for a free Machine Learning Ops Workshop: http://bit.ly/MLOps_Workshop_List
Weaveworks will cover concepts such as GitOps (operations by pull request), Progressive Delivery (canary, A/B, blue-green), and how to apply those approaches to your machine learning operations to mitigate risk.
MLOps and Data Quality: Deploying Reliable ML Models in ProductionProvectus
Looking to build a robust machine learning infrastructure to streamline MLOps? Learn from Provectus experts how to ensure the success of your MLOps initiative by implementing Data QA components in your ML infrastructure.
For most organizations, the development of multiple machine learning models, their deployment and maintenance in production are relatively new tasks. Join Provectus as we explain how to build an end-to-end infrastructure for machine learning, with a focus on data quality and metadata management, to standardize and streamline machine learning life cycle management (MLOps).
Agenda
- Data Quality and why it matters
- Challenges and solutions of Data Testing
- Challenges and solutions of Model Testing
- MLOps pipelines and why they matter
- How to expand validation pipelines for Data Quality
In this talk, I present an introduction of MLFlow. I also show some examples of using it by means of MLFlow Tracking, MLFlow Projects and MLFlow Models. I also used Databricks as an example of remote tracking.
End to end Machine Learning using Kubeflow - Build, Train, Deploy and ManageAnimesh Singh
With the breadth of sheer functionalities which need to be addressed in the Machine Learning world around building, training, serving and managing models, getting it done in a consistent, composable, portable, and scalable manner is hard. The Kubernetes framework is well suited to address these issues, which is why it's a great foundation for deploying ML workloads. Kubeflow is designed to take advantage of these benefits. In this talk, we are going to address how to make it easy for everyone to develop, deploy, and manage portable, scalable ML everywhere and support the full lifecycle Machine Learning using open source technologies like Kubeflow, Tensorflow, PyTorch,Tekton, Knative, Istio and others. We are going to discuss how to enable distributed training of models, model serving, canary rollouts, drift detection, model explainability, metadata management, pipelines and others. Additionally we will discuss Watson productization in progress based on Kubeflow Pipelines and Tekton, and point to Kubeflow Dojo materials and follow-on workshops.
How to use Azure Machine Learning service to manage the lifecycle of your models. Azure Machine Learning uses a Machine Learning Operations (MLOps) approach, which improves the quality and consistency of your machine learning solutions.
MLOps Virtual Event | Building Machine Learning Platforms for the Full LifecycleDatabricks
Successfully building a machine learning model is hard enough. Reproducing your results at scale — enabling others to reproduce pipelines, comparing results from other versions, moving models into production, redeploying and rolling out updated models — is exponentially harder. To address these challenges and accelerate innovation, many companies are building custom “ML platforms” to automate the end-to-end ML lifecycle.
Watch a replay of this MLOps Virtual Event to hear more about the latest developments and best practices for managing the full ML lifecycle on Databricks with MLflow. We covered a checklist of capabilities you’ll need, common pitfalls, technological and organizational challenges, and how to overcome them.
https://www.youtube.com/playlist?list=PLTPXxbhUt-YUFNBwBsSIlknoNbS7GExZw
Introdution to Dataops and AIOps (or MLOps)Adrien Blind
This presentation introduces the audience to the DataOps and AIOps practices. It deals with organizational & tech aspects, and provide hints to start you data journey.
MLflow: Infrastructure for a Complete Machine Learning Life CycleDatabricks
ML development brings many new complexities beyond the traditional software development lifecycle. Unlike in traditional software development, ML developers want to try multiple algorithms, tools and parameters to get the best results, and they need to track this information to reproduce work. In addition, developers need to use many distinct systems to productionize models. To address these problems, many companies are building custom “ML platforms” that automate this lifecycle, but even these platforms are limited to a few supported algorithms and to each company’s internal infrastructure.
In this talk, we will present MLflow, a new open source project from Databricks that aims to design an open ML platform where organizations can use any ML library and development tool of their choice to reliably build and share ML applications. MLflow introduces simple abstractions to package reproducible projects, track results, and encapsulate models that can be used with many existing tools, accelerating the ML lifecycle for organizations of any size.
MLOps – Applying DevOps to Competitive AdvantageDATAVERSITY
MLOps is a practice for collaboration between Data Science and operations to manage the production machine learning (ML) lifecycles. As an amalgamation of “machine learning” and “operations,” MLOps applies DevOps principles to ML delivery, enabling the delivery of ML-based innovation at scale to result in:
Faster time to market of ML-based solutions
More rapid rate of experimentation, driving innovation
Assurance of quality, trustworthiness, and ethical AI
MLOps is essential for scaling ML. Without it, enterprises risk struggling with costly overhead and stalled progress. Several vendors have emerged with offerings to support MLOps: the major offerings are Microsoft Azure ML and Google Vertex AI. We looked at these offerings from the perspective of enterprise features and time-to-value.
ML development brings many new complexities beyond the traditional software development lifecycle. Unlike in traditional software development, ML developers want to try multiple algorithms, tools and parameters to get the best results, and they need to track this information to reproduce work. In addition, developers need to use many distinct systems to productionize models. To address these problems, many companies are building custom “ML platforms” that automate this lifecycle, but even these platforms are limited to a few supported algorithms and to each company’s internal infrastructure. In this talk, I present MLflow, a new open source project from Databricks that aims to design an open ML platform where organizations can use any ML library and development tool of their choice to reliably build and share ML applications. MLflow introduces simple abstractions to package reproducible projects, track results, and encapsulate models that can be used with many existing tools, accelerating the ML lifecycle for organizations of any size.
MLOps Bridging the gap between Data Scientists and Ops.Knoldus Inc.
Through this session we're going to introduce the MLOps lifecycle and discuss the hidden loopholes that can affect the MLProject. Then we are going to discuss the ML Model lifecycle and discuss the problem with training. We're going to introduce the MLFlow Tracking module in order to track the experiments.
Команда Data Phoenix Events приглашает всех, 17 августа в 19:00, на первый вебинар из серии "The A-Z of Data", который будет посвящен MLOps. В рамках вводного вебинара, мы рассмотрим, что такое MLOps, основные принципы и практики, лучшие инструменты и возможные архитектуры. Мы начнем с простого жизненного цикла разработки ML решений и закончим сложным, максимально автоматизированным, циклом, который нам позволяет реализовать MLOps.
https://dataphoenix.info/the-a-z-of-data/
https://dataphoenix.info/the-a-z-of-data-introduction-to-mlops/
MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...Databricks
ML development brings many new complexities beyond the traditional software development lifecycle. Unlike in traditional software development, ML developers want to try multiple algorithms, tools, and parameters to get the best results, and they need to track this information to reproduce work. In addition, developers need to use many distinct systems to productionize models. To address these problems, many companies are building custom “ML platforms” that automate this lifecycle, but even these platforms are limited to a few supported algorithms and to each company’s internal infrastructure. In this session, we introduce MLflow, a new open source project from Databricks that aims to design an open ML platform where organizations can use any ML library and development tool of their choice to reliably build and share ML applications. MLflow introduces simple abstractions to package reproducible projects, track results, and encapsulate models that can be used with many existing tools, accelerating the ML lifecycle for organizations of any size. In this deep-dive session, through a complete ML model life-cycle example, you will walk away with:
MLflow concepts and abstractions for models, experiments, and projects
How to get started with MLFlow
Understand aspects of MLflow APIs
Using tracking APIs during model training
Using MLflow UI to visually compare and contrast experimental runs with different tuning parameters and evaluate metrics
Package, save, and deploy an MLflow model
Serve it using MLflow REST API
What’s next and how to contribute
Given at the MLOps. Summit 2020 - I cover the origins of MLOps in 2018, how MLOps has evolved from 2018 to 2020, and what I expect for the future of MLOps
Using MLOps to Bring ML to Production/The Promise of MLOpsWeaveworks
In this final Weave Online User Group of 2019, David Aronchick asks: have you ever struggled with having different environments to build, train and serve ML models, and how to orchestrate between them? While DevOps and GitOps have made huge traction in recent years, many customers struggle to apply these practices to ML workloads. This talk will focus on the ways MLOps has helped to effectively infuse AI into production-grade applications through establishing practices around model reproducibility, validation, versioning/tracking, and safe/compliant deployment. We will also talk about the direction for MLOps as an industry, and how we can use it to move faster, with more stability, than ever before.
The recording of this session is on our YouTube Channel here: https://youtu.be/twsxcwgB0ZQ
Speaker: David Aronchick, Head of Open Source ML Strategy, Microsoft
Bio: David leads Open Source Machine Learning Strategy at Azure. This means he spends most of his time helping humans to convince machines to be smarter. He is only moderately successful at this. Previously, David led product management for Kubernetes at Google, launched GKE, and co-founded the Kubeflow project. David has also worked at Microsoft, Amazon and Chef and co-founded three startups.
Sign up for a free Machine Learning Ops Workshop: http://bit.ly/MLOps_Workshop_List
Weaveworks will cover concepts such as GitOps (operations by pull request), Progressive Delivery (canary, A/B, blue-green), and how to apply those approaches to your machine learning operations to mitigate risk.
MLOps and Data Quality: Deploying Reliable ML Models in ProductionProvectus
Looking to build a robust machine learning infrastructure to streamline MLOps? Learn from Provectus experts how to ensure the success of your MLOps initiative by implementing Data QA components in your ML infrastructure.
For most organizations, the development of multiple machine learning models, their deployment and maintenance in production are relatively new tasks. Join Provectus as we explain how to build an end-to-end infrastructure for machine learning, with a focus on data quality and metadata management, to standardize and streamline machine learning life cycle management (MLOps).
Agenda
- Data Quality and why it matters
- Challenges and solutions of Data Testing
- Challenges and solutions of Model Testing
- MLOps pipelines and why they matter
- How to expand validation pipelines for Data Quality
In this talk, I present an introduction of MLFlow. I also show some examples of using it by means of MLFlow Tracking, MLFlow Projects and MLFlow Models. I also used Databricks as an example of remote tracking.
End to end Machine Learning using Kubeflow - Build, Train, Deploy and ManageAnimesh Singh
With the breadth of sheer functionalities which need to be addressed in the Machine Learning world around building, training, serving and managing models, getting it done in a consistent, composable, portable, and scalable manner is hard. The Kubernetes framework is well suited to address these issues, which is why it's a great foundation for deploying ML workloads. Kubeflow is designed to take advantage of these benefits. In this talk, we are going to address how to make it easy for everyone to develop, deploy, and manage portable, scalable ML everywhere and support the full lifecycle Machine Learning using open source technologies like Kubeflow, Tensorflow, PyTorch,Tekton, Knative, Istio and others. We are going to discuss how to enable distributed training of models, model serving, canary rollouts, drift detection, model explainability, metadata management, pipelines and others. Additionally we will discuss Watson productization in progress based on Kubeflow Pipelines and Tekton, and point to Kubeflow Dojo materials and follow-on workshops.
How to use Azure Machine Learning service to manage the lifecycle of your models. Azure Machine Learning uses a Machine Learning Operations (MLOps) approach, which improves the quality and consistency of your machine learning solutions.
MLOps Virtual Event | Building Machine Learning Platforms for the Full LifecycleDatabricks
Successfully building a machine learning model is hard enough. Reproducing your results at scale — enabling others to reproduce pipelines, comparing results from other versions, moving models into production, redeploying and rolling out updated models — is exponentially harder. To address these challenges and accelerate innovation, many companies are building custom “ML platforms” to automate the end-to-end ML lifecycle.
Watch a replay of this MLOps Virtual Event to hear more about the latest developments and best practices for managing the full ML lifecycle on Databricks with MLflow. We covered a checklist of capabilities you’ll need, common pitfalls, technological and organizational challenges, and how to overcome them.
https://www.youtube.com/playlist?list=PLTPXxbhUt-YUFNBwBsSIlknoNbS7GExZw
Introdution to Dataops and AIOps (or MLOps)Adrien Blind
This presentation introduces the audience to the DataOps and AIOps practices. It deals with organizational & tech aspects, and provide hints to start you data journey.
MLflow: Infrastructure for a Complete Machine Learning Life CycleDatabricks
ML development brings many new complexities beyond the traditional software development lifecycle. Unlike in traditional software development, ML developers want to try multiple algorithms, tools and parameters to get the best results, and they need to track this information to reproduce work. In addition, developers need to use many distinct systems to productionize models. To address these problems, many companies are building custom “ML platforms” that automate this lifecycle, but even these platforms are limited to a few supported algorithms and to each company’s internal infrastructure.
In this talk, we will present MLflow, a new open source project from Databricks that aims to design an open ML platform where organizations can use any ML library and development tool of their choice to reliably build and share ML applications. MLflow introduces simple abstractions to package reproducible projects, track results, and encapsulate models that can be used with many existing tools, accelerating the ML lifecycle for organizations of any size.
MLOps – Applying DevOps to Competitive AdvantageDATAVERSITY
MLOps is a practice for collaboration between Data Science and operations to manage the production machine learning (ML) lifecycles. As an amalgamation of “machine learning” and “operations,” MLOps applies DevOps principles to ML delivery, enabling the delivery of ML-based innovation at scale to result in:
Faster time to market of ML-based solutions
More rapid rate of experimentation, driving innovation
Assurance of quality, trustworthiness, and ethical AI
MLOps is essential for scaling ML. Without it, enterprises risk struggling with costly overhead and stalled progress. Several vendors have emerged with offerings to support MLOps: the major offerings are Microsoft Azure ML and Google Vertex AI. We looked at these offerings from the perspective of enterprise features and time-to-value.
Databricks for MLOps Presentation (AI/ML)Knoldus Inc.
In this session, we will be introducing how we can utilize Databricks to achieve MLflow in Machine learning. The main highlight for this session will be featured in machine learning like MLflow with Databricks for every experiment tracking, how we can do model packaging, and how we can deploy the model of machine learning in Databricks.
Software engineering practices for the data science and machine learning life...DataWorks Summit
With the advent of newer frameworks and toolkits, data scientists are now more productive than ever and starting to prove indispensable to enterprises. Typical organizations have large teams of data scientists who build out key analytics assets that are used on a daily basis and an integral part of live transactions. However, there is also quite a lot of chaos and complexities that get introduced because of the state of the industry. Many packages used by data scientists are from open source, and even if they are well curated, there is a growing tendency to pick out the cutting-edge or unstable packages and frameworks to accelerate analytics. Different data scientists may use different versions of runtimes, different Python or R versions, or even different versions of the same packages. Predominantly data scientists work on their laptops and it becomes difficult to reproduce their environments for use by others. Since data science is now a team sport across multiple personas, involving non-practitioners, traditional application developers, execs, and IT operators, how does an enterprise create a platform for productive cross-role collaboration?
Enterprises need a very reliable and repeatable process, especially when it results in something that affects their production environments. They also require a well managed approach that enables the graduation of an asset from development through a testing and staging process to production. Given the pace of businesses nowadays, the process needs to be quite agile and flexible too—even enabling an easy path to reversing a change. Compliance and audit processes require clear lineage and history as well as approval chains.
In the traditional software engineering world, this lifecycle has been well understood and best practices have been followed for ages. But what does it mean when you have non-programmers or users who are not really trained in software engineering philosophies or who perceive all of this as "big process" roadblocks in their daily work ? How do you we engage them in a productive manner and yet support enterprise requirements for reliability, tracking, and a clear continuous integration and delivery practice? The presenters, in this session, will bring up interesting techniques based on their user research, real life customer interviews, and productized best practices. The presenters also invite the audience to share their stories and best practices to make this a lively conversation.
Speaker
Sriram Srinivasan, Senior Technical Staff Member, Analytics Platform Architect, IBM
For this talk, I will be discussing about various approaches to accelerate deep learning solutions from notebooks or research environment to production environment and how these solutions can be transformed as an enterprise level end to end Deep Learning Solution, which can be consumed as a service by any software application, with a practical use-case example.
FlorenceAI: Reinventing Data Science at HumanaDatabricks
Humana strives to help the communities we serve and our individual members achieve their best health – no small task in the past year! We had the opportunity to rethink our existing operations and reimagine what a collaborative ML platform for hundreds of data scientists might look like. The primary goal of our ML Platform, named FlorenceAI, is to automate and accelerate the delivery lifecycle of data science solutions at scale. In this presentation, we will walk through an end-to-end example of how to build a model at scale on FlorenceAI and deploy it to production. Tools highlighted include Azure Databricks, MLFlow, AppInsights, and Azure Data Factory.
We will employ slides, notebooks and code snippets covering problem framing and design, initial feature selection, model design and experimentation, and a framework of centralized production code to streamline implementation. Hundreds of data scientists now use our feature store that has tens of thousands of features refreshed in daily and monthly cadences across several years of historical data. We already have dozens of models in production and also daily provide fresh insights for our Enterprise Clinical Operating Model. Each day, billions of rows of data are generated to give us timely information.
We already have examples of teams operating orders of magnitude faster and at a scale not within reach using fixed on-premise resources. Given rapid adoption from a dozen pilot users to over 100 MAU in the first 5 months, we will also share some anecodotes about key early wins created by the platform. We want FlorenceAI to enable Humana’s data scientists to focus their efforts where they add the most value so we can continue to deliver high-quality solutions that remain fresh, relevant and fair in an ever changing world.
Agile, Automated, Aware: How to Model for SuccessInside Analysis
The Briefing Room with David Loshin and Embarcadero
Live Webcast October 27, 2015
Watch the archive: https://bloorgroup.webex.com/bloorgroup/onstage/g.php?MTID=eea9877b71c653c499c809c5693eae8fe
Data management teams face some tough challenges these days. Organizations need business-driven visibility that enables understanding and awareness of enterprise data assets – without worrying about definitions and change management. But with information architectures evolving into a hybrid mix of data objects and data services built over relational databases as well as big data stores, serving up accurately defined, reusable data can become a complex issue.
Register for this episode of The Briefing Room to learn from veteran Analyst David Loshin as he explains the importance of agile, automated workflows in today’s enterprise. He’ll be briefed by Ron Huizenga of Embarcadero, who will discuss how his company’s ER/Studio suite approaches data modeling and management from a modern architecture standpoint. He will explain that unifying the way information is represented can not only eliminate the need for costly workarounds, but also foster collaboration between data architects, developers and business users.
Visit InsideAnalysis.com for more information.
DutchMLSchool. ML for Energy Trading and Automotive SectorBigML, Inc
Machine Learning for Energy Trading, Automotive Sector, and Logistics, presented by BigML's Partners A1 Digital.
Main Conference: Introduction to Machine Learning.
DutchMLSchool: 1st edition of the Machine Learning Summer School in The Netherlands.
Building Data Science into Organizations: Field ExperienceDatabricks
We will share our experiences in building Data Science and Machine Learning (DS/ML) into organizations. As new DS/ML teams are created, many wrestle with questions such as: How can we most efficiently achieve short-term goals while planning for scale and production long-term? How should DS/ML be incorporated into a company?
We will bring unique perspectives: one as a previous Databricks customer leading a DS team, one as the second ML engineer at Databricks, and both as current Solutions Architects guiding customers through their DS/ML journeys.We will cover best practices through the crawl-walk-run journey of DS/ML: how to immediately become more productive with an initial team, how to scale and move towards production when needed, and how to integrate effectively with the broader organization.
This talk is meant for technical leaders who are building new DS/ML teams or helping to spread DS/ML practices across their organizations. Technology discussion will focus on Databricks, but the lessons apply to any tech platforms in this space.
Bridging the Gap: from Data Science to ProductionFlorian Wilhelm
A recent but quite common observation in industry is that although there is an overall high adoption of data science, many companies struggle to get it into production. Huge teams of well-payed data scientists often present one fancy model after the other to their managers but their proof of concepts never manifest into something business relevant. The frustration grows on both sides, managers and data scientists.
In my talk I elaborate on the many reasons why data science to production is such a hard nut to crack. I start with a taxonomy of data use cases in order to easier assess technical requirements. Based thereon, my focus lies on overcoming the two-language-problem which is Python/R loved by data scientists vs. the enterprise-established Java/Scala. From my project experiences I present three different solutions, namely 1) migrating to a single language, 2) reimplementation and 3) usage of a framework. The advantages and disadvantages of each approach is presented and general advices based on the introduced taxonomy is given.
Additionally, my talk also addresses organisational as well as problems in quality assurance and deployment. Best practices and further references are presented on a high-level in order to cover all facets of data science to production.
With my talk I hope to convey the message that breakdowns on the road from data science to production are rather the rule than the exception, so you are not alone. At the end of my talk, you will have a better understanding of why your team and you are struggling and what to do about it.
Building a MLOps Platform Around MLflow to Enable Model Productionalization i...Databricks
Getting machine learning models to production is notoriously difficult: it involves multiple teams (data scientists, data and machine learning engineers, operations, …), who often does not speak to each other very well; the model can be trained in one environment but then productionalized in completely different environment; it is not just about the code, but also about the data (features) and the model itself… At DataSentics, as a machine learning and cloud engineering studio, we see this struggle firsthand – on our internal projects and client’s projects as well.
Managers guide to effective building of machine learning productsGianmario Spacagna
Part 1/2 (Managers)
Data and Machine Learning (ML) technologies are now widespread and adopted by literally all industries. Although recent advancements in the field have reached an unthinkable level of maturity, many organizations still struggle with turning these advances into tangible profits. Unfortunately, many ML projects get stuck in a proof-of-concept stage without ever reaching customers and generating revenue. In order to effectively adopt ML technologies, enterprises need to build the right business cases as well as to be ready to face the inevitable challenges. In this talk, we will share common pitfalls, lessons learned, and best practices, while building different enterprise products. In particular, we will focus on the generic use case of ML as the core technology enabling customer-facing products regardless of the specific industry or application.
You will:
Understand if ML is the right solution for your business and set the right expectations;
Deal with the additional uncertainty of ML projects with respect to traditional software;
Build a balanced ML team and cover the broad spectrum of skills;
Know how to apply the scientific workflow in an agile development framework;
Learn how to turn research into production systems including engineering practices and tools;
Be able to leverage modern cloud and serverless architecture for scalable, autonomous and cheaper deployments.
It is a fascinating, explosive time for enterprise analytics.
It is from the position of analytics leadership that the mission will be executed and company leadership will emerge. The data professional is absolutely sitting on the performance of the company in this information economy and has an obligation to demonstrate the possibilities and originate the architecture, data, and projects that will deliver analytics. After all, no matter what business you’re in, you’re in the business of analytics.
The coming years will be full of big changes in enterprise analytics and Data Architecture. William will kick off the fourth year of the Advanced Analytics series with a discussion of the trends winning organizations should build into their plans, expectations, vision, and awareness now.
The Complexity to "Yes" in Analytics Software and the Possibilities with Dock...Docker, Inc.
When you have to say ""YES"" to everything, what does that really mean?
The Boston Consulting Group (BCG) is a global management consulting firm that advises industry leading companies on value creation strategies, innovation, transformation, supply chain management and much more. BCG provides recommendations and analytics that are custom tailored to the needs of each client. To do this BCG is transforming what used to be presentations and manual models into software that is prototyped and distributed to their client to run on their infrastructure - in effect changing the way BCG delivers value.
However the productization process poses unique challenges and opportunities. Their clients have every type of infrastructure and application stack within their IT environments, how can BCG ensure that the custom built analytics applications will operate well at scale in their client's production environment - especially when it is an environment that they don't control? The bespoke nature of the BCG business led the team to embark on an engineering led journey to containerization with Docker. Attend this session to learn more about the approach, challenges and how BCG is enabling transformation with Docker Enterprise Edition.
Similar to Experimentation to Industrialization: Implementing MLOps (20)
Data Lakehouse Symposium | Day 1 | Part 1Databricks
The world of data architecture began with applications. Next came data warehouses. Then text was organized into a data warehouse.
Then one day the world discovered a whole new kind of data that was being generated by organizations. The world found that machines generated data that could be transformed into valuable insights. This was the origin of what is today called the data lakehouse. The evolution of data architecture continues today.
Come listen to industry experts describe this transformation of ordinary data into a data architecture that is invaluable to business. Simply put, organizations that take data architecture seriously are going to be at the forefront of business tomorrow.
This is an educational event.
Several of the authors of the book Building the Data Lakehouse will be presenting at this symposium.
Data Lakehouse Symposium | Day 1 | Part 2Databricks
The world of data architecture began with applications. Next came data warehouses. Then text was organized into a data warehouse.
Then one day the world discovered a whole new kind of data that was being generated by organizations. The world found that machines generated data that could be transformed into valuable insights. This was the origin of what is today called the data lakehouse. The evolution of data architecture continues today.
Come listen to industry experts describe this transformation of ordinary data into a data architecture that is invaluable to business. Simply put, organizations that take data architecture seriously are going to be at the forefront of business tomorrow.
This is an educational event.
Several of the authors of the book Building the Data Lakehouse will be presenting at this symposium.
The world of data architecture began with applications. Next came data warehouses. Then text was organized into a data warehouse.
Then one day the world discovered a whole new kind of data that was being generated by organizations. The world found that machines generated data that could be transformed into valuable insights. This was the origin of what is today called the data lakehouse. The evolution of data architecture continues today.
Come listen to industry experts describe this transformation of ordinary data into a data architecture that is invaluable to business. Simply put, organizations that take data architecture seriously are going to be at the forefront of business tomorrow.
This is an educational event.
Several of the authors of the book Building the Data Lakehouse will be presenting at this symposium.
The world of data architecture began with applications. Next came data warehouses. Then text was organized into a data warehouse.
Then one day the world discovered a whole new kind of data that was being generated by organizations. The world found that machines generated data that could be transformed into valuable insights. This was the origin of what is today called the data lakehouse. The evolution of data architecture continues today.
Come listen to industry experts describe this transformation of ordinary data into a data architecture that is invaluable to business. Simply put, organizations that take data architecture seriously are going to be at the forefront of business tomorrow.
This is an educational event.
Several of the authors of the book Building the Data Lakehouse will be presenting at this symposium.
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
In this session, learn how to quickly supplement your on-premises Hadoop environment with a simple, open, and collaborative cloud architecture that enables you to generate greater value with scaled application of analytics and AI on all your data. You will also learn five critical steps for a successful migration to the Databricks Lakehouse Platform along with the resources available to help you begin to re-skill your data teams.
Democratizing Data Quality Through a Centralized PlatformDatabricks
Bad data leads to bad decisions and broken customer experiences. Organizations depend on complete and accurate data to power their business, maintain efficiency, and uphold customer trust. With thousands of datasets and pipelines running, how do we ensure that all data meets quality standards, and that expectations are clear between producers and consumers? Investing in shared, flexible components and practices for monitoring data health is crucial for a complex data organization to rapidly and effectively scale.
At Zillow, we built a centralized platform to meet our data quality needs across stakeholders. The platform is accessible to engineers, scientists, and analysts, and seamlessly integrates with existing data pipelines and data discovery tools. In this presentation, we will provide an overview of our platform’s capabilities, including:
Giving producers and consumers the ability to define and view data quality expectations using a self-service onboarding portal
Performing data quality validations using libraries built to work with spark
Dynamically generating pipelines that can be abstracted away from users
Flagging data that doesn’t meet quality standards at the earliest stage and giving producers the opportunity to resolve issues before use by downstream consumers
Exposing data quality metrics alongside each dataset to provide producers and consumers with a comprehensive picture of health over time
Learn to Use Databricks for Data ScienceDatabricks
Data scientists face numerous challenges throughout the data science workflow that hinder productivity. As organizations continue to become more data-driven, a collaborative environment is more critical than ever — one that provides easier access and visibility into the data, reports and dashboards built against the data, reproducibility, and insights uncovered within the data.. Join us to hear how Databricks’ open and collaborative platform simplifies data science by enabling you to run all types of analytics workloads, from data preparation to exploratory analysis and predictive analytics, at scale — all on one unified platform.
Why APM Is Not the Same As ML MonitoringDatabricks
Application performance monitoring (APM) has become the cornerstone of software engineering allowing engineering teams to quickly identify and remedy production issues. However, as the world moves to intelligent software applications that are built using machine learning, traditional APM quickly becomes insufficient to identify and remedy production issues encountered in these modern software applications.
As a lead software engineer at NewRelic, my team built high-performance monitoring systems including Insights, Mobile, and SixthSense. As I transitioned to building ML Monitoring software, I found the architectural principles and design choices underlying APM to not be a good fit for this brand new world. In fact, blindly following APM designs led us down paths that would have been better left unexplored.
In this talk, I draw upon my (and my team’s) experience building an ML Monitoring system from the ground up and deploying it on customer workloads running large-scale ML training with Spark as well as real-time inference systems. I will highlight how the key principles and architectural choices of APM don’t apply to ML monitoring. You’ll learn why, understand what ML Monitoring can successfully borrow from APM, and hear what is required to build a scalable, robust ML Monitoring architecture.
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixDatabricks
Autonomy and ownership are core to working at Stitch Fix, particularly on the Algorithms team. We enable data scientists to deploy and operate their models independently, with minimal need for handoffs or gatekeeping. By writing a simple function and calling out to an intuitive API, data scientists can harness a suite of platform-provided tooling meant to make ML operations easy. In this talk, we will dive into the abstractions the Data Platform team has built to enable this. We will go over the interface data scientists use to specify a model and what that hooks into, including online deployment, batch execution on Spark, and metrics tracking and visualization.
Stage Level Scheduling Improving Big Data and AI IntegrationDatabricks
In this talk, I will dive into the stage level scheduling feature added to Apache Spark 3.1. Stage level scheduling extends upon Project Hydrogen by improving big data ETL and AI integration and also enables multiple other use cases. It is beneficial any time the user wants to change container resources between stages in a single Apache Spark application, whether those resources are CPU, Memory or GPUs. One of the most popular use cases is enabling end-to-end scalable Deep Learning and AI to efficiently use GPU resources. In this type of use case, users read from a distributed file system, do data manipulation and filtering to get the data into a format that the Deep Learning algorithm needs for training or inference and then sends the data into a Deep Learning algorithm. Using stage level scheduling combined with accelerator aware scheduling enables users to seamlessly go from ETL to Deep Learning running on the GPU by adjusting the container requirements for different stages in Spark within the same application. This makes writing these applications easier and can help with hardware utilization and costs.
There are other ETL use cases where users want to change CPU and memory resources between stages, for instance there is data skew or perhaps the data size is much larger in certain stages of the application. In this talk, I will go over the feature details, cluster requirements, the API and use cases. I will demo how the stage level scheduling API can be used by Horovod to seamlessly go from data preparation to training using the Tensorflow Keras API using GPUs.
The talk will also touch on other new Apache Spark 3.1 functionality, such as pluggable caching, which can be used to enable faster dataframe access when operating from GPUs.
Simplify Data Conversion from Spark to TensorFlow and PyTorchDatabricks
In this talk, I would like to introduce an open-source tool built by our team that simplifies the data conversion from Apache Spark to deep learning frameworks.
Imagine you have a large dataset, say 20 GBs, and you want to use it to train a TensorFlow model. Before feeding the data to the model, you need to clean and preprocess your data using Spark. Now you have your dataset in a Spark DataFrame. When it comes to the training part, you may have the problem: How can I convert my Spark DataFrame to some format recognized by my TensorFlow model?
The existing data conversion process can be tedious. For example, to convert an Apache Spark DataFrame to a TensorFlow Dataset file format, you need to either save the Apache Spark DataFrame on a distributed filesystem in parquet format and load the converted data with third-party tools such as Petastorm, or save it directly in TFRecord files with spark-tensorflow-connector and load it back using TFRecordDataset. Both approaches take more than 20 lines of code to manage the intermediate data files, rely on different parsing syntax, and require extra attention for handling vector columns in the Spark DataFrames. In short, all these engineering frictions greatly reduced the data scientists’ productivity.
The Databricks Machine Learning team contributed a new Spark Dataset Converter API to Petastorm to simplify these tedious data conversion process steps. With the new API, it takes a few lines of code to convert a Spark DataFrame to a TensorFlow Dataset or a PyTorch DataLoader with default parameters.
In the talk, I will use an example to show how to use the Spark Dataset Converter to train a Tensorflow model and how simple it is to go from single-node training to distributed training on Databricks.
Scaling your Data Pipelines with Apache Spark on KubernetesDatabricks
There is no doubt Kubernetes has emerged as the next generation of cloud native infrastructure to support a wide variety of distributed workloads. Apache Spark has evolved to run both Machine Learning and large scale analytics workloads. There is growing interest in running Apache Spark natively on Kubernetes. By combining the flexibility of Kubernetes and scalable data processing with Apache Spark, you can run any data and machine pipelines on this infrastructure while effectively utilizing resources at disposal.
In this talk, Rajesh Thallam and Sougata Biswas will share how to effectively run your Apache Spark applications on Google Kubernetes Engine (GKE) and Google Cloud Dataproc, orchestrate the data and machine learning pipelines with managed Apache Airflow on GKE (Google Cloud Composer). Following topics will be covered: – Understanding key traits of Apache Spark on Kubernetes- Things to know when running Apache Spark on Kubernetes such as autoscaling- Demonstrate running analytics pipelines on Apache Spark orchestrated with Apache Airflow on Kubernetes cluster.
Scaling and Unifying SciKit Learn and Apache Spark PipelinesDatabricks
Pipelines have become ubiquitous, as the need for stringing multiple functions to compose applications has gained adoption and popularity. Common pipeline abstractions such as “fit” and “transform” are even shared across divergent platforms such as Python Scikit-Learn and Apache Spark.
Scaling pipelines at the level of simple functions is desirable for many AI applications, however is not directly supported by Ray’s parallelism primitives. In this talk, Raghu will describe a pipeline abstraction that takes advantage of Ray’s compute model to efficiently scale arbitrarily complex pipeline workflows. He will demonstrate how this abstraction cleanly unifies pipeline workflows across multiple platforms such as Scikit-Learn and Spark, and achieves nearly optimal scale-out parallelism on pipelined computations.
Attendees will learn how pipelined workflows can be mapped to Ray’s compute model and how they can both unify and accelerate their pipelines with Ray.
Sawtooth Windows for Feature AggregationsDatabricks
In this talk about zipline, we will introduce a new type of windowing construct called a sawtooth window. We will describe various properties about sawtooth windows that we utilize to achieve online-offline consistency, while still maintaining high-throughput, low-read latency and tunable write latency for serving machine learning features.We will also talk about a simple deployment strategy for correcting feature drift – due operations that are not “abelian groups”, that operate over change data.
We want to present multiple anti patterns utilizing Redis in unconventional ways to get the maximum out of Apache Spark.All examples presented are tried and tested in production at Scale at Adobe. The most common integration is spark-redis which interfaces with Redis as a Dataframe backing Store or as an upstream for Structured Streaming. We deviate from the common use cases to explore where Redis can plug gaps while scaling out high throughput applications in Spark.
Niche 1 : Long Running Spark Batch Job – Dispatch New Jobs by polling a Redis Queue
· Why?
o Custom queries on top a table; We load the data once and query N times
· Why not Structured Streaming
· Working Solution using Redis
Niche 2 : Distributed Counters
· Problems with Spark Accumulators
· Utilize Redis Hashes as distributed counters
· Precautions for retries and speculative execution
· Pipelining to improve performance
Re-imagine Data Monitoring with whylogs and SparkDatabricks
In the era of microservices, decentralized ML architectures and complex data pipelines, data quality has become a bigger challenge than ever. When data is involved in complex business processes and decisions, bad data can, and will, affect the bottom line. As a result, ensuring data quality across the entire ML pipeline is both costly, and cumbersome while data monitoring is often fragmented and performed ad hoc. To address these challenges, we built whylogs, an open source standard for data logging. It is a lightweight data profiling library that enables end-to-end data profiling across the entire software stack. The library implements a language and platform agnostic approach to data quality and data monitoring. It can work with different modes of data operations, including streaming, batch and IoT data.
In this talk, we will provide an overview of the whylogs architecture, including its lightweight statistical data collection approach and various integrations. We will demonstrate how the whylogs integration with Apache Spark achieves large scale data profiling, and we will show how users can apply this integration into existing data and ML pipelines.
Raven: End-to-end Optimization of ML Prediction QueriesDatabricks
Machine learning (ML) models are typically part of prediction queries that consist of a data processing part (e.g., for joining, filtering, cleaning, featurization) and an ML part invoking one or more trained models. In this presentation, we identify significant and unexplored opportunities for optimization. To the best of our knowledge, this is the first effort to look at prediction queries holistically, optimizing across both the ML and SQL components.
We will present Raven, an end-to-end optimizer for prediction queries. Raven relies on a unified intermediate representation that captures both data processing and ML operators in a single graph structure.
This allows us to introduce optimization rules that
(i) reduce unnecessary computations by passing information between the data processing and ML operators
(ii) leverage operator transformations (e.g., turning a decision tree to a SQL expression or an equivalent neural network) to map operators to the right execution engine, and
(iii) integrate compiler techniques to take advantage of the most efficient hardware backend (e.g., CPU, GPU) for each operator.
We have implemented Raven as an extension to Spark’s Catalyst optimizer to enable the optimization of SparkSQL prediction queries. Our implementation also allows the optimization of prediction queries in SQL Server. As we will show, Raven is capable of improving prediction query performance on Apache Spark and SQL Server by up to 13.1x and 330x, respectively. For complex models, where GPU acceleration is beneficial, Raven provides up to 8x speedup compared to state-of-the-art systems. As part of the presentation, we will also give a demo showcasing Raven in action.
Processing Large Datasets for ADAS Applications using Apache SparkDatabricks
Semantic segmentation is the classification of every pixel in an image/video. The segmentation partitions a digital image into multiple objects to simplify/change the representation of the image into something that is more meaningful and easier to analyze [1][2]. The technique has a wide variety of applications ranging from perception in autonomous driving scenarios to cancer cell segmentation for medical diagnosis.
Exponential growth in the datasets that require such segmentation is driven by improvements in the accuracy and quality of the sensors generating the data extending to 3D point cloud data. This growth is further compounded by exponential advances in cloud technologies enabling the storage and compute available for such applications. The need for semantically segmented datasets is a key requirement to improve the accuracy of inference engines that are built upon them.
Streamlining the accuracy and efficiency of these systems directly affects the value of the business outcome for organizations that are developing such functionalities as a part of their AI strategy.
This presentation details workflows for labeling, preprocessing, modeling, and evaluating performance/accuracy. Scientists and engineers leverage domain-specific features/tools that support the entire workflow from labeling the ground truth, handling data from a wide variety of sources/formats, developing models and finally deploying these models. Users can scale their deployments optimally on GPU-based cloud infrastructure to build accelerated training and inference pipelines while working with big datasets. These environments are optimized for engineers to develop such functionality with ease and then scale against large datasets with Spark-based clusters on the cloud.
Massive Data Processing in Adobe Using Delta LakeDatabricks
At Adobe Experience Platform, we ingest TBs of data every day and manage PBs of data for our customers as part of the Unified Profile Offering. At the heart of this is a bunch of complex ingestion of a mix of normalized and denormalized data with various linkage scenarios power by a central Identity Linking Graph. This helps power various marketing scenarios that are activated in multiple platforms and channels like email, advertisements etc. We will go over how we built a cost effective and scalable data pipeline using Apache Spark and Delta Lake and share our experiences.
What are we storing?
Multi Source – Multi Channel Problem
Data Representation and Nested Schema Evolution
Performance Trade Offs with Various formats
Go over anti-patterns used
(String FTW)
Data Manipulation using UDFs
Writer Worries and How to Wipe them Away
Staging Tables FTW
Datalake Replication Lag Tracking
Performance Time!
Opendatabay - Open Data Marketplace.pptxOpendatabay
Opendatabay.com unlocks the power of data for everyone. Open Data Marketplace fosters a collaborative hub for data enthusiasts to explore, share, and contribute to a vast collection of datasets.
First ever open hub for data enthusiasts to collaborate and innovate. A platform to explore, share, and contribute to a vast collection of datasets. Through robust quality control and innovative technologies like blockchain verification, opendatabay ensures the authenticity and reliability of datasets, empowering users to make data-driven decisions with confidence. Leverage cutting-edge AI technologies to enhance the data exploration, analysis, and discovery experience.
From intelligent search and recommendations to automated data productisation and quotation, Opendatabay AI-driven features streamline the data workflow. Finding the data you need shouldn't be a complex. Opendatabay simplifies the data acquisition process with an intuitive interface and robust search tools. Effortlessly explore, discover, and access the data you need, allowing you to focus on extracting valuable insights. Opendatabay breaks new ground with a dedicated, AI-generated, synthetic datasets.
Leverage these privacy-preserving datasets for training and testing AI models without compromising sensitive information. Opendatabay prioritizes transparency by providing detailed metadata, provenance information, and usage guidelines for each dataset, ensuring users have a comprehensive understanding of the data they're working with. By leveraging a powerful combination of distributed ledger technology and rigorous third-party audits Opendatabay ensures the authenticity and reliability of every dataset. Security is at the core of Opendatabay. Marketplace implements stringent security measures, including encryption, access controls, and regular vulnerability assessments, to safeguard your data and protect your privacy.
Show drafts
volume_up
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos by creating a distributed data ownership model with clear ownership and responsibilities.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues.
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...John Andrews
SlideShare Description for "Chatty Kathy - UNC Bootcamp Final Project Presentation"
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation for ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. Slowdown on the GPU is likely caused by a large submission of small workloads, and expected to be non-issue when the computation is performed on massive graphs.
StarCompliance is a leading firm specializing in the recovery of stolen cryptocurrency. Our comprehensive services are designed to assist individuals and organizations in navigating the complex process of fraud reporting, investigation, and fund recovery. We combine cutting-edge technology with expert legal support to provide a robust solution for victims of crypto theft.
Our Services Include:
Reporting to Tracking Authorities:
We immediately notify all relevant centralized exchanges (CEX), decentralized exchanges (DEX), and wallet providers about the stolen cryptocurrency. This ensures that the stolen assets are flagged as scam transactions, making it impossible for the thief to use them.
Assistance with Filing Police Reports:
We guide you through the process of filing a valid police report. Our support team provides detailed instructions on which police department to contact and helps you complete the necessary paperwork within the critical 72-hour window.
Launching the Refund Process:
Our team of experienced lawyers can initiate lawsuits on your behalf and represent you in various jurisdictions around the world. They work diligently to recover your stolen funds and ensure that justice is served.
At StarCompliance, we understand the urgency and stress involved in dealing with cryptocurrency theft. Our dedicated team works quickly and efficiently to provide you with the support and expertise needed to recover your assets. Trust us to be your partner in navigating the complexities of the crypto world and safeguarding your investments.
2. About Us
Deb Lee
SENIOR CONSULTANT
MLOps Practice Lead
deb.lee@thorogood.com
Al McEwan
PRINCIPAL CONSULTANT
Solutions Architect, Databricks Champion, Global
Head of Capability Development
al.mcewan@thorogood.com
Independent, Specialist Data & AI Consultancy
US • UK • Singapore • Brazil • India
Databricks Partner Since 2018
www.thorogood.com
Data Science MLOps & DevOps
Data Engineering Data Visualization
5. Thinking must shift to embrace operationalization
• Sandbox environments
• Ad Hoc, Exploratory
• Low Commitment
• Familiar tool for the data
scientist
• Done locally or in non-
integrated environments
EXPERIMENTAL
• Automated
• Integrated
• Reusable
• Scalable
• Understood and trusted
• Cost efficient
• Ongoing experiments
OPERATIONAL
• Sandbox environments
• Ad Hoc, Exploratory
• Low Commitment
• Familiar tool for the data
scientist
• Done locally or in non-
integrated environments
EXPERIMENTAL
• Automated
• Integrated
• Reusable
• Scalable
• Understood and trusted
• Cost efficient
• Ongoing experiments
OPERATIONAL
• Sandbox environments
• Ad Hoc, Exploratory
• Low Commitment
• Familiar tool for the data
scientist
• Done locally or in non-
integrated environments
EXPERIMENTAL
• Automated
• Integrated
• Reusable
• Scalable
• Understood and trusted
• Cost efficient
• Ongoing experiments
OPERATIONAL
MLOps
6. Key Benefits of MLOps
SCALABILITY
Ability to scale horizontally and vertically,
consumption efficiencies from running data
engineering and data science at-scale
MODEL EVALUATION
Maintain and monitor model quality using standardized &
consolidated custom KPIs and model evaluation metrics
FAST FEEDBACK LOOP
Respond to business opportunities and changes
quickly, incorporate enhancements to product on
regular basis
REUSABLE ASSETS
Track, monitor, and identify reusable assets
(registered models, datasets, pipelines) to
increase efficiency & cost savings
MODEL TRACEABILITY
Create traceability & wider auditability using enterprise
model registries, experiment tracking, and monitoring
operations for greater observability
AUTOMATED MODEL TRAINING
Decrease manual dependencies using pipelines
configured to kick off automated retraining based on
defined triggers
REPRODUCIBILITY
Save time & create governance for product teams
by using tools that enable reproducibility of
experiments and model training
VERSION SECURITY & COMPATABILITY
Maintain security by using licensed packages on
tested versions, keep OS versions of clusters up to
date, keep all libraries and packages up to date
8. Establishing a Global MLOps Framework
Customer situation
In order to stay ahead, the customer recognized that a global coordinated
strategy and framework was needed to realize the benefits of MLOps
Investment in experimentation that has proven
valuable
Data science teams work in focused business
areas, following independent practices
Fortune Global 500
Consumer Goods
Company
• 190 countries
• 2.5 billion+ consumers
daily
• 400 brands
9. Establishing a Global MLOps Framework
Thorogood’s approach
Experimentation
ML models
operationalized
MLOps guidance,
recommendations &
artefacts, project-tested
Creation of reusable
Code & Pipeline
Accelerator templates
10. Establishing a Global MLOps Framework
Framework impact
REUSABILITY
As more products are onboarded, a central
function will improve reusability of existing assets
and help consolidate models and approaches
used across products.
TIME & COST SAVINGS
Reduce duplicative effort & apply responsible
cloud consumption principles to all projects,
receive cost efficiencies from consolidation of
operations.
SIMPLIFICATION
A centralized function will maintain
adherence to MLOps suggested standards to
simplify toolsets used and improve ways of
working for all teams.
CONTINUOUS IMPROVEMENT
The MLOps service will have dedicated teams
for ongoing operations and one-off activities
such as product enhancements &
industrialization efforts.
SCALABILITY
Enable data science projects to scale up
more quickly, rapidly realize a vision to
unlock business value using data science
in all areas of the organization.
RELIABILITY
Build greater trust and confidence from business
users and data science teams by allowing them
to realize the value of MLOps delivered using a
consistent and high-quality methodology
Customer’s
Global MLOps
Service
13. People
Real-world ML Systems
Reference: “Hidden Technical Debt in Machine Learning Systems” by D. Sculley, Gary Holt, Daniel Golovin, Eugene Davydov, Todd Phillips 2015
Configuration
Data Collection
Feature Extraction
ML
Code
Data Verification Machine
Resource
Management
Analysis Tools
Process Management Tools
Serving
Infrastructure
Monitoring
14. People
MLOps Requires Data Scientists who understand both Scale and Reproducibility
ML Code: could be relatively small, but key to success
Data Scientist skillset specialization
Training in making code scalable, efficient and reproducible
15. People
Blend of capabilities and skills needed depends on the engagement
Machine Learning Engineering
Data Science
Data Engineering
Data Visualization
Solution Architect
Program Management
Scenario 1
Operationalization of a
use case requiring:
• Real-time model
serving capabilities
• Web application
interface and backend
• Creation of data
engineering and data
science pipelines
• Scripted management
and versioning of
compute, datastore,
datasets, pipelines
Scenario 2
Continuous improvements
to baseline monitoring
operations requiring:
• Create automatically
refreshed monitoring
dashboards
• Enhance tracking of
and reporting on drift
and other scoring
metrics alongside
experiment tracking
• Design for various
target audiences: data
scientists, ML support
engineers, business
users
16. Processes
Key Takeaways & Learnings – Artefacts Created
There are a number of moving parts and handshakes needed for a centralized MLOps service to function and teams to be in sync.
Without a defined framework and process, it’s hard to be successful.
QUESTIONNAIRE
Used to qualify use
cases & projects in the
pipeline for
onboarding to MLOps
service
ML TEST SCORE
Measures the overall
readiness of the ML
system for production
DECISION TREE
For anyone embarking on a
data science project, guide on
tools to use considering
training volumes, libraries,
serving method,
parallelization, retraining
frequency
PLAYBOOK
Guidelines for
experimentation and
operationalization to
streamline the MLOps
process
REPRODUCIBILITY
CHECKLIST
Requires code versioning,
data versioning, model
versioning in model
registry, cluster
configuration, environment
specification
Reference: “The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction” by E.Breck et al. 2017
17. Tools
Decision trees helping to guide tool selection at critical junctures
How many models are being
built?
A large model spanning the
entire business
One model per dimension (i.e.
per product)
We recommend use of Spark’s
MLLib if model is trained on a
big dataset (>0.5GB)
We recommended use of
Spark’s MLLib if cross-
validation scenarios exist
Non-Spark options can be
considered for smaller training
datasets
Non-Spark options can be
considered for this scenario
Training & Evaluation
Orchestration
Deployment
Tracking
Experimentation Initial Industrialization
Model Monitoring &
Enhancements
Considerations:
Decision Points:
…
…
…
…
…
…
…
…
18. Tools
Databricks is Optimally Positioned to Support MLOps
Databricks Spark: Optimized for
large training data volumes per
model
Best-in-class and widely used for
data science experiments
Multi-Cloud ready:
available on Azure, AWS, and GCP
Unifies requisite data engineering &
data science capabilities with in-built
functions
MLFlow provides a powerful platform
to manage the ML lifecycle
Integrated with serving and reporting
technologies
19. How to get started
Ø Assess your current state
Ø Define your target state
Ø Refine your approach to People, Tools and Processes
Ø Educate yourself on the ‘art of the possible’
• Check out our MLOps Resource Hub for useful content at www.thorogood.com
• Most importantly, please reach out to us with any questions or feedback on this topic
CONTACT US
Deb Lee
deb.lee@thorogood.com
Al McEwan
al.mcewan@thorogood.com