Productionizing ML Model Using MLflow Model Serving Databricks
Productionizing ML models requires ensuring model integrity, efficiently replicating runtime environments across servers, and keeping track of how each model was created. This helps us trace the root cause of changes and issues over time as we acquire new data and update our models. It also gives us greater accountability over our models and the results they generate.
MLflow Model Serving delivers cost-effective, one-click deployment of models for real-time inference. The model version deployed in Model Serving can also be conveniently managed with the MLflow Model Registry. We will cover the following topics: deployment, consumption and monitoring. For deployment, we will demo deploying different model versions and validating the deployment. For consumption, we will demo connecting Power BI and generating a prediction report using an ML model deployed in MLflow serving. Lastly, we will wrap up with managing MLflow serving, including access rights and monitoring capabilities.
Machine learning operations (MLOps) brings data science to the world of DevOps. Data scientists create models on their workstations. MLOps adds automation, validation and monitoring to any environment, including machine learning on Kubernetes. In this session you will hear about the latest developments and see them in action.
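As a rough illustration of what real-time inference against a served model can look like, the sketch below posts rows to a scoring endpoint using only the Python standard library. It assumes an MLflow 2.x style scoring server that accepts the `dataframe_split` JSON layout on an `/invocations` path; the endpoint URL, column names and bearer token are hypothetical placeholders, not details from the talk.

```python
import json
import urllib.request


def build_scoring_payload(columns, rows):
    """Build a JSON request body in the 'dataframe_split' layout."""
    body = {
        "dataframe_split": {
            "columns": list(columns),
            "data": [list(row) for row in rows],
        }
    }
    return json.dumps(body)


def score(endpoint_url, columns, rows, token=None, timeout=10):
    """POST rows to a model endpoint and return the decoded JSON response."""
    headers = {"Content-Type": "application/json"}
    if token:
        # Assumption: the endpoint uses bearer-token authentication.
        headers["Authorization"] = f"Bearer {token}"
    request = urllib.request.Request(
        endpoint_url,
        data=build_scoring_payload(columns, rows).encode("utf-8"),
        headers=headers,
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call against a hypothetical local server:
# score("http://127.0.0.1:5000/invocations",
#       ["sepal_length", "sepal_width"], [[5.1, 3.5]])
```

Keeping the payload builder separate from the HTTP call makes it easy to validate the request shape before pointing the client at a live endpoint.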
Using MLOps to Bring ML to Production/The Promise of MLOps Weaveworks
In this final Weave Online User Group of 2019, David Aronchick asks: have you ever struggled with having different environments to build, train and serve ML models, and how to orchestrate between them? While DevOps and GitOps have gained huge traction in recent years, many customers struggle to apply these practices to ML workloads. This talk will focus on the ways MLOps has helped to effectively infuse AI into production-grade applications through establishing practices around model reproducibility, validation, versioning/tracking, and safe/compliant deployment. We will also talk about the direction for MLOps as an industry, and how we can use it to move faster, with more stability, than ever before.
The recording of this session is on our YouTube Channel here: https://youtu.be/twsxcwgB0ZQ
Speaker: David Aronchick, Head of Open Source ML Strategy, Microsoft
Bio: David leads Open Source Machine Learning Strategy at Azure. This means he spends most of his time helping humans to convince machines to be smarter. He is only moderately successful at this. Previously, David led product management for Kubernetes at Google, launched GKE, and co-founded the Kubeflow project. David has also worked at Microsoft, Amazon and Chef and co-founded three startups.
Sign up for a free Machine Learning Ops Workshop: http://bit.ly/MLOps_Workshop_List
Weaveworks will cover concepts such as GitOps (operations by pull request), Progressive Delivery (canary, A/B, blue-green), and how to apply those approaches to your machine learning operations to mitigate risk.
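To make the canary idea concrete, here is a minimal sketch of deterministic traffic splitting between a stable model and a canary model. It is an illustration only and not how any Weaveworks tooling works; the function name `route_to_canary` and the request-ID scheme are hypothetical.

```python
import hashlib


def route_to_canary(request_id, canary_percent):
    """Return True if this request should be served by the canary model.

    The request ID is hashed into one of 100 buckets, so the same ID is
    always routed the same way and roughly `canary_percent` percent of
    distinct IDs reach the canary.
    """
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < canary_percent


# Example: send about 10% of users to the new model version.
# use_canary = route_to_canary("user-1234", 10)
```

Hashing rather than random sampling keeps a given caller on one model version for the duration of the rollout, which makes differences in behavior easier to attribute.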
How to Utilize MLflow and Kubernetes to Build an Enterprise ML Platform Databricks
In large enterprises, large solutions are sometimes required to tackle even the smallest tasks, and ML is no different. At Comcast we are building a comprehensive, configuration-based, continuously integrated and deployed platform for data pipeline transformations, model development and deployment. This is accomplished using a range of tools and frameworks such as Databricks, MLflow, Apache Spark and others. With a Databricks environment used by hundreds of researchers and petabytes of data, scale is critical to Comcast, so making it all work together in a frictionless experience is a high priority. The platform consists of a number of components: an abstraction for data pipelines and transformation to allow our data scientists the freedom to combine the most appropriate algorithms from different frameworks; experiment tracking, project and model packaging using MLflow; and model serving via the Kubeflow environment on Kubernetes. The architecture, progress and current state of the platform will be discussed, as well as the challenges we had to overcome to make this platform work at Comcast scale. As a machine learning practitioner, you will gain knowledge in: an example of data pipeline abstraction; ways to package and track your ML project and experiments at scale; and how Comcast uses Kubeflow on Kubernetes to bring everything together.
ML development brings many new complexities beyond the traditional software development lifecycle. Unlike in traditional software development, ML developers want to try multiple algorithms, tools and parameters to get the best results, and they need to track this information to reproduce work. In addition, developers need to use many distinct systems to productionize models. To address these problems, many companies are building custom “ML platforms” that automate this lifecycle, but even these platforms are limited to a few supported algorithms and to each company’s internal infrastructure. In this talk, I present MLflow, a new open source project from Databricks that aims to design an open ML platform where organizations can use any ML library and development tool of their choice to reliably build and share ML applications. MLflow introduces simple abstractions to package reproducible projects, track results, and encapsulate models that can be used with many existing tools, accelerating the ML lifecycle for organizations of any size.
We discuss the different ways models can be served with MLflow, covering both open source MLflow and Databricks managed MLflow. We also cover the basic differences between batch scoring and real-time scoring, with special emphasis on the upcoming Databricks production-ready model serving.
Bighead: Airbnb’s End-to-End Machine Learning Platform with Krishna Puttaswa... Databricks
Airbnb has a wide variety of ML problems ranging from models on traditional structured data to models built on unstructured data such as user reviews, messages and listing images. The ability to build, iterate on, and maintain healthy machine learning models is critical to Airbnb’s success. Many ML Platforms cover data collection, feature engineering, training, deploying, productionalization, and monitoring but few, if any, do all of the above seamlessly.
Bighead aims to tie together various open source and in-house projects to remove incidental complexity from ML workflows. Bighead is built on Python and Spark and can be used in modular pieces as each ML problem presents unique challenges. Through standardization of the path to production, training environments and the methods for collecting and transforming data on Spark, each model is reproducible and iterable.
This talk covers the architecture, the problems that each individual component and the overall system aims to solve, and a vision for the future of machine learning infrastructure. It’s widely adopted at Airbnb and we have a variety of models running in production. We have seen the overall model development time go down from many months to days on Bighead. We plan to open source Bighead to allow the wider community to benefit from our work.
Simplifying Model Management with MLflow Databricks
Last summer, Databricks launched MLflow, an open source platform to manage the machine learning lifecycle, including experiment tracking, reproducible runs and model packaging. MLflow has grown quickly since then, with over 120 contributors from dozens of companies, including major contributions from RStudio and Microsoft. It has also gained new capabilities such as automatic logging from TensorFlow and Keras, Kubernetes integrations, and a high-level Java API. In this talk, we’ll cover some of the new features that have come to MLflow, and then focus on a major upcoming feature: model management with the MLflow Model Registry. Many organizations face challenges tracking which models are available in the organization and which ones are in production. The MLflow Model Registry provides a centralized database to keep track of these models, share and describe new model versions, and deploy the latest version of a model through APIs. We’ll demonstrate how these features can simplify common ML lifecycle tasks.
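The registry workflow described above can be sketched roughly as follows. This assumes a reasonably recent MLflow release (alias-style `models:/name@alias` URIs need MLflow 2.3 or later) and a tracking server with a registry backend; the helper names, the `"model"` artifact path and the example model name are hypothetical.

```python
def model_uri(name, version=None, alias=None):
    """Return a `models:/` URI for a registered model version or alias."""
    if (version is None) == (alias is None):
        raise ValueError("pass exactly one of version or alias")
    if alias is not None:
        return f"models:/{name}@{alias}"
    return f"models:/{name}/{version}"


def register_and_load(run_id, name, artifact_path="model"):
    """Register the model logged under a run, then load that version back.

    Assumption: the run logged a model under `artifact_path`.
    """
    # Imported lazily so the URI helper works without MLflow installed.
    import mlflow
    import mlflow.pyfunc

    registered = mlflow.register_model(f"runs:/{run_id}/{artifact_path}", name)
    return mlflow.pyfunc.load_model(model_uri(name, version=registered.version))


# Example (requires a tracking server and an existing run):
# model = register_and_load("<run id>", "churn-classifier")
```

Addressing models by registry URI rather than by file path lets consumers pick up a new version without changing their loading code.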
MLOps (a compound of “machine learning” and “operations”) is a practice for collaboration and communication between data scientists and operations professionals to help manage the production machine learning lifecycle. Similar to the DevOps term in the software development world, MLOps looks to increase automation and improve the quality of production ML while also focusing on business and regulatory requirements. MLOps applies to the entire ML lifecycle - from integrating with model generation (software development lifecycle, continuous integration/continuous delivery), orchestration, and deployment, to health, diagnostics, governance, and business metrics.
To watch the full presentation click here: https://info.cnvrg.io/mlopsformachinelearning
In this webinar, we’ll discuss core practices in MLOps that will help data science teams scale to the enterprise level. You’ll learn the primary functions of MLOps, and what tasks are suggested to accelerate your team's machine learning pipeline. Join us in a discussion with cnvrg.io Solutions Architect, Aaron Schneider, and learn how teams use MLOps for more productive machine learning workflows.
- Reduce friction between science and engineering
- Deploy your models to production faster
- Health, diagnostics and governance of ML models
- Kubernetes as a core platform for MLOps
- Support advanced use-cases like continual learning with MLOps
Intro to Vertex AI, unified MLOps platform for Data Scientists & ML Engineers Daniel Zivkovic
#MLOps is a hot buzzword, just like #DevOps before it. It sparked a gold rush for software vendors, so it's hard to choose the best tool for your needs. Vertex AI is a unified MLOps platform for the entire #AI #workflow on #GoogleCloud. It is the 3rd iteration of the Google Cloud #ML platform (since its original launch), and we think they did it right (this time).
That's why #ServerlessTO invited 2 AI/ML gurus from #GCP (Jarek Kazmierczak & Brian Kang) to introduce #VertexAI to you.
The lecture recording with Q&A is at https://youtu.be/X1S7360ip-k
MEETUP "CODE-ALONG" RESOURCES
Vertex workbench - Managed and User-managed Notebooks
https://cloud.google.com/vertex-ai/docs/workbench/managed/quickstarts
Example that the training code was based on - Fashion MNIST dataset
https://www.tensorflow.org/tutorials/keras/classification
Hyperparameter tuning codelab
https://codelabs.developers.google.com/vertex_hyperparameter_tuning
Vertex pipeline codelabs
https://codelabs.developers.google.com/vertex-pipelines-intro
https://codelabs.developers.google.com/vertex-pipelines-custom-model
CI/CD slides
https://github.com/shivajid/MLOpsCICD/blob/master/presentation/AI%20Workshop%20Day4.pdf
CI/CD github example
https://github.com/shivajid/MLOpsCICD
Model monitoring example
https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/master/notebooks/official/model_monitoring/model_monitoring.ipynb
Best practices for MLOps
https://cloud.google.com/architecture/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning
https://cloud.google.com/resources/mlops-whitepaper
Official Vertex AI Github repository
https://github.com/GoogleCloudPlatform/vertex-ai-samples/
MEETUP CHAT LINKS
https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/master/notebooks/notebook_template.ipynb
https://github.com/GoogleCloudPlatform/vertex-ai-samples/tree/master/notebooks/official/custom
https://github.com/GoogleCloudPlatform/vertex-ai-samples/tree/master/notebooks/community/sdk
https://cloud.google.com/architecture/ml-on-gcp-best-practices#model-deployment-and-serving
https://www.youtube.com/watch?v=ntBEQdD1IeQ&list=PLd31CCJlr9FrZazLqRg1Lxq7xw9b6VNP6&index=3
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl... DataWorks Summit
Specialized tools for machine learning development and model governance are becoming essential. MLflow is an open source platform for managing the machine learning lifecycle. Just by adding a few lines of code in the function or script that trains their model, data scientists can log parameters, metrics, artifacts (plots, miscellaneous files, etc.) and a deployable packaging of the ML model. Every time that function or script is run, the results will be logged automatically as a byproduct of those lines of code being added, even if the party doing the training run makes no special effort to record the results. MLflow application programming interfaces (APIs) are available for the Python, R and Java programming languages, and MLflow sports a language-agnostic REST API as well. Over a relatively short time period, MLflow has garnered more than 3,300 stars on GitHub, almost 500,000 monthly downloads and 80 contributors from more than 40 companies. Most significantly, more than 200 companies are now using MLflow. We will demo the MLflow Tracking, Projects and Models components with Azure Machine Learning (AML) Services and show you how easy it is to get started with MLflow on-prem or in the cloud.
MLflow: Infrastructure for a Complete Machine Learning Life Cycle Databricks
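The "few lines of code" idea can be sketched as below: a small training function plus a wrapper that records a parameter, a metric and an artifact for each run. The toy least-squares model and the function names are hypothetical stand-ins for a real training script; the MLflow calls used (`start_run`, `log_param`, `log_metric`, `log_dict`) are part of the Python tracking API, and with no tracking server configured MLflow writes to a local directory.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept, in pure Python."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    mse = sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / n
    return slope, intercept, mse


def train_with_tracking(xs, ys):
    """Train the toy model and record the run with MLflow Tracking."""
    # Imported lazily so the training function works without MLflow installed.
    import mlflow

    with mlflow.start_run():
        mlflow.log_param("n_samples", len(xs))
        slope, intercept, mse = fit_line(xs, ys)
        mlflow.log_metric("mse", mse)
        # Store the fitted coefficients as a JSON artifact of the run.
        mlflow.log_dict({"slope": slope, "intercept": intercept}, "coefficients.json")
    return slope, intercept, mse


# Example: train_with_tracking([0, 1, 2, 3], [1.1, 2.9, 5.2, 6.8])
```

Because the logging calls sit inside the training wrapper, every invocation leaves a record without any extra effort from whoever runs it.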
ML development brings many new complexities beyond the traditional software development lifecycle. Unlike in traditional software development, ML developers want to try multiple algorithms, tools and parameters to get the best results, and they need to track this information to reproduce work. In addition, developers need to use many distinct systems to productionize models. To address these problems, many companies are building custom “ML platforms” that automate this lifecycle, but even these platforms are limited to a few supported algorithms and to each company’s internal infrastructure.
In this talk, we will present MLflow, a new open source project from Databricks that aims to design an open ML platform where organizations can use any ML library and development tool of their choice to reliably build and share ML applications. MLflow introduces simple abstractions to package reproducible projects, track results, and encapsulate models that can be used with many existing tools, accelerating the ML lifecycle for organizations of any size.
Managing the Complete Machine Learning Lifecycle with MLflow Databricks
ML development brings many new complexities beyond the traditional software development lifecycle. Unlike in traditional software development, ML developers want to try multiple algorithms, tools and parameters to get the best results, and they need to track this information to reproduce work. In addition, developers need to use many distinct systems to productionize models.
To solve these challenges, Databricks unveiled MLflow last year, an open source project that aims to simplify the entire ML lifecycle. MLflow introduces simple abstractions to package reproducible projects, track results, and encapsulate models that can be used with many existing tools, accelerating the ML lifecycle for organizations of any size.
In the past year, the MLflow community has grown quickly: over 120 contributors from over 40 companies have contributed code to the project, and over 200 companies are using MLflow.
In this tutorial, we will show you how using MLflow can help you:
Keep track of experiment runs and results across frameworks.
Execute projects remotely on a Databricks cluster, and quickly reproduce your runs.
Quickly productionize models using Databricks production jobs, Docker containers, Azure ML, or Amazon SageMaker.
We will demo the building blocks of MLflow as well as the most recent additions since the 1.0 release.
What you will learn:
Understand the three main components of open source MLflow (MLflow Tracking, MLflow Projects, MLflow Models) and how each helps address challenges of the ML lifecycle.
How to use MLflow Tracking to record and query experiments: code, data, config, and results.
How to use MLflow Projects packaging format to reproduce runs on any platform.
How to use MLflow Models general format to send models to diverse deployment tools.
Prerequisites:
A fully-charged laptop (8-16GB memory) with Chrome or Firefox
Python 3 and pip pre-installed
Pre-Register for a Databricks Standard Trial
Basic knowledge of Python programming language
Basic understanding of Machine Learning Concepts
In this talk, I present an introduction to MLflow. I also show some examples of using it through MLflow Tracking, MLflow Projects and MLflow Models, with Databricks as an example of remote tracking.
MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ... Databricks
ML development brings many new complexities beyond the traditional software development lifecycle. Unlike in traditional software development, ML developers want to try multiple algorithms, tools, and parameters to get the best results, and they need to track this information to reproduce work. In addition, developers need to use many distinct systems to productionize models. To address these problems, many companies are building custom “ML platforms” that automate this lifecycle, but even these platforms are limited to a few supported algorithms and to each company’s internal infrastructure. In this session, we introduce MLflow, a new open source project from Databricks that aims to design an open ML platform where organizations can use any ML library and development tool of their choice to reliably build and share ML applications. MLflow introduces simple abstractions to package reproducible projects, track results, and encapsulate models that can be used with many existing tools, accelerating the ML lifecycle for organizations of any size. In this deep-dive session, through a complete ML model life-cycle example, you will walk away with:
MLflow concepts and abstractions for models, experiments, and projects
How to get started with MLflow
Understand aspects of MLflow APIs
Using tracking APIs during model training
Using MLflow UI to visually compare and contrast experimental runs with different tuning parameters and evaluate metrics
Package, save, and deploy an MLflow model
Serve it using MLflow REST API
What’s next and how to contribute
MLOps and Data Quality: Deploying Reliable ML Models in Production Provectus
Looking to build a robust machine learning infrastructure to streamline MLOps? Learn from Provectus experts how to ensure the success of your MLOps initiative by implementing Data QA components in your ML infrastructure.
For most organizations, the development of multiple machine learning models, their deployment and maintenance in production are relatively new tasks. Join Provectus as we explain how to build an end-to-end infrastructure for machine learning, with a focus on data quality and metadata management, to standardize and streamline machine learning life cycle management (MLOps).
Agenda
- Data Quality and why it matters
- Challenges and solutions of Data Testing
- Challenges and solutions of Model Testing
- MLOps pipelines and why they matter
- How to expand validation pipelines for Data Quality
MLFlow: Platform for Complete Machine Learning Lifecycle Databricks
Description
Data Science and ML development bring many new complexities beyond the traditional software development lifecycle. Unlike in traditional software development, ML developers want to try multiple algorithms, tools, and parameters to get the best results, and they need to track this information to reproduce work.
MLflow addresses some of these challenges during an ML model development cycle.
Abstract
ML development brings many new complexities beyond the traditional software development lifecycle. Unlike in traditional software development, ML developers want to try multiple algorithms, tools, and parameters to get the best results, and they need to track this information to reproduce work. In addition, developers need to use many distinct systems to productionize models. To address these problems, many companies are building custom “ML platforms” that automate this lifecycle, but even these platforms are limited to a few supported algorithms and to each company’s internal infrastructure.
In this session, we introduce MLflow, a new open source project from Databricks that aims to design an open ML platform where organizations can use any ML library and development tool of their choice to reliably build and share ML applications. MLflow introduces simple abstractions to package reproducible projects, track results, and encapsulate models that can be used with many existing tools, accelerating the ML lifecycle for organizations of any size.
Through a short demo of a complete ML model life-cycle example, you will walk away with:
MLflow concepts and abstractions for models, experiments, and projects
How to get started with MLflow
Using tracking Python APIs during model training
Using MLflow UI to visually compare and contrast experimental runs with different tuning parameters and evaluate metrics
Reproducible AI using MLflow and PyTorch Databricks
Model reproducibility is becoming the next frontier for successfully building and deploying AI models in both research and production scenarios. In this talk, we will show you how to build reproducible AI models and workflows using PyTorch and MLflow that can be shared across your teams, with traceability, to speed up collaboration on AI projects.
End to end Machine Learning using Kubeflow - Build, Train, Deploy and Manage Animesh Singh
Given the sheer breadth of functionality that must be addressed in the machine learning world around building, training, serving and managing models, getting it done in a consistent, composable, portable, and scalable manner is hard. The Kubernetes framework is well suited to address these issues, which is why it's a great foundation for deploying ML workloads. Kubeflow is designed to take advantage of these benefits. In this talk, we are going to address how to make it easy for everyone to develop, deploy, and manage portable, scalable ML everywhere and support the full machine learning lifecycle using open source technologies like Kubeflow, TensorFlow, PyTorch, Tekton, Knative, Istio and others. We are going to discuss how to enable distributed training of models, model serving, canary rollouts, drift detection, model explainability, metadata management, pipelines and others. Additionally, we will discuss Watson productization in progress based on Kubeflow Pipelines and Tekton, and point to Kubeflow Dojo materials and follow-on workshops.
The catalyst for the success of automobiles came not through the invention of the car but rather through the establishment of an innovative assembly line. History shows us that the ability to mass produce and distribute a product is the key to driving adoption of any innovation, and machine learning is no different. MLOps is the assembly line of Machine Learning and in this presentation we will discuss the core capabilities your organization should be focused on to implement a successful MLOps system.
Dmitry Kan, Principal AI Scientist at Silo AI and host of the Vector Podcast [1], will give an overview of the landscape of vector search databases and their role in NLP, along with the latest news and his view on the future of vector search. Further, he will share how he and his team participated in the Billion-Scale Approximate Nearest Neighbor Challenge and improved recall by 12% over a baseline FAISS.
Presented at https://www.meetup.com/open-nlp-meetup/events/282678520/
YouTube: https://www.youtube.com/watch?v=RM0uuMiqO8s&t=179s
Follow Vector Podcast to stay up to date on this topic: https://www.youtube.com/@VectorPodcast
Vertex AI: Pipelines for your MLOps workflows Márton Kodok
In recent years, one of the biggest trends in application development has been the rise of machine learning solutions, tools, and managed platforms. Vertex AI is a managed unified ML platform for all your AI workloads. On the MLOps side, Vertex AI Pipelines lets you adopt experiment pipelining beyond the classic build, train, evaluate, and deploy cycle. It is engineered for data scientists and data engineers, and it’s a tremendous help for those teams who don’t have DevOps or sysadmin engineers, as infrastructure management overhead has been almost completely eliminated.
Based on practical examples, we will demonstrate how Vertex AI Pipelines scores high in terms of developer experience and how it fits custom ML needs, and we will analyze results. It’s a toolset for a fully-fledged machine learning workflow: a sequence of steps in the model development and deployment cycle, such as data preparation/validation, model training, hyperparameter tuning, model validation, and model deployment. Vertex AI comes with all standard resources plus an ML metadata store, a fully managed feature store, and a fully managed pipelines runner.
Vertex AI Pipelines is a managed serverless toolkit, which means you don't have to fiddle with infrastructure or back-end resources to run workflows.
How can .NET contribute to Data Science? What is .NET Interactive? What do notebooks have to do with it? And Apache Spark? And "pythonism"? And Azure? In this session, we will try to put these ideas in order.
Bighead: Airbnb’s End-to-End Machine Learning Platform with Krishna Puttaswa...Databricks
Airbnb has a wide variety of ML problems ranging from models on traditional structured data to models built on unstructured data such as user reviews, messages and listing images. The ability to build, iterate on, and maintain healthy machine learning models is critical to Airbnb’s success. Many ML Platforms cover data collection, feature engineering, training, deploying, productionalization, and monitoring but few, if any, do all of the above seamlessly.
Bighead aims to tie together various open source and in-house projects to remove incidental complexity from ML workflows. Bighead is built on Python and Spark and can be used in modular pieces as each ML problem presents unique challenges. Through standardization of the path to production, training environments and the methods for collecting and transforming data on Spark, each model is reproducible and iterable.
This talk covers the architecture, the problems that each individual component and the overall system aim to solve, and a vision for the future of machine learning infrastructure. Bighead is widely adopted at Airbnb, and we have a variety of models running in production. We have seen the overall model development time go down from many months to days on Bighead. We plan to open source Bighead to allow the wider community to benefit from our work.
Simplifying Model Management with MLflow | Databricks
Last summer, Databricks launched MLflow, an open source platform to manage the machine learning lifecycle, including experiment tracking, reproducible runs and model packaging. MLflow has grown quickly since then, with over 120 contributors from dozens of companies, including major contributions from RStudio and Microsoft. It has also gained new capabilities such as automatic logging from TensorFlow and Keras, Kubernetes integrations, and a high-level Java API. In this talk, we’ll cover some of the new features that have come to MLflow, and then focus on a major upcoming feature: model management with the MLflow Model Registry. Many organizations face challenges tracking which models are available in the organization and which ones are in production. The MLflow Model Registry provides a centralized database to keep track of these models, share and describe new model versions, and deploy the latest version of a model through APIs. We’ll demonstrate how these features can simplify common ML lifecycle tasks.
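To make the registry idea concrete, here is a small in-memory sketch of what "keep track of models, describe new versions, and look up the latest version" can mean. It is a toy illustration and not the MLflow Model Registry API; the class, method, and stage names, the model name `churn`, and the `s3://bucket/...` paths are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ModelVersion:
    name: str
    version: int
    source: str            # where the model artifact lives (hypothetical path)
    stage: str = "None"    # illustrative stage label
    description: str = ""


class ToyModelRegistry:
    """In-memory stand-in for a central model registry."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[ModelVersion]] = {}

    def register(self, name: str, source: str, description: str = "") -> ModelVersion:
        # Each registration under the same name gets the next version number.
        versions = self._versions.setdefault(name, [])
        mv = ModelVersion(name, len(versions) + 1, source, description=description)
        versions.append(mv)
        return mv

    def transition(self, name: str, version: int, stage: str) -> ModelVersion:
        # Move one version to a new stage, e.g. "Staging" or "Production".
        mv = self._versions[name][version - 1]
        mv.stage = stage
        return mv

    def latest(self, name: str, stage: Optional[str] = None) -> Optional[ModelVersion]:
        # Newest version overall, or newest version in the given stage.
        candidates = [
            v for v in self._versions.get(name, [])
            if stage is None or v.stage == stage
        ]
        return candidates[-1] if candidates else None


registry = ToyModelRegistry()
registry.register("churn", "s3://bucket/churn/1", "baseline")
registry.register("churn", "s3://bucket/churn/2", "more features")
registry.transition("churn", 1, "Production")
print(registry.latest("churn").version, registry.latest("churn", "Production").version)
```

Serving code can then ask for "the latest Production version of churn" instead of hard-coding an artifact path, which is the decoupling a central registry is meant to provide.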
MLOps (a compound of “machine learning” and “operations”) is a practice for collaboration and communication between data scientists and operations professionals to help manage the production machine learning lifecycle. Similar to the DevOps term in the software development world, MLOps looks to increase automation and improve the quality of production ML while also focusing on business and regulatory requirements. MLOps applies to the entire ML lifecycle - from integrating with model generation (software development lifecycle, continuous integration/continuous delivery), orchestration, and deployment, to health, diagnostics, governance, and business metrics.
To watch the full presentation click here: https://info.cnvrg.io/mlopsformachinelearning
In this webinar, we’ll discuss core practices in MLOps that will help data science teams scale to the enterprise level. You’ll learn the primary functions of MLOps and which tasks are suggested to accelerate your team’s machine learning pipeline. Join us in a discussion with cnvrg.io Solutions Architect, Aaron Schneider, and learn how teams use MLOps for more productive machine learning workflows.
- Reduce friction between science and engineering
- Deploy your models to production faster
- Health, diagnostics and governance of ML models
- Kubernetes as a core platform for MLOps
- Support advanced use-cases like continual learning with MLOps
Intro to Vertex AI, unified MLOps platform for Data Scientists & ML Engineers | Daniel Zivkovic
#MLOps is a hot buzzword, just like #DevOps before it. It sparked a gold rush for software vendors, so it's hard to choose the best tool for your needs. Vertex AI is a unified MLOps platform for the entire #AI #workflow on #GoogleCloud. It is the 3rd iteration of the Google Cloud #ML platform (since its original launch), and we think they did it right (this time).
That's why #ServerlessTO invited 2 AI/ML gurus from #GCP (Jarek Kazmierczak & Brian Kang) to introduce #VertexAI to you.
The lecture recording with Q&A is at https://youtu.be/X1S7360ip-k
MEETUP "CODE-ALONG" RESOURCES
Vertex workbench - Managed and User-managed Notebooks
https://cloud.google.com/vertex-ai/docs/workbench/managed/quickstarts
Example that the training code was based on - Fashion MNIST dataset
https://www.tensorflow.org/tutorials/keras/classification
Hyperparameter tuning codelab
https://codelabs.developers.google.com/vertex_hyperparameter_tuning
Vertex pipeline codelabs
https://codelabs.developers.google.com/vertex-pipelines-intro
https://codelabs.developers.google.com/vertex-pipelines-custom-model
CI/CD slides
https://github.com/shivajid/MLOpsCICD/blob/master/presentation/AI%20Workshop%20Day4.pdf
CI/CD github example
https://github.com/shivajid/MLOpsCICD
Model monitoring example
https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/master/notebooks/official/model_monitoring/model_monitoring.ipynb
Best practices for MLOps
https://cloud.google.com/architecture/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning
https://cloud.google.com/resources/mlops-whitepaper
Official Vertex AI Github repository
https://github.com/GoogleCloudPlatform/vertex-ai-samples/
MEETUP CHAT LINKS
https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/master/notebooks/notebook_template.ipynb
https://github.com/GoogleCloudPlatform/vertex-ai-samples/tree/master/notebooks/official/custom
https://github.com/GoogleCloudPlatform/vertex-ai-samples/tree/master/notebooks/community/sdk
https://cloud.google.com/architecture/ml-on-gcp-best-practices#model-deployment-and-serving
https://www.youtube.com/watch?v=ntBEQdD1IeQ&list=PLd31CCJlr9FrZazLqRg1Lxq7xw9b6VNP6&index=3
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl... | DataWorks Summit
Specialized tools for machine learning development and model governance are becoming essential. MLflow is an open source platform for managing the machine learning lifecycle. Just by adding a few lines of code to the function or script that trains their model, data scientists can log parameters, metrics, artifacts (plots, miscellaneous files, etc.) and a deployable package of the ML model. Every time that function or script is run, the results are logged automatically as a byproduct of those added lines, even if the person doing the training run makes no special effort to record the results. MLflow application programming interfaces (APIs) are available for the Python, R and Java programming languages, and MLflow offers a language-agnostic REST API as well. Over a relatively short time period, MLflow has garnered more than 3,300 stars on GitHub, almost 500,000 monthly downloads and 80 contributors from more than 40 companies. Most significantly, more than 200 companies are now using MLflow. We will demo the MLflow Tracking, Projects and Models components with Azure Machine Learning (AML) Services and show you how easy it is to get started with MLflow on-prem or in the cloud.
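The "few lines of code" idea can be illustrated without any tracking library: wrap the training call in a run context, write down parameters and metrics, and query the recorded runs afterwards. The sketch below is a plain-Python toy and not the MLflow API; `ToyTracker`, `train`, the JSON-per-run layout, and the fake error metric are all assumptions made for illustration.

```python
import json
import tempfile
import uuid
from contextlib import contextmanager
from pathlib import Path


class ToyTracker:
    """Writes one JSON file per run into a directory (hypothetical layout)."""

    def __init__(self, root: str) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    @contextmanager
    def start_run(self, name: str):
        record = {"run_id": uuid.uuid4().hex, "name": name, "params": {}, "metrics": {}}
        try:
            yield record
        finally:
            # Persist the run even if training raised, so failures are visible too.
            path = self.root / f"{record['run_id']}.json"
            path.write_text(json.dumps(record))

    def search_runs(self):
        # Load every recorded run back from disk.
        return [json.loads(p.read_text()) for p in sorted(self.root.glob("*.json"))]


def train(tracker: ToyTracker, learning_rate: float) -> None:
    with tracker.start_run(f"lr={learning_rate}") as run:
        run["params"]["learning_rate"] = learning_rate
        # Stand-in for real training: pretend the error is smallest near 0.1.
        run["metrics"]["error"] = abs(learning_rate - 0.1)


tracker = ToyTracker(tempfile.mkdtemp())
for lr in (0.01, 0.1, 1.0):
    train(tracker, lr)

runs = tracker.search_runs()
best = min(runs, key=lambda r: r["metrics"]["error"])
print(best["name"], best["metrics"]["error"])
```

The training function only gains a `with` block and two assignments, which is the point the abstract makes: logging happens as a byproduct of running the script.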
MLflow: Infrastructure for a Complete Machine Learning Life Cycle | Databricks
ML development brings many new complexities beyond the traditional software development lifecycle. Unlike in traditional software development, ML developers want to try multiple algorithms, tools and parameters to get the best results, and they need to track this information to reproduce work. In addition, developers need to use many distinct systems to productionize models. To address these problems, many companies are building custom “ML platforms” that automate this lifecycle, but even these platforms are limited to a few supported algorithms and to each company’s internal infrastructure.
In this talk, we will present MLflow, a new open source project from Databricks that aims to design an open ML platform where organizations can use any ML library and development tool of their choice to reliably build and share ML applications. MLflow introduces simple abstractions to package reproducible projects, track results, and encapsulate models that can be used with many existing tools, accelerating the ML lifecycle for organizations of any size.
Managing the Complete Machine Learning Lifecycle with MLflow | Databricks
ML development brings many new complexities beyond the traditional software development lifecycle. Unlike in traditional software development, ML developers want to try multiple algorithms, tools and parameters to get the best results, and they need to track this information to reproduce work. In addition, developers need to use many distinct systems to productionize models.
To solve these challenges, Databricks unveiled MLflow last year, an open source project that aims to simplify the entire ML lifecycle. MLflow introduces simple abstractions to package reproducible projects, track results, and encapsulate models that can be used with many existing tools, accelerating the ML lifecycle for organizations of any size.
In the past year, the MLflow community has grown quickly: over 120 contributors from over 40 companies have contributed code to the project, and over 200 companies are using MLflow.
In this tutorial, we will show you how using MLflow can help you:
- Keep track of experiment runs and results across frameworks.
- Execute projects remotely on a Databricks cluster, and quickly reproduce your runs.
- Quickly productionize models using Databricks production jobs, Docker containers, Azure ML, or Amazon SageMaker.
We will demo the building blocks of MLflow as well as the most recent additions since the 1.0 release.
What you will learn:
- Understand the three main components of open source MLflow (MLflow Tracking, MLflow Projects, MLflow Models) and how each helps address challenges of the ML lifecycle.
- How to use MLflow Tracking to record and query experiments: code, data, config, and results.
- How to use the MLflow Projects packaging format to reproduce runs on any platform.
- How to use the MLflow Models general format to send models to diverse deployment tools.
Prerequisites:
- A fully-charged laptop (8-16GB memory) with Chrome or Firefox
- Python 3 and pip pre-installed
- Pre-register for a Databricks Standard Trial
- Basic knowledge of the Python programming language
- Basic understanding of machine learning concepts
In this talk, I present an introduction to MLflow. I also show some examples of using it by means of MLflow Tracking, MLflow Projects and MLflow Models, and I use Databricks as an example of remote tracking.
MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ... | Databricks
ML development brings many new complexities beyond the traditional software development lifecycle. Unlike in traditional software development, ML developers want to try multiple algorithms, tools, and parameters to get the best results, and they need to track this information to reproduce work. In addition, developers need to use many distinct systems to productionize models. To address these problems, many companies are building custom “ML platforms” that automate this lifecycle, but even these platforms are limited to a few supported algorithms and to each company’s internal infrastructure. In this session, we introduce MLflow, a new open source project from Databricks that aims to design an open ML platform where organizations can use any ML library and development tool of their choice to reliably build and share ML applications. MLflow introduces simple abstractions to package reproducible projects, track results, and encapsulate models that can be used with many existing tools, accelerating the ML lifecycle for organizations of any size. In this deep-dive session, through a complete ML model life-cycle example, you will walk away with:
- MLflow concepts and abstractions for models, experiments, and projects
- How to get started with MLflow
- Understand aspects of MLflow APIs
- Using tracking APIs during model training
- Using MLflow UI to visually compare and contrast experimental runs with different tuning parameters and evaluate metrics
- Package, save, and deploy an MLflow model
- Serve it using MLflow REST API
- What’s next and how to contribute
MLOps and Data Quality: Deploying Reliable ML Models in Production | Provectus
Looking to build a robust machine learning infrastructure to streamline MLOps? Learn from Provectus experts how to ensure the success of your MLOps initiative by implementing Data QA components in your ML infrastructure.
For most organizations, the development of multiple machine learning models, their deployment and maintenance in production are relatively new tasks. Join Provectus as we explain how to build an end-to-end infrastructure for machine learning, with a focus on data quality and metadata management, to standardize and streamline machine learning life cycle management (MLOps).
Agenda
- Data Quality and why it matters
- Challenges and solutions of Data Testing
- Challenges and solutions of Model Testing
- MLOps pipelines and why they matter
- How to expand validation pipelines for Data Quality
MLFlow: Platform for Complete Machine Learning Lifecycle | Databricks
Description
Data Science and ML development bring many new complexities beyond the traditional software development lifecycle. Unlike in traditional software development, ML developers want to try multiple algorithms, tools, and parameters to get the best results, and they need to track this information to reproduce work.
MLflow addresses some of these challenges during an ML model development cycle.
Abstract
ML development brings many new complexities beyond the traditional software development lifecycle. Unlike in traditional software development, ML developers want to try multiple algorithms, tools, and parameters to get the best results, and they need to track this information to reproduce work. In addition, developers need to use many distinct systems to productionize models. To address these problems, many companies are building custom “ML platforms” that automate this lifecycle, but even these platforms are limited to a few supported algorithms and to each company’s internal infrastructure.
In this session, we introduce MLflow, a new open source project from Databricks that aims to design an open ML platform where organizations can use any ML library and development tool of their choice to reliably build and share ML applications. MLflow introduces simple abstractions to package reproducible projects, track results, and encapsulate models that can be used with many existing tools, accelerating the ML lifecycle for organizations of any size.
With a short demo, you will see a complete ML model life-cycle example and walk away with:
- MLflow concepts and abstractions for models, experiments, and projects
- How to get started with MLflow
- Using the tracking Python APIs during model training
- Using the MLflow UI to visually compare and contrast experimental runs with different tuning parameters and evaluate metrics
Reproducible AI using MLflow and PyTorch | Databricks
Model reproducibility is becoming the next frontier for successfully building and deploying AI models in both research and production scenarios. In this talk, we will show you how to build reproducible AI models and workflows using PyTorch and MLflow that can be shared across your teams, with traceability, to speed up collaboration on AI projects.
End to end Machine Learning using Kubeflow - Build, Train, Deploy and Manage | Animesh Singh
Given the sheer breadth of functionality that must be addressed in the machine learning world around building, training, serving and managing models, getting it done in a consistent, composable, portable, and scalable manner is hard. The Kubernetes framework is well suited to address these issues, which is why it's a great foundation for deploying ML workloads. Kubeflow is designed to take advantage of these benefits. In this talk, we are going to address how to make it easy for everyone to develop, deploy, and manage portable, scalable ML everywhere and support full-lifecycle machine learning using open source technologies like Kubeflow, TensorFlow, PyTorch, Tekton, Knative, Istio and others. We are going to discuss how to enable distributed training of models, model serving, canary rollouts, drift detection, model explainability, metadata management, pipelines and more. Additionally, we will discuss the Watson productization in progress based on Kubeflow Pipelines and Tekton, and point to Kubeflow Dojo materials and follow-on workshops.
The catalyst for the success of automobiles came not through the invention of the car but rather through the establishment of an innovative assembly line. History shows us that the ability to mass produce and distribute a product is the key to driving adoption of any innovation, and machine learning is no different. MLOps is the assembly line of Machine Learning and in this presentation we will discuss the core capabilities your organization should be focused on to implement a successful MLOps system.
Dmitry Kan, Principal AI Scientist at Silo AI and host of the Vector Podcast [1], will give an overview of the landscape of vector search databases and their role in NLP, along with the latest news and his view on the future of vector search. Further, he will share how he and his team participated in the Billion-Scale Approximate Nearest Neighbor Challenge and improved recall by 12% over a baseline FAISS.
Presented at https://www.meetup.com/open-nlp-meetup/events/282678520/
YouTube: https://www.youtube.com/watch?v=RM0uuMiqO8s&t=179s
Follow Vector Podcast to stay up to date on this topic: https://www.youtube.com/@VectorPodcast
Vertex AI: Pipelines for your MLOps workflows | Márton Kodok
Apache Spark’s machine learning library provides a simple, elegant, yet powerful framework for creating scalable machine learning pipelines. It provides out of the box components for feature extraction and transformation, as well as various machine learning algorithms.
However, in recent years specialized systems (such as TensorFlow, Caffe, PyTorch and Apache MXNet) have been dominant in the domain of AI and deep learning, as they allow greater performance and flexibility for training complex models. While there are a few deep learning frameworks that are Spark specific, in most cases these frameworks are separate from Spark and the ease of integration and feature set exposed varies considerably.
This session will explore the role of Spark within the AI landscape, the current state of deep learning on top of Spark and the most recent developments in the Spark project to better integrate Spark with the deep learning ecosystem.
For the full video of this presentation, please visit:
http://www.embedded-vision.com/platinum-members/embedded-vision-alliance/embedded-vision-training/videos/pages/dec-2016-member-meeting-khronos
For more information about embedded vision, please visit:
http://www.embedded-vision.com
Peter McGuinness, representing the Khronos Group, delivers the presentation "New Standards for Embedded Vision and Neural Networks" at the December 2016 Embedded Vision Alliance Member Meeting. McGuinness discusses new standardization work for embedded neural network and vision software.
The Download: Tech Talks by the HPCC Systems Community, Episode 11 | HPCC Systems
Join us as we continue this series of webinars specifically designed for the community by the community with the goal to share knowledge, spark innovation and further build and link the relationships within our HPCC Systems community.
Episode 11 includes Tech Talks featuring speakers from our community on topics covering Big Data solutions, Spark Integration and other ECL Tips leveraging the HPCC Systems platform.
1) Raj Chandrasekaran, CTO & Co-Founder, ClearFunnel - Scaling Data Science capabilities: Leveraging a homogeneous Big Data ecosystem
2) James McMullan, Software Engineer III, LexisNexis Risk Solutions - HDFS Connector Preview
3) Bob Foreman, Senior Software Engineer, LexisNexis Risk Solutions - Building a RELATIONal Dataset - A Valentine’s Day Special!
dotnetconf 2020 has come and gone, and it left us .NET 5, one of the most important .NET releases ever. What does it mean for our work? Let's find out together.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2023/07/introducing-the-i-mx-93-your-go-to-processor-for-embedded-vision-a-presentation-from-nxp-semiconductors/
Srikanth Jagannathan, Product Manager at NXP Semiconductors, presents the “Introducing the i.MX 93: Your “Go-to” Processor for Embedded Vision” tutorial at the May 2023 Embedded Vision Summit.
In this presentation, you’ll learn all about NXP’s just-launched i.MX 93 applications processor family. The i.MX 93 is built with NXP’s innovative Energy Flex architecture, which delivers high performance, low power consumption and incredible versatility at an affordable price. Jagannathan introduces the i.MX 93 processing cores, including two high-performance Arm Cortex-A55 CPUs, a Cortex-M33 for low-power, real-time operation, and an Arm Ethos-U65 NPU that enables high-performance, cost-effective and energy-efficient ML applications.
Jagannathan also shares NPU architecture details and benchmark results, and explores the i.MX 93 processors’ rich set of on chip peripherals, such as MIPI-CSI, MIPI-DSI, Ethernet and USB. And he introduces the eIQ software toolkit, which allows developers to create complete system-level applications with ease. Jagannathan shows how the rich feature set of the i.MX 93 processors supports demanding embedded vision applications through examples, such as a driver monitoring system. And he highlights NXP’s unique commitment to industrial-grade quality, product longevity and customer support.
Dot net platform and dotnet core fundamentals | Lalit Kale
This is the presentation deck I used for the LimerickDotNet-Azure User Group.
Event Url: https://www.meetup.com/Limerick-DotNet/events/240897689/
Session Details:
This session covered the .NET journey of almost 17 years. Through this slide deck, I narrated the .NET platform's progression up to .NET Standard 2.0.
The session was accompanied by a small demo of running a small dotnet program on Alpine Linux in a Docker container.
28March2024-Codeless-Generative-AI-Pipelines
https://www.meetup.com/futureofdata-princeton/events/299440871/
https://www.meetup.com/real-time-analytics-meetup-ny/events/299290822/
******Note*****
The event is seat-limited, therefore please complete your registration here. Only people completing the form will be able to attend.
-----------------------
We're excited to invite you to join us in-person, for a Real-Time Analytics exploration!
Join us for an evening of insights and networking as we delve into the OSS technologies shaping the field!
Agenda:
05:30-06:00: Pizza and friends
06:00-06:40: Codeless GenAI Pipelines with Flink, Kafka, NiFi
06:40-07:20: Real-Time Analytics in the Corporate World: How Apache Pinot® Powers Industry Leaders
07:20-07:30: Q&A
Codeless GenAI Pipelines with Flink, Kafka, NiFi | Tim Spann, Cloudera
Explore the power of real-time streaming with GenAI using Apache NiFi. Learn how NiFi simplifies data engineering workflows, allowing you to focus on creativity over technical complexities. I'll guide you through practical examples, showcasing NiFi's automation impact from ingestion to delivery. Whether you're a seasoned data engineer or new to GenAI, this talk offers valuable insights into optimizing workflows. Join us to unlock the potential of real-time streaming and witness how NiFi makes data engineering a breeze for GenAI applications!
Real-Time Analytics in the Corporate World: How Apache Pinot® Powers Industry Leaders | Viktor Gamov, StarTree
Explore how industry leaders like LinkedIn, Uber Eats, and Stripe are mastering real-time data with Viktor as your guide. Discover how Apache Pinot transforms data into actionable insights instantly. Viktor will showcase Pinot's features, including the Star-Tree Index, and explain why it's a game-changer in data strategy. This session is for everyone, from data geeks to business gurus, eager to uncover the future of tech. Join us and be wowed by the power of real-time analytics with Apache Pinot!
-------
Tim Spann is a Principal Developer Advocate in Data In Motion for Cloudera.
He works with Apache NiFi, Apache Kafka, Apache Pulsar, Apache Flink, Flink SQL, Apache Pinot, Trino, Apache Iceberg, DeltaLake, Apache Spark, Big Data, IoT, Cloud, AI/DL, machine learning, and deep learning. Tim has over ten years of experience with the IoT, big data, distributed computing, messaging, streaming technologies, and Java programming. Previously, he was a Developer Advocate at StreamNative, Principal DataFlow Field Engineer at Cloudera, a Senior Solutions Engineer at Hortonworks, a Senior Solutions Architect at AirisData, a Senior Field Engineer at Pivotal and a Team Leader at HPE. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton & NYC on Big Data, Cloud, IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as ApacheCon, DeveloperWeek, Pulsar Summit and many more.
Cloud Computing Was Built for Web Developers—What Does v2 Look Like for Deep... | Databricks
What we call the public cloud was developed primarily to manage and deploy web servers. The target audience for these products is DevOps. While this is a massive and exciting market, the world of data science and deep learning is very different, and possibly even bigger. Unfortunately, the tools available today are not designed for this new audience, and the cloud needs to evolve. This talk will cover what the next 10 years of cloud computing will look like.
Webinar: OpenEBS - Still Free and now FASTEST Kubernetes storage | MayaData Inc
Webinar Session - https://youtu.be/_5MfGMf8PG4
In this webinar, we share how the Container Attached Storage pattern makes performance tuning more tractable, by giving each workload its own storage system, thereby decreasing the variables needed to understand and tune performance.
We then introduce MayaStor, a breakthrough in the use of containers and Kubernetes as a data plane. MayaStor is the first containerized data engine available that delivers near the theoretical maximum performance of underlying systems. MayaStor performance scales with the underlying hardware and has been shown, for example, to deliver in excess of 10 million IOPS in a particular environment.
MLflow model serving
How to score models with MLflow
Offline scoring with Spark
Online scoring with MLflow model server
Custom model deployment and scoring
Databricks model server
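For the online scoring item in the outline above, a client typically sends a JSON POST to the model server's `/invocations` endpoint. The sketch below only builds such a request using the standard library. It assumes an MLflow 2.x server started locally with `mlflow models serve` on the default port 5000 and the `dataframe_split` payload format; older servers expect a different payload. The column names and values are hypothetical.

```python
import json
import urllib.request


def build_scoring_request(columns, rows, host="http://127.0.0.1:5000"):
    """Build a POST request for an MLflow-style /invocations endpoint.

    Assumes the MLflow 2.x "dataframe_split" JSON format.
    """
    payload = {
        "dataframe_split": {
            "columns": list(columns),
            "data": [list(row) for row in rows],
        }
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{host}/invocations",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def score(request, timeout=10.0):
    """Send the request and decode the JSON response (needs a running server)."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Hypothetical feature names and values, for illustration only.
req = build_scoring_request(
    ["sepal_length", "sepal_width"],
    [[5.1, 3.5], [6.2, 2.9]],
)
print(req.full_url, req.get_method())
```

Calling `score(req)` would return the predictions once a model server is listening at that address. Offline scoring with Spark follows a different path (a batch job over a DataFrame) and is not covered by this sketch.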
Adjusting primitives for graph : SHORT REPORT / NOTES | Subhajit Sahu
Graph algorithms, like PageRank Compressed Sparse Row (CSR) is an adjacency-list based graph representation that is
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
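To show what "bfloat16 as the storage type" implies for a vector sum, the sketch below stores each value as the upper 16 bits of its float32 encoding and widens it back before accumulating. This is an illustration, not the report's code: it truncates rather than rounds to nearest, it accumulates in a Python float, and the function names are made up.

```python
import struct


def to_bfloat16_bits(x: float) -> int:
    """Keep the upper 16 bits of the float32 encoding (sign, 8 exponent, 7 mantissa bits)."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return bits >> 16


def from_bfloat16_bits(b: int) -> float:
    """Widen 16 stored bits back to a float by zero-filling the low mantissa bits."""
    (x,) = struct.unpack("<f", struct.pack("<I", b << 16))
    return x


def sum_bfloat16_storage(values):
    """Sum values that are stored as bfloat16 but accumulated at higher precision."""
    stored = [to_bfloat16_bits(v) for v in values]
    total = 0.0
    for b in stored:
        total += from_bfloat16_bits(b)
    return total


exact = [1.0, 0.5, 0.25]      # exactly representable in bfloat16
lossy = [1.0009765625] * 4    # 1 + 2**-10 needs more than 7 mantissa bits

print(sum_bfloat16_storage(exact), sum(exact))
print(sum_bfloat16_storage(lossy), sum(lossy))
```

Halving the storage width halves memory traffic, which matters for bandwidth-bound reductions. The cost is the precision loss visible in the second example.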
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos by creating a distributed data ownership model with clear ownership and responsibilities.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues.
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.
As Europe's leading economic powerhouse and the fourth-largest economy globally, Germany stands at the forefront of innovation and industrial might. Renowned for its precision engineering and high-tech sectors, Germany's economic structure is heavily supported by a robust service industry, accounting for approximately 68% of its GDP. This economic clout and strategic geopolitical stance position Germany as a focal point in the global cyber threat landscape.
In the face of escalating global tensions, particularly those emanating from geopolitical disputes with nations like Russia and China, Germany has witnessed a significant uptick in targeted cyber operations. Our analysis indicates a marked increase in cyberattack sophistication aimed at critical infrastructure and key industrial sectors. These attacks range from ransomware campaigns to Advanced Persistent Threats (APTs), threatening national security and business integrity.
🔑 Key findings include:
🔍 Increased frequency and complexity of cyber threats.
🔍 Escalation of state-sponsored and criminally motivated cyber operations.
🔍 Active dark web exchanges of malicious tools and tactics.
Our comprehensive report delves into these challenges, using a blend of open-source and proprietary data collection techniques. By monitoring activity on critical networks and analyzing attack patterns, our team provides a detailed overview of the threats facing German entities.
This report aims to equip stakeholders across public and private sectors with the knowledge to enhance their defensive strategies, reduce exposure to cyber risks, and reinforce Germany's resilience against cyber threats.
5. ONNX
• Open interoperable format to represent ML models
• Model portability across frameworks
• Decouple training and scoring
• Interoperability between frameworks, compilers, runtimes, and hardware accelerators
• Accelerated inferencing on cloud and edge (IOT)
6. ONNX
• Wide support by ML industry vendors
• Bringing the worlds of AI research and products closer together so that they innovate and deploy faster
• Train in one framework and score in another
• Optimize models for deployments on multiple platforms
• Originally for DL models, but now covers non-DL (sklearn)
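The "train in one framework, score in another" idea can be sketched as below. This is only an illustration, not something taken from the slides: it assumes the numpy, scikit-learn, skl2onnx and onnxruntime packages are installed, and the Iris dataset, the input name "input" and the function name train_convert_score are choices made for the example.

```python
def train_convert_score():
    """Train with scikit-learn, convert to ONNX, score with ONNX Runtime."""
    # Third-party imports live inside the function so that the module can be
    # loaded even when these optional packages are missing.
    import numpy as np
    import onnxruntime as ort
    from skl2onnx import convert_sklearn
    from skl2onnx.common.data_types import FloatTensorType
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression

    X, y = load_iris(return_X_y=True)
    X = X.astype(np.float32)
    model = LogisticRegression(max_iter=500).fit(X, y)

    # Declare the input signature: any number of rows, 4 float features.
    # The name "input" is an arbitrary choice for this example.
    onnx_model = convert_sklearn(
        model, initial_types=[("input", FloatTensorType([None, 4]))]
    )

    # Scoring only needs ONNX Runtime; scikit-learn is no longer involved.
    session = ort.InferenceSession(
        onnx_model.SerializeToString(), providers=["CPUExecutionProvider"]
    )
    onnx_labels = session.run(None, {"input": X})[0]
    return model.predict(X), onnx_labels
```

Calling train_convert_score() returns the scikit-learn predictions and the ONNX Runtime predictions side by side, so the two scoring paths can be compared.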
7. ONNX Pillars
• ONNX format standard - Linux Foundation - Nov. 2019
• ONNX converters
○ TensorFlow to ONNX, ONNX to TensorFlow, etc.
• ONNX Runtime
○ MSFT ONNX Runtime - open-source
○ Nvidia TensorRT
8. ONNX Timelines
• Founded in Sep. 2017 by Microsoft and Facebook
• AWS, Intel, AMD and NVIDIA support by Dec. 2017
• Enhanced Facebook F8 support in May 2018
• ONNX Runtime open-sourced by MSFT in Dec. 2018
• ONNX joins Linux Foundation in Nov. 2019
13. ONNX Model Zoo
• Collection of pre-trained state-of-the-art DL models
• Central repository of reusable models
• Python Jupyter notebooks
• Image classification, natural language, vision, etc.
• https://github.com/onnx/models
14. Intermediate Representation (IR)
• IR is a common concept in compilers and virtual machines
• Two key features of an IR:
○ Capture source code without loss of information
○ Be independent of target platform
• Common representation for source tensor formats
• Providers optimize IR for target hardware devices
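A toy sketch of the two IR properties listed above, assuming nothing beyond the Python standard library. The Node class, the three-op graph and the REFERENCE_KERNELS table are hypothetical names invented for this illustration; they are not the ONNX data structures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """One operation in the graph: an op name, input names, one output name."""
    op: str
    inputs: tuple
    output: str


# The meaning of y = relu(x * w + b), captured without naming any target
# device. Each backend supplies its own kernels for the same ops.
GRAPH = [
    Node("Mul", ("x", "w"), "t0"),
    Node("Add", ("t0", "b"), "t1"),
    Node("Relu", ("t1",), "y"),
]

# Hypothetical "reference" backend: plain Python arithmetic on scalars.
REFERENCE_KERNELS = {
    "Mul": lambda a, b: a * b,
    "Add": lambda a, b: a + b,
    "Relu": lambda a: max(a, 0.0),
}


def run(graph, kernels, feeds):
    """Interpret the graph with whichever kernel table a backend provides."""
    values = dict(feeds)
    for node in graph:
        args = [values[name] for name in node.inputs]
        values[node.output] = kernels[node.op](*args)
    return values


result = run(GRAPH, REFERENCE_KERNELS, {"x": 2.0, "w": 3.0, "b": -10.0})
print(result["y"])  # relu(2*3 - 10) = relu(-4) → 0.0
```

A different provider would reuse GRAPH unchanged and swap in its own kernel table, which is the sense in which the representation is independent of the target.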
15. ONNX IR
• 116 operators
• Acos, BatchNormalization, HardSigmoid, Relu, Softmax, etc.
• Exporting models is reasonably robust
• Import is less robust due to unimplemented ops
• https://github.com/onnx/onnx/blob/master/docs/Operators.md
16. Optimizer Processing
• Fusion - Fuse multiple ops
• Data layout abstraction
• Data reuse - reuse for subgraphs
• Graph scheduling - run similar subgraphs in parallel
• Graph partitioning - Partition subgraphs to run on different devices
• Memory management
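Of the optimizer steps above, fusion is the easiest to show in a few lines. The following is a minimal sketch under stated assumptions: nodes are plain (op, inputs, output) tuples, the fused op name "FusedMulAdd" is made up for the example, and intermediate tensors are assumed not to be graph outputs. It is not how any particular ONNX runtime implements fusion.

```python
def fuse_mul_add(nodes):
    """Replace Mul -> Add pairs with one FusedMulAdd node.

    Each node is (op, inputs, output). A pair is fused only when the Mul
    result feeds the Add that immediately follows it and nothing else.
    """
    # Count how often each tensor name is consumed.
    uses = {}
    for _, inputs, _ in nodes:
        for name in inputs:
            uses[name] = uses.get(name, 0) + 1

    fused, i = [], 0
    while i < len(nodes):
        op, inputs, out = nodes[i]
        nxt = nodes[i + 1] if i + 1 < len(nodes) else None
        if (op == "Mul" and nxt is not None and nxt[0] == "Add"
                and out in nxt[1] and uses.get(out, 0) == 1):
            other = tuple(n for n in nxt[1] if n != out)
            fused.append(("FusedMulAdd", inputs + other, nxt[2]))
            i += 2  # both original nodes are consumed
        else:
            fused.append(nodes[i])
            i += 1
    return fused


graph = [
    ("Mul", ("x", "w"), "t0"),
    ("Add", ("t0", "b"), "t1"),
    ("Relu", ("t1",), "y"),
]
optimized = fuse_mul_add(graph)
```

Here optimized holds two nodes instead of three: the intermediate tensor t0 is never materialized, which is the memory and launch-overhead saving that fusion is after.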
18. ONNX and MLeap
• Think of ONNX as MLeap on steroids
• Both address ML model interoperability but...
• MLeap focuses on real-time scoring with Spark ML
• ONNX support for Spark ML is weak
• MLeap: Combust, the two-person company behind it, no longer supports it
• ONNX backed by large number of big ML vendors
19. Microsoft
• Focus on WinML and ML.net
• Office, Windows, Cognitive Services, Skype, Bing Ads, PowerBI
• Hundreds of millions of devices, serving billions of requests
• In 2019 MSFT announced that Windows 10 will have ONNX embedded in the OS, including the ability to run ML models natively with hardware acceleration
20. Microsoft ONNX Runtime Usage
• Used in millions of Windows devices and powers core models across Office, Bing, and Azure
• Average of 2x performance gains
• Office - 14.6x reduction in latency
• Bing QnA - 2.8x reduction in latency
• Azure Cognitive Services - 3.5x reduction in latency for OCR
21. Microsoft - BERT - ONNX
• Bidirectional Encoder Representations from Transformers
• Google’s state-of-the-art NLP model
• BERT is widely used in Bing
• MSFT open-sourced its BERT optimizations for ONNX Runtime in Jan. 2020
• 17x BERT inference acceleration with ONNX Runtime
• Scores with Nvidia V100 GPU in 1.7 milliseconds
• https://cloudblogs.microsoft.com/opensource/2020/01/21/microsoft-onnx-open-source-optimizations-transformer-inference-gpu-cpu
22. Microsoft Raven - SQL Server + ONNX
• Extending relational query processing with ML inference
○ http://cidrdb.org/cidr2020/papers/p24-karanasos-cidr20.pdf
• Project Raven
○ Deep native integration of ONNX Runtime with SQL Server and a unified IR
○ Advanced cross-optimizations between ML and database operators
• Can in-RDBMS scoring outperform dedicated frameworks?
24. Raven Concepts
• Introduces IR that includes both ML and relational operators.
• Optimize inference query that includes both data and ML operations in a holistic manner
• Leverage relational operator and data properties to optimize ML part of query
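One way such a cross-optimization could look: when a query predicate fixes the value of a feature, branches of a decision tree that test that feature become unreachable and can be dropped before scoring. The sketch below is an illustration only; the tuple-based tree encoding, the feature names and the example query are hypothetical and are not taken from the Raven implementation.

```python
def prune(tree, known):
    """Drop branches that a query predicate makes unreachable.

    tree  : ("leaf", value) or ("split", feature, threshold, left, right),
            where rows with row[feature] <= threshold go left.
    known : feature -> constant fixed by an equality predicate in the query.
    """
    if tree[0] == "leaf":
        return tree
    _, feature, threshold, left, right = tree
    if feature in known:
        # The predicate decides this test for every row, so keep one side only.
        return prune(left if known[feature] <= threshold else right, known)
    return ("split", feature, threshold, prune(left, known), prune(right, known))


def predict(tree, row):
    """Walk the tree for one row (a dict of feature values)."""
    while tree[0] == "split":
        _, feature, threshold, left, right = tree
        tree = left if row[feature] <= threshold else right
    return tree[1]


# Hypothetical model and query: SELECT predict(...) FROM t WHERE region = 1
TREE = ("split", "region", 0.5,
        ("split", "age", 40, ("leaf", 0.1), ("leaf", 0.3)),
        ("split", "income", 50_000, ("leaf", 0.6), ("leaf", 0.9)))

pruned = prune(TREE, {"region": 1})
```

After pruning, the model no longer reads region or age at all, so the relational side of the plan could also skip fetching the age column for this query.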
25. Raven Operator Sets
• Relational algebra (RA)
• Linear Algebra (LA)
• Other ML operators and data featurizers (MLD) - classical non-NN frameworks such as sklearn
• UDFs - Used to wrap the non-optimizable code as a black box
26. Raven Inference Engine
• Inference execution modes
○ In-process execution (Raven)
○ Out-of-process execution (Raven Ext)
○ Containerized execution
• For small data sets Raven is slower than ONNX Runtime
• For large data sets Raven is 5x faster
27. Microsoft - SQL Database Edge
• Deploy and make predictions with an ONNX model in SQL Database Edge Preview - 2019-11-04
○ https://docs.microsoft.com/en-us/azure/sql-database-edge/deploy-onnx
• Machine learning and AI with ONNX in SQL Database Edge Preview - 2019-11-07
○ https://docs.microsoft.com/en-us/azure/sql-database-edge/onnx-overview
28. AWS
• ONNX is already integrated with MXNet
• ONNX installed on AWS Deep Learning AMIs (DLAMI)
• New Inferentia chip supports ONNX
• Amazon Elastic Inference supports ONNX
• Model Server for Apache MXNet (MMS)
• Score with ONNX.js using Lambda and Serverless
29. Facebook
• PyTorch 1.0 has had native ONNX export since May 2018
• Has not been nearly as active recently as Microsoft
• But is quietly contributing to ONNX GitHub
• https://www.facebook.com/onnxai
• https://ai.facebook.com/blog/onnx-expansion-speeds-ai-development-
30. Nvidia
• TensorRT - SDK for high-performance DL inferencing
• Nvidia GPU Cloud ONNX support for TensorRT in Dec. 2017
• ONNX Runtime support for TensorRT in Dec. 2018
• TensorRT backend for ONNX
○ https://github.com/onnx/onnx-tensorrt
• Jetson NANO
33. Apple
• Production-grade Core ML to ONNX conversion
• https://github.com/onnx/onnx-coreml
• https://apple.github.io/coremltools/
34. ONNX and Spark ML
• Spark ML is not advertised as ONNX supported
• Conversion project does exist:
○ https://github.com/onnx/onnxmltools
• Very few examples
• Preliminary testing reveals problems
• Opportunity to contribute!
35. ONNX and MLflow
• ONNX support introduced in MLflow 1.5.0
• Convert model to ONNX format
• Save ONNX model as ONNX flavor
• No automatic ONNX model logging as there is for MLeap
• Scoring: use ONNX Runtime or convert to native flavor
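The save-and-score flow above might look roughly like this. It is a sketch under assumptions: the mlflow, onnx and onnxruntime packages are installed, the model has already been converted and saved to a .onnx file, and the MLflow version is recent enough that log_model returns a ModelInfo object with a model_uri attribute. The function name log_and_reload and the artifact path "onnx-model" are made up for the example.

```python
def log_and_reload(onnx_path, artifact_path="onnx-model"):
    """Log an existing .onnx file as an MLflow ONNX flavor and reload it."""
    # Imported here so the module loads even without these optional packages.
    import mlflow
    import mlflow.onnx
    import mlflow.pyfunc
    import onnx

    onnx_model = onnx.load(onnx_path)

    # No autologging step is used: the converted model is logged explicitly.
    with mlflow.start_run():
        model_info = mlflow.onnx.log_model(onnx_model, artifact_path)

    # Assumes a recent MLflow where log_model returns ModelInfo.
    model_uri = model_info.model_uri

    # Option 1: get the ONNX ModelProto back, e.g. to hand to ONNX Runtime.
    native = mlflow.onnx.load_model(model_uri)
    # Option 2: generic pyfunc wrapper, which scores through ONNX Runtime.
    scorer = mlflow.pyfunc.load_model(model_uri)
    return native, scorer
```

With the pyfunc wrapper, scoring is a call to scorer.predict(...) on a pandas DataFrame or numpy input, so callers do not need to know the model is stored as ONNX.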