Developing and deploying AI solutions on the cloud using Team Data Science Process (TDSP) and Azure Machine Learning (AML)
Presented at the Global Big AI Conference (GAIC), Santa Clara, January 18, 2018
How to design your ML application to be production-ready from day one
How to switch from notebooks to deployable and maintainable software
How to deploy, serve and monitor prediction pipelines
How to re-train models in production
How to shift the machine learning experimentation phase to production
Feature drift monitoring as a service for machine learning models at scale (Noriaki Tatsumi)
In this talk, you’ll learn about techniques used to build feature drift detection as a service for your enterprise and beyond. Feature drift monitoring checks the volatility of machine learning model inputs. It can trigger investigations into potential model degradation and explain why models have shifted.
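To make the idea concrete, here is a minimal, hypothetical sketch (not the speaker's actual system) of one common drift check: compare a live sample of a feature against its training-time baseline with a two-sample Kolmogorov-Smirnov test, flagging drift when the statistic exceeds the asymptotic threshold (the 1.358 coefficient corresponds to a 0.05 significance level).

```python
import random

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap
    between the empirical CDFs of samples a and b."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            i += 1
        else:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d

def drifted(baseline, live, coeff=1.358):  # 1.358 ~ significance level 0.05
    """Flag drift when the KS statistic exceeds the asymptotic threshold."""
    n, m = len(baseline), len(live)
    return ks_statistic(baseline, live) > coeff * ((n + m) / (n * m)) ** 0.5

random.seed(0)
baseline = [random.gauss(0.0, 1.0) for _ in range(3000)]  # training-time values
shifted = [random.gauss(1.0, 1.0) for _ in range(3000)]   # mean has moved

print(drifted(baseline, baseline))  # False: identical samples never drift
print(drifted(baseline, shifted))   # True: the input distribution has shifted
```

A production service would run a check like this per feature on a schedule and open an investigation when it fires.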
Rsqrd AI: How to Design a Reliable and Reproducible Pipeline (Sanjana Chowdhury)
In this talk, David Aronchick, co-founder of Kubeflow and Microsoft's Head of Open Source ML, talks about designing reproducible and reliable ML pipelines. He covers the importance and impact of MLOps and the use of metadata in pipelines, and introduces MLSpecLib, a library he wrote to help with this problem.
**These slides are from a talk given at Rsqrd AI. Learn more at rsqrdai.org**
Tutorial for Machine Learning 101 (an all-day tutorial at Strata + Hadoop World, New York City, 2015)
The course is designed to introduce machine learning via real applications, such as building a recommender and image analysis using deep learning.
In this talk we cover deployment of machine learning models.
Feature Store as a Data Foundation for Machine Learning (Provectus)
Looking to design and build a centralized, scalable Feature Store for your Data Science & Machine Learning teams to take advantage of? Come learn how from experts at Provectus and Amazon Web Services (AWS)!
A Feature Store is a key component of the ML stack and data infrastructure that enables feature engineering and management. With a Feature Store, organizations can save massive amounts of resources, innovate faster, and drive ML processes at scale. In this webinar, you will learn how to build a Feature Store with a data mesh pattern, how to achieve consistency between real-time and training features, and how to improve reproducibility with time travel for data.
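To picture the "time travel for data" idea, here is a minimal sketch (an illustrative assumption, not Provectus's implementation) of the point-in-time lookup a feature store performs so that training features match what was actually known online at label time:

```python
from bisect import bisect_right

# (timestamp, value) history for a single feature, sorted by timestamp;
# the data is invented for illustration
feature_history = [(1, 0.2), (5, 0.9), (9, 0.4)]
timestamps = [t for t, _ in feature_history]

def as_of(ts):
    """Latest feature value recorded at or before time ts (None if none yet).
    Training joins use the label's timestamp here, so an update written
    after the label can never leak into a training row."""
    idx = bisect_right(timestamps, ts) - 1
    return feature_history[idx][1] if idx >= 0 else None

print(as_of(7))  # 0.9: the update at t=9 is invisible at t=7
print(as_of(0))  # None: the feature did not exist yet
```

Serving online features and building training sets through the same as-of lookup is what keeps real-time and training features consistent.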
Agenda
- Modern Data Lakes & Modern ML Infrastructure
- Existing and Emerging Architectural Shifts
- Feature Store: Overview and Reference Architecture
- AWS Perspective on Feature Store
Intended Audience
Technology executives & decision makers, manager-level tech roles, data architects & analysts, data engineers & data scientists, ML practitioners & ML engineers, and developers
Presenters
- Stepan Pushkarev, Chief Technology Officer, Provectus
- Gandhi Raketla, Senior Solutions Architect, AWS
- German Osin, Senior Solutions Architect, Provectus
Feel free to share this presentation with your colleagues and don't hesitate to reach out to us at info@provectus.com if you have any questions!
REQUEST WEBINAR: https://provectus.com/webinar-feature-store-as-data-foundation-for-ml-nov-2020/
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/2S7lDiS.
Sasha Rosenbaum shows how a CI/CD pipeline for Machine Learning can greatly improve both productivity and reliability. Filmed at qconsf.com.
Sasha Rosenbaum is a Program Manager on the Azure DevOps engineering team, focused on improving the alignment of the product with open source software. She is a co-organizer of the DevOps Days Chicago and the DeliveryConf conferences, and recently published a book on Serverless computing in Azure with .NET.
Data Summit Connect Fall 2020 - The Rise of DataOps (Ryan Gross)
Data governance teams attempt to apply manual controls at various points to ensure the consistency and quality of the data. By thinking of machine learning data pipelines as compilers that convert data into executable functions, and by leveraging data version control, data governance and engineering teams can engineer the data together: filing bugs against data versions, applying quality-control checks to the data compilers, and more. This talk illustrates how these innovations are poised to drive process and cultural changes in data governance, leading to order-of-magnitude improvements.
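One way to picture "filing bugs against data versions" is to content-address each dataset snapshot, so that a failed quality check can cite an immutable version id the same way a bug report cites a code commit. A minimal, hypothetical sketch (the null-age `quality_check` rule is invented for illustration):

```python
import hashlib
import json

def dataset_version(rows):
    """Deterministic, content-addressed version id for a dataset snapshot."""
    payload = json.dumps(rows, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()[:12]

def quality_check(rows):
    """An invented example check: report row indices with a null 'age'."""
    return [i for i, row in enumerate(rows) if row.get("age") is None]

rows = [{"id": 1, "age": 34}, {"id": 2, "age": None}]
bad = quality_check(rows)
if bad:
    # The bug report cites the immutable data version, like a code commit.
    print(f"bug: null 'age' at rows {bad} in data version {dataset_version(rows)}")
```

Because the version id is derived from the content, the same snapshot always yields the same id, so the bug stays reproducible even as the pipeline keeps running.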
Monitoring AI applications with AI
The best-performing offline algorithm can lose in production. The most accurate model does not always improve business metrics. An environment misconfiguration or an upstream data pipeline inconsistency can silently kill model performance. Neither prod-ops, data science, nor engineering teams are equipped to detect, monitor and debug these types of incidents.
Could Microsoft have tested the Tay chatbot in advance, then monitored and adjusted it continuously in production, to prevent its unexpected behaviour? Mission-critical AI systems require an advanced monitoring and testing ecosystem that enables continuous and reliable delivery of machine learning models and data pipelines into production. Common production incidents include:
- Data drift, new data, wrong features
- Vulnerability issues, malicious users
- Concept drift
- Model degradation
- Biased training set / training issues
- Performance issues
In this demo-based talk we discuss a solution, tooling and architecture that allow a machine learning engineer to be involved in the delivery phase and to take ownership of the deployment and monitoring of machine learning pipelines.
This approach lets data scientists safely deploy early results as end-to-end AI applications in a self-serve mode, without assistance from engineering and operations teams. It shifts the experimentation and even training phases from offline datasets to live production, and closes the feedback loop between research and production.
The technical part of the talk covers the following topics:
- Automatic data profiling
- Anomaly detection
- Clustering of model inputs and outputs
- A/B testing
- Service mesh, Envoy Proxy, traffic shadowing
- Stateless and stateful models
- Monitoring of regression, classification and prediction models
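As a rough illustration of the data profiling and anomaly detection topics above (a sketch under assumed statistics, not the presented tooling): record simple per-feature statistics at training time, then flag live inputs that fall far outside the recorded profile.

```python
import statistics

def build_profile(values):
    """Record simple statistics for one feature at training time."""
    return {"mean": statistics.fmean(values), "stdev": statistics.stdev(values)}

def is_anomalous(profile, x, z=4.0):
    """Flag a live input more than z standard deviations from the profile mean."""
    return abs(x - profile["mean"]) > z * profile["stdev"]

# Invented training-time values for a single feature
profile = build_profile([10.0, 11.0, 9.5, 10.5, 10.2])
print(is_anomalous(profile, 10.3))  # False: within the recorded profile
print(is_anomalous(profile, 55.0))  # True: likely an upstream pipeline bug
```

A check like this catches the "silent killers" mentioned above, such as a misconfigured pipeline suddenly feeding the model values on the wrong scale.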
TensorFlow London 18: Dr Alastair Moore, Towards the use of Graphical Models ... (Seldon)
Abstract: Recent developments in understanding technology diffusion and business strategy lend themselves to analysis as directed graphs. Alastair will briefly introduce a Wardley Map, a directed dependency graph situated in a metric space. He will highlight aspects of this representation that lend themselves to analysis using dynamic graphical models, and discuss some preliminary thinking about modelling aspects of a business, specifically those that depend on machine learning systems.
Bio: Alastair is a UCL Computer Science PhD (Computer Vision) with broad experience of applied Machine Learning and Statistical Analysis in a variety of settings. His background includes stints with corporate research, internet startups and universities. He was on the founding team of spin-out Satalia.com (Data Science) and venture-backed WeArePopUp.com (Real Estate contracting), and helped set up the IDEALondon innovation centre with UCL and Cisco Systems. His primary objective at Mishcon de Reya is to ensure the firm consistently uses data and machine learning techniques to support and improve its business process, and helps advise its clients on innovation and technology. Alastair continues to maintain an active teaching role in the UCL School of Management (Predictive systems), the Cambridge Judge Business School (Entrepreneurship) and Peking University, Beijing (Technology innovation). His research interests include technology strategy, smart contracting and computational law.
This presentation was made on June 9th, 2020.
Video recording of the session can be viewed here: https://youtu.be/OCB9sTUnUug
In this meetup, Sanyam Bhutani, Machine Learning Engineer at H2O.ai, gives a recap of the eighth annual ICLR (International Conference on Learning Representations) 2020, a niche deep learning conference whose focus is the study of how to learn representations of data, which is essentially what deep learning does.
Sanyam goes through a few of his favorite papers from this year’s ICLR. Note that this session may not capture the richness of every paper or allow a detailed discussion of each.
You can find Sanyam in our community Slack (https://www.h2o.ai/slack-community/); please feel free to start a discussion with him. Send an emoji greeting and you’ll get your answers.
Following are the papers we will look into:
U-GAT-IT: Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization for Image-to-Image Translation
AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty
Your classifier is secretly an energy based model and you should treat it like one
ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
Reformer: The Efficient Transformer
Generative Models for Effective ML on Private, Decentralized Datasets
Once for All: Train One Network and Specialize it for Efficient Deployment
Thieves on Sesame Street! Model Extraction of BERT-based APIs
Plug and Play Language Models: A Simple Approach to Controlled Text Generation
BatchEnsemble: An Alternative Approach to Efficient Ensemble and Lifelong Learning
Real or Not Real, that is the Question
When you wake up in the morning, you probably unlock your smartphone with your fingerprint, talk to it in your own language to open your email or agenda or weather apps, ask for a recommendation for a meeting later in the day and look for the shortest path to its location. Our lives are being reshaped thanks to the amount of available data, to the computing capabilities, to Machine Learning (ML) and recently Deep Learning (DL) algorithms.
How does an ML algorithm work? What steps make an ML project successful? What should one do to apply DL? Is ML hard to learn? Is it hard to apply?
Agile, Automated, Aware: How to Model for Success (Inside Analysis)
The Briefing Room with David Loshin and Embarcadero
Live Webcast October 27, 2015
Watch the archive: https://bloorgroup.webex.com/bloorgroup/onstage/g.php?MTID=eea9877b71c653c499c809c5693eae8fe
Data management teams face some tough challenges these days. Organizations need business-driven visibility that enables understanding and awareness of enterprise data assets – without worrying about definitions and change management. But with information architectures evolving into a hybrid mix of data objects and data services built over relational databases as well as big data stores, serving up accurately defined, reusable data can become a complex issue.
Register for this episode of The Briefing Room to learn from veteran Analyst David Loshin as he explains the importance of agile, automated workflows in today’s enterprise. He’ll be briefed by Ron Huizenga of Embarcadero, who will discuss how his company’s ER/Studio suite approaches data modeling and management from a modern architecture standpoint. He will explain that unifying the way information is represented can not only eliminate the need for costly workarounds, but also foster collaboration between data architects, developers and business users.
Visit InsideAnalysis.com for more information.
CD4ML and the challenges of testing and quality in ML systems (Seldon)
Speaker: Danilo Sato, principal consultant at ThoughtWorks.
Bio: Danilo Sato (@dtsato) is a principal consultant at ThoughtWorks with experience in many areas of architecture and engineering: software, data, infrastructure, and machine learning. He is the author of "DevOps in Practice: Reliable and Automated Software Delivery" and a member of the ThoughtWorks Technology Advisory Board and the ThoughtWorks Office of the CTO.
Title: CD4ML and the challenges of testing and quality in ML systems
Abstract: Continuous Delivery for Machine Learning (CD4ML) deals with the challenges of applying Continuous Delivery principles to ML systems to make the end-to-end process of developing and deploying them more repeatable and reliable. These systems are generally more complex than traditional software applications, and ML models are non-deterministic and hard to explain. In this talk we will discuss the challenges of testing and quality in ML systems, and share some practices for applying different types of tests to help overcome those issues.
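One concrete example of the kind of test such a pipeline adds (a hedged sketch, not material from the talk) is a quality gate that fails the build when a candidate model underperforms the current production model on held-out data:

```python
def accuracy(predictions, labels):
    """Fraction of predictions matching the held-out labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

def quality_gate(candidate_preds, labels, production_accuracy, tolerance=0.01):
    """Pass only if the candidate is no worse than production, within tolerance.
    Because models are non-deterministic, the gate tests a behavioural
    threshold rather than exact outputs."""
    return accuracy(candidate_preds, labels) >= production_accuracy - tolerance

# Invented held-out labels and candidate predictions (7 of 8 correct)
labels = [1, 0, 1, 1, 0, 1, 0, 0]
candidate = [1, 0, 1, 0, 0, 1, 0, 0]
print(quality_gate(candidate, labels, production_accuracy=0.85))  # True
print(quality_gate(candidate, labels, production_accuracy=0.95))  # False
```

Wired into CI, a failing gate blocks deployment the same way a failing unit test blocks a code merge.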
www.devopsinpractice.com
www.devopsnapratica.com.br
Introduction to Machine Learning (WeCloudData)
In this talk, WeCloudData introduces the lifecycle of machine learning and its tools/ecosystems. For more detail about WeCloudData's machine learning course please visit: https://weclouddata.com/data-science/
Why do the majority of Data Science projects never make it to production? (Itai Yaffe)
María de la Fuente (Solutions Architect Manager for IMEA) @ Databricks
While most companies understand the value of leveraging data and are adopting an AI strategy, only 13% of data science projects make it to production successfully.
Besides the well-known skills gap in the market, we need to level up our end-to-end approach and cover all aspects involved when working with AI.
In this session, we will discuss the main obstacles to overcome and how we can avoid the major pitfalls to ensure our data science journey becomes successful.
Scaling & Managing Production Deployments with H2O ModelOps (Sri Ambati)
This presentation was made on June 30th, 2020.
Recording of the presentation is available here: https://youtu.be/9LajqAL_CU8
As enterprises “make their own AI”, a new set of challenges emerges. Maintaining the reproducibility, traceability, and verifiability of machine learning models is key, as is recording experiments, tracking insights, and reproducing results. Collaboration between teams is also necessary as “model factories” are created for enterprise-wide data science efforts. Additionally, monitoring ensures that drift or performance degradation is addressed with retraining or model updates. Finally, data and model lineage is necessary for rollbacks and for regulatory compliance.
H2O ModelOps delivers centralized catalog and management, deployment, monitoring, collaboration, and administration of machine learning models. In this webinar, we learn how H2O can assist with operationalizing, scaling and managing production deployments.
Speaker's Bio:
Felix is part of the Customer Success team in Asia Pacific at H2O.ai. An engineer and an IIM alumnus, Felix has held prominent positions in the data science industry.
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use... (Sri Ambati)
Numerai is an open, crowd-sourced hedge fund powered by predictions from data scientists around the world. In return, participants are rewarded with weekly payouts in crypto.
In this talk, Joe will give an overview of the Numerai tournament based on his own experience. He will then explain how he automates the time-consuming tasks such as testing different modelling strategies, scoring new datasets, submitting predictions to Numerai as well as monitoring model performance with H2O Driverless AI and R.
Developing and deploying NLP services on the cloud using Azure ML and the Tea... (Debraj GuhaThakurta)
Developing and deploying NLP services on the cloud using Azure ML and the Team Data Science Process. Presentation at Global AI Conference, Seattle, April 28, 2018
Machine learning in the cloud influences operations company-wide. Learn from data scientist Ahmed Sherif how to leverage cloud offerings including AWS, IBM Cloud, and Microsoft Azure.
Big Data Advanced Analytics on Microsoft Azure (Mark Tabladillo)
This presentation provides a survey of the advanced analytics strengths of Microsoft Azure from an enterprise perspective (with these organizations being the bulk of big data users) based on the Team Data Science Process. The talk also covers the range of analytics and advanced analytics solutions available for developers using data science and artificial intelligence from Microsoft Azure.
This presentation anchors best practices for Enterprise Data Science based on Microsoft's "Team Data Science Process". The talk includes introducing the concepts, describing some real-world advice for project planning, and discusses typical titles of professionals who make enterprise data science successful. These techniques also apply for AI (artificial intelligence), deep learning, machine learning, and advanced analytics.
How to build your own Delve: combining machine learning, big data and SharePointJoris Poelmans
You are experiencing the benefits of machine learning every day through product recommendations on Amazon & Bol.com, credit card fraud prevention, etc. So how can we leverage machine learning together with SharePoint and Yammer? We will first look into the fundamentals of machine learning and big data solutions, and next we will explore how we can combine tools such as Windows Azure HDInsight, R, and Azure Machine Learning to extend and support collaboration and content management scenarios within your organization.
Architecting an Open Source AI Platform 2018 editionDavid Talby
How to build a scalable AI platform using open source software. The end-to-end architecture covers data integration, interactive queries & visualization, machine learning & deep learning, deploying models to production, and a full 24x7 operations toolset in a high-compliance environment.
In this deck from the 2019 UK HPC Conference, Glyn Bowden from HPE presents: The Eco-System of AI and How to Use It.
"This presentation walks through HPE's current view on AI applications, where it is driving outcomes and innovation, and where the challenges lay. We look at the eco-system that sits around an AI project and look at ways this can impact the success of the endeavor."
Watch the video: https://wp.me/p3RLHQ-kVS
Learn more: https://www.hpe.com/us/en/solutions/artificial-intelligence.html
and
http://hpcadvisorycouncil.com/events/2019/uk-conference/agenda.php
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Data Scientists and Machine Learning practitioners nowadays seem to be churning out models by the dozen, continuously experimenting to find ways to improve their accuracy. They also use a variety of ML and DL frameworks and languages, and a typical organization may find that this results in a heterogeneous, complicated collection of assets that require different types of runtimes, resources and sometimes even specialized compute to operate efficiently.
But what does it mean for an enterprise to actually take these models to "production"? How does an organization scale inference engines out and make them available for real-time applications without significant latencies? Different techniques are needed for batch (offline) inference and instant, online scoring. Data needs to be accessed from various sources, and cleansing and transformation of data need to be enabled prior to any predictions. In many cases, there may be no substitute for customized data handling with scripting either.
Enterprises also require auditing, authorization and approval processes built in, while still supporting a "continuous delivery" paradigm whereby a data scientist can enable insights faster. Not all models are created equal, nor are consumers of a model – so enterprises require both metering and allocation of compute resources for SLAs.
In this session, we will take a look at how machine learning is operationalized in IBM Data Science Experience (DSX), a Kubernetes based offering for the Private Cloud and optimized for the HortonWorks Hadoop Data Platform. DSX essentially brings in typical software engineering development practices to Data Science, organizing the dev->test->production for machine learning assets in much the same way as typical software deployments. We will also see what it means to deploy, monitor accuracies and even rollback models & custom scorers as well as how API based techniques enable consuming business processes and applications to remain relatively stable amidst all the chaos.
Speaker
Piotr Mierzejewski, Program Director Development IBM DSX Local, IBM
Building Data Science into Organizations: Field ExperienceDatabricks
We will share our experiences in building Data Science and Machine Learning (DS/ML) into organizations. As new DS/ML teams are created, many wrestle with questions such as: How can we most efficiently achieve short-term goals while planning for scale and production long-term? How should DS/ML be incorporated into a company?
We will bring unique perspectives: one as a previous Databricks customer leading a DS team, one as the second ML engineer at Databricks, and both as current Solutions Architects guiding customers through their DS/ML journeys. We will cover best practices through the crawl-walk-run journey of DS/ML: how to immediately become more productive with an initial team, how to scale and move towards production when needed, and how to integrate effectively with the broader organization.
This talk is meant for technical leaders who are building new DS/ML teams or helping to spread DS/ML practices across their organizations. Technology discussion will focus on Databricks, but the lessons apply to any tech platforms in this space.
Makine Öğrenmesi, Yapay Zeka ve Veri Bilimi Süreçlerinin Otomatikleştirilmesi...Ali Alkan
Makine Öğrenmesi, Yapay Zeka ve Veri Bilimi Süreçlerinin Otomatikleştirilmesi | Automating Machine Learning, Artificial Intelligence, and Data Science | Guided Analytics
Data science holds tremendous potential for organizations to uncover new insights and drivers of revenue and profitability. Big Data has brought the promise of doing data science at scale to enterprises; however, this promise also comes with challenges for data scientists to continuously learn and collaborate. Data scientists have many tools at their disposal, such as notebooks like Jupyter and Apache Zeppelin, IDEs such as RStudio, languages like R, Python and Scala, and frameworks like Apache Spark. Given all the choices, how do you best collaborate to build your model and then work through the development lifecycle to deploy it from test into production?
In this session learn the attributes of a modern data science platform that empowers data scientists to build models using all the data in their data lake and foster continuous learning and collaboration. We will show a demo of DSX with HDP with the focus on integration, security and model deployment and management.
Speakers:
Sriram Srinivasan, Senior Technical Staff Member, Analytics Platform Architect, IBM
Vikram Murali, Program Director, Data Science and Machine Learning, IBM
The catalyst for the success of automobiles came not through the invention of the car but rather through the establishment of an innovative assembly line. History shows us that the ability to mass produce and distribute a product is the key to driving adoption of any innovation, and machine learning is no different. MLOps is the assembly line of Machine Learning and in this presentation we will discuss the core capabilities your organization should be focused on to implement a successful MLOps system.
Driving Enterprise Adoption: Tragedies, Triumphs and Our NEXTDataWorks Summit
Standard Bank is a leading South African bank with a vision to be the leading financial services organization in and for Africa. We will share our vision, greatest challenges, and most valuable lessons learned on our journey towards enterprise adoption of a big data strategy.
This includes our implementation of: a multi-tenant enterprise data lake, a real time streaming capability, appropriate data management and governance principles, a data science workbench, and a process for model productionisation to support data science teams across the Group and across Africa and Europe.
Speakers
Zakeera Mahomen, Standard Bank, Big Data Practice Lead
Kristel Sampson, Standard Bank, Platform Lead
Think of big data as all data, no matter what the volume, velocity, or variety. The simple truth is a traditional on-prem data warehouse will not handle big data. So what is Microsoft's strategy for building a big data solution? And why is it best to have this solution in the cloud? That is what this presentation will cover. Be prepared to discover all the various Microsoft technologies and products, from collecting data, transforming it, storing it, to visualizing it. My goal is to help you not only understand each product but understand how they all fit together, so you can be the hero who builds your company's big data solution.
Scaling up with Cisco Big Data: Data + Science = Data ScienceeRic Choo
Discover how IoT data is ingested with real-time technologies such as Apache Spark and analysed using Data Science techniques to glean insights.
Adjusting primitives for graph : SHORT REPORT / NOTESSubhajit Sahu
Graph algorithms, like PageRank, typically operate on Compressed Sparse Row (CSR), an adjacency-list based graph representation. The notes benchmark the following primitives:
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. for more details visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos by creating a distributed data ownership model with clear ownership and responsibilities.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues.
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.
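The automated data validation idea above can be sketched in a few lines of Python. This is an illustrative sketch only: the `validate` helper, its column-dict input format, and the 5% null threshold are assumptions for the example, not a reference to any particular tool.

```python
def validate(columns: dict, max_null_frac: float = 0.05) -> list:
    """Return human-readable data-quality issues found in a table.

    columns maps each column name to its list of values; rows are aligned
    by position. Thresholds here are illustrative assumptions.
    """
    issues = []
    # Check 1: per-column null fraction against a threshold.
    for name, values in columns.items():
        frac = sum(v is None for v in values) / len(values)
        if frac > max_null_frac:
            issues.append(f"{name}: {frac:.0%} nulls exceeds {max_null_frac:.0%} threshold")
    # Check 2: exact duplicate rows, a common upstream ingestion error.
    rows = list(zip(*columns.values()))
    dupes = len(rows) - len(set(rows))
    if dupes:
        issues.append(f"{dupes} duplicate row(s)")
    return issues
```

Run at ingestion time, a check like this surfaces errors at the source, before they propagate downstream.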
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...John Andrews
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
Techniques to optimize the pagerank algorithm usually fall in two categories. One is to try reducing the work per iteration, and the other is to try reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged has the potential to save iteration time. Skipping in-identical vertices, with the same in-links, helps reduce duplicate computations and thus could help reduce iteration time. Road networks often have chains which can be short-circuited before pagerank computation to improve performance. Final ranks of chain nodes can be easily calculated. This could reduce both the iteration time, and the number of iterations. If a graph has no dangling nodes, pagerank of each strongly connected component can be computed in topological order. This could help reduce the iteration time, no. of iterations, and also enable multi-iteration concurrency in pagerank computation. The combination of all of the above methods is the STICD algorithm. [sticd] For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
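One of the techniques above, skipping computation for vertices whose ranks have already converged, can be sketched in plain Python. This is an illustrative sketch, not the STICD implementation: the dict-based graph format, damping factor, and tolerance are assumptions, and it assumes no dangling nodes (every vertex has at least one out-link).

```python
def pagerank(out_links, d=0.85, tol=1e-6, max_iter=100):
    """Power-iteration PageRank that skips already-converged vertices.

    out_links maps each vertex to the list of vertices it links to.
    Assumes no dangling nodes (every vertex has out-links).
    """
    n = len(out_links)
    # Build in-links once, so each vertex can pull rank from its sources.
    in_links = {v: [] for v in out_links}
    for u, outs in out_links.items():
        for v in outs:
            in_links[v].append(u)
    rank = {v: 1.0 / n for v in out_links}
    converged = set()
    for _ in range(max_iter):
        new_rank = {}
        for v in out_links:
            if v in converged:  # skip vertices whose ranks have settled
                new_rank[v] = rank[v]
                continue
            s = sum(rank[u] / len(out_links[u]) for u in in_links[v])
            new_rank[v] = (1 - d) / n + d * s
            if abs(new_rank[v] - rank[v]) < tol:
                converged.add(v)
        rank = new_rank
        if len(converged) == n:
            break
    return rank
```

Marking settled vertices saves iteration time exactly as described; the other optimizations (chain short-circuiting, per-component computation in topological order) build on the same loop.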
Developing and deploying AI solutions on the cloud using Team Data Science Process (TDSP) and Azure Machine Learning (AML), Presented @ GAIC Jan 18 2018, Santa Clara
1. Developing and deploying AI solutions on the cloud using Team Data Science Process (TDSP) and Azure Machine Learning (AML)
Global Artificial Intelligence Conference
January 18, 2018, Santa Clara, CA
Deck available @: https://aka.ms/tdsp-presentations
3. Agile and iterative process to develop, deploy and manage AI applications in the cloud
https://aka.ms/tdsp
tdsp-feedback@microsoft.com
Lifecycle: Business Understanding → Data acquisition & understanding → Modeling (ML/AI) → Deployment
4. The opportunity and challenge of data science in enterprises
Opportunity: 17% had a well-developed Predictive/Prescriptive Analytics program in place, while 80% planned on implementing such a program within five years – Dataversity 2015 Survey
Challenge: Only 27% of big data projects are regarded as successful – Capgemini 2014
Tools & data platforms have matured – still a major gap in executing on the potential
5. One reason: Process challenge in Data Science
Organization, Collaboration, Quality, Knowledge Accumulation, Agility
• Global Teams: geographic locations
• Team Growth: onboard new members rapidly
• Varied Use Cases: industries and use cases
• Diverse DS Backgrounds: data scientists have diverse backgrounds and experiences with tools and languages
6. Why is a process useful for Data Science?
A process is a detailed sequence of activities necessary to perform specific business tasks. It is used to standardize procedures and establish best practices. Technology and tools are changing rapidly; a standardized process can provide continuity and stability of workflow.
– Based on discussions with Luis Morinigo, Dir. IoT, New Signature
8. TDSP objective
Integrate DevOps with data science workflows to improve collaboration, quality, robustness and efficiency in data science projects:
o Infrastructure as Code (IaC)
o Build
o Test
o CI/CD
o Release management
o Performance monitoring
9. TDSP features for data science teams
Standardized Data Science Lifecycle
Project Structure, Templates & Roles
Infrastructure & Toolkits
Re-usable Data Science Utilities
Project Execution (incl. DevOps components)
11. TDSP lifecycle stages can be integrated with specific deliverables & checkpoints
Lifecycle: Business Understanding → Data acquisition and understanding → Modeling → Deployment
Business Understanding deliverables: Project Objective; Data, Target & Feature Definition; Data Dictionary
12. Project roles & tasks
Personas: account managers, PMs, DSs, SWEs, architects
Governance and Project Management / AI Developers
Structure, templates, roles
NOTE: Specific roles depend on organizations
14. Agile work planning and execution template
o Use Agile work planning & execution template (data science specific)
Scrum framework (image: https://commons.wikimedia.org)
Project execution guidelines
17. Re-usable data science utilities: Analytics
Interactive data exploration and reporting – IDEAR (Python, R, MRS)
o Data quality assessment
o Getting business insights from the data
o Association between variables
o Generating standardized data quality reports automatically
(Examples shown: clustering, distribution assessment)
https://github.com/Azure/Azure-TDSP-Utilities
DS utilities
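One of the IDEAR-style checks listed above, association between variables, can be illustrated with a small Pearson-correlation report. This is not IDEAR's actual code (that lives in the GitHub repo linked above); the `association_report` helper and its column-dict input format are assumptions, sketching only the kind of summary such utilities automate.

```python
from itertools import combinations
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def association_report(columns):
    """Pairwise correlations for every pair of numeric columns.

    columns maps each column name to its list of values.
    """
    return {(a, b): round(pearson(columns[a], columns[b]), 3)
            for a, b in combinations(columns, 2)}
```

A utility like IDEAR generates such pairwise summaries (plus plots) automatically for every dataset, rather than ad hoc per project.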
18. Re-usable data science utilities: Analytics – modeling
Automated modeling and reporting – AMAR (R)
(Examples shown: predicted vs. actual and feature importance, for multiple algorithms)
https://github.com/Azure/Azure-TDSP-Utilities
DS utilities
22. Adoption: How to stage (if needed)
Level 1
- One git repository per project
- Standard directory structure
- Standardized templates like charter, exit reports
- Planning and tracking of work items
Level 2
- Customize templates to fit team needs
- Create shared team utility repo (like IDEAR, AMAR)
Level 3
- Develop process to graduate code from projects to the shared team utility repo
- Develop E2E worked-out templates
- Use mature work planning and tracking system (e.g. Agile)
Level 4
- Link git branch with work items
- Code review
- Manage and version model and data assets
- Develop automated testing framework
- Develop automated CI/CD
23. Adoption: Customers
Microsoft internal
• Microsoft consulting services (MCS)
• AI & R Cloud Platform: Algorithm and data sciences team
• Windows Devices DS team
• …
External partners
• New Signature
• BlueGranite
25. Apps + insights (architecture diagram)
Data sources (Social, LOB, Graph, IoT, Image, CRM) → INGEST (data orchestration and monitoring) → STORE (data lake and storage) → PREP & TRAIN (Hadoop/Spark/SQL and ML) → MODEL & SERVE (Azure Machine Learning, IoT)
26. Bring AI everywhere
Build with the tools and platforms you know
Build, deploy, and manage at scale
Boost productivity with agile development
Benefit from the fastest AI developer cloud
27. Build, deploy, and manage at scale
Build and deploy everywhere – cloud, on-premises, edge, and in-data
Deploy in minutes with data-driven management and retraining of all your models
Prototype locally, then scale up and out with VMs, Spark clusters, and GPUs
Real-time, high-throughput insights everywhere, including Excel integration
28. Manage models
Deployment and management of models as HTTP services
Container-based hosting of real-time and batch processing
Management and monitoring through Azure (e.g., AppInsights)
First-class support for SparkML, Python, CNTK, TF, TLC, R; extensible to support others (Caffe, MXNet)
Service authoring in Python and .NET Core
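The model-as-HTTP-service pattern this slide describes can be sketched with the Python standard library. The `score` function is a placeholder model, and the port and JSON payload shape are illustrative assumptions; a real AML deployment wraps a serialized model in a container behind a similar endpoint.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def score(features):
    """Placeholder model: a fixed linear combination of the input features."""
    weights = [0.4, 0.6]  # illustrative, stands in for a trained model
    return sum(w * x for w, x in zip(weights, features))

class ScoreHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON request body, e.g. {"features": [1.0, 2.0]}.
        body = self.rfile.read(int(self.headers["Content-Length"]))
        features = json.loads(body)["features"]
        # Return the prediction as JSON.
        payload = json.dumps({"prediction": score(features)}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)

# To serve: HTTPServer(("", 8080), ScoreHandler).serve_forever()
```

POSTing `{"features": [1.0, 1.0]}` to the running service returns `{"prediction": ...}`; container-based hosting then packages this process with the serialized model for deployment.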
30. Boost productivity with agile development
Collaboration and sharing with notebooks and Git
Version control and reproducibility
Metrics, lineage, run history, asset management
31. Build with tools and platforms you know
Choose between visual drag-and-drop or code-first authoring
Use your favorite IDEs
Use any framework or library with the most popular languages
Train quicker and easier with industry-leading Spark and GPUs
32. Use your favorite IDE
Leverage all types of platforms and tools/libraries
Use what you want: use the most popular innovations, use any tool, use any framework or library
35. TDSP – AML worked-out samples in AI
https://docs.microsoft.com/en-us/azure/machine-learning/team-data-science-process/walkthroughs
36. Sample: Sentiment-specific word embeddings (SSWE) improve classification of sentiment
https://docs.microsoft.com/en-us/azure/machine-learning/team-data-science-process/predict-twitter-sentiment
http://www.aclweb.org/anthology/P14-1146
37. Summary
A well-defined data science process is critical to bridge the gap between opportunities and challenges, and to put AI models in production to impact businesses.
A combination of appropriate process (e.g. TDSP), tools (e.g. AML) and data platforms (e.g. DSVM) is important for efficient development and deployment of AI solutions.
TDSP – process; AML – tool. A process such as TDSP can be used with a variety of tools and data platforms.
Resources are available to use TDSP and AML together, along with other data platforms on Azure, to efficiently build and deploy AI solutions on the cloud.