The purpose of this presentation is to show what end-to-end machine learning looks like in a real-world enterprise. It is aimed at aspiring data scientists whose ML courses or education have mostly focused on algorithms rather than the end-to-end pipeline.
The architecture and components mentioned in Slide 11 will be discussed in detail in a series of LinkedIn posts over the next few months.
To get updates, follow me on LinkedIn or search/follow the hashtag #end2endDS. Posts will begin in August 2019 and continue through September 2019.
Guiding through a typical Machine Learning Pipeline (Michael Gerke)
Many people are talking about AI and machine learning. Here's a quick guideline on how to manage ML projects and what to consider when implementing machine learning use cases.
Machine Learning (ML) Overview: Algorithms, Use Cases and Applications (SlideTeam)
"You can download this product from SlideTeam.net"
Machine Learning (ML) Overview: Algorithms, Use Cases and Applications is aimed at mid-level managers and covers what machine learning is, how it works, its algorithms, and its use cases. It also contrasts machine learning with traditional programming so you can see how to apply ML for business growth. https://bit.ly/2ZaVSG9
Data scientists and machine learning practitioners seem to be churning out models by the dozen, continuously experimenting to improve their accuracy. They also use a variety of ML and DL frameworks and languages, and a typical organization may find that this results in a heterogeneous, complicated collection of assets that require different runtimes, resources, and sometimes even specialized compute to operate efficiently.
But what does it mean for an enterprise to actually take these models to "production"? How does an organization scale inference engines out and make them available to real-time applications without significant latency? Different techniques are needed for batch (offline) inference and for instant, online scoring. Data must be accessed from various sources, and cleansing and transformation of that data must be possible before any predictions are made. In many cases there may be no substitute for customized, scripted data handling either.
Enterprises also require built-in auditing, authorization, and approval processes, while still supporting a "continuous delivery" paradigm that lets a data scientist deliver insights faster. Not all models are created equal, nor are the consumers of a model, so enterprises require both metering and allocation of compute resources to meet SLAs.
In this session, we will look at how machine learning is operationalized in IBM Data Science Experience (DSX), a Kubernetes-based offering for the private cloud, optimized for the Hortonworks Hadoop Data Platform. DSX brings typical software engineering practices to data science, organizing the dev -> test -> production flow for machine learning assets in much the same way as typical software deployments. We will also see what it means to deploy, monitor the accuracy of, and even roll back models and custom scorers, and how API-based techniques let consuming business processes and applications remain relatively stable amid all the change.
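The dev -> test -> production flow with rollback described above can be sketched in a few lines of plain Python. The registry below is purely illustrative and does not reflect DSX's actual APIs; the model names and stages are invented for the example.

```python
# Illustrative sketch of a model registry with staged promotion and rollback.
# Every name here is hypothetical; DSX exposes this via its own UI and APIs.

class ModelRegistry:
    def __init__(self):
        self._versions = {}   # model name -> list of model objects (v1 at index 0)
        self._stage = {}      # (name, stage) -> currently promoted version number

    def register(self, name, model):
        self._versions.setdefault(name, []).append(model)
        return len(self._versions[name])      # new version number

    def promote(self, name, version, stage):
        assert 1 <= version <= len(self._versions[name])
        self._stage[(name, stage)] = version

    def rollback(self, name, stage):
        # Step the stage pointer back to the previous version.
        current = self._stage[(name, stage)]
        if current > 1:
            self._stage[(name, stage)] = current - 1

    def get(self, name, stage):
        version = self._stage[(name, stage)]
        return self._versions[name][version - 1]

registry = ModelRegistry()
registry.register("churn", lambda x: 0)       # v1: old scorer
registry.register("churn", lambda x: 1)       # v2: new scorer
registry.promote("churn", 2, "production")
registry.rollback("churn", "production")      # v2 misbehaves; back to v1
print(registry.get("churn", "production")(None))  # -> 0
```

Because consumers call `get("churn", "production")` rather than a specific version, a rollback is invisible to them, which is the stability-amid-change property the session describes.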
Speaker
Piotr Mierzejewski, Program Director Development IBM DSX Local, IBM
A short presentation for beginners introducing machine learning: what it is, how it works, the popular machine learning techniques and learning models (supervised, unsupervised, semi-supervised, reinforcement learning), and how they work, with various industry use cases and popular examples.
An AI Maturity Roadmap for Becoming a Data-Driven Organization (David Solomon)
The initial version of a maturity roadmap to help guide businesses when adopting AI technology into their workflow. IBM Watson Studio is referenced as an example of technology that can help in accelerating the adoption process.
What Is Data Science? | Introduction to Data Science | Data Science For Beginners (Simplilearn)
This Data Science presentation will help you understand what Data Science is, why we need it, the prerequisites for learning it, what a Data Scientist does, the Data Science lifecycle with an example, and career opportunities in the Data Science domain. You will also learn the differences between Data Science and Business Intelligence. The role of a data scientist is one of the sexiest jobs of the century. The demand for data scientists is high, and the number of opportunities for certified data scientists is increasing. Every day, companies look for more and more skilled data scientists, and studies show a continued shortfall in qualified candidates to fill these roles. So let us dive deep into Data Science and understand what it is all about.
This Data Science Presentation will cover the following topics:
1. The need for Data Science
2. What is Data Science?
3. Data Science vs Business Intelligence
4. Prerequisites for learning Data Science
5. What does a Data Scientist do?
6. The Data Science life cycle, with a use case
7. Demand for Data Scientists
This Data Science with Python course will establish your mastery of data science and analytics techniques using Python. With this Python for Data Science Course, you’ll learn the essential concepts of Python programming and become an expert in data analytics, machine learning, data visualization, web scraping and natural language processing. Python is a required skill for many data science positions, so jumpstart your career with this interactive, hands-on course.
Why learn Data Science?
Data Scientists are being deployed in all kinds of industries, creating a huge demand for skilled professionals. Data scientist is the pinnacle rank in an analytics organization. Glassdoor ranked data scientist first in its 25 Best Jobs for 2016, and good data scientists are scarce and in great demand. As a data scientist, you will be required to understand the business problem, design the analysis, collect and format the required data, apply algorithms or techniques using the correct tools, and finally make recommendations backed by data.
The Data Science with Python course is recommended for:
1. Analytics professionals who want to work with Python
2. Software professionals looking to get into the field of analytics
3. IT professionals interested in pursuing a career in analytics
4. Graduates looking to build a career in analytics and data science
5. Experienced professionals who would like to harness data science in their fields
MLflow is an MLOps tool that enables data scientists to quickly productionize their machine learning projects. To achieve this, MLflow has four major components: Tracking, Projects, Models, and Registry. MLflow lets you train, reuse, and deploy models with any library and package them into reproducible steps. It is designed to work with any machine learning library and requires minimal changes to integrate into an existing codebase. In this session, we will cover the common pain points of machine learning developers, such as tracking experiments, reproducibility, deployment tooling, and model versioning. Get your hands dirty with a quick ML project using MLflow, released to production, to understand the MLOps lifecycle.
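To make the Tracking component concrete, here is a minimal pure-Python sketch of what an experiment tracker records per run. The class and method names are illustrative, not MLflow's real API (MLflow itself exposes this through calls such as `mlflow.log_param` and `mlflow.log_metric`).

```python
# Sketch of experiment tracking: each run stores its parameters and a
# history of metric values, so runs can be compared and the best selected.
import uuid

class Run:
    def __init__(self, experiment):
        self.run_id = uuid.uuid4().hex
        self.experiment = experiment
        self.params, self.metrics = {}, {}

    def log_param(self, key, value):
        self.params[key] = value

    def log_metric(self, key, value):
        # Keep a history so metric progress across steps is preserved.
        self.metrics.setdefault(key, []).append(value)

runs = []
run = Run("churn-model")
run.log_param("learning_rate", 0.01)
run.log_metric("accuracy", 0.82)
run.log_metric("accuracy", 0.87)   # later epoch improves
runs.append(run)

# Pick the run whose final accuracy is highest - the comparison MLflow's
# Tracking UI performs across many runs.
best = max(runs, key=lambda r: r.metrics["accuracy"][-1])
print(best.params["learning_rate"])  # -> 0.01
```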
Intro to Data Science for Enterprise Big Data (Paco Nathan)
If you need a different format (PDF, PPT) instead of Keynote, please email me: pnathan AT concurrentinc DOT com
An overview of Data Science for Enterprise Big Data. In other words, how to combine structured and unstructured data, leveraging the tools of automation and mathematics, for highly scalable businesses. We discuss management strategy for building Data Science teams, basic requirements of the "science" in Data Science, and typical data access patterns for working with Big Data. We review some great algorithms, tools, and truisms for building a Data Science practice, and provide some great references for further study.
Presented initially at the Enterprise Big Data meetup at Tata Consultancy Services, Santa Clara, 2012-08-20 http://www.meetup.com/Enterprise-Big-Data/events/77635202/
The right architecture is key to any IT project. This is especially true for big data projects, where there are no standard architectures that have proven their suitability over the years. This session discusses the different big data architectures that have evolved over time, including the traditional Big Data architecture, the Streaming Analytics architecture, and the Lambda and Kappa architectures, and maps components from both open source and the Oracle stack onto them.
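The core idea of the Lambda architecture mentioned above, merging a slow but complete batch view with a fast real-time view at query time, fits in a few lines. The page-count data below is made up purely for illustration.

```python
# Lambda architecture in miniature: the serving layer answers a query by
# combining the batch view (recomputed periodically over all data) with
# the speed layer (events that arrived since the last batch run).
batch_view = {"page_a": 1000, "page_b": 500}   # e.g. recomputed nightly
speed_layer = {"page_a": 7, "page_c": 3}       # events since last batch run

def query(page):
    return batch_view.get(page, 0) + speed_layer.get(page, 0)

print(query("page_a"))  # -> 1007 (batch count plus recent events)
print(query("page_c"))  # -> 3 (seen only by the speed layer so far)
```

The Kappa architecture simplifies this by dropping the batch layer entirely and treating everything, including reprocessing, as stream processing.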
Data Science Training | Data Science For Beginners | Data Science With Python (Simplilearn)
This Data Science presentation will help you understand what Data Science is, who a Data Scientist is, what a Data Scientist does, and how Python is used for Data Science. Data science is an interdisciplinary field of scientific methods, processes, algorithms, and systems used to extract knowledge or insights from data in various forms, structured or unstructured, similar to data mining. This tutorial will help you build your skills in analytical techniques using Python. With this video, you'll learn the essential concepts of Data Science with Python programming and understand how data acquisition, data preparation, data mining, model building & testing, and data visualization are done. This tutorial is ideal for beginners who aspire to become Data Scientists.
This Data Science presentation will cover the following topics:
1. What is Data Science?
2. Who is a Data Scientist?
3. What does a Data Scientist do?
This Data Science with Python course will establish your mastery of data science and analytics techniques using Python. With this Python for Data Science Course, you’ll learn the essential concepts of Python programming and become an expert in data analytics, machine learning, data visualization, web scraping and natural language processing. Python is a required skill for many data science positions, so jumpstart your career with this interactive, hands-on course.
Why learn Data Science?
Data Scientists are being deployed in all kinds of industries, creating a huge demand for skilled professionals. A data scientist is the pinnacle rank in an analytics organization. Glassdoor ranked data scientist first in its 25 Best Jobs for 2016, and good data scientists are scarce and in great demand. As a data scientist, you will be required to understand the business problem, design the analysis, collect and format the required data, apply algorithms or techniques using the correct tools, and finally make recommendations backed by data.
You can gain in-depth knowledge of Data Science by taking our Data Science with Python certification training course. With Simplilearn's Data Science certification training course, you will prepare for a career as a Data Scientist as you master all the concepts and techniques. Those who complete the course will be able to:
1. Gain an in-depth understanding of data science processes: data wrangling, data exploration, data visualization, hypothesis building, and testing. You will also learn the basics of statistics.
2. Install the required Python environment and other auxiliary tools and libraries.
3. Understand the essential concepts of Python programming, such as data types, tuples, lists, dicts, basic operators, and functions.
4. Perform high-level mathematical computing using the NumPy package and its large library of mathematical functions.
Learn more at: https://www.simplilearn.com
How to Become a Data Scientist
SF Data Science Meetup, June 30, 2014
Video of this talk is available here: https://www.youtube.com/watch?v=c52IOlnPw08
More information at: http://www.zipfianacademy.com
Zipfian Academy @ Crowdflower
Keynote address delivered on 23rd March 2011 at the Workshop on Data Mining and Computational Biology in Bioinformatics, sponsored by DBT India and organised by the Unit of Simulation and Informatics, IARI, New Delhi.
I do not claim any originality for the slides or their content, and in fact acknowledge various web sources.
Loading your Life into a Vector Database (Ben Church)
These are the slides from a talk I gave at Scale By the Bay titled "Loading your Life into a Vector Database".
It discusses how to build a system that uses Retrieval Augmented Generation, what the constraints are, and why GraphQL is a powerful choice.
In this talk we cover vectors, vector databases, token limits, context stuffing, and schema introspection.
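As a rough sketch of the retrieval-and-context-stuffing plumbing the talk describes, the toy below ranks documents by bag-of-words cosine similarity and packs the best ones under a token budget. A real system would use learned embeddings and a vector database; the documents and the crude whitespace "tokenizer" here are invented for illustration.

```python
# Toy RAG retrieval: score documents against a query with bag-of-words
# cosine similarity, then "stuff" the top hits into the prompt context
# without exceeding a token budget (the model's context window).
import math
from collections import Counter

def embed(text):
    # Stand-in for a learned embedding: raw term counts.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

docs = [
    "GraphQL lets clients ask for exactly the fields they need",
    "vector databases index embeddings for nearest neighbour search",
    "my grocery list: eggs milk bread",
]

def retrieve_and_stuff(query, token_limit=20):
    ranked = sorted(docs, key=lambda d: cosine(embed(query), embed(d)),
                    reverse=True)
    context, used = [], 0
    for doc in ranked:
        n = len(doc.split())            # crude token count
        if used + n > token_limit:      # respect the context window
            break
        context.append(doc)
        used += n
    return context

print(retrieve_and_stuff("how do vector databases search embeddings"))
```

The most relevant document lands first in the stuffed context, and the token budget decides how many runners-up fit alongside it.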
Machine Learning Model Deployment: Strategy to Implementation (DataWorks Summit)
This talk will introduce participants to the theory and practice of machine learning in production. It will begin with an introduction to machine learning models and data science systems, and then discuss data pipelines, containerization, real-time vs. batch processing, change management, and versioning.
As part of this talk, the audience will learn more about:
• How data scientists can have the complete self-service capability to rapidly build, train, and deploy machine learning models.
• How organizations can accelerate machine learning from research to production while preserving the flexibility and agility that data scientists and modern business use cases demand.
A small demo will show how to rapidly build, train, and deploy machine learning models in R, Python, and Spark, and continue with a discussion of API services, RESTful wrappers/Docker, PMML/PFA, ONNX, SQL Server embedded models, and lambda functions.
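The RESTful-wrapper pattern touched on above reduces to a small JSON scoring handler. The weights and payload shape below are invented for illustration, and the handler is written as a plain function so it can run (and be tested) without starting a web server; in practice it would sit behind Flask, a Docker container, or a serverless function.

```python
# Sketch of wrapping a trained model behind a JSON scoring endpoint.
import json

def model_predict(features):
    # Stand-in for a real model: a fixed weighted sum of the features.
    weights = [0.4, 0.6]
    return sum(w * x for w, x in zip(weights, features))

def scoring_handler(request_body: str) -> str:
    # Parse the request, score, and serialize the response - the whole
    # contract a REST scoring wrapper exposes to calling applications.
    payload = json.loads(request_body)
    score = model_predict(payload["features"])
    return json.dumps({"prediction": round(score, 4)})

print(scoring_handler('{"features": [1.0, 2.0]}'))
```

Because callers only depend on the JSON contract, the model behind `model_predict` can be retrained or swapped without changing any consuming application.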
Speakers
Sagar Kewalramani, Solutions Architect
Cloudera
Justin Norman, Director, Research and Data Science Services
Cloudera Fast Forward Labs
Machine learning allows us to build the predictive analytics solutions of tomorrow: solutions that let us better diagnose and treat patients, correctly recommend interesting books or movies, and even make the self-driving car a reality. Microsoft Azure Machine Learning (Azure ML) is a fully-managed Platform-as-a-Service (PaaS) for building these predictive analytics solutions. It is very easy to build solutions with it, helping to overcome the challenges most businesses face in deploying and using machine learning. In this presentation, we will look at how to create ML models with Azure ML Studio and deploy those models to production in minutes.
The breadth and depth of Azure products that fall under the AI and ML umbrella can be difficult to follow. In this presentation I'll first define exactly what AI, ML, and deep learning are, and then go over the various Microsoft AI and ML products and their use cases.
This presentation explains the basics of the ETL (Extract-Transform-Load) concept in relation to data solutions such as data warehousing, data migration, and data integration. CloverETL is presented in detail as an example of an enterprise ETL tool. It also covers the typical phases of data integration projects.
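The Extract-Transform-Load phases can be sketched as three small functions. The sample rows and field names below are invented for illustration; real ETL tools (CloverETL among them) wire these phases together as configurable graph nodes rather than hand-written code.

```python
# A compressed ETL pass in plain Python: pull rows from a source, clean
# and reshape them, then load them into a target store.

def extract():
    # Source rows, e.g. as read from a CSV file or staging database.
    return [{"name": " Alice ", "amount": "100"},
            {"name": "bob", "amount": "250"}]

def transform(rows):
    # Normalize names and cast types - the "T" is where most work lives.
    return [{"name": r["name"].strip().title(), "amount": int(r["amount"])}
            for r in rows]

def load(rows, warehouse):
    # Write the cleaned rows into the target (here, a dict as a stand-in
    # for a warehouse table keyed by customer name).
    for r in rows:
        warehouse[r["name"]] = r["amount"]

warehouse = {}
load(transform(extract()), warehouse)
print(warehouse)  # -> {'Alice': 100, 'Bob': 250}
```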
The recent focus on Big Data in the data management community brings with it a paradigm shift: from the more traditional top-down, "design then build" approach to data warehousing and business intelligence, to the more bottom-up, "discover and analyze" approach to analytics with Big Data. Where does data modeling fit in this new world of Big Data? Does it go away, or can it evolve to meet the emerging needs of these exciting new technologies? Join this webinar to discuss:
Big Data: A Technical & Cultural Paradigm Shift
Big Data in the Larger Information Management Landscape
Modeling & Technology Considerations
Organizational Considerations
The Role of the Data Architect in the World of Big Data
MLOps Virtual Event | Building Machine Learning Platforms for the Full Lifecycle (Databricks)
Successfully building a machine learning model is hard enough. Reproducing your results at scale — enabling others to reproduce pipelines, comparing results from other versions, moving models into production, redeploying and rolling out updated models — is exponentially harder. To address these challenges and accelerate innovation, many companies are building custom “ML platforms” to automate the end-to-end ML lifecycle.
Watch a replay of this MLOps Virtual Event to hear more about the latest developments and best practices for managing the full ML lifecycle on Databricks with MLflow. We covered a checklist of capabilities you’ll need, common pitfalls, technological and organizational challenges, and how to overcome them.
https://www.youtube.com/playlist?list=PLTPXxbhUt-YUFNBwBsSIlknoNbS7GExZw
There are patterns for things such as domain-driven design, enterprise architectures, continuous delivery, microservices, and many others.
But where are the data science and data engineering patterns?
Sometimes, data engineering reminds me of cowboy coding - many workarounds, immature technologies and lack of market best practices.
Dear students get fully solved assignments
Send your semester & Specialization name to our mail id :
“ help.mbaassignments@gmail.com ”
or
Call us at : 08263069601
(Prefer mailing. Call in emergency )
This is the Machine Learning Engineering in Production Course notes. This is the Week 3 of Machine Learning Data Life Cycle in Production (Course 2) course. This is the course 2 of MLOps specialization on coursera
How to classify documents automatically using NLPSkyl.ai
About the webinar
Documents come in different shapes and sizes - From technical documents, customer support chat, emails, reviews to news articles - all of them contain information that is valuable to the business.
Managing these large volume data documents in a traditional manual way has been a complex and time-consuming task that requires enormous human efforts.
In this webinar, we will discuss how Machine learning can be used to identify and automatically label news articles into categories like business, politics, music, etc. This can be applied in another context like categorizing emails, reviews, and processing text documents, etc.
What you will learn
- How businesses are leveraging document classification to their advantage
- Best practice to automate machine learning models in hours not months
- Demo: Classify news articles into the right category using convolution neural network
Accelerating Machine Learning as a Service with Automated Feature EngineeringCognizant
Building scalable machine learning as a service, or MLaaS, is critical to enterprise success. Key to translate machine learning project success into program success is to solve the evolving convoluted data engineering challenge, using local and global data. Enabling sharing of data features across a multitude of models within and across various line of business is pivotal to program success.
Data pipelines are the heart and soul of data science. Are you a beginner looking to understand data pipelines? A glimpse into what they are and how they work.
Salesforce Architect Group, Frederick, United States July 2023 - Generative A...NadinaLisbon1
Joined our community-led event to dive into the world of Artificial Intelligence (AI)! Whether you were just starting your AI journey or already familiar with its concepts, one thing was certain: AI was reshaping the future of work. This enablement session was your chance to level up your skills and stay ahead in that rapidly evolving landscape.
As AI news continues to dominate headlines, it's natural to have questions and concerns about its impact on our lives. Will AI take over human jobs? Will it render us obsolete? Rest assured, the outlook is far brighter than you may think. Rather than replacing humans, AI is designed to enhance our capabilities and work alongside us. It won't be replacing marketers, service representatives, or salespeople—it will be empowering them to achieve even greater results. Companies across industries recognize this potential and are embracing AI to unlock new levels of performance.
During this enablement session, you'll have the opportunity to explore how AI advancements can positively influence your professional journey and daily life. We'll debunk common misconceptions, address fears, and showcase real-world examples of how successful AI implementation leads to workforce augmentation rather than replacement. Be prepared to gain valuable insights and practical knowledge that will help you navigate the AI landscape with confidence.
Exploring Data Modeling Techniques in Modern Data Warehousespriyanka rajput
This article delves deep into data modeling techniques in modern data warehouses, shedding light on their significance and various approaches. If you are aspiring to be a data analyst or data scientist, understanding data modeling is essential, making a Data Analytics Course in Bangalore, Lucknow, Bangalore, Pune, Delhi, Mumbai, Gandhinagar, and other cities across India an attractive proposition.
Canadian Experts Discuss Modern Data Stacks and Cloud Computing for 5 Years o...Daniel Zivkovic
Two #ModernDataStack talks and one DevOps talk: https://youtu.be/4R--iLnjCmU
1. "From Data-driven Business to Business-driven Data: Hands-on #DataModelling exercise" by Jacob Frackson of Montreal Analytics
2. "Trends in the #DataEngineering Consulting Landscape" by Nadji Bessa of Infostrux Solutions
3. "Building Secure #Serverless Delivery Pipelines on #GCP" by Ugo Udokporo of Google Cloud Canada
We ran out of time for the 4th presenter, so the event will CONTINUE in March... stay tuned! Compliments of #ServerlessTO.
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...John Andrews
SlideShare Description for "Chatty Kathy - UNC Bootcamp Final Project Presentation"
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. for more details visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
Show drafts
volume_up
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos by creating a distributed data ownership model with clear ownership and responsibilities.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues.
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.
Adjusting primitives for graph : SHORT REPORT / NOTESSubhajit Sahu
Graph algorithms, like PageRank Compressed Sparse Row (CSR) is an adjacency-list based graph representation that is
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
1. End to End Machine Learning for Aspiring Data Scientists
- Srivatsan Srinivasan
https://www.linkedin.com/in/srivatsan-srinivasan-b8131b/
2. Before you proceed… Stop… Read… Proceed on your own terms
This presentation is not a complaint about online courses and academia, but a highlight of the gap between what these courses teach and what enterprises need.
Doing data science has its own set of challenges and multiple failure points. Some of the information I will be sharing on LinkedIn will cover those failure points in detail, and how to overcome them.
If you aspire to work in data science, this presentation and the series of posts I will be sharing over the next few months will take you through the end to end machine learning cycle in a typical organization.
-> Use this information to fill in the skills that can get you closer to industry needs.
-> Use this content to define a strategy for landing a job in the enterprise world.
You can search for posts using the hashtag #end2endDS on LinkedIn, or follow me on LinkedIn to get updates as I post.
https://www.linkedin.com/in/srivatsan-srinivasan-b8131b/
Content on this topic will be posted between 29th July and 27th September, 2019. The frequency will depend purely on the bandwidth I have; on average you can expect one, at most two, posts a week.
I will also summarize key takeaways in articles, and update this presentation over time.
Every data scientist need not be an expert in the entire ML pipeline, but it is good to know the process.
- Happy Learning
6. If you see the "Data Science Hierarchy of Needs" below as hill climbing: academia puts you on top of the hill, while the real world is where you learn that the climb itself is the most difficult part.
Image Source: Hackernoon
7. Education (Courses/Academics) vs Enterprise

Education: Focus on model accuracy and usage of algorithms
Enterprise: Focus on deployment/integration; balance between accuracy and explainability

Education: Focus on increasing model complexity for better accuracy
Enterprise: Keep it simple as much as possible, for as long as possible

Education: Data mostly comes in a single file or a few files
Enterprise: Data comes from multiple enterprise systems and needs to be integrated, cross-referenced and summarized

Education: Data size is typically small to medium
Enterprise: Data size ranges from medium to very large

Education: Data is typically 80% clean
Enterprise: Data is 80% noisy

Education: Limited tools
Enterprise: More tools + DevOps + cloud + other cruft

Education: Work at a decent pace
Enterprise: Agile (not now, don't make me talk)
8. For most online courses:
Data Science = ML Code + Some Data Analysis
In reality:
Data Science = ML Code + Data Analysis + Data Collection + Data Engineering + Software Engineering + DevOps + BI Engineering + Product Management
Note: If you are coming from a premier institute that addresses all of this reality, please feel free to exit the presentation.
9. 5 Biggest Challenges for Enterprises Deploying ML Solutions
• Data collection
• Deploying and reproducing the model in production
• Model monitoring
• Keeping the model relevant by adapting to changing business scenarios
• Communicating and interpreting model output to various stakeholders
11. Components of an End to End Machine Learning Pipeline in the Real World

Problem Definition -> Business Understanding -> Data Understanding -> Model Integration and SLA Understanding

Model Training (iterative; some steps might be optional on a case-by-case basis):
Data Collection -> Data Analysis/Cleaning -> Data Organization and Transformation -> Feature Engineering -> Model Training -> Model Evaluation and Validation -> Model Deployment

Model Re-calibration (some steps might be optional on a case-by-case basis):
Model Monitoring -> Data Validation/Anomaly Detection -> Data Drift Analysis -> Model Drift Analysis -> Model Explanation (Local and Global)

Cross-cutting: Health Dashboards, Reports & Alerts | Model Management and Governance | Data Management | Model and Application Logging | Pipeline Orchestrator | Infrastructure/DevOps/Automation
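The training chain above can be sketched as a minimal orchestrated pipeline: each stage is a plain function over a shared context, and the orchestrator chains them in order. This is an illustrative sketch, not the author's implementation; the stage names mirror the slide, and the pass-through dict stands in for a real feature store.

```python
# Minimal sketch of the training pipeline: each stage takes and returns
# a context dict, and the orchestrator simply chains them in order.
def collect(ctx):
    ctx["raw"] = [1.0, 2.0, 3.0, 100.0]               # pretend source extract
    return ctx

def clean(ctx):
    ctx["clean"] = [x for x in ctx["raw"] if x < 50]  # drop gross outliers
    return ctx

def engineer_features(ctx):
    m = sum(ctx["clean"]) / len(ctx["clean"])
    ctx["features"] = [x - m for x in ctx["clean"]]   # mean-center
    return ctx

def train(ctx):
    ctx["model"] = {"bias": sum(ctx["features"]) / len(ctx["features"])}
    return ctx

PIPELINE = [collect, clean, engineer_features, train]

def run(pipeline, ctx=None):
    ctx = ctx or {}
    for stage in pipeline:
        ctx = stage(ctx)  # a real orchestrator adds retries, logging, lineage
    return ctx

result = run(PIPELINE)
```

A real orchestrator (Airflow, Kubeflow, etc.) adds scheduling, retries and lineage on top of exactly this kind of stage chaining.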
12. ML Components and Skills/Role Mapping

Component | Primary Responsibility | Secondary Responsibility
Problem Definition | Business Owner, AI Champion | Product Owner
Business Understanding | Product Owner, Business Owner, AI Champion | ML Engineer
Data Understanding | Data Engineer, ML Engineer, Product Owner | Business Owner/Analyst
Model Integration and SLA Understanding | ML Engineer, Data Engineer, Software Engineer | Business Owner, Product Owner
Data Collection | Data Engineer, Data Analyst | —
Data Analysis/Cleaning | Data Engineer, Data Analyst | —
Data Organization/Transformation | Data Engineer, ML Engineer | Data Analyst
Data Validation/Anomaly Detection | Data Analyst, Data Engineer | —
Feature Engineering | ML Engineer | Data Engineer
Model Training | ML Engineer | —
Model Evaluation/Validation | ML Engineer | Business Owner, Model Governance team
Model Monitoring | Operations Engineer, ML Engineer | BI Engineer
Model Deployment | Software Engineer, Data Engineer, ML Engineer | —
Data Drift/Model Drift | Operations Engineer, ML Engineer | BI Engineer, ML Engineer
Dashboards/Reports | BI Engineer | Business Owner, Product Owner

Note: Depending on the size of the ML project, one person might play multiple roles, or multiple people might be required for a single role. Some roles might also be part-time, and some components can be built as a capability leveraged across projects.
13. Most of the role definitions in the previous slide can be found online, so let me talk about the AI Champion, as not much is written about that role…
The AI Champion (Head of Analytics, or sometimes the CAO himself) is responsible for driving intelligent insights, backed by the data science capability, within the enterprise. He also owns the resulting ROI or impact numbers for delivered intelligent solutions. He leads the data science team, developing policies and strategies and propagating a culture of experimentation and research. He and his team are also responsible for working with business stakeholders in planning, identifying, prioritizing and implementing AI use cases.
You can find more details here: https://www.linkedin.com/pulse/identifying-prioritizing-artificial-intelligence-use-cases-srivatsan
This role might be more relevant in mid to large size organizations with multiple use cases to deliver, where the AI Champion helps the enterprise prioritize use cases that are a good fit for AI and generate substantial business value.
14. A Few Components of End to End ML Explained
(I will cover more details on each in my LinkedIn posts)
15. Data Collection
• Data is typically collected and centralized from a variety of sources into a Data Lake, a Data Warehouse, or another enterprise data ecosystem
• Data is sourced from high-volume transactional systems like ERP and sales, or from high-velocity IoT devices, POS systems, etc.
• Data takes a variety of shapes: structured, semi-structured and unstructured sources
• Data takes a variety of forms: batch, streaming, API, alternate data, etc.
• While ingesting data is one part of the puzzle, data also needs to be cataloged, secured and governed
Further Reading: https://www.linkedin.com/pulse/think-data-first-before-being-ai-srivatsan-srinivasan
"Define an efficient data strategy that is simple to implement and helps accelerate your AI strategy"
16. Data Analysis and Validation
Inspect and clean data to discover useful information that can further help in modeling an AI-driven intelligent solution.
The purpose of data analysis and validation is to understand:
• What are the characteristics of my data, and what does it look like?
• Are there any outliers or errors in the data?
• How do the independent variables respond to the target variable?
• Base statistics from the analysis phase are later used against production inference data to identify whether the data has evolved (drifted) away from the underlying assumptions the model was trained on
Further Reading: https://www.linkedin.com/pulse/tensorflow-extended-tfx-data-analysis-validation-drift-srinivasan/
"Understanding your data is a key step to insight"
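The last point, comparing training-time statistics against production data, can be illustrated with a simple z-score style check. This is only a stdlib sketch; real systems use richer tests (distribution distances, TFX data validation, etc.), and the threshold here is an assumed default.

```python
import statistics

def baseline_stats(train_col):
    """Capture per-feature statistics at training time."""
    return {"mean": statistics.mean(train_col),
            "stdev": statistics.stdev(train_col)}

def drifted(baseline, live_col, z_threshold=3.0):
    """Flag drift when the live mean strays far from the training mean."""
    live_mean = statistics.mean(live_col)
    z = abs(live_mean - baseline["mean"]) / baseline["stdev"]
    return z > z_threshold

train_ages = [23, 31, 29, 35, 27, 33, 30, 28]
base = baseline_stats(train_ages)

same_population = drifted(base, [30, 29, 32, 28])    # similar to training
shifted = drifted(base, [61, 64, 59, 66])            # population has moved
```

In production the baseline statistics would be persisted alongside the model artifact so the inference pipeline can run this check on every batch.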
17. Data Organization and Transformation
Data collected from source systems into the data ecosystem is typically at a granular level, not directly consumable by an ML model, and sources are spread across multiple domains. Take marketing as an example: data might be spread across customer, product, transaction and loyalty systems. Data organization and transformation makes data consumable for ML models, and also makes data accessible for self-service.
Raw data, typically in terabytes, is cleansed and aggregated into a form that can be fed directly into the model. This is where most of the heavy lifting happens, in close collaboration between business, data engineers, ML engineers and data analysts.
[Funnel diagram: Integrate -> Explore -> Aggregate (raw data, TB-PB; roughly 60% of the effort, by data engineers and data analysts) -> Model -> Deploy -> Monitor (model input data, MB-GB, down to insights in KB; roughly 40% of the effort, by ML engineers, data engineers and software engineers)]
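As a toy illustration of this aggregation step, granular transactions are rolled up into one model-ready feature row per customer. The schema is hypothetical; at TB scale this would be done in SQL or Spark, but the shape of the transformation is the same.

```python
from collections import defaultdict

# Granular transaction records as they might land in the data lake.
transactions = [
    {"customer": "c1", "amount": 120.0, "channel": "web"},
    {"customer": "c1", "amount": 80.0,  "channel": "store"},
    {"customer": "c2", "amount": 40.0,  "channel": "web"},
    {"customer": "c1", "amount": 60.0,  "channel": "web"},
]

def customer_features(txns):
    """Roll up raw transactions into one feature row per customer."""
    agg = defaultdict(lambda: {"txn_count": 0, "total_spend": 0.0, "web_txns": 0})
    for t in txns:
        row = agg[t["customer"]]
        row["txn_count"] += 1
        row["total_spend"] += t["amount"]
        row["web_txns"] += t["channel"] == "web"
    for row in agg.values():
        row["avg_spend"] = row["total_spend"] / row["txn_count"]
    return dict(agg)

features = customer_features(transactions)
```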
18. Model Deployment
A few key things to remember while deploying models to production or integrating models with business processes:
Further Reading: https://www.linkedin.com/pulse/ml-model-deployment-considerations-srivatsan-srinivasan/
https://www.linkedin.com/pulse/integrating-machine-learning-models-within-matured-srinivasan/
• Training/deployment skew: models developed on historical sources might have to be deployed in a streaming flow, or at the edge of the network/devices
• Not everything can be flask'ed or exposed as a service. Deployment scenarios vary based on the technology in the business process, inference SLAs, etc.
• Keep the model pipeline as simple as possible. Avoid spaghetti pipeline code
• Provision for experimentation with new models when implementing the deployment framework: champion/challenger or A/B-testing-based model deployment and analysis
• Training/deployment skew, again: features that are hard to compute at inference time, or features that were forward-computed at training time (this may sound implausible, but trust me, I have seen enterprises make this mistake)
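One practical guard against the training/deployment skew called out above is a single feature function that both the training job and the inference service import, rather than re-implementing the logic twice. A minimal sketch; the feature names and record schema are illustrative.

```python
def make_features(record):
    """Single source of truth for feature computation, shared by
    the offline training job and the online scoring service."""
    return {
        "amount_bucket": min(int(record["amount"]) // 100, 9),
        "is_weekend": record["day_of_week"] in ("sat", "sun"),
    }

# Training time: applied to a historical batch.
train_rows = [make_features(r) for r in [
    {"amount": 250, "day_of_week": "mon"},
    {"amount": 950, "day_of_week": "sun"},
]]

# Inference time: the service calls the very same function, so the
# representation cannot silently drift apart from training.
live_row = make_features({"amount": 250, "day_of_week": "mon"})
```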
19. Model Monitoring
Machine learning today is essential for running some of our critical business processes. ML is deployed in decision making, substituting for or augmenting humans, and since it is making decisions it needs to be monitored continuously.
Ongoing monitoring of ML models is essential to evaluate whether the assumptions the model was developed on have held, and whether it is performing as intended.
A model can drift due to changes in business assumptions, changes or issues with the data, or market conditions that require adjustment, among other causes. Ongoing monitoring highlights scenarios where a model might need re-calibration; for some business processes that can be yearly, for others as frequently as daily.
Plan for monitoring models continuously -> alert on drift in data, concept or model. Business today evolves rapidly, and the assumptions models are trained on quickly become invalidated. You want to know before your model starts making wrong predictions.
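A bare-bones continuous monitor might track accuracy over a rolling window of labeled outcomes and raise an alert when it falls below a threshold. A stdlib sketch with assumed window size and threshold; production monitoring would feed dashboards and paging systems instead of returning a boolean.

```python
from collections import deque

class AccuracyMonitor:
    """Rolling-window accuracy check that flags when the model degrades."""
    def __init__(self, window=100, threshold=0.8):
        self.outcomes = deque(maxlen=window)
        self.threshold = threshold

    def record(self, prediction, actual):
        self.outcomes.append(prediction == actual)

    def accuracy(self):
        return sum(self.outcomes) / len(self.outcomes)

    def alert(self):
        return self.accuracy() < self.threshold

mon = AccuracyMonitor(window=10, threshold=0.8)
for p, a in [(1, 1)] * 9 + [(1, 0)]:   # 9 hits, 1 miss -> 90% accuracy
    mon.record(p, a)
healthy = mon.alert()                   # above threshold, no alert

for p, a in [(1, 0)] * 5:               # a run of misses pushes it down
    mon.record(p, a)
degraded = mon.alert()                  # now below threshold
```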
20. Other Key Components for Succeeding in Enterprise Machine Learning
• A structured and modularized code base
• Experiment tracking for reproducibility
• Version control of ML code, data and experiment results
• DevOps for both infrastructure and model deployment
• An orchestrator for data and model pipelines
• Logging critical deployment runtime info and making it searchable
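The experiment-tracking point can be as simple as logging every run's parameters, code version and metrics to an append-only store, which is the core of what dedicated tools like MLflow formalize. A hand-rolled sketch under that assumption; the run-ID scheme here is illustrative.

```python
import hashlib
import json
import time

RUNS = []  # stand-in for an append-only experiment store

def log_run(params, metrics, code_version):
    """Record everything needed to reproduce and compare a training run."""
    run = {
        "params": params,
        "metrics": metrics,
        "code_version": code_version,
        "timestamp": time.time(),
    }
    # Deterministic ID over the reproducibility-relevant fields, so
    # identical configurations map to the same run identity.
    key = json.dumps({"params": params, "code_version": code_version},
                     sort_keys=True)
    run["run_id"] = hashlib.sha256(key.encode()).hexdigest()[:12]
    RUNS.append(run)
    return run["run_id"]

a = log_run({"lr": 0.1, "depth": 6}, {"auc": 0.81}, "git:abc123")
b = log_run({"lr": 0.1, "depth": 6}, {"auc": 0.81}, "git:abc123")
c = log_run({"lr": 0.01, "depth": 6}, {"auc": 0.84}, "git:abc123")
```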
22. Food for thought #1: Various points of failure in the ML lifecycle
The machine learning cycle is not complete after deployment. Models need to be monitored continuously, and you should be prepared for failure at any point of the pre- and post-modeling exercise:
• Failure during experimentation. This is actually the ideal case, if you figure out the problem early
• Failure during development by not thinking about the real-world inference scenario, e.g. using features that are hard or impossible to compute during inference
• Failure post deployment, where some models do not generate the business value they were supposed to
• Failure post deployment to keep up with an ever-changing data landscape. These models need frequent re-calibration, or some form of continuous learning
• Failure to use the right performance metrics. Think about what your business needs to succeed, not what the model needs to succeed
Further Reading:
Reasons why ML projects fail: https://www.linkedin.com/pulse/top-reasons-why-artificial-intelligence-projects-fail-srinivasan/
23. Food for thought #2: Infrastructure
Further Reading: https://www.linkedin.com/pulse/accelerating-artificial-intelligence-initiatives-srivatsan-srinivasan/
Enterprises hiring artificial intelligence and machine learning experts without the right infrastructure and tools is like "hiring astronauts to drive a bullock cart".
Building data science capability within an enterprise must be thought through from the ground up, right from the selection of the silicon chip. Data engineering and ML processes are typically compute- and memory-intensive, and on large datasets the infrastructure has to be designed accordingly.
A data scientist typically performs hundreds of iterations to arrive at the right algorithm, hyperparameters and metrics. Not having the right infrastructure can derail an enterprise getting into machine learning.
Plan for infrastructure with the right kind of hardware (GPU, CPU, HPC, etc.), technologies (Hadoop, Kubernetes, etc.) and tools (Spark ML, TensorFlow, scikit-learn, etc.) that can distribute ML/DL pipelines for faster hypothesis testing and value generation.
The cloud is a very good alternative for accelerating the ML journey: you can spin up compute on demand and tear it down when not needed.
24. Food for thought #3: Cloud for AI/ML
Further Reading: https://www.linkedin.com/pulse/artificial-intelligence-google-cloud-platform-srivatsan-srinivasan/
https://www.linkedin.com/pulse/data-analytics-google-cloud-platform-srivatsan-srinivasan/
The cloud is a key component of the AI/ML journey, especially for enterprises that need agility to meet the huge compute demand of ML jobs.
Key benefits the cloud provides:
• Scale: instant access to hundreds of compute instances
• Speed: easy availability of specialized devices (GPU/TPU) that can accelerate AI development
• Cloud AI APIs: a quick jump start into complex capabilities rather than building from scratch. For cases like speech-to-text or language translation, the enterprise may also lack the data to build models as accurate as those available in the cloud
• Cloud AutoML: train high-quality models specific to business needs with citizen data scientists, or even business users
• Cloud bursting: with advances in hybrid cloud, start small in the local data center and use the cloud to scale AI compute
25. Food for thought #4: Stay simple as long as possible
When a simple model's accuracy is low, do you immediately jump to complex models? Try these two steps before moving to trendy and complex algorithms:
• Follow your model output -> listen to what your algorithm's metrics say. Drill down into misclassification scenarios and see whether you can find any interesting patterns
• Be curious and creative with your data -> try to find patterns or relationships in the data that could influence your model's outcome. A lot can be solved by proper EDA and feature engineering
If you are still not meeting performance targets, move to complex models in increments. The steps above remain relevant, and their results can be fed into your complex models to enhance the decision boundary.
In some critical business processes, 84% performance from a simple model might be better than 86% from a complex model.
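Drilling into misclassifications, as the first step suggests, can start by slicing evaluation errors by a feature and looking for segments where the model is disproportionately wrong. The data below is illustrative.

```python
from collections import Counter

# (segment, actual, predicted) tuples from an evaluation run.
results = [
    ("web",   1, 1), ("web",   0, 0), ("web",   1, 1), ("web",   0, 0),
    ("store", 1, 0), ("store", 1, 0), ("store", 0, 0), ("store", 1, 0),
]

def error_rate_by_segment(rows):
    """Surface segments where misclassifications concentrate."""
    total, wrong = Counter(), Counter()
    for segment, actual, predicted in rows:
        total[segment] += 1
        wrong[segment] += actual != predicted
    return {s: wrong[s] / total[s] for s in total}

rates = error_rate_by_segment(results)
# Here all the errors sit in the "store" segment -- a pattern worth
# investigating (missing feature? different data quality?) before
# reaching for a more complex model.
```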
26. Food for thought #5: Data Science and Agile
There is a lot of misconception about using Agile for data science. Data science outcomes depend on continuous experimentation, whereas Agile focuses on early and continuous delivery throughout the development lifecycle.
First, remember that Agile is a set of guiding principles, not a set-in-stone methodology. Agile can be tailored to your unique data science needs.
Here is one way of doing data science the Agile way, especially the machine learning part:
• Don't set strict deliverables at the end of every sprint
• Use daily/weekly meetings to surface blockers only, not daily status
• As soon as you have a working model with decent accuracy (say, every sprint or two), put it in private beta mode. Private beta (or dark) mode is where the model generates output but the output is not acted on. This lets you monitor the model against real-world data and test its reliability
• Keep updating the private beta as you build models with better performance
• Launch the private beta model to a small percentage of live traffic. Collect feedback based on the responses of end users
• Keep increasing the volume of transactions sent to the model at frequent intervals, until all traffic is diverted and the feedback/outcome targets are met
In the real world there are scenarios where an ML model might not deliver the same value seen during the training/evaluation phase. Agile delivery keeps machine learning projects value- and outcome-focused, and helps achieve project objectives in a timely manner.
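The private-beta-to-full-traffic progression above can be sketched as a deterministic traffic router: each request is hashed to a stable bucket, and the rollout percentage decides whether it is served by the challenger model or the current champion. The names and percentages are illustrative.

```python
import hashlib

def bucket(request_id, buckets=100):
    """Stable 0-99 bucket so a given request always routes the same way."""
    digest = hashlib.sha256(request_id.encode()).hexdigest()
    return int(digest, 16) % buckets

def route(request_id, rollout_pct):
    """At 0% the challenger sees no live traffic (shadow mode only);
    as rollout_pct grows, more requests are actioned by the challenger."""
    return "challenger" if bucket(request_id) < rollout_pct else "champion"

# Shadow mode: nothing is actioned by the challenger.
shadow = [route(f"req-{i}", 0) for i in range(1000)]

# 10% rollout: roughly a tenth of live traffic goes to the challenger,
# and the same request always gets the same decision.
partial = [route(f"req-{i}", 10) for i in range(1000)]
share = partial.count("challenger") / len(partial)
```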
27. Food for thought #6: Myth vs. Fact
Myth: If your tabular data is big, switch to deep learning; traditional ML will not work.
Fact: Traditional ML algorithms can scale to large datasets. There are distributed frameworks that can train your model on large datasets, and they are very effective at learning from them. Choose technology based on your business and data needs.
Myth: Machine learning will eventually replace the existing rules in legacy systems.
Fact: Think of ML initially as a technology for complementing your legacy rules. You can reduce the complexity of rules by introducing an ML solution. ML can eventually replace them, but it is always better to have some deterministic rules complementing your probabilistic ML models.
Myth: Machine learning is the new "magic wand" for making your business processes smart and intelligent.
Fact: Do not take a non-ML problem and try to fit ML into it. Use ML when you believe it will add value to the business process. You can also make business processes smart with advanced analytics or statistical techniques.
Myth: AutoML will replace and automate data science work.
Fact: Data science is more than what AutoML can currently do. AutoML will be an assistant to data scientists, taking care of the boring parts and letting them focus on delivering business value.
Further Reading on AutoML: https://www.linkedin.com/pulse/fear-data-scientist-called-autophobia-srivatsan-srinivasan/
28. To Summarize
• Plan to invest in the right infrastructure (GPU, CPU, cloud) to accelerate the model development process
• Only 20% or less of the actual pipeline is ML code
29. Thank You, and Stay Tuned on LinkedIn for more on the End to End Data Science Pipeline
Follow or search the hashtag #end2endDS on LinkedIn to get updates