Machine Learning: Business Perspective - Main Conference: Introduction to Machine Learning.
DutchMLSchool: 1st edition of the Machine Learning Summer School in The Netherlands.
Enhancing and Automating Decision Making with Machine Learning - Main Conference: Introduction to Machine Learning.
DutchMLSchool: 1st edition of the Machine Learning Summer School in The Netherlands.
Automating your own Machine Learning Projects - Workshop: Working with the Masters.
DutchMLSchool: 1st edition of the Machine Learning Summer School in The Netherlands.
DutchMLSchool. ML: A Technical Perspective (BigML, Inc)
DutchMLSchool. Machine Learning: A Technical Perspective - Main Conference: Introduction to Machine Learning.
DutchMLSchool: 1st edition of the Machine Learning Summer School in The Netherlands.
DutchMLSchool. Logistic Regression, Deepnets, Time Series (BigML, Inc)
DutchMLSchool. Logistic Regression, Deepnets, and Time Series (Supervised Learning II) - Main Conference: Introduction to Machine Learning.
DutchMLSchool: 1st edition of the Machine Learning Summer School in The Netherlands.
Anatomy of an Application: Machine Learning End-to-End - Main Conference: Introduction to Machine Learning.
DutchMLSchool: 1st edition of the Machine Learning Summer School in The Netherlands.
DutchMLSchool. Introduction to Machine Learning with the BigML Platform (BigML, Inc)
Introduction to Machine Learning with the BigML Platform - ML for Executives Course.
DutchMLSchool: 1st edition of the Machine Learning Summer School in The Netherlands.
DutchMLSchool. ML for Energy Trading and Automotive Sector (BigML, Inc)
Machine Learning for Energy Trading, Automotive Sector, and Logistics, presented by BigML's partner A1 Digital.
Main Conference: Introduction to Machine Learning.
DutchMLSchool: 1st edition of the Machine Learning Summer School in The Netherlands.
Machine Learning for Logistics: Predicting Expedition Outcome - Main Conference: Introduction to Machine Learning.
DutchMLSchool: 1st edition of the Machine Learning Summer School in The Netherlands.
Machine learning is increasingly used to automate decision making. Although machine learning can seem complex, at its core it involves finding patterns in data that can be used to make useful predictions. The document discusses how increased data availability, faster computation, and easier-to-use tools have driven the rise of machine learning applications. It also notes common pitfalls in early machine learning adoption, such as overhyping results and failing to develop a clear strategy. Overall, machine learning is transforming industries by enabling cheaper, more data-driven decisions at scale.
DutchMLSchool. Supervised vs Unsupervised Learning (BigML, Inc)
Supervised versus Unsupervised Learning Techniques - Main Conference: Introduction to Machine Learning.
DutchMLSchool: 1st edition of the Machine Learning Summer School in The Netherlands.
DutchMLSchool. Models, Evaluations, and Ensembles (BigML, Inc)
DutchMLSchool. Introduction to Machine Learning, Models, Evaluations, and Ensembles (Supervised Learning I) - Main Conference: Introduction to Machine Learning.
DutchMLSchool: 1st edition of the Machine Learning Summer School in The Netherlands.
The document provides guidance on building an end-to-end machine learning project to predict California housing prices using census data. It discusses getting real data from open data repositories, framing the problem as a supervised regression task, preparing the data through cleaning, feature engineering, and scaling, selecting and training models, and evaluating on a held-out test set. The project emphasizes best practices like setting aside test data, exploring the data for insights, using pipelines for preprocessing, and techniques like grid search, randomized search, and ensembles to fine-tune models.
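Setting aside test data works best when the split stays stable as the dataset is refreshed. The following is a minimal Python sketch of one common way to do that, assuming each row has a unique, non-negative integer identifier; the helper names and the `id` field are illustrative, not taken from the presentation. A row goes to the test set when the CRC32 hash of its identifier falls below `test_ratio * 2**32`, so the decision depends only on the identifier and existing rows never move between splits when new rows are appended.

```python
from zlib import crc32


def in_test_set(identifier: int, test_ratio: float) -> bool:
    """Decide test-set membership from the identifier alone.

    crc32 returns an unsigned 32-bit value, so comparing it with
    test_ratio * 2**32 selects approximately test_ratio of the rows.
    """
    return crc32(identifier.to_bytes(8, "little")) < test_ratio * 2**32


def split_train_test_by_id(rows, id_key, test_ratio):
    """Split a list of dict rows into (train, test) by hashing row[id_key].

    Hypothetical helper for illustration; assumes row[id_key] is a
    unique, non-negative integer that fits in 8 bytes.
    """
    train, test = [], []
    for row in rows:
        (test if in_test_set(row[id_key], test_ratio) else train).append(row)
    return train, test


rows = [{"id": i, "median_house_value": 100_000 + i} for i in range(1_000)]
train, test = split_train_test_by_id(rows, "id", test_ratio=0.2)
print(len(train), len(test))
```

The resulting test fraction is only approximately `test_ratio`, so it is worth checking the actual split sizes on a real dataset before relying on them.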
Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016 (MLconf)
Before the Model: How Machine Learning Products Start, with Examples from Airbnb: Often the most important part of building a machine learning product is the formulation of the problem; the most elegant model is rendered useless without the right application and model architecture. Airbnb, an online marketplace for accommodations, has found many interesting applications for machine learning by taking a data-driven approach to investing in machine learning products. Come hear how the Airbnb team generates and vets ideas for machine learning products and tailors them to business problems, with some examples of success and lessons learned along the way.
Data Workflows for Machine Learning - Seattle DAML (Paco Nathan)
First public meetup at Twitter Seattle, for Seattle DAML:
http://www.meetup.com/Seattle-DAML/events/159043422/
We compare/contrast several open source frameworks which have emerged for Machine Learning workflows, including KNIME, IPython Notebook and related Py libraries, Cascading, Cascalog, Scalding, Summingbird, Spark/MLbase, MBrace on .NET, etc. The analysis develops several points for "best of breed" and what features would be great to see across the board for many frameworks... leading up to a "scorecard" to help evaluate different alternatives. We also review the PMML standard for migrating predictive models, e.g., from SAS to Hadoop.
Square's Machine Learning Infrastructure and Applications - Rong Yan (Hakka Labs)
1) Square uses machine learning for fraud detection in payments and to power recommendations on its Square Market platform.
2) Random forests and gradient boosted trees are the primary algorithms used for fraud detection, with the combination achieving up to a 10-11% improvement over random forests alone.
3) Square has built scalable machine learning infrastructure including parallel environments, data transport systems, and a learning management system to support rapid model development and evaluation.
This document provides an introduction to machine learning, including definitions, types, and case studies. It begins with an agenda and overview of artificial intelligence applications. It then defines machine learning as a field that allows computers to learn without being explicitly programmed. The main types of machine learning are described as supervised, unsupervised, semi-supervised, and reinforcement learning. Example case studies on Netflix recommendations, cancer diagnosis, and Amazon inventory are outlined. The document concludes with tips on prerequisites and resources for studying machine learning, including mathematics, programming tools, and course recommendations.
The Barclays Data Science Hackathon: Building Retail Recommender Systems base... (Data Science Milan)
In the depths of the last cold, wet British winter, the Advanced Data Analytics team from Barclays escaped to a villa on Lanzarote, Canary Islands, for a one-week hackathon where they collaboratively developed a recommendation system on top of Apache Spark. The contest consisted of using Bristol customer shopping behaviour data to make personalised recommendations in a Kaggle-like competition, where each team's goal was to build an MVP and then iterate on it repeatedly using common interfaces defined by a purpose-built framework.
The talk will cover:
• How to rapidly prototype in Spark (via the native Scala API) on your laptop and magically scale to a production cluster without huge re-engineering effort.
• The benefits of doing type-safe ETLs representing data in hybrid, and possibly nested, structures like case classes.
• Enhanced collaboration and fair performance comparison by sharing ad-hoc APIs plugged into a common evaluation framework.
• The co-existence of machine learning models available in MLlib and domain-specific bespoke algorithms implemented from scratch.
• A showcase of different families of recommender models (business-to-business similarity, customer-to-customer similarity, matrix factorisation, random forest and ensembling techniques).
• How Scala (and functional programming) helped our cause.
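The customer-to-customer similarity family of recommenders can be illustrated with a small user-based collaborative-filtering sketch. This is plain Python rather than the Spark/Scala code from the hackathon, and the data layout (a dict of customer to {item: quantity}) and the function names are assumptions for illustration: customers are compared by the cosine similarity of their baskets, and items bought by the most similar customers, but not yet by the target customer, are scored by similarity-weighted quantity.

```python
import math


def cosine(a: dict, b: dict) -> float:
    """Cosine similarity between two sparse item -> quantity vectors."""
    dot = sum(qty * b.get(item, 0.0) for item, qty in a.items())
    norm_a = math.sqrt(sum(q * q for q in a.values()))
    norm_b = math.sqrt(sum(q * q for q in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(target, purchases, k=2, top_n=3):
    """Recommend unseen items for `target` from its k most similar customers.

    Hypothetical helper: each unseen item is scored by summing
    similarity * quantity over the neighbours who bought it; neighbours
    with zero similarity contribute nothing.
    """
    neighbours = sorted(
        ((cosine(purchases[target], basket), other)
         for other, basket in purchases.items() if other != target),
        reverse=True,
    )[:k]
    seen = purchases[target]
    scores = {}
    for sim, other in neighbours:
        if sim <= 0:
            continue
        for item, qty in purchases[other].items():
            if item not in seen:
                scores[item] = scores.get(item, 0.0) + sim * qty
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))[:top_n]


purchases = {
    "alice": {"bread": 1, "milk": 2},
    "bob": {"bread": 1, "milk": 1, "eggs": 3},
    "carol": {"wine": 1},
}
print(recommend("alice", purchases))  # ['eggs']
```

Swapping the roles of customers and items in the same code gives an item-to-item similarity variant.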
Gianmario is a Senior Data Scientist at Pirelli Tyre, processing telemetry data for smart manufacturing and connected vehicle applications. His main expertise is in building production-oriented machine learning systems. Co-author of the Professional Manifesto for Data Science, he loves evangelising best practices and effective methodologies amongst the community. Prior to Pirelli, he worked in Financial Services (Barclays), Cyber Security (Cisco) and Predictive Marketing (AgilOne).
Big Data Spain 2018: How to build Weighted XGBoost ML model for Imbalance dat... (Alok Singh)
Alok Singh is a Principal Engineer at IBM CODAIT who has built multiple analytical frameworks and machine learning algorithms. The presentation provides an overview of building predictive models for imbalanced datasets using scikit-learn and XGBoost. It discusses challenges with imbalanced data, evaluation metrics like confusion matrix and ROC curves, and techniques for imbalanced learning including weighted classes, oversampling minorities and undersampling majorities, and SMOTE. The presentation concludes with a hands-on tutorial demonstrating these techniques on an imbalanced bank marketing dataset.
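Three of the imbalanced-learning techniques listed above (weighted classes, oversampling the minority, and SMOTE) can be sketched in plain Python. Assumptions: labels are hashable class values, features are tuples of floats, and the helper names are illustrative. `balanced_class_weights` uses the n_samples / (n_classes * class_count) formula that scikit-learn applies for `class_weight="balanced"`; `scale_pos_weight` computes the negatives-to-positives ratio that the XGBoost documentation suggests as a starting value for its `scale_pos_weight` parameter; `smote` is a simplified SMOTE-style interpolation, not the reference implementation from a library.

```python
import math
import random
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * cnt) for cls, cnt in counts.items()}


def scale_pos_weight(labels, positive=1):
    """Ratio of negative to positive examples (assumes at least one positive)."""
    pos = sum(1 for y in labels if y == positive)
    return (len(labels) - pos) / pos


def random_oversample(X, y, seed=0):
    """Duplicate random rows of each smaller class until all classes match the largest."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for cls, cnt in counts.items():
        members = [row for row, label in zip(X, y) if label == cls]
        for _ in range(target - cnt):
            X_out.append(rng.choice(members))
            y_out.append(cls)
    return X_out, y_out


def smote(minority, n_new, k=5, seed=0):
    """Simplified SMOTE: interpolate between a minority point and a random one
    of its k nearest minority neighbours (needs at least 2 minority points)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to nn, in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nn)))
    return synthetic


labels = [0] * 90 + [1] * 10
print(balanced_class_weights(labels))  # majority class down-weighted, minority up-weighted
print(scale_pos_weight(labels))  # 9.0
```

Weighting leaves the data unchanged and only alters the loss, while oversampling and SMOTE change the training set itself; either way, only the training split should be rebalanced, never the evaluation data.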
Building Custom Machine Learning Algorithms with Apache SystemML (sparktc)
This document discusses Apache SystemML, which is a machine learning framework for building custom machine learning algorithms on Apache Spark. It originated from research projects at IBM involving machine learning on Hadoop. SystemML aims to allow data scientists to build ML solutions using languages like R and Python, while executing algorithms on big data platforms like Spark. It provides a high-level language for expressing algorithms and performs automatic parallelization and optimization. The document demonstrates SystemML through a matrix factorization example for a targeted advertising problem. It shows how to use SystemML, Spark and Zeppelin together to build a custom algorithm and optimize part of the machine learning pipeline.
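Matrix factorization, the technique in the SystemML example, approximates a sparse ratings or interaction matrix as the product of low-rank user and item factor matrices. The sketch below uses plain Python and stochastic gradient descent rather than SystemML's own language, and its hyperparameters (rank, learning rate, L2 penalty, epochs) and toy ratings are arbitrary assumptions for illustration.

```python
import math
import random


def factorize(ratings, n_users, n_items, rank=2, lr=0.01, reg=0.02, epochs=1000, seed=0):
    """Learn factors U, V so that dot(U[u], V[i]) approximates each observed rating.

    `ratings` is a list of (user, item, value) triples; unobserved entries
    are simply absent. Hyperparameter defaults are illustrative only.
    """
    rng = random.Random(seed)
    U = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(n_users)]
    V = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(uf * vf for uf, vf in zip(U[u], V[i]))
            for f in range(rank):
                uf, vf = U[u][f], V[i][f]
                # SGD step on squared error plus an L2 penalty on both factors.
                U[u][f] += lr * (err * vf - reg * uf)
                V[i][f] += lr * (err * uf - reg * vf)
    return U, V


def rmse(ratings, U, V):
    """Root-mean-squared error over the observed ratings."""
    se = sum((r - sum(a * b for a, b in zip(U[u], V[i]))) ** 2 for u, i, r in ratings)
    return math.sqrt(se / len(ratings))


ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
U0, V0 = factorize(ratings, 3, 3, epochs=0)  # untrained initial factors
U, V = factorize(ratings, 3, 3)
print(rmse(ratings, U0, V0), rmse(ratings, U, V))
```

A missing entry, for example user 0 and item 2, can then be scored as `sum(a * b for a, b in zip(U[0], V[2]))`, which is how such a model ranks candidate items or ads for a user.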
Valencian Summer School 2015
Day 2
Lecture 11
The Future of Machine Learning
José David Martín-Guerrero (IDAL, UV)
https://bigml.com/events/valencian-summer-school-in-machine-learning-2015
Building a performing Machine Learning model from A to Z (Charles Vestur)
A 1-hour read to become highly knowledgeable about machine learning and the machinery underneath, from scratch!
A presentation introducing all the fundamental concepts of machine learning step by step, following a classical approach to building a performing model. Simple examples and illustrations are used throughout the presentation to make the concepts easier to grasp.
Data Science Training | Data Science For Beginners | Data Science With Python... (Simplilearn)
This Data Science presentation will help you understand what Data Science is, who a Data Scientist is, what a Data Scientist does, and how Python is used for Data Science. Data science is an interdisciplinary field of scientific methods, processes, algorithms and systems for extracting knowledge or insights from data in various forms, either structured or unstructured, similar to data mining. This Data Science tutorial will help you establish your skills in analytical techniques using Python. With this Data Science video, you’ll learn the essential concepts of Data Science with Python programming and understand how data acquisition, data preparation, data mining, model building and testing, and data visualization are done. This Data Science tutorial is ideal for beginners who aspire to become Data Scientists.
This Data Science presentation will cover the following topics:
1. What is Data Science?
2. Who is a Data Scientist?
3. What does a Data Scientist do?
This Data Science with Python course will establish your mastery of data science and analytics techniques using Python. With this Python for Data Science Course, you’ll learn the essential concepts of Python programming and become an expert in data analytics, machine learning, data visualization, web scraping and natural language processing. Python is a required skill for many data science positions, so jumpstart your career with this interactive, hands-on course.
Why learn Data Science?
Data Scientists are being deployed in all kinds of industries, creating a huge demand for skilled professionals. A data scientist is the pinnacle rank in an analytics organization. Glassdoor ranked data scientist first in its 25 Best Jobs for 2016, and good data scientists are scarce and in great demand. As a data scientist, you will be required to understand the business problem, design the analysis, collect and format the required data, apply algorithms or techniques using the correct tools, and finally make recommendations backed by data.
You can gain in-depth knowledge of Data Science by taking our Data Science with Python certification training course. With Simplilearn’s Data Science certification training course, you will prepare for a career as a Data Scientist as you master all the concepts and techniques. Those who complete the course will be able to:
1. Gain an in-depth understanding of data science processes, data wrangling, data exploration, data visualization, hypothesis building, and testing. You will also learn the basics of statistics.
2. Install the required Python environment and other auxiliary tools and libraries.
3. Understand the essential concepts of Python programming such as data types, tuples, lists, dicts, basic operators and functions.
4. Perform high-level mathematical computing using the NumPy package and its large library of mathematical functions.
Learn more at: https://www.simplilearn.com
Machine Learning has become essential for improving insight, quality and time to market. But it has also been called the 'high-interest credit card of technical debt', with challenges in managing both how it is applied and how its results are consumed.
This document discusses machine learning, including how it differs from artificial intelligence and deep learning. It covers the need for machine learning given growing data volumes, and how machine learning builds rules and logic from data by learning from experience. The types of machine learning are described as supervised learning, unsupervised learning, and reinforcement learning. Examples of machine learning applications, such as recommendation engines and spam filters, are also provided.
This document summarizes Andres Kull's presentation on machine learning applications at Pipedrive. Kull discusses how Pipedrive uses machine learning to predict trial conversion rates and the likelihood of deals closing. Key models include a decision tree to predict trial success based on user actions. Features are selected and ranked by importance, and a random forest model is trained with 5-fold cross-validation. The model is retrained daily and predictions are monitored for quality. Traces of model training and predictions are stored to explain results.
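5-fold cross-validation partitions the data into five folds, trains on four and validates on the fifth, and rotates so that every row is validated exactly once. A minimal plain-Python sketch, not Pipedrive's code: `fit` and `predict` are hypothetical callables standing in for any model (here a majority-class baseline), and in practice scikit-learn's `KFold` and `cross_val_score` provide the same procedure.

```python
import random
from collections import Counter


def k_fold_indices(n, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs; every index is validated exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    for f in range(k):
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        yield train, folds[f]


def cross_validate(X, y, fit, predict, k=5, seed=0):
    """Return per-fold accuracy for any fit/predict pair (assumes n >= k)."""
    scores = []
    for train, val in k_fold_indices(len(X), k, seed):
        model = fit([X[i] for i in train], [y[i] for i in train])
        preds = [predict(model, X[i]) for i in val]
        scores.append(sum(p == y[i] for p, i in zip(preds, val)) / len(val))
    return scores


# Hypothetical baseline "model": always predict the majority training label.
def fit_majority(X, y):
    return Counter(y).most_common(1)[0][0]


def predict_majority(model, x):
    return model


X = [[i] for i in range(20)]
y = [0] * 15 + [1] * 5
print(cross_validate(X, y, fit_majority, predict_majority))
```

Averaging the per-fold scores gives a less noisy estimate than a single hold-out split, which is useful when comparing candidate feature sets before a model is put on a daily retraining schedule.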
This presentation covers how data science connects to building effective machine learning solutions, how to build end-to-end solutions in Azure ML, and how to build, model, and evaluate algorithms in Azure ML.
The document provides an overview of digital analytics and Google Analytics. It discusses why analytics is useful for solving problems like Wanamaker's dilemma of not knowing which advertising channels are effective. It explains how analytics works by collecting anonymous data on user behavior and interactions with a website and turning that raw data into useful reports and metrics. Key aspects of Google Analytics that are covered include its data model of users, sessions, and interactions. The document also discusses segmentation, internal vs external analytics, and how analytics can be used for reporting, optimization, and strategic decision making.
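The users, sessions, and interactions data model can be made concrete by grouping each user's timestamped hits into sessions. This is a simplified sketch, assuming hits arrive as (user_id, datetime) pairs and using a hypothetical `sessionize` helper: a new session starts after 30 minutes of inactivity, which matches Google Analytics' default session timeout, but the other rules Google Analytics uses to end sessions are not modelled here.

```python
from datetime import datetime, timedelta

# Assumed inactivity cutoff (Google Analytics' default session timeout).
SESSION_TIMEOUT = timedelta(minutes=30)


def sessionize(hits, timeout=SESSION_TIMEOUT):
    """Group (user_id, timestamp) hits into per-user sessions.

    A user's next hit joins the current session if it arrives within
    `timeout` of that user's previous hit; otherwise it starts a new one.
    Returns {user_id: [[timestamps of session 1], [session 2], ...]}.
    """
    sessions = {}
    for user, ts in sorted(hits, key=lambda h: (h[0], h[1])):
        user_sessions = sessions.setdefault(user, [])
        if user_sessions and ts - user_sessions[-1][-1] <= timeout:
            user_sessions[-1].append(ts)
        else:
            user_sessions.append([ts])
    return sessions


t0 = datetime(2024, 1, 1, 9, 0)
hits = [
    ("u1", t0),
    ("u1", t0 + timedelta(minutes=10)),
    ("u1", t0 + timedelta(minutes=50)),  # 40-minute gap: starts a new session
    ("u1", t0 + timedelta(minutes=55)),
    ("u2", t0),
]
print({user: len(s) for user, s in sessionize(hits).items()})  # {'u1': 2, 'u2': 1}
```

Session-level metrics such as session counts or hits per session then fall out of simple aggregations over this structure.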
Travis Cox, Kathy Applebaum, and Kevin McClusky from Inductive Automation will discuss key concepts and best practices, show demos, and answer questions from the audience, to help you start integrating ML into your day-to-day processes.
Learn more about:
• Practical ways to use ML in your factory or facility
• What you'll need to get started
• Existing ML tools and platforms
• And more
Future of data science as a profession (Jose Quesada)
How can you thrive in a future where machine learning has been popular for a few years already?
In this talk, I will give you actionable advice from my experience training serious data scientists at our retreat center in Berlin. You are going to face these pointy, hard questions:
- What is the promise of machine learning? Has it happened yet?
- Is it easy to take advance of machine learning, now that most algorithms are nicely packaged in APIs and libraries?
- How much time should I spend getting good at machine learning? Am I good enough now?
- Are data scientists going to be replaced by algorithms? Are we all?
- Is it easy to hire talent in machine learning after the explosion of MOOCs?
Travis Cox, Kathy Applebaum, and Kevin McClusky from Inductive Automation will discuss key concepts and best practices, show demos, and answer questions from the audience, to help you start integrating ML into your day-to-day processes.
Learn more about:
• Practical ways to use ML in your factory or facility
• What you'll need to get started
• Existing ML tools and platforms
• And more
Machine Learning automation. Advanced WhizzML workflows: feature selection, boosting, gradient descent, and stacking.
VSSML18: 4th edition of the Valencian Summer School in Machine Learning.
Machine learning is a type of artificial intelligence that allows systems to learn from data without being explicitly programmed. The document provides an introduction to machine learning, explaining what it is, why it is used, common algorithms, advantages, and challenges. Some key challenges discussed include poor quality data, overfitting or underfitting training data, the complexity of machine learning processes, lack of training data, slow implementation speeds, and imperfections in algorithms as data grows.
Roger S. Barga discusses his experience in data science and predictive analytics projects across multiple industries. He provides examples of predictive models built for customer segmentation, predictive maintenance, customer targeting, and network intrusion prevention. Barga also outlines a sample predictive analytics project for a real estate client to predict whether they can charge above or below market rates. The presentation emphasizes best practices for building predictive models such as starting small, leveraging third-party tools, and focusing on proxy metrics that drive business outcomes.
MLSEV Virtual. Supervised vs UnsupervisedBigML, Inc
Supervised vs Unsupervised Learning Techniques, by Charles Parker, Vice President of Machine Learning algorithms at BigML.
*MLSEV 2020: Virtual Conference.
Defcon 21-pinto-defending-networks-machine-learning by pseudor00tpseudor00t overflow
1) The document discusses using machine learning to help with the challenges of security monitoring and log management. Specifically, it presents a case study of using machine learning to build a model to detect malicious external agents based on firewall block data.
2) The model calculates "badness" ranks for IP addresses, netblocks, and autonomous system numbers based on proximity, temporal decay, and other factors. It then trains a support vector machine classifier on these features to detect malicious behaviors with 80-85% accuracy on new data.
3) The author argues this type of machine learning approach could help analysts focus on the most important alerts and events, since the models are 5-8 times more likely to correctly identify truly malicious traffic.
When we hear the Word Machine Learning we think of Self Driving Car and Advanced Medical Solutions. This brings the awe-inspiring of Huge and Complex Data, Advanced Statistics, Algebra and Sophisticated Solutions & we get scared to Build Solutions in Machine Learning.
Machine Learning solutions are not that Hard to develop and the same time not that easy to make them perfect. This slide decks will provide insight and demos of How a Software Engineer can start Developing Machine Learning Solutions easily and Eventually master the Knowledge of Machine Learning.
This document provides an overview of MLOps principles and practices based on the author's experiences developing and deploying machine learning systems. It discusses key concepts like machine learning, models, algorithms, and ground-truth data. The document then explains that operationalizing machine learning involves both data scientists developing algorithms on historical data and ML engineering teams integrating models into operational systems and data flows. It outlines the typical steps of initial model development, integration/deployment, monitoring performance, and updating models. Several principles of MLOps are also presented, including having solid data foundations with accessible, high-quality ground-truth data for data scientists and maintainers to use.
This document advertises an analytics conference and promotes advanced analytics skills. It summarizes:
- The conference will take place May 2-4 in San Jose and focus on business analytics skills and technologies to stay competitive.
- The keynote speaker, Daniel Fylstra, will discuss how analytics builds on business intelligence and the three levels of analytics: descriptive, predictive, and prescriptive.
- Analytic models can provide significant business benefits like cost savings, avoiding risks, and better decision making. Excel is positioned as a tool to build analytic models and gain these benefits.
- Attendees will learn how to access and prepare data, apply predictive algorithms, create simulation and optimization models, and interpret
Machine learning is a term thrown around in technology circles with an ever-increasing intensity. Major
technology companies have attached themselves to
this buzzword to receive capital investments, and every
major technology company is pushing its even shinier
parentartificial intelligence (AI).
Hacking Predictive Modeling - RoadSec 2018HJ van Veen
This document provides an overview of machine learning and predictive modeling techniques for hackers and data scientists. It discusses foundational concepts in machine learning like functionalism, connectionism, and black box modeling. It also covers practical techniques like feature engineering, model selection, evaluation, optimization, and popular Python libraries. The document encourages an experimental approach to hacking predictive models through techniques like brute forcing hyperparameters, fuzzing with data permutations, and social engineering within data science communities.
Are you ready for Data science? A 12 point testBertil Hatt
Presentation for the MancML on data readiness.
If you are considering starting to invest in Data science, this is a helpful guide to understand:
- what you need *before* you start looking for a Data scientist
- the skillset and experience that you should be looking for when you do.
This document provides an introduction to machine learning concepts and tools. It begins with an overview of what will be covered in the course, including machine learning types, algorithms, applications, and mathematics. It then discusses data science concepts like feature engineering and the typical steps in a machine learning project, including collecting and examining data, fitting models, evaluating performance, and deploying models. Finally, it reviews common machine learning tools and terminologies and where to find datasets.
The document discusses machine learning and how it can be used by SEOs. It defines machine learning and provides examples of applications like spam filtering, product recommendations, and home price predictions. The document encourages readers to think of problems machine learning could solve using available data and models. Specific opportunities for SEOs are discussed, like predicting customer churn, title tag optimization, and log file analysis. Readers are provided resources for learning machine learning.
This use case showcases how Machine Learning can help you understand your customers to better develop personalized relationships. The lecturer is Arturo Moreno, Associate Professor at ICADE Business School, and a technology entrepreneur, investor, and innovative leader working on the intersection of venture capital and Machine Learning.
*Machine Learning School for Business Schools 2021: Virtual Conference.
Similar to DutchMLSchool. ML Business Perspective (20)
Digital Transformation and Process Optimization in ManufacturingBigML, Inc
Keyanoush Razavidinani, Digital Services Consultant at A1 Digital, a BigML Partner, highlights why it is important to identify and reduce human bottlenecks that optimize processes and let you focus on important activities. Additionally, Guillem Vidal, Machine Learning Engineer at BigML completes the session by showcasing how Machine Learning is put to use in the manufacturing industry with a use case to detect factory failures.
The Road to Production: Automating your Anomaly Detectors - by jao (Jose A. Ortega), Co-Founder and Chief Technology Officer at BigML.
*Machine Learning School in The Netherlands 2022.
DutchMLSchool 2022 - ML for AML ComplianceBigML, Inc
Machine Learning for Anti Money Laundering Compliance, by Kevin Nagel, Consultant and Data Scientist at INFORM.
*Machine Learning School in The Netherlands 2022.
DutchMLSchool 2022 - Multi Perspective AnomaliesBigML, Inc
Multi Perspective Anomalies, by Jan W Veldsink, Master in the art of AI at Nyenrode, Rabobank, and Grio.
*Machine Learning School in The Netherlands 2022.
DutchMLSchool 2022 - My First Anomaly Detector BigML, Inc
The document discusses building an anomaly detector model to identify unusual transactions in a dataset. It describes loading transaction data with 31 features into the BigML platform and creating an anomaly detector model. The model scores new data and identifies the most anomalous fields to help detect fraud. Creating the anomaly detector involves interpreting the data, exploring the dataset distribution, and setting a threshold score to define what is considered anomalous.
DutchMLSchool 2022 - History and Developments in MLBigML, Inc
History and Present Developments in Machine Learning, by Tom Dietterich, Emeritus Professor of computer science at Oregon State University and Chief Scientist at BigML.
*Machine Learning School in The Netherlands 2022.
Introduction to End-to-End Machine Learning: Classification and Regression - Mercè Martín, VP of Bindings and Applications at BigML.
*Machine Learning School in The Netherlands 2022.
DutchMLSchool 2022 - A Data-Driven CompanyBigML, Inc
A Data-Driven Company: 21 Lessons for Large Organizations to Create Value from AI, by Richard Benjamins, Chief AI and Data Strategist at Telefónica.
*Machine Learning School in The Netherlands 2022.
DutchMLSchool 2022 - ML in the Legal SectorBigML, Inc
How Machine Learning Transforms and Automates Legal Services, by Arnoud Engelfriet, Co-Founder at Lynn Legal.
*Machine Learning School in The Netherlands 2022.
This document describes a proposed solution using machine learning and artificial intelligence to help create a safer stadium experience. The solution involves two parts: 1) linking access to stadiums to a verified identity through a fan app for preregistration, and 2) using AI/ML to help detect unwanted behaviors or events early. The rest of the document provides more details on the proposed smart video review framework, including using computer vision and audio analysis techniques to help identify issues like flares, flags, banners, chants including monkey chants. The goal is to help reviewers more efficiently identify potential problems but with privacy, ethics and human oversight.
DutchMLSchool 2022 - Process Optimization in Manufacturing PlantsBigML, Inc
Process Optimization in Manufacturing Plants, by Keyanoush Razavidinani, Digital Business Consultant at A1 Digital.
*Machine Learning School in The Netherlands 2022.
DutchMLSchool 2022 - Anomaly Detection at ScaleBigML, Inc
Lessons Learned Applying Anomaly Detection at Scale, by Álvaro Clemente, Machine Learning Engineer at BigML.
*Machine Learning School in The Netherlands 2022.
DutchMLSchool 2022 - Citizen Development in AIBigML, Inc
The document discusses the need for citizen developers and humans in the AI/ML process. It notes that while technology and talent are important, company culture must also support broad data analytics and AI/ML adoption. It then provides examples of how involving domain experts can help attribute meaning to correlations and build better causal models to improve AI systems. The document advocates for a systems thinking approach and having humans in the loop to help AI/ML systems consider the wider context and avoid issues like bias.
This new feature is a continuation of and improvement on our previous Image Processing release. Now, Object Detection lets you go a step further with your image data and allows you to locate objects and annotate regions in your images. Once your image regions are defined, you can train and evaluate Object Detection models, make predictions with them, and automate end-to-end Machine Learning workflows on a single platform. To make that possible, BigML enables Object Detection by introducing the regions optype.
As with any other BigML feature, Object Detection is available from the BigML Dashboard, API, and WhizzML for automation. Object Detection is extremely helpful to tackle a wide range of computer vision use cases such as medical image analysis, quality control in manufacturing, license plate recognition in transportation, people detection in security surveillance, among many others.
This new release brings Image Processing to the BigML platform, a feature that enhances our offering to solve image data-driven business problems with remarkable ease of use. Because BigML treats images as any other data type, this unique implementation allows you to easily use image data alongside text, categorical, numeric, date-time, and items data types as input to create any Machine Learning model available in our platform, both supervised and unsupervised.
Now, it is easier than ever to solve a wide variety of computer vision and image classification use cases in a single platform: label your image data, train and evaluate your models, make predictions, and automate your end-to-end Machine Learning workflows. As with any other BigML feature, Image Processing is available from the BigML Dashboard, API, and WhizzML, and it can be applied to solve use cases such as medical image analysis, visual product search, security surveillance, and vehicle damage detection, among others.
Machine Learning in Retail: Know Your Customers' Customer. See Your FutureBigML, Inc
This session presents a quite common situation for those working in food and beverage retail (FnB) and highlights interesting insights to fight waste reduction.
Speaker: Stephen Kinns, CEO and Co-Founder at catsAi.
*ML in Retail 2021: Webinar.
Machine Learning in Retail: ML in the Retail SectorBigML, Inc
This is an introductory session about the role that Machine Learning is playing in the retail sector and how it is being deployed across the different areas of this industry.
Speaker: Atakan Cetinsoy, VP of Predictive Applications at BigML.
*ML in Retail 2021: Webinar.
ML in GRC: Machine Learning in Legal Automation, How to Trust a LawyerbotBigML, Inc
This presentation analyzes the role that Machine Learning plays in legal automation with a real-world Machine Learning application.
Speaker: Arnoud Engelfriet, Co-Founder at Lynn Legal.
*ML in GRC 2021: Virtual Conference.
ML in GRC: Supporting Human Decision Making for Regulatory Adherence with Mac...BigML, Inc
This is a real-life Machine Learning use case about integrated risk.
Speakers: Thomas Rengersen, Product Owner of the Governance Risk and Compliance Tool for Rabobank, and Thomas Alderse Baas, Co-Founder and Director of The Bowmen Group.
*ML in GRC 2021: Virtual Conference.
End-to-end pipeline agility - Berlin Buzzwords 2024Lars Albertsson
We describe how we achieve high change agility in data engineering by eliminating the fear of breaking downstream data pipelines through end-to-end pipeline testing, and by using schema metaprogramming to safely eliminate boilerplate involved in changes that affect whole pipelines.
A quick poll on agility in changing pipelines from end to end indicated a huge span in capabilities. For the question "How long time does it take for all downstream pipelines to be adapted to an upstream change," the median response was 6 months, but some respondents could do it in less than a day. When quantitative data engineering differences between the best and worst are measured, the span is often 100x-1000x, sometimes even more.
A long time ago, we suffered at Spotify from fear of changing pipelines due to not knowing what the impact might be downstream. We made plans for a technical solution to test pipelines end-to-end to mitigate that fear, but the effort failed for cultural reasons. We eventually solved this challenge, but in a different context. In this presentation we will describe how we test full pipelines effectively by manipulating workflow orchestration, which enables us to make changes in pipelines without fear of breaking downstream.
Making schema changes that affect many jobs also involves a lot of toil and boilerplate. Using schema-on-read mitigates some of it, but has drawbacks since it makes it more difficult to detect errors early. We will describe how we have rejected this tradeoff by applying schema metaprogramming, eliminating boilerplate but keeping the protection of static typing, thereby further improving agility to quickly modify data pipelines without fear.
We are pleased to share with you the latest VCOSA statistical report on the cotton and yarn industry for the month of March 2024.
Starting from January 2024, the full weekly and monthly reports will only be available for free to VCOSA members. To access the complete weekly report with figures, charts, and detailed analysis of the cotton fiber market in the past week, interested parties are kindly requested to contact VCOSA to subscribe to the newsletter.
Codeless Generative AI Pipelines
(GenAI with Milvus)
https://ml.dssconf.pl/user.html#!/lecture/DSSML24-041a/rate
Discover the potential of real-time streaming in the context of GenAI as we delve into the intricacies of Apache NiFi and its capabilities. Learn how this tool can significantly simplify the data engineering workflow for GenAI applications, allowing you to focus on the creative aspects rather than the technical complexities. I will guide you through practical examples and use cases, showing the impact of automation on prompt building. From data ingestion to transformation and delivery, witness how Apache NiFi streamlines the entire pipeline, ensuring a smooth and hassle-free experience.
Timothy Spann
https://www.youtube.com/@FLaNK-Stack
https://medium.com/@tspann
https://www.datainmotion.dev/
milvus, unstructured data, vector database, zilliz, cloud, vectors, python, deep learning, generative ai, genai, nifi, kafka, flink, streaming, iot, edge
2. BigML, Inc #DutchMLSchool 2
ML: Business Perspective
A Gentle Introduction to Machine Learning
Charles Parker
VP, Machine Learning Algorithms
In This Talk
• A simple introduction to supervised machine learning
• An introduction to some of the core concepts of the BigML platform
• A tiny peek behind the curtain to see what really happens when ML algorithms learn a model
• Ways to evaluate and interpret your model’s predictions
A Churn Problem
• You are the CEO of a mobile phone company (congratulations!)
• Some percentage of your customers leave the service (or “churn”) every month
• You have a budget to reach out to some customers each month to try to persuade them to stay with the service (with, for example, incentives)
• But to do that, you need to find out who those customers are
Begin with the End in Mind
• Currently, you have a simple hand-designed targeting strategy that identifies the 10,000 customers most likely to churn
• For every five people you call, two are actually thinking about leaving (4,000 in total, for a precision of 40%)
• Of these customers, your operators can convince half to stay (so about 2,000)
• Each of these saved customers has a net value of $500
• What if you could increase the precision of your targeting to 50%?
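The slide ends on a question, and the arithmetic behind it fits in a few lines of Python. The figures come from the slide. The constant and function names are illustrative. The sketch assumes that the save rate (one half) and the $500 value per saved customer stay the same when precision improves.

```python
# Figures from the slide; the constant names are illustrative, not from the deck.
CUSTOMERS_CONTACTED = 10_000  # customers the campaign reaches
SAVE_RATE = 0.5               # share of true churners the operators convince to stay
VALUE_PER_SAVE = 500          # net value, in dollars, of each saved customer


def campaign_value(precision: float) -> float:
    """Dollar value of one campaign for a given targeting precision."""
    true_churners_reached = CUSTOMERS_CONTACTED * precision
    customers_saved = true_churners_reached * SAVE_RATE
    return customers_saved * VALUE_PER_SAVE


current = campaign_value(0.40)   # 4,000 churners reached, 2,000 saved
improved = campaign_value(0.50)  # 5,000 churners reached, 2,500 saved
print(f"40% precision: ${current:,.0f}")
print(f"50% precision: ${improved:,.0f}")
print(f"Difference:    ${improved - current:,.0f}")
```

Under those assumptions, raising precision from 40% to 50% lifts the value of each campaign from $1,000,000 to $1,250,000. That $250,000 difference gives the modeling effort a concrete target.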
You Have The Data!
Minutes Used | Last Month’s Bill ($) | Calls To Support | Website Visits | Churn?
104 | 103.60 | 0 | 0 | No
124 | 56.33 | 1 | 0 | No
56 | 214.60 | 2 | 0 | Yes
2410 | 305.60 | 0 | 5 | No
536 | 145.70 | 0 | 0 | No
234 | 122.09 | 0 | 1 | No
201 | 185.76 | 1 | 7 | Yes
111 | 83.60 | 3 | 2 | No
Now . . . Magic!
• Can we use this data to create a better targeting strategy? (Spoiler: Yes!)
• Can we use the very same data to measure the effectiveness of that strategy? (Spoiler: Yes!)
• And how do we do that? (Spoiler: MACHINE LEARNING)
Aside: BigML Resources
• We can now upload that data to BigML
• Everything created on BigML is a resource
• Resources are:
  • Mostly immutable: you can’t “screw them up”
  • Assigned a unique ID
  • Always available via both the API and the UI
• Working with BigML is a process of creating resources
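To make the resource idea concrete, here is a minimal Python sketch of a store whose resources are immutable and carry unique IDs. Everything in it (`ResourceStore`, `create`, the `kind/N` ID format) is hypothetical and is not the BigML API. It also makes resources fully immutable, whereas the slide says “mostly”.

```python
import itertools
from dataclasses import dataclass
from types import MappingProxyType
from typing import Dict, Mapping


@dataclass(frozen=True)
class Resource:
    """Hypothetical stand-in for a platform resource: fields cannot be reassigned."""
    resource_id: str
    kind: str
    payload: Mapping[str, object]


class ResourceStore:
    """Hypothetical store: every create() call makes a new resource with a fresh ID."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._resources: Dict[str, Resource] = {}

    def create(self, kind: str, **payload: object) -> Resource:
        resource_id = f"{kind}/{next(self._ids)}"  # made-up ID format
        # MappingProxyType is a read-only view, so the payload can't be edited later.
        resource = Resource(resource_id, kind, MappingProxyType(dict(payload)))
        self._resources[resource_id] = resource
        return resource

    def get(self, resource_id: str) -> Resource:
        return self._resources[resource_id]


store = ResourceStore()
source = store.create("source", file="churn.csv")
dataset = store.create("dataset", source=source.resource_id)
model = store.create("model", dataset=dataset.resource_id)
print(model.resource_id, "<-", store.get(model.payload["dataset"]).payload["source"])
```

Each step in the workflow creates a new resource that points to the one it was built from, so earlier resources are never overwritten.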
Data Sources @ BigML
• A data source is a raw data file that you upload to the BigML platform
• We make some initial guesses about the number and type of columns in the file, and a bit about their content (such as the language for text fields)
• Data can come from uploaded CSVs, Google Drive, Dropbox, a random URL, and so on
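Here is a rough Python sketch of the kind of initial guessing described above. A column is labeled numeric if every value parses as a number, and categorical otherwise. This is a deliberately naive heuristic chosen for illustration, not BigML's actual type inference. The snake_case column names are assumptions.

```python
import csv
import io
from typing import Dict

# The churn table as CSV text (column names are illustrative).
CHURN_CSV = """minutes_used,last_bill,calls_to_support,website_visits,churn
104,103.60,0,0,No
124,56.33,1,0,No
56,214.60,2,0,Yes
2410,305.60,0,5,No
536,145.70,0,0,No
234,122.09,0,1,No
201,185.76,1,7,Yes
111,83.60,3,2,No
"""


def is_number(text: str) -> bool:
    """True if the text parses as a float."""
    try:
        float(text)
    except ValueError:
        return False
    return True


def guess_column_types(csv_text: str) -> Dict[str, str]:
    """Label each column 'numeric' if all its values parse as numbers, else 'categorical'."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    return {
        name: "numeric" if all(is_number(row[name]) for row in rows) else "categorical"
        for name in reader.fieldnames
    }


print(guess_column_types(CHURN_CSV))
```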
Datasets @ BigML
• A BigML dataset represents processed row-column data
• We’ve made a final determination of the number and type of columns in the source
• Some summary stats have been calculated for each column
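Per-column summary statistics can be sketched with Python's standard library. The statistics chosen here (count, min, max, mean and median for numbers; value counts for categories) are an assumption about what is useful to show, not a list of what BigML computes.

```python
import statistics
from collections import Counter

# Two columns copied from the churn table.
minutes_used = [104, 124, 56, 2410, 536, 234, 201, 111]
churn = ["No", "No", "Yes", "No", "No", "No", "Yes", "No"]


def numeric_summary(values):
    """Basic summary statistics for a numeric column."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }


def categorical_summary(values):
    """How often each distinct value appears in a categorical column."""
    return dict(Counter(values))


print(numeric_summary(minutes_used))
print(categorical_summary(churn))
```

The mean (472 minutes) sits far above the median (162.5 minutes) because of the single 2,410-minute customer. Per-column summaries make this kind of skew visible before any model is built.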
Supervised Machine Learning
• Collect training data from the past about your prediction problem, including the right answer (e.g., statistics for each customer month and whether or not the customer churned at the end of that month)
• Feed that data to a machine learning algorithm
• The algorithm creates a program (typically called a model, classifier, or predictor) that can make that prediction for you on future data
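The simplest possible learner shows how an algorithm can take data and return a program. A majority-class baseline returns a Python function that always predicts the most common label. This baseline is a stand-in chosen for illustration, not one of BigML's algorithms. The names `train_majority_class`, `Row`, and `Model` are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List

Row = Dict[str, float]
Model = Callable[[Row], str]


def train_majority_class(rows: List[Row], labels: List[str]) -> Model:
    """Simplest possible learner: the returned model always predicts the most common label.

    `rows` is part of the learner interface but this particular learner ignores it.
    """
    majority_label = Counter(labels).most_common(1)[0][0]

    def model(row: Row) -> str:
        return majority_label  # this model ignores the row's values

    return model


rows = [{"minutes_used": m} for m in (104, 124, 56, 2410, 536, 234, 201, 111)]
labels = ["No", "No", "Yes", "No", "No", "No", "Yes", "No"]

model = train_majority_class(rows, labels)
print(model({"minutes_used": 999}))  # prints No, even for a value never seen in training
```

On the eight-row churn table, this model is right 6 times out of 8 simply by always saying “No”. That makes it a useful reference point: a real model has to beat it to be worth anything.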
Traditional: Expert and Programmer
• Machine learning breaks the expert system paradigm
• To make an expert software system before machine learning, you used an expert and a programmer
  • The expert’s job was to know how the system should work and be able to communicate that knowledge
  • The programmer’s job was to convert the expert’s knowledge into a running computer program
  • These could be the same person, but you must have both of them
Now: Data and Algorithm
• Instead of an expert, we have data
  • Data can be easier to get (and in some cases is already there)
  • You can get a volume of data much larger than any expert could possibly see
  • Humans are notoriously bad at being good at things:
    • https://www.newscientist.com/article/mg21628930-400-specialist-knowledge-is-useless-and-unhelpful
• Instead of a programmer, we have a learning algorithm
  • Once you have the data in the proper format, learning algorithms work much faster (enabling iteration)
  • Learning algorithms are modular (one can be swapped for another on the same data)
Back to the Data
Minutes Used | Last Month’s Bill ($) | Calls To Support | Website Visits | Churn?
104 | 103.60 | 0 | 0 | No
124 | 56.33 | 1 | 0 | No
56 | 214.60 | 2 | 0 | Yes
2410 | 305.60 | 0 | 5 | No
536 | 145.70 | 0 | 0 | No
234 | 122.09 | 0 | 1 | No
201 | 185.76 | 1 | 7 | Yes
111 | 83.60 | 3 | 2 | No
The Goal: A Program that Predicts
• The goal of learning is to take this sort of training data and create a program (a model, classifier, or predictor)
• This model takes as input a single row with a value for each of the columns given in the training data
• The model outputs its predicted value for the objective based on the given column values
• Importantly, this row can contain any values for the given columns, not just the ones seen in the training data
Behind The Scenes
• A learning algorithm is:
  • A space of models that can be learned (a hypothesis space)
  • A clever way of searching through that space to find a “good” model
• A good model is one that, for example, makes accurate predictions on the training data
• So “machine learning” is finding a model amongst all possible models that has a good “fit” with the training data
A Simple Hypothesis Space
• Suppose we tell our machine to split the data into two parts based on some threshold of some feature
• If a data point is on one side of the threshold, we’ll predict the majority class of all the training points on that side
• We can measure how many points in the training data would be correctly predicted using this method
  • This is how good our “fit” is to the training data
• The best threshold is the one with the best fit (and we will try them all)
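Here is the exhaustive threshold search described above, sketched in Python on the eight rows of the churn table. For each feature, it tries every observed value as a threshold and predicts the majority class on each side. It keeps the split with the most correct training predictions. The function names are illustrative, and ties are broken by keeping the first split found.

```python
from collections import Counter

FEATURES = ["Minutes Used", "Last Month's Bill", "Calls To Support", "Website Visits"]

# One tuple per customer: the four feature values, then the churn label.
ROWS = [
    (104, 103.60, 0, 0, "No"),
    (124, 56.33, 1, 0, "No"),
    (56, 214.60, 2, 0, "Yes"),
    (2410, 305.60, 0, 5, "No"),
    (536, 145.70, 0, 0, "No"),
    (234, 122.09, 0, 1, "No"),
    (201, 185.76, 1, 7, "Yes"),
    (111, 83.60, 3, 2, "No"),
]


def majority_hits(labels):
    """How many labels match the most common label."""
    return Counter(labels).most_common(1)[0][1]


def fit_of_split(rows, feature_index, threshold):
    """Training rows predicted correctly when each side predicts its majority class."""
    left = [row[-1] for row in rows if row[feature_index] <= threshold]
    right = [row[-1] for row in rows if row[feature_index] > threshold]
    return majority_hits(left) + majority_hits(right)


def best_stump(rows):
    """Try every observed value of every feature as a threshold; keep the best fit."""
    best = (None, None, -1)  # (feature name, threshold, rows predicted correctly)
    for index, name in enumerate(FEATURES):
        values = sorted({row[index] for row in rows})
        # Skip the largest value: splitting there would leave the right side empty.
        for threshold in values[:-1]:
            correct = fit_of_split(rows, index, threshold)
            if correct > best[2]:  # strict '>' keeps the first of any tied splits
                best = (name, threshold, correct)
    return best


print(best_stump(ROWS))  # ('Minutes Used', 56, 7)
```

Three splits tie at 7 of 8 correct: Minutes Used at 56, Last Month's Bill at 145.70, and Website Visits at 5. The search returns the first one it finds. With only eight rows, these thresholds say more about the sample than about churn in general.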
Back to the Data
Minutes Used | Last Month’s Bill ($) | Calls To Support | Website Visits | Churn?
104 | 103.60 | 0 | 0 | No
124 | 56.33 | 1 | 0 | No
56 | 214.60 | 2 | 0 | Yes
2410 | 305.60 | 0 | 5 | No
536 | 145.70 | 0 | 0 | No
234 | 122.09 | 0 | 1 | No
201 | 185.76 | 1 | 7 | Yes
111 | 83.60 | 3 | 2 | No
Minutes Used > 200
Minutes Used | Last Month’s Bill ($) | Calls To Support | Website Visits | Churn?
104 | 103.60 | 0 | 0 | No
124 | 56.33 | 1 | 0 | No
56 | 214.60 | 2 | 0 | Yes
2410 | 305.60 | 0 | 5 | No
536 | 145.70 | 0 | 0 | No
234 | 122.09 | 0 | 1 | No
201 | 185.76 | 1 | 7 | Yes
111 | 83.60 | 3 | 2 | No
Website Visits > 0
Minutes Used | Last Month’s Bill ($) | Calls To Support | Website Visits | Churn?
104 | 103.60 | 0 | 0 | No
124 | 56.33 | 1 | 0 | No
56 | 214.60 | 2 | 0 | Yes
2410 | 305.60 | 0 | 5 | No
536 | 145.70 | 0 | 0 | No
234 | 122.09 | 0 | 1 | No
201 | 185.76 | 1 | 7 | Yes
111 | 83.60 | 3 | 2 | No
Last Bill > $180
Minutes Used | Last Month’s Bill ($) | Calls To Support | Website Visits | Churn?
104 | 103.60 | 0 | 0 | No
124 | 56.33 | 1 | 0 | No
56 | 214.60 | 2 | 0 | Yes
2410 | 305.60 | 0 | 5 | No
536 | 145.70 | 0 | 0 | No
234 | 122.09 | 0 | 1 | No
201 | 185.76 | 1 | 7 | Yes
111 | 83.60 | 3 | 2 | No
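The three candidate splits shown on these slides can be scored in Python. The sketch below counts how many training rows each split predicts correctly when each side of the threshold predicts its majority class. The names are illustrative.

```python
from collections import Counter

# (minutes used, last month's bill, calls to support, website visits, churn?)
ROWS = [
    (104, 103.60, 0, 0, "No"),
    (124, 56.33, 1, 0, "No"),
    (56, 214.60, 2, 0, "Yes"),
    (2410, 305.60, 0, 5, "No"),
    (536, 145.70, 0, 0, "No"),
    (234, 122.09, 0, 1, "No"),
    (201, 185.76, 1, 7, "Yes"),
    (111, 83.60, 3, 2, "No"),
]

# Each slide's rule as (column index, threshold) for "value > threshold".
SLIDE_SPLITS = {
    "Minutes Used > 200": (0, 200),
    "Website Visits > 0": (3, 0),
    "Last Bill > $180": (1, 180),
}


def split_fit(rows, feature_index, threshold):
    """Return (rows correct, label predicted at or below, label predicted above)."""
    left = Counter(row[-1] for row in rows if row[feature_index] <= threshold)
    right = Counter(row[-1] for row in rows if row[feature_index] > threshold)
    left_label, left_hits = left.most_common(1)[0]
    right_label, right_hits = right.most_common(1)[0]
    return left_hits + right_hits, left_label, right_label


results = {name: split_fit(ROWS, i, t) for name, (i, t) in SLIDE_SPLITS.items()}
for name, (correct, below, above) in results.items():
    print(f"{name}: {correct}/{len(ROWS)} correct, "
          f"predicts {above!r} above and {below!r} at or below")
```

Minutes Used > 200 and Website Visits > 0 both predict “No” on both sides and get 6 of 8 right. That is no better than always predicting “No”. Last Bill > $180 gets 7 of 8 right by predicting “Yes” above the threshold.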
So Far, So Good!
• This is basically what machine learning algorithms do
  • Try a solution and see how well it fits the training data
  • If “not well”, take some steps to “improve” it
• There are many, many different ways of doing it, but this is usually what it boils down to
Now What?
• The next step is to use the historical data we already have to test the model
  • Split the data into training and test sets (machine learning is very good at memorizing the data)
  • Train a model on the training set
  • Evaluate it using the test set (or, the “held out data”)
• We’ll get to the evaluation tool more fully later on
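The train/test procedure can be sketched in Python using a single-threshold (decision stump) learner like the one described on the earlier slides. The 75/25 split, the fixed random seed, and all function names are assumptions. With only eight rows, the resulting accuracies illustrate the mechanics rather than say anything about churn.

```python
import random
from collections import Counter

# (minutes used, last month's bill, calls to support, website visits, churn?)
ROWS = [
    (104, 103.60, 0, 0, "No"), (124, 56.33, 1, 0, "No"),
    (56, 214.60, 2, 0, "Yes"), (2410, 305.60, 0, 5, "No"),
    (536, 145.70, 0, 0, "No"), (234, 122.09, 0, 1, "No"),
    (201, 185.76, 1, 7, "Yes"), (111, 83.60, 3, 2, "No"),
]


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows, then hold out the last test_fraction of them."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[:-n_test], shuffled[-n_test:]


def train_stump(rows):
    """Learn the single-threshold split that best fits the training rows."""
    default, best_correct = Counter(row[-1] for row in rows).most_common(1)[0]
    best_rule = (None, None, default, default)  # fall back to the majority class
    for index in range(len(rows[0]) - 1):
        for threshold in sorted({row[index] for row in rows})[:-1]:
            left = Counter(row[-1] for row in rows if row[index] <= threshold)
            right = Counter(row[-1] for row in rows if row[index] > threshold)
            left_label, left_hits = left.most_common(1)[0]
            right_label, right_hits = right.most_common(1)[0]
            if left_hits + right_hits > best_correct:
                best_correct = left_hits + right_hits
                best_rule = (index, threshold, left_label, right_label)
    best_index, best_threshold, below_label, above_label = best_rule

    def predict(row):
        if best_index is None:
            return below_label  # no split beat the majority class
        return below_label if row[best_index] <= best_threshold else above_label

    return predict


def accuracy(predict, rows):
    """Fraction of rows whose label the model predicts correctly."""
    return sum(predict(row) == row[-1] for row in rows) / len(rows)


train, test = train_test_split(ROWS)
model = train_stump(train)
print(f"accuracy on training rows: {accuracy(model, train):.2f}")
print(f"accuracy on held-out rows: {accuracy(model, test):.2f}")
```

Only the held-out accuracy says anything about unseen customers. The training accuracy is inflated by the model having been chosen to fit exactly those rows.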
And Now?
• Is the model good enough?
• If not, try:
  • Different modeling approaches (model types, parameter tuning)
  • Better features (more information, transformations of the information you already have)
• The more you fiddle with things, the more you contaminate your results (through overfitting)
  • Thus, if it’s “good enough”, it’s often best to leave it alone
Field Importance
• While our model is good, we don’t really have a good high-level overview of why it thinks what it thinks
• BigML supervised models provide this in the form of field importance under the model summary report
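As a rough illustration of the idea behind field importance, this Python sketch scores each field by how much its best single split improves on always predicting the majority class. The scores are normalized to sum to 1. This split-gain measure is a simplified, hypothetical stand-in and is not how BigML computes field importance.

```python
from collections import Counter

FEATURES = ["Minutes Used", "Last Month's Bill", "Calls To Support", "Website Visits"]
ROWS = [
    (104, 103.60, 0, 0, "No"), (124, 56.33, 1, 0, "No"),
    (56, 214.60, 2, 0, "Yes"), (2410, 305.60, 0, 5, "No"),
    (536, 145.70, 0, 0, "No"), (234, 122.09, 0, 1, "No"),
    (201, 185.76, 1, 7, "Yes"), (111, 83.60, 3, 2, "No"),
]


def majority_hits(labels):
    """How many labels match the most common label."""
    return Counter(labels).most_common(1)[0][1]


def best_split_hits(rows, index):
    """Most training rows any single threshold on this column predicts correctly."""
    best = 0
    for threshold in sorted({row[index] for row in rows})[:-1]:
        left = [row[-1] for row in rows if row[index] <= threshold]
        right = [row[-1] for row in rows if row[index] > threshold]
        best = max(best, majority_hits(left) + majority_hits(right))
    return best


def split_gain_importance(rows, features):
    """Hypothetical importance: gain of each field's best split over the majority baseline."""
    baseline = majority_hits([row[-1] for row in rows])
    gains = {
        name: max(0, best_split_hits(rows, index) - baseline)
        for index, name in enumerate(features)
    }
    total = sum(gains.values())
    return {name: (gain / total if total else 0.0) for name, gain in gains.items()}


for name, score in split_gain_importance(ROWS, FEATURES).items():
    print(f"{name:<18} {score:.2f}")
```

On the eight-row churn table, Minutes Used, Last Month's Bill, and Website Visits each get a third of the importance. Calls To Support gets none, because its best split is no better than the majority baseline.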
Individual Explanations
• Individual predictions can be explained as well (as the model’s reasoning for a particular point can be different from the model at large)
• Use the magnifying glass in the prediction form
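One common way to explain a single decision-tree prediction is to list the conditions along the path the row follows. The Python sketch below uses a small hand-written tree whose thresholds are hypothetical. The tree happens to fit all eight rows of the churn table, but it is not a model BigML learned. It is also not how the BigML prediction form computes its explanations.

```python
# A tiny hand-written tree (hypothetical). Internal nodes are
# (field, threshold, subtree_if_at_or_below, subtree_if_above); leaves are labels.
TREE = ("Last Month's Bill", 180,
        "No",
        ("Minutes Used", 1000, "Yes", "No"))


def predict_with_path(tree, row):
    """Return the predicted label and the list of conditions that led to it."""
    path = []
    node = tree
    while isinstance(node, tuple):
        field, threshold, below, above = node
        if row[field] > threshold:
            path.append(f"{field} > {threshold}")
            node = above
        else:
            path.append(f"{field} <= {threshold}")
            node = below
    return node, path


label, path = predict_with_path(TREE, {"Last Month's Bill": 214.60, "Minutes Used": 56})
print(label, "because", " and ".join(path))
# Yes because Last Month's Bill > 180 and Minutes Used <= 1000
```

For a customer with a $103.60 bill, the explanation is just “Last Month's Bill <= 180”. For a high-bill customer, minutes used also matters. This is the sense in which one prediction's reasoning can differ from the model at large.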
Two Takeaways
• When beginning a machine learning project, the more concrete the goal, the better. Numbers are the lifeblood of analytics, so if you can’t quantify your objective(s), success is unlikely
• Machine Learning isn’t the right solution for every problem! Be wary of your algorithm being replaced by a human!
• “Before embarking on an ambitious project, try to kill it.” - Edsger Dijkstra