Introduction to Machine Learning with the BigML Platform - ML for Executives Course.
DutchMLSchool: 1st edition of the Machine Learning Summer School in The Netherlands.
DutchMLSchool. ML: A Technical Perspective - BigML, Inc
DutchMLSchool. Machine Learning: A Technical Perspective
TITLE AS IN SCHEDULE - Main Conference: Introduction to Machine Learning.
DutchMLSchool: 1st edition of the Machine Learning Summer School in The Netherlands.
Enhancing and Automating Decision Making with Machine Learning - Main Conference: Introduction to Machine Learning.
DutchMLSchool: 1st edition of the Machine Learning Summer School in The Netherlands.
DutchMLSchool. Supervised vs Unsupervised Learning - BigML, Inc
Supervised versus Unsupervised Learning Techniques - Main Conference: Introduction to Machine Learning.
DutchMLSchool: 1st edition of the Machine Learning Summer School in The Netherlands.
DutchMLSchool. Models, Evaluations, and Ensembles - BigML, Inc
DutchMLSchool. Introduction to Machine Learning, Models, Evaluations, and Ensembles (Supervised Learning I) - Main Conference: Introduction to Machine Learning.
DutchMLSchool: 1st edition of the Machine Learning Summer School in The Netherlands.
DutchMLSchool. Logistic Regression, Deepnets, Time Series - BigML, Inc
DutchMLSchool. Logistic Regression, Deepnets, and Time Series (Supervised Learning II) - Main Conference: Introduction to Machine Learning.
DutchMLSchool: 1st edition of the Machine Learning Summer School in The Netherlands.
Machine learning is becoming widely used to automate decision making. While machine learning seems complex, it involves finding patterns in data that can be used to make useful predictions. The document discusses how factors like increased data availability, faster computation, and easier tools have led to the rise of machine learning applications. It also notes common pitfalls in early machine learning adoption, like overhyping results and failing to develop a clear strategy. Overall, machine learning is transforming industries by enabling cheaper and more data-driven decisions at scale.
Machine Learning: Business Perspective - Main Conference: Introduction to Machine Learning.
DutchMLSchool: 1st edition of the Machine Learning Summer School in The Netherlands.
DutchMLSchool. ML for Energy Trading and Automotive Sector - BigML, Inc
Machine Learning for Energy Trading, Automotive Sector, and Logistics, presented by BigML's Partners A1 Digital.
Main Conference: Introduction to Machine Learning.
DutchMLSchool: 1st edition of the Machine Learning Summer School in The Netherlands.
Anatomy of an Application: Machine Learning End-to-End - Main Conference: Introduction to Machine Learning.
DutchMLSchool: 1st edition of the Machine Learning Summer School in The Netherlands.
Automating your own Machine Learning Projects - Workshop: Working with the Masters.
DutchMLSchool: 1st edition of the Machine Learning Summer School in The Netherlands.
DutchMLSchool. Associations and Topic Models - BigML, Inc
DutchMLSchool. Association Discovery and Topic Modeling (Unsupervised II) - Main Conference: Introduction to Machine Learning.
DutchMLSchool: 1st edition of the Machine Learning Summer School in The Netherlands.
This document discusses deepnets, which are a type of supervised learning algorithm for classification and regression. Deepnets build upon logistic regression by adding hidden layers between the input and output layers. This allows deepnets to model more complex nonlinear relationships than logistic regression. While deepnets have powerful representational abilities, their success depends on finding the optimal network structure for a given problem. The document outlines how BigML uses metalearning and network search techniques to automate this process and make deepnets more accessible for users. Deepnets work best for problems where computational resources allow exploring many network structures to find the best performing one.
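The claim that hidden layers let a network capture relationships a single linear unit cannot can be illustrated with XOR, which is not linearly separable. The sketch below is only an illustration: the weights are hand-picked and hypothetical, nothing is trained, and it does not reflect how BigML builds or searches deepnet structures.

```python
def relu(z):
    """Rectified linear activation."""
    return max(0.0, z)

def linear_unit(x1, x2, w1, w2, b):
    """Weighted sum of the inputs: the part of logistic regression before the sigmoid."""
    return w1 * x1 + w2 * x2 + b

def tiny_deepnet(x1, x2):
    """One hidden layer with two ReLU units feeding a linear output unit.

    The weights are hypothetical and chosen by hand so the output equals XOR.
    """
    h1 = relu(linear_unit(x1, x2, 1.0, 1.0, 0.0))   # grows with the number of active inputs
    h2 = relu(linear_unit(x1, x2, 1.0, 1.0, -1.0))  # non-zero only when both inputs are active
    return h1 - 2.0 * h2

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
xor_outputs = [tiny_deepnet(a, b) for a, b in inputs]
print(xor_outputs)  # [0.0, 1.0, 1.0, 0.0]
```

No choice of weights for a single `linear_unit` followed by a threshold reproduces this output, which is why the extra layer matters.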
Feature engineering is the process of using domain knowledge to create new features that allow machine learning algorithms to work better or work at all. It involves applying transformations and encoding schemes to raw data to construct informative features for modeling. Feature engineering is important because ML algorithms only learn from the data and features provided, so carefully engineered features are crucial. Effective feature engineering requires domain expertise, experimentation, and evaluation to identify representations of the data that best support predictive tasks.
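One encoding scheme of the kind mentioned above is one-hot encoding, which turns a categorical field into 0/1 indicator columns. The following is a minimal sketch in plain Python; the column values are made up for illustration and the helper name `one_hot` is hypothetical.

```python
def one_hot(values):
    """Map each categorical value to a 0/1 indicator vector.

    Categories are sorted so the column order is deterministic.
    """
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, encoded

# Hypothetical raw column.
colors = ["red", "green", "red", "blue"]
categories, encoded = one_hot(colors)
print(categories)  # ['blue', 'green', 'red']
print(encoded)     # [[0, 0, 1], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
```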
Big Data Spain 2018: How to build Weighted XGBoost ML model for Imbalance dat... - Alok Singh
Alok Singh is a Principal Engineer at IBM CODAIT who has built multiple analytical frameworks and machine learning algorithms. The presentation provides an overview of building predictive models for imbalanced datasets using scikit-learn and XGBoost. It discusses challenges with imbalanced data, evaluation metrics like confusion matrix and ROC curves, and techniques for imbalanced learning including weighted classes, oversampling minorities and undersampling majorities, and SMOTE. The presentation concludes with a hands-on tutorial demonstrating these techniques on an imbalanced bank marketing dataset.
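As a rough illustration of the weighted-classes idea, the sketch below computes per-class weights inversely proportional to class frequency, using the formula n_samples / (n_classes * class_count). This matches the heuristic documented for scikit-learn's `class_weight="balanced"` option; the negative-to-positive ratio is the value commonly suggested for XGBoost's `scale_pos_weight`. The label counts are hypothetical and are not taken from the presentation's bank marketing dataset.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}

# Hypothetical imbalanced labels: 90 negatives, 10 positives.
labels = [0] * 90 + [1] * 10
weights = balanced_class_weights(labels)

# Ratio of negatives to positives, a common starting point for a positive-class weight.
neg_to_pos_ratio = labels.count(0) / labels.count(1)

print(weights[1])         # 5.0
print(neg_to_pos_ratio)   # 9.0
```

The rare class ends up with a weight nine times larger than the common class, so each minority example counts for more during training.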
This document summarizes a presentation on feature engineering for machine learning. It discusses how feature engineering is important for allowing machine learning algorithms to work better or at all by creating new features that provide better representations of the data. Various techniques for feature engineering are presented, including transforming date/time fields, handling categorical variables, text analysis, and discretizing continuous variables. The use of feature engineering tools like Flatline for programmatically creating new features is also demonstrated. Feature selection techniques are briefly discussed to help identify the most important and non-leaky features.
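Two of the techniques listed above, transforming date/time fields and discretizing continuous variables, can be sketched with the Python standard library. This is an illustration only: the timestamp, bin edges, and labels are hypothetical, and the code is plain Python rather than Flatline.

```python
from datetime import datetime

def date_features(timestamp):
    """Split a timestamp string into separate, model-friendly fields."""
    dt = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
    return {
        "year": dt.year,
        "month": dt.month,
        "day_of_week": dt.weekday(),   # Monday is 0, Sunday is 6
        "hour": dt.hour,
        "is_weekend": dt.weekday() >= 5,
    }

def discretize(value, edges, labels):
    """Return the label of the first bin whose upper edge is above the value.

    Expects len(labels) == len(edges) + 1; the last label covers everything else.
    """
    for edge, label in zip(edges, labels):
        if value < edge:
            return label
    return labels[-1]

features = date_features("2019-07-08 14:30:00")
age_bins = [discretize(a, [18, 65], ["minor", "adult", "senior"]) for a in (17, 18, 70)]
print(features)
print(age_bins)  # ['minor', 'adult', 'senior']
```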
Square's Machine Learning Infrastructure and Applications - Rong Yan - Hakka Labs
1) Square uses machine learning for fraud detection in payments and to power recommendations on its Square Market platform.
2) Random forests and gradient boosted trees are the primary algorithms used for fraud detection, achieving up to a 10-11% improvement over random forests alone.
3) Square has built scalable machine learning infrastructure including parallel environments, data transport systems, and a learning management system to support rapid model development and evaluation.
Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016 - MLconf
Before the Model: How Machine Learning Products Start, with Examples from Airbnb: Often the most important part of building a machine learning product is the formulation of the problem; the most elegant model is rendered useless without the right application and model architecture. Airbnb is an online marketplace for accommodations which has found many interesting applications for machine learning products by taking a data-driven approach to investment in machine learning products. Come hear about how the Airbnb team generates and vets ideas for machine learning products and tailors the product to business problems, with some examples of success and lessons learned along the way.
Feature engineering is the process of using domain knowledge to create new features that allow machine learning algorithms to work better or work at all. It involves applying transformations to existing features, like splitting date-time fields or normalizing numeric values, as well as computing new features from existing ones. Flatline is a domain-specific language for programmatic feature engineering and filtering that allows creating new features using expressions over existing fields. Care must be taken to avoid leakage when creating new features.
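Normalizing numeric values is also a place where one form of leakage can creep in: if the scaling statistics are computed on all rows, information from held-out data reaches the training features. The sketch below is a minimal illustration of fitting the statistics on training rows only. It is plain Python, not Flatline, the numbers are hypothetical, and it assumes the training values are not all identical (otherwise the standard deviation is zero).

```python
from statistics import mean, pstdev

def fit_scaler(train_values):
    """Compute the mean and population standard deviation from training rows only."""
    return mean(train_values), pstdev(train_values)

def transform(values, mu, sigma):
    """Apply z-score normalization using previously fitted statistics."""
    return [(v - mu) / sigma for v in values]

# Hypothetical numeric field, split into training and held-out rows.
train = [10.0, 20.0, 30.0]
held_out = [40.0]

mu, sigma = fit_scaler(train)                 # statistics never see held_out
train_scaled = transform(train, mu, sigma)
held_out_scaled = transform(held_out, mu, sigma)
print(train_scaled, held_out_scaled)
```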
Valencian Summer School 2015
Day 2
Lecture 11
The Future of Machine Learning
José David Martín-Guerrero (IDAL, UV)
https://bigml.com/events/valencian-summer-school-in-machine-learning-2015
Big Data & Machine Learning - TDC2013 Sao Paulo - OCTO Technology
BigData and Machine Learning: Usage and Opportunities for your IT department
Talk presented at The Developer Conference in São Paulo - 12/07/13
Mathieu DESPRIEE
Data Science: A Mindset for Productivity
Keynote at 2015 Ronin Labs West Coast CTO Summit
https://www.eventjoy.com/e/west-coast-cto-summit-2015
Abstract
Data science isn't just about using a collection of technologies and algorithms. Data science requires a mindset that solves problems at a higher level of abstraction. How do we model utility when we think about optimization? How do we decide which hypotheses to test? How do we allocate our scarce resources to make progress?
There are no silver bullets. But I'll share what I've learned from a variety of contexts over the course of my work at Endeca, Google, and LinkedIn; and I hope you'll leave this talk with some practical wisdom you can apply to your next data science project.
Yuri is a Member of Technical Staff / Data Scientist at eBay in New York City. He is currently focused on developing scalable machine learning algorithms to produce high quality item recommendations. Yuri holds a Ph.D. from the Applied Physics and Applied Mathematics department at Columbia University and an undergraduate degree in Physics from UC Berkeley.
Abstract Summary:
Innovations in Recommender Systems for a Semi-structured Marketplace:
eBay has over 1 billion live items on the site at any given time. The lack of structured information about listings as well as variable inventory makes traditional collaborative filtering algorithms difficult to use in eBay’s large semi-structured marketplace. We will discuss approaches to overcome these challenges using machine learning and deep learning (both text and image based models). The details of the sampling strategy, feature engineering, and machine learned ranking model are all important for delivering improved operational metrics in A/B tests. We will cover both system architecture engineering as well as data science and machine learning methods that were developed to generate high quality recommendations.
As we move into a new era of ITSM computing, new big data and machine learning tools and methodologies are being developed to support IT staff by intelligently extracting insights and making predictions from the enormous amounts of data accumulated from the organization. According to Gartner, I&O leaders must take a comprehensive approach to incorporate advanced big data and machine learning technologies into their organizations or risk becoming irrelevant. But what exactly are big data and machine learning all about? How can you introduce these concepts into your existing Service Desk?
Join USF’s distinguished Computer Science and Engineering Professor Lawrence Hall and SunView Software’s VP of Marketing and Product Strategy John Prestridge as they break down the fundamentals of big data and machine learning and provide real-world examples of the impact the technologies will have on ITSM.
This document provides an introduction to machine learning, including definitions, types, and case studies. It begins with an agenda and overview of artificial intelligence applications. It then defines machine learning as a field that allows computers to learn without being explicitly programmed. The main types of machine learning are described as supervised, unsupervised, semi-supervised, and reinforcement learning. Example case studies on Netflix recommendations, cancer diagnosis, and Amazon inventory are outlined. The document concludes with tips on prerequisites and resources for studying machine learning, including mathematics, programming tools, and course recommendations.
The document discusses machine learning techniques for analyzing big data. It outlines three tenets of success: prediction, optimization, and automation. Various machine learning models are examined, including linear models, decision trees, neural networks, and clustering. Implementing machine learning algorithms in Hadoop distributed environments is also discussed. Optimization techniques like evolutionary algorithms are presented. Regularly adapting models with updated data is recommended to keep analyses current.
DutchMLSchool 2022 - Anomaly Detection at Scale - BigML, Inc
Lessons Learned Applying Anomaly Detection at Scale, by Álvaro Clemente, Machine Learning Engineer at BigML.
*Machine Learning School in The Netherlands 2022.
Introduction to End-to-End Machine Learning: Classification and Regression - Mercè Martín, VP of Bindings and Applications at BigML.
*Machine Learning School in The Netherlands 2022.
Learn about OptiML, the automatic optimization feature for model selection and parameterization on BigML. OptiML helps you avoid the difficult and time-consuming work of hand-tuning multiple supervised algorithms until you find the best one that solves your specific problem.
Cluster Analysis and Anomaly Detection (Unsupervised I) - Main Conference: Introduction to Machine Learning.
DutchMLSchool: 1st edition of the Machine Learning Summer School in The Netherlands.
HR Analytics: Using Machine Learning to Predict Employee Turnover - Matt Danc... - Sri Ambati
Presented at #H2OWorld 2017 in Mountain View, CA.
Enjoy the video: https://youtu.be/-qfEOwm5Th4.
Learn more about H2O.ai: https://www.h2o.ai/.
Follow @h2oai: https://twitter.com/h2oai.
- - -
In this talk, we discuss how we implemented H2O and LIME to predict and explain employee turnover on the IBM Watson HR Employee Attrition dataset. We use H2O’s new automated machine learning algorithm to improve on the accuracy of IBM Watson. We use LIME to produce feature importance and ultimately explain the black-box model produced by H2O.
Matt Dancho is the founder of Business Science (www.business-science.io), a consulting firm that assists organizations in applying data science to business applications. He is the creator of R packages tidyquant and timetk and has been working with data science for business and financial analysis since 2011. Matt holds master’s degrees in business and engineering, and has extensive experience in business intelligence, data mining, time series analysis, statistics and machine learning. Connect with Matt on twitter (https://twitter.com/mdancho84) and LinkedIn (https://www.linkedin.com/in/mattdancho/).
Hacking Predictive Modeling - RoadSec 2018 - HJ van Veen
This document provides an overview of machine learning and predictive modeling techniques for hackers and data scientists. It discusses foundational concepts in machine learning like functionalism, connectionism, and black box modeling. It also covers practical techniques like feature engineering, model selection, evaluation, optimization, and popular Python libraries. The document encourages an experimental approach to hacking predictive models through techniques like brute forcing hyperparameters, fuzzing with data permutations, and social engineering within data science communities.
MLSEV. Models, Evaluations and Ensembles - BigML, Inc
Introduction to Machine Learning. Supervised Learning (Part I): Models, Evaluations and Ensembles, by BigML.
MLSEV 2019: 1st edition of the Machine Learning School in Seville, Spain.
Machine Learning automation. Advanced WhizzML workflows: feature selection, boosting, gradient descent, and stacking.
VSSML18: 4th edition of the Valencian Summer School in Machine Learning.
This document provides guidance on how to become a competent data professional. It discusses the various types of data careers and skills required, including problem solving, statistics, programming, communication and business skills. It recommends taking online courses and finding a mentor, as well as gaining hands-on experience through competitions like Kaggle. With 5-6 years of consistent practice spending several hours per day learning, one can become competent in data skills. The document also addresses common questions for beginners and provides tips for progression in a data career.
NYC Open Data Meetup-- Thoughtworks chief data scientist talk - Vivian S. Zhang
This document summarizes a presentation on data science consulting. It discusses:
1) The Agile Analytics group at ThoughtWorks, which does data science consulting projects using probabilistic modeling, machine learning, and big data technologies.
2) Two case studies: developing a machine learning model to improve matching of healthcare product data, and using logistic regression for retail recommendation systems.
3) The origins and future of the field, noting that while not entirely new, data science has grown due to improvements in technology, programming languages, and libraries that have increased productivity and driven new career opportunities.
The document describes the development of a house recommender system using machine learning techniques. It discusses using anomaly detection to identify and remove unusual homes from recommendations, using machine learning to impute missing data values, clustering homes into groups to provide more varied recommendations, association discovery to understand relationships between home features and clusters, and topic modeling of home descriptions to provide deeper insights into grouping homes. The overall goal is to build a preference model based on user input to filter and recommend relevant homes for sale.
The document discusses parameter optimization and machine learning techniques for tuning models. It covers using machine learning to predict the performance of parameter configurations before training models on them, called Bayesian parameter optimization. It also discusses dangers of naive cross-validation and how to select the best model by considering factors beyond just performance like retraining needs and prediction speed. The document advocates creating diverse ensembles through techniques like fusions to improve stability and importance profiles.
Artur Suchwalko “What are common mistakes in Data Science projects and how to... - Lviv Startup Club
Common mistakes in data science projects include:
1) Not properly defining the business problem or focusing on optimizing the wrong process.
2) Not adequately preparing the data or understanding how it was generated.
3) Rushing the modeling process or implementation without proper testing.
4) Choosing complex methods or "AI" solutions when simpler approaches may work better.
5) Not involving experienced people or adequately educating the team.
To avoid these mistakes, it is important to carefully analyze the business problem, the data, and the modeling process, and to make sure the right people are involved.
Similar to DutchMLSchool. Introduction to Machine Learning with the BigML Platform (20)
Digital Transformation and Process Optimization in Manufacturing - BigML, Inc
Keyanoush Razavidinani, Digital Services Consultant at A1 Digital, a BigML Partner, highlights why it is important to identify and reduce human bottlenecks in order to optimize processes and focus on important activities. Additionally, Guillem Vidal, Machine Learning Engineer at BigML, completes the session by showcasing how Machine Learning is put to use in the manufacturing industry with a use case to detect factory failures.
The Road to Production: Automating your Anomaly Detectors - by jao (Jose A. Ortega), Co-Founder and Chief Technology Officer at BigML.
*Machine Learning School in The Netherlands 2022.
DutchMLSchool 2022 - ML for AML Compliance - BigML, Inc
Machine Learning for Anti Money Laundering Compliance, by Kevin Nagel, Consultant and Data Scientist at INFORM.
*Machine Learning School in The Netherlands 2022.
DutchMLSchool 2022 - Multi Perspective Anomalies - BigML, Inc
Multi Perspective Anomalies, by Jan W Veldsink, Master in the art of AI at Nyenrode, Rabobank, and Grio.
*Machine Learning School in The Netherlands 2022.
DutchMLSchool 2022 - My First Anomaly Detector BigML, Inc
The document discusses building an anomaly detector model to identify unusual transactions in a dataset. It describes loading transaction data with 31 features into the BigML platform and creating an anomaly detector model. The model scores new data and identifies the most anomalous fields to help detect fraud. Creating the anomaly detector involves interpreting the data, exploring the dataset distribution, and setting a threshold score to define what is considered anomalous.
DutchMLSchool 2022 - History and Developments in MLBigML, Inc
History and Present Developments in Machine Learning, by Tom Dietterich, Emeritus Professor of computer science at Oregon State University and Chief Scientist at BigML.
*Machine Learning School in The Netherlands 2022.
DutchMLSchool 2022 - A Data-Driven CompanyBigML, Inc
A Data-Driven Company: 21 Lessons for Large Organizations to Create Value from AI, by Richard Benjamins, Chief AI and Data Strategist at Telefónica.
*Machine Learning School in The Netherlands 2022.
DutchMLSchool 2022 - ML in the Legal SectorBigML, Inc
How Machine Learning Transforms and Automates Legal Services, by Arnoud Engelfriet, Co-Founder at Lynn Legal.
*Machine Learning School in The Netherlands 2022.
This document describes a proposed solution using machine learning and artificial intelligence to help create a safer stadium experience. The solution involves two parts: 1) linking access to stadiums to a verified identity through a fan app for preregistration, and 2) using AI/ML to help detect unwanted behaviors or events early. The rest of the document provides more details on the proposed smart video review framework, including using computer vision and audio analysis techniques to help identify issues like flares, flags, banners, chants including monkey chants. The goal is to help reviewers more efficiently identify potential problems but with privacy, ethics and human oversight.
DutchMLSchool 2022 - Process Optimization in Manufacturing PlantsBigML, Inc
Process Optimization in Manufacturing Plants, by Keyanoush Razavidinani, Digital Business Consultant at A1 Digital.
*Machine Learning School in The Netherlands 2022.
DutchMLSchool 2022 - Citizen Development in AIBigML, Inc
The document discusses the need for citizen developers and humans in the AI/ML process. It notes that while technology and talent are important, company culture must also support broad data analytics and AI/ML adoption. It then provides examples of how involving domain experts can help attribute meaning to correlations and build better causal models to improve AI systems. The document advocates for a systems thinking approach and having humans in the loop to help AI/ML systems consider the wider context and avoid issues like bias.
This new feature is a continuation of and improvement on our previous Image Processing release. Now, Object Detection lets you go a step further with your image data and allows you to locate objects and annotate regions in your images. Once your image regions are defined, you can train and evaluate Object Detection models, make predictions with them, and automate end-to-end Machine Learning workflows on a single platform. To make that possible, BigML enables Object Detection by introducing the regions optype.
As with any other BigML feature, Object Detection is available from the BigML Dashboard, API, and WhizzML for automation. Object Detection is extremely helpful to tackle a wide range of computer vision use cases such as medical image analysis, quality control in manufacturing, license plate recognition in transportation, people detection in security surveillance, among many others.
This new release brings Image Processing to the BigML platform, a feature that enhances our offering to solve image data-driven business problems with remarkable ease of use. Because BigML treats images as any other data type, this unique implementation allows you to easily use image data alongside text, categorical, numeric, date-time, and items data types as input to create any Machine Learning model available in our platform, both supervised and unsupervised.
Now, it is easier than ever to solve a wide variety of computer vision and image classification use cases in a single platform: label your image data, train and evaluate your models, make predictions, and automate your end-to-end Machine Learning workflows. As with any other BigML feature, Image Processing is available from the BigML Dashboard, API, and WhizzML, and it can be applied to solve use cases such as medical image analysis, visual product search, security surveillance, and vehicle damage detection, among others.
Machine Learning in Retail: Know Your Customers' Customer. See Your FutureBigML, Inc
This session presents a quite common situation for those working in food and beverage retail (FnB) and highlights interesting insights to fight waste reduction.
Speaker: Stephen Kinns, CEO and Co-Founder at catsAi.
*ML in Retail 2021: Webinar.
Machine Learning in Retail: ML in the Retail SectorBigML, Inc
This is an introductory session about the role that Machine Learning is playing in the retail sector and how it is being deployed across the different areas of this industry.
Speaker: Atakan Cetinsoy, VP of Predictive Applications at BigML.
*ML in Retail 2021: Webinar.
ML in GRC: Machine Learning in Legal Automation, How to Trust a LawyerbotBigML, Inc
This presentation analyzes the role that Machine Learning plays in legal automation with a real-world Machine Learning application.
Speaker: Arnoud Engelfriet, Co-Founder at Lynn Legal.
*ML in GRC 2021: Virtual Conference.
ML in GRC: Supporting Human Decision Making for Regulatory Adherence with Mac...BigML, Inc
This is a real-life Machine Learning use case about integrated risk.
Speakers: Thomas Rengersen, Product Owner of the Governance Risk and Compliance Tool for Rabobank, and Thomas Alderse Baas, Co-Founder and Director of The Bowmen Group.
*ML in GRC 2021: Virtual Conference.
ML in GRC: Cybersecurity versus Governance, Risk Management, and ComplianceBigML, Inc
Some of these concepts (Cybersecurity, Governance, Risk Management, and Compliance) overlap and sometimes they can be confusing. This session helps us understand why those terms are key for any business to be successful.
Speaker: Jon Shende, Founding Investor at MyVayda.
*ML in GRC 2021: Virtual Conference.
Intelligent Mobility: Machine Learning in the Mobility IndustryBigML, Inc
The document discusses intelligent mobility and how machine learning can help improve transportation systems. It provides examples of how ML can be applied to roads, ports, railways and airports for tasks like license plate recognition, container tracking, flight delay prediction and predictive maintenance. The document also discusses how ML platforms can help companies build scalable predictive applications by standardizing workflows, integrating various data sources and empowering employees and domain experts to develop and use ML models.
3. BigML, Inc #DutchMLSchool
Sampling the Audience
Expert: Published papers at KDD, ICML, NIPS, etc., or developed own ML algorithms used at large scale
Aficionado: Understands pros/cons of different techniques and/or can tweak algorithms as needed
Practitioner: Very familiar with ML packages (Weka, Scikit, BigML, etc.)
Newbie: Just taking Coursera ML class or reading an introductory book to ML
Absolute beginner: ML sounds like science fiction
6. BigML, Inc #DutchMLSchool
A Brief History of BigML
• BigML Mission: To make Machine Learning Beautifully Simple
• BigML founded in Corvallis, Oregon in 2011, long before ML was "cool"
• You’ve never heard of it?
• Most innovative city in the United States!
8. BigML, Inc #DutchMLSchool
BigML Platform
Web-based Frontend
Visualizations
Distributed Machine Learning Backend: SOURCE SERVER, DATASET SERVER, MODEL SERVER, PREDICTION SERVER, EVALUATION SERVER, SAMPLE SERVER, WHIZZML SERVER
Tools - https://bigml.com/tools
REST API - https://bigml.com/api
Smart Infrastructure (auto-deployable, auto-scalable): SERVERS, EVENTS, GEARMAN QUEUE, DESIRED TOPOLOGY, AWS COSTS, RUNQUEUE SCALER, BUSY SCALER, AUTO TOPOLOGY, ACTUAL TOPOLOGY, MESSAGE QUEUE
9. BigML, Inc #DutchMLSchool
BigML Platform
(Same platform diagram labels as the previous slide, plus one addition: On-Premises)
10. BigML, Inc #DutchMLSchool
Machine Learning Motivation
Imagine:
• You are looking to buy a house
• Recently found a house you like
• Is the asking price fair?
What Next?
11. BigML, Inc #DutchMLSchool
Machine Learning Motivation
11
Why not ask an expert?
• Experts can be rare / expensive
• Hard to validate experience:
• Experience with similar properties?
• Do they consider all relevant variables?
• Knowledge of market up to date?
• Hard to validate answer:
• How many times expert right / wrong?
• Probably can’t explain decision in detail
• Humans are not good at intuitive statistics
12. BigML, Inc #DutchMLSchool
Data vs Expert
Replace the expert with data?
• Intuition: square footage relates to price.
• Collect data from past sales
PRICE = 125.3*SQFT + 96535
SQFT   SOLD     PREDICT
2424   360000   400262
1785   307500   320195
1003   185000   222211
4135   600000   614651
1676   328500   306538
1012   247000   223339
3352   420000   516541
2825   435350   450508
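The PREDICT column can be reproduced by applying the slide's line to each SQFT value. The Python sketch below does that and also shows how such a line could be fitted. The slide does not say how its coefficients were obtained; ordinary least squares is assumed here, and the function names are illustrative only. Because the slide's coefficients are rounded, the reproduced predictions match the slide to within about a dollar, and a fit on only these eight rows gives different coefficients from the slide's.

```python
# Past sales from the slide: (square footage, sold price)
SALES = [
    (2424, 360000), (1785, 307500), (1003, 185000), (4135, 600000),
    (1676, 328500), (1012, 247000), (3352, 420000), (2825, 435350),
]


def predict_price(sqft, slope=125.3, intercept=96535):
    """Apply the slide's line: PRICE = 125.3*SQFT + 96535."""
    return slope * sqft + intercept


def fit_line(points):
    """Ordinary least squares fit of y = slope*x + intercept (assumed method)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Predictions from the slide's coefficients, one per past sale
predictions = [predict_price(sqft) for sqft, _ in SALES]

# Coefficients fitted on these eight rows only
slope, intercept = fit_line(SALES)
```

Comparing `predictions` with the SOLD column gives a first, rough answer to "is the asking price fair?" for a house of a given size.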
13. BigML, Inc #DutchMLSchool
Data vs Expert
13
Replace the expert scorecard
• Experts can be rare / expensive
• Hard to validate experience:
• Experience with similar properties?
• Do they consider all relevant variables?
• Knowledge of market up to date?
• Hard to validate answer:
• How many times expert right / wrong?
• Probably can’t explain decision in detail
• Humans are not good at intuitive statistics
14. BigML, Inc #DutchMLSchool
Data vs Expert
Replace the expert with data
• Intuition: square footage relates to price.
• Collect data from past sales
PRICE = 125.3*SQFT + 96535
SQFT   SOLD
2424   360000
1785   307500
1003   185000
4135   600000
1676   328500
1012   247000
3352   420000
2825   435350
15. BigML, Inc #DutchMLSchool
More Data!
SQFT | BEDS | BATHS | ADDRESS | LOCATION | LOT SIZE | YEAR BUILT | PARKING SPOTS | LATITUDE | LONGITUDE | SOLD
2424 | 4 | 3 | 1522 NW Jonquil | Timberhill SE 2nd | 5227 | 1991 | 2 | 44,594828 | -123,269328 | 360000
1785 | 3 | 2 | 7360 NW Valley Vw | Country Estates | 25700 | 1979 | 2 | 44,643876 | -123,238189 | 307500
1003 | 2 | 1 | 2620 NW Chinaberry | Tamarack Village | 4792 | 1978 | 2 | 44,593704 | -123,295424 | 185000
4135 | 5 | 3,5 | 4748 NW Veronica | Suncrest | 6098 | 2004 | 3 | 44,5929659 | -123,306916 | 600000
1676 | 3 | 2 | 2842 NW Monterey | Corvallis | 8712 | 1975 | 2 | 44,5945279 | -123,291523 | 328500
1012 | 3 | 1 | 2320 NW Highland | Corvallis | 9583 | 1959 | 2 | 44,591476 | -123,262841 | 247000
3352 | 4 | 3 | 1205 NW Ridgewood | Ridgewood 2 | 60113 | 1975 | 2 | 44,579439 | -123,333888 | 420000
2825 | 3 | | 411 NW 16th | Wilkins Addition | 4792 | 1938 | 1 | 44,570883 | -123,272113 | 435350
Uhhhh……..
• Can we still fit a line to 10 variables? (well, yes)
• Will fitting a line give good results? (unlikely)
• What about those text fields and categorical values?
18. BigML, Inc #DutchMLSchool
Some Terminology…
Diagram labels: Training Data, ML Platform, ML Resource, Home Data, Model, Prediction: Price=418K
• Modeling
• Clustering
• Anomaly Detection
• Association Discovery
“Consume” the model or “put into production”
• Dashboard
• Custom Application
• Wearable / Edge device
• Batch Process
21. BigML, Inc #DutchMLSchool
Model Choices
• A single Decision Tree was easy to understand, but could we build something stronger?
• There are actually hundreds of algorithms…
• BigML carefully implements the best in terms of interpretability and the ability to work with real-world data:
• Linear Regression
• Logistic Regression
• Single Decision Trees
• Decision Forest / Random Decision Forest
• Boosted Trees
• Deepnets (wait - those are hard, right?)
23. BigML, Inc #DutchMLSchool
BigML Deepnets
• The success of a Deepnet depends on getting the right network structure for the dataset
• But there are too many parameters:
• Nodes, layers, activation function, learning rate, etc…
• And setting them takes significant expert knowledge
• Solution: Metalearning (a good initial guess)
• Solution: Network search (try a bunch)
25. BigML, Inc #DutchMLSchool
Choosing the Algorithm
[Chart] Axis labels: Decreasing Interpretability / Better Representation / Longer Training; Increasing Data Size / Complexity
Stages: Early Stage (Rapid Prototyping), Mid Stage (Proven Application), Late Stage (Critical Performance)
Algorithms shown: Deepnets, Single Tree Model, Logistic Regression, Boosted Trees, Random Decision Forest, Decision Forest
STILL TOO HARD?
26. BigML, Inc #DutchMLSchool
OptiML
• Each resource has several parameters that impact quality
• Number of trees, missing splits, nodes, weight
• Rather than trial and error, we can use ML to find ideal parameters
• Why not make the model type, Decision Tree, Boosted Tree, etc, a parameter as well?
• Similar to Deepnet network search, but finds the optimum machine learning algorithm and parameters for your data automatically
• Outputs the top performing algorithms and parameters for your data… Why use just one “best” result?
27. BigML, Inc #DutchMLSchool
Fusions
27
• Similar to an Ensemble, but we can mix different model types
• Logistic Regression, plus a Deepnet for example
• You can also create a fusion with different training sets!
• Last week, plus last month data, etc
• Or a Fusion of OptiML models
• Combines the “best of the best”
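The Fusion idea can be sketched in a few lines of Python. This is only an illustration: it assumes predictions are combined by a plain unweighted average, which the slides do not confirm, and both member models are stand-ins (the line from the "Data vs Expert" slide and a hypothetical nearest-neighbour average over the same eight past sales).

```python
from statistics import mean

# Past sales from the "Data vs Expert" slide: (square footage, sold price)
SALES = [
    (2424, 360000), (1785, 307500), (1003, 185000), (4135, 600000),
    (1676, 328500), (1012, 247000), (3352, 420000), (2825, 435350),
]


def linear_model(sqft):
    """First member: the slide's line PRICE = 125.3*SQFT + 96535."""
    return 125.3 * sqft + 96535


def nearest_neighbor_model(sqft, k=3):
    """Second member (hypothetical): mean price of the k sales closest in size."""
    nearest = sorted(SALES, key=lambda sale: abs(sale[0] - sqft))[:k]
    return mean(price for _, price in nearest)


def fusion_predict(models, sqft):
    """Combine different model types by averaging their predictions (assumed rule)."""
    return mean(model(sqft) for model in models)


fusion = [linear_model, nearest_neighbor_model]
estimate = fusion_predict(fusion, 2000)
```

The same `fusion_predict` would work unchanged if the members were trained on different data windows (last week, last month) instead of being different model types.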
29. BigML, Inc #DutchMLSchool
ML Workflows
Workflow diagram labels: SOLD HOMES, FILTER, NEW FEATURES, DATASET, MODEL; FORSALE HOMES, FILTER, NEW FEATURES, DATASET, BATCH PREDICTION, DEALS
• Real-world ML Applications are workflows!
• Often require unsupervised learning!
31. BigML, Inc #DutchMLSchool
Recommender Idea
Diagram labels: All Homes Forsale, Sample, Preference Data, Preference Model
… then use the Preference Model to filter all the homes on the market
32. BigML, Inc #DutchMLSchool
Title
What if there are really unusual homes in the data?
• A mansion with 20 bathrooms
• A home with no bedrooms
• A lot size that is smaller than the home?
We don’t want to show these as suggestions because they are unusual…. How do we detect anomalies?
34. BigML, Inc #DutchMLSchool
What just happened?
• We wanted to find and remove unusual houses.
• We created an Anomaly Detector and examined the top anomalies.
• We found some unusual houses to remove and discovered bad data (missing values) that we want to fix.
35. BigML, Inc #DutchMLSchool
A clever way to fix missing data
Let’s use Machine Learning…
Fields to predict: BEDS, BATHS
SQFT   PRICE        BEDS       BATHS
3.125  US$530.000   5          3
2.100  US$460.000   (missing)  2
1.200  US$250.000   3          (missing)
3.950  US$610.000   6          4
Predicted values: BEDS = 4, BATHS = 1.5
37. BigML, Inc #DutchMLSchool
What just happened?
• We had a Dataset with missing values.
• We wanted to apply an algorithm to fix the missing values with Machine Learning
• Rather than write the algorithm, we found what we needed in the WhizzML public gallery.
• Now that we have cloned the Script we can use it again and again.
• We can write new ones too!
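The general idea of fixing missing values with Machine Learning can be sketched in Python: for each field with gaps, train a model on the rows where the field is present, then predict it for the rows where it is missing. This is not the WhizzML gallery script. A one-variable least-squares line on SQFT stands in for whatever model that script trains, the rows are the four from the slide (reading "3.125" as 3125 square feet), and which cells are missing is inferred from the later cluster table.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def impute(rows, target, feature="sqft"):
    """Return copies of rows with missing `target` values predicted from `feature`."""
    known = [r for r in rows if r[target] is not None]
    slope, intercept = fit_line(
        [r[feature] for r in known], [r[target] for r in known]
    )
    return [
        dict(r, **{target: slope * r[feature] + intercept})
        if r[target] is None
        else dict(r)
        for r in rows
    ]


# Rows from the slide; None marks a missing value (assumed reading of the table)
HOMES = [
    {"sqft": 3125, "price": 530000, "beds": 5, "baths": 3},
    {"sqft": 2100, "price": 460000, "beds": None, "baths": 2},
    {"sqft": 1200, "price": 250000, "beds": 3, "baths": None},
    {"sqft": 3950, "price": 610000, "beds": 6, "baths": 4},
]

# Fill BEDS first, then BATHS; the input list is left unchanged
filled = impute(impute(HOMES, "beds"), "baths")
```

With only three training rows per field the predictions are rough and need not match the values on the slide; a real workflow would train on the full dataset and use more than one input field.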
38. BigML, Inc #DutchMLSchool
Recommender Problem #2
• How can we avoid showing essentially the same house over and over?
Diagram labels: All Homes, Sample, Modern
39. BigML, Inc #DutchMLSchool
Recommender Problem #2
• How can we avoid showing essentially the same house over and over?
Diagram labels: All Homes, Modern, Lots of Land, sample
• Great! What if we don’t know how to group them? Or how many groups?
41. BigML, Inc #DutchMLSchool
What just happened?
• Since we don’t know how many groups of homes there should be, we used G-means Clustering to find the optimum number of groups of homes
• Our recommender will use these groups to create a better sampling for user preference
• We also tried to understand the home clusters using “model clusters” but the models were difficult to interpret
42. BigML, Inc #DutchMLSchool
Understanding Clusters Better
What if we could get rules like…
If SQFT >= 3,125 THEN “Cluster 1”
SQFT   PRICE        BEDS  BATHS  CLUSTER
3.125  US$530.000   5     3      Cluster 1
2.100  US$460.000   4     2      Cluster 3
1.200  US$250.000   3     1,5    Cluster 5
3.950  US$610.000   6     4      Cluster 1
44. BigML, Inc #DutchMLSchool
What just happened?
• We used a Batch Centroid to add the Cluster assignment of each home as a feature to the Dataset
• We used Association Discovery to find “interesting” relationships between the features, including the Cluster assignment
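What a batch centroid step does can be sketched in Python: each home is assigned to its nearest cluster centroid, and the cluster name is appended as a new feature. The centroid values below are hypothetical, chosen only for illustration, and plain Euclidean distance on unscaled (SQFT, BEDS) pairs is an assumption; a real clustering would use more fields and scale them.

```python
import math

# Hypothetical centroids as (sqft, beds); not taken from the slides
CENTROIDS = {
    "Cluster 1": (3500, 6),
    "Cluster 3": (2000, 4),
    "Cluster 5": (1200, 3),
}

# (sqft, beds) for the four homes in the "Understanding Clusters Better" table
HOMES = [(3125, 5), (2100, 4), (1200, 3), (3950, 6)]


def nearest_centroid(row, centroids):
    """Return the name of the centroid closest to row (Euclidean distance)."""
    return min(centroids, key=lambda name: math.dist(row, centroids[name]))


# Append the cluster assignment to each home as an extra feature
labeled = [home + (nearest_centroid(home, CENTROIDS),) for home in HOMES]
clusters = [row[-1] for row in labeled]
```

Once the cluster name is an ordinary column, any rule-finding or supervised method can treat it like any other feature.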
45. BigML, Inc #DutchMLSchool
Recommender Problem #3
There is much more interesting information than just the number of BEDS, BATHS, etc.
• Unfortunately, these "remarks" are not available in the Redfin download
• Adding them to our dataset requires crawling the website
• Like most ML projects, preparing the data is 80% of the difficulty (fortunately I already did it!)
47. BigML, Inc #DutchMLSchool
What just happened?
• We extended the home dataset with the syndicated remarks text field
• We built a model to predict sale price and explored how key words discovered in the remarks impacted price
• We used topic modeling to create a deeper thematic understanding of the remarks
• Homes that are "in-town" or "out-of-town"
• We extended the dataset with fields that represent how related each home is to each of these topics
• This will allow our clustering to group homes by a deeper meaning than just BEDS, BATHS, etc
• Is there a better way to capture “locality”?
52. BigML, Inc #DutchMLSchool
What just happened?
• We wanted to create a new feature “distance from OSU”
• This is possible with Flatline, a DSL for feature engineering
• Rather than writing the code for the coordinate transformation, we found a ready-made script shared in the WhizzML gallery
• We cloned the script and transformed the dataset
• This can be easily repeated with new datasets: fresh data or different cities
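For readers who want to see what a "distance from OSU" feature involves, here is a Python sketch using the haversine great-circle formula. It is not the Flatline or WhizzML script from the gallery; the formula choice, the function names, and the OSU coordinates (approximate, assumed for illustration) are all assumptions. Note that the slide data writes coordinates with decimal commas, which must be converted to decimal points first.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

# Approximate coordinates of Oregon State University (assumed for illustration)
OSU = (44.5639, -123.2789)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distance_from_osu(lat, lon):
    """New feature: distance in kilometres from a home to OSU."""
    return haversine_km(lat, lon, *OSU)


# First home in the "More Data!" table (44,594828 / -123,269328 on the slide)
first_home_km = distance_from_osu(44.594828, -123.269328)
```

Applying `distance_from_osu` to the LATITUDE and LONGITUDE columns yields one extra numeric field per home, which is the kind of "locality" signal the previous slide asks about.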