Data Science Company 
Machine Learning in Practice 
An InfoFarm Seminar 
Veldkant 33A, Kontich ● info@infofarm.be ● www.infofarm.be
Data Science
Big Data
Identifying, extracting and using data of all types 
and origins; exploring, correlating and using it in new 
and innovative ways in order to extract meaning 
and business value from it.
• 2 Data Scientists
• 4 Big Data Consultants
• 1 Infrastructure Specialist
Java 
PHP 
E-Commerce 
Mobile 
Web 
Development
Agenda 
• 13:00 What is Machine Learning? 
• 13:30 Techniques 
• 14:30 Tools 
• 15:00 Practical examples 
• 16:00 Wrap up 
What is Machine Learning? 
Magic?
Machine Learning is a subfield of
computer science and statistics that deals
with systems that can learn from data
instead of following explicitly programmed
instructions.
Machine Learning vs Data Science vs Big Data 
• You don’t need Big Data to leverage the
benefits of machine learning, but more
training data usually makes for a better model
• Data Science can help you to get the most 
out of Machine Learning 
• Machine Learning can help you to get the 
most out of Data Science 
Terminology 
Weight (g)  Wingspan (cm)  Webbed feet?  Back color  Species
1000.1      125.0          No            Brown       Buteo jamaicensis
3000.7      200.0          No            Gray        Sagittarius serpentarius
3300.0      220.3          No            Gray        Sagittarius serpentarius
4100.0      136.0          Yes           Black       Gavia immer
3.0         11.0           No            Green       Calothorax lucifer
570.0       75.0           No            Black       Campephilus principalis
• Features / attributes: the measured columns (weight, wingspan, webbed feet, back color)
• Instance / data point: one row of the table
• Label / target variable: the column to predict (species)
• Categorical (factor) versus numeric versus binary data
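To make the terminology concrete, the bird table could be held in code roughly as follows. This is only a minimal Python sketch: the names `FEATURES`, `instances`, `labels`, `one_hot` and `to_numeric` are illustrative assumptions and were not part of the seminar.

```python
# Feature (attribute) names: the columns an algorithm gets to see.
FEATURES = ["weight_g", "wingspan_cm", "webbed_feet", "back_color"]

# Each instance (data point) is one row of feature values.
instances = [
    (1000.1, 125.0, False, "Brown"),
    (3000.7, 200.0, False, "Gray"),
    (3300.0, 220.3, False, "Gray"),
    (4100.0, 136.0, True, "Black"),
    (3.0, 11.0, False, "Green"),
    (570.0, 75.0, False, "Black"),
]

# The label (target variable) of each instance, in the same order.
labels = [
    "Buteo jamaicensis",
    "Sagittarius serpentarius",
    "Sagittarius serpentarius",
    "Gavia immer",
    "Calothorax lucifer",
    "Campephilus principalis",
]

# All values of the categorical feature, in a fixed order.
COLORS = sorted({row[3] for row in instances})


def one_hot(value, categories):
    """Encode a categorical value as a list of 0/1 flags."""
    return [1 if value == category else 0 for category in categories]


def to_numeric(row):
    """Turn numeric, binary and categorical features into plain numbers."""
    weight, wingspan, webbed, color = row
    return [weight, wingspan, int(webbed)] + one_hot(color, COLORS)


numeric_rows = [to_numeric(row) for row in instances]
```

Weight and wingspan are numeric, the webbed-feet flag is binary, and the back color is categorical, which is why it is expanded into one 0/1 column per color.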
Learning 
• Supervised Learning 
• Unsupervised Learning
Techniques 
Machine Learning
• Clustering
• Classification
• Association Rules
• Regression
• Information extraction
Classification 
• Predict a category for a given instance 
• Mostly supervised learning. 
• Algorithms 
– Naïve Bayes 
– Support Vector Machine 
– Decision Trees 
– Neural Networks
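As an illustration of supervised classification, here is a minimal Naïve Bayes sketch for routing short messages to a department. It is an assumption-laden toy, not the seminar's code: the function names, the whitespace tokenisation, the add-one smoothing and the four training messages are all made up for this example.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(documents):
    """documents: list of (text, category). Returns the counts the model needs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, category in documents:
        class_counts[category] += 1
        for word in text.lower().split():
            word_counts[category][word] += 1
            vocabulary.add(word)
    return class_counts, word_counts, vocabulary


def predict(model, text):
    """Return the category with the highest log-probability for the text."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_category, best_score = None, -math.inf
    for category, n_docs in class_counts.items():
        # Log prior plus log likelihoods with add-one (Laplace) smoothing.
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[category].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # ignore words never seen during training
            count = word_counts[category][word]
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_category, best_score = category, score
    return best_category


# Made-up labelled training data: (message, department).
training_mail = [
    ("invoice payment overdue", "finance"),
    ("please pay the invoice", "finance"),
    ("password reset not working", "support"),
    ("cannot login to my account", "support"),
]
model = train_naive_bayes(training_mail)
```

Calling `predict(model, "login password problem")` picks the department whose training messages make those words most likely.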
Classification: Use Cases 
• Incoming mail redirection 
• Sentiment analysis 
• Order picking optimization 
Clustering 
• Try to find groups (clusters) in unlabeled data
• Unsupervised learning
• Algorithms: K-Means
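The K-Means idea can be sketched in a few lines of plain Python. This is a simplified version under stated assumptions: the first `k` points serve as starting centroids (real implementations usually pick them randomly or with a smarter heuristic), the iteration count is fixed, and the `customers` data is invented.

```python
import math


def k_means(points, k, iterations=10):
    """Plain K-Means (Lloyd's algorithm) on a list of numeric tuples."""
    # Assumption for reproducibility: the first k points are the initial centroids.
    centroids = [points[i] for i in range(k)]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster became empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignment


# Made-up customers: (orders per year, average basket value).
customers = [(1, 2), (10, 90), (2, 1), (12, 88), (1, 1), (11, 92)]
centroids, assignment = k_means(customers, k=2)
```

No labels are passed in; the two groups emerge purely from the distances between the points.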
Clustering: Use cases 
• Customer profiling 
• Grouping of shopping items 
• Recommendation systems 
• Fraud detection 
Association Rule Learning 
• Find interesting relations 
• Find frequently occurring patterns
• Algorithms 
– Apriori 
– Singular Value Decomposition 
– FP-growth
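The sketch below shows the core idea behind Apriori in a deliberately reduced form: count single items first, then only count pairs built from items that are frequent on their own. It stops at pairs, and the helper names and shopping baskets are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return itemsets of size 1 and 2 whose support is at least min_support."""
    n = len(transactions)
    # Pass 1: count single items and keep the frequent ones.
    item_counts = Counter(item for t in transactions for item in set(t))
    frequent_items = {i for i, c in item_counts.items() if c / n >= min_support}
    # Pass 2 (the Apriori idea): only pair up items that are frequent themselves.
    pair_counts = Counter()
    for t in transactions:
        candidates = sorted(set(t) & frequent_items)
        pair_counts.update(combinations(candidates, 2))
    support = {(i,): c / n for i, c in item_counts.items() if i in frequent_items}
    support.update({p: c / n for p, c in pair_counts.items() if c / n >= min_support})
    return support


def confidence(support, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent (both single items)."""
    pair = tuple(sorted((antecedent, consequent)))
    return support[pair] / support[(antecedent,)]


# Made-up shopping baskets.
baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["bread", "jam"],
    ["milk", "butter"],
]
support = frequent_itemsets(baskets, min_support=0.5)
```

Support is the fraction of baskets containing an itemset; confidence is how often the consequent appears in baskets that contain the antecedent.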
Association Rule Learning: Use Cases 
• Recommendations 
• Data exploration 
• Find connections between seemingly
unrelated events
• Frequent pattern mining
Regression 
• Prediction of a quantity 
• Algorithms: 
– Linear regression 
– Logistic regression (predicts a probability; commonly used for classification)
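For a single input variable, linear regression reduces to the ordinary least-squares formulas for slope and intercept. A minimal sketch, assuming made-up weekly order quantities that happen to lie exactly on a line:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data: order quantity per week.
weeks = [1, 2, 3, 4, 5]
orders = [12, 14, 16, 18, 20]

slope, intercept = fit_line(weeks, orders)
forecast_week_6 = slope * 6 + intercept
```

Real data is noisy, so the fitted line would then minimise the squared error rather than pass through every point.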
Regression: Use Cases 
• Order Quantity Prediction 
• Lag analysis 
• Trend estimation
Information Extraction 
• Extract variables out of unstructured data 
like text. 
• Named Entity Extraction 
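A very rough way to picture "extracting variables from text" is with hand-written patterns. The sketch below is a rule-based stand-in only: real named entity extraction normally relies on trained models, and the two regular expressions, the function name and the sample message are assumptions for illustration.

```python
import re

# Simplified patterns; they are not complete validators.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
AMOUNT = re.compile(r"(\d+(?:\.\d+)?)\s*(EUR|USD)")


def extract_variables(text):
    """Pull structured fields (e-mail addresses, amounts) out of free text."""
    emails = EMAIL.findall(text)
    amounts = [(float(value), currency) for value, currency in AMOUNT.findall(text)]
    return {"emails": emails, "amounts": amounts}


# Made-up message.
message = "Please send the invoice for 249.50 EUR to info@infofarm.be before Friday."
result = extract_variables(message)
```

The output is a small record of typed fields that downstream algorithms can use as features.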
Tools
Apache Mahout
Pro
• Relatively stable
• Built on Hadoop, so it scales well
• Command-line access for most algorithms
• All important algorithms are available
Contra
• Poor documentation
• Mahout is currently migrating from Apache Hadoop to Apache Spark. Development is slow, and Apache Spark has already built a machine learning library of its own. Instant legacy?
• Kind of slow for smaller use cases
Weka
Pro
• A lot of algorithms are available
• Graphical user interface for prototyping and experimenting
• Available as a Java library
Contra
• Not ‘Big Data’ ready
• Requires a custom data format: ARFF files
• Optimized for academic use cases
Apache Spark: MLlib
Pro
• Based on Apache Spark: very fast and scalable
• Very fast development cycle; new features are rolling out every couple of months
• New and refreshing API, easy integration with other components of Apache Spark
Contra
• Based on Apache Spark: requires knowledge of Spark and Scala
• Relatively new, so a small choice of algorithms, but the essential ones are there
R
Pro
• A lot of algorithms are available
• Well documented
• Lots of existing packages that are easily available
Contra
• Can run on Hadoop/Spark, but requires a lot of knowledge of both platforms
• Must learn a new language
Noteworthy 
Java 
• DeepLearning4J 
• Mallet 
• MOA 
Python 
• NLTK 
• Theano 
• PyBrain 
• scikit-learn
Lua 
• Torch 
General 
• LibSVM 
• LibLinear
Integration with Software Development 
Development Cycle 
Collect → Analyze → Extract → Train → Test → Use
Feature extraction 
• Describe an instance to be 
used in an algorithm 
• Recognize hand-written digits 
by converting the images to 
lines of 1’s and 0’s 
00000000000001000000000000000000 
00000000000001110000000000000000 
00000000000011110000000000000000 
00000000001111100000000000000000 
00000000001111000000000000000000 
00000000000111100000000000000000 
00000000001111100000000000000000 
00000000011111000000000000000000 
00000000011110000000000000000000 
00000000111110000000000000000000 
00000000011111000000000000000000 
00000000111111000000000000000000 
00000000111110000000000000000000 
00000000111100000000000000000000 
00000000011110000000000000000000 
00000000111110000111000000000000 
00000001111111111111111100000000 
00000001111111111111111110000000 
00000001111111111111111110000000 
00000000111111111111111111100000 
00000001111111110000011111100000 
00000001111100000000000111100000 
00000000111100000000000111100000 
00000000011110000000000011110000 
00000000011111000000000011110000 
00000000011111100000001111110000 
00000000011111111111111111110000 
00000000011111111111111111100000 
00000000000111111111111111100000 
00000000000011111111111111100000 
00000000000000111111111000000000 
00000000000000001111110000000000 
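Turning such a bitmap into something an algorithm can consume usually means flattening it into one long feature vector. A minimal sketch, assuming the image is available as text lines of 0s and 1s; the function name is hypothetical and only the first three rows of the bitmap above are used to keep the example short:

```python
def bitmap_to_vector(lines):
    """Flatten rows of '0'/'1' characters into one list of ints."""
    return [int(ch) for line in lines for ch in line.strip()]


# First three rows of the 32-column digit bitmap shown above.
rows = [
    "00000000000001000000000000000000",
    "00000000000001110000000000000000",
    "00000000000011110000000000000000",
]
vector = bitmap_to_vector(rows)
```

Applied to all 32 rows, this yields a vector of 1024 binary features for one instance.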
Training an algorithm 
1. Collect your data as a collection of
instances
2. Split your data set into a training set
and a testing set
3. Train the algorithm with the training set
4. Validate the results using the test set
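The four steps can be sketched end to end as follows. The "algorithm" here is deliberately trivial (it always predicts the most common training label) so that the split-train-validate flow stays visible; the function names, the 25% test fraction, the fixed seed and the data are all assumptions.

```python
import random
from collections import Counter


def split(data, test_fraction=0.25, seed=42):
    """Step 2: shuffle, then cut the data into a training and a testing set."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def train(training_set):
    """Step 3: 'train' a baseline model that predicts the most common label."""
    label_counts = Counter(label for _, label in training_set)
    most_common_label = label_counts.most_common(1)[0][0]
    return lambda features: most_common_label


def validate(model, testing_set):
    """Step 4: fraction of test instances the model labels correctly."""
    hits = sum(model(features) == label for features, label in testing_set)
    return hits / len(testing_set)


# Step 1: made-up instances as (features, label) pairs.
weights = (3, 570, 1000.1, 3000.7, 3300, 4100, 800, 950)
data = [((w,), "heavy" if w > 1000 else "light") for w in weights]

training_set, testing_set = split(data)
model = train(training_set)
score = validate(model, testing_set)
```

Keeping the test instances out of training is what makes `score` an honest estimate of how the model behaves on unseen data.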
Runtime model 
• During training, most algorithms generate a
mathematical model that is used at runtime.
• The model should be retrained on a regular
basis as new data comes in
A / B Testing 
• Integrate gradually into the main system.
• If the machine is certain (enough), the
machine can take over
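One way to picture "certain enough" is a confidence threshold: the model acts on its own only above the threshold and otherwise hands the case to a person, with its prediction as a suggestion. The sketch below is hypothetical; the `route` function, the 0.9 threshold and the sample outputs are assumptions.

```python
def route(prediction, confidence, threshold=0.9):
    """Let the model act only when it is confident enough; otherwise defer."""
    if confidence >= threshold:
        return {"handled_by": "machine", "decision": prediction}
    return {"handled_by": "human", "suggestion": prediction}


# Made-up model outputs: (predicted category, confidence between 0 and 1).
outputs = [("finance", 0.97), ("support", 0.62), ("sales", 0.91)]

routed = [route(prediction, confidence) for prediction, confidence in outputs]
automation_rate = sum(r["handled_by"] == "machine" for r in routed) / len(routed)
```

Lowering the threshold step by step, while comparing machine decisions against human ones, is one way to hand over work gradually.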
Hands-on 
Demo 
• K-Nearest Neighbour Classifier 
• Clustering using Weka 
• Named-Entity Extraction 
• Classification of tweets
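The demo code itself is not part of these slides. As a rough idea of what a K-Nearest Neighbour classifier does, here is a minimal sketch with made-up two-dimensional points; the function name and the choice of Euclidean distance with majority voting are assumptions.

```python
import math
from collections import Counter


def knn_classify(query, instances, labels, k=3):
    """Label a query point by majority vote among its k nearest instances."""
    by_distance = sorted(
        range(len(instances)),
        key=lambda i: math.dist(query, instances[i]),
    )
    votes = Counter(labels[i] for i in by_distance[:k])
    return votes.most_common(1)[0][0]


# Made-up training data: two small groups of points.
instances = [(1.0, 1.1), (1.0, 1.0), (0.0, 0.0), (0.0, 0.1)]
labels = ["A", "A", "B", "B"]

near_origin = knn_classify((0.1, 0.2), instances, labels)
near_one = knn_classify((0.9, 0.9), instances, labels)
```

There is no separate training step: the stored instances are the model, and all the work happens at prediction time.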
What’s in it for you? 
Benefits of using machine learning 
• Automate repetitive tasks 
• Can be a solution for problems that are 
difficult to automate 
• Gain insights about your business 
• Optimize business decisions by taking the
model's predictions into account
Questions? 
Wrap-up 
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 

Machine learning

  • 1. Data Science Company Machine Learning in Practice An InfoFarm Seminar Veldkant 33A, Kontich ● info@infofarm.be ● www.infofarm.be
  • 2. Data Science • Big Data • Identifying, extracting and using data of all types and origins; exploring, correlating and using it in new and innovative ways in order to extract meaning and business value from it.
  • 3. 2 Data Scientists • 4 Big Data Consultants • 1 Infrastructure Specialist
  • 4. Java PHP E-Commerce Mobile Web Development
  • 6. Agenda • 13:00 What is Machine Learning? • 13:30 Techniques • 14:30 Tools • 15:00 Practical examples • 16:00 Wrap up
  • 7. What is Machine Learning?
  • 8. Magic?
  • 9. Machine Learning is a subfield of computer science and statistics that deals with systems that can learn from data instead of following explicitly programmed instructions.
  • 10. Machine Learning vs Data Science vs Big Data • You don’t need Big Data to leverage the benefits of machine learning, but more learning data makes a better machine • Data Science can help you to get the most out of Machine Learning • Machine Learning can help you to get the most out of Data Science
  • 11. Terminology
  • 12. Terminology
        Weight (g)   Wingspan (cm)   Webbed feet?   Back color   Species
        1000.1       125.0           No             Brown        Buteo jamaicensis
        3000.7       200.0           No             Gray         Sagittarius serpentarius
        3300.0       220.3           No             Gray         Sagittarius serpentarius
        4100.0       136.0           Yes            Black        Gavia immer
        3.0          11.0            No             Green        Calothorax lucifer
        570.0        75.0            No             Black        Campephilus principalis
        • Features / attributes • Instance / data point • Label / target variable • Factorial versus Numeric versus Binary data
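The terminology on this slide can be made concrete with a small Python sketch. The class name `Instance`, the field names and the helper `split_features_and_label` are illustrative assumptions, not part of the seminar material; the values are the six rows of the table, with the species names in their standard spelling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    """One row of the table: a single data point."""
    weight_g: float      # numeric feature
    wingspan_cm: float   # numeric feature
    webbed_feet: bool    # binary feature
    back_color: str      # factorial (categorical) feature
    species: str         # label / target variable


# The six instances from the slide's table.
INSTANCES = [
    Instance(1000.1, 125.0, False, "Brown", "Buteo jamaicensis"),
    Instance(3000.7, 200.0, False, "Gray", "Sagittarius serpentarius"),
    Instance(3300.0, 220.3, False, "Gray", "Sagittarius serpentarius"),
    Instance(4100.0, 136.0, True, "Black", "Gavia immer"),
    Instance(3.0, 11.0, False, "Green", "Calothorax lucifer"),
    Instance(570.0, 75.0, False, "Black", "Campephilus principalis"),
]


def split_features_and_label(instance):
    """Separate what an algorithm sees (features) from what it should predict (label)."""
    features = (instance.weight_g, instance.wingspan_cm,
                instance.webbed_feet, instance.back_color)
    return features, instance.species
```

In supervised learning the label column is known for the training data; in unsupervised learning only the feature columns are used.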
  • 13. Learning • Supervised Learning • Unsupervised Learning
  • 14. Techniques
  • 15. Machine Learning • Clustering • Classification • Association Rules • Regression • Information extraction
  • 16. Classification • Predict a category for a given instance • Mostly supervised learning • Algorithms – Naïve Bayes – Support Vector Machine – Decision Trees – Neural Networks
  • 17. Classification: Use Cases • Incoming mail redirection • Sentiment analysis • Order picking optimization
  • 18. Clustering • Try to find clusters in unstructured data • Unsupervised learning • Algorithms: K-Means
  • 19. Clustering: Use Cases • Customer profiling • Grouping of shopping items • Recommendation systems • Fraud detection
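As a rough illustration of the K-Means algorithm named on the clustering slide, here is a minimal pure-Python sketch. The customer data (orders per year, average basket value) is invented for the example, and the function names `nearest` and `k_means` are assumptions; a real project would more likely use one of the tools discussed later in the deck.

```python
import math
import random


def nearest(point, centroids):
    """Index of the centroid closest to the point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))


def k_means(points, k, iterations=20, seed=0):
    """Plain K-Means: alternate between assigning points and moving centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    for _ in range(iterations):
        # Assignment step: every point joins its nearest centroid.
        assignment = [nearest(p, centroids) for p in points]
        # Update step: every centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, [nearest(p, centroids) for p in points]


# Hypothetical customers: (orders per year, average basket value in euro).
customers = [(1, 10), (2, 12), (1, 11), (20, 95), (22, 100), (21, 98)]
centroids, clusters = k_means(customers, k=2)
print(clusters)  # the first three customers share one cluster id, the last three the other
```

No labels are involved: the algorithm only sees the feature values, which is what makes this unsupervised learning.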
  • 20. Association Rule Learning • Find interesting relations • Find frequently occurring patterns • Algorithms – Apriori – Singular Value Decomposition – FP-growth
  • 21. Association Rule Learning: Use Cases • Recommendations • Data exploration • Find connections between unrelated events • Frequent pattern mining
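To give an idea of what "frequently occurring patterns" means in practice, the sketch below counts frequent single items and frequent pairs in a handful of shopping baskets. It is a simplified version of the first two passes of Apriori, not the full algorithm; the baskets, the 0.4 support threshold and the function names are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return the support of frequent single items and of frequent item pairs."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in set(t))
    frequent_items = {i for i, c in item_counts.items() if c / n >= min_support}
    # Apriori idea: a pair can only be frequent if both of its items are frequent.
    pair_counts = Counter(
        pair
        for t in transactions
        for pair in combinations(sorted(set(t) & frequent_items), 2)
    )
    item_support = {i: item_counts[i] / n for i in frequent_items}
    pair_support = {p: c / n for p, c in pair_counts.items() if c / n >= min_support}
    return item_support, pair_support


def confidence(antecedent, consequent, item_support, pair_support):
    """Confidence of the rule 'antecedent -> consequent'."""
    pair = tuple(sorted((antecedent, consequent)))
    return pair_support[pair] / item_support[antecedent]


# Hypothetical shopping baskets.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"butter", "milk"},
    {"bread", "butter", "jam"},
]
item_support, pair_support = frequent_itemsets(baskets, min_support=0.4)
print(pair_support)
print(confidence("bread", "butter", item_support, pair_support))
```

Here bread and butter appear together in three of the five baskets, and three of the four baskets containing bread also contain butter, which is the kind of relation a recommendation feature could use.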
  • 22. Regression • Prediction of a quantity • Algorithms: – Linear regression – Logistic regression
  • 23. Regression: Use Cases • Order Quantity Prediction • Lag analysis • Trend estimation
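For the "order quantity prediction" use case, a simple linear regression can be fitted with ordinary least squares in a few lines. This is only a sketch: the weekly order numbers are invented, and `fit_line` and `predict` are hypothetical helper names.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Predicted quantity for a new x value."""
    return slope * x + intercept


# Hypothetical data: number of orders in weeks 1 to 5.
weeks = [1, 2, 3, 4, 5]
orders = [11, 15, 15, 19, 20]
slope, intercept = fit_line(weeks, orders)
print(slope, intercept)              # roughly 2.2 and 9.4
print(predict(slope, intercept, 6))  # roughly 22.6 orders expected in week 6
```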
  • 24. Information Extraction • Extract variables out of unstructured data like text. • Named Entity Extraction
  • 25. Tools
  • 27. Apache Mahout • Pro: Relatively stable; built on Hadoop, scales well; command-line access for most algorithms; all important algorithms are available • Contra: Poor documentation; Mahout is currently migrating from Apache Hadoop to Apache Spark, development is slow and Apache Spark has already built a machine learning library of its own (instant legacy?); kind of slow for smaller use cases
  • 29. Weka • Pro: A lot of algorithms are available; graphical user interface for prototyping and experimenting; available as a Java library • Contra: Not ‘Big Data’ ready; requires a custom data format (ARFF files); optimized for academic use cases
  • 31. Apache Spark: MLlib • Pro: Based on Apache Spark, so very fast and scalable; very fast development cycle, new features are rolling out every couple of months; new and refreshing API, easy integration with other components of Apache Spark • Contra: Based on Apache Spark, so it requires knowledge of Spark and Scala; relatively new, so a small choice of algorithms, but the essential ones are there
  • 33. R • Pro: A lot of algorithms are available; well documented; lots of existing packages that are easily available • Contra: Can run on Hadoop/Spark, but requires a lot of knowledge of both platforms; must learn a new language
  • 34. Noteworthy • Java: DeepLearning4J, Mallet, MOA • Python: NLTK, Theano, PyBrain, scikit-learn • Lua: Torch • General: LibSVM, LibLinear
  • 35. Integration with Software Development
  • 36. Development Cycle • Collect • Analyze • Extract • Train • Test • Use
  • 37. Feature extraction • Describe an instance to be used in an algorithm • Recognize hand-written digits by converting the images to lines of 1’s and 0’s
        00000000000001000000000000000000
        00000000000001110000000000000000
        00000000000011110000000000000000
        00000000001111100000000000000000
        00000000001111000000000000000000
        00000000000111100000000000000000
        00000000001111100000000000000000
        00000000011111000000000000000000
        00000000011110000000000000000000
        00000000111110000000000000000000
        00000000011111000000000000000000
        00000000111111000000000000000000
        00000000111110000000000000000000
        00000000111100000000000000000000
        00000000011110000000000000000000
        00000000111110000111000000000000
        00000001111111111111111100000000
        00000001111111111111111110000000
        00000001111111111111111110000000
        00000000111111111111111111100000
        00000001111111110000011111100000
        00000001111100000000000111100000
        00000000111100000000000111100000
        00000000011110000000000011110000
        00000000011111000000000011110000
        00000000011111100000001111110000
        00000000011111111111111111110000
        00000000011111111111111111100000
        00000000000111111111111111100000
        00000000000011111111111111100000
        00000000000000111111111000000000
        00000000000000001111110000000000
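One plausible way to turn such lines of 1's and 0's into something an algorithm can consume is to flatten them into a single numeric feature vector, so that the 32 by 32 bitmap on the slide would become 1024 binary features. The sketch below shows this on a much smaller, made-up 5 by 4 bitmap; `bitmap_to_features` is a hypothetical helper name.

```python
def bitmap_to_features(rows):
    """Flatten rows of '0'/'1' characters into one list of integer features."""
    width = len(rows[0])
    features = []
    for row in rows:
        if len(row) != width or set(row) - {"0", "1"}:
            raise ValueError("every row must have the same width and contain only 0s and 1s")
        features.extend(int(pixel) for pixel in row)
    return features


# A tiny hypothetical bitmap of the digit 1 (5 rows of 4 pixels).
tiny_one = [
    "0010",
    "0110",
    "0010",
    "0010",
    "0111",
]
vector = bitmap_to_features(tiny_one)
print(len(vector))  # 20 features, one per pixel
print(vector)
```

Every image then has the same number of features in the same order, which is what most of the classification algorithms listed earlier expect as input.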
  • 38. Training an algorithm 1. Collect your data as a collection of instances 2. Split your data set into a training set and a testing set 3. Train the algorithm with the training set 4. Validate the results using the test set
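The four steps on this slide can be sketched end to end as follows. The data set (a single weight feature with a "small" or "large" label) is invented, and the very simple nearest-mean classifier stands in for whichever algorithm is actually being trained; all function names are assumptions.

```python
import random


def train_test_split(instances, test_fraction=0.25, seed=42):
    """Step 2: shuffle the instances and hold a fraction back for testing."""
    shuffled = instances[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def train(training_set):
    """Step 3: the 'model' is simply the mean feature value per label."""
    sums, counts = {}, {}
    for value, label in training_set:
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(model, value):
    """Pick the label whose mean is closest to the value."""
    return min(model, key=lambda label: abs(value - model[label]))


def accuracy(model, test_set):
    """Step 4: fraction of test instances that are classified correctly."""
    correct = sum(predict(model, value) == label for value, label in test_set)
    return correct / len(test_set)


# Step 1: collect the data as (feature, label) instances (hypothetical weights).
data = [(w, "small") for w in range(1, 11)] + [(w, "large") for w in range(101, 111)]
training_set, test_set = train_test_split(data)
model = train(training_set)
print(accuracy(model, test_set))
```

Keeping the test set out of training is the point of step 2: the accuracy then says something about instances the model has never seen.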
  • 39. Runtime model • During training, most algorithms generate a mathematical runtime model. • The model should be updated on a regular basis.
  • 40. A / B Testing • Slow integration into the main system. • If the machine is certain (enough), the machine can take over.
  • 41. Hands-on
  • 42. Demo • K-Nearest Neighbour Classifier • Clustering using Weka • Named-Entity Extraction • Classification of tweets
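The demo code itself is not part of this transcript. As a stand-in, here is a minimal sketch of what a K-Nearest Neighbour classifier does, using the weight and wingspan columns of the bird table from the Terminology slide as training data. The query bird and the choice of k = 3 are assumptions, and the features are left unscaled for brevity, so weight dominates the distance.

```python
import math
from collections import Counter

# (weight in g, wingspan in cm) -> species, taken from the Terminology table.
TRAINING = [
    ((1000.1, 125.0), "Buteo jamaicensis"),
    ((3000.7, 200.0), "Sagittarius serpentarius"),
    ((3300.0, 220.3), "Sagittarius serpentarius"),
    ((4100.0, 136.0), "Gavia immer"),
    ((3.0, 11.0), "Calothorax lucifer"),
    ((570.0, 75.0), "Campephilus principalis"),
]


def classify(features, training, k=3):
    """Majority vote among the k training instances closest to the query."""
    neighbours = sorted(training, key=lambda item: math.dist(features, item[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# A hypothetical unknown bird weighing 3100 g with a 210 cm wingspan.
print(classify((3100.0, 210.0), TRAINING))  # Sagittarius serpentarius
```

There is no separate training step here: the "model" is the training data itself, and all the work happens at prediction time.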
  • 43. What’s in it for you?
  • 44. Benefits of using machine learning • Automate repetitive tasks • Can be a solution for problems that are difficult to automate • Gain insights about your business • Optimize business decisions by using the opinion of the computer
  • 45. Questions?
  • 46. Wrap-up