Industrial Machine Learning

Grigorios Tsoumakas,
School of Informatics,
Aristotle University of Thessaloniki

OUTLINE
What is Machine Learning?
Industrial Applications of Machine Learning

DEFINITIONS OF ML
Machine learning is the subfield of computer science that gives
computers the ability to learn without being explicitly programmed
Arthur Samuel, 1959
A computer program is said to learn from experience 𝐸 with respect to
some class of tasks 𝑇 and performance measure 𝑃 if its performance at
tasks in 𝑇, as measured by 𝑃, improves with experience 𝐸
Tom Mitchell, 1997

MAIN TASKS
Supervised Learning
 Input variables 𝒙
 Output variable 𝑦
 Mapping function 𝑦 = 𝑓(𝒙)
Unsupervised Learning
 Input variables 𝒙
 No output variable; learn the structure of the data
Reinforcement Learning
 Agent acting in an environment so as to maximize cumulative reward
http://www.isaziconsulting.co.za/machinelearning.html
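As a rough illustration of the supervised case, the sketch below learns a mapping 𝑦 = 𝑓(𝒙) from labelled examples. The toy data, the choice of a straight line for 𝑓, and the names fit_line and predict are all assumptions made for this example, not something the slides prescribe.

```python
# Minimal sketch of supervised learning with a single input variable.
# Assumption: f is a straight line, fitted by ordinary least squares.

def fit_line(xs, ys):
    """Return (a, b) minimising the squared error of y = a * x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    a = cov_xy / var_x
    b = mean_y - a * mean_x
    return a, b

def predict(model, x):
    """Apply the learned mapping to a new input."""
    a, b = model
    return a * x + b

# Toy training set (hypothetical): outputs follow y = 2x + 1.
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [3.0, 5.0, 7.0, 9.0]

model = fit_line(train_x, train_y)
print(predict(model, 5.0))
```

Unsupervised learning would receive only train_x, and reinforcement learning would replace the fixed training set with rewards observed while acting.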

OTHER TASKS
Association Rules
 Items X => Items Z
Anomaly Detection
 Identify unusual data points
Recommender Systems
 Predict the rating that a user would give to an item
…
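A rule of the form Items X => Items Z is usually scored by its support and confidence. The sketch below computes both on a handful of invented shopping baskets; the data and the function names are assumptions for illustration only.

```python
# Minimal sketch: support and confidence of an association rule X => Z.
# The baskets are hypothetical toy data.

def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)

def confidence(transactions, x, z):
    """Estimate of P(Z in basket | X in basket)."""
    return support(transactions, set(x) | set(z)) / support(transactions, x)

baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

print(support(baskets, {"bread", "milk"}))
print(confidence(baskets, {"bread"}, {"milk"}))
```

Here bread appears in three of the four baskets and bread with milk in two, so the rule bread => milk holds in two out of three baskets that contain bread.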

ALGORITHMS / APPROACHES / TRIBES
Discriminative vs Generative
 𝑝(𝑦|𝑥) vs 𝑝(𝑦, 𝑥)
Lazy vs Eager
 Lazy: no learning until a test instance arrives
Parametric vs Non-Parametric
 Representations (don’t) grow with more training data
The 5 Tribes of ML
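The lazy/eager and parametric/non-parametric distinctions can be seen side by side in a small sketch. The two classifiers below are simplified stand-ins chosen for this example (a 1-nearest-neighbour rule and a nearest-centroid rule on one-dimensional inputs); the class names and data are assumptions.

```python
# Minimal sketch contrasting a lazy, non-parametric learner with an
# eager, parametric one. Inputs are single numbers for simplicity.

class NearestNeighbour:
    """Lazy and non-parametric: fit() only stores the training data."""

    def fit(self, xs, ys):
        self.xs, self.ys = list(xs), list(ys)
        return self

    def predict(self, x):
        # All the work happens when a test instance arrives.
        i = min(range(len(self.xs)), key=lambda j: abs(self.xs[j] - x))
        return self.ys[i]


class NearestCentroid:
    """Eager and parametric: fit() reduces the data to one mean per class."""

    def fit(self, xs, ys):
        sums, counts = {}, {}
        for x, y in zip(xs, ys):
            sums[y] = sums.get(y, 0.0) + x
            counts[y] = counts.get(y, 0) + 1
        self.centroids = {y: sums[y] / counts[y] for y in sums}
        return self

    def predict(self, x):
        return min(self.centroids, key=lambda y: abs(self.centroids[y] - x))


xs = [1.0, 2.0, 3.0, 8.0, 9.0]
ys = ["low", "low", "low", "high", "high"]

lazy = NearestNeighbour().fit(xs, ys)
eager = NearestCentroid().fit(xs, ys)
print(lazy.predict(4.0), eager.predict(4.0))
```

The stored representation of the first model grows with every training example, while the second keeps one number per class regardless of how much data it sees.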

SL: LINEAR MODELS, SVMS, TREES AND NNS

“MORE DATA BEATS A CLEVERER ALGORITHM”
Pedro Domingos. 2012. A few useful things to know about machine learning. Commun. ACM 55
The Economist. Facebook post, May 5th, 2017
“Those who gather the most data will dominate the digital landscapes of the future”

“LEARN MANY MODELS, NOT JUST ONE”
Anthony Goldbloom. Kaggle CEO. Oct 2015.
“As long as Kaggle has been around, it
has almost always been ensembles of
decision trees that have won
competitions. It used to be random forest
that was the big winner, but over the last
six months a new algorithm called
XGBoost has cropped up, and it’s winning
practically every competition in the
structured data category.”
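The idea of learning many models can be sketched with the simplest possible combiner, a majority vote. This is not random forest or XGBoost; the three threshold rules below are hypothetical weak models invented for the example.

```python
# Minimal sketch of an ensemble: several weak models combined by majority vote.
from collections import Counter

def majority_vote(models, x):
    """Return the label predicted by most of the models for input x."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]

# Hypothetical weak models: each one thresholds the input at a different value.
models = [
    lambda x: int(x > 2),
    lambda x: int(x > 5),
    lambda x: int(x > 8),
]

print(majority_vote(models, 6))
print(majority_vote(models, 4))
```

Tree ensembles follow the same pattern at a larger scale: random forests average many trees trained on resampled data, and boosting methods add trees one after another to correct earlier errors.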


DIMENSIONALITY REDUCTION: PCA, SVD

LANGUAGES, LIBRARIES, TOOLS & APIS

METHODOLOGIES
http://www.kdnuggets.com/2014/10/crisp-dm-top-methodology-analytics-data-mining-data-science-projects.html

“FEATURE ENGINEERING IS THE KEY”
“Data scientists spend 50-80% of their time in data collection and preparation”
https://www.nytimes.com/2014/08/18/technology/for-big-data-scientists-hurdle-to-insights-is-janitor-work.html

OUTLINE
What is Machine Learning?
Industrial Applications of Machine Learning

WHAT HAS CHANGED?
Faster distributed systems
The explosion in computing power has allowed us to use machine learning to tackle ever more complex problems
Exponential data growth
The explosion of data being captured and stored has allowed us to apply machine learning to an ever-expanding range of domains
The amount of collected data is doubling every 12 months and will reach 44 zettabytes by 2020

NATURAL GAS LOAD FORECASTING
Collaboration with Gas Supply Company of Thessaloniki & Thessaly
The problem
 Daily statements of day-ahead demand must be submitted to the regulatory entity
 Actual consumption must lie within a percentage of the statement (e.g. 10%), otherwise financial penalties are imposed
Similar framework in the electricity domain
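The tolerance rule above amounts to a simple check on the relative deviation between the submitted statement and the actual consumption. The sketch below assumes a symmetric band around the statement and uses invented figures; the function name, the units, and the default of 10% are illustrative assumptions, not the regulator's actual rule.

```python
# Minimal sketch of the tolerance check, assuming a symmetric band
# expressed as a fraction of the submitted statement.

def within_tolerance(statement, actual, tolerance=0.10):
    """True if actual consumption deviates from the statement by at most
    `tolerance` (a fraction of the statement)."""
    return abs(actual - statement) <= tolerance * statement

# Hypothetical day-ahead statement of 1000 units.
print(within_tolerance(1000.0, 1080.0))  # deviation of 8%
print(within_tolerance(1000.0, 1120.0))  # deviation of 12%
```

A forecasting model for this task would be judged less by its average error than by how often its day-ahead predictions fall outside this band.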

UNDERSTANDING ACADEMIC PUBLICATIONS
Collaboration with Atypon Inc.
 Online content hosting and management software
 Atypon is home to more than one-third of the world’s English-language professional and scholarly journals; clients include Elsevier, IEEE, MIT Press, Oxford University Press, Taylor & Francis, …
Some of the things we do
 Automated semantic indexing of articles and figures
 Information extraction (e.g. funding information)
 Question answering

UNDERSTANDING ACADEMIC PUBLICATIONS
PubMed Central
PubMed
 10,876,004 abstracts (18 GB)
 26,563 MeSH terms, ~13 per abstract on avg.
[Chart: values per year, 1950 to 2013; vertical axis from 0 to 1,200,000; annotated "x $10"]

INDUSTRY – ACADEMIA PARTNERSHIPS
Industry funded research & development
 Staff, senior researchers, and PhD students
Pro bono exploratory work
 MSc theses
National and EU funding

THE END… OR THE BEGINNING?