© 2018 IBM Corporation
Eclipse Day
Machine Learning for Java Developers
Nasser Ebrahim (enasser@in.ibm.com)
Software Architect, IBM Software Lab
Agenda
• Introduction
• Algorithms
• Frameworks in Java
• Demo with Jupyter Notebook & Weka
• Demo with Eclipse & DL4J
• Q & A
Machine Learning
Machine learning is a field of computer science that gives computers the ability to learn without being explicitly programmed.
• An application of artificial intelligence
• Ability to learn and improve automatically from experience without being explicitly programmed
• Development of computer programs that can access data and use it to learn for themselves
Machine Learning – Why now?
• Availability of data
• Falling storage costs
• Growing computational power
Machine Learning – Applications
Machine Learning – Workflow
Machine Learning – Features & Label
Features are the relevant, independent variables in the data (X)
The label is the dependent variable that we need to predict (y)
Types of Machine Learning
Supervised Learning: labeled training data
• Classification
• Regression
Unsupervised Learning: unlabeled training data
• Clustering
• Association
Reinforcement Learning
Linear Regression
In statistics, linear regression is a linear approach to modelling the relationship between a scalar dependent variable y and one or more explanatory (independent) variables, denoted X.
y = b0 + b1*X
With a single explanatory variable, b0 is the intercept and b1 is the slope.
The best-fit line is the one that describes the dataset with the smallest sum of squared errors.
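The following is a minimal, dependency-free sketch of fitting y = b0 + b1*X by ordinary least squares in plain Java. The class name `SimpleLinearRegression` and the sample data are illustrative assumptions and do not come from the talk. Libraries such as Weka ship their own, more complete implementations.

```java
// Simple linear regression (one explanatory variable) fitted by ordinary least squares.
// Illustrative sketch; the class name and data are hypothetical.
class SimpleLinearRegression {
    final double b0; // intercept
    final double b1; // slope

    SimpleLinearRegression(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need at least two (x, y) pairs of equal length");
        }
        int n = x.length;
        double meanX = 0.0, meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        // b1 = sum((x - meanX) * (y - meanY)) / sum((x - meanX)^2) minimizes the squared error
        double sxy = 0.0, sxx = 0.0;
        for (int i = 0; i < n; i++) {
            sxy += (x[i] - meanX) * (y[i] - meanY);
            sxx += (x[i] - meanX) * (x[i] - meanX);
        }
        if (sxx == 0.0) {
            throw new IllegalArgumentException("all x values are equal; slope is undefined");
        }
        b1 = sxy / sxx;
        // The least-squares line always passes through (meanX, meanY)
        b0 = meanY - b1 * meanX;
    }

    double predict(double x) {
        return b0 + b1 * x;
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3, 4, 5};
        double[] y = {3, 5, 7, 9, 11}; // lies exactly on y = 1 + 2x
        SimpleLinearRegression model = new SimpleLinearRegression(x, y);
        System.out.printf("b0=%.2f b1=%.2f%n", model.b0, model.b1); // b0=1.00 b1=2.00
        System.out.println(model.predict(6)); // 13.0
    }
}
```

The closed-form solution is used here because it makes the "minimize squared error" idea explicit. With many explanatory variables, a matrix-based or iterative solver is the usual choice.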
Logistic Regression
Used when the dependent variable is binary (for example, yes/no or 0/1); the model predicts the probability of the positive class
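Here is a small sketch of logistic regression with one feature, trained by batch gradient descent on the average log-loss. The sigmoid maps a linear score to a probability, and a 0.5 threshold turns that probability into a class. The learning rate, epoch count and "hours studied" data are hypothetical choices made for illustration, not values from the talk.

```java
// Logistic regression with a single feature, trained by batch gradient descent.
// Teaching sketch; hyperparameters and data are hypothetical.
class LogisticRegression {
    double w0 = 0.0; // bias
    double w1 = 0.0; // weight of the single feature

    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // Estimated probability that the label is 1 for input x
    double probability(double x) {
        return sigmoid(w0 + w1 * x);
    }

    // Labels must be 0 or 1
    void fit(double[] x, int[] y, double learningRate, int epochs) {
        int n = x.length;
        for (int epoch = 0; epoch < epochs; epoch++) {
            double g0 = 0.0, g1 = 0.0;
            for (int i = 0; i < n; i++) {
                // (prediction - label) is the gradient of the log-loss w.r.t. the linear score
                double error = probability(x[i]) - y[i];
                g0 += error;
                g1 += error * x[i];
            }
            w0 -= learningRate * g0 / n;
            w1 -= learningRate * g1 / n;
        }
    }

    int predict(double x) {
        return probability(x) >= 0.5 ? 1 : 0;
    }

    public static void main(String[] args) {
        // Hypothetical data: hours studied -> failed (0) or passed (1)
        double[] hours = {0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 5.0};
        int[] passed = {0, 0, 0, 0, 1, 1, 1, 1};
        LogisticRegression model = new LogisticRegression();
        model.fit(hours, passed, 0.5, 5000);
        System.out.printf("P(pass | 0.5h) = %.3f%n", model.probability(0.5));
        System.out.printf("P(pass | 5.0h) = %.3f%n", model.probability(5.0));
    }
}
```

Despite its name, logistic regression is used here as a classifier, which matches its role on the slide.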
K Nearest Neighbors (Classification)
• Classifies a point by majority vote among its K nearest training points, typically measured by Euclidean distance
• K should be greater than the number of classes
• Choose an odd K in two-class problems so that the vote cannot tie
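A minimal sketch of K-nearest-neighbors classification with Euclidean distance and a majority vote. The class name, training points, labels and K = 3 are illustrative assumptions. In this sketch ties are broken arbitrarily, which is why an odd K is preferable for two classes.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

// K-nearest-neighbors classifier (brute force). Illustrative sketch with hypothetical data.
class KNearestNeighbors {
    private final double[][] points;
    private final String[] labels;
    private final int k;

    KNearestNeighbors(double[][] points, String[] labels, int k) {
        if (k < 1 || k > points.length) throw new IllegalArgumentException("k must be in [1, n]");
        this.points = points;
        this.labels = labels;
        this.k = k;
    }

    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    String classify(double[] query) {
        // Order training indices by their distance to the query point
        Integer[] idx = new Integer[points.length];
        for (int i = 0; i < idx.length; i++) idx[i] = i;
        Arrays.sort(idx, Comparator.comparingDouble(i -> euclidean(points[i], query)));
        // Majority vote among the k nearest neighbors
        Map<String, Integer> votes = new HashMap<>();
        for (int i = 0; i < k; i++) votes.merge(labels[idx[i]], 1, Integer::sum);
        return votes.entrySet().stream().max(Map.Entry.comparingByValue()).get().getKey();
    }

    public static void main(String[] args) {
        double[][] points = {{1, 1}, {1, 2}, {2, 1}, {5, 5}, {5, 6}, {6, 5}};
        String[] labels = {"A", "A", "A", "B", "B", "B"};
        KNearestNeighbors knn = new KNearestNeighbors(points, labels, 3);
        System.out.println(knn.classify(new double[]{1.5, 1.5})); // A
        System.out.println(knn.classify(new double[]{5.5, 5.5})); // B
    }
}
```

Sorting all points is O(n log n) per query. This is fine for a demo, but larger datasets usually call for spatial indexes such as k-d trees.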
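K-Means Clustering
• Scales well to very large datasets
• Start by picking k, the number of clusters
• Choose k random points as the initial centroids
• Populate the clusters by assigning each point to its nearest centroid
• Move each centroid to the mean of its cluster and repeat until the assignments stop changing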
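The steps above can be sketched in plain Java as follows. The sketch picks k distinct random points as centroids, assigns each point to its nearest centroid, and then moves each centroid to the mean of its cluster until nothing changes. The method signature, the fixed random seed and the sample points are hypothetical. An empty cluster simply keeps its previous centroid, which is one of several possible policies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// K-means clustering (Lloyd's iterations). Illustrative sketch with hypothetical data.
class KMeans {
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Returns the cluster index (0..k-1) assigned to each point
    static int[] cluster(double[][] points, int k, long seed, int maxIterations) {
        int n = points.length;
        int dim = points[0].length;
        if (k < 1 || k > n) throw new IllegalArgumentException("k must be in [1, n]");

        // Choose k distinct random points as the initial centroids
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < n; i++) indices.add(i);
        Collections.shuffle(indices, new Random(seed));
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) centroids[c] = points[indices.get(c)].clone();

        int[] assignment = new int[n];
        Arrays.fill(assignment, -1);
        for (int iter = 0; iter < maxIterations; iter++) {
            // Assignment step: each point joins its nearest centroid
            boolean changed = false;
            for (int p = 0; p < n; p++) {
                int best = 0;
                for (int c = 1; c < k; c++) {
                    if (squaredDistance(points[p], centroids[c]) < squaredDistance(points[p], centroids[best])) {
                        best = c;
                    }
                }
                if (assignment[p] != best) {
                    assignment[p] = best;
                    changed = true;
                }
            }
            if (!changed) break; // converged: no point switched clusters

            // Update step: move each centroid to the mean of its assigned points
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int p = 0; p < n; p++) {
                counts[assignment[p]]++;
                for (int j = 0; j < dim; j++) sums[assignment[p]][j] += points[p][j];
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) continue; // empty cluster keeps its old centroid
                for (int j = 0; j < dim; j++) centroids[c][j] = sums[c][j] / counts[c];
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        double[][] points = {{1, 1}, {1.5, 2}, {2, 1}, {8, 8}, {8.5, 9}, {9, 8}};
        System.out.println(Arrays.toString(cluster(points, 2, 42L, 100)));
    }
}
```

For these two well-separated groups, the first three points end up sharing one cluster index and the last three share the other. Which group gets label 0 depends on the random initialization.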
K-Means Clustering
Apriori Algorithm - Illustration
• Uses a level-wise search, where frequent k-itemsets are used to explore (k+1)-itemsets
• Candidate generation: frequent itemsets are extended one item at a time
• Finds frequent itemsets, which are then used to derive association rules that highlight general trends in the database
• Provides insight into which products tend to be purchased together and which are most amenable to promotion
• Trivial pattern
  • People who buy chalk pieces also buy dusters
• Inexplicable pattern
  • People who buy a mobile phone also buy a bag
Apriori Algorithm - Example
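Min Support = 2

Transactions
TID   Items
100   1 3 4
200   2 3 5
300   1 2 3 5
400   2 5
500   1 3 5

CL1 (candidate 1-itemsets)
Itemset   Support
{1}       3
{2}       3
{3}       4
{4}       1
{5}       4

FL1 (frequent 1-itemsets, support ≥ 2; {4} is dropped)
Itemset   Support
{1}       3
{2}       3
{3}       4
{5}       4

CL2 (candidate 2-itemsets built from FL1)
Itemset   Support
{1, 2}    1
{1, 3}    3
{1, 5}    2
{2, 3}    2
{2, 5}    3
{3, 5}    3

FL2 (frequent 2-itemsets, support ≥ 2; {1, 2} is dropped)
Itemset   Support
{1, 3}    3
{1, 5}    2
{2, 3}    2
{2, 5}    3
{3, 5}    3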
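The level-wise search in this example can be sketched in plain Java. Each pass counts the support of the candidates (CLk) and keeps those meeting the minimum support (FLk). It then joins frequent k-itemsets that share their first k-1 items into (k+1)-candidates and prunes any candidate with an infrequent subset. The class and method names are hypothetical, and the sketch scans every transaction for every candidate, which is fine for a demo but not for large databases.

```java
import java.util.*;

// Apriori frequent-itemset mining. Illustrative sketch; names are hypothetical.
class Apriori {
    // Returns every frequent itemset (sorted item list) mapped to its support count
    static Map<List<Integer>, Integer> frequentItemsets(List<Set<Integer>> transactions, int minSupport) {
        Map<List<Integer>, Integer> result = new LinkedHashMap<>();
        // CL1: every distinct item, in ascending order
        SortedSet<Integer> items = new TreeSet<>();
        for (Set<Integer> t : transactions) items.addAll(t);
        List<List<Integer>> candidates = new ArrayList<>();
        for (int item : items) candidates.add(List.of(item));

        while (!candidates.isEmpty()) {
            // Count support of each candidate; keep the frequent ones (FLk)
            List<List<Integer>> frequent = new ArrayList<>();
            for (List<Integer> c : candidates) {
                int support = 0;
                for (Set<Integer> t : transactions) if (t.containsAll(c)) support++;
                if (support >= minSupport) {
                    frequent.add(c);
                    result.put(c, support);
                }
            }
            candidates = generateCandidates(frequent);
        }
        return result;
    }

    // Join k-itemsets that share their first k-1 items, then prune by subset frequency
    static List<List<Integer>> generateCandidates(List<List<Integer>> frequent) {
        Set<List<Integer>> frequentSet = new HashSet<>(frequent);
        List<List<Integer>> next = new ArrayList<>();
        for (int i = 0; i < frequent.size(); i++) {
            for (int j = i + 1; j < frequent.size(); j++) {
                List<Integer> a = frequent.get(i), b = frequent.get(j);
                int k = a.size();
                if (!a.subList(0, k - 1).equals(b.subList(0, k - 1))) continue;
                List<Integer> candidate = new ArrayList<>(a);
                candidate.add(b.get(k - 1));
                Collections.sort(candidate);
                if (allSubsetsFrequent(candidate, frequentSet)) next.add(candidate);
            }
        }
        return next;
    }

    // Apriori property: every k-subset of a frequent (k+1)-itemset must be frequent
    static boolean allSubsetsFrequent(List<Integer> candidate, Set<List<Integer>> frequentSet) {
        for (int skip = 0; skip < candidate.size(); skip++) {
            List<Integer> subset = new ArrayList<>(candidate);
            subset.remove(skip); // remove by index
            if (!frequentSet.contains(subset)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        List<Set<Integer>> db = List.of(
                Set.of(1, 3, 4),     // TID 100
                Set.of(2, 3, 5),     // TID 200
                Set.of(1, 2, 3, 5),  // TID 300
                Set.of(2, 5),        // TID 400
                Set.of(1, 3, 5));    // TID 500
        frequentItemsets(db, 2).forEach((itemset, support) -> System.out.println(itemset + " " + support));
    }
}
```

On the five transactions above with a minimum support of 2, the sketch reproduces FL1 and FL2. It then continues one level beyond the slide, finding {1, 3, 5} and {2, 3, 5}, each with support 2.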
Java ML Libraries & Frameworks
• ADAMS
• ELKI
• Java-ML
• JSAT
• Encog
Waikato Environment for Knowledge Analysis (Weka)
• Machine learning/data mining software written in Java (distributed under the GNU General Public License)
• Developed at the University of Waikato, New Zealand
• Comprehensive set of data pre-processing tools, learning algorithms
and evaluation methods
• Graphical user interfaces (incl. data visualization)
• Environment for comparing learning algorithms
Deeplearning4j - DL4J
Jupyter Notebook
• An open-source web application that allows you to create and
share documents that contain live code, equations, visualizations
and narrative text.
• Install Jupyter using Anaconda
  • http://jupyter.org/install
• Different kernels for working with different languages
  • Python, R, Scala, Java, Spark
Jupyter Notebook with Java Kernel
• IJava: a Jupyter kernel for executing Java code
  • https://github.com/SpencerPark/IJava
• Install IJava using the archive from https://github.com/SpencerPark/IJava/releases/download/v1.1.2/ijava-1.1.2.zip
• The kernel executes code via JShell, the tool introduced in Java 9
Eclipse with Maven (M2Eclipse)
• Launching Maven builds from within Eclipse
• Dependency management for the Eclipse build path
• Resolving Maven dependencies from the Eclipse workspace
• Wizards for creating new Maven projects, pom.xml, etc.
Install Maven integration in Eclipse
• Eclipse -> Help -> Install New Software
• Enter http://download.eclipse.org/technology/m2e/releases/ as the update site
• Select “Maven Integration for Eclipse”
• Click Next, accept the license agreement and click Finish
Q & A
