Machine Learning for Java Developers
Nirmal Fernando
WSO2 Inc.
{Java Colombo}
A few things about me...
● Associate Technical Lead at WSO2
● Team Lead of WSO2 Machine Learner
● Just completed my 4th year in the industry
● Graduated from the Department of Computer Science, University of Moratuwa.
● Schooled at St. Sebastian’s College, Moratuwa.
● Can sing a bit :-)
https://goo.gl/qbAXLz
Predictive Analytics
Extract information from existing datasets to determine patterns and predict future outcomes and trends.
It does not tell you what will happen in the future; it forecasts what might happen, with an acceptable level of reliability.
source: http://insidebigdata.com/2014/08/25/salespredict-marketo-partner-using-predictive-analytics/
Predictive Analytics
The Forrester Research report “Big Data Predictive Analytics” was the second most-read Forrester report in Q3 2015.
https://www.forrester.com
Predictive Analytics - Use cases
http://californialoanfind.com/what-and-who-is-teletrack/
Predictive Analytics - Use cases
http://www.chrisdunn.com/
Machine Learning
Field of study that gives computers the ability to learn without being explicitly programmed.
- Arthur Samuel (1959)
Machine Learning - Pipeline
Machine Learning - Terminology
● Input data must be in tabular format
● Each row is called a data point
● Each column (other than the one being predicted) is called a feature
● The value you are going to predict is called the “response variable”
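To make the terminology concrete, here is a minimal Java sketch of how one row of a comma-separated dataset could be split into its features and its response variable. It assumes the response variable is the last column; the class, record and method names (DataPointExample, DataPoint, parseRow) and the sample row are hypothetical, chosen only for illustration.

```java
import java.util.Arrays;

/**
 * Sketch: one row of a tabular dataset (a "data point") split into feature
 * values and the response variable. Assumes comma-separated rows whose LAST
 * column holds the response variable.
 */
public class DataPointExample {

    /** A single data point: the feature columns plus the response variable. */
    record DataPoint(double[] features, double response) {}

    /** Parses one CSV row; every column except the last is treated as a feature. */
    static DataPoint parseRow(String csvRow) {
        String[] columns = csvRow.split(",");
        if (columns.length < 2) {
            throw new IllegalArgumentException("expected at least one feature and a response");
        }
        double[] features = new double[columns.length - 1];
        for (int i = 0; i < features.length; i++) {
            features[i] = Double.parseDouble(columns[i].trim());
        }
        double response = Double.parseDouble(columns[columns.length - 1].trim());
        return new DataPoint(features, response);
    }

    public static void main(String[] args) {
        // Hypothetical row: 8 feature values followed by a 0/1 response.
        DataPoint point = parseRow("6,148,72,35,0,33.6,0.627,50,1");
        System.out.println("features = " + Arrays.toString(point.features()));
        System.out.println("response = " + point.response());
    }
}
```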
Machine Learning - What type of problem?
● Next value prediction
● Classification
● Clustering
● Recommendations
etc…
Next value prediction
Example of linear regression on one independent variable
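Linear regression on one independent variable can be fitted in closed form with ordinary least squares: the slope is the covariance of x and y divided by the variance of x, and the intercept makes the line pass through the point (mean of x, mean of y). The sketch below is a plain-Java illustration of that idea, not code from any of the libraries discussed later; the class and method names are hypothetical.

```java
/**
 * Sketch of "next value prediction" with linear regression on one
 * independent variable, fitted by ordinary least squares (closed form).
 */
public class SimpleLinearRegression {

    private final double slope;
    private final double intercept;

    SimpleLinearRegression(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need at least two (x, y) pairs of equal length");
        }
        double meanX = 0, meanY = 0;
        for (int i = 0; i < x.length; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= x.length;
        meanY /= y.length;

        double covXY = 0, varX = 0;
        for (int i = 0; i < x.length; i++) {
            covXY += (x[i] - meanX) * (y[i] - meanY);
            varX += (x[i] - meanX) * (x[i] - meanX);
        }
        if (varX == 0) {
            throw new IllegalArgumentException("x values must not all be equal");
        }
        slope = covXY / varX;              // least-squares slope
        intercept = meanY - slope * meanX; // line passes through (meanX, meanY)
    }

    double predict(double x) { return intercept + slope * x; }
    double slope() { return slope; }
    double intercept() { return intercept; }

    public static void main(String[] args) {
        double[] x = {1, 2, 3, 4};
        double[] y = {3, 5, 7, 9}; // lies exactly on y = 2x + 1
        SimpleLinearRegression model = new SimpleLinearRegression(x, y);
        System.out.println(model.predict(5)); // prints 11.0
    }
}
```

Because the sample points lie exactly on y = 2x + 1, the fitted slope is 2, the intercept is 1, and the predicted next value at x = 5 is 11.0.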
Classification
Predicting a discrete value
Clustering
Grouping similar data points together.
Recommendations
Predicting the preference (for example, a rating) a user would give to an item/product.
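One simple way to predict such a preference is user-based collaborative filtering: estimate a user's rating for an item as the average of other users' ratings for it, weighted by how similar those users are. The sketch below assumes a small in-memory rating matrix where 0 means "not rated" and uses cosine similarity over co-rated items; all names are hypothetical. Note that this is only an illustration of the idea; Spark MLlib's own recommendation support is based on a different technique, alternating least squares (ALS).

```java
/**
 * Sketch of user-based collaborative filtering: a user's rating for an item
 * is predicted as a similarity-weighted average of other users' ratings.
 * Assumption: ratings[user][item] == 0 means "not rated".
 */
public class UserBasedRecommender {

    /** Cosine similarity between two users, computed over the items both have rated. */
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            if (a[i] > 0 && b[i] > 0) {
                dot += a[i] * b[i];
                normA += a[i] * a[i];
                normB += b[i] * b[i];
            }
        }
        return (normA == 0 || normB == 0) ? 0 : dot / Math.sqrt(normA * normB);
    }

    /** Predicts ratings[user][item]; returns NaN if no similar user has rated the item. */
    static double predict(double[][] ratings, int user, int item) {
        double weightedSum = 0, similaritySum = 0;
        for (int other = 0; other < ratings.length; other++) {
            if (other == user || ratings[other][item] == 0) {
                continue; // skip the user themself and users who did not rate the item
            }
            double similarity = cosine(ratings[user], ratings[other]);
            weightedSum += similarity * ratings[other][item];
            similaritySum += similarity;
        }
        return similaritySum == 0 ? Double.NaN : weightedSum / similaritySum;
    }

    public static void main(String[] args) {
        double[][] ratings = {
            {5, 4, 0}, // user 0 has not rated item 2
            {5, 4, 2},
            {1, 5, 5},
        };
        System.out.println(predict(ratings, 0, 2));
    }
}
```

User 0 rates items 0 and 1 exactly like user 1, so user 1's low rating for item 2 carries more weight than user 2's high one, and the prediction lands between the two.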
Machine Learning - Which algorithm category?
● Supervised learning
● Unsupervised learning
● Reinforcement learning
Supervised vs Unsupervised
Supervised Learning Algorithms
Regression:
● Linear Regression
● Lasso Regression
● Ridge Regression
Classification:
● Logistic Regression
● Support Vector Machine (SVM)
● Decision Tree
● Random Forest
● Naive Bayes
● Bayesian Network
Unsupervised Learning Algorithms
Clustering:
● K-means
● K-medians
● Hierarchical Clustering
etc…
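K-means alternates two steps until the assignments stop changing: assign every data point to its nearest centroid, then move each centroid to the mean of the points assigned to it. The plain-Java sketch below implements that loop (Lloyd's algorithm). For determinism it assumes the first k points are the initial centroids, which is a simplification; Spark MLlib ships its own KMeans (org.apache.spark.mllib.clustering.KMeans), which by default initializes with the k-means|| scheme. Class and method names here are hypothetical.

```java
import java.util.Arrays;

/**
 * Sketch of K-means clustering (Lloyd's algorithm).
 * Simplifying assumption: the first k points are used as the initial centroids.
 */
public class KMeansSketch {

    static double squaredDistance(double[] a, double[] b) {
        double sum = 0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return sum;
    }

    /** Returns the cluster index assigned to each point. */
    static int[] cluster(double[][] points, int k, int maxIterations) {
        if (k < 1 || k > points.length) {
            throw new IllegalArgumentException("k must be between 1 and the number of points");
        }
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) centroids[c] = points[c].clone();
        int[] assignment = new int[points.length];
        Arrays.fill(assignment, -1);
        int dims = points[0].length;

        for (int iter = 0; iter < maxIterations; iter++) {
            // Assignment step: attach each point to its nearest centroid.
            boolean changed = false;
            for (int p = 0; p < points.length; p++) {
                int nearest = 0;
                for (int c = 1; c < k; c++) {
                    if (squaredDistance(points[p], centroids[c])
                            < squaredDistance(points[p], centroids[nearest])) {
                        nearest = c;
                    }
                }
                if (assignment[p] != nearest) {
                    assignment[p] = nearest;
                    changed = true;
                }
            }
            if (!changed) break; // converged: no point switched clusters

            // Update step: move each centroid to the mean of its assigned points.
            double[][] sums = new double[k][dims];
            int[] counts = new int[k];
            for (int p = 0; p < points.length; p++) {
                counts[assignment[p]]++;
                for (int d = 0; d < dims; d++) sums[assignment[p]][d] += points[p][d];
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) continue; // leave an empty cluster's centroid in place
                for (int d = 0; d < dims; d++) centroids[c][d] = sums[c][d] / counts[c];
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        double[][] points = {{1, 1}, {9, 9}, {1.5, 2}, {8, 8.5}, {0.5, 1.2}, {9.5, 8}};
        System.out.println(Arrays.toString(cluster(points, 2, 100))); // prints [0, 1, 0, 1, 0, 1]
    }
}
```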
Java tools for Machine Learning
Tool | License | URL
Weka | GNU General Public License | http://www.cs.waikato.ac.nz/ml/weka/
JSAT | GPL v3 | https://github.com/EdwardRaff/JSAT
Mahout | Apache v2 | https://mahout.apache.org/
Spark MLlib | Apache v2 | http://spark.apache.org/mllib/
Apache Spark MLlib - scalable machine learning library
Speed
Run programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk (figures as stated on the Apache Spark website).
Ease of Use
Write applications quickly in Java, Scala, Python or R.
Easy to Deploy
Runs on existing Hadoop clusters and data.
Apache Spark - a few terms
SparkConf - Configuration for a Spark application. Used to set various Spark parameters as key-value pairs.
SparkContext / JavaSparkContext - Main entry point for Spark functionality. A SparkContext represents the connection to a Spark cluster. Only one SparkContext may be active per JVM.
RDD / JavaRDD - A Resilient Distributed Dataset (RDD), the basic abstraction in Spark. Represents an immutable, partitioned collection of elements that can be operated on in parallel.
Apache Spark - a few operations on an RDD
Filter - Return a new dataset formed by selecting those elements of the source on which the function returns true.
Map - Return a new distributed dataset formed by passing each element of the source through a function.
Random Split - Split a dataset randomly into parts according to the given ratios (weights).
Cache - Persist (cache) a dataset in memory so it can be reused across operations.
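The following sketch is a local, single-JVM analogy of these operations built on java.util.stream, so it runs without a Spark cluster; it is not Spark code. In Spark the corresponding calls on a JavaRDD are filter, map, randomSplit(double[] weights, long seed) and cache(). The header name "f1", the sample rows and all class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

/**
 * Local analogy (NOT Spark) of the RDD operations filter, map, random split
 * and cache, using java.util.stream on an in-memory list.
 */
public class RddOperationsAnalogy {

    /** Filter + map: drop the header row, then parse each CSV line into a double[]. */
    static List<double[]> parse(List<String> lines) {
        return lines.stream()
            .filter(line -> !line.startsWith("f1"))       // Filter: keep elements where the predicate is true
            .map(line -> Arrays.stream(line.split(","))   // Map: transform every element
                               .mapToDouble(Double::parseDouble)
                               .toArray())
            .toList(); // materialized once and reusable afterwards, loosely like cache()
    }

    /** Like randomSplit(new double[]{ratio, 1 - ratio}, seed): each element lands in one part at random. */
    static <T> List<List<T>> randomSplit(List<T> data, double ratio, long seed) {
        Random random = new Random(seed);
        List<T> first = new ArrayList<>();
        List<T> second = new ArrayList<>();
        for (T element : data) {
            (random.nextDouble() < ratio ? first : second).add(element);
        }
        return List.of(first, second);
    }

    public static void main(String[] args) {
        List<String> lines = List.of("f1,f2,class", "6,148,1", "1,85,0", "8,183,1", "1,89,0");
        List<double[]> rows = parse(lines);
        List<List<double[]>> split = randomSplit(rows, 0.7, 42L);
        System.out.println("training = " + split.get(0).size() + ", test = " + split.get(1).size());
    }
}
```

As with Spark's randomSplit, the part sizes only approximate the requested ratio, because each element is assigned independently at random.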
Let’s solve a classification problem using Apache Spark
● Dataset
Pima Indians Diabetes dataset
https://archive.ics.uci.edu/ml/datasets/Pima+Indians+Diabetes
Number of instances: 768
Number of features: 8
Let’s solve a classification problem using Apache Spark
● Response variable
Name: class
Values: 0 or 1
Interpretation: whether a given Pima Indian has diabetes or not
Let’s solve a classification problem using Apache Spark
● Objective
Build a classification model to predict whether a given Pima Indian has diabetes or not.
Let’s try to build a Logistic Regression model for this.
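To show what "building a logistic regression model" involves, here is a minimal single-machine sketch of binary logistic regression trained with batch gradient descent on the average log-loss. It is a stand-in for illustration, not Spark code: Spark MLlib's RDD-based API provides distributed trainers such as LogisticRegressionWithLBFGS and LogisticRegressionWithSGD in org.apache.spark.mllib.classification. The learning rate, epoch count, toy data and all names below are assumptions.

```java
/**
 * Sketch of binary logistic regression trained with batch gradient descent.
 * Labels must be 0 or 1, like the "class" response variable above.
 */
public class LogisticRegressionSketch {

    private final double[] weights; // one weight per feature
    private double bias;

    LogisticRegressionSketch(int numFeatures) { weights = new double[numFeatures]; }

    static double sigmoid(double z) { return 1.0 / (1.0 + Math.exp(-z)); }

    /** Estimated probability that the label is 1. */
    double probability(double[] x) {
        double z = bias;
        for (int j = 0; j < weights.length; j++) z += weights[j] * x[j];
        return sigmoid(z);
    }

    /** Predicts 1 (e.g. "has diabetes") when the probability is at least 0.5, else 0. */
    int predict(double[] x) { return probability(x) >= 0.5 ? 1 : 0; }

    /** Minimizes the average log-loss with a fixed learning rate for a fixed number of epochs. */
    void train(double[][] x, int[] y, double learningRate, int epochs) {
        int n = x.length;
        for (int epoch = 0; epoch < epochs; epoch++) {
            double[] gradW = new double[weights.length];
            double gradB = 0;
            for (int i = 0; i < n; i++) {
                double error = probability(x[i]) - y[i]; // d(log-loss)/dz for this sample
                for (int j = 0; j < weights.length; j++) gradW[j] += error * x[i][j];
                gradB += error;
            }
            for (int j = 0; j < weights.length; j++) weights[j] -= learningRate * gradW[j] / n;
            bias -= learningRate * gradB / n;
        }
    }

    public static void main(String[] args) {
        // Tiny made-up, already-scaled data: label 1 when the single feature is large.
        double[][] x = {{0.1}, {0.2}, {0.3}, {0.7}, {0.8}, {0.9}};
        int[] y = {0, 0, 0, 1, 1, 1};
        LogisticRegressionSketch model = new LogisticRegressionSketch(1);
        model.train(x, y, 1.0, 5000);
        System.out.println(model.predict(new double[]{0.15}) + " " + model.predict(new double[]{0.85}));
    }
}
```

Gradient descent behaves much better when features are on similar scales, so raw Pima features, whose ranges differ widely, would likely need to be standardized before training a model like this.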
Solution using Apache Spark
Code:
https://github.com/nirmal070125/ml-java-meetup
A few words on WSO2 Machine Learner
Powered by Apache Spark and Apache Spark MLlib.
● Manage and explore your data
● Analyze the data using machine learning algorithms
● Build machine learning models
● Compare and manage generated machine learning models
● Predict using the built models
● Use the built models with WSO2 CEP and WSO2 ESB.
http://wso2.com/products/machine-learner/
Editor's Notes

  1. Fraud detection
  2. Stock market prediction: the act of trying to determine the future value of a company stock or other financial instrument traded on an exchange. Successfully predicting a stock's future price could yield significant profit.
  3. Reinforcement learning: a computer program interacts with a dynamic environment in which it must achieve a certain goal (such as driving a vehicle), without a teacher explicitly telling it whether it has come close to its goal. Another example is learning to play a game by playing against an opponent.
  4. Mention the row-wise operations.