Something about
Data
Sanjeev Mishra Chris Bedford
Acknowledgement
● Bing for free images
● Machine Learning in Action (Peter Harrington)
● Wikipedia
Did you know that?
What about these?
I guess you have heard of
● Siri or Google Now
● IBM Watson
● IBM Deep Blue
● Google Translate
● WolframAlpha
The Big Picture
What is Learning
Definition:
The acquisition of knowledge or skills through experience,
study, or by being taught.
[Figure: knowledge, reasoning, deduction]
What is Machine Learning
Field of study that gives computers the ability to learn
without being explicitly programmed (attributed to Arthur Samuel)
A computer program is said to learn from experience E
with respect to some class of tasks T and performance
measure P, if its performance at tasks in T, as measured by
P, improves with experience E (Tom Mitchell, 1997)
Data Mining
● Computational process of
discovering patterns in large data
sets
○ Structured or unstructured data
○ Patterns must be: valid, novel, potentially
useful, understandable
■ 80% of customers who buy cheese and milk also
buy bread, and 5% of customers buy all of them
together
■ Correlation among variables: positive or negative
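The cheese, milk and bread pattern above can be read as an association rule with two numbers: a confidence (the 80%) and a support (the 5%). Below is a minimal sketch of how those two figures could be computed. The `baskets` data and the function names are made up for illustration and are not from the talk.

```python
def support(baskets, items):
    """Fraction of all baskets that contain every item in `items`."""
    items = set(items)
    return sum(1 for basket in baskets if items <= set(basket)) / len(baskets)


def confidence(baskets, antecedent, consequent):
    """Of the baskets containing `antecedent`, the fraction that also contain `consequent`."""
    both = support(baskets, set(antecedent) | set(consequent))
    return both / support(baskets, antecedent)


# Hypothetical toy data: each basket is the set of items in one purchase.
baskets = [
    {"cheese", "milk", "bread"},
    {"cheese", "milk", "bread"},
    {"cheese", "milk"},
    {"milk", "bread"},
    {"eggs"},
]

# Rule: customers who buy cheese and milk also buy bread.
rule_support = support(baskets, {"cheese", "milk", "bread"})
rule_confidence = confidence(baskets, {"cheese", "milk"}, {"bread"})
print(f"support={rule_support:.2f} confidence={rule_confidence:.2f}")
```

Support says how often the whole pattern occurs; confidence says how reliable the "also buy bread" part is once cheese and milk are in the basket.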
Types of Machine Learning
Unsupervised: learn the patterns in data
● no labeled training data
● face detection in a set of images
● group objects based on some similarity
● clustering (nominal data)
● density estimation (numeric data)
Supervised: predict or forecast something
● labeled training data
● recognize a face in a set of images
● given an object, predict the type
● classification (nominal data)
● regression or curve fitting (numeric data)
Clustering
Clustering using k-Means
● Input
○ M (set of points)
○ k (number of clusters)
● Output
○ k cluster centroids $c_1, \dots, c_k$ ($c_i$ is the centroid of all $x_j \in S_i$)
● Approach
○ Minimizing the squared error function
$J = \sum_{i=1}^{k} \sum_{x_j \in S_i} \lVert x_j - c_i \rVert^2$
where $\lVert x_j - c_i \rVert^2$ is a chosen distance measure between a data point $x_j$
and the cluster centre $c_i$, and $J$ is an indicator of the distance of the
n data points from their respective cluster centres.
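To make the squared error function concrete, here is a small sketch that evaluates it for two different groupings of the same four points. The point data and the helper names (`centroid`, `squared_error`) are illustrative assumptions, and squared Euclidean distance is assumed as the distance measure.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def squared_error(clusters):
    """Sum, over every cluster, of the squared distance from each point to its centroid."""
    total = 0.0
    for points in clusters:
        c = centroid(points)
        total += sum(squared_distance(x, c) for x in points)
    return total


# The same four hypothetical points, grouped two different ways.
good = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
bad = [[(0.0, 0.0), (10.0, 0.0)], [(0.0, 2.0), (10.0, 2.0)]]

print(squared_error(good), squared_error(bad))
```

The grouping that keeps nearby points together has the much smaller error, which is what k-means tries to achieve.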
k-Means
create k points for starting centroids (random)
while any point has changed cluster assignment:
    for every point in our dataset:
        for every centroid:
            calculate the distance between the centroid and the point
        assign the point to the cluster with the lowest distance
    for every cluster:
        calculate the mean of the points in that cluster
        assign the centroid to the mean
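The loop above could be sketched in plain Python roughly as follows. This is not the code used in the demo: the `initial_centroids` parameter, the choice of random data points as starting centroids, the `max_iter` cap and the sample points are all assumptions added to keep the example small and reproducible.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def kmeans(points, k, initial_centroids=None, seed=0, max_iter=100):
    # Starting centroids: k randomly chosen data points, unless the
    # caller supplies them (an assumption made for reproducibility).
    if initial_centroids is None:
        centroids = random.Random(seed).sample(points, k)
    else:
        centroids = list(initial_centroids)
    assignment = [None] * len(points)

    for _ in range(max_iter):
        changed = False
        # Assign every point to the cluster with the nearest centroid.
        for idx, point in enumerate(points):
            nearest = min(range(k), key=lambda c: squared_distance(point, centroids[c]))
            if assignment[idx] != nearest:
                assignment[idx] = nearest
                changed = True
        if not changed:
            break
        # Move every centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = mean(members)
    return centroids, assignment


# Two hypothetical, well-separated groups of points.
points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0),
]
centroids, assignment = kmeans(points, k=2, initial_centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids, assignment)
```

With random starting centroids the result can depend on the seed, which is one way the "local minima" problem shows up in practice.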
Clustering Demo
k-Means
Pros
● Easy to implement
● Fast on small dataset
Cons
● A priori knowledge of K
● Slow on very large dataset
● Sensitive to outliers
● Can converge to local minima
k-Means (wrong k)
K = 4
K = 3
Improving K-means
● Bisecting K-means
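○ Start with all points in one cluster
○ Choose the cluster with the largest SSE (sum of squared errors)
○ Split it in two using k-means with k = 2
○ Repeat until there are k clusters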
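A rough sketch of bisecting k-means under some simplifying assumptions: each split is a 2-means run seeded deterministically with two far-apart points (an assumption made here so the example is reproducible; other versions seed randomly), and only clusters with non-zero SSE are considered for splitting. The function names and sample points are illustrative.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def sse(points):
    """Sum of squared distances from each point to the cluster mean."""
    c = mean(points)
    return sum(sq_dist(p, c) for p in points)


def two_means(points, max_iter=100):
    """Split one cluster in two with k-means (k = 2)."""
    # Assumed seeding: the point farthest from the mean, and the
    # point farthest from that one.
    centre = mean(points)
    a = max(points, key=lambda p: sq_dist(p, centre))
    b = max(points, key=lambda p: sq_dist(p, a))
    for _ in range(max_iter):
        left = [p for p in points if sq_dist(p, a) <= sq_dist(p, b)]
        right = [p for p in points if sq_dist(p, a) > sq_dist(p, b)]
        new_a, new_b = mean(left), mean(right)
        if (new_a, new_b) == (a, b):
            break
        a, b = new_a, new_b
    return left, right


def bisecting_kmeans(points, k):
    clusters = [list(points)]
    while len(clusters) < k:
        # Only clusters with some spread can be split further.
        candidates = [c for c in clusters if sse(c) > 0]
        if not candidates:
            break
        worst = max(candidates, key=sse)  # cluster with the largest SSE
        clusters.remove(worst)
        clusters.extend(two_means(worst))
    return clusters


# Three hypothetical groups of two points each.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (20.0, 0.0), (20.0, 1.0)]
clusters = bisecting_kmeans(points, 3)
print(clusters)
```

Because every step only splits the worst cluster, the result does not depend on choosing k good starting centroids at once.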
Supervised Learning: Linear Regression
Attempts to find a mathematical (linear) function that can approximate the relationship between a set of
one or more input variables and what is called a response variable.
Example: A web site for amusement park X
* Interested in offering ride coupons
* Rides have height requirements
* Avoid issuing coupons for ride if user is too short
* Most users sign up from Facebook, so we have their ages.
* So: we use age to predict height.
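As a sketch of the amusement park idea, the snippet below fits a straight line to a handful of (age, height) pairs using ordinary least squares and uses it to decide whether to offer a coupon. The ages, heights, the 120 cm requirement and the function names are all invented for illustration; they are not data from the talk.

```python
def fit_line(xs, ys):
    """Least-squares line y = slope * x + intercept for paired samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Value of the fitted line at x."""
    return slope * x + intercept


# Hypothetical training data: ages in years, heights in centimetres.
ages = [4, 6, 8, 10, 12]
heights_cm = [102, 115, 128, 139, 150]

slope, intercept = fit_line(ages, heights_cm)

# Hypothetical ride requirement: at least 120 cm tall.
MIN_HEIGHT_CM = 120
for age in (5, 9):
    offer = predict(slope, intercept, age) >= MIN_HEIGHT_CM
    print(age, "offer coupon" if offer else "skip coupon")
```

A straight line is only a reasonable approximation over a limited age range, so a real system would restrict where the model is applied.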
Supervised Learning: Linear Regression
A more complex data set: two input variables.
sqFt,bathrooms,priceInThousands
1200,1,750
1250,2,900
2000,2.5,1500
1800,2,1200
1000,1.5,700
1800,3,1400
1100,1.5,800
2200,3,1700
1250,1.5,850
1300,2,1100
Our previous example had a one-dimensional set of input variables; now we have a
two-dimensional set: for each two-tuple consisting of squareFeet (sqFt) and
numBathrooms (bathrooms) we have the selling price of the corresponding home.
From this training data, we create a model that fits a “plane of best fit”. Given
a new two-tuple [ squareFeet, numBathrooms ], our model predicts the point on the
plane that denotes the most likely selling price for a house with those attributes.
Supervised Learning: Linear Regression
For a one-dimensional set of input variables we had a line of best fit; for a
two-dimensional set, we have a plane of best fit. Here’s what our plane looks like.
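One way to see where such a plane comes from is to solve the least-squares normal equations directly. The sketch below does that in plain Python for the house data listed earlier. It assumes ordinary least squares with an intercept; the helper names `solve` and `fit_plane` are illustrative, and statistical packages typically use more numerically robust methods than this.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def fit_plane(rows):
    """rows holds (sqFt, bathrooms, price); returns [intercept, sqFt coef, bathrooms coef]."""
    xs = [[1.0, float(s), float(b)] for s, b, _ in rows]
    ys = [float(p) for _, _, p in rows]
    xtx = [[sum(r[i] * r[j] for r in xs) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(xs, ys)) for i in range(3)]
    return solve(xtx, xty)


# sqFt, bathrooms, priceInThousands (the data set shown earlier)
houses = [
    (1200, 1, 750), (1250, 2, 900), (2000, 2.5, 1500), (1800, 2, 1200),
    (1000, 1.5, 700), (1800, 3, 1400), (1100, 1.5, 800), (2200, 3, 1700),
    (1250, 1.5, 850), (1300, 2, 1100),
]

intercept, per_sqft, per_bathroom = fit_plane(houses)
predicted = intercept + per_sqft * 1600 + per_bathroom * 3
print(f"predicted price (thousands): {predicted:.1f}")
```

The three coefficients define the plane: the intercept plus a slope along each input variable.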
Why Use R?
Many data scientists use R, due to
- extensive, well tested libraries of statistical, mathematical functions
- math friendly syntax
- excellent support for charting and plotting functions
- active user community to provide support
R skills are valuable for big data engineers, since:
- data scientists we work with will often develop their models using R
- significant effort is required to translate such models to Java, C++, etc.
So: useful not only to understand R,
but also to be able to invoke R from your native language
R code for 2 dimensional model
values <- read.csv(filePath)
model <- lm(priceInThousands ~ sqFt + bathrooms, data=values)
# predict new value
#
# set up 'data frame'
newdata <- data.frame(sqFt=1600, bathrooms=3)
#
# invoke prediction function
predict(model, newdata)
Notes on the code above:
- read.csv: the csv file is in the same format we saw in the intro slide on linear regression
- lm: R’s linear model creation function
- priceInThousands: the response variable
- sqFt, bathrooms: the input (independent) variables
- predict: predicts the most likely selling price using model ‘model’ and the data frame
  that wraps the variables sqFt (1600) and bathrooms (3)
Calling R from Java
import org.rosuda.JRI.REXP;
import org.rosuda.JRI.Rengine;

// Note: this listing uses Groovy syntax (triple-quoted strings, $variable
// interpolation, optional semicolons) on top of the Java JRI classes.
class RegressionModelExecutor {
    // Current R session (only one per JVM,
    // since rJava is not multi-threaded).
    Rengine rengine = null

    RegressionModelExecutor(String inputDataPath) {
        String[] engineArgs = new String[1]
        engineArgs[0] = "--vanilla"
        rengine = new Rengine(engineArgs, false, null)
        // Each R statement sits on a single line, because
        // evaluateScript() sends the script to R line by line.
        String script =
            """
            values = read.csv('$inputDataPath')
            newModel.lm = lm(priceInThousands ~ sqFt + bathrooms, data=values)
            """
        evaluateScript(script) // initialize model
    }

    public void shutdown() {
        rengine.end()
    }

    // Apply model 'newModel.lm' to predict price of a house
    // with given values for squareFeet and numBathrooms.
    public double predictInstance(int sqft, float baths) {
        rengine.eval("newdata = data.frame(sqFt=$sqft, bathrooms=$baths)")
        REXP result = rengine.eval("predict(newModel.lm, newdata)")
        return result.asDouble()
    }

    // Evaluate block of R expressions, taking into account
    // the fact that Rengine only executes one statement at
    // a time. Unconditionally dumps out lines before executing
    // the script so that if anything goes wrong we can copy
    // paste the constructed output (scriptLines) directly
    // into an R session.
    public void evaluateScript(String scriptLines) {
        println("evaluating:\n$scriptLines")
        for (String line : scriptLines.split("\n")) {
            if (!line.trim().isEmpty()) { // skip blank lines
                rengine.eval(line.trim())
            }
        }
    }
}
Calling R from Java
More detailed article on R/Java:
http://buildlackey.com/integrating-r-and-java-with-jrirjava-a-jni-based-bridge/
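How is linear regression machine learning?
A computer program is said to learn from experience E
with respect to some class of tasks T and performance
measure P, if its performance at tasks in T, as measured by
P, improves with experience E
For our house price model, the task T is predicting a selling price, the experience E
is the training data, and the performance measure P is the prediction error.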
Supervised Learning: Linear Regression
LEARN MORE:
KHAN ACADEMY
https://www.khanacademy.org/
COURSERA:
Coding the Matrix Course (Linear Algebra)
http://www.youtube.com/watch?v=IWugXcWpfoM
MIT Open Courseware
Linear Algebra Course
http://ocw.mit.edu/courses/mathematics/18-06-linear-algebra-spring-2010/index.htm
Software and Tools
● Apache Mahout (http://mahout.apache.org/): Java, Apache
● http://prediction.io/ (Machine learning server)
● Weka (http://www.cs.waikato.ac.nz/ml/weka/): Java, GPL
● OpenNLP (http://opennlp.apache.org/): Java, Apache
● Stanford NLP (http://nlp.stanford.edu/software/): Java, GPL
● Scikit-learn (http://scikit-learn.org/stable/): Python, BSD
● mlpy (http://mlpy.sourceforge.net/): Python, GPL
● NLTK (http://nltk.org/): Python, Apache
● http://www.alchemyapi.com/
Tools
R, Matlab, Octave
http://mloss.org/software/
http://sourceforge.net/directory/science-engineering/ai/machinelearning/os:linux/freshness:recently-updated/
Courses and other materials
● Coursera (http://www.coursera.org/):
○ machine learning
○ natural language processing
○ neural networks
● Udacity (https://www.udacity.com/courses)
○ artificial intelligence
● http://cs229.stanford.edu/materials.html
● http://www.ai.mit.edu/courses/6.867-f03/lectures.html
● wikipedia.org
Something about
Data
Sanjeev Mishra Chris Bedford
sanjeev.mishra@gmail.com chris@buildlackey.com

Silicon Valley Code Camp 2013

  • 2. Acknowledgement ● Bing for free images ● Machine Learning in Action (Peter Harrington) ● Wikipedia
  • 3. Did you know that?
  • 6. I guess you have heard of ● Siri or Google Now ● IBM Watson ● IBM Deep Blue ● Google Translate ● WolframAlpha
  • 8. What is Learning Definition: The acquisition of knowledge or skills through experience, study, or by being taught. (Diagram labels: knowledge, reasoning, deduction)
  • 9. What is Machine Learning Field of study that gives computers the ability to learn without being explicitly programmed (a definition usually attributed to Arthur Samuel). A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E (Tom Mitchell).
  • 10. Data Mining ● Computational process of discovering patterns in large data sets ○ Structured or unstructured data ○ Patterns must be: valid, novel, potentially useful, understandable ■ 80% of customers who buy cheese and milk also buy bread, and 5% of customers buy all of them together ■ Correlation among variables: positive or negative
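
The cheese, milk and bread pattern above combines two numbers that association-rule mining usually calls confidence (the 80%) and support (the 5%). The sketch below shows one way to compute both. The basket data and the function names `support` and `confidence` are made up for illustration and are not from the slides.

```python
def support(transactions, items):
    """Fraction of all transactions that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for basket in transactions if items <= set(basket))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Among transactions containing `antecedent`, the fraction that also
    contain `consequent`. Assumes the antecedent occurs at least once."""
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)


# Hypothetical toy baskets (not real customer data).
baskets = [
    {"cheese", "milk", "bread"},
    {"cheese", "milk", "bread"},
    {"cheese", "milk"},
    {"milk", "bread"},
    {"eggs"},
]

# Share of customers who buy all three items together.
rule_support = support(baskets, {"cheese", "milk", "bread"})
# Share of cheese-and-milk buyers who also buy bread.
rule_confidence = confidence(baskets, {"cheese", "milk"}, {"bread"})
print(rule_support, rule_confidence)
```

In the slide's example the rule "cheese and milk implies bread" would have a confidence of 0.80 and a support of 0.05.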
  • 11. Types of Machine Learning Unsupervised: learn the patterns in data ● no training (no labeled examples) ● face detection in a set of images ● group objects based on some similarity ● clustering (nominal data) ● density estimation (numeric data) Supervised: predict or forecast something ● training (labeled examples) ● recognize a face in a set of images ● given an object, predict its type ● classification (nominal data) ● regression or curve fitting (numeric data)
  • 13. Clustering using k-Means ● Input ○ M (set of points) ○ k (number of clusters) ● Output ○ k cluster centroids $c_1, \dots, c_k$ ($c_i$ is the centroid of all $x_j \in S_i$) ● Approach ○ Minimize the squared error function $J = \sum_{i=1}^{k} \sum_{x_j \in S_i} \lVert x_j - c_i \rVert^2$, where $\lVert x_j - c_i \rVert^2$ is a chosen distance measure between a data point $x_j$ and its cluster centre $c_i$, and $J$ is an indicator of the distance of the n data points from their respective cluster centres.
  • 14. k-Means
      create k points for starting centroids (random)
      while any point has changed cluster assignment
          for every point in our dataset:
              for every centroid
                  calculate the distance between the centroid and point
              assign the point to the cluster with the lowest distance
          for every cluster
              calculate the mean of the points in that cluster
              assign the centroid to the mean
    (Clustering Demo)
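
The pseudocode above can be sketched in plain Python as follows. This is a minimal illustration, not the demo code from the talk. It assumes squared Euclidean distance, picks the starting centroids from the input points (the slide only says "random"), and adds a `max_iter` safety cap. All function and parameter names are hypothetical.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    # Assumption: starting centroids are k distinct input points.
    centroids = rng.sample(points, k)
    assignment = [None] * len(points)
    for _ in range(max_iter):
        changed = False
        # Assign every point to the cluster with the lowest distance.
        for i, p in enumerate(points):
            nearest = min(range(k),
                          key=lambda c: squared_distance(p, centroids[c]))
            if assignment[i] != nearest:
                assignment[i] = nearest
                changed = True
        if not changed:
            break  # no point changed cluster, so we are done
        # Move every centroid to the mean of its cluster.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an emptied cluster keeps its old centroid
                centroids[c] = mean_point(members)
    return centroids, assignment


# Hypothetical toy data: two well separated groups of three points.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, assignment = k_means(data, k=2)
print(centroids, assignment)
```

Because the start is random, different seeds can give different results on harder data, which is the "local minima" caveat in practice.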
  • 15. k-Means Pros: ● Easy to implement ● Fast on small datasets Cons: ● Requires a priori knowledge of k ● Slow on very large datasets ● Sensitive to outliers ● Can converge to local minima
  • 16. k-Means (wrong k) K = 4 K = 3
  • 17. Improving K-means ● Bisecting K-means ○ Choose the cluster with the largest SSE (sum of squared errors) ○ Split it in two, and repeat until there are k clusters
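
A rough sketch of the bisecting idea, assuming each split is done with an ordinary 2-means run and that points are tuples. The helper names, the seed handling and the stopping rules for clusters that cannot be split are assumptions made for this illustration, not details from the slides.

```python
import random


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(points):
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def sse(points):
    """Sum of squared errors of a cluster around its own centroid."""
    c = centroid(points)
    return sum(sq_dist(p, c) for p in points)


def two_means(points, rng, max_iter=100):
    """Split one cluster into two with plain k-means (k = 2)."""
    centers = rng.sample(points, 2)
    groups = None
    for _ in range(max_iter):
        new_groups = ([], [])
        for p in points:
            closer_to_first = sq_dist(p, centers[0]) <= sq_dist(p, centers[1])
            new_groups[0 if closer_to_first else 1].append(p)
        if new_groups == groups:
            break  # assignments stopped changing
        groups = new_groups
        centers = [centroid(g) if g else c for g, c in zip(groups, centers)]
    return [g for g in groups if g]


def bisecting_k_means(points, k, seed=0):
    rng = random.Random(seed)
    clusters = [list(points)]  # start with everything in one cluster
    while len(clusters) < k:
        # Only clusters with at least two distinct points can be split.
        splittable = [c for c in clusters if len(set(c)) > 1]
        if not splittable:
            break
        worst = max(splittable, key=sse)  # cluster with the largest SSE
        halves = two_means(worst, rng)
        if len(halves) < 2:
            break  # the split failed, stop rather than loop forever
        clusters.remove(worst)
        clusters.extend(halves)
    return clusters


# Hypothetical one-dimensional toy data with three obvious groups.
data = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (12.0,),
        (30.0,), (31.0,), (32.0,)]
clusters = bisecting_k_means(data, k=3)
print(clusters)
```

Each split still relies on a random 2-means start, so this reduces but does not remove the sensitivity to initialization.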
  • 18. Supervised Learning: Linear Regression Attempts to find a mathematical (linear) function that can approximate the relationship between a set of one or more input variables and what is called a response variable. Example: A web site for amusement park X * Interested in offering ride coupons * Rides have height requirements * Avoid issuing coupons for ride if user is too short * Most users sign up from Facebook, so we have their ages. * So: we use age to predict height.
  • 21. Supervised Learning: Linear Regression A more complex data set: two input variables.
      sqFt,bathrooms,priceInThousands
      1200,1,750
      1250,2,900
      2000,2.5,1500
      1800,2,1200
      1000,1.5,700
      1800,3,1400
      1100,1.5,800
      2200,3,1700
      1250,1.5,850
      1300,2,1100
    Our previous example had a one-dimensional set of input variables; now we have a 2-dimensional set: for each two-tuple consisting of numBathrooms and squareFeet we have the selling price of the corresponding home. From this training data, we create a model that fits a “plane of best fit”. Given a new two-tuple [ numBathrooms-x, squareFeet-y ], our model predicts the point on the plane that denotes the most likely selling price for a house with those attributes.
  • 22. Supervised Learning: Linear Regression For a one dimensional set of input variables we had a line of best fit, for a two dimensional set, we have a plane of best fit. Here’s what our plane looks like.
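
To make the "plane of best fit" concrete, here is a small sketch that fits price = b0 + b1 * sqFt + b2 * bathrooms to the ten rows from the data slide by ordinary least squares. It solves the normal equations with a hand-written Gaussian elimination so that it needs no libraries. This is one standard way to get such a fit and not necessarily what the talk's tooling does internally. The function names are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def fit_plane(rows):
    """Least squares coefficients [b0, b1, b2] from (sqFt, baths, price) rows."""
    design = [(1.0, sq_ft, baths) for sq_ft, baths, _ in rows]
    prices = [price for _, _, price in rows]
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(3)]
           for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(design, prices)) for i in range(3)]
    return solve_linear_system(xtx, xty)


def predict(coefficients, sq_ft, baths):
    b0, b1, b2 = coefficients
    return b0 + b1 * sq_ft + b2 * baths


# The training rows from the slide: sqFt, bathrooms, priceInThousands.
houses = [
    (1200, 1, 750), (1250, 2, 900), (2000, 2.5, 1500), (1800, 2, 1200),
    (1000, 1.5, 700), (1800, 3, 1400), (1100, 1.5, 800), (2200, 3, 1700),
    (1250, 1.5, 850), (1300, 2, 1100),
]
coefficients = fit_plane(houses)
# Predicted price (in thousands) for a 1600 sq ft house with 3 bathrooms.
predicted_price = predict(coefficients, 1600, 3)
print(coefficients, predicted_price)
```

The query point (1600 square feet, 3 bathrooms) is the same one used in the R slide later in the deck, so the two results can be compared directly.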
  • 23. Why Use R? Many data scientists use R because of: - extensive, well-tested libraries of statistical and mathematical functions - math-friendly syntax - excellent support for charting and plotting - an active user community that provides support. R skills are valuable for big data engineers, since: - the data scientists we work with will often develop their models using R - significant effort is required to translate such models to Java, C++, etc. So it is useful not only to understand R, but also to be able to invoke R from your native language.
  • 24. R code for 2 dimensional model
      # csv file is in the same format we saw in the intro slide on linear regression
      values <- read.csv(filePath)

      # lm() is R's linear model creation function:
      # priceInThousands is the response variable,
      # sqFt and bathrooms are the input (independent) variables
      model <- lm(priceInThousands ~ sqFt + bathrooms, data=values)

      # predict new value
      #
      # set up 'data frame'
      newdata <- data.frame(sqFt=1600, bathrooms=3)
      #
      # invoke prediction function: predict the most likely selling price using
      # model 'model' and the data frame that wraps sqFt (1600) and bathrooms (3)
      predict(model, newdata)
  • 25. Calling R from Java
      // Note: the triple-quoted strings, $variable interpolation and bare
      // println below are Groovy syntax, so this runs on the JVM as Groovy.
      import org.rosuda.JRI.REXP;
      import org.rosuda.JRI.Rengine;

      class RegressionModelExecutor {
          // Current R session (only one per JVM,
          // since rjava is not multi-threaded).
          Rengine rengine = null

          RegressionModelExecutor(String inputDataPath) {
              String[] engineArgs = new String[1];
              engineArgs[0] = "--vanilla";
              rengine = new Rengine(engineArgs, false, null);
              // Each R statement stays on one line because evaluateScript
              // evaluates the script line by line.
              String script = """
                  values = read.csv('$inputDataPath')
                  newModel.lm = lm(priceInThousands ~ sqFt + bathrooms, data=values)
              """
              evaluateScript(script)   // initialize model
          }

          public void shutdown() {
              rengine.end();
          }

          // Apply model 'newModel.lm' to predict price of a house
          // with given values for squareFeet and numBathrooms.
          public double predictInstance(int sqft, float baths) {
              rengine.eval("newdata = data.frame(sqFt=$sqft, bathrooms=$baths)")
              REXP result = rengine.eval("predict(newModel.lm, newdata)")
              return result.asDouble()
          }

          // Evaluate block of R expressions, taking into account
          // the fact that Rengine only executes one statement at
          // a time. Unconditionally dumps out lines before executing
          // the script so that if anything goes wrong we can copy
          // paste the constructed output (scriptLines) directly
          // into an R session.
          public void evaluateScript(String scriptLines) {
              println("evaluating: \n$scriptLines")
              for (String line : scriptLines.split("\n")) {
                  rengine.eval(line)
              }
          }
      }
  • 26. Calling R from Java More detailed article on R/Java: http://buildlackey.com/integrating-r-and-java-with-jrirjava-a-jni-based-bridge/
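  • 27. How is linear regression machine learning? A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E. One way to read the house-price example in these terms: T is predicting a selling price, E is the training data of past sales, and P is the prediction error.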
  • 28. Supervised Learning: Linear Regression LEARN MORE: KHAN ACADEMY https://www.khanacademy.org/ COURSERA: Coding the Matrix Course (Linear Algebra) http://www.youtube.com/watch?v=IWugXcWpfoM MIT Open Courseware Linear Algebra Course http://ocw.mit.edu/courses/mathematics/18-06-linear-algebra-spring-2010/index.htm
  • 29. Software and Tools ● Apache Mahout (http://mahout.apache.org/): Java, Apache ● http://prediction.io/ (Machine learning server) ● Weka (http://www.cs.waikato.ac.nz/ml/weka/): Java, GPL ● OpenNLP (http://opennlp.apache.org/): Java, Apache ● Stanford NLP (http://nlp.stanford.edu/software/): Java, GPL ● Scikit-learn (http://scikit-learn.org/stable/): Python, BSD ● mlpy (http://mlpy.sourceforge.net/): Python, GPL ● NLTK (http://nltk.org/): Python, Apache ● http://www.alchemyapi.com/ Tools R, Matlab, Octave http://mloss.org/software/ http://sourceforge.net/directory/science-engineering/ai/machinelearning/os:linux/freshness:recently-updated/
  • 30. Courses and other materials ● Coursera (http://www.coursera.org/): ○ machine learning ○ natural language processing ○ neural networks ● Udacity (https://www.udacity.com/courses) ○ artificial intelligence ● http://cs229.stanford.edu/materials.html ● http://www.ai.mit.edu/courses/6.867-f03/lectures.html ● wikipedia.org
  • 31. Something about Data Sanjeev Mishra Chris Bedford sanjeev.mishra@gmail.com chris@buildlackey.com