Joel Grus
Seattle DAML Meetup
June 23, 2015
Data Science from Scratch
About me
Old-school DAML-er
Wrote a book: Data Science from Scratch
SWE at Google
Formerly data science at VoloMetrix, Decide, Farecast
The Road to
Data Science
My
Grad School
Fareology
Data Science Is A Broad Field
[Venn diagram: regions labeled "Some Stuff", "More Stuff", and "Even More Stuff", with "Data Science" among them. Other regions are labeled "People who think they're data scientists, but they're not really data scientists", "People who are a danger to everyone around them", and "People who say 'machine learnings'".]
a data scientist should be able to
run a regression, write a sql query, scrape a web
site, design an experiment, factor matrices, use a
data frame, pretend to understand deep learning,
steal from the d3 gallery, argue r versus python,
think in mapreduce, update a prior, build a
dashboard, clean up messy data, test a hypothesis,
talk to a businessperson, script a shell, code on a
whiteboard, hack a p-value, machine-learn a model.
specialization is for engineers.
JOEL GRUS
A lot of stuff!
What Are Hiring Managers Looking For?
Let's check LinkedIn
a data scientist should be able to
run a regression, write a sql query, scrape a web
site, design an experiment, factor matrices, use a
data frame, pretend to understand deep learning,
steal from the d3 gallery, argue r versus python,
think in mapreduce, update a prior, build a
dashboard, clean up messy data, test a hypothesis,
talk to a businessperson, script a shell, code on a
whiteboard, hack a p-value, machine-learn a model.
specialization is for engineers.
JOEL GRUS
grad students!
Learning Data Science
"I want to be a data scientist."
"Great!"
The Math Way
"I like to start with matrix decompositions. How's your measure theory?"
The Math Way
The Good:
Solid foundation
Math is the noblest known pursuit
The Bad:
Some weirdos don't think math is fun
Can be pretty forbidding
Can miss practical skills
"So, did you count the words in that document?"
"No, but I have an elegant proof that the number of words is finite!"
OK, Let's Try Again
"I want to be a data scientist."
"Great!"
The Tools Way
"Here's a list of the 25 libraries you really ought to know. How's your R programming?"
The Tools Way
The Good:
Don't have to understand the math
Practical
Can get started doing fun stuff right away
The Bad:
Don't have to understand the math
Can get started doing bad science right away
"So, did you build that model?"
"Yes, and it fits the training data almost perfectly!"
OK, Maybe Not That Either
So Then What?
Example: k-means clustering
Unsupervised machine learning technique
Given a set of points, group them into k clusters in a way that minimizes the within-cluster sum-of-squares
i.e. in a way such that the clusters are as "small" as possible (for a particular conception of "small")
The Math Way
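The within-cluster sum-of-squares objective described above can be written out as follows. This is the standard textbook form, not a formula recovered from the slides, and the notation ($S_i$ for the clusters, $\mu_i$ for their means) is an assumption made here for illustration.

```latex
\underset{S_1, \dots, S_k}{\arg\min} \; \sum_{i=1}^{k} \sum_{x \in S_i} \lVert x - \mu_i \rVert^2,
\qquad
\mu_i = \frac{1}{\lvert S_i \rvert} \sum_{x \in S_i} x
```

Each inner sum measures how "small" one cluster is: the total squared distance from its points to its own mean.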
The Tools Way
# a 2-dimensional example
x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
colnames(x) <- c("x", "y")
(cl <- kmeans(x, 2))
plot(x, col = cl$cluster)
points(cl$centers, col = 1:2, pch = 8, cex = 2)
The Tools Way
>>> from sklearn import cluster, datasets
>>> iris = datasets.load_iris()
>>> X_iris = iris.data
>>> y_iris = iris.target
>>> k_means = cluster.KMeans(n_clusters=3)
>>> k_means.fit(X_iris)
KMeans(copy_x=True, init='k-means++', ...
>>> print(k_means.labels_[::10])
[1 1 1 1 1 0 0 0 0 0 2 2 2 2 2]
>>> print(y_iris[::10])
[0 0 0 0 0 1 1 1 1 1 2 2 2 2 2]
So What To Do?
Bootcamps?
Data Science from Scratch
This is to certify that Joel Grus
has honorably completed the course of study outlined in
the book Data Science from Scratch: First Principles with
Python, and is entitled to all the Rights, Privileges, and
Honors thereunto appertaining.
Joel Grus, June 23, 2015
Certificate Programs?
Hey! Data scientists!
Learning By Building
You don't really understand something until you build it
For example, I understand garbage disposals much better now that I had to replace one that was leaking water all over my kitchen
More relevantly, I thought I understood hypothesis testing, until I tried to write a book chapter + code about it.
Learning By Building
Functional Programming
Break Things Down Into Small Functions
So you don't end up with something like this
Don't Mutate
Example: k-means clustering
Given a set of points, group them into k clusters in a way that minimizes the within-cluster sum-of-squares
Global optimization is hard, so use a greedy iterative approach
Fun Motivation: Image Posterization
Image consists of pixels
Each pixel is a triplet (R,G,B)
Imagine pixels as points in space
Find k clusters of pixels
Recolor each pixel to its cluster mean
I think it's fun, anyway
8 colors
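The last posterization step, recoloring each pixel to its cluster mean, can be sketched on its own. This is a minimal illustration, assuming the k cluster means have already been found and that pixels are plain (R, G, B) tuples; the names `squared_distance`, `recolor`, and `posterize` are hypothetical and do not come from the slides.

```python
def squared_distance(p, q):
    # squared Euclidean distance between two equal-length tuples
    return sum((a - b) ** 2 for a, b in zip(p, q))

def recolor(pixel, means):
    # replace a pixel with whichever cluster mean is closest to it
    return min(means, key=lambda m: squared_distance(pixel, m))

def posterize(pixels, means):
    # recolor every pixel; returns a new list and leaves the input untouched
    return [recolor(pixel, means) for pixel in pixels]

# four made-up pixels (two reddish, two bluish) and two made-up cluster means
pixels = [(250, 10, 10), (240, 0, 20), (5, 5, 245), (0, 20, 255)]
means = [(245, 5, 15), (2, 12, 250)]
posterized = posterize(pixels, means)
```

With k means the output image uses at most k distinct colors, which is what produces the poster effect.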
Example: k-means clustering
given some points, find k clusters by
    choose k "means"
    repeat:
        assign each point to cluster of closest "mean"
        recompute mean of each cluster
sounds simple! let's code!
import random
def k_means(points, k, num_iters=10):
    # `mean` (componentwise mean of a list of points) is defined elsewhere
    means = list(random.sample(points, k))
    assignments = [None for _ in points]
    for _ in range(num_iters):
        # assign each point to closest mean
        for i, point_i in enumerate(points):
            d_min = float('inf')
            for j, mean_j in enumerate(means):
                d = sum((x - y)**2
                        for x, y in zip(point_i, mean_j))
                if d < d_min:
                    d_min = d
                    assignments[i] = j
        # recompute means
        for j in range(k):
            cluster = [point for i, point in enumerate(points)
                       if assignments[i] == j]
            means[j] = mean(cluster)
    return means
start with k randomly chosen points
start with no cluster assignments
for each iteration
    for each point
        for each mean
            compute the distance
        assign the point to the cluster of the mean with the smallest distance
    find the points in each cluster
    and compute the new means
Not impenetrable, but a lot less helpful than it could be
Can we make it simpler?
Break Things Down Into Small Functions
import random
def k_means(points, k, num_iters=10):
    # start with k of the points as "means"
    means = random.sample(points, k)
    # and iterate finding new means
    for _ in range(num_iters):
        means = new_means(points, means)
    return means
def new_means(points, means):
    # assign points to clusters
    # each cluster is just a list of points
    clusters = assign_clusters(points, means)
    # return the cluster means
    return [mean(cluster)
            for cluster in clusters]
def assign_clusters(points, means):
    # one cluster for each mean
    # each cluster starts empty
    clusters = [[] for _ in means]
    # assign each point to cluster
    # corresponding to closest mean
    for point in points:
        index = closest_index(point, means)
        clusters[index].append(point)
    return clusters
def closest_index(point, means):
    # return index of closest mean
    return argmin(distance(point, mean)
                  for mean in means)
def argmin(xs):
    # return index of smallest element
    return min(enumerate(xs),
               key=lambda pair: pair[1])[0]
To Recap
k_means(points, k, num_iters=10)
    mean(points)
k_means(points, k, num_iters=10)
    new_means(points, means)
        assign_clusters(points, means)
            closest_index(point, means)
                argmin(xs)
                distance(point1, point2)
        mean(points)
            add(point1, point2)
            scalar_multiply(c, point)
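The recap names `distance`, `mean`, `add`, and `scalar_multiply`, but the slides never show their bodies. Here is a minimal sketch of what they might look like, assuming points are equal-length tuples of numbers and that `distance` is the squared Euclidean distance used in the earlier monolithic loop. These implementations are guesses at the intent, not the author's code.

```python
from functools import reduce

def add(point1, point2):
    # componentwise sum of two equal-length points
    return tuple(x + y for x, y in zip(point1, point2))

def scalar_multiply(c, point):
    # multiply every coordinate by the scalar c
    return tuple(c * x for x in point)

def mean(points):
    # componentwise mean; assumes `points` is non-empty
    total = reduce(add, points)
    return scalar_multiply(1 / len(points), total)

def distance(point1, point2):
    # squared Euclidean distance (no square root needed to find the closest mean)
    return sum((x - y) ** 2 for x, y in zip(point1, point2))
```

Note that `mean` raises an error on an empty cluster, which can happen if no point is closest to one of the means. A fuller version would need to decide how to handle that case.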
As a Pedagogical Tool
Can be used "top down" (as we did here)
    Implement high-level logic
    Then implement the details
    Nice for exposition
Can also be used "bottom up"
    Implement small pieces
    Build up to high-level logic
    Good for workshops
Example: Decision Trees
Want to predict whether a given Meetup is worth attending (True) or not (False)
Inputs are dictionaries describing each Meetup
{ "group" : "DAML",
"date" : "2015-06-23",
"beer" : "free",
"food" : "dim sum",
"speaker" : "@joelgrus",
"location" : "Google",
"topic" : "shameless self-promotion" }
{ "group" : "Seattle Atheists",
"date" : "2015-06-23",
"location" : "Round the Table",
"beer" : "none",
"food" : "none",
"topic" : "Godless Game Night" }
Example: Decision Trees
beer?
    free -> True
    none -> False
    paid -> speaker?
        @jakevdp -> True
        @joelgrus -> False
Example: Decision Trees
class LeafNode:
    def __init__(self, prediction):
        self.prediction = prediction
    def predict(self, input_dict):
        return self.prediction
class DecisionNode:
    def __init__(self, attribute, subtree_dict):
        self.attribute = attribute
        self.subtree_dict = subtree_dict
    def predict(self, input_dict):
        value = input_dict.get(self.attribute)
        subtree = self.subtree_dict[value]
        return subtree.predict(input_dict)
Example: Decision Trees
Again inspiration from functional programming:
type Input = Map.Map String String
data Tree = Predict Bool                           -- always predict a specific value
          | Subtrees String (Map.Map String Tree)  -- String: look at the "beer" entry; Map: a map from each possible "beer" value to a subtree
Example: Decision Trees
import qualified Data.Map as Map
type Input = Map.Map String String
data Tree = Predict Bool
          | Subtrees String (Map.Map String Tree)
predict :: Tree -> Input -> Bool
predict (Predict b) _ = b
predict (Subtrees a subtrees) input =
    predict subtree input
  where subtree = subtrees Map.! (input Map.! a)
We can do the same: we'll say a decision tree is either
    True
    False
    (attribute, subtree_dict)
("beer",
 { "free" : True,
   "none" : False,
   "paid" : ("speaker",
             {...})})
Example: Decision Trees
def predict(tree, input_dict):
    # leaf node predicts itself
    if tree in (True, False):
        return tree
    else:
        # destructure tree
        attribute, subtree_dict = tree
        # find appropriate subtree
        value = input_dict[attribute]
        subtree = subtree_dict[value]
        # classify using subtree
        return predict(subtree, input_dict)
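To see the tuple-based tree in action, here is a small usage sketch applying `predict` to the two Meetup dictionaries from the earlier slide. The "beer" branches follow the slide. The "speaker" subtree is elided on the slide as `{...}`, so the branches used here (@jakevdp maps to True, @joelgrus to False) are an assumption read off the garbled tree diagram.

```python
def predict(tree, input_dict):
    # leaf node predicts itself
    if tree in (True, False):
        return tree
    # otherwise destructure the tree and recurse into the matching subtree
    attribute, subtree_dict = tree
    subtree = subtree_dict[input_dict[attribute]]
    return predict(subtree, input_dict)

# speaker branches are assumed, not confirmed by the slides
tree = ("beer",
        {"free": True,
         "none": False,
         "paid": ("speaker",
                  {"@jakevdp": True,
                   "@joelgrus": False})})

daml = {"group": "DAML", "date": "2015-06-23", "beer": "free",
        "food": "dim sum", "speaker": "@joelgrus",
        "location": "Google", "topic": "shameless self-promotion"}
atheists = {"group": "Seattle Atheists", "date": "2015-06-23",
            "location": "Round the Table", "beer": "none",
            "food": "none", "topic": "Godless Game Night"}

daml_worth_it = predict(tree, daml)          # "free" beer is a leaf
atheists_worth_it = predict(tree, atheists)  # "none" is a leaf
```

Because plain dictionary indexing is used, an input whose attribute value has no branch in the tree (or that lacks the attribute entirely) raises a `KeyError`. The slide version behaves the same way.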
Not Just For Data Science
In Conclusion
Teaching data science is fun, if you're smart about it
Learning data science is fun, if you're smart about it
Writing a book is not that much fun
Having written a book is pretty fun
Making slides is actually kind of fun
Functional programming is a lot of fun
Thanks!
@joelgrus
joelgrus@gmail.com
joelgrus.com

More Related Content

What's hot

Data Science With Python | Python For Data Science | Python Data Science Cour...
Data Science With Python | Python For Data Science | Python Data Science Cour...Data Science With Python | Python For Data Science | Python Data Science Cour...
Data Science With Python | Python For Data Science | Python Data Science Cour...Simplilearn
 
Introduction to stack
Introduction to stackIntroduction to stack
Introduction to stackvaibhav2910
 
Python Machine Learning Tutorial | Machine Learning Algorithms | Python Train...
Python Machine Learning Tutorial | Machine Learning Algorithms | Python Train...Python Machine Learning Tutorial | Machine Learning Algorithms | Python Train...
Python Machine Learning Tutorial | Machine Learning Algorithms | Python Train...Edureka!
 
Introduction to Python for Data Science
Introduction to Python for Data ScienceIntroduction to Python for Data Science
Introduction to Python for Data ScienceArc & Codementor
 
Web scraping in python
Web scraping in python Web scraping in python
Web scraping in python Viren Rajput
 
Python for Big Data Analytics
Python for Big Data AnalyticsPython for Big Data Analytics
Python for Big Data AnalyticsEdureka!
 
Loops and functions in r
Loops and functions in rLoops and functions in r
Loops and functions in rmanikanta361
 
Object Oriented Programing JAVA presentaion
Object Oriented Programing JAVA presentaionObject Oriented Programing JAVA presentaion
Object Oriented Programing JAVA presentaionPritom Chaki
 
358 33 powerpoint-slides_4-introduction-data-structures_chapter-4
358 33 powerpoint-slides_4-introduction-data-structures_chapter-4358 33 powerpoint-slides_4-introduction-data-structures_chapter-4
358 33 powerpoint-slides_4-introduction-data-structures_chapter-4sumitbardhan
 
Python Sequence | Python Lists | Python Sets & Dictionary | Python Strings | ...
Python Sequence | Python Lists | Python Sets & Dictionary | Python Strings | ...Python Sequence | Python Lists | Python Sets & Dictionary | Python Strings | ...
Python Sequence | Python Lists | Python Sets & Dictionary | Python Strings | ...Edureka!
 
String Builder & String Buffer (Java Programming)
String Builder & String Buffer (Java Programming)String Builder & String Buffer (Java Programming)
String Builder & String Buffer (Java Programming)Anwar Hasan Shuvo
 
Data Science Full Course | Edureka
Data Science Full Course | EdurekaData Science Full Course | Edureka
Data Science Full Course | EdurekaEdureka!
 
introduction to data science
introduction to data scienceintroduction to data science
introduction to data sciencebhavesh lande
 

What's hot (20)

Data Science With Python | Python For Data Science | Python Data Science Cour...
Data Science With Python | Python For Data Science | Python Data Science Cour...Data Science With Python | Python For Data Science | Python Data Science Cour...
Data Science With Python | Python For Data Science | Python Data Science Cour...
 
Introduction to stack
Introduction to stackIntroduction to stack
Introduction to stack
 
Python Machine Learning Tutorial | Machine Learning Algorithms | Python Train...
Python Machine Learning Tutorial | Machine Learning Algorithms | Python Train...Python Machine Learning Tutorial | Machine Learning Algorithms | Python Train...
Python Machine Learning Tutorial | Machine Learning Algorithms | Python Train...
 
Beautiful soup
Beautiful soupBeautiful soup
Beautiful soup
 
07 java collection
07 java collection07 java collection
07 java collection
 
Data literacy
Data literacyData literacy
Data literacy
 
MatplotLib.pptx
MatplotLib.pptxMatplotLib.pptx
MatplotLib.pptx
 
Introduction to Python for Data Science
Introduction to Python for Data ScienceIntroduction to Python for Data Science
Introduction to Python for Data Science
 
Web scraping in python
Web scraping in python Web scraping in python
Web scraping in python
 
Python for Big Data Analytics
Python for Big Data AnalyticsPython for Big Data Analytics
Python for Big Data Analytics
 
Loops and functions in r
Loops and functions in rLoops and functions in r
Loops and functions in r
 
Java ppt
Java pptJava ppt
Java ppt
 
Object Oriented Programing JAVA presentaion
Object Oriented Programing JAVA presentaionObject Oriented Programing JAVA presentaion
Object Oriented Programing JAVA presentaion
 
358 33 powerpoint-slides_4-introduction-data-structures_chapter-4
358 33 powerpoint-slides_4-introduction-data-structures_chapter-4358 33 powerpoint-slides_4-introduction-data-structures_chapter-4
358 33 powerpoint-slides_4-introduction-data-structures_chapter-4
 
Lists
ListsLists
Lists
 
Python Sequence | Python Lists | Python Sets & Dictionary | Python Strings | ...
Python Sequence | Python Lists | Python Sets & Dictionary | Python Strings | ...Python Sequence | Python Lists | Python Sets & Dictionary | Python Strings | ...
Python Sequence | Python Lists | Python Sets & Dictionary | Python Strings | ...
 
String Builder & String Buffer (Java Programming)
String Builder & String Buffer (Java Programming)String Builder & String Buffer (Java Programming)
String Builder & String Buffer (Java Programming)
 
Java Streams
Java StreamsJava Streams
Java Streams
 
Data Science Full Course | Edureka
Data Science Full Course | EdurekaData Science Full Course | Edureka
Data Science Full Course | Edureka
 
introduction to data science
introduction to data scienceintroduction to data science
introduction to data science
 

Viewers also liked

F# for startups v2
F# for startups v2F# for startups v2
F# for startups v2joelgrus
 
T shirts, feminism, parenting, and data science
T shirts, feminism, parenting, and data scienceT shirts, feminism, parenting, and data science
T shirts, feminism, parenting, and data sciencejoelgrus
 
Alex Korbonits, "AUC at what costs?" Seattle DAML June 2016
Alex Korbonits, "AUC at what costs?" Seattle DAML June 2016Alex Korbonits, "AUC at what costs?" Seattle DAML June 2016
Alex Korbonits, "AUC at what costs?" Seattle DAML June 2016Seattle DAML meetup
 
F# for startups
F# for startupsF# for startups
F# for startupsjoelgrus
 
Karin Strauss - DNA Storage, July 2016
Karin Strauss - DNA Storage, July 2016Karin Strauss - DNA Storage, July 2016
Karin Strauss - DNA Storage, July 2016Seattle DAML meetup
 
Numbers game
Numbers gameNumbers game
Numbers gamejoelgrus
 
Secrets of Fire Truck Society - Slides for Ignite Strata 2013
Secrets of Fire Truck Society - Slides for Ignite Strata 2013Secrets of Fire Truck Society - Slides for Ignite Strata 2013
Secrets of Fire Truck Society - Slides for Ignite Strata 2013joelgrus
 

Viewers also liked (7)

F# for startups v2
F# for startups v2F# for startups v2
F# for startups v2
 
T shirts, feminism, parenting, and data science
T shirts, feminism, parenting, and data scienceT shirts, feminism, parenting, and data science
T shirts, feminism, parenting, and data science
 
Alex Korbonits, "AUC at what costs?" Seattle DAML June 2016
Alex Korbonits, "AUC at what costs?" Seattle DAML June 2016Alex Korbonits, "AUC at what costs?" Seattle DAML June 2016
Alex Korbonits, "AUC at what costs?" Seattle DAML June 2016
 
F# for startups
F# for startupsF# for startups
F# for startups
 
Karin Strauss - DNA Storage, July 2016
Karin Strauss - DNA Storage, July 2016Karin Strauss - DNA Storage, July 2016
Karin Strauss - DNA Storage, July 2016
 
Numbers game
Numbers gameNumbers game
Numbers game
 
Secrets of Fire Truck Society - Slides for Ignite Strata 2013
Secrets of Fire Truck Society - Slides for Ignite Strata 2013Secrets of Fire Truck Society - Slides for Ignite Strata 2013
Secrets of Fire Truck Society - Slides for Ignite Strata 2013
 

Similar to Joel Grus Seattle DAML Meetup Data Science Presentation

Data science presentation
Data science presentationData science presentation
Data science presentationMSDEVMTL
 
Data Science, what even?!
Data Science, what even?!Data Science, what even?!
Data Science, what even?!David Coallier
 
Joel Grus Seattle DAML Meetup Data Science Presentation

  • 1. Joel Grus Seattle DAML Meetup June 23, 2015 Data Science from Scratch
  • 2. About me: Old-school DAML-er. Wrote a book. SWE at Google. Formerly data science at VoloMetrix, Decide, Farecast.
  • 4. The Road to Data Science My
  • 10. Data Science Is A Broad Field [diagram labels: "Some Stuff"; "More Stuff"; "Even More Stuff"; "Data Science"; "People who think they're data scientists, but they're not really data scientists"; "People who are a danger to everyone around them"; "People who say 'machine learnings'"]
  • 32. a data scientist should be able to run a regression, write a sql query, scrape a web site, design an experiment, factor matrices, use a data frame, pretend to understand deep learning, steal from the d3 gallery, argue r versus python, think in mapreduce, update a prior, build a dashboard, clean up messy data, test a hypothesis, talk to a businessperson, script a shell, code on a whiteboard, hack a p-value, machine-learn a model. specialization is for engineers. JOEL GRUS
  • 33. A lot of stuff!
  • 35. What Are Hiring Managers Looking For? Let's check LinkedIn
  • 37. a data scientist should be able to run a regression, write a sql query, scrape a web site, design an experiment, factor matrices, use a data frame, pretend to understand deep learning, steal from the d3 gallery, argue r versus python, think in mapreduce, update a prior, build a dashboard, clean up messy data, test a hypothesis, talk to a businessperson, script a shell, code on a whiteboard, hack a p-value, machine-learn a model. specialization is for engineers. JOEL GRUS grad students!
  • 39. I want to be a data scientist. Great!
  • 40. The Math Way I like to start with matrix decompositions. How's your measure theory?
  • 42. The Math Way. The Good: solid foundation; math is the noblest known pursuit. The Bad: some weirdos don't think math is fun; can be pretty forbidding; can miss practical skills.
  • 43. So, did you count the words in that document? No, but I have an elegant proof that the number of words is finite!
  • 44. OK, Let's Try Again
  • 45. I want to be a data scientist. Great!
  • 46. The Tools Way Here's a list of the 25 libraries you really ought to know. How's your R programming?
  • 48. The Tools Way. The Good: don't have to understand the math; practical; can get started doing fun stuff right away. The Bad: don't have to understand the math; can get started doing bad science right away.
  • 49. So, did you build that model? Yes, and it fits the training data almost perfectly!
  • 50. OK, Maybe Not That Either
  • 52. Example: k-means clustering. Unsupervised machine learning technique. Given a set of points, group them into k clusters in a way that minimizes the within-cluster sum-of-squares, i.e. in a way such that the clusters are as "small" as possible (for a particular conception of "small").
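To make the objective on this slide concrete, here is a minimal sketch of the within-cluster sum-of-squares for a given assignment of points to clusters. The function name `within_cluster_ss` and the list-of-tuples representation are assumptions made for illustration; they do not come from the slides.

```python
def within_cluster_ss(points, assignments, k):
    """Sum of squared distances from each point to the mean of its cluster."""
    total = 0.0
    for j in range(k):
        cluster = [p for p, a in zip(points, assignments) if a == j]
        if not cluster:
            continue  # an empty cluster contributes nothing
        # componentwise mean of the points in cluster j
        center = [sum(xs) / len(cluster) for xs in zip(*cluster)]
        total += sum((x - c) ** 2 for p in cluster for x, c in zip(p, center))
    return total

points = [(0, 0), (0, 2), (10, 0), (10, 2)]
good = within_cluster_ss(points, [0, 0, 1, 1], k=2)  # left pair vs. right pair: 4.0
bad = within_cluster_ss(points, [0, 1, 0, 1], k=2)   # each cluster spans the gap: 100.0
```

The "small" clustering is the one with the lower total, which is the quantity k-means tries to minimize.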
  • 56. The Tools Way
        # a 2-dimensional example
        x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
                   matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
        colnames(x) <- c("x", "y")
        (cl <- kmeans(x, 2))
        plot(x, col = cl$cluster)
        points(cl$centers, col = 1:2, pch = 8, cex = 2)
  • 57. The Tools Way
        >>> from sklearn import cluster, datasets
        >>> iris = datasets.load_iris()
        >>> X_iris = iris.data
        >>> y_iris = iris.target
        >>> k_means = cluster.KMeans(n_clusters=3)
        >>> k_means.fit(X_iris)
        KMeans(copy_x=True, init='k-means++', ...
        >>> print(k_means.labels_[::10])
        [1 1 1 1 1 0 0 0 0 0 2 2 2 2 2]
        >>> print(y_iris[::10])
        [0 0 0 0 0 1 1 1 1 1 2 2 2 2 2]
  • 58. So What To Do?
  • 60. Data Science from Scratch. This is to certify that Joel Grus has honorably completed the course of study outlined in the book Data Science from Scratch: First Principles with Python, and is entitled to all the Rights, Privileges, and Honors thereunto appertaining. Joel Grus, June 23, 2015. Certificate Programs?
  • 62. Learning By Building. You don't really understand something until you build it. For example, I understand garbage disposals much better now that I had to replace one that was leaking water all over my kitchen. More relevantly, I thought I understood hypothesis testing, until I tried to write a book chapter + code about it.
  • 64. Break Things Down Into Small Functions
  • 65. So you don't end up with something like this
  • 67. Example: k-means clustering. Given a set of points, group them into k clusters in a way that minimizes the within-cluster sum-of-squares. Global optimization is hard, so use a greedy iterative approach.
  • 68. Fun Motivation: Image Posterization. Image consists of pixels. Each pixel is a triplet (R,G,B). Imagine pixels as points in space. Find k clusters of pixels. Recolor each pixel to its cluster mean. I think it's fun, anyway. [image caption: 8 colors]
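The recoloring step on this slide can be sketched as follows, assuming the k cluster means have already been found by some clustering routine. The names `squared_distance`, `closest`, and `posterize` are hypothetical, and pixels are plain (R, G, B) tuples rather than a real image format.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(p, q))

def closest(pixel, means):
    """Return the cluster mean nearest to this pixel."""
    return min(means, key=lambda m: squared_distance(pixel, m))

def posterize(pixels, means):
    """Recolor every pixel to the mean of the cluster it is nearest to."""
    return [closest(pixel, means) for pixel in pixels]

# two made-up cluster means: a red and a blue
means = [(250, 10, 10), (10, 10, 250)]
pixels = [(255, 0, 0), (240, 30, 20), (0, 0, 255), (20, 5, 230)]
recolored = posterize(pixels, means)
# the reddish pixels become (250, 10, 10), the bluish ones (10, 10, 250)
```

With k means the output image uses at most k distinct colors, which is what gives the posterized look.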
  • 69. Example: k-means clustering
        given some points, find k clusters by
          choose k "means"
          repeat:
            assign each point to cluster of closest "mean"
            recompute mean of each cluster
        sounds simple! let's code!
  • 70. def k_means(points, k, num_iters=10): means = list(random.sample(points, k)) assignments = [None for _ in points] for _ in range(num_iters): # assign each point to closest mean for i, point_i in enumerate(points): d_min = float('inf') for j, mean_j in enumerate(means): d = sum((x - y)**2 for x, y in zip(point_i, mean_j)) if d < d_min: d_min = d assignments[i] = j # recompute means for j in range(k): cluster = [point for i, point in enumerate(points) if assignments[i] == j] means[j] = mean(cluster) return means
  • 71. def k_means(points, k, num_iters=10): means = list(random.sample(points, k)) assignments = [None for _ in points] for _ in range(num_iters): # assign each point to closest mean for i, point_i in enumerate(points): d_min = float('inf') for j, mean_j in enumerate(means): d = sum((x - y)**2 for x, y in zip(point_i, mean_j)) if d < d_min: d_min = d assignments[i] = j # recompute means for j in range(k): cluster = [point for i, point in enumerate(points) if assignments[i] == j] means[j] = mean(cluster) return means start with k randomly chosen points
  • 72. def k_means(points, k, num_iters=10): means = list(random.sample(points, k)) assignments = [None for _ in points] for _ in range(num_iters): # assign each point to closest mean for i, point_i in enumerate(points): d_min = float('inf') for j, mean_j in enumerate(means): d = sum((x - y)**2 for x, y in zip(point_i, mean_j)) if d < d_min: d_min = d assignments[i] = j # recompute means for j in range(k): cluster = [point for i, point in enumerate(points) if assignments[i] == j] means[j] = mean(cluster) return means start with k randomly chosen points start with no cluster assignments
  • 79.
      def k_means(points, k, num_iters=10):
          means = list(random.sample(points, k))        # start with k randomly chosen points
          assignments = [None for _ in points]          # start with no cluster assignments
          for _ in range(num_iters):                    # for each iteration
              # assign each point to closest mean
              for i, point_i in enumerate(points):      # for each point
                  d_min = float('inf')
                  for j, mean_j in enumerate(means):    # for each mean
                      # compute the distance
                      d = sum((x - y)**2 for x, y in zip(point_i, mean_j))
                      # assign the point to the cluster of the mean with the smallest distance
                      if d < d_min:
                          d_min = d
                          assignments[i] = j
              # recompute means
              for j in range(k):
                  # find the points in each cluster
                  cluster = [point for i, point in enumerate(points) if assignments[i] == j]
                  # and compute the new means
                  means[j] = mean(cluster)
          return means
  • 81.
      def k_means(points, k, num_iters=10):
          means = list(random.sample(points, k))
          assignments = [None for _ in points]
          for _ in range(num_iters):
              # assign each point to closest mean
              for i, point_i in enumerate(points):
                  d_min = float('inf')
                  for j, mean_j in enumerate(means):
                      d = sum((x - y)**2 for x, y in zip(point_i, mean_j))
                      if d < d_min:
                          d_min = d
                          assignments[i] = j
              # recompute means
              for j in range(k):
                  cluster = [point for i, point in enumerate(points) if assignments[i] == j]
                  means[j] = mean(cluster)
          return means
      Not impenetrable, but a lot less helpful than it could be
      Can we make it simpler?
  • 82. Break Things Down Into Small Functions
  • 83.
      def k_means(points, k, num_iters=10):
          # start with k of the points as "means"
          means = random.sample(points, k)
          # and iterate finding new means
          for _ in range(num_iters):
              means = new_means(points, means)
          return means
  • 84.
      def new_means(points, means):
          # assign points to clusters
          # each cluster is just a list of points
          clusters = assign_clusters(points, means)
          # return the cluster means
          return [mean(cluster) for cluster in clusters]
  • 85.
      def assign_clusters(points, means):
          # one cluster for each mean
          # each cluster starts empty
          clusters = [[] for _ in means]
          # assign each point to cluster
          # corresponding to closest mean
          for point in points:
              index = closest_index(point, means)
              clusters[index].append(point)
          return clusters
  • 86.
      def closest_index(point, means):
          # return index of closest mean
          return argmin(distance(point, mean) for mean in means)

      def argmin(xs):
          # return index of smallest element
          return min(enumerate(xs), key=lambda pair: pair[1])[0]
  • 87. To Recap k_means(points, k, num_iters=10) mean(points) k_means(points, k, num_iters=10) new_means(points, means) assign_clusters(points, means) closest_index(point, means) argmin(xs) distance(point1, point2) mean(points) add(point1, point2) scalar_multiply(c, point)
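To make the recap concrete, here is a minimal sketch that assembles the small functions from slides 83 to 86 into one runnable file. The bodies of `distance`, `mean`, `add` and `scalar_multiply` do not appear on these slides, so the versions below are assumptions: squared Euclidean distance (matching the inline sum in the monolithic version) and componentwise arithmetic on plain lists. Keeping the old mean when a cluster ends up empty is also an addition for safety, not something the slides specify.

```python
import random

def add(point1, point2):
    # assumed helper: componentwise sum of two points
    return [x + y for x, y in zip(point1, point2)]

def scalar_multiply(c, point):
    # assumed helper: multiply every coordinate by c
    return [c * x for x in point]

def mean(points):
    # assumed helper: componentwise average of a non-empty list of points
    total = points[0]
    for point in points[1:]:
        total = add(total, point)
    return scalar_multiply(1.0 / len(points), total)

def distance(point1, point2):
    # assumption: squared Euclidean distance, as in the inline version
    return sum((x - y) ** 2 for x, y in zip(point1, point2))

def argmin(xs):
    # return index of smallest element (first one wins on ties)
    return min(enumerate(xs), key=lambda pair: pair[1])[0]

def closest_index(point, means):
    # return index of closest mean
    return argmin(distance(point, m) for m in means)

def assign_clusters(points, means):
    # one initially empty cluster per mean
    clusters = [[] for _ in means]
    for point in points:
        clusters[closest_index(point, means)].append(point)
    return clusters

def new_means(points, means):
    clusters = assign_clusters(points, means)
    # assumption: an empty cluster keeps its previous mean
    return [mean(cluster) if cluster else old_mean
            for cluster, old_mean in zip(clusters, means)]

def k_means(points, k, num_iters=10):
    # start with k of the points as "means", then iterate
    means = random.sample(points, k)
    for _ in range(num_iters):
        means = new_means(points, means)
    return means
```

Calling `k_means(points, 2)` on a list of 2-D points such as `[[0, 0], [0, 1], [10, 10], [10, 11]]` returns a list of two means, each itself a list of coordinates.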
  • 88. As a Pedagogical Tool
      Can be used "top down" (as we did here)
        Implement high-level logic
        Then implement the details
        Nice for exposition
      Can also be used "bottom up"
        Implement small pieces
        Build up to high-level logic
        Good for workshops
  • 89. Example: Decision Trees
      Want to predict whether a given Meetup is worth attending (True) or not (False)
      Inputs are dictionaries describing each Meetup
      { "group" : "DAML",
        "date" : "2015-06-23",
        "beer" : "free",
        "food" : "dim sum",
        "speaker" : "@joelgrus",
        "location" : "Google",
        "topic" : "shameless self-promotion" }
      { "group" : "Seattle Atheists",
        "date" : "2015-06-23",
        "location" : "Round the Table",
        "beer" : "none",
        "food" : "none",
        "topic" : "Godless Game Night" }
  • 90. Example: Decision Trees
      (the same two example inputs as on slide 89)
      [Diagram: a decision tree whose root asks "beer?" with branches free, none and paid; the paid branch leads to a second question, "speaker?", with branches @jakevdp and @joelgrus; every leaf is True or False]
  • 91. Example: Decision Trees
      class LeafNode:
          def __init__(self, prediction):
              self.prediction = prediction

          def predict(self, input_dict):
              return self.prediction

      class DecisionNode:
          def __init__(self, attribute, subtree_dict):
              self.attribute = attribute
              self.subtree_dict = subtree_dict

          def predict(self, input_dict):
              value = input_dict.get(self.attribute)
              subtree = self.subtree_dict[value]
              return subtree.predict(input_dict)
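As a usage sketch for the two classes, the tree from the diagram on slide 90 could be built and queried as below. The name `meetup_tree` is hypothetical, and the predictions under the "speaker" node are an assumption (the transcript does not preserve which speaker maps to which leaf).

```python
class LeafNode:
    def __init__(self, prediction):
        self.prediction = prediction

    def predict(self, input_dict):
        # a leaf ignores the input and always returns its prediction
        return self.prediction

class DecisionNode:
    def __init__(self, attribute, subtree_dict):
        self.attribute = attribute
        self.subtree_dict = subtree_dict

    def predict(self, input_dict):
        # look up the input's value for this node's attribute,
        # then delegate to the matching subtree
        value = input_dict.get(self.attribute)
        subtree = self.subtree_dict[value]
        return subtree.predict(input_dict)

# hypothetical tree; the "speaker" leaves are assumed, not taken from the slides
meetup_tree = DecisionNode("beer", {
    "free": LeafNode(True),
    "none": LeafNode(False),
    "paid": DecisionNode("speaker", {
        "@jakevdp": LeafNode(True),
        "@joelgrus": LeafNode(False),
    }),
})

daml = {"group": "DAML", "beer": "free", "speaker": "@joelgrus"}
game_night = {"group": "Seattle Atheists", "beer": "none"}
daml_prediction = meetup_tree.predict(daml)
game_night_prediction = meetup_tree.predict(game_night)
```

Note that an input whose attribute value has no matching subtree raises a `KeyError` here; the slides do not say how such inputs should be handled.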
  • 92. Example: Decision Trees
      Again inspiration from functional programming:
      type Input = Map.Map String String
      data Tree = Predict Bool                           -- always predict a specific value
                | Subtrees String (Map.Map String Tree)  -- String: look at the "beer" entry
                                                         -- Map: a map from each possible "beer" value to a subtree
  • 93. Example: Decision Trees
      type Input = Map.Map String String
      data Tree = Predict Bool
                | Subtrees String (Map.Map String Tree)

      predict :: Tree -> Input -> Bool
      predict (Predict b) _ = b
      predict (Subtrees a subtrees) input = predict subtree input
        where subtree = subtrees Map.! (input Map.! a)
  • 94. Example: Decision Trees
      type Input = Map.Map String String
      data Tree = Predict Bool
                | Subtrees String (Map.Map String Tree)
      We can do the same in Python; we'll say a decision tree is either
        True
        False
        (attribute, subtree_dict)
      ("beer", { "free" : True, "none" : False, "paid" : ("speaker", {...})})
  • 95. Example: Decision Trees
      predict :: Tree -> Input -> Bool
      predict (Predict b) _ = b
      predict (Subtrees a subtrees) input = predict subtree input
        where subtree = subtrees Map.! (input Map.! a)

      def predict(tree, input_dict):
          # leaf node predicts itself
          if tree in (True, False):
              return tree
          else:
              # destructure tree
              attribute, subtree_dict = tree
              # find appropriate subtree
              value = input_dict[attribute]
              subtree = subtree_dict[value]
              # classify using subtree
              return predict(subtree, input_dict)
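A short sketch of the tuple-based version in use, with the tree written in the shape shown on slide 94. The contents of the "speaker" subtree are elided on that slide, so the entries below are purely hypothetical placeholders, as is the name `meetup_tree`.

```python
def predict(tree, input_dict):
    # leaf node predicts itself
    if tree in (True, False):
        return tree
    # otherwise the tree is an (attribute, subtree_dict) pair
    attribute, subtree_dict = tree
    # find the subtree matching this input's value for the attribute
    value = input_dict[attribute]
    subtree = subtree_dict[value]
    # classify using that subtree
    return predict(subtree, input_dict)

# hypothetical tree; the "speaker" entries are assumed for illustration
meetup_tree = ("beer", {
    "free": True,
    "none": False,
    "paid": ("speaker", {"@jakevdp": True, "@joelgrus": False}),
})

free_beer = predict(meetup_tree, {"beer": "free", "speaker": "@joelgrus"})
paid_beer = predict(meetup_tree, {"beer": "paid", "speaker": "@jakevdp"})
```

Compared with the class-based version, the whole tree is ordinary data (booleans, tuples and dicts), and a single recursive function does the work of both `predict` methods.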
  • 96. Not Just For Data Science
  • 97. In Conclusion
      Teaching data science is fun, if you're smart about it
      Learning data science is fun, if you're smart about it
      Writing a book is not that much fun
      Having written a book is pretty fun
      Making slides is actually kind of fun
      Functional programming is a lot of fun

Editor's Notes

  1. hedge fund jerks
  2. sql jockeys
  3. I can do some of these
  4. I can do some of these
  5. I can do some of these
  6. I can do some of these
  7. I can do some of these
  8. I can do some of these
  9. I can do some of these
  10. I can do some of these
  11. I can do some of these
  12. I can do some of these
  13. I can do some of these
  14. I can do some of these
  15. I can do some of these
  16. I can do some of these
  17. I can do some of these
  18. I can do some of these
  19. I can do some of these
  20. I can do some of these
  21. I can do some of these
  22. I can do some of these
  23. I can do some of these
  24. typed "data science" into LinkedIn Jobs
  25. I can do some of these
  26. for those of us without PhDs
  27. https://www.flickr.com/photos/arlophoto/5616233274
  28. https://www.flickr.com/photos/arlophoto/5616233274
  29. Norvig