Data Science, what even?!

Presented an abridged version of my "What is data science" talk at #websummit 2013.

This talk covers the required skill set as defined by Drew Conway in his famous Venn diagram, and outlines the Data Scientific Method introduced by Dr. Patil. The talk has two main parts; the second covers some of the packages and technologies we use, minus the storage part.

Usage Rights

© All Rights Reserved

    Data Science, what even?! Presentation Transcript

    • Data Science?! what even...
    • David Coallier @davidcoallier
    • Data Scientist Engine Yard
    • And I cook.. A lot.
    • (n-1) items
    • Adapting.
    • Feedback.
    • Indifference.
    • Young mathematically inclined minds
    • Young mathematically inclined minds We knew everything.
    • First Bad Assumption.
    • So we asked “experts”.
    • Wrong Ingredients
    • Bad Data
    • Tasted like sh*t
    • From Our Results We had questions.
    • Found Expertise Not Online.
    • Data Scientific Method
    • Find a Question Your Hypothesis
    • Current Data What do you have?
    • Features & Tests Try it.
    • Analyse Results Won’t be pretty.
    • Conversation Framed. By. Data.
    • But....
    • Good Discussions Imply good data scientists
    • Hacking Skills
    • Hacking Skills Maths & Stats
    • Hacking Skills Expertise Maths & Stats
    • Hacking Skills Machine Learning Danger Zone!!! Expertise Research Maths & Stats
    • Hacking Skills Data Science Expertise Maths & Stats
    • Hacking Skills Danger Zone!!! Machine Learning Data Science Maths & Stats Expertise Research
    • Business Don’t need an MBA
    • In other words.
    • 1. Hacking 2. Maths & Stats 3. Expertise
    • Apply Method Data Scientific
    • 1. Question 2. Current Data 3. Features/Tests 4. Analyse 5. Converse
    • Find a Question Let’s imagine Github
    • Upgrade Repos Affect users as little as possible
    • import csv; content = list(csv.reader(open('repo1.csv', newline='')))
    • $f(k; \lambda) = \frac{\lambda^k e^{-\lambda}}{k!}$ for $k \ge 0$
    • Converse Present Findings
    • Iterate Commits aren’t key.
    • KPIs are key Indicators from experience
    • Questions Super Important.
    • Just test it..
    • We are Human. Emotional Connection
    • What next? Second Hypothesis.
    • Focus on Data Relevant to your KPIs.
    • Data gives you the what Humans give you the why
    • Turn Information
    • Into Actionable Insight
    • Create Discussions Introspection Engines
    • Seeing, Feeling it The brain sees.
    • Not regressions
    • Not p-values
    • Not slopes
    • Not F-statistics
    • Not coefficients
    • Question Data Not Visualisations.
    • Toolbox What do we use?
    • R Modeling, Testing, Prototyping
    • RStudio The IDE
    • lubridate and zoo Dealing with Dates...
    • yy/mm/dd mm/dd/yy YYYY-mm-dd HH:MM:ss TZ yy-mm-dd 1363784094.513425 yy/mm different timezone
    • reshape2 Reshape your Data
    • ggplot2 Visualise your Data
    • RCurl, RJSONIO Find more Data
    • Hmisc Miscellaneous useful functions
    • forecast Can you guess?
    • garch Generalized Autoregressive Conditional Heteroskedasticity
    • quantmod Statistical Financial Trading
    • getSymbols('AAPL'); barChart(AAPL); addMACD()
    • xts Extensible Time Series
    • igraph Study Networks
    • maptools Read & View Maps
    • library(maps); map('state', region = row.names(USArrests), col = cm.colors(16, 1)[floor(USArrests$Rape / max(USArrests$Rape) * 15) + 1], fill = TRUE)
    • Python Scientific Computing
    • SciPy http://www.scipy.org
    • scipy.stats
    • scipy.stats Descriptive Statistics
    • from scipy.stats import describe; s = [1, 2, 1, 3, 4, 5]; print(describe(s))
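As a rough cross-check of the descriptive statistics slide, most of the same numbers can be computed with Python's standard library. This is only a sketch, not the speaker's code: the `summary` dictionary is a hypothetical stand-in for the result of `describe`, it uses the sample variance (dividing by n - 1), and it leaves out skewness and kurtosis.

```python
import statistics

s = [1, 2, 1, 3, 4, 5]  # same sample as the slide

# Hypothetical summary mirroring part of what scipy.stats.describe reports.
summary = {
    "nobs": len(s),
    "min": min(s),
    "max": max(s),
    "mean": statistics.mean(s),
    "variance": statistics.variance(s),  # sample variance, n - 1 denominator
}

for name, value in summary.items():
    print(name, value)
```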
    • scipy.stats Probability Distributions
    • Example Poisson Distribution
    • $f(k; \lambda) = \frac{\lambda^k e^{-\lambda}}{k!}$ for $k \ge 0$
    • from scipy.stats import poisson; p = poisson.pmf([1, 2, 3, 4, 1, 2, 3], 2)
    • print(p.mean()); print(p.sum()) ...
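To show what the Poisson slides compute, the probability mass function can also be evaluated directly from its formula with the standard library. This is a minimal sketch rather than the speaker's code: the helper name `poisson_pmf` is hypothetical, and the inputs reuse the `k` values and the rate of 2 from the slide.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson variable with rate lam (hypothetical helper)."""
    if k < 0:
        return 0.0
    return lam ** k * math.exp(-lam) / math.factorial(k)

ks = [1, 2, 3, 4, 1, 2, 3]  # same k values as the slide
lam = 2                     # same rate as the slide
p = [poisson_pmf(k, lam) for k in ks]

print(sum(p) / len(p))  # mean of the evaluated probabilities
print(sum(p))           # sum of the evaluated probabilities
```

Because the `k` values repeat and do not cover every possible outcome, the sum printed here should not be expected to equal 1.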
    • NumPy http://www.numpy.org/
    • NumPy Linear Algebra
    • $\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$
    • import numpy as np; x = np.array([[1, 0], [0, 1]]); val, vec = np.linalg.eig(x); np.linalg.eigvals(x)  # eig returns (eigenvalues, eigenvectors)
    • >>> np.linalg.eig(x) ( array([ 1., 1.]), array([ [ 1., 0.], [ 0., 1.] ]) )
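For a 2x2 matrix, the eigenvalues reported by `np.linalg.eig` can be checked by hand from the characteristic polynomial λ² - tr(A)·λ + det(A) = 0. The sketch below does this with the standard library only. The function name `eigvals_2x2` is hypothetical, and `cmath` is used so that a negative discriminant yields complex eigenvalues instead of an error.

```python
import cmath

def eigvals_2x2(m):
    """Eigenvalues of a 2x2 matrix via its characteristic polynomial."""
    (a, b), (c, d) = m
    trace = a + d
    det = a * d - b * c
    # Roots of lambda^2 - trace * lambda + det = 0
    disc = cmath.sqrt(trace * trace - 4 * det)
    return ((trace + disc) / 2, (trace - disc) / 2)

identity = [[1, 0], [0, 1]]  # the matrix from the slide
print(eigvals_2x2(identity))
```

For the identity matrix both roots coincide at 1, which agrees with the `array([ 1., 1.])` shown on the slide.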
    • Matplotlib Python Plotting
    • statsmodels Advanced Statistics Modeling
    • NLTK Natural Language Tool Kit
    • scikit-learn Machine Learning
    • from sklearn import tree; X = [[0, 0], [1, 1]]; Y = [0, 1]; clf = tree.DecisionTreeClassifier(); clf = clf.fit(X, Y); clf.predict([[2., 2.]])  # array([1])
    • PyBrain ... Machine Learning
    • PyMC Bayesian Inference
    • Pattern Web Mining for Python
    • NetworkX Study Networks
    • MILK: Machine Learning
    • Pandas easy-to-use data structures
    • from pandas import DataFrame; x = DataFrame([{"age": 26}, {"age": 19}, {"age": 21}, {"age": 18}]); print(x[x['age'] > 20].count()); print(x[x['age'] > 20].mean())
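To make explicit what the boolean filter on the pandas slide selects, here is the same selection, count and mean written out in plain Python over the same four ages. It is not pandas code, only an equivalent sketch with the standard library; the names `rows` and `older` are chosen for illustration.

```python
from statistics import mean

rows = [{"age": 26}, {"age": 19}, {"age": 21}, {"age": 18}]

# Keep only the ages above 20, as x[x['age'] > 20] does on the slide.
older = [row["age"] for row in rows if row["age"] > 20]

print(len(older))   # 2
print(mean(older))  # 23.5
```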
    • Python vs R? Different Purposes
    • Dogfooding Data Scientific Method
    • Original Question What is Data Science?
    • Back to you For questioning