Everyone can do data science — import.io webinar

Everyone can do data science with the help of tools such as:
- import.io for visually scraping data from the web
- Pandas to wrangle data in Python
- BigML to apply machine learning to data.
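
To give a flavour of the wrangling step, here is a minimal sketch that cleans a few listings copied from the table in the slides. It turns price strings such as "565,000" into integers, treats empty cells as missing values, and separates listings that have a known price from listings whose price the model should predict. It uses Python's standard csv module instead of Pandas so that it runs anywhere. The CSV layout and column names are assumptions, not import.io's actual export format.

```python
import csv
import io

# Hypothetical CSV export of scraped listings: the rows are copied from the
# slide's table, but the column names and layout are assumptions.
RAW_CSV = """bedrooms,bathrooms,surface,year_built,type,price
3,1,860,1950,house,"565,000"
3,1,1012,1951,house,
2,1.5,968,1976,townhouse,"447,000"
3,2,1599,1964,house,
"""

def to_number(cell):
    """Turn '565,000' or '1.5' into a number; an empty cell becomes None."""
    cell = cell.strip().replace(",", "")
    if not cell:
        return None
    value = float(cell)
    return int(value) if value.is_integer() else value

def wrangle(text):
    """Parse the listings and split them into priced rows and rows to predict."""
    priced, unpriced = [], []
    for row in csv.DictReader(io.StringIO(text)):
        # Keep the categorical "type" column as text and convert everything else.
        clean = {key: value if key == "type" else to_number(value)
                 for key, value in row.items()}
        if clean["price"] is None:
            unpriced.append(clean)
        else:
            priced.append(clean)
    return priced, unpriced

priced, unpriced = wrangle(RAW_CSV)
print(len(priced), "priced listings,", len(unpriced), "to predict")
```

The priced rows become training data. The unpriced rows are the listings the pricing model is asked to value.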

In this presentation, I introduce what machine learning is before moving on to a case study where I show how to build a real estate pricing model. Check out import.io's webinar for the whole thing: http://blog.import.io/post/become-a-data-scientist-in-an-hour
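
The pricing model follows the two methods of a prediction API introduced in the slides: create_model(dataset) for the TRAIN phase and create_prediction(model, new_input) for the PREDICT phase. The sketch below imitates that interface locally so that it runs without an account. As an illustration, it swaps in a k-nearest-neighbours regressor with min-max feature scaling. BigML's create_model builds a decision tree instead. The feature names, k=2, and the choice of training rows (the complete, priced rows of the slide's table) are all assumptions.

```python
import math

# Numeric inputs used for the distance (an assumption; "type" is ignored here).
FEATURES = ("bedrooms", "bathrooms", "surface", "year_built")

def create_model(dataset, k=2):
    """TRAIN phase: store rescaled examples plus the ranges used to rescale them."""
    columns = list(zip(*[[row[f] for f in FEATURES] for row in dataset]))
    lows = [min(col) for col in columns]
    # Guard against a zero range when a feature is constant in the training data.
    spans = [(max(col) - low) or 1 for col, low in zip(columns, lows)]
    examples = [
        ([(row[f] - low) / span for f, low, span in zip(FEATURES, lows, spans)], row["price"])
        for row in dataset
    ]
    return {"k": k, "lows": lows, "spans": spans, "examples": examples}

def create_prediction(model, new_input):
    """PREDICT phase: average the prices of the k training examples closest to new_input."""
    query = [(new_input[f] - low) / span
             for f, low, span in zip(FEATURES, model["lows"], model["spans"])]
    nearest = sorted(model["examples"], key=lambda ex: math.dist(ex[0], query))[: model["k"]]
    return sum(price for _, price in nearest) / len(nearest)

# Complete, priced rows from the slide's table, used as training data.
TRAINING = [
    {"bedrooms": 3, "bathrooms": 1, "surface": 860, "year_built": 1950, "price": 565000},
    {"bedrooms": 2, "bathrooms": 1.5, "surface": 968, "year_built": 1976, "price": 447000},
    {"bedrooms": 3, "bathrooms": 2, "surface": 987, "year_built": 1951, "price": 790000},
    {"bedrooms": 1, "bathrooms": 1, "surface": 530, "year_built": 2007, "price": 122000},
    {"bedrooms": 4, "bathrooms": 2, "surface": 1574, "year_built": 1964, "price": 835000},
]
# An unpriced listing from the same table.
NEW_HOUSE = {"bedrooms": 3, "bathrooms": 2, "surface": 1599, "year_built": 1964}

model = create_model(TRAINING)
print(create_prediction(model, NEW_HOUSE))  # prints 812500.0
```

With k=2, the two listings nearest to the 1,599 ft² house are the 1,574 ft² house (835,000) and the 987 ft² townhouse (790,000), so the sketch predicts their average. Rescaling each feature to its training range keeps surface area and year built from swamping the bedroom and bathroom counts in the distance.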

  1. Everyone can do data science: import.io webinar, 23/9/14, Louis Dorard (@louisdorard)
  2. US real estate portals: Realtor, Zillow, Trulia, …
  3. Bedrooms | Bathrooms | Surface (foot²) | Year built | Type | Price ($)
     3 | 1 | 860 | 1950 | house | 565,000
     3 | 1 | 1012 | 1951 | house |
     2 | 1.5 | 968 | 1976 | townhouse | 447,000
     4 | | 1315 | 1950 | house | 648,000
     3 | 2 | 1599 | 1964 | house |
     3 | 2 | 987 | 1951 | townhouse | 790,000
     1 | 1 | 530 | 2007 | condo | 122,000
     4 | 2 | 1574 | 1964 | house | 835,000
     4 | | 2001 | | house | 855,000
     3 | 2.5 | 1472 | 2005 | house |
     4 | 3.5 | 1714 | 2005 | townhouse |
     2 | 2 | 1113 | 1999 | condo |
     1 | | 769 | 1999 | condo | 315,000
  4. Same table as slide 3
  5. Let’s create a real estate pricing model
  6. Fabien Durand (@thefabiendurand) www.louisdorard.com/guest/everyone-can-do-data-science-importio
  7. Data science: domain knowledge, hacking abilities, machine learning
  8. What the @#?~% is ML?
  9. “Which type of email is this? Spam/Ham” -> Classification
  10. “How much is this house worth? X $” -> Regression
  11. Same table as slide 3
  12. ML is a set of AI techniques where “intelligence” is built by referring to examples
  13. ??
  14. “A significant constraint on realizing value from big data will be a shortage of talent, particularly of people with deep expertise in statistics and machine learning.” (McKinsey & Co.)
  15. Making ML effortless (Bret Victor)
  16. HTML / CSS / JavaScript
  17. HTML / CSS / JavaScript
  18. squarespace.com
  19. The two phases of machine learning: • TRAIN a model • PREDICT with a model
  20. The two methods of prediction APIs: • TRAIN a model • PREDICT with a model
  21. The two methods of prediction APIs: • model = create_model(dataset) • predicted_output = create_prediction(model, new_input)
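  22. Creating a model and a prediction with BigML's Python bindings (http://bit.ly/bigml_wakari):

          from bigml.api import BigML

          # BigML() reads credentials from the BIGML_USERNAME and BIGML_API_KEY environment variables
          api = BigML()

          # create a model
          source = api.create_source('training_data.csv')
          dataset = api.create_dataset(source)
          model = api.create_model(dataset)

          # make a prediction; this input is hypothetical and its keys must match the CSV's column names
          new_input = {"Bedrooms": 3, "Bathrooms": 2, "Surface (foot²)": 1599, "Year built": 1964, "Type": "house"}
          prediction = api.create_prediction(model, new_input)
          print("Predicted output value:", prediction['object']['output'])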
  23. Recap
  24. • Classification and regression • Two phases in ML: train and predict • Prediction APIs make it easy to build models • Let’s use them on real estate data to predict price from house characteristics
  25. • Encoding domain knowledge • Making our life easier: restricting the data to a single city
  26. BigML: • Look at the data • Split it into training and test sets • Build a model from the training set • Evaluate the model on the test set • Errors: mean absolute error (or percentage error?)
  27. Other import.io + BigML use cases: • Predict an ebook's rating from its description • Predict sales of Etsy stores
  28. Talk at #APIconUK tomorrow in London
  29. Types of prediction API along an abstraction axis: ML algorithm API, automated prediction API, text classification API, vertical prediction API, fixed-model prediction API
  30. www.louisdorard.com/machine-learning-book: 50% off for 24 hours with code “importio”. @louisdorard
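
The evaluation steps on slide 26 can be sketched in a few lines of standard-library Python. The sketch shuffles the priced listings, holds out a quarter of them as a test set, fits a model on the rest, and reports both the mean absolute error and the mean absolute percentage error (the "percentage" variant the slide asks about). To keep the example self-contained, the "model" is a baseline that always predicts the median training price; in the webinar the model comes from BigML. The 25% test fraction, the random seed, and the baseline are assumptions.

```python
import random
import statistics

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and hold out a fraction of them for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def mean_absolute_error(actual, predicted):
    """MAE = (1/n) * sum(|y - y_hat|), in the same unit as the prices."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mean_absolute_percentage_error(actual, predicted):
    """MAPE = (100/n) * sum(|y - y_hat| / |y|); undefined if a true price is 0."""
    return 100 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)

def evaluate_median_baseline(rows, seed=0):
    """Fit a median-price baseline on the training split and score it on the test split."""
    train, test = train_test_split(rows, seed=seed)
    median_price = statistics.median(row["price"] for row in train)
    actual = [row["price"] for row in test]
    predicted = [median_price] * len(test)
    return mean_absolute_error(actual, predicted), mean_absolute_percentage_error(actual, predicted)

# The eight prices listed in the slide's table.
PRICES = [565000, 447000, 648000, 790000, 122000, 835000, 855000, 315000]
rows = [{"price": p} for p in PRICES]
mae, mape = evaluate_median_baseline(rows)
print(f"MAE: ${mae:,.0f}  MAPE: {mape:.1f}%")
```

A real model should beat this baseline on the same split. If it does not, it has learned little beyond the typical price. With only eight listings, the scores also swing a lot from one seed to the next.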
