
Tf itpbapm


Published in: Education


  1. 1. Intro to Python: Build a Predictive Model
  2. 2. Introductions ➔ What's your name? ➔ What brought you here today? ➔ What is your programming experience?
  3. 3. We train developers and data scientists through 1-on-1 mentorship and project-based learning. Guaranteed. About Thinkful
  4. 4. Learn by Doing ➔ Why is Data Science a thing? ➔ What is Python? ➔ How do we use it with a real world project? ➔ How do I learn more?
  5. 5. What is a Data Scientist?
  6. 6. “[LinkedIn] was like arriving at a conference reception and realizing you don’t know anyone. So you just stand in the corner sipping your drink — and you probably leave early.” — LinkedIn Manager, June 2006 Example: LinkedIn 2006
  7. 7. ➔ Joined LinkedIn in 2006, only 8M users (450M in 2016) ➔ Started experiments to predict people’s networks ➔ Engineers were dismissive: “you can already import your address book” Enter: Data Scientist
  8. 8. ➔ Frame the question ➔ Collect the raw data ➔ Process the data ➔ Explore the data ➔ Communicate results The Process: LinkedIn Example
  9. 9. ➔ What questions do we want to answer? ◆ Who? ◆ What? ◆ When? ◆ Where? ◆ Why? ◆ How? Case: Frame the Question
  10. 10. ➔ What connections (type and number) lead to higher user engagement? ➔ Which connections do people want to make but are currently limited from making? ➔ How might we predict these types of connections with limited data from the user? Case: Frame the Question
  11. 11. ➔ What data do we need to answer these questions? Case: Collect the Data
  12. 12. ➔ Connection data (who is connected to whom?) ➔ Demographic data (what is the profile of each connection?) ➔ Engagement data (how do they use the site?) Case: Collect the Data
  13. 13. ➔ How is the data “dirty” and how can we clean it? Case: Process the Data
  14. 14. ➔ User input ➔ Redundancies ➔ Feature changes ➔ Data model changes Case: Process the Data
  15. 15. ➔ What are the meaningful patterns in the data? Case: Explore the Data
  16. 16. ➔ Triangle closing ➔ Time Overlaps ➔ Geographic Overlaps Case: Explore the Data
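Triangle closing, the first pattern on the slide above, can be illustrated with a small sketch: if two people are not yet connected but share mutual connections, they are candidates for a suggestion. The names and the toy graph below are hypothetical and are not LinkedIn data, and counting mutual connections is only one simple way to rank candidates; the deck does not say how LinkedIn actually did it.

```python
from collections import Counter

# Hypothetical undirected connection graph: person -> set of direct connections.
connections = {
    "ana": {"ben", "cara"},
    "ben": {"ana", "cara", "dev"},
    "cara": {"ana", "ben", "dev"},
    "dev": {"ben", "cara", "eli"},
    "eli": {"dev"},
}


def suggest_connections(graph, person):
    """Rank people `person` is not connected to by number of shared connections."""
    mutual_counts = Counter()
    for friend in graph[person]:
        for friend_of_friend in graph[friend]:
            # Skip the person themselves and anyone they already know.
            if friend_of_friend != person and friend_of_friend not in graph[person]:
                mutual_counts[friend_of_friend] += 1
    return mutual_counts.most_common()


print(suggest_connections(connections, "ana"))  # [('dev', 2)]
```

Here "ana" and "dev" share two connections ("ben" and "cara"), so connecting them would close two triangles.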
  17. 17. ➔ How do we communicate this? ➔ To whom? Case: Communicate Findings
  18. 18. ➔ Marketing - sell X more ad space, results in X more impressions per day ➔ Product - build X more features ➔ Development - grow our team by X ➔ Sales - attract X more premium accounts ➔ C-Level - more revenue, growth from 8M to 450M users in 10 years Case: Communicate Findings
  19. 19. The Result
  20. 20. Python for Programming ➔ Great for Data Science ➔ Robotics ➔ Web Development (Python/Django) ➔ Automation Let’s Learn Python
  21. 21. Let’s Learn Python
  22. 22. ➔ Our model is going to be a Decision Tree ➔ Decision Trees predict the most likely outcome based on input ➔ Like a computer building a version of 20 questions The Model
  23. 23. Decision Trees: Golf?
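The "20 questions" idea can be sketched by writing a tiny decision tree by hand, where each `if` is one question. The rules below are a hypothetical golf example chosen for illustration; they are not taken from the slide's figure, and a trained model would learn its own questions from data.

```python
def play_golf(outlook, humidity, windy):
    """Hand-written decision tree: each `if` asks one question about the input."""
    if outlook == "overcast":
        return "play"
    if outlook == "sunny":
        # On sunny days the (hypothetical) deciding question is humidity.
        return "don't play" if humidity == "high" else "play"
    # Remaining case, treated as rainy: the deciding question is wind.
    return "don't play" if windy else "play"


print(play_golf("sunny", "normal", windy=False))  # play
```

A library such as scikit-learn builds this kind of nested question structure automatically from labeled examples instead of having a person write the rules.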
  24. 24. ➔ We’ll build this model in Colaboratory, a Google-hosted Python notebook ➔ Go to: colab.research.google.com ➔ Click New Python 3 Notebook The Notebook
  25. 25. from sklearn import tree ➔ Import the tree module from the scikit-learn (sklearn) Python package ➔ bit.ly/sklearn-python Code Block 1
  26. 26. X = [[181,80], [177,70], [160,60], [154,54], [166,65], [190,90], [175,64], [177,70], [159,55], [171,75], [181,85]] Y = ['male','female','female','female','male','male','male','female', 'male','female','male'] ➔ Load in our seed data ➔ X is a list of inputs; each input is itself a list containing Height (in cm) and Weight (in kg) ➔ Y is a list of string labels, one per input in X and in the same order, so we can train the model Code Block 2
  27. 27. clf = tree.DecisionTreeClassifier() clf = clf.fit(X,Y) #print(tree.export_graphviz(clf, out_file=None)) ➔ We create an untrained DecisionTreeClassifier and assign it to the variable clf ➔ We fit the decision tree to our X and Y seed data ➔ SKLearn automatically creates the Decision Tree questions for us (Example: Is height > 177? Yes - Male) ➔ Uncomment the last line and paste the printed string into: webgraphviz.com Code Block 3
  28. 28. prediction = clf.predict([[183,76]]) print(prediction) ➔ Now we give our inputs, in the same format as the training data ➔ Height (cm), Weight (kg) ➔ Print our prediction Code Block 4
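Code Blocks 1 through 4 can be combined into one notebook cell, sketched here for Python 3. Two things are additions not found in the slides: `random_state=0`, which is only there so that repeated runs build the same tree, and `tree.export_text`, used as a quick in-notebook alternative to pasting Graphviz output into a website. The feature names `height_cm` and `weight_kg` are illustrative labels, and `export_text` assumes a reasonably recent scikit-learn release.

```python
from sklearn import tree

# Seed data from Code Block 2: [height in cm, weight in kg] and matching labels.
X = [[181, 80], [177, 70], [160, 60], [154, 54], [166, 65], [190, 90],
     [175, 64], [177, 70], [159, 55], [171, 75], [181, 85]]
Y = ['male', 'female', 'female', 'female', 'male', 'male', 'male', 'female',
     'male', 'female', 'male']

# random_state is an assumption added for reproducibility; the slides omit it.
clf = tree.DecisionTreeClassifier(random_state=0)
clf = clf.fit(X, Y)

# Print the learned questions as indented text (feature names are illustrative).
print(tree.export_text(clf, feature_names=["height_cm", "weight_kg"]))

# Predict for one new person: 183 cm, 76 kg. The input is a list of rows.
prediction = clf.predict([[183, 76]])
print(prediction)
```

Note that `predict` expects a list of rows even for a single person, which is why the input is wrapped in double brackets.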
  29. 29. Our model has a few weaknesses: ➔ Limited inputs ➔ Assumptions Shortcomings
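The "limited inputs" weakness can be made concrete by counting what the model was actually trained on. This is a minimal sketch that uses only the seed data from Code Block 2 and the standard library.

```python
from collections import Counter

# Seed data from Code Block 2: [height in cm, weight in kg] and matching labels.
X = [[181, 80], [177, 70], [160, 60], [154, 54], [166, 65], [190, 90],
     [175, 64], [177, 70], [159, 55], [171, 75], [181, 85]]
Y = ['male', 'female', 'female', 'female', 'male', 'male', 'male', 'female',
     'male', 'female', 'male']

n_samples = len(X)                                # number of training rows
n_features = len(X[0])                            # measurements per row
label_counts = Counter(Y)                         # rows per label
unique_inputs = len({tuple(row) for row in X})    # distinct height/weight pairs

print(n_samples, "samples with", n_features, "features each")
print("labels:", dict(label_counts))
print("distinct inputs:", unique_inputs)
```

Eleven rows, one of them a duplicate, and two measurements per row are far too little to generalize from, and the model will still return a label for any pair of numbers it is given.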
  30. 30. Ways to Learn Data Science
  31. 31. ➔ Start with Python and Statistics ➔ Personal Program Manager ➔ Unlimited Q&A Sessions ➔ Student Slack Community ➔ bit.ly/freetrial-ds Thinkful Two-Week Free Trial
  32. 32. The Student Experience Marnie Boyer, Thinkful Graduate Capstone Wolfgang Hall, Thinkful Graduate Capstone
  33. 33. ➔ bit.ly/tf-event-feedback Survey
