Python for Data Science

This presentation introduces Data Science and the Data Scientist's role and characteristics, and shows how the Python ecosystem provides great tools for each step of the Data Science process (Obtain, Scrub, Explore, Model, Interpret).
To illustrate this, an accompanying IPython Notebook (http://bit.ly/python4datascience_nb) walks through the full process of a corporate network analysis, using Pandas, Matplotlib, scikit-learn, NumPy, and SciPy.
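
The notebook itself is not reproduced here. As a rough sketch of how the Obtain, Scrub, Explore, and Model steps map onto Pandas and scikit-learn, the code below runs them on a tiny, made-up set of corporate posts. The column names (community, published_at, likes, comments, text), the sample rows, and the use of TF-IDF plus k-means to group similar posts are assumptions for illustration, not details taken from the original notebook.

```python
import io

import pandas as pd
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical export of corporate posts; columns and rows are illustrative assumptions.
RAW_CSV = """community,published_at,likes,comments,text
engineering,2014-03-03 09:15,12,3,new build pipeline released for python services
engineering,2014-03-04 14:40,5,1,python services migrated to the new build pipeline
hr,2014-03-03 10:05,20,8,benefits enrollment opens next week for all employees
hr,2014-03-05 16:30,,2,reminder benefits enrollment closes friday for employees
sales,2014-03-06 11:00,7,0,
sales,2014-03-06 11:20,9,4,quarterly sales targets exceeded in the south region
"""


def obtain(raw: str) -> pd.DataFrame:
    """Obtain: load raw records (here from an in-memory CSV string)."""
    return pd.read_csv(io.StringIO(raw), parse_dates=["published_at"])


def scrub(df: pd.DataFrame) -> pd.DataFrame:
    """Scrub: drop posts without text and treat missing counts as zero."""
    df = df.dropna(subset=["text"]).copy()
    df[["likes", "comments"]] = df[["likes", "comments"]].fillna(0).astype(int)
    df["interactions"] = df["likes"] + df["comments"]
    df["hour"] = df["published_at"].dt.hour
    return df


def explore(df: pd.DataFrame) -> dict:
    """Explore: answer a few Inquire-style questions with value_counts/groupby."""
    return {
        "posts_per_community": df["community"].value_counts(),
        "avg_interactions_per_community": df.groupby("community")["interactions"].mean(),
        "interactions_by_hour": df.groupby("hour")["interactions"].sum(),
    }


def model(df: pd.DataFrame, n_clusters: int = 2) -> pd.Series:
    """Model: group posts about similar subjects with TF-IDF + k-means."""
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(df["text"])
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(tfidf)
    return pd.Series(labels, index=df.index, name="topic")


if __name__ == "__main__":
    posts = scrub(obtain(RAW_CSV))
    for name, result in explore(posts).items():
        print(f"\n{name}\n{result}")
    # Interpret: inspect which posts ended up in the same topic cluster.
    print(posts.assign(topic=model(posts))[["community", "topic", "text"]])
```

Keeping each stage in its own small function makes it easy to test a step in isolation; in a notebook the same calls would more likely be spread across cells, with Matplotlib plots added during Explore.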

  1. PYTHON FOR DATA SCIENCE. Gabriel Moreira, Machine Learning Engineer, @gspmoreira
  2. Data Scientist: The Sexiest Job of the 21st Century. https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century
  3. Why so much buzz?
  4. Big Data
  5. WHERE IS DATA SCIENCE BEING USED? http://www.kdnuggets.com/2014/12/where-analytics-data-mining-data-science-applied.html
  6. RECOMMENDATIONS EVERYWHERE
  7. WHAT IS DATA SCIENCE? http://drewconway.com
  8. WHAT IS A DATA SCIENTIST? http://www.datasciencecentral.com/profiles/blogs/are-you-a-data-scientist A Data Scientist is someone with deliberate dual personality who can first build a curious business case defined with a telescopic vision and can then dive deep with microscopic lens to sift through DATA to reach the goal while defining and executing all the intermittent tasks.
  9. WHAT IS A DATA SCIENTIST? Data scientists explore and transform data in novel ways to create and publish new features and combine data from diverse sources to create new value. Data scientists make visualizations with researchers, engineers, web developers, and designers to expose raw, intermediate, and refined data early and often.
     Applied researchers solve the heavy problems that data scientists uncover and that stand in the way of delivering value. These problems take intense effort and require novel methods from statistics and machine learning. [Agile Data Science, O’Reilly, 2014]
  10. Data Science MetroMap Curriculum: http://nirvacana.com/thoughts/becoming-a-data-scientist/
  11. IS DATA SCIENTIST THE NEW WEBMASTER?
  12. [Doing Data Science, O’Reilly, 2014]
  13. DATA SCIENCE IS IOSEMN: Inquire, Obtain, Scrub, Explore, Model, iNterpret [Hilary Mason, Data Scientist]
  14. What about Python?
  15. PYTHON IS IOSEMN: Inquire, Obtain, Scrub, Explore, Model, iNterpret (js)
  16. ANALYSIS CASE: CORPORATE SOCIAL NETWORKS
  17. Inquire, Obtain, Scrub, Explore, Model, iNterpret
  18. INQUIRE
       1. Which communities are more popular?
       2. Is user engagement in corporate communities increasing?
       3. How are post publishing times distributed over the day?
       4. What is the percentage of interactions (likes and comments)?
       5. How are likes distributed across users?
       6. Is there a relationship between publishing hour and number of interactions?
       7. Which communities are more engaging (higher average interactions per post)?
       8. What are the most relevant words in the posts?
       9. How can posts about similar subjects be grouped?
  19. Inquire, Obtain, Scrub, Explore, Model, iNterpret
  20. OBTAIN
       • Download data from another location (e.g., a web page or server)
       • Query data from a database (e.g., MySQL or Oracle)
       • Extract data from an API (e.g., Twitter, Facebook)
       • Extract data from another file (e.g., an HTML file or spreadsheet)
       • Generate data yourself (e.g., reading sensors or taking surveys)
  21. TWITTER PUBLIC STREAM API
  22. Inquire, Obtain, Scrub, Explore, Model, iNterpret
  23. Show me the code!
  24. Data Analysis with IPython Notebook: demo at bit.ly/python4ds_nb
  25. Inquire, Obtain, Scrub, Explore, Model, iNterpret
  26. INTERPRET
       • Drawing conclusions from your data
       • Evaluating what your results mean
       • Communicating your results
  27. DATA PRODUCTS: “If information has context and the context is interactive, insights are not predictable.” [Agile Data Science, O’Reilly, 2014]
  28. SENTIMENT ANALYSIS: Analytical Dashboard, bit.ly/eleicoes2014debatesbt
  29. SENTIMENT ANALYSIS: Analytical Dashboard, bit.ly/eleicoes2014debatesbt
  30. SENTIMENT ANALYSIS: Online Dashboard (JavaScript)
  31. NETWORK ANALYSIS: https://linkedjazz.org (js)
  32. What about Python for Big Data?
  33. PYTHON IN HADOOP
       • Hadoop Streaming: runs MapReduce jobs written as any executable script, including Python!
         Example using AWS Elastic MapReduce: http://workingsweng.com.br/2014/04/clusterizando-raios-com-hadoop-e-k-means-em-map-reduce/
       • Other options for running Python in Hadoop: Hadoopy, Pig UDFs in Jython
  34. THE NEXT-GEN DATA SCIENTIST. The best minds of my generation are thinking about how to make people click ads... That sucks. [Jeff Hammerbacher]
      Next-gen data scientists don’t try to impress with complicated algorithms and models that don’t work. They spend a lot more time trying to get data into shape than anyone cares to admit, maybe up to 90% of their time. Finally, they don’t find religion in tools, methods, or academic departments. They are versatile and interdisciplinary. [Doing Data Science, O’Reilly, 2014]
  35. DATA SCIENCE COURSES
       • Introduction to Data Science (Univ. of Washington)
       • Data Science specialization (Johns Hopkins)
       • Intro to Hadoop and MapReduce (Cloudera)
       • Machine Learning (Stanford)
       • Statistical Learning (Stanford)
       http://workingsweng.com.br/2014/04/cursos-mooc-e-especializacoes-em-data-science/
  36. BOOKS
  37. The road can be challenging
  38. But it can be fun!
  39. Gabriel Moreira, @gspmoreira, https://about.me/gspmoreira. Thank you!
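
Slide 33 mentions Hadoop Streaming, which lets any executable that reads records on stdin and writes tab-separated key/value records to stdout act as a MapReduce mapper or reducer. The sketch below is a generic word count, not the k-means job from the linked example. The script name wordcount.py and the `map`/`reduce` command-line arguments are hypothetical conventions chosen here; on a cluster, the two stages would be passed to the streaming jar through its -mapper and -reducer options. The run_locally helper only imitates the framework's sort-by-key step inside one process.

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator, List


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Map step: emit one 'word<TAB>1' record per word."""
    for line in lines:
        for word in line.strip().lower().split():
            yield f"{word}\t1"


def reducer(records: Iterable[str]) -> Iterator[str]:
    """Reduce step: sum the counts for each word.

    Hadoop Streaming sorts mapper output by key before the reducer reads it,
    so all records for one word arrive consecutively and groupby suffices.
    """
    parsed = (record.rstrip("\n").split("\t", 1) for record in records)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_locally(lines: Iterable[str]) -> List[str]:
    """Simulate map -> sort -> reduce in a single process (for testing only)."""
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    # Hypothetical CLI: `wordcount.py map` or `wordcount.py reduce` streams stdin to stdout.
    if len(sys.argv) > 1 and sys.argv[1] in ("map", "reduce"):
        stage = mapper if sys.argv[1] == "map" else reducer
        for out in stage(sys.stdin):
            print(out)
    else:
        # No stage given: run a tiny in-process demo instead of reading stdin.
        for out in run_locally(["to be or not to be", "to see"]):
            print(out)
```

Relying on the framework's sort is what keeps the reducer this small: it never holds more than one word's records at a time, which is why streaming reducers are usually written around `itertools.groupby` or an equivalent "key changed" check.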
