Data, data, data

Session 3 for DMID

Published in: Education, Technology, Design

  1. 1. Data data data Session III
  2. 2. I. What do we mean when we talk about data and where does data come from? II. What is data science? III. Where do you find the data you need? IV. Science Module: Niche Modelling
  3. 3. What do we mean when we talk about data? Session III
  4. 4. We ask many, many questions about the world around us.
  5. 5. Answering those questions accurately requires a body of data and a set of tools to perform analyses that will lead us toward an answer.
  6. 6. “The goal is to transform data into information, and information into insight.” Carly Fiorina
  7. 8. Data are the raw facts about our world
  8. 11. Thomas Nylen & Andrew Fountain (PSU), NASA, NSF
  9. 12. A lot of this data is available for you to use
  10. 13. Where does the data come from? Government: http://www.data.gov/ http://data.gov.uk/ http://www.census.gov/ Scientific research: http://www.ncbi.nlm.nih.gov/genbank/ http://www.gbif.org/ http://earthengine.googlelabs.com Semi-automated and large scale collections: http://eospso.gsfc.nasa.gov/ http://www.airquality.co.uk/autoinfo.php http://www.statistics.gov.uk/ For profit: http://www.flickr.com/ http://www.google.com/trends http://www.facebook.com/data Citizens: http://www.wikipedia.org/ http://protectedplanet.net/ http://www.openstreetmap.org/
  11. 14. Direct data
  12. 15. Direct data Government: http://www.data.gov/ http://data.gov.uk/ http://www.census.gov/ Scientific research: http://www.ncbi.nlm.nih.gov/genbank/ http://www.gbif.org/ http://earthengine.googlelabs.com Semi-automated and large scale collections: http://eospso.gsfc.nasa.gov/ http://www.airquality.co.uk/autoinfo.php http://www.statistics.gov.uk/ For profit: http://www.flickr.com/ http://www.google.com/trends http://www.facebook.com/data Citizens: http://www.wikipedia.org/ http://protectedplanet.net/ http://www.openstreetmap.org/
  13. 16. photo by  solarnu  on  Flickr
  14. 17. Indirect data
  15. 18. Indirect data Government: http://www.data.gov/ http://data.gov.uk/ http://www.census.gov/ Scientific research: http://www.ncbi.nlm.nih.gov/genbank/ http://www.gbif.org/ http://earthengine.googlelabs.com Semi-automated and large scale collections: http://eospso.gsfc.nasa.gov/ http://www.airquality.co.uk/autoinfo.php http://www.statistics.gov.uk/ For profit: http://www.flickr.com/ http://www.google.com/trends http://www.facebook.com/data Citizens: http://www.wikipedia.org/ http://protectedplanet.net/ http://www.openstreetmap.org/
  16. 19. Source: Google Flu Trends (http://www.google.org/flutrends)
  17. 20. There has been an almost incomprehensible growth in digital data
  18. 22. 5 Megabytes: a high-resolution photo; 5 Gigabytes: just more than a common DVD stores; 1 Terabyte: size of a common home computer hard drive; 15 Petabytes: data produced at CERN each year; 5 Exabytes: roughly every word spoken by humans; 1.2 Zettabytes: roughly the Digital Universe in 2010; 1,180,591,620,717,411,303,424 bytes, or 2^70
  19. 23. Okay, so there is a lot of data!
  20. 24. Is all this data free to use?
  21. 25. No
  22. 26. What you can do with data is largely dictated by license and copyright
  23. 27. Because of this, many organizations have begun advocating and practicing 'open data'
  24. 28. What is open data?
  25. 29. Free and unrestricted access to data
  26. 30. http://www.youtube.com/watch?v=3YcZ3Zqk0a8
  27. 31. Kepler Data
  28. 32. '...survey a portion of our region of the Milky Way galaxy to discover dozens of Earth-size planets in or near the habitable zone and determine how many of the billions of stars in our galaxy have such planets...'
  29. 33. Launch a telescope into space for orbit around the sun. Collect data from thousands of stars. Spend 3 months trying to detect the presence of planets in orbit around these stars. RELEASE THE DATA FOR USE BY ANYONE!!!
  30. 34. Eric.Nielsen.Photos on Flickr
  31. 36. Different but related ideas: Open Government (data.gov, data.gov.uk); Open Access (plos.org); Open Source (Apache, Firefox)
  32. 37. What is data science? Part II
  33. 38. “Data analysis is a body of methods that help to describe facts, detect patterns, develop explanations, and test hypotheses. It is used in all of the sciences. It is used in business, in administration, and in policy.” Levine, 1997, Introduction to Data Analysis: The Rules of Evidence
  34. 39. “The goal is to transform data into information, and information into insight.” Carly Fiorina
  35. 40. It is a set of skills practiced often, but not exclusively, by scientists
  36. 41. The availability of data on the internet is making data analysis accessible to anyone
  37. 42. http://www.youtube.com/watch?v=PnpGIgzNBJo&feature=player_embedded
  38. 43. Part III Where do you find the data you need?
  39. 44. My community has a particular set of data that we rely on very often
  40. 49. During our class, to find the data you need...
  41. 50. Search First
  42. 51. Open government data; large published datasets such as Wikipedia, Flickr, or public FusionTables; natural sciences resources; weather, atmosphere, and geographic datasets
  43. 52. See the growing list of datasets our class will uncover https://github.com/andrewxhill/DMID/wiki/Datasets
  44. 53. Now that you have found it, how do you get it?
  45. 54. Maybe you can download it directly from the website. Pay attention to what formats you find; some will be easier for you to use than others.
  46. 55. Sometimes APIs, or application programming interfaces, are available. These are probably for people with a bit more programming experience, but if you need the data, ask the instructors and we might be able to help.
  47. 57. Scraping: generally the hardest method, as it means programmatically pulling data from sources not necessarily designed to have data pulled from them.
  48. 59. Linked data: an evolution of both scraping and APIs, where many web resources are now designed to be both human readable and programmatically navigable.
  49. 61. Remember that all of these data sources have different formats and potential sources of error. We will have a full session on data preparation, cleaning, and analysis.
  50. 62. Remember that data comes in many shapes and sizes; be aware of what you find. General formats: png, jpg, xls, doc. Categorical and storage: csv, sql. Geographic: shp, tif, asc.
  51. 63. XLS VS CSV
  52. 64. How can we join datasets together?
  53. 65. Table 1: Afghanistan 3, Bolivia 2, Guyana 1, Palau 4. Table 2: GUY 1.03, BOL 1.34, PLW 0, AFG 19.03.
  54. 66. Table 1: Afghanistan 3, Bolivia 2, Guyana 1, Palau 4. Table 2: GUY 1.03, BOL 1.34, PLW 0.20, AFG 1.93. Lookup table: Afghanistan AFG, Bolivia BOL, Guyana GUY, Palau PLW.
  55. 67. If you start working with data, really interesting things can appear.
  56. 70.   GBIF.org
  57. 71. from Eric Fisher on Flickr
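
The join slides above pair a table keyed by country name with a table keyed by three-letter code, connected through a lookup table. A minimal sketch of that idea in Python follows, using only the standard `csv` module. The column names (`country`, `count`, `code`, `value`) and the function names are hypothetical, chosen for illustration; the numbers are the ones on the second join slide, and the lookup assumes Palau maps to PLW, the code used for it in the second table.

```python
import csv
import io

# Table 1: country name and a count (column names are hypothetical).
COUNTS_CSV = """country,count
Afghanistan,3
Bolivia,2
Guyana,1
Palau,4
"""

# Table 2: three-letter country code and a measurement.
VALUES_CSV = """code,value
GUY,1.03
BOL,1.34
PLW,0.20
AFG,1.93
"""

# Lookup table linking names to codes (assumes Palau maps to PLW).
LOOKUP_CSV = """country,code
Afghanistan,AFG
Bolivia,BOL
Guyana,GUY
Palau,PLW
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def join_tables(counts, values, lookup):
    """Join counts to values by translating country names into codes."""
    name_to_code = {row["country"]: row["code"] for row in lookup}
    code_to_value = {row["code"]: float(row["value"]) for row in values}
    joined = []
    for row in counts:
        code = name_to_code.get(row["country"])
        joined.append({
            "country": row["country"],
            "code": code,
            "count": int(row["count"]),
            # None marks a row with no match in the other table.
            "value": code_to_value.get(code),
        })
    return joined


joined = join_tables(
    read_rows(COUNTS_CSV), read_rows(VALUES_CSV), read_rows(LOOKUP_CSV)
)
for row in joined:
    print(row)
```

Keeping unmatched rows with a value of `None`, rather than dropping them, makes a wrong or missing code in the lookup table easy to spot. In practice the three tables would more likely be read from downloaded CSV files than from strings.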
