An introduction to ScraperWiki (for non-developers)

A simple introduction to ScraperWiki, aimed at non-developers

Published in: Technology

  1. Summer School "Data journalism e visualizzazione grafica dei dati" (Data journalism and graphical data visualization), 29 July 2011 – Flavon (TN). An introduction to ScraperWiki for non-developers. Maurizio Napolitano <napo@fbk.eu>
  2. The description is in the name: SCRAPER + WIKI. Image sources: http://www.modot.org/central/major_projects/July2006photos.htm and http://www.commoncraft.com/video/wikis
  3. Wiki: like Wikipedia. Scraper: like ??? A scraper extracts data from content.
  4. Legal aspect: Scraper sites may violate copyright law. Even taking content from an open content site can be a copyright violation, if done in a way which does not respect the license. For instance, the GNU Free Documentation License (GFDL) and Creative Commons ShareAlike (CC-BY-SA) licenses require that a republisher inform readers of the license conditions, and give credit to the original author. Source: http://en.wikipedia.org/wiki/Scraper_site
  5. ... then ScraperWiki is ... https://scraperwiki.com/ A place to share scrapers … and data :)
  6. ScraperWiki legal aspect. Use: 6. You agree that, in using the ScraperWiki site and services, you will not interfere with the legal rights [...] Intellectual Property: 9. Subject to the following paragraphs, the source code of the ScraperWiki site, and all other copyrightable materials that form a part of it is released under the GNU Affero General Public License. 10. All scraping code hosted on the site is licensed under the GNU General Public License. You hereby license all scraping code you create using ScraperWiki under the same licence. 11. You agree to assert no additional intellectual property rights, including copyright and database right, in any scraped data other than those which subsisted in the relevant web sites before the running of the relevant scraper and which were held by you at that time. 12. You grant us a non-exclusive, worldwide, licence to use any data that you store on our site, for the purposes of administering the site. Source: https://scraperwiki.com/terms_and_conditions/
  7. ScraperWiki legal aspect (highlights). USE: 6. You agree [..] you will not interfere with the legal rights [...] INTELLECTUAL PROPERTY: 9. […] the source code of the ScraperWiki [..] is released under the GNU Affero General Public License. 10. All scraping code […] is licensed under the GNU General Public License. 11. You agree to assert no additional intellectual property rights [...] 12. You grant us a non-exclusive, worldwide, licence to use any data that you store on our site, for the purposes of administering the site.
  8. HOW TO CREATE A SCRAPER?
  9. The NON-developers
  10. The technical approach: http://unstats.un.org/unsd/demographic/products/socind/education.htm
  11. Behind the page: HTML code
  12. Where are the data? There is a structure behind the page!!!
  13. The algorithm!!! Download the web page. Read the information. Find the right position. Extract the data. Create a CSV file: data1;data2;data3 [...] dataN1;dataN2;dataN3
  14. Example: Python code. https://scraperwiki.com/docs/python/python_intro_tutorial/
  15. … and everything runs in the cloud!!!
  16. The code in the cloud: https://scraperwiki.com/scrapers/mlb_rosters/
  17. Sharing & ReUse
  18. Enjoy!!! https://scraperwiki.com/
  19. Thanks! "An introduction to ScraperWiki for NON-developers" by Maurizio Napolitano <napo@fbk.eu> is licensed under a Creative Commons Attribution 3.0 Unported License. Created for Summer School "Data journalism e visualizzazione grafica dei dati", 29 July 2011 – Flavon (TN)
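
The algorithm slide lists five steps: download the web page, read the information, find the right position, extract the data, and create a CSV file. The sketch below is one possible way to follow those steps in Python. It is not the code shown in the slides and it does not use the ScraperWiki library; it relies only on the Python standard library. The names `download`, `TableExtractor`, `html_to_csv` and `SAMPLE_HTML` are hypothetical, and the sample table (including its country names and numbers) is invented for illustration. It assumes the data sits in a plain HTML table.

```python
import csv
import io
from html.parser import HTMLParser
from urllib.request import urlopen


def download(url):
    """Step 1: download the web page and return its HTML as text."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


class TableExtractor(HTMLParser):
    """Steps 2-4: read the HTML, find the table cells, extract their text."""

    def __init__(self):
        super().__init__()
        self.rows = []      # one list of cell strings per table row
        self._row = None    # the row currently being read, if any
        self._cell = None   # text pieces of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
            self._cell = None
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Keep text only while we are inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def html_to_csv(html):
    """Step 5: turn the extracted rows into semicolon-separated CSV text."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    writer = csv.writer(out, delimiter=";", lineterminator="\n")
    writer.writerows(parser.rows)
    return out.getvalue()


# Invented sample page, so the sketch runs without a network connection.
SAMPLE_HTML = """
<table>
  <tr><th>Country</th><th>Year</th><th>Value</th></tr>
  <tr><td>Examplia</td><td>2010</td><td>12</td></tr>
  <tr><td>Samplestan</td><td>2011</td><td>15</td></tr>
</table>
"""

if __name__ == "__main__":
    print(html_to_csv(SAMPLE_HTML), end="")
```

To try it on a real page, one could pass the result of `download(...)` to `html_to_csv(...)`. Real pages often contain several tables and layout markup, so the "find the right position" step usually needs extra conditions specific to that page.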
