Scraping by examples
Learn how to scrape web pages in Ruby, JavaScript (and others, soon).


Usage Rights

CC Attribution-NonCommercial-ShareAlike License


Scraping by examples Presentation Transcript

  • 1. Scraping by examples. Alexandre Gomes. Friday, May 20, 2011
  • 2. http://creativecommons.org/licenses/by-nc/3.0/br/
  • 3. First definitive results of the 2010 Census: Brazil's population is 190,755,799. Brazil has 190,755,799 inhabitants. That is the finding of the Sinopse do Censo Demográfico 2010 (Synopsis of the 2010 Demographic Census), which contains the first definitive results of the XII Recenseamento Geral do Brasil (12th General Census of Brazil)... 29/04/2011 http://www.ibge.gov.br
  • 4. Summary of the 2010 Census
  • 5. Summary of the 2010 Census
  • 8. What is the relationship between literacy rates and the proportion of women?
  • 9. Example:
       women in the region / total people in the region = 7,859,539 / (7,859,539 + 8,004,915) ≈ 0.49
       literate people* in the region / total people* in the region = 11,326,492 / 12,670,041 ≈ 0.89
       * over 10 years of age
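
The arithmetic on slide 9 can be checked with a few lines of Ruby. This is only a sketch: the constant and method names are made up for illustration, and the values are truncated (not rounded) to two decimals, which appears to be how the slide arrives at 0.49 instead of 0.50.

```ruby
# Figures for the example region, as quoted on slide 9.
WOMEN = 7_859_539
MEN = 8_004_915
LITERATE_OVER_10 = 11_326_492
POPULATION_OVER_10 = 12_670_041

# Share of `part` in `total`, truncated to two decimals.
# Integer division by `total` drops everything past the whole percent.
def share(part, total)
  (part * 100 / total) / 100.0
end

women_share = share(WOMEN, WOMEN + MEN)
literacy_rate = share(LITERATE_OVER_10, POPULATION_OVER_10)

puts "women: #{women_share}, literate: #{literacy_rate}"
```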
  • 10. And in the other regions?
  • 11. Scraping by Examples
  • 12. Nokogiri
  • 13. #1 Access the page that contains the desired data
  • 14. test
  • 15. test code
  • 16. $ rspec spec/ibge_censo2010_spec.rb:8
       Run filtered using {:line_number=>8}
       IBGECenso2010
         should open page with "Razão de sexo, população de homens e mulheres"
       Finished in 44.4 seconds
       1 example, 0 failures
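
The test and code shown on slides 14 and 15 did not survive in this transcript. As a rough sketch of step #1, and not the author's implementation, the page could be fetched with Ruby's standard library and checked for the expected title. The helper names, the injectable fetcher, and the placeholder URL are all assumptions made for illustration.

```ruby
require "net/http"
require "uri"

# Default fetcher: a plain HTTP GET using Ruby's standard library.
HTTP_FETCHER = ->(url) { Net::HTTP.get(URI(url)) }

# Returns the raw HTML of `url`. The fetcher is injectable so that a spec
# can substitute canned HTML instead of hitting the network.
def open_page(url, fetcher: HTTP_FETCHER)
  fetcher.call(url)
end

# True when the page body contains `text`. The body is re-tagged with the
# encoding of `text` (assumption: the page is served as UTF-8).
def page_includes?(html, text)
  html.dup.force_encoding(text.encoding).include?(text)
end

# Usage with a stub fetcher; the URL is a placeholder, not the real IBGE address.
stub = ->(_url) { "<h1>Razão de sexo, população de homens e mulheres</h1>" }
html = open_page("http://example.invalid/censo2010", fetcher: stub)
puts page_includes?(html, "Razão de sexo")
```

Injecting the fetcher keeps the check fast and repeatable. The 44.4 seconds reported on slide 16 suggests the original spec went over the network.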
  • 17. #2 Retrieve the desired data
  • 18. First, understand the structure of the page
  • 19. Study the path to the data in the DOM tree:
       <table>
         <thead>...</thead>
         <tfoot>
           <tr>
             <td>...</td> <td>...</td> <td>...</td> <td>...</td> <td>...</td>
           </tr>
         </tfoot>
         <tbody>...</tbody>
       </table>
  • 20. Look for CSS IDs and classes that may be useful.
  • 22. class="td_numeros"
  • 25. ".td_numeros" [
  • 26. ".td_numeros" [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 ]
  • 27. 1st value we need (the numerator of the formula). [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 ]
  • 28. 2nd value we need (to compute the denominator of the formula). [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 ]
  • 29. women in region N / total people in region N = dados[5] / (dados[4] + dados[5]) [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 ]
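
The formula on slide 29 can be sketched as below, assuming the text of every ".td_numeros" cell has already been collected into an array. The method names are hypothetical, and reading index 4 as the men's count is an inference from the denominator, not something the slide states. The sample row reuses the slide 9 figures in the dot-separated format used on Brazilian pages.

```ruby
# Converts a Brazilian-formatted count such as "7.859.539" to an Integer.
def parse_count(text)
  text.delete(".").strip.to_i
end

# `cells` holds the text of every ".td_numeros" cell, in document order.
# With Nokogiri this would be something like:
#   cells = doc.css(".td_numeros").map(&:text)
# Per slide 29, index 5 is the women's count; index 4 is presumably the men's.
def women_share(cells)
  men = parse_count(cells[4])
  women = parse_count(cells[5])
  women.to_f / (men + women)
end

# Illustrative row: only positions 4 and 5 carry the slide 9 figures;
# the other entries are placeholders.
cells = ["0", "0", "0", "0", "8.004.915", "7.859.539"]
puts women_share(cells).round(4)
```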
  • 30. test
  • 31. code
  • 32. $ rspec spec
       IBGECenso2010 razao de sexo
         should open page with "Razão de sexo, população de homens e mulheres"
         should get number of women
       Finished in 1.78 seconds
       2 examples, 0 failures
  • 33. test
  • 34. code
  • 35. #3 Retrieve the rest of the desired data
  • 37. ...
  • 38. #4 Web presentation of the scraping
  • 39. application.rb (...)
  • 40. application.rb (...)
  • 41. index.erb (...)
  • 42. http://datavisualization.ch/tools/13-javascript-libraries-for-visualizations
  • 43. [...] distinctive data visualization http://datavisualization.ch/tools/13-javascript-libraries-for-visualizations
  • 44. #5 (Still crude) visualization of the scraping
  • 46. #6 Distinctive visualization of the information
  • 47. ?
  • 48. Now, the same thing, using only JavaScript
  • 49. #1 Access the page that contains the desired data
  • 50. test
  • 51. code
  • 53. #2 Retrieve the desired data
  • 54. test
  • 55. code
  • 56. #3 Retrieve the rest of the desired data
  • 57. ...
  • 58. #4 Web presentation of the scraping
  • 59. index.html
  • 60. index.html
  • 61. index.html
  • 62. index.html
  • 63. index.html (...)
  • 64. index.html (...)
  • 65. index.html (...)
  • 66. index.html (...)
  • 67. http://chart.apis.google.com/chart?chxt=y&chbh=a&chs=500x300&cht=bvg&chco=A2C180,3D7930&chd=t:49,51,51,50,50|89,82,94,95,93&chdl=Women|Literates&chp=0.033
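
The chart URL on slide 67 can be assembled from the two series of percentages. The sketch below is one possible way to do it in Ruby, not the code from the deck. The method name is hypothetical, and the parameter values are copied verbatim from the slide.

```ruby
# Base endpoint taken from slide 67 (Google's image chart service).
CHART_BASE = "http://chart.apis.google.com/chart"

# Builds the bar-chart URL from two series of percentages. Parameters are
# joined by hand because the URL on the slide uses literal "," and "|"
# separators, which a generic form encoder would percent-escape.
def chart_url(women, literates)
  params = [
    ["chxt", "y"],                 # show the y axis
    ["chbh", "a"],                 # automatic bar width
    ["chs", "500x300"],            # image size in pixels
    ["cht", "bvg"],                # vertical grouped bars
    ["chco", "A2C180,3D7930"],     # one colour per series
    ["chd", "t:#{women.join(',')}|#{literates.join(',')}"],
    ["chdl", "Women|Literates"],   # legend labels
    ["chp", "0.033"]               # copied as-is from the slide
  ]
  "#{CHART_BASE}?" + params.map { |key, value| "#{key}=#{value}" }.join("&")
end

puts chart_url([49, 51, 51, 50, 50], [89, 82, 94, 95, 93])
```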
  • 68. code available at...
  • 69. Q&A
  • 70. http://tinyurl.com/AvaliacaoSOO14