Scraping by examples

Learn how to scrape web pages in Ruby, JavaScript (and others, soon).

Published in: Technology
Transcript

  • 1. Scraping by examples, Alexandre Gomes. Friday, May 20, 2011
  • 2. http://creativecommons.org/licenses/by-nc/3.0/br/
  • 3. First definitive results of the 2010 Census: Brazil's population is 190,755,799. Brazil has 190,755,799 inhabitants. That is the finding of the Sinopse do Censo Demográfico 2010 (Synopsis of the 2010 Demographic Census), which contains the first definitive results of the XII Recenseamento Geral do Brasil (12th General Census of Brazil)... April 29, 2011 http://www.ibge.gov.br
  • 4. Summary of the 2010 Census
  • 5. Summary of the 2010 Census
  • 6. (no text)
  • 7. (no text)
  • 8. What is the relationship between literacy rates and the proportion of women?
  • 9. Example:
        women in the region / total people in the region = 7,859,539 / (7,859,539 + 8,004,915) = 0.49
        literate people* in the region / total people* in the region = 11,326,492 / 12,670,041 = 0.89
        * over 10 years of age
  • 10. And in the other regions?
  • 11. Scraping by Examples
  • 12. Nokogiri
  • 13. #1 Access the page that contains the desired data
  • 14. test
  • 15. test code
  • 16. $ rspec spec/ibge_censo2010_spec.rb:8
        Run filtered using {:line_number=>8}
        IBGECenso2010
          should open page with "Razão de sexo,população de homens e mulheres"
        Finished in 44.4 seconds
        1 example, 0 failures
        $
  • 17. #2 Retrieve the desired data
  • 18. First, understand the structure of the page
  • 19. Study the data's path in the DOM tree:
        <table>
          <thead>...</thead>
          <tfoot>
            <tr>
              <td>...</td>
              <td>...</td>
              <td>...</td>
              <td>...</td>
              <td>...</td>
            </tr>
          </tfoot>
          <tbody>...</tbody>
        </table>
  • 20. Look for IDs and CSS classes that may be useful.
  • 21. (no text)
  • 22. class="td_numeros"
  • 23. (no text)
  • 24. (no text)
  • 25. ".td_numeros" [
  • 26. ".td_numeros" [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
  • 27. 1st value we need (the numerator of the formula). [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
  • 28. 2nd value we need (for computing the denominator of the formula). [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
  • 29. women in region N / total people in region N = dados[5] / (dados[4] + dados[5]) [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
  • 30. test
  • 31. code
  • 32. $ rspec spec
        IBGECenso2010
          razao de sexo
            should open page with "Razão de sexo,população de homens e mulheres"
            should get number of women
        Finished in 1.78 seconds
        2 examples, 0 failures
  • 33. test
  • 34. code
  • 35. #3 Retrieve the rest of the desired data
  • 36. (no text)
  • 37. ...
  • 38. #4 Web presentation of the scraping
  • 39. application.rb (...)
  • 40. application.rb (...)
  • 41. index.erb (...)
  • 42. http://datavisualization.ch/tools/13-javascript-libraries-for-visualizations
  • 43. The charm of mashups lies in distinctive data visualization. http://datavisualization.ch/tools/13-javascript-libraries-for-visualizations
  • 44. #5 (Still crude) visualization of the scraping
  • 45. (no text)
  • 46. #6 Distinctive visualization of the information
  • 47. ?
  • 48. Now, the same thing, using only JavaScript
  • 49. #1 Access the page that contains the desired data
  • 50. test
  • 51. code
  • 52. (no text)
  • 53. #2 Retrieve the desired data
  • 54. test
  • 55. code
  • 56. #3 Retrieve the rest of the desired data
  • 57. ...
  • 58. #4 Web presentation of the scraping
  • 59. index.html
  • 60. index.html
  • 61. index.html
  • 62. index.html
  • 63. index.html (...)
  • 64. index.html (...)
  • 65. index.html (...)
  • 66. index.html (...)
  • 67. http://chart.apis.google.com/chart?chxt=y&chbh=a&chs=500x300&cht=bvg&chco=A2C180,3D7930&chd=t:49,51,51,50,50|89,82,94,95,93&chdl=Women|Literates&chp=0.033
  • 68. code available at...
  • 69. Q&A
  • 70. http://tinyurl.com/AvaliacaoSOO14
  • 71. (no text)
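
The formula on slide 29 and the worked figures on slide 9 can be sketched in plain Ruby. This is only an illustration, not the presenter's code (the transcript does not include it). It assumes the scraped `.td_numeros` cells arrive as strings with dot-separated thousands (for example "7.859.539"), that index 5 holds the number of women and index 4 the number of men, and that the remaining cells can be ignored. The helper names `parse_br_integer`, `women_ratio`, and `literacy_ratio` are hypothetical.

```ruby
# Parse an integer written with dot-separated thousands,
# e.g. "7.859.539" becomes 7859539. (Hypothetical helper.)
def parse_br_integer(text)
  Integer(text.delete(".").strip, 10)
end

# Share of women among all people in a region, following slide 29:
# dados[5] / (dados[4] + dados[5]).
def women_ratio(dados)
  men   = parse_br_integer(dados[4])
  women = parse_br_integer(dados[5])
  women.to_f / (men + women)
end

# Share of literate people among people over 10 years of age (slide 9).
def literacy_ratio(literate, total)
  literate.to_f / total
end

# Hypothetical 18-cell row: only indexes 4 and 5 carry real figures
# (taken from slide 9); every other cell is a placeholder.
dados = Array.new(18, "0")
dados[4] = "8.004.915"
dados[5] = "7.859.539"

puts format("women:    %.4f", women_ratio(dados))
puts format("literate: %.4f", literacy_ratio(11_326_492, 12_670_041))
```

Keeping the number parsing separate from the ratio makes each piece easy to cover with its own spec, in the test-then-code rhythm the slides follow.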

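
The chart URL on slide 67 can be assembled from the two series of percentages. The sketch below only builds the string and makes no network request. The function name `chart_url` is hypothetical, and the fixed parameter values (`chxt`, `chbh`, `chs`, `cht`, `chco`, `chdl`, `chp`) are copied verbatim from the slide rather than derived from anything else in the deck.

```ruby
# Build a bar-chart URL in the shape shown on slide 67.
# Both arguments are arrays of whole-number percentages, one per region.
def chart_url(women_percentages, literacy_percentages)
  # Each series is comma-separated; series are joined with "|".
  data = [women_percentages, literacy_percentages]
         .map { |series| series.join(",") }
         .join("|")

  # Parameter order and values follow the slide.
  params = [
    ["chxt", "y"],
    ["chbh", "a"],
    ["chs",  "500x300"],
    ["cht",  "bvg"],
    ["chco", "A2C180,3D7930"],
    ["chd",  "t:#{data}"],
    ["chdl", "Women|Literates"],
    ["chp",  "0.033"]
  ]

  query = params.map { |key, value| "#{key}=#{value}" }.join("&")
  "http://chart.apis.google.com/chart?#{query}"
end

# The two series that appear in the slide's URL.
puts chart_url([49, 51, 51, 50, 50], [89, 82, 94, 95, 93])
```

The query string is joined by hand rather than with `URI.encode_www_form` so that the `,`, `|`, and `:` characters stay unescaped, matching the URL on the slide.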