Collecting web information with open source tools

My lightning talk slides from COSCUP 2011, Taipei.

  1. Collecting useful information from the web with open source tools
  2. @sammyfung
  3. Hong Kong
  4. First chairman of the Hong Kong Linux User Group
  5. opensource.hk webmaster
  6. How do programmers solve problems in daily life?
  7. Coding! Just write a program!
  8. A lot of popular web sites in Hong Kong run on II$.
  9. Very slow when you're using them!
  10. Visiting the same web sites manually, again and again, to check for the latest updates.
  11. Would you still be addicted to Plurk/Twitter without automatic alerts for new responses and replies?
  12. What do you need?
  13. Regular Expression
  14. HTML Parser
  15. Web Crawling Framework
  16. scrapy.org
  17. About Scrapy: written in Python. Sample code: x = HtmlXPathSelector(response); torrent = TorrentItem(); torrent['url'] = response.url; torrent['name'] = x.select("//h1/text()").extract(). Sample input: <h1>Hello World</h1>
  18. All of the above are available as open source software!
  19. Problem #1: A lot of popular web sites in Hong Kong run on II$.
  20. Developed a schedule of football matches shown live on cable TV.
  21. Problem #2: Some web sites don't provide a data API.
  22. Hong Kong Weather Info
  23. @weatherhk
  24. Alerts of tropical cyclones in the Northwest Pacific Ocean: @tctrack, @tropicalhk
  25. Path and forecast of active tropical cyclones
  26. Let's solve your own problems with open source tools. Make good use of open source software to solve the problems you run into in daily life. Thank you!
  27. Solving problems with open source. Thank you.
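
The first two tools named on slides 13 and 14 can be compared on the `<h1>Hello World</h1>` sample from slide 17. The following is a minimal sketch, not the code from the talk: it uses only the Python standard library instead of Scrapy, and the names `PAGE`, `h1_with_regex`, `H1Collector` and `h1_with_parser` are made up for illustration. Both functions aim at the same result as the XPath `//h1/text()` on the slide for this simple input.

```python
import re
from html.parser import HTMLParser

# Hypothetical sample page, built around the <h1> from slide 17.
PAGE = "<html><body><h1>Hello World</h1><p>other text</p></body></html>"


def h1_with_regex(html):
    """Return the inner text of every <h1> element matched by a regular expression."""
    return re.findall(r"<h1[^>]*>(.*?)</h1>", html, flags=re.IGNORECASE | re.DOTALL)


class H1Collector(HTMLParser):
    """Collect the text found inside <h1> elements while parsing."""

    def __init__(self):
        super().__init__()
        self.in_h1 = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self.in_h1 = True
            self.titles.append("")  # start a new heading

    def handle_endtag(self, tag):
        if tag == "h1":
            self.in_h1 = False

    def handle_data(self, data):
        if self.in_h1:
            self.titles[-1] += data  # text may arrive in several pieces


def h1_with_parser(html):
    """Return the text of every <h1> element using the HTML parser."""
    parser = H1Collector()
    parser.feed(html)
    parser.close()
    return parser.titles


if __name__ == "__main__":
    print(h1_with_regex(PAGE))
    print(h1_with_parser(PAGE))
```

The regular expression is the quickest to write but returns raw markup when the heading contains nested tags, while the parser version keeps only the text. A crawling framework such as Scrapy adds fetching, scheduling and item handling on top of this kind of extraction.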
