SlideShare a Scribd company logo
How do we develop open source
to help open data
Sammy Fung
sammy.hk
Malaysia Open Source Conference 2013
We want a better life with
public data.
We want a easier way to
access the public data.
Agenda
● What is Open Data ?
● Use of Open Source Software in web crawling.
● Starting new Open Source project hk0weather
to create Open Weather Data.
Sammy Fung
● Software Developer
– to use and develop open source sofware.
– Perl → PHP → Python.
– interests on Data Mining / Web Crawling.
– works at internet service company 43 Global to
deploy OpenStack cloud service.
Sammy Fung
●
Open Source Community Leader.
– Founding Chairman, Hong Kong Linux User Group.
– Community Manager, opensource.hk.
– GNOME Asia committee member.
– Mozilla Rep.
– Program committee member of COSCUP - the largest
Open Source conference in Taiwan.
●
Blogger at sammy.hk.
What is Open Data ?
Open Data
Three Laws of Open Government Data by David Eaves.
1.If it can't be spidered or indexed, it doesn't exist.
2.If it isn't available in open and machine readable format,
it can't engage.
3.If a legal framework doesn't allow it to be repurposed, it
doesn't empower.
http://eaves.ca/2009/09/30/three-law-of-open-government-data/
Open Data
● Tim Berners-Lee, the inventor of the Web.
– 5stardata.info
– 5 star deployment scheme of Open Data.
* One Star - Open Data
1.make your stuff available on the Web (whatever format) under an
open license.
2.make it available as structured data (e.g., Excel instead of image
scan of a table)
3.use non-proprietary formats (e.g., CSV instead of Excel)
4.use URIs to denote things, so that people can point at your stuff.
5.link your data to other data to provide context.
5stardata.info by Tim Berners-Lee, the inventor of the Web.
** Two Star - Open Data
1.make your stuff available on the Web (whatever format) under an
open license.
2.make it available as structured data (e.g., Excel instead of image
scan of a table)
3.use non-proprietary formats (e.g., CSV instead of Excel)
4.use URIs to denote things, so that people can point at your stuff.
5.link your data to other data to provide context.
5stardata.info by Tim Berners-Lee, the inventor of the Web.
*** Three Star - Open Data
1.make your stuff available on the Web (whatever format) under an
open license.
2.make it available as structured data (e.g., Excel instead of image
scan of a table)
3.use non-proprietary formats (e.g., CSV instead of Excel)
4.use URIs to denote things, so that people can point at your stuff.
5.link your data to other data to provide context.
5stardata.info by Tim Berners-Lee, the inventor of the Web.
**** Four Star - Open Data
1.make your stuff available on the Web (whatever format) under an
open license.
2.make it available as structured data (e.g., Excel instead of image
scan of a table)
3.use non-proprietary formats (e.g., CSV instead of Excel)
4.use URIs to denote things, so that people can point at your stuff.
5.link your data to other data to provide context.
5stardata.info by Tim Berners-Lee, the inventor of the Web.
***** Five Star - Open Data
1.make your stuff available on the Web (whatever format) under an
open license.
2.make it available as structured data (e.g., Excel instead of image
scan of a table)
3.use non-proprietary formats (e.g., CSV instead of Excel)
4.use URIs to denote things, so that people can point at your stuff.
5.link your data to other data to provide context.
5stardata.info by Tim Berners-Lee, the inventor of the Web.
Legco Meeting Minutes
and Voting Results
Legco Meeting Minutes
and Voting Results
Weather Information in Hong Kong
● Hong Kong Observatory
– Hourly Hong Kong Weather Report
– Regional Weather in Hong Kong (10 min updates)
– Weather Forecast and Weekly Weather Forecast
– Typhoon Report and Forecast
Hong Kong Observatory RSS
Hong Kong Observatory RSS
Weather at Data.One
●
My Chinese Blog Post 'Progress of Open
Government Data in Hong Kong' on 2013/1/17.
● Data.One released on 2011/3/31.
● Weather at Data.One provides 7 dataset URLs,
returns RSS (XML) format (Eng/TChi/SChi)
– One word: Useless.
– Data.One dataset (RSS) is completely different with
HKO own paid service (XML).
Weather at Data.One
● Example - Current local weather report:
● Plain text report in RSS.
● Difference to quote report content:
– Website: a pair of HTML tags, eg. <PRE>....</PRE>.
– Data.One: a pair of RSS description tags,
<description>....</description>.
● Other weather data is missing, eg. Regional
temperture updates per each 12 mins.
Weather at Data.One
● Weather at Data.One is 'report' but not 'data'.
● Weather RSS is already released by HKO
before launch of Data.One.
● Technically, json/xml format is better
readable by computer programs.
Open Data is important to citizens.
User of Open Source
Software in web
crawling
Web Scraping
● a computer software technique of extracting
information from websites. (Wikipedia)
● for business, hobbies, research purposes.
Web Scraping
● Look for right URLs to scrap.
● Look for right content from webpages.
● Saving data into data store.
● When to run the web scraping program ?
Use of Open Source Software in
Web Crawling
● Use Open Source Tools to collect useful and
meaningful machine-readable data.
● Doesn't need to wait provider to release data
in machine-readable format.
Open Source Tools
● Python programming lanugage
● with Regular Expression library
● Scrapy web crawling framework
Why python + scrapy ?
● python: my current favourite programming
language for few years.
● scrapy: web crawling framework written in
Python.
What is Scrapy ?
● An open source web scraping framework for
Python.
● Scrapy is a fast high-level screen scraping and
web crawling framework, used to crawl
websites and extract structured data from
their pages. It can be used for a wide range of
purposes, from data mining to monitoring
and automated testing.
Scrapy Features
● define data you want to scrapy
● write spider to extract data
● Built-in: selecting and extracting data from HTML
and XML
● Built-in: JSON, CSV, XML output
● Interactive shell console
● Built-in: web service, telnet console, logging
● Others
Programme List of Paid TVs in 2004
Programme List of Paid TVs in 2004
● I want to know live football match was
showing on which channel.
● Paid TV web site = M$ + IIS + ASP + Flash
● Slow....... Very Slow...... Extremely Slow!
● Couldn't connect at any peak hours!
● Wrote my first web crawler in PHP in 2004.
Public Transportation in 2006-2010
● Kowloon Motor Bus (KMB)
– No map view for a bus route
● Public Transportation Enquiry System (PTES)
– Exteremly Poor, Ugly (or much worse) map UI on
PTES.
HK Observatory and Joint Typhoon
Warning Center
● Any typhoon is coming to Hong Kong ? And
When will it come ?
● No easy data exchange format.
● No RSS nor ATOM.
● We aren't check websites everyday.
My Products
● WeatherHK ← ← ←
● TCTrack
WeatherHK
● http://twitter.com/weatherhk
● hourly current weather report
● weather forecast report
● tropical signal warning
WeatherHK
● Backend: Python + Scrapy + Database +
Twitter + NNTP......
● Frontend: Twitter + Newsgroup
WeatherHK
● http://twitter.com/weatherhk
● Interview by MetroPop in 2009.
My Products
● WeatherHK
● TCTrack ← ← ←
TCTrack
● http://sammy.hk/projects/tctrack/tctrack.php
● Plot TC current and forecast tracks over
Google Map.
● Source:
– JTWC
– HKO
TCTrack
● http://sammy.hk/projects/tctrack/tctrack.php
● Probably first tctrack map in HK using
GoogleMap
● Use of GMap: TCTrack -> Weather
Underground Hong Kong -> HKO
TCTrack
● http://twitter.com/tctrack
● Tweet JTWC updates for Northwest Pacific.
Releases information to citizens
in a better presentation.
Starting new Open
Source project
hk0weather to create
Open Weather Data.
Starting new Open Source projects
to create Open Data
● Develop a open source project.
● Release data in standard machine-readable
data format.
hk0weather
● https://github.com/sammyfung/hk0weather
● Open Source Hong Kong Weather Project.
● convert to JSON data from HKO webpages.
● python + scrapy
● 1st version: from current weather report,
extracting temperture and humidity from 20+
weather stations, export in json format.
hk0weather
● https://github.com/sammyfung/hk0weather
● $ virtualenv hk0weatherenv
● $ source hk0weatherenv/bin/activate
● $ pip install scrapy
● $ git clone
https://github.com/sammyfung/hk0weather.git
● $ cd hk0weather
● $ scrapy crawl currwx -t json -o testresult
hk0weather
● Python
– import re
● Scrapy
– web crawling framework written in Python.
– HtmlXPathSelector.
– built-in JSON, CSV, XML output.
hk0weather
[{"humidity": 80, "station": "hko", "temperture": 17, "time": 1360785720},
{"station": "kingspark", "temperture": 16, "time": 1360785720},
{"station": "wongchukhang", "temperture": 17, "time": 1360785720},
{"station": "takwuling", "temperture": 16, "time": 1360785720},
{"station": "laufaushan", "temperture": 15, "time": 1360785720},
{"station": "taipo", "temperture": 16, "time": 1360785720},
{"station": "shatin", "temperture": 17, "time": 1360785720},
{"station": "tuenmun", "temperture": 17, "time": 1360785720},
{"station": "tseungkwano", "temperture": 16, "time": 1360785720},
{"station": "saikung", "temperture": 16, "time": 1360785720},
{"station": "cheungchau", "temperture": 17, "time": 1360785720},
{"station": "cheungchau", "temperture": 17, "time": 1360785720},
{"station": "tsingyi", "temperture": 17, "time": 1360785720},
{"station": "shekkong", "temperture": 15, "time": 1360785720},
{"station": "tsuenwanhokoon", "temperture": 15, "time": 1360785720},
{"station": "tsuenwanshingmunvalley", "temperture": 17, "time": 1360785720},
{"station": "hongkongpark", "temperture": 17, "time": 1360785720},
{"station": "shaukeiwan", "temperture": 16, "time": 1360785720},
{"station": "kowlooncity", "temperture": 16, "time": 1360785720},
{"station": "happyvalley", "temperture": 18, "time": 1360785720},
{"station": "wongtaisin", "temperture": 17, "time": 1360785720},
{"station": "stanley", "temperture": 16, "time": 1360785720},
{"station": "kwuntong", "temperture": 15, "time": 1360785720},
{"station": "shamshuipo", "temperture": 17, "time": 1360785720}]
Items.py
class Hk0WeatherItem(Item):
time = Field()
station = Field()
temperture = Field()
humidity = Field()
Currwx.py
start_urls = (
'http://www.weather.gov.hk/wxinfo/currwx/curr
entc.htm',
)
Currwx.py
def parse(self, response):
laststation = ''
temperture = int()
stations = []
hxs = HtmlXPathSelector(response)
report = hxs.select('//div[@id="ming"]')
libhk0
class hk0:
stations = [
(u' 天 文 台 ', 'hko'),
(u' 京 士 柏 ', 'kingspark'),
(u' 黃 竹 坑 ', 'wongchukhang'),
(u' 打 鼓 嶺 ', 'takwuling'),
(u' 流 浮 山 ', 'laufaushan'),
libhk0
class hk0:
def gettime(self, report):
…
def hk0current(self, report):
…
Agenda
● What is Open Data ?
● Use of Open Source Software in web crawling.
● Starting new Open Source project hk0weather
to create Open Weather Data.
We want a easier way to
access the public data.
We want a better life with
public data.
Thank You!
sammy.hk

More Related Content

What's hot

Programming the Semantic Web
Programming the Semantic WebProgramming the Semantic Web
Programming the Semantic Web
Steffen Staab
 
Staab programming thesemanticweb
Staab programming thesemanticwebStaab programming thesemanticweb
Staab programming thesemanticwebAneta Tu
 
Adventures in Linked Data Land (presentation by Richard Light)
Adventures in Linked Data Land (presentation by Richard Light)Adventures in Linked Data Land (presentation by Richard Light)
Adventures in Linked Data Land (presentation by Richard Light)
jottevanger
 
Reminiscing about interoperability
Reminiscing about interoperabilityReminiscing about interoperability
Reminiscing about interoperability
Herbert Van de Sompel
 
Ks2009 Semanticweb In Action
Ks2009 Semanticweb In ActionKs2009 Semanticweb In Action
Ks2009 Semanticweb In ActionRinke Hoekstra
 
AHUPO_Vizcaino_remote_presentation_082014
AHUPO_Vizcaino_remote_presentation_082014AHUPO_Vizcaino_remote_presentation_082014
AHUPO_Vizcaino_remote_presentation_082014
Juan Antonio Vizcaino
 
Sparql querying of-property-graphs-harsh thakkar-graph day 2017 sf
Sparql querying of-property-graphs-harsh thakkar-graph day 2017 sfSparql querying of-property-graphs-harsh thakkar-graph day 2017 sf
Sparql querying of-property-graphs-harsh thakkar-graph day 2017 sf
Harsh Thakkar
 
AINL 2016: Bugaychenko
AINL 2016: BugaychenkoAINL 2016: Bugaychenko
AINL 2016: Bugaychenko
Lidia Pivovarova
 
IIIF & Digital Humanities
IIIF & Digital Humanities     IIIF & Digital Humanities
IIIF & Digital Humanities
Jean-Philippe Moreux
 

What's hot (9)

Programming the Semantic Web
Programming the Semantic WebProgramming the Semantic Web
Programming the Semantic Web
 
Staab programming thesemanticweb
Staab programming thesemanticwebStaab programming thesemanticweb
Staab programming thesemanticweb
 
Adventures in Linked Data Land (presentation by Richard Light)
Adventures in Linked Data Land (presentation by Richard Light)Adventures in Linked Data Land (presentation by Richard Light)
Adventures in Linked Data Land (presentation by Richard Light)
 
Reminiscing about interoperability
Reminiscing about interoperabilityReminiscing about interoperability
Reminiscing about interoperability
 
Ks2009 Semanticweb In Action
Ks2009 Semanticweb In ActionKs2009 Semanticweb In Action
Ks2009 Semanticweb In Action
 
AHUPO_Vizcaino_remote_presentation_082014
AHUPO_Vizcaino_remote_presentation_082014AHUPO_Vizcaino_remote_presentation_082014
AHUPO_Vizcaino_remote_presentation_082014
 
Sparql querying of-property-graphs-harsh thakkar-graph day 2017 sf
Sparql querying of-property-graphs-harsh thakkar-graph day 2017 sfSparql querying of-property-graphs-harsh thakkar-graph day 2017 sf
Sparql querying of-property-graphs-harsh thakkar-graph day 2017 sf
 
AINL 2016: Bugaychenko
AINL 2016: BugaychenkoAINL 2016: Bugaychenko
AINL 2016: Bugaychenko
 
IIIF & Digital Humanities
IIIF & Digital Humanities     IIIF & Digital Humanities
IIIF & Digital Humanities
 

Viewers also liked

Strom thurmond wellness & fitness center presentation web
Strom thurmond wellness & fitness center presentation webStrom thurmond wellness & fitness center presentation web
Strom thurmond wellness & fitness center presentation websacrdsgn
 
Open source communities in hong kong and asia (2012 updates) (Summer BarCam...
Open source communities in hong kong and asia  (2012 updates)  (Summer BarCam...Open source communities in hong kong and asia  (2012 updates)  (Summer BarCam...
Open source communities in hong kong and asia (2012 updates) (Summer BarCam...Sammy Fung
 
From Hk0weather to Open Data
From Hk0weather to Open DataFrom Hk0weather to Open Data
From Hk0weather to Open Data
Sammy Fung
 
Use open source software to develop ideas at work
Use open source software to develop ideas at workUse open source software to develop ideas at work
Use open source software to develop ideas at work
Sammy Fung
 
Firefox 4 介紹短講
Firefox 4 介紹短講Firefox 4 介紹短講
Firefox 4 介紹短講
Sammy Fung
 
Présentation Tag The City 20 Septembre
Présentation Tag The City 20 SeptembrePrésentation Tag The City 20 Septembre
Présentation Tag The City 20 SeptembreThierry Marcou
 
Mozilla - Openness of the Web
Mozilla - Openness of the WebMozilla - Openness of the Web
Mozilla - Openness of the WebSammy Fung
 
Software Freedom and Open Source Community
Software Freedom and Open Source CommunitySoftware Freedom and Open Source Community
Software Freedom and Open Source Community
Sammy Fung
 
香港中文開源軟件翻譯
香港中文開源軟件翻譯香港中文開源軟件翻譯
香港中文開源軟件翻譯
Sammy Fung
 
Global Open Source Development 2011-2014 Review and 2015 Forecast
Global Open Source Development 2011-2014 Review and 2015 ForecastGlobal Open Source Development 2011-2014 Review and 2015 Forecast
Global Open Source Development 2011-2014 Review and 2015 Forecast
Sammy Fung
 

Viewers also liked (12)

Datamining
DataminingDatamining
Datamining
 
U72 lesson 04
U72 lesson 04U72 lesson 04
U72 lesson 04
 
Strom thurmond wellness & fitness center presentation web
Strom thurmond wellness & fitness center presentation webStrom thurmond wellness & fitness center presentation web
Strom thurmond wellness & fitness center presentation web
 
Open source communities in hong kong and asia (2012 updates) (Summer BarCam...
Open source communities in hong kong and asia  (2012 updates)  (Summer BarCam...Open source communities in hong kong and asia  (2012 updates)  (Summer BarCam...
Open source communities in hong kong and asia (2012 updates) (Summer BarCam...
 
From Hk0weather to Open Data
From Hk0weather to Open DataFrom Hk0weather to Open Data
From Hk0weather to Open Data
 
Use open source software to develop ideas at work
Use open source software to develop ideas at workUse open source software to develop ideas at work
Use open source software to develop ideas at work
 
Firefox 4 介紹短講
Firefox 4 介紹短講Firefox 4 介紹短講
Firefox 4 介紹短講
 
Présentation Tag The City 20 Septembre
Présentation Tag The City 20 SeptembrePrésentation Tag The City 20 Septembre
Présentation Tag The City 20 Septembre
 
Mozilla - Openness of the Web
Mozilla - Openness of the WebMozilla - Openness of the Web
Mozilla - Openness of the Web
 
Software Freedom and Open Source Community
Software Freedom and Open Source CommunitySoftware Freedom and Open Source Community
Software Freedom and Open Source Community
 
香港中文開源軟件翻譯
香港中文開源軟件翻譯香港中文開源軟件翻譯
香港中文開源軟件翻譯
 
Global Open Source Development 2011-2014 Review and 2015 Forecast
Global Open Source Development 2011-2014 Review and 2015 ForecastGlobal Open Source Development 2011-2014 Review and 2015 Forecast
Global Open Source Development 2011-2014 Review and 2015 Forecast
 

Similar to How do we develop open source software to help open data ? (MOSC 2013)

Use of Open Data in Hong Kong
Use of Open Data in Hong KongUse of Open Data in Hong Kong
Use of Open Data in Hong Kong
Sammy Fung
 
Local Weather Information and GNOME Shell Extension
Local Weather Information and GNOME Shell ExtensionLocal Weather Information and GNOME Shell Extension
Local Weather Information and GNOME Shell ExtensionSammy Fung
 
Use of Open Data in Hong Kong (LegCo 2014)
Use of Open Data in Hong Kong (LegCo 2014)Use of Open Data in Hong Kong (LegCo 2014)
Use of Open Data in Hong Kong (LegCo 2014)
Sammy Fung
 
Open Data and Web API
Open Data and Web APIOpen Data and Web API
Open Data and Web API
Sammy Fung
 
Web Scraping_ Gathering Data from Websites.pptx
Web Scraping_ Gathering Data from Websites.pptxWeb Scraping_ Gathering Data from Websites.pptx
Web Scraping_ Gathering Data from Websites.pptx
HitechIOT
 
Gsoc proposal 2021 polaris
Gsoc proposal 2021 polarisGsoc proposal 2021 polaris
Gsoc proposal 2021 polaris
AyushBansal122
 
The Power of Semantic Technologies to Explore Linked Open Data
The Power of Semantic Technologies to Explore Linked Open DataThe Power of Semantic Technologies to Explore Linked Open Data
The Power of Semantic Technologies to Explore Linked Open Data
Ontotext
 
Gsoc proposal
Gsoc proposalGsoc proposal
Gsoc proposal
AyushBansal122
 
Access Open Data with Open Source Software Tools
Access Open Data with Open Source Software ToolsAccess Open Data with Open Source Software Tools
Access Open Data with Open Source Software ToolsSammy Fung
 
Simplified News Analytics in Presidential Election with Google Cloud Platform
Simplified News Analytics in Presidential Election with Google Cloud PlatformSimplified News Analytics in Presidential Election with Google Cloud Platform
Simplified News Analytics in Presidential Election with Google Cloud Platform
Imre Nagi
 
Stop making tools! Nobody likes them anyway...
Stop making tools! Nobody likes them anyway...Stop making tools! Nobody likes them anyway...
Stop making tools! Nobody likes them anyway...
Christophe Guéret
 
Data Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFixData Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFix
C4Media
 
Open Data Node - Platform and Methodology - 2015-May
Open Data Node - Platform and Methodology - 2015-MayOpen Data Node - Platform and Methodology - 2015-May
Open Data Node - Platform and Methodology - 2015-May
Comsode - FP7 project
 
Web scraping 1 2-3 with python + scrapy (Summer BarCampHK 2012 version)
Web scraping 1 2-3 with python + scrapy (Summer BarCampHK 2012 version)Web scraping 1 2-3 with python + scrapy (Summer BarCampHK 2012 version)
Web scraping 1 2-3 with python + scrapy (Summer BarCampHK 2012 version)Sammy Fung
 
PSUG 1 - 2024-01-22 - Onboarding Best Practices
PSUG 1 - 2024-01-22 - Onboarding Best PracticesPSUG 1 - 2024-01-22 - Onboarding Best Practices
PSUG 1 - 2024-01-22 - Onboarding Best Practices
Tomas Moser
 
Akash rajguru project report sem v
Akash rajguru project report sem vAkash rajguru project report sem v
Akash rajguru project report sem v
Akash Rajguru
 
Nt1310 Final Exam Questions And Answers
Nt1310 Final Exam Questions And AnswersNt1310 Final Exam Questions And Answers
Nt1310 Final Exam Questions And Answers
Lisa Williams
 
Deep Web
Deep WebDeep Web
Deep Web
Gol D Roger
 
What do we want computers to do for us?
What do we want computers to do for us? What do we want computers to do for us?
What do we want computers to do for us?
Andrea Volpini
 

Similar to How do we develop open source software to help open data ? (MOSC 2013) (20)

Ice dec04-04-sammy
Ice dec04-04-sammyIce dec04-04-sammy
Ice dec04-04-sammy
 
Use of Open Data in Hong Kong
Use of Open Data in Hong KongUse of Open Data in Hong Kong
Use of Open Data in Hong Kong
 
Local Weather Information and GNOME Shell Extension
Local Weather Information and GNOME Shell ExtensionLocal Weather Information and GNOME Shell Extension
Local Weather Information and GNOME Shell Extension
 
Use of Open Data in Hong Kong (LegCo 2014)
Use of Open Data in Hong Kong (LegCo 2014)Use of Open Data in Hong Kong (LegCo 2014)
Use of Open Data in Hong Kong (LegCo 2014)
 
Open Data and Web API
Open Data and Web APIOpen Data and Web API
Open Data and Web API
 
Web Scraping_ Gathering Data from Websites.pptx
Web Scraping_ Gathering Data from Websites.pptxWeb Scraping_ Gathering Data from Websites.pptx
Web Scraping_ Gathering Data from Websites.pptx
 
Gsoc proposal 2021 polaris
Gsoc proposal 2021 polarisGsoc proposal 2021 polaris
Gsoc proposal 2021 polaris
 
The Power of Semantic Technologies to Explore Linked Open Data
The Power of Semantic Technologies to Explore Linked Open DataThe Power of Semantic Technologies to Explore Linked Open Data
The Power of Semantic Technologies to Explore Linked Open Data
 
Gsoc proposal
Gsoc proposalGsoc proposal
Gsoc proposal
 
Access Open Data with Open Source Software Tools
Access Open Data with Open Source Software ToolsAccess Open Data with Open Source Software Tools
Access Open Data with Open Source Software Tools
 
Simplified News Analytics in Presidential Election with Google Cloud Platform
Simplified News Analytics in Presidential Election with Google Cloud PlatformSimplified News Analytics in Presidential Election with Google Cloud Platform
Simplified News Analytics in Presidential Election with Google Cloud Platform
 
Stop making tools! Nobody likes them anyway...
Stop making tools! Nobody likes them anyway...Stop making tools! Nobody likes them anyway...
Stop making tools! Nobody likes them anyway...
 
Data Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFixData Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFix
 
Open Data Node - Platform and Methodology - 2015-May
Open Data Node - Platform and Methodology - 2015-MayOpen Data Node - Platform and Methodology - 2015-May
Open Data Node - Platform and Methodology - 2015-May
 
Web scraping 1 2-3 with python + scrapy (Summer BarCampHK 2012 version)
Web scraping 1 2-3 with python + scrapy (Summer BarCampHK 2012 version)Web scraping 1 2-3 with python + scrapy (Summer BarCampHK 2012 version)
Web scraping 1 2-3 with python + scrapy (Summer BarCampHK 2012 version)
 
PSUG 1 - 2024-01-22 - Onboarding Best Practices
PSUG 1 - 2024-01-22 - Onboarding Best PracticesPSUG 1 - 2024-01-22 - Onboarding Best Practices
PSUG 1 - 2024-01-22 - Onboarding Best Practices
 
Akash rajguru project report sem v
Akash rajguru project report sem vAkash rajguru project report sem v
Akash rajguru project report sem v
 
Nt1310 Final Exam Questions And Answers
Nt1310 Final Exam Questions And AnswersNt1310 Final Exam Questions And Answers
Nt1310 Final Exam Questions And Answers
 
Deep Web
Deep WebDeep Web
Deep Web
 
What do we want computers to do for us?
What do we want computers to do for us? What do we want computers to do for us?
What do we want computers to do for us?
 

More from Sammy Fung

Python 爬網⾴工具 - Scrapy 介紹
Python 爬網⾴工具 - Scrapy 介紹Python 爬網⾴工具 - Scrapy 介紹
Python 爬網⾴工具 - Scrapy 介紹
Sammy Fung
 
DevRel - Transform article writing from printing to online
DevRel - Transform article writing from printing to onlineDevRel - Transform article writing from printing to online
DevRel - Transform article writing from printing to online
Sammy Fung
 
Introduction to Open Source by opensource.hk (2019 Edition)
Introduction to Open Source by opensource.hk (2019 Edition)Introduction to Open Source by opensource.hk (2019 Edition)
Introduction to Open Source by opensource.hk (2019 Edition)
Sammy Fung
 
My Open Source Journey - Developer and Community
My Open Source Journey - Developer and CommunityMy Open Source Journey - Developer and Community
My Open Source Journey - Developer and Community
Sammy Fung
 
Introduction to development with Django web framework
Introduction to development with Django web frameworkIntroduction to development with Django web framework
Introduction to development with Django web framework
Sammy Fung
 
Open Source Technology and Community
Open Source Technology and CommunityOpen Source Technology and Community
Open Source Technology and Community
Sammy Fung
 
Installation of LAMP Server with Ubuntu 14.10 Server Edition
Installation of LAMP Server with Ubuntu 14.10 Server EditionInstallation of LAMP Server with Ubuntu 14.10 Server Edition
Installation of LAMP Server with Ubuntu 14.10 Server Edition
Sammy Fung
 
Building your own job site with Drupal
Building your own job site with DrupalBuilding your own job site with Drupal
Building your own job site with Drupal
Sammy Fung
 
Software Freedom and Community
Software Freedom and CommunitySoftware Freedom and Community
Software Freedom and Community
Sammy Fung
 
Open Source Job Board
Open Source Job BoardOpen Source Job Board
Open Source Job Board
Sammy Fung
 
Introduction of Mozilla Hong Kong (COSCUP 2014)
Introduction of Mozilla Hong Kong (COSCUP 2014)Introduction of Mozilla Hong Kong (COSCUP 2014)
Introduction of Mozilla Hong Kong (COSCUP 2014)Sammy Fung
 
Introduction of Open Source Job Board with Drupal CMS
Introduction of Open Source Job Board with Drupal CMSIntroduction of Open Source Job Board with Drupal CMS
Introduction of Open Source Job Board with Drupal CMSSammy Fung
 
Python, web scraping and content management: Scrapy and Django
Python, web scraping and content management: Scrapy and DjangoPython, web scraping and content management: Scrapy and Django
Python, web scraping and content management: Scrapy and DjangoSammy Fung
 
Mozilla Community and Hong Kong
Mozilla Community and Hong KongMozilla Community and Hong Kong
Mozilla Community and Hong Kong
Sammy Fung
 
ITFest 2014 - Open Source Marketing
ITFest 2014 - Open Source MarketingITFest 2014 - Open Source Marketing
ITFest 2014 - Open Source Marketing
Sammy Fung
 
How Open Data can help entrepreneurs - ITFest 2014 E2
How Open Data can help entrepreneurs - ITFest 2014 E2How Open Data can help entrepreneurs - ITFest 2014 E2
How Open Data can help entrepreneurs - ITFest 2014 E2
Sammy Fung
 
Air Pollution Weather Map at OpenDataHK.make.02
Air Pollution Weather Map at OpenDataHK.make.02Air Pollution Weather Map at OpenDataHK.make.02
Air Pollution Weather Map at OpenDataHK.make.02Sammy Fung
 
15+ years of open source movements in Hong Kong
15+ years of open source movements in Hong Kong15+ years of open source movements in Hong Kong
15+ years of open source movements in Hong Kong
Sammy Fung
 
Introduction of g0v.tw at OpenDataHK.meet.12
Introduction of g0v.tw at OpenDataHK.meet.12Introduction of g0v.tw at OpenDataHK.meet.12
Introduction of g0v.tw at OpenDataHK.meet.12
Sammy Fung
 

More from Sammy Fung (19)

Python 爬網⾴工具 - Scrapy 介紹
Python 爬網⾴工具 - Scrapy 介紹Python 爬網⾴工具 - Scrapy 介紹
Python 爬網⾴工具 - Scrapy 介紹
 
DevRel - Transform article writing from printing to online
DevRel - Transform article writing from printing to onlineDevRel - Transform article writing from printing to online
DevRel - Transform article writing from printing to online
 
Introduction to Open Source by opensource.hk (2019 Edition)
Introduction to Open Source by opensource.hk (2019 Edition)Introduction to Open Source by opensource.hk (2019 Edition)
Introduction to Open Source by opensource.hk (2019 Edition)
 
My Open Source Journey - Developer and Community
My Open Source Journey - Developer and CommunityMy Open Source Journey - Developer and Community
My Open Source Journey - Developer and Community
 
Introduction to development with Django web framework
Introduction to development with Django web frameworkIntroduction to development with Django web framework
Introduction to development with Django web framework
 
Open Source Technology and Community
Open Source Technology and CommunityOpen Source Technology and Community
Open Source Technology and Community
 
Installation of LAMP Server with Ubuntu 14.10 Server Edition
Installation of LAMP Server with Ubuntu 14.10 Server EditionInstallation of LAMP Server with Ubuntu 14.10 Server Edition
Installation of LAMP Server with Ubuntu 14.10 Server Edition
 
Building your own job site with Drupal
Building your own job site with DrupalBuilding your own job site with Drupal
Building your own job site with Drupal
 
Software Freedom and Community
Software Freedom and CommunitySoftware Freedom and Community
Software Freedom and Community
 
Open Source Job Board
Open Source Job BoardOpen Source Job Board
Open Source Job Board
 
Introduction of Mozilla Hong Kong (COSCUP 2014)
Introduction of Mozilla Hong Kong (COSCUP 2014)Introduction of Mozilla Hong Kong (COSCUP 2014)
Introduction of Mozilla Hong Kong (COSCUP 2014)
 
Introduction of Open Source Job Board with Drupal CMS
Introduction of Open Source Job Board with Drupal CMSIntroduction of Open Source Job Board with Drupal CMS
Introduction of Open Source Job Board with Drupal CMS
 
Python, web scraping and content management: Scrapy and Django
Python, web scraping and content management: Scrapy and DjangoPython, web scraping and content management: Scrapy and Django
Python, web scraping and content management: Scrapy and Django
 
Mozilla Community and Hong Kong
Mozilla Community and Hong KongMozilla Community and Hong Kong
Mozilla Community and Hong Kong
 
ITFest 2014 - Open Source Marketing
ITFest 2014 - Open Source MarketingITFest 2014 - Open Source Marketing
ITFest 2014 - Open Source Marketing
 
How Open Data can help entrepreneurs - ITFest 2014 E2
How Open Data can help entrepreneurs - ITFest 2014 E2How Open Data can help entrepreneurs - ITFest 2014 E2
How Open Data can help entrepreneurs - ITFest 2014 E2
 
Air Pollution Weather Map at OpenDataHK.make.02
Air Pollution Weather Map at OpenDataHK.make.02Air Pollution Weather Map at OpenDataHK.make.02
Air Pollution Weather Map at OpenDataHK.make.02
 
15+ years of open source movements in Hong Kong
15+ years of open source movements in Hong Kong15+ years of open source movements in Hong Kong
15+ years of open source movements in Hong Kong
 
Introduction of g0v.tw at OpenDataHK.meet.12
Introduction of g0v.tw at OpenDataHK.meet.12Introduction of g0v.tw at OpenDataHK.meet.12
Introduction of g0v.tw at OpenDataHK.meet.12
 

Recently uploaded

Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
Peter Spielvogel
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
James Anderson
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
Matthew Sinclair
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
Neo4j
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
Adtran
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
nkrafacyberclub
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
Alpen-Adria-Universität
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
Neo4j
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 

Recently uploaded (20)

Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 

How do we develop open source software to help open data ? (MOSC 2013)

  • 1. How do we develop open source to help open data Sammy Fung sammy.hk Malaysia Open Source Conference 2013
  • 2. We want a better life with public data.
  • 3. We want a easier way to access the public data.
  • 4. Agenda ● What is Open Data ? ● Use of Open Source Software in web crawling. ● Starting new Open Source project hk0weather to create Open Weather Data.
  • 5. Sammy Fung ● Software Developer – to use and develop open source sofware. – Perl → PHP → Python. – interests on Data Mining / Web Crawling. – works at internet service company 43 Global to deploy OpenStack cloud service.
  • 6. Sammy Fung ● Open Source Community Leader. – Founding Chairman, Hong Kong Linux User Group. – Community Manager, opensource.hk. – GNOME Asia committee member. – Mozilla Rep. – Program committee member of COSCUP - the largest Open Source conference in Taiwan. ● Blogger at sammy.hk.
  • 7. What is Open Data ?
  • 8. Open Data Three Laws of Open Government Data by David Eaves. 1.If it can't be spidered or indexed, it doesn't exist. 2.If it isn't available in open and machine readable format, it can't engage. 3.If a legal framework doesn't allow it to be repurposed, it doesn't empower. http://eaves.ca/2009/09/30/three-law-of-open-government-data/
  • 9. Open Data ● Tim Berners-Lee, the inventor of the Web. – 5stardata.info – 5 star deployment scheme of Open Data.
  • 10. * One Star - Open Data 1.make your stuff available on the Web (whatever format) under an open license. 2.make it available as structured data (e.g., Excel instead of image scan of a table) 3.use non-proprietary formats (e.g., CSV instead of Excel) 4.use URIs to denote things, so that people can point at your stuff. 5.link your data to other data to provide context. 5stardata.info by Tim Berners-Lee, the inventor of the Web.
  • 11. ** Two Star - Open Data 1.make your stuff available on the Web (whatever format) under an open license. 2.make it available as structured data (e.g., Excel instead of image scan of a table) 3.use non-proprietary formats (e.g., CSV instead of Excel) 4.use URIs to denote things, so that people can point at your stuff. 5.link your data to other data to provide context. 5stardata.info by Tim Berners-Lee, the inventor of the Web.
  • 12. *** Three Star - Open Data 1.make your stuff available on the Web (whatever format) under an open license. 2.make it available as structured data (e.g., Excel instead of image scan of a table) 3.use non-proprietary formats (e.g., CSV instead of Excel) 4.use URIs to denote things, so that people can point at your stuff. 5.link your data to other data to provide context. 5stardata.info by Tim Berners-Lee, the inventor of the Web.
  • 13. **** Four Star - Open Data 1.make your stuff available on the Web (whatever format) under an open license. 2.make it available as structured data (e.g., Excel instead of image scan of a table) 3.use non-proprietary formats (e.g., CSV instead of Excel) 4.use URIs to denote things, so that people can point at your stuff. 5.link your data to other data to provide context. 5stardata.info by Tim Berners-Lee, the inventor of the Web.
  • 14. ***** Five Star - Open Data 1.make your stuff available on the Web (whatever format) under an open license. 2.make it available as structured data (e.g., Excel instead of image scan of a table) 3.use non-proprietary formats (e.g., CSV instead of Excel) 4.use URIs to denote things, so that people can point at your stuff. 5.link your data to other data to provide context. 5stardata.info by Tim Berners-Lee, the inventor of the Web.
  • 15. Legco Meeting Minutes and Voting Results
  • 16. Legco Meeting Minutes and Voting Results
  • 17. Weather Information in Hong Kong ● Hong Kong Observatory – Hourly Hong Kong Weather Report – Regional Weather in Hong Kong (10 min updates) – Weather Forecast and Weekly Weather Forecast – Typhoon Report and Forecast
  • 20. Weather at Data.One ● My Chinese Blog Post 'Progress of Open Government Data in Hong Kong' on 2013/1/17. ● Data.One released on 2011/3/31. ● Weather at Data.One provides 7 dataset URLs, returns RSS (XML) format (Eng/TChi/SChi) – One word: Useless. – Data.One dataset (RSS) is completely different with HKO own paid service (XML).
  • 21. Weather at Data.One ● Example - Current local weather report: ● Plain text report in RSS. ● Difference to quote report content: – Website: a pair of HTML tags, eg. <PRE>....</PRE>. – Data.One: a pair of RSS description tags, <description>....</description>. ● Other weather data is missing, eg. Regional temperture updates per each 12 mins.
  • 22. Weather at Data.One ● Weather at Data.One is 'report' but not 'data'. ● Weather RSS is already released by HKO before launch of Data.One. ● Technically, json/xml format is better readable by computer programs.
  • 23. Open Data is important to citizens.
  • 24. User of Open Source Software in web crawling
  • 25. Web Scraping ● a computer software technique of extracting information from websites. (Wikipedia) ● for business, hobbies, research purposes.
  • 26. Web Scraping ● Look for right URLs to scrap. ● Look for right content from webpages. ● Saving data into data store. ● When to run the web scraping program ?
  • 27. Use of Open Source Software in Web Crawling ● Use Open Source Tools to collect useful and meaningful machine-readable data. ● Doesn't need to wait provider to release data in machine-readable format.
  • 28. Open Source Tools ● Python programming lanugage ● with Regular Expression library ● Scrapy web crawling framework
  • 29. Why python + scrapy ? ● python: my current favourite programming language for few years. ● scrapy: web crawling framework written in Python.
  • 30. What is Scrapy ? ● An open source web scraping framework for Python. ● Scrapy is a fast high-level screen scraping and web crawling framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.
  • 31. Scrapy Features ● define data you want to scrapy ● write spider to extract data ● Built-in: selecting and extracting data from HTML and XML ● Built-in: JSON, CSV, XML output ● Interactive shell console ● Built-in: web service, telnet console, logging ● Others
  • 32. Programme List of Paid TVs in 2004
  • 33. Programme List of Paid TVs in 2004 ● I want to know live football match was showing on which channel. ● Paid TV web site = M$ + IIS + ASP + Flash ● Slow....... Very Slow...... Extremely Slow! ● Couldn't connect at any peak hours! ● Wrote my first web crawler in PHP in 2004.
  • 34. Public Transportation in 2006-2010 ● Kowloon Motor Bus (KMB) – No map view for a bus route ● Public Transportation Enquiry System (PTES) – Exteremly Poor, Ugly (or much worse) map UI on PTES.
  • 35. HK Observatory and Joint Typhoon Warning Center ● Any typhoon is coming to Hong Kong ? And When will it come ? ● No easy data exchange format. ● No RSS nor ATOM. ● We aren't check websites everyday.
  • 36. My Products ● WeatherHK ← ← ← ● TCTrack
  • 37. WeatherHK ● http://twitter.com/weatherhk ● hourly current weather report ● weather forecast report ● tropical signal warning
  • 38. WeatherHK ● Backend: Python + Scrapy + Database + Twitter + NNTP...... ● Frontend: Twitter + Newsgroup
  • 40. My Products ● WeatherHK ● TCTrack ← ← ←
  • 41. TCTrack ● http://sammy.hk/projects/tctrack/tctrack.php ● Plot TC current and forecast tracks over Google Map. ● Source: – JTWC – HKO
  • 42. TCTrack ● http://sammy.hk/projects/tctrack/tctrack.php ● Probably first tctrack map in HK using GoogleMap ● Use of GMap: TCTrack -> Weather Underground Hong Kong -> HKO
  • 43. TCTrack ● http://twitter.com/tctrack ● Tweet JTWC updates for Northwest Pacific.
  • 44. Releases information to citizens in a better presentation.
  • 45. Starting new Open Source project hk0weather to create Open Weather Data.
  • 46. Starting new Open Source projects to create Open Data ● Develop a open source project. ● Release data in standard machine-readable data format.
  • 47. hk0weather ● https://github.com/sammyfung/hk0weather ● Open Source Hong Kong Weather Project. ● convert to JSON data from HKO webpages. ● python + scrapy ● 1st version: from current weather report, extracting temperture and humidity from 20+ weather stations, export in json format.
  • 48. hk0weather ● https://github.com/sammyfung/hk0weather ● $ virtualenv hk0weatherenv ● $ source hk0weatherenv/bin/activate ● $ pip install scrapy ● $ git clone https://github.com/sammyfung/hk0weather.git ● $ cd hk0weather ● $ scrapy crawl currwx -t json -o testresult
  • 49. hk0weather ● Python – import re ● Scrapy – web crawling framework written in Python. – HtmlXPathSelector. – built-in JSON, CSV, XML output.
  • 50. hk0weather [{"humidity": 80, "station": "hko", "temperture": 17, "time": 1360785720}, {"station": "kingspark", "temperture": 16, "time": 1360785720}, {"station": "wongchukhang", "temperture": 17, "time": 1360785720}, {"station": "takwuling", "temperture": 16, "time": 1360785720}, {"station": "laufaushan", "temperture": 15, "time": 1360785720}, {"station": "taipo", "temperture": 16, "time": 1360785720}, {"station": "shatin", "temperture": 17, "time": 1360785720}, {"station": "tuenmun", "temperture": 17, "time": 1360785720}, {"station": "tseungkwano", "temperture": 16, "time": 1360785720}, {"station": "saikung", "temperture": 16, "time": 1360785720}, {"station": "cheungchau", "temperture": 17, "time": 1360785720}, {"station": "cheungchau", "temperture": 17, "time": 1360785720}, {"station": "tsingyi", "temperture": 17, "time": 1360785720}, {"station": "shekkong", "temperture": 15, "time": 1360785720}, {"station": "tsuenwanhokoon", "temperture": 15, "time": 1360785720}, {"station": "tsuenwanshingmunvalley", "temperture": 17, "time": 1360785720}, {"station": "hongkongpark", "temperture": 17, "time": 1360785720}, {"station": "shaukeiwan", "temperture": 16, "time": 1360785720}, {"station": "kowlooncity", "temperture": 16, "time": 1360785720}, {"station": "happyvalley", "temperture": 18, "time": 1360785720}, {"station": "wongtaisin", "temperture": 17, "time": 1360785720}, {"station": "stanley", "temperture": 16, "time": 1360785720}, {"station": "kwuntong", "temperture": 15, "time": 1360785720}, {"station": "shamshuipo", "temperture": 17, "time": 1360785720}]
  • 51. Items.py class Hk0WeatherItem(Item): time = Field() station = Field() temperture = Field() humidity = Field()
  • 53. Currwx.py def parse(self, response): laststation = '' temperture = int() stations = [] hxs = HtmlXPathSelector(response) report = hxs.select('//div[@id="ming"]')
  • 54. libhk0 class hk0: stations = [ (u' 天 文 台 ', 'hko'), (u' 京 士 柏 ', 'kingspark'), (u' 黃 竹 坑 ', 'wongchukhang'), (u' 打 鼓 嶺 ', 'takwuling'), (u' 流 浮 山 ', 'laufaushan'),
  • 55. libhk0 class hk0: def gettime(self, report): … def hk0current(self, report): …
  • 56. Agenda ● What is Open Data ? ● Use of Open Source Software in web crawling. ● Starting new Open Source project hk0weather to create Open Weather Data.
  • 57. We want a easier way to access the public data.
  • 58. We want a better life with public data.