Presentation on the use of open data in Hong Kong, given to the Legislative Council Secretariat. The content combines my presentations at startmeup 2013 and the opendatahk meetup.
3. We want an easier way to access public data.
4. Agenda
● What is Open Data?
● Use of Open Source Software in web crawling.
● Starting a new Open Source project, hk0weather, to create Open Weather Data.
5. Sammy Fung
● Software Developer
  – uses and develops open source software.
  – Perl → PHP → Python.
  – interested in Data Mining / Web Crawling.
  – owns a web and mobile technology startup.
6. Sammy Fung
● 15+ years in Open Source Communities.
  – Founding Chairman, Hong Kong Linux User Group.
  – Founding Chairman, Open Source Hong Kong.
  – Member, GNOME Asia committee.
  – Mozilla Representative.
  – Member, program committee at COSCUP.
    ● Conference for Open Source Coders, Users and Developers.
    ● Largest open source conference in Taiwan.
8. Open Data
Three Laws of Open Government Data by David Eaves.
1. If it can't be spidered or indexed, it doesn't exist.
2. If it isn't available in open and machine readable format, it can't engage.
3. If a legal framework doesn't allow it to be repurposed, it doesn't empower.
http://eaves.ca/2009/09/30/three-law-of-open-government-data/
10. * One Star - Open Data
1. make your stuff available on the Web (whatever format) under an open license.
2. make it available as structured data (e.g., Excel instead of image scan of a table)
3. use non-proprietary formats (e.g., CSV instead of Excel)
4. use URIs to denote things, so that people can point at your stuff.
5. link your data to other data to provide context.
5stardata.info by Tim Berners-Lee, the inventor of the Web.
11. ** Two Star - Open Data
1. make your stuff available on the Web (whatever format) under an open license.
2. make it available as structured data (e.g., Excel instead of image scan of a table)
3. use non-proprietary formats (e.g., CSV instead of Excel)
4. use URIs to denote things, so that people can point at your stuff.
5. link your data to other data to provide context.
5stardata.info by Tim Berners-Lee, the inventor of the Web.
12. *** Three Star - Open Data
1. make your stuff available on the Web (whatever format) under an open license.
2. make it available as structured data (e.g., Excel instead of image scan of a table)
3. use non-proprietary formats (e.g., CSV instead of Excel)
4. use URIs to denote things, so that people can point at your stuff.
5. link your data to other data to provide context.
5stardata.info by Tim Berners-Lee, the inventor of the Web.
13. **** Four Star - Open Data
1. make your stuff available on the Web (whatever format) under an open license.
2. make it available as structured data (e.g., Excel instead of image scan of a table)
3. use non-proprietary formats (e.g., CSV instead of Excel)
4. use URIs to denote things, so that people can point at your stuff.
5. link your data to other data to provide context.
5stardata.info by Tim Berners-Lee, the inventor of the Web.
14. ***** Five Star - Open Data
1. make your stuff available on the Web (whatever format) under an open license.
2. make it available as structured data (e.g., Excel instead of image scan of a table)
3. use non-proprietary formats (e.g., CSV instead of Excel)
4. use URIs to denote things, so that people can point at your stuff.
5. link your data to other data to provide context.
5stardata.info by Tim Berners-Lee, the inventor of the Web.
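The five levels build on each other, so a dataset's rating can be read as "how far down the list does it get before a requirement fails". A minimal Python sketch of that reading follows; the `Dataset` class, its field names and the sample datasets are illustrative assumptions, not part of the 5-star scheme itself.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Publication properties of a dataset (field names are illustrative)."""
    on_web_open_license: bool
    structured: bool
    non_proprietary_format: bool
    uses_uris: bool
    linked_to_other_data: bool


def star_rating(ds: Dataset) -> int:
    """Count stars cumulatively: a level only counts if all lower levels hold."""
    levels = [
        ds.on_web_open_license,
        ds.structured,
        ds.non_proprietary_format,
        ds.uses_uris,
        ds.linked_to_other_data,
    ]
    stars = 0
    for satisfied in levels:
        if not satisfied:
            break
        stars += 1
    return stars


# Hypothetical examples matching the wording of the list above.
scanned_table = Dataset(True, False, False, False, False)
excel_sheet = Dataset(True, True, False, False, False)
csv_file = Dataset(True, True, True, False, False)

print(star_rating(scanned_table), star_rating(excel_sheet), star_rating(csv_file))
```

Under this reading, an image scan of a table with an open license earns one star, an Excel sheet two and a CSV file three.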
16. Open Data in Hong Kong
● Data.One
  – http://www.gov.hk/en/theme/psi
  – released on 2011/3/31.
  – First App Competition on Data.One
    ● Call for submissions open until 2014/02/28.
17. Weather Information in Hong Kong
● Hong Kong Observatory
  – Hourly Hong Kong Weather Report
  – Regional Weather in Hong Kong (10 min updates)
  – Weather Forecast and Weekly Weather Forecast
  – Typhoon Report and Forecast
20. Weather at Data.One
● I posted a blog entry, 'Progress of Open Government Data in Hong Kong', on 2013/01/17.
● Weather at Data.One provides 7 dataset URLs, which return RSS (XML) format (Eng/TChi/SChi).
  – One word: Useless.
  – The Data.One dataset (RSS) is completely different from HKO's own paid service (XML).
21. Weather at Data.One
● Example - Current local weather report:
● Plain text report in RSS.
● The only difference is how the report content is wrapped:
  – Website: a pair of HTML tags, e.g. <PRE>....</PRE>.
  – Data.One: a pair of RSS description tags, <description>....</description>.
● Other weather data is missing, e.g. regional temperature updates every 12 mins.
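To show what consuming such a feed involves, here is a minimal Python sketch that pulls the plain-text report out of an RSS `<description>` element and then recovers numbers from the sentence with regular expressions. The sample feed, its wording and the function names are made up for illustration; they are not actual Hong Kong Observatory or Data.One output.

```python
import re
import xml.etree.ElementTree as ET

# Made-up sample feed for illustration only.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Current Weather Report (sample)</title>
    <item>
      <title>Bulletin (sample)</title>
      <description>At 10 a.m. the air temperature was 21 degrees Celsius and the relative humidity was 65 per cent.</description>
    </item>
  </channel>
</rss>
"""


def extract_report(rss_text: str) -> str:
    """Return the plain-text report held in the first item's description."""
    root = ET.fromstring(rss_text)
    description = root.find("./channel/item/description")
    if description is None:
        return ""
    return (description.text or "").strip()


def parse_reading(report: str) -> dict:
    """Recover numbers from prose; this breaks as soon as the wording changes."""
    temperature = re.search(r"(\d+) degrees Celsius", report)
    humidity = re.search(r"(\d+) per cent", report)
    return {
        "temperature_c": int(temperature.group(1)) if temperature else None,
        "humidity_pct": int(humidity.group(1)) if humidity else None,
    }


report = extract_report(SAMPLE_RSS)
print(parse_reading(report))
```

The XML parsing step is trivial; the fragile part is the regular expressions, because the values are embedded in a sentence rather than published as fields.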
22. Weather at Data.One
● Weather at Data.One is a 'report', not 'data'.
● The weather RSS was already released by HKO before the launch of Data.One.
● Technically, JSON/XML formats are easier for computer programs to read.
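For contrast, a short Python sketch of reading the same kind of information when it is published as structured JSON. The JSON shape, station names and field names are hypothetical; no actual Data.One or HKO feed is assumed to look like this.

```python
import json

# Hypothetical JSON shape for a regional temperature feed.
SAMPLE_JSON = """
{
  "stations": [
    {"name": "Station A", "temperature_c": 21.4},
    {"name": "Station B", "temperature_c": 19.8}
  ]
}
"""

data = json.loads(SAMPLE_JSON)

# Each value is already a typed field, so no text pattern matching is needed.
readings = {s["name"]: s["temperature_c"] for s in data["stations"]}
warmest = max(readings, key=readings.get)

print(readings)
print(warmest)
```

With fields instead of prose, a program can compare, sort or aggregate readings directly.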
24. Data.One
● JSON/XML (18 datasets)
  – Air Pollution.
    ● Past 24-hour Air Pollution Index from stations.
  – Approved Charitable Fund-raising Activities.
  – Restaurant and Food Licences.
  – Details of facility locations.
  – Reward Notices from Police Force.
  – Marine Traffic (Arrival/Departure).
  – Traffic Speed and special news.
  – EventHK information.
25. Data.One
● RSS (10 datasets)
  – Weather Information (7 datasets)
  – Beach Water Quality (1 dataset)
  – Current Air Pollution Index range and forecast (2 datasets)
27. Data.One
● CSV
  – Past Record of Air Pollution Index
  – Locations of Public Facility and GovWifi
  – Marine Shipping directory of HK
● HTML
  – HTML version of Marine Traffic.
● XLS, MDB
  – 2011 Population Census.
  – Property Market Statistics.
  – Monthly Digested Stats and Registers of Auth Persons from Building Dept.
  – Routes and fares of public transport.
28. Data.One
● Many departments do not release their useful data, and only release information already available on their websites.
  – Few of them keep open data available on their own.
● Most of them do not understand what 'real' open data is.
  – Open data formats instead of proprietary data formats.
  – Data instead of information.
  – Usefulness of data.
● Some departments should manage their open data in a better data structure.
31. Legco Meeting Minutes
and Voting Results
●
In October 2013, LegCo started to publish voting
results of the House Committee in XML.
●
It is not part of the Data.One project.
●
My open source software for the LegCo vote
result XML:
–
http://github.com/sammyfung/legcovotes
32. Digital21 Strategy
Public Consultation Document
(G) Public Sector Information (PSI) as Default
"34. Through different channels (like press releases, publications, websites, etc.), the
Government releases a lot of information in different areas. However, most of such
information can only be read but cannot be used. In view of the immense benefits of
widening access to PSI for free and easy re-use, we propose to make all Government
information released for public consumption machine-readable by default. Where
appropriate, datasets will be released with application programming interfaces (APIs),
providing predefined functions to make their retrieval easier."
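(G) Making Public Information Widely Available (translated from the Chinese version)
"34. The Government releases a large amount of information in different areas through various
channels (such as press releases, publications and websites). However, most of this information
can only be read and cannot be used. In view of the great benefits of opening up public
information for free re-use, we propose that all Government information released for public use
be compiled in digital format. Where applicable, application programming interfaces will be
provided when the information is released, offering preset functions so that the public can
retrieve the information easily."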
33. Digital21 Strategy
Public Consultation Document
"33. PSI datasets can be used and meshed together to create innovative new applications, as
demonstrated by the creative and useful products and services developed from PSI in Hong Kong
and around the world. For example, using PSI datasets on traffic snapshot images, a number of
mobile apps have been developed to provide real-time traffic situation for users to avoid traffic jams
in planning their traffic routes. Experience from other developed economies shows that widening
access to PSI datasets can open up lucrative business opportunities and bring social benefits. By
tapping the creativity of the community and entrepreneurs, the use of PSI can lead to positive social
outcomes. For instance, in some cities in the United States, application of PSI on hygiene inspections
has led to a significant drop in food poisoning incidents."
35. Digital21 Strategy
Public Consultation Document
"35. Apart from Government data, there are vast amounts of PSI handled,
collected and disseminated by public organisations, which are equally useful
for the development of innovative services and products. Therefore, we
propose to encourage public organisations (e.g. public utilities and transport
operators) to release data owned by them in machine-readable format."
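(Translated from the Chinese version) "35. Apart from Government information, Hong Kong
also has a large amount of public information handled, collected and released by public
organisations, which is equally useful for developing innovative services and products.
We therefore propose to encourage public organisations (such as public utilities and
transport operators) to release information compiled in digital format."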
37. g0v.tw
●
Promote information transparency.
●
Develop an information platform and tools for a
society of citizen participation.
●
Open Source model.
●
Stack Overflow-like Q&A system where the public can
ask for the data they are looking for.
38. g0v.tw
●
Established after Taiwan Yahoo! Open Hack
Day in October 2012.
●
Hackers, Professors, NGO/NPO, Students,
Writers, Visual Media, Legal Professionals.
●
Organized 5 bi-monthly hackathons since
December 2012.
42. Moedict 萌典
●
Raw data from the Ministry of Education (edu.tw).
●
A community-built, web-based Chinese
dictionary with 160,000 Chinese entries and
other entries.
●
Supports auto-completion, search and
offline versions.
●
Source code, versions for other platforms, and data are
available on 3du.tw (hackpad).
52. Web Scraping
●
a computer software technique of extracting
information from websites. (Wikipedia)
●
for business, hobbies, research purposes.
53. Web Scraping
●
Look for the right URLs to scrape.
●
Look for the right content in the webpages.
●
Save the data into a data store.
●
Decide when to run the web scraping program.
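The steps above can be sketched with the Python standard library alone. This is an illustration under stated assumptions, not a definitive implementation: it assumes the wanted content sits inside a `<pre>` element, and the names `PreTextParser`, `fetch`, `extract`, `save`, `scrape` and the `pages` table are hypothetical.

```python
import sqlite3
from html.parser import HTMLParser
from urllib.request import urlopen


class PreTextParser(HTMLParser):
    """Collect the text found inside <pre> elements (assumed location of the content)."""

    def __init__(self):
        super().__init__()
        self._depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <PRE> also arrives as "pre".
        if tag == "pre":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "pre" and self._depth:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth:
            self.chunks.append(data)


def fetch(url):
    """Step 1: download the page at the chosen URL."""
    with urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")


def extract(html):
    """Step 2: pick the wanted content out of the page."""
    parser = PreTextParser()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks).strip()


def save(conn, url, content):
    """Step 3: save the result into a data store (SQLite here)."""
    conn.execute("CREATE TABLE IF NOT EXISTS pages (url TEXT, content TEXT)")
    conn.execute("INSERT INTO pages VALUES (?, ?)", (url, content))
    conn.commit()


def scrape(conn, url, fetcher=fetch):
    """Run steps 1 to 3 once; `fetcher` can be swapped out for offline use."""
    save(conn, url, extract(fetcher(url)))
```

The fourth step is left outside the program: a scheduler such as cron can call `scrape` at whatever interval the source page is updated.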
54. Use of Open Source Software in
Web Crawling
●
Use Open Source tools to collect useful and
meaningful machine-readable data.
●
No need to wait for the provider to release data
in a machine-readable format.
55. Open Source Tools
●
Python programming language
●
with Regular Expression library
●
Scrapy web crawling framework
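A small sketch of the regular-expression part, assuming a plain-text report whose lines look like `Air temperature : 25 degrees Celsius`. The report excerpt, the field names and `parse_report` are hypothetical; a real report may be worded differently and would need its own patterns.

```python
import re

# Hypothetical excerpt of a plain-text weather report.
REPORT = "Air temperature : 25 degrees Celsius\nRelative Humidity : 80 per cent"

# One pattern per value we want to turn into data.
FIELD_PATTERNS = {
    "temperature_c": re.compile(r"Air temperature\s*:\s*(-?\d+)\s*degrees Celsius"),
    "humidity_pct": re.compile(r"Relative Humidity\s*:\s*(\d+)\s*per cent"),
}


def parse_report(text):
    """Return a dict of the numeric fields that could be found in the report text."""
    data = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        if match:
            data[name] = int(match.group(1))
    return data


if __name__ == "__main__":
    print(parse_report(REPORT))
```

Fields that do not match are simply left out of the result, so a change in the report wording shows up as a missing key rather than a crash.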
56. Why Python + Scrapy?
●
Python: my favourite programming
language for the past few years.
●
Scrapy: a web crawling framework written in
Python.
57. What is Scrapy ?
●
An open source web scraping framework for
Python.
●
Scrapy is a fast high-level screen scraping and
web crawling framework, used to crawl
websites and extract structured data from
their pages. It can be used for a wide range of
purposes, from data mining to monitoring
and automated testing.
58. Scrapy Features
●
define the data you want to scrape
●
write spider to extract data
●
Built-in: selecting and extracting data from HTML
and XML
●
Built-in: JSON, CSV, XML output
●
Interactive shell console
●
Built-in: web service, telnet console, logging
●
Others
60. Programme List of Paid TVs in 2004
●
I wanted to know which channel was
showing a live football match.
●
Paid TV web site = M$ + IIS + ASP + Flash
●
Slow....... Very Slow...... Extremely Slow!
●
Couldn't connect during peak hours!
●
Wrote my first web crawler in PHP in 2004.
61. Public Transportation in 2006-2010
●
Kowloon Motor Bus (KMB)
–
No map view for a bus route
●
Public Transportation Enquiry System (PTES)
–
Extremely poor, ugly (or much worse) map UI on
PTES.
62. HK Observatory and Joint Typhoon
Warning Center
●
Is a typhoon coming to Hong Kong? And
when will it arrive?
●
No easy data exchange format.
●
No RSS nor ATOM.
●
We don't check websites every day.
83. Agenda
●
What is Open Data ?
●
Use of Open Source Software in web crawling.
●
Starting new Open Source project hk0weather
to create Open Weather Data.
84. We want an easier way to
access public data.