The COMSODE project has received funding from the Seventh Framework Programme of the European Union under grant agreement number 611358.
Open Data Node
Platform and Methodology
Peter Hanečák <peter.hanecak@eea.sk>, EEA s.r.o.
May, 2015
Who am I
●
Peter Hanečák <peter.hanecak@eea.sk>
●
member of COMSODE project
– leader of WP2 (architecture and design of ODN)
– leader of WP4 (implementation of ODN)
●
enthusiast in many things “Open”,
active in NGOs and other communities
– member of OpenData.sk and SOIT
– Fedora Linux packager
https://www.facebook.com/hany.sk
https://www.linkedin.com/in/peterhanecak
https://twitter.com/PHanecak
Agenda
●
What is COMSODE
●
What is COMSODE Methodology
●
What is Open Data Node (ODN)
●
Integration with ODN
●
HW and SW requirements
●
Future of ODN
COMSODE
●
Components Supporting the Open Data Exploitation
●
main target: publication platform for Open Data
– software tool
●
supplemental goal: methodology for publication of Open Data
– mainly for those with little or no experience with Open Data
– because software by itself is of little use to such people and organizations
●
validation: pilots
– pilots by 3rd parties
– pilot by COMSODE itself: 150 datasets + 3rd party-like Search app by Spinque
COMSODE Methodology
●
publication plan
●
preparation of publication
●
realization of publication
●
archiving
reference:
●
http://www.comsode.eu/index.php/deliverables/
●
Deliverable D5.1 + ANNEX 1 and 2
Open Data Node
helps with many of the publication steps outlined in the Methodology
handles the complexities present in data sources
makes it easy to publish high-quality (Linked) Open Data from those sources
in an automated fashion
most common use cases: 2* -> 3*+ (on the 5-star Open Data scale)
●
input: XLS, SQL DB, ...
●
transformations: XLS, SQL -> CSV, „bad CSV“ -> CSV, CSV -> Linked Data
●
output:
– tabular/relational data: CSV, REST API
– Linked Data: RDF, SPARQL endpoint
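The „bad CSV“ -> CSV -> Linked Data chain listed above can be sketched in a few lines of Python. This is only an illustration using the standard library, not a description of how ODN implements these steps; the semicolon-delimited input, the column names and the http://example.org/ URIs are assumptions made up for the example.

```python
import csv
import io

# Hypothetical "bad CSV": semicolon-delimited, stray whitespace, a blank line.
BAD_CSV = "id ; name ; city\n1; Alice ;Bratislava\n\n2;Bob; Kosice \n"

BASE = "http://example.org/"  # assumed namespace, for illustration only


def clean_rows(text, delimiter=";"):
    """Parse the messy input and return a list of dicts with trimmed values."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    # Blank lines come back as empty lists and are skipped here.
    rows = [[cell.strip() for cell in row] for row in reader if row]
    header, data = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in data]


def to_csv(records):
    """Serialize the cleaned records as standard comma-separated CSV."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(records[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()


def escape_literal(value):
    """Escape backslashes and double quotes for an N-Triples string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def to_ntriples(records, key="id"):
    """Emit one triple per non-key column, using the key column to mint URIs."""
    lines = []
    for rec in records:
        subject = f"<{BASE}record/{rec[key]}>"
        for column, value in rec.items():
            if column == key:
                continue
            lines.append(
                f'{subject} <{BASE}prop/{column}> "{escape_literal(value)}" .'
            )
    return "\n".join(lines)


records = clean_rows(BAD_CSV)
clean_csv = to_csv(records)
ntriples = to_ntriples(records)
print(clean_csv)
print(ntriples)
```

The cleaned CSV corresponds to the tabular output and the N-Triples text to the Linked Data output; a real pipeline would also need an agreed vocabulary instead of the ad-hoc property URIs used here.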
(figure labels: Open Data, not Open Data, Open Data Node)
Open Data Node
ODN can be used by:
●
data publishers
●
data users
Many publishers are also users, thus the data ecosystem is quite complex.
ODN can be used in many roles within that ecosystem.
Open Data Node
●
platform supporting the whole
Open Data publishing process
●
modular design
●
allows building a distributed
network of nodes
●
can be integrated into
existing infrastructure
Open Data Node
●
extraction, transformation and
enrichment of internal data
●
storage of resulting Open Data
●
publishing of stored Open Data
on the Web
●
cataloging functionality
●
management functions
Open Data Node
●
publication plan
●
preparation of publication
●
realization of publication
●
archiving
Integration with Open Data Node
●
data harvesting side
●
data publication side
●
special cases
Integration with Open Data Node
data publication side: as implied by most common use-cases
●
files: CSV, RDF
●
API: REST API, SPARQL endpoint
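For a data user, consuming the SPARQL endpoint on the publication side could look roughly like the sketch below. It relies only on the standard SPARQL 1.1 Protocol (query passed as the `query` parameter, JSON results requested via the `Accept` header); the endpoint URL is hypothetical and the response is a canned sample, so the code runs without a live ODN instance.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "http://odn.example.org/sparql"  # hypothetical endpoint URL


def build_request(endpoint, query):
    """Build a SPARQL 1.1 Protocol GET request asking for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def bindings_to_rows(payload):
    """Flatten a SPARQL JSON results document into plain dicts."""
    variables = payload["head"]["vars"]
    return [
        {var: binding[var]["value"] for var in variables if var in binding}
        for binding in payload["results"]["bindings"]
    ]


QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"
request = build_request(ENDPOINT, QUERY)

# A canned response in the SPARQL JSON results format, so the sketch runs offline.
SAMPLE = json.loads("""
{"head": {"vars": ["s", "p", "o"]},
 "results": {"bindings": [
   {"s": {"type": "uri", "value": "http://example.org/record/1"},
    "p": {"type": "uri", "value": "http://example.org/prop/name"},
    "o": {"type": "literal", "value": "Alice"}}]}}
""")
rows = bindings_to_rows(SAMPLE)
print(request.full_url)
print(rows)

# Against a live endpoint one would instead do something like:
#   with urllib.request.urlopen(request) as resp:
#       rows = bindings_to_rows(json.load(resp))
```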
Integration with Open Data Node
data harvesting side: as implied by most common use-cases
●
files: XLS, „bad CSV“, ... - almost anything(*)
●
API: SQL, SOAP, ... - almost anything(*)
●
plus all the „Open Data files and APIs“
(*) depending on the prominence of a format/technology or the particular interest of a „customer“
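As an illustration of the harvesting side, the SQL -> CSV step could be sketched as follows. An in-memory SQLite database stands in for the publisher's internal database, and the `contracts` table with its columns is invented for the example; this is not ODN's own extractor.

```python
import csv
import io
import sqlite3

# In-memory SQLite stands in for the publisher's internal database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contracts (id INTEGER PRIMARY KEY, supplier TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO contracts (id, supplier, amount) VALUES (?, ?, ?)",
    [(1, "ACME s.r.o.", 1200.5), (2, "Example, a.s.", 300.0)],
)


def table_to_csv(connection, query):
    """Run a query and return its result set as CSV text with a header row."""
    cursor = connection.execute(query)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # cursor.description holds one tuple per column; item 0 is the column name.
    writer.writerow([col[0] for col in cursor.description])
    writer.writerows(cursor)
    return out.getvalue()


csv_text = table_to_csv(conn, "SELECT id, supplier, amount FROM contracts ORDER BY id")
conn.close()
print(csv_text)
```

Using the `csv` module rather than joining strings by hand means values containing commas, such as the second supplier name, are quoted correctly in the output.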
Integration with Open Data Node
special cases:
●
ODN/Management: integration of SSO with your existing infrastructure
●
ODN/Storage: direct access to SPARQL endpoint
●
ODN/InternalCatalog: direct access to management API
●
etc.
HW and SW requirements
HW:
●
CPU: common x86_64 compatible (dual/quad core is recommended)
●
memory: minimum 4 GB (recommended 8 GB) (*)
●
storage: minimum 40 GB (*)
SW:
●
OS: Debian 7.6 „Wheezy“
●
OpenJDK 7
(*) Subject to size of transformed data and requirements on transformation operations.
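A quick pre-installation check against the minimums above might look like this sketch. The thresholds come from the slide; treating "storage" as free space on a given path, and reading memory through `os.sysconf` (available on Linux), are assumptions of the example. A machine sold as "4 GB" may report slightly less usable memory, so the check is indicative only.

```python
import os
import platform
import shutil

GIB = 1024 ** 3
MIN_MEMORY_GIB = 4    # minimum from the slide
MIN_STORAGE_GIB = 40  # minimum from the slide


def check_host(machine, memory_bytes, free_disk_bytes):
    """Return a list of human-readable problems; empty means the minimums are met."""
    problems = []
    if machine not in ("x86_64", "AMD64"):
        problems.append(f"CPU architecture is {machine}, expected x86_64")
    if memory_bytes < MIN_MEMORY_GIB * GIB:
        problems.append(f"memory below {MIN_MEMORY_GIB} GB")
    if free_disk_bytes < MIN_STORAGE_GIB * GIB:
        problems.append(f"free storage below {MIN_STORAGE_GIB} GB")
    return problems


def probe_local_host(path="/"):
    """Collect values for this machine (the os.sysconf names exist on Linux)."""
    memory = os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")
    return platform.machine(), memory, shutil.disk_usage(path).free


# Example with made-up numbers: 8 GB RAM but only 20 GB of free disk.
print(check_host("x86_64", 8 * GIB, 20 * GIB))
```

On the target server, `check_host(*probe_local_host())` would run the same check with real values.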
Future of ODN
Key point: Open Source
Future depends on many factors:
●
strength of communities
– around ODN itself
– around individual components: UnifiedViews, CKAN, PostgreSQL, etc.
●
how well business goes for the commercial partners that use and
maintain ODN (EEA, etc.)
Future of ODN
Key point: Open Source
Future depends on many factors:
●
strength of communities
●
how well business goes for the commercial partners that use and maintain ODN (EEA, etc.)
Existing achievements strengthening the future:
●
consortium around UnifiedViews: three companies and other organizations
●
Slovak government as customer for ODN
●
around 10 COMSODE Pilots in various EU countries
(so far, at various stages)
ODN implementation in Slovakia
In the eDemokracia project, ODN is used as:
●
centralized component
●
de-centralized component
ODN implementation in Slovakia
ODN as part of centralized component:
●
heavily customized
– only some modules used, commercial version of triplestore,
clustered RDBMS, etc.
●
split across multiple servers
●
integrated with other components
– centralized SSO, OCR and content classification services, etc.
●
an “upgrade” for the existing data portal data.gov.sk
●
incorporated as an extension into the top-level government portal slovensko.sk
ODN implementation in Slovakia
ODN as de-centralized component:
●
ODN with little customization
– central catalog and storage preconfigured
– etc.
●
distributed as „live DVD“
●
for government organizations and municipalities