Fabio Simeoni (FAO)
the virtual repository
standards-based import and publication
Monday, 17 June 13
outline
• about data import and publication
• why it is a problem
• how it can be simplified
• the virtual repository
• where we are
• where we are going
context
• there is an app
- manages data of some type: adds some value to it
• there is data out there
- quite a lot: waiting to be managed
• there are places out there
- quite a few: waiting to disseminate the added value
- repositories: specialised network services
• the app wants to reach out
- with first-class import and publication facilities
what we mean by...
• data import
- pull data from some “source”
- transform it, store it, and use it for app-local purposes
- this is no real-time, fine-grained access
• data publication
- transform data for dissemination purposes
- push it to some “sink”
- this is no real-time, fine-grained update
• “coarse” I/O
scope illustrated
[Diagram: the app, with its internal model and transforms, imports from and publishes to repo 1 and repo 2]
the “average joe”
• import = file upload
- users are the sources: they have the data
- just one use case: what about data in repositories?
- should users discover it and retrieve it on behalf of the app?
• publication = export to file
- users are the sinks: they use the data
- just one use case: what about other consumers?
- should users disseminate data on behalf of repositories?
“average joe” illustrated
[Diagram: users upload files to and download files from the app, which transforms them to and from its internal model]
(fancier variations)
• URI resolution
- users provide URIs, app resolves them
- a step forward, but onus of discovery remains on users
- repositories not ‘on the Web’ still out of the picture
• no publication, app disseminates
- doubles as a repository service
- two different missions/roles/competencies
- require different models, designs, technologies
- would rather integrate specialised solutions in infra
imagine this
• users browse all data ‘nearby’ the app
- metadata describes contents, provenance, size ...
• users pick what data to import
- providing directives on how the app should convert it
• users browse repositories ‘nearby’ the app
- metadata describes location, policy, formats, ...
• users pick where to publish
- providing directives on how the app should convert for it
imagine this
[Mock-up: an IMPORT screen lists assets (columns: NAME, VERS., ORIGIN, ...) and a PUBLISH screen lists repositories (columns: NAME, ...); on each screen the user chooses an entry and customises the conversion]
why don’t we see it
• it’s not simple
- many sources/sinks, APIs, formats, transforms
- difficult to paper over differences for users
- difficult to handle distributed interactions properly
- overall, a non-trivial interoperability problem
• it’s not cost-effective
- it’s not the core business of the app
- core business is to manage, not I/O
wrong assumptions
• costs should fall entirely on the app
- to bridge across many formats and APIs over the network
• repositories can’t help
- yet their core business is precisely to disseminate
• tools can’t help
- yet the same problem recurs in many apps
different assumptions
• users are there to choose
- what to import, where to publish: it’s their privilege
• app is there to map
- to/from the internal model: it’s its job
• repositories are there to ingest and disseminate
- should make it easy to publish and import: it’s their mission
• tools should provide the glue
- factor out common tasks in reusable solutions: it’s well in their scope
virtual repository
• a client library, a Jar
- helps the app build first-class import/publication facilities
• materialises an imaginary repository
- client API to discover, retrieve and publish data
• tailored to app
- contains/takes what app can transform (not other way around)
• seemingly local
- as if the data was right there, no ‘network-awareness’
virtual repository
• a view over real repositories
- defines the ‘data hood’ of the app
• modular
- built out of repository-specific plugins
- plugins implement SPI in their own Jars
- app cherry-picks plugins and deploys Jars
• network-aware, so that the app need not be
- e.g. parallel data discovery
- e.g. timed out retrieval and updates
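The "parallel data discovery" and "timed out retrieval" bullets can be sketched with the standard java.util.concurrent API. This is only an illustration under stated assumptions: RepositoryPlugin, Plugins.fake and ParallelDiscovery are hypothetical names invented for this sketch, not the library's SPI. Each plugin is queried on its own thread, and a plugin that fails or does not answer in time simply contributes nothing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical SPI: one implementation per real repository.
interface RepositoryPlugin {
    List<String> discover() throws Exception; // a remote call in a real plugin
}

final class Plugins {
    // Test double: answers after delayMillis with the given asset names.
    static RepositoryPlugin fake(long delayMillis, String... assets) {
        return () -> {
            Thread.sleep(delayMillis);
            return Arrays.asList(assets);
        };
    }
}

final class ParallelDiscovery {
    // Queries all plugins in parallel. Plugins that fail or exceed the
    // timeout are skipped, so one slow repository cannot block the others.
    static List<String> discoverAll(List<RepositoryPlugin> plugins, long timeoutMillis) {
        List<String> found = new ArrayList<>();
        if (plugins.isEmpty()) return found;
        ExecutorService pool = Executors.newFixedThreadPool(plugins.size());
        try {
            List<Callable<List<String>>> tasks = new ArrayList<>();
            for (RepositoryPlugin plugin : plugins) tasks.add(plugin::discover);
            // invokeAll cancels every task still running when the timeout expires
            List<Future<List<String>>> results =
                    pool.invokeAll(tasks, timeoutMillis, TimeUnit.MILLISECONDS);
            for (Future<List<String>> result : results) {
                try {
                    found.addAll(result.get());
                } catch (CancellationException | ExecutionException e) {
                    // timed out or failed: this plugin contributes nothing
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for callers
        } finally {
            pool.shutdownNow();
        }
        return found;
    }
}
```

Results come back in the order the plugins were listed, because invokeAll returns its futures in task order.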
virtual repository
• defines “standard” rules of exchange
- the formats of the data types, the APIs of the formats
• app transforms standards
- no custom work, fewer transformations
• plugins take/return standards
- do the custom work, as per repository mission
• standards-based rendezvous
- app and plugins sync on data
- ignore each other otherwise: technologies in the back seat
virtual repository illustrated
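[Diagram: client-side, the app uses the virtual repo through its API to discover, import and publish “standard” data; plugins implement the SPI and connect to the server-side repos, which together form the data hood]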
a use case
• app manages code lists
- SDMX is a standard for code lists
- app implements internal ⇿ SDMX
• some repos disseminate code lists
- e.g. triple-store as SKOS, RDBMS as custom CSV
- plugins implement SKOS ⇿ SDMX, CSV ⇾ SDMX
• some flows are enabled
- DB ⇾ DB plugin ⇾ SDMX ⇾ app
- TS ⇾ TS plugin ⇾ SDMX ⇾ app
- DB ⇾ DB plugin ⇾ SDMX ⇾ app ⇾ SDMX ⇾ TS plugin ⇾ TS
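The first flow above (DB ⇾ DB plugin ⇾ SDMX ⇾ app) can be read as the composition of two independent transforms that meet on the standard. The sketch below illustrates that idea only: StandardCodelist is a hypothetical stand-in for a real SDMX API, the "id,label" CSV layout is assumed, and the internal model is reduced to a map.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for a standard codelist API (the role SDMX plays in the use case).
final class StandardCodelist {
    final List<String[]> codes = new ArrayList<>(); // each entry: {id, label}
}

final class Flows {
    // Plugin side: custom CSV rows -> standard model.
    // Assumes well-formed "id,label" rows; a real plugin would validate them.
    static final Function<List<String>, StandardCodelist> CSV_TO_STANDARD = rows -> {
        StandardCodelist list = new StandardCodelist();
        for (String row : rows) {
            String[] cells = row.split(",", 2);
            list.codes.add(new String[] { cells[0].trim(), cells[1].trim() });
        }
        return list;
    };

    // App side: standard model -> internal model (here simply a map from id to label).
    static final Function<StandardCodelist, Map<String, String>> STANDARD_TO_INTERNAL = list -> {
        Map<String, String> internal = new LinkedHashMap<>();
        for (String[] code : list.codes) internal.put(code[0], code[1]);
        return internal;
    };

    // DB -> DB plugin -> standard -> app, as a single composed flow.
    static final Function<List<String>, Map<String, String>> DB_TO_APP =
            CSV_TO_STANDARD.andThen(STANDARD_TO_INTERNAL);
}
```

Neither transform knows about the other: the plugin author writes only the first, the app author only the second, and the composition is what the virtual repository mediates.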
what we expect
• for apps
- one or two transforms reach the ‘data hood’
- no network awareness: easy coding
- no dependency on repos, including legacy ones: data before technologies
• for repositories
- an API for Java clients
- a low-cost one: plugins are easy
- no dependencies on clients: handle evolution in one place
• net gains
- max results, least effort
- loose coupling
minimal client API
• AssetType
- what can be exchanged: just a named standard
• Asset
- a description of what is exchanged: a named instance of an AssetType
- bound to RepositoryService that has it/can take it
- specialised: SdmxAsset, SdmxCodelist, CsvAsset, CsvCodelist, ...
- well-known properties induced by type, arbitrary ones specific to instance
• VirtualRepository
- what mediates the exchange of Assets
- can discover Assets given AssetTypes
- can retrieve/publish their content in one or more standard APIs
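To make the three concepts concrete, here is a toy, in-memory model of them. It is a sketch under assumptions, not the library's implementation: the field names, the hold helper, the ToyVirtualRepository class and the choice to return the number of newly discovered assets from discover are all invented for illustration. Only the shape of the calls (discover by types, iterate, lookup by id) follows the slides.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified, hypothetical stand-ins for the concepts on this slide.
final class AssetType {
    final String name; // just a named standard
    AssetType(String name) { this.name = name; }
}

final class Asset {
    final String id;
    final String name;
    final AssetType type;
    final RepositoryService service; // the service that has it
    Asset(String id, String name, AssetType type, RepositoryService service) {
        this.id = id; this.name = name; this.type = type; this.service = service;
    }
}

final class RepositoryService {
    final String name;
    final List<Asset> holdings = new ArrayList<>(); // what a real repository would return remotely
    RepositoryService(String name) { this.name = name; }
    Asset hold(String id, String assetName, AssetType type) {
        Asset asset = new Asset(id, assetName, type, this);
        holdings.add(asset);
        return asset;
    }
}

final class ToyVirtualRepository implements Iterable<Asset> {
    private final List<RepositoryService> services;
    private final Map<String, Asset> discovered = new LinkedHashMap<>();

    ToyVirtualRepository(RepositoryService... services) {
        this.services = Arrays.asList(services);
    }

    // Asks every service for assets of the given types and returns
    // how many assets were discovered for the first time.
    int discover(AssetType... types) {
        List<AssetType> wanted = Arrays.asList(types);
        int added = 0;
        for (RepositoryService service : services)
            for (Asset asset : service.holdings)
                if (wanted.contains(asset.type) && discovered.putIfAbsent(asset.id, asset) == null)
                    added++;
        return added;
    }

    // Returns metadata previously discovered, without touching the services.
    Asset lookup(String id) {
        Asset asset = discovered.get(id);
        if (asset == null) throw new IllegalStateException("unknown asset " + id);
        return asset;
    }

    @Override
    public Iterator<Asset> iterator() { return discovered.values().iterator(); }
}
```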
asset discovery
// somewhere in the app
VirtualRepository repo = …; // factories, injection, new()
// elsewhere: discovery is a remote operation
int discovered = repo.discover(SdmxCodelist.type, CsvCodelist.type);
// elsewhere: build discovery screen for users
for (Asset codelist : repo) {
    …
    …codelist.id()…
    …codelist.name()…
    …codelist.service().name()…
    for (Property p : codelist.properties()) {
        …p.name()…
        …p.value()…
        …p.description()…
    }
    …
}
asset retrieval
// user has chosen an asset
String codelistId = …;
// retrieve metadata previously discovered
Asset asset = repo.lookup(codelistId);
// DISCLAIMER: there are more elegant ways to dispatch!
if (asset instanceof SdmxCodelist) {
    // a remote operation: CodelistBean is a standard API for SDMX
    CodelistBean codelist = repo.retrieve(asset, CodelistBean.class);
    importFromSdmx(codelist); // app’s transform
}
else if (asset instanceof CsvCodelist) {
    // a remote operation: Table is a standard API for CSV
    Table codelist = repo.retrieve(asset, Table.class);
    importFromCsv((CsvCodelist) asset, codelist); // app’s transform
}
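The disclaimer in the code above notes that there are more elegant ways to dispatch than an instanceof chain. One possible alternative, sketched here as an assumption rather than as the library's approach, is a registry that maps each asset class to its importer. ImportDispatcher, SdmxCodelistStub and CsvCodelistStub are hypothetical names used only for this sketch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical stand-ins for the specialised asset classes.
final class SdmxCodelistStub {
    final String id;
    SdmxCodelistStub(String id) { this.id = id; }
}

final class CsvCodelistStub {
    final String id;
    CsvCodelistStub(String id) { this.id = id; }
}

// Table-driven dispatch: one importer registered per asset class.
final class ImportDispatcher {
    private final Map<Class<?>, Consumer<Object>> importers = new HashMap<>();

    // Registers the importer to run for assets of exactly this class.
    <T> ImportDispatcher register(Class<T> kind, Consumer<? super T> importer) {
        importers.put(kind, asset -> importer.accept(kind.cast(asset)));
        return this;
    }

    // Looks up the importer by the asset's runtime class (exact match,
    // subclasses are not considered) and runs it.
    void dispatch(Object asset) {
        Consumer<Object> importer = importers.get(asset.getClass());
        if (importer == null)
            throw new IllegalArgumentException(
                    "no importer for " + asset.getClass().getSimpleName());
        importer.accept(asset);
    }
}
```

With this shape, supporting a new asset type means adding one register call instead of another else-if branch, and an unsupported type fails loudly instead of being silently ignored.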
asset publication (1)
// build publication screen for users
Collection<RepositoryService> sinks = repo.sinks(SdmxCodelist.type, CsvCodelist.type);
// retrieve metadata previously discovered
for (RepositoryService sink : sinks) {
    …sink.name()…
    …
    for (Property p : sink.properties()) {
        …p.name()…
        …p.value()…
        …p.description()…
    }
}
// elsewhere: user has chosen an asset
String codelistId = …;
MyCodelist codelist = …codelistId…; // app retrieves it
// elsewhere: user has chosen a repository
String sinkId = …;
RepositoryService sink = repo.services().lookup(sinkId); // app retrieves it
asset publication (2)
if (sink.publishes(SdmxCodelist.type)) {
    SdmxCodelist asset = new SdmxCodelist(...sink...);
    CodelistBean sdmxStream = publishToSdmx(codelist); // app’s transform
    // publication is a remote operation
    repo.publish(asset, sdmxStream);
}
else if (sink.publishes(CsvCodelist.type)) {
    CsvCodelist asset = new CsvCodelist(...sink...);
    Table table = publishToCsv(codelist); // app’s transform
    repo.publish(asset, table);
}
where we are
• virtual-repository-1.0.0
- out end of the month, snapshots in gcube-snapshots
• virtual-sdmx-registry-1.0.0
- plugin for one or more SDMX registries
- including iMarine’s (uses CNR’s library)
• virtual-semantic-repository-1.0.0
- plugin for FAO’s triple-store of reference data
• virtual-rtms-1.0.0
- plugin for FAO’s Figis RDBMS
• quick turnaround
- one month development activities, part-time (3 devs)
where we are
• the approach is viable
- Cotrix integration: expected benefits delivered at expected costs
- plugin development: expected costs, 3-4 days full-time
- but needs supervision: new standards require new releases
• we have learned a thing or two
- e.g. SDMX is self-describing and flexible, but of bounded expressiveness
- e.g. CSV is less self-describing and regular, but unbounded in principle
• we have much more to learn still
- can we stand production?
- can we move outside reference data and into ‘big data’?
- can we scale when many plugins flog the app’s classpath?
- what range of apps can we really support?
where we are going
• grow the ‘data hood’
- more standards (including non-reference data)
- more repositories (i.e. more plugins)
- on demand
• grow the apps
- the new TimeSeries?
- AssetExplorer?
- built entirely and solely on VR plus all known plugins
- browse the ‘data hood’ to download in required format
- put those transforms to practical use
- killer app for VR
