The Virtual Repository
1. Fabio Simeoni (FAO)
the virtual repository
standards-based import and publication
Monday, 17 June 13
2. 2
outline
• about data import and publication
• why it is a problem
• how it can be simplified
• the virtual repository
• where we are
• where we are going
3. 3
context
• there is an app
- manages data of some type: adds some value to it
• there is data out there
- quite a lot: waiting to be managed
• there are places out there
- quite a few: waiting to disseminate the added value
- repositories: specialised network services
• the app wants to reach out
- with first-class import and publication facilities
4. 4
what we mean by...
• data import
- pull data from some “source”
- transform it, store it, and use it for app-local purposes
- this is not real-time, fine-grained access
• data publication
- transform data for dissemination purposes
- push it to some “sink”
- this is not real-time, fine-grained update
• “coarse” I/O
6. 6
the “average joe”
• import = file upload
- users are the sources: they have the data
- just one use case: what about data in repositories?
- should users discover it and retrieve it on behalf of the app?
• publication = export to file
- users are the sinks: they use the data
- just one use case: what about other consumers?
- should users disseminate data on behalf of repositories?
8. 8
(fancier variations)
• URI resolution
- users provide URIs, app resolves them
- a step forward, but onus of discovery remains on users
- repositories not ‘on the Web’ still out of the picture
• no publication, app disseminates
- doubles as a repository service
- two different missions/roles/competencies
- require different models, designs, technologies
- would rather integrate specialised solutions in infra
9. 9
imagine this
• users browse all data ‘nearby’ the app
- metadata describes contents, provenance, size ...
• users pick what data to import
- providing directives on how the app should convert it
• users browse repositories ‘nearby’ the app
- metadata describes location, policy, formats, ...
• users pick where to publish
- providing directives on how the app should convert for it
10. 10
imagine this
(mock-up of two screens: IMPORT lists assets by name, origin, version, ...; PUBLISH lists repositories by name, ...; on each screen the user chooses an entry and customises the conversion)
11. 11
why don’t we see it
• it’s not simple
- many sources/sinks, APIs, formats, transforms
- difficult to paper over differences for users
- difficult to handle distributed interactions properly
- overall, a non-trivial interoperability problem
• it’s not cost-effective
- it’s not the core business of the app
- core business is to manage, not I/O
12. 12
wrong assumptions
• costs should fall entirely on the app
- to bridge across many formats and APIs over the network
• repositories can’t help
- yet their core business is precisely to disseminate
• tools can’t help
- yet the same problem recurs in many apps
13. 13
different assumptions
• users are there to choose
- what to import, where to publish: it’s their privilege
• app is there to map
- to/from internal model: it’s its job
• repositories are there to ingest and disseminate
- should make it easy to publish and import: it’s their mission
• tools should provide the glue
- factor out common tasks in reusable solutions: it’s well in their scope
14. 14
virtual repository
• a client library, a Jar
- helps the app build first-class import/publication facilities
• materialises an imaginary repository
- client API to discover, retrieve and publish data
• tailored to app
- contains/takes what app can transform (not other way around)
• seemingly local
- as if the data was right there, no ‘network-awareness’
15. 15
virtual repository
• a view over real repositories
- defines the ‘data hood’ of the app
• modular
- built out of repository-specific plugins
- plugins implement SPI in their own Jars
- app cherry-picks plugins and deploys Jars
• network-aware
- e.g. parallel data discovery
- e.g. timed out retrieval and updates
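The network-aware behaviour listed above (parallel discovery, timed-out operations) could be sketched as follows. This is a minimal illustration that assumes a hypothetical `DiscoveryPlugin` interface and a `discoverAll` helper; neither name comes from the library, and its real SPI may look quite different. Only standard `java.util.concurrent` calls are used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical plugin contract, for illustration only (not the library's SPI).
interface DiscoveryPlugin {
    List<String> discover() throws Exception; // names of the assets found
}

final class ParallelDiscovery {

    // Queries every plugin concurrently. Plugins that fail, or that are still
    // running when the timeout expires, contribute nothing to the result.
    static List<String> discoverAll(List<DiscoveryPlugin> plugins, long timeoutMillis) {
        List<String> assets = new ArrayList<>();
        if (plugins.isEmpty()) return assets;
        ExecutorService pool = Executors.newFixedThreadPool(plugins.size());
        try {
            List<Callable<List<String>>> tasks = new ArrayList<>();
            for (DiscoveryPlugin plugin : plugins) tasks.add(plugin::discover);
            // invokeAll returns when all tasks are done or the timeout expires;
            // tasks that have not completed by then are cancelled.
            for (Future<List<String>> result
                    : pool.invokeAll(tasks, timeoutMillis, TimeUnit.MILLISECONDS)) {
                if (result.isCancelled()) continue; // timed out
                try {
                    assets.addAll(result.get());
                } catch (ExecutionException e) {
                    // this plugin failed: skip it, keep the others
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for callers
        } finally {
            pool.shutdownNow();
        }
        return assets;
    }

    public static void main(String[] args) {
        DiscoveryPlugin fast = () -> Arrays.asList("codelist-a", "codelist-b");
        DiscoveryPlugin slow = () -> { Thread.sleep(5_000); return Arrays.asList("codelist-c"); };
        System.out.println(discoverAll(Arrays.asList(fast, slow), 200));
    }
}
```

With this shape, one slow or broken repository cannot hold up the discovery screen: the app sees whatever the responsive plugins returned within the deadline.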
16. 16
virtual repository
• defines “standard” rules of exchange
- the formats of the data types, the APIs of the formats
• app transforms standards
- no custom work, fewer transformations
• plugins take/return standards
- do the custom work, as per repository mission
• standards-based rendezvous
- app and plugins sync on data
- ignore each other otherwise: technologies in the back seat
18. 18
a use case
• app manages code lists
- SDMX is a standard for code lists
- app implements internal ⇿ SDMX
• some repos disseminate code lists
- e.g. triple-store as SKOS, RDBMS as custom CSV
- plugins implement SKOS ⇿ SDMX, CSV ⇾ SDMX
• some flows are enabled
- DB ⇾ DB plugin ⇾ SDMX ⇾ app
- TS ⇾ TS plugin ⇾ SDMX ⇾ app
- DB ⇾ DB plugin ⇾ SDMX ⇾ app ⇾ SDMX ⇾ TS plugin ⇾ TS
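The flows above compose because every transform starts or ends at the shared standard. A minimal sketch of the first flow (DB ⇾ DB plugin ⇾ SDMX ⇾ app), assuming heavily simplified stand-ins: CSV rows are `String[]` pairs, the "standard" is an ordered map in place of a real SDMX code list, and the app's internal model is a list of strings. All names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class CodelistFlows {

    // Plugin side (hypothetical): custom CSV rows of [code, name] -> standard form.
    // The "standard" is a plain ordered map here, standing in for an SDMX code list.
    static final Function<List<String[]>, Map<String, String>> CSV_TO_STANDARD = rows -> {
        Map<String, String> codes = new LinkedHashMap<>();
        for (String[] row : rows) codes.put(row[0].trim(), row[1].trim());
        return codes;
    };

    // App side (hypothetical): standard form -> the app's internal model,
    // represented here as a list of "code=name" strings.
    static final Function<Map<String, String>, List<String>> STANDARD_TO_APP = codes -> {
        List<String> entries = new ArrayList<>();
        for (Map.Entry<String, String> code : codes.entrySet())
            entries.add(code.getKey() + "=" + code.getValue());
        return entries;
    };

    // DB -> DB plugin -> standard -> app: neither side knows about the other,
    // they only agree on the standard in the middle.
    static List<String> importFromDb(List<String[]> rows) {
        return CSV_TO_STANDARD.andThen(STANDARD_TO_APP).apply(rows);
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[] {"FR", "France"},
                new String[] {"IT", "Italy"});
        System.out.println(importFromDb(rows));
    }
}
```

Adding a new repository then means adding one plugin-side transform; the app-side transform stays untouched.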
19. 19
what we expect
• for apps
- one or two transforms reach the ‘data hood’
- no network awareness: easy coding
- no dependency on repos, including legacy ones: data before technologies
• for repositories
- an API for Java clients
- a low-cost one: plugins are easy
- no dependencies on clients: handle evolution in one place
• net gains
- max results, least effort
- loose coupling
20. 20
minimal client API
• AssetType
- what can be exchanged: just a named standard
• Asset
- a description of what is exchanged: a named instance of an AssetType
- bound to RepositoryService that has it/can take it
- specialised: SdmxAsset, SdmxCodelist, CsvAsset, CsvCodelist, ...
- well-known properties induced by type, arbitrary ones specific to instance
• VirtualRepository
- what mediates the exchange of Assets
- can discover Assets given AssetTypes
- can retrieve/publish their content in one or more standard APIs
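To make the three concepts concrete, here is a toy in-memory model. It is a sketch under stated assumptions, not the library's API: the `Toy*` classes are hypothetical, the "real repositories" are replaced by a fixed list, and the meaning given to the value returned by `discover` (the number of newly discovered assets) is an assumption.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for AssetType: just a named standard.
final class ToyAssetType {
    final String name;
    ToyAssetType(String name) { this.name = name; }
}

// Hypothetical stand-in for Asset: a named instance of a type.
final class ToyAsset {
    final String id;
    final String name;
    final ToyAssetType type;
    ToyAsset(String id, String name, ToyAssetType type) {
        this.id = id; this.name = name; this.type = type;
    }
}

// Hypothetical stand-in for VirtualRepository: mediates discovery and lookup.
final class ToyVirtualRepository implements Iterable<ToyAsset> {
    private final List<ToyAsset> available;                 // what the repositories hold
    private final Map<String, ToyAsset> discovered = new LinkedHashMap<>();

    ToyVirtualRepository(List<ToyAsset> available) { this.available = available; }

    // Discovers assets of the given types. ASSUMPTION: returns how many were new.
    int discover(ToyAssetType... types) {
        Set<ToyAssetType> wanted = new HashSet<>(Arrays.asList(types));
        int count = 0;
        for (ToyAsset asset : available)
            if (wanted.contains(asset.type) && discovered.putIfAbsent(asset.id, asset) == null)
                count++;
        return count;
    }

    // Returns a previously discovered asset by identifier.
    ToyAsset lookup(String id) {
        ToyAsset asset = discovered.get(id);
        if (asset == null) throw new IllegalStateException("unknown asset: " + id);
        return asset;
    }

    @Override
    public Iterator<ToyAsset> iterator() { return discovered.values().iterator(); }

    public static void main(String[] args) {
        ToyAssetType sdmx = new ToyAssetType("sdmx/codelist");
        ToyAssetType csv = new ToyAssetType("csv/codelist");
        ToyVirtualRepository repo = new ToyVirtualRepository(Arrays.asList(
                new ToyAsset("1", "countries", sdmx), new ToyAsset("2", "species", csv)));
        repo.discover(sdmx);
        for (ToyAsset asset : repo) System.out.println(asset.id + " " + asset.name);
    }
}
```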
21. 21
asset discovery
//somewhere in the app
VirtualRepository repo = …; //factories, injection, new()
//elsewhere: discovery is a remote operation
int discovered = repo.discover(SdmxCodelist.type, CsvCodelist.type);
//elsewhere: build discovery screen for users
for (Asset codelist : repo) {
    …
    …codelist.id()…
    …codelist.name()…
    …codelist.service().name()…
    for (Property p : codelist.properties())
        …p.name()…
        …p.value()…
        …p.description()…
    …
}
22. 22
asset retrieval
//user has chosen an asset
String codelistId = …;
//retrieve metadata previously discovered
Asset asset = repo.lookup(codelistId);
//DISCLAIMER: there are more elegant ways to dispatch!!!
if (asset instanceof SdmxCodelist) {
    //a remote operation: CodelistBean is a standard API for SDMX
    CodelistBean codelist = repo.retrieve(asset, CodelistBean.class);
    importFromSdmx(codelist); //app’s transform
}
else if (asset instanceof CsvCodelist) {
    //a remote operation: Table is a standard API for CSV
    Table codelist = repo.retrieve(asset, Table.class);
    importFromCsv((CsvCodelist) asset, codelist); //app’s transform
}
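One of the "more elegant ways to dispatch" hinted at in the disclaimer is a registry that maps each asset class to its import routine, so new asset types need a registration rather than another `instanceof` branch. The sketch below is only an illustration: the nested `Asset`, `SdmxCodelist` and `CsvCodelist` classes are simplified stand-ins, and `ImportDispatcher` and `Importer` are hypothetical names, not part of the library.

```java
import java.util.HashMap;
import java.util.Map;

final class ImportDispatcher {

    // Simplified stand-ins for the asset hierarchy (hypothetical).
    static class Asset {
        final String id;
        Asset(String id) { this.id = id; }
    }
    static class SdmxCodelist extends Asset {
        SdmxCodelist(String id) { super(id); }
    }
    static class CsvCodelist extends Asset {
        CsvCodelist(String id) { super(id); }
    }

    // An import routine for one kind of asset; returns a short report.
    interface Importer<A extends Asset> {
        String importAsset(A asset);
    }

    private final Map<Class<? extends Asset>, Importer<? extends Asset>> importers = new HashMap<>();

    // The class token and the importer share the type parameter A,
    // so a mismatched registration does not compile.
    <A extends Asset> void register(Class<A> type, Importer<A> importer) {
        importers.put(type, importer);
    }

    // Looks up the importer by the asset's exact runtime class.
    @SuppressWarnings("unchecked") // safe: register() ties each key to a matching importer
    String dispatch(Asset asset) {
        Importer<Asset> importer = (Importer<Asset>) importers.get(asset.getClass());
        if (importer == null)
            throw new IllegalArgumentException("no importer for " + asset.getClass().getSimpleName());
        return importer.importAsset(asset);
    }

    public static void main(String[] args) {
        ImportDispatcher dispatcher = new ImportDispatcher();
        dispatcher.register(SdmxCodelist.class, a -> "imported from SDMX: " + a.id);
        dispatcher.register(CsvCodelist.class, a -> "imported from CSV: " + a.id);
        System.out.println(dispatcher.dispatch(new CsvCodelist("species")));
    }
}
```

The lookup uses the exact runtime class, so subclasses of a registered type would need their own registration; a visitor on the asset hierarchy is another option if that matters.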
23. 23
asset publication (1)
//build publication screen for users
Collection<RepositoryService> sinks = repo.sinks(SdmxCodelist.type, CsvCodelist.type);
//retrieve metadata previously discovered
for (RepositoryService sink : sinks) {
    …sink.name()…
    for (Property p : sink.properties())
        …p.name()…
        …p.value()…
        …p.description()…
}
//elsewhere: user has chosen an asset
String codelistId = …;
MyCodelist codelist = …codelistId…; //app retrieves it
//elsewhere: user has chosen a repository
String serviceId = …;
RepositoryService sink = repo.services().lookup(serviceId); //app retrieves it
24. 24
asset publication (2)
if (sink.publishes(SdmxCodelist.type)) {
    SdmxCodelist asset = new SdmxCodelist(...sink...);
    CodelistBean sdmxStream = publishToSdmx(codelist); //app’s transform
    //publication is a remote operation
    repo.publish(asset, sdmxStream);
}
else if (sink.publishes(CsvCodelist.type)) {
    CsvCodelist asset = new CsvCodelist(...sink...);
    Table table = publishToCsv(codelist); //app’s transform
    repo.publish(asset, table);
}
25. 25
where are we
• virtual-repository-1.0.0
- out at the end of the month, snapshots in gcube-snapshots
• virtual-sdmx-registry-1.0.0
- plugin for one or more SDMX registries
- including iMarine’s (uses CNR’s library)
• virtual-semantic-repository-1.0.0
- plugin for FAO’s triple-store of reference data
• virtual-rtms-1.0.0
- plugin for FAO’s Figis RDBMS
• quick turnaround
- one month of development activity, part-time (3 devs)
26. 26
where are we
• the approach is viable
- Cotrix integration: expected benefits delivered at expected costs
- plugin development: expected costs, 3-4 days full-time
- but needs supervision: new standards require new releases
• we have learned a thing or two
- e.g. SDMX is self-describing and flexible, but of bounded expressiveness
- e.g. CSV is less self-describing and regular, but unbounded in principle
• we have much more to learn still
- can we stand up to production?
- can we move outside reference data and into ‘big data’?
- can we scale when many plugins clog the app’s classpath?
- what range of apps can we really support?
27. 27
where we are going
• grow the ‘data hood’
- more standards (including non-reference data)
- more repositories (i.e. more plugins)
- on demand
• grow the apps
- the new TimeSeries ?
- AssetExplorer ?
- built entirely and solely on VR plus all known plugins
- browse the ‘data hood’ to download in required format
- put those transforms to practical use
- killer app for VR