On 30 March 2016 at 2:30 p.m., a round table on Humanitarian Disasters and Communication will take place at the Maison des Sciences de l'Homme, 4 rue Ledru, Clermont-Ferrand.
Efficient, Scalable, and Provenance-Aware Management of Linked Data (eXascale Infolab)
The proliferation of heterogeneous Linked Data on the Web requires data management systems to constantly improve their scalability and efficiency. Despite recent advances in distributed Linked Data management, efficiently processing large amounts of Linked Data in a scalable way remains very challenging. Although its data model appears simple, Linked Data actually encodes rich and complex graphs mixing both instance-level and schema-level data. At the same time, users are increasingly interested in investigating or visualizing large collections of online data by posing complex analytic queries. The heterogeneity of Linked Data on the Web also poses new challenges to database systems: the capacity to store, track, and query provenance data is becoming a pivotal feature of Linked Data management systems. In this thesis, we tackle issues revolving around processing queries on big, unstructured, and heterogeneous Linked Data graphs.
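The provenance tracking mentioned above can be pictured with a minimal sketch in plain Python (no triple store): each Linked Data statement is modeled as a quad whose fourth element records where the statement came from, so queries can be restricted to trusted sources. All URIs and prefixes below are illustrative placeholders, not from the thesis.

```python
# Minimal sketch: provenance-aware Linked Data statements as quads.
# The fourth element names the source of each triple (illustrative values).
from collections import namedtuple

Quad = namedtuple("Quad", ["subject", "predicate", "obj", "provenance"])

quads = [
    Quad("ex:Berlin", "ex:population", "3700000", "source:statsOffice"),
    Quad("ex:Berlin", "rdf:type", "ex:City", "source:dbpedia"),
    Quad("ex:Berlin", "ex:mayor", "ex:PersonX", "source:dbpedia"),
]

def query(quads, predicate=None, provenance=None):
    """Return quads matching the given predicate and/or provenance source."""
    return [q for q in quads
            if (predicate is None or q.predicate == predicate)
            and (provenance is None or q.provenance == provenance)]

# Restrict a query to statements from one source:
trusted = query(quads, provenance="source:dbpedia")
```

Real systems store such quads as RDF named graphs; the point here is only that provenance becomes a first-class query dimension alongside subject, predicate, and object.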
The sequential extraction method SEDEX (SEDimentary EXtraction), as modified by ANDERSON & DELANEY (2000), was used to separately quantify four sedimentary phosphorus reservoirs in sediments of the Gulf of Paria and the Venezuelan Atlantic coast: adsorbed or labile P plus oxide-associated P (F1), authigenic P (F2), detrital P (F3), and organic P (F4). The marine versus continental origin of the sediments was determined by separating detrital apatite (continental) from carbonate fluorapatite (CFA) of marine origin. Total phosphorus concentrations are low within the Gulf of Paria and along the Venezuelan Atlantic coast in comparison with other coastal areas (2.38 to 6.84 μmol g⁻¹), and phosphorus occurs mainly in detrital form (0.78 to 4.61 μmol g⁻¹). In decreasing order, the remaining concentrations are: organic P (0.56 to 2.47 μmol g⁻¹) > adsorbed or labile P plus oxide-associated P (0.04 to 0.56 μmol g⁻¹) > authigenic P (0.04 to 0.31 μmol g⁻¹). ANOVA tests (P < 0.05) show differences only in the concentrations of the adsorbed or labile plus oxide-associated fraction, with lower values in the Gulf of Paria. The results suggest that the principal source is terrestrial lithogenic apatite eroded from the orogenic belts of the coastal Andes, the Guiana Shield, and the Venezuelan and Colombian plains, carried by the waters of the Orinoco River and redistributed in the study area. The contribution of organic material of native and allochthonous origin is the second factor controlling the presence of phosphorus in the sediment. Marine contributions are noted towards the northeastern end, typified by the presence of carbonate fluorapatite, indicating transformation processes within the sediment.
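The relationship between the four SEDEX fractions and total sedimentary phosphorus can be sketched as a simple sum. The sample values below are hypothetical, chosen only to fall inside the ranges reported in the abstract (μmol per g dry sediment); they are not measured data.

```python
# Illustrative sketch: total P as the sum of the four SEDEX fractions.
# Values are hypothetical, within the ranges quoted in the abstract.
sample = {
    "F1_labile_oxide": 0.30,   # adsorbed/labile + oxide-associated P
    "F2_authigenic":   0.10,   # authigenic P (marine CFA)
    "F3_detrital":     2.50,   # detrital apatite P (continental)
    "F4_organic":      1.10,   # organic P
}

total_P = sum(sample.values())                    # 4.00 umol/g
detrital_share = sample["F3_detrital"] / total_P  # 0.625

# Consistent with the abstract: detrital P dominates the total.
assert detrital_share > 0.5
```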
"Publishing and Consuming Linked Data: Lessons learnt when using LOD in an a..." (Marta Villegas)
Talk given at the "1st Summer Datathon on Linguistic Linked Open Data (SD-LLOD-15)"
In this talk we will describe our experience in publishing and, more crucially, consuming Linked Data at the Spanish CLARIN Knowledge Centre (http://lod.iula.upf.edu). The centre includes a catalogue of NLP resources and tools which aims to promote the use of language technology among researchers in the Humanities and Social Sciences. Though the original data set followed an XML/XSD schema, it was rewritten in accordance with the LOD approach in order to maximize the information contained in our repositories and to be able to enrich the data there.
We will address some critical aspects of RDFying XSD/XML data, focusing on the strategies followed when mapping controlled vocabularies expressed as XML enumerations; when dealing with certain unstructured data (cases where input strings may generate relevant instances); and when addressing identity resolution and linking tasks once the eventual instances are RDFied. We will also report on data cleansing, a crucial and unavoidable task which we addressed as an incremental process in which SPARQL played an important role. We will see that some of the decisions taken depend on the eventual application we have in mind. The requirements of our Catalog (implemented as a web browser) include: displaying data to the user in a comprehensive way; aggregating external data in a sensible manner; and making hidden implicit relations explicit. In addition, the system needs to provide fresh, regularly updated data with quick response times.
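One of the RDFying steps above, mapping an XML controlled vocabulary (an enumeration) to vocabulary URIs rather than plain literals, can be sketched as follows. The element names, enumeration values, and URIs are illustrative stand-ins, not the actual IULA catalogue schema.

```python
# Sketch: map values of an XML enumeration to vocabulary URIs while RDFying.
# Schema, values, and URIs below are hypothetical illustrations.
import xml.etree.ElementTree as ET

RESOURCE_TYPE_URI = {
    "corpus":  "http://example.org/vocab/Corpus",
    "lexicon": "http://example.org/vocab/Lexicon",
    "tool":    "http://example.org/vocab/Tool",
}

record = ET.fromstring(
    "<resource><name>ParoleCorpus</name><type>corpus</type></resource>"
)

res_type = record.findtext("type")
triple = (
    "http://example.org/resource/ParoleCorpus",
    "http://purl.org/dc/terms/type",
    RESOURCE_TYPE_URI[res_type],   # a URI, not the raw string "corpus"
)

# Incremental cleansing of legacy literals could then use a SPARQL update
# of this shape (shown as a string; prefixes omitted for brevity):
CLEANSE = """
DELETE { ?r dct:type "corpus" }
INSERT { ?r dct:type <http://example.org/vocab/Corpus> }
WHERE  { ?r dct:type "corpus" }
"""
```

Emitting URIs instead of literal strings is what makes the later linking and enrichment steps possible, since external data sets can only be joined on shared identifiers.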
Finally, we will report on our experiences addressing data integration and enrichment (via data mashups). We experimented with different strategies (e.g. using external URIs vs. caching data locally) and faced various problems (time latency, dereferencing external URIs) that may be useful to share.
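The "external URIs vs. caching local data" trade-off above can be sketched by wrapping dereferencing in a memoizing cache, so each external URI is fetched at most once per run. The resolver below is a stub standing in for a real HTTP dereference; names and data are illustrative.

```python
# Sketch: cache dereferenced Linked Data URIs to avoid repeated remote
# fetches (and their latency). The resolver is a stub, not real HTTP.
from functools import lru_cache

FETCH_COUNT = {"n": 0}

@lru_cache(maxsize=1024)
def dereference(uri: str) -> dict:
    """Pretend to dereference a Linked Data URI; results are memoized."""
    FETCH_COUNT["n"] += 1   # in reality: an HTTP GET with network latency
    return {"uri": uri, "label": uri.rsplit("/", 1)[-1]}

dereference("http://example.org/resource/Barcelona")
dereference("http://example.org/resource/Barcelona")  # served from cache
```

Caching trades freshness for response time, which matters for a catalogue that must both stay regularly updated and answer quickly; a bounded cache with periodic invalidation is one common compromise.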