Data processing is increasingly the subject of various internal and external regulations, such as GDPR which has recently come into effect. Instead of assuming that such processes avail of data sources (such as files and relational databases), we approach the problem in a more abstract manner and view these processes as taking datasets as input. These datasets are then created by pulling data from various data sources. Taking a W3C Recommendation for prescribing the structure of and for describing datasets, we investigate an extension of that vocabulary for the generation of executable R2RML mappings. This results in a top-down approach where one prescribes the dataset to be used by a data process and where to find the data, and where that prescription is subsequently used to retrieve the data for the creation of the dataset “just in time”. We argue that this approach to the generation of an R2RML mapping from a dataset description is the first step towards policy-aware mappings, where the generation takes into account regulations to generate mappings that are compliant. In this paper, we describe how one can obtain an R2RML mapping from a data structure definition in a declarative manner using SPARQL CONSTRUCT queries, and demonstrate it using a running example. Some of the more technical aspects are also described.
Reference: Christophe Debruyne, Dave Lewis, Declan O'Sullivan: Generating Executable Mappings from RDF Data Cube Data Structure Definitions. OTM Conferences (2) 2018: 333-350
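The abstract mentions obtaining an R2RML mapping from a data structure definition declaratively using SPARQL CONSTRUCT queries. The idea might be sketched as follows; this is an illustrative query, not the paper's actual one, and it assumes DSD component properties are annotated directly with rr:column (the IRI minted for the triples map is likewise an assumption):

```sparql
PREFIX qb: <http://purl.org/linked-data/cube#>
PREFIX rr: <http://www.w3.org/ns/r2rml#>

# Sketch: for every measure component of a DSD whose property carries an
# R2RML column annotation, emit a predicate-object map for the generated mapping.
CONSTRUCT {
  ?triplesMap rr:predicateObjectMap [
    rr:predicate ?measure ;
    rr:objectMap [ rr:column ?column ]
  ] .
}
WHERE {
  ?dsd a qb:DataStructureDefinition ;
       qb:component [ qb:measure ?measure ] .
  ?measure rr:column ?column .
  # Hypothetical IRI scheme for the generated triples map
  BIND(IRI(CONCAT(STR(?dsd), "-map")) AS ?triplesMap)
}
```

Analogous CONSTRUCT queries would cover dimensions, attributes, and the subject map, so the whole mapping is assembled declaratively rather than by imperative code.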
Generating Executable Mappings from RDF Data Cube Data Structure Definitions
1. Generating Executable Mappings from RDF Data Cube Data Structure Definitions
Christophe Debruyne, Dave Lewis, Declan O’Sullivan
Trinity College Dublin
2018-10-23 @ ODBASE
The ADAPT Centre is funded under the SFI Research Centres Programme (Grant 13/RC/2106) and is co-funded under the European Regional Development Fund.
2. Introduction
• Data processing is increasingly the subject of various internal and external regulations, e.g., GDPR.
• Datasets are created and used for a particular purpose, e.g., sending newsletters or using users' purchase history to suggest recommendations. In the context of GDPR, these purposes require a user's informed consent.
• Can we generate datasets for a particular purpose "just in time" that comply with informed consent?
3. Introduction
• R2RML is a convenient way to transform (relational) non-RDF data into RDF to create these datasets.
• One can create mappings from databases to vocabularies, ontologies, etc. for data processing activities.
• We, however, chose to adopt the RDF Data Cube Vocabulary (QB) for representing datasets.
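To make the role of R2RML concrete, a minimal mapping might look like the sketch below; the table name, column names, and IRI template are illustrative assumptions, not taken from the paper:

```turtle
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@base <http://www.example.org/> .

# Hypothetical mapping: each row of table LIFE_EXPECTANCY becomes one resource.
<#TriplesMap-le> a rr:TriplesMap ;
    rr:logicalTable [ rr:tableName "LIFE_EXPECTANCY" ] ;
    rr:subjectMap [ rr:template "http://www.example.org/observation/{id}" ] ;
    rr:predicateObjectMap [
        rr:predicate <#lifeExpectancy> ;
        rr:objectMap [ rr:column "le" ;
                       rr:datatype <http://www.w3.org/2001/XMLSchema#decimal> ]
    ] .
```

An R2RML processor executes such a mapping against the database to emit the RDF triples.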
4. www.adaptcentre.ieIntroduction
• QB is an ontology for multi-dimensional datasets.
A Data Structure Definition prescribes how a Dataset and
its Observations are structure. An Observation is identified
by Dimensions and captures a value for a Measure.
• QB’s foundations is rooted in a schema for statistical
datasets and the ontology seemingly complicated, but the
RDF vocabulary is useful for other types of datasets as well.
• Our choice was also influenced by projects in the health
domain where statistical processing of data is key*
*AVERT project: https://www.tcd.ie/medicine/thkc/avert/index.php/
5. Research Question
• From: “Can we generate datasets for a particular purpose
‘just in time’ that comply with informed consent?”
• To: “Given a DSD for a particular purpose, how can we
create an executable R2RML mapping to generate a dataset
that complies with that DSD’s structure?”
• A solution could subsequently be extended to take policies
into account so as to generate mappings that are
compliant. In other words: “policy-aware”. To be reported.
6. Approach
• R2DQB – pronounced R-2-D-cube
• Data Structure Definitions (with Dimensions, Measures, and
Attributes) are extended with references to tables,
references to columns, transformation functions, …
[Figure: the R2DQB pipeline — (1) a DSD, extended with the
annotations above, is input to a (2) Mapping Engine that
generates an R2RML Mapping; (3) an R2RML Processor executes
the mapping according to the DSD to produce a Data Cube
Dataset; (4) the dataset undergoes Validation; (5) Provenance
Information is captured throughout.]
7. Approach
Step 1: annotating DSDs
• May be done in a separate graph (separation of concerns)
• We chose to reuse R2RML to assess the feasibility in this
study. A bespoke vocabulary may be considered in the
future.
(example from the RDF Data Cube Vocabulary Recommendation)
8.
The DSD (prefixes omitted for brevity):

@base <http://www.example.org/> .

<#refPeriod> a rdf:Property, qb:DimensionProperty ;
    rdfs:subPropertyOf sdmx-dimension:refPeriod .
<#refArea> a rdf:Property, qb:DimensionProperty ;
    rdfs:subPropertyOf sdmx-dimension:refArea .
<#lifeExpectancy> a rdf:Property, qb:MeasureProperty ;
    rdfs:subPropertyOf sdmx-measure:obsValue ;
    rdfs:range xsd:decimal .
sdmx-dimension:sex a rdf:Property, qb:DimensionProperty .

<#dsd-le> a qb:DataStructureDefinition ;
    # The dimensions
    qb:component [ qb:dimension <#refArea> ] ;
    qb:component [ qb:dimension <#refPeriod> ] ;
    qb:component [ qb:dimension sdmx-dimension:sex ] ;
    # The measure(s)
    qb:component [ qb:measure <#lifeExpectancy> ] .

The annotations:

@base <http://www.example.org/> .

<#refPeriod> rr:column "period" .
<#refArea> rr:column "area" .
<#lifeExpectancy> rr:column "lifeexpectancy" .
sdmx-dimension:sex rr:column "sex" .
<#dsd-le> rr:tableName "statssimple" .
9. www.adaptcentre.ieApproach
Step 2: Generating the R2RML mapping
• Adopting a declarative approach with SPARQL CONSTRUCT
queries:
1. Generating a triples map for each DSD
2. Generating a subject map for each DSD and a predicate
object map for linking observations to dataset
Subject map is based on dimensions, as
observations are identified by those.
3. Generating predicate object maps from measures
4. Generating predicate object maps from dimensions
5. Generating a link between dataset and DSD
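To give a flavour of step 1, a CONSTRUCT query along the following lines could generate a triples map for each annotated DSD (a sketch, not the exact query from the paper; the IRI-minting scheme for ?tm is our assumption):

```sparql
CONSTRUCT {
  ?tm a rr:TriplesMap ;
      pam:correspondsWith ?dsd ;
      rr:logicalTable [ rr:tableName ?t ] .
} WHERE {
  # Every DSD annotated with a table name yields one triples map
  ?dsd a qb:DataStructureDefinition ;
       rr:tableName ?t .
  # Mint a triples map IRI from the DSD's IRI (assumed naming scheme)
  BIND(IRI(CONCAT(STR(?dsd), "-tm")) AS ?tm)
}
```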
10.
CONSTRUCT {
  ?tm rr:subjectMap [
    rr:class qb:Observation ;
    rr:termType rr:BlankNode ;
    rr:template ?x ;
  ] .
  ?tm rr:predicateObjectMap [
    rr:predicate qb:dataSet ;
    rr:object ?ds ;
  ] .
} WHERE {
  ?tm pam:correspondsWith ?dsd ;
      rr:logicalTable [ rr:tableName ?t ] .
  BIND(IRI(?t) AS ?ds)
  {
    SELECT ?dsd
           (CONCAT("{", GROUP_CONCAT(?c; SEPARATOR="}-{"), "}") AS ?x) {
      ?dsd qb:component ?component .
      { ?component qb:dimension [ rr:column ?c ] }
      UNION
      { } # OMITTED FOR CLARITY (SEE PAPER)
    } GROUP BY ?dsd
  }
}
Constructing a subject map for observations
and a predicate object map for linking
observations to a dataset.
All queries can be found in the paper.
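For the running example, the query above would produce a fragment along these lines (a sketch; the triples map IRI is illustrative, and the order of columns in the template depends on GROUP_CONCAT and is not guaranteed):

```turtle
<#tm-dsd-le> rr:subjectMap [
        rr:class qb:Observation ;
        rr:termType rr:BlankNode ;
        rr:template "{area}-{period}-{sex}" ;
    ] ;
    rr:predicateObjectMap [
        rr:predicate qb:dataSet ;
        rr:object <statssimple> ;
    ] .
```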
12. Approach
Step 3: Executing the R2RML Mapping – straightforward.
We used our own R2RML implementation, R2RML-F, which extends
the specification with JavaScript functions.
Step 4: Validating the generated RDF
Using the integrity constraints specified by the RDF Data Cube
Vocabulary Recommendation.
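For illustration, the Recommendation expresses its integrity constraints as SPARQL ASK queries that return true on a violation. IC-1 (every qb:Observation has exactly one qb:dataSet) looks roughly like this (paraphrased, not copied verbatim from the spec):

```sparql
ASK {
  {
    # An observation with no dataset
    ?obs a qb:Observation .
    FILTER NOT EXISTS { ?obs qb:dataSet ?ds }
  } UNION {
    # An observation with more than one dataset
    ?obs a qb:Observation ;
         qb:dataSet ?ds1, ?ds2 .
    FILTER (?ds1 != ?ds2)
  }
}
```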
13. Approach
Step 5: Provenance Information
Keep track of activities and intermediate results with PROV-O.
This will become key for a posteriori compliance analysis in
future work.
[Figure: PROV-O model — pam:DSD_Document, pam:R2RML_Mapping,
and pam:Validation_Report are subclasses of prov:Entity;
pam:Generate_Mapping, pam:Execute_Mapping, and
pam:Validate_Dataset are subclasses of prov:Activity;
pam:Mapping_Generator, pam:R2RML_Processor, and pam:Validator
are subclasses of prov:SoftwareAgent.]
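An illustrative provenance trace using these classes might look as follows (the instance names are hypothetical):

```turtle
pam:gen_run_42 a pam:Generate_Mapping ;
    prov:used pam:dsd_doc_1 ;
    prov:wasAssociatedWith pam:Mapping_Generator .

pam:mapping_42 a pam:R2RML_Mapping ;
    prov:wasGeneratedBy pam:gen_run_42 .

pam:exec_run_42 a pam:Execute_Mapping ;
    prov:used pam:mapping_42 ;
    prov:wasAssociatedWith pam:R2RML_Processor .
```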
14. Features
Mapping values onto IRIs, and
inclusion of data transformation functions
• Mapping languages such as D2RQ had so-called translation
tables, which map elements of one set to elements of another –
ideal for mapping values to IRIs. R2RML has no such
functionality. That is why we chose to adopt R2RML-F, where
such “translation tables” can be written as JavaScript
functions.
• R2RML-F also allows transformation functions to be written
when the underlying database technology has no support for
them.
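A “translation table” in R2RML-F is simply a JavaScript function referenced from an object map; a sketch (the rrf: property names follow our recollection of R2RML-F, and the sex codes and code-list IRIs are illustrative):

```turtle
<#SexToIRI> a rrf:Function ;
    rrf:functionName "sexToIRI" ;
    rrf:functionBody """
        function sexToIRI(code) {
            // A translation table as a simple lookup
            var table = { "M": "http://example.org/code/male",
                          "F": "http://example.org/code/female" };
            return table[code];
        }""" .

# Used in an object map, roughly:
# rr:objectMap [ rrf:functionCall [ rrf:function <#SexToIRI> ;
#                                   rrf:parameterBindings ( [ rr:column "sex" ] ) ] ;
#                rr:termType rr:IRI ] .
```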
The possibility to interlink with external datasets is provided by R2RML itself.
15. Related Work
Related work on the generation of R2RML mappings is, to the
best of our knowledge, limited.
• Skjaeveland et al. 2015 proposed a method to generate an
ontology, rules, and a mapping from one description.
• TabLinker and CSV2DataCube are two tools for generating
QB graphs from Excel files (in a certain format) and CSV
data, respectively.
• The Open Cube Toolkit has a built-in R2RML-compliant D2R
server, but it relies on a bespoke XML format that maps sources
to DSDs.
16. Conclusions
• We argued that datasets are used for a purpose and that
datasets should be built to suit that purpose, including
any policies they should comply with.
• Before we can do the latter, we investigated the former by
trying to answer the question: “Can we generate an R2RML
mapping from a data structure definition?”
• The answer is yes, and we presented the R2DQB approach
showing how. We strove for a declarative approach using
SPARQL CONSTRUCT queries. A demonstration of the
approach is presented in the paper.
17. Future work
Tackling the problem of policy-aware mappings, which would
complement research on post-hoc compliance analysis (e.g.,
Harsh et al. 2017). To be reported.
The Metadata Vocabulary for Tabular Data (a W3C
Recommendation) provides a vocabulary for describing the
“schemas” of tabular data, including constraints. This might be
another representation worth considering.