The document summarizes a path-based approach for storing and querying multidimensional XML (MXML) data in a relational database. MXML extends XML to represent data with different facets under different contexts. The approach stores MXML nodes in separate tables based on their type and uses a path table and Dewey labeling for indexing. It represents contexts using ordered worlds and binary vectors. It also defines Multidimensional XPath (MXPath) to query MXML data using both explicit and inherited contexts.
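The Dewey labeling and path-table indexing mentioned above can be made concrete with a small sketch (the paper's actual relational schema is not reproduced here; the table layout and names are illustrative assumptions):

```python
# Sketch of Dewey labeling and a path table for tree-structured (M)XML data.
# Table and column layout are illustrative, not the paper's actual schema.

def dewey_children(parent_label, n):
    """Generate Dewey labels for the n children of the node with parent_label."""
    return [f"{parent_label}.{i}" for i in range(1, n + 1)]

def is_ancestor(a, b):
    """A node labeled a is an ancestor of b iff a's label is a proper prefix of b's."""
    return b.startswith(a + ".")

# A tiny document: /book/chapter[1]/title and /book/chapter[2]
root = "1"
chapters = dewey_children(root, 2)          # ['1.1', '1.2']
title = dewey_children(chapters[0], 1)[0]   # '1.1.1'

assert is_ancestor(root, title)
assert not is_ancestor(chapters[1], title)

# A path table maps each distinct root-to-node path to an id, so a path
# lookup ("/book/chapter/title") becomes a single equality predicate.
path_table = {"/book": 1, "/book/chapter": 2, "/book/chapter/title": 3}
nodes = [("1", 1), ("1.1", 2), ("1.2", 2), ("1.1.1", 3)]
titles = [d for d, p in nodes if p == path_table["/book/chapter/title"]]
assert titles == ["1.1.1"]
```

Because Dewey labels encode both document order and ancestry, structural predicates reduce to string-prefix tests, which combine naturally with the path table's equality lookups.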
MXML Storage and the problem of manipulation of context (Giannis Tsakonas)

Presentation at the First Workshop on Digital Information Management. The workshop is organized by the Laboratory on Digital Libraries and Electronic Publishing, Department of Archives and Library Sciences, Ionian University, Greece, and aims to create a venue for presenting research activity in the general field of Information Science. The workshop features sessions disseminating the research results of the Laboratory's members, as well as tutorial sessions on topics of interest.
The document discusses the XML DOM (Document Object Model) and its implementation in .NET. It describes how the XmlDocument class represents an XML document and exposes its nodes through collections, allowing the document to be programmatically manipulated in memory. It also covers how to load, parse, modify, and save XML documents using the XML DOM API.
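The load/navigate/modify/serialize cycle described for .NET's XmlDocument can be illustrated with the equivalent DOM API from Python's standard library (a hedged analogue, not the .NET code itself; the sample document is illustrative):

```python
from xml.dom.minidom import parseString

# The same in-memory load / navigate / modify / serialize cycle that the
# summary describes for .NET's XmlDocument, using Python's xml.dom.minidom.
doc = parseString("<books><book>XML in .NET</book></books>")

# Navigate the node collections exposed by the document.
books = doc.getElementsByTagName("book")
assert books[0].firstChild.data == "XML in .NET"

# Modify the tree in memory: append a new element node with a text child.
new_book = doc.createElement("book")
new_book.appendChild(doc.createTextNode("DOM Basics"))
doc.documentElement.appendChild(new_book)

# Serialize the modified document back to a string (or write it to a file).
xml_out = doc.documentElement.toxml()
assert xml_out == "<books><book>XML in .NET</book><book>DOM Basics</book></books>"
```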
This document discusses parallel databases and various techniques for parallelizing database operations. It begins by introducing different types of parallelism in databases, including I/O parallelism, interquery parallelism, intraquery parallelism, and interoperation parallelism. It then describes techniques for parallelizing specific operations like sorting, joining, and scanning relations. Key techniques covered include range partitioning, hash partitioning, fragment-and-replicate joins, and partitioned parallel hash joins. The goal is to split data and processing across multiple disks and processors to improve performance for large databases and queries.
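Two of the partitioning techniques named above can be sketched as follows (the disk count and partition vector are illustrative):

```python
# Hash partitioning assigns a tuple by hashing its key; range partitioning
# compares the key against a partition vector. Disk count is illustrative.

N_DISKS = 4

def hash_partition(key):
    return hash(key) % N_DISKS

def range_partition(key, vector=(100, 200, 300)):
    """vector splits keys into 4 ranges: <=100, <=200, <=300, and the rest."""
    for disk, bound in enumerate(vector):
        if key <= bound:
            return disk
    return len(vector)

assert range_partition(42) == 0
assert range_partition(250) == 2
assert range_partition(999) == 3
# Hash partitioning balances point lookups across disks; range partitioning
# keeps a range query (e.g. 150 <= key <= 250) on few disks:
assert {range_partition(k) for k in range(150, 251)} == {1, 2}
```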
AN EMPIRICAL COMPARISON OF WEIGHTING FUNCTIONS FOR MULTI-LABEL DISTANCEWEIGHT... (cscpconf)
Multi-label classification extends classical multi-class classification: an instance can be associated with several classes simultaneously, so the classes are no longer mutually exclusive. It has been shown experimentally that the distance-weighted k-nearest-neighbour (DWkNN) algorithm is superior to the original kNN rule for multi-class learning, but it had not been investigated whether the distance-weighted strategy is valid for multi-label learning, or which weighting function performs well. In this paper, we provide a concise multi-label DWkNN formulation (MLC-DWkNN). Furthermore, four weighting functions are collected and investigated through detailed experiments on three benchmark data sets with Manhattan distance: Dudani's linear function varying from 1 to 0, Macleod's linear function ranging from 1 to 1/2, Dudani's inverse distance function, and Zavrel's exponential function. Our study demonstrates that Dudani's linear and Zavrel's exponential functions work well, and that MLC-DWkNN with these two functions outperforms ML-kNN, an existing kNN-based multi-label classifier.
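The four weighting functions can be sketched in simplified form (the exact parameterizations in the cited papers may differ slightly; the neighbour list is an illustrative example):

```python
import math

# Simplified forms of the four weighting functions compared above. d is a
# neighbour's distance; d1 and dk are the nearest and k-th nearest distances.

def dudani_linear(d, d1, dk):          # 1 at the nearest neighbour, 0 at the k-th
    return (dk - d) / (dk - d1) if dk != d1 else 1.0

def macleod_linear(d, d1, dk):         # same slope direction, but from 1 down to 1/2
    return 0.5 + 0.5 * dudani_linear(d, d1, dk)

def dudani_inverse(d):                 # inverse-distance weighting
    return 1.0 / d if d > 0 else float("inf")

def zavrel_exponential(d):             # exponential decay with distance
    return math.exp(-d)

# Distance-weighted vote over k neighbours, given as (label, distance) pairs
# sorted by distance. Expects a 3-argument weighting function; the two
# distance-only functions above would need a trivial wrapper.
def dwknn_vote(neighbours, weight):
    d1, dk = neighbours[0][1], neighbours[-1][1]
    votes = {}
    for label, d in neighbours:
        votes[label] = votes.get(label, 0.0) + weight(d, d1, dk)
    return max(votes, key=votes.get)

nn = [("a", 0.2), ("b", 0.5), ("b", 0.9), ("a", 1.0)]
assert dwknn_vote(nn, dudani_linear) == "a"   # the nearest neighbour dominates
```

Plain (unweighted) kNN would tie 2–2 on this example; the weighting is what breaks the tie in favour of the closer neighbours.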
The document discusses NoSQL databases and their different classes, including column stores, document stores, and key-value stores. It provides examples of column store databases BigTable and HBase, and notes that document stores like CouchDB allow data to be stored without a predefined schema. The document also discusses object databases and their advantages over relational databases in avoiding the object-relational impedance mismatch.
Islamic University Previous Year Question Solution 2018 (ADBMS) (Rakibul Hasan Pranto)
A database management system (DBMS) is software designed to define, manipulate, retrieve, and manage data in a database. The primary goal of a DBMS is to provide convenient and efficient ways to store and retrieve database information. It manages data by defining the structure for storing information and providing mechanisms for manipulating that information.
Advance Database Management Systems - Object Oriented Principles In Database (Sonali Parab)
An OODBMS is the result of combining object-oriented programming principles with database management principles. Object-oriented programming concepts such as encapsulation, polymorphism, and inheritance are enforced, as well as database management concepts such as the ACID properties (Atomicity, Consistency, Isolation, and Durability), which lead to system integrity, support for an ad hoc query language, and secondary storage management systems that allow for managing very large amounts of data. The Object-Oriented Database Manifesto specifically lists the following features as mandatory before a system can be called an OODBMS: complex objects, object identity, encapsulation, types and classes, class or type hierarchies, overriding, overloading and late binding, computational completeness, extensibility, persistence, secondary storage management, concurrency, recovery, and an ad hoc query facility.
ICDE2015 Research 3: Distributed Storage and Processing (Takuma Wakamori)
The document summarizes a research presentation on distributed storage and processing. It discusses two papers: 1) PABIRS, an integrated data access middleware for distributed file systems that efficiently processes mixed query workloads, and 2) scalable distributed transactions across heterogeneous stores.
It then provides details on PABIRS, which uses a hybrid index with a bitmap index and LSM (log-structured merge) tree index. The bitmap index is used for low selectivity keys, while the LSM index is built for "hot" values with selectivity above a threshold. The system aims to efficiently support data retrieval and insertion for various query workloads on distributed file systems.
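The selectivity-based routing idea can be sketched as follows (the threshold, data, and structures are illustrative stand-ins, not PABIRS's actual design; plain posting lists stand in for bit vectors and for the LSM tree):

```python
# Keys whose values match few records are served by the "bitmap" side;
# "hot" values matching more than a threshold fraction go to the simulated
# LSM-style index. All numbers and structures here are illustrative.

THRESHOLD = 0.3  # fraction of records matching a value

records = ["us", "us", "us", "us", "us", "de", "fr", "de", "us", "us"]

def selectivity(value):
    return records.count(value) / len(records)

bitmap = {v: [i for i, r in enumerate(records) if r == v]
          for v in set(records) if selectivity(v) <= THRESHOLD}
hot_index = {v: [i for i, r in enumerate(records) if r == v]
             for v in set(records) if selectivity(v) > THRESHOLD}

def lookup(value):
    """Route the lookup to whichever structure holds the value."""
    if value in hot_index:
        return "lsm", hot_index[value]
    return "bitmap", bitmap.get(value, [])

assert lookup("us")[0] == "lsm"          # 7 of 10 records: a "hot" value
assert lookup("fr") == ("bitmap", [6])   # rare value stays on the bitmap side
```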
This document discusses probabilistic models used for text mining. It introduces mixture models, Bayesian nonparametric models, and graphical models including Bayesian networks, hidden Markov models, Markov random fields, and conditional random fields. It provides details on the general framework of mixture models and examples like topic models PLSA and LDA. It also discusses learning algorithms for probabilistic models like EM algorithm and Gibbs sampling.
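The mixture-model framework and the EM algorithm mentioned above can be made concrete with a minimal two-component 1-D Gaussian mixture (variances are fixed at 1 to keep the sketch short; real implementations also update them):

```python
import math, random

# Minimal EM for a two-component 1-D Gaussian mixture with unit variances.

def em_mixture(xs, iters=50):
    mu = [min(xs), max(xs)]            # crude but serviceable initialization
    pi = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            w = [pi[k] * math.exp(-0.5 * (x - mu[k]) ** 2) for k in range(2)]
            s = sum(w)
            resp.append([wk / s for wk in w])
        # M-step: re-estimate mixing weights and means from responsibilities.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            pi[k] = nk / len(xs)
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
    return pi, mu

random.seed(0)
xs = [random.gauss(0, 1) for _ in range(200)] + \
     [random.gauss(5, 1) for _ in range(200)]
pi, mu = em_mixture(xs)
assert abs(pi[0] - 0.5) < 0.1          # balanced components recovered
assert abs(min(mu) - 0) < 0.5 and abs(max(mu) - 5) < 0.5
```

The same E-step/M-step alternation underlies PLSA's topic estimation; Gibbs sampling, by contrast, replaces the deterministic E-step with sampling from the posterior.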
The International Journal of Computational Engineering Research (IJCER) is an international online journal published monthly in English. The journal publishes original research work that contributes significantly to furthering scientific knowledge in engineering and technology.
This document discusses the XML query language XPath and XML navigation. It describes how XPath allows querying XML documents by addressing elements and text using a path-like notation. XPath expressions are evaluated based on a context node and node-set. The document also covers XPointer for pointing to specific data within XML documents, and how XPath can be used with the XML DOM and the XPathNavigator class in .NET.
The .NET Framework provides classes for working with XML, including parsing, validation, navigation, schema management, transformation, and serialization. Key classes include XmlReader, XmlWriter, and XmlDocument. XML is used extensively in .NET for configuration, ADO.NET, remoting, web services, and more. The document outlines the XML parsing model in .NET and classes for reading, writing, and manipulating XML.
X-RECOSA: MULTI-SCALE CONTEXT AGGREGATION FOR MULTI-TURN DIALOGUE GENERATION (IJCI JOURNAL)
In multi-turn dialogue generation, responses are related not only to the topic and background of the context but also to words and phrases in the sentences of the context. However, currently widely used hierarchical dialog models rely solely on the context representations from the utterance-level encoder, ignoring the sentence representations output by the word-level encoder. This inevitably results in a loss of information during decoding and generation. In this paper, we propose a new dialog model, X-ReCoSa, which tackles this problem by aggregating multi-scale context information for hierarchical dialog models. Specifically, we divide the generation decoder into upper and lower parts, namely the intention part and the generation part. First, the intention part takes context representations as input to generate the intention of the response. Then the generation part generates words depending on sentence representations. Thus the hierarchical information is fused into response generation. We conduct experiments on the English dataset DailyDialog. Experimental results show that our method outperforms baseline models on both automatic metric-based and human-based evaluations.
A Study on Mapping Semi-Structured Data to a Structured Data for Inverted Ind... (BRNSSPublicationHubI)
The document discusses various approaches for mapping semi-structured data, such as XML documents, to structured data that can be stored in a relational database. It describes techniques like representing the XML data as a graph and then mapping it to tables using approaches like the edge table, attribute table, universal table, and normalized universal table. Each approach has advantages and disadvantages in terms of complexity, performance, and ability to handle queries and updates. The document also discusses challenges in storing semi-structured data and translating queries between the data models.
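The edge-table approach can be sketched as follows (the column layout and names are illustrative assumptions):

```python
# The XML tree is stored as one relation of edges; each row records a node,
# its parent, its ordinal among siblings, its tag, and any text value.
# Column layout is illustrative.

xml_edges = [
    # (node_id, parent_id, ordinal, label, value)
    (1, None, 1, "book", None),
    (2, 1, 1, "title", "XML Storage"),
    (3, 1, 2, "author", "Doe"),
]

def children(parent_id, label=None):
    """Select child edges of parent_id, optionally filtered by tag label."""
    return [e for e in xml_edges
            if e[1] == parent_id and (label is None or e[3] == label)]

# The path query /book/title becomes two lookups on the edge table,
# which in SQL would be a self-join of the table with itself.
books = children(None, "book")
titles = [c[4] for b in books for c in children(b[0], "title")]
assert titles == ["XML Storage"]
```

The trade-off the summary mentions is visible here: the schema never changes as documents vary, but every path step costs another self-join.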
This document provides an overview of Resource Description Framework (RDF) and its XML-based syntax. It discusses key RDF concepts such as resources, properties, statements, and triples. It also covers RDF graphs and how RDF statements can be represented as XML elements with subjects, properties and objects. The document provides examples of RDF representations of university courses and faculty.
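The triple model can be illustrated with a few statements held as plain tuples (the course and faculty resources are illustrative):

```python
# RDF statements are (subject, property, object) triples; a set of triples
# forms a graph. The university resources below are made-up examples.

triples = {
    ("uni:CS101", "rdf:type", "uni:Course"),
    ("uni:CS101", "uni:title", "Databases"),
    ("uni:CS101", "uni:taughtBy", "uni:Smith"),
    ("uni:Smith", "rdf:type", "uni:Faculty"),
}

# Querying the graph is pattern matching over triples: who teaches CS101?
def objects_of(subject, prop):
    return {o for s, p, o in triples if s == subject and p == prop}

assert objects_of("uni:CS101", "uni:taughtBy") == {"uni:Smith"}

# The same statement in RDF/XML syntax: the subject appears as rdf:about,
# the property as a child element, the object as its resource reference.
rdf_xml = """<rdf:Description rdf:about="uni:CS101">
  <uni:taughtBy rdf:resource="uni:Smith"/>
</rdf:Description>"""
```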
DATA INTEGRATION (Gaining Access to Diverse Data).ppt (careerPointBasti)
XML provides a standard way to represent and exchange data. It defines elements, which can contain text or other nested elements, and attributes. XML documents can be validated against DTDs or XML schemas, which define allowed structures and datatypes. XML data can be queried using XPath expressions, which select elements or attributes based on their path in the XML tree and optional predicates. XPath allows traversing relationships both vertically and horizontally in the tree structure.
Towards a Query Rewriting Algorithm Over Proteomics XML Resources (CSCJournals)
Querying and sharing Web proteomics data is not an easy task. Given that several data sources can be used to answer the same sub-goals in the global query, there can be many candidate rewritings. The user query is formulated using concepts and properties related to proteomics research (a domain ontology), while semantic mappings describe the contents of the underlying sources. In this paper, we characterize the query rewriting problem by representing the semantic mappings as an associated hypergraph. The generation of candidate rewritings can then be formulated as the discovery of the minimal transversals of that hypergraph. We exploit and adapt algorithms from hypergraph theory to find all candidate rewritings for a query answering problem. In future work, relevance criteria could help determine optimal and high-quality rewritings according to user needs and source performance.
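The reduction to minimal transversals (minimal hitting sets) can be sketched with a brute-force enumeration, adequate for a small illustrative instance (the source names are hypothetical):

```python
from itertools import combinations

# A transversal of a hypergraph is a vertex set intersecting every
# hyperedge; it is minimal if no proper subset is also a transversal.
# Brute force over all subsets, fine for this tiny example.

def minimal_transversals(vertices, edges):
    hits = lambda s: all(s & e for e in edges)
    candidates = [set(c) for r in range(1, len(vertices) + 1)
                  for c in combinations(sorted(vertices), r)]
    transversals = [s for s in candidates if hits(s)]
    return [s for s in transversals
            if not any(t < s for t in transversals)]

# Each hyperedge is the set of sources able to answer one sub-goal of the
# query; a minimal transversal is then a minimal set of sources that
# together cover every sub-goal, i.e. one candidate rewriting.
edges = [{"s1", "s2"}, {"s2", "s3"}, {"s1", "s3"}]
result = minimal_transversals({"s1", "s2", "s3"}, edges)
assert {frozenset(s) for s in result} == {
    frozenset({"s1", "s2"}), frozenset({"s1", "s3"}), frozenset({"s2", "s3"})
}
```

Enumeration of minimal transversals is expensive in general, which is why the paper adapts dedicated algorithms from hypergraph theory rather than brute force.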
HadoopXML: A Suite for Parallel Processing of Massive XML Data with Multiple ... (Kyong-Ha Lee)
This document proposes HadoopXML, a system for efficiently processing massive XML data and multiple twig pattern queries in parallel using Hadoop. Key features of HadoopXML include:
1) It partitions large XML files and processes them in parallel across nodes while preserving structural information.
2) It simultaneously processes multiple twig pattern queries with a shared input scan, without needing separate MapReduce jobs for each query.
3) It enables query processing tasks to share input scans and intermediate results like path solutions, reducing redundant processing and I/O.
4) It provides load balancing to fairly distribute twig join operations across nodes.
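The shared input scan of point 2) can be sketched as a single pass that evaluates every registered query at once (simple root-to-leaf paths stand in for twig patterns; the records are illustrative):

```python
# One pass over the input evaluates all registered path queries, instead of
# one scan (or one MapReduce job) per query. Root-to-leaf paths are a
# simplification of the twig patterns HadoopXML actually handles.

queries = {
    "q1": ["book", "title"],
    "q2": ["book", "author", "name"],
}

# Flattened (path, value) records, as a partitioned XML file might yield.
records = [
    (["book", "title"], "XML at Scale"),
    (["book", "author", "name"], "Lee"),
    (["book", "price"], "30"),
]

def shared_scan(records, queries):
    results = {q: [] for q in queries}
    for path, value in records:           # a single pass over the input
        for q, qpath in queries.items():  # every query checked per record
            if path == qpath:
                results[q].append(value)
    return results

out = shared_scan(records, queries)
assert out == {"q1": ["XML at Scale"], "q2": ["Lee"]}
```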
Expressing and Exploiting Multi-Dimensional Locality in DASH (Menlo Systems GmbH)
DASH is a realization of the PGAS (partitioned global address space) programming model in the form of a C++ template library. It provides a multidimensional array abstraction which is typically used as an underlying container for stencil and dense matrix operations.
The efficiency of operations on a distributed multi-dimensional array depends heavily on the distribution of its elements to processes and on the communication strategy used to propagate values between them. Locality can only be improved by employing an optimal distribution that is specific to the implementation of the algorithm, to run-time parameters such as node topology, and to numerous additional aspects. Application developers are typically unaware of these implications, which may also change in future releases of DASH.
In the following, we identify fundamental properties of distribution patterns that are prevalent in existing HPC applications.
We describe a classification scheme of multi-dimensional distributions based on these properties and demonstrate how distribution patterns can be optimized for locality and communication avoidance automatically and, to a great extent, at compile time.
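Two classic one-dimensional distribution patterns of the kind such a classification covers are blocked and cyclic assignment (a sketch; DASH's actual pattern types are not reproduced here):

```python
# BLOCKED assigns contiguous chunks of elements to each unit (process);
# CYCLIC deals elements round-robin. Which is locality-optimal depends on
# the algorithm's access pattern.

def blocked_owner(i, n, units):
    block = -(-n // units)           # ceil(n / units)
    return i // block

def cyclic_owner(i, n, units):
    return i % units

n, units = 8, 4
assert [blocked_owner(i, n, units) for i in range(n)] == [0, 0, 1, 1, 2, 2, 3, 3]
assert [cyclic_owner(i, n, units) for i in range(n)] == [0, 1, 2, 3, 0, 1, 2, 3]
# A stencil reading i-1 and i+1 communicates only at block boundaries
# under BLOCKED, but at every single element under CYCLIC.
```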
Data integration with a façade. The case of knowledge graph construction. (Enrico Daga)
"Data integration with a façade. The case of knowledge graph construction." is an overview of recent research in façade-based data access. The slides introduce the core notions of façade-based data access and the design principles of SPARQL Anything, a system that allows querying many formats (CSV, JSON, XML, HTML, Markdown, Excel, ...) in plain SPARQL.
This document discusses XML validation using an XML schema (XSD) file. It provides an example of using an XmlReader with validation enabled to validate an XML file against an XSD schema. The example loads an XML file, validates it using a schema at a given URI, and handles any validation errors, displaying status messages. It demonstrates how to automatically generate an XSD from an XML file in Visual Studio to define the XML structure.
The document discusses various XML processing models including DOM, SAX, StAX, and VTD-XML. VTD-XML uses a non-extractive parsing approach that encodes tokens as 64-bit integers to provide efficient random access parsing of XML documents with minimal memory usage. It has advantages over DOM and SAX such as being faster, using less memory, and allowing incremental updates to XML documents. Parallel DOM (ParDOM) is also discussed as an approach to parallelize DOM parsing across multiple CPU cores.
The document discusses working with the XML Document Object Model (DOM). It introduces key DOM objects like DOMDocument, IXMLDOMNode, and IXMLDOMNodeList that allow programmers to access, navigate, and modify XML documents. The DOM represents an XML file as a tree structure of nodes. Programmers can use DOM methods on the nodes to perform tasks like loading XML, creating/accessing elements, and traversing the node tree.
The document discusses XML style sheets and transformations. It introduces Cascading Style Sheets (CSS) and Extensible Stylesheet Language (XSL) as two methods for transforming XML documents. CSS controls the formatting of tags, while XSL can reorder, sort, and display elements based on conditions using XSLT and XPath. The document provides examples of creating CSS stylesheets and XSLT templates to format XML data.
This document discusses concepts related to object-oriented databases. It begins by outlining the objectives of examining object-oriented database design concepts and understanding the transition from relational to object-oriented databases. It then provides background on how object-oriented databases arose from advancements in relational database management systems and how they integrate object-oriented programming concepts. The key aspects of object-oriented databases are described as objects serving as the basic building blocks organized into classes with methods and inheritance. The document also covers object-oriented programming concepts like encapsulation, polymorphism, and abstraction that characterize object-oriented management systems. Examples are provided of object database structures and queries.
unit_5_XML data integration database management (sathiyabcsbs)
The document discusses XML querying using XPath. It begins with an overview of XPath, describing it as a language for defining templates that traverse the XML tree to select nodes. It then provides examples of basic XPath queries on a sample XML document, including queries to select elements, attributes, and text nodes. The document also covers more advanced XPath features such as predicates for filtering query results, different axes for traversing the tree in various directions, and functions for querying node position and order.
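The XPath features described (path selection, predicates, position) can be demonstrated in the limited XPath dialect supported by Python's xml.etree.ElementTree (the sample document is illustrative):

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("""
<library>
  <book year="2003"><title>XQuery Basics</title></book>
  <book year="2010"><title>XPath in Depth</title></book>
</library>""")

# Path selection: all title elements under any book child of the root.
titles = [t.text for t in doc.findall("./book/title")]
assert titles == ["XQuery Basics", "XPath in Depth"]

# Predicate filtering on an attribute value.
recent = doc.findall("./book[@year='2010']/title")
assert [t.text for t in recent] == ["XPath in Depth"]

# Positional predicate: the title of the first book.
first = doc.find("./book[1]/title")
assert first.text == "XQuery Basics"
```

ElementTree covers only a subset of XPath 1.0 (no full axes such as `ancestor::` or `following-sibling::`); a complete implementation would need a dedicated XPath engine.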
This document discusses creating an OWL ontology from Dublin Core metadata to improve semantic interoperability. It describes a process to transform DC metadata from a digital repository into RDF, then refine the semantics through an OWL ontology. It demonstrates this with metadata from a university repository, adding properties and subclasses. Queries over the instantiated ontology using reasoning show the ability to infer new knowledge not possible from the original metadata alone. The approach aims to semantically enrich existing metadata in a centralized way to help address challenges of the semantic web.
Archival Metadata: Standards and Management on the World Wide Web (Giannis Tsakonas)
From the 1970s to the present, the International Council on Archives has created and provided to the community of archives and archivists a series of standards for developing archival finding aids and related catalogues and indexes. The aim of these standards is a common understanding, approach, and uniformity in the creation of catalogues and authority records and in the subject description of archives, their structure, and their content.
The World Wide Web has become one of the most important media for the circulation of information, and the explosive growth of application development technologies in its environment has led to its exploitation by various communities. In this context, archivists are called upon to encode their metadata and make it capable of interoperating in a global environment of information and knowledge management in which all scientific communities coexist.
The aim of the seminar is to present (a) the main standards for managing archival information and (b) the technological background that determines the ways in which archival information can be exchanged and can interoperate with, and link to, the information produced by other organizations and communities with which archival services have a direct relationship.
The seminar is addressed to employees of public and private archival institutions, students and graduates in archival science, librarians, and graduates of universities and technological institutes with related professional and scientific interests.
The seminar is part of the activities of the Databases and Information Systems Group of the Laboratory on Digital Libraries and Electronic Publishing of the Department of Archives, Library Science and Museology of the Ionian University. It is organized in the context of the 21st International Conference on Theory and Practice of Digital Libraries and will take place at the Grand Hotel Palace, 305 Monastiriou St., Thessaloniki, on Tuesday 19 September 2017, 14.00 - 17.00.
The "Nomenclature of Multidimensionality" in the Digital Libraries Evaluation... (Giannis Tsakonas)
Digital library evaluation is an interdisciplinary and multidisciplinary domain, posing a set of challenges to the research communities that intend to utilise and assess criteria, methods, and tools. The volume of scientific production published in the field hinders and disorients researchers who are interested in the domain. Researchers need guidance in order to exploit the considerable amount of data and the diversity of methods effectively, as well as to identify new research goals and develop plans for future work. This paper proposes a methodological pathway for investigating the core topics of the digital library evaluation domain, its author communities and their relationships, and the researchers who contribute significantly to major topics. The proposed methodology applies topic modelling algorithms and network analysis to a corpus consisting of the digital library evaluation papers presented at the JCDL, ECDL/TPDL, and ICADL conferences in the period 2001-2013.
Full text at: dx.doi.org/10.1007/978-3-319-43997-6_19
Session: Digital Library Evaluation
Time: Thursday, 08/Sep/2016, 9:00am - 10:30am
Chair: Claus-Peter Klas
Location: Blauer Saal, Hannover Congress Centrum
"Data integration with a façade.
The case of knowledge graph construction." is an overview of recent research in façade-based data access. The slides introduce core notions of façade-based data access and the design principles of SPARQL Anything, a system that allows querying of many formats (CSV, JSON, XML, HTML, Markdown , Excel, ...) in plain SPARQL.
This document discusses XML validation using an XML schema (XSD) file. It provides an example of using an XmlReader with validation enabled to validate an XML file against an XSD schema. The example loads an XML file, validates it using a schema at a given URI, and handles any validation errors, displaying status messages. It demonstrates how to automatically generate an XSD from an XML file in Visual Studio to define the XML structure.
The document discusses various XML processing models including DOM, SAX, StAX, and VTD-XML. VTD-XML uses a non-extractive parsing approach that encodes tokens as 64-bit integers to provide efficient random access parsing of XML documents with minimal memory usage. It has advantages over DOM and SAX such as being faster, using less memory, and allowing incremental updates to XML documents. Parallel DOM (ParDOM) is also discussed as an approach to parallelize DOM parsing across multiple CPU cores.
The document discusses working with the XML Document Object Model (DOM). It introduces key DOM objects like DOMDocument, IXMLDOMNode, and IXMLDOMNodeList that allow programmers to access, navigate, and modify XML documents. The DOM represents an XML file as a tree structure of nodes. Programmers can use DOM methods on the nodes to perform tasks like loading XML, creating/accessing elements, and traversing the node tree.
The document discusses XML style sheets and transformations. It introduces Cascading Style Sheets (CSS) and Extensible Style Sheet Language (XSL) as two methods for transforming XML documents. CSS controls formatting of tags, while XSL can reorder, sort, and display elements based on conditions using XSLT and XPath. The document provides examples of creating CSS stylesheets and XSLT templates to format XML data.
This document discusses concepts related to object-oriented databases. It begins by outlining the objectives of examining object-oriented database design concepts and understanding the transition from relational to object-oriented databases. It then provides background on how object-oriented databases arose from advancements in relational database management systems and how they integrate object-oriented programming concepts. The key aspects of object-oriented databases are described as objects serving as the basic building blocks organized into classes with methods and inheritance. The document also covers object-oriented programming concepts like encapsulation, polymorphism, and abstraction that characterize object-oriented management systems. Examples are provided of object database structures and queries.
unit_5_XML data integration database managementsathiyabcsbs
The document discusses XML querying using XPath. It begins with an overview of XPath, describing it as a language for defining templates that traverse the XML tree to select nodes. It then provides examples of basic XPath queries on an sample XML document, including queries to select elements, attributes, and text nodes. The document also covers more advanced XPath features such as predicates for filtering query results, different axes for traversing the tree in various directions, and functions for querying node position and order.
This document discusses creating an OWL ontology from Dublin Core metadata to improve semantic interoperability. It describes a process to transform DC metadata from a digital repository into RDF, then refine the semantics through an OWL ontology. It demonstrates this with metadata from a university repository, adding properties and subclasses. Queries over the instantiated ontology using reasoning show the ability to infer new knowledge not possible from the original metadata alone. The approach aims to semantically enrich existing metadata in a centralized way to help address challenges of the semantic web.
Similar to Path-based MXML Storage and Querying (20)
Αρχειακά Μεταδεδομένα: Πρότυπα και Διαχείριση στον Παγκόσμιο ΙστόGiannis Tsakonas
Από τη δεκαετία του 1970 μέχρι σήμερα το Διεθνές Συμβούλιο Αρχείων δημιουργεί και παρέχει στην κοινότητα των Αρχείων και των Αρχειονόμων μια σειρά από πρότυπα για την ανάπτυξη αρχειακών βοηθημάτων έρευνας και συναφών καταλόγων και ευρετηρίων. Στόχος των προτύπων είναι η κοινή αντίληψη, προσέγγιση και ομοιομορφία στη δημιουργία καταλόγων, καθιερωμένων εγγραφών και στη θεματική περιγραφή των αρχείων, της δομής και του περιεχομένου τους.
Ο Παγκόσμιος ιστός έχει γίνει ένα από τα σημαντικότερα μέσα διακίνησης πληροφορίας και η εκρηκτική ανάπτυξη των αντίστοιχων τεχνολογιών ανάπτυξης εφαρμογών στο περιβάλλον του έχει οδηγήσει στην αξιοποίησή του από διάφορες κοινότητες. Σε αυτό το πλαίσιο οι Αρχειονόμοι καλούνται να κωδικοποιήσουν τα μεταδεδομένα τους και να τα καταστήσουν ικανά να διαλειτουργήσουν σε ένα παγκόσμιο περιβάλλον διαχείρισης πληροφορίας και γνώσης, όπου όλες οι επιστημονικές κοινότητες συνυπάρχουν.
Στόχος του σεμιναρίου είναι να παρουσιάσει (α) βασικά πρότυπα διαχείρισης αρχειακής πληροφορίας και (β) το τεχνολογικό υπόβαθρο το οποίο καθορίζει τους τρόπους με τους οποίους είναι δυνατή η ανταλλαγή και η διαλειτουργικότητα - διασύνδεση της αρχειακής πληροφορίας με την πληροφορία που παράγουν άλλοι οργανισμοί και κοινότητες με τις οποίες οι αρχειακές υπηρεσίες έχουν άμεση σχέση.
Το σεμινάριο απευθύνεται σε εργαζόμενους σε δημόσιους και ιδιωτικούς αρχειακούς φορείς, φοιτητές και πτυχιούχους αρχειονόμους, βιβλιοθηκονόμους και πτυχιούχους Πανεπιστημίων και ΤΕΙ με παρεμφερή επαγγελματικά και επιστημονικά ενδιαφέροντα.
Το σεμινάριο εντάσσεται στις δραστηριότητες της Ομάδας Βάσεων Δεδομένων και Πληροφοριακών Συστημάτων του Εργαστηρίου Ψηφιακών Βιβλιοθηκών και Ηλεκτρονικής Δημοσίευσης του Τμήματος Αρχειονομίας, Βιβλιοθηκονομίας και Μουσειολογίας του Ιονίου Πανεπιστημίου, διοργανώνεται στο πλαίσιο του 21st International Conference on Theory and Practice of Digital Libraries και θα διεξαχθεί στο ξενοδοχείο Grand Hotel Palace, Μοναστηρίου 305, Θεσσαλονίκη, την Τρίτη 19 Σεπτεμβρίου 2017 και ώρες 14.00 – 17.00.
The “Nomenclature of Multidimensionality” in the Digital Libraries Evaluation...Giannis Tsakonas
Digital libraries evaluation is characterised as an interdisciplinary and multidisciplinary domain posing a set of challenges to the research communities that intend to utilise and assess criteria, methods and tools. The amount of scientific production, which is published on the field, hinders and disorientates the researchers who are interested in the domain. The researchers need guidance in order to exploit the considerable amount of data and the diversity of methods effectively as well as to identify new research goals and develop their plans for future works. This paper proposes a methodological pathway to investigate the core topics of the digital library evaluation domain, author communities, their relationships, as well as the researchers who significantly contribute to major topics. The proposed methodology exploits topic modelling algorithms and network analysis on a corpus consisting of the digital library evaluation papers presented in JCDL,ECDL/TDPL and ICADL conferences in the period 2001–2013.
Full text at: dx.doi.org/10.1007/978-3-319-43997-6_19
Session: Digital Library Evaluation
Time: Thursday, 08/Sep/2016, 9:00am - 10:30am
Chair: Claus-Peter Klas
Location: Blauer Saal, Hannover Congress Centrum
Increasing traceability of physical library items through Koha: the case of S...Giannis Tsakonas
Presentation in KohaCon2016, the major event of Koha community, on May 31, 2016. The Library & Information Center, University of Patras, Greece has developed the SELIDA framework, which integrates a set of standardized and widespread library technologies in order to increase the identification and traceability of physical items, such as books. The framework makes use of RFID tags in order to assign unique identification marks, in the form of URIs that can be globally exchanged. The framework has been implemented in the fully translated and customized Koha installation of our Library and its core services support checking in/out of books and browsing of history transactions with geospatial visualization. Its use can support transactions between various libraries or branches of the same library. The proposed presentation will describe the architecture of the framework and how it connects to Koha, as well as the challenges we faced during its development.
We were group no 2: notes for the MLAS2015 workshopGiannis Tsakonas
Summary note of the discussion of group no 2 in the IFLA MLAS 2015 workshop in Athens, March 12, 2015, involving librarians (from all around the world), book palaces and … Canadian rock groups.
Βιβλιοθήκες & Πολιτισμός: τα προφανή και τα ευνόηταGiannis Tsakonas
Παρουσίαση στο πάνελ "Πολιτισµός και νέες τεχνολογίες: από τον αισθητό στον ψηφιακό κόσµο" που διοργανώθηκε στο πλαίσιο της ενότητας "Ελευθέρο Βήμα" του Forum Ανάπτυξης 2014 (Κυριακή 23 Νοεµβρίου 2014 στις 20:30-22:00, Ξενοδοχείο Αστήρ, Αίθουσα ΙΙ).
{Tech}changes: the technological state of Greek Libraries.Giannis Tsakonas
The document summarizes technological changes in Greek libraries over recent years. While Greek libraries were early adopters of technological changes, penetration of eBooks and sophisticated business models remains limited. However, libraries have increasingly embraced open access, open source, and open data initiatives. Projects like Kallipos provide enhanced academic textbooks online. Funding from the EU and Greece has supported centralized technological solutions and opportunities for public/private cooperation to make technology more affordable and transform literacy programs.
Affective relationships between users & libraries in times of economic stressGiannis Tsakonas
This study used the Stimulus-Organism-Response framework to identify the critical parameters that govern the affective relationships between Greek academic libraries and their users during times of economic stress. A survey of 950 library users found that social cues like willingness, kindness, and knowledge had the strongest impact on users' emotions. Emotions like satisfaction, confidence and safety positively correlated with library usage. The findings suggest that creating a welcoming environment and providing friendly service are important for positively influencing users' feelings about the library. Further research is needed to explore how these relationship factors interact and influence social and systemic interactions.
Charting the Digital Library Evaluation Domain with a Semantically Enhanced M...Giannis Tsakonas
This document proposes a methodology for discovering patterns in scientific literature using a case study of digital library evaluation. It involves:
1. Classifying documents to identify relevant papers using naive Bayes classification.
2. Semantically annotating papers with concepts from a Digital Library Evaluation Ontology using the GoNTogle annotation tool. Over 2,600 annotations were generated.
3. Clustering the annotated papers into coherent groups using k-means clustering.
4. Interpreting the clusters with the assistance of the ontology to discover patterns and trends in the literature. Benchmarking tests were performed to evaluate effectiveness of the methodology.
Παρουσίαση για το σεμινάριο “Δεδομένα Βιβλιοθηκών στο μελλοντικό ψηφιακό περιβάλλον - FRBR και Linked Data”
Το σεμινάριο διοργανώθηκε από τις Βιβλιοθήκες του Τμήματος Νομικής του ΕΚΠΑ και του Πανεπιστημίου Πειραιώς στις 18 και 19 Ιουνίου 2012, υπό την επιμέλεια της Ομάδας Βάσεων Δεδομένων και Πληροφοριακών Συστημάτων του Εργαστηρίου Ψηφιακών Βιβλιοθηκών και Ηλεκτρονικής Δημοσίευσης του Τμήματος Αρχειονομίας - Βιβλιοθηκονομίας του Ιονίου Πανεπιστημίου
Διδάσκοντες:
- Μανόλης Πεπονάκης (MLIS)
- Δρ. Μιχάλης Σφακάκης
- Δρ. Χρήστος Παπαθεοδώρου
Policies for geospatial collections: a research in US and Canadian academic l...Giannis Tsakonas
This document summarizes a research study on geospatial collection development policies in US and Canadian academic libraries. It includes the session overview, research framework, definitions, literature review, objectives, methodology, findings, and conclusions. The methodology involved analyzing the websites of 21 academic libraries for their geospatial policies. The findings show variability in policies but many included general information, collection details, and references to open data. The conclusions are that policies lack homogeneity and more research is needed on policies in other countries.
Developing a Metadata Model for Historic Buildings: Describing and Linking Ar...Giannis Tsakonas
This document discusses developing a metadata model called ARMOS (Architecture Metadata Object Schema) for describing historic buildings. It covers traditional flat metadata descriptions of architecture, examining relationships between works and images/other works. ARMOS aims to group related buildings logically and connect them to facilitate discovery. It draws from architecture theories on morphology, typology and patterns. The conceptual model identifies entities, relationships and attributes. ARMOS is a harmonization profile combining descriptive, structural, administrative and technical metadata from various sources. Issues around terminology, extensions and interoperability are discussed.
Query Expansion and Context: Thoughts on Language, Meaning and Knowledge Orga...Giannis Tsakonas
This document summarizes a workshop on digital information management that took place in April 2012 in Corfu, Greece. It discusses some of the challenges of information retrieval when using natural language queries, including problems of ambiguity, context, and the use of knowledge organization systems and query expansion to help address these challenges. The role of user models and evaluation in understanding real language use is also mentioned.
Δεδομένα Βιβλιοθηκών στο μελλοντικό ψηφιακό περιβάλλον - FRBR και Linked DataGiannis Tsakonas
Παρουσίαση για το σεμινάριο “Δεδομένα Βιβλιοθηκών στο μελλοντικό ψηφιακό περιβάλλον - FRBR και Linked Data”
Το σεμινάριο διοργανώθηκε από τη Βιβλιοθήκη και Κέντρο Πληροφόρησης του Πανεπιστημίου Πατρών, στους χώρους της οποίας διεξήχθη την Παρασκευή 3 Φεβρουαρίου 2012, υπό την επιμέλεια της Ομάδας Βάσεων Δεδομένων και Πληροφοριακών Συστημάτων του Εργαστηρίου Ψηφιακών Βιβλιοθηκών και Ηλεκτρονικής Δημοσίευσης του Τμήματος Αρχειονομίας - Βιβλιοθηκονομίας του Ιονίου Πανεπιστημίου
Διδάσκοντες:
- Μανόλης Πεπονάκης (MLIS)
- Δρ. Μιχάλης Σφακάκης
- Δρ. Χρήστος Παπαθεοδώρου
This document discusses open bibliographic data and the Open Bibliographic Principles initiative. It provides background on exchanging bibliographic data and reasons for making it open, such as freeing access, facilitating collaboration, and advancing research. Complications discussed include proprietary attitudes and loss of provenance over time. The document also covers topics such as using bibliographic data as linked open data, navigating it, examples like Libris, applicable licenses, and the E-LIS experience in adopting an open license.
Evaluation is a very vital research interest in the digital library domain. This has been exhibited by the growth of the literature in the main conferences and journal papers. However it is very difficult for one to navigate in this extended corpus. For these reasons the DiLEO ontology has been developed in order to assist the exploration of important concepts and the discovery of trends in the evaluation of digital libraries. DiLEO is a domain ontology, which aims to conceptualize the DL evaluation domain by correlating its key entities and provide reasoning paths that support the design of evaluation experiments.
E-LIS, το ηλεκτρονικό αρχείο για τη Βιβλιοθηκονομία και την Επιστήμη της Πληρ...Giannis Tsakonas
Presentation in the 16th Panhellenic Conference of Academic Libraries.
Φραντζή, Μ., Ανδρέου, Α.Κ., Τσάκωνας, Γ., et al. E-LIS, το ηλεκτρονικό αρχείο για τη Βιβλιοθηκονομία και την Επιστήμη της Πληροφόρησης: τρόποι αξιοποίησης του από την Ελληνική βιβλιοθηκονομική κοινότητα, 2007. In 16ο Πανελλήνιο Συνέδριο Ακαδημαϊκών Βιβλιοθηκών,Πειραιάς (GR),1-3 Οκτωβρίου 2007
Alternative location at:
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
Best 20 SEO Techniques To Improve Website Visibility In SERPPixlogix Infotech
Boost your website's visibility with proven SEO techniques! Our latest blog dives into essential strategies to enhance your online presence, increase traffic, and rank higher on search engines. From keyword optimization to quality content creation, learn how to make your site stand out in the crowded digital landscape. Discover actionable tips and expert insights to elevate your SEO game.
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfPaige Cruz
Monitoring and observability aren’t traditionally found in software curriculums and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is a part of your current company’s observability stack.
While the dev and ops silo continues to crumble….many organizations still relegate monitoring & observability as the purview of ops, infra and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party will share these foundational concepts to build on:
Building Production Ready Search Pipelines with Spark and MilvusZilliz
Spark is the widely used ETL tool for processing, indexing and ingesting data to serving stack for search. Milvus is the production-ready open-source vector database. In this talk we will show how to use Spark to process unstructured data to extract vector representations, and push the vectors to Milvus vector database for search serving.
“An Outlook of the Ongoing and Future Relationship between Blockchain Technologies and Process-aware Information Systems.” Invited talk at the joint workshop on Blockchain for Information Systems (BC4IS) and Blockchain for Trusted Data Sharing (B4TDS), co-located with with the 36th International Conference on Advanced Information Systems Engineering (CAiSE), 3 June 2024, Limassol, Cyprus.
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
UiPath Test Automation using UiPath Test Suite series, part 6DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 6. In this session, we will cover Test Automation with generative AI and Open AI.
UiPath Test Automation with generative AI and Open AI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI, a test automation solution, with Open AI advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers, and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and Open AI.
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc
How does your privacy program stack up against your peers? What challenges are privacy teams tackling and prioritizing in 2024?
In the fifth annual Global Privacy Benchmarks Survey, we asked over 1,800 global privacy professionals and business executives to share their perspectives on the current state of privacy inside and outside of their organizations. This year’s report focused on emerging areas of importance for privacy and compliance professionals, including considerations and implications of Artificial Intelligence (AI) technologies, building brand trust, and different approaches for achieving higher privacy competence scores.
See how organizational priorities and strategic approaches to data security and privacy are evolving around the globe.
This webinar will review:
- The top 10 privacy insights from the fifth annual Global Privacy Benchmarks Survey
- The top challenges for privacy leaders, practitioners, and organizations in 2024
- Key themes to consider in developing and maintaining your privacy program
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices desire to take full advantage of the features
available on those devices, but many of the features provide convenience and capability but sacrifice security. This best practices guide outlines steps the users can take to better protect personal devices and information.
Pushing the limits of ePRTC: 100ns holdover for 100 daysAdtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!SOFTTECHHUB
As the digital landscape continually evolves, operating systems play a critical role in shaping user experiences and productivity. The launch of Nitrux Linux 3.5.0 marks a significant milestone, offering a robust alternative to traditional systems such as Windows 11. This article delves into the essence of Nitrux Linux 3.5.0, exploring its unique features, advantages, and how it stands as a compelling choice for both casual users and tech enthusiasts.
GraphRAG for Life Science to increase LLM accuracyTomaz Bratanic
GraphRAG for life science domain, where you retriever information from biomedical knowledge graphs using LLMs to increase the accuracy and performance of generated answers
How to Get CNIC Information System with Paksim Ga.pptxdanishmna97
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
Maruthi Prithivirajan, Head of ASEAN & IN Solution Architecture, Neo4j
Get an inside look at the latest Neo4j innovations that enable relationship-driven intelligence at scale. Learn more about the newest cloud integrations and product enhancements that make Neo4j an essential choice for developers building apps with interconnected data and generative AI.
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceIndexBug
Imagine a world where machines not only perform tasks but also learn, adapt, and make decisions. This is the promise of Artificial Intelligence (AI), a technology that's not just enhancing our lives but revolutionizing entire industries.
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
Path-based MXML Storage and Querying
1. 2nd Workshop on Digital Information Management, April 25-26, 2012, Corfu, Greece

Path-based MXML Storage and Querying

Nikolaos Fousteris (1), Manolis Gergatsoulis (1), Yannis Stavrakas (2)

(1) Department of Archives and Library Science, Ionian University, Ioannou Theotoki 72, 49100 Corfu, Greece. {nfouster,manolis}@ionio.gr
(2) Institute for the Management of Information Systems (IMIS), R.C. Athena, G. Mpakou 17, 11524 Athens, Greece. yannis@imis.gr
2. Introduction & Motivation

- The problem of storing and querying XML data using relational databases has been studied extensively.
- Multidimensional XML (MXML) is an extension of XML used for representing data that assume different facets, having different values or structure, under different contexts.
- We extend the problem of storing and querying XML to multidimensional XML data.
3. Outline

- XML Storage
- Multidimensional XML (MXML): fundamental concepts; MXML example and graphical representation
- MXML Storage: a path-based approach is presented
- Context Representation
- Multidimensional XPath (MXPath)
- MXPath to SQL conversion algorithm
- Summary & Future work
4. XML Storage (1/2)

- Includes techniques to store XML data in relational databases.
- XML applications (internet applications) are able to exploit the advantages of RDBMS technology.
- Operations over XML data are transformed into operations over the relational schema.
5. XML Storage (2/2)

Methodology:
- A relational schema is chosen for storing the XML data.
- XML queries are produced by users/applications.
- XML queries are translated to SQL queries.
- The SQL queries are executed.
- Results are translated back to XML and returned to the user/application.

Techniques:
- Schema-based
- Schema-oblivious
6. Multidimensional XML (MXML) – Fundamental Concepts (1/3)

- MXML is an extension of XML.
- In MXML, data assume different facets, having different value or structure, under different contexts, according to a number of dimensions which may be applied to elements and attributes.
7. MXML – Fundamental Concepts (2/3)

- Dimension: a variable. By assigning different values to each dimension it is possible to construct different environments for MXML data.
- World: represents an environment under which data obtain a meaning; it is determined by assigning to every dimension a single value.
- Context specifier: an expression which specifies a set of worlds (a context) under which a facet of an MXML element or attribute is the holding facet of this element or attribute.
8. MXML – Example

Multidimensional elements/attributes are elements/attributes that have different facets under different contexts. Each multidimensional element/attribute contains one or more facets, called context elements/attributes.

    <book isbn=[edition=english]"0-13-110362-8"[/]
               [edition=greek]"0-13-110370-9"[/]>
      <title>The C programming language</title>
      <authors>
        <author>Brian W. Kernighan</author>
        <author>Dennis M. Ritchie</author>
      </authors>
      <@publisher>
        [edition = english] <publisher>Prentice Hall</publisher>[/]
        [edition = greek] <publisher>Klidarithmos</publisher>[/]
      </@publisher>
      <@translator>
        [edition = greek] <translator>Thomas Moraitis</translator>[/]
      </@translator>
      <@price>
      ...
10. MXML – Fundamental Concepts (3/3)

- Explicit context: the true context (defined by a context specifier), holding only within the boundaries of a single multidimensional element/attribute.
- Inherited context: the context which is inherited from the ancestor nodes to a descendant node in the MXML graph.
- Inherited context coverage: constrains the inherited context of a node so as to contain only the worlds under which the node has access to some value node.
11. MXML Storage (1/5)

- MXML storage includes techniques that store MXML data in relational databases.
- Applications using MXML storage are able to exploit the advantages of RDBMS technology.
- The additional features of MXML (context, different types of MXML nodes/edges, etc.) should be considered.
12. MXML Storage (2/5) – Path-based Approach

- MXML nodes are divided into groups according to their types. Each group is stored in a separate table named after the type of the nodes (element, attribute and value nodes).
- There is a Path Table, which stores all possible paths of the MXML tree.
- For node indexing, a dotted format of the Dewey labeling scheme is used.
13. MXML Storage (3/5) – Path-based Approach

Dewey labeling scheme:
- Used for indexing the nodes of the MXML tree.
- A label is a dotted string a1.a2.a3...an. For each ai (i = 1...n), i represents the depth and ai the position of the node among its siblings.
- Example: node "1.1.2" is the 2nd child of node "1.1" and is placed at the 3rd level of the MXML tree.
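The arithmetic on Dewey labels reduces to simple string manipulation; a minimal sketch in Python (function names are illustrative, not from the paper):

```python
def depth(label: str) -> int:
    """Depth of a node = number of components in its Dewey label."""
    return len(label.split("."))

def parent(label: str):
    """Parent label is the label minus its last component."""
    parts = label.split(".")
    return ".".join(parts[:-1]) if len(parts) > 1 else None

def is_ancestor(a: str, b: str) -> bool:
    """a is an ancestor of b iff b's label starts with a's label
    followed by a dot (a component-wise prefix test)."""
    return b.startswith(a + ".")

# Node "1.1.2" is the 2nd child of "1.1", at the 3rd level:
print(depth("1.1.2"))               # 3
print(parent("1.1.2"))              # 1.1
print(is_ancestor("1.1", "1.1.2"))  # True
print(is_ancestor("1.2", "1.1.2"))  # False
```

Because ancestry is a prefix test on the label, structural relationships can be checked without traversing the stored tree.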
15. MXML Storage (5/5) Path-based Approach
Path representation
Ex. XPath-to-SQL translation:
XPath: /book//picture (the “//” is a wildcard)
SQL1: select .. from .. where path LIKE ‘/book%/picture’
case 1: ‘/book/cover/picture’ → match (correct)
case 2: ‘/booklet/cover/picture’ → match (error)
SQL2: select .. from .. where path LIKE ‘#/book#%/picture’
With ‘#’ prefixed to every path step, false matches like case 2 cannot occur.
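A minimal sketch of why the ‘#’ step delimiter matters, emulating SQL LIKE in Python (the `sql_like` helper is illustrative, not part of the approach):

```python
import re

def sql_like(value: str, pattern: str) -> bool:
    """Minimal emulation of SQL LIKE: '%' matches any character sequence."""
    regex = "^" + re.escape(pattern).replace("%", ".*") + "$"
    return re.match(regex, value) is not None

# Naive pattern (SQL1): '/book' also matches inside '/booklet'.
assert sql_like("/book/cover/picture", "/book%/picture")      # correct match
assert sql_like("/booklet/cover/picture", "/book%/picture")   # false positive!

# With '#' prefixed to every path step (SQL2), each step is delimited,
# so a partial step name like 'booklet' can no longer match 'book'.
assert sql_like("#/book#/cover#/picture", "#/book#%/picture")
assert not sql_like("#/booklet#/cover#/picture", "#/book#%/picture")
```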
16. Context Representation (1/5)
Question
How can we represent in a Relational Database
the set of worlds which are contained in a context
specifier, for each MXML node?
17. Context Representation (2/5)
Ordered-Based Representation of Context
Basic idea: Total ordering of worlds based on:
Total ordering of dimensions
Total ordering of dimension values
For k dimensions, where dimension i has zi possible values, we may have n = z1*z2*…*zk possible ordered worlds.
Each world is assigned a unique integer between 1 and n (w1 to wn).
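The total ordering above can be sketched with a Cartesian product (the dimension names and values below are illustrative assumptions, not from the slides):

```python
from itertools import product

# Two dimensions with an agreed total ordering of values:
dimensions = {
    "edition": ["greek", "english"],     # z1 = 2
    "customer": ["student", "library"],  # z2 = 2
}

# Total ordering of worlds = ordered Cartesian product of the value lists,
# giving n = z1 * z2 = 4 worlds, numbered w1..wn.
worlds = list(product(*dimensions.values()))
world_number = {w: i + 1 for i, w in enumerate(worlds)}

assert len(worlds) == 4
assert world_number[("greek", "student")] == 1    # w1
assert world_number[("english", "library")] == 4  # w4
```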
19. Context Representation (4/5)
Ordered-Based Representation of Context
World Vector:
A binary number representing a context specifier. The position of each bit corresponds to the position of a world in the total ordering of all possible worlds: bit i is 1 if the world wi is contained in the context specifier, and 0 if it is not.
(Figure: a vector of n binary digits, one for each of the ordered worlds w1 … wn.)
Ex.: world_vector of the explicit context of node 1.1.6.1 = 0011
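A sketch of the world-vector encoding, reusing the illustrative 4-world ordering from the earlier example (dimension values are assumptions, not from the slides):

```python
from itertools import product

# Total ordering of the n = 4 worlds, w1..w4:
worlds = list(product(["greek", "english"], ["student", "library"]))

def world_vector(context_worlds) -> str:
    """Bit i is '1' iff world wi belongs to the context specifier."""
    return "".join("1" if w in context_worlds else "0" for w in worlds)

# A context specifier containing only the two 'english' worlds (w3, w4):
assert world_vector({("english", "student"), ("english", "library")}) == "0011"

# Context operations then reduce to bitwise operations on vectors,
# e.g. the intersection of two contexts:
def intersect(v1: str, v2: str) -> str:
    return "".join("1" if a == b == "1" else "0" for a, b in zip(v1, v2))

assert intersect("0011", "0101") == "0001"
```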
20. Context Representation (5/5)
Ordered-Based Representation of Context
Explicit Context Table: assigns an explicit context (expressed in binary format according to the world-vector representation) to each MXML node.
Inherited Context Coverage Table: assigns an inherited context coverage (expressed in the same binary format) to each MXML node.
21. Multidimensional XPath (MXPath) (1/2)
MXPath:
An extension of XPath able to easily express
context-aware queries on MXML data.
Both explicit context (ec) and inherited context
coverage (icc) are used to navigate over
multidimensional elements and attributes.
Conditions on the explicit context at any point
of the path are allowed.
Both multidimensional and context nodes can
be returned.
22. Multidimensional XPath (MXPath) (2/2)
MXPath example:
[icc() >= “-”],/child::book/child::cover[ec() >= “ed=gr”]/child->picture
Query in English: find the (multidimensional) sub-element picture of element cover of the greek edition of the book.
cover[ec() >= “ed=gr”] is an explicit context qualifier. The function ec() returns the explicit context of a node; the qualifier above states that the ec of node cover must be a superset of the context described by the context specifier [ed=gr].
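The ec() >= qualifier is a superset test, which on the world-vector representation becomes a simple bitwise check (a sketch; the helper name and the vectors below are illustrative assumptions):

```python
# ec() >= "ed=gr" holds iff every world of the specifier [ed=gr]
# is also present in the node's explicit-context vector.

def is_superset(ec_vector: str, specifier_vector: str) -> bool:
    """True iff every '1' bit of the specifier is also set in ec."""
    return all(s != "1" or e == "1" for e, s in zip(ec_vector, specifier_vector))

# With 4 worlds, suppose [ed=gr] denotes worlds w1 and w2 (vector "1100"):
assert is_superset("1100", "1100")      # exactly the greek worlds: qualifies
assert is_superset("1110", "1100")      # strict superset: qualifies
assert not is_superset("0100", "1100")  # misses w1: does not qualify
```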
23. MXPath to SQL conversion algorithm (1/4)
Methodology:
Given an MXPath query Q, we construct the Multidimensional Tree Pattern G of Q.
We divide Q into sub-paths according to segmentation rules, which define the appropriate segmentation points in G.
The sub-paths of Q and the predicates are used as input to the MXPath-to-SQL conversion algorithm.
24. MXPath to SQL conversion algorithm (2/4)
MXPath example:
/book[authors[author=“Brian W.K.”]]/cover[ec()=“ed=gr”]/->material
Query in English: find the (multidimensional) sub-element material of element cover of the greek edition of the book that has an author named “Brian W.K.”.
Notice that the predicate [authors[author=“Brian W.K.”]] is a branch.
26. MXPath to SQL conversion algorithm (4/4)
MXPath: /book[authors[author=“Brian W.K.”]]/cover[ec()=“ed=gr”]/->material
SQL:
Select a3.node_id
From Element_Table a1, Element_Table a2,
     Element_Table a3, Path_Table p1,
     Path_Table p2, Path_Table p3,
     EC_Table ec1
Where p1.path Like '#/book' and
      a1.path_id = p1.path_id and
      a1.node_id = ANY (
        Select SUBSTRING_INDEX(a_id, '.', 2)
        From (Select a1.node_id AS a_id
              From Element_Table a1, Path_Table p1, Value_Table v1
              Where p1.path Like '#/book#/authors#/author' and
                    a1.path_id = p1.path_id and
                    v1.path_id = p1.path_id and
                    v1.value = 'Brian W.K.' and
                    v1.node_id Like CONCAT(a1.node_id, '%')) AS T
      ) and
      a2.node_id Like CONCAT(a1.node_id, '%') and
      p2.path Like '#/book#/cover' and
      a2.path_id = p2.path_id and
      a2.node_id = ec1.node_id and
      ec1.world_vector LIKE '000111' and
      a3.node_id Like CONCAT(a2.node_id, '%') and
      p3.path Like '#/book#/cover#/->material' and
      a3.path_id = p3.path_id
27. Summary
MXML
Storing MXML in Relational DB & Context Representation
(path-based approach)
MXML querying using MXPath
Converting MXPath queries to SQL queries
Future work
Conversion Algorithm implementation and evaluation
Further optimization of the relational schema