This document describes an indexing technique called the Index Fabric that can be used to efficiently query semistructured data like XML. The Index Fabric encodes paths in the XML as strings and inserts them into a layered index structure based on Patricia tries. This allows both ad hoc queries over raw paths in the data as well as optimized queries using refined paths. A performance study showed the Index Fabric outperformed using a commercial relational database for querying semistructured data.
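The path-as-string encoding idea can be sketched with a minimal trie in Python. Note this uses a plain character trie for brevity, not a compressed Patricia trie, and the key format (`tag/tag=value`) and sample documents are invented for illustration; the actual Index Fabric compresses skipped characters Patricia-style and layers coarser tries on top.

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # next character -> child node
        self.values = []    # document ids stored at this (terminal) node

class PathIndex:
    """Index XML paths encoded as strings, e.g. 'invoice/buyer/name=Acme'."""
    def __init__(self):
        self.root = TrieNode()

    def insert(self, encoded_path, doc_id):
        node = self.root
        for ch in encoded_path:
            node = node.children.setdefault(ch, TrieNode())
        node.values.append(doc_id)

    def prefix_search(self, prefix):
        # Walk down to the prefix node, then collect every id in its subtree.
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return []
            node = node.children[ch]
        out, stack = [], [node]
        while stack:
            n = stack.pop()
            out.extend(n.values)
            stack.extend(n.children.values())
        return out

idx = PathIndex()
idx.insert("invoice/buyer/name=ABC Corp", 1)
idx.insert("invoice/buyer/name=Acme", 2)
idx.insert("invoice/seller/name=Acme", 3)
print(sorted(idx.prefix_search("invoice/buyer/name=")))  # [1, 2]
```

A refined path would simply be a second, shorter key format inserted into the same structure for frequently asked queries.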
International Journal of Computational Engineering Research (IJCER) (ijceronline)
International Journal of Computational Engineering Research (IJCER) is an international, English-language monthly online journal. It publishes original research that contributes significantly to scientific knowledge in engineering and technology.
Query Optimization Techniques in Graph Databases (ijdms)
Graph databases (GDBs) have recently arisen to overcome the limits of traditional databases for storing and managing data with graph-like structure. Today they are a requirement for many applications that manage graph-like data, such as social networks. Most of the techniques applied to optimize queries in graph databases have been used in traditional databases or distributed systems, or are inspired by graph theory. However, their reuse in graph databases should account for the main characteristics of graph databases, such as dynamic structure, highly interconnected data, and the ability to efficiently access data relationships. In this paper, we survey query optimization techniques in graph databases. In particular, we focus on the features they have in
Database Concepts and Architecture, Ch. 2, with In-Class Activities (Zainab Almugbel)
These are the slides for Chapter 2 of Ramez Elmasri and Shamkant Navathe, "Fundamentals of Database Systems", 6th Edition, 2010.
I did not include the activities in the slides; I printed them out on separate papers. I then asked the students who would like to participate in Activity 1 (the interview) in class, and selected two students for it (one as the interviewer, the other as the guest). I did the same for the other activities.
This document provides a summary and introduction to the features of XQuery implemented in SQL Server 2005. It discusses the FLWOR statement, operators, if-then-else constructs, XML constructors, built-in functions, type casting, and examples of using each feature. It also describes non-supported features and scenarios where XQuery is useful, such as querying XML documents, application integration, and analyzing logs.
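The FLWOR pattern (for, let, where, order by, return) that the document walks through can be mimicked over an XML fragment with Python's standard library, purely to show the query shape; the sample XML and values below are invented for the example and are not from the document.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("""
<books>
  <book><title>XQuery Basics</title><price>30</price></book>
  <book><title>SQL Deep Dive</title><price>45</price></book>
  <book><title>XML at Scale</title><price>25</price></book>
</books>""")

# Equivalent FLWOR, roughly:
#   for $b in //book  let $p := $b/price
#   where $p < 40  order by $p  return $b/title
results = sorted(
    (float(b.findtext("price")), b.findtext("title"))
    for b in doc.iter("book")
    if float(b.findtext("price")) < 40
)
print([title for _, title in results])  # ['XML at Scale', 'XQuery Basics']
```

In SQL Server 2005 the XQuery itself would run inside the `xml` type's `query()` method rather than in client code.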
Welcome to International Journal of Engineering Research and Development (IJERD) (IJERD Editor)
The document summarizes research on vertical fragmentation, allocation, and re-fragmentation in distributed object relational database systems. It proposes an algorithm for vertical fragmentation and allocation that considers the usage of attributes and methods by queries at different sites. The algorithm forms usage matrices, calculates affinity between methods, clusters methods, and partitions the data into fragments that are allocated to sites where they see the most demand. It also describes handling update queries by redirecting them to a server for processing and then propagating the updates to relevant fragments.
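The usage-matrix step described above can be sketched as follows. The affinity definition used here (co-access frequency summed over queries) is one common choice from the vertical-fragmentation literature, not necessarily the paper's exact formula, and the matrix contents are invented.

```python
# Rows: queries; columns: attributes/methods. usage[q][a] = 1 if query q uses a.
usage = [
    [1, 1, 0, 0],   # q1 uses a0, a1
    [1, 1, 0, 0],   # q2 uses a0, a1
    [0, 0, 1, 1],   # q3 uses a2, a3
]
freq = [10, 5, 20]  # access frequency of each query at its site

n = len(usage[0])
affinity = [[0] * n for _ in range(n)]
for q, row in enumerate(usage):
    for i in range(n):
        for j in range(n):
            if row[i] and row[j]:
                # i and j are accessed together by query q
                affinity[i][j] += freq[q]

print(affinity[0][1])  # 15: a0 and a1 co-accessed by q1 and q2
print(affinity[0][2])  # 0: never co-accessed, so they belong in separate fragments
```

Clustering columns with high mutual affinity then yields the candidate fragments to allocate to sites.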
New proximity estimate for incremental update of non uniformly distributed cl... (IJDKP)
Conventional clustering algorithms mine static databases and generate a set of patterns in the form of clusters. Many real-life databases keep growing incrementally, and for such dynamic databases the patterns extracted from the original database become obsolete. Conventional clustering algorithms are thus unsuitable for incremental databases, since they cannot modify the clustering results in accordance with recent updates. In this paper, the author proposes a new incremental clustering algorithm called CFICA (Cluster Feature-based Incremental Clustering Approach for numerical data) to handle numerical data, and suggests a new proximity metric called the Inverse Proximity Estimate (IPE), which considers the proximity of a data point to a cluster representative as well as its proximity to the farthest point in its vicinity. CFICA uses the proposed proximity metric to determine the membership of a data point in a cluster.
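The two ingredients of the IPE metric can be sketched as below. The exact combination used by the paper is not reproduced here; the multiplicative form, the sample points, and the function names are all assumptions made only to illustrate "distance to the representative, penalized by the farthest point in the vicinity".

```python
import math

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def inverse_proximity_estimate(x, representative, vicinity):
    """Hedged sketch: combine distance to the cluster representative with
    distance to the farthest point in x's vicinity. The paper defines the
    actual IPE formula; this only shows its two inputs."""
    d_rep = dist(x, representative)
    d_far = max(dist(x, p) for p in vicinity)
    return d_rep * (1 + d_far)  # illustrative combination, not the paper's

centroid = (0.0, 0.0)
vicinity = [(1.0, 0.0), (0.0, 2.0)]
x = (1.0, 1.0)
score = inverse_proximity_estimate(x, centroid, vicinity)
# A spread-out vicinity inflates the estimate beyond the raw distance.
assert score > dist(x, centroid)
```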
Crosswalks show how to map data elements from one metadata schema to another. They are used to integrate datasets that use different standards. Creating a crosswalk involves harmonizing the schemas, semantically mapping elements, and establishing rules to handle complex mappings. Once the crosswalk is defined, metadata descriptions can be transformed from the source schema to the target schema. However, information may be lost due to differences between the schemas.
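The crosswalk steps above (semantic element mapping plus rules for complex mappings, with possible information loss) can be sketched in a few lines. The schema field names here are invented for illustration and are not a real standard mapping table.

```python
# Hypothetical crosswalk from a source schema to a target schema.
# A plain string is a 1:1 rename; a (name, rule) pair handles a complex mapping.
CROSSWALK = {
    "creator": "author",
    "title": "main_title",
    "date": ("publication_year", lambda v: v[:4]),  # keep only the year
}

def transform(record):
    out = {}
    for src_field, target in CROSSWALK.items():
        if src_field not in record:
            continue
        if isinstance(target, tuple):
            name, rule = target
            out[name] = rule(record[src_field])
        else:
            out[target] = record[src_field]
    return out

src = {"creator": "Doe, J.", "title": "On Crosswalks", "date": "2003-05-01",
       "format": "text/html"}  # 'format' has no target: this information is lost
print(transform(src))
# {'author': 'Doe, J.', 'main_title': 'On Crosswalks', 'publication_year': '2003'}
```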
With the rapid development of Geographic Information Systems (GISs) and their applications, more and more geographical databases have been developed by different vendors. However, data integration and access remain a big problem for the development of GIS applications, as no interoperability exists among different spatial databases. In this paper we propose a unified approach for spatial data query. The paper describes a framework for integrating information from repositories containing different vector data set formats and repositories containing raster datasets. The presented approach converts different vector data formats into a single unified format (the File Geodatabase, "GDB"). In addition, we employ metadata to support a wide range of user queries that retrieve relevant geographic information from heterogeneous and distributed repositories, which enhances both query processing and performance.
The document describes the network database model and CODASYL DBTG model. Some key points:
- The network model uses a many-to-many relationship with owner and member records linked together.
- The DBTG model simplified this to one-to-one and one-to-many relationships. It uses segments, sets, and links to represent records, relationships, and connections between records.
- The DBTG model provides commands to retrieve, update, insert, and delete records as well as connect and disconnect them from sets. Programs access the database using templates, pointers, and status flags stored in a work area.
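The owner-member "set" construct and the CONNECT/DISCONNECT commands described above can be sketched with explicit pointers; the record fields and class names are invented for the example.

```python
class Record:
    def __init__(self, data):
        self.data = data
        self.next = None  # pointer to the next member in the same set

class DBTGSet:
    """One owner record linked to many member records (a one-to-many set)."""
    def __init__(self, owner):
        self.owner = owner
        self.first = None

    def connect(self, member):          # like CONNECT: link into the chain
        member.next = self.first
        self.first = member

    def disconnect(self, member):       # like DISCONNECT: unlink from the chain
        prev, cur = None, self.first
        while cur is not None:
            if cur is member:
                if prev is None:
                    self.first = cur.next
                else:
                    prev.next = cur.next
                cur.next = None
                return
            prev, cur = cur, cur.next

    def members(self):                  # navigate the set, owner to members
        cur, out = self.first, []
        while cur is not None:
            out.append(cur.data)
            cur = cur.next
        return out

dept = DBTGSet(Record({"dept": "Sales"}))
e1, e2 = Record({"emp": "Ann"}), Record({"emp": "Bob"})
dept.connect(e1)
dept.connect(e2)
dept.disconnect(e1)
print(dept.members())  # [{'emp': 'Bob'}]
```

Navigation is purely pointer-chasing, which is exactly why DBTG programs need the templates and status flags mentioned above.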
ENHANCING KEYWORD SEARCH OVER RELATIONAL DATABASES USING ONTOLOGIES (csandit)
This document summarizes a research paper that proposes a system to enhance keyword search over relational databases using ontologies. The system builds structures during pre-processing like a reachability index to store connectivity information and an ontology concept graph. During querying, it maps keywords to concepts, uses the ontology to find related concepts and tuples, and generates top-k answer trees combining syntactic and semantic matches while limiting redundant results. The system is expected to perform better than existing approaches by reducing storage requirements through its approach to materializing neighborhood information in the reachability index.
Clustering the results of a search helps the user get an overview of the information returned. In this paper, we treat the clustering task as cataloguing the search results. By catalogue we mean a structured label list that helps the user make sense of the labels and search results. Cluster labelling is crucial because meaningless or confusing labels may mislead users into checking the wrong clusters for their query and wasting time. Additionally, labels should accurately reflect the contents of the documents within the cluster. To label clusters effectively, a new cluster labelling method is introduced, with particular emphasis on producing comprehensible and accurate cluster labels in addition to discovering the document clusters. We also present a new metric to assess the success of cluster labelling. We adopt a comparative evaluation strategy to derive the relative performance of the proposed method with respect to two prominent search result clustering methods, Suffix Tree Clustering and Lingo. We perform the experiments using the publicly available datasets Ambient and ODP-239.
PREFIX-BASED LABELING ANNOTATION FOR EFFECTIVE XML FRAGMENTATION (ijcsit)
XML has gradually been employed as a standard for data exchange in the web environment since its inception in the 90s, serving as a data-exchange format between systems and other applications. Meanwhile, the volume of data on the web has grown substantially, so effective methods of storing and retrieving these data are essential. One recommended approach is to physically or virtually fragment the large chunk of data and distribute the fragments across different nodes. The fragmentation design of an XML document consists of two parts: the fragmentation operation and the fragmentation method. The three fragmentation operations are horizontal, vertical and hybrid; the operation determines how the XML should be fragmented. This paper aims to give an overview of fragmentation design considerations and subsequently proposes a fragmentation technique using number addressing.
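The horizontal operation mentioned above can be illustrated by splitting an XML document's repeating children into two well-formed fragments; the document and the selection predicate below are invented for the example.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<orders>"
    "<order region='EU' id='1'/><order region='US' id='2'/>"
    "<order region='EU' id='3'/>"
    "</orders>")

def horizontal_fragment(root, predicate):
    """Horizontal fragmentation: partition the repeating children by a
    predicate, keeping each fragment a well-formed document."""
    match = ET.Element(root.tag)
    rest = ET.Element(root.tag)
    for child in list(root):
        (match if predicate(child) else rest).append(child)
    return match, rest

eu, other = horizontal_fragment(doc, lambda o: o.get("region") == "EU")
print([o.get("id") for o in eu])     # ['1', '3']
print([o.get("id") for o in other])  # ['2']
```

Vertical fragmentation would instead split by subtree (e.g. shipping details vs. line items), and hybrid combines both.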
MONGODB VS MYSQL: A COMPARATIVE STUDY OF PERFORMANCE IN SUPER MARKET MANAGEME... (ijcsity)
A database is a collection of information organized in tables so that it can easily be accessed, managed, and updated; it comprises tables, schemas, queries, reports, views and other objects. The data are typically organized to model processes that require information, such as finding a hotel with available rooms so that people can easily locate hotels with vacancies. Databases commonly fall into two classes: relational and non-relational. Relational databases usually work with structured data, while non-relational databases work with semi-structured data. In this paper, a performance evaluation of MySQL and MongoDB is carried out, where MySQL is an example of a relational database and MongoDB of a non-relational database. A relational database is a data structure that allows you to connect information from different 'tables', or different types of data buckets. A non-relational database stores data without explicit and structured mechanisms to link data from different buckets to one another. This paper discusses the performance of MongoDB and MySQL in the context of a supermarket management system. A supermarket is a large form of the traditional grocery store: a self-service shop offering a wide variety of food and household products, organized in a systematic manner, that is larger and has a wider selection than a traditional grocery store.
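The syntactic contrast between the two models can be sketched without either server: here sqlite3 stands in for MySQL, and a plain dict filter mimics the shape of a MongoDB `find()` query document. The product data is invented, and running this against real MySQL/MongoDB instances would need their respective drivers.

```python
import sqlite3

# Relational side: structured rows in a fixed-schema table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE products (name TEXT, category TEXT, price REAL)")
con.executemany("INSERT INTO products VALUES (?, ?, ?)",
                [("milk", "dairy", 1.5), ("soap", "household", 2.0),
                 ("cheese", "dairy", 4.0)])
sql_hits = [r[0] for r in con.execute(
    "SELECT name FROM products WHERE category = ? AND price < ?",
    ("dairy", 3.0))]

# Document side: schemaless dicts, queried MongoDB-style with a filter document.
products = [{"name": "milk", "category": "dairy", "price": 1.5},
            {"name": "soap", "category": "household", "price": 2.0},
            {"name": "cheese", "category": "dairy", "price": 4.0}]
query = {"category": "dairy"}  # shaped like collection.find({"category": "dairy"})
doc_hits = [p["name"] for p in products
            if all(p.get(k) == v for k, v in query.items()) and p["price"] < 3.0]

print(sql_hits, doc_hits)  # ['milk'] ['milk']
```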
Master of Computer Application (MCA) – Semester 4 MC0077 (Aravind NC)
The document provides information on various database normalization forms including 1NF, 2NF, 3NF, BCNF, and 4NF. It explains the differences between 3NF and BCNF, noting that BCNF is a stronger normal form that can capture some anomalies not captured by 3NF. It also discusses the differences between distributed and centralized database systems, highlighting advantages of data distribution such as data sharing, reliability/availability through replication, and faster query processing through parallelization.
Graph Based Workload Driven Partitioning System by Using MongoDB (IJAAS Team)
Enterprise web applications and websites are accessed by huge numbers of users who expect reliability and high availability, and social networking sites generate exponentially large amounts of data, making efficient storage a challenging task. SQL and NoSQL stores are the most common choices; since an RDBMS cannot handle unstructured data and huge data volumes well, NoSQL is the better choice for web applications. A graph database is one efficient way to store data in NoSQL: it stores data in the form of relationships, with each tuple represented by a node and each relationship by an edge. However, absorbing exponential data growth on a single server can decrease performance and increase response time, so data partitioning is a good way to maintain moderate performance even as the workload grows. Common partitioning techniques such as range, hash and round-robin are not efficient for small transactions that access only a few tuples. NoSQL data stores provide scalability and availability through various partitioning methods, and for graph data, graph partitioning is an efficient way to achieve scalability. To balance the load, data are partitioned horizontally and allocated across the geographically available data stores. If the partitions are not formed properly, the result is expensive distributed transactions in terms of response time, so tuples should be partitioned based on their relationships. The proposed system uses Schism, a workload-aware graph partitioning technique, so that after partitioning the related tuples end up in a single partition, with each node of the graph mapped to a unique partition.
The overall aim of graph partitioning is to distribute nodes across partitions so that related data land in the same cluster.
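Schism's core idea, placing tuples that a transaction touches together on one partition, can be sketched with a toy co-access graph and a greedy assignment. Schism itself derives the cut with a min-cut graph partitioner (METIS-style); the greedy pass and the workload below are only illustrative.

```python
from collections import defaultdict

# Each transaction lists the tuples it touches together.
workload = [("t1", "t2"), ("t1", "t2"), ("t3", "t4"), ("t2", "t1"), ("t3", "t4")]

# Build co-access edge weights between tuples.
weight = defaultdict(int)
for a, b in workload:
    weight[frozenset((a, b))] += 1

partitions = [set(), set()]

def gain(tup, part):
    """Co-access weight between tup and tuples already placed in part."""
    return sum(w for e, w in weight.items() if tup in e and (e - {tup}) & part)

# Greedy: put each tuple where its co-accessed neighbours already live,
# breaking ties toward the emptier partition (capacity otherwise ignored).
for tup in ["t1", "t2", "t3", "t4"]:
    best = max(range(2),
               key=lambda i: (gain(tup, partitions[i]), -len(partitions[i])))
    partitions[best].add(tup)

print(partitions)  # related tuples {t1,t2} and {t3,t4} land together
```

With this placement, every transaction in the workload touches a single partition, avoiding the expensive distributed transactions the abstract warns about.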
An extended database reverse engineering – a key for database forensic invest... (eSAT Publishing House)
IJRET: International Journal of Research in Engineering and Technology is an international, peer-reviewed online journal published by eSAT Publishing House for the enhancement of research in various disciplines of engineering and technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of engineering and technology. We bring together scientists, academicians, field engineers, scholars and students of related fields.
Efficient Record De-Duplication Identifying Using Febrl Framework (IOSR Journals)
This document describes using the Febrl (Freely Extensible Biomedical Record Linkage) framework to perform efficient record de-duplication. It discusses how Febrl allows for data cleaning, standardization, indexing, field comparison, and weight vector classification. Indexing techniques like blocking indexes, q-grams, and canopy clustering are used to reduce the number of record pair comparisons. Field comparison functions calculate matching weights, and classifiers like Fellegi-Sunter and support vector machines are used to determine matches. The method is evaluated on real-world health data, showing accuracy, precision, recall, and false positive rates for different partitioning methods.
Some background and thoughts on Metadata Mapping and Metadata Crosswalks. A collection of online sources and related projects. Comments are more than welcome, as is reuse!
SOURCE CODE RETRIEVAL USING SEQUENCE BASED SIMILARITY (IJDKP)
This document summarizes an approach to improve source code retrieval using structural information from source code. A lexical parser is developed to extract control statements and method identifiers from Java programs. A similarity measure is proposed that calculates the ratio of fully matching statements to partially matching statements in a sequence. Experiments show the retrieval model using this measure improves retrieval performance over other models by up to 90.9% relative to the number of retrieved methods.
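The full-versus-partial matching idea can be sketched as below. The paper's exact weighting is not reproduced; this only counts statements of one sequence that match the other fully (identical) or partially (same leading keyword), and the statement sequences are invented.

```python
def statement_similarity(seq_a, seq_b):
    """Count statements of seq_a that match seq_b fully (identical text)
    or partially (same leading keyword, e.g. both are 'for' loops)."""
    full = sum(1 for s in seq_a if s in seq_b)
    heads_b = {s.split()[0] for s in seq_b}
    partial = sum(1 for s in seq_a
                  if s not in seq_b and s.split()[0] in heads_b)
    return full, partial

query = ["for i in range", "if x > 0", "return x"]
candidate = ["for j in items", "if x > 0", "return total"]
print(statement_similarity(query, candidate))  # (1, 2)
```

A retrieval model would then rank candidate methods by some ratio of these two counts.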
Transforming data-centric eXtensible markup language into relational database... (journalBEEI)
eXtensible markup language (XML) appeared internationally as the format for data representation over the web. Yet, most organizations are still utilising relational databases as their database solutions. As such, it is crucial to provide seamless integration via effective transformation between these database infrastructures. In this paper, we propose XML-REG to bridge these two technologies based on node-based and path-based approaches. The node-based approach is good to annotate each positional node uniquely, while the path-based approach provides summarised path information to join the nodes. On top of that, a new range labelling is also proposed to annotate nodes uniquely by ensuring the structural relationships are maintained between nodes. If a new node is to be added to the document, re-labelling is not required as the new label will be assigned to the node via the new proposed labelling scheme. Experimental evaluations indicated that the performance of XML-REG exceeded XMap, XRecursive, XAncestor and Mini-XML concerning storing time, query retrieval time and scalability. This research produces a core framework for XML to relational databases (RDB) mapping, which could be adopted in various industries.
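A range-labelling scheme of the kind XML-REG builds on can be sketched: each node gets a (start, end) interval from a depth-first walk, and the ancestor-descendant test becomes interval containment. XML-REG's actual labels additionally leave gaps so that inserting a node needs no relabelling; that refinement is omitted here, and the sample document is invented.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("<a><b><d/></b><c/></a>")

def range_label(root):
    """Assign (start, end) from a depth-first walk: start on entry, end on
    exit, so a descendant's interval nests inside its ancestor's."""
    labels, counter = {}, [0]
    def visit(node):
        start = counter[0]; counter[0] += 1
        for child in node:
            visit(child)
        end = counter[0]; counter[0] += 1
        labels[node] = (start, end)
    visit(root)
    return labels

labels = range_label(doc)

def is_ancestor(anc, desc):
    (s1, e1), (s2, e2) = labels[anc], labels[desc]
    return s1 < s2 and e2 < e1  # strict interval containment

a = doc
b, c = list(a)
d = list(b)[0]
print(is_ancestor(a, d), is_ancestor(b, c))  # True False
```

Such labels, stored as plain columns, are what let a relational database answer XML structural joins without recursion.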
Semi-automatic Discovery of Mappings Between Heterogeneous Data Warehouse Dim... (IDES Editor)
Data Warehousing is the main Business
Intelligence instrument for the analysis of large amounts of
data. It permits the extraction of relevant information for
decision making processes inside organizations. Given the
great diffusion of Data Warehouses, there is an increasing
need to integrate information coming from independent
Data Warehouses or from independently developed data
marts in the same Data Warehouse. In this paper, we
provide a method for the semi-automatic discovery of
common topological properties of dimensions that can be
used to automatically map elements of different dimensions
in heterogeneous Data Warehouses. The method uses
techniques from the Data Integration research area and
combines topological properties of dimensions in a
multidimensional model.
Comparative study of no sql document, column store databases and evaluation o... (ijdms)
In the last decade, rapid growth in mobile applications, web technologies and social media generating unstructured data has led to the advent of various NoSQL data stores. Demands of web scale are increasing every day, and NoSQL databases are evolving to meet stern big-data requirements. The purpose of this paper is to explore NoSQL technologies and present a comparative study of document and column-store NoSQL databases such as Cassandra, MongoDB and HBase across various attributes of relational and distributed database system principles. A detailed theoretical study and analysis of the architecture and internal workings of Cassandra, MongoDB and HBase is carried out and the core concepts are depicted. The paper also presents an evaluation of Cassandra for an industry-specific use case, and the results are published.
International Journal of Engineering and Science Invention (IJESI) is an international journal intended for professionals and researchers in all fields of computer science and electronics. IJESI publishes research articles and reviews across the whole field of engineering, science and technology, covering new teaching methods, assessment, validation and the impact of new technologies, and it will continue to provide information on the latest trends and developments in this ever-expanding subject. Papers are selected through double peer review to ensure originality, relevance, and readability. The articles published in our journal can be accessed online.
The document provides definitions and explanations of key concepts in database management systems. It discusses:
- The purpose of a DBMS is to solve problems with file processing systems like data redundancy, inconsistency, difficult data access and isolation, and integrity and concurrency issues.
- Data abstraction and levels of abstraction hide complexity from users through physical, logical, and view levels.
- A DBMS provides an environment for convenient and efficient data retrieval and storage.
- Data independence allows changes to schema definitions without affecting other levels.
EVALUATING CASSANDRA, MONGO DB LIKE NOSQL DATASETS USING HADOOP STREAMING (ijiert bestjournal)
This document summarizes a research paper that evaluates Cassandra and MongoDB NoSQL databases for processing unstructured data using Hadoop streaming. It proposes a system with three stages: data preparation where data is downloaded from Cassandra servers to file systems; data transformation where JSON data is converted to other formats using MapReduce; and data processing where non-Java executables run on the transformed data. The document reviews related work on Cassandra and Hadoop performance and discusses the data models of key-value, document, column-oriented, and graph databases. It concludes that comparing Cassandra and MongoDB can help process unstructured data and outline new approaches.
A perspective on the rise of NoSQL systems and a comparison between RDBMS and NoSQL technologies.
The basic idea of the presentation originated while trying to understand the different alternatives available for managing data while building a fast, highly scalable, available, and reliable enterprise application.
CBSE XII Database Concepts And MySQL Presentation (Guru Ji)
The document provides an introduction to database concepts and the relational model. It defines what a database is and discusses the purpose of databases, including reducing data redundancy and maintaining data integrity. It also describes different data models like relational, network, and hierarchical models. The relational model is then explained in detail, covering terminology, keys, views, and relational algebra operations like select, project, cartesian product. The document provides examples to illustrate database concepts and the relational model.
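The relational algebra operations named above (select, project, cartesian product) can be illustrated over rows modelled as dicts; the table contents are invented for the example.

```python
from itertools import product

students = [{"id": 1, "name": "Asha"}, {"id": 2, "name": "Ravi"}]
marks = [{"sid": 1, "score": 91}, {"sid": 2, "score": 78}]

def select(rows, pred):              # sigma: keep rows satisfying a predicate
    return [r for r in rows if pred(r)]

def project(rows, attrs):            # pi: keep only the named columns
    return [{a: r[a] for a in attrs} for r in rows]

def cartesian(r1, r2):               # x: every pairing of rows
    return [{**a, **b} for a, b in product(r1, r2)]

# A join is a selection over the cartesian product.
joined = select(cartesian(students, marks), lambda r: r["id"] == r["sid"])
print(project(joined, ["name", "score"]))
# [{'name': 'Asha', 'score': 91}, {'name': 'Ravi', 'score': 78}]
```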
1. The document discusses key concepts related to database systems including the definition of a database, database management systems (DBMS), data models, database classification, data integrity, query optimization, structured query language (SQL), parallel databases, and object-relational mapping (ORM).
2. It provides details on common data models like hierarchical, network, and relational models. It also describes concepts like database architecture, data definition language, data manipulation language, and distributed databases.
3. Control questions are provided at the end to test understanding of database concepts like the difference between a database and data set, components of a database system, and main elements of a database.
This document summarizes and compares different methods for performing keyword searches in relational databases. It discusses candidate network-based methods, Steiner-tree based algorithms, and backward expanding keyword search approaches. It also evaluates methods that aim to improve search efficiency and accuracy, such as integrating multiple related tuple units and developing structure-aware indexes. The overall goal is to find an effective and efficient approach to keyword search over relational database structures.
With the rapid development in Geographic Information Systems (GISs) and their applications, more and
more geo-graphical databases have been developed by different vendors. However, data integration and
accessing is still a big problem for the development of GIS applications as no interoperability exists among
different spatial databases. In this paper we propose a unified approach for spatial data query. The paper
describes a framework for integrating information from repositories containing different vector data sets
formats and repositories containing raster datasets. The presented approach converts different vector data
formats into a single unified format (File Geo-Database “GDB”). In addition, we employ “metadata” to
support a wide range of users’ queries to retrieve relevant geographic information from heterogeneous and
distributed repositories. Such an employment enhances both query processing and performance.
The document describes the network database model and CODASYL DBTG model. Some key points:
- The network model uses a many-to-many relationship with owner and member records linked together.
- The DBTG model simplified this to one-to-one and one-to-many relationships. It uses segments, sets, and links to represent records, relationships, and connections between records.
- The DBTG model provides commands to retrieve, update, insert, and delete records as well as connect and disconnect them from sets. Programs access the database using templates, pointers, and status flags stored in a work area.
ENHANCING KEYWORD SEARCH OVER RELATIONAL DATABASES USING ONTOLOGIEScsandit
This document summarizes a research paper that proposes a system to enhance keyword search over relational databases using ontologies. The system builds structures during pre-processing like a reachability index to store connectivity information and an ontology concept graph. During querying, it maps keywords to concepts, uses the ontology to find related concepts and tuples, and generates top-k answer trees combining syntactic and semantic matches while limiting redundant results. The system is expected to perform better than existing approaches by reducing storage requirements through its approach to materializing neighborhood information in the reachability index.
Clustering the results of a search helps the user to overview the information returned. In this paper, we
look upon the clustering task as cataloguing the search results. By catalogue we mean a structured label
list that can help the user to realize the labels and search results. Labelling Cluster is crucial because
meaningless or confusing labels may mislead users to check wrong clusters for the query and lose extra
time. Additionally, labels should reflect the contents of documents within the cluster accurately. To be able
to label clusters effectively, a new cluster labelling method is introduced. More emphasis was given to
/produce comprehensible and accurate cluster labels in addition to the discovery of document clusters. We
also present a new metric that employs to assess the success of cluster labelling. We adopt a comparative
evaluation strategy to derive the relative performance of the proposed method with respect to the two
prominent search result clustering methods: Suffix Tree Clustering and Lingo.
we perform the experiments using the publicly available Datasets Ambient and ODP-239
P REFIX - BASED L ABELING A NNOTATION FOR E FFECTIVE XML F RAGMENTATIONijcsit
XML is
gradually
emplo
yed as
a standard of data exchange
in
web
environment
since its inception
in the
90s
until
present
.
It
serves
as a data exchange between system
s
and other application
s
.
Meanwhile t
he data
volume has grown substantially
in the web and
thus effective methods
of
storing and retrieving
these
data
is
essential
.
One recommended way is
p
hysically or virtually
fragments
the large chunk of data
and
distributes
the fragments
into different nodes.
F
ragmentation design
of XML document
contains of two
parts: fragmentat
ion operation and fragmentation method. The
three
fragmentation o
peration
s are
Horizontal, Vertical
and Hybrid. It
determines how the XML should be fragmented.
This
paper
aims
to give
an overview on the fragmentation design consideration
and
subsequently,
propose a
fragmentation
technique
using
number addressing
.
MONGODB VS MYSQL: A COMPARATIVE STUDY OF PERFORMANCE IN SUPER MARKET MANAGEME...ijcsity
A database is a collection of information organized in tables so that it can easily be accessed, managed,
and updated. It is a collection of tables, schemas, queries, reports, views and other objects. The data are
typically organized to model processes requiring information, such as finding a hotel with available rooms,
so that people can easily locate hotels with vacancies. There are two common kinds of databases:
relational and non-relational. Relational databases usually work with structured data, while non-relational
databases work with semi-structured data. In this paper, a performance evaluation of MySQL and
MongoDB is performed, where MySQL is an example of a relational database and MongoDB an example
of a non-relational database. A relational database is a data structure that allows you to connect
information from different 'tables', or different types of data buckets. A non-relational database stores data
without explicit, structured mechanisms to link data from different buckets to one another. This paper
discusses the performance of MongoDB and MySQL in the context of a Super Market Management
System. A supermarket is a larger form of the traditional grocery store: a self-service shop offering a wide
variety of food and household products, organized in a systematic manner, with a wider selection than a
traditional grocery store.
Master of Computer Application (MCA) – Semester 4 MC0077Aravind NC
The document provides information on various database normalization forms including 1NF, 2NF, 3NF, BCNF, and 4NF. It explains the differences between 3NF and BCNF, noting that BCNF is a stronger normal form that can capture some anomalies not captured by 3NF. It also discusses the differences between distributed and centralized database systems, highlighting advantages of data distribution such as data sharing, reliability/availability through replication, and faster query processing through parallelization.
Graph Based Workload Driven Partitioning System by Using MongoDBIJAAS Team
The web applications and websites of enterprises are accessed by a huge number of users who expect reliability and high availability. Social networking sites generate exponentially large amounts of data, and storing that data efficiently is a challenging task. SQL and NoSQL are the most commonly used options for storage; since an RDBMS cannot handle unstructured data and huge data volumes, NoSQL is the better choice for web applications. A graph database is one of the efficient ways to store data in NoSQL. A graph database allows us to store data in the form of relations: each tuple is represented by a node and each relationship by an edge. However, handling exponential data growth on a single server can degrade performance and increase response time. Data partitioning is a good way to maintain moderate performance even as the workload increases. There are many data partitioning techniques, such as range, hash and round robin, but they are not efficient for small transactions that access only a few tuples. NoSQL data stores provide scalability and availability by using various partitioning methods. For scalability, graph partitioning is an efficient way to represent and process such data. To balance the load, data are partitioned horizontally and allocated across the geographically available data stores. If the partitions are not formed properly, the result is expensive distributed transactions in terms of response time, so the partitioning of tuples should be based on their relations. In the proposed system, the Schism technique, a workload-aware graph partitioning technique, is used to partition the graph. After partitioning, related tuples should fall into a single partition; each individual node of the graph is mapped to a unique partition.
The overall aim of graph partitioning is to place nodes onto different distributed partitions so that related data end up in the same cluster.
An extended database reverse engineering – a key for database forensic invest...eSAT Publishing House
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
Efficient Record De-Duplication Identifying Using Febrl FrameworkIOSR Journals
This document describes using the Febrl (Freely Extensible Biomedical Record Linkage) framework to perform efficient record de-duplication. It discusses how Febrl allows for data cleaning, standardization, indexing, field comparison, and weight vector classification. Indexing techniques like blocking indexes, q-grams, and canopy clustering are used to reduce the number of record pair comparisons. Field comparison functions calculate matching weights, and classifiers like Fellegi-Sunter and support vector machines are used to determine matches. The method is evaluated on real-world health data, showing accuracy, precision, recall, and false positive rates for different partitioning methods.
Some background and thoughts on Metadata Mapping and Metadata Crosswalks. A collection of online sources and related projects. Comments are more than welcome, as is reuse!
SOURCE CODE RETRIEVAL USING SEQUENCE BASED SIMILARITYIJDKP
This document summarizes an approach to improve source code retrieval using structural information from source code. A lexical parser is developed to extract control statements and method identifiers from Java programs. A similarity measure is proposed that calculates the ratio of fully matching statements to partially matching statements in a sequence. Experiments show the retrieval model using this measure improves retrieval performance over other models by up to 90.9% relative to the number of retrieved methods.
Transforming data-centric eXtensible markup language into relational database...journalBEEI
eXtensible markup language (XML) appeared internationally as the format for data representation over the web. Yet, most organizations are still utilising relational databases as their database solutions. As such, it is crucial to provide seamless integration via effective transformation between these database infrastructures. In this paper, we propose XML-REG to bridge these two technologies based on node-based and path-based approaches. The node-based approach is good to annotate each positional node uniquely, while the path-based approach provides summarised path information to join the nodes. On top of that, a new range labelling is also proposed to annotate nodes uniquely by ensuring the structural relationships are maintained between nodes. If a new node is to be added to the document, re-labelling is not required as the new label will be assigned to the node via the new proposed labelling scheme. Experimental evaluations indicated that the performance of XML-REG exceeded XMap, XRecursive, XAncestor and Mini-XML concerning storing time, query retrieval time and scalability. This research produces a core framework for XML to relational databases (RDB) mapping, which could be adopted in various industries.
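The range labelling idea described above can be illustrated with a generic sketch (this is the standard interval-based labelling technique, not XML-REG's exact scheme; the tree representation and function names here are illustrative): each node gets a (start, end) interval via depth-first traversal, so that node a is an ancestor of node b exactly when a's interval contains b's.

```python
def range_label(tree, root):
    """Assign (start, end, level) labels by DFS over a dict-based tree
    (node -> list of children), so that node a is an ancestor of node b
    iff a.start < b.start and b.end < a.end."""
    labels = {}
    counter = [0]  # mutable counter shared across recursive calls

    def visit(node, level):
        start = counter[0]
        counter[0] += 1
        for child in tree.get(node, []):
            visit(child, level + 1)
        counter[0] += 1
        labels[node] = (start, counter[0], level)

    visit(root, 0)
    return labels


def is_ancestor(labels, a, b):
    """Structural ancestor test using only the labels (no tree traversal)."""
    return labels[a][0] < labels[b][0] and labels[b][1] < labels[a][1]
```

With labels like these, ancestor-descendant joins reduce to interval comparisons, which is why such schemes map well onto relational storage.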
Semi-automatic Discovery of Mappings Between Heterogeneous Data Warehouse Dim...IDES Editor
Data Warehousing is the main Business
Intelligence instrument for the analysis of large amounts of
data. It permits the extraction of relevant information for
decision making processes inside organizations. Given the
great diffusion of Data Warehouses, there is an increasing
need to integrate information coming from independent
Data Warehouses or from independently developed data
marts in the same Data Warehouse. In this paper, we
provide a method for the semi-automatic discovery of
common topological properties of dimensions that can be
used to automatically map elements of different dimensions
in heterogeneous Data Warehouses. The method uses
techniques from the Data Integration research area and
combines topological properties of dimensions in a
multidimensional model.
Comparative study of no sql document, column store databases and evaluation o...ijdms
In the last decade, the rapid growth of mobile applications, web technologies, and social media generating
unstructured data has led to the advent of various NoSQL data stores. Demands at web scale increase
every day, and NoSQL databases are evolving to meet stern big data requirements.
The purpose of this paper is to explore NoSQL technologies and present a comparative study of document
and column store NoSQL databases such as Cassandra, MongoDB and HBase across various attributes of
relational and distributed database system principles. A detailed theoretical study and analysis of the
architecture and internal workings of Cassandra, MongoDB and HBase is done and the core concepts are
depicted. This paper also presents an evaluation of Cassandra for an industry-specific use case, and the
results are published.
International Journal of Engineering and Science Invention (IJESI) is an international journal intended for professionals and researchers in all fields of computer science and electronics. IJESI publishes research articles and reviews within the whole field Engineering Science and Technology, new teaching methods, assessment, validation and the impact of new technologies and it will continue to provide information on the latest trends and developments in this ever-expanding subject. The publications of papers are selected through double peer reviewed to ensure originality, relevance, and readability. The articles published in our journal can be accessed online.
The document provides definitions and explanations of key concepts in database management systems. It discusses:
- The purpose of a DBMS is to solve problems with file processing systems like data redundancy, inconsistency, difficult data access and isolation, and integrity and concurrency issues.
- Data abstraction and levels of abstraction hide complexity from users through physical, logical, and view levels.
- A DBMS provides an environment for convenient and efficient data retrieval and storage.
- Data independence allows changes to schema definitions without affecting other levels.
EVALUATING CASSANDRA, MONGO DB LIKE NOSQL DATASETS USING HADOOP STREAMINGijiert bestjournal
This document summarizes a research paper that evaluates Cassandra and MongoDB NoSQL databases for processing unstructured data using Hadoop streaming. It proposes a system with three stages: data preparation where data is downloaded from Cassandra servers to file systems; data transformation where JSON data is converted to other formats using MapReduce; and data processing where non-Java executables run on the transformed data. The document reviews related work on Cassandra and Hadoop performance and discusses the data models of key-value, document, column-oriented, and graph databases. It concludes that comparing Cassandra and MongoDB can help process unstructured data and outline new approaches.
A perspective on the rise of NoSQL systems and a comparison between RDBMS and NoSQL technologies.
The basic idea of the presentation originated while trying to understand the different alternatives available for managing data when building a fast, highly scalable, available, and reliable enterprise application.
CBSE XII Database Concepts And MySQL PresentationGuru Ji
The document provides an introduction to database concepts and the relational model. It defines what a database is and discusses the purpose of databases, including reducing data redundancy and maintaining data integrity. It also describes different data models like relational, network, and hierarchical models. The relational model is then explained in detail, covering terminology, keys, views, and relational algebra operations like select, project, cartesian product. The document provides examples to illustrate database concepts and the relational model.
1. The document discusses key concepts related to database systems including the definition of a database, database management systems (DBMS), data models, database classification, data integrity, query optimization, structured query language (SQL), parallel databases, and object-relational mapping (ORM).
2. It provides details on common data models like hierarchical, network, and relational models. It also describes concepts like database architecture, data definition language, data manipulation language, and distributed databases.
3. Control questions are provided at the end to test understanding of database concepts like the difference between a database and data set, components of a database system, and main elements of a database.
This document summarizes and compares different methods for performing keyword searches in relational databases. It discusses candidate network-based methods, Steiner-tree based algorithms, and backward expanding keyword search approaches. It also evaluates methods that aim to improve search efficiency and accuracy, such as integrating multiple related tuple units and developing structure-aware indexes. The overall goal is to find an effective and efficient approach to keyword search over relational database structures.
Elimination of data redundancy before persisting into dbms using svm classifi...nalini manogaran
Elimination of data redundancy before persisting into dbms using svm classification,
Database Management Systems are one of the
growing fields in the computing world. Grid computing, internet
sharing, distributed computing, parallel processing and the cloud
all store huge amounts of data in a DBMS to
maintain the structure of the data. Memory management is
a major concern in a DBMS due to the edit, delete, recover
and commit operations used on records. To utilize memory
efficiently, redundant data should be
eliminated accurately. In this paper, redundant data is
detected by the Quick Search Bad Character (QSBC) function
and reported to the DB admin for removal.
The QSBC function compares the data against patterns taken
from an index table created for all data persisted in the
DBMS, making comparison of redundant (duplicate) records in
the database easy. The experiment is conducted in SQL Server
on a university student database, and performance is
evaluated in terms of time and accuracy. The database
holds data on 15,000 students involved in various activities.
Keywords—Data redundancy, Data Base Management System,
Support Vector Machine, Data Duplicate.
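The Quick Search bad-character shift mentioned in the abstract can be sketched as follows. This is a generic illustration of Sunday's Quick Search string-matching algorithm, not the paper's exact implementation: on each attempt, the window is shifted according to the character immediately after it.

```python
def quick_search(text, pattern):
    """Return all indices where pattern occurs in text,
    using Sunday's Quick Search bad-character shift."""
    m, n = len(pattern), len(text)
    # Shift table: distance to align each pattern character with
    # the text character just past the current window.
    shift = {c: m - i for i, c in enumerate(pattern)}
    hits, i = [], 0
    while i <= n - m:
        if text[i:i + m] == pattern:
            hits.append(i)
        if i + m >= n:
            break
        # Characters absent from the pattern allow the maximal shift m + 1.
        i += shift.get(text[i + m], m + 1)
    return hits
```

In a de-duplication setting, such a routine would scan stored records for occurrences of candidate duplicate patterns drawn from an index table.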
I. INTRODUCTION
The growing mass of information present in digital
media has become a pressing problem for data
administrators. Data repositories, such as those used by
digital libraries and e-commerce agents, are usually built
from data gathered from distinct sources, with records of
disparate schemata and structures. Problems regarding
low response time, availability, security and quality
assurance also become more troublesome to manage as the
amount of data grows larger. It is reasonable to say that
the quality of the data an organization uses in its
systems determines its efficiency in offering beneficial
services to its users. In this environment, the
consequences of maintaining repositories with "dirty" data
(i.e., with replicas, identification errors, duplicate patterns,
etc.) go well beyond technical concerns such as the
overall speed or performance of data
administration systems.
Authors: Nalini.M (nalini.tptwin@gmail.com), Anbu.S
Abstract—Since the demand for information retrieval is increasing quickly, indexing structures have become an important means of supporting fast information retrieval. In this paper, a new data structure for information retrieval called the Dynamic Ordered Multi-field Index (DOMI) is introduced. It is based on radix trees organized into segments, plus a hash table pointing to the root of each segment, where each segment is dedicated to storing the values of a single field. The hash table is used to access the needed segments directly, without traversing the other segments, so DOMI improves look-up performance for queries addressing a single field. For queries addressing multiple fields, each relevant segment of the radix tree is traversed sequentially without visiting unrelated branches. The segmentation in DOMI also provides flexibility for minimizing communication overhead in a distributed system: every field is represented by one segment, and each segment can be stored as one block.
In addition, the proposed DOMI consumes less space compared to indexes built using B or B+ trees. Hence, it is more suitable for data-intensive settings such as Big Data.
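The segment-per-field idea behind DOMI can be sketched as follows. This is a minimal illustration only: it uses a plain character trie rather than a compressed radix tree, and the class and method names are invented for the example.

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # character -> child node
        self.records = []    # record ids whose value ends at this node


class SegmentedIndex:
    """One trie ("segment") per field, reached via a hash table,
    so a single-field query never touches other fields' branches."""

    def __init__(self):
        self.segments = {}   # field name -> trie root

    def insert(self, field, value, record_id):
        node = self.segments.setdefault(field, TrieNode())
        for ch in value:
            node = node.children.setdefault(ch, TrieNode())
        node.records.append(record_id)

    def lookup(self, field, value):
        node = self.segments.get(field)
        if node is None:
            return []
        for ch in value:
            node = node.children.get(ch)
            if node is None:
                return []
        return node.records
```

A query on, say, the "name" field jumps straight to the "name" segment through the hash table; the other segments are never traversed, which is the look-up saving the abstract describes.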
QUERY OPTIMIZATION IN OODBMS: IDENTIFYING SUBQUERY FOR COMPLEX QUERY MANAGEMENTcsandit
This document discusses query optimization in object-oriented database management systems (OODBMS) using query decomposition and caching. It proposes an approach that decomposes complex queries into smaller subqueries for faster retrieval of cached results. The approach aims to reuse parts of cached results to answer wider queries by combining multiple cached queries. Experiments showed this approach improved query optimization performance especially when data manipulation rates were low compared to data retrieval rates. Key aspects included decomposing queries, caching subquery results, and reusing cached results to answer other queries.
Enhancing keyword search over relational databases using ontologiescsandit
Keyword Search Over Relational Databases (KSORDB) provides an easy way for casual users
to access relational databases using a set of keywords. Although much research has been done
and several prototypes have been developed recently, most of this research implements exact
(also called syntactic or keyword) match. So, if there is a vocabulary mismatch, the user cannot
get an answer although the database may contain relevant data. In this paper we propose a
system that overcomes this issue. Our system extends existing schema-free KSORDB systems
with semantic match features. So, if there are no or very few answers, our system exploits a
domain ontology to progressively return related terms that can be used to retrieve more
relevant answers for the user.
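The fallback strategy described above — exact keyword match first, ontology-based expansion only when too few answers come back — can be sketched generically (the `db_search` callable and the ontology dictionary here are hypothetical stand-ins, not the paper's system):

```python
def keyword_search(db_search, keywords, ontology, min_hits=1):
    """Try an exact keyword match; on a vocabulary mismatch, progressively
    substitute ontology-related terms until enough answers are found.

    db_search: callable taking a keyword list, returning a set of result ids.
    ontology:  dict mapping a keyword to a list of semantically related terms.
    """
    results = set(db_search(keywords))
    if len(results) >= min_hits:
        return results
    # Expand one keyword at a time with its related terms.
    for kw in keywords:
        for related in ontology.get(kw, []):
            expanded = [related if k == kw else k for k in keywords]
            results |= set(db_search(expanded))
            if len(results) >= min_hits:
                return results
    return results
```

The progressive loop mirrors the "no or very few answers" trigger in the abstract: expansion stops as soon as the result set is large enough.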
Power Management in Micro grid Using Hybrid Energy Storage Systemijcnes
This paper proposes power management in a micro grid using a hybrid distributed generator based on photovoltaic, wind-driven PMDC and an energy storage system. In this generator, the sources are connected together to the grid with the help of an interleaved boost converter followed by an inverter. Thus, compared to earlier schemes, the proposed scheme has fewer power converters. Fuzzy-based MPPT controllers are also proposed for the new hybrid scheme, to separately trigger the interleaved DC-DC converter and the inverter for tracking the maximum power from both sources. The integrated operation of both proposed controllers under different conditions is demonstrated through simulation with the help of MATLAB software.
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...IRJET Journal
The document discusses techniques for detecting similarity and deduplication in document analysis using vector analysis. It proposes analyzing documents by extracting abstract content, separating words and combining them in a word cloud to determine frequency. This approach aims to identify whether documents are duplicates by analyzing word vectors at the word, sentence and paragraph level while also applying techniques like stemming, stopping words and semantic similarity.
Prerequisies of DBMS
Course Objectives of DBMS
Syllabus
What is the meaning of data and database
DBMS
History of DBMS
Different Databases available in Market
Storage areas
Why to Learn DBMS?
Peoples who work with Databases
Applications of DBMS
I'm Muhammad Sharif, a database administrator and database systems engineer from SKMCHRC Lahore.
I am skilled in databases and in data science research.
This book, titled Database Systems Handbook, was written entirely by Muhammad Sharif.
Database management systems
Database systems handbook
#Muhammad Sharif
#Database_systems_handbook
The International Journal of Engineering & Science is aimed at providing a platform for researchers, engineers, scientists, or educators to publish their original research results, to exchange new ideas, to disseminate information in innovative designs, engineering experiences and technological skills. It is also the Journal's objective to promote engineering and technology education. All papers submitted to the Journal will be blind peer-reviewed. Only original articles will be published.
IRJET- Data Retrieval using Master Resource Description FrameworkIRJET Journal
This document discusses using a Master Resource Description Framework (MRDF) to improve data retrieval efficiency from databases. The MRDF combines multiple RDF files into a single framework to reduce the time needed for search engines to query each individual RDF file. It also describes using a user profile to track user interests and tailor query results accordingly for a personalized search experience. The MRDF approach is presented as improving search efficiency while retrieving data from databases.
Query optimization in oodbms identifying subquery for query managementijdms
This paper is based on a relatively newer approach to query optimization in object databases, which uses
query decomposition and cached query results to improve the execution of a query. The issues focused on
here are fast retrieval and high reuse of cached queries, and the decomposition of complex queries into
smaller subqueries for fast retrieval of results.
Here we also try to address another open area of query caching: handling wider queries, by using
parts of cached results to help answer other (wider) queries and combining many cached
queries while producing the result.
Multiple experiments were performed to prove the productivity of this newer way of optimizing a query.
The limitation of this technique is that it is useful especially in scenarios where the data manipulation rate
is very low compared to the data retrieval rate.
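The decompose-and-cache pattern described above can be sketched in miniature. This is an illustrative toy, not the paper's system: a conjunctive query is split into predicates, each predicate's result set is cached independently, and later queries reuse whichever subquery results are already cached.

```python
class SubqueryCache:
    """Decompose a conjunctive query into predicates, cache each
    predicate's result set, and intersect the sets to answer the query."""

    def __init__(self, evaluate):
        self.evaluate = evaluate  # callable: predicate -> set of object ids
        self.cache = {}           # predicate -> cached result set

    def query(self, predicates):
        result = None
        for p in predicates:
            if p not in self.cache:
                # Only uncached subqueries hit the underlying database.
                self.cache[p] = self.evaluate(p)
            rows = self.cache[p]
            result = rows if result is None else result & rows
        return result if result is not None else set()
```

A wider query (fewer predicates) can then be answered entirely from the cached subquery results, which is the reuse opportunity the paper targets; the trade-off is that cached sets go stale when data is modified, matching the stated limitation about low manipulation rates.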
The document discusses data mining functionalities including descriptive and predictive tasks. Descriptive tasks characterize data properties, while predictive tasks perform induction to make predictions on data. Specifically, it describes concept/class description which involves characterizing and discriminating classes/concepts by summarizing target classes, comparing them to contrasting classes, and presenting outputs in forms like charts and data cubes.
This document discusses data mining functionalities including descriptive and predictive tasks. Descriptive tasks characterize data properties through classification, characterization, and discrimination of data classes. Predictive tasks perform induction to enable predictions on current data. The document also outlines the knowledge discovery process used in data mining and common data types like relational databases, data warehouses, and transactional data.
This document discusses event driven architecture (EDA) and domain driven design. It begins with an introduction to the speaker and an overview of EDA basics. It then describes problems with traditional SOA implementations, where domain logic gets split across many systems. The document proposes that exposing domain events on a shared event bus allows isolating cross-cutting functions to separate systems while keeping domain logic together. It provides examples of how this approach improves scalability and decouples systems. Finally, it outlines potential business benefits of using EDA like enabling complex event processing, business process management, and business activity monitoring on top of the domain events.
The document discusses trends and challenges facing information technology, including building a civic semantic web and waiving rights over linked data. It also discusses whether semantic technologies could permit meaningful brand relationships. The document contains a chart showing government department spending in the UK, with the Department of Health spending £105.7 billion, the NHS £90.7 billion, and local and regional government £34.3 billion.
genpaxospublic-090703114743-phpapp01.pdfHiroshi Ono
This document summarizes an Erlang meeting held on July 3, 2009 in Tokyo. It discusses the gen_paxos Erlang module, which implements the Paxos consensus algorithm. Paxos is needed to solve problems like split-brains where data could become inconsistent without coordination between nodes. The document explains the key aspects of Paxos like its phases, data model in gen_paxos, and how nodes communicate through message passing in Erlang. It also provides references to related works and papers about Paxos.
pragmaticrealworldscalajfokus2009-1233251076441384-2.pdfHiroshi Ono
The document discusses Scala and functional programming concepts. It provides examples of building a chat application in 30 lines of code using Lift, defining case classes and actors for messages. It summarizes that Scala is a pragmatically oriented, statically typed language that runs on the JVM and has a unique blend of object-oriented and functional programming. Functional programming concepts like immutable data structures, functions as first-class values, and for-comprehensions are demonstrated with examples in Scala.
This document is the introduction to "The Little Book of Semaphores" by Allen B. Downey. It provides an overview of the book, which uses examples and puzzles to teach synchronization concepts and patterns. The book aims to give students more practice with these challenging concepts than a typical operating systems course allows. It also discusses the book's licensing as free and open source documentation.
This document provides style guidelines for Scala developers at Twitter. It outlines recommendations for imports, implicit usage, reflection, comments, whitespace, logging, project layout, variable naming conventions, and ends by thanking people for attending.
This document introduces developing a Scala DSL for Apache Camel. It discusses using Scala features like implicit conversions, passing functions as parameters, and by-name parameters to build a DSL. It provides examples of simple routes in the Scala DSL and compares them to Java. It also covers tooling for Scala in Maven and Eclipse and caveats like interacting with Java generics. The goal is to learn basic Scala concepts and syntax for building a Scala DSL, using Camel as an example.
stateyouredoingitwrongjavaone2009-090617031310-phpapp02.pdfHiroshi Ono
The document discusses alternative concurrency paradigms to shared-state concurrency for the JVM, including software transactional memory which allows transactions over shared memory, message passing concurrency using the actor model where actors communicate asynchronously via message passing, and dataflow concurrency where variables can only be assigned once. It provides examples of how these paradigms can be used to implement solutions like transferring funds between bank accounts more elegantly than with shared-state concurrency and locks.
This document discusses using TCP/IP for high performance computing (HPC) applications. It finds that while TCP/IP can achieve bandwidth of 1 Gbps over short distances with low latency, the bandwidth degrades significantly over wide area networks with higher latency. It investigates tuning TCP parameters like socket buffer sizes to improve performance over high latency networks.
Martin Odersky outlines the growth and adoption of Scala over the past 6 years and discusses Scala's future direction over the next 5 years. Key points include:
- Scala has grown from its first classroom use in 2003 to filling a full day of talks at JavaOne in 2009 and developing a large user community.
A Fast Index for Semistructured Data

Brian F. Cooper (1,2), Neal Sample (1,2), Michael J. Franklin (1,3), Gísli R. Hjaltason (1), Moshe Shadmon (1)

(1) RightOrder Incorporated, 3850 N. First St., San Jose, CA 95134 USA
(2) Department of Computer Science, Stanford University, Stanford, CA 94305 USA
(3) Computer Science Division, University of California, Berkeley, CA 94720 USA

{cooperb,nsample}@db.stanford.edu, franklin@cs.berkeley.edu, {gislih,moshes}@rightorder.com
Abstract

Queries navigate semistructured data via path expressions, and can be accelerated using an index. Our solution encodes paths as strings, and inserts those strings into a special index that is highly optimized for long and complex keys. We describe the Index Fabric, an indexing structure that provides the efficiency and flexibility we need. We discuss how "raw paths" are used to optimize ad hoc queries over semistructured data, and how "refined paths" optimize specific access paths. Although we can use knowledge about the queries and structure of the data to create refined paths, no such knowledge is needed for raw paths. A performance study shows that our techniques, when implemented on top of a commercial relational database system, outperform the more traditional approach of using the commercial system's indexing mechanisms to query the XML.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment. Proceedings of the 27th VLDB Conference, Roma, Italy, 2001.

1. Introduction

Database management systems are increasingly being called upon to manage semistructured data: data with an irregular or changing organization. An example application for such data is a business-to-business product catalog, where data from multiple suppliers (each with their own schema) must be integrated so that buyers can query it. Semistructured data is often represented as a graph, with a set of data elements connected by labeled relationships, and this self-describing relationship structure takes the place of a schema in traditional, structured database systems. Evaluating queries over semistructured data involves navigating paths through this relationship structure, examining both the data elements and the self-describing element names along the paths. Typically, indexes are constructed for efficient access.

One option for managing semistructured data is to store and query it with a relational database. The data must be converted into a set of tuples and stored in tables; for example, using tools provided with Oracle 8i/9i [25]. This process requires a schema for the data. Moreover, the translation is not trivial, and it is difficult to efficiently evaluate queries without extensions to the relational model [26]. If no schema exists, the data can be stored as a set of data elements and parent-child nesting relationships [17]. Querying this representation is expensive, even with indexes. The STORED system [12] uses data mining to extract a partial schema. Data that does not fit the schema well must be stored and queried in its native form.

An alternative option is to build a specialized data manager that contains a semistructured data repository at its core. Projects such as Lore [24] and industrial products such as Tamino [28] and XYZFind [29] take this approach. It is difficult to achieve high query performance using semistructured data repositories, since queries are again answered by traversing many individual element-to-element links, requiring multiple index lookups [23]. Moreover, semistructured data management systems do not have the benefit of the extensive experience gained with relational systems over the past few decades.

To solve this problem, we have developed a different approach that leverages existing relational database technology but provides much better performance than previous approaches. Our method encodes paths in the data as strings, and inserts these strings into an index that is highly optimized for string searching. The index blocks and semistructured data are both stored in a conventional relational database system. Evaluating queries involves encoding the desired path traversal as a search key string, and performing a lookup in our index to find the path.
There are several advantages to this approach. First, there is no need for a priori knowledge of the schema of the data, since the paths we encode are extracted from the data itself. Second, our approach has high performance even when the structure of the data is changing, variable or irregular. Third, the same index can accelerate queries along many different, complex access paths. This is because our indexing mechanism scales gracefully with the number of keys inserted, and is not affected by long or complex keys (representing long or complex paths).

Our indexing mechanism, called the Index Fabric, utilizes the aggressive key compression inherent in a Patricia trie [21] to index a large number of strings in a compact and efficient structure. Moreover, the Index Fabric is inherently balanced, so that all accesses to the index require the same small number of I/Os. As a result, we can index a large, complex, irregularly-structured, disk-resident semistructured data set while providing efficient navigation over paths in the data.

We manage two types of paths for semistructured data. First, we can index paths that exist in the raw data (called raw paths) to accelerate any ad hoc query. We can also reorganize portions of the data, to create refined paths, in order to better optimize particular queries. Both kinds of paths are encoded as strings and inserted into the Index Fabric. Because the index grows so slowly as we add new keys, we can create many refined paths and thus optimize many access patterns, even complex patterns that traditional techniques cannot easily handle. As a result, we can answer general queries efficiently using raw paths, even as we further optimize certain queries using refined paths. Maintaining all of the paths in the same index structure reduces the resource contention that occurs with multiple indexes, and provides a uniform mechanism that can be tuned for different needs.

Although our implementation of the Index Fabric uses a commercial relational DBMS, our techniques do not dictate a particular storage architecture. In fact, the fabric can be used as an index over a wide variety of storage engines, including a set of text files or a native semistructured database. The index provides a flexible, uniform and efficient mechanism to access data, while utilizing a stable storage manager to provide properties such as concurrency, fault tolerance, or security.

A popular syntax for semistructured data is XML [30], and in this paper we focus on using the Index Fabric to index XML-encoded data. XML encodes information as data elements surrounded by tags, and tags can be nested within other tags. This nesting structure can be viewed as a tree, and raw paths represent root-to-leaf traversals of this tree. Refined paths represent traversing the tree in some other way (e.g. from sibling to sibling).

We have implemented the Index Fabric as an index on top of a popular commercial relational DBMS. To evaluate performance, we indexed an XML data set using both the Index Fabric and the DBMS's native B-trees. In the Index Fabric, we have constructed both refined and raw paths, while the relational index utilized an edge mapping as well as a schema extracted by the STORED [12] system. Both refined and raw paths are significantly faster than the DBMS's native indexing mechanism, sometimes by an order of magnitude or more. The difference is particularly striking for data with irregular structure, or queries that must navigate multiple paths.

1.1. Paper overview

In this paper, we describe the structure of the Index Fabric and how it can be used to optimize searches over semistructured databases. Specifically, we make the following contributions:
• We discuss how to utilize the Index Fabric's support for long and complex keys to index semistructured data paths encoded as strings.
• We examine a simple encoding of the raw paths in a semistructured document, and discuss how to answer complex path queries over data with irregular structure using raw paths.
• We present refined paths, a method for aggressively optimizing frequently occurring and important access patterns. Refined paths support answering complicated queries using a single index lookup.
• We report the results of a performance study which shows that a semistructured index based on the Index Fabric can be an order of magnitude faster than traditional indexing schemes.

This paper is organized as follows. In Section 2 we introduce the Index Fabric and discuss searches and updates. Next, in Section 3, we present refined paths and raw paths and examine how they are used to optimize queries. In Section 4 we present the results of our performance experiments. In Section 5 we examine related work, and in Section 6 we discuss our conclusions.

2. The Index Fabric

The Index Fabric is a structure that scales gracefully to large numbers of keys, and is insensitive to the length or content of inserted strings. These features are necessary to treat semistructured data paths as strings.

The Index Fabric is based on Patricia tries [21]. An example Patricia trie is shown in Figure 1. The nodes are labeled with their depth: the character position in the key represented by the node. The size of the Patricia trie does not depend on the length of inserted keys.

Figure 1. A Patricia trie.
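The Patricia trie behavior described here can be sketched concretely. The Python below is an illustrative reconstruction, not the paper's implementation; the class and method names are invented for the example. It exhibits the properties the text relies on: depth-labeled nodes, at most one new internal node per inserted key, and lossy compression that forces a final verification step against the full key stored at the leaf.

```python
class Leaf:
    """Stores the full key so that lookups can verify matches."""
    def __init__(self, key, data):
        self.key, self.data = key, data

class Node:
    """Internal node, labeled with the character position it tests."""
    def __init__(self, depth):
        self.depth = depth
        self.edges = {}  # single-character label -> Node or Leaf

class PatriciaTrie:
    def __init__(self):
        self.root = None

    @staticmethod
    def _char(key, i):
        return key[i] if i < len(key) else "\0"  # pad past end of key

    def _descend(self, key):
        # "Blind" descent: characters skipped between nodes are never
        # checked on the way down (this is the lossy compression).
        node = self.root
        while isinstance(node, Node):
            c = self._char(key, node.depth)
            node = node.edges.get(c) or next(iter(node.edges.values()))
        return node

    def lookup(self, key):
        if self.root is None:
            return None
        leaf = self._descend(key)
        # Verification is mandatory: compression can lead to a wrong leaf.
        return leaf.data if leaf.key == key else None

    def insert(self, key, data):
        if self.root is None:
            self.root = Leaf(key, data)
            return
        leaf = self._descend(key)
        if leaf.key == key:
            leaf.data = data
            return
        d = 0  # first character position where the keys differ
        while self._char(key, d) == self._char(leaf.key, d):
            d += 1
        # Re-descend to the branch point; at most one new node is added,
        # no matter how long the key is.
        parent, node = None, self.root
        while isinstance(node, Node) and node.depth < d:
            parent, node = node, node.edges[self._char(key, node.depth)]
        if isinstance(node, Node) and node.depth == d:
            node.edges[self._char(key, d)] = Leaf(key, data)
            return
        branch = Node(d)
        branch.edges[self._char(key, d)] = Leaf(key, data)
        old = node
        while isinstance(old, Node):  # any leaf below labels the old edge
            old = next(iter(old.edges.values()))
        branch.edges[self._char(old.key, d)] = node
        if parent is None:
            self.root = branch
        else:
            parent.edges[self._char(key, parent.depth)] = branch

# Building the trie over the keys of Figure 2:
trie = PatriciaTrie()
for word in ["cat", "cash", "castle", "casting", "castiron", "far", "fast"]:
    trie.insert(word, word.upper())
```

Note that `trie.lookup("cast")` returns `None` even though "cast" is a prefix of several inserted keys: the blind descent reaches some leaf, and the final comparison against the leaf's full key rejects it.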
Figure 2. A layered index.

Rather, each new key adds at most a single link and node to the index, even if the key is long. Patricia tries grow slowly even as large numbers of strings are inserted because of the aggressive (lossy) compression inherent in the structure.

Patricia tries are unbalanced, main-memory structures that are rarely used for disk-based data. The Index Fabric is a structure that has the graceful scaling properties of Patricia tries, but that is balanced and optimized for disk-based access like B-trees. The fabric uses a novel, layered approach: extra layers of Patricia tries allow a search to proceed directly to a block-sized portion of the index that can answer a query. Every query accesses the same number of layers, providing balanced access to the index.

More specifically, the basic Patricia trie string index is divided into block-sized subtries, and these blocks are indexed by a second trie, stored in its own block. We can represent this second trie as a new horizontal layer, complementing the vertical structure of the original trie. If the new horizontal layer is too large to fit in a single disk block, it is split into two blocks, and indexed by a third horizontal layer. An example is shown in Figure 2. The trie in layer 1 is an index over the common prefixes of the blocks in layer 0, where a common prefix is the prefix represented by the root node of the subtrie within a block. In Figure 2, the common prefix for each block is shown in "quotes". Similarly, layer 2 indexes the common prefixes of layer 1. The index can have as many layers as necessary; the leftmost layer always contains one block.

There are two kinds of links from layer i to layer i-1: labeled far links and unlabeled direct links. Far links are like normal edges in a trie, except that a far link connects a node in one layer to a subtrie in the next layer. A direct link connects a node in one layer to a block with a node representing the same prefix in the next layer. Thus, in Figure 2, the node labeled "3" in layer 1 corresponds to the prefix "cas" and is connected to a subtrie (rooted at a node representing "cas" and also labeled "3") in layer 0 using an unlabeled direct link.

2.1. Searching

The search process begins in the root node of the block in the leftmost horizontal layer. Within a particular block, the search proceeds normally, comparing characters in the search key to edge labels, and following those edges. If the labeled edge is a far link, the search proceeds horizontally to a different block in the next layer to the right. If no labeled edge matches the appropriate character of the search key, the search follows a direct (unlabeled) edge horizontally to a new block in the next layer. The search proceeds from layer to layer until the lowest layer (layer 0) is reached and the desired data is found. During the search in layer 0, if no labeled edge matches the appropriate character of the search key, this indicates that the key does not exist, and the search terminates. Otherwise, the path is followed to the data. It is necessary to verify that the found data matches the search key, due to the lossy compression of the Patricia trie.

The search process examines one block per layer [1], and always examines the same number of layers. If the blocks correspond to disk blocks, this means that the search could require one I/O per layer, unless the needed block is in the cache.

[1] It is possible for the search procedure to enter the wrong block, and then have to backtrack, due to the lossy compression of prefixes in the non-leaf layers. This phenomenon is unique to the multi-layer Patricia trie structure. In practice, such mistakes are rare in a well-populated tree. See [10,19].
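The layered organization can be illustrated with a deliberately simplified model. In the Python sketch below (all names hypothetical), a sorted list of separator keys stands in for the layer-1 trie, and a plain dictionary stands in for each layer-0 subtrie. What survives the simplification is the access pattern this section emphasizes: every lookup examines exactly one block per layer, and an overflowing block splits locally, adding a single entry to the layer above.

```python
import bisect

class LayeredIndex:
    """Toy stand-in for the fabric's horizontal layers: a small
    in-memory directory (layer 1) routes every lookup to exactly
    one leaf block (layer 0)."""

    def __init__(self, block_size=4):
        self.block_size = block_size
        self.seps = [""]    # layer 1: sorted separator keys, one per block
        self.blocks = [{}]  # layer 0: each dict stands in for a subtrie

    def _block_index(self, key):
        # One directory probe picks the single leaf block to examine.
        return bisect.bisect_right(self.seps, key) - 1

    def insert(self, key, data):
        i = self._block_index(key)
        self.blocks[i][key] = data
        if len(self.blocks[i]) > self.block_size:
            # Overflow: split the block. The change stays local, and the
            # directory above gains one entry (cf. Section 2.2).
            keys = sorted(self.blocks[i])
            mid = keys[len(keys) // 2]
            right = {k: self.blocks[i].pop(k) for k in keys[len(keys) // 2:]}
            self.blocks.insert(i + 1, right)
            self.seps.insert(i + 1, mid)

    def lookup(self, key):
        # Exactly one block per layer is touched, then the full key is
        # checked (here, via exact dictionary lookup).
        return self.blocks[self._block_index(key)].get(key)

fabric = LayeredIndex(block_size=4)
for word in ["cat", "cash", "castle", "casting", "castiron", "far", "fast"]:
    fabric.insert(word, len(word))
```

In the real structure both layers are Patricia tries and the directory entries are lossily compressed common prefixes rather than full separator keys, which is why the occasional wrong-block entry and backtrack described in the footnote can occur; the toy model's exact separators cannot reproduce that behavior.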
4. Doc 1: <invoice> Doc 2: <invoice>
<buyer> <buyer>
<name>ABC Corp</name> <name>Oracle Inc</name>
<address>1 Industrial Way</address> <phone>555-1212</phone>
</buyer> </buyer>
<seller> <seller>
<name>Acme Inc</name> <name>IBM Corp</name>
<address>2 Acme Rd.</address> </seller>
</seller> <item>
<item count=3>saw</item> <count>4</count>
<item count=2>drill</item> <name>nail</name>
</invoice> </item>
</invoice>
Figure 3. Sample XML.
structure is that keys are stored very compactly, and many keys can be indexed per block. Thus, blocks have a very high out-degree (the number of far and direct links referring to the next layer to the right). Consequently, the vast majority of the space required by the index is at the rightmost layer, and the layers to the left (layers 1, 2, …, n) are significantly smaller. In practice, this means that an index storing a large number of keys (e.g. a billion) requires three layers; layer 0 must be stored on disk, but layers 1 and 2 can reside in main memory. Key lookups require at most one I/O, for the leaf index layer (in addition to data I/Os). In the present context, this means that following any indexed path through the semistructured data, no matter how long, requires at most one index I/O.

2.2. Updates

Updates, insertions, and deletions, like searches, can be performed very efficiently. An update is a key deletion followed by a key insertion. Inserting a key into a Patricia trie involves either adding a single new node or adding an edge to an existing node. The insertion requires a change to a single block in layer 0. The horizontal index is searched to locate the block to be updated. If this block overflows, it must be split, requiring a new node at layer 1. This change is also confined to one block. Splits propagate left in the horizontal layers if blocks overflow at each layer, and one block per layer is affected. Splits are rare, and the insertion process is efficient. If the block in the leftmost horizontal layer (the root block) must be split, a new horizontal layer is created.

To delete a key, the fabric is searched using the key to find the block to be updated, and the edge pointing to the leaf for the deleted key is removed from the trie. It is possible to perform block recombination if block storage is underutilized, although this is not necessary for the correctness of the index. Due to space restrictions, we do not present insertion, deletion and split algorithms here. The interested reader is referred to [10,19].

3. Indexing XML with the Index Fabric

Because the Index Fabric can efficiently manage large numbers of complex keys, we can use it to search many complex paths through the XML. In this section, we discuss encoding XML paths as keys for insertion into the fabric, and how to use path lookups to evaluate queries. As a running example, we will use the XML in Figure 3.

3.1. Designators

We encode data paths using designators: special characters or character strings. A unique designator is assigned to each tag that appears in the XML. For example, for the XML in Figure 3, we can choose I for <invoice>, B for <buyer>, N for <name>, and so on. (For illustration, here we will represent designators as boldface characters.) Then, the string "IBNABC Corp" has the same meaning as the XML fragment

<invoice>
  <buyer><name>ABC Corp</name></buyer>
</invoice>

The designator-encoded XML string is inserted into the layered Patricia trie of the Index Fabric, which treats designators the same way as normal characters, though conceptually they are from different alphabets.

In order to interpret these designators (and consequently to form and interpret queries) we maintain a mapping between designators and element tags called the designator dictionary. When an XML document is parsed for indexing, each tag is matched to a designator using the dictionary. New designators are generated automatically for new tags. The tag names from queries are also translated into designators using the dictionary, to form a search key over the Index Fabric. (See Section 3.5.)

3.2. Raw paths

Raw paths index the hierarchical structure of the XML by encoding root-to-leaf paths as strings. Simple path expressions that start at the root require a single index lookup. Other path expressions may require several lookups, or post-processing of the result set. In this section, we focus on the encoding of raw paths. Raw paths build on previous work in path indexing. (See Section 5.)

Tagged data elements are represented as designator-encoded strings. We can regard all data elements as leaves in the XML tree. For example, the XML fragment

<A>alpha<B>beta<C>gamma</C></B></A>

can be represented as a tree with three root-to-leaf paths: <A>alpha, <A><B>beta and <A><B><C>gamma. If we assign A, B and C as the designators for <A>, <B> and <C> respectively, then we can encode the paths in this XML fragment as "A alpha", "A B beta" and "A B C gamma." This is a prefix encoding of the paths: the designators, representing the nested tag structure, appear at the beginning of the key, followed by the data element at the leaf of the path. This encoding does not require a pre-existing, regular or static schema for the data.

The alternative is infix encoding, in which data elements are nodes along the path. An infix encoding of the above fragment would be "A alpha B beta C gamma." Here, for clarity, we will follow the convention of previous work, which is to treat data elements as leaves, and we will focus on the prefix encoding.

Tags can contain attributes (name/value pairs). We treat attributes like tagged children; e.g. <A B="alpha">… is treated as if it were <A><B>alpha</B>…. The result is that attributes of <A> appear as siblings of the other tags nested within <A>. The label "B" is assigned different designators when it appears as a tag and as an attribute (e.g. B = tag, B' = attribute).

At any time, a new document can be added to the raw path index, even if its structure differs from previously indexed documents. The root-to-leaf paths in the document are encoded as raw path keys, and inserted into the fabric. New tags that did not exist in the index previously can be assigned new designators "on-the-fly" as the document is being indexed. Currently, this process does not preserve the sequential ordering of tags in the XML document. We have developed a system of alternate designators to encode order, but do not have space to discuss those techniques here.

3.2.1. Raw path example

The XML of Figure 3 can be encoded as a set of raw paths. First, we assign designators to tags, as shown in Figure 4(a). Next, we encode the root-to-leaf paths to produce the keys shown in Figure 4(b). Finally, we insert these keys in the Index Fabric to generate the trie shown in Figure 4(c). For clarity, this figure omits the horizontal layers and some parts of the trie.

(a) Designators:          (b) Raw path keys:
<invoice>  = I            Document 1: I B N ABC Corp
<buyer>    = B                        I B A 1 Industrial Way
<name>     = N                        I S N Acme Inc
<address>  = A                        I S A 2 Acme Rd.
<seller>   = S                        I T saw         I T C' 3
<item>     = T                        I T drill       I T C' 2
<phone>    = P            Document 2: I B N Oracle Inc
<count>    = C                        I B P 555-1212
count (attribute) = C'                I S N IBM Corp
                                      I T C 4         I T N nail

(c) [Patricia trie over these keys; leaves point to the matching documents, e.g. "I B N ABC Corp" → Doc 1, "I S N IBM Corp" → Doc 2. Diagram omitted.]

Figure 4. Raw paths.

3.3. Refined paths

Refined paths are specialized paths through the XML that optimize frequently occurring access patterns. Refined paths can support queries that have wildcards, alternates and different constants.

For example, we can create a refined path that is tuned for a frequently occurring query over the XML in Figure 3, such as "find the invoices where company X sold to company Y." Answering this query involves finding <buyer> tags that are siblings of a <seller> tag within the same <invoice> tag. First, we assign a designator, such as "Z," to the path. (Recall that designators are just special characters or strings, shown here in boldface for clarity.) Next, we encode the
information indexed by this refined path in an Index Fabric key. If "Acme Inc" sold items to "ABC Corp," we would create a key of the form "Z ABC Corp Acme Inc." Finally, we insert the keys we have created into the fabric. The keys refer to the XML fragments or documents that answer the query. (See Section 3.4.)

This encoding scheme is similar to that used for raw paths, with designators and data elements in the same key. In a sense, we are overloading the metaphor of encoding paths as strings to support optimizing specific queries by encoding specialized paths. Raw and refined paths are kept in the same index and accessed using string lookups.

Adding new documents to the refined path index is accomplished in two steps. First, the new documents are parsed to extract information matching the access pattern of the refined path. Then, this information is encoded as an Index Fabric key and inserted into the index. Changes to refined paths are reflected in simple key updates.

The database administrator decides which refined paths are appropriate. As with any indexing scheme, creating a new access path requires scanning the database and extracting the keys for insertion into the index. Our structure grows slowly as new keys are inserted. Thus, unlike previous indexing schemes, we can pre-optimize a great many queries without worrying about resource contention between different indexes.

3.4. Combining the index with a storage manager

Because the Index Fabric is an index, it does not dictate a particular architecture for the storage manager of the database system. The storage manager can take a number of forms. The indexed keys can be associated with pointers that refer to flat text files, tuples in a relational system, or objects in a native XML database. In any case, searching the fabric proceeds as described, and the returned pointers are interpreted appropriately by the database system. In our implementation, both the index blocks and the actual XML data are stored in a relational database system. Thus, we leverage the maturity of the RDBMS, including concurrency and recovery features.

3.5. Accelerating queries using the Index Fabric

Path expressions are a central component of semistructured query languages (e.g. Lorel [2] or Quilt [6]). We focus on selection using path expressions, that is, choosing which XML documents or fragments answer the query, since that is the purpose of an index. We assume that an XML database system could use a standard approach, such as XSLT [31], to perform projection.

A simple path expression specifies a sequence of tags starting from the root of the XML. For example, the query "Find invoices where the buyer is ABC Corp" asks for XML documents that contain the root-to-leaf path "invoice.buyer.name.'ABC Corp'." We use a key lookup operator to search for the raw path key corresponding to the simple path expression.

Raw paths can also be used to accelerate general path expressions, which are vital for dealing with data that has irregular or changing structure because they allow for alternates, optional tags and wildcards. We expand the query into multiple simple path expressions, and evaluate each using a separate key lookup operator. Thus, the path expression A.(B1|B2).C results in searches for A.B1.C and A.B2.C. This means multiple traversals, but each traversal is a simple, efficient lookup.

If the query contains wildcards, then it expands to an infinite set. For example, A.(%)*.C means find every <C> that has an ancestor <A>. To answer this query, we start by using a prefix key lookup operator to search for the "A" prefix, and then follow every child of the "A" prefix node to see if there is a "C" somewhere down below. Because we prefix-encode all of the raw paths, we can prune branches deeper than the designators (e.g. after we see the first non-designator character).

We can further prune the traversal using another structure that summarizes the XML hierarchy. For example, Fernandez and Suciu [15] describe techniques for utilizing partial knowledge of a graph structure to prune or rewrite general path expressions.

Queries that correspond to refined paths can be further optimized. The query processor identifies the query as corresponding to a refined path, and translates the query into a search key. For example, the query "Find all invoices where ABC Corp bought from Acme Inc" becomes "Z ABC Corp Acme Inc." The index is searched using the key to find the relevant XML. The search uses the horizontal layers and is very efficient; even if there are many millions of indexed elements, the answer can be found using at most a single index I/O.

4. Experimental results

We have conducted performance experiments on our indexing mechanism. We stored an XML-encoded data set in a popular commercial relational database system², and compared the performance of queries using the DBMS' native B-tree index versus using the Index Fabric implemented on top of the same database system. Our performance results thus represent an "apples to apples" comparison using the same storage manager.

4.1. Experimental setup

The data set we used was the DBLP, the popular computer science bibliography [11]. The DBLP is a set of XML-like documents; each document corresponds to a single publication. There are over 180,000 documents, totaling 72 Mb of data, grouped into eight classes (journal article, book, etc.). A document contains information about the type of publication, the title of the publication,

² The license agreement prohibits publishing the name of the DBMS with performance data. We refer to it as "the RDBMS." Our system can interoperate with any SQL DBMS.
<article key="Codd70">
  <author>E. F. Codd</author>,
  <title>A Relational Model of Data for Large Shared Data Banks.</title>,
  <pages>377-387</pages>,
  <year>1970</year>,
  <volume>13</volume>,
  <journal>CACM</journal>,
  <number>6</number>,
  <url>db/journals/cacm/cacm13.html#Codd70</url>
  <ee>db/journals/cacm/Codd70.html</ee>
  <cdrom>CACMs1/CACM13/P377.pdf</cdrom>
</article>

Figure 5. Sample DBLP document.

Query  Description
A      Find books by publisher
B      Find conference papers by author
C      Find all publications by author
D      Find all publications by co-authors
E      Find all publications by author and year

Table 1. Queries.

the authors, and so on. A sample document is shown in Figure 5. Although the data is somewhat regular (e.g. every publication has a title), the structure varies from document to document: the number of authors varies, some fields are omitted, and so on.

We used two different methods of indexing the XML via the RDBMS' native indexing mechanism. The first method, the basic edge mapping, treats the XML as a set of nodes and edges, where a tag or atomic data element corresponds to a node and a nesting relationship corresponds to an edge. The database has two tables, roots(id,label) and edges(parentid,childid,label). The roots table contains a tuple for every document, with an id for the document, and a label, which is the root tag of the document. The edges table contains a tuple for every nesting relationship. For nested tags, parentid is the ID of the parent node, childid is the ID of the child node, and label is the tag. For leaves (data elements nested within tags), childid is NULL, and label is the text of the data element. For example, the XML fragment

<book><author>Jane Doe</author></book>

is represented by the tuple (0,book) in roots and the tuples (0,1,author) and (1,NULL,Jane Doe) in edges. (Keeping the leaves as part of the edges table offered better performance than breaking them into a separate table.) We created the following key-compressed B-tree indexes:
• An index on roots(id), and an index on roots(label).
• An index on edges(parentid), an index on edges(childid), and an index on edges(label).

The second method of indexing XML using the DBMS' native mechanism is to use the relational mapping generated by the STORED [12] system to create a set of tables, and to build a set of B-trees over the tables. We refer to this scheme as the STORED mapping. STORED uses data mining to extract schemas from the data based on frequently occurring structures. The extracted schemas are used to create "storage-mapped tables" (SM tables). Most of the data can be mapped into tuples and stored in the SM tables, while more irregularly structured data must be stored in overflow buckets, similar to the edge mapping. The schema for the SM tables was obtained from the STORED investigators [13]. The SM tables identified for the DBLP data are inproceedings, for conference papers, and articles, for journal papers. Conference and journal paper information that does not fit into the SM tables is stored in overflow buckets along with other types of publications (such as books).

To evaluate a query over the STORED mapping, the query processor may have to examine the SM tables, the overflow buckets, or both. We created the following key-compressed B-tree indexes:
• An index on each of the author attributes in the inproceedings and articles SM tables.
• An index on the booktitle attribute (e.g., conference name) in the inproceedings table.
• An index on the id attribute of each SM table; the id joins with roots(id) in the overflow buckets.

For both the edge and STORED mappings it was necessary to hand tune the query plans generated by the RDBMS, since the plans that were generated automatically tended to use inefficient join algorithms. We were able to significantly improve the performance (e.g. reducing the time to execute thousands of queries from days to hours).

The Index Fabric contained both raw paths and refined paths for the DBLP documents. The fabric blocks were stored in an RDBMS table. All of the index schemes we studied index the document IDs. Thus, a query processor will use an index to find relevant documents, retrieve the complete documents, and then use a post-processing step (e.g. with XSLT) to transform the found documents into presentable query results. Here, we focus on the index lookup performance.

All experiments used the same installation of the RDBMS, running on an 866 MHz Pentium III machine, with 512 Mb of RAM. For our experiments, we set the cache size to ten percent of the data set size. For the edge-mapping and STORED mapping schemes, the whole cache was devoted to the RDBMS, while in the Index Fabric scheme, half of the cache was given to the fabric and half was given to the RDBMS. In all cases, experiments were run on a cold cache. The default RDBMS logging was used both for queries over the relational mappings and queries over the Index Fabric.

We evaluated a series of five queries (Table 1) over the DBLP data. We ran each query multiple times with different constants; for example, with query B, we tried 7,000 different authors. In each case, 20 percent of the query set represented queries that returned no result because the key was not in the data set.

The experimental results are summarized in Table 2. (The ∆ column is speed-up versus edge mapping.) In each case, our index is more efficient than the RDBMS alone, with more than an order of magnitude speedup in some
                       I/O - Blocks                                        Time - Seconds
    Edge Map     STORED       Raw path     Refined path     Edge Map   STORED     Raw path   Refined path
    value   ∆    value   ∆    value   ∆    value   ∆        value  ∆   value  ∆   value  ∆   value  ∆
A      416 1.0     370 1.1       13 32.0       -   -            6 1.0      4 1.5   0.83  7.2     -  -
B    68788 1.0   26490 2.6     6950  9.9       -   -         1017 1.0    293 3.5     81 12.6     -  -
C    69925 1.0   61272 1.1    34305  2.0   20545  3.4        1056 1.0    649 1.6    397  2.7   236  4.5
D   353612 1.0  171712 2.1    89248  4.0   17337 20.4        5293 1.0   2067 2.6    975  5.4   208 25.4
E   327279 1.0  138386 2.4   113439  2.9   16529 19.8        4835 1.0   1382 3.5   1209  4.0   202 23.9

Table 2. Experimental results.
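The ∆ columns can be recomputed from the raw values as the edge-mapping cost divided by the scheme's cost, e.g. for query D:

```python
# Query D, I/O blocks from Table 2: edge mapping vs. refined paths.
edge_io, refined_io = 353612, 17337
print(round(edge_io / refined_io, 1))   # 20.4, the speed-up in Table 2

# Query D, time in seconds: edge mapping vs. refined paths.
edge_t, refined_t = 5293, 208
print(round(edge_t / refined_t, 1))     # 25.4
```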
instances. We discuss the queries and results next.

4.2. Query A: Find books by publisher

Query A accesses a small portion of the DBLP database, since out of over 180,000 documents, only 436 correspond to books. This query is also quite simple, since it looks for document IDs based on a single root-to-leaf path, "book.publisher.X" for a particular X. Since it can be answered using a single lookup in the raw path index, we have not created a refined path. The query can be answered using the basic edge mapping by selecting "book" tuples from the roots table, joining the results with "publisher" tuples from the edges table, and joining again with the edges table to find data elements "X". The query cannot be answered from the storage-mapped tables (SM tables) in the STORED mapping. Because books represent less than one percent of the DBLP data, they are considered "overflow" by STORED and stored in the overflow buckets.

The results for query A are shown in Table 2, and represent looking for 48 different publishers. The raw path index is much faster than the edge mapping, with a 97 percent reduction in block reads and an 86 percent reduction in total time. The raw path index is also faster than accessing the STORED overflow buckets, with 96 percent fewer I/Os and 79 percent less time. Note that the overflow buckets require less time and fewer I/Os than the edge mapping because the overflow buckets do not contain the information stored in the SM tables, while the edge mapping contains all of the DBLP information and requires larger indexes.

These results indicate that it can be quite expensive to query semistructured data stored as edges and attributes. This is because multiple joins are required between the roots and edges tables. Even though indexes support these joins, multiple index lookups are required, and these increase the time to answer the query. Moreover, the DBLP data is relatively shallow, in that the path length from root to leaf is only two edges. Deeper XML data, with longer path lengths, would require even more joins and thus more index lookups. In contrast, a single index lookup is required for the raw paths.

4.3. Query B: Find conference papers by author

This query accesses a large portion of the DBLP, as conference papers represent 57 percent of the DBLP publications. We chose this query because it uses a single SM table in the STORED mapping. The SM table generated by STORED for conference papers has three author attributes, and overflow buckets contain any additional authors. In fact, the query processor must take the union of two queries: first, find document IDs by author in the inproceedings SM table, and second, query any inproceedings.author.X paths in the roots and edges overflow tables. Both queries are supported by B-trees. The edge mapping uses a similar query to the overflow buckets. The query is answered with one raw path lookup (for inproceedings.author.X) and we did not create a refined path.

The results in Table 2 are for queries with 7,000 different author names. Raw paths are much more efficient, with an order of magnitude less time and fewer I/Os than the edge mapping, and 74 percent fewer I/Os and 72 percent less time than the STORED mapping. We have plotted the I/Os in Figure 6 with the block reads for index blocks and for data blocks (to retrieve document IDs) broken out; the data reads for the Index Fabric include the result verification step for the Patricia trie. For the STORED mapping, Figures 6 and 7 separate I/Os to the edge-mapped overflow buckets and I/Os to the SM tables.

Although SM tables can be accessed efficiently (via B-trees on the author attributes), the need to go to the overflow buckets to complete the query adds significant overhead. The performance of the edge mapping, which is an order of magnitude slower than the raw paths, confirms that this process is expensive. This result illustrates that when some of the data is irregularly structured (even if a large amount fits in the SM tables), then the performance of the relational mappings (edge and STORED) suffers.

4.4. Other queries

Query C (find all document IDs of publications by author X) contains a wildcard, since it searches for the path "(%)*.author.X." The results in Table 2 represent queries for 10,000 different author names.

Query D seeks IDs of publications co-authored by author "X" and author "Y." This is a "sibling" query that looks for two tags nested within the same parent tag. The results in Table 2 are for queries on 10,000 different pairs
[Bar charts of index and data I/Os per scheme, with I/Os to the edge-mapped overflow buckets broken out for the STORED mapping.]

Figure 6. Query B: find conference papers by author.    Figure 7. Query D: find publications by co-authors.
of authors, and the I/Os are shown in Figure 7.

Query E (find IDs of publications by author X in year Y) also seeks a sibling relationship, this time between <author> and <year>. The difference is that while <author> is very selective (with over 100,000 unique authors), there are only 58 different years (including items such as "1989/1990"). Consequently, there are a large number of documents for each year. The results in Table 2 are for 10,000 author/year pairs.

The results shown in Table 2 illustrate that irregularly structured data is a significant obstacle to managing semistructured data in a relational system. For the STORED mapping, the SM tables can be accessed efficiently, but the queries cannot be fully answered without costly access to the overflow buckets. The edge mapping (which treats all data as irregularly structured) is even less efficient, since every query must be evaluated using expensive self-joins. Thus, even though there are multiple raw path lookups for queries C, D and E, the raw paths outperform the relational mappings in each case. Moreover, the refined paths offer a significant optimization, especially for complex queries.

5. Related work

The problem of storing, indexing and searching semistructured data has gained increasing attention [1,5,6,23]. Shanmugasundaram et al. [26] have investigated using DTDs to map the XML data into relational tables. The STORED system extracts the schema from the data itself using data mining [12]. Both [26] and [12] note that it is difficult to deal with data that has irregular or variable structure. Florescu and Kossmann have examined storing XML in an RDBMS as a set of attributes and edges, using little or no knowledge of the document structure [17]; an example is the edge mapping we examine here. Other systems store the data "natively" using a semistructured data model [24,28,29]. Evaluating path expressions in these systems usually requires multiple index lookups [23]. Raw paths are conceptually similar to DataGuides [18].

A join index, such as that proposed by Valduriez [27], precomputes joins so that at query time, specific queries are very efficient. This idea is similar in spirit to our raw and refined paths. However, a separate join index must be built for each access path. Moreover, a join index is sensitive to key length, and is usually only used for a single join, not a whole path.

Path navigation has been studied in object oriented (OO) databases. OO databases use sequences [4,22] or hierarchies of path indexes [32] to support long paths, requiring multiple index lookups per path. Our mechanism supports following paths with a single index lookup. Also, OO indexes support linear paths, requiring multiple indexes to evaluate "branchy" queries. Our structure provides a single index for all queries, and one lookup to evaluate the query using a refined path. Third, semistructured data requires generalized path expressions in order to navigate irregular structure. Although Christophides et al. [8] have studied this problem, their work focuses on query rewriting and not indexes, and our mechanism could utilize their techniques (or those of [15]) to better optimize generalized path expressions over raw paths. Finally, OO indexes must deal with class inheritance [7], while XML indexes do not.

Text indexing has been studied extensively in both structured and unstructured databases. Suffix arrays and compressed suffix arrays [16], based on Patricia tries, provide partial-match searching rather than path navigation. Several data and query models for structured data besides XML have been studied [3]; our techniques can be adapted for these other models. Others have extended text indexes and multidimensional indexes to deal with structured data [20]; our structural encoding is new, and we deal with all of the structure in one index.

The Index Fabric is a balanced structure like a B-tree [9], but unlike the B-tree, it scales well to large numbers of keys and is insensitive to the length or complexity of keys. Diwan et al. have examined taking general graph structures and providing balanced, disk-based access [14]. Our structure is optimized specifically for Patricia tries.
6. Conclusions

We have investigated encoding paths through semistructured data as simple strings, and performing string lookups to answer queries. We have investigated two options: raw paths, which assume no a priori knowledge of queries or structure, and refined paths, which take advantage of such knowledge to achieve further optimization. Our techniques rely on the Index Fabric for high performance string lookups over a large set of non-uniform, long, and complex strings. While the indexing mechanisms of an RDBMS or semistructured data repository can provide some optimization, they have difficulty achieving the high performance possible with our techniques. Our experimental results confirm that implementing our techniques on top of an RDBMS offers a significant improvement over using the RDBMS's native indexes for semistructured data. This is especially true if the query is complex or branchy, or accesses "irregular" portions of the data (that must be stored in overflow buckets). Clearly, the Index Fabric represents an effective way to manage semistructured data.

Acknowledgements

The authors would like to thank Alin Deutsch, Mary Fernandez and Dan Suciu for the use of their STORED results for the DBLP data. We also want to thank Donald Kossmann for helpful comments on a draft of this paper.

References

[1] S. Abiteboul. Querying semi-structured data. In Proc. ICDT, 1997.
[2] S. Abiteboul et al. The Lorel query language for semistructured data. Int. J. on Digital Libraries 1(1): 68-88, 1997.
[3] R. Baeza-Yates and G. Navarro. Integrating contents and structure in text retrieval. SIGMOD Record 25(1): 67-79, 1996.
[4] E. Bertino. Index configuration in object-oriented databases. VLDB Journal 3(3): 355-399, 1994.
[5] P. Buneman et al. A query language and optimization techniques for unstructured data. In Proc. SIGMOD, 1996.
[6] D. Chamberlin, J. Robie and D. Florescu. Quilt: An XML query language for heterogeneous data sources. In Proc. WebDB Workshop, 2000.
[7] S. Choenni et al. On the selection of optimal index configuration in OO databases. In Proc. ICDE, 1994.
[8] V. Christophides, S. Cluet and G. Moerkotte. Evaluating queries with generalized path expressions. In Proc. SIGMOD, pages 413-422, 1996.
[9] D. Comer. The ubiquitous B-tree. Computing Surveys 11(2): 121-137, 1979.
[10] B. Cooper and M. Shadmon. The Index Fabric: A mechanism for indexing and querying the same data in many different ways. Technical Report, 2000. Available at http://www.rightorder.com/technology/overview.pdf.
[11] DBLP Computer Science Bibliography. At http://www.informatik.uni-trier.de/~ley/db/.
[12] A. Deutsch, M. Fernandez and D. Suciu. Storing semistructured data with STORED. In Proc. SIGMOD, 1999.
[13] Alin Deutsch. Personal communication, January 24, 2001.
[14] A. A. Diwan et al. Clustering techniques for minimizing external path length. In Proc. 22nd VLDB, 1996.
[15] M. Fernandez and D. Suciu. Optimizing regular path expressions using graph schemas. In Proc. ICDE, 1998.
[16] P. Ferragina and G. Manzini. An experimental study of a compressed index. In Proc. ACM-SIAM SODA, 2001.
[17] D. Florescu and D. Kossmann. A performance evaluation of alternative mapping schemes for storing XML data in a relational database. INRIA Technical Report 3684, 1999.
[18] R. Goldman and J. Widom. DataGuides: enabling query formulation and optimization in semistructured databases. In Proc. 23rd VLDB, pages 436-445, 1997.
[19] Alon Itai. The JS Data Structure. Technical report, 1999.
[20] H. V. Jagadish, N. Koudas and D. Srivastava. On effective multi-dimensional indexing for strings. In Proc. SIGMOD, 2000.
[21] Donald Knuth. The Art of Computer Programming, Vol. III: Sorting and Searching, Third Edition. Addison Wesley, Reading, MA, 1998.
[22] W. C. Lee and D. L. Lee. Path dictionary: a new approach to query processing in object-oriented databases. IEEE TKDE 10(3): 371-388, May/June 1998.
[23] J. McHugh and J. Widom. Query optimization for XML. In Proc. 25th VLDB, 1999.
[24] J. McHugh et al. Lore: a database management system for semistructured data. SIGMOD Record 26(3): 54-66, 1997.
[25] Oracle Corp. Oracle 9i database. http://www.oracle.com/ip/deploy/database/9i/index.html.
[26] J. Shanmugasundaram et al. Relational databases for querying XML documents: limitations and opportunities. In Proc. 25th VLDB, 1999.
[27] P. Valduriez. Join indices. TODS 12(2): 218-246, 1987.
[28] Software AG. Tamino XML database. http://www.softwareag.com/tamino/.
[29] XYZFind. XML database. http://www.xyzfind.com.
[30] W3C. Extensible Markup Language (XML) 1.0 (Second Edition). W3C Recommendation, October 6, 2000. http://www.w3.org/TR/2000/REC-xml-20001006.
[31] W3C. XSL Transformations (XSLT) 1.0. W3C Recommendation, November 16, 1999. http://www.w3.org/TR/1999/REC-xslt-19991116.
[32] Z. Xie and J. Han. Join index hierarchies for supporting efficient navigations in object-oriented databases. In Proc. VLDB, 1994.