This document discusses designing a data warehouse to integrate XML data sources. It first reviews multidimensional modeling concepts and the design process, which involves building an attribute tree from the source schema and defining facts, dimensions and measures. It then examines how relationships can be modeled in XML schemas using sub-elements or ID/IDREF attributes. The paper proposes a semi-automatic approach to build the data warehouse conceptual schema by inferring information from the XML source, with potential designer input if needed.
A Survey on Heterogeneous Data Exchange using Xml - IRJET Journal
This document summarizes a research paper on heterogeneous data exchange using XML. It discusses how XML has become a standard for data transmission due to its flexibility, extensibility and ability to represent heterogeneous data. The document then reviews related work on XML data exchange and mapping between relational and XML models. It also describes the process of exporting data from a source database to XML, importing XML data by validating, transforming and storing it in the target database, and transmitting data between different servers.
Course Outline...
• Identify the need for XML as a standard data interchange format
• Identify the structure of XML documents
• Create an XML schema
• Declare attributes in an XML schema
• Identify the need for XML namespaces
• Reuse XML schema components
• Create groups of elements and attributes in an XML schema
• Transform an XML document through a Cascading Style Sheet
• Transform an XML document through Extensible Style Sheet Language
• Perform conditional formatting
• Use XPath patterns
• Present data in different formats
• Identify the XML Document Object Model
• Validate an XML document against an XML schema using the Document Object Model
• Apply a Style Sheet to an XML document
XML is a standard for data exchange between web applications such as e-commerce, e-learning and other web portals. The volume of data on the web has grown substantially, and to retrieve or store these data effectively it is recommended that they be physically or virtually fragmented and distributed across different nodes. Fragmentation design consists of two parts: the fragmentation operation and the fragmentation method. There are three kinds of fragmentation operation, Horizontal, Vertical and Hybrid, which determine how the XML should be fragmented. The aim of this paper is to give an overview of fragmentation design considerations.
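As a rough illustration of the horizontal operation described above, the following sketch splits an XML document's record elements into fragments by a predicate. The element names and the two-fragment split are hypothetical examples, not taken from the paper:

```python
import xml.etree.ElementTree as ET

def horizontal_fragment(xml_text, record_tag, predicate):
    """Split the root's <record_tag> children into two fragments:
    those satisfying the predicate and the rest (a toy horizontal split)."""
    root = ET.fromstring(xml_text)
    frag_a = ET.Element(root.tag)
    frag_b = ET.Element(root.tag)
    for rec in root.findall(record_tag):
        (frag_a if predicate(rec) else frag_b).append(rec)
    return (ET.tostring(frag_a, encoding="unicode"),
            ET.tostring(frag_b, encoding="unicode"))

# Hypothetical document: fragment orders by region.
doc = "<orders><order region='EU'/><order region='US'/><order region='EU'/></orders>"
eu, rest = horizontal_fragment(doc, "order", lambda r: r.get("region") == "EU")
```

A vertical operation would instead split each record's sub-elements across fragments; a hybrid combines both.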
Here are the key advantages and disadvantages of service-oriented architecture (SOA):
Advantages:
- Loose coupling: Services can easily be modified or replaced without affecting other services. This improves flexibility and maintainability.
- Reusability: Services can be reused in different applications, reducing development time and costs.
- Scalability: It's easier to increase or decrease capacity by adding/removing services as needs change.
- Interoperability: Services use standards-based interfaces, making it easier for different systems to communicate.
Disadvantages:
- Complexity: SOA introduces additional layers of abstraction and complexity in architecture, development, and management.
- Performance overhead: Additional processing is required for service abstraction and message exchange, which can add latency compared to direct calls.
2008 Industry Standards for C2 CDM and Framework - Bob Marcus
The document discusses standards for data modeling, metadata tagging, XML processing and schemas, and transport that enable interoperability across networks with varying degrees of coupling between systems. It categorizes use cases as intranet-centric within a single organization, extranet-centric across organizations with common standards, and internet-centric for ad hoc interactions. The relationships between these categories and appropriate standards are illustrated. Key standards discussed include XML schemas, RDF, OWL, and data models for command and control like C2IEDM.
SALES BASED DATA EXTRACTION FOR BUSINESS INTELLIGENCE - cscpconf
Data warehouse use has increased significantly in recent years and now plays a fundamental role in many organizations’ decision-support processes. An effective business intelligence infrastructure that leverages the power of a data warehouse can deliver value by helping companies enhance their customer experience. The aim of this paper is to generate reports with various drill-downs and slicer conditions, with suitable parameters, that provide a complete business solution for monitoring a company's inflow and outflow. The goal of the work is to give potential users of the data warehouse a complete visual view of those reports in their decision-making process by creating chart and grid interfaces from the warehouse. The examples in this paper relate directly to the Adventure Works Data Warehouse project implementation, which helps determine internet sales amounts across different dates.
Vision Based Deep Web data Extraction on Nested Query Result Records - IJMER
This document summarizes a research paper on vision-based deep web data extraction from nested query result records. It proposes a technique to extract data from web pages using different font styles, sizes, and cascading style sheets. The extracted data is then aligned into a table using alignment algorithms, including pair-wise, holistic, and nested-structure alignment. The goal is to remove immaterial information from query result pages to facilitate analysis of the extracted data.
The document discusses newer data models including object-relational and XML models. The object-relational model combines relational and object-oriented models, providing greater flexibility and functionality over previous relational databases. XML has a simple data model that allows for more complex models to be built on top. While object-oriented databases are used for niche applications, object-relational databases remain the dominant model for business applications due to their conceptual simplicity, query languages, and ability to support both structured and unstructured data.
The document discusses XML schemas and how they are used in web services. Some key points:
1. XML schemas formally describe the structure and content of XML documents, defining elements, attributes, data types, and relationships between elements.
2. Schemas are used to validate XML documents and ensure they conform to the defined structure.
3. The document discusses how BT uses XML schemas to describe messages exchanged between web services and the importance of consistent implementation to allow for interoperability.
4. It also mentions best practices like using test cases and focusing on description to communicate how aspects of schemas can be relied upon.
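Point 2 above (validating documents against a defined structure) can be sketched minimally with Python's DOM module. The standard library has no XSD validator, so this required-children check is only a crude stand-in for real schema validation, and the message format is hypothetical:

```python
from xml.dom import minidom

def check_required_children(xml_text, root_name, required):
    """Crude structural validation: the root must carry the expected
    tag name and contain at least one of each required child element."""
    dom = minidom.parseString(xml_text)
    root = dom.documentElement
    if root.tagName != root_name:
        return False
    # getElementsByTagName returns an empty (falsy) NodeList when absent.
    return all(root.getElementsByTagName(tag) for tag in required)

# Hypothetical service message.
msg = "<invoice><id>42</id><amount>9.99</amount></invoice>"
ok = check_required_children(msg, "invoice", ["id", "amount"])
bad = check_required_children(msg, "invoice", ["id", "currency"])
```

A production service would instead validate against the actual XSD with a schema-aware parser.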
The document discusses physical database requirements and defines three stages of database design: conceptual, logical, and physical. It provides details on each stage, including that physical database design implements the logical data model in a DBMS and involves selecting file storage and ensuring efficient access. The document also covers database architectures, noting that a three-tier architecture separates the user applications from the physical database.
This document discusses challenges in developing master data models across multiple domains. Some key challenges include conflicting data structures and semantics between different models, the expectation that each real-world entity should have only one master record even when represented in different domains, and the need to create horizontal views across domains to provide full visibility of entity data. The document argues that a governed, model-driven approach is needed to reduce duplication and inconsistencies when integrating multiple legacy models into a unified master data environment.
BI Architecture in support of data quality - Tom Breur
Business intelligence (BI) projects that involve substantial data integration have often proven failure prone and difficult to plan. Data quality issues trigger rework, which makes it difficult to accurately schedule deliverables. Two things can bring improvement. Firstly, one should deliver information products in the smallest possible chunks, but without adding prohibitive overhead for breaking up the work in tiny increments. This will increase the frequency and improve timeliness of feedback on suitability of information products and hence make planning and progress more predictable. Secondly, BI teams need to provide better stewardship when they facilitate discussions between departments whose data cannot easily be integrated. Many so-called data quality errors do not stem from inaccurate source data, but rather from incorrect interpretation of data. This is mostly caused by different interpretation of essentially the same underlying source system facts across departments with misaligned performance objectives. Such problems require prudent stakeholder management and informed negotiations to resolve such differences. In this chapter I suggest an innovation to data warehouse architecture to help accomplish these objectives.
The document discusses using machine learning models for web content mining and news article classification. Specifically, it proposes using a support vector machine (SVM) model with features extracted from the document object model (DOM) tree to classify news articles into categories like title, date, body text, and noise. The SVM model is trained on a manually labeled dataset and can handle the nonlinear and complex patterns in the data better than rule-based models. The preprocessing step prunes noisy leaf nodes from the DOM tree before feature extraction and model training are performed to classify the remaining leaf nodes.
IRJET- SVM-based Web Content Mining with Leaf Classification Unit From DOM-Tree - IRJET Journal
This document discusses using machine learning models and DOM tree analysis to extract important content from news articles for the purpose of topic detection. Specifically, it proposes using a support vector machine (SVM) model with "leaf classification units" from the DOM tree to remove noise data like images, ads, and recommended articles. This approach is meant to generalize to different article structures compared to rule-based models. The document reviews related work using DOM trees and statistical data for web content extraction and visual wrappers. It also discusses using various kernel functions in SVMs for non-linearly separable data.
Discussion post· The proper implementation of a database is es.docx - madlynplamondon
Discussion post
· The proper implementation of a database is essential to the success of the data performance functions of an organization. Identify and evaluate at least three considerations that one must plan for when designing a database.
· Suggest at least two types of databases that would be useful for small businesses, two types for regional level organizations and two types for international companies. Include your rationale for each suggestion.
LP’s post states the following:
Question:
The proper implementation of a database is essential to the success of the data performance functions of an organization. Identify and evaluate at least three considerations that one must plan for when designing a database.
Answer:
Planning is the most significant aspect of database design, and here is where most projects for database design will fail because the database does not meet requirements, does not meet expectations, or are just unmanageable. Here you need to be forward-thinking by planning for the future. What information needs to be stored or what things or entities do we need to store information about (Knauff, 2004)? What questions will we need to ask of the database (Knauff, 2004)?
A well-designed database promotes consistent data entry and retrieval and reduces the existence of duplication among the database tables. Relational database tables work together to ensure that the correct data is available when you need it.
The first consideration should be what is the database’s intended purpose. Understanding the purpose will help define the need. Some examples might be “to keep a list of customers,” “to manage inventory,” or “to grade students (Filemaker Staff, n.d.).” All stakeholders need to be involved in this process.
Second is Data integrity. Is the data accurate, consistent, and complete? What kind of categories does the data align with? Identifying these categories is critical to designing an efficient database because different types and amounts of data in each category will be stored. Some example categories might be sales that track “customers,” “products,” and “invoices,” or grades that track “students,” “classes,” and “assignments (Filemaker Staff, n.d.).” Once the categories have been defined the relations can be determined. A good exercise to help with this is to write these out in simple sentences:
“customers order products” and “invoices record customers’ orders.”
Now the organization of the data can begin. The categories above can be used as tables so common data can be grouped.
The third is security. Is the database secure? Will the current policy and rules be sufficient going forward? Who should have access? Who should have access to which tables (Nield, 2016)? Read-only access? Write access? Is this database critical to business operations (Nield, 2016)? What are the D&R plans?
Excessive security creates excessive red tape and obstructs agility, but insufficient security will invite catastrophe (Nield, 2016 ...
Catalog-based Conversion from Relational Database into XML Schema (XSD) - CSCJournals
In the age of the information revolution, where information is exchanged and data transported among various sectors such as government, commerce, services and industry, the use of new database models to support this trend has become very important, because traditional database models cannot support it. eXtensible Markup Language (XML) is considered a new standard model for data interchange over the internet and mobile device networks; it has become a common language for exchanging and sharing the data of traditional models in easy and inexpensive ways. In this research, we propose a new technique to convert relational database contents and schema into an XML schema (XSD, XML Schema Definition). The main idea of the technique is to extract the relational database catalog using Structured Query Language (SQL). We follow three steps to complete the conversion process. First, we extract the relation instances (actual content) and the schema catalog using SQL queries, which constitute the information required to build the XML document and its schema. Second, we transform the actual content into an XML document tree; the idea of this step is to convert the table columns of the relations (tables) into the elements of the XML document. Third, we transform the schema catalog into an XML schema describing the structure of the XML document. To do so, we transform the data types of the elements and the various data constraints such as data length, not null, check and default; moreover, we define primary and foreign keys and the referential integrity between the tables. Overall results of the technique are very promising, and the technique is clear and does not require complex procedures that could adversely affect the accuracy of the results. We performed many experiments and report their elapsed CPU times.
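The catalog-to-XSD step can be sketched as follows. The column catalog and the SQL-to-XSD type mapping here are illustrative stand-ins for what the paper extracts from the real database catalog, not its actual procedure:

```python
# Hypothetical mapping from SQL column types to XSD built-in types.
SQL_TO_XSD = {"INT": "xs:integer", "VARCHAR": "xs:string", "DATE": "xs:date"}

def table_to_xsd(table, columns):
    """Emit a minimal XSD element for one table from its
    (name, sql_type, nullable) column catalog entries."""
    lines = ['<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">',
             f'  <xs:element name="{table}"><xs:complexType><xs:sequence>']
    for name, sql_type, nullable in columns:
        # A nullable column becomes an optional element.
        min_occurs = ' minOccurs="0"' if nullable else ""
        lines.append(f'    <xs:element name="{name}" '
                     f'type="{SQL_TO_XSD[sql_type]}"{min_occurs}/>')
    lines.append('  </xs:sequence></xs:complexType></xs:element></xs:schema>')
    return "\n".join(lines)

xsd = table_to_xsd("customer", [("id", "INT", False), ("name", "VARCHAR", True)])
```

A full converter would also emit key and keyref declarations for primary and foreign keys, as the paper describes.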
OUDG : Cross Model Datum Access with Semantic Preservation for Legacy Databases - csandit
Conventional databases are associated with a plurality of database models. Generally, database models are distinct and not interoperable, and data stored in a database under a particular database model can be termed "siloed data". Accordingly, a DBMS associated with one database silo is generally not interoperable with a database management system associated with another database silo. This can limit the exchange of information stored in a database when those wishing to access the information are not using a database management system associated with the database model related to that information. DBMSs for various data models have proliferated in many companies and become their legacy databases, and there is a need to access these legacy databases using ODBC, which lets users transform one legacy database into another. This paper offers an end user's tool, the Open Universal Database Gateway (OUDG), to supplement ODBC by transforming a source legacy database into Flattened XML documents, and further transforming the Flattened XML documents into a target legacy database. The Flattened XML document is a mixture of the relational and XML data models, which is user friendly and a data standard on the Internet. Reengineering legacy databases into each other through the OUDG is information lossless through the preservation of their data semantics in terms of data dependencies.
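A flattened-XML intermediate of the kind described might look like the following sketch. The table name, column names and one-element-per-column layout are assumptions for illustration, not the OUDG's actual format:

```python
import xml.etree.ElementTree as ET

def rows_to_flat_xml(table, columns, rows):
    """Serialize relational rows as flat XML: one <row> per tuple,
    one child element per column value (a toy flattening)."""
    root = ET.Element(table)
    for row in rows:
        row_el = ET.SubElement(root, "row")
        for col, value in zip(columns, row):
            ET.SubElement(row_el, col).text = str(value)
    return ET.tostring(root, encoding="unicode")

# Hypothetical source relation.
flat = rows_to_flat_xml("employee", ["id", "name"], [(1, "Ada"), (2, "Alan")])
```

The reverse direction (XML back to rows) is what makes the format usable as a cross-model interchange step.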
Data modeling is the process of creating a visual representation of data within an information system to illustrate the relationships between different data types and structures. The goal is to model data at conceptual, logical, and physical levels to support business needs and requirements. Conceptual models provide an overview of key entities and relationships, logical models add greater detail, and physical models specify how data will be stored in databases. Data modeling benefits include reduced errors, improved communication and performance, and easier management of data mapping.
A relational model of data for large shared data banks - Sammy Alvarez
This document introduces the relational model of data organization for large shared databases. It discusses inadequacies of existing tree-structured and network models, including ordering, indexing, and access path dependencies that impair data independence. The relational model represents data as mathematical n-ary relations and relationships between domains, providing independence from representation changes. It allows a clearer evaluation of existing systems and competing internal representations. The relational view forms a basis for treating issues like derivability, redundancy, and consistency in a sound way.
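The relational view summarized above can be illustrated with relations as sets of tuples and two basic operators, selection and projection. The toy supplies relation is of course hypothetical:

```python
def select(relation, predicate):
    """Restriction: keep the tuples (dicts) satisfying the predicate."""
    return [t for t in relation if predicate(t)]

def project(relation, attrs):
    """Projection: keep only the named attributes, removing duplicates."""
    seen, out = set(), []
    for t in relation:
        row = tuple(t[a] for a in attrs)
        if row not in seen:
            seen.add(row)
            out.append(dict(zip(attrs, row)))
    return out

# Hypothetical relation over domains (supplier, part).
supplies = [{"supplier": "S1", "part": "P1"},
            {"supplier": "S1", "part": "P2"},
            {"supplier": "S2", "part": "P1"}]
p1_suppliers = project(select(supplies, lambda t: t["part"] == "P1"), ["supplier"])
```

Queries composed this way depend only on attribute names, not on ordering, indexing, or access paths, which is the data-independence point of the paper.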
The document provides an overview of approaches for clustering XML data based on structure and content. It first outlines applications where XML clustering is useful, including XML query processing and data integration. It then presents a generic framework for XML clustering with three phases: data representation, similarity computation, and clustering/grouping. The document surveys current approaches and aims to classify them and identify common features. It also discusses challenges in XML clustering and future research directions.
PREFIX-BASED LABELING ANNOTATION FOR EFFECTIVE XML FRAGMENTATION - ijcsit
XML has gradually been employed as a standard of data exchange in the web environment since its inception in the 90s until the present. It serves as a data exchange format between systems and other applications. Meanwhile, the data volume on the web has grown substantially, and thus effective methods of storing and retrieving these data are essential. One recommended way is to physically or virtually fragment the large chunk of data and distribute the fragments onto different nodes. Fragmentation design of an XML document consists of two parts: the fragmentation operation and the fragmentation method. The three fragmentation operations are Horizontal, Vertical and Hybrid; the operation determines how the XML should be fragmented. This paper aims to give an overview of fragmentation design considerations and subsequently propose a fragmentation technique using number addressing.
The document discusses the Entity Framework, which helps bridge the gap between object-oriented development and databases known as an "impedance mismatch". It generates business objects and entities from database tables and allows CRUD operations and managing relationships. Benefits include writing data access logic in higher-level languages and representing conceptual models with entity relationships. The Entity Framework architecture includes an Entity Data Model layer that maps objects to the database using ADO.NET. The EDM defines conceptual, storage, and mapping layers to program against an object model instead of a relational data model. EDMs can be created from existing databases or by defining a model first.
Formal Models and Algorithms for XML Data Interoperability - Thomas Lee
In this paper, we study the data interoperability problem of web services in terms of XML schema compatibility. When Web Service A sends XML messages to Web Service B, A is interoperable with B if B can accept all messages from A. That is, the XML schema R for B to receive XML instances must be compatible with the XML schema S for A to send XML instances, i.e., A is a subschema of B. We propose a formal model called Schema Automaton (SA) to model W3C XML Schema (XSD) and develop several algorithms to perform different XML schema computations. The computations include schema minimization, schema equivalence testing, subschema testing, and subschema extraction. We have conducted experiments on an e-commerce standard XSD called xCBL to demonstrate the practicality of our algorithms. One experiment has refuted the claim that the xCBL 3.5 XSD is backward compatible with the xCBL 3.0 XSD. Another experiment has shown that the xCBL XSDs can be effectively trimmed into small subschemas for specific applications, which has significantly reduced the schema processing time.
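At its core, the subschema testing described above reduces to language inclusion between automata. The following is a minimal sketch over plain DFAs with total transition functions; the single-letter alphabet and toy transitions are illustrative only, not the paper's Schema Automata:

```python
from collections import deque

def dfa_subset(dfa_a, dfa_b, alphabet):
    """Check L(A) <= L(B) by BFS over the product automaton: a
    counterexample is a reachable pair accepting in A but not in B.
    Each DFA is (start_state, accepting_set, transition_dict)."""
    sa, fa, ta = dfa_a
    sb, fb, tb = dfa_b
    seen = {(sa, sb)}
    queue = deque([(sa, sb)])
    while queue:
        pa, pb = queue.popleft()
        if pa in fa and pb not in fb:
            return False
        for sym in alphabet:
            nxt = (ta[(pa, sym)], tb[(pb, sym)])
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return True

# Toy example: A accepts exactly "a"; B accepts one or more "a"s.
ta = {(0, "a"): 1, (1, "a"): 2, (2, "a"): 2}   # state 2 is a dead state
tb = {(0, "a"): 1, (1, "a"): 1}
a_subset_b = dfa_subset((0, {1}, ta), (0, {1}, tb), ["a"])  # {"a"} within {"a"+}
b_subset_a = dfa_subset((0, {1}, tb), (0, {1}, ta), ["a"])  # "aa" is a counterexample
```

The paper's algorithms operate on richer structures (types, content models), but the reachability idea is the same.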
A Study on Graph Storage Database of NOSQL - IJSCAI Journal
Big Data refers to huge volumes of both structured and unstructured data that are so large they are hard to process using current or traditional database tools and software technologies. The goal of Big Data storage management is to ensure a high level of data quality and availability for business intelligence and big data analytics applications. The graph database is not yet the most popular NoSQL database compared to the relational database, but it is a powerful NoSQL database that can handle large volumes of data very efficiently. It is very difficult to manage large volumes of data using traditional technology, and data retrieval time may grow as database size increases; NoSQL databases are available as a solution to this. This paper describes what big data storage management is, the dimensions of big data, types of data, structured and unstructured data, NoSQL databases and their types, the basic structure of the graph database, its advantages, disadvantages and application areas, and a comparison of various graph databases.
This document summarizes a research paper on graph storage databases in NoSQL. It discusses big data and the need for alternative databases to handle large, diverse datasets. It defines the key aspects of big data including volume, velocity, variety and complexity. It also describes different types of NoSQL databases, focusing on the basic structure of graph databases. Graph databases use nodes and relationships to model connected data. The document compares several graph database systems and discusses advantages like performance and flexibility as well as disadvantages like complexity. It outlines several applications of graph databases in areas like social networks and logistics.
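The node-and-relationship structure the summary describes can be sketched in a few lines. This is a minimal in-memory illustration of the property-graph idea, not any particular product's API; all names here are made up for the example.

```python
class GraphDB:
    """Minimal in-memory property graph: nodes carry properties,
    relationships are directed edges with a type label."""
    def __init__(self):
        self.nodes = {}   # node_id -> properties dict
        self.edges = []   # (source_id, rel_type, target_id)

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def relate(self, src, rel_type, dst):
        self.edges.append((src, rel_type, dst))

    def neighbors(self, node_id, rel_type=None):
        # Follow outgoing relationships, optionally filtered by type.
        return [dst for (src, r, dst) in self.edges
                if src == node_id and (rel_type is None or r == rel_type)]

g = GraphDB()
g.add_node("alice", kind="person")
g.add_node("acme", kind="company")
g.relate("alice", "WORKS_AT", "acme")
```

Traversal then amounts to following edges from a node (`g.neighbors("alice")`), which is why connected queries avoid the join cost a relational database would pay.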
Database reports provide us with the ability to further analyze ou.docx - whittemorelucilla
Database reports provide us with the ability to further analyze our data, and provide it in a format that can be used to make business decisions. Discuss the steps that you would take to ensure that we create an effective report. What questions would you ask of the users?
Data presentation should be designed to display correct conclusions. What issues should we think about as we prepare data for presentation? Discuss the different methods that we can use to present data in a report. What role does the audience play in selecting how we present the data?
1 PAGE AND A HALF
.
DataInformationKnowledge1. Discuss the relationship between.docx - whittemorelucilla
Data/Information/Knowledge
1. Discuss the relationship between data, information, and knowledge. Support your discussion with at least 3 academically reviewed articles.
2. Why do organization have information deficiency problem? Suggest ways on how to overcome information deficiency problem.
.
Similar to Data warehouse design from XML sources - Matte0 Golfarelli Stef.docx
The document discusses XML schemas and how they are used in web services. Some key points:
1. XML schemas formally describe the structure and content of XML documents, defining elements, attributes, data types, and relationships between elements.
2. Schemas are used to validate XML documents and ensure they conform to the defined structure.
3. The document discusses how BT uses XML schemas to describe messages exchanged between web services and the importance of consistent implementation to allow for interoperability.
4. It also mentions best practices like using test cases and focusing on description to communicate how aspects of schemas can be relied upon.
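What schema validation enforces can be illustrated with a hand-rolled structural check. Note the assumption: Python's standard library parses XML but does not validate against an XML Schema (lxml or a similar library would be needed for real XSD validation), so this sketch only mimics what an `xs:sequence` constraint checks.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model: a <book> must contain exactly these
# children, in this order - the kind of rule an XSD would declare.
REQUIRED_CHILDREN = {"book": ["title", "author", "price"]}

def check_structure(xml_text, root_tag):
    """Return True if the document's root and ordered child elements
    match the declared content model (a stand-in for XSD validation)."""
    root = ET.fromstring(xml_text)
    if root.tag != root_tag:
        return False
    expected = REQUIRED_CHILDREN.get(root_tag, [])
    present = [child.tag for child in root]
    return present == expected  # order and completeness, like xs:sequence

doc = "<book><title>XML Basics</title><author>Lee</author><price>10</price></book>"
```

A document missing `author` or `price`, or with the children reordered, fails the check, which is exactly the interoperability guarantee the schemas provide between web services.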
The document discusses physical database requirements and defines three stages of database design: conceptual, logical, and physical. It provides details on each stage, including that physical database design implements the logical data model in a DBMS and involves selecting file storage and ensuring efficient access. The document also covers database architectures, noting that a three-tier architecture separates the user applications from the physical database.
This document discusses challenges in developing master data models across multiple domains. Some key challenges include conflicting data structures and semantics between different models, the expectation that each real-world entity should have only one master record even when represented in different domains, and the need to create horizontal views across domains to provide full visibility of entity data. The document argues that a governed, model-driven approach is needed to reduce duplication and inconsistencies when integrating multiple legacy models into a unified master data environment.
BI Architecture in support of data quality - Tom Breur
Business intelligence (BI) projects that involve substantial data integration have often proven failure prone and difficult to plan. Data quality issues trigger rework, which makes it difficult to accurately schedule deliverables. Two things can bring improvement. Firstly, one should deliver information products in the smallest possible chunks, but without adding prohibitive overhead for breaking up the work in tiny increments. This will increase the frequency and improve timeliness of feedback on suitability of information products and hence make planning and progress more predictable. Secondly, BI teams need to provide better stewardship when they facilitate discussions between departments whose data cannot easily be integrated. Many so-called data quality errors do not stem from inaccurate source data, but rather from incorrect interpretation of data. This is mostly caused by different interpretation of essentially the same underlying source system facts across departments with misaligned performance objectives. Such problems require prudent stakeholder management and informed negotiations to resolve such differences. In this chapter I suggest an innovation to data warehouse architecture to help accomplish these objectives.
The document discusses using machine learning models for web content mining and news article classification. Specifically, it proposes using a support vector machine (SVM) model with features extracted from the document object model (DOM) tree to classify news articles into categories like title, date, body text, and noise. The SVM model is trained on a manually labeled dataset and can handle the nonlinear and complex patterns in the data better than rule-based models. The preprocessing step prunes noisy leaf nodes from the DOM tree before feature extraction and model training are performed to classify the remaining leaf nodes.
IRJET - SVM-based Web Content Mining with Leaf Classification Unit From DOM-Tree - IRJET Journal
This document discusses using machine learning models and DOM tree analysis to extract important content from news articles for the purpose of topic detection. Specifically, it proposes using a support vector machine (SVM) model with "leaf classification units" from the DOM tree to remove noise data like images, ads, and recommended articles. This approach is meant to generalize to different article structures compared to rule-based models. The document reviews related work using DOM trees and statistical data for web content extraction and visual wrappers. It also discusses using various kernel functions in SVMs for non-linearly separable data.
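The feature-extraction step of such a pipeline can be sketched with the standard library: walk the DOM tree and compute simple per-leaf features (depth, text length, link status) of the kind an SVM could then be trained on. The feature set here is an assumption for illustration; the paper's actual features and the SVM training step are not reproduced.

```python
import xml.etree.ElementTree as ET

def leaf_features(html):
    """Collect simple features for each leaf node of a well-formed
    DOM tree: tag, depth, text length, and whether it sits in a link.
    Long, deep, non-link leaves tend to be body text; short link
    leaves tend to be navigation or ad noise."""
    root = ET.fromstring(html)
    feats = []

    def walk(node, depth, in_link):
        children = list(node)
        if not children:  # leaf node
            text = (node.text or "").strip()
            feats.append({"tag": node.tag, "depth": depth,
                          "text_len": len(text),
                          "is_link": in_link or node.tag == "a"})
        for c in children:
            walk(c, depth + 1, in_link or node.tag == "a")

    walk(root, 0, False)
    return feats

page = "<div><p>Long article paragraph here.</p><a href='#'>ad</a></div>"
```

On this toy page the `<p>` leaf scores a much larger text length than the `<a>` leaf, which is the separation a trained classifier exploits.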
Discussion post - The proper implementation of a database is es.docx - madlynplamondon
Discussion post
· The proper implementation of a database is essential to the success of the data performance functions of an organization. Identify and evaluate at least three considerations that one must plan for when designing a database.
· Suggest at least two types of databases that would be useful for small businesses, two types for regional level organizations and two types for international companies. Include your rationale for each suggestion.
LP’s post states the following:
Question:
The proper implementation of a database is essential to the success of the data performance functions of an organization. Identify and evaluate at least three considerations that one must plan for when designing a database.
Answer:
Planning is the most significant aspect of database design, and this is where most database design projects fail: the database does not meet requirements, does not meet expectations, or is simply unmanageable. Here you need to be forward-thinking by planning for the future. What information needs to be stored, or what things or entities do we need to store information about (Knauff, 2004)? What questions will we need to ask of the database (Knauff, 2004)?
A well-designed database promotes consistent data entry and retrieval and reduces the existence of duplication among the database tables. Relational database tables work together to ensure that the correct data is available when you need it.
The first consideration should be what is the database’s intended purpose. Understanding the purpose will help define the need. Some examples might be “to keep a list of customers,” “to manage inventory,” or “to grade students (Filemaker Staff, n.d.).” All stakeholders need to be involved in this process.
Second is Data integrity. Is the data accurate, consistent, and complete? What kind of categories does the data align with? Identifying these categories is critical to designing an efficient database because different types and amounts of data in each category will be stored. Some example categories might be sales that track “customers,” “products,” and “invoices,” or grades that track “students,” “classes,” and “assignments (Filemaker Staff, n.d.).” Once the categories have been defined the relations can be determined. A good exercise to help with this is to write these out in simple sentences:
“customers order products” and “invoices record customers’ orders.”
Now the organization of the data can begin. The categories above can be used as tables so common data can be grouped.
The third is security. Is the database secure? Will the current policy and rules be sufficient going forward? Who should have access? Who should have access to which tables (Nield, 2016)? Read-only access? Write access? Is this database critical to business operations (Nield, 2016)? What are the D&R plans?
Excessive security creates excessive red tape and obstructs agility, but insufficient security will invite catastrophe (Nield, 2016 ...
Catalog-based Conversion from Relational Database into XML Schema (XSD) - CSCJournals
In the age of the information revolution, exchanging information and transporting data effectively among the various government, commercial, service and industrial sectors has become very important, and traditional database models cannot support this trend. The eXtensible Markup Language (XML) is considered a new standard model for data interchange over the Internet and mobile device networks; it has become a common language for exchanging and sharing the data of traditional models in easy and inexpensive ways. In this research, we propose a new technique to convert relational database contents and schema into XML Schema (XSD - XML Schema Definition). The main idea of the technique is extracting the relational database catalog using the Structured Query Language (SQL). We follow three steps to complete the conversion. First, extract the relation instances (actual content) and the schema catalog using SQL queries; these provide the information required to build the XML document and its schema. Second, transform the actual content into an XML document tree, converting the table columns of the relations (tables) into the elements of the XML document. Third, transform the schema catalog into an XML schema describing the structure of the XML document. To do so, we transform the data types of the elements and the various data constraints such as data length, not null, check and default, and we define primary and foreign keys and the referential integrity between tables. The overall results of the technique are very promising; the technique is clear and does not require complex procedures that could adversely affect the accuracy of the results. We performed many experiments and report their elapsed CPU times.
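The schema-catalog step of such a conversion can be sketched as follows. The catalog rows, table name and SQL-to-XSD type mapping below are hypothetical stand-ins for what queries against a real database catalog would return; the sketch only shows the general shape of mapping column descriptions onto XSD element declarations.

```python
# Assumed mapping from SQL column types to XSD built-in types.
SQL_TO_XSD = {"INT": "xs:integer", "VARCHAR": "xs:string",
              "DATE": "xs:date", "DECIMAL": "xs:decimal"}

def columns_to_xsd(table, columns):
    """Emit an XSD element declaration for a table.
    columns: list of (name, sql_type, nullable) tuples, as a catalog
    query might return them. Nullable columns become optional elements."""
    lines = [f'<xs:element name="{table}">',
             "  <xs:complexType><xs:sequence>"]
    for name, sql_type, nullable in columns:
        min_occurs = ' minOccurs="0"' if nullable else ""
        lines.append(f'    <xs:element name="{name}" '
                     f'type="{SQL_TO_XSD[sql_type]}"{min_occurs}/>')
    lines += ["  </xs:sequence></xs:complexType>", "</xs:element>"]
    return "\n".join(lines)

catalog = [("emp_id", "INT", False), ("emp_name", "VARCHAR", False),
           ("hired", "DATE", True)]
xsd = columns_to_xsd("employee", catalog)
```

A fuller implementation would also carry over length, check and default constraints and key definitions, as the abstract describes.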
OUDG: Cross Model Datum Access with Semantic Preservation for Legacy Databases - csandit
Conventional databases are associated with a plurality of database models. Generally, database models are distinct and not interoperable; data stored under a particular database model can be termed "siloed data". Accordingly, a DBMS associated with one database silo is generally not interoperable with a database management system associated with another database silo. This can limit the exchange of information stored in a database when those desiring to access the information are not employing a database management system associated with the database model related to the information. DBMSs of various data models have proliferated into many companies and become their legacy databases, and there is a need to access these legacy databases using ODBC, which lets users transform one legacy database into another. This paper offers an end user's tool, the Open Universal Database Gateway (OUDG), to supplement ODBC by transforming a source legacy database into Flattened XML documents and further transforming the Flattened XML documents into a target legacy database. The Flattened XML document is a mixture of the relational and XML data models, which is user friendly and a data standard on the Internet. Reengineering legacy databases into each other through the OUDG is information-lossless because their data semantics, in terms of data dependencies, are preserved.
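The "flattened" carrier idea (relational rows rendered as flat XML) can be illustrated with a short sketch. The element names `row` and the per-column children here are assumptions for the example; the actual OUDG document layout is not specified in the abstract.

```python
import xml.etree.ElementTree as ET

def rows_to_flat_xml(table, columns, rows):
    """Render relational rows as a flat XML document: one <row>
    element per tuple, one child element per column. This is the
    general shape of a relational-to-XML carrier document that a
    target DBMS loader could consume."""
    root = ET.Element(table)
    for row in rows:
        r = ET.SubElement(root, "row")
        for col, val in zip(columns, row):
            ET.SubElement(r, col).text = str(val)
    return ET.tostring(root, encoding="unicode")

xml_doc = rows_to_flat_xml("employee", ["id", "name"], [(1, "Ann"), (2, "Bo")])
```

Because each tuple keeps its column structure, the transformation is reversible, which is the precondition for the lossless round trip the paper claims (the paper additionally preserves data dependencies, which this sketch does not model).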
Data modeling is the process of creating a visual representation of data within an information system to illustrate the relationships between different data types and structures. The goal is to model data at conceptual, logical, and physical levels to support business needs and requirements. Conceptual models provide an overview of key entities and relationships, logical models add greater detail, and physical models specify how data will be stored in databases. Data modeling benefits include reduced errors, improved communication and performance, and easier management of data mapping.
A relational model of data for large shared data banks - Sammy Alvarez
This document introduces the relational model of data organization for large shared databases. It discusses inadequacies of existing tree-structured and network models, including ordering, indexing, and access path dependencies that impair data independence. The relational model represents data as mathematical n-ary relations and relationships between domains, providing independence from representation changes. It allows a clearer evaluation of existing systems and competing internal representations. The relational view forms a basis for treating issues like derivability, redundancy, and consistency in a sound way.
The document provides an overview of approaches for clustering XML data based on structure and content. It first outlines applications where XML clustering is useful, including XML query processing and data integration. It then presents a generic framework for XML clustering with three phases: data representation, similarity computation, and clustering/grouping. The document surveys current approaches and aims to classify them and identify common features. It also discusses challenges in XML clustering and future research directions.
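The three-phase framework the survey describes (representation, similarity computation, grouping) can be sketched concretely. This is one simple instantiation under stated assumptions: documents are represented by their sets of root-to-node tag paths, similarity is Jaccard overlap, and grouping is a greedy threshold pass; real systems use richer representations and clustering algorithms.

```python
import xml.etree.ElementTree as ET

def tag_paths(xml_text):
    """Phase 1 (representation): reduce a document to its set of
    root-to-node tag paths, ignoring textual content."""
    root = ET.fromstring(xml_text)
    paths = set()

    def walk(node, prefix):
        path = prefix + "/" + node.tag
        paths.add(path)
        for child in node:
            walk(child, path)

    walk(root, "")
    return paths

def similarity(a, b):
    """Phase 2: Jaccard similarity between two path sets."""
    return len(a & b) / len(a | b)

def cluster(docs, threshold=0.5):
    """Phase 3: greedy grouping - join the first cluster whose
    representative is similar enough, else start a new cluster."""
    reps, clusters = [], []
    for d in docs:
        p = tag_paths(d)
        for i, rep in enumerate(reps):
            if similarity(p, rep) >= threshold:
                clusters[i].append(d)
                break
        else:
            reps.append(p)
            clusters.append([d])
    return clusters
```

Two documents sharing most of their structure (e.g. `<a><b/><c/></a>` and `<a><b/></a>`) land in one cluster, while a structurally different document starts its own.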
Prefix-based Labeling Annotation for Effective XML Fragmentation - ijcsit
XML has gradually been employed as a standard of data exchange in the web environment since its inception in the 90s, serving as a data exchange format between systems and other applications. Meanwhile, the data volume on the web has grown substantially, so effective methods of storing and retrieving these data are essential. One recommended way is to physically or virtually fragment the large chunk of data and distribute the fragments to different nodes. Fragmentation design of an XML document consists of two parts: the fragmentation operation and the fragmentation method. The three fragmentation operations are horizontal, vertical and hybrid; the operation determines how the XML should be fragmented. This paper aims to give an overview of the fragmentation design considerations and, subsequently, proposes a fragmentation technique using number addressing.
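The "number addressing" mentioned in the abstract resembles Dewey-style prefix labels, which can be sketched briefly. The labeling scheme and the round-robin distribution below are illustrative assumptions, not the paper's actual technique.

```python
import xml.etree.ElementTree as ET

def dewey_labels(xml_text):
    """Assign Dewey-style prefix labels: the root is '1' and the
    k-th child of a node appends '.k'. Ancestry is then recoverable
    from the label alone - the property prefix-based annotation
    relies on once fragments are spread across nodes."""
    root = ET.fromstring(xml_text)
    labels = {}

    def walk(node, label):
        labels[label] = node.tag
        for i, child in enumerate(node, start=1):
            walk(child, f"{label}.{i}")

    walk(root, "1")
    return labels

def fragment(labels, num_sites):
    """Toy horizontal fragmentation: distribute top-level subtrees
    (everything under label '1.k') across sites round-robin."""
    sites = [dict() for _ in range(num_sites)]
    for label, tag in labels.items():
        if label == "1":
            continue  # the root is replicated conceptually, not shipped
        top = int(label.split(".")[1])
        sites[(top - 1) % num_sites][label] = tag
    return sites
```

Because label "1.1.1" is textually prefixed by "1.1", a query processor can reconstruct parent-child relationships across fragments without consulting the original document.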
The document discusses the Entity Framework, which helps bridge the gap between object-oriented development and databases, a gap known as the "impedance mismatch". It generates business objects and entities from database tables and supports CRUD operations and relationship management. Benefits include writing data access logic in higher-level languages and representing conceptual models with entity relationships. The Entity Framework architecture includes an Entity Data Model layer that maps objects to the database using ADO.NET. The EDM defines conceptual, storage, and mapping layers so developers program against an object model instead of a relational data model. EDMs can be created from existing databases or by defining a model first.
Formal Models and Algorithms for XML Data Interoperability - Thomas Lee
In this paper, we study the data interoperability problem of web services in terms of XML schema compatibility. When Web Service A sends XML messages to Web Service B, A is interoperable with B if B can accept all messages from A. That is, the XML schema R for B to receive XML instances must be compatible with the XML schema S for A to send XML instances, i.e., A is a subschema of B. We propose a formal model called Schema Automaton (SA) to model W3C XML Schema (XSD) and develop several algorithms to perform different XML schema computations. The computations include schema minimization, schema equivalence testing, subschema testing, and subschema extraction. We have conducted experiments on an e-commerce standard XSD called xCBL to demonstrate the practicality of our algorithms. One experiment has refuted the claim that the xCBL 3.5 XSD is backward compatible with the xCBL 3.0 XSD. Another experiment has shown that the xCBL XSDs can be effectively trimmed into small subschemas for specific applications, which has significantly reduced the schema processing time.
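The subschema test at the heart of this setup is a language-inclusion check, which can be sketched on plain DFAs. Note the assumption: the paper's Schema Automata are richer than plain DFAs; this sketch shows only the core idea that L(A) is a subset of L(B) iff no reachable state pair in the product automaton is accepting in A but rejecting in B.

```python
def is_subschema(dfa_a, dfa_b):
    """Check L(A) subset-of L(B) for complete DFAs over a shared
    alphabet by exploring the product automaton: a counterexample
    exists iff some reachable pair is accepting in A but not in B."""
    start = (dfa_a["start"], dfa_b["start"])
    seen, stack = {start}, [start]
    while stack:
        qa, qb = stack.pop()
        if qa in dfa_a["accept"] and qb not in dfa_b["accept"]:
            return False  # a word A accepts but B rejects
        for sym in dfa_a["alphabet"]:
            nxt = (dfa_a["delta"][(qa, sym)], dfa_b["delta"][(qb, sym)])
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return True

sigma = ("a", "b")
# A accepts a* (state 1 is a dead state); B accepts any string.
A = {"start": 0, "accept": {0}, "alphabet": sigma,
     "delta": {(0, "a"): 0, (0, "b"): 1,
               (1, "a"): 1, (1, "b"): 1}}
B = {"start": 0, "accept": {0}, "alphabet": sigma,
     "delta": {(0, "a"): 0, (0, "b"): 0}}
```

Here `is_subschema(A, B)` holds but `is_subschema(B, A)` does not, mirroring the paper's finding that one xCBL version was not backward compatible with another: compatibility is directional.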
State Legislatures
(Part I)
POLS 2212
Legislatures, Policy-Making, and Political Science
• Legislative process is only one part of policy-making
• States are a better venue for understanding the policy-making process overall
• Interactions between components are more transparent
• Less ‘political theater’ than at the national level
• More cases, more variation, more data
• What role do legislatures play in the overall policy-making process?
• How do legislative-executive relations affect policy outcomes?
Policy cycle: Agenda Setting → Formulation / Negotiation → Adoption / Enactment → Implementation → Evaluation → Revision / Termination
Agenda Setting
• Public attention is focused on an issue
• Collective recognition of problem
Formulation / Negotiation
• Potential solutions are offered
• Some public discourse over options
Adoption / Enactment
• Solution is agreed upon and made into official policy / law
Implementation
• Policy is converted into actionable rules
Evaluation
• Fairness, effectiveness, efficiency of policy and rules are evaluated
Revision / Termination
• Improvements or changes to policy are made
Key actors at each stage:
Agenda Setting
• Parties
• Public opinion
• Advocacy groups / entrepreneurs
Formulation / Negotiation
• Party leadership
• Interest groups
• Legislature type
• Legislative-executive relations
Adoption / Enactment
• Legislative-executive relations
Implementation
• Type of executive
• Bureaucracy
Evaluation
• Social scientists
• Advocacy groups
• Legislative committees
• State courts
Revision / Termination
• State courts
• Federal courts
‘Professional’ Model vs. ‘Citizen-Legislator’ Model:
• Work Load: nearly full-time vs. part-time
• Session: year-round, annual vs. short-term, possibly biannual
• Compensation: medium-high (over median for state employees) vs. fairly low
• Staff: large, semi-permanent vs. small, likely shared
Conceptualizing State Legislatures: a spectrum from Professional through Hybrid / Mixture to Citizen
State Legislatures
• GA Legislature
• $17k base + per diem
• $22k – $24k total
Discussion Question
• What are some of the potential benefits / drawbacks of each of these two models?
State Legislatures and Political Careers (Peverill Squire)
• ‘Career’ Legislatures (Congress)
• Sufficiently high pay
• Minimal incentive to ‘move up’
• Expectation of long tenure
• Heavy time commitment
• ‘Springboard’ Legislatures
• Other positions have higher pay, more prestige
• Expectation of limited tenure
• May be term lim.
[Flattened Excel data set omitted. Columns: ID, Salary, Compa-ratio, Midpoint, Age, Performance Rating, Service, Gender, Raise, Degree, Gender1, Grade. Legend: ID - employee sample number; Salary - salary in thousands; Age - age in years; Performance Rating - appraisal rating (employee evaluation score); Service - years of service (rounded); Gender - 0 = male, 1 = female; Midpoint - salary grade midpoint; Raise - percent of last raise; Grade - job/pay grade; Degree - 0 = BS/BA, 1 = MS; Gender1 - Male or Female; Compa-ratio - salary divided by midpoint. Instruction: do not manipulate the data set on this page; copy it to another page to make changes. The ongoing question the weekly assignments focus on: are males and females paid the same for equal work (under the Equal Pay Act)? To simplify the analysis, jobs within each grade are assumed to comprise equal work.]
Week 1: Descriptive Statistics, including Probability
While the lectures will examine our equal pay question from the compa-ratio viewpoint, our weekly assignments will focus on examining the issue using the salary measure. The purpose of this assignment is threefold:
1. Demonstrate mastery with Excel tools.
2. Develop descriptive statistics to help examine the question.
3. Interpret descriptive outcomes.
The first issue in examining salary data to determine whether we, as a company, are paying males and females equally for doing equal work is to develop some descriptive statistics to give us something on which to make a preliminary decision about whether we have an issue or not.
Descriptive Statistics: develop basic descriptive statistics for Salary. The first step in analyzing data sets is to find some summary descriptive statistics for key variables. Suggestion: copy the gender1 and salary columns from the Data tab t.
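The descriptive-statistics step the assignment asks for can be sketched outside Excel with Python's standard library. The salary samples below are made-up stand-ins for the assignment's Salary / Gender1 columns, just to show the mechanics.

```python
import statistics

def describe(values):
    """Basic descriptive statistics of the kind the assignment asks
    for: count, mean, median and sample standard deviation."""
    return {"n": len(values),
            "mean": statistics.mean(values),
            "median": statistics.median(values),
            "stdev": statistics.stdev(values)}

# Hypothetical salary samples (in thousands), split by gender.
male_salaries = [54.5, 60.9, 74.1, 48.5, 63.7]
female_salaries = [34.1, 41.4, 22.8, 23.3, 57.6]
by_gender = {"M": describe(male_salaries), "F": describe(female_salaries)}
```

Comparing the two summaries (means, medians, spreads) is the preliminary check on whether a pay gap warrants deeper analysis; it does not by itself establish one, since grade and experience are not yet controlled for.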
DataCity1997 Median Price1997 Change1998 Forecast1993-98 Annualize.docx - whittemorelucilla
This document provides a course syllabus for History 2030: Tennessee History at an unnamed university. The syllabus outlines key details about the course including the instructor's contact information, course description and purpose, learning outcomes, instructional methodology, evaluation procedures, course schedule, attendance policy, and accommodations for students with disabilities. The course surveys the geographical background, peoples, political life, economic and social development of Tennessee from its earliest beginnings to the present. Students will be evaluated based on exams, research assignments, and presentations to demonstrate their mastery of Tennessee history and ability to think critically about historical interpretations.
The document summarizes research on the harms of corporal punishment of children and argues that legal reform prohibiting it can be an effective strategy for changing social norms and practices. It describes experiences in Sweden and New Zealand, where legal bans on corporal punishment were accompanied by significant declines in support for the practice and reports of it occurring. While public opinion often lags legal changes initially, studies found dramatic shifts in attitudes and self-reported experiences of corporal punishment over time in both countries following prohibition.
Database Project Charter/Business Case
Khalia Hart
University of Maryland Global Campus
February 21, 2020
Introduction
A database is an electronic collection of data built so that users can access and update particular information coherently and rapidly. Today, firms employ integrated technology to increase their capacity to serve more clients, keep information effectively, organize activities according to urgency or priority, and maintain accounting records (Tüttelmann F, 2015). Most integrated technology depends on multiple databases that supply information relevant to decision-making. Since businesses started using databases, their performance has increased because the decisions they make are sound and practical.
Business Problem
Supply chain management is one of the most complicated processes in a business, and due to the level of detail required, it is often hard for the supply chain manager to keep an effective record of the work covered, to have enough data to make decisions, and to have enough data to monitor the chain of operation (William, 2019). The supply chain is crucial for the business because it determines the company's performance in the industry through the quality of the product produced, the cost of production, the timeliness and effectiveness of the distribution network, and the overall production operation of the organization.
Operation management has been named as the leading cause of business failure caused by a lack of a system, which the manager or the supervisor can use to monitor the whole system. This is the problem to solve using the database (William, 2019). Using a database, the manager can observe or watch the entire chain from their office, make better decisions by fore- planning approach of the database also make changes within the system when there is the need to cut costs or making the process effective.
Project Scope
Most business organizations are spread in operation, and this is the challenge that makes the supply chain management complex (Tüttelmann F, 2015). This is because the chain is in different localities, and therefore, coordination of operation among the user or the workers becomes a challenge. Through the database system, the business will enjoy proper coordination using the wide Area Network (LAN). Through the LAN network, the company can link computers and cost-effectively share data and communication. Through this system, the company will have a connection and coordination of the processes within the organization. The number of connected devices will range from 10 to 1000, depending on the type of tools and system that is set to facilitate this connection.
Goals and objectives of the system
The purpose of the system that I want to install in the supply chain management is to;
· Monitoring of the supply chain- the system will enable the manager to monitor the system and every process in the order (Gattor.
Databases selected Multiple databases...Full Text (1223 .docxwhittemorelucilla
Kraft reformed Oreo cookies to make them more successful in China. They made the cookies less sweet to suit Chinese tastes, sold them in smaller, cheaper packages, and marketed them with a "dunking" theme. This involved training student brand ambassadors to educate consumers about dipping cookies in milk. Kraft also introduced a Chinese-style Oreo wafer stick that surpassed regular Oreos in sales. These reforms helped Oreo become the best-selling biscuit in China.
DATABASE SYSTEMS DEVELOPMENT & IMPLEMENTATION PLAN1DATABASE SYS.docxwhittemorelucilla
DATABASE SYSTEMS DEVELOPMENT & IMPLEMENTATION PLAN 1
DATABASE SYSTEMS DEVELOPMENT & IMPLEMENTATION PLAN 19
Table of Contents
1. Database System Overview 3
1.1 Business Environment 3
1.2 Database system goals and objective 4
2. Entity Relationship Model 7
2.1 Proposed entities 7
2.2 Business rules 8
2.3 Entity–Relationship Model 9
2.3.1 Relationship Types 9
2.3.2 Normalization form 12
2.3.3 Benefit of using database design 14
3. Structured Query Language (SQL) Scripts 15
3.1 Data definition language (DDL) 15
3.2 Data manipulation language (DML) 16
3.3 SQL report 17
3.4 Benefit of using database queries 19
4. Database Administration Plan 20
5. Future Database System Implementation Plan 21
6. References 22
1.
Database System Overview
1.1 Business Environment
Office Depot, Inc is an American retail store company founded in 1986 and headquartered in Florida, United States. The company provides office and school supplies with 1400 retail stores and e-commerce sites. The supply includes everything to their customer like latest technology, core school and office supplies, printing and documenting service, furniture and other services like cell phone repair, tech and marketing service etc.
Recently there were too many complaints from existing and new customer that the online site is super glitch and lagging. Another customer posted that the delivery did not come on the scheduled day. And they cannot track down the order because the website does not have tracking information. Also when the website is down, customer service cannot help to see the order details either and therefore, they feel it’s frustrating to order online and therefore want to cancel the order. One other customer posted in the website grievance section that the “label maker” showed available in the stock even though it was out of stock when verified with the customer service representative. With every product not in stock, we lose opportunity of sale which costs the store. This not only affect customer but also affect company. We are so dependent on the data, most of the time staff has to correct accounting report, sales estimates and invoice customer manually which is very time-consuming in an excel sheet.
In order to solve above issues and avoid sales loss, Office Depot must have a database to store and maintain correct count of the products. This database will help inventory management i.e. tracking products, update inventory, find popular or less popular item, loss prevention, track inventory status and perform data mining. The staff can access this database via a computerized database. (Gerald H., Importance of inventory database retail)1.2 Database system goals and objective
The mission of the company is to become number one retail company by creating inclusive environment and great shopping experience where both customer and employees are respected and valued. To achieve the retail store mission, we are committed to provide secure and robust data base system for ou.
Database Security Assessment Transcript You are a contracting office.docxwhittemorelucilla
Database Security Assessment Transcript You are a contracting officer's technical representative, a Security System Engineer, at a military hospital. Your department's leaders are adopting a new medical health care database management system. And they've tasked you to create a request for proposal for which different vendors will compete to build and provide to the hospital. A Request For Proposal, or RFP, is when an organization sends out a request for estimates on performing a function, delivering a technology, or providing a service or augmenting staff. RFPs are tailored to each endeavor but have common components and are important in the world of IT contracting and for procurement and acquisitions. To complete the RFP, you must determine the technical and security specifications for the system. You'll write the requirements for the overall system and also provide evaluation standards that will be used in rating the vendor's performance. Your learning will help you determine your system's requirements. As you discover methods of attack, you'll write prevention and remediation requirements for the vendor to perform. You must identify the different vulnerabilities the database should be hardened against.
Modern healthcare systems incorporate databases for effective and efficient management of patient healthcare. Databases are vulnerable to cyberattacks and must be designed and built with security controls from the beginning of the life cycle. Although hardening the database early in the life cycle is better, security is often incorporated after deployment, forcing hospital and healthcare IT professionals to play catch-up. Database security requirements should be defined at the requirements stage of acquisition and procurement.
System security engineers and other acquisition personnel can effectively assist vendors in building better healthcare database systems by specifying security requirements up front within the request for proposal (RFP). In this project, you will be developing an RFP for a new medical healthcare database management system.
Parts of your deliverables will be developed through your learning lab. You will submit the following deliverables for this project:
Deliverables
• An RFP, about 10 to 12 pages, in the form of a double-spaced Word document with citations in APA format. The page count does not include figures, diagrams, tables, or citations. There is no penalty for using additional pages. Include a minimum of six references. Include a reference list with the report.
• An MS-Excel spreadsheet with lab results.
There are 11 steps in this project. You will begin with the workplace scenario and continue with Step 1: "Provide an Overview for Vendors."
Step 1: Provide an Overview for Vendors
As the contracting officer's technical representative (COTR), you are the liaison between your hospital and potential vendors. It is your duty to provide vendors with an overview of your organization. To do so, identify infor.
Database Design Mid Term ExamSpring 2020Name ________________.docxwhittemorelucilla
Database Design Mid Term Exam
Spring 2020
Name: ____________________________
1. What is a data model?
A. method of storing files on a disk drive
B. simple representation of complex real-world data structures
C. name of system for designing software
D. method of designing invoices for customers
2. A Relationship Database system consists of 3 parts: a client front end for sending information to a command processor, a middle tier that interprets user commands, and a management frame work for storing, organizing and securing data.
a. True
b. False
3. What are the 3 components of a table:
A. Row, column, value
B. Row, top, bottom
C. Column, row, top
D. Top, middle, end
4. What does the column represent in a table?
a. Attribute of the table records
b. A complete record in the table
c. The system log from the database
d. A list of database tables
5. What does a row in the table represent?
a. A complete data record
b. List of system logs
c. A list of file systems on database server
d. The primary keys from all the tables.
6. Which of the following is an example of data definition language (DDL)?
a. UPDATE
b. V$SYSLOG
c. CREATE
d. DETAIN
7 . Which of the following is an example of data manipulation language (DML)?
A. SELECT
B. ABORT
C. GRANT
D. REVOKE
8. A _______ key is an attribute that uniquely identifies a record in a table.
9. A _______ key is an attribute that is a primary key in one table and is used as a reference in a second table to establish a relationship between the two tables.
10. When running a ‘SELECT’ join, what is returned from the table:
A. ROW
B. Column
C. single attribute
D. all tables in the database
11. When running a ‘PROJECT’ join, what is returned from the table:
A. COLUMN
B. ROW
C. Single Attribute
D. a list of tables in the database
12. What are the 3 types of relationships commonly shown on an entity relationship diagram?
A. 1 to 1
B. 1 to Many
C. Many to Many
D. All the above
E. None of the above
13. What is an entity relationship diagram (ERD)?
A. graphical representation of all entities in a database and how the entities are related
b. list of the log files in the database.
C. list of all the tablespace names in a database
D. A diagram that shows how data is written to a physical disk drive.
14. The definition of an attribute in a table that has no value is:
A. ZERO
b. NULL
c. ZILTCH
D. NONE
15. A ____________ attribute can either be stored on retrieve on an ad hoc basis.
16. Briefly describe the advantages and disadvantages of storing a derived attribute?
17. A database can process many types of data classifications. Which of the following is not a data classification or architecture that databases can process:
A. Structured
B. Semi-structured
C. undelimited
D. Unstructured
18. The process by which functional/partial dependency and transitive dependency is removed from a database table is called:
a. sharding
b. normalization
c. defragmentation
d. reallocation
.
Database Justification MemoCreate a 1-page memo for the .docxwhittemorelucilla
This document contains two proposed memos. The first recommends migrating from a static website to a database driven application system, noting the benefits of databases in managing dynamic content and data while also acknowledging potential drawbacks. The second memo advocates for using web services and highlights considerations around security, scalability to large volumes of traffic, and compatibility across different devices and platforms.
Database Dump Script(Details of project in file)Mac1) O.docxwhittemorelucilla
Database Dump Script
(Details of project in file)
Mac:
1) Open up the terminal, or if already in MySQL, get out by typing "exit" and pressing enter.
2) Type:
/usr/local/mysql/bin/mysqldump -u root -p [database name] > /tmp/filename.txt
...where [database name] is the name of the database you want to export. When prompted, type the password. Check the /tmp file for your output.
.
Database Design 1. What is a data model A. method of sto.docxwhittemorelucilla
Database Design
1. What is a data model?
A. method of storing files on a disk drive
B. simple representation of complex real-world data structures
C. name of system for designing software
D. method of designing invoices for customers
2. Which of the following are the most important elements of a security program for databases:
a. Integrity, referential index, user rights
b. Confidentiality. Integrity and Availability
c. Availability, multi-master replication, high-bandwidth
d. DBA, System Admin, and PMO
3. Suppose that you have a table with a number of product sales. The product code may repeat in the table as it is likely the same product could be sold multiple times. If you want to produce a list of the unique products that are sold, you could use which of the following keywords in the SELECT statement:
A. LIKE
B. ORDERED BY
C. DISTINCT
D. DIFFERENT
4. What does the column represent in a table?
a. Attribute of the table records
b. A complete record in the table
c. The system log from the database
d. A list of database tables
5. What does a row in the table represent?
a. A complete data record
b. List of system logs
c. A list of file systems on database server
d. The primary keys from all the tables.
6. Which of the following is an example of data definition language (DDL)?
a. UPDATE
b. V$SYSLOG
c. CREATE
d. DETAIN
7 . Which of the following is an example of data manipulation language (DML)?
A. SELECT
B. ABORT
C. GRANT
D. REVOKE
8. A _____________ key is an attribute that uniquely identifies a record in a table.
9. A _____________ key is an attribute that is a primary key in one table and is used as a reference in a second table to establish a relationship between the two tables.
10. When running a ‘SELECT’ join, what is returned from the table:
A. ROW
B. Column
C. single attribute
D. all tables in the database
11. When running a ‘PROJECT’ join, what is returned from the table:
A. COLUMN
B. ROW
C. Single Attribute
D. a list of tables in the database
12. What are the 3 types of relationships commonly shown on an entity relationship diagram?
A. 1 to 1
B. 1 to Many
C. Many to Many
D. All the above
E. None of the above
13. What is an entity relationship diagram (ERD)?
A. graphical representation of all entities in a database and how the entities are related
b. list of the log files in the database.
C. list of all the tablespace names in a database
D. A diagram that shows how data is written to a physical disk drive.
14. The definition of an attribute in a table that has no value is:
A. ZERO
b. NULL
c. ZILTCH
D. NONE
15. A __________ attribute can either be stored on retrieve on an ad hoc basis.
16. Which of the following is not considered a characteristic of distributed management systems:
a. Concurrency Control
b. Business intelligence
c. Transaction management
d. query optimization
17. A database can process many types of data classifications. Which of the following is not a data class.
हिंदी वर्णमाला पीपीटी, hindi alphabet PPT presentation, hindi varnamala PPT, Hindi Varnamala pdf, हिंदी स्वर, हिंदी व्यंजन, sikhiye hindi varnmala, dr. mulla adam ali, hindi language and literature, hindi alphabet with drawing, hindi alphabet pdf, hindi varnamala for childrens, hindi language, hindi varnamala practice for kids, https://www.drmullaadamali.com
Main Java[All of the Base Concepts}.docxadhitya5119
This is part 1 of my Java Learning Journey. This Contains Custom methods, classes, constructors, packages, multithreading , try- catch block, finally block and more.
This slide is special for master students (MIBS & MIFB) in UUM. Also useful for readers who are interested in the topic of contemporary Islamic banking.
Strategies for Effective Upskilling is a presentation by Chinwendu Peace in a Your Skill Boost Masterclass organisation by the Excellence Foundation for South Sudan on 08th and 09th June 2024 from 1 PM to 3 PM on each day.
How to Manage Your Lost Opportunities in Odoo 17 CRMCeline George
Odoo 17 CRM allows us to track why we lose sales opportunities with "Lost Reasons." This helps analyze our sales process and identify areas for improvement. Here's how to configure lost reasons in Odoo 17 CRM
This document provides an overview of wound healing, its functions, stages, mechanisms, factors affecting it, and complications.
A wound is a break in the integrity of the skin or tissues, which may be associated with disruption of the structure and function.
Healing is the body’s response to injury in an attempt to restore normal structure and functions.
Healing can occur in two ways: Regeneration and Repair
There are 4 phases of wound healing: hemostasis, inflammation, proliferation, and remodeling. This document also describes the mechanism of wound healing. Factors that affect healing include infection, uncontrolled diabetes, poor nutrition, age, anemia, the presence of foreign bodies, etc.
Complications of wound healing like infection, hyperpigmentation of scar, contractures, and keloid formation.
Walmart Business+ and Spark Good for Nonprofits.pdfTechSoup
"Learn about all the ways Walmart supports nonprofit organizations.
You will hear from Liz Willett, the Head of Nonprofits, and hear about what Walmart is doing to help nonprofits, including Walmart Business and Spark Good. Walmart Business+ is a new offer for nonprofits that offers discounts and also streamlines nonprofits order and expense tracking, saving time and money.
The webinar may also give some examples on how nonprofits can best leverage Walmart Business+.
The event will cover the following::
Walmart Business + (https://business.walmart.com/plus) is a new shopping experience for nonprofits, schools, and local business customers that connects an exclusive online shopping experience to stores. Benefits include free delivery and shipping, a 'Spend Analytics” feature, special discounts, deals and tax-exempt shopping.
Special TechSoup offer for a free 180 days membership, and up to $150 in discounts on eligible orders.
Spark Good (walmart.com/sparkgood) is a charitable platform that enables nonprofits to receive donations directly from customers and associates.
Answers about how you can do more with Walmart!"
Leveraging Generative AI to Drive Nonprofit InnovationTechSoup
In this webinar, participants learned how to utilize Generative AI to streamline operations and elevate member engagement. Amazon Web Service experts provided a customer specific use cases and dived into low/no-code tools that are quick and easy to deploy through Amazon Web Service (AWS.)
How to Make a Field Mandatory in Odoo 17Celine George
In Odoo, making a field required can be done through both Python code and XML views. When you set the required attribute to True in Python code, it makes the field required across all views where it's used. Conversely, when you set the required attribute in XML views, it makes the field required only in the context of that particular view.
Data warehouse design from XML sourcesMatte0 Golfarelli Stef.docx
Data warehouse design from XML sources

Matteo Golfarelli
DEIS - University of Bologna
Viale Risorgimento 2, 40136 Bologna, Italy
+39-051-642862
[email protected]

Stefano Rizzi
DEIS - University of Bologna
Viale Risorgimento 2, 40136 Bologna, Italy
+39-051-2093542
[email protected]

Boris Vrdoljak
FER - University of Zagreb
Unska 3, 10000 Zagreb, Croatia
+385-(0)1-6129756
[email protected]
ABSTRACT

A large amount of data needed in decision-making processes is stored in the XML data format, which is widely used for e-commerce and Internet-based information exchange. Thus, as more organizations view the web as an integral part of their communication and business, the importance of integrating XML data in data warehousing environments is becoming increasingly high. In this paper we show how the design of a data mart can be carried out starting directly from an XML source. Two main issues arise: on the one hand, since XML models semi-structured data, not all the information needed for design can be safely derived; on the other, different approaches for representing relationships in XML DTDs and Schemas are possible, each with different expressive power. After discussing these issues, we propose a semi-automatic approach for building the conceptual schema for a data mart starting from the XML sources.

Keywords
Data warehouse design, Data warehousing and the web, XML
1. INTRODUCTION

A large amount of data needed in decision-making processes is stored in the XML (Extensible Markup Language) data format. The structure of XML, composed of nested custom-defined tags that can describe the meaning of the content itself, makes it usable as a semantic-preserving data exchange format on the web. As the Internet has evolved into a global platform for e-commerce and information exchange, the interest in XML has been growing and large volumes of XML data already exist.

XML can be considered as a particular standard syntax for the exchange of semi-structured data [1]. One common feature of semi-structured data models is the lack of schema, so that the data is self-describing. However, XML documents can be associated with and validated against either a Document Type Definition (DTD) or an XML Schema, both of which allow the structure of XML documents to be described and their contents to be constrained. DTDs are defined as a part of the XML 1.0 Specification [15], while XML Schemas have recently become a W3C Recommendation [16]. XML Schemas considerably extend the capabilities of DTDs, especially from the point of view of data typing and constraining. With DTDs or Schemas, the applications exchanging data can agree about the meaning of the tags and, in that case, XML reaches its full potential.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. DOLAP '01, November 9, 2001, Atlanta, Georgia, USA. Copyright 2001 ACM 1-58113-437-1/01/0011 ... $5.00.
In recent years, enterprises have been asking for support in extracting useful, concise and handy information for decision-making out of the huge amounts of data stored in their expensive and complex information systems. Consequently, a substantial effort has been made in the academic world to devise methodologies for designing data warehousing systems capable of seamlessly integrating information from different sources, in particular from heterogeneous databases. Now, as more organizations view the web as an integral part of their communication and business, the importance of integrating XML data in data warehousing environments is becoming increasingly high. Some commercial tools now support data extraction from XML sources to feed the warehouse, but both the warehouse schema and the logical mapping between the source and the target schemas must be defined by the designer.
In this paper we show how multidimensional design for data warehouses can be carried out starting directly from an XML source. Two main issues arise: on the one hand, different approaches for representing relationships in XML DTDs and Schemas are possible, each achieving a different expressive power; on the other, since XML models semi-structured data, not all the information needed for design can be safely derived. Thus, our contribution in this work is twofold: first, we propose a warehouse-oriented review and comparison of the approaches for structuring XML documents; then, we outline an algorithm in which the problem of correctly inferring the needed information is solved by querying the source XML documents and, if necessary, by asking the designer's help.

One alternative approach to design from XML sources consists in first translating them into an equivalent relational schema, then starting from the latter to design the warehouse. Some approaches for translating XML documents into a relational database can be found in the literature, both leaning on the DTD [10][13] or not [5], but insufficient emphasis is given to the problem of determining the cardinality of relationships, which instead has a primary role in multidimensional design.
The paper is structured as follows. In Section 2 the basics of multidimensional modeling and design are given, while in Section 3 the design alternatives for modeling relationships in both DTDs and XML Schemas are reviewed and discussed. In Section 4 our approach to multidimensional design is presented with reference to a case study, and in Section 5 the conclusions are drawn.
2. MULTIDIMENSIONAL MODELING AND DESIGN

It is now widely recognized that an accurate conceptual design is the necessary foundation for building a data warehouse which is both well-documented and fully satisfies requirements. In order to be independent of the specific issues involved in logical and physical modeling, the approach proposed here is referred to the conceptual level, from which the logical schemas of the data marts can be easily derived.
2.1 Conceptual Modeling

Several conceptual models for data warehouses were devised in the literature [2][3][6][7][8][14]; they mainly differ in the graphical representation of concepts, with small differences in expressive power. In this paper we will adopt the Dimensional Fact Model (DFM) [7], which represents the data mart by means of a set of fact schemas.

In the following we will briefly discuss the DFM representation of the main concepts of the multidimensional model with reference to the fact schema CLICK, which describes the analysis of the web site traffic. The reason for choosing this example is that, due to the significant role now played by the web in attracting new clients and supporting sales, analyzing the web server traffic may be crucial for improving the enterprise business. In this context, multidimensional modeling allows many unpredictable complex queries to be answered, such as:

- What is the trend for the most and the least accessed pages?
- Is there a relationship between business events (for instance, sale promotions in an e-commerce site) and the number of accesses?

For such an analysis, we need information about the hostname or IP address of the computer requesting the file, the date and time of the request, and the URL of the file being requested.
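As a concrete aside (not part of the paper's approach): this information is typically recovered from web-server access logs. A minimal Python sketch, assuming a Common Log Format entry; the log line and field names below are invented for illustration.

```python
import re
from datetime import datetime

# Pattern for a Common Log Format request line; only the fields needed
# for click-stream analysis (host, timestamp, URL) are captured.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<when>[^\]]+)\] "GET (?P<url>\S+) HTTP/[\d.]+"'
)

def parse_click(line):
    """Extract the host, date, hour and URL of one click from a log line."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None
    when = datetime.strptime(m.group("when"), "%d/%b/%Y:%H:%M:%S %z")
    return {"host": m.group("host"), "date": when.date(),
            "hour": when.hour, "url": m.group("url")}

# Invented example entry, echoing the paper's host and URL names.
line = ('ares.csr.unibo.it - - [23/May/2001:16:43:25 +0200] '
        '"GET /catalog/BL0023.shtml HTTP/1.0" 200 512')
print(parse_click(line))
```

Each parsed record supplies exactly the values (host, date, hour, URL) from which the dimensions of the CLICK fact are drawn.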
[Figure 1. Fact schema for click-stream analysis: fact CLICK with measure no. of clicks and dimensions host (hostname/IP address), date, hour and URL; the hierarchies include domain (category or nation) on host, month, day of week and holiday on date, and file type on URL.]
In the fact schema shown in Figure 1, the fact CLICK, the focus of interest for the decision-making process, is associated with the measures which describe it, i.e. no. of clicks, and with the dimensions determining its minimum granularity, namely host, date, hour, and URL. Each dimension is the root of a hierarchy which determines how the fact may be aggregated and selected significantly for the decision-making process; each hierarchy includes a set of attributes linked by functional dependencies. For instance, the URL of the file being requested determines the file type, and the hostname or IP address of the computer requesting the file determines its domain. The values for the domain attribute can be extracted from the hostname suffix, which indicates either the category (for instance, ".com" for commercial companies) or the nation.

Within the DFM, as in all the other conceptual models, a strong relevance is given to functional dependencies, since they represent many-to-one relationships between attributes which enable flexible aggregation of data in OLAP queries [9]. Thus, the main problem in building a conceptual data mart schema is to identify those relationships in the business domain.
2.2 Conceptual Design

In most approaches to design of data marts, the conceptual schema is built starting from the (logical or conceptual) schema of the operational sources [2][7][8]. The common core of these approaches consists in navigating the functional dependencies in the source schema in order to determine the hierarchies for the fact. In particular, the methodology to build a fact schema from an E/R schema proposed in [7] consists of the following steps:

1. Choosing facts.
2. For each fact:
   2.1 Building the attribute tree.
   2.2 Rearranging the attribute tree.
   2.3 Defining dimensions and measures.

For briefly illustrating the methodology, we will use the E/R diagram describing the web site traffic shown in Figure 2.
Facts typically correspond to events occurring dynamically in the enterprise world. On the E/R schema a fact may be represented either by an entity F or by an n-ary relationship R. In our example, the fact of primary interest is represented by entity CLICK.

Given a portion of interest of a source schema and an entity F belonging to it, we call attribute tree the tree such that:

- each vertex corresponds to an attribute - simple or compound - of the schema;
- the root corresponds to the identifier of F;
- for each vertex v, the corresponding attribute functionally determines all the attributes corresponding to the descendants of v.
The attribute tree for F may be constructed automatically by navigating, starting from F, the functional dependencies expressed by many-to-one relationships between entities in the source schema. Each entity E analyzed is represented in the attribute tree by: (1) a node corresponding to the identifier of E; (2) a child node for each of the non-identifier attributes of E; (3) a child node for each entity G connected to E by a many-to-one relationship.
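The construction just described can be sketched in a few lines. This is a toy rendition, not the paper's implementation: the schema description and the entity and attribute names below are our own simplification of the Figure 2 diagram.

```python
# Toy description of (part of) the E/R schema: each entity lists its
# identifier, its non-identifier attributes, and the entities it reaches
# through many-to-one relationships. Names are illustrative only.
SCHEMA = {
    "CLICK": {"id": "clickID", "attrs": ["date", "time"], "to_one": ["HOST", "URL"]},
    "HOST":  {"id": "hostname", "attrs": ["nation"], "to_one": []},
    "URL":   {"id": "url", "attrs": ["fileType"], "to_one": ["SITE"]},
    "SITE":  {"id": "siteID", "attrs": ["nation"], "to_one": []},
}

def attribute_tree(entity):
    """Root the tree in the identifier of `entity`; add a child per
    non-identifier attribute and a subtree per many-to-one neighbour."""
    e = SCHEMA[entity]
    children = [{"node": a, "children": []} for a in e["attrs"]]
    children += [attribute_tree(g) for g in e["to_one"]]
    return {"node": e["id"], "children": children}

tree = attribute_tree("CLICK")
print(tree["node"], [c["node"] for c in tree["children"]])
```

As in the paper's example, nation ends up in the tree twice, once under the host and once under the site; a real implementation would also need to guard against cycles in the source schema.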
[Figure 2. E/R schema for web site traffic.]
The resulting attribute tree for the web site traffic analysis
10. elements (i.e. sub-elements) and attributes. Both elements and
example is shown in Figure 3. Starting from the CLICK entity,
attributes are allowed to have values. Document structures can
be
new vertices are added by following many-to-one relationships.
nested to any level of complexity; between the opening and
The URL category attribute could not be included in the
attribute closing tags for an element, any number of additional
elements or
tree because there is a many-to-many relationship between URL
textual data may be present. Attributes are included, with their
and URL category. The nation attribute is included twice in the
respective values, within the element’s opening declaration. An
attribute tree because it can be added to the attribute tree by
XML document that contains data about the web site traffic is
navigating from both the host and the site entities. shown in
Figure 4.
Figure 3. Attribute tree for fact CLICK

Once the attribute tree has been built, some uninteresting nodes may be dropped from it. The time attribute, which represents the exact time of a click, is replaced with a coarser hour attribute.

Finally, dimensions and measures must be selected among the children of the root. In our example, the attributes chosen as dimensions are host, date, hour and URL; the number of clicks, determined by counting the clicks from the same host to the same URL on a given date and hour, is chosen as a measure.

<webTraffic>
<click>
<host hostId="ares.csr.unibo.it">
<nation>italy</nation>
</host>
<date>23-MAY-2001</date>
<time>16:43:25</time>
<url urlID="BL0023">
<site siteID="www.hr">
<nation>croatia</nation>
</site>
<fileType>shtml</fileType>
<urlCategory>catalog</urlCategory>
</url>
</click>
<click>
...
</click>
...
</webTraffic>

Figure 4. An XML document describing the web site traffic
Some further minor arrangements must be made in order to obtain the fact schema in Figure 1; in particular, the date dimension is enriched by building a hierarchy which includes attributes month, day of week, and holiday. Besides, the host category and nation optional attributes are replaced by attribute domain, which indicates either the category or the nation of the host.
An XML document is valid if it has an associated schema, such as a DTD or an XML Schema, and if it conforms to the constraints expressed in that schema. Since our methodology for conceptual design is based on detecting many-to-one relationships, in the following we will focus on the way those relationships can be expressed in the DTD and the XML Schema.
3. MODELING RELATIONSHIPS IN XML

An XML document consists of nested element structures, starting with a root element. Each element may contain component elements (i.e. sub-elements) and attributes. Both elements and attributes are allowed to have values. Document structures can be nested to any level of complexity; between the opening and closing tags for an element, any number of additional elements or textual data may be present. Attributes are included, with their respective values, within the element's opening declaration.
3.1 Relationships in DTDs
A DTD defines the elements and attributes allowed in an XML document, and the nesting and occurrences of each element. The structure of an XML document is constrained using element-type and attribute-list declarations. Element-type declarations specify which sub-elements can appear as children of the element; attribute-list declarations specify the name, type, and possibly default value of each attribute associated with a given element type. Among the different attribute types, types ID, IDREF and IDREFS have particular relevance for our approach: the ID type defines a unique identifier for the element; the IDREF type means that the attribute's value is some other element's identifier; IDREFS means that its value is a list of identifiers. IDREF(S) must match the value of some ID attribute in the document [1].
3.1.1 Modeling relationships by sub-elements

Relationships can be specified in DTDs by sub-elements that may have different cardinalities. The optional character following a child element name or list in an element-type declaration determines whether the element or the content particles in the list may appear one or more (+), zero or more (*), or zero or one (?) times; the default cardinality is exactly one.
Figure 5 presents a DTD according to which the XML document presented in Figure 4 is valid. Element webTraffic is defined as the document element, and thus becomes the root of XML documents. A webTraffic element may have many click elements, while in an url element the site sub-element must occur exactly once, followed by one fileType and many urlCategory sub-elements. A host element may have either a category or a nation element.
<!DOCTYPE webTraffic [
<!ELEMENT webTraffic (click*)>
<!ELEMENT click (host, date, time, url)>
<!ELEMENT host (category | nation)>
<!ATTLIST host
hostId ID #REQUIRED>
<!ELEMENT category (#PCDATA)>
<!ELEMENT date (#PCDATA)>
<!ELEMENT time (#PCDATA)>
<!ELEMENT url (site, fileType, urlCategory+)>
<!ATTLIST url
urlId ID #REQUIRED>
<!ELEMENT site (nation)>
<!ATTLIST site
siteId ID #REQUIRED>
<!ELEMENT nation (#PCDATA)>
<!ELEMENT fileType (#PCDATA)>
<!ELEMENT urlCategory (#PCDATA)>
]>

Figure 5. A DTD where relationships are specified by sub-elements
If a one-to-one or one-to-many relationship must be represented in XML, sub-elements with the above-mentioned cardinalities can be used without loss of information. However, given a DTD, we can follow only one direction of a relationship. For instance, according to the DTD in Figure 5, an url element may have many urlCategory sub-elements, but it is not possible to find out, from the DTD, whether an URL category can refer to many URLs. Only by having some knowledge about the domain described by the DTD can we conclude that the latter is the case.
3.1.2 Modeling relationships by ID and IDREF(S)

Another way to specify relationships between elements in DTDs is by means of ID and IDREF(S) attributes. The way these attributes operate resembles the key and foreign key mechanism used in relational databases, with some important differences and limitations. For instance, if we take the last part of the DTD from Figure 5 and let the relationships of the url element be defined by ID and IDREF(S) attributes rather than by sub-elements, we obtain the DTD in Figure 6.

<!ELEMENT webTraffic (click*, fileType+, urlCategory+)>
...
<!ELEMENT url (site)>
<!ATTLIST url
urlId ID #REQUIRED
fileTypeRef IDREF #REQUIRED
urlCategoryRef IDREFS #REQUIRED>
<!ELEMENT fileType EMPTY>
<!ATTLIST fileType
typeID ID #REQUIRED
typeDescription CDATA #IMPLIED>
<!ELEMENT urlCategory EMPTY>
<!ATTLIST urlCategory
urlCategoryId ID #REQUIRED
urlCategoryDesc CDATA #IMPLIED>

Figure 6. A DTD where relationships are specified by IDREF(S)
In this example we assume that the fileTypeRef attribute of the url element references the fileType element. On the other hand, an instance of urlCategoryRef references many instances of urlCategory. Therefore, for one URL there is exactly one file type, while there may be several categories. The problem is that, using IDREF(S), the participating elements cannot be constrained to be of a certain element type. For instance, fileTypeRef could also contain a reference to an urlCategory element, while obviously we would like to constrain such references to fileType elements only. Further, though the value of an ID attribute is unique within the whole document, element types are not required to have an ID, and even if an element type has an ID, its usage may be optional. For these reasons, there is no means to actually constrain the allowed relationships by the ID/IDREF(S) mechanism.
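To make the limitation concrete, the following Python sketch resolves an IDREF by hand, much as an ID-aware processor would: the referenced value is matched against every ID in the document, with no regard for the target's element type. The instance document and its identifier values are invented for illustration; the attribute names echo the DTD of Figure 6.

```python
import xml.etree.ElementTree as ET

# An instance of the Figure 6 DTD; fileTypeRef (wrongly) points at a
# urlCategory identifier, yet it still matches *some* ID in the document,
# so a validating parser raises no error.
doc = """
<webTraffic>
  <url urlId="BL0023" fileTypeRef="cat01" urlCategoryRef="cat01"/>
  <fileType typeID="shtml"/>
  <urlCategory urlCategoryId="cat01"/>
</webTraffic>
"""

ID_ATTRIBUTES = ("urlId", "typeID", "urlCategoryId")  # declared as ID in the DTD

root = ET.fromstring(doc)

# Collect every ID value in the document together with the tag that owns it.
ids = {}
for el in root.iter():
    for attr in ID_ATTRIBUTES:
        if el.get(attr) is not None:
            ids[el.get(attr)] = el.tag

ref = root.find("url").get("fileTypeRef")
print(ids[ref])  # resolves to a urlCategory element, not a fileType
```

The reference is "valid" in the DTD sense even though it targets the wrong element type, which is exactly why the ID/IDREF(S) mechanism is too weak for our purposes.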
3.2 Relationships in XML Schemas

XML Schemas give a more accurate representation of XML structure constraints than DTDs; in particular, cardinality can be specified in more detail by using the minOccurs and maxOccurs attributes. An XML Schema consists of type definitions, which can be derived from each other, and element declarations. The possibility of separating an element declaration from the definition of its type enables the sharing and reusing of simple and composite types. Further, besides ID and IDREF(S) attributes, XML Schemas support the use of key and keyRef elements for defining keys and their references. The two different ways of specifying relationships in XML Schemas, i.e. sub-elements and key/keyRef mechanisms, will be described in the following.
3.2.1 Modeling relationships by sub-elements

Figure 7 presents a part of an XML Schema that defines the same structure and constraints as the DTD in Figure 5. Since, for the declared elements, the attributes minOccurs and maxOccurs default to 1, and only in the case of the urlCategory element is the maxOccurs attribute set to "unbounded", an url element will consist of one site, one fileType and many urlCategory elements. We will not further discuss this solution since, from the point of view of cardinality modeling, it is essentially equivalent to the solution in Section 3.1.1.
<element name="click">
<complexType>
<sequence>
<element name="host" type="hostType"/>
<element name="date" type="date"/>
<element name="time" type="time"/>
<element name="url" type="urlType"/>
</sequence>
</complexType>
</element>
...
<complexType name="urlType">
<sequence>
<element name="site" type="siteType"/>
<element name="fileType" type="string"/>
<element name="urlCategory" type="string" maxOccurs="unbounded"/>
</sequence>
<attribute name="urlID" type="string"/>
</complexType>
...

Figure 7. An XML Schema where relationships are specified by sub-elements
3.2.2 Modeling relationships by key and keyRef elements

In addition to ID and IDREF(S) attributes, XML Schemas introduce more powerful and flexible mechanisms, similar to the relational concept of foreign key. The key element is used to indicate that every attribute or element value must be unique within a certain scope and not null. By using keyRef elements, keys can be referenced.
For the purpose of describing the declaration of keys and their references, an element named fileTypes (represented in Figure 8) is added as a sub-element to the click element from the XML document in Figure 4. The fileTypes element consists of type elements with typeID attributes. Figure 9 presents an evolution of the XML Schema in Figure 7, according to which every click element also contains a fileTypes element. This element has a composite type named fileTypesType (its definition is not presented in Figure 9), according to which the XML document in Figure 8 is valid.

XML Schemas allow the scope of each key to be specified by means of an XPath expression [17]. In our example, the key element is named fileTypeKey. The typeID attribute from Figure 8 is specified as the key by means of the selector and the field sub-elements of the key element in Figure 9. The xpath attribute of the selector element contains an XPath expression, fileTypes/type, that selects the type elements that are sub-elements of the fileTypes element. The xpath attribute of the field element contains the @typeID expression, that specifies the typeID attribute of the type element as the key. Further, the fileType element, which is a sub-element of url, is declared as a keyRef; this means that, for every value of fileType, a fileTypeKey with the same value must exist.
<fileTypes>
<type typeID="html">
...

Figure 8. The fileTypes element

Figure 9. An XML Schema where relationships are specified by key/keyRef
In conclusion, the key/keyRef mechanism may be applied to any element and attribute content, and the scope of the constraint can be specified, while an ID is a type of attribute whose scope is fixed to be the whole document. Furthermore, combinations of element and attribute content can also serve as keys and references in XML Schemas.
4. CONCEPTUAL DESIGN FROM XML SOURCES

In the previous section, we showed different approaches for representing relationships in DTDs and XML Schemas. Three of them are suitable for specifying relationships: sub-elements in DTDs, sub-elements and key/keyRef in Schemas; though their expressive power is different, in the context of this paper they may be considered equivalent, since with reference to multidimensional design they allow the same knowledge to be captured. We do not consider the fourth approach, ID/IDREF(S) in DTDs, since it is not precise and useful enough in constraining relationships.

In this section we propose a semi-automatic approach for building the conceptual schema of a data mart starting from the XML sources. Of the three above-mentioned approaches, we have chosen sub-elements in DTDs for the presentation of our methodology, since Schemas are still not as widely used as DTDs; however, the methodology is essentially the same when dealing with Schemas.
Starting with the assumption that the XML document has a DTD and conforms to it, the methodology consists of the following steps:
1. Simplifying the DTD.
2. Creating a DTD graph.
3. Choosing facts.
4. For each fact:
4.1 Building the attribute tree from the DTD graph.
4.2 Rearranging the attribute tree.
4.3 Defining dimensions and measures.
In the following paragraphs we will describe steps (1) to (4.1) with reference to the web site traffic example; once the attribute tree is built, steps 4.2 and 4.3 are identical, respectively, to steps 2.2 and 2.3 described in Section 2.3.
Simplifying the DTD

The sub-elements in DTDs may have been declared in a complicated and redundant way. However, those details of a DTD can be simplified [13]. The transformations for simplifying a DTD include converting a nested definition into a flat representation: for instance, in the web site traffic example, host(category|nation) is transformed into host(category?,nation?). Further, the sub-elements having the same name are grouped, and many unary operators are reduced to a single unary operator. Finally, all "+" operators are transformed into "*" operators.
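The transformations above can be sketched as string rewrites over a DTD content model. This is a simplified illustration, not a full content-model parser; the function name and the regex-based treatment of a single choice group are our own assumptions.

```python
import re

def simplify_model(model: str) -> str:
    """Apply the simplification rewrites to a DTD content model string."""
    m = model.replace(" ", "")
    # 1. Flatten a choice of plain names into an optional sequence:
    #    (category|nation) -> category?,nation?
    m = re.sub(
        r"\((\w+(?:\|\w+)+)\)",
        lambda g: ",".join(name + "?" for name in g.group(1).split("|")),
        m,
    )
    # 2. Reduce a run of unary operators to a single one:
    #    "?" survives only if no repetition operator is present in the run.
    m = re.sub(
        r"[?*+]{2,}",
        lambda g: "*" if set(g.group(0)) & {"*", "+"} else "?",
        m,
    )
    # 3. Turn every "+" into "*".
    m = m.replace("+", "*")
    return m

print(simplify_model("(category | nation)"))  # category?,nation?
print(simplify_model("click+"))               # click*
```

A production version would operate on a parsed content-model tree rather than on strings, but the rewrites themselves are the same.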
Creating a DTD graph

After simplifying the DTD, a DTD graph representing its structure can be created as described in [10] and [13]; its vertices correspond to elements, attributes and operators in the DTD. Attributes and sub-elements are not distinguished in the graph since, in our methodology, they are considered as equivalent nesting mechanisms. The DTD graph for the DTD in Figure 5 is given in Figure 10.
Defining facts

The designer chooses one or more vertices of the DTD graph as facts; each of them becomes the root of a fact schema. In our example, we choose the click vertex as the only interesting fact.
Building the attribute tree

The vertices of the attribute tree are a subset of the element and attribute vertices of the DTD graph. The algorithm to build the attribute tree is sketched in Figure 11.

Figure 10. DTD graph for web site traffic analysis
root = newVertex(F);
// newVertex(<vertex>) returns a new vertex of the
// attribute tree corresponding to <vertex> of the DTD graph
expand(F, root);

expand(E, V):
// E is the current DTD vertex,
// V is the current attribute tree vertex
{ for each child W of E do
    if W is element or attribute do
    { next = newVertex(W);
      addChild(V, next);   // adds child next to V
      expand(W, next); }
    else
      if W = "?" do
        expand(W, V);
  for each parent Z of E such that
      Z is not a document element do
    if Z = "?" or Z = "*" do
      expand(Z, V);
    else
      if not checkToMany(E, Z) do
        if askDesignerToOne(E, Z) do
        { next = newVertex(Z);
          addChild(V, next);
          expand(Z, next); } }

Figure 11. Algorithm for building the attribute tree
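The algorithm of Figure 11 can be sketched in Python as follows. The dictionary encoding of the DTD graph, the visited-set guard against revisiting vertices, and the stubbed checkToMany/askDesignerToOne callbacks are our own simplifications; in the paper, the cardinality check queries the actual XML documents.

```python
# Each DTD-graph vertex maps to (kind, children); parents are derived below.
# Kinds: "doc" (document element), "element", "attribute", "?" and "*" operators.
GRAPH = {
    "webTraffic": ("doc", ["*click"]),
    "*click": ("*", ["click"]),
    "click": ("element", ["host", "date", "time", "url"]),
    "host": ("element", ["hostId", "?category", "?nation"]),
    "hostId": ("attribute", []),
    "?category": ("?", ["category"]),
    "category": ("element", []),
    "?nation": ("?", ["nation"]),
    "nation": ("element", []),
    "date": ("element", []),
    "time": ("element", []),
    "url": ("element", ["urlId", "fileType", "*urlCategory"]),
    "urlId": ("attribute", []),
    "fileType": ("element", []),
    "*urlCategory": ("*", ["urlCategory"]),
    "urlCategory": ("element", []),
}

PARENTS = {}
for v, (_, children) in GRAPH.items():
    for c in children:
        PARENTS.setdefault(c, []).append(v)

def build_attribute_tree(fact, check_to_many, ask_designer_to_one):
    seen = {fact}                       # guard against revisiting vertices
    root = {"name": fact, "children": []}

    def expand(e, v):
        # Downward: children of e in the DTD graph.
        for w in GRAPH[e][1]:
            kind = GRAPH[w][0]
            if kind in ("element", "attribute") and w not in seen:
                seen.add(w)
                child = {"name": w, "children": []}
                v["children"].append(child)
                expand(w, child)
            elif kind == "?":           # skip through the optional operator
                expand(w, v)
            # a "*" child expresses a -to-many relationship: nothing is added
        # Upward: parents of e, excluding the document element.
        for z in PARENTS.get(e, []):
            kind = GRAPH[z][0]
            if kind == "doc" or z in seen:
                continue
            if kind in ("?", "*"):      # operators only constrain the other direction
                expand(z, v)
            elif not check_to_many(e, z) and ask_designer_to_one(e, z):
                seen.add(z)
                child = {"name": z, "children": []}
                v["children"].append(child)
                expand(z, child)

    expand(fact, root)
    return root

tree = build_attribute_tree("click",
                            check_to_many=lambda e, z: True,   # stub: assume -to-many
                            ask_designer_to_one=lambda e, z: False)
print([c["name"] for c in tree["children"]])  # ['host', 'date', 'time', 'url']
```

With this toy graph, urlCategory is correctly excluded (it sits behind a "*" operator), while category and nation enter the tree through their "?" operators; the site sub-tree of the running example is omitted here for brevity.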
The attribute tree is initialized with the fact vertex F; then, it is enlarged by recursively navigating the functional dependencies between the vertices of the DTD graph. Each vertex V inserted in the attribute tree is expanded as follows (procedure expand):

1. For each vertex W that is a child of V in the DTD graph. When examining relationships in the same direction as in the DTD graph, the cardinality information is expressed either explicitly by "?" and "*" vertices or implicitly by their absence. If W corresponds to an element or attribute in the DTD, it is added to the attribute tree as a child of V; if W is a "?" operator, its child is added to the attribute tree as a child of V; if W is a "*" operator, no vertex is added.

2. For each vertex Z that is a parent of V in the DTD graph. When examining relationships in this direction, vertices corresponding to "*" and "?" operators are skipped, since they only express the cardinality in the opposite direction. Since the DTD yields no further information about the relationship cardinality, it is necessary to examine the actual data by querying the XML documents conforming to the DTD. This is done by procedure checkToMany, which counts the number of distinct values of Z corresponding to each value of E. If a -to-many relationship is detected, Z is not included in the attribute tree. Otherwise, we still cannot be sure that the cardinality of the relationship from E to Z is -to-one. In this case, only the designer can tell, leaning on her knowledge of the business domain, whether the actual cardinality is -to-one or -to-many (procedure askDesignerToOne). Only in the first case is Z added to the attribute tree. The reason why document elements are not considered is that they have only one instance within XML documents, thus they are of no interest for aggregation and should not be modeled in the data mart.
In the following, some general considerations on the proposed approach are reported.

The problem of checking cardinalities in XML documents is related to that of discovering functional dependencies in relational databases, which was widely investigated in the literature on relational theory and data mining [11][12]. In our case the situation is much simpler, since no inference is necessary: it comes down to properly querying the data with an XML query language supporting aggregate queries. For instance, in W3C XQuery [18] the use of the distinct function is proposed for that purpose, while the use of the group-by function is proposed in [4].
The main question arising is how many XML documents we must see to reasonably confirm our presumption that the cardinality is -to-one.
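A minimal version of the checkToMany test can be written with the Python standard library: for a candidate relationship from E to Z, count the distinct Z values observed for each E instance across the available documents. The sample document and its identifier values below are invented for illustration, and how many documents make the evidence convincing remains the open question raised above.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

docs = ["""
<webTraffic>
  <click>
    <host hostId="ares.csr.unibo.it"><nation>italy</nation></host>
    <url urlID="BL0023">
      <urlCategory>catalog</urlCategory>
      <urlCategory>sport</urlCategory>
    </url>
  </click>
  <click>
    <host hostId="ares.csr.unibo.it"><nation>italy</nation></host>
    <url urlID="BL0024"><urlCategory>catalog</urlCategory></url>
  </click>
</webTraffic>
"""]

def check_to_many(docs, e_tag, key_attr, z_tag):
    """True if some instance of e_tag is associated with >1 distinct z_tag value."""
    values = defaultdict(set)
    for doc in docs:
        for e in ET.fromstring(doc).iter(e_tag):
            for z in e.iter(z_tag):
                values[e.get(key_attr)].add(z.text)
    return any(len(v) > 1 for v in values.values())

print(check_to_many(docs, "url", "urlID", "urlCategory"))  # True: url is -to-many
print(check_to_many(docs, "host", "hostId", "nation"))     # False: -to-one so far
```

A False result only means no counterexample has been observed yet, which is precisely why the designer may still be asked to confirm the -to-one presumption.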
Clearly, the semi-structured nature of XML sources increases the level of uncertainty on the structure of data as compared to E/R sources, thus making recourse to the designer's experience more frequent. In our algorithm, we chose to ask questions interactively during the tree-building phase in order to avoid unnecessary queries on the documents. An alternative solution consists in first building the tree by emphasizing all the uncertain relationships, then handing the complete tree to the designer in order to have it rearranged (step 4.2) with specific attention to the uncertain relationships, which could be dropped together with their subtrees. While this solution allows the designer to have a wider point of view on the tree, it may be less efficient, since a vertex, dropped later, might have been (uselessly) expanded by querying the XML documents.
Figure 12. Attribute tree derived from the DTD graph
In our example, no uncertain relationships are navigated. Vertex urlCategory is not added to the attribute tree because it is a child of a "*" vertex in the DTD graph. The resulting attribute tree for the web site traffic analysis example is given in Figure 12. Some further arrangements should be made to this tree: for instance, since there is no need for the existence of both the host and hostId vertices, only host should be left; the same logic should be applied for urlId and siteId. Finally, the time attribute is replaced with the hour attribute.
As a matter of fact, the problem of inferring the relationship cardinalities is present also when the source to be used for design is a relational schema [7]. In fact, the presence in a relation R of a foreign key F referencing the primary key K of a relation S implies that F functionally determines K and, consequently, all the other attributes of S, but tells us nothing about the number of distinct tuples of R related to each tuple of S. Thus, in principle, in order to guess the uncertain cardinalities we should query the database as in the case of XML sources. On the other hand, in the relational case this issue is much less relevant than in the XML case. In fact, while the designer of an XML document chooses the direction of each link without considering the cardinality of the relationship to be modeled, the designer of a relational schema is constrained (by the need to satisfy the first normal form) to represent each relationship in the -to-one direction. Thus, in general, the relationship from S to R is one-to-many and hence not interesting for multidimensional modeling; the only case in which it might be interesting is when the foreign key mechanism has been used by the designer to model a one-to-one relationship, but this is not very frequent.

Figure 13. Another possible DTD graph describing the web site traffic
As seen in Section 3, several DTDs representing the same subject may be designed; for each of them, the resulting attribute tree may look different. For instance, the attribute tree for the DTD graph in Figure 13 is presented in Figure 14. Having click as a fact, navigating from hostId to host and from urlId to url entails analyzing the data to check the uncertain relationship. After inverting hostId with host and urlId with url (this can be done since they are related by a one-to-one relationship), the resulting attribute tree becomes the same as the one in Figure 12.
Figure 14. Attribute tree for the DTD graph in Figure 13
5. CONCLUSIONS

In this paper we described a semi-automatic approach to conceptual design of a data mart from an XML source. We showed how the semi-structured nature of the source increases the level of uncertainty on the structure of data as compared to structured sources such as database schemas, thus requiring access to the source documents and, possibly, the designer's help in order to detect -to-one relationships. The approach was described with reference to the case in which the sources are constrained by a DTD using sub-elements, but it can be adopted equivalently when XML Schemas are considered.

Using XML sources for feeding data warehouse systems will become a standard in the next few years. Adopting a technique to derive the data mart schema directly from the XML sources is not the only possible approach: the data mart schema may also be designed "manually", meaning that facts, measures and hierarchies are determined starting from the user requirements, and the logical connection with the source schemas is established only a posteriori. On the other hand, the main problem with this solution is that, very often, the requirements expressed by the users cannot be fully supported by the existing data; besides, the process of mapping each requirement back to the source schema may be very complex.
6. REFERENCES

[1] Abiteboul, S., Buneman, P., and Suciu, D. Data on the Web: From Relations to Semistructured Data and XML. Morgan Kaufmann Publishers, 2000.
[2] Cabibbo, L., and Torlone, R. A logical approach to multidimensional databases. In Proc. EDBT, 1998.
[3] Datta, A., and Thomas, H. A conceptual model and algebra for on-line analytical processing in data warehouses. In Proc. WITS, 1997.
[4] Deutsch, A., Fernandez, M., Florescu, D., Levy, A., and Suciu, D. A Query Language for XML. In Proc. 8th World Wide Web Conference, 1999.
[5] Florescu, D., and Kossmann, D. Storing and Querying XML Data using an RDBMS. IEEE Data Engineering Bulletin 22, 3, 1999.
[6] Franconi, E., and Sattler, U. A data warehouse conceptual model for multidimensional aggregation. In Proc. DMDW, 1999.
[7] Golfarelli, M., Maio, D., and Rizzi, S. The Dimensional Fact Model: a conceptual model for data warehouses. Int. Jour. of Cooperative Inf. Systems 7, 2&3, 1998.
[8] Hüsemann, B., Lechtenbörger, J., and Vossen, G. Conceptual data warehouse design. In Proc. DMDW, 2000.
[9] Kimball, R. The data warehouse toolkit. John Wiley & Sons, 1996.
[10] Lee, D., and Chu, W.W. Constraints-preserving Transformation from XML Document Type Definition to Relational Schema. In Proc. 19th ER (Salt Lake City), 2000.
[11] Mannila, H., and Räihä, K.J. Algorithms for inferring functional dependencies. Data & Knowledge Engineering 12, 1, 1994.
[12] Savnik, I., and Flach, P. Bottom-up induction of functional dependencies from relations. In Piatetsky-Shapiro (ed.), Knowledge Discovery in Databases, AAAI, 1993.
[13] Shanmugasundaram, J., et al. Relational Databases for Querying XML Documents: Limitations and Opportunities. In Proc. 25th VLDB (Edinburgh), 1999.
[14] Vassiliadis, P. Modeling multidimensional databases, cubes and cube operations. In Proc. 10th SSDBM Conf. (Capri, Italy), 1998.
[15] World Wide Web Consortium (W3C). XML 1.0 Specification. http://www.w3.org/TR/2000/REC-xml-20001006.
[16] World Wide Web Consortium (W3C). XML Schema. http://www.w3.org/XML/Schema.
[17] World Wide Web Consortium (W3C). XPath 1.0 Specification. http://www.w3.org/TR/xpath.
[18] World Wide Web Consortium (W3C). XQuery 1.0: An XML Query Language (Working Draft). http://www.w3.org/TR/xquery.