DBMS
• Data Warehouse & Types of Data
• The term data warehouse is used to distinguish a database that is used for
business analysis (OLAP) rather than transaction processing (OLTP). While an OLTP
database contains current low-level data and is typically optimized for the
selection and retrieval of records, a data warehouse typically contains aggregated
historical data and is optimized for particular types of analyses, depending upon
the client applications.
• The contents of your data warehouse depend on the requirements of your users. They should be able to tell you what type of data they want to view and at what levels of aggregation they want to view it.
A data warehouse stores these types of data:
• Historical data
• Derived data
• Metadata
METADATA: Metadata is data about data. It shows basic information about data, which can make finding and working with specific instances of data easier. Metadata also improves the accuracy of searching for and operating on data within a large collection of data.
Definition of DBMS
• A Database Management System (DBMS) is a software system that is
designed to manage and organize data in a structured manner. It allows users
to create, modify, and query a database, as well as manage the security and
access controls for that database.
• DBMS provides an environment to store and retrieve the data in an efficient
manner.
• A database is a collection of interrelated data that supports efficient retrieval, insertion, and deletion of data, and organizes it in the form of tables, views, schemas, reports, etc.
Key Features of DBMS
• Data storage and retrieval: A DBMS is responsible for
storing and retrieving data from the database, and can
provide various methods for searching and querying
the data.
• Concurrency control: A DBMS provides mechanisms
for controlling concurrent access to the database, to
ensure that multiple users can access the data without
conflicting with each other.
• Data integrity and security: A DBMS provides tools for
enforcing data integrity and security constraints, such
as constraints on the values of data and access controls
that restrict who can access the data.
• Backup and recovery: A DBMS provides mechanisms
for backing up and recovering the data in the event of a
system failure.
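As a concrete illustration of these features, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for a full DBMS. The table, column, and file names (students, marks.db, marks_backup.db) are assumptions made for the example, not part of the slides.

```python
import sqlite3

con = sqlite3.connect("marks.db")
cur = con.cursor()

# Data storage and integrity: a schema with constraints on the values of data.
cur.execute("""
    CREATE TABLE IF NOT EXISTS students (
        roll_no INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        marks   INTEGER CHECK (marks BETWEEN 0 AND 100)
    )
""")
cur.execute("INSERT OR IGNORE INTO students VALUES (1, 'Asha', 92)")
con.commit()

# Data retrieval: querying the stored data.
for row in cur.execute("SELECT roll_no, name, marks FROM students"):
    print(row)

# Backup and recovery: copy the live database into a backup file.
backup = sqlite3.connect("marks_backup.db")
con.backup(backup)
backup.close()
con.close()
```

Concurrency control is handled internally by the engine (SQLite uses database-level locking), so it is not shown explicitly here.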
• DBMS can be classified into two types: Relational
Database Management System (RDBMS) and Non-
Relational Database Management System (NoSQL or
Non-SQL)
• RDBMS: Data is organized in the form of tables and
each table has a set of rows and columns. The data are
related to each other through primary and foreign
keys.
• NoSQL: Data is organized as key-value pairs, documents, graphs, or column-oriented stores. These systems are designed to handle large-scale, high-performance scenarios.
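To make the contrast concrete, here is a hedged sketch using an assumed customers/orders example of our own: the same data modelled relationally (tables linked by primary and foreign keys) and as a NoSQL-style document.

```python
import json
import sqlite3

# RDBMS style: two tables related through primary and foreign keys.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        amount      REAL
    );
    INSERT INTO customers VALUES (1, 'Asha');
    INSERT INTO orders VALUES (101, 1, 499.0);
""")
print(con.execute(
    "SELECT c.name, o.amount FROM customers c "
    "JOIN orders o ON o.customer_id = c.id").fetchall())

# NoSQL (document) style: the related data nested inside one self-contained record.
document = {"_id": 1, "name": "Asha",
            "orders": [{"id": 101, "amount": 499.0}]}
print(json.dumps(document))
```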
Properties of DBMS
• A transaction is a single logical unit of work that accesses and possibly modifies the contents of a
database. Transactions access data using read and write operations. In order to maintain
consistency in a database, before and after the transaction, certain properties are followed. These
are called ACID properties.
Atomicity: A transaction is either completely successful or completely unsuccessful. There is no
partial commit. Each transaction is considered as one unit and either runs to completion or is not
executed at all. It involves the following two operations.
Abort: If a transaction aborts, changes made to the database are not visible.
Commit: If a transaction commits, changes made are visible.
Atomicity is also known as the ‘All or nothing rule’.
Consistency: The database remains in a consistent state before and after a transaction is
committed. It refers to the correctness of a database.
Isolation: Transactions are isolated from each other, so that the changes made by one transaction do not affect other transactions until they are committed. This property ensures that executing transactions concurrently results in a state equivalent to the one that would be reached if they were executed serially in some order.
Durability: Once a transaction is committed, its changes are permanent and cannot be rolled back.
This property ensures that once the transaction has completed execution, the updates and
modifications to the database are stored in and written to disk and they persist even if a system
failure occurs. These updates now become permanent and are stored in non-volatile memory. The
effects of the transaction, thus, are never lost.
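The sketch below illustrates atomicity (all-or-nothing commit or rollback) and durability using Python's sqlite3. The bank-transfer accounts table is an assumed example, not from the slides.

```python
import sqlite3

con = sqlite3.connect("bank.db")
con.execute("CREATE TABLE IF NOT EXISTS accounts (name TEXT PRIMARY KEY, balance REAL)")
con.execute("INSERT OR IGNORE INTO accounts VALUES ('A', 100.0), ('B', 50.0)")
con.commit()

try:
    # One logical unit of work: debit A and credit B.
    con.execute("UPDATE accounts SET balance = balance - 30 WHERE name = 'A'")
    con.execute("UPDATE accounts SET balance = balance + 30 WHERE name = 'B'")
    con.commit()              # Commit: both changes become visible and durable.
except sqlite3.Error:
    con.rollback()            # Abort: neither change is visible ('all or nothing').

print(con.execute("SELECT name, balance FROM accounts").fetchall())
con.close()
```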
Data warehouse Architecture
• A data warehouse is a heterogeneous collection of different data sources organized under a unified schema (a visual or diagrammatic representation). There are two approaches for constructing a data warehouse: the top-down approach and the bottom-up approach.
1. Top-down approach:
• External Sources –
An external source is a source from which data is collected, irrespective of the type of data. The data can be structured, semi-structured, or unstructured.
• Stage Area –
Since the data extracted from external sources does not follow a particular format, it needs to be validated before it can be loaded into the data warehouse. For this purpose, an ETL tool is typically used (a minimal code sketch of this step appears at the end of this top-down list).
– E (Extract): Data is extracted from the external data source.
– T (Transform): Data is transformed into the standard format.
– L (Load): Data is loaded into the data warehouse after being transformed into the standard format.
• Data warehouse –
After cleansing, the data is stored in the data warehouse, which acts as the central repository. This layer stores the metadata, while the actual data is stored in the data marts.
• Note that the data warehouse stores the data in its purest form in this top-down approach.
• Data Marts –
A data mart is also part of the storage component. It stores the information of a particular function of an organization, handled by a single authority. There can be as many data marts in an organization as there are functions. We can also say that a data mart contains a subset of the data stored in the data warehouse.
• Data Mining –
The practice of analyzing the big data present in the data warehouse is data mining. It is used to find hidden patterns present in the database or data warehouse with the help of data mining algorithms.
Inmon defines this approach as: the data warehouse is built first as the central repository for the complete organization, and the data marts are created from it after the complete data warehouse has been built.
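As referenced in the Stage Area bullet above, here is a minimal ETL sketch in Python. The CSV source file, the field names (region, amount), and the target sales table are assumptions for illustration; a production staging area would normally rely on a dedicated ETL tool.

```python
import csv
import sqlite3

def extract(path):
    """E (Extract): pull raw rows from an external source (here, a CSV file)."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(rows):
    """T (Transform): validate rows and convert them into the standard format."""
    clean = []
    for r in rows:
        if r.get("region") and r.get("amount"):   # drop rows that fail validation
            clean.append((r["region"].strip().upper(), float(r["amount"])))
    return clean

def load(rows, con):
    """L (Load): insert the standardized rows into the warehouse table."""
    con.execute("CREATE TABLE IF NOT EXISTS sales (region TEXT, amount REAL)")
    con.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    con.commit()

if __name__ == "__main__":
    warehouse = sqlite3.connect("warehouse.db")
    load(transform(extract("external_sales.csv")), warehouse)
    warehouse.close()
```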
2. Bottom-up approach:
• First, the data is extracted from external sources (same as in the top-down approach).
• Then, the data goes through the staging area (as explained above) and is loaded into data marts instead of the data warehouse.
• A data mart is a data storage system that contains information
specific to an organization's business unit. It contains a small and
selected part of the data that the company stores in a larger storage
system.
• These data marts are then integrated into the data warehouse.
• Kimball describes this approach as: the data marts are created first and provide a thin view for analysis, and the data warehouse is created after the complete set of data marts has been built.
OLAP (Online Analytical Processing) vs. OLTP (Online Transaction Processing)
• Definition: OLAP is well known as an online database query management system; OLTP is well known as an online database modifying system.
• Data source: OLAP consists of historical data from various databases; OLTP consists of only current operational data.
• Method used: OLAP makes use of a data warehouse; OLTP makes use of a standard database management system (DBMS).
• Application: OLAP is subject-oriented, used for data mining, analytics, decision making, etc.; OLTP is application-oriented, used for business tasks.
• Normalization: OLAP tables are not normalized; OLTP tables are normalized.
• Usage of data: OLAP data is used in planning, problem-solving, and decision-making; OLTP data is used to perform day-to-day fundamental operations.
• Task: OLAP provides a multi-dimensional view of different business tasks; OLTP reveals a snapshot of present business tasks.
• Purpose: OLAP serves to extract information for analysis and decision-making; OLTP serves to insert, update, and delete information in the database.
• Volume of data: OLAP stores a large amount of data, typically in TB or PB; OLTP data is relatively small (MB to GB), since historical data is archived.
• Queries: OLAP queries are relatively slow because the amount of data involved is large, and may take hours; OLTP queries are very fast, as they operate on only a small fraction of the data.
• Update: the OLAP database is not often updated, so data integrity is unaffected; in OLTP, data integrity constraints must be maintained.
• Backup and recovery: OLAP only needs backups from time to time compared to OLTP; in OLTP, the backup and recovery process is maintained rigorously.
• Processing time: OLAP processing of complex queries can take a long time; OLTP is comparatively fast because its queries are simple and straightforward.
• Types of users: OLAP data is generally used by CEOs, MDs, and GMs; OLTP data is managed by clerks and managers.
• Operations: OLAP involves mostly read operations and rarely writes; OLTP involves both read and write operations.
• Updates: in OLAP, data is refreshed on a regular basis through lengthy, scheduled batch operations; in OLTP, updates are initiated by the user and are brief and quick.
• Nature of audience: OLAP is market-oriented; OLTP is customer-oriented.
• Database design: OLAP design focuses on the subject; OLTP design focuses on the application.
• Productivity: OLAP improves the efficiency of business analysts; OLTP enhances the end user's productivity.
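To illustrate the Queries and Operations rows above, the hedged sketch below contrasts an OLTP-style point update with an OLAP-style aggregation over historical data; the sales table and its columns are assumptions made for the example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT,
                        year INTEGER, amount REAL);
    INSERT INTO sales VALUES (1, 'NORTH', 2023, 120.0),
                             (2, 'NORTH', 2024, 150.0),
                             (3, 'SOUTH', 2024,  90.0);
""")

# OLTP-style: a short, user-initiated write touching a single current record.
con.execute("UPDATE sales SET amount = 155.0 WHERE id = 2")

# OLAP-style: a multi-dimensional aggregation over the historical data.
print(con.execute(
    "SELECT region, year, SUM(amount) FROM sales "
    "GROUP BY region, year").fetchall())
con.close()
```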