Introduction to Data Warehousing
Contents 
• History 
• OLTP vs. OLAP 
• Paradigm Shift 
• Architecture 
• Emerging Technologies 
• Questions
History: Hollerith Cards 
Once upon a time… 
• Reporting was done with data stored on Hollerith cards. 
– A card contained one record. 
– Maximum length of the record was 80 characters. 
– A data file consisted of a stack of cards. 
• Data was “loaded” each time a report was run. 
• Data had no “modeling” per se. There were just sets of records. 
• Programs in languages such as COBOL, RPG, or BASIC would: 
– Read a stack of data cards into memory 
– Loop through records and perform a series of steps on each (e.g. increment a counter or add an amount to a total)
– Send a formatted report to a printer 
• It was difficult to report from multiple record types. 
• Changes to data were implemented by simply adding, removing or replacing cards.
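The card-era report program described above can be sketched in Python. This is a minimal illustration that assumes a hypothetical card layout (account number in columns 1 to 10, amount in cents in columns 11 to 18). Real programs of the period were written in COBOL, RPG, or BASIC, and layouts varied by application.

```python
CARD_WIDTH = 80  # one record per 80-column card


def make_card(account, cents):
    """Build one fixed-width record using a hypothetical layout:
    columns 1-10 = account number, columns 11-18 = amount in cents."""
    return f"{account:<10}{cents:08d}".ljust(CARD_WIDTH)


def run_report(deck):
    """Read every card in order, keep a counter and a running total,
    and return a formatted report line."""
    count = 0
    total_cents = 0
    for card in deck:  # sequential pass over the whole "stack"
        if len(card) != CARD_WIDTH:
            raise ValueError("each card must hold exactly 80 columns")
        total_cents += int(card[10:18])  # add the amount field to a total
        count += 1                       # increment a counter
    return f"RECORDS: {count}  TOTAL: {total_cents / 100:.2f}"


deck = [make_card("A100", 1250), make_card("B200", 499), make_card("A100", 1)]
print(run_report(deck))
```

Every run reloads and rescans the whole deck. Nothing is stored between runs, which is why data had to be "loaded" each time a report was run.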
History: Hollerith Cards 
FACTOIDS 
• Card type: IBM 80-column punched card 
• A/K/A: “Punched Card”, “IBM Card” 
• Size: 7 3⁄8 by 3 1⁄4 inches 
• Thickness: .007 inches (143 cards per inch) 
• Capacity: 80 columns with 12 punch locations each
History: Magnetic Tape 
• Cards were eventually replaced by magnetic tape. 
• Tapes made data easier to load and storage more efficient. 
• Records were stored sequentially, so individual records could not be quickly accessed. 
• Data processing was still very similar to that of cards—load a file into computer memory 
and loop through the records to process. 
• Updating data files was still difficult.
History: Disk Storage 
• The arrival of disk storage revolutionized data storage and access. 
• It was now possible to have a home base for data: a database. 
• Data was always available: online. 
• Direct access replaced sequential access so data could be accessed more quickly. 
• Data stored on disk required new methods for adding, deleting, or updating records. This 
type of processing became known as Online Transaction Processing (OLTP). 
• Reporting from a database became known as Online Analytical Processing (OLAP).
History: Disk Storage 
FACTOIDS 
• Storage Device: IBM 350 disk storage unit 
• Released: 1956 
• Capacity: 5 million 6-bit characters (3.75 megabytes). 
• Disk spin speed: 1200 RPM. 
• Data transfer rate: 8,800 characters per second.
History: Relational Model 
• Around 1970, E.F. Codd developed the relational model. 
• Relational modeling was based on a branch of mathematics called set theory and added 
rigor to the organization and management of data. 
• Relational modeling also improved OLTP (inserting, updating, and deleting data) by 
making these processes more efficient and reducing data anomalies. 
• The relational model also introduced primary keys, foreign keys, referential integrity, 
relational algebra, and a number of other concepts used in modern database systems. 
• It soon became apparent that OLAP and OLTP were best served by different data models. 
• Relational data was often denormalized (redundancy was reintroduced to reduce joins) to support OLAP. 
• Structured Query Language (SQL) became the standard language for relational database operations, supporting both OLAP and OLTP.
History: Relational Model 
FACTOIDS 
• Although E.F. Codd was employed by IBM when he created the relational model, and IBM originated the SQL language (then “SEQUEL”), IBM was not the first vendor to produce a relational database or to use SQL. 
• The first commercially available relational database to use SQL was Oracle V2 (1979), from Relational Software, Inc., which is now Oracle Corporation. 
• SQL has been standardized by the American National Standards Institute (ANSI) and the International Organization for Standardization (ISO).
History: Extracts 1 
• In early relational databases, data was extracted from OLTP systems into denormalized 
extracts for reporting. 
[Diagram: OLTP source → extract → OLAP report]
History: Extracts 2 
• And more extracts... 
[Diagram: several OLTP sources, each feeding its own extracts and OLAP reports]
History: Extracts 3 
• And more extracts...
History: Extracts 4 
• And more extracts...
History: Naturally Evolving Systems 1 
• Naturally evolving systems began to emerge. 
History: Naturally Evolving Systems 2 
• Naturally evolving systems resulted in 
– Poor organization of data 
– Extremely complicated processing requirements 
– Inconsistencies in extract refresh status 
– Inconsistent report results. 
• This created a need for architected systems for analysis and reporting. 
• Instead of multiple extract files, a single source of truth was needed for each data source.
History: Architected Systems 
• Developers began to design architected systems for OLAP data. 
• In the 1980s and 1990s, organizations began to integrate data from multiple sources such as accounts receivable, accounts payable, HR, inventory, and so on. These integrated OLAP databases became known as Enterprise Data Warehouses (EDWs). 
• Over time, methods and techniques for extracting and integrating data into architected systems evolved and became standardized. 
• The term data warehousing is now used to refer to the commonly used architectures, 
methods, and techniques for transforming and integrating data to be used for analysis.
History: An Architected Data Warehouse 
Example of an Architected Data Warehouse 
[Diagram: OLTP sources, Staging, ODS, History, data marts (DM), reports, and data sets]
History: Compare Naturally Evolving System 
Compare: Naturally evolving system  
History: Compare Architected Data Warehouse 
Compare: Architected Data Warehouse  
[Diagram: OLTP sources, Staging, ODS, History, data marts (DM), reports, and data sets]
History: Inmon 
• In the early 1990s, W.H. Inmon published Building the Data Warehouse (ISBN-10: 0471141615). 
• Inmon put together the quickly accumulating knowledge of data warehousing and 
popularized most of the terminology we use today. 
• Data in a data warehouse is extracted from another data source, transformed to make it 
suitable for analysis, and loaded into the data warehouse. This process is often referred to 
as Extract, Transform and Load (ETL). 
• Since data from multiple sources was integrated in most data warehouses, Inmon also 
described the process as Transformation and Integration (T&I). 
• Data in a data warehouse is stored in history tables, which are modeled for fast query performance. 
• The history tables are the source of truth. 
• Data from history is usually extracted into data marts which are used for analysis and 
reporting. 
• Separate data marts are created for each application. There is often redundant data 
across data marts.
History: Inmon 
FACTOIDS 
• W.H. Inmon coined the term data warehouse. 
• W.H. Inmon is recognized by many as the father of data warehousing. 
• W.H. Inmon created the first and most commonly accepted definition of a data 
warehouse: A subject oriented, nonvolatile, integrated, time variant collection of data in 
support of management's decisions. 
• Other firsts of W.H. Inmon: 
– Wrote the first book on data warehousing 
– Wrote the first magazine column on data warehousing 
– Taught the first classes on data warehousing
History: Kimball 1 
• Also in the 1990s, Ralph Kimball published The Data Warehouse Toolkit (ISBN-10: 0471153370), which popularized dimensional modeling. 
• Dimensional modeling is based on the cube concept, a multi-dimensional view of data. 
[Figure: A cube used to represent multi-dimensional data]
• The cube metaphor can only illustrate three dimensions; a dimensional model can have any number of dimensions.
History: Kimball 2 
• Kimball implemented cubes as star schemas, which support querying data along multiple dimensions.
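A star schema can be sketched with Python's built-in `sqlite3` module. A central fact table holds the measures and a foreign key to each dimension, and each dimension table describes one axis of the cube. The table and column names (`fact_sales`, `dim_date`, `dim_product`, `dim_region`) and the sample rows are hypothetical and exist only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: one row per member of each dimension.
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_region  (region_key INTEGER PRIMARY KEY, region TEXT);

-- Fact table: a foreign key to each dimension plus a numeric measure.
CREATE TABLE fact_sales (
    date_key    INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    region_key  INTEGER REFERENCES dim_region(region_key),
    amount      REAL
);

INSERT INTO dim_date    VALUES (1, 2013, 4), (2, 2014, 1);
INSERT INTO dim_product VALUES (1, 'Hardware'), (2, 'Software');
INSERT INTO dim_region  VALUES (1, 'East'), (2, 'West');
INSERT INTO fact_sales  VALUES
    (1, 1, 1, 100.0), (1, 2, 1, 50.0), (2, 1, 2, 75.0), (2, 2, 2, 25.0), (2, 1, 1, 10.0);
""")

# Only these dimension attributes may be used for slicing (guards the f-string SQL).
ALLOWED_ATTRIBUTES = {"d.year", "d.quarter", "p.category", "r.region"}


def sales_by(attribute, region=None):
    """Aggregate the measure along one dimension attribute,
    optionally restricted to a single region."""
    if attribute not in ALLOWED_ATTRIBUTES:
        raise ValueError(f"unsupported attribute: {attribute}")
    sql = f"""
        SELECT {attribute}, SUM(f.amount)
        FROM fact_sales f
        JOIN dim_date d    ON f.date_key = d.date_key
        JOIN dim_product p ON f.product_key = p.product_key
        JOIN dim_region r  ON f.region_key = r.region_key
        WHERE (? IS NULL OR r.region = ?)
        GROUP BY {attribute}
        ORDER BY {attribute}
    """
    return conn.execute(sql, (region, region)).fetchall()


print(sales_by("r.region"))                    # [('East', 160.0), ('West', 100.0)]
print(sales_by("p.category", region="East"))   # [('Hardware', 110.0), ('Software', 50.0)]
```

Every query joins the fact table to its dimensions and groups by whichever dimension attributes are being sliced. This shape is what makes a star schema easy to query along any combination of dimensions.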
History: Kimball 3 
• Kimball’s books do not discuss the relational model in depth, but his dimensional model 
can be explained in relational terms. 
• A star schema is a useful way to store data for quickly slicing and dicing data on multiple 
dimensions. 
• Dimensional modeling and star schema are frequently misunderstood and improperly 
implemented. Queries against incorrectly designed tables in a star schema can skew 
report results. 
• The term OLAP has come to be used specifically to refer to dimensional modeling in many 
marketing materials. 
• Star schemas are implemented as data marts so that they can be queried by users and applications. However, data marts aren’t necessarily star schemas.
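One way an incorrectly designed star schema can skew report results is a dimension table whose join key is not unique. Each fact row then matches several dimension rows and is counted more than once. The sketch below uses `sqlite3` with hypothetical tables and an assumed `is_current` flag. It shows this one failure mode only, not every way a star schema can be misdesigned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_sales (product_code TEXT, amount REAL);
-- Flawed dimension: product P1 appears twice (an old and a current row),
-- but facts are joined on the non-unique product_code.
CREATE TABLE dim_product (product_code TEXT, category TEXT, is_current INTEGER);
INSERT INTO fact_sales  VALUES ('P1', 100.0), ('P2', 40.0);
INSERT INTO dim_product VALUES ('P1', 'Hardware', 0), ('P1', 'Hardware', 1), ('P2', 'Software', 1);
""")

# The correct total, computed from the fact table alone.
true_total = conn.execute("SELECT SUM(amount) FROM fact_sales").fetchone()[0]

# Joining on a non-unique key fans out P1's row and double-counts it.
skewed_total = conn.execute("""
    SELECT SUM(f.amount) FROM fact_sales f
    JOIN dim_product p ON f.product_code = p.product_code
""").fetchone()[0]

# Restricting the dimension to one row per key restores the correct total.
fixed_total = conn.execute("""
    SELECT SUM(f.amount) FROM fact_sales f
    JOIN dim_product p ON f.product_code = p.product_code AND p.is_current = 1
""").fetchone()[0]

print(true_total, skewed_total, fixed_total)  # 140.0 240.0 140.0
```

A common design choice that prevents this is to give each dimension row its own unique key and store that key in the fact table. The join is then guaranteed to match exactly one dimension row.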
History: Kimball 4 
FACTOIDS 
• Ralph Kimball earned a Ph.D. in electrical engineering from Stanford University. 
• Kimball worked at the Xerox Palo Alto Research Center (PARC), where laser printing and Ethernet were invented and where object-oriented programming (Smalltalk) and graphical user interfaces were pioneered. 
• Kimball was a principal designer of the Xerox Star workstation, the first commercial personal computer to use windows, icons, and a mouse.
OLTP vs. OLAP 
Operational Data/OLTP compared with Data Warehouse/OLAP: 
• Data design 
– OLTP: Data is normalized (3NF). 
– OLAP: Data may be normalized or denormalized, or may use dimensional models, application-specific data sets, or other designs. 
• Data changes 
– OLTP: Data is constantly updated. 
– OLAP: Data represents a state at a point in time. Existing data does not change; new data can be added to history. 
• Typical operations 
– OLTP: Selects on small sets of records, and inserts, updates, and deletes of individual records. 
– OLAP: Selects, sorts, groupings, and aggregations of large numbers of records, and inserts of thousands or millions of records. 
• Logging 
– OLTP: All transactions are logged. 
– OLAP: Inserts may not be logged at record level. There normally are no updates or deletes. 
• Performance 
– OLTP: B-tree indexes are used for performance. 
– OLAP: Partitioning and bitmap indexes are used for performance. 
• Development 
– OLTP: Traditional development life cycle. 
– OLAP: Heuristic and agile development. 
• Data origin 
– OLTP: Data is designed for the application. 
– OLAP: Data is taken from some other application. 
• Date range 
– OLTP: The date range of records is limited; old transactions are archived. 
– OLAP: The date range of history tables can span many years.
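The contrast between updating data in place (OLTP) and adding new data to history (OLAP) can be sketched with `sqlite3`. The `patient` and `patient_history` tables and the load dates are hypothetical. The history table is only ever appended to, and that is what makes point-in-time queries possible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- OLTP table: one current row per patient; an update overwrites the old value.
CREATE TABLE patient (patient_id INTEGER PRIMARY KEY, status TEXT);
-- History table: one row per patient per load; rows are only appended.
CREATE TABLE patient_history (load_date TEXT, patient_id INTEGER, status TEXT);
""")


def load_snapshot(load_date):
    """Append the current OLTP state to history (no updates or deletes)."""
    conn.execute(
        "INSERT INTO patient_history SELECT ?, patient_id, status FROM patient",
        (load_date,),
    )


def status_as_of(patient_id, as_of):
    """Point-in-time query: the latest snapshot on or before the given ISO date."""
    row = conn.execute(
        """SELECT status FROM patient_history
           WHERE patient_id = ? AND load_date <= ?
           ORDER BY load_date DESC LIMIT 1""",
        (patient_id, as_of),
    ).fetchone()
    return row[0] if row else None


conn.execute("INSERT INTO patient VALUES (1, 'admitted')")
load_snapshot("2014-01-01")
conn.execute("UPDATE patient SET status = 'discharged' WHERE patient_id = 1")  # old value lost in OLTP
load_snapshot("2014-01-02")

print(status_as_of(1, "2014-01-01"))  # admitted (still recoverable from history)
```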
Paradigm Shift: For Management 
For Management 
• The traditional development life cycle doesn’t work well when building a data warehouse, because there is a discovery process. Agile development works better. 
• OLTP data is designed for a given purpose, but OLAP data is created from data that was designed for some other purpose (not reporting). It is important to evaluate data content before designing applications. 
• OLAP data may not be complete or precise enough for a given application. 
• Data integrated from different sources may be inconsistent. 
– Different code values 
– Different columns 
– Different meaning of column names 
• OLAP data sets tend to be much larger, requiring more resources. 
• Storage, storage, storage…
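The inconsistencies listed above (different code values and different columns) are typically resolved during integration with explicit mappings. This is a minimal Python sketch. The two source extracts, the column names, and the `COLUMN_MAP` and `CODE_MAP` crosswalks are all hypothetical. Unknown codes are flagged rather than dropped, and the original code value is kept alongside the translated one.

```python
# Hypothetical extracts from two source systems describing the same attribute
# with different column names and different code values.
source_a = [{"pat_id": 1, "sex": "M"}, {"pat_id": 2, "sex": "F"}]
source_b = [
    {"patient_no": 7, "gender_cd": 1},
    {"patient_no": 8, "gender_cd": 2},
    {"patient_no": 9, "gender_cd": 9},  # a code with no known meaning
]

# Assumed mappings: source column -> common column, source code -> common code.
COLUMN_MAP = {
    "A": {"pat_id": "source_patient_id", "sex": "gender"},
    "B": {"patient_no": "source_patient_id", "gender_cd": "gender"},
}
CODE_MAP = {
    "A": {"M": "MALE", "F": "FEMALE"},
    "B": {1: "MALE", 2: "FEMALE"},
}


def integrate(source_name, rows):
    """Rename columns and translate codes into one common representation,
    keeping the original code value alongside the translated one."""
    integrated = []
    for row in rows:
        renamed = {COLUMN_MAP[source_name][col]: val for col, val in row.items()}
        raw_code = renamed["gender"]
        renamed["gender_source_code"] = raw_code
        # Unknown codes are flagged rather than silently dropped.
        renamed["gender"] = CODE_MAP[source_name].get(raw_code, f"UNMAPPED:{raw_code}")
        renamed["source_system"] = source_name
        integrated.append(renamed)
    return integrated


combined = integrate("A", source_a) + integrate("B", source_b)
print(combined[-1])  # patient 9 is flagged as UNMAPPED:9
```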
Paradigm Shift: For DBAs 
For DBAs 
• Different system configurations (in Oracle, different initialization parameters) 
• Transaction logging may not be used, and methods for recovery from failure are different. 
• Different tuning requirements: 
– Selects return a large percentage of rows (high-cardinality result sets) 
– Massive sorting, grouping and aggregation 
– DML operations can involve thousands or millions of records. 
• Need much more temporary space for caching aggregations, sorts and temporary tables. 
• Need different backup strategies. Backup frequency is based on ETL scheduling instead of 
transaction volume. 
• May be required to add new partitions and archive old partitions in history tables. 
• Storage, storage, storage…
Paradigm Shift: For Architects & Developers 
For Architects and Developers 
• Different logical modeling and schema design. 
• Use indexes differently (e.g. bitmap rather than b-tree) 
• Extensive use of partitioning for history and other large tables 
• Different tuning requirements 
– Selects return a large percentage of rows (high-cardinality result sets) 
– Lots of sorting, grouping and aggregation 
– DML operations can involve thousands or millions of records. 
• ETL processes are different from typical DML processes 
– Use different coding techniques 
– Use packages, functions, and stored procedures but rarely use triggers or constraints 
– Many steps to a process 
– Integrate data from multiple sources 
• Iterative and incremental development process (agile development)
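The idea behind a bitmap index can be sketched in a few lines of Python. The index keeps one bitset per distinct column value, and bit i is set when row i has that value. Predicates on several columns are then answered with bitwise AND/OR, which suits low-cardinality columns (few distinct values) in large tables. This is a conceptual sketch with hypothetical data. Production implementations such as Oracle's add compression and many other details not shown here.

```python
from collections import defaultdict


class BitmapIndex:
    """Minimal bitmap index: one bitset (a Python int) per distinct value.
    Bit i is set in the bitmap for value v when row i has value v."""

    def __init__(self, values):
        self.bitmaps = defaultdict(int)
        for row_id, value in enumerate(values):
            self.bitmaps[value] |= 1 << row_id

    def rows_equal(self, value):
        """Bitmap of rows where the column equals value (0 if none)."""
        return self.bitmaps.get(value, 0)


def row_ids(bitmap):
    """Decode a bitmap back into the list of matching row positions."""
    ids = []
    row_id = 0
    while bitmap:
        if bitmap & 1:
            ids.append(row_id)
        bitmap >>= 1
        row_id += 1
    return ids


region = ["East", "West", "East", "North", "West", "East"]
status = ["open", "open", "closed", "open", "closed", "open"]
region_idx = BitmapIndex(region)
status_idx = BitmapIndex(status)

# WHERE region = 'East' AND status = 'open'  ->  AND of two bitmaps
matches = row_ids(region_idx.rows_equal("East") & status_idx.rows_equal("open"))
print(matches)  # [0, 5]
```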
Paradigm Shift: For Analysts and Data Users 
For Analysts and Data Users—All Good News 
• A custom schema (data mart) can be created for each application per the user requirements. 
• Data marts can be permanent, temporary, generalized or project-specific. 
• New data marts can be created quickly—typically in days instead of weeks or months. 
• Data marts can easily be refreshed when new data is added to the data warehouse. Data 
mart refreshes can be scheduled or on demand. 
• In addition to parameterized queries and SQL, there may be additional query tools and 
dashboards (e.g. Business Intelligence, Self-Service BI, data visualization, etc.). 
• Several years of history can be maintained in a data warehouse—bigger samples. 
• There is a consistent single source of truth for any given data set.
Architecture: Main Components 
Components of a Data Warehouse 
[Diagram: Operational Data (OLTP sources) → ETL → Data Warehouse (Staging, History, REF, ODS) → ETL → OLAP Data (data marts (DM), reports, data sets)]
Architecture: Staging and ODS 
Staging and ODS 
• New data is initially loaded into staging so that it can be processed into the data warehouse. 
[Diagram: data warehouse components (OLTP, Staging, History, REF, ODS, data marts, reports, data sets)]
• Many options are available for getting operational data from internal or external sources into the staging area: 
– SQL*Loader 
– imp/exp/impdp/expdp 
– Change Data Capture (CDC) 
– Replication via materialized views 
– Third-party ETL tools 
• Staging contains a snapshot in time of operational data. 
• An Operational Data Store (ODS) is an optional 
component that is used for near real time reporting. 
• Transformation and integration of data in an ODS 
is limited. 
• Less history (shorter time span) is kept in an ODS.
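The flow from operational data through staging into history can be sketched with two `sqlite3` connections standing in for the source system and the warehouse. All table names (`orders`, `stg_orders`, `hist_orders`) and the transformation rules (trimming and upper-casing names, converting cents to currency units) are hypothetical examples of "transform" work. In practice the extract step would use one of the tools listed above.

```python
import sqlite3
from datetime import date

# Hypothetical source and warehouse; in practice these are separate systems.
source = sqlite3.connect(":memory:")
warehouse = sqlite3.connect(":memory:")

source.executescript("""
CREATE TABLE orders (order_id INTEGER, customer TEXT, amount_cents INTEGER);
INSERT INTO orders VALUES (1, ' acme ', 1999), (2, 'Globex', 500);
""")
warehouse.executescript("""
CREATE TABLE stg_orders  (order_id INTEGER, customer TEXT, amount_cents INTEGER);
CREATE TABLE hist_orders (load_date TEXT, order_id INTEGER, customer TEXT, amount REAL);
""")


def extract_to_staging():
    """Extract: replace staging with a fresh snapshot of the source table."""
    rows = source.execute("SELECT order_id, customer, amount_cents FROM orders").fetchall()
    with warehouse:  # commit on success, roll back on error
        warehouse.execute("DELETE FROM stg_orders")
        warehouse.executemany("INSERT INTO stg_orders VALUES (?, ?, ?)", rows)


def transform_and_load(load_date):
    """Transform staged rows and append them to history, stamped with the load date."""
    staged = warehouse.execute(
        "SELECT order_id, customer, amount_cents FROM stg_orders"
    ).fetchall()
    cleaned = [
        (load_date, order_id, customer.strip().upper(), amount_cents / 100)
        for order_id, customer, amount_cents in staged
    ]
    with warehouse:
        warehouse.executemany("INSERT INTO hist_orders VALUES (?, ?, ?, ?)", cleaned)


extract_to_staging()
transform_and_load(date(2014, 1, 1).isoformat())
print(warehouse.execute("SELECT * FROM hist_orders ORDER BY order_id").fetchall())
```

Staging is overwritten on each run because it only needs to hold the current snapshot. History is appended to because it is the long-term record.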
Architecture: History and Reference Data 
History and Reference Data 
[Diagram: data warehouse components (OLTP, Staging, History, REF, ODS, data marts, reports, data sets)]
• History includes all source data—no 
exclusions or integrity constraints. 
• Partitioning is used to: 
– manage extremely large tables 
– improve query performance 
– facilitate a “rolling window” of history. 
• Denormalization can be used to 
reduce number of joins when 
selecting data from history. 
• No surrogate keys—maintain all 
original code values in history. 
• Reference data should also have history (e.g. ICD-9 codes that change over time).
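The "rolling window" use of partitioning can be modeled in plain Python. A history table is partitioned by month, a new partition is added when a new month arrives, and the oldest partition is archived as a whole once a retention limit is exceeded. The class, the monthly granularity, and the three-month retention are illustrative assumptions. A real database performs these steps with partition DDL rather than application code.

```python
from datetime import date


class RollingWindowTable:
    """Toy model of a history table partitioned by (year, month).

    Archiving removes a whole partition at once instead of deleting
    rows one by one, which is the point of partitioning history."""

    def __init__(self, retention_months):
        self.retention_months = retention_months
        self.partitions = {}  # (year, month) -> list of rows

    def insert(self, row_date, row):
        """Add a row (creating its partition if needed) and return any
        partitions archived to stay within the retention window."""
        key = (row_date.year, row_date.month)
        self.partitions.setdefault(key, []).append(row)
        archived = []
        while len(self.partitions) > self.retention_months:
            oldest = min(self.partitions)  # tuples sort chronologically
            archived.append((oldest, self.partitions.pop(oldest)))
        return archived

    def query_month(self, year, month):
        """Partition pruning: only the matching partition is read."""
        return self.partitions.get((year, month), [])


history = RollingWindowTable(retention_months=3)
for month in (1, 2, 3):
    history.insert(date(2014, month, 15), f"row-{month}")
archived = history.insert(date(2014, 4, 1), "row-4")  # pushes January out
print(archived)  # [((2014, 1), ['row-1'])]
```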
Architecture: Data Marts 
Data Marts 
• Data marts are designed per the requirements of users and applications. 
[Diagram: data warehouse components (OLTP, Staging, History, REF, ODS, data marts, reports, data sets)]
• Selection criteria (conditions in WHERE 
clause) are applied when creating data 
marts. 
• Logical data modeling is applied at data 
mart level (e.g. denormalized, star 
schemas, analytic data sets, etc.). 
• Integrity constraints can be applied at data 
mart level. 
• Any surrogate keys can be applied at data 
mart level (e.g. patient IDs). 
• Data marts can be Oracle, SQL Server, 
text files, SAS data sets, etc. 
• Data marts can be permanent or 
temporary for ongoing or one-time 
applications. 
• Data mart refreshes can be scheduled or 
on demand.
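Several of the points above can be shown together in a small `sqlite3` sketch: selection criteria applied in a WHERE clause, integrity constraints (`NOT NULL`) at data mart level, surrogate patient keys replacing the original identifiers, and a repeatable refresh. The tables, the MRN values, and the ICD-9 style diagnosis codes are hypothetical. History keeps the original identifiers and all rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- History keeps original identifiers and all rows (no exclusions).
CREATE TABLE hist_visits (mrn TEXT, visit_date TEXT, dx_code TEXT);
INSERT INTO hist_visits VALUES
    ('MRN-0042', '2013-05-01', '250.00'),
    ('MRN-0042', '2013-09-12', '401.9'),
    ('MRN-0077', '2013-07-03', '250.00'),
    ('MRN-0099', '2012-11-20', '250.00');

-- Surrogate key map kept at data mart level (INTEGER PRIMARY KEY assigns 1, 2, ...).
CREATE TABLE dm_patient_key (patient_key INTEGER PRIMARY KEY, mrn TEXT UNIQUE);

-- Data mart with integrity constraints; MRNs are replaced by surrogate keys.
CREATE TABLE dm_diabetes_2013 (patient_key INTEGER NOT NULL, visit_date TEXT NOT NULL);
""")

# Hypothetical selection criteria for this mart.
CRITERIA = "dx_code = ? AND visit_date BETWEEN ? AND ?"
PARAMS = ("250.00", "2013-01-01", "2013-12-31")


def build_mart():
    """Refresh the mart: apply the criteria and map MRNs to stable surrogate keys."""
    with conn:  # commit on success, roll back on error
        conn.execute("DELETE FROM dm_diabetes_2013")
        # OR IGNORE keeps existing keys stable across refreshes.
        conn.execute(
            f"INSERT OR IGNORE INTO dm_patient_key (mrn) "
            f"SELECT DISTINCT mrn FROM hist_visits WHERE {CRITERIA} ORDER BY mrn",
            PARAMS,
        )
        conn.execute(
            f"INSERT INTO dm_diabetes_2013 (patient_key, visit_date) "
            f"SELECT k.patient_key, h.visit_date "
            f"FROM hist_visits h JOIN dm_patient_key k ON h.mrn = k.mrn "
            f"WHERE {CRITERIA}",
            PARAMS,
        )


build_mart()
rows = conn.execute(
    "SELECT patient_key, visit_date FROM dm_diabetes_2013 ORDER BY visit_date"
).fetchall()
print(rows)
```

The key map persists between refreshes, so rebuilding the mart (scheduled or on demand) does not renumber patients.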
Emerging Technologies 
Emerging technologies that are having an impact on data warehousing 
• Massively Parallel Processing (MPP) 
• In-Memory Databases (IMDB) 
• Unstructured Databases 
• Column-Oriented Databases 
• Database Appliances 
• Data Access Tools 
• Cloud Database Services
Emerging Technologies: MPP 
Massively Parallel Processing (MPP) 
• Data is partitioned over hundreds or even thousands of server nodes. 
• A controller node manages query execution. 
• A query is passed to all nodes simultaneously. 
• Data is retrieved from all nodes and assembled to produce query results. 
• MPP systems will automatically partition and distribute data using their own algorithms. 
Developers and architects need only be concerned with conventional data modeling and DML 
operations. 
• MPP systems make sense for OLAP and data warehousing where queries are on very large 
numbers of records.
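The controller-and-nodes pattern described above (distribute data, send the query to every node, assemble the partial results) can be simulated in Python. Threads stand in for server nodes, and `zlib.crc32` hash distribution on a key is an assumed scheme. As noted above, real MPP systems use their own distribution algorithms and run on separate servers with their own storage. The sample rows are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32

NUM_NODES = 4  # hypothetical cluster size


def distribute(rows, key):
    """Hash-partition rows across nodes by a distribution key."""
    nodes = [[] for _ in range(NUM_NODES)]
    for row in rows:
        node_id = crc32(str(row[key]).encode()) % NUM_NODES
        nodes[node_id].append(row)
    return nodes


def node_partial_sum(node_rows):
    """Work done independently on one node: a local GROUP BY region, SUM(amount)."""
    partial = Counter()
    for row in node_rows:
        partial[row["region"]] += row["amount"]
    return partial


def controller_query(nodes):
    """Controller: run the query on all nodes at once, then merge partial results."""
    with ThreadPoolExecutor(max_workers=NUM_NODES) as pool:
        partials = list(pool.map(node_partial_sum, nodes))
    total = Counter()
    for partial in partials:
        total.update(partial)  # adds counts key by key
    return dict(total)


rows = [
    {"order_id": i, "region": "West" if i % 3 == 0 else "East", "amount": i}
    for i in range(1, 11)
]
nodes = distribute(rows, "order_id")
print(controller_query(nodes))  # East: 37, West: 18
```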
Emerging Technologies: IMDB 
In-Memory Database 
• Data is stored in random access memory (RAM) rather than on disk or SSD. 
• Memory is accessed much more quickly than disk, and there is no seek time. 
• Traditional RDBMS software often uses a memory cache when processing data, but it is 
optimized for limited cache with most data stored on disk. 
• IMDB software has modified algorithms to be optimized to read data from memory. 
• Database replication with failover is typically required because of the volatility of computer 
memory. 
• Cost of RAM has dropped considerably in recent years making IMDB systems more feasible. 
• Microsoft SQL Server has an In-Memory option. Tables must be defined as memory-optimized to use this feature. 
• Oracle has recently announced the upcoming availability of their In-Memory Option.
Emerging Technologies: Unstructured Databases 
Unstructured Databases 
• Unstructured databases (sometimes referred to as NoSQL databases) support vast amounts of text data and extremely fast text searches. 
• Unstructured databases utilize massively parallel processing (MPP) and extensive text 
indexing. 
• Unstructured databases do not fully support relational features such as complex data 
modeling, join operations and referential integrity. However, these databases are evolving to 
incorporate additional relational capabilities. 
• Oracle, Microsoft, and other RDBMS vendors are creating hybrid database systems that 
incorporate unstructured data with relational database systems. 
• Unstructured databases are useful for very fast text searches on very large amounts of data— 
they are generally not useful for complex transaction processing, analyses and informatics.
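The "extensive text indexing" behind fast text search is commonly built on an inverted index, which maps each term to the documents that contain it. This minimal Python sketch assumes a simple lowercase alphanumeric tokenizer and AND semantics for multi-term queries. The sample documents are hypothetical. Production engines add ranking, stemming, compression, and distribution across nodes.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Map each term to the set of document IDs containing it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query):
        """Return IDs of documents containing every query term."""
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))  # copy, not a reference
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


index = InvertedIndex()
index.add(1, "Patient reports chest pain after exercise")
index.add(2, "No chest pain; mild headache")
index.add(3, "Headache and nausea")
print(index.search("chest pain"))  # {1, 2}
```

A lookup touches only the posting sets for the query terms, not every document. That is why such searches stay fast as the volume of text grows.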
Emerging Technologies: Big Data 
FACTOIDS 
• Big data became an issue as early as 1880 with the U.S. Census, which took several years to tabulate with then-existing methods. 
• The term information explosion was first used in the Lawton Constitution, a small-town Oklahoma newspaper, in 1941. 
• The term big data was used for the first time in an article by NASA researchers Michael Cox and David Ellsworth. The article discussed the inability of then-current computer systems to handle increasing amounts of data. 
• Google was a pioneer in creating modern hardware and software solutions for big data. 
• Parkinson’s Law of Data: “Data expands to fill the space available.” 
• 1 exabyte = 1000⁶ bytes = 10¹⁸ bytes = 1000 petabytes = 1 billion gigabytes.
Emerging Technologies: Column-Oriented 
Column-Oriented Databases 
• Data in a typical relational database is organized by row. The row paradigm is used for 
physical storage as well as the logical organization of data. 
• Column-oriented databases physically organize data by column while still being able to present data as rows. 
• Data is stored on disk in blocks. While the row-oriented databases store the contents of a 
row in a block, column-oriented databases store the contents of a column in a block. 
• Each column has row and table identifiers so that columns can be combined to produce rows 
of data in a table. 
• Since most queries select a subset of columns (rather than entire rows), column-oriented 
databases tend to perform much better for analytical processing (e.g. querying a data mart). 
• Microsoft SQL Server and Oracle Exadata have support for column-based data storage.
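The difference between row and column organization can be illustrated in Python with the same small table stored both ways. This is a conceptual sketch with hypothetical data. It uses list position as the row identifier, whereas real column stores keep explicit identifiers, compress each column, and organize the data into disk blocks.

```python
# The same three-column table stored two ways (illustrative data).
rows = [
    (1, "East", 100),
    (2, "West", 250),
    (3, "East", 75),
]


def to_columns(row_table, column_names):
    """Row layout -> column layout: one list per column; list position is the row id."""
    return {name: [row[i] for row in row_table] for i, name in enumerate(column_names)}


columns = to_columns(rows, ["order_id", "region", "amount"])


def sum_amount_row_store(row_table):
    # Every full row is read even though only the amount is needed.
    return sum(row[2] for row in row_table)


def sum_amount_column_store(col_table):
    # Only the 'amount' column is read.
    return sum(col_table["amount"])


def reconstruct_row(col_table, position):
    """Combine the values at one position across columns to present a row."""
    return tuple(values[position] for values in col_table.values())


print(sum_amount_column_store(columns), reconstruct_row(columns, 1))  # 425 (2, 'West', 250)
```

An analytical query that aggregates one or two columns reads only those columns in the column layout. This is the source of the performance advantage for data mart queries described above.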
Emerging Technologies: Appliances 
Database Appliances 
• A database appliance is an integrated, preconfigured package of RDBMS software and 
hardware. 
• The most common type of database appliance is a data warehouse appliance. 
• Most major database vendors, including Microsoft and Oracle, and their hardware partners package and sell database appliances for data warehousing. 
• Data warehouse appliances utilize massively parallel processing (MPP). 
• Database appliances generally do not scale well outside of the purchased configuration. For 
example, you generally don’t add storage to a database appliance. 
• A database appliance reduces the burden of performance tuning. Conversely, database administrators have less flexibility. 
• A database appliance can be a cost-effective solution for data warehousing in many 
situations.
Emerging Technologies: Data Access Tools 
Data Access Tools 
• Business Intelligence (BI) tools allow users to view and access data, create aggregations and 
summaries, create reports, and view dashboards with current data. 
• BI tools typically sit on top of data marts created by the architects and developers. Data marts that support BI are typically star schemas. 
• Newer Self-Service BI tools add additional capabilities such as allowing users to integrate 
multiple data sources and do further analysis on result data sets from previous analyses. 
• Data visualization tools allow users to view data in various graphs. 
• Newer tools allow users to access and analyze data from multiple form factors including 
smart phones and tablets. 
• Data access, BI and data visualization tools do not always provide the capability to perform 
complex analyses or fulfill specific requirements of complex reports (e.g. complex statistical 
analyses or studies submitted to journals). Programming skills are frequently still required.
Emerging Technologies: Cloud Databases 
Cloud Database Services 
• Oracle, Microsoft, and other database vendors offer cloud database services. 
• The cloud service performs all database administrative tasks: 
– Replicate data on multiple servers 
– Make backups 
– Scale growing databases 
– Performance tuning 
• Cloud services can be useful for prototyping and heuristic development. A large commitment 
to hardware purchases and administrative staff can be postponed for later assessment. 
• Cloud services could result in considerable cost savings for some organizations. 
• A cloud hybrid database is one that has database components both on the cloud and on local 
servers. 
• Cloud services may limit administrative options and flexibility vs. having your own DBAs. 
• Cloud services may not meet regulatory requirements for security and storage for some 
industries (e.g. medical data).

Searching The Enterprise Data Lake With Solr - Watch Us Do It!: Presented by...
Lucidworks
 
Introduction to data mining
Introduction to data miningIntroduction to data mining
Introduction to data mining
Samrat Tayade
 
Data warehouse introduction
Data warehouse introductionData warehouse introduction
Data warehouse introduction
Murli Jha
 
Overview of dbms
Overview of dbmsOverview of dbms
Overview of dbms
Dabbal Singh Mahara
 
Emerging Technologies in IT
Emerging Technologies in ITEmerging Technologies in IT
Data warehouse - Nivetha Durganathan
Data warehouse - Nivetha DurganathanData warehouse - Nivetha Durganathan
Data warehouse - Nivetha Durganathan
Nivetha Durganathan
 

Similar to Introduction to Datawarehousing (20)

Database Management & Models
Database Management & ModelsDatabase Management & Models
Database Management & Models
 
lec01.ppt
lec01.pptlec01.ppt
lec01.ppt
 
ITI015En-The evolution of databases (I)
ITI015En-The evolution of databases (I)ITI015En-The evolution of databases (I)
ITI015En-The evolution of databases (I)
 
DATA WAREHOUSING
DATA WAREHOUSINGDATA WAREHOUSING
DATA WAREHOUSING
 
DWIntro.pptx
DWIntro.pptxDWIntro.pptx
DWIntro.pptx
 
IM SEMINAR.pptx
IM SEMINAR.pptxIM SEMINAR.pptx
IM SEMINAR.pptx
 
Whats A Data Warehouse
Whats A Data WarehouseWhats A Data Warehouse
Whats A Data Warehouse
 
data warehousing
data warehousingdata warehousing
data warehousing
 
Koppel, Riding, Pace, and Ockerbloom, "Library Systems & Interoperability: Br...
Koppel, Riding, Pace, and Ockerbloom, "Library Systems & Interoperability: Br...Koppel, Riding, Pace, and Ockerbloom, "Library Systems & Interoperability: Br...
Koppel, Riding, Pace, and Ockerbloom, "Library Systems & Interoperability: Br...
 
Database Technologies
Database TechnologiesDatabase Technologies
Database Technologies
 
The Role of XML in an Information Society with Barry Schaeffer
The Role of XML in an Information Society with Barry SchaefferThe Role of XML in an Information Society with Barry Schaeffer
The Role of XML in an Information Society with Barry Schaeffer
 
Unit 3 part i Data mining
Unit 3 part i Data miningUnit 3 part i Data mining
Unit 3 part i Data mining
 
The Evolution of the Oracle Database - Then, Now and Later (Fontys Hogeschool...
The Evolution of the Oracle Database - Then, Now and Later (Fontys Hogeschool...The Evolution of the Oracle Database - Then, Now and Later (Fontys Hogeschool...
The Evolution of the Oracle Database - Then, Now and Later (Fontys Hogeschool...
 
Data ware housing - Introduction to data ware housing process.
Data ware housing - Introduction to data ware housing process.Data ware housing - Introduction to data ware housing process.
Data ware housing - Introduction to data ware housing process.
 
Searching The Enterprise Data Lake With Solr - Watch Us Do It!: Presented by...
Searching The Enterprise Data Lake With Solr  - Watch Us Do It!: Presented by...Searching The Enterprise Data Lake With Solr  - Watch Us Do It!: Presented by...
Searching The Enterprise Data Lake With Solr - Watch Us Do It!: Presented by...
 
Introduction to data mining
Introduction to data miningIntroduction to data mining
Introduction to data mining
 
Data warehouse introduction
Data warehouse introductionData warehouse introduction
Data warehouse introduction
 
Overview of dbms
Overview of dbmsOverview of dbms
Overview of dbms
 
Emerging Technologies in IT
Emerging Technologies in ITEmerging Technologies in IT
Emerging Technologies in IT
 
Data warehouse - Nivetha Durganathan
Data warehouse - Nivetha DurganathanData warehouse - Nivetha Durganathan
Data warehouse - Nivetha Durganathan
 

More from karunakar81987

Autosys Complete Guide
Autosys Complete GuideAutosys Complete Guide
Autosys Complete Guide
karunakar81987
 
SQLQueries
SQLQueriesSQLQueries
SQLQueries
karunakar81987
 
Frame 12
Frame 12Frame 12
Frame 12
karunakar81987
 
Frame2
Frame2Frame2
Frame
FrameFrame
Atm
AtmAtm
11 atm
11 atm11 atm
Frame relay
Frame relayFrame relay
Frame relay
karunakar81987
 

More from karunakar81987 (8)

Autosys Complete Guide
Autosys Complete GuideAutosys Complete Guide
Autosys Complete Guide
 
SQLQueries
SQLQueriesSQLQueries
SQLQueries
 
Frame 12
Frame 12Frame 12
Frame 12
 
Frame2
Frame2Frame2
Frame2
 
Frame
FrameFrame
Frame
 
Atm
AtmAtm
Atm
 
11 atm
11 atm11 atm
11 atm
 
Frame relay
Frame relayFrame relay
Frame relay
 

Recently uploaded

End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024
Lars Albertsson
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
manishkhaire30
 
Intelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicineIntelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicine
AndrzejJarynowski
 
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
zsjl4mimo
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
vikram sood
 
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdfUdemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Fernanda Palhano
 
Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
Sm321
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
Roger Valdez
 
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
v7oacc3l
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
jerlynmaetalle
 
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
74nqk8xf
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
rwarrenll
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
74nqk8xf
 
University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
soxrziqu
 
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
bopyb
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Aggregage
 
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
Timothy Spann
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
apvysm8
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
Sachin Paul
 

Recently uploaded (20)

End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
 
Intelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicineIntelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicine
 
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
 
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdfUdemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
 
Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
 
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
 
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
 
University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
 
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
 
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
 

Introduction to Datawarehousing

  • 1. Introduction to Data Warehousing
  • 2. Contents
    • History
    • OLTP vs. OLAP
    • Paradigm Shift
    • Architecture
    • Emerging Technologies
    • Questions
  • 3. History: Hollerith Cards
    Once upon a time…
    • Reporting was done with data stored on Hollerith cards.
      – A card contained one record.
      – Maximum length of the record was 80 characters.
      – A data file consisted of a stack of cards.
    • Data was “loaded” each time a report was run.
    • Data had no “modeling” per se. There were just sets of records.
    • Programs in languages such as COBOL, RPG, or BASIC would:
      – Read a stack of data cards into memory
      – Loop through records and perform a series of steps on each (e.g. increment a counter or add an amount to a total)
      – Send a formatted report to a printer
    • It was difficult to report from multiple record types.
    • Changes to data were implemented by simply adding, removing or replacing cards.
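The read, loop, and report pattern described above can be sketched in modern Python. This is a minimal illustration, not period code. The field layout is a hypothetical assumption: customer ID in columns 1–10, name in 11–30, and amount in cents in 31–40. The helper names are also hypothetical.

```python
CARD_WIDTH = 80  # one punched card holds at most 80 characters

def parse_card(card: str) -> dict:
    """Split one card image into fields by fixed column position (hypothetical layout)."""
    if len(card) > CARD_WIDTH:
        raise ValueError("a card holds at most 80 characters")
    card = card.ljust(CARD_WIDTH)
    return {
        "customer_id": card[0:10].strip(),
        "name": card[10:30].strip(),
        "amount_cents": int(card[30:40]),
    }

def run_report(deck: list[str]) -> tuple[int, int, str]:
    """Loop over every record in the deck, accumulate, and format a report line."""
    count = 0
    total_cents = 0
    for card in deck:                          # one card == one record
        record = parse_card(card)
        count += 1                             # increment a counter
        total_cents += record["amount_cents"]  # add an amount to a total
    report = f"RECORDS: {count:>6}  TOTAL: {total_cents / 100:>12.2f}"
    return count, total_cents, report

def make_card(cid: str, name: str, cents: int) -> str:
    """Build a card image in the hypothetical layout above."""
    return f"{cid:<10}{name:<20}{cents:010d}"

deck = [make_card("C001", "ACME SUPPLY", 12550), make_card("C002", "BAKER & SON", 4000)]
count, total, report = run_report(deck)
print(report)  # stands in for "send a formatted report to a printer"
```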
  • 4. History: Hollerith Cards
    FACTOIDS
    • Card type: IBM 80-column punched card
    • A/K/A: “Punched Card”, “IBM Card”
    • Size: 7 3⁄8 by 3 1⁄4 inches
    • Thickness: .007 inches (143 cards per inch)
    • Capacity: 80 columns with 12 punch locations each
  • 5. History: Magnetic Tape
    • Cards were eventually replaced by magnetic tape.
    • Tapes made data easier to load and storage more efficient.
    • Records were stored sequentially, so individual records could not be accessed quickly.
    • Data processing was still very similar to that of cards: load a file into computer memory and loop through the records to process them.
    • Updating data files was still difficult.
  • 6. History: Disk Storage
    • The arrival of disk storage revolutionized data storage and access.
    • It was now possible to have a home base for data: a database.
    • Data was always available: online.
    • Direct access replaced sequential access, so data could be accessed more quickly.
    • Data stored on disk required new methods for adding, deleting, or updating records. This type of processing became known as Online Transaction Processing (OLTP).
    • Reporting from a database became known as Online Analytical Processing (OLAP).
  • 7. History: Disk Storage
    FACTOIDS
    • Storage device: IBM 350 disk storage unit
    • Released: 1956
    • Capacity: 5 million 6-bit characters (3.75 megabytes)
    • Disk spin speed: 1200 RPM
    • Data transfer rate: 8,800 characters per second
  • 8. History: Relational Model
    • In the late 1960s, E.F. Codd developed the relational model, which he published in 1970.
    • Relational modeling was based on a branch of mathematics called set theory and added rigor to the organization and management of data.
    • Relational modeling also improved OLTP (inserting, updating, and deleting data) by making these processes more efficient and reducing data anomalies.
    • The relational model also introduced primary keys, foreign keys, referential integrity, relational algebra, and a number of other concepts used in modern database systems.
    • It soon became apparent that different data models suited OLAP and OLTP.
    • Relational data was often denormalized (stored with redundant, pre-joined data instead of in fully normalized form) to support OLAP.
    • Structured Query Language (SQL) became the standard language for relational database operations, supporting both OLAP and OLTP.
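The key concepts in this slide can be illustrated with a small SQLite sketch using Python's built-in `sqlite3` module. The `customer` and `orders` tables are hypothetical. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`. With enforcement enabled, an order that points at a nonexistent customer is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,          -- primary key
    name        TEXT NOT NULL
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL
                REFERENCES customer(customer_id),  -- foreign key
    amount      REAL NOT NULL
);
""")
conn.execute("INSERT INTO customer VALUES (1, 'Acme Supply')")
conn.execute("INSERT INTO orders VALUES (100, 1, 125.50)")

# Referential integrity: customer 99 does not exist, so this insert fails.
try:
    conn.execute("INSERT INTO orders VALUES (101, 99, 10.00)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print("orphan order rejected:", rejected)
```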
  • 9. History: Relational Model
    FACTOIDS
    • Although E.F. Codd was employed by IBM when he created the relational model, and IBM originated the SQL language (then called “SEQUEL”), IBM was not the first vendor to produce a relational database or to use SQL.
    • The first commercially available SQL-based relational database came from Relational Software, Inc., which is now Oracle Corporation.
    • SQL has been standardized by the American National Standards Institute (ANSI) and the International Organization for Standardization (ISO).
  • 10. History: Extracts 1
    • In early relational databases, data was extracted from OLTP systems into denormalized extracts for reporting.
    [Diagram labels: OLTP Source, Extract, OLAP Report]
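The following sketch illustrates what a denormalized extract looks like. It uses hypothetical tables in SQLite through Python's `sqlite3`. Two normalized OLTP tables are joined into one wide extract table, and a report then reads that table without any joins. In the era described, the extract would often have been a flat file rather than a table. That difference is a simplification made here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized OLTP tables (hypothetical).
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, region TEXT);
CREATE TABLE orders   (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customer VALUES (1, 'Acme Supply', 'West'), (2, 'Baker & Son', 'East');
INSERT INTO orders   VALUES (100, 1, 125.5), (101, 1, 40.0), (102, 2, 75.0);

-- The extract: one wide, denormalized table built by joining the OLTP tables.
-- Customer attributes are repeated on every order row.
CREATE TABLE sales_extract AS
SELECT o.order_id, o.amount, c.customer_id, c.name AS customer_name, c.region
FROM orders AS o
JOIN customer AS c ON c.customer_id = o.customer_id;
""")

# A report now reads the extract directly, with no joins.
for region, total in conn.execute(
    "SELECT region, SUM(amount) FROM sales_extract GROUP BY region ORDER BY region"
):
    print(region, total)
```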
  • 11. History: Extracts 2
    • And more extracts...
    [Diagram: additional OLTP sources, extracts, and OLAP reports]
  • 12. History: Extracts 3
    • And more extracts...
  • 13. History: Extracts 4
    • And more extracts...
  • 14. History: Naturally Evolving Systems 1
    • Naturally evolving systems began to emerge.
  • 15. History: Naturally Evolving Systems 2
    • Naturally evolving systems resulted in:
      – Poor organization of data
      – Extremely complicated processing requirements
      – Inconsistencies in extract refresh status
      – Inconsistent report results
    • This created a need for architected systems for analysis and reporting.
    • Instead of multiple extract files, a single source of truth was needed for each data source.
  • 16. History: Architected Systems
    • Developers began to design architected systems for OLAP data.
    • In the 1980s and 1990s, organizations began to integrate data from multiple sources such as accounts receivable, accounts payable, HR, inventory, and so on. These integrated OLAP databases became known as Enterprise Data Warehouses (EDWs).
    • Over time, methods and techniques for extracting and integrating data into architected systems evolved and became standardized.
    • The term data warehousing now refers to the commonly used architectures, methods, and techniques for transforming and integrating data to be used for analysis.
  • 17. History: An Architected Data Warehouse
    [Diagram: Example of an architected data warehouse. Components shown: OLTP sources, Staging, History, ODS, data marts (DM), data sets, and reports.]
  • 18. History: Compare Naturally Evolving System
    [Diagram: Compare: naturally evolving system]
  • 19. History: Compare Architected Data Warehouse
    [Diagram: Compare: architected data warehouse. Components shown: OLTP sources, Staging, History, ODS, data marts (DM), data sets, and reports.]
  • 20. History: Inmon
    • In the early 1990s, W.H. Inmon published Building the Data Warehouse (ISBN-10: 0471141615).
    • Inmon pulled together the quickly accumulating knowledge of data warehousing and popularized most of the terminology we use today.
    • Data in a data warehouse is extracted from another data source, transformed to make it suitable for analysis, and loaded into the data warehouse. This process is often referred to as Extract, Transform, and Load (ETL).
    • Since most data warehouses integrate data from multiple sources, Inmon also described the process as Transformation and Integration (T&I).
    • Data in a data warehouse is stored in history, which is modeled for fast performance when querying the data.
    • The history tables are the source of truth.
    • Data from history is usually extracted into data marts, which are used for analysis and reporting.
    • Separate data marts are created for each application, so there is often redundant data across data marts.
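Here is a toy sketch of ETL with transformation and integration. It assumes two hypothetical source systems that encode gender and dates differently. The sketch extracts rows from both sources, maps them onto one shared vocabulary and date format, and loads them into a history table. The original code value is kept alongside the standardized one, so nothing from the source is lost. In practice this work usually runs inside the database or in an ETL tool. The plain functions here are only illustrative.

```python
import sqlite3

# Two hypothetical sources (names, codes, and layouts are assumptions).
source_a = [("A-1", "M", "2024-01-05"), ("A-2", "F", "2024-01-06")]  # ISO dates
source_b = [("B-7", 1, "05/01/2024"), ("B-8", 2, "07/01/2024")]      # DD/MM/YYYY

def extract():
    """Extract: read raw rows from each source, tagged with their origin."""
    yield from (("A", *row) for row in source_a)
    yield from (("B", *row) for row in source_b)

# Integration rule: map each source's codes onto one shared vocabulary.
GENDER_MAP = {("A", "M"): "male", ("A", "F"): "female",
              ("B", 1): "male", ("B", 2): "female"}

def transform(rows):
    """Transform: standardize codes and dates, keeping the original code too."""
    for src, key, code, event_date in rows:
        if src == "B":  # convert DD/MM/YYYY to ISO YYYY-MM-DD
            day, month, year = event_date.split("/")
            event_date = f"{year}-{month}-{day}"
        yield (src, key, str(code), GENDER_MAP[(src, code)], event_date)

def load(conn, rows):
    """Load: append the transformed rows to a history table."""
    conn.execute("CREATE TABLE IF NOT EXISTS person_history ("
                 "source TEXT, source_key TEXT, gender_code TEXT, "
                 "gender TEXT, event_date TEXT)")
    conn.executemany("INSERT INTO person_history VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, transform(extract()))
for row in conn.execute("SELECT * FROM person_history ORDER BY source_key"):
    print(row)
```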
  • 21. History: Inmon
    FACTOIDS
    • W.H. Inmon coined the term data warehouse.
    • W.H. Inmon is recognized by many as the father of data warehousing.
    • W.H. Inmon created the first and most commonly accepted definition of a data warehouse: a subject-oriented, nonvolatile, integrated, time-variant collection of data in support of management's decisions.
    • Other firsts of W.H. Inmon:
      – Wrote the first book on data warehousing
      – Wrote the first magazine column on data warehousing
      – Taught the first classes on data warehousing
  • 22. History: Kimball 1
    • Also in the 1990s, Ralph Kimball published The Data Warehouse Toolkit (ISBN-10: 0471153370), which popularized dimensional modeling.
    • Dimensional modeling is based on the cube concept, which is a multi-dimensional view of data.
    [Figure: A cube used to represent multi-dimensional data]
    • The cube metaphor can only illustrate three dimensions. A dimensional model can have any number of dimensions.
  • 23. History: Kimball 2
    • Kimball implemented cubes as star schemas, which support querying data in multiple dimensions.
  • 24. History: Kimball 3
    • Kimball’s books do not discuss the relational model in depth, but his dimensional model can be explained in relational terms.
    • A star schema is a useful way to store data for quickly slicing and dicing it on multiple dimensions.
    • Dimensional models and star schemas are frequently misunderstood and improperly implemented. Queries against incorrectly designed tables in a star schema can skew report results.
    • In many marketing materials, the term OLAP has come to refer specifically to dimensional modeling.
    • Star schemas are implemented as data marts so that they can be queried by users and applications. However, data marts aren’t necessarily star schemas.
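A star schema can be explained in relational terms, as shown in the minimal SQLite sketch below. All table names, columns, and rows are hypothetical. A central fact table holds the measures plus one foreign key per dimension. A query "slices" on one dimension value (year 2024) and "dices" the result by two other dimensions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: the "by" axes of the cube.
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_store   (store_key INTEGER PRIMARY KEY, region TEXT);
-- Fact table: measures plus one foreign key per dimension.
CREATE TABLE fact_sales (
    date_key    INTEGER REFERENCES dim_date,
    product_key INTEGER REFERENCES dim_product,
    store_key   INTEGER REFERENCES dim_store,
    amount      REAL
);
INSERT INTO dim_date    VALUES (1, 2023, 'Q4'), (2, 2024, 'Q1');
INSERT INTO dim_product VALUES (10, 'Hardware'), (11, 'Software');
INSERT INTO dim_store   VALUES (100, 'East'), (101, 'West');
INSERT INTO fact_sales  VALUES
    (1, 10, 100, 50.0), (1, 11, 101, 20.0),
    (2, 10, 101, 30.0), (2, 11, 100, 40.0), (2, 11, 101, 10.0);
""")

# Slice: year = 2024. Dice: group by product category and store region.
query = """
SELECT p.category, s.region, SUM(f.amount) AS total
FROM fact_sales AS f
JOIN dim_date    AS d ON d.date_key    = f.date_key
JOIN dim_product AS p ON p.product_key = f.product_key
JOIN dim_store   AS s ON s.store_key   = f.store_key
WHERE d.year = 2024
GROUP BY p.category, s.region
ORDER BY p.category, s.region
"""
for row in conn.execute(query):
    print(row)
```

Every query follows the same shape: join the fact table to the dimensions it needs, filter, then group. This regularity is part of what makes star schemas convenient for slicing and dicing.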
  • 25. History: Kimball 4
    FACTOIDS
    • Ralph Kimball earned a Ph.D. in electrical engineering from Stanford University.
    • Kimball worked at the Xerox Palo Alto Research Center (PARC), where laser printing, Ethernet, the Smalltalk object-oriented programming language, and the graphical user interface were developed.
    • Kimball was a principal designer of the Xerox Star Workstation, the first commercial personal computer to use windows, icons, and a mouse.
  • 26. OLTP vs. OLAP

| Operational Data/OLTP | Data Warehouse/OLAP |
|---|---|
| Data is normalized (3NF). | Data may be normalized, denormalized, use dimensional models, application-specific data sets, or other designs. |
| Data is constantly updated. | Data represents a state at a point in time. Existing data does not change. New data can be added to history. |
| Typical operations are selects on small sets of records, and inserts, updates, and deletes of individual records. | Typical operations are selects, sorts, groupings, and aggregations of large numbers of records, and inserts of thousands or millions of records. |
| All transactions are logged. | Inserts may not be logged at record level. There normally are no updates or deletes. |
| B-tree indexes are used for performance. | Partitioning and bitmap indexes are used for performance. |
| Traditional development life cycle | Heuristic and agile development |
| Data is designed for the application. | Data is taken from some other application. |
| Date range of records is limited; old transactions are archived. | Date range of history tables can be many years. |
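The second row of the table contrasts updating data in place with adding point-in-time states to history. The minimal SQLite sketch below illustrates that contrast. The `account` and `account_history` tables and the "nightly load" are hypothetical. The OLTP table only ever shows the current balance. The history table is insert-only, so an earlier state can still be queried.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account (account_id INTEGER PRIMARY KEY, balance REAL);          -- OLTP
CREATE TABLE account_history (account_id INTEGER, balance REAL, as_of TEXT);  -- warehouse
INSERT INTO account VALUES (1, 100.0);
""")

def nightly_load(conn, as_of):
    """Append the current OLTP state to history; existing history rows never change."""
    conn.execute(
        "INSERT INTO account_history SELECT account_id, balance, ? FROM account",
        (as_of,),
    )

nightly_load(conn, "2024-01-01")
conn.execute("UPDATE account SET balance = 250.0 WHERE account_id = 1")  # OLTP: update in place
nightly_load(conn, "2024-01-02")
conn.commit()

current = conn.execute("SELECT balance FROM account WHERE account_id = 1").fetchone()[0]
as_of_jan1 = conn.execute(
    "SELECT balance FROM account_history WHERE account_id = 1 AND as_of = ?",
    ("2024-01-01",),
).fetchone()[0]
print(current, as_of_jan1)
```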
  • 27. Paradigm Shift: For Management
    • The traditional development life cycle doesn’t work well when building a data warehouse. There is a discovery process, so agile development works better.
    • OLTP data was designed for a given purpose, but OLAP data is created from data that was designed for some other purpose, not reporting. It is important to evaluate data content before designing applications.
    • OLAP data may not be complete or precise enough for the application.
    • Data integrated from different sources may be inconsistent:
      – Different code values
      – Different columns
      – Different meanings for the same column names
    • OLAP data sets tend to be much larger, requiring more resources.
    • Storage, storage, storage…
  • 28. Paradigm Shift: For DBAs
    • Different system configurations (in Oracle, different initialization parameters)
    • Transaction logging may not be used, and methods for recovery from failure are different.
    • Different tuning requirements:
      – Selects are high cardinality (large percentage of rows)
      – Massive sorting, grouping, and aggregation
      – DML operations can involve thousands or millions of records.
    • Much more temporary space is needed for caching aggregations, sorts, and temporary tables.
    • Different backup strategies are needed. Backup frequency is based on ETL scheduling instead of transaction volume.
    • DBAs may be required to add new partitions and archive old partitions in history tables.
    • Storage, storage, storage…
  • 29. Paradigm Shift: For Architects & Developers
    • Different logical modeling and schema design
    • Indexes are used differently (e.g., bitmap rather than B-tree).
    • Extensive use of partitioning for history and other large tables
    • Different tuning requirements:
      – Selects are high cardinality (large percentage of rows)
      – Lots of sorting, grouping, and aggregation
      – DML operations can involve thousands or millions of records.
    • ETL processes are different from typical DML processes:
      – Use different coding techniques
      – Use packages, functions, and stored procedures, but rarely use triggers or constraints
      – Many steps to a process
      – Integrate data from multiple sources
    • Iterative and incremental development process (agile development)
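The toy Python sketch below shows why bitmap indexes suit warehouse-style filtering. It keeps one bitmap per distinct column value, using a Python int as the bit set. A multi-condition WHERE clause then becomes a bitwise AND or OR. Real bitmap indexes, such as Oracle's, use compressed on-disk bitmaps. This sketch is illustrative only, and the columns and values are hypothetical.

```python
from collections import defaultdict

def build_bitmap_index(values):
    """Map each distinct value to an int whose bit i is set if row i has that value."""
    index = defaultdict(int)
    for row_id, value in enumerate(values):
        index[value] |= 1 << row_id
    return dict(index)

def rows_from_bitmap(bitmap):
    """Decode a bitmap back into a sorted list of row ids."""
    row_ids, i = [], 0
    while bitmap:
        if bitmap & 1:
            row_ids.append(i)
        bitmap >>= 1
        i += 1
    return row_ids

# Low-cardinality columns (few distinct values) are the classic fit for bitmaps.
region = ["East", "West", "East", "North", "West", "East"]
status = ["open", "closed", "closed", "open", "open", "open"]
region_idx = build_bitmap_index(region)
status_idx = build_bitmap_index(status)

# WHERE region = 'East' AND status = 'open' becomes one bitwise AND.
match = region_idx["East"] & status_idx["open"]
print(rows_from_bitmap(match))
```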
  • 30. Paradigm Shift: For Analysts and Data Users
    All good news:
    • A custom schema (data mart) can be created for each application per the user requirements.
    • Data marts can be permanent, temporary, generalized, or project-specific.
    • New data marts can be created quickly, typically in days instead of weeks or months.
    • Data marts can easily be refreshed when new data is added to the data warehouse. Data mart refreshes can be scheduled or on demand.
    • In addition to parameterized queries and SQL, there may be additional query tools and dashboards (e.g., Business Intelligence, Self-Service BI, data visualization).
    • Several years of history can be maintained in a data warehouse, giving bigger samples.
    • There is a consistent single source of truth for any given data set.
  • 31. Architecture: Main Components
    [Diagram: Components of a data warehouse, in three areas (Operational Data, Data Warehouse, OLAP Data) connected by ETL. Components shown: OLTP sources, Staging, History, reference data (REF), ODS, data marts (DM), data sets, and reports.]
  • 32. Architecture: Staging and ODS
    [Diagram: data warehouse components, as on slide 31]
    • New data is initially loaded into staging so that it can be processed into the data warehouse.
    • Many options are available for getting operational data from internal or external sources into the staging area:
      – SQL*Loader
      – imp/exp/impdp/expdp
      – Change Data Capture (CDC)
      – Replication via materialized views
      – Third-party ETL tools
    • Staging contains a snapshot in time of operational data.
    • An Operational Data Store (ODS) is an optional component that is used for near-real-time reporting.
    • Transformation and integration of data in an ODS is limited.
    • Less history (a shorter time span) is kept in an ODS.
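Because staging holds a snapshot in time, one simple way to find what changed is to compare two consecutive snapshots by primary key. The Python sketch below does this. It is a snapshot comparison, not log-based Change Data Capture, which reads changes from the database's transaction logs instead. The customer data and function name are hypothetical.

```python
def diff_snapshots(previous: dict, current: dict):
    """Compare two keyed snapshots of a source table.

    Returns (inserts, updates, deletes): inserts and updates as dicts of
    key -> new row, deletes as a set of keys.
    """
    inserts = {k: v for k, v in current.items() if k not in previous}
    updates = {k: v for k, v in current.items() if k in previous and previous[k] != v}
    deletes = set(previous) - set(current)
    return inserts, updates, deletes

# Yesterday's and today's staging snapshots of a hypothetical customer table.
yesterday = {1: ("Acme Supply", "West"), 2: ("Baker & Son", "East"), 3: ("Cole Ltd", "East")}
today     = {1: ("Acme Supply", "North"), 2: ("Baker & Son", "East"), 4: ("Dunn Inc", "West")}

inserts, updates, deletes = diff_snapshots(yesterday, today)
print("insert:", inserts)
print("update:", updates)
print("delete:", deletes)
```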
  • 33. Architecture: History and Reference Data
    [Diagram: data warehouse components, as on slide 31]
    • History includes all source data, with no exclusions or integrity constraints.
    • Partitioning is used to:
      – Manage extremely large tables
      – Improve query performance
      – Facilitate a “rolling window” of history
    • Denormalization can be used to reduce the number of joins when selecting data from history.
    • No surrogate keys: all original code values are maintained in history.
    • Reference data should also have history (e.g., ICD-9 codes that change over time).
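The "rolling window" idea can be modeled in a few lines of Python. This is a toy, assuming a history table partitioned by month that keeps a fixed number of partitions. The class and its methods are hypothetical. In a real RDBMS, the same effect comes from adding a partition for the new period and dropping or archiving the oldest one in a single operation, rather than deleting rows one by one.

```python
from datetime import date

class RollingHistory:
    """Toy model of a history table partitioned by month (hypothetical).

    At most `window` partitions are kept; when a new month arrives, the
    oldest whole partition is archived in one step.
    """

    def __init__(self, window: int):
        self.window = window
        self.partitions: dict[tuple[int, int], list] = {}

    def insert(self, row_date: date, row) -> list:
        """Add a row to its month's partition; return any archived partitions."""
        key = (row_date.year, row_date.month)
        self.partitions.setdefault(key, []).append(row)  # new month -> new partition
        archived = []
        while len(self.partitions) > self.window:
            oldest = min(self.partitions)  # (year, month) tuples sort chronologically
            archived.append((oldest, self.partitions.pop(oldest)))
        return archived

    def query_month(self, year: int, month: int) -> list:
        """Partition pruning: only the matching month's partition is read."""
        return self.partitions.get((year, month), [])

history = RollingHistory(window=3)
archived = []
for month in range(1, 6):
    archived += history.insert(date(2024, month, 15), f"row-{month}")
print(sorted(history.partitions))
print([key for key, _ in archived])
```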
  • 34. Architecture: Data Marts
    [Diagram: data warehouse components, as on slide 31]
    • Data marts are built per the requirements of users and applications.
    • Selection criteria (conditions in the WHERE clause) are applied when creating data marts.
    • Logical data modeling is applied at the data mart level (e.g., denormalized tables, star schemas, analytic data sets).
    • Integrity constraints can be applied at the data mart level.
    • Any surrogate keys can be applied at the data mart level (e.g., patient IDs).
    • Data marts can be Oracle or SQL Server tables, text files, SAS data sets, etc.
    • Data marts can be permanent or temporary, for ongoing or one-time applications.
    • Data mart refreshes can be scheduled or on demand.
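The minimal SQLite sketch below builds a data mart from history. The history table keeps every row and the original identifiers (a hypothetical MRN column). The mart applies the application's selection criteria in its WHERE clause. It also assigns surrogate patient IDs and carries integrity constraints. All table names, columns, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- History: every source row, original identifiers, no constraints.
CREATE TABLE visit_history (mrn TEXT, visit_date TEXT, dept TEXT, charge REAL);
INSERT INTO visit_history VALUES
    ('MRN-001', '2023-11-02', 'Cardiology', 250.0),
    ('MRN-002', '2024-01-15', 'Cardiology', 300.0),
    ('MRN-001', '2024-02-20', 'Cardiology', 125.0),
    ('MRN-003', '2024-03-01', 'Oncology',   500.0);

-- Mart-level surrogate key map: patient_id exists only in the mart.
CREATE TABLE mart_patient_key (patient_id INTEGER PRIMARY KEY, mrn TEXT UNIQUE);

-- Mart table, with integrity constraints applied at the mart level.
CREATE TABLE mart_cardiology_2024 (
    patient_id INTEGER NOT NULL REFERENCES mart_patient_key(patient_id),
    visit_date TEXT NOT NULL,
    charge     REAL NOT NULL CHECK (charge >= 0)
);
""")

# The application's selection criteria, applied in the WHERE clause.
criteria = ("Cardiology", "2024-01-01")

conn.execute("""
    INSERT OR IGNORE INTO mart_patient_key (mrn)
    SELECT DISTINCT mrn FROM visit_history
    WHERE dept = ? AND visit_date >= ?
""", criteria)
conn.execute("""
    INSERT INTO mart_cardiology_2024 (patient_id, visit_date, charge)
    SELECT k.patient_id, h.visit_date, h.charge
    FROM visit_history AS h
    JOIN mart_patient_key AS k ON k.mrn = h.mrn
    WHERE h.dept = ? AND h.visit_date >= ?
""", criteria)
conn.commit()

for row in conn.execute("SELECT * FROM mart_cardiology_2024 ORDER BY visit_date"):
    print(row)
```

Refreshing the mart would mean re-running the two INSERT statements, on a schedule or on demand, after new data reaches history.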
  • 35. Emerging Technologies
    Emerging technologies that are having an impact on data warehousing:
    • Massively Parallel Processing (MPP)
    • In-Memory Databases (IMDB)
    • Unstructured Databases
    • Column-Oriented Databases
    • Database Appliances
    • Data Access Tools
    • Cloud Database Services
  • 36. Emerging Technologies: MPP
    Massively Parallel Processing (MPP)
    • Data is partitioned over hundreds or even thousands of server nodes.
    • A controller node manages query execution.
    • A query is passed to all nodes simultaneously.
    • Data is retrieved from all nodes and assembled to produce query results.
    • MPP systems automatically partition and distribute data using their own algorithms, so developers and architects can largely focus on conventional data modeling and DML operations.
    • MPP systems make sense for OLAP and data warehousing, where queries run over very large numbers of records.
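The following sketch illustrates the MPP query flow of distributing, scattering, and gathering. It is a single-process simulation, not a real MPP engine. The "nodes" are Python lists, the controller runs each node's work in a thread pool, and the 4-node cluster and hash-based distribution are assumptions. Each node computes a partial COUNT and SUM on its own slice, and the controller merges the partial results.

```python
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32

NUM_NODES = 4  # hypothetical cluster size

def node_for(key: str) -> int:
    """Hash-distribute a row to a node (stable crc32, unlike Python's salted hash())."""
    return crc32(key.encode()) % NUM_NODES

def distribute(rows):
    """Place each (key, amount) row on one node."""
    nodes = [[] for _ in range(NUM_NODES)]
    for key, amount in rows:
        nodes[node_for(key)].append((key, amount))
    return nodes

def local_aggregate(node_rows):
    """Work done by each node on its own slice: a partial COUNT and SUM."""
    return len(node_rows), sum(amount for _, amount in node_rows)

def controller_query(nodes):
    """Controller: send the query to every node at once, then merge the partials."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        partials = list(pool.map(local_aggregate, nodes))
    return sum(c for c, _ in partials), sum(s for _, s in partials)

rows = [(f"order-{i}", i) for i in range(1, 101)]
nodes = distribute(rows)
print(controller_query(nodes))  # (count, total) across all nodes
```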
  • 37. Emerging Technologies: IMDB
    In-Memory Database (IMDB)
    • Data is stored in random access memory (RAM) rather than on disk or SSD.
    • Memory can be accessed much more quickly than disk, eliminating disk seek times.
    • Traditional RDBMS software often uses a memory cache when processing data, but it is optimized for a limited cache with most data stored on disk.
    • IMDB software uses modified algorithms that are optimized for reading data from memory.
    • Database replication with failover is typically required because computer memory is volatile.
    • The cost of RAM has dropped considerably in recent years, making IMDB systems more feasible.
    • Microsoft SQL Server has an In-Memory option. Tables must be defined as memory-optimized to use this feature.
    • Oracle has recently announced the upcoming availability of its In-Memory Option.
  • 38. Emerging Technologies: Unstructured Databases
    Unstructured Databases
    • Unstructured databases (sometimes referred to as NoSQL databases) support vast amounts of text data and extremely fast text searches.
    • Unstructured databases utilize massively parallel processing (MPP) and extensive text indexing.
    • Unstructured databases do not fully support relational features such as complex data modeling, join operations, and referential integrity. However, these databases are evolving to incorporate additional relational capabilities.
    • Oracle, Microsoft, and other RDBMS vendors are creating hybrid database systems that combine unstructured data with relational database systems.
    • Unstructured databases are useful for very fast text searches on very large amounts of data; they are generally not useful for complex transaction processing, analyses, and informatics.
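The "extensive text indexing" behind fast text search is usually some form of inverted index. The minimal Python sketch below maps each term to the set of documents that contain it, so a search intersects small sets instead of scanning every document. The documents and helper names are hypothetical. Real engines add ranking, stemming, and on-disk postings, none of which is shown here.

```python
import re
from collections import defaultdict

def tokenize(text: str) -> list[str]:
    """Lowercase and split into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search_all(index, query: str) -> set[int]:
    """Return ids of documents containing every query term (AND search)."""
    postings = [index.get(term, set()) for term in tokenize(query)]
    return set.intersection(*postings) if postings else set()

docs = {
    1: "Patient reports chest pain after exercise",
    2: "Chest pain resolved; mild shortness of breath",
    3: "Follow-up visit, exercise tolerance improved",
}
index = build_inverted_index(docs)
print(search_all(index, "chest pain"))
print(search_all(index, "exercise"))
```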
  • 39. Emerging Technologies: Big Data FACTOIDS • Big data became an issue as early as 1880 with the U.S. Census, which took several years to tabulate with then-existing methods. • The term information explosion was first used in the Lawton Constitution, a small-town Oklahoma newspaper, in 1941. • The term big data was used for the first time in an article by NASA researchers Michael Cox and David Ellsworth. The article discussed the inability of current computer systems to handle the increasing amounts of data. • Google was a pioneer in creating modern hardware and software solutions for big data. • Parkinson’s Law of Data: “Data expands to fill the space available.” • 1 exabyte = 1000⁶ bytes = 10¹⁸ bytes = 1000 petabytes = 1 billion gigabytes.
  • 40. Emerging Technologies: Column-Oriented Column-Oriented Databases • Data in a typical relational database is organized by row. The row paradigm is used for physical storage as well as for the logical organization of data. • Column-oriented databases physically organize data by column while still being able to present data as rows. • Data is stored on disk in blocks. While row-oriented databases store the contents of a row in a block, column-oriented databases store the contents of a column in a block. • Each column has row and table identifiers so that columns can be combined to produce rows of data in a table. • Since most analytical queries select a subset of columns (rather than entire rows), column-oriented databases tend to perform much better for analytical processing (e.g. querying a data mart). • Microsoft SQL Server and Oracle Exadata support column-based data storage.
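The difference between the two layouts can be sketched with plain Python lists: a row store keeps one record per entry, while a column store keeps one list per column and rebuilds a row by reading the same position from every column. In this simplified sketch the list position plays the role of the row identifier mentioned above; real engines also store data in disk blocks and typically compress each column, which is not modeled here. The function names and sample data are hypothetical.

```python
# Row-oriented layout: each entry holds a complete record.
ORDERS = [
    {"order_id": 1, "customer": "Ann", "region": "east", "amount": 120.0},
    {"order_id": 2, "customer": "Bob", "region": "west", "amount": 80.0},
    {"order_id": 3, "customer": "Cy", "region": "east", "amount": 200.0},
]


def to_columns(rows):
    """Pivot row-oriented records into one list per column (column-oriented layout)."""
    if not rows:
        return {}
    columns = {name: [] for name in rows[0]}
    for row in rows:
        for name, value in row.items():
            columns[name].append(value)
    return columns


def row_at(columns, i):
    """Reassemble row i by reading position i from every column (position acts as row id)."""
    return {name: values[i] for name, values in columns.items()}


def sum_column(columns, name):
    """An analytical query that reads only one column's storage, not whole rows."""
    return sum(columns[name])


if __name__ == "__main__":
    cols = to_columns(ORDERS)
    print(cols["amount"], sum_column(cols, "amount"), row_at(cols, 1))
```

`sum_column` touches a single list, whereas the same total over `ORDERS` would have to visit every field of every record; that is the intuition behind the performance claim on the slide.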
  • 41. Emerging Technologies: Appliances Database Appliances • A database appliance is an integrated, preconfigured package of RDBMS software and hardware. • The most common type of database appliance is a data warehouse appliance. • Most major database vendors, including Microsoft and Oracle, package and sell database appliances for data warehousing together with their hardware partners. • Data warehouse appliances utilize massively parallel processing (MPP). • Database appliances generally do not scale well beyond the purchased configuration. For example, you generally don’t add storage to a database appliance. • The database appliance removes the burden of performance tuning. Conversely, database administrators have less flexibility. • A database appliance can be a cost-effective solution for data warehousing in many situations.
  • 42. Emerging Technologies: Data Access Tools Data Access Tools • Business Intelligence (BI) tools allow users to view and access data, create aggregations and summaries, create reports, and view dashboards with current data. • BI tools typically sit on top of data marts created by architects and developers. Data marts that support BI are typically star schemas. • Newer self-service BI tools add capabilities such as allowing users to integrate multiple data sources and to further analyze result data sets from previous analyses. • Data visualization tools allow users to view data in various kinds of graphs. • Newer tools allow users to access and analyze data from multiple form factors, including smartphones and tablets. • Data access, BI and data visualization tools do not always provide the capability to perform complex analyses or to fulfill the specific requirements of complex reports (e.g. complex statistical analyses or studies submitted to journals). Programming skills are frequently still required.
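To make "star schema" concrete, here is a minimal data mart in an in-memory SQLite database: one fact table of sales amounts with keys into two dimension tables, plus the kind of join-and-aggregate query a BI tool would typically generate behind a summary report. The table names, columns, and data are hypothetical, and real BI tools generate their own SQL.

```python
import sqlite3


def build_sales_mart():
    """Create a tiny star schema: fact_sales references dim_product and dim_date."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT);
        CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER);
        CREATE TABLE fact_sales (
            product_key INTEGER REFERENCES dim_product(product_key),
            date_key INTEGER REFERENCES dim_date(date_key),
            amount REAL
        );
        INSERT INTO dim_product VALUES (1, 'Books'), (2, 'Games');
        INSERT INTO dim_date VALUES (20230101, 2023), (20240101, 2024);
        INSERT INTO fact_sales VALUES
            (1, 20230101, 10.0), (1, 20240101, 15.0),
            (2, 20230101, 40.0), (2, 20240101, 60.0), (2, 20240101, 5.0);
        """
    )
    return conn


def sales_by_category_and_year(conn):
    """Join the fact table to its dimensions and summarize, as a BI report might."""
    return conn.execute(
        """
        SELECT p.category, d.year, SUM(f.amount) AS total
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_key = f.product_key
        JOIN dim_date AS d ON d.date_key = f.date_key
        GROUP BY p.category, d.year
        ORDER BY p.category, d.year
        """
    ).fetchall()


if __name__ == "__main__":
    for category, year, total in sales_by_category_and_year(build_sales_mart()):
        print(category, year, total)
```

Every query follows the same shape (fact table in the middle, one join per dimension), which is part of why star schemas suit tools that build queries automatically.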
  • 43. Emerging Technologies: Cloud Databases Cloud Database Services • Oracle, Microsoft, and other database vendors offer cloud database services. • The cloud service performs all database administrative tasks: – Replicate data on multiple servers – Make backups – Scale growing databases – Tune performance • Cloud services can be useful for prototyping and exploratory development. A large commitment to hardware purchases and administrative staff can be postponed for later assessment. • Cloud services could result in considerable cost savings for some organizations. • A cloud hybrid database is one that has database components both in the cloud and on local servers. • Cloud services may limit administrative options and flexibility compared with having your own DBAs. • Cloud services may not meet regulatory requirements for security and storage in some industries (e.g. medical data).
  • 44. [Architecture diagram. Zones: Operational Data, Data Warehouse, OLAP Data. Components: OLTP, ETL, Staging, History, REF, ODS, DM (data marts), Report, Data set.]
