SlideShare a Scribd company logo
1 of 48
DATA WAREHOUSING Presentation Prepared by: Ankur Chandel
CONTENTS Database and Data Warehousing History of data warehousing Evolution in organization use of data warehouses Data Warehouse Architecture Benefits of data warehousing Strategic uses of data warehousing Disadvantages of data warehouses Data mart Data mining Data mining for decision support Text mining OLAP Data warehousing integration Business intelligence
Database and Data Ware Housing…. The Difference… DWH Constitute Entire Information Base For All Time.. Database Constitute Real Time Information… DWH Supports DM And Business Intelligence. Database Is Used To Running The Business DWH Is How To Run The Business
Which are our lowest/highest margin customers ? Who are my customers and what products are they buying? What is the most effective distribution channel? What product prom--otions have the biggest impact on revenue? Which customers are most likely to go to the competition ? What impact will new products/services  have on revenue and margins? A producer wants to know….
Data, Data everywhereyet ... I can’t find the data I need data is scattered over the network many versions, subtle differences I can’t get the data I need need an expert to get the data I can’t understand the data I found available data poorly documented I can’t use the data I found results are unexpected data needs to be transformed from one form to other
What is a Data Warehouse? A single, complete and consistent store of data obtained from a variety of different sources made available to end users in a what they can understand and use in a business context.
What is Data Warehousing? Information Data A process of transforming data into information and making it available to users in a timely enough manner to make a difference
Data Warehousing -- a process It is a relational or multidimensional database management system designed to support management decision making.  A data warehousing is a copy of transaction data specifically structured for querying and reporting. Technique for assembling and managing data from various sources for the purpose of answering business questions. Thus making decisions that were not previous possible
Data warehousing is … Subject Oriented: Data that gives information about a particular subject instead of about a company's ongoing operations.  Integrated: Data that is gathered into the data warehouse from a variety of sources and merged into a coherent whole.  Time-variant: All data in the data warehouse is identified with a particular time period.  Non-volatile: Data is stable in a data warehouse. More data is added but data is never removed. This enables management to gain a consistent picture of the business. Data warehousing is combining data from multiple and usually varied sources into one comprehensive and easily manipulated database.  Common accessing systems of data warehousing include queries, analysis and reporting.  Because data warehousing creates one database in the end, the number of sources can be anything you want it to be, provided that the system can handle the volume, of course.  The final result, however, is homogeneous data, which can be more easily manipulated.
History of data warehousing The concept of data warehousing dates back to the late 1980s when IBM researchers Barry Devlin and Paul Murphy developed the "business data warehouse".  1960s - General Mills and Dartmouth College, in a joint research project, develop the terms dimensions and facts. 1970s - ACNielsen and IRI provide dimensional data marts for retail sales. 1983 – Tera data introduces a database management system specifically designed for decision support.  1988 - Barry Devlin and Paul Murphy publish the article An architecture for a business and information systems in IBM Systems Journal where they introduce the term "business data warehouse".
OLTP OLTP- ONLINE TRANSACTION PROCESSING Special data organization, access methods and implementation methods are needed to support data warehouse queries (typically multidimensional queries) OLTP systems are tuned for known transactions and workloads while workload is not known a priori in a data warehouse e.g., average amount spent on phone calls between 9AM-5PM in Pune during the month of December
OLTP vs Data Warehouse OLTP Application Oriented Used to run business Detailed data Current up to date Isolated Data Clerical User Few Records accessed at a time (tens) Read/Update Access No data redundancy Database Size     100MB -100 GB Transaction throughput is the performance metric Thousands of users Managed in entirety Warehouse (DSS) Subject Oriented Used to analyze business Summarized and refined Snapshot data Integrated Data Knowledge User (Manager) Large volumes accessed at a time (millions) Mostly Read (Batch Update) Redundancy present Database Size          100 GB - few terabytes Query throughput is the performance metric Hundreds of users Managed by subsets
To summarize ... OLTP Systems are used to “run” a business The Data Warehouse helps to “optimize” the business
Evolution in organizational use of data warehouses Organizations generally start off with relatively simple use of data warehousing. Over time, more sophisticated use of data warehousing evolves. The following general stages of use of the data warehouse can be distinguished: Off line Operational Database  Data warehouses in this initial stage are developed by simply copying the data off an operational system to another server where the processing load of reporting against the copied data does not impact the operational system's performance.  Off line Data Warehouse   Data warehouses at this stage are updated from data in the operational systems on a regular basis and the data warehouse data is stored in a data structure designed to facilitate reporting.  Real Time Data Warehouse  Data warehouses at this stage are updated every time an operational system performs a transaction (e.g. an order or a delivery or a booking.)  Integrated Data Warehouse   Data warehouses at this stage are updated every time an operational system performs a transaction. The data warehouses then generate transactions that are passed back into the operational systems.
Client Client Query & Analysis Warehouse Integration Source Source Source Data Warehouse Architecture Metadata
The data has been selected from various sources and then integrate and store the data in a single and particular format. Data warehousescontain current detailed data, historical detailed data, lightly and highly summarized data, and metadata. Current and historical data are voluminous because they are stored at the highest level of detail. Lightly and highly summarized data are necessary to save processing time when users request them and are readily accessible. Metadataare “data about data”. It is important for designing, constructing, retrieving, and controlling the warehouse data. Technical metadatainclude where the data come from, how the data were changed, how the data are organized, how the data are stored, who owns the data, who is responsible for the data and how to contact them, who can access the data , and the date of last update. Business metadatainclude what data are available, where the data are, what the data mean, how to access the data, predefined reports and queries, and how current the data are.
Business advantages It provides business users with a “customer-centric” view of the company’s heterogeneous data by helping to integrate data from sales, service, manufacturing and distribution, and other customer-related business systems. It provides added value to the company’s customers by allowing them to access better information when data warehousing is coupled with internet technology. It consolidates data about individual customers and provides a repository of all customer contacts for segmentation modeling, customer retention planning, and cross sales analysis. It removes barriers among functional areas by offering a way to reconcile views from multiple areas, thus providing a look at activities that cross functional lines. It reports on trends across multidivisional, multinational operating units, including trends or relationships in areas such as merchandising, production planning etc.
Strategic uses of data warehousing
Disadvantages of data warehouses Data warehouses are not the optimal environment for unstructured data.  Because data must be extracted, transformed and loaded into the warehouse, there is an element of latency in data warehouse data.  Over their life, data warehouses can have high costs. Maintenance costs are high.  Data warehouses can get outdated relatively quickly. There is a cost of delivering suboptimal information to the organization.  There is often a fine line between data warehouses and operational systems. Duplicate, expensive functionality may be developed. Or, functionality may be developed in the data warehouse that, in retrospect, should have been developed in the operational systems and vice versa.
Data Marts A data mart is a scaled down version of a data warehouse that focuses on a particular subject area. A data mart is a subset of an organizational data store, usually oriented to a specific purpose or major data subject, that may be distributed to support business needs.  Data marts are analytical data stores designed to focus on specific business functions for a specific community within an organization.  Usually designed to support the unique business requirements of a specified department or business process Implemented as the first step in proving the usefulness of the technologies to solve business problems Reasons for creating a data mart Easy access to frequently needed data  Creates collective view by a group of users  Improves end-user response time  Ease of creation in less time Lower cost than implementing a full Data warehouse Potential users are more clearly defined than in a full Data warehouse
Information Less IndividuallyStructured History Normalized Detailed DepartmentallyStructured Data Warehouse OrganizationallyStructured More Data From the Data Warehouse to Data Marts
Characteristics of the Departmental Data Mart ,[object Object]
Flexible
Customized by Department
OLAP
Source is departmentally structured data warehouseData mart Data warehouse
Data Mining Data Mining is the process of extracting information from the company's various databases and re-organizing it for purposes other than what the databases were originally intended for.  It provides a means of extracting previously unknown, predictive information from the base of accessible data in data warehouses. Data mining process is different for different organizations depending upon the nature of the data and organization. Data mining tools use sophisticated, automated algorithms to discover hidden patterns, correlations, and relationships among organizational data. Data mining tools are used to predict future trends and behaviors, allowing businesses to make proactive, knowledge driven decisions. For ex: for targeted marketing, data mining can use data on past promotional mailings to identify the targets most likely to maximize the return on the company’s investment in future mailings.
Functions  Classification: It infers the defining characteristics of a certain group Clustering: identifies group of items that share a particular  characteristic Association: identifies relationships between events that occur at one time  Sequencing: similar to association, except that the relationship exists over a period of time Forecasting: estimates future values based on patterns within large sets of data
Characteristics  Data mining tools are needed to extract the buried information “ore”. The “miner” is often an end user, empowered by “data drills” and other power query tools to ask ad hoc questions and get answers quickly, with little or no programming skill. The data mining environment usually has a client/server architecture. Because of the large amounts of data, it is sometimes necessary to use parallel processing for data mining. Data mining tools are easily combined with spreadsheets and other end user software development tools, enabling the mined data to be analyzed and processed quickly and easily. Data mining yields five types of information: associations, sequences, classifications, clusters and forecasting. “Striking it rich” often involves finding unexpected, valuable results.
Common data mining applications
[object Object]
  Data Mining provides the Enterprise with intelligenceData Mining works with DataWarehouse
Data mining for decision support Two capabilities are provided new business opportunities Automated prediction of trends and behavior: for ex, targeted marketing. Automated discovery of previously unknown patterns: for ex, detecting fraudulent credit card transactions and identifying anomalous data representing data entry-keying errors.
Data mining tools IT tools and techniques are used by data miners  Neural computing: It is a machine learning approach by which historical data can be examined for patterns. Intelligent agents: It is the promising approach to retrieve information from the internet or from intranet-based databases. Association analysis: An approach that uses a specialized set of algorithms that sort through large data sets and expresses statistical rules among items.
Text mining  Text mining is the application of data mining to non structured or less structured text files. Operates with less structured information  Frequently focused on document format rather than document content
Text mining helps in…. Find the “hidden” content of documents, including additional useful relationships Relate documents across previously unnoticed divisions (e.g.: discover that customers in two different product divisions have the same characteristics) Group documents by common themes (e.g.: identify all the customers of an insurance firms who have similar complaints and cancel their policies)
To summarize ... OLTP Systems are used to “run” a business The Data Warehouse helps to “optimize” the business
OLAP Online Analytical Processing - coined by EF Codd in 1994 paper contracted by Arbor Software Generally synonymous with earlier terms such as Decisions Support, Business Intelligence, Executive Information System OLAP = Multidimensional Database
OLAP Online analytical processing refers to such end user activities as DSS modelling using spreadsheets and graphics that are done online. OLAP involves many different data items in complex relationships. Objective of OLAP is to analyze complex relationships and look for patterns, trends and exceptions.
OLAP Is FASMI Fast Analysis Shared Multidimensional Information
Strengths of OLAP It is a powerful visualization paradigm It provides fast, interactive response times It is good for analyzing time series It can be useful to find some clusters and outliers Many vendors offer OLAP tools such as brio.com, cognus.com, microstrategy.com etc and it is possible to access an OLAP database from web.
Data warehousing integration ,[object Object]
 Decision making and other tasks:
    CRM, DSS, EISDATA SOURCES (databases) Directuse Data organization ; storage Information Data Warehouse (storage) Use  Direct use        Data visualization  Use of knowledge Use Analytical processing,        Data mining Generate knowledge use  storage Organizational  Knowledge base Purchased knowledge STORAGE
Businesses run on information and the knowledge of how to put that information to use.  Knowledge is not readily available, it is continuously constructed from data and/or information, in a process that may not be simple or easy. The transformation of data into knowledge may be accomplished in several ways  Data collection from various sources stored in simple databases
Data can be processed, organized, and stored in a data warehouse and then analyzed (e.g.) by using analytical processing) by end users for decision support.  Some of the data are converted to information prior to storage in the data warehouse, and some of the data and/or information can be analyzed to generate knowledge.  For example, by using data mining, a process that looks for unknown relationships and patterns in the data, knowledge regarding the impact of advertising on a specific group of customers can be generated. This generated knowledge is stored in an organizational knowledge base, a repository of accumulated corporate knowledge and of purchased knowledge.   The knowledge in the knowledge base can be used to support less experienced and users, or to support complex decision making.   Both the data and the information, at various times during the process, and the knowledge derived at the end of the process, may need to be presented to users.

More Related Content

What's hot

What's hot (20)

NOSQL- Presentation on NoSQL
NOSQL- Presentation on NoSQLNOSQL- Presentation on NoSQL
NOSQL- Presentation on NoSQL
 
Data mining
Data miningData mining
Data mining
 
Data warehousing
Data warehousingData warehousing
Data warehousing
 
What is ETL?
What is ETL?What is ETL?
What is ETL?
 
Big data ppt
Big data pptBig data ppt
Big data ppt
 
Dimensional Modeling
Dimensional ModelingDimensional Modeling
Dimensional Modeling
 
Challenges of Conventional Systems.pptx
Challenges of Conventional Systems.pptxChallenges of Conventional Systems.pptx
Challenges of Conventional Systems.pptx
 
Data Models
Data ModelsData Models
Data Models
 
Data Warehouse Basic Guide
Data Warehouse Basic GuideData Warehouse Basic Guide
Data Warehouse Basic Guide
 
Business intelligence ppt
Business intelligence pptBusiness intelligence ppt
Business intelligence ppt
 
Data mining & data warehousing (ppt)
Data mining & data warehousing (ppt)Data mining & data warehousing (ppt)
Data mining & data warehousing (ppt)
 
Data warehouse architecture
Data warehouse architecture Data warehouse architecture
Data warehouse architecture
 
DATA WAREHOUSING
DATA WAREHOUSINGDATA WAREHOUSING
DATA WAREHOUSING
 
DBMS and its Models
DBMS and its ModelsDBMS and its Models
DBMS and its Models
 
Presentation on Database management system
Presentation on Database management systemPresentation on Database management system
Presentation on Database management system
 
Introduction to Database
Introduction to DatabaseIntroduction to Database
Introduction to Database
 
ER-Model-ER Diagram
ER-Model-ER DiagramER-Model-ER Diagram
ER-Model-ER Diagram
 
DATA WAREHOUSING AND DATA MINING
DATA WAREHOUSING AND DATA MININGDATA WAREHOUSING AND DATA MINING
DATA WAREHOUSING AND DATA MINING
 
Types dbms
Types dbmsTypes dbms
Types dbms
 
Database systems
Database systemsDatabase systems
Database systems
 

Viewers also liked

Data Warehouse Modeling
Data Warehouse ModelingData Warehouse Modeling
Data Warehouse Modelingvivekjv
 
Data Warehousing and Data Mining
Data Warehousing and Data MiningData Warehousing and Data Mining
Data Warehousing and Data Miningidnats
 
Clinical Data Repository vs. A Data Warehouse - Which Do You Need?
Clinical Data Repository vs. A Data Warehouse - Which Do You Need?Clinical Data Repository vs. A Data Warehouse - Which Do You Need?
Clinical Data Repository vs. A Data Warehouse - Which Do You Need?Health Catalyst
 
Data Modeling for Big Data
Data Modeling for Big DataData Modeling for Big Data
Data Modeling for Big DataDATAVERSITY
 
Advanced applications-architecture-threats
Advanced applications-architecture-threatsAdvanced applications-architecture-threats
Advanced applications-architecture-threatsBlueinfy Solutions
 
Spiral model presentation
Spiral model presentationSpiral model presentation
Spiral model presentationSayedFarhan110
 
Ppt evaluation of information retrieval system
Ppt evaluation of information retrieval systemPpt evaluation of information retrieval system
Ppt evaluation of information retrieval systemsilambu111
 
Vivez plus longtemps et mieux avec le m-health
Vivez plus longtemps et mieux avec le m-healthVivez plus longtemps et mieux avec le m-health
Vivez plus longtemps et mieux avec le m-healthOrange Business Services
 
Objets connectés et quantified self 21082013
Objets connectés et quantified self 21082013Objets connectés et quantified self 21082013
Objets connectés et quantified self 21082013Brice Nadin
 
E-HEALTH 2016 - Sierre - Switzerland
E-HEALTH 2016 - Sierre - SwitzerlandE-HEALTH 2016 - Sierre - Switzerland
E-HEALTH 2016 - Sierre - SwitzerlandPascal Cretton
 
IESA - Introduction à la Veille Stratégique Digitale
IESA - Introduction à la Veille Stratégique DigitaleIESA - Introduction à la Veille Stratégique Digitale
IESA - Introduction à la Veille Stratégique DigitaleMedhi Corneille Famibelle*
 
Big Data: Movement, Warehousing, & Virtualization
Big Data: Movement, Warehousing, & VirtualizationBig Data: Movement, Warehousing, & Virtualization
Big Data: Movement, Warehousing, & Virtualizationtervela
 
Data warehousing and data mining
Data warehousing and data miningData warehousing and data mining
Data warehousing and data miningSnehali Chake
 

Viewers also liked (20)

Data Warehouse Modeling
Data Warehouse ModelingData Warehouse Modeling
Data Warehouse Modeling
 
Data Warehousing and Data Mining
Data Warehousing and Data MiningData Warehousing and Data Mining
Data Warehousing and Data Mining
 
Clinical Data Repository vs. A Data Warehouse - Which Do You Need?
Clinical Data Repository vs. A Data Warehouse - Which Do You Need?Clinical Data Repository vs. A Data Warehouse - Which Do You Need?
Clinical Data Repository vs. A Data Warehouse - Which Do You Need?
 
Data Modeling for Big Data
Data Modeling for Big DataData Modeling for Big Data
Data Modeling for Big Data
 
Web Mining
Web Mining Web Mining
Web Mining
 
Advanced applications-architecture-threats
Advanced applications-architecture-threatsAdvanced applications-architecture-threats
Advanced applications-architecture-threats
 
Afaria Overview- Architecture, Scaling, Supported Platforms
Afaria Overview- Architecture, Scaling, Supported PlatformsAfaria Overview- Architecture, Scaling, Supported Platforms
Afaria Overview- Architecture, Scaling, Supported Platforms
 
Spiral model presentation
Spiral model presentationSpiral model presentation
Spiral model presentation
 
Spiral model
Spiral modelSpiral model
Spiral model
 
Web 1.0 2.0-3.0-4.0 Overview
Web 1.0 2.0-3.0-4.0 OverviewWeb 1.0 2.0-3.0-4.0 Overview
Web 1.0 2.0-3.0-4.0 Overview
 
Knowledge Work System
Knowledge Work SystemKnowledge Work System
Knowledge Work System
 
Ppt evaluation of information retrieval system
Ppt evaluation of information retrieval systemPpt evaluation of information retrieval system
Ppt evaluation of information retrieval system
 
Vivez plus longtemps et mieux avec le m-health
Vivez plus longtemps et mieux avec le m-healthVivez plus longtemps et mieux avec le m-health
Vivez plus longtemps et mieux avec le m-health
 
Objets connectés et quantified self 21082013
Objets connectés et quantified self 21082013Objets connectés et quantified self 21082013
Objets connectés et quantified self 21082013
 
E-HEALTH 2016 - Sierre - Switzerland
E-HEALTH 2016 - Sierre - SwitzerlandE-HEALTH 2016 - Sierre - Switzerland
E-HEALTH 2016 - Sierre - Switzerland
 
IESA - Introduction à la Veille Stratégique Digitale
IESA - Introduction à la Veille Stratégique DigitaleIESA - Introduction à la Veille Stratégique Digitale
IESA - Introduction à la Veille Stratégique Digitale
 
IESA - culture digitale - cours 1
IESA - culture digitale - cours 1IESA - culture digitale - cours 1
IESA - culture digitale - cours 1
 
Big Data: Movement, Warehousing, & Virtualization
Big Data: Movement, Warehousing, & VirtualizationBig Data: Movement, Warehousing, & Virtualization
Big Data: Movement, Warehousing, & Virtualization
 
Data warehousing and data mining
Data warehousing and data miningData warehousing and data mining
Data warehousing and data mining
 
IESA culture digitale - cours 2
IESA culture digitale - cours 2IESA culture digitale - cours 2
IESA culture digitale - cours 2
 

Similar to DATA WAREHOUSING

Data warehouse
Data warehouseData warehouse
Data warehouseMR Z
 
Dataware housing
Dataware housingDataware housing
Dataware housingwork
 
Datawarehousing
DatawarehousingDatawarehousing
Datawarehousingwork
 
Data warehouse
Data warehouseData warehouse
Data warehouseRajThakuri
 
Introduction to Data Warehouse
Introduction to Data WarehouseIntroduction to Data Warehouse
Introduction to Data WarehouseSOMASUNDARAM T
 
presentationofism-complete-1-100227093028-phpapp01.pptx
presentationofism-complete-1-100227093028-phpapp01.pptxpresentationofism-complete-1-100227093028-phpapp01.pptx
presentationofism-complete-1-100227093028-phpapp01.pptxvipush1
 
dw_concepts_2_day_course.ppt
dw_concepts_2_day_course.pptdw_concepts_2_day_course.ppt
dw_concepts_2_day_course.pptDougSchoemaker
 
DATAWAREHOUSE MAIn under data mining for
DATAWAREHOUSE MAIn under data mining forDATAWAREHOUSE MAIn under data mining for
DATAWAREHOUSE MAIn under data mining forAyushMeraki1
 
Data warehouse concepts
Data warehouse conceptsData warehouse concepts
Data warehouse conceptsobieefans
 
Top 60+ Data Warehouse Interview Questions and Answers.pdf
Top 60+ Data Warehouse Interview Questions and Answers.pdfTop 60+ Data Warehouse Interview Questions and Answers.pdf
Top 60+ Data Warehouse Interview Questions and Answers.pdfDatacademy.ai
 
Date warehousing concepts
Date warehousing conceptsDate warehousing concepts
Date warehousing conceptspcherukumalla
 
Using Data Lakes to Sail Through Your Sales Goals
Using Data Lakes to Sail Through Your Sales GoalsUsing Data Lakes to Sail Through Your Sales Goals
Using Data Lakes to Sail Through Your Sales GoalsIrshadKhan682442
 
Using Data Lakes to Sail Through Your Sales Goals
Using Data Lakes to Sail Through Your Sales GoalsUsing Data Lakes to Sail Through Your Sales Goals
Using Data Lakes to Sail Through Your Sales GoalsWilliamJohnson288536
 
Using Data Lakes To Sail Through Your Sales Goals
Using Data Lakes To Sail Through Your Sales GoalsUsing Data Lakes To Sail Through Your Sales Goals
Using Data Lakes To Sail Through Your Sales GoalsKevinJohnson667312
 

Similar to DATA WAREHOUSING (20)

Data warehouse
Data warehouseData warehouse
Data warehouse
 
DW 101
DW 101DW 101
DW 101
 
Dataware housing
Dataware housingDataware housing
Dataware housing
 
Datawarehousing
DatawarehousingDatawarehousing
Datawarehousing
 
Data warehouse
Data warehouseData warehouse
Data warehouse
 
Data warehouse
Data warehouseData warehouse
Data warehouse
 
Oracle sql plsql & dw
Oracle sql plsql & dwOracle sql plsql & dw
Oracle sql plsql & dw
 
Introduction to Data Warehouse
Introduction to Data WarehouseIntroduction to Data Warehouse
Introduction to Data Warehouse
 
presentationofism-complete-1-100227093028-phpapp01.pptx
presentationofism-complete-1-100227093028-phpapp01.pptxpresentationofism-complete-1-100227093028-phpapp01.pptx
presentationofism-complete-1-100227093028-phpapp01.pptx
 
Unit 1
Unit 1Unit 1
Unit 1
 
Data Warehouse
Data WarehouseData Warehouse
Data Warehouse
 
Data Warehouse
Data WarehouseData Warehouse
Data Warehouse
 
dw_concepts_2_day_course.ppt
dw_concepts_2_day_course.pptdw_concepts_2_day_course.ppt
dw_concepts_2_day_course.ppt
 
DATAWAREHOUSE MAIn under data mining for
DATAWAREHOUSE MAIn under data mining forDATAWAREHOUSE MAIn under data mining for
DATAWAREHOUSE MAIn under data mining for
 
Data warehouse concepts
Data warehouse conceptsData warehouse concepts
Data warehouse concepts
 
Top 60+ Data Warehouse Interview Questions and Answers.pdf
Top 60+ Data Warehouse Interview Questions and Answers.pdfTop 60+ Data Warehouse Interview Questions and Answers.pdf
Top 60+ Data Warehouse Interview Questions and Answers.pdf
 
Date warehousing concepts
Date warehousing conceptsDate warehousing concepts
Date warehousing concepts
 
Using Data Lakes to Sail Through Your Sales Goals
Using Data Lakes to Sail Through Your Sales GoalsUsing Data Lakes to Sail Through Your Sales Goals
Using Data Lakes to Sail Through Your Sales Goals
 
Using Data Lakes to Sail Through Your Sales Goals
Using Data Lakes to Sail Through Your Sales GoalsUsing Data Lakes to Sail Through Your Sales Goals
Using Data Lakes to Sail Through Your Sales Goals
 
Using Data Lakes To Sail Through Your Sales Goals
Using Data Lakes To Sail Through Your Sales GoalsUsing Data Lakes To Sail Through Your Sales Goals
Using Data Lakes To Sail Through Your Sales Goals
 

Recently uploaded

Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsTechSoup
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfJayanti Pande
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfchloefrazer622
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDThiyagu K
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxGaneshChakor2
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphThiyagu K
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Celine George
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docxPoojaSen20
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Disha Kariya
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104misteraugie
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room servicediscovermytutordmt
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 

Recently uploaded (20)

Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 
Advance Mobile Application Development class 07
Advance Mobile Application Development class 07Advance Mobile Application Development class 07
Advance Mobile Application Development class 07
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docx
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room service
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 

DATA WAREHOUSING

  • 1. DATA WAREHOUSING Presentation Prepared by: Ankur Chandel
  • 2. CONTENTS Database and Data Warehousing History of data warehousing Evolution in organization use of data warehouses Data Warehouse Architecture Benefits of data warehousing Strategic uses of data warehousing Disadvantages of data warehouses Data mart Data mining Data mining for decision support Text mining OLAP Data warehousing integration Business intelligence
  • 3. Database and Data Ware Housing…. The Difference… DWH Constitute Entire Information Base For All Time.. Database Constitute Real Time Information… DWH Supports DM And Business Intelligence. Database Is Used To Running The Business DWH Is How To Run The Business
  • 4. Which are our lowest/highest margin customers ? Who are my customers and what products are they buying? What is the most effective distribution channel? What product prom--otions have the biggest impact on revenue? Which customers are most likely to go to the competition ? What impact will new products/services have on revenue and margins? A producer wants to know….
  • 5. Data, Data everywhereyet ... I can’t find the data I need data is scattered over the network many versions, subtle differences I can’t get the data I need need an expert to get the data I can’t understand the data I found available data poorly documented I can’t use the data I found results are unexpected data needs to be transformed from one form to other
  • 6. What is a Data Warehouse? A single, complete and consistent store of data obtained from a variety of different sources made available to end users in a what they can understand and use in a business context.
  • 7. What is Data Warehousing? Information Data A process of transforming data into information and making it available to users in a timely enough manner to make a difference
  • 8. Data Warehousing -- a process It is a relational or multidimensional database management system designed to support management decision making. A data warehousing is a copy of transaction data specifically structured for querying and reporting. Technique for assembling and managing data from various sources for the purpose of answering business questions. Thus making decisions that were not previous possible
  • 9. Data warehousing is … Subject Oriented: Data that gives information about a particular subject instead of about a company's ongoing operations. Integrated: Data that is gathered into the data warehouse from a variety of sources and merged into a coherent whole. Time-variant: All data in the data warehouse is identified with a particular time period. Non-volatile: Data is stable in a data warehouse. More data is added but data is never removed. This enables management to gain a consistent picture of the business. Data warehousing is combining data from multiple and usually varied sources into one comprehensive and easily manipulated database. Common accessing systems of data warehousing include queries, analysis and reporting. Because data warehousing creates one database in the end, the number of sources can be anything you want it to be, provided that the system can handle the volume, of course. The final result, however, is homogeneous data, which can be more easily manipulated.
  • 10. History of data warehousing The concept of data warehousing dates back to the late 1980s when IBM researchers Barry Devlin and Paul Murphy developed the "business data warehouse". 1960s - General Mills and Dartmouth College, in a joint research project, develop the terms dimensions and facts. 1970s - ACNielsen and IRI provide dimensional data marts for retail sales. 1983 – Tera data introduces a database management system specifically designed for decision support. 1988 - Barry Devlin and Paul Murphy publish the article An architecture for a business and information systems in IBM Systems Journal where they introduce the term "business data warehouse".
  • 11. OLTP OLTP- ONLINE TRANSACTION PROCESSING Special data organization, access methods and implementation methods are needed to support data warehouse queries (typically multidimensional queries) OLTP systems are tuned for known transactions and workloads while workload is not known a priori in a data warehouse e.g., average amount spent on phone calls between 9AM-5PM in Pune during the month of December
  • 12. OLTP vs Data Warehouse OLTP Application Oriented Used to run business Detailed data Current up to date Isolated Data Clerical User Few Records accessed at a time (tens) Read/Update Access No data redundancy Database Size 100MB -100 GB Transaction throughput is the performance metric Thousands of users Managed in entirety Warehouse (DSS) Subject Oriented Used to analyze business Summarized and refined Snapshot data Integrated Data Knowledge User (Manager) Large volumes accessed at a time (millions) Mostly Read (Batch Update) Redundancy present Database Size 100 GB - few terabytes Query throughput is the performance metric Hundreds of users Managed by subsets
  • 13. To summarize ... OLTP Systems are used to “run” a business The Data Warehouse helps to “optimize” the business
  • 14. Evolution in organizational use of data warehouses Organizations generally start off with relatively simple use of data warehousing. Over time, more sophisticated use of data warehousing evolves. The following general stages of use of the data warehouse can be distinguished: Off line Operational Database  Data warehouses in this initial stage are developed by simply copying the data off an operational system to another server where the processing load of reporting against the copied data does not impact the operational system's performance. Off line Data Warehouse  Data warehouses at this stage are updated from data in the operational systems on a regular basis and the data warehouse data is stored in a data structure designed to facilitate reporting. Real Time Data Warehouse  Data warehouses at this stage are updated every time an operational system performs a transaction (e.g. an order or a delivery or a booking.) Integrated Data Warehouse  Data warehouses at this stage are updated every time an operational system performs a transaction. The data warehouses then generate transactions that are passed back into the operational systems.
  • 15. Client Client Query & Analysis Warehouse Integration Source Source Source Data Warehouse Architecture Metadata
  • 16. The data has been selected from various sources and then integrate and store the data in a single and particular format. Data warehousescontain current detailed data, historical detailed data, lightly and highly summarized data, and metadata. Current and historical data are voluminous because they are stored at the highest level of detail. Lightly and highly summarized data are necessary to save processing time when users request them and are readily accessible. Metadataare “data about data”. It is important for designing, constructing, retrieving, and controlling the warehouse data. Technical metadatainclude where the data come from, how the data were changed, how the data are organized, how the data are stored, who owns the data, who is responsible for the data and how to contact them, who can access the data , and the date of last update. Business metadatainclude what data are available, where the data are, what the data mean, how to access the data, predefined reports and queries, and how current the data are.
  • 17. Business advantages It provides business users with a “customer-centric” view of the company’s heterogeneous data by helping to integrate data from sales, service, manufacturing and distribution, and other customer-related business systems. It provides added value to the company’s customers by allowing them to access better information when data warehousing is coupled with internet technology. It consolidates data about individual customers and provides a repository of all customer contacts for segmentation modeling, customer retention planning, and cross sales analysis. It removes barriers among functional areas by offering a way to reconcile views from multiple areas, thus providing a look at activities that cross functional lines. It reports on trends across multidivisional, multinational operating units, including trends or relationships in areas such as merchandising, production planning etc.
  • 18. Strategic uses of data warehousing
  • 19. Disadvantages of data warehouses Data warehouses are not the optimal environment for unstructured data. Because data must be extracted, transformed and loaded into the warehouse, there is an element of latency in data warehouse data. Over their life, data warehouses can have high costs. Maintenance costs are high. Data warehouses can get outdated relatively quickly. There is a cost of delivering suboptimal information to the organization. There is often a fine line between data warehouses and operational systems. Duplicate, expensive functionality may be developed. Or, functionality may be developed in the data warehouse that, in retrospect, should have been developed in the operational systems and vice versa.
  • 20. Data Marts A data mart is a scaled down version of a data warehouse that focuses on a particular subject area. A data mart is a subset of an organizational data store, usually oriented to a specific purpose or major data subject, that may be distributed to support business needs. Data marts are analytical data stores designed to focus on specific business functions for a specific community within an organization. Usually designed to support the unique business requirements of a specified department or business process Implemented as the first step in proving the usefulness of the technologies to solve business problems Reasons for creating a data mart Easy access to frequently needed data Creates collective view by a group of users Improves end-user response time Ease of creation in less time Lower cost than implementing a full Data warehouse Potential users are more clearly defined than in a full Data warehouse
  • 21. Information Less IndividuallyStructured History Normalized Detailed DepartmentallyStructured Data Warehouse OrganizationallyStructured More Data From the Data Warehouse to Data Marts
  • 22.
  • 25. OLAP
  • 26. Source is departmentally structured data warehouseData mart Data warehouse
  • 27. Data Mining Data Mining is the process of extracting information from the company's various databases and re-organizing it for purposes other than what the databases were originally intended for. It provides a means of extracting previously unknown, predictive information from the base of accessible data in data warehouses. Data mining process is different for different organizations depending upon the nature of the data and organization. Data mining tools use sophisticated, automated algorithms to discover hidden patterns, correlations, and relationships among organizational data. Data mining tools are used to predict future trends and behaviors, allowing businesses to make proactive, knowledge driven decisions. For ex: for targeted marketing, data mining can use data on past promotional mailings to identify the targets most likely to maximize the return on the company’s investment in future mailings.
  • 28. Functions Classification: It infers the defining characteristics of a certain group Clustering: identifies group of items that share a particular characteristic Association: identifies relationships between events that occur at one time Sequencing: similar to association, except that the relationship exists over a period of time Forecasting: estimates future values based on patterns within large sets of data
  • 29. Characteristics Data mining tools are needed to extract the buried information “ore”. The “miner” is often an end user, empowered by “data drills” and other power query tools to ask ad hoc questions and get answers quickly, with little or no programming skill. The data mining environment usually has a client/server architecture. Because of the large amounts of data, it is sometimes necessary to use parallel processing for data mining. Data mining tools are easily combined with spreadsheets and other end user software development tools, enabling the mined data to be analyzed and processed quickly and easily. Data mining yields five types of information: associations, sequences, classifications, clusters and forecasting. “Striking it rich” often involves finding unexpected, valuable results.
  • 30. Common data mining applications
  • 31.
  • 32.
  • 33. Data Mining provides the Enterprise with intelligenceData Mining works with DataWarehouse
  • 34. Data mining for decision support Two capabilities are provided new business opportunities Automated prediction of trends and behavior: for ex, targeted marketing. Automated discovery of previously unknown patterns: for ex, detecting fraudulent credit card transactions and identifying anomalous data representing data entry-keying errors.
  • 35. Data mining tools IT tools and techniques are used by data miners Neural computing: It is a machine learning approach by which historical data can be examined for patterns. Intelligent agents: It is the promising approach to retrieve information from the internet or from intranet-based databases. Association analysis: An approach that uses a specialized set of algorithms that sort through large data sets and expresses statistical rules among items.
  • 36. Text mining Text mining is the application of data mining to non structured or less structured text files. Operates with less structured information Frequently focused on document format rather than document content
  • 37. Text mining helps in…. Find the “hidden” content of documents, including additional useful relationships Relate documents across previously unnoticed divisions (e.g.: discover that customers in two different product divisions have the same characteristics) Group documents by common themes (e.g.: identify all the customers of an insurance firms who have similar complaints and cancel their policies)
  • 38. To summarize ... OLTP Systems are used to “run” a business The Data Warehouse helps to “optimize” the business
  • 39. OLAP Online Analytical Processing - coined by EF Codd in 1994 paper contracted by Arbor Software Generally synonymous with earlier terms such as Decisions Support, Business Intelligence, Executive Information System OLAP = Multidimensional Database
  • 40. OLAP Online analytical processing refers to such end user activities as DSS modelling using spreadsheets and graphics that are done online. OLAP involves many different data items in complex relationships. Objective of OLAP is to analyze complex relationships and look for patterns, trends and exceptions.
  • 41.
  • 42. OLAP Is FASMI Fast Analysis Shared Multidimensional Information
  • 43. Strengths of OLAP It is a powerful visualization paradigm It provides fast, interactive response times It is good for analyzing time series It can be useful to find some clusters and outliers Many vendors offer OLAP tools such as brio.com, cognus.com, microstrategy.com etc and it is possible to access an OLAP database from web.
  • 44.
  • 45. Decision making and other tasks:
  • 46. CRM, DSS, EISDATA SOURCES (databases) Directuse Data organization ; storage Information Data Warehouse (storage) Use Direct use Data visualization Use of knowledge Use Analytical processing, Data mining Generate knowledge use storage Organizational Knowledge base Purchased knowledge STORAGE
  • 47. Businesses run on information and the knowledge of how to put that information to use. Knowledge is not readily available, it is continuously constructed from data and/or information, in a process that may not be simple or easy. The transformation of data into knowledge may be accomplished in several ways Data collection from various sources stored in simple databases
  • 48. Data can be processed, organized, and stored in a data warehouse and then analyzed (e.g.) by using analytical processing) by end users for decision support. Some of the data are converted to information prior to storage in the data warehouse, and some of the data and/or information can be analyzed to generate knowledge. For example, by using data mining, a process that looks for unknown relationships and patterns in the data, knowledge regarding the impact of advertising on a specific group of customers can be generated. This generated knowledge is stored in an organizational knowledge base, a repository of accumulated corporate knowledge and of purchased knowledge. The knowledge in the knowledge base can be used to support less experienced and users, or to support complex decision making. Both the data and the information, at various times during the process, and the knowledge derived at the end of the process, may need to be presented to users.
  • 49. Data Warehouse for Decision Support Putting Information technology to help the knowledge worker make faster and better decisions Used to manage and control business Data is historical or point-in-time Optimized for inquiry rather than update Use of the system is loosely defined and can be ad-hoc Used by managers and end-users to understand the business and make judgments
  • 50. Business intelligence and data warehousing
  • 51. Business Intelligence One ultimate use of the data gathered and processed in the data life cycle is for business intelligence. Business intelligence generally involves the creation or use of a data warehouse and/or data mart for storage of data, and the use of front-end analytical tools such as Oracle’s Sales Analyzer and Financial Analyzer or Micro Strategy’s Web. Such tools can be employed by end users to access data, ask queries, request ad hoc (special) reports, examine scenarios, create CRM activities, devise pricing strategies, and much more.
  • 52. How business intelligence works? The process starts with raw data which are usually kept in corporate data bases. For example, a national retail chain that sells everything from grills and patio furniture to plastic utensils had data about inventory, customer information, data about past promotions, and sales numbers in various databases. Though all this information may be scattered across multiple systems-and may seem unrelated-business intelligence software can being it together. This is done by using a data warehouse. In the data warehouse (or mart) tables can be linked, and data cubes are formed. For instance, inventory information is linked to sales numbers and customer databases, allowing for deep analysis of information.
  • 53. Using the business intelligence software the user can ask queries, request ad-hoc reports, or conduct any other analysis. For example, deep analysis can be carried out by performing multilayer queries. Because all the databases are linked, one can search for what products a store has too much of, determine which of these products commonly sell with popular items, bases on previous sales. After planning a promotion to move the excess stock along with the popular products (by bundling them together, for example), one can dig deeper to see where this promotion would be most popular (and most profitable). The results of the request can be reports, predictions, alerts, and/or graphical presentations. These can be disseminated to decision makers to help them in their decision-making tasks.
  • 54. More advanced applications of business intelligence include outputs such as financial modeling budgeting resource allocation and competitive intelligence.