This document provides an overview of business intelligence, data warehousing, data marts, and data mining presented by Mr. Manish Tripathi. It defines business intelligence as a process for analyzing data to support business decisions. Data warehousing is described as a centralized repository for storing historical data from various sources to support analysis and reporting. Data marts are subsets of data warehouses focused on specific business units or teams. Common business intelligence tools and the benefits of these systems are also summarized.
This document provides an introduction to data warehousing fundamentals. It defines a data warehouse as an enterprise repository for subject-oriented, time-variant data used for reporting and analysis. It describes the typical phases of a data warehousing project including strategy, definition, analysis, design, build, population, and evolution. It compares data warehouses to operational databases and data marts. Finally, it discusses extract, transform, load processes, possible reasons for ETL failure, and typical warehousing development tasks.
A data warehouse is a subject-oriented, consolidated collection of integrated data from multiple sources used to support management decision making. It is separate from operational databases and contains historical data for analysis. Data warehouses use a star schema with fact and dimension tables and support online analytical processing (OLAP) for complex analysis and reporting.
A data warehouse consists of several key components:
- Current detail data from operational systems of record which is stored for analysis.
- Integration and transformation programs that convert operational data into a common format for the data warehouse.
- Summarized and archived data used for reporting and analysis over time.
- Metadata that describes the structure and meaning of the data.
Data warehouses are used for standard reporting, queries on summarized data, and data mining of patterns in large datasets to gain business insights.
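The star schema and OLAP-style querying described above can be sketched with a toy example. This is a minimal illustration using SQLite; the table and column names (`fact_sales`, `dim_date`, `dim_product`) are hypothetical, not taken from any of the summarized documents.

```python
import sqlite3

# Hypothetical minimal star schema: one fact table joined to two dimensions.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE fact_sales (
    date_key INTEGER REFERENCES dim_date,
    product_key INTEGER REFERENCES dim_product,
    units INTEGER, revenue REAL
);
""")
con.executemany("INSERT INTO dim_date VALUES (?,?,?)",
                [(1, 2023, 1), (2, 2023, 2)])
con.executemany("INSERT INTO dim_product VALUES (?,?,?)",
                [(10, "Widget", "Hardware"), (11, "Gadget", "Hardware")])
con.executemany("INSERT INTO fact_sales VALUES (?,?,?,?)",
                [(1, 10, 5, 50.0), (1, 11, 3, 90.0), (2, 10, 2, 20.0)])

# Typical analytical query: aggregate facts, grouped by dimension attributes.
rows = con.execute("""
SELECT d.month, p.category, SUM(f.revenue)
FROM fact_sales f
JOIN dim_date d ON f.date_key = d.date_key
JOIN dim_product p ON f.product_key = p.product_key
GROUP BY d.month, p.category
ORDER BY d.month
""").fetchall()
print(rows)  # [(1, 'Hardware', 140.0), (2, 'Hardware', 20.0)]
```

The fact table holds the numeric measures; each dimension table supplies the descriptive attributes that queries group and filter by.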
The document provides information about data warehousing including definitions, how it works, types of data warehouses, components, architecture, and the ETL process. Some key points:
- A data warehouse is a system for collecting and managing data from multiple sources to support analysis and decision-making. It contains historical, integrated data organized around important subjects.
- Data flows into a data warehouse from transaction systems and databases. It is processed, transformed, and loaded so users can access it through BI tools. This allows organizations to analyze customers and business data more holistically.
- The main components of a data warehouse are the load manager, warehouse manager, query manager, and end-user access tools.
This document discusses data warehousing, including the concept, characteristics, development approaches, and administration of data warehouses. It provides definitions of data warehousing and describes key aspects like the extraction, transformation, and loading (ETL) process used to integrate data from multiple sources into the data warehouse. The document also compares the top-down Inmon and bottom-up Kimball models for developing data warehouses and data marts. Effective administration, scalability, and security are identified as important considerations for managing large data warehouses.
This presentation covers the definitions of data warehouse, data, and warehouse; data modeling; data warehouse architecture and its types; and the single-tier, two-tier, and three-tier warehouse types.
A data warehouse is a collection of integrated data from multiple sources organized to support management decision making. It contains subject-oriented, integrated, time-variant and non-volatile data stored in a way that is optimized for query and analysis. There are different types of data warehouses including data marts, operational data stores and enterprise data warehouses. Key components of a data warehouse include data sources, extraction, loading, a comprehensive database, metadata and middleware tools.
Basic Introduction of Data Warehousing from Adiva Consulting (adivasoft)
This document provides an overview of Hyperion Essbase & Planning Training. It discusses key concepts like the transformation of raw data into information, online transaction processing (OLTP) systems, challenges with current data management, and the purpose of data warehousing and data marts. It also covers dimensional modeling best practices, types of fact and dimension tables, and how Essbase is tuned for analysis and provides advantages over traditional databases for analytics.
History, definition, need, attributes, applications of data warehousing ; difference between data mining, big data, database and data warehouse ; future scope
This document discusses data warehousing concepts and technologies. It defines a data warehouse as a subject-oriented, integrated, non-volatile, and time-variant collection of data used to support management decision making. It describes the data warehouse architecture including extract-transform-load processes, OLAP servers, and metadata repositories. Finally, it outlines common data warehouse applications like reporting, querying, and data mining.
The document discusses data warehousing, including its history, types, security, applications, components, architecture, benefits and problems. A data warehouse is defined as a subject-oriented, integrated, time-variant collection of data to support management decision making. In the 1990s, organizations needed timely data but traditional systems were too slow. Data warehouses now provide competitive advantages through improved decision making and productivity. They integrate data from multiple sources to support applications like customer analysis, stock control and fraud detection.
Data warehousing involves collecting data from different sources and organizing it in a way that allows for analysis to make business decisions. It provides a single, complete view of data that end users can easily understand. A data warehouse stores integrated data from multiple sources and provides historical views of data to support analysis. It allows organizations to access critical information to support reporting, queries and decision making. Common applications of data warehousing include banking, healthcare, airlines and telecommunications.
This document provides an overview of data warehousing, including its definition, typical architecture, methodologies, advantages, and disadvantages. It defines a data warehouse as a collection of integrated, non-volatile data used to support organizational decision-making. The typical architecture includes layers for operational data, a data access layer, metadata, an informational access layer, and presentation tools. Methodologies include bottom-up design starting with data marts and top-down design using a normalized enterprise data model. Advantages include resolving inconsistencies and retrieving data without impacting operations, while disadvantages include latency and limitations with unstructured data.
This document provides an overview of key concepts related to data warehousing including what a data warehouse is, common data warehouse architectures, types of data warehouses, and dimensional modeling techniques. It defines key terms like facts, dimensions, star schemas, and snowflake schemas and provides examples of each. It also discusses business intelligence tools that can analyze and extract insights from data warehouses.
Glimpse of advantage, limitations of Hadoop and Goals / Business benefits of Data Warehouse and few use cases where Hadoop can be used to strengthen Enterprise Data Warehouse of any organization.
A data warehouse is a centralized database used for reporting and data analysis. It integrates data from multiple sources and stores current and historical data to assist management decision making. A data warehouse transforms data into timely information. It allows users to access specific types of data relevant to their needs through smaller data marts. While data warehouses provide benefits like increased access, consistency and productivity, they also present challenges such as lengthy data loads and compatibility issues.
This document discusses data warehousing and OLAP technology. It defines a data warehouse as a subject-oriented, integrated, time-variant and non-volatile collection of data used for analysis and decision making. A key aspect of data warehousing is the multidimensional data model which organizes data into cubes with facts and dimensions for analysis. Common schemas include star schemas with dimensions connected to a central fact table and snowflake schemas which normalize dimensional hierarchies.
This document provides an introduction to data warehousing. It defines key concepts like data, databases, information and metadata. It describes problems with heterogeneous data sources and fragmented data management in large enterprises. The solution is a data warehouse, which provides a unified view of data from various sources. A data warehouse is defined as a subject-oriented, integrated collection of historical data used for analysis and decision making. It differs from operational databases in aspects like data volume, volatility, and usage. The document outlines the extract-transform-load process and common architecture of data warehousing.
Data warehousing provides consolidated historical data from multiple sources to support analysis and strategic decision-making. A data warehouse is subject-oriented, integrated, time-variant, and non-volatile, and is maintained separately from operational databases. Operational databases focus on current data and transactions, while data warehouses integrate historical data from different sources to support analysis and informed decisions. The two are kept separate to preserve high performance in both operational and analytical systems.
The document provides an overview of data warehousing concepts. It defines a data warehouse as a subject-oriented, integrated, time-variant, and non-volatile collection of data. It discusses the differences between OLTP and OLAP systems. It also covers data warehouse architectures, components, and processes. Additionally, it explains key concepts like facts and dimensions, star schemas, normalization forms, and metadata.
This document provides an overview of data warehousing and online analytical processing (OLAP). It defines a data warehouse as a single, consistent store of subject-oriented data obtained from various sources to support end-user business analysis and decision-making. OLAP allows users to easily perform complex multidimensional analyses of data in areas such as comparisons, aggregations, and rankings. The document also discusses key aspects of data warehousing such as extraction, transformation, loading, and management of data from operational systems into the warehouse to support OLAP and decision support.
The document provides an overview of key concepts in data warehousing and business intelligence, including:
1) It defines data warehousing concepts such as the characteristics of a data warehouse (subject-oriented, integrated, time-variant, non-volatile), grain/granularity, and the differences between OLTP and data warehouse systems.
2) It discusses the evolution of business intelligence and key components of a data warehouse such as the source systems, staging area, presentation area, and access tools.
3) It covers dimensional modeling concepts like star schemas, snowflake schemas, and slowly and rapidly changing dimensions.
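The slowly changing dimension concept listed above can be sketched in a few lines. This is a hypothetical Type-2 example (the `dim_customer` structure and `apply_scd2` helper are illustrative, not from any of the summarized documents): rather than overwriting a changed attribute, the current row is closed out and a new versioned row is inserted, preserving history.

```python
from datetime import date

# Hypothetical Type-2 slowly changing dimension kept as a list of row dicts.
dim_customer = [
    {"key": 1, "customer_id": "C1", "city": "Pune",
     "valid_from": date(2020, 1, 1), "valid_to": None, "current": True},
]

def apply_scd2(dim, customer_id, new_city, change_date):
    """Close the current version of a customer row and append a new one."""
    for row in dim:
        if row["customer_id"] == customer_id and row["current"]:
            if row["city"] == new_city:
                return  # no change, nothing to version
            row["valid_to"] = change_date   # close the old version
            row["current"] = False
            dim.append({"key": max(r["key"] for r in dim) + 1,
                        "customer_id": customer_id, "city": new_city,
                        "valid_from": change_date, "valid_to": None,
                        "current": True})
            return

apply_scd2(dim_customer, "C1", "Mumbai", date(2023, 6, 1))
print([(r["city"], r["current"]) for r in dim_customer])
# [('Pune', False), ('Mumbai', True)]
```

Queries that join facts to the dimension on the validity interval then see the city that was correct at the time of each transaction.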
The document presents information on data warehousing. It defines a data warehouse as a repository for integrating enterprise data for analysis and decision making. It describes the key components, including operational data sources, an operational data store, and end-user access tools. It also outlines the processes of extracting, cleaning, transforming, loading and accessing the data, as well as common management tools. Data marts are discussed as focused subsets of a data warehouse tailored for a specific department.
This document defines key concepts in data warehousing including data warehouses, data marts, and ETL (extract, transform, load). It states that a data warehouse is a non-volatile collection of integrated data from multiple sources used to support management decision making. A data mart contains a single subject area of data. ETL is the process of extracting data from source systems, transforming it, and loading it into a data warehouse or data mart.
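The extract-transform-load process defined above can be illustrated end to end with standard-library Python. The source data, column names, and rejection rule here are all invented for the sketch; a real pipeline would read from operational systems and route bad records to a reject store rather than silently skipping them.

```python
import csv
import io
import sqlite3

# Hypothetical CSV extract from a source system; row 2 has a bad amount.
source_csv = io.StringIO(
    "order_id,amount,country\n"
    "1, 100.5 ,us\n"
    "2,abc,IN\n"
    "3,250,in\n"
)

def extract(fh):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(fh))

def transform(rows):
    """Transform: clean values and conform codes to a common format."""
    clean = []
    for r in rows:
        try:
            amount = float(r["amount"].strip())
        except ValueError:
            continue  # reject unparseable records
        clean.append((int(r["order_id"]), amount, r["country"].strip().upper()))
    return clean

def load(con, rows):
    """Load: insert conformed rows into the warehouse table."""
    con.execute("CREATE TABLE IF NOT EXISTS fact_orders "
                "(order_id INTEGER, amount REAL, country TEXT)")
    con.executemany("INSERT INTO fact_orders VALUES (?,?,?)", rows)

con = sqlite3.connect(":memory:")
load(con, transform(extract(source_csv)))
print(con.execute("SELECT COUNT(*), SUM(amount) FROM fact_orders").fetchone())
# (2, 350.5)
```

The bad record is dropped during transform, and country codes arrive in the warehouse in a single consistent representation.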
This document provides an overview of data mining, data warehousing, and decision support systems. It defines data mining as extracting hidden predictive patterns from large databases and data warehousing as integrating data from multiple sources into a central repository for reporting and analysis. Common data warehousing techniques include data marts, online analytical processing (OLAP), and online transaction processing (OLTP). The document also discusses the benefits of data warehousing such as enhanced business intelligence and historical data analysis, as well as challenges around meeting user expectations and optimizing systems. Finally, it describes decision support systems and executive information systems as tools that combine data and models to support business decision making.
Data Warehousing is a topic on Management of Information Technology that would help students on their subject matter and as reference for their assigned report.
The document discusses dimensional modeling concepts for data warehousing. It defines dimensional modeling as a technique to design database tables optimized for analytical tasks in a data warehouse. Dimensional models consist of fact tables that contain metrics/measurements and dimension tables that provide context for the facts. The document provides examples of star schemas where the fact table is at the center connected to various dimension tables, and explains how dimensional modeling supports analysis of data through queries along different dimensions.
ETL Processes, Data Warehouses and Data Marts (ParnalSatle)
The document discusses ETL processes, data warehousing, and data marts. It defines ETL as extracting data from source systems, transforming it, and loading it into a data warehouse. Data warehouses integrate data from multiple sources to support business intelligence and analytics. Data marts are focused subsets of data warehouses that serve specific business functions or departments. The document outlines the key components and architecture of data warehousing systems, including source data, data staging, data storage in warehouses and marts, and analytical applications.
1. The document discusses data warehousing and data mining. Data warehousing involves collecting and integrating data from multiple sources to support analysis and decision making. Data mining involves analyzing large datasets to discover patterns.
2. Web mining is discussed as a type of data mining that analyzes web data. There are three domains of web mining: web content mining, web structure mining, and web usage mining. Common techniques for web mining include clustering, association rules, path analysis, and sequential patterns.
3. Web mining has benefits like addressing ineffective search engines and monitoring user visit habits to improve website design. Data warehousing and data mining can provide useful business intelligence when the right analysis techniques are applied to large amounts of integrated data.
Which Change Data Capture Strategy is Right for You? (Precisely)
Change Data Capture or CDC is the practice of moving the changes made in an important transactional system to other systems, so that data is kept current and consistent across the enterprise. CDC keeps reporting and analytic systems working on the latest, most accurate data.
Many different CDC strategies exist. Each strategy has advantages and disadvantages. Some put an undue burden on the source database. They can cause queries or applications to become slow or even fail. Some bog down network bandwidth, or have big delays between change and replication.
Each business process has different requirements, as well. For some business needs, a replication delay of more than a second is too long. For others, a delay of less than 24 hours is excellent.
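The simplest CDC strategy described here, comparing full snapshots of a source table, can be sketched as follows. This is an illustrative example with invented data; as the text notes, snapshot comparison puts a full-scan burden on the source, which log-based CDC avoids.

```python
# Hypothetical snapshot-comparison CDC: diff two snapshots of a source table,
# keyed by primary key, to emit insert/update/delete change records.
def capture_changes(old, new):
    changes = []
    for key, row in new.items():
        if key not in old:
            changes.append(("insert", key, row))
        elif old[key] != row:
            changes.append(("update", key, row))
    for key in old:
        if key not in new:
            changes.append(("delete", key, None))
    return changes

yesterday = {1: {"name": "Ann", "balance": 100},
             2: {"name": "Bob", "balance": 50}}
today     = {1: {"name": "Ann", "balance": 120},   # updated
             3: {"name": "Cam", "balance": 10}}    # inserted; key 2 deleted

print(capture_changes(yesterday, today))
# [('update', 1, {'name': 'Ann', 'balance': 120}),
#  ('insert', 3, {'name': 'Cam', 'balance': 10}),
#  ('delete', 2, None)]
```

The emitted change records are what a replication process would apply to keep downstream reporting systems current.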
Which CDC strategy will match your business needs? How do you choose?
View this webcast on-demand to learn:
• Advantages and disadvantages of different CDC methods
• The replication latency your project requires
• How to keep data current in Big Data technologies like Hadoop
Using Data Platforms That Are Fit-For-Purpose (DATAVERSITY)
We must grow the data capabilities of our organization to fully deal with the many and varied forms of data. This cannot be accomplished without an intense focus on the many and growing technical bases that can be used to store, view, and manage data. There are many, now more than ever, that have merit in organizations today.
This session sorts out the valuable data stores, how they work, what workloads they are good for, and how to build the data foundation for a modern competitive enterprise.
Prague data management meetup, 2017-02-28 (Martin Bém)
The document discusses an operational data store (ODS) implemented to integrate data from two banks, Velká česká banka and Nová česká banka, after an integration of the two. APIs, ETL workflows, and data transformations populate the ODS with consolidated customer, account, and transaction data from both banks for operational reporting. It also details the data domains integrated into the ODS and the growth in API usage over time as more systems accessed the shared ODS.
Data Warehouse Design on Cloud: A Big Data Approach, Part One (Panchaleswar Nayak)
This document discusses data warehouse design on the cloud using a big data approach. It covers topics such as business intelligence, data warehousing, data marts, data mining, ETL architecture, data warehouse design methodologies, Bill Inmon's top-down approach, Ralph Kimball's bottom-up approach, and addressing the new challenges of volume, velocity and variety of big data with Hadoop. The document proposes an architecture for next generation data warehousing using Hadoop to handle these new big data challenges.
Master data management and data warehousing (Zahra Mansoori)
This document discusses master data management (MDM) and its role in data warehousing. It describes how MDM can consolidate and cleanse master data from various transactional systems to create a single version of truth. This unified master data is then used to support both operational and analytical initiatives. The document also provides an overview of key components of a data warehouse, including the extraction, transformation, and loading of data from operational systems. It notes that the ideal information architecture places an MDM component between operational and analytical systems to ensure consistent, high-quality master data is available throughout the organization.
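The consolidation and cleansing step described above can be sketched with a toy survivorship rule. Everything here is hypothetical (the `crm`/`billing` sources, the match-on-email rule, and the "latest non-empty value wins" policy are assumptions for illustration); real MDM tools use much richer matching and survivorship logic.

```python
# Hypothetical master-data consolidation: merge customer records from two
# source systems into one "golden" record per email, letting the most
# recently updated non-empty value win for each attribute.
crm = [{"email": "a@x.com", "name": "A. Kumar",   "phone": "",       "updated": 2},
       {"email": "b@x.com", "name": "B. Singh",   "phone": "555-01", "updated": 1}]
billing = [{"email": "a@x.com", "name": "Amit Kumar", "phone": "555-99", "updated": 3}]

def consolidate(*sources):
    golden = {}
    # Apply records oldest-first so that later, non-empty values overwrite.
    for record in sorted((r for s in sources for r in s),
                         key=lambda r: r["updated"]):
        merged = golden.setdefault(record["email"], {})
        for field, value in record.items():
            if value != "":
                merged[field] = value
    return golden

master = consolidate(crm, billing)
print(master["a@x.com"]["name"], master["a@x.com"]["phone"])
# Amit Kumar 555-99
```

The resulting golden records are the "single version of truth" that both operational and analytical systems would consume.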
Types of database processing,OLTP VS Data Warehouses(OLAP), Subject-oriented
Integrated
Time-variant
Non-volatile,
Functionalities of Data Warehouse,Roll-Up(Consolidation),
Drill-down,
Slicing,
Dicing,
Pivot,
KDD Process,Application of Data Mining
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureDATAVERSITY
Whether to take data ingestion cycles off the ETL tool and the data warehouse or to facilitate competitive Data Science and building algorithms in the organization, the data lake – a place for unmodeled and vast data – will be provisioned widely in 2020.
Though it doesn’t have to be complicated, the data lake has a few key design points that are critical, and it does need to follow some principles for success. Avoid building the data swamp, but not the data lake! The tool ecosystem is building up around the data lake and soon many will have a robust lake and data warehouse. We will discuss policy to keep them straight, send data to its best platform, and keep users’ confidence up in their data platforms.
Data lakes will be built in cloud object storage. We’ll discuss the options there as well.
Get this data point for your data lake journey.
Traditional BI vs. Business Data Lake – A ComparisonCapgemini
Traditional BI systems have limitations in handling big data as they are not designed for unstructured data and have data latency issues. A business data lake provides a new approach by storing all raw structured and unstructured data in a single environment at low cost. This allows for near real-time analysis on any data from any source to gain insights.
This document defines and describes key concepts related to data warehousing and business intelligence. It defines a data warehouse as a repository of integrated data organized for analysis. Key characteristics of a data warehouse include being subject-oriented, integrated, non-volatile, and summarized. The document also discusses data marts, architectures like three-tier and two-tier, and ETL processes. Risks, best practices, and administration of data warehouses are covered as well.
Data Mining is defined as extracting information from huge sets of data. In other words, we can say that data mining is the procedure of mining knowledge from data.
According to Inmon, a data warehouse is a subject oriented,
integrated, time-variant, and non-volatile collection of data. He defined the terms
in the sentence as follows:
The document provides information about data warehousing concepts. It defines a data warehouse as a relational database designed for query and analysis rather than transactions. It contains historical data from various sources and separates analysis from transaction workloads. The goals of a data warehouse are to provide a single source of integrated information, give users direct access to data without relying on IT, and allow predictive modeling. Factors like significant user requests for related historical data and advanced decision support needs should be considered when implementing a data warehouse.
The document provides an overview of SAP BI training. It discusses that SAP stands for Systems Applications and Products in Data Processing and was founded in 1972 in Germany. It is the world's fourth largest software provider and largest provider of ERP software. The training covers topics such as the 3-tier architecture of SAP, data warehousing, ETL, the SAP BI architecture and key components, OLTP vs OLAP, business intelligence definition, and the ASAP methodology for SAP implementations.
Managing Data Warehouse Growth in the New Era of Big DataVineet
This document discusses managing data warehouse growth in the era of big data. It notes that data volumes are increasing exponentially, creating challenges around costs, performance, and governance. To address this, organizations are adopting new technologies like Hadoop and in-memory systems, and implementing tiered storage and data archiving strategies. The goal is to optimize costs by placing data in the most efficient storage for its use and value, while maintaining governance and complying with retention policies.
This document discusses social media communication in the workplace. It begins with a brief history of social media, noting its initial personal use and limited business use, but how it has now become integral to many businesses. The document then lists reasons for using social media at work, such as increasing productivity, strengthening culture and reducing costs. It provides examples of social media uses like interacting with customers, recruitment and internal communications. It also covers monitoring use, positive effects like boosting morale, and potential negative effects like decreased productivity or loss of privacy. The document concludes by outlining best practices or "do's" for social media use at work as well as things to avoid or "don'ts".
The document discusses the role of the World Trade Organization (WTO) in facilitating the global diffusion of information technology. It investigates the WTO's Information Technology Agreement (ITA) which aims to reduce trade barriers on IT products. The study examines the objectives and benefits of the ITA, its impact on business and society, and the results of its efforts to lower import duties. It also analyzes the expansion of the ITA to include more products, participants, and its effects on international trade.
Role of-wto-in-promoting-un-sustainable-development-goalsA P
The document discusses the role of the World Trade Organization (WTO) in promoting the United Nations' Sustainable Development Goals (SDGs). It outlines how several of WTO's key objectives, such as leveraging international business to reduce poverty and hunger, promoting sustainable tourism, and increasing access to healthcare, directly support and help achieve specific SDGs. The document also examines WTO's Aid for Trade program, which provides funding to developing countries to help them implement trade-related infrastructure and build their trade capacity to bridge the income gap.
Microsoft acquired Nokia's mobile phone business in 2013 for $7.2 billion in an effort to become a major player in the smartphone market. However, Microsoft struggled to compete with Android and iOS devices. By 2015 Microsoft had written off $7.6 billion of the acquisition cost and laid off thousands of Nokia employees. Microsoft then sold the Nokia brand to HMD Global in 2016. Reasons for the failure included Microsoft entering the market too late, an inability to attract developers or consumers to its Windows platform, and not being able to match Apple or cheaper Android phones on quality and price.
Manish tripathi-tcs-financial-management-9 october2016A P
- The document analyzes the working capital management of Tata Consultancy Services (TCS) from March 2012 to March 2016 by examining key ratios like current ratio, cash ratio, and quick ratio.
- The average current, cash, and quick ratios for TCS over the five year period were higher than industry standards, suggesting room for improvement in working capital management.
- While TCS has been highly profitable, the analysis concludes the company should take steps to optimize working capital and bring key ratios more in line with industry averages to further improve financial performance.
Six Sigma is a methodology used to improve business processes through statistical analysis. It was introduced at Motorola in 1986 and made central to GE's strategy in 1995. Six Sigma seeks to reduce defects and variability in processes. The term comes from the goal of having six standard deviations between the process mean and nearest specification limit, resulting in virtually no defects. It uses methodologies like DMAIC for existing processes and DMADV/DFSS for new processes to continuously measure, analyze, control, and improve processes through organizational commitment to quality.
This document provides an overview of Infosys and its core banking product Finacle. It discusses Infosys' history and achievements, describes Finacle's functionality and global customer base. It then introduces Canada, its banking sector and top banks. Finally, it outlines potential strategies for launching Finacle in Canada, covering the 7 Ps of marketing - product, price, promotion, place, process, people, and physical evidence. The strategies aim to gain an initial customer through a flexible business model, then promote this success to acquire more Canadian bank clients.
This document discusses three software project failures: the Denver Airport baggage system, the London Stock Exchange's Taurus system, and New Zealand's Novopay payroll system. For each failure, the document outlines what was expected from the project, the results and issues that occurred, and reasons for the failure. Common reasons included underestimating complexity, poor requirements gathering, lack of stakeholder involvement, and not adequately testing for defects before launching. The document concludes with additional reasons software projects often fail such as unrealistic timelines, lack of communication, and not periodically assessing progress.
The document discusses the Capability Maturity Model (CMM), which is a methodology used to develop and refine an organization's software development process. The CMM describes a five-level path from initial to optimized processes. It originated from studies of failed US military software projects in the 1980s. The levels progress from initial/chaotic processes to repeatable, defined, managed, and optimizing processes. The Software Engineering Institute began actively developing the CMM model in 1986 to help assess contractors' ability to manage software projects.
Manish tripathi-principals-of-management-book-review-21 august2015A P
The document reviews the book "How To Sell Yourself" by Joe Girard. It provides details about the author, the book, and summarizes the content of some chapters. The chapters discuss topics like selling yourself, developing self-confidence, positive attitudes, listening skills, keeping promises, persistence and more. The review concludes that the book is very useful for motivation, personality development and infusing self-confidence, especially for sales and marketing people.
This document discusses ERP implementation failures at three major companies - The Hershey Company, Nike, and Hewlett-Packard - and provides suggestions for successful ERP implementations. The Hershey Company rushed its SAP implementation and missed critical testing phases, resulting in a $100M order backlog. Nike's multi-year ERP project faced integration issues and failed demand forecasting, costing $100M in lost revenue. Hewlett-Packard's mySAP transition caused order routing problems and backlogs, costing $160M. Suggestions include careful planning, phased implementations, defining objectives, training, testing, and customizing only when necessary.
The document summarizes how an incident of deleted database data was managed and restored using SQL Server log shipping. Key points:
- An application owner reported that a wrong query deleted important data from a crucial master database at 3:10 PM.
- As the database administrator, the presenter was tasked with restoring the data. They used SQL Server log shipping between a primary server in Mumbai and secondary server in Noida.
- By manually applying transaction logs on the secondary database up to 3 PM, they were able to restore the primary database from the secondary, losing only 10 minutes of data. The incident highlighted the importance of having a disaster recovery strategy in place.
The document discusses information technology disaster recovery management. It defines disaster management as organizing resources to deal with humanitarian aspects of emergencies. IT disasters can be natural, like floods or earthquakes, or man-made, like cyber attacks. Examples of past IT disasters that impacted businesses and organizations are provided. The document outlines different levels of disaster recovery from no plan to fully redundant "hot sites." It describes types of disaster recovery solutions and provides an ideal structure for an IT disaster recovery plan.
Manish tripathi-innovation-trends-19 august2016A P
1. Tesla is an innovative electric vehicle company founded in 2003 by Elon Musk, currently valued at over $30 billion.
2. Tesla figured out how to create an affordable, high-capacity battery pack for its vehicles, giving them a longer range than competitors.
3. Tesla sells cars directly to consumers online and in retail stores, skipping dealerships, and uses customer deposits as interest-free loans.
This document summarizes innovative companies and projects from around the world. It describes Planet Innovation in Australia, which develops products with positive social impacts, including an affordable hearing aid system called Incus. In China, Baidu offers search and local information services and launched a recycling app. An Egyptian startup called Mubser created a navigational device for visually impaired users. Other innovations highlighted include a water purification device from Sweden, affordable food vending machines in Chile, and a direct air capture project for carbon removal in Canada.
1. Tesla is an innovative electric vehicle company founded in 2003 by Elon Musk, currently employing over 13,000 people and valued at over $30 billion.
2. Tesla pioneered bringing affordable long-range battery packs to market, with models offering over 300 miles of range, and innovated a direct-to-consumer sales model.
3. Tesla also innovates through over-the-air software updates, customizable driver experiences, and a highly automated Gigafactory to enable mass production of batteries.
The document discusses key findings from the Boston Consulting Group's annual survey on the most innovative companies. It found that 79% of companies now see innovation as a top priority. The top 5 most innovative companies are non-tech. It then discusses 4 main drivers of innovation: 1) Emphasis on speed to market, 2) Lean R&D processes, 3) Use of digital platforms, and 4) Systematically exploring adjacent markets. Examples are given like Zara's 2-4 week design to market time and GE's use of 3D printing to reduce ultrasound probe costs.
This document discusses the use of business intelligence and analytics tools in the oil and gas industry. It provides five examples of how different oil and gas companies have used BI tools to increase customer base, decide optimal drilling locations, create a single view of business data, improve reporting capabilities, and enable faster decision making. The document emphasizes that BI tools can help oil and gas companies maximize resource utilization and efficiency across the entire industry value chain.
A survey was conducted at Narendra Hospital in Mumbai to understand leadership styles. The survey found that the hospital practices transformational leadership, where the leader leads by example from the front. A democratic style is used where tough decisions are made and employees are motivated. This style helps employees feel secure and remain committed, treating the workplace like a family. The leadership style increases customer satisfaction and employee morale.
The document discusses India's proposed Goods and Services Tax (GST), which would create a unified indirect tax replacing existing central and state level taxes. It provides a history of GST in India from 2000 when it was first proposed. Constitutional amendments are required for its implementation which has taken 16 years due to various challenges. Key aspects include a dual GST system with CGST and SGST, intended benefits like reducing cascading taxes and creating a national market, and concerns around inflation and IT infrastructure requirements.
A presentation on mastering key management concepts across projects, products, programs, and portfolios. Whether you're an aspiring manager or looking to enhance your skills, this session will provide you with the knowledge and tools to succeed in various management roles. Learn about the distinct lifecycles, methodologies, and essential skillsets needed to thrive in today's dynamic business environment.
Originally presented at XP2024 Bolzano
While agile has entered the post-mainstream age, possibly losing its mojo along the way, the rise of remote working is dealing a more severe blow than its industrialization.
In this talk we'll have a look to the cumulative effect of the constraints of a remote working environment and of the common countermeasures.
12 steps to transform your organization into the agile org you deservePierre E. NEIS
During an organizational transformation, the shift is from the previous state to an improved one. In the realm of agility, I emphasize the significance of identifying polarities. This approach helps establish a clear understanding of your objectives. I have outlined 12 incremental actions to delineate your organizational strategy.
Impact of Effective Performance Appraisal Systems on Employee Motivation and ...Dr. Nazrul Islam
Healthy economic development requires properly managing the banking industry of any
country. Along with state-owned banks, private banks play a critical role in the country's economy.
Managers in all types of banks now confront the same challenge: how to get the utmost output from
their employees. Therefore, Performance appraisal appears to be inevitable since it set the
standard for comparing actual performance to established objectives and recommending practical
solutions that help the organization achieve sustainable growth. Therefore, the purpose of this
research is to determine the effect of performance appraisal on employee motivation and retention.
Ganpati Kumar Choudhary Indian Ethos PPT.pptx, The Dilemma of Green Energy Corporation
Green Energy Corporation, a leading renewable energy company, faces a dilemma: balancing profitability and sustainability. Pressure to scale rapidly has led to ethical concerns, as the company's commitment to sustainable practices is tested by the need to satisfy shareholders and maintain a competitive edge.
Designing and Sustaining Large-Scale Value-Centered Agile Ecosystems (powered...Alexey Krivitsky
Is Agile dead? It depends on what you mean by 'Agile'. If you mean that the organizations are not getting the promised benefits because they were focusing too much on the team-level agile "ways of working" instead of systemic global improvements -- then we are in agreement. It is a misunderstanding of Agility that led us down a dead-end. At Org Topologies, we see bright sparks -- the signs of the 'second wave of Agile' as we call it. The emphasis is shifting towards both in-team and inter-team collaboration. Away from false dichotomies. Both: team autonomy and shared broad product ownership are required to sustain true result-oriented organizational agility. Org Topologies is a package offering a visual language plus thinking tools required to communicate org development direction and can be used to help design and then sustain org change aiming at higher organizational archetypes.
Colby Hobson: Residential Construction Leader Building a Solid Reputation Thr...dsnow9802
Colby Hobson stands out as a dynamic leader in the residential construction industry. With a solid reputation built on his exceptional communication and presentation skills, Colby has proven himself to be an excellent team player, fostering a collaborative and efficient work environment.
A team is a group of individuals, all working together for a common purpose. This Ppt derives a detail information on team building process and ats type with effective example by Tuckmans Model. it also describes about team issues and effective team work. Unclear Roles and Responsibilities of teams as well as individuals.
1. Business Intelligence, Data Warehousing
Data Marts, Data Mining
Presented by
Mr. Manish Tripathi ( I – 15-18-19)
Thakur Institute of Management Studies
&
Research
(Sunday 26 March, 2017)
3. WHAT IS BUSINESS INTELLIGENCE?
• BI is a technology-driven process for analyzing data
and presenting actionable information to help
corporate executives, business managers and other
end users make more informed business decisions
• BI encompasses a wide variety of tools, applications
and methodologies that enable organizations to
collect data from internal systems and external
sources; prepare it for analysis; develop and run
queries against the data; and create reports,
dashboards and data visualizations to make the
analytical results available to corporate decision
makers as well as operational workers
4. WHAT IS BUSINESS INTELLIGENCE?
• BI technologies provide historical, current and
predictive views of business operations
• Identifying new opportunities and implementing
an effective strategy based on insights can provide
businesses with a competitive market advantage
and long-term stability
• Business intelligence can be used to support a
wide range of business decisions ranging from
operational to strategic
5. BENEFITS OF BUSINESS INTELLIGENCE
• The potential benefits of business intelligence
programs include accelerating and improving
decision making; optimizing internal business
processes; increasing operational efficiency;
driving new revenues; and gaining competitive
advantages over business rivals.
• It removes guesswork
• Gives quicker responses to your business-related
queries
• Obtain important business metrics reports
whenever and wherever you need them
6. BENEFITS OF BUSINESS INTELLIGENCE
• Gain a better understanding of business’ past,
present and future
• Gain valuable insight into your customer’s
behaviour
• Pinpoint up-selling as well as cross-selling
opportunities
• Develop efficiency
9. BUSINESS INTELLIGENCE TOOLS
• SAP Crystal Reports
• SAS Enterprise BI Server
• Oracle Business Intelligence Enterprise Edition Plus
• IBM Cognos 8 BI
• Microsoft PowerPivot
• MicroStrategy Reporting Suite
• Salesforce CRM
• TIBCO Spotfire Analytics
• Information Builders WebFOCUS
12. WHAT IS DATA WAREHOUSING?
• A data warehouse is a federated repository for all
the data that an enterprise's various business
systems collect
• It is a collection of corporate information and data
derived from operational systems and external
data sources
• A data warehouse is designed to support business
decisions by allowing data consolidation, analysis
and reporting at different aggregate levels
13. MOST POPULAR DATA WAREHOUSING DEFINITIONS
Ralph Kimball
• A data warehouse is a copy of transaction data
specifically structured for query and analysis
Bill Inmon
• A data warehouse is a subject-oriented,
integrated, time-variant and non-volatile collection
of data in support of management's decision
making process
15. Subject-Oriented
A data warehouse can be used to analyze a
particular subject area. For example, "sales"
can be a particular subject area
16. Integrated
A data warehouse integrates data from
multiple data sources. For example, source A
and source B may have different ways of
identifying a product, but in a data warehouse,
there will be only a single way of identifying a
product
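As a sketch of the conformance step described above (the source names and mapping table are hypothetical, for illustration only), the warehouse load can resolve each source-specific product code to a single conformed key:

```python
# Two source systems identify the same product differently; the
# warehouse load conforms them to one shared key. The mapping
# table below is an invented example, not a real system's data.
CONFORMED = {
    ("source_a", "SKU-1001"): "PROD-1",
    ("source_b", "ITM/1001"): "PROD-1",  # same product, different code
}

def conform_product(source, native_id):
    """Map a source-specific product code to the warehouse key."""
    return CONFORMED[(source, native_id)]

print(conform_product("source_a", "SKU-1001"))  # PROD-1
print(conform_product("source_b", "ITM/1001"))  # PROD-1
```

In a real warehouse this lookup is typically a dedicated mapping (cross-reference) table maintained as part of the ETL process.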
17. Time-Variant
Historical data is kept in a data warehouse. For
example, one can retrieve data from 3 months,
6 months, 12 months, or even older data from
a data warehouse. This contrasts with a
transaction system, where often only the most
recent data is kept
18. Non-volatile
Once data is in the data warehouse, it will not
change. So, historical data in a data warehouse
should never be altered
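The time-variant and non-volatile properties can be sketched together: loads append snapshot-dated rows, nothing is updated in place, so older states stay queryable. The table layout and figures below are a toy illustration, not a real warehouse schema:

```python
from datetime import date

# Append-only store: each load adds rows stamped with a snapshot
# date; existing rows are never altered, preserving history.
warehouse = []

def load_snapshot(snapshot_date, rows):
    for r in rows:
        warehouse.append({"snapshot": snapshot_date, **r})

load_snapshot(date(2017, 1, 31), [{"account": "A1", "balance": 100}])
load_snapshot(date(2017, 2, 28), [{"account": "A1", "balance": 150}])

# Historical query: the January balance is still retrievable even
# though the source transaction system may only hold February's.
jan = [r for r in warehouse if r["snapshot"] == date(2017, 1, 31)]
print(jan[0]["balance"])  # 100
```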
20. Purpose of Data Warehousing
• Keeping Analysis/Reporting and Production Separate
• Information Integration from multiple systems- Single
point source for information
• Data Consistency and Quality
• High Response Time- Production databases are tuned
to the expected transaction load
• High Response Time- Normalized data vs. dimensional
modeling
• Establish the foundation for Decision Support
• Maintain data history, even if the source transaction
systems do not
22. Data Warehousing vs. normal Database
1- SIZE
Data warehouses are potentially much bigger than
the databases from where the data is derived.
Databases usually store only the data that is currently
in active use; older records can be purged and moved
to backups, mainly for performance reasons. Data
warehouses are used to store much older historical
records; it's also common to use data warehouses to
store additional information that is bought or
captured elsewhere to complement the information
that is generated and stored by the internal database
system
23. Data Warehousing vs. normal Database
2- Normalization
Databases are usually normalized, which means that
a lot of work is done to guarantee that there's a
unique copy of any given bit of information, which is
important for performance and consistency reasons.
But it's common to store different versions of the
same information on a data warehouse, using
different structures to compose and access the
information. In other words, data warehouses are
messier and more irregular, partly by design, as they
need to be able to work with so many different
sources of information
24. Data Warehousing vs. normal Database
3- Access pattern
Database records are often retrieved and updated
one by one; data warehouses are nearly always
accessed by reporting engines that work on entire
datasets at a time to generate aggregates and other
analytical information. Databases are frequently
updated, sometimes only a field or record at a time;
data warehouses aren't updated very frequently, and
for all practical purposes, never at the field or record
level; instead data is appended in large batches
25. Data Warehousing vs. normal Database
4- Use
Normal databases are used for OLTP whereas data
warehousing is used for OLAP
26. Data Warehousing vs. normal Database
5- Performance
A normal database is optimized for write operations,
and transactional performance is critical. A data
warehouse, by contrast, is optimized for read
operations, and response time is less critical.
27. Data Warehousing vs. normal Database
6- Table & Joins
In a normal database the tables and joins are complex
because the schema is normalized (in an RDBMS). This is
done to reduce redundant data and save storage space.
In a data warehouse the tables and joins are simple
because the schema is de-normalized. This is done
to reduce the response time for analytical queries.
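A minimal sketch of the de-normalized pattern, using Python's built-in sqlite3 (table names and data are illustrative): one fact table joins to one flat dimension, so an analytical roll-up needs only a single simple join:

```python
import sqlite3

# A tiny star schema: a fact table keyed to one de-normalized
# dimension, so the analytical query below is a single join.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY,
                          name TEXT, category TEXT);
CREATE TABLE fact_sales (product_key INTEGER, qty INTEGER, amount REAL);
INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware'),
                               (2, 'Gadget', 'Hardware');
INSERT INTO fact_sales VALUES (1, 10, 100.0), (2, 5, 75.0), (1, 2, 20.0);
""")

# Roll up sales by category: one join, one GROUP BY.
rows = con.execute("""
    SELECT d.category, SUM(f.amount)
    FROM fact_sales f JOIN dim_product d USING (product_key)
    GROUP BY d.category
""").fetchall()
print(rows)  # [('Hardware', 195.0)]
```

In a normalized OLTP schema the same question would typically require chaining several joins through intermediate tables.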
28. Data Warehousing vs. normal Database
7- Data source
A normal database mostly uses internal data sources.
A data warehouse may also use external data sources
such as macroeconomic indicators, competitor data,
market data, etc.
29. DATA WAREHOUSING PRODUCTS
• Teradata EDW (enterprise data warehouse)
• Oracle Exadata
• Amazon Redshift
• Cloudera Enterprise Data Hub (EDH)
• MarkLogic
• IBM Netezza data warehouse appliance
• SAP Business Warehouse
• MS SQL Parallel Data Warehouse
31. 7 STEPS IN BUILDING DATA WAREHOUSE
(MANAGEMENT VIEW)
• Step 1: Determine Business Objectives
• Step 2: Collect and Analyze Information
• Step 3: Identify Core Business Processes
• Step 4: Construct a Conceptual Data Model
• Step 5: Locate Data Sources and Plan Data
Transformations
• Step 6: Set Tracking Duration
• Step 7: Implement the Plan
32. 3 STEPS IN BUILDING DATA WAREHOUSE
(TECHNICAL VIEW)
• Extract
• Transform
• Load
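The three steps above can be sketched as three small functions. The field names, casts, and source rows are illustrative assumptions, not any real tool's API:

```python
# Toy ETL pass: extract rows from a source, transform them into
# the warehouse's common format, and load them in one batch.
source_rows = [
    {"cust": "alice", "amt": "120.50", "dt": "2017-03-01"},
    {"cust": "bob",   "amt": "80.00",  "dt": "2017-03-02"},
]

def extract():
    return list(source_rows)                      # pull from the source

def transform(rows):
    return [{"customer": r["cust"].title(),       # standardize names
             "amount": float(r["amt"]),           # cast text to numeric
             "date": r["dt"]} for r in rows]

def load(rows, target):
    target.extend(rows)                           # append in one batch

warehouse = []
load(transform(extract()), warehouse)
print(warehouse[0])
```

Real ETL tools add scheduling, error handling, and restartability around this same extract-transform-load skeleton.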
35. DATA MART
• The data mart is a subset of the data warehouse and
is usually oriented to a specific business line or team
• A data mart is a repository of data that is designed to
serve a particular community of knowledge workers
• Because data marts are optimized to look at data in a
unique way, the design process tends to start with an
analysis of user needs
• Today, data virtualization software can be used to
create virtual data marts, pulling data from disparate
sources and combining it with other data as necessary
to meet the needs of specific business users
36. DATA MART
• A virtual data mart provides knowledge workers
with access to the data they need while
preventing data silos and giving the organization's
data management team a level of control over the
organization's data throughout its lifecycle
37. REASONS FOR CREATING A DATA MART
• Easy access to frequently needed data
• Creates a collective view for a group of users
• Improves end-user response time
• Ease of creation
• Lower cost than implementing a full data
warehouse
• Potential users are more clearly defined than in a
full data warehouse
• Contains only business essential data and is less
cluttered.
40. DATA LAKE
• A data lake is a storage repository that holds a vast
amount of raw data in its native format until it is needed
• A data lake uses a flat architecture to store data
• Each data element in a lake is assigned a unique identifier
and tagged with a set of extended metadata tags
• When a business question arises, the data lake can be
queried for relevant data, and that smaller set of data can
then be analyzed to help answer the question
• The term data lake is often associated with Hadoop-
oriented object storage
• In such a scenario, an organization's data is first loaded
into the Hadoop platform, and then business analytics and
data mining tools are applied to the data where it resides
on Hadoop's cluster nodes
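A toy sketch of these points: each element gets a unique identifier and extended metadata tags at ingest, and a later business question is answered by filtering on the tags (schema-on-read). The tag names and payloads are invented for illustration:

```python
import uuid

# Flat data-lake sketch: raw data is stored as-is under a unique
# identifier, alongside metadata tags used for later discovery.
# No schema is imposed at load time.
lake = {}

def ingest(raw_bytes, tags):
    key = str(uuid.uuid4())
    lake[key] = {"data": raw_bytes, "tags": tags}
    return key

ingest(b'{"clicks": 42}', {"source": "weblogs", "format": "json"})
ingest(b"ts,temp\n1,21.5", {"source": "sensors", "format": "csv"})

# Schema-on-read: when a question arises, query by metadata and
# analyze only the smaller matching set.
hits = [v for v in lake.values() if v["tags"]["source"] == "sensors"]
print(len(hits))  # 1
```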
42. DATA MART VS. DATA WAREHOUSE
1- Data Scope
The first, and most obvious, difference is
the scope of the information each one stores.
On one hand, data warehouses save all kinds
of data related to the system. On the other
hand, data marts store only subject-specific
information, making them much more focused
on their functionalities.
43. DATA MART VS. DATA WAREHOUSE
2- Size
We can say that a data warehouse is
usually much bigger than data marts,
because it keeps a lot more data.
44. DATA MART VS. DATA WAREHOUSE
3-Integration
A data warehouse usually integrates
several sources of data in order to feed
its database and the system’s needs. By
contrast, a data mart has a lot less
integration to do, since its data is very
specific
46. DATA MART VS. DATA WAREHOUSE
5- Creation
Creating a data warehouse is far more
difficult and time-consuming than building
a data mart. Building the whole structure and
the relationships between the data is a long
and very important step. We also need to think
through and analyse how we will integrate all
of the information sources. Since data marts
are smaller and subject-oriented, these tasks
tend to be much simpler.
47. DATA MART VS. DATA WAREHOUSE
6-Management
As with creation, managing a data
warehouse is far more complex than
managing a data mart. For the same
reasons: with much more data, more
relationships, and more processes to
manage, it becomes a harder task.
48. DATA MART VS. DATA WAREHOUSE
7- Cost
Overall, in terms of cost, data marts
are cheaper than data warehouses. To
build and maintain a data warehouse
we need significantly more physical
resources like servers, disk space,
memory and CPU. Being a far less
complex system, a data mart also
requires less time to build and operate.
49. DATA MART VS. DATA WAREHOUSE
8- Performance
The performance of a system always
depends on how it is built, the
infrastructure which supports it, the
processes, the number of users, etc.
Usually a data mart is faster than a data
warehouse, given the warehouse's inherent
complexity and larger data volume.
51. MULTIDIMENSIONAL ANALYSIS
• Multi-dimensional analysis is an informational
analysis of data that takes many different
relationships into account, each of which
represents a dimension
• For example, a retail analyst may want to
understand the relationships among sales by
region, by quarter, by demographic distribution
(income, education level, gender), and by product
• Multi-dimensional analysis yields results for
these complex relationships
52. MULTIDIMENSIONAL ANALYSIS
• Multi-dimensional data analysis (MDDA) refers to
the process of summarizing data across multiple
levels (called dimensions) and then presenting the
results in a multi-dimensional grid format
• This process is also referred to as an OLAP cube,
data pivot, decision cube, or crosstab
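As a minimal sketch of the grid idea above (the sales figures, region names, and function name are invented for illustration), summarizing a measure across two dimensions into a crosstab can look like this:

```python
from collections import defaultdict

# Toy sales records: each row carries two dimensions (region, quarter)
# and one measure (sales amount). All values are made up.
sales = [
    ("North", "Q1", 100), ("North", "Q2", 150),
    ("South", "Q1", 80),  ("South", "Q2", 120),
    ("North", "Q1", 50),
]

def crosstab(rows):
    """Summarize the measure across the two dimensions into a grid."""
    grid = defaultdict(int)
    for region, quarter, amount in rows:
        grid[(region, quarter)] += amount
    return dict(grid)

pivot = crosstab(sales)
print(pivot[("North", "Q1")])  # 150: the two North/Q1 rows summed
```

Each cell of the resulting grid is indexed by one value per dimension, which is exactly the crosstab / data-pivot presentation the slide describes.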
53. OLAP CUBE
• An OLAP cube is a multidimensional database
optimized for data warehousing and online
analytical processing (OLAP) applications
• An OLAP cube is a method of storing data in a
multidimensional form, generally for reporting
purposes
• In OLAP cubes, data are categorized by dimensions
• OLAP cubes are often pre-summarized across
dimensions to drastically improve query time over
relational databases
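The pre-summarization point can be sketched as follows. This is an illustrative toy, not a real OLAP engine: the fact rows are invented, and `"*"` is used here to mark a dimension that has been aggregated away.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, quarter, product, sales amount).
facts = [
    ("North", "Q1", "A", 100),
    ("North", "Q2", "A", 150),
    ("South", "Q1", "B", 80),
]

def build_cube(rows):
    """Pre-summarize sales over every subset of the three dimensions,
    so a later query reads a stored total instead of rescanning facts."""
    ALL = "*"
    cube = defaultdict(int)
    for region, quarter, product, amount in rows:
        dims = (region, quarter, product)
        # Add this row's amount to each of the 2^3 aggregation levels;
        # bit i of the mask set means dimension i is aggregated away.
        for mask in range(8):
            key = tuple(ALL if (mask >> i) & 1 else d
                        for i, d in enumerate(dims))
            cube[key] += amount
    return cube

cube = build_cube(facts)
print(cube[("North", "*", "*")])  # 250: total sales for North
print(cube[("*", "*", "*")])      # 330: grand total
```

Because every aggregation level is computed once up front, queries at any level become a single dictionary lookup — the "drastically improved query time" the slide refers to.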
56. WHAT IS DATA MINING?
• Data mining is the practice of automatically searching
large stores of data to discover patterns and trends
that go beyond simple analysis
• Data mining uses sophisticated mathematical
algorithms to segment the data and evaluate the
probability of future events
• It is the process of finding anomalies, patterns and
correlations within large data sets to predict outcomes
• The overall goal of the data mining process is to
extract information from a data set and transform it
into an understandable structure for further use
• Also known as Knowledge Discovery in Data (KDD)
57. The phases, and the iterative nature, of a data mining project.
The process flow shows that a data mining project does not
stop when a particular solution is deployed. The results of
data mining trigger new business questions, which in turn can
be used to develop more focused models.
58. 1- PROBLEM DEFINITION
• This initial phase of a data mining project focuses
on understanding the project objectives and
requirements. Once we have specified the project
from a business perspective, we can formulate it
as a data mining problem and develop a
preliminary implementation plan.
• For example, the business problem might be:
"How can I sell more of my product to customers?"
You might translate this into a data mining
problem such as: "Which customers are most likely
to purchase the product?"
59. 2- Data Gathering and Preparation
The data understanding phase involves data collection
and exploration. As you take a closer look at the data,
you can determine how well it addresses the business
problem. You might decide to remove some of the
data or add additional data. This is also the time to
identify data quality problems and to scan for
patterns in the data.
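A minimal sketch of the data-quality scan described above (the customer records, field names, and age range are all invented for illustration):

```python
# Scan toy customer records for missing fields and out-of-range ages
# before modeling, as part of data understanding / preparation.
records = [
    {"name": "Ann", "age": 34, "income": 52000},
    {"name": "Bob", "age": None, "income": 48000},
    {"name": "Eve", "age": 212, "income": None},
]

def quality_report(rows, fields=("age", "income")):
    """Count missing values per field and flag implausible ages."""
    missing = {f: 0 for f in fields}
    suspect = []
    for row in rows:
        for f in fields:
            if row.get(f) is None:
                missing[f] += 1
        age = row.get("age")
        if age is not None and not 0 <= age <= 120:
            suspect.append(row["name"])
    return missing, suspect

missing, suspects = quality_report(records)
print(missing)   # {'age': 1, 'income': 1}
print(suspects)  # ['Eve']
```

Findings like these feed the decisions the slide mentions: removing bad rows, adding data, or correcting quality problems before model building.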
60. 3- Model Building and Evaluation
In this phase, you select and apply various modeling
techniques and calibrate the parameters to optimal
values. If the algorithm requires data transformations,
you will need to step back to the previous phase to
implement them.
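"Calibrating a parameter to an optimal value" can be illustrated with a deliberately tiny model: a single income threshold chosen to best separate buyers from non-buyers. The data and the threshold-search strategy are invented for the sketch.

```python
# Toy training data: (income, bought_product) pairs.
data = [(20000, 0), (35000, 0), (50000, 1), (65000, 1), (80000, 1)]

def accuracy(threshold, rows):
    """Fraction of rows where 'income >= threshold' matches the label."""
    return sum((income >= threshold) == bool(label)
               for income, label in rows) / len(rows)

def calibrate(rows):
    """Try each observed income as a candidate threshold, keep the best."""
    return max((income for income, _ in rows),
               key=lambda t: accuracy(t, rows))

t = calibrate(data)
print(t, accuracy(t, data))  # 50000 1.0
```

Real algorithms search far richer parameter spaces, but the loop is the same: evaluate candidate parameter values against the data and keep the best-performing one.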
61. 4- Knowledge Deployment
• Knowledge deployment is the use of data mining
within a target environment
• In the deployment phase, insight and actionable
information can be derived from data
63. Data Mining Models
• A mining model is created by applying an
algorithm to data
• It is a set of data, statistics, and patterns that can
be applied to new data to generate predictions
and make inferences about relationships
• A data mining model gets data from a mining
structure and then analyzes that data by using a
data mining algorithm
• The mining structure and mining model are
separate objects
• The mining structure stores information that
defines the data source
64. Data Mining Models
• A mining model stores information derived from
statistical processing of the data, such as the
patterns found as a result of analysis
• A mining model is empty until the data provided
by the mining structure has been processed and
analyzed.
• After a mining model has been processed, it
contains metadata, results, and bindings back to
the mining structure
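The structure/model separation described above can be sketched with two small classes. These classes and the averaging "algorithm" are invented stand-ins: the point is only that the structure defines the data source, while the model stays empty until that data is processed.

```python
class MiningStructure:
    """Defines the data source: column metadata plus a binding to rows."""
    def __init__(self, rows, columns):
        self.columns = columns
        self._rows = rows

    def get_data(self):
        return list(self._rows)

class MiningModel:
    """Empty until the structure's data is processed by an algorithm."""
    def __init__(self, structure):
        self.structure = structure   # binding back to the structure
        self.patterns = None         # no results yet

    def process(self):
        # Stand-in for real statistical processing: average each column.
        rows = self.structure.get_data()
        self.patterns = {
            col: sum(r[i] for r in rows) / len(rows)
            for i, col in enumerate(self.structure.columns)
        }

structure = MiningStructure([(30, 40000), (50, 60000)], ["age", "income"])
model = MiningModel(structure)
model.process()
print(model.patterns)  # {'age': 40.0, 'income': 50000.0}
```

Keeping the two objects separate means several models (different algorithms or parameters) can share one structure's data definition.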
67. Data Mining Algorithms
• An algorithm in data mining is a set of heuristics
and calculations that creates a model from data
• To create a model, the algorithm first analyzes the
data you provide, looking for specific types of
patterns or trends
• The algorithm uses the results of this analysis over
many iterations to find the optimal parameters for
creating the mining model
• These parameters are then applied across the
entire data set to extract actionable patterns and
detailed statistics.
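The "results of this analysis over many iterations" idea can be shown with a pure-Python sketch of one-dimensional k-means, where two cluster centers are refined repeatedly until they settle on the data's two groups. The data values and starting centers are invented.

```python
def kmeans_1d(values, centers, iterations=10):
    """Iteratively refine cluster centers: assign each value to its
    nearest center, then move each center to the mean of its group."""
    for _ in range(iterations):
        groups = {c: [] for c in centers}
        for v in values:
            nearest = min(centers, key=lambda c: abs(v - c))
            groups[nearest].append(v)
        centers = [sum(g) / len(g) if g else c for c, g in groups.items()]
    return sorted(centers)

data = [1, 2, 3, 10, 11, 12]
print(kmeans_1d(data, centers=[0.0, 5.0]))  # [2.0, 11.0]
```

Each pass uses the previous pass's result as its starting point — the same iterate-toward-optimal-parameters loop the slide describes.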
68. Data Mining Algorithms
• The mining model that an algorithm creates from
your data can take various forms, including:
1. A set of clusters that describe how the cases in a
dataset are related
2. A decision tree that predicts an outcome, and
describes how different criteria affect that outcome
3. A mathematical model that forecasts sales
4. A set of rules that describe how products are grouped
together in a transaction, and the probabilities that
products are purchased together
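Form 4 above — rules about which products appear together, with probabilities — can be sketched with pairwise association rules over toy market-basket data. The transactions are invented, and only item pairs are considered (real algorithms such as Apriori handle larger itemsets).

```python
from itertools import combinations
from collections import Counter

# Toy transactions: each basket is the set of products bought together.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"butter"},
]

def pair_rules(baskets):
    """Return the confidence P(b in basket | a in basket) per pair."""
    item_count = Counter()
    pair_count = Counter()
    for basket in baskets:
        item_count.update(basket)
        for a, b in combinations(sorted(basket), 2):
            pair_count[(a, b)] += 1
    rules = {}
    for (a, b), n in pair_count.items():
        rules[(a, b)] = n / item_count[a]   # confidence of rule a -> b
        rules[(b, a)] = n / item_count[b]   # confidence of rule b -> a
    return rules

rules = pair_rules(transactions)
print(rules[("bread", "butter")])  # 2/3 of bread baskets also have butter
```

Each rule's confidence is exactly the "probability that products are purchased together" that the slide mentions.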