    Introduction to Data Warehousing

    from jsbi, 2 years ago

    Desc: A Brief History of Information Technology
    Databases for Decision Support
    OLTP vs. OLAP
    Why OLAP & OLTP don’t mix (1)
    Organizational Data Flow and Data Storage Components
    Loading the Data Warehouse
    Characteristics of a Data Warehouse
    A Data Warehouse is Subject Oriented

    For more, visit: http://jsbi.blogspot.com

    Slideshow Transcript

    1. Slide 1: Data Warehouses Dr S.Natarajan Provided By Jason S www.jsbi.blogspot.com
    2. Slide 2: Introduction
    3. Slide 3: A Brief History of Information Technology • The “dark ages”: paper forms in file cabinets • Computerized systems emerge – Initially for big projects like Social Security – Same functionality as old paper-based systems • The “golden age”: databases are everywhere – Most activities tracked electronically – Stored data provides detailed history of activity • The next step: use data for decision-making – The focus of this course! – Made possible by omnipresence of IT – Identify inefficiencies in current processes – Quantify likely impact of decisions
    4. Slide 4: Databases for Decision Support • 1st phase: Automating existing processes makes them more efficient. – Automation → Lots of well-organized, easily accessed data • 2nd phase: Data analysis allows for better decision-making. – Analyze data → better understanding – Better understanding → better decisions • “Data Entry” vs. “Thinking” – Data analysts are decision-makers: managers, executives, etc.
    6. Slide 6: OLTP vs. OLAP • OLTP: On-Line Transaction Processing – Many short transactions (queries + updates) – Examples: update account balance; enroll in course; add book to shopping cart – Queries touch small amounts of data (one record or a few records) – Updates are frequent – Concurrency is biggest performance concern • OLAP: On-Line Analytical Processing – Long transactions, complex queries – Examples: report total sales for each department in each month; identify top-selling books; count classes with fewer than 10 students – Queries touch large amounts of data – Updates are infrequent – Individual queries can require lots of resources
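The contrast on Slide 6 can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not part of the original deck: the `sales` table, its columns, and the sample rows are all assumptions made up for the example.

```python
import sqlite3

# Hypothetical schema and data, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales ("
    "id INTEGER PRIMARY KEY, department TEXT, month TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales (department, month, amount) VALUES (?, ?, ?)",
    [
        ("books", "2009-01", 20.0),
        ("books", "2009-01", 15.0),
        ("books", "2009-02", 30.0),
        ("music", "2009-01", 12.5),
    ],
)
conn.commit()

# OLTP-style work: a short transaction that touches a single record.
with conn:  # commits on success, rolls back on error
    conn.execute("UPDATE sales SET amount = amount + 5.0 WHERE id = ?", (1,))

# OLAP-style work: one query that scans the whole table and aggregates it
# (total sales for each department in each month).
report = conn.execute(
    "SELECT department, month, SUM(amount) FROM sales "
    "GROUP BY department, month ORDER BY department, month"
).fetchall()

for row in report:
    print(row)
```

The update touches one row by primary key, while the report has to read every row. That difference in data volume per request is what the slide's two columns describe.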
    7. Slide 7: Why OLAP & OLTP don’t mix (1) Different performance requirements • Transaction processing (OLTP): – Fast response time important (< 1 second) – Data must be up-to-date, consistent at all times • Data analysis (OLAP): – Queries can consume lots of resources – Can saturate CPUs and disk bandwidth – Operating on static “snapshot” of data usually OK • OLAP can “crowd out” OLTP transactions – Transactions are slow → unhappy users • Example: – Analysis query asks for sum of all sales – Acquires lock on sales table for consistency – New sales transaction is blocked
    8. Slide 8: Why OLAP & OLTP don’t mix (2) Different data modeling requirements • Transaction processing (OLTP): – Normalized schema for consistency – Complex data models, many tables – Limited number of standardized queries and updates • Data analysis (OLAP): – Simplicity of data model is important • Allow semi-technical users to formulate ad hoc queries – De-normalized schemas are common • Fewer joins → improved query performance • Fewer tables → schema is easier to understand
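To make the "fewer joins" point on Slide 8 concrete, here is a small sketch using sqlite3. The table names (`customers`, `products`, `orders`, `sales_flat`), the city names, and the data are hypothetical and chosen only for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized (OLTP-style): each fact is stored once; reads need joins.
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE products  (product_id  INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE orders    (order_id    INTEGER PRIMARY KEY,
                            customer_id INTEGER, product_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Chennai'), (2, 'Bangalore');
    INSERT INTO products  VALUES (10, 'books'), (11, 'music');
    INSERT INTO orders    VALUES (100, 1, 10, 20.0), (101, 2, 10, 30.0),
                                 (102, 1, 11, 10.0);

    -- De-normalized (OLAP-style): one wide table with redundant attributes.
    CREATE TABLE sales_flat AS
    SELECT o.order_id, c.city, p.category, o.amount
    FROM orders o
    JOIN customers c ON c.customer_id = o.customer_id
    JOIN products  p ON p.product_id  = o.product_id;
""")

# Sales per city against the normalized schema: needs a join.
normalized = conn.execute("""
    SELECT c.city, SUM(o.amount) FROM orders o
    JOIN customers c ON c.customer_id = o.customer_id
    GROUP BY c.city ORDER BY c.city
""").fetchall()

# The same question against the flat table: no join, one table to understand.
flat = conn.execute(
    "SELECT city, SUM(amount) FROM sales_flat GROUP BY city ORDER BY city"
).fetchall()

print(normalized)
print(flat)
```

Both queries return the same totals. The flat version is easier for a semi-technical user to write, at the cost of storing `city` and `category` redundantly on every row.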
    9. Slide 9: Why OLAP & OLTP don’t mix (3) Analysis requires data from many sources • An OLTP system targets one specific process – For example: ordering from an online store • OLAP integrates data from different processes – Combine sales, inventory, and purchasing data – Analyze experiments conducted by different labs • OLAP often makes use of historical data – Identify long-term patterns – Notice changes in behavior over time • Terminology, schemas vary across data sources – Integrating data from disparate sources is a major challenge
    10. Slide 10: • A data warehouse is a collection of integrated databases designed to support a decision support system (DSS). • An operational data store (ODS) stores data for a specific application. It feeds the data warehouse a stream of desired raw data. • A data mart is a lower-cost, scaled-down version of a data warehouse, usually designed to support a small group of users (rather than the entire firm). • The metadata is information that is kept about the warehouse.
    11. Slide 11: Organizational Data Flow and Data Storage Components
    12. Slide 12: Loading the Data Warehouse • Source Systems (OLTP) → Data Staging Area → Data Warehouse • Data is periodically extracted • Data is cleansed and transformed • Users query the data warehouse
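The extract, cleanse/transform, and load steps on Slide 12 can be sketched as three small functions. This is only an outline under stated assumptions: the source rows, the cleansing rules (trim whitespace, upper-case product codes, drop rows with no amount), and the `sales` table are hypothetical, and a real staging area would do far more.

```python
import sqlite3

# Hypothetical rows as they might arrive from an OLTP source system.
source_rows = [
    {"store": " 1023", "product": "k596", "amount": "111.50"},
    {"store": "1024 ", "product": "K600", "amount": ""},  # missing amount
    {"store": "1023", "product": "K596", "amount": "20.00"},
]


def extract(rows):
    """Periodic extraction: here, simply copy the rows out of the source."""
    return list(rows)


def cleanse_and_transform(rows):
    """Staging area: trim whitespace, normalize case, drop rows with no amount."""
    cleaned = []
    for row in rows:
        if not row["amount"].strip():
            continue  # assumed rule: skip records that have no sales amount
        cleaned.append(
            (
                int(row["store"].strip()),
                row["product"].strip().upper(),
                float(row["amount"]),
            )
        )
    return cleaned


def load(conn, rows):
    """Load the cleaned rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (store INTEGER, product TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


warehouse = sqlite3.connect(":memory:")
load(warehouse, cleanse_and_transform(extract(source_rows)))

# Users query the warehouse, not the source system.
total = warehouse.execute(
    "SELECT product, SUM(amount) FROM sales GROUP BY product"
).fetchall()
print(total)
```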
    13. Slide 13: Characteristics of a Data Warehouse • Subject oriented – organized based on use • Integrated – inconsistencies removed • Nonvolatile – stored in read-only format • Time variant – data are normally time series • Summarized – in decision-usable format • Large volume – data sets are quite large • Non normalized – often redundant • Metadata – data about data are stored • Data sources – comes from nonintegrated sources
    14. Slide 14: A Data Warehouse is Subject Oriented
    15. Slide 15: Data in a Data Warehouse are Integrated
    16. Slide 16: The Data Warehouse Architecture The architecture consists of various interconnected elements: – Operational and external database layer – the source data for the DW – Information access layer – the tools the end user uses to extract and analyze the data – Data access layer – the interface between the operational and information access layers – Metadata layer – the data directory or repository of metadata information
    17. Slide 17: The Data Warehouse Architecture (cont.) Additional layers are: – Process management layer – the scheduler or job controller – Application messaging layer – the “middleware” that transports information around the firm – Physical data warehouse layer – where the actual data used in the DSS are located – Data staging layer – all of the processes necessary to select, edit, summarize and load warehouse data from the operational and external databases
    18. Slide 18: Components of the Data Warehouse Architecture
    19. Slide 19: Data Warehousing Typology • The virtual data warehouse – the end users have direct access to the data stores, using tools enabled at the data access layer • The central data warehouse – a single physical database contains all of the data for a specific functional area • The distributed data warehouse – the components are distributed across several physical databases
    20. Slide 20: Data Have Data -- The Metadata • The name suggests some high-level technological concept, but it really is fairly simple. Metadata is “data about data”. • With the emergence of the data warehouse as a decision support structure, the metadata are considered as much a resource as the business data they describe. • Metadata are abstractions -- they are high level data that provide concise descriptions of lower-level data.
    21. Slide 21: The Metadata in Action The metadata are essential ingredients in the transformation of raw data into knowledge. They are the “keys” that allow us to handle the raw data. For example, a line in a sales database may contain: 1023 K596 111.50 This is mostly meaningless until we consult the metadata (in the data directory) that tell us it was store number 1023, product K596 and sales of Rs 111.50.
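The Slide 21 example can be sketched in a few lines of Python. The layout of the data-directory entry below (field names, types, descriptions) is an assumption for illustration. The slide only says that the metadata identify the three fields as store number, product, and sales in Rs.

```python
# Hypothetical data-directory entry describing the sales record layout.
sales_metadata = [
    {"name": "store_number", "type": int, "description": "Store number"},
    {"name": "product_code", "type": str, "description": "Product code"},
    {"name": "sales_amount", "type": float, "description": "Sales in Rs"},
]


def interpret(line, metadata):
    """Turn a raw whitespace-separated record into named, typed fields."""
    fields = line.split()
    if len(fields) != len(metadata):
        raise ValueError("record does not match the metadata layout")
    return {m["name"]: m["type"](value) for m, value in zip(metadata, fields)}


record = interpret("1023 K596 111.50", sales_metadata)
print(record)
```

Without `sales_metadata`, the raw line is just three tokens. With it, each token gets a name and a type that downstream queries can rely on.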
    22. Slide 22: Implementing the Data Warehouse Kozar assembled a list of “seven deadly sins” of data warehouse implementation: 1. “If you build it, they will come” – the DW needs to be designed to meet people’s needs 2. Omission of an architectural framework – you need to consider the number of users, volume of data, update cycle, etc. 3. Underestimating the importance of documenting assumptions – the assumptions and potential conflicts must be included in the framework
    23. Slide 23: “Seven Deadly Sins”, continued 4. Failure to use the right tool – a DW project needs different tools than those used to develop an application 5. Life cycle abuse – in a DW, the life cycle really never ends 6. Ignorance about data conflicts – resolving these takes a lot more effort than most people realize 7. Failure to learn from mistakes – since one DW project tends to beget another, learning from the early mistakes will yield higher quality later
    24. Slide 24: The Future of Data Warehousing As the DW becomes a standard part of an organization, there will be efforts to find new ways to use the data. This will likely bring with it several new challenges: – Regulatory constraints may limit the ability to combine sources of disparate data. – These disparate sources are likely to contain unstructured data, which is hard to store. – The Internet makes it possible to access data from virtually “anywhere”. Of course, this just increases the disparity.
    25. Slide 25: Data Integration is Hard • Data warehouses combine data from multiple sources • Data must be translated into a consistent format • Data integration represents ~80% of effort for a typical data warehouse project! • Some reasons why it’s hard: – Metadata is poor or non-existent – Data quality is often bad • Missing or default values • Multiple spellings of the same thing (Cal vs. UC Berkeley vs. University of California) – Inconsistent semantics • What is an airline passenger?
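One small piece of the data-quality problem on Slide 25, multiple spellings and missing values, can be sketched as a lookup against an alias table. The alias table, the choice of "UC Berkeley" as the canonical form, and the sample inputs are assumptions for the example. Building and maintaining such a table for real sources is part of why integration takes so much effort.

```python
# Hypothetical alias table mapping known spellings to one canonical name.
ALIASES = {
    "cal": "UC Berkeley",
    "uc berkeley": "UC Berkeley",
    "university of california": "UC Berkeley",
}


def normalize_school(raw):
    """Return the canonical name, or None for missing or unrecognized values."""
    if raw is None:
        return None
    # Lower-case, drop periods, and collapse runs of whitespace before lookup.
    key = " ".join(raw.lower().replace(".", "").split())
    return ALIASES.get(key)


raw_values = ["Cal", "UC  Berkeley", "University of California", "", None, "Unknown U"]
cleaned = [normalize_school(value) for value in raw_values]
print(cleaned)
```

Values that come back as `None` still need a decision (reject, default, or send for manual review), which the sketch deliberately leaves open.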
    26. Slide 26: Federated Databases • An alternative to data warehouses • Data warehouse – Create a copy of all the data – Execute queries against the copy • Federated database – Pull data from source systems as needed to answer queries • “lazy” vs. “eager” data integration [Diagram: Data Warehouse (query and answer go through the warehouse, which is filled by extraction from the source systems) vs. Federated Database (query and answer go through a mediator, which sends rewritten queries to the source systems)]
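The "lazy" vs. "eager" distinction on Slide 26 can be sketched with two tiny classes. Everything here is hypothetical: the source systems are plain Python lists, and the `Warehouse` and `Mediator` classes are toy stand-ins, not a real federation layer.

```python
# Hypothetical source systems, modeled as plain lists of sale amounts.
sources = {"store_a": [10.0, 20.0], "store_b": [5.0]}


class Warehouse:
    """Eager integration: copy all source data up front, then query the copy."""

    def __init__(self, sources):
        self.copy = [amount for rows in sources.values() for amount in rows]

    def total_sales(self):
        return sum(self.copy)


class Mediator:
    """Lazy integration: pull from the sources each time a query arrives."""

    def __init__(self, sources):
        self.sources = sources

    def total_sales(self):
        return sum(sum(rows) for rows in self.sources.values())


warehouse = Warehouse(sources)
mediator = Mediator(sources)

# A new sale arrives in a source system after the warehouse was loaded.
sources["store_b"].append(7.5)

stale_total = warehouse.total_sales()  # answers from the earlier snapshot
live_total = mediator.total_sales()    # answers from the sources as they are now
print(stale_total, live_total)
```

The warehouse answer is slightly out of date but puts no load on the sources at query time. The mediator answer is current but has to touch every source system for every query.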
    27. Slide 27: Warehouses vs. Federation • Advantages of federated databases: – No redundant copying of data – Queries see “real-time” view of evolving data – More flexible security policy • Disadvantages of federated databases: – Analysis queries place extra load on transactional systems – Query optimization is hard to do well – Historical data may not be available – Complex “wrappers” needed to mediate between analysis server and source systems • Data warehouses are much more common in practice – Better performance – Lower complexity – Slightly out-of-date data is acceptable
    28. Slide 28: Visit www.jsbi.blogspot.com for more slides/information!! Mail: jasonblr@gmail.com