What's a Data Warehouse?


  1. 1. What is a Data Warehouse? And Why Are So Many Schools Setting Them Up? Richard Goerwitz
  2. 2. What Is a Data Warehouse? <ul><li>Nobody can agree </li></ul><ul><li>So I’m not actually going to define a DW </li></ul><ul><li>Don’t feel cheated, though </li></ul><ul><li>By the end of this talk, you’ll </li></ul><ul><ul><li>Understand key concepts that underlie all warehouse implementations (“talk the talk”) </li></ul></ul><ul><ul><li>Understand the various components out of which DW architects construct real-world data warehouses </li></ul></ul><ul><ul><li>Understand what a data warehouse project looks like </li></ul></ul>
  3. 3. Why Are Schools Setting Up Data Warehouses? <ul><li>A data warehouse makes it easier to: </li></ul><ul><ul><li>Optimize classroom, computer lab usage </li></ul></ul><ul><ul><li>Refine admissions ratings systems </li></ul></ul><ul><ul><li>Forecast future demand for courses, majors </li></ul></ul><ul><ul><li>Tie private spreadsheet data into central repositories </li></ul></ul><ul><ul><li>Correlate admissions and IR data with outcomes such as: </li></ul></ul><ul><ul><ul><li>GPAs </li></ul></ul></ul><ul><ul><ul><li>Placement rates </li></ul></ul></ul><ul><ul><ul><li>Happiness, as measured by alumni surveys </li></ul></ul></ul><ul><ul><li>Notify advisors when extra help may be needed based on </li></ul></ul><ul><ul><ul><li>Admissions data (student vitals; SAT, etc.) </li></ul></ul></ul><ul><ul><ul><li>Special events: A-student suddenly gets a C in his/her major </li></ul></ul></ul><ul><ul><ul><li>Slower trends: Student’s GPA falls for > 2 semesters/terms </li></ul></ul></ul><ul><ul><li>(Many other examples could be given!) </li></ul></ul><ul><li>Better information = better decisions </li></ul><ul><ul><li>Better admission decisions </li></ul></ul><ul><ul><li>Better retention rates </li></ul></ul><ul><ul><li>More effective fund raising, etc. </li></ul></ul>
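The "slower trends" alert on this slide (a GPA that falls for more than two terms) can be sketched in a few lines. This is a minimal illustration only: the function names and the plain list of per-term GPAs are assumptions made for the example, not part of any real advising system.

```python
from typing import Sequence


def consecutive_declines(term_gpas: Sequence[float]) -> int:
    """Count how many of the most recent terms in a row the GPA fell."""
    count = 0
    # Walk backwards from the latest term, comparing each term with the one before it.
    for i in range(len(term_gpas) - 1, 0, -1):
        if term_gpas[i] < term_gpas[i - 1]:
            count += 1
        else:
            break
    return count


def needs_advisor_alert(term_gpas: Sequence[float], threshold_terms: int = 2) -> bool:
    """Flag a student whose GPA has fallen for more than `threshold_terms` terms running."""
    return consecutive_declines(term_gpas) > threshold_terms
```

With a warehouse holding term-by-term history, a check like this could run after each nightly load; against a transactional system that keeps only the current GPA, it could not run at all.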
  4. 4. Talking The Talk <ul><li>To think and communicate usefully about data warehouses you’ll need to understand a set of common terms and concepts: </li></ul><ul><ul><li>OLTP </li></ul></ul><ul><ul><li>ODS </li></ul></ul><ul><ul><li>OLAP, ROLAP, MOLAP </li></ul></ul><ul><ul><li>ETL </li></ul></ul><ul><ul><li>Star schema </li></ul></ul><ul><ul><li>Conformed dimension </li></ul></ul><ul><ul><li>Data mart </li></ul></ul><ul><ul><li>Cube </li></ul></ul><ul><ul><li>Metadata </li></ul></ul><ul><li>Even if you’re not an IT person, pay heed: </li></ul><ul><ul><li>You’ll have to communicate with IT people </li></ul></ul><ul><ul><li>More importantly: </li></ul></ul><ul><ul><ul><li>Evidence shows that IT will only build a successful warehouse if you are intimately involved! </li></ul></ul></ul>
  5. 5. OLTP <ul><li>OLTP = online transaction processing </li></ul><ul><li>The process of moving data around to handle day-to-day affairs </li></ul><ul><ul><li>Scheduling classes </li></ul></ul><ul><ul><li>Registering students </li></ul></ul><ul><ul><li>Tracking benefits </li></ul></ul><ul><ul><li>Recording payments, etc. </li></ul></ul><ul><li>Systems supporting this kind of activity are called transactional systems </li></ul>
  6. 6. Transactional Systems <ul><li>Transactional systems are optimized primarily for the here and now </li></ul><ul><ul><li>Can support many simultaneous users </li></ul></ul><ul><ul><li>Can support heavy read/write access </li></ul></ul><ul><ul><li>Allow for constant change </li></ul></ul><ul><ul><li>Are big, ugly, and often don’t give people the data they want </li></ul></ul><ul><ul><ul><li>As a result a lot of data ends up in shadow databases </li></ul></ul></ul><ul><ul><ul><li>Some ends up locked away in private spreadsheets </li></ul></ul></ul><ul><li>Transactional systems don’t record all previous data states </li></ul><ul><li>Lots of data gets thrown away or archived, e.g.: </li></ul><ul><ul><li>Admissions data </li></ul></ul><ul><ul><li>Enrollment data </li></ul></ul><ul><ul><li>Asset tracking data (“How many computers did we support each year, from 1996 to 2006, and where do we expect to be in 2010?”) </li></ul></ul>
  7. 7. Simple Transactional Database <ul><li>Map of Microsoft Windows Update Service (WUS) back-end database </li></ul><ul><ul><li>Diagrammed using Sybase PowerDesigner </li></ul></ul><ul><ul><ul><li>Each green box is a database “table” </li></ul></ul></ul><ul><ul><ul><li>Arrows are “joins” or foreign keys </li></ul></ul></ul><ul><ul><ul><li>This is simple for an OLTP back end </li></ul></ul></ul>
  8. 8. More Complex Example <ul><li>Recruitment Plus back-end database </li></ul><ul><li>Used by many admissions offices </li></ul><ul><li>Note again: </li></ul><ul><ul><li>Green boxes are tables </li></ul></ul><ul><ul><li>Lines are foreign key relationships </li></ul></ul><ul><ul><li>Purple boxes are views </li></ul></ul><ul><li>Considerable expertise is required to report off this database! </li></ul><ul><li>Imagine what it’s like for even more complex systems </li></ul><ul><ul><li>Colleague </li></ul></ul><ul><ul><li>SCT Banner (over 4,000 tables) </li></ul></ul>
  9. 9. The “Reporting Problem” <ul><li>Often we require OLTP data as a snapshot, in a spreadsheet or report </li></ul><ul><li>Reports require querying back-end OLTP support databases </li></ul><ul><li>But OLTP databases are often very complex, and typically </li></ul><ul><ul><li>Contain many, often obscure, tables </li></ul></ul><ul><ul><li>Utilize cryptic, unintuitive field/column names </li></ul></ul><ul><ul><li>Don’t store all necessary historical data </li></ul></ul><ul><li>As a result, reporting becomes a problem – </li></ul><ul><ul><li>Requires special expertise </li></ul></ul><ul><ul><li>May require modifications to production OLTP systems </li></ul></ul><ul><ul><li>Becomes harder and harder for staff to keep up! </li></ul></ul>
  10. 10. Workarounds <ul><li>Ways of working around the reporting problem include: </li></ul><ul><ul><li>Have OLTP system vendors do the work </li></ul></ul><ul><ul><ul><li>Provide canned reports </li></ul></ul></ul><ul><ul><ul><li>Write reporting GUIs for their products </li></ul></ul></ul><ul><ul><li>Hire more specialists </li></ul></ul><ul><ul><ul><li>To create simplified views of OLTP data </li></ul></ul></ul><ul><ul><ul><li>To write reports, create snapshots </li></ul></ul></ul><ul><ul><li>Periodically copy data from OLTP systems to a place where </li></ul></ul><ul><ul><ul><li>The data is easier to understand </li></ul></ul></ul><ul><ul><ul><li>The data is optimized for reporting </li></ul></ul></ul><ul><ul><ul><li>Easily pluggable into reporting tools </li></ul></ul></ul>
  11. 11. ODS <ul><li>ODS = operational data store </li></ul><ul><li>ODSs were an early workaround to the “reporting problem” </li></ul><ul><li>To create an ODS you </li></ul><ul><ul><li>Build a separate/simplified version of an OLTP system </li></ul></ul><ul><ul><li>Periodically copy data into it from the live OLTP system </li></ul></ul><ul><ul><li>Hook it to operational reporting tools </li></ul></ul><ul><li>An ODS can be an integration point or real-time “reporting database” for an operational system </li></ul><ul><li>It’s not enough for full enterprise-level, cross-database analytical processing </li></ul>
  12. 12. OLAP <ul><li>OLAP = online analytical processing </li></ul><ul><li>OLAP is the process of creating and summarizing historical, multidimensional data </li></ul><ul><ul><li>To help users understand the data better </li></ul></ul><ul><ul><li>Provide a basis for informed decisions </li></ul></ul><ul><ul><li>Allow users to manipulate and explore data themselves, easily and intuitively </li></ul></ul><ul><li>More than just “reporting” </li></ul><ul><li>Reporting is just one (static) product of OLAP </li></ul>
  13. 13. OLAP Support Databases <ul><li>OLAP systems require support databases </li></ul><ul><li>These databases typically </li></ul><ul><ul><li>Support fewer simultaneous users than OLTP back ends </li></ul></ul><ul><ul><li>Are structured simply; i.e., denormalized </li></ul></ul><ul><ul><li>Can grow large </li></ul></ul><ul><ul><ul><li>Hold snapshots of data in OLTP systems </li></ul></ul></ul><ul><ul><ul><li>Provide history/time depth to our analyses </li></ul></ul></ul><ul><ul><li>Are optimized for read (not write) access </li></ul></ul><ul><ul><li>Updated via periodic batch (e.g., nightly) ETL processes </li></ul></ul>
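The point about snapshots giving "history/time depth" can be illustrated with a small sketch using Python's built-in sqlite3 module. The table names (`student`, `student_snapshot`), the sample data, and the dates are all hypothetical; the idea is only that the transactional table overwrites its state while the snapshot table keeps dated copies.

```python
import sqlite3


def demo_snapshots():
    """Show an OLTP-style table losing history while a snapshot table keeps it."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE student (id INTEGER PRIMARY KEY, major TEXT);
        CREATE TABLE student_snapshot (snapshot_date TEXT, id INTEGER, major TEXT);
        INSERT INTO student VALUES (1, 'Physics');
    """)

    def take_snapshot(date):
        # Copy the current state of the live table, stamped with the load date.
        con.execute(
            "INSERT INTO student_snapshot SELECT ?, id, major FROM student",
            (date,),
        )

    take_snapshot("2006-01-15")
    # The transactional system simply overwrites the old value.
    con.execute("UPDATE student SET major = 'Geology' WHERE id = 1")
    take_snapshot("2006-09-15")

    current = con.execute("SELECT major FROM student WHERE id = 1").fetchall()
    history = con.execute(
        "SELECT snapshot_date, major FROM student_snapshot "
        "WHERE id = 1 ORDER BY snapshot_date"
    ).fetchall()
    con.close()
    return current, history
```

Here `current` holds only the latest major, while `history` still shows both states, which is what makes trend questions answerable.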
  14. 14. ETL Processes <ul><li>ETL = extract, transform, load </li></ul><ul><ul><li>Extract data from various sources </li></ul></ul><ul><ul><li>Transform and clean the data from those sources </li></ul></ul><ul><ul><li>Load the data into databases used for analysis and reporting </li></ul></ul><ul><li>ETL processes are coded in various ways </li></ul><ul><ul><li>By hand in SQL, UniBASIC, etc. </li></ul></ul><ul><ul><li>Using more general programming languages </li></ul></ul><ul><ul><li>In semi-automated fashion using specialized ETL tools like Cognos Decision Stream </li></ul></ul><ul><li>Most institutions do hand ETL; but note well: </li></ul><ul><ul><li>Hand ETL is slow </li></ul></ul><ul><ul><li>Requires specialized knowledge </li></ul></ul><ul><ul><li>Becomes extremely difficult to maintain as code accumulates and databases/personnel change! </li></ul></ul>
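A hand-coded ETL job in a general programming language might look roughly like the sketch below. Everything in it is an assumption made for illustration: the cryptic source column names (`STU_ID`, `MAJ_CD`, `TRM_GPA`), the sample rows, and the target table `term_gpa`. A real job would extract from a live source system rather than an in-memory list.

```python
import sqlite3

# Hypothetical rows as they might arrive from a source system export.
SOURCE_ROWS = [
    {"STU_ID": " 1001 ", "MAJ_CD": "phys", "TRM_GPA": "3.70"},
    {"STU_ID": "1002", "MAJ_CD": "GEOL ", "TRM_GPA": "3.10"},
    {"STU_ID": "1003", "MAJ_CD": "PHYS", "TRM_GPA": ""},  # missing GPA
]


def extract():
    """Extract: pull raw rows from the source."""
    return list(SOURCE_ROWS)


def transform(rows):
    """Transform: trim whitespace, normalize codes, drop rows with no GPA."""
    clean = []
    for row in rows:
        gpa_text = row["TRM_GPA"].strip()
        if not gpa_text:
            continue
        clean.append(
            (int(row["STU_ID"].strip()), row["MAJ_CD"].strip().upper(), float(gpa_text))
        )
    return clean


def load(con, rows):
    """Load: write cleaned rows into a table with descriptive column names."""
    con.execute(
        "CREATE TABLE IF NOT EXISTS term_gpa "
        "(student_id INTEGER, major_code TEXT, term_gpa REAL)"
    )
    con.executemany("INSERT INTO term_gpa VALUES (?, ?, ?)", rows)
    con.commit()


def run_etl():
    con = sqlite3.connect(":memory:")
    load(con, transform(extract()))
    return con
```

Even at this toy scale the cleaning rules live in code that someone has to understand and maintain, which is the maintenance burden the slide warns about.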
  15. 15. Where Does the Data Go? <ul><li>What sort of a database do the ETL processes dump data into? </li></ul><ul><li>Typically, into very simple table structures </li></ul><ul><li>These table structures are: </li></ul><ul><ul><li>Denormalized </li></ul></ul><ul><ul><li>Minimally branched/hierarchized </li></ul></ul><ul><ul><li>Structured into star schemas </li></ul></ul>
  16. 16. So What Are Star Schemas? <ul><li>Star schemas are collections of data arranged into star-like patterns </li></ul><ul><ul><li>They have fact tables in the middle, which contain amounts, measures (like counts, dollar amounts, GPAs) </li></ul></ul><ul><ul><li>Dimension tables around the outside, which contain labels and classifications (like names, geocodes, majors) </li></ul></ul><ul><ul><li>For faster processing, aggregate fact tables are sometimes also used (e.g., counts pre-averaged for an entire term) </li></ul></ul><ul><li>Star schemas should </li></ul><ul><ul><li>Have descriptive column/field labels </li></ul></ul><ul><ul><li>Be easy for users to understand </li></ul></ul><ul><ul><li>Perform well on queries </li></ul></ul>
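A star schema along these lines can be sketched with sqlite3: one fact table holding measures, surrounded by dimension tables holding labels. The table names, column names, and figures below are invented for the example.

```python
import sqlite3


def build_star():
    """Create a tiny star: one fact table keyed to two dimension tables."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE dim_major (major_key INTEGER PRIMARY KEY, major_name TEXT);
        CREATE TABLE dim_term (term_key INTEGER PRIMARY KEY, term_name TEXT);
        CREATE TABLE fact_enrollment (
            major_key INTEGER REFERENCES dim_major(major_key),
            term_key INTEGER REFERENCES dim_term(term_key),
            student_count INTEGER,   -- measure
            credit_hours REAL        -- measure
        );
        INSERT INTO dim_major VALUES (1, 'Physics'), (2, 'Geology');
        INSERT INTO dim_term VALUES (1, 'Fall 2005'), (2, 'Spring 2006');
        INSERT INTO fact_enrollment VALUES
            (1, 1, 40, 160.0), (2, 1, 25, 100.0),
            (1, 2, 38, 152.0), (2, 2, 30, 120.0);
    """)
    return con


def students_by_major(con):
    """Sum a fact-table measure, labelled by a dimension attribute."""
    return con.execute("""
        SELECT m.major_name, SUM(f.student_count)
        FROM fact_enrollment AS f
        JOIN dim_major AS m ON m.major_key = f.major_key
        GROUP BY m.major_name
        ORDER BY m.major_name
    """).fetchall()
```

Every query against a star has this same shape: join the fact table to the dimensions of interest, then group by dimension labels and aggregate the measures.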
  17. 17. A Very Simple Star Schema <ul><li>Data Center UPS </li></ul><ul><li>Power Output </li></ul><ul><li>Dimensions: </li></ul><ul><ul><li>Phase </li></ul></ul><ul><ul><li>Time </li></ul></ul><ul><ul><li>Date </li></ul></ul><ul><li>Facts: </li></ul><ul><ul><li>Volts </li></ul></ul><ul><ul><li>Amps </li></ul></ul><ul><ul><li>Etc. </li></ul></ul>
  18. 18. A More Complex Star Schema <ul><li>Freshman survey data (HERI/CIRP) </li></ul><ul><li>Dimensions: </li></ul><ul><ul><li>Questions </li></ul></ul><ul><ul><li>Survey years </li></ul></ul><ul><ul><li>Data about test takers </li></ul></ul><ul><li>Facts: </li></ul><ul><ul><li>Answer (text) </li></ul></ul><ul><ul><li>Answer (raw) </li></ul></ul><ul><ul><li>Count (1) </li></ul></ul><ul><li>Oops </li></ul><ul><ul><li>Not a star </li></ul></ul><ul><ul><li>Snowflaked! </li></ul></ul>Oops, answers should have been placed in their own dimension (creating a “factless fact table”). I’ll demo a better version of this star later!
  19. 19. Data Marts <ul><li>One definition: </li></ul><ul><ul><li>One or more star schemas that present data on a single or related set of business processes </li></ul></ul><ul><li>Data marts should not be built in isolation </li></ul><ul><li>They need to be connected via dimensional tables that are </li></ul><ul><ul><li>The same or subsets of each other </li></ul></ul><ul><ul><li>Hierarchized the same way internally </li></ul></ul><ul><li>So, e.g., if I construct data marts for… </li></ul><ul><ul><li>GPA trends, student major trends, enrollments </li></ul></ul><ul><ul><li>Freshman survey data, senior survey data, etc. </li></ul></ul><ul><li>… I connect these marts via a conformed student dimension </li></ul><ul><ul><li>Makes correlation of data across star schemas intuitive </li></ul></ul><ul><ul><li>Makes it easier for OLAP tools to use the data </li></ul></ul><ul><ul><li>Allows nonspecialists to do much of the work </li></ul></ul>
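To make the conformed-dimension idea concrete, here is a sketch in which two stars (a GPA fact table and a survey fact table) share one student dimension, so results from both can be lined up by the same attribute. All names and numbers are hypothetical.

```python
import sqlite3


def build_marts():
    """Two fact tables that share one conformed student dimension."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE dim_student (student_key INTEGER PRIMARY KEY, entry_cohort TEXT);
        CREATE TABLE fact_gpa (student_key INTEGER, term_gpa REAL);
        CREATE TABLE fact_survey (student_key INTEGER, satisfaction INTEGER);
        INSERT INTO dim_student VALUES (1, '2004'), (2, '2004'), (3, '2005');
        INSERT INTO fact_gpa VALUES (1, 3.5), (2, 2.5), (3, 4.0);
        INSERT INTO fact_survey VALUES (1, 4), (2, 2), (3, 5);
    """)
    return con


def gpa_vs_satisfaction_by_cohort(con):
    """Aggregate each star separately, then join the results on the shared attribute."""
    return con.execute("""
        SELECT g.entry_cohort, g.avg_gpa, s.avg_satisfaction
        FROM (SELECT d.entry_cohort, AVG(f.term_gpa) AS avg_gpa
              FROM fact_gpa AS f JOIN dim_student AS d USING (student_key)
              GROUP BY d.entry_cohort) AS g
        JOIN (SELECT d.entry_cohort, AVG(f.satisfaction) AS avg_satisfaction
              FROM fact_survey AS f JOIN dim_student AS d USING (student_key)
              GROUP BY d.entry_cohort) AS s
          ON s.entry_cohort = g.entry_cohort
        ORDER BY g.entry_cohort
    """).fetchall()
```

Because both marts use the same `dim_student` keys and the same cohort labels, the comparison needs no translation step; if each mart had its own student table with its own coding, it would.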
  20. 20. Simple Data Mart Example <ul><li>UPS </li></ul><ul><li>Battery star </li></ul><ul><ul><li>By battery </li></ul></ul><ul><ul><ul><li>Run-time </li></ul></ul></ul><ul><ul><ul><li>% charged </li></ul></ul></ul><ul><ul><ul><li>Current </li></ul></ul></ul><ul><li>Input star </li></ul><ul><ul><li>By phase </li></ul></ul><ul><ul><ul><li>Voltage </li></ul></ul></ul><ul><ul><ul><li>Current </li></ul></ul></ul><ul><li>Output star </li></ul><ul><ul><li>By phase </li></ul></ul><ul><ul><ul><li>Voltage </li></ul></ul></ul><ul><ul><ul><li>Current </li></ul></ul></ul><ul><li>Sensor star </li></ul><ul><ul><li>By sensor </li></ul></ul><ul><ul><ul><li>Temp </li></ul></ul></ul><ul><ul><ul><li>Humidity </li></ul></ul></ul>Note conformed date, time dimensions!
  21. 21. CIRP Star/Data Mart <ul><li>CIRP Freshman survey data </li></ul><ul><li>Corrected from a previous slide </li></ul><ul><li>Note the CirpAnswer dimension </li></ul><ul><li>Note student dimension (ties in with other marts) </li></ul>
  22. 22. CIRP Mart in Cognos BI 8
  23. 23. ROLAP, MOLAP <ul><li>ROLAP = OLAP via direct relational query </li></ul><ul><ul><li>E.g., against a (materialized) view </li></ul></ul><ul><ul><li>Against star schemas in a warehouse </li></ul></ul><ul><li>MOLAP = OLAP via multidimensional database (MDB) </li></ul><ul><ul><li>MDB is a special kind of database </li></ul></ul><ul><ul><li>Treats data kind of like a big, fast spreadsheet </li></ul></ul><ul><ul><li>MDBs typically draw data in from a data warehouse </li></ul></ul><ul><ul><ul><li>Built to work best with star schemas </li></ul></ul></ul>
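The ROLAP case (OLAP answered by relational queries at request time) can be sketched as a query against a view over a fact table. The table and view names and the readings are made up for the example, and note one limitation of the sketch: SQLite offers ordinary views only, not materialized ones, so the aggregation here is recomputed on every query.

```python
import sqlite3


def rolap_demo():
    """Answer an analytical question with a relational query against a view."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE fact_ups_output (reading_date TEXT, phase TEXT, volts REAL, amps REAL);
        INSERT INTO fact_ups_output VALUES
            ('2006-03-01', 'A', 120.0, 10.0),
            ('2006-03-01', 'B', 119.0, 12.0),
            ('2006-03-02', 'A', 121.0, 11.0),
            ('2006-03-02', 'B', 120.0, 13.0);
        -- A plain (non-materialized) view that summarizes current across phases.
        CREATE VIEW v_daily_amps AS
            SELECT reading_date, SUM(amps) AS total_amps
            FROM fact_ups_output
            GROUP BY reading_date;
    """)
    rows = con.execute(
        "SELECT reading_date, total_amps FROM v_daily_amps ORDER BY reading_date"
    ).fetchall()
    con.close()
    return rows
```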
  24. 24. Data Cubes <ul><li>The term data cube means different things to different people </li></ul><ul><li>Various definitions: </li></ul><ul><ul><li>A star schema </li></ul></ul><ul><ul><li>Any DB view used for reporting </li></ul></ul><ul><ul><li>A three-dimensional array in an MDB </li></ul></ul><ul><ul><li>Any multidimensional MDB array (really a hypercube) </li></ul></ul><ul><li>Which definition do you suppose is technically correct? </li></ul>
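One of the senses listed, a multidimensional array, can be sketched without any database at all. The sketch below stores a count in each cell of a three-dimensional structure and "rolls up" by summing away the dimensions you do not ask for. The dimension names and cell values are invented, and a Python dict stands in for whatever storage a real multidimensional database would use.

```python
from collections import defaultdict

# Hypothetical cube: each cell is addressed by one value per dimension.
DIMENSIONS = ("major", "term", "class_year")

CELLS = {
    ("Physics", "Fall 2005", "Freshman"): 12,
    ("Physics", "Fall 2005", "Senior"): 8,
    ("Physics", "Spring 2006", "Freshman"): 10,
    ("Geology", "Fall 2005", "Freshman"): 7,
    ("Geology", "Spring 2006", "Senior"): 5,
}


def roll_up(cells, keep):
    """Sum the measure over every dimension not listed in `keep`."""
    positions = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(int)
    for coords, value in cells.items():
        totals[tuple(coords[p] for p in positions)] += value
    return dict(totals)
```

For instance, `roll_up(CELLS, ["major"])` collapses term and class year to give one total per major, and `roll_up(CELLS, [])` collapses everything into a single grand total.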
  25. 25. Metadata <ul><li>Metadata = data about data </li></ul><ul><li>In a data warehousing context it can mean many things </li></ul><ul><ul><li>Information on data in source OLTP systems </li></ul></ul><ul><ul><li>Information on ETL jobs and what they do to the data </li></ul></ul><ul><ul><li>Information on data in marts/star schemas </li></ul></ul><ul><ul><li>Documentation in OLAP tools on the data they manipulate </li></ul></ul><ul><li>Many institutions make metadata available via data malls or warehouse portals, e.g.: </li></ul><ul><ul><li>University of New Mexico </li></ul></ul><ul><ul><li>UC Davis </li></ul></ul><ul><ul><li>Rensselaer Polytechnic Institute </li></ul></ul><ul><ul><li>University of Illinois </li></ul></ul><ul><li>Good ETL tools automate the setup of malls/portals! </li></ul>
  26. 26. The Data Warehouse <ul><li>OK now we’re experts in terms like OLTP, OLAP, star schema, metadata, etc. </li></ul><ul><li>Let’s use some of these terms to describe how a DW works: </li></ul><ul><ul><li>Provides ample metadata – data about the data </li></ul></ul><ul><ul><li>Utilizes easy-to-understand column/field names </li></ul></ul><ul><ul><li>Feeds multidimensional databases (MDBs) </li></ul></ul><ul><ul><li>Is updated via periodic (mainly nightly) ETL jobs </li></ul></ul><ul><ul><li>Presents data in a simplified, denormalized form </li></ul></ul><ul><ul><li>Utilizes star-like fact/dimension table schemas </li></ul></ul><ul><ul><li>Encompasses multiple, smaller data “marts” </li></ul></ul><ul><ul><li>Supports OLAP tools (Access/Excel, Safari, Cognos BI) </li></ul></ul><ul><ul><li>Derives data from (multiple) back-end OLTP systems </li></ul></ul><ul><ul><li>Houses historical data, and can grow very big </li></ul></ul>
  27. 27. A Data Warehouse is Not… <ul><li>Vendor and consultant proclamations aside, a data warehouse is not: </li></ul><ul><ul><li>A project </li></ul></ul><ul><ul><ul><li>With a specific end date </li></ul></ul></ul><ul><ul><li>A product you buy from a vendor </li></ul></ul><ul><ul><ul><li>Like an ODS (such as SCT’s) </li></ul></ul></ul><ul><ul><ul><li>A canned “warehouse” supplied by iStrategy </li></ul></ul></ul><ul><ul><ul><li>Cognos ReportNet </li></ul></ul></ul><ul><ul><li>A database schema or instance </li></ul></ul><ul><ul><ul><li>Like Oracle </li></ul></ul></ul><ul><ul><ul><li>SQL Server </li></ul></ul></ul><ul><ul><li>A cut-down version of your live transactional database </li></ul></ul>
  28. 28. Kimball & Caserta’s Definition <ul><li>According to Ralph Kimball and Joe Caserta, a data warehouse is: </li></ul><ul><ul><li>A system that extracts, cleans, conforms, and delivers source data into a dimensional data store and then supports and implements querying and analysis for the purpose of decision making. </li></ul></ul><ul><li>Another def.: The union of all the enterprise’s data marts </li></ul><ul><li>Aside: The Kimball model is not without some critics: </li></ul><ul><ul><li>E.g., Bill Inmon </li></ul></ul>
  29. 29. Example Data Warehouse (1) <ul><li>This one is RPI’s </li></ul><ul><li>5 parts: </li></ul><ul><ul><li>Sources </li></ul></ul><ul><ul><li>ETL stuff </li></ul></ul><ul><ul><li>DW proper </li></ul></ul><ul><ul><li>Cubes etc. </li></ul></ul><ul><ul><li>OLAP apps </li></ul></ul>
  30. 30. Example Data Warehouse (2) <ul><li>Caltech’s DW </li></ul><ul><li>Five Parts: </li></ul><ul><ul><li>Source systems </li></ul></ul><ul><ul><li>ETL processes </li></ul></ul><ul><ul><li>Data marts </li></ul></ul><ul><ul><li>FM/metadata </li></ul></ul><ul><ul><li>Reporting and analysis tools </li></ul></ul><ul><ul><li>Note: They’re also customers of Cognos! </li></ul></ul>
  31. 31. So Where is Colorado College? <ul><li>Phil Goldstein (Educause Center for Applied Research fellow) identifies the major deployment levels: </li></ul><ul><ul><li>Level 1: Transactional systems only </li></ul></ul><ul><ul><li>Level 2a: ODS or single data mart; no ETL </li></ul></ul><ul><ul><li>Level 2: ODS or single data mart with ETL tools </li></ul></ul><ul><ul><li>Level 3a: Warehouse or multiple marts; no ETL; OLAP </li></ul></ul><ul><ul><li>Level 3b: Warehouse or multiple marts; ETL; OLAP </li></ul></ul><ul><ul><li>Level 3: Enterprise-wide warehouse or multiple marts; ETL tools; OLAP tools </li></ul></ul><ul><li>Goldstein’s study was just released in late 2005 </li></ul><ul><li>It’s very good; based on real survey data </li></ul><ul><li>Which level is Colorado College at? </li></ul>
  32. 32. Implementing a Data Warehouse <ul><li>In many organizations IT people want to huddle and work out a warehousing plan, but in fact </li></ul><ul><ul><li>The purpose of a DW is decision support </li></ul></ul><ul><ul><li>The primary audience of a DW is therefore College decision makers </li></ul></ul><ul><ul><li>It is College decision makers therefore who must determine </li></ul></ul><ul><ul><ul><li>Scope </li></ul></ul></ul><ul><ul><ul><li>Priority </li></ul></ul></ul><ul><ul><ul><li>Resources </li></ul></ul></ul><ul><li>Decision makers can’t make these determinations without an understanding of data warehouses </li></ul><ul><li>It is therefore imperative that key decision makers first be educated about data warehouses </li></ul><ul><ul><li>Once this occurs, it is possible to </li></ul></ul><ul><ul><ul><li>Elicit requirements (a critical step that’s often skipped) </li></ul></ul></ul><ul><ul><ul><li>Determine priorities/scope </li></ul></ul></ul><ul><ul><ul><li>Formulate a budget </li></ul></ul></ul><ul><ul><ul><li>Create a plan and timeline, with real milestones and deliverables! </li></ul></ul></ul>
  33. 33. Is This Really a Good Plan? <ul><li>Sure, according to Phil Goldstein (Educause Center for Applied Research) </li></ul><ul><li>He’s conducted extensive surveys on “academic analytics” (= business intelligence for higher ed) </li></ul><ul><li>His four recommendations for improving analytics: </li></ul><ul><ul><li>Key decision makers must lead the way </li></ul></ul><ul><ul><li>Technologists must collaborate </li></ul></ul><ul><ul><ul><li>Must collect requirements </li></ul></ul></ul><ul><ul><ul><li>Must form strong partnerships with functional sponsors </li></ul></ul></ul><ul><ul><li>IT must build the needed infrastructure </li></ul></ul><ul><ul><ul><li>Carleton violated this rule with Cognos BI </li></ul></ul></ul><ul><ul><ul><li>As we discovered, without an ETL/warehouse infrastructure, success with OLAP is elusive </li></ul></ul></ul><ul><ul><li>Staff must train and develop deep analysis skills </li></ul></ul><ul><li>Goldstein’s findings closely mirror the advice of industry heavyweights – Ralph Kimball, Laura Reeves, Margy Ross, Warren Thornthwaite, etc. </li></ul>
  34. 34. Isn’t a DW a Huge Undertaking? <ul><li>Sure, it can be huge </li></ul><ul><li>Don’t hold on too tightly to the big-sounding word, “warehouse” </li></ul><ul><li>Luminaries like Ralph Kimball have shown that a data warehouse can be built incrementally </li></ul><ul><ul><li>Can start with just a few data marts </li></ul></ul><ul><ul><li>Targeted consulting help will ensure proper, extensible architecture and tool selection </li></ul></ul>
  35. 35. What Takes Up the Most Time? <ul><li>You may be surprised to learn which DW step takes the most time </li></ul><ul><li>Try guessing which: </li></ul><ul><ul><li>Hardware </li></ul></ul><ul><ul><li>Physical database setup </li></ul></ul><ul><ul><li>Database design </li></ul></ul><ul><ul><li>ETL </li></ul></ul><ul><ul><li>OLAP setup </li></ul></ul><ul><li>According to Kimball & Caserta, ETL will eat up 70% of the time. Other analysts give estimates ranging from 50% to 80%. It is the most often underestimated part of the warehouse project! </li></ul>
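  36. 36. Eight Month Initial Deployment <ul><li>Step: Duration </li></ul><ul><ul><li>Secure, configure network: 1 day </li></ul></ul><ul><ul><li>Deploy physical “target” DB: 4 days </li></ul></ul><ul><ul><li>Learn/deploy ETL tool: 28 days </li></ul></ul><ul><ul><li>Choose/set up modeling tool: 21 days </li></ul></ul><ul><ul><li>Design initial data mart: 7 days </li></ul></ul><ul><ul><li>Design ETL processes: 28 days </li></ul></ul><ul><ul><li>Hook up OLAP tools: 7 days </li></ul></ul><ul><ul><li>Publicize, train, train: 21 days </li></ul></ul><ul><li>Step: Duration </li></ul><ul><ul><li>Begin educating decision makers: 21 days </li></ul></ul><ul><ul><li>Collect requirements: 14 days </li></ul></ul><ul><ul><li>Decide general DW design: 7 days </li></ul></ul><ul><ul><li>Determine budget: 3 days </li></ul></ul><ul><ul><li>Identify project roles: 1 day </li></ul></ul><ul><ul><li>Eval/choose ETL tool: 21 days </li></ul></ul><ul><ul><li>Eval/choose physical DB: 14 days </li></ul></ul><ul><ul><li>Spec/order, configure server: 20 days </li></ul></ul>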
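The step durations on this slide can be added up with a short script. Two assumptions are built in and are not stated on the slide: that the steps run strictly one after another with no overlap, and that a step counts as ETL-related simply because "ETL" appears in its name. The slide also does not say whether the figures are calendar days or working days.

```python
# Step durations in days, copied from the slide in the order they appear.
SCHEDULE = [
    ("Secure, configure network", 1),
    ("Deploy physical target DB", 4),
    ("Learn/deploy ETL tool", 28),
    ("Choose/set up modeling tool", 21),
    ("Design initial data mart", 7),
    ("Design ETL processes", 28),
    ("Hook up OLAP tools", 7),
    ("Publicize, train, train", 21),
    ("Begin educating decision makers", 21),
    ("Collect requirements", 14),
    ("Decide general DW design", 7),
    ("Determine budget", 3),
    ("Identify project roles", 1),
    ("Eval/choose ETL tool", 21),
    ("Eval/choose physical DB", 14),
    ("Spec/order, configure server", 20),
]


def total_days(schedule):
    """Total duration, assuming the steps never overlap."""
    return sum(days for _, days in schedule)


def etl_days(schedule):
    """Days in steps whose name mentions ETL (a crude, name-based filter)."""
    return sum(days for step, days in schedule if "ETL" in step)


if __name__ == "__main__":
    total = total_days(SCHEDULE)
    etl = etl_days(SCHEDULE)
    print(f"Total: {total} days; steps naming ETL: {etl} days ({etl / total:.0%})")
```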
  37. 37. Conclusion <ul><li>Information is held in transactional systems </li></ul><ul><ul><li>But transactional systems are complex </li></ul></ul><ul><ul><li>They don’t talk to each other well; each is a silo </li></ul></ul><ul><ul><li>They require specially trained people to report off of </li></ul></ul><ul><li>For normal people to explore institutional data, data in transactional systems needs to be </li></ul><ul><ul><li>Denormalized into star schemas </li></ul></ul><ul><ul><li>Moved to a system optimized for analysis </li></ul></ul><ul><ul><li>Merged into a unified whole in a data warehouse </li></ul></ul><ul><li>Note: This process must be led by “customers” </li></ul><ul><ul><li>Yes, IT people must build the infrastructure </li></ul></ul><ul><ul><li>But IT people aren’t the main customers </li></ul></ul><ul><li>So who are the customers? </li></ul><ul><ul><li>Admissions officers trying to make good admission decisions </li></ul></ul><ul><ul><li>Student counselors trying to find/help students at risk </li></ul></ul><ul><ul><li>Development officers raising funds that support the College </li></ul></ul><ul><ul><li>Alumni affairs people trying to manage volunteers </li></ul></ul><ul><ul><li>Faculty deans trying to right-size departments </li></ul></ul><ul><ul><li>IT people managing software/hardware assets, etc…. </li></ul></ul>
