Case study: Real-time OLAP Cubes

  1. Case study: Quasi real-time OLAP cubes
     by Ziemowit Jankowski, Database Architect
  2. OLAP Cubes - what are they?
     ● Used to quickly analyze and retrieve data from different perspectives
     ● Numeric data
     ● Structured data:
        o can be represented as numeric values (or sets thereof) accessed by a composite key
        o each part of the composite key belongs to a well-defined set of values
     ● Facts = numeric values
     ● Dimensions = parts of the composite key
     ● Source = usually a star or snowflake schema in a relational DB (other sources possible)
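To make the composite-key idea concrete, here is a minimal Python sketch in which a cube is simply a mapping from a composite key to a numeric fact, and every key part is validated against its dimension's set of values. The dimension names and values (`product`, `district`, `date`) are hypothetical, chosen only for illustration; a real OLAP engine such as Analysis Services stores and indexes cells very differently.

```python
from typing import Dict, Tuple

# Hypothetical dimensions: every part of the composite key comes from a well-defined set of values.
DIMENSIONS = {
    "product": {"A", "B"},
    "district": {"North", "South"},
    "date": {"2010-01-01", "2010-01-02"},
}
DIM_ORDER = ("product", "district", "date")

Key = Tuple[str, str, str]


def make_key(product: str, district: str, date: str) -> Key:
    """Build a composite key, rejecting values outside a dimension's set."""
    key = (product, district, date)
    for name, value in zip(DIM_ORDER, key):
        if value not in DIMENSIONS[name]:
            raise ValueError(f"{value!r} is not a member of dimension {name!r}")
    return key


# The cube itself: composite key -> numeric fact (here a hypothetical production outcome).
cube: Dict[Key, float] = {
    make_key("A", "North", "2010-01-01"): 120.0,
    make_key("B", "South", "2010-01-02"): 75.5,
}

print(cube[("A", "North", "2010-01-01")])  # 120.0
```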
  3. OLAP Cubes - data sources
     [Figure: star schema and snowflake schema examples built around a "Production outcome" fact table with Product, Date and District dimensions; the diagrams also show Sub-district, Year, Month and Day of week]
  4. OLAP facts and dimensions
     ● Every "cell" in an OLAP cube contains numeric data, a.k.a. "measures".
     ● Every "cell" may contain more than one measure, e.g. forecast and outcome.
     ● Every "cell" has a unique combination of dimension values.
     [Figure: cube with District and Product axes]
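The notion of a cell holding several measures can be sketched in Python as follows, assuming hypothetical fact rows of the form (district, product, measure, value). Rows with the same dimension values are summed into one cell, so every combination of dimension values identifies exactly one cell carrying both a forecast and an outcome measure. This illustrates the concept only; it is not how the case study loaded its cubes.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple

MEASURES = ("forecast", "outcome")


@dataclass
class Cell:
    """One cube cell holding two measures side by side."""
    forecast: float = 0.0
    outcome: float = 0.0


# Hypothetical fact rows: (district, product, measure name, value).
FactRow = Tuple[str, str, str, float]


def load_cells(rows: Iterable[FactRow]) -> Dict[Tuple[str, str], Cell]:
    """Aggregate fact rows so that each (district, product) combination maps to exactly one cell."""
    cells: Dict[Tuple[str, str], Cell] = defaultdict(Cell)
    for district, product, measure, value in rows:
        if measure not in MEASURES:
            raise ValueError(f"unknown measure {measure!r}")
        cell = cells[(district, product)]
        # Rows that land on the same cell and measure are summed.
        setattr(cell, measure, getattr(cell, measure) + value)
    return dict(cells)


rows = [
    ("North", "A", "outcome", 100.0),
    ("North", "A", "outcome", 20.0),
    ("North", "A", "forecast", 110.0),
    ("South", "B", "forecast", 80.0),
]
cells = load_cells(rows)
print(cells[("North", "A")])  # Cell(forecast=110.0, outcome=120.0)
```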
  5. OLAP Cubes - operations
     ● Slice = choose the values corresponding to ONE value on one or more dimensions
     ● Dice = choose the values corresponding to one slice or a number of consecutive slices on more than two dimensions of the cube
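Slice and dice can be illustrated on a toy cube held in a Python dictionary keyed by (district, product, month). The data and helper names are hypothetical; the point is only that a slice fixes one value on a dimension, while a dice keeps a sub-cube by selecting sets of values (here a run of consecutive months) on several dimensions at once.

```python
from typing import Dict, Set, Tuple

DIMS = ("district", "product", "month")
Key = Tuple[str, str, int]  # hypothetical key: (district, product, month)

cube: Dict[Key, float] = {
    ("North", "A", 1): 10.0,
    ("North", "A", 2): 12.0,
    ("North", "B", 1): 5.0,
    ("South", "A", 1): 7.0,
    ("South", "B", 3): 9.0,
}


def slice_cube(cube: Dict[Key, float], dim: str, value) -> Dict[tuple, float]:
    """Slice: keep cells where ONE dimension has a single value; that dimension drops out of the key."""
    i = DIMS.index(dim)
    return {k[:i] + k[i + 1:]: v for k, v in cube.items() if k[i] == value}


def dice_cube(cube: Dict[Key, float], selection: Dict[str, Set]) -> Dict[Key, float]:
    """Dice: keep cells whose value on each selected dimension lies in the chosen set."""
    wanted = {DIMS.index(d): allowed for d, allowed in selection.items()}
    return {k: v for k, v in cube.items()
            if all(k[i] in allowed for i, allowed in wanted.items())}


north = slice_cube(cube, "district", "North")
# {('A', 1): 10.0, ('A', 2): 12.0, ('B', 1): 5.0}

# Consecutive slices (months 1-2) combined with selections on the other two dimensions.
sub = dice_cube(cube, {"district": {"North", "South"}, "product": {"A"}, "month": set(range(1, 3))})
# {('North', 'A', 1): 10.0, ('North', 'A', 2): 12.0, ('South', 'A', 1): 7.0}
```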
  6. OLAP Cubes - operations (cont'd)
     ● Drill down/up = choose lower/higher level details; used in the context of hierarchical dimensions
     ● Pivot = rotate the orientation of the data for reporting purposes
     ● Roll-up
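Roll-up and drill-down both move along a dimension hierarchy. The Python sketch below assumes a hypothetical Date hierarchy day → month → year and rolls daily outcome figures up to months and years by summing; drilling down is the reverse move, from a month cell back to the daily cells it was built from. A two-axis pivot is shown as a simple swap of the key parts. All names and figures are illustrative.

```python
from collections import defaultdict
from datetime import date
from typing import Callable, Dict, Hashable, Tuple

# Hypothetical daily outcome per (district, day): the finest level of the Date hierarchy.
daily: Dict[Tuple[str, date], float] = {
    ("North", date(2010, 1, 30)): 4.0,
    ("North", date(2010, 1, 31)): 6.0,
    ("North", date(2010, 2, 1)): 5.0,
    ("South", date(2010, 2, 1)): 3.0,
}


def roll_up(cube: Dict[Tuple[str, date], float],
            level: Callable[[date], Hashable]) -> Dict[Tuple[str, Hashable], float]:
    """Roll up the Date dimension to a coarser level by summing the finer cells."""
    out: Dict[Tuple[str, Hashable], float] = defaultdict(float)
    for (district, day), value in cube.items():
        out[(district, level(day))] += value
    return dict(out)


def pivot(cube: Dict[Tuple[Hashable, Hashable], float]) -> Dict[Tuple[Hashable, Hashable], float]:
    """Pivot a two-axis view: rows become columns and vice versa."""
    return {(b, a): v for (a, b), v in cube.items()}


by_month = roll_up(daily, lambda d: (d.year, d.month))
# {('North', (2010, 1)): 10.0, ('North', (2010, 2)): 5.0, ('South', (2010, 2)): 3.0}
by_year = roll_up(daily, lambda d: d.year)
# {('North', 2010): 15.0, ('South', 2010): 3.0}
by_year_pivoted = pivot(by_year)
# {(2010, 'North'): 15.0, (2010, 'South'): 3.0}
```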
  7. OLAP Cubes - refresh methods
     ● Incremental:
        o possible when cubes grow "outwards", i.e. no "scattered" changes in data
        o only delta data need to be read
        o refresh may be fast if the delta is small
     ● Full:
        o possible for all cubes, even when changes are "scattered" all over the data
        o all data need to be re-read with every refresh
        o refresh may take a long time (hours)
     [Figure: cube data along a time axis, with scattered "Updates" in one diagram and appended "New data" in the other]
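The trade-off between the two refresh methods can be shown with a small Python simulation, assuming hypothetical source rows of (day number, value) and a watermark marking the last day already loaded. An incremental refresh reads only rows beyond the watermark, which is correct while the data grows outwards, but it misses a scattered change to an older day that a full refresh picks up.

```python
from typing import Dict, List, Tuple

Row = Tuple[int, float]  # hypothetical source row: (day number, value)


def full_refresh(source: List[Row]) -> Dict[int, float]:
    """Full processing: re-read every source row and rebuild the totals per day."""
    totals: Dict[int, float] = {}
    for day, value in source:
        totals[day] = totals.get(day, 0.0) + value
    return totals


def incremental_refresh(totals: Dict[int, float], source: List[Row], watermark: int) -> Dict[int, float]:
    """Incremental processing: read only the delta, i.e. rows for days after the watermark."""
    updated = dict(totals)
    for day, value in source:
        if day > watermark:
            updated[day] = updated.get(day, 0.0) + value
    return updated


source = [(1, 10.0), (2, 20.0)]
cube = full_refresh(source)  # {1: 10.0, 2: 20.0}

# The cube grows "outwards": a new day arrives and reading the delta is enough.
source.append((3, 30.0))
cube = incremental_refresh(cube, source, watermark=2)
assert cube == full_refresh(source)

# A "scattered" change: day 1 is adjusted manually long after the fact.
source[0] = (1, 12.0)
stale = incremental_refresh(cube, source, watermark=3)
print(stale[1], full_refresh(source)[1])  # 10.0 12.0
```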
  8. The situation at hand
     ● Business operating on a 24*6 basis (Sun-Fri)
     ● Events from production systems are aggregated into flows and production units
     ● Production figures may be adjusted manually long after the production date
     ● Daily production figures are the basis for daily forecasts, using the simplified formula:
        forecast(yearX) = production(yearX-1) * trend(yearX) + manualFcastAdjustm
     ● Adjustments in production figures will alter forecast figures
     ● Outcome and forecast should be stored in MS OLAP cubes, as the software architecture demands
     ● The system should simplify comparisons between forecast and outcome figures
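The simplified forecast formula can be written as a small Python function. The (year, "MM-DD") key used to match the same day across years and the sample figures are assumptions for illustration, and `manual_adjustment` stands in for the formula's `manualFcastAdjustm` term. The example also shows why late manual adjustments to production figures ripple into the forecast.

```python
from typing import Dict, Tuple

DayKey = Tuple[int, str]  # hypothetical key: (year, "MM-DD") to match the same day across years


def forecast(day: DayKey,
             production: Dict[DayKey, float],
             trend: Dict[int, float],
             manual_adjustment: Dict[DayKey, float]) -> float:
    """forecast(yearX) = production(yearX-1) * trend(yearX) + manual forecast adjustment."""
    year, mmdd = day
    return production[(year - 1, mmdd)] * trend[year] + manual_adjustment.get(day, 0.0)


production = {(2009, "03-15"): 200.0}
trend = {2010: 1.25}
adjustment = {(2010, "03-15"): -4.0}

print(forecast((2010, "03-15"), production, trend, adjustment))  # 246.0

# A late manual correction of the 2009 production figure changes the 2010 forecast as well.
production[(2009, "03-15")] = 220.0
print(forecast((2010, "03-15"), production, trend, adjustment))  # 271.0
```

This dependency is what forces forecast cells to be refreshed whenever an old production figure is corrected.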
  9. Software
     ● Source of data:
        o relational database
        o Oracle 10g database
        o extensive use of PL/SQL in the database
     ● Destination of data:
        o OLAP cubes - MS SQL Server Analysis Services (versions 2005 and 2008)
     ● Other software:
        o MS SQL Server database
  10. QUESTION: Can we get almost real-time reports from MS OLAP cubes?
      ANSWER: YES! The answer lies in "cube partitioning".
  11. Cube partitioning - the basics
      ● Cube partitions may be updated independently
      ● Cube partitions must not overlap (otherwise duplicate values may occur)
      ● Time is a good dimension to partition on
      [Figure: cube partitioned along the Time axis]
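Because time is the partitioning dimension, one simple way to reason about overlap is to give each partition a half-open date range [start, end) and compare neighbouring ranges after sorting by start date; if no neighbouring pair overlaps, no pair overlaps at all. The Python sketch below also reports gaps, since missing ranges are the mirror-image problem. Partition names and dates are hypothetical.

```python
from datetime import date
from typing import List, Tuple

Partition = Tuple[str, date, date]  # hypothetical: (name, start inclusive, end exclusive)


def find_problems(parts: List[Partition]) -> List[str]:
    """Compare neighbouring time partitions: an overlap means duplicate values, a gap means missing ones."""
    problems = []
    ordered = sorted(parts, key=lambda p: p[1])
    for (name_a, _, end_a), (name_b, start_b, _) in zip(ordered, ordered[1:]):
        if start_b < end_a:
            problems.append(f"overlap: {name_a} / {name_b}")
        elif start_b > end_a:
            problems.append(f"gap: {name_a} / {name_b}")
    return problems


good = [
    ("History", date(2000, 1, 1), date(2009, 1, 1)),
    ("Last year", date(2009, 1, 1), date(2010, 1, 1)),
    ("Now", date(2010, 1, 1), date(2011, 1, 1)),
]
print(find_problems(good))  # []

bad = good[:2] + [("Now", date(2009, 12, 25), date(2011, 1, 1))]
print(find_problems(bad))  # ['overlap: Last year / Now']
```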
  12. MS OLAP cube partitioning - details
      ● Every cube partition has its own query that defines the data set fetched from the data source
      ● The SQL statements define the non-overlapping data sets
      [Figure: data source (relational DB) feeding a partitioned cube through SQL queries a, b, c and d, one per partition]
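One way to make the per-partition SQL statements non-overlapping by construction is to generate them from shared boundaries, using `>=` for a partition's start and `<` for its end, so that a row on a boundary date falls into exactly one partition. The Python helper below is a sketch of that idea; the table `production_outcome` and the column `production_date` are hypothetical names, and Oracle's `DATE 'YYYY-MM-DD'` literal syntax is assumed for the generated text.

```python
from datetime import date
from typing import Optional


def partition_query(start: Optional[date], end: Optional[date]) -> str:
    """Source query for one partition over the half-open range [start, end); None means unbounded."""
    conditions = []
    if start is not None:
        conditions.append(f"production_date >= DATE '{start.isoformat()}'")
    if end is not None:
        conditions.append(f"production_date < DATE '{end.isoformat()}'")
    where = " AND ".join(conditions) or "1 = 1"
    # Hypothetical table name; a real partition query would list the needed columns.
    return f"SELECT * FROM production_outcome WHERE {where}"


boundary = date(2010, 1, 1)
history_sql = partition_query(None, boundary)
current_sql = partition_query(boundary, None)
print(history_sql)  # SELECT * FROM production_outcome WHERE production_date < DATE '2010-01-01'
print(current_sql)  # SELECT * FROM production_outcome WHERE production_date >= DATE '2010-01-01'
```

The half-open form also copes with Oracle DATE values that carry a time of day, which a `BETWEEN` on whole dates would not.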
  13. MS OLAP cube partitioning - details (cont'd)
      [Figure: as above, with the data source split into three dimension tables ("Dim") and a fact table ("Facts"), labelled "Small amount of data" and "Large amount of data"]
  14. How to partition? - theory
      ● Partitions with different lengths and different update frequencies:
         o current data = very small partition, very short update times, updated often
         o "not very current" data = somewhat larger partition, longer update times, updated less often
         o historical data = large partition, long update times, updated seldom
      ● The 24x6 operation provides the window for the "seldom" updates
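The tiering idea can be sketched as a table of refresh intervals plus a check of which tiers are due. The roughly 3-minute interval for current data follows the "every 2nd or 3rd minute" figure on slide 24, and the 7-day interval follows the weekly schedule of approach three; the 1-hour interval for "not very current" data is purely a hypothetical placeholder.

```python
from datetime import datetime, timedelta
from typing import Dict, List

# Refresh interval per partition tier; the middle value is a hypothetical placeholder.
REFRESH_INTERVAL: Dict[str, timedelta] = {
    "current": timedelta(minutes=3),          # very small partition, updated often
    "not very current": timedelta(hours=1),   # hypothetical
    "historical": timedelta(days=7),          # large partition, updated seldom
}


def due_partitions(now: datetime, last_processed: Dict[str, datetime]) -> List[str]:
    """Return the tiers whose refresh interval has elapsed since they were last processed."""
    return [tier for tier, interval in REFRESH_INTERVAL.items()
            if now - last_processed[tier] >= interval]


last = {
    "current": datetime(2010, 3, 1, 12, 0),
    "not very current": datetime(2010, 3, 1, 11, 30),
    "historical": datetime(2010, 2, 27, 6, 0),
}
print(due_partitions(datetime(2010, 3, 1, 12, 4), last))  # ['current']
```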
  15. How to partition? - theory (cont'd)
      ● One cube for both forecast and outcome
      [Figure: timeline with partitions History, Last year, Last month, Now and One year into the future, carrying the Forecast and Outcome measures]
  16. Solution - approach one
      Decisions:
      ● Cubes partitioned on date boundaries
      ● MOLAP cubes (for better query performance)
      ● Use SSIS to populate cubes:
         o dimensions populated by incremental processing
         o facts populated by full processing
         o jobs for historical data must be run after midnight to compensate for the date change
      Actions:
      ● Cubes built
      ● SSIS deployed inside SQL Server (and not in the file system)
      ● SSIS set up as scheduled database jobs
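The weakness of approach one can be reproduced in a few lines of Python, assuming partition boundaries computed as the current date minus a fixed number of days (the 30-day width is hypothetical). Once the date changes at midnight, the cheap "Now" partition moves on immediately, while the expensive partition processed the day before still ends at the old boundary, so the previous day's outcome data is covered by neither until that partition is re-processed.

```python
from datetime import date, timedelta
from typing import Tuple

Range = Tuple[date, date]  # half-open [start, end)

LAST_MONTH_DAYS = 30  # hypothetical partition width


def now_range(today: date) -> Range:
    """'Now' partition relative to the current date."""
    return (today, today + timedelta(days=1))


def last_month_range(today: date) -> Range:
    """'Last month' partition: a fixed number of days before the current date."""
    return (today - timedelta(days=LAST_MONTH_DAYS), today)


def covered(day: date, r: Range) -> bool:
    return r[0] <= day < r[1]


# "Last month" was processed (expensively) on March 1st ...
processed_last_month = last_month_range(date(2010, 3, 1))
# ... but after midnight the cheap "Now" partition already follows the new date,
now_after_midnight = now_range(date(2010, 3, 2))
# so March 1st falls between the two until "last month" is re-processed.
day = date(2010, 3, 1)
print(covered(day, processed_last_month), covered(day, now_after_midnight))  # False False
```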
  17. Did it work? No!
      Malfunctions:
      ● Simultaneous updates of cube partitions could lead to deadlocks
      ● Deadlocks left cube partitions in an unprocessed state
      Amendment:
      ● Cube partitions must not be updated simultaneously
  18. Solution - approach two
      Decisions:
      ● Cube processing must be ONE partition at a time
      ● Scheduling done by an SSIS "super package":
         o a SQL Server table contains approximate frequencies and package names
         o the "super package" executes SSIS packages as indicated by the table
      Actions:
      ● Scheduling table created
      ● "Super package" created to be self-modifying
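The one-partition-at-a-time rule of approach two can be sketched as a sequential loop over a scheduling table. The `ScheduleRow` fields, the package names and the `execute` callable are hypothetical stand-ins: in the case study the super package launched SSIS packages, which is not modelled here. Because each `execute` call is expected to return only after its package has finished, no two partitions are processed at the same time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, List, Optional


@dataclass
class ScheduleRow:
    """One row of a hypothetical scheduling table: package name and approximate frequency."""
    package: str
    frequency: timedelta
    last_run: Optional[datetime] = None


def run_due_packages(table: List[ScheduleRow], now: datetime,
                     execute: Callable[[str], None]) -> List[str]:
    """Run every due package strictly one after another, never two at the same time."""
    ran: List[str] = []
    for row in table:
        if row.last_run is None or now - row.last_run >= row.frequency:
            execute(row.package)  # assumed to return only when the package has finished
            row.last_run = now
            ran.append(row.package)
    return ran


table = [
    ScheduleRow("process_now_partition", timedelta(minutes=3)),
    ScheduleRow("process_history_partition", timedelta(days=7),
                last_run=datetime(2010, 2, 27, 6, 0)),
]
executed: List[str] = []
run_due_packages(table, datetime(2010, 3, 1, 12, 0), executed.append)
print(executed)  # ['process_now_partition']
```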
  19. Did it work? Not really!
      Malfunctions:
      ● Historical data had to be updated after midnight, and real-time updates for the "Now" partition were postponed. This was done to avoid "gaps" in outcome data and "overlaps" in forecast data.
      ● Real-time updates ended soon after midnight and were resumed a few hours later. (That was NOT acceptable.)
      Amendment:
      ● Re-think!
  20. Solution - approach three
      Decisions:
      ● Take advantage of the 6*24 cycle (as opposed to 7*24)
      ● Switch dates on Saturdays only:
         o the "Now" partition had to stretch from Saturday to Saturday
         o all other partitions had to stretch from a Saturday to another Saturday
      ● Re-process all time-consuming partitions on Saturday, after the date switch
  21. Solution - approach three (cont'd)
      Actions:
      ● Create logic in the Oracle database to do date calculations "modulo week", i.e. based on Saturday; the logic is implemented as a function.
      ● Rewrite the SQL statements for the cube partitions so that they use the Oracle function (as above) instead of the current date +/- a given number of days.
      ● Reschedule the time-consuming updates so that they run every 7th day.
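The "modulo week" logic was implemented as an Oracle function whose code is not shown; the Python sketch below illustrates the same idea under that assumption. `week_start` returns the most recent Saturday on or before a date, and partition boundaries are derived from it, so they stay fixed from Sunday to Friday and move only on the idle Saturday. The function names and the `weeks` parameter are hypothetical.

```python
from datetime import date, timedelta
from typing import Tuple

SATURDAY = 5  # date.weekday(): Monday == 0 ... Sunday == 6


def week_start(day: date) -> date:
    """The most recent Saturday on or before `day`: date arithmetic 'modulo week'."""
    return day - timedelta(days=(day.weekday() - SATURDAY) % 7)


def now_partition(day: date) -> Tuple[date, date]:
    """'Now' partition: from this week's Saturday up to (not including) next Saturday."""
    start = week_start(day)
    return start, start + timedelta(days=7)


def last_weeks_partition(day: date, weeks: int) -> Tuple[date, date]:
    """A hypothetical older partition covering `weeks` whole weeks before the 'Now' partition."""
    start = week_start(day)
    return start - timedelta(weeks=weeks), start


# Every day of the Sunday-Friday business week yields the same boundaries ...
business_week = [date(2010, 2, 28) + timedelta(days=i) for i in range(6)]
assert {now_partition(d) for d in business_week} == {(date(2010, 2, 27), date(2010, 3, 6))}
# ... and they only move on Saturday, when there is no production.
print(now_partition(date(2010, 3, 6)))  # (datetime.date(2010, 3, 6), datetime.date(2010, 3, 13))
```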
  22. Did it work? Yes!
      Malfunctions:
      ● None, really.
  23. Lessons learned
      ● It is possible to build real-time OLAP cubes with MS technology
      ● It is possible to make the partitions self-maintaining in terms of partition boundaries
      ● The concept needs careful engineering, as there are pitfalls along the way
  24. Omitted details
      Some details have been omitted:
      ● the quasi real-time updates are scheduled to occur every 2nd or 3rd minute
      ● scheduling is not exact: the super job keeps track of what is to be run and when, and executes SSIS packages based on their "scheduled-to-run" state, their priority and a few other criteria
      ● the source of data is not a proper star schema; it is rather an emulation of facts and dimensions by means of data tables and views in Oracle
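The scheduling behaviour summarised on the last slide (run what is scheduled to run, guided by priority) might look like the following Python sketch. The `Job` fields, the lower-number-wins priority convention and the tie-break on the oldest `next_run` are assumptions; the "few other criteria" mentioned on the slide are not modelled.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Job:
    """A hypothetical entry tracked by the super job."""
    package: str
    priority: int       # lower number = more urgent (assumed convention)
    next_run: datetime  # from this moment on the job is "scheduled-to-run"


def pick_next(jobs: List[Job], now: datetime) -> Optional[Job]:
    """Among jobs that are scheduled to run, pick the most urgent; ties go to the longest-waiting job."""
    ready = [job for job in jobs if job.next_run <= now]
    if not ready:
        return None
    return min(ready, key=lambda job: (job.priority, job.next_run))


jobs = [
    Job("history_partition", priority=3, next_run=datetime(2010, 3, 6, 6, 0)),
    Job("last_month_partition", priority=2, next_run=datetime(2010, 3, 1, 11, 0)),
    Job("now_partition", priority=1, next_run=datetime(2010, 3, 1, 12, 3)),
]
print(pick_next(jobs, datetime(2010, 3, 1, 12, 5)).package)  # now_partition
print(pick_next(jobs, datetime(2010, 3, 1, 12, 1)).package)  # last_month_partition
```

Because the super job only ever picks the next job when the previous one is done, start times drift by a few minutes, which matches the remark that scheduling is not exact.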
