Successfully reported this slideshow.
Your SlideShare is downloading. ×

50 Shades Of Fail Geneva

Loading in …3

Check these out next

1 of 76 Ad

More Related Content

Similar to 50 Shades Of Fail Geneva (20)

Recently uploaded (20)


50 Shades Of Fail Geneva

  1. 1. @Nephentur freenode | obihackers slide 1 50 Shades of #Fail Analytics Worst Practices in Real Life
  2. 2. @Nephentur freenode | obihackers slide 2 • Oracle ACE Director Business Analytics • Hacking OBI since 2001 (nQuire + Peregrin aquisitions by Siebel) • Speaker at OpenWorld, KScope, regional Oracle User Groups... • Part-time blogger on Analytics, BI, DWH ( • Full-time IRC (freenode | #obihackers) and OTN addict • Oracle Analytics trainer for Oracle University since 2006 Who am I?
  3. 3. @Nephentur freenode | obihackers slide 3 Why this presentation? • BI / Analytics / Data Science has matured • Considered as being a commodity know-how • As a result it’s often done worse than ever… • …especially compared to when it was a niche skill • Hence: Back to basics presentation series • And: Worst Practices means someone else ran into the wall!
  4. 4. @Nephentur freenode | obihackers slide 4 Basics No tool or technology can replace comprehension and understanding.
  5. 5. @Nephentur freenode | obihackers slide 5 Basics Just because someone states a “requirement” does not mean it is correct or worth doing.
  6. 6. @Nephentur freenode | obihackers slide 6 Basics Being a professional means understanding as well as questioning helping companies fulfill their needs – not wants.
  7. 7. @Nephentur freenode | obihackers slide 7 Where things go wrong • Data and logic • Front-end • System, lifecycle, admin etc.
  8. 8. @Nephentur freenode | obihackers slide 8 Where things • Data and logic • Front-end • System, lifecycle, admin etc.
  9. 9. @Nephentur freenode | obihackers slide 9 Data types Incorrect handling of data types Example? INT vs. DOUBLE Why? • “In our source we store it as an integer.” • Proper data type handling underestimated or simply ignored
  10. 10. @Nephentur freenode | obihackers slide 10 Data types Result? • (−)2’147’483’647 or hard error
  11. 11. @Nephentur freenode | obihackers slide 11 Data types
  12. 12. @Nephentur freenode | obihackers slide 12 Hint Never assume you know exactly how a data type will behave. A DOUBLE is not an INT just because it’s numeric. Think of the lowest common denominator when working with heterogeneous sources.
  13. 13. @Nephentur freenode | obihackers slide 13 Database specificities Randomly setting database features Why? • “Because you read it on a blog / forum“ Result? • Generated SQL changes sometimes drastically • Painful troubleshooting
  14. 14. @Nephentur freenode | obihackers slide 14 Database specificities
  15. 15. @Nephentur freenode | obihackers slide 15 Database specificities Someone thought hard about these!
  16. 16. @Nephentur freenode | obihackers slide 16 Database specificities Especially more specific ones people like to pick out of SRs / blog pposts and use out of context!
  17. 17. @Nephentur freenode | obihackers slide 17 Database specificities
  18. 18. @Nephentur freenode | obihackers slide 18 Hint Never just copy / paste things from a blog or white paper. Especially not if the document is called “Best Practices” or “Performance Tuning”!
  19. 19. @Nephentur freenode | obihackers slide 19 Hint Don’t fall into the trap of “I know this database so I will work with others the same way”. Again: source-agnostic tool means relational databases, XML, multi-dimensional, Hadoop, etc.
  20. 20. @Nephentur freenode | obihackers slide 20 Joins and mashups Data type conversion on columns used for join specifications and mashups (or similar “workarounds”)
  21. 21. @Nephentur freenode | obihackers slide 21 Joins and mashups
  22. 22. @Nephentur freenode | obihackers slide 22 Joins and mashups Why? • Existing data models are either not properly built and clean or were never meant to be used together for analytical purposes • Database developers can’t be bothered to change their table creation scripts • “Best practices” of how to create tables are organizational rules Result? • Bye-bye data source optimization, hello full table scans
  23. 23. @Nephentur freenode | obihackers slide 23 Joins and mashups
  24. 24. @Nephentur freenode | obihackers slide 24 Joins and mashups
  25. 25. @Nephentur freenode | obihackers slide 25 Hint Push back on inadequate or inappropriate physical design. “The tool should be able to do this” is not a justification. Do it right once physically vs do it “ok” 50’000 times logically.
  26. 26. @Nephentur freenode | obihackers slide 26 Aggregations Unwittingly returning wrong calculation results due to ignorance of the impact of pre-aggregation / post-aggregation Background? • Physical mapping calculations are executed pre-aggregation • Row-by-row
  27. 27. @Nephentur freenode | obihackers slide 27 Aggregations
  28. 28. @Nephentur freenode | obihackers slide 28 Aggregations
  29. 29. @Nephentur freenode | obihackers slide 29 Aggregations Unwittingly returning wrong calculation results due to ignorance of the impact of pre-aggregation / post-aggregation Background? • Derived logical columns are execute post-aggregation • Data set
  30. 30. @Nephentur freenode | obihackers slide 30 Aggregations
  31. 31. @Nephentur freenode | obihackers slide 31
  32. 32. @Nephentur freenode | obihackers slide 32 Aggregations Why? • Simply one of the most often misunderstood facts about aggregations • “It’s the same formula so it will do the same thing!” Result? • “Wrong” results depending on interpretation and expectation
  33. 33. @Nephentur freenode | obihackers slide 33 Hint Question things when you are handed a formula. In case of doubt just go the extra miles and prepare both.
  34. 34. @Nephentur freenode | obihackers slide 34 Mixing models • Not everything can be ETL’ed together • Not all data matches on all attributes and dimensionalities • Multi-fact models and non-conformed dimensionality are still crucial Problems: • Missing results from cross-star queries • Analyses simply won’t run due to front-end post-calculations failing • Wrong results when queries implicitly hit wrong facts
  35. 35. @Nephentur freenode | obihackers slide 35 Missing models – e.g. OAC Non-conformed dimensionality / LTS content levels
  36. 36. @Nephentur freenode | obihackers slide 36 Missing models – e.g. OAC Non-conformed dimensionality / LTS content levels
  37. 37. @Nephentur freenode | obihackers slide 37 Missing models – e.g. OAC Non-conformed dimensionality / LTS content levels
  38. 38. @Nephentur freenode | obihackers slide 38 Missing models – e.g. OAC Non-conformed dimensionality / LTS content levels
  39. 39. @Nephentur freenode | obihackers slide 39 Missing models – e.g. OAC
  40. 40. @Nephentur freenode | obihackers slide 40 Hint Try to be as precise as possible from the start. Going through a 5 fact, 30 dimension model post-construction is a recipe for disaster. A workaround is never a solution. You will believe me once you have effectively your first environment which combines detailed relational data with huge, sparse cubes, XML files and a Hadoop cluster.
  41. 41. @Nephentur freenode | obihackers slide 41 Dimensional hierarchies • Don’t forget your business users and ease of use • Some developers don’t see the point of content levels! (Don’t believe me? Just do a search StackOverflow.) • “We didn’t need that in OtherReportDrawingTool42.”
  42. 42. @Nephentur freenode | obihackers slide 42 BMM #4
  43. 43. @Nephentur freenode | obihackers slide 43 Hint Work invested here will pay off a hundred-fold. Be smart. Think LEGO! Think re-usable pieces. Forget cell-oriented “reporting tool” … aka Excel!
  44. 44. @Nephentur freenode | obihackers slide 44 BMM #4 Nay Yay
  45. 45. @Nephentur freenode | obihackers slide 45 Time dimensions • Business should be able to use time in a consistent way • Temporal calculations should be easy
  46. 46. @Nephentur freenode | obihackers slide 46 Time dimensions Again: someone actually thought about this
  47. 47. @Nephentur freenode | obihackers slide 47 Time dimensions So why should one go and do this 500 times?
  48. 48. @Nephentur freenode | obihackers slide 48 Hint Be smart and think analytical. Blindly reproducing Excel formulas isn’t being productive. The how you do something is at least as important as what it achieves.
  49. 49. @Nephentur freenode | obihackers slide 49 Hint Your clients are actually paying you to do this. Have the courtesy of delivering something of quality. Guess what: repeat customers and follow-up work because they liked things trump you coming back to fix rubbish. Anybody doing it nice and usable from day 1 will win the business’ heart and budgets.
  50. 50. @Nephentur freenode | obihackers slide 50 Where things go wrong • Data and logic • Front-end • System, lifecycle, admin etc.
  51. 51. @Nephentur freenode | obihackers slide 51 Wasteful construction
  52. 52. @Nephentur freenode | obihackers slide 52 Wasteful construction And here only 4 are excluded!
  53. 53. @Nephentur freenode | obihackers slide 53 Wasteful construction
  54. 54. @Nephentur freenode | obihackers slide 54 Analyses construction
  55. 55. @Nephentur freenode | obihackers slide 55 Null values
  56. 56. @Nephentur freenode | obihackers slide 56 Null values Why? • “We might miss something otherwise.” • “Fix” data quality and modelling issues Result? As usual: performance.
  57. 57. @Nephentur freenode | obihackers slide 57 Hint An analytics solution is not the place to fix your messed up data!
  58. 58. @Nephentur freenode | obihackers slide 58 Excel Excel everywhere Using analytical platforms as an Excel exporting tool Why? • Everyone has Excel • Data on their Excel sheets makes users feel secure • “BI systems never give you enough.” • “I must able to change the data.”
  59. 59. @Nephentur freenode | obihackers slide 59 Excel Excel everywhere Using OAC / OBI as an Excel exporting tool Result? • Data pushed DB -> OBIS -> OBIPS -> Browser -> Excel • Even worse for cloud roundtrips • Data multiplied in static form and open to change for everyone Supreme surprise: Even Oracle says this is not a good idea!
  60. 60. @Nephentur freenode | obihackers slide 60 Hint Never be afraid to reject data dump “requirements”. Focus on what are the actual needs. Focus on what your users are trying to achieve.
  61. 61. @Nephentur freenode | obihackers slide 61 Front-end usage Using OBI as an Excel exporting tool Why? • Everyone has Excel • Data on their Excel sheets makes users feel secure •“BI systems never give you enough.” •“I must able to change the data.” Result? • Data pushed DB -> OBIS -> OBIPS -> Browser -> Excel • Data multiplied in static form and open to change for everyone Multi-million row settings and hundreds of columns in instanceconfig.xml Why? See above Result? Taking down whole OBI farms when generating 5m row, 200 column PDFs based on an analysis with alternating row highlighting and conditional icons.
  62. 62. @Nephentur freenode | obihackers slide 62 Write-back Using analytical tools as a data entry tool Why? • “It is a web application and must support this”. • “We don’t want to buy or use another tool.” Result? • Have you ever heard of APEX? • Analytical write-back or scenario updates does NOT mean arbitrary data entry! •
  63. 63. @Nephentur freenode | obihackers slide 63 Where things go wrong • Data and logic • Front-end • System, lifecycle, admin etc.
  64. 64. @Nephentur freenode | obihackers slide 64 Cache Turning on caching for performance Why? • It’s a quick fix and an easy smoke screen to hide actual issues in the DB. Result? • Without proper cache purging and seeding strategy stale data is assured. • The ACTUAL problem for performance is never addressed and will persist!
  65. 65. @Nephentur freenode | obihackers slide 65 Data Quality Using your analytical tool to “solve” data quality issues Example? CAST, CASE WHEN, FILTER as smoke Why? As above plus: visible data quality issues = YOU are responsible Result? • Performance degradation • Data quality issues are hidden rather than solved • Additional errors when source data changes • Disappearing data when new source data outside the hardcoded cases • ELSE statement becomes the only possible savior (“Invalid conversion”)
  66. 66. @Nephentur freenode | obihackers slide 66 Source Control and Versioning Not using source control for all your artifact Why? Admin effort perceived as waste of time / non-productive Result? • Deployment pains when chasing files and configurations • Ever tried to go back to a certain state of an environment? • Source Control and Concurrent Development for OBIEE • Several tools exist - -
  67. 67. @Nephentur freenode | obihackers slide 67 Source Control and Versioning
  68. 68. @Nephentur freenode | obihackers slide 68 Usage Tracking • Trace and monitor usage • Who does what, when and how • Who accesses what, when and how
  69. 69. @Nephentur freenode | obihackers slide 69 Usage Tracking
  70. 70. @Nephentur freenode | obihackers slide 70 GDPR • Dangerously misunderstood / underestimated • Will become law 25th May 2018 • “Who can access which data why?” • “Which data gets transformed how?” – Think “data lineage on steroids”…
  71. 71. @Nephentur freenode | obihackers slide 71 GDPR for OAC and OBIEE Full SampleApp completely graphed out
  72. 72. @Nephentur freenode | obihackers slide 72 GDPR for OAC and OBIEE Full relationships for DB column “Name”
  73. 73. @Nephentur freenode | obihackers slide 73 GDPR for OAC and OBIEE Start from one user and find all security relationships
  74. 74. @Nephentur freenode | obihackers slide 74 GDPR for OAC and OBIEE Map out all pertinent security relationships for the given user
  75. 75. @Nephentur freenode | obihackers slide 75 Hint Graphing out your environment helps you for more than just EU law Security assessments and reviews become child’s play Impact analysis is instantaneous Just ask and we’ll follow up
  76. 76. @Nephentur freenode | obihackers slide 76