There’s a growing torrent of data. In a previous webinar we talked about three kinds of things happening in BI.

High volume of data sources: datasets are getting larger and larger, and data arrives so fast that users need high-performance systems.

Operational analysis: with this data explosion comes the use of analysis not only in the traditional BI sense, but also in operations. Business units need simple analytics on transactions that happened, for example, in the last second or last minute.

Massive datasets from traditional BI: at the same time, we still need to do traditional BI, running historical analysis over massive datasets, which can take days.

Big Data is everywhere, and it requires processing massive datasets fast and in real time. For example, a telco may want to use its CDR records to respond to service calls and complaints quickly and thereby improve its service assurance; operations may want to use such data for real-time authorization and/or for fraud detection and analysis. Retailers may want to do real-time inventory and margin optimization. Banks may want to capitalize on their Exadata systems to perform risk analysis faster and better, so as to reveal more opportunities. Healthcare providers may want to efficiently parse their patient information to understand the most effective treatments and how to administer them efficiently.
There’s a sea of information, but we still cannot make out what’s actually happening, because of one issue.
And what is that issue? SPEED. Slow query response times are the biggest issue in BI. The faster the query response, the more business benefit is reported, and the more likely business goals will be achieved.

Some survey results: The Data Warehouse Institute (TDWI) Best Practices Report surveyed members on what would eventually drive them to replace their current data warehouse platform, and poor query performance was the number one reason. BI Survey 8, which included a few thousand organizations around the world, also cited query performance as a deterrent to using BI. Ralph Kimball likewise reports his top three issues, and there’s speed again. He also talks about costs, which include the cost of implementation, the cost of hardware, the cost of support, and the cost of adapting to changes in BI. Things change quickly from the users’ perspective, so how do we deal with that in BI? What kind of rework is required?
Requirements change. In fact, in a Forrester survey, 70 percent of respondents said their requirements change on a monthly, daily, or even hourly basis. What’s more, 22 percent of respondents believe their requirements change so frequently that it becomes difficult for traditional BI applications to “keep up.”
Speed is a major challenge in BI. Per Gartner, nearly 70% of data warehouse and BI implementations experience performance-constrained issues of various types.
Here’s a typical BI and reporting architecture. You will usually have several data sources: your ERP systems, your CRM systems, your sales and POS systems, your existing data warehouses, and other sources. Because of the data explosion, reporting can experience performance bottlenecks.

And with this data explosion comes the use of analysis not only in the traditional BI sense, but also in operations. Business users require real-time, simple analytics; as data flows in fast, users want to get insights fast as well. For fraud management, for example, chances are you will want to be alerted immediately if there’s a possible case of fraud, without it going through the typical data warehousing, churning, and BI cycle. Business units need simple analytics on transactions that happened, say, in the last second or last minute.

But how do we achieve that if we’re having query issues? How, in fact, do we empower operational analytics when we still have to wait for IT to run batch report jobs? In one organization I worked for, every month-end the reporting server would painfully slow down and hang because of users’ ad hoc queries, and our technical department had to reallocate CPU during these periods. Some users were resigned to just waiting for the reports from the weekly batch jobs, which might not be up to date, but what could they do? Users cannot wait an hour to get a report online or ad hoc. It’s a pity: we want to empower our users to be more self-sufficient, but one hour for a user to get an ad hoc report is not acceptable performance. Performance is an issue.
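To make “analytics on the last minute of transactions” concrete, here is a minimal sketch of the kind of operational query those business units are asking for: a sliding one-minute window over a transaction stream that flags a card the moment its spend in that window exceeds a threshold. The `SlidingSpendMonitor` class, the threshold, and the window length are illustrative assumptions, not part of any particular BI product.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 60         # "transactions in the last minute" (assumed window)
ALERT_THRESHOLD = 10_000.0  # spend limit per card per window (assumed)

class SlidingSpendMonitor:
    """Tracks per-card spend over a sliding time window and flags spikes."""

    def __init__(self, window=WINDOW_SECONDS, threshold=ALERT_THRESHOLD):
        self.window = window
        self.threshold = threshold
        self.events = defaultdict(deque)   # card_id -> deque of (ts, amount)
        self.totals = defaultdict(float)   # card_id -> spend inside the window

    def observe(self, card_id, ts, amount):
        """Ingest one transaction; return True if the card trips the alert."""
        q = self.events[card_id]
        q.append((ts, amount))
        self.totals[card_id] += amount
        # Evict transactions that have slid out of the window.
        while q and q[0][0] <= ts - self.window:
            _, old_amount = q.popleft()
            self.totals[card_id] -= old_amount
        return self.totals[card_id] > self.threshold

monitor = SlidingSpendMonitor()
print(monitor.observe("card-42", ts=0, amount=6_000))   # within limit -> False
print(monitor.observe("card-42", ts=30, amount=5_000))  # 11,000 in 60s -> True
print(monitor.observe("card-42", ts=90, amount=1_000))  # old txns evicted -> False
```

The point of the sketch is that the query the business asks for is a cheap windowed aggregate, not a full warehouse scan; a real deployment would host this logic in a stream processor rather than in-process memory.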
What can we do? There are a couple of approaches: speed up the hardware, or take what you have now and work more quickly and more smartly in that environment.
On hardware: every year databases show incremental improvements, as shown in this report. It’s true that there are improvements in database performance, but a lot of that is because of incremental improvements in hardware as well.
Here’s a smarter approach. Let’s go back to that TPC-H benchmark. See that blue bar circled in red? Gartner calls it a game-changing technology, and Robin Bloor says it’s four years ahead of its competition. The TPC results show 68% better price/performance than the other databases. Billy will tell you more about this.
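For readers unfamiliar with the metric behind that claim: TPC-H price/performance is reported as Price/QphH, total system cost divided by the composite queries-per-hour score, so lower is better. The figures below are made-up illustrations of how a “68% better” comparison is computed, not the actual benchmark numbers.

```python
def price_per_qphh(system_cost_usd, qphh):
    """TPC-H price/performance: cost divided by queries per hour (lower is better)."""
    return system_cost_usd / qphh

# Hypothetical systems with identical throughput (illustrative numbers only,
# not real TPC-H results).
incumbent = price_per_qphh(500_000, 100_000)   # 5.00 USD per QphH
challenger = price_per_qphh(160_000, 100_000)  # 1.60 USD per QphH

improvement = 1 - challenger / incumbent       # fraction cheaper per unit of work
print(f"{improvement:.0%} better price/performance")
```

With these assumed figures the challenger delivers the same work at 32% of the cost, i.e. a 68% better Price/QphH.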
Query at the Speed of Thought
Query at the Speed of Thought! 20 Oct, 11am–12pm Singapore time / 2pm Sydney time
Big Data
We’re drowning in a sea of data
• High volume of data sources
• Operational analysis
• Massive data sets from traditional BI
Big Data Is Everywhere
Data Source | Fast Operations | Real-time Analytics
Real-time markets | Write/index all trades, store tick data | Show consolidated risk across traders
Call initiation request | Real-time authorization | Fraud detection/analysis
Inbound HTTP requests | Hit logging, analysis, alerting | Reporting hot-spots / site-specific activity
Online game | Rank scores: defined intervals, player “bests” | Leaderboard lookups
Website or device | Check/update balance, serve ad | Report live ad and click-thrus by device
Sensor scan | Package location updates | Package status, lost shipment, package rerouting
The Single Biggest Issue in BI Today: SPEED
1. Poor Query Response (TDWI Q4 2009 Best Practices Report)
1. Slow Query Performance
1. Speed, Costs & Irrelevance (Ralph Kimball)
Requirements Change Fast
Forrester Research – Q1 2010
70%: Survey respondents saying their “requirements change on a monthly, daily or even hourly basis.”
49%: “Proportion of the cases when BI requirements cannot be fulfilled by a canned, structured production report and need free-form exploration and analysis.”
“Gartner clients increasingly report performance-constrained data warehouses during inquiries. Based on these inquiries, we estimate that nearly 70% of data warehouses experience performance-constrained issues of various types.”
Magic Quadrant for Data Warehouse Database Management Systems, Gartner Group – Jan 2010
Reporting / BI Architecture
Sources (ERP, CRM, SCM, Legacy, OLTP) → ETL → Enterprise Data Warehouse → BI / Reporting Applications → End-Users
QUERY ISSUES (between the data warehouse and the reporting applications)
What can we do?
Faster Hardware? Smarter Software?
TPC-H Benchmark Results – Because of Faster Hardware?
Smarter Approach
“Game-Changing Technology” (Don Feinberg, Gartner Group)
“This inevitably puts VectorWise 4 years ahead of the competition in terms of performance, and it will remain 4 years ahead until some competitor finds a way to catch up at a software level. This is unprecedented.” (Robin Bloor, Analyst)
68% Better Price/Performance: Price/QphH, TPC-H@100GB