Big Data and BI Best Practices



Global Business Intelligence (BI) software vendor, Yellowfin, and Actian Corporation, pioneers of the record-breaking analytical database Vectorwise, will host a series of Big Data and BI Best Practices Webinars.

These are the slides from that presentation.

The Big Data & BI Best Practices Webinars and associated slides examine the phenomenal growth in business data and outline strategies for effectively, efficiently and quickly harnessing and exploring ‘Big Data’ for competitive advantage.

  • Point of slide – introduce presenters. Glen introduces himself, then introduces Jason.
  • Point of slide – introduce presenters. Glen introduces himself, then introduces Fred.
  • Point of slide – introduce our companies and why we can talk about this topic (some attendees will not have heard of us). A little bit about our two companies. Yellowfin – mention awards: #1 BI vendor in the global Wisdom of Crowds survey; #1 in Mobile BI by Dresner Advisory Services; #1 in Location Intelligence by Ventana Research. Actian – mention record-breaking benchmarks: broke the performance and price/performance TPC-H benchmarks by the largest margins ever recorded for every benchmark they have entered. And today we are going to talk about the best practices for Big Data and BI.
  • Why Big Data? Data is the new oil. A big opportunity.
  • Point of slide – establish how quickly data is growing. If you don’t have Big Data now, you might soon. Data only grows, and Big Data is growing exponentially. Why? Growth of existing data sources, with the increasing sophistication of computer tracking of shipments, sales, suppliers and customers, as well as e-mail and web traffic. Growth of new data sources and types, such as geospatial, social media comments, mobile, etc.
  • Point of slide – Communicate Big Data didn’t suddenly appear, but now technology exists to leverage it. We’ve always had big data, but now we have the tools and the cost has come down enough to harvest and make value from it. Why is Big Data a Big Deal Now…
  • Point of slide – don’t confuse your Big Data problem with Google’s. They are not the same. Not all Big Data is created equal, and Big Data is relative. Google, Facebook and Twitter are outliers in a class of their own. Their requirements are significantly different from those of large enterprises, let alone the typical enterprise or SME. And you don’t need petabytes of data to have a Big Data problem.
  • Point of slide – define Big Data, and what to look for to see if you have a Big Data problem. Gartner’s 3 V’s are probably the most accepted definition of Big Data because they address the pain points: Volume – people think terabytes or petabytes. Variety – structured and unstructured data such as… Velocity – includes fast query time and also streaming data; for BI, this is by far the most important, and we will focus on it a lot today. These are important points because you can suffer from Big Data problems without having much data at all; it’s all relative to your hardware and the tools you are using. So if you have any of these pain points, your data is too big – hence Big Data.
  • Point of slide – framing slide: we are talking about Big Data for consumers, not analysts. This webinar is about Big Data and BI, so the focus is on assisting decision makers. Too much of the Big Data discussion focuses on data scientists with bespoke projects (hypothesis, Hadoop, partitioning, etc.). Today we want to focus on data consumers using this in the real world. There are more data consumers than analysts – how can we empower the masses to add value from Big Data? Big Data for everyone.
  • Point of slide – where Big Data can add value. It’s mostly marketing. What is the opportunity? Over 45% of Big Data deployments are for marketing, with spending on digital marketing set to grow from $34B to $76B by 2016. This slide is about use cases.
  • Point of slide – Show what industries are using Big Data, and how easy it is for these industries to do it.
  • Point of slide – 60% are already collecting more data than they can effectively use.
  • Point of slide – Big Data is an opportunity, not a burden, and 70% of businesses see it that way. 70% also expect an ROI within 1 year of investing in Big Data initiatives – hmmm, is that a bit optimistic if it takes years to build a data warehouse?
  • Point of slide – And most importantly, 84% of organizations using Big Data today say they can now make better decisions – which is what it is all about.
  • So what is Big Data? Why the hype? This tongue-in-cheek sketch highlights the point that there is hype around Big Data. Roman Stanek, founder and CEO of GoodData: “Today, the difference between success and failure is the ability to monetize a new class of data. It’s ironic that, despite billions of dollars spent on business intelligence systems, we are still data-bankrupt.” What that tells us is that current skills and technologies are unable to deliver on the business opportunities that can be realized by Big Data. With such high stakes, it’s no wonder there is hype.
  • The success of any Big Data project hinges on delivering greater business value. Many focus on the monetization of Big Data, which means driving greater revenue or creating new revenue opportunities. But, depending on the industry sector, it can also deliver operational efficiencies, increased service levels and customer satisfaction. The potential trap for new entrants into the Big Data arena is the temptation to develop a Big Data infrastructure for all possibilities or contingencies. As we heard from Glen earlier, the ROI window is 12 months. We must maintain a strong focus on delivering against the specific business objectives and not let the technology drive the direction of the project.
  • To realize the potential of Big Data, what are the levers available to you? Which will you use? Personalization – offering a better, more targeted service. Social – allowing users to communicate and share with other community members. Search – making it easier for customers to find what they are looking for (saving time and improving customer satisfaction). Finding opportunities – how do we exploit the data and drive opportunities in the business? By understanding what customers, competitors and the market are doing, we can find new opportunities to exploit. Actionable insights – as Glen stated earlier, making it easy for your customers, suppliers and staff to make better decisions (traditional BI). This is the lever we are focusing on today. Badoo – collects 10 million records each day. They had no way of quickly identifying which c… They now run standard queries in 10 to 30 seconds, which helps them determine which marketing campaigns are converting customers.
  • Does the data you have match what you want to achieve? There is typically a huge gap between what we have and what we need. What additional data is required outside the normal corporate data? External feeds can make a critical difference to monetizing Big Data. Then there are governance issues to consider. Who owns the data vs. who needs the data? Are there security and privacy issues at play? Then there are the physical considerations: are there documents, email, images or video, which might take a lot of space but aren’t traditionally used for analysis? What proportion is structured vs. unstructured? How much of your data is just indexing to improve performance? Again, the focus must be on collecting the data you need to answer the specific business questions you have.
  • Every industry has very specific use cases that drive Big Data success, in areas such as: transportation and logistics companies detecting fraud before it happens (Timocom); driving sales by combining environmental data such as weather with PoS data (Sheets); web traffic monitoring to determine customer behavior (GSI Commerce). When you know your goals and fully understand your data requirements, then you know what data you need to collect. Only then can you make a decision on what infrastructure you need.
  • Again, some more humor to get the message across. Hadoop is one of the most well-known Big Data solutions, and many of you in the audience today will be using it or considering it for future projects. Hadoop makes a fantastic data store for web traffic and machine data because of its unmatched scalability, speed and fault tolerance. However, it isn’t always the best for business intelligence, where the majority of use cases are SQL or relational database type applications. So when considering deploying BI for the masses, you don’t want to ask them to learn a new skill set or have deep technical know-how. One important lesson we have learned is that creating reports from Hadoop was quite time consuming, and the query performance was actually quite slow. Today most BI tools do connect to Hadoop (through Hive). So the key takeaway from this best practice is that the Big Data ecosystem is much bigger than just Hadoop.
  • So what does this ecosystem look like? It’s a huge ecosystem, with many varied solutions that don’t necessarily address all of the 3 V’s – Volume, Variety and Velocity. And obviously it’s impossible to accommodate everything in a single product. With today’s webinar being focused on Big Data for BI and analytics, we will focus on the analytical database space. It’s built specifically for business intelligence and tackles the velocity (speed) issue better than any of the others. Hadoop makes a fantastic Big Data store, and there are many other Big Data solutions outside of Hadoop in the NoSQL and NewSQL area which solve different pain points, but again they are not best practice for BI. Actian has many customers who started with Hadoop and have incorporated Vectorwise because of its speed – designed, if you like, for the 3 V’s.
  • Performance is the number 1 issue in BI today, and with rapidly growing data it’s only worsening. There is a lot of evidence to support this. BI Survey – says every year that slow query performance is the number 1 reason why BI projects fail. TDWI Best Practices Report – almost half (45%) said that poor query response was the top problem that will drive them to replace their current data warehouse. And Gartner – says 70% of data warehouses experience performance-constrained issues.
  • Just why is it so important to have fast performance? We are the Google generation, blessed with instant answers, and we have become impatient. Today, user expectations are very demanding. Studies show that BI project value and adoption drop off dramatically when queries take longer than 10 seconds to run. On mobile devices, that drops to just 3 seconds. The bigger your data, the slower your reports will run – a huge concern.
  • The solution to this is to use the right tool for the job – it can make a dramatic difference. 1. Traditional databases can perform a wide variety of different workloads, but they were never designed for the challenges and complexities of Big Data – particularly the job of slicing and dicing data. Given the amount of additional hardware and BI tuning you require to get better performance, you’d be much better served by a fast, purpose-built database. 2. Analytical databases ARE purpose-built for slicing and dicing data. They are quick, agile and easy to get started with. 3. Clustered databases are an option as data volumes grow, but they aren’t as agile and require substantially more resources and expertise to manage and implement.
  • What are the other benefits you gain from using a fast database? 1. Slash the cost of the hardware – in many recent tests and proofs of concept, Vectorwise consistently outperforms other databases on very small servers compared with much larger racks of servers. The hardware you see on this slide is from the 1 TB TPC-H benchmark – Oracle used the large server and Vectorwise used the small 2U Dell server. 2. Dramatically less maintenance – take out the cost and burden of having teams of DBAs to tune the database. 3. Time – deliver usable BI in much less time without the need for deep technical know-how.
  • Planning for a mixed architecture will allow you to bring in the variety of data sources but still deliver on user expectations of fast BI. This is an example of how Hadoop might be a part of it. You will certainly need to include your operational data and data from external feeds – all critical components in the Big Data recipe. However, without the underlying database performance to support the BI tool, even the most brilliant tool will struggle to deliver satisfactory end-user adoption.
      • IsCool Entertainment – use Hadoop and Vectorwise. They are a European leader in social gaming on Facebook (number 1 in France) with 1.2 million active monthly users. Their gaming platform is built on Hadoop, and they use Vectorwise to analyze user experience. Below is the press release with the quote. “We’re using Vectorwise to investigate consumer behavior to better understand what makes our users play, interact and recommend. Fast and actionable business analytics from Vectorwise will allow us to deliver tailored offers to customers and advertising partners, and thus improve monetization of the games we develop.” – Florian Douetteau, CTO of IsCool Entertainment.
      • Badoo – global dating site with 150 million members. Use Hadoop for the web application and Vectorwise for the analysis. Before Vectorwise they hard-coded a custom-built analytics solution that was limited in functionality and unable to provide the level of detail their marketing and finance teams needed. “Vectorwise gives us unfettered access to our data and the ability to run ad hoc analyses without the need to have thought of the question before we asked it. This means we can now ask anything of our data and our users’ activity and get answers in just seconds.” – Ian Broadhead, BI lead at Badoo.
      • NK – social media site that has more users in Poland than Facebook has there, with 14 million active monthly users. Use Hadoop for click-stream data, such as POST and GET requests, and AdServer logs; ad hoc queries sometimes took days to design, build and execute. Use Vectorwise for 50-90 of their largest daily queries, such as banner optimization (advertising based on user/friend preference) and gaming usage (moving around the buttons/colors, etc. to see how changes affect users). “We looked to solutions from other vendors with analytic databases, but selected Vectorwise for its superior performance and cost-effective model.”
  • Now that you have the data you need, you need to get it into the hands of the people who really need it. Big Data is a big investment, and there is no point giving only a few people in your organization access to the data. The more you share data, the more value you get from it (it doesn’t lose its value). It needs to be fast, agile, drag and drop, etc. – requiring very little training. And then finally you can get to … next slide (the magic art of making sure your data tells a story).
  • If we are going to ensure mass distribution, then we need delivery tailored to each audience’s needs: farmers see a map of the farm (agriculture), marketing sees market segmentation, transportation, etc. So when thinking about visualization, we need it to make sense for them. Multiple use cases. Data visualization is critical for people to consume data. Nobody sends people on a course to understand a graph. Build for the skill level of the audience. Pac-Man – visualization best practice points (last webinar). Powerful visualization is the best way to express data – the more data you have, the more focus you need. Yellowfin to list best practices for visualizing huge amounts of data. Storytelling.
  • Following slides make the points of… If we are going to ensure mass distribution, then we need delivery tailored to each audience’s needs. Marketers are going to want to see segmentation of demographics. Managers are going to want to interactively drill down into their reports. Data scientists are going to want to do statistical analysis. Executives are going to want to keep a close eye on KPIs. Demographers and people in agriculture are going to want to see things in maps. So when thinking about visualizations, we need to always keep the audience in mind. Data visualization is critical for people to consume data. Nobody sends people on a course to understand a graph. So build for the skill level of the audience.
  • But when you visualize it, you can get your point across much better. Should re-do this in Yellowfin.
  • Big Data visualization tips:
      • More data requires more focus. Link to clearly defined business objectives. Only include actionable information.
      • Interactivity is essential. Start big, drill to detail. More data doesn’t mean more reports and visualizations; it means deeper insight.
      • Select the right metrics. It’s not enough just to decide on what aspects of your business Big Data analytics allows you to monitor. You need to decide how you’re going to track and measure those chosen aspects, and communicate them to end-users via an agreed form of measurement.
      • Provide context. Without additional contextual information to help users understand data visualizations, it’s impossible for a user to understand the true meaning of the results presented, what action they require, or whether they demand any action at all.
      • Effectively highlight the most important information. Draw the user’s attention to the most pertinent pieces of information first. The most important data should occupy the most screen real estate.
      • Select the best, not the best-looking, visualization. The data, not the visualizations, should always be the center of attention. Never use flashy visuals and chart types when simple alternatives are capable of conveying the same message – does the third dimension on that pie chart really add to its meaning? Avoid all design aspects that are unconnected to the task of analytic communication. “Perfection is achieved, not when there is nothing left to add, but when there is nothing left to remove” – Antoine de Saint-Exupery.
      • Use colour appropriately and sparingly to achieve maximum impact and contrast. If all colors chosen to represent different metrics or values within a chart are eye-catching, no single point will stand out above the others. Select colours based on a clear understanding of their inherent or commonly accepted symbolic or metaphoric meaning (red = bad, etc.). Be consistent. For example, if data relating to second quarter sales is displayed in purple in one chart, all other charts that display second quarter sales results should also display them in purple.
      • Avoid visual clutter. Avoid visually gratuitous chart types.
      • Select the right visualization for the data and the context. Selecting the most context-appropriate visualization for a particular metric or measure requires the judicious application of a little common sense. For example, if you’re attempting to monitor or track the change in something over time, a line graph will almost always work best. Likewise, if tracking several metrics of similar proportions – a potential example might be new leads generated for the current year by marketing category (Google Ads, LinkedIn, print media, banner advertising, etc.) – using a column chart or bar graph would be an effective way to visualize the minor differences in performance between each marketing channel. Conversely, a pie chart would deliver a poor user experience as, at first glance, all the portions would seem equal.
      • Layered maps are critical. They display large volumes of data efficiently and help explain the relationships between different types of data.
      • Consider the unique informational requirements of each defined user group. What information are they already aware of? What information would enable them to make more efficient and effective decisions?
      • Support and prompt action. Users must be enabled with a range of options to share the new information and their associated thoughts with others, in order to drive appropriate resultant action. Such information collaboration and decision-making options should include, but are certainly not limited to, the ability to:
          • Email the relevant report to pertinent and affected stakeholders
          • Add contextual knowledge to the reports in question via annotations and comments (discussion threads) and have relevant users with access to those reports notified
          • Add decision widgets to discussion threads to facilitate voting and polling to enable fast and effective collective decision-making
          • Embed fully interactive dashboards and reports externally to the BI tool, on any third-party Web-based platform, to allow external stakeholders to understand and act on the emergent issue
  • Glen does the demo… Drive the point home – let’s assume we have built a dashboard for a user – dashboard. The real value is that I can browse, un-aggregated. If it were traditional, then (show comparison).
  • Summary of what we learned today
  • Big Data and BI Best Practices

    1. Big Data and BI Best Practices
    2. Your presenters: Yellowfin CEO, Glen Rabie; VP Sales & Services in APAC, Actian Corporation, Jason Leonidas
    3. Your presenters: Yellowfin CEO, Glen Rabie; General Manager, Actian Vectorwise, Fred Gallagher
    4. About Actian and Yellowfin. Making Business Intelligence easy. Taking Action on Big Data. [Chart: History of 100GB TPC-H Performance Benchmarks, Composite Queries Per Hour (Non-Clustered); y-axis: QphH@100GB; series: Non-Vectorwise, Vectorwise]
    5. Data is the new oil. David McCandless, Data Journalist
    6. The rise and rise of Big Data
    7. There has always been Big Data… It’s just that now we can actually capture and mine it effectively. Canadian Tar Fields
    8. Not all Big Data is created equal. Planet Google and friends are the outliers. Large Telco. The Norm.
    9. Do you have a Big Data problem?
    10. Big Data for Everyone • Big data is not just for data scientists and bespoke projects • It’s for decision makers and data consumers • It needs to be anchored in the real world. Analyst. Consumers
    11. Who is benefitting from Big Data?
    12. Who is benefitting from Big Data?
    13. Why bother with Big Data? 60% of organizations collect more data than they can effectively use (MIT Sloan Management Review)
    14. Why bother with Big Data? 70% of organizations see Big Data as a big business opportunity (Harris Interactive). 70% of organizations investing in Big Data initiatives expect ROI within 1 year (Harris Interactive)
    15. Why bother with Big Data? 84% of organizations that actively leverage Big Data say they can now make better decisions (Avanade)
    16. Best Practices in Big Data. Jason Leonidas, Actian Corporation
    17. Best Practices in Big Data. Fred Gallagher, Actian Corporation
    18. What is Big Data?
    19. Best Practice #1: Focus on what you want to achieve
    20. It’s all about driving value
    21. Big Data Levers: 1. Personalization 2. Social 3. Search 4. Find opportunities 5. Actionable Insights
    22. Best Practice #2: Identify the data you have vs the data you need
    23. Does your data match what you want to achieve?
    24. What data do you need?
    25. Best Practice #3: Use the right Big Data tool for the job
    26. Big Data and Hadoop
    27. Big Data Eco-system [diagram labels around BIG DATA: Social Media; Analytic Databases; Hadoop; Storage; Search; NewSQL; “as-a-service”; NoSQL; Document; Operational Database; BigTable; Key Value; Graph]
    28. Best Practice #4: Use a fast database
    29. Slow Query Performance is the #1 issue in BI. BI Survey 10: Why BI Projects Fail? 1. Query Performance Too Slow. TDWI Best Practices Report: “45% Poor Query Response the top problem that will eventually drive users to replace their current data warehouse platform.” Gartner Magic Quadrant Data Warehousing: 70% of data warehouses experience performance constrained issues of various types
    30. User Expectations. Web-Based Business Intelligence: users expect results in less than 10 seconds. Mobile BI: users expect results in less than 3 seconds
    31. Use a fast database: Traditional Database; Analytical Database; Clustered Database
    32. Consider the hidden costs. Spend Less on Hardware: get faster results on smaller hardware configurations. Spend Less Time Database Tuning: faster deployment and BI projects. No more aggregates, cubes, complex schemas, etc.
    33. Best Practice #5: Plan for a mixed architecture
    34. Hadoop and BI architecture [diagram labels: Hadoop; Transactional; External; Fast Database; BI Tool]
    35. Best Practice #6: Ensure mass distribution of your data
    36. Big Data for Everyone: Visualizations; Alerts; Access Anywhere
    37. Best Practice #7: Tailor data delivery to each audience
    38. Give your audience what they want: Demographics; Interactive Reports; Statistics; KPIs; Maps; Collaboration
    39. Visualization is powerful [chart: Looks like Pac-man vs. Does not look like Pac-man; values 169, 41]
    40. Big Data Visualization Tips • More data requires more focus • Interactivity is essential • Select the right metrics • Provide context • Support and prompt action
    41. Demonstration
    42. Big Data and BI Best Practices: 1. Focus on what you want to achieve 2. Identify the data you have vs the data you need 3. Use the right Big Data tool for the job 4. Use a fast database 5. Plan for a mixed architecture 6. Ensure mass distribution of your data 7. Tailor data delivery to each audience
    43. Conclusion. Questions
    44. More: @YellowfinBI, @ActianCorp. Feedback: Yellowfin LinkedIn User Group, Vectorwise LinkedIn User Group