Good Data: Collaborative Analytics On Demand


This presentation outlines the key capabilities of the Good Data analytical platform. It also describes the platform's cloud-based architecture.

  • Slide 1:

    Today I will give you a 30-minute introduction to Good Data's cloud-based business intelligence platform. The key Good Data innovation is that it offers its BI platform as a service on the Internet.

    Slide 2:

    I will start by explaining how customers and prospects use the Good Data platform. Then I will compare Good Data's approach with traditional BI packages. After a few minutes, I am going to show you the platform live. Right after the demo, I will dive into the Good Data architecture to explain how the cloud deployment enables the capabilities that the platform provides. My hope is to convince you that the business intelligence landscape is changing and that Good Data is at the forefront of this change.

    Slide 3:

    Good Data bets on agile development methodology. We developed most of what you are going to see today in about 18 months, a result we could never have achieved with standard development methodologies. We believe that the agile methodology applies equally well to the development and maintenance of an analytical project. Using agile methodologies, our customers significantly shorten the analysis, design and implementation cycle of their analytical projects. They also achieve unprecedented quality and usability of their projects.

    We recommend that customers implement their BI projects in weekly iterations. Each iteration breaks down into the following steps. First, customers extract data from source systems such as CRM, ERP, financial or logistics applications, transform the data and load it into the platform. In the next step, they adjust their project's data model: they can connect multiple datasets for cross-analysis, create dimension hierarchies, pre-create metrics, etc. Then users develop their reports and dashboards. At the end of the cycle, users publish their reports and dashboards to company portals, 3rd party web applications, blogs, etc. The Good Data platform encourages web 2.0 style collaboration among all users.

    This agile methodology allows end users to start working with the first reports as soon as one week after the BI project starts. Not all reports are correct in their first versions. However, the short feedback cycle allows quick correction of bugs and misunderstandings and leads to unprecedented alignment between the project developers and end users.

    Slide 4:

    The Good Data agile approach is very different from what we usually experience with traditional BI platforms. End users of a traditional BI project often see the first report more than 6 months after the project starts, and after the company has invested hundreds of thousands of dollars or euros in it. The Good Data platform is available as a service. This removes the lengthy provisioning of BI hardware and software components that has to happen before the project implementation. Heavy up-front investments make traditional BI project costs significantly higher than with Good Data, where each project starts for free. Moreover, the Good Data platform is designed to handle many thousands of analytical projects at the same time. I will explain this after the live demo of the Good Data platform.

    Slide 5:

    Now I will show you the Good Data platform live. I have a small CRM analysis project here. The project contains data from the Salesforce CRM system that is incrementally loaded into Good Data. The CRM data is mashed up with industry benchmark data and Census demographic data.

    DEMO

    The development of this project took us 8 man-days. We developed it a few weeks ago, and the cost of running it is somewhere around 25 cents per week.

    Slide 6:

    Let's now dive into the Good Data architecture. This slide outlines the key layers of the Good Data architecture. You just saw the AJAX-based user interface of our platform. The UI communicates with stateless services that run in the API cloud. These services are available via a REST HTTP API and are heavily load balanced. The API cloud contains services that give users access to the data of executed reports, services that provide the platform's metadata, etc. All long-running asynchronous tasks are passed to the Execution cloud via a queuing mechanism. The Execution cloud services are again load balanced over many cloud nodes. They execute reports, load data into projects, export reports in various formats including PDF and Excel, etc. At the end of the chain is the Data cloud, which stores each project's data in secure private spaces and performs the number crunching. This layer is absolutely crucial. Most of today's BI vendors use a relational database as the key component in this layer. So did Good Data. Unfortunately, current database distribution and partitioning technologies don't fit well with the cloud. This is why Good Data is developing the breakthrough data query and storage technology that I am going to talk about now.
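
    The hand-off from the stateless API layer to the Execution layer can be pictured with a small in-process sketch. This is only an illustration, not Good Data's code: the task kinds, payload strings and function names are hypothetical, and a standard-library queue with threads stands in for the real queuing mechanism and cloud nodes.

```python
import queue
import threading

# Hypothetical task kinds, named after the work the talk assigns to the Execution cloud.
TASK_KINDS = {"execute_report", "load_data", "export_report"}


def api_submit(task_queue, results, kind, payload):
    """Stateless API-layer handler: validate, enqueue, return a ticket immediately."""
    if kind not in TASK_KINDS:
        raise ValueError(f"unknown task kind: {kind}")
    ticket = f"{kind}:{payload}"
    results[ticket] = "queued"
    task_queue.put((ticket, kind, payload))
    return ticket


def execution_worker(task_queue, results):
    """Execution-layer worker: pull tasks until a None sentinel arrives."""
    while True:
        item = task_queue.get()
        if item is None:
            break
        ticket, kind, _payload = item
        results[ticket] = f"done:{kind}"  # stand-in for the real long-running work


def run_demo(num_workers=2):
    task_queue = queue.Queue()
    results = {}
    workers = [
        threading.Thread(target=execution_worker, args=(task_queue, results))
        for _ in range(num_workers)
    ]
    for worker in workers:
        worker.start()
    tickets = [
        api_submit(task_queue, results, "execute_report", "pipeline-by-region"),
        api_submit(task_queue, results, "load_data", "crm-increment"),
        api_submit(task_queue, results, "export_report", "pipeline-by-region.pdf"),
    ]
    for _ in workers:
        task_queue.put(None)  # one shutdown sentinel per worker
    for worker in workers:
        worker.join()
    return tickets, results
```

    Because the submitting side keeps no state beyond the ticket, any API node could answer a later status request by looking the ticket up in shared storage, which is what makes the layer easy to load balance.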

    Slide 7:

    As I have said before, relational database technology doesn't align well with the cloud. Fortunately, we need only a small fraction of the data processing algorithms available in relational databases, such as filtering, aggregation and joining. These operations can be partitioned and processed in parallel.
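
    A minimal sketch of why these three operations partition well: each partition can be filtered, joined to a small lookup table and aggregated on its own, and the partial results are then merged. The data, the REGIONS lookup and the function names are hypothetical, and a thread pool stands in for separate cloud nodes.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical dimension table: account id -> region.
REGIONS = {1: "EMEA", 2: "AMER", 3: "EMEA"}


def partial_aggregate(rows, min_amount):
    """Filter, join and aggregate one partition independently of all others."""
    totals = Counter()
    for account_id, amount in rows:
        if amount >= min_amount:          # filtering
            region = REGIONS[account_id]  # joining to the dimension
            totals[region] += amount      # aggregation
    return totals


def parallel_aggregate(rows, partitions=4, min_amount=0):
    """Split the fact rows into partitions, process them in parallel, merge the partial sums."""
    chunks = [rows[i::partitions] for i in range(partitions)]
    merged = Counter()
    with ThreadPoolExecutor(max_workers=partitions) as pool:
        for partial in pool.map(lambda chunk: partial_aggregate(chunk, min_amount), chunks):
            merged.update(partial)  # Counter.update adds the partial sums together
    return dict(merged)
```

    The merge step works because a sum of partial sums equals the overall sum; aggregates such as averages would need to carry both a sum and a count per partition.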

    Slide 8:

    The database clustering problems, the small subset of relational database capabilities that we need for analytical data crunching, and other ideas led us to the design and implementation (currently underway) of a very efficient analytical query processor that is based on the following mechanisms:

    * In-memory data crunching is a thousand times faster than processing data on disk. So we need to get data into memory as fast as possible.
    * Analytical queries usually work with all rows of a dataset but only a few columns. Loading all columns into memory is certainly not necessary. Partitioning the dataset by columns instead of rows gives the analytical query processor a huge performance advantage.
    * Memory is still a scarce resource. Moore's law is on our side, but we need to help it a bit. We compress the columns, and we also partition large columns into smaller chunks so that we can process them in parallel on multiple Data cloud nodes.
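
    The three mechanisms above can be sketched in a few lines. This is an illustration under stated assumptions, not the Good Data storage format: the chunk size, the use of zlib and the 64-bit integer encoding are all choices made for the example, and the function names are hypothetical.

```python
import zlib
from array import array

CHUNK_ROWS = 1000  # hypothetical chunk size


def to_columns(rows, names):
    """Pivot row tuples into one list per column."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def compress_column(values, chunk_rows=CHUNK_ROWS):
    """Split an integer column into chunks and compress each chunk on its own."""
    chunks = []
    for start in range(0, len(values), chunk_rows):
        raw = array("q", values[start:start + chunk_rows]).tobytes()
        chunks.append(zlib.compress(raw))
    return chunks


def sum_chunk(blob):
    """Decompress one chunk and aggregate it; this is the unit of parallel work."""
    values = array("q")
    values.frombytes(zlib.decompress(blob))
    return sum(values)


def column_sum(store, name):
    """Aggregate a single column; the other columns are never decompressed."""
    return sum(sum_chunk(blob) for blob in store[name])
```

    For example, building `store = {name: compress_column(col) for name, col in to_columns(rows, names).items()}` and calling `column_sum(store, "bytes")` touches only the chunks of the `bytes` column, and each chunk could be handed to a different node.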

    Just a few numbers. The 40 million records of an Apache web server log can be compressed to roughly half a gigabyte. A machine with 16 GB of RAM can run circa 25 projects of this size in parallel before it needs to swap data out of memory. One project of this size can be loaded into memory in about 5 seconds. An Amazon AWS machine can aggregate, filter and join roughly 4 million rows per second. You all can do the math, right?
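
    One way to do that math, using only the figures quoted above. The function names are hypothetical, and the linear speed-up across nodes is an assumption of this sketch rather than a measured result.

```python
RECORDS = 40_000_000         # Apache log records in the example project
PROJECT_GB = 0.5             # compressed size of that project
RAM_GB = 16                  # memory of one machine
ROWS_PER_SECOND = 4_000_000  # rows one machine aggregates, filters and joins per second


def capacity_upper_bound(ram_gb, project_gb):
    """Projects that fit if every byte of RAM held compressed data."""
    return int(ram_gb // project_gb)


def scan_seconds(records, rows_per_second, nodes=1):
    """Time to touch every row once, assuming work splits evenly across nodes."""
    return records / (rows_per_second * nodes)


print(capacity_upper_bound(RAM_GB, PROJECT_GB))            # 32
print(scan_seconds(RECORDS, ROWS_PER_SECOND))              # 10.0
print(scan_seconds(RECORDS, ROWS_PER_SECOND, nodes=10))    # 1.0
```

    The upper bound of 32 projects is consistent with the quoted figure of circa 25 once some memory is left for the operating system and working space. A full pass over the 40 million rows takes about 10 seconds on one machine, and about 1 second on ten machines if the column chunks are spread evenly.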

    Slide 9:

    The history of OLAP technologies is a never-ending clash between speed and flexibility requirements. There are two competing approaches on the market today: ROLAP and MOLAP. MOLAP technologies bet on pre-aggregation, which usually happens during night hours. The problem is that users can't move outside the boundaries of the pre-computed data cubes. The ROLAP approach, on the other hand, provides users with ultimate flexibility. Users have all their data in one relational database, and all their queries are executed against the database on an as-needed basis.

    The ad hoc query has always been the main Good Data differentiator. You might have noticed some caches on our architecture slide. We have implemented a state-of-the-art caching mechanism to speed up the ROLAP processing. This mechanism breaks each report execution down into as many granular queries as possible and caches the results of these small queries. The results are then reused. The caching layer speeds up Good Data analyses by an order of magnitude. However, the partitioning mechanism goes even further in realizing the full potential of the ROLAP technology. Now you can have both flexibility and speed.
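
    The idea of caching granular query results can be sketched as follows. This is a simplified illustration, not the actual Good Data cache: a report is modeled as a list of hypothetical (metric, dimension) sub-queries, and the class and function names are made up for the example.

```python
class GranularCache:
    """Cache keyed by individual sub-queries instead of whole reports."""

    def __init__(self, run_query):
        self._run_query = run_query  # callable that actually executes one sub-query
        self._results = {}
        self.misses = 0

    def get(self, subquery):
        """Return the cached result, executing the sub-query only on a miss."""
        if subquery not in self._results:
            self.misses += 1
            self._results[subquery] = self._run_query(subquery)
        return self._results[subquery]


def execute_report(report, cache):
    """Resolve every (metric, dimension) sub-query of a report through the cache."""
    return {subquery: cache.get(subquery) for subquery in report}
```

    Two reports that share a sub-query, say `("revenue", "region")`, trigger only one execution of it; the second report pays only for the sub-queries it does not share. The finer the sub-queries, the more often different reports overlap.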

    Cloud processing breaks some old BI laws. For example, 20 hours of processing on one node costs the same as one hour of processing on 20 nodes. This rule works great for OLAP processing. IT doesn't need to size the systems for peak loads anymore.
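
    A small worked example of both points, under simplifying assumptions: pricing is purely per node-hour, the price and the hourly demand profile are hypothetical, and capacity can follow demand hour by hour.

```python
def node_hours(nodes, hours):
    """Billable units when pricing is per node-hour."""
    return nodes * hours


def fixed_cost(hourly_demand, price_cents):
    """Cluster sized for the peak hour and kept running the whole time."""
    return max(hourly_demand) * len(hourly_demand) * price_cents


def elastic_cost(hourly_demand, price_cents):
    """Cluster resized every hour to match the demand."""
    return sum(hourly_demand) * price_cents


demand = [1, 1, 20, 1]  # hypothetical nodes needed in each of four hours
print(node_hours(1, 20) == node_hours(20, 1))  # True
print(fixed_cost(demand, 10))                  # 800
print(elastic_cost(demand, 10))                # 230
```

    With this profile the peak-sized cluster costs 800 cents and the elastic one 230 cents; the gap grows as the load becomes spikier.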

    Multi-tenancy leads to large-scale sharing of computing resources and to high utilization rates, which shrink the costs of an analytical project.

    And last but not least, the cloud environment forces you to build stateless and thus scalable systems. Cloud deployment pushed us toward the data query processor, which we believe is a big leap forward in OLAP technology.

    Slide 10:

    I only had a few minutes to show you the exciting BI platform that we have been developing. This is certainly not enough to present all the technical and business advantages that the platform provides. I encourage you to go to www.gooddata.com and sign up for our platform. It is free. Do it today and get a taste of what Business Intelligence is going to be tomorrow.

Transcript

  • 1. Good Data: Collaborative Analytics On Demand Zdenek Svoboda, Vice President, Products Jakub Nesetril, Senior Director of User Experiences May 18, 2009 Monday, May 18, 2009
  • 2. Agenda • Good Data Introduction • BI Challenges • Product Demo • Architecture
  • 3. Good Data: Business Intelligence in One Week
      Monday: Extract & Transform Data
      Tuesday: Load & Cleanse Data
      Wednesday: Create & Adjust Analytical Model
      Thursday: Create & Adjust Reports / Dashboards
      Friday: Publish & Share Reports / Dashboards
      AGILE ITERATION
  • 4. Traditional Approaches Often Fail to Deliver BI Projects
                                    Traditional            Good Data
      Project Cost                  $$$$$                  $$
      Time to success               6-12+ months           weeks
      Deployment                    enterprise software    on demand service / SaaS
      Analysis & Implementation     waterfall              agile
      Per user cost                 $100s-$1000s           $10s (free for many)
      Scalability                   < 10 projects          1000s of projects
  • 5. Product Demo
  • 6. Good Data Architecture
      [Diagram: four layers, left to right: Clients, API Cloud, Execution Cloud, Storage Cloud. Component labels: AJAX Client (Collaboration, Analytics, Dashboard, Reporting) or 3rd Party App; Async HTTP REST; Metadata Server; Report Results; Report Pivoting; Admin; Queue; Report Executor; ETL Executor; PDF/XLS Report Exporter; On Demand Caches; IP Storage; Results Storage. Underneath all layers: Common services (security, load balancing, routing etc.) and Amazon Web Services (EC2, EBS, S3).]
  • 7. Good Data Architecture: Storage Cloud (On Demand Caches, Storage)
      • DB not designed for the cloud
      • DBMS distribution mechanism = clustering
      • Disk is slow, in memory very fast
      • Memory still scarce
      • Processing on stateless instances is hard
  • 8. Good Data Query Processing (under development)
      [Diagram: Data Integration; Distributed ETL; Query; Query Manager; In-memory data processors, each working on numbered data partitions; Amazon Elastic MapReduce (Hadoop); Data Storage (partitioned, columnar, compressed).]
  • 9. Economics of Cloud Computing Make Good Data Possible
      Cheaper Processing Power
        • Cloud makes ROLAP possible
        • Parallel: 1 CPU x 20 hours = 20 CPUs x 1 hour
      Elastic Scale
        • BI is very sensitive to unpredictable load
        • IT builds for peak load, we don’t have to
      Massive Multi-Tenancy
        • Single instance across 1000s of customers
      Service-Oriented
        • Transient HW nodes (stateless design)
        • Massive load balancing (shared nothing)
        • Amazon S3 and EBS are the only persistence
  • 10. Sign up for free today! www.gooddata.com