The Architecture of Agile Data Warehousing

According to Forrester Research, only 12% of data in organizations is being used for analytics.

New Agile Data Warehousing techniques simplify combining all of your data to accelerate user-driven analytics and data discovery.

In this 1-hour session, our GoodData experts will teach you how to:

- Gain competitive advantage through analytics
- Leverage ALL data sources for more powerful insights
- Eliminate complexity and accelerate user-driven results

Presentation Transcript

  • GoodData Confidential. 2013 GoodData Corporation. All rights reserved. The Architecture Of Agile Data Warehousing. May 27, 2014
  • Speakers: Cory Vander Jagt, Product Marketing; Pavel Kolesnikov, Product Management
  • What We’ll Cover
    ▶ Market trends
    ▶ Architecting for agility
    ▶ 5 requirements for an agile data warehouse
    ▶ Implementation best practices
    ▶ Checklists and tips for success
  • The Analytic Data Reality… Less Than Ideal
    NOT ENOUGH DATA: 12% of data in organizations is being used for analytics.
    CHANGES NOT FAST ENOUGH: 52% of simple BI requests take a week or more to turn around. <1/3 of complex BI requests are fulfilled within one month.
  • You and your 12%.
  • [Diagram: data sources on two axes, Cloud vs. On-premise and Structured vs. Unstructured. Labels: OPERATIONAL DATA, ENTERPRISE “DARK” DATA, ENTERPRISE SAAS DATA, SOCIAL MEDIA DATA. Examples: Contacts, Web logs, Email, Requests, Transactions, Meters, RFID, Sensor, GPS, Monitoring]
  • Users lack a complete and trusted view to relevant and timely information.
  • WHY CUSTOMERS SEEK THE CLOUD
    ▶ High volume: Social, public, machine…
    ▶ Departmental operations: Sales, payroll, HR, marketing…
    ▶ Operating costs: No infrastructure, no upgrades…
    Data is moving away from traditional warehouses… and outside the organization
  • Open Analytics PaaS: Deliver time to value.
    Manages high complexity big data issues at scale. Increase efficiency of IT resources. Empower data-driven decisions at all levels. Massive data infrastructure available as a service. Self-service dashboards, reports & analytics. 10x faster than traditional solutions.
  • What Companies Ask Us
    ▶ How can we keep up with all of our data requests?
    ▶ How do we make analytics affordable?
    ▶ How can we expose critical data to our customers?
  • HOW DO WE ARCHITECT DATA TO ENHANCE AGILITY?
  • “Welcome changing requirements, even late in development. Agile processes harness change for the customer's competitive advantage.” (http://agilemanifesto.org/principles.html)
  • Agile?
    ▶ Find out where you are
    ▶ Take a small step towards your goal
    ▶ Adjust your understanding based on what you learned
    ▶ Repeat
    (Dave Thomas, http://pragdave.me/blog/2014/03/04/time-to-kill-agile/)
  • Agile?
    ▶ Find out where you are
    ▶ Take a small step towards your goal
    ▶ Adjust your understanding based on what you learned
    ▶ Repeat
    There are no agile tools. But some tools can help you increase agility.
  • Business User Top Requirements:
    • Access to Line of Business dashboards and reports
    • Automated delivery and export to multiple tools and formats
    • Self service ad hoc analysis
  • Metric-ify management
    ▶ Ad hoc reporting. Self-service to meet all demands of demanding execs: CEO, CFO, CMO, VP Sales…
    ▶ Experience. Even an exec can use
    ▶ What’s Comcast’s story? We moved from “I think” to “I know”
  • Business Analyst Top Requirements:
    • Define models, KPIs, metrics, etc.
    • Quickly build new analytics projects based on requirements
    • Support business requests
  • Styling the Cloud
    ▶ Customer Obsession. Combine 6 cloud data sources to identify top customers
    ▶ Speed to value. Dashboards live in 30 days
    ▶ Innovation: Zero additional resources to deploy and manage GoodData
    “You shouldn't have to be a database administrator to sell a great pair of pants.” David Glueck, Sr. Director of Data Science and Engineering
  • BI Developer Top Requirements:
    • Integration
    • Quality
    • Automation
  • Analytics = Raving Fans
    By embedding custom analytics into its platform, Phizzle delivers a never-before-seen view of a fan to retailers, travel companies and sports teams.
    ▶ Speed: Fully-custom analytics offering built in four weeks.
    ▶ Customer obsession: Embedded analytics is one of the top three reasons customers buy Phizzle
    ▶ Innovation: “GoodData was the only full-stack solution that clearly met our needs. From the cutting-edge platform architecture to the sexy front end capabilities, it aligned perfectly with what we wanted to accomplish.”
    “When we show analytics Powered By GoodData to our customers we see jaws drop” Stephen Goldberg, VP Engineering
  • Data Analyst Top Requirements:
    • Data Collection
    • Advanced Analytics
    • Data Visualization
  • Wow Clients with New Services (Agency Partner)
    ▸ Transformed approach at major holding company
    ▸ Scale for the future
    ▸ Developed 4 new analytic products in less than 5 months. Expanding reach to 100s of global brands and building a $10M+ practice.
  • Poll - Is your current data warehousing architecture sufficiently supporting all of the roles within your organization?
    1 – Yes.
    2 – Analysts and Developers are supported but Business Users experience significant delays.
    3 – Business Users are supported but Analysts or Developers do not have sufficient access to tools or resources.
    4 – No one in the organization is sufficiently supported.
    5 – Don’t know / unsure.
    Type Your Answer In The Q/A Box
  • REQUIREMENTS FOR AN AGILE DATA WAREHOUSE
  • 5 Key Requirements
    1 Support Different Business Models
    2 Support ALL Data Sources & Types
    3 Capture Historical Changes
    4 Scale Automatically
    5 Allow Manipulation & Transformation
  • BEST PRACTICES
  • 1) Complete Initial High-Level Modeling.
    2) Model details throughout project Just In Time.
    3) Drive adoption early. Prove the model.
    4) Prioritization of requirements determines what gets completed during a Run.
  • Start with Small
    How?
    - purchase hardware and licenses?
    OR
    - provision new data warehouse / data mart with a single click/API call?
  • Start with Small
    Frequently Heard:
    IT: “Please tell me the top 3 priorities”
    Business: “If I give you my top 3 priorities, nothing else will get implemented.”
    Anticipate change to gain trust. Start with small.
  • Programmable platform
    Automate everything that can be automated! Examples:
    ▶ Create / clone a data mart
    ▶ Apply a change (model, ETL, dashboards, …)
    ▶ Deploy and schedule an ETL process
  • Separated logical and physical analytical models
    Report is defined as SUM(Amount) sliced by Region (Amount and Region are logical objects).
    It just works, regardless of physical model changes (physical model A, physical model B, physical model C).
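The separation described on this slide can be illustrated with a minimal sketch. It assumes two hypothetical physical models (a flat table and a fact table with a region dimension) and a tiny hand-written mapping layer; the table names, column names, and the `compile_report` helper are invented for illustration and are not GoodData's API. The report itself only mentions the logical objects Amount and Region.

```python
import sqlite3

# Logical report: SUM(Amount) sliced by Region. Only logical names appear here.
REPORT = {"measure": "Amount", "slice_by": "Region"}

# Hypothetical physical models: each maps the logical objects to SQL.
PHYSICAL_MODELS = {
    # Model A: one flat table.
    "A": {
        "ddl": [
            "CREATE TABLE sales (region TEXT, amount REAL)",
            "INSERT INTO sales VALUES ('East', 10), ('East', 5), ('West', 7)",
        ],
        "from": "sales",
        "Amount": "sales.amount",
        "Region": "sales.region",
    },
    # Model B: a fact table joined to a region dimension.
    "B": {
        "ddl": [
            "CREATE TABLE dim_region (id INTEGER, name TEXT)",
            "CREATE TABLE fact_sales (region_id INTEGER, amt REAL)",
            "INSERT INTO dim_region VALUES (1, 'East'), (2, 'West')",
            "INSERT INTO fact_sales VALUES (1, 10), (1, 5), (2, 7)",
        ],
        "from": "fact_sales JOIN dim_region ON fact_sales.region_id = dim_region.id",
        "Amount": "fact_sales.amt",
        "Region": "dim_region.name",
    },
}


def compile_report(report, model):
    """Translate the logical report into SQL for one physical model."""
    measure = model[report["measure"]]
    attribute = model[report["slice_by"]]
    return (
        f"SELECT {attribute}, SUM({measure}) FROM {model['from']} "
        f"GROUP BY {attribute} ORDER BY {attribute}"
    )


def run_report(report, model):
    """Build the physical model in memory and run the compiled report."""
    conn = sqlite3.connect(":memory:")
    for statement in model["ddl"]:
        conn.execute(statement)
    rows = conn.execute(compile_report(report, model)).fetchall()
    conn.close()
    return rows


results = {name: run_report(REPORT, m) for name, m in PHYSICAL_MODELS.items()}
```

Both physical models return the same rows for the same logical report, so a physical change only requires updating the mapping, not the report definition.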
  • Separated data storage and analytical area
    Analytical area: dimensional model reflecting the analytical requirements from business
    Storage area: authoritative data storage independent of analytical requirements (3NF, 3NF + history, DataVault…)
    You are always able to rebuild the analytical layer on top of the storage area
  • Storage Area: Keep all fields, raw data, full history
    Analytical area:
    ▶ as small as possible for the best performance and usability
    ▶ easily extensible with extra fields / historical data
    Storage area: all fields, full history, raw data (no business transformation!)
    Affordable with a modern columnar technology
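A minimal sketch of why this split keeps the analytical area cheap to extend, assuming an in-memory list stands in for the storage area and a hypothetical `build_analytical` function stands in for the rebuild step. Field names and sample records are invented for illustration.

```python
from datetime import date

# Storage area: every field and every version of each record, untransformed.
STORAGE = [
    {"id": 1, "valid_from": date(2014, 1, 5), "region": "East", "amount": 10, "owner": "ann"},
    {"id": 1, "valid_from": date(2014, 3, 1), "region": "East", "amount": 12, "owner": "bob"},
    {"id": 2, "valid_from": date(2014, 2, 9), "region": "West", "amount": 7, "owner": "cy"},
]


def build_analytical(storage, fields):
    """Rebuild the analytical area: latest version per id, selected fields only."""
    latest = {}
    for row in sorted(storage, key=lambda r: r["valid_from"]):
        latest[row["id"]] = row  # later versions overwrite earlier ones
    return [{f: row[f] for f in ("id", *fields)} for row in latest.values()]


# Start small: only the fields the business asked for.
small = build_analytical(STORAGE, ["region", "amount"])

# A new requirement arrives: rebuild with one more field, no re-collection needed.
extended = build_analytical(STORAGE, ["region", "amount", "owner"])
```

Because the storage area already holds all fields and the full history, adding `owner` is a rebuild, not a new extraction from the source system.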
  • Right tools for the job…
    - Data flow visualization ETL tools
    - SQL: lingua franca of the data world
    - Multidimensional query language + logical model
    - Scripting language (automation, data retrieval, orchestration, reusable components …)
  • Reusable data flow components
    Prebuilt reusable components save your time. Examples:
    - Downloaders: from a source to data warehouse (web services (SFDC, …), databases, flat files...)
    - Provisioning tools
    Bonus: what if the platform allows you to build your own reusable components?
  • AGILE DATA WAREHOUSING ARCHITECTURE
  • What Is GoodData?
    GoodData is a business analytics Platform as a Service (PaaS) that supports the entire lifecycle of data and analytics, from storage and data governance to data discovery.
  • Our Differentiators
    ► 100% in the cloud
    ► End-to-end solution
    ► Open APIs throughout our platform
    ► Leader in embeddable analytics
    ► Built on best available technology (HP Vertica, Splunk, etc.)
  • GoodData Open Analytics Platform
  • GoodData DSS: Data Storage Services
    ► Fully-managed data warehousing service powered by HP Vertica for storing:
      ► Raw data from source systems
      ► Data required for ETL processing, data governance
    [Diagram label: Reporting Warehouse]
  • Clustered for Scale
  • GoodData Platform Vision
    ▶ Data discovery is the process of distilling actionable information from data
      ▶ Playing with data
      ▶ Finding answers to questions you didn't know to ask
    ▶ Data governance is the process of collecting, persisting and transforming data that turns input data into trustworthy facts
      ▶ Collecting data from multiple data sources
      ▶ Storing complete data history
      ▶ Data cleansing and validation
      ▶ Applying business transformation
  • GoodData Platform Lifecycle
  • COLLECT: Downloaders
    ▶ Data is loaded from all data sources
      ▶ API, database, flat file etc.
    ▶ … and backed up immediately
      ▶ Amazon S3 / HDFS (immutable)
    ▶ … without semantic changes
      ▶ No schema enforcement
      ▶ Business data context
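As a rough sketch of the "back up immediately, immutable, no semantic changes" idea, the snippet below writes the downloaded payload byte for byte to a write-once file. It assumes a local directory stands in for Amazon S3 or HDFS; the `backup_raw` function and the content-hash naming scheme are hypothetical choices for this illustration, not a description of the GoodData downloaders.

```python
import hashlib
from pathlib import Path


def backup_raw(payload: bytes, source: str, backup_dir: Path) -> Path:
    """Store the payload exactly as received; never overwrite an existing backup."""
    digest = hashlib.sha256(payload).hexdigest()
    target = backup_dir / source / f"{digest}.raw"
    target.parent.mkdir(parents=True, exist_ok=True)
    try:
        # Mode "xb" fails if the file exists, so a backup cannot be modified later.
        with open(target, "xb") as handle:
            handle.write(payload)  # no parsing, no schema, no transformation
    except FileExistsError:
        pass  # the identical payload was already backed up
    return target
```

Naming the file after the hash of its content means a repeated download of the same data maps to the same backup, while any changed payload gets a new file next to the old one.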
  • STORE: Extract Facts from Data
    ▶ Downloader responsibility, customizable
    ▶ Extract data from backup storage to DSS
    ▶ … Enforce schema
    ▶ … Validate data
      ▶ Mark & notify non-conforming records
    ▶ … Merge partial updates to complete history
      ▶ Handle new/modified/deleted records
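The STORE steps above (enforce schema, validate, mark non-conforming records, merge partial updates into a complete history) can be sketched as follows. This is an illustrative assumption of how such a step might look: the `CASTS` schema, the `deleted` flag convention, and the in-memory `history` dictionary are all hypothetical.

```python
# Hypothetical schema: field name -> cast function. Unknown fields are ignored here.
CASTS = {
    "id": int,
    "amount": float,
    "deleted": lambda value: str(value).lower() == "true",
}
REQUIRED = {"id"}


def enforce_schema(raw):
    """Return (typed_record, None) on success or (None, error_message) on failure."""
    missing = REQUIRED - set(raw)
    if missing:
        return None, f"missing fields: {sorted(missing)}"
    try:
        return {k: CASTS[k](v) for k, v in raw.items() if k in CASTS}, None
    except (TypeError, ValueError) as exc:
        return None, str(exc)


def store(batch, history, rejected):
    """Merge a batch into history, which maps id -> list of complete versions."""
    for raw in batch:
        record, error = enforce_schema(raw)
        if error is not None:
            rejected.append({"raw": raw, "error": error})  # mark, do not drop silently
            continue
        versions = history.setdefault(record["id"], [])
        previous = versions[-1] if versions else {}
        merged = {**previous, **record}  # a partial update inherits the missing fields
        if merged != previous:
            versions.append(merged)  # append only: earlier versions stay intact


history, rejected = {}, []
store(
    [
        {"id": "1", "amount": "10.5"},   # new record
        {"id": "1", "amount": "12"},     # modified record
        {"id": "x", "amount": "3"},      # non-conforming: id is not an integer
        {"id": "1", "deleted": "true"},  # deletion arrives as a partial update
        {"amount": "4"},                 # non-conforming: id is missing
    ],
    history,
    rejected,
)
```

After the call, `history[1]` holds three complete versions, the last one flagged as deleted but still carrying the last known amount, and `rejected` holds the two non-conforming records for notification.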
  • COMBINE: Join and Transform
    ▶ Business transformation
    ▶ Arbitrary transformations
    ▶ Join data sources
    ▶ Generate new facts (datasets)
  • COMBINE: Transformed Facts to Datamart
    ▶ Map the combined facts to specific LDMs
      ▶ One or more target datamarts with different models
    ▶ Reduce data volumes
      ▶ Filter (example: last 3 months)
      ▶ Pre-aggregate (progressive)
    ▶ Full loads whenever possible
      ▶ Incremental is a performance optimization
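A small sketch of the volume-reduction step, assuming "last 3 months" means everything from the first day of the month three months before the load date, and assuming a month by region grain for the pre-aggregation. Both choices, and the `load_datamart` name, are assumptions made for this example. The function rebuilds the whole datamart on every run, which is the full-load approach the slide recommends by default.

```python
from collections import defaultdict
from datetime import date


def months_back(day, months):
    """First day of the month that lies `months` months before the month of `day`."""
    index = day.year * 12 + (day.month - 1) - months
    return date(index // 12, index % 12 + 1, 1)


def load_datamart(facts, today, months=3):
    """Full load: rebuild the datamart from the combined facts on every run."""
    cutoff = months_back(today, months)
    totals = defaultdict(float)
    for fact in facts:
        if fact["day"] >= cutoff:  # filter: keep recent data only
            key = (fact["day"].strftime("%Y-%m"), fact["region"])
            totals[key] += fact["amount"]  # pre-aggregate to month x region
    return dict(totals)


facts = [
    {"day": date(2014, 1, 15), "region": "East", "amount": 5.0},
    {"day": date(2014, 2, 1), "region": "East", "amount": 10.0},
    {"day": date(2014, 2, 20), "region": "East", "amount": 2.5},
    {"day": date(2014, 5, 1), "region": "West", "amount": 7.0},
]
datamart = load_datamart(facts, today=date(2014, 5, 27))
```

Since the full set of facts stays in the storage layer, widening the window later only means calling the same function with a larger `months` value.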
  • Data Governance: Data Loading Management
    Deploy, execute, schedule, monitor, notify & alert the platform runtime processes (e.g. CloudConnect, Ruby etc.)
  • Data Storage Service (DSS)
    ▶ Reliable HDFS or Amazon S3 based storage for staging
    ▶ Columnar and clustered db (Vertica) for historical data storage (DWH)
    ▶ All errors in subsequent phases are recovered from the DSS
      ▶ Both data loss and processing errors/bugs
  • Platform Runtime for BI Automation
    ▶ MapReduce mechanisms for merging data from staging to historical storage (DWH)
      ▶ Amazon S3, Cascalog
    ▶ Deploy, execute, schedule & monitor governance processes
      ▶ Data transformation scripts (visual development)
      ▶ Ruby runtime
  • Architecture Principles
    ▶ Multitenant
      ▶ Across multiple data centers
    ▶ Service-based
    ▶ Scalability and Failover
      ▶ Asynchronous load balancing among many instances of stateless services
      ▶ Clustering and distributed processing for (few) stateful services
  • High Level Architecture
  • Sample Reference Data Flow Setup [Diagram label: Reporting Warehouse]
  • What Users Get / What Users See
    Buyer Requirements: ✓ User Experience ✓ Custom Branding ✓ Trust ✓ Collaboration ✓ Open Platform ✓ Instant Scale ✓ Real-time Access ✓ Speed to Value
    Infrastructure: • Network & Hardware • Distributed File System • Distributed Storage • Workflow Engine • Data Warehouses • Event Stores • OLAP Data Marts • App & Web Servers • Load Balancers
    Platform Services: • Privacy & Security • Sharing • Integration • Customization • Metadata Layer • Web Services / APIs • Modeling Framework • Statistical Functions • Explorers & Dashboards • Ad-hoc Reporting
    Operations: • Authentication • Resource Scaling • Availability • Connectivity • Monitoring & Alerting • Patch Management • Upgrades • Backup & Archiving • Network Operations Center
  • Agile Data Warehousing Checklist
    ✓ Will current and future business models be supported?
    ✓ Does it capture ALL types of data from any source?
    ✓ Are historical changes automatically recorded?
    ✓ Will storage capacity scale automatically with future growth?
    ✓ Can we manipulate and transform the data as required?
  • Implementation Best Practices
    ✓ Start with Small!
    ✓ Anticipate Changes
    ✓ Automate as much as possible
    ✓ Separate:
      ✓ Logical model
      ✓ Analytical model
      ✓ Data storage
    ✓ Select the right tools for the job
    ✓ Know your data sources, keep all fields & full history
  • Questions?
  • LinkedIn: cvanderjagt, pavelkolesnikov. Learn More: www.gooddata.com. Contact GoodData: team@gooddata.com
  • Thanks!