Web Analytics: Challenges in Data Modeling

This presentation accompanied a great talk on Web Analytics by Anne Marie Macek, Senior Manager in Data Strategy at Marriott International, at the DC Business Intelligentsia Meetup on December 11.

For more info on future events visit: http://www.meetup.com/BusinessIntelligentsiaDC/events/150884302/

Speaker notes:
  • How many attendees know Web Analytics? How many know Data Modeling? Attempt to bring everyone to an even playing field, then focus on the challenges.
  • Wikipedia definition.
  • "Performance" here does not mean uptime or response time; it means business results.

    1. WEB ANALYTICS: CHALLENGES IN DATA MODELING
    2. AGENDA • Introduction to Web Analytics • Data Sources, Data Capture • Vocabulary • Data Modeling Basics • Relational vs. Dimensional • Normalization, De-normalization, Aggregation • Web Analytics + Data Modeling • Four-tiered Data Model for Web data • Challenges • Q&A
    3. INTRODUCTION • Anne Marie Macek • Senior Manager, Data Strategy • Consumer Insight and Revenue Strategy • Marriott International • 30+ years Data Modeling and Reporting • 14+ years Data Warehousing and Business Intelligence • 4+ years Web Analytics Data and Reporting • MBA, Management Information Systems • BS, Mathematics and Computer Science
    4. EXPERIENCE • Data Modeling: • Flat Files, IMS/DB, DB2, Oracle, Netezza • MS Access, Borland Paradox • Cognos Powerplay, MS Analysis Services, Cognos 10.2 Dynamic Cubes • Reporting: • COBOL, Focus, SAS, Actuate • Cognos BI Suite • Business Functions: • eCommerce, Revenue Management, Sales & Marketing • Human Resources, Finance
    5. DEFINITION • Web analytics is the measurement, collection, analysis and reporting of internet data for purposes of understanding and optimizing web usage. Source: Wikipedia
    6. OBJECTIVES • Website Performance • Conversion Rate ($ sales / # visits) • Trends over time • In Response to Campaigns • Website Optimization • Customer Behavior • Technological Trends • Integration • Customer Lifetime Value / Segmentation • Personalization • Proactive display of pertinent information
    7. DATA SOURCES • Click-stream Data • Search Engine Optimization (SEO) • Campaign Classification • Email Campaigns • Advertising Impressions • 3rd Party Marketing Data • IP Geolocation • Competitive Analysis • Customer Information • Multi-channel Analysis • Outcome Data
    8. CLICKSTREAM COLLECTION • Web Log Files • Rudimentary data collected on the company’s web server • Page name, IP address, browser, date/time • Does not screen out search engine robots • JavaScript Tagging (Google Analytics, Omniture, WebTrends) • As the page loads, data is sent to a 3rd party for collection • Assigns a cookie to the user • Can implement custom tags on specific pages • Counts pages served from cache (log files do not) • Packet Sniffers (Cloudmeter Pion, Tealeaf CX Connect) • Software or hardware layer installed on web servers • Parsing raw data and protecting PII can be complex
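The web-log bullets can be made concrete with a minimal Python sketch, assuming the server writes the Apache "combined" log format (the talk does not say which format is in use). It pulls page name, IP address, browser and timestamp out of each line, then does the robot screening that raw log files lack. The BOT_MARKERS list and the sample lines are made up for illustration; real bot filtering relies on a maintained signature list.

```python
import re
from datetime import datetime

# Assumption: Apache/NCSA "combined" format:
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical, deliberately short list of robot signatures.
BOT_MARKERS = ("bot", "crawler", "spider", "slurp")


def parse_line(line):
    """Return page name, IP address, browser and date/time, or None if unparseable."""
    m = COMBINED.match(line)
    if m is None:
        return None
    return {
        "ip": m["ip"],
        "page": m["path"],
        "agent": m["agent"],
        "ts": datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z"),
    }


def is_robot(hit):
    """Flag a hit whose user-agent contains any known robot marker."""
    agent = hit["agent"].lower()
    return any(marker in agent for marker in BOT_MARKERS)


def human_hits(lines):
    """Parse raw log lines and drop robot traffic, which the log itself keeps."""
    hits = (parse_line(line) for line in lines)
    return [h for h in hits if h is not None and not is_robot(h)]


# Made-up sample lines: one browser request, one crawler request.
sample = [
    '203.0.113.7 - - [15/Jan/2024:10:15:32 -0500] "GET /hotels/dc HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (Windows NT 6.1)"',
    '198.51.100.23 - - [15/Jan/2024:10:15:40 -0500] "GET /hotels/dc HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
]
print([h["page"] for h in human_hits(sample)])  # ['/hotels/dc']
```

Matching on user-agent substrings is only a first pass: robots that spoof a browser user-agent pass straight through, which is one reason tagging and sniffing tools exist.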
    9. CLICKSTREAM ANALYSIS • Number of Visitors • Total vs. Unique • New vs. Repeat • Source of Visit (Session) • External Link (Campaign Analysis / Attribution) • Direct • Searches Performed On Site • Keywords • Sort Order of Results • Page Analysis • Specific Actions Performed • Order (Booking) • Signup for Membership, Credit Card, Event • Abandonment (Bounce Rate)
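Visitor counts and bounce rate both depend on first grouping clicks into visits. The sketch below assumes hits arrive as hypothetical (visitor_id, timestamp, page) tuples and that a visit ends after 30 minutes of inactivity, a common convention in tagging tools rather than a rule stated in the talk. A bounce is taken to be a single-page visit, and a "repeat" visitor is one already present in a hypothetical known_visitors set carried over from earlier periods.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumption: a visit (session) ends after 30 minutes without a click.
TIMEOUT = timedelta(minutes=30)


def sessionize(hits):
    """Group (visitor_id, timestamp, page) hits into visits per visitor."""
    by_visitor = defaultdict(list)
    for visitor, ts, page in sorted(hits, key=lambda h: (h[0], h[1])):
        by_visitor[visitor].append((ts, page))
    visits = []
    for visitor, clicks in by_visitor.items():
        current = [clicks[0]]
        for prev, click in zip(clicks, clicks[1:]):
            if click[0] - prev[0] > TIMEOUT:  # gap too long: close the visit
                visits.append((visitor, current))
                current = []
            current.append(click)
        visits.append((visitor, current))
    return visits


def visit_metrics(visits, known_visitors=frozenset()):
    """Total visits vs. unique visitors, new vs. repeat, and bounce rate."""
    visitors = {v for v, _ in visits}
    bounces = sum(1 for _, clicks in visits if len(clicks) == 1)
    return {
        "total_visits": len(visits),
        "unique_visitors": len(visitors),
        "new_visitors": len(visitors - known_visitors),
        "repeat_visitors": len(visitors & known_visitors),
        "bounce_rate": bounces / len(visits) if visits else 0.0,
    }


start = datetime(2024, 1, 15, 9, 0)
hits = [
    ("a", start, "/home"),
    ("a", start + timedelta(minutes=5), "/search"),
    ("a", start + timedelta(hours=2), "/home"),  # > 30 min gap: second visit
    ("b", start, "/deals"),
]
metrics = visit_metrics(sessionize(hits), known_visitors={"a"})
print(metrics)
```

Here visitor "a" produces two visits, so total visits (3) exceed unique visitors (2), and two of the three visits are single-page bounces.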
    10. BRINGING CLICKSTREAM IN-HOUSE • Control/Consolidate Business Rules • Integration with Corporate Systems of Record • Single Version of the Truth • Integration with Other Web Data Sources • Enable more “intelligent” metrics • Not all visits are a conversion opportunity • Shift from “visit analysis” to “customer analysis” • Enable advanced statistical and predictive modeling • Multi-touch Attribution • Pay Per Click (PPC) Keyword Bid Optimization
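Multi-touch attribution spreads credit for a conversion across every campaign touch that preceded it instead of giving all of it to the last click. The talk does not say which attribution rule is used; the sketch below shows the simplest one, linear attribution, where each touchpoint gets an equal share. Channel names and booking values are invented for illustration.

```python
from collections import defaultdict


def linear_attribution(paths):
    """Split each conversion's value equally across the channels that touched it.

    paths: iterable of (touchpoints, value), where touchpoints is the ordered
    list of marketing channels seen before the conversion.
    """
    credit = defaultdict(float)
    for touchpoints, value in paths:
        if not touchpoints:  # no recorded touch: nothing to attribute
            continue
        share = value / len(touchpoints)
        for channel in touchpoints:
            credit[channel] += share
    return dict(credit)


# Hypothetical conversion paths with booking values.
paths = [
    (["email", "ppc", "direct"], 300.0),
    (["ppc", "direct"], 200.0),
]
print(linear_attribution(paths))  # {'email': 100.0, 'ppc': 200.0, 'direct': 200.0}
```

Under last-touch attribution, "direct" would receive all 500.0; the linear rule moves 300.0 of that credit to "email" and "ppc", which is the kind of shift that matters for PPC bid decisions.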
    11. CLICKSTREAM CHALLENGES • “Clickstream data … is delightfully complex, ever changing, and full of mysterious occurrences.” (Avinash Kaushik, Web Analytics: An Hour a Day) • Volume • Con: it’s big • Pro: it’s incremental • Fairly Unstructured • Exceptions to every rule • Mobile App vs. Mobile Web vs. Desktop • Rapidly Changing • Most queries require trending YTD + 2 years’ history • Few “natural” metrics; most require count (distinct) • How do I model this data?
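The count (distinct) problem comes from distinct counts not being additive: unique visitors per day cannot be summed into unique visitors per week, because the same visitor may appear on several days. A small Python illustration with made-up visitor IDs:

```python
# Made-up visitor IDs seen on each day.
daily_visitors = {
    "Mon": {"v1", "v2", "v3"},
    "Tue": {"v2", "v3", "v4"},
    "Wed": {"v1", "v4"},
}

# Summing the daily distinct counts double-counts returning visitors...
sum_of_daily_uniques = sum(len(ids) for ids in daily_visitors.values())
# ...the correct figure needs the union of the underlying IDs.
weekly_uniques = len(set().union(*daily_visitors.values()))

print(sum_of_daily_uniques, weekly_uniques)  # 8 4
```

The weekly figure has to be recomputed from the visitor IDs themselves, which is why trended queries over YTD plus two years of history keep reaching back into granular data.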
    12. DATA WAREHOUSE APPROACHES • Bill Inmon: • DW is Central Repository of all Enterprise Data • “Top Down” • Relational Model (3NF) • Feeds Functional Data Marts • Huge Undertaking • Ralph Kimball: • DW is the “Virtual” Integration of Various Functional Data Marts • “Bottom Up” • Dimensional Model • Quicker to Develop • Siloed and Redundant
    13. RELATIONAL MODEL (diagram; source: sqlservercentral.com)
    14. DIMENSIONAL MODELS (diagrams: Star Schema, Snowflake Schema; source: Wikipedia)
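To make the star schema concrete, here is a minimal sketch using Python's built-in sqlite3 module: one click-grain fact table with foreign keys to a date dimension and a page dimension. All table names, column names and rows are hypothetical. In a snowflake schema, site_section would move out of dim_page into its own normalized table.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Dimension tables: descriptive attributes, one row per member.
    CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, day TEXT, month TEXT);
    CREATE TABLE dim_page (page_key INTEGER PRIMARY KEY, page_name TEXT, site_section TEXT);
    -- Fact table: one row per click, keys to each dimension plus measures.
    CREATE TABLE fact_click (
        date_key   INTEGER REFERENCES dim_date(date_key),
        page_key   INTEGER REFERENCES dim_page(page_key),
        visitor_id TEXT,
        page_views INTEGER
    );
    INSERT INTO dim_date VALUES (1, '2024-01-15', '2024-01'), (2, '2024-01-16', '2024-01');
    INSERT INTO dim_page VALUES (10, 'home', 'main'), (20, 'dc-hotels', 'search');
    INSERT INTO fact_click VALUES
        (1, 10, 'v1', 1), (1, 20, 'v1', 1), (2, 20, 'v2', 1), (2, 20, 'v1', 1);
""")

# Typical star join: filter and group on dimension attributes, aggregate the fact.
rows = con.execute("""
    SELECT p.site_section, SUM(f.page_views), COUNT(DISTINCT f.visitor_id)
    FROM fact_click f
    JOIN dim_date d ON d.date_key = f.date_key
    JOIN dim_page p ON p.page_key = f.page_key
    WHERE d.month = '2024-01'
    GROUP BY p.site_section
    ORDER BY p.site_section
""").fetchall()
print(rows)  # [('main', 1, 1), ('search', 3, 2)]
```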
    15. NORMALIZATION • Removes redundancy and dependency from data structures • 1NF: Remove Repeating Groups • 2NF: Remove Partial Key Dependencies • 3NF: Remove Dependencies Among Attributes • Tutorial: http://phlonx.com/resources/nf3/ • Data Warehouses require some De-Normalization to improve query performance
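A small Python sketch of normalization applied to clickstream rows, assuming (hypothetically) that country belongs to the visitor and browser belongs to the visit. In the flat rows, country depends on visitor_id rather than on the click itself, a dependency among attributes that 3NF removes by splitting visitors, visits and hits into separate tables. Joining them back is the de-normalization a warehouse does for query performance.

```python
# Made-up flat rows: visitor and visit attributes repeat on every click.
flat = [
    {"visit_id": "s1", "visitor_id": "v1", "country": "US", "browser": "Firefox", "page": "home"},
    {"visit_id": "s1", "visitor_id": "v1", "country": "US", "browser": "Firefox", "page": "search"},
    {"visit_id": "s2", "visitor_id": "v2", "country": "CA", "browser": "Chrome", "page": "home"},
]


def normalize(rows):
    """Split flat click rows into visitor, visit and hit tables (3NF-style)."""
    visitors, visits, hits = {}, {}, []
    for seq, r in enumerate(rows):
        visitors[r["visitor_id"]] = {"country": r["country"]}  # depends on visitor only
        visits[r["visit_id"]] = {"visitor_id": r["visitor_id"], "browser": r["browser"]}
        hits.append({"visit_id": r["visit_id"], "seq": seq, "page": r["page"]})
    return visitors, visits, hits


def denormalize(visitors, visits, hits):
    """Re-join the tables into wide rows, trading redundancy for query speed."""
    out = []
    for h in hits:
        visit = visits[h["visit_id"]]
        out.append({
            "visit_id": h["visit_id"],
            "visitor_id": visit["visitor_id"],
            "country": visitors[visit["visitor_id"]]["country"],
            "browser": visit["browser"],
            "page": h["page"],
        })
    return out


visitors, visits, hits = normalize(flat)
print(len(visitors), len(visits), len(hits))  # 2 2 3
```

Each country and browser value is now stored once, and denormalize(visitors, visits, hits) reproduces the original flat rows.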
    16. ECOMMERCE DATA WAREHOUSE (four-tier diagram) • Native Source Model • Fact Model • BI Model • Aggregate Model
    17. NATIVE SOURCE MODEL Plus: • In-database copy of the source data • Stores data elements we are not yet ready to model further • Maintains details for research purposes • Prevents repeating historical conversion Minus: • Huge • Unstructured • Not normalized (at all) • Not useful for analysis or reporting
    18. NATIVE SOURCE MODEL (diagram)
    19. FACT MODEL Plus: • “Snow-relational” • Nearly Normalized (optimized for load) • Multiple Fact & Extension Tables (manage I/O) • Granular (click row) • Contains keys to integrate with enterprise data Minus: • Complex load including propagation and look-back • Use requires nonfiltered joins of massive tables • Difficult to use for analysis, cannot be used for reporting
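The talk does not define "propagation". One plausible reading, sketched below under that assumption, is carrying a visit-level value that is captured on only some clicks (here a hypothetical campaign code on the landing click) onto every later click row of the same visit, so that each granular row can be analyzed on its own. The look-back part of the load is not shown.

```python
def propagate(clicks, field):
    """Carry the last non-empty value of `field` forward within each visit.

    clicks: list of dicts with 'visit_id', 'seq' and `field`; they are sorted
    here by (visit_id, seq), and the carried value resets at each new visit.
    """
    out, last_visit, carried = [], None, None
    for c in sorted(clicks, key=lambda c: (c["visit_id"], c["seq"])):
        if c["visit_id"] != last_visit:  # new visit: forget the old value
            last_visit, carried = c["visit_id"], None
        if c.get(field):  # a non-empty value on this click replaces the carried one
            carried = c[field]
        out.append({**c, field: carried})
    return out


# Hypothetical click rows, deliberately out of order.
clicks = [
    {"visit_id": "s1", "seq": 2, "campaign": None},
    {"visit_id": "s1", "seq": 1, "campaign": "EM123"},
    {"visit_id": "s1", "seq": 3, "campaign": ""},
    {"visit_id": "s2", "seq": 1, "campaign": None},
]
result = propagate(clicks, "campaign")
print([c["campaign"] for c in result])  # ['EM123', 'EM123', 'EM123', None]
```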
    20. FACT MODEL (diagram)
    21. BI MODEL Plus: • “Star-flake” Model • De-normalized (optimized for query) • Pre-joined • Granular (click row) • Integrated with enterprise data at load time • Useful for detailed analysis Minus: • Complex load process • It’s still big! • Corrections to Fact Model data issues require re-build or complex conversion processes • Difficult to use for reporting
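Pre-joining is what separates the BI Model from the Fact Model. A minimal sqlite3 sketch with hypothetical tables, where dim_member stands in for enterprise loyalty data: dimension attributes are copied onto each click row at load time so analysts can query one table without joins. The LEFT JOIN keeps anonymous clicks that carry no member key.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE dim_page (page_key INTEGER PRIMARY KEY, page_name TEXT, site_section TEXT);
    CREATE TABLE dim_member (member_key INTEGER PRIMARY KEY, tier TEXT);
    CREATE TABLE fact_click (click_id INTEGER, page_key INTEGER, member_key INTEGER);
    INSERT INTO dim_page VALUES (10, 'home', 'main'), (20, 'dc-hotels', 'search');
    INSERT INTO dim_member VALUES (1, 'Gold'), (2, 'Basic');
    INSERT INTO fact_click VALUES (100, 10, 1), (101, 20, 1), (102, 20, NULL);

    -- Pre-join at load time: one wide row per click, attributes copied in.
    CREATE TABLE bi_click AS
    SELECT f.click_id, p.page_name, p.site_section, m.tier AS member_tier
    FROM fact_click f
    JOIN dim_page p ON p.page_key = f.page_key
    LEFT JOIN dim_member m ON m.member_key = f.member_key;
""")

# Analysts now query a single table with no joins.
by_tier = con.execute(
    "SELECT member_tier, COUNT(*) FROM bi_click GROUP BY member_tier ORDER BY member_tier"
).fetchall()
print(by_tier)  # [(None, 1), ('Gold', 2)]
```

Because the attributes are copied rather than referenced, a correction upstream means rebuilding bi_click, which is the rebuild cost listed under Minus.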
    22. BI MODEL (diagram)
    23. AGGREGATE MODEL Plus: • Star Schema (simple) • De-normalized (optimized for query) • Aggregated • Fast query performance • Great for predetermined reports Minus: • Corrections to Fact Model data issues and embedded dimensions require re-build • Count distincts only available for predetermined dimensions • Limited use for analysis
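The point about count distincts can be shown with a minimal sqlite3 sketch of an aggregate built at a predetermined day-by-site-section grain (table names and rows are hypothetical). Additive measures such as page views roll up correctly, but unique visitors are valid only at the grain where they were computed.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE bi_click (day TEXT, site_section TEXT, visitor_id TEXT);
    INSERT INTO bi_click VALUES
        ('d1', 'main', 'v1'), ('d1', 'search', 'v1'), ('d1', 'search', 'v2'),
        ('d1', 'main', 'v2'), ('d1', 'search', 'v1');

    -- Aggregate at a predetermined grain: day by site_section.
    CREATE TABLE agg_day_section AS
    SELECT day, site_section,
           COUNT(*) AS page_views,
           COUNT(DISTINCT visitor_id) AS unique_visitors
    FROM bi_click
    GROUP BY day, site_section;
""")

# Page views are additive, so rolling the aggregate up to day level is safe...
views = con.execute(
    "SELECT SUM(page_views) FROM agg_day_section WHERE day = 'd1'").fetchone()[0]
# ...but summing distinct counts across sections double-counts visitors.
summed = con.execute(
    "SELECT SUM(unique_visitors) FROM agg_day_section WHERE day = 'd1'").fetchone()[0]
# The true daily figure needs the click-level table.
true_uniques = con.execute(
    "SELECT COUNT(DISTINCT visitor_id) FROM bi_click WHERE day = 'd1'").fetchone()[0]
print(views, summed, true_uniques)  # 5 4 2
```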
    24. AGGREGATE MODEL (diagram)
    25. QUESTIONS? • Thank You!
