Big Data Analytics on the Google Cloud Platform


Online retail is a fiercely competitive market where every retailer is trying to gain a competitive edge by understanding their customers better and analyzing their buying patterns, likes & dislikes. Such knowledge would greatly help them to target & serve their customers better, thereby increasing their sales revenues.

Big Data analytics is the answer to online retailers’ need to glean such business insights from their customer data.

In this webinar, we showcased & discussed:
- End-to-end data flow from session log files to analytical dashboards & reports.
- Developing a solution that aggregates and analyzes online transactions.
- Aggregation and analysis of data residing in the session log files.

All data-related techniques were demonstrated on the Google Cloud Platform.
All data visualizations were performed using Tableau.

Published in: Technology, Business

  • The online retail market has seen phenomenal growth in recent years, and that growth is not going to abate in the next couple of decades. More Americans are planning to shop online than go down to their neighborhood mall!
  • Big Data Analytics on the Google Cloud Platform

    1. 1. Big Data Analytics on the Google Cloud Platform. January 9th, 2014
    2. 2. GROW WITH BIG DATA. Third Eye Consulting Services & Solutions LLC.
    3. 3. For questions, tweet directly to @ThirdEyeCss. We are actively monitoring this Twitter channel!
    4. 4. Agenda
        1. 5 minutes - Introductions
        2. 15 minutes - Introduction to the Google Cloud Platform & its various Big Data services
        3. 10 minutes - Showcasing various Online Retail Analytics - User, Site & Product Analytics
        4. 15 minutes - Live Demonstration - Ingestion of session log data to visualization in Tableau
        5. 15 minutes - Q&A Session (can extend beyond this based on audience enthusiasm & participation!)
    5. 5. Google Cloud Platform
    6. 6. Google Cloud Platform – Key Components
        • App Engine
        • BigQuery
        • Cloud SQL
        • Cloud Storage
        • Compute Engine
    7. 7. App Engine - Architecture
        A highly elastic, scale-on-demand infrastructure for deploying and running front-end web applications.
        [Architecture diagram: App Master; Front End Instances 1 to n; App Server Instances 1 to n; Datastore; Memcache; Static Files]
    8. 8. App Engine - Advantages
        • Scales on demand
        • Very low barrier to entry
        • No initial hardware costs
        • Scalability and reliability are non-issues
        • Can handle very large amounts of data
        • Can handle very large user volumes, including sudden spikes, by scaling elastically
    9. 9. BigQuery
        • A column-oriented data store that can store and process billions of rows of data
        • SQL-like query syntax for querying data
        • Run ad-hoc queries against multi-terabyte data sets in seconds
        • Highly scalable, reliable and secure, as it uses the underlying core Google platform infrastructure
    10. 10. BigQuery
        • Supports the main ETL and BI tools, such as Informatica, Talend, QlikView and Tableau
        • Primarily used for real-time data analysis and visualization
        • Integration with App Engine through APIs
    11. 11. BigQuery SQL Access
        • Only SELECT operations
        • No CREATE, UPDATE or DROP
        • Analysis of unstructured data using REGEXP_yyyy functions
        • JOINs of small (<8 MB of compressed data) and large tables are possible. Performance penalty for large table joins
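The REGEXP_yyyy bullet on this slide can be illustrated with a small sketch. BigQuery documents a REGEXP_EXTRACT function; the Python helper below only mimics its behavior locally (first capture group, or nothing on no match) so the pattern can be tried without a BigQuery project. The table name `retail.session_logs`, the column `page`, and the `/product/<id>` URL layout are assumptions made for illustration, not details from the webinar.

```python
import re

# Hypothetical URL layout: product pages look like /product/<product-id>.
PRODUCT_PATTERN = r"/product/([^/?]+)"

# Illustrative query text only; table and column names are assumptions.
TOP_PRODUCTS_SQL = """
SELECT REGEXP_EXTRACT(page, r'/product/([^/?]+)') AS product,
       COUNT(*) AS views
FROM `retail.session_logs`
GROUP BY product
ORDER BY views DESC
LIMIT 10
"""


def regexp_extract(value, pattern):
    """Local stand-in for REGEXP_EXTRACT: first capture group, or None."""
    match = re.search(pattern, value)
    return match.group(1) if match else None


if __name__ == "__main__":
    for page in ["/product/fish-oil?ref=home", "/cart"]:
        print(page, "->", regexp_extract(page, PRODUCT_PATTERN))
```

Testing the pattern locally first is a cheap way to catch regex mistakes before running a query over a large table.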
    12. 12. BigQuery Programmatic Access
        • bq command line tool, Google API client library, REST API
        • The Google API client library supports various languages, such as Java, Python, JavaScript, Ruby, PHP and Google Apps Script
        • Authentication is handled via OAuth 2.0
        • With the REST API, credentials and HTTP requests have to be handled manually by the user
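To make the last bullet concrete, here is a minimal sketch of what "handling the HTTP request manually" might look like for the BigQuery v2 `jobs.query` REST method, using only the Python standard library. The request is built but never sent; the project ID and access token are placeholders, and obtaining a real OAuth 2.0 token is outside the scope of the sketch. The base URL and body fields should be checked against the current API reference.

```python
import json
import urllib.request

# BigQuery v2 REST base URL; verify against the current API reference.
API_BASE = "https://www.googleapis.com/bigquery/v2"


def build_query_request(project_id, sql, access_token, timeout_ms=10000):
    """Build (but do not send) a POST request for the jobs.query method."""
    url = f"{API_BASE}/projects/{project_id}/queries"
    body = json.dumps({"query": sql, "timeoutMs": timeout_ms}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            # The caller supplies the OAuth 2.0 bearer token.
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # "my-project" and the token are placeholders, not real credentials.
    req = build_query_request("my-project", "SELECT 1", "placeholder-token")
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req) would send it; that needs a valid token.
```

The client libraries mentioned on the slide wrap exactly this kind of plumbing, which is why they are usually the more convenient choice.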
    13. 13. BigQuery Use Cases
        • Can be used for batch analysis of large data sets
        • Real-time analytics for dashboard-type applications
        • Pre-process very large data sets and serve data in real time
        • Visualization using third-party tools that call BigQuery APIs
    14. 14. Cloud SQL
        • MySQL database running on the Google Cloud Platform
        • Easy migration from local MySQL instances to Cloud SQL
        • Highly scalable and reliable, with replication
        • Supports all major MySQL features, including stored procedures, triggers and views
        • GUI front end for easy administration and operations
        • Built on top of core Google infrastructure
        • Easy integration with App Engine
    15. 15. Cloud Storage
        [Diagram labels: Custom App, Cloud SQL, BigQuery, Cloud Storage]
        • A highly reliable cloud storage platform for storing and accessing vast amounts of data
        • Can be used for data archival and content delivery
        • Data can be ingested and processed by other Google Cloud services
        • Accessible through GUI, command line and APIs
    16. 16. Cloud Storage
        • Object store that can deliver content very efficiently over the internet
        • Not a mountable file system
        • Buckets are the basic container. They cannot be nested and can reside in the US or EU geographies.
        • Objects are stored in buckets. They are immutable and can be up to 5 TB in size.
        • ACLs can be set up for Google users, groups, app domains and authenticated users, with READ, WRITE or FULL_CONTROL. Signed URL access for anonymous users.
        • Can be accessed using XML and JSON REST APIs
        • Command line access using the gsutil tool
        • App Engine Storage API for access from App Engine
    17. 17. Compute Engine
        • Infrastructure as a service
        • Linux virtual machines, with associated storage and network infrastructure, are hosted by Google
        • Can run any type of application or workload in the Google cloud, on the same core Google infrastructure
        • Highly elastic and scalable
        • A typical use case would be to provision a Hadoop cluster on demand, using tens to hundreds of virtual machines as name node and data nodes
    18. 18. Compute Engine
        • Various machine type configurations are possible, such as High Memory, High CPU, Standard etc.
        • Very easy provisioning and management using cloud management software like RightScale
        • CentOS and Debian are the default OSes currently supported
        • Typical use cases are batch processing, log analysis, I/O-intensive workloads and Hadoop on the cloud (MapReduce)
    19. 19. Online Retail Analytics & Visualization
    20. 20. Online Retail Industry Forrester: U.S. Online Retail Sales to Hit $370 Billion by
    21. 21. Healthcare Store  Large online retailer’s Health Store website.  Thousands of health care products are sold per month.
    22. 22. VP OF MARKETING: These large online retailers are killing us! I need to increase sales. I need to understand my site visitors better. Can Big Data Analytics help?
    23. 23. DATA SCIENTIST: Yes, Big Data Analytics can help! Google’s Cloud Platform handles all the complexities of Big Data processing. We start with regular session log files.
    24. 24. Session Log File (W3C compliant)
        • Time & date when the visitor came on site
        • Unique user & session ID
        • Product page visited by the user
        • Referral site
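As a rough illustration of the log layout described on this slide, the sketch below parses a few lines in the W3C Extended Log File Format, where a `#Fields:` directive names the columns. The sample lines and the custom `x-user-id` and `x-session-id` field names are invented for the example; the webinar's actual log schema is not shown in the transcript.

```python
# Invented sample data; the x- prefixed field names are assumptions.
SAMPLE_LOG = """\
#Version: 1.0
#Fields: date time x-user-id x-session-id cs-uri-stem cs(Referer)
2014-01-09 10:15:02 u1001 s9001 /product/vitamin-c http://www.google.com/
2014-01-09 10:17:40 u1002 s9002 /product/fish-oil -
"""


def parse_w3c_log(text):
    """Return one dict per data line, keyed by the #Fields directive."""
    fields = []
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Directives: only #Fields matters for naming the columns.
            if line.startswith("#Fields:"):
                fields = line[len("#Fields:"):].split()
            continue
        values = line.split()
        if len(values) != len(fields):
            continue  # skip malformed lines rather than guess
        records.append(dict(zip(fields, values)))
    return records


if __name__ == "__main__":
    for record in parse_w3c_log(SAMPLE_LOG):
        print(record["date"], record["time"], record["cs-uri-stem"])
```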
    25. 25. DATA SCIENTIST: From the simple log files, we can do sophisticated analytics like these:
        User Analytics
        • # of Unique Site Visitors, per hour, per day
        • # of Return Site Visitors, per hour, per day
        • Total # of Site Visitors, per hour, per day
        • Top 10 Active Users per hour, per day
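The user metrics listed above could be computed along these lines. This is a small in-memory sketch, not the webinar's BigQuery implementation: the sample events are invented, and treating a "return visitor" as a user with more than one session is an assumption about the definition.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Invented sample events: (timestamp, user id, session id).
EVENTS = [
    ("2014-01-09 10:15:02", "u1", "s1"),
    ("2014-01-09 10:20:11", "u1", "s1"),
    ("2014-01-09 10:45:30", "u2", "s2"),
    ("2014-01-09 11:05:00", "u1", "s3"),
    ("2014-01-09 11:30:45", "u3", "s4"),
]


def hour_bucket(timestamp):
    """Truncate a timestamp string to its hour, e.g. '2014-01-09 10:00'."""
    parsed = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
    return parsed.strftime("%Y-%m-%d %H:00")


def unique_visitors_per_hour(events):
    """Count distinct users seen in each hour."""
    users = defaultdict(set)
    for timestamp, user, _session in events:
        users[hour_bucket(timestamp)].add(user)
    return {hour: len(seen) for hour, seen in users.items()}


def return_visitors(events):
    """Users with more than one session (assumed definition of 'return')."""
    sessions = defaultdict(set)
    for _timestamp, user, session in events:
        sessions[user].add(session)
    return {user for user, seen in sessions.items() if len(seen) > 1}


def top_active_users(events, n=10):
    """Users ranked by number of logged page visits."""
    return Counter(user for _ts, user, _s in events).most_common(n)


if __name__ == "__main__":
    print(unique_visitors_per_hour(EVENTS))
    print(return_visitors(EVENTS))
    print(top_active_users(EVENTS, n=3))
```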
    26. 26. DATA SCIENTIST: Product Analytics like these:
        • Top 10 Popular Products per hour, per day
        • Top 10 Popular Products in Shopping Basket per hour, per day
        • Top 10 Bought Products per hour, per day
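A top-N product ranking of the kind listed above might be sketched as follows. The event tuples and the action labels `view`, `add_to_cart` and `purchase` are hypothetical; how the real session logs distinguish these actions is not described in the transcript.

```python
from collections import Counter

# Invented sample events: (day, product, action).
EVENTS = [
    ("2014-01-09", "vitamin-c", "view"),
    ("2014-01-09", "vitamin-c", "view"),
    ("2014-01-09", "fish-oil", "view"),
    ("2014-01-09", "vitamin-c", "add_to_cart"),
    ("2014-01-09", "fish-oil", "add_to_cart"),
    ("2014-01-09", "fish-oil", "purchase"),
    ("2014-01-10", "zinc", "view"),
]


def top_products(events, action, day, n=10):
    """Rank products by how often `action` occurred on `day`."""
    counts = Counter(
        product
        for event_day, product, event_action in events
        if event_day == day and event_action == action
    )
    return counts.most_common(n)


if __name__ == "__main__":
    for action in ("view", "add_to_cart", "purchase"):
        print(action, top_products(EVENTS, action, "2014-01-09"))
```

Swapping the `day` key for an hour bucket would give the per-hour variant of the same ranking.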
    27. 27. DATA SCIENTIST: Conversion Analytics like these:
        • # of users who added products to shopping basket per hour, per day
        • # of users who actually bought products per hour, per day
        • % of users who browsed, added products to shopping cart & actually bought per hour, per day
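One way to read the conversion metrics above is as a three-stage funnel. The sketch below assumes the conversion percentage means "users who browsed, added to cart and bought, divided by users who browsed"; that interpretation, the action labels and the sample data are all assumptions.

```python
# Invented sample events: (user id, action).
EVENTS = [
    ("u1", "view"), ("u2", "view"), ("u3", "view"), ("u4", "view"),
    ("u1", "add_to_cart"), ("u2", "add_to_cart"),
    ("u1", "purchase"),
]


def conversion_funnel(events):
    """Count users at each funnel stage and the overall conversion rate."""
    users = {"view": set(), "add_to_cart": set(), "purchase": set()}
    for user, action in events:
        if action in users:
            users[action].add(user)
    browsed = users["view"]
    # Each later stage only counts users who also passed the earlier one.
    added = browsed & users["add_to_cart"]
    bought = added & users["purchase"]
    rate = 100.0 * len(bought) / len(browsed) if browsed else 0.0
    return {
        "browsed": len(browsed),
        "added": len(added),
        "bought": len(bought),
        "conversion_pct": rate,
    }


if __name__ == "__main__":
    print(conversion_funnel(EVENTS))
```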
    28. 28. DATA SCIENTIST: Behold, the Google Cloud Platform’s Dashboard! List of available services.
    29. 29. Google Cloud Platform’s Cloud Storage. DATA SCIENTIST: Session log files uploaded to Cloud Storage.
    30. 30. Google Cloud Platform’s BigQuery. DATA SCIENTIST: Tables on BigQuery with data from the session log files.
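The upload and load steps on these slides can also be done from the command line with the gsutil and bq tools. The sketch below only assembles the commands as argument lists and prints them; nothing is executed. It assumes the session logs were first converted to CSV, and the bucket, dataset, table and schema names are hypothetical. Flags should be checked against the installed tool versions.

```python
def gsutil_upload_cmd(local_path, bucket, object_name):
    """Command that copies a local file into a Cloud Storage bucket."""
    return ["gsutil", "cp", local_path, f"gs://{bucket}/{object_name}"]


def bq_load_cmd(dataset, table, gcs_uri, schema):
    """Command that loads a CSV object from Cloud Storage into BigQuery."""
    return [
        "bq", "load", "--source_format=CSV",
        f"{dataset}.{table}", gcs_uri, schema,
    ]


if __name__ == "__main__":
    # All names below are placeholders for illustration.
    upload = gsutil_upload_cmd(
        "sessions.csv", "retail-logs", "2014-01-09/sessions.csv"
    )
    load = bq_load_cmd(
        "retail",
        "session_logs",
        upload[-1],  # the gs:// URI produced by the upload step
        "ts:timestamp,user_id:string,session_id:string,"
        "page:string,referrer:string",
    )
    print(" ".join(upload))
    print(" ".join(load))
    # subprocess.run(upload, check=True) would execute a command for real.
```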
    31. 31. Running a Query on BigQuery. DATA SCIENTIST: Queries on BigQuery are very SQL-like, easy to develop, and return results fast.
    32. 32. Visualize BigQuery’s Results in Tableau. DATA SCIENTIST: Tableau provides an easy & effective way to develop dashboards & reports.
    33. 33. Site Analytics – Referral Site Comparisons. DATA SCIENTIST: Traffic referred to the site from other sources like Google.com
    34. 34. Site Analytics – Referral Site Comparisons. DATA SCIENTIST: Traffic referred to the site from other sources like Google.com
    35. 35. Site Analytics – Referral Site Comparisons. DATA SCIENTIST: Traffic referred to the site from other sources like Google.com
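A referral comparison like the one charted on these slides boils down to counting visits per referring host. Here is a minimal sketch, assuming each log record carries a referrer URL and that `-` or an empty value marks direct traffic (a common logging convention, but an assumption here). The sample referrers are invented.

```python
from collections import Counter
from urllib.parse import urlparse

# Invented sample referrer values, one per logged visit.
REFERRERS = [
    "http://www.google.com/search?q=vitamins",
    "http://www.google.com/",
    "http://www.bing.com/search?q=fish+oil",
    "-",
    "",
]


def referral_counts(referrers):
    """Count visits per referring host; '-' or empty counts as direct."""
    counts = Counter()
    for referrer in referrers:
        host = ""
        if referrer and referrer != "-":
            host = urlparse(referrer).netloc.lower()
        counts[host or "(direct)"] += 1
    return counts


if __name__ == "__main__":
    for host, visits in referral_counts(REFERRERS).most_common():
        print(host, visits)
```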
    36. 36. Product Analytics - Product Purchase Trends. DATA SCIENTIST: Analysis of specific products as purchased on site over hours / days in a month
    37. 37. Conversion Analytics - Products Added to Cart vs. Bought. DATA SCIENTIST: Analysis of which products were placed in cart vs. actually bought over hours / days in a month
    38. 38. Conversion Analytics - Conversion Rate Trends. DATA SCIENTIST: Analysis of which products were placed in cart vs. actually bought over hours / days in a month
    39. 39. DATA SCIENTIST: You now know how your products are selling, when they are selling, which referring site helps the most, and other such information. You now have the power of Big Data Analytics at your fingertips!
    40. 40. VP OF MARKETING: Wow! Now I can compete against all the giants! Let me start on my marketing plans!
    41. 41. Q&A @ThirdEyeCss
    42. 42. Third Eye is Google’s Partner for the Google Cloud Platform. We are mentioned on Google’s Cloud Platform, site:
    43. 43. Contact: Dj Das, Founder & CEO, Alan Merrihew, VP of Business Development, Phone - (408) 462-5257 Corporate Site - Big Data Training - Big Data Educational Seminars -,, Big Data Jobs - Big Data Analytics As a Service -,,,
    44. 44. THANK YOU!