How we are using BigQuery and Apps Scripts at teowaki

929 views

Published on

I was invited to speak at the Google Startup Launch Summit in London about how we are using the google cloud to power our startup

Published in: Software, Technology, Business
  • Be the first to comment

  • Be the first to like this

How we are using BigQuery and Apps Scripts at teowaki

  1. 1. javier ramirez @supercoco9 How we are using BigQuery and Apps Scripts at teowaki
  2. 2. Set a distance. Set an expiration time. Bye bye noise.
  3. 3. Analytics flow
  4. 4. Analytics flow, by segment
  5. 5. Automatic Alerts
  6. 6. javier ramirez @supercoco9 https://teowaki.com startup launch summit london 14 REST API (Ruby on Rails) + Web on top (AngularJS)
  7. 7. javier ramirez @supercoco9 https://teowaki.com startup launch summit london 14
  8. 8. data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn’t fit the structures of your database architectures. Ed Dumbill program chair for the O’Reilly Strata Conference javier ramirez @supercoco9 https://teowaki.com startup launch summit london 14
  9. 9. 1. non intrusive metrics 2. keep the history 3. avoid vendor lock-in 4. interactive queries 5. cheap 6. extra ball: real time javier ramirez @supercoco9 https://teowaki.com startup launch summit london 14
  10. 10. javier ramirez @supercoco9 https://teowaki.com startup launch summit london 14
  11. 11. Cloud Storage: Cost-efficient storage of files javier ramirez @supercoco9 https://teowaki.com startup launch summit london 14
  12. 12. javier ramirez @supercoco9 https://teowaki.com startup launch summit london 14
  13. 13. Hadoop Cassandra Amazon Redshift ... javier ramirez @supercoco9 https://teowaki.com startup launch summit london 14 tools we considered:
  14. 14. Our choice: Google BigQuery Data analysis as a service http://developers.google.com/bigquery javier ramirez @supercoco9 https://teowaki.com startup launch summit london 14
  15. 15. Based on “Dremel” Specifically designed for interactive queries over petabytes of real-time data javier ramirez @supercoco9 https://teowaki.com startup launch summit london 14
  16. 16. loading data You just send the data in text (or JSON) format javier ramirez @supercoco9 https://teowaki.com startup launch summit london 14
  17. 17. SQL javier ramirez @supercoco9 https://teowaki.com startup launch summit london 14 select name from USERS order by date; select count(*) from users; select max(date) from USERS; select sum(total) from ORDERS group by user;
  18. 18. specific extensions for analytics javier ramirez @supercoco9 https://teowaki.com startup launch summit london 14 within flatten nest stddev top first last nth variance var_pop var_samp covar_pop covar_samp quantiles
  19. 19. web console screenshot javier ramirez @supercoco9 https://teowaki.com startup launch summit london 14
  20. 20. javier ramirez @supercoco9 https://teowaki.com startup launch summit london 14 our most active user
  21. 21. javier ramirez @supercoco9 https://teowaki.com startup launch summit london 14 country segmented traffic
  22. 22. javier ramirez @supercoco9 https://teowaki.com startup launch summit london 14 10 request we should be caching
  23. 23. javier ramirez @supercoco9 http://teowaki.com startup launch summit london 14 5 most created resources
  24. 24. new users per month
  25. 25. SELECT repository_name, repository_language, repository_description, COUNT(repository_name) as cnt, repository_url FROM github.timeline WHERE type="WatchEvent" AND PARSE_UTC_USEC(created_at) >= PARSE_UTC_USEC("#{yesterday} 20:00:00") AND repository_url IN ( SELECT repository_url FROM github.timeline WHERE type="CreateEvent" AND PARSE_UTC_USEC(repository_created_at) >= PARSE_UTC_USEC('#{yesterday} 20:00:00') AND repository_fork = "false" AND payload_ref_type = "repository" GROUP BY repository_url ) GROUP BY repository_name, repository_language, repository_description, repository_url HAVING cnt >= 5 ORDER BY cnt DESC LIMIT 25
  26. 26. Automation with Apps Script Read from bigquery Create a spreadsheet on Drive E-mail it everyday as a PDF javier ramirez @supercoco9 https://teowaki.com startup launch summit london 14
  27. 27. cloud storage pricing $0.032 per GB a gzipped 4.8 MB file stores 1MM rows $0.000092 / month per 1MM rows javier ramirez @supercoco9 https://teowaki.com startup launch summit london 14
  28. 28. bigquery pricing $26 per stored TB 1000000 rows => $0.00416 / month £0.00243 / month $5 per processed TB 1 full scan = 160 MB 1 count = 0 MB 1 full scan over 1 column = 5.4 MB 100 GB => $0.05 / month £0.03javier ramirez @supercoco9 https://teowaki.com startup launch summit london 14
  29. 29. £0.054307 / month* per 1MM rows *the 1st 100GB every month are free of charge javier ramirez @supercoco9 https://teowaki.com startup launch summit london 14
  30. 30. 1. non intrusive metrics 2. keep the history 3. avoid vendor lock-in 4. interactive queries 5. cheap 6. extra ball: real time javier ramirez @supercoco9 https://teowaki.com startup launch summit london 14
  31. 31. ig
  32. 32. Find related links at https://teowaki.com/teams/javier-community/link-categories/bigquery-talk Thanks! Javier Ramírez @supercoco9 startup launch summit london 14

×