How we are using BigQuery and Apps Scripts at teowaki

I was invited to speak at the Google Startup Launch Summit in London about how we are using Google Cloud to power our startup.


Published in: Software, Technology, Business

Transcript

  • 1. javier ramirez @supercoco9 How we are using BigQuery and Apps Scripts at teowaki
  • 2. Set a distance. Set an expiration time. Bye bye noise.
  • 3. Analytics flow
  • 4. Analytics flow, by segment
  • 5. Automatic Alerts
  • 6. javier ramirez @supercoco9 https://teowaki.com startup launch summit london 14 REST API (Ruby on Rails) + Web on top (AngularJS)
  • 7. (no text on this slide)
  • 8. “data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn’t fit the structures of your database architectures.” (Edd Dumbill, program chair for the O’Reilly Strata Conference)
  • 9. 1. non-intrusive metrics; 2. keep the history; 3. avoid vendor lock-in; 4. interactive queries; 5. cheap; 6. extra ball: real time
  • 10. (no text on this slide)
  • 11. Cloud Storage: Cost-efficient storage of files
  • 12. (no text on this slide)
  • 13. tools we considered: Hadoop, Cassandra, Amazon Redshift, ...
  • 14. Our choice: Google BigQuery. Data analysis as a service. http://developers.google.com/bigquery
  • 15. Based on “Dremel”. Specifically designed for interactive queries over petabytes of real-time data
  • 16. loading data: You just send the data in text (or JSON) format
  • 17. SQL: select name from USERS order by date; select count(*) from users; select max(date) from USERS; select sum(total) from ORDERS group by user;
  • 18. specific extensions for analytics: within, flatten, nest, stddev, top, first, last, nth, variance, var_pop, var_samp, covar_pop, covar_samp, quantiles
  • 19. web console screenshot
  • 20. our most active user
  • 21. country-segmented traffic
  • 22. 10 requests we should be caching
  • 23. 5 most created resources
  • 24. new users per month
  • 25. SELECT repository_name, repository_language, repository_description, COUNT(repository_name) as cnt, repository_url
        FROM github.timeline
        WHERE type="WatchEvent"
          AND PARSE_UTC_USEC(created_at) >= PARSE_UTC_USEC("#{yesterday} 20:00:00")
          AND repository_url IN (
            SELECT repository_url
            FROM github.timeline
            WHERE type="CreateEvent"
              AND PARSE_UTC_USEC(repository_created_at) >= PARSE_UTC_USEC('#{yesterday} 20:00:00')
              AND repository_fork = "false"
              AND payload_ref_type = "repository"
            GROUP BY repository_url
          )
        GROUP BY repository_name, repository_language, repository_description, repository_url
        HAVING cnt >= 5
        ORDER BY cnt DESC
        LIMIT 25
  • 26. Automation with Apps Script: read from BigQuery, create a spreadsheet on Drive, e-mail it every day as a PDF
  • 27. cloud storage pricing: $0.032 per GB; a gzipped 4.8 MB file stores 1MM rows; $0.000092 / month per 1MM rows
  • 28. bigquery pricing: $26 per stored TB; 1000000 rows => $0.00416 / month (£0.00243 / month). $5 per processed TB; 1 full scan = 160 MB; 1 count = 0 MB; 1 full scan over 1 column = 5.4 MB; 100 GB => $0.05 / month (£0.03)
  • 29. £0.054307 / month* per 1MM rows. *the 1st 100GB every month are free of charge
  • 30. 1. non-intrusive metrics; 2. keep the history; 3. avoid vendor lock-in; 4. interactive queries; 5. cheap; 6. extra ball: real time
  • 31. ig
  • 32. Find related links at https://teowaki.com/teams/javier-community/link-categories/bigquery-talk Thanks! Javier Ramírez @supercoco9 startup launch summit london 14
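
Slide 16 says data is loaded by sending it as text or JSON. BigQuery load jobs accept newline-delimited JSON (one object per line), so a minimal sketch of preparing such a payload might look like the following. The language (Python) and the row fields `user`, `path` and `status` are illustrative assumptions; the slides do not show teowaki's actual schema or loader.

```python
import json


def to_ndjson(rows):
    """Serialize an iterable of dicts as newline-delimited JSON, one object per line."""
    return "".join(json.dumps(row, sort_keys=True) + "\n" for row in rows)


# Hypothetical request-log rows; the field names are illustrative only.
rows = [
    {"user": "alice", "path": "/links", "status": 200},
    {"user": "bob", "path": "/teams", "status": 404},
]

payload = to_ndjson(rows)
print(payload, end="")
```

The resulting text could then be gzipped and uploaded to Cloud Storage before being loaded, which matches the storage step mentioned on slides 11 and 27.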
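
The query on slide 25 contains a `#{yesterday}` placeholder that the calling code fills in before sending the query. As a rough sketch of that step, assuming the placeholder holds a `YYYY-MM-DD` date (the slides do not state the format), the time filter could be built like this. The helper name `since_clause` and the example date are hypothetical.

```python
from datetime import date, timedelta


def since_clause(column, today):
    """Build the filter for rows created since 20:00 UTC on the day before `today`."""
    # Assumption: the placeholder is an ISO date such as 2014-05-01.
    yesterday = (today - timedelta(days=1)).isoformat()
    return f'PARSE_UTC_USEC({column}) >= PARSE_UTC_USEC("{yesterday} 20:00:00")'


# The date below is an arbitrary example, not taken from the talk.
clause = since_clause("created_at", date(2014, 5, 2))
print(clause)
# PARSE_UTC_USEC(created_at) >= PARSE_UTC_USEC("2014-05-01 20:00:00")
```

Passing the date in explicitly, rather than reading the clock inside the helper, keeps the generated query reproducible when a report is re-run for a past day.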
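
The BigQuery storage figure on slide 28 can be checked with a few lines of arithmetic. The sketch below assumes decimal units (1 TB = 1,000,000 MB) and takes the 160 MB full-scan size as the size of a 1MM-row table, which is an inference from the slide rather than something it states. The function names are illustrative.

```python
MB_PER_TB = 1_000_000  # assumption: decimal units


def monthly_storage_cost(table_mb, usd_per_tb=26.0):
    """Monthly storage cost in USD for a table of `table_mb` megabytes."""
    return table_mb / MB_PER_TB * usd_per_tb


def query_cost(scanned_mb, usd_per_tb=5.0):
    """Cost in USD of one query that processes `scanned_mb` megabytes."""
    return scanned_mb / MB_PER_TB * usd_per_tb


TABLE_MB = 160    # slide 28: one full scan over 1MM rows
COLUMN_MB = 5.4   # slide 28: one full scan over a single column

print(f"storage per month: ${monthly_storage_cost(TABLE_MB):.5f}")  # $0.00416
print(f"one full scan:     ${query_cost(TABLE_MB):.6f}")            # $0.000800
print(f"one column scan:   ${query_cost(COLUMN_MB):.6f}")           # $0.000027
```

The storage result reproduces the $0.00416 per month quoted on the slide. The per-query figures are derived here from the slide's $5 per processed TB and do not appear in the talk.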