
Lunch & Learn BigQuery & Firebase from other Google Cloud customers

1) Migrating your on-prem #Enterprise #Data #Warehouse into the #Cloud? Here is what you need to learn (and unlearn) when designing a modern Cloud #DataWarehouse in #BigQuery!
2) Launching a #Startup? See how to supercharge your idea with #Firebase!

Watch the recording at and, for more forward-looking talks on #Cloud #Architectures & #DataEngineering, join User Group.



  1. Lunch & Learn with Google Cloud
  2. Organizers: Software Engineer @ Accenture, GDG Capital Region Lead, Women Techmakers Ambassador; Linda Kovacs, Daniel Zivkovic, Karen Tamrazyan
  3. Sponsors
  4. Introducing C2C: The Independent Google Cloud Community. We’re here to unite Google Cloud customers across the globe. Connections: customer-to-customer conversations, events, forums, and other outlets to connect with peers and experts. Events and Education: customer stories, presentations, blogs, and points of view on hot topics, best practices, and the latest Google Cloud news. Exclusive Access: sessions and conversations with Google Cloud experts and executives to learn from the best and share your feedback to help shape what’s next.
  5. C2C Team: Jeff Branham, General Manager; Danny Pancratz, Director of Product; Ilias Papachristos, EMEA Community Manager
  6. What You Can Expect: Connect ● Community platform to share resources, discuss ideas, and provide advice on issues and ongoing projects ● Live member discussions to share experiences, discuss best practices, and find inspiration from other thought leaders and experts ● Regional Connect events for peer-to-peer sharing and network-building. Learn ● On-demand videos, blogs, and resources that provide a launchpad of aggregated expertise from customers, partners, and Google Cloud ● Cohort-based learning programs to build subject-matter expertise and GCP literacy across the community. Shape ● Best practices through the shared expertise of communities of practice ● Trusted resource collections vetted by customers ● Product feedback delivered with a unified customer voice to shape the future of cloud. Join: Questions: Follow: @meetC2C
  7. Agenda ☑ 4:00pm - 4:15pm Connect & Network ☑ 4:15pm - 5:00pm Dan Sullivan, “How to Design a Modern Data Warehouse in BigQuery, or Why I Needed to Forget Everything I Learned in Data Modeling School” ☑ 5:00pm - 5:45pm Kudz Murefu, “Small Teams, Big Things with Firebase & GCP Serverless Services” ☑ 5:45pm - 6:00pm WIN cool PRIZES from our sponsors! Closing comments & networking. All times are GMT.
  8. How to Design a Modern Data Warehouse in BigQuery, or Why I Needed to Forget Everything I Learned in Data Modeling School. Dan Sullivan, PEAK6 Technologies, Cloud Architect and Data Scientist; author of the official Google Cloud study guides for the Professional Architect, Professional Data Engineer, and Associate Cloud Engineer
  9. How to Design a Modern Data Warehouse in BigQuery ...or why I needed to forget everything I learned in data modeling school
  10. Architecture Ahead
  11. Datastore Options ➤ Relational ➢ Highly structured and transactional ➢ Difficult to scale ➤ NoSQL ➢ Semi-structured, eventual consistency, scalable ➤ Analytical ➢ Structured, scalable, not transactional
  12. Data Warehouse (early 2000s) ➤ Few servers ➤ Tightly coupled storage and compute ➤ Scale vertically ➤ Built on the same relational database management systems used for OLTP
  13. BigQuery ➤ Serverless data warehouse ➤ Petabyte scale ➤ Uses SQL but is not a relational database ➤ Analytical database ➤ Other features ➢ BigQuery ML ➢ BigQuery BI Engine ➢ BigQuery GIS
  14. So What’s Different about BigQuery?
  15. Source:
  16. Dremel ➤ Multi-tenant cluster ➤ Translates SQL queries into execution trees ➢ Leaves are called slots; they read data and perform computation ➢ Inner nodes (mixers) perform aggregation ➤ Dynamically allocates slots to queries ➤ Maintains fairness ➤ A single user could get 1,000s of slots
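The execution-tree idea can be sketched in a few lines of Python, as a toy model and not Dremel's actual code: hypothetical leaf "slots" each scan one shard and return a partial aggregate, and a single inner "mixer" merges the partials. The query here is assumed to be an AVG over one column.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Partial = Tuple[float, int]  # (running sum, row count)


def leaf_slot(shard: List[float]) -> Partial:
    """Leaf node: scan one shard and return a partial (sum, count)."""
    return sum(shard), len(shard)


def mixer(partials: List[Partial]) -> Partial:
    """Inner node: merge the partial aggregates produced by its children."""
    return sum(p[0] for p in partials), sum(p[1] for p in partials)


def run_avg_query(shards: List[List[float]], max_slots: int = 4) -> float:
    """Evaluate AVG over all shards as a two-level tree: parallel leaves, one mixer."""
    with ThreadPoolExecutor(max_workers=max_slots) as pool:
        partials = list(pool.map(leaf_slot, shards))
    total, count = mixer(partials)
    return total / count if count else 0.0


print(run_avg_query([[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]]))  # 3.5
```

The leaves return (sum, count) instead of a per-shard average because averaging the averages (1.5, 3.0, 5.0) would weight the shards incorrectly and give about 3.17 instead of 3.5.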
  17. Source:
  18. Colossus ➤ Distributed storage system ➤ Handles replication and recovery ➤ No need to manage storage
  19. Jupiter & Borg ➤ Jupiter ➢ Google datacenter network ➢ Petabit scale ➢ Storage-to-compute communication ➢ No need for rack awareness ➤ Borg ➢ Predecessor of Kubernetes ➢ Manages mixers and slots
  20. Capacitor ➤ Columnar storage format ➤ Supports semi-structured data ➢ Nested structures ➢ Repeated fields ➤ No need to read the parent column to produce a nested attribute value ➤ Compression
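Why a columnar format helps can be shown with a minimal Python sketch that pivots row records into one list per column and counts how many values a scan touches. It does not model Capacitor's encoding of nested and repeated fields or its compression; the record fields are hypothetical.

```python
from typing import Any, Dict, Iterable, List


def to_columns(rows: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Pivot row-oriented records into one list of values per column."""
    names: List[str] = []
    for row in rows:
        for name in row:
            if name not in names:
                names.append(name)
    # Missing fields become None so every column has one entry per row.
    return {name: [row.get(name) for row in rows] for name in names}


def values_read(columns: Dict[str, List[Any]], needed: Iterable[str]) -> int:
    """Count the values a scan touches when it reads only the needed columns."""
    return sum(len(columns[name]) for name in needed)


rows = [
    {"id": 1, "artist": "A", "plays": 10},
    {"id": 2, "artist": "B", "plays": 5},
    {"id": 3, "artist": "A", "plays": 7},
]
cols = to_columns(rows)
print(sum(cols["plays"]))            # 22
print(values_read(cols, ["plays"]))  # 3, versus 9 for a full row scan
```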
  21. What Does this Mean for Data Modeling?
  22. If you remember anything from this talk ... ➤ Design for scanning in parallel ➤ Partition to minimize amount of data scanned ➤ Cluster to further reduce the amount of data scanned ➤ Joins may require shuffling data across slots so ... ➤ Denormalize using nested and repeated fields
  23. Partitioning
  24. Partitioned Tables ➤ Table is divided into segments called partitions ➤ Improves query performance ➤ Lowers cost by reducing amount of data scanned
  25. Partition by Ingestion Time ➤ Loads data into daily, date-based partitions ➤ Automatically creates new partitions ➤ Uses ingestion time to determine the partition ➤ Creates the pseudo-column _PARTITIONTIME ➢ Date-based timestamp ➢ Used in queries to limit the number of partitions scanned
  26. Date/Timestamp Partitioning ➤ Partition based on a DATE or TIMESTAMP column ➤ Each partition holds one day of data ➤ No need for _PARTITIONTIME ➤ Special partitions ➢ __NULL__ when the partition column is NULL ➢ __UNPARTITIONED__ when values in the column are outside the allowed range
  27. Integer Range Partitioning ➤ Partition column must be an integer type ➤ Partition column cannot be repeated ➤ Cannot use legacy SQL to query partitioned tables
  28. Sharding vs. Partitioning ➤ Sharding ➢ Use a separate table for each day ➢ [TABLE_NAME_PREFIX]_YYYYMMDD ➢ Use UNION in queries to scan multiple tables ➤ Partitioning is preferred over sharding ➢ Less metadata to maintain ➢ Less permission-checking overhead ➢ Better performance
  29. Requiring a Partition Filter ➤ require_partition_filter option ➤ Specified at the table level (formerly at the partition level) ➤ Queries must include a WHERE clause that filters on the partition column
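Ingestion-time partitioning can be modeled with a toy Python class, a sketch under simplifying assumptions rather than BigQuery's implementation: each load is filed under its UTC ingestion day (playing the role of _PARTITIONTIME), new partitions appear on first use, and a date-range scan reads only the matching partitions. The class and method names are hypothetical.

```python
from collections import defaultdict
from datetime import date, datetime, timezone
from typing import Any, Dict, List


class IngestionTimePartitionedTable:
    """Toy table that files each load into a daily partition by ingestion time."""

    def __init__(self) -> None:
        self.partitions: Dict[date, List[Any]] = defaultdict(list)

    def load(self, rows: List[Any], ingested_at: datetime) -> None:
        # Plays the role of _PARTITIONTIME: the load time truncated to a UTC day.
        # ingested_at is expected to be timezone-aware.
        key = ingested_at.astimezone(timezone.utc).date()
        self.partitions[key].extend(rows)  # a new partition appears on first use

    def scan(self, start: date, end: date) -> List[Any]:
        """Read only partitions whose day is in [start, end], like a _PARTITIONTIME filter."""
        rows: List[Any] = []
        for day in sorted(self.partitions):
            if start <= day <= end:
                rows.extend(self.partitions[day])
        return rows


table = IngestionTimePartitionedTable()
table.load(["a", "b"], datetime(2021, 3, 1, 10, 0, tzinfo=timezone.utc))
table.load(["c"], datetime(2021, 3, 2, 9, 30, tzinfo=timezone.utc))
print(table.scan(date(2021, 3, 2), date(2021, 3, 2)))  # ['c']
```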
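Column-based daily partitioning, including the special partitions that BigQuery's documentation names `__NULL__` and `__UNPARTITIONED__`, can be sketched as a routing function. The allowed date range below (1960-01-01 to 2159-12-31) is an assumption for this sketch and should be checked against the current docs; the function name is hypothetical.

```python
from datetime import date
from typing import Optional

# Assumed bounds for this sketch; confirm the supported range in the BigQuery docs.
MIN_DATE = date(1960, 1, 1)
MAX_DATE = date(2159, 12, 31)


def daily_partition_id(value: Optional[date]) -> str:
    """Return the daily partition a row would land in, given its partition-column value."""
    if value is None:
        return "__NULL__"
    if not MIN_DATE <= value <= MAX_DATE:
        return "__UNPARTITIONED__"
    return value.strftime("%Y%m%d")


for v in (date(2021, 3, 2), None, date(1959, 12, 31)):
    print(daily_partition_id(v))
# 20210302
# __NULL__
# __UNPARTITIONED__
```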
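Integer-range partitioning is defined by a start (inclusive), an end (exclusive), and an interval. The Python sketch below maps a value to its bucket; labeling each partition by its lower bound is an assumption made for readability, and the function name is hypothetical.

```python
from typing import Optional


def integer_range_partition(value: Optional[int], start: int, end: int, interval: int) -> str:
    """Map an integer to its range partition, labeled by the partition's lower bound.

    start is inclusive and end is exclusive; values outside [start, end)
    go to the special __UNPARTITIONED__ partition.
    """
    if interval <= 0 or end <= start:
        raise ValueError("need interval > 0 and end > start")
    if value is None:
        return "__NULL__"
    if value < start or value >= end:
        return "__UNPARTITIONED__"
    lower = start + ((value - start) // interval) * interval
    return str(lower)


print(integer_range_partition(42, start=0, end=100, interval=10))   # 40
print(integer_range_partition(100, start=0, end=100, interval=10))  # __UNPARTITIONED__
```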
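To see why sharding scales poorly, here is a small Python sketch that generates daily shard names of the form PREFIX_YYYYMMDD and stitches them together with UNION ALL. The table prefix `mydataset.events` is hypothetical. Every extra day adds another table to the query, whereas a partitioned table needs only a single date filter.

```python
from datetime import date, timedelta
from typing import List


def shard_names(prefix: str, start: date, end: date) -> List[str]:
    """Daily shard table names PREFIX_YYYYMMDD from start to end, inclusive."""
    names: List[str] = []
    day = start
    while day <= end:
        names.append(f"{prefix}_{day:%Y%m%d}")
        day += timedelta(days=1)
    return names


def union_query(prefix: str, start: date, end: date, columns: str = "*") -> str:
    """Build a query that scans every shard in the range with UNION ALL."""
    selects = [f"SELECT {columns} FROM `{name}`" for name in shard_names(prefix, start, end)]
    return "\nUNION ALL\n".join(selects)


print(shard_names("mydataset.events", date(2021, 2, 27), date(2021, 3, 1)))
# ['mydataset.events_20210227', 'mydataset.events_20210228', 'mydataset.events_20210301']
```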
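The effect of `require_partition_filter` (set in DDL with `OPTIONS (require_partition_filter = TRUE)`) can be modeled in Python. This is a sketch with hypothetical class and function names: a query without a partition filter is rejected instead of silently scanning every partition.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class PartitionedTable:
    partition_column: str
    partitions: Dict[str, List[dict]] = field(default_factory=dict)
    require_partition_filter: bool = False


def run_query(table: PartitionedTable,
              partition_filter: Optional[Callable[[str], bool]] = None) -> List[dict]:
    """Scan the partitions selected by partition_filter (all of them if None)."""
    if partition_filter is None:
        if table.require_partition_filter:
            # Reject the query instead of scanning every partition.
            raise ValueError(
                f"query must filter on partition column {table.partition_column!r}")
        partition_filter = lambda _partition_id: True
    rows: List[dict] = []
    for partition_id, partition_rows in table.partitions.items():
        if partition_filter(partition_id):
            rows.extend(partition_rows)
    return rows


events = PartitionedTable(
    "event_date",
    {"20210301": [{"id": 1}], "20210302": [{"id": 2}]},
    require_partition_filter=True,
)
print(run_query(events, lambda pid: pid == "20210302"))  # [{'id': 2}]
```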
  30. Clustered Tables
  31. Clustered Tables ➤ Data sorted based on values in one or more columns ➤ Can improve performance of aggregate queries ➤ Can reduce scanning when cluster columns are used in the WHERE clause ➤ Often used with partitioned tables
  32. Automatic Reclustering ➤ As new data is added to a table, data may be stored out of order ➤ BigQuery automatically reclusters in the background
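How clustering reduces scanning can be illustrated with a Python sketch of min/max block pruning: rows are sorted by the cluster column, cut into blocks that record their smallest and largest key, and a filter skips any block whose range cannot contain the value. BigQuery's real block sizes and metadata are internal; the block size and column names here are assumptions.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]
Block = Tuple[Any, Any, List[Row]]  # (min key, max key, rows)


def build_blocks(rows: List[Row], key: str, block_size: int) -> List[Block]:
    """Sort rows by the cluster column and cut them into blocks with min/max metadata."""
    ordered = sorted(rows, key=lambda r: r[key])
    blocks: List[Block] = []
    for i in range(0, len(ordered), block_size):
        chunk = ordered[i:i + block_size]
        blocks.append((chunk[0][key], chunk[-1][key], chunk))
    return blocks


def filter_equal(blocks: List[Block], key: str, value: Any) -> Tuple[List[Row], int]:
    """Return rows where row[key] == value and the number of blocks actually read."""
    matches: List[Row] = []
    blocks_read = 0
    for low, high, chunk in blocks:
        if low <= value <= high:  # blocks whose range excludes value are skipped
            blocks_read += 1
            matches.extend(r for r in chunk if r[key] == value)
    return matches, blocks_read


codes = ["ZW", "US", "ZW", "GB", "NG", "US", "GB", "ZW"]
rows = [{"id": i, "country": c} for i, c in enumerate(codes)]
blocks = build_blocks(rows, "country", block_size=2)
matches, read = filter_equal(blocks, "country", "NG")
print(len(matches), read, len(blocks))  # 1 1 4
```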
  33. Nested and Repeated Fields
  34. Nested and Repeated Fields
  35. Nested and Repeated Fields
  36. One more time … if you remember anything from this talk ... ➤ Design for scanning in parallel ➤ Partition to minimize amount of data scanned ➤ Cluster to further reduce the amount of data scanned ➤ Joins may require shuffling data across slots so ... ➤ Denormalize using nested and repeated fields to avoid needing joins
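Denormalizing with nested and repeated fields means storing child rows inside their parent record; in BigQuery that is an ARRAY of STRUCTs (a REPEATED RECORD) read with UNNEST. The Python sketch below performs that nesting once, after which per-parent questions need no join. The album/track data and function name are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List

Row = Dict[str, Any]


def nest(parents: List[Row], children: List[Row], key: str, field: str) -> List[Row]:
    """Denormalize: embed each parent's child rows as a repeated field."""
    by_key: Dict[Any, List[Row]] = defaultdict(list)
    for child in children:
        # Drop the foreign key from the child; it is implied by the parent.
        by_key[child[key]].append({k: v for k, v in child.items() if k != key})
    return [{**parent, field: by_key.get(parent[key], [])} for parent in parents]


albums = [{"album_id": 1, "title": "X"}, {"album_id": 2, "title": "Y"}]
tracks = [
    {"album_id": 1, "name": "t1", "seconds": 200},
    {"album_id": 1, "name": "t2", "seconds": 180},
    {"album_id": 2, "name": "t3", "seconds": 240},
]
nested = nest(albums, tracks, key="album_id", field="tracks")

# Per-album runtime is computed one record at a time: no join needed.
print({a["title"]: sum(t["seconds"] for t in a["tracks"]) for a in nested})
# {'X': 380, 'Y': 240}
```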
  37. Small Teams, Big Things with Firebase & GCP Serverless Services. Kudz Murefu, Founder, Strma Music
  38. Strma Infrastructure: Leveraging Firebase & Google Cloud serverless to build a streaming platform. By Kudzanai Murefu
  39. Birth of the Idea ➔ Strma is a streaming app for African music ➔ Our journey started in 2017, while I was a business student ➔ The mission was to create a simple way to deliver Afro-music over the web ➔ We launched on WordPress as a simple blog, and off we went!
  40. The Exodus from WordPress: Prevailing Challenges ➔ Heavy reliance on plugins ➔ Very slow page loads ➔ Limited file storage for songs ➔ Expensive hosting
  41. What to use for my backend? ➔ Database? ➔ Hosting? ➔ Backend jobs?
  42. A miracle from heaven: Firebase ➔ Authentication ➔ Realtime Database ➔ Functions ➔ Hosting ➔ Storage
  43. Realtime Database ➔ Simple NoSQL database ➔ Can be accessed from the web or through your codebase ➔ Easily interact with the database tree ➔ No need to set up a server
  44. Realtime Database: On initial setup, you can manually enter records using the web console
  45. Realtime Syncing ➔ Allows real-time updates with no extra configuration ➔ Changes are broadcast to all clients ➔ Just subscribe to the database with 3 lines of code
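The subscribe-and-broadcast behavior can be modeled with a toy in-memory class in Python. It is not the Firebase SDK (the Web SDK exposes this as `onValue(ref(db, path), callback)`), and it simplifies by notifying only subscribers of the exact path written, whereas Firebase also notifies listeners on ancestor locations. Class and path names are hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

Listener = Callable[[Any], None]


class ToyRealtimeDB:
    """In-memory stand-in for a realtime database with per-path subscriptions."""

    def __init__(self) -> None:
        self.values: Dict[str, Any] = {}
        self.listeners: Dict[str, List[Listener]] = defaultdict(list)

    def subscribe(self, path: str, listener: Listener) -> None:
        """Register a listener and immediately deliver the current value."""
        self.listeners[path].append(listener)
        listener(self.values.get(path))

    def set(self, path: str, value: Any) -> None:
        """Write a value and broadcast it to every subscriber of that exact path."""
        self.values[path] = value
        for listener in self.listeners[path]:
            listener(value)


db = ToyRealtimeDB()
phone, laptop = [], []
db.subscribe("charts/top", phone.append)
db.subscribe("charts/top", laptop.append)
db.set("charts/top", ["Song A"])
print(phone, laptop)  # [None, ['Song A']] [None, ['Song A']]
```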
  46. QUICK DEMO
  47. Firebase Storage ➔ Built on top of Google Cloud Storage ➔ Same technology powering Spotify and Google Photos ➔ Robust uploads and downloads ➔ Use with a drag-and-drop interface or from your codebase
  48. QUICK DEMO
  49. Web Interface ➔ Simple web interface to manage files & folders
  50. Firebase Hosting ➔ Easily deploy your website to a global CDN ➔ Comes with versioning and the ability to roll back ➔ SSL certificates are built in ➔ Free tier (10 GB) or pay-as-you-go plan
  51. Cloud Functions ➔ Easily trigger code to do a task over HTTP ➔ Code is simple, written in JavaScript or TypeScript ➔ Use with the database to trigger when data changes ➔ Use with Storage on file upload ➔ Can be scheduled to run periodically
  52. QUICK DEMO
  53. Bringing It All Together ➔ Firebase is an all-in-one solution ➔ Simple but robust enough to go from ZERO to HERO ➔ Lets you focus more on the business instead of infrastructure (Authentication, Realtime Database, Functions, Hosting, Storage)
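The trigger model (HTTP, database writes, storage uploads, schedules) boils down to registering handlers per event type and dispatching events to them. The Python sketch below is a hypothetical illustration of that pattern, not the Firebase Functions SDK, which registers handlers with calls such as `functions.https.onRequest` and `functions.storage.object().onFinalize` in its v1 Node.js API. The event type strings and handler names are invented for the example.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

Handler = Callable[[Dict[str, Any]], Any]
_handlers: Dict[str, List[Handler]] = defaultdict(list)


def on(event_type: str) -> Callable[[Handler], Handler]:
    """Decorator that registers a handler for a hypothetical event type."""
    def register(fn: Handler) -> Handler:
        _handlers[event_type].append(fn)
        return fn
    return register


def dispatch(event_type: str, event: Dict[str, Any]) -> List[Any]:
    """Run every handler registered for event_type and collect their results."""
    return [handler(event) for handler in _handlers.get(event_type, [])]


@on("storage.upload")
def index_song(event: Dict[str, Any]) -> str:
    return f"indexed {event['name']}"


@on("database.write")
def count_play(event: Dict[str, Any]) -> int:
    return event["plays"] + 1


print(dispatch("storage.upload", {"name": "song.mp3"}))  # ['indexed song.mp3']
```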
  54. Growth, growth, growth... ➔ 5,000 weekly users on the website, and growing ➔ Just launched our Android app ➔ We plan to grow the platform to 1 million+ users ➔ And our team is growing
  55. From Firebase Hosting to Cloud Run ➔ Needed a way to gradually introduce updates ➔ Canary-like deployments ➔ e.g., release a beta feature to 15% of traffic ➔ Easily validate performance before releasing to 100% of traffic ➔ CI/CD for remote developers. Diagram: before, deploy to Firebase Hosting; now, deploy to Cloud Run (staging and production)
  56. Cloud Workflows to Improve Efficiency
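A canary split such as "15% of traffic to the beta" can be sketched in Python. Cloud Run performs the split itself when traffic is assigned to revisions by percentage; this sketch instead hashes a request or user ID into one of 100 buckets (an assumption, not Cloud Run's mechanism) so the same user consistently lands on the same revision. Revision names are hypothetical.

```python
import hashlib
from typing import Dict


def pick_revision(request_id: str, weights: Dict[str, int]) -> str:
    """Choose a revision for a request, given integer percentage weights summing to 100."""
    if sum(weights.values()) != 100:
        raise ValueError("weights must sum to 100")
    digest = hashlib.sha256(request_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # stable bucket in 0..99
    threshold = 0
    for revision, pct in sorted(weights.items()):  # sorted for a deterministic order
        threshold += pct
        if bucket < threshold:
            return revision
    raise AssertionError("unreachable when weights sum to 100")


weights = {"stable": 85, "beta": 15}
beta_users = sum(pick_revision(f"user-{i}", weights) == "beta" for i in range(10_000))
print(beta_users / 10_000)
```

Raising the beta weight step by step (15, 50, 100) while watching error rates and latency mirrors the "validate before releasing to 100%" step on the slide.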
  57. Current Infrastructure (diagram): remote code commits → Cloud Build builds the code and deploys to Cloud Run; Firebase client apps use Authentication, Realtime Database, Functions, and Storage over HTTPS and the SDK; scheduled Workflows call Functions
  58. Sponsors
  59. Raffle time! We have a lot of prizes from our amazing sponsors. Let’s raffle them off! Raffle Drawing Prizes: 1. Dan Sullivan: Google Cloud Associate Cloud Engineer Certification Practice Exam ($50 value each) for all attendees. 2. C2C, The Independent Google Cloud Community: 5 hoodies. 3. O’Reilly: 5 books & 30 days of full access to the library ($50 value each). 4. ROI Training: 4 on-demand Google Cloud certification trainings, ACE/PCE ($500 value each). 5. JetBrains: 3 free annual personal subscriptions ($249 value each).
  60. Uniting people from every corner of the Google Cloud universe to connect, learn, and shape the future of the cloud. Connect with Google Cloud professionals on the C2C Community Platform: your one-stop shop for engaging with other members, staying on top of upcoming events, browsing articles and videos, and so much more. The structure and navigation reflect our three main community focuses: connect, learn, and shape. Connect: Join a group (we've got plenty for you to choose from) and start engaging in real time with other members. New groups are starting for Germany and for the UK and Ireland! Learn: Think of this section as a library for C2C content. Each of our top focus areas has a dedicated collection of articles, videos, and content from our community and events. Shape: Help shape the future of C2C by sharing your expertise and ideas, and by requesting topics you want us to cover in our C2C events and content. Join by Monday for a chance to win a C2C hoodie! Create your account at and select C2C-Sponsored Event as your referral.
  61. Raffle Drawing Link: Wheel of Names: