Getting Started on
Google Cloud Platform
Aaron Taylor
@ataylor0123
access any file in seconds,
wherever it is.
www.meta.sc
Folders are outdated
Files are scattered
Talk Roadmap
• What problems we face at Meta
• How we are solving them using GCP
• How you can get started on GCP
Building a product
• No baggage, free to choose whatever stack we want
• Take advantage of latest technologies
• but not quite bleeding edge
Engineering Goals
• This will be a complex product, so it needs to be
comprehensible to everyone on our team
• Keep the team as lean as possible
• Focus on product, not sysadmin and dev ops
Language Choices
• Go chosen as our primary language
• Python for NLP and data analysis
• enables easy experimentation, comfortable for data
scientists and developers
• Java/Scala interacting with Dataflow, Apache Tika, etc.
Our Hard Problems
• User onboarding load
• Heterogeneous (changing) data sources
• Unpredictable traffic from web hooks
• Compute loads for file content analysis
• Processing streaming data
User Onboarding
• Crawl multiple cloud
accounts at once
• Parallel computation
• In-process using Go
• Distributed using tasks
• App Engine
Task Queues
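The "in-process using Go" half of this slide can be sketched with goroutines and a sync.WaitGroup. This is only an illustration, not Meta's crawler: crawlAccount, the account names, and the file names are hypothetical placeholders, and a real crawler would call each provider's API and handle errors.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// crawlAccount is a hypothetical stand-in for listing the files in one
// cloud account; a real crawler would call the provider's API here.
func crawlAccount(account string) []string {
	return []string{account + "/notes.txt", account + "/report.pdf"}
}

// crawlAll crawls every account concurrently, one goroutine per account,
// and returns the combined file list in sorted order.
func crawlAll(accounts []string) []string {
	var (
		mu    sync.Mutex
		wg    sync.WaitGroup
		files []string
	)
	for _, account := range accounts {
		wg.Add(1)
		go func(account string) {
			defer wg.Done()
			found := crawlAccount(account)
			mu.Lock() // guard the shared slice against concurrent appends
			files = append(files, found...)
			mu.Unlock()
		}(account)
	}
	wg.Wait()
	sort.Strings(files)
	return files
}

func main() {
	for _, f := range crawlAll([]string{"account-a", "account-b"}) {
		fmt.Println(f)
	}
}
```

The distributed half would hand each account to a task queue instead of a goroutine; that part depends on the App Engine APIs and is not shown here.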
Heterogeneous Data
• Remove complexity of
third-party services
• Detect changes/
breakages in APIs
• Distributed by nature
• Continuous Deployment
• Datastore
• BigQuery
Unpredictable Traffic
• Changes are pushed to
us through web hooks
• Dropping changes
generally unacceptable
• One user should not
negatively impact others
• App Engine autoscaling
• Asynchronous task
queues
Compute loads
• Rich file content analysis
• Parallel computation
• App Engine Flexible
Runtimes
• CPU-based autoscaling
Stream Processing
• Efficient handling of
high-volume changes
• Collate events in
succession, from
multiple users
• Google Cloud Pub/Sub
• Google Cloud Dataflow
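"Collate events in succession" can be read as merging rapid successive changes to the same file into one unit of work. The sketch below shows that idea on a plain slice. It does not use Pub/Sub or Dataflow, and the Event type, its fields, and the "latest sequence number wins" rule are assumptions made for illustration.

```go
package main

import "fmt"

// Event is a hypothetical change notification for one file of one user.
type Event struct {
	User string
	File string
	Seq  int // higher means more recent
}

// collate keeps only the most recent event per (user, file) pair,
// preserving the order in which each pair first appeared.
func collate(events []Event) []Event {
	type key struct{ user, file string }
	index := make(map[key]int) // position of each pair in out
	var out []Event
	for _, e := range events {
		k := key{e.User, e.File}
		if i, ok := index[k]; ok {
			if e.Seq > out[i].Seq {
				out[i] = e
			}
			continue
		}
		index[k] = len(out)
		out = append(out, e)
	}
	return out
}

func main() {
	events := []Event{
		{"alice", "a.txt", 1},
		{"bob", "b.txt", 1},
		{"alice", "a.txt", 2},
	}
	for _, e := range collate(events) {
		fmt.Printf("%s %s seq=%d\n", e.User, e.File, e.Seq)
	}
}
```

In a streaming system the same grouping would typically happen per time window rather than over a finished slice.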
How we started off
• App Engine is our entry point
• Service Oriented Architecture
• Currently ~37 different services
• Cloud Datastore is our persistence layer
• BigQuery as a data warehouse
Documentation
• Lots of information for getting started
• Quality resources for our growing team
• Onboarding new developers without GCP
experience has been a breeze
• Google is devoting lots of resources to this area
App Engine
• Don’t worry about servers
• Cache, task queues, cron, database, logging,
monitoring, and more all built in
• Powerful, configurable autoscaling
• Heavy compute on App Engine Flexible Runtimes
Development Process
• Build, run, and test services locally
• Continuous deployment to a development project
• Incremental releases go to production project
• Logging and monitoring easy to set up
Problems we faced
• Mantra of “don’t worry about scalability” didn’t take us
very far
• Users have lots and lots of files
• Datastore use optimizations
• Cost issues with App Engine
• Trimming auto-scaling parameters
• Migrated heavy compute to Flexible Runtimes
Outside GCP
• Algolia
• Hosts infrastructure for our search indices
• Pusher
• realtime socket connections
• Postmark/Mailchimp
• transactional and campaign-based email
Growth of the platform
• Rapid changes and improvements taking place
• Flexible Runtimes
• Container Engine
• Dataflow
• Investing in a documentation overhaul soon
• Support is generally quite responsive
Recent Developments
• Introduction of Pub/Sub to our system for all event
processing
• Experimenting with Kubernetes/Container Engine
• Dataflow stream processing jobs
• Splitting functionality into multiple projects
Quickstart Documentation for Go
How you can start off
Hello World in Go
https://cloud.google.com/appengine/docs/go/quickstart
Server
package hello

import (
	"fmt"
	"net/http"
)

func init() {
	http.HandleFunc("/", handler)
}

func handler(w http.ResponseWriter, r *http.Request) {
	fmt.Fprint(w, "Hello, world!")
}
hello.go
Configuration
runtime: go
api_version: go1

handlers:
- url: /.*
  script: _go_app
app.yaml
Deploy
appcfg.py update .
Add a Guestbook
https://cloud.google.com/appengine/docs/go/gettingstarted/creating-guestbook
Datastore
type Greeting struct {
	Author  string
	Content string
	Date    time.Time
}

// guestbookKey returns the key used for all guestbook entries.
func guestbookKey(c appengine.Context) *datastore.Key {
	// The string "default_guestbook" here could be varied to have multiple guestbooks.
	return datastore.NewKey(c, "Guestbook", "default_guestbook", 0, nil)
}

func root(w http.ResponseWriter, r *http.Request) {
	c := appengine.NewContext(r)
	// Ancestor queries, as shown here, are strongly consistent with the High
	// Replication Datastore. Queries that span entity groups are eventually
	// consistent. If we omitted the .Ancestor from this query there would be
	// a slight chance that a Greeting that had just been written would not
	// show up in a query.
	q := datastore.NewQuery("Greeting").Ancestor(guestbookKey(c)).Order("-Date").Limit(10)
	greetings := make([]Greeting, 0, 10)
	if _, err := q.GetAll(c, &greetings); err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	if err := guestbookTemplate.Execute(w, greetings); err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
	}
}
Templates
var guestbookTemplate = template.Must(template.New("book").Parse(`
<html>
  <head>
    <title>Go Guestbook</title>
  </head>
  <body>
    {{range .}}
      {{with .Author}}
        <p><b>{{.}}</b> wrote:</p>
      {{else}}
        <p>An anonymous person wrote:</p>
      {{end}}
      <pre>{{.Content}}</pre>
    {{end}}
    <form action="/sign" method="post">
      <div><textarea name="content" rows="3" cols="60"></textarea></div>
      <div><input type="submit" value="Sign Guestbook"></div>
    </form>
  </body>
</html>
`))
Forms
func sign(w http.ResponseWriter, r *http.Request) {
	c := appengine.NewContext(r)
	g := Greeting{
		Content: r.FormValue("content"),
		Date:    time.Now(),
	}
	if u := user.Current(c); u != nil {
		g.Author = u.String()
	}
	// We set the same parent key on every Greeting entity to ensure each Greeting
	// is in the same entity group. Queries across the single entity group
	// will be consistent. However, the write rate to a single entity group
	// should be limited to ~1/second.
	key := datastore.NewIncompleteKey(c, "Greeting", guestbookKey(c))
	_, err := datastore.Put(c, key, &g)
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	http.Redirect(w, r, "/", http.StatusFound)
}
Conclusions
• Google Cloud Platform has allowed us to build out
Meta in ways that wouldn’t otherwise be feasible
• Simplicity of App Engine allows us to focus on product
• Scalability and availability are built into the platform
access any file in seconds,
wherever it is.
www.meta.sc/careers
careers@meta.sc
