Meetup Agenda
Thursday, Feb 27, 2020
1. Intro & Activity Update
2. Community Open Mic
3. Ian Whitestone: “Bootstrapping a data-driven application with Zappa (Serverless Python) to find an apartment in Toronto”
4. Networking
Serverless is not just about the Tech:
Serverless is New Agile & Mindset
Serverless Dev (gluing other people’s APIs and managed services)
We're obsessed by creating business value (meaningful MVPs, products) and helping
We build bridges between Serverless Community (“Dev leg”), and Front-end & Voice-First folks (“UX leg”), and empower UX
Achieve agility NOT by “sprinting” faster (like in Scrum), but by working smarter (by using bigger building blocks and less Ops)
© 2020 Trend Micro Inc.1
Trend Micro Cloud One™
Cloud Security Simplified
Albert Kramer
Technical Director Trend Micro
Cloud Native Applications
Strategic Priorities for Cloud Builders
Cloud Migration
Cloud Operational Excellence
Risk & Compliance
Containers Serverless
Amazon S3
Azure Blob
GitHub Jenkins
How do you secure such a complex &
fast-paced environment?
Transition, not a cut-over
Hybrid cloud is the norm
• Deliver fast, iterate often
leverage: code re-use, open-source and public code
Repeatable & consistent
Infrastructure and cost
Cloud Center of Excellence (CCoE)
Physical Virtual Cloud
Trend Micro Cloud One™
Security Services Platform for Cloud Builders
Cloud Native Applications
Containers Serverless
Cloud Storage
Cloud Workloads
Cloud Operational Excellence
Risk & Compliance
Cloud Migration
Physical Virtual Cloud
Trend Micro Cloud One™
Security Services Platform for Cloud Builders
Workload & container host
Security for container
File scanning for cloud
storage services
Network layer IPS to
secure entire VPCs
Security for
serverless functions,
APIs, and applications
Assurance cloud infrastructure is
configured securely
Cloud-native, SaaS-based platform with the most extensive set of cloud security services
• Single-sign-on
• Common user
and cloud
• Common
procurement &
• Common
support &
• Expandable
Trend Micro Cloud One™
Security Services Platform for Cloud Builders
Serverless and API protection
Automation center
Future Talks
Upcoming 2020 #ServerlessTO Meetups
1. Intro to PySpark – Python Data Analysis at scale in the Cloud –
Jonathan Rioux, Lead Data Scientist at EPAM Systems & author of
PySpark in Action book ** MARCH 19 **
2. Introduction to Google BigQuery – Matt Welke, Software
Developer at GroupBy Inc
3. Solving your Business Problems with Serverless Architectures
– Panel discussion ** BACK BY POPULAR DEMAND **
4. Serverless with Pivotal Cloud Foundry – Adib Saikali, Principal
Platform Architect at VMware
5. Fivetran – Data Pipelines, Reinvented – Replicate your data into
the Cloud Warehouse of your choice
6. Your Own Presentation – PLEASE VOLUNTEER ☺
Community Open Mic
Your 10 sec. pitch ☺
- Looking for work?
- Offering work?
About You – because without you, there would be no meetups!
Feature Talk
Ian Whitestone, Data Scientist
at Shopify
Bootstrapping a data-driven application with Zappa
Some background..
>>> df[df.bedrooms == 1].price.median()
>>> df[df.bedrooms == 0].price.median()
>>> df[df.housing_type == 'basement'].price.median()
Not only is it expensive..
Maybe there's a better way?
Inspired by a simple San Francisco made by Vik
hello, domi
Today's talk
Serverless Offerings
1 million requests & 400,000 GB-seconds per month [🙅 💸]
My Requirements
Serverless Python from Scratch
# Create virtualenv and install packages
→ pipenv install requests
import requests
import yaml
import main

def my_handler(event=None, context=None):
    """Kick off the desired function

    event : dict, optional
        AWS Lambda uses this parameter to pass in event data to the handler
    context : LambdaContext, optional
        AWS Lambda uses this parameter to provide runtime information
        to your handler
    """
    main.do_stuff() # and things
→ tree
├── Pipfile
├── Pipfile.lock
├── app
│ ├──
│ └──
Step 1: Build Deployment Package
→ pipenv run pip show requests
Name: requests
Version: 2.22.0
Summary: Python HTTP for Humans.
Author: Kenneth Reitz
License: Apache 2.0
Location: /Users/ianwhitestone/.../virtualenvs/.../lib/python3.7/site-packages 👈
Requires: idna, urllib3, certifi, chardet
Required-by: zappa
→ PACKAGES_DIR=/Users/ianwhitestone/.../virtualenvs/.../lib/python3.7/site-packages
→ PROJECT_DIR=$(pwd)
→ zip -r ${PROJECT_DIR}/ .
→ cd ${PROJECT_DIR}/app
→ zip -r ${PROJECT_DIR}/ .
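The zip steps above can also be scripted. A minimal sketch using only the standard library, assuming a hypothetical helper `build_package` and a hypothetical archive name chosen by the caller (the slides elide the real one). It places the contents of the site-packages directory and of the app directory at the root of the archive, mirroring `cd dir && zip -r archive .`:

```python
import zipfile
from pathlib import Path


def build_package(packages_dir, app_dir, out_zip):
    """Zip the contents of both directories at the archive root."""
    with zipfile.ZipFile(out_zip, "w", zipfile.ZIP_DEFLATED) as zf:
        for root in (Path(packages_dir), Path(app_dir)):
            for path in sorted(root.rglob("*")):
                if path.is_file():
                    # Store each file relative to its own root directory
                    zf.write(path, path.relative_to(root).as_posix())
    return out_zip
```

Keep the output archive outside both input directories, otherwise a second run would pack the previous archive into the new one.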
Step 2: Create Identity & Access Management (IAM)
→ aws iam create-role \
    --role-name lambda_basic_role \
    --assume-role-policy-document file://lambda_trust_policy.json
{
    "Role": {
        "Path": "/",
        "RoleName": "lambda_basic_role",
        "RoleId": "AROA......",
        "Arn": "arn:aws:iam::<account_num>:role/lambda_basic_role",
        "CreateDate": "2019-09-22T16:48:43Z",
        "AssumeRolePolicyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Effect": "Allow",
                    "Principal": {
                        "Service": ""
                    },
                    "Action": "sts:AssumeRole"
                }
            ]
        }
    }
}
# Give it full access to S3
→ aws iam attach-role-policy \
    --role-name lambda_basic_role \
    --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess
# And cloudwatch (logs)
→ aws iam attach-role-policy \
    --role-name lambda_basic_role \
    --policy-arn arn:aws:iam::aws:policy/CloudWatchFullAccess
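The `create-role` call reads its trust policy from `lambda_trust_policy.json`, which the slides do not show. A sketch of generating such a file, assuming the usual trust relationship that lets the Lambda service (`lambda.amazonaws.com`) assume the role. The helper names are hypothetical:

```python
import json


def lambda_trust_policy():
    """Trust policy that allows the AWS Lambda service to assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "lambda.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def write_trust_policy(path="lambda_trust_policy.json"):
    """Write the policy to disk so the AWS CLI can load it via file://."""
    with open(path, "w") as fh:
        json.dump(lambda_trust_policy(), fh, indent=2)
    return path
```

The trust policy only says who may assume the role. What the role may do still comes from the attached policies, such as the S3 and CloudWatch ones above.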
Step 3: Create Lambda Function
# Python 3.7 runtime 😎
→ aws lambda create-function \
    --function-name download_stuff \
    --runtime python3.7 \
    --role arn:aws:iam::<account_num>:role/lambda_basic_role \
    --handler handler.my_handler \
    --zip-file fileb://../ \
    --memory-size 128 \
    --timeout 900 # max timeout (15 minutes)
Step 4: Create Cloudwatch Events to Trigger Lambda
# Run it every hour
aws events put-rule \
    --name "RunLambdaFunction" \
    --schedule-expression "rate(1 hour)" \
    --state "ENABLED"
# Add lambda function as target
aws events put-targets \
    --rule "RunLambdaFunction" \
    --targets "Id"="1","Arn"="arn:aws:lambda:us-east-1:<account_num>:function:download_stuff"
→ chmod -R 755 $PACKAGES_DIR
→ chmod -R 755 $PROJECT_DIR
→ aws lambda update-function-code \
    --function-name download_stuff \
    --zip-file fileb://../
# Create virtualenv and install packages
→ pipenv install requests
→ pipenv install psycopg2 # new dependency!
Now let's talk about Zappa
# Create virtualenv and install packages
→ pipenv install requests
→ pipenv install psycopg2
→ pipenv install zappa # new dependency!
(can be created step by step with zappa init)
{
    "dev": {
        "apigateway_enabled": false,
        "aws_region": "us-east-1",
        "profile_name": "default",
        "project_name": "download_stuff",
        "runtime": "python3.7",
        "s3_bucket": "download_stuff",
        "keep_warm": false,
        "events": [{
            "function": "main.do_stuff",
            "expression": "rate(1 hour)"
        }]
    },
    "prod": {
        // config for production
    }
}
→ zappa deploy dev
Calling deploy for stage dev..
Downloading and installing dependencies..
- psycopg2-binary==2.8.3: Using locally cached manylinux wheel
- sqlite==python3: Using precompiled lambda package
Packaging project as zip.
Uploading (9.5MiB)..
100%|█████████████████████████████████████████| 9.97M/9.97M [00:21<00:00, 528KB/s]
Scheduled with expression rate(1 minute)!
Deployment complete!
# Show all logs
→ zappa tail dev
Calling tail for stage dev..
[1569183806942] Instancing..
[1569183806943] [DEBUG] 2019-09-22T20:23:26.942Z 97e8-d0b23aaf17a0 Zappa Event:
{'time': '2019-09-22T20:23:24Z', 'detail-type': 'Scheduled Event', 'source': '',
'region': 'us-east-1', 'detail': {}, 'version': '0',
'resources': ['arn:aws:events:us-east-1:<>:rule/'],
'id': '75265076-af20-30ca-fd1e-b3fcbe478843', 'kwargs': {}}
[1569183806988] hello world!!
[1569183865861] [DEBUG] 2019-09-22T20:24:25.861Z 8064-931e09d761e6 Zappa Event:
{'time': '2019-09-22T20:24:24Z', 'detail-type': 'Scheduled Event', 'source': '',
'region': 'us-east-1', 'detail': {}, 'version': '0',
'resources': ['arn:aws:events:us-east-1:<>:rule/'],
'id': '823d2b37-6a85-c162-5084-1906492f4b93', 'kwargs': {}}
[1569183865861] hello world!!
# Show logs from specific timeframe
→ zappa tail dev --since 1m
# Show logs from specific timeframe and filter
→ zappa tail batch_secondary_us_east_1 --since 1d --filter "ERROR"
→ zappa invoke dev "import psycopg2; print('hello')" --raw
Calling invoke for stage dev..
[START] RequestId: e35516da-b71d-4452-9896-e622fe263d1f Version: $LATEST
[DEBUG] 2019-09-22T20:20:09.25Z e622fe263d1f Zappa Event:
{'raw_command': "import psycopg2; print('hello')"}
[END] RequestId: e35516da-b71d-4452-9896-e622fe263d1f
[REPORT] RequestId: e35516da-b71d-4452-9896-e622fe263d1f
Duration: 198.44 ms
Billed Duration: 200 ms
Memory Size: 512 MB
Max Memory Used: 84 MB
Init Duration: 525.29 ms
this can change response time from ~300 milliseconds to ~3 seconds ( )
default {"keep_warm": true} setting
{"slim_handler": true}
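Cold starts cost time because a fresh container must import the code before it can serve the first invocation, while later invocations reuse the same process. A sketch of how a handler could observe this, assuming hypothetical module-level state (none of these names come from the talk):

```python
import time

_LOADED_AT = time.time()  # runs once per container, at import time (cold start)
_INVOCATIONS = 0          # module state survives across warm invocations


def handler(event=None, context=None):
    """Report whether this invocation was the first one in its container."""
    global _INVOCATIONS
    _INVOCATIONS += 1
    return {
        "cold_start": _INVOCATIONS == 1,
        "container_age_seconds": time.time() - _LOADED_AT,
    }
```

Logging the `cold_start` flag is one way to check how often a keep-warm schedule actually spares users the slow path.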
Many more features..
Easy rollbacks with zappa rollback prod -n 1
Easy infra teardown with zappa undeploy prod
Other serverless frameworks
Overview of domi
{
    "app": {
        "app_function": "",
        "aws_region": "us-east-1",
        "slim_handler": false,
        "runtime": "python3.7",
        "certificate_arn": "arn:aws:acm:us-east-1:XXXXXX:certificate/XXXXXX",
        "domain": "",
        "keep_warm": true,
        "keep_warm_expression": "cron(0/3 12-4 ? * * *)",
        "timeout_seconds": 3
    },
    "batch_primary_us_east_1": {
        "slim_handler": false,
        "keep_warm": false,
        "aws_region": "us-east-1",
        "runtime": "python3.7",
        "events": [
            {
                "function": "",
                "expression": "cron(0 */2 * * ? *)"
            },
            {
                "function": "",
                "expression": "cron(15 */2 * * ? *)"
            },
            {
                "function": "",
                "expression": "cron(15 */2 * * ? *)"
            }
        ],
        "timeout_seconds": 900
    }
}
PostGIS is a spatial database extender for the PostgreSQL object-relational database
Run fast, powerful spatial queries
SELECT listings.*
FROM listings, user_regions
WHERE
    ST_Contains(user_regions.geom, listings.geom)
    AND bedrooms >= 1
    AND bathrooms >= 1
    AND ...
from geoalchemy2 import Geometry
from sqlalchemy import Column, ForeignKey, Integer

class Listing(BASE):
    __tablename__ = "listings"
    id = Column(Integer, primary_key=True)
    geom = Column(Geometry(geometry_type="POINT", srid=4326))
    bedrooms = Column(Integer)

class UserRegion(BASE):
    __tablename__ = "user_regions"
    id = Column(Integer, primary_key=True)
    user_id = Column(Integer, ForeignKey(""))
    geom = Column(Geometry(geometry_type="POLYGON", srid=4326))
from models import Listing, UserRegion, SESSION
from sqlalchemy import func
listings = (
    SESSION.query(Listing)  # assumed query/filter wrapper; elided on the slide
    .filter(
        UserRegion.user_id == 123,
        func.ST_Contains(UserRegion.geom, Listing.geom),
    )
    .all()
)
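To make the containment test concrete, here is a plain-Python illustration of what a point-in-polygon check does. This is a simplified planar ray-casting sketch, not how PostGIS implements `ST_Contains`. The function names and the `geom` dictionary key are hypothetical, and boundary points are not handled specially:

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: is `point` (x, y) inside `polygon` [(x, y), ...]?"""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray going right from the point cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def listings_in_region(listings, region):
    """Keep listings whose point falls inside the region polygon."""
    return [item for item in listings if point_in_polygon(item["geom"], region)]
```

Doing this in the database instead lets a spatial index prune most listings before any exact geometry test runs.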
Price Rank
Option 1: Clustering
Key problem with this approach:
Option 2: Linear Regression
Option 3: Quantile Regression
Feature Engineering
price ~ bedrooms + bathrooms + size + is_furnished + ...
But how do we account for location?
Area Encoding?
from sklearn.cluster import KMeans
X = df[['lat', 'long']].values
km = KMeans(20, init='k-means++')
clusters = km.fit_predict(X) # fit, then classify points into 1 of 20 clusters
price ~ bedrooms + bathrooms + size + is_furnished + ...
+ cluster_0 + cluster_1 + ...
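The `cluster_0 + cluster_1 + ...` terms are indicator columns derived from the cluster labels. A dependency-free sketch of that encoding, assuming integer labels in `0..n_clusters-1` (the helper name is hypothetical):

```python
def one_hot_clusters(cluster_ids, n_clusters):
    """Turn cluster labels into cluster_0..cluster_{n-1} indicator columns."""
    columns = {f"cluster_{k}": [] for k in range(n_clusters)}
    for cid in cluster_ids:
        for k in range(n_clusters):
            columns[f"cluster_{k}"].append(1 if cid == k else 0)
    return columns
```

When the regression includes an intercept, one indicator column is usually dropped to avoid perfect collinearity.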
Nearest Neighbors
👪 🏠 ...?... 🏠 👪
(can also use )
>>> from annoy import AnnoyIndex
# build the tree
>>> features = ["lat_scaled", "long_scaled", "bedrooms_scaled"]
>>> tree = AnnoyIndex(len(features), "euclidean")
>>> for index, row in df[features].iterrows():
...     tree.add_item(index, row.values)
>>> tree.build(10) # required before querying; 10 trees is an assumed value
# search da tree
>>> apartment_index = 1 # index of apartment to search
>>> tree.get_nns_by_item(apartment_index, 51) # get 50 closest points (the item itself comes back first)
[1, 23412, 424, 794, 12, 939, 58, 3, ...]
price ~ bedrooms + bathrooms + size + is_furnished + ...
+ nn_50_avg_price + ...
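A feature like `nn_50_avg_price` can be illustrated without an index library. The brute-force sketch below assumes each listing is a dict with hypothetical `features` and `price` keys, and skips the apartment itself so its own price does not leak into the feature. An approximate index such as Annoy replaces the linear scan when the dataset is large:

```python
import math


def nn_avg_price(listings, index, k):
    """Average price of the k nearest other listings (brute force, Euclidean)."""
    target = listings[index]["features"]
    distances = []
    for i, other in enumerate(listings):
        if i == index:
            continue  # exclude the apartment being scored
        distances.append((math.dist(target, other["features"]), other["price"]))
    distances.sort(key=lambda pair: pair[0])
    nearest = distances[:k]
    return sum(price for _, price in nearest) / len(nearest)
```

The function needs at least one other listing, since it divides by the number of neighbours found.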
Better handling for remote apartments (outskirts)
"price_rank_primary": {
    "project_name": "domi",
    "slim_handler": true,
    "memory_size": 3000,
    "apigateway_enabled": false,
    "keep_warm": false,
    "aws_region": "us-east-1",
    "runtime": "python3.7",
    "events": [
        {
            "function": "",
            "expression": "cron(0 */2 * * ? *)"
        }
    ],
    "timeout_seconds": 900
}
Displaying to users
User design considerations
All ya need is a little...
Monitoring with Great Expectations
Enter Great Expectations
Types of Expectations
Example: Validating Row Counts
{
    "data_asset_name": "yesterdays_craigslist_listings",
    "expectation_suite_name": "default",
    "expectations": [
        {
            "expectation_type": "expect_table_row_count_to_be_between",
            "kwargs": {
                "min_value": 300
            }
        }
    ]
}
from domi.db import DB_ENGINE
from great_expectations.dataset import SqlAlchemyDataset

sql_query = """
    SELECT *  -- assumed; the SELECT line is missing from the slide
    FROM {tablename}
"""
new_sql_dataset = SqlAlchemyDataset(custom_sql=sql_query, engine=DB_ENGINE)
validation_results = new_sql_dataset.validate(expectation_suite="expectations.json")
if validation_results["success"]:
    ...
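The idea behind a row-count expectation fits in a few lines. The sketch below is a plain-Python stand-in, not the Great Expectations API. The function name and result layout are assumptions, loosely modelled on the `success` flag checked above:

```python
def expect_row_count_between(rows, min_value=None, max_value=None):
    """Check that the number of rows falls within optional bounds."""
    count = len(rows)
    ok = (min_value is None or count >= min_value) and (
        max_value is None or count <= max_value
    )
    return {"success": ok, "result": {"observed_value": count}}
```

A scheduled job can run such a check after each scrape and alert when `success` is false, for example when a source site changes its markup and the scraper silently returns almost nothing.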
Example: Model Monitoring with Distributional
Serverless Gotchas and Workarounds
Gotcha 1: Shared SESSION object
from domi.handlers import process_new_listings
from domi.db import SESSION
# 👆 everything instantiated above here is shared across future function invocations
def lambda_handler(event, context):
# Automatically ensure all transactions are successfully committed,
# or rolled back if not
def commit_session(_raise=True):
if not SESSION:
except Exception as e:
if _raise:
def session_committer(func):
def wrapper(*args, **kwargs):
return func(*args, **kwargs)
return wrapper
# Use decorator on any function doing database transactions
def process_new_listings():
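The slide code above is heavily truncated, so here is a self-contained sketch of the commit-or-rollback decorator pattern it describes. `FakeSession` is a hypothetical stand-in for a SQLAlchemy session, and the function bodies are assumptions rather than the author's implementation:

```python
import functools


class FakeSession:
    """Hypothetical stand-in for a database session; records calls in a log."""

    def __init__(self):
        self.log = []

    def commit(self):
        self.log.append("commit")

    def rollback(self):
        self.log.append("rollback")


SESSION = FakeSession()  # module level, so shared across warm invocations


def session_committer(func):
    """Commit after the wrapped function succeeds; roll back and re-raise on error."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            result = func(*args, **kwargs)
            SESSION.commit()
            return result
        except Exception:
            SESSION.rollback()
            raise

    return wrapper


@session_committer
def process_new_listings(fail=False):
    if fail:
        raise ValueError("simulated failure")
    return "ok"
```

Because the session lives at module level, a failed transaction left open in one invocation would otherwise still be pending when the next invocation reuses the container, which is why every entry point is wrapped.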
Gotcha 2: Slim handler kills performance on cold
{"slim_handler": true} saves deployment package to S3
"callbacks": { // Call custom functions during the local Zappa deployment/update process
    "settings": "my_app.settings_callback", // After loading the settings
    "zip": "my_app.zip_callback", // After creating the package
    "post": "my_app.post_callback" // After command has executed
}
See for more details.
"app": {
    "app_function": "",
    "aws_region": "us-east-1",
    "runtime": "python3.7",
    "certificate_arn": "arn:aws:acm:us-east-1:XXXXXX:certificate/XXXXXX",
    "domain": "",
    "keep_warm": true,
    "keep_warm_expression": "cron(0/3 12-4 ? * * *)",
    "timeout_seconds": 3,
    // updated settings
    "slim_handler": false,
    "regex_excludes": [
        "pandas", "scipy", "numpy", "PIL", "statsmodels", "matplotlib"
    ],
    "callbacks": {
        "zip": "zappa_package_cleaner.main"
    }
}
Gotcha 3: Adding new, pre-compiled packages
(⚠ hackiness follows)
try:
    # when running locally this will import successfully
    from annoy import AnnoyIndex
except ImportError:
    # when running on lambda, this will fail and fallback to pre-compiled version
    from lambda_annoy import AnnoyIndex
Workaround (long term)
Wrapping up...
Deploying with Github Actions
import statsmodels.formula.api as smf
mod = smf.quantreg('foodexp ~ income', data) # uses patsy model formulas
res = mod.fit(q=0.5) # call elided on the slide; q=0.5 (the median) is an assumed value
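Quantile regression differs from ordinary least squares in its loss function: it minimises the pinball (quantile) loss rather than squared error. A dependency-free sketch of that loss, with hypothetical helper names, showing that the best constant prediction at `q=0.5` is the median:

```python
def pinball_loss(y_true, y_pred, q):
    """Mean quantile (pinball) loss for quantile level q in (0, 1)."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        diff = y - p
        # Under-prediction is weighted by q, over-prediction by (1 - q)
        total += q * diff if diff >= 0 else (q - 1) * diff
    return total / len(y_true)


def best_constant(y_true, q, candidates):
    """Pick the candidate constant prediction with the lowest pinball loss."""
    return min(candidates, key=lambda c: pinball_loss(y_true, [c] * len(y_true), q))
```

Raising `q` penalises under-prediction more heavily, so the fitted value moves toward the expensive end of the price distribution.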
Bootstrapping a data-driven application with Zappa
Find an apartment in Toronto with serverless Python