Lecole Cole @lecole
Steffany Boldrini @steffbold
HOWTO BUILD A BIG DATA
APPLICATION
We start off by building 3-tier applications
• Web Server
• Application Server
• Database
HOWTO BUILD A BIG DATA
APPLICATION
We break down the parts to enable scaling
• Remove state from Application server
• Shard Database
• Introduce caching
3TIER ARCHITECTURE V1
Data Collection Instances
client
mobile client
Data Collection
Data Collection Instances
Data Collection Instances
Data Analysis
Elastic Load
Balancing
router
Amazon

Route 53
Internet
Gateway
Internet
Gateway
Application Server
MySQL DB instances
Application Instances
Application Instances
Application Instances
Business Users
HOWTO BUILD A BIG DATA
APPLICATION
To deal with data volume we move to NoSQL Database
• Columnar database
• Fast reads, No Joins
3TIER ARCHITECTURE V1
Data Collection Instances
client
mobile client
Data Collection
Data Collection Instances
Data Collection Instances
Data Analysis
Elastic Load
Balancing
router
Amazon

Route 53
Internet
Gateway
Internet
Gateway
Application Server
MySQL DB instances
Application Instances
Application Instances
Application Instances
Business Users
3TIER ARCHITECTURE V2
Data Collection Instances
client
mobile client
Data Collection
Data Collection Instances
Data Collection Instances
Elastic Load
Balancing
router
Amazon

Route 53
Internet
Gateway
Internet
Gateway
Application Server
Application Instances
Amazon

DynamoDB
Cache Node
Data Analysis
Business Users
Application Instances
Application Instances
HOWTO BUILD A BIG DATA
APPLICATION
But we still need SQL of some parts of our application
• We add Redshift data warehouse
• Columnar database
• Fast reads
• SQL engine
• Petabyte Scale data warehouse
3TIER ARCHITECTURE V2
Data Collection Instances
client
mobile client
Data Collection
Data Collection Instances
Data Collection Instances
Elastic Load
Balancing
router
Amazon

Route 53
Internet
Gateway
Internet
Gateway
Application Server
Application Instances
Amazon

DynamoDB
Cache Node
Amazon 

Redshift
Data Analysis
Business Users
Application Instances
Application Instances
HOWTO BUILD A BIG DATA
APPLICATION
Let push a little more into removing as many “servers” as
possible.
• How can we remove the Data Collection servers
• How can we remove Application Servers
• How can we improve thought put
3TIER ARCHITECTURE V2
Data Collection Instances
client
mobile client
Data Collection
Data Collection Instances
Data Collection Instances
Elastic Load
Balancing
router
Amazon

Route 53
Internet
Gateway
Internet
Gateway
Application Server
Application Instances
Amazon

DynamoDB
Cache Node
Amazon 

Redshift
Data Analysis
Business Users
Application Instances
Application Instances
3TIER SERVERLESS
client
mobile client
Eventful Data Collection
Amazon

DynamoDB
Cache Node
Amazon 

Redshift
Application Instances
Application Server
Elastic Load
Balancing
router
Amazon

Route 53
Internet
Gateway
Internet
Gateway
AWS
Lambda
Amazon API
Gateway
Amazon Machine
Learning
Data Analysis
Business Users
Application Instances
Application Instances
HOWTO BUILD A BIG DATA
APPLICATION
Let push a little more into removing as many “servers” as
possible.
• How can we remove the Data Collection servers
• How can we remove Application Servers
• How can we improve thought put
3TIER SERVERLESS
client
mobile client
Eventful Data Collection
Amazon

DynamoDB
Cache Node
Amazon 

Redshift
Application Instances
Application Server
Elastic Load
Balancing
router
Amazon

Route 53
Internet
Gateway
Internet
Gateway
AWS
Lambda
Amazon API
Gateway
Amazon Machine
Learning
Data Analysis
Business Users
Application Instances
Application Instances
HOWTO BUILD A BIG DATA
APPLICATION
Removing your application servers is not so easy.
• You run your application server as a monolith
• You have a proven build/deployment process
• You understand how to debug your application
locally
3TIER SERVERLESS
client
mobile client
Eventful Data Collection
Amazon

DynamoDB
Cache Node
Amazon 

Redshift
Application Instances
Application Server
Elastic Load
Balancing
router
Amazon

Route 53
Internet
Gateway
Internet
Gateway
AWS
Lambda
Amazon API
Gateway
Amazon Machine
Learning
Data Analysis
Business Users
Application Instances
Application Instances
MICRO SERVICES
Some of you may have seen the Netflix’s Micro-
services diagram
MICRO SERVICES
• It’s not as scary as you think
• Most applications are small (Compared to Netflix)
• There are many frameworks to manage your
application
• Serverless (Multi Cloud, Multi language)
• AWS SAM (AWS specific, Multi language)
MICRO SERVICES
This architecture requires different tooling and mindset
• More difficult to run offline
• More difficult to debug
• More difficult to Monitor
• Many more places where it can and will break
MICRO SERVICES
With all these problem why would I want to more to
Micro Services.
• Many different presentations on why
• Microservices at Netflix scale (Great video)
3TIER SERVERLESS
client
mobile client
Eventful Data Collection
Amazon

DynamoDB
Cache Node
Amazon 

Redshift
Application Instances
Application Server
Elastic Load
Balancing
router
Amazon

Route 53
Internet
Gateway
Internet
Gateway
AWS
Lambda
Amazon API
Gateway
Amazon Machine
Learning
Data Analysis
Business Users
Application Instances
Application Instances
SERVERLESS
client
mobile client
Internet
Gateway
AWS
Lambda
Amazon API
Gateway
Amazon
Kinesis
Amazon
Elastic Search
Service
Amazon
Kinesis
Analytics
React Static
Application
Data Analysis
Business Users
Amazon

DynamoDB
Cache Node
Amazon 

Redshift
Amazon Machine
Learning
AWS
Lambda
Amazon API
Gateway
AWS
Lambda
AWS
Lambda
MOVINGYOUR APPLICATION
TO SERVERLESS
What are the steps we took to move our monolithic
Django/Java/NodeJS app to Serverless.
• Separate presentation logic from business logic
• Understand time requirements for each function
• Understand the input and output payload for each
function
MOVINGYOUR APPLICATION
TO SERVERLESS
What technologies will you use.
• Serverless framework
• Similar to AWS SAM
• Multi Language (Python, Go, C#, Javascript, Java)
• Multi Platform (AWS, GCP, Microsoft, IBM)
MOVINGYOUR APPLICATION
TO SERVERLESS
Understanding the limitations and constraints.
• AWS Lambda
• 5 minute execution time
• 50 mb function bundles
• API Gateway
• 30 Second time out
• 10 MB payload limit
MOVINGYOUR APPLICATION
TO SERVERLESS
Breaking down the frontend of our monolith application.
• Separate view logic and business logic
• Create libraries out of business logic to simplify
sharing
• View logic and template gets translated to
ReactJS components
• API Gateway endpoints are created for each
view
MOVINGYOUR APPLICATION
TO SERVERLESS
Wait how do we manage 30-50 endpoints.
• Thats where Serverless framework comes into
play
• Define each endpoint usingYAML
• Define access each Lambda function to API
Gateway connection
• Define access to other resources needed.
MOVINGYOUR APPLICATION
TO SERVERLESS
But my functions takes longer then 5 minutes to
execute.
• Well there is still a problem.
• API Gateway has a 30 second time out.
• So your functions have to execute within 30
seconds.
• That will not work for my application. (My
application too)
MOVINGYOUR APPLICATION
TO SERVERLESS
How to get passed the 30 second time out from API Gateway.
• Create ticketing system.
• ReactJS component requests data from the backend,
and the backend returned aTicket ID.
• ReactJS application Polls for results to the backend
every (x) seconds.
• Ticketing systems checks status and returns to the
frontend the job has completed and the data is ready.
• ReactJS application requests data for itsTicket ID.
MOVINGYOUR APPLICATION
TO SERVERLESS
How do I do all of that with ReactJS.
• We wont go in-depth with ReactJS (Thats a whole talk by
itself).
• With ReactJS we use Redux for data flow coordination.
• And Redux Saga for side effects (Ajax calls to the server).
• Redux Saga allows you to create a kind of demon
process for your data fetching from the backend.
• We create a long demon sequence to poll and fetch
data after the processing is complete.
MOVINGYOUR APPLICATION
TO SERVERLESS
Wait I thought you need NodeJS to run ReactJS
• ReactJS is just Javascript and can be ran from a
CDN.
• You can compile your ReactJS application into a
Javascript bundle thats loaded from a Static HTML
page.
• Once RectJS is loaded it runs from the users
browser.
MOVINGYOUR APPLICATION
TO SERVERLESS
What about this ticketing system.
• This is another place Lambda comes in handy.
• You can asynchronous launch a lambda function
per request.
• If you need a complex workflow, you can use AWS
Step functions to synchronize your lambda
functions.
MOVINGYOUR APPLICATION
TO SERVERLESS
Put all that together you get something pretty
interesting.
• No longer paying for idle time.
• Better react to spikes in your system.
• Ability to scale dynamically.
• Ability to start off small and grow.
3TIER SERVERLESS
client
mobile client
Eventful Data Collection
Amazon

DynamoDB
Cache Node
Amazon 

Redshift
Application Instances
Application Server
Elastic Load
Balancing
router
Amazon

Route 53
Internet
Gateway
Internet
Gateway
AWS
Lambda
Amazon API
Gateway
Amazon Machine
Learning
Data Analysis
Business Users
Application Instances
Application Instances
SERVERLESS
client
mobile client
Internet
Gateway
AWS
Lambda
Amazon API
Gateway
Amazon
Kinesis
Amazon
Elastic Search
Service
Amazon
Kinesis
Analytics
React Static
Application
Data Analysis
Business Users
Amazon

DynamoDB
Cache Node
Amazon 

Redshift
Amazon Machine
Learning
AWS
Lambda
Amazon API
Gateway
AWS
Lambda
AWS
Lambda
MOVINGYOUR APPLICATION
TO SERVERLESS
Its not all good.
• Need for increase monitoring.
• Need for increase Alerting.
• Need for better logging.
• Need for better error handling.
QUESTIONS

How to Build a Big Data Application: Serverless Edition

  • 1.
    Lecole Cole @lecole SteffanyBoldrini @steffbold
  • 2.
    HOWTO BUILD ABIG DATA APPLICATION We start off by building 3-tier applications • Web Server • Application Server • Database
  • 3.
    HOWTO BUILD ABIG DATA APPLICATION We break down the parts to enable scaling • Remove state from Application server • Shard Database • Introduce caching
  • 4.
    3TIER ARCHITECTURE V1 DataCollection Instances client mobile client Data Collection Data Collection Instances Data Collection Instances Data Analysis Elastic Load Balancing router Amazon
 Route 53 Internet Gateway Internet Gateway Application Server MySQL DB instances Application Instances Application Instances Application Instances Business Users
  • 5.
    HOWTO BUILD ABIG DATA APPLICATION To deal with data volume we move to NoSQL Database • Columnar database • Fast reads, No Joins
  • 6.
    3TIER ARCHITECTURE V1 DataCollection Instances client mobile client Data Collection Data Collection Instances Data Collection Instances Data Analysis Elastic Load Balancing router Amazon
 Route 53 Internet Gateway Internet Gateway Application Server MySQL DB instances Application Instances Application Instances Application Instances Business Users
  • 7.
    3TIER ARCHITECTURE V2 DataCollection Instances client mobile client Data Collection Data Collection Instances Data Collection Instances Elastic Load Balancing router Amazon
 Route 53 Internet Gateway Internet Gateway Application Server Application Instances Amazon
 DynamoDB Cache Node Data Analysis Business Users Application Instances Application Instances
  • 8.
    HOWTO BUILD ABIG DATA APPLICATION But we still need SQL of some parts of our application • We add Redshift data warehouse • Columnar database • Fast reads • SQL engine • Petabyte Scale data warehouse
  • 9.
    3TIER ARCHITECTURE V2 DataCollection Instances client mobile client Data Collection Data Collection Instances Data Collection Instances Elastic Load Balancing router Amazon
 Route 53 Internet Gateway Internet Gateway Application Server Application Instances Amazon
 DynamoDB Cache Node Amazon 
 Redshift Data Analysis Business Users Application Instances Application Instances
  • 10.
    HOWTO BUILD ABIG DATA APPLICATION Let push a little more into removing as many “servers” as possible. • How can we remove the Data Collection servers • How can we remove Application Servers • How can we improve thought put
  • 11.
    3TIER ARCHITECTURE V2 DataCollection Instances client mobile client Data Collection Data Collection Instances Data Collection Instances Elastic Load Balancing router Amazon
 Route 53 Internet Gateway Internet Gateway Application Server Application Instances Amazon
 DynamoDB Cache Node Amazon 
 Redshift Data Analysis Business Users Application Instances Application Instances
  • 12.
    3TIER SERVERLESS client mobile client EventfulData Collection Amazon
 DynamoDB Cache Node Amazon 
 Redshift Application Instances Application Server Elastic Load Balancing router Amazon
 Route 53 Internet Gateway Internet Gateway AWS Lambda Amazon API Gateway Amazon Machine Learning Data Analysis Business Users Application Instances Application Instances
  • 13.
    HOWTO BUILD ABIG DATA APPLICATION Let push a little more into removing as many “servers” as possible. • How can we remove the Data Collection servers • How can we remove Application Servers • How can we improve thought put
  • 14.
    3TIER SERVERLESS client mobile client EventfulData Collection Amazon
 DynamoDB Cache Node Amazon 
 Redshift Application Instances Application Server Elastic Load Balancing router Amazon
 Route 53 Internet Gateway Internet Gateway AWS Lambda Amazon API Gateway Amazon Machine Learning Data Analysis Business Users Application Instances Application Instances
  • 15.
    HOWTO BUILD ABIG DATA APPLICATION Removing your application servers is not so easy. • You run your application server as a monolith • You have a proven build/deployment process • You understand how to debug your application locally
  • 16.
    3TIER SERVERLESS client mobile client EventfulData Collection Amazon
 DynamoDB Cache Node Amazon 
 Redshift Application Instances Application Server Elastic Load Balancing router Amazon
 Route 53 Internet Gateway Internet Gateway AWS Lambda Amazon API Gateway Amazon Machine Learning Data Analysis Business Users Application Instances Application Instances
  • 17.
    MICRO SERVICES Some ofyou may have seen the Netflix’s Micro- services diagram
  • 18.
    MICRO SERVICES • It’snot as scary as you think • Most applications are small (Compared to Netflix) • There are many frameworks to manage your application • Serverless (Multi Cloud, Multi language) • AWS SAM (AWS specific, Multi language)
  • 19.
    MICRO SERVICES This architecturerequires different tooling and mindset • More difficult to run offline • More difficult to debug • More difficult to Monitor • Many more places where it can and will break
  • 20.
    MICRO SERVICES With allthese problem why would I want to more to Micro Services. • Many different presentations on why • Microservices at Netflix scale (Great video)
  • 21.
    3TIER SERVERLESS client mobile client EventfulData Collection Amazon
 DynamoDB Cache Node Amazon 
 Redshift Application Instances Application Server Elastic Load Balancing router Amazon
 Route 53 Internet Gateway Internet Gateway AWS Lambda Amazon API Gateway Amazon Machine Learning Data Analysis Business Users Application Instances Application Instances
  • 22.
    SERVERLESS client mobile client Internet Gateway AWS Lambda Amazon API Gateway Amazon Kinesis Amazon ElasticSearch Service Amazon Kinesis Analytics React Static Application Data Analysis Business Users Amazon
 DynamoDB Cache Node Amazon 
 Redshift Amazon Machine Learning AWS Lambda Amazon API Gateway AWS Lambda AWS Lambda
  • 23.
    MOVINGYOUR APPLICATION TO SERVERLESS Whatare the steps we took to move our monolithic Django/Java/NodeJS app to Serverless. • Separate presentation logic from business logic • Understand time requirements for each function • Understand the input and output payload for each function
  • 24.
    MOVINGYOUR APPLICATION TO SERVERLESS Whattechnologies will you use. • Serverless framework • Similar to AWS SAM • Multi Language (Python, Go, C#, Javascript, Java) • Multi Platform (AWS, GCP, Microsoft, IBM)
  • 25.
    MOVINGYOUR APPLICATION TO SERVERLESS Understandingthe limitations and constraints. • AWS Lambda • 5 minute execution time • 50 mb function bundles • API Gateway • 30 Second time out • 10 MB payload limit
  • 26.
    MOVINGYOUR APPLICATION TO SERVERLESS Breakingdown the frontend of our monolith application. • Separate view logic and business logic • Create libraries out of business logic to simplify sharing • View logic and template gets translated to ReactJS components • API Gateway endpoints are created for each view
  • 27.
    MOVINGYOUR APPLICATION TO SERVERLESS Waithow do we manage 30-50 endpoints. • Thats where Serverless framework comes into play • Define each endpoint usingYAML • Define access each Lambda function to API Gateway connection • Define access to other resources needed.
  • 28.
    MOVINGYOUR APPLICATION TO SERVERLESS Butmy functions takes longer then 5 minutes to execute. • Well there is still a problem. • API Gateway has a 30 second time out. • So your functions have to execute within 30 seconds. • That will not work for my application. (My application too)
  • 29.
    MOVINGYOUR APPLICATION TO SERVERLESS Howto get passed the 30 second time out from API Gateway. • Create ticketing system. • ReactJS component requests data from the backend, and the backend returned aTicket ID. • ReactJS application Polls for results to the backend every (x) seconds. • Ticketing systems checks status and returns to the frontend the job has completed and the data is ready. • ReactJS application requests data for itsTicket ID.
  • 30.
    MOVINGYOUR APPLICATION TO SERVERLESS Howdo I do all of that with ReactJS. • We wont go in-depth with ReactJS (Thats a whole talk by itself). • With ReactJS we use Redux for data flow coordination. • And Redux Saga for side effects (Ajax calls to the server). • Redux Saga allows you to create a kind of demon process for your data fetching from the backend. • We create a long demon sequence to poll and fetch data after the processing is complete.
  • 31.
    MOVINGYOUR APPLICATION TO SERVERLESS WaitI thought you need NodeJS to run ReactJS • ReactJS is just Javascript and can be ran from a CDN. • You can compile your ReactJS application into a Javascript bundle thats loaded from a Static HTML page. • Once RectJS is loaded it runs from the users browser.
  • 32.
    MOVINGYOUR APPLICATION TO SERVERLESS Whatabout this ticketing system. • This is another place Lambda comes in handy. • You can asynchronous launch a lambda function per request. • If you need a complex workflow, you can use AWS Step functions to synchronize your lambda functions.
  • 33.
    MOVINGYOUR APPLICATION TO SERVERLESS Putall that together you get something pretty interesting. • No longer paying for idle time. • Better react to spikes in your system. • Ability to scale dynamically. • Ability to start off small and grow.
  • 34.
    3TIER SERVERLESS client mobile client EventfulData Collection Amazon
 DynamoDB Cache Node Amazon 
 Redshift Application Instances Application Server Elastic Load Balancing router Amazon
 Route 53 Internet Gateway Internet Gateway AWS Lambda Amazon API Gateway Amazon Machine Learning Data Analysis Business Users Application Instances Application Instances
  • 35.
    SERVERLESS client mobile client Internet Gateway AWS Lambda Amazon API Gateway Amazon Kinesis Amazon ElasticSearch Service Amazon Kinesis Analytics React Static Application Data Analysis Business Users Amazon
 DynamoDB Cache Node Amazon 
 Redshift Amazon Machine Learning AWS Lambda Amazon API Gateway AWS Lambda AWS Lambda
  • 36.
    MOVINGYOUR APPLICATION TO SERVERLESS Itsnot all good. • Need for increase monitoring. • Need for increase Alerting. • Need for better logging. • Need for better error handling.
  • 37.