The document provides the agenda for a Master AWS Redshift training event, part of CloudZone's Big Data Month 2016. The agenda comprises an introduction and two lab sessions on using Amazon Redshift for data warehousing, followed by ironSource's presentation on processing 200 billion data events per month with Node.js and Docker on AWS, including tips for optimizing Redshift performance.
CloudZone Big Data Month 2016 Agenda for Mastering AWS Redshift
1. All content is the property and proprietary interest of CloudZone. The removal of any proprietary notices, including attribution information, is strictly prohibited.
2. Big Data Month 2016 – Up Next…
Session dates: 14.11, 15.11, 22.11, 22.11, 28.11, 30.11
3. Master AWS Redshift - Agenda
13:00 – 13:20 Intro to Amazon Redshift by ironSource
13:20 – 15:00 LAB I – Using Amazon Redshift
15:00 – 15:15 Break
15:15 – 17:25 LAB II – Table Layout and Schema Design with Amazon Redshift
17:25 – 17:30 Your next steps on AWS by CloudZone
4. Shimon Tolts
General Manager, Data Solutions
Atom – Data Pipeline: Processing 200B Events with Node.js and Docker on AWS
5. About ironSource: Hypergrowth
● 800M People Reached Each Month
● 4200 Apps Installed Every Minute with the ironSource Platform
● 200B Registered & Analyzed Data Events Every Month
[Chart: registered & analyzed data events per month, growing from about 50B in Jun 2015 to 200B in May 2016]
6. Our Business Challenge
We needed a way to manage this data:
Collect → Process → Store
8. Collection
● Multi-region layer – latency-based routing
● Low latency from client to Atom servers
● High availability – AWS regions do fail!
● Store raw data + headers upon receiving (see the sketch below)
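A rough sketch of what such a collection endpoint can look like in Node.js: acknowledge only after the raw payload and its headers are durably stored, and leave regional failover to latency-based DNS routing. The endpoint path, bucket name, and library choices here are illustrative assumptions, not Atom's actual code.

```typescript
// Hypothetical collection endpoint: persist the raw payload plus request
// headers the moment they arrive, before any processing, so no event is
// lost if a later pipeline stage fails.
import express from "express";
import { S3 } from "aws-sdk";

const app = express();
const s3 = new S3(); // region is taken from the instance environment

app.post("/track", express.raw({ type: "*/*" }), async (req, res) => {
  const key = `raw/${Date.now()}-${Math.random().toString(36).slice(2)}.json`;
  try {
    await s3
      .putObject({
        Bucket: "atom-raw-events", // hypothetical bucket name
        Key: key,
        Body: JSON.stringify({
          headers: req.headers, // keep headers alongside the payload
          body: req.body.toString("utf8"),
          receivedAt: new Date().toISOString(),
        }),
      })
      .promise();
    res.status(200).end(); // ack only after the raw copy is durable
  } catch (err) {
    res.status(500).end(); // client retries; HA comes from multi-region DNS
  }
});

app.listen(8080);
```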
9. Data Enrichment
● Enrich data before storing it in your data lake and/or warehouse:
○ IP to country
○ Currency conversion
○ Decrypt data
○ User-agent parsing – OS, browser, device...
● Any custom logic you would like – fully extensible (see the sketch below)
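One way to structure that kind of pluggable enrichment is a list of pure functions applied in order, so custom logic is just another entry in the list. A minimal sketch; geoip-lite and ua-parser-js are stand-in libraries chosen for illustration, and the event field names are assumptions.

```typescript
// Sketch of a pluggable enrichment stage: each enricher is a pure
// function over the event; adding custom logic means appending to the list.
import geoip from "geoip-lite"; // IP -> country lookup (illustrative choice)
import { UAParser } from "ua-parser-js"; // user agent -> OS/browser/device

type Event = Record<string, unknown> & { ip?: string; userAgent?: string };
type Enricher = (e: Event) => Event;

const ipToCountry: Enricher = (e) => ({
  ...e,
  country: e.ip ? geoip.lookup(e.ip)?.country : undefined,
});

const parseUserAgent: Enricher = (e) => {
  if (!e.userAgent) return e;
  const ua = new UAParser(e.userAgent).getResult();
  return { ...e, os: ua.os.name, browser: ua.browser.name, device: ua.device.type };
};

// Currency conversion, decryption, or any custom logic slot in the same way.
const enrichers: Enricher[] = [ipToCountry, parseUserAgent];

export const enrich = (e: Event): Event =>
  enrichers.reduce((acc, fn) => fn(acc), e);
```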
10. Data Targets
● Near real-time data insertion – 1 minute!
● Stream data to Google Cloud Storage and/or AWS S3
● Smart insertion of data into AWS Redshift (see the sketch below)
○ Set the number of parallel COPYs
○ Configure priority per table
● BigQuery – load data via batch file imports rather than streaming inserts (saves 20% of the cost)
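The Redshift bullets above translate roughly into: write micro-batches to S3, then COPY them in, with a connection-pool cap deciding how many COPYs run in parallel and a priority sort deciding which tables load first. A sketch with node-postgres; the cluster endpoint, credentials, IAM role, and table names are placeholders.

```typescript
// Sketch: drain a queue of S3 batch files into Redshift via COPY.
// The pg pool size caps concurrent COPYs; a priority sort decides order.
import { Pool } from "pg"; // Redshift speaks the Postgres protocol

const MAX_PARALLEL_COPIES = 2; // tune per cluster; unbounded COPYs starve it

const pool = new Pool({
  host: "my-cluster.example.redshift.amazonaws.com", // placeholder endpoint
  port: 5439,
  database: "analytics",
  user: "loader",
  password: process.env.REDSHIFT_PASSWORD,
  max: MAX_PARALLEL_COPIES, // pool size = COPY concurrency limit
});

type Batch = { table: string; s3Path: string; priority: number };

async function copyBatch(b: Batch): Promise<void> {
  // JSON 'auto' maps JSON keys to columns; the IAM role ARN is a placeholder.
  await pool.query(
    `COPY ${b.table}
     FROM '${b.s3Path}'
     IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy'
     FORMAT AS JSON 'auto'`
  );
}

export async function drainQueue(queue: Batch[]): Promise<void> {
  // Higher-priority tables load first ("configure priority per table").
  queue.sort((a, b) => b.priority - a.priority);
  // Promise.all issues everything, but the pool admits only
  // MAX_PARALLEL_COPIES connections at a time; the rest wait in line.
  await Promise.all(queue.map(copyBatch));
  await pool.end();
}
```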
13. Docker
● Linux containers
● Save provisioning time
● Infrastructure as code (see the sketch below)
● Dev-Test-Production – identical containers
● Ship easily
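For a Node.js service like the ones above, the "identical container in dev, test, and production" idea comes down to a Dockerfile along these lines. A minimal sketch; the base image tag, file layout, and entry point are illustrative.

```dockerfile
# Illustrative Dockerfile for a Node.js pipeline service: one image runs
# unchanged in dev, test, and production, and provisioning a new instance
# becomes a `docker run`.

# Node 4 "argon" LTS tag of the era; pick whichever tag you target.
FROM node:argon

WORKDIR /app

# Install dependencies first so Docker's layer cache skips this step
# when only application code changes.
COPY package.json ./
RUN npm install --production

COPY . .

EXPOSE 8080
CMD ["node", "server.js"]
```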
14. Cloud infrastructure
● Pay as you go (and grow)
● SaaS services
● Auto Scaling groups
● DynamoDB
● RDS (*SQL)
● Redshift data warehouse
15. Continuous Integration
● From commit to production
● Jenkins commit hook
● Git branching model
● AWS dynamic slaves
● Unit tests
● Docker builds
● Updating live environment
18. STARTING POINT
● Xplenty – Hadoop service – ~40 min queries
● One big cluster – 96 xlarge nodes
● No WLM configuration
● CSV copy
● No reserved nodes
● A different ETL process implemented by every department
21. SOLUTION
● Using 8xl nodes where needed
● A Redshift cluster per department
● “Hot and cold” clusters – SSD: fast and furious; HDD: slow but cheap
● WLM configuration (see the example below)
● Reserved nodes
● JSON copy
● One pipeline to rule them all – ironBeast – currently supporting over 50B events per month, inserting data into more than 10 Redshift clusters
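For reference, WLM is configured as a JSON array in the cluster parameter group (the wlm_json_configuration parameter). The queue names, concurrency limits, and memory split below are illustrative, not ironSource's actual settings; queries are routed into a queue with SET query_group, and the final entry with no query_group is the default queue.

```json
[
  { "query_group": ["etl"],        "query_concurrency": 3,  "memory_percent_to_use": 60 },
  { "query_group": ["dashboards"], "query_concurrency": 10, "memory_percent_to_use": 30 },
  { "query_concurrency": 5 }
]
```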
23. THINGS WE LEARNED ALONG THE WAY
● https://github.com/awslabs/amazon-redshift-utils (AdminViews)
● User permissions do not apply to new tables created later in a schema – set default privileges (see the sketch after this list)
● Vacuum, vacuum, vacuum
● Avoid parallel inserts (especially on 8xl nodes) – if you copy to multiple tables, it is better to implement a COPY queue
● STL_LOAD_ERRORS – money on the floor: check it regularly for silently failed loads
● A columnar datastore does not mean you can use as many columns as you want – it is better to split into multiple tables
● Encode your columns – ‘analyze compression’
● Instances that query Redshift should use MTU 1500 – link
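Several of those learnings map to concrete statements that can run on a schedule. A sketch of driving them from Node.js with node-postgres; the cluster endpoint, schema, table, and group names are placeholders.

```typescript
// Sketch: recurring Redshift housekeeping implied by the list above.
// Names are placeholders; run as a superuser or the table owner.
import { Client } from "pg";

async function housekeeping(): Promise<void> {
  const client = new Client({
    host: "my-cluster.example.redshift.amazonaws.com", // placeholder endpoint
    port: 5439,
    database: "analytics",
    user: "admin",
    password: process.env.REDSHIFT_PASSWORD,
  });
  await client.connect();

  // "Vacuum, vacuum, vacuum": reclaim space and re-sort after heavy loads.
  await client.query("VACUUM events");
  await client.query("ANALYZE events");

  // "Encode your columns": ask Redshift to recommend compression encodings.
  const enc = await client.query("ANALYZE COMPRESSION events");
  console.log(enc.rows);

  // Permissions granted on a schema do not cover tables created later;
  // default privileges close that gap for future tables.
  await client.query(
    "ALTER DEFAULT PRIVILEGES IN SCHEMA analytics GRANT SELECT ON TABLES TO GROUP readers"
  );

  // "Money on the floor": surface recent load errors instead of losing rows silently.
  const errs = await client.query(
    "SELECT starttime, filename, err_reason FROM stl_load_errors ORDER BY starttime DESC LIMIT 20"
  );
  console.log(errs.rows);

  await client.end();
}

housekeeping().catch(console.error);
```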