Redshift and Why the 'Like' in 'PostgreSQL Like' Matters


This talk goes through Metail's experiences with Redshift and highlights some of the design decisions we made and why, particularly where we tripped ourselves up by leaning too heavily on the fact that Redshift is a PostgreSQL fork.


1
Metail's Redshift Experience and Why the 'Like'
in 'Postgres Like' Is Important
Gareth Rogers, Data Engineer

2
Metail lets you try on clothes online
Discover clothes on
your body shape
Create, save outfits
and share
Shop with confidence
of size and fit

3
Proven impact as validated by
American business schools and A/B tests
“…customers who had access to the fitting tool are more likely to come
back to the site, and this effect is statistically significant…”
“…shows approximately a 5.1 percent reduction in returns
compared to the control group… In other words, providing fit information
reduces average fulfilment costs”
“…sales for users with access to the tool were substantially higher overall - 22.32 percent larger”
Source: “The Value of Fit Information in Online Retail: Evidence from a
Randomized Field Experiment” by Prof Santiago Gallino (Dartmouth College,
Tuck School of Business) & Prof Antonio Moreno (Northwestern University), Oct 21,
2015
1000+ GARMENTS
3M DATA POINTS

4
Architecture
For comparison with a more modern flow, see:
http://tech.metail.com/elastic-mapreduce-metail-aws-loft-london/
[Architecture diagram; labelled components include User DB and DynamoDB]

6
Creating the Cluster
• Compute capacity vs storage capacity
– Compute and storage are tightly coupled
– We load everything into Redshift: more than 3TB of data so far
– At 1GB per day, a cluster sized for our compute needs would last 10 sprints; at 30GB per day, not nearly as long
– A six-node dc1.8xlarge cluster costs $957.60 per week at on-demand pricing
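With compute and storage coupled, the sizing question on this slide is simple arithmetic: how fast you ingest decides how soon you must add nodes, and nodes are what you pay for. The sketch below takes the slide's $957.60 per week at face value (that works out to $5.70 per cluster-hour) and computes storage runway at a constant ingest rate. It assumes a steady daily ingest, and `CAPACITY_GB` is a hypothetical usable capacity chosen for illustration, not a figure from the talk.

```python
HOURS_PER_WEEK = 24 * 7  # 168


def hourly_rate(weekly_cost: float, hours: int = HOURS_PER_WEEK) -> float:
    """Convert a weekly on-demand cost into an hourly rate."""
    return weekly_cost / hours


def runway_days(capacity_gb: float, used_gb: float, ingest_gb_per_day: float) -> float:
    """Days until storage is full, assuming a constant daily ingest rate."""
    if ingest_gb_per_day <= 0:
        raise ValueError("ingest rate must be positive")
    return max(capacity_gb - used_gb, 0.0) / ingest_gb_per_day


# Slide figure: six-node dc1.8xlarge cluster, on-demand, per week.
WEEKLY_COST = 957.60
cluster_per_hour = hourly_rate(WEEKLY_COST)
print(f"cluster: ${cluster_per_hour:.2f}/h, per node: ${cluster_per_hour / 6:.2f}/h")

# HYPOTHETICAL capacity for illustration only; substitute your cluster's real figure.
CAPACITY_GB = 4_000
USED_GB = 3_000  # the talk's ">3TB" treated as a lower bound
for rate in (1, 30):
    print(f"{rate} GB/day -> {runway_days(CAPACITY_GB, USED_GB, rate):.0f} days of headroom")
```

The point of keeping the capacity as a parameter is that the same 30× jump in ingest cuts the runway by 30× whatever the cluster size, which is why storage, not compute, can end up dictating the node count.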

8
It’s Postgres Like – Connecting to the Server
• Being Postgres-like meant that from day one there were
mature tools and Stack Overflow help
• The Redshift ecosystem is now more mature, and
Redshift-optimised tooling and help exist
• The Redshift JDBC/ODBC drivers are now recommended over
the PostgreSQL driver
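Because Redshift speaks a Postgres-derived wire protocol, the generic PostgreSQL JDBC driver can connect to it; moving to the Redshift driver mostly changes the connection URL scheme (plus the driver jar on the classpath). A minimal sketch of the two URL forms, assuming Redshift's default port 5439 (your cluster may use another); the endpoint below is hypothetical.

```python
def jdbc_url(host: str, database: str, port: int = 5439, driver: str = "redshift") -> str:
    """Build a JDBC URL for a Redshift endpoint.

    driver="redshift"   -> Amazon Redshift JDBC driver
    driver="postgresql" -> generic PostgreSQL JDBC driver
    """
    if driver not in ("redshift", "postgresql"):
        raise ValueError(f"unsupported driver: {driver}")
    return f"jdbc:{driver}://{host}:{port}/{database}"


# Hypothetical endpoint, for illustration only.
HOST = "examplecluster.abc123xyz789.eu-west-1.redshift.amazonaws.com"
print(jdbc_url(HOST, "dev"))                       # Redshift driver
print(jdbc_url(HOST, "dev", driver="postgresql"))  # Postgres driver, same endpoint
```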

9
It’s Postgres Like – My First Query
WITH order_events AS (
    -- Unstructured events whose JSON payload names an 'Order'
    SELECT collector_tstamp, event_id, ue_properties
    FROM events
    WHERE collector_tstamp >= '2015-09-20'
      AND collector_tstamp < '2015-10-02'
      AND event = 'unstruct'
      AND JSON_EXTRACT_PATH_TEXT(ue_properties, 'data', 'data', 'name') = 'Order'),
in_orders AS (
    -- Daily orders whose payload contains "bin":"in" (plain text match on the
    -- raw JSON; the trailing comma means a "bin" key at the end of the object
    -- would not match)
    SELECT DATE(collector_tstamp) AS order_date,
           COUNT(event_id) AS orders,
           COUNT(DISTINCT event_id) AS orders_distinct
    FROM order_events
    WHERE ue_properties ILIKE '%"bin":"in",%'
    GROUP BY DATE(collector_tstamp)),
out_orders AS (
    -- Same daily count for "bin":"out"
    SELECT DATE(collector_tstamp) AS order_date,
           COUNT(event_id) AS orders,
           COUNT(DISTINCT event_id) AS orders_distinct
    FROM order_events
    WHERE ue_properties ILIKE '%"bin":"out",%'
    GROUP BY DATE(collector_tstamp))
-- Only dates with both "in" and "out" orders survive the INNER JOIN;
-- ordering is applied once, on the final result
SELECT bin_in.order_date,
       bin_in.orders AS bin_in_orders,
       bin_out.orders AS bin_out_orders
FROM in_orders AS bin_in
INNER JOIN out_orders AS bin_out ON bin_in.order_date = bin_out.order_date
ORDER BY bin_in.order_date;
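The query above scans `order_events` twice and joins the results. A common alternative is conditional aggregation: one pass, one `GROUP BY`, one `CASE` per bin. The sketch below runs that form against an in-memory SQLite table purely so it is executable without a cluster; the sample rows are hypothetical, SQLite's default case-insensitive `LIKE` stands in for Redshift's `ILIKE`, and the JSON name filter is assumed to have already been applied.

```python
import sqlite3

# Hypothetical rows standing in for the order_events CTE.
ROWS = [
    ("2015-09-20 10:00:00", "e1", '{"bin":"in","size":"M"}'),
    ("2015-09-20 11:00:00", "e2", '{"bin":"out","size":"L"}'),
    ("2015-09-20 12:00:00", "e3", '{"bin":"in","size":"S"}'),
    ("2015-09-21 09:00:00", "e4", '{"bin":"in","size":"M"}'),
]

# Single pass: each CASE counts one bin; on Redshift these would be ILIKE.
QUERY = """
SELECT DATE(collector_tstamp) AS order_date,
       SUM(CASE WHEN ue_properties LIKE '%"bin":"in",%'  THEN 1 ELSE 0 END) AS bin_in_orders,
       SUM(CASE WHEN ue_properties LIKE '%"bin":"out",%' THEN 1 ELSE 0 END) AS bin_out_orders
FROM order_events
GROUP BY DATE(collector_tstamp)
ORDER BY order_date
"""


def bin_counts(rows):
    """Load rows into a throwaway table and return (date, in, out) per day."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE order_events "
            "(collector_tstamp TEXT, event_id TEXT, ue_properties TEXT)"
        )
        conn.executemany("INSERT INTO order_events VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


print(bin_counts(ROWS))  # [('2015-09-20', 2, 1), ('2015-09-21', 1, 0)]
```

Note the behavioural difference: 2015-09-21 appears with zero "out" orders, whereas the `INNER JOIN` version drops any date missing from either side. Adding a `HAVING` clause requiring both sums to be positive would reproduce the join's behaviour.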

10
Not so Postgres Like – Schema Design
• For day-to-day querying, even power users
won’t notice the difference
• For schema designers, the differences
matter and will bite you from the start
• Redshift is columnar and Postgres is row-oriented, so the
optimisation considerations are very different
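A toy model of why layout changes the optimisation rules: to aggregate one field, a row store still reads every field of every row, while a column store reads only the column the query references. This is a deliberately simplified sketch; it counts stored values touched and ignores compression, block sizes, zone maps and caching, and the table shape is hypothetical.

```python
def make_records(n_rows, n_cols):
    """Hypothetical wide table: n_rows records with n_cols integer fields."""
    return [{f"c{j}": i * n_cols + j for j in range(n_cols)} for i in range(n_rows)]


def row_store(records):
    """Row layout: each record's fields are stored together."""
    return [tuple(r.values()) for r in records]


def column_store(records):
    """Columnar layout: all values of one field are stored together."""
    cols = {}
    for r in records:
        for name, value in r.items():
            cols.setdefault(name, []).append(value)
    return cols


def values_read_row_store(rows):
    # Aggregating one field still means reading every field of every row.
    return sum(len(row) for row in rows)


def values_read_column_store(cols, column):
    # Only the referenced column is read.
    return len(cols[column])


records = make_records(1000, 20)
rows, cols = row_store(records), column_store(records)

# Equivalent of SELECT SUM(c3): same answer from both layouts...
assert sum(row[3] for row in rows) == sum(cols["c3"])
# ...but very different amounts of data touched.
print(values_read_row_store(rows), values_read_column_store(cols, "c3"))  # 20000 1000
```

The flip side is that habits from row databases, such as wide `SELECT *` queries or normalising purely to save row width, carry different costs when each column is stored and read separately.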

11
Summary
• Redshift gives you all the usual AWS goodies
• Day-to-day, you don’t care that Redshift is
Postgres-like
• When designing the schema, forget about row
databases and experiment with columnar stores
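One cheap columnar experiment is to see how sort order affects compression. When a sort key groups equal values together, encodings such as run-length (Redshift offers a run-length column encoding) collapse long runs into a handful of entries. The sketch below is a toy illustration under stated assumptions: the column and its values are hypothetical, and real Redshift encodings operate on disk blocks rather than Python lists.

```python
import random
from itertools import groupby


def run_length_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(values)]


random.seed(0)  # deterministic toy data
# Hypothetical low-cardinality column, e.g. a country code on each event row.
column = [random.choice(["GB", "US", "DE", "FR"]) for _ in range(10_000)]

unsorted_runs = len(run_length_encode(column))
sorted_runs = len(run_length_encode(sorted(column)))  # one run per distinct value
print(f"runs unsorted: {unsorted_runs}, runs when sorted: {sorted_runs}")
```

Unsorted, almost every row starts a new run; sorted, the whole column shrinks to one run per distinct value, which is the kind of trade-off worth measuring when choosing sort keys and encodings.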
