Scaling etl with hadoop shapira 3

System Architect at Confluent
Sep. 13, 2013
Scaling etl with hadoop   shapira 3
Scaling etl with hadoop   shapira 3
Scaling etl with hadoop   shapira 3
Scaling etl with hadoop   shapira 3
Scaling etl with hadoop   shapira 3
Scaling etl with hadoop   shapira 3
Scaling etl with hadoop   shapira 3
Scaling etl with hadoop   shapira 3
Scaling etl with hadoop   shapira 3
Scaling etl with hadoop   shapira 3
Scaling etl with hadoop   shapira 3
Scaling etl with hadoop   shapira 3
Scaling etl with hadoop   shapira 3
Scaling etl with hadoop   shapira 3
Scaling etl with hadoop   shapira 3
Scaling etl with hadoop   shapira 3
Scaling etl with hadoop   shapira 3
Scaling etl with hadoop   shapira 3
Scaling etl with hadoop   shapira 3
Scaling etl with hadoop   shapira 3
Scaling etl with hadoop   shapira 3
Scaling etl with hadoop   shapira 3
Scaling etl with hadoop   shapira 3
Scaling etl with hadoop   shapira 3
Scaling etl with hadoop   shapira 3
Scaling etl with hadoop   shapira 3
Scaling etl with hadoop   shapira 3
Scaling etl with hadoop   shapira 3
Scaling etl with hadoop   shapira 3
Scaling etl with hadoop   shapira 3
Scaling etl with hadoop   shapira 3
Scaling etl with hadoop   shapira 3
Scaling etl with hadoop   shapira 3
Scaling etl with hadoop   shapira 3
Scaling etl with hadoop   shapira 3
Scaling etl with hadoop   shapira 3
Scaling etl with hadoop   shapira 3
Scaling etl with hadoop   shapira 3
Scaling etl with hadoop   shapira 3
Scaling etl with hadoop   shapira 3
Scaling etl with hadoop   shapira 3
Scaling etl with hadoop   shapira 3
Scaling etl with hadoop   shapira 3
Scaling etl with hadoop   shapira 3
Scaling etl with hadoop   shapira 3
Scaling etl with hadoop   shapira 3
1 of 46

More Related Content

What's hot

How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...DataWorks Summit/Hadoop Summit
Architecting Applications with HadoopArchitecting Applications with Hadoop
Architecting Applications with Hadoopmarkgrover
Cisco connect toronto 2015 big data  sean mc keownCisco connect toronto 2015 big data  sean mc keown
Cisco connect toronto 2015 big data sean mc keownCisco Canada
Unified Batch & Stream Processing with Apache SamzaUnified Batch & Stream Processing with Apache Samza
Unified Batch & Stream Processing with Apache SamzaDataWorks Summit
Apache Eagle - Monitor Hadoop in Real TimeApache Eagle - Monitor Hadoop in Real Time
Apache Eagle - Monitor Hadoop in Real TimeDataWorks Summit/Hadoop Summit
Breaking the 1 Million OPS/SEC Barrier in HOPS HadoopBreaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS HadoopDataWorks Summit/Hadoop Summit

What's hot(20)

Viewers also liked

A Walk Through the Kimball ETL Subsystems with Oracle Data IntegrationA Walk Through the Kimball ETL Subsystems with Oracle Data Integration
A Walk Through the Kimball ETL Subsystems with Oracle Data IntegrationMichael Rainey
Designing High Performance ETL for Data WarehouseDesigning High Performance ETL for Data Warehouse
Designing High Performance ETL for Data WarehouseMarcel Franke
ETL Is Dead, Long-live StreamsETL Is Dead, Long-live Streams
ETL Is Dead, Long-live StreamsC4Media
Designing and implementing_an_etl_frameworkDesigning and implementing_an_etl_framework
Designing and implementing_an_etl_frameworkBharat Vadlamudi
A Reference Architecture for ETL 2.0 A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0 DataWorks Summit
Delivering the Data Factory, Data Reservoir and a Scalable Oracle Big Data Ar...Delivering the Data Factory, Data Reservoir and a Scalable Oracle Big Data Ar...
Delivering the Data Factory, Data Reservoir and a Scalable Oracle Big Data Ar...Mark Rittman

Similar to Scaling etl with hadoop shapira 3

Getting Started with HadoopGetting Started with Hadoop
Getting Started with HadoopCloudera, Inc.
Intro to Big DataIntro to Big Data
Intro to Big DataZohar Elkayam
Things Every Oracle DBA Needs to Know about the Hadoop EcosystemThings Every Oracle DBA Needs to Know about the Hadoop Ecosystem
Things Every Oracle DBA Needs to Know about the Hadoop EcosystemZohar Elkayam
Big Data Developers Moscow Meetup 1  - sql on hadoopBig Data Developers Moscow Meetup 1  - sql on hadoop
Big Data Developers Moscow Meetup 1 - sql on hadoopbddmoscow
Technologies for Data Analytics PlatformTechnologies for Data Analytics Platform
Technologies for Data Analytics PlatformN Masahiro
Search in the Apache Hadoop Ecosystem: Thoughts from the FieldSearch in the Apache Hadoop Ecosystem: Thoughts from the Field
Search in the Apache Hadoop Ecosystem: Thoughts from the FieldAlex Moundalexis

More from Gwen (Chen) Shapira

Velocity 2019  - Kafka Operations Deep DiveVelocity 2019  - Kafka Operations Deep Dive
Velocity 2019 - Kafka Operations Deep DiveGwen (Chen) Shapira
Lies Enterprise Architects Tell - Data Day Texas 2018  Keynote Lies Enterprise Architects Tell - Data Day Texas 2018  Keynote
Lies Enterprise Architects Tell - Data Day Texas 2018 Keynote Gwen (Chen) Shapira
Gluecon - Kafka and the service meshGluecon - Kafka and the service mesh
Gluecon - Kafka and the service meshGwen (Chen) Shapira
Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17
Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17Gwen (Chen) Shapira
Papers we love   realtime at facebookPapers we love   realtime at facebook
Papers we love realtime at facebookGwen (Chen) Shapira
Kafka reliability   velocity 17Kafka reliability   velocity 17
Kafka reliability velocity 17Gwen (Chen) Shapira

Recently uploaded

info_session_gdsc_tmsl .pptxinfo_session_gdsc_tmsl .pptx
info_session_gdsc_tmsl .pptxNikitaSingh741518
Metadata & Discovery Group Conference 2023 - Day 1 ProgrammeMetadata & Discovery Group Conference 2023 - Day 1 Programme
Metadata & Discovery Group Conference 2023 - Day 1 ProgrammeCILIP MDG
Elevate Your Enterprise with FME 23.1Elevate Your Enterprise with FME 23.1
Elevate Your Enterprise with FME 23.1Safe Software
who we are - values.pptxwho we are - values.pptx
who we are - values.pptxLauraGarceran
LLaMA 2.pptxLLaMA 2.pptx
LLaMA 2.pptxRkRahul16
Netwitness RT - Don’t scratch that patch.pptxNetwitness RT - Don’t scratch that patch.pptx
Netwitness RT - Don’t scratch that patch.pptxStefano Maccaglia

Scaling etl with hadoop shapira 3

Editor's Notes

  1. Start with a portion of your data for fast iterationsPrototype – with Impala / streamingStart high level – tune as you go
  2. Pregnancy takes 9 month, no matter how many women are assigned to itBut – Elephants are pregnant for two years!
  3. 50% of Hadoop systems end up managed by the data warehouse team. And they want to do things as they always did – Kimball methodologies, time dimensions, etc.Not always a good idea, not always scalable, not always possible.