Your Trusted Third Party in the Digital Age™
Scalding on Tez
Twitter HQ, July 14th, 2015
Copyright © 2015 Transparency Rights Management. All rights reserved
Agenda
• Who’s this guy?
• How did we come to use Scalding?
• Scalding on Tez: the Mini-HOWTO
• In practice
• Tips and Tricks
• All aboard: how?
• Performance
WHO’S THIS GUY?
Who’s this guy?
• I’m 39
• My oldest computer is 33
8-bit Basic(s), Z80 assembly, Turbo Pascal, C++, Python, Java, ISO CNC, C#, Scala
Still afraid of Shapeless
Images: Amos Evans / « Rama » / Marcin Wichary // Wikipedia
HOW DID WE COME TO SCALDING?
Why Scalding?
Transparency Rights Management:
• A Trusted Third Party
– Data escrow, controlled execution
– Independent re-computation
– Privacy & Personal Data compliance assessment
• Big Data Services for Entertainment
– Metadata enrichment
– IP use certification
– Dataset analysis as a service
Why Scalding?
« Big Data Services for Entertainment » - a Use Case
[Diagram labels: Digital Service Provider; Report; Copyright Owners / Collective Management Organizations; Data Improvement; Automatic Data Feed (« in your format »); Independent Report; Conformance Report]
Why Scalding?
Dataset analysis (from YouTube monthly reports)
• September 2013: SQL Server overheats
• October 2013: using Lingual: 12 SQL steps + bash scripts
• September 2014: Cascading + Java
• September 28th, 2014: tried out Scalding
• November 2014: delivered first results on Scalding
• April 2015: first success on Scalding+Tez
Our system…
[Architecture diagram labels: Jenkins; git; Artifactory; Jenkins Slave; 4-way Non-Reg; Ansible; Debian (×5); Mesos; Chronos; Marathon; YARN 2.6.0; HDFS 2.6.0; YARN RM; APP (scalding, cascading); APP (WS) (Akka, Spray)]
Our system…
7 machines, and still a lot of things to discover
SCALDING ON TEZ,
THE MINI-HOWTO
Scalding on Tez, the mini-HOWTO
• Step 0: Prerequisites:
– A YARN cluster (2.6.0)
– Cascading 3.0
– TEZ runtime lib in HDFS (0.6.2-SNAPSHOT)
– A version of scalding with fabric selection (0.13.1 + PR1220)
Scalding on Tez, the mini-HOWTO
• Step 1: build.sbt
https://github.com/cchepelov/wcplus/blob/master/build.sbt
Scalding on Tez, the mini-HOWTO
• Step 1: build.sbt (redux)
1. Regain control on what libraries are included
2. Exclude some « long transitive » dependencies that pull in junk
3. Put in the desired fabric, in a configurable way
sbt -DCASCADING_FABRIC=hadoop clean assembly
Scalding on Tez, the mini-HOWTO
• Step 1bis: assembly.sbt
We’re using fatjars to simplify deployment.
Because of jar hell, we « need » a complicated assembly.sbt
https://github.com/cchepelov/wcplus/blob/master/assembly.sbt
Scalding on Tez, the mini-HOWTO
• Step 2: a few job flags
https://github.com/cchepelov/wcplus/blob/master/src/main/scala/com/transparencyrights/demo/wcplus/CommonJob.scala
• Step 2: a few job flags (continued)
• tez.task.resource.memory.mb
– As large as you can afford to give, per CPU per node
– The more memory, the less Tez needs to spill intermediates to disk
• tez.container.max.java.heap.fraction
– Defaults (1024 MiB * 0.8) assume the JVM’s native memory requirements don’t exceed about 205 MiB (1024 MiB * 0.2)
– Scalding + the Scala runtime + Cascading on top of Tez seem to require more. YARN kills offenders swiftly!
– The 460 MiB figure we’re using, (1024+512)*(1-0.7), may be a bit wasteful
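The headroom arithmetic behind these two flags can be checked with a small sketch. It assumes the JVM heap is simply the container size multiplied by the heap fraction, so whatever remains is what native memory may use before YARN steps in. The function and variable names are illustrative and are not part of Tez.

```python
def native_headroom_mib(container_mib: float, heap_fraction: float) -> float:
    """Memory left outside the Java heap when the heap takes
    heap_fraction of a container of container_mib MiB (assumed model)."""
    return container_mib * (1.0 - heap_fraction)


# Default-like setting quoted on the slide: 1024 MiB container, 0.8 heap fraction.
default_headroom = native_headroom_mib(1024, 0.8)

# Setting used in the talk: (1024 + 512) MiB container, 0.7 heap fraction.
tuned_headroom = native_headroom_mib(1024 + 512, 0.7)

print(f"default headroom: {default_headroom:.1f} MiB")
print(f"tuned headroom:   {tuned_headroom:.1f} MiB")
```

Under this model the tuned setting leaves a bit more than twice the default headroom, which matches the remark that it may be on the generous side.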
THAT’S IT.
(ALMOST)
IN PRACTICE…
« A VERSION OF SCALDING WITH
FABRIC SELECTION »
WAIT, WHAT?
In practice
« A version of scalding with fabric selection » Wait, What?
Scalding’s traditional --local and --hdfs flags:
– Use either LocalFlowConnector or HadoopFlowConnector
– Types are hard-coded
Cascading 2.5 introduced a new fabric concept. You can run either with cascading-hadoop or with cascading-hadoop2-mr1. But:
– Incompatible jars (can’t load both)
– Main types visible to Scalding are different
In practice
« A version of scalding with fabric selection » Wait, What?
PR1220:
– No longer hardcodes « either Local or Hadoop 1.X »
– Enables supplying any flow connector implementation, as long as the jar’s around
– --hdfs to be deprecated as an alias to --hadoop1
– Still built against Cascading 2.6
« STILL BUILT ON CASCADING 2.6 »
WHY?
In practice
« Still built on Cascading 2.6 »
Cascading 3.0 has carefully updated some argument types to prepare for the future.
This is source- and binary-compatible in Java:
[Diagram labels: library; consumer; LibraryV2; same consumer]
Scala enforces generic type safety, and the Cascading 3.0 upgrades are not legal with scalac.
But they still are with the JVM…
In practice
… Going to native Cascading 3.0?
Scalding will require some adjustment to become compatible with the Java-level source upgrades.
Can this happen without breaking Scalding application source code?
GUAVA
In practice…
Guava.
• Guava is a nice library… of little use in Scala (?)
• In a Scalding/Cascading/Tez JVM, multiple versions of guava are required. Each layer depends on its own version: about every single version from 11.0 to 16.0.2
• There have been breaking changes (method renames & removals) in guava 13
• These happen on really mundane objects (Closeables, Stopwatch), but they’re major troublemakers
In practice…
Guava Hell: a temporary solution
• Asking Apache to quickly upgrade to guava 18, or Google to re-introduce deprecated interfaces… probably not immediate
• Solution: Frankenguava.
[Diagram labels: Guava 18.0 JAR; Stopwatch & Closeables; Stopwatch & Closeables including deprecated overloads]
In practice…
Frankenguava: howto
• Step 1: Post-prepare the Tez runtime
– Build tez from source
– Unpack runtime jar from tez-dist
– Remove guava
– Put frankenguava
– Repack
– Deploy on HDFS
• Step 2: Enforce the use of the appropriate guava
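The unpack / remove / put / repack sequence can be sketched as a single pass over the runtime archive. This is only an illustration under stated assumptions: it supposes the Tez runtime is a .tar.gz whose stock guava jar matches `guava-*.jar`, and `swap_guava` plus every file name used with it are hypothetical. Building Tez and deploying the result on HDFS are not covered.

```python
import fnmatch
import os
import posixpath
import tarfile


def swap_guava(src_archive, dst_archive, replacement_jar):
    """Copy src_archive to dst_archive, dropping every guava-*.jar member
    and adding replacement_jar in its place. Returns the dropped names."""
    dropped = []
    with tarfile.open(src_archive, "r:gz") as src, \
            tarfile.open(dst_archive, "w:gz") as dst:
        for member in src.getmembers():
            base = posixpath.basename(member.name)
            if member.isfile() and fnmatch.fnmatch(base, "guava-*.jar"):
                dropped.append(member.name)  # "Remove guava"
                continue
            payload = src.extractfile(member) if member.isfile() else None
            dst.addfile(member, payload)  # everything else is copied as-is
        # "Put frankenguava" where the stock jar used to live
        # (falls back to lib/ if no stock jar was found: an assumption).
        lib_dir = posixpath.dirname(dropped[0]) if dropped else "lib"
        arcname = posixpath.join(lib_dir, os.path.basename(replacement_jar))
        dst.add(replacement_jar, arcname=arcname)
    return dropped
```

A call such as `swap_guava("tez.tar.gz", "tez-franken.tar.gz", "frankenguava-18.0.jar")` would produce the repacked archive; the returned list makes it easy to confirm that exactly one stock guava jar was removed.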
CASCADING’S TEZ*REGISTRY
In practice…
Cascading’s Tez*Registry
• Cascading 3.0 uses a set of mapping registries to convert cascading patterns into the back-end API. The Tez registries are new, and distinct from the MR registries
• The Tez registries are hardened against Concurrent’s extensive test library, which is built on years of MR experience. Tez has its own trouble spots. Beware of hash joins.
• It works fine now, but getting the scalding test library onboard will go a long way.
In practice…
Cascading’s Tez*Registry
• It works mostly fine now, but getting the scalding test library onboard will go a long way.
Last-minute update: .filterWithValue / .mapWithValue currently crash the Cascading planner (as of 3.0.1); their implementation uses a HashJoin.
AN EXAMPLE
A small test: « wc plus »
Input: 70 books, 1.1M lines, 10M words, 56M bytes
Compute frequencies, ignoring things that are more frequent than 80% of the max word frequency
Outputs:
– Word, relative frequency, deviation from median relative freq
– Two Words, relative frequency, deviation from median relative freq
– …
– Ten Words, relative frequency, deviation from median relative freq
– All Expressions (1-W to 10-W), relative frequency, deviation from median relative freq
[Diagram labels: count (×4)]
No .filterWithValue / .mapWithValue for now
Image: Roulex45 / Wikipedia
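To make the « wc plus » outputs concrete, here is a single-machine sketch of one of them. It is not the Scalding job from the talk. The slide does not spell out two details, so both are assumptions here: an n-word expression is dropped when its count exceeds 80% of the largest count at that length, and relative frequency is measured against all expressions of that length, dropped ones included. The function name `ngram_stats` is illustrative.

```python
from collections import Counter
from statistics import median


def ngram_stats(words, n, cutoff=0.8):
    """Map each kept n-word expression to
    (relative frequency, deviation from the median relative frequency)."""
    grams = [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    counts = Counter(grams)
    if not counts:
        return {}
    total = len(grams)
    # Drop anything more frequent than cutoff * the maximum count.
    ceiling = cutoff * max(counts.values())
    rel = {g: c / total for g, c in counts.items() if c <= ceiling}
    if not rel:
        return {}
    mid = median(rel.values())
    return {g: (f, f - mid) for g, f in rel.items()}


sample = "a a a a a b b c".split()
for gram, (freq, dev) in sorted(ngram_stats(sample, 1).items()):
    print(gram, freq, dev)
```

Note that under this reading the most frequent expression is always dropped, since its own count exceeds 80% of itself; with real text that removes the dominant stop words.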
TIPS & TRICKS
Tips & Tricks
0000-step-node-sub-graph.dot
Run your job with
-Dcascading.planner.plan.path=/tmp/path/to/plan.lst
The planner will output a lot of useful files. One of them is
…/$(Job)/4-final-flow-steps/0000-step-node-sub-graph.dot
Run that file through graphviz:
dot -O -Tpdf 0000-step-node-sub-graph.dot
or, if the PDF is illegible, Firefox is great at zooming into SVG files:
dot -O -Tsvg 0000-step-node-sub-graph.dot
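When a job leaves several plan directories behind, the rendering step can be scripted. The sketch below only builds the `dot` command lines for every `*step-node-sub-graph.dot` file found under the plan path; `dot_commands` is a hypothetical helper name, and running the commands requires graphviz to be installed.

```python
from pathlib import Path


def dot_commands(plan_root, fmt="svg"):
    """Return one `dot -O -T<fmt> <file>` argument list per
    step-node-sub-graph .dot file found below plan_root."""
    root = Path(plan_root)
    return [
        ["dot", "-O", f"-T{fmt}", str(path)]
        for path in sorted(root.rglob("*step-node-sub-graph.dot"))
    ]
```

Each returned list can be handed to `subprocess.run(cmd, check=True)`; keeping command construction separate from execution makes the helper easy to dry-run first.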
Tips & Tricks
0000-step-node-sub-graph.dot
This is how TEZ names our stuff!
Tips & Tricks
Major differences between how a cascading job gets mapped to MR and to TEZ:
MR
– One flow, many (MANY) independent steps
– One or more operators per step
– Step-to-step communications involve disk (HDFS)
– Each step is independent as far as MR is concerned
– Step scheduling managed from outside the cluster, by Cascading
TEZ
– One flow, one DAG. A DAG includes several nodes.
– One or more operators per node
– Node-to-node communications managed by TEZ. Memory, direct network or disk as necessary
– YARN sees one « Application » per flow
– Node scheduling managed by TEZ DAG AppMaster
Tips & Tricks
yarn-swimlanes.sh
• A tool included in the tez source distribution, in tez-tools/swimlanes (bash + python)
• Requires YARN ATS to work
« yarn logs -applicationId application_1345431315_1511 » must work
• Reports per-container occupancy as a Gantt chart
Tips & Tricks
yarn-swimlanes.sh (2)
[Figure: application_1435150225179_0474.svg; axes: time, containers]
Tips & Tricks
Consider using .forceToDisk to ensure work is balanced within the DAG
[Figure: two runs compared, 890 seconds and 160 seconds]
Tips & Tricks
• Consider using .forceToDisk to ensure work is balanced within the DAG
• .forceToDisk really means « don’t merge those two TEZ nodes », which implies « manage appropriate data transmission between these two nodes »
• TextFile & other FixedPathSource friends don’t seem to automatically spread out work as well as they used to (huh?)
• YMMV, WIP.
ALL ABOARD: HOW?
All aboard: how?
Smoothing out the UX for us app developers
• A build of scalding against Cascading 3.0.x
– Fabric-switching logic
– Get the test library to pass also on Tez
– Some applications might still uncover new mapping issues → increased community test case experience
– ???
• Getting the « guava mess » fixed
– Ideally all of Apache goes to recent guavas
– Enforced shading of Guava across the whole stack?
– Failing that, automated runtime patcher?
– (my « build stuff » partner makes me write: OSGI/Java9)
– ???
• Except for that, Tez is really easy for a YARN shop. Drop it in, and it runs!
PERFORMANCE
Performance
MR vs TEZ; TO SCALE!!!
MR run time: 14:22 (wall), 12:49 (cluster time), 5:43:26 (total CPU)
TEZ run time: 4:03 (wall), 2:50 (cluster time), 1:25:35 (total CPU)
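The ratios implied by these timings can be worked out with a few lines. The figures are the ones quoted on the slide; `to_seconds` and the dictionary layout are illustrative names only.

```python
def to_seconds(clock: str) -> int:
    """Convert "m:ss" or "h:mm:ss" into a number of seconds."""
    seconds = 0
    for part in clock.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


# (MR, Tez) timings as quoted in the talk.
runs = {
    "wall": ("14:22", "4:03"),
    "cluster time": ("12:49", "2:50"),
    "total CPU": ("5:43:26", "1:25:35"),
}

speedups = {
    name: to_seconds(mr) / to_seconds(tez) for name, (mr, tez) in runs.items()
}

for name, ratio in speedups.items():
    print(f"{name}: MR takes {ratio:.2f}x as long as Tez")
```

On this one job Tez comes out roughly 3.5 to 4.5 times faster than MR, depending on which clock is used.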
Performance
Output of tez-tool « yarn-swimlanes.sh »
• 1 « swimlane » per active container
• 1 colour per DAG Vertex (the black dots are actually the Vertex ID)
• Container occupancy is pretty good while there is work to do
• (not demonstrated here) containers die when they are idle. This is good!
CONCLUSION
As a conclusion…
A lot of effort so far… …but worth it!
Images: Nicholas Babaian // Flickr. Marathon du Médoc 2008
THANKS!
For building that tech
For helping out
For your attention today

HR Software Buyers Guide in 2024 - HRSoftware.com
 
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service Consultant
 

Scalding on tez (final)

Editor's Notes

  1. ZX-81: CC-BY-SA 3.0 Amos Evans / WP; TO7-70: CC-BY-SA 2.0 « Rama » / WP; PC-1512: CC-BY-SA 3.0 Marcin Wichary / WP
  2. Meet Jane. Jane loves music. And Jane’s favourite music video platform has all the music Jane loves. So Jane listens to music from the Platform.
  3. After October 2013: moved on to other things, and the topic was left in storage for a while. September 2014: new model, same concept; built on plain Cascading to simplify some of the hairiest SQL logic (Optiq lacked, and may still lack, analytic functions, so what was pretty much a single SQL statement in the SQL Server days had to be exploded into the 12 stages). Met guys from Lausanne at the end of September. Was already curious about Scala / Scalding then, so decided to spend two days giving it a spin. Never looked back!
  4. Myriad’s still a wishlist item for now, as it doesn’t seem to play nice with YARN in HA mode.
  5. We REALLY don’t want to misrepresent our maturity level
  6. Tez 0.6.2-SNAPSHOT is required. Warning: the Tez 0.7 runtime is not API-compatible with 0.6 (although the source-level API is quite close). Cascading might change its Tez dependency from time to time…
  7. The typical Hadoop+Tez stack pulls in a Jetty, a Tomcat, a Jersey, multiple Guavas, and the kitchen sink.
  8. We believe our workload requires 270-ish MiB of native memory. When we have time, we’ll either power down for extra sticks of RAM, or attempt to shave 20 MiB of heap per TezChild.
  9. (reportedly)
  10. Prune & Graft
  11. Prune & Graft
  12. Prune & Graft
  13. Why these two steps? The « same » code is getting executed in wildly different CLASSPATH: Cascading driver, TezChild, etc.
  14. « Hash joins » means hash joins proper, but also .filterWithValue/.mapWithValue, joinWithTiny, etc.
  15. « Hash joins » means hash joins proper, but also .filterWithValue/.mapWithValue, joinWithTiny, etc.
  16. Who wants to see another « Word Count » ?
  17. Who wants to see another « Word Count » ?
  18. Who wants to see another « Word Count » ?
  19. I’m not going to look into that, fairly standard code except where I’ve been naïve. You get the idea.
  20. « All of Apache goes to recent Guavas… » or drops the library altogether. At the very least, everyone not using the most recent version effing shades it.
  21. CC-BY-SA 2.0