23. ISOLATION (JOBCLIENT)
• Enabling the UDFs used in the query
• Executing CREATE TEMPORARY FUNCTION
• Adding databases/tables from PlazmaDB to the Metastore
• Executing CREATE DATABASE/TABLE
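The per-job setup steps above amount to generating a short HiveQL prelude for each job's JobClient. This is a minimal sketch, assuming hypothetical `JobSpec`/`TableSpec` structures in which the UDFs referenced by the query and the table schemas looked up in PlazmaDB are already known. The storage details a real PlazmaDB-backed table would need are left out.

```python
from dataclasses import dataclass, field


@dataclass
class TableSpec:
    # Hypothetical: schema of one table as looked up in PlazmaDB
    database: str
    name: str
    columns: list[tuple[str, str]]  # (column name, Hive type)


@dataclass
class JobSpec:
    # Hypothetical: UDF name -> implementing Java class, only for UDFs the query uses
    udfs: dict[str, str] = field(default_factory=dict)
    # Hypothetical: tables the query touches
    tables: list[TableSpec] = field(default_factory=list)


def setup_statements(job: JobSpec) -> list[str]:
    """Build the HiveQL run before the job's query, in the order listed on the slide."""
    stmts = []
    # 1. Register only the UDFs this query needs
    for fn, cls in sorted(job.udfs.items()):
        stmts.append(f"CREATE TEMPORARY FUNCTION {fn} AS '{cls}'")
    # 2. Make the databases and tables known to the Metastore
    for db in sorted({t.database for t in job.tables}):
        stmts.append(f"CREATE DATABASE IF NOT EXISTS {db}")
    for t in job.tables:
        cols = ", ".join(f"{c} {ty}" for c, ty in t.columns)
        stmts.append(f"CREATE TABLE IF NOT EXISTS {t.database}.{t.name} ({cols})")
    return stmts
```

Running every job through its own prelude like this is what makes the isolation high, and it is also where the per-job setup cost comes from.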
25. ISOLATION (JOBCLIENT)
• The good:
• High level of isolation
• An OOM in one job cannot take down other jobs
• The bad:
• Job setup costs are a bit high
26. ISOLATION (IN CLUSTER)
• Using Hadoop resource pools:
• One resource pool per account (not counting sub-pools)
• Minimum and maximum running containers are set based on the account's price plan
• Currently 6711 pools in production
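One way to express "one pool per account, min/max set by price plan" is to generate a YARN Fair Scheduler allocation file. This sketch rests on several assumptions: that the pools are Fair Scheduler queues (the slide only says "Hadoop resource pools"), a hypothetical fixed container size, a hypothetical `PLANS` table, and hypothetical `account_<id>` queue names. Because the allocation file sets `minResources`/`maxResources` rather than container counts, the container limits are converted to memory and vcores.

```python
import xml.etree.ElementTree as ET

# Hypothetical container size; real values depend on the cluster's configuration.
CONTAINER_MB = 1024
CONTAINER_VCORES = 1

# Hypothetical price plans: plan name -> (min containers, max containers)
PLANS = {"small": (4, 32), "large": (32, 256)}


def resources(containers: int) -> str:
    """Express a container count in the allocation file's "X mb,Yvcores" form."""
    return f"{containers * CONTAINER_MB} mb,{containers * CONTAINER_VCORES}vcores"


def build_allocations(accounts: dict[str, str]) -> ET.Element:
    """accounts maps account id -> plan name; emits one queue (pool) per account."""
    root = ET.Element("allocations")
    for account_id, plan in sorted(accounts.items()):
        min_c, max_c = PLANS[plan]
        q = ET.SubElement(root, "queue", name=f"account_{account_id}")
        ET.SubElement(q, "minResources").text = resources(min_c)
        ET.SubElement(q, "maxResources").text = resources(max_c)
    return root
```

`print(ET.tostring(build_allocations({"42": "small"}), encoding="unicode"))` shows the generated XML. With thousands of accounts the file grows to thousands of queues, which is the scale the following slides warn about.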
27. ISOLATION (IN CLUSTER)
• The good part:
• Relatively low cost to guarantee minimum resources
• Jobs can still burst to max if resources are free in the cluster
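The trade-off on this slide can be illustrated with a deliberately simplified allocation model. Every pool first gets its guaranteed minimum (capped by its demand). Any free capacity is then handed out round-robin to pools that still want more, up to their maximum. This is not the actual scheduler algorithm, which also involves fair shares, weights and preemption. The pool names and numbers are made up.

```python
def allocate(capacity: int, pools: dict[str, tuple[int, int, int]]) -> dict[str, int]:
    """pools maps name -> (demand, min_containers, max_containers), all in containers."""
    # Step 1: guarantee each pool its minimum, but never more than it asks for.
    alloc = {name: min(demand, lo) for name, (demand, lo, hi) in pools.items()}
    free = capacity - sum(alloc.values())
    if free < 0:
        raise ValueError("guaranteed minimums exceed cluster capacity")
    # Step 2: let pools burst. Hand out leftover containers one at a time,
    # round-robin, to pools still below min(demand, max).
    while free > 0:
        progressed = False
        for name in sorted(pools):
            demand, lo, hi = pools[name]
            if free > 0 and alloc[name] < min(demand, hi):
                alloc[name] += 1
                free -= 1
                progressed = True
        if not progressed:  # nobody wants more; leave the rest idle
            break
    return alloc
```

With 100 containers, `allocate(100, {"a": (80, 10, 60), "b": (5, 10, 60), "c": (50, 20, 40)})` gives a 55, b 5 and c 40. The capacity b's guarantee did not use goes to the others, and a bursts well past its minimum of 10.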
28. ISOLATION (IN CLUSTER)
• The bad parts:
• The sheer number of pools forces us to split into separate clusters
• The ResourceManager tends to slow down with too many pools
• Some unsafe UDFs need to be disabled:
• java_method()
• reflect()
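Both functions call arbitrary Java methods through reflection, so a query using them could escape the pool's isolation. Below is a minimal sketch of rejecting such queries before submission. This naive regular-expression scan is only an illustration: it can flag names inside string literals, and it is not a HiveQL parser. A real deployment would more likely disable the functions inside Hive itself. `blocked_calls` and `check_query` are hypothetical names.

```python
import re

# The two functions listed on the slide.
BLOCKED_UDFS = ("java_method", "reflect")

# Matches calls such as `reflect(` or `JAVA_METHOD (`; a naive text scan, not a parser.
_BLOCKED_CALL = re.compile(r"\b(" + "|".join(BLOCKED_UDFS) + r")\s*\(", re.IGNORECASE)


def blocked_calls(query: str) -> list[str]:
    """Return the disabled functions a query calls (lower-cased, deduplicated, sorted)."""
    return sorted({m.group(1).lower() for m in _BLOCKED_CALL.finditer(query)})


def check_query(query: str) -> None:
    """Refuse to submit a query that calls any disabled function."""
    found = blocked_calls(query)
    if found:
        raise PermissionError(f"query uses disabled UDFs: {', '.join(found)}")
```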
36. WHAT IS PTD?
• Patchset Treasure Data
• Name first coined by these two
• Still in development
• Original plan
• Base all internal Hadoop components on the latest community edition
• Simplify releases so we can stay on as current a version as possible
• What it’s turning into
• A complete overhaul of most things related to Hadoop
@frsyuki @tagomoris
43. ELEPHANT SERVER
• Provides a REST API for job submission and monitoring
• All Hive/Pig-related code is separated from the generic worker
• A distributed in-memory queue manages job progress
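To make the job-progress queue concrete, here is a single-process sketch of the bookkeeping such a queue might hold. Submissions enter a FIFO, a worker takes them, and progress reports update a per-job record that a monitoring endpoint could read. The slide describes the real queue as distributed; here Python's `queue.Queue` and a dict stand in for it. The class, field and endpoint names are hypothetical.

```python
import dataclasses
import itertools
import queue
import threading
from dataclasses import dataclass


@dataclass
class JobStatus:
    job_id: int
    query: str
    state: str = "queued"  # queued -> running -> success
    progress: float = 0.0  # fraction complete, 0.0 to 1.0


class JobQueue:
    """Single-process stand-in for a distributed in-memory job queue."""

    def __init__(self) -> None:
        self._pending: "queue.Queue[int]" = queue.Queue()  # job ids waiting for a worker
        self._jobs: dict[int, JobStatus] = {}  # progress record per job
        self._ids = itertools.count(1)
        self._lock = threading.Lock()

    def submit(self, query: str) -> int:
        """Roughly what a hypothetical POST /jobs handler would call."""
        with self._lock:
            job_id = next(self._ids)
            self._jobs[job_id] = JobStatus(job_id, query)
        self._pending.put(job_id)
        return job_id

    def take(self) -> JobStatus:
        """A worker blocks until a job is queued, then marks it running."""
        job_id = self._pending.get()
        with self._lock:
            self._jobs[job_id].state = "running"
            return dataclasses.replace(self._jobs[job_id])  # return a snapshot

    def report(self, job_id: int, progress: float) -> None:
        """Workers push progress updates; reaching 1.0 marks the job finished."""
        with self._lock:
            job = self._jobs[job_id]
            job.progress = progress
            if progress >= 1.0:
                job.state = "success"

    def status(self, job_id: int) -> JobStatus:
        """Roughly what a hypothetical GET /jobs/<id> handler would return (a copy)."""
        with self._lock:
            return dataclasses.replace(self._jobs[job_id])
```

Keeping the Hive/Pig specifics out of this layer matches the separation on the slide: the queue only tracks ids, states and progress.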
47. JOB PRESERVING RESTARTS
• A new instance of the server starts, joins the Hazelcast cluster, and repeatedly tries to start the REST server
• The old instance goes into shutdown mode (it starts no new jobs but keeps current ones running)
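The hand-over can be sketched in two parts. The new instance retries binding the REST port until it becomes free. The old instance, in shutdown mode, refuses new jobs while letting running ones finish. The sketch assumes the old instance releases the port when it enters shutdown mode, which the slide does not say. The Hazelcast membership step is omitted, and the function and class names are hypothetical.

```python
import socket
import threading
import time


def bind_with_retry(port: int, retry_interval: float = 0.1, attempts: int = 100) -> socket.socket:
    """New instance: keep trying to bind the REST port until the old instance releases it."""
    for _ in range(attempts):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind(("127.0.0.1", port))
            sock.listen()
            return sock  # the REST server can start accepting requests now
        except OSError:
            sock.close()  # port still held (e.g. EADDRINUSE); wait and try again
            time.sleep(retry_interval)
    raise TimeoutError(f"port {port} still busy after {attempts} attempts")


class DrainingWorker:
    """Old instance: in shutdown mode it refuses new jobs but lets running ones finish."""

    def __init__(self) -> None:
        self._running: set[int] = set()
        self._draining = False
        self._lock = threading.Lock()

    def try_start(self, job_id: int) -> bool:
        with self._lock:
            if self._draining:
                return False  # leave the job queued for the new instance
            self._running.add(job_id)
            return True

    def finish(self, job_id: int) -> None:
        with self._lock:
            self._running.discard(job_id)

    def begin_shutdown(self) -> None:
        with self._lock:
            self._draining = True

    def can_exit(self) -> bool:
        """Safe to exit once in shutdown mode and no jobs are still running."""
        with self._lock:
            return self._draining and not self._running
```

During a restart, the new process would call `bind_with_retry(port)` after joining the cluster. Meanwhile the old process calls `begin_shutdown()`, closes its listener, and exits once `can_exit()` returns `True`, so no running job is interrupted.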