SlideShare a Scribd company logo
1 of 82
Download to read offline
Hadoop Adventures
At Spotify
Adam Kawa
Data Engineer
How many times
were you told
bad news?
How many times
did your excellent weekend
change into a workday?
How many times
did you change your passion
for glory?
How many times
did you
surrender?
Many typos and grammar errors ;)

Hadoop nightmares guaranteed!
My answer to each of these questions:
At least once
Why Hadoop Adventures?
A rapidly growing cluster!
Configuration settings stop working
Masters have troubles coordinating slaves
More data + users + jobs
Own mistakes

The disk space is consumed
quicker and quicker

Many nodes were added
What feelings did you get when it happened?
Responsibility, Critical, Expensive,
Interruptions, Frustration, Stress, Email,
Ashamed,
Cooperation, Learning, Surprise,
Fun, Exciting, Happy Pride
About Me

I needed to learn more about
operating systems, hardware, networking...

A Java developer
who was developing MapReduce, Pig and Hive jobs
who started operating Hadoop and HBase on a 5-node cluster
who later joined Spotify
to meet a 138x larger Elephant
About The Elephant
I met this Elephant
He grows rapidly from 60 to 690 nodes
He stores 6.29PT of data
He receives 50TB of data each day
He seems to be the largest in Europe
He did not like me at all
His friends did not like me too
We had problems living together!
I want to make them PUBLIC now!

Hands off
a little boy!
What Will I Talk About?
Five real (and my favourite) Hadoop incidents that
broke our cluster
or made it very unstable
What Will I Share?
Real problems
Real mistakes
Real lessons learned
Real graphs
Real numbers
Real emails
Real excerpts from our conversations
What Will Also I Share?
The mistake that I made and I did not like to talk about
Feeling: Ashamed
Adventure 1 (February 2013)

Troubles running more
resource-intensive Hive queries
Problem

A user can not run resource-intensive Hive queries
It happened immediately after significantly expanding the cluster
Description
The queries are valid
The queries run successfully on small datasets
But they fail on large datasets
Surprisingly they run successfully through other user accounts
The user has right permissions to HDFS directories and Hive tables
Observations
When this user runs a resource-intensive query
The cluster experiences stability problems
The NameNode becomes less responsive
It was losing the connection with DataNodes and mark them dead
But the DataNode daemons are running completely fine
The NameNode is throwing thousands of warnings and exceptions
14592 times only during 8 min (4768/min in a peak)
Normally
Hadoop is a very trusty elephant
The username comes from the client machine (and is not verified)
The groupname is resolved on the NameNode server
Using the shell command ''id -Gn <username>''
If a user does not have an account on the NameNode server
The ExitCodeException exception is thrown
Possible Fixes
Create an user account on the NameNode server (dirty, insecure)
Use AD/LDAP for a user-group resolution
hadoop.security.group.mapping.ldap.* settings
If you also need the full-authentication, deploy Kerberos
Our Fix
We decided to use LDAP for a user-group resolution
However, LDAP settings in Hadoop did not work for us
Because posixGroup is not a supported filter group class
We found a workaround using nsswitch.conf
Lesson Learned
Know who is going to use your cluster
Know who is abusing the cluster (HDFS access and MapReduce jobs)
Parse the NameNode logs regularly
Look for FATAL, ERROR, Exception messages
Especially before and after expanding the cluster
Adventure 2 (March 2013)

DataNodes become
blocked sometimes (Part 1)
Problem
The DataNodes are marked dead by the NameNode
The DataNodes have not sent a heartbeat for a long time

Last Contact (seconds)
When a DataNode Is Marked Dead
A costly block replication process is started
Alive DataNodes consume resources to recreate missing replicas
Map tasks are processing non-local blocks
Tasks and jobs experience problems
They can fail or be not started at all (BlockMissingException)
Observations
$ps shows the DataNode daemons exist on their servers
sudo jstack -F <datanode-pid> does not show anything interesting
Some of the DataNodes automatically re-join the cluster
The DataNodes seem to be blocked for a while
First (Relief) Idea!
Let's temporarily increase the datanode-liveness interval to get some
breathing room...
Wouter:

Maybe we should up the

Adam:
Wouter:
Adam:

What value could be OK? 15, maybe 20 minutes?
20 maybe? It will give us a bit more breathing room.
Yes, the replication will not start so quickly..

dfs.namenode.heartbeat.recheck-interval?
A patch was quickly written, reviewed and deployed.

The formula is:
2 * dfs.namenode.heartbeat.recheck.interval + 10 * dfs.heartbeat.interval

and it should give us
2 * 600 sec + 10 * 3 sec → 20min:30sec
Crazy Idea!
Let's do other (important) tasks after deploying this (simple) change
… and do not measure its impact
Adventure 3 (March 2013)

The cluster becomes unstable
after around 1 hour of uptime
Problem
Each time, around 1 hour after restarting the NameNode
Majority (or all) of the DataNodes are marked dead by the NameNode
$ps shows the the DataNode daemons
The DataNode servers are running fine
Question?
Why does it happen each time,
exactly around 1 hour after restarting the NameNode?
First Idea!
''When a DataNode initially starts up,
as well as every hour thereafter,
it sends a block report to the NameNode''
You can read it in
* a book,
* blog posts,
* even see it in the code!
Maybe?
A storm of block reports coming after the first hour
heavily overloads the NameNode?
starts a heavy garbage collection phase that freezes the NameNode?
Could it be a right fix?
Increase dfs.blockreport.initialDelay to something bigger
than zero
Delay for first block report in seconds – a random
value between 0 and dfs.blockreport.initialDelay.
Because
Changing initialDelay requires a restart of the NameNode
The restart will be longer (the block reports are sent with a delay)

… and it is a sunny Sunday afternoon
Let's also deploy memory+GC changes to the NN at the same time
We will solve more problems in fewer iterations!
A couple of hours later...
Wouter:
Yeah, looks like we're pretty good!
Adam:
Good timing, one evening before Monday...
Johannes: Yeah, maybe I'll get two or three hours of weekend!
A couple of days later...
WARNING : There are 10779564 missing blocks.
Please check the logs or run fsck in order to identify the missing blocks.
Number
of
dead
DataNodes
A Mail Of Shame!

I had a dilemma whether to silently fix this interval or send this email...
A Reply Of Support!
Next (Shocking) Finding!

NO storm of block report after the 1st hour means that ... tuning heap and GC helped
Too Many Lessons Learned
Measure the impact of each single change
Never make bulk changes to the cluster
Double-check a description of a configuration parameter
Double-check the default values of configuration parameters
Question (almost) everything
Troubleshoot the cluster together interactively and non-interactively
Share the knowledge, even if you make a mistake
Give a support to your team-mates, even if they fail
Adventure 2 (March 2013)

DataNodes become blocked
sometimes (Part 2)
Observations

sudo jstack -F <datanode-pid> does not show anything interesting
sudo -u hdfs jstack <datanode-pid> shows something interesting

462 threads are blocked/waiting
19 threads are runnable
The threads are waiting for the same lock, to finish operations like:
org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.getDfsUsed
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.initReplicaRecovery
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.finalizeBlock
The lock is held by other thread that is … waiting for something else
recoverRbw tries to stop the other thread
it has the lock (the monitor object) that the other is waiting for
This deadlock is
Described in HDFS-3655
Duplicated by HDFS-5016
Related to HDFS-4851

Together

with many
other

threads
Lesson Learned
Master the troubleshooting and monitoring tools
Contribute to Apache Hadoop more
Send a post to Hadoop mailing list
Raise a well-documented JIRA ticket
Implement a patch
Adventure 4 (March 2013)

Tasks in Hive jobs are
constantly ''KILLED UNCLEAN by
the user''
Problem
Surprisingly, ad-hoc Hive queries are running extremely long
Thousands of task are constantly being killed
Only 1 task failed,
2x more task
were killed
than
were completed
The logs show that the JobTracker gets a request to kill the tasks
Who can actually send a kill request?
User (using e.g. mapred job -kill-task)
JobTracker (a speculative duplicate, or when a whole job fails)
Fair Scheduler (if the preemption is enabled)
Two suspicious characters have a good alibi
Users
She/he is friendly, patient and respective to others
JobTracker
The speculative execution is disabled
Jobs have not fail (yet)
Key Observations
Tasks are usually killed quickly after the start
Surviving tasks are running fine for long time
Ad-hoc Hive queries are running in their own Fair Scheduler's pool
Eureka!
FairScheduler prefers to kill young!
Preempt the newest tasks
in an over-share pools
to forcibly give their slots
to starving pools
Hive pool was running over its minimum and fair shares
Other pools were starving
So that
Fair Scheduler was (legally) killing Hive tasks from time to time

Fair Scheduler can kill to be KIND...
Possible Fixes
Tune minimum shares based on your workload
Tune preemption timeouts based on your workload
Disable the preemption
Limit the number of map/reduce tasks in a pool
Limit the number of jobs in a pool
Alternative
Capacity Scheduler
Lessons Learned
A scheduler should NOT be considered as a ''black-box''
Adventure 5 (April 2013)

JobTracker runs
super slowly
Problem
JobTracker becomes super slow... (but used to be super snappy)
Everybody is annoyed by unresponsive JobTracker web interface
Observations
Many large jobs are running on the cluster
Only the largest one (at that time) run 58.538 tasks
It aims to process 21.82 TB of data
1.5 year of data from one of our datasets
It is an ad-hoc Hive job
It is still easy to run
a large Hive job
even if hive.mapred.mode
is set to strict...
Possible Solution
Limit the number of tasks per job using
mapred.jobtracker.maxtasks.per.job

Unfortunately,
there are no separate properties
for map and reduce tasks
Real-Life Dialogue
Adam:
What value for mapred.jobtracker.maxtasks.per.job?
Sven:
Maybe 50K
Adam:
Why not 30K?
Sven:
We can run large production jobs and it is safer
to start with a high value and maybe decrease it later.
Adam:
Hmmm? Sounds good...!
But maybe 40K, not 50K? ;)
Sven:
OK. Let's meet in the middle...
The previous approach is a great example of “guesstimation”
Negotiation skills are required!
Question?
Should a real data-driven company
make such a decision
based on a guess or data?
The data and the answer
Based on 436K jobs from two months
The largest production job created 22.6K tasks (still to many!)
The jobs that create more than 23K tasks
Usually ad-hoc Hive queries or Python streaming jobs
These jobs fail, or are killed by impatient users in most cases
Hadoop can be used to …
process data generated by Hadoop ;)
Lessons Learned
Make data-driven decisions
Use Hadoop to analyze … Hadoop
Administrators should cooperate with developers+analysts
Developers+analysts will often adapt to the changes
Negotiation skills are still useful
A technical blog
with some bytes of humour!

More Adventures at HakunaMapData.com
Nodes are marked dead due to a heavy swapping
The out-of-memory killer is killing TaskTrackers
The script for daily data cleaning runs more than one day!
Adventure

The nearest future will be
BIGGER
YARN + Tez + Fast SQL
Grand Lesson Learned
Hadoop is like a kid!
It needs love and care
It can make you proud, but it also will cause problems
It will grow and change quickly
It will bring new friends home
It will surprise you
Are there any issues
questions? ;)
BONUS!
One Question:
What can happen after some time of simultaneous
development of MapReduce jobs
maintenance of a large cluster
and listening to perfect music for every moment?
A Possible Answer:
You may discover Hadoop in the lyrics of many popular songs!
Want to join the band?
Check out spotify.com/jobs or @Spotifyjobs
for more information
kawaa@spotify.com
HakunaMapData.com
Thank you!

More Related Content

What's hot

Zeus: Uber’s Highly Scalable and Distributed Shuffle as a Service
Zeus: Uber’s Highly Scalable and Distributed Shuffle as a ServiceZeus: Uber’s Highly Scalable and Distributed Shuffle as a Service
Zeus: Uber’s Highly Scalable and Distributed Shuffle as a ServiceDatabricks
 
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021StreamNative
 
Data Discovery at Databricks with Amundsen
Data Discovery at Databricks with AmundsenData Discovery at Databricks with Amundsen
Data Discovery at Databricks with AmundsenDatabricks
 
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in SparkSpark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in SparkBo Yang
 
Building large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudiBuilding large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudiBill Liu
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkDatabricks
 
Introduction to Apache Calcite
Introduction to Apache CalciteIntroduction to Apache Calcite
Introduction to Apache CalciteJordan Halterman
 
Batch and Stream Graph Processing with Apache Flink
Batch and Stream Graph Processing with Apache FlinkBatch and Stream Graph Processing with Apache Flink
Batch and Stream Graph Processing with Apache FlinkVasia Kalavri
 
Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes
 Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes
Spark Operator—Deploy, Manage and Monitor Spark clusters on KubernetesDatabricks
 
Using Apache Arrow, Calcite, and Parquet to Build a Relational Cache
Using Apache Arrow, Calcite, and Parquet to Build a Relational CacheUsing Apache Arrow, Calcite, and Parquet to Build a Relational Cache
Using Apache Arrow, Calcite, and Parquet to Build a Relational CacheDremio Corporation
 
Building robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and DebeziumBuilding robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and DebeziumTathastu.ai
 
Hudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilitiesHudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilitiesNishith Agarwal
 
Building a semantic/metrics layer using Calcite
Building a semantic/metrics layer using CalciteBuilding a semantic/metrics layer using Calcite
Building a semantic/metrics layer using CalciteJulian Hyde
 
Radical Speed for SQL Queries on Databricks: Photon Under the Hood
Radical Speed for SQL Queries on Databricks: Photon Under the HoodRadical Speed for SQL Queries on Databricks: Photon Under the Hood
Radical Speed for SQL Queries on Databricks: Photon Under the HoodDatabricks
 
Apache Con 2021 : Apache Bookkeeper Key Value Store and use cases
Apache Con 2021 : Apache Bookkeeper Key Value Store and use casesApache Con 2021 : Apache Bookkeeper Key Value Store and use cases
Apache Con 2021 : Apache Bookkeeper Key Value Store and use casesShivji Kumar Jha
 
Adding measures to Calcite SQL
Adding measures to Calcite SQLAdding measures to Calcite SQL
Adding measures to Calcite SQLJulian Hyde
 
Hadoop, Pig, and Twitter (NoSQL East 2009)
Hadoop, Pig, and Twitter (NoSQL East 2009)Hadoop, Pig, and Twitter (NoSQL East 2009)
Hadoop, Pig, and Twitter (NoSQL East 2009)Kevin Weil
 
Kafka’s New Control Plane: The Quorum Controller | Colin McCabe, Confluent
Kafka’s New Control Plane: The Quorum Controller | Colin McCabe, ConfluentKafka’s New Control Plane: The Quorum Controller | Colin McCabe, Confluent
Kafka’s New Control Plane: The Quorum Controller | Colin McCabe, ConfluentHostedbyConfluent
 
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...HostedbyConfluent
 

What's hot (20)

Zeus: Uber’s Highly Scalable and Distributed Shuffle as a Service
Zeus: Uber’s Highly Scalable and Distributed Shuffle as a ServiceZeus: Uber’s Highly Scalable and Distributed Shuffle as a Service
Zeus: Uber’s Highly Scalable and Distributed Shuffle as a Service
 
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
 
Data Discovery at Databricks with Amundsen
Data Discovery at Databricks with AmundsenData Discovery at Databricks with Amundsen
Data Discovery at Databricks with Amundsen
 
Spotify: behind the scenes
Spotify: behind the scenesSpotify: behind the scenes
Spotify: behind the scenes
 
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in SparkSpark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
 
Building large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudiBuilding large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudi
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Introduction to Apache Calcite
Introduction to Apache CalciteIntroduction to Apache Calcite
Introduction to Apache Calcite
 
Batch and Stream Graph Processing with Apache Flink
Batch and Stream Graph Processing with Apache FlinkBatch and Stream Graph Processing with Apache Flink
Batch and Stream Graph Processing with Apache Flink
 
Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes
 Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes
Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes
 
Using Apache Arrow, Calcite, and Parquet to Build a Relational Cache
Using Apache Arrow, Calcite, and Parquet to Build a Relational CacheUsing Apache Arrow, Calcite, and Parquet to Build a Relational Cache
Using Apache Arrow, Calcite, and Parquet to Build a Relational Cache
 
Building robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and DebeziumBuilding robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and Debezium
 
Hudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilitiesHudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilities
 
Building a semantic/metrics layer using Calcite
Building a semantic/metrics layer using CalciteBuilding a semantic/metrics layer using Calcite
Building a semantic/metrics layer using Calcite
 
Radical Speed for SQL Queries on Databricks: Photon Under the Hood
Radical Speed for SQL Queries on Databricks: Photon Under the HoodRadical Speed for SQL Queries on Databricks: Photon Under the Hood
Radical Speed for SQL Queries on Databricks: Photon Under the Hood
 
Apache Con 2021 : Apache Bookkeeper Key Value Store and use cases
Apache Con 2021 : Apache Bookkeeper Key Value Store and use casesApache Con 2021 : Apache Bookkeeper Key Value Store and use cases
Apache Con 2021 : Apache Bookkeeper Key Value Store and use cases
 
Adding measures to Calcite SQL
Adding measures to Calcite SQLAdding measures to Calcite SQL
Adding measures to Calcite SQL
 
Hadoop, Pig, and Twitter (NoSQL East 2009)
Hadoop, Pig, and Twitter (NoSQL East 2009)Hadoop, Pig, and Twitter (NoSQL East 2009)
Hadoop, Pig, and Twitter (NoSQL East 2009)
 
Kafka’s New Control Plane: The Quorum Controller | Colin McCabe, Confluent
Kafka’s New Control Plane: The Quorum Controller | Colin McCabe, ConfluentKafka’s New Control Plane: The Quorum Controller | Colin McCabe, Confluent
Kafka’s New Control Plane: The Quorum Controller | Colin McCabe, Confluent
 
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
 

Similar to Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)

The Past, Present, and Future of Hadoop at LinkedIn
The Past, Present, and Future of Hadoop at LinkedInThe Past, Present, and Future of Hadoop at LinkedIn
The Past, Present, and Future of Hadoop at LinkedInCarl Steinbach
 
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...DataWorks Summit/Hadoop Summit
 
Bhupeshbansal bigdata
Bhupeshbansal bigdata Bhupeshbansal bigdata
Bhupeshbansal bigdata Bhupesh Bansal
 
node.js, javascript and the future
node.js, javascript and the futurenode.js, javascript and the future
node.js, javascript and the futureJeff Miccolis
 
How Many Slaves (Ukoug)
How Many Slaves (Ukoug)How Many Slaves (Ukoug)
How Many Slaves (Ukoug)Doug Burns
 
Lessons learnt on a 2000-core cluster
Lessons learnt on a 2000-core clusterLessons learnt on a 2000-core cluster
Lessons learnt on a 2000-core clusterEugene Kirpichov
 
Pilot Tech Talk #10 — Practical automation by Kamil Cholewiński
Pilot Tech Talk #10 — Practical automation by Kamil CholewińskiPilot Tech Talk #10 — Practical automation by Kamil Cholewiński
Pilot Tech Talk #10 — Practical automation by Kamil CholewińskiPilot
 
Whirlwind tour of Hadoop and HIve
Whirlwind tour of Hadoop and HIveWhirlwind tour of Hadoop and HIve
Whirlwind tour of Hadoop and HIveEdward Capriolo
 
Distributed Systems: scalability and high availability
Distributed Systems: scalability and high availabilityDistributed Systems: scalability and high availability
Distributed Systems: scalability and high availabilityRenato Lucindo
 
Puppet for Sys Admins
Puppet for Sys AdminsPuppet for Sys Admins
Puppet for Sys AdminsPuppet
 
Hadoop 2.0 Architecture | HDFS Federation | NameNode High Availability |
Hadoop 2.0 Architecture | HDFS Federation | NameNode High Availability | Hadoop 2.0 Architecture | HDFS Federation | NameNode High Availability |
Hadoop 2.0 Architecture | HDFS Federation | NameNode High Availability | Edureka!
 
Top Hadoop Big Data Interview Questions and Answers for Fresher
Top Hadoop Big Data Interview Questions and Answers for FresherTop Hadoop Big Data Interview Questions and Answers for Fresher
Top Hadoop Big Data Interview Questions and Answers for FresherJanBask Training
 
UnConference for Georgia Southern Computer Science March 31, 2015
UnConference for Georgia Southern Computer Science March 31, 2015UnConference for Georgia Southern Computer Science March 31, 2015
UnConference for Georgia Southern Computer Science March 31, 2015Christopher Curtin
 
SQL or NoSQL, that is the question!
SQL or NoSQL, that is the question!SQL or NoSQL, that is the question!
SQL or NoSQL, that is the question!Andraz Tori
 
A gentle introduction to the world of BigData and Hadoop
A gentle introduction to the world of BigData and HadoopA gentle introduction to the world of BigData and Hadoop
A gentle introduction to the world of BigData and HadoopStefano Paluello
 
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010Bhupesh Bansal
 

Similar to Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013) (20)

The Past, Present, and Future of Hadoop at LinkedIn
The Past, Present, and Future of Hadoop at LinkedInThe Past, Present, and Future of Hadoop at LinkedIn
The Past, Present, and Future of Hadoop at LinkedIn
 
LinkedIn
LinkedInLinkedIn
LinkedIn
 
Os Whitaker
Os WhitakerOs Whitaker
Os Whitaker
 
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
 
Bhupeshbansal bigdata
Bhupeshbansal bigdata Bhupeshbansal bigdata
Bhupeshbansal bigdata
 
node.js, javascript and the future
node.js, javascript and the futurenode.js, javascript and the future
node.js, javascript and the future
 
How Many Slaves (Ukoug)
How Many Slaves (Ukoug)How Many Slaves (Ukoug)
How Many Slaves (Ukoug)
 
Lessons learnt on a 2000-core cluster
Lessons learnt on a 2000-core clusterLessons learnt on a 2000-core cluster
Lessons learnt on a 2000-core cluster
 
Pilot Tech Talk #10 — Practical automation by Kamil Cholewiński
Pilot Tech Talk #10 — Practical automation by Kamil CholewińskiPilot Tech Talk #10 — Practical automation by Kamil Cholewiński
Pilot Tech Talk #10 — Practical automation by Kamil Cholewiński
 
Whirlwind tour of Hadoop and HIve
Whirlwind tour of Hadoop and HIveWhirlwind tour of Hadoop and HIve
Whirlwind tour of Hadoop and HIve
 
Hadoop
HadoopHadoop
Hadoop
 
Distributed Systems: scalability and high availability
Distributed Systems: scalability and high availabilityDistributed Systems: scalability and high availability
Distributed Systems: scalability and high availability
 
Puppet for Sys Admins
Puppet for Sys AdminsPuppet for Sys Admins
Puppet for Sys Admins
 
Hadoop 2.0 Architecture | HDFS Federation | NameNode High Availability |
Hadoop 2.0 Architecture | HDFS Federation | NameNode High Availability | Hadoop 2.0 Architecture | HDFS Federation | NameNode High Availability |
Hadoop 2.0 Architecture | HDFS Federation | NameNode High Availability |
 
Top Hadoop Big Data Interview Questions and Answers for Fresher
Top Hadoop Big Data Interview Questions and Answers for FresherTop Hadoop Big Data Interview Questions and Answers for Fresher
Top Hadoop Big Data Interview Questions and Answers for Fresher
 
UnConference for Georgia Southern Computer Science March 31, 2015
UnConference for Georgia Southern Computer Science March 31, 2015UnConference for Georgia Southern Computer Science March 31, 2015
UnConference for Georgia Southern Computer Science March 31, 2015
 
Hadoop Internals
Hadoop InternalsHadoop Internals
Hadoop Internals
 
SQL or NoSQL, that is the question!
SQL or NoSQL, that is the question!SQL or NoSQL, that is the question!
SQL or NoSQL, that is the question!
 
A gentle introduction to the world of BigData and Hadoop
A gentle introduction to the world of BigData and HadoopA gentle introduction to the world of BigData and Hadoop
A gentle introduction to the world of BigData and Hadoop
 
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
 

More from Adam Kawa

Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)Adam Kawa
 
Big Data At Spotify
Big Data At SpotifyBig Data At Spotify
Big Data At SpotifyAdam Kawa
 
Apache Hadoop In Theory And Practice
Apache Hadoop In Theory And PracticeApache Hadoop In Theory And Practice
Apache Hadoop In Theory And PracticeAdam Kawa
 
Hadoop Playlist (Ignite talks at Strata + Hadoop World 2013)
Hadoop Playlist (Ignite talks at Strata + Hadoop World 2013)Hadoop Playlist (Ignite talks at Strata + Hadoop World 2013)
Hadoop Playlist (Ignite talks at Strata + Hadoop World 2013)Adam Kawa
 
Apache Hadoop YARN, NameNode HA, HDFS Federation
Apache Hadoop YARN, NameNode HA, HDFS FederationApache Hadoop YARN, NameNode HA, HDFS Federation
Apache Hadoop YARN, NameNode HA, HDFS FederationAdam Kawa
 
Apache Hadoop Java API
Apache Hadoop Java APIApache Hadoop Java API
Apache Hadoop Java APIAdam Kawa
 
Apache Hadoop Ecosystem (based on an exemplary data-driven…
Apache Hadoop Ecosystem (based on an exemplary data-driven…Apache Hadoop Ecosystem (based on an exemplary data-driven…
Apache Hadoop Ecosystem (based on an exemplary data-driven…Adam Kawa
 
Apache Hadoop YARN
Apache Hadoop YARNApache Hadoop YARN
Apache Hadoop YARNAdam Kawa
 
Data model for analysis of scholarly documents in the MapReduce paradigm
Data model for analysis of scholarly documents in the MapReduce paradigm Data model for analysis of scholarly documents in the MapReduce paradigm
Data model for analysis of scholarly documents in the MapReduce paradigm Adam Kawa
 
Introduction To Apache Pig at WHUG
Introduction To Apache Pig at WHUGIntroduction To Apache Pig at WHUG
Introduction To Apache Pig at WHUGAdam Kawa
 
Systemy rekomendacji
Systemy rekomendacjiSystemy rekomendacji
Systemy rekomendacjiAdam Kawa
 
Introduction To Elastic MapReduce at WHUG
Introduction To Elastic MapReduce at WHUGIntroduction To Elastic MapReduce at WHUG
Introduction To Elastic MapReduce at WHUGAdam Kawa
 

More from Adam Kawa (12)

Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)
 
Big Data At Spotify
Big Data At SpotifyBig Data At Spotify
Big Data At Spotify
 
Apache Hadoop In Theory And Practice
Apache Hadoop In Theory And PracticeApache Hadoop In Theory And Practice
Apache Hadoop In Theory And Practice
 
Hadoop Playlist (Ignite talks at Strata + Hadoop World 2013)
Hadoop Playlist (Ignite talks at Strata + Hadoop World 2013)Hadoop Playlist (Ignite talks at Strata + Hadoop World 2013)
Hadoop Playlist (Ignite talks at Strata + Hadoop World 2013)
 
Apache Hadoop YARN, NameNode HA, HDFS Federation
Apache Hadoop YARN, NameNode HA, HDFS FederationApache Hadoop YARN, NameNode HA, HDFS Federation
Apache Hadoop YARN, NameNode HA, HDFS Federation
 
Apache Hadoop Java API
Apache Hadoop Java APIApache Hadoop Java API
Apache Hadoop Java API
 
Apache Hadoop Ecosystem (based on an exemplary data-driven…
Apache Hadoop Ecosystem (based on an exemplary data-driven…Apache Hadoop Ecosystem (based on an exemplary data-driven…
Apache Hadoop Ecosystem (based on an exemplary data-driven…
 
Apache Hadoop YARN
Apache Hadoop YARNApache Hadoop YARN
Apache Hadoop YARN
 
Data model for analysis of scholarly documents in the MapReduce paradigm
Data model for analysis of scholarly documents in the MapReduce paradigm Data model for analysis of scholarly documents in the MapReduce paradigm
Data model for analysis of scholarly documents in the MapReduce paradigm
 
Introduction To Apache Pig at WHUG
Introduction To Apache Pig at WHUGIntroduction To Apache Pig at WHUG
Introduction To Apache Pig at WHUG
 
Systemy rekomendacji
Systemy rekomendacjiSystemy rekomendacji
Systemy rekomendacji
 
Introduction To Elastic MapReduce at WHUG
Introduction To Elastic MapReduce at WHUGIntroduction To Elastic MapReduce at WHUG
Introduction To Elastic MapReduce at WHUG
 

Recently uploaded

Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Nikki Chapple
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observabilityitnewsafrica
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...itnewsafrica
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integrationmarketing932765
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxfnnc6jmgwh
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 

Recently uploaded (20)

Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 

Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)

  • 2. How many times were you told bad news?
  • 3.
  • 4. How many times did your excellent weekend change into a workday?
  • 5.
  • 6. How many times did you change your passion for glory?
  • 7.
  • 8. How many times did you surrender?
  • 9. Many typos and grammar errors ;) Hadoop nightmares guaranteed!
  • 10. My answer to each of these questions: At least once
  • 11. Why Hadoop Adventures? A rapidly growing cluster! Configuration settings stop working Masters have troubles coordinating slaves More data + users + jobs Own mistakes The disk space is consumed quicker and quicker Many nodes were added
  • 12. What feelings did you get when it happened? Responsibility, Critical, Expensive, Interruptions, Frustration, Stress, Email, Ashamed, Cooperation, Learning, Surprise, Fun, Exciting, Happy Pride
  • 13. About Me I needed to learn more about operating systems, hardware, networking... A Java developer who was developing MapReduce, Pig and Hive jobs who started operating Hadoop and HBase on a 5-node cluster who later joined Spotify to meet a 138x larger Elephant
  • 14. About The Elephant I met this Elephant He grows rapidly from 60 to 690 nodes He stores 6.29PT of data He receives 50TB of data each day He seems to be the largest in Europe He did not like me at all His friends did not like me too We had problems living together! I want to make them PUBLIC now! Hands off a little boy!
  • 15. What Will I Talk About? Five real (and my favourite) Hadoop incidents that broke our cluster or made it very unstable
  • 16. What Will I Share? Real problems Real mistakes Real lessons learned Real graphs Real numbers Real emails Real excerpts from our conversations
  • 17. What Will Also I Share? The mistake that I made and I did not like to talk about Feeling: Ashamed
  • 18. Adventure 1 (February 2013) Troubles running more resource-intensive Hive queries
  • 19. Problem A user can not run resource-intensive Hive queries It happened immediately after significantly expanding the cluster
  • 20. Description The queries are valid The queries run successfully on small datasets But they fail on large datasets Surprisingly they run successfully through other user accounts The user has right permissions to HDFS directories and Hive tables
  • 21. Observations When this user runs a resource-intensive query The cluster experiences stability problems The NameNode becomes less responsive It was losing the connection with DataNodes and mark them dead But the DataNode daemons are running completely fine
  • 22. The NameNode is throwing thousands of warnings and exceptions 14592 times only during 8 min (4768/min in a peak)
  • 23. Normally Hadoop is a very trusty elephant The username comes from the client machine (and is not verified) The groupname is resolved on the NameNode server Using the shell command ''id -Gn <username>'' If a user does not have an account on the NameNode server The ExitCodeException exception is thrown
  • 24. Possible Fixes Create an user account on the NameNode server (dirty, insecure) Use AD/LDAP for a user-group resolution hadoop.security.group.mapping.ldap.* settings If you also need the full-authentication, deploy Kerberos
  • 25. Our Fix We decided to use LDAP for a user-group resolution However, LDAP settings in Hadoop did not work for us Because posixGroup is not a supported filter group class We found a workaround using nsswitch.conf
  • 26. Lesson Learned Know who is going to use your cluster Know who is abusing the cluster (HDFS access and MapReduce jobs) Parse the NameNode logs regularly Look for FATAL, ERROR, Exception messages Especially before and after expanding the cluster
  • 27. Adventure 2 (March 2013) DataNodes become blocked sometimes (Part 1)
  • 28. Problem The DataNodes are marked dead by the NameNode The DataNodes have not sent a heartbeat for a long time Last Contact (seconds)
  • 29. When a DataNode Is Marked Dead A costly block replication process is started Alive DataNodes consume resources to recreate missing replicas Map tasks are processing non-local blocks Tasks and jobs experience problems They can fail or be not started at all (BlockMissingException)
  • 30. Observations $ps shows the DataNode daemons exist on their servers sudo jstack -F <datanode-pid> does not show anything interesting Some of the DataNodes automatically re-join the cluster The DataNodes seem to be blocked for a while
  • 31. First (Relief) Idea! Let's temporarily increase the datanode-liveness interval to get some breathing room... Wouter: Maybe we should up the Adam: Wouter: Adam: What value could be OK? 15, maybe 20 minutes? 20 maybe? It will give us a bit more breathing room. Yes, the replication will not start so quickly.. dfs.namenode.heartbeat.recheck-interval?
  • 32. A patch was quickly written, reviewed and deployed. The formula is: 2 * dfs.namenode.heartbeat.recheck.interval + 10 * dfs.heartbeat.interval and it should give us 2 * 600 sec + 10 * 3 sec → 20min:30sec
  • 33. Crazy Idea! Let's do other (important) tasks after deploying this (simple) change … and do not measure its impact
  • 34. Adventure 3 (March 2013) The cluster becomes unstable after around 1 hour of uptime
  • 35. Problem Each time, around 1 hour after restarting the NameNode Majority (or all) of the DataNodes are marked dead by the NameNode $ps shows the the DataNode daemons The DataNode servers are running fine
  • 36. Question? Why does it happen each time, exactly around 1 hour after restarting the NameNode?
  • 37. First Idea! ''When a DataNode initially starts up, as well as every hour thereafter, it sends a block report to the NameNode'' You can read it in * a book, * blog posts, * even see it in the code!
  • 38. Maybe? A storm of block reports coming after the first hour heavily overloads the NameNode? starts a heavy garbage collection phase that freezes the NameNode?
  • 39. Could it be a right fix? Increase dfs.blockreport.initialDelay to something bigger than zero Delay for first block report in seconds – a random value between 0 and dfs.blockreport.initialDelay.
  • 40. Because Changing initialDelay requires a restart of the NameNode The restart will be longer (the block reports are sent with a delay) … and it is a sunny Sunday afternoon Let's also deploy memory+GC changes to the NN at the same time We will solve more problems in fewer iterations!
  • 41. A couple of hours later...
  • 42. Wouter: Yeah, looks like we're pretty good! Adam: Good timing, one evening before Monday... Johannes: Yeah, maybe I'll get two or three hours of weekend!
  • 43. A couple of days later...
  • 44. WARNING : There are 10779564 missing blocks. Please check the logs or run fsck in order to identify the missing blocks. Number of dead DataNodes
  • 45. A Mail Of Shame! I had a dilemma whether to silently fix this interval or send this email...
  • 46. A Reply Of Support!
  • 47. Next (Shocking) Finding! NO storm of block report after the 1st hour means that ... tuning heap and GC helped
  • 48. Too Many Lessons Learned Measure the impact of each single change Never make bulk changes to the cluster Double-check a description of a configuration parameter Double-check the default values of configuration parameters Question (almost) everything Troubleshoot the cluster together interactively and non-interactively Share the knowledge, even if you make a mistake Give a support to your team-mates, even if they fail
  • 49. Adventure 2 (March 2013) DataNodes become blocked sometimes (Part 2)
  • 50. Observations sudo jstack -F <datanode-pid> does not show anything interesting sudo -u hdfs jstack <datanode-pid> shows something interesting 462 threads are blocked/waiting 19 threads are runnable The threads are waiting for the same lock, to finish operations like: org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.getDfsUsed org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.initReplicaRecovery org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.finalizeBlock
  • 51. The lock is held by other thread that is … waiting for something else
  • 52. recoverRbw tries to stop the other thread it has the lock (the monitor object) that the other is waiting for This deadlock is Described in HDFS-3655 Duplicated by HDFS-5016 Related to HDFS-4851 Together with many other threads
  • 53. Lesson Learned Master the troubleshooting and monitoring tools Contribute to Apache Hadoop more Send a post to Hadoop mailing list Raise a well-documented JIRA ticket Implement a patch
  • 54. Adventure 4 (March 2013) Tasks in Hive jobs are constantly ''KILLED UNCLEAN by the user''
  • 55. Problem Surprisingly, ad-hoc Hive queries are running extremely long Thousands of task are constantly being killed
  • 56. Only 1 task failed, 2x more task were killed than were completed
  • 57.
  • 58. The logs show that the JobTracker gets a request to kill the tasks Who can actually send a kill request? User (using e.g. mapred job -kill-task) JobTracker (a speculative duplicate, or when a whole job fails) Fair Scheduler (if the preemption is enabled)
  • 59. Two suspicious characters have a good alibi Users She/he is friendly, patient and respective to others JobTracker The speculative execution is disabled Jobs have not fail (yet)
  • 60. Key Observations Tasks are usually killed quickly after the start Surviving tasks are running fine for long time Ad-hoc Hive queries are running in their own Fair Scheduler's pool
  • 61. Eureka! FairScheduler prefers to kill young! Preempt the newest tasks in an over-share pools to forcibly give their slots to starving pools
  • 62. Hive pool was running over its minimum and fair shares Other pools were starving So that Fair Scheduler was (legally) killing Hive tasks from time to time Fair Scheduler can kill to be KIND...
  • 63. Possible Fixes Tune minimum shares based on your workload Tune preemption timeouts based on your workload Disable the preemption Limit the number of map/reduce tasks in a pool Limit the number of jobs in a pool Alternative Capacity Scheduler
  • 64. Lessons Learned A scheduler should NOT be considered as a ''black-box''
  • 65. Adventure 5 (April 2013) JobTracker runs super slowly
  • 66. Problem JobTracker becomes super slow... (but used to be super snappy) Everybody is annoyed by unresponsive JobTracker web interface
  • 67. Observations Many large jobs are running on the cluster Only the largest one (at that time) run 58.538 tasks It aims to process 21.82 TB of data 1.5 year of data from one of our datasets It is an ad-hoc Hive job It is still easy to run a large Hive job even if hive.mapred.mode is set to strict...
  • 68. Possible Solution Limit the number of tasks per job using mapred.jobtracker.maxtasks.per.job Unfortunately, there are no separate properties for map and reduce tasks
  • 69. Real-Life Dialogue Adam: What value for mapred.jobtracker.maxtasks.per.job? Sven: Maybe 50K Adam: Why not 30K? Sven: We can run large production jobs and it is safer to start with a high value and maybe decrease it later. Adam: Hmmm? Sounds good...! But maybe 40K, not 50K? ;) Sven: OK. Let's meet in the middle...
  • 70. The previous approach is a great example of “guesstimation” Negotiation skills are required! Question? Should a real data-driven company make such a decision based on a guess or data?
  • 71. The data and the answer Based on 436K jobs from two months The largest production job created 22.6K tasks (still to many!) The jobs that create more than 23K tasks Usually ad-hoc Hive queries or Python streaming jobs These jobs fail, or are killed by impatient users in most cases Hadoop can be used to … process data generated by Hadoop ;)
  • 72. Lessons Learned Make data-driven decisions Use Hadoop to analyze … Hadoop Administrators should cooperate with developers+analysts Developers+analysts will often adapt to the changes Negotiation skills are still useful
  • 73. A technical blog with some bytes of humour! More Adventures at HakunaMapData.com Nodes are marked dead due to a heavy swapping The out-of-memory killer is killing TaskTrackers The script for daily data cleaning runs more than one day!
  • 74. Adventure The nearest future will be BIGGER YARN + Tez + Fast SQL
  • 75. Grand Lesson Learned Hadoop is like a kid! It needs love and care It can make you proud, but it also will cause problems It will grow and change quickly It will bring new friends home It will surprise you
  • 76. Are there any issues questions? ;)
  • 78. One Question: What can happen after some time of simultaneous development of MapReduce jobs maintenance of a large cluster and listening to perfect music for every moment?
  • 79. A Possible Answer: You may discover Hadoop in the lyrics of many popular songs!
  • 80.
  • 81. Want to join the band? Check out spotify.com/jobs or @Spotifyjobs for more information kawaa@spotify.com HakunaMapData.com