Here are some tips on hiring and retaining top Big Data talent. Topics: how to source candidates, how to interview them, interview techniques, and common mistakes.
Watch the video of the presentation and download the slides here: http://elephantscale.com/2017/03/building-successful-big-data-team-demand-webinar/
1. How To Build A Successful Big Data Team
Sujee Maniyam
Founder, CEO @ Elephant Scale
A Big Data Training Company
www.ElephantScale.com
sujee@elephantscale.com
2. Hi, I am Sujee Maniyam :)
• Big Data practitioner / educator
• 15+ years of software development experience
• Founder, Elephant Scale (www.ElephantScale.com)
  – Training in Big Data
• Author
  – "Hadoop Illuminated" open source book
  – "HBase Design Patterns"
  – "Data analytics with Hadoop and Spark" - O'Reilly Media
• Open Source contributor: http://github.com/sujee
• sujee@elephantscale.com, www.sujee.net
(c) Elephant Scale 2017. All rights reserved.
3. Audience For This Talk
• Managers
• Directors
• HR
4. What Have I Seen?
• I have been a Big Data consultant
• I have hired Big Data talent
• I have helped clients interview Big Data people
• We have trained 1000+ students in many countries
• I have seen
  – How companies are adopting Big Data
  – How they are going about building their Big Data teams
6. Is the Big Data Shortage Real?
• Plenty of Gartner / McKinsey studies
• Data is only going to get bigger
  – Connected devices
  – Internet of Things (IoT)
• Big Data is no longer the luxury of social / web companies
  – It is everywhere (retail / insurance / finance, etc.)
8. Do you really need a big data team in house?
Own Team:
- Data is a 'core part of the business'
- Hire and nurture the right talent
- Once the team is in play, move quickly
- Downside: can take a while to build a team
Outsourced:
- Leverage expertise
- Up and running quickly
- Downside: potential vendor issues (vendor dependent)
- Downside: not having the expertise in house
9. Big Data Roles
Role: Big Data Developer
  – Job: Build data infrastructure
  – Skills: Languages (Java / Python / Scala); Linux; Big Data systems (Hadoop / Spark / NoSQL …)
  – Target pool: Developers
Role: Dev Ops
  – Job: Keep things running
  – Skills: Admin skills (Linux / Networking); Scripting (Python / shell); Deployments (Puppet / Chef / Docker ...)
  – Target pool: Admins
Role: Data Scientist / Analyst
  – Job: Make sense of data
  – Skills: Languages (R, Python); Good stats / math; Big Data tools (Hadoop / Spark); ** Domain knowledge **
  – Target pool: Data analysts
11. Why Is It Hard To Hire Big Data Talent?
• The technology is fairly young (< 10 years old)
  – Smaller talent pool
  – (Hadoop turned 10 years old in 2016)
• The technology *is* complex
  – Lots of moving parts
• An effective Big Data person needs many other skills
  – Linux
  – Scripting
  – Admin / troubleshooting
  – Programming
• Experience is hard to get
• Expensive $$$$
12. Mistakes I've Seen
• Overly demanding job posts
  "Looking for 10 years Hadoop & 5 years HBase experience"
• Very specific job requirements
  "Looking for Impala expert to tweak our queries"
• Offering too little pay
• Bad interview process
13. Generic Tips
• Source them properly
• Hire a 'strong generalist' rather than a 'specialist'
• 'First hires' are VERY VERY important
• Use 'amplification'
• Interview thoroughly
  – Get outside help if needed
• Train internal teams vs. hiring from outside
14. Active Candidates vs. Passive Candidates
• Passive candidates
  – Not actively looking
  – 'Happy / content' with their current job
  – Usually great hires!
  – Long hiring process
  – And expensive $$$ :)
• Active candidates
  – Actively looking for jobs
  – Shorter hiring window
15. Generalist vs. Specialist
• Reasons for hiring a 'strong generalist'
  – Technologies evolve very rapidly. You may be using a different technology in a few months.
  – Can be trained to adopt new technologies quickly
  – Larger talent pool
• Need 'specialists'?
  "Looking for Cassandra experts to fine tune our cluster"
  – Hire consultants.
17. Sourcing Candidates: Traditional
• !!! Referrals !!!
  – Ask employees to recommend potential hires
  – A way to reach 'passive' candidates
• Recruiters
  – Still work
  – Measure their 'effectiveness'
  – Not simple 'keyword matching' :)
• Job boards
  – The usual bunch (Monster / Indeed …)
  – Stack Overflow
18. Big Data Specific Sourcing
• Recruiters ++
  – Monitor mailing lists / GitHub / LinkedIn activities
• Conferences
  – Post on the 'job board'
  – Speaking / booths
• Meetups / Hackathons
  – Host meetups
  – !! Long / organic process !!
19. Tips For Hiring Open Source Committers / And Away From Popular Companies
• They are in 'very high demand' ... and they know it!
• They usually don't apply for jobs.
  Approach them via a mutual contact.
• $$$ is important, but not the only thing.
• Offer a promotion:
  – Software Eng @ X → Team Lead
  – Manager @ Y → Director
• Entice them with other benefits
  – Allow open source contribution time / time to write a book
  – Send them to conferences
  – Work-from-home
20. 'First Hire'
• This person is VERY VERY important if you are building a Big Data team.
• Needs to be 'hands-on'
  – Definitely not 'power-point-architects'!
  – This person has to design and probably build the first version of your Big Data platform.
• Hire an experienced leader.
  – He/She will need to build a team
  – Will be your 'face' in recruiting
  – Will lead the team
• Amplification
• Get outside help for interviewing if needed
22. Amplification
• A couple of top developers as leads
  – Experienced
  – $$$
  – Architect / guidance
• Junior / mid-level talent
  – 'Generalists'
  – Affordable $$$
  – Can learn quickly under the tutelage of senior developers
[Diagram: a Lead above four developers labeled J Dev, J Dev, M Dev, J Dev]
24. Interview Process
• Usually interviews are done 'ad-hoc'
  – Interviewers don't have time to prepare
  – Not enough coordination: the same questions get asked by multiple interviewers
• One process that worked for me:
  – Have a review meeting (10-15 mins / candidate). Review the resume as a team.
  – Agree on overall questioning areas:
    Tim: "I will ask about Big Data design"
    Mark: "I will ask him about NoSQL"
    Anand: "I will quiz him on Java"
  – Don't exchange notes until everyone has done their interviews (it might wrongly influence the other person)
  – Hold a 'debrief' meeting with all interviewers right after the interview
25. Actual Interview
• Stick to the topics you agreed to with the team
  – Don't freelance (unless you have spare time)
• My format
  – "Tell me about a project you have done."
    • Have them whiteboard the solution
    • Ask questions, challenge the design a little bit :)
  – Ask concept questions (10 - 20)
  – Are you working on any open source project?
  – Present a problem and ask them to
    • Design on the whiteboard
    • Iterate over the design with them ('throw curve balls')
    • Write some code (weeds out power-point-architects)
26. My Ideal Candidate
• Experienced
  – Must demonstrate a 'wide spectrum' of knowledge
  – Familiar with new technologies (it is fine if they haven't used them personally)
  – Can evaluate technologies
  – Hands-on, writes code
  – Open source experience
• Junior
  – Solid computer science fundamentals (algorithms / data structures)
  – Knows the concepts
  – Working on own (open source) projects
  – 'Hungry' to learn
27. Hiring Big Data Developers
• Must demonstrate a 'wide spectrum' of knowledge
  – Concepts
  – Familiar with new technologies (it is fine if they haven't used them personally)
• Must be HANDS-ON
  – Write code / write code / write code!
• Design a solution to a given problem
  – Sketch out the design on a whiteboard
  – Choose the technology stack
  – Write code for some portions
28. Hiring Dev-Ops
• Must be a scripting expert
  – Ask them to write scripts
  – Have them demo their projects / scripts
• Give them VMs in the cloud and ask them to perform a task
  – E.g. set up a 3-node Cassandra cluster
  – Maybe give them a day before the interview to do this.
• Test their troubleshooting knowledge
  – "X happened to our Hadoop cluster… what do you think caused it? How can we prevent it?"
30. Hiring Data Scientists / Analysts
• Do you want a 'domain expert' (e.g. Banking / Finance) or a 'generalist'?
• Don't confuse 'data scientists' with 'data engineers'.
  Data scientists don't need to know the infrastructure work. They are focused on analyzing the data.
• Must have solid stats / math fundamentals.
• Must know an analytical language / framework: R / Spark
• Must have good graphics / presentation skills
• Beware of 'fake data scientists'.
  "One regression does not a data scientist make"
31. Interview WOW Moments
• "Show me something"
• Open source project demo by a college grad
• Implemented a working solution on his laptop during the interview!
  – And added a visualization component to boot
• Demonstrated a 'spin up' script for a Hadoop cluster in the cloud
  – Spun up a 20-node Hadoop cluster in minutes.
32. Interview Mistakes
• Asking 'trivia' questions
  – "Write a linked list"
• Letting the candidate drone on about a project they have done
• Not asking them to write code / whiteboard
• Asking the same questions over and over again
  – Lack of coordination
33. Getting Help For Interviewing
• If you don't have any Big Data people to do the interviewing…
  – Get outside help.
• Especially important when making the 'first hire'
• We have done this for our clients
  – Contact me offline :)
    sujee@elephantscale.com
35. Training Your Own Team in Big Data
• Leverage your existing team…
  – You know them already...
  – They know the company / work
  – Teach them new technology → win / win
• We can help!
  ElephantScale.com/training
  – Developer Track: Hadoop, Spark, NoSQL
  – Data Science Track: Spark, Python, R
  – Dev Ops Track: Hadoop, Spark, NoSQL, Scripting
36. Hiring Out of College!
• Great way to hire solid talent at affordable $$$
• Look for grads 'hungry' to learn
• Good mentoring is a must
• Pair with a senior developer / mentor
Our College Bootcamp:
  – Modern software development: Java / Python / Scala
  – Development environments: Linux / Git
  – Big Data stack: Hadoop / Spark / NoSQL
  – Data Analytics stack: Python / R
37. How To Keep The Talent
• Competition is fierce…
  there is always someone willing to pay higher $$$
• Experienced talent:
  – Offer them leadership opportunities (in charge of a component / project)
  – Offer mentoring opportunities
  – Let them learn new technologies / get training
  – Open source contribution time (20%, 50%, 100%)
  – Let them go to conferences and speak
  – Flexible work environment: reduce commute, work-from-home
• Young talent:
  – Great mentoring
  – Good learning environment (training, etc.)
  – Offer 'grow into' roles
39. Tips from Sunondo Ghosh
• Sunondo Ghosh
  – Senior Director of Engineering @ Ellie Mae
  – Former Director of Engineering @ Digital Insight
  – https://www.linkedin.com/in/sunondoghosh/
• Tips:
  – Look for a 'can do' attitude
  – Don't compromise on 'culture fit'
  – Diversity is important
  – Give them a 'continuous learning path'
  – Automate as much as possible
• Sunondo's presentation: http://bit.ly/2mEvOdv
40. Tips from Kuldip Pabla
• Kuldip Pabla
  – Co-Founder and CTO, Cooldimi
  – Former Senior Director @ Cloud Services Lab, Samsung
  – https://www.linkedin.com/in/cooldeep/
• Tips
  – Organize meetups
  – Hire experts and use them to train others, mainly strong Java engineers. This paid off: employees stayed because they were learning.
  – Build incrementally to keep the momentum