How To Build A Successful Big Data Team
Sujee Maniyam
Founder, CEO @ Elephant Scale
A Big Data Training Company
www.ElephantScale.com
sujee@elephantscale.com
Hi, I am Sujee Maniyam :)
• Big Data practitioner / educator
• 15+ years of software development experience
• Founder, Elephant Scale (www.ElephantScale.com)
  – Training in Big Data
• Author
  – “Hadoop illuminated” open source book
  – “HBase Design Patterns”
  – "Data analytics with Hadoop and Spark" - O'Reilly Media
• Open source contributor: http://github.com/sujee
• sujee@elephantscale.com, www.sujee.net
(c) Elephant Scale 2017. All rights reserved.
Audience For This Talk
• Managers
• Directors
• HR
What Have I Seen?
• I have been a Big Data consultant
• I have hired Big Data talent
• I have helped clients interview Big Data people
• We have trained 1000+ students in many countries
• I have seen
  – how companies are adopting Big Data
  – how they are going about building their Big Data teams
Background
Is the Big Data Talent Shortage Real?
• Plenty of Gartner / McKinsey studies point to one
• Data is only going to get bigger
  – Connected devices
  – Internet of Things (IoT)
• Big Data is no longer the luxury of social / web companies
  – It is everywhere (retail / insurance / finance, etc.)
Some ‘research’ (Source : Indeed.com)
Do you really need a big data team in house?
• Own Team
  – Pros: data is a 'core part of business'; hire and nurture the right talent; once the team is in place, move quickly
  – Cons: can take a while to build a team
• Outsourced
  – Pros: leverage expertise; up and running quickly
  – Cons: potential vendor issues (vendor dependent); not having the expertise in house
Big Data Roles
• Big Data Developer
  – Job: build data infrastructure
  – Skills: languages (Java / Python / Scala); Linux; Big Data systems (Hadoop / Spark / NoSQL …)
  – Target pool: developers
• Dev Ops
  – Job: keep things running
  – Skills: admin skills (Linux / networking); scripting (Python / shell); deployments (Puppet / Chef / Docker ...)
  – Target pool: admins
• Data Scientist / Analyst
  – Job: make sense of data
  – Skills: languages (R, Python); good stats / math; Big Data tools (Hadoop / Spark); ** domain knowledge **
  – Target pool: data analysts
Hiring
Why Is It Hard To Hire Big Data Talent?
• The technology is fairly young (roughly 10 years old)
  – Smaller talent pool
  – (Hadoop turned 10 years old in 2016)
• The technology *is* complex
  – Lots of moving parts
• An effective Big Data person needs many other skills
  – Linux
  – Scripting
  – Admin / troubleshooting
  – Programming
• Experience is hard to get
• Expensive $$$$
Mistakes I’ve Seen
• Overly demanding job posts
  “Looking for 10 years of Hadoop & 5 years of HBase experience”
• Overly specific job requirements
  “Looking for an Impala expert to tweak our queries”
• Offering too little pay
• Bad interview process
Generic Tips
• Source them properly
• Hire a ‘strong generalist’ rather than a ‘specialist’
• ‘First hires’ are VERY VERY important
• Use ‘amplification’
• Interview thoroughly
  – Get outside help if needed
• Train internal teams vs. hiring from outside
Active Candidates vs. Passive Candidates
• Passive candidates
  – Not actively looking
  – ‘Happy / content’ with their current job
  – Usually great hires!
  – Long hiring process
  – And expensive $$$ :)
• Active candidates
  – Actively looking for jobs
  – Shorter hiring window
Generalist vs. Specialist
• Reasons for hiring a ‘strong generalist’
  – Technologies evolve very rapidly. You may be using a different technology in a few months.
  – Can be trained to adopt new technologies quickly
  – Larger talent pool
• Need ‘specialists’?
  “Looking for Cassandra experts to fine-tune our cluster”
  – Hire consultants
Sourcing Candidates
• Old resume
• New resume
Sourcing Candidates : Traditional
• !!! Referrals !!!
  – Ask employees to recommend potential hires
  – A way to reach ‘passive’ candidates
• Recruiters
  – Still work
  – Measure their ‘effectiveness’
  – Should do more than simple ‘keyword matching’ :)
• Job boards
  – Usual bunch (Monster / Indeed …)
  – Stack Overflow
Big Data Specific Sourcing
• Recruiters ++
  – Monitor mailing lists / GitHub / LinkedIn activity
• Conferences
  – Post on the ‘job board’
  – Speaking / booths
• Meetups / Hackathons
  – Host meetups
  – !! Long / organic process !!
Tips For Hiring Open Source Committers / Hiring Away From Popular Companies
• They are in ‘very high demand’ ... and they know it!
• They usually don’t apply for jobs. Approach them via a mutual contact.
• $$$ is important, but not the only thing
• Offer a promotion:
  – Software Eng @ X → Team Lead
  – Manager @ Y → Director
• Entice them with other benefits
  – Allow open source contribution time / time to write a book
  – Send them to conferences
  – Work-from-home
‘First Hire’
• This person is VERY VERY important if you are building a Big Data team
• Needs to be ‘hands-on’
  – Definitely not a ‘power-point-architect’ !!!!
  – This person has to design and probably build the first version of your Big Data platform
• Hire an experienced leader
  – He/she will need to build a team
  – Will be your ‘face’ in recruiting
  – Will lead the team
• Amplification
• Get outside help for interviewing if needed
Amplification
[Diagram: a Team Lead with a group of ‘Younglings’]
Amplification
• A couple of top developers as leads
  – Experienced
  – $$$
  – Architect / guidance
• Junior / mid-level talent
  – ‘Generalists’
  – Affordable $$$
  – Can learn quickly under the tutelage of senior developers
[Diagram: one Lead over three junior developers (J Dev) and one mid-level developer (M Dev)]
Interview Process
Interview Process
• Interviews are usually done ‘ad hoc’
  – Interviewers don't have time to prepare
  – Not enough coordination: the same questions get asked by multiple interviewers
• One process that worked for me:
  – Have a review meeting (10–15 mins / candidate). Review the resume as a team
  – Agree on overall questioning areas:
    Tim: “I will ask about Big Data design”
    Mark: “I will ask him about NoSQL”
    Anand: “I will quiz him on Java”
  – Don’t exchange notes until everyone has finished their interviews (notes might wrongly influence the other interviewers)
  – Hold a ‘debrief’ meeting with all interviewers right after the interviews
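The "agree on questioning areas" step can be checked mechanically before interview day. Below is a minimal sketch, assuming the plan is kept as a simple mapping from interviewer to topics. The function name `find_overlaps` and the data layout are hypothetical and not part of the talk; the names and topics reuse the example above.

```python
from collections import defaultdict


def find_overlaps(plan):
    """Return topics claimed by more than one interviewer.

    `plan` maps an interviewer's name to a list of topic strings
    (a hypothetical layout chosen for this sketch).
    """
    owners = defaultdict(list)
    for interviewer, topics in plan.items():
        for topic in topics:
            # Normalize so "Java" and "java " count as the same topic.
            owners[topic.strip().lower()].append(interviewer)
    return {t: sorted(names) for t, names in owners.items() if len(names) > 1}


# Plan taken from the example in the talk: one area per interviewer.
plan = {
    "Tim": ["Big Data design"],
    "Mark": ["NoSQL"],
    "Anand": ["Java"],
}

overlaps = find_overlaps(plan)
if overlaps:
    print("Duplicate question areas:", overlaps)
else:
    print("No overlapping question areas.")
```

An empty result means each area has exactly one owner, which is the coordination the review meeting is meant to produce.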
Actual Interview
• Stick to the topics you agreed on with the team
  – Don’t freelance (unless you have spare time)
• My format
  – “Tell me about a project you have done”
    • Have them whiteboard the solution
    • Ask questions, challenge the design a little bit :)
  – Ask concept questions (10 - 20)
  – “Are you working on any open source project?”
  – Present a problem and ask them to
    • Design on the whiteboard
    • Iterate over the design with them (‘throw curve balls’)
    • Write some code (weeds out power-point-architects)
My Ideal Candidate
• Experienced
  – Must demonstrate a ‘wide spectrum’ of knowledge
  – Familiar with new technologies (fine even if they haven’t used them personally)
  – Can evaluate technologies
  – Hands-on, writes code
  – Open source experience
• Junior
  – Solid computer science fundamentals (algorithms / data structures)
  – Knows the concepts
  – Working on their own (open source) projects
  – ‘Hungry’ to learn
Hiring Big Data Developers
• Must demonstrate a ‘wide spectrum’ of knowledge
  – Concepts
  – Familiar with new technologies (fine even if they haven’t used them personally)
• Must be HANDS-ON
  – Write code / write code / write code!
• Design a solution to a given problem
  – Sketch out the design on a whiteboard
  – Choose the technology stack
  – Write code for some portions
Hiring Dev-Ops
• Must be a scripting expert
  – Ask them to write scripts
  – Have them demo their projects / scripts
• Give them VMs in the cloud and ask them to perform a task
  – E.g. set up a 3-node Cassandra cluster
  – Maybe give them a day before the interview to do this
• Test their troubleshooting knowledge
  – “X happened to our Hadoop cluster… what do you think caused it? How can we prevent it?”
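As an illustration of "ask them to write scripts", here is the kind of small exercise a Dev-Ops candidate might be handed. It is a sketch only: the log format (`<timestamp> <node> <LEVEL> <message>`) and the function name `errors_per_node` are made up for this example and do not come from any real Hadoop or Cassandra log layout.

```python
from collections import Counter


def errors_per_node(lines):
    """Count ERROR entries per node.

    Assumes a hypothetical log format:
    '<timestamp> <node> <LEVEL> <message>'
    Lines that do not fit the format are skipped.
    """
    counts = Counter()
    for line in lines:
        parts = line.split(maxsplit=3)
        if len(parts) >= 3 and parts[2] == "ERROR":
            counts[parts[1]] += 1
    return counts


# Made-up sample data for the exercise.
sample = [
    "2017-03-01T10:00:00 node1 INFO heartbeat ok",
    "2017-03-01T10:00:05 node2 ERROR disk full",
    "2017-03-01T10:00:09 node2 ERROR block replication failed",
    "2017-03-01T10:00:12 node3 WARN slow response",
    "malformed line",
]

for node, count in errors_per_node(sample).most_common():
    print(f"{node}: {count} error(s)")
```

A good follow-up in the interview is to ask how the script should change for multi-gigabyte logs or for a different log layout, which leads naturally into the troubleshooting questions.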
‘Data Scientist’
Hiring Data Scientists / Analysts
• Do you want a ‘domain expert’ (e.g. Banking / Finance) or a ‘generalist’?
• Don’t confuse ‘data scientists’ with ‘data engineers’. Data scientists don’t need to know the infrastructure work; they are focused on analyzing the data.
• Must have solid stats / math fundamentals
• Must know an analytical language / framework: R / Spark
• Must have good graphics / presentation skills
• Beware of ‘fake data scientists’
  “One regression does not a data scientist make”
Interview WOW Moments
• “Show me something”
• Open source project demo by a college grad
• Implemented a working solution on his laptop during the interview!
  – And added a visualization component to boot
• Demonstrated a ‘spin up’ script for a Hadoop cluster in the cloud
  – Spun up a 20-node Hadoop cluster in minutes
Interview Mistakes
• Asking ‘trivia’ questions
  – “Write a linked list”
• Letting the candidate drone on about a project they have done
• Not asking the candidate to write code / whiteboard
• Asking the same questions over and over again
  – Lack of coordination
Getting Help For Interviewing
• If you don’t have any Big Data people to do the interviewing…
  – Get outside help
• Especially important for the ‘first hire’
• We have done this for our clients
  – Contact me offline :) sujee@elephantscale.com
Team Development / Training
Training Your Own Team in Big Data
• Leverage your existing team
  – You know them already
  – They know the company / work
  – Teach them new technology → win / win
• We can help! ElephantScale.com/training
  – Developer Track: Hadoop, Spark, NoSQL
  – Data Science Track: Spark, Python, R
  – Dev Ops Track: Hadoop, Spark, NoSQL, Scripting
Hiring Out of College!
• Great way to hire solid talent at affordable $$$
• Look for grads ‘hungry’ to learn
• Good mentoring is a must
• Pair with a senior developer / mentor
Our College Bootcamp
  – Modern software development: Java / Python / Scala
  – Development environments: Linux / Git
  – Big Data stack: Hadoop / Spark / NoSQL
  – Data analytics stack: Python / R
How To Keep The Talent
• Competition is fierce… there is always someone willing to pay higher $$$
• Experienced talent:
  – Offer them leadership opportunities (in charge of a component / project)
  – Offer mentoring opportunities
  – Let them learn new technologies / get training
  – Open source contribution time (20%, 50%, 100%)
  – Let them go to conferences and speak
  – Flexible work environment: reduced commute, work-from-home
• Young talent:
  – Great mentoring
  – Good learning environment (training, etc.)
  – Offer ‘grow into’ roles
Final Tips
Tips from Sunondo Ghosh
• Sunondo Ghosh
  – Senior Director of Engineering @ Ellie Mae
  – Former Director of Engineering @ Digital Insight
  – https://www.linkedin.com/in/sunondoghosh/
• Tips:
  – Look for a 'can do' attitude
  – Don't compromise on 'culture fit'
  – Diversity is important
  – Give them a 'continuous learning path'
  – Automate as much as possible
• Sunondo's presentation: http://bit.ly/2mEvOdv
Tips from Kuldip Pabla
• Kuldip Pabla
  – Co-Founder and CTO, Cooldimi
  – Former Senior Director @ Cloud Services Lab, Samsung
  – https://www.linkedin.com/in/cooldeep/
• Tips
  – Organize meetups
  – Hire experts and use them to train others, mainly strong Java engineers. This paid off: employees stayed because they were learning
  – Build incrementally to keep the momentum
Thanks!
Sujee Maniyam
sujee@elephantscale.com
ElephantScale.com
Expert training in Big Data
Resources
• https://datajobs.com/big-data-jobs-recruiting

More Related Content

Viewers also liked

Viewers also liked (7)

Entreprenariat feminin 2016 - parlement wallon
Entreprenariat feminin 2016 - parlement wallonEntreprenariat feminin 2016 - parlement wallon
Entreprenariat feminin 2016 - parlement wallon
 
This is what christmas looked like the year you were born
This is what christmas looked like the year you were bornThis is what christmas looked like the year you were born
This is what christmas looked like the year you were born
 
2016 Salesforce Release Highlights
2016 Salesforce Release Highlights2016 Salesforce Release Highlights
2016 Salesforce Release Highlights
 
BALANCE-FREE-MINIMAL-POWERPOINT-KEYNOTE-TEMPLATE
BALANCE-FREE-MINIMAL-POWERPOINT-KEYNOTE-TEMPLATEBALANCE-FREE-MINIMAL-POWERPOINT-KEYNOTE-TEMPLATE
BALANCE-FREE-MINIMAL-POWERPOINT-KEYNOTE-TEMPLATE
 
Design for Jihad
Design for JihadDesign for Jihad
Design for Jihad
 
Guia do Participante - TEDxUFSCar 2016
Guia do Participante - TEDxUFSCar 2016Guia do Participante - TEDxUFSCar 2016
Guia do Participante - TEDxUFSCar 2016
 
The Future of Open Educational Resources
The Future of Open Educational ResourcesThe Future of Open Educational Resources
The Future of Open Educational Resources
 

Similar to Building a Big Data Team

A Comprehensive Learning Path to Become a Data Science 2021.pptx
A Comprehensive Learning Path to Become a Data Science 2021.pptxA Comprehensive Learning Path to Become a Data Science 2021.pptx
A Comprehensive Learning Path to Become a Data Science 2021.pptx
RajSingh512965
 
Skribb.it Berkeley Final Presentation
Skribb.it Berkeley Final PresentationSkribb.it Berkeley Final Presentation
Skribb.it Berkeley Final Presentation
Stanford University
 

Similar to Building a Big Data Team (20)

A Comprehensive Learning Path to Become a Data Science 2021.pptx
A Comprehensive Learning Path to Become a Data Science 2021.pptxA Comprehensive Learning Path to Become a Data Science 2021.pptx
A Comprehensive Learning Path to Become a Data Science 2021.pptx
 
Real World Lessons Using Lean UX (Workshop)
Real World Lessons Using Lean UX (Workshop)Real World Lessons Using Lean UX (Workshop)
Real World Lessons Using Lean UX (Workshop)
 
The Rise of the DataOps - Dataiku - J On the Beach 2016
The Rise of the DataOps - Dataiku - J On the Beach 2016 The Rise of the DataOps - Dataiku - J On the Beach 2016
The Rise of the DataOps - Dataiku - J On the Beach 2016
 
Big Data Oslo v 4 Sci Code: "Current Industry Projects within AI and the Best...
Big Data Oslo v 4 Sci Code: "Current Industry Projects within AI and the Best...Big Data Oslo v 4 Sci Code: "Current Industry Projects within AI and the Best...
Big Data Oslo v 4 Sci Code: "Current Industry Projects within AI and the Best...
 
Ibm welcome and cognitive 20171012 v11
Ibm welcome and cognitive 20171012 v11Ibm welcome and cognitive 20171012 v11
Ibm welcome and cognitive 20171012 v11
 
Building successful data science teams
Building successful data science teamsBuilding successful data science teams
Building successful data science teams
 
Scaling from new start to enterprise platform
Scaling from new start to enterprise platformScaling from new start to enterprise platform
Scaling from new start to enterprise platform
 
From Science to Data: Following a principled path to Data Science
From Science to Data: Following a principled path to Data ScienceFrom Science to Data: Following a principled path to Data Science
From Science to Data: Following a principled path to Data Science
 
Info Session : University Institute of engineering and technology , Kurukshet...
Info Session : University Institute of engineering and technology , Kurukshet...Info Session : University Institute of engineering and technology , Kurukshet...
Info Session : University Institute of engineering and technology , Kurukshet...
 
Power BI storytelling 101
Power BI storytelling 101Power BI storytelling 101
Power BI storytelling 101
 
Dr atif shahzad_engg_ management_cost management
Dr atif shahzad_engg_ management_cost managementDr atif shahzad_engg_ management_cost management
Dr atif shahzad_engg_ management_cost management
 
Touchpoint 2012 Symposium on Interaction Design: Notes
Touchpoint 2012 Symposium on Interaction Design: NotesTouchpoint 2012 Symposium on Interaction Design: Notes
Touchpoint 2012 Symposium on Interaction Design: Notes
 
Folien zur Vorlesung Wirtschaftsinformatik
Folien zur Vorlesung WirtschaftsinformatikFolien zur Vorlesung Wirtschaftsinformatik
Folien zur Vorlesung Wirtschaftsinformatik
 
Better Together - Design Thinking, Agile e Lean Startup
Better Together - Design Thinking, Agile e Lean StartupBetter Together - Design Thinking, Agile e Lean Startup
Better Together - Design Thinking, Agile e Lean Startup
 
DevOps for Dinosaurs
DevOps for DinosaursDevOps for Dinosaurs
DevOps for Dinosaurs
 
Skribb.it Berkeley Final Presentation
Skribb.it Berkeley Final PresentationSkribb.it Berkeley Final Presentation
Skribb.it Berkeley Final Presentation
 
From Lab to Factory: Or how to turn data into value
From Lab to Factory: Or how to turn data into valueFrom Lab to Factory: Or how to turn data into value
From Lab to Factory: Or how to turn data into value
 
Tips and Tricks for a Great Dev Platform
Tips and Tricks for a Great Dev PlatformTips and Tricks for a Great Dev Platform
Tips and Tricks for a Great Dev Platform
 
National STEM League - Student Goals and Academic Glue
National STEM League - Student Goals and Academic GlueNational STEM League - Student Goals and Academic Glue
National STEM League - Student Goals and Academic Glue
 
Introduction to legal design: Product & project management
Introduction to legal design: Product & project managementIntroduction to legal design: Product & project management
Introduction to legal design: Product & project management
 

More from elephantscale

More from elephantscale (8)

AI for Kids
AI for KidsAI for Kids
AI for Kids
 
How to obtain the Cloudera Data Engineer Certification
How to obtain the Cloudera Data Engineer CertificationHow to obtain the Cloudera Data Engineer Certification
How to obtain the Cloudera Data Engineer Certification
 
Petrophysics and Big Data by Elephant Scale training and consultin
Petrophysics and Big Data by Elephant Scale training and consultinPetrophysics and Big Data by Elephant Scale training and consultin
Petrophysics and Big Data by Elephant Scale training and consultin
 
Changing the game with cloud dw
Changing the game with cloud dwChanging the game with cloud dw
Changing the game with cloud dw
 
Oil & Gas Big Data use cases
Oil & Gas Big Data use casesOil & Gas Big Data use cases
Oil & Gas Big Data use cases
 
Machine Learning with Spark
Machine Learning with SparkMachine Learning with Spark
Machine Learning with Spark
 
Reference architecture for Internet Of Things
Reference architecture for Internet Of ThingsReference architecture for Internet Of Things
Reference architecture for Internet Of Things
 
Hadoop to spark_v2
Hadoop to spark_v2Hadoop to spark_v2
Hadoop to spark_v2
 

Recently uploaded

CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
Earley Information Science
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 

Recently uploaded (20)

Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 

Building a Big Data Team

  • 1. How To Build A Successful Big Data Team Sujee Maniyam Founder, CEO @ Elephant Scale A Big Data Training Company www.ElephantScale.com sujee@elephantscale.com
  • 2. Hi, I am Sujee Maniyam J u Big Data practitioner / educator u 15 years+ software development experience u Founder Elephant Scale www.ElephantScale.com – Training in Big Data u Author – “Hadoop illuminated” open source book – “HBase Design Patterns” – "Data analytics with Hadoop and Spark" - O'Reilly Media u Open Source contributor http://github.com/sujee u sujee@elephantscale.com www.sujee.net Big Data Training (c) Elephant Scale 2017. All rights rserved. 2
  • 3. Audience For This Talk u Managers u Directors u HR (c) Elephant Scale 2017. All rights rserved. 3
  • 4. What Have I Seen ? u I have been a Big Data consultant u I have hired Big Data talent u I have helped clients interview Big Data people u We have trained 1000+ students in many countries… u I have seen – how companies are adopting Big Data – And how they are going about building their big data teams (c) Elephant Scale 2017. All rights rserved. 4
  • 6. Is Big Data Shortage Real? u Plenty of Gardner / McKinsey studies u Data is only going to get bigger – Connected devices – Internet of Things (IoT) u Big Data is no longer the luxury of social / web companies – It is every where (retail / insurance / finance …etc) (c) Elephant Scale 2017. All rights rserved. 6
  • 7. Some ‘research’ (Source : Indeed.com) (c) Elephant Scale 2017. All rights rserved. 7
  • 8. Do you really need a big data team in house? Own Team Outsourced - Data is 'core part of business' - Hire and nurture the right talent - Once team is in play, move quickly - Leverage expertise - Up and running quickly - Can take a while to build a team - Potential vendor issues (vendor dependent) - Not having the expertise in house (c) Elephant Scale 2017. All rights rserved. 8
  • 9. Big Data Roles Role Big Data Developer Dev Ops Data Scientist / Analyst Job Build data infrastructure Keep things running Make sense of data Skills Languages : Java / Python / Scala Linux Big Data systems : Hadoop / Spark / NoSQL … Admin skills (Linux / Networking) Scripting (python / shell) Deployments (puppet / chef / docker ...) Languages : R , Python, Good stats ‘/ math Big Data tools (Hadoop / Spark) ** Domain knowledge ** Target pool Developers Admins Data analysts (c) Elephant Scale 2017. All rights rserved. 9
  • 11. Why Is It Hard To Hire Big Data Talent u The technology is fairly young (< 10 years old) – Smaller talent pool – (Hadoop turned 10 yrs old in 2016) u The technology *is* complex – Lots of moving parts u An effective Big Data person has to know a lot of other skills – Linux – Scripting – Admin / trouble shooting – Programming u Experience is hard to get u Expensive $$$$ (c) Elephant Scale 2017. All rights rserved. 11
  • 12. Mistakes I’ve Seen u Overly demanding job posts “looking for 10 years Hadoop & 5 years HBase experience” u Very specific Job requirements “Looking for Impala expert to tweak our queries” u Offering too little pay u Bad interview process (c) Elephant Scale 2017. All rights rserved. 12
  • 13. Generic Tips u Source them properly u Hire a ’strong generalist’ rather than a ‘specialist’ u ‘First hires’ are VERY VERY important. u Use ‘amplification’ u Interview thoroughly – Get outside help if needed u Train internal teams vs. hiring from outside (c) Elephant Scale 2017. All rights rserved. 13
  • 14. Active Candidates vs. Passive Candidates u Passive candidates – Not actively looking – ‘happy / content’ with their current job – Usually great hires! – Long hiring process – And expensive $$$ J u Active Candidates – Actively looking for jobs – Shorter hiring window (c) Elephant Scale 2017. All rights rserved. 14
  • 15. Generalist vs. Specialist u Reasons for hiring a ‘strong generalist’ – Technologies evolve very rapidly. May be using a different technology in few months. – Can be trained to adopt new technologies quickly – Larger talent pool u Need ’specialists’ ? “Looking for Cassandra experts to fine tune our cluster” – Hire consultants. (c) Elephant Scale 2017. All rights rserved. 15
  • 16. Sourcing Candidates u Old resume u New Resume (c) Elephant Scale 2017. All rights rserved. 16
  • 17. Sourcing Candidates : Traditional u !!! Referrals !!!! – Ask employees to recommend potential hires – Way to get ‘passive’ candidates u Recruiters – Still works – Measure the ‘effectiveness’ – Not simple ‘key word matching’ J u Job Boards – Usual bunch (Monster / Indeed …) – Stack overflow (c) Elephant Scale 2017. All rights rserved. 17
  • 18. Big Data Specific Sourcing u Recruiters ++ – Monitor mailing lists / github / LinkedIn activities u Conferences – Post on ’job board’ – Speaking / booths u Meetups / Hackathons – Host meetups – !! Long / organic process !! (c) Elephant Scale 2017. All rights rserved. 18
  • 19. Tips For Hiring Open Source Committers / And Away From Popular Companies u They are in ‘very high demand’ ... And they know it ! u They usually don’t apply for jobs. Approach them via a mutual contact. u $$$ is important, but not the only thing.. u Offer a promotion: – Software Eng @ X à Team Lead – Manager @ Y à Director u Entice them with other benefits – Allow open source contribution time / time to write book – Send them to conferences – Work-from-home (c) Elephant Scale 2017. All rights rserved. 19
  • 20. ’First Hire’ u This person is VERY VERY important , if you are building Big Data team. u Need to be ‘hands-on’ – Definitely not ‘power-point-architects’ !!!! – This person has to design and probably build first version of your Big Data platform. u Hire an experienced leader. – He/She will need to build a team – Will be your ‘face’ in recruiting – Lead the team u Amplification u Get outside help for interviewing if needed (c) Elephant Scale 2017. All rights rserved. 20
  • 22. Amplification u Couple of top developers as leads – Experienced – $$$ – Architect / guidance u Junior / mid-level talent – ’generalists’ – Affordable $$$ – Can learn quickly under the tutelage of senior developers Lead J Dev J Dev M Dev J Dev (c) Elephant Scale 2017. All rights rserved. 22
  • 24. Interview Process u Usually interviews are done ‘ad-hoc’ – Interviewers don't have time to prepare – Not enough co-ordination… same questions asked by multiple interviewers u One that worked for me: – Have a review meeting (10 – 15 mins / candidate). Review resume as a team – Agree on overall questioning areas : Tim : “I will ask about Big Data design” Mark : “I will ask him about NoSQL” Anand : ”I will quizz him on Java” – Don’t exchange notes, untill every one had done interviews. (Might wrongly influence the other person) – ‘debrief’ meeting right after the interview with all interviewers (c) Elephant Scale 2017. All rights rserved. 24
  • 25. Actual Interview u ‘Stick to the topics you agreed to with the team’ – Don’t freelance (unless you have spare time) u My format – “tell me about a project you had done”. • Have them white board the solution • Ask questions, challenge design a little bit J – Ask concept questions (10 - 20) – Are you working on any open source project? – Present a problem and ask them to • Design on white board • Iterate over the design with them (‘throw curve balls') • Write some code (weed out power-point-architects) (c) Elephant Scale 2017. All rights rserved. 25
  • 26. My Ideal Candidate
      ◆ Experienced
          – Must demonstrate a ‘wide spectrum’ of knowledge
          – Familiar with new technologies (fine even if they haven’t used them personally)
          – Can evaluate technologies
          – Hands-on, writes code
          – Open source experience
      ◆ Junior
          – Solid computer science fundamentals (algorithms / data structures)
          – Knows the concepts
          – Works on their own (open source) projects
          – ‘Hungry’ to learn
  • 27. Hiring Big Data Developers
      ◆ Must demonstrate a ‘wide spectrum’ of knowledge
          – Concepts
          – Familiar with new technologies (fine even if they haven’t used them personally)
      ◆ Must be HANDS-ON
          – Write code / write code / write code!
      ◆ Ask them to design a solution to a given problem
          – Sketch out the design on the white board
          – Choose the technology stack
          – Write code for some portions
  • 28. Hiring Dev-Ops
      ◆ Must be a scripting expert
          – Ask them to write scripts
          – Have them demo their projects / scripts
      ◆ Give them VMs in the cloud and ask them to perform a task
          – E.g. set up a 3-node Cassandra cluster
          – Maybe give them a day before the interview to do this.
      ◆ Test their troubleshooting knowledge
          – “X happened to our Hadoop cluster… what do you think caused it? How can we prevent it?”
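A minimal sketch of the kind of small script a Dev-Ops candidate might be asked to write for the cluster task above, assuming Python. The function names are illustrative, not from the talk. It only checks that each node accepts TCP connections on port 9042 (Cassandra's default CQL native-transport port); it is a reachability probe, not a full health check.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_cluster(hosts, port=9042, timeout=2.0):
    """Map each host to True/False depending on whether its port is reachable."""
    return {host: port_open(host, port, timeout) for host in hosts}
```

Calling `check_cluster` with the three node addresses of the test cluster (whatever they happen to be) returns a dictionary such as `{host: True, ...}`, which makes a convenient starting point for follow-up troubleshooting questions.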
  • 29. ‘Data Scientist’
  • 30. Hiring Data Scientists / Analysts
      ◆ Do you want a ‘domain expert’ (e.g. Banking / Finance) or a ‘generalist’?
      ◆ Don’t confuse ‘data scientists’ with ‘data engineers’. Data scientists don’t need to know the infrastructure work. They are focused on analyzing the data.
      ◆ Must have solid stats / math fundamentals.
      ◆ Must know an analytical language / framework: R / Spark
      ◆ Must have good graphics / presentation skills
      ◆ Beware of ‘fake data scientists’. “One regression does not a data scientist make”
  • 31. Interview WOW Moments
      ◆ “Show me something”
      ◆ Open source project demo by a college grad
      ◆ Implemented a working solution on his laptop during the interview!
          – And added a visualization component to boot
      ◆ Demonstrated a ‘spin up’ script for a Hadoop cluster in the cloud
          – Spun up a 20-node Hadoop cluster in minutes.
  • 32. Interview Mistakes
      ◆ Asking ‘trivia’ questions
          – “write a linked-list”
      ◆ Letting the candidate drone on about a project they have done
      ◆ Not asking them to write code / white-board
      ◆ Asking the same questions over and over again
          – Lack of coordination
  • 33. Getting Help For Interviewing
      ◆ If you don’t have any Big Data people to do the interviewing…
          – Get outside help.
      ◆ Especially important when making the ‘first hire’
      ◆ We have done this for our clients
          – Contact me offline :) sujee@elephantscale.com
  • 34. Team Development / Training
  • 35. Training Your Own Team in Big Data
      ◆ Leverage existing team…
          – You know them already...
          – They know the company / work
          – Teach them new technology → win / win
      ◆ We can help! ElephantScale.com/training
          – Developer Track: Hadoop, Spark, NoSQL
          – Data Science Track: Spark, Python, R
          – Dev Ops Track: Hadoop, Spark, NoSQL, Scripting
  • 36. Hiring Out of College!
      ◆ Great way to hire solid talent at affordable $$$
      ◆ Look for grads ‘hungry’ to learn
      ◆ Good mentoring is a must
      ◆ Pair with a senior developer / mentor
      ◆ Our College Bootcamp
          – Modern software development: Java / Python / Scala
          – Development environments: Linux / Git
          – Big Data stack: Hadoop / Spark / NoSQL
          – Data Analytics stack: Python / R
  • 37. How To Keep The Talent
      ◆ Competition is fierce… there is always someone willing to pay higher $$$
      ◆ Experienced talent:
          – Offer them leadership opportunities (in charge of a component / project)
          – Offer mentoring opportunities
          – Let them learn new technologies / get training
          – Open source contribution time (20%, 50%, 100%)
          – Let them go to conferences and speak
          – Flexible work environment: reduce commute, work-from-home
      ◆ Young talent:
          – Great mentoring
          – Good learning environment (training, etc.)
          – Offer ‘grow into’ roles
  • 39. Tips from Sunondo Ghosh
      ◆ Sunondo Ghosh
          – Senior Director of Engineering @ Ellie Mae
          – Former Director of Engineering @ Digital Insight
          – https://www.linkedin.com/in/sunondoghosh/
      ◆ Tips:
          – Look for a 'can do' attitude
          – Don't compromise on 'culture fit'
          – Diversity is important
          – Give them a 'continuous learning path'
          – Automate as much as possible
      ◆ Sunondo's presentation: http://bit.ly/2mEvOdv
  • 40. Tips from Kuldip Pabla
      ◆ Kuldip Pabla
          – Co-Founder and CTO, Cooldimi
          – Former Senior Director @ Cloud Services Lab, Samsung
          – https://www.linkedin.com/in/cooldeep/
      ◆ Tips
          – Organize meetups
          – Hire experts, and use the experts to train others (mainly strong Java engineers); this paid off. Employees stayed because they were learning
          – Build incrementally to keep the momentum
  • 41. Thanks!
      Sujee Maniyam
      sujee@elephantscale.com
      ElephantScale.com
      Expert training in Big Data