Setting up a Big Data Team: Best Practices
(*) A perspective from Mohammed AL Madhani.
Sources of this presentation
Sources used in this presentation
Building Data Science Teams
By Paco Nathan
Publisher: O'Reilly Media
Big Data trend in Google Trends
Big Data vs. IoT
36 different types of sensors = $30 on Amazon.com!
Big Data vs. IoT vs. data science trends
The Business Impacts of Data Science
What is a Data Scientist?
Business Intelligence vs Data Science
Some commonly used methods/models
Discrete event simulation
Queuing model
Monte Carlo simulation
Agent-based modeling
System dynamics
Game theory
Probabilities
Economic analysis: IRR, NPV, FV
Linear regression
Stepwise regression method
Logistic regression
Confidence intervals
Hypothesis testing
Statistical inferences
Design of experiments
Analysis of variance
Principal component analysis (PCA)
Data mining
Forecasting
Artificial neural networks
Fuzzy logic
Expert systems
Decision trees
Markov chain
Revenue management (yield management)
Optimization
Linear programming
Integer programming
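
As an illustration of the "Economic analysis: IRR, NPV, FV" item in the list above, here is a minimal Python sketch of future value and net present value. The function names, the discount rates, and the sample cash flows are assumptions made for this example and do not come from the presentation.

```python
def future_value(present_value, rate, periods):
    """Value of `present_value` after compounding at `rate` for `periods` periods."""
    return present_value * (1 + rate) ** periods


def net_present_value(rate, cash_flows):
    """Sum of cash flows discounted to today; cash_flows[0] occurs at t = 0."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))


# Hypothetical project: pay 1,000 now, then receive 400 per year for three years.
project = [-1000, 400, 400, 400]

if __name__ == "__main__":
    print("FV of 1,000 at 5% for 3 years:", round(future_value(1000, 0.05, 3), 2))
    print("NPV of the project at 8%:", round(net_present_value(0.08, project), 2))
```

A positive NPV at the chosen discount rate suggests the project returns more than that rate; the IRR is the rate at which the NPV crosses zero.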
Data Scientist
Data artist: turning data into action
Skills (chart labels): Math, Computer science, Statistics, Domain expertise, Machine Learning
Chart values as extracted: 65%, 30%, 45%, 75%, 30%
Experience:
Doing data representations and using algorithms for optimization and validation; communicating with the team to ensure data availability, readiness, and completeness; working with the data researcher to decompose the problem.
The Data Science Venn Diagram
How can we reduce traffic jams?
How can we reduce waiting time for patients?
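
One way to start on the patient waiting-time question is the queuing model named in the methods list. The sketch below assumes the simplest textbook case, an M/M/1 queue (one server, random arrivals, exponentially distributed service times); the function name and the rates are hypothetical, and a real clinic would need its own data and probably a richer model.

```python
def mm1_wait_in_queue(arrival_rate, service_rate):
    """Mean time a patient waits before service in an M/M/1 queue.

    Both rates use the same time unit (for example, patients per hour).
    Formula: Wq = lambda / (mu * (mu - lambda)), valid only when lambda < mu.
    """
    if arrival_rate >= service_rate:
        raise ValueError("queue is unstable: arrivals must be slower than service")
    return arrival_rate / (service_rate * (service_rate - arrival_rate))


if __name__ == "__main__":
    # Hypothetical clinic: 4 patients arrive per hour on average.
    for service_rate in (5, 6):
        hours = mm1_wait_in_queue(4, service_rate)
        print(f"service rate {service_rate}/h -> mean wait {hours * 60:.0f} minutes")
```

Under these assumptions, a small increase in service capacity cuts the mean wait sharply, which is the kind of pattern a data scientist would then check against real arrival and service data.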
UNLOCKING DATA
The data scientist mission
Inputs: Question, Data
Outputs: Answer, Solution, ROI
Generating the momentum
Description: Proofs of concept can generate the critical momentum needed to jump-start any data science capability.
Problem solving
Critical thinking
Buy-in
Necessary data
Clear ROI
Dedication & focus
Fail often and learn quickly
Limited complexity and duration
Dealing with the problem (INFORMS CAP approach)
Decomposition & Datafication
DESIGN THINKING METHODOLOGY (Alternative Approach)
BOOZ ALLEN'S DESIGN THINKING TOOL BOX FOR ANALYTICS
Big Data Researcher
Domain expert with data science knowledge
Skills (chart labels): Math, Computer science, Domain expertise, Communication, Statistics
Chart values as extracted: 65%, 10%, 70%, 30%, 30%
Mission:
Generates low-fidelity prototypes to demonstrate applicability and test ideas quickly and cheaply before making significant investments.
The Four Key Activities of a Data Science
Respondents who said there weren't enough data scientists to go around
Do Data Scientists Have What They Need?
Data preparation
"If you have perfect information or zero information, then your task is easy; it is in between those two extremes that the trouble begins."
Maslow's hierarchy of needs could be applied to data
Optimized
Measured
Defined
Managed
Performed
Enhance the data management maturity (Data Preparation)
Data Management Team (data preparation part)
Ready to Go!
Big Data Engineer: Develops and executes all data flow jobs, business rules, matching, scraping, cleaning, munging, joining, and wrangling.
Big Data Architect: Responsible for data flow, data solutions, data models, and architectures.
Big Data Steward: Responsible for managing the data elements and metadata; runs the maturity model components.
Some Certificates
Aiming to certify your team will increase its maturity
1. Certified Analytics Professional (CAP): created in 2013 by the Institute for Operations Research and the Management Sciences (INFORMS) and targeted towards data scientists.
2. EMC Data Science Associate (EMCDSA): tests the ability to apply common techniques and tools required for big data analytics.
3. SAS Certified Predictive Modeler: designed for SAS Enterprise Miner users who perform predictive analytics.
Tools to support the big data team
• Spreadsheet systems
• Statistical systems
• Optimization systems
• Simulation systems
• Business intelligence systems
• Data management systems
  ▪ Structured data
  ▪ Unstructured data
• Data integration systems
• Big data platforms such as Hadoop
BOOZ ALLEN TALENT MANAGEMENT MODEL
CRISP-DM (cross-industry standard process for data mining)
Six Sigma’s DMAIC
News: Metis Bootcamp Tuition Increase
• Effective June 20, 2016, the tuition for the Metis Data Science Bootcamp in New York and San Francisco will increase to $15,500. Accepted students who have signed and returned their enrollment agreements on or before June 20, 2016 will receive the current tuition of $14,000.
• This is the first tuition increase for Metis, and is the result of continued investments to ensure that our students are best prepared for careers in data science.

Mohammed AL Madhani

Editor's Notes

  • #2 Alright, thank you very much for having me this afternoon. It's a great pleasure and an honor to be here. My name is Mohammed AL Madhani, Information Management Director at the Federal Demographic Council.
  • #3 I would like to start by telling you about the great sources that helped me prepare this presentation. The first is a great guide from Booz Allen, which is available free on the internet. The second is on building data science capabilities. The third is a report from CrowdFlower.
  • #4 I also used the study guide for CAP, an INFORMS certification for data scientists. The fifth source is a video by Paco Nathan about building data science teams.
  • #5 If you do a quick search in Google Trends, you can see what people are interested in. I started by searching for Big Data, and we can see that big data is a big trend.
  • #6 Besides big data, I searched for IoT. It shows that IoT is becoming an even bigger trend, which really scares me, because that means another huge wave of data is coming.
  • #8 Organizations want to get value from all their data. They have a lot of data, but they're not doing as much as they could with it. If you want to do big data, you have to have a data scientist. The second reason they're asking this question is that they want to know what the skill set of a data scientist is, because they're finding it hard to hire them. If you do another search for data science, it shows that it is also trending, but not as much as IoT. So do you need to hire a PhD in mathematics or statistics? Or perhaps you can grow a data scientist within your existing organization?
  • #9 Studies show that there is a big improvement in performance when big data and data science are adopted.
  • #11 A data scientist is someone who makes new discoveries. That's what a scientist does: they make a hypothesis and try to investigate it. In the case of a data scientist, they do it with data. They look for meaningful knowledge in the data, and they do that in a couple of different ways.
  • #12 One is that they visualize the data: they look at the data, create reports, and look for patterns. That's very similar to what you might think of as a traditional business intelligence analyst or data analyst. So that's one of the tools that data scientists use.
  • #13 The two capabilities are additive and complementary, each offering a necessary view of business operations and the operating environment.
  • #14 But what really distinguishes a data scientist is that they use algorithms, advanced algorithms, that actually run through the data looking for meaning. You may have heard of machine learning algorithms, or of algorithms such as neural networks, regression, or K-means. There are dozens of these algorithms out there, and essentially they run through the data looking for meaning. That is one of the fundamental tools of a data scientist.
  • #15 So to use those algorithms, the data scientist has to have strong foundational knowledge in mathematics and statistics, and in some cases computer science and domain knowledge.
  • #17 That might be a very good question for a data scientist to answer, and the data scientist would go about it by gathering all the data and running algorithms until they find some reliable pattern that can answer the question.
  • #18 You give the data scientist data and a question, and they start the journey to give you an answer, along with a technology that will keep supporting that answer.
  • #19 In order to generate the momentum to start seriously working on the problem,
  • #30 We actually need data scientists to spend more of their time on representation, optimization, and evaluation.
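
Note #14 mentions K-means as one of the algorithms that "run through the data looking for meaning." The following is a minimal sketch of that idea on one-dimensional data, written in plain Python. The function name, the sample values, and the starting centers are assumptions for illustration; in practice a team would more likely rely on an established library.

```python
def kmeans_1d(values, centers, iterations=10):
    """Basic K-means (Lloyd's algorithm) on 1-D data.

    `centers` are caller-supplied starting guesses; each iteration assigns
    every value to its nearest center, then moves each center to the mean
    of the values assigned to it.
    """
    centers = list(centers)
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # An empty cluster keeps its previous center.
        centers = [
            sum(cluster) / len(cluster) if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
    return centers


if __name__ == "__main__":
    # Hypothetical measurements that visibly fall into two groups.
    readings = [1, 2, 3, 10, 11, 12]
    print(kmeans_1d(readings, centers=[0, 5]))
```

The algorithm is given no labels; it only discovers that the readings cluster around two values, which is the sense in which it finds a pattern in the data.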