“DATA IN THE WILD” –
BEGINNER STEPS INTO DATA
MARTA FAJLHAUER, GSTATS, BSC
DATA ANALYST AT BRIGHTBLUE CONSULTING,
PROFESSIONAL FELLOW OF THE ROYAL STATISTICAL SOCIETY
POSTGRADUATE STUDENT AT QUEEN MARY UNIVERSITY OF LONDON
What I learned from analysing 250 profiles of my LinkedIn
connections working in Data Science
What I learned during my work in Data Engineering
What I learn when I work in Data Analytics
Bayesian reasoning for social media
• Curiosity, understanding, asking questions, looking for
answers to business and personal questions.
I want to work in Data
Science (£75,000 - £100,000)
Procurement / IT Service Desk /
Threat Intel Librarian / Audit / PMO /
Corporate / Business System /
Business / Technical / Analyst
Data / Analytics Consultant
Analytics and Business Intelligence
Analytical storyteller
AI and Advanced Analytics
Econometrician
Statistician
Mathematician
Software / Cloud / Mathematical / Data /
Linux Operation / System / Service /
Marketing / Backend / Blockchain / Splunk
/ Oracle / Machine Learning / AI Engineer
Data and Software / System / Enterprise /
Data Solution / Cloud Architect
Lead Software crafter
Software / Full Stack / Software developer
Cloud / AI / Computer Vision / Machine
Learning Consultant
Applied Machine Learning Scientists
Deep learning specialist
Enterprise data strategy
Machine Learning / AI / Robotics /
Researcher
Big Data Developer
Oracle DBA
DevOps
-> Machine Learning
-> R
-> Python
-> Deep Learning
-> NLP
-> AI
-> Advanced Statistics
241 profiles
86 Data Scientists (27 PhD and 13 BSc)
64 Data Analysts (1 PhD and 35 BSc)
64 Engineers
• Computer Science or Mathematics background.
• Others in every single category.
• Mathematics for Data Analytics and Computer Science for Data Engineering.
Data
Scientists
less than 20% computer science
60% degree in computer science
But…
Lead Software Crafter: BSc Health Science
DevOps: BSc Applied Linguistics
Marketing Engineer: English Literature
Senior Analytics Consultant: BSc Music
Software Engineer: Public Relations
Data Engineer: Anthropology
Data Manager: BSc Arts
Cloud Consultant: Advanced Aeronautical Engineering
Data Engineer: Public Health
You need to choose what you want to specialise in:
They are all called doctors, but does that mean one can do the work of another?
Does it mean that one is more important than another? No. It means that each
decided to concentrate on a specific thing after an exploration stage.
EBOV virus for charity helping people in Africa. Crime Data mining using USA census
data
DATA ENGINEERING – FIRST JOB:
“DATA SOMETHING”
IT Ops and Security
Machine data
Real time visibility
Forwarding data in real time.
Collect and visualise
Forward data in real time to indexes
Scales from single server to distributed deployment
Accepts any text data as input, parses the data into
events, stores events in indexes, searches and reports
• Writing configuration files <TCP / UDP, SSL, HEC>
• Setting up receiving ports on indexers, adding inputs to forwarders
• Compressing the feed to save money on pre-processing data from Hadoop clusters
• Lesson 0: Where is the coffee machine?
• Lesson 1: Not many girls in Data Engineering work: the only girl, the only
non-technical person.
• Lesson 2: Stack Overflow and Google are my best friends.
• Lesson 3: How to set up a Splunk image in a Docker container.
• Lesson 4: When setting up a distributed, global deployment, it is very important to set the
correct time and time zone so that events can be correlated across multiple sources, and to set up alerts in
case of anomalies.
• Lesson 5: Data encryption and different levels of access are very important in
finance – regex, Bash, Linux.
• Dashboards and automatic pivots using Splunk's Search Processing Language (SPL).
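Lesson 4 can be illustrated with a minimal sketch that is not taken from the talk. Two hypothetical sources log events in their own local time; converting both to UTC first makes them comparable. The source names, timestamps and fixed UTC offsets below are assumptions made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical events: (source, local timestamp, assumed UTC offset in hours).
events = [
    ("london-server", "2019-07-01 09:00:05", 1),    # assumed UTC+1 (summer time)
    ("newyork-server", "2019-07-01 04:00:03", -4),  # assumed UTC-4 (summer time)
]

def to_utc(local_text, offset_hours):
    """Attach the source's offset to a naive local timestamp and convert to UTC."""
    local = datetime.strptime(local_text, "%Y-%m-%d %H:%M:%S")
    aware = local.replace(tzinfo=timezone(timedelta(hours=offset_hours)))
    return aware.astimezone(timezone.utc)

# Sort events by their UTC time so they line up across sources.
normalised = sorted((to_utc(ts, off), source) for source, ts, off in events)
gap_seconds = (normalised[1][0] - normalised[0][0]).total_seconds()

for when, source in normalised:
    print(when.isoformat(), source)
print("gap between events in seconds:", gap_seconds)
```

Read as local times the two events look five hours apart; in UTC they are two seconds apart, which is what matters when correlating them.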
There is no time to carefully check all the details: the analytics of this kind
of data is completely different from that of static data.
In a static .csv file you can check whether you have missing data or not, you can
visualise all the details and understand the data, but with real-time rolling data
it's completely different. You have already set up dashboards to concentrate on
the most important bits. In Splunk, you can set up an alert.
When you deal with this kind of data you don't concentrate on the statistics behind it;
you only choose an algorithm from a selection that you think will best meet the
conditions. With static data you think about R^2, coefficients and so much more.
Read code written by someone else
Modify the elements for your own purpose
Write your own code
• In languages like R, it is often much more efficient to use a
package that is already available than to write your own code.
• When you set up a loop over millions of records, first check that the loop gives the
expected output and runs smoothly on a smaller dataset. Once you have checked that,
remember to add a loop counter so you can track progress, and set up automatic
saving of the output.
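The loop advice above can be sketched roughly as follows. This is an illustration under assumptions, not code from the talk: `process` is a hypothetical placeholder for the real per-record work, and the file name and the reporting and saving intervals are arbitrary.

```python
import os
import tempfile

def process(record):
    """Hypothetical placeholder for the real per-record work."""
    return record * 2

def run(records, out_path, report_every, save_every):
    """Process records with a progress counter and periodic saving to disk."""
    total = 0
    buffer = []
    with open(out_path, "w") as fh:
        for count, record in enumerate(records, start=1):
            buffer.append(process(record))
            if count % report_every == 0:
                print(f"processed {count} records")  # loop counter
            if count % save_every == 0:
                fh.writelines(f"{value}\n" for value in buffer)
                fh.flush()  # results so far are on disk if the job dies
                buffer.clear()
            total = count
        fh.writelines(f"{value}\n" for value in buffer)  # save the remainder
    return total

data = range(20_000)  # stands in for a much larger dataset
out_path = os.path.join(tempfile.gettempdir(), "loop_output.txt")

# Step 1: dry run on a small slice and inspect the output.
run(data[:5], out_path, report_every=1, save_every=2)
with open(out_path) as fh:
    sample_output = [int(line) for line in fh]
print("sample output:", sample_output)

# Step 2: only then run the full job.
total = run(data, out_path, report_every=5_000, save_every=1_000)
```

Writing results out in chunks keeps memory use flat and means a crash late in the run loses at most one chunk.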
DATA ANALYTICS
Algorithms, R&D, statistical thinking
• Lesson 1: Do not rely completely on statistical knowledge without asking whether
correlation really implies causation (and not only in regression).
• Plot whatever you can to visualise the data.
• R, Python, Excel, SAS: whatever works for the given purpose – you choose.
• Different models for different kinds of data.
• With smaller, static datasets you may have much more fun from an
analytics point of view than with real-time rolling data coming from
different sources.
Bayesian Reasoning for Social Data
Sherlock Holmes and Watson
• It’s July, and mostly sunny <- prior. Predict: mostly sunny
• Someone carries an umbrella <- likelihood. Predict: rainy
• What if this is a country where people carry umbrellas on hot days? What if people
carry umbrellas only when it’s raining?
• Update belief <- posterior
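The umbrella reasoning above can be written as a small calculation. All numbers here are assumptions chosen for illustration (the slides give none): a July prior of 80% sunny, and two alternative likelihoods for seeing an umbrella, one for a country where umbrellas are used only in rain and one where they also serve as sunshades.

```python
def update(prior, likelihood):
    """Return the posterior: prior times likelihood, rescaled to sum to 1."""
    unnormalised = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalised.values())
    return {h: value / total for h, value in unnormalised.items()}

# Assumed prior for a July day: mostly sunny.
prior = {"sunny": 0.8, "rainy": 0.2}

# Assumed P(someone carries an umbrella | weather) in two different countries.
umbrella_for_rain_only = {"sunny": 0.1, "rainy": 0.9}
umbrella_as_sunshade = {"sunny": 0.6, "rainy": 0.9}

rain_only = update(prior, umbrella_for_rain_only)
sunshade = update(prior, umbrella_as_sunshade)
print(round(rain_only["rainy"], 2), round(sunshade["rainy"], 2))
```

With these made-up numbers, seeing an umbrella lifts the probability of rain from 0.20 to about 0.69 in the first country, but only to about 0.27 in the second. The same observation updates the belief differently depending on what it means locally.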
If an absent-minded professor takes his umbrella into a classroom, there's a probability of 1/4 that he'll
absent-mindedly leave it there. One day, he sets off with his umbrella, teaches in three classrooms, and
comes back to his office... without his umbrella. What's the probability that he left the umbrella in the first classroom?
P(left in the first classroom) = 1/4 = 16/64
P(left in the second classroom) = 3/4 × 1/4 = 12/64
P(left in the third classroom) = 3/4 × 3/4 × 1/4 = 9/64
P(left in the first classroom, given that he left it somewhere)
= P(left in the first classroom and left it somewhere) / P(left it somewhere)
= (1/4) / (1 − 27/64) = 16/(16 + 12 + 9) = 16/37 ≈ 43%
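The umbrella arithmetic can be checked with exact fractions. This is a small verification sketch; the variable names are my own.

```python
from fractions import Fraction

p_leave = Fraction(1, 4)   # chance of leaving the umbrella in any one classroom
p_keep = 1 - p_leave       # chance of taking it along to the next one

# Probability that it is left in classroom 1, 2 or 3:
# he must keep it in every earlier classroom and then leave it.
left_in = [p_keep**k * p_leave for k in range(3)]

p_left_somewhere = sum(left_in)            # same as 1 - (3/4)**3
p_first_given_lost = left_in[0] / p_left_somewhere

print(left_in, p_left_somewhere, p_first_given_lost, float(p_first_given_lost))
```

Conditioning on "he came back without it" rescales the three classroom probabilities so that they sum to one, which is why the answer is larger than the unconditional 1/4.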
P(Truth | Data) = P(Data | Truth) P(Truth) / P(Data)
P(Truth) = the prior = what we believe in
P(Data | Truth) = the likelihood = the data collected to confirm our belief
P(Truth | Data) = the posterior = the updated belief
P(Truth | Data) ∝ P(Truth) P(Data | Truth)
Prior belief -> The data collected -> Updated belief
posterior ∝ prior × likelihood
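The proportionality posterior ∝ prior × likelihood can be made concrete with a grid approximation. This is a sketch with assumed data, not an example from the talk: 7 of 10 hypothetical customers retained, a flat prior over the retention rate, and a binomial likelihood.

```python
from math import comb

# Candidate values for the unknown retention rate.
grid = [i / 100 for i in range(101)]

# Flat prior: every rate is equally plausible before seeing data (an assumption).
prior = [1.0 for _ in grid]

# Assumed data: 7 of 10 customers retained.
k, n = 7, 10
likelihood = [comb(n, k) * t**k * (1 - t) ** (n - k) for t in grid]

# posterior is proportional to prior * likelihood; dividing by the sum normalises it.
unnormalised = [p * l for p, l in zip(prior, likelihood)]
total = sum(unnormalised)
posterior = [u / total for u in unnormalised]

most_plausible = grid[posterior.index(max(posterior))]
print("most plausible retention rate:", most_plausible)
```

Replacing the flat prior with one that favours, say, low retention rates would pull the posterior towards lower values, which is the "updated belief" in the diagram.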
ROI, customer retention, losing an
umbrella: everything is based on some
prior belief
Why might we prefer Bayesian to
classical approaches to data?
the problem of small n, large p
limited influence over which features are
selected in classical approaches
the power to decide which
coefficients go into the model and
how strongly they go into the model.
Why we are so different yet
so similar - No two people are exactly alike
and no two people are exactly different
preferences
• Bayesian statistics allows you to be subjective, to better connect the real world with the data.
• P-values and confidence intervals vs the posterior distribution <all outcomes and their probabilities>.
• The answers we look for do not match the answers classical models give.
• Important question: what is the probability of an event when the p-value is less than 0.005?
• A is better than B with a p-value of 0.001, but A is more expensive.
• You have the predicted probability of the quality guarantee in hand, plus expected prices on the market.
• Bayesian methods support complex decision-making under uncertainty.
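The "A is better than B, but A is more expensive" question can be sketched as a small simulation. Everything numeric here is an assumption for illustration: the success counts, the flat Beta(1, 1) priors, the value of one success and the extra cost of A. The point is that a posterior gives the probability that A beats B directly, and that probability can be combined with costs.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

def posterior_draws(successes, trials, draws):
    """Draw from the Beta posterior of a success rate under a flat Beta(1, 1) prior."""
    return [
        random.betavariate(1 + successes, 1 + trials - successes)
        for _ in range(draws)
    ]

draws = 20_000
rate_a = posterior_draws(60, 100, draws)  # assumed: A succeeded 60 times out of 100
rate_b = posterior_draws(45, 100, draws)  # assumed: B succeeded 45 times out of 100

# Probability that A's true rate is higher than B's.
p_a_better = sum(a > b for a, b in zip(rate_a, rate_b)) / draws

# Hypothetical business inputs: value of one success and A's extra cost per unit.
value_per_success = 10.0
extra_cost_a = 1.0
expected_gain = (
    sum((a - b) * value_per_success for a, b in zip(rate_a, rate_b)) / draws
    - extra_cost_a
)

print("P(A better than B):", round(p_a_better, 3))
print("expected gain per unit from choosing A:", round(expected_gain, 2))
```

If the expected gain were negative, B would be the better choice even though A "wins" statistically.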
Bayesian methods provide trade-offs between speed and generality
Don’t know the priors? Are you sure?
Multiple-module analysis with different levels of priors.
• Business rules influencing decision
• Movement of needs depending on price
• We need to think about competitors,
situation on the market, prices of other
products within the store
We try to measure the return on investment by media type.
We have cross-sectional units: regions, markets, trade areas, channels, brands, competitor brands.
Another dimension is time: the series can be weekly or monthly, with at least 5 years of monthly data and 2 years of weekly data.
The dependent variable has to be in units, not currency, because of price elasticity.
Marketing Mix Modelling
• The Theory That Would Not Die
• Bayesian Methods for Hackers - http://camdavidsonpilon.github.io/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/
• Think Bayes – Bayesian Statistics in Python - https://greenteapress.com/wp/think-bayes/
• Statistical Computing for Scientists and Engineers - https://www.zabaras.com/statistical-computing-2017
• Chris Bishop, Introduction to Bayesian Inference - http://videolectures.net/mlss09uk_bishop_ibi/?q=mlss+2009
• Statistical Rethinking - Ebook: http://xcelab.net/rmpubs/rethinking/Statistical_Rethinking_sample.pdf Videos: https://www.youtube.com/watch?v=oy7Ks3YfbDg&list=PLDcUM9US4XdM9_N6XUUFrhghGJ4K25bFc
MARTA FAJLHAUER
Email: fajlhauermarta@gmail.com
LinkedIn: https://www.linkedin.com/in/martafajlhauer/

More Related Content

What's hot

Module 1.2 data preparation
Module 1.2  data preparationModule 1.2  data preparation
Module 1.2 data preparation
Sara Hooker
 
Introduction to Machine learning
Introduction to Machine learningIntroduction to Machine learning
Introduction to Machine learning
Knoldus Inc.
 
Intro to machine learning
Intro to machine learningIntro to machine learning
Intro to machine learning
Tamir Taha
 
Data Science Full Course | Edureka
Data Science Full Course | EdurekaData Science Full Course | Edureka
Data Science Full Course | Edureka
Edureka!
 
Barga, roger. predictive analytics with microsoft azure machine learning
Barga, roger. predictive analytics with microsoft azure machine learningBarga, roger. predictive analytics with microsoft azure machine learning
Barga, roger. predictive analytics with microsoft azure machine learning
maldonadojorge
 
Module 8: Natural language processing Pt 1
Module 8:  Natural language processing Pt 1Module 8:  Natural language processing Pt 1
Module 8: Natural language processing Pt 1
Sara Hooker
 
Machine Learning Algorithms (Part 1)
Machine Learning Algorithms (Part 1)Machine Learning Algorithms (Part 1)
Machine Learning Algorithms (Part 1)
Laguna State Polytechnic University
 
Random Forest Tutorial | Random Forest in R | Machine Learning | Data Science...
Random Forest Tutorial | Random Forest in R | Machine Learning | Data Science...Random Forest Tutorial | Random Forest in R | Machine Learning | Data Science...
Random Forest Tutorial | Random Forest in R | Machine Learning | Data Science...
Edureka!
 
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...
Edureka!
 
CRISP-DM - Agile Approach To Data Mining Projects
CRISP-DM - Agile Approach To Data Mining ProjectsCRISP-DM - Agile Approach To Data Mining Projects
CRISP-DM - Agile Approach To Data Mining Projects
Michał Łopuszyński
 
Managing machine learning
Managing machine learningManaging machine learning
Managing machine learning
David Murgatroyd
 
Applications: Prediction
Applications: PredictionApplications: Prediction
Applications: Prediction
NBER
 
End-to-End Machine Learning Project
End-to-End Machine Learning ProjectEnd-to-End Machine Learning Project
End-to-End Machine Learning Project
Eng Teong Cheah
 
Machine Learning Engineer Salary, Roles And Responsibilities, Skills and Resu...
Machine Learning Engineer Salary, Roles And Responsibilities, Skills and Resu...Machine Learning Engineer Salary, Roles And Responsibilities, Skills and Resu...
Machine Learning Engineer Salary, Roles And Responsibilities, Skills and Resu...
Simplilearn
 
Exploratory data analysis data visualization
Exploratory data analysis data visualizationExploratory data analysis data visualization
Exploratory data analysis data visualization
Dr. Hamdan Al-Sabri
 
Linear Regression Algorithm | Linear Regression in R | Data Science Training ...
Linear Regression Algorithm | Linear Regression in R | Data Science Training ...Linear Regression Algorithm | Linear Regression in R | Data Science Training ...
Linear Regression Algorithm | Linear Regression in R | Data Science Training ...
Edureka!
 
Research Method EMBA chapter 12
Research Method EMBA chapter 12Research Method EMBA chapter 12
Research Method EMBA chapter 12
Mazhar Poohlah
 
PyGotham 2016
PyGotham 2016PyGotham 2016
PyGotham 2016
Manojit Nandi
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
Niko Vuokko
 
Introduction to machine learning
Introduction to machine learningIntroduction to machine learning
Introduction to machine learning
Koundinya Desiraju
 

What's hot (20)

Module 1.2 data preparation
Module 1.2  data preparationModule 1.2  data preparation
Module 1.2 data preparation
 
Introduction to Machine learning
Introduction to Machine learningIntroduction to Machine learning
Introduction to Machine learning
 
Intro to machine learning
Intro to machine learningIntro to machine learning
Intro to machine learning
 
Data Science Full Course | Edureka
Data Science Full Course | EdurekaData Science Full Course | Edureka
Data Science Full Course | Edureka
 
Barga, roger. predictive analytics with microsoft azure machine learning
Barga, roger. predictive analytics with microsoft azure machine learningBarga, roger. predictive analytics with microsoft azure machine learning
Barga, roger. predictive analytics with microsoft azure machine learning
 
Module 8: Natural language processing Pt 1
Module 8:  Natural language processing Pt 1Module 8:  Natural language processing Pt 1
Module 8: Natural language processing Pt 1
 
Machine Learning Algorithms (Part 1)
Machine Learning Algorithms (Part 1)Machine Learning Algorithms (Part 1)
Machine Learning Algorithms (Part 1)
 
Random Forest Tutorial | Random Forest in R | Machine Learning | Data Science...
Random Forest Tutorial | Random Forest in R | Machine Learning | Data Science...Random Forest Tutorial | Random Forest in R | Machine Learning | Data Science...
Random Forest Tutorial | Random Forest in R | Machine Learning | Data Science...
 
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...
 
CRISP-DM - Agile Approach To Data Mining Projects
CRISP-DM - Agile Approach To Data Mining ProjectsCRISP-DM - Agile Approach To Data Mining Projects
CRISP-DM - Agile Approach To Data Mining Projects
 
Managing machine learning
Managing machine learningManaging machine learning
Managing machine learning
 
Applications: Prediction
Applications: PredictionApplications: Prediction
Applications: Prediction
 
End-to-End Machine Learning Project
End-to-End Machine Learning ProjectEnd-to-End Machine Learning Project
End-to-End Machine Learning Project
 
Machine Learning Engineer Salary, Roles And Responsibilities, Skills and Resu...
Machine Learning Engineer Salary, Roles And Responsibilities, Skills and Resu...Machine Learning Engineer Salary, Roles And Responsibilities, Skills and Resu...
Machine Learning Engineer Salary, Roles And Responsibilities, Skills and Resu...
 
Exploratory data analysis data visualization
Exploratory data analysis data visualizationExploratory data analysis data visualization
Exploratory data analysis data visualization
 
Linear Regression Algorithm | Linear Regression in R | Data Science Training ...
Linear Regression Algorithm | Linear Regression in R | Data Science Training ...Linear Regression Algorithm | Linear Regression in R | Data Science Training ...
Linear Regression Algorithm | Linear Regression in R | Data Science Training ...
 
Research Method EMBA chapter 12
Research Method EMBA chapter 12Research Method EMBA chapter 12
Research Method EMBA chapter 12
 
PyGotham 2016
PyGotham 2016PyGotham 2016
PyGotham 2016
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Introduction to machine learning
Introduction to machine learningIntroduction to machine learning
Introduction to machine learning
 

Similar to Bayesian reasoning

Bigdataanalytics
BigdataanalyticsBigdataanalytics
Bigdataanalytics
Haroon Karim
 
Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data Analytics
Ghulam Imaduddin
 
Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data Analytics
Osman Ali
 
Data science in business Administration Nagarajan.pptx
Data science in business Administration Nagarajan.pptxData science in business Administration Nagarajan.pptx
Data science in business Administration Nagarajan.pptx
NagarajanG35
 
1 UNIT-DSP.pptx
1 UNIT-DSP.pptx1 UNIT-DSP.pptx
1 UNIT-DSP.pptx
PothyeswariPothyes
 
data science and business analytics
data science and business analyticsdata science and business analytics
data science and business analytics
sunnypatil1778
 
Data science tutorial
Data science tutorialData science tutorial
Data science tutorial
Aakashdata
 
MAKING SENSE OF IOT DATA W/ BIG DATA + DATA SCIENCE - CHARLES CAI
MAKING SENSE OF IOT DATA W/ BIG DATA + DATA SCIENCE - CHARLES CAIMAKING SENSE OF IOT DATA W/ BIG DATA + DATA SCIENCE - CHARLES CAI
MAKING SENSE OF IOT DATA W/ BIG DATA + DATA SCIENCE - CHARLES CAI
Big Data Week
 
Data science presentation
Data science presentationData science presentation
Data science presentation
MSDEVMTL
 
Barga Data Science lecture 2
Barga Data Science lecture 2Barga Data Science lecture 2
Barga Data Science lecture 2
Roger Barga
 
365 Data Science
365 Data Science365 Data Science
365 Data Science
IvanHo572682
 
Demystifying Data Science
Demystifying Data ScienceDemystifying Data Science
Demystifying Data Science
Jonathan Sedar
 
From Rocket Science to Data Science
From Rocket Science to Data ScienceFrom Rocket Science to Data Science
From Rocket Science to Data Science
Sanghamitra Deb
 
Guide for a Data Scientist
Guide for a Data ScientistGuide for a Data Scientist
Guide for a Data Scientist
Rohit Dubey
 
What is business analytics
What is business analyticsWhat is business analytics
What is business analytics
Sherpa Consulting
 
Cloudera Breakfast: Advanced Analytics Part II: Do More With Your Data
Cloudera Breakfast: Advanced Analytics Part II: Do More With Your DataCloudera Breakfast: Advanced Analytics Part II: Do More With Your Data
Cloudera Breakfast: Advanced Analytics Part II: Do More With Your Data
Cloudera, Inc.
 
Self Study Business Approach to DS_01022022.docx
Self Study Business Approach to DS_01022022.docxSelf Study Business Approach to DS_01022022.docx
Self Study Business Approach to DS_01022022.docx
Shanmugasundaram M
 
The Python ecosystem for data science - Landscape Overview
The Python ecosystem for data science - Landscape OverviewThe Python ecosystem for data science - Landscape Overview
The Python ecosystem for data science - Landscape Overview
Dr. Ananth Krishnamoorthy
 
Introduction to data science
Introduction to data scienceIntroduction to data science
Introduction to data science
Mahir Haque
 
Data preparation and processing chapter 2
Data preparation and processing chapter  2Data preparation and processing chapter  2
Data preparation and processing chapter 2
Mahmoud Alfarra
 

Similar to Bayesian reasoning (20)

Bigdataanalytics
BigdataanalyticsBigdataanalytics
Bigdataanalytics
 
Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data Analytics
 
Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data Analytics
 
Data science in business Administration Nagarajan.pptx
Data science in business Administration Nagarajan.pptxData science in business Administration Nagarajan.pptx
Data science in business Administration Nagarajan.pptx
 
1 UNIT-DSP.pptx
1 UNIT-DSP.pptx1 UNIT-DSP.pptx
1 UNIT-DSP.pptx
 
data science and business analytics
data science and business analyticsdata science and business analytics
data science and business analytics
 
Data science tutorial
Data science tutorialData science tutorial
Data science tutorial
 
MAKING SENSE OF IOT DATA W/ BIG DATA + DATA SCIENCE - CHARLES CAI
MAKING SENSE OF IOT DATA W/ BIG DATA + DATA SCIENCE - CHARLES CAIMAKING SENSE OF IOT DATA W/ BIG DATA + DATA SCIENCE - CHARLES CAI
MAKING SENSE OF IOT DATA W/ BIG DATA + DATA SCIENCE - CHARLES CAI
 
Data science presentation
Data science presentationData science presentation
Data science presentation
 
Barga Data Science lecture 2
Barga Data Science lecture 2Barga Data Science lecture 2
Barga Data Science lecture 2
 
365 Data Science
365 Data Science365 Data Science
365 Data Science
 
Demystifying Data Science
Demystifying Data ScienceDemystifying Data Science
Demystifying Data Science
 
From Rocket Science to Data Science
From Rocket Science to Data ScienceFrom Rocket Science to Data Science
From Rocket Science to Data Science
 
Guide for a Data Scientist
Guide for a Data ScientistGuide for a Data Scientist
Guide for a Data Scientist
 
What is business analytics
What is business analyticsWhat is business analytics
What is business analytics
 
Cloudera Breakfast: Advanced Analytics Part II: Do More With Your Data
Cloudera Breakfast: Advanced Analytics Part II: Do More With Your DataCloudera Breakfast: Advanced Analytics Part II: Do More With Your Data
Cloudera Breakfast: Advanced Analytics Part II: Do More With Your Data
 
Self Study Business Approach to DS_01022022.docx
Self Study Business Approach to DS_01022022.docxSelf Study Business Approach to DS_01022022.docx
Self Study Business Approach to DS_01022022.docx
 
The Python ecosystem for data science - Landscape Overview
The Python ecosystem for data science - Landscape OverviewThe Python ecosystem for data science - Landscape Overview
The Python ecosystem for data science - Landscape Overview
 
Introduction to data science
Introduction to data scienceIntroduction to data science
Introduction to data science
 
Data preparation and processing chapter 2
Data preparation and processing chapter  2Data preparation and processing chapter  2
Data preparation and processing chapter 2
 

Recently uploaded

Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
SOFTTECHHUB
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
Uni Systems S.M.S.A.
 
Mariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceXMariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceX
Mariano Tinti
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
DianaGray10
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
kumardaparthi1024
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
Neo4j
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
Infrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI modelsInfrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI models
Zilliz
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
Quotidiano Piemontese
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
Zilliz
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
DianaGray10
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
tolgahangng
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
Kari Kakkonen
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
名前 です男
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
panagenda
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
innovationoecd
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
Matthew Sinclair
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Speck&Tech
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
Zilliz
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
Kumud Singh
 

Recently uploaded (20)

Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
 
Mariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceXMariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceX
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
Infrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI modelsInfrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI models
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
 
Communications Mining Series - Zero to Hero - Session 1

Bayesian reasoning

  • 1. “DATA IN THE WILD” – BEGINNER STEPS INTO DATA. MARTA FAJLHAUER, GSTATS, BSC: DATA ANALYST AT BRIGHTBLUE CONSULTING, PROFESSIONAL FELLOW OF THE ROYAL STATISTICAL SOCIETY, POSTGRADUATE STUDENT AT QUEEN MARY UNIVERSITY OF LONDON
  • 2. What I learned from analysing 250 profiles of my LinkedIn connections working in Data Science. What I learned during my work in Data Engineering. What I learn when I work in Data Analytics. Bayesian reasoning for social media. Curiosity, understanding, asking questions, looking for answers to business and personal questions.
  • 3. I want to work in Data Science (£75,000 - £100,000). Procurement / IT Service Desk / Threat Intel Librarian / Audit / PMO / Corporate / Business System / Business / Technical / Analyst; Data / Analytics Consultant; Analytics and Business Intelligence; Analytical storyteller; AI and Advanced Analytics; Econometrician; Statistician; Mathematician; Software / Cloud / Mathematical / Data / Linux Operation / System / Service / Marketing / Backend / Blockchain / Splunk / Oracle / Machine Learning / AI Engineer; Data and Software / System / Enterprise / Data Solution / Cloud Architect; Lead Software crafter; Software / Full Stack / Software developer; Cloud / AI / Computer Vision / Machine Learning Consultant; Applied Machine Learning Scientists; Deep learning specialist; Enterprise data strategy; Machine Learning / AI / Robotics / Researcher; Big Data Developer; Oracle DBA; DevOps. -> Machine Learning -> R -> Python -> Deep Learning -> NLP -> AI -> Advanced Statistics
  • 4. 241 profiles: 86 Data Scientists (27 PhD and 13 BSc), 64 Data Analysts (1 PhD and 35 BSc), 64 Engineers.
  • 5. • Computer Science or Mathematics background. • Others in every single category. • Mathematics for Data Analytics and Computer Science for Data Engineering. Data Scientists
  • 6. Less than 20% computer science; 60% degree in computer science. But… Lead Software Crafter: BSc Health Science; DevOps: BSc Applied Linguistics; Marketing Engineer: English Literature; Senior Analytics Consultant: BSc Music; Software Engineer: Public Relations; Data Engineer: Anthropology; Data Manager: BSc Arts; Cloud Consultant: Advanced Aeronautical Engineering; Data Engineer: Public Health.
  • 7. You need to choose what you want to specialise in. They are all called doctors, but does that mean that one can do the work of another? Does it mean that one is more important than another? No. It means that each decided to concentrate on a specific thing after an exploration stage. EBOV virus for a charity helping people in Africa. Crime data mining using USA census data.
  • 8. DATA ENGINEERING – FIRST JOB: “DATA SOMETHING”
  • 9. IT Ops and Security. Machine data. Real-time visibility. Forwarding data in real time. Collect and visualise. Forward data in real time to indexes. Scales from a single server to a distributed deployment. Accepts any text data as input, parses the data into events, stores events in indexes, searches and reports.
  • 10. • Writing configuration files <TCP / UDP, SSL, HEC> • Set up receiving ports on indexers, add inputs to forwarders • Compress the feed to save money on data pre-processing from Hadoop clusters • Lesson 0: where the coffee machine is • Lesson 1: not many girls in Data Engineering work: the only girl, the only non-technical. • Lesson 2: Stack Overflow and Google are my best friends. • Lesson 3: how to set up a Splunk image in a Docker container • Lesson 4: setting up a distributed, global deployment: it is very important to set the proper time and time zone to correlate across multiple sources, and to set up alerts in case of anomalies • Lesson 5: data encryption and different levels of access are very important in finance – REGEX, Bash, Linux • Dashboards and automatic pivots using Splunk's Search Processing Language (SPL).
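Lesson 4 on slide 10 (proper time and time zone for correlating sources) can be illustrated outside Splunk. The sketch below is not Splunk configuration; it is a plain Python illustration, and the function name `to_utc`, the timestamp format and the two example events are assumptions made for the example. It assumes Python 3.9+ with a time zone database available to `zoneinfo`.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # standard library from Python 3.9


def to_utc(local_timestamp: str, source_tz: str) -> datetime:
    """Attach the source's time zone to a naive timestamp and convert it to UTC."""
    naive = datetime.strptime(local_timestamp, "%Y-%m-%d %H:%M:%S")
    return naive.replace(tzinfo=ZoneInfo(source_tz)).astimezone(timezone.utc)


# Two hypothetical log events written in local time by servers in different regions.
london_event = to_utc("2019-07-01 09:00:00", "Europe/London")
new_york_event = to_utc("2019-07-01 04:00:00", "America/New_York")

# Once both are in UTC they can be compared and correlated directly.
print(london_event.isoformat(), new_york_event.isoformat())
print(london_event == new_york_event)
```

Compared as raw local strings the two events look five hours apart; after normalising to UTC they turn out to be the same instant.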
  • 11. There is no time to carefully check all the details: the analytics of this kind of data is completely different from that of static data. With a static .csv file you can check whether you have missing data, visualise all the details and understand the data, but with real-time rolling data it is completely different. You have dashboards already set up to concentrate on the most important bits. In Splunk, you can set up an alert. When you deal with this kind of data you don't concentrate on the statistics behind it; you only choose, from a selection, the algorithm that you think will best meet the conditions. With static data you think about R^2, coefficients and much more.
  • 12. Read code written by someone else; modify the elements for your own purpose; write your own code. • There are languages like R where it is sometimes much more efficient to use a package already in the system. • When you set up a loop over millions of data points, first check on a smaller dataset that the loop gives the expected output and runs smoothly. Once you have checked that, remember to add a loop counter so you can track progress, and set up automatic saving of the output.
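The loop advice on slide 12 (test on a small sample, add a counter, save output automatically) might look like the minimal Python sketch below. The `process` function, the JSON checkpoint file and the `report_every` parameter are hypothetical stand-ins chosen for the example, not something taken from the talk.

```python
import json
import tempfile
from pathlib import Path


def process(record: int) -> int:
    """Hypothetical per-record work; stands in for the real computation."""
    return record * record


def run(records, out_path: Path, report_every: int = 1000) -> list:
    """Process records, printing a progress counter and saving results periodically."""
    results = []
    for count, record in enumerate(records, start=1):
        results.append(process(record))
        if count % report_every == 0:
            print(f"processed {count} records")
            out_path.write_text(json.dumps(results))  # periodic automatic save
    out_path.write_text(json.dumps(results))  # final save
    return results


# Step 1: check the loop on a small sample before running it on millions of rows.
out_path = Path(tempfile.gettempdir()) / "sample_output.json"
sample = run(range(10), out_path, report_every=5)
print(sample)
```

Only once the small run gives the expected output would the same `run` call be pointed at the full data, with a larger `report_every`.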
  • 13. DATA ANALYTICS Algorithms, R&D, statistical thinking
  • 14. • Lesson 1: do not rely completely on statistical knowledge without thinking about whether correlation really implies causation (not only in regression). • Whatever you can, plot it to visualise the data. • R, Python, Excel, SAS, whatever works for the given purpose: you choose. • Different models for different kinds of data. • With smaller, static datasets you may have much more fun from an analytics point of view than with real-time rolling data coming from different sources.
  • 15. Bayesian Reasoning for Social Data Sherlock Holmes and Watson
  • 16. • It's July, and mostly sunny <- prior. Predict: mostly sunny. • Someone carries an umbrella <- likelihood. Predict: rainy. • What if this is a country where you carry an umbrella on hot days? What if you carry an umbrella only when it's raining? • Update belief <- posterior.
  • 17. If an absent-minded professor takes his umbrella into a classroom, there's a probability of 1/4 that he'll absent-mindedly leave it there. One day, he sets off with his umbrella, teaches in three classrooms, and comes back to his office... without his umbrella. What's the probability he left the umbrella in the first classroom? P(left in 1st) = 1/4 = 16/64; P(left in 2nd) = (3/4)(1/4) = 12/64; P(left in 3rd) = (3/4)^2 (1/4) = 9/64. So the answer is 16/(16+12+9) = 16/37 ≈ 43%. Equivalently, P(left in the first classroom, given that he left it somewhere) = P(left it in the first classroom and left it somewhere) / P(left it somewhere) = (1/4)/(1 − 27/64) = 16/37.
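The umbrella arithmetic on slide 17 can be checked with exact fractions. This is a small verification sketch; the variable names are chosen for illustration only.

```python
from fractions import Fraction

p_leave = Fraction(1, 4)  # chance of leaving the umbrella in any one classroom
p_keep = 1 - p_leave      # chance of walking out with it

# P(left in classroom k): kept it in the k earlier classrooms, then left it.
left_in = [p_keep**k * p_leave for k in range(3)]

p_left_somewhere = sum(left_in)  # equals 1 - (3/4)**3
p_first_given_lost = left_in[0] / p_left_somewhere

print(left_in)
print(p_left_somewhere)
print(p_first_given_lost, float(p_first_given_lost))
```

The three terms are 16/64, 12/64 and 9/64, so the conditional probability is 16/37, which matches the roughly 43% quoted on the slide.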
  • 18. $P(\text{Truth} \mid \text{Data}) = \dfrac{P(\text{Data} \mid \text{Truth})\,P(\text{Truth})}{P(\text{Data})}$, where $P(\text{Truth})$ = the prior = what we believe in; $P(\text{Data} \mid \text{Truth})$ = the likelihood = the data collected to confirm our belief; $P(\text{Truth} \mid \text{Data})$ = the posterior = the updated belief. Hence $P(\text{Truth} \mid \text{Data}) \propto P(\text{Truth})\,P(\text{Data} \mid \text{Truth})$, that is: updated belief ∝ prior belief × the data collected.
  • 19. $\text{posterior} \propto \text{prior} \times \text{likelihood}$. ROI, customer retention, losing an umbrella: all are based on some previous belief.
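The rule posterior ∝ prior × likelihood can be applied to the July umbrella example from slide 16. In the sketch below every number is a made-up assumption for illustration (the talk gives none), and the helper name `posterior` is hypothetical.

```python
def posterior(prior: dict, likelihood: dict) -> dict:
    """Multiply prior by likelihood for each hypothesis, then normalise to sum to 1."""
    unnormalised = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(unnormalised.values())  # P(Data)
    return {h: value / evidence for h, value in unnormalised.items()}


# Assumed prior: it is July, so mostly sunny.
prior = {"sunny": 0.8, "rainy": 0.2}

# Assumed P(someone carries an umbrella | weather) in two different countries.
umbrella_only_in_rain = {"sunny": 0.05, "rainy": 0.9}
umbrella_as_sunshade = {"sunny": 0.5, "rainy": 0.9}

belief_rain_country = posterior(prior, umbrella_only_in_rain)
belief_sunshade_country = posterior(prior, umbrella_as_sunshade)

print(belief_rain_country)
print(belief_sunshade_country)
```

With these assumed numbers the same observation pushes the belief in rain to about 0.82 where umbrellas are only for rain, but only to about 0.31 where they are also used as sunshades, which is the point of the "what if" questions on the slide.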
  • 20. Why might we prefer Bayesian to classical approaches to the data? The problem of small n, large p; limited influence over which features are selected in classical approaches; the power to decide which coefficients go into the model, or how strongly they go into the model.
  • 21. Why we are so different yet so similar: no two people are exactly alike and no two people are exactly different. Preferences.
  • 22. • Bayesian statistics allows you to be subjective, to better connect the real world with the data. • P-values and confidence intervals vs the posterior distribution <all outcomes and their probabilities>. • The answers that we look for do not match the answers from classical models. • Important question: what is the probability of an event when the p-value is less than 0.005? • A is better than B with p-value 0.001. A is more expensive. • You have the predicted probability of the quality guarantee in hand, plus expected prices on the market. • Bayesian methods support complex decision-making under uncertainty.
  • 24. Don’t know the priors? Are you sure? Multiple module analysis with different levels of priors.
  • 25. • Business rules influencing decision • Movement of needs depending on price • We need to think about competitors, situation on the market, prices of other products within the store
  • 26. Marketing Mix Modelling. We try to measure the return on investment by media type. We have cross-sectional units: regions, markets, trade areas, channels, brands, competitor brands. Another dimension is the time series, which can be weekly or monthly: at least 5 years of monthly data and 2 years of weekly data. The dependent variable would have to be units, not currency, due to price elasticity.
  • 27. • The theory that will never die • Bayesian Methods for Hackers: http://camdavidsonpilon.github.io/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/ • Think Bayes, Bayesian Statistics in Python: https://greenteapress.com/wp/think-bayes/ • Statistical Computing for Scientists and Engineers: https://www.zabaras.com/statistical-computing-2017 • Chris Bishop, Introduction to Bayesian Inference: http://videolectures.net/mlss09uk_bishop_ibi/?q=mlss+2009 • Statistical Rethinking: Ebook: http://xcelab.net/rmpubs/rethinking/Statistical_Rethinking_sample.pdf Videos: https://www.youtube.com/watch?v=oy7Ks3YfbDg&list=PLDcUM9US4XdM9_N6XUUFrhghGJ4K25bFc
  • 28. MARTA FAJLHAUER Email: fajlhauermarta@gmail.com LinkedIn: https://www.linkedin.com/in/martafajlhauer/

Editor's Notes

  1. Structure of the talk.
  2. Statement: “I want to work in Data Science”, based on salary. Explosion of information. First conference and where my friends work.