Feeding The Vizard: finding the stories in the data

Editor's Notes

  1. Feeding the Vizard: Finding stories in the data. Monica Rogati, Data Scientist at LinkedIn
  2. There are plenty of stories in the LinkedIn data – because there are 120 million profiles, with career histories, that in aggregate can tell you a lot about the world & labor market evolution.
  3. What were the fastest growing titles in a given year? We can see the tech boom in 1999 and the bubble bursting as people go back to grad school. Today, we see the rise of social media and, of course, data scientists.
  4. ... and, finally, we can see fads in job titles going from “gurus” to “ninjas” to “rock stars”. So HOW do we find stories like these in the data?
  5. You have to ask the right questions. I’m going to talk about how to ask the right question by showing you a deceptively simple exercise that LinkedIn data scientists go through. The question is: what are the hottest industries this year, according to the LinkedIn data? There’s one small detail I’m not specifying – the definition of “hot”. That definition plays a major part in asking the right questions.
  6. So let’s take a look at the data. On LinkedIn, we have over 120M people, their industry, and the year they joined.
  7. … so the first attempt at defining “hot” might be – let’s look at the YOY growth of an industry & look at the top 3. That idea is not so hot – at best, it’s only an indicator of LinkedIn’s penetration in an industry; at worst, it’s actually a contrarian indicator because it shows people who might want to transition OUT of that industry.
  8. The next piece of data we can look at is the individual positions people list on their profiles – they have a start date and an industry, so you can see what industry people are flowing into in a given year. Much better.
  9. You run the numbers… Wait a second!! Is consulting really the hottest industry? Hmm.. I think the data is trying to lie to us. We need to take into account churn & promotions – and we do that by looking at the NET inflow of people into an industry: people coming IN minus people going OUT.
  10. There, that should be much better. The next external factor that might come into play is seasonality. If we’re doing this analysis in the summer, it looks like there are a lot fewer teachers and accountants, and a lot more summer interns, compared to last year! So ideally, we want to compare the same time period to take out seasonal effects.
  11. OK … done, let’s take another look. Are the Mining & Metals and Dairy industries really the hottest industries this year? Or are they just very small industries on LinkedIn, and it’s much easier to grow off of a small base? You can get around this by making separate categories for industries of different sizes, ignoring industries below a certain size, or otherwise accounting for that effect.
  12. For example, there are a lot of fake accounts that we’ve immediately closed, but they’re still there in the database. If you don’t check for that flag, you have this army of Darth Vaders boosting up the defense and space industry.
  13. Hm, ok, we plot the YOY growth and we get something that looks like this: a spaghetti chart that mostly shows industries moving in unison – an effect of the broader economic conditions (see that dip in 2001 and 2009). If we want to actually focus on differences between industries instead of what they have in common, we need to scale or normalize those numbers – for example, by dividing the net # of people coming into an industry by the TOTAL number of people who started jobs that year. This also has the nice property that it accounts for website growth.
  14. OK, this MUST be it, right? The data stopped lying and we can actually see some real trends. Wild swings around 2000 for Internet and telecommunications, and there’s definitely something going on with real estate there. But it still looks like spaghetti, it’s hard to understand and explain, and it’s not exactly telling a story. To tell the story, we need to make some hard decisions and pick only a couple of those lines, clean things up, and let that story shine.
  15. Nice! I’ve picked 3 industries – when the line is above zero, that industry is growing; below 0, it’s shrinking. So the Internet is taking off in ’94, booming in ’99, then there’s a huge dip in 2001. Real Estate is growing steadily, it’s picking up in 2002, and it’s sinking in 2008 – and so are financial services. This is all coming from aggregating data on people’s public LinkedIn profiles! This is the kind of story that gets people excited about the insights in the LinkedIn data – but it wouldn’t have been possible if we didn’t ask the right questions.
  16. We have to aggregate across interesting dimensions, normalize the raw data, and dig. Let’s see a few examples of what this means.
  17. Looking at job promotions by month and country reveals interesting patterns.
  18. Normalizing raw data is essential -- there is always a denominator. Look for what is over-indexed, not just popular. To see the trend HERE, we use percentages.
  19. Finally, ask why. For the promotions data, we took a look at when people were born. And here is our explanation for the trend: Millennials don’t care what month it is, they want their promotions.
  20. Sometimes you have to rephrase the question you’re being asked. This infographic is 100% correct – but Atlanta and Chicago don’t strike me as wedding destinations. When people ask for “most popular”, give them “over-represented” – normalize (in this case, normalize by city size and total incoming flights).
  21. Here are two memorable images to explain – and help you remember – the concept of over-represented and under-represented. Babies are UNDER-represented in prison: fewer of them there vs. the overall population. What are a few animals over-represented on geeky t-shirts? Wolves, elephants and pigs.
  22. Let’s take a look at what majors are over-represented among entrepreneurs, according to the LinkedIn data.
  23. What are other over-represented schools among entrepreneurs, other than the business schools (in blue)? There are a few technical schools (in orange), and a few general schools (in green), some of which skew technical.
  24. Which companies are over-represented in founders’ histories?
  25. Let’s take another example. What first names are over-represented among CEOs? You might look at the list and say – well, those are just popular baby names 50 years ago. But that’s not the whole story – you have to dig.
  26. Look at the length of the names – now that’s an interesting story! There’s Chip, Todd and Trey - the quintessential sales guys. CEOs are more diverse – but they still want to be your friend -- so they use nicknames.
  27. The story itself needs to be interesting – to get people talking about it; it has to be accessible, so it’s easy to understand by a lot of people; and it has to be relatable – Howard Stern covered it because Howard was one of the names.
  28. So obviously there are a lot of interesting stories in the data. We all love data & want more of it – I mean, that IS my license plate. But raw data isn’t enough – you have to DIG to find the real story.
  29. Feeding the Vizard: Finding stories in the data. Monica Rogati, Data Scientist at LinkedIn
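
Two of the ideas in these notes can be sketched in a few lines of Python: net inflow divided by the total number of job starts that year (notes 9 and 13), and "over-represented" as a ratio of shares rather than a raw count (notes 18 and 20). This is only a minimal illustration, not LinkedIn's actual pipeline. The function names, the record layout, and the toy data below are all assumptions made up for the example.

```python
from collections import Counter


def net_inflow_share(job_changes):
    """Net inflow per (year, industry), divided by all job starts that year.

    job_changes: iterable of (year, industry_left, industry_joined) tuples.
    industry_left is None for a first job. This record layout is hypothetical.
    """
    net = Counter()     # (year, industry) -> people in minus people out
    starts = Counter()  # year -> total number of job starts (the denominator)
    for year, left, joined in job_changes:
        starts[year] += 1
        net[(year, joined)] += 1
        if left is not None:
            # A move within the same industry (e.g. a promotion) nets to zero.
            net[(year, left)] -= 1
    return {key: count / starts[key[0]] for key, count in net.items()}


def over_representation(group_counts, population_counts):
    """Share of each item in the group divided by its share in the population.

    A value above 1 means over-represented; below 1 means under-represented.
    Items missing from the population counts are skipped.
    """
    group_total = sum(group_counts.values())
    population_total = sum(population_counts.values())
    return {
        item: (count / group_total)
        / (population_counts[item] / population_total)
        for item, count in group_counts.items()
        if population_counts.get(item)
    }


# Toy data, invented purely for illustration.
changes = [
    (2008, None, "Internet"),                      # first job
    (2008, "Real Estate", "Internet"),             # industry switch
    (2008, "Consulting", "Consulting"),            # promotion, same industry
    (2008, "Real Estate", "Financial Services"),   # industry switch
]
shares = net_inflow_share(changes)

# "X" is less common than "Y" overall, but equally common inside the group.
index = over_representation({"X": 5, "Y": 5}, {"X": 25, "Y": 75})

if __name__ == "__main__":
    for key in sorted(shares):
        print(key, round(shares[key], 2))
    for item in sorted(index):
        print(item, round(index[item], 2))
```

In the toy data the promotion within Consulting contributes nothing to that industry's net inflow, which is the reason for using net rather than gross flow. In the second example "Y" is the more popular item in the population, yet "X" is the over-represented one, which is the distinction note 20 draws between "most popular" and "over-represented".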