stackconf 2023 | Analyzing Public Conversation using LDA and Topic Modeling, in Python by Sveta Gimpelson.pdf

NETWAYS
Analyzing Public Conversations
using LDA and Topic Modeling
Introducing Myself
Sveta Gimpelson
● CDO and Co-Founder at Memphis.dev
● Software engineer
● Excited about networks. And graphs
● Playing football on mom’s team
● Harry Potter fan
Analyzing Public Conversations of Influencers
I want to know the highlights of what is happening in the football domain.
1. Collect data from social networks
2. Analyze collected data
3. Visualize the output
Why Social Media
Source: Global social media statistics research summary 2023
Social Network Analysis (SNA)
A network is a number of points (or ‘nodes’) that are
connected by links.
Generally in social network analysis, the nodes are people
and the links are any social connection between them –
for example, friendship, marital/family ties, or financial ties.
Possible links on social media: follows / retweeted / liked a tweet; replied to a question / has more than one group in common.
Social Network Analysis (SNA)
Social network analysis aims to understand a community by
mapping the relationships that connect them as a network,
and then trying to draw out key individuals, groups within the
network (‘components’), and/or associations between the
individuals.
Source:
https://digi.uga.edu/network-graphs/
Social Network - Representation
● Graph - Network
● Nodes - People
● Links - Relationships
● Sub-Graph - Communities
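The representation above can be sketched in a few lines of plain Python. This is a minimal illustration, not code from the talk: the account names and links are hypothetical, and connected components are used here only as a rough stand-in for communities.

```python
from collections import defaultdict

def build_graph(links):
    """Build an undirected adjacency map from (person, person) pairs."""
    graph = defaultdict(set)
    for a, b in links:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def components(graph):
    """Return the connected components (sub-graphs) as a list of sets."""
    seen, result = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(graph[node] - group)
        seen |= group
        result.append(group)
    return result

# Hypothetical accounts and links, for illustration only.
links = [("ana", "ben"), ("ben", "cat"), ("dan", "eve")]
graph = build_graph(links)        # Graph - Network
communities = components(graph)   # Sub-Graph - Communities
```

Real community detection usually goes beyond connected components, since one large component can contain several tightly knit groups.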
Collect data - Find the influencers
Begin with 10 accounts
1. GOAL – @goal
2. ESPN FC – @ESPNFC
3. FourFourTwo – @FourFourTwo
4. BBC Sport – @BBCSport
5. WhoScored.com – @WhoScored
6. Squawka – @Squawka
7. OptaJoe – @OptaJoe
8. Bleacher Report Football – @brfootball
9. Sky Sports News – @SkySportsNews
10. Transfermarkt – @Transfermarkt
Collect data - Find the influencers
Ranking mechanism:
A candidate account is ranked on the following parameters
1. It is followed by accounts already in the network
2. It uses keywords related to football
3. It has many followers
4. It has been retweeted by our network
Follows → Ranking mechanism → Decision: IN / OUT / NEEDS REVIEW
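One way to picture the ranking mechanism and its three-way decision is a weighted score with two thresholds. The slides do not give the actual formula, so everything in this sketch is an assumption: the keyword set, the weights, the caps, the follower cutoff, and both thresholds are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass

# Assumed keyword list; the talk does not publish one.
FOOTBALL_KEYWORDS = {"football", "goal", "transfer", "league", "match"}

@dataclass
class Candidate:
    handle: str
    followed_by_network: int   # accounts already in the network that follow it
    bio: str
    followers: int
    retweets_by_network: int

def score(c: Candidate) -> float:
    """Combine the four parameters into one number (weights are assumptions)."""
    words = (w.strip(".,!#") for w in c.bio.lower().split())
    keyword_hits = sum(1 for w in words if w in FOOTBALL_KEYWORDS)
    total = 2.0 * min(c.followed_by_network, 5)
    total += 1.0 * min(keyword_hits, 3)
    total += 1.0 if c.followers >= 100_000 else 0.0
    total += 1.5 * min(c.retweets_by_network, 4)
    return total

def decide(c: Candidate, accept_at: float = 10.0, reject_below: float = 4.0) -> str:
    """Map a score to IN, OUT, or NEEDS REVIEW using two assumed thresholds."""
    s = score(c)
    if s >= accept_at:
        return "IN"
    if s < reject_below:
        return "OUT"
    return "NEEDS REVIEW"

# Hypothetical candidates.
strong = Candidate("@example_fc", 5, "Football transfer news and match stats", 250_000, 3)
weak = Candidate("@example_cooking", 0, "Recipes and kitchen tips", 900, 0)
middle = Candidate("@example_fan", 2, "Football fan", 5_000, 1)
```

Keeping a middle band between the two thresholds is what produces the NEEDS REVIEW outcome: borderline accounts go to a human instead of being decided automatically.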
The Expansion
Architecture
Collector
Rate Limit
Deduplication
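The two collector concerns named above can be sketched with the standard library alone. This is not the talk's collector: the sliding-window limiter, the `id` field on each post, and the in-memory `seen_ids` set are assumptions made for illustration, and a long-running collector would need a persistent store for seen IDs.

```python
import time

class RateLimiter:
    """Allow at most `max_calls` calls per `period` seconds (sliding window)."""

    def __init__(self, max_calls, period, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock      # injectable so the limiter can be tested without waiting
        self.sleep = sleep
        self.calls = []

    def wait(self):
        """Block until another call is allowed, then record it."""
        now = self.clock()
        self.calls = [t for t in self.calls if now - t < self.period]
        if len(self.calls) >= self.max_calls:
            # Sleep until the oldest call leaves the window.
            self.sleep(self.period - (now - self.calls[0]))
            now = self.clock()
            self.calls = [t for t in self.calls if now - t < self.period]
        self.calls.append(now)

def deduplicate(posts, seen_ids):
    """Yield only posts whose id is new; `seen_ids` is updated in place."""
    for post in posts:
        if post["id"] in seen_ids:
            continue
        seen_ids.add(post["id"])
        yield post

# Hypothetical batch containing one repeated post.
seen = set()
batch = [{"id": 1, "text": "a"}, {"id": 2, "text": "b"}, {"id": 1, "text": "a"}]
fresh = list(deduplicate(batch, seen))
```

In a collector loop, `limiter.wait()` would be called before each API request and `deduplicate` applied to each response before producing to the station.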
Memphis’ Station
Analyzer
Consumer
Preprocessing
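A typical preprocessing step for short social posts, before topic modeling, looks roughly like the sketch below. The slides do not list the exact steps used, so the stopword list, the minimum token length, and the choice to drop URLs and mentions are all assumptions.

```python
import re

# Small assumed stopword list; a real pipeline would use a fuller one.
STOPWORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "on", "for",
    "is", "are", "was", "with", "at", "by", "it", "this", "that",
}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
TOKEN_RE = re.compile(r"[a-z]+")

def preprocess(text, min_len=3):
    """Lowercase, drop URLs and @mentions, tokenize, and filter short or stop words."""
    text = URL_RE.sub(" ", text.lower())
    text = MENTION_RE.sub(" ", text)
    return [
        token
        for token in TOKEN_RE.findall(text)
        if len(token) >= min_len and token not in STOPWORDS
    ]

# Hypothetical post text.
tokens = preprocess("RT @goal: Messi scores AGAIN in the final! https://example.com/x #football")
# tokens == ["messi", "scores", "again", "final", "football"]
```

Hashtag text is kept (only the `#` is dropped) because hashtags often carry the topic of a post.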
Run Model
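To make the model step concrete, here is a from-scratch sketch of LDA fitted with collapsed Gibbs sampling. It is not the talk's code: in practice a library such as gensim or scikit-learn would normally be used, and the hyperparameters `alpha` and `beta`, the iteration count, and the toy documents are assumptions for illustration.

```python
import random
from collections import Counter

def lda_gibbs(docs, num_topics, iterations=100, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; `docs` is a list of token lists."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]          # tokens of doc d in topic k
    topic_word = [Counter() for _ in range(num_topics)]   # count of word w in topic k
    topic_total = [0] * num_topics                        # tokens assigned to topic k
    assignments = []

    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic, topic_word

def top_words(topic_word, n=3):
    """Return the n most frequent words of each topic."""
    return [[w for w, _ in counts.most_common(n)] for counts in topic_word]

# Toy, hypothetical documents (already preprocessed into tokens).
docs = [
    ["messi", "goal", "final"],
    ["transfer", "fee", "club"],
    ["goal", "final", "penalty"],
    ["club", "transfer", "contract"],
]
doc_topic, topic_word = lda_gibbs(docs, num_topics=2)
topics = top_words(topic_word)
```

With so few tokens the resulting topics are not meaningful; the sketch only shows the mechanics of assigning every token to a topic and reading off the top words per topic.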
Upload To BigQuery
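Before an upload, the model output has to be flattened into rows. The sketch below only prepares newline-delimited JSON, a format BigQuery load jobs accept; it does not perform the upload itself. The column names (`run_at`, `topic_id`, `rank`, `word`) are hypothetical, since the slides do not show the table schema.

```python
import json
from datetime import datetime, timezone

def to_rows(topics, run_at=None):
    """Flatten a list of per-topic word lists into one row per (topic, word)."""
    run_at = run_at or datetime.now(timezone.utc)
    return [
        {"run_at": run_at.isoformat(), "topic_id": topic_id, "rank": rank, "word": word}
        for topic_id, words in enumerate(topics)
        for rank, word in enumerate(words, start=1)
    ]

def to_ndjson(rows):
    """Serialize rows as newline-delimited JSON, one object per line."""
    return "\n".join(json.dumps(row) for row in rows)

# Hypothetical model output: top words for two topics.
rows = to_rows(
    [["goal", "final"], ["transfer"]],
    run_at=datetime(2023, 1, 1, tzinfo=timezone.utc),
)
payload = to_ndjson(rows)
```

Storing one row per word with a run timestamp keeps the table easy to query and chart from a visualization tool.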
Visualize
Enhancements
Collector
● Produce to different partitions based on source/ account
● Collect from different sources - Facebook, Reddit, Telegram
● Enforce schema to the stations
● Use storage tiering to enable batch processing (Apache Iceberg, Spark, ...)
Enhancements
Analyzer
● Consume from specific partition/ entire station/ communities
● Fine-tune the LDA model
● Use other models/ combinations
● Use GPT to show more “readable” output
Enhancements
Visualization
● Other visualization tools
● Topics over time (TOT) - Evolution
● Create metrics
Thank you!
sveta@memphis.dev
https://twitter.com/memphisveta
Source code: HERE!