Analysis of Online User Behaviour
for Art and Culture Events
Marco Brambilla Tahereh Arabghalizi Behnam Rahdari
Contacts: @marcobrambi, marco.brambilla@polimi.it, http://datascience.deib.polimi.it
UNIVERSITY OF
PITTSBURGH
Agenda
Context
Method
• Pre-processing
• Analysis: topics, user clustering, images
• Prediction of interests
Challenges & Conclusions
Context
• Role of social media in our life
• Social media for cultural and artistic events
• Behaviour and content
• Multi-disciplinary collaboration on social media analysis and
cultural heritage
• Collaboration: Politecnico di Milano, Musei di Brescia, University
of Pittsburgh
Research Questions
Topics of interest of visitors?
Categorization of users?
Demographics of visitors?
Engagement and online participation?
Relation between photos, time, location, text and the event?
Approach
Domain-specific pipeline to profile social media users
and content in cultural or art events
Case Study
The Floating Piers by Christo and Jeanne-Claude
Lake Iseo, Italy
June 2016
Case Study
• $17 million
• 220,000 floating blocks
• 1.5 million visitors in 16 days
Pre-processing
Data Extraction
• Using Instagram and Twitter APIs
• Extract relevant tweets/posts during the event
• Extract all relevant users
o Those who tweet/post directly
o Those who like, comment, retweet, etc.
• Extract all properties
o Textual: bio, tweet/post text, hashtag, etc.
o Quantitative: #followers, #followings, etc.
o Media: photos, metadata (geotag, …)
Collected Data (from June 10th to July 30th)
                Twitter (tweets)   Instagram (posts)
Tweets/Posts    14,062             30,256
Users           23,916             94,666
  Authors        7,724             16,681
  Reacting      16,197             77,985
Preprocessing
• Text normalization (NLP)
• Language identification and translation
• Gender detection
• Data cleansing
• Store clean and transformed data
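The text normalization step could look roughly like the sketch below. The slides do not say which rules were applied, so the choices here (dropping URLs and @mentions, keeping hashtag words, lowercasing) and the function name `normalize` are assumptions made for illustration only.

```python
import re

# Assumed cleaning rules; the deck does not specify the actual ones.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")
TOKEN_RE = re.compile(r"\w+")


def normalize(text):
    """Return (tokens, hashtags) for one tweet or post caption."""
    cleaned = URL_RE.sub(" ", text)            # drop links first
    hashtags = [h.lower() for h in HASHTAG_RE.findall(cleaned)]
    cleaned = MENTION_RE.sub(" ", cleaned)     # drop @user references
    cleaned = cleaned.replace("#", " ").lower()
    return TOKEN_RE.findall(cleaned), hashtags


tokens, hashtags = normalize(
    "Walking on WATER! #FloatingPiers @christo https://t.co/abc"
)
print(tokens)    # ['walking', 'on', 'water', 'floatingpiers']
print(hashtags)  # ['floatingpiers']
```

Keeping the hashtag word as an ordinary token while also returning it separately lets later steps treat hashtags either as text or as their own property.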
Time Distribution (Twitter)
Time series – Instagram vs. Twitter
Instagram Likes and Comments
Geographical Distribution (Instagram): Italy, Lombardy Region, Iseo Lake
Data Analysis Process
1. Document Term Matrix (DTM)
2. Topic Extraction
3. Dimension Reduction
4. Cluster Analysis and Validation
5. Prediction
6. Media Analysis
7. Content Network Analysis
Topics
Document-term Matrix
A matrix that describes the frequency of terms that
occur in a collection of documents
Documents   Terms
            Art   Travel   Italy   Design   …
Post 1      0     1        1       0
Post 2      1     2        0       1
Post 3      0     0        1       0
Post 4      1     1        3       1
…
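As a minimal sketch of how such a matrix can be built, the snippet below counts whitespace-separated terms per document using only the standard library. The four example posts are invented so that their counts match the table above; the function name `document_term_matrix` is hypothetical, and the columns come out in alphabetical order rather than the slide's order.

```python
from collections import Counter


def document_term_matrix(documents):
    """Return (terms, matrix); matrix[i][j] counts terms[j] in documents[i]."""
    counts = [Counter(doc.lower().split()) for doc in documents]
    terms = sorted(set().union(*counts))
    matrix = [[c[t] for t in terms] for c in counts]
    return terms, matrix


# Invented posts whose term counts reproduce the slide's example rows.
posts = [
    "travel italy",
    "art travel travel design",
    "italy",
    "art travel italy italy italy design",
]
terms, dtm = document_term_matrix(posts)
print(terms)
for row in dtm:
    print(row)
```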
Topic Extraction
Latent Dirichlet Allocation (LDA):
models each document as a mixture of topics, each with a probability
Input: Document Term Matrix
Outputs: Topics, Topic Probabilities Matrix
          Topic 1   Topic 2   Topic 3   Topic 4   Topic 5   Topic 6   …
Post 1    0.19      0.16      0.27      0.14      0.11      0.13
Post 2    0.31      0.18      0.21      0.08      0.10      0.12
Post 3    0.25      0.24      0.20      0.17      0.09      0.05
Post 4    0.19      0.32      0.22      0.10      0.07      0.10
…
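The snippet below does not run LDA itself; it only illustrates how the topic probabilities matrix can be read, using the six topic columns shown in the example above. Each row is a probability distribution over topics, and the largest entry gives the post's dominant topic. The helper name `dominant_topic` is hypothetical.

```python
# Rows copied from the example matrix (six topic columns shown on the slide).
topic_probabilities = {
    "Post 1": [0.19, 0.16, 0.27, 0.14, 0.11, 0.13],
    "Post 2": [0.31, 0.18, 0.21, 0.08, 0.10, 0.12],
    "Post 3": [0.25, 0.24, 0.20, 0.17, 0.09, 0.05],
    "Post 4": [0.19, 0.32, 0.22, 0.10, 0.07, 0.10],
}


def dominant_topic(probabilities):
    """Return (1-based topic number, probability) of the most likely topic."""
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return best + 1, probabilities[best]


for post, row in topic_probabilities.items():
    topic, p = dominant_topic(row)
    # Each row is a mixture, so its entries sum to 1 (up to rounding).
    print(f"{post}: sum={sum(row):.2f}, dominant topic={topic} (p={p})")
```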
Dimensionality Reduction
• Too many topics extracted with LDA
• Using Principal Component Analysis (PCA) to extract a smaller set
of linearly uncorrelated topics
[Plot: variance share and cumulative variance share per component, with the > 0.95 level marked]
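One common way to use a cumulative variance plot is to keep the smallest number of components whose cumulative variance share exceeds a threshold. The sketch below assumes that this is how the 0.95 mark is used; the variance shares are hypothetical values, not the study's, and `components_to_keep` is an illustrative name.

```python
from itertools import accumulate


def components_to_keep(variance_shares, threshold=0.95):
    """Smallest k whose top-k variance shares sum to more than threshold."""
    ordered = sorted(variance_shares, reverse=True)
    for k, cumulative in enumerate(accumulate(ordered), start=1):
        if cumulative > threshold:
            return k
    return len(ordered)


# Hypothetical per-component variance shares (they sum to 1).
shares = [0.40, 0.25, 0.15, 0.10, 0.06, 0.04]
print(components_to_keep(shares))  # 5
```

With these shares the cumulative values are roughly 0.40, 0.65, 0.80, 0.90 and 0.96, so five components are kept.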
User Clustering
Cluster Analysis
• Apply clustering algorithms over Topic Probabilities
Matrix to cluster users
• Multiple data slices
• Multiple algorithms
o K-means
o Hierarchical
o DBSCAN
[Figure: users plotted along the Topic 1, Topic 2 and Topic 3 axes]
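To make the clustering step concrete, here is a minimal K-means (Lloyd's algorithm) sketch over rows of a topic probabilities matrix. The six user vectors and the choice of initial centroids are hypothetical; a real analysis would more likely rely on a library implementation and several random restarts.

```python
import math


def kmeans(points, centroids, iterations=20):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for j, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                updated.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                updated.append(old)  # leave an empty cluster's centroid alone
        if updated == centroids:
            break  # converged
        centroids = updated
    return labels, centroids


# Hypothetical users described by three topic probabilities each.
users = [
    (0.8, 0.1, 0.1), (0.7, 0.2, 0.1),
    (0.1, 0.8, 0.1), (0.2, 0.7, 0.1),
    (0.1, 0.1, 0.8), (0.1, 0.2, 0.7),
]
labels, centroids = kmeans(users, [users[0], users[2], users[4]])
print(labels)  # [0, 0, 1, 1, 2, 2]
```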
Cluster Validity
• How to evaluate the “goodness” of the resulting
clusters?
• Validation Measures
– Internal: e.g. Silhouette Coefficient, Dunn’s Index,
Calinski-Harabasz index, etc.
– External: e.g. Entropy, Purity, Rand index, etc.
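As an illustration of an internal validation measure, the sketch below computes the mean Silhouette Coefficient with Euclidean distance. For each point, a is the mean distance to the other members of its own cluster, b is the smallest mean distance to any other cluster, and the score is (b - a) / max(a, b). The toy points are hypothetical, and scoring singleton clusters as 0 is a common convention assumed here.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    def mean_distance(p, members):
        return sum(math.dist(p, q) for q in members) / len(members)

    scores = []
    for p, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # assumed convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            mean_distance(p, members)
            for other, members in clusters.items()
            if other != label
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette_score(points, [0, 0, 1, 1])  # compact, well separated
bad = silhouette_score(points, [0, 1, 0, 1])   # clusters mixed together
print(round(good, 3), round(bad, 3))
```

Values close to 1 indicate compact, well-separated clusters; values below 0 suggest that points sit closer to another cluster than to their own.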
User Clustering
Cluster Labeling
Users’ Biography Word Clouds: Travel Lovers, Art Lovers, Internet & Tech Lovers
Word Network for Clusters
Hierarchical Clustering: Travel Lovers, Art Lovers, Tech Lovers
Impact of Demographics: Language, Gender
Prediction
Prediction
Predict the category or the interest area of potential new users for
similar cultural or art events in the future
Decision Trees
o Prepare Required Data
o Grow Decision Tree
o Extract rules from the tree
o Predict using test data
o Evaluate
Prediction rules
Extracted Rules
Rule 1: if (0.36 < Bio_score < 0.37 OR Bio_score < 0.35)
        then Travel Lover
Rule 2: if (0.35 < Bio_score < 0.36 AND Status_count > 14.5)
        OR (Bio_score > 0.37 AND Language != Italian)
        then Art Lover
Rule 3: if (Bio_score > 0.37 AND Language = Italian)
        then Tech Lover
Otherwise: Not Interested
Accuracy = 62%
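The extracted rules translate directly into a small classifier. The thresholds and labels below are taken from the rules as listed; the function names, the argument order, and the four labelled samples are hypothetical and serve only to show how predictions and accuracy could be computed on test data.

```python
def predict_interest(bio_score, status_count, language):
    """Apply the three extracted rules in order; fall back to Not Interested."""
    if 0.36 < bio_score < 0.37 or bio_score < 0.35:
        return "Travel Lover"
    if (0.35 < bio_score < 0.36 and status_count > 14.5) or (
        bio_score > 0.37 and language != "Italian"
    ):
        return "Art Lover"
    if bio_score > 0.37 and language == "Italian":
        return "Tech Lover"
    return "Not Interested"


def accuracy(samples):
    """Share of (bio_score, status_count, language, expected) rows predicted correctly."""
    correct = sum(
        predict_interest(bio, status, lang) == expected
        for bio, status, lang, expected in samples
    )
    return correct / len(samples)


# Hypothetical labelled test rows, not data from the study.
test_rows = [
    (0.30, 3, "Italian", "Travel Lover"),
    (0.355, 20, "English", "Art Lover"),
    (0.40, 5, "Italian", "Tech Lover"),
    (0.40, 5, "English", "Tech Lover"),  # rules predict Art Lover here
]
print(accuracy(test_rows))  # 0.75
```

Note that the rules leave gaps: a Bio_score of exactly 0.35, 0.36 or 0.37, or a score between 0.35 and 0.36 with Status_count of 14.5 or less, falls through to Not Interested.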
Decision Tree
Image Analysis
People in Pictures
Visitor Analytics
• Age
• Sex: 50.4% female, 49.6% male
• Race
Bias of the medium?
Object Extraction from Pictures
Objects in Pictures, Hashtags
Users tend not to report the actual content of the photos
in their textual descriptions or hashtags
Color Detection for Subject Identification
Main color shades among all photos
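The slides do not describe how the main color shades were obtained. One simple approach, sketched here purely as an assumption, is to quantize each RGB pixel into coarse bins and count the most frequent bins. The pixel values and the function name `dominant_shades` are hypothetical; in practice the pixels would come from an image library.

```python
from collections import Counter


def dominant_shades(pixels, levels=4, top=3):
    """Quantize RGB pixels into levels**3 bins; return the most common bins."""
    step = 256 // levels

    def quantize(rgb):
        # Map each channel to the centre of its bin.
        return tuple((c // step) * step + step // 2 for c in rgb)

    return Counter(quantize(p) for p in pixels).most_common(top)


# Hypothetical pixels: two yellowish, one bluish.
pixels = [(250, 200, 10), (255, 210, 0), (20, 30, 200)]
print(dominant_shades(pixels, top=1))  # [((224, 224, 32), 2)]
```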
Confusion Matrix
Simple techniques “good enough”?
Objects or Colors?
Ongoing Challenges
Future Challenges of Knowledge Extraction (KE)
Determining exact positioning based on perspective
Future Challenges of Knowledge Extraction (KE)
Network structures and their temporal evolution
[Plots: max graph perturbation; daily graph variations]
Future Challenges
Real cross-disciplinarity (cultural heritage, humanities, social science)
No one at the cultural part of the
event!
Exhibit--->
Conclusions
• (Sometimes) Simple methods work just fine
• Interesting profiling and behaviour detection
• Still far from cross-disciplinary approaches
• How much: human vs. machine?
• Data Visualization and Crowdsourcing
Contacts: Marco Brambilla, @marcobrambi, marco.brambilla@polimi.it
http://datascience.deib.polimi.it
http://www.marco-brambilla.com
Analysis and knowledge extraction of user behaviour and social media content for art culture events