For centuries, science (in German "Wissenschaft") has aimed to create ("schaften") new knowledge ("Wissen") from the observation of physical phenomena, their modelling, and empirical validation. Recently, a new source of knowledge has emerged: not (only) the physical world any more, but the virtual world, namely the Web with its ever-growing stream of data materialized in the form of social network chattering, content produced on demand by crowds of people, messages exchanged among interlinked devices in the Internet of Things. The knowledge we may find there can be dispersed, informal, contradicting, unsubstantiated and ephemeral today, while already tomorrow it may be commonly accepted. The challenge is once again to capture and create knowledge that is new, has not been formalized yet in existing knowledge bases, and is buried inside a big, moving target (the live stream of online data). The myth is that existing tools (spanning fields like semantic web, machine learning, statistics, NLP, and so on) suffice to the objective. While this may still be far from true, some existing approaches are actually addressing the problem and provide preliminary insights into the possibilities that successful attempts may lead to.
The talk explores the mixed realistic-utopian domain of knowledge extraction and reports on some tools and cases where digital and physical world have brought together for better understanding our society.
On the Quest for Changing Knowledge. Capturing emerging entities from social ...Marco Brambilla
Massive data integration technologies have been recently used to produce very large ontologies. However, knowledge in the world continuously evolves, and ontologies are largely incomplete for what concerns low-frequency data, belonging to the so-called long tail.
Socially produced content is an excellent source for discovering emerging knowledge: it is huge, and immediately
reflects the relevant changes which hide emerging entities. Thus, we propose a method for
discovering emerging entities by extracting them from social content.
We start from a purely-syntactic method as a baseline, and we propose a semantics-based method based on entity recognition and DBpedia. The method associates candidate entities to feature vectors, built
from social content by using co-occurrence, and then extracts the emerging entities by using feature similarity measures.
Once instrumented by experts through very simple initialization, the method is capable of finding emerging
entities and extracting their relevant relationships to given types; the method can be
continuously or periodically iterated, using the already identified emerged knowledge as new starting point.
We validate our method by applying it to a set of diverse domain-specific application scenarios, spanning fashion, literature, exhibitions and so on. We show the approach at work and we demonstrate its effectiveness on datasets with different characterization in terms of coverage, dynamics and size.
Chcete vědět víc? Mnoho dalších prezentací, videí z konferencí, fotografií i jiných dokumentů je k dispozici v institucionálním repozitáři NTK: http://repozitar.techlib.cz
Would you like to know more? Find presentations, reports, conference videos, photos and much more in our institutional repository at: http://repozitar.techlib.cz/?ln=en
MIT Program on Information Science Talk -- Julia Flanders on Jobs, Roles, Ski...Micah Altman
Julia Flanders, who is the Director of the Digital Scholarship Group in the Northeastern University Library, and a Professor of Practice in Northeastern's English Department gave a talk on Jobs, Roles, Skills, Tools: Working in the Digital Academy as part of the Program on Information Science Brown Bag Series.
In the talk, illustrated by the slides below, Julia discusses the evolving landscape of digital humanities (and digital scholarship more broadly) and considers the relationship between technology, tool development, and professional roles.
For more see: http://informatics.mit.edu/event/brown-bag-jobs-roles-skills-tools-working-digital-academy-julia-flanders
SP1: Exploratory Network Analysis with GephiJohn Breslin
ICWSM 2011 Tutorial
Sebastien Heymann and Julian Bilcke
Gephi is an interactive visualization and exploration software for all kinds of networks and relational data: online social networks, emails, communication and financial networks, but also semantic networks, inter-organizational networks and more. Designed to make data navigation and manipulation easy, it aims to fulfill the complete chain from data importing to aesthetics refinements and interaction. Users interact with the visualization and manipulate structures, shapes and colors to reveal hidden properties. The goal is to help data analysts to make hypotheses, intuitively discover patterns or errors in large data collections.
In this tutorial we will provide a hands-on demonstration of the essential functionalities of Gephi, based on a real case scenario: the exploration of student networks from the "Facebook100" dataset (Social Structure of Facebook Networks, Amanda L. Traud et al, 2011). The participants will be guided step by step through the complete chain of representation, manipulation, layout, analysis and aesthetics refinements. Particular focus will be put on filters and metrics for the creation of their first visualizations. They will be incited to compare the hypotheses suggested by their own exploration to the results actually published in the academic paper afterwards. They finally will walk away with the practical knowledge enabling them to use Gephi for their own projects. The tutorial is intended for professionals, researchers and graduates who wish to learn how playing during a network exploration can speed up their studies.
Sébastien Heymann is a Ph.D. Candidate in Computer Science at Université Pierre et Marie Curie, France. His research at the ComplexNetworks team focuses on the dynamics of realworld networks. He leads the Gephi project since 2008, and is the administrator of the Gephi Consortium.
Julian Bilcke is a Software Engineer at ISC-PIF (Complex Systems Institute of Paris, France). He is a founder and a developer for the Gephi project since 2008.
On the Quest for Changing Knowledge. Capturing emerging entities from social ...Marco Brambilla
Massive data integration technologies have been recently used to produce very large ontologies. However, knowledge in the world continuously evolves, and ontologies are largely incomplete for what concerns low-frequency data, belonging to the so-called long tail.
Socially produced content is an excellent source for discovering emerging knowledge: it is huge, and immediately
reflects the relevant changes which hide emerging entities. Thus, we propose a method for
discovering emerging entities by extracting them from social content.
We start from a purely-syntactic method as a baseline, and we propose a semantics-based method based on entity recognition and DBpedia. The method associates candidate entities to feature vectors, built
from social content by using co-occurrence, and then extracts the emerging entities by using feature similarity measures.
Once instrumented by experts through very simple initialization, the method is capable of finding emerging
entities and extracting their relevant relationships to given types; the method can be
continuously or periodically iterated, using the already identified emerged knowledge as new starting point.
We validate our method by applying it to a set of diverse domain-specific application scenarios, spanning fashion, literature, exhibitions and so on. We show the approach at work and we demonstrate its effectiveness on datasets with different characterization in terms of coverage, dynamics and size.
Chcete vědět víc? Mnoho dalších prezentací, videí z konferencí, fotografií i jiných dokumentů je k dispozici v institucionálním repozitáři NTK: http://repozitar.techlib.cz
Would you like to know more? Find presentations, reports, conference videos, photos and much more in our institutional repository at: http://repozitar.techlib.cz/?ln=en
MIT Program on Information Science Talk -- Julia Flanders on Jobs, Roles, Ski...Micah Altman
Julia Flanders, who is the Director of the Digital Scholarship Group in the Northeastern University Library, and a Professor of Practice in Northeastern's English Department gave a talk on Jobs, Roles, Skills, Tools: Working in the Digital Academy as part of the Program on Information Science Brown Bag Series.
In the talk, illustrated by the slides below, Julia discusses the evolving landscape of digital humanities (and digital scholarship more broadly) and considers the relationship between technology, tool development, and professional roles.
For more see: http://informatics.mit.edu/event/brown-bag-jobs-roles-skills-tools-working-digital-academy-julia-flanders
SP1: Exploratory Network Analysis with GephiJohn Breslin
ICWSM 2011 Tutorial
Sebastien Heymann and Julian Bilcke
Gephi is an interactive visualization and exploration software for all kinds of networks and relational data: online social networks, emails, communication and financial networks, but also semantic networks, inter-organizational networks and more. Designed to make data navigation and manipulation easy, it aims to fulfill the complete chain from data importing to aesthetics refinements and interaction. Users interact with the visualization and manipulate structures, shapes and colors to reveal hidden properties. The goal is to help data analysts to make hypotheses, intuitively discover patterns or errors in large data collections.
In this tutorial we will provide a hands-on demonstration of the essential functionalities of Gephi, based on a real case scenario: the exploration of student networks from the "Facebook100" dataset (Social Structure of Facebook Networks, Amanda L. Traud et al, 2011). The participants will be guided step by step through the complete chain of representation, manipulation, layout, analysis and aesthetics refinements. Particular focus will be put on filters and metrics for the creation of their first visualizations. They will be incited to compare the hypotheses suggested by their own exploration to the results actually published in the academic paper afterwards. They finally will walk away with the practical knowledge enabling them to use Gephi for their own projects. The tutorial is intended for professionals, researchers and graduates who wish to learn how playing during a network exploration can speed up their studies.
Sébastien Heymann is a Ph.D. Candidate in Computer Science at Université Pierre et Marie Curie, France. His research at the ComplexNetworks team focuses on the dynamics of realworld networks. He leads the Gephi project since 2008, and is the administrator of the Gephi Consortium.
Julian Bilcke is a Software Engineer at ISC-PIF (Complex Systems Institute of Paris, France). He is a founder and a developer for the Gephi project since 2008.
Exploring the Bi-verse.A trip across the digital and physical ecospheresMarco Brambilla
The Web and social media are the environments where people post their content, opinions, activities, and resources. Therefore, a considerable amount of user-generated content is produced every day for a wide variety of purposes. On the other side, people live their everyday life immersed in the physical world, where society, economy, politics and personal relations continuously evolve. These two opposite and complementary environment are today fully integrated: they reflect each other and they interact with each other in a stronger and stronger way.
Exploring and studying content and data coming from both environments offers a great opportunity to understand the ever evolving modern society, in terms of topics of interest, events, relations, and behaviour.
In this speech I will discuss through business cases and socio-political scenarios how we can extract insights and understand reality by combining and analyzing data from the digital and physical world, so as to reach a better overall picture of reality itself. Along this path, we need to keep into account that reality is complex and varies in time, space and along many other dimensions, including societal and economic variables. The speech highlights the main challenges that need to be addressed and outlines some data science strategies that can be applied to tackle these specific challenges.
This slide deck has been presented as a keynote speech at WISE 2022 in Biarritz, France.
Joy Mountford at BayCHI: Visualizations of Our Collective LivesBayCHI
The lines between art, design, and information are dissolving as we experience new places and objects. Consider, for example, the organic flow of air traffic over North America at daybreak, the bursts of search query memes spreading around the globe, and the pointillist surge of mobile phone usage on New Year's Eve. Using the new techniques of generative data visualization, a new generation of artist/designers/engineer/scientists are creating gorgeous, dynamic experiences driven by massive sets of data about our own lives. Their work comes to life in architectural spaces, on walls of wood and metal and light and shimmering glass clouds suspended overhead. Of course it must be touched to be appreciated and engaged with, simple gestures launch a thousand images and possibilities. Many of these projects have received international recognition. They are primarily 3D applications that can run in real time, but really can only be appreciated by watching them, as movies. These data movies aim to make information easier to understand while being enjoyable to watch. Surprising insights surface through looking at our 'data life' in new ways, and may compel us to design in different, even better ways.
Online text data for machine learning, data science, and research - Who can p...Fredrik Olsson
This slide deck concerns online text data for machine learning, artificial intelligence, data science, and scientific research. After this talk, you’ll know who can provide online text data, what types of data are hard to get, and principal data hygiene factors.
Updated in August 2019.
It has been said that Mobiles +Cloud + Social + Big Data = Better Run The World. IBM has invested over $20 billion since 2005 to grow its analytics business, many companies will invest more than $120 billion by 2015 on analytics, hardware, software and services critical in almost every industry like ; Healthcare, media, sports, finance, government, etc.
It has been estimated that there is a shortage of 140,000 – 190,000 people with deep analytical skills to fill the demand of jobs in the U.S. by 2018.
Decoding the human genome originally took 10 years to process; now it can be achieved in one week with the power of Analytic and BI (Business Intelligence). This lecture’s Key Messages is that Analytics provide a competitive edge to individuals , companies and institutions and that Analytics and BI are often critical to the success of any organization.
Methodology used is to teach analytic techniques through real world examples and real data with this goal to convince audience of the Analytics Edge and power of BI, and inspire them to use analytics and BI in their career and their life.
Geospatial Intelligence Middle East 2013_Big Data_Steven RamageSteven Ramage
Some initial considerations and discussion points around geospatial big data. Location adds context and relevance. Need to consider a number of V factors including Value.
Abstract:
Big Data concern large-volume, complex, growing data sets with multiple, autonomous sources. With the fast development of networking, data storage, and the data collection capacity, Big Data are now rapidly expanding in all science and engineering domains, including physical, biological and biomedical sciences. This paper presents a HACE theorem that characterizes the features of the Big Data revolution, and proposes a Big Data processing model, from the data mining perspective. This data-driven model involves demand-driven aggregation of information sources, mining and analysis, user interest modeling, and security and privacy considerations. We analyze the challenging issues in the data-driven model and also in the Big Data revolution.
Sirris innovate2011 - Smart Products with smart data - introduction, Dr. Elen...Sirris
This lecture highlights current trends, challenges and opportunities related to the emergence of large amounts of data. It also presents Sirris’s recent research activities in this domain.
Researching Social Media – Big Data and Social Media AnalysisFarida Vis
Researching Social Media – Big Data and Social Media Analysis, presentation for the Social Media for Researchers: A Sheffield Universities Social Media Symposium, 23 September 2014
Tutorial for ACM Multimedia 2016, given together with Gerald Friedland, with contributions from Julia Bernd and Yiannis Kompatsiaris. The presentation covered an introduction to the problem of disclosing personal information through multimedia sharing, the associated security risks, methods for conducting multimodla inferences and technical frameworks that could help alleviate such risks.
Big Data, Big Opportunity: Making Sense of Big Data for PRCision
Big Data creates big brand-building opportunities. Do you know how to use it effectively? It’s as easy as tapping into social media intelligence and doesn’t require an additional investment in IT resources.
Heidi Sullivan of Cision and Mike Maziarka of Visible Technologies explore the basics of Big Data and how to leverage it to positively impact your brand.
Exploring the Bi-verse.A trip across the digital and physical ecospheresMarco Brambilla
The Web and social media are the environments where people post their content, opinions, activities, and resources. Therefore, a considerable amount of user-generated content is produced every day for a wide variety of purposes. On the other side, people live their everyday life immersed in the physical world, where society, economy, politics and personal relations continuously evolve. These two opposite and complementary environment are today fully integrated: they reflect each other and they interact with each other in a stronger and stronger way.
Exploring and studying content and data coming from both environments offers a great opportunity to understand the ever evolving modern society, in terms of topics of interest, events, relations, and behaviour.
In this speech I will discuss through business cases and socio-political scenarios how we can extract insights and understand reality by combining and analyzing data from the digital and physical world, so as to reach a better overall picture of reality itself. Along this path, we need to keep into account that reality is complex and varies in time, space and along many other dimensions, including societal and economic variables. The speech highlights the main challenges that need to be addressed and outlines some data science strategies that can be applied to tackle these specific challenges.
This slide deck has been presented as a keynote speech at WISE 2022 in Biarritz, France.
Joy Mountford at BayCHI: Visualizations of Our Collective LivesBayCHI
The lines between art, design, and information are dissolving as we experience new places and objects. Consider, for example, the organic flow of air traffic over North America at daybreak, the bursts of search query memes spreading around the globe, and the pointillist surge of mobile phone usage on New Year's Eve. Using the new techniques of generative data visualization, a new generation of artist/designers/engineer/scientists are creating gorgeous, dynamic experiences driven by massive sets of data about our own lives. Their work comes to life in architectural spaces, on walls of wood and metal and light and shimmering glass clouds suspended overhead. Of course it must be touched to be appreciated and engaged with, simple gestures launch a thousand images and possibilities. Many of these projects have received international recognition. They are primarily 3D applications that can run in real time, but really can only be appreciated by watching them, as movies. These data movies aim to make information easier to understand while being enjoyable to watch. Surprising insights surface through looking at our 'data life' in new ways, and may compel us to design in different, even better ways.
Online text data for machine learning, data science, and research - Who can p...Fredrik Olsson
This slide deck concerns online text data for machine learning, artificial intelligence, data science, and scientific research. After this talk, you’ll know who can provide online text data, what types of data are hard to get, and principal data hygiene factors.
Updated in August 2019.
It has been said that Mobiles +Cloud + Social + Big Data = Better Run The World. IBM has invested over $20 billion since 2005 to grow its analytics business, many companies will invest more than $120 billion by 2015 on analytics, hardware, software and services critical in almost every industry like ; Healthcare, media, sports, finance, government, etc.
It has been estimated that there is a shortage of 140,000 – 190,000 people with deep analytical skills to fill the demand of jobs in the U.S. by 2018.
Decoding the human genome originally took 10 years to process; now it can be achieved in one week with the power of Analytic and BI (Business Intelligence). This lecture’s Key Messages is that Analytics provide a competitive edge to individuals , companies and institutions and that Analytics and BI are often critical to the success of any organization.
Methodology used is to teach analytic techniques through real world examples and real data with this goal to convince audience of the Analytics Edge and power of BI, and inspire them to use analytics and BI in their career and their life.
Geospatial Intelligence Middle East 2013_Big Data_Steven RamageSteven Ramage
Some initial considerations and discussion points around geospatial big data. Location adds context and relevance. Need to consider a number of V factors including Value.
Abstract:
Big Data concern large-volume, complex, growing data sets with multiple, autonomous sources. With the fast development of networking, data storage, and the data collection capacity, Big Data are now rapidly expanding in all science and engineering domains, including physical, biological and biomedical sciences. This paper presents a HACE theorem that characterizes the features of the Big Data revolution, and proposes a Big Data processing model, from the data mining perspective. This data-driven model involves demand-driven aggregation of information sources, mining and analysis, user interest modeling, and security and privacy considerations. We analyze the challenging issues in the data-driven model and also in the Big Data revolution.
Sirris innovate2011 - Smart Products with smart data - introduction, Dr. Elen...Sirris
This lecture highlights current trends, challenges and opportunities related to the emergence of large amounts of data. It also presents Sirris’s recent research activities in this domain.
Researching Social Media – Big Data and Social Media AnalysisFarida Vis
Researching Social Media – Big Data and Social Media Analysis, presentation for the Social Media for Researchers: A Sheffield Universities Social Media Symposium, 23 September 2014
Tutorial for ACM Multimedia 2016, given together with Gerald Friedland, with contributions from Julia Bernd and Yiannis Kompatsiaris. The presentation covered an introduction to the problem of disclosing personal information through multimedia sharing, the associated security risks, methods for conducting multimodla inferences and technical frameworks that could help alleviate such risks.
Big Data, Big Opportunity: Making Sense of Big Data for PRCision
Big Data creates big brand-building opportunities. Do you know how to use it effectively? It’s as easy as tapping into social media intelligence and doesn’t require an additional investment in IT resources.
Heidi Sullivan of Cision and Mike Maziarka of Visible Technologies explore the basics of Big Data and how to leverage it to positively impact your brand.
Similar to Myths and challenges in knowledge extraction and analysis from human-generated content (20)
Hierarchical Transformers for User Semantic Similarity - ICWE 2023Marco Brambilla
We discuss the use of hierarchical transformers for user semantic similarity in the context of analyzing users' behavior and profiling social media users. The objectives of the research include finding the best model for computing semantic user similarity, exploring the use of transformer-based models, and evaluating whether the embeddings reflect the desired similarity concept and can be used for other tasks.
We use a large dataset of Twitter users and apply an automatic labeling approach. The dataset consists of English tweets posted in November and December 2020, totaling about 27GB of compressed data. Preprocessing steps include filtering out short texts, cleaning user connections, and selecting a benchmark set of users for evaluation.
Since Transformer architectures are known to work well on short text, we cannot use them on extensive collections of tweets describing the activity of a user. Therefore, we propose a hierarchical structure of transformer models to be used first on tweets and then on their aggregations.
The models used in the study include hierarchical transformers, and the tweet embeddings are obtained using four Transformer-based models: RoBERTa2, BERTweet3, Sentence BERT4, and Twitter4SSE5. The researchers test different techniques for processing tweet embeddings to generate accurate user embeddings, including mean pooling, recurrence over BERT (RoBERT), and transformer over BERT (ToBERT).
The evaluation of the models is done on a set of 5,000 users, comparing user similarities with 30 other candidate users, 5 of which are considered similar and 25 considered dissimilar. The evaluation metrics used include mean average precision (MAP), mean reciprocal rank (MRR) at 10, and normalized discounted cumulative gain (nDCG).
The optimization process involves selecting a loss function and using the AdamW optimizer with specific hyperparameters. The results show that the hierarchical approach with a Stage-1 Twitter4SSE model and a Stage-2 Transformer model performs the best among the alternatives.
In conclusion, the research provides a large unbiased dataset for user similarity analysis, presents a hierarchical language model optimized for accurate user similarity computation, and validates the models' performance on similarity tasks, with potential applications to related problems.
The future work includes investigating the impact of time and topic drift on the models' performance.
In online social media platforms, users can express their ideas by posting original content or by adding comments and responses to existing posts, thus generating virtual discussions and conversations. Studying these conversations is essential for understanding the online communication behavior of users. This study proposes a novel approach to retrieve popular patterns on online conversations using network-based analysis. The analysis consists of two main stages: intent analysis and network generation. Users’ intention is detected using keyword-based categorization of posts and comments, integrated with classification through Naïve Bayes and Support Vector Machine algorithms for uncategorized comments. A continuous human-in-the-loop approach further improves the keyword-based classification. To build and understand communication patterns among the users, we build conversation graphs starting from the hierarchical structure of posts and comments, using a directed multigraph network. The experiments categorize 90% comments with 98% accuracy on a real social media dataset. The model then identifies relevant patterns in terms of shape and content; and finally determines the relevance and frequency of the patterns. Results show that the most popular online discussion patterns obtained from conversation graphs resemble real-life interactions and communication.
Trigger.eu: Cocteau game for policy making - introduction and demoMarco Brambilla
COCTEAU stands for "Co-Creating the European Union".
It's a project supported by the European Union whose objective is to involve citizens to cooperate alongside policy makers, contributing to build a better future.
Generation of Realistic Navigation Paths for Web Site Testing using RNNs and ...Marco Brambilla
A large audience of users and typically a long time frame are needed to produce sensible and useful log data, making it an expensive task.
To address this limit, we propose a method that focuses on the generation of REALISTIC NAVIGATIONAL PATHS, i.e., web logs .
Our approach is extremely relevant because it can at the same time tackle the problem of lack of publicly available data about web navigation logs, and also be adopted in industry for AUTOMATIC GENERATION OF REALISTIC TEST SETTINGS of Web sites yet to be deployed.
The generation has been implemented using deep learning methods for generating more realistic navigation activities, namely
Recurrent Neural Network, which are very well suited to temporally evolving data
Generative Adversarial Network: neural networks aimed at generating new data, such as images or text, very similar to the original ones and sometimes indistinguishable from them, that have become increasingly popular in recent years.
We run experiments using open data sets of weblogs as training, and we run tests for assessing the performance of the methods. Results in generating new weblog data are quite good with respect to the two evaluation metrics adopted (BLEU and Human evaluation).
Our study is described in detail in the paper published at ICWE 2020 – International Conference on Web Engineering with DOI: 10.1007/978-3-030-50578-3. It’s available online on the Springer Web site.
Analyzing rich club behavior in open source projectsMarco Brambilla
The network of collaborations in an open source project can reveal relevant emergent properties that influence its prospects of success.
In this work, we analyze open source projects to determine whether they exhibit a rich-club behavior, i.e., a phenomenon where contributors with a high number of collaborations (i.e., strongly connected within the collaboration network)
are likely to cooperate with other well-connected individuals. The presence or absence of a rich-club has an impact on the sustainability and robustness of the project.
For this analysis, we build and study a dataset with the 100 most popular projects in GitHub, exploiting connectivity patterns in the graph structure of collaborations that arise from commits, issues and pull requests. Results show that rich-club behavior is present in all the projects, but only few of them have an evident club structure. We compute coefficients both for single source graphs and the overall interaction graph, showing that rich-club behavior varies across different layers of software development. We provide possible explanations of our results, as well as implications for further analysis.
Analysis of On-line Debate on Long-Running Political Phenomena.The Brexit C...Marco Brambilla
In this study, we demonstrate that the computational social science is important to understand people behavior in political phenomena, and based on the long-running Brexit debate analysis on Twitter, we predict the public stance, discussion topics, and we measure the involvement of automated accounts and politicians’ social media accounts.
Community analysis using graph representation learning on social networksMarco Brambilla
In a world more and more connected, new and complex interaction
patterns can be extracted in the communication between people.
This is extremely valuable for brands that can better understand
the interests of users and the trends on social media to better target
their products. In this paper, we aim to analyze the communities
that arise around commercial brands on social networks to understand
the meaning of similarity, collaboration, and interaction
among users.We exploit the network that builds around the brands
by encoding it into a graph model.We build a social network graph,
considering user nodes and friendship relations; then we compare
it with a heterogeneous graph model, where also posts and hashtags
are considered as nodes and connected to the different node
types; we finally build also a reduced network, generated by inducing
direct user-to-user connections through the intermediate
nodes (posts and hashtags). These different variants are encoded
using graph representation learning, which generates a numerical
vector for each node. Machine learning techniques are applied to
these vectors to extract valuable insights for each user and for the
communities they belong to. In the paper, we report on our experiments
performed on an emerging fashion brand on Instagram, and
we show that our approach is able to discriminate potential customers
for the brand, and to highlight meaningful sub-communities
composed by users that share the same kind of content on social
networks.
Data Cleaning for social media knowledge extractionMarco Brambilla
Social media platforms let users share their opinions through textual or multimedia content. In many settings, this becomes a valuable source of knowledge that can be exploited for specific business objectives. Brands and companies often ask to monitor social media as sources for understanding the stance, opinion, and sentiment of their customers, audience and potential audience. This is crucial for them because it let them understand the trends and future commercial and marketing opportunities.
However, all this relies on a solid and reliable data collection phase, that grants that all the analyses, extractions and predictions are applied on clean, solid and focused data. Indeed, the typical topic-based collection of social media content performed through keyword-based search typically entails very noisy results.
We recently implemented a simple study aiming at cleaning the data collected from social content, within specific domains or related to given topics of interest. We propose a basic method for data cleaning and removal of off-topic content based on supervised machine learning techniques, i.e. classification, over data collected from social media platforms based on keywords regarding a specific topic. We define a general method for this and then we validate it through an experiment of data extraction from Twitter, with respect to a set of famous cultural institutions in Italy, including theaters, museums, and other venues.
For this case, we collaborated with domain experts to label the dataset, and then we evaluated and compared the performance of classifiers that are trained with different feature extraction strategies.
Iterative knowledge extraction from social networks. The Web Conference 2018Marco Brambilla
Knowledge in the world continuously evolves, and ontologies are largely incomplete, especially regarding data belonging to the so-called long tail. We propose a method for discovering emerging knowledge by extracting it from social content. Once initialized by domain experts, the method is capable of finding relevant entities by means of a mixed syntactic-semantic method. The method uses seeds, i.e. prototypes of emerging entities provided by experts, for generating candidates; then, it associates candidates to feature vectors built by using terms occurring in their social content and ranks the candidates by using their distance from the centroid of seeds, returning the top candidates. Our method can run iteratively, using the results as new seeds.
In this paper we address the following research questions: (1) How does the reconstructed domain knowledge evolve if the candidates of one extraction are recursively used as seeds (2) How does the reconstructed domain knowledge spread geographically (3) Can the method be used to inspect the past, present, and future of knowledge (4) Can the method be used to find emerging knowledge?.
This work was presented at The Web Conference 2018, MSM workshop.
Driving Style and Behavior Analysis based on Trip Segmentation over GPS Info...Marco Brambilla
Over one billion cars interact with each other on the road every day. Each driver has his own driving style, which could impact safety, fuel economy and road congestion. Knowledge about the driving style of the driver could be used to encourage ``better" driving behaviour through immediate feedback
while driving, or by scaling auto insurance rates based on the aggressiveness of the driving style.
In this work we report on our study of driving behaviour profiling based on unsupervised data mining methods. The main goal is to detect the different driving behaviours, and thus to cluster drivers with similar behaviour.
This paves the way to new business models related to the driving sector, such as Pay-How-You-Drive insurance
policies and car rentals.
Driver behavioral characteristics are studied by collecting information from GPS sensors on the cars and by applying three different analysis approaches (DP-means, Hidden Markov Models, and Behavioural Topic Extraction) to the contextual scene detection problems on car trips, in order to detect different
behaviour along each trip. Subsequently, drivers are clustered in similar profiles based on that and the results are compared with a human-defined groundtruth on drivers classification. The proposed framework is tested on a real dataset containing sampled car signals. While the different approaches show relevant differences in trip segment classification, the coherence of the final driver clustering results is surprisingly high.
Harvesting Knowledge from Social Networks: Extracting Typed Relationships amo...Marco Brambilla
Knowledge bases like DBpedia, Yago or Google's Knowledge
Graph contain huge amounts of ontological knowledge harvested from
(semi-)structured, curated data sources, such as relational databases or
XML and HTML documents. Yet, the Web is full of knowledge that is
not curated and/or structured and, hence, not easily indexed, for ex-
ample social data. Most work so far in this context has been dedicated
to the extraction of entities, i.e., people, things or concepts. This poster
describes our work toward the extraction of relationships among entities.
The objective is reconstructing a typed graph of entities and relation-
ships to represent the knowledge contained in social data, without the
need for a-priori domain knowledge. The experiments with real datasets
show promising performance across a variety of domains.
The key distinguishing
feature of the work is its focus on highly unstructured social data (tweets and
Facebook posts) without reliable grammar structures. Traditional relation extraction approaches supervised , semi-supervised or unsupervised,
commonly assume the availability of grammatically correct language corpora.
Model-driven Development of User Interfaces for IoT via Domain-specific Comp...Marco Brambilla
Internet of Things technologies and applications are evolving and continuously gaining traction in all fields and environments, including homes, cities, services, industry and commercial enterprises. However, still many problems need to be addressed. For instance, the
IoT vision is mainly focused on the technological and infrastructure aspect, and on the management and analysis of the huge amount of generated data, while so far the development of front-end and user interfaces for
IoT has not played a relevant role in research. On the contrary, user interfaces in the IoT ecosystem they can play a key role in the acceptance of solutions by final adopters. In this paper we present a model-driven approach to the design of IoT interfaces, by defining a specific visual design language and design patterns for IoT\ applications, and we show them at work. The language we propose is defined as an extension of the OMG standard language called IFML.
A Model-Based Method for Seamless Web and Mobile Experience. Splash 2016 conf.Marco Brambilla
Consumer-centered software applications nowadays are required
to be available both as mobile and desktop versions.
However, the app design is frequently made only for one of
the two (i.e., mobile first or web first) while missing an appropriate
design for the other (which, in turn, simply mimics
the interaction of the first one). This results into poor quality
of the interaction on one or the other platform. Current solutions
would require different designs, to be realized through
different design methods and tools, and that may require to
double development and maintenance costs.
In order to mitigate such an issue, this paper proposes a
novel approach that supports the design of both web and mobile
applications at once. Starting from a unique requirement
and business specification, where web– and mobile–specific
aspects are captured through tagging, we derive a platform independent
design of the system specified in IFML. This
model is subsequently refined and detailed for the two platforms,
and used to automatically generate both the web and
mobile versions. If more precise interactions are needed for
the mobile part, a blending with MobML, a mobile-specific
modeling language, is devised. Full traceability of the relations
between artifacts is granted.
The Web Science course focuses on the study of large-scale socio-technical systems associated with the World Wide Web. It considers the relationship between people and technology, the ways that society and technology complement one another and the way they impact on broader society. These analyses are inherently associated with Big Data management issues.
The course is organised in four parts.
1. Syntax
In the first part, the course introduces the basis of content analysis. If focuses on the syntactic aspects, covering the fundamentals of natural language processing and text mining. It describes the structure and typical characteristics of the different web sources, spanning search results, social media contents, social network structures, Web APIs, and so on. It also provides an overview of the basic Web analysis techniques applied in Web search and Web recommendation.
2. Semantics
In the second part, the course presents semantic technologies. These technologies are very important nowadays because they allow to treat the "variety" dimension of Big Data, i.e., they enable integration of multiple and diverse sources of information, which is typical on the modern Web platform. Covered topics include:
- RDF - a flexible data model to represent heterogeneous data
- OWL - a flexible ontological language to model heterogeneous data sources
- SPARQL - a query language for RDF.
It shows how to put all the pieces together in order to achieve interoperability among heterogeneous information sources
3. Time
The third part covers the realm of temporal-dependent data. The topics covered here allow to treat the "velocity" dimension of Big Data. It shows the importance for many Big Data analysis scenarios to process data stream, coming for instance from Internet of Things (IoT) and Social Media sources; and it describes how to apply semantic and syntactic techniques in the context of time-dependent information. For instance, it shows how to extend RDF to model RDF streams, how to extend SPARQL to continuously process RDF streams and how to reason on those RDF Streams
4. Applications
In the fourth part, the course focuses on specific application scenarios and presents the typical settings and problems where the presented techniques can be applied. This part discusses settings such as: big data analysis for smart cities; data analytics for brand monitoring (marketing) and event monitoring; data analysis for trend detection and user engagement; and so on.
Studying Multicultural Diversity of Cities and Neighborhoods through Social M...Marco Brambilla
Cities are growing as melting pots of people with different culture, religion, and language. In this paper, through multilingual analysis of Twitter contents shared within a city, we analyze the prevalent language in the different neighborhoods of the city and we compare the results with census data, in order to highlight any parallelisms or discrepancies between the two data sources. We show that the officially identified neighborhoods are actually representing significantly different communities and that the use of the social media as a data source helps to detect those weak signals that are not captured from traditional data.
Model driven software engineering in practice book - Chapter 9 - Model to tex...Marco Brambilla
Slides for the mdse-book.com chapter 9 - Model-to-text transformations.
Complete set of slides now available:
Chapter 1 - http://www.slideshare.net/mbrambil/modeldriven-software-engineering-in-practice-chapter-1-introduction
Chapter 2 - http://www.slideshare.net/mbrambil/modeldriven-software-engineering-in-practice-chapter-2-mdse-principles
Chapter 3 - http://www.slideshare.net/jcabot/model-driven-software-engineering-in-practice-chapter-3-mdse-use-cases
Chapter 4 - http://www.slideshare.net/jcabot/modeldriven-software-engineering-in-practice-chapter-4
Chapter 5 - http://www.slideshare.net/mbrambil/modeldriven-software-engineering-in-practice-chapter-5-integration-of-modeldriven-in-development-processes
Chapter 6 - http://www.slideshare.net/jcabot/mdse-bookslideschapter6
Chapter 7 - http://www.slideshare.net/mbrambil/model-driven-software-engineering-in-practice-book-chapter-7-developing-your-own-modeling-language
Chapter 8 - http://www.slideshare.net/jcabot/modeldriven-software-engineering-in-practice-chapter-8-modeltomodel-transformations
Chapter 9 - https://www.slideshare.net/mbrambil/model-driven-software-engineering-in-practice-book-chapter-9-model-to-text-transformations-and-code-generation
Chapter 10 - http://www.slideshare.net/jcabot/mdse-bookslideschapter10managingmodels
This book discusses how approaches based on modeling can improve the daily practice of software professionals. This is known as Model-Driven Software Engineering (MDSE) or, simply, Model-Driven Engineering (MDE).
MDSE practices have proved to increase efficiency and effectiveness in software development. MDSE adoption in the software industry is foreseen to grow exponentially in the near future, e.g., due to the convergence of software development and business analysis.
This book is an agile and flexible tool to introduce you to the MDE and MDSE world, thus allowing you to quickly understand its basic principles and techniques and to choose the right set of MDE instruments for your needs so that you can start to benefit from MDE right away.
The book is organized into two main parts.
The first part discusses the foundations of MDSE in terms of basic concepts (i.e., models and transformations), driving principles, application scenarios and current standards, like the wellknown MDA initiative proposed by OMG (Object Management Group) as well as the practices on how to integrate MDE in existing development processes.
The second part deals with the technical aspects of MDSE, spanning from the basics on when and how to build a domain-specific modeling language, to the description of Model-to-Text and Model-to-Model transformations, and the tools that support the management of MDE projects.
The book covers a wide set of introductory and technical topics, spanning MDE at large, definitions and orientation in the MD* world, metamodeling, domain specific languages, model transformations, reverse engineering, OMG's MDA, UML, OCL, A
Model driven software engineering in practice book - chapter 7 - Developing y...Marco Brambilla
Slides for the mdse-book.com - Chapter 7: Developing Your Own Modeling Language.
Complete set of slides now available:
Chapter 1 - http://www.slideshare.net/mbrambil/modeldriven-software-engineering-in-practice-chapter-1-introduction
Chapter 2 - http://www.slideshare.net/mbrambil/modeldriven-software-engineering-in-practice-chapter-2-mdse-principles
Chapter 3 - http://www.slideshare.net/jcabot/model-driven-software-engineering-in-practice-chapter-3-mdse-use-cases
Chapter 4 - http://www.slideshare.net/jcabot/modeldriven-software-engineering-in-practice-chapter-4
Chapter 5 - http://www.slideshare.net/mbrambil/modeldriven-software-engineering-in-practice-chapter-5-integration-of-modeldriven-in-development-processes
Chapter 6 - http://www.slideshare.net/jcabot/mdse-bookslideschapter6
Chapter 7 - http://www.slideshare.net/mbrambil/model-driven-software-engineering-in-practice-book-chapter-7-developing-your-own-modeling-language
Chapter 8 - http://www.slideshare.net/jcabot/modeldriven-software-engineering-in-practice-chapter-8-modeltomodel-transformations
Chapter 9 - https://www.slideshare.net/mbrambil/model-driven-software-engineering-in-practice-book-chapter-9-model-to-text-transformations-and-code-generation
Chapter 10 - http://www.slideshare.net/jcabot/mdse-bookslideschapter10managingmodels
This book discusses how approaches based on modeling can improve the daily practice of software professionals. This is known as Model-Driven Software Engineering (MDSE) or, simply, Model-Driven Engineering (MDE).
MDSE practices have proved to increase efficiency and effectiveness in software development. MDSE adoption in the software industry is foreseen to grow exponentially in the near future, e.g., due to the convergence of software development and business analysis.
This book is an agile and flexible tool to introduce you to the MDE and MDSE world, thus allowing you to quickly understand its basic principles and techniques and to choose the right set of MDE instruments for your needs so that you can start to benefit from MDE right away.
The first part discusses the foundations of MDSE in terms of basic concepts (i.e., models and transformations), driving principles, application scenarios and current standards, like the wellknown MDA initiative proposed by OMG (Object Management Group) as well as the practices on how to integrate MDE in existing development processes.
The second part deals with the technical aspects of MDSE, spanning from the basics on when and how to build a domain-specific modeling language, to the description of Model-to-Text and Model-to-Model transformations, and the tools that support the management of MDE projects.
The book covers the MD* world, metamodeling, domain specific languages, model transformations, reverse engineering, OMG's MDA, UML, OCL, ATL, QVT, MOF, Eclipse, EMF, GMF, TCS, xText.
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation for ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. Slowdown on the GPU is likely caused by a large submission of small workloads, and expected to be non-issue when the computation is performed on massive graphs.
Opendatabay - Open Data Marketplace.pptxOpendatabay
Opendatabay.com unlocks the power of data for everyone. Open Data Marketplace fosters a collaborative hub for data enthusiasts to explore, share, and contribute to a vast collection of datasets.
First ever open hub for data enthusiasts to collaborate and innovate. A platform to explore, share, and contribute to a vast collection of datasets. Through robust quality control and innovative technologies like blockchain verification, opendatabay ensures the authenticity and reliability of datasets, empowering users to make data-driven decisions with confidence. Leverage cutting-edge AI technologies to enhance the data exploration, analysis, and discovery experience.
From intelligent search and recommendations to automated data productisation and quotation, Opendatabay AI-driven features streamline the data workflow. Finding the data you need shouldn't be a complex. Opendatabay simplifies the data acquisition process with an intuitive interface and robust search tools. Effortlessly explore, discover, and access the data you need, allowing you to focus on extracting valuable insights. Opendatabay breaks new ground with a dedicated, AI-generated, synthetic datasets.
Leverage these privacy-preserving datasets for training and testing AI models without compromising sensitive information. Opendatabay prioritizes transparency by providing detailed metadata, provenance information, and usage guidelines for each dataset, ensuring users have a comprehensive understanding of the data they're working with. By leveraging a powerful combination of distributed ledger technology and rigorous third-party audits Opendatabay ensures the authenticity and reliability of every dataset. Security is at the core of Opendatabay. Marketplace implements stringent security measures, including encryption, access controls, and regular vulnerability assessments, to safeguard your data and protect your privacy.
Techniques to optimize the pagerank algorithm usually fall in two categories. One is to try reducing the work per iteration, and the other is to try reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged has the potential to save iteration time. Skipping in-identical vertices, with the same in-links, helps reduce duplicate computations and thus could help reduce iteration time. Road networks often have chains which can be short-circuited before pagerank computation to improve performance. Final ranks of chain nodes can be easily calculated. This could reduce both the iteration time, and the number of iterations. If a graph has no dangling nodes, pagerank of each strongly connected component can be computed in topological order. This could help reduce the iteration time, no. of iterations, and also enable multi-iteration concurrency in pagerank computation. The combination of all of the above methods is the STICD algorithm. [sticd] For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...John Andrews
SlideShare Description for "Chatty Kathy - UNC Bootcamp Final Project Presentation"
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
As Europe's leading economic powerhouse and the fourth-largest hashtag#economy globally, Germany stands at the forefront of innovation and industrial might. Renowned for its precision engineering and high-tech sectors, Germany's economic structure is heavily supported by a robust service industry, accounting for approximately 68% of its GDP. This economic clout and strategic geopolitical stance position Germany as a focal point in the global cyber threat landscape.
In the face of escalating global tensions, particularly those emanating from geopolitical disputes with nations like hashtag#Russia and hashtag#China, hashtag#Germany has witnessed a significant uptick in targeted cyber operations. Our analysis indicates a marked increase in hashtag#cyberattack sophistication aimed at critical infrastructure and key industrial sectors. These attacks range from ransomware campaigns to hashtag#AdvancedPersistentThreats (hashtag#APTs), threatening national security and business integrity.
🔑 Key findings include:
🔍 Increased frequency and complexity of cyber threats.
🔍 Escalation of state-sponsored and criminally motivated cyber operations.
🔍 Active dark web exchanges of malicious tools and tactics.
Our comprehensive report delves into these challenges, using a blend of open-source and proprietary data collection techniques. By monitoring activity on critical networks and analyzing attack patterns, our team provides a detailed overview of the threats facing German entities.
This report aims to equip stakeholders across public and private sectors with the knowledge to enhance their defensive strategies, reduce exposure to cyber risks, and reinforce Germany's resilience against cyber threats.
5. There are more things
In heaven and earth, Horatio,
Than are dreamt of in your philosophy.
Shakespeare (Hamlet Act 1, scene 5)
6. The Answer to the Great Question...
Of Life, the Universe and Everything
Data
Information
Knowledge
WisdomContext
independence
Understanding
Understanding relations
Understanding patterns
Understanding principles
14. 14
Data source Around for Frequency Delay
Census data 100s year years months
Newspaper 100s year days 1 day
Weather sensors 10s year hours/minutes hours/minutes
TV news 10s years hours minutes
Traffic sensors years 15 minutes minutes
Call Data Records years 15 minutes hours
Social media years seconds seconds
IoT recently milliseconds milliseconds
Source:EmanueleDellaValle
The data evolution
15. Data piles up without easing decision making
I have to decide:
A or B?
Why not C?
What if D?
Source:EmanueleDellaValle
16. But, we would like to …
fusing all those
data sources
making sense of the
fused information
Definitely E!
Source:EmanueleDellaValle
26. Data Quality Issue
Gartner Report
In 2017, 33% of the largest global companies will experience an information
crisis due to their inability to adequately value, govern and trust their
enterprise information.
If you torture the data long enough,
it will confess to anything
– Darrell Huff
27. The Vicious Cycle of Bad Data
Bad Data
Incorrect
Analysis
Invalid
Insights
Wrong
Decisions
Poor
Outcome
28. Conventional Definition of Data Quality
• Accuracy
• The data was recorded correctly.
• Completeness
• All relevant data was recorded.
• Uniqueness
• Entities are recorded once.
• Timeliness
• The data is kept up to date (and time consistency is granted).
• Consistency
• The data agrees with itself.
29. Why is Data “Dirty” ?
• Dummy Values,
• Absence of Data,
• Multipurpose Fields,
• Cryptic Data,
• Contradicting Data,
• Shared Field Usage,
• Inappropriate Use of Fields,
• Violation of Business Rules,
• Reused Primary Keys,
• Non-Unique Identifiers,
• Data Integration Problems
30. Data Wrangling a.k.a.
• Data Preprocessing
• Data Preparation
• Data Cleansing
• Data Scrubbing
• Data Munging
• Data Transformation
• Data Fold, Spindle, Mutilate…
• (good old) ETL
31. Foursquare
• Check-ins explicitly performed in venues all around the world
• Data set: Geo-localized Foursquare venues, collected through a
query every 50m with radius >50m over:
• Milan area: 20km x 17,5km
• Some numbers
• Total n° of venues: 90K (dirty)
• Total n° of valid venues: 43K
39. Data vs. Question
• Are they aligned?
• The usual problem of representativeness of the sample…
• At a different scale
• With much less control
• Example: the different pictures of the city
47. Example. Space Granularity: the Grid
• Regular squared grid
• Irregular grid with official business-driven meaning
• Irregular grid with data-driven definition
12/4
58. • Mobile Phone Calls & Msgs: 5 to 10 MLN per day in a city like Milan
• Trackable user events (incl. data traffic): 1,000 per user per day
Mobile Phone Data
59. IoT Sensors
• People counters: 1 event per second (or less)
• 86K+ events per day per sensor
• Industrial machine sensors: 100 measurements per second
63. Response of Social Media #MFW
• MILANO FASHION WEEK #MFW
• We have 2 signals:
• The first coming from the social media (in this case we will talk about only
Instagram)
• The second derived from the official calendar events
64. Research Questions
“Are live events still relevant?
Can online visibility be described simply by how famous is the brand?
Do space and time still matter?
Can we predict how people behave in time/space within events?
65. Discover more about the #MFW case
• https://marco-brambilla.com/2017/04/04/social-media-
behaviour-during-live-events-the-milano-fashion-week-mfw-case-
www2017/
(INCLUDING SLIDES)
66. Use Case #2: Design
The Milano Design Week
& FuoriSalone
67. •Fuorisalone Official database
• events/locations/itineraries
• Fuorisalone Official App
• GPS positions1 of the App users
• Events inserted in the agenda on the App
• Private social post (Facebook) of App users2
• SocialMedia Listener
• Keyword-based public social post (Twitter/Instagram)
• Semantic analysis
•
1 when the App was running
• 2 to use some App features the users had to perform a social login
Data sources of the analysis
68. • Data elements are georeferenced and aggregate by citypixel (100
x 100 mt squares)
• Merging multiple data sources makes it possible to infer
information:
• Which events attract more visitors?
• Which areas have the larger presence of visitors?
• Do people talk on the social networks about the events they are
interested in?
• Do people use social networks while visiting the events?
• ...
Fusing the data
87. THANKS!
QUESTIONS?
Myths and Challenges
in Extraction of Emerging Knowledge
from Human-generated Content
Marco Brambilla @marcobrambi marco.brambilla@polimi.it
http://datascience.deib.polimi.it http://home.deib.polimi.it/marcobrambi
Editor's Notes
Paolo
Qui spieghiamo le dimensioni trovate per descrivere la città per poi spiegare su quale parte ci siamo focalizzati e le 3 analisi ampliate.
Qui spieghiamo le dimensioni trovate per descrivere la città per poi spiegare su quale parte ci siamo focalizzati e le 3 analisi ampliate.
Qui spieghiamo le dimensioni trovate per descrivere la città per poi spiegare su quale parte ci siamo focalizzati e le 3 analisi ampliate.
Piercesare
Piercesare > selinunte giambellino
db di Twitter contiene quindi 106278 tweet, con una percentuale di circa il 6.5% di post geolocalizzati, che corrisponde in valore assoluto a quasi 7mila post.
db di Instagram, invece, contiene poco più di 556 mila post (circa 5 volte le dimensioni del db di Twitter), con il 28% circa di media geolocalizzati (+/- 155mila post).
Possiamo subito notare due fatti interessanti:
Per questo specifico scenario (MFW) Instagram è stato il mezzo di comunicazione preferito
utenti di Instagram risultano più propensi ad esibire la loro posizione «fisica» e quindi il coinvolgimento a un evento, (o la visita di un luogo, in generale), quasi ad indicare una prova della stessa partecipazione all’evento interessato
A questo punto possiamo partire con l’esplorazione e lo studio dei nostri dati che si compone di differenti sotto-analisi -> (analizzato alcune misure proprie degli autori dei contenuti, affrontato il problema di risposta nel tempo e nello spazio ai diversi appuntamenti da calendario, e, dopo avere aggiunto un altro tipo di reazione, che definiamo di popolarità, confronto dei risultati precedenti)