This document provides an introduction to mining data from the social web. It discusses various social media platforms like Facebook, Twitter, LinkedIn, Foursquare, and Flickr that enable users to produce, consume and interact with content. The document explores what insights can be gained from analyzing the large amounts of social data, such as understanding social behavior, political sentiment, how cities are experienced, and career trends. It outlines existing research on analyzing data from Twitter and photos to detect events, trends, opinions and more. The document concludes by discussing potential student project ideas involving hypothesis testing, exploring questions, solving problems, or analyzing interesting datasets.
Mining the Social Web - Lecture 1 - T61.6020 lecture-01-slides
1. mining the social web
Aristides Gionis
Michael Mathioudakis
firstname.lastname@aalto.fi
Aalto University
Spring 2015
2. social web
facebook, twitter, linkedin, foursquare, flickr, instagram, pinterest, youtube, ustream, github, stackoverflow, wikipedia
3. social web
websites and platforms that enable users to
produce content: blog posts, ‘status’ messages, videos, pictures, podcasts
consume content: read text - blog posts, ‘status’ messages; listen to podcasts, watch videos
interact with each other: comment on each other’s posts, ‘like’ or rate items
4. mining the social web
a lot of users... a lot of data... what could we learn*?
* assuming we have the data - more on that later
gain insights into...
social behavior: how many connections does an average person have? do people connect with like-minded people?
political sentiment: what do people think about current political issues?
how we experience our cities: what’s the best neighborhood for food/nightlife?
how we build our careers: how often do people change careers? how beneficial is it to ‘network’ professionally?
other?
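The two social-behavior questions on this slide can be made concrete with a small sketch. It assumes the data is already available as an undirected list of friendships plus one label per user; the user names, the `interest` labels, and both function names are hypothetical and are not part of the lecture. The fraction of same-label edges is only a crude homophily signal, not a statistical test.

```python
from collections import defaultdict


def average_degree(edges):
    """Mean number of distinct connections per user that appears in at least one edge."""
    neighbors = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        neighbors[u].add(v)
        neighbors[v].add(u)
    if not neighbors:
        return 0.0
    return sum(len(n) for n in neighbors.values()) / len(neighbors)


def same_label_fraction(edges, label):
    """Fraction of edges whose two endpoints carry the same label."""
    pairs = [(u, v) for u, v in edges if u != v]
    if not pairs:
        return 0.0
    same = sum(1 for u, v in pairs if label[u] == label[v])
    return same / len(pairs)


# Hypothetical toy data: four users, four friendships, one interest each.
edges = [("ann", "bob"), ("bob", "cat"), ("cat", "dan"), ("ann", "cat")]
interest = {"ann": "music", "bob": "music", "cat": "sports", "dan": "sports"}

print(average_degree(edges))
print(same_label_fraction(edges, interest))
```

Users without any connection do not appear in an edge list, so they are left out of the average; a real study would need the full user list to include them.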
5. mining the social web
there is already research that explores those questions
we will discuss some of it now and in the next two lectures
6. twitter
• a social sensor
– social network + news media
– what is happening?
– where, who?
– trends
– events
– opinions
– political views
– sentiments
– demographics
12. foursquare
• location-based social network
• users check-in to different locations
• locations have types (hierarchy)
– restaurant, sport venue, museum, college, …
• questions:
– where do people hang out?
– where events take place?
– do friends influence each other?
13. when/where people check in?
[Figure: hourly check-in frequency (x-axis: hour, y-axis: percentage) for New-York, London, Barcelona, Helsinki, and Total]
(a) Hourly check-ins frequency during the day. The activity is at its lowest around a.m. and after that, there are three peaks: one when people go to work in the morning, one in the middle of the day and the last one at the end of the evening. Yet, depending on the city, these peaks do not happen at the same time, nor with the same intensity. Therefore, instead of working directly with the raw values of features, we use the number of standard deviations, or z-score.
[Figure: time clusters in Paris (x-axis: hour, y-axis: percentage)]
Figure: Venues clustered by time of check-ins.
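The caption's point about z-scores can be illustrated with a short sketch. The slide does not say which feature values are standardised or over what population, so this assumes hourly check-in counts standardised within each city; the counts and the `zscores` helper are hypothetical.

```python
from statistics import mean, pstdev


def zscores(values):
    """Express each value as a number of standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return [0.0 for _ in values]  # constant series: no variation to express
    return [(v - mu) / sigma for v in values]


# Hypothetical hourly check-in counts; London is simply ten times busier.
hourly_checkins = {
    "Helsinki": [2, 4, 6, 8],
    "London": [20, 40, 60, 80],
}

normalized = {city: zscores(counts) for city, counts in hourly_checkins.items()}
for city, scores in normalized.items():
    print(city, [round(s, 3) for s in scores])
```

After standardisation the two cities have the same profile even though their raw volumes differ by a factor of ten, which is why z-scores make peaks comparable across cities of different size.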
14. when/where people check in?
City | Name | Category | Entropy
lowest user entropy
Barcelona | Castellers de Barcelona | Non-Profit | 0.0139
Barcelona | Café de la Pompeu | Café | 0.0172
Barcelona | Ràdio | Radio Station | 0.0176
Paris | Boutique Orange | Electronics Store | 0.0099
Paris | Métro Goncourt [] | Subway | 0.0105
Paris | Blue Acacia | Office | 0.0112
highest user entropy
Barcelona | Plaça de Catalunya | Plaza | 0.5835
Barcelona | Sants Estació | Train Station | 0.6298
Barcelona | Sagrada Família | Government Building | 0.6309
Barcelona | Camp Nou | Stadium | 0.6852
Paris | Gare SNCF : Gare de Lyon | Train Station | 0.6725
Paris | Gare SNCF : Paris Nord | Train Station | 0.6911
Paris | Musée du Louvre | Museum | 0.6924
Paris | Tour Eiffel | Government Building | 0.7167
(a) Venues in Paris and Barcelona with lowest and highest user entropy.
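The slide does not define user entropy, so the following is only a sketch of one plausible reading: the Shannon entropy of how a venue's check-ins are spread over its visitors, divided by the logarithm of the number of distinct visitors so that it falls between 0 and 1. The normalisation, the `user_entropy` name, and the sample data are assumptions and need not match how the table's values were computed.

```python
import math
from collections import Counter


def user_entropy(checkin_users, normalize=True):
    """Shannon entropy of the distribution of a venue's check-ins over users.

    checkin_users holds one user id per check-in at the venue.
    """
    counts = Counter(checkin_users)
    if len(counts) <= 1:
        return 0.0  # no check-ins, or a single visitor: no uncertainty
    total = sum(counts.values())
    h = -sum((c / total) * math.log(c / total) for c in counts.values())
    if normalize:
        h /= math.log(len(counts))  # assumed normalisation to the range [0, 1]
    return h


# Hypothetical venues: an office dominated by one regular, a plaza visited evenly.
office = ["u1"] * 9 + ["u2"]
plaza = ["u1", "u2", "u3", "u4"]

print(round(user_entropy(office), 4))
print(round(user_entropy(plaza), 4))
```

Under this reading, venues frequented by a few regulars score low and venues that attract many occasional visitors score high, which is at least consistent with offices and small shops sitting at the bottom of the table and stations and landmarks at the top.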
18. your project
come up with a project idea
implement it!
report on your results and findings
19. types of projects
• form a hypothesis and set out to test it
– are rich people happier?
• start with an interesting question
– which are hipster neighborhoods in my city?
• start with a business idea
– recommend relevant music to music listeners
– recommend clothes to music listeners
• start with a problem that you (think) can solve
– how to identify trends in space and time?
• start with a cool dataset and explore it
20. your project
1 set a goal for your project (what’s the question you want to answer)
2 study related literature (what has / hasn’t been done already? or you think you can do it better)
3 collect data (some data are more difficult to come by)
4 analyze data
5 results
6 evaluation (have you answered the question asked originally? possible improvements? future work?)
21. coming up with a project idea
• conferences: SIGKDD, ICWSM, WWW, WSDM
• themes
– urban computing, trend / event detection, social networks, political sentiment, privacy
– other
• google scholar
• talk with us
office hours: Mon, 14:15-15:30 and by appointment
22. collecting the data
• what data are available?
– different platforms share different data about their users’ activity
– browse dev sites of social networks: find out about privacy policies and APIs
– browse public data repositories
– the data mining group has data for blog posts, twitter, google+, facebook, foursquare
• code
Mining the Social Web (github)
https://github.com/ptwobrussell/Mining-the-Social-Web-2nd-Edition
23. schedule
• Today: overview
• February 2nd: discuss literature (Aris)
• February 9th: discuss literature (Michael)
• February 16th and 23rd: present project proposals
• March 30th: students submit progress report
• March 30th and April 6th: intermediate presentations
• May 4th and May 11th: final presentations
• May 15th: final report due
24. final report
• introduction
• related work
• problem statement
• proposed technique (algorithms)
• data description
• empirical evaluation
– results
– comparison with state of the art
• future work
25. grading
• originality (has it been done before)
• potential impact (how interesting it is and why)
• rigorousness of proposed technique
• reproducibility (public code)
• presentation
• teams of 2 are encouraged
• presentations and reports are required
• surveys of existing techniques are ok, too
26. schedule
• Today: overview
• February 2nd: discuss literature (Aris)
• February 9th: discuss literature (Michael)
• February 16th and 23rd: students present project proposals
• March 30th: students submit progress report
• March 30th and April 6th: intermediate presentations
• May 4th and May 11th: final presentations
• May 15th: final report due
27. until then...
browse
• literature
– see papers posted on noppa for a sample
– conferences: KDD, ICWSM, WWW, WSDM
– google scholar
• dev websites, for example...
https://dev.twitter.com, https://developers.facebook.com, https://developer.github.com/, https://developer.foursquare.com
• code samples,
https://github.com/ptwobrussell/Mining-the-Social-Web-2nd-Edition
• data repositories,
http://snap.stanford.edu/, http://icwsm.org/2013/datasets/datasets/, http://wadam-data.dis.uniroma1.it
and talk to us!
28. see you next week!
Aristides Gionis
Michael Mathioudakis
contact: firstname.lastname@aalto.fi
Office Hours: Mon, 14:15-15:30 and by appointment