SlideShare a Scribd company logo
1 of 36
Part I → Part II → Part III → Part IV → Part V

         Data and Software
Part II: Outline
       Types of datasets

       Propagation of information “memes”

       Propagation of other actions

       Synthetic datasets

       Software tools


2
Contents of a dataset
       Action traces
        ◦ Sometimes not obvious (e.g. gaining weight
          can be an action)
        ◦ Propagation explicitly / implicitly attributed

       Social network
        ◦ Explicitly declared / Implicitly inferred
        ◦ Symmetrical / Non-symmetrical


3
Data availability limits research
       Often you have to pick two of these


                                 Includes
                                   Social
                                 Network




                   Is Publicly              Includes
                   Available                 Action
                                             Traces



4
Classification: according to
    availability
       Proprietary, impossible or very hard to
        reproduce (e.g. shopping history in e-
        commerce)
        ◦ Increasingly being rejected in IR, DM
          communities

       Proprietary, reproducible (e.g. web crawl of
        a sub-set of public websites)

       Existing open dataset

       New open dataset
5
Propagation of Information
           “Memes”




6
Memes and “Internet Memes”




7
Microblogging data
       Providers: Twitter, Identi.ca, Diaspora, etc.
        ◦ Directly or through data re-sellers

       Actions: posting a message

       Connections: explicitly declared, non-
        symmetrical

       Propagations: explicitly linked (in principle),
        but implicitly linked (in practice) due to
        client implementations
8
Extracting info. propagations
       Idea: start from a large corpus and then
        extract information propagations
        ◦ Blogs, news articles, academic papers, generic
          web pages, etc.
        ◦ Simple in theory, extremely difficult in practice

       Looking for citations doesn't work
        ◦ People on the web seldom attribute explicitly

       Keywords and phrases
        ◦ Usually end up with a mixture of too broad (e.g.
          stylistic idioms) and/or too narrow (e.g. one
          specific copy of a news item) “topics”
9
Using #hashtags and URLs
        Twitter: #hashtags and URLs

        With some exceptions
         ◦ #hashtags are too broad,
         ◦ URLs are too narrow

        Let's propose two methods that can
         alleviate these problems ...


10
Extracting info. propagations:
     Meme tracker
        Public dataset: http://memetracker.org/

        Tracks “mutated” key phrases in a
         document collection, example cluster:
           the fundamentals of our economy are strong
           the fundamentals of the economy are strong
           i promise you we will never put america in this position again we will clean up wall street
           the fundamentals of our economy are strong but these are very very difficult times and i
           promise you we will never put america in this position again
           but these are very very difficult times



        No a-priori network exists. Inference
         methods are used.
11                                                                      [Leskovec et al. KDD 2009]
Extracting info. propagations:
     Meme tracker




12                          [Leskovec et al. KDD 2009]
Extracting info. propagation:
     Trending topics
        Method
         ◦ Look for “bursty” (spiky, trending) topics,
           represented e.g. as a collection of keywords
         ◦ Track the propagation of those topics

        Rely on a proven method for burst
           detection
         ◦ The tricky part is not to detect the burst, but
           to represent it (e.g. as a query) e.g. Haiti
           earthquake tweets might not include “Haiti”
           or “earthquake”
13                             [Mathioudakis and Koudas, SIGMOD 2010]
Extracting information
     propagations: Other methods
        Internet chain letters; look for copies
         online of petition letters




14                           [Liben-Nowell and Kleinberg, PNAS 2008]
Sampling issues
   Issues with recall along information
    cascades
    ◦ e.g. twitter stream 1% sample gardenhose




                                [Sadikov et al. WSDM 2011]
Propagation of Other Actions




16
Consuming media and products
        Media consumption/appraisal platforms
         ◦ Examples: Flixter / Last.fm / GoodReads
           Action: rating, watching, listening or reading a
            movie, a song, or a book
           Connections: Explicit friendships
         ◦ Propagations: usually implicitly linked unless
           “recommend to a friend” feature is used and
           publicly available

        Product recommendations
         ◦ Example: @cosme cosmetics
           recommendations
17
@cosme recommendations




18                [Matsuo and Yamamoto, WWW 2009]
Cross-provider data
        One provides the network, the other the
         actions

        MSN + Bing: Social network is MSN IM,
         actions are searches

        YIM + YMovies



19                [Singla and Richardson, WWW 2008] [Goyal et al. CIKM 2008]
Phone calls
        Social networks are calls, actions are
         leaving the company (“churning”)

        Some call datasets are available for
         academic labs (not for industrial ones)




20                                  [Dasgupta et al. EDBT 2008]
Phone calls




21                 [Senseable City Lab, MIT, 2009]
Community membership
        DBLP/Arnetminer
         ◦ Social network is co-authorship
         ◦ Action is publishing in a conference or
           publishing on a topic

        Livejournal / Flickr
         ◦ Social network is friendship graph
         ◦ Action is joining a community/group

        Bloglines
         ◦ Action is subscribing to a rss feed
22
Other datasets
        Flickr
         ◦ Explicit friendship, action is (1) favoring a
           photo or (2) using a tag

        Digg/Reddit votes
           Explicit friendship, action is vote-up




23
Off-line datasets
        Participation of women in 14 social
         activities over 9 months in US south (n=18)

        Romantic network in a high school (n=288)

        Medical records during 32 years (n=12,067)

        Network only
         ◦ Zachary's Karate club
         ◦ Presumed acquaintances links between terrorist
           suspects (n=74, n=63 if main CC is used)
24
“Chains of Affection”




25               [Bearman et al. Amer. Journal of Sociology 2004]
“Chains of Affection”


      Probably
      not a future
      computer
      scientist 




26
Size proportional to BMI, yellow fill indicates obesity. Blue border=men, Red border=women

27                        [Christakis and Fowler, New England Journal of Medicine 2007]
Synthetic Datasets




28
Network data are widely available
        Domains
         ◦ Online social networks: slashdot, epinions, …
         ◦ Communication: internet as, p2p, roads, …
         ◦ Collaboration: scientists, actors, jazz
           musicians, wikipedia editors, ...
         ◦ Citations: web graphs, academic
           publications, patents, …
         ◦ References: linked data in
           freebase/dbpedia, protein interactions,
           metabolic networks, ...
29
Publishing your own datasets
        Document every step of sampling, filtering,
         processing methodology
        CC0 (public domain) data releases
        Ad-hoc data releases: look at items in example
         agreements (duration, purpose, warranties, item
         deletion policies, etc.)
        Privacy concerns
        It may take some extra work, but remember that
         it is also in YOUR interest that your data is used
         by others
30
Software




31
Graph software Tools
        Software
         ◦ SNAP [GPL] Gephi [GPL, gui]
         ◦ Pajek [Free for non-commercial use, Windows, gui
         ◦ Webgraph [GPL] Graphviz [GPL]

        Graph generation, transformation,
         ◦ SNAP, Gephi, Pajek, Webgraph [compress], ...

        Subgraphs: clustering, connected components, etc. Node
         metrics: centrality, local clustering coeff.
         ◦ SNAP, Gephi, Pajek

        Graph visualization: Gephi, Pajek, Graphviz

        Other:
         https://sites.google.com/site/ucinetsoftware/downloads
32
Propagation software tools
        SPINE software
         ◦ IC model
         ◦ Inference with given social network
         ◦ Sparsification of influence models

        Internet network simulator

        Ask authors, some software is known to
         be available on request

33
Visualization




34
Visualization




35                   [Project Cascade by NYT Labs 2011]
Key takeaways of part II
        Data availability affects our research

        Current alternatives are not good
         ◦ Results on proprietary data sources are not
           reproducible
         ◦ Synthetic information propagations might
           not be realistic

        Software is not readily available

        This is something to work on collectively!
36

More Related Content

What's hot

Maura.fujieh
Maura.fujiehMaura.fujieh
Maura.fujiehNASAPMC
 
JTerm Day 2 - History, Definitions & Stats
JTerm Day 2 - History, Definitions & StatsJTerm Day 2 - History, Definitions & Stats
JTerm Day 2 - History, Definitions & StatsAndrew Hoffman
 
Web 2.0 and virtual worlds
Web 2.0 and virtual worldsWeb 2.0 and virtual worlds
Web 2.0 and virtual worldsBryan Alexander
 
"You can just tell whether a website looks reliable or not." People's modes o...
"You can just tell whether a website looks reliable or not." People's modes o..."You can just tell whether a website looks reliable or not." People's modes o...
"You can just tell whether a website looks reliable or not." People's modes o...Lynn Connaway
 
On Line Basics LinkedIn Facebook Twitter_Joyce M Sullivan NYPL SIBL 2010 05 19
On Line Basics LinkedIn Facebook Twitter_Joyce M Sullivan NYPL SIBL 2010 05 19On Line Basics LinkedIn Facebook Twitter_Joyce M Sullivan NYPL SIBL 2010 05 19
On Line Basics LinkedIn Facebook Twitter_Joyce M Sullivan NYPL SIBL 2010 05 19SocMediaFin - Joyce Sullivan
 
CrowdTruth @VU Faculty Colloquium (June 2015)
CrowdTruth @VU Faculty Colloquium (June 2015)CrowdTruth @VU Faculty Colloquium (June 2015)
CrowdTruth @VU Faculty Colloquium (June 2015)Lora Aroyo
 
Micro-Blogging, Futurism, and the Agora
Micro-Blogging, Futurism, and the AgoraMicro-Blogging, Futurism, and the Agora
Micro-Blogging, Futurism, and the AgoraThe New School
 
The End Run Around The Information Gate Keepers Week# 3 Global Net Activism
The End Run Around The  Information Gate Keepers Week# 3 Global Net ActivismThe End Run Around The  Information Gate Keepers Week# 3 Global Net Activism
The End Run Around The Information Gate Keepers Week# 3 Global Net ActivismThe New School
 
Mining the Social Web - Lecture 2 - T61.6020
Mining the Social Web - Lecture 2 - T61.6020Mining the Social Web - Lecture 2 - T61.6020
Mining the Social Web - Lecture 2 - T61.6020Michael Mathioudakis
 
How Web2 Is Revolutionising Education
How Web2 Is Revolutionising EducationHow Web2 Is Revolutionising Education
How Web2 Is Revolutionising EducationMichael Coghlan
 
Hatfieldskip
HatfieldskipHatfieldskip
HatfieldskipNASAPMC
 
Personalizing Media Interaction on the (Semantic & Social) Web
Personalizing Media Interaction on the (Semantic & Social) WebPersonalizing Media Interaction on the (Semantic & Social) Web
Personalizing Media Interaction on the (Semantic & Social) WebLora Aroyo
 
Isle of Man open data overview
Isle of Man open data overviewIsle of Man open data overview
Isle of Man open data overviewChris Taggart
 
Open Access, Open Data. Open Research?
Open Access, Open Data. Open Research?Open Access, Open Data. Open Research?
Open Access, Open Data. Open Research?Cameron Neylon
 
Eduwebinar: Our Everyday Tools for Success
Eduwebinar:  Our Everyday Tools for SuccessEduwebinar:  Our Everyday Tools for Success
Eduwebinar: Our Everyday Tools for SuccessJudy O'Connell
 
Social Media. Introduction to the Syllabus
Social Media. Introduction to the SyllabusSocial Media. Introduction to the Syllabus
Social Media. Introduction to the SyllabusThe New School
 
People Like You Like Presentations Like This
People Like You Like Presentations Like ThisPeople Like You Like Presentations Like This
People Like You Like Presentations Like ThisDavid Millard
 
Helping Darwin: How to think about evolution of consciousness (Biosciences ta...
Helping Darwin: How to think about evolution of consciousness (Biosciences ta...Helping Darwin: How to think about evolution of consciousness (Biosciences ta...
Helping Darwin: How to think about evolution of consciousness (Biosciences ta...Aaron Sloman
 

What's hot (20)

Arc 08 12 10 final short
Arc 08 12 10 final shortArc 08 12 10 final short
Arc 08 12 10 final short
 
Maura.fujieh
Maura.fujiehMaura.fujieh
Maura.fujieh
 
JTerm Day 2 - History, Definitions & Stats
JTerm Day 2 - History, Definitions & StatsJTerm Day 2 - History, Definitions & Stats
JTerm Day 2 - History, Definitions & Stats
 
Web 2.0 and virtual worlds
Web 2.0 and virtual worldsWeb 2.0 and virtual worlds
Web 2.0 and virtual worlds
 
"You can just tell whether a website looks reliable or not." People's modes o...
"You can just tell whether a website looks reliable or not." People's modes o..."You can just tell whether a website looks reliable or not." People's modes o...
"You can just tell whether a website looks reliable or not." People's modes o...
 
On Line Basics LinkedIn Facebook Twitter_Joyce M Sullivan NYPL SIBL 2010 05 19
On Line Basics LinkedIn Facebook Twitter_Joyce M Sullivan NYPL SIBL 2010 05 19On Line Basics LinkedIn Facebook Twitter_Joyce M Sullivan NYPL SIBL 2010 05 19
On Line Basics LinkedIn Facebook Twitter_Joyce M Sullivan NYPL SIBL 2010 05 19
 
CrowdTruth @VU Faculty Colloquium (June 2015)
CrowdTruth @VU Faculty Colloquium (June 2015)CrowdTruth @VU Faculty Colloquium (June 2015)
CrowdTruth @VU Faculty Colloquium (June 2015)
 
Micro-Blogging, Futurism, and the Agora
Micro-Blogging, Futurism, and the AgoraMicro-Blogging, Futurism, and the Agora
Micro-Blogging, Futurism, and the Agora
 
The End Run Around The Information Gate Keepers Week# 3 Global Net Activism
The End Run Around The  Information Gate Keepers Week# 3 Global Net ActivismThe End Run Around The  Information Gate Keepers Week# 3 Global Net Activism
The End Run Around The Information Gate Keepers Week# 3 Global Net Activism
 
Week3
Week3 Week3
Week3
 
Mining the Social Web - Lecture 2 - T61.6020
Mining the Social Web - Lecture 2 - T61.6020Mining the Social Web - Lecture 2 - T61.6020
Mining the Social Web - Lecture 2 - T61.6020
 
How Web2 Is Revolutionising Education
How Web2 Is Revolutionising EducationHow Web2 Is Revolutionising Education
How Web2 Is Revolutionising Education
 
Hatfieldskip
HatfieldskipHatfieldskip
Hatfieldskip
 
Personalizing Media Interaction on the (Semantic & Social) Web
Personalizing Media Interaction on the (Semantic & Social) WebPersonalizing Media Interaction on the (Semantic & Social) Web
Personalizing Media Interaction on the (Semantic & Social) Web
 
Isle of Man open data overview
Isle of Man open data overviewIsle of Man open data overview
Isle of Man open data overview
 
Open Access, Open Data. Open Research?
Open Access, Open Data. Open Research?Open Access, Open Data. Open Research?
Open Access, Open Data. Open Research?
 
Eduwebinar: Our Everyday Tools for Success
Eduwebinar:  Our Everyday Tools for SuccessEduwebinar:  Our Everyday Tools for Success
Eduwebinar: Our Everyday Tools for Success
 
Social Media. Introduction to the Syllabus
Social Media. Introduction to the SyllabusSocial Media. Introduction to the Syllabus
Social Media. Introduction to the Syllabus
 
People Like You Like Presentations Like This
People Like You Like Presentations Like ThisPeople Like You Like Presentations Like This
People Like You Like Presentations Like This
 
Helping Darwin: How to think about evolution of consciousness (Biosciences ta...
Helping Darwin: How to think about evolution of consciousness (Biosciences ta...Helping Darwin: How to think about evolution of consciousness (Biosciences ta...
Helping Darwin: How to think about evolution of consciousness (Biosciences ta...
 

Viewers also liked

Kdd12 tutorial-inf-part-iv
Kdd12 tutorial-inf-part-ivKdd12 tutorial-inf-part-iv
Kdd12 tutorial-inf-part-ivLaks Lakshmanan
 
Kdd12 tutorial-inf-part-iii
Kdd12 tutorial-inf-part-iiiKdd12 tutorial-inf-part-iii
Kdd12 tutorial-inf-part-iiiLaks Lakshmanan
 
Social Media News Communities: Gatekeeping, Coverage, and Statement Bias
 Social Media News Communities: Gatekeeping, Coverage, and Statement Bias Social Media News Communities: Gatekeeping, Coverage, and Statement Bias
Social Media News Communities: Gatekeeping, Coverage, and Statement BiasMounia Lalmas-Roelleke
 
The Effects of Time on Query Flow Graph-based Models for Query Suggestion
The Effects of Time on Query Flow Graph-based Models for Query SuggestionThe Effects of Time on Query Flow Graph-based Models for Query Suggestion
The Effects of Time on Query Flow Graph-based Models for Query SuggestionCarlos Castillo (ChaTo)
 
Social Media News Mining and Automatic Content Analysis of News
Social Media News Mining and Automatic Content Analysis of NewsSocial Media News Mining and Automatic Content Analysis of News
Social Media News Mining and Automatic Content Analysis of NewsCarlos Castillo (ChaTo)
 
Dr. Searcher and Mr. Browser: A unified hyperlink-click graph
Dr. Searcher and Mr. Browser: A unified hyperlink-click graphDr. Searcher and Mr. Browser: A unified hyperlink-click graph
Dr. Searcher and Mr. Browser: A unified hyperlink-click graphCarlos Castillo (ChaTo)
 
Characterizing the Life Cycle of Online News Stories Using Social Media React...
Characterizing the Life Cycle of Online News Stories Using Social Media React...Characterizing the Life Cycle of Online News Stories Using Social Media React...
Characterizing the Life Cycle of Online News Stories Using Social Media React...Carlos Castillo (ChaTo)
 
Information Verification During Natural Disasters
Information Verification During Natural DisastersInformation Verification During Natural Disasters
Information Verification During Natural DisastersCarlos Castillo (ChaTo)
 
Keynote talk: Big Crisis Data, an Open Invitation
Keynote talk: Big Crisis Data, an Open InvitationKeynote talk: Big Crisis Data, an Open Invitation
Keynote talk: Big Crisis Data, an Open InvitationCarlos Castillo (ChaTo)
 
TweetCred: Real-Time Credibility Assessment of 
 Content on Twitter @ Socinfo...
TweetCred: Real-Time Credibility Assessment of 
 Content on Twitter @ Socinfo...TweetCred: Real-Time Credibility Assessment of 
 Content on Twitter @ Socinfo...
TweetCred: Real-Time Credibility Assessment of 
 Content on Twitter @ Socinfo...IIIT Hyderabad
 
Extracting Information Nuggets from Disaster-Related Messages in Social Media
Extracting Information Nuggets from Disaster-Related Messages in Social MediaExtracting Information Nuggets from Disaster-Related Messages in Social Media
Extracting Information Nuggets from Disaster-Related Messages in Social MediaMuhammad Imran
 
Chapter 17 Designing And Managing Integrated Marketing Communications
Chapter 17 Designing And Managing Integrated Marketing CommunicationsChapter 17 Designing And Managing Integrated Marketing Communications
Chapter 17 Designing And Managing Integrated Marketing CommunicationsDiarta
 
What to Expect When the Unexpected Happens: Social Media Communications Acros...
What to Expect When the Unexpected Happens: Social Media Communications Acros...What to Expect When the Unexpected Happens: Social Media Communications Acros...
What to Expect When the Unexpected Happens: Social Media Communications Acros...Carlos Castillo (ChaTo)
 
Emotions and dialogue in a peer-production community: the case of Wikipedia
Emotions and dialogue in a peer-production community: the case of WikipediaEmotions and dialogue in a peer-production community: the case of Wikipedia
Emotions and dialogue in a peer-production community: the case of WikipediaDavid Laniado
 
SIAM SDM2014 tutorial - Social Media and Web of Data to Assist Crisis Respons...
SIAM SDM2014 tutorial - Social Media and Web of Data to Assist Crisis Respons...SIAM SDM2014 tutorial - Social Media and Web of Data to Assist Crisis Respons...
SIAM SDM2014 tutorial - Social Media and Web of Data to Assist Crisis Respons...Artificial Intelligence Institute at UofSC
 
Chapter 17 - Designing and Managing Integrated Marketing Communications
Chapter 17 - Designing and Managing Integrated Marketing CommunicationsChapter 17 - Designing and Managing Integrated Marketing Communications
Chapter 17 - Designing and Managing Integrated Marketing CommunicationsDr. Ankit Kesharwani
 

Viewers also liked (20)

Kdd12 tutorial-inf-part-iv
Kdd12 tutorial-inf-part-ivKdd12 tutorial-inf-part-iv
Kdd12 tutorial-inf-part-iv
 
Kdd12 tutorial-inf-part-iii
Kdd12 tutorial-inf-part-iiiKdd12 tutorial-inf-part-iii
Kdd12 tutorial-inf-part-iii
 
Social Media News Communities: Gatekeeping, Coverage, and Statement Bias
 Social Media News Communities: Gatekeeping, Coverage, and Statement Bias Social Media News Communities: Gatekeeping, Coverage, and Statement Bias
Social Media News Communities: Gatekeeping, Coverage, and Statement Bias
 
The Effects of Time on Query Flow Graph-based Models for Query Suggestion
The Effects of Time on Query Flow Graph-based Models for Query SuggestionThe Effects of Time on Query Flow Graph-based Models for Query Suggestion
The Effects of Time on Query Flow Graph-based Models for Query Suggestion
 
Social Media News Mining and Automatic Content Analysis of News
Social Media News Mining and Automatic Content Analysis of NewsSocial Media News Mining and Automatic Content Analysis of News
Social Media News Mining and Automatic Content Analysis of News
 
Dr. Searcher and Mr. Browser: A unified hyperlink-click graph
Dr. Searcher and Mr. Browser: A unified hyperlink-click graphDr. Searcher and Mr. Browser: A unified hyperlink-click graph
Dr. Searcher and Mr. Browser: A unified hyperlink-click graph
 
Characterizing the Life Cycle of Online News Stories Using Social Media React...
Characterizing the Life Cycle of Online News Stories Using Social Media React...Characterizing the Life Cycle of Online News Stories Using Social Media React...
Characterizing the Life Cycle of Online News Stories Using Social Media React...
 
Information Verification During Natural Disasters
Information Verification During Natural DisastersInformation Verification During Natural Disasters
Information Verification During Natural Disasters
 
Keynote talk: Big Crisis Data, an Open Invitation
Keynote talk: Big Crisis Data, an Open InvitationKeynote talk: Big Crisis Data, an Open Invitation
Keynote talk: Big Crisis Data, an Open Invitation
 
TweetCred: Real-Time Credibility Assessment of 
 Content on Twitter @ Socinfo...
TweetCred: Real-Time Credibility Assessment of 
 Content on Twitter @ Socinfo...TweetCred: Real-Time Credibility Assessment of 
 Content on Twitter @ Socinfo...
TweetCred: Real-Time Credibility Assessment of 
 Content on Twitter @ Socinfo...
 
Extracting Information Nuggets from Disaster-Related Messages in Social Media
Extracting Information Nuggets from Disaster-Related Messages in Social MediaExtracting Information Nuggets from Disaster-Related Messages in Social Media
Extracting Information Nuggets from Disaster-Related Messages in Social Media
 
Chapter 17 Designing And Managing Integrated Marketing Communications
Chapter 17 Designing And Managing Integrated Marketing CommunicationsChapter 17 Designing And Managing Integrated Marketing Communications
Chapter 17 Designing And Managing Integrated Marketing Communications
 
What to Expect When the Unexpected Happens: Social Media Communications Acros...
What to Expect When the Unexpected Happens: Social Media Communications Acros...What to Expect When the Unexpected Happens: Social Media Communications Acros...
What to Expect When the Unexpected Happens: Social Media Communications Acros...
 
Fairness-Aware Data Mining
Fairness-Aware Data MiningFairness-Aware Data Mining
Fairness-Aware Data Mining
 
Crisis Computing
Crisis ComputingCrisis Computing
Crisis Computing
 
Emotions and dialogue in a peer-production community: the case of Wikipedia
Emotions and dialogue in a peer-production community: the case of WikipediaEmotions and dialogue in a peer-production community: the case of Wikipedia
Emotions and dialogue in a peer-production community: the case of Wikipedia
 
SIAM SDM2014 tutorial - Social Media and Web of Data to Assist Crisis Respons...
SIAM SDM2014 tutorial - Social Media and Web of Data to Assist Crisis Respons...SIAM SDM2014 tutorial - Social Media and Web of Data to Assist Crisis Respons...
SIAM SDM2014 tutorial - Social Media and Web of Data to Assist Crisis Respons...
 
Social Media Mining and Retrieval
Social Media Mining and RetrievalSocial Media Mining and Retrieval
Social Media Mining and Retrieval
 
Discrimination Discovery
Discrimination DiscoveryDiscrimination Discovery
Discrimination Discovery
 
Chapter 17 - Designing and Managing Integrated Marketing Communications
Chapter 17 - Designing and Managing Integrated Marketing CommunicationsChapter 17 - Designing and Managing Integrated Marketing Communications
Chapter 17 - Designing and Managing Integrated Marketing Communications
 

Similar to Kdd12 tutorial-inf-part-ii

Practical Applications for Social Network Analysis in Public Sector Marketing...
Practical Applications for Social Network Analysis in Public Sector Marketing...Practical Applications for Social Network Analysis in Public Sector Marketing...
Practical Applications for Social Network Analysis in Public Sector Marketing...Mike Kujawski
 
Lecture 7: Social Web Challenges (2012)
Lecture 7: Social Web Challenges (2012)Lecture 7: Social Web Challenges (2012)
Lecture 7: Social Web Challenges (2012)Lora Aroyo
 
Keynote baezayates
Keynote baezayatesKeynote baezayates
Keynote baezayatesPROS-UPV
 
Big data in the web
Big data in the webBig data in the web
Big data in the webcaise2013
 
Keynote baezayates
Keynote baezayatesKeynote baezayates
Keynote baezayatescaise2013vlc
 
SocialCom09-tutorial.pdf
SocialCom09-tutorial.pdfSocialCom09-tutorial.pdf
SocialCom09-tutorial.pdfBalasundaramSr
 
Toward a System Building Agenda for Data Integration(and Dat.docx
Toward a System Building Agenda for Data Integration(and Dat.docxToward a System Building Agenda for Data Integration(and Dat.docx
Toward a System Building Agenda for Data Integration(and Dat.docxjuliennehar
 
Researching Social Media – Big Data and Social Media Analysis
Researching Social Media – Big Data and Social Media AnalysisResearching Social Media – Big Data and Social Media Analysis
Researching Social Media – Big Data and Social Media AnalysisFarida Vis
 
Lida change-reference-abels
Lida change-reference-abelsLida change-reference-abels
Lida change-reference-abelsfpehar
 
Research Life Cycle for GeoData 2014
Research Life Cycle for GeoData 2014Research Life Cycle for GeoData 2014
Research Life Cycle for GeoData 2014Carly Strasser
 
2014 TheNextWeb-Mapping connections with NodeXL
2014 TheNextWeb-Mapping connections with NodeXL2014 TheNextWeb-Mapping connections with NodeXL
2014 TheNextWeb-Mapping connections with NodeXLMarc Smith
 
Cl15 a koene_ca_sma
Cl15 a koene_ca_smaCl15 a koene_ca_sma
Cl15 a koene_ca_smaAnsgar Koene
 
Crawling Big Data in a New Frontier for Socioeconomic Research: Testing with ...
Crawling Big Data in a New Frontier for Socioeconomic Research: Testing with ...Crawling Big Data in a New Frontier for Socioeconomic Research: Testing with ...
Crawling Big Data in a New Frontier for Socioeconomic Research: Testing with ...BO TRUE ACTIVITIES SL
 
Streamlining Nonprofit Organizations - It's all About the Cloud!
Streamlining Nonprofit Organizations - It's all About the Cloud!Streamlining Nonprofit Organizations - It's all About the Cloud!
Streamlining Nonprofit Organizations - It's all About the Cloud!Marc Baizman
 
Streamlining Nonprofit Organizations: It's All About the Cloud
Streamlining Nonprofit Organizations: It's All About the CloudStreamlining Nonprofit Organizations: It's All About the Cloud
Streamlining Nonprofit Organizations: It's All About the CloudDebra Askanase
 
Accessing and Using Big Data to Advance Social Science Knowledge
Accessing and Using Big Data to Advance Social Science KnowledgeAccessing and Using Big Data to Advance Social Science Knowledge
Accessing and Using Big Data to Advance Social Science KnowledgeJosh Cowls
 
Using DITA-wiki Hybrid Solutions For Better Knowledge Systems
Using DITA-wiki Hybrid Solutions For Better Knowledge SystemsUsing DITA-wiki Hybrid Solutions For Better Knowledge Systems
Using DITA-wiki Hybrid Solutions For Better Knowledge SystemsLisa Dyer
 

Similar to Kdd12 tutorial-inf-part-ii (20)

Practical Applications for Social Network Analysis in Public Sector Marketing...
Practical Applications for Social Network Analysis in Public Sector Marketing...Practical Applications for Social Network Analysis in Public Sector Marketing...
Practical Applications for Social Network Analysis in Public Sector Marketing...
 
Lecture 7: Social Web Challenges (2012)
Lecture 7: Social Web Challenges (2012)Lecture 7: Social Web Challenges (2012)
Lecture 7: Social Web Challenges (2012)
 
Keynote baezayates
Keynote baezayatesKeynote baezayates
Keynote baezayates
 
Big data in the web
Big data in the webBig data in the web
Big data in the web
 
Keynote baezayates
Keynote baezayatesKeynote baezayates
Keynote baezayates
 
Conclusions: Summary and Outlook
Conclusions: Summary and OutlookConclusions: Summary and Outlook
Conclusions: Summary and Outlook
 
SocialCom09-tutorial.pdf
SocialCom09-tutorial.pdfSocialCom09-tutorial.pdf
SocialCom09-tutorial.pdf
 
Toward a System Building Agenda for Data Integration(and Dat.docx
Toward a System Building Agenda for Data Integration(and Dat.docxToward a System Building Agenda for Data Integration(and Dat.docx
Toward a System Building Agenda for Data Integration(and Dat.docx
 
Researching Social Media – Big Data and Social Media Analysis
Researching Social Media – Big Data and Social Media AnalysisResearching Social Media – Big Data and Social Media Analysis
Researching Social Media – Big Data and Social Media Analysis
 
Lida change-reference-abels
Lida change-reference-abelsLida change-reference-abels
Lida change-reference-abels
 
Research Life Cycle for GeoData 2014
Research Life Cycle for GeoData 2014Research Life Cycle for GeoData 2014
Research Life Cycle for GeoData 2014
 
2014 TheNextWeb-Mapping connections with NodeXL
2014 TheNextWeb-Mapping connections with NodeXL2014 TheNextWeb-Mapping connections with NodeXL
2014 TheNextWeb-Mapping connections with NodeXL
 
Cl15 a koene_ca_sma
Cl15 a koene_ca_smaCl15 a koene_ca_sma
Cl15 a koene_ca_sma
 
sm@jgc Session Three
sm@jgc Session Threesm@jgc Session Three
sm@jgc Session Three
 
Crawling Big Data in a New Frontier for Socioeconomic Research: Testing with ...
Crawling Big Data in a New Frontier for Socioeconomic Research: Testing with ...Crawling Big Data in a New Frontier for Socioeconomic Research: Testing with ...
Crawling Big Data in a New Frontier for Socioeconomic Research: Testing with ...
 
Streamlining Nonprofit Organizations - It's all About the Cloud!
Streamlining Nonprofit Organizations - It's all About the Cloud!Streamlining Nonprofit Organizations - It's all About the Cloud!
Streamlining Nonprofit Organizations - It's all About the Cloud!
 
Streamlining Nonprofit Organizations: It's All About the Cloud
Streamlining Nonprofit Organizations: It's All About the CloudStreamlining Nonprofit Organizations: It's All About the Cloud
Streamlining Nonprofit Organizations: It's All About the Cloud
 
Accessing and Using Big Data to Advance Social Science Knowledge
Accessing and Using Big Data to Advance Social Science KnowledgeAccessing and Using Big Data to Advance Social Science Knowledge
Accessing and Using Big Data to Advance Social Science Knowledge
 
Using DITA-wiki Hybrid Solutions For Better Knowledge Systems
Using DITA-wiki Hybrid Solutions For Better Knowledge SystemsUsing DITA-wiki Hybrid Solutions For Better Knowledge Systems
Using DITA-wiki Hybrid Solutions For Better Knowledge Systems
 
Horizon
HorizonHorizon
Horizon
 

More from Laks Lakshmanan

Combating Fake News: Combating Fake News: A Data Management and Mining Perspe...
Combating Fake News: Combating Fake News: A Data Management and Mining Perspe...Combating Fake News: Combating Fake News: A Data Management and Mining Perspe...
Combating Fake News: Combating Fake News: A Data Management and Mining Perspe...Laks Lakshmanan
 
Big-O(Q) VLDB 2015 Keynote: Social Network Analytics: Beyond the Obvious
Big-O(Q) VLDB 2015 Keynote: Social Network Analytics: Beyond the ObviousBig-O(Q) VLDB 2015 Keynote: Social Network Analytics: Beyond the Obvious
Big-O(Q) VLDB 2015 Keynote: Social Network Analytics: Beyond the ObviousLaks Lakshmanan
 
Big-O(Q) Social Network Analytics
Big-O(Q) Social Network AnalyticsBig-O(Q) Social Network Analytics
Big-O(Q) Social Network AnalyticsLaks Lakshmanan
 

More from Laks Lakshmanan (6)

cuhk-fb-mi-talk.pdf
cuhk-fb-mi-talk.pdfcuhk-fb-mi-talk.pdf
cuhk-fb-mi-talk.pdf
 
Combating Fake News: Combating Fake News: A Data Management and Mining Perspe...
Combating Fake News: Combating Fake News: A Data Management and Mining Perspe...Combating Fake News: Combating Fake News: A Data Management and Mining Perspe...
Combating Fake News: Combating Fake News: A Data Management and Mining Perspe...
 
SDM 2019 Keynote
SDM 2019 KeynoteSDM 2019 Keynote
SDM 2019 Keynote
 
Big-O(Q) VLDB 2015 Keynote: Social Network Analytics: Beyond the Obvious
Big-O(Q) VLDB 2015 Keynote: Social Network Analytics: Beyond the ObviousBig-O(Q) VLDB 2015 Keynote: Social Network Analytics: Beyond the Obvious
Big-O(Q) VLDB 2015 Keynote: Social Network Analytics: Beyond the Obvious
 
Big-O(Q) Social Network Analytics
Big-O(Q) Social Network AnalyticsBig-O(Q) Social Network Analytics
Big-O(Q) Social Network Analytics
 
Pro max icdm2012-slides
Pro max icdm2012-slidesPro max icdm2012-slides
Pro max icdm2012-slides
 

Recently uploaded

My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Neo4j
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfngoud9212
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 

Recently uploaded (20)

E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdf
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 

Kdd12 tutorial-inf-part-ii

  • 1. Part I → Part II → Part III → Part IV → Part V Data and Software
  • 2. Part II: Outline  Types of datasets  Propagation of information “memes”  Propagation of other actions  Synthetic datasets  Software tools 2
  • 3. Contents of a dataset  Action traces ◦ Sometimes not obvious (e.g. gaining weight can be an action) ◦ Propagation explicitly / implicitly attributed  Social network ◦ Explicitly declared / Implicitly inferred ◦ Symmetrical / Non-symmetrical 3
  • 4. Data availability limits research  Often you have to pick two of these Includes Social Network Is Publicly Includes Available Action Traces 4
  • 5. Classification: according to availability  Proprietary, impossible or very hard to reproduce (e.g. shopping history in e- commerce) ◦ Increasingly being rejected in IR, DM communities  Proprietary, reproducible (e.g. web crawl of a sub-set of public websites)  Existing open dataset  New open dataset 5
  • 8. Microblogging data  Providers: Twitter, Identi.ca, Diaspora, etc. ◦ Directly or through data re-sellers  Actions: posting a message  Connections: explicitly declared, non- symmetrical  Propagations: explicitly linked (in principle), but implicitly linked (in practice) due to client implementations 8
  • 9. Extracting info. propagations  Idea: start from a large corpus and then extract information propagations ◦ Blogs, news articles, academic papers, generic web pages, etc. ◦ Simple in theory, extremely difficult in practice  Looking for citations doesn't work ◦ People on the web seldom attribute explicitly  Keywords and phrases ◦ Usually end up with a mixture of too broad (e.g. stylistic idioms) and/or too narrow (e.g. one specific copy of a news item) “topics” 9
  • 10. Using #hashtags and URLs  Twitter: #hashtags and URLs  With some exceptions ◦ #hashtags are too broad, ◦ URLs are too narrow  Let's propose two methods that can alleviate these problems ... 10
  • 11. Extracting info. propagations: Meme tracker  Public dataset: http://memetracker.org/  Tracks “mutated” key phrases in a document collection, example cluster: the fundamentals of our economy are strong the fundamentals of the economy are strong i promise you we will never put america in this position again we will clean up wall street the fundamentals of our economy are strong but these are very very difficult times and i promise you we will never put america in this position again but these are very very difficult times  No a-priori network exists. Inference methods are used. 11 [Leskovec et al. KDD 2009]
  • 12. Extracting info. propagations: Meme tracker 12 [Leskovec et al. KDD 2009]
  • 13. Extracting info. propagation: Trending topics  Method ◦ Look for “bursty” (spiky, trending) topics, represented e.g. as a collection of keywords ◦ Track the propagation of those topics  Rely on a proven method for burst detection ◦ The tricky part is not to detect the burst, but to represent it (e.g. as a query) e.g. Haiti earthquake tweets might not include “Haiti” or “earthquake” 13 [Mathioudakis and Koudas, SIGMOD 2010]
  • 14. Extracting information propagations: Other methods  Internet chain letters; look for copies online of petition letters 14 [Liben-Nowell and Kleinberg, PNAS 2008]
  • 15. Sampling issues  Issues with recall along information cascades ◦ e.g. twitter stream 1% sample gardenhose [Sadikov et al. WSDM 2011]
  • 16. Propagation of Other Actions 16
  • 17. Consuming media and products  Media consumption/appraisal platforms ◦ Examples: Flixter / Last.fm / GoodReads  Action: rating, watching, listening or reading a movie, a song, or a book  Connections: Explicit friendships ◦ Propagations: usually implicitly linked unless “recommend to a friend” feature is used and publicly available  Product recommendations ◦ Example: @cosme cosmetics recommendations 17
  • 18. @cosme recommendations 18 [Matsuo and Yamamoto, WWW 2009]
  • 19. Cross-provider data  One provides the network, the other the actions  MSN + Bing: Social network is MSN IM, actions are searches  YIM + YMovies 19 [Singla and Richardson, WWW 2008] [Goyal et al. CIKM 2008]
  • 20. Phone calls  Social networks are calls, actions are leaving the company (“churning”)  Some call datasets are available for academic labs (not for industrial ones) 20 [Dasgupta et al. EDBT 2008]
  • 21. Phone calls 21 [Senseable City Lab, MIT, 2009]
  • 22. Community membership  DBLP/Arnetminer ◦ Social network is co-authorship ◦ Action is publishing in a conference or publishing on a topic  Livejournal / Flickr ◦ Social network is friendship graph ◦ Action is joining a community/group  Bloglines ◦ Action is subscribing to a rss feed 22
  • 23. Other datasets  Flickr ◦ Explicit friendship, action is (1) favoring a photo or (2) using a tag  Digg/Reddit votes  Explicit friendship, action is vote-up 23
  • 24. Off-line datasets  Participation of women in 14 social activities over 9 months in US south (n=18)  Romantic network in a high school (n=288)  Medical records during 32 years (n=12,067)  Network only ◦ Zachary's Karate club ◦ Presumed acquaintances links between terrorist suspects (n=74, n=63 if main CC is used) 24
  • 25. “Chains of Affection” 25 [Bearman et al. Amer. Journal of Sociology 2004]
  • 26. “Chains of Affection” Probably not a future computer scientist  26
  • 27. Size proportional to BMI, yellow fill indicates obesity. Blue border=men, Red border=women 27 [Christakis and Fowler, New England Journal of Medicine 2007]
  • 29. Network data are widely available  Domains ◦ Online social networks: slashdot, epinions, … ◦ Communication: internet as, p2p, roads, … ◦ Collaboration: scientists, actors, jazz musicians, wikipedia editors, ... ◦ Citations: web graphs, academic publications, patents, … ◦ References: linked data in freebase/dbpedia, protein interactions, metabolic networks, ... 29
  • 30. Publishing your own datasets  Document every step of sampling, filtering, processing methodology  CC0 (public domain) data releases  Ad-hoc data releases: look at items in example agreements (duration, purpose, warranties, item deletion policies, etc.)  Privacy concerns  It may take some extra work, but remember that it is also in YOUR interest that your data is used by others 30
  • 32. Graph software Tools  Software ◦ SNAP [GPL] Gephi [GPL, gui] ◦ Pajek [Free for non-commercial use, Windows, gui ◦ Webgraph [GPL] Graphviz [GPL]  Graph generation, transformation, ◦ SNAP, Gephi, Pajek, Webgraph [compress], ...  Subgraphs: clustering, connected components, etc. Node metrics: centrality, local clustering coeff. ◦ SNAP, Gephi, Pajek  Graph visualization: Gephi, Pajek, Graphviz  Other: https://sites.google.com/site/ucinetsoftware/downloads 32
  • 33. Propagation software tools  SPINE software ◦ IC model ◦ Inference with given social network ◦ Sparsification of influence models  Internet network simulator  Ask authors, some software is known to be available on request 33
  • 35. Visualization 35 [Project Cascade by NYT Labs 2011]
  • 36. Key takeaways of part II  Data availability affects our research  Current alternatives are not good ◦ Results on proprietary data sources are not reproducible ◦ Synthetic information propagations might not be realistic  Software is not readily available  This is something to work on collectively! 36