ENHANCING THE C-SPAN ARCHIVE
WITH COMMUNICATIVE
METADATA:
A PRACTICE CAPITAL PROPOSAL
Sorin Adam Matei
Associate Professor
Discovery Park and Polytechnic Institute Fellow
Director of Research for Computational Social Science,
CyberCenter

BRIAN LAMB SCHOOL OF COMMUNICATION
DATA EVERYWHERE
• The C-SPAN Archive is a Big Data repository
• Social and Political Big Data
• Captures not just words or moving images
but INTERACTIONS

AN INTERACTION REPOSITORY
• The C-SPAN archive captures who said what to whom
• Sender → Message → Receiver
• Concatenated, such chains of interaction
become SOCIAL NETWORKS OF DEBATE

COMMUNICATIVE META-DATA
• Each member of the network can be evaluated
for his or her role, importance, and impact
• The role, importance and impact can be turned
into search and visualization criteria both for the
speakers and for what was said
• Meta-data is data that describes the context of
the speech-act and can extend the search past
tags, keywords, author, or time

SOCIAL NETWORKS – THE BRIEFEST INTRO
• Mapping people as members of a network
reveals things that are not immediately
apparent

• What is important is not how much you talk to
other people, but how central you are in the
debate

THE IMPORTANCE OF BEING CENTRAL
• Centrality
– Simple
• How many conversation partners you have
• Follow the distribution of contributions

– Complex and subtle
• How important are you in the network of
communications?
• If you were not there, would the network be poorer?

THE MAGIC OF BETWEENNESS CENTRALITY
• Node 1 (in the figure) is the most central node, although it is
not the most directly connected
• It might even be a very unimportant node (by attributes), or
even an ignored one
• It is potentially a bridge maker and connector
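The point about bridge nodes can be checked on a toy graph. The sketch below is a minimal illustration, not the project's own code: it uses Brandes' algorithm (a standard method, assumed here) on a hypothetical seven-node undirected network in which node 0 links two tightly knit triangles. The class name, node numbering, and edge list are all made up for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

final class BetweennessSketch {

    // Builds undirected adjacency lists from an edge list (hypothetical helper).
    static List<List<Integer>> undirected(int n, int[][] edges) {
        List<List<Integer>> adj = new ArrayList<>();
        for (int i = 0; i < n; i++) adj.add(new ArrayList<>());
        for (int[] e : edges) {
            adj.get(e[0]).add(e[1]);
            adj.get(e[1]).add(e[0]);
        }
        return adj;
    }

    // Brandes' algorithm for an undirected, unweighted graph.
    static double[] betweenness(List<List<Integer>> adj) {
        int n = adj.size();
        double[] score = new double[n];
        for (int s = 0; s < n; s++) {
            List<List<Integer>> preds = new ArrayList<>();
            for (int i = 0; i < n; i++) preds.add(new ArrayList<>());
            double[] sigma = new double[n]; // number of shortest paths from s
            int[] dist = new int[n];
            Arrays.fill(dist, -1);
            sigma[s] = 1.0;
            dist[s] = 0;
            Deque<Integer> queue = new ArrayDeque<>();
            Deque<Integer> visited = new ArrayDeque<>(); // used as a stack
            queue.add(s);
            while (!queue.isEmpty()) { // breadth-first search from s
                int v = queue.poll();
                visited.push(v);
                for (int w : adj.get(v)) {
                    if (dist[w] < 0) {
                        dist[w] = dist[v] + 1;
                        queue.add(w);
                    }
                    if (dist[w] == dist[v] + 1) {
                        sigma[w] += sigma[v];
                        preds.get(w).add(v);
                    }
                }
            }
            double[] delta = new double[n]; // dependency of s on each node
            while (!visited.isEmpty()) {    // farthest nodes first
                int w = visited.pop();
                for (int v : preds.get(w)) {
                    delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w]);
                }
                if (w != s) score[w] += delta[w];
            }
        }
        // Each unordered pair was counted from both endpoints.
        for (int i = 0; i < n; i++) score[i] /= 2.0;
        return score;
    }

    public static void main(String[] args) {
        // Two triangles {1,2,3} and {4,5,6}; node 0 bridges nodes 1 and 4.
        int[][] edges = {{1, 2}, {1, 3}, {2, 3}, {4, 5}, {4, 6}, {5, 6}, {0, 1}, {0, 4}};
        List<List<Integer>> adj = undirected(7, edges);
        double[] b = betweenness(adj);
        for (int i = 0; i < b.length; i++) {
            System.out.println("node " + i + ": degree=" + adj.get(i).size()
                    + " betweenness=" + b[i]);
        }
    }
}
```

In this toy network the bridge node has only two ties, fewer than nodes 1 and 4, yet every shortest path between the two triangles passes through it, so it receives the highest betweenness score.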
PRACTICE CAPITAL
• Practice: working together within a human space
• Co-work ties are practice ties, not necessarily
communicative
• Practice ties can be detected via network
analysis
• High betweenness in practice space = high
practice capital

HOW DOES THIS MATTER?
• Mapping social conversations as networks
• Reveals the unseen powerbrokers or bridgemakers

• Suggests new information cues and selection
criteria for browsing the videos
• Facilitates a new kind of “data journalism”

AN EXAMPLE: JOINT SELECT COMMITTEE ON BUDGET
DEFICIT REDUCTION HEARINGS
• October and November 2011
• 17 speakers: representatives, senators, and
former presidential administration
staffers/players
• 280 minutes of conversation
• Over 115 turns of speech

http://c-spanvideo.org/topic/85
TURNING CONVERSATIONS INTO NETWORKS
• Analyze who is speaking to whom

• Create conversation ties whose strength decays as more time
passes between turns of speech
• Speakers whose turns are closest to each other in time are the most
connected; those more distant are exponentially less connected
• The better connected a speaker is, as measured by centrality in
practice space, the higher that speaker's practice capital
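One way the decaying ties described above might be computed is sketched below. The slides do not give the exact formula, so everything here is an assumption for illustration: the weight exp(-lambda × gap), the use of turn start times in minutes, the pairing of every two turns by different speakers, and the names `Turn`, `tieWeights`, and `lambda`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class DecayTiesSketch {

    // One turn of speech: who spoke and when the turn started (hypothetical schema).
    static final class Turn {
        final String speaker;
        final double startMinute;

        Turn(String speaker, double startMinute) {
            this.speaker = speaker;
            this.startMinute = startMinute;
        }
    }

    // Sums an exponentially decaying weight over every pair of turns by
    // different speakers. Keys look like "A|B" with names in sorted order.
    static Map<String, Double> tieWeights(List<Turn> turns, double lambda) {
        Map<String, Double> weights = new TreeMap<>();
        for (int i = 0; i < turns.size(); i++) {
            for (int j = i + 1; j < turns.size(); j++) {
                Turn a = turns.get(i);
                Turn b = turns.get(j);
                if (a.speaker.equals(b.speaker)) continue; // no self-ties
                double gap = Math.abs(b.startMinute - a.startMinute);
                double w = Math.exp(-lambda * gap); // assumed decay function
                String key = a.speaker.compareTo(b.speaker) < 0
                        ? a.speaker + "|" + b.speaker
                        : b.speaker + "|" + a.speaker;
                weights.merge(key, w, Double::sum);
            }
        }
        return weights;
    }

    public static void main(String[] args) {
        List<Turn> turns = new ArrayList<>();
        turns.add(new Turn("A", 0.0));
        turns.add(new Turn("B", 1.0));
        turns.add(new Turn("C", 10.0));
        // With lambda = ln 2, a tie halves for every minute between turns.
        Map<String, Double> w = tieWeights(turns, Math.log(2.0));
        for (Map.Entry<String, Double> e : w.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

The resulting weighted ties could then feed a weighted centrality calculation. The decay rate is a tuning choice: a larger `lambda` makes only near-adjacent turns count as connected.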

TECHNOLOGY WAS TESTED
• Methodology
already applied to
Wikipedia
• We created a
network of 3
million nodes
• Code is written in
Java, is open
source, and will be
released soon

TEST ANALYSIS APPLIED TO A C-SPAN DEBATE
Baucus
Becerra
Bowles
Camp
Clyburn
Domenici
Elmendorf
Hensarling
Kerry
Kyl
Murray
Portman
Rivlin
Simpson
Toomey
Upton
Van Hollen

Two groups, several central talkers. Solid lines mark the strongest relationships.
HOW DOES CENTRALITY CHANGE THE STORY?
[Figure: two bar charts, "Betweenness Centrality" and "Speech Minutes", each ranking Clyburn, Bowles, Domenici, Rivlin, and Elmendorf. Legible bar labels: 72.3, 49.86, 39.25, 32.5, 6.]

The speakers who talk the most are not the most central members of the debate in terms of practice capital
THE MODEST PROPOSAL

Add search criteria for centrality, verbosity (amount), and persistence (turns of speech)
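A minimal sketch of how two of these criteria could be derived and searched follows. It assumes verbosity means total minutes spoken and persistence means the number of turns, as the parentheses above suggest. The `Turn` and `Metrics` types, the `search` thresholds, and the sample data are hypothetical, and centrality is left out because it would come from the network analysis rather than from simple counts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class SpeakerMetricsSketch {

    // One turn of speech and its length in minutes (hypothetical schema).
    static final class Turn {
        final String speaker;
        final double minutes;

        Turn(String speaker, double minutes) {
            this.speaker = speaker;
            this.minutes = minutes;
        }
    }

    // Per-speaker totals that could be stored as searchable metadata.
    static final class Metrics {
        double verbosityMinutes; // total amount of speech
        int persistenceTurns;    // number of turns of speech
    }

    static Map<String, Metrics> summarize(List<Turn> turns) {
        Map<String, Metrics> bySpeaker = new TreeMap<>();
        for (Turn t : turns) {
            Metrics m = bySpeaker.computeIfAbsent(t.speaker, k -> new Metrics());
            m.verbosityMinutes += t.minutes;
            m.persistenceTurns += 1;
        }
        return bySpeaker;
    }

    // Returns speakers meeting both minimum thresholds, in name order.
    static List<String> search(Map<String, Metrics> metrics, double minMinutes, int minTurns) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Metrics> e : metrics.entrySet()) {
            Metrics m = e.getValue();
            if (m.verbosityMinutes >= minMinutes && m.persistenceTurns >= minTurns) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Turn> turns = new ArrayList<>();
        turns.add(new Turn("A", 5.0));
        turns.add(new Turn("B", 2.0));
        turns.add(new Turn("A", 3.5));
        Map<String, Metrics> metrics = summarize(turns);
        System.out.println("Speakers with at least 2 turns: " + search(metrics, 0.0, 2));
    }
}
```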
LOOKING FORWARD
• Analyze the entire C-SPAN video corpus; generate
centrality, verbosity, and persistence scores for each
debater
• Store info, create service that serves data
alongside other metadata
• Allow third parties to create visualization tools
and apps that indicate the degree of connectedness
of speakers in practice space
• Visualize practice capital

Thank you!

QUESTIONS? COMMENTS?

Enhancing C-Span Video Archive with Practice Capital Metadata and data journalism APIs
