Enhancing C-Span Video Archive with Practice Capital Metadata and data journalism APIs


The presentation argues that the C-Span archive is not a mere repository of moving pictures. It can also be seen as a one-of-a-kind “big data” repository. If processed from a “practice capital” perspective with quantitative and network analytic tools, such data can significantly extend the capabilities of the C-Span archive by identifying the central actors in a debate and their ability to sway it. The proposed approach may serve the public interest through API tools that support third-party development of visualization and analytic apps, which can lead to more informed debates and new forms of data-driven journalism.



  1. ENHANCING THE C-SPAN ARCHIVE WITH COMMUNICATIVE METADATA: A PRACTICE CAPITAL PROPOSAL • Sorin Adam Matei, Associate Professor, Discovery Park and Polytechnic Institute Fellow, Director of Research for Computational Social Science, CyberCenter • BRIAN LAMB SCHOOL OF COMMUNICATION
  2. DATA EVERYWHERE • The C-Span Archive is a Big Data repository • Social and Political Big Data • Captures not just words or moving images but INTERACTIONS
  3. AN INTERACTION REPOSITORY • The C-Span archive captures who said what to whom • Sender → Message → Receiver • Concatenated, such chains of interaction become SOCIAL NETWORKS OF DEBATE
  4. COMMUNICATIVE META-DATA • Each member of the network can be evaluated for his or her role, importance, and impact • The role, importance, and impact can be turned into search and visualization criteria both for the speakers and for what was said • Meta-data is data that describes the context of the speech-act and can extend the search past tags, keywords, author, or time
  5. SOCIAL NETWORKS – THE BRIEFEST INTRO • Mapping people as members of a network reveals things that are not immediately apparent • What is important is not how much you talk to other people, but how central you are in the debate
  6. THE IMPORTANCE OF BEING CENTRAL • Centrality, simple: how many conversation partners you have; follow the distribution of contributions • Centrality, complex and subtle: how important are you in the network of communications? If you were not there, would the network be poorer?
  7. THE MAGIC OF BETWEENNESS CENTRALITY • Node 1 in the slide's network diagram is the most central node, although it is not the most directly connected • It might even be a very unimportant (by attributes) node, or even an ignored one • It is potentially a bridge maker and connector
  8. PRACTICE CAPITAL • Practice: working together within a human space • Co-work ties are practice ties, not necessarily communicative • Practice ties can be detected via network analysis • High betweenness in practice space = high practice capital
  9. HOW DOES THIS MATTER? Mapping social conversations as networks: • Reveals the unseen power brokers or bridge makers • Suggests new information cues and selection criteria for browsing the videos • Facilitates a new kind of “data journalism”
  10. AN EXAMPLE: JOINT SELECT COMMITTEE ON BUDGET DEFICIT REDUCTION HEARINGS • October and November 2011 • 17 speakers: representatives, senators, former presidential administration staffers/players • 280 minutes of conversation • Over 115 turns of speech • http://c-spanvideo.org/topic/85
  11. TURNING CONVERSATIONS INTO NETWORKS • Analyze who is speaking to whom • Create conversation ties that decay as more time passes between turns of speech • Speakers whose turns are closest in time are the most connected; those more distant are exponentially less connected • The higher the connection, as defined by centrality in practice space, the higher the practice capital
  12. TECHNOLOGY WAS TESTED • Methodology already applied to Wikipedia • We created a network of 3 million nodes • Code is written in Java, is open source, and will be released soon
  13. TEST ANALYSIS APPLIED TO A C-SPAN DEBATE • [Network diagram of the speakers: Baucus, Becerra, Bowles, Camp, Clyburn, Domenici, Elmendorf, Hensarling, Kerry, Kyl, Murray, Portman, Rivlin, Simpson, Toomey, Upton, Van Hollen] • Two groups, several central talkers • Solid lines mark the strongest relationships
  14. HOW DOES CENTRALITY CHANGE THE STORY? • [Two bar charts, "Betweenness Centrality" and "Speech Minutes", for Clyburn, Bowles, Domenici, Rivlin, and Elmendorf] • The highest talkers are not the most central (highest practice capital) members of the debate
  15. THE MODEST PROPOSAL • Add search criteria for centrality, verbosity (amount), and persistence (turns of speech)
  16. LOOKING FORWARD • Analyze the entire C-Span video corpus and generate centrality, verbosity, and persistence for each debater • Store the info and create a service that serves the data alongside other metadata • Allow third parties to create visualization tools and apps that indicate the degree of connectedness of speakers in practice space • Visualize practice capital
  17. Thank you! QUESTIONS? COMMENTS?
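
The tie-building rule on slide 11 and the search criteria on slide 15 (centrality, verbosity, persistence) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' Java code, which the slides describe as not yet released. The toy transcript, the names TURNS, decayed_ties, build_graph, betweenness, and profile, the decay rate, the tie threshold, and the choice to measure the gap from the end of one turn to the start of the next are all hypothetical; the slides say only that ties decay exponentially with the time between turns. Betweenness is computed here with Brandes' algorithm on the thresholded, unweighted graph, which may differ from the method used in the presentation.

```python
import math
from collections import defaultdict, deque

# Hypothetical toy transcript: (speaker, start_minute, duration_minutes).
TURNS = [
    ("A", 0.0, 10.0),
    ("B", 10.0, 8.0),
    ("M", 18.0, 2.0),  # brief speaker who sits between two groups
    ("C", 20.0, 9.0),
    ("D", 29.0, 7.0),
]


def decayed_ties(turns, rate=0.5):
    """Sum exp(-rate * gap) over every pair of turns by different speakers.

    gap = minutes from the end of the earlier turn to the start of the later
    one (an assumption; the slides do not define the gap precisely).
    """
    turns = sorted(turns, key=lambda t: t[1])
    weights = defaultdict(float)
    for i, (s1, start1, dur1) in enumerate(turns):
        for s2, start2, _ in turns[i + 1:]:
            if s1 != s2:
                gap = max(0.0, start2 - (start1 + dur1))
                weights[frozenset((s1, s2))] += math.exp(-rate * gap)
    return dict(weights)


def build_graph(turns, ties, min_weight=0.5):
    """Keep only ties at or above min_weight (hypothetical cutoff)."""
    adj = {speaker: set() for speaker, _, _ in turns}
    for pair, weight in ties.items():
        if weight >= min_weight:
            a, b = tuple(pair)
            adj[a].add(b)
            adj[b].add(a)
    return adj


def betweenness(adj):
    """Brandes' algorithm for an undirected, unweighted graph."""
    score = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, queue = [], deque([s])
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0.0)  # number of shortest paths from s
        sigma[s] = 1.0
        dist = {s: 0}
        while queue:  # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:  # accumulate dependencies, farthest nodes first
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: c / 2.0 for v, c in score.items()}


def profile(turns):
    """Verbosity (total minutes) and persistence (turn count) per speaker."""
    stats = defaultdict(lambda: {"verbosity": 0.0, "persistence": 0})
    for speaker, _, duration in turns:
        stats[speaker]["verbosity"] += duration
        stats[speaker]["persistence"] += 1
    return dict(stats)


ties = decayed_ties(TURNS)
graph = build_graph(TURNS, ties)
centrality = betweenness(graph)
stats = profile(TURNS)
for speaker in sorted(centrality, key=centrality.get, reverse=True):
    print(speaker, centrality[speaker], stats[speaker])
```

In this toy data, M speaks for the fewest minutes yet has the highest betweenness, which mirrors the point of slide 14 that the highest talkers are not necessarily the most central members of a debate. A service like the one proposed on slide 16 could store these three values per speaker alongside the existing metadata.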