3rd Socio-Cultural Data Summit


Introductory slides for the 3rd Socio-Cultural Data Summit, held at the National Defense University in November 2012


3rd Socio-Cultural Data Summit

  1. 1. 3rd Socio-Cultural Data Summit: National Defense University, Center for Technology and National Security Policy
  2. 2. Admin
     • Unclassified conference
     • Chatham House rules
     • Lunch in the new fiscal reality (the cafeteria)
     • We have breaks and time built into our schedule to continue discussions or to sidebar
  3. 3. Data Summit(s) Objective
     • “Good” data are required for reliable analysis.
       − Socio-cultural data of any sort are hard to find.
       − When we do find them, they are messy, fragmented, disorganized, poorly measured, etc.
     • These Data Summits are committed to fostering a community that is interested in finding, evaluating, collecting, cleaning up, smartly integrating, and then using socio-cultural data against applied problems with scientific rigor.
       − Focus on a broad community with as few restrictions as possible.
       − Focus on rigor and science without sacrificing the ability to conduct real-world applications.
  4. 4. Logical Progression of these Data Efforts
     1. DataCards: quick and dirty effort to find, tag, and index data of all sorts for as many audiences as possible to reduce search costs for socio-cultural data.
     2. First Data Summit: Take a first cut at data evaluation criteria and beat the heck out of it in working groups so that we can start to evaluate the socio-cultural data that we’ve found.
     3. Second Data Summit: Expand the aperture on what constitutes data and relate working group insights back to prior evaluation criteria and lessons learned for continuing to find and define data.
     4. Third Data Summit: Start to tackle the complex issue of “how we put the data together” once we have found it, with more working groups focused on areas where we perceive we can make concrete progress on data integration, cleaning, and fusion.
  5. 5. DataCards Overview
     • DataCards is a structured wiki-like platform that uses “cards” (like card catalog cards or baseball cards) to index and describe key details re: socio-cultural (and related) data sources.
     • Objectives of DataCards include:
       – Make sources of data discoverable.
       – Reduce search costs for data.
       – Conduit to discover and share data sources between and among non-traditional, academic, NGO, defense, law enforcement, and intelligence communities.
     • Accessing DataCards:
       – Commercial Internet: http://www.datacards.org/
       – Development Site: http://beta.datacards.org/
       – SIPRNet: by request, hosted by OSD CAPE
  6. 6. DataCards Content/Usage Update
     • Total cards: 1,682 (2,416 pending additional cards)
     • Total datacards.org users: 537
     • Since .org launch: 5,703 visits; 54,229 pageviews; 00:10:40 average time/visit; multiple visits from 28 countries
  7. 7. Related to DataCards
  8. 8. Summary of 1st Data Summit
     • Data, and the quality of the data, used for applied socio-cultural work for the DoD and other agencies is generally poor.
       • Often general and hard to apply to real-world situations
       • Rarely evaluated, and even more rarely evaluated objectively
     • Worked on data evaluation criteria so that a “smart person” isn’t needed to evaluate data sources.
       • Smart people were used to create the criteria; “smart people in training” will apply the ratings.
       • The ratings shouldn’t rely on the experience of the rater, but on the quality of the criteria.
     • The effort acknowledged that one size does not fit all requirements, and criteria should be flexible enough to accommodate a variety of conceptions of what constitutes “data.”
     • DataCards helps consumers of socio-cultural data rapidly find the data they need. The evaluation criteria help assess the suitability and quality of possible data sources for their desired application.
  9. 9. Summary of 2nd Data Summit
     • “Data” is a user-defined term; it is not specific to one particular type of data. DataCards is a platform with a wide user base with varied data needs. DataCards should seek to assist with the discovery and evaluation of data sources.
     • Big data is a growing field of interest within analytical and knowledge communities. Big data, which was defined by the complexity, structure, and size of data, is not just social media but is generally transactional in nature, including financial transactions, SMS, and search engine results.
     • Many data sources are qualitative in nature and cannot be analyzed and machine processed the way quantitative or geospatial data are processed and analyzed.
     • The most important considerations for users of geospatial data are robust searching capabilities, a minimal path to finding data, and complete data.
     • There is no single way that individuals find data. Discovery is often project-specific, and individuals tend to establish and follow predictable patterns of behavior when finding data because certain sources tend to be proven relevant and trustworthy.
  10. 10. What is this Summit About?
     • This summit is about getting the mess of socio-cultural “stuff” we often call data into a usable analytic format.
     • The first panel focuses on two unique and innovative approaches toward putting data together for intelligence and analytic purposes; and a Phase 3 IARPA program that is rapidly fusing data in support of the intelligence community’s requirements for integrated and disparate data.
     • The second panel focuses on two of the major types of data that are often trumpeted as the silver bullet to understanding all things socio-cultural: social media and polling/surveys. However, these are great case studies in the potential pitfalls of data aggregation without careful thought about what it is you are putting together.
  11. 11. What is this Summit About? (continued)
     • The third panel provides three approaches to dealing with socio-cultural data, with moderate technical detail. This includes a look at the application of statistics to missing data, the dirty work of getting socio-cultural data ready for a DARPA program, and dealing with situations where socio-cultural data are sparse.
     • Tomorrow, the fourth panel will focus on scientific and technical approaches to information extraction and data fusion challenges.
     • The fifth panel will offer up thoughts on three compelling and promising areas for socio-cultural data integration: geospatial data of multiple resolutions, qualitative/subject matter expert-derived data, and human geography data.
     • We’ll end after lunch with a discussion about how we as a community want to proceed on this conquest.
  12. 12. What Do I Want to Get Out Of this Summit?
     • Community-building and the invigoration of new ideas to support better work with socio-cultural data.
     • Feedback on what methods we are missing and what has merit.
     • Feedback on what the forward operator needs from a group like this. This includes the warfighter, but also law enforcement officers, NGOs, partner nations, foreign service officers, economic development professionals: anyone working in the field to make a difference.
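
As a rough aid to reading the usage figures on slide 6 (DataCards Content/Usage Update), the sketch below derives a few ratios from the numbers quoted there. The input values come from the slide; the derived metrics and variable names are illustrative and not part of the deck. It assumes the 2,416 pending cards are in addition to the 1,682 existing cards, and that average time per visit can be spread evenly across pageviews, which is only an approximation.

```python
# Figures quoted on slide 6 of the deck.
total_cards = 1_682
pending_cards = 2_416              # assumed to be in addition to total_cards
visits = 5_703
pageviews = 54_229
avg_visit_seconds = 10 * 60 + 40   # 00:10:40 average time per visit

# Derived ratios (illustrative; not stated in the slides).
pages_per_visit = pageviews / visits
seconds_per_page = avg_visit_seconds / pages_per_visit  # assumes time is spread evenly over pages
pending_share = pending_cards / (total_cards + pending_cards)

print(f"Pages per visit: {pages_per_visit:.1f}")
print(f"Approx. seconds per page: {seconds_per_page:.0f}")
print(f"Share of cards still pending: {pending_share:.0%}")
```

On these figures a typical visit covers roughly nine to ten pages, and more cards are waiting to be added than are currently published.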