Applications of Semantic Technology in the Real World Today
Hscb Focus 2010 Data Acquisition Extraction Management Debrief Jgm R1
1. Socio-Cultural Data Acquisition, Extraction and Management Dr. Jeffrey G. Morrison Combating Terror Technical Support Office (CTTSO/TSWG) Department of Defense morrisonj@tswg.gov jeffrey.g.morrison@ugov.gov
2. Key Concepts “Essentially, all models are wrong – but some are useful.” (George E. P. Box) Useful models need good, reliable data, therefore Good data is the key to solving the HSCB problem! Data interoperability is a must! We cannot afford multiple, incompatible databases There are critical technical and policy challenges to the data problem… Must focus on Data Access, Use and Sharing Model development & data collection take time (and will occur asynchronously) How do we keep them in sync and ahead of the needs for emerging models / data? Don’t forget the Users! HSCB is not about modelers, and the tools are not for modelers!
3. Data … What Data?? What is “HSCB” Data? What do we have? … and what do we need? So…. where is it? … Who can have it??? One man’s data is another’s model… Is it reasonable to expect that data developed for one purpose / model / culture is going to be usable for another?
4. Some Deep Data Questions When do you have enough data (to do ….)? We tend toward massive data today but without differentiation. Is this the best way to go? How is this extensible? Do we just keep adding data? What "social" information are we actually capturing other than the highly concrete and obvious dimensions? Should we constrain data by topic or domain? What claims can be made if we limit the topic/domain/time/collection methods? How do data types provide different information? Are the distinctions in data type important? What insights do the varying types provide? Data tends to be in English. Language contains strong social indicators. How do we collect data and adequately analyze it in other languages? If polling or interviews are used to collect data, what are we missing? Are we "over-claiming" based on these data?
5. Data Needs Develop appropriate HSCB taxonomies, ontologies,… Implement efforts to tailor HSCB data to satisfy the intended purposes Perform “Verification & Validation”: Data integrity, consistency, reliability, pedigree as metadata (record with the data) Update local, regional & national data, with appropriate periodicity Capture data on environment, attitudes & values in many dimensions (e.g., infrastructure, medical, attitudes, affiliations, legal systems) Assess a Central HSCB Data Repository (issues: classification, access, open source data, legal, granularity, qualitative data, maintenance, dissemination)
9. Level of detail varies from Individual to National, etc.
11. Specific Data Challenges Dynamic & Static Data Factoids & Models Meta-knowledge Analyst / Modeler Beliefs & Assumptions Culture Specific / General Raw (Source) data, Vetted (Finished) Data, Derivative Data Change over time or event Heterogeneity / homogeneity within data (or subject population) may be a key aspect of interpreting data. Summary Data Describing what’s in the data set The subject populations The context in which the data was collected Assumptions, Intended use, & known limitations Change in resolution Interpolation, abstraction, fusion within and across data sets
12. State of the Data Existing HSCB data sets are: Diffused, Difficult to find and access “If you don’t know it exists and where, you’re not going to find it!” Live in different security enclaves * The data you develop with will not be the data your tool / model is used with. Lack common references – hard to fuse Are rarely ready for use – they require clean up, conversion to fit current needs Lack necessary information to support analysis (e.g., adequate metadata, indications of pedigree) Don’t have a “Use Before” and “Expiration” date Don’t have a “Use For” description Etc.
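The gaps listed above (no pedigree, no “Expiration” date, no “Use For” description, different security enclaves) could be closed by a small metadata record kept with each data set. The following is a minimal sketch only; `DatasetRecord`, `usable`, and every field name are hypothetical illustrations and are not taken from any HSCB system.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DatasetRecord:
    """Hypothetical metadata carried alongside a data set."""
    name: str
    pedigree: str        # who collected the data and how
    use_for: set         # purposes the data is considered suitable for
    expires: date        # "Expiration" date after which the data is stale
    enclave: str = "unclassified"  # security enclave the data lives in


def usable(record: DatasetRecord, purpose: str, today: date) -> bool:
    """Return True if the data set is declared fit for `purpose` and not expired."""
    return purpose in record.use_for and today <= record.expires


survey = DatasetRecord(
    name="regional attitudes survey",
    pedigree="field interviews",
    use_for={"trend analysis"},
    expires=date(2011, 1, 1),
)
print(usable(survey, "trend analysis", date(2010, 6, 1)))
```

A check like this only makes the “Use For” and “Expiration” questions explicit; deciding what the right values are remains an analyst judgment.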
13. Jeff’s Data To-Do’s Make sure data is usable by multiple communities Develop Understanding for and Uses of Data: Support Extrapolations / Interpolations Assumptions How to manage Dynamic with Static data Aging of data Develop best practices for use of “Best Available” & Useful versus Best Ensure we don’t get caught up in Privacy / Personal Protection issues Propagate Understanding of the prospective users of data System Developers / Modelers / Analysts / Operational Users Capabilities & Limitations Define requirements for tools / techniques that would improve the utility of field data.
14. Overcoming the Challenges… Develop a single “melting pot” approach… Common Comprehensive Universally accessible (security-enclave aware) Scalable, grow-able ontology and architecture Develop a way to tag (and maybe even fill-in) missing data Make weaknesses explicit to models & users Develop and deploy data collection tools and aids that are compatible with melting pot
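One way to read “tag (and maybe even fill-in) missing data” and “make weaknesses explicit” is to annotate each record with which required fields are absent and which were filled from a default. The sketch below assumes records are plain dictionaries; `tag_missing` and the `_missing` / `_filled` keys are hypothetical names chosen for illustration.

```python
def tag_missing(row, required, defaults=None):
    """Return a copy of `row` that lists missing and default-filled fields."""
    defaults = defaults or {}
    tagged = dict(row)  # copy so the source record is left untouched
    missing, filled = [], []
    for name in required:
        if tagged.get(name) is None:
            if name in defaults:
                tagged[name] = defaults[name]  # fill in, but remember that we did
                filled.append(name)
            else:
                missing.append(name)
    tagged["_missing"] = missing  # weaknesses made explicit to models and users
    tagged["_filled"] = filled
    return tagged


row = {"region": "north", "population": None}
print(tag_missing(row, ["region", "population", "literacy"], {"population": 1000}))
```

Keeping the filled fields flagged lets a downstream model distinguish observed values from placeholders.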
15. Strategies for Data Management CTTSO FY10 HSCB BAA (June, 2009) R2532 - HSCB Dataset Repository & Management System Build a federation of Dataset Repositories Actively manage & broker data to both users & models. Development of meta-data / meta-knowledge R2533 – Data Translation & Brokering System Wizard to match data to model to user requirement HSCB Data Working Group
16. By the way… Belief and behavior are certainly related…but how closely are they correlated? How does the data relate??? Models are often validated in hindsight, based on expected outcomes. How do we collect the right kinds of data to predict the unexpected (outside the box)? Unexpected events Novel / New Outcomes When we don’t know what we don’t know.
17. As long as we’re talking about Culture … There’s culture…. Academic / Military Analytic / Operational …and then there’s “Culture” “People” (Us vs. Them) Family Friends Tribes Highly dependent on CONTEXT ****Subject to change without notice!****
18. Take-Aways Useful models need good, reliable data, therefore Good Data is the key to solving the HSCB problem! Data interoperability is a must! Model development & Data collection take time (and will evolve asynchronously) Don’t forget the Users!