Socio-Cultural Data Acquisition, Extraction and Management
Dr. Jeffrey G. Morrison
Combating Terrorism Technical Support Office (CTTSO/TSWG)
Department of Defense
morrisonj@tswg.gov
jeffrey.g.morrison@ugov.gov

Key Concepts
  • “Essentially, all models are wrong – but some are useful.” (George E. P. Box)
  • Useful models need good, reliable data; therefore, good data is the key to solving the HSCB problem!
  • Data interoperability is a must!
      • We cannot afford multiple, incompatible databases
      • There are critical technical and policy challenges to the data problem…
      • Must focus on Data Access, Use and Sharing
  • Model development & data collection take time (and will occur asynchronously)
      • How do we keep them in sync and ahead of the needs for emerging models / data?
  • Don’t forget the Users!
      • HSCB is not about modelers, and the tools are not for modelers!

Data … What Data??
  • What is “HSCB” Data?
  • What do we have? … and what do we need?
  • So… where is it? … Who can have it???
  • One man’s data is another’s model…
  • Is it reasonable to expect that data developed for one purpose / model / culture is going to be usable for another?

Some Deep Data Questions
  • When do you have enough data (to do ….)?
  • We tend toward massive data today but without differentiation. Is this the best way to go? How is this extensible? Do we just keep adding data? What "social" information are we actually capturing other than the highly concrete and obvious dimensions?
  • Should we constrain data by topic or domain? What claims can be made if we limit the topic/domain/time/collection methods?
  • How do data types provide different information? Are the distinctions in data type important? What insights do the varying types provide?
  • Data tends to be in English. Language contains strong social indicators. How do we collect data and adequately analyze it in other languages?
  • If polling or interviews are used to collect data, what are we missing? Are we "over-claiming" based on these data?

Data Needs
  • Develop appropriate HSCB taxonomies, ontologies,…
  • Implement efforts to tailor HSCB data to satisfy the intended purposes
  • Perform “Verification & Validation”: data integrity, consistency, reliability, pedigree as metadata (recorded with the data)
  • Update local, regional & national data with appropriate periodicity
  • Capture data on environment, attitudes & values in many dimensions (e.g., infrastructure, medical, attitudes, affiliations, legal systems)
  • Assess a Central HSCB Data Repository (issues: classification, access, open source data, legal, granularity, qualitative data, maintenance, dissemination)

Data Foci
  • Silos shaped by the community they derive from:
      • Geospatial / Social Networking / Terrorism / Basic Attitudes & Values
  • Level of detail varies from Individual to National, etc.
  • Do they relate or influence each other? How?

Specific Data Challenges
  • Dynamic & Static Data
  • Factoids & Models
  • Meta-knowledge
      • Analyst / Modeler Beliefs & Assumptions
      • Culture Specific / General
  • Raw (Source) data, Vetted (Finished) Data, Derivative Data
  • Change over time or event
  • Heterogeneity / homogeneity within data (or subject population) may be a key aspect of interpreting data.
  • Summary Data
      • Describing what’s in the data set
      • The subject populations
      • The context in which the data was collected
      • Assumptions, Intended use, & known limitations
  • Change in resolution
      • Interpolation, abstraction, fusion within and across data sets

State of the Data
Existing HSCB data sets:
  • Are diffused
  • Are difficult to find and access: “If you don’t know it exists and where, you’re not going to find it!”
  • Live in different security enclaves
      • The data you develop with will not be the data your tool / model is used with.
  • Lack common references – hard to fuse
  • Are rarely ready for use – they require clean up, conversion to fit current needs
  • Lack necessary information to support analysis (e.g., adequate metadata, indications of pedigree)
  • Don’t have a “Use Before” and “Expiration” date
  • Don’t have a “Use For” description
  • Etc.

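One way to picture the missing “Use Before” date and “Use For” description is as a small metadata record that travels with each data set. The sketch below is only an illustration under stated assumptions: the class `DatasetMetadata`, its field names, the function `is_fit_for_use`, and the sample entry are all hypothetical and do not come from any HSCB system.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class DatasetMetadata:
    """Hypothetical descriptor stored alongside a data set."""
    name: str
    pedigree: str            # who collected the data and how
    collected_on: date
    use_before: date         # the "Use Before" / expiration date
    use_for: List[str] = field(default_factory=list)  # intended purposes


def is_fit_for_use(meta: DatasetMetadata, purpose: str, today: date) -> bool:
    """True only if the data set has not expired and lists the purpose."""
    return today <= meta.use_before and purpose in meta.use_for


# Made-up entry, for illustration only.
survey = DatasetMetadata(
    name="regional-attitudes-survey",
    pedigree="field interviews (illustrative entry)",
    collected_on=date(2009, 3, 1),
    use_before=date(2011, 3, 1),
    use_for=["attitude-trend-modeling"],
)
```

With a record like this, a tool could refuse or flag a data set that is past its date or is being applied to a purpose it was never collected for, rather than leaving that judgment implicit.
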
Jeff’s Data To-Do’s
  • Make sure data is usable by multiple communities
  • Develop Understanding for and Uses of Data:
      • Support Extrapolations / Interpolations
      • Assumptions
      • How to manage Dynamic with Static data
      • Aging of data
  • Develop best practices for use of “Best Available” & Useful versus Best
  • Ensure we don’t get caught up in Privacy / Personal Protection issues
  • Propagate Understanding of the prospective users of data
      • System Developers / Modelers / Analysts / Operational Users
      • Capabilities & Limitations
  • Define requirements for tools / techniques that would improve the utility of field data.

Overcoming the Challenges…
  • Develop a single “melting pot” approach…
      • Common
      • Comprehensive
      • Universally accessible (security-enclave aware)
      • Scalable, grow-able ontology and architecture
  • Develop a way to tag (and maybe even fill in) missing data
      • Make weaknesses explicit to models & users
  • Develop and deploy data collection tools and aids that are compatible with the melting pot

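The idea of tagging (and possibly filling in) missing data while keeping weaknesses explicit can be sketched as follows. This is a minimal illustration, not a proposed design: the status labels, the functions `tag_record` and `weaknesses`, and the sample values are assumptions made up for the example.

```python
from typing import Any, Dict, List, Optional, Tuple

OBSERVED = "observed"
IMPUTED = "imputed"
MISSING = "missing"


def tag_record(record: Dict[str, Optional[Any]],
               defaults: Dict[str, Any]) -> Dict[str, Tuple[Any, str]]:
    """Return each field as (value, status) so gaps stay visible downstream."""
    tagged: Dict[str, Tuple[Any, str]] = {}
    for key, value in record.items():
        if value is not None:
            tagged[key] = (value, OBSERVED)
        elif key in defaults:
            # Filled in from a default; the tag records that it was not observed.
            tagged[key] = (defaults[key], IMPUTED)
        else:
            tagged[key] = (None, MISSING)
    return tagged


def weaknesses(tagged: Dict[str, Tuple[Any, str]]) -> List[str]:
    """List the fields a model or user should treat with caution."""
    return sorted(k for k, (_, status) in tagged.items() if status != OBSERVED)


# Made-up values, for illustration only.
raw = {"region": "north", "literacy_rate": None, "median_age": None}
tagged = tag_record(raw, defaults={"median_age": 24})
```

Here `weaknesses(tagged)` reports both the filled-in field and the still-missing one, so a consuming model never mistakes an imputed value for an observed one.
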
Strategies for Data Management
  • CTTSO FY10 HSCB BAA (June 2009)
      • R2532 – HSCB Dataset Repository & Management System
          • Build a federation of Dataset Repositories
          • Actively manage & broker data to both users & models
          • Development of meta-data / meta-knowledge
      • R2533 – Data Translation & Brokering System
          • Wizard to match data to model to user requirement
  • HSCB Data Working Group
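
As a rough illustration of what matching data to a model's requirements might involve, the sketch below ranks catalog entries by how many of the needed fields they cover at the requested level of detail. It is not a description of R2532 or R2533: `CatalogEntry`, `match`, the scoring rule, and the sample catalog are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class CatalogEntry:
    """Hypothetical catalog record for one data set."""
    name: str
    fields: FrozenSet[str]
    level: str  # e.g. "individual", "regional", "national"


def match(catalog: List[CatalogEntry],
          needed_fields: FrozenSet[str],
          level: str) -> List[Tuple[str, float]]:
    """Rank entries at the requested level by fraction of needed fields covered."""
    if not needed_fields:
        return []
    ranked = []
    for entry in catalog:
        if entry.level != level:
            continue
        coverage = len(entry.fields & needed_fields) / len(needed_fields)
        if coverage > 0:
            ranked.append((entry.name, coverage))
    # Best coverage first; ties keep catalog order because sorted() is stable.
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Made-up catalog, for illustration only.
catalog = [
    CatalogEntry("A", frozenset({"attitudes", "affiliations"}), "regional"),
    CatalogEntry("B", frozenset({"infrastructure"}), "regional"),
    CatalogEntry("C", frozenset({"attitudes", "infrastructure"}), "national"),
]
needs = frozenset({"attitudes", "affiliations", "infrastructure", "medical"})
result = match(catalog, needs, "regional")
```

A coverage score below 1.0 tells the user which requests cannot be fully met by any single data set, which is itself useful input when deciding what to collect next.
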

Hscb Focus 2010 Data Acquisition Extraction Management Debrief Jgm R1

By the way…
  • Belief and behavior are certainly related… but how closely are they correlated? How does the data relate???
  • Models are often validated in hindsight, based on expected outcomes.
  • How do we collect the right kinds of data to predict the unexpected (outside the box)?
      • Unexpected events
      • Novel / New Outcomes
      • When we don’t know what we don’t know.

As long as we’re talking about Culture…
  • There’s culture…
      • Academic / Military
      • Analytic / Operational
  • …and then there’s “Culture”
      • “People” (Us vs. Them)
      • Family
      • Friends
      • Tribes
  • Highly dependent on CONTEXT
  • ****Subject to change without notice!****

Take-Aways
  • Useful models need good, reliable data; therefore, good data is the key to solving the HSCB problem!
  • Data interoperability is a must!
  • Model development & data collection take time (and will evolve asynchronously)
  • Don’t forget the Users!