SURVEY OF COMMONALITY WITH OTHER DISCIPLINES
WORKSHOP 2 – JULY 25, 2013
INDIANAPOLIS, INDIANA
MICAH ALTMAN
DIRECTOR OF RES...
Prepared for
DASPOS Workshop
JCDL 2013
Characterizing Data and Software for
Social Science Research
Dr. Micah Altman
<esci...
DISCLAIMER
These opinions are my own, they are not the opinions
of MIT, Brookings, any of the project funders, nor (with
t...
Collaborators & Co-Conspirators
• Jonathan Crabtree, Nancy McGovern
• National Digital Stewardship Coordination
Committee ...
Related Work
• CoData Task Group on Data Citations, 2013 (Forthcoming) Out of Cite, Out of
Mind: The Current State of Prac...
This Talk
• Landscape
(dimensions & attributes)
• Landmarks
(sample use cases)
Data and Software in Social Science Research
Landscape:
Characteristics of Social
Science Research Data
Some Characteristics of Research Data
Attribute Type Examples
Data: Structure...
Some Characteristics of Research Measurements
Attribute Type Examples
Measure...
Some Characteristics of Research Data Use
Attribute Type Examples
Analysis me...
Some Characteristics of Use Constraints
Contract Intellectual Property
Access...
Landmarks
(Exemplar Use Cases)
Exemplar: Policy Analysis
Attribute Type Examples
Data: Structure - Single re...
Exemplar: Media Anthropology Dissertation
Attribute Type Examples
Data: Struc...
Exemplar: Social Message Analysis
Attribute Type Examples
Data: Structure - n...
Trends: More
More Types of Evidence
More Collaboration
More Data
More Publications, More Filters
More Learners
More Open
Da...
Some Challenges for Long-Term
Replication/Access
• “messy” human sensors
• Mix of data types, structures, sparsity
• Compl...
Questions?
E-mail: escience@mit.edu
Web: micahaltman.com
Twitter: @drmaltman

Characterizing Data and Software for Social Science Research


This presentation describes the landscape of data and software use across the social sciences in terms of the abstract dimensions of data and data use. It then examines three use cases.

Presentation for DASPOS < https://daspos.crc.nd.edu/index.php/workshops/workshop-2 > Workshop at JCDL.

  • This work, by Micah Altman (http://micahaltman.com), is licensed under the Creative Commons Attribution-Share Alike 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA. Any images included in derivative works must be individually attributed to their original sources, as indicated in the notes.
  • The structure and design of digital storage systems is a cornerstone of digital preservation. To better understand ongoing storage practices of organizations committed to digital preservation, the National Digital Stewardship Alliance conducted a survey of member organizations. This talk discusses findings from this survey, common gaps, and trends in this area. (I also have a little fun highlighting the hidden assumptions underlying Amazon Glacier's reliability claims. For more on that, see this earlier post: http://drmaltman.wordpress.com/2012/11/15/amazons-creeping-glacier-and-digital-preservation )
  • Survey image source, licensed under CC-SA-NC: http://gaithersburgbookfestival.org/take-our-survey-and-win/ Students images source: commons.wikimedia.org. Other images source: nsf.gov.
  • File icon is licensed under CC0 on pixabay.com: http://pixabay.com/en/spreadsheet-excel-table-diagram-98491/ Dissertation is licensed under CC-BY-SA by Victoria Catterson: http://www.flickr.com/photos/cowlet/354911838/ Other images available through commons.wikimedia.org.
  • Other image source: wikimedia commons
  • The LHC produces a PB every 2 weeks; Sloan Galaxy Zoo has hundreds of thousands of “authors”; 50K people attend a class from the University of Michigan; and to understand public opinion, instead of surveying hundreds of people per month, we can analyze 10,000 tweets per second.
  • Characterizing Data and Software for Social Science Research
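The scale comparison in the speaker notes (a PB every 2 weeks from the LHC, and tweets per second versus survey respondents per month) can be made concrete with a little arithmetic. The sketch below is only a back-of-envelope illustration: it reads the note's tweet figure as 10,000 per second, and the 30-day month and the value 500 standing in for "hundreds" of respondents are assumptions introduced here, not figures from the talk.

```python
# Back-of-envelope rates implied by the speaker notes.
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_MONTH = 30        # assumption: a 30-day month
WEEKS_PER_YEAR = 52

tweets_per_second = 10_000   # figure from the notes, read as 10,000
surveyed_per_month = 500     # assumption: "hundreds" taken as 500

# Messages available in a month at the quoted streaming rate.
tweets_per_month = tweets_per_second * SECONDS_PER_DAY * DAYS_PER_MONTH

# How many streamed messages there are per survey respondent.
messages_per_respondent = tweets_per_month / surveyed_per_month

# LHC output per year at one petabyte every two weeks.
lhc_pb_per_year = WEEKS_PER_YEAR / 2

print(f"tweets per month: {tweets_per_month:,}")
print(f"messages per survey respondent: {messages_per_respondent:,.0f}")
print(f"LHC petabytes per year: {lhc_pb_per_year}")
```

Under these assumptions the stream yields about 26 billion messages a month, tens of millions per survey respondent, which is the gap in volume the note is pointing at.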

    1. SURVEY OF COMMONALITY WITH OTHER DISCIPLINES
       WORKSHOP 2 – JULY 25, 2013
       INDIANAPOLIS, INDIANA
       MICAH ALTMAN
       DIRECTOR OF RESEARCH, MIT LIBRARIES
       MASSACHUSETTS INSTITUTE OF TECHNOLOGY
       ESCIENCE@MIT.EDU
       PRIMARY RESEARCH OR PRACTICE AREA(S)
       • INFORMATION SCIENCE
       • SOCIAL SCIENCE
       PREVIOUS EXPERIENCE
       • DIGITAL LIBRARIES
       • DIGITAL PRESERVATION
       • STATISTICAL COMPUTING
       RELATED WORK
       • PUBLICMAPPING.ORG
       • INFORMATICS.MIT.EDU
       CONTACT INFORMATION
       E25-131, 77 MASSACHUSETTS AVE, MIT, CAMBRIDGE, MA, 02139
    2. Prepared for DASPOS Workshop, JCDL 2013
       Characterizing Data and Software for Social Science Research
       Dr. Micah Altman <escience@mit.edu>
       Director of Research, MIT Libraries
       Non-Resident Senior Fellow, Brookings Institution
    3. DISCLAIMER
       These opinions are my own; they are not the opinions of MIT, Brookings, any of the project funders, nor (with the exception of co-authored previously published work) my collaborators.
       Secondary disclaimer: “It’s tough to make predictions, especially about the future!”
       -- Attributed to Woody Allen, Yogi Berra, Niels Bohr, Vint Cerf, Winston Churchill, Confucius, Disreali [sic], Freeman Dyson, Cecil B. Demille, Albert Einstein, Enrico Fermi, Edgar R. Fiedler, Bob Fourer, Sam Goldwyn, Allan Lamport, Groucho Marx, Dan Quayle, George Bernard Shaw, Casey Stengel, Will Rogers, M. Taub, Mark Twain, Kerr L. White, etc.
    4. Collaborators & Co-Conspirators
       • Jonathan Crabtree, Nancy McGovern
       • National Digital Stewardship Coordination Committee & Working Group Chairs
       • Privacy Tools for Sharing Research Data Team (Salil Vadhan, P.I.) http://privacytools.seas.harvard.edu/people
       • Research Support
         – Supported in part by NSF grant CNS-1237235
         – Thanks to the Library of Congress & the Massachusetts Institute of Technology.
    5. Related Work
       • CoData Task Group on Data Citations, 2013 (Forthcoming). Out of Cite, Out of Mind: The Current State of Practice, Policy, and Technology for the Citation of Data. CoData Journal (Special Volume).
       • Altman & Jackman, 2012. 19 Ways of Looking at Statistical Software. Journal of Statistical Software.
       • National Digital Stewardship Alliance, 2013. 2014 National Agenda for Digital Stewardship.
       • Novak, K., Altman, M., Broch, E., Carroll, J. M., Clemins, P. J., Fournier, D., Laevart, C., et al. 201.. Communicating Science and Engineering Data in the Information Age. Computer Science and Telecommunications. National Academies Press.
       • Altman, M., Rogerson, K., & U, D. (2008). Open Research Questions on Information and Technology in Global and Domestic Politics – Beyond “E-.i, 41(4), 1-8. Retrieved from http://www.journals.cambridge.org/abstract_S104909650824093X
       • Altman, Gill & McDonald. 2003. Numerical Issues in Statistical Computing for the Social Scientist.
       Most reprints available from:
    6. This Talk
       • Landscape (dimensions & attributes)
       • Landmarks (sample use cases)
    7. Landscape: Characteristics of Social Science Research Data
    8. Some Characteristics of Research Data
       Attribute Type, followed by Examples:
       • Data: Structure
         - Single relation (table)
         - Fully relational
         - Network
         - Geospatial
         - Semi-structured (e.g. text)
       • Data: Attribute Types
         - Continuous/Discrete
         - Scale: ratio/interval/ordinal/nominal
       • Data: Performance Characteristics
         - Number of observations
         - Frequency of updates
         - Dimensionality
         - Sparsity
         - Collection heterogeneity
    9. Some Characteristics of Research Measurements
       Attribute Type, followed by Examples:
       • Measurement: Unit of Observation
         - Individuals
         - Groups
         - Institutions
         - Organizations
         - Interactions
       • Measurement: Measurement type
         - Experimental
         - Observational
         - Synthetic/computational
       • Measurement: Performance characteristic
         - Metadata
         - Ontology
         - Quality
    10. Some Characteristics of Research Data Use
       Attribute Type, followed by Examples:
       • Analysis methods
         - Counting
         - GLM model family
         - MLE model family
         - (Constrained) continuous nonlinear optimization
         - Blind global optimization
         - Discrete optimization
         - Bayesian Methods (MCMC)
         - Heuristically/algorithmically defined
         - Text mining
         - Clustering
         - Coding and qualitative analysis
         - Exploratory Data Analysis
       • Desired Outputs
         - Summary scalars
         - Summary table
         - Data subset
         - Static data publication
         - Static visualization
         - Dynamic visualization
    11. Some Characteristics of Use Constraints
       Contract; Intellectual Property; Access Rights; Confidentiality; Copyright; Fair Use; DMCA; Database Rights; Moral Rights; Intellectual Attribution; Trade Secret; Patent; Trademark; Common Rule (45 CFR 46); HIPAA; FERPA; EU Privacy Directive; Privacy Torts (Invasion, Defamation); Rights of Publicity; Sensitive but Unclassified; Potentially Harmful (Archeological Sites, Endangered Species, Animal Testing, …); Classified; FOIA; CIPSEA; State Privacy Laws; EAR; State FOI Laws; Journal Replication Requirements; Funder Open Access; Contract; License; Click-Wrap; TOU; Export Restrictions; NDA
    12. Landmarks (Exemplar Use Cases)
    13. Exemplar: Policy Analysis
       Attribute Type, followed by Examples:
       • Data: Structure
         - Single relation (table)
       • Data: Attribute Types
         - Continuous/Discrete
         - Scale: ratio/interval/ordinal
       • Data: Performance Characteristics
         - 10K-100K observations
         - Monthly/annual updates
         - Dozens of dimensions/measures
       • Measurement: Unit of Observation
         - Individuals; Organizations; Institutions
       • Measurement: Measurement type
         - Observational
         - Repeated cross-sectional/longitudinal over decades
       • Measurement: Performance characteristic
         - High quality measurements
         - Systematic and complete metadata
         - Controlled ontology
         - Regular updates & long-term access
       • Management Constraints
         - Confidentiality; Public Access
       • Analysis methods
         - Counting (contingency tables); GLM Family
       • Desired Outputs
         - Summary scalars
         - Summary table
         - Static visualization (map)
       More Information
       • Science and Engineering Indicators: http://www.nsf.gov/statistics/seind12/
       • Details of NCSES use case: Novak et al. 2011
       • Policy data producer perspectives: Journal of Official Statistics
    14. Exemplar: Media Anthropology Dissertation
       Attribute Type, followed by Examples:
       • Data: Structure
         - Audio, video
         - GIS coverage / GPS trails
         - Semi-structured field notes
         - Coded qualitative and quantitative data
       • Data: Attribute Types
         - Discrete
         - Scale: ordinal/nominal
       • Data: Performance Characteristics
         - 100’s of observed units
         - Longitudinal
         - Dozens of dimensions/measures
         - Static after publication
       • Measurement: Unit of Observation
         - Individuals; Organizations; Physical environment
       • Measurement: Measurement type
         - Observational; Interaction
       • Measurement: Performance characteristic
         - High quality measurements
         - Systematic and complete metadata
         - Emergent coding/ontology
       • Management Constraints
         - Confidentiality; social norms
       • Analysis methods
         - Counting; Discourse; CAQDA (Qualitative)
         - (Future) AI/Machine learning
       • Desired Outputs
         - Book
         - 1-2 hour video / interactive media synthesis
       More Information
       • Harvard media anthropology Ph.D. Program: sel.fas.harvard.edu/phd.html
       Image Sources: Wikimedia Commons, Pixabay.com, Flickr
    15. Exemplar: Social Message Analysis
       Attribute Type, followed by Examples:
       • Data: Structure
         - Network
       • Data: Attribute Types
         - Continuous/Discrete
         - Scale: ratio/interval/ordinal/nominal
       • Data: Performance Characteristics
         - 10M-1B observations
         - Sample from stream of continuously updated corpus
         - Dozens of dimensions/measures
       • Measurement: Unit of Observation
         - Individuals; Interactions
       • Measurement: Measurement type
         - Observational
       • Measurement: Performance characteristic
         - High volume
         - Complex network structure
         - Sparsity
         - Systematic and sparse metadata
       • Management Constraints
         - License; Replication
       • Analysis methods
         - Bespoke algorithms (clustering); nonlinear optimization; Bayesian methods
       • Desired Outputs
         - Summary scalars (model coefficients)
         - Summary table
         - Static/interactive visualization
       More Information
       • Grimmer, Justin, and Gary King. "General purpose computer-assisted clustering and conceptualization." Proceedings of the National Academy of Sciences 108.7 (2011): 2643-2650.
       • King, Gary, Jennifer Pan, and Molly Roberts. "How censorship in China allows government criticism but silences collective expression." APSA 2012 Annual Meeting Paper. 2012.
       • Lazer, David, et al. "Life in the network: the coming age of computational social science." Science (New York, NY) 323.5915 (2009): 721.
    16. Trends: More
       • More Types of Evidence
       • More Collaboration
       • More Data
       • More Publications, More Filters
       • More Learners
       • More Open
       • More Replication
    17. Some Challenges for Long-Term Replication/Access
       • “Messy” human sensors
       • Mix of data types, structures, sparsity
       • Complex constraints: confidentiality, licensing, NDAs
       • Manual/computer-assisted coding
       • Niche commercial software (and private bespoke software) integral to analysis
       • Very long-term longitudinal data/accessibility requirements
    18. Questions?
       E-mail: escience@mit.edu
       Web: micahaltman.com
       Twitter: @drmaltman
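
The three exemplars describe each use case along the same attribute types, so they can be compared as structured records. The following is a minimal sketch of that idea, not anything presented in the talk: the `UseCaseProfile` class, its field names, and the helper functions are hypothetical, and the 100-999 range used for "100's of observed units" is an assumption made for illustration. The other values are copied from the exemplar slides.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class UseCaseProfile:
    """One exemplar, described along a subset of the talk's attribute types."""
    name: str
    structures: List[str]
    min_observations: int
    max_observations: int
    constraints: List[str]
    analysis_methods: List[str]


PROFILES = [
    UseCaseProfile(
        name="Policy Analysis",
        structures=["single relation (table)"],
        min_observations=10_000,
        max_observations=100_000,
        constraints=["confidentiality", "public access"],
        analysis_methods=["counting (contingency tables)", "GLM family"],
    ),
    UseCaseProfile(
        name="Media Anthropology Dissertation",
        structures=["audio/video", "GIS coverage/GPS trails",
                    "semi-structured field notes", "coded data"],
        min_observations=100,   # assumption: "100's" read as 100-999
        max_observations=999,
        constraints=["confidentiality", "social norms"],
        analysis_methods=["counting", "discourse", "CAQDA"],
    ),
    UseCaseProfile(
        name="Social Message Analysis",
        structures=["network"],
        min_observations=10_000_000,
        max_observations=1_000_000_000,
        constraints=["license", "replication"],
        analysis_methods=["bespoke clustering", "nonlinear optimization",
                          "Bayesian methods"],
    ),
]


def with_constraint(profiles, constraint):
    """Return the profiles whose management constraints include `constraint`."""
    return [p for p in profiles if constraint in p.constraints]


def largest(profiles):
    """Return the profile with the highest upper bound on observations."""
    return max(profiles, key=lambda p: p.max_observations)


confidential = [p.name for p in with_constraint(PROFILES, "confidentiality")]
print("Confidentiality applies to:", confidential)
print("Largest by observations:", largest(PROFILES).name)
```

Encoding the exemplars this way makes the contrast in the talk easy to query: the two smaller-scale cases share confidentiality constraints, while the high-volume case is governed by licensing and replication requirements instead.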
