NEON is an example of a large-scale, NSF-funded organization designed to give ecologists the data and tools they need to do higher-quality research. Ecology has historically been notoriously bad at sharing data among researchers: most researchers collect small-scale datasets that often neither aggregate well nor get shared.
DataSift is an aggregator of data from social media sites that makes the data available to companies via its own API. Researchers can pay for access to data from DataSift, though it can get expensive very quickly depending on the research questions, and DataSift doesn’t collect many types of data that are important (e.g., many types of network data). The point of this slide is simply to show that the backend processing of big social data is not trivial, and that there are many interesting research questions tied to the actual infrastructure development that could be tackled while providing the tools necessary to support computational social science at a much larger scale.
This is the Query Builder put out by DataSift, a good example of a first attempt at making the creation of complex queries intuitive.
Infrastructure for Supporting Computational Social Science
Infrastructure Research to Support Computational Social Science
Derek Hansen & Kevin Tew, Brigham Young University
Current Options for Researchers
[Diagram labels: Data Sources; Code it Yourself; Computer Scientists; APIs; Scrapers; Use Corporate Tools; Software Libraries; Use Free 3rd Party Tools; Social Scientists]
Problems with Current Approach
• Non-coders have limited opportunities
• Corporate tools not designed for research needs and high cost
• Major duplication of effort
  – Extra work for researchers
  – More resource intensive for companies
• APIs not available, constantly changing, or rate limited
• Creating and maintaining 3rd party tools is hard
  – Ongoing funding is challenging in a research environment
  – Contribution not always recognized in academia
• Inconsistency in legal & ethical approaches
A Large-Scale Solution?
Enabling a Better Understanding of Continental-Scale Ecology
NEON is designed to gather and synthesize data on the impacts of climate change, land use change and invasive species on natural resources and biodiversity… NEON will combine site-based data with remotely sensed data and existing continental-scale data sets (e.g. satellite data) to provide a range of scaled data products that can be used to describe changes in the nation’s ecosystem through space and time.
Free and Publicly Accessible Resources
NEON’s open-access approach to its data and information products will enable scientists, educators, planners, decision makers and the public to map, understand and predict the effects of human activities on ecology and effectively address critical ecological questions and issues.
Infrastructure Research
• Data Handling and Processing
  – “Big Data” storage and analysis (e.g., scalable, real-time)
  – Customized programming language(s)
• Human-Computer Interaction
  – Support usability and encourage high quality work
  – Visualization
• Legal and Social
  – Legal framework for companies & IRBs
  – Community-building among researchers
Collaboration Opportunities
Center for the Advanced Study of Communities and Information (CASCI) 2013 Digital Societies and Social Technologies (DSST) Summer Institute