This document discusses the design and prototyping of a social media observatory for collecting and analyzing large-scale social media data from Twitter. Some key points:

- Using the Streaming API, it collects a random sample of Twitter data. The collection dates back to August 2010 and totals over 5 TB of compressed data related to US politics, social movements, and news.
- It aims to provide reliable, reproducible, and openly accessible social media data and analytics for non-profit uses through standardized data storage, reference models, and APIs/visualizations.
- To comply with Twitter's terms of service, access is limited to tweet text and derived aggregate data rather than individual tweets. The goal is to enable large-scale