This document discusses large-scale data processing enabled by new technologies. It notes that data volumes ranging from hundreds of terabytes to tens of petabytes can now be processed at low cost using distributed parallel frameworks such as MapReduce. New data sources include sensors and devices, as well as unstructured data such as text and images. Analyzing this data with these technologies makes it possible to answer questions and gain new insights, for example which products are popular, which ads are best to serve, and how to detect fraud.
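
To make the MapReduce idea concrete, here is a minimal single-process sketch in Python of the map, shuffle, and reduce steps, applied to the product-popularity question mentioned above. It is an illustration only, not a distributed implementation and not the API of any particular framework. The record format ("user_id,product_id"), the sample data, and all function names are assumptions made for this example.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Emit a (product_id, 1) pair for one log record.

    Assumed (hypothetical) record format: "user_id,product_id".
    """
    _user, product = record.split(",")
    yield product, 1


def shuffle(pairs):
    """Group all emitted values by key, as a framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Collapse all values for one key into a single count."""
    return key, sum(values)


def run_mapreduce(records):
    # In a real framework the map and reduce calls run in parallel on many
    # machines; here they run sequentially in one process.
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


# Made-up sample data for illustration.
records = ["u1,shoes", "u2,hat", "u3,shoes", "u1,hat", "u4,shoes"]
counts = run_mapreduce(records)
print(counts)  # {'shoes': 3, 'hat': 2}
```

Because each map call depends only on its own record and each reduce call only on its own key, the work can be split across many machines, which is what allows frameworks of this kind to scale to very large inputs.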