Pete Zybrick will discuss techniques for analyzing, extracting, and validating large datasets using tools from Cloudera and AWS, with examples drawn from Federal Reserve Economic Data (FRED) and SiteCatalyst data. The presentation will cover how to programmatically analyze data structures, define extraction and validation rules, and bulk import data into Impala and Redshift. It will also cover productivity tools that let business users access subsets of large datasets.