Wait! Exclusive 60 day trial to the world's largest digital library.
The SlideShare family just got bigger. You now have unlimited* access to books, audiobooks, magazines, and more from Scribd.Cancel anytime.
Big data tools such as Hadoop and Spark allow you to process data at unprecedented scale, but keeping your processing engine fed can be a challenge. Upstream data sources can 'drift' due to infrastructure, OS and application changes, causing ETL tools and hand-coded solutions to fail. StreamSets Data Collector (SDC) is an open source platform for building big data ingest pipelines that allows you to design, execute and monitor robust data flows. In this session we'll look at how SDC's "intent-driven" approach keeps the data flowing, whether you're processing data 'off-cluster', in Spark, or in MapReduce.
StreamSets software delivers performance management for data flows that feed the next generation of big data applications. Its mission is to bring operational excellence to the management of data in motion, so that data arrives on time and with quality, accelerating analysis and decision making. StreamSets Data Collector is in use at hundreds of companies where it brings unprecedented visibility into and control over data as it moves between an expanding variety of sources and destinations.
Pat Patterson has been working with Internet technologies since 1997, building software and working with communities at Sun Microsystems, Huawei, Salesforce and StreamSets. At Sun, Pat was the community lead for the OpenSSO open source project, while at Huawei he developed cloud storage infrastructure software. Part of the developer evangelism team at Salesforce, Pat focused on identity, integration and the Internet of Things. Now community champion at StreamSets, Pat is responsible for the care and feeding of the StreamSets open source community.