Cassandra is a perfect fit for consuming high volumes of time-series data directly from users, devices, and sensors. Sometimes, though, when we consume data from the real world, systematic and random errors creep in. In this session, we'll see how to use open source tools like RabbitMQ and StreamSets Data Collector with Cassandra features such as User-Defined Aggregates to collect, cleanse and ingest data of variable quality at scale. Discover how to combine the power of Cassandra with the flexibility of StreamSets to implement adaptive data cleansing.
About the Speaker
Pat Patterson, Community Champion, StreamSets
Pat Patterson has been working with Internet technologies since 1997, building software and working with communities at Sun Microsystems, Huawei, Salesforce and StreamSets. At Sun, Pat was the community lead for OpenSSO, while at Huawei he developed cloud storage infrastructure software. A developer evangelist at Salesforce, Pat focused on identity, integration and IoT. Now community champion at StreamSets, Pat is responsible for the care and feeding of the StreamSets open source community.