Abstract: This webinar introduces the De-duplication (Dedup) functionality in Malhar. De-duplication is an important stage of the processing pipeline in ETL workflows. We will introduce the use cases and walk through the implementation details. Next, we will look at how to configure the Dedup operator for various use cases, including time-based expiry and batch de-duplication. We will close with a demonstration of an application that uses the Dedup operator.
Presenter: Bhupesh Chadwa is a Software Engineer at DataTorrent and a committer for Apache Apex.
Learn more about Apex and DataTorrent: https://www.datatorrent.com/apex/