With today's abundance of data (big or small), an organization's ability to capture, understand, and process new content is key to its success. Martin Magdinier has developed a custom data transformation stack to integrate over a hundred eclectic data feeds into a single repository. His process goes through three stages:
- Data discovery and exploration,
- Rapid data transformation prototyping and
- Automation of the data cleaning and transformation process.
This presentation reviews the challenges specific to each step of the integration process and describes the tools used (OpenRefine, Talend, Crowdflower) and the processes developed to address them, while keeping the agility and flexibility of the overall stack in mind.