For data, and data science, to be the fuel of the 21th century, data driven applications should not be confined to dashboards and static analyses. Instead they should be the driver of the organizations that own or generates the data. Most of these applications are web-based and require real-time access to the data. However, many Big Data analyses and tools are inherently batch-driven and not well suited for real-time and performance-critical connections with applications. Trade-offs become often inevitable, especially when mixing multiple tools and data sources. In this talk we will describe our journey to build a data driven application at a large Dutch financial institution. We will dive into the issues we faced, why we chose Python and pandas and what that meant for real-time data analysis (and agile development). Important points in the talk will be, among others, the handling of geographical data, the access to hundreds of millions of records as well as the real time analysis of millions of data points.
Clipping is a handy way to collect important slides you want to go back to later.