In this talk, Mark Baker (CSL) will show how CSL Behring integrates and analyzes data from multiple manufacturing sites, using Apache NiFi to move it into a central Hadoop data lake.
The challenge of merging data from disparate systems has been a leading driver behind investments in data warehousing systems as well as in Hadoop. While data warehousing solutions are purpose-built for RDBMS integration, Hadoop adds the benefit of economical, virtually unlimited scale, along with the variety of structured and unstructured formats it can handle. Whether an organization uses a data warehouse, Hadoop, or both, physical data movement and consolidation is the primary method of integration.
There may also be challenges with synchronizing rapidly changing data from a system of record to a consolidated Hadoop platform.
This introduces the need for “data federation”, where data is integrated without being copied between systems.
For historical and batch use cases, data is replicated from remote data hubs into a central data lake using Apache NiFi.
We will demo analyzing this data in Apache Zeppelin with Apache Spark and Apache Hive.