Demand for capturing information to drive analytic insights into an organization's assets and infrastructure is growing at unprecedented rates. Technology commoditization allows information to be generated and collected at ever finer levels of detail and granularity. This data revolution provides a fresh perspective on previously well-solved problems and continues to expand what is considered a data source. However, as data volumes soar, providing seamless integration into ingestion pipelines becomes operationally complex as the number of data sources and types continuously grows. Not all data is created equal, and while there exists a romanticized notion of "all the data, right now," real-world conditions dictate otherwise. As the volume of data and the number of sources continue to grow, so does the need to value each. For data, there is a need to establish a quality of service for higher-priority content. For sources, there is an inherent cost in both operation and maintenance that must be traceable to the value generated, so that informed decisions can be made about continued organizational expenditure.
This talk will focus on the efforts behind MiNiFi, a child project of Apache NiFi: an agent-based architecture for varied infrastructures, and its relation to the core Apache NiFi project. MiNiFi is focused on providing a platform that meets and adapts to where data is born while delivering the core tenets of NiFi: provenance, security, and command and control. These capabilities provide versatile avenues for the bi-directional exchange of information across the data and control planes, as well as increased insight into the full lineage of data from its creation through analysis, providing a more complete picture while simultaneously establishing a basis for valuing resources.