HDF supports more than 90 different processors to accelerate data ingestion and processing, including ready-made, “off the shelf” processors for data collection and data processing. Some examples, listed alphabetically rather than by popularity: EncryptContent, ExecuteFlumeSink, ExecuteFlumeSource, ExecuteSQL, ExtractHL7, GetFTP, GetHTTP, MergeContent, MonitorActivity, PutEmail, PutHDFS, PutKafka, SplitJson, TransformXml.
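To illustrate how processors chain together, here is a minimal, self-contained simulation of a split-then-merge flow. It is not NiFi code or the NiFi API: `FlowFile`, `split_json` and `merge_content` are hypothetical stand-ins that only mimic the idea behind SplitJson (one FlowFile per array element) and MergeContent (concatenate many FlowFiles into one). The attribute names are illustrative.

```python
import json
from dataclasses import dataclass, field


@dataclass
class FlowFile:
    """Hypothetical stand-in for a FlowFile: content bytes plus string attributes."""
    content: bytes
    attributes: dict = field(default_factory=dict)


def split_json(ff: FlowFile) -> list:
    """Mimic the idea behind SplitJson: emit one FlowFile per element of a top-level JSON array."""
    records = json.loads(ff.content)
    total = len(records)
    out = []
    for i, rec in enumerate(records):
        # Copy the parent's attributes and record where this fragment came from.
        attrs = {**ff.attributes, "fragment.index": str(i), "fragment.count": str(total)}
        out.append(FlowFile(json.dumps(rec).encode(), attrs))
    return out


def merge_content(ffs: list, delimiter: bytes = b"\n") -> FlowFile:
    """Mimic the idea behind MergeContent: join fragments, in index order, with a delimiter."""
    ordered = sorted(ffs, key=lambda f: int(f.attributes["fragment.index"]))
    return FlowFile(delimiter.join(f.content for f in ordered), {"merge.count": str(len(ordered))})


if __name__ == "__main__":
    source = FlowFile(b'[{"id": 1}, {"id": 2}]', {"filename": "batch.json"})
    parts = split_json(source)
    merged = merge_content(parts)
    print(len(parts), merged.content)
```

In HDF itself the same pipeline would be built by dragging the real processors onto the canvas and connecting them; the sketch only shows the data movement between steps.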
Some of these processors are designed specifically to simplify collecting big data from popular sources. Twitter is one such source; others include:
A distinctive capability of the dataflow is that you can watch processors update in real time. This lets data developers and data scientists quickly verify hypotheses and make timely decisions within the relevant time window.
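The live counters shown on each processor are, in NiFi, rolling statistics over the last five minutes. The sketch below shows one way such a sliding-window counter can work; it is an illustrative assumption, not how NiFi implements its statistics. `RollingCounter` is a hypothetical class, and timestamps are passed in explicitly so the behavior is easy to test.

```python
from collections import deque


class RollingCounter:
    """Count events that fall inside a sliding time window (default 300 s)."""

    def __init__(self, window_seconds: float = 300.0):
        self.window = window_seconds
        self.events = deque()  # (timestamp, count) pairs, oldest first
        self.total = 0

    def _evict(self, now: float) -> None:
        # Drop events that are at least `window` seconds old.
        while self.events and self.events[0][0] <= now - self.window:
            _, n = self.events.popleft()
            self.total -= n

    def record(self, now: float, count: int = 1) -> None:
        """Record `count` events (e.g. FlowFiles in) at time `now`."""
        self.events.append((now, count))
        self.total += count
        self._evict(now)

    def value(self, now: float) -> int:
        """Number of events seen in the window ending at `now`."""
        self._evict(now)
        return self.total


if __name__ == "__main__":
    c = RollingCounter(300)
    c.record(0, 10)
    c.record(100, 5)
    print(c.value(250), c.value(301), c.value(400))
```

A deque keeps eviction cheap, because old entries always leave from the front while new ones arrive at the back.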
Once the dataflow is established, it can be dynamically modified, replicated and transformed. This removes the need to develop code in a test environment and then port it to production: changes can be tested immediately in the production environment, which shortens the time to insight.
All of these changes are tracked, so when you find yourself asking “what did I try before?” or “what happened last time?”, the answer is readily accessible through the GUI.
HDF provides fine-grained, high-fidelity reporting on data provenance: where the data came from, how it was used, who used it, and so on.
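To make provenance reporting more concrete, here is a simplified model of how lineage can be reconstructed from a log of provenance events. The event type names (RECEIVE, FORK, SEND) are borrowed from NiFi, but `ProvenanceEvent`, `ProvenanceLog` and their fields are hypothetical and do not reproduce NiFi's actual record format. For example, this sketch stores parent IDs on the child, while NiFi records FORK on the parent along with its children.

```python
from dataclasses import dataclass


@dataclass
class ProvenanceEvent:
    """Simplified provenance record (hypothetical format)."""
    event_id: int
    event_type: str          # e.g. "RECEIVE", "FORK", "SEND"
    component: str           # processor that emitted the event
    flowfile_id: str
    parent_ids: tuple = ()   # FlowFiles this one was derived from


class ProvenanceLog:
    def __init__(self):
        self.events = []

    def emit(self, event: ProvenanceEvent) -> None:
        self.events.append(event)

    def history(self, flowfile_id: str) -> list:
        """All events recorded for a single FlowFile, in emission order."""
        return [e for e in self.events if e.flowfile_id == flowfile_id]

    def lineage(self, flowfile_id: str) -> list:
        """Follow parent links back to the origin(s); return every event on the path, oldest first."""
        seen, stack, result = set(), [flowfile_id], []
        while stack:
            fid = stack.pop()
            if fid in seen:
                continue
            seen.add(fid)
            for e in self.history(fid):
                result.append(e)
                stack.extend(e.parent_ids)
        return sorted(result, key=lambda e: e.event_id)


if __name__ == "__main__":
    log = ProvenanceLog()
    log.emit(ProvenanceEvent(1, "RECEIVE", "GetHTTP", "A"))
    log.emit(ProvenanceEvent(2, "FORK", "SplitJson", "B", ("A",)))
    log.emit(ProvenanceEvent(3, "FORK", "SplitJson", "C", ("A",)))
    log.emit(ProvenanceEvent(4, "SEND", "PutHDFS", "B"))
    for e in log.lineage("B"):
        print(e.event_id, e.event_type, e.component)
```

Tracing FlowFile `B` yields its origin (RECEIVE by GetHTTP), the split that created it, and its delivery. The sibling `C` is excluded because it is not an ancestor of `B`.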
Design a Dataflow in 7 minutes with Apache NiFi/HDF