Apache Metron is a streaming cybersecurity application
built on Apache Storm and Hadoop. One of its core missions is to enable advanced analytics through machine learning and data science to the users. Because of the relative immaturity of data science platform infrastructure integrated into Hadoop that is oriented to streaming analytics applications, we have been forced to create the requisite platform components out of necessity, utilizing many of the pieces of the Hadoop ecosystem.
In this talk, we will speak about the Metron analytics architecture and how it utilizes a custom data science model deployment and autodiscovery service that is tightly integrated with Hadoop via Yarn and Zookeeper. We will discuss how we interact with the models deployed there via a custom domain specific language that can query models as data streams past. We will generally discuss the full-stack data science tooling that has been created to enable data science at scale on an advanced analytics
Casey Stella, Principal Software Engineer/Data Scientist, Hortonworks