Video from presentation: https://youtu.be/pDi-npLE3Zo
Wandera, the company recognized by both Gartner and IDC in their Q3/2017 reports on Mobile Threat Defense, offers organizations a solution for Enterprise Mobile Security and Data Management. In this talk, we will introduce the backend that powers the Wandera service, with emphasis on data flow, analysis, and processing. Then we will discuss how our Machine Learning models are trained, evaluated, and then verified by humans via QA before being deployed into a production environment. The final part of the presentation will be devoted to lessons learned.
Speaker: Zdenek Letko, Software Engineer at Wandera, https://www.linkedin.com/in/zdenek-letko-26064848/
How to Deliver Machine Learning To Production - Zdenek Letko
1. How to Deliver Machine Learning To Production
David Pryce, Oscar Knagg, Ruslan Ciurca, Zdenek Letko
2. Wandera: Mobile Threat Defence & Data Management Solution for Enterprise
● Integration with Enterprise Mobility Management (EMM) &
Security Information and Event Management (SIEM)
7. PMML -- Predictive Model Markup Language
● Header
● Data Dictionary
● Mining Schema
● Data Transformations
● Model
● Targets
● Output
● Model Explanation
● Model verification
http://dmg.org/pmml/
Pre-processing
Post-processing
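The sections listed on this slide correspond to XML elements in a PMML file. As a rough illustration, the sketch below parses a tiny hand-written PMML 4.3 document and lists its top-level sections. The skeleton, its field names (`x`, `y`), and the helper `top_level_sections` are assumptions made for this example and do not come from the talk. Optional elements such as Targets, Output, ModelExplanation, and ModelVerification live inside the model element and are omitted here for brevity.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical, hand-written PMML skeleton used only for illustration.
PMML_SKELETON = """<PMML xmlns="http://www.dmg.org/PMML-4_3" version="4.3">
  <Header description="illustrative skeleton"/>
  <DataDictionary numberOfFields="2">
    <DataField name="x" optype="continuous" dataType="double"/>
    <DataField name="y" optype="continuous" dataType="double"/>
  </DataDictionary>
  <TransformationDictionary/>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="x"/>
      <MiningField name="y" usageType="target"/>
    </MiningSchema>
    <RegressionTable intercept="1.0">
      <NumericPredictor name="x" coefficient="2.0"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
"""


def top_level_sections(pmml_text: str) -> List[str]:
    """Return the names of the direct children of <PMML>, without namespace."""
    root = ET.fromstring(pmml_text)
    # ElementTree reports tags as "{namespace}Name"; keep only "Name".
    return [child.tag.split("}", 1)[-1] for child in root]


if __name__ == "__main__":
    print(top_level_sections(PMML_SKELETON))
```

Pre-processing is typically expressed in the transformation elements, while post-processing is typically expressed in the Targets and Output elements of the model.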
8. Openscoring Project
● Loads PMML and executes the model for given inputs
● Ready to use solutions include
○ Libraries -- JPMML Evaluator
○ REST microservice
○ Heroku / OpenShift cloud
○ PMML Hadoop functions for Apache Hive, Apache Pig
○ PMML functions for PostgreSQL
● High performance, high throughput, thread safe
http://openscoring.io/
https://github.com/openscoring/openscoring
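To make the REST microservice option more concrete, here is a minimal sketch of how a client might build the two basic requests: one to deploy a PMML file and one to score a single record. It follows the request shapes shown in the Openscoring README as I understand them (PUT to deploy, POST with an `id` and `arguments` JSON body to evaluate). The base URL, model id, record id, and field names are assumptions for illustration, so check them against the version you run. The code only builds the requests and does not send them.

```python
import json
import urllib.request

# Assumed address of a locally running Openscoring server.
BASE_URL = "http://localhost:8080/openscoring"


def build_deploy_request(model_id, pmml_bytes):
    """Build a PUT request that uploads a PMML document under model_id."""
    return urllib.request.Request(
        url=f"{BASE_URL}/model/{model_id}",
        data=pmml_bytes,
        headers={"Content-Type": "text/xml"},
        method="PUT",
    )


def build_evaluation_request(model_id, record_id, arguments):
    """Build a POST request that asks the model to score one record."""
    body = json.dumps({"id": record_id, "arguments": arguments}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/model/{model_id}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical model id and input fields.
    req = build_evaluation_request("ThreatModel", "record-001", {"x": 1.5})
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

With a server running, each request could be sent with `urllib.request.urlopen(req)`, and the JSON response parsed with `json.load`.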
13. PMML-Openscoring
Pros:
● Clear separation of data scientists and data engineers (tools, languages, …)
● Easy integration with our infrastructure
● Model versioning
○ QA before the model hits production data
○ Ability to roll back infrastructure
● Pricing
○ Openscoring requirements are super low
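One possible reading of the versioning points above is that every exported PMML file is treated as an immutable, versioned artifact, promoted only after QA and replaceable by the previous version if needed. The sketch below illustrates that idea with a purely hypothetical in-memory `ModelRegistry`. The class, its method names, and the file paths are assumptions for this example and are not part of Openscoring or of the setup described in the talk.

```python
class ModelRegistry:
    """Hypothetical in-memory registry; not part of Openscoring or PMML."""

    def __init__(self):
        self._artifacts = {}     # version label -> path of the PMML file
        self._qa_passed = set()  # versions approved by QA
        self._deployed = []      # deployment history, newest last

    def register(self, version, pmml_path):
        """Record a newly exported PMML file under a version label."""
        self._artifacts[version] = pmml_path

    def mark_qa_passed(self, version):
        """Mark a registered version as approved by QA."""
        if version not in self._artifacts:
            raise KeyError(f"unknown version: {version}")
        self._qa_passed.add(version)

    def promote(self, version):
        """Deploy a version; refuse anything that has not passed QA."""
        if version not in self._qa_passed:
            raise ValueError(f"{version} has not passed QA")
        self._deployed.append(version)

    def rollback(self):
        """Drop the current version and return the previous one."""
        if len(self._deployed) < 2:
            raise RuntimeError("no earlier version to roll back to")
        self._deployed.pop()
        return self._deployed[-1]

    @property
    def current(self):
        """Currently deployed version, or None if nothing is deployed."""
        return self._deployed[-1] if self._deployed else None
```

Because a PMML file is a self-contained description of the model, rolling back in this scheme amounts to redeploying an older file.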
14. PMML-Openscoring
Cons:
● Online learning is not possible
Lessons learned:
● Create a reusable training process
● Avoid manual interference with the XML file
● XML can be huge → OOM
● Data scientists like to see context in logs, and they like to query that data
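On the point about huge XML files: when tooling only needs to inspect a large PMML file (for example, to count tree nodes before deployment), a streaming parser avoids building the whole document tree in memory. The sketch below uses Python's standard `iterparse` for this. It is an illustration of the general technique, not the approach described in the talk, and `count_elements` is a hypothetical helper.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def count_elements(source):
    """Count elements by tag name while streaming through an XML source.

    `source` may be a file path or a binary file-like object.
    """
    counts = Counter()
    for _event, elem in ET.iterparse(source, events=("end",)):
        # Strip the "{namespace}" prefix, if any, before counting.
        counts[elem.tag.split("}", 1)[-1]] += 1
        # clear() discards the element's children, text, and attributes,
        # so the already-processed subtree can be freed.
        elem.clear()
    return counts


if __name__ == "__main__":
    # Tiny hypothetical document standing in for a large PMML file.
    sample = io.BytesIO(
        b"<PMML><Header/><TreeModel><Node><Node/><Node/></Node></TreeModel></PMML>"
    )
    print(count_elements(sample))
```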