Mesos Meetup - Building an enterprise-ready analytics and operational ecosystem on DC/OS

On November 6th, we got together at Google Campus to talk about Mesos and DC/OS.

Ignacio Mulas, Sparta & Spark Product Owner at Stratio, explained how to build an environment that can secure and govern its data for operational and analytical applications on top of the DC/OS platform. He showed that analytical and machine learning pipelines can be combined with operational processes while maintaining security and providing governance tools to manage data. He focused on the architecture and tools needed to achieve such an ecosystem and showed a demo of it. He also explained how pipelines can be developed interactively with auto-discovered data catalogs and how their results can be explored.

Find out more: https://www.stratio.com/events/discover-how-to-deploy-a-secure-big-data-pipeline-with-dcos/

Published in: Technology

  1. 1. Building an enterprise-ready analytics and operational ecosystem on DC/OS Ignacio Mulas
  2. 2. Index: ● Data Centric Overview ● Non-functional Requirements ● Functional Case: ○ Data Exploration ○ Data Preparation ○ Data Validation ○ Productionalization ○ Evaluation
  3. 3. Overview
  4. 4. THE ROOT OF THE PROBLEM OF PHYSICAL COMPANIES HAS BEEN IDENTIFIED: SILOS & APPLICATION CENTRIC
     Diagram labels: SAP: ERP; Mobile App; Campaign Manager; CRM; Call center; E-commerce; TPV APP; Big Data Lake; Data Mart (two instances); Data Warehouse.
     Problems: Lost data; No Real Time; 10X Data Replication; Low TPO/TCO; 10X Costs; Day-1 analytics; Non-integrated vision; Silos between departments; Not a real AI.
  5. 5. SOLUTION: STRATIO DATACENTRIC. Operationalizing Big Data
     Unique data at the center and applications around it using it in real time with maximum intelligence.
     New applications are developed through microservice orchestration, cutting code in half.
     Operational and informational applications use the microservices of the Data as a Service layer.
     Diagram labels: Mobile APP; Campaign Management; Digital Marketing; Legacy Applications; Call center; Core Application; ATG; TPV APP; CRM; E-commerce; Microservices of the Data Intelligence layer; Microservices; DATA; Data intelligence; Api Daas (Data as a Service); DC/OS Infrastructure and container manager; MultiDataStore & Multiprocessing.
  6. 6. Outer look... Stratio DataCentric Stratio EOS Stratio XData Stratio Sparta Stratio Discovery Stratio Governance Stratio GoSec Deploy and manage all your services with a single click Gain a centralized vision of all your data and easily govern its access and management Apply real-time and batch processing across multiple engines in distributed environments Become a truly data-driven company with AI Turn difficult concepts into something simple Protect your data against security breaches and maintain compliance Stratio Intelligence Begin the journey from data to knowledge Microservices Framework Design, develop and manage applications easily
  7. 7. Non-Functional requirements
  8. 8. Key non-functional requirements on data centric
     1. Security levels & profiling: in this scenario, we need to support encrypted communications, authentication and authorization mechanisms, auditing, and a centralized, easy-to-use security manager that enforces complex policies on applications and data.
     2. Isolation of resources: we should guarantee that each application or user has what it needs to work properly without stepping on others' resources. Mixing different workloads should not affect the correct functioning of the most critical services, e.g. operational microservices vs. big data frameworks.
     3. Data governance tools: bringing everything together imposes new data management requirements, where data is not modelled but auto-discovered and enriched with business context.
     4. DevOps productionalization mechanisms: in the era of cloud and containers, maintenance and operations are reduced to a minimum thanks to automation. Scaling, upgrading and deploying are day-to-day tasks, so we need easy mechanisms to perform and manage them.
  9. 9. Key non-functional requirements on data centric 1. Security levels & profiling: in this scenario, we need to support encrypted communications, authentication and authorization mechanisms, auditing, and a centralized, easy-to-use security manager that enforces complex policies on applications and data.
     Diagram labels: µs; µs-2; SSO; Policies; Audit; Secrets.
  10. 10. Key non-functional requirements on data centric 2. Isolation of resources: we should guarantee that each application or user has what it needs to work properly without stepping on others' resources. Mixing different workloads should not affect the correct functioning of the most critical services, e.g. operational microservices vs. big data frameworks.
     Diagram labels: µs; Big Data Process; ...
     - Network isolation
     - CPU, RAM, Disk isolation
  11. 11. Key non-functional requirements on data centric 3. Data governance tools: bringing everything together imposes new data management requirements, where data is not modelled but auto-discovered and enriched with business context.
     A process or application needs data to work properly, but we need to maintain certain guarantees:
     - Data security:
       - Who are you?
       - Are you authorized to read/write data from here?
     - Data process development:
       - Where can I read a trusted source of information containing my clients' emails?
       - Is this personal data? I need to follow GDPR!
       - Can I delete this record? I do not think it is used in our business…
       - Who created this?
     Diagram labels: Big Data Tool; Data Dictionary; Business glossary; Lineage.
  12. 12. Key non-functional requirements on data centric 4. DevOps productionalization mechanisms: in the era of cloud and containers, maintenance and operations are reduced to a minimum thanks to automation. Scaling, upgrading and deploying are day-to-day tasks, so we need easy mechanisms to perform and manage them.
     Different deployment models: ● Replace version ● Blue/Green ● Canary Testing ● Versioning and history ● Rollback mechanisms ● Models retraining ● Functioning Evaluation ● Metrics tracking ● Versions comparison
     Applications are monitored on several metrics: ● Application metrics ● Business metrics ● Computational metrics
     Panel titles: Deployment; Monitoring; Management; Evaluation.
  13. 13. Functional case: Client Scoring
  14. 14. Functional case: Client Scoring for a financial institution
  15. 15. Functional case: Client Scoring for a financial institution
     1. Data exploration: occurs early in a project; may include viewing sample data, running queries for statistical profiling, exploratory analysis, and visualizing data.
     2. Data preparation: iterative task; may include cleaning, standardizing, transforming, denormalizing, and aggregating data; typically the most time-intensive task of a project.
     3. Data validation: recurring task; may include viewing sample data, running queries for statistical profiling and aggregate analysis, and visualizing data; typically occurs as part of the data exploration, data preparation, development, pre-deployment, and post-deployment phases.
     4. Productionalization: occurs late in a project; may include deploying code to production, backfilling datasets, training models, validating data, and scheduling workflows.
  16. 16. Data Exploration
  17. 17. Data Exploration
  18. 18. Data Preparation
  19. 19. Data Preparation
  20. 20. Data Validation
  21. 21. Data Validation
  22. 22. Productionalization - Workflow
  23. 23. Productionalization - Workflow Versioning
  24. 24. Productionalization - Workflow Deployment
  25. 25. Evaluation
  26. 26. Evaluation
  27. 27. BIG DATA CHILD'S PLAY Questions? :)
  28. 28. New Capabilities…
     ● Facial Recognition: ability to correctly identify a high percentage of known individuals, given an image of a face. Ability to learn new faces.
     ● Emotion classification: ability to correctly classify more than 65% of people's emotions, given an image of a face. The emotions identified are: happiness, sadness, surprise, anger.
     ● Object Recognition: ability to segment and classify objects from images.
     ● Natural Interaction Agent: ability to talk to humans in a natural way (typing or through voice using a phone terminal). Ability to trigger basic actions based on the identified intent, e.g., "show a document" or "switch on a light bulb".
     ● Semantic Document Retrieval: ability to find documents based on their content. Querying is based on natural interaction using standard text.
     ● Question Answering: ability to answer specific questions from a text or a document, e.g., "when was Peter born?" => "May 20th, 2001".
     ● Awareness: ability to manage any amount of data almost instantaneously in order to reach conclusions, create warnings or trigger actions. The data managed by this ability could come from the previous abilities and/or any other external feed.
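
Slide 12 lists canary testing and rollback mechanisms among the deployment models. As a rough illustration only, and not a description of how Stratio or DC/OS implement them, the sketch below sends a fixed share of requests to a new version and then decides whether to promote it or roll it back from observed error counts. The function names `route` and `decide`, the tolerance factor and the minimum request count are all hypothetical.

```python
import hashlib


def route(request_id: str, canary_share: float) -> str:
    """Deterministically send roughly `canary_share` of requests to the canary."""
    # Hash the request id into one of 10,000 buckets so that the same id
    # always lands on the same version.
    digest = hashlib.sha256(request_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 10_000
    return "canary" if bucket < canary_share * 10_000 else "stable"


def decide(stable_errors: int, stable_total: int,
           canary_errors: int, canary_total: int,
           tolerance: float = 1.5, min_requests: int = 100) -> str:
    """Return 'wait', 'promote' or 'rollback' for the canary version."""
    if canary_total < min_requests or stable_total == 0:
        return "wait"  # not enough traffic to judge yet
    stable_rate = stable_errors / stable_total
    canary_rate = canary_errors / canary_total
    # Roll back when the canary fails clearly more often than stable
    # (hypothetical rule: more than `tolerance` times the stable error rate).
    if canary_rate > stable_rate * tolerance:
        return "rollback"
    return "promote"
```

Hashing the request id, rather than drawing a random number, keeps a given client on the same version across requests, which makes the metrics of the two versions easier to compare.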

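Slide 15 describes data preparation (cleaning, standardizing) and data validation (statistical profiling) as recurring steps of the client scoring case. The sketch below illustrates those two steps on a few made-up client records using only the Python standard library. The field names (`client_id`, `income`, `country`), the sample records and the validation threshold are assumptions for illustration and do not come from the talk.

```python
from statistics import mean


def prepare(rows: list[dict]) -> list[dict]:
    """Clean and standardize raw client records."""
    cleaned = []
    seen = set()
    for row in rows:
        client_id = str(row.get("client_id", "")).strip()
        if not client_id or client_id in seen:
            continue  # drop records without an id, and duplicates
        seen.add(client_id)
        try:
            income = float(row["income"])
        except (KeyError, TypeError, ValueError):
            income = None  # keep the record but mark income as missing
        cleaned.append({
            "client_id": client_id,
            "country": str(row.get("country", "")).strip().upper(),
            "income": income,
        })
    return cleaned


def validate(rows: list[dict], max_missing_ratio: float = 0.5) -> dict:
    """Profile the prepared data and flag basic quality problems."""
    incomes = [r["income"] for r in rows if r["income"] is not None]
    missing_ratio = 1 - len(incomes) / len(rows) if rows else 1.0
    return {
        "rows": len(rows),
        "missing_income_ratio": missing_ratio,
        "mean_income": mean(incomes) if incomes else None,
        # Hypothetical acceptance rule: some rows, few gaps, no negatives.
        "ok": bool(rows) and missing_ratio <= max_missing_ratio
              and all(i >= 0 for i in incomes),
    }


# Made-up sample records, for illustration only.
raw = [
    {"client_id": " 1 ", "income": "1000", "country": "es"},
    {"client_id": "1", "income": "1000", "country": "ES"},   # duplicate id
    {"client_id": "2", "income": "n/a", "country": " se "},  # unparsable income
    {"client_id": "", "income": "500", "country": "ES"},     # missing id
    {"client_id": "3", "income": 3000, "country": "es"},
]
report = validate(prepare(raw))
```

Running `validate` again after every change to `prepare` mirrors the slide's point that validation recurs throughout exploration, preparation and deployment rather than happening once.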