
Big Data Security: Facing the challenge

A data-centric platform integrates multiple open source Big Data technologies. For example, at Stratio we use Spark, Kafka, Elasticsearch and many more. Most of these technologies do not offer native security. This lack of security not only leaves companies open to critical risks such as data leakage, insecure communications or DoS attacks, but is also a major barrier to complying with regulations such as LOPD, PCI-DSS or the upcoming GDPR. This talk gives a technical overview of how companies can protect the data and services in their data-centric platform, focusing on three main aspects: implementing network segmentation, managing AAA (authentication, authorization and audit) and securing data processing.
By: Carlos Gómez

Published in: Technology

  1. 1. Big Data Security: Facing the challenge
  2. 2. Experience the presentation xlic.es/v/E98605
  3. 3. © Stratio 2017. Confidential, All Rights Reserved. About me
     • Father of a 5-year-old child
     • Technical leader in the Architecture and Security team at Stratio
     • Sailing skipper
  4. 4. In your opinion, how difficult is it to manage security in your projects?
     ● Very difficult
     ● Difficult
     ● Easy
     ● Very easy
     ● What is security?
  5. 5. PROJECTS FOREVER ONGOING IN BIG COMPANIES: DATA GOVERNANCE, LOGS CENTRALIZATION, MONITORING, SECURITY, DATA SECURITY, AUDIT. In a monolithic, application-centric IT with data silos, these initiatives never get accomplished. HUNDREDS OF MILLIONS OF EUROS SPENT OVER THE YEARS ON GLOBAL CROSS-CUTTING IT INITIATIVES. Systems shown: SAS, CRM, Earnix (Pricing), Towers Watson, ERP, Data Warehouse, Lab H0 (Big Data platform shared by the group), WebFocus, Oracle, Mainframe.
  6. 6. PROJECTS FOREVER ONGOING IN BIG COMPANIES: 1. DATA GOVERNANCE 2. LOGS CENTRALIZATION 3. MONITORING 4. DATA SECURITY 5. AUDIT
  8. 8. ETL. PROJECTS FOREVER ONGOING IN BIG COMPANIES: 1. DATA GOVERNANCE 2. LOGS CENTRALIZATION 3. MONITORING 4. DATA SECURITY 5. AUDIT
  9. 9. GREYHOUND CHASING AN ELECTRONIC RABBIT… COMPANIES ALWAYS TRY TO GET THE RABBIT. In an application-centric company with data silos, you will never be able to complete these projects successfully: DATA GOVERNANCE, LOGS CENTRALIZATION, MONITORING, SECURITY, DATA SECURITY, AUDIT.
  10. 10. STRUCTURAL INITIATIVES ARE SOLVED COMPLETELY WITH DATA CENTRIC. DaaS (data as a service): Data, Data Intelligence. Functionalities implemented in the product: DATA GOVERNANCE, LOGS CENTRALIZATION, MONITORING, SECURITY, DATA SECURITY, AUDIT.
  11. 11. RABBIT IN A JAIL: MINIMUM EFFORT AND COST TO GET THE RABBIT
  12. 12. Facing the challenge
  13. 13. SECURITY IN A DATA CENTRIC
      Protect the data
      • Perimeter security to access the cluster.
      • Support identity management and authentication to prove that a user/service is who it claims to be.
      • In a platform with multiple data stores, ACLs should be centralized to simplify correct authorization across the different data stores.
      • Audit events must be centralized to detect misuse of the cluster in real time.
      • Data integrity and confidentiality in network communications to protect data in transit.
      Protect the service
      • Perimeter security to access the cluster.
      • Support identity management and authentication to prove that a user/service is who it claims to be.
      • A user/service should be authorized so that it does not use more resources than expected.
      • A user/service should not interfere with other users/services unless it needs to.
      • Resource usage should be audited so that it can be controlled.
  14. 14. [Architecture diagram: STRATIO EOS (Enterprise Operating System), Stratio Data Centric. Infrastructure: bare metal, public cloud, private cloud. Node provisioning: Terraform. Data center operating system: Mesos. Service orchestration: Marathon. Containers: Docker. Service discovery: Consul. Network isolation: Calico. Other components shown: Kafka, Zookeeper, Vault, SQL. Apps layer: DaaS, Data Intelligence as a Service, microservices apps with Docker, standalone applications.]
  19. 19. SECURITY OVERVIEW
      To guide the security priorities in the product roadmap, we focus on helping customers comply with LOPD within the platform. With every release of the Stratio platform, the security status is reported through:
      • Results of the OWASP tests for the main components of the platform.
      • Results of additional general-purpose security tests defined to assure the expected quality.
      • A Security Risk Report that includes the known issues found.
      • When Critical and High issues are found:
        ‐ We explain how they can be mitigated.
        ‐ We plan to solve them during the next release.
  20. 20. PERIMETER SECURITY: NETWORKING
      • The default network configuration allows a zone-based network security design:
        ‐ Public.
        ‐ Admin.
        ‐ Private.
      • Using Mesos roles to identify nodes ensures that only tasks specifically configured with this role will be executed outside the Private zone.
      • Using Marathon labels, endpoints can be registered dynamically:
        ‐ Admin Router for the Admin zone.
        ‐ Marathon LB for the Public zone.
      [Diagram labels: Public network, Private network, Admin network, Admin Router, Master Nodes, Private Agents, Public Agents.]
  21. 21. AUTHENTICATION, AUTHORIZATION AND AUDIT
      The solution is integrated with the LDAP and Kerberos services owned by the company where Stratio DCS is installed.
      • Authentication:
        ‐ Web: OAuth2.
        ‐ Services & data stores: Kerberos or mutual TLS.
      • Authorization:
        ‐ OAuth2.
        ‐ goSec Management: REST API and website used to manage roles, profiles and ACLs. It also shows users, groups and audit data.
      • Audit: authentication and authorization events are structured and stored in a data bus (Kafka) to be collected and processed.
  22. 22. AUTHENTICATION, AUTHORIZATION AND AUDIT
      Plugins are lightweight programs running within the processes of each cluster component. They are responsible for:
      • Authorization (using goSec ACLs).
      • Audit of every request sent to the component.
      Currently plugins have been developed for:
      • Crossdata
      • Sparta
      • Zookeeper
      • HDFS
      • Kafka
      • Elasticsearch
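
      The plugin behaviour described on this slide (check an ACL, then audit every request) can be illustrated with a minimal sketch. This is not the goSec plugin API: the class `ToyAuthzPlugin`, the ACL layout, the event field names and the in-memory `audit_bus` list (standing in for the Kafka audit topic) are all assumptions made for illustration.

```python
import json
import time
from dataclasses import dataclass, field


@dataclass
class ToyAuthzPlugin:
    """Hypothetical in-process plugin: checks an ACL table and audits every request."""

    component: str
    # ACLs keyed by (user, resource) -> set of allowed actions (illustrative layout).
    acls: dict = field(default_factory=dict)
    # Stand-in for the audit topic on the data bus; a real plugin would publish to Kafka.
    audit_bus: list = field(default_factory=list)

    def handle(self, user: str, resource: str, action: str) -> bool:
        allowed = action in self.acls.get((user, resource), set())
        event = {
            "timestamp": time.time(),
            "component": self.component,
            "user": user,
            "resource": resource,
            "action": action,
            "result": "ALLOWED" if allowed else "DENIED",
        }
        # Every request is audited, whether it was allowed or denied.
        self.audit_bus.append(json.dumps(event))
        return allowed


plugin = ToyAuthzPlugin(
    component="hdfs",
    acls={("alice", "/data/sales"): {"read"}},
)
read_ok = plugin.handle("alice", "/data/sales", "read")
write_ok = plugin.handle("alice", "/data/sales", "write")
```

      Emitting the audit event before returning means denied requests leave the same structured trail as allowed ones, which is what makes centralized misuse detection possible.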
  23. 23. KEY MANAGEMENT SYSTEM
      • It is good practice to manage secrets with a key management system instead of storing them locally.
      • For this purpose Stratio DCS uses HashiCorp Vault.
  24. 24. KEY MANAGEMENT SYSTEM: the secret of secrets
      • Can applications obtain authentication tokens in a secure way?
      • Where do applications save Vault's tokens?
      • How are tokens protected?
      • How will I know if someone steals tokens?
      [Diagram, "First secret management": Admin, Marathon, Mesos, Application.]
  25. 25. Same slide; the diagram adds: one-time secret; Run Application (Env: one-time secret).
  26. 26. Same slide; the diagram adds: login; token <-> ACL.
  27. 27. Same questions (worded here as "obtain tokens" and "How are tokens guarded?"); diagram: one-time secret; Run Application (Env: one-time secret).
  28. 28. Same slide; the diagram adds: login.
  29. 29. Same as slide 28.
  30. 30. Same slide; the diagram adds: Logs; Alert.
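
      One plausible reading of the diagram sequence above is: the application is started with a one-time secret in its environment, exchanges it once for a token at login, and any second use of the same secret is logged as an alert. The sketch below simulates that flow in memory. It is not the Vault API; `ToySecretStore`, its methods and the environment variable name `ONE_TIME_SECRET` are hypothetical.

```python
import secrets


class ToySecretStore:
    """Hypothetical stand-in for a key management system (not the Vault API)."""

    def __init__(self):
        self._pending = {}  # one-time secret -> application name
        self._used = set()  # one-time secrets that were already exchanged
        self._tokens = {}   # token -> application name (the key for ACL lookups)
        self.alerts = []    # stand-in for the "Logs / Alert" step

    def issue_one_time_secret(self, app: str) -> str:
        ots = secrets.token_hex(16)
        self._pending[ots] = app
        return ots

    def login(self, ots: str) -> str:
        """Exchange a one-time secret for a token; a second attempt raises an alert."""
        if ots in self._used:
            self.alerts.append("one-time secret reused")
            raise PermissionError("one-time secret already used")
        app = self._pending.pop(ots, None)
        if app is None:
            self.alerts.append("unknown one-time secret")
            raise PermissionError("unknown one-time secret")
        self._used.add(ots)
        token = secrets.token_hex(16)
        self._tokens[token] = app
        return token

    def app_for(self, token: str):
        return self._tokens.get(token)


kms = ToySecretStore()
# The orchestrator launches the application with the one-time secret in its environment.
env = {"ONE_TIME_SECRET": kms.issue_one_time_secret("app-1")}
token = kms.login(env["ONE_TIME_SECRET"])

# Anyone replaying the same secret is rejected, and the attempt is recorded.
try:
    kms.login(env["ONE_TIME_SECRET"])
    reuse_rejected = False
except PermissionError:
    reuse_rejected = True
```

      Because the secret is valid only once, a stolen copy is either useless (the application logged in first) or detectable (the application's own login fails and raises an alert).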
  31. 31. DATA PROCESSING ENGINE: SPARK
      • Spark jobs need access to multiple data stores, so Spark needs to support the security of Stratio DCS.
      • Stratio has modified its Spark 2.x build in order to:
        ‐ Access secrets that are stored in the KMS.
        ‐ Allow access to Kerberized HDFS.
        ‐ Allow access to PostgreSQL with TLS authentication.
        ‐ Allow access to Elasticsearch with TLS authentication.
        ‐ Allow access to Kafka with TLS authentication.
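
      As a point of comparison, stock Spark (not Stratio's modified build, whose internals the slides do not show) passes Kafka client TLS settings through options prefixed with `kafka.`. The sketch below only builds that option map; the helper name `kafka_tls_options`, the host name and the file paths are hypothetical, and in practice the passwords would be fetched from the KMS rather than written as literals.

```python
def kafka_tls_options(bootstrap_servers, truststore, keystore, store_password):
    """Build Kafka source options for a TLS (mutual authentication) connection.

    Keys are standard Kafka client settings with the `kafka.` prefix that the
    Spark Kafka source expects.
    """
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "kafka.security.protocol": "SSL",
        "kafka.ssl.truststore.location": truststore,
        "kafka.ssl.truststore.password": store_password,
        "kafka.ssl.keystore.location": keystore,
        "kafka.ssl.keystore.password": store_password,
    }


opts = kafka_tls_options(
    bootstrap_servers="broker.example.internal:9093",  # hypothetical host
    truststore="/tmp/truststore.jks",                  # hypothetical path
    keystore="/tmp/keystore.jks",                      # hypothetical path
    store_password="changeit",                         # placeholder; fetch from the KMS
)

# With PySpark available, the map could be applied to a streaming reader like this:
#   spark.readStream.format("kafka").options(**opts).option("subscribe", "events").load()
```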
  32. 32. PROTECT THE DATA (use case). [Diagram: Admin; ADMIN NETWORK, PUBLIC NETWORK, PRIVATE NETWORK; components: goSec SSO, goSec Management, Audit, Kafka, KMS, LDAP, Kerberos, Tableau, Marathon-LB, Admin Router, Zookeeper, HDFS. Legend: Perimeter security; Authentication, Authorization, Audit; Ciphered communications.]
  38. 38. MULTI-TENANCY CAPABILITIES: RESOURCES ISOLATION
      • Stratio DCS cluster resources (memory, disk, CPUs and port ranges) are managed by Mesos.
      • Mesos, Marathon and Metronome security can be activated post-installation in order to limit each framework's use of the available resources.
      • Once it is activated, admins will be able to:
        ‐ Reserve resources for a Mesos role.
        ‐ Grant permissions for each user/framework to perform actions such as registering frameworks, running tasks, reserving resources, creating volumes, etc.
        ‐ Grant a minimum set of resources to a specific Mesos role.
      [Diagram: Mesos cluster with MASTER, Marathon and agents: AGENT 1 role=slave_public, AGENT 2 role=*, AGENT 3 role=postgresql, AGENT 4 role=*, AGENT 5 role=*.]
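
      The role assignments in the diagram can be turned into a small sketch of how reservations restrict placement. It is a simplification: real Mesos reserves resources rather than whole agents, and the function `eligible_agents` is a hypothetical helper, not a Mesos API. The rule modelled is that a framework is offered resources reserved for its own role plus unreserved (`*`) resources.

```python
# Agent roles as shown on the slide.
AGENTS = {
    "AGENT 1": "slave_public",
    "AGENT 2": "*",
    "AGENT 3": "postgresql",
    "AGENT 4": "*",
    "AGENT 5": "*",
}


def eligible_agents(framework_role, agents=AGENTS):
    """Return the agents whose resources a framework with this role may be offered."""
    return sorted(
        name for name, role in agents.items() if role in ("*", framework_role)
    )


default_placement = eligible_agents("*")
postgres_placement = eligible_agents("postgresql")
public_placement = eligible_agents("slave_public")
```

      Under this rule only a framework registered with the `slave_public` role can land on AGENT 1, which matches the earlier point that only tasks explicitly configured with that role run outside the Private zone.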
  39. 39. MULTI-TENANCY CAPABILITIES: NETWORKS ISOLATION
      • What about network isolation in a containerized world?
      • For this purpose Stratio DCS uses Project Calico.
  40. 40. MULTI-TENANCY CAPABILITIES: NETWORKS ISOLATION
      • Virtual network topologies can be created dynamically.
      • Virtual network topologies can be managed by network policies.
      • Virtual networks can manage all container technologies supported by Mesos.
      • Virtual networks barely impact big data performance.
      • Frameworks/apps are authorized into a network.
      • Frameworks/apps can be isolated into a virtual network.
      • Frameworks/apps IP addresses and ports are managed per instance.
  41. 41. © Stratio 2015. Confidential, All Rights Reserved. Network Isolation: components
  42. 42. Network Isolation: Virtual Networks
  43. 43. Network Isolation: Integration
  44. 44. PROTECT THE SERVICE (use case). [Diagram: Admin, MESOS, CALICO & DOCKER ENGINE.] Steps:
      • Framework authentication
      • Check resources for the role
      • Authorization to launch tasks
      • Authorization to use the network
      • Audit (logs and Mesos API)
  45. 45. Same slide; adds the rule: At least 1 core, 1 GB to framework 1.
  46. 46. Same slide; adds the rule: net_2: Deny from framework 1.
  47. 47. Same slide; the diagram adds: NETWORK A (0.5 cores, 1 GB RAM), NETWORK B (2 cores, 5 GB RAM), CONTAINER 1, CONTAINER 2, "User 2. Launches FRAMEWORK 1" and "User 2. Launches FRAMEWORK 2".
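
      The five steps of this use case can be sketched as a chain of checks that audits every decision. This is a toy model, not Mesos or Calico code: the `Request` type, the rule tables and `admit` are hypothetical, the resource figures in the example requests are illustrative, and the "resources for the role" check is assumed to mean that the request fits within what is reserved for the framework.

```python
from dataclasses import dataclass


@dataclass
class Request:
    framework: str
    authenticated: bool
    cores: float
    mem_gb: float
    network: str


# Rules taken from the slides: 1 core and 1 GB for framework 1; net_2 denies framework 1.
RESERVED = {"framework 1": (1.0, 1.0)}
MAY_LAUNCH = {"framework 1"}
NETWORK_DENY = {("net_2", "framework 1")}
AUDIT_LOG = []


def fits_reservation(req):
    reserved = RESERVED.get(req.framework)
    return reserved is not None and req.cores <= reserved[0] and req.mem_gb <= reserved[1]


def admit(req):
    """Run the checks in order, audit each one, and stop at the first failure."""
    checks = [
        ("framework authentication", req.authenticated),
        ("resources for the role", fits_reservation(req)),
        ("authorization to launch tasks", req.framework in MAY_LAUNCH),
        ("authorization to use the network", (req.network, req.framework) not in NETWORK_DENY),
    ]
    for name, passed in checks:
        AUDIT_LOG.append((req.framework, name, passed))
        if not passed:
            return False
    return True


admitted = admit(Request("framework 1", True, 0.5, 1.0, "net_1"))
blocked = admit(Request("framework 1", True, 0.5, 1.0, "net_2"))
```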
  49. 49. MULTI-DATA CENTER: a use case
