
ebay

  1. Apache Eagle: Monitor Hadoop in Real Time. Yong Zhang | Senior Architect | yonzhang2012@gmail.com; Arun Manoharan | Senior Product Manager | @lycos_86
  2. Big Data @ eBay / Hadoop @ eBay: 800M listings*, 159M global active buyers*, 7 Hadoop clusters*, 800M HDFS operations (single cluster)*, 120 PB data* (*Q3 2015 data)
  3. Security for Hadoop (MDR): • Perimeter Security • Authorization & Access Control • Discovery / Data Classification • Activity Monitoring
  4. Motivation: Who is accessing the data? What data are they accessing? Is someone trying to access data they don't have access to? Are there any anomalous access patterns? Is there a security threat? How do we monitor and get notified during, or prior to, an anomalous event?
  5. Apache Eagle: Monitor Hadoop in Real Time. Apache Eagle is an open-source monitoring platform for the Hadoop ecosystem that started with monitoring data activity in Hadoop. It can instantly identify access to sensitive data, recognize attacks or malicious activity, and block access in real time. In conjunction with components such as Ranger, Sentry, Knox, DgSecure, and Splunk, Eagle provides a comprehensive solution for securing sensitive data stored in Hadoop.
  6. Apache Eagle Composition. Application domains: (1) Data Activity Monitoring: HDFS audit, Hive query, HBase audit, Cassandra audit, MapR audit; (2) Hadoop Performance Metrics: Namenode JMX metrics, Datanode JMX metrics, RM JMX metrics, system metrics; (3) M/R Job Performance Metrics: history job metrics, running job metrics; (4) Spark Job Performance Metrics: Spark job metrics, queue metrics. Platform: alert engine with (1) policy store, (2) metadata API, (3) scalability, (4) extensibility.
  7. More Integrations: • Cassandra • MapR • MongoDB • Job • Queue
  8. Extensibility: • Ranger: as remediation engine; as generic data source • DgSecure: source of truth for data classification • Splunk: syslog-format output; Eagle alert output is the 1st abstraction of analytics and Splunk is the 2nd abstraction
  9. Eagle Architecture
  10. Highlights: 1. Turn-key integration: after installation, the user only defines rules. 2. Comprehensive rules on high volumes of data: Eagle solves some problems unique to Hadoop. 3. Hot-deployable rules: Eagle does not provide many charts; instead it lets users write ad-hoc rules and hot-deploy them. 4. Metadata driven: here metadata includes policies, event schemas, UI components, etc. 5. Extensibility: Eagle can't succeed alone; it has to integrate with other systems, for example for data classification and policy enforcement. 6. Monolithic Storm topology: application pre-processing runs together with the alert engine.
  11. Example 1: Integration with the HDFS audit log. • Ingestion: KafkaLog4jAppender + Kafka, or Logstash + Kafka • Partition: by user • Pre-processing: sensitivity join; command re-assembler. (Diagram: the Namenode writes to Kafka partitions 1..N; a Storm Kafka spout routes each user's events to a single alert executor among executors 1..K.)
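The by-user routing above can be sketched as a stable hash over the user field (a minimal illustration, not Eagle's actual ingestion code; in practice the Kafka producer is keyed on the user so Kafka performs this routing):

```python
import hashlib

def partition_for_user(user: str, num_partitions: int) -> int:
    """Route every event of a given user to the same Kafka partition.

    Uses md5 rather than Python's built-in hash() so the mapping is
    stable across processes and restarts.
    """
    digest = hashlib.md5(user.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions
```

Keeping each user's events in one partition preserves their order, which is what allows a single alert executor to evaluate stateful per-user policies.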
  12. Data Classification (HDFS): • Browse the HDFS file system • Batch-import sensitivity metadata through the Eagle API • Manually mark sensitivity in the Eagle UI
  13. User Command Re-assembly. • One user command generates multiple HDFS audit events • Eagle reverse-engineers the original user command • Example (copyFromLocal):
COPYFROMLOCAL_PATTERN = "every a = eventStream[cmd=='getfileinfo'] " +
  "-> b = eventStream[cmd=='getfileinfo' and user==a.user and src==str:concat(a.src,'._COPYING_')] " +
  "-> c = eventStream[cmd=='create' and user==a.user and src==b.src] " +
  "-> d = eventStream[cmd=='getfileinfo' and user==a.user and src==b.src] " +
  "-> e = eventStream[cmd=='delete' and user==a.user and src==a.src] " +
  "-> f = eventStream[cmd=='rename' and user==a.user and src==b.src and dst==a.src]"
Matching audit events:
2015-11-20 00:06:47,090 INFO FSNamesystem.audit: allowed=true ugi=root (auth:SIMPLE) ip=/10.0.2.15 cmd=getfileinfo src=/tmp/private dst=null perm=null proto=rpc
2015-11-20 00:06:47,185 INFO FSNamesystem.audit: allowed=true ugi=root (auth:SIMPLE) ip=/10.0.2.15 cmd=getfileinfo src=/tmp/private._COPYING_ dst=null perm=null proto=rpc
2015-11-20 00:06:47,254 INFO FSNamesystem.audit: allowed=true ugi=root (auth:SIMPLE) ip=/10.0.2.15 cmd=create src=/tmp/private._COPYING_ dst=null perm=root:hdfs:rw-r--r-- proto=rpc
2015-11-20 00:06:47,289 INFO FSNamesystem.audit: allowed=true ugi=root (auth:SIMPLE) ip=/10.0.2.15 cmd=getfileinfo src=/tmp/private._COPYING_ dst=null perm=null proto=rpc
2015-11-20 00:06:47,609 INFO FSNamesystem.audit: allowed=true ugi=root (auth:SIMPLE) ip=/10.0.2.15 cmd=delete src=/tmp/private dst=null perm=null proto=rpc
2015-11-20 00:06:47,624 INFO FSNamesystem.audit: allowed=true ugi=root (auth:SIMPLE) ip=/10.0.2.15 cmd=rename src=/tmp/private._COPYING_ dst=/tmp/private perm=root:hdfs:rw-r--r-- proto=rpc
  14. Data Skew Problem. • Policy evaluation is stateful (one user's data has to go to one physical bolt) • Partition by user all the way (hash) • User traffic is not balanced at all • Greedy algorithm: https://en.wikipedia.org/wiki/Partition_problem#The_greedy_algorithm
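The linked greedy multiway-partitioning heuristic can be sketched as follows (an illustrative Python version; the function name and data shapes are assumptions, not Eagle's code). Users are sorted by traffic volume, heaviest first, and each is assigned to the currently least-loaded bolt:

```python
import heapq

def greedy_assign(user_traffic, num_bolts):
    """Assign users to bolts so per-bolt load stays roughly balanced.

    user_traffic: dict of user -> expected event volume
    Returns: dict of user -> bolt index
    """
    # Min-heap of (current load, bolt index); all bolts start empty.
    heap = [(0, b) for b in range(num_bolts)]
    heapq.heapify(heap)
    assignment = {}
    # Heaviest users first: placing the big items early is what makes
    # the greedy heuristic effective for the partition problem.
    for user, weight in sorted(user_traffic.items(), key=lambda kv: -kv[1]):
        load, bolt = heapq.heappop(heap)
        assignment[user] = bolt
        heapq.heappush(heap, (load + weight, bolt))
    return assignment
```

This mitigates the skew of hash partitioning: a handful of very heavy users no longer land on the same bolt by accident.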
  15. Computation Skew Problem. Policy weight is not even: • Regex policies are CPU-intensive • Window-based policies are memory-intensive
  16. Example 2: Integration with Hive. • Ingestion: YARN API • Partition: by user • Pre-processing: sensitivity join; Hive SQL parser
  17. Data Classification (Hive): • Browse Hive databases/tables/columns • Batch-import sensitivity metadata through the Eagle API • Manually mark sensitivity in the Eagle UI
  18. Eagle Alert Engine Overview. 1. Runs a CEP engine on Apache Storm: uses the CEP engine as a library (Siddhi CEP); evaluates policies on streamed data; rules are hot-deployable. 2. Injects policies dynamically: via API or an intuitive UI. 3. Scalability: computation scales with the # of policies (policy placement); storage scales with the # of events (event partitioning). 4. Extensibility for policy enforcement: post-alert processing via plugins.
  19. Run CEP Engine on Storm. (Diagram: each Storm bolt hosts several CEP workers; a policy-check thread loads policies and event schemas from the policy store through the metadata API and distributes them across bolts, e.g. policies 1-3 to one bolt and 4-6 to another; each incoming event is fanned out to the CEP workers.)
  20. Primitives: event, policy, alert.
Raw event:
2015-10-11 01:00:00,014 INFO FSNamesystem.audit: allowed=true ugi=user_tom@sandbox.hortonworks.com (auth:KERBEROS) ip=/10.0.0.1 cmd=getfileinfo src=/tmp/private dst=null perm=null
Alert event schema: timestamp, cmd, src, dst, ugi, sensitivityType, securityZone
Policy: viewPrivate: from hdfsAuditLogEventStream[(cmd=='getfileinfo') and (src=='/tmp/private')]
Alert: 2015-10-11 01:00:09[UTC] hdfsAuditLog viewPrivate user_tom/10.0.0.1 The policy "viewPrivate" has been detected with the information below: timestamp="1445993770932" allowed="true" cmd="getfileinfo" host="/10.0.0.1" sensitivityType="PRIVATE" securityZone="NA" src="/tmp/private" dst="NA" user="user_tom"
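The mapping from raw audit line to structured event amounts to parsing key=value pairs out of the log line; a minimal sketch (the field names follow the slide's event schema, but the regex and function are illustrative assumptions, not Eagle's parser):

```python
import re

# key=value tokens in an FSNamesystem.audit line, e.g. "cmd=getfileinfo"
AUDIT_FIELD = re.compile(r"(\w+)=(\S+)")

def parse_audit_line(line):
    """Turn one HDFS audit log line into a structured event dict."""
    fields = dict(AUDIT_FIELD.findall(line))
    return {
        "timestamp": line[:23],        # e.g. "2015-10-11 01:00:00,014"
        "cmd": fields.get("cmd"),
        "src": fields.get("src"),
        "dst": fields.get("dst"),
        "ugi": fields.get("ugi"),
        "allowed": fields.get("allowed"),
    }
```

Sensitivity metadata (sensitivityType, securityZone) would then be joined onto this dict in the pre-processing step before policy evaluation.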
  21. Event Schema. • Modeling events
  22. Policy Capabilities. 1. Single-event evaluation: threshold checks with various conditions. 2. Event-window-based evaluation: various window semantics (time/length, sliding/batch windows); comprehensive aggregation support. 3. Correlation across multiple event streams: SQL-like joins. 4. Pattern match and sequence: a happens, followed by b. Powered by Siddhi 3.0.5; Eagle adds dynamic capabilities and an intuitive API/UI.
  23. Some policy examples.
1. Namenode master/slave lag:
from every a = hadoopJmxMetricEventStream[metric=="hadoop.namenode.journaltransaction.lastappliedorwrittentxid"] -> b = hadoopJmxMetricEventStream[metric==a.metric and b.host != a.host and (max(convert(a.value, "long")) + 100) <= max(convert(value, "long"))] within 5 min select a.host as hostA, a.value as transactIdA, b.host as hostB, b.value as transactIdB insert into tmp;
2. Namenode last checkpoint time:
from hadoopJmxMetricEventStream[metric == "hadoop.namenode.dfs.lastcheckpointtime" and (convert(value, "long") + 18000000) < timestamp] select metric, host, value, timestamp, component, site insert into tmp;
3. Namenode HA state change:
from every a = hadoopJmxMetricEventStream[metric=="hadoop.namenode.hastate.active.count"] -> b = hadoopJmxMetricEventStream[metric==a.metric and b.host == a.host and (convert(a.value, "long") != convert(value, "long"))] within 10 min select a.host, a.value as oldHaState, b.value as newHaState, b.timestamp as timestamp, b.metric as metric, b.component as component, b.site as site insert into tmp;
  24. Define policy in UI and API.
1. Create policy using the API:
curl -u ${EAGLE_SERVICE_USER}:${EAGLE_SERVICE_PASSWD} -X POST -H 'Content-Type:application/json' "http://${EAGLE_SERVICE_HOST}:${EAGLE_SERVICE_PORT}/eagle-service/rest/entities?serviceName=AlertDefinitionService" -d '
[
  {
    "prefix": "alertdef",
    "tags": {
      "site": "sandbox",
      "application": "hadoopJmxMetricDataSource",
      "policyId": "capacityUsedPolicy",
      "alertExecutorId": "hadoopJmxMetricAlertExecutor",
      "policyType": "siddhiCEPEngine"
    },
    "description": "jmx metric",
    "policyDef": "{\"expression\":\"from hadoopJmxMetricEventStream[metric == \\\"hadoop.namenode.fsnamesystemstate.capacityused\\\" and convert(value, \\\"long\\\") > 0] select metric, host, value, timestamp, component, site insert into tmp;\",\"type\":\"siddhiCEPEngine\"}",
    "enabled": true,
    "dedupeDef": "{\"alertDedupIntervalMin\":10,\"emailDedupIntervalMin\":10}",
    "notificationDef": "[{\"sender\":\"eagle@apache.org\",\"recipients\":\"eagle@apache.org\",\"subject\":\"missing block found.\",\"flavor\":\"email\",\"id\":\"email_1\",\"tplFileName\":\"\"}]"
  }
]'
2. Create policy using the UI.
  25. Scalability: • Scale with # of events • Scale with # of policies
  26. Eagle Service. As of 0.3.0, Eagle stores metadata and statistics in HBase and supports Druid as a metric store.
1. Data to be stored: • Metadata: policies, event schemas, site/application/UI features • Statistics: # of events evaluated per second; audit trail for policy changes • Raw data: Druid for metrics, HBase for M/R job/task data, ES for logs (future)
2. Storage: • HBase: stores metrics and M/R job/task data; rowkey design for time-series data; HBase coprocessors • Druid: consumes data from Kafka
3. API/UI: • HBase: filter, group-by, sort, top • Druid: Druid query API; dashboards in Eagle
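A common rowkey design for time-series data in HBase, of the kind the slide alludes to, prefixes a fixed-width metric hash and embeds a reversed timestamp so the newest points of a metric sort first. This sketch is a generic illustration of the technique, not Eagle's actual key layout:

```python
import struct

JAVA_LONG_MAX = 2**63 - 1

def time_series_rowkey(metric_id_hash, timestamp_ms):
    """Rowkey = 4-byte metric prefix + 8-byte reversed timestamp.

    All rows of one metric are contiguous, and because the timestamp
    is stored big-endian as (MAX - ts), a forward scan over the
    lexicographically ordered keys returns the newest data first.
    """
    return struct.pack(">IQ", metric_id_hash & 0xFFFFFFFF,
                       JAVA_LONG_MAX - timestamp_ms)
```

The fixed-width prefix also spreads different metrics across regions, avoiding a single hot region for sequential timestamps.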
  27. Alert Engine Limitations in Eagle 0.3:
1. High cost of integration: onboarding a new data source requires coding; pre-processing and alerting share one monolithic topology, so even a trivial data source needs its own Storm topology written and deployed.
2. Not multi-tenant: the alert engine is embedded into each application, producing many separate Storm topologies (one topology even for one trivial data source).
3. Policy capability restricted by the event partition: ad-hoc group-by expressions are not possible (for example, switching from group-by user to group-by cmd); if traffic is partitioned by user, policies can only express user-based group-bys.
4. Correlation is not declarative: correlating existing data sources requires coding; correlations over multiple metrics cannot simply be declared.
5. Stateful policy evaluation: failover when a bolt goes down is hard, e.g. how to replay one week of history data after a node failure.
  28. Eagle Next Releases. Eagle 0.4: improve user experience • Remote-start Storm topologies • Metadata stored in an RDBMS. Eagle 0.5: alert engine as a platform • No monolithic topology • Declarative data-source onboarding • Easy correlation • Support policies with group-by on any field • Elastic capacity management
  29. User Profile Algorithms: Eigenvalue Decomposition. • Compute mean and variance • Compute eigenvectors and determine principal components • Normal data points lie near the first few principal components • Abnormal data points lie further from the first few principal components and closer to later components
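The steps above can be sketched with NumPy: center the data, take the eigenvectors of the covariance matrix, and score each point by its distance from the subspace spanned by the first few principal components (an illustrative sketch under those assumptions, not Eagle's user-profile implementation, which applies this to streaming user-behavior data):

```python
import numpy as np

def pca_anomaly_scores(X, k):
    """Distance of each row of X from the top-k principal subspace.

    Normal points reconstruct well from the first few principal
    components, so their residual is small; abnormal points leave
    a large residual along the later components.
    """
    Xc = X - X.mean(axis=0)                 # center the data (mean step)
    cov = np.cov(Xc, rowvar=False)          # covariance (variance step)
    _, eigvecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
    top = eigvecs[:, -k:]                   # top-k principal components
    recon = Xc @ top @ top.T                # project onto the normal subspace
    return np.linalg.norm(Xc - recon, axis=1)
```

Thresholding these residual scores gives the normal/abnormal split the slide describes.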
  30. User Profile Architecture
  31. Q & A. Dev mail list: dev@eagle.incubator.apache.org | Website: http://eagle.incubator.apache.org | GitHub: https://github.com/apache/incubator-eagle | Twitter: @TheApacheEagle
