Tri hug 2010 wei


  1. 1. SecureMR - Practical Hadoop Security. Triangle Hadoop Users Group, September 14th, 2010
  2. 2. SecureMR - Overview <ul><li>Long-term Goal </li></ul><ul><ul><li>Deploy MapReduce over open systems with security guarantees </li></ul></ul><ul><li>Motivation </li></ul><ul><ul><li>Industry </li></ul></ul><ul><ul><ul><li>Google, Yahoo!, Facebook </li></ul></ul></ul><ul><ul><li>Academia </li></ul></ul><ul><ul><ul><li>Machine Learning, Data Intensive Computation, Image Processing </li></ul></ul></ul><ul><li>Our Focus </li></ul><ul><ul><li>Provide integrity assurance for MapReduce in open systems </li></ul></ul><ul><li>Basic Idea </li></ul><ul><ul><li>Adopt a replication-based scheme </li></ul></ul><ul><ul><li>Decentralize integrity verification </li></ul></ul>
  3. 3. Outline <ul><li>Introduction </li></ul><ul><li>System Model </li></ul><ul><li>System Design </li></ul><ul><li>Analysis and Evaluation </li></ul><ul><li>Related Work </li></ul><ul><li>Conclusion </li></ul>
  4. 4. MapReduce Overview [Figure: the master assigns map tasks and reduce tasks. In the map phase, mappers M1..Mn read input blocks B1..Bn from the DFS and write their intermediate results locally as partitions P1..Pr. In the reduce phase, reducers R1..Rr read their partitions remotely and write Output 1..Output r to the DFS.]
  5. 5. MapReduce – WordCount Application <ul><li>Input (read from DFS, one line per mapper M1, M2, M3) </li></ul><ul><ul><li>Hello World, Bye World! </li></ul></ul><ul><ul><li>Hello MapReduce, Goodbye to MapReduce. </li></ul></ul><ul><ul><li>Welcome to ACSAC, Goodbye to ACSAC. </li></ul></ul><ul><li>Map Phase: intermediate results </li></ul><ul><ul><li>(Hello, 1) (Bye, 1) (World, 1) (World, 1) </li></ul></ul><ul><ul><li>(Hello, 1) (to, 1) (MapReduce, 1) (Goodbye, 1) (MapReduce, 1) </li></ul></ul><ul><ul><li>(Welcome, 1) (to, 1) (to, 1) (ACSAC, 1) (Goodbye, 1) (ACSAC, 1) </li></ul></ul><ul><li>Reduce Phase: reducers R1 and R2 write to DFS </li></ul><ul><ul><li>(Hello, 2) (Bye, 1) (Welcome, 1) (to, 3) </li></ul></ul><ul><ul><li>(World, 2) (ACSAC, 2) (Goodbye, 2) (MapReduce, 2) </li></ul></ul>
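
The WordCount data flow on this slide can be reproduced with a minimal in-memory sketch. It is not Hadoop code: the function names `map_words`, `partition` and `word_count` are made up for illustration, and the partition function is an assumption, so the split of keys between the two reducers will not necessarily match the slide.

```python
import re
from collections import defaultdict

def map_words(text):
    """Map phase: emit a (word, 1) pair for every word in one input split."""
    return [(word, 1) for word in re.findall(r"[A-Za-z]+", text)]

def partition(word, num_reducers):
    """Pick the reducer that owns a key (hypothetical, deterministic rule)."""
    return sum(word.encode()) % num_reducers

def word_count(splits, num_reducers=2):
    # Shuffle: route every intermediate pair to the reducer that owns its key.
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for split in splits:
        for word, one in map_words(split):
            buckets[partition(word, num_reducers)][word].append(one)
    # Reduce phase: each reducer sums the values collected for its keys.
    result = {}
    for bucket in buckets:
        for word, ones in bucket.items():
            result[word] = sum(ones)
    return result

splits = [
    "Hello World, Bye World!",
    "Hello MapReduce, Goodbye to MapReduce.",
    "Welcome to ACSAC, Goodbye to ACSAC.",
]
counts = word_count(splits)
```

Each key is owned by exactly one reducer, so merging the per-reducer outputs gives the same totals as the slide.
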
  6. 6. Outline <ul><li>Introduction </li></ul><ul><li>System Model </li></ul><ul><li>System Design </li></ul><ul><li>Analysis and Evaluation </li></ul><ul><li>Related Work </li></ul><ul><li>Conclusion </li></ul>
  7. 7. System Model <ul><li>Goal </li></ul><ul><ul><li>Deploy MapReduce over open systems with integrity assurance </li></ul></ul><ul><li>Open systems differ from closed systems </li></ul><ul><li>Attacks against MapReduce in open systems </li></ul><ul><ul><li>Communication attacks </li></ul></ul><ul><ul><ul><li>Eavesdropping, DoS and replay attacks </li></ul></ul></ul><ul><ul><li>Data processing service integrity attacks (our focus) </li></ul></ul><ul><ul><ul><li>Inserting fake data, tampering with data and dropping data </li></ul></ul></ul>
  8. 8. System Model – Integrity Attacks [Figure: the MapReduce data flow with the master, mappers M1..Mn, reducers R1..Rr, input blocks B1..Bn in the DFS, intermediate partitions P1..Pr, and Output 1..Output r.]
  9. 9. System Model <ul><li>Assumptions </li></ul><ul><ul><li>PKI is deployed in advance </li></ul></ul><ul><ul><li>Master is trusted </li></ul></ul><ul><ul><li>DFS provides data integrity protection [Atallah, et al., ICDE’08] </li></ul></ul><ul><li>Attack Models </li></ul><ul><ul><li>Non-collusive malicious behavior </li></ul></ul><ul><ul><li>Collusive malicious behavior </li></ul></ul>
  10. 10. Outline <ul><li>Introduction </li></ul><ul><li>System Model </li></ul><ul><li>System Design </li></ul><ul><li>Analysis and Evaluation </li></ul><ul><li>Related Work </li></ul><ul><li>Conclusion </li></ul>
  11. 11. SecureMR <ul><li>Basic Idea </li></ul><ul><ul><li>Adopt a replication-based scheme (integrity) </li></ul></ul>
  12. 12. A Naive Approach [Figure: replicated mappers Ma and Mb read the same input blocks B1..Bn and process them into partitions P1..Pr. Each mapper sends its results to the master and its intermediate result to the reducers Ra and Rb. The master compares the hashes H_P1, ... reported by the two mappers. Open questions: Scalability? Integrity?]
  13. 13. A Naive Approach [Figure: Ma and Mb send the hashes H_P1, H_P2, ... of their partitions to the master, and the master finds them equal.]
  14. 14. A Naive Approach [Figure: the hashes that Ma and Mb send to the master are equal, but a mapper sends a tampered result to the reducer. The copies of Output 1 produced by Ra and Rb are compared.]
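
One reading of the weakness in the naive approach is that the master only sees the hashes the mappers choose to report, not the data the reducers actually receive. The sketch below illustrates that reading. The `Mapper` class, its methods and the uppercase "map function" are hypothetical stand-ins, not part of Hadoop or of SecureMR.

```python
import hashlib

def digest(data: bytes) -> str:
    """Hash of an intermediate result, as reported to the master."""
    return hashlib.sha256(data).hexdigest()

class Mapper:
    def __init__(self, tamper_for_reducer=False):
        self.tamper_for_reducer = tamper_for_reducer

    def process(self, block: bytes) -> bytes:
        # Stand-in for a real map function.
        return block.upper()

    def report_to_master(self, block: bytes) -> str:
        # The mapper always reports the hash of the correct result.
        return digest(self.process(block))

    def serve_reducer(self, block: bytes) -> bytes:
        # A malicious mapper can hand the reducer something else.
        result = self.process(block)
        return result + b"!" if self.tamper_for_reducer else result

def master_accepts(hash_a: str, hash_b: str) -> bool:
    """Naive check: the master only compares the two reported hashes."""
    return hash_a == hash_b

block = b"hello world"
ma, mb = Mapper(tamper_for_reducer=True), Mapper()
accepted = master_accepts(ma.report_to_master(block), mb.report_to_master(block))
delivered = ma.serve_reducer(block)
undetected_tampering = accepted and digest(delivered) != ma.report_to_master(block)
```

Here `accepted` is true even though the reducer received tampered data, which is why the check has to move to the party that receives the data.
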
  15. 15. SecureMR <ul><li>Basic Idea </li></ul><ul><ul><li>Adopt a replication-based scheme (integrity) </li></ul></ul><ul><ul><li>Decentralize integrity verification (scalability and integrity) </li></ul></ul><ul><li>Design Goals </li></ul><ul><ul><li>Security </li></ul></ul><ul><ul><ul><li>Non-repudiation, resilience to DoS and replay attacks </li></ul></ul></ul><ul><ul><li>Performance </li></ul></ul><ul><ul><ul><li>Minimize computation cost and network communications </li></ul></ul></ul><ul><ul><li>Applicability </li></ul></ul><ul><ul><ul><li>Preserve existing protocol as much as possible </li></ul></ul></ul>
  16. 16. SecureMR – Architecture Design [Figure: layered architecture of plain MapReduce. From top to bottom: User Applications; MapReduce, with a Scheduler and Task Executors across the master, mapper and reducer roles; Open Systems (Grid Computing, Volunteer Computing and P2P Computing); Network Infrastructure.]
  17. 17. SecureMR – Architecture Design [Figure: the same layered architecture with SecureMR in place of MapReduce. Its components are the Secure Manager, Secure Scheduler, Secure Task Executors, Secure Committer and Secure Verifier, spread across the master, mapper and reducer roles.]
  18. 18. SecureMR – Communication Design [Figure: message flow between the master, mappers M1..Mn, reducers R1..Rr and the DFS holding input blocks B1..Bn. Numbered steps: 1.1 Assign, 1.2 Assign, 2 Read, 3 Process, 4 Commit, 5 Compare, 6 Assign, 7 Notify, 8 Request, 9 Response, 10 Verify. The figure groups the steps into a commitment part and a verification part.]
  19. 19. SecureMR – Commitment Protocol [Figure: replicated mappers Ma and Mb read the input blocks B1..Bn and each produces partitions P1..Pr. Each mapper computes the hashes H_P1, H_P2, ..., H_Pr, signs them ({H}sig), and sends the hashes to the master, which checks that the two sets of hashes are equal.]
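
A minimal sketch of the commitment step follows, assuming each mapper hashes every partition, signs a digest over those hashes, and sends the result to the master. The slide does not fix the hash or signature algorithm, so SHA-256 is an assumption, and HMAC with a per-mapper key is used only as a standard-library stand-in for the PKI signature the deck assumes. All function names are hypothetical.

```python
import hashlib
import hmac

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def sign(key: bytes, message: bytes) -> bytes:
    # ASSUMPTION: HMAC stands in for a public-key signature under the PKI.
    return hmac.new(key, message, hashlib.sha256).digest()

def make_commitment(key: bytes, partitions):
    """Mapper side: hash each partition, then sign a digest of all hashes."""
    hashes = [h(p) for p in partitions]
    return {"hashes": hashes, "sig": sign(key, h(b"".join(hashes)))}

def master_check(commitment, key: bytes) -> bool:
    """Master side: check that the commitment was signed by the mapper."""
    expected = sign(key, h(b"".join(commitment["hashes"])))
    return hmac.compare_digest(commitment["sig"], expected)

def commitments_consistent(c_a, c_b) -> bool:
    """Master side: replicas of the same task must commit to equal hashes."""
    return c_a["hashes"] == c_b["hashes"]

key_a, key_b = b"key-of-Ma", b"key-of-Mb"
partitions = [b"P1 data", b"P2 data", b"P3 data"]
c_a = make_commitment(key_a, partitions)
c_b = make_commitment(key_b, partitions)
c_bad = make_commitment(key_b, [b"P1 data", b"forged", b"P3 data"])
```

Only hashes travel to the master, so its cost depends on the number of partitions rather than on the size of the intermediate data.
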
  20. 20. SecureMR – Verification Protocol [Figure: after Ma and Mb have sent their hashes, each reducer is notified with the signed hash of its partition: R1 receives {H_P1}sig, ..., Rr receives {H_Pr}sig. Each reducer reads its partition, calculates the hash H'_P1, ..., H'_Pr of what it received, and checks H_P1 == H'_P1, ..., H_Pr == H'_Pr.]
  21. 21. SecureMR – Verification Protocol [Figure: the same exchange for reducer R1 only. R1 is notified with {H_P1}sig, reads partition P1, calculates H'_P1, and confirms H_P1 == H'_P1.]
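
The reducer-side check can be sketched as follows: the reducer is notified with a signed committed hash, reads the partition, recomputes the hash and compares the two. As above, SHA-256 and HMAC are assumptions standing in for the unspecified hash and PKI signature, and the names `notify` and `reducer_verify` are hypothetical.

```python
import hashlib
import hmac

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def sign(key: bytes, message: bytes) -> bytes:
    # ASSUMPTION: HMAC stands in for a public-key signature under the PKI.
    return hmac.new(key, message, hashlib.sha256).digest()

def notify(mapper_key: bytes, committed_hash: bytes) -> dict:
    """Notification to the reducer: the committed hash plus its signature."""
    return {"hash": committed_hash, "sig": sign(mapper_key, committed_hash)}

def reducer_verify(notice: dict, mapper_key: bytes, partition: bytes) -> bool:
    """Reducer side: check the signature, then compare H with the recomputed H'."""
    signature_ok = hmac.compare_digest(
        notice["sig"], sign(mapper_key, notice["hash"])
    )
    hash_ok = hmac.compare_digest(notice["hash"], h(partition))
    return signature_ok and hash_ok

mapper_key = b"key-of-Ma"
p1 = b"(Hello, 1) (Hello, 1)"
notice = notify(mapper_key, h(p1))
honest = reducer_verify(notice, mapper_key, p1)
tampered = reducer_verify(notice, mapper_key, b"(Hello, 1)")
```

Because every reducer checks only its own partition, the verification work is spread over the reducers instead of being concentrated on the master.
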
  22. 22. MapReduce in Open Systems – Integrity [Figure: the MapReduce data flow. The master assigns map tasks and reduce tasks; mappers M1..Mn read input blocks B1..Bn from the DFS and write intermediate partitions P1..Pr locally; reducers R1..Rr read them remotely and write Output 1..Output r to the DFS.]
  23. 23. Outline <ul><li>Introduction </li></ul><ul><li>System Model </li></ul><ul><li>System Design </li></ul><ul><li>Analysis and Evaluation </li></ul><ul><li>Related Work </li></ul><ul><li>Conclusion </li></ul>
  24. 24. SecureMR – Analysis <ul><li>Security Analysis </li></ul><ul><ul><li>No false alarms </li></ul></ul><ul><ul><li>Non-repudiation </li></ul></ul><ul><li>Attacker Behavior Analysis </li></ul><ul><ul><li>Periodical attackers without collusion (Detection Rate) </li></ul></ul><ul><ul><li>Periodical attackers with collusion (Detection Rate) </li></ul></ul><ul><ul><li>Strategic attackers (Misbehaving Probability) </li></ul></ul><ul><li>Detection Rate </li></ul><ul><ul><li>We define the detection rate, denoted D_rate, as the probability that the inconsistency between results caused by misbehavior is detected during l jobs. </li></ul></ul>
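
To make the definition concrete, here is a toy calculation under strong simplifying assumptions that are not taken from the slides: a single non-colluding attacker handles one task per job, the task is duplicated with probability p_b, the attacker misbehaves with probability p_m, any replica runs on an honest worker, and jobs are independent. This is not the authors' formula, only an illustration of "detected at least once during l jobs".

```python
def detection_rate(p_b: float, p_m: float, l: int) -> float:
    """Toy model: probability of at least one detection in l independent jobs.

    ASSUMPTION: a job exposes the attacker only when its task is duplicated
    (probability p_b) and the attacker misbehaves on it (probability p_m).
    """
    per_job = p_b * p_m
    return 1.0 - (1.0 - per_job) ** l

# Example: duplication rate 0.2, misbehaving probability 0.5, 15 jobs.
example = detection_rate(0.2, 0.5, 15)
```

In this toy model the detection rate grows with both the duplication rate and the number of jobs observed.
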
  25. 25. SecureMR – Analysis <ul><li>Detection Rate for Collusive Periodical Attacker </li></ul><ul><li># of workers n = 50 </li></ul><ul><li>misbehaving probability p_m = 0.5 </li></ul><ul><li># of blocks b = 20 </li></ul><ul><li># of jobs l = 15 </li></ul><ul><li>p_b – duplication rate </li></ul><ul><li>m – # of malicious workers </li></ul>
  26. 26. SecureMR – Evaluation <ul><li>System Implementation </li></ul><ul><ul><li>Implementation based on Hadoop </li></ul></ul><ul><ul><li>Two scheduling algorithms for comparison </li></ul></ul><ul><ul><ul><li>Naive task scheduling algorithm </li></ul></ul></ul><ul><ul><ul><li>Commitment-based task scheduling algorithm </li></ul></ul></ul><ul><ul><li>Non-blocking consistency verification </li></ul></ul><ul><li>Experiment Setup </li></ul><ul><ul><li>14 hosts in Virtual Computing Lab (VCL) </li></ul></ul><ul><ul><li>2.66 GHz Intel(R) Core(TM) 2 Duo </li></ul></ul><ul><ul><li>Ubuntu Linux 8.04, Sun JDK 6 and Hadoop 0.19 </li></ul></ul><ul><ul><li>Hadoop WordCount application </li></ul></ul>
  27. 27. SecureMR – Evaluation <ul><li>Response Time </li></ul><ul><ul><li>We define the response time as the time to finish map and reduce tasks in a job. </li></ul></ul><ul><li>Response Time vs Duplication Rate </li></ul><ul><ul><li># of map tasks = 60 </li></ul></ul><ul><ul><li># of reduce tasks = 25 </li></ul></ul><ul><ul><li>size of input data = 1GB </li></ul></ul>
  28. 28. Outline <ul><li>Introduction </li></ul><ul><li>System Model </li></ul><ul><li>System Design </li></ul><ul><li>Analysis and Evaluation </li></ul><ul><li>Related Work </li></ul><ul><li>Conclusion </li></ul>
  29. 29. Related Work <ul><li>Research related to MapReduce </li></ul><ul><ul><li>Machine Learning [Cheng, et al., NIPS 2006] </li></ul></ul><ul><ul><li>Data Intensive Computing [Ekanayake, et al., eScience 2008] </li></ul></ul><ul><ul><li>Semantic Annotation [Laclavík, et al., ICCS 2008] </li></ul></ul><ul><ul><li>Little attention has been paid to integrity protection in MapReduce </li></ul></ul><ul><li>Related techniques </li></ul><ul><ul><li>Sampling for uncheatable grid computing [Du, et al., ICDCS 2004] </li></ul></ul><ul><ul><li>Quiz for result verification [Zhao, et al., P2P 2005] </li></ul></ul><ul><ul><li>Majority voting and spot-checking [Sarmenta, et al., FGCS 2002] </li></ul></ul><ul><ul><li>None of them addresses unique challenges like massive data processing and multi-party distributed computation </li></ul></ul><ul><li>Research on system security </li></ul><ul><ul><li>Securing publish-subscribe services [Srivatsa, et al., CCS 2005] </li></ul></ul><ul><ul><li>PeerReview in distributed systems [Haeberlen, et al., SOSP 2007] </li></ul></ul><ul><ul><li>SecureMR focuses on a different domain </li></ul></ul>
  30. 30. Outline <ul><li>Introduction </li></ul><ul><li>System Model </li></ul><ul><li>System Design </li></ul><ul><li>Analysis and Evaluation </li></ul><ul><li>Related Work </li></ul><ul><li>Conclusion </li></ul>
  31. 31. Conclusion <ul><li>To the best of our knowledge, our work makes the first attempt to address integrity assurance for MapReduce in open systems. </li></ul><ul><li>Contributions </li></ul><ul><ul><li>A decentralized replication-based integrity verification scheme </li></ul></ul><ul><ul><li>A prototype of SecureMR </li></ul></ul><ul><ul><li>Analytical study and experimental evaluation of performance overhead </li></ul></ul><ul><li>Future Work </li></ul><ul><ul><li>Explore other techniques to address collusion attacks </li></ul></ul><ul><ul><li>Provide data quality assurance for the final result </li></ul></ul>
  32. 32. <ul><li>Thank you </li></ul><ul><li>Questions? </li></ul>
