Tri hug 2010 wei Presentation Transcript

  • SecureMR - Practical Hadoop Security, Triangle Hadoop Users Group, September 14th, 2010
  • SecureMR - Overview
    • Long-term Goal
      • Deploy MapReduce over open systems with security guarantees
    • Motivation
      • Industry
        • Google, Yahoo!, Facebook
      • Academia
        • Machine Learning, Data Intensive Computation, Image Processing
    • Our Focus
      • Provide integrity assurance for MapReduce in open systems
    • Basic Idea
      • Adopt a replication-based scheme
      • Decentralize integrity verification
  • Outline
    • Introduction
    • System Model
    • System Design
    • Analysis and Evaluation
    • Related Work
    • Conclusion
  • MapReduce Overview
    • [Diagram: the Master assigns map tasks to Mappers M1 … Mn and reduce tasks to Reducers R1 … Rr. In the Map Phase, each mapper reads an input block (B1 … Bn) from the DFS and writes its intermediate result, partitioned as P1 … Pr, to local disk. In the Reduce Phase, each reducer reads its partition remotely and writes its output (Output 1 … Output r) to the DFS.]
  • MapReduce – WordCount Application
    • Input (read from DFS)
      • Hello World, Bye World!
      • Hello MapReduce, Goodbye to MapReduce.
      • Welcome to ACSAC, Goodbye to ACSAC.
    • Map Phase (mappers M1, M2, M3): intermediate result
      • (Hello, 1) (Bye, 1) (World, 1) (World, 1)
      • (Hello, 1) (to, 1) (MapReduce, 1) (Goodbye, 1) (MapReduce, 1)
      • (Welcome, 1) (to, 1) (to, 1) (ACSAC, 1) (Goodbye, 1) (ACSAC, 1)
    • Reduce Phase (reducers R1, R2): output (written to DFS)
      • (Hello, 2) (Bye, 1) (Welcome, 1) (to, 3) (World, 2) (ACSAC, 2) (Goodbye, 2) (MapReduce, 2)
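The WordCount data flow on this slide can be sketched in a few lines of plain Python. This is a single-process illustration only, not Hadoop code: the function names `map_phase` and `reduce_phase` are made up for the example, and the tokenizer (runs of ASCII letters) is an assumption chosen so that punctuation is ignored, as on the slide.

```python
import re
from collections import Counter

def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    # Assumption: a "word" is a run of ASCII letters, so punctuation is dropped.
    return [(word, 1) for word in re.findall(r"[A-Za-z]+", line)]

def reduce_phase(pairs):
    """Sum the counts of all pairs that share the same word."""
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

# The three input lines from the slide, one per mapper.
lines = [
    "Hello World, Bye World!",
    "Hello MapReduce, Goodbye to MapReduce.",
    "Welcome to ACSAC, Goodbye to ACSAC.",
]

# Intermediate result: every (word, 1) pair produced by the map phase.
intermediate = [pair for line in lines for pair in map_phase(line)]

# Final result: one count per distinct word.
word_counts = reduce_phase(intermediate)
print(word_counts)
```

In a real deployment the intermediate pairs would be partitioned across reducers rather than summed in one place; the sketch collapses that step to keep the arithmetic visible.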
  • Outline
    • Introduction
    • System Model
    • System Design
    • Analysis and Evaluation
    • Related Work
    • Conclusion
  • System Model
    • Goal
      • Deploy MapReduce over open systems with integrity assurance
    • Open systems differ from closed systems
    • Attacks against MapReduce in open systems
      • Communication attacks
        • Eavesdropping, DoS and replay attacks
      • Data processing service integrity attacks (our focus)
        • Inserting fake data, tampering with data and dropping data
  • System Model – Integrity Attacks
    • [Diagram: the MapReduce data flow in an open system, showing the Master, mappers M1 … Mn, reducers R1 … Rr, the input blocks B1 … Bn in the DFS, the intermediate partitions P1 … Pr, and Output 1 … Output r in the DFS.]
  • System Model
    • Assumptions
      • PKI is deployed in advance
      • Master is trusted
      • DFS provides data integrity protection [Atallah, et al., ICDE’08]
    • Attack Models
      • Non-collusive malicious behavior
      • Collusive malicious behavior
  • Outline
    • Introduction
    • System Model
    • System Design
    • Analysis and Evaluation
    • Related Work
    • Conclusion
  • SecureMR
    • Basic Idea
      • Adopt a replication-based scheme ( integrity )
  • A Naive Approach
    • [Diagram: mappers Ma and Mb read and process input blocks (B1, B2, B3, B4, …, Bn) and produce partitions P1, P2, …, Pr. Both send their results to the master, where they are compared (==), and the intermediate result is sent to the reducers Ra and Rb, which process it.]
    • Scalability? Integrity?
  • A Naive Approach
    • [Diagram: mappers Ma and Mb read input blocks (B1, B2, B3, B4, …, Bn), produce partitions P1, P2, …, Pr and send results to the master; the partition hashes H_P1, H_P2, … from the two mappers are compared (==). Reducers Ra and Rb are also shown.]
  • A Naive Approach
    • [Diagram: mappers Ma and Mb read input blocks (B1, B2, B3, B4, …, Bn) and send results to the master, where the partition hashes H_P1, H_P2, … match (==). A tampered result is nevertheless sent to a reducer; the Output 1 values of reducers Ra and Rb are compared (==).]
  • SecureMR
    • Basic Idea
      • Adopt a replication-based scheme ( integrity )
      • Decentralize integrity verification ( scalability & integrity )
    • Design Goals
      • Security
        • Non-repudiation, resilience to DoS and replay attacks
      • Performance
        • Minimize computation cost and network communications
      • Applicability
        • Preserve existing protocol as much as possible
  • SecureMR – Architecture Design
    • MapReduce
      • User Applications
      • Mapper: Task Executor
      • Master: Scheduler
      • Reducer: Task Executor
      • Network Infrastructure
      • Open Systems: Grid Computing, Volunteer Computing and P2P Computing
  • SecureMR – Architecture Design
    • SecureMR
      • User Applications
      • Mapper: Secure Task Executor, Secure Committer
      • Master: Secure Scheduler, Secure Manager
      • Reducer: Secure Task Executor, Secure Verifier
      • Network Infrastructure
      • Open Systems: Grid Computing, Volunteer Computing and P2P Computing
  • SecureMR – Communication Design
    • [Diagram: numbered message flow among the Master, mappers M1 … Mn, reducers R1 … Rr and the DFS (input blocks B1, B2, …, Bn) across the Map Phase and the Reduce Phase; the legend distinguishes Commitment steps from Verification steps.]
    • Steps
      • 1.1. Assign
      • 1.2. Assign
      • 2. Read
      • 3. Process
      • 4. Commit
      • 5. Compare
      • 6. Assign
      • 7. Notify
      • 8. Request
      • 9. Response
      • 10. Verify
  • SecureMR – Commitment Protocol
    • [Diagram: mappers Ma and Mb read input blocks (B1, B2, B3, B4, …, Bn) and produce partitions P1, P2, …, Pr. Each mapper computes one hash per partition (H_P1, H_P2, …, H_Pr), signs the hashes and sends them to the master, which compares the hashes received from the two mappers (==).]
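The commitment step can be sketched as follows: each mapper hashes its partitions, signs the hash list, and the master compares the commitments of replicated mappers. This is a minimal illustration under stated assumptions, not the SecureMR implementation. The slides assume a PKI; since the Python standard library has no public-key signatures, an HMAC with a per-worker key stands in for the signature here. The names `commit`, `master_compare` and the sample keys are hypothetical.

```python
import hashlib
import hmac

def partition_hashes(partitions):
    """Hash each intermediate partition P1..Pr separately."""
    return [hashlib.sha256(p).hexdigest() for p in partitions]

def sign(key, hashes):
    """Stand-in for a PKI signature (assumption): HMAC over the joined hashes."""
    return hmac.new(key, "".join(hashes).encode(), hashlib.sha256).hexdigest()

def commit(key, partitions):
    """What a mapper sends to the master: its partition hashes plus a signature."""
    hashes = partition_hashes(partitions)
    return {"hashes": hashes, "sig": sign(key, hashes)}

def master_compare(keys, commitments):
    """Check every signature, then check that all replicas committed the same hashes."""
    for worker, c in commitments.items():
        expected = sign(keys[worker], c["hashes"])
        if not hmac.compare_digest(expected, c["sig"]):
            return False
    hash_lists = [c["hashes"] for c in commitments.values()]
    return all(h == hash_lists[0] for h in hash_lists)

# Hypothetical per-worker keys and partition contents for two replicated mappers.
keys = {"Ma": b"key-a", "Mb": b"key-b"}
honest = [b"(Hello, 1)", b"(World, 1)"]
tampered = [b"(Hello, 1)", b"(World, 9)"]

consistent = master_compare(
    keys, {"Ma": commit(keys["Ma"], honest), "Mb": commit(keys["Mb"], honest)}
)
inconsistent = master_compare(
    keys, {"Ma": commit(keys["Ma"], honest), "Mb": commit(keys["Mb"], tampered)}
)
print(consistent, inconsistent)
```

Only hashes travel to the master in this sketch, so the master's work does not grow with the size of the intermediate data, which is the scalability concern raised on the naive-approach slides.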
  • SecureMR – Verification Protocol
    • [Diagram: mappers Ma and Mb read input blocks (B1, B2, B3, B4, …, Bn), produce partitions P1, P2, …, Pr and send their signed hashes H_P1, H_P2, …, H_Pr. Reducer R1 is notified with {H_P1}sig, reads partition P1, calculates H'_P1 and checks H_P1 == H'_P1. The same is done for each reducer up to Rr, which is notified with {H_Pr}sig, reads Pr, calculates H'_Pr and checks H_Pr == H'_Pr.]
  • SecureMR – Verification Protocol
    • [Diagram: the same flow shown for reducer R1 only: mappers Ma and Mb send their signed hashes, R1 is notified with {H_P1}sig, reads partition P1, calculates H'_P1 and checks H_P1 == H'_P1.]
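The reducer-side check (H_P1 == H'_P1) can be sketched as below. It is a simplified illustration: signature checking of the notified hash is left out, SHA-256 is an assumed choice of hash function, and the names `digest` and `reducer_verify` are hypothetical.

```python
import hashlib

def digest(data):
    """Hash of one partition; SHA-256 is an assumption, the slides do not name a hash."""
    return hashlib.sha256(data).hexdigest()

def reducer_verify(committed_hash, received_partition):
    """Recompute H' over the partition actually received and compare it with the committed H."""
    return digest(received_partition) == committed_hash

# The mapper committed to this partition earlier (the hash reaches the reducer via the notification).
honest_p1 = b"(Hello, 1) (Bye, 1)"
committed_h_p1 = digest(honest_p1)

# Case 1: the mapper serves the partition it committed to.
ok = reducer_verify(committed_h_p1, honest_p1)

# Case 2: the mapper serves a tampered partition after committing to the honest one.
caught = not reducer_verify(committed_h_p1, b"(Hello, 7) (Bye, 1)")
print(ok, caught)
```

This covers the gap in the naive approach, where a mapper could report correct results to the master and still hand a tampered result to the reducer.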
  • MapReduce in Open Systems – Integrity
    • [Diagram: the MapReduce data flow from the overview, shown in an open system. The Master assigns map tasks to Mappers M1 … Mn and reduce tasks to Reducers R1 … Rr; mappers read input blocks B1 … Bn from the DFS and write intermediate results (partitions P1 … Pr) locally; reducers read them remotely and write Output 1 … Output r to the DFS.]
  • Outline
    • Introduction
    • System Model
    • System Design
    • Analysis and Evaluation
    • Related Work
    • Conclusion
  • SecureMR – Analysis
    • Security Analysis
      • No false alarm
      • Non-repudiation
    • Attacker Behavior Analysis
      • Periodical attackers without collusion (Detection Rate)
      • Periodical attackers with collusion (Detection Rate)
      • Strategic attackers (Misbehaving Probability)
    • Detection Rate
      • We define the detection rate, denoted D_rate, as the probability that an inconsistency between results caused by misbehavior is detected within l jobs.
  • SecureMR – Analysis
    • [Figure: Detection Rate for Collusive Periodical Attacker]
    • Parameters
      • # of workers n = 50
      • misbehaving probability p_m = 0.5
      • # of blocks b = 20
      • # of jobs l = 15
      • p_b – duplication rate
      • m – # of malicious workers
  • SecureMR – Evaluation
    • System Implementation
      • Implementation based on Hadoop
      • Two scheduling algorithms for comparisons
        • Naive task scheduling algorithm
        • Commitment-based task scheduling algorithm
      • Non-blocking consistency verification
    • Experiment Setup
      • 14 hosts in Virtual Computing Lab (VCL)
      • 2.66 GHz Intel(R) Core(TM) 2 Duo
      • Ubuntu Linux 8.04, Sun JDK 6 and Hadoop 0.19
      • Hadoop WordCount application
  • SecureMR – Evaluation
    • Response Time
      • We define the response time as the time to finish map and reduce tasks in a job.
    • [Figure: Response Time vs Duplication Rate]
      • # of map tasks = 60
      • # of reduce tasks = 25
      • size of input data = 1 GB
  • Outline
    • Introduction
    • System Model
    • System Design
    • Analysis and Evaluation
    • Related Work
    • Conclusion
  • Related Work
    • Research related to MapReduce
      • Machine Learning [Cheng, et al., NIPS 2006]
      • Data Intensive Computing [Ekanayake, et al., eScience 2008]
      • Semantic Annotation [Laclavík, et al., ICCS 2008]
      • Little attention has been paid to integrity protection in MapReduce
    • Related techniques
      • Sampling for uncheatable grid computing [Du, et al., ICDCS 2004]
      • Quiz for result verification [Zhao, et al., P2P 2005]
      • Majority voting and spot-checking [Sarmenta, et al., FGCS 2002]
      • None of them addresses unique challenges like massive data processing and multi-party distributed computation
    • Research on system security
      • Securing publish-subscribe services [Srivatsa, et al., CCS 2005]
      • PeerReview in distributed systems [Haeberlen, et al., SOSP 2007]
      • SecureMR focuses on a different domain
  • Outline
    • Introduction
    • System Model
    • System Design
    • Analysis and Evaluation
    • Related Work
    • Conclusion
  • Conclusion
    • To the best of our knowledge, our work is the first attempt to provide integrity assurance for MapReduce in open systems.
    • Contributions
      • A decentralized replication-based integrity verification scheme
      • A prototype of SecureMR
      • Analytical study and experimental evaluation of performance overhead
    • Future Work
      • Explore other techniques to address collusion attacks
      • Provide data quality assurance for final result
    • Thank you
    • Questions?