Security and Privacy in Cloud Computing
Ragib Hasan, Johns Hopkins University
en.600.412 Spring 2010, Lecture 6
03/22/2010
Verifying Computations in a Cloud

Scenario:
- A user sends her data processing job to the cloud.
- Clouds provide dataflow operations as a service (e.g., MapReduce, Hadoop).

Problem: Users have no way of evaluating the correctness of the results.
DataFlow Operations

Properties:
- High-performance, in-memory data processing
- Each node performs a particular function
- Nodes are mostly independent of each other

Examples: MapReduce, Hadoop, System S, Dryad
How do we ensure DataFlow operation results are correct?

Goals:
- Determine the malicious nodes in a DataFlow system
- Determine the nature of their malicious actions
- Evaluate the quality of the output data

Paper: Du et al., "RunTest: Assuring Integrity of Dataflow Processing in Cloud Computing Infrastructures," AsiaCCS 2010.
Possible Approaches

- Re-do the computation
- Check the memory footprint of code execution
- Majority voting
- Hardware-based attestation
- Run-time attestation
RunTest: Randomized Data Attestation

Idea:
- Send some data inputs along multiple dataflow paths.
- Record and match the intermediate results from the corresponding nodes on those paths.
- Build an attestation graph from the observed node agreements.
- Over time, the graph reveals which nodes misbehave (always or from time to time).

A minimal sketch of the duplication step follows below.
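To make the duplication idea concrete, here is a small Python sketch (my own illustration, not the authors' implementation): each item is processed by one node, and with a small probability it is silently re-sent to a second node so the two outputs can be compared. The `dup_prob` parameter and the `agreement` bookkeeping are assumptions made for illustration.

```python
import random
from collections import defaultdict

# agreement[(i, j)] = [consistent_count, total_count] for node pair (i, j)
agreement = defaultdict(lambda: [0, 0])

def attested_process(item, nodes, dup_prob=0.2):
    """Process `item` on one randomly chosen node; with probability
    dup_prob, re-run it on a second node and record whether the two
    outputs match. `nodes` is a list of callables (the dataflow operators)."""
    i = random.randrange(len(nodes))
    out = nodes[i](item)
    if random.random() < dup_prob:
        j = random.choice([k for k in range(len(nodes)) if k != i])
        pair = (min(i, j), max(i, j))
        agreement[pair][1] += 1          # one more duplicated item for this pair
        if nodes[j](item) == out:
            agreement[pair][0] += 1      # ...and its results were consistent
    return out
```

Because the duplication is randomized, a misbehaving node cannot tell which of its inputs are being cross-checked.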
Attack Model

Data model:
- Input-deterministic DataFlow (the same input to a function always produces the same output)
- Stateless data processing (e.g., selection, filtering)

Attacker:
- Malicious or compromised cloud nodes
- Can produce bad results all of the time or only some of the time
- Can collude with other malicious nodes to produce the same bad result
Attack Model (Scenarios)

Parameters:
- b_i = probability that node i produces a bad result
- c_i = probability that node i produces the same bad result as another malicious node

Attack scenarios (sketched in code below):
- NCAM (no collusion, always misbehaves): b_i = 1, c_i = 0
- NCPM (no collusion, probabilistically misbehaves): 0 < b_i < 1, c_i = 0
- FTFC (full-time, full collusion): b_i = 1, c_i = 1
- PTFC (part-time, full collusion): 0 < b_i < 1, c_i = 1
- PTPC (part-time, partial collusion): 0 < b_i < 1, 0 < c_i < 1
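As a toy illustration of these parameters (again my own sketch, not from the paper), a malicious node can be modeled as a wrapper around an honest function: with probability b it returns a bad result, and with probability c that bad result is the shared value the colluders agree on.

```python
import random

def make_node(f, b=0.0, c=0.0):
    """Model a cloud node with misbehavior probability b and collusion
    probability c (the paper's b_i and c_i). Colluders share one agreed
    bad value per input; a non-colluded bad value is randomized so two
    misbehaving nodes do not match by accident."""
    def node(x):
        if random.random() < b:                    # misbehave on this item
            if random.random() < c:
                return ('colluded-bad', x)         # shared bad result
            return ('bad', x, random.random())     # independent bad result
        return f(x)                                # honest output
    return node

# e.g., an FTFC attacker is make_node(f, b=1.0, c=1.0); an honest node is make_node(f)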
Integrity Attestation Graph

Definition:
- Vertices: nodes on the DataFlow paths
- Edges: consistency relationships between node pairs
- Edge weight: the fraction of consistent outputs among all outputs generated from the same data items
Consistency Clique

A complete subgraph of the attestation graph that:
- has 2 or more nodes, and
- in which all nodes always agree with each other (i.e., all edge weights are 1).

A sketch of the edge-weight computation follows below.
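Continuing the earlier sketch, the attestation graph falls directly out of the recorded agreement counts: each pair's edge weight is its fraction of consistent duplicated outputs, and the consistency-clique candidates are the weight-1 edges. This assumes the hypothetical `agreement` structure from the sketch above.

```python
def attestation_graph(agreement):
    """Edge weight = consistent outputs / all duplicated outputs, per node pair."""
    return {pair: good / total
            for pair, (good, total) in agreement.items() if total > 0}

def weight_one_edges(graph):
    """Edges eligible for a consistency clique: pairs that always agreed."""
    return {pair for pair, w in graph.items() if w == 1.0}
```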
How to Find Malicious Nodes

Intuitions:
- Honest nodes always agree with each other: given the same data, they produce the same outputs.
- The number of malicious nodes is less than half of all nodes.
Finding Consistency Cliques: The BK Algorithm

Goal: find the maximal cliques in the attestation graph.

Technique: apply the Bron-Kerbosch (BK) algorithm to enumerate the maximal cliques (Wikipedia has a good worked example). Given k nodes in total, any node not in a maximal clique of size greater than k/2 is malicious, since the honest nodes, being more than half, always form such a clique. A sketch follows below.

Note: maximal-clique enumeration is worst-case exponential (maximum clique is NP-hard); the authors proposed two optimizations to make it run faster in practice.
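Here is a minimal sketch of the classic Bron-Kerbosch recursion (without pivoting and without the paper's optimizations), applied to the weight-1 edges; `adj` maps each node to the set of nodes it has always agreed with. Any node outside every sufficiently large maximal clique gets flagged.

```python
def bron_kerbosch(R, P, X, adj, cliques):
    """Classic Bron-Kerbosch: enumerate all maximal cliques.
    R: current clique, P: candidate vertices, X: excluded vertices,
    adj: dict mapping each vertex to its set of weight-1 neighbors."""
    if not P and not X:
        cliques.append(frozenset(R))
        return
    for v in list(P):
        bron_kerbosch(R | {v}, P & adj[v], X & adj[v], adj, cliques)
        P.remove(v)
        X.add(v)

def suspected_malicious(vertices, adj):
    """Flag every node outside all maximal cliques of size > k/2."""
    cliques = []
    bron_kerbosch(set(), set(vertices), set(), adj, cliques)
    trusted = set()
    for c in cliques:
        if len(c) > len(vertices) / 2:
            trusted |= c
    return set(vertices) - trusted
```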
Identifying Attack Patterns

[Figure: attestation-graph patterns for the NCAM, PTFC/NCPM, PTFC, and FTFC attack scenarios]
Inferring Data Quality

Quality = 1 - (c/n)

where
- n = total number of unique data items
- c = total number of duplicated data items with inconsistent results
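For instance, if 100 unique data items were attested and 5 of the duplicated items came back with inconsistent results, the inferred quality is 1 - 5/100 = 0.95. As a one-line sketch:

```python
def data_quality(n_unique_items, n_inconsistent_duplicates):
    """Quality = 1 - c/n; e.g., data_quality(100, 5) -> 0.95."""
    return 1 - n_inconsistent_duplicates / n_unique_items
```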
Evaluation

Setup: extended IBM System S.

Experiments:
- Detection rate
- Sensitivity to parameters
- Comparison with majority voting
Evaluation (Results)

[Figures: detection results for NCPM (b = 0.2, c = 0), and under different misbehavior probabilities]
