SlideShare a Scribd company logo
1 of 29
Download to read offline
7/21/2014
1
Faculty of Engineering and Computer Science
Concordia Institute for Information Systems Engineering
Cloud Traffic Security
Wen Ming Liu
INSE 6620 July 23, 2014
Agenda
2
Cloud Applications
Side-Channel Attacks
Challenges and Solutions
Ceiling Padding
7/21/2014
2
Cloud Computing Architecture
3
Cloud Computing Architecture
4
7/21/2014
3
Agenda
5
Cloud Applications
Side-Channel Attacks
Challenges and Solutions
Ceiling Padding
6
Web-Based Applications
untrusted
Internet
Client Server
Encryption
“Cryptography solves all security problems!”Really?
7/21/2014
4
7
Side-Channel Attack on Encrypted Traffic
Internet
Client ServerEncrypted Traffic
User Input Observed Directional Packet Sizes
a: 801→, ←54, ←509, 60→
00: 812→, ←54, ←505, 60→,
813→, ←54, ←507, 60→
b-byte s-byte
Network packets’ sizes and directions
between user and a popular search engine
By acting as a normal user and
eavesdropping traffic with sniffer pro 4.7.5.
Collected in May 2012
Indicator of the
input itself
Fixed pattern: identified input string
8
Updated Patterns Dec 2013
Internet
Client ServerEncrypted Traffic
User Input Observed Directional Packet Sizes
a: 590→, 67→, ←60, ←60, ←728, 60→
00: 590→, 67→, ←60, ←60, ←698, 60→,
590→, 68→, ←60, ←60, ←717, 60→
b-byte s-byte
Patterns may change over time, but attacks
will still work in similar ways
Patterns may be different for different Web
applications, but they are always there
Indicator of
input itself
Fixed pattern: identified input string
7/21/2014
5
9
To Make Things Worse
The “Autocomplete” feature allows adversaries to combine the packets
corresponding to multiple keystrokes
Web applications are highly interactive
User Input Observed Directional Packet Sizes
a: 590→, 67→, ←60, ←60, ←728, 60→
00: 590→, 67→, ←60, ←60, ←698, 60→,
590→, 68→, ←60, ←60, ←717, 60→
b-byte s-byte
10
Longer Inputs, More Unique the Patterns
S value for each character entered as:
a b c d e f g
509 504 502 516 499 504 502
h i j k l m n
509 492 517 499 501 503 488
o p q r s t
509 525 494 498 488 494
u v w x y z
503 522 516 491 502 501
First keystroke: Second keystroke:
First
Keystroke
Second Keystroke
a b c d
a 509 487 493 501 497
b 504 516 488 482 481
c 502 501 488 473 477
d 516 543 478 509 499
Unique s value 12 out of 1616 out of 16
The unique patterns leak out users’
private information: the input string
In reality, it may
take more than two
keystrokes to
uniquely identify
an input string.
7/21/2014
6
Side-Channel Leaks
To protect the information in critical applications against
network sniffing, a common practice is to encrypt their
network traffic. However, as discovered in the research,
serious information leaks are still a reality.
Even though the communications generated during these
state transitions are protected by HTTPS, their observable
attributes can still give away the information about the user’s
selection.
The eavesdropper cannot see the contents, but can observe:
number of packets, timing/size of each packet.
11
Slides 11-27 are partially based on: S. Chen, R. Wang, X. Wang, and K. Zhang. Side-channel leaks in
web applications: A reality today, a challenge tomorrow. In IEEE Symposium on Security and Privacy’10,
pages 191–206, 2010.
Main Findings
Analysis of the side-channel weakness in web applications.
Several high-profile and really popular web applications actually
disclose surprisingly detailed user’s sensitive information
Personal health data, family income, investment details, search queries
The root causes of the side-channel information leaks are the
fundamental characteristics in today’s web applications.
Stateful communication, low entropy input and significant traffic
distinctions
In-depth study on the challenges in mitigating the threat.
Evaluate the effectiveness and the overhead of common mitigation
techniques such as packet padding.
Show that effective solutions to the side-channel problem have to be
application-specific, relying on an in-depth understanding of the
application being protected.
This suggests the necessity of a significant improvement of the
current practice for developing web applications.
12
7/21/2014
7
Fundamental Characteristics of Web Applications
The root causes are some fundamental characteristics
in today’s web applications :
Low entropy inputs for better interactions.
Small input space
Autosuggestion, auto-complete
Stateful communications.
Transitions to next states depend both on the current state and
on its input.
Although information for each transition may be insignificant, their
combination can be really powerful.
Significant traffic distinctions.
The chance of two different user actions having the same traffic
pattern is really small.
Such distinctions often come from the objected updated by client-
server data exchanges.
13
Significant traffic distinctions
14
7/21/2014
8
Basic of Wi-Fi Encryption Schemes:
WEP: susceptible to key-recovery attacks
WPA: TKIP (RC4)
WPA2: CCMP (128-bit AES block cipher in counter
mode)
The ciphertext fully preserves the size of its plaintext!
15
Scenario: search using encrypted Wi-Fi WPA/WPA2.
Example: user types “list” on a WPA2 laptop.
Consequence: Anybody on the street knows our search queries.
Attacker’s effort: linear, not exponential.
821
 910
822
 931
823
 995
824
 1007
16
7/21/2014
9
OnlineHealthA
(“A” denoting a pseudonym)
A web application by one of the most reputable
companies of online services
Illness/medication/surgery information is leaked
out, as well as the type of doctor being queried.
Vulnerable designs
Entering health records
By typing – auto suggestion
By mouse selecting – a tree-structure organization of
elements
Finding a doctor
Using a dropdown list item as the search input
17
tabs
Entering health records: no matter
keyboard typing or mouse selection,
attacker has a 2000× ambiguity
reduction power.
Find-A-Doctor: attacker can
uniquely identify the specialty.
Attacker’s power
7/21/2014
10
OnlineTaxA
It is the online version of one of the most widely
used applications for the U.S. tax preparation.
Design: a tax-preparation wizard
Tailor the conversation based on user’s previous input.
The forms that you work on tell a lot about your
family
Filing status
Number of children
Paid big medical bill
The adjusted gross income (AGI)
19
Entry page of
Deductions &
Credits
Summary of
Deductions &
Credits
Full credit
Not eligible
Partial credit
All transitions have unique traffic patterns.
Consult the IRS instruction:
$1000 for each child
Phase-out starting from $110,000. For every $1000 income, lose $50
credit.
$0
$110000 $150000
Not eligibleFull credit Partial credit
(two children scenario)
child credit state machine
20
7/21/2014
11
Entry page of
Deductions &
Credits
Summary of
Deductions &
Credits
Full credit
Not eligible
Partial credit
Even worse, most decision procedures for credits/deductions
have asymmetric paths.
Eligible – more questions
Not eligible – no more question
Enter your
paid interest
$0
$115000 $145000
Not eligibleFull credit Partial credit
Student-loan-interest credit
21
Disabled Credit
$24999
Retirement Savings
$53000
IRA Contribution
$85000 $105000
College Expense $116000
$115000Student Loan Interest $145000
First-time Homebuyer credit $150000 $170000
Earned Income Credit
$41646
Child credit *
$110000
Adoption expense $174730 $214780
$130000 or $150000 or $170000 …
$0
A subset of identifiable AGI thresholds
We are not tax experts.
OnlineTaxA can find more than 350 credits/deductions.
22
7/21/2014
12
A major financial institution in the U.S.
Which funds you invest?
• No secret.
• Each price history curve is a
GIF image from MarketWatch.
• Everybody in the world can
obtain the images from
MarketWatch.
• Just compare the image sizes!
OnlineInvestA
Your investment allocation
• Given only the size of the pie chart,
can we recover it?
• Challenge: hundreds of pie-charts
collide on a same size.
23
Inference based on the evolution of
the pie-chart size in 4-or-5 days
The financial institution updates the pie chart every day
after the market is closed.
The mutual fund prices are public knowledge.
≅ 800 charts ≅ 80 charts ≅ 8 charts 1 chart
Sizeofday1
Sizeofday2;
Pricesoftheday
Sizeofday3;
Pricesoftheday
Sizeofday4;
Pricesoftheday
≅80000charts
24
7/21/2014
13
Challenging to Mitigate the Vulnerabilities
25
Traffic differences are everywhere. Which ones result in
serious data leaks?
Need to analyze the application semantics, the availability of
domain knowledge, etc.
Hard.
Is there a vulnerability-agnostic defense to fix the
vulnerabilities without finding them?
Obviously, padding is a must-do strategy.
We found that even for the discussed apps, the defense policies
have to be case-by-case.
Why challenging?
26
7/21/2014
14
See if problem can be solved without analyzing
individual application
Application-agnostic manner: Padding
Rounding:
Random padding:
Average overhead:
Given , reduction power being
calculated after padding
Universal Mitigation Policies
27
Any Problem?
Agenda
28
Cloud Applications
Side-Channel Attacks
Challenges and Solutions
Ceiling Padding
7/21/2014
15
29
Trivial problem?
30
Don’t Forget the Cost
(Prefix) char s Value Rounding (ΔΔΔΔ)
64 160 256
(c) c 473 512 480 512
(c) d 477 512 480 512
(d) b 478 512 480 512
(d) d 499 512 640 512
(a) c 501 512 640 512
(b) a 516 576 640 768
Padding Overhead (%) 6.5% 14.1% 13.0%
1 S. Chen, R. Wang, X. Wang, and K. Zhang. Side-channel leaks in web applications: A reality today, a
challenge tomorrow. In IEEE Symposium on Security and Privacy’10, pages 191–206, 2010.
No guarantee of better privacy at a higher cost
↑			⇏ privacy			↑
↑	 ⇏ overhead	↑
To make all inputs indistinguishable by rounding will result in
a 21074% overhead for a well-known online tax system 1
7/21/2014
16
Two Conflicting Goals
31
To prevent side-channel attacks, we face two
seemingly conflicting goals,
Privacy protection:
Reduce the differences in packet sizes
Cost:
Minimize the overhead (communication and processing…)
The similar goals may allow us to borrow existing
expertise on privacy-preserving data publishing(PPDP).
How?
Grouping and Breaking
Slides 31-45 are partially based on: W. M. Liu, L. Wang, K. Ren, P. Cheng, M. Debbabi, S. Zhu, “PPTP:
Privacy-Preserving Traffic Padding in Web-Based Applications,” IEEE Trans. on Dependable and
Secure Computing (TDSC),
Solution: Ceiling Padding
32
473 477 478 (c) c
477 477 478 (c) d
478 499 478 (d) b
499 499 516 (d) d
501 516 516 (a) c
516 516 516 (b) a
S Value Padding (Prefix)
charOption 1 Option 2
Quasi-ID Function 1 Function 2 Sensitive
AttributeGeneralization
PPTP:
Padding group
PPDP:
anonymized group
PPTP goals:
Privacy
Cost
PPDP goals:
Privacy
Data utility
So we can apply existing techniques in data publication to achieve ceiling padding
However, there are a few difference, and hence challenges...
Ceiling padding: pad every packet to the maximum size in the group
7/21/2014
17
33
PPTP Components
Internet
Interaction:
action a:
Atomic user input that triggers traffic
A keystroke, a mouse click …
action-sequence ܽԦ :
A sequence of actions with complete input info
Consecutive keystrokes…
action-set Ai:
Collection of all ith actions in a set of action-seq
Observation:
flow-vector v:
A sequence of flows (directional packet sizes)
Triggered an action
vector-sequence ‫ݒ‬Ԧ	:
A sequence of flow-vectors
Triggered by an equal-length action-sequence
vector-set Vi:
Collection of all ith vectors in a set of vector-seq
Vector-Action Set VAi:
Pairs of ith actions and corresponding ith flow-vectors
User Input Observed Directional Packet Sizes
a: 801→, ←54, ←509, 60→
00: 812→, ←54, ←505, 60→,
813→, ←54, ←507, 60→
34
Privacy and Cost
k-indistinguishability: Given a vector-action set VA
Padding group:
any S⊆VA satisfying all the pairs in S have identical flow-vectors and no S’ ⊃S can
satisfy this property
We say VA satisfies k-indistinguishability (k is an integer) if the cardinality
of every padding group is no less than k
Goal of privacy protection:
Upon observing any flow-vector in the traffic, the eavesdropper cannot
determine which action in the table (vector-action set) has triggered this flow-
vector.
l-diversity:
Address the cases that:
No all inputs should be treated equally in padding (for example, some statistical
information regarding the likelihood of different inputs may be publicly known).
7/21/2014
18
35
Privacy and Cost
Vector-distance:
Given two equal-length flow-vectors v1 and v2, vector-distance is the
total number of bytes different in the flows: ‫ݏ݅݀ݒ‬ ‫ݒ‬ଵ, ‫ݒ‬ଶ =
∑ (|‫ݏ‬ଵ௜ − ‫ݏ‬ଶ௜|)
௩భ
௜ୀଵ .
Padding cost:
Given a vector-set V, the padding cost is the sum of the vector-
distances between each flow-vector in V and its countpart after padding.
Processing cost:
Given a vector-set V, the processing cost is the number of flows in V
which corresponding packets should be padded.
Agenda
36
Cloud Applications
Side-Channel Attacks
Challenges and Solutions
Ceiling Padding
7/21/2014
19
Challenge 1
37
473 477 478 (c) c
477 477 478 (c) d
478 499 478 (d) b
499 499 516 (d) d
501 516 516 (a) c
516 516 516 (b) a
S Value Padding (Prefix)
charOption 1 Option 2
Differences and challenges:
Data utility measures & padding cost
Traffic padding: cost of option 1 is worse than that of option 2
Data publication: utility of function 1 is better than that of option 2
38
Challenge 2
Recall that adversaries may combine multiple keystokes
Example:
One obvious, but invalid solution:
Pad every keystroke (separately)
Another obvious, but invalid solution:
Pad on the whole string!
First
Keystroke
Second Keystroke
a b c d
a 487 493 501 497
b 516 488 482 481
c 501 488 473 477
d 543 478 509 499
First
Keystroke
Second Keystroke
a b c d
a 487 493 501 497
b 516 488 482 481
c 501 488 473 477
d 543 478 509 499
First
Keystroke
Second Keystroke
a b c d
a … … 501 …
b … 488 … …
c 501 488 … …
d … … … …
First
Keystroke
Second Keystroke
a b c d
a 509 … … 501 …
b 504 … 488 … …
c 502 501 488 … …
d 516 … … … …
First
Keystroke
Second Keystroke
a b c d
a 516 … … 501 …
b 504 … 488 … …
c 504 501 488 … …
d 516 … … … …
Strings 1st keystroke 2nd keystroke
ac 509 501
ca 502 501
ad 509 497
dd 516 499
… … …
7/21/2014
20
39
PPTP - Overview of Algorithms
Intention:
To demonstrate the existence of abundant possibilities in approaching PPTP
issue, and not to design an exhaustive list of solutions.
Design three algorithms for partitioning inputs into padding groups.
Main difference: the algorithms handle in increasingly complicated cases .
Computational complexity:
svsdSimple algorithm: Ο ݈݊‫݊݃݋‬
svmdGreedy algorithm: Ο(݊‫݌‬ ⨯ 	݊2) (worse case), 	Ο(݊‫݌‬ ⨯ 	݈݊‫)݊݃݋‬ (average case)
mvmdGreedy algorithm:Ο(݊‫݌‬ ⨯ 	݊2) (worse case), 	Ο(݊‫݌‬ ⨯ 	݈݊‫)݊݃݋‬ (average case)
40
Experiment Settings
Collect data from two real-world web applications:
A popular search engine
(users’ search keyword needs to be protected)
Collect flow-vectors for query suggestion widget for all possible combinations of four letters
by crafting requests to simulate the normal AJAX connection request.
Authoritative drug information system from national institute
(user’s possible health information needs to be protected)
Collect vector-action set for all the drug information by mouse-selecting following the
application’s three-level tree-hierarchical navigation.
7/21/2014
21
41
Overhead - Padding Cost
The padding cost against k:
To compare to rounding, Δ=512	(engineB)	and	Δ=5120	(drugB)	which achieves only 5-indistinguishility.
Our algorithms have less padding cost in both cases.
Observe that our algorithms are superior specially when the number of flow-vectors is larger.
42
Overhead – Execution Time
Generate n-size flow data by synthesizing n/|VA| copies of engineB and drugB.
The computation time of mvmdGreedy increases slowly with n.
Practically efficient (1.2s for 2.7m flow-vectors),
Require slightly more overhead than rounding when it is applied to a single Δ value.
The computational time of mvmdGreedy against privacy property k
A tighter upper bound: Ο(݊‫݌‬ ⨯ 	݊ ⨯ 2݇ ⨯ 	λ) (worse case), 	Ο(݊‫݌‬ ⨯ 	݊ ⨯ log	(2݇ ⨯ 	λ)) (average case)
The computation time increases slowly with k for engineB, and decreases slowly for drugB.
7/21/2014
22
43
Overhead – Processing Cost
An application can choose to incorporate the padding at different stage of
processing a request, however, we must minimize the number of packets to be
padded.
Pad the flow-vectors on the fly,
Modify the original data beforehand.
The processing cost against k:
Rounding must pad each flow-vector regardless of the k’s and the applications, while our
algorithms have much less cost for engineB and slightly less for drugB.
Extension
44
Adapt l-diversity to address cases that:
No all inputs should be treated equally in padding.
Model:
Catch the information about the inequality.
Algorithms:
Need additional constraints on partition.
7/21/2014
23
45
Experiments
Collect data from two real-world web applications:
Another popular search engine
(users’ search keyword needs to be protected)
Authoritative patent information system from national institute
(company’s patent interest needs to be protected)
46
Challenge 3: Ceiling Padding Defeated
Condition s Value Rounding (ΔΔΔΔ) Ceiling
Padding112 144 176
Cancer 360 448 432 528 360
Cervicitis 290 336 432 352 360
Cold 290 336 432 352 290
Cough 290 336 432 352 290
Padding Overhead (%) 18.4% 40.5% 28.8% 5.7%
2-indistinguishability
Observation: a patient received a 360-byte packet after login
Cancer? Cervicitis? ⇒	50%	,	50%
Extra knowledge: this patient is a male
Cancer? Cervicitis? ⇒	100%	,	0%
Facts (Ceiling Padding):
Slides 46-58 are partially based on: W. M. Liu, L. Wang, K. Ren, M. Debbabi,, “Background Knowledge-
Resistant Traffic Padding for Preserving User Privacy in Web-Based Applications,” Proc. The 5th IEEE
International Conference and on Cloud Computing Technology and Science (IEEE CloudCom 2013),
7/21/2014
24
47
Solution: Add Randomness
Condition s Value
Cancer 360
Cervicitis 290
Cold 290
Cough 290
Cancerous Person
360
360
360
Random Ceiling Padding
Instead of deterministically forming padding groups, the server will randomly
(at uniform, in this example) selects one out of the other three conditions
(together with the real condition) to form a padding group for ceiling padding.
Always receive a
360-byte packet
Condition s Value
Cancer 360
Cervicitis 290
Cold 290
Cough 290
Cervicitis Patient
360
290
66.7%: 290-byte packet
33.3%: 360-byte packet
290
48
Better Privacy Protection
Diseases s Value
Cancer 360
Cervicitis 290
Cold 290
Cough 290
Can tolerate adversaries’ extra knowledge
Suppose an adversary knows a patient is male and he saw s = 360
Patient
Has
Server
Selects
Cancer Cervicitis
Cancer Cold
Cancer Cough
Cold Cancer
Cough Cancer
The adversary
now can only
be 60%, instead
of 100%, sure
that patient has
Cancer.
Cost is not necessarily worse
In this example, these two methods actually lead to exactly the same
expected padding and processing costs
7/21/2014
25
Analysis
49
Scenario:
Algorithm: randomness drawn from uniform distribution
Data: action-sequence and flow-vector are of length one
Analysis of privacy preservation:
Lemma 6.1
Analysis of costs:
Lemma 6.2, Lemma 6.3
Model, Scheme and Experiment
50
Model:
component, privacy, method, cost
Scheme:
Main idea:
Server randomly selects members to form the group.
Different choices of random distribution lead to different algorithms.
Two instantiations of scheme:
Bounded uniform distribution; Normal distribution
Computation complexity: O(k)
Experiment:
Data from real-world web applications
Low overheads and high uncertainty
7/21/2014
26
51
Privacy Properties
k-Indistinguishability:
For any flow-vector, at least k different actions can trigger it.
Given vector-action set VA, padding algorithm M, range Range(M,VA)
∀ ‫ݒ‬ ∈ ܴܽ݊݃݁ ‫,ܯ‬ ܸ‫ܣ‬ , ܽ: Pr	(‫ܯ‬ିଵ ‫ݒ‬ = ܽ > 0 ∧ ܽ ∈ ‫ܣ‬ ≥ ݇
Model the privacy requirement of a traffic padding from two perspectives
Uncertainty:
Apply the concept of entropy in information theory to quantify an
adversary’s uncertainty about the real action performed by a user.
Given vector-action sequence ܸ‫,ܣ‬ padding algorithm M
߮ ‫,ݒ‬ ܸ‫,ܣ‬ ‫ܯ‬ = − ෍ Pr	(‫ܯ‬ିଵ
‫ݒ‬ = ܽ logଶ Pr	(‫ܯ‬ିଵ
‫ݒ‬ = ܽ
௔∈஺
߶ ܸ‫,ܣ‬ ‫ܯ‬ = ෍ ߮(‫,ݒ‬ ܸ‫,ܣ‬ ‫)ܯ‬ × Pr	(‫ܯ‬ ‫ܣ‬ = ‫)ݒ‬
௩∈ோ௔௡௚௘(ெ,௏஺)
Φ ܸ‫,ܣ‬ ‫ܯ‬ = ෑ ߶(ܸ‫,ܣ‬ ‫)ܯ‬
௏஺∈௏஺
52
Privacy Properties (cont.)
δ-uncertain k-Indistinguishability:
An algorithm M give δ-uncertain k-Indistinguishability for a
vector-action sequence ܸ‫ܣ‬ if
M w.r.t. any ܸ‫ܣ‬ ∈ ܸ‫ܣ‬ satisfies k-indistinguishability, and
The uncertainty of M w.r.t. ܸ‫ܣ‬ is not less than δ.
7/21/2014
27
53
Padding Method
Ceiling padding [15][16]:
Inspired by PPDP: grouping and breaking
Dominant-vector of a padding group:
Size of each group is not less than k, and
every flow-vector in a group is padded to dominant-vector of that group.
Achieves k-
indistinguishability, but
not sufficient if the
adversary possess
prior knowledge.
Random ceiling padding method:
A mechanism M: when responding to an action a (per each user request),
It randomly selects k-1 other actions, and
Pads the flow-vector of action a to be dominant-vector of transient group
(those k actions).
Randomness:
Randomly selects members of transient group from certain candidates
based on certain distributions.
To reduce the cost, change the probability of an action being selected as
a member of transient group.
54
Cost Metrics
Expected processing cost:
How many flow-vectors need to be padded
‫ݏ݋ܿݎ‬ ܸ‫,ܣ‬ ‫ܯ‬ =
∑ ∑ ୔୰	 ெ(௔)ஷ௩(ೌ,ೡ)∈ೇಲೇಲ∈ೇಲ
∑ ௔,௩ : ௔,௩ ∈௏஺
ೇಲ∈ೇಲ
Expected padding cost:
The proportion of packet size increases compared to original flow-vectors
Given vector-action sequence ܸ‫,ܣ‬ padding algorithm M,
(ܽ, ‫:)ݒ‬
‫ݏ݋ܿ݌‬ ܽ, ܸ‫,ܣ‬ ‫ܯ‬ = ෍ 	 Pr ‫ܯ‬ ܽ = ‫ݒ‬ᇱ × ‫ݒ‬ᇱ − ‫ݒ‬
௩ᇱ∈ோ௔௡௚௘(ெ,௏஺) ଵ
ܸ‫:ܣ‬
‫ݏ݋ܿ݌‬ ܸ‫,ܣ‬ ‫ܯ‬ = ෍ (‫,ܽ(ݏ݋ܿ݌‬ ܸ‫,ܣ‬ ‫))ܯ‬
௔,௩ ∈௏஺
ܸ‫:ܣ‬ ‫ݏ݋ܿ݌‬ ܸ‫,ܣ‬ ‫ܯ‬ = ෍ (‫,ܣܸ(ݏ݋ܿ݌‬ ‫))ܯ‬
௏஺∈௏஺
7/21/2014
28
55
Overview of Scheme
Main idea:
To response a user input, server randomly selects members to form the group.
Different choices of random distribution lead to different algorithms.
Goal:
The privacy properties need to be ensured.
The costs of achieving such privacy protection should be minimized.
Two stage scheme:
Stage 1: derive randomness parameters, one-time, optimization problem;
Stage 2: form transient group, real-time.
56
Scheme (cont.)
Computational complexity: O(k)
Stage 1: pre-calculated only once
Stage 2: select k-1 random actions without duplicate, O(k).
Discussion on privacy:
The adversary cannot collect vector-action set even acting as normal user,
ܸ‫ܣ‬ = 100, ݇ = 20, ‫	݉ݎ݋݂݅݊ݑ‬ ⟹	 ‫)݊݋݅ݐܿܽ	݊ܽ(	ݏ݌ݑ݋ݎ݃	ݐ݊݁݅ݏ݊ܽݎݐ‬ =
99
19
≈ 2଺଺
Approximate the distribution is hard: all users share one random process.
Discussion on costs:
Deterministically incomparable with those of ceiling padding.
7/21/2014
29
57
Instantiations of Scheme
Bounded uniform distribution:
ct: cardinality of candidate actions
cl: number of larger candidates
Scheme can be realized in many different ways.
Choose group members from different subsets of candidates and based on
different distributions, in order to reduce costs.
Normal distribution:
µ: mean
σ: standard deviation
0 n
i
action
cl
ct
0 n
i
action
largest smallest largest smallest
probability
[max(0,min(i-cl, |VA|-ct)), min(max(0,min(i-cl, |VA|-ct))+ct, |VA|)]
(return)
58
Uncertainty and Costs v.s. k
The padding and processing costs of all algorithms increase
with k, while TUNI and NORM have less than those of SVMD.
Our algorithms have much larger uncertainty for Drug and
slightly larger for Engine.

More Related Content

Viewers also liked (12)

Hadoop
HadoopHadoop
Hadoop
 
Live migration
Live migrationLive migration
Live migration
 
Gfs
GfsGfs
Gfs
 
Handout1o
Handout1oHandout1o
Handout1o
 
Map reduce
Map reduceMap reduce
Map reduce
 
About clouds
About cloudsAbout clouds
About clouds
 
Nist cloud comp
Nist cloud compNist cloud comp
Nist cloud comp
 
Outsourcing control
Outsourcing controlOutsourcing control
Outsourcing control
 
Bigtable osdi06
Bigtable osdi06Bigtable osdi06
Bigtable osdi06
 
Paravirtualization
ParavirtualizationParavirtualization
Paravirtualization
 
6620handout4o
6620handout4o6620handout4o
6620handout4o
 
Datacenter as computer
Datacenter as computerDatacenter as computer
Datacenter as computer
 

Similar to 6620handout5t

Edgescan 2022 Vulnerability Statistics Report
Edgescan 2022 Vulnerability Statistics ReportEdgescan 2022 Vulnerability Statistics Report
Edgescan 2022 Vulnerability Statistics ReportEoin Keary
 
2022 Vulnerability Statistics Report.pdf
2022 Vulnerability Statistics Report.pdf2022 Vulnerability Statistics Report.pdf
2022 Vulnerability Statistics Report.pdfssuserc3d7ec1
 
Wireless Communication, Sensing and REM: A Security Perspective
Wireless Communication, Sensing and REM: A Security PerspectiveWireless Communication, Sensing and REM: A Security Perspective
Wireless Communication, Sensing and REM: A Security PerspectiveIRJET Journal
 
IEEE 2015 C# Projects
IEEE 2015 C# ProjectsIEEE 2015 C# Projects
IEEE 2015 C# ProjectsVijay Karan
 
IEEE 2015 C# Projects
IEEE 2015 C# ProjectsIEEE 2015 C# Projects
IEEE 2015 C# ProjectsVijay Karan
 
Roberto minerva 20181130
Roberto minerva 20181130  Roberto minerva 20181130
Roberto minerva 20181130 Roberto Minerva
 
Networking IEEE 2014 Projects
Networking IEEE 2014 ProjectsNetworking IEEE 2014 Projects
Networking IEEE 2014 ProjectsVijay Karan
 
Networking ieee-2014-projects
Networking ieee-2014-projectsNetworking ieee-2014-projects
Networking ieee-2014-projectsVijay Karan
 
© 2014 Straand may notStrayer UnivCIS 502 FacTechn.docx
© 2014 Straand may notStrayer UnivCIS 502 FacTechn.docx© 2014 Straand may notStrayer UnivCIS 502 FacTechn.docx
© 2014 Straand may notStrayer UnivCIS 502 FacTechn.docxLynellBull52
 
Inside TorrentLocker (Cryptolocker) Malware C&C Server
Inside TorrentLocker (Cryptolocker) Malware C&C Server Inside TorrentLocker (Cryptolocker) Malware C&C Server
Inside TorrentLocker (Cryptolocker) Malware C&C Server Davide Cioccia
 
IEEE 2014 DOTNET DATA MINING PROJECTS A robust multiple watermarking techniqu...
IEEE 2014 DOTNET DATA MINING PROJECTS A robust multiple watermarking techniqu...IEEE 2014 DOTNET DATA MINING PROJECTS A robust multiple watermarking techniqu...
IEEE 2014 DOTNET DATA MINING PROJECTS A robust multiple watermarking techniqu...IEEEMEMTECHSTUDENTPROJECTS
 
Offline and Online Bank Data Synchronization System
Offline and Online Bank Data Synchronization SystemOffline and Online Bank Data Synchronization System
Offline and Online Bank Data Synchronization Systemijceronline
 
The Good, the bad, and the ugly of Thin Client/Server Computing
The Good, the bad, and the ugly of Thin Client/Server ComputingThe Good, the bad, and the ugly of Thin Client/Server Computing
The Good, the bad, and the ugly of Thin Client/Server ComputingThe Integral Worm
 
Data Center Interconnects: An Overview
Data Center Interconnects: An OverviewData Center Interconnects: An Overview
Data Center Interconnects: An OverviewXO Communications
 
Big Data for Mobile Network Operator
Big Data for Mobile Network OperatorBig Data for Mobile Network Operator
Big Data for Mobile Network OperatorErkan Örün
 
Mobile Computing IEEE 2014 Projects
Mobile Computing IEEE 2014 ProjectsMobile Computing IEEE 2014 Projects
Mobile Computing IEEE 2014 ProjectsVijay Karan
 

Similar to 6620handout5t (20)

Edgescan 2022 Vulnerability Statistics Report
Edgescan 2022 Vulnerability Statistics ReportEdgescan 2022 Vulnerability Statistics Report
Edgescan 2022 Vulnerability Statistics Report
 
2022 Vulnerability Statistics Report.pdf
2022 Vulnerability Statistics Report.pdf2022 Vulnerability Statistics Report.pdf
2022 Vulnerability Statistics Report.pdf
 
PacketsNeverLie
PacketsNeverLiePacketsNeverLie
PacketsNeverLie
 
Wireless Communication, Sensing and REM: A Security Perspective
Wireless Communication, Sensing and REM: A Security PerspectiveWireless Communication, Sensing and REM: A Security Perspective
Wireless Communication, Sensing and REM: A Security Perspective
 
IEEE 2015 C# Projects
IEEE 2015 C# ProjectsIEEE 2015 C# Projects
IEEE 2015 C# Projects
 
IEEE 2015 C# Projects
IEEE 2015 C# ProjectsIEEE 2015 C# Projects
IEEE 2015 C# Projects
 
Secure Data Transmission using IBOOS in VANET
Secure Data Transmission using IBOOS in VANET Secure Data Transmission using IBOOS in VANET
Secure Data Transmission using IBOOS in VANET
 
Roberto minerva 20181130
Roberto minerva 20181130  Roberto minerva 20181130
Roberto minerva 20181130
 
Networking IEEE 2014 Projects
Networking IEEE 2014 ProjectsNetworking IEEE 2014 Projects
Networking IEEE 2014 Projects
 
Networking ieee-2014-projects
Networking ieee-2014-projectsNetworking ieee-2014-projects
Networking ieee-2014-projects
 
© 2014 Straand may notStrayer UnivCIS 502 FacTechn.docx
© 2014 Straand may notStrayer UnivCIS 502 FacTechn.docx© 2014 Straand may notStrayer UnivCIS 502 FacTechn.docx
© 2014 Straand may notStrayer UnivCIS 502 FacTechn.docx
 
Inside TorrentLocker (Cryptolocker) Malware C&C Server
Inside TorrentLocker (Cryptolocker) Malware C&C Server Inside TorrentLocker (Cryptolocker) Malware C&C Server
Inside TorrentLocker (Cryptolocker) Malware C&C Server
 
Emmert_Resume
Emmert_ResumeEmmert_Resume
Emmert_Resume
 
IEEE 2014 DOTNET DATA MINING PROJECTS A robust multiple watermarking techniqu...
IEEE 2014 DOTNET DATA MINING PROJECTS A robust multiple watermarking techniqu...IEEE 2014 DOTNET DATA MINING PROJECTS A robust multiple watermarking techniqu...
IEEE 2014 DOTNET DATA MINING PROJECTS A robust multiple watermarking techniqu...
 
Offline and Online Bank Data Synchronization System
Offline and Online Bank Data Synchronization SystemOffline and Online Bank Data Synchronization System
Offline and Online Bank Data Synchronization System
 
Networking java titles Adrit Solution
Networking java titles Adrit SolutionNetworking java titles Adrit Solution
Networking java titles Adrit Solution
 
The Good, the bad, and the ugly of Thin Client/Server Computing
The Good, the bad, and the ugly of Thin Client/Server ComputingThe Good, the bad, and the ugly of Thin Client/Server Computing
The Good, the bad, and the ugly of Thin Client/Server Computing
 
Data Center Interconnects: An Overview
Data Center Interconnects: An OverviewData Center Interconnects: An Overview
Data Center Interconnects: An Overview
 
Big Data for Mobile Network Operator
Big Data for Mobile Network OperatorBig Data for Mobile Network Operator
Big Data for Mobile Network Operator
 
Mobile Computing IEEE 2014 Projects
Mobile Computing IEEE 2014 ProjectsMobile Computing IEEE 2014 Projects
Mobile Computing IEEE 2014 Projects
 

Recently uploaded

Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxVishalSingh1417
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDThiyagu K
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeThiyagu K
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Celine George
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajanpragatimahajan3
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdfQucHHunhnh
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...christianmathematics
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphThiyagu K
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhikauryashika82
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAssociation for Project Management
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104misteraugie
 

Recently uploaded (20)

Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajan
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across Sectors
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 

6620handout5t

  • 1. 7/21/2014 1 Faculty of Engineering and Computer Science Concordia Institute for Information Systems Engineering Cloud Traffic Security Wen Ming Liu INSE 6620 July 23, 2014 Agenda 2 Cloud Applications Side-Channel Attacks Challenges and Solutions Ceiling Padding
  • 3. 7/21/2014 3 Agenda 5 Cloud Applications Side-Channel Attacks Challenges and Solutions Ceiling Padding 6 Web-Based Applications untrusted Internet Client Server Encryption “Cryptography solves all security problems!”Really?
  • 4. 7/21/2014 4 7 Side-Channel Attack on Encrypted Traffic Internet Client ServerEncrypted Traffic User Input Observed Directional Packet Sizes a: 801→, ←54, ←509, 60→ 00: 812→, ←54, ←505, 60→, 813→, ←54, ←507, 60→ b-byte s-byte Network packets’ sizes and directions between user and a popular search engine By acting as a normal user and eavesdropping traffic with sniffer pro 4.7.5. Collected in May 2012 Indicator of the input itself Fixed pattern: identified input string 8 Updated Patterns Dec 2013 Internet Client ServerEncrypted Traffic User Input Observed Directional Packet Sizes a: 590→, 67→, ←60, ←60, ←728, 60→ 00: 590→, 67→, ←60, ←60, ←698, 60→, 590→, 68→, ←60, ←60, ←717, 60→ b-byte s-byte Patterns may change over time, but attacks will still work in similar ways Patterns may be different for different Web applications, but they are always there Indicator of input itself Fixed pattern: identified input string
  • 5. 7/21/2014 5 9 To Make Things Worse The “Autocomplete” feature allows adversaries to combine the packets corresponding to multiple keystrokes Web applications are highly interactive User Input Observed Directional Packet Sizes a: 590→, 67→, ←60, ←60, ←728, 60→ 00: 590→, 67→, ←60, ←60, ←698, 60→, 590→, 68→, ←60, ←60, ←717, 60→ b-byte s-byte 10 Longer Inputs, More Unique the Patterns S value for each character entered as: a b c d e f g 509 504 502 516 499 504 502 h i j k l m n 509 492 517 499 501 503 488 o p q r s t 509 525 494 498 488 494 u v w x y z 503 522 516 491 502 501 First keystroke: Second keystroke: First Keystroke Second Keystroke a b c d a 509 487 493 501 497 b 504 516 488 482 481 c 502 501 488 473 477 d 516 543 478 509 499 Unique s value 12 out of 1616 out of 16 The unique patterns leak out users’ private information: the input string In reality, it may take more than two keystrokes to uniquely identify an input string.
  • 6. 7/21/2014 6 Side-Channel Leaks To protect the information in critical applications against network sniffing, a common practice is to encrypt their network traffic. However, as discovered in the research, serious information leaks are still a reality. Even though the communications generated during these state transitions are protected by HTTPS, their observable attributes can still give away the information about the user’s selection. The eavesdropper cannot see the contents, but can observe: number of packets, timing/size of each packet. 11 Slides 11-27 are partially based on: S. Chen, R. Wang, X. Wang, and K. Zhang. Side-channel leaks in web applications: A reality today, a challenge tomorrow. In IEEE Symposium on Security and Privacy’10, pages 191–206, 2010. Main Findings Analysis of the side-channel weakness in web applications. Several high-profile and really popular web applications actually disclose surprisingly detailed user’s sensitive information Personal health data, family income, investment details, search queries The root causes of the side-channel information leaks are the fundamental characteristics in today’s web applications. Stateful communication, low entropy input and significant traffic distinctions In-depth study on the challenges in mitigating the threat. Evaluate the effectiveness and the overhead of common mitigation techniques such as packet padding. Show that effective solutions to the side-channel problem have to be application-specific, relying on an in-depth understanding of the application being protected. This suggests the necessity of a significant improvement of the current practice for developing web applications. 12
  • 7. 7/21/2014 7 Fundamental Characteristics of Web Applications The root causes are some fundamental characteristics in today’s web applications : Low entropy inputs for better interactions. Small input space Autosuggestion, auto-complete Stateful communications. Transitions to next states depend both on the current state and on its input. Although information for each transition may be insignificant, their combination can be really powerful. Significant traffic distinctions. The chance of two different user actions having the same traffic pattern is really small. Such distinctions often come from the objected updated by client- server data exchanges. 13 Significant traffic distinctions 14
  • 8. 7/21/2014 8 Basic of Wi-Fi Encryption Schemes: WEP: susceptible to key-recovery attacks WPA: TKIP (RC4) WPA2: CCMP (128-bit AES block cipher in counter mode) The ciphertext fully preserves the size of its plaintext! 15 Scenario: search using encrypted Wi-Fi WPA/WPA2. Example: user types “list” on a WPA2 laptop. Consequence: Anybody on the street knows our search queries. Attacker’s effort: linear, not exponential. 821  910 822  931 823  995 824  1007 16
  • 9. 7/21/2014 9 OnlineHealthA (“A” denoting a pseudonym) A web application by one of the most reputable companies of online services Illness/medication/surgery information is leaked out, as well as the type of doctor being queried. Vulnerable designs Entering health records By typing – auto suggestion By mouse selecting – a tree-structure organization of elements Finding a doctor Using a dropdown list item as the search input 17 tabs Entering health records: no matter keyboard typing or mouse selection, attacker has a 2000× ambiguity reduction power. Find-A-Doctor: attacker can uniquely identify the specialty. Attacker’s power
  • 10. 7/21/2014 10 OnlineTaxA It is the online version of one of the most widely used applications for the U.S. tax preparation. Design: a tax-preparation wizard Tailor the conversation based on user’s previous input. The forms that you work on tell a lot about your family Filing status Number of children Paid big medical bill The adjusted gross income (AGI) 19 Entry page of Deductions & Credits Summary of Deductions & Credits Full credit Not eligible Partial credit All transitions have unique traffic patterns. Consult the IRS instruction: $1000 for each child Phase-out starting from $110,000. For every $1000 income, lose $50 credit. $0 $110000 $150000 Not eligibleFull credit Partial credit (two children scenario) child credit state machine 20
  • 11. 7/21/2014 11 Entry page of Deductions & Credits Summary of Deductions & Credits Full credit Not eligible Partial credit Even worse, most decision procedures for credits/deductions have asymmetric paths. Eligible – more questions Not eligible – no more question Enter your paid interest $0 $115000 $145000 Not eligibleFull credit Partial credit Student-loan-interest credit 21 Disabled Credit $24999 Retirement Savings $53000 IRA Contribution $85000 $105000 College Expense $116000 $115000Student Loan Interest $145000 First-time Homebuyer credit $150000 $170000 Earned Income Credit $41646 Child credit * $110000 Adoption expense $174730 $214780 $130000 or $150000 or $170000 … $0 A subset of identifiable AGI thresholds We are not tax experts. OnlineTaxA can find more than 350 credits/deductions. 22
  • 12. 7/21/2014 12 A major financial institution in the U.S. Which funds you invest? • No secret. • Each price history curve is a GIF image from MarketWatch. • Everybody in the world can obtain the images from MarketWatch. • Just compare the image sizes! OnlineInvestA Your investment allocation • Given only the size of the pie chart, can we recover it? • Challenge: hundreds of pie-charts collide on a same size. 23 Inference based on the evolution of the pie-chart size in 4-or-5 days The financial institution updates the pie chart every day after the market is closed. The mutual fund prices are public knowledge. ≅ 800 charts ≅ 80 charts ≅ 8 charts 1 chart Sizeofday1 Sizeofday2; Pricesoftheday Sizeofday3; Pricesoftheday Sizeofday4; Pricesoftheday ≅80000charts 24
  • 13. 7/21/2014 13 Challenging to Mitigate the Vulnerabilities 25 Traffic differences are everywhere. Which ones result in serious data leaks? Need to analyze the application semantics, the availability of domain knowledge, etc. Hard. Is there a vulnerability-agnostic defense to fix the vulnerabilities without finding them? Obviously, padding is a must-do strategy. We found that even for the discussed apps, the defense policies have to be case-by-case. Why challenging? 26
  • 14. 7/21/2014 14 See if problem can be solved without analyzing individual application Application-agnostic manner: Padding Rounding: Random padding: Average overhead: Given , reduction power being calculated after padding Universal Mitigation Policies 27 Any Problem? Agenda 28 Cloud Applications Side-Channel Attacks Challenges and Solutions Ceiling Padding
  • 15. 7/21/2014 15 29 Trivial problem? 30 Don’t Forget the Cost (Prefix) char s Value Rounding (ΔΔΔΔ) 64 160 256 (c) c 473 512 480 512 (c) d 477 512 480 512 (d) b 478 512 480 512 (d) d 499 512 640 512 (a) c 501 512 640 512 (b) a 516 576 640 768 Padding Overhead (%) 6.5% 14.1% 13.0% 1 S. Chen, R. Wang, X. Wang, and K. Zhang. Side-channel leaks in web applications: A reality today, a challenge tomorrow. In IEEE Symposium on Security and Privacy’10, pages 191–206, 2010. No guarantee of better privacy at a higher cost ↑ ⇏ privacy ↑ ↑ ⇏ overhead ↑ To make all inputs indistinguishable by rounding will result in a 21074% overhead for a well-known online tax system 1
  • 16. 7/21/2014 16 Two Conflicting Goals 31 To prevent side-channel attacks, we face two seemingly conflicting goals, Privacy protection: Reduce the differences in packet sizes Cost: Minimize the overhead (communication and processing…) The similar goals may allow us to borrow existing expertise on privacy-preserving data publishing(PPDP). How? Grouping and Breaking Slides 31-45 are partially based on: W. M. Liu, L. Wang, K. Ren, P. Cheng, M. Debbabi, S. Zhu, “PPTP: Privacy-Preserving Traffic Padding in Web-Based Applications,” IEEE Trans. on Dependable and Secure Computing (TDSC), Solution: Ceiling Padding 32 473 477 478 (c) c 477 477 478 (c) d 478 499 478 (d) b 499 499 516 (d) d 501 516 516 (a) c 516 516 516 (b) a S Value Padding (Prefix) charOption 1 Option 2 Quasi-ID Function 1 Function 2 Sensitive AttributeGeneralization PPTP: Padding group PPDP: anonymized group PPTP goals: Privacy Cost PPDP goals: Privacy Data utility So we can apply existing techniques in data publication to achieve ceiling padding However, there are a few difference, and hence challenges... Ceiling padding: pad every packet to the maximum size in the group
  • 17. 7/21/2014 17 33 PPTP Components Internet Interaction: action a: Atomic user input that triggers traffic A keystroke, a mouse click … action-sequence ܽԦ : A sequence of actions with complete input info Consecutive keystrokes… action-set Ai: Collection of all ith actions in a set of action-seq Observation: flow-vector v: A sequence of flows (directional packet sizes) Triggered an action vector-sequence ‫ݒ‬Ԧ : A sequence of flow-vectors Triggered by an equal-length action-sequence vector-set Vi: Collection of all ith vectors in a set of vector-seq Vector-Action Set VAi: Pairs of ith actions and corresponding ith flow-vectors User Input Observed Directional Packet Sizes a: 801→, ←54, ←509, 60→ 00: 812→, ←54, ←505, 60→, 813→, ←54, ←507, 60→ 34 Privacy and Cost k-indistinguishability: Given a vector-action set VA Padding group: any S⊆VA satisfying all the pairs in S have identical flow-vectors and no S’ ⊃S can satisfy this property We say VA satisfies k-indistinguishability (k is an integer) if the cardinality of every padding group is no less than k Goal of privacy protection: Upon observing any flow-vector in the traffic, the eavesdropper cannot determine which action in the table (vector-action set) has triggered this flow- vector. l-diversity: Address the cases that: No all inputs should be treated equally in padding (for example, some statistical information regarding the likelihood of different inputs may be publicly known).
  • 18. 7/21/2014 18 35 Privacy and Cost Vector-distance: Given two equal-length flow-vectors v1 and v2, vector-distance is the total number of bytes different in the flows: ‫ݏ݅݀ݒ‬ ‫ݒ‬ଵ, ‫ݒ‬ଶ = ∑ (|‫ݏ‬ଵ௜ − ‫ݏ‬ଶ௜|) ௩భ ௜ୀଵ . Padding cost: Given a vector-set V, the padding cost is the sum of the vector- distances between each flow-vector in V and its countpart after padding. Processing cost: Given a vector-set V, the processing cost is the number of flows in V which corresponding packets should be padded. Agenda 36 Cloud Applications Side-Channel Attacks Challenges and Solutions Ceiling Padding
  • 19. 7/21/2014 19 Challenge 1 37 473 477 478 (c) c 477 477 478 (c) d 478 499 478 (d) b 499 499 516 (d) d 501 516 516 (a) c 516 516 516 (b) a S Value Padding (Prefix) charOption 1 Option 2 Differences and challenges: Data utility measures & padding cost Traffic padding: cost of option 1 is worse than that of option 2 Data publication: utility of function 1 is better than that of option 2 38 Challenge 2 Recall that adversaries may combine multiple keystokes Example: One obvious, but invalid solution: Pad every keystroke (separately) Another obvious, but invalid solution: Pad on the whole string! First Keystroke Second Keystroke a b c d a 487 493 501 497 b 516 488 482 481 c 501 488 473 477 d 543 478 509 499 First Keystroke Second Keystroke a b c d a 487 493 501 497 b 516 488 482 481 c 501 488 473 477 d 543 478 509 499 First Keystroke Second Keystroke a b c d a … … 501 … b … 488 … … c 501 488 … … d … … … … First Keystroke Second Keystroke a b c d a 509 … … 501 … b 504 … 488 … … c 502 501 488 … … d 516 … … … … First Keystroke Second Keystroke a b c d a 516 … … 501 … b 504 … 488 … … c 504 501 488 … … d 516 … … … … Strings 1st keystroke 2nd keystroke ac 509 501 ca 502 501 ad 509 497 dd 516 499 … … …
  • 20. 7/21/2014 20 39 PPTP - Overview of Algorithms Intention: To demonstrate the existence of abundant possibilities in approaching PPTP issue, and not to design an exhaustive list of solutions. Design three algorithms for partitioning inputs into padding groups. Main difference: the algorithms handle in increasingly complicated cases . Computational complexity: svsdSimple algorithm: Ο ݈݊‫݊݃݋‬ svmdGreedy algorithm: Ο(݊‫݌‬ ⨯ ݊2) (worse case), Ο(݊‫݌‬ ⨯ ݈݊‫)݊݃݋‬ (average case) mvmdGreedy algorithm:Ο(݊‫݌‬ ⨯ ݊2) (worse case), Ο(݊‫݌‬ ⨯ ݈݊‫)݊݃݋‬ (average case) 40 Experiment Settings Collect data from two real-world web applications: A popular search engine (users’ search keyword needs to be protected) Collect flow-vectors for query suggestion widget for all possible combinations of four letters by crafting requests to simulate the normal AJAX connection request. Authoritative drug information system from national institute (user’s possible health information needs to be protected) Collect vector-action set for all the drug information by mouse-selecting following the application’s three-level tree-hierarchical navigation.
  • 21. 7/21/2014 21 41 Overhead - Padding Cost The padding cost against k: To compare to rounding, Δ=512 (engineB) and Δ=5120 (drugB) which achieves only 5-indistinguishility. Our algorithms have less padding cost in both cases. Observe that our algorithms are superior specially when the number of flow-vectors is larger. 42 Overhead – Execution Time Generate n-size flow data by synthesizing n/|VA| copies of engineB and drugB. The computation time of mvmdGreedy increases slowly with n. Practically efficient (1.2s for 2.7m flow-vectors), Require slightly more overhead than rounding when it is applied to a single Δ value. The computational time of mvmdGreedy against privacy property k A tighter upper bound: Ο(݊‫݌‬ ⨯ ݊ ⨯ 2݇ ⨯ λ) (worse case), Ο(݊‫݌‬ ⨯ ݊ ⨯ log (2݇ ⨯ λ)) (average case) The computation time increases slowly with k for engineB, and decreases slowly for drugB.
  • 22. 7/21/2014 22 43 Overhead – Processing Cost An application can choose to incorporate the padding at different stage of processing a request, however, we must minimize the number of packets to be padded. Pad the flow-vectors on the fly, Modify the original data beforehand. The processing cost against k: Rounding must pad each flow-vector regardless of the k’s and the applications, while our algorithms have much less cost for engineB and slightly less for drugB. Extension 44 Adapt l-diversity to address cases that: No all inputs should be treated equally in padding. Model: Catch the information about the inequality. Algorithms: Need additional constraints on partition.
  • 23. 7/21/2014 23 45 Experiments Collect data from two real-world web applications: Another popular search engine (users’ search keyword needs to be protected) Authoritative patent information system from national institute (company’s patent interest needs to be protected) 46 Challenge 3: Ceiling Padding Defeated Condition s Value Rounding (ΔΔΔΔ) Ceiling Padding112 144 176 Cancer 360 448 432 528 360 Cervicitis 290 336 432 352 360 Cold 290 336 432 352 290 Cough 290 336 432 352 290 Padding Overhead (%) 18.4% 40.5% 28.8% 5.7% 2-indistinguishability Observation: a patient received a 360-byte packet after login Cancer? Cervicitis? ⇒ 50% , 50% Extra knowledge: this patient is a male Cancer? Cervicitis? ⇒ 100% , 0% Facts (Ceiling Padding): Slides 46-58 are partially based on: W. M. Liu, L. Wang, K. Ren, M. Debbabi,, “Background Knowledge- Resistant Traffic Padding for Preserving User Privacy in Web-Based Applications,” Proc. The 5th IEEE International Conference and on Cloud Computing Technology and Science (IEEE CloudCom 2013),
  • 24. 7/21/2014 24 47 Solution: Add Randomness Condition s Value Cancer 360 Cervicitis 290 Cold 290 Cough 290 Cancerous Person 360 360 360 Random Ceiling Padding Instead of deterministically forming padding groups, the server will randomly (at uniform, in this example) selects one out of the other three conditions (together with the real condition) to form a padding group for ceiling padding. Always receive a 360-byte packet Condition s Value Cancer 360 Cervicitis 290 Cold 290 Cough 290 Cervicitis Patient 360 290 66.7%: 290-byte packet 33.3%: 360-byte packet 290 48 Better Privacy Protection Diseases s Value Cancer 360 Cervicitis 290 Cold 290 Cough 290 Can tolerate adversaries’ extra knowledge Suppose an adversary knows a patient is male and he saw s = 360 Patient Has Server Selects Cancer Cervicitis Cancer Cold Cancer Cough Cold Cancer Cough Cancer The adversary now can only be 60%, instead of 100%, sure that patient has Cancer. Cost is not necessarily worse In this example, these two methods actually lead to exactly the same expected padding and processing costs
  • 25. 7/21/2014 25 Analysis 49 Scenario: Algorithm: randomness drawn from uniform distribution Data: action-sequence and flow-vector are of length one Analysis of privacy preservation: Lemma 6.1 Analysis of costs: Lemma 6.2, Lemma 6.3 Model, Scheme and Experiment 50 Model: component, privacy, method, cost Scheme: Main idea: Server randomly selects members to form the group. Different choices of random distribution lead to different algorithms. Two instantiations of scheme: Bounded uniform distribution; Normal distribution Computation complexity: O(k) Experiment: Data from real-world web applications Low overheads and high uncertainty
  • 26. 7/21/2014 26 51 Privacy Properties k-Indistinguishability: For any flow-vector, at least k different actions can trigger it. Given vector-action set VA, padding algorithm M, range Range(M,VA) ∀ ‫ݒ‬ ∈ ܴܽ݊݃݁ ‫,ܯ‬ ܸ‫ܣ‬ , ܽ: Pr (‫ܯ‬ିଵ ‫ݒ‬ = ܽ > 0 ∧ ܽ ∈ ‫ܣ‬ ≥ ݇ Model the privacy requirement of a traffic padding from two perspectives Uncertainty: Apply the concept of entropy in information theory to quantify an adversary’s uncertainty about the real action performed by a user. Given vector-action sequence ܸ‫,ܣ‬ padding algorithm M ߮ ‫,ݒ‬ ܸ‫,ܣ‬ ‫ܯ‬ = − ෍ Pr (‫ܯ‬ିଵ ‫ݒ‬ = ܽ logଶ Pr (‫ܯ‬ିଵ ‫ݒ‬ = ܽ ௔∈஺ ߶ ܸ‫,ܣ‬ ‫ܯ‬ = ෍ ߮(‫,ݒ‬ ܸ‫,ܣ‬ ‫)ܯ‬ × Pr (‫ܯ‬ ‫ܣ‬ = ‫)ݒ‬ ௩∈ோ௔௡௚௘(ெ,௏஺) Φ ܸ‫,ܣ‬ ‫ܯ‬ = ෑ ߶(ܸ‫,ܣ‬ ‫)ܯ‬ ௏஺∈௏஺ 52 Privacy Properties (cont.) δ-uncertain k-Indistinguishability: An algorithm M give δ-uncertain k-Indistinguishability for a vector-action sequence ܸ‫ܣ‬ if M w.r.t. any ܸ‫ܣ‬ ∈ ܸ‫ܣ‬ satisfies k-indistinguishability, and The uncertainty of M w.r.t. ܸ‫ܣ‬ is not less than δ.
  • 27. 7/21/2014 27 53 Padding Method Ceiling padding [15][16]: Inspired by PPDP: grouping and breaking Dominant-vector of a padding group: Size of each group is not less than k, and every flow-vector in a group is padded to dominant-vector of that group. Achieves k- indistinguishability, but not sufficient if the adversary possess prior knowledge. Random ceiling padding method: A mechanism M: when responding to an action a (per each user request), It randomly selects k-1 other actions, and Pads the flow-vector of action a to be dominant-vector of transient group (those k actions). Randomness: Randomly selects members of transient group from certain candidates based on certain distributions. To reduce the cost, change the probability of an action being selected as a member of transient group. 54 Cost Metrics Expected processing cost: How many flow-vectors need to be padded ‫ݏ݋ܿݎ‬ ܸ‫,ܣ‬ ‫ܯ‬ = ∑ ∑ ୔୰ ெ(௔)ஷ௩(ೌ,ೡ)∈ೇಲೇಲ∈ೇಲ ∑ ௔,௩ : ௔,௩ ∈௏஺ ೇಲ∈ೇಲ Expected padding cost: The proportion of packet size increases compared to original flow-vectors Given vector-action sequence ܸ‫,ܣ‬ padding algorithm M, (ܽ, ‫:)ݒ‬ ‫ݏ݋ܿ݌‬ ܽ, ܸ‫,ܣ‬ ‫ܯ‬ = ෍ Pr ‫ܯ‬ ܽ = ‫ݒ‬ᇱ × ‫ݒ‬ᇱ − ‫ݒ‬ ௩ᇱ∈ோ௔௡௚௘(ெ,௏஺) ଵ ܸ‫:ܣ‬ ‫ݏ݋ܿ݌‬ ܸ‫,ܣ‬ ‫ܯ‬ = ෍ (‫,ܽ(ݏ݋ܿ݌‬ ܸ‫,ܣ‬ ‫))ܯ‬ ௔,௩ ∈௏஺ ܸ‫:ܣ‬ ‫ݏ݋ܿ݌‬ ܸ‫,ܣ‬ ‫ܯ‬ = ෍ (‫,ܣܸ(ݏ݋ܿ݌‬ ‫))ܯ‬ ௏஺∈௏஺
  • 28. 7/21/2014 28 55 Overview of Scheme Main idea: To response a user input, server randomly selects members to form the group. Different choices of random distribution lead to different algorithms. Goal: The privacy properties need to be ensured. The costs of achieving such privacy protection should be minimized. Two stage scheme: Stage 1: derive randomness parameters, one-time, optimization problem; Stage 2: form transient group, real-time. 56 Scheme (cont.) Computational complexity: O(k) Stage 1: pre-calculated only once Stage 2: select k-1 random actions without duplicate, O(k). Discussion on privacy: The adversary cannot collect vector-action set even acting as normal user, ܸ‫ܣ‬ = 100, ݇ = 20, ‫ ݉ݎ݋݂݅݊ݑ‬ ⟹ ‫)݊݋݅ݐܿܽ ݊ܽ( ݏ݌ݑ݋ݎ݃ ݐ݊݁݅ݏ݊ܽݎݐ‬ = 99 19 ≈ 2଺଺ Approximate the distribution is hard: all users share one random process. Discussion on costs: Deterministically incomparable with those of ceiling padding.
  • 29. 7/21/2014 29 57 Instantiations of Scheme Bounded uniform distribution: ct: cardinality of candidate actions cl: number of larger candidates Scheme can be realized in many different ways. Choose group members from different subsets of candidates and based on different distributions, in order to reduce costs. Normal distribution: µ: mean σ: standard deviation 0 n i action cl ct 0 n i action largest smallest largest smallest probability [max(0,min(i-cl, |VA|-ct)), min(max(0,min(i-cl, |VA|-ct))+ct, |VA|)] (return) 58 Uncertainty and Costs v.s. k The padding and processing costs of all algorithms increase with k, while TUNI and NORM have less than those of SVMD. Our algorithms have much larger uncertainty for Drug and slightly larger for Engine.