network analysis to network security."


- 1. The Relational Event Analysis Toolkit. Hadoop World 2011, New York, NY. Josh Lospinoso (josh@redowlconsulting.com), Guy Filippelli (guy@redowlconsulting.com)
- 2. What’s the problem? We want to say something intelligent about how individuals behave in networks: – General tendencies in behavior – Systematic deviations from general tendencies – Anomalous behavior & event detection – Key event mining
- 3. …but everything depends on everything. What weighs in on how often: – Computers send packets to each other – Co-workers email each other – Businesses transact goods – Proteins interact. These data sets are huge, jumbled, dependent messes.
- 4. …but everything depends on everything. If we ignore all of this stuff (and it’s the most interesting stuff), we’ve already lost: – How can we explain network phenomena without context? – How much information do we amputate from the data? We can’t treat people, computers, etc. like particles.
- 5. This changes everything: • Truly revolutionary work in statistical social network analysis from academia in the past five years • Cheap commodity computer hardware • Workable frameworks for computing on this cheap hardware
- 6. What influences Josh to email Guy with higher frequency?
- 7. Reciprocity
- 8. Similarity
- 9. Similarity
- 10. Similarity
- 11. Jeremy normally sends 15 emails per day.
- 12. Guy sends Jeremy an email tomorrow morning.
- 13. Do we expect Jeremy to send more emails tomorrow?
- 14. How statisticians think of the world: The world is chaotic and unpredictable, but it exhibits tendencies. We model why events occur in a network through tendencies surrounding e.g. human or computer behavior. “All models are wrong, some are useful”
- 15. A C B
- 16. A C B. Each of the 6 possible emails has the same 1/6 chance (even odds) of being the next one to occur.
- 17. A C B. A emails B. Let’s say this increases the rate of emails from B to A.
- 18. A C B
- 19. A C B. There’s now a 5/10 chance that the next email we observe is from B to A because of reciprocity.
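The jump from 1/6 to 5/10 can be reproduced with a small sketch. It assumes that the chance of each candidate email being the next one is its rate divided by the sum of all rates, and that reciprocity multiplies the B to A rate by 5; that multiplier is inferred from the 5/10 figure and is not stated on the slides. The function name is hypothetical.

```python
from fractions import Fraction

def next_event_probabilities(rates):
    """Chance that each candidate event is observed next: its rate / total rate."""
    total = sum(rates.values())
    return {dyad: Fraction(rate, total) for dyad, rate in rates.items()}

people = ["A", "B", "C"]

# Baseline: all 6 directed pairs share the same rate (1 is an arbitrary unit).
rates = {(s, r): 1 for s in people for r in people if s != r}
before = next_event_probabilities(rates)

# Assumption: after A emails B, reciprocity multiplies the B -> A rate by 5.
rates[("B", "A")] *= 5
after = next_event_probabilities(rates)

print(before[("B", "A")])  # 1/6
print(after[("B", "A")])   # 1/2, i.e. 5/10
```

Every other pair drops from 1/6 to 1/10, because the total rate grows from 6 to 10 while their own rates stay the same.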
- 20. Keep in mind that we don’t pick these odds out of thin air. We hypothesize about what behaviors are important…
- 21. Keep in mind that we don’t pick these odds out of thin air. We hypothesize about what behaviors are important, then let the data tell us: • What’s important? • How important is it? • How uncertain are we?
- 22. At any moment, for 3 people, there are 6 possible directed events that could occur. We model the rates at which these events tend to occur through effects like similarity, reciprocity, the “trickle-down” effect, etc. So, we need to keep track of the context for each of these possible relationships.
- 23. …but what if we want to analyze a network of 25K people?
- 24. …but what if we want to analyze a network of 25K people? That’s nearly 625M events that could occur at any instant! [ 25000 x 24999 ] This is impossible without a scalable compute platform.
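The count of candidate events is the number of ordered (sender, receiver) pairs. The sketch below checks the 25000 x 24999 figure and adds a rough memory estimate. The 8 bytes per pair (one floating-point statistic) is an illustrative assumption, not a number from the talk.

```python
def directed_dyads(n):
    """Number of ordered (sender, receiver) pairs among n actors, no self-pairs."""
    return n * (n - 1)

print(directed_dyads(3))       # 6
print(directed_dyads(25_000))  # 624975000

# Assumption: one 8-byte statistic is kept per candidate pair.
bytes_needed = directed_dyads(25_000) * 8
print(f"{bytes_needed / 1e9:.1f} GB per statistic")  # 5.0 GB per statistic
```

The count grows quadratically with the number of actors, so tracking several statistics per pair quickly outgrows a single machine.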
- 25. Hadoop: We can use cheap hardware to: • Store large amounts of data • Perform statistical modeling on this data
- 26. Modeling and Estimation: In a simple model, suppose we want to baseline reciprocity of text messages. If I am sent a text message, does that increase the rate that I send text messages to the sender?
- 27. Modeling and Estimation: In a simple model, suppose we want to baseline reciprocity of text messages. If I am sent a text message, does that increase the rate that I send text messages to the sender? We could elaborate this baseline to see who is bad at responding to emails!
- 28. Modeling and Estimation: Define: • Set of three people $\{A, B, C\}$ • Messages $(s, r, t) \in E$, where $s$ is sender, $r$ is receiver, $t$ is timestamp
- 29. Modeling and Estimation: Define: • Set of three people $\{A, B, C\}$ • Messages $(s, r, t) \in E$, where $s$ is sender, $r$ is receiver, $t$ is timestamp • The reciprocation function $R(s, r, t) = \#\{(r, s, t') \in E : t' < t\}$ • The rate function $\lambda(s, r, t) = \exp\{\beta_1 + \beta_2 R(s, r, t)\}$
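As a minimal sketch, the two definitions can be written as plain Python functions. It assumes the reciprocation function counts earlier messages sent in the opposite direction and that the rate is log-linear in that count. The names `reciprocation`, `rate`, `beta1`, and `beta2` are illustrative and do not come from the toolkit.

```python
import math

def reciprocation(history, s, r, t):
    """Count the messages sent from r to s strictly before time t."""
    return sum(1 for (s2, r2, t2) in history if s2 == r and r2 == s and t2 < t)

def rate(history, s, r, t, beta1, beta2):
    """Log-linear rate: exp(beta1 + beta2 * reciprocation)."""
    return math.exp(beta1 + beta2 * reciprocation(history, s, r, t))

history = [("A", "B", 0.1)]   # A messaged B at time .1
beta1 = math.log(0.5)         # baseline rate of .5 per day
beta2 = 1.0                   # hypothetical reciprocity effect

print(reciprocation(history, "B", "A", 0.2))       # 1
print(rate(history, "A", "B", 0.2, beta1, beta2))  # baseline: 0.5
print(rate(history, "B", "A", 0.2, beta1, beta2))  # baseline times e
```

With a positive `beta2`, each earlier message from the other party multiplies the sender's rate toward that party by `exp(beta2)`.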
- 30. Modeling and Estimation: When $R(s, r, t) = 0$, the rate is $e^{\beta_1}$ = .5 per day
- 31. Modeling and Estimation: $R(\cdot, \cdot, \cdot) = 1$, $R(\cdot, \cdot, \cdot) = 0$
- 32. Modeling and Estimation: We have to estimate these rate functions given an event history: A,B,.1; B,A,.2; C,B,.7; C,A,1; ...
- 33. Modeling and Estimation: We want to maximize the probability density of our model over our data. This is called maximum likelihood estimation. Based on a few easy-to-compute derivatives, we work our way forward through the data…
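One common way to write the quantity being maximized is sketched below. The slides do not state the likelihood, so this is an assumption: events $(s_i, r_i, t_i)$ are observed from $t_0 = 0$, rates stay constant between consecutive events, and the rate is $\lambda(s, r, t) = \exp\{\beta_1 + \beta_2 R(s, r, t)\}$ with $R$ the reciprocation function. The symbol names are chosen here for illustration.

```latex
% Assumed setup: events (s_i, r_i, t_i), i = 1..n, with t_0 = 0 and rates
% held constant between consecutive events.
\begin{align}
\log L(\beta) &= \sum_{i=1}^{n} \Big[ \log \lambda(s_i, r_i, t_i)
  - (t_i - t_{i-1}) \sum_{s \neq r} \lambda(s, r, t_i) \Big], \\
\frac{\partial \log L}{\partial \beta_1} &= \sum_{i=1}^{n} \Big[ 1
  - (t_i - t_{i-1}) \sum_{s \neq r} \lambda(s, r, t_i) \Big], \\
\frac{\partial \log L}{\partial \beta_2} &= \sum_{i=1}^{n} \Big[ R(s_i, r_i, t_i)
  - (t_i - t_{i-1}) \sum_{s \neq r} R(s, r, t_i)\, \lambda(s, r, t_i) \Big].
\end{align}
```

Each term of the sums depends on one event plus the statistics of every pair that could have acted at that moment, which is why the estimation can proceed event by event through the data.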
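- 34. Modeling and Estimation: For our first event, A,B,.1, the reciprocation function $R(s, r, t)$ gives: $R(A, B, .1) = 0$, $R(A, C, .1) = 0$, $R(B, A, .1) = 0$, $R(B, C, .1) = 0$, $R(C, A, .1) = 0$, $R(C, B, .1) = 0$
- 35. Modeling and Estimation: For our second event, B,A,.2, we have: Event History: A,B,.1. $R(A, B, .2) = 0$, $R(A, C, .2) = 0$, $R(B, A, .2) = 1$, $R(B, C, .2) = 0$, $R(C, A, .2) = 0$, $R(C, B, .2) = 0$
- 36. Modeling and Estimation: For our third event, C,B,.7, we have: Event History: A,B,.1; B,A,.2. $R(A, B, .7) = 1$, $R(A, C, .7) = 0$, $R(B, A, .7) = 1$, $R(B, C, .7) = 0$, $R(C, A, .7) = 0$, $R(C, B, .7) = 0$
- 37. Modeling and Estimation: For our fourth event, C,A,1., we have: Event History: A,B,.1; B,A,.2; C,B,.7. $R(A, B, 1) = 1$, $R(A, C, 1) = 0$, $R(B, A, 1) = 1$, $R(B, C, 1) = 1$, $R(C, A, 1) = 0$, $R(C, B, 1) = 0$
- 38. Modeling and Estimation: For our fourth event, C,A,1., we have: Event History: A,B,.1; B,A,.2; C,B,.7. $R(A, B, 1) = 1$, $R(A, C, 1) = 0$, $R(B, A, 1) = 1$, $R(B, C, 1) = 1$, $R(C, A, 1) = 0$, $R(C, B, 1) = 0$. Note that we must know the whole event history!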
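The walk through the four events can be reproduced with a short sketch. It assumes the statistic for a pair (s, r) is the number of earlier messages from r to s, and it keeps a running tally so the history is read only once. The function name and the ordering of the pairs are choices made here.

```python
from itertools import permutations

def reciprocation_rows(events, actors):
    """For each event, list the reciprocation count of every directed pair
    as it stood just before that event occurred."""
    dyads = list(permutations(actors, 2))   # (A,B), (A,C), (B,A), ...
    sent = {dyad: 0 for dyad in dyads}      # running count of messages per pair
    rows = []
    for s, r, t in events:
        # The statistic for (a, b) is the number of earlier messages b -> a.
        rows.append([sent[(b, a)] for (a, b) in dyads])
        sent[(s, r)] += 1                   # only now does this event join the history
    return dyads, rows

events = [("A", "B", 0.1), ("B", "A", 0.2), ("C", "B", 0.7), ("C", "A", 1.0)]
dyads, rows = reciprocation_rows(events, "ABC")

for (s, r, t), row in zip(events, rows):
    print(f"{s},{r},{t}:", row)
# A,B,0.1: [0, 0, 0, 0, 0, 0]
# B,A,0.2: [0, 0, 1, 0, 0, 0]
# C,B,0.7: [1, 0, 1, 0, 0, 0]
# C,A,1.0: [1, 0, 1, 1, 0, 0]
```

The running tally carries the whole event history forward, which is the dependency the last slide points out: the row for any event cannot be computed from that event alone.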
- 39. Modeling and Estimation: Knowing all of these statistics combinations, we can maximize the likelihood function over the parameter space for $\beta = (\beta_1, \beta_2)$.
- 40. MapReduce -- Optimization: For each observation / statistics pair, we calculate the log-likelihood, its first, and its second derivative (“contributions”). Map: Observ.; CollectionObserv. = Observ.; Contributions. Reduce: Observ.; Contributions = Null; AddedContributions
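The map and reduce steps can be imitated in plain Python to show the shape of the computation. This sketch does not use Hadoop: `functools.reduce` stands in for the reduce phase, and all function names are hypothetical. It assumes a piecewise-constant-rate likelihood with observation starting at time 0 and a rate of exp(beta1 + beta2 * reciprocation count); the slides do not confirm either choice.

```python
import math
from functools import reduce
from itertools import permutations

EVENTS = [("A", "B", 0.1), ("B", "A", 0.2), ("C", "B", 0.7), ("C", "A", 1.0)]
DYADS = list(permutations("ABC", 2))

def observations(events):
    """Pair each event with its waiting time and the statistic of every pair."""
    sent = {d: 0 for d in DYADS}
    previous = 0.0  # assumption: observation starts at time 0
    out = []
    for s, r, t in events:
        stats = {(a, b): sent[(b, a)] for (a, b) in DYADS}
        out.append(((s, r), t - previous, stats))
        sent[(s, r)] += 1
        previous = t
    return out

def map_contribution(obs, beta):
    """Map: one observation -> (log-likelihood, gradient, Hessian) contribution."""
    (s, r), dt, stats = obs
    b1, b2 = beta
    ll = b1 + b2 * stats[(s, r)]
    grad = [1.0, float(stats[(s, r)])]
    hess = [0.0, 0.0, 0.0]  # entries h11, h12, h22 of the symmetric 2x2 Hessian
    for x in stats.values():
        lam = dt * math.exp(b1 + b2 * x)  # expected count for this pair
        ll -= lam
        grad[0] -= lam
        grad[1] -= x * lam
        hess[0] -= lam
        hess[1] -= x * lam
        hess[2] -= x * x * lam
    return ll, grad, hess

def reduce_contributions(a, b):
    """Reduce: add two contributions element-wise."""
    return (a[0] + b[0],
            [p + q for p, q in zip(a[1], b[1])],
            [p + q for p, q in zip(a[2], b[2])])

def newton_step(beta, obs_list):
    """One Newton-Raphson update, beta - H^{-1} g, from the summed contributions."""
    ll, g, h = reduce(reduce_contributions,
                      (map_contribution(o, beta) for o in obs_list))
    det = h[0] * h[2] - h[1] * h[1]
    step1 = (h[2] * g[0] - h[1] * g[1]) / det
    step2 = (h[0] * g[1] - h[1] * g[0]) / det
    return [beta[0] - step1, beta[1] - step2], ll

beta = [0.0, 0.0]
obs_list = observations(EVENTS)
for _ in range(20):
    beta, ll = newton_step(beta, obs_list)
print(beta)
```

For this four-event toy history the iteration settles near beta1 = log 0.75 (about -0.288) and beta2 = log(2/3) (about -0.405). Because the contributions are simply added, the map step can run on each observation independently, which is what makes the computation a fit for MapReduce.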
- 41. Example -- Enron: ~.5M messages between 150 senior managers (available from http://www.cs.cmu.edu/~enron/). Baselining for full dataset is not yet complete. We present and interpret a smaller dataset here (between top 10 most active users).
- 42. Messages sent, received, and in total per person:

  | Person | Sent | Received | Total |
  |---|---|---|---|
  | jeff.dasovich@enron.com | 11566 | 4961 | 16527 |
  | tana.jones@enron.com | 9947 | 4416 | 14363 |
  | sara.shackleton@enron.com | 5849 | 4226 | 10075 |
  | kay.mann@enron.com | 6445 | 2098 | 8543 |
  | chris.germany@enron.com | 6903 | 1312 | 8215 |
  | louise.kitchen@enron.com | 1950 | 3645 | 5595 |
  | vince.kaminski@enron.com | 4146 | 1436 | 5582 |
  | gerald.nemec@enron.com | 2668 | 2680 | 5348 |
  | mark.taylor@enron.com | 2351 | 2951 | 5302 |
  | susan.mara@enron.com | 2596 | 2008 | 4604 |
- 43. Example -- Enron

  | Effect | Estimate | Std. Error |
  |---|---|---|
  | Outdegree | -.204 | (.02) |
  | Reciprocity | .576 | (.03) |

  \*Due to the duration of the dataset, we use a decay function to down-weight older events: $\exp\{-\theta (T - t)\}$. Here, $\log 2 / \theta$ is called the “half life”. We use a “half life” of one week ($\theta \approx .1$; $T$ in days).
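The half-life arithmetic can be checked with a short sketch. It assumes the decay weight has the form exp(-theta * (T - t)) with times in days, and that the weights replace the plain count in the reciprocity statistic. The second assumption and all function names are illustrative, not taken from the toolkit.

```python
import math

def decay_rate(half_life_days):
    """Decay parameter theta such that a weight halves every half_life_days."""
    return math.log(2) / half_life_days

def decay_weight(event_time, now, theta):
    """Weight exp(-theta * (now - event_time)) given to an older event."""
    return math.exp(-theta * (now - event_time))

def weighted_reciprocation(history, s, r, now, theta):
    """Assumed use of the weights: a decayed count of earlier r -> s messages."""
    return sum(decay_weight(t, now, theta)
               for (s2, r2, t) in history
               if s2 == r and r2 == s and t < now)

theta = decay_rate(7.0)
print(round(theta, 3))               # 0.099, the "~.1" on the slide
print(decay_weight(0.0, 7.0, theta)) # about 0.5 after one week

history = [("A", "B", 0.0), ("A", "B", 7.0)]
print(weighted_reciprocation(history, "B", "A", 14.0, theta))  # about 0.25 + 0.5
```

Under this weighting, a message from two weeks ago counts a quarter as much as one sent just now.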
- 44. Example -- Enron: We can draw a baseline. So what? Now we consider what would happen if we admitted fixed effects parameters: Candidate Sender Fixed Effect, Candidate Receiver Fixed Effect
- 45. Example -- Enron Fixed Effects

  | Person | Effect | FE Position | Estimate (SE) |
  |---|---|---|---|
  | jeff.dasovich@enron.com | Outdegree | Sender | .175 (.07) |
  | jeff.dasovich@enron.com | Outdegree | Receiver | -.04 (.03) |
  | jeff.dasovich@enron.com | Reciprocity | Sender | -.24 (.10) |
  | jeff.dasovich@enron.com | Reciprocity | Receiver | .19 (.07) |
  | tana.jones@enron.com | Outdegree | Sender | .14 (.06) |
  | tana.jones@enron.com | Outdegree | Receiver | .02 (.09) |
  | … | | | |

  \*Estimates obtained by conducting a full Newton-Raphson step evaluated at the baseline.
- 46. The Relational Event Analysis Toolkit. Hadoop World 2011, New York, NY. Josh Lospinoso (josh@redowlconsulting.com), Guy Filippelli (guy@redowlconsulting.com)
