Security’s in your DNA:
Genomics for InfoSec
Rob Bird
@conduit242
Releases the blar.py tool, which creates a genomic encoding from text files. The encoding is a lossy, highly compressible representation of the original file that can be used for rapid anomaly detection and forensic analysis.

Published in: Data & Analytics
What is the most efficient way to analyze a sequence of events?
What’s a genome?
• The genetic material of an organism
• A redundant encoding of instructions
• A big sequence of letters
HIV
tggatgggttaatttactccaagcaaagacaagatatccttgatctgtgggtctaccacacacaaggctacttccctgattggcagaattacacaccagggccaggagtcagatacccactaacatttggatggtgcttcaagctagtaccagttgatccagatgaagtag agaaggatactgagggagagaacaacagcctattacaccctatatgccaacatggaatggatgatgaggagaaagaagtattaaggtggaaatttgacagccgcctggcactaaaacacagagcccaagagatgcatccggagttctacaaagactgctgacac agaagttgctgacagggactttccgctgggactttccaggggaggtgtggtttgggcggagttggggagtggccaaccctcagatgctgcatataagcagctgcttttcgcttgtactgggtctctctaggtagaccagatccgagcctgggagctctctggctatctgg ggaacccactgcttaagcctcaataaagcttgccttgagtgctctaagtagtgtgtgcccgtctgttgtgtgactctggtaactagagatccctcagaccactctagactgagtaaaaatctctagcagtggcgcccgaacagggactcgaaagcgaaagtaagaccag agaagttctctcgacgcaggactcggcttgctgaggtgcacacagcaagaggcgagagcggcgactggtgagtacgccaatttttgactagcggaggctagaaggagagagatgggtgcgagagcgtcagtattaagcgggggaaaattagatgcatgggaga gaattcggttaaggccagggggaaagaaaaaatatagaatgaaacatctagtatgggcaagcagggagctggaaagatttgcacttaaccctggcctgttagaaacaacagaaggatgtcaacaaataatagaacagttacaaccagctctcaagacaggaacag aagaacttagatcattatttaatacagtagtaaccctctattgtgtacatcaacggatagaggtaaaagacaccaaggaagctctagataaaatagaggaaatacaaaataagagcaagcaaaagacacaacaggcagcagctgccacaggaaacagcagcaatgt cagccaaaattaccctatagtgcaaaatgcacaagggcaaatggtacaccaggctgtatcacctaggacattgaatgcatgggtgaaggtaatagaagaaaaggctttcagcccagaagtaatacccatgttctcagcattgtcagaaggagccaccccacaagatt taaatatgatgctaaacatagtggggggacaccaggcagctatgcagatgttgaaagataccatcaatgaggaagctgcagaatgggacaggttacatccagtacaggcagggcctattccaccaggccaattgagagaaccaaggggaagtgacatagcagga actactagtacccctcaagaacaaataggatggatgacaggcaacccacctattccagtgggagacatctataaaagatggataatcctgggattaaataaaatagtaagaatgtatagccctgttagcattttggacgtaaaacaagggccaaaagaacccttcaga gactatgtagataggttctttaaaattctcagagctgagcaagctacacaggaggtaaaaggttggatgacagaaaccttgctggtccaaaatgcaaatccagattgtaagtccattttaagagcactaggaacaggagctacattagaagaaatgatgacagcatgcc
agggagtgggaggacccggccataaagcaagggttttggctgaggcaatgagtcaagtacaacatacaaacataatgatgcagagaggcaattttaggggtcagagaaggatgattaaatgtttcaattgtggcaaagaaggacacctagccagaaattgcagag cccctaggaaaaagggctgttggaaatgtgggaaagagggacaccaaatgaaggactgcactgaaagacaggctaattttttagggaaaatttggccttccagcaaggggaggccaggaaactttccccagagcaggccagagccaacagccccaccagcag agctctttgggatggaggaagaaaaaacctccgctctgaagcaggagcagaaggacaggaaacaggacccacctttagtttccctcaaatcactctttggcaacgaccccttgtcacagtaaaagtagggggacagctaaaagaagctctattagatacaggagca gatgacacagtattagaagatataaatttgccaggaaaatggaaaccaagaatgatagggggaattggaggttttatcaaagtaaaacagtatgatcagatacttatagaaatttgtggaaaaaaggctataggtacagtattagtaggacccacacctgtcaacataatt ggaaggaatatgttgacccagattggatgtactttaaatttcccaattagtcctattgagactgtgccagtaaaattaaagccaggaatggatggcccaaaggttaaacaatggccattgacagaagaaaaaataaaagcattaacagaaatttgtacagatatggaaaa ggaaggaaaaatttcaagaattgggcctgaaaatccatacaatactccaatatttgctataaagaaaaaagacagcactaaatggaggaaactagtagatttcagagagctcaataaaagaacacaagacttttgggaagttcaattgggaataccgcatccagcggg cctaaaaaagaaaaaatcagtaacagtactagatgtgggggacgcatatttttcagttcctttagatgaaagctttagaaagtatactgcgttcaccatacctagtacaaataatgagacaccaggaatcaggtatcaatacaatgtgctgccacagggatggaaaggat caccggcaatattccagagtagcatgacaaaaatcttagagccctatagatcaaagaatccagaaataattatctatcaatacatggatgacttgtatgtaggatctgatttagaaatagggcagcatagaacaaaaatagaggagttgagagctcatctattgagctggg gatttactacaccagacaaaaagcatcaaaaagaacctccatttctttggatggggtatgaactccatcctgacaaatggacagtacagcctatacaactgccagaaaaggatagctggactgtcaatgatatacagaagttggtggggaaactgaattgggcaagtca aatttatgcagggattaaagtaaagcaactgtgcaaactcctcaggggagccaaagcactaacagaggtagtaactctgactgaggaagcagaattagaattggcagagaacagggaaattctaaaagaccctgtgcatggagtatattatgacccatcaaaagaatt aatagcagaaatacagaaacaagggcaagaccaatggacatatcaaatttatcaagagccatttaaaaatctaaaaacaggaaaatatgcaagaaaaaggtctgctcacactaatgatgtaaagcaattagcagaagtggtgcaaaaggtggtcatggagagcata 
gtaatatggggaaagactcctaaatttaaactacccatacaaaaagagacatgggaaacatggtggatggactattggcaggctacctggattcctgaatgggagtttgtcaatacccctcccctagtaaaattgtggtaccagttagagaaagaccctatagcaggag cagaaactttctatgtagatggggcagccaatagggagactaagctaggaaaagcagggtatgtaactgacagaggaagacaaaaggttgtttccctaactgagacaacaaatcaaaagactgaactacatgcaatccatctagccttacaggattcaggatcagaa gtaaacatagtaacggactcacagtatgcattaggaatcattcaggcacaaccagacaggagtgaatcagaattagtcaatctaataatagaggagctaatagaaaaggacaaggtctacctgtcatgggtaccagcacacaaaggaattggaggaaatgaacaag tagataaattagtcagttccggaattaggaaggtgctgtttttagatgggatagataaagctcaagaagaacatgaaagatatcacagcaattggaaagcaatggctagtgattttaatctgccacctatagtagcaaaggaaatagtagccagctgtgataaatgccaac taaaaggagaagccatgcatggacaggtagactgtagtccaggaatatggcaattagattgcacacatctagaaggaaaagtaatcctggtagcagtccatgtagccagtggttatatagaagcagaagttatcccagcagaaacaggacaagagacagcatacttt ctactaaaattagcaggaagatggccagtaaaagtagtacacacagacaatggaggcaatttcaccagtgctgcagttaaagcagcctgttggtgggcaaatatccaacaggaatttgggattccctacaatccccaaagtcaaggagtagtggaatctatgaataaa gaattaaagaaaatcatagggcaggtaagagatcaagctgaacatcttaagacagcagtacaaatggcagtattcattcacaattttaaaagaaaaggggggattggggggtacagtgcaggggaaaggataatagacataatagcaacagacatgcaaactaaag aattacaaaaacaaattacaaaaattcaaaattttcgggtttattacagggacagcagagatccaatttggaaaggaccagcaaaactactctggaaaggtgaaggggcagtagtaatacaggacaatagtgatatcaaggtagtaccaagaagaaaagcaaagatc attagggattatggaaaacagatggcaggtgatgattgtgtggcaggtagacaggatgaggattagaacatggaacagtttagtaaaatatcatatgtatgtctcaaagaaagctcgaaagtggctctatagacatcactatgatagcaggcatccaaaagtaagttcag aagtacacatcccactaggggatgctagattagtagtaagaacatattggggtctgcatacaggagaaaaagactggcaattgggtcacggggtctccatagaatggaggctaagaagatatagcacacaaatagatcctgacctagcagaccaactaattcatctg cattattttgactgtttttcagaatctgccataaggagagccatattaggacaagtagttagccctaggtgtgtatatccaacaggacataaccaggtaggatccctacaatatctagcactgaaggcattagtaacaccaataaagacaagaccacctttgcctagtgttaa 
gatattaacagaggatagatggaacaagccccagaagaccaggggccacagagggaaccatacaatgaatggatgttagaactgttagaagatcttaaacatgaagcagttagacactttcctagaccatgggctaggacaacatatatataacacctatggggata cttgggaaggagtcgaagctatagtaagaattttgcaacaactactgtttgttcatttcagaattgggtgccaacatagcagaataggcattattcaagggagaagagtcagaaatggagccggtagatcctaacttagagccctggaaccatccgggaagtcagccta caactgcttgtaccaagtgttactgtaaaaagtgttgctatcattgcctagtttgctttctgaacaaaggcttaggcatctcctatggcaggaagaagcggagcaagcgacgacgaactcctcacagcagtaaggatcatcaaaatcctataccaaagcagtaagtatca gtaattagtatatgtaatgagtcctttagaaatctgtgcaatagtaggattgatagtagcgctaatcatagcaatagttgtgtggactatagtaggtatagaatataagagattgttaaagcaaaggaaaatagacaggttaattaagaaaatacgagaaagagcagaagac agtggcaatgagagtgatggggacatggatgaattggcaaaacttgtggagagggggaactatgatcttggggatgttaatgatctgtagtactgcagaaaacttgtgggttactgtctactatggggtacctgtgtggaaagatgcagaaaccaccttattttgtgcatca gatgctaaagcatacgacacagaggcgcataatgtctgggctacacatgcctgtgtacccacagaccccaacccacaagaaatatatttggaaaatgtgacagaagagtttaacatgtggaaaaataacatggtagagcagatgcatacagatataatcagtctatgg gatcaaagcctaaagccatgtgtacagttaacccctctctgcgttactttaaattgtaataacatcaccatcaataacatcaccaccaacatcactgaggacatgagaggagaaataaaaaactgctcgtacaatatgaccacagtattaagggataagagaaggaaag tgtattcacttttttatagacttgatatagtaccacttgatgaggggaataataactctgctgggagtagtgactatagattaataaattgtaatacctcaaccataacacaagcctgtccaaaggtctcttttgacccaattcctatacattattgtgctccagctggttttgcgattc taaaatgtaaggatccagatttcaatggaacagggccatgcaagaatgtcagcacagtacaatgcacacatggaatcaagccagtagtatcaactcaactgctgttaaatggcagtctagcagaaggaaaggtaagaattagatctgaaaatattacaaacaatgcca aaaacataatagtacaacttgtcaagcctgtaaaaattaattgtgtcagacctaacaacaatacaagaacaagtgtacgtataggaccaggacaaacattctatgcaacaggtgaaataataggggatataagacaagcattttgtactgtcaatgaatcagaatggaat gaaactttacaacaggtagctacgcaattaagagaacactttgagaacaaaacaataaaatttactaactcctcaggaggggatttagaaattacaacacatagctttaattgtggaggagaatttttctattgtaatacatcaggcctgtttaatagcacctggaataataat 
aataccagggagaagataaatggtacagagtcaaatagcactataactctccattgcagaataaagcaaattataaataggtggcaggaagtaggacaagcaatgtatgcccctcccatcccaggagtaataaattgtagatcaaacattacaggactaatattaacaa gagatggtggggatggggataacaatacggaaatcttcagacctggaggaggaaatatgaaggacaattggagaagtgaattatataagtataaagtagtaaaaattgaaccactgggagtagcacccaccagggctaagagaagagtggtggagagagcaaaa agagcagttggaataggagctgttttccttgggttcttaggagcagcaggaagcactatgggcgcggcgtcaataacgctgacggtacaggccagacaattattgtctggcatagtgcaacagcaaagcaatttgctgagggctatagaggctcaacaacatctgttg aaactcacggtctggggcattaaacagctccaggcaagagtccttgctgtggaaagatacctgcaggatcaacagctcctaggaatttggggctgctctggaaaactcatctgcaccactaatgtgccctggaactctagttggagtaataaatctcagagtgagatat gggagaacatgacctggctgcaatgggataaagaaattagcagttacacaggcataatatataaactaattgaagaatcgcagaaccagcaggaaaagaatgaacaagacttattggcattggacaagtgggcaagtctatggaattggtttgaaatatcaaagtggc tgtggtatataaaaatatttataatgatagtaggaggattaataggattaagaatagtttttgctgtgctttctataatcaatagagttaggcagggatactcacctttgtcatttcagacccacaccccaaacccaagggaacccgacaggcccgaaagaatcgaagaaga aggtggagagcaaggcagagacagatcgatacgcttagtgagcggattcttagcacttgcctgggacgacctacggagcctgtgccttttcagctaccaccgcttgagagacttcatcttgattgcagcgaggactgtggaacttctgggacacagcagtctcaagg ggttgagactggggtgggaaagcctcaagtatctggggaatcttctgctatattggagtcaggaactaaaaattagtgctgttaatttagttgataccatagcaatagcagtagctggctggacagataggattatagaaacaggacaaagattttgtagagctcttctcaa cgtacctagaagaatcagacaaggatttgaaagggctctgctataacatgggtggcaagtggtcaaaaagtagcatagtgggatggcctgagattagggaaagaatgaggcgtgctcctccagcagcaaaaggagtaggagcagtatctcaagatttagataaattt
Basics
• Letters (nucleotides)
– 4 in DNA: A, G, C, T
• Codons
– Triplets of nucleotides, e.g. GAA
• Genomes have coding regions (proteins) & non-coding regions (other)
• One strand can be read forward, the other in reverse
It’s all about the Codons
• The Genetic Code is a dictionary of codons
• 64 entries (4^3)
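To make the 4^3 arithmetic concrete, here is a small Python sketch (not part of blar.py) that enumerates every possible codon and splits a sequence into codons. `CODONS`, `split_codons` and its `frame` parameter are illustrative names; the sample input is the first 12 bases of the HIV sequence shown earlier.

```python
from itertools import product

# Every possible codon: all 3-letter strings over the DNA alphabet
CODONS = ["".join(p) for p in product("ACGT", repeat=3)]

def split_codons(seq, frame=0):
    """Split a DNA string into consecutive codons, starting at `frame` (0, 1 or 2).

    Trailing bases that do not fill a whole codon are dropped.
    """
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]

print(len(CODONS))                   # 64, i.e. 4 ** 3
print(split_codons("tggatgggttaa"))  # ['TGG', 'ATG', 'GGT', 'TAA']
```

Shifting `frame` by one or two bases gives a completely different codon sequence, which is why reading frame matters when interpreting a genome.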
Analyzing Genomes
• Compare them to each other
– Alignments (e.g. Smith-Waterman, etc.)
– Distances
• Levenshtein (edit) distance (metric)
• Longest Common Subsequence distance (metric)
• Normalized Compression Distance (metric)
– Optimal Grammars
• Pisa.c: Optimal sequence grammar search using hyperstring encodings
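A minimal sketch of two of the distances listed above, assuming plain Python strings and bytes: the textbook dynamic-programming Levenshtein distance, and Normalized Compression Distance, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), with zlib standing in for the compressor C. Using zlib is an assumption for illustration; any real compressor can be plugged in.

```python
import zlib

def levenshtein(a, b):
    """Edit distance: minimum number of single-character inserts, deletes or substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute, free if equal
        prev = cur
    return prev[-1]

def ncd(x, y, level=9):
    """Normalized Compression Distance on bytes, with zlib as the compressor C."""
    def c(data):
        return len(zlib.compress(data, level))
    cx, cy, cxy = c(x), c(y), c(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

print(levenshtein("kitten", "sitting"))  # 3
```

Levenshtein costs O(len(a) * len(b)) time, while NCD only needs three compressor runs, which is one reason compression-based distances scale to genome-sized inputs.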
Analyzing Genomes
• Look for interesting regions
– Information gain (Kullback-Leibler divergence)
– Coding costs (Kolmogorov complexity)
– Decaying coding costs (lossy Kolmogorov complexity)
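As a rough illustration of the first two measures, the sketch below computes the Kullback-Leibler divergence of a window's letter distribution from the whole genome's, plus the coding cost (sum of -log2 p) of a window under an order-0 model. The order-0 model and the `eps` fallback for letters the model never saw are simplifying assumptions; this is not the scoring blar.py uses.

```python
import math
from collections import Counter

def distribution(symbols):
    """Empirical probability of each symbol."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return {s: n / total for s, n in counts.items()}

def kl_divergence(p, q, eps=1e-9):
    """D_KL(P || Q) in bits; `eps` is an assumed floor for symbols Q never saw."""
    return sum(pv * math.log2(pv / q.get(s, eps)) for s, pv in p.items())

def coding_cost(symbols, model):
    """Bits to encode `symbols` under an order-0 `model`: sum of -log2 p(symbol)."""
    return sum(-math.log2(model[s]) for s in symbols)

genome = "ACGT" * 100 + "G" * 20 + "ACGT" * 100  # one odd region in the middle
model = distribution(genome)
normal, odd = genome[:40], genome[400:420]

print(kl_divergence(distribution(normal), model))  # close to 0
print(kl_divergence(distribution(odd), model))     # about 1.9 bits
print(coding_cost(odd, model))                     # bits to spell out the odd region
```

The run of G's stands out by divergence, while an ordinary ACGT window scores near zero.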
Rule 1: Size doesn’t matter
Smallest (almost)
• Mycoplasma genitalium
• 580,000 bp
Largest
• Polychaos dubium
• 670 billion bp
Rule 2: Repetition matters
Don’t say that again
• Sections of DNA that do not repeat are the most important
• Protein-coding genes and RNA-coding genes are non-repetitive
• The genomes of higher-order organisms are largely repetitive
Rule 3: Compression is hard
Putting the squeeze on
• Normal compressors only achieve ~2 bits per base
• Special genetic compressors exist
• Compressibility equates to sequence predictability for the model in use
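The link between compressibility and predictability can be checked with Python's built-in `bz2` module: a uniformly random ACGT string cannot be squeezed much below its 2-bit-per-base entropy, while a periodic string of the same length nearly vanishes. The seed, sequence length and repeating pattern are arbitrary choices for illustration.

```python
import bz2
import random

def bits_per_symbol(seq):
    """bz2-compressed size in bits divided by the number of input letters."""
    return 8 * len(bz2.compress(seq.encode("ascii"))) / len(seq)

rng = random.Random(42)  # fixed seed so runs are repeatable
random_dna = "".join(rng.choice("ACGT") for _ in range(100_000))
repetitive_dna = "ACGTTGCA" * 12_500  # same length, perfectly predictable

print(bits_per_symbol(random_dna))      # roughly 2 bits per base: no structure to exploit
print(bits_per_symbol(repetitive_dna))  # a tiny fraction of a bit per base
```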
So what does this have to do with security???
A Question
If we could convert sequences of logs, packets, etc. to a genomic encoding, could we use genomic analysis to dramatically speed up & improve forensics, incident response and anomaly detection?
How?
• Step 1: Convert events into an alphabet
• Step 2: Convert the stream into a string of letters
• Step 3: Money bath
A Naïve Solution
• Step 1: Hash each input, use the hash value as a letter
• Step 2: Create a stream of hash values
• Step 3: #fail
Why?
Answer
• The alphabet is too big (a k-bit hash yields 2^k letters)
• The stream will need at least 2^(2^<hash_key_size>) examples
• Stream is virtually unpredictable
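The failure of the naive approach shows up in a few lines of Python. This hypothetical encoder uses the top 32 bits of SHA-256 as the "letter" (a stand-in for whatever hash one might pick): identical lines share a letter, a one-character edit yields an unrelated one, and the alphabet has 2^32 possible letters.

```python
import hashlib

def naive_letter(line, bits=32):
    """Hypothetical naive encoder: the top `bits` bits of SHA-256 become the 'letter'."""
    digest = hashlib.sha256(line.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)

lines = [
    "Mary had a little lamb whose fleece was white as snow.",
    "Mary had a little lamb whose fleece was white as snow.",
    "Gary had a little lamb whose fleece was white as snow.",  # one character changed
]
letters = [naive_letter(line) for line in lines]

print(letters[0] == letters[1])  # True: identical lines share a letter
print(letters[0] == letters[2])  # a near-duplicate gets an unrelated letter
print(2 ** 32)                   # alphabet size for a 32-bit hash: 4294967296
```

With billions of possible letters and no relationship between similar events, almost every letter is rare, so no model can learn to predict the stream.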
Enter blar.py
WTF is a ‘blarp’?
• Let’s ask Google
• The sound a fat person makes being fat
• The sound of taking big fat data and making it useful & efficient small data
• A cool little Python tool for creating and analyzing genomic encodings
• The last two will not be found on Google…yet
Idea
• We want similar events to be represented by a single letter
• Hashes are effectively random: similar events get unrelated letters
• Let’s use geometry instead
Position in space
• To precisely locate something in D-dimensional space, you need its distances to n = D+1 reference points
• Key notion: to get something’s general area, you can use n << D+1 reference points
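To see where D+1 comes from, here is the 2-D case (D = 2) as a small sketch: distances to three non-collinear reference points pin a point down exactly, because subtracting one circle equation from the other two leaves a solvable linear system. With only two reference points the circles can meet in two places, so you learn the general area but not the exact spot. The function name and the Cramer's-rule solution are illustrative choices.

```python
import math

def trilaterate_2d(anchors, dists):
    """Recover a 2-D point from its distances to D + 1 = 3 non-collinear reference points."""
    (x0, y0), (x1, y1), (x2, y2) = anchors
    d0, d1, d2 = dists
    # Subtracting circle 0 from circles 1 and 2 cancels x^2 and y^2,
    # leaving a 2x2 linear system a * [x, y] = b
    a11, a12 = 2 * (x1 - x0), 2 * (y1 - y0)
    a21, a22 = 2 * (x2 - x0), 2 * (y2 - y0)
    b1 = d0**2 - d1**2 + x1**2 - x0**2 + y1**2 - y0**2
    b2 = d0**2 - d2**2 + x2**2 - x0**2 + y2**2 - y0**2
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("reference points are collinear")
    # Cramer's rule
    return ((b1 * a22 - a12 * b2) / det, (a11 * b2 - b1 * a21) / det)

anchors = [(0, 0), (10, 0), (0, 10)]
target = (3, 4)
dists = [math.dist(target, a) for a in anchors]
print(trilaterate_2d(anchors, dists))  # approximately (3.0, 4.0)
```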
Locality-Sensitive Hashing
• Introduced in the late ’90s (Indyk & Motwani, 1998)
• Used within indexing for text lookups on massive data sets
• Many hash families; data-type dependent
• Question: What if you thought about it as a ‘general area’ hash instead?
How it works
• Basic type: random projection
• Given a numeric vector (e.g. 1, 15, 3, 14.8), calculate its dot product with a random vector
• If the result is positive, call it a ‘1’
• If negative, call it a ‘0’
• Repeat with a new random vector for each bit
• Concatenate the bits; the result is the LSH
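The random-projection recipe above fits in a short, dependency-free sketch. The reference vectors are drawn from a seeded Gaussian generator (an assumption; blar.py's own reference vectors are not shown), and a dot product of exactly zero is treated as "1". For two vectors separated by angle θ, each bit disagrees with probability θ/π, so nearby vectors usually share a hash, and negating a vector flips every bit.

```python
import random

def make_reference_vectors(width, dim, seed=0):
    """Hypothetical helper: one random Gaussian vector per hash bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(width)]

def random_projection_hash(v, refs):
    """One bit per reference vector: 1 if the dot product is >= 0, else 0."""
    value = 0
    for r in refs:
        projection = sum(a * b for a, b in zip(r, v))
        value = (value << 1) | (1 if projection >= 0 else 0)
    return value

refs = make_reference_vectors(width=4, dim=4)
a = [1, 15, 3, 14.8]
b = [1, 15, 3, 14.9]  # almost the same direction as a
c = [-x for x in a]   # exactly opposite to a
print(random_projection_hash(a, refs), random_projection_hash(b, refs), random_projection_hash(c, refs))
```

Only the direction of a vector matters: scaling it by a positive constant never changes its hash.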
Blar.py Pipeline
Vectorize input → Find locality-sensitive hash → Convert to UTF-16 char → Output stream of UTF-16 → Analyze sliding window over genome stream → Score → Chart stuff
Vectorizing
• Idea: Count things that matter, take measurements, etc. and create an array to hold that information
• Where the rubber meets the road
• Lots of chances for domain expertise
Basic Vectorizing in Blar.py
• Basic model: character n-grams
• Also known as Markov chains or Bag of Letters
• Counts up sliding windows of text
• E.g. 2-grams for ‘sassyfrassy’: sa: 1, as: 2, ss: 2, sy: 2, yf: 1, fr: 1, ra: 1
• Stored in a 256^2-length array: (1,0…0,2,0…0,2,0…
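The "sassyfrassy" example can be reproduced with `collections.Counter`. The second function shows one way to lay those counts out in a 256^2-length array, indexing by first byte × 256 + second byte; that index layout is an assumption, since the slide's array is truncated.

```python
from collections import Counter

def char_ngrams(s, n=2):
    """Count every n-character sliding window in s."""
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def bigram_array(s):
    """Flatten 2-gram counts into a 256**2 list indexed by first_byte * 256 + second_byte."""
    data = s.encode("latin-1")  # one byte per character for code points below 256
    counts = [0] * (256 * 256)
    for first, second in zip(data, data[1:]):
        counts[first * 256 + second] += 1
    return counts

grams = char_ngrams("sassyfrassy")
print(dict(grams))  # {'sa': 1, 'as': 2, 'ss': 2, 'sy': 2, 'yf': 1, 'fr': 1, 'ra': 1}
```

The dense array has 65,536 slots for 10 windows, which is the sparsity problem the next slide's feature hashing addresses.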
Let’s Vectorize Better
• Use feature hashing, otherwise known as the hashing trick
• For each model pattern, take its hash mod the array length and increment that counter
• Permits lossy counting with graceful random collisions
• Blar.py uses length 64 by default and xxHash
Blar.py code
    import numpy
    import xxhash

    def feature_hash_string(s, window, dim):
        # Generate window-char Markov chains & create feature hashes
        chains = [xxhash.xxh32(s[i:i + window].encode("utf-8")).intdigest() % dim
                  for i in range(len(s) - (window - 1))]
        # Initialize counter array
        counters = numpy.zeros(dim)
        # Count instances of feature hashes
        for c in chains:
            counters[c] += 1
        # Return feature hash count vector
        return counters
Now let’s find the LSH
    import numpy

    # COMP_VECTORS is assumed to be defined elsewhere: `width` random reference
    # vectors, each the same length as the feature vector v
    # Use random projection for LSH and output a UTF char for the locality-sensitive hash
    def locality_hash_vector(v, width):
        bits = numpy.zeros(width, dtype=int)
        for x in range(width):  # one bit per reference vector
            projection = numpy.dot(COMP_VECTORS[x], v)
            bits[x] = 0 if projection < 0 else 1
        # Return unicode char equal to the LSH
        return chr(int(''.join(map(str, bits)), 2))
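Putting the slides together, the sketch below runs the whole pipeline on lines of text using only the standard library. It is not blar.py: `zlib.crc32` replaces xxHash, the reference vectors come from a seeded Gaussian generator, and the 4-character n-gram window is an assumption; the 64-dimension and 4-bit sizes follow the stated defaults. Each line becomes one character, so the output string is the "genome".

```python
import random
import zlib

DIM = 64    # feature-hash length (stated blar.py default)
WIDTH = 4   # LSH bits per line (stated blar.py default)
WINDOW = 4  # characters per n-gram (assumption for this sketch)

# Assumption: seeded Gaussian reference vectors, one per LSH bit
_rng = random.Random(0)
REF_VECTORS = [[_rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(WIDTH)]

def feature_hash(line):
    """Count WINDOW-character n-grams into DIM buckets (crc32 stands in for xxHash)."""
    counts = [0] * DIM
    for i in range(len(line) - WINDOW + 1):
        counts[zlib.crc32(line[i:i + WINDOW].encode("utf-8")) % DIM] += 1
    return counts

def lsh_char(vec):
    """Random-projection LSH: one bit per reference vector, returned as a character."""
    bits = 0
    for ref in REF_VECTORS:
        projection = sum(r * x for r, x in zip(ref, vec))
        bits = (bits << 1) | (1 if projection >= 0 else 0)
    return chr(bits)

def encode(lines):
    """Map each line to one character; the joined string is the 'genome'."""
    return "".join(lsh_char(feature_hash(line)) for line in lines)

toy = ["Mary had a little lamb whose fleece was white as snow."] * 5 + [
    "Gary had a little hand whose hair was as white as blow.",
    "some more strings",
    "some more strings",
    "FOO BAR BAS",
]
genome = encode(toy)
print([ord(ch) for ch in genome])  # one 4-bit code (0-15) per line
```

Identical lines always map to the same character, and the 4-bit hash limits the alphabet to 16 letters, which keeps the stream small and predictable enough to analyze.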
Blar.py analysis
• Analyzes 4-character sequences and assigns a decaying version of the optimal coding cost to each line
• Tells you how interesting a certain event is relative to everything else in the genome, accounting for ordering
• Blar.py genomes are extremely compressible, especially using bzip
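The slides do not spell out the decay scheme, so the following is a hypothetical sketch of one way to assign a decaying coding cost to sliding windows over a genome: each k-character window is scored by -log2 of its exponentially decayed frequency so far, with a single pseudo-count spread across all alphabet^k possible windows. The `decay` value and the smoothing are assumptions, not blar.py's parameters.

```python
import math

def decayed_costs(genome, k=4, decay=0.99, alphabet=16):
    """Hypothetical scorer: -log2 of each k-window's exponentially decayed frequency.

    Returns one cost per window, aligned with the window's last character.
    One pseudo-count is spread over all alphabet**k possible windows (an assumption).
    """
    space = alphabet ** k
    seen = {}    # window -> (decayed count, step of last update)
    total = 0.0  # decayed count of all windows so far
    costs = []
    for t in range(k - 1, len(genome)):
        window = genome[t - k + 1:t + 1]
        count, last = seen.get(window, (0.0, t))
        count *= decay ** (t - last)  # age this window's count to step t
        total *= decay                # age the running total to step t
        p = (count + 1.0 / space) / (total + 1.0)
        costs.append(-math.log2(p))
        seen[window] = (count + 1.0, t)
        total += 1.0
    return costs

stream = "a" * 40 + "b" + "a" * 10  # one odd event in a repetitive stream
costs = decayed_costs(stream)
worst = max(range(len(costs)), key=costs.__getitem__)
print(worst, costs[worst])  # index of the most surprising window and its cost in bits
```

Once the repeated window is established it costs well under a bit, while the windows containing the odd letter cost over 20 bits. The very first window also scores high because nothing has been seen yet, so a real tool would need some warm-up handling.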
Blar.py defaults (ATM)
• 4-character sliding windows
• 4-bit hashes
• 64-dimensional feature hashes
• Outputs a list of the most interesting scores
• Outputs a few bad charts
Blar.py vs. Toy File
1. Mary had a little lamb whose fleece was white as snow.
2. Mary had a little lamb whose fleece was white as snow.
3. Mary had a little lamb whose fleece was white as snow.
4. Mary had a little lamb whose fleece was white as snow.
5. Mary had a little lamb whose fleece was white as snow.
6. Gary had a little hand whose hair was as white as blow.
7. some more strings
8. some more strings
9. some more strings
10. some more strings
11. some more strings
12. John McAfee was the keynote for Skytalks.
13. John McAfee was the keynote for Skytalks.
14. John McAfee was the keynote for Skytalks.
15. some more strings
16. some more strings
17. some more strings
18. John McAfee was the keynote for Skytalks.
19. John McAfee was the keynote for Skytalks.
20. FOO BAR BAS
Blar.py vs. Toy File [chart]
Blar.py vs. Toy File [chart]
(Look Raffy, I’m using the completely inappropriate chart type)
Blar.py vs. BlueGene/L
• From the USENIX Computer Failure Data Repository
• 1.2GB combined log file from 131,072 processors, covering six months
• 119MB compressed with gzip
• 9.4MB blar.py genome
• Blar.py throughput: ~1,000 lines/sec
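For scale, the BlueGene/L figures work out as follows, assuming decimal units (1 GB = 1,000 MB). The genome is a lossy encoding, so its ratio is not directly comparable to gzip's lossless one.

```python
# Slide figures in MB, assuming decimal units (1 GB = 1,000 MB)
raw_mb, gzip_mb, genome_mb = 1200, 119, 9.4
lines_per_sec = 1000  # approximate blar.py throughput from the slide

print(f"gzip:   {raw_mb / gzip_mb:.1f}x smaller than the raw log")       # 10.1x
print(f"genome: {raw_mb / genome_mb:.1f}x smaller than the raw log")     # 127.7x
print(f"genome: {gzip_mb / genome_mb:.1f}x smaller than the gzip file")  # 12.7x
print(f"{1_000_000 / lines_per_sec / 60:.0f} minutes per million lines")  # 17
```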
Blar.py vs. BlueGene/L [chart]
Blar.py vs. BlueGene/L [chart]
TL;DR
• Fast, accurate, free: the blar.py genomic encoding tool provides very fast, low-noise anomaly detection
• Stop searching in a crisis: a great way to quickly explore data for IR, forensics, etc., especially from unknown sources
• Want it? Follow me @conduit242 for the GitHub posting announcement
