OpenFest 2012 : Leveraging the public internet
1. Leveraging the public internet
Tonimir Kisasondi, mag.inf., EUCIP
2. $whois tkisason
•  Junior researcher @ www.foi.hr
•  Head of Open Systems and Security lab
•  Likes to build and break things
•  tonimir.kisasondi@foi.hr
•  skype:tkisason
3. What happens when you digitize the whole world?
•  Google, Facebook, Twitter
•  Is it a bubble or a valid business model?
•  The new buzzword is big data
•  Storage per capita doubles every three years
•  Kryder's law says that storage density doubles every 18 months
•  Can you really store the whole world?
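Taking the two growth rates on this slide at face value, a quick worked comparison over six years (72 months) shows why storing "the whole world" starts to look plausible. D is storage density, S is storage per capita, and the six-year horizon is an arbitrary choice for illustration:

```latex
\begin{aligned}
\frac{D(t)}{D_0} &= 2^{t/18}, & \frac{D(72)}{D_0} &= 2^{72/18} = 2^{4} = 16,\\
\frac{S(t)}{S_0} &= 2^{t/36}, & \frac{S(72)}{S_0} &= 2^{72/36} = 2^{2} = 4.
\end{aligned}
```

If both trends held, density would grow 16/4 = 4 times faster than per-capita storage demand over that period.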
4. What happens when you digitize the whole world?
5. What happens when you digitize the whole world?
•  Storing 20 Tbps of traffic
•  Map/Reduce-like infrastructure to mine and combine data
•  Why is this interesting to us now?
   o  Storage is cheap
   o  Big data is useful everywhere
   o  We can use the tricks intel agencies use to build cool stuff
   o  It's not rocket science...
   o  Yes, the most interesting applications are in cross-disciplinary fields
6. First: OSINT
•  OSINT: Open Source Intelligence
   o  Finding, selecting and acquiring information from open, publicly available sources such as newspapers, the internet, books and social networks (Twitter)...
   o  Various registries (company registers, open postings, public listings)
   o  Metadata
   o  Mine those, and you might find a lot of interesting stuff
   o  White zone: legal and ethical
   o  Black zone: illegal and unethical
   o  Gray zone: legal but unethical
7. First: OSINT
•  Not everything is OSINT, but you can glean interesting data from almost anything
•  It worked for the people behind Splunk: they turned the idea into a product
•  It works for data mining folks
8. Data analysis 101
•  Data is just data; you have to correlate it or put it in context for it to be useful
   o  Find outliers
   o  Spot differences
   o  Find common attributes
   o  Find connections, not answers
   o  First identify, then try to interpret
   o  Put data into perspective, seek help :)
   o  "Data-driven design"
•  A nice showcase of data-driven design:
   o  A/B testing
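"Find outliers" needs no advanced statistics. The sketch below is not taken from the talk: it swaps in the modified z-score (median and median absolute deviation, with the commonly used 0.6745 constant and 3.5 cut-off), which stays usable on small samples where a plain mean/standard-deviation z-score cannot exceed 3. The per-IP request counts are hypothetical.

```python
import statistics

def find_outliers(counts, threshold=3.5):
    """Return keys whose value has a modified z-score above `threshold`.

    Uses median and median absolute deviation (MAD), which a single
    extreme value cannot drag around the way it drags the mean.
    """
    values = list(counts.values())
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:  # all (or most) values identical: no spread to measure against
        return []
    return sorted(key for key, v in counts.items()
                  if 0.6745 * abs(v - median) / mad > threshold)

# Hypothetical requests-per-IP counts pulled from a log
requests_per_ip = {
    "10.0.0.1": 12, "10.0.0.2": 15, "10.0.0.3": 11,
    "10.0.0.4": 14, "10.0.0.5": 13, "10.0.0.6": 240,
}
print(find_outliers(requests_per_ip))
```

The result is a starting point for "first identify, then interpret", not a verdict.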
9. Do I need advanced statistics?
•  Most of the time: no
•  Are statistics awesome? Yup
•  Well, don't play with things where you can get hurt. :)
•  Seek professional help
•  grep, Google Refine/Mojo facets and your favorite scripting language are just fine...
10. How can we approach the problem?
•  There are many (finished) tools; if they help, great
•  Roll your own script
   o  Duct-tape some finished libraries together
   o  Most of the time it takes less time than finding a tool
•  Cheating and stealing are encouraged. ;)
11. Finished tools
•  Wget, Python, Ruby, Perl...
   o  Just kidding
•  Tapir
•  Maltego
•  Metagoofil, FOCA, ExifTool
•  Wayback Machine (extremely interesting)
12. Bad design 101
•  If you hack it together, watch out for some gotchas
•  Line-by-line analysis
   o  Minimal complexity: O(n)
•  You can easily kill the speed of your script/parser/*
•  The best separator is \t (tab)
•  .split() is a godsend
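The advice on this slide (a single line-by-line pass, tab-separated fields, `.split()`) can be sketched as a streaming counter. The three-column layout and the sample lines are hypothetical; a real log needs its own column index.

```python
import io
from collections import Counter

def count_column(lines, column):
    """Count values in one tab-separated column, one line at a time (a single O(n) pass)."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) > column:  # skip short or malformed lines instead of crashing
            counts[fields[column]] += 1
    return counts

# Hypothetical tab-separated sample: date, client IP, result
sample = io.StringIO(
    "2012-11-03\t10.0.0.1\tDENIED\n"
    "2012-11-03\t10.0.0.2\tALLOWED\n"
    "2012-11-04\t10.0.0.1\tDENIED\n"
)
status_counts = count_column(sample, 2)
print(status_counts.most_common())
```

For a real file, pass the open file object itself (for example `with open(path) as f: count_column(f, 2)`), so only one line is held in memory at a time.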
13. ignorecase?
    #!/usr/bin/python
    import re
    a = open("access.log")
    b = open("test.log", "w")
    for line in a:
        if re.search("DENIED", line, re.IGNORECASE):
            b.write(line)
    b.close()

    $ time ./re-search.py
    real    0m4.516s
    user    0m4.444s
    sys     0m0.056s
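If you do need case-insensitive matching, two common alternatives to calling `re.search(..., re.IGNORECASE)` on every line are a precompiled pattern and lowercasing plus a plain substring test. This is a sketch, not a measurement: whether either closes the gap shown in the timing above has not been tested here, and the sample lines are hypothetical.

```python
import re

# Compile once and reuse for every line (re keeps an internal cache too,
# but a compiled pattern skips that lookup on each call).
DENIED_RE = re.compile("DENIED", re.IGNORECASE)

def filter_regex(lines):
    """Case-insensitive match using the precompiled pattern."""
    return [line for line in lines if DENIED_RE.search(line)]

def filter_lower(lines):
    """Lowercase each line and do a plain substring test.

    For plain-ASCII log lines this selects the same lines as re.IGNORECASE;
    the two can differ on some non-ASCII characters.
    """
    return [line for line in lines if "denied" in line.lower()]

sample = [
    "1351900000.1 10.0.0.1 TCP_DENIED/403 GET http://example.com/\n",
    "1351900000.2 10.0.0.2 TCP_MISS/200 GET http://example.com/\n",
    "1351900000.3 10.0.0.3 tcp_denied/407 GET http://example.com/\n",
]
print(filter_regex(sample) == filter_lower(sample))
```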
14. simple RE
    #!/usr/bin/python
    import re
    a = open("access.log")
    b = open("test.log", "w")
    for line in a:
        if re.search("DENIED", line):
            b.write(line)
    b.close()

    $ time ./re-search.py
    real    0m2.520s
    user    0m2.456s
    sys     0m0.056s
15. find
    #!/usr/bin/python
    a = open("access.log")
    b = open("test.log", "w")
    for line in a:
        c = line.find("DENIED")
        if c >= 0:
            b.write(line)
    b.close()

    $ time ./testparse.py
    real    0m0.781s
    user    0m0.728s
    sys     0m0.044s
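The `find` script never closes its input file and would leave the output unclosed on an error. A more idiomatic shape for the same filter uses `with` blocks and the `in` operator, which performs the same substring test as `line.find(...) >= 0`. The function name and its default needle are hypothetical.

```python
import os
import tempfile

def filter_lines(src_path, dst_path, needle="DENIED"):
    """Copy lines containing `needle`; both files are closed even if an exception occurs."""
    count = 0
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:        # iterates lazily, one line at a time
            if needle in line:  # same test as line.find(needle) >= 0
                dst.write(line)
                count += 1
    return count

if __name__ == "__main__":
    # Small self-contained demo using a temporary directory
    with tempfile.TemporaryDirectory() as tmp:
        log = os.path.join(tmp, "access.log")
        out = os.path.join(tmp, "test.log")
        with open(log, "w") as f:
            f.write("a TCP_DENIED/403\nb TCP_MISS/200\nc TCP_DENIED/407\n")
        print(filter_lines(log, out))
```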
16. grep
    $ time grep DENIED access.log > test
    real    0m0.074s
    user    0m0.040s
    sys     0m0.032s
17. To sum it up...
    Python RE ignorecase : 4.516s
    Python RE            : 2.520s
    Python find          : 0.781s
    grep                 : 0.074s
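To reproduce this kind of comparison on your own machine, a small `timeit` harness helps. It is a sketch under clear limits: it runs on synthetic in-memory lines (hypothetical data, every 10th line DENIED), excludes file I/O and grep, and its numbers will differ from the ones on this slide depending on hardware and Python version.

```python
import re
import timeit

def make_line(i):
    """Build one synthetic, squid-like log line; every 10th line is a DENIED entry."""
    status = "TCP_DENIED/403" if i % 10 == 0 else "TCP_MISS/200"
    return f"{i} 10.0.0.{i % 250} {status} GET http://example.com/page/{i}\n"

LINES = [make_line(i) for i in range(50_000)]

# Each approach returns the matching lines so results can be compared, not just timed.
APPROACHES = {
    "re + IGNORECASE": lambda: [line for line in LINES if re.search("DENIED", line, re.IGNORECASE)],
    "re": lambda: [line for line in LINES if re.search("DENIED", line)],
    "str.find": lambda: [line for line in LINES if line.find("DENIED") >= 0],
    "in operator": lambda: [line for line in LINES if "DENIED" in line],
}

def benchmark(repeat=3):
    """Return the best-of-`repeat` time in seconds for one pass of each approach."""
    return {name: min(timeit.repeat(fn, number=1, repeat=repeat))
            for name, fn in APPROACHES.items()}

if __name__ == "__main__":
    for name, seconds in sorted(benchmark().items(), key=lambda kv: kv[1]):
        print(f"{name:16s} {seconds:.4f}s")
```

Checking that every approach returns the same lines before timing them guards against "fast because it is wrong".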
18. Primer on useful and interesting tools
•  ipython
   o  http://ipython.org/
•  python-nltk
   o  http://nltk.org/ (nltk.clean_html(messy_html))
•  python-requests
   o  www.python-requests.org
•  python-graphviz
   o  http://code.google.com/p/pydot/
•  python-google by Mario Vilas
   o  https://github.com/MarioVilas
19. pydot and graphviz
    #!/usr/bin/python
    import pydot

    graph = pydot.Dot(graph_type="graph")
    graph.add_edge(pydot.Edge("link 1", "person 2", label="link 3"))
    graph.add_edge(pydot.Edge("person 2", "person 3", label="link4", color="red", penwidth=6))
    # ...
    graph.write_png("output.png", prog="dot")
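If pydot is not installed, the same graph can be written as DOT text with the standard library alone and rendered afterwards with Graphviz (for example `dot -Tpng graph.dot -o output.png`). The `to_dot` helper and its edge-triple format are hypothetical, not part of pydot.

```python
def _quote(value):
    """Quote a DOT ID, escaping embedded double quotes."""
    return '"' + str(value).replace('"', '\\"') + '"'

def to_dot(edges, name="G"):
    """Build an undirected DOT graph from (source, target, attributes) triples."""
    lines = [f"graph {name} {{"]
    for src, dst, attrs in edges:
        attr_text = ", ".join(f"{key}={_quote(val)}" for key, val in attrs.items())
        suffix = f" [{attr_text}]" if attr_text else ""
        lines.append(f"    {_quote(src)} -- {_quote(dst)}{suffix};")
    lines.append("}")
    return "\n".join(lines) + "\n"

# The two edges from the slide
EDGES = [
    ("link 1", "person 2", {"label": "link 3"}),
    ("person 2", "person 3", {"label": "link4", "color": "red", "penwidth": 6}),
]

if __name__ == "__main__":
    with open("graph.dot", "w") as f:
        f.write(to_dot(EDGES))
```

Quoting every ID keeps names with spaces, such as "link 1", valid in DOT.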
20. Visualization: pydot and graphviz
21. So, how about a short showcase of some things I did?
•  Yeah, they are lame and simple
•  They work for me
•  Available on GitHub
•  Hopefully they can motivate you to do some fun and simple "one afternoon" stuff
•  Most of the "hard" stuff is easy once you try to hack it together
22. mkwordlist - https://github.com/tkisason/gcrack
•  Idea: Create wordlists from Google results for a set of keywords
•  For a keyword, return the top 5 links (or N)
•  Scrape and clean with NLTK
•  Optional lowercasing for future mutations
   o  You can use JtR/Hashcat with a ruleset to mutate the lists
•  Result: a nice targeted wordlist generator
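The scrape-and-clean step can be sketched as follows. This is not mkwordlist's code: it assumes the result pages have already been downloaded (the Google search step is omitted), and it uses the standard library's `html.parser` instead of `nltk.clean_html`, which NLTK 3 removed (calling it now raises `NotImplementedError` and suggests BeautifulSoup). The function names, the 4-character minimum and the sample page are hypothetical.

```python
import re
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)

def html_to_text(html):
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)

def make_wordlist(pages, lowercase=True, min_len=4):
    """Turn already-fetched HTML pages into a de-duplicated, order-preserving wordlist."""
    seen, words = set(), []
    for html in pages:
        for token in re.findall(r"\w+", html_to_text(html)):
            if lowercase:  # optional, so JtR/Hashcat rules can re-add case mutations
                token = token.lower()
            if len(token) >= min_len and token not in seen:
                seen.add(token)
                words.append(token)
    return words

pages = ["<html><head><style>body{color:red}</style></head><body>"
         "<h1>Zagreb Dinamo</h1><p>Maksimir stadium, Dinamo fans</p></body></html>"]
print(make_wordlist(pages))
```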
23. mkwordlist - https://github.com/tkisason/gcrack
•  Some other cool things
   o  Keywords can be Google dorks
      §  site:.bg
      §  filetype:txt
      §  ""
•  Interesting results for targeted attacks
•  Broad keywords are also OK
   o  If you are pentesting a company or similar
24. gcrack - https://github.com/tkisason/gcrack
•  Idea: Most weak password hashes have already been cracked and leaked on the public internet
•  Google indexes those pages, and their content contains the plaintext
•  Use Google searches for password cracking
•  Create a bag of words as a wordlist
•  Result: a very effective and fast hash cracker
•  Bonus: hash agnostic
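The final step of this idea, checking the bag of words against the hash, is where "hash agnostic" comes from: try every candidate under several digest algorithms. This sketch is not gcrack's implementation; it covers unsalted hashes only, omits the search and scraping step, and the `crack` name and word list are hypothetical.

```python
import hashlib

# Unsalted digests to try; extend as needed.
ALGORITHMS = ("md5", "sha1", "sha224", "sha256", "sha384", "sha512")

def crack(target_hash, candidates):
    """Return (algorithm, plaintext) for the first candidate whose digest matches, else None."""
    target = target_hash.strip().lower()
    # Only algorithms whose hex digest length matches the target are worth trying.
    algorithms = [name for name in ALGORITHMS
                  if hashlib.new(name).digest_size * 2 == len(target)]
    for word in candidates:
        data = word.encode("utf-8")
        for name in algorithms:
            if hashlib.new(name, data).hexdigest() == target:
                return name, word
    return None

# Hypothetical bag of words scraped from pages that mention the hash
bag_of_words = ["letmein", "password", "qwerty"]
print(crack("5f4dcc3b5aa765d61d8327deb882cf99", bag_of_words))  # ('md5', 'password')
```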
25. logtool - https://github.com/tkisason/logtool
•  Log files are interesting...ish
•  Especially if you have a compromised machine and the attackers were noobish enough to leave the log files
•  What can you learn:
   o  IP addresses (known proxies and Tor exit nodes)
   o  Usernames (are they generic or specific?)
   o  IP-GeoIP data
   o  Toolmarks (user agents, wordlists for attacks)
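A first pass over a web server log can pull out the IPs, usernames and user-agent toolmarks listed above. This sketch is not logtool's code: it assumes Apache's "combined" log format, leaves out GeoIP and Tor exit-node lookups (both need external data), and the sample lines are hypothetical.

```python
import re
from collections import Counter

# Apache "combined" format: ip ident user [time] "request" status size "referer" "agent"
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def summarize(lines):
    """Count IPs, non-anonymous usernames and user agents; count lines that do not parse."""
    ips, users, agents, skipped = Counter(), Counter(), Counter(), 0
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            skipped += 1
            continue
        ips[m["ip"]] += 1
        if m["user"] != "-":
            users[m["user"]] += 1
        agents[m["agent"]] += 1  # tool names such as scanners often show up here
    return {"ips": ips, "users": users, "agents": agents, "skipped": skipped}

SAMPLE_LINES = [
    '203.0.113.5 - - [03/Nov/2012:10:15:32 +0100] "GET /wp-login.php HTTP/1.1" 404 512 "-" "sqlmap/1.0-dev"',
    '203.0.113.5 - admin [03/Nov/2012:10:15:33 +0100] "POST /admin HTTP/1.1" 401 128 "-" "sqlmap/1.0-dev"',
    '198.51.100.7 - - [03/Nov/2012:10:16:01 +0100] "GET / HTTP/1.1" 200 2048 "-" "Mozilla/5.0"',
]
report = summarize(SAMPLE_LINES)
print(report["agents"].most_common(1))
```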
26. linkcrawl and nltk - https://github.com/tkisason/linkcrawl
•  Building a simple crawler is easy (or use wget and cURL: man up and write some shell scripts)
•  NLTK is awesome!
   o  import nltk, nltk.clean_html(data)
•  http://orange.biolab.si is also a nice platform
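A "simple crawler" really is a queue plus a link extractor. The sketch below is not linkcrawl's code. It does a breadth-first crawl limited to the start host and takes a `fetch` callable (a hypothetical parameter) so the network layer stays swappable. For real pages, something like `lambda url: urllib.request.urlopen(url).read().decode("utf-8", "replace")` would do, without robots.txt handling or rate limiting.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse

class LinkParser(HTMLParser):
    """Collect absolute http(s) links from <a href=...> tags, with #fragments removed."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                url, _fragment = urldefrag(urljoin(self.base_url, value))
                if urlparse(url).scheme in ("http", "https"):
                    self.links.append(url)

def extract_links(html, base_url):
    parser = LinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links

def crawl(start_url, fetch, max_pages=50):
    """Breadth-first crawl of start_url's host; `fetch(url)` must return the page's HTML text."""
    host = urlparse(start_url).netloc
    queue, seen, visited = deque([start_url]), {start_url}, []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in extract_links(fetch(url), url):
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited

# Hypothetical three-page site served from a dict instead of the network
PAGES = {
    "http://example.com/": '<a href="/a">A</a> <a href="http://other.org/">ext</a>',
    "http://example.com/a": '<a href="b#top">B</a> <a href="/">home</a>',
    "http://example.com/b": "",
}
print(crawl("http://example.com/", PAGES.__getitem__))
```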
27. Conclusion
•  Well, just have fun
•  Problems are all around you; try to solve some :)
28. Questions?
29. Thank you!
