OpenFest 2012 : Leveraging the public internet
Transcript

  • 1. Leveraging the public internet
       Tonimir Kisasondi, mag.inf., EUCIP
  • 2. $ whois tkisason
    •  Junior researcher @ www.foi.hr
    •  Head of the Open Systems and Security lab
    •  Likes to build and break things
    •  tonimir.kisasondi@foi.hr
    •  skype: tkisason
  • 3. What happens when you digitize the whole world?
    •  Google, Facebook, Twitter
    •  Is it a bubble or a valid business model?
    •  The new buzzword is big data
    •  Storage per capita doubles every three years
    •  Kryder's law says that storage density doubles every 18 months
    •  Can you really store the whole world?
  • 4. What happens when you digitize the whole world?
  • 5. What happens when you digitize the whole world?
    •  Storing 20 Tbps of traffic
    •  MapReduce-like infrastructure to mine and combine data
    •  Why is this interesting to us now?
      o  Storage is cheap
      o  Big data is useful everywhere
      o  Use tricks that intel agencies use to enable cool stuff
      o  It's not rocket science...
      o  Yes, the most interesting applications are in cross-disciplinary fields
  • 6. First: OSINT
    •  OSINT: Open Source Intelligence
      o  Finding, selecting and acquiring information from open, publicly available sources like newspapers, the internet, books, social networks (Twitter)...
      o  Various registries (company registries, open postings, public listings)
      o  Metadata
      o  Mine those, and you might find a lot of interesting stuff
    •  White zone – legal and ethical
    •  Black zone – illegal and unethical
    •  Gray zone – legal but unethical
  • 7. First: OSINT
    •  Not everything is OSINT, but you can glean interesting data from almost anything
    •  It worked for the people behind Splunk, so they wrote Splunk
    •  It works for data mining folks
  • 8. Data analysis 101
    •  Data is just data; you have to correlate it or put it in context for it to be useful
      o  Find outliers
      o  Spot differences
      o  Find common attributes
      o  Find connections, not answers
      o  First identify, then try to interpret
      o  Put data into perspective, seek help :)
      o  "Data driven design"
    •  A nice showcase of data driven design:
      o  A/B testing
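The "find outliers" bullet above can be sketched in a few lines of stdlib Python. This is a minimal illustration with hypothetical request counts, not anything from the talk's tools; the 2-sigma threshold is a common first-pass rule of thumb, not a fixed standard.

```python
# Flag values more than k standard deviations from the mean -- a crude
# but common first pass at "find outliers" (k is a judgment call).
from statistics import mean, stdev

def outliers(values, k=2.0):
    m, s = mean(values), stdev(values)
    return [v for v in values if abs(v - m) > k * s]

# Hypothetical request counts per hour; the spike stands out immediately.
counts = [120, 115, 130, 118, 122, 119, 980, 121]
print(outliers(counts))  # [980]
```

Once the outlier is identified, the "then try to interpret" step is the human part: a traffic spike can be an attack, a crawler, or a marketing campaign.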
  • 9. Do I need advanced statistics?
    •  Most of the time: no
    •  Are statistics awesome? Yup
    •  Well, don't play with things where you can get hurt :)
    •  Seek professional help
    •  grep, Google Refine / Mojo facets, and your favorite scripting languages are just fine...
  • 10. How can we approach the problem?
    •  There are many (finished) tools; if they help, great
    •  Roll your own script
      o  Duct-tape some finished libraries together
      o  Most of the time it takes less time than finding a tool
      o  Cheating and stealing is encouraged ;)
  • 11. Finished tools
    •  wget, python, ruby, perl...
      o  Just kidding
    •  Tapir
    •  Maltego
    •  Metagoofil, FOCA, ExifTool
    •  Wayback Machine (extremely interesting)
  • 12. Bad design 101
    •  If you hack it together, watch out for some gotchas
    •  Line-per-line analysis
      o  Minimal complexity: O(n)
    •  You can easily kill the speed of your script/parser/*
    •  The best separator is \t (tab)
    •  .split() is a godsend
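To show why .split() earns that praise: for well-delimited log lines, no regex is needed at all. A minimal sketch with a hypothetical squid-style line:

```python
# Parsing a log line field-by-field with str.split() -- with no argument
# it splits on any run of whitespace, tabs included, which is exactly
# what tab/space-separated log formats need. The line is hypothetical.
line = "1354212123.456\t192.168.1.10 TCP_DENIED/403 GET http://example.com/"
fields = line.split()
timestamp, ip, status = fields[0], fields[1], fields[2]
print(ip, status)  # 192.168.1.10 TCP_DENIED/403
```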
  • 13. ignorecase?

       #!/usr/bin/python
       import re

       a = open("access.log")
       b = open("test.log", "w")
       for line in a:
           if re.search("DENIED", line, re.IGNORECASE):
               b.write(line)
       b.close()

       $ time ./re-search.py
       real    0m4.516s
       user    0m4.444s
       sys     0m0.056s
  • 14. simple RE

       #!/usr/bin/python
       import re

       a = open("access.log")
       b = open("test.log", "w")
       for line in a:
           if re.search("DENIED", line):
               b.write(line)
       b.close()

       $ time ./re-search.py
       real    0m2.520s
       user    0m2.456s
       sys     0m0.056s
  • 15. find

       #!/usr/bin/python
       a = open("access.log")
       b = open("test.log", "w")
       for line in a:
           c = line.find("DENIED")
           if c >= 0:
               b.write(line)
       b.close()

       $ time ./testparse.py
       real    0m0.781s
       user    0m0.728s
       sys     0m0.044s
  • 16. grep

       $ time grep DENIED access.log > test
       real    0m0.074s
       user    0m0.040s
       sys     0m0.032s
  • 17. To sum it up...

       Python RE ignorecase : 4.516s
       Python RE            : 2.520s
       Python find          : 0.781s
       grep                 : 0.074s
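One more pure-Python variant worth knowing: the substring `in` operator typically beats str.find() for this kind of filter, since it avoids a method-call per line. A sketch of the same DENIED filter (timings vary by machine, so none are claimed here):

```python
# The same DENIED filter using the substring "in" operator, usually the
# fastest pure-Python option when you only need a yes/no match.
def filter_denied(lines):
    return [line for line in lines if "DENIED" in line]

# Tiny in-memory sample standing in for access.log.
sample = ["GET /ok 200\n", "GET /x TCP_DENIED/403\n", "GET /y 200\n"]
print(filter_denied(sample))  # ['GET /x TCP_DENIED/403\n']
```

The ranking above still holds, though: when the job really is "filter lines by a fixed string", grep wins.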
  • 18. Primer on useful and interesting tools
    •  ipython
      o  http://ipython.org/
    •  python-nltk
      o  http://nltk.org/ (nltk.clean_html(messy_html))
    •  python-requests
      o  www.python-requests.org
    •  python-graphviz
      o  http://code.google.com/p/pydot/
    •  python-google by Mario Vilas
      o  https://github.com/MarioVilas
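A caveat on the nltk.clean_html() call mentioned above: later NLTK releases removed it. A stdlib stand-in using html.parser does the same job (strip tags, keep text); this is a sketch, not the NLTK implementation:

```python
# Equivalent of the old nltk.clean_html(): strip markup, keep the text.
# handle_data() receives only the text between tags, so collecting those
# chunks and normalizing whitespace yields the plain-text content.
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

def clean_html(messy_html):
    p = TextExtractor()
    p.feed(messy_html)
    return " ".join(" ".join(p.chunks).split())

print(clean_html("<html><body><h1>Hi</h1><p>there</p></body></html>"))  # Hi there
```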
  • 19. pydot and graphviz

       #!/usr/bin/python
       import pydot

       graph = pydot.Dot(graph_type='graph')
       graph.add_edge(pydot.Edge('person 1', 'person 2', label='link 1'))
       graph.add_edge(pydot.Edge('person 2', 'person 3', label='link 2',
                                 color="red", penwidth=6))
       # ...
       graph.write_png('output.png', prog='dot')
  • 20. Visualization: pydot and graphviz
  • 21. So, how about a short showcase of some things I did?
    •  Yeah, they are lame and simple
    •  Works for me
    •  Available on GitHub
    •  Hope they can motivate you to do some fun and simple "one afternoon" stuff
    •  Most of the "hard" stuff is easy once you try to hack it together
  • 22. mkwordlist - https://github.com/tkisason/gcrack
    •  Idea: create wordlists from Google results for a set of keywords
    •  For a keyword, return the top 5 links (or N)
    •  Scrape and clean with NLTK
    •  Optional lowercasing for future mutations
      o  You can use JtR/HashCat with a ruleset to mutate the lists
    •  Result: a nice targeted wordlist generator
  • 23. mkwordlist - https://github.com/tkisason/gcrack
    •  Some other cool things
      o  Keywords can be Google dorks
        -  site:.bg
        -  filetype:txt
        -  ""
    •  Interesting results for targeted attacks
    •  Broad keywords are also OK
      o  If you are pentesting a company or similar
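The wordlist-building step of the mkwordlist idea can be sketched in a few lines. This is not the actual gcrack code; it assumes the pages have already been fetched and cleaned (the search/scrape step from slide 22) and just shows the tokenize / lowercase / de-duplicate part:

```python
# Sketch of the wordlist step: given text scraped from the top-N results,
# tokenize, optionally lowercase, and de-duplicate preserving order.
import re

def make_wordlist(pages, lowercase=True):
    seen, words = set(), []
    for text in pages:
        # Keep alphanumeric tokens of 4+ chars; shorter ones are rarely
        # useful password candidates (threshold is an assumption).
        for w in re.findall(r"[A-Za-z0-9]{4,}", text):
            if lowercase:
                w = w.lower()
            if w not in seen:
                seen.add(w)
                words.append(w)
    return words

# Hypothetical cleaned page texts.
pages = ["OpenFest Sofia 2012", "OpenFest talks about security"]
print(make_wordlist(pages))  # ['openfest', 'sofia', '2012', 'talks', 'about', 'security']
```

Feeding the result through a JtR/HashCat ruleset, as the slide suggests, then multiplies each base word into mutations.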
  • 24. gcrack - https://github.com/tkisason/gcrack
    •  Idea: most of the weak password hashes are cracked and leaked on the public internet
    •  Google indexes the pages, and the content of these pages contains the plaintext
    •  Use Google searches for password cracking
    •  Create a bag of words as a wordlist
    •  Result: a very effective and fast hash cracker
    •  Bonus: hash agnostic
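The final cracking step of the gcrack idea, once the bag of words is built, is a plain dictionary attack. A minimal sketch (not the gcrack implementation itself): hash each candidate and compare against the target digest. "Hash agnostic" here means only the digest algorithm name changes:

```python
# Sketch of the bag-of-words cracking step: hash every candidate word
# with the chosen algorithm and compare against the target hex digest.
import hashlib

def crack(target_hex, wordlist, algo="md5"):
    for word in wordlist:
        if hashlib.new(algo, word.encode()).hexdigest() == target_hex:
            return word
    return None

# Demo target built locally; in gcrack's scenario the plaintext would
# instead surface in Google results for the hash itself.
target = hashlib.md5(b"letmein").hexdigest()
print(crack(target, ["password", "qwerty", "letmein"]))  # letmein
```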
  • 25. logtool - https://github.com/tkisason/logtool
    •  Log files are interesting... ish
    •  Especially if you have a compromised machine and the attackers were noobish enough to leave the log files
    •  What can you learn:
      o  IP addresses (known proxies and Tor exit points)
      o  Usernames (are they generic or specific?)
      o  IP GeoIP data
      o  Toolmarks (user agents, wordlists used for attacks)
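The IP and toolmark extraction described above reduces to counting fields. A sketch over hypothetical combined-format lines (not logtool's own code) shows how a collections.Counter surfaces the noisy attacker immediately:

```python
# Sketch of the log-mining idea: pull source IPs and user agents out of
# (hypothetical) combined-log-format lines and count them -- repeat
# offenders and toolmarks like sqlmap float to the top of the Counter.
import re
from collections import Counter

lines = [
    '10.0.0.5 - - [..] "GET /wp-login.php" 200 "-" "sqlmap/1.0"',
    '10.0.0.5 - - [..] "GET /admin" 403 "-" "sqlmap/1.0"',
    '192.0.2.7 - - [..] "GET /" 200 "-" "Mozilla/5.0"',
]
# First whitespace-free token is the client IP.
ips = Counter(re.match(r"(\S+)", l).group(1) for l in lines)
# The user agent is the last quoted field on the line.
agents = Counter(l.rsplit('"', 2)[1] for l in lines)
print(ips.most_common(1))     # [('10.0.0.5', 2)]
print(agents.most_common(1))  # [('sqlmap/1.0', 2)]
```

The resulting IP list is what you would then check against known proxy and Tor exit node lists.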
  • 26. linkcrawl and nltk - https://github.com/tkisason/linkcrawl
    •  Building a simple crawler is easy (or use wget and cURL, man up and write some shell scripts)
    •  NLTK is awesome!
      o  import nltk; nltk.clean_html(data)
    •  http://orange.biolab.si is also a nice platform
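To back up the "building a simple crawler is easy" claim: the core of any crawler is extracting links from a page, and the stdlib HTMLParser handles that without regex-on-HTML hacks. A sketch (not linkcrawl's code; the page string is illustrative):

```python
# Minimal link extractor -- the heart of a crawler. handle_starttag()
# fires for every opening tag; we keep the href attribute of <a> tags.
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

page = '<a href="/talks">talks</a> <a href="http://foi.hr">foi</a>'
c = LinkCollector()
c.feed(page)
print(c.links)  # ['/talks', 'http://foi.hr']
```

Wrapping this in a fetch-parse-enqueue loop (urllib or requests for the fetch, a visited-set for loop avoidance) gives a working crawler in well under a hundred lines.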
  • 27. conclusion
    •  Well, just have fun
    •  Problems are all around you, try to solve some :)
  • 28. questions?
  • 29. Thank you!
