
M|18 How InfoArmor Harvests Data from the Underground Economy



Published in: Data & Analytics

  1. 1. InfoArmor, Threat Intelligence & Data Ingestion. Christian Lees & Steve Olson
  2. 2. What we will be covering today. 1) HOW DID WE GET HERE? A brief history of InfoArmor, and the greatness that got us to where we are today. 2) WHERE ARE WE GOING? A look at the vision and where we see InfoArmor going in the future. 3) HOW DO WE GET THERE? What will it take for us to achieve our vision, and what is our process to get there?
  3. 3. Threat Actors / Dark Web
  4. 4. “The world’s most valuable resource is no longer oil, but data” - The Economist. Source: most-valuable-resource
  5. 5. Hacked; Inside job; Poor security; Accidental publish; Device lost/stolen
  6. 6. The unseen threats. Dark web monitoring through InfoArmor Advanced Threat Intelligence. Forum scraping: Programmatic forum scraping with bots, while human operatives gain access to closed forums. Human operatives: Combat hackers who are using technology and innovating every day. Structuring raw data: Compromised data files must be formatted, organized and canonicalized to be fully leveraged. Threat actor profiling: Tracking threat actors’ moves as we build out profiles, information and patterns to thwart risks.
  7. 7. 60% of companies cannot detect compromised credentials, survey says. Source: surveyed.html
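A minimal sketch of what "structuring raw data" could look like for a credential dump. The input layout (one `email<separator>password` pair per line), the separator set, and the names `Credential`, `canonicalize_line` and `canonicalize_dump` are all assumptions for illustration, not InfoArmor's actual pipeline.

```python
import re
from typing import Iterable, List, NamedTuple, Optional


class Credential(NamedTuple):
    email: str
    password: str


# Assumption: fields are separated by ':', ';', '|' or a tab.
_SEPARATORS = re.compile(r"[:;|\t]")


def canonicalize_line(raw: str) -> Optional[Credential]:
    """Turn one raw dump line into a Credential, or None if it is unusable."""
    line = raw.strip()
    if not line:
        return None
    # Split only once so passwords containing a separator stay intact.
    parts = _SEPARATORS.split(line, maxsplit=1)
    if len(parts) != 2:
        return None
    email, password = parts[0].strip().lower(), parts[1]
    if "@" not in email or not password:
        return None
    return Credential(email, password)


def canonicalize_dump(lines: Iterable[str]) -> List[Credential]:
    """Canonicalize every line, dropping junk and exact duplicates."""
    seen = set()
    result = []
    for raw in lines:
        cred = canonicalize_line(raw)
        if cred is not None and cred not in seen:
            seen.add(cred)
            result.append(cred)
    return result


sample = [
    "  Alice@Example.COM:hunter2\n",
    "alice@example.com:hunter2",   # duplicate after canonicalization
    "bob@example.com;pa:ss",       # password keeps its inner ':'
    "not a credential",
    "",
]
cleaned = canonicalize_dump(sample)
```

Lower-casing the email and de-duplicating after normalization is one design choice among several; real dumps would likely need more formats and validation than this.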
  8. 8. This product will get you 100.000 United Kingdom "HOTMAIL" Emails Leads Source: http[:]//6qlocfg6zq2kyacl.onion/viewProduct?offer=857044.38586
  9. 9. SpamBot
  10. 10. Lessons from 1 billion rows: What I learned that allowed me to sleep again
  11. 11. Bird’s eye view of data - Relational databases for the web application and storage of known structured data - Elasticsearch for unstructured data and full-text searching - Replication off-site - MariaDB remote DBAs monitor all. InfoArmor: Over 2 billion credentials; 45 million forum posts; 300 GB and growing of botnet logs. Pretty much all code is in Python.
  12. 12. Don’t Do That! - Feature worked for some inputs, but not others - Schema was suboptimal, leading to full table scans - 4-way join, hundreds of thousands of seconds - Had to kill ’em - With MariaDB assistance, planned out new schema for credentials - More intuitive - Meets business needs in API and GUI - Listen to end users! Non-tech lesson: Cultivate relationships outside of tech!
  13. 13. Multithreading Mayhem - Parallelized queries to multiple databases - In Pyramid, achieved with separate DB sessions - Sessions weren’t closed, leaving connections open - Fell outside of normal Zope/SQLAlchemy flow - Monyog alerts about maxed-out connections; restarted application to clear connections - Found issue in code, added .close(). Lesson: Configuration changes solve and don’t solve problems at the same time.
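One way to catch the full-table-scan problem early is to ask the database for its query plan before and after adding an index. The sketch below uses Python's built-in `sqlite3` as a stand-in (the deck's databases are MariaDB, where the analogous statement is `EXPLAIN`); the `credentials` table layout and the index name are hypothetical.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's query plan for `sql` as one readable string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
# Hypothetical, simplified schema for illustration only.
conn.execute(
    "CREATE TABLE credentials (id INTEGER PRIMARY KEY, email TEXT, password TEXT)"
)
conn.executemany(
    "INSERT INTO credentials (email, password) VALUES (?, ?)",
    [(f"user{i}@example.com", f"pw{i}") for i in range(1000)],
)

lookup = "SELECT password FROM credentials WHERE email = ?"

# Without an index on email, the lookup has to scan the whole table.
plan_before = query_plan(conn, lookup, ("user42@example.com",))

# With an index, the same lookup becomes an index search.
conn.execute("CREATE INDEX idx_credentials_email ON credentials (email)")
plan_after = query_plan(conn, lookup, ("user42@example.com",))

print(plan_before)
print(plan_after)
```

The exact plan wording differs between SQLite versions and between database engines, so the point is the shape of the check, not the literal output.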
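A minimal sketch of the fix described on this slide: every worker thread opens its own connection and is guaranteed to close it. To keep it self-contained it uses the standard-library `sqlite3` and `contextlib.closing` rather than the Pyramid/SQLAlchemy stack from the talk; with a SQLAlchemy `Session` the same idea would be calling `.close()` in a `finally` block. The helper names and the `items` table are hypothetical.

```python
import os
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor
from contextlib import closing


def make_db(path, values):
    """Create a small throwaway database (demo setup only)."""
    with closing(sqlite3.connect(path)) as conn:
        conn.execute("CREATE TABLE items (value INTEGER)")
        conn.executemany("INSERT INTO items (value) VALUES (?)", [(v,) for v in values])
        conn.commit()


def count_rows(path):
    """Query one database from a worker thread.

    closing() ensures the connection is released even if the query
    raises, so connections cannot pile up across requests.
    """
    with closing(sqlite3.connect(path)) as conn:
        return conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]


def count_all(paths):
    """Run the same query against several databases in parallel."""
    with ThreadPoolExecutor(max_workers=len(paths)) as pool:
        return list(pool.map(count_rows, paths))


with tempfile.TemporaryDirectory() as tmp:
    paths = [os.path.join(tmp, "a.db"), os.path.join(tmp, "b.db")]
    make_db(paths[0], [1, 2, 3])
    make_db(paths[1], [4, 5])
    counts = count_all(paths)
```

Tying the close to a context manager instead of a manual call means the cleanup cannot be forgotten when a new code path is added later.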
  14. 14. Obviously….
  15. 15. Don’t Bring All Groceries in at Once - Sometimes a ton of rows need to be updated - Even if something doesn’t get committed... log entries and rollbacks get created - Gums up replication - Wastes time - max_allowed_packet. Lesson: Data should be updated in small bites. Programmatic!
  16. 16. Same for import parsing scripts, where multithreading amplifies binlog size - Don’t get greedy; nothing is worth screwing up replication or your application. Non-tech lesson: Add 20 to 200 percent to time estimates for imports. Process and organization will set you free.
  17. 17. IDS - Intrusion Detection System. Or rather “Inline Data Shredder” - We scrape malicious-looking JavaScript, PHP, Python and Perl scripts - These will normally get bounced on the way in from the scraper - Replication kept mysteriously stopping - Engineering team getting “WTF?” alerts from all angles. Found the chunk of code in the database. Replication now over SSL. Lesson: Coincidence... or degree of separation?
  18. 18. Final thoughts... - Data is business, business is data. - Let remote DBAs do the nuts and bolts - Focus on your application and the goal of the data - Make data available to salespeople, but toolify it - Keep evolving
  19. 19. The end. Thanks for listening.
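The "small bites" lesson can be sketched as a loop that updates a bounded number of rows and commits after each batch, so no single transaction grows large. This is an illustration under assumptions: it uses `sqlite3` as a stand-in for MariaDB, the `records` table and `processed` column are hypothetical, and the SQL is SQLite-flavored (on MariaDB the statement would need adapting, for example by walking primary-key ranges).

```python
import sqlite3


def update_in_batches(conn, batch_size=1000):
    """Flag unprocessed rows in small batches, committing after each one.

    Returns the total number of rows updated.
    """
    total = 0
    while True:
        cur = conn.execute(
            "UPDATE records SET processed = 1 "
            "WHERE id IN (SELECT id FROM records WHERE processed = 0 LIMIT ?)",
            (batch_size,),
        )
        conn.commit()  # keep each transaction (and its log entries) small
        if cur.rowcount == 0:
            break
        total += cur.rowcount
    return total


conn = sqlite3.connect(":memory:")
# Hypothetical table for the demo.
conn.execute(
    "CREATE TABLE records (id INTEGER PRIMARY KEY, processed INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany("INSERT INTO records (processed) VALUES (?)", [(0,)] * 2500)
conn.commit()

updated = update_in_batches(conn, batch_size=1000)
remaining = conn.execute(
    "SELECT COUNT(*) FROM records WHERE processed = 0"
).fetchone()[0]
```

The batch size is a tuning knob: smaller batches are gentler on replication but take more round trips.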