InfoArmor, Threat Intelligence &
Data Ingestion
Christian Lees & Steve Olson
What we will be covering today.
1. HOW DID WE GET HERE? A brief history of InfoArmor, and the greatness that got us to where we are today.
2. WHERE ARE WE GOING? A look at the vision and where we see InfoArmor going in the future.
3. HOW DO WE GET THERE? What will it take for us to achieve our vision, and what is our process to get there?
Threat Actors / Dark Web
"The world's most valuable resource is no longer oil, but data" - The Economist
Source: https://www.economist.com/news/leaders/21721656-data-economy-demands-new-approach-antitrust-rules-worlds-most-valuable-resource
Hacked, inside job, poor security, accidental publish, device lost/stolen
The unseen threats.
Dark web monitoring through InfoArmor Advanced Threat Intelligence:
- Forum scraping: programmatic forum scraping with bots, while human operatives gain access to closed forums.
- Human operatives: combat hackers who are using technology and innovating every day.
- Structuring raw data: compromised data files must be formatted, organized, and canonicalized to be fully leveraged (see the sketch below).
- Threat actor profiling: tracking threat actors' moves as we build out profiles, information, and patterns to thwart risks.
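As a rough illustration of the "structuring raw data" step, here is a minimal sketch of canonicalizing one line of a raw email:password combo list into a structured record. The field names and split logic are illustrative assumptions, not InfoArmor's actual parser.

from dataclasses import dataclass
from typing import Optional

@dataclass
class CredentialRecord:
    email: str
    domain: str
    password: str
    source: str

def canonicalize_line(line: str, source: str) -> Optional[CredentialRecord]:
    """Turn one raw dump line into a structured, queryable record."""
    line = line.strip()
    email, sep, password = line.partition(":")
    email = email.strip().lower()              # normalize case so duplicates collapse
    if not sep or "@" not in email or not password:
        return None                            # line doesn't match the expected format
    domain = email.rsplit("@", 1)[1]           # keep domain separately for fast lookups
    return CredentialRecord(email=email, domain=domain, password=password, source=source)

# e.g. canonicalize_line("User@Example.com:hunter2", source="forum-dump-2018")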
60% of companies cannot detect compromised credentials, survey says
Source: https://www.csoonline.com/article/3022066/security/60-of-companies-cannot-detect-compromised-credentials-say-security-pros-surveyed.html
Example dark web listing: "This product will get you 100.000 United Kingdom 'HOTMAIL' Emails Leads"
Source: http[:]//6qlocfg6zq2kyacl.onion/viewProduct?offer=857044.38586
SpamBot
Lessons from 1 billion rows
What I learned that allowed me to sleep again
Bird's eye view of data
- Relational DBs (MariaDB) for the web application and storage of known structured data
- Elasticsearch for unstructured data and full-text searching
- Replication off-site
- MariaDB remote DBAs monitor all InfoArmor databases
Over 2 billion credentials
45 million forum posts
300 GB (and growing) of botnet logs
Pretty much all code is in Python (a minimal storage-routing sketch follows below).
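A minimal sketch of that storage split: structured rows go to MariaDB, free text goes to Elasticsearch for full-text search. The connection strings, table name, and index name are illustrative assumptions; the deck only says SQLAlchemy and Elasticsearch are in the stack.

import requests
from sqlalchemy import create_engine, text

engine = create_engine("mysql+pymysql://app:secret@mariadb-primary/ati")  # hypothetical DSN
ES_URL = "http://elasticsearch:9200"                                      # hypothetical host

def store_credential(email: str, domain: str, password: str) -> None:
    """Known, structured data: insert a row into MariaDB."""
    with engine.begin() as conn:
        conn.execute(
            text("INSERT INTO credentials (email, domain, password) "
                 "VALUES (:email, :domain, :password)"),
            {"email": email, "domain": domain, "password": password},
        )

def store_forum_post(post: dict) -> None:
    """Unstructured text: index the document in Elasticsearch for full-text search."""
    requests.post(f"{ES_URL}/forum_posts/_doc", json=post, timeout=10).raise_for_status()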
Don't Do That!
- Feature worked for some inputs, but not others
- Schema was suboptimal, leading to full table scans
- 4-way joins running for hundreds of thousands of seconds
- Had to kill 'em (see the sketch below)
- With MariaDB assistance, planned out a new schema for credentials
  - More intuitive
  - Meets business needs in API and GUI
- Listen to end users!
Non-tech lesson: Cultivate relationships outside of tech!
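A minimal sketch of how runaway queries like those can be found and killed from Python. It assumes an account with the PROCESS privilege and permission to KILL; the 600-second threshold and connection details are illustrative.

import pymysql

LONG_QUERY_SECONDS = 600   # illustrative threshold

conn = pymysql.connect(host="mariadb-primary", user="dba", password="secret")
with conn.cursor() as cur:
    # Find statements that have been running far too long.
    cur.execute(
        "SELECT ID, TIME, INFO FROM information_schema.PROCESSLIST "
        "WHERE COMMAND = 'Query' AND TIME > %s",
        (LONG_QUERY_SECONDS,),
    )
    for thread_id, seconds, sql_text in cur.fetchall():
        print(f"Killing thread {thread_id} after {seconds}s: {(sql_text or '')[:80]}")
        cur.execute(f"KILL QUERY {int(thread_id)}")   # stop the statement, keep the connection
conn.close()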
Multithreading Mayhem
- Parallelized queries to multiple databases
- In Pyramid, achieved with separate DB sessions
- Sessions weren't closed, leaving connections open
  - Fell outside of the normal Zope/SQLAlchemy flow
- Monyog alerts about maxed-out connections; restarted the application to clear connections
- Found the issue in code, added .close() (see the sketch below)
Lesson: Configuration changes solve and don't solve problems at the same time
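A minimal sketch of the fix: each worker thread opens its own SQLAlchemy session and always closes it, even when the query raises. Engine URLs, table names, and the pool size are illustrative; the real code lives inside Pyramid views.

from concurrent.futures import ThreadPoolExecutor
from sqlalchemy import create_engine, text
from sqlalchemy.orm import sessionmaker

# One session factory per backend database (URLs are illustrative).
Sessions = {
    "credentials": sessionmaker(bind=create_engine("mysql+pymysql://app:secret@db1/ati")),
    "forums": sessionmaker(bind=create_engine("mysql+pymysql://app:secret@db2/ati")),
}

QUERIES = {
    "credentials": text("SELECT COUNT(*) FROM credentials WHERE domain = :d"),
    "forums": text("SELECT COUNT(*) FROM posts WHERE domain = :d"),
}

def count_for_domain(db_name: str, domain: str) -> int:
    session = Sessions[db_name]()      # separate session, outside the managed Pyramid/Zope flow
    try:
        return session.execute(QUERIES[db_name], {"d": domain}).scalar()
    finally:
        session.close()                # the missing .close() that had leaked connections

with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(count_for_domain, name, "example.com") for name in Sessions]
    results = dict(zip(Sessions, (f.result() for f in futures)))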
Obviously….
Don't Bring All Groceries in at Once
- Sometimes a ton of rows need to be updated
- Even if something doesn't get committed... log entries and rollbacks still get created
  - Gums up replication
  - Wastes time
- max_allowed_packet
Lesson: Data should be updated in small, programmatic bites (see the sketch below)
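A minimal sketch of "small bites": walk a large table by primary-key ranges and commit each chunk, so every transaction, binlog entry, and potential rollback stays small. The table, column, and batch size are illustrative.

import pymysql

BATCH_SIZE = 5000   # illustrative chunk size

conn = pymysql.connect(host="mariadb-primary", user="app", password="secret", db="ati")
with conn.cursor() as cur:
    cur.execute("SELECT MIN(id), MAX(id) FROM credentials")
    min_id, max_id = cur.fetchone()
    if min_id is not None:
        for start in range(min_id, max_id + 1, BATCH_SIZE):
            cur.execute(
                "UPDATE credentials SET domain = LOWER(domain) "
                "WHERE id BETWEEN %s AND %s",
                (start, start + BATCH_SIZE - 1),
            )
            conn.commit()   # commit per chunk: small transactions, small binlog events
conn.close()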
Same for import parsing scripts
- Where multithreading amplifies binlog size
- Don't get greedy; nothing is worth screwing up replication or your application (see the sketch below)
Non-tech lesson: Add 20 to 200 percent to time estimates for imports.
Process and organization will set you free.
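A minimal sketch of an import parser in that spirit: parse the dump lazily and insert in bounded chunks from a single writer, so the binlog grows in small steps. The file format, chunk size, and table layout are illustrative.

import itertools
import pymysql

CHUNK_SIZE = 1000   # illustrative chunk size

def parsed_rows(path):
    """Yield (email, password) tuples from a raw email:password dump file."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        for line in fh:
            email, sep, password = line.strip().partition(":")
            if sep and "@" in email:
                yield email.lower(), password

def import_dump(path):
    conn = pymysql.connect(host="mariadb-primary", user="app", password="secret", db="ati")
    with conn.cursor() as cur:
        rows = parsed_rows(path)
        while True:
            chunk = list(itertools.islice(rows, CHUNK_SIZE))
            if not chunk:
                break
            cur.executemany(
                "INSERT IGNORE INTO credentials (email, password) VALUES (%s, %s)",
                chunk,
            )
            conn.commit()   # one small binlog event per chunk, replication keeps up
    conn.close()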
IDS - Intrusion Detection System
Or rather, "Inline Data Shredder"
- Scrapers pick up malicious-looking JavaScript, PHP, Python, and Perl scripts
- These normally get bounced by the IDS on the way in from the scraper
- Replication kept mysteriously stopping
- Engineering team getting "WTF?" alerts from all angles
Found the chunk of code in the database. Replication now runs over SSL (see the sketch below).
Lesson: Coincidence...or degree of separation?
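A minimal sketch of moving a replica onto SSL replication so inline inspection gear no longer sees (or interrupts) the raw stream. Host names, the account, and certificate paths are illustrative, and the same statements can be run directly from the mariadb client.

import pymysql

replica = pymysql.connect(host="replica-offsite", user="repl_admin", password="secret")
with replica.cursor() as cur:
    cur.execute("STOP SLAVE")
    cur.execute(
        "CHANGE MASTER TO "
        "MASTER_HOST='mariadb-primary', "
        "MASTER_SSL=1, "                                      # encrypt the replication stream
        "MASTER_SSL_CA='/etc/mysql/ssl/ca.pem', "
        "MASTER_SSL_CERT='/etc/mysql/ssl/replica-cert.pem', "
        "MASTER_SSL_KEY='/etc/mysql/ssl/replica-key.pem'"
    )
    cur.execute("START SLAVE")
replica.close()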
Final thoughts...
- Data is business, business is data.
- Let remote DBAs handle the nuts and bolts
- Focus on your application and the goal of the data
- Make data available to sales people, but toolify it
- Keep evolving
Fin
Gracias por escuchar (thank you for listening)

M|18 How InfoArmor Harvests Data from the Underground Economy


Editor's Notes

  • #2 Good Morning greeting
  • #3 1. How did we get here: about InfoArmor, founded in 2007; the EPS story; the ATI story. 2. Where are we going: more established credit alerts; more securing alerts such as high-risk transactions or fraud relations; more underground economy; more actionable alerts, near real time. 3. How do we get there: ingestion of large data sets; correlation of large data sets; near real time, high availability.
  • #14 Follow on from Christian's points. About 700 million rows when I took over: 700 million accumulated over 4 years or so, then tripled in less than 2 years. New breaches and repacks of breaches. The ingest process was disrupting normal use, the querying process fell apart, and disk consumption was high due to duplicate data. Clobbered with behind-the-scenes processes and hidden mines from sales people.
  • #15 Forum posts, pastes, and analyst dump files; files include medical records, clinical-trial PDFs, emails, XLS, PDF. Some material is too hot to put into production as queryable data. Botnet logs, organized and unorganized, in different formats. Today: over 2 billion rows of credentials; several single-column indices and covering grouped indices on some columns; RAID 5 NVMe SSDs (#yolo); 40 million+ forum posts with full text via ES. The application is aware of where to read and write. Offsite replication, monitored by remote DBAs. Improved workflow of analyst communication.
  • #16 Long queries running from certain search boxes in the portal or API (LIKE combined with a 4-way join). "The previous guy told me not to search for bigger domains..." Ben Stillman came out as part of the initial consulting engagement and evaluated the schema for the credentials database. Full table scans are the devil. Duplicate data was stored across 4 tables, so almost all business uses of the data required costly joins. Determine the minimum useful unit of data for the business: what constitutes the most useful result set, how to quickly and reliably retrieve it, and how to keep it updated with new data without the new data making old data useless. Determine how closely related tables are: is there a 1-to-1 ratio of rows? Do they describe unique units of data? Find the line between what collection of attributes constitutes a useful record and the cost of updating those records if denormalized too hard. Is there anything you tell an end user not to do? Is there water-cooler talk about something that is slow? SHOW PROCESSLIST; solve the issue. "Don't search for gmail.com", "don't query for yahoo": these cause long queries due to joins using low-cardinality indices, or indices that are too huge, making MySQL just scan the entire tables for the results. All problems can be solved; treat it like a Zelda dungeon or Metroid. Ask for help, research, MariaDB remote DBA...
  • #17 Initial thought was to speed up loading of the dashboard by firing off multi-threaded queries. Random alerts about the application being down despite nearly everything being quiet. Monyog alerts about max connections, so had remote DBAs increase max_connections; that mitigated the issue, but it still happened. Sometimes bugs make it to production; stay calm. Symptoms were immediate 500 errors.
  • #19 Story: the scraper went haywire, not storing the last post properly, causing a flood of data. Could see the disk-usage graph rise and fall; it amplified other export processes. Updated the format of posts and had to update old ones with new data; initially did it in one #yolo query. Huge transaction -> huge log -> huger redo log -> huge... Let the remote DBA be your canary: we have a Slack channel and I'll get pinged if something is about to go off the rails. Programmatically solve problems in your preferred language; don't use the mysql command line to update large chunks of data, or shell scripts that don't go into version control. RemoteDBA will ask WTF if you are doing "yolo" update-everything queries.
  • #21 Consider all aspects of the network. Story: replication to the other datacenter kept stopping and the remote DBA was flummoxed. We were getting IDS alerts, with engineering and security lost as to what a PHP injection was doing with the database replication server. Correlated the ID of the row that contained the code in the body of the pastebin paste. Resolved with an SSL connection.
  • #23 Contact info?