SQL Ginsu (SANS @Night, SANSFIRE 2011)

Let's face it - data sources are growing larger and more diverse every year, far outpacing the ability of us puny humans to grok the data alone, let alone its relevance to an investigation. One of the most important skills for an investigator or incident responder is the ability to quickly answer diverse questions about the data at hand. Whether reviewing network captures, filesystem artifacts, online services, or anything else, knowing how to slice data gives the investigator a clear edge. Most importantly, we can quickly eliminate the overwhelming volume of data that has no bearing on the investigation, leaving behind just the valuable tidbits that matter. This talk discusses how normalizing diverse data sources into a database can wrangle them into a highly efficient tool that, used properly, provides fast and decisive insight into the investigation at hand.

Speaker notes
  • MOAR DATA / My HDDs / MOAR DATA SOURCES - supertimelining = consistent picture from diverse data sources
  • Niche data sources are too new or small to get attention from the "big boys"; scripting is possible, but even that can be limiting (and proprietary) - FLEXIBLE!
  • Concept of foundational skill sets - construction: measuring, structural integrity; cooking: knife handling, food safety; sales: psychology, personality; computer science: data structures, algorithms
  • SQL is an advanced art, but its raw power is still accessible to anyone
  • Data management - don't re-invent if not needed ("Can we fix it?")
  • Build for reduction - save space (and time!); know the tradeoffs
  • Command Line Kung Fu - use what you know best!
  • Pitfalls with either choice; best of both: go big until too big
  • Elegant/creative is not always fast - DOCUMENT!
  • Only 800 records in the VM
  • Used a schema with 8-10M sessions/day
  • 150k records in the VM
  • Observe to characterize traffic for reduction; "always reduce" is bad practice (tunneling)
  • VM: simple list of SSH sessions; could do Mbps
  • Two-step query (<200ms for both; ~10s for the sub-SELECT version)
  • Notional idea, untested - ALTER ("Can we fix it?"); track a running "last seen" date/time stamp
Transcript

    1. SQL Ginsu: Better Living (And Data Reduction) Through Databases. Normalize and reduce lots of data; provide quick and decisive insight. Jul 21, 2011, SANSFIRE.
    2-5. Phil Hagen: 8 yrs contract InfoSec/forensic work with DoD, IC, LE, commercial; 5 yrs USAF InfoSec/Comm; BS CompSci, USAFA. Contacts: gplus.to/philhagen, @PhilHagen, stuffphilwrites.com
    6-11. Why "SQL Ginsu"? ↓ Time spent manually reviewing data. ↓ Complexity of data for clearer presentation. ↑ YOUR value to the investigation! Cuts a lead pipe and still cuts wafer-thin tomatoes. Infomercials are funny. (Photos: lifeandtimesincleveland.blogspot.com/2011/06/michael.html)
    12-13. Background: Data continues to grow exponentially. 1995: 1 GB @ $600; 2011: 1 TB @ $90 (6,826x ↓ unit cost!). Data sources are increasingly diverse: 3+ browsers, many databases, countless mobile apps, tools, sites and services, cross-platform inconsistencies. Core questions remain consistent: who/what/where/why/when/how.
    14. Background: The value of the "all in one" forensic tool is not in question, but it can no longer be the ONLY tool used in an examination. Staying flexible is key! Cyclical nature of the collect/analyze/report processes: new leads generated, from the exam or elsewhere; change in legal strategy; new personalities with different perspectives.
    15. Foundational Skill Sets
    16. Foundational Skill Sets: In computer forensics and incident response, I contend two foundational skills are data management and data reduction. Many ways to accomplish this, but we'll focus on using SQL. SQL is very scalable, relatively universal, extremely powerful, and repeatable. Excel, text manipulation (sed, awk, cut, grep), etc. are also perfectly good tools for this.
    17. SQL: It's Not that Hard
        CREATE TABLE `tblname` (`col1` integer, `col2` varchar(10));
        INSERT INTO `tblname` (`col1`, `col2`) VALUES (1, 'val2');
        SELECT `col2` FROM `tblname` WHERE `col1` = 1;
    18. OK, That's Not Quite Fair: SQL can be extremely powerful, but not prohibitively complicated. JOINs combine data from different tables into one result set. Sub-SELECTs: nested queries can simplify logic and decrease the need to master JOINs. UNIONs: craft similarly-structured queries against different data sets and present them as one result set. INDEXes can speed queries and JOINs when created on commonly-used columns. As with anything worthwhile, it's an ongoing educational process.
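    To make those constructs concrete, here is a minimal sketch against the `logins` table defined in Example 1 below (these queries are illustrative additions, not from the slides):
        -- Sub-SELECT: sessions from any source IP that also logged in as root
        SELECT userid, INET_NTOA(src_ip), logintime
        FROM logins
        WHERE src_ip IN (SELECT src_ip FROM logins WHERE userid = 'root');

        -- UNION: two similarly-structured queries presented as one result set
        SELECT userid, logintime  AS event_time FROM logins
        UNION
        SELECT userid, logouttime AS event_time FROM logins
        ORDER BY event_time;

        -- INDEX: speed queries that filter or join on a commonly-used column
        CREATE INDEX idx_logintime ON logins (logintime);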
    19. SQL Super Functions: Do analysis within SQL statements: SUM(), AVG(), MAX(), MIN(), COUNT(), STD(). Date/time addition/subtraction/slicing: pull date or clock time from DATETIME values. Math, number formatting. Use the most efficient data types: INET_ATON(), INET_NTOA(). Start writing the report with SQL: GROUP_CONCAT().
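    As an illustrative example (again assuming the Example 1 `logins` schema, not taken from the slides), several of these functions can work together in one statement:
        -- Per-user session statistics computed entirely inside the SELECT
        SELECT userid,
               COUNT(*)                                          AS sessions,
               MIN(TIME(logintime))                              AS earliest_login,
               MAX(TIME(logintime))                              AS latest_login,
               AVG(TIMESTAMPDIFF(SECOND, logintime, logouttime)) AS avg_session_secs,
               GROUP_CONCAT(DISTINCT INET_NTOA(src_ip) SEPARATOR ', ') AS source_ips
        FROM logins
        GROUP BY userid;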
    20. Step 1: Schema Design. A schema consists of column names and type definitions. Your data might already be databased somewhere! (log2timeline, SGUIL, Splunk) Adapt to developing requirements with ALTER TABLE (or fix what you messed up when you started). Doing this efficiently takes practice: ranges of integers, "Normal Forms".
    21-22. Step 1: Schema Tricks/Tips. Use an 'interesting' column with 'y', 'n', '-' values for later data reduction. Use 32-bit unsigned integers for IP addresses; might want to use dotted quads too, if you expect on-the-fly subnet-style queries (add/remove as needed). Use indexes where sensible: too many indexes decrease performance and increase storage, so focus on commonly-queried columns. Integer IP storage turns a subnet match into a simple range scan, as sketched below.
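    A quick sketch of why the integer representation pays off (the 10.1.2.0/24 subnet here is hypothetical): a subnet-style query becomes a range scan that can use the `src_ip` index.
        SELECT INET_NTOA(src_ip), COUNT(*) AS hits
        FROM logins
        WHERE src_ip BETWEEN INET_ATON('10.1.2.0') AND INET_ATON('10.1.2.255')
        GROUP BY src_ip;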
    23. Step 2: Data Load. Lots of data is available - how do we normalize it for a database? By any means possible! Shell commands/scripts (CLKF!): awk, sed, cut, grep. Scripting languages: Python, Perl, PHP. Office apps: Excel, OOCalc. Separate individual data items to craft SQL INSERTs. (http://blog.commandlinekungfu.com)
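    When the data is already delimited, MySQL's own bulk loader is another "any means possible" route; a minimal sketch, assuming a hypothetical comma-separated file laid out to match the Example 1 `logins` table:
        -- Bulk-load a pre-normalized CSV; @src_ip converts dotted quad to integer
        LOAD DATA LOCAL INFILE '/tmp/logins.csv'
        INTO TABLE logins
        FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
        LINES TERMINATED BY '\n'
        (userid, systemname, @src_ip, logintime, logouttime)
        SET src_ip = INET_ATON(@src_ip);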
    24. Step 3: Data Reduction. "Load less data" or "classify after load"? Load less: might not be able to go back and get it, but needs less disk space and simplifies queries. Load and classify: might need lots of disk space, and queries can be slower and more complicated. Me? Load everything until resources are an issue.
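    One way to act on "load everything until resources are an issue" (an illustrative pattern using the Example 2 `traffic` table, not from the slides): keep the full table, and when it gets unwieldy, spin off a reduced working copy for day-to-day queries.
        -- Snapshot only the rows still marked interesting (or unclassified)
        CREATE TABLE traffic_reduced AS
          SELECT * FROM traffic WHERE interesting != 'n';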
    25. Step 4: Analyze! (Profit?)
    26. Step 4: Analysis Tricks/Tips. Sub-SELECTs and UNIONs can drastically affect speed - sometimes a script or a two-stage query saves hours! Save SQL statements with the report for re-generation in the future, especially with new data. Partially-tested, unverified, embarrassing but functional scripts at stuffphilwrites.com/2011/07/sql-ginsu. Seriously, don't laugh. Please.
    27. Example 1: Login Records
    28. Example 1, Step 1: Schema
        CREATE TABLE `logins` (
          `id` int(11) unsigned NOT NULL auto_increment,
          `userid` varchar(20) NOT NULL,
          `systemname` varchar(20) NOT NULL default '',
          `src_ip` int unsigned NOT NULL,             -- Integer!
          `logintime` datetime NOT NULL,
          `logouttime` datetime NOT NULL,
          `interesting` enum('y','n','-') NOT NULL default '-',
          PRIMARY KEY (`id`),
          KEY `src_ip` (`src_ip`),                    -- Indexes
          KEY `userid` (`userid`)
        );
    29. Example 1, Step 2: Load. Output of the Linux "last -i" command:
        phil pts/0 1.2.3.4 Thu May 5 16:39 - 22:02 (10+05:22)
    "phil" = username, "pts/0" = terminal, "1.2.3.4" = source IP, "Thu May 5 16:39" = login date and time, "22:02" = logout (time only!), "10+05:22" = duration.
        INSERT INTO `logins` (`userid`, `src_ip`, `logintime`, `logouttime`)
        VALUES ('phil', INET_ATON('1.2.3.4'), '2010-05-05 16:39:00',
                ADDTIME('2010-05-05 16:39:00', '10 05:22:00'));
    Python script for Linux at stuffphilwrites.com/2011/07/sql-ginsu
    30. Example 1, Step 3: Reduce. Eliminate known-good IPs, system accounts, date ranges, etc. What pitfalls could this induce?
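    A minimal sketch of what that reduction might look like (the subnet and account names are hypothetical). Marking rows 'n' rather than deleting them keeps the decision reversible if an attacker turns out to be operating from a "known-good" address:
        -- Flag sessions from a known-good admin subnet as uninteresting
        UPDATE logins SET interesting = 'n'
        WHERE src_ip BETWEEN INET_ATON('192.168.1.0') AND INET_ATON('192.168.1.255');

        -- Flag routine system accounts
        UPDATE logins SET interesting = 'n'
        WHERE userid IN ('backup', 'monitor');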
    31. Example 1, Step 4: Analyze! Session info for the most frequently used source IPs:
        SELECT systemname, INET_NTOA(src_ip), COUNT(*) AS count
        FROM logins WHERE interesting != 'n'
        GROUP BY systemname, src_ip
        ORDER BY count DESC, src_ip;
    Sessions with login duration > 1 day:
        SELECT *, INET_NTOA(src_ip),
               TIMEDIFF(logouttime, logintime) AS duration
        FROM logins WHERE interesting != 'n'
        HAVING duration > '24:00:00'
        ORDER BY duration DESC;
    32. Example 1, Step 4: Analyze! Daily login window per user:
        SELECT userid, systemname, COUNT(*),
               MIN(TIME(logintime)), MAX(TIME(logintime))
        FROM logins WHERE interesting != 'n'
        GROUP BY userid, systemname;
    IPs per username per host:
        SELECT userid, systemname, COUNT(*),
               GROUP_CONCAT(DISTINCT INET_NTOA(src_ip) SEPARATOR ', ')
        FROM logins WHERE interesting != 'n'
        GROUP BY userid, systemname
        ORDER BY systemname, userid;
    33. Example 2: Network Flows
    34. Example 2, Step 1: Schema
        CREATE TABLE `traffic` (
          `interesting` enum('y','n','-') default '-',
          `sancp_id` bigint unsigned,
          `start_time_gmt` datetime,
          `stop_time_gmt` datetime,
          `eth_proto` smallint unsigned,              -- Smaller integers
          `ip_proto` tinyint unsigned,
          `src_ip` int unsigned,
          `src_port` smallint unsigned,
          `dst_ip` int unsigned,
          `dst_port` smallint unsigned,
          `duration` int unsigned,
          `src_pkts` bigint unsigned,
          `dst_pkts` bigint unsigned,
          `src_bytes` bigint unsigned,
          `dst_bytes` bigint unsigned
        );
    35. Example 2, Step 2: Load. Distill pcap (or live) network data to sessions with SANCP, which creates a pipe-separated list of 50 fields per session. Use a script to extract the relevant fields:
        INSERT INTO `traffic` (`sancp_id`, `start_time_gmt`, `stop_time_gmt`,
          `eth_proto`, `ip_proto`, `src_ip`, `src_port`, `dst_ip`, `dst_port`,
          `duration`, `src_pkts`, `dst_pkts`, `src_bytes`, `dst_bytes`)
        VALUES (5613951026752893809, '2011-06-03 11:17:11',
          '2011-06-03 11:17:25', 8, 6, INET_ATON('0.2.246.190'), 3306,
          INET_ATON('0.3.162.24'), 14, 82, 43, 15866, 0);
    Python script for pcaps at stuffphilwrites.com/2011/07/sql-ginsu (http://metre.net/sancp.html)
    36. Example 2, Step 3: Reduce. Brute-force SSH logins? Do one and observe the network traffic: <5 sec && <45 packets && <2500 bytes.
        UPDATE traffic SET interesting='n'
        WHERE (src_port=22 OR dst_port=22) AND duration < 5
          AND src_pkts+dst_pkts < 45 AND src_bytes+dst_bytes < 2500;
    DNS and NTP can sometimes be ruled out:
        UPDATE traffic SET interesting='n'
        WHERE eth_proto=8 AND ip_proto=17 AND (dst_port=53 OR dst_port=123);
    37. Example 2, Step 4: Analyze! SSH/SCP/SFTP sources:
        SELECT INET_NTOA(src_ip), duration, INET_NTOA(dst_ip),
               ((src_bytes+dst_bytes)/1024/1024) AS MB
        FROM traffic
        WHERE (src_port=22 OR dst_port=22) AND interesting != 'n';
    38. Example 2, Step 4: Analyze! Inbound FTP: multiple sessions used, so use a two-stage query:
        SELECT GROUP_CONCAT(src_ip SEPARATOR ', ')
        FROM traffic WHERE (dst_port=21 AND dst_bytes>0);

        SELECT INET_NTOA(src_ip), duration,
               (src_bytes/1024/1024) AS src_MB,
               (dst_bytes/1024/1024) AS dst_MB,
               start_time_gmt, stop_time_gmt
        FROM traffic
        WHERE (src_port>1024 AND dst_port>1024)
          AND src_ip IN (<list from prev query>)
          AND interesting != 'n';
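    For comparison, the same question as a single sub-SELECT query (an untested reconstruction; per the speaker notes, the two-stage form ran in under 200 ms while the sub-SELECT version took roughly 10 s):
        SELECT INET_NTOA(src_ip), duration,
               (src_bytes/1024/1024) AS src_MB,
               (dst_bytes/1024/1024) AS dst_MB,
               start_time_gmt, stop_time_gmt
        FROM traffic
        WHERE (src_port>1024 AND dst_port>1024)
          AND interesting != 'n'
          AND src_ip IN (SELECT src_ip FROM traffic
                         WHERE dst_port=21 AND dst_bytes>0);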
    39. Example 2, Step 4: Analyze! Bot beaconing. Add a column:
        ALTER TABLE `traffic` ADD COLUMN `elapsed` TIME DEFAULT NULL AFTER `dst_bytes`;
    During data load, track a "time last seen" per "srcIP:dstIP:dstport" tuple. If you've seen the IP+IP+port combination before, insert the time delta:
        elapsed = TIMEDIFF(start_time_gmt, <time_last_seen>);
    Then set "time last seen" in the loader script to the new start_time_gmt.
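    Once `elapsed` is populated, a notional follow-on query (untested, in the spirit of the speaker notes) could surface beacon-like behavior: many connections to the same destination and port at a steady interval, i.e. a low spread around the mean gap.
        SELECT INET_NTOA(src_ip), INET_NTOA(dst_ip), dst_port,
               COUNT(*)                  AS hits,
               AVG(TIME_TO_SEC(elapsed)) AS avg_gap_secs,
               STD(TIME_TO_SEC(elapsed)) AS gap_stddev_secs
        FROM traffic
        WHERE elapsed IS NOT NULL AND interesting != 'n'
        GROUP BY src_ip, dst_ip, dst_port
        HAVING hits > 10 AND gap_stddev_secs < 5
        ORDER BY hits DESC;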
    40. Wrap-up: Mo' data, mo' problems. Forensicators need the foundational skills of data management and data reduction. Use SQL to do this: 1: Schema design; 2: Data load; 3: Reduce; 4: Analyze!
    41. Questions?
