sqlmap - Under the Hood

Information security, penetration testing, software development, @sqlmap
May. 23, 2013

  1. sqlmap – Under the Hood, Miroslav Štampar
  2. PHDays 2013, Moscow (Russia) May 23, 2013: BigArray
     - Support for huge table dumps (e.g. millions of rows)
     - Raw data needs to be held somewhere before being processed (and eventually stored)
     - In-memory storage was a good enough choice until recent years (user appetites grew)
     - Avoidance of MemoryError
     - Memory mapping into smaller chunks/pages (e.g. 4096 entries)
     - Temporary files are used for storing chunks
     - O(1) read/write access (page table principle)
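
The page table principle above can be sketched roughly as follows. This is a minimal illustration, not sqlmap's actual BigArray: the class name PagedList, the append-and-read-only interface and the use of pickle for chunk files are all assumptions made for the example.

```python
import os
import pickle
import tempfile


class PagedList:
    """Append-only list that spills full chunks to temporary files (hypothetical sketch)."""

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size
        self.chunk_files = []        # page table: chunk index -> temporary file path
        self.current = []            # chunk currently being filled (kept in memory)
        self.cache = (None, None)    # (chunk index, chunk contents) of the last loaded chunk

    def append(self, value):
        self.current.append(value)
        if len(self.current) >= self.chunk_size:
            # Chunk is full: write it to a temporary file and start a new one
            fd, path = tempfile.mkstemp(suffix=".chunk")
            with os.fdopen(fd, "wb") as handle:
                pickle.dump(self.current, handle)
            self.chunk_files.append(path)
            self.current = []

    def __len__(self):
        return len(self.chunk_files) * self.chunk_size + len(self.current)

    def __getitem__(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        # O(1) lookup: the chunk number and offset follow directly from the index
        chunk_index, offset = divmod(index, self.chunk_size)
        if chunk_index == len(self.chunk_files):
            return self.current[offset]
        if self.cache[0] != chunk_index:
            with open(self.chunk_files[chunk_index], "rb") as handle:
                self.cache = (chunk_index, pickle.load(handle))
        return self.cache[1][offset]

    def close(self):
        # Remove the temporary chunk files
        for path in self.chunk_files:
            os.remove(path)
        self.chunk_files = []
```

Only one spilled chunk is held in memory at a time, so memory use stays bounded by roughly two chunks regardless of how many rows are appended.
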

  3. HashDB
     - Storage of resumable session data in a centralized place (local SQLite3 database)
     - Non-ASCII values are automatically serialized/deserialized (pickle)
     - INSERT INTO storage VALUES (LONG(MD5(target_url || key || MILESTONE_SALT)[:8]), stored_value)
     - MILESTONE_SALT is changed whenever a change in the HashDB mechanism breaks compatibility with previous versions
     - key uniquely describes stored_value for a given target_url (e.g. KB_INJECTIONS, SELECT banner FROM v$version WHERE ROWNUM=1, etc.)
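
The storage scheme above could look roughly like this in Python. It is a sketch under stated assumptions: the class name SessionStore, the salt value, the table layout and the choice to pickle every value (the slide only serializes non-ASCII ones) are made up for illustration, and the digest is masked to 63 bits so that it fits SQLite's signed INTEGER type.

```python
import hashlib
import pickle
import sqlite3

MILESTONE_SALT = "example-salt"  # assumption: any constant, changed on incompatible format changes


def hash_key(target_url, key):
    """Derive an integer row id from the target URL, the key and the salt."""
    digest = hashlib.md5((target_url + key + MILESTONE_SALT).encode("utf-8")).digest()
    # First 8 bytes of the MD5 digest, masked to stay within a signed 64-bit integer
    return int.from_bytes(digest[:8], "big") & 0x7FFFFFFFFFFFFFFF


class SessionStore:
    def __init__(self, path=":memory:"):
        self.connection = sqlite3.connect(path)
        self.connection.execute(
            "CREATE TABLE IF NOT EXISTS storage (id INTEGER PRIMARY KEY, value BLOB)")

    def write(self, target_url, key, value):
        self.connection.execute(
            "INSERT OR REPLACE INTO storage VALUES (?, ?)",
            (hash_key(target_url, key), pickle.dumps(value)))
        self.connection.commit()

    def read(self, target_url, key):
        row = self.connection.execute(
            "SELECT value FROM storage WHERE id = ?",
            (hash_key(target_url, key),)).fetchone()
        return pickle.loads(row[0]) if row else None
```

Because the salt is part of the hashed key, changing it makes every previously stored row unreachable, which is one simple way to invalidate sessions written by an incompatible version.
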
  4. Payloads
     - XML format (xml/payloads.xml)
     - Tag <boundary> stores all possible prefix and suffix formations (<prefix>, <suffix>) together with context-sensitive information (subtags <level>, <clause>, <where> and <ptype>)
     - Tag <test> stores the data required for successful testing and usage of each SQL injection payload type (subtags <title>, <stype>, <level>, <risk>, <clause>, <where>, <vector>, <request> and <response>)
  5. Payloads (2)
     <boundary>
         <level>1</level>
         <clause>1</clause>
         <where>1,2</where>
         <ptype>1</ptype>
         <prefix>)</prefix>
         <suffix>AND ([RANDNUM]=[RANDNUM]</suffix>
     </boundary>
  6. Payloads (3)
     <test>
         <title>Microsoft SQL Server/Sybase AND error-based - WHERE or HAVING clause (IN)</title>
         <stype>2</stype>
         <level>2</level>
         <risk>0</risk>
         <clause>1</clause>
         <where>1</where>
         <vector>AND [RANDNUM] IN (('[DELIMITER_START]'+([QUERY])+'[DELIMITER_STOP]'))</vector>
         <request>
             <payload>AND [RANDNUM] IN (('[DELIMITER_START]'+(SELECT (CASE WHEN ([RANDNUM]=[RANDNUM]) THEN '1' ELSE '0' END))+'[DELIMITER_STOP]'))</payload>
         </request>
         <response>
             <grep>[DELIMITER_START](?P&lt;result&gt;.*?)[DELIMITER_STOP]</grep>
         </response>
         <details>
             <dbms>Microsoft SQL Server</dbms>
             <dbms>Sybase</dbms>
             <os>Windows</os>
         </details>
     </test>
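
To illustrate how a boundary and a test payload might be combined, here is a small sketch. The helper names (render, build_payload, extract_result) and the delimiter strings are hypothetical, and the way the pieces are joined with spaces is an assumption rather than sqlmap's exact logic.

```python
import re


def render(template, randnum, delimiter_start="qabcq", delimiter_stop="qxyzq"):
    """Replace the placeholder markers seen in the XML with concrete values."""
    return (template.replace("[RANDNUM]", str(randnum))
                    .replace("[DELIMITER_START]", delimiter_start)
                    .replace("[DELIMITER_STOP]", delimiter_stop))


def build_payload(value, prefix, payload, suffix, randnum):
    # original parameter value + boundary prefix + test payload + boundary suffix
    return render("%s%s %s %s" % (value, prefix, payload, suffix), randnum)


def extract_result(page, delimiter_start="qabcq", delimiter_stop="qxyzq"):
    """Counterpart of the <grep> expression: pull out whatever sits between the delimiters."""
    pattern = re.escape(delimiter_start) + r"(?P<result>.*?)" + re.escape(delimiter_stop)
    match = re.search(pattern, page, re.DOTALL)
    return match.group("result") if match else None
```

With the boundary shown earlier (prefix `)` and suffix `AND ([RANDNUM]=[RANDNUM]`), the injected value closes the original parenthesis and reopens one at the end, so the surrounding SQL statement stays syntactically valid.
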
  7. Queries
     - XML format (xml/queries.xml)
     - Tag <dbms> stores all DBMS-specific SQL formations required for successful enumeration (subtags <users>, <passwords>, <dbs>, <tables>, <columns>, <dump_table>, etc.) and for (pre)processing of resulting data (subtags <cast>, <length>, <isnull>, <count>, <substring>, <concatenate>, etc.)
     - Each enumeration subtag has an <inband> and a <blind> form used in the respective techniques
  8. Queries (2)
     <dbms value="MySQL">
         <cast query="CAST(%s AS CHAR)"/>
         <length query="CHAR_LENGTH(%s)"/>
         <isnull query="IFNULL(%s,' ')"/>
         <delimiter query=","/>
         <limit query="LIMIT %d,%d"/>
         …
         <passwords>
             <inband query="SELECT user,password FROM mysql.user" condition="user"/>
             <blind query="SELECT DISTINCT(password) FROM mysql.user WHERE user='%s' LIMIT %d,1" count="SELECT COUNT(DISTINCT(password)) FROM mysql.user WHERE user='%s'"/>
         </passwords>
         …
  9. Multithreading
     - Multithreading implemented wherever applicable (option --threads)
     - Techniques covered: boolean-based blind, error-based and partial UNION query
     - Deliberately turned off for time-based and stacked techniques (lots of reasons)
     - In boolean-based blind, each thread covers a part of the value
     - In other techniques, each thread covers one enumerated entry
     - Also implemented for brute force column/table name search and crawling
  10. Direct connection
      - Direct connection to DBMS (option -d)
      - python -d “mysql://root:password123@ 06/testdb”
      - Support for: Microsoft SQL Server, MySQL, Oracle, PostgreSQL, SQLite, Microsoft Access, Firebird, SAP MaxDB, Sybase, IBM DB2
      - Uses 3rd party connectors (e.g. python-pymssql, pymysql, cx_Oracle, python-psycopg2, etc.)
      - SQLAlchemy used as an alternative
  11. Load request(s) from file
      - Load HTTP request(s) from a text file (option -r)
      - Supports RAW request format (any MITM proxy can be used to capture one)
      - Particularly useful for requests with a large content body (e.g. POST)
      - Load and parse log files (option -l)
      - Supports Burp and WebScarab log formats
      - Unlimited number of parsed HTTP requests (using only unique ones)
  12. Content type detection
      - Automatic detection of (specialized) request content types
      - Supports SOAP, JSON and (generic) XML
      - For example:
        --data="{ "pid": 4412, "id": 1, "action": "do"}"
        --data="<request><pid>4412</pid><id>1</id><action>do</action></request>"
      - Appropriate exploitation of parameter values
      - In case of non-supported format(s), a custom injection mark (*) can be used
  13. Site crawling/form searching
      - Collect usable (on site) target links (option --crawl)
      - User defines crawling depth (e.g. 3), limiting the search by distance from the starting page
      - Optional form searching on visited pages (switch --forms)
      - Arbitrary filling of missing form data
      - Repair of non-compliant HTML pages for easier processing
  14. Mnemonics
      - Usage of mnemonics for faster setting up of sqlmap options and switches (option -z)
      - Longer (original): python --flush-session --threads=4 --ignore-proxy --batch --banner -u …
      - Shorter (using mnemonics): python -z “flu,thre=4,ign,bat,ban” -u …
      - Highly generic prefix-based recognition (e.g. -z “flu,bat,ban” is interpreted the same as -z “flush,batc,bann”)
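
Prefix-based recognition of this kind can be sketched in a few lines. The option list below is a small assumed subset, the function name expand_mnemonics is hypothetical, and rejecting ambiguous prefixes is a design choice of this sketch that may differ from how sqlmap resolves them.

```python
OPTIONS = ["flush-session", "threads", "ignore-proxy", "batch", "banner"]  # assumed subset


def expand_mnemonics(mnemonics, options=OPTIONS):
    """Expand a comma-separated mnemonic string into full command line options."""
    expanded = []
    for item in mnemonics.split(","):
        name, _, value = item.strip().partition("=")
        # An option is recognized when exactly one known name starts with the mnemonic
        matches = [option for option in options if option.startswith(name)]
        if len(matches) != 1:
            raise ValueError("ambiguous or unknown mnemonic: %r" % name)
        if value:
            expanded.append("--%s=%s" % (matches[0], value))
        else:
            expanded.append("--%s" % matches[0])
    return expanded
```

For instance, `ba` would be rejected here because it is a prefix of both `batch` and `banner`, while `bat` and `ban` each resolve to a single option.
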
  15. Keep-alive
      - HTTP persistent connection (switch --keep-alive)
      - As opposed to a new connection for every single request/response pair
      - Slightly adapted 3rd party module keepalive, adjusted for multithreading
      - Connection pool: reuse of existing target connection(s) where applicable
      - Reduced network congestion (fewer TCP connections), reduced latency (no handshaking), faster enumeration, etc.
  16. Tor
      - Support for The Onion Router (Tor) online anonymity network (switch --tor)
      - Concealing identity and network activity
      - Used against surveillance and (targeted) traffic sniffing
      - Configurable Tor proxy type (option --tor-type) and port number (option --tor-port)
      - DNS leakage is prevented (no DNS requests outside of Tor)
      - Available safety check for proper usage of Tor (switch --check-tor)
  17. Domain name resolution caching
      - By default, a DNS resolution request is made for each HTTP request (by Python's HTTP modules, e.g. httplib)
      - Noticeable slowdown in some cases (e.g. excessive network latency)
      - Problem noticed and reported by (nagging) users (looking into Wireshark traffic captures)
      - Problem patched at the lowest level (method socket.getaddrinfo(*args, **kwargs) is wrapped for caching)
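
Wrapping socket.getaddrinfo for caching can be sketched as below. The helper names cached and install_dns_cache are hypothetical, and the unbounded dictionary cache with no expiry is a simplification assumed for the example.

```python
import socket


def cached(resolver):
    """Return a wrapper that remembers the resolver's answer for each distinct call."""
    cache = {}

    def wrapper(*args, **kwargs):
        key = (args, tuple(sorted(kwargs.items())))
        if key not in cache:
            cache[key] = resolver(*args, **kwargs)
        return cache[key]

    return wrapper


def install_dns_cache():
    # Every later lookup made through the socket module goes through the cache
    socket.getaddrinfo = cached(socket.getaddrinfo)
```

Patching at this level means the HTTP modules built on top of the socket module benefit without any change to their own code.
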
  18. Authentication methods
      - Implemented support for authentication methods: basic, digest, NTLM and certificate (options --auth-type, --auth-cred and --auth-cert)
      - python -u “” --auth-type=basic --auth-cred=”testuser:testpass”
      - Handling of HTTP status code 401 (Unauthorized)
      - Authorization headers are cached (where applicable)
  19. Reflection detection and removal
      - Noisy response resulting from request reflection
      - Query results for: 1%20AND%201%3D1
      - Can cause problems in the detection phase
      - Particularly problematic for the boolean-based blind technique (fuzzy page comparison)
      - Automatic detection of the reflected payload value and marking with a predefined constant value
      - Query results for: __REFLECTED_VALUE__
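
A minimal sketch of reflection removal follows. It uses plain string replacement of the payload in its URL-encoded and decoded forms, which is an assumption of this example; the function name remove_reflection is hypothetical.

```python
from urllib.parse import unquote

REFLECTED_VALUE_MARKER = "__REFLECTED_VALUE__"


def remove_reflection(page, payload):
    """Replace occurrences of the sent payload in the response with a constant marker."""
    # Check both the payload as sent and its URL-decoded form
    for candidate in (payload, unquote(payload)):
        if candidate:
            page = page.replace(candidate, REFLECTED_VALUE_MARKER)
    return page
```

After this step, two responses that differ only in the echoed payload compare as identical, which keeps the reflection from skewing page comparison.
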
  20. Dynamicity detection and removal
      - Noisy response resulting from sporadically changing content (e.g. ads, banners, etc.)
      - Can cause problems in both the detection and the enumeration phase
      - Particularly problematic for the boolean-based blind technique
      - Automatic detection and marking of dynamic parts (info held in the internal knowledge base)
      - In the best case, automatic recognition and usage of a string value appearing only in True responses (option --string)
  21. Content filtering
      - Pages are occasionally bloated with non-textual content (CSS styles, comments, JavaScript, HTML tags, embedded objects, etc.)
      - Changes relevant to the boolean-based blind technique usually affect only one small textual part (e.g. a table entry)
      - Optional filtering of non-textual content (switch --text-only)
      - For example: <html>...<td>Tooth fairy</td>...</html> is filtered to ...Tooth fairy...
      - Better detection and less trash(y) results
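
The filtering step could be approximated with a few regular expressions, as in the sketch below. The function name text_only and the exact patterns are assumptions; a regex-based approach is only a rough stand-in for proper HTML parsing.

```python
import re


def text_only(page):
    """Strip scripts, styles, comments and tags, keeping only textual content."""
    page = re.sub(r"(?is)<(script|style)\b.*?</\1>", "", page)  # script/style blocks with their bodies
    page = re.sub(r"(?s)<!--.*?-->", "", page)                  # HTML comments
    return re.sub(r"(?s)<[^>]+>", "", page)                     # remaining tags
```
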
  22. Wizard mode
      - For beginner users and script kiddies (switch --wizard)
      - Questions asked: target URL, POST data (if any), injection difficulty (Normal/Medium/Hard), enumeration (Basic/Intermediate/All)
      - Infamous for the Comodo Brazil breach (March 2011): attackers posted wizard mode console output to Pastebin
  23. Level/risk of detection
      - The number of requests per parameter in the testing phase can grow from 10 up to 10K
      - To prevent unnecessary noise and to speed up testing, tests are classified by level and risk
      - Level (option --level) represents the likelihood that the test case passes and is usable (higher level means lower likelihood)
      - Risk (option --risk) represents the potential damage that the test case can cause (higher risk means higher potential damage)
  24. Heuristic SQL injection checks
      - Recognition of the backend DBMS if an error message can be provoked with an arbitrary invalid SQL sequence (e.g. ())'”(''”')
      - If the parameter value is an integer and the response for (e.g.) 1 is the same as for (2-1), there is a good chance that the target is vulnerable
      - If the boolean-based blind technique is detected, DBMS-specific queries are used (e.g. (SELECT 0x616263)=0x616263) to potentially move focus to a particular DBMS in further tests
  25. Type casting detection
      - Type casting is an efficient way of dealing with SQL injection on numeric values
      - $query = "SELECT * FROM log WHERE id=" . intval($_GET['id']);
      - Implemented automatic detection of such cases
      - If the parameter value is an integer and the response for (e.g.) 1 is the same as for 1foobar, there is a good chance that the target is using integer casting
      - User is warned of a potentially “futile” run
  26. Fingerprinting
      - Web server is fingerprinted by known HTTP headers, cookie values, etc.
      - DBMS is fingerprinted through error message parsing, banner parsing and tests with version-specific payloads (obtained from release notes and reference manuals)
      - For example, cookie value ASP.NET_SessionId is specific to the ASP.NET/IIS/Windows platform, while the check TO_SECONDS(950501)>0 should work only on MySQL >= 5.5.0
      - Detailed DBMS version check is done only if switch -f/--fingerprint is used
  27. Suhosin-patch detection
      - Open source patch for PHP, protecting the web server from “insecure PHP practices”
      - suhosin.get.max_value_length (default: 512), etc.
      - Causes problems in the enumeration phase when payloads are big (e.g. enumerating column names)
      - After the detection phase, a single payload (depending on detected techniques) with a size greater than 512 is sent (e.g. 1 AND 6525 = … 6525)
      - User is warned in case of a False response
  28. WAF/IDS/IPS detection
      - Sending one “suspicious” request (in the form of a dummy parameter value) and checking for response change(s) compared to the original (switch --check-waf)
      - WAF scripts (switch --identify-waf) do thorough checking, each focusing on peculiarities of a particular product
      - For example, WebKnight responds with HTTP status code 999 on detected suspicious activity
      - Currently there are 29 WAF scripts
  29. WAF/IDS/IPS bypass
      - Tamper scripts (option --tamper) modify the injected payload before it is sent
      - User has to choose appropriate one(s) based on collected knowledge of the target's behavior and/or the detected WAF/IDS/IPS product
      - If required, a chain of tamper scripts can be used (e.g. --tamper=”between,ifnull2ifisnull”)
      - Currently there are 36 tamper scripts
  30. String value escaping
      - Each string value inside the payload is automatically escaped (quoteless format) depending on the targeted DBMS
      - For example: 1 ... AND username=”root”-- is in case of MySQL escaped to 1 ... AND username=0x726f6f74--
      - Avoidance of filter-based escaping functions (e.g. addslashes)
      - Adds implicit dependence on the targeted DBMS
      - Payload obfuscation (harder to notice in target log files)
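
For MySQL, the quoteless form can be produced by hex-encoding each string literal. The sketch below assumes single-quoted literals without embedded quotes; the function name escape_mysql_strings is hypothetical.

```python
import binascii
import re


def escape_mysql_strings(payload):
    """Rewrite each 'quoted' string literal as a MySQL hexadecimal literal (0x...)."""
    def to_hex(match):
        encoded = binascii.hexlify(match.group(1).encode("utf-8")).decode("ascii")
        return "0x" + encoded

    return re.sub(r"'([^']+)'", to_hex, payload)
```

Since the resulting payload contains no quote characters, functions such as addslashes have nothing to escape.
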
  31. Evaluation of custom code
      - Custom Python code can be evaluated before each request (option --eval)
      - In such code, each request parameter is accessible as a local variable
      - All resulting variable values are included in the request as new parameter values
      - --eval="import hashlib;hash=hashlib.md5(id).hexdigest()"
      - AND 1=1&hash=7f134e52836a00e26493e690ed8aa735
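
The mechanism can be illustrated with exec() over a dictionary of request parameters. This is a sketch, not sqlmap's code: the function name evaluate is hypothetical, and because it targets Python 3 the example snippet encodes the value before hashing, whereas the slide's hashlib.md5(id) form is Python 2 style.

```python
import types


def evaluate(code, parameters):
    """Run user-supplied code with request parameters as variables, return the updated parameters."""
    namespace = dict(parameters)
    exec(code, namespace)  # executes arbitrary code; only suitable for code the user wrote
    # Keep plain variables; drop __builtins__ and any imported modules
    return {name: value for name, value in namespace.items()
            if not name.startswith("__") and not isinstance(value, types.ModuleType)}
```

For example, `evaluate("import hashlib;hash=hashlib.md5(id.encode()).hexdigest()", {"id": "1"})` returns the original id together with a new hash parameter derived from it.
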
  32. Fuzzy page comparison
      - Used (mostly) in the boolean-based blind technique
      - Gestalt pattern matching (Ratcliff-Obershelp algorithm)
      - Supported by the standard Python module difflib
      - Class SequenceMatcher
      - Method ratio() (or the faster quick_ratio()) gives a measure of the sequences’ similarity as a float in range [0, 1]
      - True result if ratio() > 0.98 when compared with the original page
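
The comparison described above maps directly onto difflib. In this sketch the function name looks_true and the sample pages are made up; the 0.98 threshold is the one quoted on the slide.

```python
from difflib import SequenceMatcher

UPPER_RATIO_BOUND = 0.98  # threshold quoted above


def looks_true(original_page, page, threshold=UPPER_RATIO_BOUND):
    """Treat a response as True when it is nearly identical to the original page."""
    return SequenceMatcher(None, original_page, page).ratio() > threshold
```
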
  33. Definite page comparison
      - Used mostly in the boolean-based blind technique
      - For cases when fuzzy page comparison fails (e.g. too much page dynamicity) and the user is able to distinguish True from False responses on their own (non-novice)
      - String to match when the result should be recognized as True (option --string)
      - Regular expression to match … (option --regex)
      - Compare HTTP codes (switch --code)
      - Compare HTML titles (switch --title)
  34. Null connection
      - Sometimes there is no need to retrieve the whole page content (its size can be enough)
      - Boolean-based blind technique
      - 3 methods: Range, HEAD and “skip-read”
      - Request header Range: bytes=-1 is answered with Content-Range: bytes 4789-4790/4790
      - Request HEAD /search.aspx HTTP/1.1 is answered with Content-Length: 4790
      - Both result (if applicable) in either an empty or a 1 char long response
      - Method “skip-read” retrieves only HTTP headers, looking for Content-Length
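
Reading the page size from the response headers alone could look like the sketch below. The function name page_length and the plain dictionary of headers are assumptions for illustration; Content-Range is checked first because, in a Range response, Content-Length describes only the returned part.

```python
def page_length(headers):
    """Return the full page size advertised by response headers, or None if unavailable."""
    content_range = headers.get("Content-Range")
    if content_range and "/" in content_range:
        # e.g. "bytes 4789-4790/4790": the total size follows the slash
        total = content_range.rsplit("/", 1)[1].strip()
        if total.isdigit():
            return int(total)
    content_length = headers.get("Content-Length")
    if content_length and content_length.strip().isdigit():
        return int(content_length)
    return None
```
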
  35. False positive detection
      - False positives are highly undesirable
      - Specific to boolean-based blind and time-based blind techniques
      - False positive tests are done in cases when only one of those techniques is detected
      - A set of trivial mathematical checks is performed to see if the target can “respond” correctly
      - For example: (123+447)=570, 319>(519+110), (654+267)>854
  36. Delay detection
      - Detection of “artificial” delay
      - Statistical comparison with normal response times
      - Response time must fit under the Gaussian bell curve to be marked as “normal”
      - Is <current_response_time> > avg(<normal_response_times>) + 7*stdev(<normal_response_times>)?
      - If the answer is yes, the probability that we are dealing with an “artificial” delay is 99.9999999997440%
      - Especially useful when heavy queries are used (not knowing the expected delay value)
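
The threshold test above translates almost literally into code. This sketch uses the sample standard deviation from Python's statistics module, which is an assumption (the slide does not say which estimator is used); the function name is_delayed is hypothetical.

```python
from statistics import mean, stdev


def is_delayed(response_time, normal_times, coefficient=7):
    """True when the response time lies far outside the distribution of normal response times."""
    return response_time > mean(normal_times) + coefficient * stdev(normal_times)
```

With normal response times around 0.2 seconds, a 5 second response is flagged as delayed while a 0.23 second one is not, without the tool having to know the injected delay value in advance.
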
  37. Delay detection (2)
  38. UNION query column #
      - UNION query requires knowledge of the number of columns (N) in the vulnerable SQL statement
      - Two methods used: ORDER BY and statistical (same principle as in delay detection)
      - ORDER BY N+1 should produce a noticeably different response (preferably with an error message) than ORDER BY N (binary searched)
      - In the statistical method, responses for candidates (UNION SELECT NULL, NULL,...) are compared to the original (not injected) response
      - The right one is the one that seems “not normal” (having a ratio outside the Gaussian bell curve)
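
The ORDER BY method amounts to a binary search over a yes/no oracle. In the sketch below, order_by_works is a hypothetical callback that sends ORDER BY n and reports whether the response still looks normal; the upper bound of 50 columns is an arbitrary assumption.

```python
def find_column_count(order_by_works, upper=50):
    """Binary search for the largest n for which ORDER BY n still succeeds."""
    if not order_by_works(1) or order_by_works(upper):
        return None  # technique not usable, or more columns than the assumed upper bound
    low, high = 1, upper  # invariant: ORDER BY low works, ORDER BY high fails
    while high - low > 1:
        middle = (low + high) // 2
        if order_by_works(middle):
            low = middle
        else:
            high = middle
    return low
```

For a statement with 7 columns and the default bound, this needs 7 requests in total (two bound checks plus five search steps) instead of testing each candidate in turn.
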
  39. Output prediction
      - Inference techniques (boolean-based blind and time-based blind) require optimization wherever and whenever possible
      - In certain cases prediction(s) can be made
      - Checking if the currently retrieved entry shares the same prefix with previously retrieved entries
      - For example, DROP ANY ROLE has the same prefix as DROP ANY RULE (one request per checked character compared to bit-by-bit retrieval)
      - Using common output values too (e.g. information_schema, phpmyadmin, etc.)
  40. Brute forcing identifier names
      - In case of a missing schema (e.g. deleted information_schema), brute force search is required (e.g. 1=(SELECT 1 FROM users))
      - Searching for common table names (switch --common-tables)
      - Searching for common column names (switch --common-columns)
      - Conducted automated search and parsing of resulting SQL files for chosen Google dorks (e.g. ext:sql “CREATE TABLE”)
      - Collected the most frequent 3.3K table names and 2.5K column names
  41. Pivot dump table
      - Some DBMSes (e.g. Microsoft SQL Server) don't have an OFFSET/LIMIT query mechanism, making enumeration problematic in non-UNION query techniques
      - Column with the most DISTINCT values is automatically chosen as the pivot column
      - Pivot's first value bigger than the previous one (e.g. SELECT MIN(id) WHERE id > ' ') is retrieved
      - Entries for other columns (e.g. SELECT name WHERE id=1) are retrieved using the current pivot value
      - Iterative process
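
The iterative pivot process can be demonstrated against an in-memory SQLite database. SQLite is only a stand-in here (it does support LIMIT), and the function name pivot_dump, the users table and its contents are made up for the example. The sketch also assumes the pivot column holds unique values; rows sharing a pivot value would collapse into one.

```python
import sqlite3


def pivot_dump(connection, table, pivot, columns):
    """Dump a table row by row using only MIN() and comparisons on a pivot column."""
    rows = []
    previous = None
    while True:
        if previous is None:
            query, params = "SELECT MIN(%s) FROM %s" % (pivot, table), ()
        else:
            # Next pivot value: the smallest one bigger than the previous
            query = "SELECT MIN(%s) FROM %s WHERE %s > ?" % (pivot, table, pivot)
            params = (previous,)
        current = connection.execute(query, params).fetchone()[0]
        if current is None:
            break  # no bigger pivot value left
        row = {pivot: current}
        for column in columns:
            # Other columns are fetched using the current pivot value
            lookup = "SELECT %s FROM %s WHERE %s = ?" % (column, table, pivot)
            row[column] = connection.execute(lookup, (current,)).fetchone()[0]
        rows.append(row)
        previous = current
    return rows


connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE users (id INTEGER, name TEXT)")
connection.executemany("INSERT INTO users VALUES (?, ?)", [(3, "carol"), (1, "alice"), (2, "bob")])
dumped = pivot_dump(connection, "users", "id", ["name"])
```
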
  42. International letters
      - Добрый день Россия (Good day, Russia)
      - Page encoding is parsed from the Content-Type HTTP header, the Content-Type meta HTML header or heuristically detected (3rd party module chardet)
      - RAW target response is automatically decoded to Unicode (using the detected page encoding)
      - In case of inband techniques (UNION query and error-based), results with international letters are already supported if decoding went properly
  43. International letters (2)
      - In case of inference techniques (boolean-based blind and time-based blind), characters are inferred already in their Unicode form
      - Potential problems occur when stored data and/or the database connector use a different (incompatible) charset than the target's response
      - In case of unsuccessful decoding of international letters (e.g. gibberish output), the charset can be enforced (option --charset)
  44. Hex encoding retrieved data
      - All supported DBMSes are capable of encoding resulting data to hexadecimal format (switch --hex)
      - Most useful in cases when (parts of) results are potentially lost (e.g. binary data in inband techniques)
      - Retrieved data is automatically decoded to its original (non-hexadecimal) format
      - Such binary content is checked for known formats (using 3rd party module magic) and (if recognized) stored to output files
  45. Dump format
      - Dumped table content can be stored in 3 different formats: CSV (default), HTML and SQLite (option --dump-format)
      - In CSV format, each row is represented by one line and column entries are separated by a predefined separator character (e.g. ,)
      - In HTML format, the dump is stored as a table viewable in a browser
      - In SQLite format, the dump is “replicated” to a locally stored SQLite3 database, giving the possibility of (among others) running queries against it
  46. Password cracking
      - Implemented support for detection and wordlist-based cracking of 14 different commonly used hash algorithms
      - MySQL (newer and older), MsSQL (newer and older), Oracle (newer and older), PostgreSQL, MD5, SHA1, etc.
      - Automatic analysis of retrieved passwords (--passwords) and table dumps (--dump)
      - (Optional) common suffix forms (1, 123, etc.)
      - Multiprocessed attack (# of CPUs)
      - 1M MySQL hash guesses in under 10 seconds on a 4 core Intel Xeon W3550 @ 3.07GHz
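
As an illustration of wordlist-based cracking with suffix forms, the sketch below targets the newer MySQL hash format (an asterisk followed by the uppercase hex SHA1 of the binary SHA1 of the password). It is single-process for brevity, unlike the multiprocessed attack described above, and the function names and suffix list are assumptions.

```python
import hashlib


def mysql_hash(password):
    """Newer-style MySQL password hash: '*' + SHA1(SHA1(password)) in uppercase hex."""
    inner = hashlib.sha1(password.encode("utf-8")).digest()
    return "*" + hashlib.sha1(inner).hexdigest().upper()


def crack(target_hash, wordlist, suffixes=("", "1", "123")):
    """Try every word with every suffix; return the matching candidate or None."""
    target_hash = target_hash.upper()
    for word in wordlist:
        for suffix in suffixes:
            candidate = word + suffix
            if mysql_hash(candidate) == target_hash:
                return candidate
    return None
```
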
  47. Large dictionary support
      - Distributed access in a multiprocessing environment
      - Support for huge dictionaries (chunk read)
      - Support for dictionary lists
      - Support for ZIP compressed dictionaries
      - Included custom built and compressed dictionary (1.2M entries) based on highly popular and publicly available dumps, like RockYou, Gawker, Yahoo, etc.
  48. Stagers and backdoors
      - Stagers are used for uploading arbitrary (binary) files (e.g. UDF files, backdoors, etc.)
      - Backdoors are used for OS command execution (switches --os-cmd and --os-shell)
      - Prerequisite is that one of the known SQL file write methods can be used (e.g. INTO DUMPFILE, EXEC xp_cmdshell 'debug.exe < dump.src', etc.)
      - 4 different platforms supported: ASP, ASP.NET, JSP and PHP
      - Stored in “cloaked” format (preventing local AV triggering) inside the shell directory
  49. Metasploit integration
      - Automated creation, upload and run of a Metasploit shellcode payload (switch --os-pwn)
      - User can choose payload (Meterpreter, shell or VNC), connection (reverse TCP, reverse HTTP, etc.) and encoder type (no encoder, Call+4 Dword XOR Encoder, etc.)
      - shellcodeexec(.exe) is uploaded along with the (non-compiled) Metasploit shellcode payload using a stager or other means
      - Metasploit CLI is run on the host machine
      - Payload is executed on the target machine, connecting back to the host machine
  50. Second order SQL injection
      - Occurs when user-provided data stored in one place is used in a vulnerable SQL statement in another place
      - Similar to permanent XSS
      - User can explicitly set the location where to look for the response (option --second-order)
      - Effectively doubles the number of required requests
  51. DNS exfiltration
      - Out-of-band SQL injection technique using the DNS resolution mechanism (option --dns-domain)
      - A fake DNS server instance is automatically started on the host machine
      - SQL injection payloads deliberately provoke the DNS resolution mechanism on the target machine
      - Provoked DNS requests carry the results of a query
      - The fake DNS server instance intercepts requests and responds with dummy resolution answers
      - Requires registration of a nameserver for the used domain pointing to the host machine
  52. Output purging
      - Output directory can be (optionally) “safely” removed (switch --purge-output)
      - Content of all contained files (sessions, logs, dumps, etc.) is overwritten with random data
      - Files are truncated and renamed to random values
      - (Sub)directories are renamed to random values
      - At the end, the whole output directory tree is removed
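
The purge sequence above (overwrite, truncate, rename, remove) can be sketched with the standard library. The function names purge and random_name are hypothetical, symbolic links are not handled, and overwriting in place is no guarantee of unrecoverable deletion on every storage medium or file system.

```python
import os
import random
import shutil
import string
import tempfile


def random_name(length=8):
    return "".join(random.choice(string.ascii_lowercase) for _ in range(length))


def purge(directory):
    """Overwrite, truncate and rename everything below directory, then remove the tree."""
    # Bottom-up walk, so subdirectories are renamed only after their content was processed
    for root, dirs, files in os.walk(directory, topdown=False):
        for name in files:
            path = os.path.join(root, name)
            size = os.path.getsize(path)
            with open(path, "r+b") as handle:
                handle.write(os.urandom(size))   # overwrite content with random data
                handle.flush()
                os.fsync(handle.fileno())
            with open(path, "wb"):
                pass                             # truncate to zero length
            os.rename(path, os.path.join(root, random_name()))
        for name in dirs:
            os.rename(os.path.join(root, name), os.path.join(root, random_name()))
    shutil.rmtree(directory)
```
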
  53. Questions?