sqlmap – Under the Hood
Miroslav Štampar
(dev@sqlmap.org)
PHDays 2013, Moscow (Russia) May 23, 2013 2
BigArray
Support for huge table dumps (e.g. millions of
rows)
Raw data needs to be held somewhere before
being processed (and eventually stored)
In-memory was a good enough choice until
recent years (user appetites went bigger)
Avoidance of MemoryError
Memory mapping into smaller chunks/pages
(e.g. 4096 entries)
Temporary files are used for storing chunks
O(1) read/write access (page table principle)
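The paging idea above can be sketched as follows. This is a minimal, append-and-read-only illustration and not sqlmap's actual BigArray; the class name ChunkedList and its attributes are hypothetical, and pickle is assumed as the on-disk chunk format.

```python
import os
import pickle
import tempfile


class ChunkedList:
    """List-like container that keeps at most one full chunk in memory (hypothetical name)."""

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size
        self.files = []          # "page table": chunk index -> temporary file path
        self.tail = []           # chunk currently being filled
        self.cache_index = None  # index of the chunk currently loaded in self.cache
        self.cache = None

    def append(self, value):
        self.tail.append(value)
        if len(self.tail) >= self.chunk_size:
            # Chunk is full: spill it to a temporary file and start a new one
            fd, path = tempfile.mkstemp(suffix=".chunk")
            with os.fdopen(fd, "wb") as f:
                pickle.dump(self.tail, f)
            self.files.append(path)
            self.tail = []

    def __len__(self):
        return len(self.files) * self.chunk_size + len(self.tail)

    def __getitem__(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        # Constant-time address translation: chunk number and offset inside it
        chunk, offset = divmod(index, self.chunk_size)
        if chunk == len(self.files):
            return self.tail[offset]
        if chunk != self.cache_index:
            with open(self.files[chunk], "rb") as f:
                self.cache = pickle.load(f)
            self.cache_index = chunk
        return self.cache[offset]

    def close(self):
        # Remove all temporary chunk files
        for path in self.files:
            os.remove(path)
        self.files = []
```

Sequential reads touch each temporary file only once, because the most recently loaded chunk stays cached.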
HashDB
Storage of resumable session data at
centralized place (local SQLite3 database)
Non-ASCII values are automatically
serialized/deserialized (pickle)
INSERT INTO storage VALUES (LONG(MD5(target_url || key || MILESTONE_SALT)[:8]), stored_value)
MILESTONE_SALT is changed whenever a change in
the HashDB mechanism breaks compatibility
with previous versions
key uniquely describes stored_value for a
given target_url (e.g. KB_INJECTIONS, SELECT
banner FROM v$version WHERE ROWNUM=1, etc.)
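The storage scheme above might look roughly like this in Python. It is a sketch under assumptions: the salt value, the class name SessionStore and the table layout are placeholders, "[:8]" is interpreted as the first 8 hex digits of the MD5 digest, and every value is pickled for simplicity (the slide says only non-ASCII values are serialized).

```python
import hashlib
import pickle
import sqlite3

MILESTONE_SALT = "example-salt"  # assumption: placeholder, not sqlmap's real salt


def hash_key(target_url, key):
    """Derive an integer row id from target URL, key and salt."""
    digest = hashlib.md5((target_url + key + MILESTONE_SALT).encode("utf-8")).hexdigest()
    return int(digest[:8], 16)  # assumption: first 8 hex digits of the digest


class SessionStore:
    """Hypothetical minimal session store backed by SQLite3."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS storage (id INTEGER PRIMARY KEY, value BLOB)")

    def write(self, target_url, key, value):
        self.conn.execute("INSERT OR REPLACE INTO storage VALUES (?, ?)",
                          (hash_key(target_url, key), pickle.dumps(value)))
        self.conn.commit()

    def read(self, target_url, key):
        row = self.conn.execute("SELECT value FROM storage WHERE id=?",
                                (hash_key(target_url, key),)).fetchone()
        return pickle.loads(row[0]) if row else None
```

Changing MILESTONE_SALT changes every derived id, so entries written by an incompatible version are simply never found again.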
Payloads
XML format (xml/payloads.xml)
Tag type <boundary> used for storage of all
possible prefix and suffix formations (<prefix>,
<suffix>) together with context sensitive
information (subtags <level>, <clause>,
<where> and <ptype>)
Tag type <test> used for storage of data
required for successful testing and usage of
each SQL injection payload type (subtags
<title>, <stype>, <level>, <risk>, <clause>,
<where>, <vector>, <request> and <response>)
Payloads (3)
<test>
    <title>Microsoft SQL Server/Sybase AND error-based - WHERE or HAVING clause (IN)</title>
    <stype>2</stype>
    <level>2</level>
    <risk>0</risk>
    <clause>1</clause>
    <where>1</where>
    <vector>AND [RANDNUM] IN (('[DELIMITER_START]'+([QUERY])+'[DELIMITER_STOP]'))</vector>
    <request>
        <payload>AND [RANDNUM] IN (('[DELIMITER_START]'+(SELECT (CASE WHEN ([RANDNUM]=[RANDNUM]) THEN '1' ELSE '0' END))+'[DELIMITER_STOP]'))</payload>
    </request>
    <response>
        <grep>[DELIMITER_START](?P&lt;result&gt;.*?)[DELIMITER_STOP]</grep>
    </response>
    <details>
        <dbms>Microsoft SQL Server</dbms>
        <dbms>Sybase</dbms>
        <os>Windows</os>
    </details>
</test>
Queries
XML format (xml/queries.xml)
Tag type <dbms> used for storage of all DBMS
specific SQL formations required for successful
enumeration (subtags <users>, <passwords>,
<dbs>, <tables>, <columns>, <dump_table>, etc.)
and resulting data (pre)processing (subtags
<cast>, <length>, <isnull>, <count>,
<substring>, <concatenate>, etc.)
Each enumeration subtag has an <inband> and
<blind> form used in respective techniques
Queries (2)
<dbms value="MySQL">
    <cast query="CAST(%s AS CHAR)"/>
    <length query="CHAR_LENGTH(%s)"/>
    <isnull query="IFNULL(%s,' ')"/>
    <delimiter query=","/>
    <limit query="LIMIT %d,%d"/>
    …
    <passwords>
        <inband query="SELECT user,password FROM mysql.user" condition="user"/>
        <blind query="SELECT DISTINCT(password) FROM mysql.user WHERE user='%s' LIMIT %d,1" count="SELECT COUNT(DISTINCT(password)) FROM mysql.user WHERE user='%s'"/>
    </passwords>
    …
Multithreading
Multithreading implemented wherever
applicable (option --threads)
Techniques covered: boolean-based blind,
error-based and partial UNION query
Deliberately turned off for the time-based and
stacked techniques (lots of reasons)
Each thread covers a part of value in case of
boolean-based blind
In other techniques, each thread covers one
enumerated entry
Also, implemented for brute force column/table
name search and crawling
Direct connection
Direct connection to DBMS (option -d)
python sqlmap.py -d "mysql://root:password123@192.168.21.129:3306/testdb"
Support for: Microsoft SQL Server, MySQL,
Oracle, PostgreSQL, SQLite, Microsoft Access,
Firebird, SAP MaxDB, Sybase, IBM DB2
Using 3rd party connectors (e.g. python-pymssql,
pymysql, cx_Oracle, python-psycopg2, etc.)
SQLAlchemy used as an alternative
Load request(s) from file
Load HTTP request(s) from a textual file (option
-r)
Supporting RAW request format (any MITM
proxy can be used to catch one)
Particularly usable in requests with large
content body (e.g. POST)
Load and parse log files (option -l)
Supporting Burp and WebScarab log formats
Unlimited number of parsed HTTP requests
(using only unique ones)
Content type detection
Automatic detection of (specialized) request
content types
Supporting SOAP, JSON and (generic) XML
For example:
--data='{"pid": 4412, "id": 1, "action": "do"}'
--data="<request><pid>4412</pid><id>1</id><action>do</action></request>"
Appropriate exploitation of parameter values
In case of non-supported format(s), custom
injection mark (*) can be used
Site crawling/form searching
Collect usable (on site) target links (option
--crawl)
User defines crawling depth (e.g. 3) limiting
search based on distance from starting page
Optional form searching at visited pages
(switch --forms)
Arbitrary filling of missing form data
Repair of non-compliant HTML pages for
easier processing
Mnemonics
Usage of mnemonics for faster setting up of
sqlmap options and switches (option -z)
Longer (original):
python sqlmap.py --flush-session
--threads=4 --ignore-proxy --batch --banner
-u …
Shorter (using mnemonics):
python sqlmap.py -z "flu,thre=4,ign,bat,ban" -u …
Highly generic prefix based recognition (e.g. -z
"flu,bat,ban" is interpreted the same as -z
"flush,batc,bann")
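Prefix based recognition can be illustrated with a few lines of Python. This is only a sketch: the option list is a small assumed subset, the function name is hypothetical, and rejecting ambiguous prefixes is a design choice made here, not necessarily what sqlmap does.

```python
# Assumed subset of long option names, for illustration only
OPTIONS = ["flush-session", "threads", "ignore-proxy", "batch", "banner"]


def expand_mnemonics(spec, options=OPTIONS):
    """Expand a comma-separated mnemonic string into {option: value}."""
    result = {}
    for item in spec.split(","):
        name, _, value = item.strip().partition("=")
        matches = [option for option in options if option.startswith(name)]
        if len(matches) != 1:
            raise ValueError("mnemonic %r is unknown or ambiguous: %s" % (name, matches))
        # Switches without a value are stored as True
        result[matches[0]] = value or True
    return result
```

With this list, "ba" would be rejected because it matches both batch and banner, while "bat" and "ban" resolve uniquely.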
Keep-alive
HTTP persistent connection (switch --keep-alive)
Opposed to new connection for every single
request/response pair
Slightly adapted 3rd party module keepalive,
adjusted for multi-threading
Connection pool – reusage of existing target
connection(s) where applicable
Reduced network congestion (fewer TCP
connections), reduced latency (no
handshaking), faster enumeration, etc.
Tor
Support for The Onion Router (Tor) online
anonymity network (switch --tor)
Concealing identity and network activity
Used against surveillance and (targeted) traffic
sniffing
Configurable Tor proxy type (option --tor-type)
and port number (option --tor-port)
DNS leakage is prevented (no DNS requests
outside of Tor)
Available safety check for proper usage of Tor
(switch --check-tor)
Domain name resolution caching
DNS resolution request is done by default for
each HTTP request (from Python HTTP
dedicated modules – e.g. httplib)
Noticeable slowdown in some cases (e.g.
excessive network latency)
Problem noticed and reported by (nagging)
users (looking into Wireshark traffic captures)
Problem patched at the lowest level (method
socket.getaddrinfo(*args, **kwargs) is
encapsulated for caching)
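Wrapping socket.getaddrinfo for caching can be sketched as below. The cache dictionary and wrapper name are illustrative, and the sketch never expires entries, which is acceptable only for a short-lived process.

```python
import socket

_cache = {}
_original_getaddrinfo = socket.getaddrinfo


def cached_getaddrinfo(*args, **kwargs):
    """Resolve once per distinct argument set, then answer from memory."""
    key = (args, tuple(sorted(kwargs.items())))
    if key not in _cache:
        _cache[key] = _original_getaddrinfo(*args, **kwargs)
    return _cache[key]


# Patch at the lowest level so every HTTP module built on sockets benefits
socket.getaddrinfo = cached_getaddrinfo
```

Because the patch is applied to the socket module itself, higher-level HTTP code needs no changes.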
Authentication methods
Implemented support for authentication
methods: basic, digest, NTLM and certificate
(options --auth-type, --auth-cred and --auth-cert)
python sqlmap.py -u "http://192.168.21.129/vuln.php?id=1" --auth-type=basic --auth-cred="testuser:testpass"
Handling HTTP status code 401 (Unauthorized)
Authorization headers are being cached (where
applicable)
Reflection detection and removal
Noisy response resulting from request
reflection
Query results for: 1%20AND%201%3D1
Can cause problems in detection phase
Particularly problematic for boolean-based
blind technique (fuzzy page comparison)
Automatic detection of reflected payload value
and marking with predefined constant value
Query results for: __REFLECTED_VALUE__
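A simplified version of the marking step could look like this. It assumes the payload is reflected either verbatim or URL-encoded; the function name is hypothetical and real-world reflections (partial, HTML-escaped, re-encoded) need fuzzier matching than plain string replacement.

```python
from urllib.parse import quote

REFLECTED_VALUE_MARKER = "__REFLECTED_VALUE__"


def remove_reflection(page, payload):
    """Replace reflected copies of the payload with a constant marker."""
    # Assumption: reflection is either the raw payload or its URL-encoded form
    candidates = {payload, quote(payload), quote(payload, safe="")}
    # Longest candidates first, so an encoded form is not partially replaced
    for candidate in sorted(candidates, key=len, reverse=True):
        page = page.replace(candidate, REFLECTED_VALUE_MARKER)
    return page
```

After marking, True and False responses differ only where the database answer differs, not where the payload text itself was echoed.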
Dynamicity detection and removal
Noisy response resulting from sporadically
changing content (e.g. ads, banners, etc.)
Can cause problems in both detection and
enumeration phase
Particularly problematic for boolean-based
blind technique
Automatic detection and marking of dynamic
parts (info held in internal knowledge base)
In best case, automatic recognition and usage
of string value appearing only in True
responses (option --string)
Content filtering
Occasionally pages are bulked with non-textual
content (CSS styles, comments, JavaScript,
HTML tags, embedded objects, etc.)
Changes regarding boolean-based blind
technique are usually affecting only one small
textual part (e.g. table entry)
Optional filtering of non-textual content (switch
--text-only)
For example: <html>...<td>Tooth
fairy</td>...</html> is filtered to ...Tooth
fairy...
Better detection and less trash(y) results
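The filtering step can be approximated with the standard html.parser module. This is a sketch, not the filter sqlmap uses; the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect visible text, skipping tags, comments, scripts and styles."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Comments never reach handle_data, so they are dropped implicitly
        if not self.skip_depth and data.strip():
            self.parts.append(data.strip())


def text_only(page):
    parser = TextOnly()
    parser.feed(page)
    parser.close()
    return " ".join(parser.parts)
```

Comparing only the extracted text makes a small change (such as one table entry) a much larger fraction of the compared content.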
Wizard mode
For beginner users and script kiddies (switch
--wizard)
Questions asked:
Target URL
POST data (if any)
Injection difficulty (Normal/Medium/Hard)
Enumeration (Basic/Intermediate/All)
Infamous for Comodo Brazil breach (March
2011) – attackers posted wizard mode console
output to Pastebin
Level/risk of detection
Number of requests per each parameter in
testing phase can grow from 10 up to 10K
To prevent unnecessary noise and speed up the
testing time, tests are classified by level and
risk
Level (option --level) represents (passing)
possibility/usability of the test case (higher
level means lower possibility)
Risk (option --risk) represents potential
damage that the test case can cause (higher
risk means higher potential damage)
Heuristic SQL injection checks
Recognition of the backend DBMS if an error
message can be provoked with an arbitrary invalid
SQL sequence (e.g. ())'"(''"')
In case that the parameter value is integer and
response for (e.g.) 1 is the same as for (2-1),
there is a good chance that the target is
vulnerable
In case of detected boolean-based blind
technique, DBMS specific queries are used (e.g.
(SELECT 0x616263)=0x616263) to potentially
move focus to a particular DBMS in further
tests
Type casting detection
Type casting is an efficient way for dealing with
SQL injection on numeric values
$query = "SELECT * FROM log WHERE id=" .
intval($_GET['id']);
Implemented automatic detection of such
cases
In case that the parameter value is integer and
response for (e.g.) 1 is the same as for 1foobar,
there is a good chance that the target is using
integer casting
User is warned of a potentially “futile” run
Fingerprinting
Web server is being fingerprinted by known
HTTP headers, cookie values, etc.
DBMS is being fingerprinted through error
message parsing, banner parsing and tests
with version specific payloads (obtained from
release notes and reference manuals)
For example, cookie value ASP.NET_SessionId is
specific for ASP.NET/IIS/Windows platform,
while TO_SECONDS(950501)>0 check should work
only on MySQL >= 5.5.0
Detailed DBMS version check is done only if
switch -f/--fingerprint is used
Suhosin-patch detection
Open source patch for PHP, protecting web
server from “insecure PHP practices”
suhosin.get.max_value_length (default: 512),
suhosin.post.max_value_length, etc.
Causing problems in enumeration phase when
payloads are big (e.g. enumerating column
names)
After the detection phase single payload
(depending on detected techniques) is sent
having size greater than 512 (e.g. 1 AND 6525
= … 6525)
User is warned in case of False response
WAF/IDS/IPS detection
Sending one “suspicious” request (in form of
dummy parameter value) and checking for
response change(s) when compared to original
(switch --check-waf)
WAF scripts (switch --identify-waf) do a
thorough check, each focusing on
peculiarities of a particular product
For example, WebKnight responds with HTTP
status code 999 on detected suspicious activity
Currently there are 29 WAF scripts (airlock.py,
barracuda.py, bigip.py, etc.)
WAF/IDS/IPS bypass
Tamper scripts (option --tamper) do changes on
injected payload before it's being sent
User has to choose appropriate one(s) based
on collected knowledge of target's behavior
and/or detected WAF/IDS/IPS product
If required, a chain of tamper scripts can be
used (e.g. --tamper="between,ifnull2ifisnull")
Currently there are 36 tamper scripts
(apostrophemask.py, apostrophenullencode.py,
appendnullbyte.py, etc.)
String value escaping
Each string value inside payload is
automatically escaped (quoteless format)
depending on targeted DBMS
For example: 1 ... AND username="root"-- is
in case of MySQL escaped to 1 ... AND
username=0x726f6f74--
Avoidance of filter-based escaping functions
(e.g. addslashes)
Adding implicit dependence to targeted DBMS
Payload obfuscation (harder noticeability in
target log files)
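For MySQL, the quoteless form is a hexadecimal literal. The following sketch shows the transformation on the example above; the function name and the regular expression are illustrative and ignore escaped quotes inside literals.

```python
import binascii
import re

# Assumption: literals are simple '...' or "..." runs without embedded quotes
STRING_LITERAL = re.compile(r"'([^']+)'|\"([^\"]+)\"")


def escape_for_mysql(payload):
    """Rewrite quoted string literals as MySQL 0x... hex literals."""
    def to_hex(match):
        text = match.group(1) or match.group(2)
        return "0x" + binascii.hexlify(text.encode("utf-8")).decode("ascii")

    return STRING_LITERAL.sub(to_hex, payload)
```

Other DBMSes need a different encoding (for example character-by-character concatenation), which is why escaping adds an implicit dependence on the targeted DBMS.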
Evaluation of custom code
Custom Python code can be evaluated before
each request (option --eval)
In such code, each request parameter is
accessible as a local variable
All resulting variable values are included into
the request as new parameter values
--eval="import hashlib;hash=hashlib.md5(id).hexdigest()"
www.target.com/vuln.php?id=1 AND 1=1&hash=7f134e52836a00e26493e690ed8aa735
Fuzzy page comparison
Used (mostly) in boolean-based blind
technique
Gestalt pattern matching (Ratcliff-Obershelp
algorithm)
Supported by standard Python module difflib
Class SequenceMatcher
Method ratio() (or faster quick_ratio())
giving a measure of the sequences’ similarity
as a float in range [0, 1]
True result if ratio() > 0.98 when compared
with original page
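The comparison itself is only a few lines with difflib. In this sketch the function name is hypothetical, the 0.98 threshold is taken from the slide, and autojunk is disabled as a precaution for long, repetitive pages.

```python
from difflib import SequenceMatcher

THRESHOLD = 0.98  # value quoted on the slide


def same_page(original, candidate, threshold=THRESHOLD):
    """Return True when the candidate page is similar enough to the original."""
    matcher = SequenceMatcher(None, original, candidate, autojunk=False)
    return matcher.ratio() > threshold
```

ratio() is quadratic in the worst case, which is why the faster quick_ratio() upper bound is attractive for large pages.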
Definite page comparison
Used mostly in boolean-based blind technique
When fuzzy page comparison fails (e.g. too
much page dynamicity) and an experienced user
is able to distinguish True from False
responses manually
String to match when result should be
recognized as True (option --string)
Regular expression to match … (option --regex)
Compare HTTP codes (switch --code)
Compare HTML titles (switch --title)
Null connection
Sometimes there is no need for retrieval of
whole page content (size can be enough)
Boolean-based blind technique
3 methods: Range, HEAD and “skip-read”
Range: bytes=-1
Content-Range: bytes 4789-4789/4790
HEAD /search.aspx HTTP/1.1
Content-Length: 4790
Both result (where applicable) in either an
empty or a 1 character long response
Method “skip-read” retrieves only HTTP
headers looking for Content-Length
False positive detection
False positives are highly undesirable
Specific for boolean-based blind and
time-based blind techniques
False positive tests are done in cases when
only one of those techniques is detected
Set of trivial mathematical checks performed to
see if target can “respond” correctly
For example:
(123+447)=570
319>(519+110)
(654+267)>854
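The checks above mix expressions that must evaluate to True with ones that must evaluate to False. A sketch of that idea follows; the function names are hypothetical, and ask stands for whatever routine injects an expression and reports the inferred boolean.

```python
import random


def build_checks(rng=random):
    """Build (expression, expected result) pairs from random operands."""
    a, b = rng.randint(100, 999), rng.randint(100, 999)
    return [
        ("(%d+%d)=%d" % (a, b, a + b), True),
        ("%d>(%d+%d)" % (a, a, b), False),  # always False because b is positive
        ("(%d+%d)>%d" % (a, b, a), True),
    ]


def target_responds_correctly(ask, checks):
    """A genuine injection point answers every check as expected."""
    return all(ask(expression) == expected for expression, expected in checks)
```

A target that answers True (or False) to everything fails at least one check and is reported as a false positive.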
Delay detection
Detection of “artificial” delay
Statistical comparison with normal response
times
Response time must fit under the Gaussian bell
curve to be marked as “normal”
Is <current_response_time> > avg(<normal_response_times>) + 7*stdev(<normal_response_times>)?
If answer is yes, probability that we are dealing
with “artificial” delay is 99.9999999997440%
Especially useful when heavy queries are used
(not knowing expected delay value)
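The threshold test translates directly into Python. The sketch below uses the population standard deviation from the statistics module, which is an assumption (the slide does not say which estimator is used), and the function name is hypothetical.

```python
import statistics


def is_artificial_delay(current, normal_times, sigmas=7):
    """Flag a response time that lies far outside the normal distribution."""
    mean = statistics.mean(normal_times)
    deviation = statistics.pstdev(normal_times)  # assumption: population stdev
    return current > mean + sigmas * deviation
```

For example, with normal response times around 0.2 s and a deviation of roughly 0.014 s, the threshold lands near 0.31 s, so a 5 s response is clearly flagged while a 0.25 s one is not.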
UNION query column #
UNION query requires knowledge of number of
columns (N) for vulnerable SQL statement
Two methods used: ORDER BY and statistical
(same principle as in delay detection)
ORDER BY N+1 should respond noticeably
different (preferably with error message) than
for ORDER BY N (binary searched)
In statistical method responses for candidates
(UNION SELECT NULL, NULL,...) are compared
to original (not injected) response
Right one is the one that seems “not normal”
(having ratio outside the Gaussian bell curve)
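The binary-searched ORDER BY method can be sketched as follows. order_by_ok is a hypothetical callback that injects ORDER BY n and returns True when the response still looks normal; the upper bound of 50 is an arbitrary assumption.

```python
def count_columns(order_by_ok, upper=50):
    """Find the largest n for which 'ORDER BY n' still works."""
    if not order_by_ok(1):
        return None  # ORDER BY is not usable at this injection point
    low, high = 1, upper
    if order_by_ok(high):
        return None  # more columns than the assumed upper bound
    # Invariant: ORDER BY low works, ORDER BY high fails
    while high - low > 1:
        mid = (low + high) // 2
        if order_by_ok(mid):
            low = mid
        else:
            high = mid
    return low
```

For a statement with 7 columns and the default bound, this needs 7 probes (1, 50, 25, 13, 7, 10, 8), far fewer than testing every candidate in turn.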
Output prediction
Inference techniques (boolean-based blind and
time-based blind) require optimization
wherever and whenever possible
In certain cases prediction(s) can be made
Checking if current retrieved entry shares same
prefix with previous retrieved entr(ies)
For example DROP ANY ROLE has same prefix as
DROP ANY RULE (one request per checked
character compared to bit-by-bit retrieval)
Using common output values too (e.g.
information_schema, phpmyadmin, etc.)
Brute forcing identifier names
In case of missing schema (e.g. deleted
information_schema) brute force search is
required (e.g. 1=(SELECT 1 FROM users))
Searching for common table names (switch
--common-tables)
Searching for common column names (switch
--common-columns)
Conducted automated search and parsing of
resulting SQL files for chosen Google dorks
(e.g. ext:sql “CREATE TABLE”)
Collected most frequent 3.3K table names and
2.5K column names
Pivot dump table
Some DBMSes (e.g. Microsoft SQL Server) don't
have OFFSET/LIMIT query mechanism making
enumeration problematic in non-UNION query
techniques
Column with most DISTINCT values is
automatically chosen as the pivot column
Pivot's first value bigger than previous (e.g.
SELECT MIN(id) WHERE id > ' ') is retrieved
Entries for other columns (e.g. SELECT name
WHERE id=1) are being retrieved using current
pivot value
Iterative process
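The iteration can be demonstrated against a local database. In this sketch SQLite merely stands in for a DBMS without OFFSET/LIMIT, direct queries stand in for injected ones, and the function name is hypothetical; the pivot column is assumed to hold unique values.

```python
import sqlite3


def pivot_dump(conn, table, pivot, columns):
    """Dump rows one at a time by walking the pivot column in ascending order."""
    rows, previous = [], None
    while True:
        if previous is None:
            cursor = conn.execute("SELECT MIN(%s) FROM %s" % (pivot, table))
        else:
            # Next pivot value: the smallest one bigger than the previous
            cursor = conn.execute(
                "SELECT MIN(%s) FROM %s WHERE %s > ?" % (pivot, table, pivot),
                (previous,))
        value = cursor.fetchone()[0]
        if value is None:
            break  # no bigger pivot value left
        row = {pivot: value}
        for column in columns:
            row[column] = conn.execute(
                "SELECT %s FROM %s WHERE %s = ?" % (column, table, pivot),
                (value,)).fetchone()[0]
        rows.append(row)
        previous = value
    return rows
```

Choosing the column with the most DISTINCT values as pivot matters because rows sharing a pivot value would collapse into one retrieved entry.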
International letters
Добрый день Россия
Page encoding is parsed from Content-Type
HTTP header, Content-Type meta HTML header
or heuristically detected (3rd party module
chardet)
RAW target response is automatically decoded
to Unicode (using detected page encoding)
In case of inband techniques (UNION query and
error-based) results with international letters
are already supported if decoding went
properly
International letters (2)
In case of inference techniques (boolean-based
blind and time-based blind) characters are
being inferred already in their Unicode form
Potential problems occur when stored data
and/or database connector use a different
(non-compatible) charset than the target's response
In case of unsuccessful decoding of
international letters (e.g. gibberish output)
charset can be enforced (option --charset)
Hex encoding retrieved data
All supported DBMSes have capabilities to
encode resulting data to hexadecimal format
(switch --hex)
Most useful in cases when (parts of) results are
potentially lost (e.g. binary data in inband
techniques)
Retrieved data is automatically decoded to its
original (non-hexadecimal) format
Such binary content is checked for known
formats (using 3rd party module magic) and (if
recognized) stored to output files
Dump format
Dumped table content can be stored in 3
different formats: CSV (default), HTML and
SQLite (option --dump-format)
In CSV format each row is represented by one
line and each column entry is being separated
by a predefined separator character (e.g. ,)
In HTML format dump is stored into a visually
recognizable (browser) table
In SQLite format dump is “replicated” to a
locally stored SQLite3 database giving a
possibility of (among others) running queries
against it
Password cracking
Implemented support for detection and
wordlist-based cracking of 14 different
commonly used hash algorithms
MySQL (newer and older), MsSQL (newer and
older), Oracle (newer and older), PostgreSQL,
MD5, SHA1, etc.
Automatic analysis of retrieved passwords
(--passwords) and table dumps (--dump)
(Optional) common suffix forms (1, 123, etc.)
Multiprocessed attack (# of CPUs)
1M MySQL hash guesses in under 10 seconds
on 4 core Intel Xeon W3550 @ 3.07GHz
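A single-process sketch of wordlist cracking with common suffixes, for the newer MySQL hash format (an asterisk followed by the uppercase hex SHA1 of the SHA1 of the password). The function names and the suffix list are illustrative; the multiprocessing part described above is left out.

```python
import hashlib


def mysql_password_hash(password):
    """Newer MySQL format: '*' + SHA1(SHA1(password)) in uppercase hex."""
    stage1 = hashlib.sha1(password.encode("utf-8")).digest()
    return "*" + hashlib.sha1(stage1).hexdigest().upper()


def crack(target_hash, words, suffixes=("", "1", "123")):
    """Try every word with every suffix; return the first match or None."""
    target_hash = target_hash.upper()
    for word in words:
        for suffix in suffixes:
            candidate = word + suffix
            if mysql_password_hash(candidate) == target_hash:
                return candidate
    return None
```

Splitting the wordlist into chunks, one per CPU, is the natural way to extend this to the multiprocessed attack.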
Large dictionary support
Distributed access in multiprocessing
environment
Support for huge dictionaries (chunk read)
Support for dictionary lists
Support for ZIP compressed dictionaries
Included custom built and compressed
dictionary (1.2M entries) based on highly
popular and publicly available dumps, like
RockYou, Gawker, Yahoo, etc.
Stagers and backdoors
Stagers are used for uploading arbitrary
(binary) files (e.g. UDF files, backdoors, etc.)
Backdoors are used for OS command execution
(switches --os-cmd and --os-shell)
Prerequisite is that one of known SQL file write
methods can be used (e.g. INTO DUMPFILE, EXEC
xp_cmdshell 'debug.exe < dump.src', etc.)
4 different platforms supported: ASP, ASP.NET,
JSP and PHP
Stored in “cloaked” format (preventing local AV
triggering) inside shell directory
Metasploit integration
Automated creation, upload and run of
Metasploit shellcode payload (switch --os-pwn)
User can choose payload (Meterpreter, shell
or VNC), connection (reverse TCP, reverse HTTP,
etc.) and encoder type (no encoder, Call+4
Dword XOR Encoder, etc.)
shellcodeexec(.exe) is being uploaded along
with (non-compiled) Metasploit shellcode
payload using stager or other means
Metasploit CLI is being run at the host machine
Payload is being executed at the target
machine connecting back to the host machine
Second order SQL injection
Occurs when provided user data stored at one
place is being used in vulnerable SQL
statement at the other place
Similar to permanent XSS
User can explicitly set the location where to
look for the response (option --second-order)
Effectively doubling number of required
requests
DNS exfiltration
Out-of-band SQL injection technique using DNS
resolution mechanism (option --dns-domain)
Fake DNS server instance is automatically
being made at the host machine
SQL injection payloads being sent are
deliberately provoking DNS resolution
mechanism at the target machine
Provoked DNS requests carry results of a query
Fake DNS server instance intercepts requests
and responds with dummy resolution answers
Requires registration of a nameserver for the
used domain pointing to the host machine
Output purging
Output directory can be (optionally) “safely”
removed (switch --purge-output)
Content of all contained files (sessions, logs,
dumps, etc.) is being overwritten with random
data
Files truncated and renamed to random values
(sub)directories renamed to random values
At the end, whole output directory tree is being
removed
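The purge steps listed above could be sketched like this. The function name is hypothetical, and a single overwrite pass gives no guarantee on journaling file systems or SSDs, so "safely" should be read with that caveat.

```python
import os
import shutil
import uuid


def purge(directory):
    """Overwrite, truncate and rename everything, then remove the tree."""
    # Bottom-up walk, so a directory is renamed only after its contents were handled
    for root, dirs, files in os.walk(directory, topdown=False):
        for name in files:
            path = os.path.join(root, name)
            size = os.path.getsize(path)
            with open(path, "r+b") as f:
                f.write(os.urandom(size))  # overwrite content with random data
                f.flush()
                os.fsync(f.fileno())
            with open(path, "wb"):
                pass  # truncate to zero length
            os.rename(path, os.path.join(root, uuid.uuid4().hex))
        for name in dirs:
            os.rename(os.path.join(root, name), os.path.join(root, uuid.uuid4().hex))
    shutil.rmtree(directory)
```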