sqlmap - Under the Hood

sqlmap – Under the Hood
Miroslav Štampar
(dev@sqlmap.org)

PHDays 2013, Moscow (Russia) May 23, 2013
BigArray
 Support for huge table dumps (e.g. millions of
rows)
 Raw data needs to be held somewhere before
being processed (and eventually stored)
 In-memory storage was a good enough choice until
recent years (user appetites have grown)
 Avoidance of MemoryError
 Memory mapping into smaller chunks/pages
(e.g. 4096 entries)
 Temporary files are used for storing chunks
 O(1) read/write access (page table principle)
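The paging idea above can be sketched as follows. This is a minimal illustration, not sqlmap's actual BigArray class: the name ChunkedList, the pickle-based chunk files and the single-chunk read cache are assumptions made for the example.

```python
import os
import pickle
import tempfile


class ChunkedList:
    """Append-only list that spills every full chunk to a temporary file."""

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size
        self.filenames = []         # one temporary file per flushed chunk
        self.current = []           # chunk still being filled (kept in memory)
        self.cache = (None, None)   # (chunk number, chunk) last read from disk

    def append(self, value):
        self.current.append(value)
        if len(self.current) >= self.chunk_size:
            handle, name = tempfile.mkstemp(suffix=".chunk")
            with os.fdopen(handle, "wb") as f:
                pickle.dump(self.current, f)
            self.filenames.append(name)
            self.current = []

    def __len__(self):
        return len(self.filenames) * self.chunk_size + len(self.current)

    def __getitem__(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        # Page table principle: the chunk number and the offset inside the
        # chunk follow directly from the index, so no searching is needed.
        chunk_no, offset = divmod(index, self.chunk_size)
        if chunk_no == len(self.filenames):
            return self.current[offset]
        if self.cache[0] != chunk_no:
            with open(self.filenames[chunk_no], "rb") as f:
                self.cache = (chunk_no, pickle.load(f))
        return self.cache[1][offset]

    def close(self):
        """Remove all temporary chunk files."""
        for name in self.filenames:
            os.remove(name)
        self.filenames = []


rows = ChunkedList(chunk_size=4)
for i in range(10):
    rows.append("row-%d" % i)
```

Sequential reads stay cheap because only the chunk holding the requested index is loaded; random access across chunks costs one file read per chunk switch.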

HashDB
 Storage of resumable session data at
centralized place (local SQLite3 database)
 Non-ASCII values are automatically
serialized/deserialized (pickle)
 INSERT INTO storage VALUES (LONG(MD5(target_url || key || MILESTONE_SALT)[:8]), stored_value)
 MILESTONE_SALT is changed whenever a change in
the HashDB mechanism breaks compatibility
with previous versions
 key uniquely describes stored_value for a
given target_url (e.g. KB_INJECTIONS, SELECT
banner FROM v$version WHERE ROWNUM=1, etc.)
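A rough sketch of the storage scheme described above. The salt value, the SessionStore class and the reading of "[:8]" as the first 8 hex digits of the MD5 digest are assumptions for illustration. For brevity every value is pickled here, whereas the slide says only non-ASCII values are serialized.

```python
import hashlib
import pickle
import sqlite3

MILESTONE_SALT = "example-salt"  # hypothetical value, not sqlmap's real salt


def hash_key(target_url, key):
    """Derive an integer row id from target URL, key and salt."""
    data = (target_url + key + MILESTONE_SALT).encode("utf-8")
    return int(hashlib.md5(data).hexdigest()[:8], 16)


class SessionStore:
    """Resumable session data kept in a local SQLite3 database."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS storage (id INTEGER PRIMARY KEY, value BLOB)")

    def write(self, target_url, key, value):
        self.conn.execute("INSERT OR REPLACE INTO storage VALUES (?, ?)",
                          (hash_key(target_url, key), pickle.dumps(value)))
        self.conn.commit()

    def read(self, target_url, key):
        row = self.conn.execute("SELECT value FROM storage WHERE id=?",
                                (hash_key(target_url, key),)).fetchone()
        return pickle.loads(row[0]) if row else None
```

Because the salt is part of the hashed input, changing it makes all previously stored ids unreachable, which is how old and incompatible session entries get ignored.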

Payloads
 XML format (xml/payloads.xml)
 Tag type <boundary> used for storage of all
possible prefix and suffix formations (<prefix>,
<suffix>) together with context sensitive
information (subtags <level>, <clause>,
<where> and <ptype>)
 Tag type <test> used for storage of data
required for successful testing and usage of
each SQL injection payload type (subtags
<title>, <stype>, <level>, <risk>, <clause>,
<where>, <vector>, <request> and <response>)

Payloads (2)
<boundary>
    <level>1</level>
    <clause>1</clause>
    <where>1,2</where>
    <ptype>1</ptype>
    <prefix>)</prefix>
    <suffix>AND ([RANDNUM]=[RANDNUM]</suffix>
</boundary>

Payloads (3)
<test>
    <title>Microsoft SQL Server/Sybase AND error-based - WHERE or HAVING clause (IN)</title>
    <stype>2</stype>
    <level>2</level>
    <risk>0</risk>
    <clause>1</clause>
    <where>1</where>
    <vector>AND [RANDNUM] IN (('[DELIMITER_START]'+([QUERY])+'[DELIMITER_STOP]'))</vector>
    <request>
        <payload>AND [RANDNUM] IN (('[DELIMITER_START]'+(SELECT (CASE WHEN ([RANDNUM]=[RANDNUM]) THEN '1' ELSE '0' END))+'[DELIMITER_STOP]'))</payload>
    </request>
    <response>
        <grep>[DELIMITER_START](?P<result>.*?)[DELIMITER_STOP]</grep>
    </response>
    <details>
        <dbms>Microsoft SQL Server</dbms>
        <dbms>Sybase</dbms>
        <os>Windows</os>
    </details>
</test>
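How a boundary and a test vector might be combined into one injected value, and how the grep expression then pulls the result out of an error message, can be sketched like this. The fill helper, the concrete placeholder values and the sample error page are made up for illustration; only the prefix, suffix, vector and grep templates come from the slides.

```python
import re


def fill(template, values, escape=False):
    """Substitute [NAME] placeholders, optionally regex-escaping the values."""
    for name, value in values.items():
        template = template.replace(
            "[%s]" % name, re.escape(value) if escape else value)
    return template


PREFIX = ")"
SUFFIX = "AND ([RANDNUM]=[RANDNUM]"
VECTOR = "AND [RANDNUM] IN (('[DELIMITER_START]'+([QUERY])+'[DELIMITER_STOP]'))"
GREP = "[DELIMITER_START](?P<result>.*?)[DELIMITER_STOP]"

# Hypothetical values; in practice they would be randomly generated.
values = {
    "RANDNUM": "4219",
    "DELIMITER_START": "qzkpq",
    "DELIMITER_STOP": "qvbxq",
    "QUERY": "SELECT @@version",
}

# Original parameter value (1) followed by prefix, vector and suffix.
injected = "1" + fill("%s %s %s" % (PREFIX, VECTOR, SUFFIX), values)

# Made-up error page of the kind an error-based test hopes to provoke.
error_page = ("Conversion failed when converting the nvarchar value "
              "'qzkpqMicrosoft SQL Server 2008qvbxq' to data type int.")
match = re.search(fill(GREP, values, escape=True), error_page, re.S)
result = match.group("result") if match else None
```

The delimiters make the extraction independent of the surrounding error text, so the same grep works for any query placed in [QUERY].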

Queries
 XML format (xml/queries.xml)
 Tag type <dbms> used for storage of all DBMS
specific SQL formations required for successful
enumeration (subtags <users>, <passwords>,
<dbs>, <tables>, <columns>, <dump_table>, etc.)
and resulting data (pre)processing (subtags
<cast>, <length>, <isnull>, <count>,
<substring>, <concatenate>, etc.)
 Each enumeration subtag has an <inband> and
<blind> form used in respective techniques

Queries (2)
<dbms value="MySQL">
    <cast query="CAST(%s AS CHAR)"/>
    <length query="CHAR_LENGTH(%s)"/>
    <isnull query="IFNULL(%s,' ')"/>
    <delimiter query=","/>
    <limit query="LIMIT %d,%d"/>
    …
    <passwords>
        <inband query="SELECT user,password FROM mysql.user" condition="user"/>
        <blind query="SELECT DISTINCT(password) FROM mysql.user WHERE user='%s' LIMIT %d,1" count="SELECT COUNT(DISTINCT(password)) FROM mysql.user WHERE user='%s'"/>
    </passwords>
    …

Multithreading
 Multithreading implemented wherever
applicable (option --threads)
 Techniques covered: boolean-based blind,
error-based and partial UNION query
 Deliberately turned off for techniques:
time-based and stacked (lots of reasons)
 Each thread covers a part of value in case of
boolean-based blind
 In other techniques, each thread covers one
enumerated entry
 Also, implemented for brute force column/table
name search and crawling

Direct connection
 Direct connection to DBMS (option -d)
 python sqlmap.py -d "mysql://root:password123@192.168.21.129:3306/testdb"
 Support for: Microsoft SQL Server, MySQL,
Oracle, PostgreSQL, SQLite, Microsoft Access,
Firebird, SAP MaxDB, Sybase, IBM DB2
 Use of 3rd party connectors (e.g. python-pymssql,
pymysql, cx_Oracle, python-psycopg2, etc.)
 SQLAlchemy used as an alternative

Load request(s) from file
 Load HTTP request(s) from a textual file (option
-r)
 Supporting RAW request format (any MITM
proxy can be used to catch one)
 Particularly usable in requests with large
content body (e.g. POST)
 Load and parse log files (option -l)
 Supporting Burp and WebScarab log formats
 Unlimited number of parsed HTTP requests
(using only unique ones)

Content type detection
 Automatic detection of (specialized) request
content types
 Supporting SOAP, JSON and (generic) XML
 For example:
--data='{"pid": 4412, "id": 1, "action": "do"}'
--data="<request><pid>4412</pid><id>1</id><action>do</action></request>"
 Appropriate exploitation of parameter values
 In case of non-supported format(s), custom
injection mark (*) can be used

Site crawling/form searching
 Collect usable (on site) target links (option
--crawl)
 User defines crawling depth (e.g. 3) limiting
search based on distance from starting page
 Optional form searching at visited pages
(switch --forms)
 Arbitrary filling of missing form data
 Repair of non-compliant HTML pages for
easier processing

Mnemonics
 Usage of mnemonics for faster setting up of
sqlmap options and switches (option -z)
 Longer (original):
python sqlmap.py --flush-session
--threads=4 --ignore-proxy --batch --banner
-u …
 Shorter (using mnemonics):
python sqlmap.py -z "flu,thre=4,ign,bat,ban" -u …
 Highly generic prefix based recognition (e.g. -z
"flu,bat,ban" is interpreted the same as -z
"flush,batc,bann")
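Prefix based recognition of this kind can be sketched in a few lines. The option list below is a small hypothetical subset and the error on ambiguous prefixes is an assumption; how sqlmap itself resolves ambiguity is not stated on the slide.

```python
# Hypothetical subset of long option names (without the leading dashes).
OPTIONS = ["flush-session", "threads", "ignore-proxy", "batch", "banner",
           "forms", "fingerprint"]


def expand_mnemonics(mnemonics, options=OPTIONS):
    """Expand a comma separated mnemonic string into full command line options."""
    expanded = []
    for item in mnemonics.split(","):
        name, _, value = item.strip().partition("=")
        matches = [option for option in options if option.startswith(name)]
        if len(matches) != 1:
            raise ValueError("mnemonic %r is unknown or ambiguous: %r" % (name, matches))
        expanded.append("--%s%s" % (matches[0], ("=" + value) if value else ""))
    return expanded
```

With this option list, expand_mnemonics("flu,thre=4,ign,bat,ban") yields the same options as the longer command line shown above, while a prefix such as "f" is rejected because three options start with it.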

Keep-alive
 HTTP persistent connection (switch --keep-alive)
 Opposed to new connection for every single
request/response pair
 Slightly adapted 3rd party module keepalive,
adjusted for multi-threading
 Connection pool – reuse of existing target
connection(s) where applicable
 Reduced network congestion (fewer TCP
connections), reduced latency (no
handshaking), faster enumeration, etc.

Tor
 Support for The Onion Router (Tor) online
anonymity network (switch --tor)
 Concealing identity and network activity
 Used against surveillance and (targeted) traffic
sniffing
 Configurable Tor proxy type (option --tor-type)
and port number (option --tor-port)
 DNS leakage is prevented (no DNS requests
outside of Tor)
 Available safety check for proper usage of Tor
(switch --check-tor)

Domain name resolution caching
 DNS resolution request is done by default for
each HTTP request (from Python HTTP
dedicated modules – e.g. httplib)
 Noticeable slowdown in some cases (e.g.
excessive network latency)
 Problem noticed and reported by (nagging)
users (looking into Wireshark traffic captures)
 Problem patched at the lowest level (method
socket.getaddrinfo(*args, **kwargs) is
encapsulated for caching)
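Encapsulating the resolver for caching could look roughly like the sketch below. The helper names are hypothetical, and the cache here never expires, which is acceptable for a short-lived scan but is an assumption rather than a statement about sqlmap's code.

```python
import socket


def make_cached_resolver(resolver):
    """Wrap a getaddrinfo-like callable so repeated lookups hit a local cache."""
    cache = {}

    def cached(*args, **kwargs):
        key = (args, tuple(sorted(kwargs.items())))
        if key not in cache:
            cache[key] = resolver(*args, **kwargs)
        return cache[key]

    return cached


def install_dns_cache():
    """Patch the lowest level resolver used by Python's HTTP modules."""
    socket.getaddrinfo = make_cached_resolver(socket.getaddrinfo)
```

Calling install_dns_cache() once at startup means every later HTTP request to the same host and port reuses the first resolution result.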

Authentication methods
 Implemented support for authentication
methods: basic, digest, NTLM and certificate
(options --auth-type, --auth-cred and --auth-cert)
 python sqlmap.py -u "http://192.168.21.129/vuln.php?id=1" --auth-type=basic --auth-cred="testuser:testpass"
 Handling HTTP status code 401 (Unauthorized)
 Authorization headers are being cached (where
applicable)

Reflection detection and removal
 Noisy response resulting from request
reflection
 Query results for: 1%20AND%201%3D1
 Can cause problems in detection phase
 Particularly problematic for boolean-based
blind technique (fuzzy page comparison)
 Automatic detection of reflected payload value
and marking with predefined constant value
 Query results for: __REFLECTED_VALUE__
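A simplified take on the marking step: the payload is URL-decoded, split into word tokens, and any occurrence of those tokens in the same order (with arbitrary non-word characters in between) is replaced by the constant. The function name and the tolerant matching rule are assumptions for this sketch.

```python
import re
from urllib.parse import unquote

REFLECTED_VALUE_MARKER = "__REFLECTED_VALUE__"


def remove_reflection(page, payload):
    """Replace reflected copies of the payload with a constant marker."""
    tokens = [re.escape(token) for token in re.split(r"\W+", unquote(payload)) if token]
    if not tokens:
        return page
    # Allow any run of non-word characters between tokens, since the target
    # may re-encode spaces and punctuation when echoing the value back.
    pattern = r"\W+".join(tokens)
    return re.sub(pattern, REFLECTED_VALUE_MARKER, page, flags=re.I)
```

After this step, True and False pages that differ only in the echoed payload compare as equal, so the reflection no longer disturbs page comparison.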

Dynamicity detection and removal
 Noisy response resulting from sporadically
changing content (e.g. ads, banners, etc.)
 Can cause problems in both detection and
enumeration phase
 Particularly problematic for boolean-based
blind technique
 Automatic detection and marking of dynamic
parts (info held in internal knowledge base)
 In best case, automatic recognition and usage
of string value appearing only in True
responses (option --string)

Content filtering
 Occasionally pages are bulked with non-textual
content (CSS styles, comments, JavaScript,
HTML tags, embedded objects, etc.)
 Changes regarding boolean-based blind
technique are usually affecting only one small
textual part (e.g. table entry)
 Optional filtering of non-textual content (switch
--text-only)
 For example: <html>...<td>Tooth
fairy</td>...</html> is filtered to ...Tooth
fairy...
 Better detection and less trash(y) results
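The filtering step can be approximated with a few regular expressions. This is a deliberately small sketch (regex-based HTML handling is fragile), not the filter sqlmap uses.

```python
import re


def text_only(page):
    """Strip scripts, styles, comments and tags, keeping only textual content."""
    page = re.sub(r"(?is)<(script|style)\b.*?</\1\s*>", " ", page)
    page = re.sub(r"(?s)<!--.*?-->", " ", page)
    page = re.sub(r"(?s)<[^>]+>", " ", page)
    return re.sub(r"\s+", " ", page).strip()
```

Comparing the filtered text instead of the raw page keeps markup that is identical in both responses from diluting the one small textual difference.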

Wizard mode
 For beginner users and script kiddies (switch
--wizard)
 Questions asked:
Target URL
POST data (if any)
Injection difficulty (Normal/Medium/Hard)
Enumeration (Basic/Intermediate/All)
 Infamous for Comodo Brazil breach (March
2011) – attackers posted wizard mode console
output to Pastebin

Level/risk of detection
 Number of requests per each parameter in
testing phase can grow from 10 up to 10K
 To prevent unnecessary noise and speed up the
testing time, tests are classified by level and
risk
 Level (option --level) represents (passing)
possibility/usability of the test case (higher
level means lower possibility)
 Risk (option --risk) represents potential
damage that the test case can cause (higher
risk means higher potential damage)

Heuristic SQL injection checks
 Recognition of the backend DBMS if error
message can be provoked with arbitrary invalid
SQL sequence (e.g. ())'"(''"')
 In case that the parameter value is integer and
response for (e.g.) 1 is the same as for (2-1),
there is a good chance that the target is
vulnerable
 In case of detected boolean-based blind
technique, DBMS specific queries are used (e.g.
(SELECT 0x616263)=0x616263) to potentially
move focus to a particular DBMS in further
tests

Type casting detection
 Type casting is an efficient way for dealing with
SQL injection on numeric values
 $query = "SELECT * FROM log WHERE id=" .
intval($_GET['id']);
 Implemented automatic detection of such
cases
 In case that the parameter value is integer and
response for (e.g.) 1 is the same as for 1foobar,
there is a good chance that the target is using
integer casting
 User is warned of a potentially “futile” run

Fingerprinting
 Web server is being fingerprinted by known
HTTP headers, cookie values, etc.
 DBMS is being fingerprinted through error
message parsing, banner parsing and tests
with version specific payloads (obtained from
release notes and reference manuals)
 For example, cookie value ASP.NET_SessionId is
specific for ASP.NET/IIS/Windows platform,
while TO_SECONDS(950501)>0 check should work
only on MySQL >= 5.5.0
 Detailed DBMS version check is done only if
switch -f/--fingerprint is used

Suhosin-patch detection
 Open source patch for PHP, protecting web
server from “insecure PHP practices”
 suhosin.get.max_value_length (default: 512),
suhosin.post.max_value_length, etc.
 Causing problems in enumeration phase when
payloads are big (e.g. enumerating column
names)
 After the detection phase a single payload
(depending on detected techniques) with size
greater than 512 is sent (e.g. 1 AND 6525
= … 6525)
 User is warned in case of False response

WAF/IDS/IPS detection
 Sending one “suspicious” request (in form of
dummy parameter value) and checking for
response change(s) when compared to original
(switch --check-waf)
 WAF scripts (switch --identify-waf) do
thorough checking, each focusing on
peculiarities of a particular product
 For example, WebKnight responds with HTTP
status code 999 on detected suspicious activity
 Currently there are 29 WAF scripts (airlock.py,
barracuda.py, bigip.py, etc.)

WAF/IDS/IPS bypass
 Tamper scripts (option --tamper) modify the
injected payload before it is sent
 User has to choose appropriate one(s) based
on collected knowledge of target's behavior
and/or detected WAF/IDS/IPS product
 If required, a chain of tamper scripts can be
used (e.g. --tamper="between,ifnull2ifisnull")
 Currently there are 36 tamper scripts
(apostrophemask.py, apostrophenullencode.py,
appendnullbyte.py, etc.)
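The chaining idea can be illustrated with two toy transformations applied in order. These functions are naive stand-ins written for this example (they ignore quoting and operators such as >=); they are not sqlmap's actual tamper scripts.

```python
import re


def between(payload):
    """Naively rewrite 'A>B' as 'A NOT BETWEEN 0 AND B'."""
    return re.sub(r"\s*>\s*(\w+)", r" NOT BETWEEN 0 AND \1", payload)


def space_to_comment(payload):
    """Naively replace every space with an inline SQL comment."""
    return payload.replace(" ", "/**/")


def apply_chain(payload, tampers):
    """Apply each tamper function to the output of the previous one."""
    for tamper in tampers:
        payload = tamper(payload)
    return payload
```

Order matters in a chain: here the comparison is rewritten first, and only then are the resulting spaces replaced.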

String value escaping
 Each string value inside payload is
automatically escaped (quoteless format)
depending on targeted DBMS
 For example: 1 ... AND username="root"-- is
in case of MySQL escaped to 1 ... AND
username=0x726f6f74--
 Avoidance of filter-based escaping functions
(e.g. addslashes)
 Adding implicit dependence to targeted DBMS
 Payload obfuscation (harder noticeability in
target log files)
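For MySQL, the quoteless form is simply the hexadecimal encoding of the string bytes. A minimal sketch follows; the function names are hypothetical and the regular expression only handles simple quoted literals without embedded or escaped quotes.

```python
import re


def mysql_hex_literal(value):
    """Encode a string as a MySQL hexadecimal literal (no quotes needed)."""
    return "0x" + value.encode("utf-8").hex()


def escape_strings(payload):
    """Replace each non-empty quoted string in the payload with its hex form."""
    def replace(match):
        value = match.group(2)
        return mysql_hex_literal(value) if value else match.group(0)
    return re.sub(r"""(['"])(.*?)\1""", replace, payload)
```

Other DBMSes need a different quoteless form, which is why this step depends on the targeted DBMS.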

Evaluation of custom code
 Custom Python code can be evaluated before
each request (option --eval)
 In such code, each request parameter is
accessible as a local variable
 All resulting variable values are included into
the request as new parameter values
 --eval="import
hashlib;hash=hashlib.md5(id).hexdigest()"
 www.target.com/vuln.php?id=1 AND
1=1&hash=7f134e52836a00e26493e690ed8aa735

Fuzzy page comparison
 Used (mostly) in boolean-based blind
technique
 Gestalt pattern matching (Ratcliff-Obershelp
algorithm)
 Supported by standard Python module difflib
 Class SequenceMatcher
 Method ratio() (or faster quick_ratio())
giving a measure of the sequences’ similarity
as a float in range [0, 1]
 True result if ratio() > 0.98 when compared
with original page
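The comparison described above maps almost directly onto difflib. The sketch below is a simplified assumption of how the check could look: the function name is hypothetical, and autojunk is switched off because the default heuristic of SequenceMatcher can distort ratios on long, repetitive pages.

```python
from difflib import SequenceMatcher

TRUE_THRESHOLD = 0.98  # ratio above which a response counts as True


def is_true_response(original_page, candidate_page, threshold=TRUE_THRESHOLD):
    """Return True if the candidate is similar enough to the original page."""
    matcher = SequenceMatcher(None, original_page, candidate_page, autojunk=False)
    return matcher.ratio() > threshold
```

quick_ratio() is cheaper but only gives an upper bound on ratio(), so it is better suited to quickly rejecting clearly different pages than to confirming a match.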

Definite page comparison
 Used mostly in boolean-based blind technique
 When fuzzy page comparison fails (e.g. too
much page dynamicity) and the user is able to
distinguish True from False responses on
their own (non-n**b)
 String to match when result should be
recognized as True (option --string)
 Regular expression to match … (option --regex)
 Compare HTTP codes (switch --code)
 Compare HTML titles (switch --title)

Null connection
 Sometimes there is no need for retrieval of
whole page content (size can be enough)
 Boolean-based blind technique
 3 methods: Range, HEAD and “skip-read”
 Range: bytes=-1
Content-Range: bytes 4789-4789/4790
 HEAD /search.aspx HTTP/1.1
Content-Length: 4790
 Both result (if applicable) in either an
empty or a 1 char long response
 Method “skip-read” retrieves only HTTP
headers looking for Content-Length

False positive detection
 False positives are highly undesirable
 Specific for boolean-based blind and
time-based blind techniques
 False positive tests are done in cases when
only one of those techniques is detected
 Set of trivial mathematical checks performed to
see if target can “respond” correctly
 For example:
(123+447)=570
319>(519+110)
(654+267)>854

Delay detection
 Detection of “artificial” delay
 Statistical comparison with normal response
times
 Response time must fit under the Gaussian bell
curve to be marked as “normal”
 Is <current_response_time> > avg(<normal_response_times>) + 7*stdev(<normal_response_times>)?
 If answer is yes, probability that we are dealing
with “artificial” delay is 99.9999999997440%
 Especially useful when heavy queries are used
(not knowing expected delay value)
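The threshold test above is a one-liner with the standard statistics module. The function name and the sample timings are made up for illustration; the factor 7 comes from the slide.

```python
from statistics import mean, stdev


def is_artificial_delay(current_response_time, normal_response_times, sigmas=7):
    """Return True if the response time lies far outside the normal distribution."""
    threshold = mean(normal_response_times) + sigmas * stdev(normal_response_times)
    return current_response_time > threshold


# Hypothetical response times (in seconds) collected from ordinary requests.
normal = [0.21, 0.19, 0.20, 0.22, 0.18]
```

Because the threshold is derived from observed timings, no expected delay value has to be known in advance, which is what makes the test usable with heavy queries.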

Delay detection (2)

UNION query column #
 UNION query requires knowledge of number of
columns (N) for vulnerable SQL statement
 Two methods used: ORDER BY and statistical
(same principle as in delay detection)
 ORDER BY N+1 should respond noticeably
different (preferably with error message) than
for ORDER BY N (binary searched)
 In statistical method responses for candidates
(UNION SELECT NULL, NULL,...) are compared
to original (not injected) response
 Right one is the one that seems “not normal”
(having ratio outside the Gaussian bell curve)
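The ORDER BY method lends itself to a binary search. In this sketch order_by_works is a hypothetical callable that injects ORDER BY n and returns True when the response still looks normal; the upper bound of 50 columns is an arbitrary assumption.

```python
def find_column_count(order_by_works, upper=50):
    """Binary search for the largest n for which ORDER BY n still works."""
    if not order_by_works(1) or order_by_works(upper):
        return None  # technique not usable, or more columns than the upper bound
    low, high = 1, upper - 1
    # Invariant: ORDER BY low works, ORDER BY (high + 1) does not.
    while low < high:
        middle = (low + high + 1) // 2
        if order_by_works(middle):
            low = middle
        else:
            high = middle - 1
    return low
```

For a 7-column statement and the default bound this needs well under ten requests, compared to one request per candidate count in the statistical method.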

Output prediction
 Inference techniques (boolean-based blind and
time-based blind) require optimization
wherever and whenever possible
 In certain cases prediction(s) can be made
 Checking if current retrieved entry shares same
prefix with previous retrieved entr(ies)
 For example DROP ANY ROLE has same prefix as
DROP ANY RULE (one request per checked
character compared to bit-by-bit retrieval)
 Using common output values too (e.g.
information_schema, phpmyadmin, etc.)
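One way to picture the prediction step: given what has been retrieved so far, rank the most likely next characters using earlier entries and common values, then confirm a guess with a single request before falling back to bit-by-bit retrieval. The function below is a hypothetical sketch of the ranking part only.

```python
def predict_next_chars(retrieved_prefix, known_values):
    """Rank candidate next characters by how often known values continue the prefix."""
    counts = {}
    for value in known_values:
        if value.startswith(retrieved_prefix) and len(value) > len(retrieved_prefix):
            char = value[len(retrieved_prefix)]
            counts[char] = counts.get(char, 0) + 1
    return sorted(counts, key=lambda char: (-counts[char], char))


# Example data: previously retrieved entries plus common output values.
known = ["DROP ANY ROLE", "DROP ANY ROLLBACK SEGMENT", "information_schema", "phpmyadmin"]
```

A correct guess costs one request per character, while a wrong guess costs one extra request on top of the normal retrieval, so prediction pays off when entries share long prefixes.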

Brute forcing identifier names
 In case of missing schema (e.g. deleted
information_schema) brute force search is
required (e.g. 1=(SELECT 1 FROM users))
 Searching for common table names (switch
--common-tables)
 Searching for common column names (switch
--common-columns)
 Conducted automated search and parsing of
resulting SQL files for chosen Google dorks
(e.g. ext:sql “CREATE TABLE”)
 Collected most frequent 3.3K table names and
2.5K column names

Pivot dump table
 Some DBMSes (e.g. Microsoft SQL Server) don't
have OFFSET/LIMIT query mechanism making
enumeration problematic in non-UNION query
techniques
 Column with most DISTINCT values is
automatically chosen as the pivot column
 Pivot's first value bigger than previous (e.g.
SELECT MIN(id) WHERE id > ' ') is retrieved
 Entries for other columns (e.g. SELECT name
WHERE id=1) are being retrieved using current
pivot value
 Iterative process
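The iteration can be tried out locally against SQLite, which stands in for the target here. Function names and the sample table are assumptions; table and column names are interpolated directly, so this sketch is only meant for trusted identifiers.

```python
import sqlite3


def choose_pivot(conn, table, columns):
    """Pick the column with the most DISTINCT values as the pivot."""
    def distinct_count(column):
        return conn.execute(
            "SELECT COUNT(DISTINCT %s) FROM %s" % (column, table)).fetchone()[0]
    return max(columns, key=distinct_count)


def pivot_dump(conn, table, pivot, columns):
    """Dump a table row by row without relying on OFFSET/LIMIT."""
    rows = []
    current = conn.execute("SELECT MIN(%s) FROM %s" % (pivot, table)).fetchone()[0]
    while current is not None:
        row = {pivot: current}
        for column in columns:
            row[column] = conn.execute(
                "SELECT %s FROM %s WHERE %s=?" % (column, table, pivot),
                (current,)).fetchone()[0]
        rows.append(row)
        # Next pivot value: the smallest one bigger than the current value.
        current = conn.execute(
            "SELECT MIN(%s) FROM %s WHERE %s>?" % (pivot, table, pivot),
            (current,)).fetchone()[0]
    return rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT, role TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?, ?)",
                 [(1, "alice", "admin"), (2, "bob", "user"), (5, "carol", "user")])
```

If the pivot column contains duplicate values, rows sharing a value collapse into one, which is why the column with the most DISTINCT values is preferred.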

International letters
 Добрый день Россия
 Page encoding is parsed from Content-Type
HTTP header, Content-Type meta HTML header
or heuristically detected (3rd party module
chardet)
 RAW target response is automatically decoded
to Unicode (using detected page encoding)
 In case of inband techniques (UNION query and
error-based) results with international letters
are already supported if decoding went
properly

International letters (2)
 In case of inference techniques (boolean-based
blind and time-based blind) characters are
being inferred already in their Unicode form
 Potential problems occur when stored data
and/or database connector use different
(non-compatible) charset than target's response
 In case of unsuccessful decoding of
international letters (e.g. gibberish output)
charset can be enforced (option --charset)

Hex encoding retrieved data
 All supported DBMSes have capabilities to
encode resulting data to hexadecimal format
(switch --hex)
 Most useful in cases when (parts of) results are
potentially lost (e.g. binary data in inband
techniques)
 Retrieved data is automatically decoded to its
original (non-hexadecimal) format
 Such binary content is checked for known
formats (using 3rd party module magic) and (if
recognized) stored to output files

Dump format
 Dumped table content can be stored in 3
different formats: CSV (default), HTML and
SQLite (option --dump-format)
 In CSV format each row is represented by one
line and each column entry is being separated
by a predefined separator character (e.g. ,)
 In HTML format dump is stored into a visually
recognizable (browser) table
 In SQLite format dump is “replicated” to a
locally stored SQLite3 database giving a
possibility of (among others) running queries
against it

Password cracking
 Implemented support for detection and
wordlist-based cracking of 14 different
commonly used hash algorithms
 MySQL (newer and older), MsSQL (newer and
older), Oracle (newer and older), PostgreSQL,
MD5, SHA1, etc.
 Automatic analysis of retrieved passwords
(--passwords) and table dumps (--dump)
 (Optional) common suffix forms (1, 123, etc.)
 Multiprocessed attack (# of CPUs)
 1M MySQL hash guesses in under 10 seconds
on 4 core Intel Xeon W3550 @ 3.07GHz
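A single-process sketch of wordlist-based cracking with common suffix forms, using the newer MySQL hash format (an asterisk followed by the uppercase hex of SHA1 applied twice). The function names and wordlist are illustrative; the slide's implementation distributes this work across CPU cores, which is omitted here.

```python
import hashlib


def mysql_hash(password):
    """Compute the newer (4.1+) MySQL password hash."""
    inner = hashlib.sha1(password.encode("utf-8")).digest()
    return "*" + hashlib.sha1(inner).hexdigest().upper()


def crack(hashes, wordlist, suffixes=("", "1", "123")):
    """Try every word with every suffix and return {hash: cleartext} for matches."""
    wanted = {value.upper() for value in hashes}
    found = {}
    for word in wordlist:
        for suffix in suffixes:
            candidate = word + suffix
            digest = mysql_hash(candidate)
            if digest in wanted:
                found[digest] = candidate
    return found
```

Splitting the wordlist into slices and running crack on each slice in a separate process is one straightforward way to use all available cores.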

Large dictionary support
 Distributed access in multiprocessing
environment
 Support for huge dictionaries (chunk read)
 Support for dictionary lists
 Support for ZIP compressed dictionaries
 Included custom built and compressed
dictionary (1.2M entries) based on highly
popular and publicly available dumps, like
RockYou, Gawker, Yahoo, etc.

Stagers and backdoors
 Stagers are used for uploading arbitrary
(binary) files (e.g. UDF files, backdoors, etc.)
 Backdoors are used for OS command execution
(switches --os-cmd and --os-shell)
 Prerequisite is that one of known SQL file write
methods can be used (e.g. INTO DUMPFILE, EXEC
xp_cmdshell 'debug.exe < dump.src', etc.)
 4 different platforms supported: ASP, ASP.NET,
JSP and PHP
 Stored in “cloaked” format (preventing local AV
triggering) inside shell directory

Metasploit integration
 Automated creation, upload and run of
Metasploit shellcode payload (switch --os-pwn)
 User can choose payload (Meterpreter, shell
or VNC), connection (reverse TCP, reverse HTTP,
etc.) and encoder type (no encoder, Call+4
Dword XOR Encoder, etc.)
 shellcodeexec(.exe) is being uploaded along
with (non-compiled) Metasploit shellcode
payload using stager or other means
 Metasploit CLI is being run at the host machine
 Payload is being executed at the target
machine connecting back to the host machine

Second order SQL injection
 Occurs when provided user data stored at one
place is being used in vulnerable SQL
statement at the other place
 Similar to permanent XSS
 User can explicitly set the location where to
look for the response (option --second-order)
 Effectively doubling number of required
requests

DNS exfiltration
 Out-of-band SQL injection technique using DNS
resolution mechanism (option --dns-domain)
 Fake DNS server instance is automatically
being made at the host machine
 SQL injection payloads being sent are
deliberately provoking DNS resolution
mechanism at the target machine
 Provoked DNS requests carry results of a query
 Fake DNS server instance intercepts requests
and responds with dummy resolution answers
 Requires registration of a nameserver for the
used domain pointing to the host machine

Output purging
 Output directory can be (optionally) “safely”
removed (switch --purge-output)
 Content of all contained files (sessions, logs,
dumps, etc.) is being overwritten with random
data
 Files truncated and renamed to random values
 (sub)directories renamed to random values
 At the end, whole output directory tree is being
removed
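The purge sequence listed above can be sketched as follows. The function names are hypothetical, and overwriting in place is a best-effort measure: on some filesystems and storage devices the old data blocks may still survive.

```python
import os
import random
import shutil
import string


def random_name(length=12):
    """Return a random lowercase name used for renaming files and directories."""
    return "".join(random.choice(string.ascii_lowercase) for _ in range(length))


def purge(directory):
    """Overwrite, truncate and rename everything, then remove the directory tree."""
    # Walk bottom-up so that renaming a directory never invalidates paths
    # that are still waiting to be visited.
    for root, dirs, files in os.walk(directory, topdown=False):
        for name in files:
            path = os.path.join(root, name)
            size = os.path.getsize(path)
            with open(path, "r+b") as f:
                f.write(os.urandom(size))   # overwrite content with random data
                f.flush()
                os.fsync(f.fileno())
                f.truncate(0)               # then truncate the file
            os.rename(path, os.path.join(root, random_name()))
        for name in dirs:
            os.rename(os.path.join(root, name), os.path.join(root, random_name()))
    shutil.rmtree(directory)
```

Renaming before removal means that even recoverable directory entries no longer reveal target names or dump file names.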

Questions?
