ISAAC DAWSON, VERACODE
AROUND THE WEB IN 80 HOURS: SCALABLE FINGERPRINTING WITH CHROMIUM AUTOMATION
ABOUT ME:
▸ Previously at @stake, Symantec (10 years)
▸ Moved into research role at Veracode, Inc. (6 years)
▸ Living in Japan for 12 years
▸ I <3
IT ALL STARTED IN 2012…
SECURITY HEADER SCANNING HISTORY
▸ All scanners use the Alexa Top 1 Million URLs
▸ Galexa (November 2012 - March 2014)
▸ Golexa (March 2014 - February 2016)
▸ Creeper v0-v1 (February 2016 - July 2016)
▸ Creeper v2 (July 2016 - …)
ARCHITECTURE
THE SYSTEM:
SUMMARY OF SYSTEMS & COMPONENTS
▸ Admin (x1) - Manages jobs
▸ Agents (x50) - Analyze URLs
▸ DB Writers (x4) - Feed analysis data into the DB & S3
▸ Database (x1) - PostgreSQL 9.5 DB
▸ NSQ - A message queue for URLs, reports and responses
▸ S3 - Stores serialized DOM and HTML/JS
THE MESSAGE QUEUE - NSQD, NSQLOOKUPD
▸ NSQ is an easy-to-deploy message queue
▸ JSON messages between all systems
▸ All agents point to Admin service running NSQLookupd
HELPFUL NSQ FEATURES
// Create consumer
c.urlConsumer, err = nsq.NewConsumer(job.Topics["url"],
	creeper_types.UrlChannel, cfg)

// Process numBrowsers messages concurrently (7)
c.urlConsumer.AddConcurrentHandlers(
	nsq.HandlerFunc(c.processUrls),
	numBrowsers)

// Job taking too long to handle/process a message?
msg.Touch() // notify NSQ we are still working on this message

// Need to requeue because Chrome crashed?
msg.RequeueWithoutBackoff(-1)

// Need to change the max # of in-flight messages?
c.urlConsumer.ChangeMaxInFlight(c.getInflightCount())
DATA STORAGE
DATAFLOW
(Dataflow diagram: Admin and Agents publish over the queue to three Writers, which store into the DB and S3)
CREEPER AGENTS
GETTING THE DATA WITH:
BROWSER AUTOMATION REQUIREMENTS
▸ Automatable
▸ Fast
▸ Capture network
▸ Capture various browser events (CSP violations)
▸ Inject JavaScript
CHOSE CHROME, FOR OBVIOUS REASONS…
▸ Each agent runs 3-6 tabs concurrently
▸ Headless, uses Xvfb
▸ Can get full read access to network response data
▸ Easily inject JavaScript
▸ Can subscribe to console messages
AGENT DESIGN
(Diagram: a Creeper agent is composed of a Browser Manager controlling the browser, plus Analyzer and Reporter app logic)
GOOGLE CHROME REMOTE DEBUGGER
▸ Huge definition files: browser_protocol.json and js_protocol.json
{
  "version": { "major": "1", "minor": "1" },
  "domains": [{
    "domain": "Inspector",
    "hidden": true,
    "types": [],
    "commands": [{
      "name": "enable",
      "description": "Enables inspector domain...",
      "handlers": ["browser", "renderer"]
    }],
    "events": [{
      "name": "evaluateForTestInFrontend",
      "parameters": [ … ]
    }]
  }]
}
GCD
▸ GCD generates Go code using templates
▸ Remote access to debugger events, functions, types.
▸ Can be updated easily as the protocol files change
GCD WAS GOOD BUT…
▸ Needed something better
▸ Built autogcd to automate:
▸ Trapping console messages
▸ Intercepting network data
▸ Injecting JS
▸ Took some inspiration from WebDriver
GETTING CSP EVENTS
func (b *Browser) StartIntercepting() error {
	b.tab.GetConsoleMessages(b.cspHandler())
	return nil
}

func (b *Browser) cspHandler() autogcd.ConsoleMessageFunc {
	return func(tab *autogcd.Tab, message *gcdapi.ConsoleConsoleMessage) {
		// Chrome reports CSP violations as console messages with
		// source "security".
		if message.Source != "security" {
			return
		}
		parseCsp(b.creeperData.CspResults,
			b.creeperData.ReportOnlyCspResults, message.Text)
	}
}
TRAPPING NETWORK RESPONSES
func (b *Browser) StartIntercepting() error {
	b.tab.GetNetworkTraffic(nil, b.responseHandler(), b.respFinishedHandler())
	return nil
}

func (b *Browser) responseHandler() autogcd.NetworkResponseHandlerFunc {
	return func(tab *autogcd.Tab, response *autogcd.NetworkResponse) {
		creeperResponse.Url = response.Response.Url
		// Block until the response body has fully arrived.
		b.networkContainer.WaitFor(response.RequestId)
		creeperResponse.ResponseBody, _ = b.encodeBody(response.RequestId,
			creeperResponse.MimeType,
			creeperResponse.Url)
		b.networkContainer.AddReady(creeperResponse)
	}
}

// mark the body as ready
func (b *Browser) respFinishedHandler() autogcd.NetworkFinishedHandlerFunc {
	return func(tab *autogcd.Tab, requestId string, dataLength, timeStamp float64) {
		b.networkContainer.BodyReady(requestId)
	}
}
INJECTING JAVASCRIPT
▸ Extract JS libraries and versions
▸ Retire.js and Wappalyzer have some good pointers
▸ Created a JSON file with 86 frameworks
▸ Must wait for the page to be fully loaded
INJECTING JAVASCRIPT - THE QUERIES
{
  "libraries": [ {
    "url": "http://jquery.com/",
    "key": "jquery",
    "statement": "jQuery.fn.jquery"
  }, {
    "url": "https://jquerymobile.com/",
    "key": "jquery-mobile",
    "statement": "jQuery.mobile.version"
  }, {
    "url": "http://www.embeddedjs.com/",
    "key": "embeddedjs 1.0",
    "statement": "(typeof EJS === \"function\" && typeof EJS.Buffer === \"function\") ? \"ejs 1.0\" : \"\""
  }, {
    "url": "http://www.embeddedjs.com/",
    "key": "embeddedjs 0.x",
    "statement": "(typeof EJS === \"function\" && typeof EjsScanner === \"function\") ? \"ejs 0.x\" : \"\""
  } ]
}
INJECTING JAVASCRIPT - INJECTING
for _, library := range JsLibs.Libraries {
	res, err := b.ExecuteScript(library.Statement)
	if err == nil && string(res) != "" {
		log.Printf("%s library result was: %s\n",
			library.Key,
			string(res))
		report.JavaScriptLibraries[library.Key] = string(res)
	}
}
INJECTING JAVASCRIPT - WHEN IS A PAGE DONE?
▸ DOMContentLoaded doesn’t handle dynamically loaded JS
▸ Listen for DOM change events
▸ Page is considered loaded if no DOM change events occur for > 2 seconds
▸ Timeout after 5 seconds
CHALLENGES
CHALLENGES - CONTAMINATION
					
(Timeline diagram: Start Capture → Load URL → Document Loaded → Stop Capture)
CHALLENGES - CONTAMINATION - SOLUTION
(Timeline diagram: Borrow Browser → Start Capture → Load URL → Document Loaded → Stop Capture → Kill Browser → Start/Add to Pool)
CHALLENGES - CHROME BUG #1
▸ Turns out opening tabs excessively can cause them to stop responding to the debugger protocol
CHALLENGES - CHROME BUG #1 - SOLUTION
▸ Mark tabs as ‘dead’
▸ If the max dead tab count is reached, drain active URLs and kill Chrome
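The dead-tab bookkeeping can be as simple as an atomic counter that trips a restart once the threshold is hit. A minimal sketch; the type and method names are illustrative, not from the real agent:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// tabTracker counts tabs that stopped answering the debugger protocol.
// Once maxDead is reached, the agent drains active URLs and kills Chrome.
type tabTracker struct {
	dead    int32
	maxDead int32
}

// markDead records one more unresponsive tab and reports whether the
// restart threshold has been reached.
func (t *tabTracker) markDead() (restart bool) {
	return atomic.AddInt32(&t.dead, 1) >= t.maxDead
}

func main() {
	tracker := &tabTracker{maxDead: 3}
	for i := 0; i < 3; i++ {
		if tracker.markDead() {
			fmt.Println("draining active URLs and killing chrome")
		}
	}
}
```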
CRASHSAFARI.COM
CHALLENGES - CHROME BUG #2 - CRASHSAFARI.COM
▸ Would completely kill chrome *and* agent
▸ Lost all active tabs
▸ This site cost me about 2-3 weeks development time
CHALLENGES - CRASHSAFARI.COM - SOLUTION
▸ Created killface package
▸ Sends a notification to stop active work
▸ Worker count dynamically adjusted to 1
▸ Pauses queue, runs all unfinished URLs again
▸ Once active count is 0, restart normally
OTHER CHALLENGES
✘ NSQ messages too large, zipping ineffective
✓ Split response data / report data
✘ Sites block AWS IP ranges (craigslist.com etc.)
☹ Timeout…
✘ Concurrency issues
✓ Very careful use of goroutines, channels and timers
✘ Site analysis failures/timeouts
✓ Try 3 times, keep track of retry state
✓ During retry, open a new browser and work on an additional URL
DB WRITERS & S3
STORING THE DATA WITH:
PREVIOUSLY…
▸ Creeper v0 had many problems
▸ RDS did not support PostgreSQL 9.5
▸ Duplicate data
▸ For v1, wrote to disk, SHA1 of contents:
▸ /job/files/5/a/b/c/5abcfbe73e39e0572a939b09f1eb16d7.html
▸ v1 did not shard database tables
▸ Database tables were normalized
▸ Lock contention
DATABASE REFRESHER - NORMALIZING
FLATTENED:

url                 | header_name      | header_value
--------------------+------------------+---------------------------
http://veracode.com | x-xss-protection | 1; mode=block
http://codeblue.jp  | x-xss-protection | 1; mode=block
http://google.jp    | x-xss-protection | 1; mode=block report-uri …

NORMALIZED:

url                 | header_name_id | header_value_id
--------------------+----------------+----------------
http://veracode.com | 0              | 0
http://codeblue.jp  | 0              | 0
http://google.jp    | 0              | 1

header_name_id | header_name
---------------+-----------------
0              | x-xss-protection

header_value_id | header_value
----------------+---------------------------
0               | 1; mode=block
1               | 1; mode=block report-uri …
CHALLENGES - GETTING THE DATA IN QUICKLY
▸ Get the data out of the DB writers as soon as possible
▸ Careful to not overload the database with many connections
▸ Reduce lock contention for writing
SOLUTION #1 - GETTING THE DATA IN QUICKLY
▸ DB Writers batch up reports and responses
▸ Inserted every 2.5-3.5 seconds
▸ Reduces number of required DB connections
SOLUTION #1 BATCHER
func (b *Batcher) AddReport(r *creeper_types.CreeperReport) {
	b.reportPool <- r
	atomic.AddInt32(&b.reportCount, 1)
}

func (b *Batcher) EmptyReports() []*creeper_types.CreeperReport {
	reports := make([]*creeper_types.CreeperReport, 0)
	for {
		select {
		case report := <-b.reportPool:
			reports = append(reports, report)
		default:
			// Channel drained; return everything accumulated so far.
			return reports
		}
	}
}
SOLUTION #2 - GETTING THE DATA IN QUICKLY
▸ Insert into temporary table using COPY FROM
▸ Extracted from the temporary table and INSERTed into the final table. This allows for UPSERTs:
INSERT INTO header_names (header_name)
SELECT responses_tmp.header_name FROM responses_tmp
ON CONFLICT DO NOTHING;
CHALLENGES - LARGE TABLES
▸ INSERT INTO … FROM SELECT … on a table with 80,000,000 rows
▸ As tables got bigger, DB writers slowed down
▸ This is not scalable
SOLUTION - TABLE SHARDING
▸ Much like sharding for the file system
▸ Requires a key:
▸ URL ID (e.g. 1 = google.com, 2 = microsoft.com, etc.)
▸ Only large tables require sharding
TABLE SHARDING
(Diagram: the writer computes a shard key from the input ID and routes each row to the matching shard table (shardKey = 1, 2, 3) in the DB)
CREATING A SHARD KEY
▸ Choose the number of shards for your tables:
▸ shardKey = input_id % 32
▸ Created PL/pgSQL functions:
create unlogged table if not exists job_0_responses (
response_id serial primary key,
input_id integer not null,
body_hash varchar(64) not null,
resp_url bytea not null,
resp_uuid varchar(64) unique not null,
resp_type_id integer references resp_types (resp_type_id) not null,
status_id integer references status_lines (status_id) not null,
status_code integer,
mime_type_id integer references mime_types (mime_type_id) not null,
response_time bigint
);
EXECUTE merge_headers(job, shardKey)
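On the writer side, routing a row to its shard is a one-line computation. A minimal sketch; the `job_%d_responses_%d` naming is a guess based on the `job_0_responses` DDL above, not confirmed by the talk:

```go
package main

import "fmt"

// shardTable returns the destination table for a response, using the
// shardKey = input_id % 32 scheme from the slide. Table naming is an
// assumption for illustration.
func shardTable(job, inputID, shards int) string {
	return fmt.Sprintf("job_%d_responses_%d", job, inputID%shards)
}

func main() {
	fmt.Println(shardTable(0, 1, 32))  // input_id 1 (e.g. google.com)
	fmt.Println(shardTable(0, 33, 32)) // input_id 33 wraps back to shard 1
}
```

Because the key is derived from the URL's ID, every write and read for a given URL lands on the same, much smaller table.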
CONS WITH SHARDING
▸ Added complexity for querying
▸ Best to create a new table with all data for reporting
▸ In the future, may use Citus for sharding across multiple databases
RESPONSE DATA (JS/HTML)
MOVING TO S3
▸ S3 limit of 100 requests/sec, but we were pushing 200-2,000 requests/sec
▸ Had to contact support
▸ Exponential backoff, retry 10 times
▸ Hash is stored in the response table
▸ HeadObject first to check existence, then PutObject
▸ HeadObject calls are way cheaper
LASTLY…
▸ Created unlogged tables
▸ Modified PostgreSQL configuration:
▸ Set checkpoint timeout to 5 minutes (max) instead of 1
▸ Enabled fsync
▸ Set max_wal_size to 256
THE RESULTS
A LOOK AT THE DATA
SCAN STATISTICS
Responses 72,193,155
Headers 525,385,900
JS Results 1,943,925
URLs w/Errors 67,315
Redirected to HTTPS 145,268
URLS w/CSP Violations 740
Scan Time 15 Hours
Cost $343 / ¥35,063
CSP VIOLATIONS
▸ 722 out of 4965 sites using CSP had violations
▸ Security sites:
▸ https://www.globalsign.com/en/, http://secunia.com/, https://lastpass.com/, https://www.avant.com/, http://www.veracode.com/
▸ Well known organizations:
▸ http://www.alibaba.com, https://www.doubleclickbygoogle.com, https://mozillians.org/en-US/
SUM OF CSP VIOLATION TYPES
(Chart: violation counts up to ~3,000 per directive, descending: script-src, img-src, frame-src, font-src, style-src, connect-src, media-src, child-src, object-src, base-uri, form-action, manifest-src)
TOP JAVASCRIPT LIBRARIES (> 3,000 DETECTIONS)
(Chart: detection counts up to ~800,000, descending: jQuery, jQuery-UI, Modernizr, jQuery-UI-Dialog, yepnope, jQuery-UI-Autocomplete, jQuery-UI-Tooltip, Bootstrap, html5shiv, Underscore, jQuery.prettyPhoto, PrototypeJS, Drupal, MooTools, mejs, Backbone.js, AngularJS, Foundation, JW Player, RequireJS, Handlebars.js, HammerJS, jPlayer, Mustache.js, Scriptaculous, Shadowbox, ZeroClipboard, YUI, Raphael, DataTables, Knockout)
JAVASCRIPT ‘NEXTGEN’ FRAMEWORKS (> 100 DETECTIONS)
(Chart: detection counts up to ~18,000, descending: Backbone.js, AngularJS, Foundation, YUI, Knockout, Dojo, ReactJS, MarionetteJS, VueJS, Ember, Meteor, Mithril, ExtJS, Polymer)
VULNERABILITY COUNTS
(Chart: vulnerable-version detections up to ~80,000, descending: jQuery, jQuery-UI-Dialog, jQuery.prettyPhoto, AngularJS, jQuery-UI-Tooltip, jPlayer, Handlebars.js, ZeroClipboard, Mustache.js, YUI, PrototypeJS, mejs, JW Player, Dojo, Ember, TinyMCE, Plupload, jQuery-Mobile, CKEditor)
LONGEST SECURITY HEADER AWARD - HTTPS://WWW.INSIGHTGUIDES.COM/
Content-Security-Policy: default-src 'self' http://tagmanager.google.com https://tagmanager.google.com https://*.doubleclick.net http://*.doubleclick.net https://*.google-
analytics.com http://*.google-analytics.com https://*.livechatinc.com http://*.livechatinc.com https://*.cloudfront.net http://*.cloudfront.net https://*.googleusercontent.com http://
*.googleusercontent.com https://www.bugherd.com http://www.bugherd.com https://*.braintreegateway.com http://*.braintreegateway.com https://www.biblioimages.com http://
www.biblioimages.com https://fonts.gstatic.com http://fonts.gstatic.com https://*.googleapis.com http://*.googleapis.com https://tripadvisor.com http://tripadvisor.com https://
*.gstatic.com http://*.gstatic.com https://www.tripadvisor.com http://www.tripadvisor.com https://www.insightguides.com http://www.insightguides.com https://rum-
static.pingdom.net http://rum-static.pingdom.net https://rum-collector.pingdom.net http://rum-collector.pingdom.net https://*.youtube.com http://*.youtube.com https://
www.googleadservices.com http://www.googleadservices.com https://connect.facebook.net http://connect.facebook.net https://googleads.g.doubleclick.net http://
googleads.g.doubleclick.net https://www.facebook.com http://www.facebook.com https://cdn.inspectlet.com http://cdn.inspectlet.com https://hn.inspectlet.com http://
hn.inspectlet.com https://*.apa.yoda.site http://*.apa.yoda.site https://www.preprod.apa.yoda.site http://www.preprod.apa.yoda.site https://www.test.apa.yoda.site http://
www.test.apa.yoda.site https://www.google.com http://www.google.com https://www.google.pl http://www.google.pl https://www.google.co.uk http://www.google.co.uk https://
google.com http://google.com https://google.pl http://google.pl https://google.co.uk http://google.co.uk https://ethn.io http://ethn.io https://stats.g.doubleclick.net http://
stats.g.doubleclick.net https://platform.instagram.com http://platform.instagram.com https://instagram.com http://instagram.com https://www.instagram.com http://
www.instagram.com https://*.amazonaws.com http://*.amazonaws.com blob:; script-src 'self' http://www.googletagmanager.com https://www.googletagmanager.com http://
tagmanager.google.com https://tagmanager.google.com https://*.doubleclick.net http://*.doubleclick.net https://*.google-analytics.com http://*.google-analytics.com https://
*.livechatinc.com http://*.livechatinc.com https://*.cloudfront.net http://*.cloudfront.net https://*.googleusercontent.com http://*.googleusercontent.com https://www.bugherd.com
http://www.bugherd.com https://*.braintreegateway.com http://*.braintreegateway.com https://www.biblioimages.com http://www.biblioimages.com https://fonts.gstatic.com http://
fonts.gstatic.com https://*.googleapis.com http://*.googleapis.com https://tripadvisor.com http://tripadvisor.com https://*.gstatic.com http://*.gstatic.com https://www.tripadvisor.com
http://www.tripadvisor.com https://www.insightguides.com http://www.insightguides.com https://rum-static.pingdom.net http://rum-static.pingdom.net https://rum-
collector.pingdom.net http://rum-collector.pingdom.net https://*.youtube.com http://*.youtube.com https://www.googleadservices.com http://www.googleadservices.com https://
connect.facebook.net http://connect.facebook.net https://googleads.g.doubleclick.net http://googleads.g.doubleclick.net https://www.facebook.com http://www.facebook.com
https://cdn.inspectlet.com http://cdn.inspectlet.com https://hn.inspectlet.com http://hn.inspectlet.com https://*.apa.yoda.site http://*.apa.yoda.site https://www.preprod.apa.yoda.site
http://www.preprod.apa.yoda.site https://www.test.apa.yoda.site http://www.test.apa.yoda.site https://www.google.com http://www.google.com https://www.google.pl http://
www.google.pl https://www.google.co.uk http://www.google.co.uk https://google.com http://google.com https://google.pl http://google.pl https://google.co.uk http://google.co.uk
https://ethn.io http://ethn.io https://stats.g.doubleclick.net http://stats.g.doubleclick.net https://platform.instagram.com http://platform.instagram.com https://instagram.com http://
instagram.com https://www.instagram.com http://www.instagram.com https://*.amazonaws.com http://*.amazonaws.com 'unsafe-eval' 'unsafe-inline' https://apis.google.com blob:;
connect-src * 'self' http://tagmanager.google.com https://tagmanager.google.com https://*.doubleclick.net http://*.doubleclick.net https://*.google-analytics.com http://*.google-
analytics.com https://*.livechatinc.com http://*.livechatinc.com https://*.cloudfront.net http://*.cloudfront.net https://*.googleusercontent.com http://*.googleusercontent.com https://
www.bugherd.com http://www.bugherd.com https://*.braintreegateway.com http://*.braintreegateway.com https://www.biblioimages.com http://www.biblioimages.com https://
fonts.gstatic.com http://fonts.gstatic.com https://*.googleapis.com http://*.googleapis.com https://tripadvisor.com http://tripadvisor.com https://*.gstatic.com http://*.gstatic.com
https://www.tripadvisor.com http://www.tripadvisor.com https://www.insightguides.com http://www.insightguides.com https://rum-static.pingdom.net http://rum-static.pingdom.net
https://rum-collector.pingdom.net http://rum-collector.pingdom.net https://*.youtube.com http://*.youtube.com https://www.googleadservices.com http://www.googleadservices.com
https://connect.facebook.net http://connect.facebook.net https://googleads.g.doubleclick.net http://googleads.g.doubleclick.net https://www.facebook.com http://
www.facebook.com https://cdn.inspectlet.com http://cdn.inspectlet.com https://hn.inspectlet.com http://hn.inspectlet.com https://*.apa.yoda.site http://*.apa.yoda.site https://
www.preprod.apa.yoda.site http://www.preprod.apa.yoda.site https://www.test.apa.yoda.site http://www.test.apa.yoda.site https://www.google.com http://www.google.com https://
www.google.pl http://www.google.pl https://www.google.co.uk http://www.google.co.uk https://google.com http://google.com https://google.pl http://google.pl https://
google.co.uk http://google.co.uk https://ethn.io http://ethn.io https://stats.g.doubleclick.net http://stats.g.doubleclick.net https://platform.instagram.com http://
platform.instagram.com https://instagram.com http://instagram.com https://www.instagram.com http://www.instagram.com https://*.amazonaws.com http://*.amazonaws.com blob:;
SOME OF MY FAVORITE HTTP STATUS LINES
▸ HTTP 500 access denied ("java.io.FilePermission" "D:\home\XXXXXXXXX.com\ori\ModelGlue\unity\event\request\EventRequest.cfc" "read")
▸ HTTP 500 "Duplicate entry '1473335051' for key 'timestamp' SQL=INSERT INTO `#__zt_visitor_counter` (`id`,`timestamp`,`visits`,`guests`,`ipaddress`,`useragent`) VALUES (null, '1473335051', 1 , 1 , '54.208.81.16', 'chrome')"
▸ HTTP 500 "Server Made Big Boo"
ABSOLUTE FAVORITE STATUS LINE
▸ “NO HACKING”
CONCLUSION
▸ Use NSQ, seriously.
▸ Concurrency can be difficult
▸ Batch data before inserting to DB
▸ If DB rows > a few million, consider sharding
▸ Test different types of table schema for performance
▸ Treat browsers like garbage and handle appropriately
QUESTIONS?
▸ twitter: @_wirepair
▸ github: wirepair
▸ gcd: https://github.com/wirepair/gcd
▸ autogcd: https://github.com/wirepair/autogcd
▸ killface: https://github.com/wirepair/killface
▸ Thanks to all my coworkers for supporting me and listening to my daily rants!

[CB16] 80時間でWebを一周:クロムミウムオートメーションによるスケーラブルなフィンガープリント by Isaac Dawson

  • 1.
    ISAAC DAWSON, AROUND THEWEB IN 80 HOURS: SCALABLE FINGERPRINTING WITH CHROMIUM AUTOMATION VERACODE 15
  • 2.
    VERACODE AROUND THE WEBIN 80 HOURS: SCALABLE FINGERPRINTING WITH CHROMIUM AUTOMATION ABOUT ME: ▸ Previously at @stake, Symantec (10 years) ▸ Moved into research role at Veracode, Inc. (6 years) ▸ Living in Japan for 12 years ▸ I <3
  • 3.
    VERACODE AROUND THE WEBIN 80 HOURS: SCALABLE FINGERPRINTING WITH CHROMIUM AUTOMATION IT ALL STARTED IN 2012…
  • 4.
    VERACODE AROUND THE WEBIN 80 HOURS: SCALABLE FINGERPRINTING WITH CHROMIUM AUTOMATION SECURITY HEADER SCANNING HISTORY ▸ All scanners use the Alexa Top 1 Million URLs ▸ Galexa (November 2012 - March 2014) ▸ Golexa (March 2014 - February 2016) ▸ Creeper v0-v1 (February 2016 - July 2016) ▸ Creeper v2 (July 2016 - …)
  • 5.
  • 6.
    VERACODE AROUND THE WEBIN 80 HOURS: SCALABLE FINGERPRINTING WITH CHROMIUM AUTOMATION SUMMARY OF SYSTEMS & COMPONENTS ▸ Admin (x1) - Manages jobs ▸ Agents (x50) - Analyzes URLs ▸ DB Writers (x4) - Feeds analysis data into the DB & S3 ▸ Database (x1) - PostgreSQL 9.5 DB ▸ NSQ - A message queue for URLs, reports and responses ▸ S3 - Stores serialized DOM and HTML/JS
  • 7.
    VERACODE AROUND THE WEBIN 80 HOURS: SCALABLE FINGERPRINTING WITH CHROMIUM AUTOMATION THE MESSAGE QUEUE -NSQD, NSQLOOKUPD ▸ NSQ is an easy to deploy message queue ▸ JSON messages between all systems ▸ All agents point to Admin service running NSQLookupd
  • 8.
    VERACODE AROUND THE WEBIN 80 HOURS: SCALABLE FINGERPRINTING WITH CHROMIUM AUTOMATION HELPFUL NSQ FEATURES // Create consumer c.urlConsumer, err = nsq.NewConsumer(job.Topics["url"], creeper_types.UrlChannel, cfg) // Process numBrowser of messages concurrently (7) c.urlConsumer.AddConcurrentHandlers( nsq.HandlerFunc(c.processUrls), numBrowsers) 
 // Job taking too long to handle/process a message? msg.Touch() // notify we are still working on this message // Need to requeue because chrome crashed? msg.RequeueWithoutBackoff(-1) // Need to change max # of inflight messages? c.urlConsumer.ChangeMaxInFlight(c.getInflightCount()) 1 2 3 4
  • 9.
    VERACODE DATA STORAGE AROUND THEWEB IN 80 HOURS: SCALABLE FINGERPRINTING WITH CHROMIUM AUTOMATION DATAFLOW DB AGENT ADMIN WRITER WRITER WRITER S3 AGENT AGENT
  • 10.
  • 11.
  • 12.
    VERACODE CREEPER AGENTS: GETTINGTHE DATA BROWSER AUTOMATION REQUIREMENTS ▸ Automatable ▸ Fast ▸ Capture network ▸ Capture various browser events (CSP violations) ▸ Inject JavaScript
  • 13.
    VERACODE CREEPER AGENTS: GETTINGTHE DATA CHOSE CHROME, FOR OBVIOUS REASONS… ▸ Each agent runs 3-6 tabs concurrently ▸ Headless, uses Xvfb ▸ Can get full read access to network response data ▸ Easily inject javascript ▸ Can subscribe to console messages
  • 14.
    VERACODE CREEPER AGENTS: GETTINGTHE DATA AGENT DESIGN CREEPER AGENT BROWSER MANAGER ANALYZER REPORTER APP LOGIC
  • 15.
  • 16.
    VERACODE CREEPER AGENTS: GETTINGTHE DATA GOOGLE CHROME REMOTE DEBUGGER ▸ Huge definition files: browser_protocol.json and js_protocol.json { "version": { "major": "1", "minor": "1" }, "domains": [{ "domain": "Inspector", "hidden": true, "types": [], "commands": [{ "name": "enable", "description": "Enables inspector domain...”, "handlers": ["browser", "renderer"] }], "events": [{ "name": "evaluateForTestInFrontend", "parameters": [ … ] }], } }
  • 17.
    VERACODE CREEPER AGENTS: GETTINGTHE DATA GCD ▸ GCD generates Go code using templates ▸ Remote access to debugger events, functions, types. ▸ Can be updated easily as the protocol files change
  • 18.
    VERACODE CREEPER AGENTS: GETTINGTHE DATA GCD WAS GOOD BUT… ▸ Needed something better ▸ Built autogcd to automate: ▸ Trapping console messages ▸ Intercepting network data ▸ Injecting JS ▸ Took some inspiration from WebDriver
  • 19.
    VERACODE CREEPER AGENTS: GETTINGTHE DATA GETTING CSP EVENTS func (b *Browser) StartIntercepting() error { b.tab.GetConsoleMessages(b.cspHandler()) return nil } func (b *Browser) cspHandler() autogcd.ConsoleMessageFunc { return func(tab *autogcd.Tab, message *gcdapi.ConsoleConsoleMessage) { if message.Source != "security" { return } parseCsp(b.creeperData.CspResults, b.creeperData.ReportOnlyCspResults, message.Text) } } 1 2
  • 20.
    VERACODE CREEPER AGENTS: GETTINGTHE DATA TRAPPING NETWORK RESPONSES func (b *Browser) StartIntercepting() error { b.tab.GetNetworkTraffic(nil, b.responseHandler(), b.respFinishedHandler()) } func (b *Browser) responseHandler() autogcd.NetworkResponseHandlerFunc { return func(tab *autogcd.Tab, response *autogcd.NetworkResponse) { creeperResponse.Url = response.Response.Url b.networkContainer.WaitFor(response.RequestId) creeperResponse.ResponseBody, _ = b.encodeBody(response.RequestId, creeperResponse.MimeType, creeperResponse.Url) b.networkContainer.AddReady(creeperResponse) } } // mark the body as ready func (b *Browser) respFinishedHandler() autogcd.NetworkFinishedHandlerFunc { return func(tab *autogcd.Tab, requestId string, dataLength, timeStamp float64) { b.networkContainer.BodyReady(requestId) } } 1 2 3 4
  • 21.
    VERACODE CREEPER AGENTS: GETTINGTHE DATA INJECTING JAVASCRIPT ▸ Extract JS libraries and versions ▸ Retire.js and Wappalyzer have some good pointers ▸ Created a JSON file with 86 frameworks ▸ Must wait for the page to be fully loaded
  • 22.
    VERACODE CREEPER AGENTS: GETTINGTHE DATA INJECTING JAVASCRIPT - THE QUERIES { "libraries": [ { "url": "http://jquery.com/", "key": "jquery", "statement": "jQuery.fn.jquery" }, { "url": "https://jquerymobile.com/", "key": "jquery-mobile", "statement": "jQuery.mobile.version" }, { "url": "http://www.embeddedjs.com/", "key": "embeddedjs 1.0", "statement": "(typeof EJS === "function" && typeof EJS.Buffer === "function") ? "ejs 1.0":""" }, { "url": "http://www.embeddedjs.com/", "key": "embeddedjs 0.x", "statement": "(typeof EJS === "function" && typeof EjsScanner === "function") ? "ejs 0.x":""" } ] }
  • 23.
    VERACODE CREEPER AGENTS: GETTINGTHE DATA INJECTING JAVASCRIPT - INJECTING for _, library := range JsLibs.Libraries { res, err := b.ExecuteScript(library.Statement) if err == nil && string(res) != "" { log.Printf("%s library result was: %sn", library.Key, string(res)) report.JavaScriptLibraries[library.Key] = string(res) } }
  • 24.
    VERACODE CREEPER AGENTS: GETTINGTHE DATA INJECTING JAVASCRIPT - WHEN IS A PAGE DONE? ▸ DOMContentLoaded doesn’t handle dynamically loaded JS ▸ Listen for DOM change events ▸ Page loaded if no DOM change events occur for > 2 seconds ▸ Timeout after 5 seconds
  • 25.
  • 26.
    VERACODE CREEPER AGENTS: GETTINGTHE DATA CHALLENGES - CONTAMINATION + + + + | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | + + + + Start Capture Load URL Document Loaded Stop Capture
  • 27.
    VERACODE CREEPER AGENTS: GETTINGTHE DATA CHALLENGES - CONTAMINATION - SOLUTION + + + + + + + | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | + + + + + + + Borrow Browser Start Capture Load URL Document Loaded Stop Capture Kill Browser Start/Add Pool
  • 28.
    VERACODE CREEPER AGENTS: GETTINGTHE DATA CHALLENGES - CHROME BUG #1 ▸ Turns out opening tabs excessively can cause tabs to not respond to debugger protocol
  • 29.
    VERACODE CREEPER AGENTS: GETTINGTHE DATA CHALLENGES - CHROME BUG #1 - SOLUTION ▸ Mark tabs as ‘dead’ ▸ If max dead tab count is reached, drain active URLs and kill chrome
  • 30.
  • 31.
    VERACODE CREEPER AGENTS: GETTINGTHE DATA CHALLENGES - CHROME BUG #2 - CHRASHSAFARI.COM ▸ Would completely kill chrome *and* agent ▸ Lost all active tabs ▸ This site cost me about 2-3 weeks development time
  • 32.
    VERACODE ▸ Created killfacepackage ▸ Sends a notification to stop active work ▸ Worker count dynamically adjusted to 1 ▸ Pauses queue, runs all unfinished URLs again ▸ Once active count is 0, restart normally CREEPER AGENTS: GETTING THE DATA CHALLENGES - CRASHSAFARI.COM - SOLUTION
  • 33.
    VERACODE CREEPER AGENTS: GETTINGTHE DATA OTHER CHALLENGES ✘ NSQ messages too large, zipping ineffective ✓Split response data/report data ✘ Sites block AWS IP ranges, (craigslist.com etc) ☹ Timeout… ✘ Concurrency issues ✓ Very careful use of go routines, channels and timers. ✘ Site analysis failures/timeouts ✓ Try 3 times, keep track of retry state. ✓ During retry, open a new browser and work on additional url
  • 34.
    DB WRITERS &S3 STORING THE DATA WITH:
  • 35.
    VERACODE DB WRITERS: STORINGTHE DATA PREVIOUSLY… ▸ Creeper v0 had many problems ▸ RDS did not support PostgreSQL 9.5 ▸ Duplicate data ▸ For v1, wrote to disk, SHA1 of contents: ▸ /job/files/5/a/b/c/5abcfbe73e39e0572a939b09f1eb16d7.html ▸ v1 did not shard database tables ▸ Database tables were normalized ▸ Lock contention
  • 36.
    VERACODE DB WRITERS: STORINGTHE DATA DATABASE REFRESHER - NORMALIZING url header_name header_value http://veracode.com x-xss-protection 1; mode=block http://codeblue.jp x-xss-protection 1; mode=block http://google.jp x-xss-protection 1; mode=block report-uri … url header_name_id header_value_id http://veracode.com 0 0 http://codeblue.jp 0 0 http://google.jp 0 1 header_name_id header_name 0 x-xss-protection header_value_id header_value 0 1; mode=block 1 1; mode=block report-uri … NORMALIZED: FLATTENED:
  • 37.
    VERACODE DB WRITERS: STORINGTHE DATA CHALLENGES - GETTING THE DATA IN QUICKLY ▸ Get the data out of the DB writers as soon as possible ▸ Careful to not overload the database with many connections ▸ Reduce lock contention for writing
  • 38.
    VERACODE DB WRITERS: STORINGTHE DATA SOLUTION #1 - GETTING THE DATA IN QUICKLY ▸ DB Writers batch up reports and responses ▸ Inserted every 2.5-3.5 seconds ▸ Reduces number of required DB connections
  • 39.
    VERACODE DB WRITERS: STORINGTHE DATA SOLUTION #1 BATCHER func (b *Batcher) AddReport(r *creeper_types.CreeperReport) { select { case b.reportPool <- r: atomic.AddInt32(&b.reportCount, 1) } } func (b *Batcher) EmptyReports() []*creeper_types.CreeperReport { reports := make([]*creeper_types.CreeperReport, 0) for { select { case report := <-b.reportPool: reports = append(reports, report) default: return reports } } return nil }
  • 40.
VERACODE
DB WRITERS: STORING THE DATA
SOLUTION #2 - GETTING THE DATA IN QUICKLY
▸ Insert into a temporary table using COPY FROM
▸ Extract from the temporary table and INSERT into the final table. This allows for UPSERTs:

INSERT INTO header_names (header_name)
SELECT responses_tmp.header_name
FROM responses_tmp
ON CONFLICT DO NOTHING;
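In a Go DB writer the second step of this pattern is just statement construction: one upsert per lookup column moved out of the COPY-loaded temp table. A sketch of a helper that builds the statement above (the helper itself is illustrative, not code from the talk; the real COPY step would use something like lib/pq's `CopyIn`):

```go
package main

import "fmt"

// upsertFromTemp builds the INSERT ... SELECT ... ON CONFLICT statement
// that moves one column from the COPY-loaded temp table into its final
// lookup table, ignoring values that already exist.
func upsertFromTemp(finalTable, column, tempTable string) string {
	return fmt.Sprintf(
		"INSERT INTO %s (%s) SELECT %s.%s FROM %s ON CONFLICT DO NOTHING;",
		finalTable, column, tempTable, column, tempTable)
}

func main() {
	// Reproduces the statement from the slide.
	fmt.Println(upsertFromTemp("header_names", "header_name", "responses_tmp"))
}
```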
VERACODE
DB WRITERS: STORING THE DATA
CHALLENGES - LARGE TABLES
▸ INSERT INTO … SELECT … FROM … on a table with 80,000,000 rows
▸ As tables got bigger, DB writers slowed down
▸ This is not scalable
VERACODE
DB WRITERS: STORING THE DATA
SOLUTION - TABLE SHARDING
▸ Much like sharding for the file system
▸ Requires a key:
▸ URL ID (e.g. 1,google.com 2,microsoft.com etc.)
▸ Only large tables require sharding
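Routing a row to its shard is a one-liner once the key exists. A sketch, assuming the 32-way split and the `job_N_responses` naming used in the DDL two slides on (the helper name is hypothetical):

```go
package main

import "fmt"

const numShards = 32 // from the talk: shardKey = input_id % 32

// shardTable picks the sharded responses table for a URL's input_id,
// so writes for different URL IDs spread across 32 smaller tables.
func shardTable(inputID int) string {
	return fmt.Sprintf("job_%d_responses", inputID%numShards)
}

func main() {
	fmt.Println(shardTable(1))  // 1,google.com -> job_1_responses
	fmt.Println(shardTable(33)) // wraps around to job_1_responses
}
```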
VERACODE
DB WRITERS: STORING THE DATA
TABLE SHARDING
[Diagram: the writer computes a shard key from the input ID and routes each row to one of several sharded tables in the DB (shardKey = 1, shardKey = 2, shardKey = 3)]
VERACODE
DB WRITERS: STORING THE DATA
CREATING A SHARD KEY
▸ Choose the number of times to shard your tables:
▸ shardKey = input_id % 32
▸ Created PL/pgSQL functions:

create unlogged table if not exists job_0_responses (
    response_id serial primary key,
    input_id integer not null,
    body_hash varchar(64) not null,
    resp_url bytea not null,
    resp_uuid varchar(64) unique not null,
    resp_type_id integer references resp_types (resp_type_id) not null,
    status_id integer references status_lines (status_id) not null,
    status_code integer,
    mime_type_id integer references mime_types (mime_type_id) not null,
    response_time bigint
);

EXECUTE merge_headers(job, shardKey)
VERACODE
DB WRITERS: STORING THE DATA
CONS OF SHARDING
▸ Added complexity for querying
▸ Best to create a new table with all data for reporting
▸ In the future, may use Citus for sharding across multiple databases
VERACODE
DB WRITERS: STORING THE DATA
RESPONSE DATA (JS/HTML)
VERACODE
DB WRITERS: STORING THE DATA
MOVING TO S3
▸ S3 limits: 100 requests/sec, but we were pushing 200-2000/sec
▸ Had to contact support
▸ Exponential backoff, retry 10 times
▸ Hash is stored in the response table
▸ HeadObject first to check existence, then PutObject
▸ HeadObject requests are way cheaper
VERACODE
DB WRITERS: STORING THE DATA
LASTLY…
▸ Created unlogged tables
▸ Modified the PostgreSQL configuration:
▸ Set checkpoints to 5 minutes (max) instead of 1
▸ Enabled fsync
▸ Set max_wal_size to 256
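In postgresql.conf those tweaks might look roughly like this; the slide gives "256" for max_wal_size without units, so the unit below is an assumption, as is the exact checkpoint parameter name:

```ini
# postgresql.conf bulk-load tuning (values from the slide; units assumed)
checkpoint_timeout = 5min   # checkpoint at most every 5 minutes instead of 1
fsync = on                  # as stated on the slide
max_wal_size = 256MB        # slide says "256"; MB assumed
```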
THE RESULTS
A LOOK AT THE DATA
VERACODE
THE RESULTS: A LOOK AT THE DATA
SCAN STATISTICS

Responses               72,193,155
Headers                 525,385,900
JS Results              1,943,925
URLs w/Errors           67,315
Redirected to HTTPS     145,268
URLs w/CSP Violations   740
Scan Time               15 Hours
Cost                    $343 / 35,063円
VERACODE
THE RESULTS: A LOOK AT THE DATA
CSP VIOLATIONS
▸ 722 out of 4,965 sites using CSP had violations
▸ Security sites:
▸ https://www.globalsign.com/en/, http://secunia.com/,
▸ https://lastpass.com/, https://www.avant.com/, http://www.veracode.com/
▸ Well known organizations:
▸ http://www.alibaba.com, https://www.doubleclickbygoogle.com
▸ https://mozillians.org/en-US/
VERACODE
THE RESULTS: A LOOK AT THE DATA
SUM OF CSP VIOLATION TYPES
[Bar chart, descending: script-src, img-src, frame-src, font-src, style-src, connect-src, media-src, child-src, object-src, base-uri, form-action, manifest-src]
VERACODE
THE RESULTS: A LOOK AT THE DATA
TOP JAVASCRIPT LIBRARIES (> 3,000 DETECTIONS)
[Bar chart, descending: JQUERY, JQUERY-UI, MODERNIZR, JQUERY-UI-DIALOG, YEPNOPE, JQUERY-UI-AUTOCOMPLETE, JQUERY-UI-TOOLTIP, BOOTSTRAP, HTML5SHIV, UNDERSCORE, JQUERY.PRETTYPHOTO, PROTOTYPEJS, DRUPAL, MOOTOOLS, MEJS, BACKBONE.JS, ANGULARJS, FOUNDATION, JWPLAYER, REQUIREJS, HANDLEBARS.JS, HAMMERJS, JPLAYER, MUSTACHE.JS, SCRIPTACULOUS, SHADOWBOX, ZEROCLIPBOARD, YUI, RAPHAEL, DATATABLES, KNOCKOUT]
VERACODE
THE RESULTS: A LOOK AT THE DATA
JAVASCRIPT 'NEXTGEN' FRAMEWORKS (> 100 DETECTIONS)
[Bar chart, descending: BACKBONE.JS, ANGULARJS, FOUNDATION, YUI, KNOCKOUT, DOJO, REACTJS, MARIONETTEJS, VUEJS, EMBER, METEOR, MITHRIL, EXTJS, POLYMER]
VERACODE
THE RESULTS: A LOOK AT THE DATA
VULNERABILITY COUNTS
[Bar chart, descending: JQUERY, JQUERY-UI-DIALOG, JQUERY.PRETTYPHOTO, ANGULARJS, JQUERY-UI-TOOLTIP, JPLAYER, HANDLEBARS.JS, ZEROCLIPBOARD, MUSTACHE.JS, YUI, PROTOTYPEJS, MEJS, JWPLAYER, DOJO, EMBER, TINYMCE, PLUPLOAD, JQUERY-MOBILE, CKEDITOR]
VERACODE
THE RESULTS: A LOOK AT THE DATA
LONGEST SECURITY HEADER AWARD - HTTPS://WWW.INSIGHTGUIDES.COM/

Content-Security-Policy:
default-src 'self' http://tagmanager.google.com https://tagmanager.google.com https://*.doubleclick.net http://*.doubleclick.net https://*.google-analytics.com http://*.google-analytics.com https://*.livechatinc.com http://*.livechatinc.com https://*.cloudfront.net http://*.cloudfront.net https://*.googleusercontent.com http://*.googleusercontent.com https://www.bugherd.com http://www.bugherd.com https://*.braintreegateway.com http://*.braintreegateway.com https://www.biblioimages.com http://www.biblioimages.com https://fonts.gstatic.com http://fonts.gstatic.com https://*.googleapis.com http://*.googleapis.com https://tripadvisor.com http://tripadvisor.com https://*.gstatic.com http://*.gstatic.com https://www.tripadvisor.com http://www.tripadvisor.com https://www.insightguides.com http://www.insightguides.com https://rum-static.pingdom.net http://rum-static.pingdom.net https://rum-collector.pingdom.net http://rum-collector.pingdom.net https://*.youtube.com http://*.youtube.com https://www.googleadservices.com http://www.googleadservices.com https://connect.facebook.net http://connect.facebook.net https://googleads.g.doubleclick.net http://googleads.g.doubleclick.net https://www.facebook.com http://www.facebook.com https://cdn.inspectlet.com http://cdn.inspectlet.com https://hn.inspectlet.com http://hn.inspectlet.com https://*.apa.yoda.site http://*.apa.yoda.site https://www.preprod.apa.yoda.site http://www.preprod.apa.yoda.site https://www.test.apa.yoda.site http://www.test.apa.yoda.site https://www.google.com http://www.google.com https://www.google.pl http://www.google.pl https://www.google.co.uk http://www.google.co.uk https://google.com http://google.com https://google.pl http://google.pl https://google.co.uk http://google.co.uk https://ethn.io http://ethn.io https://stats.g.doubleclick.net http://stats.g.doubleclick.net https://platform.instagram.com http://platform.instagram.com https://instagram.com http://instagram.com https://www.instagram.com http://www.instagram.com https://*.amazonaws.com http://*.amazonaws.com blob:;
script-src 'self' http://www.googletagmanager.com https://www.googletagmanager.com http://tagmanager.google.com https://tagmanager.google.com https://*.doubleclick.net http://*.doubleclick.net https://*.google-analytics.com http://*.google-analytics.com https://*.livechatinc.com http://*.livechatinc.com https://*.cloudfront.net http://*.cloudfront.net https://*.googleusercontent.com http://*.googleusercontent.com https://www.bugherd.com http://www.bugherd.com https://*.braintreegateway.com http://*.braintreegateway.com https://www.biblioimages.com http://www.biblioimages.com https://fonts.gstatic.com http://fonts.gstatic.com https://*.googleapis.com http://*.googleapis.com https://tripadvisor.com http://tripadvisor.com https://*.gstatic.com http://*.gstatic.com https://www.tripadvisor.com http://www.tripadvisor.com https://www.insightguides.com http://www.insightguides.com https://rum-static.pingdom.net http://rum-static.pingdom.net https://rum-collector.pingdom.net http://rum-collector.pingdom.net https://*.youtube.com http://*.youtube.com https://www.googleadservices.com http://www.googleadservices.com https://connect.facebook.net http://connect.facebook.net https://googleads.g.doubleclick.net http://googleads.g.doubleclick.net https://www.facebook.com http://www.facebook.com https://cdn.inspectlet.com http://cdn.inspectlet.com https://hn.inspectlet.com http://hn.inspectlet.com https://*.apa.yoda.site http://*.apa.yoda.site https://www.preprod.apa.yoda.site http://www.preprod.apa.yoda.site https://www.test.apa.yoda.site http://www.test.apa.yoda.site https://www.google.com http://www.google.com https://www.google.pl http://www.google.pl https://www.google.co.uk http://www.google.co.uk https://google.com http://google.com https://google.pl http://google.pl https://google.co.uk http://google.co.uk https://ethn.io http://ethn.io https://stats.g.doubleclick.net http://stats.g.doubleclick.net https://platform.instagram.com http://platform.instagram.com https://instagram.com http://instagram.com https://www.instagram.com http://www.instagram.com https://*.amazonaws.com http://*.amazonaws.com 'unsafe-eval' 'unsafe-inline' https://apis.google.com blob:;
connect-src * 'self' http://tagmanager.google.com https://tagmanager.google.com https://*.doubleclick.net http://*.doubleclick.net https://*.google-analytics.com http://*.google-analytics.com https://*.livechatinc.com http://*.livechatinc.com https://*.cloudfront.net http://*.cloudfront.net https://*.googleusercontent.com http://*.googleusercontent.com https://www.bugherd.com http://www.bugherd.com https://*.braintreegateway.com http://*.braintreegateway.com https://www.biblioimages.com http://www.biblioimages.com https://fonts.gstatic.com http://fonts.gstatic.com https://*.googleapis.com http://*.googleapis.com https://tripadvisor.com http://tripadvisor.com https://*.gstatic.com http://*.gstatic.com https://www.tripadvisor.com http://www.tripadvisor.com https://www.insightguides.com http://www.insightguides.com https://rum-static.pingdom.net http://rum-static.pingdom.net https://rum-collector.pingdom.net http://rum-collector.pingdom.net https://*.youtube.com http://*.youtube.com https://www.googleadservices.com http://www.googleadservices.com https://connect.facebook.net http://connect.facebook.net https://googleads.g.doubleclick.net http://googleads.g.doubleclick.net https://www.facebook.com http://www.facebook.com https://cdn.inspectlet.com http://cdn.inspectlet.com https://hn.inspectlet.com http://hn.inspectlet.com https://*.apa.yoda.site http://*.apa.yoda.site https://www.preprod.apa.yoda.site http://www.preprod.apa.yoda.site https://www.test.apa.yoda.site http://www.test.apa.yoda.site https://www.google.com http://www.google.com https://www.google.pl http://www.google.pl https://www.google.co.uk http://www.google.co.uk https://google.com http://google.com https://google.pl http://google.pl https://google.co.uk http://google.co.uk https://ethn.io http://ethn.io https://stats.g.doubleclick.net http://stats.g.doubleclick.net https://platform.instagram.com http://platform.instagram.com https://instagram.com http://instagram.com https://www.instagram.com http://www.instagram.com https://*.amazonaws.com http://*.amazonaws.com blob:;
VERACODE
THE RESULTS: A LOOK AT THE DATA
SOME OF MY FAVORITE HTTP STATUS LINES
▸ HTTP 500 access denied ("java.io.FilePermission" "D: homeXXXXXXXXX.comoriModelGlueunity eventrequestEventRequest.cfc" "read")
▸ HTTP 500 "Duplicate entry '1473335051' for key 'timestamp' SQL=INSERT INTO `#__zt_visitor_counter` (`id`,`timestamp`,`visits`,`guests`,`ipaddress`,`useragent`) VALUES (null, '1473335051', 1, 1, '54.208.81.16', 'chrome')"
▸ HTTP 500 "Server Made Big Boo"
VERACODE
THE RESULTS: A LOOK AT THE DATA
CONCLUSION
▸ Use NSQ, seriously.
▸ Concurrency can be difficult
▸ Batch data before inserting into the DB
▸ If DB rows > a few million, consider sharding
▸ Test different table schemas for performance
▸ Treat browsers like garbage and handle appropriately
VERACODE
THE RESULTS: A LOOK AT THE DATA
QUESTIONS?
▸ twitter: @_wirepair
▸ github: wirepair
▸ gcd: https://github.com/wirepair/gcd
▸ autogcd: https://github.com/wirepair/autogcd
▸ killface: https://github.com/wirepair/killface
▸ Thanks to all my coworkers for supporting and listening to my daily rants!