The world is y0ur$: Geolocation-based wordlist generation with wordsmith
Sanjiv Kawa | Tom Porter
@hackerjiv | @porterhau5
Formalities
Sanjiv Kawa
@hackerjiv
Sr. Penetration Tester
PSC / NCC Group
• Roots in dev and IT
• Penetration testing
• Binary analysis and exploitation
Tom Porter
@porterhau5
Sr. Security Consultant
FusionX Red Team
• Flow data analytics
• Penetration testing
• Red teaming
• BloodHound extensions
What is Wordsmith?
• Custom wordlist generation
• Crack hashes / password attacks
• Tailored for your target
• Geo-location data
• Modular and extensible
• Username generation
Dictionary Attack
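1. Guess: take a candidate word from the wordlist
apple
banana
cherry
…
2. Hash: compute the candidate's hash (password hashes are one-way, so the guess is hashed rather than encrypted)
$hash <- hash(apple)
$hash : 5ebe7dfa074da8ee8aef1faa2bbde876
3. Compare: search for $hash in the obtained hash list
af5432a79b941528fa7fac9e7e391651
5ebe7dfa074da8ee8aef1faa2bbde876
8846f7eaee8fb117ad06bdd830b7586c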
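The three-step loop above can be sketched in a few lines of Python. This is only an illustration: the slide does not name the hash algorithm behind its sample values, so SHA-256 is assumed here, and `dictionary_attack` and the sample hash list are hypothetical names and data made up for the example.

```python
import hashlib


def dictionary_attack(wordlist, target_hashes, algorithm="sha256"):
    """Return {hash: word} for every candidate whose digest is in target_hashes."""
    targets = {h.lower() for h in target_hashes}
    recovered = {}
    for word in wordlist:                       # 1. guess
        digest = hashlib.new(algorithm, word.encode("utf-8")).hexdigest()  # 2. hash
        if digest in targets:                   # 3. compare
            recovered[digest] = word
    return recovered


# Hypothetical "obtained" hashes, generated here so the example is self-contained.
leaked = [hashlib.sha256(w.encode("utf-8")).hexdigest() for w in ("cherry", "zebra")]
print(dictionary_attack(["apple", "banana", "cherry"], leaked))
```

Real cracking tools such as Hashcat perform the same comparison at far higher speed; the quality of the wordlist decides how many guesses land.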
Wordsmith v1: Geo-location Data Collected
• Major league sports teams
• Colleges and universities
• Common names
• Area codes
• Zip codes
• Streets and roads
• Landmarks
• Cities, towns, etc.
Wordsmith v1: Additional Features
• CeWL integration
• Basic mangling (whitespace, specials, split on space)
• Specify minimum character length
• Convert to lowercase [a-z]
Wordsmith v1: Things we learned
Feedback from the community was incredible. Thank you!
Top three requests:
1. More countries need to be available (v1 was US only)
2. There needs to be a way to introduce more data, including your own
3. Support for languages other than English (v1 was English only)
Wordsmith v2
• New CLI design
• Multi-language (13 so far!)
• Introduced religions
• Generate usernames
• Modular framework allows for user contribution and extensibility
• Geo-location data sets for over 230 countries!
Data Sources
Coverage: World
Data types: Population, Religion, Languages, etc.
Source: www.cia.gov/library/publications/the-world-factbook/geos/print_[aa-zz].html
Coverage: 13 languages (hunspell)
Coverage: US
Data Types: Sports teams, colleges
Coverage: World
Data Types: Landmarks and archeological sites
Coverage: World
Data Types: Religious texts
Coverage: World
Data Types: Roads, Cities, Counties
Coverage: US
Data Types: Popular first names, last names
Coverage: US
Data Types: Area Codes, Zip Codes
How to get Wordsmith
❯ git clone https://github.com/skahwah/wordsmith.git
❯ cd wordsmith
❯ bundle install # (optional for CeWL integration)
❯ ruby wordsmith.rb
wordsmith v2.0.7
Written by: Sanjiv "Trashcan Head" Kawa & Tom "Pain Train" Porter
Twitter: @hackerjiv & @porterhau5
[*] Hello new wordsmither!
[*] This script will remove the data/ directory in the current working directory. Enter 'y' to continue: y
[*] Just need to unpack some files (Running: tar -xf data.tar.xz)
[*] Unpack completed!
[*] CeWL found: /usr/bin/cewl
Files
❯ ls -l
-rw-r--r-- 1 user staff 3159 Oct 1 22:57 CHANGELOG.md
drwxr-xr-x 2 user staff 4096 Oct 1 22:57 data
-rw-r--r-- 1 user staff 50602888 Oct 1 22:57 data.tar.xz
-rw-r--r-- 1 user staff 116 Oct 1 22:57 Gemfile
-rw-r--r-- 1 user staff 1393 Oct 1 22:57 LICENSE
-rw-r--r-- 1 user staff 7514 Oct 1 22:57 README.md
-rwxr-xr-x 1 user staff 31081 Oct 1 22:57 wordsmith.rb
• View the README first, or check out the -E option (examples)
• wordsmith.rb: primary ruby script
• data.tar.xz (~50 MB): compressed archive of data
• data/ (~250 MB): data arranged in hierarchy
Boundaries & Attributes
Boundaries (-I <input>)
• Areas of the world to get words for
• 249 countries and territories
• States/Provinces
• Cities
• Custom regions
Attributes (ex: -r -l)
• Types of words to grab:
– Cities
– Colleges
– Landmarks
– Languages
– Names
– Roads
– Religions
– and more…
❯ ruby wordsmith.rb -I usa -r -l
Structure
❯ ls data/
abw afg ago aia ala alb and are arg arm ... wlf wsm yem zaf zmb zwe
ISO ALPHA-3 Country Codes
❯ ls data/usa
ak al ar az ca cia.txt co ct dc ... tx usa.yaml ut va vt wa wi wv wy
States, Provinces, Counties, Municipalities
❯ ls data/usa/nc
areacodes.txt charlotte cities.txt colleges.txt counties.txt ...
Cities, Counties
❯ ls data/usa/nc/charlotte
sports.txt
Attributes (sports, colleges, roads, etc.) are .txt files
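To make the hierarchy concrete, here is a minimal Python sketch of how a boundary and an attribute could map onto a file in that tree. It is not taken from wordsmith.rb; `load_attribute` is a hypothetical helper, and it assumes one word per line in each attribute file.

```python
from pathlib import Path


def load_attribute(data_root, boundary, attribute):
    """Read <data_root>/<country>/<state>/<city>/<attribute>.txt for a boundary
    such as 'usa-nc-charlotte'. Returns [] when the file does not exist."""
    path = Path(data_root).joinpath(*boundary.split("-")) / f"{attribute}.txt"
    if not path.is_file():
        return []
    with path.open(encoding="utf-8") as handle:
        # Assumption: one word or phrase per line, blank lines ignored.
        return [line.strip() for line in handle if line.strip()]
```

With the unpacked data directory in place, `load_attribute("data", "usa-nc-charlotte", "sports")` would read `data/usa/nc/charlotte/sports.txt`.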
Boundaries and Input
❯ ruby wordsmith.rb -I usa [options]
❯ ruby wordsmith.rb -I usa-nc [options]
❯ ruby wordsmith.rb -I usa-nc-charlotte [options]
❯ ruby wordsmith.rb -I usa,can [options]
❯ ruby wordsmith.rb -I usa-dc,usa-md,usa-va [options]
-I for specifying input boundaries
Can supply one or many boundaries
❯ ruby wordsmith.rb -I 10 [options]
Providing a number (ex: 10) will select the N most populous countries
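The input forms shown above (single boundary, comma-delimited list, or a number) can be told apart with very little logic. The Python sketch below is an assumption about how such parsing might look, not the tool's actual code; `parse_input` is a hypothetical name, and region aliases such as `europe` are not resolved here.

```python
def parse_input(value):
    """Turn an -I value into ('top', N) or ('boundaries', [[country, state, city], ...])."""
    value = value.strip()
    if value.isdigit():
        # A bare number means "the N most populous countries".
        return ("top", int(value))
    boundaries = [
        item.strip().lower().split("-")
        for item in value.split(",")
        if item.strip()
    ]
    return ("boundaries", boundaries)


print(parse_input("usa-dc,usa-md,usa-va"))
print(parse_input("10"))
```

Each inner list lines up with one directory level under `data/`, so `['usa', 'dc']` corresponds to `data/usa/dc`.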
Regions
❯ ruby wordsmith.rb -I europe [options]
❯ grep europe data/regions.csv
europe,"Continent of Europe",ala alb and arm aut aze bel bgr bih blr che
cyp cze deu dnk esp est fin fra fro gbr geo ggy gib grc hrv hun imn irl
isl ita jey kaz lie ltu lux lva mco mda mkd mlt mne nld nor pol prt rou
rus sjm smr srb svk svn swe tur ukr vat
regions.csv contains custom groupings of boundaries
List the available regions with the -R option:
❯ ruby wordsmith.rb -R
Alias: newengland
Description: US - New England
Members: usa-ct usa-me usa-ma usa-nh usa-ri usa-vt
Alias: mideast
Description: US - Mideast
Members: usa-de usa-dc usa-md usa-nj usa-ny usa-pa
Alias: greatlakes
Description: US - Great Lakes
Members: usa-il usa-in usa-mi usa-oh usa-wi
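A region file in this shape is easy to read with a standard CSV parser. The Python sketch below assumes each region is a single CSV line with three fields (alias, quoted description, space-separated members), as the `grep europe` output suggests once its line wrapping is undone. `load_regions` is a hypothetical helper, and the sample line is reconstructed from the `-R` output rather than copied from the file.

```python
import csv
import io


def load_regions(csv_text):
    """Map each region alias to its description and list of member boundaries."""
    regions = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        alias, description, members = row[0], row[1], row[2]
        regions[alias] = {"description": description, "members": members.split()}
    return regions


# Assumed line format, rebuilt from the -R listing shown on the slide.
sample = 'newengland,"US - New England",usa-ct usa-me usa-ma usa-nh usa-ri usa-vt\n'
regions = load_regions(sample)
print(regions["newengland"]["members"])
```

Adding a custom region would then amount to appending one more line in the same format.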
Attributes
❯ ruby wordsmith.rb -I europe [options]
❯ ruby wordsmith.rb -h
Main Arguments:
-I, --input <input> Comma-delimited list of inputs
Input Options:
-a, --all Grab all options
-b, --other Grab other miscellaneous attributes
-e, --cia Grab demographics compiled by the CIA
-c, --cities Grab all city names
-f, --colleges Grab all college sports
-l, --landmarks Grab all landmarks
-v, --language Grab the most popular language(s)
-N, --all-names Grab all first names and last names
-G, --first-names Grab all first names
-L, --last-names Grab all last names
-F, --female-fnames Grab all female first names
-M, --male-fnames Grab all male first names
-p, --phone Grab all area codes
-r, --roads Grab all road names
-g, --religion Grab the most popular religious text(s)
-t, --teams Grab all major sports teams
-u, --counties Grab all counties
-z, --zip Grab all zip codes
Attribute Examples
❯ ruby wordsmith.rb -I usa-ca -z
90001
90002
90003
90004
...
Grab all zip codes for California
❯ ruby wordsmith.rb -I gbr-eng -r -c -l
Ab Kettleby
Abberley
Abberton
Abbess Roding
...
Grab all roads, cities, and landmarks for England, GBR
❯ ruby wordsmith.rb -I asia -a
Abas
Abatan
Abbeg
Abejao
...
Grab all attributes for Asia
Country Metadata
❯ ls -l data/jpn/
-rw-r--r-- 1 user staff 32002 Aug 30 19:16 cia.txt
-rw-r--r-- 1 user staff 13184 Sep 9 2016 cities.txt
-rw-r--r-- 1 user staff 5608 Sep 9 2016 counties.txt
-rw-r--r-- 1 user staff 107 Aug 30 19:36 jpn.yaml
-rw-r--r-- 1 user staff 113672 Oct 1 21:10 landmarks.txt
-rw-r--r-- 1 user staff 871994 Sep 9 2016 roads.txt
❯ cat data/jpn/jpn.yaml
config:
  population: 126,702,133
  language_1: Japanese
  religion_1: Shintoism
  religion_2: Buddhism
The World Factbook provides:
• Population: most populous countries (ex: -I 25)
• Official languages (-v, --language)
• Most popular religions (-g, --religion)
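As an illustration of how the population field could drive a numeric `-I` value, the Python sketch below ranks countries by the `population:` line of their metadata. It is a guess at the approach, not the tool's implementation: `parse_population` and `most_populous` are hypothetical names, the YAML is read with a regular expression instead of a YAML library to stay dependency-free, and the `aaa`/`bbb` entries are made-up placeholders.

```python
import re


def parse_population(yaml_text):
    """Extract the comma-grouped population figure from a country's metadata text."""
    match = re.search(r"^\s*population:\s*([\d,]+)\s*$", yaml_text, flags=re.MULTILINE)
    return int(match.group(1).replace(",", "")) if match else 0


def most_populous(metadata, n):
    """Return the n country codes with the largest population, biggest first."""
    ranked = sorted(metadata, key=lambda code: parse_population(metadata[code]), reverse=True)
    return ranked[:n]


metadata = {
    "jpn": "config:\n  population: 126,702,133\n  language_1: Japanese\n",
    "aaa": "config:\n  population: 1,000\n",          # placeholder, not real data
    "bbb": "config:\n  population: 2,000,000,000\n",  # placeholder, not real data
}
print(most_populous(metadata, 2))
```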
Religions
❯ wc -l data/religion/*
28168 douay-rheims-parsed.txt
97682 king-james-bible-book-verse.txt
20190 king-james-bible-parsed.txt
42876 niv-bible-parsed-spanish.txt
34202 niv-bible-parsed.txt
7872 quran-parsed-eng.txt
❯ cat king-james-bible-book-verse.txt
The First Book of Moses: Called Genesis
Genesis1:1
1:1Genesis
John3:16
3:16John
...
❯ cat king-james-bible-parsed.txt
...
Jesuite
Jesus
Jether
Jetheth
Jethro
...
(-g, --religion)
Identified the most common religions:
• KJV Bible
• NIV Bible
• Douay Rheims
• Quran
~200 countries are covered
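The book-verse file pairs each reference in two orders (`John3:16` and `3:16John`). A small Python sketch of that pattern follows; `verse_candidates` is a hypothetical helper, and how the real file was produced is not shown in the slides.

```python
def verse_candidates(book, chapter, verse):
    """Return both orderings of a book and chapter:verse reference."""
    reference = f"{chapter}:{verse}"
    return [f"{book}{reference}", f"{reference}{book}"]


print(verse_candidates("John", 3, 16))  # ['John3:16', '3:16John']
```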
Languages
❯ head -n 5 language-frequency.txt
83:English
38:French
29:Spanish
26:Arabic
11:Russian
❯ wc -l data/languages/*.txt
457097 arabic.txt
47866 bahasa.txt
110750 bengali.txt
115485 cedict.txt
466544 english.txt
72038 french.txt
585844 german.txt
338534 hebrew.txt
15990 hindi.txt
95152 italian.txt
47866 malay.txt
340235 portuguese.txt
379324 russian.txt
798915 spanish.txt
371169 turkish.txt
(-v, --language)
Identified the most common languages
~195 countries are covered
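The `count:Language` lines in language-frequency.txt can be parsed and matched to the per-language wordlists with a short Python sketch. `parse_language_frequency` is a hypothetical helper, the slides do not say what the leading number counts, and the lowercase-name-to-filename mapping is an assumption based on the directory listing.

```python
def parse_language_frequency(lines):
    """Parse 'count:Language' lines into (count, language) tuples, largest count first."""
    entries = []
    for line in lines:
        count, _, language = line.strip().partition(":")
        if count.isdigit() and language:
            entries.append((int(count), language))
    return sorted(entries, reverse=True)


sample = ["83:English", "38:French", "29:Spanish", "26:Arabic", "11:Russian"]
for count, language in parse_language_frequency(sample)[:3]:
    # Assumed mapping from language name to wordlist file.
    print(count, f"data/languages/{language.lower()}.txt")
```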
Modular Design
❯ ls data/usa/mn/
areacodes.txt colleges.txt fnames.txt landmarks.txt sports.txt
cities.txt counties.txt lakes.txt roads.txt zipcodes.txt
❯ cat data/usa/mn/lakes.txt
Aaron
Abbey
Acorn
Adelman's Pond
...
❯ ruby wordsmith.rb -I usa-mn -b
Aaron
Abbey
Acorn
Adelman's Pond
...
Modular design:
• Easily extensible
• Introduce your own .txt files (grab with -b option)
• Contribute and help build the project
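One way to picture the "other" attribute is as any .txt file in a boundary directory that is not one of the built-in attribute files. The Python sketch below follows that idea. It is an assumption about the behavior, not the tool's code: `other_attribute_files` is a hypothetical helper, and `KNOWN_ATTRIBUTES` is only inferred from the file names visible in the listings.

```python
from pathlib import Path

# Assumed set of built-in attribute files, taken from the directory listings.
KNOWN_ATTRIBUTES = {
    "areacodes", "cia", "cities", "colleges", "counties",
    "fnames", "landmarks", "roads", "sports", "zipcodes",
}


def other_attribute_files(boundary_dir):
    """Return the names of .txt files that are not built-in attributes."""
    return sorted(
        path.name
        for path in Path(boundary_dir).glob("*.txt")
        if path.stem not in KNOWN_ATTRIBUTES
    )
```

For the Minnesota directory shown above, this would pick out `lakes.txt`, which is the file whose contents the `-b` run prints.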
Output Options
❯ ruby wordsmith.rb -h
<Input options snipped>
Output Options:
-o, --output FILE The filename for writing output
-q, --quiet Don't show words, use with -o option
-k, --min-length LEN Minimum length of word to include
-n, --max-length LEN Maximum length of word to include
-D, --complexity Words meet Windows default complexity
-j, --lowercase Convert all words to lowercase
-w, --specials Add words with special chars removed
-x, --spaces Add words with spaces removed
-y, --split Split words by space and add
-m, --mangle Add all permutations (-w, -x, -y)
-P, --prepend-phones Prepend state area codes to each word
-A, --append-phones Append state area codes to each word
-X, --prepend-zips Prepend zip codes to each word
-Z, --append-zips Append zip codes to each word
-W, --prepend-wordlist FILE Prepend words in FILE to each word
-Y, --append-wordlist FILE Append words in FILE to each word
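The mangling and prepend/append options can be sketched as simple string transforms. The Python below is a hedged illustration, not wordsmith's implementation: `mangle` and `affix` are hypothetical names, "specials" is assumed to mean anything that is neither a word character nor whitespace, and the three mangling transforms are assumed to be applied independently rather than combined.

```python
import re


def mangle(word):
    """Return the word plus its -w, -x and -y style variants, without duplicates."""
    variants = [word]
    variants.append(re.sub(r"[^\w\s]", "", word))  # -w: special characters removed
    variants.append(word.replace(" ", ""))          # -x: spaces removed
    variants.extend(word.split())                    # -y: split on whitespace
    unique = []
    for variant in variants:
        if variant and variant not in unique:
            unique.append(variant)
    return unique


def affix(words, extras, prepend=False):
    """Combine every word with every extra (area code, zip code, custom word)."""
    return [f"{e}{w}" if prepend else f"{w}{e}" for w in words for e in extras]


print(mangle("Adelman's Pond"))
print(affix(["Pond"], ["612"]))  # "612" is just a sample area code value
```

Because every option multiplies the list, combining mangling with appended codes grows the output quickly; the length filters (-k, -n) are the natural way to keep it in check.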
Username Generation
❯ ruby wordsmith.rb -h
<other options snipped>
Username Generation Options:
--filn FirstInitialLastName (bsmith)
--fnln FirstNameLastName (bobsmith)
--fnli FirstNameLastInitial (bobs)
--lnfi LastNameFirstInitial (smithb)
--lnfn LastNameFirstName (smithbob)
--fidln FirstInitial.LastName (b.smith)
--fndln FirstName.LastName (bob.smith)
--truncate LEN Truncate username at LEN number of chars (bobsmi)
--max-users LEN Max number of usernames to generate
--name-depth LEN Num of first/last names to iterate over
(default:100, 0 will get all)
• Generate different username formats
• Use --max-users and --name-depth to handle speed & volume
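The seven formats in the help text reduce to small string recipes. The Python sketch below mirrors them under the assumption that names are non-empty and that case is left untouched; `FORMATS` and `username` are hypothetical names, not part of wordsmith.

```python
# Each key matches a flag from the help text; f is the first name, l the last name.
FORMATS = {
    "filn": lambda f, l: f[0] + l,         # bsmith
    "fnln": lambda f, l: f + l,            # bobsmith
    "fnli": lambda f, l: f + l[0],         # bobs
    "lnfi": lambda f, l: l + f[0],         # smithb
    "lnfn": lambda f, l: l + f,            # smithbob
    "fidln": lambda f, l: f[0] + "." + l,  # b.smith
    "fndln": lambda f, l: f + "." + l,     # bob.smith
}


def username(first, last, style, truncate=None):
    """Build one username; truncate cuts it to at most that many characters."""
    name = FORMATS[style](first, last)
    return name[:truncate] if truncate else name


print(username("bob", "smith", "fidln"))
print(username("bob", "smith", "fnln", truncate=6))
```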
❯ ruby wordsmith.rb -I usa --fnln
JamesSmith
JamesJohnson
JamesWilliams
JamesBrown
JamesJones
JamesGarcia
JamesMiller
...
First name Last Name
❯ ruby wordsmith.rb -I usa --fndln
James.Smith
James.Johnson
James.Williams
James.Brown
James.Jones
James.Garcia
James.Miller
...
First name (dot) Last Name
❯ ruby wordsmith.rb -I usa --filn --truncate 8
...
aDavis
aRodrigu
aMartine
aHernand
aGonzale
aWilson
aAnderso
...
Truncate down to 8 characters
❯ ruby wordsmith.rb -I usa --lnfn -q
usernames in ./data/usa: 10000
❯ ruby wordsmith.rb -I usa --lnfn -q --name-depth 250
usernames in ./data/usa: 62500
❯ ruby wordsmith.rb -I usa --lnfn -q --name-depth 1000
usernames in ./data/usa: 1000000
Adjust --name-depth to generate more usernames
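The counts above are consistent with pairing the top N first names with the top N last names: 100 × 100 = 10,000, 250 × 250 = 62,500 and 1,000 × 1,000 = 1,000,000. The Python sketch below reproduces that growth with placeholder names. `usernames` is a hypothetical function, and treating `--max-users` as a cap on the total is an assumption.

```python
from itertools import islice, product


def usernames(first_names, last_names, name_depth=100, max_users=None):
    """Pair the top name_depth first and last names (0 means all) in lnfn order."""
    firsts = first_names if name_depth == 0 else first_names[:name_depth]
    lasts = last_names if name_depth == 0 else last_names[:name_depth]
    pairs = (last + first for first, last in product(firsts, lasts))
    return list(islice(pairs, max_users))  # max_users=None keeps everything


# Placeholder names; the real lists would come from the names attribute files.
firsts = [f"First{i}" for i in range(1000)]
lasts = [f"Last{i}" for i in range(1000)]
print(len(usernames(firsts, lasts)))                  # depth 100
print(len(usernames(firsts, lasts, name_depth=250)))  # depth 250
```

Since output grows with the square of the depth, a modest increase in `--name-depth` has a large effect on run time and file size.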
Multinational Organization Results
• Organization has offices in USA, Australia and Canada
• Unable to disclose total number of hashes
Wordlist | Hashcat run time | Number of passwords recovered
Top 10k (10k words) | 4 sec | 256
Rockyou (14.4m words) | 30 mins | 476
AUS, CAN, USA Wordlist (7.3m words) | 13 mins | 241
❯ ruby wordsmith.rb -I aus,can,usa -a -j -q -m -o aus-can-usa-all-lowercase-q-m.txt
• Collecting and collating this data required the development of some parsers
Parsers
❯ git clone https://github.com/skahwah/wordsmith_parsers.git
❯ ls
LICENSE cia-parsers landmark-parser osm-parsers
README.md census-parsers names-parsers religion-parsers
https://github.com/skahwah/wordsmith_parsers
Future Work
• Data!
– Diving deeper into OpenStreetMap
– Popular song lyrics (h/t @pfizzell)
– Got ideas? We’d love to hear them!
• Skills
– GIS
– Multiple language speakers
– Obscure website hunting & scraping
• Design
– Lookups based on coordinates
Thank you!
Sanjiv Kawa
@hackerjiv
Sr. Penetration Tester
PSC / NCC Group
Tom Porter
@porterhau5
Sr. Security Consultant
FusionX Red Team
https://github.com/skahwah/wordsmith