Turbocharge Your Front-End Talent Sourcing Efforts for Free: How Non-Programmers Can Automate Name-Generation Through Web Scraping - presented by Glenn Gutmacher at SourceCon 2013


Turbocharge Your Front-End Talent Sourcing Efforts for (Nearly) Free: How Non-Programmers Can Automate Name-Generation Through Web Scraping

by Glenn Gutmacher, North America Sourcing Group Manager, Avanade Inc. & founder, Recruiting-Online.com

Does your sourcing need to support multiple recruiters, or do you manage a recruiting team that needs to figure out how to scale its sourcing efforts more efficiently? Whether your leads go into an ATS or a recruiting CRM, you first need to look at the front end of the funnel. Are you getting enough of the active and (particularly) passive candidates you need? Learn how to solve one key piece of the puzzle from a 1990s pioneer in Internet sourcing who has trained and/or supported large sourcing/recruiting teams at various corporate and third-party recruiting operations ever since.

Multi-resume database search/retrieve tools and huge social network profile repositories are great, but what if you can't afford their license fees, or you're wondering what else you can do? You can parse passive candidates from the Deep Web efficiently for about $60 (lifetime) with OutWit Hub, or automate similar activities with iMacros. Are you using feeds to pull relevant candidates (resumes, profiles and even more passive online footprints) from the open web to you on an ongoing basis, analogous to job boards' resume alerts? Bing results via RSS can do that for you, or Yahoo Pipes can do it in an even more robust way, all for free. Learn how to use powerful free (or near free) tools in all these categories in a real recruiting context (we'll go live online), not just bullets on a slide.

About the presenter: Since September 2010, Glenn Gutmacher has been developing innovative sourcing strategies, methods and tools in ways that scale cost-effectively for Avanade, a $1 billion IT consulting firm and most-awarded Microsoft Gold partner, jointly owned by Accenture and Microsoft since its founding in 2000. He is primarily focused on front-end candidate pipelining methodologies, leading an online-focused offshore team and a junior onshore calling team that support North America Recruiting. He also leads some global recruiting training and sourcing initiatives.
In the 1990s, Glenn created one of the recruiting industry's first Internet sourcing seminars, training recruiters and sourcers from hundreds of companies ranging from the Fortune 500 to small staffing firms. Recruiting-Online.com remains the world's longest continuously-running, self-paced 100% online course for learning candidate sourcing.
Glenn was a senior Internet researcher for Microsoft from July 2005 to September 2008, focusing on competitive intelligence and proactive international sourcing. He spent the two previous years sourcing at Getronics North America (2003-05), where his work contributed meaningfully to two finalist nominations in ERE's 2005 Excellence Awards. Glenn entered the recruiting field in 1996 by founding one

Published in: Technology, Design


  1. 1. Turbocharge Your Front-End Talent Sourcing Efforts for (Nearly) Free: How Non-Programmers Can Automate Name-Generation Through Web Scraping
     Glenn Gutmacher, North America Sourcing Group Manager, Avanade Inc. & founder, Recruiting-Online.com
     Glenn.Gutmacher@avanade.com
     Please connect - www.linkedin.com/in/gutmach
     Presented by Glenn Gutmacher 2013.
  2. 2. Why this topic? What are we talking about?
     • How many of you are sourcing-focused individual contributors (i.e., you don't manage a recruiting team)?
     • How secure are you about your job?
     • People information is easier to come by online
       – Proliferating thanks largely to social networks
     • But still hard and/or time-consuming to parse
       – More junk mixed in with the results (miss the Y2K era?)
       – Need even more targeted search strings
       – Need efficient tools
       – Learning curve
       – Cost
  3. 3. Which would you rather do?
     • An expensive software suite, but with more bells and whistles and a modest learning curve (e.g., Broadlook, eGrabber)
     vs.
     • An inexpensive tool that requires a bit more tech savvy and lacks a bit on functionality (e.g., Outwit Hub)
     vs.
     • Build it yourself free…if you can (e.g., your name is Mike Notaro or John Turnberg)
     Broadlook Diver is the more robust version of Outwit Docs, and Broadlook Eclipse is the easier equivalent to Outwit Hub Pro… for about $2,500 more per tool per person! (Other Broadlook suite products, notably Profiler and MarketMapper, have no Outwit equivalent.)
  4. 4. Broadlook Diver
     • You can add favorites (just like a web browser)
     • Pre-load a bunch of saved searches (.blf file)
     • Embed prompts for keyword variations to enter in search strings
     Example resume search (Resumes – Google: ~CV you):
     • Note the prompts for keyword, job title, etc.
     • Dive for: Resumes > Dive into Results
     • Parses the resume body
     • Filter results containing desired fields (name, title, company, phone and/or email)
     • Export to Excel
  5. 5. Broadlook Diver
     • User-friendly to filter results by keyword
     Example - contact search (Cardiology Nurses TX):
     • Dive for: Contacts > Dive into results
     • Same filters and export
     Example - Social Networks > Google - LinkedIn profiles (accountant “greater Boston area”):
     • Magic Dive LinkedIn
     • Tool doesn’t parse as many fields as it used to
  6. 6. Outwit Docs – download resumes, etc.
     The subset of Outwit Hub functionality described on this slide, downloading actual files in bulk, is something you can also do with the free Outwit Docs tool.
     To download resumes or other documents:
     1. Example – in Outwit, Google this: ~cv xamarin android education present (filetype:doc OR filetype:docx OR filetype:pdf)
     2. Click Documents in left-hand column navigation menu
     3. Click any result in main pane, then Select All (Ctrl+A)
     4. Right mouse click, select “Download Selected Files in” (specify desired folder)
     5. All linked resumes in the Google search results are downloaded in a few seconds!
     What if it’s >1 page of results?
     1. Google this in Outwit: resume java developer android NY 10001..11999 (filetype:doc OR filetype:docx OR filetype:pdf)
     2. Do steps 2-4 of the previous list
     3. Click Page in Outwit's left-column nav
     4. Click Next button at bottom of Google results page
     5. Do steps 2-4 of the previous list again to download the new set of corresponding documents!
  7. 7. Associations and Virtual Communities (MeetUps, portfolio sites, etc.)
     Many technical and other communities exist online with plenty of info about individual talent, collected into similar-skilled buckets. You can web scrape *any* of these, but check the site’s terms of service.
     Some are best searched directly within the site using its native search, e.g.:
     • profiles of users on Github
     • MeetUp.com (and see this article about how to source from them)
     • portfolios on Coroflot or Behance.net
     • Outwit Hub example: http://portfolios.aiga.org
     While others may yield better results using a search engine, e.g.:
     • StackOverflow.com (and its technical sister sites) – try Googling:
       – TECH TERMS: site:stackoverflow.com inurl:users intitle:user sitecore
       – LOCATIONS: site:stackoverflow.com inurl:users intitle:user (houston OR texas OR "tx")
     • Yes, you can web scrape the content linked from Google results, too!
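The two StackOverflow search strings on this slide share one shape, so a team could generate variations rather than retype them. The following is a minimal Python sketch, not part of the original talk; the function name `xray_query` and its parameters are hypothetical, and it only builds the query text for pasting into Google.

```python
def xray_query(site, terms, url_part="users", title_part="user"):
    """Build a Google search string shaped like the slide's StackOverflow examples.

    All names here are illustrative; nothing is sent to Google.
    """
    def quote_if_needed(term):
        # Wrap multi-word terms in double quotes so they match as a phrase.
        return f'"{term}"' if " " in term else term

    if len(terms) == 1:
        clause = quote_if_needed(terms[0])
    else:
        clause = "(" + " OR ".join(quote_if_needed(t) for t in terms) + ")"
    return f"site:{site} inurl:{url_part} intitle:{title_part} {clause}"


print(xray_query("stackoverflow.com", ["sitecore"]))
print(xray_query("stackoverflow.com", ["houston", "texas", "tx"]))
```

Swapping the term list is then enough to produce a tech-term or a location variant of the same search.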
  8. 8. Outwit Hub – one-step data scraping off a webpage using Guess function
     Sometimes it’s very easy for Outwit to guess what you want to do with data on a webpage, and it will scrape it with just one click!
     1. Example - http://partywithpalermo-eorg.eventbrite.com
     2. Scroll down page to click “Show More” at bottom of name list.
     3. Click Guess button
     4. Click on a result, right mouse click and Select All, then Export
     Some simple find/replace changes in Excel make this ready to share.
  9. 9. Outwit – scrape LinkedIn Public Profiles
     You don’t need to be a LinkedIn member (nor be logged in if you are) to use this (but don’t worry, LinkedIn stakeholders – you’ll see why this helps you, too):
     1. Click Page tab in Outwit left-hand column navigation, go to Google.com, then run this query: site:www.linkedin.com (inurl:pub OR inurl:in) "sitecore developers" -inurl:company -inurl:groups -inurl:dir
     2. Empty the catch (clears out previous searches’ results), then make sure "On page load" has "Catch selection" checkbox selected.
     3. At bottom of last results page, click "repeat the search with the omitted results included", which adds &filter=0 to URL. Who knows why this is important?
     4. Click “Links” tab in Outwit left-column nav. Click the Page Url column heading to sort links by URL. We are only interested in URLs of LI profiles, i.e., in the format www.linkedin.com/in/username or www.linkedin.com/pub/username so...
  10. 10. Outwit – scrape LI Public Profiles (cont’d.)
     You need to know a bit of RegEx (or just copy boldface exactly when you run it):
     5. Use filter "Select row if Page Url" contains /http://www.linkedin.com/(in|pub)// (trim any leading/trailing spaces) and click Catch button. (You will see about 7-10 links added to the bottom catch pane.)
     6. Click Next link at bottom of Google results page and you will see the catch has added another bunch of links from that second page of results.
     7. Click Next link at bottom of 2nd Google results page and you will see the catch has added another bunch of links from this 3rd results page.
     8. Repeat previous step until you are at the end of your Google search results (around 200 results in this case).
     9. Export the catch to a file and run a scraper on it (this is the magic!) that parses data into the proper fields.
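For readers who want to see what the step 5 filter does outside Outwit, here is a rough Python equivalent. It is a sketch under assumptions, not Outwit's own implementation: the pattern escapes the dots and also accepts https, both of which go beyond the slide's filter, and the sample URLs are made up for illustration.

```python
import re

# Assumed adaptation of the slide's filter /http://www.linkedin.com/(in|pub)//:
# dots are escaped and "https" is also accepted.
PROFILE_URL = re.compile(r"^https?://www\.linkedin\.com/(in|pub)/")


def keep_profile_links(urls):
    """Return only links that look like LinkedIn public profile URLs."""
    kept = []
    for url in urls:
        cleaned = url.strip()  # the slide says to trim leading/trailing spaces
        if PROFILE_URL.match(cleaned):
            kept.append(cleaned)
    return kept


# Hypothetical links of the kind a Google results page might contain.
sample_links = [
    "http://www.linkedin.com/in/janedoe",
    " http://www.linkedin.com/pub/john-doe/1/2/3 ",
    "http://www.linkedin.com/company/acme",
    "http://www.google.com/search?q=linkedin",
]
print(keep_profile_links(sample_links))
```

Company, group and directory links fall through because only the /in/ and /pub/ paths match.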
  11. 11. Outwit – scrape LI Public Profiles (cont’d.)
     Creating a scraper:
     • Go to an example page that contains the data you want to scrape (e.g., http://portfolios.aiga.org/ShannonGlutting).
     • Click Scraper in left-column navigation. This creates a two-pane view with the page’s source code on top and the scraper grid on the bottom.
     • The key is finding what surrounds your important data and setting those as the “Marker Before” and “Marker After” (which can include HTML tags). Use the Find: bar between the panes to jump to the part of the page you want.
     • Do this repeatedly for each field you want, then save the scraper.
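The “Marker Before” / “Marker After” idea can be illustrated in a few lines of Python. This is only a sketch of the concept, not how Outwit works internally; the function names, the field names and the HTML snippet are all invented for the example.

```python
def scrape_between(source, marker_before, marker_after):
    """Return the text between two markers, or None if either marker is missing."""
    start = source.find(marker_before)
    if start == -1:
        return None
    start += len(marker_before)
    end = source.find(marker_after, start)
    if end == -1:
        return None
    return source[start:end].strip()


def run_scraper(source, fields):
    """Apply one (marker_before, marker_after) pair per field name."""
    return {
        name: scrape_between(source, before, after)
        for name, (before, after) in fields.items()
    }


# Hypothetical page source and field definitions, for illustration only.
page = '<h1 class="name">Pat Example</h1><span class="city"> Austin, TX </span>'
fields = {
    "name": ('<h1 class="name">', "</h1>"),
    "city": ('<span class="city">', "</span>"),
}
print(run_scraper(page, fields))
```

Each entry in `fields` plays the role of one row in the scraper grid: a field name plus the text that surrounds the value on the page.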
  12. 12. Outwit – scrape LI Public Profiles (cont’d.)
     Or you could automate this even further by creating macros and a job in Outwit Hub:
     • Notice all the previous Google result page URLs follow a pattern, where the only difference is the number following &start=
     • So we’ll set up an Outwit job containing two macros:
     • The first macro will: (1) scrape the Google result URLs we want, and (2) catch links that fit our desired pattern (i.e., LI public profiles), then
     • The second macro will: (1) run the relevant scraper to put the desired profile data into fields, and (2) export all that to Excel!
     The key syntax in the macro is in the boldfaced part of the “Start Page or Query Directory” field value:
     https://www.google.com/search?q=site:www.linkedin.com+(inurl:pub+OR+inurl:in)+%22sitecore+developers%22+-inurl:company+-inurl:groups+-inurl:dir&biw=1460&bih=531&noj=1&start=[0:250/10]&sa=N&filter=0
     • This square-bracketed section means use parameter values of 0 up to 250 (since that’s about the total # of Google results), in increments of 10 (since Google displays 10 results/page)
     • For our purposes, we’ll change to [0:20/10] in order to take just a few pages’ worth of results to illustrate (through p.3 / start=20), but it can go all the way to the end just as easily (it just takes a bit longer to run).
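To make the square-bracket syntax concrete, the sketch below expands a [first:last/step] section into the individual page URLs. It is an illustration in Python, not Outwit's code; it assumes the end value is inclusive (the slide's "through p.3 / start=20" suggests so), and the shortened template URL is a stand-in for the full query above.

```python
import re

# Matches a bracketed range such as [0:250/10].
RANGE = re.compile(r"\[(\d+):(\d+)/(\d+)\]")


def expand_range(template):
    """Expand the first [first:last/step] section into one URL per value."""
    match = RANGE.search(template)
    if match is None:
        return [template]
    first, last, step = (int(g) for g in match.groups())
    head, tail = template[:match.start()], template[match.end():]
    # Assumption: the upper bound is inclusive, so [0:20/10] yields 0, 10, 20.
    return [f"{head}{value}{tail}" for value in range(first, last + 1, step)]


# Shortened, hypothetical template standing in for the full query URL.
template = "https://www.google.com/search?q=site:www.linkedin.com&start=[0:20/10]&filter=0"
for url in expand_range(template):
    print(url)
```

With [0:250/10] the same function would return 26 URLs, one per results page.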
  13. 13. Outwit Hub – using its results *with* LinkedIn synergistically
     Now here’s where we make LinkedIn and its paid users happy… I’m very much for using LinkedIn Recruiter, both for InMail functionality when you don’t know/lack time to find email addresses and because it will allow you to search more results (all 250MM+ profiles!) vs. what’s available via public profiles. The key is the quality of those additional results, so enter Outwit Hub...
     Example: find Microsoft Gold partners in key IT solution technology areas from http://pinpoint.microsoft.com
     1. Scrape the company names
     2. Create a Boolean OR string containing all of them (an Excel template can help)
     3. Paste it in LinkedIn Recruiter’s advanced search Company field (remember, LinkedIn supports thousands of chars per field!)
     4. Combine with job titles to find more quality results than if you searched on technology keywords (a shortcuts macro tool like PerfectKeyboard can help greatly to store and quickly access these).
     Why is this better than keyword search?
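Step 2 mentions an Excel template for building the Boolean OR string; the same thing can be sketched in Python. This is an illustrative alternative, not the presenter's template, and apart from Avanade the company names below are placeholders.

```python
def boolean_or(names):
    """Join company names into a quoted Boolean OR string, skipping blanks and repeats."""
    seen = set()
    cleaned = []
    for name in names:
        normalized = " ".join(name.split())  # collapse stray whitespace from scraping
        key = normalized.lower()
        if normalized and key not in seen:
            seen.add(key)
            cleaned.append(normalized)
    return " OR ".join(f'"{name}"' for name in cleaned)


# Placeholder list standing in for scraped partner names.
scraped = ["Avanade", " Contoso  Ltd ", "avanade", "", "Fabrikam"]
print(boolean_or(scraped))
```

The printed string can then be pasted into a search field that accepts Boolean OR syntax, as in step 3.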
  14. 14. Is web scraping too complex for you?
     • It’s not quite programming, but it is technical. So get your techiest sourcer on it – use products’ built-in training/tutorials!
     • Note that all Outwit automators (scrapers, macros, jobs, etc.) can be saved in a small gear file and transferred to any other Outwit Hub Pro user, to avoid re-creating the wheel. For further scaling, each license can be saved onto 1-3 computers you control (but each license can only be run from one computer at any given moment).
     • Or try a simpler program: Email Sourcer 1.0 was created by Outwit with recruiters in mind to extract email addresses and (when available) tries to associate phone, fax, toll free number, physical address, URL, etc., to each email found.
     • It grabs this information from a large series of pages (within current website, in all page links, or in all linked websites) without you ever seeing the source code. Export extracted data in a click to TXT, CSV, HTML, Excel or SQL db.
     • Or pay to go with a more robust alternative for finding people and autopopulating contact info (e.g., Broadlook Profiler or the less expensive Account Researcher from eGrabber)
     • Or at least start with these easy and free automation tools…
  15. 15. Distributing Your Jobs (and other content) for Free, More Effectively/Efficiently
     When it comes to building candidate pipelines, a recruitment marketing push complements your sourcing pull… Leverage your employees’ social networks to get the word out:
     • Dlvr.it – post from any RSS feed (note: starting March 2013, Twitter searches can no longer be fed this way: http://search.twitter.com/search.atom?q=+from%3Ainfacloud)
     • IFTTT.com – IF This Then That. Lets you customize automated task “recipes”.
     • SproutSocial – schedule posts on a recurring schedule and can centralize multiple profiles (fee-based)
     • HootSuite – free for an individual user (limited profile accounts) and can also pre-schedule and centralize for multiple users like SproutSocial (fee-based)
     • Bullhorn Reach (and see sister Bullhorn Reach Radar service)
     • LinkedIn Groups (group membership required to post)
  16. 16. RSS feeds to Microsoft Outlook
     RSS feeds can be processed like emails in your email program:
     • Outlook 2007 or newer - RSS reader integrated so feeds can be processed just like emails: Right mouse click on RSS Feeds folder, select Add a New RSS Feed, and enter the RSS URL
     • Add an Outlook rule to forward results to appropriate team members to process/research.
     For Bing.com searches, just append &format=rss to any results URL to create a feed, e.g.:
     www.bing.com/search?q="how+to+configure"+instreamset%3A(title+url+anchor)%3Asharepoint&qs=n&form=QBRE&pq="how+to+configure"+instreamset%3A(title+url+anchor)%3Asharepoint&format=rss
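If a team keeps many saved Bing searches, appending the feed parameter by hand gets tedious. The helper below is a small Python sketch of the slide's "append &format=rss" tip; the function name is hypothetical, and whether Bing still honors this parameter is an assumption carried over from the slide rather than something verified here.

```python
def as_rss_feed(results_url):
    """Append format=rss to a search results URL unless it is already there."""
    if "format=rss" in results_url:
        return results_url
    # Use "&" when the URL already has a query string, otherwise start one with "?".
    separator = "&" if "?" in results_url else "?"
    return f"{results_url}{separator}format=rss"


# Example with a shortened version of the slide's query.
print(as_rss_feed('www.bing.com/search?q="how+to+configure"+sharepoint'))
```

The resulting URL is what would be pasted into Outlook's Add a New RSS Feed dialog.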
  17. 17. Q&A
     Any other examples you want to try? Any other related cool tools you’ve been using effectively, such as:
     • Things I didn’t get to, such as iMacros and Yahoo Pipes
     • Other browser tools (e.g., bookmarklets, Rapportive, TamperMonkey)
     • Resume aggregator automation tools (e.g., InfoGIST, TalentHook, etc.)
     • Profile aggregation tools (e.g., Dice Open Web, Entelo, Gild, HiringSolved, SwoopTalent, TalentBin, Yatedo)?
     If time doesn’t allow or you want to discuss later, feel free to email me (best) at glenn.s.gutmacher@avanade.com or glenn@recruiting-online.com
     Or ping via one of these social networks: www.linkedin.com/in/gutmach * facebook.com/glenn.gutmacher * twitter.com/gutmach * Google Plus * Skype: glenn.gutmacher