1ST TECH TALK: Web Crawler and Scraper by Abaam Germones

WEB CRAWLER
What is a web crawler?
• A crawler is a program that visits Web sites and reads their pages and other
information in order to create entries for a search engine index. The major
search engines on the Web all have such a program, which is also known as a
"spider" or a "bot." Crawlers are typically programmed to visit sites that have
been submitted by their owners as new or updated. Entire sites or specific
pages can be selectively visited and indexed. Crawlers apparently gained the
name because they crawl through a site a page at a time, following the links to
other pages on the site until all pages have been read.
• But now? Not anymore :)
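
The classic behaviour described above (start at one page, follow its links, stop when every page on the site has been read) can be sketched in Python using only the standard library. This is a minimal illustration, not how any particular search engine works. The names `LinkParser`, `crawl`, `fetch` and `SITE`, and the `example.test` URLs, are all hypothetical; the "site" is a small in-memory dictionary so the sketch runs without a network connection.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch):
    """Visit every page reachable from start_url on the same host.

    `fetch` is any callable that maps a URL to HTML text, or to None
    when the page does not exist (an assumption made for this sketch).
    Returns the URLs in the order they were visited.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}           # URLs already queued, so none is read twice
    queue = deque([start_url])   # pages waiting to be read
    visited = []
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href)  # resolve relative links
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


# Hypothetical three-page site held in memory so the sketch runs offline.
SITE = {
    "http://example.test/": '<a href="/about">About</a> <a href="/blog">Blog</a>',
    "http://example.test/about": '<a href="/">Home</a>',
    "http://example.test/blog": '<a href="/about">About</a> '
                                '<a href="http://other.test/">Elsewhere</a>',
}

pages = crawl("http://example.test/", SITE.get)
print(pages)
```

The `seen` set is what keeps the crawler from looping when pages link back to each other, and the host check keeps it on the one site. A real crawler would replace `SITE.get` with a function that downloads the page, and would normally also respect the site's robots.txt and limit its request rate.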

WEB SCRAPING
Web scraping refers to processing the HTML of a Web page to extract data
for manipulation, such as converting the page to another format (e.g.
HTML to WML). Web scraping scripts and applications simulate a person
viewing a Web site with a browser. With these scripts you can connect to
a Web server and request a page, exactly as a browser would. The server
sends back the page, which you can then manipulate or extract specific
information from.
Sometimes also called data mining or information mining.
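
The "extract specific information" step can be sketched in Python with the standard library's `html.parser`. This is only an illustration under stated assumptions: the class name `FieldParser`, the function `scrape_prices`, the sample `PAGE`, and the `name`/`price` CSS classes are all hypothetical, and the page is an in-memory string instead of a real server response so the sketch runs offline.

```python
from html.parser import HTMLParser


class FieldParser(HTMLParser):
    """Collects the text of elements whose class attribute is in `wanted`."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.current = None                      # class of the element being read
        self.fields = {name: [] for name in wanted}

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        self.current = cls if cls in self.wanted else None

    def handle_endtag(self, tag):
        self.current = None

    def handle_data(self, data):
        text = data.strip()
        if self.current and text:
            self.fields[self.current].append(text)


def scrape_prices(html):
    """Return a {product name: price text} mapping found in the page."""
    parser = FieldParser(["name", "price"])
    parser.feed(html)
    return dict(zip(parser.fields["name"], parser.fields["price"]))


# Hypothetical page, standing in for what a Web server would send back.
PAGE = """
<ul>
  <li><span class="name">Keyboard</span> <span class="price">$25.00</span></li>
  <li><span class="name">Mouse</span> <span class="price">$9.50</span></li>
</ul>
"""

prices = scrape_prices(PAGE)
print(prices)
```

To scrape a live page, the string in `PAGE` would instead come from an HTTP request, for example with `urllib.request.urlopen(url).read().decode()`. The parser here is deliberately simple and assumes the wanted elements contain plain text with no nested tags.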

WHAT FOR?
• Used for data mining
• Copying content from a website (without permission)
• Real estate websites or applications
• Online store price comparison
• Websites or apps that suggest information
• Used for SEO
• Checking daily rank or position in Google, Yahoo, and Bing search results
• Link builder or dropper (hybrid)
• Spammer (hybrid)
• Automated account creator (hybrid)

LANGUAGES AND TOOLS
• Tools
    • UBot Studio – GUI-based, but you can also write your own scripts in code.
• Languages
    • PhantomJS – a headless browser that is scripted in JavaScript
    • CasperJS – a navigation scripting utility built on top of PhantomJS, also scripted in JavaScript
    • PHP
    • Python
    • Perl
    • Etc.
