eSUG fall 2011


My presentation on eSUG, Fall 2011

  1. Accessing and Extracting Data from Internet Using SAS
     George Zhu, Sunita Ghosh
     Alberta Health Services - Cancer Care
     Oct 26, 2011, Edmonton SAS User Group (eSUG) Meeting
  2. Outline
     1. Introduction
     2. Web Elements: URLs, HTML
     3. Tools: SAS Functions, SAS Statements, cURL, Perl/LWP
     4. Examples
        • Example 1: Download .csv file
        • Example 2: Get the list of eSUG presentations
        • Example 3: Find out all registered clinical trials
        • Example 4: Get and plot today’s temperature
        • Example 5: Download City of Edmonton job postings
  3. Introduction
     Suppose you want to:
     • Download financial time series from FRED, Yahoo Finance, OANDA, etc.
     • Obtain a list of clinical trials conducted in Canada from the www.clinicaltrials.gov website
     • Download presentations from the eSUG webpage, or all proceedings collected on Lex Jansen’s website
     • Monitor career websites and get notified when suitable positions become available
     • Get Twitter feeds on hot topics and save them in data sets for further analysis or text mining
     • And many more...
     Question: Is it possible to do these using SAS?
  11. Introduction
      Yes! Mostly, although not every kind of web access can be done with SAS alone.
  13. Introduction
      SAS is capable of web mining:
      • SAS has its own web access tools: filename url, filename socket, etc.
      • SAS has data extraction tools: character string functions, plus the powerful Perl Regular Expression (PRX) functions and CALL routines
      • SAS provides two mechanisms for integrating specialized web access programs to get the work done: the X statement and the filename pipe statement
  17. Elements of Web Accessing: Web Elements
  18. Elements of Web Accessing
      To access the web, we need to know some basic elements of the internet:
      • Where the information is located: URL (Uniform Resource Locator)
      • How the information is organized: HTML (Hypertext Markup Language)
      • How to communicate (send and receive) with the server holding the information: HTTP (Hypertext Transfer Protocol), HTTPS (secure HTTP), FTP, and more
      • Request headers and response headers: these carry information that is critical for more complicated websites, such as cookies and response status codes
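To make the last point concrete, here is a minimal sketch of what a response status code and response headers look like and how they can be separated from the HTML body. It is written in Python rather than SAS, purely for illustration. The response text is hand-written for this example (it was not captured from a real server), and the helper name `parse_http_response` is hypothetical.

```python
import email

# A hand-written HTTP response, used only as an illustration.
RAW_RESPONSE = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html; charset=utf-8\r\n"
    b"Set-Cookie: sessionid=abc123; Path=/\r\n"
    b"\r\n"
    b"<html><body>Hello</body></html>"
)

def parse_http_response(raw):
    """Split a raw HTTP response into status code, reason, headers and body."""
    # A blank line separates the header section from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    # The first line of the header section is the status line.
    status_line, _, header_block = head.partition(b"\r\n")
    _version, code, reason = status_line.decode("ascii").split(" ", 2)
    # HTTP headers use the same "Name: value" layout as e-mail headers.
    headers = email.message_from_bytes(header_block)
    return int(code), reason, headers, body

status, reason, headers, body = parse_http_response(RAW_RESPONSE)
print(status, reason)                  # status code and reason phrase
print(headers["Content-Type"])         # how the body is encoded
print(headers.get_all("Set-Cookie"))   # cookies the server asks the client to keep
```

A client that wants to stay logged in would send the cookie value back in a `Cookie` request header on its next request, which is why these headers matter for more complicated websites.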
  23. URLs
      A URL is the address we type, or see displayed, in the address bar of the web browser, such as:
      • http://www.sas.com/offices/NA/canada/en/edmonton.html
      • http://maps.google.com/maps?q=edmonton
      It usually has 3 components:
      • Transfer protocol: http, https, ftp, etc.
      • IP address or hostname: www.sas.com and maps.google.com
      • The path and file name of the webpage on the host (web server): /offices/NA/canada/en (path), edmonton.html (filename). The third part can instead be a function name (maps) followed by ? and pairs of parameter=value (q=edmonton).
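The three components can also be pulled apart programmatically. The sketch below uses Python's standard `urllib.parse` module, not SAS, and is only an illustration of the URL anatomy described above. The helper name `describe_url` and the dictionary keys are hypothetical.

```python
import posixpath
from urllib.parse import urlsplit, parse_qs

def describe_url(url):
    """Split a URL into protocol, host, path, file/function name and parameters."""
    parts = urlsplit(url)
    # Separate the directory part of the path from its last segment.
    directory, last_segment = posixpath.split(parts.path)
    return {
        "protocol": parts.scheme,
        "host": parts.netloc,
        "path": directory,
        "file_or_function": last_segment,
        # parse_qs turns "q=edmonton" into {"q": ["edmonton"]}
        "parameters": parse_qs(parts.query),
    }

static = describe_url("http://www.sas.com/offices/NA/canada/en/edmonton.html")
dynamic = describe_url("http://maps.google.com/maps?q=edmonton")
print(static)
print(dynamic)
```

For the first URL the last path segment is a file name (edmonton.html) and there are no parameters. For the second it is the function name (maps), and the query string yields the pair q=edmonton.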
  29. URLs
      Two types of URLs: static and dynamic
      • A static webpage is an HTML file that is already stored on the web server. For example, the file edmonton.html is physically stored on the SAS web server, under the specified path: http://www.sas.com/offices/NA/canada/en/edmonton.html
      • A dynamic webpage is generated on request, depending on the information the server receives: http://maps.google.com/maps?q=edmonton
  32. URLs
        - For all static webpages and some dynamic webpages, the URL that a link leads to is usually given with <a href="url"> in the HTML file, so you can easily extract the next URL.
        - For many dynamic webpages, you need to determine the function name and the parameter/value pairs required to get the results you want.
        - The function name and parameters are usually specified in the current HTML with tags like <form action="[function]" method="[get or post]"> and <input name="[param]" value="[value]">.
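As an illustration of reading the function name and parameters out of a form, here is a hedged sketch using the PRX functions. The HTML lines are made up for the example, the names form_fields, tag, key and val are assumptions, and the patterns assume that action comes before method and name before value inside each tag, which real pages do not guarantee.

```sas
data form_fields;
  infile datalines truncover;
  length line $200 tag $5 key $50 val $50;
  retain re_form re_input;
  if _n_ = 1 then do;
    /* Compile the two patterns once and reuse them for every line */
    re_form  = prxparse('/<form[^>]*\saction="([^"]*)"[^>]*\smethod="([^"]*)"/i');
    re_input = prxparse('/<input[^>]*\sname="([^"]*)"[^>]*\svalue="([^"]*)"/i');
  end;
  input line $char200.;
  if prxmatch(re_form, line) then do;
    tag = 'form';
    key = prxposn(re_form, 1, line);    /* function name (action attribute) */
    val = prxposn(re_form, 2, line);    /* request method: get or post      */
    output;
  end;
  else if prxmatch(re_input, line) then do;
    tag = 'input';
    key = prxposn(re_input, 1, line);   /* parameter name                   */
    val = prxposn(re_input, 2, line);   /* parameter value                  */
    output;
  end;
  keep tag key val;
datalines;
<form action="/maps" method="get">
<input type="text" name="q" value="edmonton">
<input type="submit" value="Search">
</form>
;
run;
```

The submit button has no name attribute, so it is skipped; the remaining rows are enough to rebuild the request /maps?q=edmonton.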
  35. URLs
        - URLs cannot contain spaces, and many special characters (such as # , ( ) " ' ; :) have reserved meanings or are unsafe in a URL. When they appear in a parameter value they need to be encoded.
        - SAS provides the function urlencode for this purpose; the function urldecode does the opposite.
        - If a dynamic URL has more than one parameter=value pair, & separates the pairs.
        - These special characters (especially &) cause trouble when constructing URLs in a SAS macro, because the macro processor treats & as the start of a macro variable reference. Use macro quoting functions (like %str, %nrstr, %quote, %nrquote, %superq, etc.) for the URL.
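A short sketch of both points follows. It assumes a second parameter hl=en purely for illustration (it is not from the slides), and the fileref name maps is likewise made up. The encoding is applied to the parameter value only, since encoding the whole URL would also escape reserved characters such as / and ?.

```sas
/* 1. Encode a parameter value in a DATA step.                        */
/*    Single quotes keep the macro processor away from the & sign.    */
data _null_;
  length query $200 url $400;
  query = urlencode('Edmonton, Alberta');   /* value with a space and a comma */
  url   = cats('http://maps.google.com/maps?q=', query, '&hl=en');
  put url=;
run;

/* 2. Build the same kind of URL in macro code.                       */
/*    %nrstr masks & when the value is stored; %superq retrieves the  */
/*    value without letting the macro processor resolve &hl.          */
%let url = %nrstr(http://maps.google.com/maps?q=edmonton&hl=en);
%put NOTE: URL is %superq(url);
filename maps url "%superq(url)";
```

Without the quoting functions, SAS would try to resolve &hl as a macro variable and write a warning to the log.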
  39. URLs
      Two methods of sending the required parameters to the server:
        - GET: through the static or dynamic URL; the parameters can be seen in the address bar.
        - POST: through the request body; the parameters are not displayed in the address bar.
      The POST method can be used to transfer a large number of parameters or sensitive data (such as a password).
      In SAS, most GET requests can be made with the FILENAME URL statement.
      For POST requests, the FILENAME SOCKET statement can be used, but it is very complicated and not recommended. Use other tools (like cURL) instead.
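The difference between the two methods might look like this in practice. This is only a sketch: it assumes cURL is installed and on the system path, that the SAS session is allowed to run external commands, and the address http://www.example.com/search is a hypothetical endpoint, not one from the slides.

```sas
/* GET: the parameters travel in the URL itself */
filename getreq url "http://maps.google.com/maps?q=edmonton";

/* POST: the parameters travel in the request body, sent here by cURL. */
/* --data makes cURL issue a POST; --silent hides the progress meter.  */
filename postreq pipe
  'curl --silent --data "q=edmonton" "http://www.example.com/search"';

/* Read the response of the POST request, one line per record */
data post_response;
  infile postreq lrecl=32767 truncover;
  input line $char32767.;
run;
```

The outer single quotes keep SAS from interpreting the inner double quotes or any & characters in the command.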
  45. HTML
        - The response from the server is usually an HTML file.
        - In a web browser, you can right-click and select View page source to see the HTML code of the webpage.
        - An HTML file is a text file, so it can be read into a SAS data set. Each line is stored as one record in the data set.
        - In SAS, use the INFILE statement with a varying-length input to read each line.
        - Be aware that a SAS character variable is limited to 32767 characters; any characters beyond this limit in a line cannot be read into it.
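One way to read a page line by line with a varying-length input is sketched below. The fileref and URL are the ones used elsewhere in the slides; the data set name html_lines and the variables line, len and lineno are assumptions. The page must be reachable from the SAS session for the step to return any records.

```sas
filename eSUG url "http://www.sas.com/offices/NA/canada/en/edmonton.html";

data html_lines;
  length line $32767;                 /* the maximum length of a character variable */
  infile eSUG length=len lrecl=32767;
  input @;                            /* load the record so that len is set         */
  input line $varying32767. len;      /* read exactly len characters                */
  lineno = _n_;                       /* keep the original line order               */
run;

filename eSUG clear;
```

Keeping lineno makes it possible to restore the order of the lines after later sorting or subsetting.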
  51. HTML
        - An HTML file consists of pairs of tags and the actual information.
        - Tags instruct the web browser how to display the response on the screen.
        - We can use tags to locate the required information and decide how to extract it: for example, where the URL for the next page is, where the data table is, and which values need to be extracted.
  54. HTML
      Examples of tags (they normally come in pairs):
        - <a href=...>: specifies a link (a URL)
        - <div id=...>: identifies a division or section in an HTML document
        - <table>, <th>, <tr>, <td>: identify the table, table header cell, table row, and table data cell
        - <form ... method="post">, <input ... name=** value=**>: specify which parameters (name) and values need to be sent to the server, and which method is used to send the request (get or post)
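To show how the table tags can guide extraction, the sketch below walks through every <td> cell on a line with CALL PRXNEXT. The table rows are invented sample data, the names cells, lineno, col and cell are assumptions, and the pattern assumes each cell opens and closes on the same line.

```sas
data cells;
  infile datalines truncover;
  length line $200 cell $50;
  retain re;
  if _n_ = 1 then re = prxparse('/<td[^>]*>(.*?)<\/td>/i');
  input line $char200.;
  lineno = _n_;
  start  = 1;
  stop   = length(line);
  /* Find the first cell, then keep searching from where the last match ended */
  call prxnext(re, start, stop, line, pos, len);
  do col = 1 by 1 while (pos > 0);
    cell = prxposn(re, 1, line);      /* text between <td> and </td> */
    output;
    call prxnext(re, start, stop, line, pos, len);
  end;
  keep lineno col cell;
datalines;
<tr><th>City</th><th>Value</th></tr>
<tr><td>CityA</td><td>100</td></tr>
<tr><td>CityB</td><td>200</td></tr>
;
run;
```

Each output record holds one cell together with its line and column position, so the table can be rebuilt afterwards, for example with PROC TRANSPOSE.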
  55. Web Accessing Tools
  56. Tools for Web Accessing
      Tools (software) we use for getting information from the Internet:
        - Web browsers: Internet Explorer, Firefox, Google Chrome, Safari, etc. For browsing and clicking, not for automation.
        - Command line programs (cURL) and the LWP package for the Perl programming language. Very powerful web access; they can replicate basically any web browser functionality.
        - SAS FILENAME statements: URL, SOCKET, FTP, EMAIL. Basic web access for specific functionality; may not be enough for accessing all websites.
  60. SAS functions for text extraction
        - SAS string functions: FIND(), INDEX(), SCAN(), etc. These functions are useful for locating the target information.
        - Perl regular expression functions (Version 9): prxparse(), prxmatch(), prxchange(), and the PRX call routines: call prxchange(), call prxsubstr(), call prxnext(). These are very powerful for extracting the information.
        - Example: extract the URL from the record (variable name: line):
            url = prxchange('s/^.*?href="(.*?)".*?$/$1/', -1, line);
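The two families of functions can be compared side by side. In this sketch the HTML lines are made up around the URLs used in the slides, and the names links, url_scan and url_prx are assumptions. Note that prxchange returns the line unchanged when the pattern does not match, so it is only called after FIND has confirmed that an href is present.

```sas
data links;
  infile datalines truncover;
  length line $200 url_scan url_prx $200;
  input line $char200.;
  pos = find(line, 'href="', 'i');           /* locate the attribute, ignoring case */
  if pos > 0 then do;
    /* String functions: skip the 6 characters of href=" and read up to the next quote */
    url_scan = scan(substr(line, pos + 6), 1, '"');
    /* Regular expression: replace the whole line with the captured URL */
    url_prx  = prxchange('s/^.*?href="(.*?)".*?$/$1/i', -1, line);
  end;
  drop pos;
datalines;
<a href="http://www.sas.com/offices/NA/canada/en/edmonton.html">Edmonton</a>
<p>No link on this line</p>
<a href="http://maps.google.com/maps?q=edmonton">Map</a>
;
run;
```

Both variables should hold the same URL for the first and third lines and stay blank for the second.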
  64. SAS Web Access Statements
      SAS provides two main statements for web access:
        - FILENAME URL: for the GET request method (static or dynamic URL)
            filename eSUG url "http://www.sas.com/offices/NA/canada/en/edmonton.html";
          Before 9.2, FILENAME URL did not completely support the HTTPS protocol; as of 9.2 it appears to support HTTPS for secured transfer.
        - FILENAME SOCKET: two-way communication, for more complicated web requests (like the POST method, cookies, referer, etc.)
      Other statements:
        - FILENAME EMAIL: for accessing and sending emails
        - FILENAME FTP: for transferring files to and from a server
  71. SAS Web Access Statements
      Two SAS mechanisms for extending SAS with other software:
        - FILENAME PIPE statement: the output of the external command is treated as input to the SAS data step. For example:
            filename fileref pipe "<DOS Command>";
        - X statement: leaves SAS temporarily, runs the external program, and returns to SAS after the execution:
            X "<DOS Command>";
      The X statement does not provide direct interaction between SAS and the external program. The PIPE statement feeds the results directly to SAS.
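The contrast between the two mechanisms can be seen with a plain DOS command. This sketch assumes a Windows SAS session that is allowed to run external commands; the fileref dirlist, the data set names, and the file filelist.txt (written to the current working directory) are all made up for the example.

```sas
/* FILENAME PIPE: the command output becomes the input of the DATA step */
filename dirlist pipe "dir /b";

data files;
  infile dirlist truncover;
  input fname $char256.;
run;

/* X statement: the command runs on its own, so SAS only sees the result */
/* through a file. NOXWAIT closes the command window automatically and   */
/* XSYNC makes SAS wait until the command has finished.                  */
options noxwait xsync;
x "dir /b > filelist.txt";

data files_from_x;
  infile "filelist.txt" truncover;
  input fname $char256.;
run;
```

The PIPE version needs no intermediate file, which is why it is usually the more convenient choice when the output is wanted inside SAS.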
  76. cURL Command Line Program
        - cURL is a command line tool for transferring data with URL syntax.
        - libcurl is a C library, and cURL is the binary program built on the functions in libcurl.
        - libcurl has bindings for many programming languages and is used in a lot of software.
        - Basically anything you can do with a web browser can be replicated with cURL, including clicking, downloading files (any file, like music, video, PDF, etc.), and uploading.
        - Most importantly, it is FREE. It can be downloaded from: http://curl.haxx.se/
  81. cURL Command Line Program
      Basic usage of cURL:
          curl --options urls
      - cURL has many options; with them it can handle most kinds of requests to most websites.
      - With the FILENAME PIPE statement (the output of cURL is read into SAS):
          filename eSUG pipe "curl --options urls";
      - With the X statement (cURL runs as an operating-system command):
          x "curl --options urls";
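The FILENAME PIPE form can be made concrete with a small sketch. It assumes curl is on the PATH and that the SAS session allows operating-system commands (the XCMD option); the URL http://example.com/ and the names page and page_lines are placeholders, not part of the presentation.

```sas
/* Sketch only: assumes curl is installed and XCMD is enabled.        */
/* http://example.com/ is a placeholder URL chosen for illustration.  */
filename page pipe "curl --silent --location http://example.com/";

data page_lines;
    infile page lrecl=32767 length=len;
    /* read each line of curl's output as one observation */
    input line $varying32767. len;
run;

filename page clear;
```

Here --silent suppresses the progress meter and --location follows redirects, so the data set holds only the text of the page that is finally returned.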
  84. The LWP Package with Perl
      - The Perl programming language is powerful and flexible for parsing information from HTML or XML files.
      - LWP (the library for WWW in Perl) is a web access package for Perl.
      - See http://www.perl.org/ for more information about Perl and its packages.
      - Web access and information parsing with LWP are much faster than with SAS.
      - You can write a Perl script and use the FILENAME PIPE statement or the X statement to have SAS run the script and return the results to SAS for further processing.
  89. Examples
  90. Example 1: Download .csv file
      - Download Moody's seasoned AAA corporate bond yield from the FRED website.
      - The returned data is in .csv format, so no data extraction is needed; it is just like reading a local .csv file.
      SAS Code:
          filename DAAA url "http://research.stlouisfed.org/fred2/series/DAAA/downloaddata/DAAA.csv";

          data BY_AAA(drop=dd);
              length dd $10;
              format date date9.;
              infile DAAA dlm="," dsd;
              input dd $ value;
              date=input(dd,yymmdd10.);
              /* drop lines without a valid date, such as a header line */
              if not missing(date) then output;
          run;

          filename DAAA clear;
      Resulting Data Set: [figure]
  91. Example 2: Get the list of eSUG presentations
      - Download the list of all presentations on the eSUG webpage.
      - All the information is on one webpage, so there is no need to look for the URL of a next page.
      - All presentations are .pdf files, so for data extraction we search for a string that is enclosed in quotation marks and ends with .pdf.
      SAS Code:
          filename eSUG url "http://www.sas.com/offices/NA/canada/en/edmonton.html";

          data eSUG_achive(keep=pdffile);
              length pdffile $200;
              infile eSUG length=len lrecl=32767;
              input line $varying32767. len;
              /* all the file names end with .pdf */
              if find(line,".pdf","i") then do;
                  /* get the string ending with .pdf and enclosed by quotation marks; */
                  /* [^"]* keeps the match inside a single pair of quotation marks    */
                  pdffile=prxchange('s/^.*?"([^"]*\.pdf)".*$/$1/i',-1,line);
                  output;
              end;
          run;

          filename eSUG clear;
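The extraction pattern can be tried without any web access. The sketch below applies a variant of the pattern, one that excludes quotation marks from the captured text, to a made-up HTML line; the line content and the path in it are hypothetical.

```sas
/* Sketch: try the .pdf extraction on a hypothetical HTML line. */
data _null_;
    length line $200 pdffile $200;
    line='<li><a class="doc" href="/demo/path/talk1.pdf">Talk 1</a></li>';
    /* capture the quoted string that ends with .pdf */
    pdffile=prxchange('s/^.*?"([^"]*\.pdf)".*$/$1/i',-1,line);
    put pdffile=;
run;
```

Because the captured group cannot contain a quotation mark, the earlier class="doc" attribute is skipped and only the href value is returned.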
  92. Example 2: Get the list of eSUG presentations
      Resulting Data Set: eSUG Archive [figure]
  93. Example 2: Get the list of eSUG presentations
      How about downloading all the presentations?
      - FILENAME URL does not work here: it cannot save files.
      - Use the cURL command-line program with the SAS X statement.
      - The cURL option --output filename saves the result to the file specified by filename.
      SAS Code:
          proc sql noprint;
              select pdffile into :pdffiles separated by "|"
              from eSUG_achive;
          quit;

          options noxwait;

          %macro downLoadFiles();
              %let eSUGFolder=H:\eSUG\PDFs;
              %let i=1;
              %let pdf1=%scan(&pdffiles.,&i.,"|");
              %do %while (%quote(&pdf1.) ne );
                  /* only the file name is needed, not the path name */
                  %let filename=%scan(&pdf1.,-1,"/");
                  /* the --output option saves the file */
                  x "curl --output &eSUGFolder./&filename. &pdf1.";
                  %let i=%eval(&i.+1);
                  %let pdf1=%scan(&pdffiles.,&i.,"|");
              %end;
          %mend downLoadFiles;

          %downLoadFiles;
  94. Example 3: Find out all clinical trials
      - Locate and extract information in the HTML using the tags.
      - Find the URL of the next webpage.
      SAS Code:
          %macro GetTrials(startlink=, out=Trials);
              %let Continue=Yes;
              %let NextLink=&startlink.;
              %let i=0;
              %do %while(&Continue.=Yes);
                  filename Trials url "http://www.clinicaltrial.gov/%superq(NextLink)" lrecl=8192;

                  data _Trials_(drop=recordline record nextpage);
                      length Status $25 Study $200 link $200;
                      retain Rank Status Study link;
                      retain recordLine 0; /* 1 means this line may contain needed information */
                      infile Trials length=len;
                      input record $varying8192. len;
                      if prxmatch('/>Study<\/th>/i',record) then recordLine=1;
                      if recordLine=1 and find(record,"/div") then recordLine=0;
                      if (recordLine=1) then do;
                          /* now get the related information */
                          if prxmatch('/<span.*>.*?<\/span>\s*$/',record) then
                              status=prxchange('s/^.*?<span.*>(.*?)<\/span>\s*$/$1/',-1,record);
                          if find(record,"href") then do;
                              Study=prxchange('s/^.*>(.*?)<.*$/$1/',-1,record);
                              link=prxchange('s/^.*?href="(.*?)".*$/$1/',-1,record);
                          end;
                          if find(record,"</table>") then do;
                              rank=input(scan(link,-1,"="),5.0);
  95. Example 3: Find out all clinical trials
      SAS Code (continued):
                              output;
                          end;
                      end;
                      if find(record,"Next Page") then do;
                          if find(record,"href") then do;
                              nextpage=tranwrd(prxchange('s/^.*?href="(.*?)".*?$/$1/',-1,record),'&amp;','&');
                              call symput("nextlink",nextpage);
                          end;
                          else call symput("Continue","No");
                      end;
                  run;

                  filename Trials clear;

                  data _Trials_;
                      set _Trials_ end=last;
                      if (last ne 1) then output;
                  run;

                  %if (&i.=0) %then %do;
                      data &out.;
                          set _Trials_;
                      run;
                  %end;
                  %else %do;
                      data &out.;
                          set &out. _Trials_;
                      run;
                  %end;
                  %let i=%eval(&i.+1);
              %end;
          %mend GetTrials;
  96. Example 3: Find out all clinical trials
      - This macro accepts a start link and then keeps following the "Next Page" links to collect the trials on all subsequent pages.
      - To get all trials in Alberta, use the start link ct2/results?state1=NA%3ACA%3AAB
      - Note: NA%3ACA%3AAB is the URL-encoded form of NA:CA:AB
      SAS Code:
          %GetTrials(startlink=%str(ct2/results?state1=NA%3ACA%3AAB),out=AB_Trials);
      For all trials in Canada:
      SAS Code:
          %GetTrials(startlink=%str(ct2/results?state1=NA%3ACA),out=CA_Trials);
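The encoding in the note can be checked inside SAS. This is a minimal sketch, assuming the URLENCODE and URLDECODE functions are available in your SAS release; the variable names are illustrative only.

```sas
/* Sketch: convert between the plain and URL-encoded forms of the state code. */
data _null_;
    length encoded decoded $50;
    encoded=urlencode('NA:CA:AB');      /* escape characters that are special in a URL */
    decoded=urldecode('NA%3ACA%3AAB');  /* turn each %3A back into a colon */
    put encoded= decoded=;
run;
```

Decoding NA%3ACA%3AAB gives NA:CA:AB, which is the relationship the note describes; the literals are single-quoted so that the macro processor leaves the percent signs alone.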
  97. Example 4: Get and plot today's temperature
      - From the Weather Underground website: www.wunderground.com
      - Although the URL can be specified easily, it is not the final link, so the SAS FILENAME URL statement cannot get the required results.
      - The cURL option --location follows the redirects to the final link and gets the resulting text file (basically a .csv file).
      - Use the FILENAME PIPE statement to read the output of cURL directly into SAS.
      SAS Code:
          /* There are 3 weather stations in Edmonton: IABEDMON10 IABEDMON12 IABEDMON13 */
          %let WStation=IABEDMON13; /* This is the Edmonton Downtown weather station */
          /* get today's date and obtain day, month and year; these are used to construct the URL */
          %let Day=%substr(&sysdate9.,1,2);
          %let Month=%sysfunc(month(%sysfunc(today())));
          %let Year=%substr(&sysdate9.,6,4);
          %let EdmWeather="http://www.wunderground.com/weatherstation/WXDailyHistory.asp?ID=&WStation.%str(&)month=&month.%str(&)day=&Day.%str(&)year=&Year.%str(&)format=1";

          filename Weather pipe "curl --location %superq(EdmWeather)"; /* FILENAME URL doesn't work */

          data weather(drop=line);
              infile Weather lrecl=2000 length=len;
              input line $varying2000. len;
              if find(line,",") then do;
                  time=input(scan(line,1,",","m"),anydtdtm21.);
                  format time datetime12.;
                  Temp=input(scan(line,2,",","m"),5.0);
                  Dewpoint=input(scan(line,3,",","m"),5.0);
                  if not missing(time) then output;
              end;
          run;

          filename Weather clear;
  98. Example 4: Get and plot today's temperature
      SAS Code for Plotting the Temperature:
          symbol1 i=j;
          legend1 label=(h=2 "Weather") value=(h=2);
          axis1 value=(h=2);
          title1 "Temperature on &sysdate9.";

          proc gplot data=weather;
              plot (temp Dewpoint)*time/haxis=axis1 vaxis=axis1 overlay legend=legend1;
          run;
          quit;
      Resulting Temperature Plot: [figure]
  99. Example 5: Download City of Edmonton job postings
      - Most of the web access here uses the POST method, so use Perl/LWP for web access and also for data extraction.
      - Then, in SAS, use the FILENAME PIPE statement to run the Perl program and read the extracted data into a data set.
      SAS Code:
          filename EdJobs pipe "C:\Perl\bin\perl C:\esug\edmjobs.pl";

          data Edm_Jobs;
              *16 variables in total, 14 are read in as character strings. Declare their lengths;
              length JobCode $10 Title $100 Description $10000 Qualification $10000 Salary $500;
              length Hours $500 PDate $20 CDate $20 JobType1 $100 JobType2 $100 Union $100;
              length Department $100 WorkLocation $200 WorkAddress $200;
              infile EdJobs lrecl=32767 length=len;
              *use multiple INPUT statements because the Perl program prints each field on its own line;
              input JobCode $varying10. len;
              input Title $varying100. len;
              input JobNum;
              input Description $varying10000. len;
              input Qualification $varying10000. len;
              input Salary $varying500. len;
              input Hours $varying500. len;
              input PDate $varying20. len;
              input CDate $varying20. len;
              input NumOfOpening;
              input JobType1 $varying100. len;
              input JobType2 $varying100. len;
              input Union $varying100. len;
              input Department $varying100. len;
              input WorkLocation $varying200. len;
              input WorkAddress $varying200. len;
          run;

          filename EdJobs clear;
  100. Example 5: Download City of Edmonton job postings
      Perl program: edmjobs.pl
          #=====================================================================================
          # Perl program to download the job postings from City of Edmonton
          # Written by George Zhu (george.zhu@albertahealthservices.ca)
          # for the Edmonton SAS User Group (eSUG) meeting, Fall 2011
          # Usage:
          # 1. Stand Alone:
          #      C:\Perl\bin\perl C:\esug\edmjobs.pl >[output file name]
          # 2. With SAS, use filename pipe statement:
          #      filename EdJobs pipe "C:\Perl\bin\perl C:\esug\edmjobs.pl"
          #    then in the data step, use infile statement:
          #      infile EdJobs lrecl=32767 length=len;
          # Note: change the folder name to where the program edmjobs.pl is located.
          #=====================================================================================
          use strict;
          use LWP::UserAgent;
          use XML::Parser;
          use URI::Escape;

          my $EdJobs='https://edmonton.taleo.net/careersection/2/moresearch.ftl';
          my $EdBrowser=LWP::UserAgent->new();
          $EdBrowser->agent('Mozilla/5.0 (Windows NT 5.1; rv:6.0.2) Gecko/20100101 Firefox/6.0.2');
          my $EdResponse=$EdBrowser->get($EdJobs);
          my $EdPage1=$EdResponse->content() if $EdResponse->is_success();

          ## obtain the form items using the XML Parser;
          my %inputAttrs; # for the POST data of the next query
          my $EdParse=XML::Parser->new(Handlers=>{Start=>\&Input_start,});
  101. Example 5: Download City of Edmonton job postings
      Perl program: edmjobs.pl (continued)
          sub Input_start {
              my ($expat,$element,%attrs)=@_;
              if($element=~/input/i) # look for the input tag
              {
                  my ($param,$pmval);
                  while(my($key,$value)=each(%attrs)) {
                      if ($key=~/name/i) {$param=$value;}
                      if ($key=~/value/i) {$value=~tr/ /+/; $pmval=$value;}
                  }
                  $inputAttrs{$param}=$pmval;
              }
          }

          $EdParse->parse($EdPage1);
          my %AttrsPage1=%inputAttrs;
          my %AttrsPost1=%inputAttrs;

          ## total number of postings;
          my $nJobs=$1 if $AttrsPage1{"initialHistory"}=~/listRequisition.nbElements!%7C!(\d+?)!%7C!/i;
          ## Number of postings per page;
          my $listSize=$1 if $AttrsPage1{"initialHistory"}=~/listRequisition.size!%7C!(\d+?)!%7C!/i;
  102. Example 5: Download City of Edmonton job postings
      Perl program: edmjobs.pl (continued)
          ## calculate number of pages (round up to a whole page);
          my $nPages=$nJobs/$listSize;
          $nPages=int($nPages+1) if ($nPages>int($nPages));

          ## Obtain the pages of the postings, first set the parameters;
          $AttrsPage1{"countryPanelErrorDrawer.state"}="false";
          $AttrsPage1{"errorMessageDrawer.state"}="false";
          $AttrsPage1{"ftlcompclass"}="PagerComponent";
          $AttrsPage1{"ftlcallback"}="ftlPager_processResponse";
          $AttrsPage1{"ftlcompid"}="rlPager";
          $AttrsPage1{"ftlinterfaceid"}="requisitionListInterface";

          my $pp=1;
          my $i=0;
          my @JobCodes;
          my $jobDetails='https://edmonton.taleo.net/careersection/2/jobdetail.ftl';

          while ($pp<=$nPages) {
              $AttrsPage1{"rlPager.currentPage"}="$pp";
              # POST the form fields as a hash reference, with a Referer header
              my $JobPage=$EdBrowser->post($jobDetails,\%AttrsPage1,'Referer'=>$EdJobs,);
              my @fillList=split("','",$1) if $JobPage->content()=~/api.fillList(.*?);/i;
              shift(@fillList);
              shift(@fillList);
              my $nLine=3;
              my $listSize=@fillList;
  103. Example 5: Download City of Edmonton job postings
      Perl program: edmjobs.pl (continued)
              while($nLine<$listSize) {
                  $JobCodes[$i]=$fillList[$nLine];
                  $nLine+=33;
                  $i++;
              }
              $pp++;
          }

          $AttrsPost1{"ftlcompid"}="actOpenRequisitionDescription";
          $AttrsPost1{"ftlinterfaceid"}="requisitionListInterface";

          foreach my $code (@JobCodes) {
              $AttrsPost1{"actOpenRequisitionDescription.requisitionNo"}="$code";
              # Use the POST method to get the detailed info about a posting
              my $Job1=$EdBrowser->post($jobDetails,\%AttrsPost1,'Referer'=>$EdJobs,);
              $EdParse->parse($Job1->content());
              $inputAttrs{"initialHistory"}=~tr/+/ /;
              my @infor=split("!%7C!",$inputAttrs{"initialHistory"});
              my $description=uri_unescape($infor[15]);
              $description=~s/(<.*?>)|(!\*!)//g;
              $description=~s/(&nbsp;)/ /g;
              $description=~s/(')/'/g;
              $description=~s/(\n)/ /g;
              my $qualification=uri_unescape($infor[16]);
              $qualification=~s/(<.*?>)|(!\*!)//g;
  104. Example 5: Download City of Edmonton job postings
      Perl program: edmjobs.pl (continued)
              $qualification=~s/(&nbsp;)/ /g;
              $qualification=~s/(')/'/g;
              my $salary=$1 if $qualification=~s/Salary Range: (.*?)\n//i;
              my $Hours=$1 if $qualification=~s/Hours of Work: (.*?)\n//i;
              $qualification=~s/(\n)/ /g;

              ## this is the information extracted from a job posting;
              print $code,"\n";           #jobcode
              print $infor[12],"\n";      #title
              print $infor[13],"\n";      #Job Number
              print $description,"\n";    #Description
              print $qualification,"\n";  #Qualification
              print $salary,"\n";         #Salary Range
              print $Hours,"\n";          #Hours of Work
              print $infor[20],"\n";      #Posting Date
              print $infor[22],"\n";      #Closing Date
              print $infor[23],"\n";      #Number of opening
              print $infor[25],"\n";      #Job type 1
              print $infor[26],"\n";      #Job type 2
              print $infor[27],"\n";      #Union
              print $infor[29],"\n";      #Department
              print $infor[32],"\n";      #Work location
              print $infor[33],"\n";      #Work location address
              #print "=================================\n\n";
          }
          #### End of program ####
