Migrating from PHP 4 to PHP 5

A talk I gave at php|tropics on the potential risks and challenges when migrating an existing PHP version 4 application to PHP 5

Published in: Business, Technology

  1. Migrating from PHP 4 to 5
       John Coggeshall, Zend Technologies
  2. Let's get started
       • Beyond user-facing engine improvements such as OO support and iterators, PHP 5 also improved from an internals perspective
       • These improvements allow for incredible flexibility in PHP 5 extensions
           • Object overloading for XML parsing
           • Integration of other objects from Java, .NET, etc.
       • PHP 5 also boasts some great new DB functionality
  3. SQLite
       • One of the best new DB features in PHP 5 is SQLite
       • What is SQLite?
           • A stand-alone RDBMS
           • Allows developers to store relational data in the local file system
           • No external server exists, nor is one needed
           • Depending on the application, it can significantly outperform other DB packages
  4. The difference in paradigms
       • While an RDBMS like MySQL runs on a client-server model, SQLite modifies files directly.
  5. Simplicity at a price
       • While SQLite is a simpler RDBMS model, the simplicity comes at a price
           • Because of the architecture, the database does not scale for concurrent writing (every write locks the entire database)
           • The simplicity makes it usable in almost any environment
           • SQLite, however, is incredibly good at reading!
  6. Example: Zip Code / Area Code lookup
       • SQLite is extremely good for look-up tables
           • For instance, relating U.S. postal codes to city names and phone area codes
       • Where to get the data: zipfiles.txt
           • A little text file I picked up years ago somewhere along the line
  7. Zip file format
       • File is one line per entry in the format: <ZIPCODE><STATE><AREACODE><CITYNAME>
       • First step: Create tables
       • Second step: Create indexes
       • Third step: Populate database
       • Fourth step: Lock and load!
  8. A note about creating tables in SQLite
       • Unlike most other RDBMS packages, SQLite does not require typing information such as
           • INTEGER
           • VARCHAR
           • etc.
       • Rather, SQLite has only a notion of type classes:
           • textual
           • numeric
  9. A lack of typing information means
       • Although you can use whatever you want for a type, SQLite does have some simple rules
           • INTEGER must be used if you want to create an auto-incrementing key (an INTEGER PRIMARY KEY column)
           • Anything with the substring "CHAR" in it will be considered textual
  10. Create your tables
       • Download the sqlite command line tool from sqlite.org and create zipfiles.db:
           $ sqlite zipfiles.db
           sqlite> CREATE TABLE cities (zip INTEGER, city_name, state);
           sqlite> CREATE INDEX cities_city_name_idx ON cities(city_name);
           sqlite> CREATE INDEX cities_zip_idx ON cities(zip);
           sqlite> CREATE TABLE areacode(zip INTEGER, areacode INTEGER);
           sqlite> CREATE INDEX areacode_idx ON areacode(areacode);
           sqlite> CREATE INDEX areacode_zip_idx ON areacode(zip);
  11. Populate the tables (zipcode_db_populate.php)
       • With the tables created, populate them using a simple PHP script to parse our text file
           • Use sqlite_open() to open the database file
           • When inserting data, always use sqlite_escape_string() to escape data
           • Use sqlite_query() to perform the queries
           • Use sqlite_close() to close the database
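A minimal sketch of such a population script follows. It is not the zipcode_db_populate.php from the talk. It assumes a tab-separated zipfiles.txt (the slides do not state the delimiter) and it swaps in PDO's SQLite driver with bound parameters for sqlite_open(), sqlite_escape_string() and sqlite_query(), because the sqlite_* functions are not available in current PHP releases. The function name populate_zip_db() and the sample row are made up for illustration.

```php
<?php
// Hypothetical helper: loads parsed zip-code lines into the two tables
// from the "Create your tables" slide. Assumes tab-separated fields in the
// order ZIPCODE, STATE, AREACODE, CITYNAME (the delimiter is an assumption).
function populate_zip_db(PDO $db, array $lines)
{
    $city = $db->prepare('INSERT INTO cities (zip, city_name, state) VALUES (?, ?, ?)');
    $area = $db->prepare('INSERT INTO areacode (zip, areacode) VALUES (?, ?)');
    $count = 0;
    foreach ($lines as $line) {
        $fields = explode("\t", trim($line));
        if (count($fields) !== 4) {
            continue; // skip malformed lines
        }
        list($zip, $state, $areacode, $name) = $fields;
        // Bound parameters take the place of manual escaping.
        $city->execute(array($zip, $name, $state));
        $area->execute(array($zip, $areacode));
        $count++;
    }
    return $count;
}

$db = new PDO('sqlite::memory:'); // use 'sqlite:zipfiles.db' for a file on disk
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE cities (zip INTEGER, city_name, state)');
$db->exec('CREATE TABLE areacode (zip INTEGER, areacode INTEGER)');

// In a real script this would be: $lines = file('zipfiles.txt');
$lines = array("48103\tMI\t734\tAnn Arbor\n", "bad line\n");
$inserted = populate_zip_db($db, $lines);
```

Setting the connection to null when done plays the role of sqlite_close().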
  12. The Zipcode lookup API (zipcode_api.php, zipcode_lookup.php)
       • Now that we have our database, wrap the queries into a clean API that we can use
           • find_cities_by_zipcode($zipcode, $db)
           • find_state_by_zipcode($zipcode, $db)
           • find_areacodes_by_zipcode($zipcode, $db)
           • find_zipcode_by_city_name($city, $state, $db)
           • _handle_sqlite_error($result, $db)
               • To handle errors which occur during a query
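One of these wrappers might look like the sketch below. Only the function name and argument order come from the slide. The body is a guess at the intent, written against PDO (an assumption, standing in for the sqlite_* functions), with PDO exceptions taking the place of _handle_sqlite_error().

```php
<?php
// Hypothetical re-creation of one API function named on the slide.
// Returns the city names stored for a zip code, or an empty array.
function find_cities_by_zipcode($zipcode, PDO $db)
{
    $stmt = $db->prepare('SELECT city_name FROM cities WHERE zip = ?');
    $stmt->execute(array($zipcode));
    return $stmt->fetchAll(PDO::FETCH_COLUMN);
}

// Tiny in-memory database so the sketch runs on its own.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE cities (zip INTEGER, city_name, state)');
$db->exec('CREATE INDEX cities_zip_idx ON cities(zip)');
$db->exec("INSERT INTO cities VALUES (48103, 'Ann Arbor', 'MI')"); // sample row

$cities  = find_cities_by_zipcode(48103, $db);
$nothing = find_cities_by_zipcode(99999, $db);
```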
  13. Improving Write Performance
       • Although SQLite isn't very good at writing, there are a number of things you can do to improve write performance
           • Wrap large numbers of queries in a transaction
           • Use PRAGMA to tweak SQLite options
           • Spread tables across multiple database files
  14. Synchronous
       • The synchronous option is a very important option in SQLite. It controls the trade-off between absolute data integrity and speed
       • Three different levels:
           • OFF: Fastest, but a sudden power outage can result in data loss
           • NORMAL: Default setting offering a reasonable mix between data integrity and speed
           • FULL: Near 100% assurance of data integrity at the cost of performance
  15. Synchronous
       • Control this setting using PRAGMA in a query:
           PRAGMA default_synchronous=OFF;
       • Other interesting PRAGMA options:
           • count_changes: If enabled, SQLite counts the number of rows affected by a query. If disabled, functionality which relies on knowing that number is unavailable
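The first two write-performance tips can be sketched together as follows. This is an illustration under assumptions, not code from the talk: it uses PDO's SQLite driver, which is built on SQLite 3, and therefore spells the pragma synchronous rather than default_synchronous. The inserted rows are synthetic.

```php
<?php
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Trade durability for speed, as described on the Synchronous slides.
$db->exec('PRAGMA synchronous = OFF');
$db->exec('CREATE TABLE areacode (zip INTEGER, areacode INTEGER)');

$stmt = $db->prepare('INSERT INTO areacode (zip, areacode) VALUES (?, ?)');

// One transaction around the whole batch instead of one per INSERT.
$db->beginTransaction();
try {
    for ($i = 0; $i < 1000; $i++) {
        $stmt->execute(array(10000 + $i, 200 + ($i % 10))); // synthetic rows
    }
    $db->commit();
} catch (Exception $e) {
    $db->rollBack(); // leave the table untouched if any insert fails
    throw $e;
}

$rows = (int) $db->query('SELECT COUNT(*) FROM areacode')->fetchColumn();
```

The gain from batching is far more visible on a file-backed database than on the in-memory one used here.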
  16. Splitting up Tables
       • Since every write locks the entire database, splitting out tables which are written to heavily can improve performance
           • Multiple databases means multiple files
           • Join them together using SQLite's ATTACH:
               ATTACH DATABASE 'mydatabase.db' AS mydb;
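A sketch of ATTACH in use, again through PDO (an assumption) and with made-up table and alias names (lookups, statsdb). The second database is built on its own connection first and only read through the attachment.

```php
<?php
$mainFile  = tempnam(sys_get_temp_dir(), 'zip');
$statsFile = tempnam(sys_get_temp_dir(), 'sta');

// Build the second database on its own connection first.
$stats = new PDO('sqlite:' . $statsFile);
$stats->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$stats->exec('CREATE TABLE lookups (zip INTEGER, hits INTEGER)');
$stats->exec('INSERT INTO lookups VALUES (48103, 7)'); // sample row
$stats = null; // close it before attaching

$db = new PDO('sqlite:' . $mainFile);
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE cities (zip INTEGER, city_name, state)');
$db->exec("INSERT INTO cities VALUES (48103, 'Ann Arbor', 'MI')");

// Attach the second file under the alias statsdb and join across both.
$db->exec('ATTACH DATABASE ' . $db->quote($statsFile) . ' AS statsdb');
$row = $db->query(
    'SELECT c.city_name AS city_name, s.hits AS hits
       FROM cities c JOIN statsdb.lookups s ON s.zip = c.zip'
)->fetch(PDO::FETCH_ASSOC);

$db = null;
unlink($mainFile);
unlink($statsFile);
```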
  17. Table splitting pitfalls
       • Can only attach a maximum of 10 databases together
       • Transactions lock all databases
       • Cross-database transactions are not atomic
       • Attached databases cannot have their schema modified
  18. Improving Reads
       • By default SQLite performs reads using a buffered query
           • Allows for data seeks forward and backward
       • If you are only interested in reading from start to finish you can use an unbuffered query
           • sqlite_unbuffered_query()
           • Only fetches one row at a time
           • Good for large result sets
  19. Questions?
  20. MySQLi
       • MySQLi (or Improved MySQL) is a complete rewrite of the old MySQL extension for PHP
       • Used with MySQL version 4.1 and above
       • Supports PHP APIs for new MySQL 4.1 features
       • Most legacy functions still exist, although their names have changed
  21. Making the leap (mysqlidiff.php)
       • MySQL and MySQLi share a similar API
       • Most functions which existed in the old extension exist today:
           • instead of mysql_query() use mysqli_query()
       • There are incompatibilities, however
           • No more implicit database resources (all queries must specify the database connection being used)
           • Doesn't work with versions of MySQL < 4.1
  22. Backward Compatibility?
       • MySQLi and the old MySQL extension do not play very nicely together
           • Difficult, if not impossible, to get both mysql_* and mysqli_* functions available at the same time from PHP
       • To overcome this, I created a compatibility layer:
           • http://www.coggeshall.org/oss/mysql2i/
           • Maps MySQLi functions to the old mysql_* names
           • Should be a drop-in fix to most legacy code
  23. Same steps
       • Although the API has changed slightly, the steps for working with MySQL are the same:
           • Connect to the database server
           • Select the database to use
           • Perform queries
           • Retrieve results
           • Close database connection
  24. An example (mysqli_simple.php)
       • Here is a simple example of using MySQLi
           • mysqli_connect() to connect to the database server
           • mysqli_select_db() to select the database
           • mysqli_query() to perform queries
           • mysqli_fetch_array() to return results
               • mysqli_fetch_row() and mysqli_fetch_assoc() are also both available as helper methods
           • mysqli_close() to close the connection
  25. Dealing with Errors (mysqli_error.php)
       • You'll notice in my example I didn't deal with errors very nicely
       • As with the old extension, MySQLi can retrieve nice error codes and messages for users
           • mysqli_errno() - returns an error code
           • mysqli_error() - returns a string representation of the error
           • mysqli_connect_errno() - returns an error code from the connection process
           • mysqli_connect_error() - returns an error string from the connection process
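The five steps and the error functions can be combined in one sketch. The helper name fetch_city_names(), the cities table and all connection parameters are placeholders, not taken from the talk's example files. The call to mysqli_report() is there because recent PHP releases throw exceptions by default, which would bypass the error-code checks shown on the slide.

```php
<?php
// Hypothetical helper: connect, select a database, query, fetch, close.
// Host, credentials, database and table names are placeholders.
function fetch_city_names($host, $user, $pass, $dbname)
{
    // Report errors through return values so the checks below apply.
    mysqli_report(MYSQLI_REPORT_OFF);

    $link = mysqli_connect($host, $user, $pass);
    if (!$link) {
        throw new RuntimeException(
            'Connect failed (' . mysqli_connect_errno() . '): ' . mysqli_connect_error()
        );
    }
    if (!mysqli_select_db($link, $dbname)) {
        $message = 'Select failed (' . mysqli_errno($link) . '): ' . mysqli_error($link);
        mysqli_close($link);
        throw new RuntimeException($message);
    }
    $result = mysqli_query($link, 'SELECT city_name FROM cities');
    if (!$result) {
        $message = 'Query failed (' . mysqli_errno($link) . '): ' . mysqli_error($link);
        mysqli_close($link);
        throw new RuntimeException($message);
    }
    $names = array();
    while ($row = mysqli_fetch_assoc($result)) {
        $names[] = $row['city_name'];
    }
    mysqli_free_result($result);
    mysqli_close($link);
    return $names;
}
```

A call such as fetch_city_names('localhost', 'user', 'secret', 'zipdb') needs a reachable MySQL server, so the function is only defined here, not invoked.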
  26. Executing Multiple Queries
       • One of the big improvements in MySQLi is the ability to execute multiple queries in a single call.
       • Using a multi-query in MySQLi is more complex than using a single query
       • You must iterate through a set of result objects and treat each one as a separate result
  27. Need to know for Multiqueries (using_multiqueries.php)
       • There are a few functions you need to know about when dealing with multi-query select statements:
           • mysqli_multi_query(): perform the multi-query
           • mysqli_store_result(): retrieve a result
               • Perform operations against result as before
           • mysqli_more_results(): check for another result
           • mysqli_next_result(): increment to next result
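The four functions fit together in a loop like the one below. This is a sketch rather than the using_multiqueries.php from the talk. The helper name run_multi_query() is made up, and it assumes an already opened connection with mysqli error reporting set to return values.

```php
<?php
// Hypothetical helper: runs several ';'-separated statements and returns
// one array of rows per statement that produced a result set.
function run_multi_query(mysqli $link, $sql)
{
    if (!mysqli_multi_query($link, $sql)) {
        throw new RuntimeException(mysqli_error($link));
    }
    $sets = array();
    do {
        // False for statements without a result set (INSERT, UPDATE, ...).
        $result = mysqli_store_result($link);
        if ($result) {
            $rows = array();
            while ($row = mysqli_fetch_assoc($result)) {
                $rows[] = $row;
            }
            mysqli_free_result($result);
            $sets[] = $rows;
        }
        // Advance only while the server reports further results.
    } while (mysqli_more_results($link) && mysqli_next_result($link));
    return $sets;
}
```

For example, run_multi_query($link, 'SELECT 1; SELECT 2') would be expected to return two single-row sets.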
  28. Prepared Statements
       • Prepared statements are a more efficient way of performing queries against the database
           • Every time you execute a query, the query must be parsed, checked for syntax, etc.
           • This is an expensive process
           • Prepared statements allow you to save compiled "templates" of a query
           • Instead of recompiling and retransmitting an entire query, only the values to plug into the templates are sent.
  29. Using prepared statements
       • Consider the following query:
           INSERT INTO mytable VALUES($data1, '$data2');
       • Instead of specifying the variables in the query ($data1, $data2), replace each with a ? placeholder
       • Use this type of prepared statement for database writes
  30. Using prepared statements
       • The same query as a prepared statement:
           INSERT INTO mytable VALUES(?, ?);
       • Each variable was replaced with ?
       • Note that quotes are no longer necessary
           • Prepared statements automatically escape data
  31. Using a Prepared Statement (mysqli_bound_param.php)
       • Once you have a query you can use it in four steps:
           • Prepare the statement using mysqli_prepare()
           • Bind PHP variables to the statement using mysqli_stmt_bind_param() (called mysqli_bind_param() in early PHP 5 releases)
           • Set the variable values
           • Execute the query and write to the database
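The four steps might be sketched as follows. This is not the mysqli_bound_param.php from the talk. The helper name insert_rows() is made up, and the column types (an integer followed by a string) are inferred from the quoting in the original query.

```php
<?php
// Hypothetical helper: inserts array($data1, $data2) pairs into mytable
// through a single prepared statement. Returns the number of rows written.
function insert_rows(mysqli $link, array $rows)
{
    // Step 1: prepare the statement once.
    $stmt = mysqli_prepare($link, 'INSERT INTO mytable VALUES(?, ?)');
    if (!$stmt) {
        throw new RuntimeException(mysqli_error($link));
    }
    // Step 2: bind PHP variables ("i" = integer, "s" = string; assumed types).
    $data1 = 0;
    $data2 = '';
    mysqli_stmt_bind_param($stmt, 'is', $data1, $data2);

    $inserted = 0;
    foreach ($rows as $row) {
        // Step 3: set the variable values.
        list($data1, $data2) = $row;
        // Step 4: execute; only the bound values are sent to the server.
        if (mysqli_stmt_execute($stmt)) {
            $inserted++;
        }
    }
    mysqli_stmt_close($stmt);
    return $inserted;
}
```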
  32. Using Result-Bound Prepared Statements (mysqli_bind_result.php)
       • The second type of prepared statement is a result-bound prepared statement
       • Bind PHP variables to columns being returned
       • Loop over the result set and the PHP variables will automatically be populated with current data
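A result-bound statement could look like this sketch. The helper name fetch_zip_city_pairs() and the cities table are assumptions for illustration, and an open connection is taken as given.

```php
<?php
// Hypothetical helper: reads (zip, city_name) pairs via bound result variables.
function fetch_zip_city_pairs(mysqli $link)
{
    $stmt = mysqli_prepare($link, 'SELECT zip, city_name FROM cities');
    if (!$stmt) {
        throw new RuntimeException(mysqli_error($link));
    }
    mysqli_stmt_execute($stmt);

    // Bind one PHP variable per returned column.
    mysqli_stmt_bind_result($stmt, $zip, $city);

    $pairs = array();
    // Each fetch overwrites $zip and $city with the current row.
    while (mysqli_stmt_fetch($stmt)) {
        $pairs[] = array($zip, $city);
    }
    mysqli_stmt_close($stmt);
    return $pairs;
}
```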
  33. Transactions
       • One of the biggest improvements in MySQLi/MySQL 4.0+ is the support for atomic transactions
           • Multiple writes done as a single write
           • Ensures data integrity during critical multi-write operations such as credit card processing
  34. Transactions API (mysqli_transactions.php)
       • MySQLi supports a number of transaction APIs
           • mysqli_autocommit() enables and disables auto-committing of transactions
           • mysqli_commit() allows you to explicitly commit a transaction
           • mysqli_rollback() allows you to roll back (undo) a transaction
           • To determine the state of auto-committing, perform the query: SELECT @@autocommit;
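The sketch below shows the three calls around a two-step write. The accounts table, its columns and the helper name transfer_funds() are invented for illustration. It also assumes a transactional storage engine such as InnoDB and mysqli error reporting set to return values rather than exceptions.

```php
<?php
// Hypothetical helper: moves an amount between two rows atomically.
// Returns true when both updates were committed, false when rolled back.
function transfer_funds(mysqli $link, $fromId, $toId, $amount)
{
    mysqli_autocommit($link, false); // start grouping writes

    // %d forces integer values, so no further escaping is needed here.
    $debit = sprintf(
        'UPDATE accounts SET balance = balance - %d WHERE id = %d', $amount, $fromId
    );
    $credit = sprintf(
        'UPDATE accounts SET balance = balance + %d WHERE id = %d', $amount, $toId
    );

    $ok = mysqli_query($link, $debit) && mysqli_query($link, $credit);
    if ($ok) {
        mysqli_commit($link);   // both writes become visible together
    } else {
        mysqli_rollback($link); // undo the partial write
    }
    mysqli_autocommit($link, true); // restore the default behaviour
    return $ok;
}
```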
  35. Questions?
  36. That's it for Databases
       • Now that you've been introduced to the two new database extensions available in PHP 5, let's take a look at some of the other functionality
           • PHP 5 boasts a completely revamped XML system
               • Based on the libxml2 library
               • dom
               • simplexml
               • xmlreader (to be released in PHP 5.1)
  37. XML Processing in PHP 5
       • PHP 5 can parse XML in a variety of ways
           • SAX (inherited from PHP 4)
           • DOM (as defined by the W3C)
           • XPath
           • SimpleXML
  38. Benefits to the new XML
       • Because everything in PHP 5 uses a single underlying library, many improvements have been made
           • Can switch between SimpleXML/DOM processing at will
           • Streams support has been extended to XML documents themselves (use a stream for an <xsl:include> or <xi:include> tag, for instance)
  39. DOM in PHP 5
       • PHP 5 supports a W3C compliant Document Object Model for XML
           • A very detailed way of parsing XML
           • Refer to http://www.w3c.org/DOM for a complete description
  40. Reading XML using DOM
       • Consider the following simple XML document:
           <?xml version="1.0" encoding="iso-8859-1" ?>
           <articles>
             <item>
               <title>PHP Weekly: Issue # 172</title>
               <link>http://www.zend.com/zend/week172.php</link>
             </item>
           </articles>
  41. Reading XML using DOM
       • To use DOM, create a new instance of the DomDocument class
       • Load an XML file using the load() method
       • To output the XML file to the browser, use the saveXML() method
       • To write the XML file to the filesystem, use the save() method
  42. Retrieving nodes by name (dom_getelementbytagname.php)
       • One of the easiest ways to pull data out of an XML document is to retrieve nodes by name
           • The getElementsByTagName() method returns a DomNodeList object
           • To get the content of a node refer to $node->firstChild->data;
           • PHP 5 also provides $node->textContent to retrieve the same data in a simplified fashion
           • DomNodeList objects can be iterated over like an array using foreach()
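A short sketch of retrieval by tag name, using the article document from the "Reading XML using DOM" slide. It loads the XML from a string with loadXML() so that it runs without a file on disk; the variable names are arbitrary.

```php
<?php
$xml = <<<XML
<?xml version="1.0" encoding="iso-8859-1" ?>
<articles>
  <item>
    <title>PHP Weekly: Issue # 172</title>
    <link>http://www.zend.com/zend/week172.php</link>
  </item>
</articles>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml); // load('file.xml') would read from the filesystem instead

// Iterate over the node list like an array and collect the text content.
$titles = array();
foreach ($dom->getElementsByTagName('title') as $node) {
    $titles[] = $node->textContent;
}

// The longer route: the text lives in the element's first child node.
$firstLink = $dom->getElementsByTagName('link')->item(0)->firstChild->data;
```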
  43. More DOM navigation (dom_navigation.php)
       • Although getElementsByTagName() is useful, it is also a bit limited
           • Doesn't give you information stored in the structure of the XML document itself
           • To be more detailed, you must parse the document manually
               • Iterate over the childNodes property to get child nodes
               • Use nodeType and nodeName to identify nodes you are interested in
  44. Writing XML using DOM (dom_writing.php)
       • You can also write to XML documents using the DOM model
           • Create nodes using the createElement() method
           • Create values using the createTextNode() method
           • Add nodes as children to existing nodes using appendChild()
  45. Extending DOM
       • Because DOM in PHP 5 is handled through a DomDocument class, you can extend it to implement your own helper functions
           • Must call the DomDocument constructor (__construct) when your extended class is constructed
           • Add a method like addArticle() which encapsulates the steps from the previous example to add a new article to the XML document
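One way such an extension might look is sketched below. Only the method name addArticle() comes from the slide. The class name ArticleDocument, the root element and the method's arguments are assumptions based on the sample document used earlier in the talk.

```php
<?php
// Hypothetical subclass wrapping the createElement/createTextNode/appendChild
// steps in a single helper method.
class ArticleDocument extends DOMDocument
{
    public function __construct()
    {
        // The parent constructor must run before the document is used.
        parent::__construct('1.0', 'iso-8859-1');
        $this->appendChild($this->createElement('articles'));
    }

    public function addArticle($title, $link)
    {
        $item = $this->createElement('item');

        $titleNode = $this->createElement('title');
        $titleNode->appendChild($this->createTextNode($title));

        $linkNode = $this->createElement('link');
        $linkNode->appendChild($this->createTextNode($link));

        $item->appendChild($titleNode);
        $item->appendChild($linkNode);
        $this->documentElement->appendChild($item);
        return $item;
    }
}

$doc = new ArticleDocument();
$doc->addArticle('PHP Weekly: Issue # 172', 'http://www.zend.com/zend/week172.php');
$xml = $doc->saveXML();
```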
  46. XML Validation
       • You can also validate XML documents using DOM in PHP 5 using one of the following three methods:
           • DTD: A very old and largely unneeded method of XML validation
           • XML Schema: Defined by the W3C and can be very complex to work with
           • RelaxNG: A much simplified version of XML validation (recommended)
  47. XML Validation
       • To use one of these three methods, simply call one of the following after loading an XML document using the load() method
           • $dom->validate(); (takes no argument; validates against the DTD declared in the document)
           • $dom->relaxNGValidate('myxmlfile.rng');
           • $dom->schemaValidate('myxmlfile.xsd');
       • These functions return a boolean indicating whether the validation was successful.
       • Currently doesn't have the best error handling...
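A small DTD-based sketch is given below. As far as I know, DOMDocument::validate() takes no filename and checks the document against the DTD it declares, so the example embeds a DTD in the document itself. The DTD is written for this illustration and is not from the talk. libxml_use_internal_errors() is used so that the invalid document produces a false return value instead of printed warnings.

```php
<?php
// Collect libxml errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// DTD written for this sketch: every item needs a title and a link.
$dtd = '<!DOCTYPE articles [
<!ELEMENT articles (item*)>
<!ELEMENT item (title, link)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT link (#PCDATA)>
]>';

$good = new DOMDocument();
$good->loadXML($dtd . '<articles><item><title>t</title><link>l</link></item></articles>');
$goodIsValid = $good->validate();

// Same structure, but the required <link> element is missing.
$bad = new DOMDocument();
$bad->loadXML($dtd . '<articles><item><title>t</title></item></articles>');
$badIsValid = $bad->validate();

// The collected messages are available for nicer error reporting.
$errors = libxml_get_errors();
libxml_clear_errors();
```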
  48. Simplified XML parsing
       • Although DOM is great when you don't really know what you are looking for, it is overly complex for when you do
       • For these reasons PHP 5 comes with the SimpleXML extension
       • Maps the structure of an XML document directly to a PHP 5 overloaded object for easy navigation
       • Only good for when you know the structure of the XML document beforehand.
  49. Using SimpleXML (simplexml.php)
       • To use SimpleXML, load the XML document using...
           • simplexml_load_file() to load a file
           • simplexml_load_string() to load from a string
           • simplexml_import_dom() to load from an existing DOM node
       • Once loaded you can access nodes directly by name as properties / methods of the object returned
  50. More details on SimpleXML
       • As you can see, nodes can be directly accessed by name from the returned object
       • If you would like to extract attributes from a node, reference the name as an associative array:
           • $simplexml->title['id'];
           • This will get the id attribute of the title node
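A minimal sketch of node and attribute access follows. The document mirrors the article sample used earlier in the talk, and the id attribute on the title element is added here purely for illustration.

```php
<?php
// Sample document; the id attribute is an addition made for this sketch.
$xml = '<articles>'
     . '<item>'
     . '<title id="172">PHP Weekly: Issue # 172</title>'
     . '<link>http://www.zend.com/zend/week172.php</link>'
     . '</item>'
     . '</articles>';

$simplexml = simplexml_load_string($xml);

// Child elements are reachable as properties; cast to string for the text.
$title = (string) $simplexml->item->title;

// Attributes are reachable with array syntax on the element.
$id = (string) $simplexml->item->title['id'];
```

Here the title element sits one level below the root, hence $simplexml->item->title rather than $simplexml->title.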
  51. XPath in SimpleXML (simplexml_xpath.php)
       • SimpleXML also supports XPath for pulling particular nodes out of an XML document
           • Use the xpath() method to provide your query
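The xpath() method returns an array of matching elements, as in this sketch. The two-item document and its example.com URLs are made up for illustration.

```php
<?php
// Made-up sample document with two items.
$xml = '<articles>'
     . '<item><title>First</title><link>http://example.com/1</link></item>'
     . '<item><title>Second</title><link>http://example.com/2</link></item>'
     . '</articles>';

$simplexml = simplexml_load_string($xml);

// Select every <link> that is a child of an <item>, anywhere in the document.
$links = array();
foreach ($simplexml->xpath('//item/link') as $link) {
    $links[] = (string) $link;
}
```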
  52. Writing XML using SimpleXML
       • Although there are limitations, you can also write XML documents using SimpleXML
           • Just reassign a node or attribute to a new value
               • $simplexml->item->title = "My new title";
               • $simplexml->item->title['id'] = 42;
           • Use the asXML() method to return an XML document from SimpleXML
           • Alternatively you can also reimport a SimpleXML document into DOM using dom_import_simplexml()
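The assignments from the slide can be tried on a small document as sketched here. The starting XML is made up for illustration.

```php
<?php
$simplexml = simplexml_load_string(
    '<articles><item><title>Old title</title></item></articles>'
);

// Reassign the node's text and add (or overwrite) an attribute.
$simplexml->item->title = 'My new title';
$simplexml->item->title['id'] = 42;

// Serialize the modified tree back to an XML string.
$out = $simplexml->asXML();

// Hand the same tree to DOM when SimpleXML's writing support runs out.
$domNode = dom_import_simplexml($simplexml);
```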
  53. Questions?
  54. Moving along
       • As you can see, XML support has been drastically improved for PHP 5
           • HTML support has been improved as well
           • The new tidy extension allows for intelligent HTML parsing, manipulation and repair
  55. What exactly is Tidy?
       • Tidy is an intelligent HTML parser
       • It can parse malformed HTML documents and intelligently correct most common errors in their syntax
           • Missing or misaligned end tags
           • Unquoted attributes
           • Missing required tag elements
       • Tidy automatically adjusts itself based on the detected HTML document type
  56. Using Tidy in PHP (tidy_syntax_fix.php)
       • In its simplest form Tidy will read an HTML document, parse it, correct any syntax errors and allow you to display the corrected document
           • Use tidy_parse_file() to parse the file
           • Use tidy_get_output() to return the corrected HTML
       • Note that the object returned from tidy_parse_file() can also be treated as a string to get the output
  57. Identifying problems with a document
       • Once a document has been parsed you can identify problems with the document by examining the return value of the tidy_get_error_buffer() function
       • Returns something like the following:
           line 1 column 1 - Warning: missing <!DOCTYPE> declaration
           line 1 column 1 - Warning: replacing unexpected i by </i>
           line 1 column 43 - Warning: <u> is probably intended as </u>
           line 1 column 1 - Warning: inserting missing 'title' element
  58. Repairing HTML documents
       • Once a document has been parsed you can be sure it is valid from a syntax standpoint
       • However, this does not mean a document is actually web-standards compliant
       • To make a parsed HTML document standards compliant, call the tidy_clean_repair() function
       • Brings the document up to spec according to configuration options (discussed later)
  59. Configuration Options?
       • The vast majority of the power in tidy comes from the configuration options which can be set
           • Allows you to do everything from replacing deprecated <FONT> tags with CSS to converting HTML 3.2 documents into XHTML 1.0 documents
           • Can be set either at run time or through a configuration file
           • A default configuration can be set using the tidy.default_config php.ini directive.
  60. Runtime configuration (tidy_runtime_config.php)
       • To configure tidy at run time you must pass the configuration as the second parameter to tidy_parse_file()
           • If the second parameter is an array, it should be a series of key/value pairs mapping to configuration options / values
           • If the second parameter is a string it will be treated as a tidy configuration filename and loaded from the filesystem.
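A sketch of run-time configuration with an options array follows. It uses tidy_parse_string(), which accepts the same configuration parameter as tidy_parse_file(), so that no file is needed. The helper name tidy_to_xhtml(), the chosen options and the sample markup are illustrative, and the tidy extension must be loaded for the code to run.

```php
<?php
// Hypothetical helper: repairs an HTML fragment and returns it as XHTML.
function tidy_to_xhtml($html)
{
    // Key/value pairs map directly to tidy configuration options.
    $config = array(
        'output-xhtml' => true,
        'indent'       => true,
        'wrap'         => 4096,
        'tidy-mark'    => false,
    );
    $tidy = tidy_parse_string($html, $config, 'utf8');
    tidy_clean_repair($tidy);
    return tidy_get_output($tidy);
}

// Only call into tidy when the extension is actually available.
$fixed = null;
if (extension_loaded('tidy')) {
    $fixed = tidy_to_xhtml('<p>Hello <b>world</p>'); // unclosed <b> tag
}
```

Passing a filename string in place of $config would load the options from a tidy configuration file instead.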
  61. Configuration Files
       • Configuration files are useful for creating tidy "profiles" representing different tasks
           • A profile to strip all unnecessary data from an HTML document (save bandwidth)
           • A profile to beautify HTML documents which are difficult to read
  62. Configuration Files
       • Below is an example tidy configuration file:
           indent: yes
           indent-spaces: 4
           wrap: 4096
           tidy-mark: no
           new-blocklevel-tags: mytag, anothertag
  63. Parsing with Tidy
       • Along with cleaning and repairing HTML, tidy can also be used to navigate the parsed document tree
           • Four different entry points
               • ROOT
               • HEAD
               • HTML
               • BODY
           • Enter using the root(), head(), html(), or body() methods
  64. The Tidy Node (pseudo_tidy_node.php)
       • When calling one of the entry-point methods against the return value from tidy_parse_file(), you get back a tidyNode object
           • Each node represents a tag in the HTML document
           • Allows you to find out many interesting things about the node
           • Allows you to pull out attributes quickly, making screen scraping a snap
           • Consult the PHP manual for details on type, etc.
  65. Example of using Tidy Parsing (tidy_dump_nodes.php)
       • In this example we will parse a document using Tidy and extract all of the URLs found within <A> tags
           • Check the $id property of the node to see if it matches the TIDY_TAG_A constant
           • Look for the 'href' property in the $attribute array
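The link extraction described on this slide might be sketched as a recursive walk over tidyNode objects. This is not the tidy_dump_nodes.php from the talk. The function name collect_links() and the sample markup are made up, and the tidy extension must be loaded for the walk to run.

```php
<?php
// Hypothetical helper: appends every href found on an <a> tag below $node.
function collect_links(tidyNode $node, array &$urls)
{
    // $id identifies the tag type; $attribute holds the tag's attributes.
    if ($node->id == TIDY_TAG_A && isset($node->attribute['href'])) {
        $urls[] = $node->attribute['href'];
    }
    if ($node->hasChildren()) {
        foreach ($node->child as $child) {
            collect_links($child, $urls);
        }
    }
}

$urls = array();
if (extension_loaded('tidy')) {
    $html = '<p><a href="http://example.com/a">A</a> and '
          . '<a href="http://example.com/b">B</a></p>';
    $tidy = tidy_parse_string($html);
    $body = $tidy->body(); // one of the four entry points
    if ($body !== null) {
        collect_links($body, $urls);
    }
}
```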
  66. Questions?
