PHP Continuous Data Processing
Published in: Technology
  • Slide note: Imagine viewing a customer’s fleet of 30 vehicles on a map: 60 queries refreshing every 30 seconds
  • Transcript

    • 1. PHP & Continuous Data Processing
      Michael Peacock, October 2011
    • 2. No. Not milk floats (anymore)
      All Electric, Commercial Vehicles.
      Photo courtesy of kenjonbro: http://www.flickr.com/photos/kenjonbro/4037649210/in/set-72157623026469013
    • 3. About Michael Peacock
        • Senior/Lead Web Developer
        • Web Systems Developer
            • Telemetry Team – Smith Electric Vehicles US Corp
        • Author
            • PHP 5 Social Networking, PHP 5 E-Commerce Development, Drupal Social Networking (6 & 7), Selling online with Drupal e-Commerce, Building Websites with TYPO3
        • PHPNE Volunteer
        • Occasional technical speaker
            • PHP North-East, PHPNW 2010, SuperMondays, PHPNW 2011 Unconference, ConFoo 2012
    • 4. Smith Electric Vehicles & Telemetry
      World’s largest manufacturer of commercial, all-electric vehicles
      Smith Link – on-board vehicle telematics system, capturing over 2500 data points each second on the vehicle and broadcasting them over the mobile network
      ~400 telemetry-enabled vehicles on the road
      World’s largest telemetry project outside of F1
    • 5. System Architecture
    • 6. System Architecture
    • 7. Problem #1: We Can’t Lose Any Data
      Data is required as part of a $32 million grant from the US Department of Energy
        • Thousands of pieces of information collected every second from a range of remote collection devices
        • Unpredictable amounts of data at any one time
        • More vehicles rolling off the production line with telemetry enabled
        • What about system downtime, upgrades, roll-outs and connectivity problems?
    • 8. Message Queuing
      Solution: We use a fast, reliable, scalable, secure, hosted message queue
        • If our systems are offline, data builds up in the external message queue
        • If we are processing at full capacity, the surplus builds up in the message queue
        • If the vehicle loses GPRS signal, or the message queue is inaccessible, vehicles have an internal buffer of up to 7 days
    • 9. Secret Weapon #1: StormMQ
        • Based on AMQP, an open standard
        • Secure: All data is encrypted and sent over SSL
        • Reliable: Huge investment in server infrastructure
        • Hosted: Backed up with an SLA
        • Scalable: Capable of processing huge numbers of incoming messages, with capacity to store the messages when we perform maintenance on our systems
    • 10. Problem #2: Processing data quickly
      We utilise a dedicated server and a number of dedicated applications to pull these messages and process them
        • This needs to happen quickly enough for live data to be seen through the web interface
        • Data is rapidly converted into batch SQL files, which are imported to MySQL via “LOAD DATA INFILE”
        • Results in a high number of inserts per second (20,000 – 80,000)
        • LOAD DATA INFILE isn’t enough on its own...
    • 11. Secret Weapon #2: DBA
      Sam Lambert – DBA Extraordinaire
        • Constantly tweaking the servers and configuration to get more and more performance
        • Pushing the capabilities of our SAN, tweaking configs where no DBA has gone before
        • www.samlambert.com
            • http://www.samlambert.com/2011/07/how-to-push-your-san-with-open-iscsi_13.html
            • http://www.samlambert.com/2011/07/diagnosing-and-fixing-mysql-io.html
        • sam.lambert@smithelectric.com
    • 12. Sharding
        • Huge volumes of data being stored
        • We shard the data based on the truck it came from; each truck has its own database
        • Databases are held on one of many database servers in our cluster, each with ~100GB RAM
    • 13. Live, Real Time Information
      [live screen photo]
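The per-truck sharding described on the Sharding slide could be sketched roughly as follows. This is only an illustration: the `ShardMap` class, the host names and the `truck_00102` naming scheme are all assumptions, since the slides do not say how a truck is mapped to its database server.

```php
<?php
// Hypothetical shard map: records which server holds each truck's database.
// The class name, host names and database naming scheme are made up.
final class ShardMap
{
    /** @var array<int,string> truck ID => database host */
    private $hosts;

    public function __construct(array $hosts)
    {
        $this->hosts = $hosts;
    }

    /** Returns [host, database name] for a truck, or throws if it is unknown. */
    public function locate(int $truckId): array
    {
        if (!isset($this->hosts[$truckId])) {
            throw new InvalidArgumentException("No shard registered for truck $truckId");
        }
        return [$this->hosts[$truckId], sprintf('truck_%05d', $truckId)];
    }
}

$map = new ShardMap([101 => 'db1.example.internal', 102 => 'db2.example.internal']);
[$host, $database] = $map->locate(102);
```

Keeping the lookup in one place means application code asks for a truck and never hard-codes a server, which matters once databases are moved between servers in the cluster.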
    • 14. Real Time Status and Tracking
    • 15. Live, Real Time Information: Problem
      Original database design dictated:
        • All data-points were stored in the same table
        • Each type of data point required a separate query, sub-query or join to obtain
      Workings of the remote device collecting the data, and the processing server, dictated:
        • GPS Co-ordinates can be up to 6 separate data points, including: Longitude; Latitude; Altitude; Speed; Number of Satellites used to get location; Direction
    • 16. Real Time Information: Concurrent
      Initial Solution from the original developers:
        • Pull through as many pieces of real time information as possible asynchronously
        • Involved the use of Flash based “widgets” which called separate PHP scripts to query the data
        • Pages loaded relatively quickly
        • Data points took a little time to load
        • Not good enough
    • 17. Real Time Information: Caching
        • High volumes of data, and varying levels of concurrent processing, mean query times are often inconsistent
        • Memcache was used when processing the data from the message queue, keeping a copy of the most recent value of each data point for each truck
        • Live, Real-Time information accessed directly from memcache, bypassing the database
    • 18. Caching: Registry/DI is Ideal
        • Sporadic use of memcache within the web application – ideal use case for a lazy loading registry or DI container
        • Give the registry or container details of memcache
        • Object is instantiated, and the connection made, only when data is requested from memcache
    • 19. Lazy Loading
      public function getObject( $key )
      {
          // Already instantiated: return the stored object
          if( array_key_exists( $key, $this->objects ) )
          {
              return $this->objects[$key];
          }
          // Registered but not yet loaded: include the class file(s), instantiate and store
          elseif( array_key_exists( $key, $this->objectSetup ) )
          {
              if( ! is_null( $this->objectSetup[ $key ]['abstract'] ) )
              {
                  require_once( FRAMEWORK_PATH . 'registry/aspects/' . $this->objectSetup[ $key ]['folder'] . '/' . $this->objectSetup[ $key ]['abstract'] . '.abstract.php' );
              }
              require_once( FRAMEWORK_PATH . 'registry/aspects/' . $this->objectSetup[ $key ]['folder'] . '/' . $this->objectSetup[ $key ]['file'] . '.class.php' );
              $o = new $this->objectSetup[ $key ]['class']( $this );
              $this->storeObject( $o, $key );
              return $o;
          }
          elseif( $key == 'memcache' )
          {
              // requesting memcache for the first time, instantiate, connect, store and return
              $mc = new Memcache();
              $mc->connect( MEMCACHE_SERVER, MEMCACHE_PORT );
              $this->storeObject( $mc, 'memcache' );
              return $mc;
          }
      }
      Becomes the limit for the registry pattern, DI container more suitable
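The Lazy Loading slide closes by noting that a DI container suits this better than a registry with special cases. A minimal sketch of that idea is below. The `Container` and `FakeCache` classes are hypothetical and are not the speaker's code; `FakeCache` stands in for a real `Memcache` client so the example runs without the extension installed.

```php
<?php
// Minimal lazy-loading container: services are registered as factories
// (closures) and are only built the first time they are requested.
final class Container
{
    private $factories = [];
    private $instances = [];

    public function register(string $key, callable $factory): void
    {
        $this->factories[$key] = $factory;
    }

    public function get(string $key)
    {
        if (!array_key_exists($key, $this->instances)) {
            if (!isset($this->factories[$key])) {
                throw new OutOfBoundsException("Unknown service: $key");
            }
            // First request: run the factory and remember the result.
            $this->instances[$key] = ($this->factories[$key])($this);
        }
        return $this->instances[$key];
    }
}

// Stand-in for a cache client; counts how many times it has been constructed.
final class FakeCache
{
    public static $connections = 0;

    public function __construct()
    {
        self::$connections++;
    }
}

$container = new Container();
$container->register('memcache', function (Container $c) {
    // In a real application, creating and connecting the Memcache client would go here.
    return new FakeCache();
});

$before = FakeCache::$connections;      // nothing is built at registration time
$first  = $container->get('memcache');  // built on the first request
$second = $container->get('memcache');  // the same instance is reused
```

With this shape, adding another lazily created service means registering one more closure, and the lookup method needs no extra branch per service.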
    • 20. Real Time Information: Extrapolate and Assume
        • Our telemetry unit broadcasts each data point once per second
        • Data doesn’t change every second, e.g.
            • Battery state of charge may take several minutes to lose a percentage point
            • Fault flags only change to 1 when there is a fault
        • Make an assumption.
            • We compare the data to the last known value… if it’s the same we don’t insert; instead we assume it is unchanged
        • Unfortunately, this requires us to put additional checks and balances in place
    • 21. Extrapolate and Assume: “Interlation”
      Built a special library which:
        • Accepted a number of arrays, each representing a collection of data points for one variable on the truck
        • Used key indicators and time differences to work out if/when the truck was off, and extrapolation should stop
        • For each time data was recorded, pull down data for other variables for consistency
    • 22. Interlace
      // Add an array to the interlation
      public function addArray( $name, $array )
      // Get the time that we first received data in one of our arrays
      public function getFirst( $field )
      // Get the time that we last received data in any of our arrays
      public function getLast( $field )
      // Generate the interlaced array
      public function generate( $keyField, $valueField )
      // Break the interlaced array down into separate days
      public function dayBreak( $interlationArray )
      // Generate an interlaced array and fill for all timestamps within the range of _first_ to _last_
      public function generateAndFill( $keyField, $valueField )
      // Populate the new combined array with key fields using the common field
      public function populateKeysFromField( $field, $valueField=null )
      http://www.michaelpeacock.co.uk/interlation-library
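The "extrapolate and assume" idea, storing a value only when it changes and assuming it held in between, can be illustrated with a simple forward fill. This is a sketch under stated assumptions and not the Interlation library itself. `fillForward()` and its `$maxGap` cut-off are hypothetical; the slides say the real library used key indicators and time differences to decide when the truck was off, which a single gap threshold only approximates.

```php
<?php
/**
 * Rebuild a per-second series from change-only samples by carrying the
 * last known value forward. $maxGap is a hypothetical cut-off in seconds:
 * once no sample has arrived for longer than this, we stop assuming and
 * leave the remaining seconds unfilled.
 *
 * @param array $samples timestamp => value, recorded only when the value changed
 * @return array timestamp => value for every second that could be filled
 */
function fillForward(array $samples, int $from, int $to, int $maxGap): array
{
    $filled = [];
    $lastTime = null;
    $lastValue = null;
    for ($t = $from; $t <= $to; $t++) {
        if (array_key_exists($t, $samples)) {
            $lastTime = $t;
            $lastValue = $samples[$t];
        }
        if ($lastTime !== null && $t - $lastTime <= $maxGap) {
            $filled[$t] = $lastValue;
        }
    }
    return $filled;
}

// State of charge recorded only when it changed: 80% at t=0, 79% at t=3.
$series = fillForward([0 => 80.0, 3 => 79.0], 0, 10, 4);
```

Here seconds 0 to 2 are filled with 80.0 and seconds 3 to 7 with 79.0, while seconds 8 to 10 stay empty because the last sample is more than four seconds old.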
    • 23. Real Time Information: Single Request
        • Currently, each piece of “live data” is loaded into a Flash graph or widget, which updates every 30 seconds using an AJAX request
        • The move from MySQL to Memcache reduces database load, but the large number of requests still adds strain to the web server
        • Moving to image and JavaScript widgets, which are updated from a single AJAX request
    • 24. Lots of Data: Race Conditions
      Sessions in PHP close at the end of the execution cycle
        • Unpredictable query times
        • Large number of concurrent requests per screen
      Session Locking
      Completely locks out a user’s session, as PHP hasn’t closed the session
    • 25. Race Conditions: PHP & Sessions
      session_write_close()
      Added after each write to the $_SESSION array. Closes the current session.
      (requires a call to session_start() immediately before any further reads or writes)
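The pattern on the PHP & Sessions slide, writing to the session and then releasing it straight away, might look something like the sketch below. `setSessionValue()` is a hypothetical helper name and not part of PHP; `session_status()`, `session_start()` and `session_write_close()` are the standard session functions.

```php
<?php
/**
 * Open the session, apply one change, and release the session lock
 * immediately. setSessionValue() is a hypothetical helper name.
 */
function setSessionValue(string $key, $value): void
{
    if (session_status() !== PHP_SESSION_ACTIVE) {
        session_start();      // acquires the session lock
    }
    $_SESSION[$key] = $value;
    session_write_close();    // persists the data and releases the lock
}

setSessionValue('last_vehicle', 42);
// Slow work (queries, exports) can now run without blocking the user's
// other concurrent requests, which would otherwise wait on the session lock.
```

Because the helper reopens the session on every call, it has to run before any output is sent, which is the constraint the template engine slide addresses.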
    • 26. Race Conditions: Use a ******* Template Engine
        • V1 of the system mixed PHP and HTML
        • You can’t re-initialise your session once output has been sent
        • All new code uses a template engine, so session interaction has no bearing on output. By the time the template is processed and output, all database and session work has long been completed.
    • 27. Race Conditions: Use a Single Entry Point
        • Race conditions are further exacerbated by the PHP timeout values
        • Certain exports, actions and processes take longer than 30 seconds, so the default execution time limit is set higher
        • Initially the project lacked a single entry point, and execution flow was muddled
        • A Single Entry Point makes it easier to enforce a lower timeout, which is overridden by intensive controllers or models
    • 28. Intensive queries & Calculations
        • How far did this vehicle travel?
            • Motor RPM x Various vehicle specific constants
            • Calculated for every RPM value held during the drive process
        • How much energy did the vehicle use?
            • Battery Current x Battery Voltage x Time
            • For every current and voltage value combination held during the driving process
        • How well was the vehicle driven?
            • Analysis of idle time
            • Harshness of accelerator and brake pedal usage
            • Inappropriate duration of AC / Heater on time?
        • What about for a customer’s fleet, or all of our vehicles sold?
    • 29. Intensive Queries & Calculations
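As a worked example of the "Battery Current x Battery Voltage x Time" calculation, the sketch below sums power over the interval between consecutive readings. The function name, the reading layout and the units (seconds, amps, volts, result in kWh) are assumptions for illustration; the slides do not give the actual formula details or constants.

```php
<?php
/**
 * Estimate energy used from timestamped current/voltage readings.
 * Each reading is [timestamp in seconds, current in amps, voltage in volts];
 * the power of each reading is assumed to hold until the next reading.
 */
function energyUsedKwh(array $readings): float
{
    $joules = 0.0;
    for ($i = 0; $i < count($readings) - 1; $i++) {
        [$time, $amps, $volts] = $readings[$i];
        $seconds = $readings[$i + 1][0] - $time;
        $joules += $amps * $volts * $seconds;   // watts x seconds = joules
    }
    return $joules / 3600000;                   // 1 kWh = 3.6 million joules
}

// 100 A at 300 V held for one hour, reported as two half-hour intervals.
$kwh = energyUsedKwh([[0, 100.0, 300.0], [1800, 100.0, 300.0], [3600, 0.0, 300.0]]);
```

That example works out to 30 kW for one hour, so 30 kWh. Doing this for every reading of every vehicle is what makes the overnight, precompiled reports attractive.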
    • 30. Intensive queries & Calculations
        • Involves a fair number of queries per vehicle
        • Calculations involve holding this data in memory
        • Processing required for every single record for that piece of data during that day
      Takes a while!
      Solution:
        • Calculate information overnight
        • Save it as a compiled report
        • Lookups and comparisons only need to look at the compiled / saved reports in the database
    • 31. Reports
      In addition to our calculated reports, we also need to export key bits of information to grant authorities
        • Initially our PHP based export scripts held one database connection per database (~400 databases)
        • Re-wrote to maintain only one connection per server, and switch the database used
        • Toggles to instruct the export to only apply for 1 of the servers at a time
        • Modulus magic to run multiple export scripts per server
    • 32. Triggers and Events
      Currently a work-in-progress R&D project, evaluating two options:
        • Golden hammer: Use PHP
            • Run PHP as a daemon
            • http://kevin.vanzonneveld.net/techblog/article/create_daemons_in_php/
            • Continually monitor for specific changes to memcache variables
        • Node.js
            • Lightweight and fast
            • Give PHP another friend
            • Link into PHP based API to run triggers
    • 33. The Future
        • More sharding
            • Based on time – keep the individual tables smaller
        • NoSQL?
            • Currently investigating NoSQL solutions as alternatives
        • Rationalisation
            • Do we need as much data as we collect?
        • Abstraction
            • We need to continually abstract concepts and ideas to make on-going maintenance and expansion easier, especially in terms of mapping code to database shards
        • More hardware
            • Expand our DB cluster, more RAM, R&D
        • Design
            • A much needed design refresh
    • 34. Conclusions
        • Make the solution scalable from the start
        • Where data collection is critical, use a message queue, ideally hosted or “cloud based”
        • Hire a genius DBA to push your database engine
        • Make use of data caching systems to reduce strain on the database
        • Calculations and post-processing should be done during dead time and automated
        • Add more tools to your toolbox – PHP needs lots of friends in these situations
        • Watch out for Session race conditions: where they can’t be avoided, use session_write_close, a template engine and a single entry point
        • Reduce the number of continuous AJAX calls
    • 35. Q & A
      Michael Peacock
      Web Systems Developer – Telemetry Team – Smith Electric Vehicles US Corp
      michael.peacock@smithelectric.com
      Senior / Lead Developer, Author & Entrepreneur
      me@michaelpeacock.co.uk
      www.michaelpeacock.co.uk
      @michaelpeacock
      http://joind.in/3808
      http://www.slideshare.net/michaelpeacock
      Extra information!
