TAMING THE TIGER
DEALING WITH PHYSICS IN PROGRAMMING
THE LAWS OF PHYSICS ALWAYS APPLY
• CPUs use electricity and produce heat
• Even computers “in the cloud” are on physical hardware
• There is a limit to the amount of throughput a NIC can push
• The larger the data, the longer it takes to move it, and the more surface it takes to store it
ARRAYS ARE EVIL
• There are other ways to store data that are more efficient
• They should be used for small amounts of data
• No matter how hard you try, there is C overhead
USE THE ITERATION, LUKE
• Lazy fetching exists for database fetching – use it!
• Always page (window) your result sets from the database – ALWAYS
• Use filters or generators to format or alter results on the fly
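
A minimal sketch of the paging and generator advice above, assuming PHP 7.4 or later. The names pagedRows, formatNames, $fetchPage and $fakeTable are hypothetical and not from the talk; the callable stands in for whatever LIMIT/OFFSET query your database layer would run.

```php
<?php
declare(strict_types=1);

/**
 * Yield rows one page at a time instead of building one giant array.
 * $fetchPage is a hypothetical callable: (int $offset, int $limit): array
 */
function pagedRows(callable $fetchPage, int $pageSize): Generator
{
    $offset = 0;
    do {
        $page = $fetchPage($offset, $pageSize);
        foreach ($page as $row) {
            yield $row;
        }
        $offset += $pageSize;
    } while (count($page) === $pageSize); // a short page means we hit the end
}

/** Format each row on the fly; only one row is held at a time. */
function formatNames(iterable $rows): Generator
{
    foreach ($rows as $row) {
        yield ucfirst($row['name']);
    }
}

// Stand-in for a database table (assumption for the demo only).
$fakeTable = [['name' => 'ada'], ['name' => 'grace'], ['name' => 'linus']];
$fetchPage = fn (int $offset, int $limit): array => array_slice($fakeTable, $offset, $limit);

foreach (formatNames(pagedRows($fetchPage, 2)) as $name) {
    echo $name, PHP_EOL;
}
```

Only one page of rows is in memory at any moment, and the formatting step never sees more than one row.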
STREAM YOUR DATA
• Work on chunks at a time
• Seek back and forth through data if necessary
• Use PHP streams as they were meant to be used
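
One way to picture "work on chunks at a time" is the sketch below, which hashes a stream in fixed-size reads. The function name hashStream and the 8192-byte chunk size are assumptions for illustration; php://temp is used only so the example needs no file on disk.

```php
<?php
declare(strict_types=1);

/**
 * Hash a stream in fixed-size chunks so memory use stays flat
 * no matter how large the underlying data is.
 */
function hashStream($stream, int $chunkSize = 8192): string
{
    $context = hash_init('sha256');
    while (!feof($stream)) {
        $chunk = fread($stream, $chunkSize);
        if ($chunk === false) {
            throw new RuntimeException('Read failed');
        }
        hash_update($context, $chunk);
    }
    return hash_final($context);
}

// php://temp keeps small data in memory and spills to a temp file when it grows.
$stream = fopen('php://temp', 'w+b');
for ($i = 0; $i < 1000; $i++) {
    fwrite($stream, "row $i\n");
}
rewind($stream); // seek back to the start before reading
$digest = hashStream($stream);
fclose($stream);

echo $digest, PHP_EOL;
```

The same loop shape works for parsing, counting, or copying: the only thing that changes is what you do with each chunk.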
STREAMS: COMPUTING CONCEPT
Definitions
• Idea originating in the 1950s
• Standard way to get Input and Output
• A source or sink of data
Who uses them
• C – stdin, stderr, stdout
• C++ iostream
• Perl IO
• Python io
• Java
• C#
WHAT IS A STREAM?
• Access input and output generically
• Can write and read linearly
• May or may not be seekable
• Comes in chunks of data
WHAT USES STREAMS?
• EVERYTHING
• include / require (and their _once variants)
• stream functions
• file system functions
• many other extensions
HOW PHP STREAMS WORK
[Diagram: all IO passes through a stream transport, stream filter, and stream wrapper, with a stream context attached]
USING STREAMS
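
The original slide's code did not survive extraction. As a stand-in, here is a small sketch of reading line by line from two different kinds of stream with the same function. countLines is a hypothetical helper; php://memory and tmpfile() are chosen only to keep the example self-contained.

```php
<?php
declare(strict_types=1);

/** Count lines by pulling one line at a time off any readable stream. */
function countLines($stream): int
{
    $lines = 0;
    while (fgets($stream) !== false) {
        $lines++;
    }
    return $lines;
}

// An in-memory stream...
$memory = fopen('php://memory', 'w+b');
fwrite($memory, "one\ntwo\nthree\n");
rewind($memory);

// ...and a real temporary file: the calling code does not care which it gets.
$file = tmpfile();
fwrite($file, "a\nb\n");
rewind($file);

$memoryLines = countLines($memory);
$fileLines = countLines($file);

// Not every stream can seek, so check the metadata before relying on rewind()/fseek().
$memorySeekable = stream_get_meta_data($memory)['seekable'];

echo "$memoryLines lines in memory, $fileLines lines in the file", PHP_EOL;

fclose($memory);
fclose($file);
```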
WHAT ARE FILTERS?
• Performs operations on stream data
• Can be prepended or appended (even on the fly)
• Can be attached to read or write
• When a filter is added for read and write, two instances of the filter are created.
USING FILTERS
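
The original slide's code did not survive extraction. The sketch below shows the general idea with two of PHP's built-in filters, string.toupper and string.rot13: one attached to the write side and removed on the fly, one attached to the read side. The variable names are illustrative only.

```php
<?php
declare(strict_types=1);

$stream = fopen('php://memory', 'w+b');

// Attach a built-in filter to the write side only: data is transformed on the way in.
$upper = stream_filter_append($stream, 'string.toupper', STREAM_FILTER_WRITE);
fwrite($stream, 'hello ');

// Filters can be removed on the fly; later writes pass through untouched.
stream_filter_remove($upper);
fwrite($stream, 'world');

rewind($stream);

// A read-side filter transforms data on the way out instead.
stream_filter_append($stream, 'string.rot13', STREAM_FILTER_READ);
$result = stream_get_contents($stream);
fclose($stream);

echo $result, PHP_EOL;
```

The stored bytes are "HELLO world"; the rot13 transformation is applied only as they are read back.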
THINGS TO WATCH FOR!
• Data has an input and output state
• When reading in chunks, you may need to cache in between reads to make filters useful
• Use the right tool for the job
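
To illustrate why a filter may need to cache between chunks, here is a sketch of a user filter that drops lines starting with "#". A line can be split across two chunks, so the unfinished tail is held back until the rest arrives. The class name DropCommentLines and the filter name demo.drop_comments are hypothetical; the sketch assumes PHP 7.4 or later.

```php
<?php
declare(strict_types=1);

final class DropCommentLines extends php_user_filter
{
    /** Unfinished tail of the last chunk, kept until its newline arrives. */
    private string $pending = '';

    public function filter($in, $out, &$consumed, $closing): int
    {
        while ($bucket = stream_bucket_make_writeable($in)) {
            $consumed += $bucket->datalen;
            $this->pending .= $bucket->data;
        }

        if ($closing) {
            // Nothing more is coming: whatever is cached is the final line.
            $ready = $this->pending;
            $this->pending = '';
        } else {
            $lastNewline = strrpos($this->pending, "\n");
            if ($lastNewline === false) {
                return PSFS_FEED_ME; // no complete line yet
            }
            $ready = substr($this->pending, 0, $lastNewline + 1);
            $this->pending = (string) substr($this->pending, $lastNewline + 1);
        }

        $kept = '';
        foreach (preg_split('/(?<=\n)/', $ready, -1, PREG_SPLIT_NO_EMPTY) as $line) {
            if ($line[0] !== '#') {
                $kept .= $line;
            }
        }
        if ($kept === '') {
            return PSFS_FEED_ME;
        }
        stream_bucket_append($out, stream_bucket_new($this->stream, $kept));
        return PSFS_PASS_ON;
    }
}

stream_filter_register('demo.drop_comments', DropCommentLines::class);

$stream = fopen('php://memory', 'w+b');
$filter = stream_filter_append($stream, 'demo.drop_comments', STREAM_FILTER_WRITE);

// Each fwrite() arrives at the filter as its own chunk; the comment line is split in two.
fwrite($stream, "keep 1\n# com");
fwrite($stream, "ment\nkeep 2\nlast");
stream_filter_remove($filter); // flushes the cached tail through the filter

rewind($stream);
$filtered = stream_get_contents($stream);
fclose($stream);

echo $filtered, PHP_EOL;
```

Without the $pending cache, the filter would see "# com" and "ment" as separate pieces and could not tell that together they form a comment line.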
PROCESS WITH THE APPROPRIATE TOOLS
• Load data into the appropriate place for processing
• Hint – arrays are IN MEMORY – that is generally not an appropriate place for processing
• Datastores are meant for storing and retrieving data, use them
OFFLOAD WORK
• Put work items in queues and inform the user when they’re completed
• It’s not realistic to expect complex reports to be done in seconds, physics apply here too
• Caching complex work items is a good way to balance offloaded work with immediate results
COMMUNICATE WITH OTHER PROCESSES
• Microservices are in essence job systems that communicate via HTTP
• You can overload them to work via Unix sockets as well
• Ratchet or other WebSocket solutions allow for heavy work with multiplexed communication
• PHP can run in daemons, and even listen and communicate over sockets
NETWORK SOCKET TYPES
• Stream – connection oriented (TCP)
• Datagram – connectionless (UDP)
• Raw – low-level protocols
DEFINITIONS
• Socket – bidirectional network stream that speaks a protocol
• Transport – tells a network stream how to communicate
• Wrapper – tells a stream how to handle specific protocols and encodings
USING SOCKETS
THAT SOCKETS EXTENSION…
• New APIs in streams and filesystem functions are replacements
• Extension is very low level
• stream_socket_server
• stream_socket_client
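
A minimal sketch of the two functions named above, with server and client in one script so it can run on its own. The loopback address, the use of port 0 (let the OS pick a free port), and the ping/pong payload are assumptions for the demo; a real server would accept connections in a loop, usually in a separate process.

```php
<?php
declare(strict_types=1);

// Listen on any free local port.
$server = stream_socket_server('tcp://127.0.0.1:0', $errno, $errstr);
if ($server === false) {
    throw new RuntimeException("Listen failed: $errstr ($errno)");
}
$address = stream_socket_get_name($server, false); // local "host:port" actually bound

// Connect to it like any other client would.
$client = stream_socket_client("tcp://$address", $errno, $errstr, 5);
if ($client === false) {
    throw new RuntimeException("Connect failed: $errstr ($errno)");
}

// The accepted connection is a plain stream: fgets/fwrite work as usual.
$connection = stream_socket_accept($server, 5);
if ($connection === false) {
    throw new RuntimeException('Accept failed');
}

fwrite($client, "ping\n");
$request = fgets($connection);
fwrite($connection, 'pong: ' . $request);
$reply = fgets($client);

fclose($client);
fclose($connection);
fclose($server);

echo $reply;
```

Because both ends are ordinary streams, filters and the other stream functions can be used on them as well.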
About Me
• http://emsmith.net
• auroraeosrose@gmail.com
• Twitter – @auroraeosrose
• IRC – freenode – auroraeosrose
• #phpmentoring
• https://joind.in/talk/f88ef

Taming the tiger - pnwphp


Editor's Notes

  • #2 No matter how many virtual machines you throw at a problem, you always have the physical limitations of hardware. Memory, CPU, and even your NIC's throughput have finite limits. Are you trying to load that 5 GB CSV into memory to process it? No really, you shouldn't! PHP has many built-in features to deal with data in more efficient ways than pumping everything into an array or object. Using PHP's stream and stream filtering mechanisms you can work with chunked data in an efficient manner; with sockets and processes you can farm out work efficiently and still keep track of what your application is doing. These features can help with memory, CPU, and other physical system limitations to help you scale without the giant AWS bill.
  • #4 In PHP 5.x a whopping 144 bytes per element were required. In PHP 7 the value is down to 36 bytes, or 32 bytes for the packed case, but it’s STILL not the best.
  • #7 Quick computer science lesson. Originally done with magic numbers in Fortran; C and Unix standardized the way it worked. On Unix and related systems based on the C programming language, a stream is a source or sink of data, usually individual bytes or characters. Streams are an abstraction used when reading or writing files, or communicating over network sockets. The standard streams are three streams made available to all programs. Who else uses them? Most languages descended from C have the “files as streams” concept and ways to extend the IO functionality beyond merely files, which allows them to be merged all together. Great way to standardize the way data is grabbed and used. Questions on who has used streams in other languages.
  • #8 Streams are a huge underlying component of PHP. Streams were introduced with PHP 4.3.0; they are old, but underuse means they can have rough edges… so TEST TEST TEST. But they are more powerful than almost anything else you can use. Why is this better? Lots and lots of data in small chunks lets you do large volumes without maxing out memory and CPU.
  • #9 Any good extension will use the underlying streams API to let you use any kind of stream; for example, cairo does this. Stuff to work with PHP streams is spread across at least two portions of the manual, plus appendixes for the built-in transports/filters/context options. It’s very poorly arranged, so be sure to take the time to learn where to look in the manual; there should be three main places. What doesn’t use streams? chmod, touch and some other very file-specific functionality, lazy/bad extensions, and extensions with issues in the libraries they wrap around.
  • #10 All input and output comes into PHP. It gets pushed through a streams filter, then through the streams wrapper. During this point the stream context is available for the filter and wrapper to use. Streams themselves are the “objects” coming in. Wrappers are the “classes” defining how to deal with the stream.
  • #11 Some notes: file_get_contents and its cousin stream_get_contents are your fastest, most efficient way if you need the whole file. file() is going to be the best way to get the whole file split by lines. Both are going to stick the whole file into memory at some point. For very large files, and to help with memory consumption, the use of fgets and fread will help.
  • #12 A filter is a final piece of code which may perform operations on data as it is being read from or written to a stream. Any number of filters may be stacked onto a stream. Custom filters can be defined in a PHP script using stream_filter_register() or in an extension using the API Reference in Working with streams. To access the list of currently registered filters, use stream_get_filters(). Stream data is read from resources (both local and remote) in chunks, with any unconsumed data kept in internal buffers. When a new filter is prepended to a stream, data in the internal buffers, which has already been processed through other filters will not be reprocessed through the new filter at that time. This differs from the behavior of stream_filter_append(). Filters are nice for manipulating data on the fly – but remember you’ll be getting data in chunks, so your filter needs to be smart enough to handle that
  • #13 Filters can be appended or prepended, and attached to READ or WRITE. Notice that stream_filter_prepend and stream_filter_append are smart: if you opened with the r flag, by default it’ll attach to read; if you opened with the w flag, it will attach to write. Note: Stream data is read from resources (both local and remote) in chunks, with any unconsumed data kept in internal buffers. When a new filter is prepended to a stream, data in the internal buffers, which has already been processed through other filters, will not be reprocessed through the new filter at that time. This differs from the behavior of stream_filter_append(). Note: When a filter is added for read and write, two instances of the filter are created. stream_filter_prepend() must be called twice with STREAM_FILTER_READ and STREAM_FILTER_WRITE to get both filter resources.
  • #14 Well, it may look like manipulating data in a variable is preferable to the above. But the above is just a simple example. Once you add a filter to a stream, it basically hides all the implementation details from the user. You will be unaware of the data being manipulated in a stream. The same filter can also be used with any stream (files, URLs, various protocols etc.) without any changes to the underlying code. Multiple filters can also be chained together, so that the output of one can be the input of another. The filters need an input state and an output state. And they need to respect the fact that the number of requested bytes does not necessarily mean reading the same amount of data on the other end. In fact the output side does generally not know whether less, the same amount or more input is to be read. But this can be dealt with inside the filter. However the filters should return the number input vs the number of output filters always independently. Regarding states we would be interested if reaching EOD on the input state meant reaching EOD on the output side prior to the requested amount, at the requested amount or not at all yet (more data available).
  • #19 What is streamable behavior? We’ll get to that in a bit. Protocol: a set of rules which is used by computers to communicate with each other across a network. Resource: a resource is a special variable, holding a reference to an external resource. Talk about resources in PHP and talk about general protocols; get a list from the audience of protocols they can name (yes, HTTP is a protocol). A socket is a special type of stream – pound this into their heads. A socket is an endpoint of communication to which a name can be bound. A socket has a type and one associated process. Sockets were designed to implement the client-server model for interprocess communication where: In PHP, a wrapper ties the stream to the transport, so your http wrapper ties your PHP data to the http transport and tells it how to behave when reading and writing data.
  • #20 By default sockets are going to assume TCP, since that’s a pretty standard way of doing things. Notice that we have to do things the old-fashioned way just for this simple HTTP request: sticking our headers together, making sure stuff gets closed. However, if you can’t use allow_url_fopen this is a way around it. A dirty, dirty way, but there you have it. Remember, allow_url_fopen only stops “drive-by” hacking.
  • #21 Avoid the old sockets extension unless you really, really know what you’re doing. Most of the things you used to need the sockets extension for, you no longer do. Those last two functions, stream_socket_server and stream_socket_client, make doing a client/server relationship really easy with much less code. It’s sometimes hard to find examples on the stream_socket stuff, since most of the old stuff on the internet still uses the sockets extension. Don’t follow their lead; take the time to read the PHP documentation and use the new APIs.
  • #22 There is SOOO much more you can do from hooking objects to hooking the engine!