Socket programming is the workhorse of the internet, but there's far more to sockets than just HTTP. Although many of PHP's design decisions have been optimized to solve web-specific problems, that doesn't mean it lacks capable tools for other types of programming. PHP has not one but two comprehensive sets of tools for working with sockets, and between them they cover most of what you would need. The streams API can be used to rapidly build applications, and the sockets extension gives low-level control over every aspect of socket communication. Learn about socket and network socket programming basics, sockets with PHP streams, sockets with the PHP sockets extension, and finally using WebSockets (part of HTML5) with PHP.
Primarily going to talk about "internet sockets".
So why do we need socket programming? We need it so processes can talk to one another, and not only locally.
IPC is a set of methods for the exchange of data among multiple threads in one or more processes. Processes may be running on one or more computers connected by a network. IPC methods are divided into methods for message passing, synchronization, shared memory, and remote procedure calls (RPC). The method of IPC used may vary based on the bandwidth and latency of communication between the threads, and the type of data being communicated.
What if we're talking about a single system with needs? How about two systems in two different buildings at the Department of Defense? ... yeah, that's what started the internet!
Defense Advanced Research Projects Agency (DARPA)
Arpanet was the first real network to run on packet switching technology (new at the time).
On October 29, 1969, computers at Stanford and UCLA connected for the first time. This was the birth of the internet.
An Arpanet network was established between Harvard, MIT, and BBN (the company that created the "interface message processor" computers used to connect to the network) in 1970.
1974 was a breakthrough year. A proposal was published to link Arpa-like networks together into a so-called "inter-network", which would have no central control and would work around a transmission control protocol (which eventually became TCP/IP).
1983: Arpanet computers switch over to TCP/IP
January 1, 1983 was the deadline for Arpanet computers to switch over to the TCP/IP protocols developed by Vinton Cerf. A few hundred computers were affected by the switch. The name server was also developed in '83.
In 1984 DNS was created and the rest, as they say, is history.
So IP is the backbone of our internet, yet many people don't even know what it means.
In the beginning of the internet, open source, and all things networky...
ARPA wanted to connect several separate, dissimilar networks to create an internetwork.
IBM and AT&T and even Microsoft were all working on competing protocols.
These protocols included IBM Systems Network Architecture (SNA), Open Systems Interconnection (OSI), Microsoft's native NetBIOS, and Xerox Network Systems (XNS).
IP (and TCP/UDP by extension) won when AT&T agreed to place the TCP/IP code, which they had helped develop for UNIX and had few copyright ties to, into the public domain. ("Agreed" is a very loose term.)
That code is the basic workhorse and API of all things.
It's known as BSD sockets.
As with all other communications protocols, TCP/IP is composed of layers:
IP - is responsible for moving packets of data from node to node. IP forwards each packet based on a four-byte destination address (the IP number). The Internet authorities assign ranges of numbers to different organizations. The organizations assign groups of their numbers to departments. IP operates on gateway machines that move data from department to organization to region and then around the world.
TCP - is responsible for verifying the correct delivery of data from client to server. Data can be lost in the intermediate network. TCP adds support to detect errors or lost data and to trigger retransmission until the data is correctly and completely received.
Sockets - is a name given to the package of subroutines that provide access to TCP/IP on most systems.
On the battlefield a communications network will sustain damage, so the DOD designed TCP/IP to be robust and automatically recover from any node or phone line failure. This design allows the construction of very large networks with less central management. However, because of the automatic recovery, network problems can go undiagnosed and uncorrected for long periods of time.
The protocols are maintained by the Internet Engineering Task Force (IETF).
So your application layer is the basic data you want to send. In most HTTP applications this is your HTTP page INCLUDING the headers section.
The transport is how you're sending it (UDP and TCP are the most popular), and its header carries the port the data should be delivered to.
The internet layer is the "IP" layer, with a header telling the system what address (IP) to send the data to.
Then you get a frame header and footer around the actual packet being sent.
There's more than just TCP and IP, a lot more to it than that. When we work with sockets we work at the transport and internet layers: making connections, using supported protocols, and using transports to talk to other layers.
The tools that most programmers use for sockets are cross-OS, and you all have a bunch of old guys in beards to thank for writing them ;)
A socket provides a bidirectional communication endpoint for sending and receiving data with another socket.
Sockets can talk interprocess (Unix sockets) or across a network (network sockets).
You hear talk of "sockets" all the time, and perhaps you are wondering just what they are exactly. Well, they're this: a way to speak to other programs using standard Unix file descriptors.
http://beej.us/guide/bgnet/output/html/multipage/theory.html
But wait wait, so sockets are holes you stuff data into! And it's supposed to come out the other side! Maybe, if the event horizon doesn't eat them.
Connection-Oriented Protocols: These protocols require that a logical connection be established between two devices before transferring data. This is generally accomplished by following a specific set of rules that specify how a connection should be initiated, negotiated, managed and eventually terminated. Usually one device begins by sending a request to open a connection, and the other responds. They pass control information to determine if and how the connection should be set up. If this is successful, data is sent between the devices. When they are finished, the connection is broken.
Connectionless Protocols: These protocols do not establish a connection between devices. As soon as a device has data to send to another, it just sends it.
Host A sends a TCP SYNchronize packet to Host B
Host B receives A's SYN
Host B sends a SYNchronize-ACKnowledgement
Host A receives B's SYN-ACK
Host A sends ACKnowledge
Host B receives ACK. TCP socket connection is ESTABLISHED.
See more at: http://www.inetdaemon.com/tutorials/internet/tcp/3-way_handshake.shtml#sthash.qceUbEnp.dpuf
TCP is a full-duplex protocol, yet SMTP uses TCP in a half-duplex fashion. The client sends a command then stops and waits for the reply.
HTTP pipelining and websockets are full duplex.
But basically your transports can be any of these (well, TCP can never be simplex) depending on the implementation you're using.
So DARPA wanted sockets for Unix, so they contracted the University of California at Berkeley to work on it.
The work went to the Computer Systems Research Group (CSRG) at the University of California, Berkeley, which had a license for the source code of UNIX from AT&T's Bell Labs. Students doing operating systems research at the CSRG modified and extended UNIX, and the CSRG made several releases of the modified operating system beginning in 1978, with AT&T's blessing.
Because this Berkeley Software Distribution (BSD) contained copyrighted AT&T UNIX source code, it was only available to organizations with a source code license for UNIX from AT&T.
Students and faculty at the CSRG audited the software code for the TCP/IP stack, removing all the AT&T intellectual property, and released it to the general public in 1988 as NET-1 under the BSD license.
When it became apparent that the Berkeley CSRG would soon close, students and faculty at the CSRG began an effort to remove all the remaining AT&T code from BSD and replace it with their own. This effort resulted in the public release of NET-2 in 1991, again under the BSD license.
Berkeley Software Design (BSDi) obtained the source for NET-2 and ported it to the Intel i386 computer architecture. BSDi then sold it as BSD/386.
This drew the ire of AT&T, which did not agree with BSDi's claim that BSD/386 was free of AT&T IP.
AT&T's Unix System Laboratories subsidiary filed suit against BSDi in New Jersey in April 1992, a suit that was later amended to include The Regents of the University of California.
NOTE: no one "won" or "lost" the case, it was SETTLED. Some files were not allowed to be distributed, others were to be always free, and some copyright headers had to be changed.
You're welcome to take a look at the differences between BSD and POSIX sockets, but really they're more minor than the differences between the Winsock implementation and BSD sockets ;)
There are several Internet socket types available:
Datagram sockets, also known as connectionless sockets, which use User Datagram Protocol (UDP).
Stream sockets, also known as connection-oriented sockets, which use Transmission Control Protocol (TCP) or Stream Control Transmission Protocol (SCTP).
Raw sockets (or Raw IP sockets), typically available in routers and other network equipment. Here the transport layer is bypassed, and the packet headers are made accessible to the application.
There are also non-Internet sockets, implemented over other transport protocols, such as Systems Network Architecture (SNA). See also Unix domain sockets (UDS), for internal inter-process communication.
Funny story about this: I needed to consume some data sent via UDP for some testing, so I wrote a 3 line server in PHP. The Python guy had an "oh yeah, I forgot PHP can do that" moment, which was quite amusing ;)
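For a feel of how small a UDP receiver can be with the streams API, here is a minimal sketch. It is not the original three-liner from the story. The helper names udp_bind and udp_receive are made up for illustration, and the loopback address with port 0 (let the OS pick a free port) is an assumption chosen to keep the sketch safe to run.

```php
<?php
// Bind a UDP "server" socket. UDP has no listen/accept step, so only
// STREAM_SERVER_BIND is passed. Port 0 asks the OS for any free port.
function udp_bind(string $address = 'udp://127.0.0.1:0')
{
    $server = stream_socket_server($address, $errno, $errstr, STREAM_SERVER_BIND);
    if ($server === false) {
        throw new RuntimeException("bind failed: $errstr ($errno)");
    }
    return $server;
}

// Block until one datagram arrives.
// Returns [payload, "ip:port" of the sender].
function udp_receive($server, int $maxLength = 1500): array
{
    $payload = stream_socket_recvfrom($server, $maxLength, 0, $peer);
    return [$payload, $peer];
}
```

To find out which port was picked, stream_socket_get_name($server, false) returns the bound "ip:port" string.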
Streams are a huge underlying component of PHP.
Streams were introduced with PHP 4.3.0. They are old, but underuse means they can have rough edges... so TEST TEST TEST.
But they are more powerful than almost anything else you can use.
Why is this better?
Lots and lots of data in small chunks lets you do large volumes without maxing out memory and CPU.
All input and output comes into PHP.
It gets pushed through a streams filter.
Then through the streams wrapper.
During this point the stream context is available for the filter and wrapper to use.
Streams themselves are the "objects" coming in.
Wrappers are the "classes" defining how to deal with the stream.
What is streamable behavior? We'll get to that in a bit.
Protocol: a set of rules which is used by computers to communicate with each other across a network.
Resource: a resource is a special variable, holding a reference to an external resource.
Talk about resources in PHP and talk about general protocols, get a list from the audience of protocols they can name (yes, HTTP is a protocol).
A socket is a special type of stream. Pound this into their heads.
A socket is an endpoint of communication to which a name can be bound. A socket has a type and one associated process. Sockets were designed to implement the client-server model for interprocess communication.
In PHP, a wrapper ties the stream to the transport, so your http wrapper ties your PHP data to the HTTP transport and tells it how to behave when reading and writing data.
Internet Domain sockets expect a port number in addition to a target address. In the case of fsockopen() this is specified in a second parameter and therefore does not impact the formatting of transport URL. With stream_socket_client() and related functions as with traditional URLs however, the port number is specified as a suffix of the transport URL delimited by a colon. unix:// provides access to a socket stream connection in the Unix domain. udg:// provides an alternate transport to a Unix domain socket using the user datagram protocol. Unix domain sockets, unlike Internet domain sockets, do not expect a port number. In the case of fsockopen() the portno parameter should be set to 0.
Some quick notes:
Blocking is like sleeping. It's synchronous: nothing is going to happen until it's done.
You can use stream_set_blocking to get around this, which means reads and writes will fail instead of blocking.
You have to check the values from fread and fwrite, and if they're zero, try again (send the data again).
You'll need internal buffering, and feof has no meaning.
This is buggy as hell under Windows, particularly with processes. It works really quite well with sockets, with mixed results for other streams. Your mileage may vary.
You can use stream_set_timeout. It defaults to 60 seconds; after that it sets "timed_out" in the meta data and returns an empty string/zero.
Timeouts are really only useful with one socket.
Non-blocking is a PAIN to get working correctly but very useful when doing a lot of things at once.
stream_select is also buggy on Windows, especially with processes (the processes stuff with PHP on Windows is .... icky).
It does timeouts and blocking, basically: it tells you when what you want to do will NOT block.
feof does NOT MEAN CONNECTION CLOSED.
It means either a read failed and the buffer is empty, OR the buffer is empty and there is no data within the timeout.
You're moving data across the black hole. Do yourself a favor and do it in little chunks. It will make the world a better place.
And while stream_get_meta_data has some awesome information, don't be poking at it; it's for information purposes only.
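The "check the return value and try again, in little chunks" advice can be sketched roughly like this. The function name write_all, the 8K default chunk size, and the 5 second wait are assumptions for illustration, not anything the streams API prescribes.

```php
<?php
// Write $data to a stream in small chunks, coping with short writes.
// Works for blocking and non-blocking streams.
function write_all($stream, string $data, int $chunkSize = 8192): int
{
    $total = strlen($data);
    $sent = 0;
    while ($sent < $total) {
        $written = fwrite($stream, substr($data, $sent, $chunkSize));
        if ($written === false) {
            throw new RuntimeException('write failed');
        }
        if ($written === 0) {
            // Nothing was accepted (typical for a non-blocking stream with
            // a full buffer): wait until the stream is writable, then retry.
            $read = null;
            $write = [$stream];
            $except = null;
            if (!stream_select($read, $write, $except, 5)) {
                throw new RuntimeException('stream not writable within 5 seconds');
            }
            continue;
        }
        $sent += $written; // may be less than the chunk we offered
    }
    return $sent;
}
```

The same loop shape applies to reads: keep calling fread for small chunks and append to your own buffer rather than asking for everything at once.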
By default sockets are going to assume TCP, since that's a pretty standard way of doing things. Notice that we have to do things the old-fashioned way just for this simple HTTP request: sticking our headers together, making sure stuff gets closed. However, if you can't use allow_url_fopen this is a way around it. A dirty, dirty way, but there you have it.
Remember, allow_url_fopen only stops "drive-by" hacking.
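A rough sketch of that old-fashioned approach is below. The helper names build_get_request and fetch_raw are hypothetical, and the request is deliberately minimal (HTTP/1.1 with "Connection: close" so the server ends the response by closing the socket); real code would need to handle redirects, chunked encoding, and so on.

```php
<?php
// Stick the request line and headers together by hand.
function build_get_request(string $host, string $path = '/'): string
{
    return "GET $path HTTP/1.1\r\n"
         . "Host: $host\r\n"
         . "Connection: close\r\n"
         . "\r\n";
}

// Open a TCP connection with fsockopen, send the request, and read the
// raw response (status line, headers and body) in small chunks.
function fetch_raw(string $host, int $port = 80, float $timeout = 5.0): string
{
    $fp = fsockopen($host, $port, $errno, $errstr, $timeout);
    if ($fp === false) {
        throw new RuntimeException("connect failed: $errstr ($errno)");
    }
    fwrite($fp, build_get_request($host));

    $response = '';
    while (!feof($fp)) {
        $chunk = fread($fp, 8192);
        // Stop on error, or on an empty read caused by the read timeout.
        if ($chunk === false || ($chunk === '' && stream_get_meta_data($fp)['timed_out'])) {
            break;
        }
        $response .= $chunk;
    }
    fclose($fp); // make sure stuff gets closed
    return $response;
}
```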
The only mandatory argument is the specification of the socket you want to connect to, and it returns a resource on success or false on error.
The socket specification is in the form of $protocol://$host:$port where protocol is one of the following:
tcp, for communicating via TCP, which is used by almost all common internet protocols like HTTP, FTP, and SMTP, where reliability is needed.
udp
or unix, which connects to a Unix socket, a special kind of network socket which is internal to the operating system's network stack. Slightly more efficient, because no network interface is involved.
The streams extension also provides a simple way to make socket servers with the stream_socket_server function.
The function stream_socket_server, again, takes a socket specification as its first argument, in the same format as the string passed to stream_socket_client.
Running a server involves at least these things:
1. Bind on a socket, which tells the operating system that we're interested in network packets arriving at the given network interface and port (= socket)
2. Check if an incoming connection is available
3. "Accept" the incoming connection (with stream_socket_accept)
4. Send something useful back to the client
5. Close the connection, or let the client close it
6. Go to (2)
When writing a server, you first have to do an "accept" operation on the server socket. This is done with the stream_socket_accept function. This function blocks until a client connects to the server, or the timeout runs out.
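The bind / accept / respond / close steps can be sketched with the streams functions as follows. The names open_server and handle_one_client are invented for this sketch, the upper-casing "protocol" is just a stand-in for "send something useful back", and binding to 127.0.0.1 on port 0 is an assumption to keep it harmless.

```php
<?php
// Step 1: bind (and listen) on a TCP socket.
function open_server(string $spec = 'tcp://127.0.0.1:0')
{
    $server = stream_socket_server($spec, $errno, $errstr);
    if ($server === false) {
        throw new RuntimeException("could not bind: $errstr ($errno)");
    }
    return $server;
}

// Steps 2-5: wait for a client, read one line, answer, close.
// Returns false if no client connected before the timeout ran out.
function handle_one_client($server, float $timeout = 5.0): bool
{
    $client = stream_socket_accept($server, $timeout); // blocks until connect or timeout
    if ($client === false) {
        return false;
    }
    $line = fgets($client);
    if ($line !== false) {
        fwrite($client, strtoupper($line)); // something "useful" for the client
    }
    fclose($client);
    return true;
}
```

A server would call handle_one_client in a loop (step 6). A client for it is just stream_socket_client('tcp://host:port', $errno, $errstr) followed by fwrite and fgets.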
So this part of the talk will focus on writing code using the PHP sockets extension.
Some things to remember:
This is a very raw wrapper around the C code, almost an FFI version.
Unfortunately it was written in the days before objects and the engine's object store, so it uses resources instead (these are slower).
When using these functions, it is important to remember that while many of them have identical names to their C counterparts, they often have different declarations. Please be sure to read the descriptions to avoid confusion.
The manual says to use @. Do not do this, ever.
The proper thing to do is to have an error handler specifically for handling socket warnings. You can log them, write them to stderr, do any number of things, but do NOT just suppress them ;)
Use the man pages to figure out which constants work, their names and their values.
The PHP manual isn't really that helpful for this, but remember that underneath there is nothing but C code, so you can look up values as needed.
Basically, if you remove the socket_ prefix you have the name to look up in the BSD sockets API, except for a few "helper" methods.
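One way to follow the "use an error handler, don't suppress" advice is to install a handler only around the socket call and collect what it reports. This is a sketch under assumptions: the wrapper name with_warnings_logged is made up, and collecting messages into an array stands in for whatever logging you actually use.

```php
<?php
// Run $operation with a temporary handler that records E_WARNING
// messages into $log instead of hiding them with @.
function with_warnings_logged(callable $operation, array &$log)
{
    set_error_handler(
        function (int $severity, string $message) use (&$log): bool {
            $log[] = $message; // or write to stderr, a logger, ...
            return true;       // handled: skip PHP's default output
        },
        E_WARNING
    );
    try {
        return $operation();
    } finally {
        restore_error_handler(); // always put the previous handler back
    }
}
```

Usage would look like with_warnings_logged(function () use ($socket) { return socket_connect($socket, '127.0.0.1', 8080); }, $log), after which $log holds any warning text and socket_last_error() still has the error code.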
NOTE: for domain you have to specify IPv4 or IPv6!! That means you may have issues with configurable hosts/ports etc.
This is made more annoying by the fact that the gethostbyname that PHP exposes to userland only supports IPv4 (double fail), and it uses gethostbyname instead of getaddrinfo, so you can check before you connect.
The other more esoteric socket types are going to be completely dependent on your underlying OS, which again can make the sockets extension a bit wobbly to use.
This will call the underlying BSD/Winsock socket() API and is basically identical to the underlying C calls. Remember you can't change this stuff after you set it; you'll have to create a new socket instead.
You always have to do socket_bind after you socket_create, before you do anything else to the socket... yeah, they should just be one call ;) It's a leftover from the C API spec.
That address needs to be an IP address in either IPv4 or IPv6, depending on what you specified.
socket_create_listen is a helpery thing: it calls create, bind, and listen, but only works for AF_INET on all local interfaces.
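The create, bind, listen sequence with the sockets extension looks roughly like the sketch below. It assumes the sockets extension is loaded; the function name create_listener, the loopback address, port 0, and the backlog of 16 are illustrative choices. On PHP 8 these functions hand back Socket objects rather than resources, which is why the parameters are left untyped.

```php
<?php
// Create an IPv4 TCP socket, bind it to an address and start listening.
function create_listener(string $ip = '127.0.0.1', int $port = 0, int $backlog = 16)
{
    // domain = AF_INET (IPv4), type = SOCK_STREAM, protocol = SOL_TCP
    $socket = socket_create(AF_INET, SOCK_STREAM, SOL_TCP);
    if ($socket === false) {
        throw new RuntimeException(socket_strerror(socket_last_error()));
    }
    // Bind must come right after create, and needs an IP address, not a hostname.
    if (!socket_bind($socket, $ip, $port) || !socket_listen($socket, $backlog)) {
        $message = socket_strerror(socket_last_error($socket));
        socket_close($socket);
        throw new RuntimeException($message);
    }
    return $socket;
}
```

With port 0 the OS chooses a free port; socket_getsockname($socket, $address, $port) reports which one.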
If there are no pending connections, socket_accept() will block until a connection becomes present. If socket has been made non-blocking using socket_set_blocking() or socket_set_nonblock(), FALSE will be returned.
The actual error code can be retrieved by calling socket_last_error(). This error code may be passed to socket_strerror() to get a textual explanation of the error.
This connects to a socket to communicate. Unlike listen, which binds to a port and waits for stuff to say hello, connect is used as the CLIENT.
So roughly listen == server and connect == client.
socket_read() returns a zero length string ("") when there is no more data to read.
It is perfectly valid for socket_write() to return zero, which means no bytes have been written. Be sure to use the === operator to check for FALSE in case of an error.
YES, THIS IS SCREWED UP. Write and read should both return false on failure properly. Oh well.
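Since socket_write() may legitimately return 0 or fewer bytes than requested, a write usually ends up in a loop like this sketch. It assumes the sockets extension is loaded, and socket_write_all is a made-up helper name.

```php
<?php
// Keep calling socket_write() until every byte of $data has been sent.
function socket_write_all($socket, string $data): int
{
    $total = strlen($data);
    $sent = 0;
    while ($sent < $total) {
        $written = socket_write($socket, substr($data, $sent));
        if ($written === false) { // strict check: 0 is NOT an error
            throw new RuntimeException(socket_strerror(socket_last_error($socket)));
        }
        // 0 just means "nothing accepted this time". On a non-blocking
        // socket you would wait with socket_select() here instead of spinning.
        $sent += $written;
    }
    return $sent;
}
```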
The recvfrom() and recvmsg() calls are used to receive messages from a socket, and may be used to receive data on a socket whether or not it is connection-oriented.
The recv() call is normally used only on a connected socket (see connect(2)) and is identical to recvfrom() with a NULL src_addr argument.
The socket_recvfrom() function receives len bytes of data in buf from name on port port (if the socket is not of type AF_UNIX) using socket. socket_recvfrom() can be used to gather data from both connected and unconnected sockets. Additionally, one or more flags can be specified to modify the behaviour of the function.
The name and port must be passed by reference. If the socket is not connection-oriented, name will be set to the internet protocol address of the remote host or the path to the UNIX socket. If the socket is connection-oriented, name is NULL. Additionally, the port will contain the port of the remote host in the case of an unconnected AF_INET or AF_INET6 socket.
The function socket_send() sends len bytes to the socket socket from buf.
The function socket_sendto() sends len bytes from buf through the socket socket to the port at the address addr.
If you are not actually connected, shutdown will fail with socket_error = 107, "Transport endpoint is not connected". This is true for both TCP and UDP connections (which is surprising, UDP being a connectionless protocol).
SHUTDOWN IS NOT GRACEFUL. You should make sure you're done sending/receiving data first! Then shutdown to make sure you're not chopping data, and finally close it down.
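One common ordering for "finish, shutdown, close" is sketched below: once you have sent everything, shut down only the writing side, drain whatever the peer still sends, then close. This is one possible pattern rather than the only correct one; it assumes the sockets extension, a connected blocking stream socket, and a peer that eventually closes its side. The name close_gracefully is hypothetical.

```php
<?php
// Call this only after all outgoing data has been written.
// Returns any data the peer sent before it closed its side.
function close_gracefully($socket): string
{
    socket_shutdown($socket, 1); // 1 = stop writing, reading still allowed

    $leftover = '';
    // socket_read() returns "" once the peer has closed, false on error.
    while (($chunk = socket_read($socket, 8192)) !== false && $chunk !== '') {
        $leftover .= $chunk;
    }

    socket_close($socket); // all done, close it down
    return $leftover;
}
```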
So far in this chapter, you've seen that select() can be used to detect when data is available to read from a socket. However, there are times when it's useful to be able to call send(), recv(), connect(), accept(), etc. without having to wait for the result.
For example, let's say that you're writing a web browser. You try to connect to a web server, but the server isn't responding. When a user presses (or clicks) a stop button, you want the connect() API to stop trying to connect.
With what you've learned so far, that can't be done. When you issue a call to connect(), your program doesn't regain control until either the connection is made, or an error occurs.
The solution to this problem is called "non-blocking sockets".
By default, TCP sockets are in "blocking" mode. For example, when you call recv() to read from a stream, control isn't returned to your program until at least one byte of data is read from the remote site. This process of waiting for data to appear is referred to as "blocking". The same is true for the write() API, the connect() API, etc. When you run them, the connection "blocks" until the operation is complete.
It's possible to set a descriptor so that it is placed in "non-blocking" mode. When placed in non-blocking mode, you never wait for an operation to complete. This is an invaluable tool if you need to switch between many different connected sockets, and want to ensure that none of them cause the program to "lock up."
You should always try to use socket_select() without timeout. Your program should have nothing to do if there is no data available. Code that depends on timeouts is not usually portable and is difficult to debug.
No socket resource must be added to any set if you do not intend to check its result after the socket_select() call, and respond appropriately. After socket_select() returns, all socket resources in all arrays must be checked.
Any socket resource that is available for writing must be written to, and any socket resource available for reading must be read from.
If you read/write to a socket returned in the arrays, be aware that they do not necessarily read/write the full amount of data you have requested. Be prepared to even only be able to read/write a single byte.
It's common to most socket implementations that the only exception caught with the except array is out-of-band data received on a socket.
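A small sketch of the select pattern: hand socket_select() a copy of the sockets you care about and it trims the array down to the ones that can be read without blocking. It assumes the sockets extension; readable_sockets is an illustrative name, and passing null as the timeout means "block until something is ready".

```php
<?php
// Return the subset of $sockets that can be read right now.
function readable_sockets(array $sockets, ?int $timeoutSeconds = null): array
{
    // socket_select() modifies its arguments, so work on copies.
    $read = $sockets;
    $write = null;
    $except = null;

    $ready = socket_select($read, $write, $except, $timeoutSeconds);
    if ($ready === false) {
        throw new RuntimeException(socket_strerror(socket_last_error()));
    }
    return $read; // every socket in here must now actually be read from
}
```

A server loop would call this with the listening socket plus all client sockets, then socket_accept() on the listener and socket_read() on the clients it returns.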
select() was introduced in 4.2BSD Unix, released in August 1983.
poll() was introduced in SVR3 Unix, released 1986. In Linux, the poll() system call was introduced in 2.1.23 (January 1997) while the poll() library call was introduced in libc 5.4.28 (May 1997).
select() and poll() provide basically the same functionality. They only differ in the details:
select() overwrites the fd_set variables whose pointers are passed in as arguments 2-4, telling it what to wait for. This means a typical loop has to either keep a backup copy of the variables or, even worse, repopulate the bitmasks every time select() is to be called. poll() doesn't destroy the input data, so the same input array can be used over and over.
poll() handles many file handles, like more than 1024, by default and without any particular work-arounds. Since select() uses fixed-size bitmasks for file descriptor info, it is much less convenient. On some operating systems like Solaris, you can compile for support with > 1024 file descriptors by changing the FD_SETSIZE define.
poll() offers somewhat more flavours of events to wait for, and to receive, although for most common networked cases they don't add a lot of value.
Different timeout values: poll() takes milliseconds, select() takes a struct timeval pointer that offers microsecond resolution. In practice, however, there probably isn't any difference that will matter.
This is a pretty useful function. It allows you to do most of your work with the easier-to-use streams functionality, but if you need to do something lower level in a special case, you can import the stream, which takes the underlying C socket that the stream created and shoves it into a socket resource that the sockets extension can use.
This can be great for things like "socket_getsockname", "socket_set_option" and "socket_cmsg_space".
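As a sketch of that hand-off: open the connection or server with the streams API, then import it when a sockets-extension call is needed. This assumes the sockets extension is loaded; local_address_of_stream is a made-up helper that only exists to show socket_import_stream() feeding socket_getsockname().

```php
<?php
// Ask the sockets extension for the local "ip:port" of a stream socket.
function local_address_of_stream($stream): string
{
    $socket = socket_import_stream($stream);
    if (!$socket) {
        throw new RuntimeException('stream is not backed by a socket');
    }
    if (!socket_getsockname($socket, $address, $port)) {
        throw new RuntimeException(socket_strerror(socket_last_error($socket)));
    }
    return "$address:$port";
}
```

The same imported socket could be passed to socket_set_option() for options the streams API does not expose. Keep using the original stream for reads and writes so the two views don't get out of sync.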
This is only covering a very small portion of what is possible
The RFC is new and there were multiple versions of the protocol before it was finalized.
If you have to support old browsers... this may not be what you're looking for.
Every single new browser (IE10, Firefox 11, Chrome 16, Safari 6, Opera 12) supports the websockets protocol!!
Now we've taken away all that overhead of an HTTP request/response cycle with headers, but the persistence has actually added a new issue.
There are some caveats for using websockets in PHP, #1 being the nature of how PHP is generally run: a non-long-running, non-event-driven system.
Actually one of the BEST ways to solve this is a dedicated websockets server, or using nginx's really clever proxying support.
IIS8 also has pretty good proxying support for websockets!
Apache: well, there you'll need to dig up some extensions for getting websockets proxied through properly. There are a couple, including ws_Tunnel and disconnect's websockets extension.
Do NOT invent your own websockets code. It's a very hard thing to understand and perform, especially for someone who hasn't done a lot of sockets programming. Instead use some of the ready-made solutions for doing websockets right.
The WebSocket Protocol is already built into modern browsers and provides bidirectional, low-latency message-based communication. However, as such, WebSocket is quite low-level and only provides raw messaging.
Modern Web applications often have a need for higher level messaging patterns such as Publish & Subscribe and Remote Procedure Calls.
This is where the WebSocket Application Messaging Protocol (WAMP) enters. WAMP adds the higher level messaging patterns of RPC and PubSub to WebSocket, within one protocol.
UPDATE THIS WITH PROGRESS
I'm a freelance developer, doing primarily C and C++ dev but available for other stuff.
Also do a lot of open source.
Aurora Eos Rose is the handle I've had forever: Greek and Roman goddesses of the dawn, and Aurora Rose from Sleeping Beauty.
Socket programming with php
Back to the Basics
In the beginning
There was a process
and the process ran well
but it needed to talk to another process
“Use IPC and all shall be clear”
Inter Process Communication
But which do I need?
Are the processes on the same computer?
Does it need to support multiple OS types?
Communication needs to be one way or two way?
Layers of Fun
IP – forwards packets of data based on a destination
TCP – verifies the correct delivery of data from client to
server with error and lost data correction
Network Sockets – subroutines that provide TCP/IP (and
UDP and some other support) on most systems
The DOD section DARPA built ARPANET which ran
on TCP/IP and the protocols are maintained by IETF
TCP - Transmission Control Protocol
IP – Internet Protocol
UDP - User Datagram Protocol
What is a Stream?
Access input and output generically
Can write and read linearly
May or may not be seekable
Comes in chunks of data
How PHP Streams Work
Bidirectional network stream that speaks a protocol
Tells a network stream how to communicate
Tells a stream how to handle specific protocols and
Things to watch for!
feof means “connection_closed”?
huge reads or writes (think 8K)
stream_get_meta_data is READ ONLY
All the power of C apis, all the pain of C apis
Almost C, but not quite
Wrapper around BSD sockets api
Some of the declarations are different
You really need to know how sockets work
Like writing in C without the hassles
A core extension but NOT a default extension
Possibly noisy, use an error handler
Keep your man handy
C man page
domain (IPv4, IPv6, unix)
type – stream for tcp and dgram for udp is the usual
protocol – hint, you can pass 0 to get the “default”
protocol for the type
gives you a NEW socket resource to read off of, but you
can’t accept off the newly created socket resource it
address (ipv4 or 6 depending on what you set)
Read and Write
len (will truncate to this!)
type (stop on newlines or grab as binary)
socket_recv - usually connected socket
socket_recvfrom - from a specific address and port
socket_recvmsg – grabs a whole message (datagram’s
socket_shutdown – stop reading and writing
socket_close – all done, close it down
what – no poll?
Poll vs. Select
select - has existed for a great while and exists almost
poll - Not existing on older unixes nor in Windows
before Vista. Broken implementation in Mac OS X at
least up to 10.4
There’s a lot more!
READ THE C MAN PAGES!
TCP on bidirectional full-duplex acid
New and shiny
Take the place of ajax, polling, and other hackery
Allow cross origin communications
based on TCP
look like an “upgrade” request to HTTP clients/servers
reduce latency and overhead
full-duplex on a single socket
usable with proxies
allow polling and streaming without pain
persistent…. wait - persistent?
Hmmm - persistent
Websockets are persistent connections
Your $server_of_choice may not like this
newer nginx is smart about proxying them
apache has some websockets modules
IIS8 and higher has support as well
use a standalone php server
If using fastcgi/php-fpm, you’re going to eat processes