
Php internal architecture

More from Elizabeth Smith (20)


  1. PHP Internal Architecture: Pluggable, Extendable, Useable
  2. Architecture: PHP piece by piece
  3. You should know the basics
  4. All the puzzle pieces • PHP Input/Output: • SAPI • Streams • Engine: • Lexer • Parser • AST • Compiler • Executor • Extensions: • Zend Extensions • Compiled In • Loaded at startup • Loaded at runtime
  5. Running PHP • server makes request • SAPI talks to engine • engine runs • SAPI returns output to server
  6. How other languages do this • Python (CPython) • mod_python (embedded Python interpreter, deprecated) • mod_wsgi (embedded or daemon: basically a mod_python copy, OR speaking over unix sockets to a Python interpreter with a special library installed) • command line interpreter • FastCGI/CGI (using a library in Python) • Ruby (MRI) • also known as “CRuby” • Matz’s Ruby Interpreter • use Rack (library) to: • write/run a Ruby webserver • use another server in between with hooks to nginx/Apache (unicorn, passenger) • use FastCGI/CGI
  7. And still more.. • NodeJS • Your app is your server • This is a pain • Write your own clustering or other neat features!! • So you stick a process manager in front • And you reverse proxy from Apache/nginx • Or you use passenger or some other server…. • Perl • Yes it still exists – shhh you in the back • PSGI + plack • mod_perl • mod_psgi
  8. What makes PHP different? • Shared nothing architecture by design • application lifecycle is per-request • no shared state natively • infinite horizontal scalability in the language itself • HTTP is a first class citizen • You don’t need a library or framework • SAPI is a first class citizen • Designed to have a server in front of it • No library necessary • You don’t need a deployment tool to keep it all going
  9. The answer to your question is
  10. SAPI: Server API – the least understood feature in PHP
  11. What is a SAPI? • Tells a Server how to talk to PHP via an API • Server API • Server Application Programming Interface • “Server” is a bit broad as it means any type of Input/Output mechanism • SAPIs handle: • input arguments • output, flushing, file descriptors, interruptions, system user info • input filtering and optionally headers, POST data, HTTP specific stuff • a stream for the request body
  12. In the beginning • CGI • Common Gateway Interface • Shim between web server and program • Simple • Stateless • Slow • Local • Good security with Linux tools • Slow • Local • Programs can have too much access • Memory use not transparent (thrash and die!)
  13. Then there was PHP in a Webserver • mod_php (apache2handler) • Run the language directly in the webserver, speaking to a webserver’s module API • Can access all of Apache’s stuff • Webserver handles all the request stuff, no additional sockets/processes • It works well • Requires prefork MPM or thread safe PHP • Eats all your memories and never lets the system have it back • Makes Apache children take more memory
  14. CGI is slow: FastCGI to the rescue! • Persistent processes but CGI mad style • Biggest drawbacks? • “it’s old” • “I don’t like the protocol” • “it’s not maintained” • “other people say it’s not stable” • Apache fcgi modules do kind of suck :( • Nginx “just works” • IIS8+ “just works”
  15. php-fpm – Make FastCGI better • FastCGI Process Manager • Adds more features than traditional FastCGI • Better process management including graceful stop/start • Uid/gid/chroot/environment/port/ini configuration per worker • Better logging • Emergency restart • Accelerated upload support • Dynamic/static child spawning
  16. CLI? • Yes, in PHP the CLI is a SAPI • (Did you know there’s a special Windows CLI that doesn’t pop a console window?) • PHP “overloads” the CLI to have a command line webserver for easier development (even though it SHOULD be on its own) • PHP did that because fighting with distros to always include the cli-server would have meant pain, and if you just grab php.exe the dev webserver is always available • The CLI treats console STDIN/STDOUT as its request/response
  17. php-embed • A thin wrapper allowing PHP to be easily embedded via C • Used for extensions in Node, Python, Ruby, and Perl to interact with PHP • Corresponding extensions do exist for those languages embedded in PHP
  18. phpdbg • Wait – there’s a debugger SAPI? • Yes, yes there is
  19. litespeed • It is a SAPI • The server just went open source… • I’ve never tried it, but they take care of the SAPI :)
  20. Just connect to the app? • Use a webserver to reverse proxy to a webserver built into a framework? • Smart to use a webserver that has already solved the hard stuff • But the app/web framework on top needs to deal with • HTTP keepalive? • Gzip with caching? • X-Forwarded-For? Logging? Issues • Load balancing and failover? • HTTPS and caching? • ulimit? Remember we’re opening up a bunch of sockets!
  21. Well, PHP streams can do that :)
  22. Streams: Input and Output beyond the SAPI
  23. What is a Stream? • Access input and output generically • Can write and read linearly • May or may not be seekable • Comes in chunks of data
  24. How PHP Streams Work • Stream Contexts • Stream Wrapper • Stream Filter • ALL IO
  25. Definitions • Socket • Bidirectional network stream that speaks a protocol • Transport • Tells a network stream how to communicate • Wrapper • Tells a stream how to handle specific protocols and encodings
  26. Built in Socket Transports • tcp • udp • unix • udg • SSL extension • ssl • sslv2 • sslv3 • tls
  27. You can write your own streams! • You can do a stream wrapper in userland and register it • But you need an extension to register them if they have a transport • Extensions with streams include ssh, bzip2, openssl • I’d really like the curl stream back (not with the compile flag, but curl://)
  28. Welcome to the Engine: Lexers and Parsers and Opcodes OH MY!
  29. Lexer • checks PHP’s spelling • turns into tokens • see token_get_all for what PHP sees
  30. Parser + AST • checks PHP’s grammar • E_PARSE means “bad phpish” • creates AST
  31. Compiler • Turns AST into Opcodes • Allows for fancier grammar • Opcodes can then be cached (opcache) skipping lex/parse/compile cycle
  32. Opcodes • dump with http://derickrethans.nl/projects.html • machine readable language which the runtime understands
  33. Engine (Virtual Machine) • reads opcode • does something • zend extension can hook it! • ??? • PROFIT
  34. Extensions: How a simple design pattern made PHP more useful
  35. “When I say that PHP is a ball of nails, basically, PHP is just this piece of shit that you just put all the parts together and you throw it against the wall and it fucking sticks” - Terry Chay
  36. So what is an extension? • Written in C or C++ • Compiled statically into the PHP binary or as a shared object (so/dylib/dll) • Provides • Bindings to a C or C++ library • even embed other languages • Code in C instead of PHP (speed) • template engine • Alter engine functionality • debugging
  37. So why an extension? • add functionality from other languages (mainly C) • speed • to infinity and beyond! • intercept the engine • add debugging • add threading capability • the impossible (see: operator)
  38. About Extensions • Types • Zend Extension • PHP Module • Sources • Core Built in • Core Default • Core • PECL • Github and Other 3rd Party
  39. “We need to foster a greater sense of community for people writing PHP extensions, […] Quite what this means hasn't been decided, although one of the major responsibilities is to spark up some community spirit, and that is the purpose of this email.” - Wez Furlong, 2003
  40. What is PECL? • PHP Extension Community Library • The place for people to find PHP extensions • No GPL code – license should be PHP license compatible (LGPL is ok but not encouraged) • http://news.php.net/article.php?group=php.pecl.dev&article=5
  41. PECL Advantages • Code reviews • See https://wiki.php.net/internals/review_comments • Help from other devs with internal API changes (if in PHP source control) • https://svn.php.net/viewvc?view=revision&revision=297236 • Advertising and individual release cycles • http://pecl.php.net/news/ • pecl command line integration • actually just integration with PEAR installer (which supports binaries/compiling) and unique pecl channel • php.net documentation!
  42. PECL Problems • Has less oversight into code quality • peclqa? • not all source accessible • no action taken for abandoned code • still has “siberia” modules mixed with “need a maintainer” • never enough help • tests • bug triaging • maintainers • code reviews • docs! • no composer integration • Half the code in git, half in svn still, half… elsewhere …
  43. “It’s really free as in pull request” - me
  44. My extension didn’t make it faster! • PHP is usually not the real bottleneck • Do full stack profiling and benchmarking to see if PHP is the real bottleneck • If PHP IS the real bottleneck you’re awesome – and you need to be writing stuff in C or C++ • Most times your bottleneck is not PHP but I/O
  45. What about other languages? • Ruby gem • Will compile and install • Node’s npm • Will compile and install • Perl’s CPAN • Written in special “xs” language • Will compile and install • Python • Mixed bag? Distutils can install or grab a binary
  46. FFI: Talk C without compiling
  47. What is FFI? • Foreign Function Interface • Most things written in C use libffi • https://github.com/libffi/libffi
  48. Who has FFI? • Java calls it JNI • HHVM calls it HNI • Python calls it “ctypes” (do not ask, stupidest name ever) • C# calls it P/Invoke • Ruby calls it FFI • Perl has Inline::C (a bit of a mess) • PHP calls it…
  49. FFI
  50. Oh wait… • PHP’s FFI is rather broken • PHP’s FFI has no maintainer • It needs some TLC • There’s MFFI but it’s not done • https://github.com/mgdm/MFFI • Are you interested and not afraid?
  51. For the future? • More SAPIs? • Websockets • PSR-7 • Other ideas? • Fix server-tests.php so we can test SAPIs :( • Only CGI and CLI are currently tested well • More extensions • Guidelines for extensions • Better documentation • Builds + pickle + composer integration
  52. About Me • http://emsmith.net • auroraeosrose@gmail.com • twitter - @auroraeosrose • IRC – freenode – auroraeosrose • #phpmentoring • https://joind.in/talk/67433
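
The "write your own streams" slide can be illustrated with a minimal sketch of a userland stream wrapper. The class name `DemoStream`, the `demo://` scheme, and the in-memory data are all made up for this example; only `stream_wrapper_register()` and the `stream_*` method names come from PHP itself. It is a read-only toy, not a complete wrapper.

```php
<?php
// Hypothetical read-only wrapper serving in-memory strings under a made-up "demo://" scheme.
class DemoStream
{
    /** @var resource|null Set by PHP when a stream context is supplied. */
    public $context;

    // Illustrative data store; a real wrapper would talk to a file, API, etc.
    private static array $data = ['hello' => "Hello from a userland stream\n"];

    private string $buffer = '';
    private int $position = 0;

    // Called for fopen(), file_get_contents() and friends.
    public function stream_open(string $path, string $mode, int $options, ?string &$opened_path): bool
    {
        // For "demo://hello" the host component is "hello".
        $key = (string) parse_url($path, PHP_URL_HOST);
        if (!isset(self::$data[$key])) {
            return false;
        }
        $this->buffer = self::$data[$key];
        $this->position = 0;
        return true;
    }

    // Hand back the next chunk of at most $count bytes.
    public function stream_read(int $count): string
    {
        $chunk = (string) substr($this->buffer, $this->position, $count);
        $this->position += strlen($chunk);
        return $chunk;
    }

    public function stream_eof(): bool
    {
        return $this->position >= strlen($this->buffer);
    }

    // Minimal stat information so callers that stat the stream do not complain.
    public function stream_stat(): array
    {
        return ['size' => strlen($this->buffer)];
    }
}

stream_wrapper_register('demo', DemoStream::class);

$contents = file_get_contents('demo://hello');
echo $contents;
```

Once registered, the scheme works with the ordinary file functions, which is the point of the wrapper layer: calling code does not need to know where the bytes come from.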

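The Lexer slide points at `token_get_all` for "what PHP sees". A small sketch of that, assuming the tokenizer extension is available (it normally is); the source snippet being tokenized is invented for the example.

```php
<?php
// A tiny, made-up piece of PHP source to feed to the lexer.
$source = '<?php echo 1 + 2;';

$tokens = token_get_all($source);

$names = [];
foreach ($tokens as $token) {
    if (is_array($token)) {
        // Array tokens are [token id, source text, line number].
        $names[] = token_name($token[0]);
    } else {
        // Single-character tokens such as "+" or ";" come back as plain strings.
        $names[] = $token;
    }
}

echo implode(' ', $names), PHP_EOL;
```

The output is a flat sequence of token names; no grammar checking has happened yet, which is the parser's job.
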
Editor's Notes

  • PHP is an ever evolving piece of software that is useful and easily hacked without touching the core
    The features in PHP that make it so pluggable are not unique to PHP, other languages use them as well
    Understanding how PHP is structured and works in comparison to other languages can help you understand what your code is doing
    You don’t have to know C to see what’s going on “beneath the hood”
  • PHP has a pretty standard architecture for an interpreted language actually. We’ll take a look at how it compares to other similar languages
    By similar I mean – no compilation step
    So Ruby, Python, and NodeJS


    We’re not going to compare apples to oranges here – what use would that be? So don’t get me started on compiled languages (yes, that includes JAVA, C#, go and Rust, sorry)
  • It’s very important to understand the “big picture” of how your system fits together. You don’t have to know exactly how the parser works, but knowing you have a parser is important. It’s not that important to know how modules get loaded, but you should understand they exist and are loadable

    A lot of people make excuses that “you don’t need to know how an internal combustion engine works to drive a car”

    Well, that’s bologna. I would hope you know that when you turn the key, it uses electricity in your battery (which needs to be working) to ignite the gasoline (which you also need enough of) in your car – you need to know the rules of the road and how to use the brake

    No, you don’t need to know what the right mix ratio is for fuel to air in your cylinder
    But by George you better know that spark plugs make your engine go and they need to be replaced!

  • So what do all these pieces do?
    The sapi and the streams do input and output control, no matter what we’re using for input or for output
    The engine actually processes our PHP files – doing the work
    Extensions add functionality to PHP (if you ever want an adventure – don’t just do disable-all, try forcing all extensions off by fiddling with the .m4 files for date, pcre, standard, tokenizer, and a few others)
    You will have almost NO functionality in PHP itself

    Streams are a bit broken because although parts of them are integrated into PHP’s sapi layer, other parts are integrated into the engine and still other parts into the standard extension (ugh)
  • PHP is designed to be basic – the sapi says “run this script” and the engine runs it and returns output via the sapi
    That’s it
    The sapi gets to decide how things are handled

    There are ways of accessing input and output aside from our main sapi process, but they’re started from the main process
    They don’t exist independently
  • PHP gets away with mod_php because of its concept of shared nothing

    Do you notice a trend yet?
    The rack specification and wsgi have a LOT in common
    They’re weirdly (and purposely) very similar to PSR-7
  • Ok, at this point things are getting a wee bit ridiculous
    Do you see the pattern? Yes, that’s right.

    There is no real difference between any of these things
    That’s because they’re all solving the exact same issue
  • A lot of this has been reiterated for years, but these are the reasons that PHP IS different

    What’s interesting to me is that these are often cited as DOWNSIDES to PHP, and the last one? Half the Ruby and Python developers I talk to do NOT understand what I’m talking about

    Problem is most of the PHP folks don’t either :(
  • Why is this important? This IS the thing
    The choices you make depend on what you are doing!
    Do you want easy web dev and deployment?
    Do you want good horizontal scalability?
    Then PHP is your answer
    Do you have another question: well that’s your problem
  • PHP is designed in a way so that not only do we have a pluggable infrastructure for adding functionality, we have a pluggable infrastructure for speaking input and output
    “Which SAPI are you using?” is usually the number one question people are asked when they have a bug … and half can’t answer
    SAPIs are (one of) the most underutilized features in the system, basically because few people know they exist and fewer still can use them
    And their API is not as robust as it could be because it hasn’t HAD to be
  • So cgi has a lot of benefits actually
    Except for the fact that it’s slow as dirt
    Or molasses in July
  • Make sure to talk about what an MPM is in Apache – a Multi-Processing Module
  • FastCGI really does work, and work well, mostly as advertised
  • Basically smacks a process manager on top of the FastCGI protocol. This is not necessarily needed for webservers like IIS which have features like worker pools, but for nginx and Apache raw FastCGI is … not fabulous. This adds a bunch of management features
  • Yes, the PHP CLI is a SAPI :)
  • Yeah, so PHP actually has an http stream and code to help write a server in it that takes care of a LOT of hard stuff for you, just in case you really want to write your own appserver and then reverse proxy into it ;) IF YOU’RE CRAZY
  • Streams are pretty much the most awesomesauce part of the package
  • Streams are a huge underlying component of PHP

    Streams were introduced with PHP 4.3.0 – they are old, but underuse means they can have rough edges… so TEST TEST TEST
    But they are more powerful than almost anything else you can use



    Why is this better ?

    Lots and lots of data in small chunks lets you do large volumes without maxing out memory and cpu
  • All input and output comes into PHP

    It gets pushed through a streams filter

    Then through the streams wrapper

    During this point the stream context is available for the filter and wrapper to use

    Streams themselves are the “objects” coming in
    Wrappers are the “classes” defining how to deal with the stream
  • What is streamable behavior? We’ll get to that in a bit

    Protocol: set of rules which is used by computers to communicate with each other across a network
    Resource: A resource is a special variable, holding a reference to an external resource

    Talk about resources in PHP and talk about general protocols, get a list from the audience of protocols they can name (yes http is a protocol)

    A socket is a special type of stream – pound this into their heads

    A socket is an endpoint of communication to which a name can be bound. A socket has a type and one associated process. Sockets were designed to implement the client-server model for interprocess communication where:


    In PHP, a wrapper ties the stream to the transport – so your http wrapper ties your PHP data to the http transport and tells it how to behave when reading and writing data
  • Internet Domain sockets expect a port number in addition to a target address. In the case of fsockopen() this is specified in a second parameter and therefore does not impact the formatting of transport URL. With stream_socket_client() and related functions as with traditional URLs however, the port number is specified as a suffix of the transport URL delimited by a colon.

    unix:// provides access to a socket stream connection in the Unix domain. udg:// provides an alternate transport to a Unix domain socket using the user datagram protocol.
    Unix domain sockets, unlike Internet domain sockets, do not expect a port number. In the case of fsockopen() the portno parameter should be set to 0.
  • Lexical Analysis

    Converts the source from a sequence of characters into a sequence of tokens
  • Syntax Analysis

    Analyzes a sequence of tokens to determine their grammatical structure
  • 5.6 and 7+ differ here: the AST step only exists from PHP 7 onward
  • Generate bytecode based on the information gathered by analyzing the source code
  • so zend is actually a “virtual machine”
    it interprets OPCODES and does stuff with them
    reads each opcode and does a specific action – like a giant state machine
  • Extensions are the soul of what makes PHP great
    If you take away the extensions there’s not a lot left in PHP
    Sadly there isn’t a nice API to go with extensions, and that’s something PHP should address
  • PHP IS a glue language, it was not designed, it grew… and it was designed to be Cish and tie TIGHTLY to C without making the poor dev worry about the hard stuff!
    PHP has probably the sharpest delineation between a library (PHP code) and an extension (C code you have to compile) of any interpreted language out there. It is also arguably the most modular and extensible. This is not necessarily a bad thing, except that because so many people do PHP “on the cheap” they’re absolutely terrified of extensions!
    If C (or C++) can do it – PHP can be glued to it. The extension architecture introduced was probably one of the BEST decisions ever made for PHP! – And a bit of trivia – it didn’t exist before PHP 3.0
  • I lied in the slide here – you can bind to non C/C++ stuff too, there are some clever bindings to .NET managed assemblies and cocoa bindings and all kinds of evil – but those are not common use case and are hard to do
    extensions have to be compiled for the major.minor version you’re using – and there’s lots of different flags and such that can make binaries incompatible, this is one of the strengths and weaknesses of a compiled language, it’s optimized for what it’s meant to run on but doesn’t work at all elsewhere
  • Porting C libraries in pure PHP could be done – but why in the world would you do that!!
    Unless you’re doing something incredibly stupid – C code is going to be faster than PHP
    No matter what you do, there are parts of the engine you can’t touch in userland. Extensions do NOT have this limitation; you can do all kinds of evil. Yes, there are extensions for threading, debugging, compiling PHP opcodes, intercepting function calls, profiling, you name it. C is a POWERFUL thing.
  • Two types of extensions
    zend extensions can dig right into the engine itself
    php module cannot
    A PHP extension can be both! (see xdebug)

    So what are “core default” extensions – these are the ones that are distributed with the PHP source code and are turned on by default – there are a few that are “always on” (as of 5.3) – PCRE, date, standard (duh), reflection and spl

    There are a bunch that are generally turned “on by default” – PHP tries to build them even if you don’t flag them as on (not having the libs necessary will turn them off) – these include things like libxml2, zlib, iconv
    There are still a bunch more included in the core source – some are excellent and some – some suck
  • pickle or peh cul (depending on your side of the pond) is the ongoing argument for pronunciation – bottom line is the logo IS a pickle yo ;)

    Benefits:
    ▫ code hosting and distribution (git! and github mirror!)
    ▫ pecl install (PEAR packaging and installer support)
    ▫ community
    ▫ advertising

    Split off from PEAR in 2003
    • Wez Furlong is “instigator”
    ▫ see famous letter
    ▫ http://news.php.net/article.php?group=php.pecl.dev&article=5
    • Many of the points raised still need work – mainly PECL needs manpower, bodies to do the work
  • I get all the time “PHP is so slow”
    If you’re connecting to a remote memcache server, and a remote redis server, and a remote webservice, and a remote database… even if all that code is fast you’re screwed – scale right, scale truly horizontal with localhost connections for everything you can get away with
    At the end of the day
  • FFI is a cool concept but the number one rule of FFI is… you can totally shoot yourself in the foot with it!
    The basic idea (and ruby and python use this extensively) is to “wrap” your C stuff with the FFI extension, and then
    Write python/ruby code to unstupid the APIS
  • So many languages support this idea of calling into usually C code
    Then they usually put a layer of regular code “on top” to make APIS non painful
    This can be useful
    it also tends to be slower
  • before you get into extension writing – if you just want an ffi wrapper and are just going to call the exact C calls from an existing library why go to the trouble of writing an extension?

    ffi is pretty great but still a bit of a flaky extension; it’s identical to Python’s “ctypes”, which is a stupid name, it’s really ffi

    I hear all the time about how “great” Python is because of ctypes; frankly I beg to differ. Part of wrapping a C extension is translating the C calls into something far more “phpish”
  • oh ffi is cool and so needs a maintainer
  • Some of my wishlist of how I’d love to see you get involved :)
  • Would love to do some code for evil on this :)
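
The streams notes above (data arriving in small chunks, passing through a filter on the way) can be sketched with built-in pieces only. This is an illustration under simple assumptions: `php://memory` stands in for a large file or socket, the sample text is invented, and the chunk size of 8 bytes is deliberately tiny to make the chunking visible.

```php
<?php
// php://memory stands in for a big file or network stream in this sketch.
$stream = fopen('php://memory', 'w+');
fwrite($stream, "first line\nsecond line\n");
rewind($stream);

// Attach the built-in string.toupper filter to everything read from the stream.
stream_filter_append($stream, 'string.toupper', STREAM_FILTER_READ);

$output = '';
while (!feof($stream)) {
    // Read a small chunk at a time; memory use stays bounded by the chunk size.
    $chunk = fread($stream, 8);
    if ($chunk === false) {
        break;
    }
    $output .= $chunk;
}
fclose($stream);

echo $output;
```

In real code the loop body would process or forward each chunk instead of concatenating it, so only one chunk is held at a time.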

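Since the notes stress knowing which SAPI you are running under, here is a small sketch for answering that from inside a script. `php_sapi_name()` and the `PHP_SAPI` constant are built in; the `SAPI_KINDS` map and `describeSapi()` helper are hypothetical names for this example, and the list of SAPI names is illustrative rather than exhaustive.

```php
<?php
// Illustrative, non-exhaustive map from SAPI name to a human description.
const SAPI_KINDS = [
    'cli'            => 'command line',
    'cli-server'     => 'built-in development web server',
    'fpm-fcgi'       => 'php-fpm',
    'cgi-fcgi'       => 'CGI / FastCGI binary',
    'apache2handler' => 'mod_php inside Apache',
    'litespeed'      => 'LiteSpeed',
    'phpdbg'         => 'phpdbg debugger',
    'embed'          => 'embedded in another program',
];

// Hypothetical helper: describe a SAPI name, falling back for unknown ones.
function describeSapi(string $name): string
{
    return SAPI_KINDS[$name] ?? 'unknown SAPI: ' . $name;
}

// php_sapi_name() returns the same value as the PHP_SAPI constant.
echo php_sapi_name(), ' => ', describeSapi(PHP_SAPI), PHP_EOL;
```

Including this value in bug reports saves a round trip, because behavior around input, output, and headers depends on the SAPI in use.
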