But I don’t know C!
1. compiled
2. strictly typed
3. php internals do “hard” stuff
4. copy and paste!
5. cairo, pecl_http, date, imagick, dio
lxr.php.net
Quick Setup Needs
1. compiler
2. sdk (on win, put this on FIRST)
3. tools
4. dependencies
5. code
phpize is your friend!
How to Compile
1. phpize
2. ./configure
3. make
4. make install
5. make test
configure might require --with-php-config
How to Play along
https://github.com/auroraeosrose/php-extensions-code.git
git://github.com/auroraeosrose/php-extensions-code.git
Every Other Extensions Talk
1. Set up configuration
2. Write a module definition
3. Learn about the PHP lifecycle
4. Learn about zvals
5. Add functions
6. ???
7. Profit!
Which do YOU
want?
Please be the solution, not the
problem
We need no more painful APIs
in PHP
Step 1. What, When, Why
1. Do something you can’t do in userland
2. Utilize a C library
3. Make slow code faster
Maybe you should just use ffi!
Step 2. How
1. Think about what the API should be
2. Look at the C APIs but don’t mimic them
3. Write an example of how you WANT it to work
4. Write more than one example!
Step 3. Scaffolding
1. This is copy and paste
   1. config.m4 & config.w32
   2. macros for version api changes if necessary
   3. main module file (php_{$ext}.c)
   4. main header file (php_{$ext}.h)
2. Compile it and test!
Scaffolding rules
1. make sure you name your files in a standard way
2. document! use a license header, proto statements
3. read the coding standards
http://lxr.php.net/xref/PHP_5_4/CODING_STANDARDS
4. FOLLOW THE CODING STANDARDS
5. use version control early on – github is easy!
Check if it works
php -d extension=myext.so -m
The scaffold extension should show up in the list
Make sure to
1. define a version constant
2. read and use PHP code standards
Step 4. Write the test
1. run-tests.php
2. make test will magically have output
more at http://qa.php.net/write-test.php and docs
at http://qa.php.net/phpt_details.php
zend_parse_parameters
php type                      code  c type
array or object               a     zval *
boolean                       b     zend_bool
class                         C     zend_class_entry *
double                        d     double
callable                      f     zend_fcall_info and zend_fcall_info_cache
array or HASH_OF(object)      H     HashTable*
array                         h     HashTable*
integer                       l     long (NOT INT)
integer                       L     long with LONG_MAX, LONG_MIN limits
object                        o     zval *
object of specific type       O     zval *, zend_class_entry
string (no null bytes)        p     char*, int
resource                      r     zval *
string (possible null bytes)  s     char*, int
actual zval                   z     zval *
actual zval                   Z     zval**
zend_parse_parameters
code  type
*     variable args (any): int, zval***
+     variable args (1 or more): int, zval***
|     anything after is optional, use defaults
/     use SEPARATE_ZVAL_IF_NOT_REF
!     C NULL for zval null (doesn’t apply to b, l, and d)
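To make the spec string less magical, here is a rough sketch in plain C of how a string like "sl|b" can be read: everything before | is required, everything after is optional, and / and ! only modify the code before them. toy_spec_info and toy_scan_spec are hypothetical names invented for this note. This is NOT the Zend implementation, just a toy model of the rules in the tables above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type, not part of the Zend API. */
typedef struct {
    int required;   /* arguments that must be passed (before '|') */
    int optional;   /* arguments that may be omitted (after '|') */
    int variadic;   /* 1 if '*' or '+' appears in the spec */
} toy_spec_info;

/* Walk a type-spec string and count what it asks for. */
toy_spec_info toy_scan_spec(const char *spec)
{
    toy_spec_info info = {0, 0, 0};
    int seen_pipe = 0;
    const char *p;

    assert(spec != NULL);
    for (p = spec; *p != '\0'; p++) {
        switch (*p) {
        case '|':               /* everything after this is optional */
            seen_pipe = 1;
            break;
        case '/':               /* modifiers: they change how the previous */
        case '!':               /* code behaves but consume no argument */
            break;
        case '*':               /* zero or more extra arguments */
            info.variadic = 1;
            break;
        case '+':               /* one or more: this toy counts it as one required */
            info.variadic = 1;
            info.required++;
            break;
        default:                /* an ordinary type code such as s, l, b, z */
            if (seen_pipe) {
                info.optional++;
            } else {
                info.required++;
            }
        }
    }
    return info;
}
```

For example, toy_scan_spec("sl|b") reports two required arguments and one optional one, which is the mental check worth doing before handing a spec to zend_parse_parameters.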
Complex Data
array_init()                  object_init()
add_(index|assoc)_long()      add_property_long()
add_(index|assoc)_bool()      add_property_bool()
add_(index|assoc)_string()    add_property_string()
Step 6. Make it shiny
1. namespaces
2. classes
3. methods
4. constants
Classes
1. name, parent, and flags
2. hashtables of methods, default methods, static
methods
3. hashtables of static properties, default
properties, and properties
4. object handlers
5. union of either file information, or internal
structures (for internal classes)
Lifecycle
PHP starts
MINIT – for each extension
RINIT – for each request
GINIT – for each thread
GSHUTDOWN – for each thread
RSHUTDOWN – for each request
MSHUTDOWN – for each extension
PHP stops
Global Variables (threads = evil)
1. in your header – use
ZEND_BEGIN|END_MODULE_GLOBALS
2. create the global access macro in your header
(copy and paste)
3. ZEND_DECLARE_MODULE_GLOBALS in
every file where you will use them
4. use the macro to access
COUNTER_G(basic_counter_value)
5. Create ginit/gshutdown functions if your globals
need initializing, etc
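A minimal sketch of the idea behind the access macro, modeling only the non-thread-safe case in plain C. COUNTER_G and basic_counter_value come from the slide; toy_counter_globals, toy_counter_ginit and toy_counter_bump are hypothetical names for this note. In a real extension the struct comes from the ZEND_BEGIN/END_MODULE_GLOBALS macros, and in thread-safe builds the accessor resolves differently, which is exactly why you always go through the macro.

```c
#include <assert.h>

/* Hypothetical stand-in for the struct the module-globals macros would declare. */
typedef struct {
    long basic_counter_value;
} toy_counter_globals;

/* Non-thread-safe model: one plain static instance per process. */
static toy_counter_globals counter_globals;

/* The access macro: calling code never touches the struct directly. */
#define COUNTER_G(v) (counter_globals.v)

/* ginit-style hook: put the globals into a known state. */
void toy_counter_ginit(void)
{
    COUNTER_G(basic_counter_value) = 0;
}

/* Example user of the globals: add a step and return the new value. */
long toy_counter_bump(long step)
{
    assert(step >= 0);
    COUNTER_G(basic_counter_value) += step;
    return COUNTER_G(basic_counter_value);
}
```

Because every read and write goes through COUNTER_G, swapping the storage behind it (for example to per-thread storage) would not require touching toy_counter_bump.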
emalloc( )
• allocates the specified number of bytes
safe_emalloc()
• like emalloc but adds a special protection against overflows
efree( )
• releases the specified block of memory back to the memory manager
estrdup( )
• allocate a buffer and copy the string into that buffer
estrndup( )
• same as estrdup when you already know the length of the string
ecalloc( )
• allocates the number of bytes and initializes them to zero
erealloc( )
• resizes the specified block of memory
https://wiki.php.net/internals/zend_mm
1. clean up what you emalloc (C level destructor)
2. read wiki on how to make them extendable!
Step 7. Document, PECL,
release
1. http://edit.php.net
2. http://svn.php.net/viewvc/phpdoc/en/trunk/reference/
3. PhD is awesomesauce
• http://doc.php.net/phd/
4. email pecl-dev@lists.php.net
1. who you are
2. what you wrote (with links to your code!)
3. why you think it should be in pecl
5. poke me (or other devs)
Stuff I didn’t talk about
1. resources (use custom objects instead)
2. ini entries (just DON’T)
3. threading and parallel processing
4. engine hooking
About Me
http://emsmith.net
https://joind.in/6337
auroraeosrose@gmail.com
IRC – freenode – auroraeosrose
#php-gtk #coapp and others
Questions?
HELP WITH DOCS!
http://edit.php.net
http://wiki.php.net/internals
I mentor PHP developers into becoming C developers, because other PHP devs have done that for me. If you want to move on and do more with extensions after this talk, please talk to me. I also sit on freenode all day, answer questions, and give feedback. We always, always want more code monkeys and fresh blood in any project, and looking at github extensions makes me sad: 99% suck.
This is the #1 argument I hear, but it really is not anything you need to worry about. C89, people! You can use some C99 stuff, but declarations MUST go at the top of blocks (that’s really good coding standards anyway). Just like PHP itself, turn all your errors on when compiling (-Wall is your friend) and try to code clean. You don’t have to “know C” any more than you have to “know PHP” to write a wordpress plugin. If you have the very basics of what to do, it’s not hard. PHP takes care of a lot of the heavy lifting, from how to parse parameters coming in and how to shove data into someplace going out, to how to do fancy objects. Unless you’re doing something REALLY evil (an opcode cache, changing the way the engine works, threading), most of this stuff has already been done for another extension. It’s just a matter of finding the code, then compile and test. Btw, all the code generators out there currently suck: most don’t do test generation, doc generation, or use the proper apis.
This is the same and yet different on every system and we’re not going to go into it a whole lot. This should have been homework before you came ;) Basically you need a compiler: xcode on mac, vc(2008) express on windows, gcc something on linux. You also need the sdk and headers for your system: those will be put on with xcode, and you’ll need the 6.1 windows sdk for windows. Then you need autotools, bison, and re2c to build PHP. On mac or linux you can use phpize to build extensions. You CAN do this on windows as well, except that for some stupid reason the proper files are not shipped with windows binary builds. Then you need any dependencies (to build PHP as a base you need iconv, zlib, and libxml2 at a bare minimum). After you build PHP on windows you can use phpize (it will be generated for you), which makes life so much easier.
Note that windows does not need the ./ before configure and uses nmake instead of make (otherwise identical). To use a specific PHP install, use /full/path/to/phpize and --with-php-config=/usr/local/php5/bin/php-config. Note this does shared extensions only; then again, most of the extensions you’ll work on should probably be shared.
I’m putting parts of this code onto my github account. Each section I talk about (scaffolding, adding a function, etc.) will have a different branch, so feel free to clone and play. These aren’t necessarily complicated extensions, but they do follow PHP CS, have all the copy and paste you need, and do compile (I think).
If you’ve ever seen another PHP extensions talk, they dive into how to use the zend engine and how things work and such. PHP is to C as wordpress is to PHP: it makes any idiot able to write a plugin for it ;) You don’t necessarily HAVE to know how the internals work, you just have to know the right calls to make! So although I might tell you about SOME of the internals how and why, for the most part this will be “how to do it”, not “why you do it”. Because PHP is a ball of rusty nails ;)
In addition, this is the absolutely wrong way to approach writing PHP extensions! Actually it’s the wrong way to approach writing LOTS of things. Do you start out writing a website worrying about how to make a db connection before you’ve designed the database and created the schema? Do you create the schema before you have wireframes? Just because we’re making a different product doesn’t mean the rules change.
This is why you really need to think about what you’re doing. Being able to use the extension is not really enough: just wrapping a C api is a good way to give yourselves headaches of a horrible nature.
So these are the reasons you’d write a PHP extension: you want to use a C library, you want to make something process intensive faster, or you want to hook into the engine itself. There aren’t other reasons to do an extension.
Before you get into extension writing: if you just want an ffi wrapper and are just going to call the exact C calls from an existing library, why go to the trouble of writing an extension? ffi is pretty great, though still a bit of a flaky extension. It’s identical to python’s “ctypes”, which is a stupid name, it’s really ffi. I hear all the time about how “great” python is because of ctypes; frankly I beg to differ. Part of wrapping a C library is translating the C calls into something far more “phpish”.
C++ APIs or similar wrappers in python, perl, ruby, etc. are a better place to look for api ideas. Try a couple of ways of using the apis as well. Really think about what kinds of apis and code you’re looking for. I was looking at wrapping a small library (uriparser.sourceforge.net), and one of the first things I did was sit down and write some rough, almost pseudo-code of what the extension might look like.
uriparser.sourceforge.net is a simple uri parser library that strictly conforms to the rfc. How would things work? Do I want namespaces? Will this be an object oriented api or a functional api? Do I want to do the extra work of a dual api (such as date or mysqli)? These decisions are usually pretty easy to figure out once you actually try to write the kind of code you want to work.
PHP is 90% copy and paste, and the main copy and paste stuff is all scaffolding for an extension. So we’ve figured out what we want to write and how we want to write it; now we need to have scaffolding in place for it. Talk about our 4 files and the scaffold example: show the files quickly to show what normally goes in a header, some PHP CS rules, et al.
DO BOTH unless you’re absolutely certain that whatever you’re writing will only work on one system or the other. m4 is autotools stuff, but the w32 files are very easy to write as well: they’re just jscript (windows javascript dialect) with some special functions for things like checking headers and adding libs, etc.
These are the very basic items you can have in your struct to define what your module is and what it does. STANDARD_MODULE_HEADER_EX allows you to add ini entries and dependencies to your module definition. Using STANDARD_MODULE_PROPERTIES_EX allows you to add globals: what the globals are, a ginit and a gshutdown, plus a “post deactivate” method that basically never gets used ;)
Just some general rules for you to take a look at. Even if your extension is not going into PHP proper, the more time you take to properly document your code the happier you will be.
How to see if what you’ve been playing with will actually work. This should load your boring extension without doing anything fancy, just allow it to show up in the modules list. You can actually do a test case for this with extension_loaded or get_loaded_extensions if you’re so inclined.
Everybody should ALWAYS test their extensions: test test test. Turn debugging on, turn memory leaks on, run your tests. If they fail you’ll get a bunch of extra files from the tests system that make things very easy to debug: a diff file, an .out file, even an .sh file to run the test exactly like it was run the first time. No .bat file yet… yes, I have a patch for that.
These are very basic functional style tests. There are a bunch of different sections for each one; the important thing to remember is to always include a skipif. Anything in “tests” will be run, so it’s ok to put things in directories for order. Write tests for everything you can think of! This is my standard test to check my phpinfo (MINFO) function.
So we’re going to actually write a function for our uriparser library that does something extremely simple – we’re going to register a function to tell us the uriparser library version
So basically with an extension (especially one wrapping a library) what you’re doing is taking C variables and exposing them to php land by turning them into zvals. So start by thinking about what PHP offers as variable types:
    long lval;              /* long value */
    double dval;            /* double value */
    struct {
        char *val;
        int len;
    } str;
    HashTable *ht;          /* hash table value */
    zend_object_value obj;
So a “zval” in PHP has a union that can hold the value of what is inside. Something you’ll need to do when you create your extension is figure out how to “translate” that to a PHP type and how to handle it.
You need to do three things to add a function to your extension: define its arguments, define its body, and put it inside a function struct. arginfo must have the begin and end arginfo, then you add additional arguments as desired in between. If you have optional arguments, use begin_arginfo_ex. You can send by val or reference or do typehinting with arginfo as well.
Our zval is a giant struct of doom. For those of you who don’t know what a union is: a union is a collection of variables of different types, just like a structure. However, with unions you can only store information in one field at any one time. You can picture a union as a chunk of memory that is used to store variables of different types. Once a new value is assigned to a field, the existing data is overwritten with the new data. This is what everything you do in C goes into!
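If unions are new to you, a tiny tagged union in plain C may help. This is only a sketch: toy_val and the toy_* functions are made-up names for this note, and the real zval has more members and different field names. The point it shows is the pattern of a type tag sitting next to a union, so code knows which member is valid right now.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a zval: a type tag plus a union. */
typedef enum { TOY_LONG, TOY_DOUBLE, TOY_STRING } toy_type;

typedef struct {
    toy_type type;              /* says which union member is currently valid */
    union {
        long lval;
        double dval;
        struct {
            const char *val;    /* borrowed pointer, not copied */
            int len;
        } str;
    } value;
} toy_val;

/* Storing a long overwrites whatever the union held before. */
void toy_set_long(toy_val *v, long n)
{
    assert(v != NULL);
    v->type = TOY_LONG;
    v->value.lval = n;
}

void toy_set_double(toy_val *v, double d)
{
    assert(v != NULL);
    v->type = TOY_DOUBLE;
    v->value.dval = d;
}

void toy_set_string(toy_val *v, const char *s)
{
    assert(v != NULL && s != NULL);
    v->type = TOY_STRING;
    v->value.str.val = s;
    v->value.str.len = (int) strlen(s);
}

/* Check the tag first, then read only the matching member. */
int toy_describe(const toy_val *v, char *buf, size_t size)
{
    switch (v->type) {
    case TOY_LONG:
        return snprintf(buf, size, "long(%ld)", v->value.lval);
    case TOY_DOUBLE:
        return snprintf(buf, size, "double(%g)", v->value.dval);
    case TOY_STRING:
        return snprintf(buf, size, "string(%d) \"%s\"",
                        v->value.str.len, v->value.str.val);
    }
    return -1;
}
```

Reading a member that does not match the tag gives you garbage, which is why the helper macros for getting data out of a zval matter so much.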
p is for pointer (that’s good enough for me). These are helper macros to get the data you need OUT of that union of doom; it’s also how you get zval types and other fun stuff. Note there are even MORE than this, but these are the ones you’ll use most often. PHP is governed by macros and more macros, which is why it’s important to look at existing PHP extensions, since 95% of the macros are NOT documented.
You can also do “zend_parse_parameters_ex”; the only flag you can use it with is QUIET. There are a couple of reasons for doing this, mostly nesting and overloaded calls.
This is macro hell. PHP is macro hell. Every PHP function (or method) has a return value available to you: an automatically NULL zval that you need to manipulate in order to give information back. It’s called return_value. You manipulate these return value zvals to get data back into PHP userland.
When you start to want to return complex data, things get a little more complicated. This is where array and object init come in. There are also ways to array_init with a specific length, or object_init_ex, which allows you to choose the type of object you’re creating. There are a bunch more of these. zend_API.h is your lifeblood for how to use these things, along with
Now we move on to more complex parts of the code, methods that really DO something. Remember, we should start by writing tests for it.
The struct for a zend_class_entry is SOOOO big I can’t show it on one slide ;) However, it’s almost entirely pointers to other stuff:
    struct _zend_class_entry {
        char type;
        const char *name;
        zend_uint name_length;
        struct _zend_class_entry *parent;
        int refcount;
        zend_uint ce_flags;

        HashTable function_table;
        HashTable properties_info;
        zval **default_properties_table;
        zval **default_static_members_table;
        zval **static_members_table;
        HashTable constants_table;
        int default_properties_count;
        int default_static_members_count;

        union _zend_function *constructor;
        union _zend_function *destructor;
        union _zend_function *clone;
        union _zend_function *__get;
        union _zend_function *__set;
        union _zend_function *__unset;
        union _zend_function *__isset;
        union _zend_function *__call;
        union _zend_function *__callstatic;
        union _zend_function *__tostring;
        union _zend_function *serialize_func;
        union _zend_function *unserialize_func;

        zend_class_iterator_funcs iterator_funcs;

        /* handlers */
        zend_object_value (*create_object)(zend_class_entry *class_type TSRMLS_DC);
        zend_object_iterator *(*get_iterator)(zend_class_entry *ce, zval *object, int by_ref TSRMLS_DC);
        int (*interface_gets_implemented)(zend_class_entry *iface, zend_class_entry *class_type TSRMLS_DC); /* a class implements this interface */
        union _zend_function *(*get_static_method)(zend_class_entry *ce, char* method, int method_len TSRMLS_DC);

        /* serializer callbacks */
        int (*serialize)(zval *object, unsigned char **buffer, zend_uint *buf_len, zend_serialize_data *data TSRMLS_DC);
        int (*unserialize)(zval **object, zend_class_entry *ce, const unsigned char *buf, zend_uint buf_len, zend_unserialize_data *data TSRMLS_DC);

        zend_class_entry **interfaces;
        zend_uint num_interfaces;

        zend_class_entry **traits;
        zend_uint num_traits;
        zend_trait_alias **trait_aliases;
        zend_trait_precedence **trait_precedences;

        union {
            struct {
                const char *filename;
                zend_uint line_start;
                zend_uint line_end;
                const char *doc_comment;
                zend_uint doc_comment_len;
            } user;
            struct {
                const struct _zend_function_entry *builtin_functions;
                struct _zend_module_entry *module;
            } internal;
        } info;
    };
To do a class, create and register its CE in minit. Put it in another file: it will keep you sane instead of having 300 mile long files. Some people whine that this makes it harder to use static stuff, since each file is compiled into its own “unit” and then linked, but modern compilers and linkers make the overhead on this moot. Better to be organized!
So this is how you would do your minit function. This would go into php_uriparser.c, where the other code displayed would be in uri.c, where all our code for our uri class is going. It is generally a good idea to keep arginfo and methods together in the code as well, so when one changes you remember to change the other. Remember, stuff must be declared in order to be used too! The namespace declaration and the PHP_MINIT declaration go into the header file. I often do a private header file for things like that, or do an api file if I allow things in my extension to be used by, say, other extensions.
So this is the lifecycle of PHP. Each of these items has a different important role to play. What you need to remember is that minit runs in the order extensions are loaded and mshutdown runs in the opposite order, but you can’t depend on other extensions being around unless you have registered deps (remember way back in the main struct when we talked about deps?).
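The ordering is easier to see in a toy simulation. This sketch is plain C, not engine code: toy_module, toy_log and toy_run are hypothetical names, the per-thread GINIT/GSHUTDOWN hooks are left out, and running the request shutdown hooks in reverse order too is an assumption of this sketch. It just records which hook would fire for which extension.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TOY_LOG_SIZE 256

/* Hypothetical stand-in for a loaded extension. */
typedef struct {
    const char *name;
} toy_module;

/* Append "HOOK(name) " to the log, truncating if the buffer is full. */
static void toy_log(char *log, const char *hook, const char *name)
{
    size_t used = strlen(log);
    snprintf(log + used, TOY_LOG_SIZE - used, "%s(%s) ", hook, name);
}

/* Simulate one process lifetime serving `requests` requests.
 * `log` must point to a buffer of at least TOY_LOG_SIZE bytes. */
void toy_run(const toy_module *mods, int count, int requests, char *log)
{
    int i, r;

    assert(mods != NULL && log != NULL && count >= 0);
    log[0] = '\0';

    /* startup: in load order */
    for (i = 0; i < count; i++) {
        toy_log(log, "MINIT", mods[i].name);
    }
    for (r = 0; r < requests; r++) {
        for (i = 0; i < count; i++) {
            toy_log(log, "RINIT", mods[i].name);
        }
        /* assumption of this sketch: request shutdown unwinds in reverse */
        for (i = count - 1; i >= 0; i--) {
            toy_log(log, "RSHUTDOWN", mods[i].name);
        }
    }
    /* shutdown: opposite of load order */
    for (i = count - 1; i >= 0; i--) {
        toy_log(log, "MSHUTDOWN", mods[i].name);
    }
}
```

With two modules a and b and one request, the log starts with MINIT(a) MINIT(b) and ends with MSHUTDOWN(b) MSHUTDOWN(a), so anything b set up that relies on a can be torn down while a is still around.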
There are multiple declarations to help with this – the ZEND_STRS is a helper macro – again macro hell
Class constants are another thing that’s pretty easy to do. Global constants are evil, do NOT do them.
Methods have several components but look very close to functions that we did earlier. There is a getThis which is available inside that will give you the zval for the class
getThis is the way you get “this”. Notice that no matter how you declare your properties in MINIT, you can “juggle” the value around depending on what you set it to.
these are some more advanced topics
A PHP extension's globals are more properly called the "extension state", since most modules must remember what they're doing between function calls. The "counter" extension is a perfect example of this need: The basic interface calls for a counter with a persistent value. A programmer new to Zend and PHP might do something like this in counter.c to store that value:
efree, emalloc, ecalloc: you can find good documentation on these at manual/en/internals2.memory.management.php. Zend is taking care of all the memory stuff for you: allocating a larger chunk, cleaning up at the end of the request (but still, efree dammit). Often, memory needs to be allocated for longer than the duration of a single request. These types of allocations, called persistent allocations because they persist beyond the end of a request, could be performed using the traditional memory allocators because these do not add the additional per-request information used by ZendMM. Sometimes, however, it's not known until runtime whether a particular allocation will need to be persistent or not, so ZendMM exports a set of helper macros that act just like the other memory allocation functions, but have an additional parameter at the end to indicate persistence. If you genuinely want a persistent allocation, this parameter should be set to one, in which case the request will be passed through to the traditional malloc() family of allocators. If runtime logic has determined that this block does not need to be persistent, however, this parameter may be set to zero, and the call will be channeled to the per-request memory allocator functions.
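Here is a toy model of that persistence flag in plain C. It is NOT how ZendMM is implemented: toy_block, toy_request_alloc, toy_palloc and toy_request_shutdown are hypothetical names, and this toy cannot free a single request block early the way efree can. It only shows the two ideas from the note: a flag that picks the allocator, and a sweep at the end of the request that frees whatever request-scoped memory is still around.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical header placed in front of every request-scoped block.
 * The union keeps the payload that follows suitably aligned. */
typedef union toy_block {
    union toy_block *next;
    long double align_ld;
    long align_l;
} toy_block;

/* All request-scoped blocks that are still alive, newest first. */
static toy_block *toy_request_blocks = NULL;

/* Request-scoped allocation: remember the block so it can be swept later. */
void *toy_request_alloc(size_t size)
{
    toy_block *b = malloc(sizeof(toy_block) + size);

    if (b == NULL) {
        return NULL;
    }
    b->next = toy_request_blocks;
    toy_request_blocks = b;
    return b + 1;               /* payload starts right after the header */
}

/* Dispatch on the persistence flag: 1 goes straight to malloc(),
 * 0 goes to the request-scoped allocator. */
void *toy_palloc(size_t size, int persistent)
{
    assert(size > 0);
    return persistent ? malloc(size) : toy_request_alloc(size);
}

/* End of request: free every request-scoped block, return how many. */
int toy_request_shutdown(void)
{
    int freed = 0;

    while (toy_request_blocks != NULL) {
        toy_block *next = toy_request_blocks->next;
        free(toy_request_blocks);
        toy_request_blocks = next;
        freed++;
    }
    return freed;
}
```

Persistent blocks never enter the list, so the sweep leaves them alone and the caller stays responsible for calling free() on them.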
The new resources: generally if you need to pass around some kind of internal C pointer stuff, this is the way you do it. So we define a struct that holds our boring zend_object AND any additional C magic thingies we need. We’ll need a custom constructor for this, a C level constructor, that takes care of setting everything up, etc. there’s a bunch more abo
you’ll do a lot of zend_fetch_object stuff to pick this stuff
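The trick behind custom objects is ordinary C struct embedding, sketched here without any PHP headers. toy_object stands in for the engine's generic object struct and toy_uri_object for an extension's own struct; both, along with the toy_uri_* functions, are hypothetical names for this note. The real create/fetch/free hooks have different signatures, so treat this as a picture of the memory layout only.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the engine's generic object header. */
typedef struct {
    int refcount;
} toy_object;

/* Extension object: generic header FIRST, then our own C data. */
typedef struct {
    toy_object std;
    char *uri;                  /* owned copy of the extra C data */
} toy_uri_object;

/* C-level constructor: allocate the whole struct, set everything up,
 * and hand back only the generic part. */
toy_object *toy_uri_create(const char *uri)
{
    toy_uri_object *obj;
    size_t len;

    assert(uri != NULL);
    obj = calloc(1, sizeof(*obj));
    if (obj == NULL) {
        return NULL;
    }
    len = strlen(uri) + 1;
    obj->uri = malloc(len);
    if (obj->uri == NULL) {
        free(obj);
        return NULL;
    }
    memcpy(obj->uri, uri, len);
    obj->std.refcount = 1;
    return &obj->std;
}

/* Fetch: valid only because std is the first member, so both
 * pointers refer to the same address. */
toy_uri_object *toy_uri_fetch(toy_object *base)
{
    return (toy_uri_object *) base;
}

/* C-level destructor: clean up what the constructor allocated. */
void toy_uri_free(toy_object *base)
{
    toy_uri_object *obj = toy_uri_fetch(base);

    if (obj == NULL) {
        return;
    }
    free(obj->uri);
    free(obj);
}
```

The engine only ever sees the generic part; the fetch step is how your methods get back to the C data hiding behind it.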
these are all “magic” methods you can plug into at the C level (note that you can also provide standard __get, __set etc, but extending classes can override them)you can even do casting magic, closure magic, even meddle with gc stuff – but most people only really do the clone, debug info (for var_dump), and get/set property stuff
Why do you need to care about this? Because of mod_apache!!! Do you really need to know how this works? No. What DO you need to know? Pass this stuff around, and never use TSRMLS_FETCH: that is the evilest macro ever! PHP takes care of all of this for you, and this is normally the HARD stuff. Rule of thumb #1: stick an “e” in front of any memory stuff (emalloc, efree vs. malloc, free). Most of the time you won’t need to do much memory management; you’ll be using zend_parse_parameters stuff and zval manipulation, which will do all of that for you. You MIGHT need it for whatever library you’re wrapping, but that can be very individualized. TURN ON DEBUG, turn on crt debug (windows), turn on memory leak detection, compile with all errors, use static analysis if you can (clang, /analyze for windows, etc). All the PHP_METHOD and PHP_FUNCTION stuff you’ll do has TSRMLS stuff in there. When you write your own functions in C that you call from elsewhere, PASS THAT DAMN THING ALONG. TSRMLS_FETCH should almost NEVER be used (unless you’re writing a threading extension, and then, well, good luck with that). PHP has a special way of doing globals in an extension that abstracts away the “owie” parts, USE IT, unless you need truly global globals, and then you damn well better be locking them! (And you’re doing something crazy at that point.)
Documentation is probably the most important part of the process, but I’ll still leave it for last. If you’ve been writing examples and tests and protos and arginfo all along as I’ve whined about, you’re already 90 percent of the way there with docs, seriously. The easiest way to do this would be to use a generator; however, there aren’t any GOOD generators available right now. So just go to svn and copy another extension’s stuff. Everything is created with PhD so you can test stuff. For pecl you can put your code anywhere; for phpdoc you need to get your docs roughed out, then ask for them to be put in svn.
Go into here about how resources work: an integer pointer to a global table holding resources… or doing some other evil resource storage. Basically you’re storing and throwing about pointers and they’re slow (DO NOT). ini entries are a horrible thing: the use cases are extremely small. Threading and such I can rant about for quite a bit. And finally, you can override parts of the engine from extensions. This is why xdebug and things like intercept work, but they’re a bit beyond the scope of this talk.
There is SOOO much more you can do from hooking objects to hooking the engine!
Remember that a lot of the information on the internet, and in the extensions you look at, is out of date. These are some things that can help you, but your own experiences blogged would be great as well. I’m always sitting around in IRC, willing to help out with things.