Daniel Stenberg discusses some of the most common mistakes users make when using libcurl and what to do about them.
Video: https://youtu.be/0KfDdIAirSI
common libcurl mistakes
Documentation
Return codes
Verbose option
curl_global_init
Redirects
HTTP method
Certificate checks
Zero termination
C++ strings
Threading
CURLOPT_NOSIGNAL
-DCURL_STATICLIB
Set the URL
callback invokes
C++ methods
@bagder@bagder
Why are these mistakes made?
Humans are lazy
Code copied and pasted from questionable sources
Documentation is hard
Internet transfers are complicated
Maybe, just maybe, the curl way isn’t always the smartest...
Skipping the documentation
Lots of options have plain English names
This might trick you into thinking you know what an option does
It still might not work the way you presume
Copy and paste from random web sites
There are also details
The devil is always in the details
Lots of documentation
We offer man pages for every setopt option
We host over 100 stand-alone examples
Consider which docs you rely on (hello stackoverflow.com)
Failure to check return codes
Return codes are useful clues
How to know if the call succeeded?
How to know why something doesn’t do what you expected?
What if the feature isn’t even built-in?
Our own example source code might be a bad example here
Forgetting the verbose option
Strange, how come it doesn’t work?
Hm, why does it act like this?
Also:
/* please be verbose */
rc = curl_easy_setopt(curl, CURLOPT_VERBOSE, 1L);
/* provide a buffer to store errors in; errbuf must hold at least CURL_ERROR_SIZE bytes */
rc = curl_easy_setopt(curl, CURLOPT_ERRORBUFFER, errbuf);
libcurl or content?
By using verbose, you’ll spot whether it was libcurl that said this or whether it was actual content delivered from the server!
$ ./app
Error 505: HTTP Version Not Supported
Maybe even in production?
Consider offering it as a debug option
Direct the output somewhere suitable with CURLOPT_STDERR
Alternatively: CURLOPT_DEBUGFUNCTION
There's a global init function
It is called implicitly by curl_easy_init() if not done explicitly
Not calling it means relying on default, implicit behavior
It typically also implies not calling curl_global_cleanup()
This may result in not releasing all used memory (“Dear sirs, why does valgrind report that...”)
Consider the redirects!
HTTP/1.1 301 Moved Permanently
Server: M4gic server/3000
Retry-After: 0
Location: https://curl.haxx.se/
Content-Length: 0
Accept-Ranges: bytes
Date: Thu, 07 May 2020 08:59:56 GMT
Connection: close
Consider the redirects!
Rethink whether following redirects is a good idea
Limit which protocols redirects are allowed to use
Do not set custom HTTP methods on requests that follow redirects
Letting users set (parts of) the URL
Scheme (maybe even use another protocol?)
Host name (maybe target a malicious server)
Extreme lengths (pass in 2GB of data?)
Also consider other inputs: user name, password and so on risk being abused too
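One way to reduce the risk is a cheap check before the URL ever reaches libcurl. The sketch below assumes the application only ever needs `https://` URLs and that 2048 bytes is a sensible upper bound; both the limit and the helper name `url_is_acceptable` are hypothetical. It is not a URL parser, and libcurl's own URL API (`curl_url()` and friends) can do a more thorough job.

```c
#include <stddef.h>
#include <string.h>

#define MAX_URL_LEN 2048  /* assumed limit, pick one that suits the application */

/* Hypothetical pre-check for a user-supplied URL: returns 1 if it looks
   acceptable, 0 otherwise. It only checks length and scheme. */
static int url_is_acceptable(const char *url)
{
  static const char scheme[] = "https://";
  size_t len;

  if(!url)
    return 0;
  /* measure the length, but give up as soon as the limit is exceeded */
  for(len = 0; url[len]; len++) {
    if(len >= MAX_URL_LEN)
      return 0;
  }
  /* insist on the one scheme the application is meant to use; this strict
     comparison also rejects upper-case spellings such as "HTTPS://" */
  if(strncmp(url, scheme, sizeof(scheme) - 1) != 0)
    return 0;
  /* require something after the scheme */
  return len > sizeof(scheme) - 1;
}
```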
Setting the HTTP method
CURLOPT_CUSTOMREQUEST is a footgun
The custom method is also used in the follow-up requests after redirects
It only replaces the method string; it does not change libcurl's behavior
Disabled certificate checks
Widely abused and misunderstood
Only use while experimenting / developing
Never ship in production
This also goes for HTTPS proxies
SCP and SFTP are different
curl_easy_setopt(curl, CURLOPT_SSL_VERIFYHOST, 0L);
curl_easy_setopt(curl, CURLOPT_SSL_VERIFYPEER, 0L);
Verify server certificates!
Avoid man-in-the-middle attacks
HTTPS is not secure without it!
May require regularly updating the CA store
Alternative: CURLOPT_PINNEDPUBLICKEY
Assuming zero-terminated data in callbacks
CURLOPT_WRITEFUNCTION and CURLOPT_HEADERFUNCTION set callbacks
libcurl provides data to the application using these callbacks
The data is provided as a pointer to the data and the length of that data
When that data is primarily text oriented, many users wrongly assume that this means the data comes as zero-terminated “strings”.
size_t write_callback(char *dataptr, size_t size, size_t nmemb, void *userp);
The callback data is binary
The data isn’t text or “string” based
printf("%s", ...), strcpy(), strlen() and similar will not work on this pointer!
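A sketch of a callback that respects the length instead, assuming the application simply wants the bytes on stdout. The name `print_chunk` is hypothetical; it would be registered with CURLOPT_WRITEFUNCTION.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical write callback: outputs exactly the bytes it was given,
   using the length from libcurl and never looking for a terminating zero. */
static size_t print_chunk(char *dataptr, size_t size, size_t nmemb, void *userp)
{
  size_t len = size * nmemb;  /* number of bytes in this chunk */
  (void)userp;                /* unused in this sketch */

  /* fwrite takes an explicit length, so embedded zero bytes survive */
  if(fwrite(dataptr, 1, len, stdout) != len)
    return 0;   /* returning a count other than len makes libcurl abort */
  return len;
}
```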
C++ strings are not C strings
libcurl provides a C API
C and C++ are similar
C and C++ are also different!
C++ users like their std::string types
C++ strings are not C strings
curl_easy_setopt() takes a vararg, so a std::string is not converted to a C string for you
C++ string bad code
// Keep the URL as a C++ string object
std::string url("https://example.com/");
// Pass it to curl
curl_easy_setopt(curl, CURLOPT_URL, url);
C++ string good code
// Keep the URL as a C++ string object
std::string url("https://example.com/");
// Pass it to curl as a C string!
curl_easy_setopt(curl, CURLOPT_URL, url.c_str());
Threading mistakes
libcurl is thread-safe but there are caveats:
1) No concurrent use of handles
2) OpenSSL < 1.1.0 needs mutex callbacks set up
3) curl_global_init is not thread-safe yet
Understanding CURLOPT_NOSIGNAL
Signals are a Unix concept: “an asynchronous notification sent to a process or to a specific thread within the same process in order to notify it of an event that occurred”
Signals are complicated in a multi-threaded world and when used by a library
What does libcurl use signals for?
When using the synchronous name resolver, libcurl uses alarm() to abort slow name resolves (if a timeout is set), which ultimately sends a SIGALRM to the process that is caught by libcurl
libcurl installs its own signal handler while running, and restores the original one again on return, for SIGALRM and SIGPIPE.
Closing TLS (with OpenSSL) can trigger a SIGPIPE if the connection is dead.
Unless CURLOPT_NOSIGNAL is set!
What does CURLOPT_NOSIGNAL do?
It stops libcurl from triggering signals
It prevents libcurl from installing its own signal handler
Generated signals must then be handled by the libcurl-using application!
Forgetting -DCURL_STATICLIB
Building and using libcurl statically is easy and convenient
Seems especially popular on Windows
Requires the CURL_STATICLIB define to be set when building your application!
Omission causes linker errors:
"unknown symbol __imp__curl_easy_init"
Because Windows needs __declspec to be present or absent in the headers depending on how it links!
Static builds mean chasing deps
libcurl can use many 3rd party dependencies
When linking statically, all those need to be provided to the linker
The curl build scripts (as well as your application linking) usually need manual help to find them all
C++ methods
(Sibling to the C++ strings mistake)
C++ class methods look like functions
Non-static C++ class methods cannot be used as callbacks with libcurl
… since they assume a ‘this’ pointer to the current object
Static member functions work!
A C++ method that works
// Declared inside the class as:
//   static size_t func(char *buffer, size_t sz, size_t n, void *f);
// f is the pointer to your object.
size_t YourClass::func(char *buffer, size_t sz, size_t n, void *f)
{
// Call non-static member function.
static_cast<YourClass*>(f)->nonStaticFunction();
// Tell libcurl that all sz * n bytes were handled.
return sz * n;
}
// This is how you pass pointer to the static function:
curl_easy_setopt(hcurl, CURLOPT_WRITEFUNCTION, YourClass::func);
curl_easy_setopt(hcurl, CURLOPT_WRITEDATA, this);
Write callback invokes
Data is delivered by callback (CURLOPT_WRITEFUNCTION)
It might be called zero, one, two or many times
Never assume you will get a certain number of calls
This is independent of the amount of data
Because of network, server, kernel or other reasons
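A callback that copes with any number of invocations can append each chunk to a growing buffer. The sketch below assumes the whole response fits in memory; `struct response` and `collect` are hypothetical names, and the struct's address would be passed with CURLOPT_WRITEDATA.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growing buffer; start it out as { NULL, 0 }. */
struct response {
  char *data;   /* kept zero terminated by collect() */
  size_t len;   /* number of bytes received so far */
};

/* Hypothetical write callback: appends each chunk, however many arrive. */
static size_t collect(char *dataptr, size_t size, size_t nmemb, void *userp)
{
  struct response *r = (struct response *)userp;
  size_t chunk = size * nmemb;
  char *grown = realloc(r->data, r->len + chunk + 1);
  if(!grown)
    return 0;                 /* out of memory: makes libcurl abort */
  memcpy(grown + r->len, dataptr, chunk);
  r->data = grown;
  r->len += chunk;
  r->data[r->len] = '\0';     /* we add the terminator ourselves */
  return chunk;
}
```

If the callback is never invoked, `data` stays NULL, so the application should check `len` or the pointer before using the result.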
License
This presentation and its contents are
licensed under the Creative Commons
Attribution 4.0 license:
http://creativecommons.org/licenses/by/4.0/