common mistakes when using libcurl

Daniel Stenberg discusses some of the most common mistakes users make when using libcurl and what to do about them.


  1. May 7, 2020: Common mistakes when using libcurl - and how to fix them!
  2. Daniel Stenberg @bagder
  4. Common libcurl mistakes: Documentation, HTTP method, CURLOPT_NOSIGNAL, Return codes, Certificate checks, -DCURL_STATICLIB, Verbose option, Zero termination, Set the URL, curl_global_init, C++ strings, Callback invocations, Redirects, Threading, C++ methods
  5. Q&A in the end!
  6. Why are these mistakes made?
      • Humans are lazy
      • Copied and pasted from questionable sources
      • Documentation is hard
      • Internet transfers are complicated
      • Maybe, just maybe, the curl way isn’t always the smartest...
  7. Mistake 1
  8. Skipping the documentation
      • Lots of options have plain English names
      • Might trick you into thinking you know what they do
      • Still might not work like you presume they do
      • Copy and paste from random web sites
      • There are also details
      • The devil is always in the details
  9. Lots of documentation
      • We offer man pages for every setopt option
      • We host over 100 stand-alone examples
      • Consider which docs you rely on (hello
  10. Mistake 2
  11. Failure to check return codes
  12. Return codes are useful clues
      • How to know if the call succeeded?
      • How to know why something doesn’t do what you expected?
      • What if the feature isn’t even built-in?
      • Our example source codes might be bad examples
  13. Mistake 3
  14. Forgetting the verbose option
      • Strange, how come it doesn’t work?
      • Hm, why does it act like this?
          /* please be verbose */
          rc = curl_easy_setopt(curl, CURLOPT_VERBOSE, 1L);
      • Also:
          /* provide a buffer (at least CURL_ERROR_SIZE bytes) to store errors in */
          rc = curl_easy_setopt(curl, CURLOPT_ERRORBUFFER, errbuf);
  15. libcurl or content?
      • By using verbose, you’ll spot whether it was libcurl that said it or actual content delivered from the server!
          $ ./app
          Error 505: HTTP Version Not Supported
  16. Maybe even in production?
      • Consider it for debug options
      • Direct the output somewhere suitable with CURLOPT_STDERR
      • Alternatively: CURLOPT_DEBUGFUNCTION
  17. Mistake 4
  18. There's a global init function
      • It is called implicitly by curl_easy_perform() if not done explicitly
      • Not calling it means relying on default, implicit behavior
      • It typically then implies not calling curl_global_cleanup()
      • This may result in not releasing all used memory (“Dear sirs, why does valgrind report that...”)
  19. curl_global_init isn't thread-safe
      • curl_global_init needs to be called as a singleton
      • It is not thread-safe due to legacy and “reasons”
      • Will hopefully be rectified in the near future
  20. There's a global init function!
      • Call curl_global_init first
      • Alone!
      • Call curl_global_cleanup last
  21. Mistake 5
  22. Consider the redirects!
          HTTP/1.1 301 Moved Permanently
          Server: M4gic server/3000
          Retry-After: 0
          Location:
          Content-Length: 0
          Accept-Ranges: bytes
          Date: Thu, 07 May 2020 08:59:56 GMT
          Connection: close
  23. Consider the redirects!
      • Rethink if redirect-following is good
      • Limit which protocols redirects are allowed to use
      • Do not set custom HTTP methods on requests that follow redirects
  24. Mistake 6
  25. Let users set (parts of) the URL
      • Scheme (maybe even use another protocol?)
      • Host name (maybe target a malicious server)
      • Extreme lengths (pass in 2GB of data?)
      • Also consider other inputs: user name, password etc. risk getting abused
  26. Limit scope!
      • Set CURLOPT_PROTOCOLS!
      • Whitelist/filter
      • Set only a limited part of the URL
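One way to "set only a limited part of the URL" is to keep the scheme and host fixed in the application and accept just a short, filtered identifier from the user. The sketch below is an illustration, not part of the talk: the function name `build_item_url`, the host `api.example.com`, the `/items/` path and the 64-byte limit are all assumptions made for the example, and it deliberately uses only the C standard library.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical limit on the user-supplied part of the URL. */
#define MAX_ITEM_LEN 64

/* Builds "https://api.example.com/items/<item>" into out.
   Scheme, host and path prefix are fixed by the application; the user only
   controls <item>. Returns 0 on success and -1 if the input is rejected. */
int build_item_url(const char *item, char *out, size_t outlen)
{
    size_t len = 0;

    /* Walk the input once, giving up as soon as it is too long or contains
       a character outside the whitelist (letters, digits, '-' and '_'). */
    while(item[len] != '\0') {
        unsigned char c = (unsigned char)item[len];
        if(len >= MAX_ITEM_LEN)
            return -1;
        if(!isalnum(c) && c != '-' && c != '_')
            return -1;
        len++;
    }
    if(len == 0)
        return -1;

    /* snprintf() reports the length it wanted to write; treat truncation
       as a rejection instead of passing on a cut-off URL. */
    int n = snprintf(out, outlen, "https://api.example.com/items/%s", item);
    if(n < 0 || (size_t)n >= outlen)
        return -1;
    return 0;
}
```

The resulting string would then be handed to CURLOPT_URL, ideally together with CURLOPT_PROTOCOLS as the slide suggests, so that even a bug in the filter cannot switch the transfer to another protocol.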
  27. Mistake 7
  28. Setting the HTTP method
      • CURLOPT_CUSTOMREQUEST is a footgun
      • It will be used in follow-up requests as well, in redirects
      • It only changes the method string that is sent; it does not change libcurl's behavior
  29. Mistake 8
  30. Disabled certificate checks
      • Widely abused and misunderstood
      • Only use while experimenting / developing
      • Never ship in production
      • This also goes for HTTPS proxies
      • SCP and SFTP are different
          curl_easy_setopt(curl, CURLOPT_SSL_VERIFYHOST, 0L);
          curl_easy_setopt(curl, CURLOPT_SSL_VERIFYPEER, 0L);
  31. Verify server certificates!
      • Avoid man-in-the-middle attacks
      • HTTPS is not secure without it!
      • May require regularly updating the CA store
      • Alternative: CURLOPT_PINNEDPUBLICKEY
  32. Mistake 9
  33. Assume zero terminated data in callbacks
      • CURLOPT_WRITEFUNCTION and CURLOPT_HEADERFUNCTION set callbacks
      • libcurl provides data to the application using these callbacks
      • The data is provided as a pointer to the data and the length of that data
      • When that data is primarily text oriented, many users wrongly assume that this means the data comes as zero terminated “strings”.
          size_t write_callback(char *dataptr, size_t size, size_t nmemb, void *userp);
  34. Typical mistake
          size_t cb(char *dataptr, size_t size, size_t nmemb, void *userp)
          {
            printf("Incoming data: %s\n", dataptr);
            if(!strncmp("Foo:", dataptr, 4)) {
              ...
            }
            char *pos = strchr(dataptr, '\n');
          }
  35. The callback data is binary
      • The data isn’t text or “string” based
      • printf("%s", ...), strcpy(), strlen() and similar will not work on this pointer!
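A length-aware version of the "typical mistake" callback could look roughly like the sketch below. It keeps the callback signature from the slides but is written against the C standard library only, so it can be read and tried without libcurl. The `header_state` struct and the name `header_cb` are hypothetical, introduced just for this example. In a real program the function would be registered with CURLOPT_HEADERFUNCTION and the struct with CURLOPT_HEADERDATA.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical per-transfer state, handed to the callback through userp. */
struct header_state {
    int saw_foo;         /* set to 1 once a line starting with "Foo:" is seen */
    size_t line_length;  /* length of the latest line without its CR/LF */
};

/* Same signature as the callbacks on the slides. dataptr points to
   size * nmemb bytes and is NOT zero terminated. */
size_t header_cb(char *dataptr, size_t size, size_t nmemb, void *userp)
{
    struct header_state *state = userp;
    size_t len = size * nmemb;

    /* Print exactly len bytes instead of relying on a terminator. */
    fputs("Incoming data: ", stdout);
    fwrite(dataptr, 1, len, stdout);

    /* Check the length before comparing a fixed-size prefix. */
    if(len >= 4 && !memcmp("Foo:", dataptr, 4))
        state->saw_foo = 1;

    /* memchr() takes an explicit length; strchr() could read past the end. */
    char *pos = memchr(dataptr, '\n', len);
    state->line_length = pos ? (size_t)(pos - dataptr) : len;
    if(state->line_length > 0 && dataptr[state->line_length - 1] == '\r')
        state->line_length--;

    /* Report every byte as handled; any other value signals an error. */
    return len;
}
```

Every access is bounded by `len`, so the function behaves the same whether or not the byte after the buffer happens to be zero.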
  36. Mistake 10
  37. C++ strings are not C strings
      • libcurl provides a C API
      • C and C++ are similar
      • C and C++ are also different!
      • C++ users like their std::string types
      • C++ strings are not C strings
      • curl_easy_setopt() takes a vararg...
  38. C++ string bad code
          // Keep the URL as a C++ string object
          std::string url("");
          // Pass it to curl
          curl_easy_setopt(curl, CURLOPT_URL, url);
  39. C++ string good code
          // Keep the URL as a C++ string object
          std::string url("");
          // Pass it to curl as a C string!
          curl_easy_setopt(curl, CURLOPT_URL, url.c_str());
  40. Mistake 11
  41. Threading mistakes
      • libcurl is thread-safe, but there are caveats:
          1) No concurrent use of handles
          2) OpenSSL < 1.1.0 needs mutex callbacks set up
          3) curl_global_init is not thread-safe yet
  42. Mistake 12
  43. Understanding CURLOPT_NOSIGNAL
      • Signals are a Unix concept: “an asynchronous notification sent to a process or to a specific thread within the same process in order to notify it of an event that occurred”
      • Signals are complicated in a multi-threaded world and when used by a library
  44. What does libcurl use signals for?
      • When using the synchronous name resolver, libcurl uses alarm() to abort slow name resolves (if a timeout is set), which ultimately sends a SIGALRM to the process that is caught by libcurl
      • libcurl installs its own sighandler while running, and restores the original one again on return, for SIGALRM and SIGPIPE
      • Closing TLS (with OpenSSL) can trigger a SIGPIPE if the connection is dead
      • Unless CURLOPT_NOSIGNAL is set!
  45. What does CURLOPT_NOSIGNAL do?
      • It stops libcurl from triggering signals
      • It prevents libcurl from installing its own sighandler
      • Generated signals must then be handled by the libcurl-using application!
  46. Mistake 13
  47. Forgetting -DCURL_STATICLIB
      • Creating and using libcurl statically is easy and convenient
      • Seems especially popular on Windows
      • Requires the CURL_STATICLIB define to be set when building your application!
      • Omission causes linker errors: "unknown symbol __imp__curl_easy_init"
      • Because Windows needs __declspec to be present or absent in the headers depending on how it links!
  48. Static builds mean chasing deps
      • libcurl can use many 3rd party dependencies
      • When linking statically, all those need to be provided to the linker
      • The curl build scripts (as well as your application linking) usually need manual help to find them all
  49. Mistake 14
  50. C++ methods
      • (Sibling to the C++ strings mistake)
      • C++ class methods look like functions
      • C++ class methods cannot be used as callbacks with libcurl
      • … since they assume a ‘this’ pointer to the current object
      • Static member functions work!
  51. A C++ method that works
          // f is the pointer to your object.
          // func must be declared static inside the class definition.
          size_t YourClass::func(char *buffer, size_t sz, size_t n, void *f)
          {
            // Call non-static member function.
            static_cast<YourClass*>(f)->nonStaticFunction();
            // Tell libcurl that all bytes were handled.
            return sz * n;
          }
          // This is how you pass pointer to the static function:
          curl_easy_setopt(hcurl, CURLOPT_WRITEFUNCTION, YourClass::func);
          curl_easy_setopt(hcurl, CURLOPT_WRITEDATA, this);
  52. Mistake 15
  53. Write callback invocations
      • Data is delivered by callback (CURLOPT_WRITEFUNCTION)
      • It might be called zero, one, two or many times
      • Never assume you will get a certain number of calls
      • Independently of the data amount
      • Because of network, server, kernel or other reasons
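A callback that makes no assumption about how often it is called usually just appends whatever arrives to a growing buffer. The following is a minimal sketch of that idea using only the C standard library. The `memory` struct and the name `collect_cb` are assumptions for illustration. With libcurl, the function would be set with CURLOPT_WRITEFUNCTION and a pointer to the struct with CURLOPT_WRITEDATA.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical accumulator; start with data = NULL and size = 0. */
struct memory {
    char *data;   /* collected bytes, kept zero terminated by collect_cb */
    size_t size;  /* number of bytes stored, not counting the terminator */
};

/* Appends each delivered chunk, however many chunks there turn out to be. */
size_t collect_cb(char *dataptr, size_t size, size_t nmemb, void *userp)
{
    struct memory *mem = userp;
    size_t len = size * nmemb;

    if(len == 0)
        return 0;  /* nothing delivered in this call, nothing to store */

    /* Grow the buffer by the chunk size plus room for a terminator. */
    char *grown = realloc(mem->data, mem->size + len + 1);
    if(!grown)
        return 0;  /* differs from len, which signals an error to the caller */

    memcpy(grown + mem->size, dataptr, len);
    mem->data = grown;
    mem->size += len;
    mem->data[mem->size] = '\0';
    return len;
}
```

If the callback is never invoked, for example because the response has no body, `data` stays NULL and `size` stays 0, so the code that runs after the transfer should check for that before using the buffer, and free it when done.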
  54. You can help!
  56. Daniel Stenberg @bagder: Thank you! Questions?
  57. License: This presentation and its contents are licensed under the Creative Commons Attribution 4.0 license: