Parsing JSON Really Quickly: Lessons Learned

Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/2PXv97g.

Daniel Lemire talks about the lessons learned while writing the fast JSON parser, simdjson. One of the most important lessons is the value of a nearly obsessive focus on performance metrics: constantly measuring the impact of each choice. Filmed at qconsf.com.

Daniel Lemire is a computer science professor at the Université du Québec (TÉLUQ). He has written over 70 peer-reviewed publications, including more than 40 journal articles. He serves on the program committees of leading computer science conferences. During the 2016-2017 NSERC Discovery Grant competition, he received a rating of outstanding for the excellence of the researcher.

Published in: Technology

Parsing JSON Really Quickly: Lessons Learned

  1. 1. InfoQ.com: News & Community Site • Over 1,000,000 software developers, architects and CTOs read the site worldwide every month • 250,000 senior developers subscribe to our weekly newsletter • Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese) • Post content from our QCon conferences • 2 dedicated podcast channels: The InfoQ Podcast, with a focus on Architecture and The Engineering Culture Podcast, with a focus on building • 96 deep dives on innovative topics packed as downloadable emags and minibooks • Over 40 new content items per week • Watch the video with slide synchronization on InfoQ.com! https://www.infoq.com/presentations/simdjson-parser/
  2. 2. Purpose of QCon - to empower software development by facilitating the spread of knowledge and innovation Strategy - practitioner-driven conference designed for YOU: influencers of change and innovation in your teams - speakers and topics driving the evolution and innovation - connecting and catalyzing the influencers and innovators Highlights - attended by more than 12,000 delegates since 2007 - held in 9 cities worldwide Presented at QCon San Francisco www.qconsf.com
  3. 3. Parsing JSON Really Quickly: Lessons Learned. Daniel Lemire, professor (Computer Science) at Université du Québec (TÉLUQ), Montreal • blog: https://lemire.me • twitter: @lemire • GitHub: https://github.com/lemire/
  4. 4. How fast can you read a large file? Are you limited by your disk, or are you limited by your CPU?
  5. 5. An iMac disk: 2.2 GB/s. Faster SSDs (e.g., 5 GB/s) are available.
  6. 6. Reading text lines (CPU only): ~0.6 GB/s on a 3.4 GHz Skylake in Java
          void parseLine(String s) {
            volume += s.length();
          }
          void readString(StringReader data) {
            BufferedReader bf = new BufferedReader(data);
            bf.lines().forEach(s -> parseLine(s));
          }
        Source available. Improved by JDK-8229022.
  7. 7. Reading text lines (CPU only): ~1.5 GB/s on a 3.4 GHz Skylake in C++ (GNU GCC 8.3)
          size_t sum_line_lengths(char * data, size_t length) {
            std::stringstream is;
            is.rdbuf()->pubsetbuf(data, length);
            std::string line;
            size_t sumofalllinelengths{0};
            while(getline(is, line)) {
              sumofalllinelengths += line.size();
            }
            return sumofalllinelengths;
          }
        Source available.
  8. 8. [Figure, with a link to its source]
  9. 9. JSON. Specified by Douglas Crockford; RFC 7159 by Tim Bray in 2014. Ubiquitous format to exchange data.
          {"Image": {"Width": 800, "Height": 600,
                     "Title": "View from 15th Floor",
                     "Thumbnail": {"Url": "http://www.example.com/81989943",
                                   "Height": 125, "Width": 100}
                    }
          }
  10. 10. "Our backend spends half its time serializing and deserializing json"
  11. 11. JSON parsing: read all of the content; check that it is valid JSON; check Unicode encoding; parse numbers; build DOM (document-object-model). Harder than parsing lines?
  12. 12. Jackson JSON speed (Java). twitter.json: 0.35 GB/s on a 3.4 GHz Skylake. Source code available.
                              speed
            Jackson (Java)    0.35 GB/s
            readLines C++     1.5 GB/s
            disk              2.2 GB/s
  13. 13. RapidJSON speed (C++). twitter.json: 0.65 GB/s on a 3.4 GHz Skylake.
                              speed
            RapidJSON (C++)   0.65 GB/s
            Jackson (Java)    0.35 GB/s
            readLines C++     1.5 GB/s
            disk              2.2 GB/s
  14. 14. simdjson speed (C++). twitter.json: 2.4 GB/s on a 3.4 GHz Skylake.
                              speed
            simdjson (C++)    2.4 GB/s
            RapidJSON (C++)   0.65 GB/s
            Jackson (Java)    0.35 GB/s
            readLines C++     1.5 GB/s
            disk              2.2 GB/s
  15. 15. 2.4 GB/s on a 3.4 GHz (+turbo) processor is ~1.5 cycles per input byte.
  16. 16. Trick #1: avoid hard-to-predict branches
  17. 17. Write random numbers on an array.
            while (howmany != 0) {
              out[index] = random();
              index += 1;
              howmany--;
            }
          e.g., ~3 cycles per iteration
  18. 18. Write only odd random numbers:
            while (howmany != 0) {
              val = random();
              if (val is odd) { // <=== new
                out[index] = val;
                index += 1;
              }
              howmany--;
            }
  19. 19. From 3 cycles to 15 cycles per value!
  20. 20. Go branchless!
            while (howmany != 0) {
              val = random();
              out[index] = val;
              index += (val bitand 1);
              howmany--;
            }
          Back to under 4 cycles! Details and code available.
  21. 21. What if I keep running the same benchmark? (same pseudo-random integers from run to run)
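
The two loops above can be compared side by side. The following is a minimal sketch, assuming the values come from an input array instead of `random()`; the function names are made up for this illustration. An optimizing compiler may or may not turn the branchy loop into branchless code on its own, so measuring remains necessary.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Branchy version: with random input, the `if` is hard to predict.
size_t keep_odd_branchy(const uint32_t* in, size_t n, uint32_t* out) {
  assert(in != nullptr && out != nullptr);
  size_t index = 0;
  for (size_t i = 0; i < n; ++i) {
    if (in[i] & 1) {
      out[index] = in[i];
      index += 1;
    }
  }
  return index;  // number of odd values written
}

// Branchless version: always store, then advance the write index by 0 or 1.
// `out` needs room for n values, because even values are stored as well
// (each one is overwritten by the next store or lies past the returned count).
size_t keep_odd_branchless(const uint32_t* in, size_t n, uint32_t* out) {
  assert(in != nullptr && out != nullptr);
  size_t index = 0;
  for (size_t i = 0; i < n; ++i) {
    out[index] = in[i];
    index += in[i] & 1;
  }
  return index;  // number of odd values written
}
```

Both functions return the same count and leave the same odd values at the front of `out`; only the control flow differs.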
  22. 22. Trick #2: use wide "words". Don't process byte by byte.
  23. 23. When possible, use SIMD. Available on most commodity processors (ARM, x64). Originally added (Pentium) for multimedia (sound). Adds wider (128-bit, 256-bit, 512-bit) registers. Adds new fun instructions: do 32 table lookups at once.
  24. 24. ISA                   where                          max. register width
          ARM NEON (AArch64)    mobile phones, tablets         128-bit
          SSE2 ... SSE4.2       legacy x64 (Intel, AMD)        128-bit
          AVX, AVX2             mainstream x64 (Intel, AMD)    256-bit
          AVX-512               latest x64 (Intel)             512-bit
  25. 25. "Intrinsic" functions (C, C++, Rust, ...) mapping to specific instructions on specific instruction sets. Higher-level functions (Swift, C++, ...); Java Vector API. Autovectorization ("compiler magic") (Java, C, C++, ...). Optimized functions (some in Java). Assembly (e.g., in crypto).
  26. 26. Trick #3: avoid memory/object allocation
  27. 27. In simdjson, the DOM (document-object-model) is stored on one contiguous tape.
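
To make the idea of a contiguous tape concrete, here is a small sketch of a tree flattened into a single array of 64-bit words. The layout (a one-byte tag plus a 56-bit payload, with brackets pointing at each other) and all names are assumptions made for this illustration; it is not a description of simdjson's actual tape format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical layout: top 8 bits hold a tag, low 56 bits hold a payload.
constexpr uint64_t kPayloadMask = 0x00FFFFFFFFFFFFFFULL;

constexpr uint64_t make_word(char tag, uint64_t payload) {
  return (static_cast<uint64_t>(static_cast<uint8_t>(tag)) << 56) |
         (payload & kPayloadMask);
}
constexpr char tag_of(uint64_t word) { return static_cast<char>(word >> 56); }
constexpr uint64_t payload_of(uint64_t word) { return word & kPayloadMask; }

// Hand-built tape for the document [1, [2, 3]]: one allocation, no nodes.
std::vector<uint64_t> example_tape() {
  return {
      make_word('[', 7),  // 0: payload = index just past the matching ']'
      make_word('i', 1),  // 1: integer 1
      make_word('[', 6),  // 2: nested array, matching ']' is at index 5
      make_word('i', 2),  // 3: integer 2
      make_word('i', 3),  // 4: integer 3
      make_word(']', 2),  // 5: payload = index of the matching '['
      make_word(']', 0),  // 6: payload = index of the matching '['
  };
}

// A full traversal is a single linear scan over contiguous memory.
uint64_t sum_integers(const std::vector<uint64_t>& tape) {
  uint64_t sum = 0;
  for (uint64_t word : tape) {
    if (tag_of(word) == 'i') sum += payload_of(word);
  }
  return sum;
}

// Counts the direct children of the array opened at index `open`;
// nested arrays are skipped in one step thanks to the stored index.
size_t count_children(const std::vector<uint64_t>& tape, size_t open) {
  assert(open < tape.size() && tag_of(tape[open]) == '[');
  const size_t close = static_cast<size_t>(payload_of(tape[open])) - 1;
  size_t children = 0;
  for (size_t i = open + 1; i < close;) {
    ++children;
    i = (tag_of(tape[i]) == '[') ? static_cast<size_t>(payload_of(tape[i]))
                                 : i + 1;
  }
  return children;
}
```

The point of the design is that building the document needs no per-node allocation, and walking it touches memory sequentially.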
  28. 28. Trick #4: measure the performance! Benchmark-driven development.
  29. 29. Continuous integration performance tests: a performance regression is a bug that should be spotted early.
  30. 30. Processor frequencies are not constant, especially on laptops. CPU cycles are different from time. Time can be noisier than CPU cycles.
  31. 31. Specific examples
  32. 32. Example 1. UTF-8. Strings are ASCII (1 byte per code point); otherwise multiple bytes (2, 3 or 4). Only 1.1 M valid UTF-8 code points.
  33. 33. Validating UTF-8 with if/else/while
            if (byte1 < 0x80) {
              return true; // ASCII
            }
            if (byte1 < 0xE0) {
              if (byte1 < 0xC2 || byte2 > 0xBF) {
                return false;
              }
            } else if (byte1 < 0xF0) {
              // Three-byte form.
              if (byte2 > 0xBF
                  || (byte1 == 0xE0 && byte2 < 0xA0)
                  || (byte1 == 0xED && 0xA0 <= byte2)
                  blablabla
              ) blablabla
            } else {
              // Four-byte form.
              .... blabla
            }
  34. 34. Using SIMD: load 32-byte registers; use ~20 instructions; no branch, no branch misprediction.
  35. 35. Example: verify that all byte values are no larger than 244. Saturated subtraction: x - 244 is non-zero if and only if x > 244.
            _mm256_subs_epu8(current_bytes, 244);
          One instruction, checks 32 bytes at once!
  36. 36. Processing random UTF-8:
                         cycles/byte
            branching    11
            simdjson     0.5
          20x faster! Source code available.
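
The saturated-subtraction check can be sketched as a complete function. This version is an illustration under stated assumptions: it uses the 128-bit SSE2 form (`_mm_subs_epu8`, 16 bytes at a time) so that it builds on any x64 compiler without extra flags, with a scalar fallback elsewhere; the 256-bit `_mm256_subs_epu8` shown on the slide works the same way on 32 bytes. In real code the constant 244 has to be broadcast into a vector first, and the function name is made up here. The value 244 is 0xF4, the largest byte that can occur in valid UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// Returns true if no byte in data[0..len) is larger than 244 (0xF4).
bool all_bytes_at_most_244(const uint8_t* data, size_t len) {
  assert(data != nullptr || len == 0);
  size_t i = 0;
#if defined(__SSE2__)
  const __m128i limit = _mm_set1_epi8(static_cast<char>(244));
  __m128i excess = _mm_setzero_si128();
  for (; i + 16 <= len; i += 16) {
    const __m128i bytes =
        _mm_loadu_si128(reinterpret_cast<const __m128i*>(data + i));
    // Saturating x - 244 is zero unless x > 244; OR the results together so
    // that the loop body itself contains no data-dependent branch.
    excess = _mm_or_si128(excess, _mm_subs_epu8(bytes, limit));
  }
  // One check at the end: are all 16 accumulated lanes still zero?
  const __m128i is_zero = _mm_cmpeq_epi8(excess, _mm_setzero_si128());
  if (_mm_movemask_epi8(is_zero) != 0xFFFF) return false;
#endif
  // Scalar tail (and the whole input when SSE2 is unavailable).
  for (; i < len; ++i) {
    if (data[i] > 244) return false;
  }
  return true;
}
```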
  37. 37. Example 2. Classifying characters
            comma (0x2c): ,
            colon (0x3a): :
            brackets (0x5b, 0x5d, 0x7b, 0x7d): [, ], {, }
            white-space (0x09, 0x0a, 0x0d, 0x20)
            others
          Classify 16, 32 or 64 characters at once!
  38. 38. Divide values into two 'nibbles': 0x2c is 2 (high nibble) and c (low nibble). There are 16 possible low nibbles. There are 16 possible high nibbles.
  39. 39. ARM NEON and x64 processors have instructions to look up 16-byte tables in a vectorized manner (16 values at a time): pshufb, tbl
  40. 40. Start with an array of 4-bit values:
            [1, 1, 0, 2, 0, 5, 10, 15, 7, 8, 13, 9, 0, 13, 5, 1]
          Create a lookup table (0 → 200, 1 → 201, 2 → 202, ...):
            [200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215]
          Result:
            [201, 201, 200, 202, 200, 205, 210, 215, 207, 208, 213, 209, 200, 213, 205, 201]
  41. 41. Find two tables H1 and H2 such that the bitwise AND of the lookups classifies the characters: H1(low(c)) & H2(high(c))
            comma (0x2c): 1
            colon (0x3a): 2
            brackets (0x5b, 0x5d, 0x7b, 0x7d): 4
            most white-space (0x09, 0x0a, 0x0d): 8
            white space (0x20): 16
            others: 0
  42. 42.   const uint8x16_t low_nibble_mask =
                (uint8x16_t){16, 0, 0, 0, 0, 0, 0, 0, 0, 8, 12, 1, 2, 9, 0, 0};
            const uint8x16_t high_nibble_mask =
                (uint8x16_t){8, 0, 18, 4, 0, 1, 0, 1, 0, 0, 0, 3, 2, 1, 0, 0};
            const uint8x16_t low_nib_and_mask = vmovq_n_u8(0xf);
          Five instructions:
            uint8x16_t nib_lo = vandq_u8(chunk, low_nib_and_mask);
            uint8x16_t nib_hi = vshrq_n_u8(chunk, 4);
            uint8x16_t shuf_lo = vqtbl1q_u8(low_nibble_mask, nib_lo);
            uint8x16_t shuf_hi = vqtbl1q_u8(high_nibble_mask, nib_hi);
            return vandq_u8(shuf_lo, shuf_hi);
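
The two-table lookup can be modelled in portable scalar code, which makes it easy to check what the tables produce. The sketch below reuses the two 16-entry tables from the NEON code above and applies them one byte at a time; a vector instruction such as `tbl` or `pshufb` performs the same lookups for 16 bytes at once. The function names are made up for this illustration. With these particular tables, ASCII input yields 1 for brackets, 2 for comma, 4 for colon, 8 for tab/newline/carriage return, 16 for space and 0 otherwise, so the numeric labels differ from the listing on the previous slide while the grouping is the same. Bytes at or above 0x80 are not covered by this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Tables copied from the slide: one indexed by the low nibble, one by the high.
constexpr uint8_t kLowNibbleTable[16] = {16, 0, 0, 0, 0, 0, 0, 0,
                                         0,  8, 12, 1, 2, 9, 0, 0};
constexpr uint8_t kHighNibbleTable[16] = {8, 0, 18, 4, 0, 1, 0, 1,
                                          0, 0, 0,  3, 2, 1, 0, 0};

// Scalar model of one lane: two table lookups and one bitwise AND.
inline uint8_t classify(uint8_t c) {
  return kLowNibbleTable[c & 0x0F] & kHighNibbleTable[c >> 4];
}

// Scalar model of the vector routine: classify n bytes without any
// data-dependent branch in the loop body.
void classify_block(const uint8_t* in, size_t n, uint8_t* out) {
  assert(in != nullptr && out != nullptr);
  for (size_t i = 0; i < n; ++i) {
    out[i] = classify(in[i]);
  }
}
```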
  43. 43. Example 3. Detecting escaped characters " " " "
  44. 44. Can you tell where the strings start and end?
            { "\\\"Nam[{": [ 116,"\\\\" ...
          Without branching?
  45. 45. Escaped characters follow an odd-length sequence of backslashes!
  46. 46. Identify backslashes:
            { "\\\"Nam[{": [ 116,"\\\\"
            ___111________________1111_ : B
          Odd and even positions:
            1_1_1_1_1_1_1_1_1_1_1_1_1_1 : E (constant)
            _1_1_1_1_1_1_1_1_1_1_1_1_1_ : O (constant)
  47. 47. Do a bunch of arithmetic and logical operations...
            (((B + (B &~(B << 1)& E))& ~B)& ~E) | (((B + ((B &~(B << 1))& O))& ~B)& E)
          Result:
            { "\\\"Nam[{": [ 116,"\\\\" ...
            ______1____________________
          No branch!
  48. 48. Remove the escaped quotes, and the remaining quotes tell you where the strings are!
  49. 49.   { "\\\"Nam[{": [ 116,"\\\\"
            __1___1_____1________1____1 : all quotes
            ______1____________________ : escaped quotes
            __1_________1________1____1 : string-delimiter quotes
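
The backslash arithmetic can be tried out on a single 64-bit word. The sketch below is an illustration, not the library's code: bit i of each mask stands for character i of the input (the first character is the least significant bit), the helper names are made up, and carries between consecutive 64-bit words are ignored, which a full parser would have to handle.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Bit i of the result is set when text[i] == c (first 64 characters only).
uint64_t char_mask(const char* text, char c) {
  assert(text != nullptr);
  const size_t n = std::strlen(text);
  uint64_t mask = 0;
  for (size_t i = 0; i < n && i < 64; ++i) {
    if (text[i] == c) mask |= uint64_t{1} << i;
  }
  return mask;
}

// Given the backslash mask B, returns the positions that directly follow an
// odd-length run of backslashes, i.e. the escaped characters.
uint64_t escaped_positions(uint64_t B) {
  const uint64_t E = 0x5555555555555555ULL;  // even positions
  const uint64_t O = 0xAAAAAAAAAAAAAAAAULL;  // odd positions
  const uint64_t starts = B & ~(B << 1);     // first backslash of each run
  // Adding a run's start bit to B makes a carry ripple through the run and
  // land on the first position after it.
  const uint64_t after_even_start = (B + (starts & E)) & ~B;
  const uint64_t after_odd_start = (B + (starts & O)) & ~B;
  // A run has odd length when its start and the position after it have the
  // same parity... measured the other way round: even start ends on odd,
  // odd start ends on even.
  return (after_even_start & ~E) | (after_odd_start & E);
}

// Quotes that are not escaped: these delimit the strings.
uint64_t string_delimiter_quotes(const char* text) {
  const uint64_t quotes = char_mask(text, '"');
  const uint64_t escaped = escaped_positions(char_mask(text, '\\'));
  return quotes & ~escaped;
}
```

On the 27-character example from the slides, the only escaped character is the quote at position 6, and the delimiter quotes are at positions 2, 12, 21 and 26.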
  50. 50. Find the span of the string
            mask = quote xor (quote << 1);
            mask = mask xor (mask << 2);
            mask = mask xor (mask << 4);
            mask = mask xor (mask << 8);
            mask = mask xor (mask << 16);
            ...
            __1_________1________1____1 (quotes)
          becomes
            __1111111111_________11111_ (string region)
  51. 51. Entire structure of the JSON document can be identified (as a bitset) without any branch!
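
The shift-and-xor sequence is a prefix XOR: bit i of the output is the XOR of input bits 0 through i, so the output is 1 from each opening quote up to the position before the closing quote. A minimal sketch for one 64-bit word follows, with the final shift by 32 written out; the function name is an assumption. The simdjson paper describes computing the same result with a carry-less multiplication where the hardware supports it.

```cpp
#include <cassert>
#include <cstdint>

// Prefix XOR over a 64-bit word: result bit i = XOR of quote bits 0..i.
// Bit 0 corresponds to the first character of the input.
uint64_t prefix_xor(uint64_t quote) {
  uint64_t mask = quote ^ (quote << 1);
  mask ^= mask << 2;
  mask ^= mask << 4;
  mask ^= mask << 8;
  mask ^= mask << 16;
  mask ^= mask << 32;
  assert((mask & 1) == (quote & 1));  // bit 0 is never changed by the shifts
  return mask;
}
```

With the delimiter quotes of the running example at positions 2, 12, 21 and 26, the result covers positions 2 to 11 and 21 to 25, matching the string region shown on the slide.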
  52. 52. Example 4. Decode bit indexes. Given the bitset 1000100010001, we want the location of the 1s (e.g., 0, 4, 8, 12).
  53. 53.   while (word != 0) {
              result[i] = trailingzeroes(word);
              word = word & (word - 1);
              i++;
            }
          If the number of 1s per 64-bit word is hard to predict: lots of mispredictions!!!
  54. 54. Instead of predicting the number of 1s per 64-bit word, predict whether it is in {1, 2, 3, 4}, {5, 6, 7, 8}, {9, 10, 11, 12}. Easier!
  55. 55. Reduce the number of mispredictions by doing more work per iteration:
            while (word != 0) {
              result[i] = trailingzeroes(word);
              word = word & (word - 1);
              result[i+1] = trailingzeroes(word);
              word = word & (word - 1);
              result[i+2] = trailingzeroes(word);
              word = word & (word - 1);
              result[i+3] = trailingzeroes(word);
              word = word & (word - 1);
              i+=4;
            }
          Discard bogus indexes by counting the number of 1s in the word directly (e.g., bitCount)
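
A self-contained version of the unrolled loop might look as follows. It is a sketch under a few assumptions: it relies on the GCC/Clang builtins `__builtin_ctzll` and `__builtin_popcountll`, it wraps the trailing-zero count so that a zero word gives 64 (the builtin is undefined for zero, and the unrolled body does reach zero mid-iteration), and it returns a `std::vector` for convenience where a parser would write into a preallocated buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Trailing-zero count that is well defined for 0 (returns 64).
inline uint32_t trailing_zeroes(uint64_t word) {
  return word != 0 ? static_cast<uint32_t>(__builtin_ctzll(word)) : 64u;
}

// Returns the positions of the set bits of `word`, in increasing order.
std::vector<uint32_t> decode_bit_indexes(uint64_t word) {
  const size_t count = static_cast<size_t>(__builtin_popcountll(word));
  // Round the capacity up to a multiple of 4: the loop always writes 4 slots.
  std::vector<uint32_t> result((count + 3) / 4 * 4);
  size_t i = 0;
  while (word != 0) {  // one loop branch per 4 extracted indexes
    result[i] = trailing_zeroes(word);
    word &= word - 1;  // clear the lowest set bit
    result[i + 1] = trailing_zeroes(word);
    word &= word - 1;
    result[i + 2] = trailing_zeroes(word);
    word &= word - 1;
    result[i + 3] = trailing_zeroes(word);
    word &= word - 1;
    i += 4;
  }
  assert(i == result.size());
  result.resize(count);  // discard the bogus trailing entries
  return result;
}
```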
  56. 56. Example 5. Number parsing is expensive. strtod: 90 MB/s; 38 cycles per byte; 10 branch misses per floating-point number.
  57. 57. Check whether we have 8 consecutive digits
            bool is_made_of_eight_digits_fast(const char *chars) {
              uint64_t val;
              memcpy(&val, chars, 8);
              return (((val & 0xF0F0F0F0F0F0F0F0) |
                       (((val + 0x0606060606060606) & 0xF0F0F0F0F0F0F0F0) >> 4)) ==
                      0x3333333333333333);
            }
  58. 58. Then construct the corresponding integer, using only three multiplications (instead of 7):
            uint32_t parse_eight_digits_unrolled(const char *chars) {
              uint64_t val;
              memcpy(&val, chars, sizeof(uint64_t));
              val = (val & 0x0F0F0F0F0F0F0F0F) * 2561 >> 8;
              val = (val & 0x00FF00FF00FF00FF) * 6553601 >> 16;
              return (val & 0x0000FFFF0000FFFF) * 42949672960001 >> 32;
            }
          Can do even better with SIMD
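
The two routines are meant to be used together: test eight characters at once, convert them in one step, and fall back to a digit-by-digit loop for the remainder. The sketch below repeats both functions so that it compiles on its own and adds a hypothetical `parse_digits` helper that is not part of the slides. It assumes a little-endian machine (the first character must end up in the lowest byte of the 64-bit word) and does no overflow checking.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// True if chars[0..8) are all ASCII digits ('0'..'9').
bool is_made_of_eight_digits_fast(const char* chars) {
  uint64_t val;
  std::memcpy(&val, chars, 8);
  return (((val & 0xF0F0F0F0F0F0F0F0ULL) |
           (((val + 0x0606060606060606ULL) & 0xF0F0F0F0F0F0F0F0ULL) >> 4)) ==
          0x3333333333333333ULL);
}

// Converts eight ASCII digits to an integer with three multiplications:
// pairs of digits, then groups of four, then all eight.
uint32_t parse_eight_digits_unrolled(const char* chars) {
  uint64_t val;
  std::memcpy(&val, chars, sizeof(uint64_t));
  val = (val & 0x0F0F0F0F0F0F0F0FULL) * 2561 >> 8;      // 2561 = 10 * 2^8 + 1
  val = (val & 0x00FF00FF00FF00FFULL) * 6553601 >> 16;  // 100 * 2^16 + 1
  return static_cast<uint32_t>(
      (val & 0x0000FFFF0000FFFFULL) * 42949672960001ULL >> 32);  // 10000 * 2^32 + 1
}

// Hypothetical helper: parses the leading decimal digits of [p, end) and
// stores the position of the first non-digit in *stop. No overflow check.
uint64_t parse_digits(const char* p, const char* end, const char** stop) {
  assert(p != nullptr && end != nullptr && stop != nullptr && p <= end);
  uint64_t value = 0;
  while (end - p >= 8 && is_made_of_eight_digits_fast(p)) {
    value = value * 100000000ULL + parse_eight_digits_unrolled(p);
    p += 8;
  }
  while (p != end && *p >= '0' && *p <= '9') {
    value = value * 10 + static_cast<uint64_t>(*p - '0');
    ++p;
  }
  *stop = p;
  return value;
}
```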
  59. 59. Runtime dispatch. On first call, the pointer checks the CPU and reassigns itself. No language support.
  60. 60.   int json_parse_dispatch(...) {
              Architecture best_implementation = find_best_supported_implementation();
              // Selecting the best implementation
              switch (best_implementation) {
              case Architecture::HASWELL:
                json_parse_ptr = &json_parse_implementation<Architecture::HASWELL>;
                break;
              case Architecture::WESTMERE:
                json_parse_ptr = &json_parse_implementation<Architecture::WESTMERE>;
                break;
              default:
                return UNEXPECTED_ERROR;
              }
              return json_parse_ptr(....);
            }
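
The self-reassigning pointer can be shown end to end on a toy function. Everything in this sketch is an assumption made for illustration: the two kernels are plain stand-ins (neither uses SIMD), the names are invented, and CPU detection uses the GCC/Clang builtin `__builtin_cpu_supports` on x86, falling back to the generic kernel elsewhere.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

using count_fn = size_t (*)(const char*, size_t);

// Stand-in for a portable kernel.
size_t count_quotes_generic(const char* s, size_t n) {
  size_t count = 0;
  for (size_t i = 0; i < n; ++i) count += (s[i] == '"');
  return count;
}

// Stand-in for a CPU-specific kernel (here merely unrolled by four).
size_t count_quotes_unrolled(const char* s, size_t n) {
  size_t count = 0;
  size_t i = 0;
  for (; i + 4 <= n; i += 4) {
    count += (s[i] == '"') + (s[i + 1] == '"') + (s[i + 2] == '"') +
             (s[i + 3] == '"');
  }
  for (; i < n; ++i) count += (s[i] == '"');
  return count;
}

bool cpu_has_avx2() {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
  return __builtin_cpu_supports("avx2") != 0;
#else
  return false;
#endif
}

size_t count_quotes_dispatch(const char* s, size_t n);

// The pointer starts out aimed at the dispatcher itself.
std::atomic<count_fn> count_quotes_ptr{&count_quotes_dispatch};

// First call: pick a kernel, redirect the pointer, then do the work.
size_t count_quotes_dispatch(const char* s, size_t n) {
  const count_fn best =
      cpu_has_avx2() ? &count_quotes_unrolled : &count_quotes_generic;
  count_quotes_ptr.store(best, std::memory_order_relaxed);
  return best(s, n);
}

// Public entry point: one indirect call, no CPU check after the first use.
size_t count_quotes(const char* s, size_t n) {
  assert(s != nullptr || n == 0);
  return count_quotes_ptr.load(std::memory_order_relaxed)(s, n);
}
```

If two threads race on the first call, both select the same kernel and store the same pointer, so the atomic store only needs relaxed ordering.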
  61. 61. Where to get it? GitHub: https://github.com/lemire/simdjson/
            Modern C++, single-header (easy integration)
            ARM (e.g., iPhone), x64 (going back 10 years)
            Apache 2.0 (no hidden patents)
            Used by Microsoft FishStore and Yandex ClickHouse
            Wrappers in Python, PHP, C#, Rust, JavaScript (node), Ruby
            Ports to Rust, Go and C#
  62. 62. Reference: Geoff Langdale, Daniel Lemire, Parsing Gigabytes of JSON per Second, VLDB Journal, https://arxiv.org/abs/1902.08318
  63. 63. Credit: Geoff Langdale (algorithmic architect and wizard). Contributors: Thomas Navennec, Kai Wolf, Tyler Kennedy, Frank Wessels, George Fotopoulos, Heinz N. Gies, Emil Gedda, Wojciech Muła, Georgios Floros, Dong Xie, Nan Xiao, Egor Bogatov, Jinxi Wang, Luiz Fernando Peres, Wouter Bolsterlee, Anish Karandikar, Reini Urban, Tom Dyson, Ihor Dotsenko, Alexey Milovidov, Chang Liu, Sunny Gleason, John Keiser, Zach Bjornson, Vitaly Baranov, Juho Lauri, Michael Eisel, Io Daza Dillon, Paul Dreik, Jérémie Piotte and others