How to Write the Fastest JSON
Parser/Writer in the World
Milo Yip
Tencent
28 Mar 2015
Milo Yip 叶劲峰
• Expert Engineer (2011 to now)
– Engine Technology Center, R & D Department,
Interactive Entertainment Group (IEG), Tencent
• Master of Philosophy in System Engineering &
Engineering Management, CUHK
• Bachelor of Cognitive Science, HKU
• https://github.com/miloyip
• http://www.cnblogs.com/miloyip
• http://www.zhihu.com/people/miloyip
Table of Contents
1. Introduction
2. Benchmark
3. Design
4. Limitations
5. Thoughts
6. References
1. INTRODUCTION
JSON
• JavaScript Object Notation
• Alternative to XML
• Human-readable text to transmit/persist data
• RFC 7159/ECMA-404
• Common uses
– Open API (e.g. Twitter, Facebook, etc.)
– Data storage/exchange (e.g. GeoJSON)
RapidJSON
• https://github.com/miloyip/rapidjson
• MIT License
• C++ Header-only Library
• Started in Nov 2011
• Inspired by RapidXML
• Will release 1.0 under Tencent *soon*
Features
• Both SAX and DOM style API
• Fast
• Cross platform/compiler
• No dependencies
• Memory friendly
• UTF-8/16/32/ASCII and transcoding
• In-situ Parsing
• More at http://miloyip.github.io/rapidjson/md_doc_features.html
Hello RapidJSON!
#include "rapidjson/document.h"
#include "rapidjson/writer.h"
#include "rapidjson/stringbuffer.h"
#include <iostream>
using namespace rapidjson;
int main() {
// 1. Parse a JSON string into DOM.
const char* json = "{\"project\":\"rapidjson\",\"stars\":10}";
Document d;
d.Parse(json);
// 2. Modify it by DOM.
Value& s = d["stars"];
s.SetInt(s.GetInt() + 1);
// 3. Stringify the DOM
StringBuffer buffer;
Writer<StringBuffer> writer(buffer);
d.Accept(writer);
// Output {"project":"rapidjson","stars":11}
std::cout << buffer.GetString() << std::endl;
return 0;
}
Fast, AND Reliable
• 103 Unit Tests
• Continuous Integration
– Travis on Linux
– AppVeyor on Windows
– Valgrind (Linux) for memory leak checking
• Use in real applications
– Use in client and server applications at Tencent
– A user reported parsing 50 million JSON documents daily
Public Projects using RapidJSON
• Cocos2D-X: Cross-Platform 2D Game Engine
http://cocos2d-x.org/
• Microsoft Bond: Cross-Platform Serialization
https://github.com/Microsoft/bond/
• Google Angle: OpenGL ES 2 for Windows
https://chromium.googlesource.com/angle/angle/
• CERN LHCb: Large Hadron Collider beauty
http://lhcb-comp.web.cern.ch/lhcb-comp/
• Tell me if you know more
2. BENCHMARK
Benchmarks for Native JSON libraries
• https://github.com/miloyip/nativejson-benchmark
• Compare 20 open source C/C++ JSON libraries
• Evaluate speed, memory and code size
• For parsing, stringify, traversal, and more
Libraries
• CAJUN
• Casablanca
• cJSON
• dropbox/json11
• FastJson
• gason
• jansson
• json-c
• json spirit
• Json Box
• JsonCpp
• JSON++
• parson
• picojson
• RapidJSON
• simplejson
• udp/json
• ujson4c
• vincenthz/libjson
• YAJL
Results: Parsing Speed
Results: Parsing Memory
Results: Stringify Speed
Results: Code Size
Benchmarks for Spine
• Spine is a 2D skeletal animation tool
• Spine-C is the official runtime in C
https://github.com/EsotericSoftware/spine-runtimes/tree/master/spine-c
• It uses JSON as data format
• It has a custom JSON parser
• Adapt RapidJSON and compare loading time
Test Data
• http://esotericsoftware.com/forum/viewtopic.php?f=3&t=2831
• Original 80KB JSON
• Interpolate to get
multiple JSON files
• Load 100 times
Results
3. DESIGN
The Zero Overhead Principle
• Bjarne Stroustrup[1]:
“What you don't use, you don't pay for.”
• RapidJSON tries to obey this principle
– SAX and DOM
– Combinable options, configurations
SAX
StartObject()
Key("hello", 5, true)
String("world", 5, true)
Key("t", 1, true)
Bool(true)
Key("f", 1, true)
Bool(false)
Key("n", 1, true)
Null()
Key("i")
Uint(123)
Key("pi")
Double(3.1416)
Key("a")
StartArray()
Uint(1)
Uint(2)
Uint(3)
Uint(4)
EndArray(4)
EndObject(7)
DOM
When parsing a JSON to DOM, use SAX events to build a DOM.
When stringifying a DOM, traverse it and generate SAX events.
{"hello":"world", "t":true, "f":false, "n":null,
"i":123, "pi":3.1416, "a":[1, 2, 3, 4]}
DOM
SAX
Architecture
[Diagram: the classes Value, Document, Reader, and Writer, and the concepts Handler, Stream, Encoding, and Allocator, connected by edges labeled "calls", "implements", "accepts", and "has".]
Handler: Template Parameter
• Handler handles SAX event callbacks
• How to implement callbacks?
– Traditional: virtual function
– RapidJSON: template parameter
template <unsigned parseFlags, typename InputStream, typename Handler>
ParseResult Reader::Parse(InputStream& is, Handler& handler);
• No virtual function overhead
• Inline callback functions
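The difference from virtual callbacks can be sketched with a toy reader. Everything here is hypothetical and far smaller than RapidJSON's Reader: `ParseUintList` only understands comma-separated unsigned integers, and `SumHandler` is an invented handler. The point is only that the handler type is a template parameter, so the call to `handler.Uint()` is resolved at compile time.

```cpp
#include <cstddef>
#include <string>

// Hypothetical handler: sums and counts the unsigned integers it receives.
struct SumHandler {
    unsigned long long sum = 0;
    std::size_t count = 0;
    bool Uint(unsigned u) { sum += u; ++count; return true; }
};

// Toy reader for comma-separated unsigned integers such as "1,2,3".
// Handler is a template parameter, so handler.Uint() is a direct call
// that the compiler can inline; no vtable lookup is involved.
template <typename Handler>
bool ParseUintList(const std::string& text, Handler& handler) {
    std::size_t i = 0;
    while (i < text.size()) {
        if (text[i] < '0' || text[i] > '9') return false;  // expected a digit
        unsigned value = 0;
        while (i < text.size() && text[i] >= '0' && text[i] <= '9')
            value = value * 10 + static_cast<unsigned>(text[i++] - '0');
        if (!handler.Uint(value)) return false;            // handler asked to stop
        if (i < text.size()) {
            if (text[i] != ',') return false;              // expected a separator
            ++i;
            if (i == text.size()) return false;            // trailing comma
        }
    }
    return true;
}
```

Any type with a matching `Uint` member works as a handler, so a caller that only counts values pays nothing for event kinds it never uses.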
Parsing Options: Template Argument
• Many parse options -> Zero overhead principle
• Use integer template argument
template <unsigned parseFlags, typename InputStream, typename Handler>
ParseResult Reader::Parse(InputStream& is, Handler& handler);
if (parseFlags & kParseInsituFlag) {
// ...
}
else {
// ...
}
• Compiler optimization eliminates unused code
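A minimal sketch of the same idea outside RapidJSON, assuming two made-up flags (`kSkipSpacesFlag`, `kUppercaseFlag`) and an invented `Transform` function. Each `if` tests a compile-time constant, so an optimizing compiler can drop the untaken branch from every instantiation.

```cpp
#include <string>

// Hypothetical option flags for this sketch (not RapidJSON's flag set).
enum : unsigned {
    kNoFlags = 0,
    kSkipSpacesFlag = 1,
    kUppercaseFlag = 2
};

// `flags` is a template argument, so the conditions below are constants
// within each instantiation of Transform.
template <unsigned flags>
std::string Transform(const std::string& in) {
    std::string out;
    for (char c : in) {
        if (flags & kSkipSpacesFlag) {
            if (c == ' ') continue;                       // drop spaces
        }
        if (flags & kUppercaseFlag) {
            if (c >= 'a' && c <= 'z')
                c = static_cast<char>(c - 'a' + 'A');     // ASCII upper-case
        }
        out.push_back(c);
    }
    return out;
}
```

`Transform<kNoFlags>` and `Transform<kSkipSpacesFlag | kUppercaseFlag>` are separate functions, each containing only the code its flags select.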
Recursive SAX Parser
• Simple to write/optimize by hand
• Use program stack to maintain parsing state of
the tree structure
• Prone to stack overflow
– So also provide an iterative parser
(Contributed by Don Ding @thebusytypist)
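To illustrate the trade-off, here is a sketch of both styles for a toy grammar that only knows nested arrays, such as `[[],[[]]]`. It is not RapidJSON's parser; the function names and the three states are invented for this example. The recursive version uses one program-stack frame per nesting level, while the iterative version keeps its state in a heap-allocated vector.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Recursive descent: each nested '[' consumes one program-stack frame.
// On success, `pos` is advanced past the parsed array.
inline bool ParseArrayRecursive(const std::string& s, std::size_t& pos) {
    if (pos >= s.size() || s[pos] != '[') return false;
    ++pos;
    if (pos < s.size() && s[pos] == ']') { ++pos; return true; }  // empty array
    for (;;) {
        if (!ParseArrayRecursive(s, pos)) return false;           // one element
        if (pos >= s.size()) return false;
        if (s[pos] == ']') { ++pos; return true; }
        if (s[pos] != ',') return false;
        ++pos;
    }
}

inline bool IsValidRecursive(const std::string& s) {
    std::size_t pos = 0;
    return ParseArrayRecursive(s, pos) && pos == s.size();
}

// Iterative: the same grammar, with one explicit state per open array.
inline bool IsValidIterative(const std::string& s) {
    enum State { kExpectFirstOrClose, kExpectElement, kExpectCommaOrClose };
    if (s.empty() || s[0] != '[') return false;
    std::vector<State> stack(1, kExpectFirstOrClose);
    std::size_t pos = 1;
    while (!stack.empty()) {
        if (pos >= s.size()) return false;                        // unterminated
        const char c = s[pos++];
        State& top = stack.back();
        if (c == '[' && top != kExpectCommaOrClose) {
            top = kExpectCommaOrClose;       // state to resume in after the child
            stack.push_back(kExpectFirstOrClose);
        } else if (c == ']' && top != kExpectElement) {
            stack.pop_back();
        } else if (c == ',' && top == kExpectCommaOrClose) {
            top = kExpectElement;
        } else {
            return false;
        }
    }
    return pos == s.size();
}
```

With input nested a few hundred thousand levels deep, the recursive version may exhaust the program stack, while the iterative one only grows its vector.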
Normal Parsing
In situ Parsing
No allocation and copying for strings! Cache Friendly!
Parsing Number: the Pain ;(
• RapidJSON supports parsing JSON number to
uint32_t, int32_t, uint64_t, int64_t, double
• Difficult to detect the right type in a single pass
• Even more difficult for double (strtod() is slow)
• Implemented kFullPrecision option using
1. Fast-path
2. DIY-FP (https://github.com/floitsch/double-conversion)
3. Big Integer method [2]
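The first step can be sketched as follows. This is the well-known fast path (described in [2]), written here as a hypothetical helper `FastPathToDouble` rather than RapidJSON's code. It assumes IEEE 754 doubles evaluated in double precision with round-to-nearest: if the decimal significand fits in 53 bits and the power of ten is at most 10^22, both operands are exactly representable and one multiplication or division gives the correctly rounded result.

```cpp
#include <cstdint>

// Tries to convert significand * 10^exp10 to double on the fast path.
// Returns false when the inputs are out of range; the caller must then
// fall back to a slower, exact method (not shown in this sketch).
inline bool FastPathToDouble(uint64_t significand, int exp10, double* out) {
    static const double kPow10[] = {
        1e0,  1e1,  1e2,  1e3,  1e4,  1e5,  1e6,  1e7,
        1e8,  1e9,  1e10, 1e11, 1e12, 1e13, 1e14, 1e15,
        1e16, 1e17, 1e18, 1e19, 1e20, 1e21, 1e22
    };
    if (significand > (uint64_t(1) << 53)) return false;  // not exact as a double
    if (exp10 < -22 || exp10 > 22) return false;          // 10^|exp10| not exact
    const double d = static_cast<double>(significand);
    *out = exp10 < 0 ? d / kPow10[-exp10] : d * kPow10[exp10];
    return true;
}
```

For example, "3.1416" becomes significand 31416 with exponent -4, which is a single division by 1e4.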
How difficult?
• PHP Hangs On Numeric Value 2.2250738585072011e-308
http://www.exploringbinary.com/php-hangs-on-numeric-value-2-2250738585072011e-308/
• Java Hangs When Converting 2.2250738585072012e-308
http://www.exploringbinary.com/java-hangs-when-converting-2-2250738585072012e-308/
• "2.22507385850720113605740979670913197593481954635164564e-308" → 2.2250738585072009e-308
• "2.22507385850720113605740979670913197593481954635164565e-308" → 2.2250738585072014e-308
• And need to be fast…
DOM Designed for Fast Parsing
• A JSON value can be one of 6 types
– object, array, number, string, boolean, null
• An inheritance-based design needs a new for each value
• RapidJSON uses a single variant type Value
Layout of Value
String
Ch* str
SizeType length
unsigned flags
Number
int i, 0 | unsigned u, 0 | int64_t i64 | uint64_t u64 | double d (alternatives sharing the same 8 bytes)
unsigned flags
Object
Member* members
SizeType size
SizeType capacity
unsigned flags
Array
Value* values
SizeType size
SizeType capacity
unsigned flags
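The layout above can be approximated with a tagged union. This is a simplified sketch, not RapidJSON's definition: the type tags are invented, objects are left out, and `flags` sits after the union instead of inside each alternative. It shows why no per-value `new` is needed: every value has the same fixed size, whatever its type.

```cpp
#include <cstdint>
#include <cstring>

typedef unsigned SizeType;

// Hypothetical type tags stored in `flags`.
enum : unsigned { kNullType, kIntType, kDoubleType, kStringType, kArrayType };

struct Value {
    struct String { const char* str; SizeType length; };
    struct Array  { Value* values; SizeType size; SizeType capacity; };
    union Number  { int i; unsigned u; int64_t i64; uint64_t u64; double d; };

    union Data { String s; Number n; Array a; } data;  // one payload at a time
    unsigned flags;                                    // which payload is active
};

inline Value MakeInt(int i) {
    Value v{};
    v.data.n.i64 = i;          // stored widened, so i64 is also readable
    v.flags = kIntType;
    return v;
}

inline Value MakeDouble(double d) {
    Value v{};
    v.data.n.d = d;
    v.flags = kDoubleType;
    return v;
}

// Stores the pointer only; the characters are not copied.
inline Value MakeString(const char* s) {
    Value v{};
    v.data.s.str = s;
    v.data.s.length = static_cast<SizeType>(std::strlen(s));
    v.flags = kStringType;
    return v;
}

inline bool IsInt(const Value& v) { return v.flags == kIntType; }
```

Because all values have one size, an array's elements can sit contiguously in a single `Value*` block.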
Move Semantics
• Deep copying object/array/string is slow
• RapidJSON enforces move semantics
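A toy illustration of enforced move semantics, using an invented `Text` class rather than RapidJSON's Value. Assignment takes a non-const reference and steals the buffer, leaving the source empty; copying is disabled, so an accidental deep copy cannot compile.

```cpp
#include <cstddef>
#include <cstring>

class Text {
public:
    Text() : data_(nullptr), size_(0) {}
    explicit Text(const char* s) : data_(nullptr), size_(std::strlen(s)) {
        data_ = new char[size_ + 1];
        std::memcpy(data_, s, size_ + 1);      // copies the terminator too
    }
    ~Text() { delete[] data_; }

    Text(const Text&) = delete;                // no deep copies

    // "Assignment" transfers ownership: rhs is left null afterwards.
    Text& operator=(Text& rhs) {
        if (this != &rhs) {
            delete[] data_;
            data_ = rhs.data_;
            size_ = rhs.size_;
            rhs.data_ = nullptr;
            rhs.size_ = 0;
        }
        return *this;
    }

    bool IsNull() const { return data_ == nullptr; }
    const char* c_str() const { return data_ ? data_ : ""; }
    std::size_t size() const { return size_; }

private:
    char* data_;
    std::size_t size_;
};
```

After `b = a;`, `b` owns the characters and `a` is null. The transfer costs a few pointer writes, whatever the string length.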
The Default Allocator
• Internally allocates a single linked-list of
buffers
• Does not free objects (thus FAST!)
• Suitable for parsing (creating values
consecutively)
• Not suitable for DOM manipulation
Custom Initial Buffer
• User can provide a custom initial buffer
– For example, buffer on stack, scratch buffer
• The allocator uses that buffer first until it is full
• Possible to achieve zero allocation in parsing
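A sketch of such an allocator, under stated assumptions: `PoolAllocator` here is a simplified stand-in, not RapidJSON's class; the chunk size and 8-byte rounding are arbitrary choices; and the user buffer is assumed to be suitably aligned. It serves requests from the user buffer first, then from a singly linked list of heap chunks, and `Free` does nothing.

```cpp
#include <cstddef>
#include <cstdlib>

class PoolAllocator {
public:
    // `userBuffer` may be null. It is used first and never freed here.
    explicit PoolAllocator(void* userBuffer = nullptr, std::size_t userSize = 0,
                           std::size_t chunkSize = 64 * 1024)
        : user_(static_cast<char*>(userBuffer)), userSize_(userSize),
          userUsed_(0), head_(nullptr), chunkSize_(chunkSize), heapChunks_(0) {}

    ~PoolAllocator() {                       // everything is released at once
        while (head_) { Chunk* next = head_->next; std::free(head_); head_ = next; }
    }
    PoolAllocator(const PoolAllocator&) = delete;
    PoolAllocator& operator=(const PoolAllocator&) = delete;

    void* Malloc(std::size_t size) {
        size = (size + 7) & ~static_cast<std::size_t>(7);   // round up to 8
        if (userUsed_ + size <= userSize_) {                // user buffer first
            void* p = user_ + userUsed_;
            userUsed_ += size;
            return p;
        }
        if (!head_ || head_->used + size > head_->capacity) {
            if (!AddChunk(size > chunkSize_ ? size : chunkSize_)) return nullptr;
        }
        void* p = reinterpret_cast<char*>(head_ + 1) + head_->used;  // bump pointer
        head_->used += size;
        return p;
    }

    static void Free(void*) {}               // intentionally a no-op

    std::size_t HeapChunkCount() const { return heapChunks_; }

private:
    struct alignas(std::max_align_t) Chunk {
        Chunk* next;
        std::size_t capacity;
        std::size_t used;
    };

    bool AddChunk(std::size_t capacity) {
        Chunk* c = static_cast<Chunk*>(std::malloc(sizeof(Chunk) + capacity));
        if (!c) return false;
        c->next = head_;
        c->capacity = capacity;
        c->used = 0;
        head_ = c;
        ++heapChunks_;
        return true;
    }

    char* user_;
    std::size_t userSize_, userUsed_;
    Chunk* head_;
    std::size_t chunkSize_, heapChunks_;
};
```

If the whole parse fits in a stack buffer passed to the constructor, `HeapChunkCount()` stays at zero, which is the zero-allocation case.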
Short String Optimization
• Many JSON keys are short
• Contributor @Kosta-Github submitted a PR to
optimize short strings
String
Ch* str
SizeType length
unsigned flags
ShortString
Ch str[11];
uint8_t x;
unsigned flags
Let length = 11 – x
So for an 11-char string, x = 0 also serves as the terminating '\0'
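The trick can be sketched with a 12-byte buffer. This is a simplified model, not RapidJSON's struct: `bytes[0..10]` play the role of `str[11]` and `bytes[11]` plays the role of `x`, kept in one array so that reading the terminator stays inside a single object.

```cpp
#include <cstddef>
#include <cstring>

struct ShortString {
    static const std::size_t kMaxLength = 11;
    char bytes[12];   // bytes[0..10]: characters; bytes[11]: 11 - length

    // Stores up to 11 characters inline; returns false if `s` is too long.
    bool Set(const char* s, std::size_t length) {
        if (length > kMaxLength) return false;
        std::memcpy(bytes, s, length);
        if (length < kMaxLength) bytes[length] = '\0';
        // For an 11-char string this writes 0, which doubles as the terminator.
        bytes[kMaxLength] = static_cast<char>(kMaxLength - length);
        return true;
    }

    std::size_t Length() const {
        return kMaxLength - static_cast<unsigned char>(bytes[kMaxLength]);
    }

    const char* c_str() const { return bytes; }
};
```

Storing 11 minus the length, rather than the length itself, is what lets the last byte serve two purposes.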
SIMD Optimization
• Using SSE2/SSE4 to skip whitespaces
(space, tab, LF, CR)
• Each iteration compares 16 input chars against the 4 whitespace chars
• Fast for JSON with indentation
• Visual C++ 2010 32-bit test, skipping 1M whitespace characters (ms):
– strlen() (for reference): 752
– strspn(): 3011
– RapidJSON (no SIMD): 1349
– RapidJSON (SSE2): 170
– RapidJSON (SSE4): 102
Integer-to-String Optimization
• Integer-To-String conversion is simple
– E.g. 123 -> “123”
• But standard library is quite slow
– E.g. sprintf(), _itoa(), etc.
• Tried various implementations
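One commonly used technique is a two-digit lookup table, sketched below. The transcript does not say which implementation the benchmark favoured, so treat `U32ToString` as an illustrative assumption rather than the author's final code. It emits two digits per division instead of one and avoids format-string parsing entirely.

```cpp
#include <cstdint>
#include <string>

// Converts an unsigned 32-bit integer to decimal, two digits per step.
inline std::string U32ToString(uint32_t value) {
    // "00" "01" ... "99": the pair for n starts at index 2 * n.
    static const char kDigits[201] =
        "0001020304050607080910111213141516171819"
        "2021222324252627282930313233343536373839"
        "4041424344454647484950515253545556575859"
        "6061626364656667686970717273747576777879"
        "8081828384858687888990919293949596979899";
    char buf[10];                       // 4294967295 has 10 digits
    char* p = buf + sizeof(buf);        // fill from the end backwards
    while (value >= 100) {
        const unsigned i = (value % 100) * 2;
        value /= 100;
        *--p = kDigits[i + 1];
        *--p = kDigits[i];
    }
    if (value < 10) {
        *--p = static_cast<char>('0' + value);
    } else {
        const unsigned i = value * 2;
        *--p = kDigits[i + 1];
        *--p = kDigits[i];
    }
    return std::string(p, buf + sizeof(buf));
}
```

A production version would write into the output buffer directly instead of returning a `std::string`.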
My implementations
• https://github.com/miloyip/itoa-benchmark
• Visual C++ 2013 on Windows 64-bit
Double-to-String Optimization
• Double-to-string conversion is very slow
– E.g. 3.14 -> “3.14”
• Grisu2 is a fast algorithm for this[3]
– Gives correct results in 100% of cases
– Gives optimal (shortest) results in >99% of cases
• Google V8 has an implementation
– https://github.com/floitsch/double-conversion
– But not header-only, so…
My Grisu2 Implementation
• https://github.com/miloyip/dtoa-benchmark
• Visual C++ 2013 on Windows 64-bit:
4. LIMITATIONS
Tradeoff: User-Friendliness
• DOM only supports move semantics
– Cannot copy-construct Value/Document
– So, cannot pass them by value, put in containers
• DOM APIs need an allocator as a parameter, e.g.
numbers.PushBack(1, allocator);
• Users must manage the life cycle of the allocator
and its allocated values
Pausing in Parsing
• Cannot pause in parsing and resume it later
– Not keeping all parsing states explicitly
– Doing so will be much slower
• Typical Scenario
– Streaming JSON from network
– Don’t want to store the JSON in memory
• Solution
– Parse in a separate thread
– Block the input stream to pause
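The workaround can be sketched with a blocking stream and a parser thread. All names here are hypothetical, and the "parser" is a toy that only counts the numbers in a flat array such as `[1,22,333]`. `Take()` blocks while no bytes are available, which pauses the parser without it having to save any state.

```cpp
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

class BlockingStream {
public:
    // Producer side: append bytes as they arrive (e.g. from the network).
    void Push(const std::string& chunk) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.insert(queue_.end(), chunk.begin(), chunk.end());
        }
        ready_.notify_one();
    }
    void Close() {
        { std::lock_guard<std::mutex> lock(mutex_); closed_ = true; }
        ready_.notify_one();
    }
    // Consumer side: blocks until a byte is available; '\0' means end of input.
    char Take() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !queue_.empty() || closed_; });
        if (queue_.empty()) return '\0';
        const char c = queue_.front();
        queue_.pop_front();
        return c;
    }
private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<char> queue_;
    bool closed_ = false;
};

// Feeds `chunks` to a toy parser running in its own thread and returns
// how many numbers it saw. Numbers may be split across chunks.
inline std::size_t CountNumbersStreaming(const std::vector<std::string>& chunks) {
    BlockingStream stream;
    std::size_t count = 0;
    std::thread parser([&] {
        bool inNumber = false;
        for (char c = stream.Take(); c != '\0'; c = stream.Take()) {
            const bool digit = (c >= '0' && c <= '9');
            if (digit && !inNumber) ++count;   // first digit of a new number
            inNumber = digit;
        }
    });
    for (const std::string& chunk : chunks) stream.Push(chunk);
    stream.Close();
    parser.join();                              // count is safe to read after join
    return count;
}
```

Only the bytes not yet consumed are held in memory, so the full JSON text never has to be stored at once.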
5. THOUGHTS
Origin
• RapidJSON started as my hobby project in 2011
• Also my first open source project
• First version released in 2 weeks
Community
• Google Code helped with tracking bugs, but made it hard to
attract contributions
• After migrating to GitHub in 2014
– Community much more active
– Issue tracking more powerful
– Pull requests ease contributions
Future
• Official Release under Tencent
– 1.0 beta → 1.0 release (after 3+ years…)
– Can work on it in working time
– Involve marketing and other colleagues
– Establish Community in China
• Post-1.0 Features
– Easy DOM API (but slower)
– JSON Schema
– Relaxed JSON syntax
– Optimization on Object Member Access
• Open source our internal projects at Tencent
To Establish an Open Source Project
• Courage
• Start Small
• Make a Difference
– Innovative Idea?
– Easy to Use?
– Good Performance?
• Embrace Community
• Learn
References
1. Stroustrup, Bjarne. The Design and Evolution of C++. Pearson Education India, 1994.
2. Clinger, William D. "How to read floating point numbers accurately." ACM SIGPLAN Notices 25.6 (1990).
3. Loitsch, Florian. "Printing floating-point numbers quickly and accurately with integers." ACM SIGPLAN Notices 45.6 (2010): 233-243.
Q&A

Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
Fact vs. Fiction: Autodetecting Hallucinations in LLMs
Fact vs. Fiction: Autodetecting Hallucinations in LLMsFact vs. Fiction: Autodetecting Hallucinations in LLMs
Fact vs. Fiction: Autodetecting Hallucinations in LLMs
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Exploring ChatGPT Prompt Hacks To Maximally Optimise Your Queries
Exploring ChatGPT Prompt Hacks To Maximally Optimise Your QueriesExploring ChatGPT Prompt Hacks To Maximally Optimise Your Queries
Exploring ChatGPT Prompt Hacks To Maximally Optimise Your Queries
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 

How to Write the Fastest JSON Parser/Writer in the World

  • 1. How to Write the Fastest JSON Parser/Writer in the World Milo Yip Tencent 28 Mar 2015
  • 2. Milo Yip 叶劲峰 • Expert Engineer (2011 to now) – Engine Technology Center, R & D Department, Interactive Entertainment Group (IEG), Tencent • Master of Philosophy in System Engineering & Engineering Management, CUHK • Bachelor of Cognitive Science, HKU • https://github.com/miloyip • http://www.cnblogs.com/miloyip • http://www.zhihu.com/people/miloyip
  • 4. Table of Contents 1. Introduction 2. Benchmark 3. Design 4. Limitations 5. Thoughts 6. References
  • 6. JSON • JavaScript Object Notation • Alternative to XML • Human-readable text to transmit/persist data • RFC 7159/ECMA-404 • Common uses – Open API (e.g. Twitter, Facebook, etc.) – Data storage/exchange (e.g. GeoJSON)
  • 7. RapidJSON • https://github.com/miloyip/rapidjson • MIT License • C++ Header-only Library • Started in Nov 2011 • Inspired by RapidXML • Will release 1.0 under Tencent *soon*
  • 8. Features • Both SAX and DOM style API • Fast • Cross platform/compiler • No dependencies • Memory friendly • UTF-8/16/32/ASCII and transcoding • In-situ Parsing • More at http://miloyip.github.io/rapidjson/md_doc_features.html
  • 9. Hello RapidJSON!
      #include "rapidjson/document.h"
      #include "rapidjson/writer.h"
      #include "rapidjson/stringbuffer.h"
      #include <iostream>
      using namespace rapidjson;
      int main() {
          // 1. Parse a JSON string into DOM.
          const char* json = "{\"project\":\"rapidjson\",\"stars\":10}";
          Document d;
          d.Parse(json);
          // 2. Modify it by DOM.
          Value& s = d["stars"];
          s.SetInt(s.GetInt() + 1);
          // 3. Stringify the DOM
          StringBuffer buffer;
          Writer<StringBuffer> writer(buffer);
          d.Accept(writer);
          // Output {"project":"rapidjson","stars":11}
          std::cout << buffer.GetString() << std::endl;
          return 0;
      }
  • 10. Fast, AND Reliable • 103 Unit Tests • Continuous Integration – Travis on Linux – AppVeyor on Windows – Valgrind (Linux) for memory leak checking • Used in real applications – Used in client and server applications at Tencent – A user reported parsing 50 million JSON documents daily
  • 11. Public Projects using RapidJSON • Cocos2D-X: Cross-Platform 2D Game Engine http://cocos2d-x.org/ • Microsoft Bond: Cross-Platform Serialization https://github.com/Microsoft/bond/ • Google Angle: OpenGL ES 2 for Windows https://chromium.googlesource.com/angle/angle/ • CERN LHCb: Large Hadron Collider beauty http://lhcb-comp.web.cern.ch/lhcb-comp/ • Tell me if you know more
  • 13. Benchmarks for Native JSON libraries • https://github.com/miloyip/nativejson-benchmark • Compare 20 open source C/C++ JSON libraries • Evaluate speed, memory and code size • For parsing, stringify, traversal, and more
  • 14. Libraries • CAJUN • Casablanca • cJSON • dropbox/json11 • FastJson • gason • jansson • json-c • json spirit • Json Box • JsonCpp • JSON++ • parson • picojson • RapidJSON • simplejson • udp/json • ujson4c • vincenthz/libjson • YAJL
  • 19. Benchmarks for Spine • Spine is a 2D skeletal animation tool • Spine-C is the official runtime in C https://github.com/EsotericSoftware/spine-runtimes/tree/master/spine-c • It uses JSON as data format • It has a custom JSON parser • Adapt RapidJSON and compare loading time
  • 20. Test Data • http://esotericsoftware.com/forum/viewtopic.php?f=3&t=2831 • Original 80KB JSON • Interpolate to get multiple JSON files • Load 100 times
  • 23. The Zero Overhead Principle • Bjarne Stroustrup[1]: “What you don't use, you don't pay for.” • RapidJSON tries to obey this principle – SAX and DOM – Combinable options, configurations
  • 24. SAX and DOM • JSON: {"hello":"world", "t":true, "f":false, "n":null, "i":123, "pi":3.1416, "a":[1, 2, 3, 4]} • SAX events: StartObject() Key("hello", 5, true) String("world", 5, true) Key("t", 1, true) Bool(true) Key("f", 1, true) Bool(false) Key("n", 1, true) Null() Key("i") Uint(123) Key("pi") Double(3.1416) Key("a") StartArray() Uint(1) Uint(2) Uint(3) Uint(4) EndArray(4) EndObject(7) • When parsing JSON into a DOM, SAX events are used to build the DOM. • When stringifying a DOM, it is traversed and events are generated for SAX.
  • 26. Handler: Template Parameter • Handler handles SAX event callbacks • How to implement callbacks? – Traditional: virtual function – RapidJSON: template parameter template <unsigned parseFlags, typename InputStream, typename Handler> ParseResult Reader::Parse(InputStream& is, Handler& handler); • No virtual function overhead • Inline callback functions
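The template-parameter idea on this slide can be illustrated with a toy reader. This is only a sketch: `CountingHandler`, `ParseLiterals` and `CountLiteralsExample` are hypothetical names invented here, and the "grammar" (a comma-separated list of `true`/`false`/`null`) is far simpler than JSON. What it shows is that the reader calls the handler through its static type, so no virtual dispatch is involved and the compiler is free to inline the callbacks.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical handler (not RapidJSON's): counts the events it receives.
// A callback returns false to abort parsing.
struct CountingHandler {
    std::size_t bools = 0;
    std::size_t nulls = 0;
    bool Bool(bool) { ++bools; return true; }
    bool Null() { ++nulls; return true; }
};

// Toy reader for "true,false,null,...". Handler is a template parameter,
// so handler.Bool()/handler.Null() are resolved at compile time.
template <typename Handler>
bool ParseLiterals(const std::string& text, Handler& handler) {
    std::size_t i = 0;
    while (i < text.size()) {
        bool keepGoing;
        if (text.compare(i, 4, "true") == 0)       { keepGoing = handler.Bool(true);  i += 4; }
        else if (text.compare(i, 5, "false") == 0) { keepGoing = handler.Bool(false); i += 5; }
        else if (text.compare(i, 4, "null") == 0)  { keepGoing = handler.Null();      i += 4; }
        else return false;                      // unknown token
        if (!keepGoing) return false;           // handler asked to stop
        if (i == text.size()) break;            // consumed everything
        if (text[i] != ',') return false;       // literals must be comma-separated
        ++i;
        if (i == text.size()) return false;     // trailing comma is rejected
    }
    return true;
}

// Usage: any type providing Bool() and Null() can be plugged in.
inline void CountLiteralsExample() {
    CountingHandler handler;
    const bool ok = ParseLiterals("true,null,false", handler);
    assert(ok && handler.bools == 2 && handler.nulls == 1);
    (void)ok;
}
```

A virtual-function design would need a base class and an indirect call per event; here a different handler type simply instantiates a different copy of the reader.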
  • 27. Parsing Options: Template Argument • Many parse options -> Zero overhead principle • Use integer template argument template <unsigned parseFlags, typename InputStream, typename Handler> ParseResult Reader::Parse(InputStream& is, Handler& handler); if (parseFlags & kParseInsituFlag) { // ... } else { // ... } • Compiler optimization eliminates unused code
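A minimal sketch of options passed as an integer template argument follows. The flag names (`kToySkipSpaceFlag`, `kToyAllowPlusFlag`) and the function `ParseUint` are hypothetical and exist only for this illustration; they are not RapidJSON flags. Because `parseFlags` is a compile-time constant, each `if` below has a constant condition, and an optimizing compiler can drop the branches for flags that are not set.

```cpp
#include <cassert>

// Hypothetical option bits for this sketch only.
enum ToyParseFlag : unsigned {
    kToyNoFlags = 0,
    kToySkipSpaceFlag = 1,   // allow leading spaces
    kToyAllowPlusFlag = 2    // allow a leading '+'
};

// Parses a whole string as an unsigned decimal number.
// Overflow wraps around (unsigned arithmetic); a real parser would detect it.
template <unsigned parseFlags>
bool ParseUint(const char* s, unsigned& out) {
    assert(s != nullptr);
    if (parseFlags & kToySkipSpaceFlag)      // constant condition per instantiation
        while (*s == ' ') ++s;
    if (parseFlags & kToyAllowPlusFlag)      // constant condition per instantiation
        if (*s == '+') ++s;
    if (*s < '0' || *s > '9') return false;  // at least one digit required
    unsigned value = 0;
    for (; *s >= '0' && *s <= '9'; ++s)
        value = value * 10 + static_cast<unsigned>(*s - '0');
    if (*s != '\0') return false;            // trailing garbage
    out = value;
    return true;
}
```

Calling `ParseUint<kToyNoFlags>("42", v)` instantiates a version with neither optional step, while `ParseUint<kToySkipSpaceFlag | kToyAllowPlusFlag>(" +7", v)` instantiates one with both.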
  • 28. Recursive SAX Parser • Simple to write/optimize by hand • Use program stack to maintain parsing state of the tree structure • Prone to stack overflow – So also provide an iterative parser (Contributed by Don Ding @thebusytypist)
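To make the stack-overflow point concrete, here is a small sketch of the iterative alternative: the nesting state lives in an explicit, heap-allocated stack instead of the program stack. It only checks that `[]` and `{}` nest correctly and ignores everything else (including string contents), so it is not a JSON parser and not the iterative parser mentioned on the slide; `BracketsBalanced` is a name made up for this example.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Checks bracket nesting with an explicit stack, so the depth of the input
// is limited by available heap memory rather than by the call stack.
inline bool BracketsBalanced(const std::string& text) {
    std::vector<char> open;   // plays the role of the recursion stack
    for (char c : text) {
        if (c == '[' || c == '{') {
            open.push_back(c);
        } else if (c == ']' || c == '}') {
            const char expected = (c == ']') ? '[' : '{';
            if (open.empty() || open.back() != expected) return false;
            open.pop_back();
        }
    }
    return open.empty();
}

// Usage: 100000 levels of nesting would be risky for a naive recursive
// descent, but is unproblematic here.
inline void DeepNestingExample() {
    const std::string deep = std::string(100000, '[') + std::string(100000, ']');
    assert(BracketsBalanced(deep));
}
```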
  • 30. In situ Parsing • No allocation and copying for strings! • Cache friendly!
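The idea behind in situ parsing can be sketched as follows: the input buffer is mutable, and each string is unescaped in place, so the decoded string lives inside the original JSON text and nothing is allocated. `DecodeStringInSitu` is a hypothetical helper written for this illustration; it handles only the simple one-character escapes and rejects `\uXXXX`, which a real parser must support.

```cpp
#include <cassert>
#include <cstddef>

// Decodes the JSON string that starts at the opening quote `p`, writing the
// result over the input itself. On success, `str` points at the decoded,
// NUL-terminated text, `length` is its size, and the return value points just
// past the closing quote. Returns nullptr on malformed input.
inline char* DecodeStringInSitu(char* p, char*& str, std::size_t& length) {
    assert(p != nullptr && *p == '"');
    char* src = p + 1;   // read position
    char* dst = src;     // write position, never ahead of src
    str = dst;
    for (;;) {
        char c = *src++;
        if (c == '"') {                       // closing quote
            length = static_cast<std::size_t>(dst - str);
            *dst = '\0';                      // terminate inside the input buffer
            return src;
        }
        if (c == '\0') return nullptr;        // unterminated string
        if (c == '\\') {
            switch (*src++) {
                case '"':  c = '"';  break;
                case '\\': c = '\\'; break;
                case '/':  c = '/';  break;
                case 'b':  c = '\b'; break;
                case 'f':  c = '\f'; break;
                case 'n':  c = '\n'; break;
                case 'r':  c = '\r'; break;
                case 't':  c = '\t'; break;
                default:   return nullptr;    // \uXXXX and unknown escapes omitted in this sketch
            }
        }
        *dst++ = c;
    }
}
```

Since an escape sequence always decodes to fewer bytes than it occupies, the write position can never overtake the read position. The price is that the source buffer is modified and must outlive the decoded strings.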
  • 31. Parsing Number: the Pain ;( • RapidJSON supports parsing JSON number to uint32_t, int32_t, uint64_t, int64_t, double • Difficult to detect in single pass • Even more difficult for double (strtod() is slow) • Implemented kFullPrecision option using 1. Fast-path 2. DIY-FP (https://github.com/floitsch/double-conversion) 3. Big Integer method [2]
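The single-pass type detection can be sketched like this. The code accumulates digits into a `uint64_t` while watching for overflow, then picks the narrowest of the five types listed on the slide. Everything here (`NumKind`, `ParsedNumber`, `ParseNumber`) is hypothetical naming for illustration. As a deliberate simplification, anything that does not fit an integer type is handed to `std::strtod`, which is exactly the slow path the slide wants to avoid, and which is also more lenient than the JSON grammar (for example it accepts `1.`); leading zeros are not rejected either.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

enum class NumKind { Int, Uint, Int64, Uint64, Double, Error };

struct ParsedNumber {
    NumKind kind;
    int64_t i;    // valid for Int and Int64
    uint64_t u;   // valid for Uint and Uint64
    double d;     // valid for Double
};

inline ParsedNumber ParseNumber(const char* s) {
    assert(s != nullptr);
    ParsedNumber r = { NumKind::Error, 0, 0, 0.0 };
    const char* p = s;
    const bool minus = (*p == '-');
    if (minus) ++p;
    if (*p < '0' || *p > '9') return r;

    // Accumulate digits, stopping as soon as the value would exceed 64 bits.
    uint64_t v = 0;
    bool overflow = false;
    for (; *p >= '0' && *p <= '9'; ++p) {
        const uint64_t digit = static_cast<uint64_t>(*p - '0');
        if (v > (UINT64_MAX - digit) / 10) { overflow = true; break; }
        v = v * 10 + digit;
    }

    const uint64_t kTwoPow31 = static_cast<uint64_t>(INT32_MAX) + 1;
    const uint64_t kTwoPow63 = static_cast<uint64_t>(INT64_MAX) + 1;
    const bool atEnd = (*p == '\0');
    if (!overflow && atEnd) {
        if (!minus) {
            if (v <= static_cast<uint64_t>(INT32_MAX))      { r.kind = NumKind::Int;    r.i = static_cast<int64_t>(v); }
            else if (v <= UINT32_MAX)                       { r.kind = NumKind::Uint;   r.u = v; }
            else if (v <= static_cast<uint64_t>(INT64_MAX)) { r.kind = NumKind::Int64;  r.i = static_cast<int64_t>(v); }
            else                                            { r.kind = NumKind::Uint64; r.u = v; }
            return r;
        }
        if (v <= kTwoPow63) {
            r.kind = (v <= kTwoPow31) ? NumKind::Int : NumKind::Int64;
            r.i = (v == kTwoPow63) ? INT64_MIN : -static_cast<int64_t>(v);
            return r;
        }
    }
    // Fraction, exponent, or an integer too large for 64 bits: fall back to strtod.
    if (overflow || atEnd || *p == '.' || *p == 'e' || *p == 'E') {
        char* end = nullptr;
        const double d = std::strtod(s, &end);
        if (end != s && *end == '\0') { r.kind = NumKind::Double; r.d = d; }
    }
    return r;
}
```

For instance, `ParseNumber("4294967295").kind` is `NumKind::Uint`, while one more (`"4294967296"`) yields `NumKind::Int64`.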
  • 32. How difficult? • PHP Hangs On Numeric Value 2.2250738585072011e-308 http://www.exploringbinary.com/php-hangs-on-numeric-value-2-2250738585072011e-308/ • Java Hangs When Converting 2.2250738585072012e-308 http://www.exploringbinary.com/java-hangs-when-converting-2-2250738585072012e-308/ • "2.22507385850720113605740979670913197593481954635164564e-308" → 2.2250738585072009e-308 • "2.22507385850720113605740979670913197593481954635164565e-308" → 2.2250738585072014e-308 • And need to be fast…
  • 33. DOM Designed for Fast Parsing • A JSON value can be one of 6 types – object, array, number, string, boolean, null • Inheritance needs new for each value • RapidJSON uses a single variant type Value
  • 34. Layout of Value • String: Ch* str, SizeType length, unsigned flags • Number: int i, unsigned u, int64_t i64, uint64_t u64, double d, 0, 0, unsigned flags • Object: Member* members, SizeType size, SizeType capacity, unsigned flags • Array: Value* values, SizeType size, SizeType capacity, unsigned flags
  • 35. Move Semantics • Deep copying object/array/string is slow • RapidJSON enforces move semantics
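A rough sketch of a single variant type with enforced move semantics is shown below. `ToyValue` is a hypothetical, heavily reduced class (null, int and string only), not RapidJSON's `Value`. The point it illustrates is that assignment transfers ownership of the payload and leaves the source as null, so no deep copy ever happens.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

class ToyValue {
public:
    enum Type { kNull, kInt, kString };

    ToyValue() : type_(kNull) { data_.i = 0; }
    explicit ToyValue(int i) : type_(kInt) { data_.i = i; }
    explicit ToyValue(const char* s) : type_(kString) {
        const std::size_t n = std::strlen(s) + 1;
        data_.s = static_cast<char*>(std::malloc(n));
        assert(data_.s != nullptr);
        std::memcpy(data_.s, s, n);
    }
    ~ToyValue() { Clear(); }

    ToyValue(const ToyValue&) = delete;   // copying is not allowed

    // Assignment takes a non-const reference and moves: the payload is
    // handed over and rhs becomes null.
    ToyValue& operator=(ToyValue& rhs) {
        if (this != &rhs) {
            Clear();
            type_ = rhs.type_;
            data_ = rhs.data_;
            rhs.type_ = kNull;
        }
        return *this;
    }

    bool IsNull() const { return type_ == kNull; }
    bool IsString() const { return type_ == kString; }
    int GetInt() const { assert(type_ == kInt); return data_.i; }
    const char* GetString() const { assert(type_ == kString); return data_.s; }

private:
    void Clear() {
        if (type_ == kString) std::free(data_.s);
        type_ = kNull;
    }
    Type type_;
    union Data { int i; char* s; } data_;   // one storage slot shared by all types
};
```

After `b = a;`, `a.IsNull()` is true and `b` owns the very same string buffer that `a` held before.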
  • 36. The Default Allocator • Internally allocates a single linked-list of buffers • Do not free objects (thus FAST!) • Suitable for parsing (creating values consecutively) • Not suitable for DOM manipulation
  • 37. Custom Initial Buffer • User can provide a custom initial buffer – For example, buffer on stack, scratch buffer • The allocator uses that buffer first until it is full • Possible to achieve zero allocation in parsing
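The two allocator slides can be summarised in a small bump allocator. This is a sketch under stated assumptions, not RapidJSON's allocator: `ToyPoolAllocator` and all its members are invented names, allocations are rounded up to 8 bytes, and the user buffer is assumed to be 8-byte aligned. It serves requests from the user buffer first, falls back to a singly linked list of heap chunks, and treats `Free` as a no-op; memory is released only when the allocator is destroyed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

class ToyPoolAllocator {
public:
    ToyPoolAllocator(void* userBuffer, std::size_t userSize, std::size_t chunkSize = 4096)
        : cur_(static_cast<char*>(userBuffer)), end_(cur_ + userSize),
          chunks_(nullptr), chunkSize_(chunkSize), heapChunks_(0) {
        assert(chunkSize > 0);
    }
    ~ToyPoolAllocator() {
        while (chunks_ != nullptr) {          // release all heap chunks at once
            Chunk* next = chunks_->next;
            std::free(chunks_);
            chunks_ = next;
        }
    }
    ToyPoolAllocator(const ToyPoolAllocator&) = delete;
    ToyPoolAllocator& operator=(const ToyPoolAllocator&) = delete;

    void* Malloc(std::size_t size) {
        size = (size + 7) & ~static_cast<std::size_t>(7);   // round up to 8 bytes
        if (static_cast<std::size_t>(end_ - cur_) < size) {
            // Current buffer is full: link in a new heap chunk.
            const std::size_t payload = size > chunkSize_ ? size : chunkSize_;
            Chunk* chunk = static_cast<Chunk*>(std::malloc(sizeof(Chunk) + payload));
            if (chunk == nullptr) return nullptr;
            chunk->next = chunks_;
            chunks_ = chunk;
            ++heapChunks_;
            cur_ = reinterpret_cast<char*>(chunk + 1);
            end_ = cur_ + payload;
        }
        void* result = cur_;
        cur_ += size;                          // bump the pointer
        return result;
    }
    void Free(void*) {}                        // individual frees are ignored
    std::size_t HeapChunks() const { return heapChunks_; }

private:
    struct alignas(8) Chunk { Chunk* next; };  // header placed before each heap chunk
    char* cur_;
    char* end_;
    Chunk* chunks_;
    std::size_t chunkSize_;
    std::size_t heapChunks_;
};
```

With a 256-byte stack buffer, two 100-byte requests are served without touching the heap (`HeapChunks()` stays 0); the third one triggers the first heap chunk. Because nothing is ever reclaimed, repeatedly replacing values in a DOM keeps growing the pool, which is why the slide calls it unsuitable for DOM manipulation.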
  • 38. Short String Optimization • Many JSON keys are short • Contributor @Kosta-Github submitted a PR to optimize short strings • String: Ch* str, SizeType length, unsigned flags • ShortString: Ch str[11]; uint8_t x; unsigned flags • Let length = 11 – x • So an 11-character string has x = 0, and that byte doubles as the terminating '\0'
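The length trick on this slide can be sketched in a few lines. `ToyShortString` is a hypothetical stand-in: it uses one 12-byte array in which the first 11 bytes correspond to the slide's `str[11]` and the last byte corresponds to `x`. Storing `x = 11 - length` means that a full 11-character string has `x == 0`, so the same byte serves as the NUL terminator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

struct ToyShortString {
    enum : std::size_t { kMaxChars = 11 };

    // buf[0..10] holds the characters; buf[11] holds x = kMaxChars - length.
    char buf[kMaxChars + 1];

    static bool Usable(std::size_t len) { return len <= kMaxChars; }

    void Set(const char* s, std::size_t len) {
        assert(Usable(len));
        std::memcpy(buf, s, len);
        buf[len] = '\0';                                   // terminator for shorter strings
        buf[kMaxChars] = static_cast<char>(kMaxChars - len);   // x; equals 0 when len == 11
    }
    std::size_t Length() const {
        return kMaxChars - static_cast<unsigned char>(buf[kMaxChars]);
    }
    const char* CStr() const { return buf; }
};

static_assert(sizeof(ToyShortString) == 12, "11 characters plus one length byte");
```

Together with a 4-byte flags field this gives the 16 bytes shown on the slide, so short strings need no separate allocation at all.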
  • 39. SIMD Optimization • Using SSE2/SSE4 to skip whitespaces (space, tab, LF, CR) • Each iteration compares 16 chars against the 4 whitespace chars • Fast for JSON with indentation • Visual C++ 2010 32-bit test, time to skip 1M whitespace (ms): strlen() (for reference) 752; strspn() 3011; RapidJSON (no SIMD) 1349; RapidJSON (SSE2) 170; RapidJSON (SSE4) 102
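An SSE2 whitespace skipper along these lines might look like the sketch below. It is an illustration, not RapidJSON's code: `SkipWhitespace` here takes an explicit end pointer and uses unaligned loads so that it never reads outside the given range, and it falls back to a scalar loop on targets without SSE2. `TOY_HAS_SSE2` and `IsJsonSpace` are names made up for this example.

```cpp
#include <cassert>
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define TOY_HAS_SSE2 1
#endif

inline bool IsJsonSpace(char c) {
    return c == ' ' || c == '\t' || c == '\n' || c == '\r';
}

// Returns a pointer to the first non-whitespace character in [p, end),
// or end if the whole range is whitespace.
inline const char* SkipWhitespace(const char* p, const char* end) {
    assert(p <= end);
#ifdef TOY_HAS_SSE2
    const __m128i space = _mm_set1_epi8(' ');
    const __m128i tab   = _mm_set1_epi8('\t');
    const __m128i lf    = _mm_set1_epi8('\n');
    const __m128i cr    = _mm_set1_epi8('\r');
    while (end - p >= 16) {
        // Compare 16 input bytes against each of the 4 whitespace characters.
        const __m128i chunk = _mm_loadu_si128(reinterpret_cast<const __m128i*>(p));
        __m128i isWs = _mm_cmpeq_epi8(chunk, space);
        isWs = _mm_or_si128(isWs, _mm_cmpeq_epi8(chunk, tab));
        isWs = _mm_or_si128(isWs, _mm_cmpeq_epi8(chunk, lf));
        isWs = _mm_or_si128(isWs, _mm_cmpeq_epi8(chunk, cr));
        // One bit per byte; a set bit in nonWs marks a non-whitespace byte.
        const unsigned nonWs = ~static_cast<unsigned>(_mm_movemask_epi8(isWs)) & 0xFFFFu;
        if (nonWs != 0) {
            unsigned i = 0;
            while (((nonWs >> i) & 1u) == 0) ++i;   // index of the lowest set bit
            return p + i;
        }
        p += 16;
    }
#endif
    while (p != end && IsJsonSpace(*p)) ++p;        // scalar tail (or full fallback)
    return p;
}
```

Indented JSON contains long runs of spaces and newlines, which is where handling 16 bytes per iteration pays off.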
  • 40. Integer-to-String Optimization • Integer-To-String conversion is simple – E.g. 123 -> “123” • But standard library is quite slow – E.g. sprintf(), _itoa(), etc. • Tried various implementations
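One commonly used fast approach, shown here only as an example of the kind of implementation such a comparison might include, converts two digits per step through a 200-byte lookup table, halving the number of divisions. The slide does not say which variant was finally chosen, and `U32ToString` is a hypothetical name for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Writes the decimal form of `value` into `buffer` (at least 11 bytes:
// up to 10 digits plus the terminator) and returns a pointer to the terminator.
inline char* U32ToString(uint32_t value, char* buffer) {
    assert(buffer != nullptr);
    // kDigits[2*n] and kDigits[2*n + 1] are the two digits of n, for 0 <= n <= 99.
    static const char kDigits[201] =
        "00010203040506070809"
        "10111213141516171819"
        "20212223242526272829"
        "30313233343536373839"
        "40414243444546474849"
        "50515253545556575859"
        "60616263646566676869"
        "70717273747576777879"
        "80818283848586878889"
        "90919293949596979899";
    char temp[10];
    char* p = temp + 10;              // fill temp from the back
    while (value >= 100) {
        const unsigned i = (value % 100) * 2;
        value /= 100;
        *--p = kDigits[i + 1];
        *--p = kDigits[i];
    }
    if (value < 10) {
        *--p = static_cast<char>('0' + value);
    } else {
        const unsigned i = value * 2;
        *--p = kDigits[i + 1];
        *--p = kDigits[i];
    }
    const std::size_t n = static_cast<std::size_t>(temp + 10 - p);
    std::memcpy(buffer, p, n);
    buffer[n] = '\0';
    return buffer + n;
}
```

Unlike `sprintf()`, this does no format-string parsing and no locale handling, which accounts for much of the difference.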
  • 42. Double-to-String Optimization • Double-to-string conversion is very slow – E.g. 3.14 -> “3.14” • Grisu2 is a fast algorithm for this[3] – 100% cases give correct results – >99% cases give optimal results • Google V8 has an implementation – https://github.com/floitsch/double-conversion – But not header-only, so…
  • 43. My Grisu2 Implementation • https://github.com/miloyip/dtoa-benchmark • Visual C++ 2013 on Windows 64-bit:
  • 45. Tradeoff: User-Friendliness • DOM only supports move semantics – Cannot copy-construct Value/Document – So, cannot pass them by value or put them in containers • DOM APIs need an allocator as a parameter, e.g. numbers.PushBack(1, allocator); • Users must manage the life cycle of the allocator and the values it allocates
  • 46. Pausing in Parsing • Cannot pause in parsing and resume it later – Not keeping all parsing states explicitly – Doing so will be much slower • Typical Scenario – Streaming JSON from network – Don’t want to store the JSON in memory • Solution – Parse in a separate thread – Block the input stream to pause
  • 48. Origin • RapidJSON is my hobby project in 2011 • Also my first open source project • First version released in 2 weeks
  • 49. Community • Google Code helped track bugs, but made it hard to attract contributions • After migrating to GitHub in 2014 – Community much more active – Issue tracking more powerful – Pull requests ease contributions
  • 50. Future • Official Release under Tencent – 1.0 beta → 1.0 release (after 3+ years…) – Can work on it in working time – Involve marketing and other colleagues – Establish Community in China • Post-1.0 Features – Easy DOM API (but slower) – JSON Schema – Relaxed JSON syntax – Optimization on Object Member Access • Open source our internal projects at Tencent
  • 51. To Establish an Open Source Project • Courage • Start Small • Make Different – Innovative Idea? – Easy to Use? – Good Performance? • Embrace Community • Learn
  • 52. References 1. Stroustrup, Bjarne. The design and evolution of C++. Pearson Education India, 1994. 2. Clinger, William D. How to read floating point numbers accurately. Vol. 25. No. 6. ACM, 1990. 3. Loitsch, Florian. "Printing floating-point numbers quickly and accurately with integers." ACM Sigplan Notices 45.6 (2010): 233-243.
  • 53. Q&A