Screaming Fast JSON parsing
Karthik Ramgopal
Who am I?
Engineer
Mobile Infrastructure lead
Former engineer on Flagship and Pulse app teams
Obsessed about performance
Connect with me: https://www.linkedin.com/in/karthikrg/
Our user base
LinkedIn’s Android app family
Job Search
Lookup
Pulse
Slideshare
Sales Navigator
Lynda
Recruiter
Students
Android device and network diversity
● Samsung Galaxy S6
● 4x2.1 GHz Cortex-A57 + 4x1.5 GHz Cortex-A53
● 3 GB RAM
● LTE (100 Mbits/s)
● Samsung Star Pro
● 1 GHz Cortex-A5
● 512 MB RAM
● EDGE (384 Kbps)
LinkedIn client app high level architecture
Frontend API server
LinkedIn uses JSON to talk between apps and server
What is JSON?
JavaScript Object Notation is a data serialization format.
Key value encoded data.
Values must be string, boolean, number, array, object, null.
Text based, Light weight (relatively), Human readable.
Wide support across programming languages/platforms
What else is out there?
XML (eXtensible Markup Language)
(+) Text based and human readable.
(-) Very verbose.
Binary Data Formats
Examples include MsgPack, ProtoBuf, FlatBuffers, Cap’n Proto etc.
(+) More compact than JSON. Positional index based formats even omit keys.
(+) Backing schema to describe data structure with platform specific binding generators
(+) Much faster to parse than JSON when using vanilla parsing techniques.
(-) Not human readable.
(-) No native parsing support in web browsers.
(-) Removed fields still occupy some space in positional formats.
(-) Schema evolution MUST preserve field order in positional formats.
Data Flow
[Figure: data flow. Network → Data (JSON/XML/Binary) → Parser → DataModel → Model Binder → ViewModel → View Binder. Fission persists the DataModel as binary in an MMAP cache.]
What affects JSON parsing performance?
CPU
Validating structure and tokenizing.
Large number of branches causing pipeline stalls.
Memory
Large number of small allocs on the heap
Causes memory churn slowing down the allocator
Garbage collection pauses
Types of JSON parsers
Who controls the flow of parsed data to the consumer?
Pull parser (Consumer controls)
Push parser (Parser controls)
How many times is the data processed?
Once (traditional parsers)
Twice (index overlay parsers)
How is the data processed?
JSON vs Binary
JSON (naturally) has a size disadvantage over binary
But, it is human readable and has wider multi-platform support
Schema evolution is easier
Size does matter, or does it?
JSON compresses very well being text based and having key repetition
Binary formats don’t compress as well
With compression, size over the wire is very comparable
Decompression cost is similar, but after decompression binary is smaller
Format           | Compressed size (gzip) | Uncompressed size
JSON             | 35.2 KB                | 309.5 KB
Protocol Buffers | 33.7 KB                | 178.2 KB
FlatBuffers      | 34.1 KB                | 192.8 KB
Cap’n Proto      | 33.8 KB                | 166.3 KB
LinkedIn Feed, 20 items (90th percentile sizes)
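The effect of key repetition on compression can be checked with a small experiment. The sketch below is only an illustration: the payload is a made-up array of profile-like objects (not the LinkedIn feed), and the class and method names are hypothetical. It uses only the JDK's `GZIPOutputStream`.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

final class JsonGzipDemo {
    /** Builds a hypothetical JSON array whose objects repeat the same keys. */
    static String sampleJson(int items) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < items; i++) {
            if (i > 0) sb.append(',');
            sb.append("{\"numConnections\":").append(i)
              .append(",\"name\":\"Member ").append(i).append("\"}");
        }
        return sb.append(']').toString();
    }

    /** Returns the gzip-compressed size of the text, in bytes. */
    static int gzipSize(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            // Not expected for an in-memory stream.
            throw new UncheckedIOException(e);
        }
        return out.size();
    }
}
```

Comparing `JsonGzipDemo.gzipSize(json)` with the raw UTF-8 length of `json` shows how much of the repeated key text the compressor removes. The exact ratio depends on the payload.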
Comparison of Android JSON parsing libraries
Parser            | Streaming | Reflection | Parse time (ms) | Allocation (KB)
JSONObject        | No        | No         | 297/281         | 2397/2371
JsonReader        | Yes       | No         | 199/187         | 409/396
Alibaba streaming | Yes       | No         | 72/70           | 220/185
GSON              | Yes       | Yes        | 521/486         | 1135/302
Moshi             | Yes       | Yes        | 493/311         | 1088/341
Jackson Databind  | Yes       | Yes        | 402/78          | 1192/191
Jackson streaming | Yes       | No         | 79/77           | 219/187
LinkedIn Feed, 20 items (First/Subsequent), Nexus 5
● Using reflection introduces a massive first time penalty.
● Alibaba and Jackson streaming win hands down with Alibaba having the slight edge.
What is the ideal way to parse network responses?
Streaming (SAX) vs blob (DOM) parsing
Stream means parsing can begin before network download finishes.
Memory pressure/Garbage is reduced with streaming.
Typically harder to code by hand (need to handle incremental data load etc.)
Minimize transformations
Typical parsing involves JSON -> Map -> Model object POJO.
Intermediary transformation involves CPU and memory.
Go directly from JSON to POJO.
Android specific code generation considerations
Prefer fields instead of methods for accessors on POJO.
65k method count limit pre Android L
Virtual function execution penalty
Use primitive types wherever possible
int instead of Integer for example
Boxed values are allocated on the heap and result in unnecessary memory churn
Generate compact code
Surely someone must have figured all this out?
Yes! Open source code-generating JSON parsers based on Jackson streaming.
Instagram JSON parser
LoganSquare (Uses a teeny bit of reflection)
How does the generated code look?
{
    "numConnections": 20,
    "name": "John"
}
profile.json
Profile build(JsonParser parser) {
    // Locals need initial values, otherwise javac rejects the return below.
    String name = null;
    int numConnections = 0;
    parser.startRecord(); // Consumes '{'
    while (parser.hasMoreFields()) {
        String field = parser.getText();
        parser.startField(); // Consumes ':'
        if ("numConnections".equals(field)) {
            numConnections = parser.getInteger();
        } else if ("name".equals(field)) {
            name = parser.getText();
        } else {
            parser.skipField(); // Unknown key: skip its value
        }
    }
    return new Profile(numConnections, name);
}
But binary still wins!
Much faster (less CPU consumption)
Far fewer intermediate memory allocations (less memory churn and garbage)
Parser                | Streaming | Reflection | Parse time (ms) | Allocation (KB)
Alibaba streaming     | Yes       | No         | 72/70           | 220/185
Jackson streaming     | Yes       | No         | 79/77           | 219/187
Protocol Buffers Lite | Yes       | No         | 32/31           | 66/62
LinkedIn Feed, 20 items (First/Subsequent), Nexus 5
The gap is wider on lower end devices
Binary is ~4x faster
Could be the difference between delight and despair!
Parser                | Streaming | Reflection | Parse time (ms) | Allocation (KB)
Alibaba streaming     | Yes       | No         | 377/370         | 220/185
Jackson streaming     | Yes       | No         | 392/397         | 219/187
Protocol Buffers Lite | Yes       | No         | 99/97           | 66/62
LinkedIn Feed, 20 items (First/Subsequent), Samsung Galaxy Star Pro
Closing the gap with binary
Make the CPU do less work when parsing JSON
Fewer memory allocations
Reduce garbage and memory churn
All when parsing more data
Don’t pay for what you don’t use
The hunt for inefficiencies: JSON keys
Positional binary formats achieve compaction and faster parsing since they
don’t serialize keys, and use position based encoding.
Parsing keys involves the following
Allocating key strings.
Comparing key strings with known “keys” to figure out which field to match
Back to code
Profile build(JsonParser parser) {
    String name = null;
    int numConnections = 0;
    parser.startRecord(); // Consumes '{'
    while (parser.hasMoreFields()) {
        String field = parser.getText(); // String alloc: a temporary key String per field
        parser.startField(); // Consumes ':'
        if ("numConnections".equals(field)) { // Comparisons: one equals() per known key
            numConnections = parser.getInteger();
        } else if ("name".equals(field)) {
            name = parser.getText();
        } else {
            parser.skipField();
        }
    }
    return new Profile(numConnections, name);
}
The cost of JSON key comparisons
If there are ‘n’ keys with an average length of ‘k’.
Temporary memory allocation space complexity O(nk)
Equality checking time complexity O(n²k)
But we know the keys in advance, so can we use this to our advantage?
Yes! Use a trie with positional ordinals as values
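[Figure: trie holding the keys “name” and “numConnections”, which share the prefix “n”; the final node of each key stores its ordinal (0 or 1).]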
● Trades a 1 time static space allocation for faster performance.
● No temp string allocation. Read character by character from
source and check in trie.
● Avoids multiple comparison branches using if-else.
● Trie can be statically generated (since all key names are known
in advance)
● Trie can also be compacted to reduce storage space for non
redundant subsequences.
● Reduces space complexity to a 1 time cost of O(nk)
● Reduces equality checking time complexity to O(nk)
● Faster performance due to less branching.
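A minimal sketch of such a key trie is shown below. It is an illustration, not the parser's actual data structure: the class name `KeyTrie`, the method `lookup`, the ASCII-only restriction and the uncompacted 128-way nodes are all assumptions made to keep the example short. Keys containing escape sequences are not handled.

```java
final class KeyTrie {
    static final int NO_MATCH = -1;

    private static final class Node {
        final Node[] next = new Node[128]; // one slot per ASCII char (assumption: ASCII keys)
        int ordinal = NO_MATCH;            // set only on the node that ends a key
    }

    private final Node root = new Node();

    /** Registers a known key with its ordinal. Done once, up front. */
    void put(String key, int ordinal) {
        Node node = root;
        for (int i = 0; i < key.length(); i++) {
            char c = key.charAt(i);
            if (c >= 128) throw new IllegalArgumentException("ASCII keys only in this sketch");
            if (node.next[c] == null) node.next[c] = new Node();
            node = node.next[c];
        }
        node.ordinal = ordinal;
    }

    /**
     * Walks the trie over the key bytes starting at buf[start] and ending at
     * the closing quote. Returns the ordinal, or NO_MATCH for an unknown key.
     * No temporary String is allocated.
     */
    int lookup(byte[] buf, int start) {
        Node node = root;
        for (int i = start; i < buf.length && buf[i] != '"'; i++) {
            int b = buf[i];
            if (b < 0 || node == null) return NO_MATCH; // non-ASCII byte or dead end
            node = node.next[b];
        }
        return node == null ? NO_MATCH : node.ordinal;
    }
}
```

A generated parser could then switch on the returned ordinal instead of chaining `equals()` calls. A production trie would likely be compacted rather than using a full array per node.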
Generated code with Trie
private static final Trie KEY_STORE = new Trie();
static {
    KEY_STORE.put("name", 0);
    KEY_STORE.put("numConnections", 1);
}
Profile build(NewJsonParser parser) {
    String name = null;
    int numConnections = 0;
    parser.startRecord(); // Consumes '{'
    while (parser.hasMoreFields()) {
        int ordinal = parser.getFieldOrdinal(KEY_STORE);
        parser.startField(); // Consumes ':'
        // Case labels must match the ordinals registered in KEY_STORE above.
        switch (ordinal) {
            case 0: name = parser.getText();
                break;
            case 1: numConnections = parser.getInteger();
                break;
            default: parser.skipField();
        }
    }
    return new Profile(numConnections, name);
}
How does this change the numbers?
Closes the gap but not enough!
Parser                | Parse time (ms) | Allocation (KB)
Alibaba streaming     | 72/70           | 220/185
Jackson streaming     | 79/77           | 219/187
Protocol Buffers Lite | 32/31           | 66/62
New Json parser       | 57/55           | 129/107
LinkedIn Feed, 20 items (First/Subsequent), Nexus 5
Exploiting prior knowledge of value types
Our JSON is backed by a schema. Schemas are written using an IDL.
We internally use PDL (Pegasus Data Language) as the IDL.
record Profile {
numConnections: int?
name: String?
}
● Records define a JSON object.
● Field names here are the field names in the serialized JSON.
● Types in the schema are types of values in the serialized JSON.
● Knowing types beforehand means parsing code can be lax and needn’t have strict checks.
● If an unexpected type is found, JSON is malformed, abort!
{
    "numConnections": 20,
    "name": "John"
}
Vanilla JSON parser field value parsing
Field start (:), then branch on the first character of the value:
{ -> Object/Map
[ -> Array
- or 0 to 9 -> Number
" -> String
t or f -> Boolean
n -> Null
● Since we know types beforehand, these branches can be avoided.
● Parsing of value can be on-demand.
● Significantly reduces parse time.
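To make the idea concrete, here is a small sketch of a schema-directed read: the generated code already knows the field is an int, so it reads digits directly instead of dispatching on the first character across all six value types. The class and method names are hypothetical, and the sketch does not check for integer overflow.

```java
final class TypedValueReader {
    /**
     * Reads a JSON integer starting at buf[pos], trusting the schema that an
     * int is expected here. Throws if the input cannot start an integer,
     * which is treated as malformed JSON. No overflow check in this sketch.
     */
    static int readInt(byte[] buf, int pos) {
        int i = pos;
        boolean negative = false;
        if (i < buf.length && buf[i] == '-') {
            negative = true;
            i++;
        }
        if (i >= buf.length || buf[i] < '0' || buf[i] > '9') {
            throw new IllegalStateException("Malformed JSON: expected int at offset " + pos);
        }
        int value = 0;
        // Accumulate digits until the first non-digit (',', '}', whitespace, ...).
        while (i < buf.length && buf[i] >= '0' && buf[i] <= '9') {
            value = value * 10 + (buf[i] - '0');
            i++;
        }
        return negative ? -value : value;
    }
}
```

A full parser would also return or track the position after the last digit; that bookkeeping is omitted here.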
How does this change the numbers?
Closes the gap more on parse time, temp allocations are still pretty bad!
Parser                | Parse time (ms) | Allocation (KB)
Alibaba streaming     | 72/70           | 220/185
Jackson streaming     | 79/77           | 219/187
Protocol Buffers Lite | 32/31           | 66/62
New Json parser       | 45/42           | 127/108
LinkedIn Feed, 20 items (First/Subsequent), Nexus 5
All obvious issues seem fixed. What else?
Sometimes profiling is the only answer to find hotspots.
Data arrives as a UTF-8 byte stream over the network not as chars.
LinkedIn app payloads are massively String heavy.
Profiling showed some CPU and allocation hotspots
Converting bytes to chars using Java’s built-in decoder.
Reading strings.
Converting bytes to chars?
Another transformation.
Temporary memory allocs for decoding buffers etc.
Most JSON tokens are ASCII, can use just 1 byte for them instead of 2
Surprise! Jackson, Alibaba etc. do have separate UTF-8 stream parsers.
We adopt a Jackson-like optimized approach when decoding UTF-8 strings.
UTF-8 decoding
Variable length encoding
1 byte/ASCII characters (U+0000 to U+007F)
2 byte chars (U+0080 to U+07FF)
3 byte chars (U+0800 to U+FFFF)
4 byte chars (U+10000 to U+10FFFF)
int c = inputStream.read(); // assumes c >= 0, i.e. not end of stream
if (c <= 0x007F) {
    // 1 byte (0x0000 - 0x007F)
    // read 1 byte UTF
}
else if ((c & 0xE0) == 0xC0)
{ // 2 bytes (0x0080 - 0x07FF)
    // read 2 byte UTF
}
else if ((c & 0xF0) == 0xE0)
{ // 3 bytes (0x0800 - 0xFFFF)
    // read 3 byte UTF
}
else if ((c & 0xF8) == 0xF0)
{
    // 4 bytes; double-char with surrogates.
    // read 4 byte UTF
}
Up to 4 branches
Can we make this faster? Yes!
● Static 256 int alloc, but helps us massively during
decode.
● Reduces CPU computation during decode as well as
branches.
● Massively speeds up string decode.
UTF-8 decoding revised
int c = inputStream.read();
switch (UTF_8_LOOKUP_TABLE[c]) {
case 0: // read 1 byte char;
break;
case 2: // read 2 byte char;
break;
case 3: // read 3 byte char;
break;
case 4: // read 4 byte char;
break;
default: // handle error;
break;
}
1 branch, 1 comparison computation per char
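One way such a 256-entry table could be built is sketched below. The encoding of the entries (0 for ASCII, 2/3/4 for the sequence length, -1 for an invalid lead byte) follows the switch above; the class name and the `classify` helper are assumptions for illustration. The table is simplified: it does not reject overlong lead bytes such as 0xC0 or lead bytes above 0xF4.

```java
final class Utf8Lengths {
    /** Maps a lead byte (0..255) to 0 = ASCII, 2/3/4 = sequence length, -1 = invalid. */
    static final int[] UTF_8_LOOKUP_TABLE = new int[256];

    static {
        for (int b = 0; b < 256; b++) {
            if (b < 0x80) {
                UTF_8_LOOKUP_TABLE[b] = 0;   // 0xxxxxxx: ASCII
            } else if ((b & 0xE0) == 0xC0) {
                UTF_8_LOOKUP_TABLE[b] = 2;   // 110xxxxx
            } else if ((b & 0xF0) == 0xE0) {
                UTF_8_LOOKUP_TABLE[b] = 3;   // 1110xxxx
            } else if ((b & 0xF8) == 0xF0) {
                UTF_8_LOOKUP_TABLE[b] = 4;   // 11110xxx
            } else {
                UTF_8_LOOKUP_TABLE[b] = -1;  // continuation byte or 0xF8..0xFF
            }
        }
    }

    /** Classifies a lead byte; masking lets callers pass a signed Java byte. */
    static int classify(int leadByte) {
        return UTF_8_LOOKUP_TABLE[leadByte & 0xFF];
    }
}
```

The bit tests run once when the class loads; at decode time each byte costs one array read and one switch.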
Reading long strings
Traditional approach using StringBuilder:
StringBuilder builder = new StringBuilder();
while (!parser.stringEndReached()) {
    builder.append(parser.nextChar());
}
return builder.toString();
● Every time buffer is enlarged to make more space three things happen
○ Allocating a new buffer (CPU + memory alloc).
○ Copying from old buffer to new buffer (CPU cost).
○ Garbage collecting old buffer (Memory churn and garbage).
● If we pool the underlying buffers in a buffer pool, and use a custom ‘StringBuilder’
○ Memory alloc, garbage and churn reduced.
○ CPU cost of copy still remains.
○ Over large, diverse payloads, pool becomes fragmented so efficiency reduces.
Reading long strings
Segmentation using pooled homogeneous buffers helps performance.
Zero copy cost when builder is enlarged (New buffer is appended to list)
Memory alloc, churn and garbage cost amortized by pooling.
Segmentation into homogeneous chunks means no fragmentation.
Final string computation may be slightly slower, but the buffer size is chosen so that the gains elsewhere more than make up for it.
[Figure: a string stored as a list of equal-sized pooled buffers: Buffer 1, Buffer 2, Buffer 3, Buffer 4]
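The segmented approach can be sketched roughly as follows. Everything here is an assumption made for illustration: the class name, the fixed `SEGMENT_SIZE` of 256 chars, and the simple static pool, which is not thread-safe. Growing the builder appends a pooled segment instead of copying; the only copy happens once, when the final String is built.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

final class SegmentedCharBuilder {
    static final int SEGMENT_SIZE = 256; // hypothetical size; would be tuned per device

    // Pool of equal-sized segments, so any free segment fits any request.
    private static final ArrayDeque<char[]> POOL = new ArrayDeque<>();

    private final List<char[]> segments = new ArrayList<>();
    private int usedInLast = SEGMENT_SIZE; // "full" so the first append takes a segment

    void append(char c) {
        if (usedInLast == SEGMENT_SIZE) {
            // Enlarge without copying: take a segment from the pool or allocate one.
            char[] seg = POOL.poll();
            segments.add(seg != null ? seg : new char[SEGMENT_SIZE]);
            usedInLast = 0;
        }
        segments.get(segments.size() - 1)[usedInLast++] = c;
    }

    int length() {
        return segments.isEmpty() ? 0 : (segments.size() - 1) * SEGMENT_SIZE + usedInLast;
    }

    /** Builds the String, returns all segments to the pool and resets the builder. */
    String finish() {
        int len = length();
        char[] out = new char[len];
        int copied = 0;
        for (char[] seg : segments) {
            int n = Math.min(SEGMENT_SIZE, len - copied);
            System.arraycopy(seg, 0, out, copied, n);
            copied += n;
            POOL.push(seg);
        }
        segments.clear();
        usedInLast = SEGMENT_SIZE;
        return new String(out);
    }
}
```

In this sketch the temporary `out` array is still a transient allocation; a tuned implementation would need to avoid or pool that as well.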
Characters not in the basic multilingual plane
When escaped, not encoded as a single code point.
Encoded as UTF-16 surrogate pairs, each half escaped with \u.
Historic reason for doing so (Any guesses?)
Needs to be handled carefully when parsing
Static decoder table for hex chars similar to UTF-8 to speed up parsing.
U+1D11E -> \uD834\uDD1E
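A sketch of decoding such an escaped surrogate pair with a static hex table is shown below. The class and method names are hypothetical; `Character.isHighSurrogate`, `Character.isLowSurrogate` and `Character.toCodePoint` are standard JDK methods. The input is assumed to point at the backslash that starts the escape.

```java
import java.util.Arrays;

final class UnicodeEscapes {
    /** Maps an ASCII char to its hex value, or -1 if it is not a hex digit. */
    private static final int[] HEX = new int[128];

    static {
        Arrays.fill(HEX, -1);
        for (int i = 0; i < 10; i++) HEX['0' + i] = i;
        for (int i = 0; i < 6; i++) {
            HEX['a' + i] = 10 + i;
            HEX['A' + i] = 10 + i;
        }
    }

    /** Reads the four hex digits at s[pos..pos+3] using the lookup table. */
    static int readHex4(CharSequence s, int pos) {
        int value = 0;
        for (int i = 0; i < 4; i++) {
            char c = s.charAt(pos + i);
            int digit = c < 128 ? HEX[c] : -1;
            if (digit < 0) throw new IllegalArgumentException("Bad hex digit at " + (pos + i));
            value = (value << 4) | digit;
        }
        return value;
    }

    /**
     * Decodes the escape whose backslash is at s[pos]. If it holds a high
     * surrogate, the following escape must hold the low surrogate, and the
     * two are joined into a single code point.
     */
    static int readEscapedCodePoint(CharSequence s, int pos) {
        char first = (char) readHex4(s, pos + 2);
        if (!Character.isHighSurrogate(first)) return first;
        if (pos + 12 > s.length() || s.charAt(pos + 6) != '\\' || s.charAt(pos + 7) != 'u') {
            throw new IllegalArgumentException("High surrogate without a following escape");
        }
        char low = (char) readHex4(s, pos + 8);
        if (!Character.isLowSurrogate(low)) throw new IllegalArgumentException("Unpaired surrogate");
        return Character.toCodePoint(first, low);
    }
}
```

Applied to the example above, the two escapes D834 and DD1E combine into the code point U+1D11E.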
Analysis of string content
Strings in LinkedIn apps tend to be very ASCII character heavy.
Even string values in other locales often are interspersed with ASCII content.
ASCII characters often occur together in a sequence.
Parsing these can be sped up if we use a tight loop for ASCII content.
Break out and do extra branches if non-ASCII content is encountered.
Massively improves overall string parsing performance from byte streams.
When reading ASCII, the byte value is the same as the char value.
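The tight-loop idea can be sketched as below. This is an illustrative decoder, not the parser's code: the class and method names are made up, the input is assumed to be well-formed UTF-8 (continuation bytes are not validated), and `out` is assumed to be at least as long as the input range.

```java
final class AsciiFastPathDecoder {
    /**
     * Decodes UTF-8 bytes in[start, end) into out and returns the number of
     * chars written. Assumes well-formed input in this sketch.
     */
    static int decode(byte[] in, int start, int end, char[] out) {
        int i = start;
        int o = 0;
        while (i < end) {
            // Tight loop: a non-negative Java byte is ASCII, and the byte is the char.
            while (i < end && in[i] >= 0) {
                out[o++] = (char) in[i++];
            }
            if (i >= end) break;

            // Slow path: decode one multi-byte sequence, then resume the tight loop.
            int b = in[i] & 0xFF;
            int cp;
            if ((b & 0xE0) == 0xC0) {
                cp = ((b & 0x1F) << 6) | (in[i + 1] & 0x3F);
                i += 2;
            } else if ((b & 0xF0) == 0xE0) {
                cp = ((b & 0x0F) << 12) | ((in[i + 1] & 0x3F) << 6) | (in[i + 2] & 0x3F);
                i += 3;
            } else if ((b & 0xF8) == 0xF0) {
                cp = ((b & 0x07) << 18) | ((in[i + 1] & 0x3F) << 12)
                        | ((in[i + 2] & 0x3F) << 6) | (in[i + 3] & 0x3F);
                i += 4;
            } else {
                throw new IllegalArgumentException("Invalid UTF-8 lead byte at " + i);
            }
            // Writes one char, or a surrogate pair for code points above U+FFFF.
            o += Character.toChars(cp, out, o);
        }
        return o;
    }
}
```

For mostly-ASCII payloads nearly all bytes are handled by the inner loop, which does a single comparison per byte.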
Whitespaces
JSON sent over the wire is not pretty-printed, to keep it compact.
When parsing delimiters, check for the delimiter first, before skipping whitespace.
Within whitespace itself, a plain space is more likely to occur than a carriage return, line feed or tab.
Tight loop for space characters when skipping whitespace.
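The ordering of those checks might look like the sketch below: delimiter first, then a tight loop over plain spaces, then the rarer whitespace characters. The class and method names are hypothetical, and only ASCII delimiters are considered.

```java
final class DelimiterScanner {
    /**
     * Expects the delimiter at or after buf[pos], possibly preceded by
     * whitespace. Returns the index just past the delimiter.
     */
    static int consumeDelimiter(byte[] buf, int pos, char delimiter) {
        // Common case on compact wire JSON: the delimiter is right here.
        if (pos < buf.length && buf[pos] == delimiter) return pos + 1;

        int i = pos;
        while (i < buf.length) {
            // Tight loop for plain spaces, the most likely whitespace.
            while (i < buf.length && buf[i] == ' ') i++;
            if (i >= buf.length) break;

            byte b = buf[i];
            if (b == delimiter) return i + 1;
            if (b == '\n' || b == '\r' || b == '\t') {
                i++; // rarer whitespace, checked only after the space loop
                continue;
            }
            throw new IllegalStateException("Expected '" + delimiter + "' at offset " + i);
        }
        throw new IllegalStateException("Unexpected end of input");
    }
}
```

On compact payloads the first comparison succeeds almost every time, so the whitespace loop is rarely entered.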
After doing all this...
The performance is very comparable!
Parser                | Parse time (ms) | Allocation (KB)
Alibaba streaming     | 72/70           | 220/185
Jackson streaming     | 79/77           | 219/187
Protocol Buffers Lite | 32/31           | 66/62
New Json parser       | 31/30           | 62/41
LinkedIn Feed, 20 items (First/Subsequent), Nexus 5
● Still human readable
● Still debuggable
● Can still use the same format across iOS/Android/Web
And on low end devices...
The improvements are more pronounced!
Parser                | Parse time (ms) | Allocation (KB)
Alibaba streaming     | 377/370         | 220/185
Jackson streaming     | 392/397         | 219/187
Protocol Buffers Lite | 99/97           | 66/62
New Json parser       | 99/96           | 62/41
LinkedIn Feed, 20 items (First/Subsequent), Samsung Galaxy Star Pro
● Most of the benefit comes from saving on alloc and GC pauses
● Results in smoother UI
Zero Garbage!
This new parser is Zero garbage.
It does not allocate any transient memory beyond the POJOs it creates as the result of parse.
All intermediate allocs like buffers are pooled.
Pools are homogeneous as much as possible to limit fragmentation.
Pool capacities/buffer sizes are tuned based on device and network.
Lessons learnt
It is possible to parse JSON fast even on low end Android devices.
All formats have their Achilles’ heels, and there is no one size fits all.
Never adopt some cool new format blindly. Measure, measure, measure!
What’s next?
Similar parser + codegen for iOS in Obj-C
Open source both as part of Rest.li mobile optimized bindings.
Targeted for Q4 2017
Questions?

What Goes Wrong with Language Definitions and How to Improve the Situation
 
%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benoni%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benoni
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
 
tonesoftg
tonesoftgtonesoftg
tonesoftg
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
 
Artyushina_Guest lecture_YorkU CS May 2024.pptx
Artyushina_Guest lecture_YorkU CS May 2024.pptxArtyushina_Guest lecture_YorkU CS May 2024.pptx
Artyushina_Guest lecture_YorkU CS May 2024.pptx
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
 
8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learn
 
WSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go Platformless
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 

Screaming fast JSON parsing on Android

  • 1. Screaming Fast JSON parsing Karthik Ramgopal
  • 2. Who am I? Engineer Mobile Infrastructure lead Former engineer on Flagship and Pulse app teams Obsessed about performance Connect with me: https://www.linkedin.com/in/karthikrg/
  • 4. LinkedIn’s Android app family Job Search Lookup Pulse Slideshare Sales Navigator Lynda Recruiter Students
  • 5. Android device and network diversity
      ● Samsung Galaxy S6: 4x2.1 GHz Cortex-A57 + 4x1.5 GHz Cortex-A53, 3 GB RAM, LTE (100 Mbits/s)
      ● Samsung Star Pro: 1 GHz Cortex-A5, 512 MB RAM, EDGE (384 Kbps)
  • 6. LinkedIn client app high level architecture Frontend API server
  • 7. LinkedIn uses JSON to talk between apps and server
  • 8. What is JSON? JavaScript Object Notation is a data serialization format. Key-value encoded data. Values must be a string, boolean, number, array, object or null. Text based, lightweight (relatively), human readable. Wide support across programming languages/platforms.
  • 9. What else is out there?
  • 10. XML (eXtensible Markup Language) (+) Text based and human readable. (-) Very verbose.
  • 11. Binary Data Formats Examples include MsgPack, ProtoBuf, FlatBuffers, Cap’n’Proto etc.
      (+) More compact than JSON. Positional index based formats even omit keys.
      (+) Backing schema to describe data structure with platform specific binding generators.
      (+) Much faster to parse than JSON when using vanilla parsing techniques.
      (-) Not human readable.
      (-) No native parsing support in web browsers.
      (-) Removed fields still occupy some space in positional formats.
      (-) Schema evolution MUST preserve field order in positional formats.
  • 12. Data Flow [Diagram labels: Network; Data (JSON/XML/Binary); Parser; DataModel; Model Binder; ViewModel; View Binder; Fission; MMAP Cache (Binary)]
  • 13. What affects JSON parsing performance?
      ● CPU
        ○ Validating structure and tokenizing.
        ○ Large number of branches causing pipeline stalls.
      ● Memory
        ○ Large number of small allocs on the heap.
        ○ Causes memory churn, slowing down the allocator.
        ○ Garbage collection pauses.
  • 14. Types of JSON parsers
      ● Who controls the flow of parsed data to the consumer? Pull parser (consumer controls) or push parser (parser controls).
      ● How many times is the data processed? Once (traditional parsers) or twice (index overlay parsers).
      ● How is the data processed?
  • 15. JSON vs Binary
      ● JSON (naturally) has a size disadvantage over binary.
      ● But it is human readable and has wider multi-platform support.
      ● Schema evolution is easier.
  • 16. Size does matter, or does it?
      ● JSON compresses very well, being text based and having key repetition.
      ● Binary formats don’t compress as well.
      ● With compression, size over the wire is very comparable.
      ● Decompression cost is similar, but after decompression binary is smaller.
      Format            Compressed size (gzip)   Uncompressed size
      JSON              35.2 KB                  309.5 KB
      ProtocolBuffers   33.7 KB                  178.2 KB
      FlatBuffers       34.1 KB                  192.8 KB
      Cap’n’Proto       33.8 KB                  166.3 KB
      LinkedIn Feed 20 items (90th percentile sizes)
  • 17. Comparison of Android JSON parsing libraries
      Parser              Streaming   Reflection   Parse time (ms)   Allocation (KB)
      JSONObject          No          No           297/281           2397/2371
      JsonReader          Yes         No           199/187           409/396
      Alibaba streaming   Yes         No           72/70             220/185
      GSON                Yes         Yes          521/486           1135/302
      Moshi               Yes         Yes          493/311           1088/341
      Jackson Databind    Yes         Yes          402/78            1192/191
      Jackson streaming   Yes         No           79/77             219/187
      LinkedIn Feed 20 items (First/Subsequent), Nexus 5
      ● Using reflection introduces a massive first-time penalty.
      ● Alibaba and Jackson streaming win hands down, with Alibaba having the slight edge.
  • 18. What is the ideal way to parse network responses?
      ● Streaming (SAX) vs blob (DOM) parsing
        ○ Streaming means parsing can begin before the network download finishes.
        ○ Memory pressure/garbage is reduced with streaming.
        ○ Typically harder to code by hand (need to handle incremental data load etc.)
      ● Minimize transformations
        ○ Typical parsing involves JSON -> Map -> Model object POJO.
        ○ The intermediary transformation costs CPU and memory.
        ○ Go directly from JSON to POJO.
  • 19. Android specific code generation considerations
      ● Prefer fields instead of methods for accessors on POJO.
        ○ 65k method count limit pre Android L
        ○ Virtual function execution penalty
      ● Use primitive types wherever possible
        ○ int instead of Integer for example
        ○ Boxed values are allocated on the heap and result in unnecessary memory churn
      ● Generate compact code
  • 20. Surely someone must have figured all this out? Yes! Open source code-generating JSON parsers based on Jackson streaming: Instagram JSON parser and LoganSquare (uses a teeny bit of reflection).
  • 21. How does the generated code look?
      profile.json:
        { "numConnections": 20, "name": "John" }
      Generated code:
        Profile build(JsonParser parser) {
          String name = null;      // locals need defaults in case a field is absent
          int numConnections = 0;
          parser.startRecord(); // Consumes '{'
          while (parser.hasMoreFields()) {
            String field = parser.getText();
            parser.startField(); // Consumes ':'
            if ("numConnections".equals(field)) {
              numConnections = parser.getInteger();
            } else if ("name".equals(field)) {
              name = parser.getText();
            } else {
              parser.skipField();
            }
          }
          return new Profile(numConnections, name);
        }
  • 22. But binary still wins!
      ● Much faster (less CPU consumption)
      ● Far fewer intermediary memory allocs (memory churn/garbage reduced)
      Parser                 Streaming   Reflection   Parse time (ms)   Allocation (KB)
      Alibaba streaming      Yes         No           72/70             220/185
      Jackson streaming      Yes         No           79/77             219/187
      Protocol Buffers Lite  Yes         No           32/31             66/62
      LinkedIn Feed 20 items (First/Subsequent), Nexus 5
  • 23. The gap is wider on lower end devices
      ● Binary is ~4x faster
      ● Could be the difference between delight and despair!
      Parser                 Streaming   Reflection   Parse time (ms)   Allocation (KB)
      Alibaba streaming      Yes         No           377/370           220/185
      Jackson streaming      Yes         No           392/397           219/187
      Protocol Buffers Lite  Yes         No           99/97             66/62
      LinkedIn Feed 20 items (First/Subsequent), Galaxy Star Pro
  • 24. Closing the gap with binary
      ● Make the CPU do less work when parsing JSON
      ● Fewer memory allocations
      ● Reduce garbage and memory churn
      ● All while parsing more data
  • 25. Don’t pay for what you don’t use
  • 26. The hunt for inefficiencies: JSON keys Positional binary formats achieve compaction and faster parsing since they don’t serialize keys, and use position based encoding. Parsing keys involves the following:
      ● Allocating key strings.
      ● Comparing key strings with known “keys” to figure out which field to match.
  • 27. Back to code
        Profile build(JsonParser parser) {
          String name = null;
          int numConnections = 0;
          parser.startRecord(); // Consumes '{'
          while (parser.hasMoreFields()) {
            String field = parser.getText(); // String alloc
            parser.startField(); // Consumes ':'
            if ("numConnections".equals(field)) { // Comparisons
              numConnections = parser.getInteger();
            } else if ("name".equals(field)) {
              name = parser.getText();
            } else {
              parser.skipField();
            }
          }
          return new Profile(numConnections, name);
        }
  • 28. The cost of JSON key comparisons If there are ‘n’ keys with an average length of ‘k’:
      ● Temporary memory allocation space complexity is O(nk).
      ● Equality checking time complexity is O(n²k), since each of the n parsed keys may be compared against up to n known keys at O(k) per comparison.
      But we know the keys in advance, so can we use this to our advantage?
  • 29. Yes! Use a trie with positional ordinals as values
      [Figure: trie with branches n-a-m-e and n-u-m-s sharing the root “n”; the leaves hold the ordinals 0 and 1]
      ● Trades a one-time static space allocation for faster performance.
      ● No temp string allocation. Read character by character from source and check in trie.
      ● Avoids multiple comparison branches using if-else.
      ● Trie can be statically generated (since all key names are known in advance).
      ● Trie can also be compacted to reduce storage space for non redundant subsequences.
      ● Reduces space complexity to a one-time cost of O(nk).
      ● Reduces equality checking time complexity to O(nk).
      ● Faster performance due to less branching.
  • 30. Generated code with Trie
      [Figure: the key trie, as on the previous slide]
        private static final Trie KEY_STORE = new Trie();
        static {
          KEY_STORE.put("name", 0);
          KEY_STORE.put("numConnections", 1);
        }
        Profile build(NewJsonParser parser) {
          String name = null;
          int numConnections = 0;
          parser.startRecord(); // Consumes '{'
          while (parser.hasMoreFields()) {
            int ordinal = parser.getFieldOrdinal(KEY_STORE);
            parser.startField(); // Consumes ':'
            switch (ordinal) {
              case 0: // "name", matching the ordinal registered in KEY_STORE
                name = parser.getText();
                break;
              case 1: // "numConnections"
                numConnections = parser.getInteger();
                break;
              default:
                parser.skipField();
            }
          }
          return new Profile(numConnections, name);
        }
  • 31. How does this change the numbers? Closes the gap but not enough!
      Parser                 Parse time (ms)   Allocation (KB)
      Alibaba streaming      72/70             220/185
      Jackson streaming      79/77             219/187
      Protocol Buffers Lite  32/31             66/62
      New Json parser        57/55             129/107
      LinkedIn Feed 20 items (First/Subsequent), Nexus 5
  • 32. Exploiting prior knowledge of value types
      Our JSON is backed by a schema. Schemas are written using an IDL. We internally use PDL (Pegasus Data Language) as the IDL.
        record Profile {
          numConnections: int?
          name: String?
        }
        { "numConnections": 20, "name": "John" }
      ● Records define a JSON object.
      ● Field names here are the field names in the serialized JSON.
      ● Types in the schema are types of values in the serialized JSON.
      ● Knowing types beforehand means parsing code can be lax and needn’t have strict checks.
      ● If an unexpected type is found, the JSON is malformed, abort!
  • 33. Vanilla JSON parser field value parsing
      After field start (:), a vanilla parser branches on the first character of the value:
        {             Object/Map
        [             Array
        - or 0 to 9   Number
        "             String
        t or f        Boolean
        n             Null
      ● Since we know types beforehand, these branches can be avoided.
      ● Parsing of value can be on-demand.
      ● Significantly reduces parse time.
  • 34. How does this change the numbers? Closes the gap more on parse time, temp allocations are still pretty bad!
      Parser                 Parse time (ms)   Allocation (KB)
      Alibaba streaming      72/70             220/185
      Jackson streaming      79/77             219/187
      Protocol Buffers Lite  32/31             66/62
      New Json parser        45/42             127/108
      LinkedIn Feed 20 items (First/Subsequent), Nexus 5
  • 35. All obvious issues seem fixed. What else? Sometimes profiling is the only answer to find hotspots. Data arrives as a UTF-8 byte stream over the network, not as chars. LinkedIn app payloads are massively String heavy. Profiling showed some CPU and allocation hotspots:
      ● Converting bytes to chars using Java’s built-in decoder.
      ● Reading strings.
  • 36. Converting bytes to chars? Another transformation. Temporary memory allocs for decoding buffers etc. Most JSON tokens are ASCII, can use just 1 byte for them instead of 2 Surprise! Jackson, Alibaba etc. do have separate UTF-8 stream parsers. We adopt a Jackson-like optimized approach when decoding UTF-8 strings.
  • 37. UTF-8 decoding
      Variable length encoding:
      ● 1 byte/ASCII characters (U+0000 to U+007F)
      ● 2 byte chars (U+0080 to U+07FF)
      ● 3 byte chars (U+0800 to U+FFFF)
      ● 4 byte chars (U+10000 to U+10FFFF)
        int c = inputStream.read();
        if (c <= 0x7F) {
          // read 1 byte UTF
        } else if ((c & 0xE0) == 0xC0) { // 2 bytes (0x0080 - 0x07FF)
          // read 2 byte UTF
        } else if ((c & 0xF0) == 0xE0) { // 3 bytes (0x0800 - 0xFFFF)
          // read 3 byte UTF
        } else if ((c & 0xF8) == 0xF0) { // 4 bytes; double-char with surrogates.
          // read 4 byte UTF
        }
      Up to 4 branches
  • 38. Can we make this faster? Yes!
      ● A static lookup table of 256 ints is allocated once, and helps us massively during decode.
      ● Reduces CPU computation during decode as well as branches.
      ● Massively speeds up string decode.
  • 39. UTF-8 decoding revised
        int c = inputStream.read();
        switch (UTF_8_LOOKUP_TABLE[c]) {
          case 0:
            // read 1 byte char
            break;
          case 2:
            // read 2 byte char
            break;
          case 3:
            // read 3 byte char
            break;
          case 4:
            // read 4 byte char
            break;
          default:
            // handle error
            break;
        }
      1 branch, 1 comparison computation per char
  • 40. Reading long strings
      Traditional approach using StringBuilder:
        StringBuilder builder = new StringBuilder();
        while (!parser.stringEndReached()) {
          builder.append(parser.nextChar());
        }
        return builder.toString();
      ● Every time the buffer is enlarged to make more space, three things happen:
        ○ Allocating a new buffer (CPU + memory alloc).
        ○ Copying from old buffer to new buffer (CPU cost).
        ○ Garbage collecting old buffer (memory churn and garbage).
      ● If we pool the underlying buffers in a buffer pool, and use a custom ‘StringBuilder’:
        ○ Memory alloc, garbage and churn reduced.
        ○ CPU cost of copy still remains.
        ○ Over large, diverse payloads, the pool becomes fragmented so efficiency reduces.
  • 41. Reading long strings Segmentation using pooled homogeneous buffers helps performance.
      ● Zero copy cost when the builder is enlarged (a new buffer is appended to the list).
      ● Memory alloc, churn and garbage cost amortized by pooling.
      ● Segmentation into homogeneous chunks means no fragmentation.
      ● Final string computation may be slightly slower, but buffer size is chosen in a way that advantages elsewhere more than cover it.
      [Figure: a string spread across a list of segments, Buffer 1 to Buffer 4]
  • 42. Characters not in the basic multilingual plane Not encoded as codepoints. Encoded as UTF-16 surrogate pairs escaped with \u. Historic reason for doing so (Any guesses?) Needs to be handled carefully when parsing. Static decoder table for hex chars, similar to UTF-8, to speed up parsing. U+1D11E -> \uD834\uDD1E
  • 43. Analysis of string content Strings in LinkedIn apps tend to be very ASCII character heavy. Even string values in other locales are often interspersed with ASCII content. ASCII characters often occur together in a sequence. Parsing these can be sped up if we use a tight loop for ASCII content. Break out and do extra branches if non-ASCII content is encountered. Massively improves overall string parsing performance from byte streams. When reading ASCII, the byte is the same as the char.
  • 44. Whitespaces JSON sent over the wire is not pretty-printed, to keep it compact. When parsing delimiters, check for the delimiter first, before skipping whitespace. Among whitespace characters, a plain space has a higher chance of occurring than a carriage return, line feed or tab. Tight loop for space characters when skipping whitespace.
  • 45. After doing all this... The performance is very comparable!
      Parser                 Parse time (ms)   Allocation (KB)
      Alibaba streaming      72/70             220/185
      Jackson streaming      79/77             219/187
      Protocol Buffers Lite  32/31             66/62
      New Json parser        31/30             62/41
      LinkedIn Feed 20 items (First/Subsequent), Nexus 5
      ● Still human readable
      ● Still debuggable
      ● Can still use the same format across iOS/Android/Web
  • 46. And on low end devices... The improvements are more profound!
      Parser                 Parse time (ms)   Allocation (KB)
      Alibaba streaming      377/370           220/185
      Jackson streaming      392/397           219/187
      Protocol Buffers Lite  99/97             66/62
      New Json parser        99/96             62/41
      LinkedIn Feed 20 items (First/Subsequent), Samsung Star Pro
      ● Most of the benefit comes from saving on alloc and GC pauses
      ● Results in smoother UI
  • 47. Zero Garbage! This new parser is zero garbage. It does not allocate any transient memory beyond the POJOs it creates as the result of the parse. All intermediate allocs like buffers are pooled. Pools are homogeneous as much as possible to limit fragmentation. Pool capacities/buffer sizes are tuned based on device and network.
  • 48. Lessons learnt It is possible to parse JSON fast even on low end Android devices. All formats have their Achilles’ heels, and there is no one size fits all. Never adopt some cool new format blindly. Measure, measure, measure!
  • 49. What’s next? Similar parser + codegen for iOS in Obj-C. Open source both as part of Rest.li mobile optimized bindings. Targeted for Q4 2017.
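
Slides 29 and 30 describe matching JSON keys against a statically built trie so that no key string is allocated and no chain of equals() calls is needed. The following is a minimal sketch of that idea, not the parser from the talk: the class name KeyTrie, its methods and the UNKNOWN sentinel are hypothetical, keys are assumed to be ASCII without escape sequences, and the trie is a plain array-per-node structure rather than a compacted one.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: maps known JSON keys to ordinals without allocating key strings. */
final class KeyTrie {
    /** Ordinal returned for keys that were never registered. */
    static final int UNKNOWN = -1;

    private static final class Node {
        final Node[] next = new Node[128]; // ASCII-only keys: an assumption of this sketch
        int ordinal = UNKNOWN;
    }

    private final Node root = new Node();

    /** Registers a key once, typically from a static initializer in generated code. */
    void put(String key, int ordinal) {
        Node node = root;
        for (int i = 0; i < key.length(); i++) {
            char c = key.charAt(i);
            if (c >= 128) throw new IllegalArgumentException("non-ASCII key: " + key);
            if (node.next[c] == null) node.next[c] = new Node();
            node = node.next[c];
        }
        node.ordinal = ordinal;
    }

    /**
     * Walks the trie byte by byte until the closing quote of the key.
     * start is the index of the first byte after the opening quote.
     */
    int ordinalOf(byte[] input, int start) {
        Node node = root;
        for (int i = start; i < input.length; i++) {
            byte b = input[i];
            if (b == '"') return node == null ? UNKNOWN : node.ordinal;
            // Once we fall off the trie, keep scanning only to find the closing quote.
            if (node != null) node = (b >= 0) ? node.next[b] : null;
        }
        return UNKNOWN; // unterminated key
    }

    public static void main(String[] args) {
        KeyTrie keys = new KeyTrie();
        keys.put("name", 0);
        keys.put("numConnections", 1);
        byte[] json = "{\"numConnections\":20}".getBytes(StandardCharsets.US_ASCII);
        System.out.println(keys.ordinalOf(json, 2)); // index 2 is the first byte of the key
    }
}
```

A generated parser could then switch on the returned ordinal, as slide 30 does, and treat UNKNOWN as the skip-field case.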
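
Slides 38, 39 and 43 combine two ideas: a 256-entry table keyed by the lead byte replaces the chain of mask comparisons, and runs of ASCII are copied in a tight loop because the byte value is already the char value. The sketch below illustrates both under stated assumptions: Utf8TableDecoder is a hypothetical name, the table stores sequence lengths 1 to 4 (the slide's table uses 0 for ASCII), and validation is deliberately loose, so continuation bytes and overlong encodings are not checked.

```java
/** Hypothetical sketch of table-driven UTF-8 decoding with an ASCII fast path. */
final class Utf8TableDecoder {
    /** Sequence length keyed by lead byte: 1 to 4, or 0 for bytes that cannot start a sequence. */
    private static final int[] LENGTH = new int[256];

    static {
        for (int b = 0; b < 256; b++) {
            if (b < 0x80) LENGTH[b] = 1;
            else if ((b & 0xE0) == 0xC0) LENGTH[b] = 2;
            else if ((b & 0xF0) == 0xE0) LENGTH[b] = 3;
            else if ((b & 0xF8) == 0xF0) LENGTH[b] = 4;
            else LENGTH[b] = 0; // continuation byte or invalid lead byte
        }
    }

    static String decode(byte[] in) {
        char[] out = new char[in.length]; // UTF-16 length never exceeds the UTF-8 byte length
        int i = 0;
        int o = 0;
        while (i < in.length) {
            // Tight loop: a non-negative byte is ASCII and equals its char value.
            while (i < in.length && in[i] >= 0) out[o++] = (char) in[i++];
            if (i == in.length) break;

            int lead = in[i] & 0xFF;
            int len = LENGTH[lead]; // one table lookup instead of up to four comparisons
            if (len == 0 || i + len > in.length) {
                throw new IllegalArgumentException("malformed UTF-8 at byte " + i);
            }
            int cp;
            switch (len) {
                case 2: cp = lead & 0x1F; break;
                case 3: cp = lead & 0x0F; break;
                default: cp = lead & 0x07; break; // len == 4
            }
            for (int k = 1; k < len; k++) cp = (cp << 6) | (in[i + k] & 0x3F);
            i += len;
            o += Character.toChars(cp, out, o); // writes a surrogate pair for 4-byte sequences
        }
        return new String(out, 0, o);
    }
}
```

A production decoder would also reject invalid continuation bytes and overlong forms, and would read from pooled buffers rather than a fully materialized byte array.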
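
Slide 41 replaces the grow-and-copy StringBuilder with a list of fixed-size segments drawn from a pool. A rough sketch of that structure follows. The name SegmentedCharBuilder, the segment size of 1024 chars and the unbounded static pool are all assumptions made for illustration; the talk says real pool capacities and buffer sizes are tuned per device and network, and this version is not thread-safe.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;

/** Hypothetical sketch of a string builder backed by pooled, same-sized segments. */
final class SegmentedCharBuilder {
    static final int SEGMENT_SIZE = 1024; // assumed size, not a figure from the talk

    /** Shared pool of idle segments; homogeneous sizes mean any segment fits any request. */
    private static final ArrayDeque<char[]> POOL = new ArrayDeque<>();

    private final ArrayList<char[]> segments = new ArrayList<>();
    private char[] current;
    private int used = SEGMENT_SIZE; // "full" so that the first append acquires a segment

    void append(char c) {
        if (used == SEGMENT_SIZE) {
            // Growing appends a segment to the list; existing content is never copied.
            char[] segment = POOL.poll();
            current = (segment != null) ? segment : new char[SEGMENT_SIZE];
            segments.add(current);
            used = 0;
        }
        current[used++] = c;
    }

    /** Builds the final string, then returns every segment to the pool for reuse. */
    String finishAndRelease() {
        if (segments.isEmpty()) return "";
        int last = segments.size() - 1;
        char[] all = new char[last * SEGMENT_SIZE + used];
        for (int s = 0; s <= last; s++) {
            int count = (s == last) ? used : SEGMENT_SIZE;
            System.arraycopy(segments.get(s), 0, all, s * SEGMENT_SIZE, count);
        }
        POOL.addAll(segments);
        segments.clear();
        current = null;
        used = SEGMENT_SIZE;
        return new String(all);
    }
}
```

The one copy that remains is the final assembly in finishAndRelease, which corresponds to the slide's remark that final string computation may be slightly slower.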
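
Slide 42 notes that characters outside the basic multilingual plane arrive as two escaped UTF-16 surrogates, for example \uD834\uDD1E for U+1D11E, and that a static hex table speeds up decoding them. The sketch below shows one way this could look; UnicodeEscapes and its method names are hypothetical, and it operates on a CharSequence for clarity, whereas the parser in the talk reads from a byte stream.

```java
import java.util.Arrays;

/** Hypothetical sketch: decodes JSON unicode escapes, joining surrogate pairs. */
final class UnicodeEscapes {
    /** Hex digit value keyed by ASCII char, or -1 for chars that are not hex digits. */
    private static final int[] HEX = new int[128];

    static {
        Arrays.fill(HEX, -1);
        for (int i = 0; i < 10; i++) HEX['0' + i] = i;
        for (int i = 0; i < 6; i++) {
            HEX['a' + i] = 10 + i;
            HEX['A' + i] = 10 + i;
        }
    }

    /** Reads one six-char escape (backslash, 'u', four hex digits) starting at pos. */
    static char readEscape(CharSequence s, int pos) {
        if (pos + 6 > s.length() || s.charAt(pos) != '\\' || s.charAt(pos + 1) != 'u') {
            throw new IllegalArgumentException("expected a unicode escape at " + pos);
        }
        int unit = 0;
        for (int i = pos + 2; i < pos + 6; i++) {
            char c = s.charAt(i);
            int digit = c < 128 ? HEX[c] : -1; // table lookup instead of range comparisons
            if (digit < 0) throw new IllegalArgumentException("bad hex digit at " + i);
            unit = (unit << 4) | digit;
        }
        return (char) unit;
    }

    /** Returns the code point encoded at pos, consuming a second escape for surrogate pairs. */
    static int decodeCodePoint(CharSequence s, int pos) {
        char first = readEscape(s, pos);
        if (Character.isHighSurrogate(first)) {
            char second = readEscape(s, pos + 6);
            if (!Character.isLowSurrogate(second)) {
                throw new IllegalArgumentException("unpaired high surrogate at " + pos);
            }
            return Character.toCodePoint(first, second);
        }
        if (Character.isLowSurrogate(first)) {
            throw new IllegalArgumentException("unpaired low surrogate at " + pos);
        }
        return first;
    }
}
```

Rejecting unpaired surrogates is a choice made in this sketch; some parsers pass them through unchanged instead.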

Editor's Notes

  1. Typical of our 90th percentile devices in the US and India