Handling Large State on BEAM
 
Yoshihiro Tanaka
Erlang & Elixir SF Meetup 2017-June-29
Topics
Problem statement & Use-case / experience
Process Structure
Data Copy in Erlang VM
Constant Pool & Demo
Problem: How to deal with Large number/size of State?
Our Story: Backend component Requirement
Matching incoming data to the local inventories
4-5k rules (inventory) to apply for each incoming request
Return qualified inventories to caller
[Diagram: for each request, the Caller sends incoming rules and properties to Matching, which applies the Set of Rules and returns the Qualified inventories]
Design 1: A process keeps one rule
Forward the request to all processes, gather results
[Diagram: processes Rule1, Rule2, Rule3, Rule4; ETS]
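The fan-out described for Design 1 can be sketched roughly as follows. This is an illustration only, not the original implementation: the module name `design1_sketch`, the message shapes, and the idea of representing a rule as a fun that returns `true` or `false` are all assumptions made for the example. There is also no timeout handling.

```erlang
-module(design1_sketch).
-export([start_rules/1, match/2]).

%% Spawn one process per rule. Rules is a list of {Id, Fun} pairs,
%% where Fun takes the request and returns true when the rule qualifies.
start_rules(Rules) ->
    [spawn(fun() -> rule_loop(Id, Rule) end) || {Id, Rule} <- Rules].

rule_loop(Id, Rule) ->
    receive
        {match, From, Ref, Request} ->
            From ! {Ref, Id, Rule(Request)},
            rule_loop(Id, Rule)
    end.

%% Forward the request to every rule process, then gather one reply
%% per process. Replies arrive in arbitrary order.
match(Pids, Request) ->
    Ref = make_ref(),
    lists:foreach(fun(Pid) -> Pid ! {match, self(), Ref, Request} end, Pids),
    Replies = [receive {Ref, Id, Result} -> {Id, Result} end || _ <- Pids],
    [Id || {Id, true} <- Replies].
```

Every request costs one message to and one message from each rule process, and the request term is copied once per process.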
Design 1: A rule is kept as fun object and evaluated
1> {ok, F} = formulerl_fun:compile(
"if (x > 23) then
{ x^5 + 2*x^3 + 5*x^2 + 48 }
else { x^3 + 2*x^4 + 5*x + 48 }"
).
{ok,#Fun<formulerl_fun.6.29233594>}
2> F(dict:store("x", 2, dict:new())).
98.0
3> erlang:size(term_to_binary(F)).
2961
Simple proof of concept:
This fun object can be stored in, and read back from, ETS
https://github.com/hirotnk/formulerl
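Storing a fun in ETS and reading it back might look like the sketch below. It does not use formulerl; the hand-written fun, the table name `rules`, and the module name `fun_ets_sketch` are stand-ins chosen for the example.

```erlang
-module(fun_ets_sketch).
-export([demo/0]).

demo() ->
    Tab = ets:new(rules, [set, public]),
    %% A hand-written stand-in for a compiled rule (assumption: the real
    %% rules come from formulerl_fun:compile/1, which is not called here).
    Rule = fun(Env) ->
               X = dict:fetch("x", Env),
               if X > 23 -> x_is_large;
                  true   -> x_is_small
               end
           end,
    true = ets:insert(Tab, {rule1, Rule}),
    %% ets:lookup/2 copies the stored tuple, fun included, onto the
    %% heap of the calling process.
    [{rule1, F}] = ets:lookup(Tab, rule1),
    Result = F(dict:store("x", 2, dict:new())),
    true = ets:delete(Tab),
    Result.
```

The copy made by each lookup is small for one rule, but it is paid again on every read.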
Problem of Design 1: A process keeps one rule
1 request = messages to and from each process (rule), e.g. 2k * 2 = ~4k messages
Too many processes (e.g. 2k) in the run queue
 
Design 2: Process Pool and One worker keeps All rules
[Diagram: Pool, Worker processes, ETS]
Problem of Design 2: Message costs & Memory usage
This is the common pattern, but...
One worker process keeps all rules (data) => high memory usage
Cost of message passing: need to forward a request to a worker process
Cost of Message Passing?
Wait, isn't it fast/cheap on the Erlang VM?
Process Structure
[Diagram: PCB, Stack, Free, Heap, Old Heap, Mailbox, m-buf (x3)]
Processes don't share state
Cost of Message Passing
The sending process has to:
1. Calculate the size of the message
2. Allocate memory for that size (off-heap copy, if needed)
3. Copy the message to the allocated memory
4. Allocate the defined structure for the message
5. Link the message to the receiving process's mailbox
The receiving process copies data from its mailbox to its process heap.
These steps involve lock/unlock, GC, and memory allocation.
So, the cost of message passing is something, rather than nothing.
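One rough way to see that the cost grows with message size is to time request/ack round trips for a small and a large payload. The sketch below is a crude microbenchmark under assumptions of my own (the module name `msg_cost_sketch`, the payload sizes, and the ack protocol are all illustrative). The numbers depend on the machine and OTP release, so no expected output is given.

```erlang
-module(msg_cost_sketch).
-export([compare/1]).

%% Returns {SmallMicros, LargeMicros}: wall-clock microseconds spent on
%% N round trips with a 10-element and a 10000-element list payload.
compare(N) ->
    Small = lists:seq(1, 10),
    Large = lists:seq(1, 10000),
    {time_round_trips(N, Small), time_round_trips(N, Large)}.

time_round_trips(N, Payload) ->
    Sink = spawn(fun sink/0),
    {Micros, ok} = timer:tc(fun() -> round_trips(N, Sink, Payload) end),
    Sink ! stop,
    Micros.

%% Each send copies Payload for the receiver; the ack itself is tiny.
round_trips(0, _Sink, _Payload) ->
    ok;
round_trips(N, Sink, Payload) ->
    Sink ! {self(), Payload},
    receive ack -> ok end,
    round_trips(N - 1, Sink, Payload).

sink() ->
    receive
        {From, _Payload} -> From ! ack, sink();
        stop -> ok
    end.
```

The lists are built at run time with `lists:seq/2` on purpose, so they are ordinary heap data rather than literals.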
Constant Pool
Design 3: Keep data in Constant pool
All processing happens in one process,
the result is returned from it.
No messages involved.
So, what is Constant Pool?
“Constant Erlang terms (also called literals) are now kept in constant pools; each loaded module has its own pool.”
From 8.2 Process Messages: http://erlang.org/doc/efficiency_guide/processes.html
Benefit of using Constant Pool
Global Access without Data Copy
Constant Pool Demo
Compile with: erlc -S cp.erl
-module(cp).
-export([
    get/0,
    get_shared/0
]).

get() ->
    lists:duplicate(10000, $a).

get_shared() ->
    L = "aaaa....aa", % omitted
    10000 = length(L),
    L.
{module, cp}.  %% version = 0

{exports, [{get,0},
           {get_shared,0},
           {module_info,0},{module_info,1},{vsn,0}]}.

{attributes, []}.

{labels, 11}.


{function, get, 0, 2}.
  {label,1}.
    {line,[{location,"cp.erl",10}]}.
    {func_info,{atom,cp},{atom,get},0}.
  {label,2}.
    {move,{integer,97},{x,1}}.
    {move,{integer,10000},{x,0}}.
    {line,[{location,"cp.erl",11}]}.
    {call_ext_only,2,{extfunc,lists,duplicate,2}}.


{function, get_shared, 0, 4}.
  {label,3}.
    {line,[{location,"cp.erl",13}]}.
    {func_info,{atom,cp},{atom,get_shared},0}.
  {label,4}.
    {move,{literal,"aaa...aaa"}, % Omitted actual "aaa.."
          {x,0}}.
    return.
...
Constant Pool Demo
4> spawn(fun() -> L = cp:get_shared(), io:format("-->~p~n", [process_info(self(), [heap_size])]) end).
-->[{heap_size,233}]
<0.41.0>
5> spawn(fun() -> L = cp:get(), io:format("-->~p~n", [process_info(self(), [heap_size])]) end).
-->[{heap_size,1598}]
<0.43.0>
Caveat about Constant Pool
When the code is unloaded, the constants are copied
to the heap of the processes that refer to them
(Efficiency guide 8.2)
[Diagram: Current and Old code versions as v1, v2, and v3 are loaded; each version's CP; reference to CP]
Benefit of Design 3: No Data Copy
Reading from Constant Pool != Reading from ETS
Reading from Constant Pool => no copy
Reading from ETS => data is copied to the process
No message needed for each incoming request
Design 3: But wait!
How can you turn fun objects into Constant Pool?
You cannot.
Instead, we turned rules into modules using Merl.
1> formulerl_beam:compile(
calc_example_mod,
"if (x > 23) then
{ x^5 + 2*x^3 + 5*x^2 + 48 }
else
{ x^3 + 2*x^4 + 5*x + 48 }"
).
ok
2> calc_example_mod:calc(dict:store("x", 2, dict:new())).
98.0
Simple proof of concept:
 
data == code + constant pool
https://github.com/hirotnk/formulerl
Revisit Design 1: A rule is kept as fun object and evaluated
1> {ok, F} = formulerl_fun:compile(
"if (x > 23) then
{ x^5 + 2*x^3 + 5*x^2 + 48 }
else { x^3 + 2*x^4 + 5*x + 48 }"
).
{ok,#Fun<formulerl_fun.6.29233594>}
2> F(dict:store("x", 2, dict:new())).
98.0
3> erlang:size(term_to_binary(F)).
2961
Simple proof of concept:
This fun object can be stored in, and read back from, ETS
https://github.com/hirotnk/formulerl
Cool, all sounds good now.
BUT... the capacity increase was only ~100%
Further investigation: Cost of Data Copy
The code was reading data from ETS
The data was supposed to be small... except for some entries
A couple of entries contained large data
We fixed it by turning those into binaries (refc binaries)
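The effect of that fix can be sketched by storing the same data in ETS once as a list and once as a binary, then comparing what a lookup puts on the caller's heap. The module and key names below are made up for the example, and `erts_debug:flat_size/1` is an undocumented debugging helper, used here only to count heap words; treat the sketch as an illustration rather than a measurement method.

```erlang
-module(refc_ets_sketch).
-export([demo/0]).

%% Returns {ListWords, BinWords}: the heap words occupied by each term
%% after it has been read back from ETS into the calling process.
demo() ->
    Tab = ets:new(entries, [set, public]),
    Large = lists:duplicate(10000, $a),
    true = ets:insert(Tab, {as_list, Large}),
    true = ets:insert(Tab, {as_binary, list_to_binary(Large)}),
    %% The whole list is copied onto the caller's heap.
    [{as_list, L}] = ets:lookup(Tab, as_list),
    %% A binary over 64 bytes lives off-heap; only a small reference
    %% to it is copied onto the caller's heap.
    [{as_binary, B}] = ets:lookup(Tab, as_binary),
    true = ets:delete(Tab),
    {erts_debug:flat_size(L), erts_debug:flat_size(B)}.
```

The list's on-heap size grows with its length, while the binary's on-heap part stays small regardless of how many bytes it refers to.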
The result: ~800% capacity increase in total
[Chart: Latency (ms)]
When does data copy happen?
Message passing
Process creation
Read/Write from ETS/DETS/Mnesia
When does data copy NOT happen?
Passing data around inside a process (e.g. function call)
Binaries > 64 bytes are not copied
Read from Constant Pool
Literals are not copied between processes for messages or spawn (new behavior ~20.0-rc1)
20.0-rc1
  OTP-13529    Application(s): erts

               Erlang literals are no longer copied during process to process messaging.
Summary of our use case: How to avoid data copy
1. Turn thousands of URLs into a trie and save it to the constant pool using mochiglobal. Processes can then access that tree data in parallel, without any copy
2. When we have relatively large data, try to turn it into a binary
3. Turn DSL rules (4-5k) into modules using merl/syntax_tools (this works fine with the Erlang VM)
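The idea behind putting a term into the constant pool can be sketched as below: generate a module whose only function returns the term as a literal, then compile and load it. The sketch follows the general approach mochiglobal is known for, but the module name `cp_store_sketch` and the functions `store/2` and `fetch/1` are hypothetical, and it is not the code used in the talk. It relies on `erl_syntax` (syntax_tools) and `compile:forms/2` being available at run time.

```erlang
-module(cp_store_sketch).
-export([store/2, fetch/1]).

%% Compile Term into module Mod as a literal and load the module.
%% Term must be plain data (no funs, pids, ports, or references).
store(Mod, Term) when is_atom(Mod) ->
    Forms = [erl_syntax:revert(F) || F <- abstract_module(Mod, Term)],
    {ok, Mod, Bin} = compile:forms(Forms, [report_errors]),
    %% Drop any old version so the new one can be loaded.
    code:purge(Mod),
    {module, Mod} = code:load_binary(Mod, atom_to_list(Mod) ++ ".erl", Bin),
    ok.

%% Read the term back; it is a literal of Mod, so it is not copied.
fetch(Mod) ->
    Mod:term().

abstract_module(Mod, Term) ->
    [%% -module(Mod).
     erl_syntax:attribute(erl_syntax:atom(module), [erl_syntax:atom(Mod)]),
     %% -export([term/0]).
     erl_syntax:attribute(
       erl_syntax:atom(export),
       [erl_syntax:list(
          [erl_syntax:arity_qualifier(erl_syntax:atom(term),
                                      erl_syntax:integer(0))])]),
     %% term() -> Term.
     erl_syntax:function(
       erl_syntax:atom(term),
       [erl_syntax:clause([], none, [erl_syntax:abstract(Term)])])].
```

For example, `cp_store_sketch:store(my_rules, {trie, [...]})` followed by `cp_store_sketch:fetch(my_rules)` from any process. Updates are expensive because each one compiles and reloads a module, so this suits data that is read often and changed rarely.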
Constant Pool Tools
Erlang: parse transform, mochiglobal, merl
Elixir: Macro ?, ???, fastglobal
Reference
https://github.com/happi/theBeamBook
https://www.erlang-solutions.com/blog/erlang-19-0-garbage-collector.html
Acknowledgments
This work was done with the following colleagues at OpenX, and I'd like to express my appreciation for their insights and help:
Kenan Gillet
David Hull
Thank you!

Q & A

More Related Content

What's hot

Concurrency at the Database Layer
Concurrency at the Database Layer Concurrency at the Database Layer
Concurrency at the Database Layer mcwilson1
 
A software agent controlling 2 robot arms in co-operating concurrent tasks
A software agent controlling 2 robot arms in co-operating concurrent tasksA software agent controlling 2 robot arms in co-operating concurrent tasks
A software agent controlling 2 robot arms in co-operating concurrent tasksRuleML
 
[241]large scale search with polysemous codes
[241]large scale search with polysemous codes[241]large scale search with polysemous codes
[241]large scale search with polysemous codesNAVER D2
 
Distributed real time stream processing- why and how
Distributed real time stream processing- why and howDistributed real time stream processing- why and how
Distributed real time stream processing- why and howPetr Zapletal
 
CS6401 Operating systems - Solved Examples
CS6401 Operating systems - Solved ExamplesCS6401 Operating systems - Solved Examples
CS6401 Operating systems - Solved Examplesramyaranjith
 
Scaling Apache Storm (Hadoop Summit 2015)
Scaling Apache Storm (Hadoop Summit 2015)Scaling Apache Storm (Hadoop Summit 2015)
Scaling Apache Storm (Hadoop Summit 2015)Robert Evans
 
Hive, Presto, and Spark on TPC-DS benchmark
Hive, Presto, and Spark on TPC-DS benchmarkHive, Presto, and Spark on TPC-DS benchmark
Hive, Presto, and Spark on TPC-DS benchmarkDongwon Kim
 
Developing Java Streaming Applications with Apache Storm
Developing Java Streaming Applications with Apache StormDeveloping Java Streaming Applications with Apache Storm
Developing Java Streaming Applications with Apache StormLester Martin
 
Storm Real Time Computation
Storm Real Time ComputationStorm Real Time Computation
Storm Real Time ComputationSonal Raj
 
S4: Distributed Stream Computing Platform
S4: Distributed Stream Computing PlatformS4: Distributed Stream Computing Platform
S4: Distributed Stream Computing PlatformAleksandar Bradic
 
Spark performance tuning eng
Spark performance tuning engSpark performance tuning eng
Spark performance tuning enghaiteam
 
Message-passing concurrency in Python
Message-passing concurrency in PythonMessage-passing concurrency in Python
Message-passing concurrency in PythonSarah Mount
 
Realtime Statistics based on Apache Storm and RocketMQ
Realtime Statistics based on Apache Storm and RocketMQRealtime Statistics based on Apache Storm and RocketMQ
Realtime Statistics based on Apache Storm and RocketMQXin Wang
 
LOAD BALANCING ALGORITHM TO IMPROVE RESPONSE TIME ON CLOUD COMPUTING
LOAD BALANCING ALGORITHM TO IMPROVE RESPONSE TIME ON CLOUD COMPUTINGLOAD BALANCING ALGORITHM TO IMPROVE RESPONSE TIME ON CLOUD COMPUTING
LOAD BALANCING ALGORITHM TO IMPROVE RESPONSE TIME ON CLOUD COMPUTINGijccsa
 
Real-Time Streaming with Apache Spark Streaming and Apache Storm
Real-Time Streaming with Apache Spark Streaming and Apache StormReal-Time Streaming with Apache Spark Streaming and Apache Storm
Real-Time Streaming with Apache Spark Streaming and Apache StormDavorin Vukelic
 
Using Simplicity to Make Hard Big Data Problems Easy
Using Simplicity to Make Hard Big Data Problems EasyUsing Simplicity to Make Hard Big Data Problems Easy
Using Simplicity to Make Hard Big Data Problems Easynathanmarz
 

What's hot (19)

Jan 2012 HUG: Storm
Jan 2012 HUG: StormJan 2012 HUG: Storm
Jan 2012 HUG: Storm
 
Concurrency at the Database Layer
Concurrency at the Database Layer Concurrency at the Database Layer
Concurrency at the Database Layer
 
Storm Anatomy
Storm AnatomyStorm Anatomy
Storm Anatomy
 
A software agent controlling 2 robot arms in co-operating concurrent tasks
A software agent controlling 2 robot arms in co-operating concurrent tasksA software agent controlling 2 robot arms in co-operating concurrent tasks
A software agent controlling 2 robot arms in co-operating concurrent tasks
 
[241]large scale search with polysemous codes
[241]large scale search with polysemous codes[241]large scale search with polysemous codes
[241]large scale search with polysemous codes
 
Distributed real time stream processing- why and how
Distributed real time stream processing- why and howDistributed real time stream processing- why and how
Distributed real time stream processing- why and how
 
CS6401 Operating systems - Solved Examples
CS6401 Operating systems - Solved ExamplesCS6401 Operating systems - Solved Examples
CS6401 Operating systems - Solved Examples
 
Scaling Apache Storm (Hadoop Summit 2015)
Scaling Apache Storm (Hadoop Summit 2015)Scaling Apache Storm (Hadoop Summit 2015)
Scaling Apache Storm (Hadoop Summit 2015)
 
Hive, Presto, and Spark on TPC-DS benchmark
Hive, Presto, and Spark on TPC-DS benchmarkHive, Presto, and Spark on TPC-DS benchmark
Hive, Presto, and Spark on TPC-DS benchmark
 
04 pig data operations
04 pig data operations04 pig data operations
04 pig data operations
 
Developing Java Streaming Applications with Apache Storm
Developing Java Streaming Applications with Apache StormDeveloping Java Streaming Applications with Apache Storm
Developing Java Streaming Applications with Apache Storm
 
Storm Real Time Computation
Storm Real Time ComputationStorm Real Time Computation
Storm Real Time Computation
 
S4: Distributed Stream Computing Platform
S4: Distributed Stream Computing PlatformS4: Distributed Stream Computing Platform
S4: Distributed Stream Computing Platform
 
Spark performance tuning eng
Spark performance tuning engSpark performance tuning eng
Spark performance tuning eng
 
Message-passing concurrency in Python
Message-passing concurrency in PythonMessage-passing concurrency in Python
Message-passing concurrency in Python
 
Realtime Statistics based on Apache Storm and RocketMQ
Realtime Statistics based on Apache Storm and RocketMQRealtime Statistics based on Apache Storm and RocketMQ
Realtime Statistics based on Apache Storm and RocketMQ
 
LOAD BALANCING ALGORITHM TO IMPROVE RESPONSE TIME ON CLOUD COMPUTING
LOAD BALANCING ALGORITHM TO IMPROVE RESPONSE TIME ON CLOUD COMPUTINGLOAD BALANCING ALGORITHM TO IMPROVE RESPONSE TIME ON CLOUD COMPUTING
LOAD BALANCING ALGORITHM TO IMPROVE RESPONSE TIME ON CLOUD COMPUTING
 
Real-Time Streaming with Apache Spark Streaming and Apache Storm
Real-Time Streaming with Apache Spark Streaming and Apache StormReal-Time Streaming with Apache Spark Streaming and Apache Storm
Real-Time Streaming with Apache Spark Streaming and Apache Storm
 
Using Simplicity to Make Hard Big Data Problems Easy
Using Simplicity to Make Hard Big Data Problems EasyUsing Simplicity to Make Hard Big Data Problems Easy
Using Simplicity to Make Hard Big Data Problems Easy
 

Similar to Handling Large State on BEAM

0.5mln packets per second with Erlang
0.5mln packets per second with Erlang0.5mln packets per second with Erlang
0.5mln packets per second with ErlangMaxim Kharchenko
 
Making fitting in RooFit faster
Making fitting in RooFit fasterMaking fitting in RooFit faster
Making fitting in RooFit fasterPatrick Bos
 
Kafka Multi-Tenancy—160 Billion Daily Messages on One Shared Cluster at LINE
Kafka Multi-Tenancy—160 Billion Daily Messages on One Shared Cluster at LINE Kafka Multi-Tenancy—160 Billion Daily Messages on One Shared Cluster at LINE
Kafka Multi-Tenancy—160 Billion Daily Messages on One Shared Cluster at LINE confluent
 
Kafka Multi-Tenancy - 160 Billion Daily Messages on One Shared Cluster at LINE
Kafka Multi-Tenancy - 160 Billion Daily Messages on One Shared Cluster at LINEKafka Multi-Tenancy - 160 Billion Daily Messages on One Shared Cluster at LINE
Kafka Multi-Tenancy - 160 Billion Daily Messages on One Shared Cluster at LINEkawamuray
 
Workshop "Can my .NET application use less CPU / RAM?", Yevhen Tatarynov
Workshop "Can my .NET application use less CPU / RAM?", Yevhen TatarynovWorkshop "Can my .NET application use less CPU / RAM?", Yevhen Tatarynov
Workshop "Can my .NET application use less CPU / RAM?", Yevhen TatarynovFwdays
 
Sedna XML Database: Executor Internals
Sedna XML Database: Executor InternalsSedna XML Database: Executor Internals
Sedna XML Database: Executor InternalsIvan Shcheklein
 
0.5mln packets per second with Erlang
0.5mln packets per second with Erlang0.5mln packets per second with Erlang
0.5mln packets per second with ErlangMaxim Kharchenko
 
GC free coding in @Java presented @Geecon
GC free coding in @Java presented @GeeconGC free coding in @Java presented @Geecon
GC free coding in @Java presented @GeeconPeter Lawrey
 
A Scalable I/O Manager for GHC
A Scalable I/O Manager for GHCA Scalable I/O Manager for GHC
A Scalable I/O Manager for GHCJohan Tibell
 
High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe ...
High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe ...High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe ...
High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe ...EUDAT
 
Scaling Apache Storm - Strata + Hadoop World 2014
Scaling Apache Storm - Strata + Hadoop World 2014Scaling Apache Storm - Strata + Hadoop World 2014
Scaling Apache Storm - Strata + Hadoop World 2014P. Taylor Goetz
 
maXbox Starter 42 Multiprocessing Programming
maXbox Starter 42 Multiprocessing Programming maXbox Starter 42 Multiprocessing Programming
maXbox Starter 42 Multiprocessing Programming Max Kleiner
 
Ireland OUG Meetup May 2017
Ireland OUG Meetup May 2017Ireland OUG Meetup May 2017
Ireland OUG Meetup May 2017Brendan Tierney
 
Threaded-Execution and CPS Provide Smooth Switching Between Execution Modes
Threaded-Execution and CPS Provide Smooth Switching Between Execution ModesThreaded-Execution and CPS Provide Smooth Switching Between Execution Modes
Threaded-Execution and CPS Provide Smooth Switching Between Execution ModesESUG
 
Many Task Applications for Grids and Supercomputers
Many Task Applications for Grids and SupercomputersMany Task Applications for Grids and Supercomputers
Many Task Applications for Grids and SupercomputersIan Foster
 

Similar to Handling Large State on BEAM (20)

0.5mln packets per second with Erlang
0.5mln packets per second with Erlang0.5mln packets per second with Erlang
0.5mln packets per second with Erlang
 
Making fitting in RooFit faster
Making fitting in RooFit fasterMaking fitting in RooFit faster
Making fitting in RooFit faster
 
Kafka Multi-Tenancy—160 Billion Daily Messages on One Shared Cluster at LINE
Kafka Multi-Tenancy—160 Billion Daily Messages on One Shared Cluster at LINE Kafka Multi-Tenancy—160 Billion Daily Messages on One Shared Cluster at LINE
Kafka Multi-Tenancy—160 Billion Daily Messages on One Shared Cluster at LINE
 
Kafka Multi-Tenancy - 160 Billion Daily Messages on One Shared Cluster at LINE
Kafka Multi-Tenancy - 160 Billion Daily Messages on One Shared Cluster at LINEKafka Multi-Tenancy - 160 Billion Daily Messages on One Shared Cluster at LINE
Kafka Multi-Tenancy - 160 Billion Daily Messages on One Shared Cluster at LINE
 
Workshop "Can my .NET application use less CPU / RAM?", Yevhen Tatarynov
Workshop "Can my .NET application use less CPU / RAM?", Yevhen TatarynovWorkshop "Can my .NET application use less CPU / RAM?", Yevhen Tatarynov
Workshop "Can my .NET application use less CPU / RAM?", Yevhen Tatarynov
 
Sedna XML Database: Executor Internals
Sedna XML Database: Executor InternalsSedna XML Database: Executor Internals
Sedna XML Database: Executor Internals
 
Apex code benchmarking
Apex code benchmarkingApex code benchmarking
Apex code benchmarking
 
Handout3o
Handout3oHandout3o
Handout3o
 
0.5mln packets per second with Erlang
0.5mln packets per second with Erlang0.5mln packets per second with Erlang
0.5mln packets per second with Erlang
 
GC free coding in @Java presented @Geecon
GC free coding in @Java presented @GeeconGC free coding in @Java presented @Geecon
GC free coding in @Java presented @Geecon
 
A Scalable I/O Manager for GHC
A Scalable I/O Manager for GHCA Scalable I/O Manager for GHC
A Scalable I/O Manager for GHC
 
High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe ...
High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe ...High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe ...
High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe ...
 
Scaling Apache Storm - Strata + Hadoop World 2014
Scaling Apache Storm - Strata + Hadoop World 2014Scaling Apache Storm - Strata + Hadoop World 2014
Scaling Apache Storm - Strata + Hadoop World 2014
 
13 tm adv
13 tm adv13 tm adv
13 tm adv
 
maXbox Starter 42 Multiprocessing Programming
maXbox Starter 42 Multiprocessing Programming maXbox Starter 42 Multiprocessing Programming
maXbox Starter 42 Multiprocessing Programming
 
Ireland OUG Meetup May 2017
Ireland OUG Meetup May 2017Ireland OUG Meetup May 2017
Ireland OUG Meetup May 2017
 
Threaded-Execution and CPS Provide Smooth Switching Between Execution Modes
Threaded-Execution and CPS Provide Smooth Switching Between Execution ModesThreaded-Execution and CPS Provide Smooth Switching Between Execution Modes
Threaded-Execution and CPS Provide Smooth Switching Between Execution Modes
 
Compiler Design Unit 4
Compiler Design Unit 4Compiler Design Unit 4
Compiler Design Unit 4
 
Inferno Scalable Deep Learning on Spark
Inferno Scalable Deep Learning on SparkInferno Scalable Deep Learning on Spark
Inferno Scalable Deep Learning on Spark
 
Many Task Applications for Grids and Supercomputers
Many Task Applications for Grids and SupercomputersMany Task Applications for Grids and Supercomputers
Many Task Applications for Grids and Supercomputers
 

Recently uploaded

Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...harshavardhanraghave
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️Delhi Call girls
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...OnePlan Solutions
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxComplianceQuest1
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfkalichargn70th171
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...ICS
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...Health
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxbodapatigopi8531
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfkalichargn70th171
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Steffen Staab
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...MyIntelliSource, Inc.
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Modelsaagamshah0812
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionSolGuruz
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdfWave PLM
 
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceCALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceanilsa9823
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsJhone kinadey
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AIABDERRAOUF MEHENNI
 

Recently uploaded (20)

Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live

Handling Large State on BEAM

  • 1. Handling Large State on BEAM. Yoshihiro Tanaka, Erlang & Elixir SF Meetup, 2017-June-29
  • 2. Topics: Problem statement & use case / experience; Process structure; Data copy in Erlang VM; Constant pool & demo
  • 3. Problem: How to deal with a large number/size of state?
  • 4. Our Story: Backend component requirement. Matching incoming data to the local inventories. 4-5k rules (inventory) to apply for each incoming request. Return qualified inventories to caller. [Diagram labels: Caller, Each request, Incoming rules/properties, Matching, Set of Rules, Qualified inventories]
  • 5. Design 1: A process keeps one rule. Forward the request to all processes, gather results. [Diagram labels: Rule1, Rule2, Rule3, Rule4, ETS]
  • 6. Design 1: A rule is kept as a fun object and evaluated
        1> {ok, F} = formulerl_fun:compile(
             "if (x > 23) then { x^5 + 2*x^3 + 5*x^2 + 48 } else { x^3 + 2*x^4 + 5*x + 48 }"
           ).
        {ok,#Fun<formulerl_fun.6.29233594>}
        2> F(dict:store("x", 2, dict:new())).
        98.0
        3> erlang:size(term_to_binary(F)).
        2961
    Simple proof of concept: this 'Fun' object can be stored in and read from ETS. https://github.com/hirotnk/formulerl
  • 7. Problem of Design 1: A process keeps one rule. 1 request = messages to & from each process (rule), e.g. 2k * 2 = ~4k messages. Too many processes (e.g. 2k) in the run queue.
  • 8. Design 2: Process pool, and one worker keeps all rules. [Diagram labels: ETS, Pool, Worker processes]
  • 9. Problem of Design 2: Message costs & memory usage. This is the common pattern, but... One worker process keeps all rules (data) => high memory usage. Cost of message passing: need to forward a request to a worker process.
  • 10. Cost of Message Passing? Wait, isn't it fast/cheap on Erlang VM?
  • 11. Process Structure. Processes don't share state. [Diagram labels: PCB, Stack, Free, Heap, Old Heap, Mail box, m-buf, m-buf, m-buf]
  • 12. Cost of Message Passing. The sending process has to:
        1. Calculate the size of the message
        2. Allocate memory for that size (off-heap copy, if needed)
        3. Copy the message to the allocated memory
        4. Allocate the defined structure for the message
        5. Link the message to the receiving process's mailbox
    The receiving process copies data from its mailbox to its process heap. These steps involve lock/unlock, GC, and memory allocation. So the cost of message passing is something, rather than nothing.
  • 13. Design 3: Keep data in Constant Pool. All processing happens in one process, and the result is returned from it. No messages involved. [Diagram label: Constant Pool]
  • 14. So, what is Constant Pool? "Constant Erlang terms (also called literals) are now kept in constant pools; each loaded module has its own pool." From 8.2 Process Messages: http://erlang.org/doc/efficiency_guide/processes.html
  • 15. Benefit of using Constant Pool: global access without data copy
  • 16. Constant Pool Demo. Compile with: erlc -S cp.erl
        -module(cp).
        -export([
            get/0,
            get_shared/0
        ]).
        get() ->
            lists:duplicate(10000, $a).
        get_shared() ->
            L = "aaaa....aa", % omitted
            10000 = length(L),
            L.
  • 17. Compiler output (cp.S):
        {module, cp}.  %% version = 0
        {exports, [{get,0}, {get_shared,0}, {module_info,0},{module_info,1},{vsn,0}]}.
        {attributes, []}.
        {labels, 11}.
        {function, get, 0, 2}.
          {label,1}.
            {line,[{location,"cp.erl",10}]}.
            {func_info,{atom,cp},{atom,get},0}.
          {label,2}.
            {move,{integer,97},{x,1}}.
            {move,{integer,10000},{x,0}}.
            {line,[{location,"cp.erl",11}]}.
            {call_ext_only,2,{extfunc,lists,duplicate,2}}.
        {function, get_shared, 0, 4}.
          {label,3}.
            {line,[{location,"cp.erl",13}]}.
            {func_info,{atom,cp},{atom,get_shared},0}.
          {label,4}.
            {move,{literal,"aaa...aaa"}, % Omitted actual "aaa.."
                  {x,0}}.
            return.
        ...
  • 18. Constant Pool Demo
        4> spawn(fun() -> L = cp:get_shared(), io:format("-->~p~n", [process_info(self(), [heap_size])]) end).
        -->[{heap_size,233}]
        <0.41.0>
        5> spawn(fun() -> L = cp:get(), io:format("-->~p~n", [process_info(self(), [heap_size])]) end).
        -->[{heap_size,1598}]
        <0.43.0>
  • 19. Caveat about Constant Pool: When the code is unloaded, the constants are copied to the heap of the processes that refer to them (Efficiency Guide 8.2). [Diagram labels: Current and Old code versions v1, v2, v3; CP; Reference to CP]
  • 20. Benefit of Design 3: No data copy. Reading from Constant Pool != reading from ETS. Reading from Constant Pool => no copy. Reading from ETS => data is copied to the process. No message is needed for each incoming request.
  • 21. Design 3: But wait! How can you turn fun objects into Constant Pool? You cannot. Instead, we turned rules into modules using Merl.
        1> formulerl_beam:compile(
             calc_example_mod,
             "if (x > 23) then { x^5 + 2*x^3 + 5*x^2 + 48 } else { x^3 + 2*x^4 + 5*x + 48 }"
           ).
        ok
        2> calc_example_mod:calc(dict:store("x", 2, dict:new())).
        98.0
    Simple proof of concept: data == code + constant pool. https://github.com/hirotnk/formulerl
  • 22. Revisit Design 1: A rule is kept as a fun object and evaluated
        1> {ok, F} = formulerl_fun:compile(
             "if (x > 23) then { x^5 + 2*x^3 + 5*x^2 + 48 } else { x^3 + 2*x^4 + 5*x + 48 }"
           ).
        {ok,#Fun<formulerl_fun.6.29233594>}
        2> F(dict:store("x", 2, dict:new())).
        98.0
        3> erlang:size(term_to_binary(F)).
        2961
    Simple proof of concept: this 'Fun' object can be stored in and read from ETS. https://github.com/hirotnk/formulerl
  • 23. Cool, all sounds good now. BUT... the capacity increase was only ~100%
  • 24. Further investigation: Cost of data copy. The code was reading data from ETS. The data was supposed to be small... except for some entries. A couple of entries contained large data. We fixed it by turning those into binaries (refc binaries).
  • 25. The result: ~800% capacity increase in total. [Chart: Latency (ms)]
  • 26. When does data copy happen? Message passing. Process creation. Read/write from ETS/DETS/Mnesia.
  • 27. When does data copy NOT happen? Passing data around inside a process (e.g. a function call). Binaries > 64 bytes are not copied. Reading from Constant Pool. Literals are not copied between processes for messages or spawn (new behavior ~20.0rc-1).
  • 29. Summary of our use case: How to avoid data copy
        1. Turn thousands of URLs into a trie and save it to the constant pool using mochiglobal. Processes can then access that tree in parallel, without any copy
        2. When we have relatively large data, try to turn it into a binary
        3. Turn DSL rules (4-5k) into modules using merl/syntax_tools (this works fine with Erlang VM)
  • 30. Constant Pool Tools. Erlang: parse transform, mochiglobal, merl. Elixir: Macro?, ???, fastglobal.
  • 32. Acknowledgments: This work was done with the following colleagues at OpenX, and I'd like to express my appreciation for their insights and help: Kenan Gillet, David Hull
  • 33. Thank you! Q & A
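
The constant-pool technique from slides 13 to 21 (and the mochiglobal item in the summary) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the author's code and not mochiglobal's actual implementation: the module name const_pool_sketch, the functions store/2 and fetch/1, and the generated function term/0 are all hypothetical. It also assumes the stored value is built only from atoms, numbers, strings, lists, and tuples, since funs cannot be turned into literals (slide 21).

```erlang
-module(const_pool_sketch).
-export([store/2, fetch/1]).

%% Compile Value into a one-function module named Mod (hypothetical API).
%% The value becomes a literal in Mod's constant pool, so any process can
%% read it via Mod:term() without a message or an ETS lookup.
store(Mod, Value) when is_atom(Mod) ->
    %% Abstract forms for:
    %%   -module(Mod). -export([term/0]). term() -> Value.
    Forms = [{attribute, 1, module, Mod},
             {attribute, 1, export, [{term, 0}]},
             {function, 1, term, 0,
              [{clause, 1, [], [], [erl_parse:abstract(Value)]}]}],
    {ok, Mod, Bin} = compile:forms(Forms),
    %% Purge any old version before loading the new one. Unloading code is
    %% where the caveat from slide 19 applies.
    code:purge(Mod),
    {module, Mod} = code:load_binary(Mod, atom_to_list(Mod) ++ ".erl", Bin),
    ok.

%% Read the stored term back from the generated module.
fetch(Mod) ->
    Mod:term().
```

For example, calling const_pool_sketch:store(rules_v1, [{rule, 1, gt, 23}]) and then const_pool_sketch:fetch(rules_v1) returns the stored list. Every update recompiles and reloads a module, so this pattern probably suits data that is read far more often than it changes.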