Data efficiency on BEAM
Choose the right data representation
Dmytro Lytovchenko
Erlang Solutions, Sweden
@kvakvs
Stockholm
Cybertalks
2019
BEAM File
● A simple container made of chunks (sections), such as
○ Code
○ LitT, StrT (literals and strings)
○ Atom, AtU8 (atom tables; AtU8 stores atom names as UTF-8)
● Literals have a dedicated location in memory (a constant pool)
● Access to a literal is O(1)
● Literals still used by processes are copied onto their heaps when the old module code is purged
Layout:
"FOR1", Length:32/big, "BEAM",
"Code", Length:32/big, Code/binary,
"LitT", Length:32/big, Compressed/binary,
…
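The chunk layout above can be walked with ordinary binary matching. This is a minimal sketch, assuming an uncompressed .beam binary and that every chunk payload is padded to a 4-byte boundary (which is how the compiler writes it); the module name beam_chunks is hypothetical, and for real tooling beam_lib is the supported interface.

```erlang
-module(beam_chunks).
-export([ids/1]).

%% Return the 4-character chunk IDs of an uncompressed .beam binary,
%% in file order. Illustrative only; use beam_lib in real code.
ids(<<"FOR1", _FormLen:32/big, "BEAM", Chunks/binary>>) ->
    ids(Chunks, []).

ids(<<>>, Acc) ->
    lists:reverse(Acc);
ids(<<Id:4/binary, Size:32/big, Rest/binary>>, Acc) ->
    %% The payload is Size bytes, followed by padding up to a multiple of 4.
    Padded = 4 * ((Size + 3) div 4),
    <<_Payload:Padded/binary, Next/binary>> = Rest,
    ids(Next, [binary_to_list(Id) | Acc]).
```

For example, reading a compiled module with file:read_file/1 and passing the result to beam_chunks:ids/1 should list IDs such as "AtU8" and "Code"; the exact set depends on the OTP version and compiler options.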
Term
● Is a machine word: 64 (or 32) bits
● The 2 lowest bits are always reserved (the primary tag); they define the kind of data the term holds
Can be:
● An immediate value
● A C pointer to a list cell
● A C pointer to a boxed value
● A header, which marks the beginning of a “box”
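To make the tag bits concrete, here is a small model of a 64-bit term word as a plain Erlang integer. It is an illustration, not a runtime API: the tag values (header 0, list 1, boxed 2, immediate 3, and 2#1111 for small integers) are taken from erl_term.h in the OTP sources and should be treated as an internal detail that may change; the module name term_tags is hypothetical.

```erlang
-module(term_tags).
-export([primary_tag/1, make_small/1, small_value/1]).

-define(WORD_MASK, 16#FFFFFFFFFFFFFFFF).  % model a 64-bit machine word
-define(SMALL_TAG, 2#1111).               % primary tag 2#11 plus 2 extra bits

%% Decode the 2-bit primary tag of a modelled term word.
primary_tag(Word) ->
    case Word band 2#11 of
        2#00 -> header;
        2#01 -> list;
        2#10 -> boxed;
        2#11 -> immediate
    end.

%% Encode a small integer: the value is shifted past the 4 tag bits.
make_small(N) ->
    ((N bsl 4) bor ?SMALL_TAG) band ?WORD_MASK.

%% Decode it again, sign-extending the 60-bit payload.
small_value(Word) when Word band 2#1111 =:= ?SMALL_TAG ->
    V = Word bsr 4,
    case V >= 1 bsl 59 of
        true -> V - (1 bsl 60);
        false -> V
    end.
```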
BEAM VM structures are all terms
● Heap: array[] of Term
● Stack: array[] of Term, sharing one memory block with the young heap
● VM registers: array[] of Term
Data types on heap
● Tuple: array[] of Term on the heap
● List cell: array[2] of Term
● Binary & bitstring: array[] of words
● Float: one C double on the heap plus a header (2 words on 64-bit, 3 on 32-bit)
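These sizes can be checked with erts_debug:flat_size/1, which reports heap words and does not count the 1-word term that refers to the data. A minimal sketch; the module name heap_words is hypothetical, all data is built at runtime to stay away from module literals, and erts_debug is an internal debugging module, so treat its numbers as diagnostics.

```erlang
-module(heap_words).
-export([tuple_words/1, list_words/1, float_words/1]).

%% Header word + N elements.
tuple_words(N) -> erts_debug:flat_size(list_to_tuple(lists:seq(1, N))).

%% 2 words (head + tail) per cell; small integers need no extra heap.
list_words(N) -> erts_debug:flat_size(lists:seq(1, N)).

%% Header word + one C double (1 word on 64-bit, 2 on 32-bit).
float_words(X) -> erts_debug:flat_size(X / 2).
```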
Immediate value facts
● Always fit in 1 word
● Besides the 2 primary tag bits, 2 more bits (IMM1) or 4 more bits (IMM2) define the type of data
Can be
● Small integers (IMM1)
● Local pids and ports (IMM1)
● Atoms (IMM2)
● NIL [] (empty list) (IMM2)
● Internal catch values (IMM2)
● In a 64 (32)-bit word, an IMM1 value has 60 (28) bits for its payload and an IMM2 value has 58 (26) bits
* The VM's NIL ([]) is not Elixir's nil, which is the atom nil
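Because an immediate lives entirely inside the term word, it needs no heap space at all. A minimal sketch showing this with erts_debug:flat_size/1 (module name immediates is hypothetical); a reference is included for contrast, since refs are boxed.

```erlang
-module(immediates).
-export([heap_words/1, demo/0]).

%% Heap words needed besides the term word itself.
heap_words(Term) -> erts_debug:flat_size(Term).

%% Atom, NIL, small integer and local pid are immediates; a ref is boxed.
demo() ->
    [{T, heap_words(T)} || T <- [ok, [], 42, self(), make_ref()]].
```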
List facts
● A C-style singly linked list
● A list cell is 2 words: the head and the tail
● Any list term is a pointer to a cell
● The last tail is usually NIL [] (a proper list)
● Can only be iterated forward
● Cheap to prepend
[0 | []]: 1 cell
[0 | [1 | []]]: 2 cells
[Figure: a list term is a C pointer to a cell holding 0 and a pointer to the tail cell, which holds 1 and the tail []]
* NIL in the VM represents the empty list [] and is not the same as Elixir's nil
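Walking the cells from the head is the only way through a list. A minimal sketch (the module name cons_cells is hypothetical) that counts cells and returns whatever ends the chain, which is [] for a proper list and any other term for an improper one.

```erlang
-module(cons_cells).
-export([cells/1, last_tail/1]).

%% Count cells by following tail pointers forward.
cells(List) -> cells(List, 0).

cells([_Head | Tail], N) -> cells(Tail, N + 1);
cells(_End, N) -> N.          % [] for a proper list, anything else if improper

%% Return the term that terminates the chain of cells.
last_tail([_ | Tail]) -> last_tail(Tail);
last_tail(End) -> End.
```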
List tricks
● Store another value instead of the trailing NIL
○ An improper list
○ [X | Y] takes 2 words, which is smaller than [X, Y | []] (4 words) and also smaller than {X, Y} (3 words)
● Reversing a list is cheap (a single linear copy)
○ Better than an inefficient algorithm that builds the result forward
● Sharing any tail between several lists is cheap; prepending a value to an existing list reuses that list as the tail
● A list comprehension whose result is not used can be optimised: the unused list is never created
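The word counts for the three ways of storing a pair can be confirmed with erts_debug:flat_size/1. A minimal sketch, assuming the pair is built at runtime from two atoms; the module name list_tricks is hypothetical.

```erlang
-module(list_tricks).
-export([pair_words/2]).

%% Heap words for three ways of storing the same pair.
pair_words(X, Y) ->
    #{improper => erts_debug:flat_size([X | Y]),   % 1 cell
      proper   => erts_debug:flat_size([X, Y]),    % 2 cells
      tuple    => erts_debug:flat_size({X, Y})}.   % header + 2 elements
```

Calling list_tricks:pair_words(x, y) gives 2, 4 and 3 words respectively. Most functions in the lists module reject improper lists, so this trick suits internal data rather than values handed to library code.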
Lists: Please avoid
● length/1 is O(N), appending is O(N), and finding the Nth element is O(N)
● The -- operator is O(N*M); for a large right-hand argument consider ordsets or gb_sets
○ It is not too bad if the right argument is short
○ Improved in OTP 22
● ++ is O(N) in the length of its left operand, because that whole list is copied so the right one can be attached as its tail
● lists:flatten and lists:reverse build a new list
● When building lists, make sure your code is tail-recursive
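The usual tail-recursive pattern is to prepend to an accumulator and reverse once at the end, instead of appending with ++ on every step. A minimal sketch contrasting the two (module name list_build is hypothetical); both return the same list, but the second copies the growing accumulator on every iteration, which makes it O(N²).

```erlang
-module(list_build).
-export([squares/1, squares_slow/1]).

%% O(N): prepend to an accumulator, reverse once at the end.
squares(N) -> squares(1, N, []).

squares(I, N, Acc) when I > N -> lists:reverse(Acc);
squares(I, N, Acc) -> squares(I + 1, N, [I * I | Acc]).

%% O(N^2): each ++ copies the whole left-hand list.
squares_slow(N) -> squares_slow(1, N, []).

squares_slow(I, N, Acc) when I > N -> Acc;
squares_slow(I, N, Acc) -> squares_slow(I + 1, N, Acc ++ [I * I]).
```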
IO Lists
● Nested lists whose elements are other iolists, bytes (integers 0..255) or binaries (Elixir strings are binaries)
● Easy to "flatten" to a binary or a string once it is ready, in O(N); many functions can consume an iolist without flattening it
● Prepend and append are O(1)
● Fast and efficient
[
  "Hello",
  ", ",
  [$w, "orl"],
  [<<"d!">>],
  []
]
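The slide's iolist can be measured and flattened with the built-in iolist_size/1 and iolist_to_binary/1. A minimal sketch (module name iolist_demo is hypothetical); note that the character must be written as $w, because in Erlang 'w' is an atom and atoms are not allowed in an iolist.

```erlang
-module(iolist_demo).
-export([greeting/0]).

%% The iolist from the slide, in Erlang syntax.
greeting() ->
    IoList = ["Hello", ", ", [$w, "orl"], [<<"d!">>], []],
    {iolist_size(IoList), iolist_to_binary(IoList)}.
```

iolist_demo:greeting() returns {13, <<"Hello, world!">>}.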
IO Lists
● Prepending X to Y is O(1), but sometimes you also need to append:
○ [X | Y]: X is inserted before the first element of Y
○ [Y, X]: a new iolist in which X comes after Y; a very cheap operation
● You don't have to rebuild a large list to join multiple lists!
○ [X, Y, Z, T], where X, Y, Z and T are large lists, creates an iolist without copying them
○ Replaces lists:append when the consumer accepts iolists
● Many Erlang functions accept iolists as well as strings
○ File and socket functions, unicode functions, printing, etc.
○ Types to look for: iolist() or iodata() (defined literally as iolist() | binary())
● Use iolists more often; they are cheap to build
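Appending with [Acc, X] only allocates a few new cells around the old iolist, so earlier parts are never copied. A minimal sketch (module name iolist_append is hypothetical) that builds text line by line and flattens once at the end.

```erlang
-module(iolist_append).
-export([lines/1]).

%% Each step wraps the previous iolist in a new 3-element list,
%% so the rows collected so far are never copied.
lines(Rows) ->
    IoData = lists:foldl(fun(Row, Acc) -> [Acc, Row, $\n] end, [], Rows),
    iolist_to_binary(IoData).
```

The final iolist_to_binary/1 is only needed when a flat binary is really required; file:write_file/2, for instance, accepts the iodata directly.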
Boxed values
● The term contains a pointer to the box
● Used for binaries, floats, tuples, maps, refs (local and external), external pids and ports, big integers, function closures and exports; also temporary data for internal functions
● Boxes always have a 1-word header
○ A subtag defines what is in the box, and the arity defines its size
[Figure: a boxed value is a C pointer to a header word followed by an array of terms or raw words]
Tuple
● A tuple is a boxed value and lives on the heap, for example {hello, "world"}
● It has a 1-word header with the tuple tag bits and the size (arity)
● The arity is limited; the documented maximum is 16,777,215 elements
● The elements are a plain term array
● Changing a tuple element makes a full copy, except when the compiler can optimize it
[Figure: a tuple term is a C pointer to a header word followed by an array of terms]
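Because terms are immutable, setelement/3 returns a new tuple and leaves the original untouched. A minimal sketch (module name tuple_copy is hypothetical) that also shows the header-plus-elements size of the copy.

```erlang
-module(tuple_copy).
-export([demo/0]).

demo() ->
    T0 = list_to_tuple(lists:seq(1, 5)),   % built at runtime
    T1 = setelement(1, T0, one),           % a new tuple; T0 is unchanged
    {T0, T1, erts_debug:flat_size(T1)}.    % 1 header word + 5 elements
```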
Tuple tricks
● Random value lookup:
○ Accessing any tuple element by index is O(1)
○ Indexing a literal tuple defined in your module is O(1) as well
● Setting multiple elements of a tuple will be optimized if:
○ The indexes are integer literals, given in descending order
○ Each setelement result is used only by the next setelement
○ No other function calls happen in between
● In other situations consider converting the tuple to a list, map, tree, or a more complex structure
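Both tricks in one hypothetical module, tuple_tricks: an O(1) lookup table kept as a module literal, and a setelement chain written in the shape that the Erlang Efficiency Guide describes as eligible for in-place updates after the first copy. Whether the compiler applies that optimization is up to the compiler version; the results are the same either way.

```erlang
-module(tuple_tricks).
-export([day_name/1, reset/1]).

%% O(1) lookup in a literal tuple stored in the module's literal area.
day_name(N) when is_integer(N), N >= 1, N =< 7 ->
    element(N, {mon, tue, wed, thu, fri, sat, sun}).

%% Literal, descending indexes; each result feeds only the next call.
reset(T0) ->
    T1 = setelement(5, T0, undefined),
    T2 = setelement(3, T1, undefined),
    setelement(1, T2, undefined).
```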
Map facts
● A map is a boxed value and lives on the heap
○ Up to 32 keys: a flat map with sorted keys
○ More keys: a HAMT (Hash Array Mapped Trie)
● Update is comparatively slow, since the changed part of the map is copied
● Lookup is faster than finding the n-th list element
● Lookup is slower than indexing a tuple
[Figure: a small map as a header followed by {Key, Value, Key2, Value, ...}, and a large map as a HAMT]
Binary facts
● A binary is a boxed value and lives either on the process heap or on the shared binary heap
● A large binary (more than 64 bytes) is made of:
○ the data itself, a reference-counted REFC bin on the binary heap
○ a PROC bin on the process heap, which points to that REFC bin
● A HEAP bin is a small binary (up to 64 bytes) stored directly on the process heap, with a 2-word header
● A SUB bin and a match context are two special cases: they point into another binary
[Figure: processes and the binary heap, with PROC, REFC, HEAP and SUB bins]
Binaries: The good news
● A chain of binary append operations will be optimized if each intermediate binary is used ONLY ONCE throughout the operation
● Unused variables in a binary match can be optimized away
○ A part of the binary that is skipped or unused in all function clauses is optimized away globally
f(<<_, X/binary>>) -> …
● How to see binary optimizations:
○ erlc +bin_opt_info MyModule.erl
○ export ERL_COMPILER_OPTIONS=bin_opt_info
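Two loops written in the shapes these optimizations target: an append chain where each intermediate binary is used once, and a forward match over a binary. A minimal sketch (module name bin_build is hypothetical); compiling it with +bin_opt_info prints the compiler's notes on which optimizations it applied, and the wording of those notes varies between OTP releases.

```erlang
-module(bin_build).
-export([repeat/2, sum_bytes/1]).

%% Each intermediate Acc is used only once, so the runtime can grow
%% the binary in place instead of copying it on every append.
repeat(Bin, N) -> repeat(Bin, N, <<>>).

repeat(_Bin, 0, Acc) -> Acc;
repeat(Bin, N, Acc) when N > 0 -> repeat(Bin, N - 1, <<Acc/binary, Bin/binary>>).

%% Forward match: the match context can be reused across iterations.
sum_bytes(Bin) -> sum_bytes(Bin, 0).

sum_bytes(<<Byte, Rest/binary>>, Acc) -> sum_bytes(Rest, Acc + Byte);
sum_bytes(<<>>, Acc) -> Acc.
```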
Binaries: Please avoid
● Exposing a large binary to multiple processes increases its refcount and keeps it alive until the GC has run in all of these processes!
● When growing a binary, a copy is created if the binary has been:
○ sent as a message in the middle of the manipulation
○ inserted into ETS, sent to a port or passed to a NIF
○ matched (a match context creates a pointer to the binary data)
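A small slice of a large binary keeps the whole original alive, because the slice is a sub binary pointing into it. binary:referenced_byte_size/1 shows this, and binary:copy/1 detaches the slice. A minimal sketch; the module name bin_refs is hypothetical.

```erlang
-module(bin_refs).
-export([demo/0]).

demo() ->
    Big = binary:copy(<<0>>, 1000),          % > 64 bytes: a refc binary
    <<_:10/binary, Tail/binary>> = Big,      % Tail is a sub binary of Big
    Copy = binary:copy(Tail),                % a fresh, independent binary
    {byte_size(Tail),
     binary:referenced_byte_size(Tail),      % still pins all 1000 bytes
     binary:referenced_byte_size(Copy)}.
```

bin_refs:demo() returns {990, 1000, 990}. Copying a small slice before storing it long-term or sending it to many processes lets the large original be freed sooner.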
External term format (ETF)
● Produced by erlang:term_to_binary/1,2
● ETF is used when
○ Sending terms over the network
○ Storing terms on disk: Mnesia, DETS, the disk_log module
○ Storing Erlang terms in binary database fields
● Atoms are compact in memory (1 word), but not in ETF, where the atom name is written out
○ Not always true: the distribution protocol can cache atoms between connected nodes
○ Remote pids and remote ports contain the node name atom
○ To make the size smaller, one can e.g. change atoms to small integers
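The size difference is easy to see with term_to_binary/1: a small integer encodes in a few bytes, while an atom carries its whole name. A minimal sketch (module name etf_size and the atom connection_established are hypothetical); exact atom sizes depend on the OTP version's atom encoding, so only the comparison is meaningful.

```erlang
-module(etf_size).
-export([sizes/0]).

sizes() ->
    Atom = connection_established,
    Small = 3,
    #{atom      => byte_size(term_to_binary(Atom)),
      small_int => byte_size(term_to_binary(Small)),
      roundtrip => binary_to_term(term_to_binary(Atom)) =:= Atom}.
```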
Exports/Imports and Closures
● An export is a function reference:
○ Erlang: fun lists:foreach/2
○ Elixir: &Enum.each/2
● Represented internally as {M, F, Arity} plus extra fields
● A closure is like an export plus some captured values
● Sending function closures and exports over the network is risky
○ The function code may be missing on the remote end
○ The remote code may be outdated
○ The remote code might not be what you think
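erlang:fun_info/2 distinguishes the two kinds: an export reports type external, a closure reports type local. A minimal sketch; the module name fun_kinds is hypothetical.

```erlang
-module(fun_kinds).
-export([kinds/1]).

kinds(X) ->
    Export = fun lists:foreach/2,        % an export: {M, F, Arity}
    Closure = fun(Y) -> X + Y end,       % a local fun capturing X
    {erlang:fun_info(Export, type),
     erlang:fun_info(Export, arity),
     erlang:fun_info(Closure, type)}.
```

fun_kinds:kinds(1) returns {{type, external}, {arity, 2}, {type, local}}.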
Other
● A float takes 2 or 3 words on the heap, plus 1 for the term itself: expensive
○ Consider rational fractions N/M
○ Consider fixed-point integers
● A small integer (1 word, 28 or 60 bits with sign) automatically becomes a big integer on overflow (3x memory usage)
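The small-integer limit follows from the word size minus the 4 tag bits. A minimal sketch (module name small_ints is hypothetical) that computes the largest small integer and shows that one more needs heap space, because it becomes a big integer.

```erlang
-module(small_ints).
-export([max_small/0, heap_words/1]).

%% Largest small (immediate) integer: word bits minus 4 tag bits, signed.
max_small() ->
    Bits = 8 * erlang:system_info(wordsize) - 4,
    (1 bsl (Bits - 1)) - 1.

heap_words(N) -> erts_debug:flat_size(N).
```

On a 64-bit emulator max_small() is 2^59 - 1. For fixed point, an amount such as 12.34 can be stored as the integer 1234 (cents), which stays a small integer.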
Data is copied when it
● leaves the process as a message to another process or port
● is used as an argument when spawning a process
● is exchanged with ETS
Exception:
● Binaries larger than 64 bytes are reference counted, and only references are copied
[Figure: values copied from a process to ETS, ports and other processes; a 64+ byte binary on the binary heap is shared by reference]
Profile and measure
● Knowing the theory helps, but one must also know their data!
● Do not guess
● Print (see) and inspect your data structures
● Profile and measure your memory
○ erlang:system_info
○ erlang:process_info
○ erts_debug:size/1
○ erts_debug:flat_size/1
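erts_debug:size/1 counts shared subterms once, while erts_debug:flat_size/1 counts them for every reference, which roughly matches what a copy (a message send or an ETS insert) allocates by default. A minimal sketch, with the hypothetical module name measure, that also reads process and total memory.

```erlang
-module(measure).
-export([shared_vs_flat/1, memory_report/0]).

%% Two lists sharing one tail of N cells.
shared_vs_flat(N) ->
    Tail = lists:seq(1, N),
    Pair = {[0 | Tail], [-1 | Tail]},
    {erts_debug:size(Pair), erts_debug:flat_size(Pair)}.

memory_report() ->
    {memory, ProcBytes} = erlang:process_info(self(), memory),
    #{process_bytes => ProcBytes, total_bytes => erlang:memory(total)}.
```

For N = 10 this gives {27, 47}: the tuple (3) and two new cells (4) plus the shared tail once (20), versus the tail counted twice (40).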
Performance: Optimize First
● Disk storage access
● Database access and slow queries
● Remote network access
…..
● Inefficient and slow algorithms & data structures
More information about the internal structure of
BEAM memory can be found at
http://beam-wisdoms.clau.se
and in
happi/theBeamBook
More Related Content

What's hot

Processing data with Python, using standard library modules you (probably) ne...
Processing data with Python, using standard library modules you (probably) ne...Processing data with Python, using standard library modules you (probably) ne...
Processing data with Python, using standard library modules you (probably) ne...gjcross
 
Funddamentals of data structures
Funddamentals of data structuresFunddamentals of data structures
Funddamentals of data structuresGlobalidiots
 
Stl Containers
Stl ContainersStl Containers
Stl Containersppd1961
 
Standard Template Library
Standard Template LibraryStandard Template Library
Standard Template LibraryGauravPatil318
 
Chapter 10 Library Function
Chapter 10 Library FunctionChapter 10 Library Function
Chapter 10 Library FunctionDeepak Singh
 
Chapter 2.2 data structures
Chapter 2.2 data structuresChapter 2.2 data structures
Chapter 2.2 data structuressshhzap
 
Stl (standard template library)
Stl (standard template library)Stl (standard template library)
Stl (standard template library)Hemant Jain
 
AES effecitve software implementation
AES effecitve software implementationAES effecitve software implementation
AES effecitve software implementationRoman Oliynykov
 
CSharp for Unity - Day 1
CSharp for Unity - Day 1CSharp for Unity - Day 1
CSharp for Unity - Day 1Duong Thanh
 
DATA STRUCTURE AND ALGORITHM FULL NOTES
DATA STRUCTURE AND ALGORITHM FULL NOTESDATA STRUCTURE AND ALGORITHM FULL NOTES
DATA STRUCTURE AND ALGORITHM FULL NOTESAniruddha Paul
 
Data structure using c module 1
Data structure using c module 1Data structure using c module 1
Data structure using c module 1smruti sarangi
 
Standard template library
Standard template libraryStandard template library
Standard template librarySukriti Singh
 
Analysis of algorithms
Analysis of algorithmsAnalysis of algorithms
Analysis of algorithmsiqbalphy1
 
Introduction to data_structure
Introduction to data_structureIntroduction to data_structure
Introduction to data_structureAshim Lamichhane
 

What's hot (20)

Processing data with Python, using standard library modules you (probably) ne...
Processing data with Python, using standard library modules you (probably) ne...Processing data with Python, using standard library modules you (probably) ne...
Processing data with Python, using standard library modules you (probably) ne...
 
Funddamentals of data structures
Funddamentals of data structuresFunddamentals of data structures
Funddamentals of data structures
 
STL in C++
STL in C++STL in C++
STL in C++
 
Files and streams
Files and streamsFiles and streams
Files and streams
 
Ds lec 14 filing in c++
Ds lec 14 filing in c++Ds lec 14 filing in c++
Ds lec 14 filing in c++
 
Stl Containers
Stl ContainersStl Containers
Stl Containers
 
Standard Library Functions
Standard Library FunctionsStandard Library Functions
Standard Library Functions
 
Standard Template Library
Standard Template LibraryStandard Template Library
Standard Template Library
 
Chapter 10 Library Function
Chapter 10 Library FunctionChapter 10 Library Function
Chapter 10 Library Function
 
Chapter 2.2 data structures
Chapter 2.2 data structuresChapter 2.2 data structures
Chapter 2.2 data structures
 
Stl (standard template library)
Stl (standard template library)Stl (standard template library)
Stl (standard template library)
 
AES effecitve software implementation
AES effecitve software implementationAES effecitve software implementation
AES effecitve software implementation
 
Regular Expressions
Regular ExpressionsRegular Expressions
Regular Expressions
 
CSharp for Unity - Day 1
CSharp for Unity - Day 1CSharp for Unity - Day 1
CSharp for Unity - Day 1
 
DATA STRUCTURE AND ALGORITHM FULL NOTES
DATA STRUCTURE AND ALGORITHM FULL NOTESDATA STRUCTURE AND ALGORITHM FULL NOTES
DATA STRUCTURE AND ALGORITHM FULL NOTES
 
Data structure using c module 1
Data structure using c module 1Data structure using c module 1
Data structure using c module 1
 
Standard template library
Standard template libraryStandard template library
Standard template library
 
Analysis of algorithms
Analysis of algorithmsAnalysis of algorithms
Analysis of algorithms
 
Introduction to data_structure
Introduction to data_structureIntroduction to data_structure
Introduction to data_structure
 
Al2ed chapter6
Al2ed chapter6Al2ed chapter6
Al2ed chapter6
 

Similar to Data efficiency on BEAM - Choose the right data representation by Dmytro Lytovchenko

TokuDB vs RocksDB
TokuDB vs RocksDBTokuDB vs RocksDB
TokuDB vs RocksDBVlad Lesin
 
Verilog Final Probe'22.pptx
Verilog Final Probe'22.pptxVerilog Final Probe'22.pptx
Verilog Final Probe'22.pptxSyedAzim6
 
system software 16 marks
system software 16 markssystem software 16 marks
system software 16 marksvvcetit
 
1. python programming
1. python programming1. python programming
1. python programmingsreeLekha51
 
Meetup C++ A brief overview of c++17
Meetup C++  A brief overview of c++17Meetup C++  A brief overview of c++17
Meetup C++ A brief overview of c++17Daniel Eriksson
 
Chromatic Sparse Learning
Chromatic Sparse LearningChromatic Sparse Learning
Chromatic Sparse LearningDatabricks
 
Systemsoftwarenotes 100929171256-phpapp02 2
Systemsoftwarenotes 100929171256-phpapp02 2Systemsoftwarenotes 100929171256-phpapp02 2
Systemsoftwarenotes 100929171256-phpapp02 2Khaja Dileef
 
INTRODUCTION TO PYTHON.pptx
INTRODUCTION TO PYTHON.pptxINTRODUCTION TO PYTHON.pptx
INTRODUCTION TO PYTHON.pptxNimrahafzal1
 
LECTURE2 td 2 sue les theories de graphes
LECTURE2 td 2 sue les theories de graphesLECTURE2 td 2 sue les theories de graphes
LECTURE2 td 2 sue les theories de graphesAhmedMahjoub15
 
Introduction to redis - version 2
Introduction to redis - version 2Introduction to redis - version 2
Introduction to redis - version 2Dvir Volk
 
The Ring programming language version 1.9 book - Part 100 of 210
The Ring programming language version 1.9 book - Part 100 of 210The Ring programming language version 1.9 book - Part 100 of 210
The Ring programming language version 1.9 book - Part 100 of 210Mahmoud Samir Fayed
 

Similar to Data efficiency on BEAM - Choose the right data representation by Dmytro Lytovchenko (20)

TokuDB vs RocksDB
TokuDB vs RocksDBTokuDB vs RocksDB
TokuDB vs RocksDB
 
Auto Tuning
Auto TuningAuto Tuning
Auto Tuning
 
ppt_pspp.pdf
ppt_pspp.pdfppt_pspp.pdf
ppt_pspp.pdf
 
Collections and generics
Collections and genericsCollections and generics
Collections and generics
 
Verilog Final Probe'22.pptx
Verilog Final Probe'22.pptxVerilog Final Probe'22.pptx
Verilog Final Probe'22.pptx
 
Editors l21 l24
Editors l21 l24Editors l21 l24
Editors l21 l24
 
system software 16 marks
system software 16 markssystem software 16 marks
system software 16 marks
 
1. python programming
1. python programming1. python programming
1. python programming
 
Meetup C++ A brief overview of c++17
Meetup C++  A brief overview of c++17Meetup C++  A brief overview of c++17
Meetup C++ A brief overview of c++17
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to Redis
 
Chromatic Sparse Learning
Chromatic Sparse LearningChromatic Sparse Learning
Chromatic Sparse Learning
 
Systemsoftwarenotes 100929171256-phpapp02 2
Systemsoftwarenotes 100929171256-phpapp02 2Systemsoftwarenotes 100929171256-phpapp02 2
Systemsoftwarenotes 100929171256-phpapp02 2
 
INTRODUCTION TO PYTHON.pptx
INTRODUCTION TO PYTHON.pptxINTRODUCTION TO PYTHON.pptx
INTRODUCTION TO PYTHON.pptx
 
Database Sizing
Database SizingDatabase Sizing
Database Sizing
 
04 pig data operations
04 pig data operations04 pig data operations
04 pig data operations
 
Algorithms.
Algorithms. Algorithms.
Algorithms.
 
LECTURE2 td 2 sue les theories de graphes
LECTURE2 td 2 sue les theories de graphesLECTURE2 td 2 sue les theories de graphes
LECTURE2 td 2 sue les theories de graphes
 
Introduction to redis - version 2
Introduction to redis - version 2Introduction to redis - version 2
Introduction to redis - version 2
 
The Ring programming language version 1.9 book - Part 100 of 210
The Ring programming language version 1.9 book - Part 100 of 210The Ring programming language version 1.9 book - Part 100 of 210
The Ring programming language version 1.9 book - Part 100 of 210
 
Austen x talk
Austen x talkAusten x talk
Austen x talk
 

Recently uploaded

Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...MyIntelliSource, Inc.
 
Engage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyEngage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyFrank van der Linden
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about usDynamic Netsoft
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfjoe51371421
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio, Inc.
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWave PLM
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxTier1 app
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...OnePlan Solutions
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfkalichargn70th171
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...Christina Lin
 
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfkalichargn70th171
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataBradBedford3
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantAxelRicardoTrocheRiq
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number SystemsJheuzeDellosa
 
XpertSolvers: Your Partner in Building Innovative Software Solutions
XpertSolvers: Your Partner in Building Innovative Software SolutionsXpertSolvers: Your Partner in Building Innovative Software Solutions
XpertSolvers: Your Partner in Building Innovative Software SolutionsMehedi Hasan Shohan
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)OPEN KNOWLEDGE GmbH
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software DevelopersVinodh Ram
 

Recently uploaded (20)

Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
Engage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyEngage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The Ugly
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about us
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdf
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need It
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
 
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service Consultant
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number Systems
 
XpertSolvers: Your Partner in Building Innovative Software Solutions
XpertSolvers: Your Partner in Building Innovative Software SolutionsXpertSolvers: Your Partner in Building Innovative Software Solutions
XpertSolvers: Your Partner in Building Innovative Software Solutions
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software Developers
 

Data efficiency on BEAM - Choose the right data representation by Dmytro Lytovchenko

  • 1. Data efficiency on BEAM Choose the right data representation Dmytro Lytovchenko Erlang Solutions, Sweden @kvakvs
  • 3. BEAM File ● A simple container with sections ○ Code ○ LitT, StrT (literals and strings) ○ Atom, AtU8 ● Literals have dedicated location in memory (constant pool) ● Access is O(1) ● Used literals are copied during code upgrade “FOR1”, Length:32/big, “BEAM” “Code”, Length:32/big, Code/binary “LitT”, Length:32/big, Compressed/binary …
  • 4. Term ● Is a machine word, 64 (or 32) bit ● Has 2 bits reserved at all times, these bits define the contained data type Can be: ● An immediate value ● C pointer to a list cell ● C pointer to a boxed value ● A header, marks beginning of a “box” a machine word
  • 5. BEAM VM Structures are all terms ● Heap — array[] of Term ● Stack — array[] of Term, inside the young heap ● VM Registers — array[] of Term Data types on heap ● Tuple — array[] of Term on heap ● List — array[2] of Term ● Binary & Bit String — array[] of words ● Float — one C double on heap (stored as 2 or 3 words)
  • 6. Immediate value facts ● Always have size of 1 word ● Extra 2 or 4 bits define the type of data Can be ● Small integers ● Local pids and ports ● Atoms ● NIL [] (empty list) ● Internal catch values An immediate-2 value An immediate-1 value 60 (28) bits for IMM1 value 64 (32) bits 58 (26) bits for IMM2 value * VM NIL is not Elixir nil
  • 7. List facts ● C-style single linked list ● A list cell is 2 words, containing the head and the tail. ● Any list term is a pointer to a cell ● Last element may be a NIL [] ● Can only iterate forward ● Cheap to prepend [0 | []] 1 cell [0 | [1 | []]] 2 cells A list value, C pointer 1 Tail: [ ] 0 List value for the Tail, C pointer * NIL in VM represents an empty list [] and is not the same as Elixir nil
  • 8. List tricks ● Store another value instead of a trailing NIL ○ Improper list ○ [X | Y] takes 2 words which is smaller, than [X, Y | []] — 4 words, also smaller, than {X, Y} — 3 words ● Reversing a list is cheap (copy) ○ Better than an inefficient algorithm which builds the result forward ● Reusing any tail of any list cell in multiple values is cheap. This also means prepending any value. ● LC can be optimised if the result is not used (the unused list will never be created) X Y
  • 9. Lists: Please avoid ● Length operation is O(N), appending is O(N), finding Nth element is O(N). ● – – operation is O(N*M), use ordsets or gb_sets for the right arg ○ It is not too bad if the right argument is short. ○ Improved in OTP 22 ● ++ operation is O(N) because: finding last element. ● lists:flatten, lists:reverse build a new list. ● When doing list-building, ensure that your code is tail-recursive.
  • 10. IO Lists ● Nested lists which contain other lists or single characters or binaries (also Elixir strings) ● Easy to “flatten” to a binary or a string when is ready O(N); many functions can consume from an iolist without flattening ● Prepend and append are O(1) ● Fast and efficient [ “Hello”, “, ”, [‘w’, “orl”], [<<“d!”>>], [] ]
  • 11. IO Lists ● Prepending X to Y is O(1), but sometimes you need also to append: ○ [X | Y] — X is inserted before the first element of Y ○ [Y, X] — iolist is created, where X goes after Y — very cheap operation ● You don’t have to rebuild the large list to join multiple lists! ○ [X, Y, Z, T] where X, Y, Z and T are large lists — creates an iolist ○ Replaces lists:append ● Many functions in Erlang accept iolists as well as strings ○ File and socket functions, unicode functions, printing etc. ○ Types to look for: iolist() or iodata() (defined literally as iolist() | binary()) ● Do more IO lists, it is good
  • 12. a machine word Term ● Is a machine word, 64 (or 32) bit ● Has 2 bits reserved, these bits define the contained data type Can be: ● An immediate value ● Can be a C pointer to a list cell ● Can be a C pointer to a boxed value ● Can be a header, marks beginning of a box
  • 13. Boxed values ● Term value contains a pointer ● Binaries, floats, tuples, maps, local and external refs, pids, and ports, big integers, function closures, exports. Also temporary data for internal functions A boxed value, C pointer
  • 14. Boxed values ● Term value contains a pointer ● Boxes always have 1 word header ● Binaries, floats, tuples, maps, local and external refs, pids, and ports, big integers, function closures, exports. Also temporary data for internal functions ● Boxes always have 1 word header ○ Subtag (yellow bits), which defines what’s in the box, and arity which defines size A boxed value, C pointer Header array[] of Termarray[] of Termarray[] of word (term)
• 16. Tuple
● Tuple is a boxed value and exists on the heap, e.g. {hello, "world"}
● Tuple has a 1 word header with tuple tag bits and size (arity)
● Arity is limited: the documented maximum is 16,777,215 elements (24-bit unsigned integer)
● Elements are a simple term array
● Changing a tuple element makes a full copy, except when the compiler can optimize it
(Diagram: Tuple, a C pointer to a header followed by an array[] of Term)
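A small sketch of the tuple layout and copy-on-update behaviour; the module name `tuple_demo` is hypothetical. A tuple of arity N should take N + 1 heap words (header plus N terms), and `setelement/3` returns a new tuple while the original stays unchanged.

```erlang
-module(tuple_demo).
-export([sizes/0, update/0]).

%% Heap words for tuples of arity 1, 2 and 10 holding small integers:
%% one header word plus one word per element.
sizes() ->
    [erts_debug:flat_size(list_to_tuple(lists:seq(1, N))) || N <- [1, 2, 10]].

%% setelement/3 copies the tuple; T1 is not modified.
update() ->
    T1 = {a, b, c},
    T2 = setelement(2, T1, x),
    {T1, T2}.
```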
• 17. Tuple tricks
● Random value lookup:
○ Accessing a random tuple element is O(1)
○ Looking up an element of a literal tuple in your module is also O(1)
● Setting multiple elements of a tuple will be optimized if:
○ Descending integer indexes are used
○ The tuple result is chained into the next setelement
○ No other function calls happen in between
● In other situations consider converting the tuple to a list, map, tree, or a more complex structure
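A sketch of both tricks; the module `tuple_tricks` and its functions are hypothetical. `day_name/1` indexes a literal tuple, which lives in the module's constant pool, so the lookup is a single `element/2` call. `reset/1` follows the pattern the slide describes: descending integer indexes, each result fed directly into the next `setelement/3`, and no calls in between, which should let the compiler make one copy and update it in place.

```erlang
-module(tuple_tricks).
-export([day_name/1, reset/1]).

%% O(1) lookup in a literal tuple from the constant pool.
day_name(N) when is_integer(N), N >= 1, N =< 7 ->
    element(N, {mon, tue, wed, thu, fri, sat, sun}).

%% Descending indexes (3, 2, 1), chained results, no calls in between.
reset(T0) when tuple_size(T0) >= 3 ->
    T1 = setelement(3, T0, 0),
    T2 = setelement(2, T1, 0),
    setelement(1, T2, 0).
```

Compiling with `erlc +S` and inspecting the generated assembly is one way to check whether the optimization was applied in a given OTP release.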
• 19. Map facts
● Map is a boxed value and exists on the heap
○ 32 keys or fewer: a flat map with a sorted key tuple and an array of values
○ More than 32 keys: a HAMT (Hash Array Mapped Trie)
● Update is slow: the changed parts of the map are copied
● Lookup is faster than fetching the n-th list element
● Lookup is slower than indexing a tuple
(Diagram: Map and HAMT Map, C pointers to boxes, each starting with a header)
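The three lookup styles compared on the slide, side by side. This is a sketch; the module `lookup_demo` is hypothetical and all three functions return `{ok, Value}` or `error` so they are interchangeable. The proplist scan is O(N), the map lookup searches at most 32 keys for a flat map or walks the HAMT for a larger one, and the tuple lookup is O(1) but only works when the key is a position.

```erlang
-module(lookup_demo).
-export([by_list/2, by_map/2, by_tuple/2]).

%% O(N): lists:keyfind/3 scans the proplist from the head.
by_list(Key, Pairs) ->
    case lists:keyfind(Key, 1, Pairs) of
        {Key, Value} -> {ok, Value};
        false -> error
    end.

%% Flat map (up to 32 keys) or HAMT, depending on the map size.
by_map(Key, Map) ->
    maps:find(Key, Map).

%% O(1): the "key" must be an index in 1..tuple_size(T).
by_tuple(Index, T) when is_integer(Index), Index >= 1, Index =< tuple_size(T) ->
    {ok, element(Index, T)};
by_tuple(_, _) ->
    error.
```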
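• 20. Binary facts
● Binary is a boxed value and lives on a process heap or on the shared binary heap
● A large binary (more than 64 bytes) is made of:
○ a refc binary: the reference-counted data on the binary heap
○ a ProcBin on the process heap, which points to the refc binary
● A heap binary is a small binary (up to 64 bytes) stored directly on the process heap with a 2 word header
● A sub binary and a match context are two special cases that point into another binary
(Diagram: processes holding ProcBins that point to a refc binary on the binary heap; a heap binary; a sub binary)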
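Sub binaries can be observed with `binary:referenced_byte_size/1`, which reports the size of the underlying binary a term points into. In this sketch (module name `subbin_demo` is hypothetical) both matched parts are larger than 64 bytes, so they are expected to be sub binaries pointing into the 10,000-byte refc binary rather than copies.

```erlang
-module(subbin_demo).
-export([split/0]).

%% Head and Rest share Big's bytes: no data is copied by the match.
split() ->
    Big = binary:copy(<<"x">>, 10000),          % refc binary, 10000 bytes
    <<Head:100/binary, Rest/binary>> = Big,
    {byte_size(Head), binary:referenced_byte_size(Head),
     byte_size(Rest), binary:referenced_byte_size(Rest)}.
```

A consequence worth keeping in mind: as long as `Head` is alive, the whole 10,000-byte binary is alive too.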
• 21. Binaries: The good news
● A chain of binary append operations will be optimized if the binary is used ONLY ONCE throughout the operation.
● Unused variables in a binary match can be optimized away
○ Skipping an unused part of a binary in all function clauses is optimized away globally: f(<<_,X/binary>>) -> …
● How to see binary optimizations:
○ erlc +bin_opt_info MyModule.erl
○ export ERL_COMPILER_OPTIONS=bin_opt_info
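A sketch of the append pattern the slide says gets optimized: the accumulator binary is used exactly once per step, always as the first segment of the new binary. The module name `bin_append` and the 16-bit encoding are illustrative assumptions. Compiling it with `erlc +bin_opt_info bin_append.erl` should show what the compiler decided for this code.

```erlang
-module(bin_append).
-export([encode/1]).

%% Encodes a list of integers as 16-bit big-endian values.
encode(Ints) ->
    encode(Ints, <<>>).

encode([], Acc) ->
    Acc;
encode([I | Rest], Acc) ->
    %% Acc is not used anywhere else, so it can be grown in place.
    encode(Rest, <<Acc/binary, I:16/big>>).
```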
• 22. Binaries: Please avoid
● Exposing a large binary to multiple processes increases its refcount and keeps the binary alive until the GC has run in all these processes!
● When growing a binary, a copy is created:
○ When the binary is sent as a message in the middle of manipulation
○ When the binary is inserted into ETS, sent to a port or to a NIF
○ When matching a binary (the match context creates a pointer to the binary data)
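A sketch of one common mitigation for keeping large binaries alive by accident: copy a small slice with `binary:copy/1` before storing it somewhere long-lived such as ETS. The module `bin_cache` and the helper `put_slice/3` are hypothetical. After the copy, the stored value references only its own 100 bytes instead of the 10,000-byte parent.

```erlang
-module(bin_cache).
-export([demo/0]).

%% Hypothetical helper: detach the slice from its parent before storing.
put_slice(Tab, Key, Slice) ->
    ets:insert(Tab, {Key, binary:copy(Slice)}).

demo() ->
    Tab = ets:new(bin_cache, [set, private]),
    Big = binary:copy(<<"x">>, 10000),
    <<Slice:100/binary, _/binary>> = Big,       % sub binary into Big
    true = put_slice(Tab, k, Slice),
    [{k, Stored}] = ets:lookup(Tab, k),
    true = ets:delete(Tab),
    {byte_size(Stored), binary:referenced_byte_size(Stored)}.
```

The copy costs time and memory up front, so it pays off mainly when the slice is much smaller than the binary it came from and is kept for a long time.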
• 23. External term format (ETF)
● Produced by the erlang:term_to_binary call
● ETF is used when
○ Sending terms over the network
○ Storing terms on disk: Mnesia, DETS, the disk_log module
○ Storing Erlang terms in database binary fields
● Atoms may be optimal in memory, but not in ETF
○ Not always true: the distribution protocol can carry an atom cache between connected nodes
○ Remote pids and remote ports contain the node name atom
○ To make the size smaller one can, e.g., change atoms to small integers
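A quick way to compare encodings is `byte_size(term_to_binary(T))`. In this sketch (module `etf_size` and the status atoms are hypothetical) the same three values are encoded as atoms and as small integers. Each atom carries its full name in the encoding, while a list of integers in 0..255 is even encoded as a compact byte string. The distribution atom cache mentioned on the slide is not reflected in plain `term_to_binary/1` output.

```erlang
-module(etf_size).
-export([sizes/0]).

%% Same information, two encodings: atoms vs small integers.
sizes() ->
    AsAtoms = [status_active, status_suspended, status_active],
    AsInts = [1, 2, 1],
    {byte_size(term_to_binary(AsAtoms)), byte_size(term_to_binary(AsInts))}.
```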
• 24. Exports/Imports and Closures
● An export is a function reference:
○ Erlang: fun lists:foreach/2
○ Elixir: &Enum.each/2
● Represented internally as {M, F, Arity} + extra fields
● A closure is the same as an export + some captured values
● Sending function closures and exports over the network is not nice
○ The function code may be missing on the remote end
○ The remote code may be outdated
○ The remote code might not be what you think
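`erlang:fun_info/2` makes the difference between exports and closures visible: an export reports type `external` and is fully described by module, name and arity, while a closure reports type `local` and carries its captured values in `env`. The module `fun_kinds` is hypothetical; `demo/1` takes the captured value as an argument so the compiler cannot fold it into the fun body as a constant.

```erlang
-module(fun_kinds).
-export([describe/1, demo/1]).

%% {Type, Module, Arity} for any fun.
describe(F) ->
    {type, Type} = erlang:fun_info(F, type),
    {module, M} = erlang:fun_info(F, module),
    {arity, A} = erlang:fun_info(F, arity),
    {Type, M, A}.

demo(X) ->
    Closure = fun(Y) -> X + Y end,               % captures X
    {env, Env} = erlang:fun_info(Closure, env),  % captured values
    {describe(fun lists:foreach/2), describe(Closure), Env}.
```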
• 25. Other
● A float takes 2 or 3 words on the heap, plus 1 word for the term itself: expensive.
○ Consider N/M rational fractions
○ Consider fixed point integers
● A small integer is 1 word (60 or 28 bits with sign) and automatically becomes a BIG integer on overflow (3x memory usage)
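A sketch that checks these numbers on a 64-bit emulator, assuming `erlang:system_info(wordsize)` returns 8; the module name `num_words` is hypothetical and `erts_debug:flat_size/1` is an internal debugging function. On 64-bit the largest small integer is 2^59 - 1; one more overflows into a bignum. Counting the term word itself, a float and the smallest bignum each use 3 words, versus 1 for a small integer.

```erlang
-module(num_words).
-export([sizes/0]).

%% Heap words on a 64-bit emulator (term word not included).
sizes() ->
    MaxSmall = (1 bsl 59) - 1,                          % largest small int
    [{float,     erts_debug:flat_size(float(MaxSmall))},  % header + double
     {max_small, erts_debug:flat_size(MaxSmall)},         % immediate
     {bignum,    erts_debug:flat_size(MaxSmall + 1)}].    % header + 1 digit
```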
• 26. Binary Heap
Data is copied when it
● leaves the process as a message to another process or port
● is used as arguments when spawning a process
● is exchanged with ETS
Exception:
● Large binaries (> 64 bytes) are reference counted and only references are copied
(Diagram: a value copied from a process to ETS, ports and other processes; for a 64+ byte binary only a reference is copied)
• 27. Profile and measure
● Knowing the theory helps, but one must also know their data!
● Do not guess
● Print and inspect your data structures
● Profile and measure your memory
○ erlang:system_info
○ erlang:process_info
○ erts_debug:size/1
○ erts_debug:flat_size/1
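A sketch of two of the measuring tools listed above; the module `measure` is hypothetical. `erts_debug:size/1` counts a shared subterm once, while `erts_debug:flat_size/1` counts it every time it occurs, which approximates the cost of copying the term without preserving sharing (as a message send does by default). `erlang:process_info(self(), memory)` gives the calling process's total memory in bytes.

```erlang
-module(measure).
-export([shared/1, my_memory/0]).

%% {SizeWithSharing, FlatSize} in words for a tuple that holds the same
%% N-element list twice.
shared(N) ->
    L = lists:seq(1, N),   % N cons cells = 2 * N words
    T = {L, L},            % 3 words; both elements point to the same list
    {erts_debug:size(T), erts_debug:flat_size(T)}.

%% Total memory of the calling process, in bytes.
my_memory() ->
    {memory, Bytes} = erlang:process_info(self(), memory),
    Bytes.
```

For `N = 3` the shared size is 3 + 6 = 9 words, while the flat size is 3 + 6 + 6 = 15 words.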
• 28. Performance: Optimize First
● Disk storage access
● Database access and slow queries
● Remote network access
● …
● Inefficient and slow algorithms & data structures
• 29. More information about the internal structure of BEAM memory can be found at http://beam-wisdoms.clau.se and in happi/theBeamBook