Range-Based Text
Formatting
For a Future Range-Based Standard Library
1 / 90
Text Formatting
Text formatting everywhere
Many different libraries/approaches
Here is yet another one
2 / 90
Text Formatting
Text formatting everywhere
Many different libraries/approaches
Here is yet another one
3 / 90
Text Formatting
Text formatting everywhere
Many different libraries/approaches
Here is yet another one
Input
components of text
order of these components
per component, conversion parameters
e.g., number of decimal places
4 / 90
Text Formatting
Text formatting everywhere
Many different libraries/approaches
Here is yet another one
Input
components of text
order of these components
per component, conversion parameters
e.g., number of decimal places
Output
a string
5 / 90
Syntax Options
Order of components and parameters described by
6 / 90
Syntax Options
Order of components and parameters described by
format string
printf("Hello %f", 3.14);
absl::StrFormat (printf syntax)
{fmt} / std::format / LEWG P0645 (Python syntax)
7 / 90
Syntax Options
Order of components and parameters described by
format string
printf("Hello %f", 3.14);
absl::StrFormat (printf syntax)
{fmt} / std::format / LEWG P0645 (Python syntax)
'Just C++': functions, parameters, operators
std::stringstream() << std::setprecision(2) << 3.14;
"Hello " + std::to_string(3.14)
8 / 90
Format Strings: Pros
format string
printf("Hello %f", 3.14);
absl::StrFormat (printf syntax)
{fmt} / std::format / LEWG P0645 (Python syntax)
Pros
syntax closer to resulting string
can decide format string at runtime
forgoes compile-time format check
9 / 90
Format Strings: Cons
format string
printf("Hello %f", 3.14);
absl::StrFormat (printf syntax)
{fmt} / std::format / LEWG P0645 (Python syntax)
Cons
must escape format string (and remember it!)
10 / 90
Format Strings: Cons
format string
printf("Hello %f", 3.14);
absl::StrFormat (printf syntax)
{fmt} / std::format / LEWG P0645 (Python syntax)
Cons
must escape format string (and remember it!)
extra language for parameters
why not use C++?
user-defined types need parser for parameters
11 / 90
Format Strings: Cons
format string
printf("Hello %f", 3.14);
absl::StrFormat (printf syntax)
{fmt} / std::format / LEWG P0645 (Python syntax)
Cons
must escape format string (and remember it!)
extra language for parameters
why not use C++?
user-defined types need parser for parameters
no gradual change in syntax from string concatenation to formatting
str1 + str2 vs.
format("%s%s", str1, str2) vs.
format("%s%d%s", str1, n, str2)
12 / 90
Format Strings: I8N
translation outsourced to agencies
set up to deal with text, not code
XLIFF (XML Localization Interchange File Format)
13 / 90
Format Strings: I8N
translation outsourced to agencies
set up to deal with text, not code
XLIFF (XML Localization Interchange File Format)
text may contain placeholders
Dear {0}, thank you for your interest in our product.
So must use format strings!
14 / 90
Format Strings: I8N
translation outsourced to agencies
set up to deal with text, not code
XLIFF (XML Localization Interchange File Format)
text may contain placeholders
Dear {0}, thank you for your interest in our product.
So must use format strings!
BUT: give agency only as much control as necessary
insertion position
formatting parameters provided by OS setting/culture database
decimal separator (comma vs. period)
number of decimal places
date format
15 / 90
Just C++: iostream
std::stringstream() << std::setprecision(2) << 3.14;
abuse of operator overloading
only because no variadic templates?
stateful ("manipulators")
std::setprecision applies to all following items,
std::width only to next item, ARGH!
slow due to
virtual calls
extra copy std::stringstream -> std::string
(std::stringstream() << std::setprecision(2) << 3.14).str()
16 / 90
Just C++: string concatenation
"Hello " + std::to_string(3.14)
different abuse of operator overloading
no formatting options
slow
many temporaries
17 / 90
Just C++: string concatenation
"Hello " + std::to_string(3.14)
different abuse of operator overloading
no formatting options
slow
many temporaries
BUT: conceptually the essence of formatting
turn data into string snippets
concatenate the snippets into whole text
naturally extends to user-defined types/parameters: call a function
Overcome weaknesses with Ranges!
18 / 90
Introduction to Ranges
Who knows the range-based for -loop?
19 / 90
Introduction to Ranges
Who knows the range-based for -loop?
Who knows Ranges TS?
20 / 90
Introduction to Ranges
Who knows the range-based for -loop?
Who knows Ranges TS?
Who knows Eric Niebler's Range-v3 library?
21 / 90
Introduction to Ranges
Who knows the range-based for -loop?
Who knows Ranges TS?
Who knows Eric Niebler's Range-v3 library?
Who uses ranges every day?
Range-v3?
22 / 90
Introduction to Ranges
Who knows the range-based for -loop?
Who knows Ranges TS?
Who knows Eric Niebler's Range-v3 library?
Who uses ranges every day?
Range-v3?
Boost.Range?
23 / 90
Introduction to Ranges
Who knows the range-based for -loop?
Who knows Ranges TS?
Who knows Eric Niebler's Range-v3 library?
Who uses ranges every day?
Range-v3?
Boost.Range?
think-cell library?
24 / 90
Introduction to Ranges
Who knows the range-based for -loop?
Who knows Ranges TS?
Who knows Eric Niebler's Range-v3 library?
Who uses ranges every day?
Range-v3?
Boost.Range?
think-cell library?
home-grown?
25 / 90
Essence of Ranges
std::find(itBegin, itEnd, x)
->
std::find(rng, x) // Ranges TS
rng anything with begin , end
26 / 90
Essence of Ranges
std::find(itBegin, itEnd, x)
->
std::find(rng, x) // Ranges TS
rng anything with begin , end
containers that own elements ( vector , basic_string , etc.)
27 / 90
Essence of Ranges
std::find(itBegin, itEnd, x)
->
std::find(rng, x) // Ranges TS
rng anything with begin , end
containers that own elements ( vector , basic_string , etc.)
views that reference elements (= iterator pairs wrapped into single object)
28 / 90
Essence of Ranges
std::find(itBegin, itEnd, x)
->
std::find(rng, x) // Ranges TS
rng anything with begin , end
containers that own elements ( vector , basic_string , etc.)
views that reference elements (= iterator pairs wrapped into single object)
ranges may do lazy calculations
tc::filter(rng,pred) only captures rng and pred , performs no work
skips elements while iterating
29 / 90
Why do I think I know
something about ranges?
think-cell has range library
evolved from Boost.Range
1 million lines of production code use it
chicken-and-egg problem of library design
can only learn good design by lots of use
with lots of use, cannot change design
avoid by
all code in-house
extra resources dedicated to refactoring
30 / 90
Ranges for Text 101: Replace
basic_string Member Functions
by Range Algorithms
index = str.find(...);
->
iterator = std::find(str,...); // Ranges TS or tc::find_*
same generic algorithms for character and other sequences
flexible with string types
wrap OS-/library-specific string types in range interface
treat uniformly in syntax/algorithms
31 / 90
Ranges for Text 101: Replace
basic_string Member Functions
by Range Algorithms
index = str.find(...);
->
iterator = std::find(str,...); // Ranges TS or tc::find_*
same generic algorithms for character and other sequences
flexible with string types
wrap OS-/library-specific string types in range interface
treat uniformly in syntax/algorithms
C++17 basic_string_view perpetuates basic_string member interface:-(
32 / 90
Ranges for Text Formatting (1)
All Range libraries already have concatenation
tc::concat("Hello ", strName) // similar syntax in Range-v3
33 / 90
Ranges for Text Formatting (1)
All Range libraries already have concatenation
tc::concat("Hello ", strName) // similar syntax in Range-v3
To format data, add formatting functions like tc::as_dec
double f=3.14;
tc::concat("You won ", tc::as_dec(f,2), " dollars.")
34 / 90
Ranges for Text Formatting (1)
All Range libraries already have concatenation
tc::concat("Hello ", strName) // similar syntax in Range-v3
To format data, add formatting functions like tc::as_dec
double f=3.14;
tc::concat("You won ", tc::as_dec(f,2), " dollars.")
not like <iostream> : double itself is not a character range:
tc::concat("You won ", f, " dollars.") // DOES NOT COMPILE
35 / 90
Ranges for Text Formatting (1)
All Range libraries already have concatenation
tc::concat("Hello ", strName) // similar syntax in Range-v3
To format data, add formatting functions like tc::as_dec
double f=3.14;
tc::concat("You won ", tc::as_dec(f,2), " dollars.")
not like <iostream> : double itself is not a character range:
tc::concat("You won ", f, " dollars.") // DOES NOT COMPILE
No need for special format function
36 / 90
Ranges for Text Formatting (2)
Extensible by functions returning ranges
auto dollars(double f) {
return tc::concat(tc::as_dec(f,2), " dollars");
}
double f=3.14;
tc::concat("You won ", dollars(f), ".");
37 / 90
Ranges for Text Formatting (2)
Extensible by functions returning ranges
auto dollars(double f) {
return tc::concat(tc::as_dec(f,2), " dollars");
}
double f=3.14;
tc::concat("You won ", dollars(f), ".");
Range algorithms work
tc::for_each(
tc::as_dec(f,2),
[](char c){...}
);
if( tc::all_of/tc::any_of(
tc::concat("You won ", tc::as_dec(f,2), " dollars."),
[](char c){ return c!='1'; }
) ) {...}
38 / 90
Formatting Into Containers (1)
std::string gives us
Empty Construction
std::string s(); // compiles
Construction from literal, another string
std::string s1("Hello"); // compiles
std::string s2(s1); // compiles
39 / 90
Formatting Into Containers (1)
std::string gives us
Empty Construction
std::string s(); // compiles
Construction from literal, another string
std::string s1("Hello"); // compiles
std::string s2(s1); // compiles
Add construction from 1 Range
std::string s3(tc::as_dec(3.14,2)); // suggested
std::string s3(tc::concat("You won ", tc::as_dec(3.14,2), " dollar
s.")); // suggested
40 / 90
Formatting Into Containers (1)
std::string gives us
Empty Construction
std::string s(); // compiles
Construction from literal, another string
std::string s1("Hello"); // compiles
std::string s2(s1); // compiles
Add construction from 1 Range
std::string s3(tc::as_dec(3.14,2)); // suggested
std::string s3(tc::concat("You won ", tc::as_dec(3.14,2), " dollar
s.")); // suggested
Add construction from N Ranges
std::string s4("Hello", " World"); // suggested
std::string s5("You won ", tc::as_dec(3.14,2), " dollars."); // su
ggested
41 / 90
Formatting Into Containers (2)
What about existing constructors?
std::string s1("A", 3 );
std::string s2('A', 3 );
std::string s3( 3 , 'A');
42 / 90
Formatting Into Containers (2)
What about existing constructors?
std::string s1("A", 3 ); // UB, buffer "A" overrun
std::string s2('A', 3 );
std::string s3( 3 , 'A');
43 / 90
Formatting Into Containers (2)
What about existing constructors?
std::string s1("A", 3 ); // UB, buffer "A" overrun
std::string s2('A', 3 ); // Adds 65x Ctrl-C
std::string s3( 3 , 'A');
44 / 90
Formatting Into Containers (2)
What about existing constructors?
std::string s1("A", 3 ); // UB, buffer "A" overrun
std::string s2('A', 3 ); // Adds 65x Ctrl-C
std::string s3( 3 , 'A'); // Adds 3x 'A'
45 / 90
Formatting Into Containers (2)
What about existing constructors?
std::string s1("A", 3 ); // UB, buffer "A" overrun
std::string s2('A', 3 ); // Adds 65x Ctrl-C
std::string s3( 3 , 'A'); // Adds 3x 'A'
Deprecate them!
std::string s(tc::repeat_n('A', 3)); //suggested, repeat_n as in R
ange-v3
46 / 90
Formatting Into Containers (3)
think-cell library uses tc::explicit_cast to simulate adding/removing explicit
constructors:
auto s4=tc::explicit_cast<std::string>("Hello", " World");
auto s5=tc::explicit_cast<std::string>("You won ", tc::as_dec(f,2),
" dollars.");
47 / 90
Formatting Into Containers (3)
think-cell library uses tc::explicit_cast to simulate adding/removing explicit
constructors:
auto s4=tc::explicit_cast<std::string>("Hello", " World");
auto s5=tc::explicit_cast<std::string>("You won ", tc::as_dec(f,2),
" dollars.");
tc::cont_emplace_back wraps .emplace_back / .push_back , uses
tc::explicit_cast as needed:
std::vector<std::string> vec;
tc::cont_emplace_back( vec, tc::as_dec(3.14,2) );
48 / 90
Formatting Into Containers (3)
think-cell library uses tc::explicit_cast to simulate adding/removing explicit
constructors:
auto s4=tc::explicit_cast<std::string>("Hello", " World");
auto s5=tc::explicit_cast<std::string>("You won ", tc::as_dec(f,2),
" dollars.");
tc::cont_emplace_back wraps .emplace_back / .push_back , uses
tc::explicit_cast as needed:
std::vector<std::string> vec;
tc::cont_emplace_back( vec, tc::as_dec(3.14,2) );
Can tc::append :
std::string s;
tc::append( s, tc::concat("You won ", tc::as_dec(f,2), " dollars.")
);
tc::append( s, "You won ", tc::as_dec(f,2), " dollars." );
49 / 90
Format Strings
tc::concat(
"<body>", html_escape(
tc::placeholders( "You won {0} dollars.", tc::as_dec(f,2) )
), "</body>"
)
50 / 90
Format Strings
tc::concat(
"<body>", html_escape(
tc::placeholders( "You won {0} dollars.", tc::as_dec(f,2) )
), "</body>"
)
support for names
tc::concat(
"<body>", html_escape(
tc::placeholders( "You won {amount} dollars on {date}."
, tc::named_arg("amount", tc::as_dec(f,2))
, tc::named_arg("date", tc::as_ISO8601(
std::chrono::system_clock::now()
))
)
), "</body>"
)
51 / 90
Naive Implementation
each formatter returns std::string
tc::concat returns std::string
tc::append appends std::string s
52 / 90
Naive Implementation
each formatter returns std::string
tc::concat returns std::string
tc::append appends std::string s
Pro
simple
53 / 90
Naive Implementation
each formatter returns std::string
tc::concat returns std::string
tc::append appends std::string s
Pro
simple
Con
need to allocate and copy many strings
talk would be over
54 / 90
Avoid Heap Allocation For
Components
Make formatter ranges lazy
generate character sequence during iteration
size of as_dec -like formatter objects known at compile-time, no heap allocation
55 / 90
Avoid Heap Allocation For
Components
Make formatter ranges lazy
generate character sequence during iteration
size of as_dec -like formatter objects known at compile-time, no heap allocation
auto dollars(double f) {
return tc::concat(tc::as_dec(f,2), " dollars");
}
double f=3.14;
std::string s(tc::concat("You won ", dollars(f), "."));
tc::as_dec(f,2) stores {f,2} ,
tc::concat stores components
lvalues stored by reference
rvalues stored by copy/move
like expression templates
56 / 90
Fast Formatting Into Containers
determine string length
allocate memory for whole string at once
fill in characters
57 / 90
Fast Formatting Into Containers
determine string length
allocate memory for whole string at once
fill in characters
template<typename Cont, typename Rng>
auto explicit_cast(Rng const& rng) {
return Cont(std::begin(rng),std::end(rng));
}
// note: there are more explicit_cast implementations for types othe
r than containers
58 / 90
Fast Formatting Into Containers
determine string length
allocate memory for whole string at once
fill in characters
template<typename Cont, typename Rng>
auto explicit_cast(Rng const& rng) {
return Cont(std::begin(rng),std::end(rng));
}
// note: there are more explicit_cast implementations for types othe
r than containers
formatters are not random-access
string ctor runs twice over rng :-(
first determine size
then copy characters
59 / 90
Fast Formatting Into Containers
avoid traversing rng twice
character rng implements size() member
explicit loop to take advantage of std::size
template<typename Cont, typename Rng, enable_if<
Rng has size and is not random-access
> >
auto explicit_cast(Rng const& rng) {
Cont cont;
cont.reserve( std::size(rng) );
for(auto it=std::begin(rng); it!=std::end(rng); ++it) {
tc::cont_emplace_back(cont, *it);
}
return cont;
}
60 / 90
Fast Formatting Into Containers
also have tc::append
template<typename Cont, typename Rng, enable_if<
Rng has size and is not random-access
> >
void append(Cont& cont, Rng const& rng) {
cont.reserve( cont.size() + std::size(rng) );
for(auto it=std::begin(rng); it!=std::end(rng); ++it) {
tc::cont_emplace_back(cont, *it);
}
}
61 / 90
Fast Formatting Into Containers
also have tc::append
template<typename Cont, typename Rng, enable_if<
Rng has size and is not random-access
> >
void append(Cont& cont, Rng const& rng) {
cont.reserve( cont.size() + std::size(rng) );
for(auto it=std::begin(rng); it!=std::end(rng); ++it) {
tc::cont_emplace_back(cont, *it);
}
}
all good?
62 / 90
Fast Formatting Into Containers
also have tc::append
template<typename Cont, typename Rng, enable_if<
Rng has size and is not random-access
> >
void append(Cont& cont, Rng const& rng) {
cont.reserve( cont.size() + std::size(rng) );
for(auto it=std::begin(rng); it!=std::end(rng); ++it) {
tc::cont_emplace_back(cont, *it);
}
}
.reserve is evil!!!
63 / 90
Better reserve
when adding N elements, guarantee O(N) moves and O(log(N)) memory allocations!
template< typename Cont >
void cont_reserve( Cont& cont, typename Cont::size_type n ) {
if( cont.capacity()<n ) {
cont.reserve(max(n,cont.capacity()*8/5));
}
}
template<typename Cont, typename Rng, enable_if<
Rng has size and is not random-access
> >
void append(Cont& cont, Rng const& rng) {
tc::cont_reserve( cont.size() + std::size(rng) );
for(auto it=std::begin(rng); it!=std::end(rng); ++it) {
tc::cont_emplace_back(cont, *it);
}
}
64 / 90
Fast Formatting Into Containers
template<typename Cont, typename Rng, enable_if<
Rng has size and is not random-access
> >
void append(Cont& cont, Rng const& rng) {
tc::cont_reserve( cont.size() + std::size(rng) );
for(auto it=std::begin(rng); it!=std::end(rng); ++it) {
tc::cont_emplace_back(cont, *it);
}
}
Next bottleneck: iterators!
65 / 90
Iterators Cost Performance
concat
iterator is std::variant of component iterators
each operator* and operator++ branches on the variant
iterator::operator++() {
std::visit( make_overload(
[&](Iterator1& it1){
++it1;
if (it1==std::end(m_rng1)) {
m_variant_of_its=std::begin(m_rng2);
}
},
[&](Iterator2& it2){
++it2;
}
), m_variant_of_its );
}
66 / 90
Iterators Cost Performance
concat
iterator is std::variant of component iterators
each operator* and operator++ branches on the variant
iterator::operator++() {
std::visit( make_overload(
[&](Iterator1& it1){
++it1;
if (it1==std::end(m_rng1)) {
m_variant_of_its=std::begin(m_rng2);
}
},
[&](Iterator2& it2){
++it2;
}
), m_variant_of_its );
}
tc::as_dec
iterator bookkeeping costs performance
67 / 90
External Iteration
C++ iterators do external iteration
Consumer calls producer to get new element
^
| Stack Producer Producer
| /  / 
Consumer Consumer Consumer
Consumer is at bottom of stack
Producer is at top of stack
68 / 90
External iteration (2)
Consumer is at bottom of stack
contiguous code path for whole range
easier to write
better performance
state encoded in instruction pointer
no limit for stack memory
Producer is at top of stack
contiguous code path for each item
harder to write
worse performance
single entry point, must restore state
fixed amount of memory or go to heap
69 / 90
Internal Iteration
Formatting text is more efficient with internal iteration
Producer calls consumer to offer new element
^
| Stack Consumer Consumer
| /  / 
Producer Producer Producer
Producer is at bottom of stack
... all the advantages of being bottom of stack ...
Consumer is at top of stack
... all the disadvantages of being top of stack ...
70 / 90
Many Range Algorithms OK with
Internal Iteration
Algorithm Internal Iteration?
binary_search no (random access iterators)
find no (single pass iterators); yes if only value
71 / 90
Many Range Algorithms OK with
Internal Iteration
Algorithm Internal Iteration?
binary_search no (random access iterators)
find no (single pass iterators); yes if only value
for_each yes
accumulate yes
all_of yes
any_of yes
none_of yes
...
72 / 90
Many Range Algorithms OK with
Internal Iteration
Algorithm Internal Iteration?
binary_search no (random access iterators)
find no (single pass iterators); yes if only value
for_each yes
accumulate yes
all_of yes
any_of yes
none_of yes
...
View Internal Iteration?
filter yes
transform yes
73 / 90
Many Range Algorithms OK with
Internal Iteration
Algorithm Internal Iteration?
binary_search no (random access iterators)
find no (single pass iterators); yes if only value
for_each yes
accumulate yes
all_of yes
any_of yes
none_of yes
...
View Internal Iteration?
filter yes
transform yes
Extend Range concept to internal iteration!
74 / 90
Extend Range Concept to
Internal Iteration
Range implements operator() that takes sink functor
Con: C++20 std::span::operator() already used, must SFINAE it out
Pro: can be written as lambda
tc::for_each(
// the range
[](auto sink) {
sink(1);
sink(2);
},
// the visitor
[](int n) {
consume(n);
}
);
tc::for_each uses internal iteration if available (never slower than iterators)
otherwise uses iterators
75 / 90
concat with Internal Iteration
template<typename... Rngs>
struct concat {
std::tuple<Rngs...> m_rng;
template<typename Sink>
void operator()(Sink sink) const {
// tc::for_each also works on tuples
tc::for_each(m_rng, [&](auto const& rng) {
tc::for_each(rng, sink);
});
}
};
no overhead
76 / 90
Appender Customization Point
introduce appender sink for explicit_cast and append to use
template<typename Cont, typename Rng>
void append(Cont& cont, Rng const& rng) {
tc::for_each(std::forward<Rng>(rng), tc::appender(cont));
}
77 / 90
Appender Customization Point
introduce appender sink for explicit_cast and append to use
template<typename Cont, typename Rng>
void append(Cont& cont, Rng const& rng) {
tc::for_each(std::forward<Rng>(rng), tc::appender(cont));
}
appender customization point
returned by container::appender() member function
default for std:: containers
template<typename Cont>
struct appender {
Cont& m_cont;
template<typename T> void operator()(T&& t) {
tc::cont_emplace_back(m_cont, std::forward<T>(t));
}
};
78 / 90
Chunk Customization Point
What about reserve ?
Sink needs whole range to call std::size before iteration
79 / 90
Chunk Customization Point
What about reserve ?
Sink needs whole range to call std::size before iteration
new Sink customization point chunk
if available, tc::for_each calls it with whole range
template<typename Cont, enable_if<Cont has reserve()> >
struct reserving_appender : appender<Cont> {
template<typename Rng, enable_if<Rng has size()> >
void chunk(Rng&& rng) const {
tc::cont_reserve( m_cont, m_cont.size()+std::size(rng) );
tc::for_each( std::forward<Rng>(rng),
static_cast<appender<Cont> const&>(*this)
);
}
};
80 / 90
Chunk Customization Point:
other uses
file sink advertises interest in contiguous memory chunks
struct file_appender {
void chunk(std::span<unsigned char const> rng) const {
std::fwrite(rng.begin(),1,rng.size(),m_file);
}
void operator()(unsigned char ch) const {
chunk(tc::single(ch));
}
};
81 / 90
Performance: Appender vs
Hand-Written
How much loss compared to hand-written code?
trivial formatting task 10x 'A' + 10x 'B' + 10x 'C' best to expose overhead
struct Buffer {
char achBuffer[1024];
char* pchEnd=&achBuffer[0];
} buffer;
void repeat_handwritten(char chA, int cchA,
char chB, int cchB,
char chC, int cchC
) {
for (auto i = cchA; 0 < i; --i) {
*buffer.pchEnd=chA;
++buffer.pchEnd;
}
... cchB ... chB ...
... cchC ... chC ...
}
82 / 90
Performance: Appender vs
Hand-Written
struct Buffer {
...
auto appender() & {
struct appender_t {
Buffer* m_buffer;
void operator()(char ch) noexcept {
*m_buffer->pchEnd=ch;
++m_buffer->pchEnd;
}
};
return appender_t{this};
}
} buffer;
void repeat_with_ranges(char chA, int cchA,
char chB, int cchB,
char chC, int cchC ) {
tc::append(buffer, tc::repeat_n(chA,cchA), tc::repeat_n(chB,cch
B),
tc::repeat_n(chC,cchC));
}
83 / 90
Performance: Appender vs
Hand-Written
repeat_n iterator-based
~50% more time than hand-written (Visual C++ 15.8)
repeat_n supports internal iteration
~15% more time than hand-written (Visual C++ 15.8)
Test is worst case: actual work is trivial
smaller difference for, e.g., converting numbers to strings
84 / 90
Performance: Custom vs
Standard Appender
toy basic_string implementation
only heap: pointers begin , end , end_of_memory
Again trivial formatting task: 10x 'A' + 10x 'B' + 10x 'C'
void repeat_with_ranges(
char chA, int cchA,
char chB, int cchB,
char chC, int cchC
) {
tc::append(mystring,
tc::repeat_n(chA,cchA), tc::repeat_n(chB,cchB),
tc::repeat_n(chC,cchC));
}
85 / 90
Performance: Custom vs
Standard Appender
Standard Appender
template<typename Cont>
struct appender {
Cont& m_cont;
template<typename T>
void operator()(T&& t) {
m_cont.emplace_back(std::forward<T>(t));
}
};
template<typename Cont, enable_if<Cont has reserve()> >
struct reserving_appender : appender<Cont> {
template<typename Rng, enable_if<Rng has size()> >
void chunk(Rng&& rng) const {
tc::cont_reserve( m_cont, m_cont.size()+std::size(rng) );
tc::for_each( std::forward<Rng>(rng),
static_cast<appender<Cont> const&>(*this)
);
}
};
86 / 90
Performance: Custom vs
Standard Appender
Custom Appender
template<typename Cont>
struct mystring_appender : appender<Cont> {
Cont& m_cont;
template<typename T>
void operator()(T&& t) {
m_cont.emplace_back(std::forward<T>(t));
}
template<typename Rng, enable_if<Rng has size()> >
void chunk(Rng&& rng) const {
tc::cont_reserve( m_cont, m_cont.size()+std::size(rng) );
tc::for_each( std::forward<Rng>(rng),
[&](auto&& t) {
*m_cont.m_ptEnd=std::forward<decltype(t)>(t);
++m_cont.m_ptEnd;
}
);
}
};
87 / 90
Performance: Custom vs.
Standard Appender
String was only 30 characters
Heap allocation
Custom Appender ~20% less time (Visual C++ 15.8)
Requires own basic_string implementation
uninitialized buffer not exposed by std::basic_string / std::vector
88 / 90
Performance: Future Work
if not all snippets implement size() : new customization point min_size() ?
concat::min_size() is sum of min_size() of components
min_size() never wrong to return 0
custom file appender that fills fixed I/O buffer
replace std::FILE buffer with own buffer
offer unchecked write as long as snippet size() still fits
new customization point max_size ?
89 / 90
Conclusion
Use Range syntax and algorithms for text formatting
For performance, need new customization points, Range::operator() , appender ,
chunk
Then performance competitive with hand-written code
think-cell library is at https://github.com/think-cell/range [NEWS: now under
Boost license]
90 / 90

C++ CoreHard Autumn 2018. Text Formatting For a Future Range-Based Standard Library - Arno Schödl

  • 1.
    Range-Based Text Formatting For aFuture Range-Based Standard Library 1 / 90
  • 2.
    Text Formatting Text formattingeverywhere Many different libraries/approaches Here is yet another one 2 / 90
  • 3.
    Text Formatting Text formattingeverywhere Many different libraries/approaches Here is yet another one 3 / 90
  • 4.
    Text Formatting Text formattingeverywhere Many different libraries/approaches Here is yet another one Input components of text order of these components per component, conversion parameters e.g., number of decimal places 4 / 90
  • 5.
    Text Formatting Text formattingeverywhere Many different libraries/approaches Here is yet another one Input components of text order of these components per component, conversion parameters e.g., number of decimal places Output a string 5 / 90
  • 6.
    Syntax Options Order ofcomponents and parameters described by 6 / 90
  • 7.
    Syntax Options Order ofcomponents and parameters described by format string printf("Hello %f", 3.14); absl::StrFormat (printf syntax) {fmt} / std::format / LEWG P0645 (Python syntax) 7 / 90
  • 8.
    Syntax Options Order ofcomponents and parameters described by format string printf("Hello %f", 3.14); absl::StrFormat (printf syntax) {fmt} / std::format / LEWG P0645 (Python syntax) 'Just C++': functions, parameters, operators std::stringstream() << std::setprecision(2) << 3.14; "Hello " + std::to_string(3.14) 8 / 90
  • 9.
    Format Strings: Pros formatstring printf("Hello %f", 3.14); absl::StrFormat (printf syntax) {fmt} / std::format / LEWG P0645 (Python syntax) Pros syntax closer to resulting string can decide format string at runtime forgoes compile-time format check 9 / 90
  • 10.
    Format Strings: Cons formatstring printf("Hello %f", 3.14); absl::StrFormat (printf syntax) {fmt} / std::format / LEWG P0645 (Python syntax) Cons must escape format string (and remember it!) 10 / 90
  • 11.
    Format Strings: Cons formatstring printf("Hello %f", 3.14); absl::StrFormat (printf syntax) {fmt} / std::format / LEWG P0645 (Python syntax) Cons must escape format string (and remember it!) extra language for parameters why not use C++? user-defined types need parser for parameters 11 / 90
  • 12.
    Format Strings: Cons formatstring printf("Hello %f", 3.14); absl::StrFormat (printf syntax) {fmt} / std::format / LEWG P0645 (Python syntax) Cons must escape format string (and remember it!) extra language for parameters why not use C++? user-defined types need parser for parameters no gradual change in syntax from string concatenation to formatting str1 + str2 vs. format("%s%s", str1, str2) vs. format("%s%d%s", str1, n, str2) 12 / 90
  • 13.
    Format Strings: I8N translationoutsourced to agencies set up to deal with text, not code XLIFF (XML Localization Interchange File Format) 13 / 90
  • 14.
    Format Strings: I8N translationoutsourced to agencies set up to deal with text, not code XLIFF (XML Localization Interchange File Format) text may contain placeholders Dear {0}, thank you for your interest in our product. So must use format strings! 14 / 90
  • 15.
    Format Strings: I8N translationoutsourced to agencies set up to deal with text, not code XLIFF (XML Localization Interchange File Format) text may contain placeholders Dear {0}, thank you for your interest in our product. So must use format strings! BUT: give agency only as much control as necessary insertion position formatting parameters provided by OS setting/culture database decimal separator (comma vs. period) number of decimal places date format 15 / 90
  • 16.
    Just C++: iostream std::stringstream()<< std::setprecision(2) << 3.14; abuse of operator overloading only because no variadic templates? stateful ("manipulators") std::setprecision applies to all following items, std::width only to next item, ARGH! slow due to virtual calls extra copy std::stringstream -> std::string (std::stringstream() << std::setprecision(2) << 3.14).str() 16 / 90
  • 17.
    Just C++: stringconcatenation "Hello " + std::to_string(3.14) different abuse of operator overloading no formatting options slow many temporaries 17 / 90
  • 18.
    Just C++: stringconcatenation "Hello " + std::to_string(3.14) different abuse of operator overloading no formatting options slow many temporaries BUT: conceptually the essence of formatting turn data into string snippets concatenate the snippets into whole text naturally extends to user-defined types/parameters: call a function Overcome weaknesses with Ranges! 18 / 90
  • 19.
    Introduction to Ranges Whoknows the range-based for -loop? 19 / 90
  • 20.
    Introduction to Ranges Whoknows the range-based for -loop? Who knows Ranges TS? 20 / 90
  • 21.
    Introduction to Ranges Whoknows the range-based for -loop? Who knows Ranges TS? Who knows Eric Niebler's Range-v3 library? 21 / 90
  • 22.
    Introduction to Ranges Whoknows the range-based for -loop? Who knows Ranges TS? Who knows Eric Niebler's Range-v3 library? Who uses ranges every day? Range-v3? 22 / 90
  • 23.
    Introduction to Ranges Whoknows the range-based for -loop? Who knows Ranges TS? Who knows Eric Niebler's Range-v3 library? Who uses ranges every day? Range-v3? Boost.Range? 23 / 90
  • 24.
    Introduction to Ranges Whoknows the range-based for -loop? Who knows Ranges TS? Who knows Eric Niebler's Range-v3 library? Who uses ranges every day? Range-v3? Boost.Range? think-cell library? 24 / 90
  • 25.
    Introduction to Ranges Whoknows the range-based for -loop? Who knows Ranges TS? Who knows Eric Niebler's Range-v3 library? Who uses ranges every day? Range-v3? Boost.Range? think-cell library? home-grown? 25 / 90
  • 26.
    Essence of Ranges std::find(itBegin,itEnd, x) -> std::find(rng, x) // Ranges TS rng anything with begin , end 26 / 90
  • 27.
    Essence of Ranges std::find(itBegin,itEnd, x) -> std::find(rng, x) // Ranges TS rng anything with begin , end containers that own elements ( vector , basic_string , etc.) 27 / 90
  • 28.
    Essence of Ranges std::find(itBegin,itEnd, x) -> std::find(rng, x) // Ranges TS rng anything with begin , end containers that own elements ( vector , basic_string , etc.) views that reference elements (= iterator pairs wrapped into single object) 28 / 90
  • 29.
    Essence of Ranges std::find(itBegin,itEnd, x) -> std::find(rng, x) // Ranges TS rng anything with begin , end containers that own elements ( vector , basic_string , etc.) views that reference elements (= iterator pairs wrapped into single object) ranges may do lazy calculations tc::filter(rng,pred) only captures rng and pred , performs no work skips elements while iterating 29 / 90
  • 30.
    Why do Ithink I know something about ranges? think-cell has range library evolved from Boost.Range 1 million lines of production code use it chicken-and-egg problem of library design can only learn good design by lots of use with lots of use, cannot change design avoid by all code in-house extra resources dedicated to refactoring 30 / 90
  • 31.
    Ranges for Text101: Replace basic_string Member Functions by Range Algorithms index = str.find(...); -> iterator = std::find(str,...); // Ranges TS or tc::find_* same generic algorithms for character and other sequences flexible with string types wrap OS-/library-specific string types in range interface treat uniformly in syntax/algorithms 31 / 90
  • 32.
    Ranges for Text101: Replace basic_string Member Functions by Range Algorithms index = str.find(...); -> iterator = std::find(str,...); // Ranges TS or tc::find_* same generic algorithms for character and other sequences flexible with string types wrap OS-/library-specific string types in range interface treat uniformly in syntax/algorithms C++17 basic_string_view perpetuates basic_string member interface:-( 32 / 90
  • 33.
    Ranges for TextFormatting (1) All Range libraries already have concatenation tc::concat("Hello ", strName) // similar syntax in Range-v3 33 / 90
  • 34.
    Ranges for TextFormatting (1) All Range libraries already have concatenation tc::concat("Hello ", strName) // similar syntax in Range-v3 To format data, add formatting functions like tc::as_dec double f=3.14; tc::concat("You won ", tc::as_dec(f,2), " dollars.") 34 / 90
  • 35.
    Ranges for TextFormatting (1) All Range libraries already have concatenation tc::concat("Hello ", strName) // similar syntax in Range-v3 To format data, add formatting functions like tc::as_dec double f=3.14; tc::concat("You won ", tc::as_dec(f,2), " dollars.") not like <iostream> : double itself is not a character range: tc::concat("You won ", f, " dollars.") // DOES NOT COMPILE 35 / 90
  • 36.
    Ranges for TextFormatting (1) All Range libraries already have concatenation tc::concat("Hello ", strName) // similar syntax in Range-v3 To format data, add formatting functions like tc::as_dec double f=3.14; tc::concat("You won ", tc::as_dec(f,2), " dollars.") not like <iostream> : double itself is not a character range: tc::concat("You won ", f, " dollars.") // DOES NOT COMPILE No need for special format function 36 / 90
  • 37.
    Ranges for TextFormatting (2) Extensible by functions returning ranges auto dollars(double f) { return tc::concat(tc::as_dec(f,2), " dollars"); } double f=3.14; tc::concat("You won ", dollars(f), "."); 37 / 90
  • 38.
    Ranges for TextFormatting (2) Extensible by functions returning ranges auto dollars(double f) { return tc::concat(tc::as_dec(f,2), " dollars"); } double f=3.14; tc::concat("You won ", dollars(f), "."); Range algorithms work tc::for_each( tc::as_dec(f,2), [](char c){...} ); if( tc::all_of/tc::any_of( tc::concat("You won ", tc::as_dec(f,2), " dollars."), [](char c){ return c!='1'; } ) ) {...} 38 / 90
  • 39.
    Formatting Into Containers(1) std::string gives us Empty Construction std::string s(); // compiles Construction from literal, another string std::string s1("Hello"); // compiles std::string s2(s1); // compiles 39 / 90
  • 40.
    Formatting Into Containers(1) std::string gives us Empty Construction std::string s(); // compiles Construction from literal, another string std::string s1("Hello"); // compiles std::string s2(s1); // compiles Add construction from 1 Range std::string s3(tc::as_dec(3.14,2)); // suggested std::string s3(tc::concat("You won ", tc::as_dec(3.14,2), " dollar s.")); // suggested 40 / 90
  • 41.
    Formatting Into Containers(1) std::string gives us Empty Construction std::string s(); // compiles Construction from literal, another string std::string s1("Hello"); // compiles std::string s2(s1); // compiles Add construction from 1 Range std::string s3(tc::as_dec(3.14,2)); // suggested std::string s3(tc::concat("You won ", tc::as_dec(3.14,2), " dollar s.")); // suggested Add construction from N Ranges std::string s4("Hello", " World"); // suggested std::string s5("You won ", tc::as_dec(3.14,2), " dollars."); // su ggested 41 / 90
  • 42.
    Formatting Into Containers(2) What about existing constructors? std::string s1("A", 3 ); std::string s2('A', 3 ); std::string s3( 3 , 'A'); 42 / 90
  • 43.
    Formatting Into Containers(2) What about existing constructors? std::string s1("A", 3 ); // UB, buffer "A" overrun std::string s2('A', 3 ); std::string s3( 3 , 'A'); 43 / 90
  • 44.
    Formatting Into Containers(2) What about existing constructors? std::string s1("A", 3 ); // UB, buffer "A" overrun std::string s2('A', 3 ); // Adds 65x Ctrl-C std::string s3( 3 , 'A'); 44 / 90
  • 45.
    Formatting Into Containers(2) What about existing constructors? std::string s1("A", 3 ); // UB, buffer "A" overrun std::string s2('A', 3 ); // Adds 65x Ctrl-C std::string s3( 3 , 'A'); // Adds 3x 'A' 45 / 90
  • 46.
    Formatting Into Containers(2) What about existing constructors? std::string s1("A", 3 ); // UB, buffer "A" overrun std::string s2('A', 3 ); // Adds 65x Ctrl-C std::string s3( 3 , 'A'); // Adds 3x 'A' Deprecate them! std::string s(tc::repeat_n('A', 3)); //suggested, repeat_n as in R ange-v3 46 / 90
  • 47.
    Formatting Into Containers(3) think-cell library uses tc::explicit_cast to simulate adding/removing explicit constructors: auto s4=tc::explicit_cast<std::string>("Hello", " World"); auto s5=tc::explicit_cast<std::string>("You won ", tc::as_dec(f,2), " dollars."); 47 / 90
  • 48.
    Formatting Into Containers(3) think-cell library uses tc::explicit_cast to simulate adding/removing explicit constructors: auto s4=tc::explicit_cast<std::string>("Hello", " World"); auto s5=tc::explicit_cast<std::string>("You won ", tc::as_dec(f,2), " dollars."); tc::cont_emplace_back wraps .emplace_back / .push_back , uses tc::explicit_cast as needed: std::vector<std::string> vec; tc::cont_emplace_back( vec, tc::as_dec(3.14,2) ); 48 / 90
  • 49.
    Formatting Into Containers(3) think-cell library uses tc::explicit_cast to simulate adding/removing explicit constructors: auto s4=tc::explicit_cast<std::string>("Hello", " World"); auto s5=tc::explicit_cast<std::string>("You won ", tc::as_dec(f,2), " dollars."); tc::cont_emplace_back wraps .emplace_back / .push_back , uses tc::explicit_cast as needed: std::vector<std::string> vec; tc::cont_emplace_back( vec, tc::as_dec(3.14,2) ); Can tc::append : std::string s; tc::append( s, tc::concat("You won ", tc::as_dec(f,2), " dollars.") ); tc::append( s, "You won ", tc::as_dec(f,2), " dollars." ); 49 / 90
  • 50.
    Format Strings tc::concat( "<body>", html_escape( tc::placeholders("You won {0} dollars.", tc::as_dec(f,2) ) ), "</body>" ) 50 / 90
  • 51.
    Format Strings tc::concat( "<body>", html_escape( tc::placeholders("You won {0} dollars.", tc::as_dec(f,2) ) ), "</body>" ) support for names tc::concat( "<body>", html_escape( tc::placeholders( "You won {amount} dollars on {date}." , tc::named_arg("amount", tc::as_dec(f,2)) , tc::named_arg("date", tc::as_ISO8601( std::chrono::system_clock::now() )) ) ), "</body>" ) 51 / 90
  • 52.
    Naive Implementation each formatterreturns std::string tc::concat returns std::string tc::append appends std::string s 52 / 90
  • 53.
    Naive Implementation each formatterreturns std::string tc::concat returns std::string tc::append appends std::string s Pro simple 53 / 90
  • 54.
    Naive Implementation each formatterreturns std::string tc::concat returns std::string tc::append appends std::string s Pro simple Con need to allocate and copy many strings talk would be over 54 / 90
  • 55.
    Avoid Heap AllocationFor Components Make formatter ranges lazy generate character sequence during iteration size of as_dec -like formatter objects known at compile-time, no heap allocation 55 / 90
  • 56.
    Avoid Heap AllocationFor Components Make formatter ranges lazy generate character sequence during iteration size of as_dec -like formatter objects known at compile-time, no heap allocation auto dollars(double f) { return tc::concat(tc::as_dec(f,2), " dollars"); } double f=3.14; std::string s(tc::concat("You won ", dollars(f), ".")); tc::as_dec(f,2) stores {f,2} , tc::concat stores components lvalues stored by reference rvalues stored by copy/move like expression templates 56 / 90
  • 57.
    Fast Formatting IntoContainers determine string length allocate memory for whole string at once fill in characters 57 / 90
  • 58.
    Fast Formatting IntoContainers determine string length allocate memory for whole string at once fill in characters template<typename Cont, typename Rng> auto explicit_cast(Rng const& rng) { return Cont(std::begin(rng),std::end(rng)); } // note: there are more explicit_cast implementations for types othe r than containers 58 / 90
  • 59.
    Fast Formatting IntoContainers determine string length allocate memory for whole string at once fill in characters template<typename Cont, typename Rng> auto explicit_cast(Rng const& rng) { return Cont(std::begin(rng),std::end(rng)); } // note: there are more explicit_cast implementations for types othe r than containers formatters are not random-access string ctor runs twice over rng :-( first determine size then copy characters 59 / 90
  • 60.
    Fast Formatting IntoContainers avoid traversing rng twice character rng implements size() member explicit loop to take advantage of std::size template<typename Cont, typename Rng, enable_if< Rng has size and is not random-access > > auto explicit_cast(Rng const& rng) { Cont cont; cont.reserve( std::size(rng) ); for(auto it=std::begin(rng); it!=std::end(rng); ++it) { tc::cont_emplace_back(cont, *it); } return cont; } 60 / 90
  • 61.
    Fast Formatting IntoContainers also have tc::append template<typename Cont, typename Rng, enable_if< Rng has size and is not random-access > > void append(Cont& cont, Rng const& rng) { cont.reserve( cont.size() + std::size(rng) ); for(auto it=std::begin(rng); it!=std::end(rng); ++it) { tc::cont_emplace_back(cont, *it); } } 61 / 90
  • 62.
    Fast Formatting IntoContainers also have tc::append template<typename Cont, typename Rng, enable_if< Rng has size and is not random-access > > void append(Cont& cont, Rng const& rng) { cont.reserve( cont.size() + std::size(rng) ); for(auto it=std::begin(rng); it!=std::end(rng); ++it) { tc::cont_emplace_back(cont, *it); } } all good? 62 / 90
  • 63.
    Fast Formatting IntoContainers also have tc::append template<typename Cont, typename Rng, enable_if< Rng has size and is not random-access > > void append(Cont& cont, Rng const& rng) { cont.reserve( cont.size() + std::size(rng) ); for(auto it=std::begin(rng); it!=std::end(rng); ++it) { tc::cont_emplace_back(cont, *it); } } .reserve is evil!!! 63 / 90
  • 64.
    Better reserve when addingN elements, guarantee O(N) moves and O(log(N)) memory allocations! template< typename Cont > void cont_reserve( Cont& cont, typename Cont::size_type n ) { if( cont.capacity()<n ) { cont.reserve(max(n,cont.capacity()*8/5)); } } template<typename Cont, typename Rng, enable_if< Rng has size and is not random-access > > void append(Cont& cont, Rng const& rng) { tc::cont_reserve( cont.size() + std::size(rng) ); for(auto it=std::begin(rng); it!=std::end(rng); ++it) { tc::cont_emplace_back(cont, *it); } } 64 / 90
  • 65.
    Fast Formatting IntoContainers template<typename Cont, typename Rng, enable_if< Rng has size and is not random-access > > void append(Cont& cont, Rng const& rng) { tc::cont_reserve( cont.size() + std::size(rng) ); for(auto it=std::begin(rng); it!=std::end(rng); ++it) { tc::cont_emplace_back(cont, *it); } } Next bottleneck: iterators! 65 / 90
  • 66.
    Iterators Cost Performance concat iteratoris std::variant of component iterators each operator* and operator++ branches on the variant iterator::operator++() { std::visit( make_overload( [&](Iterator1& it1){ ++it1; if (it1==std::end(m_rng1)) { m_variant_of_its=std::begin(m_rng2); } }, [&](Iterator2& it2){ ++it2; } ), m_variant_of_its ); } 66 / 90
  • 67.
    Iterators Cost Performance concat iteratoris std::variant of component iterators each operator* and operator++ branches on the variant iterator::operator++() { std::visit( make_overload( [&](Iterator1& it1){ ++it1; if (it1==std::end(m_rng1)) { m_variant_of_its=std::begin(m_rng2); } }, [&](Iterator2& it2){ ++it2; } ), m_variant_of_its ); } tc::as_dec iterator bookkeeping costs performance 67 / 90
  • 68.
    External Iteration C++ iteratorsdo external iteration Consumer calls producer to get new element ^ | Stack Producer Producer | / / Consumer Consumer Consumer Consumer is at bottom of stack Producer is at top of stack 68 / 90
  • 69.
    External iteration (2) Consumeris at bottom of stack contiguous code path for whole range easier to write better performance state encoded in instruction pointer no limit for stack memory Producer is at top of stack contiguous code path for each item harder to write worse performance single entry point, must restore state fixed amount of memory or go to heap 69 / 90
  • 70.
    Internal Iteration Formatting textis more efficient with internal iteration Producer calls consumer to offer new element ^ | Stack Consumer Consumer | / / Producer Producer Producer Producer is at bottom of stack ... all the advantages of being bottom of stack ... Consumer is at top of stack ... all the disadvantages of being top of stack ... 70 / 90
  • 71.
    Many Range AlgorithmsOK with Internal Iteration Algorithm Internal Iteration? binary_search no (random access iterators) find no (single pass iterators); yes if only value 71 / 90
  • 72.
    Many Range AlgorithmsOK with Internal Iteration Algorithm Internal Iteration? binary_search no (random access iterators) find no (single pass iterators); yes if only value for_each yes accumulate yes all_of yes any_of yes none_of yes ... 72 / 90
  • 73.
    Many Range AlgorithmsOK with Internal Iteration Algorithm Internal Iteration? binary_search no (random access iterators) find no (single pass iterators); yes if only value for_each yes accumulate yes all_of yes any_of yes none_of yes ... View Internal Iteration? filter yes transform yes 73 / 90
  • 74.
    Many Range AlgorithmsOK with Internal Iteration Algorithm Internal Iteration? binary_search no (random access iterators) find no (single pass iterators); yes if only value for_each yes accumulate yes all_of yes any_of yes none_of yes ... View Internal Iteration? filter yes transform yes Extend Range concept to internal iteration! 74 / 90
  • 75.
    Extend Range Conceptto Internal Iteration Range implements operator() that takes sink functor Con: C++20 std::span::operator() already used, must SFINAE it out Pro: can be written as lambda tc::for_each( // the range [](auto sink) { sink(1); sink(2); }, // the visitor [](int n) { consume(n); } ); tc::for_each uses internal iteration if available (never slower than iterators) otherwise uses iterators 75 / 90
  • 76.
    concat with InternalIteration template<typename... Rngs> struct concat { std::tuple<Rngs...> m_rng; template<typename Sink> void operator()(Sink sink) const { // tc::for_each also works on tuples tc::for_each(m_rng, [&](auto const& rng) { tc::for_each(rng, sink); }); } }; no overhead 76 / 90
  • 77.
    Appender Customization Point introduceappender sink for explicit_cast and append to use template<typename Cont, typename Rng> void append(Cont& cont, Rng const& rng) { tc::for_each(std::forward<Rng>(rng), tc::appender(cont)); } 77 / 90
  • 78.
    Appender Customization Point introduceappender sink for explicit_cast and append to use template<typename Cont, typename Rng> void append(Cont& cont, Rng const& rng) { tc::for_each(std::forward<Rng>(rng), tc::appender(cont)); } appender customization point returned by container::appender() member function default for std:: containers template<typename Cont> struct appender { Cont& m_cont; template<typename T> void operator()(T&& t) { tc::cont_emplace_back(m_cont, std::forward<T>(t)); } }; 78 / 90
  • 79.
    Chunk Customization Point Whatabout reserve ? Sink needs whole range to call std::size before iteration 79 / 90
  • 80.
    Chunk Customization Point Whatabout reserve ? Sink needs whole range to call std::size before iteration new Sink customization point chunk if available, tc::for_each calls it with whole range template<typename Cont, enable_if<Cont has reserve()> > struct reserving_appender : appender<Cont> { template<typename Rng, enable_if<Rng has size()> > void chunk(Rng&& rng) const { tc::cont_reserve( m_cont, m_cont.size()+std::size(rng) ); tc::for_each( std::forward<Rng>(rng), static_cast<appender<Cont> const&>(*this) ); } }; 80 / 90
  • 81.
    Chunk Customization Point: otheruses file sink advertises interest in contiguous memory chunks struct file_appender { void chunk(std::span<unsigned char const> rng) const { std::fwrite(rng.begin(),1,rng.size(),m_file); } void operator()(unsigned char ch) const { chunk(tc::single(ch)); } }; 81 / 90
  • 82.
    Performance: Appender vs Hand-Written Howmuch loss compared to hand-written code? trivial formatting task 10x 'A' + 10x 'B' + 10x 'C' best to expose overhead struct Buffer { char achBuffer[1024]; char* pchEnd=&achBuffer[0]; } buffer; void repeat_handwritten(char chA, int cchA, char chB, int cchB, char chC, int cchC ) { for (auto i = cchA; 0 < i; --i) { *buffer.pchEnd=chA; ++buffer.pchEnd; } ... cchB ... chB ... ... cchC ... chC ... } 82 / 90
  • 83.
    Performance: Appender vs Hand-Written structBuffer { ... auto appender() & { struct appender_t { Buffer* m_buffer; void operator()(char ch) noexcept { *m_buffer->pchEnd=ch; ++m_buffer->pchEnd; } }; return appender_t{this}; } } buffer; void repeat_with_ranges(char chA, int cchA, char chB, int cchB, char chC, int cchC ) { tc::append(buffer, tc::repeat_n(chA,cchA), tc::repeat_n(chB,cch B), tc::repeat_n(chC,cchC)); } 83 / 90
  • 84.
    Performance: Appender vs Hand-Written repeat_niterator-based ~50% more time than hand-written (Visual C++ 15.8) repeat_n supports internal iteration ~15% more time than hand-written (Visual C++ 15.8) Test is worst case: actual work is trivial smaller difference for, e.g., converting numbers to strings 84 / 90
  • 85.
    Performance: Custom vs StandardAppender toy basic_string implementation only heap: pointers begin , end , end_of_memory Again trivial formatting task: 10x 'A' + 10x 'B' + 10x 'C' void repeat_with_ranges( char chA, int cchA, char chB, int cchB, char chC, int cchC ) { tc::append(mystring, tc::repeat_n(chA,cchA), tc::repeat_n(chB,cchB), tc::repeat_n(chC,cchC)); } 85 / 90
  • 86.
    Performance: Custom vs StandardAppender Standard Appender template<typename Cont> struct appender { Cont& m_cont; template<typename T> void operator()(T&& t) { m_cont.emplace_back(std::forward<T>(t)); } }; template<typename Cont, enable_if<Cont has reserve()> > struct reserving_appender : appender<Cont> { template<typename Rng, enable_if<Rng has size()> > void chunk(Rng&& rng) const { tc::cont_reserve( m_cont, m_cont.size()+std::size(rng) ); tc::for_each( std::forward<Rng>(rng), static_cast<appender<Cont> const&>(*this) ); } }; 86 / 90
  • 87.
    Performance: Custom vs StandardAppender Custom Appender template<typename Cont> struct mystring_appender : appender<Cont> { Cont& m_cont; template<typename T> void operator()(T&& t) { m_cont.emplace_back(std::forward<T>(t)); } template<typename Rng, enable_if<Rng has size()> > void chunk(Rng&& rng) const { tc::cont_reserve( m_cont, m_cont.size()+std::size(rng) ); tc::for_each( std::forward<Rng>(rng), [&](auto&& t) { *m_cont.m_ptEnd=std::forward<decltype(t)>(t); ++m_cont.m_ptEnd; } ); } }; 87 / 90
  • 88.
    Performance: Custom vs. StandardAppender String was only 30 characters Heap allocation Custom Appender ~20% less time (Visual C++ 15.8) Requires own basic_string implementation uninitialized buffer not exposed by std::basic_string / std::vector 88 / 90
  • 89.
    Performance: Future Work ifnot all snippets implement size() : new customization point min_size() ? concat::min_size() is sum of min_size() of components min_size() never wrong to return 0 custom file appender that fills fixed I/O buffer replace std::FILE buffer with own buffer offer unchecked write as long as snippet size() still fits new customization point max_size ? 89 / 90
  • 90.
    Conclusion Use Range syntaxand algorithms for text formatting For performance, need new customization points, Range::operator() , appender , chunk Then performance competitive with hand-written code think-cell library is at https://github.com/think-cell/range [NEWS: now under Boost license] 90 / 90