Pass-by-value vs Pass-by-reference
Let’s dig into the age old question, should you pass-by-value or pass-by-reference in C++? (or by pointer in C)
This blog post is mostly a re-post of a reddit comment that I made on r/cpp about pass-by-value and pass-by-reference, with some minor improvements, to make it easier to reference and save.
The answer isn’t as easy as it might seem, it depends on the Application Binary Interface (ABI) and your use-cases, there isn’t a one size fits all answer, this is even more the case for anything which is built to be cross platform.
First it’s probably good to break the problem down into two parts (focusing solely on performance, ignoring readability and maintainability which should often be more important)
- The language construct costs (copying, moving, etc)
- Compiler implications (aliasing, pointer provenance, etc)
- The ABI (the stack, registers, etc)
Language constructs
tl;dr Pass-by-value can end up with unnecessary copies
To help understand the language constructs, here is a bit of a refresher on value types
Calling with lvalue |
|
---|---|
Calling with xvalue (type of rvalue) NOTE: data can be used after func
|
|
Calling with prvalue (type of rvalue) NOTE: The input arg can’t be used after func
|
|
Now let’s see how these value types impact what happens before and after the function call
Behavior before the call | |||
---|---|---|---|
void func(T) |
void func(const T &) |
void func(T &&) |
|
Calling with lvalue | Copies | Reference | N/A |
Calling with xvalue | Moves | Reference | Reference |
Calling with prvalue | In-place construct | Reference | Reference |
Behavior after if the caller does not move or copy
arg.something() |
|||
---|---|---|---|
void func(T) |
void func(const T &) |
void func(T &&) |
|
Calling with lvalue | Unnecessary copy | No overhead | N/A |
Calling with xvalue | Unnecessary move | No overhead | No overhead |
Calling with prvalue | No overhead | No overhead | No overhead |
Behavior after if the caller moves
other = std::move( arg ); |
|||
---|---|---|---|
void func(T) |
void func(const T &) |
void func(T &&) |
|
Calling with lvalue | Copy and move | N/A | Compile error |
Calling with xvalue | Two moves | N/A | One move |
Calling with prvalue | One move | N/A | One move |
Behavior after if the caller copies
other = arg; |
|||
---|---|---|---|
void func(T) |
void func(const T &) |
void func(T &&) |
|
Calling with lvalue | Two copies | One copy | N/A |
Calling with xvalue | Move and Copy | One copy | One copy |
Calling with prvalue | One copy | One copy | One copy |
Compiler implications
tl;dr Pass by reference can create additional aliasing situations
In order for the compiler to do optimizations it must make assumptions based on the code it is provided, one of the important ones is understanding who owns what pointer (or reference) and if this pointer/reference will be modified over a calling boundary.
In order to understand some of the implications of this aliasing, let’s look at a simple example (compiler explorer link).
struct MyObject
{
int val;
};
void other_func();
int by_ref( const MyObject & v )
{
int x = v.val;
other_func();
int y = v.val;
return x + y;
}
int by_value( MyObject v )
{
int x = v.val;
other_func();
int y = v.val;
return x + y;
}
With the by_ref
function the compiler is unable to determine if v.val
could have changed during the call to other_func
(e.g. it might have changed as a referenced to a global variable inside other_func
), because of this it is unable to optimize this code to do a single load of v.val
, with the by_value
implementation it knows the value can not change, as nothing else references it, it is the sole owner.
This aliasing can also happen when you have multiple arguments, even when the compiler is fully aware of the entire function here is another example (compiler explorer)
void two_arg_by_ref( const MyObject & input, MyObject & output )
{
output.val += input.val * 2;
output.val += input.val * 4;
}
void two_arg_by_value( MyObject input, MyObject & output )
{
output.val += input.val * 2;
output.val += input.val * 4;
}
With this example the compiler is unable to determine if input
and output
point to the same object so the value could be changing for each increment, meaning the compiler can not remove the extra loads and stores.
With this next one it might be fairly clear that they could reference the same object, how about when different argument types are used for outputting the value. (compiler explorer)
void two_arg_by_char_ref( const MyObject & input, char & output )
{
output += input.val * 2;
output += input.val * 4;
}
void two_arg_by_short_ref( const MyObject & input, short & output )
{
output += input.val * 2;
output += input.val * 4;
}
void two_arg_by_int_ref( const MyObject & input, int & output )
{
output += input.val * 2;
output += input.val * 4;
}
What might come as as surprise is that the same aliasing can exist for two_arg_by_int_ref
because output
might be pointing to the same value as input.val
, however the probably more unsuspecting one here is two_arg_by_char_ref
it could possibly also point to input.val
because it’s well-defined behavior to use reinterpret_cast
on any object to treat it as a char
(or ideally std::byte
).
Thankfully with the two_arg_by_short_ref
version this would be undefined behavior for the short
to point to the same object as it’s a different type that isn’t char
so it’s free to remove the unnecessary load/stores.
There is also the caller side of things which is impacted, let’s take a look at another example (compiler explorer)
void by_ref(const MyObject&);
void by_value(MyObject);
int use_by_ref()
{
MyObject obj{10};
by_ref(obj);
return obj.val;
}
int use_by_value()
{
MyObject obj{10};
by_value(obj);
return obj.val;
}
With this the use_by_ref
example even though it is calling a function which is const MyObject&
that function can still change the actual value (const
isn’t as meaningful as you would hope) so it must load the value of obj.val
after the return while the use_by_value
can safely assume that it will not change so does not need to load the value again.
The ABI
tl;dr Each platform is different, System V (*nix) does better with aggregate handling, Windows frequently leads to Invisible Reference.
This is heavily dependent on the platform, most of my experience is around x86 and most information here is specific to x86-64 and related to how MSVC and Unix System V (Linux, BSD, Mac, etc), to start with I’ll break things down into a few categories of what can happen
- Argument is passed in register(s)
- Argument is passed as a reference; reuse existing memory location (Traditional reference)
- Argument is passed as a reference; new memory location just for argument (Invisible reference)
These are in order from what is typically faster to slower, however assumptions about speed can have many exceptions and change over time, so if your making performance decisions use benchmarks not assumptions.
In order to show how things work its good to have some types to work with
struct int5
{
int v1, v2, v3, v4, v5;
};
struct int4
{
int v1, v2, v3, v4;
};
struct int3
{
int v1, v2, v3;
};
struct int2
{
int v1, v2;
};
struct int1
{
int v1;
};
struct char3
{
char x, y, z;
}
Now with these types let’s see how different ABIs handle them (reg is short for register here)
MSVC x64 ABI | Sys V ABI | |
---|---|---|
void func(int5) |
Invisible reference in 1-reg | Invisible reference in 1-reg |
void func(int4) |
Invisible reference in 1-reg | Packed in 2-regs |
void func(int3) |
Invisible reference in 1-reg | Packed in 2-regs |
void func(int2) |
Packed in 1-reg | Packed in 1-reg |
void func(int1) |
Packed in 1-reg | Packed in 1-reg |
void func(char3) |
Invisible reference in 1-reg | Packed in 1-reg |
void func(const int3 &) |
Reference in 1-reg | Reference in 1-reg |
void func(const int2 &) |
Reference in 1-reg | Reference in 1-reg |
void func(const int1 &) |
Reference in 1-reg | Reference in 1-reg |
void func(const char3 &) |
Reference in 1-reg | Reference in 1-reg |
void func(int3 &&) |
Reference in 1-reg | Reference in 1-reg |
void func(int2 &&) |
Reference in 1-reg | Reference in 1-reg |
void func(int1 &&) |
Reference in 1-reg | Reference in 1-reg |
void func(char3 &&) |
Reference in 1-reg | Reference in 1-reg |
As you can see it gets pretty complex, references are all references however when you pass-by-value things vary based on different platforms, different sizes, there is many more complex situations, especially when it comes to System V ABI’s unpacking of structs, which can make it fairly effective passing by value. Windows also has many attributes which can be used to improve and change it’s ABI for example __vectorcall
which allow it to unpack some structures similar behavior to System V.
Now the “Invisible Reference” thing here is important to understand a bit better, in many cases this will be a copy of the original data so it has a memory address to reference as the initial value is not allowed to change so referencing it is dangerous.
This also goes for the standard “Reference” if you are using an l-value reference from something which would typically exist in registers (in x86-64 this is where most your code and calculations hopefully live).
Even with all of this information it’s nearly impossible to define a one size fits all approach for how functions should accept arguments when it comes to performance, you should just try to consider the basics of avoiding copying heavy to copy objects and moving heavy to move objects (which hopefully you never have heavy to move objects), and focusing on readability and maintainability unless you really need that last drop of performance and this is your bottleneck then and only then bother improving it, and do it with benchmarks and on all your supported platforms.
As a follow-up to this I plan to dig further into Vector math libraries commonly seen in 3D graphics and games and the implications an ABI can have.