Introduction
As of C++11 the STL defines a thread class. Most documentation and examples I’ve found cover usage pretty well, but leave deeper understanding as an exercise to the reader. Filling that gap is the goal of this discussion.
The following discussion assumes a basic understanding of std::thread as documented here or here.
While using std::thread, eventually you will need to do something similar to example 1 (pass a parameter to a function by reference).
// Example 1
#include <iostream>
#include <thread>

void func(int &a)
{
    a++;
}

int main()
{
    int a = 0;
    std::thread t(func, a);
    t.join();
    std::cout << a << std::endl;
    return 0;
}
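In the non-threading case, such as example 2, we expect a value of 1 to be printed. But with threading, as in example 1, you might be surprised to see 0 printed instead. (Whether you actually see the 0 depends on your standard library: implementations that hand their stored copy to func as an lvalue print 0, while current ones pass it as an rvalue, which cannot bind to int &, and so reject example 1 at compile time. Either way, func never gets the a from main.)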
// Example 2
void func(int &a)
{
a++;
}
int main()
{
int a = 0;
func(a);
std::cout << a << std::endl;
return 0;
}
Most explanations will just mention that you need to use std::ref, because in example 1 what is passed to func on the thread is a reference to a local copy of a.
But why is a local copy of a being used? And why does std::ref stop a local copy from being used? Finding these answers is the purpose of the following discussion.
std::thread Constructor Implementation
Fortunately the Microsoft implementation of the STL is available on GitHub, so it is possible to see exactly what is going on.
The std::thread constructor can be found here and part of it is shown in sample 1.
// Sample 1
template <class _Fn, class... _Args, enable_if_t<!is_same_v<_Remove_cvref_t<_Fn>, thread>, int> = 0>
_NODISCARD_CTOR explicit thread(_Fn&& _Fx, _Args&&... _Ax)
{
_Start(_STD forward<_Fn>(_Fx), _STD forward<_Args>(_Ax)...);
// snip...
}
Ignoring all the stuff going on with the template, there are a few things to look at with the constructor parameters _Fn&& _Fx and _Args&&... _Ax.
Rvalue Reference
What is &&? It is the declarator for an rvalue reference.
Deep diving on value categories (rvalues and lvalues) is out of scope but here are some good references on the subject.
The general idea is that in C++ we have the lvalue reference &, and the rvalue reference &&.
Looking at example 3, the literal 5 is an rvalue: it is an unnamed value that we cannot refer to again. The variable result, on the other hand, is an lvalue, because any named variable is an lvalue no matter how short its lifetime. The same goes for the parameters b and c inside func. The expression b + c, and the value that comes back from a call to func, are unnamed temporaries, so they are rvalues.
There is also a third flavor, the “expiring” value, which is totally tangential to this discussion, so we are not going to look at that any further.
// Example 3
int a = 5;

int func(int b, int c)
{
    int result = b + c;
    return result;
}
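One way to see value categories in action is to let overload resolution report them. The sketch below is mine, not from any library: category, make_five and value_category_demo are hypothetical names, and the asserts state what I would expect a conforming compiler to choose.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical helpers: overload resolution picks one of these based on
// the value category of the argument expression.
std::string category(int&)  { return "lvalue"; }
std::string category(int&&) { return "rvalue"; }

int make_five() { return 5; }

void value_category_demo()
{
    int x = 5;
    assert(category(x) == "lvalue");             // a named variable
    assert(category(5) == "rvalue");             // a literal
    assert(category(make_five()) == "rvalue");   // a temporary returned by value
    assert(category(std::move(x)) == "rvalue");  // an "expiring" value
}
```

Calling value_category_demo() from main should run without tripping any assert.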
We experience the difference in lvalues and rvalues when we have a function such as the one in example 4, and try to use it in the ways shown in example 5.
// Example 4
void func(int &a)
{
// do stuff
}
// Example 5
int sum(int a, int b)
{
return a + b;
}
// usage 1
func(10);
// usage 2
func(sum(10, 5));
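Both usage 1 and usage 2 are attempting to pass rvalues to a function that takes a non-const lvalue reference, and neither compiles, because such a reference cannot bind to an rvalue.
Technically the usages shown in example 5 would work if func was defined as in example 6, since a const lvalue reference is allowed to bind to an rvalue.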
// Example 6
void func(const int &a)
{
// do stuff
}
BUT, what if func could take rvalue references? In fact it can by using && (example 7). Then usage 1 and 2 as shown in example 8 work, but usage 3 does not 😞, because c is an lvalue and an rvalue reference will not bind to it.
// Example 7
void func(int &&a)
{
// do stuff
}
// Example 8
int sum(int a, int b)
{
return a + b;
}
// usage 1
func(10);
// usage 2
func(sum(10, 5));
// usage 3
int c = 10;
func(c);
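The binding rules from examples 4 through 8 can be summarized as a set of compile-time checks. This is only a sketch, assuming a C++17 compiler for std::is_invocable_v; the three function names and the aliases are hypothetical stand-ins for the different versions of func.

```cpp
#include <type_traits>

// Hypothetical stand-ins for the versions of func in examples 4, 6 and 7.
void takes_lvalue_ref(int&) {}
void takes_const_ref(const int&) {}
void takes_rvalue_ref(int&&) {}

using LRef = decltype(&takes_lvalue_ref);
using CRef = decltype(&takes_const_ref);
using RRef = decltype(&takes_rvalue_ref);

// In std::is_invocable_v<F, Arg>, an Arg of `int` models an rvalue argument
// (10 or sum(10, 5)) and an Arg of `int&` models a named variable such as c.
static_assert(!std::is_invocable_v<LRef, int>,  "int& rejects rvalues");
static_assert( std::is_invocable_v<LRef, int&>, "int& accepts lvalues");
static_assert( std::is_invocable_v<CRef, int>,  "const int& accepts rvalues");
static_assert( std::is_invocable_v<CRef, int&>, "const int& accepts lvalues");
static_assert( std::is_invocable_v<RRef, int>,  "int&& accepts rvalues");
static_assert(!std::is_invocable_v<RRef, int&>, "int&& rejects lvalues");
```

If any of these assumptions were wrong, the file would simply fail to compile.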
Ok so where is this going? Well, we know that the std::thread constructor takes a callable object such as a function, and there are no limitations on that function such as, “must take lvalues only” or anything like that. We also don’t really have to use wonky syntax. But we just saw some examples showing that there are complications between defining a function that takes lvalue references versus one that takes rvalue references. How does std::thread deal with this? Enter Type-Deducing Context.
Type-Deducing Context
This is the part where we can no longer ignore the template part (properly, the template parameter list) of the std::thread constructor.
template <class _Fn, class... _Args, enable_if_t<!is_same_v<_Remove_cvref_t<_Fn>, thread>, int> = 0>
_NODISCARD_CTOR explicit thread(_Fn&& _Fx, _Args&&... _Ax)
// snip
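Because _Fn and _Args are template parameters deduced from the arguments of the call, this is what is called a “type-deducing context”, and so _Fn&& and _Args&& take on special meaning: they are forwarding (or “universal”) references rather than plain rvalue references.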
Perfect Forwarding and Universal References in C++
In a type-deducing context, T&& acquires a special meaning. When func is instantiated, T depends on whether the argument passed to func is an lvalue or an rvalue.
If we re-wrote our function func as in example 9, now it works for all 3 usages we’ve looked at.
// Example 9
template<typename T>
void func(T&& a)
{
// do stuff
}
int sum(int a, int b)
{
return a + b;
}
// (1)
func(10);
// (2)
func(sum(10, 5));
// (3)
int c = 10;
func(c);
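To make the deduction visible, the sketch below reports what T was deduced as for each of the three usages. deduced_as_lvalue_ref and deduction_demo are hypothetical names of my own; the comments describe the deduction I would expect from the reference collapsing rules.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical helper: reports whether T was deduced as an lvalue reference.
template <typename T>
bool deduced_as_lvalue_ref(T&&)
{
    return std::is_lvalue_reference<T>::value;
}

int sum(int a, int b)
{
    return a + b;
}

void deduction_demo()
{
    int c = 10;
    assert(!deduced_as_lvalue_ref(10));          // T = int,  parameter is int&&
    assert(!deduced_as_lvalue_ref(sum(10, 5)));  // T = int,  parameter is int&&
    assert(deduced_as_lvalue_ref(c));            // T = int&, parameter collapses to int&
}
```

So the same template accepts all three usages, but the deduced T remembers whether it was given an lvalue or an rvalue.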
One other aspect of the template that we need to look at is the use of an ellipsis ..., which indicates that this is a variadic template.
Taking a variable number of arguments; especially, taking arbitrarily many arguments
The variadic-ness of std::thread is pretty clear. It accounts for why we don’t specifically have to tell std::thread how many arguments are being passed to use with the callable object, it just works.
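A small sketch of that “it just works” behavior, using the same parameter shape as the std::thread constructor. The names count_extra_args, call_with, add3 and answer are hypothetical and only there for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Same parameter shape as the std::thread constructor: one callable
// followed by any number of arguments.
template <class Fn, class... Args>
std::size_t count_extra_args(Fn&&, Args&&...)
{
    return sizeof...(Args);  // number of types in the parameter pack
}

// Expand the pack to call fn with however many arguments were supplied.
template <class Fn, class... Args>
decltype(auto) call_with(Fn&& fn, Args&&... args)
{
    return std::forward<Fn>(fn)(std::forward<Args>(args)...);
}

int add3(int a, int b, int c) { return a + b + c; }
int answer() { return 42; }

void variadic_demo()
{
    assert(count_extra_args(answer) == 0);
    assert(count_extra_args(add3, 1, 2, 3) == 3);
    assert(call_with(answer) == 42);
    assert(call_with(add3, 1, 2, 3) == 6);
}
```

The caller never states the argument count; the compiler works it out from the call.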
Variadic Templates
The constructor of std::thread being a variadic template is an important piece of the puzzle, as described in The C++ Programming Language, 4th ed.
The thread constructors are variadic templates. This implies that to pass a reference to a thread constructor, we must use a reference wrapper…
What is the relationship between a variadic template and needing to use a reference wrapper?
Again from The C++ Programming Language 4th ed.
The problem is that the variadic template uses bind() or some equivalent mechanism, so that a reference is by default dereferenced and the result copied.
Ah ha, now we are getting somewhere. But we’re not going to look at std::bind yet. We’re going to look for “some equivalent mechanism” in the std::thread constructor.
Some Equivalent Mechanism
To find what we’re looking for, we have to dig into the implementation of _Start from sample 1.
The part of _Start that we’re interested in is shown in sample 2.
// Sample 2
template <class _Fn, class... _Args>
void _Start(_Fn&& _Fx, _Args&&... _Ax)
{
// line (1)
using _Tuple = tuple<decay_t<_Fn>, decay_t<_Args>...>;
// line (2)
auto _Decay_copied = _STD make_unique<_Tuple>(_STD forward<_Fn>(_Fx),
_STD forward<_Args>(_Ax)...);
// line (3)
constexpr auto _Invoker_proc =
_Get_invoke<_Tuple>(make_index_sequence<1 +sizeof...(_Args)>{});
// snip ...
}
There are 2 key pieces here to understand, std::decay and std::make_unique.
std::decay
Applies lvalue-to-rvalue, array-to-pointer, and function-to-pointer implicit conversions to the type T, removes cv-qualifiers, and defines the resulting type as the member typedef type.
Or a bit more useful description from Microsoft docs
Produces the type as passed by value. Makes the type non-reference, non-const, non-volatile, or makes a pointer to the type from a function or an array type.
In sample 3, the implementation is defining _Tuple with the type decay of _Fn and _Args, which in the case of _Args means the type with any reference removed. That is, if one of the _Args is a reference to an int, int&, its type decay is int.
// Sample 3
using _Tuple = tuple<decay_t<_Fn>, decay_t<_Args>...>;
As an experiment I ran the code in example 10 to construct a tuple with and without type decay.
// Example 10
#include <iostream>
#include <tuple>
void add_things(int &a)
{
// do stuff
}
template <class _Ty>
using decay_t = typename std::decay<_Ty>::type;
template <typename _Fn, typename _Arg>
void do_stuff_wo_decay(_Fn &&fn, _Arg &&arg)
{
std::tuple<_Fn, _Arg> foo(fn, arg);
}
template <typename _Fn, typename _Arg>
void do_stuff_w_decay(_Fn &&fn, _Arg &&arg)
{
std::tuple<decay_t<_Fn>, decay_t<_Arg>> foo(fn, arg);
}
int main()
{
int b = 19;
do_stuff_w_decay(add_things, b);
do_stuff_wo_decay(add_things, b);
return 0;
}
Figure 1 shows a tuple named foo created with “as is” types for _Fn and _Arg. The function ends up as a function pointer, and the arg is int&, so nothing new or exciting.
figure 1
Figure 2 shows the tuple foo created with type decay for _Fn and _Arg. The function is still a function pointer, but take note that the arg is now just int: we have lost our reference.
figure 2
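Just to further satiate my curiosity around std::decay, I took a look at the implementation, shown in sample 4. Its first step is to use std::remove_reference on the type; the rest of the definition, snipped here, has to cover the array, function, and cv-qualifier cases from the description above.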
If the type is a reference type, provides the member typedef type which is the type referred to by T. Otherwise type is T.
// Sample 4
template <class _Ty>
struct decay
{
// determines decayed version of _Ty
using _Ty1 = remove_reference_t<_Ty>;
// snip...
};
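The effect of std::decay on a few representative types can be checked at compile time. This is a minimal sketch of my own that uses only standard <type_traits> facilities (C++14 for std::decay_t); it is not taken from the MSVC source.

```cpp
#include <type_traits>

// References are stripped, which is the case that matters for std::thread.
static_assert(std::is_same<std::decay_t<int&>, int>::value, "int& decays to int");
static_assert(std::is_same<std::decay_t<int&&>, int>::value, "int&& decays to int");

// Top-level const is removed as well.
static_assert(std::is_same<std::decay_t<const int&>, int>::value, "const int& decays to int");

// Arrays and functions turn into pointers.
static_assert(std::is_same<std::decay_t<int[3]>, int*>::value, "int[3] decays to int*");
static_assert(std::is_same<std::decay_t<void(int&)>, void (*)(int&)>::value,
              "a function type decays to a function pointer");
static_assert(std::is_same<std::decay_t<void (&)(int&)>, void (*)(int&)>::value,
              "a function reference decays to a function pointer");

// std::remove_reference on its own leaves the const in place.
static_assert(std::is_same<std::remove_reference_t<const int&>, const int>::value,
              "remove_reference keeps const");
```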
std::make_unique
Creates and returns a unique_ptr to an object of the specified type, which is constructed by using the specified arguments.
Continuing to experiment, I set up example 11, which imitates lines 1 and 2 in sample 2.
Note: I am intentionally not discussing std::forward because that’s a different topic.
// Example 11
#include <memory>
#include <utility>
#include <tuple>
#include <iostream>
void add_things(int &a)
{
// do stuff
}
template <class _Ty>
using decay_t = typename std::decay<_Ty>::type;
template<typename _Fn, typename _Arg>
void do_stuff(_Fn &&fn, _Arg &&arg)
{
using _Tuple = std::tuple<decay_t<_Fn>, decay_t<_Arg>>;
auto thing = std::make_unique<_Tuple>(std::forward<_Fn>(fn), std::forward<_Arg>(arg));
}
int main()
{
int b = 19;
do_stuff(add_things, b);
return 0;
}
The type of thing is shown in figure 3. As expected, we have a pointer to a newly constructed instance of our tuple, consisting of the type decay of the arguments fn and arg.
figure 3.
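The same experiment can be pushed one step further to show that the tuple holds a copy. The sketch below is a variation on example 11 that returns the tuple instead of discarding it; decay_copy_args and decay_copy_demo are hypothetical names, and add_things is given a body here so there is something to observe.

```cpp
#include <cassert>
#include <memory>
#include <tuple>
#include <type_traits>
#include <utility>

// Hypothetical helper mirroring lines (1) and (2) of sample 2,
// but returning the heap-allocated tuple so it can be inspected.
template <typename Fn, typename Arg>
auto decay_copy_args(Fn&& fn, Arg&& arg)
{
    using Tuple = std::tuple<std::decay_t<Fn>, std::decay_t<Arg>>;
    return std::make_unique<Tuple>(std::forward<Fn>(fn), std::forward<Arg>(arg));
}

void add_things(int& a) { ++a; }

void decay_copy_demo()
{
    int b = 19;
    auto stored = decay_copy_args(add_things, b);

    // The second tuple element is a separate int living on the heap.
    assert(&std::get<1>(*stored) != &b);

    // Calling the stored function on the stored argument changes only the copy.
    std::get<0>(*stored)(std::get<1>(*stored));
    assert(std::get<1>(*stored) == 20);
    assert(b == 19);
}
```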
std::bind
Binds arguments to a callable object
The book mentioned bind() or “some equivalent mechanism”.
For the curious, the Microsoft implementation of std::bind can be seen here and it is indeed similar to what we have just seen in std::thread, making use of std::decay and std::tuple.
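The copying behavior is easy to observe with std::bind itself. A minimal sketch, where increment and bind_demo are hypothetical names: std::bind stores a decayed copy of each argument and passes that stored copy to the callable as an lvalue, so the first call compiles but only touches the copy.

```cpp
#include <cassert>
#include <functional>

void increment(int& a) { ++a; }

void bind_demo()
{
    int a = 0;

    // bind stores its own copy of a; the call increments that copy.
    auto bound_copy = std::bind(increment, a);
    bound_copy();
    assert(a == 0);

    // With std::ref, bind stores a reference_wrapper that refers to a.
    auto bound_ref = std::bind(increment, std::ref(a));
    bound_ref();
    assert(a == 1);
}
```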
Conclusion and Trivia
Coming back to the original questions, why doesn’t this code print out 1?
// Example 12
void func(int &a)
{
a++;
}
int main()
{
int a = 0;
std::thread t(func, a);
t.join();
std::cout << a << std::endl;
return 0;
}
Looking at example 12, now we understand that a is passed as an lvalue, so its entry in _Args is deduced as int &, which decays to plain int. The implementation ends up with its own copy of a, stored in the tuple created through make_unique, and we can then infer that func is called with that copy rather than with the a in main (the exact mechanism of how func goes on to be invoked is a whole other thing).
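As a rough check of that last step, here is a compile-time sketch, assuming C++17 for std::is_invocable_v. The aliases StoredFn and StoredArg are my own names for what the tuple would hold, not identifiers from the MSVC source. The C++ standard has std::thread pass its decayed copies on to the callable as rvalues, and an rvalue int cannot bind to int &. That is why current standard libraries refuse to compile example 12, while a std::bind-style call that passes the stored copy as an lvalue is accepted.

```cpp
#include <functional>
#include <type_traits>

void func(int& a) { a++; }

// What the tuple would hold for std::thread t(func, a) with an int a:
using StoredFn  = std::decay_t<decltype(func)>;  // void (*)(int&)
using StoredArg = std::decay_t<int&>;            // int

// Passing the stored int as an rvalue: int& cannot bind to it.
static_assert(!std::is_invocable_v<StoredFn, StoredArg>,
              "an rvalue int cannot be passed to func");

// Passing the stored int as an lvalue (bind-style): accepted, but it is the copy.
static_assert(std::is_invocable_v<StoredFn, StoredArg&>,
              "an lvalue int can be passed to func");

// Passing a stored std::reference_wrapper<int>: converts to int&.
static_assert(std::is_invocable_v<StoredFn, std::reference_wrapper<int>>,
              "a reference_wrapper<int> can be passed to func");
```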
std::reference_wrapper
Is there a way to get a reference to a through the type decay intact, instead of ending up with a copy? Yes indeed, that is what std::reference_wrapper is for.
std::reference_wrapper is a class template that wraps a reference in a copyable, assignable object….
Cool, so what does std::decay (and the std::remove_reference inside it) do to a std::reference_wrapper?
I tested this by setting up example 13, and what seems to happen is that std::reference_wrapper is passed through. Examining the types of what_type and another_type I found that the type of what_type is std::reference_wrapper<int> while the type of another_type is int (expected).
// Example 13
#include <functional>
#include <type_traits>

int main()
{
    int a = 10;
    int &b = a;
    auto x = std::ref(a);
    using C = typename std::remove_reference<std::reference_wrapper<int>>::type;
    using D = typename std::remove_reference<int &>::type;
    auto what_type = (C)x;    // std::reference_wrapper<int>
    auto another_type = (D)b; // int
}
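The detail that makes this work is that std::reference_wrapper<int> is an ordinary class type, not a reference type, so decay/remove_reference has nothing to strip from it. The wrapper object itself still gets copied, but the copy refers to the same int, and it converts implicitly to int & when it is passed to func. Long story short, std::reference_wrapper allows your lvalue reference to make it through decay/remove_reference without turning into a copy.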
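Here is a small sketch of that pass-through, using only standard <functional> and <type_traits> facilities; bump and reference_wrapper_demo are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <type_traits>

// A reference_wrapper is a class type, so decay leaves it unchanged,
// whereas a plain reference is stripped.
static_assert(std::is_same<std::decay_t<std::reference_wrapper<int>>,
                           std::reference_wrapper<int>>::value,
              "reference_wrapper<int> survives decay");
static_assert(std::is_same<std::decay_t<int&>, int>::value,
              "int& does not survive decay");

void bump(int& a) { ++a; }

void reference_wrapper_demo()
{
    int a = 0;
    std::reference_wrapper<int> original = std::ref(a);
    std::reference_wrapper<int> copy = original;  // copies the wrapper, not the int

    assert(&copy.get() == &a);  // the copy still refers to a

    bump(copy);                 // implicit conversion to int&
    assert(a == 1);
}
```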
Now to answer “… why does std::ref stop a local copy from being used?”
std::ref(a) returns a std::reference_wrapper for a. Thus if we modify example 12 as shown in example 14, we now get the expected result of 1.
// Example 14
#include <functional>
#include <iostream>
#include <thread>

void func(int &a)
{
    a++;
}

int main()
{
    int a = 0;
    std::thread t(func, std::ref(a));
    t.join();
    std::cout << a << std::endl;
    return 0;
}
One Last Experiment
To see if I followed all of that and if I could implement something similar-ish to the std::thread constructor, I made example 15. The expected result is 4, and indeed that is what gets printed out. But if we drop the std::ref, 1 is printed.
// Example 15
#include <functional>
#include <iostream>
#include <memory>
#include <type_traits>
#include <utility>

void add_a_b(int &a, int &b)
{
    a += b;
}

template<typename _Fn, typename A>
void do_things(_Fn&& _Fx, A&& a, A&& b)
{
    using _TT = typename std::decay<A>::type;
    auto decay_a = std::make_unique<_TT>(std::forward<A>(a));
    auto decay_b = std::make_unique<_TT>(std::forward<A>(b));
    _Fx(*decay_a, *decay_b);
}

int main()
{
    int a = 1;
    int b = 3;
    do_things(add_a_b, std::ref(a), std::ref(b));
    std::cout << a << std::endl;
}
Lambdas
A lambda expression is a kind of rvalue (a prvalue, to be exact):
The lambda expression is a prvalue expression of unique unnamed non-union non-aggregate class type, known as closure type.
So the “special” type-deducing context handling of && is how the thread constructor can work with lambdas as well as functions and functors.
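A quick sketch of one forwarding-reference parameter accepting every kind of callable mentioned here. call_with_two, twice, Tripler and callable_demo are hypothetical names; the comments give the deduction I would expect for Fn in each call.

```cpp
#include <cassert>
#include <utility>

// One forwarding-reference parameter accepts every kind of callable.
template <class Fn>
int call_with_two(Fn&& fn)
{
    return std::forward<Fn>(fn)(2);
}

int twice(int x) { return 2 * x; }

struct Tripler
{
    int operator()(int x) const { return 3 * x; }
};

void callable_demo()
{
    int offset = 10;
    auto named = [offset](int x) { return x + offset; };

    int from_function  = call_with_two(twice);      // Fn = int (&)(int)
    int from_functor   = call_with_two(Tripler{});  // Fn = Tripler (rvalue)
    int from_named     = call_with_two(named);      // Fn = closure type& (lvalue)
    int from_temporary = call_with_two([](int x) { return x * x; });  // Fn = closure type (prvalue)

    assert(from_function == 4);
    assert(from_functor == 6);
    assert(from_named == 12);
    assert(from_temporary == 4);
}
```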
But Where Does The Thread Come From?
Threads are an operating system feature, so somewhere in all of this a Win32-specific call was made.
That call can be found here, and is shown below. Specifically, it is _beginthreadex.
_Thr._Hnd = reinterpret_cast<void*>(_CSTD _beginthreadex(nullptr, 0, _Invoker_proc, _Decay_copied.get(), 0, &_Thr._Id));
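To tie the pieces together, here is a rough, portable imitation of the hand-off, assuming C++17 for std::invoke. It is not the MSVC code: invoke_from_raw and start_inline are hypothetical names, and instead of creating a thread the sketch calls the entry point directly on the current thread. The point is only that an OS thread API passes a single void* to the new thread, so the decayed tuple travels as a raw pointer and is unpacked on the other side.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <tuple>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for the thread entry point, which receives one void*.
template <class Tuple, std::size_t... I>
void invoke_from_raw(void* raw, std::index_sequence<I...>)
{
    // Take ownership back so the tuple is freed when the call finishes.
    std::unique_ptr<Tuple> args(static_cast<Tuple*>(raw));
    // Pass the decayed copies on as rvalues: element 0 is the callable.
    std::invoke(std::move(std::get<I>(*args))...);
}

// Hypothetical stand-in for _Start, minus the actual thread creation.
template <class Fn, class... Args>
void start_inline(Fn&& fn, Args&&... ax)
{
    using Tuple = std::tuple<std::decay_t<Fn>, std::decay_t<Args>...>;
    auto decay_copied =
        std::make_unique<Tuple>(std::forward<Fn>(fn), std::forward<Args>(ax)...);
    // A real implementation would hand this pointer to the OS thread API.
    invoke_from_raw<Tuple>(decay_copied.release(),
                           std::index_sequence_for<Fn, Args...>{});
}

void add_to(int& total, int amount) { total += amount; }

void start_demo()
{
    int total = 1;
    start_inline(add_to, std::ref(total), 3);
    assert(total == 4);
}
```

Without std::ref on total, the call would not compile in this sketch, for the same reason as the plain int case discussed earlier: the stored int is passed as an rvalue.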
Thanks for reading! Feel free to send questions, comments, concerns to me on Twitter