Deep(ish) Dive into Microsoft's std::thread Constructor Implementation

Introduction

As of C++11 the STL defines a thread class. Most documentation and examples I’ve found cover usage pretty well, but leave deeper understanding as an exercise to the reader. So that’s the goal of this discussion.

The following discussion assumes a basic understanding of std::thread as documented here or here.

While using std::thread, eventually you will need to do something similar to example 1 (pass a parameter to a function by reference).

// Example 1

#include <iostream>
#include <thread>

void func(int &a)
{
	a++;
}

int main()
{
	int a = 0;
	std::thread t(func, a);
	
	t.join();
	
	std::cout << a << std::endl;
	
	return 0;
}

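In the non-threading case of example 2 we expect a value of 1 to be printed. In the threaded case of example 1, though, you might be surprised: some older implementations compile it and print 0, and a conforming standard library (including the current Microsoft STL) refuses to compile it at all. Either way, func never increments our a.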

// Example 2

#include <iostream>

void func(int &a)
{
	a++;
}

int main()
{
	int a = 0;
	
	func(a);
	
	std::cout << a << std::endl;
	
	return 0;
}

Most explanations simply say that you need to use std::ref, because in example 1 func receives a reference to a local copy of a rather than a reference to a itself.

But why is a local copy of a being used? And why does std::ref stop a local copy from being used? Finding these answers is the purpose of the following discussion.

std::thread Constructor Implementation

Fortunately the Microsoft implementation of the STL is available on GitHub, so it is possible to see exactly what is going on.

The std::thread constructor can be found here and part of it is shown in sample 1.

// Sample 1

template <class _Fn, class... _Args, enable_if_t<!is_same_v<_Remove_cvref_t<_Fn>, thread>, int> = 0>
_NODISCARD_CTOR explicit thread(_Fn&& _Fx, _Args&&... _Ax)
 {
	_Start(_STD forward<_Fn>(_Fx), _STD forward<_Args>(_Ax)...);
	
	// snip...
 }

Ignoring all the stuff going on with the template, there are a few things to look at in the constructor parameters _Fn&& _Fx and _Args&&... _Ax.

Rvalue Reference

What is &&? It is the declarator for an rvalue reference.

Deep diving on value categories (rvalues and lvalues) is out of scope but here are some good references on the subject.

The general idea is that in C++ we have the lvalue reference &, and the rvalue reference &&.

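Looking at example 3, the literal 5 is an rvalue (more precisely, a prvalue) since it has no name. The expression b + c is also an rvalue: it produces a temporary value with no name. result, on the other hand, is an lvalue: even though it only lives until func returns, it is named.

b and c can seem tricky for the same reason. They are temporary to the scope of func, so in a way they feel like rvalues, but inside func they are named, so they are lvalues. C++ does have a separate “expiring value” (xvalue) category, for example the result of std::move(b), but that is totally tangential to this discussion, so I'm not going to look at it any further.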

// Example 3

int a = 5;

int func(int b, int c)
{
	int result = b + c;
	return result;
}

We run into the difference between lvalues and rvalues when we have a function such as the one in example 4 and try to use it in the ways shown in example 5.

// Example 4

void func(int &a)
{
	// do stuff
}

// Example 5

int sum(int a, int b)
{
	return a + b;
}

// usage 1: does not compile, 10 is an rvalue
func(10);

// usage 2: does not compile, sum(10, 5) returns an rvalue
func(sum(10, 5));

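Both usage 1 and usage 2 attempt to pass rvalues to a function that takes a non-const lvalue reference. A non-const lvalue reference cannot bind to an rvalue, so neither usage compiles.

Technically the usages shown in example 5 would compile if func were defined as in example 6, because a const lvalue reference can bind to an rvalue (at the cost of func no longer being able to modify its argument).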

// Example 6

void func(const int &a)
{
	// do stuff
}

BUT, what if func could take rvalue references? In fact it can by using && (example 7). Then usage 1 and 2 as shown in example 8 work, but usage 3 does not 😞.

// Example 7

void func(int &&a)
{
	// do stuff
}

// Example 8

int sum(int a, int b)
{
	return a + b;
}

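// usage 1: compiles, 10 is an rvalue
func(10);

// usage 2: compiles, sum(10, 5) returns an rvalue
func(sum(10, 5));

// usage 3: does not compile, c is an lvalue
int c = 10;
func(c);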

Ok, so where is this going? Well, we know that the std::thread constructor takes a callable object such as a function, and there are no restrictions on that function such as “must take lvalues only”. We also don’t have to use any wonky syntax. But we just saw some examples showing that there are complications between defining a function that takes lvalue references and one that takes rvalue references. How does std::thread deal with this? Enter Type-Deducing Context.

Type-Deducing Context

This is the part where we can no longer ignore the “template part” (the template parameter list) of the std::thread constructor.

template <class _Fn, class... _Args, enable_if_t<!is_same_v<_Remove_cvref_t<_Fn>, thread>, int> = 0>
_NODISCARD_CTOR explicit thread(_Fn&& _Fx, _Args&&... _Ax)

// snip

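Because _Fn and _Args are template parameters deduced from the arguments passed to the constructor, _Fn&& and _Args&& appear in a “type-deducing context”, and so they take on a special meaning: they are forwarding references (also called universal references) rather than plain rvalue references.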

Perfect Forwarding and Universal References in C++

In a type-deducing context, T&& acquires a special meaning. When func is instantiated, T depends on whether the argument passed to func is an lvalue or an rvalue.

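If we rewrite our function func as in example 9, it now works for all 3 usages we’ve looked at. For the lvalue c, T is deduced as int&, and reference collapsing turns the parameter type T&& into int&; for the rvalues, T is deduced as int and the parameter type is int&&.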

// Example 9

template<typename T>
void func(T&& a)
{
	// do stuff
}

int sum(int a, int b)
{
	return a + b;
}

// (1)
func(10);

// (2)
func(sum(10, 5));

// (3)
int c = 10;
func(c);

One other aspect of the template that we need to look at is the use of an ellipsis (...), which indicates that this is a variadic template.

Variadic

Taking a variable number of arguments; especially, taking arbitrarily many arguments

The variadic-ness of std::thread is pretty clear. It accounts for why we don’t have to tell std::thread how many arguments are being passed on to the callable object; it just works.

Variadic Templates

The constructor of std::thread being a variadic template is an important piece of the puzzle, as described in The C++ Programming Language, 4th ed.:

The thread constructors are variadic templates. This implies that to pass a reference to a thread constructor, we must use a reference wrapper…

What is the relationship between a variadic template and needing to use a reference wrapper?

Again from The C++ Programming Language, 4th ed.:

The problem is that the variadic template uses bind() or some equivalent mechanism, so that a reference is by default dereferenced and the result copied.

Ah ha, now we are getting somewhere. But we’re not going to look at std::bind yet. We’re going to look for “some equivalent mechanism” in the std::thread constructor.

Some Equivalent Mechanism

To find what we’re looking for, we have to dig into the implementation of _Start from sample 1.

The part of _Start that we’re interested in is shown in sample 2.

// Sample 2

template <class _Fn, class... _Args>
void _Start(_Fn&& _Fx, _Args&&... _Ax)
{
	// line (1)
	using _Tuple = tuple<decay_t<_Fn>, decay_t<_Args>...>;
	
	// line (2)
	auto _Decay_copied = _STD make_unique<_Tuple>(_STD forward<_Fn>(_Fx),
	                                              _STD forward<_Args>(_Ax)...);

	// line (3)
	constexpr auto _Invoker_proc = 
		_Get_invoke<_Tuple>(make_index_sequence<1 +sizeof...(_Args)>{});

	// snip ...
}

There are two key pieces here to understand: std::decay and std::make_unique.

std::decay

Applies lvalue-to-rvalue, array-to-pointer, and function-to-pointer implicit conversions to the type T, removes cv-qualifiers, and defines the resulting type as the member typedef type.

Or, a somewhat more useful description from the Microsoft docs:

Produces the type as passed by value. Makes the type non-reference, non-const, non-volatile, or makes a pointer to the type from a function or an array type.

In sample 3, the implementation defines _Tuple using the decayed types of _Fn and _Args, which in the case of _Args means the referenced type with the reference and any const/volatile qualifiers stripped. That is, if one of the _Args is a reference to an int, int&, its decayed type is int.

// Sample 3

using _Tuple = tuple<decay_t<_Fn>, decay_t<_Args>...>;

As an experiment I ran the code in example 10 to construct a tuple with and without type decay.

// Example 10

#include <iostream>
#include <tuple>
#include <type_traits>

void add_things(int &a)
{
    // do stuff
}

template <class _Ty>
using decay_t = typename std::decay<_Ty>::type;

template <typename _Fn, typename _Arg>
void do_stuff_wo_decay(_Fn &&fn, _Arg &&arg)
{
    std::tuple<_Fn, _Arg> foo(fn, arg);
}

template <typename _Fn, typename _Arg>
void do_stuff_w_decay(_Fn &&fn, _Arg &&arg)
{
    std::tuple<decay_t<_Fn>, decay_t<_Arg>> foo(fn, arg);
}

int main()
{
    int b = 19;

    do_stuff_w_decay(add_things, b);
    do_stuff_wo_decay(add_things, b);

    return 0;
}

Figure 1 shows a tuple named foo created with the “as is” types for _Fn and _Arg. The function shows up as a function pointer (strictly speaking, _Fn is deduced as the function reference type void (&)(int&) here), and the arg is int&, so nothing new or exciting.

figure 1

Figure 2 shows the tuple foo created with type decay applied to _Fn and _Arg. The function is a genuine function pointer, void (*)(int&), but take note that the arg is now just int: we have lost our reference.

figure 2

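Just to further satiate my curiosity around std::decay, I took a look at the implementation, part of which is shown in sample 4. Its first step is to apply std::remove_reference to the type; the snipped part then turns arrays and functions into pointers and strips const/volatile from everything else.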

std::remove_reference

If the type is a reference type, provides the member typedef type which is the type referred to by T. Otherwise type is T.

// Sample 4
template <class _Ty>
struct decay
{
	// determines decayed version of _Ty
	
	using _Ty1 = remove_reference_t<_Ty>;
	
	// snip...
};

std::make_unique

Creates and returns a unique_ptr to an object of the specified type, which is constructed by using the specified arguments.

Continuing to experiment, I set up example 11, which imitates lines (1) and (2) in sample 2.

Note: I am intentionally not discussing std::forward because that’s a different topic.

// Example 11

#include <memory>
#include <utility>
#include <tuple>
#include <type_traits>
#include <iostream>

void add_things(int &a)
{
    // do stuff
}

template <class _Ty>
using decay_t = typename std::decay<_Ty>::type;

template<typename _Fn, typename _Arg>
void do_stuff(_Fn &&fn, _Arg &&arg)
{
    using _Tuple = std::tuple<decay_t<_Fn>, decay_t<_Arg>>;

    auto thing = std::make_unique<_Tuple>(std::forward<_Fn>(fn), std::forward<_Arg>(arg));
}

int main()
{
    int b = 19;

    do_stuff(add_things, b);

    return 0;
}

The type of thing is shown in figure 3. As expected, we have a std::unique_ptr that owns a newly constructed tuple whose element types are the decayed types of the arguments fn and arg.

figure 3.

std::bind

Binds arguments to a callable object

The book mentioned bind() or “some equivalent mechanism”.

For the curious, the Microsoft implementation of std::bind can be seen here and it is indeed similar to what we have just seen in std::thread, making use of std::decay and std::tuple.

Conclusion and Trivia

Coming back to the original questions, why doesn’t this code print out 1?

// Example 12

#include <iostream>
#include <thread>

void func(int &a)
{
	a++;
}

int main()
{
	int a = 0;
	std::thread t(func, a);
	
	t.join();
	
	std::cout << a << std::endl;
	
	return 0;
}

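Looking at example 12, we now understand what happens: a is an lvalue, so _Args is deduced as int&, and decay_t<int&> is plain int. The implementation therefore ends up with its own int, a copy of a, inside the tuple created through make_unique, and func can only ever be handed that copy, never our a. The exact mechanism of how func goes on to be invoked is a whole other thing, but in the current Microsoft STL the stored copy is passed on with std::move, i.e. as an rvalue, which cannot bind to int&; that is why a conforming compiler rejects example 12 outright instead of printing 0.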

std::reference_wrapper

Is there a way to make the decayed type of the argument still refer to our a instead of being a copy of it? Yes indeed: that is what std::reference_wrapper is for.

std::reference_wrapper is a class template that wraps a reference in a copyable, assignable object….

Cool, so what does std::decay (i.e. its std::remove_reference step) do to a std::reference_wrapper?

I tested this by setting up example 13, and std::reference_wrapper passes through unchanged. Examining the types of what_type and another_type, I found that the type of what_type is std::reference_wrapper<int> while the type of another_type is int (as expected). That makes sense: std::reference_wrapper<int> is a class type, not a reference type, so std::remove_reference (and std::decay) have nothing to strip.

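// Example 13

#include <functional>
#include <type_traits>

int main()
{
    int a = 10;
    int *b = &a;
    auto x = std::ref(a); // x is a std::reference_wrapper<int>

    using C = typename std::remove_reference<std::reference_wrapper<int>>::type;
    using D = typename std::remove_reference<int &>::type;

    auto what_type = (C)x;     // std::reference_wrapper<int>
    auto another_type = (D)*b; // int; dereference b, since casting the pointer itself would convert the address

    // The same findings, checked by the compiler
    static_assert(std::is_same<decltype(what_type), std::reference_wrapper<int>>::value, "");
    static_assert(std::is_same<decltype(another_type), int>::value, "");
}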

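Long story short, std::reference_wrapper lets your reference make it through decay/remove_reference without being replaced by a copy of the int: the tuple stores a copy of the wrapper, and that copy still refers to the original a. When func is finally called, the wrapper converts implicitly to int& through its conversion operator, so func modifies the original a.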

Now to answer “… why does std::ref stop a local copy from being used?”

std::ref(a) returns a std::reference_wrapper<int> for a. Thus if we modify example 12 as shown in example 14, we now get the expected result of 1.

// Example 14

#include <functional>
#include <iostream>
#include <thread>

void func(int &a)
{
	a++;
}

int main()
{
	int a = 0;
	std::thread t(func, std::ref(a));
	
	t.join();
	
	std::cout << a << std::endl;
	
	return 0;
}

One Last Experiment

To see if I followed all of that and if I could implement something similar-ish to the std::thread constructor, I made example 15. The expected result is 4, and indeed that is what gets printed out. But if we drop the std::ref, 1 is printed.

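// Example 15

#include <functional>
#include <iostream>
#include <memory>
#include <type_traits>
#include <utility>

void add_a_b(int &a, int &b)
{
    a += b;
}

template<typename _Fn, typename A>
void do_things(_Fn&& _Fx, A&& a, A&& b)
{
    using _TT = typename std::decay<A>::type;

    // Like std::thread: store decayed copies (int, or std::reference_wrapper<int>)
    auto decay_a = std::make_unique<_TT>(std::forward<A>(a));
    auto decay_b = std::make_unique<_TT>(std::forward<A>(b));

    // *decay_a is either a copied int or a wrapper that converts to int&
    _Fx(*decay_a, *decay_b);
}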

int main()
{
    int a = 1;
    int b = 3;
    do_things(add_a_b, std::ref(a), std::ref(b));

    std::cout << a << std::endl;
}

Lambdas

A lambda expression is an rvalue (more precisely, a prvalue):

The lambda expression is a prvalue expression of unique unnamed non-union non-aggregate class type, known as closure type.

So the “special” type-deducing context handling of && is how the thread constructor can work with lambdas (rvalues) as well as with named functions and functors (typically lvalues): a forwarding reference binds to both.

But Where Does The Thread Come From?

Threads are an operating system feature, so somewhere in all of this a Windows-specific call has to be made.

That call can be found here, and is shown below. Specifically, it is _beginthreadex, the C runtime's wrapper around the Win32 CreateThread function:

    _Thr._Hnd = reinterpret_cast<void*>(_CSTD _beginthreadex(nullptr, 0, _Invoker_proc, _Decay_copied.get(), 0, &_Thr._Id));

Thanks for reading! Feel free to send questions, comments, concerns to me on Twitter
