Qt Signal Slot Thread Safety

broken image


For any C++ developer who's used Qt, we've grown to love the Signals/Slots idiom it presents for creating clean Observer code. However, it relied on the Qt Moc pre-compiler tool, which meant any project that wanted to use this feature had to use follow along with the Qt idiom, which really made Qt applications look potentially foreign despite being written in C++. In addition to this the implementation wasn't type-safe (a.k.a. no compile-time type checking), and doesn't generalize to any callable target (you have to extend QtObject and declare a slot using Qt's syntax, can only return void).

As mentioned, each program has one thread when it is started. This thread is called the 'main thread' (also known as the 'GUI thread' in Qt applications). The Qt GUI must run in this thread. All widgets and several related classes, for example QPixmap, don't work in secondary threads. On the other hand, we might have to use synchronisation primitives like mutexes or wait conditions to model the flow of data in a realistic producer/consumer scenarios (like when Qt uses QMetaCallEvent for signal/slot connections crossing threads). This will shadow the impact the locks in the allocator has. This might sound somewhat uninteresting at first, but it means you can have your own signals and slots outside the main thread. The Trolls created a new way to connect signals to slots such that signals can actually cross thread boundaries. I can now emit a signal in one thread and receive it in a slot in a different thread.

Since then, there have been multiple implementations of Signals/Slots. Some notable ones are listed below:

  • Boost Signals. Not thread safe, performance wasn't great, now deprecated in favor of Boost Signals2. Licensed under the Boost Liscense.
  • Boost Signals2. Thread safe upgrade of Boost Signals. Others have complained about its performance, but my tests seem to show it's at least decent. Licensed under the Boost Liscense.
  • Libsigc++. Supposedly decently quick, but not thread safe. I think this is also a near-fully featured implementation like Boost Signals/Signals2. Also licensed under LGPL, making use somewhat restricted.
  • libtscb. A thread safe fairly quick quick implementation. However, it skimps on features (I think it offers similar features to Qt's implementation). Also licensed under LGPL.
  • C++11-based Implementation. This is actually another blog which sought to implement Signals/Slots using new features brought by C++11, namely variadic templates, std::function, and possibly more. This is CC0 licensed (public domain). Probably one of the fastest implementations I have seen, but is not thread-safe.

I was wondering if I could implement a more feature-full implementation like Boost Signals2 bet with better performance like libtscb. I'll be using some C++11 features like the last implementation, notably atomics, std::function, and variadic templates. I'm also using Boost shared_ptr/weak_ptr, Boost mutex, and Boost shared_ptr atomics. These libraries are being used because currently the compiler I'm using doesn't have the standard variants implemented (Mingw w64 gcc-4.8.1).

For the most part, I was able to implement (or actually, can see a straight-forward implementation) nearly all of the features provided by Boost Signals2. There are some semantic changes (notably the interface for combiners is different), and I was unable to capture the O(log(n)) performance of inserting group-ordered slots. My implementation likely will have O(n) group insertion.

Some basic observations I've noticed about how Signals/Slots are usually used:

  • Connecting/disconnecting slots is usually not that time critical.
  • Most signals either have no slots, or very few slots connected.
  • Signals may be invoked from multiple threads, and usually can be evaluated asynchronously. It seems like slots usually can be evaluated asynchronously, but they could be strongly ordered, too. In any case, only forward iteration is required for emitting a signal.
  • Emitting should be as fast as possible.

Thinking about how to best reach these goals, I decided to use an implementation which guarantees a wait-free singly linked list for read operations. The back-end implementation is still a doubly linked list, and only one writer is allowed at a time.

Memory management is taken care of using shared_ptr/weak_ptr. I wanted to see if I could implement using only standard C++11, and theoretically I could have, but unfortunately I don't have access to a compiler with all the necessary features. Fortunately, I'm only using the Boost equivalents in a 'standards-compliant' manner, so as time goes on changing these to a pure C++11 implementation is a find/replace operation.

Qt signal slot performance

basic slot class structure

I'm utilizing a trick for transforming multiple inheritance into single inheritance. There's not really much good reason for doing so other than code bloat issues with Visual Studio. Currently my implementation uses standard C++11 features not supported by any release of Visual Studio (2012 or older), but in the future I want to work towards C++03 compatibility using various Boost libraries (since I'm already using a few for this purpose).

This is not the full code, but a near-bare bones implementation which at least shows the usage and some implementation details.

Because I am limiting myself to single inheritance chains only, I have to carefully consider what order classes get inherited (especially for classes which could be instantiated). Eventually, I ended up with the following:

  • slot_base -> slot_wrapper
  • slot -> callable -> slot_base -> slot_wrapper
  • grouped_slot -> slot -> groupable -> callable -> slot_base -> slot_wrapper
  • extended_slot -> callable -> slot_base -> slot_wrapper
  • grouped_extended_slot -> extended_slot -> groupable -> callable -> slot_base -> slot_wrapper

Basic Signal Structure

Qt Connect Signal Slot

Update: While I was implementing this I had assumed that atomic shared_ptr was lock-free. After looking into the current implementations, they currently use spinlocks on each shared_ptr. In other words, not lock-free. However, in theory my implementation could take advantage of true lock-free atomic shared_ptr's in the future.

The basic structure of the Signal class is a doubly linked list, but there internally the linked list ensures that atomically there is always a valid singly linked list from any node to the tail. I am unaware of any generic lock-free doubly linked list implementation which has all of the features I need, so the trade off I'm making is only one writer is allowed at a time. Memory management is handled entirely by shared_ptr's internal reference counting mechanisms. There might be some more efficient atomic reference counting method I could implement, but for now shared_ptr is fast enough.

In order to make signal emission lock-free, I decided to use head and tail nodes. I also place a shared_ptr to a given node which is where grouped slot insertion can begin searching from, but this only points to an existing node. I don't know if I can implement some sort of interleaved binary tree/singly linked list to allow better grouped slot insertion, but for now I'm going to assume that the somewhat crummy group insertion performance is acceptable since signal emission is the primary optimization goal.

Basic Algorithm for List Write Operations

  1. Do non-list dependant operations (mainly allocating memory for nodes and build connection object)
  2. Acquire unique write lock.
  3. Perform all necessary list modification operations. Operations which will change the implicit singly linked list must be atomic, otherwise they do not.
  4. Release lock.

Here's an example implementation for push_front (connect new slot at front of list):

Casino Party Planning - This guide will help you to plan a successful Casino Theme Party in ten steps. Download a free Casino Party Planner. Step 1 - Establish a Budget. Casino Party Budgets range from $25 to $100 per person, depending on location, food and beverage, and decor. Food will often comprise 40% of the budget, with 40% allocated to. Planning a casino themed party. A casino party is a great theme for a guy's birthday party. Whether you decide on an all-out casino theme party or a game night, your party will sure to be a big hit! Invitations: Punchbowl has this terrific free casino-themed invitations that you can send to all of your guests to get them excited for your casino theme party.

Modified Iterator

One problem I encountered while working on the combiner call implementation was how to efficiently pass iterators to the combiner. The solution I came up with means the iterator object encapsulated all state internally, in particular if it is the end node. This means that no separate end iterator is passed. I think the problem could be resolved using a similar single inheritance chain like I did for the slots classes, though I haven't tried this out yet.

I am using a std::tuple to store parameters for later execution, and unpacking them using the technique Johannes Schaub presented here.

Signal Emit Algorithm

Implementing the signal emit operation is fairly simple, the only thing to keep in mind is that operations which try to move to the next node should be atomic. I've omitted a sample Combiner implementation because there's not much different about it (other than keeping in mind an iterator currently encapsulates the end state).

Qt Signal Slot Thread Safety Pins

For my testing, I'm going to benchmark Qt Signals/Slots, Boost Signals2, and this implementation. The slot being benchmarked:

The hardware running the test is an Intel i5-m430 laptop running Windows 7 x64. Rivers casino jobs chicago. For Qt test I'm using VS2012 x64, and for the other tests I'm using Mingw-w64 gcc 4.8.1. Basically, I didn't want to re-compile Qt with mingw since I already had it built for VS2012. For the record I'm testing with *Qt5 and Boost *1.54.0.

Note*: well, actually I'm using trunk repository builds, but I'm pretty sure there's no difference between the two (it would be silly for Qt to significantly muck around with the Signals/Slots implementation, and I haven't observed any changes to Boost Signals2).
Slot
Thread

basic slot class structure

I'm utilizing a trick for transforming multiple inheritance into single inheritance. There's not really much good reason for doing so other than code bloat issues with Visual Studio. Currently my implementation uses standard C++11 features not supported by any release of Visual Studio (2012 or older), but in the future I want to work towards C++03 compatibility using various Boost libraries (since I'm already using a few for this purpose).

This is not the full code, but a near-bare bones implementation which at least shows the usage and some implementation details.

Because I am limiting myself to single inheritance chains only, I have to carefully consider what order classes get inherited (especially for classes which could be instantiated). Eventually, I ended up with the following:

  • slot_base -> slot_wrapper
  • slot -> callable -> slot_base -> slot_wrapper
  • grouped_slot -> slot -> groupable -> callable -> slot_base -> slot_wrapper
  • extended_slot -> callable -> slot_base -> slot_wrapper
  • grouped_extended_slot -> extended_slot -> groupable -> callable -> slot_base -> slot_wrapper

Basic Signal Structure

Qt Connect Signal Slot

Update: While I was implementing this I had assumed that atomic shared_ptr was lock-free. After looking into the current implementations, they currently use spinlocks on each shared_ptr. In other words, not lock-free. However, in theory my implementation could take advantage of true lock-free atomic shared_ptr's in the future.

The basic structure of the Signal class is a doubly linked list, but there internally the linked list ensures that atomically there is always a valid singly linked list from any node to the tail. I am unaware of any generic lock-free doubly linked list implementation which has all of the features I need, so the trade off I'm making is only one writer is allowed at a time. Memory management is handled entirely by shared_ptr's internal reference counting mechanisms. There might be some more efficient atomic reference counting method I could implement, but for now shared_ptr is fast enough.

In order to make signal emission lock-free, I decided to use head and tail nodes. I also place a shared_ptr to a given node which is where grouped slot insertion can begin searching from, but this only points to an existing node. I don't know if I can implement some sort of interleaved binary tree/singly linked list to allow better grouped slot insertion, but for now I'm going to assume that the somewhat crummy group insertion performance is acceptable since signal emission is the primary optimization goal.

Basic Algorithm for List Write Operations

  1. Do non-list dependant operations (mainly allocating memory for nodes and build connection object)
  2. Acquire unique write lock.
  3. Perform all necessary list modification operations. Operations which will change the implicit singly linked list must be atomic, otherwise they do not.
  4. Release lock.

Here's an example implementation for push_front (connect new slot at front of list):

Casino Party Planning - This guide will help you to plan a successful Casino Theme Party in ten steps. Download a free Casino Party Planner. Step 1 - Establish a Budget. Casino Party Budgets range from $25 to $100 per person, depending on location, food and beverage, and decor. Food will often comprise 40% of the budget, with 40% allocated to. Planning a casino themed party. A casino party is a great theme for a guy's birthday party. Whether you decide on an all-out casino theme party or a game night, your party will sure to be a big hit! Invitations: Punchbowl has this terrific free casino-themed invitations that you can send to all of your guests to get them excited for your casino theme party.

Modified Iterator

One problem I encountered while working on the combiner call implementation was how to efficiently pass iterators to the combiner. The solution I came up with means the iterator object encapsulated all state internally, in particular if it is the end node. This means that no separate end iterator is passed. I think the problem could be resolved using a similar single inheritance chain like I did for the slots classes, though I haven't tried this out yet.

I am using a std::tuple to store parameters for later execution, and unpacking them using the technique Johannes Schaub presented here.

Signal Emit Algorithm

Implementing the signal emit operation is fairly simple, the only thing to keep in mind is that operations which try to move to the next node should be atomic. I've omitted a sample Combiner implementation because there's not much different about it (other than keeping in mind an iterator currently encapsulates the end state).

Qt Signal Slot Thread Safety Pins

For my testing, I'm going to benchmark Qt Signals/Slots, Boost Signals2, and this implementation. The slot being benchmarked:

The hardware running the test is an Intel i5-m430 laptop running Windows 7 x64. Rivers casino jobs chicago. For Qt test I'm using VS2012 x64, and for the other tests I'm using Mingw-w64 gcc 4.8.1. Basically, I didn't want to re-compile Qt with mingw since I already had it built for VS2012. For the record I'm testing with *Qt5 and Boost *1.54.0.

Note*: well, actually I'm using trunk repository builds, but I'm pretty sure there's no difference between the two (it would be silly for Qt to significantly muck around with the Signals/Slots implementation, and I haven't observed any changes to Boost Signals2).

I'm going to time how long it takes to emit signals with various number of number of slots (yes, I am averaging over many signal emits). The std::function calls is repeated calls to a std::function wrapping this to handler, more or less used to determine how much time is spent doing actual slot work.

Test CaseNo Slots (ns)1 Slot (ns)2 Slots (ns)4 Slots (ns)8 Slots (ns)16 Slots (ns)32 Slots (ns)64 Slots (ns)static overhead (ns)dynamic overhead (ns/slot)
std::function calls03.57.515296011623203.6
Qt Signals/Slots1475115177312581112322024033.8
Boost Signals275137183253398705128025007537
My Implementation275788150272525101019802730

Qt Signal Slot Performance

So what can we tell from these results? Well, unfortunately it's difficult to know exactly how Qt compares to the others because of the different compilers. However, it is probably safe to say Qt has some short-circuit mechanism for handling empty signals. I would be curious to see how well these implementations would fare under high contention/parallel loads. I suspect my implementation would handle multiple emits with a single writer quite easily because this is what the implementation targets. I don't know how Qt or Boost Signals2 are implemented under the hood, but I suspect both of them use some sort of locking mechanism on emits.

I don't know exactly how bad Boost Signals2 performed in the past, but as far as I can tell it is probably 'acceptable' most of the time. However, it does perform poorly when there are no slots connected. I'm going to keep working on this implementation and will eventually release it when it's done (right now it basically works, but many features are unimplemented). The cost is not as good as is possible with non-thread safe implementations, but there are nice gains. I may even try submitting it for inclusion in Boost (either as signals3, or use parts to improve signals2).





broken image