Gaff: Threading

New post in under a month! Yay!

For serious though, I’ve found that I haven’t really posted anything on the internals of some of my projects. I really want to, I just don’t really know how to go about it. That and a lot of the things that I am doing, I’m learning at the same time. Take for instance, threading. I know what a thread id and I know some of the basic synchronization techniques such as mutexes, semaphores, and what-not. But I’ve never made a robust threading system or even really made a task/job oriented Thread Pool. To add to that list, I’ve also never done anything with atomic operations either.

That said, I’ve been reading up a lot on atomic operations. I have found these two pages very useful. ([New post in under a month! Yay!

For serious though, I’ve found that I haven’t really posted anything on the internals of some of my projects. I really want to, I just don’t really know how to go about it. That and a lot of the things that I am doing, I’m learning at the same time. Take for instance, threading. I know what a thread id and I know some of the basic synchronization techniques such as mutexes, semaphores, and what-not. But I’ve never made a robust threading system or even really made a task/job oriented Thread Pool. To add to that list, I’ve also never done anything with atomic operations either.

That said, I’ve been reading up a lot on atomic operations. I have found these two pages very useful. (]1 / GCC) Using the knowledge of these two pages, I’m going to attempt to build some Lock-Free data structures. I’m not going to do anything fancy and try and make up some new, Lock-Free container that no one has ever done before. Just the prime suspects that seem useful. I currently have a file called Gaff_Atomic.h, which is just a bunch of #defines of functions that do equivalent atomic operations on supported platforms. I attained the list by cross-referencing the MSDN and GCC pages. In its current form, it looks like this:

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
/************************************************************************************
Copyright (C) 2014 by Nicholas LaCroix

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
************************************************************************************/

#pragma once

#if defined(_WIN32) || defined(_WIN64)
    #include "Gaff_IncludeWindows.h"

    // Supports unsigned long, unsigned long long, unsigned int, and long
    #define AtomicCompareExchange InterlockedCompareExchange
    #define AtomicIncrement InterlockedIncrement
    #define AtomicDecrement InterlockedDecrement
    #define AtomicAcquire(ptr) InterlockedExchangeAcquire(ptr, 1)
    #define AtomicRelease InterlockedDecrementRelease

    // Only supports long
    #define AtomicAdd32 InterlockedAdd
    #define AtomicSub32(ptr, val) InterlockedAdd(ptr, -val)
    #define AtomicAddFetchOrig32 InterlockedExchangeAdd
    #define AtomicSubFetchOrig32 InterlockedExchangeSubtract

    #define AtomicAnd32(ptr, val) (InterlockedAnd(ptr, val) & val)
    #define AtomicOr32(ptr, val) (InterlockedOr(ptr, val) | val)
    #define AtomicXor32(ptr, val) (InterlockedXor(ptr, val) ^ val)
    #define AtomicAnd32FetchOrig InterlockedAnd
    #define AtomicOr32FetchOrig InterlockedOr
    #define AtomicXor32FetchOrig InterlockedXor

    // Only supports long long
    #define AtomicCompareExchange64 InterlockedCompareExchange64
    #define AtomicAdd64 InterlockedAdd64
    #define AtomicSub64(ptr, val) InterlockedAdd64(ptr, -val)
    #define AtomicIncrement64 InterlockedIncrement64
    #define AtomicDecrement64 InterlockedDecrement64
    #define AtomicAnd64(ptr, val) (InterlockedAnd64(ptr, val) & val)
    #define AtomicOr64(ptr, val) (InterlockedOr64(ptr, val) | val)
    #define AtomicXor64(ptr, val) (InterlockedXor64(ptr, val) ^ val)
    #define AtomicAnd64FetchOrig InterlockedAnd64
    #define AtomicOr64FetchOrig InterlockedOr64
    #define AtomicXor64FetchOrig InterlockedXor64

    // Supports any pointer type (except probably member function pointers)
    #define AtomicCompareExchangePointer InterlockedCompareExchangePointer

    // Only supports unsigned long, unsigned long long, and unsigned int
    #define AtomicUAdd(ptr, val) (InterlockedExchangeAdd(ptr, val) + val)
    #define AtomicUSub(ptr, val) (InterlockedExchangeSubtract(ptr, val) - val)
    #define AtomicUAddFetchOrig InterlockedExchangeAdd
    #define AtomicUSubFetchOrig InterlockedExchangeSubtract

#elif defined(__linux__) || defined(__APPLE__)
    // Supports unsigned long, unsigned long long, unsigned int, and long types
    #define AtomicCompareExchange(dest, new_val, old_val) __sync_val_compare_and_swap(dest, old_val, new_val)
    #define AtomicIncrement(ptr) __sync_add_and_fetch(ptr, 1)
    #define AtomicDecrement(ptr) __sync_sub_and_fetch(ptr, 1)
    #define AtomicAcquire(ptr) __sync_lock_test_and_set(ptr, 1)
    #define AtomicRelease __sync_lock_release

    // Only supports long
    #define AtomicAdd32 __sync_add_and_fetch
    #define AtomicSub32 __sync_sub_and_fetch
    #define AtomicAddFetchOrig32 __sync_fetch_and_add
    #define AtomicSubFetchOrig32 __sync_fetch_and_sub

    #define AtomicAnd32 __sync_and_and_fetch
    #define AtomicOr32 __sync_or_and_fetch
    #define AtomicXor32 __sync_xor_and_fetch
    #define AtomicAnd32FetchOrig __sync_fetch_and_and
    #define AtomicOr32FetchOrig __sync_fetch_and_or
    #define AtomicXor32FetchOrig __sync_fetch_and_xor

    // Only supports long long
    #define AtomicCompareExchange64(dest, new_val, old_val) __sync_val_compare_and_swap(dest, old_val, new_val)
    #define AtomicAdd64 __sync_add_and_fetch
    #define AtomicSub64 __sync_sub_and_fetch
    #define AtomicIncrement64(ptr) __sync_add_and_fetch(ptr, 1)
    #define AtomicDecrement64(ptr) __sync_sub_and_fetch(ptr, 1)
    #define AtomicAnd64 __sync_and_and_fetch
    #define AtomicOr64 __sync_or_and_fetch
    #define AtomicXor64 __sync_xor_and_fetch
    #define AtomicAnd64FetchOrig __sync_fetch_and_and
    #define AtomicOr64FetchOrig __sync_fetch_and_or
    #define AtomicXor64FetchOrig __sync_fetch_and_xor

    // Supports any pointer type (except probably member function pointers)
    #define AtomicCompareExchangePointer(dest, new_val, old_val) __sync_val_compare_and_swap(dest, old_val, new_val)

    // Only supports unsigned long, unsigned long long, and unsigned int
    #define AtomicUAdd __sync_sub_and_fetch
    #define AtomicUSub __sync_sub_and_fetch
    #define AtomicUAddFetchOrig __sync_fetch_and_add
    #define AtomicUSubFetchOrig __sync_fetch_and_sub

#else
    #error Platform not supported.
#endif

So far I haven’t done much with it. I’ve built a Spin Lock and that’s about it. I plan on building a Lock-Free stack and queue next. I then plan on using these data structures in my Thread Pool implementation. My current Thread Pool is simple and stupid. It allocates a number of threads equal to the number of cores the CPU has. It then runs a function I call the “TaskRunner” on each of those threads. Those threads then pull jobs out of a queue and run those tasks via an ITask interface. If that task had dependent jobs attached to it, it adds them to the task queue. If there are no jobs, the thread yields it’s time slice to the scheduler. That’s more or less all my Thread Pool does right now. It doesn’t grow/shrink dynamically based on task load or anything like that. My Thread Pool currently uses a Spin Lock to synchronize when grabbing a task from the queue. I hope to swap that out for a Lock-Free Queue class implementation so I don’t have a bunch of calls to ask for a lock on the Spin Lock plaguing my code.

Hmm, as far as threading goes, I don’t really think I have anything else to say. I feel I had to talk about something with a bit more meat to it. I may not be an expert, but I’d always be glad to hear from people who have good advice! I’m also looking for people who will spend a little bit of their free time to code review any bit or piece of my projects! Check out my repos of Gaff or Gleam. I haven’t set up e-mail on my new webserver yet, so comments to this post will have to do until I get that set up.

Until next time, thanks for reading!

More Reading
Newer// Nothing Much