A glimpse at gcc-4.8.2/libitm - Intel Transactional Memory (ITM) Implementation.

Feb 05, 2014 14:41

~/code/gcc-4.8.2/libitm/testsuite/libitm.c $ cat simple-1.c
/* Verify that two sequential runs of a transaction will complete and
produce correct results. An early test of the library did in fact
leave things in an inconsistent state following the commit of the
first transaction. */

#include <stdlib.h>

static int x;

static void start (void)
{
  __transaction_atomic { x++; }
}

int main()
{
  start ();
  start ();

  if (x != 2)
    abort ();

  return 0;
}
~/code/gcc-4.8.2/libitm/testsuite/libitm.c $ gcc -g -fgnu-tm simple-1.c -o simple-1
~/code/gcc-4.8.2/libitm/testsuite/libitm.c $ objdump -d simple-1
000000000040078d <start>:
40078d: 55 push %rbp
40078e: 48 89 e5 mov %rsp,%rbp
400791: bf 2b 00 00 00 mov $0x2b,%edi
400796: b8 00 00 00 00 mov $0x0,%eax
40079b: e8 a0 fe ff ff callq 400640 <_ITM_beginTransaction@plt>
4007a0: 83 e0 02 and $0x2,%eax
4007a3: 85 c0 test %eax,%eax
4007a5: 74 16 je 4007bd <start+0x30>
4007a7: 8b 05 df 04 20 00 mov 0x2004df(%rip),%eax # 600c8c <x>
4007ad: 83 c0 01 add $0x1,%eax
4007b0: 89 05 d6 04 20 00 mov %eax,0x2004d6(%rip) # 600c8c <x>
4007b6: e8 65 fe ff ff callq 400620 <_ITM_commitTransaction@plt>
4007bb: eb 1e jmp 4007db <start+0x4e>
4007bd: bf 8c 0c 60 00 mov $0x600c8c,%edi
4007c2: e8 a9 fe ff ff callq 400670 <_ITM_RU4@plt>
4007c7: 83 c0 01 add $0x1,%eax
4007ca: 89 c6 mov %eax,%esi
4007cc: bf 8c 0c 60 00 mov $0x600c8c,%edi
4007d1: e8 2a fe ff ff callq 400600 <_ITM_WU4@plt>
4007d6: e8 45 fe ff ff callq 400620 <_ITM_commitTransaction@plt>
4007db: 5d pop %rbp
4007dc: c3 retq

00000000004007dd <main>:
4007dd: 55 push %rbp
4007de: 48 89 e5 mov %rsp,%rbp
4007e1: e8 a7 ff ff ff callq 40078d <start>
4007e6: e8 a2 ff ff ff callq 40078d <start>
4007eb: 8b 05 9b 04 20 00 mov 0x20049b(%rip),%eax # 600c8c <x>
4007f1: 83 f8 02 cmp $0x2,%eax
4007f4: 74 05 je 4007fb <main+0x1e>
4007f6: e8 f5 fd ff ff callq 4005f0 <abort@plt>
4007fb: b8 00 00 00 00 mov $0x0,%eax
400800: 5d pop %rbp
400801: c3 retq

Well, basically _ITM_beginTransaction() registers the thread in libitm's list of threads and returns a set of flags. If bit 0x2 is set (that is the "and $0x2,%eax" check), we go through the fast path "mov x, %eax; add $0x1, %eax; mov %eax, x" and then _ITM_commitTransaction() ends the transaction. As far as I can tell this path is only taken in the single-threaded case (and this test is not multithreaded at all). Once a second thread starts running transactions, the other branch is taken: x is no longer touched directly, and _ITM_RU4/_ITM_WU4 (Read Unsigned 4 bytes and Write Unsigned 4 bytes) are called for every read and write so that libitm can keep track of the data in its own bookkeeping memory. _ITM_commitTransaction() then takes the locks it needs and makes the changes final at the original address(es).
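To make the two branches in the disassembly easier to follow, here is a rough C sketch of the shape of start(). Everything prefixed with mock_ is a hypothetical stand-in I made up for illustration, not the real libitm code, and the rule "single thread means plain path" is only my assumption from the paragraph above. The only thing taken from the dump is that bit 0x2 of the value returned by the begin call selects the plain path.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the libitm entry points seen in the
   disassembly. They only imitate the control flow, nothing more. */
enum {
    MOCK_RUN_INSTRUMENTED   = 0x01, /* value picked for this mock */
    MOCK_RUN_UNINSTRUMENTED = 0x02  /* the bit tested by "and $0x2,%eax" */
};

static int mock_multithreaded;      /* 0: pretend only one thread runs transactions */
static unsigned mock_barrier_calls; /* counts read/write barrier calls */

static uint32_t mock_begin(void)
{
    /* Assumption: the runtime hands out the plain path while single-threaded. */
    return mock_multithreaded ? MOCK_RUN_INSTRUMENTED : MOCK_RUN_UNINSTRUMENTED;
}

static uint32_t mock_ru4(const uint32_t *addr)
{
    mock_barrier_calls++;           /* a real runtime would track the read here */
    return *addr;
}

static void mock_wu4(uint32_t *addr, uint32_t value)
{
    mock_barrier_calls++;           /* a real runtime would track the write here */
    *addr = value;
}

static void mock_commit(void)
{
    /* A real runtime would validate and publish the transaction here. */
}

static uint32_t x;

/* Same shape as the compiled start(): one begin call, two code paths,
   and a commit at the end of either path. */
static void start_sketch(void)
{
    if (mock_begin() & MOCK_RUN_UNINSTRUMENTED) {
        x++;                              /* plain mov / add / mov */
    } else {
        mock_wu4(&x, mock_ru4(&x) + 1);   /* every access goes through a call */
    }
    mock_commit();
}
```

Calling start_sketch() with mock_multithreaded left at 0 increments x without touching the barriers; setting it to 1 makes the same increment cost two extra function calls.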

Obviously the _ITM_RU4 path is much slower than a no-brainer 'mov foo, bar' (think function call overhead, the PLT trampoline, the function body itself and an address lookup in a table of stored objects; we are probably talking about 100x or more), but it is what enables us to "commit" memory.
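To get a feeling for where that overhead comes from, here is a minimal sketch of one way a software runtime could keep a table of stored objects: a small write buffer that is searched by address on every access and copied back on commit. This is a generic textbook-style design with names I invented (write_buffer, wb_read_u4, wb_write_u4, wb_commit); it is not taken from the libitm sources, which may well track writes differently.

```c
#include <stddef.h>
#include <stdint.h>

#define WB_CAPACITY 16  /* arbitrary size, enough for a sketch */

/* One buffered write: the original address and the pending value. */
struct wb_entry {
    uint32_t *addr;
    uint32_t value;
};

struct write_buffer {
    struct wb_entry entries[WB_CAPACITY];
    size_t used;
};

/* Linear search by address; this lookup is paid on every access. */
static struct wb_entry *wb_find(struct write_buffer *wb, const uint32_t *addr)
{
    for (size_t i = 0; i < wb->used; i++)
        if (wb->entries[i].addr == addr)
            return &wb->entries[i];
    return NULL;
}

/* Read: prefer the buffered copy, fall back to real memory. */
static uint32_t wb_read_u4(struct write_buffer *wb, const uint32_t *addr)
{
    struct wb_entry *e = wb_find(wb, addr);
    return e ? e->value : *addr;
}

/* Write: store into the buffer only. Returns -1 if the buffer is full. */
static int wb_write_u4(struct write_buffer *wb, uint32_t *addr, uint32_t value)
{
    struct wb_entry *e = wb_find(wb, addr);
    if (!e) {
        if (wb->used == WB_CAPACITY)
            return -1;
        e = &wb->entries[wb->used++];
        e->addr = addr;
    }
    e->value = value;
    return 0;
}

/* Commit: copy every buffered value to its original address.
   A real runtime would hold a lock and check for conflicts first. */
static void wb_commit(struct write_buffer *wb)
{
    for (size_t i = 0; i < wb->used; i++)
        *wb->entries[i].addr = wb->entries[i].value;
    wb->used = 0;
}
```

Even in this toy version a single x++ turns into two function calls, two table searches and a copy at commit time, compared with three plain instructions on the fast path.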

On the bright side, Intel's TSX (included in Haswell 45xx and newer microprocessors since 2013), whose RTM instructions are already supported by libitm, and AMD's proposed ASF instructions should greatly reduce the overhead of the _ITM_* functions.

Oh, and the _ITM_RU4 software code path is called Software Transactional Memory (STM); with the microprocessor's help it's called Hardware Transactional Memory (HTM), and GCC's implementation is called GNU Transactional Memory (GTM, a name mostly used inside the libitm sources).

References:
http://gcc.gnu.org/wiki/TransactionalMemory
http://lwn.net/Articles/466513/
http://en.wikipedia.org/wiki/Haswell_(microarchitecture)#List_of_Haswell_processors
http://software.intel.com/en-us/blogs/2012/02/07/transactional-synchronization-in-haswell
http://staging.clarusagency.com/clients/amd/devcentral_stg/resources/archive/amd-advanced-synchronization-facility-proposal/
https://gnu.googlesource.com/gcc/+/transactional-memory
http://gcc.gnu.org/git/?p=gcc.git;a=tree;f=libitm;hb=HEAD
https://fosdem.org/2014/interviews/2014-nuno-diegues-torvald-riegel/
https://fosdem.org/2014/schedule/event/concurrent_programming_made_simple/