4007dd: 55 push %rbp
4007de: 48 89 e5 mov %rsp,%rbp
4007e1: e8 a7 ff ff ff callq 40078d
4007e6: e8 a2 ff ff ff callq 40078d
4007eb: 8b 05 9b 04 20 00 mov 0x20049b(%rip),%eax # 600c8c
4007f1: 83 f8 02 cmp $0x2,%eax
4007f4: 74 05 je 4007fb
4007f6: e8 f5 fd ff ff callq 4005f0
4007fb: b8 00 00 00 00 mov $0x0,%eax
400800: 5d pop %rbp
400801: c3 retq
Well, basically _ITM_beginTransaction() registers the thread in the list of threads and takes the fast path ("mov x, %eax; add $0x1, %eax; mov %eax, x"), and then _ITM_commitTransaction() unregisters the thread. This works only for the single-threaded case (well, maybe we aren't multithreaded at all?). Next, when a second thread enters this code block, the working copy of x is kept in a separate memory chunk allocated by libitm, and _ITM_RU4/_ITM_WU4 (Read Unsigned 4 bytes and Write Unsigned 4 bytes) are used to read and write that copy in the chunk. _ITM_commitTransaction() then takes a lock and memcpy()s the data back to the original address(es).
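To make the slow path a bit more concrete, here is a minimal sketch in C of the "private copy, then lock and memcpy() on commit" idea described above. It is not libitm's code: every name in it (tx_log, tx_begin, tx_read_u4, tx_write_u4, tx_commit, increment_x) is hypothetical, the log is a tiny fixed-size array, and there is no conflict detection or retry, so treat it as an illustration of the data flow rather than a working STM.

```c
#include <pthread.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical log entry: the real address plus a private copy of the value. */
typedef struct {
    uint32_t *addr;
    uint32_t  copy;
} tx_entry;

#define TX_MAX_ENTRIES 8

/* Hypothetical per-transaction log (a stand-in for the "separate memory chunk"). */
typedef struct {
    tx_entry entries[TX_MAX_ENTRIES];
    size_t   count;
} tx_log;

/* One global lock taken only while committing (an assumption of this sketch). */
static pthread_mutex_t tx_commit_lock = PTHREAD_MUTEX_INITIALIZER;

static void tx_begin(tx_log *tx) { tx->count = 0; }

/* The "address lookup in the table of stored objects": a linear search here. */
static tx_entry *tx_find(tx_log *tx, uint32_t *addr) {
    for (size_t i = 0; i < tx->count; i++)
        if (tx->entries[i].addr == addr)
            return &tx->entries[i];
    return NULL;
}

/* Read 4 unsigned bytes: prefer the private copy, else fall back to real memory. */
static uint32_t tx_read_u4(tx_log *tx, uint32_t *addr) {
    tx_entry *e = tx_find(tx, addr);
    return e ? e->copy : *addr;
}

/* Write 4 unsigned bytes into the private copy only. Returns -1 if the log is full. */
static int tx_write_u4(tx_log *tx, uint32_t *addr, uint32_t value) {
    tx_entry *e = tx_find(tx, addr);
    if (!e) {
        if (tx->count == TX_MAX_ENTRIES)
            return -1;
        e = &tx->entries[tx->count++];
        e->addr = addr;
    }
    e->copy = value;
    return 0;
}

/* Commit: take the lock and memcpy() every private copy to its original address. */
static void tx_commit(tx_log *tx) {
    pthread_mutex_lock(&tx_commit_lock);
    for (size_t i = 0; i < tx->count; i++)
        memcpy(tx->entries[i].addr, &tx->entries[i].copy, sizeof(uint32_t));
    pthread_mutex_unlock(&tx_commit_lock);
    tx->count = 0;
}

static uint32_t x;

/* The x++ block from the text, written against the sketch instead of plain movs. */
static void increment_x(void) {
    tx_log tx;
    tx_begin(&tx);
    uint32_t v = tx_read_u4(&tx, &x);
    tx_write_u4(&tx, &x, v + 1);
    tx_commit(&tx);
}
```

Even in this toy version a single increment costs several function calls and a table search instead of three instructions, and x itself is not touched until tx_commit() runs.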
Obviously the _ITM_RU4 path is much slower than a no-brainer 'mov foo, bar' (think about the function call overhead, the PLT trampoline, the function's own code and the address lookup in the table of stored objects; we are talking about 100x or more), but it enables us to "commit" memory.
On the bright side, Intel's TSX (included in Haswell 45xx and newer microprocessors since 2013), Intel RTM (already implemented in libitm) and AMD's proposed ASF instructions should greatly reduce the overhead of the _ITM_* functions.
Oh, and the _ITM_RU4 software code path is called Software Transactional Memory (STM), with the microprocessor's help it's called Hardware Transactional Memory (HTM), and GCC's implementation is called GNU Transactional Memory (GTM, a name mostly used inside the libitm sources).
References:
http://gcc.gnu.org/wiki/TransactionalMemory
http://lwn.net/Articles/466513/
http://en.wikipedia.org/wiki/Haswell_(microarchitecture)#List_of_Haswell_processors
http://software.intel.com/en-us/blogs/2012/02/07/transactional-synchronization-in-haswell
http://staging.clarusagency.com/clients/amd/devcentral_stg/resources/archive/amd-advanced-synchronization-facility-proposal/
https://gnu.googlesource.com/gcc/+/transactional-memory
http://gcc.gnu.org/git/?p=gcc.git;a=tree;f=libitm;hb=HEAD
https://fosdem.org/2014/interviews/2014-nuno-diegues-torvald-riegel/
https://fosdem.org/2014/schedule/event/concurrent_programming_made_simple/