CUDA atomic swap
int atomicCAS(int* address, int compare, int val);
//Reads the 16-bit, 32-bit or 64-bit word old located at address in
//global or shared memory, computes (old == compare ? val : old), and
//stores the result back to memory at the same address. These three
//operations are performed in one atomic transaction. The function
//returns old (Compare And Swap). The int overload above operates on
//32-bit words; the unsigned long long int and unsigned short int
//overloads handle 64-bit and 16-bit words.
mrjakobdk
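A common use of atomicCAS is to build an atomic operation that CUDA does not provide, by retrying the compare-and-swap until no other thread has changed the value in between. The sketch below assumes an integer multiply as the custom operation; atomicMulInt, multiplyKernel and runMultiply are hypothetical names chosen for illustration, not part of the CUDA API, and the host helper assumes a CUDA-capable device is available.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical helper (not part of the CUDA API): atomically multiplies
// *address by factor using an atomicCAS retry loop and returns the value
// that was stored before this thread's update.
__device__ int atomicMulInt(int* address, int factor) {
    int old = *address;
    int assumed;
    do {
        assumed = old;
        // Store assumed * factor only if *address still equals assumed.
        // atomicCAS returns the value it actually found; if another thread
        // updated *address first, old != assumed and we retry with it.
        old = atomicCAS(address, assumed, assumed * factor);
    } while (assumed != old);
    return old;
}

// Each thread performs exactly one successful multiplication.
__global__ void multiplyKernel(int* value, int factor) {
    atomicMulInt(value, factor);
}

// Hypothetical host helper: starts from 1, launches nThreads threads in a
// single block that each multiply by 2, and returns the final value, or -1
// if a CUDA call fails. Assumes nThreads <= 30 so that 2^nThreads fits in an int.
int runMultiply(int nThreads) {
    int* dValue = nullptr;
    int value = 1;
    if (cudaMalloc(&dValue, sizeof(int)) != cudaSuccess) return -1;
    if (cudaMemcpy(dValue, &value, sizeof(int), cudaMemcpyHostToDevice) != cudaSuccess) {
        cudaFree(dValue);
        return -1;
    }
    multiplyKernel<<<1, nThreads>>>(dValue, 2);
    // Catch launch errors, which cudaMemcpy would not report.
    if (cudaGetLastError() != cudaSuccess) {
        cudaFree(dValue);
        return -1;
    }
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaError_t err = cudaMemcpy(&value, dValue, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dValue);
    return err == cudaSuccess ? value : -1;
}

// Usage (host): assert(runMultiply(10) == 1024);  // 2^10 regardless of thread order
```

The loop is lock-free: in every round at least one thread's atomicCAS succeeds, so threads never wait on each other the way they would with a spinlock. The CUDA C++ Programming Guide uses the same pattern to implement atomicAdd for double on devices without native support.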