Improve atomic read from InterlockedCompareExchange()
Assuming the architecture is ARM64 or x86-64, I want to make sure these two are equivalent:

a = _InterlockedCompareExchange64((__int64*)p, 0, 0);

MyBarrier(); a = *(volatile __int64*)p; MyBarrier();

where MyBarrier() is a compiler-level memory barrier (hint), like __asm__ __volatile__ ("" ::: "memory"). Method 2 is supposed to be faster than method 1.

I have heard that the _Interlocked() functions also imply a memory barrier at both the compiler and hardware level, and that a properly aligned read of an intrinsic data type is atomic on these architectures, but I am not sure whether method 2 can be used broadly.

(P.S. I think the CPU handles data dependencies automatically, so a hardware barrier is not much of a concern here.)

Thank you for any advice or corrections on this.
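For reference, the intent of the two methods can be sketched portably with std::atomic (this is my paraphrase, not the original MSVC code): method 1 is a read-modify-write with full ordering, while method 2 is a plain load bracketed by compiler-only fences.

```cpp
#include <atomic>
#include <cstdint>

// Method 1: a CAS with both arguments 0 just returns the current value,
// acting as a fully ordered (seq_cst) atomic read-modify-write.
int64_t read_via_cas(std::atomic<int64_t>& v) {
    int64_t expected = 0;
    v.compare_exchange_strong(expected, 0);  // on failure, expected holds *v
    return expected;
}

// Method 2: a plain atomic load bracketed by compiler fences, roughly
// what MyBarrier(); a = *(volatile __int64*)p; MyBarrier(); does.
int64_t read_via_load(std::atomic<int64_t>& v) {
    std::atomic_signal_fence(std::memory_order_seq_cst);  // compiler-only barrier
    int64_t a = v.load(std::memory_order_relaxed);        // a single MOV on x86-64
    std::atomic_signal_fence(std::memory_order_seq_cst);
    return a;
}
```

Both return the same value; the question is whether the second provides the required ordering guarantees.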
Here are some benchmarks on Ivy Bridge (an i5 laptop).
(1E+006 loops: 27ms):
; __int64 a = _InterlockedCompareExchange64((__int64*)p, 0, 0);
xor eax, eax
lock cmpxchg QWORD PTR val$[rsp], rbx
(1E+006 loops: 27ms):
; __faststorefence(); __int64 a = *(volatile __int64*)p;
lock or DWORD PTR [rsp], 0
mov rcx, QWORD PTR val$[rsp]
(1E+006 loops: 7ms):
; _mm_sfence(); __int64 a = *(volatile __int64*)p;
sfence
mov rcx, QWORD PTR val$[rsp]
(1E+006 loops: 1.26ms, not synchronized?):
; __int64 a = *(volatile __int64*)p;
mov rcx, QWORD PTR val$[rsp]
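A minimal, portable timing harness for comparing the two reads might look like this (a sketch using std::atomic and std::chrono rather than the original MSVC intrinsics; the function names are mine):

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>

// Milliseconds for n CAS-based reads (compiles to lock cmpxchg on x86-64).
double time_cas_reads(std::atomic<int64_t>& val, int n) {
    auto t0 = std::chrono::steady_clock::now();
    int64_t sink = 0;
    for (int i = 0; i < n; ++i) {
        int64_t expected = 0;
        val.compare_exchange_strong(expected, 0);
        sink += expected;
    }
    auto t1 = std::chrono::steady_clock::now();
    asm volatile("" : : "r"(sink));  // keep sink live so the loop isn't elided
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

// Milliseconds for n relaxed loads (compiles to a plain mov on x86-64).
double time_plain_reads(std::atomic<int64_t>& val, int n) {
    auto t0 = std::chrono::steady_clock::now();
    int64_t sink = 0;
    for (int i = 0; i < n; ++i)
        sink += val.load(std::memory_order_relaxed);
    auto t1 = std::chrono::steady_clock::now();
    asm volatile("" : : "r"(sink));
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

Absolute numbers vary by CPU and compiler; what matters is the relative gap between the locked and the plain read.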
c++ multithreading 64bit atomicity interlocked
It is just not equivalent. sfence ensures that the store is visible but doesn't make sure that the load is fresh. So no atomic read at all. mfence is equivalent, good odds that it won't make any difference anymore. Maybe you meant lfence, hard to tell.
– Hans Passant
Nov 23 at 10:43
edited Nov 23 at 10:51
asked Nov 22 at 7:56 by cozmoz
1 Answer
For the second version to be functionally equivalent, you obviously need atomic 64-bit reads, which is true on your platform.

However, _MemoryBarrier() is not a "hint to the compiler". On x86, _MemoryBarrier() prevents both compiler and CPU reordering, and also ensures global visibility after the write. You also probably only need the first _MemoryBarrier(); the second one could be replaced with a _ReadWriteBarrier(), unless a is also a shared variable. But you don't even need that, since you are reading through a volatile pointer, which prevents any compiler reordering in MSVC.

With this replacement, you basically end up with pretty much the same result:

// a = _InterlockedCompareExchange64((__int64*)&val, 0, 0);
xor eax, eax
lock cmpxchg QWORD PTR __int64 val, r8 ; val

// _MemoryBarrier(); a = *(volatile __int64*)&val;
lock or DWORD PTR [rsp], r8d
mov rax, QWORD PTR __int64 val ; val

Running these two in a loop on my i7 Ivy Bridge laptop gives equal results, within 2-3%. However, with two memory barriers, the "optimized" version is actually around 2x slower.

So the better question is: why are you using _InterlockedCompareExchange64 at all? If you need atomic access to a variable, use std::atomic; an optimizing compiler should compile it to the most optimal sequence for your architecture, and add all the barriers necessary to prevent reordering and ensure cache coherency.
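A sketch of what the std::atomic version might look like (the names are mine, and the helper functions are illustrative; seq_cst is the default ordering and matches the interlocked semantics, while acquire is sufficient on the reader side):

```cpp
#include <atomic>
#include <cstdint>

std::atomic<int64_t> shared_val{0};

// Producer side: a fully ordered read-modify-write, analogous to an
// _Interlocked* function (seq_cst by default).
void producer_add(int64_t x) {
    shared_val.fetch_add(x);
}

// Consumer side: an atomic read. acquire ordering is enough to observe
// everything that happened-before the producer's store.
int64_t consumer_read() {
    return shared_val.load(std::memory_order_acquire);
}
```

On x86-64 the acquire load compiles to a plain mov, so this gives the cheap read the question is after without hand-rolled volatile tricks.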
And btw, __int64? You should stick to the standard typedefs from stdint.h/cstdint.
– Groo
Nov 22 at 11:25
I am sorry: I previously used the misleading name _MemoryBarrier() instead of MyBarrier(). I am not using Microsoft's MemoryBarrier() macro, so the asm for the second version (the "optimized" one) should not include lock or DWORD PTR [rsp], r8d, which is emitted by MemoryBarrier().
– cozmoz
Nov 22 at 11:26
Interlocked functions are easy to understand, and I personally hate using std::atomic, which is too complex for me.
– cozmoz
Nov 22 at 11:31
@cozmoz: in that case, the resulting code will not guarantee that other threads see the values being updated in program order. Anyway, as a C++ programmer you should really take a moment to read the docs for std::atomic. It's standard, it works, and, most of all, it lets you convey your intent explicitly. Do you only need an atomic read? Use memory_order_relaxed. Do you need to publish changes across all threads with sequential consistency? Use memory_order_seq_cst. Right now, you are placing performance optimizations above code correctness and clarity.
– Groo
Nov 22 at 12:41
Thanks. I always use interlocked functions to modify shared variables, so there's no problem in the producer threads. Since atomicity is never a problem on ARM64/x86-64, the only requirement is to read the true value in the consumer threads. The question is: if the variable is modified by an interlocked function in a producer thread, is the updated value immediately visible to another thread via a simple volatile read?
– cozmoz
Nov 23 at 9:07
edited Nov 22 at 10:44
answered Nov 22 at 10:38 by Groo