What time 18446744073709550.000 meansVadim Tkachenko
Sometimes when you do profiling you can see number like this in timestamps. Periodically in our patches there was Query_time: 18446744073709550.000 in slow.log file (well, it was fixed recently, but still appears in other places).
I faced this problem several years ago when only 2-core AMD Opteron systems appeared and I noticed sysbench getting crazy showing query execution time 18446744073709550.000 or like this.
Obviously this is unsigned integer which was received by subtraction bigger number from smaller.
But how it can be if we use:
start_time = clock_gettime(CLOCK_THREAD_CPUTIME_ID, &tp);
end_time = clock_gettime(CLOCK_THREAD_CPUTIME_ID, &tp);
total_time = end_time – start_time;
How we can get query executed in negative time ? It would be too good to be real :).
After some research I found the reason, and it is interesting that I want to share it even few years later. The reason is that different CPU on the same motherboard uses different frequency generators (each CPU has its own frequency generator, though exact implementation depends on CPU vendor), and there is no way to get these generators synchronized. That is if in your server you have 4 CPU, each of them lives on its own time. Usually difference between generators is very small (microseconds matter), but it appears this difference is enough to get negative time.
It happens when your query started on one CPU and during execution it was switched to another CPU (where generator delayed to first CPU) and time between two measurements were small enough to catch difference in generators. In result you have negative time, but it actually, it is more serious, it mean – we can’t get exact time between two points in application.
So designing profiling in your application be aware of. I wonder how realtime sensor systems, where microseconds are important, deal with this.