Not understanding read performance higher than NVME capacity #662
Thanks for the interesting inquiry. I ran this locally and I see similar results, and I think that is clearly the smoking gun: somehow we are getting a very high cache hit rate in the benchmark. That explains the performance discrepancy.

I am still working to figure out why we get such a high cache hit rate. My first guess is that the random-number generator we use to generate the workload is not very good: the ith number it outputs is, more or less, XXH64(i).

One thing I notice is that your database is substantially larger than what I saw -- you had around 14 GB, whereas I had only around 9.5 GB. I suspect this is because the benchmark generates messages of various sizes between 8 and 100 bytes, which should result in about 8-9 GB of data after inserting 100M such messages with 24-byte keys. Did you modify the benchmark to always generate 100-byte messages?
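To make the "smoking gun" concrete, here is the steady-state arithmetic using the figures reported in this thread (1.5M lookups/s measured, ~600K reads/s at the device per `iostat`). Every lookup is either a cache hit or exactly one disk read, so the implied hit rate falls straight out of the ratio:

```python
# Steady state: lookups/s = disk_reads/s / (1 - hit_rate),
# so hit_rate = 1 - disk_reads/s / lookups/s.
measured_lookups_per_s = 1_500_000   # from the benchmark report in this thread
disk_reads_per_s = 600_000           # from `iostat 1 -x` during the run

implied_hit_rate = 1 - disk_reads_per_s / measured_lookups_per_s
print(f"implied cache hit rate: {implied_hit_rate:.0%}")   # 60%

# A uniform workload over a dataset that is only ~16% cache-resident
# should hit the cache only ~16% of the time:
uniform_hit_rate = 0.16
expected_lookups_per_s = disk_reads_per_s / (1 - uniform_hit_rate)
print(f"expected at 16% hit rate: {expected_lookups_per_s:,.0f} lookups/s")
```

The exact figure, ~714K lookups/s, is close to the ~696K estimated in the original post; either way it is far below the measured 1.5M, while the implied 60% hit rate is nearly four times what a uniform workload would give.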
Thanks for the update. Actually, I think your result supports my concern that XXH64 is the problem. My concern is not that XXH64 has a lot of collisions; rather, I suspect that the sequence of keys we derive from it is not as uniform as a truly random sequence would be, which would inflate the cache hit rate. I will do some experiments and possibly replace XXH64 with something else.
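One way to test a generator for this kind of bias empirically (a sketch, not the project's actual benchmark code: Python's `random` module stands in as the uniform baseline, and the key-space and sample sizes are made-up small values so it runs quickly) is to draw m key indices, count how many distinct keys get touched, and compare against the analytical expectation for a uniform generator, N·(1 − (1 − 1/N)^m). A generator that revisits keys more often than uniform touches fewer distinct keys, and every revisit of a recently used key is a likely cache hit:

```python
import random

random.seed(0)          # reproducibility
N = 1_000_000           # key-space size (assumption; the real benchmark uses 100M)
m = 1_000_000           # number of lookups to simulate (assumption)

# Expected number of distinct keys touched by a truly uniform generator:
# N * (1 - (1 - 1/N)^m), roughly N * (1 - e^-1) ~= 632K when m == N.
expected_distinct = N * (1 - (1 - 1 / N) ** m)

# Baseline: Python's generator is effectively uniform over the range.
draws = [random.randrange(N) for _ in range(m)]
observed_distinct = len(set(draws))

print(f"expected distinct (uniform): {expected_distinct:,.0f}")
print(f"observed distinct:           {observed_distinct:,}")
```

Feeding the suspect generator's output through the same count would show whether it touches markedly fewer distinct keys than the uniform expectation; the shortfall would surface as extra cache hits in the benchmark.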
Hello everyone! I was playing with the `driver_test` benchmark, but I cannot wrap my mind around the fact that I'm getting such high randread throughput, despite the cache size being smaller than the database size and despite using `O_DIRECT`.

Here's the benchmark report and the command I launched:
Here you see a whopping 1.5M lookups/second.
However, in another tab I was inspecting my NVMe activity with `iostat 1 -x`, and here's a sample of it (the output was quite uniform during the randread test, so this single sample is representative):

So, on one hand I'm reading around 600K pages/s, which for a randread test should translate to 600K reads/s served from disk (under a uniform distribution, each read is going to hit a different disk page). If I consider the 2 GB cache, with 100M key-value pairs (roughly 128 bytes each), it should cover roughly 16% of the dataset, so I should expect no more than `disk reads/s * 1.16`, which would be around 696K reads/s -- still way lower than the measured `1.5M reads/s`! And `vmtouch` shows that the page cache is actually bypassed, so -- with the `--set-O_DIRECT` flag set -- I'm positive that I'm not going through the OS page cache.

Even setting that aside, the most I can get out of my NVMe with `fio` is between 800K and 1M random reads/s with `O_DIRECT`.

My question is: what am I missing here? Where are the `1.5M reads/s` coming from?

Thank you!
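For reference, a raw-device random-read test like the `fio` run mentioned above can be expressed as a job file along these lines (a sketch: the device path, queue depth, and job count are placeholders to be adjusted for the drive under test):

```ini
; randread.fio -- sketch of a 4 KiB O_DIRECT random-read test.
; /dev/nvme0n1 is a placeholder device path; randread is read-only,
; but double-check the target before pointing fio at a raw device.
[global]
ioengine=io_uring
direct=1
rw=randread
bs=4k
runtime=30
time_based=1
group_reporting=1

[randread]
filename=/dev/nvme0n1
iodepth=32
numjobs=4
```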