- Feb 14, 2017
-
-
Siarhei Siamashka authored
main.c:488:22: warning: The left operand of '/' is a garbage value
    fbsize = (fbsize / BLOCKSIZE) * BLOCKSIZE;
              ~~~~~~ ^
1 warning generated.
-
- Feb 13, 2017
-
-
Siarhei Siamashka authored
This can speed up the tests on travis-ci, and we don't care about measurement accuracy there.
-
- Apr 18, 2016
-
-
Jan Chren authored
-
- Apr 01, 2016
-
-
Siarhei Siamashka authored
-
- Mar 30, 2016
-
-
Siarhei Siamashka authored
There is no need for the -DBENCH_FRAMEBUFFER hack anymore.
-
Siarhei Siamashka authored
Keep doubling the number of loop iterations until the duration of a test run exceeds 0.5s, and start from 1 loop iteration instead of 16. When the memory bandwidth is extremely low (for example, when running tests with the framebuffer), this keeps the test duration reasonable. When the memory bandwidth is extremely high, this approach reduces the overhead of the periodic gettimeofday() calls.
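The doubling strategy can be sketched roughly as follows. This is a minimal illustration rather than the benchmark's actual code: `now_seconds`, `bench_fn`, `measure_bandwidth` and `copy_workload` are hypothetical names, and the duration threshold is passed in as a parameter instead of being fixed at 0.5s.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/time.h>

/* Wall-clock time in seconds, based on gettimeofday(). */
static double now_seconds(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (double)tv.tv_sec + (double)tv.tv_usec / 1000000.0;
}

/* Hypothetical workload type: one call is one loop iteration. */
typedef void (*bench_fn)(void *dst, const void *src, size_t size);

/*
 * Runs f in batches, starting from 1 iteration and doubling the batch
 * size until a single batch takes longer than min_duration seconds.
 * The clock is read only twice per batch. Returns the bandwidth of the
 * final batch in bytes per second and stores its iteration count in
 * *loops_out.
 */
static double measure_bandwidth(bench_fn f, void *dst, const void *src,
                                size_t size, double min_duration,
                                unsigned long *loops_out)
{
    unsigned long loops = 1;
    assert(f != NULL && size > 0 && min_duration > 0.0);
    for (;;) {
        double t0 = now_seconds();
        for (unsigned long i = 0; i < loops; i++)
            f(dst, src, size);
        double elapsed = now_seconds() - t0;
        if (elapsed > min_duration) {
            *loops_out = loops;
            return (double)size * (double)loops / elapsed;
        }
        loops *= 2;
    }
}

/* Volatile sink so the compiler cannot discard the copies. */
static volatile unsigned char sink;

/* Example workload (an assumption for this sketch): a plain memcpy. */
static void copy_workload(void *dst, const void *src, size_t size)
{
    memcpy(dst, src, size);
    sink = ((unsigned char *)dst)[size - 1];
}
```

A slow target such as a framebuffer exits after a handful of batches, while a fast one quickly reaches batch sizes where the two clock reads per batch are negligible.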
-
- Mar 29, 2016
-
-
Siarhei Siamashka authored
This simplifies the code a lot and allows different tables to be used for different use cases.
-
Siarhei Siamashka authored
Because some processors are sensitive to the order of memory accesses, add a few more variants of memory buffer backwards copy which do sequential memory writes in the forward direction inside each sub-block of a certain size. The most interesting sizes of such sub-blocks are 32 and 64 bytes, because they match the most frequently used CPU cache line sizes.

Example reports:

== ARM Cortex A7 ==
 C copy backwards                  :  266.5 MB/s
 C copy backwards (32 byte blocks) : 1015.6 MB/s
 C copy backwards (64 byte blocks) : 1045.7 MB/s
 C copy                            : 1033.3 MB/s

== ARM Cortex A15 ==
 C copy backwards                  : 1438.5 MB/s
 C copy backwards (32 byte blocks) : 1497.5 MB/s
 C copy backwards (64 byte blocks) : 2643.2 MB/s
 C copy                            : 2985.8 MB/s
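The access pattern of these variants can be illustrated with a byte-wise sketch. The function name `copy_backwards_blocks` and its signature are hypothetical, the buffers are assumed not to overlap, and a real benchmark routine would likely copy wider words than single bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copies size bytes from src to dst. Sub-blocks are visited from the
 * end of the buffer towards its start, but the bytes inside each
 * sub-block are written in the forward direction.
 * size must be a multiple of block_size.
 */
static void copy_backwards_blocks(unsigned char *dst,
                                  const unsigned char *src,
                                  size_t size, size_t block_size)
{
    assert(block_size > 0 && size % block_size == 0);
    size_t offset = size;
    while (offset > 0) {
        offset -= block_size;               /* step back one sub-block */
        for (size_t i = 0; i < block_size; i++)
            dst[offset + i] = src[offset + i];  /* forward inside it */
    }
}
```

With `block_size` set to 32 or 64, each sub-block corresponds to one cache line on CPUs that use those line sizes, provided the buffers are aligned accordingly.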
-
Siarhei Siamashka authored
This is expected to test the ability to do write combining for scattered writes and detect any possible performance penalties.

Example reports:

== ARM Cortex A7 ==
 C fill                                 : 4011.5 MB/s
 C fill (shuffle within 16 byte blocks) : 4112.2 MB/s (0.3%)
 C fill (shuffle within 32 byte blocks) :  333.9 MB/s
 C fill (shuffle within 64 byte blocks) :  336.6 MB/s

== ARM Cortex A15 ==
 C fill                                 : 6065.2 MB/s (0.4%)
 C fill (shuffle within 16 byte blocks) : 2152.0 MB/s
 C fill (shuffle within 32 byte blocks) : 2150.7 MB/s
 C fill (shuffle within 64 byte blocks) : 2238.2 MB/s

== ARM Cortex A53 ==
 C fill                                 : 3080.8 MB/s (0.2%)
 C fill (shuffle within 16 byte blocks) : 3080.7 MB/s
 C fill (shuffle within 32 byte blocks) : 3079.2 MB/s
 C fill (shuffle within 64 byte blocks) : 3080.4 MB/s

== Intel Atom N450 ==
 C fill                                 : 1554.9 MB/s
 C fill (shuffle within 16 byte blocks) : 1554.5 MB/s
 C fill (shuffle within 32 byte blocks) : 1553.9 MB/s
 C fill (shuffle within 64 byte blocks) : 1554.4 MB/s

See https://github.com/ssvb/tinymembench/issues/7
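A shuffled fill of this kind might look like the sketch below. The function name `fill_shuffled`, the 32-bit word size, and the particular in-block permutation `(i * 5 + 3) % block_words` are all assumptions chosen for illustration; the benchmark may order its writes differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Fills words 32-bit words of dst with value. Blocks are visited in
 * ascending order, but the words inside each block are written in a
 * scattered order. block_words must be a power of two, which makes
 * i -> (i * 5 + 3) % block_words a permutation (5 is odd), so every
 * word of the block is written exactly once.
 */
static void fill_shuffled(uint32_t *dst, uint32_t value,
                          size_t words, size_t block_words)
{
    assert(block_words > 0 && (block_words & (block_words - 1)) == 0);
    assert(words % block_words == 0);
    for (size_t base = 0; base < words; base += block_words)
        for (size_t i = 0; i < block_words; i++)
            dst[base + (i * 5 + 3) % block_words] = value;
}
```

Under these assumptions, `block_words` values of 4, 8 and 16 correspond to the 16, 32 and 64 byte blocks in the reports above.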
-
- Sep 24, 2013
-
-
Siarhei Siamashka authored
It is disabled by default and can only be activated by compiling the benchmark with -DBENCH_FRAMEBUFFER in CFLAGS. Basically it can be used to check how the processor handles uncached reads (assuming an integrated GPU and the framebuffer in the system memory).
-
- Jul 02, 2013
-
-
Siarhei Siamashka authored
Now we try to run two rounds of tests: one with huge pages explicitly disabled, and another one with huge pages enabled. Additionally, the minimum block size used for latency benchmarks is now 1024. Testing smaller blocks is just a waste of time.
-
- Jun 25, 2013
-
-
Siarhei Siamashka authored
Just select a random offset in order to mitigate the unpredictability of cache associativity effects when dealing with different physical memory fragmentation (for PIPT caches). We report the "best" measured latency, since some offsets may be better than others.
-
- Mar 23, 2013
-
-
Siarhei Siamashka authored
/tmp/ccej9DYL.s:47: Rd and Rm should be different in mla (repeated)
/tmp/ccej9DYL.s:754: Rd and Rm should be different in mla (repeated)
/tmp/ccej9DYL.s:720: Error: bad immediate value for offset (5328)
/tmp/ccej9DYL.s:724: Error: bad immediate value for offset (5316)
/tmp/ccej9DYL.s:725: Error: bad immediate value for offset (5316)

https://github.com/ssvb/tinymembench/issues/1
-
- Dec 26, 2012
-
-
Siarhei Siamashka authored
-
Siarhei Siamashka authored
-
Siarhei Siamashka authored
-
- Dec 23, 2012
-
-
Siarhei Siamashka authored
-
Siarhei Siamashka authored
-
- Apr 24, 2012
-
-
Siarhei Siamashka authored
Now the compilers should have no chance to mess up latency measurement loops by adding unwanted memory accesses.
-
Siarhei Siamashka authored
The compilers should be a bit less likely to spill variables to the stack.
-
- Oct 09, 2011
-
-
Siarhei Siamashka authored
-
Siarhei Siamashka authored
Checks how the processor can handle outstanding cache misses.
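This entry does not say how the check is performed. One common way to probe outstanding cache misses is to compare a single dependent pointer chase against two independent chases run in lockstep; the sketch below assumes that approach, and every name in it (`build_cycle`, `chase_single`, `chase_dual`) is hypothetical. Timing the two functions over a buffer much larger than the caches is left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Sattolo's algorithm: makes next[] one random cycle over n slots. */
static void build_cycle(size_t *next, size_t n, unsigned seed)
{
    assert(n > 0);
    srand(seed);
    for (size_t i = 0; i < n; i++)
        next[i] = i;
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;      /* j is in [0, i) */
        size_t tmp = next[i];
        next[i] = next[j];
        next[j] = tmp;
    }
}

/* One chain: every load depends on the result of the previous one. */
static size_t chase_single(const size_t *next, size_t start,
                           unsigned long steps)
{
    size_t p = start;
    while (steps--)
        p = next[p];
    return p;
}

/*
 * Two chains in lockstep: the two loads in each hop do not depend on
 * each other, so a CPU that supports it can keep both misses in
 * flight at the same time.
 */
static void chase_dual(const size_t *next, size_t *a, size_t *b,
                       unsigned long steps)
{
    size_t pa = *a, pb = *b;
    while (steps--) {
        pa = next[pa];
        pb = next[pb];
    }
    *a = pa;
    *b = pb;
}
```

If `chase_dual` takes about as long per hop as `chase_single`, the second miss is being overlapped with the first; if it takes about twice as long, the misses are effectively serialized.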
-
- Sep 12, 2011
-
-
Siarhei Siamashka authored
-
Siarhei Siamashka authored
-
- Sep 10, 2011
-
-
Siarhei Siamashka authored
-
Siarhei Siamashka authored
-
- Sep 09, 2011
-
-
Siarhei Siamashka authored
-
Siarhei Siamashka authored
-
Siarhei Siamashka authored
-
Siarhei Siamashka authored
-
Siarhei Siamashka authored
-
- Sep 08, 2011
-
-
Siarhei Siamashka authored
-
Siarhei Siamashka authored
-
Siarhei Siamashka authored
-
Siarhei Siamashka authored
-