- Mar 29, 2016
Siarhei Siamashka authored
Because some processors are sensitive to the order of memory accesses, add a few more variants of the memory buffer backwards copy, which do sequential memory writes in the forward direction inside each sub-block of a certain size. The most interesting sub-block sizes are 32 and 64 bytes, because they match the most frequently used CPU cache line sizes.

Example reports:

== ARM Cortex A7 ==

 C copy backwards                       :  266.5 MB/s
 C copy backwards (32 byte blocks)      : 1015.6 MB/s
 C copy backwards (64 byte blocks)      : 1045.7 MB/s
 C copy                                 : 1033.3 MB/s

== ARM Cortex A15 ==

 C copy backwards                       : 1438.5 MB/s
 C copy backwards (32 byte blocks)      : 1497.5 MB/s
 C copy backwards (64 byte blocks)      : 2643.2 MB/s
 C copy                                 : 2985.8 MB/s
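A minimal sketch of the 64-byte sub-block variant (the function name and loop shape here are illustrative assumptions, not necessarily the exact tinymembench code): the sub-blocks are visited back to front, but the eight 8-byte stores inside each sub-block go front to back, so every cache line is still filled by sequential writes. The 32-byte variant is the same idea with four stores per sub-block.

    #include <stdint.h>
    #include <stddef.h>

    /* size is in bytes and assumed to be a multiple of 64 */
    static void copy_backwards_bs64(int64_t *dst, int64_t *src, size_t size)
    {
        size_t i;
        dst += size / 8 - 8;   /* start at the last 64-byte sub-block */
        src += size / 8 - 8;
        for (i = 0; i < size / 64; i++)
        {
            dst[0] = src[0];   /* the stores inside one sub-block */
            dst[1] = src[1];   /* run in the forward direction... */
            dst[2] = src[2];
            dst[3] = src[3];
            dst[4] = src[4];
            dst[5] = src[5];
            dst[6] = src[6];
            dst[7] = src[7];
            dst -= 8;          /* ...but the sub-blocks themselves */
            src -= 8;          /* are visited back to front        */
        }
    }
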
Siarhei Siamashka authored
This is expected to test the ability to do write combining for scattered writes and to detect any possible performance penalties.

Example reports:

== ARM Cortex A7 ==

 C fill                                  : 4011.5 MB/s
 C fill (shuffle within 16 byte blocks)  : 4112.2 MB/s (0.3%)
 C fill (shuffle within 32 byte blocks)  :  333.9 MB/s
 C fill (shuffle within 64 byte blocks)  :  336.6 MB/s

== ARM Cortex A15 ==

 C fill                                  : 6065.2 MB/s (0.4%)
 C fill (shuffle within 16 byte blocks)  : 2152.0 MB/s
 C fill (shuffle within 32 byte blocks)  : 2150.7 MB/s
 C fill (shuffle within 64 byte blocks)  : 2238.2 MB/s

== ARM Cortex A53 ==

 C fill                                  : 3080.8 MB/s (0.2%)
 C fill (shuffle within 16 byte blocks)  : 3080.7 MB/s
 C fill (shuffle within 32 byte blocks)  : 3079.2 MB/s
 C fill (shuffle within 64 byte blocks)  : 3080.4 MB/s

== Intel Atom N450 ==

 C fill                                  : 1554.9 MB/s
 C fill (shuffle within 16 byte blocks)  : 1554.5 MB/s
 C fill (shuffle within 32 byte blocks)  : 1553.9 MB/s
 C fill (shuffle within 64 byte blocks)  : 1554.4 MB/s

See https://github.com/ssvb/tinymembench/issues/7
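A minimal sketch of the idea (assumed shape, not the verbatim tinymembench code): every 32-byte block is still written in full before moving on, but the four 8-byte stores inside it land in a permuted order, so a processor with a write-combining store buffer should still be able to merge them into whole cache line writes.

    #include <stdint.h>
    #include <stddef.h>

    /* size is in bytes and assumed to be a multiple of 32 */
    static void fill_shuffle_bs32(int64_t *dst, int64_t value, size_t size)
    {
        size_t i;
        for (i = 0; i < size / 32; i++)
        {
            dst[1] = value;   /* scattered order within the block, */
            dst[3] = value;   /* but all four stores together      */
            dst[0] = value;   /* still cover the whole 32 bytes    */
            dst[2] = value;
            dst += 4;
        }
    }
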
Siarhei Siamashka authored
The C compiler may attempt to reorder read and write operations when accessing the source and destination buffers, so instead of sequential memory accesses we may get something like a "drunk master style" memory access pattern. Certain processors, such as ARM Cortex-A7, do not like such a memory access pattern very much, and it causes a major performance drop. The actual access pattern is unpredictable and sensitive to the compiler version, the optimization flags, and sometimes even to changes in unrelated parts of the source code. So use the volatile keyword for the destination pointer to resolve this problem and make the C benchmarks more deterministic.

See https://github.com/ssvb/tinymembench/issues/7
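A minimal sketch of the fix (the function name and signature here are illustrative): qualifying the destination pointer as volatile obliges the compiler to perform the stores exactly in program order, so the writes stay sequential regardless of the compiler version or optimization flags.

    #include <stdint.h>
    #include <stddef.h>

    static void copy_with_volatile_dst(int64_t *dst_, int64_t *src, size_t size)
    {
        /* volatile: each store must actually be emitted, in program
         * order, and may not be merged or reordered by the compiler */
        volatile int64_t *dst = dst_;
        size_t i;
        for (i = 0; i < size / 8; i++)
            *dst++ = *src++;
    }
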
Siarhei Siamashka authored
The old variant was just a copy of aligned_block_copy_pf32.
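For context, the pf32/pf64 copy variants differ in how often they issue a prefetch hint for the source buffer; a sketch of what a distinct 64-byte variant can look like, assuming GCC's __builtin_prefetch (the function name and prefetch distance are assumptions):

    #include <stdint.h>
    #include <stddef.h>

    /* one prefetch hint per 64 bytes copied; a pf32-style variant
     * would issue two hints per iteration, one per 32-byte half */
    static void copy_pf64(int64_t *dst, int64_t *src, size_t size)
    {
        size_t i;
        for (i = 0; i < size / 64; i++)
        {
            __builtin_prefetch(src + 32, 0, 0);  /* read 256 bytes ahead */
            dst[0] = src[0];
            dst[1] = src[1];
            dst[2] = src[2];
            dst[3] = src[3];
            dst[4] = src[4];
            dst[5] = src[5];
            dst[6] = src[6];
            dst[7] = src[7];
            dst += 8;
            src += 8;
        }
    }
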
- Sep 10, 2011
Siarhei Siamashka authored
- Sep 09, 2011
Siarhei Siamashka authored
- Sep 08, 2011
Siarhei Siamashka authored
Siarhei Siamashka authored