- Mar 29, 2016
Siarhei Siamashka authored
Because some processors are sensitive to the order of memory accesses, add a few more variants of the memory buffer backwards copy, which do sequential memory writes in the forward direction inside each sub-block of a certain size. The most interesting sub-block sizes are 32 and 64 bytes, because they match the most frequently used CPU cache line sizes.

Example reports:

== ARM Cortex A7 ==

 C copy backwards                       :  266.5 MB/s
 C copy backwards (32 byte blocks)      : 1015.6 MB/s
 C copy backwards (64 byte blocks)      : 1045.7 MB/s
 C copy                                 : 1033.3 MB/s

== ARM Cortex A15 ==

 C copy backwards                       : 1438.5 MB/s
 C copy backwards (32 byte blocks)      : 1497.5 MB/s
 C copy backwards (64 byte blocks)      : 2643.2 MB/s
 C copy                                 : 2985.8 MB/s
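A minimal sketch of the 64-byte sub-block variant (the function name and loop shape here are illustrative assumptions, not necessarily the exact tinymembench code): the sub-blocks are visited back to front, but the eight 8-byte stores inside each sub-block go front to back, so every cache line is still filled by sequential writes. The 32-byte variant is the same idea with four stores per sub-block.

    #include <stdint.h>
    #include <stddef.h>

    /* size is in bytes and assumed to be a multiple of 64 */
    static void copy_backwards_bs64(int64_t *dst, int64_t *src, size_t size)
    {
        size_t i;
        dst += size / 8 - 8;   /* start at the last 64-byte sub-block */
        src += size / 8 - 8;
        for (i = 0; i < size / 64; i++)
        {
            dst[0] = src[0];   /* the stores inside one sub-block */
            dst[1] = src[1];   /* run in the forward direction... */
            dst[2] = src[2];
            dst[3] = src[3];
            dst[4] = src[4];
            dst[5] = src[5];
            dst[6] = src[6];
            dst[7] = src[7];
            dst -= 8;          /* ...but the sub-blocks themselves */
            src -= 8;          /* are visited back to front        */
        }
    }
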
Siarhei Siamashka authored
This is expected to test the ability to do write combining for scattered writes and to detect any possible performance penalties.

Example reports:

== ARM Cortex A7 ==

 C fill                                  : 4011.5 MB/s
 C fill (shuffle within 16 byte blocks)  : 4112.2 MB/s (0.3%)
 C fill (shuffle within 32 byte blocks)  :  333.9 MB/s
 C fill (shuffle within 64 byte blocks)  :  336.6 MB/s

== ARM Cortex A15 ==

 C fill                                  : 6065.2 MB/s (0.4%)
 C fill (shuffle within 16 byte blocks)  : 2152.0 MB/s
 C fill (shuffle within 32 byte blocks)  : 2150.7 MB/s
 C fill (shuffle within 64 byte blocks)  : 2238.2 MB/s

== ARM Cortex A53 ==

 C fill                                  : 3080.8 MB/s (0.2%)
 C fill (shuffle within 16 byte blocks)  : 3080.7 MB/s
 C fill (shuffle within 32 byte blocks)  : 3079.2 MB/s
 C fill (shuffle within 64 byte blocks)  : 3080.4 MB/s

== Intel Atom N450 ==

 C fill                                  : 1554.9 MB/s
 C fill (shuffle within 16 byte blocks)  : 1554.5 MB/s
 C fill (shuffle within 32 byte blocks)  : 1553.9 MB/s
 C fill (shuffle within 64 byte blocks)  : 1554.4 MB/s

See https://github.com/ssvb/tinymembench/issues/7
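A minimal sketch of the idea (assumed shape, not the verbatim tinymembench code): every 32-byte block is still written in full before moving on, but the four 8-byte stores inside it land in a permuted order, so a processor with a write-combining store buffer should still be able to merge them into whole cache line writes.

    #include <stdint.h>
    #include <stddef.h>

    /* size is in bytes and assumed to be a multiple of 32 */
    static void fill_shuffle_bs32(int64_t *dst, int64_t value, size_t size)
    {
        size_t i;
        for (i = 0; i < size / 32; i++)
        {
            dst[1] = value;   /* scattered order within the block, */
            dst[3] = value;   /* but all four stores together      */
            dst[0] = value;   /* still cover the whole 32 bytes    */
            dst[2] = value;
            dst += 4;
        }
    }
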
Siarhei Siamashka authored
The C compiler may attempt to reorder read and write operations when accessing the source and destination buffers, so instead of sequential memory accesses we may get something like a "drunk master style" memory access pattern. Certain processors, such as ARM Cortex-A7, do not like such a memory access pattern very much, and it causes a major performance drop. The actual access pattern is unpredictable and sensitive to the compiler version, the optimization flags, and sometimes even to changes in unrelated parts of the source code. So use the volatile keyword for the destination pointer to resolve this problem and make the C benchmarks more deterministic.

See https://github.com/ssvb/tinymembench/issues/7
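A minimal sketch of the fix (the function name and signature here are illustrative): qualifying the destination pointer as volatile obliges the compiler to perform the stores exactly in program order, so the writes stay sequential regardless of the compiler version or optimization flags.

    #include <stdint.h>
    #include <stddef.h>

    static void copy_with_volatile_dst(int64_t *dst_, int64_t *src, size_t size)
    {
        /* volatile: each store must actually be emitted, in program
         * order, and may not be merged or reordered by the compiler */
        volatile int64_t *dst = dst_;
        size_t i;
        for (i = 0; i < size / 8; i++)
            *dst++ = *src++;
    }
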
Siarhei Siamashka authored
The old variant was just a copy of aligned_block_copy_pf32.
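For context, the pf32/pf64 copy variants differ in how often they issue a prefetch hint for the source buffer; a sketch of what a distinct 64-byte variant can look like, assuming GCC's __builtin_prefetch (the function name and prefetch distance are assumptions):

    #include <stdint.h>
    #include <stddef.h>

    /* one prefetch hint per 64 bytes copied; a pf32-style variant
     * would issue two hints per iteration, one per 32-byte half */
    static void copy_pf64(int64_t *dst, int64_t *src, size_t size)
    {
        size_t i;
        for (i = 0; i < size / 64; i++)
        {
            __builtin_prefetch(src + 32, 0, 0);  /* read 256 bytes ahead */
            dst[0] = src[0];
            dst[1] = src[1];
            dst[2] = src[2];
            dst[3] = src[3];
            dst[4] = src[4];
            dst[5] = src[5];
            dst[6] = src[6];
            dst[7] = src[7];
            dst += 8;
            src += 8;
        }
    }
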
- Sep 10, 2011
Siarhei Siamashka authored
- Sep 09, 2011
Siarhei Siamashka authored
- Sep 08, 2011
Siarhei Siamashka authored
Siarhei Siamashka authored