Commits · eb1fccd5f5a7538dc54a9d1b59b6fe253bb4fbcf · Grant Kim / tinymembench

Mar 29, 2016

New variants of block based C backwards copy · eb1fccd5

Because some processors are sensitive to the order of memory
accesses, add a few more variants of the backwards memory buffer
copy that perform sequential memory writes in the forward direction
within each sub-block of a given size. The most interesting
sub-block sizes are 32 and 64 bytes, because they match
the most frequently used CPU cache line sizes.

Example reports:

== ARM Cortex A7 ==
 C copy backwards                                     :    266.5 MB/s
 C copy backwards (32 byte blocks)                    :   1015.6 MB/s
 C copy backwards (64 byte blocks)                    :   1045.7 MB/s
 C copy                                               :   1033.3 MB/s

== ARM Cortex A15 ==
 C copy backwards                                     :   1438.5 MB/s
 C copy backwards (32 byte blocks)                    :   1497.5 MB/s
 C copy backwards (64 byte blocks)                    :   2643.2 MB/s
 C copy                                               :   2985.8 MB/s
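The access pattern described in this commit message can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the function name `copy_backwards_blocks` and its parameters are hypothetical, and it assumes 8-byte aligned, non-overlapping buffers whose size is a multiple of the sub-block size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Copy `size` bytes from src to dst. Sub-blocks are visited from the end
 * of the buffer towards the start, but the 64-bit words inside each
 * sub-block are written in ascending address order.
 * Assumptions (not from the original commit): `block` is a multiple of 8,
 * `size` is a multiple of `block`, and the buffers do not overlap.
 */
static void copy_backwards_blocks(int64_t *dst, const int64_t *src,
                                  size_t size, size_t block)
{
    assert(block >= sizeof(int64_t) && block % sizeof(int64_t) == 0);
    assert(size % block == 0);

    size_t words_per_block = block / sizeof(int64_t);
    size_t nblocks = size / block;

    while (nblocks-- > 0) {
        /* nblocks now holds the index of the sub-block to copy */
        const int64_t *s = src + nblocks * words_per_block;
        int64_t *d = dst + nblocks * words_per_block;
        for (size_t i = 0; i < words_per_block; i++)
            d[i] = s[i];    /* forward order inside the sub-block */
    }
}
```

Calling it with `block` set to 32 or 64 would correspond to the two new report lines, under the assumptions stated above.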


Benchmark reshuffled writes to the destination buffer · ada1db8c

Siarhei Siamashka authored 8 years ago

This is expected to test the ability to do write combining for
scattered writes and detect any possible performance penalties.

Example reports:

== ARM Cortex A7 ==
 C fill                                               :   4011.5 MB/s
 C fill (shuffle within 16 byte blocks)               :   4112.2 MB/s (0.3%)
 C fill (shuffle within 32 byte blocks)               :    333.9 MB/s
 C fill (shuffle within 64 byte blocks)               :    336.6 MB/s

== ARM Cortex A15 ==
 C fill                                               :   6065.2 MB/s (0.4%)
 C fill (shuffle within 16 byte blocks)               :   2152.0 MB/s
 C fill (shuffle within 32 byte blocks)               :   2150.7 MB/s
 C fill (shuffle within 64 byte blocks)               :   2238.2 MB/s

== ARM Cortex A53 ==
 C fill                                               :   3080.8 MB/s (0.2%)
 C fill (shuffle within 16 byte blocks)               :   3080.7 MB/s
 C fill (shuffle within 32 byte blocks)               :   3079.2 MB/s
 C fill (shuffle within 64 byte blocks)               :   3080.4 MB/s

== Intel Atom N450 ==
 C fill                                               :   1554.9 MB/s
 C fill (shuffle within 16 byte blocks)               :   1554.5 MB/s
 C fill (shuffle within 32 byte blocks)               :   1553.9 MB/s
 C fill (shuffle within 64 byte blocks)               :   1554.4 MB/s

See https://github.com/ssvb/tinymembench/issues/7
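A shuffled fill of this kind might look like the sketch below. It is only an illustration under stated assumptions: the function name `fill_shuffled` and the permutation passed in `order` are hypothetical, and the exact store order used by the benchmark is not given in the commit message. The destination is declared `volatile` here so that the compiler keeps the stores in the written order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Fill `size` bytes at dst with `value`, writing the 64-bit words of each
 * block in the order given by `order` (a permutation of
 * 0..words_per_block-1). The permutation is an assumption made for this
 * sketch; it is not taken from the benchmark's source.
 */
static void fill_shuffled(volatile int64_t *dst, int64_t value, size_t size,
                          const unsigned *order, size_t words_per_block)
{
    size_t nwords = size / sizeof(int64_t);

    assert(words_per_block > 0 && nwords % words_per_block == 0);

    for (size_t base = 0; base < nwords; base += words_per_block)
        for (size_t i = 0; i < words_per_block; i++)
            dst[base + order[i]] = value;   /* scattered within the block */
}
```

For a 32-byte block one could pass, for example, `order = {2, 0, 3, 1}` with `words_per_block = 4`.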


Sep 24, 2013

Experimental code for benchmarking the framebuffer (on Linux) · 4e0b0949

Siarhei Siamashka authored 11 years ago

It is disabled by default and can only be activated by compiling
the benchmark with -DBENCH_FRAMBUFFER in CFLAGS.

It can be used to check how well the processor handles
uncached reads (assuming an integrated GPU with the framebuffer
in system memory).
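For context, mapping a Linux framebuffer into the process so that it can be read like any other buffer might be done as sketched below. This is an assumption about the general approach, not the code from this commit: the helper name `map_framebuffer` is hypothetical, and the device path (typically something like /dev/fb0) is supplied by the caller.

```c
#define _DEFAULT_SOURCE 1   /* expose POSIX/Linux declarations in strict mode */
#include <assert.h>
#include <fcntl.h>
#include <linux/fb.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map the framebuffer device at `path` and store its size in *size_out.
 * Returns NULL (leaving *size_out untouched) if the device cannot be
 * opened, queried, or mapped.
 */
static void *map_framebuffer(const char *path, size_t *size_out)
{
    struct fb_fix_screeninfo finfo;
    void *p;
    int fd;

    assert(path != NULL && size_out != NULL);

    fd = open(path, O_RDWR);
    if (fd < 0)
        return NULL;

    /* Ask the driver for the length of the framebuffer memory */
    if (ioctl(fd, FBIOGET_FSCREENINFO, &finfo) < 0 || finfo.smem_len == 0) {
        close(fd);
        return NULL;
    }

    p = mmap(NULL, finfo.smem_len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);              /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return NULL;

    *size_out = finfo.smem_len;
    return p;
}
```

The returned pointer could then be used as the source buffer of a copy benchmark; whether reads from it are actually uncached depends on the driver and platform.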


Jul 02, 2013

Support for Transparent Huge Pages in the latency benchmark · 7e9db85f

Siarhei Siamashka authored 11 years ago

We now try to run two rounds of the test: one with huge pages
explicitly disabled, and another with huge pages enabled.

Additionally, the minimum block size used for latency benchmarks
is now 1024 bytes. Testing smaller blocks is just a waste of time.
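On Linux, one common way to request or forbid Transparent Huge Pages for a buffer is `madvise` with `MADV_HUGEPAGE` or `MADV_NOHUGEPAGE`. The sketch below shows that idea only; the helper name `alloc_with_thp_hint` and the 2 MiB alignment are assumptions, and the commit message does not say which mechanism the benchmark itself uses.

```c
#define _DEFAULT_SOURCE 1   /* expose madvise() and MADV_* in strict mode */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

#define HUGE_PAGE_ALIGN ((size_t)2 * 1024 * 1024)  /* assumed 2 MiB pages */

/*
 * Allocate `size` bytes aligned to an assumed huge page size and hint the
 * kernel to use (want_huge != 0) or avoid (want_huge == 0) huge pages.
 * The hint is advisory: its failure is ignored in this sketch.
 * Free the result with free().
 */
static void *alloc_with_thp_hint(size_t size, int want_huge)
{
    void *buf = NULL;

    assert(size > 0);

    if (posix_memalign(&buf, HUGE_PAGE_ALIGN, size) != 0)
        return NULL;

#if defined(MADV_HUGEPAGE) && defined(MADV_NOHUGEPAGE)
    (void)madvise(buf, size, want_huge ? MADV_HUGEPAGE : MADV_NOHUGEPAGE);
#endif

    memset(buf, 0, size);   /* touch the pages so they are really mapped */
    return buf;
}
```

A latency test could then be run once on a buffer allocated with `want_huge = 0` and once with `want_huge = 1`, mirroring the two rounds described above.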


Jun 25, 2013

Reduce the effects of cache associativity in the latency test · 009150a5

Siarhei Siamashka authored 11 years ago

Just select a random offset in order to mitigate the unpredictability
of cache associativity effects when dealing with different physical
memory fragmentation (for PIPT caches). We report the "best"
measured latency, as some offsets may be better than others.
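The idea of trying random offsets and keeping the best result can be sketched as below. Everything here is illustrative: `best_latency`, the `latency_fn` callback type, the 64-byte offset granularity, and the `demo_measure` stand-in are assumptions made for the example and do not come from the benchmark's source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical callback: returns a latency figure for the given buffer */
typedef double (*latency_fn)(char *buf, size_t size);

/*
 * Run `measure` for `rounds` rounds, each time at a random 64-byte aligned
 * offset inside an over-allocated pool, and return the lowest result.
 */
static double best_latency(char *pool, size_t pool_size, size_t test_size,
                           int rounds, latency_fn measure)
{
    double best = -1.0;
    size_t slack;

    assert(pool_size >= test_size && rounds > 0);
    slack = pool_size - test_size;

    for (int r = 0; r < rounds; r++) {
        /* offset is a multiple of 64 and never exceeds the slack */
        size_t off = ((size_t)rand() % (slack / 64 + 1)) * 64;
        double t = measure(pool + off, test_size);
        if (best < 0.0 || t < best)
            best = t;
    }
    return best;
}

/* Stand-in for a real measurement: the result depends on the address only */
static double demo_measure(char *buf, size_t size)
{
    (void)size;
    return 100.0 + (double)(((uintptr_t)buf / 64) % 8);
}
```

In a real test, `demo_measure` would be replaced by an actual pointer-chasing loop timed over the buffer.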


Mar 23, 2013

Fixed build problems when compiling for armv4/armv5 · 40ad46a5

Siarhei Siamashka authored 11 years ago

/tmp/ccej9DYL.s:47: Rd and Rm should be different in mla (repeated)
/tmp/ccej9DYL.s:754: Rd and Rm should be different in mla (repeated)
/tmp/ccej9DYL.s:720: Error: bad immediate value for offset (5328)
/tmp/ccej9DYL.s:724: Error: bad immediate value for offset (5316)
/tmp/ccej9DYL.s:725: Error: bad immediate value for offset (5316)

https://github.com/ssvb/tinymembench/issues/1


Dec 26, 2012
- Fix a typo · 53c978f1
  Siarhei Siamashka authored 12 years ago
  
- Rename to 'tinymembench' and v0.2 release · 42afc20c
  Siarhei Siamashka authored 12 years ago
  
- More explanations for the latency test and improved accuracy · 72a70ed0
  Siarhei Siamashka authored 12 years ago
  
Dec 23, 2012
- Unrolled NEON copy · 65b40986
  Siarhei Siamashka authored 12 years ago
  
- Stddev calculation for memory bandwidth tests · 6a910402
  Siarhei Siamashka authored 12 years ago
  
Apr 24, 2012
- ARM inline assembly for the latency measurement code · a669340d
  Siarhei Siamashka authored 12 years ago
  
  Now the compilers should have no chance to mess up latency measurement loops by adding unwanted memory accesses.
- Some tweaks for the latency measurement code · 06d2bda1
  Siarhei Siamashka authored 12 years ago
  
  The compilers should be a bit less likely to spill variables to stack.
Oct 09, 2011
- Only run the bandwidth benchmarks for non-aliased buffers · b7e356e6
  Siarhei Siamashka authored 13 years ago
  
- Added benchmark for simultaneous random reads · a0ef5ceb
  Siarhei Siamashka authored 13 years ago
  
  Checks how the processor can handle outstanding cache misses.
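One way to picture "simultaneous random reads" is two independent pointer-chasing chains advanced in the same loop, so that two cache misses can be in flight at once. The sketch below illustrates that idea only; the names `build_cycle` and `chase_two` are hypothetical, and the benchmark's own address generation may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Fill next[0..n-1] with a random permutation that forms a single cycle
 * (Sattolo's algorithm), so that chasing next[] visits every slot.
 */
static void build_cycle(size_t *next, size_t n)
{
    assert(n > 0);

    for (size_t i = 0; i < n; i++)
        next[i] = i;
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;      /* 0 <= j < i */
        size_t tmp = next[i];
        next[i] = next[j];
        next[j] = tmp;
    }
}

/*
 * Advance two chains that do not depend on each other. Each load depends
 * only on the previous load of its own chain, so the processor is free to
 * overlap the two misses.
 */
static size_t chase_two(const size_t *next, size_t a, size_t b, size_t steps)
{
    while (steps--) {
        a = next[a];
        b = next[b];
    }
    return a ^ b;   /* returned so the loads cannot be optimised away */
}
```

Timing `chase_two` against a single-chain loop over the same buffer would indicate how well outstanding misses overlap.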
Sep 12, 2011
- The first v0.1 release · 37ef25ca
  Siarhei Siamashka authored 13 years ago
  
- Support for compilation with mingw32 · 0534ceaa
  Siarhei Siamashka authored 13 years ago
  
Sep 10, 2011
- Fixes for correct build with C++ · 25a2e190
  Siarhei Siamashka authored 13 years ago
  
- Text messages updated · 2a0e8944
  Siarhei Siamashka authored 13 years ago
  
Sep 09, 2011
- Added SSE2 assembly · 01357ae8
  Siarhei Siamashka authored 13 years ago
  
- Added empty stub for cpu specific assembly optimizations · 95b9f92e
  Siarhei Siamashka authored 13 years ago
  
- Use at least 1 second for each individual benchmark · 22152e77
  Siarhei Siamashka authored 13 years ago
  
- Typo fix (don't fill via temp buffer) · b91d4f7a
  Siarhei Siamashka authored 13 years ago
  
- More unrolling, code is compilable with C++, two types of prefetch · aa7fa0c8
  Siarhei Siamashka authored 13 years ago
  
Sep 08, 2011
- Memory latency benchmark · fa38a419
  Siarhei Siamashka authored 13 years ago
  
- Added benchmarks for fill operation and standard memcpy/memset · b92d370b
  Siarhei Siamashka authored 13 years ago
  
- More descriptive messages · dca55e4a
  Siarhei Siamashka authored 13 years ago
  
- Some initial memory bandwidth benchmark code · 1fb82763
  Siarhei Siamashka authored 13 years ago
  