- Jul 06, 2013
-
Siarhei Siamashka authored
Use a fixed prefetch distance of 512 bytes for everything. Also make sure that a 32-byte prefetch step really means a single PLD instruction executed per 32-byte data chunk, and likewise for the 64-byte prefetch step. We don't care too much about achieving peak performance; consistency and predictability are more important.
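The pattern described above can be sketched in C (the benchmark itself uses NEON assembly; this is only an illustration, and `read_with_prefetch` is a hypothetical name). GCC and Clang lower `__builtin_prefetch` to PLD on ARM:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative sketch, not the benchmark's NEON code: read a buffer with a
 * fixed 512-byte prefetch distance, issuing exactly one prefetch per
 * 32-byte data chunk. */
#define PREFETCH_DISTANCE 512
#define CHUNK 32

uint64_t read_with_prefetch(const uint8_t *buf, size_t size)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < size; i += CHUNK) {
        /* one PLD per 32-byte chunk; prefetching past the end is harmless,
         * prefetch instructions do not fault */
        __builtin_prefetch(buf + i + PREFETCH_DISTANCE, 0, 0);
        for (size_t j = 0; j < CHUNK; j += 8) {
            uint64_t v;
            memcpy(&v, buf + i + j, 8);   /* touch every 8 bytes */
            sum += v;
        }
    }
    return sum;
}
```

The sum is returned only so the reads cannot be optimized away.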
-
Siarhei Siamashka authored
This benchmark exposes some problems or misconfiguration in the Allwinner A20 (Cortex-A7) memory subsystem. If we do reads from two separate buffers at once, the performance drops quite significantly. It is documented that the automatic prefetcher in Cortex-A7 can only track a single data stream: http://infocenter.arm.com/help/topic/com.arm.doc.ddi0464f/CHDIGCEB.html

However there is still something wrong even when using explicit PLD prefetch. Here are some results:

    NEON read                                           : 1256.7 MB/s
    NEON read prefetched (32 bytes step)                : 1346.4 MB/s (0.4%)
    NEON read prefetched (64 bytes step)                : 1439.6 MB/s (0.4%)
    NEON read 2 data streams                            :  371.3 MB/s (0.3%)
    NEON read 2 data streams prefetched (32 bytes step) :  687.5 MB/s (0.5%)
    NEON read 2 data streams prefetched (64 bytes step) :  703.2 MB/s (0.4%)

Normally we would expect the memory bandwidth to remain roughly the same no matter how many data streams we are reading at once, but even reading two data streams is enough to demonstrate serious problems. Being able to read from multiple data streams simultaneously and efficiently is important for 2D graphics (alpha blending), colorspace conversion (planar YUV -> packed RGB) and many other things.
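The two-stream access pattern can be sketched as follows (a hypothetical C helper, not the benchmark's actual NEON code): interleave 32-byte chunks from two buffers while prefetching both streams a fixed distance ahead.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sketch of the two-stream read pattern: alternate 32-byte chunks from
 * buffers 'a' and 'b', prefetching both streams 512 bytes ahead. */
uint64_t read_two_streams(const uint8_t *a, const uint8_t *b, size_t size)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < size; i += 32) {
        __builtin_prefetch(a + i + 512, 0, 0);
        __builtin_prefetch(b + i + 512, 0, 0);
        for (size_t j = 0; j < 32; j += 8) {
            uint64_t va, vb;
            memcpy(&va, a + i + j, 8);
            memcpy(&vb, b + i + j, 8);
            sum += va + vb;
        }
    }
    return sum;
}
```

On a memory subsystem behaving as expected, this loop should sustain roughly the same total bandwidth as a single-stream read; the numbers above show that on the A20 it does not.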
-
- Jul 02, 2013
-
Siarhei Siamashka authored
-
Siarhei Siamashka authored
-
Siarhei Siamashka authored
Now we run two rounds of tests: one with huge pages explicitly disabled, and another with huge pages enabled. Additionally, the minimal block size used for latency benchmarks is now 1024 bytes; testing smaller blocks is just a waste of time.
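On Linux, the two rounds could be set up with `madvise()`, which controls transparent huge pages per mapping. This is only a sketch under that assumption (`alloc_buffer` is a hypothetical helper; tinymembench's actual implementation may differ):

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Allocate an anonymous buffer and ask the kernel to either use or avoid
 * transparent huge pages for it (Linux-specific). The madvise() hint may
 * be ignored on kernels without THP support, so its result is not checked. */
static void *alloc_buffer(size_t size, int use_huge_pages)
{
    void *buf = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return NULL;
    madvise(buf, size, use_huge_pages ? MADV_HUGEPAGE : MADV_NOHUGEPAGE);
    return buf;
}
```

Running the same latency test against both buffers separates TLB effects (which huge pages reduce) from the raw memory latency.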
-
- Jun 25, 2013
-
Siarhei Siamashka authored
Just select a random offset in order to mitigate the unpredictability of cache associativity effects under different physical memory fragmentation (for PIPT caches). Since we report the "best" measured latency, some offsets may be better than others.
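The idea can be sketched as follows (`pick_random_offset` is a hypothetical helper, and the 64-byte cache line size is an assumption): start each run at a random cache-line-aligned offset inside an oversized buffer, so the working set does not always land on the same cache sets.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define CACHE_LINE 64

/* Return a random cache-line-aligned start address for a test region of
 * 'testsize' bytes inside a buffer of 'bufsize' bytes (bufsize > testsize). */
uint8_t *pick_random_offset(uint8_t *buf, size_t bufsize, size_t testsize)
{
    size_t slack = bufsize - testsize;   /* room to slide the region around */
    if (slack < CACHE_LINE)
        return buf;
    size_t offset = ((size_t)rand() % (slack / CACHE_LINE)) * CACHE_LINE;
    return buf + offset;
}
```

Repeating the measurement at several random offsets and keeping the best result filters out the unlucky placements.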
-
- Mar 23, 2013
-
Siarhei Siamashka authored
    /tmp/ccej9DYL.s:47: Rd and Rm should be different in mla (repeated)
    /tmp/ccej9DYL.s:754: Rd and Rm should be different in mla (repeated)
    /tmp/ccej9DYL.s:720: Error: bad immediate value for offset (5328)
    /tmp/ccej9DYL.s:724: Error: bad immediate value for offset (5316)
    /tmp/ccej9DYL.s:725: Error: bad immediate value for offset (5316)

https://github.com/ssvb/tinymembench/issues/1
-
- Jan 03, 2013
-
Siarhei Siamashka authored
-
- Dec 26, 2012
-
Siarhei Siamashka authored
-
Siarhei Siamashka authored
-
Siarhei Siamashka authored
-
Siarhei Siamashka authored
-
Siarhei Siamashka authored
-
Siarhei Siamashka authored
-
Siarhei Siamashka authored
-
- Dec 23, 2012
-
Siarhei Siamashka authored
-
Siarhei Siamashka authored
-
Siarhei Siamashka authored
-
- Nov 14, 2012
-
Siarhei Siamashka authored
incr - PLD prefetch hits the first bytes of cache lines
wrap - PLD prefetch hits the last bytes of cache lines

This can expose Raspberry Pi performance issues related to wrap cache linefills.
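The two placement strategies can be sketched in C (hypothetical helpers using GCC's `__builtin_prefetch`, which maps to PLD on ARM; the 64-byte line size is an assumption): both variants pull in the same cache lines, but "wrap" touches the last byte of each line, so the linefill burst starts from the end of the line.

```c
#define LINE 64

/* "incr": prefetch the first byte of every cache line.
 * Returns the number of prefetches issued, for illustration only. */
unsigned long prefetch_incr(const unsigned char *p, unsigned long bytes)
{
    unsigned long n = 0;
    for (unsigned long i = 0; i < bytes; i += LINE, n++)
        __builtin_prefetch(p + i, 0, 0);
    return n;
}

/* "wrap": prefetch the last byte of every cache line, so the memory
 * controller sees a wrapping (critical-word-last) burst. */
unsigned long prefetch_wrap(const unsigned char *p, unsigned long bytes)
{
    unsigned long n = 0;
    for (unsigned long i = 0; i < bytes; i += LINE, n++)
        __builtin_prefetch(p + i + LINE - 1, 0, 0);
    return n;
}
```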
-
- Apr 24, 2012
-
Siarhei Siamashka authored
-
Siarhei Siamashka authored
-
Siarhei Siamashka authored
Now the compilers should have no chance to mess up the latency measurement loops by adding unwanted memory accesses.
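A common way to make a latency loop compiler-proof is a dependent pointer chase, where each load's result is the next load's address; a minimal sketch of that technique (not necessarily how tinymembench implements it):

```c
/* Walk a chain of pointers for 'iterations' steps. Because each load feeds
 * the next address, the compiler cannot reorder, merge, or invent memory
 * accesses without changing the observable result. The caller must prepare
 * a closed cycle of pointers. */
void *chase(void *p, long iterations)
{
    while (iterations-- > 0)
        p = *(void **)p;
    return p;   /* returned so the chain is not optimized away */
}
```

Timing this loop divided by the iteration count gives the average load-to-use latency for the chain's stride pattern.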
-
Siarhei Siamashka authored
The compilers should be a bit less likely to spill variables to the stack.
-
- Oct 09, 2011
-
Siarhei Siamashka authored
-
Siarhei Siamashka authored
Checks how well the processor can handle multiple outstanding cache misses.
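One way such a test can work (a sketch of the general technique, not necessarily tinymembench's exact code) is to walk two independent pointer chains in lockstep: a core that can keep several cache misses in flight overlaps the two loads, while a core that serializes misses takes roughly twice as long per step as a single chain.

```c
/* Walk two independent pointer chains in lockstep. 'pa' and 'pb' point to
 * the current position in each chain and are updated in place. The two
 * loads per iteration have no data dependency on each other, so they can
 * miss the cache concurrently if the core supports it. */
void chase2(void **pa, void **pb, long iterations)
{
    void *a = *pa, *b = *pb;
    while (iterations-- > 0) {
        a = *(void **)a;
        b = *(void **)b;
    }
    *pa = a;   /* write back so the work is not optimized away */
    *pb = b;
}
```

Comparing the per-step time of `chase2` against a single-chain walk reveals the degree of memory-level parallelism.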
-
- Sep 12, 2011
-
Siarhei Siamashka authored
-
Siarhei Siamashka authored
-
Siarhei Siamashka authored
-
Siarhei Siamashka authored
-
- Sep 10, 2011
-
Siarhei Siamashka authored
-
Siarhei Siamashka authored
-
Siarhei Siamashka authored
-
Siarhei Siamashka authored
-
- Sep 09, 2011
-
Siarhei Siamashka authored
-
Siarhei Siamashka authored
-
Siarhei Siamashka authored
-
Siarhei Siamashka authored
-
Siarhei Siamashka authored
-
Siarhei Siamashka authored
-
Siarhei Siamashka authored
-