  Mar 29, 2016
    • New variants of block based C backwards copy · eb1fccd5
      Siarhei Siamashka authored
      Because some processors are sensitive to the order of memory
      accesses, add a few more variants of the memory buffer backwards
      copy, which do sequential memory writes in the forward direction
      within each sub-block of a certain size. The most interesting
      sub-block sizes are 32 and 64 bytes, because they match the most
      commonly used CPU cache line sizes (a sketch of the idea follows
      the example reports below).
      
      Example reports:
      
      == ARM Cortex A7 ==
       C copy backwards                                     :    266.5 MB/s
       C copy backwards (32 byte blocks)                    :   1015.6 MB/s
       C copy backwards (64 byte blocks)                    :   1045.7 MB/s
       C copy                                               :   1033.3 MB/s
      
      == ARM Cortex A15 ==
       C copy backwards                                     :   1438.5 MB/s
       C copy backwards (32 byte blocks)                    :   1497.5 MB/s
       C copy backwards (64 byte blocks)                    :   2643.2 MB/s
       C copy                                               :   2985.8 MB/s
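
      A minimal sketch of the idea, assuming 8-byte-aligned buffers whose
      size is a multiple of 64; the function name and exact unrolling are
      illustrative and need not match the actual tinymembench code:

          #include <stdint.h>
          #include <stddef.h>

          /* Walk the buffer backwards in 64-byte sub-blocks, but perform
           * the eight 64-bit writes inside each sub-block in the forward
           * direction. */
          static void copy_backwards_bs64(int64_t *dst, int64_t *src, size_t size)
          {
              int64_t *d = dst + size / sizeof(int64_t);
              int64_t *s = src + size / sizeof(int64_t);

              while (s > src)
              {
                  d -= 8;          /* step one 64-byte sub-block backwards */
                  s -= 8;
                  d[0] = s[0];     /* ... then write it out forwards */
                  d[1] = s[1];
                  d[2] = s[2];
                  d[3] = s[3];
                  d[4] = s[4];
                  d[5] = s[5];
                  d[6] = s[6];
                  d[7] = s[7];
              }
          }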
    • Benchmark reshuffled writes to the destination buffer · ada1db8c
      Siarhei Siamashka authored
      This is expected to test the CPU's ability to do write combining for
      scattered writes and to detect any possible performance penalties
      (a sketch of such a shuffled fill follows the example reports below).
      
      Example reports:
      
      == ARM Cortex A7 ==
       C fill                                               :   4011.5 MB/s
       C fill (shuffle within 16 byte blocks)               :   4112.2 MB/s (0.3%)
       C fill (shuffle within 32 byte blocks)               :    333.9 MB/s
       C fill (shuffle within 64 byte blocks)               :    336.6 MB/s
      
      == ARM Cortex A15 ==
       C fill                                               :   6065.2 MB/s (0.4%)
       C fill (shuffle within 16 byte blocks)               :   2152.0 MB/s
       C fill (shuffle within 32 byte blocks)               :   2150.7 MB/s
       C fill (shuffle within 64 byte blocks)               :   2238.2 MB/s
      
      == ARM Cortex A53 ==
       C fill                                               :   3080.8 MB/s (0.2%)
       C fill (shuffle within 16 byte blocks)               :   3080.7 MB/s
       C fill (shuffle within 32 byte blocks)               :   3079.2 MB/s
       C fill (shuffle within 64 byte blocks)               :   3080.4 MB/s
      
      == Intel Atom N450 ==
       C fill                                               :   1554.9 MB/s
       C fill (shuffle within 16 byte blocks)               :   1554.5 MB/s
       C fill (shuffle within 32 byte blocks)               :   1553.9 MB/s
       C fill (shuffle within 64 byte blocks)               :   1554.4 MB/s
      
      See https://github.com/ssvb/tinymembench/issues/7
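
      A minimal sketch of such a shuffled fill, assuming a buffer size that
      is a multiple of 16 bytes; the function name and the particular write
      order within each block are illustrative only:

          #include <stdint.h>
          #include <stddef.h>

          /* Fill the buffer with a constant, writing the four 32-bit words
           * of every 16-byte block in a scrambled order instead of
           * sequentially, so that the hardware write combining logic is
           * exercised by non-sequential stores. */
          static void fill_shuffle16(uint32_t *dst, uint32_t value, size_t size)
          {
              uint32_t *end = dst + size / sizeof(uint32_t);
              while (dst < end)
              {
                  dst[3] = value;
                  dst[0] = value;
                  dst[2] = value;
                  dst[1] = value;
                  dst += 4;
              }
          }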
    • Enforce strict order of writes in C benchmarks via volatile keyword · 6fd9baed
      Siarhei Siamashka authored
      The C compiler may attempt to reorder read and write operations when
      accessing the source and destination buffers. So instead of sequential
      memory accesses we may get something like a "drunk master style"
      memory access pattern. Certain processors, such as the ARM Cortex-A7,
      do not like such a memory access pattern very much, and it causes
      a major performance drop. The actual access pattern is unpredictable
      and is sensitive to the compiler version, optimization flags, and
      sometimes even to changes in unrelated parts of the source code.
      
      So use the volatile keyword for the destination pointer in order
      to resolve this problem and make the C benchmarks more deterministic
      (see the sketch below).
      
      See https://github.com/ssvb/tinymembench/issues/7
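
      A minimal sketch of the volatile trick described above; the function
      name is illustrative, and the buffer size is assumed to be a multiple
      of 32 bytes:

          #include <stdint.h>
          #include <stddef.h>

          /* Declaring the destination pointer as volatile forces the
           * compiler to emit the stores exactly in program order, so the
           * benchmark keeps its intended sequential write pattern
           * regardless of compiler version or optimization flags. */
          static void aligned_block_copy_strict(int64_t *dst, int64_t *src, size_t size)
          {
              volatile int64_t *d = dst;
              int64_t *s = src;
              size_t i, n = size / sizeof(int64_t);

              for (i = 0; i < n; i += 4)
              {
                  /* these four stores cannot be reordered with each other */
                  d[i + 0] = s[i + 0];
                  d[i + 1] = s[i + 1];
                  d[i + 2] = s[i + 2];
                  d[i + 3] = s[i + 3];
              }
          }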
    • Do prefetch via 64 byte steps in aligned_block_copy_pf64 · b40f1c03
      Siarhei Siamashka authored
      The old variant was just a copy of aligned_block_copy_pf32.
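
      A minimal sketch of a copy loop that issues one prefetch per 64-byte
      step, as the fixed function is supposed to do, using the GCC/Clang
      __builtin_prefetch intrinsic; the prefetch distance and function name
      are illustrative, and the size is assumed to be a multiple of 64 bytes:

          #include <stdint.h>
          #include <stddef.h>

          static void copy_pf64(int64_t *dst, int64_t *src, size_t size)
          {
              while (size >= 64)
              {
                  /* one prefetch per 64-byte block, 256 bytes ahead of the
                   * current read position */
                  __builtin_prefetch(src + 32, 0, 0);
                  dst[0] = src[0];
                  dst[1] = src[1];
                  dst[2] = src[2];
                  dst[3] = src[3];
                  dst[4] = src[4];
                  dst[5] = src[5];
                  dst[6] = src[6];
                  dst[7] = src[7];
                  dst += 8;
                  src += 8;
                  size -= 64;
              }
          }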