Runtime code dispatch support
Let the CineForm SDK choose an available SIMD optimization at runtime when the processor supports the corresponding extension.
Activity
- Grant Kim changed title from Runtime optimization support to Runtime code dispatch support
- Grant Kim changed the description
Processor Dispatch
There are two ways to achieve this:
- Use "thunks" (function tables filled once at initialization).
- Let the encoder or decoder choose the code path at each call.
Common to both:
- The encoder and decoder objects (or struct, context, ...) detect which SIMD extensions the processor supports during initialization.
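A minimal sketch of that detection step, assuming GCC or Clang on x86 (where the `__builtin_cpu_supports` builtin queries CPUID at runtime) and treating NEON as always present on AArch64, where it is mandatory. The `PROC_*` values and the `Encoder` struct are hypothetical stand-ins for the SDK's `CFHD_PROCESSOR_*` constants and encoder context; other compilers fall back to the generic path here.

```c
/* Hypothetical extension IDs; the SDK's real CFHD_PROCESSOR_* values may differ. */
typedef enum {
    PROC_GENERIC = 0,
    PROC_SSE2,
    PROC_AVX2,
    PROC_NEON
} ProcessorExtension;

/* Detect the best SIMD extension once, e.g. when the encoder/decoder is created. */
static ProcessorExtension detect_processor_extension(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    /* GCC/Clang builtin: checks the running CPU, not the build flags. */
    if (__builtin_cpu_supports("avx2"))
        return PROC_AVX2;
    if (__builtin_cpu_supports("sse2"))
        return PROC_SSE2;
    return PROC_GENERIC;
#elif defined(__aarch64__)
    /* Advanced SIMD (NEON) is part of the AArch64 baseline. */
    return PROC_NEON;
#else
    /* Unknown compiler or architecture: use the scalar path. */
    return PROC_GENERIC;
#endif
}

/* Hypothetical encoder context holding the detected extension. */
typedef struct {
    ProcessorExtension deviceExtension;
    /* ... other encoder state ... */
} Encoder;

static void encoder_init(Encoder *encoder)
{
    encoder->deviceExtension = detect_processor_extension();
}
```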
Thunks method
- Register all function tables.
- Point each table entry at the matching implementation through a function pointer, e.g.:
```c
// Register functions
// WaveletTable is a hypothetical type name for the function table of wavelet.c/h,
// e.g. it holds void (*TransformForwardSpatialYUV)(...);
void init_function(WaveletTable *wavelet, int extension)
{
    switch (extension)
    {
    // ...
# ifdef BUILD_WITH_SIMD_X86_AVX2
    case CFHD_PROCESSOR_AVX2:
        wavelet->TransformForwardSpatialYUV = &TransformForwardSpatialYUV_AVX2;
        break;
# endif // BUILD_WITH_SIMD_X86_AVX2
    // ... and so on
    default: // Generic scalar operation
        wavelet->TransformForwardSpatialYUV = &TransformForwardSpatialYUV;
        break;
    }
}

{
    // ...
    init_function(table->wavelet, CFHD_PROCESSOR_AVX2);
    table->wavelet->TransformForwardSpatialYUV(...);
}
```
Pros: The extension is compared only once, at initialization.
Cons: Uses slightly more memory (one pointer per dispatched function).
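A compilable sketch of the function-table approach, under these assumptions: the `WaveletTable` struct, the `EXT_*` IDs and the kernel names are hypothetical, and every "SIMD" kernel is a scalar placeholder (real ones would live in separate files compiled with the matching instruction-set flags). The table is filled once, so each later call is a single indirect call with no comparison.

```c
#include <stddef.h>

/* Hypothetical extension IDs for this sketch. */
typedef enum { EXT_GENERIC, EXT_SSE2, EXT_AVX2 } Extension;

/* Hypothetical per-module table: one function pointer per dispatched routine. */
typedef struct {
    void (*TransformForwardSpatialYUV)(const short *in, short *out, size_t n);
} WaveletTable;

/* Placeholder kernels: all scalar here; real versions would use intrinsics. */
static void TransformForwardSpatialYUV_generic(const short *in, short *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (short)(in[i] / 2);
}

static void TransformForwardSpatialYUV_sse2(const short *in, short *out, size_t n)
{
    TransformForwardSpatialYUV_generic(in, out, n); /* stand-in for SSE2 code */
}

static void TransformForwardSpatialYUV_avx2(const short *in, short *out, size_t n)
{
    TransformForwardSpatialYUV_generic(in, out, n); /* stand-in for AVX2 code */
}

/* Fill the table once from the detected extension. */
static void init_wavelet_table(WaveletTable *table, Extension ext)
{
    switch (ext) {
    case EXT_AVX2:
        table->TransformForwardSpatialYUV = TransformForwardSpatialYUV_avx2;
        break;
    case EXT_SSE2:
        table->TransformForwardSpatialYUV = TransformForwardSpatialYUV_sse2;
        break;
    default:
        table->TransformForwardSpatialYUV = TransformForwardSpatialYUV_generic;
        break;
    }
}
```

Typical use: call `init_wavelet_table(&table, ext)` once with the detected extension, then call `table.TransformForwardSpatialYUV(in, out, n)` everywhere else.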
Encoder/decoder dispatch method:
- Use switch->case at each call site, e.g.:
```c
switch (encoder->deviceExtension)
{
# ifdef BUILD_WITH_SIMD_X86_MMX
case CFHD_PROCESSOR_MMX:
    TransformForwardSpatialYUV_mmx();
    break;
# endif // BUILD_WITH_SIMD_X86_MMX
# ifdef BUILD_WITH_SIMD_X86_SSE2
case CFHD_PROCESSOR_SSE2:
    TransformForwardSpatialYUV_sse2();
    break;
# endif // BUILD_WITH_SIMD_X86_SSE2
# ifdef BUILD_WITH_SIMD_X86_AVX2
case CFHD_PROCESSOR_AVX2:
    TransformForwardSpatialYUV_avx2();
    break;
# endif // BUILD_WITH_SIMD_X86_AVX2
# ifdef BUILD_WITH_SIMD_ARM_NEON
case CFHD_PROCESSOR_NEON:
    TransformForwardSpatialYUV_neon();
    break;
# endif // BUILD_WITH_SIMD_ARM_NEON
# ifdef BUILD_WITH_SIMD_PPC_ALTIVEC
case CFHD_PROCESSOR_ALTIVEC:
    TransformForwardSpatialYUV_altivec();
    break;
# endif // BUILD_WITH_SIMD_PPC_ALTIVEC
default:
    // Generic scalar operation
    TransformForwardSpatialYUV();
    break;
}
```
Pros: Small code changes.
Cons: Very long code at every call site. The code may run slower than with thunks, because the extension is checked on every call.
Benchmark
57.5fps switch->case method.
207.6fps function table method.
208.0fps direct call method. (No dispatch)
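A rough harness along these lines can compare per-call switch dispatch against a function pointer resolved once. Everything in it (`kernel_*`, `call_switch`, `resolve`, `bench`) is a hypothetical placeholder, not the code behind the figures above. Absolute timings depend on the compiler, the optimization level and the kernel size; a trivial kernel exaggerates dispatch overhead, so it should not be expected to reproduce those numbers.

```c
#include <stdint.h>
#include <time.h>

typedef enum { EXT_GENERIC, EXT_SSE2, EXT_AVX2 } Extension;
typedef uint32_t (*Kernel)(uint32_t);

/* Placeholder kernels with different bodies so the compiler cannot merge them. */
static uint32_t kernel_generic(uint32_t x) { return x * 3u + 1u; }
static uint32_t kernel_sse2(uint32_t x)    { return x * 5u + 1u; }
static uint32_t kernel_avx2(uint32_t x)    { return x * 7u + 1u; }

/* switch->case style: the extension is examined on every call. */
static uint32_t call_switch(Extension ext, uint32_t x)
{
    switch (ext) {
    case EXT_AVX2: return kernel_avx2(x);
    case EXT_SSE2: return kernel_sse2(x);
    default:       return kernel_generic(x);
    }
}

/* Function-table style: the extension is examined once. */
static Kernel resolve(Extension ext)
{
    switch (ext) {
    case EXT_AVX2: return kernel_avx2;
    case EXT_SSE2: return kernel_sse2;
    default:       return kernel_generic;
    }
}

/* Runs n calls through one dispatch style and returns elapsed CPU seconds.
   The checksum lets both styles be checked for identical results. */
static double bench(int use_table, Extension ext, long n, uint32_t *checksum)
{
    volatile Extension vext = ext; /* re-read each call so the switch is not folded away */
    Kernel fn = resolve(vext);     /* resolved once, like a table entry */
    uint32_t acc = 0;
    clock_t start = clock();
    for (long i = 0; i < n; i++)
        acc += use_table ? fn((uint32_t)i) : call_switch(vext, (uint32_t)i);
    *checksum = acc;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```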
Conclusion
The switch->case method is about 3.6 times slower (57.5 vs. 207.6 fps), while the function table is within 0.2% of a direct call. Use function tables.
Edited by Grant Kim
Things to do
Add a table pointer parameter to all related functions
so that subfunctions also follow the proper code path, e.g. SSE2 code calls the SSE4 version of a subfunction when the SSE4 flag is set.
For example:
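```c
// FunctionTable is a hypothetical type name; a void * cannot be dereferenced.
void func1_avx(FunctionTable *function_table, int *param, int *param2)
{
    // Do something
    // Call through the table so the subfunction uses the selected code path
    function_table->subfunction(function_table, param, param2);
}
```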
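A self-contained sketch of passing the table down, with hypothetical names (`FunctionTable`, `subfunction_sse2`, `subfunction_sse4`, `func1_sse2`, `init_table`). The placeholder subfunctions return a string only so the chosen path is visible. `func1_sse2` never names a subfunction variant directly, so it runs the SSE4 subfunction whenever the table was built with the SSE4 flag.

```c
/* Hypothetical table shared by a family of routines. */
typedef struct FunctionTable FunctionTable;
struct FunctionTable {
    const char *(*subfunction)(const FunctionTable *table, int *param, int *param2);
};

/* Placeholder subfunction variants; each reports which path ran. */
static const char *subfunction_sse2(const FunctionTable *table, int *param, int *param2)
{
    (void)table;
    *param += *param2;
    return "sse2";
}

static const char *subfunction_sse4(const FunctionTable *table, int *param, int *param2)
{
    (void)table;
    *param += *param2;
    return "sse4";
}

/* SSE2-level caller: goes through the table instead of calling a variant directly. */
static const char *func1_sse2(const FunctionTable *table, int *param, int *param2)
{
    /* ... SSE2 work ... */
    return table->subfunction(table, param, param2);
}

/* Build the table once from the detected flags. */
static void init_table(FunctionTable *table, int has_sse4)
{
    table->subfunction = has_sse4 ? subfunction_sse4 : subfunction_sse2;
}
```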