onednn onnx threadripper

AMD Ryzen Threadripper 3990X 64-Core testing with a Gigabyte TRX40 AORUS PRO WIFI (F4p BIOS) and AMD Radeon RX 5700 8GB on Pop 21.10 via the Phoronix Test Suite.

HTML result view exported from: https://openbenchmarking.org/result/2203314-PTS-ONEDNNON39&gru&rdt&rro.

System Details (identical for runs A, B, C, and D)

  Processor:          AMD Ryzen Threadripper 3990X 64-Core @ 2.90GHz (64 Cores / 128 Threads)
  Motherboard:        Gigabyte TRX40 AORUS PRO WIFI (F4p BIOS)
  Chipset:            AMD Starship/Matisse
  Memory:             128GB
  Disk:               Samsung SSD 970 EVO Plus 500GB
  Graphics:           AMD Radeon RX 5700 8GB (1750/875MHz)
  Audio:              AMD Navi 10 HDMI Audio
  Monitor:            DELL P2415Q
  Network:            Intel I211 + Intel Wi-Fi 6 AX200
  OS:                 Pop 21.10
  Kernel:             5.17.0-rc1-sched-core-phx (x86_64)
  Desktop:            GNOME Shell 40.5
  Display Server:     X Server
  OpenGL:             4.6 Mesa 21.2.2 (LLVM 12.0.1)
  Vulkan:             1.2.182
  Compiler:           GCC 11.2.0
  File-System:        ext4
  Screen Resolution:  3840x2160

Kernel Details
- Transparent Huge Pages: madvise

Compiler Details
- --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-11-ZPT0kp/gcc-11-11.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-ZPT0kp/gcc-11-11.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v

Processor Details
- Scaling Governor: acpi-cpufreq schedutil (Boost: Enabled)
- CPU Microcode: 0x8301039

Python Details
- Python 3.9.7

Security Details
- itlb_multihit: Not affected
- l1tf: Not affected
- mds: Not affected
- meltdown: Not affected
- spec_store_bypass: Mitigation of SSB disabled via prctl
- spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
- spectre_v2: Mitigation of Full AMD retpoline IBPB: conditional STIBP: conditional RSB filling
- srbds: Not affected
- tsx_async_abort: Not affected

Results Overview

ONNX Runtime 1.11 (Inferences Per Minute, more is better)

  Model - Device - Executor                   A      B      C      D
  GPT-2 - CPU - Parallel                   3461   3512   3529   3495
  GPT-2 - CPU - Standard                   4219   4710   4823   4441
  yolov4 - CPU - Parallel                   361    362    362    361
  yolov4 - CPU - Standard                   293    293    295    300
  bertsquad-12 - CPU - Parallel             424    425    432    421
  bertsquad-12 - CPU - Standard             531    647    646    642
  fcn-resnet101-11 - CPU - Parallel          82     80     81     81
  fcn-resnet101-11 - CPU - Standard         153    156    153    157
  ArcFace ResNet-100 - CPU - Parallel      1088   1072   1079   1079
  ArcFace ResNet-100 - CPU - Standard       995   1010   1017    991
  super-resolution-10 - CPU - Parallel     3815   3780   3784   3731
  super-resolution-10 - CPU - Standard     7323   6401   7560   7375

oneDNN 2.6 (ms, fewer is better)

  Harness - Data Type - Engine                                     A          B          C          D
  IP Shapes 1D - f32 - CPU                                   2.00953    1.96420    1.99383    1.91681
  IP Shapes 3D - f32 - CPU                                   5.54387    6.27072    6.28663    6.40806
  IP Shapes 1D - u8s8f32 - CPU                               2.18433    2.35161    2.37403    2.42176
  IP Shapes 3D - u8s8f32 - CPU                               1.13774    1.11774    1.11927    1.10705
  Convolution Batch Shapes Auto - f32 - CPU                  0.941266   0.910056   0.904110   0.928404
  Deconvolution Batch shapes_1d - f32 - CPU                  6.68005    6.82166    6.85039    6.90607
  Deconvolution Batch shapes_3d - f32 - CPU                  2.10617    2.11025    2.11212    2.11146
  Convolution Batch Shapes Auto - u8s8f32 - CPU              6.39430    6.43330    6.44689    6.44330
  Deconvolution Batch shapes_1d - u8s8f32 - CPU              1.52619    1.49871    1.56000    1.45511
  Deconvolution Batch shapes_3d - u8s8f32 - CPU              0.979135   0.992713   0.987020   0.984819
  Recurrent Neural Network Training - f32 - CPU              4959.96    5003.99    4964.59    4882.28
  Recurrent Neural Network Inference - f32 - CPU             1260.20    1251.24    1221.12    1211.44
  Recurrent Neural Network Training - u8s8f32 - CPU          4954.34    5034.83    5003.38    4950.97
  Recurrent Neural Network Inference - u8s8f32 - CPU         1236.75    1246.07    1238.60    1223.53
  Matrix Multiply Batch Shapes Transformer - f32 - CPU       7.59165    6.93013    7.57941    7.04981
  Recurrent Neural Network Training - bf16bf16bf16 - CPU     5028.06    4997.82    5011.49    4884.90
  Recurrent Neural Network Inference - bf16bf16bf16 - CPU    1208.52    1242.44    1250.99    1254.39
  Matrix Multiply Batch Shapes Transformer - u8s8f32 - CPU   11.6037    11.3449    11.9035    11.75215

(The overview also listed Matrix Multiply Batch Shapes Transformer - bf16bf16bf16 - CPU, but no results were recorded for that test.)
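The overview values are directly comparable across the four runs since the hardware is identical. As an illustrative sketch (not part of the original export), the relative gap between two runs on a "more is better" metric can be computed like this, using the GPT-2 Standard values from the table above:

```python
# Illustrative only: compare two result columns from the overview table.
# Values are the GPT-2 - CPU - Standard results (Inferences Per Minute).

def percent_faster(baseline: float, contender: float) -> float:
    """How much faster (in percent) contender is than baseline
    for a throughput-style ('more is better') metric."""
    return (contender - baseline) / baseline * 100.0

a_result = 4219  # run A
c_result = 4823  # run C

print(f"C is {percent_faster(a_result, c_result):.1f}% faster than A on GPT-2 Standard")
```

The same helper works for any ONNX Runtime row; for the oneDNN rows (latency in ms, fewer is better) the roles of baseline and contender would be inverted.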

ONNX Runtime

Model: GPT-2 - Device: CPU - Executor: Parallel

OpenBenchmarking.org - Inferences Per Minute, More Is Better (ONNX Runtime 1.11)
  A: 3461 (SE +/- 7.49, N = 3)
  B: 3512 (SE +/- 0.44, N = 3)
  C: 3529 (SE +/- 6.29, N = 3)
  D: 3495 (SE +/- 4.21, N = 3)
All ONNX Runtime tests compiled with: (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt
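Each reported value is a mean over N trial runs with its standard error. As a sketch of what "SE +/-" presumably represents (standard error of the mean; the exact averaging done by the Phoronix Test Suite is an assumption here, and the trial values below are hypothetical):

```python
import math

def standard_error(samples):
    """Standard error of the mean: sample standard deviation / sqrt(N).
    Hypothetical illustration of the 'SE +/-' figures in these results."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / (n - 1)  # sample variance
    return math.sqrt(var) / math.sqrt(n)

# Hypothetical trial values for a 3-run result:
trials = [3450.0, 3495.0, 3540.0]
print(round(standard_error(trials), 2))
```

A small SE relative to the mean (as in most rows here) indicates the run-to-run variance is negligible next to the differences between configurations.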

ONNX Runtime

Model: GPT-2 - Device: CPU - Executor: Standard

OpenBenchmarking.org - Inferences Per Minute, More Is Better (ONNX Runtime 1.11)
  A: 4219 (SE +/- 60.06, N = 12)
  B: 4710 (SE +/- 30.47, N = 3)
  C: 4823 (SE +/- 17.68, N = 3)
  D: 4441 (SE +/- 44.12, N = 12)

ONNX Runtime

Model: yolov4 - Device: CPU - Executor: Parallel

OpenBenchmarking.org - Inferences Per Minute, More Is Better (ONNX Runtime 1.11)
  A: 361 (SE +/- 0.50, N = 3)
  B: 362 (SE +/- 0.29, N = 3)
  C: 362 (SE +/- 0.33, N = 3)
  D: 361 (SE +/- 0.17, N = 3)

ONNX Runtime

Model: yolov4 - Device: CPU - Executor: Standard

OpenBenchmarking.org - Inferences Per Minute, More Is Better (ONNX Runtime 1.11)
  A: 293 (SE +/- 3.18, N = 4)
  B: 293 (SE +/- 1.42, N = 3)
  C: 295 (SE +/- 1.64, N = 3)
  D: 300 (SE +/- 1.04, N = 3)

ONNX Runtime

Model: bertsquad-12 - Device: CPU - Executor: Parallel

OpenBenchmarking.org - Inferences Per Minute, More Is Better (ONNX Runtime 1.11)
  A: 424 (SE +/- 2.93, N = 3)
  B: 425 (SE +/- 1.44, N = 3)
  C: 432 (SE +/- 1.80, N = 3)
  D: 421 (SE +/- 0.93, N = 3)

ONNX Runtime

Model: bertsquad-12 - Device: CPU - Executor: Standard

OpenBenchmarking.org - Inferences Per Minute, More Is Better (ONNX Runtime 1.11)
  A: 531 (SE +/- 3.69, N = 3)
  B: 647 (SE +/- 0.67, N = 3)
  C: 646 (SE +/- 2.02, N = 3)
  D: 642 (SE +/- 1.32, N = 3)

ONNX Runtime

Model: fcn-resnet101-11 - Device: CPU - Executor: Parallel

OpenBenchmarking.org - Inferences Per Minute, More Is Better (ONNX Runtime 1.11)
  A: 82 (SE +/- 0.17, N = 3)
  B: 80 (SE +/- 0.17, N = 3)
  C: 81 (SE +/- 0.17, N = 3)
  D: 81 (SE +/- 0.17, N = 3)

ONNX Runtime

Model: fcn-resnet101-11 - Device: CPU - Executor: Standard

OpenBenchmarking.org - Inferences Per Minute, More Is Better (ONNX Runtime 1.11)
  A: 153 (SE +/- 0.33, N = 3)
  B: 156 (SE +/- 0.44, N = 3)
  C: 153 (SE +/- 0.73, N = 3)
  D: 157 (SE +/- 0.44, N = 3)

ONNX Runtime

Model: ArcFace ResNet-100 - Device: CPU - Executor: Parallel

OpenBenchmarking.org - Inferences Per Minute, More Is Better (ONNX Runtime 1.11)
  A: 1088 (SE +/- 5.11, N = 3)
  B: 1072 (SE +/- 4.36, N = 3)
  C: 1079 (SE +/- 7.42, N = 3)
  D: 1079 (SE +/- 2.33, N = 3)

ONNX Runtime

Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard

OpenBenchmarking.org - Inferences Per Minute, More Is Better (ONNX Runtime 1.11)
  A: 995 (SE +/- 5.53, N = 3)
  B: 1010 (SE +/- 5.18, N = 3)
  C: 1017 (SE +/- 3.28, N = 3)
  D: 991 (SE +/- 7.44, N = 3)

ONNX Runtime

Model: super-resolution-10 - Device: CPU - Executor: Parallel

OpenBenchmarking.org - Inferences Per Minute, More Is Better (ONNX Runtime 1.11)
  A: 3815 (SE +/- 34.71, N = 3)
  B: 3780 (SE +/- 19.55, N = 3)
  C: 3784 (SE +/- 26.17, N = 3)
  D: 3731 (SE +/- 42.45, N = 4)

ONNX Runtime

Model: super-resolution-10 - Device: CPU - Executor: Standard

OpenBenchmarking.org - Inferences Per Minute, More Is Better (ONNX Runtime 1.11)
  A: 7323 (SE +/- 58.03, N = 3)
  B: 6401 (SE +/- 409.51, N = 12)
  C: 7560 (SE +/- 47.29, N = 3)
  D: 7375 (SE +/- 46.97, N = 3)

oneDNN

Harness: IP Shapes 1D - Data Type: f32 - Engine: CPU

OpenBenchmarking.org - ms, Fewer Is Better (oneDNN 2.6)
  A: 2.00953 (SE +/- 0.02176, N = 15, MIN: 1.59)
  B: 1.96420 (SE +/- 0.06425, N = 12, MIN: 1.46)
  C: 1.99383 (SE +/- 0.05487, N = 15, MIN: 1.39)
  D: 1.91681 (SE +/- 0.06721, N = 15, MIN: 1.25)
All oneDNN tests compiled with: (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -ldl -lpthread

oneDNN

Harness: IP Shapes 3D - Data Type: f32 - Engine: CPU

OpenBenchmarking.org - ms, Fewer Is Better (oneDNN 2.6)
  A: 5.54387 (SE +/- 0.01655, N = 3, MIN: 5.3)
  B: 6.27072 (SE +/- 0.00710, N = 3, MIN: 6.08)
  C: 6.28663 (SE +/- 0.06711, N = 3, MIN: 6.01)
  D: 6.40806 (SE +/- 0.07364, N = 3, MIN: 6.17)

oneDNN

Harness: IP Shapes 1D - Data Type: u8s8f32 - Engine: CPU

OpenBenchmarking.org - ms, Fewer Is Better (oneDNN 2.6)
  A: 2.18433 (SE +/- 0.09135, N = 12, MIN: 1.28)
  B: 2.35161 (SE +/- 0.08755, N = 15, MIN: 1.4)
  C: 2.37403 (SE +/- 0.10516, N = 12, MIN: 1.6)
  D: 2.42176 (SE +/- 0.08567, N = 15, MIN: 1.53)

oneDNN

Harness: IP Shapes 3D - Data Type: u8s8f32 - Engine: CPU

OpenBenchmarking.org - ms, Fewer Is Better (oneDNN 2.6)
  A: 1.13774 (SE +/- 0.00225, N = 3, MIN: 1.04)
  B: 1.11774 (SE +/- 0.00878, N = 9, MIN: 1)
  C: 1.11927 (SE +/- 0.00210, N = 3, MIN: 1.04)
  D: 1.10705 (SE +/- 0.01354, N = 3, MIN: 1.03)

oneDNN

Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU

OpenBenchmarking.org - ms, Fewer Is Better (oneDNN 2.6)
  A: 0.941266 (SE +/- 0.010786, N = 3, MIN: 0.86)
  B: 0.910056 (SE +/- 0.010535, N = 15, MIN: 0.8)
  C: 0.904110 (SE +/- 0.009118, N = 3, MIN: 0.84)
  D: 0.928404 (SE +/- 0.009748, N = 15, MIN: 0.83)

oneDNN

Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU

OpenBenchmarking.org - ms, Fewer Is Better (oneDNN 2.6)
  A: 6.68005 (SE +/- 0.04611, N = 3, MIN: 5.95)
  B: 6.82166 (SE +/- 0.03741, N = 3, MIN: 6.16)
  C: 6.85039 (SE +/- 0.03051, N = 3, MIN: 6.18)
  D: 6.90607 (SE +/- 0.03878, N = 3, MIN: 6.2)

oneDNN

Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU

OpenBenchmarking.org - ms, Fewer Is Better (oneDNN 2.6)
  A: 2.10617 (SE +/- 0.00321, N = 3, MIN: 2.05)
  B: 2.11025 (SE +/- 0.00652, N = 3, MIN: 2.05)
  C: 2.11212 (SE +/- 0.00512, N = 3, MIN: 2.06)
  D: 2.11146 (SE +/- 0.00569, N = 3, MIN: 2.06)

oneDNN

Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 - Engine: CPU

OpenBenchmarking.org - ms, Fewer Is Better (oneDNN 2.6)
  A: 6.39430 (SE +/- 0.01609, N = 3, MIN: 6.31)
  B: 6.43330 (SE +/- 0.02160, N = 3, MIN: 6.33)
  C: 6.44689 (SE +/- 0.00816, N = 3, MIN: 6.34)
  D: 6.44330 (SE +/- 0.01318, N = 3, MIN: 6.36)

oneDNN

Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU

OpenBenchmarking.org - ms, Fewer Is Better (oneDNN 2.6)
  A: 1.52619 (SE +/- 0.00789, N = 3, MIN: 1.38)
  B: 1.49871 (SE +/- 0.04182, N = 12, MIN: 0.93)
  C: 1.56000 (SE +/- 0.00131, N = 3, MIN: 1.41)
  D: 1.45511 (SE +/- 0.02072, N = 3, MIN: 1.3)

oneDNN

Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32 - Engine: CPU

OpenBenchmarking.org - ms, Fewer Is Better (oneDNN 2.6)
  A: 0.979135 (SE +/- 0.001712, N = 3, MIN: 0.93)
  B: 0.992713 (SE +/- 0.001082, N = 3, MIN: 0.93)
  C: 0.987020 (SE +/- 0.004648, N = 3, MIN: 0.91)
  D: 0.984819 (SE +/- 0.000768, N = 3, MIN: 0.92)

oneDNN

Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU

OpenBenchmarking.org - ms, Fewer Is Better (oneDNN 2.6)
  A: 4959.96 (SE +/- 28.21, N = 3, MIN: 4866.16)
  B: 5003.99 (SE +/- 8.67, N = 3, MIN: 4937.34)
  C: 4964.59 (SE +/- 48.98, N = 3, MIN: 4820.84)
  D: 4882.28 (SE +/- 59.27, N = 15, MIN: 4219.23)

oneDNN

Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU

OpenBenchmarking.org - ms, Fewer Is Better (oneDNN 2.6)
  A: 1260.20 (SE +/- 1.48, N = 3, MIN: 1234.63)
  B: 1251.24 (SE +/- 15.69, N = 3, MIN: 1199.03)
  C: 1221.12 (SE +/- 2.45, N = 3, MIN: 1196.31)
  D: 1211.44 (SE +/- 9.49, N = 3, MIN: 1174.61)

oneDNN

Harness: Recurrent Neural Network Training - Data Type: u8s8f32 - Engine: CPU

OpenBenchmarking.org - ms, Fewer Is Better (oneDNN 2.6)
  A: 4954.34 (SE +/- 40.36, N = 9, MIN: 4613.84)
  B: 5034.83 (SE +/- 24.43, N = 3, MIN: 4959.6)
  C: 5003.38 (SE +/- 9.55, N = 3, MIN: 4934.99)
  D: 4950.97 (SE +/- 19.43, N = 3, MIN: 4866.67)

oneDNN

Harness: Recurrent Neural Network Inference - Data Type: u8s8f32 - Engine: CPU

OpenBenchmarking.org - ms, Fewer Is Better (oneDNN 2.6)
  A: 1236.75 (SE +/- 9.32, N = 3, MIN: 1201.87)
  B: 1246.07 (SE +/- 10.61, N = 15, MIN: 1122.63)
  C: 1238.60 (SE +/- 2.90, N = 3, MIN: 1199.98)
  D: 1223.53 (SE +/- 5.02, N = 3, MIN: 1198.19)

oneDNN

Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 - Engine: CPU

OpenBenchmarking.org - ms, Fewer Is Better (oneDNN 2.6)
  A: 7.59165 (SE +/- 0.31854, N = 15, MIN: 5.06)
  B: 6.93013 (SE +/- 0.38999, N = 15, MIN: 4.81)
  C: 7.57941 (SE +/- 0.41090, N = 15, MIN: 4.43)
  D: 7.04981 (SE +/- 0.36772, N = 12, MIN: 4.64)

oneDNN

Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU

OpenBenchmarking.org - ms, Fewer Is Better (oneDNN 2.6)
  A: 5028.06 (SE +/- 3.11, N = 3, MIN: 4972.23)
  B: 4997.82 (SE +/- 9.21, N = 3, MIN: 4933.96)
  C: 5011.49 (SE +/- 10.61, N = 3, MIN: 4942.23)
  D: 4884.90 (SE +/- 80.48, N = 13, MIN: 4023.92)

oneDNN

Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU

OpenBenchmarking.org - ms, Fewer Is Better (oneDNN 2.6)
  A: 1208.52 (SE +/- 37.60, N = 12, MIN: 796.32)
  B: 1242.44 (SE +/- 8.78, N = 3, MIN: 1204.89)
  C: 1250.99 (SE +/- 14.38, N = 3, MIN: 1202.91)
  D: 1254.39 (SE +/- 10.63, N = 3, MIN: 1209.18)

oneDNN

Harness: Matrix Multiply Batch Shapes Transformer - Data Type: u8s8f32 - Engine: CPU

OpenBenchmarking.org - ms, Fewer Is Better (oneDNN 2.6)
  A: 11.60 (SE +/- 0.23, N = 15, MIN: 9.98)
  B: 11.34 (SE +/- 0.07, N = 3, MIN: 10.78)
  C: 11.90 (SE +/- 0.32, N = 15, MIN: 8.22)
  D: 11.75 (SE +/- 0.39, N = 12, MIN: 8.18)


Phoronix Test Suite v10.8.5