
Intrinsics Guide
__m256i _mm256_hadd_epi16 (__m256i a, __m256i b)
Synopsis
__m256i _mm256_hadd_epi16 (__m256i a, __m256i b)
#include "immintrin.h"
Instruction: vphaddw ymm, ymm, ymm
CPUID Flags: AVX2
Description
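Horizontally add adjacent pairs of 16-bit integers in a and b, and pack the signed 16-bit results in dst. The sums wrap around on overflow (no saturation), and each 128-bit lane is processed independently: within a lane, the four sums from a come first, followed by the four sums from b.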
Operation
dst[15:0] := a[31:16] + a[15:0]
dst[31:16] := a[63:48] + a[47:32]
dst[47:32] := a[95:80] + a[79:64]
dst[63:48] := a[127:112] + a[111:96]
dst[79:64] := b[31:16] + b[15:0]
dst[95:80] := b[63:48] + b[47:32]
dst[111:96] := b[95:80] + b[79:64]
dst[127:112] := b[127:112] + b[111:96]
dst[143:128] := a[159:144] + a[143:128]
dst[159:144] := a[191:176] + a[175:160]
dst[175:160] := a[223:208] + a[207:192]
dst[191:176] := a[255:240] + a[239:224]
dst[207:192] := b[159:144] + b[143:128]
dst[223:208] := b[191:176] + b[175:160]
dst[239:224] := b[223:208] + b[207:192]
dst[255:240] := b[255:240] + b[239:224]
dst[MAX:256] := 0
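The per-lane layout of this instruction can be sketched in C as follows. This is a minimal sketch rather than Intel's code: `hadd_epi16_model`, `hadd_epi16`, and `hadd_epi16_demo` are hypothetical names, the input values are made up, and the `__AVX2__` check assumes a build where the compiler defines that macro (for example `-mavx2` on GCC or Clang). Without it, the wrapper falls back to the scalar model.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#if defined(__AVX2__)
#include <immintrin.h>
#endif

/* Hypothetical helper (not an Intel API): scalar model of vphaddw on
   256-bit operands. Element 0 corresponds to bits [15:0]. */
static void hadd_epi16_model(const int16_t a[16], const int16_t b[16],
                             int16_t dst[16])
{
    int16_t out[16];
    for (int lane = 0; lane < 2; ++lane) {   /* two 128-bit lanes */
        int base = lane * 8;                 /* 8 elements per lane */
        for (int i = 0; i < 4; ++i) {
            /* The sum is computed in int, then truncated to 16 bits,
               so it wraps around instead of saturating. */
            out[base + i] = (int16_t)(uint16_t)
                (a[base + 2 * i] + a[base + 2 * i + 1]);
            out[base + 4 + i] = (int16_t)(uint16_t)
                (b[base + 2 * i] + b[base + 2 * i + 1]);
        }
    }
    memcpy(dst, out, sizeof out);
}

/* Hypothetical wrapper: uses the intrinsic when the compiler targets
   AVX2, otherwise falls back to the scalar model. */
static void hadd_epi16(const int16_t a[16], const int16_t b[16],
                       int16_t dst[16])
{
#if defined(__AVX2__)
    __m256i va = _mm256_loadu_si256((const __m256i *)a);
    __m256i vb = _mm256_loadu_si256((const __m256i *)b);
    _mm256_storeu_si256((__m256i *)dst, _mm256_hadd_epi16(va, vb));
#else
    hadd_epi16_model(a, b, dst);
#endif
}

/* Small self-check with made-up inputs. */
void hadd_epi16_demo(void)
{
    int16_t a[16], b[16], d[16], m[16];
    for (int i = 0; i < 16; ++i) {
        a[i] = (int16_t)(i + 1);          /* 1, 2, ..., 16 */
        b[i] = (int16_t)(100 * (i + 1));  /* 100, 200, ..., 1600 */
    }
    hadd_epi16(a, b, d);
    hadd_epi16_model(a, b, m);
    assert(memcmp(d, m, sizeof d) == 0);  /* wrapper agrees with model */
    assert(d[0] == 3);      /* a[0] + a[1] */
    assert(d[4] == 300);    /* b[0] + b[1], still in the low lane */
    assert(d[8] == 19);     /* a[8] + a[9], start of the high lane */
    assert(d[12] == 1900);  /* b[8] + b[9] */
}
```

Note that the sums from a and b are interleaved per 128-bit lane, so dst is not "all sums of a, then all sums of b".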
Performance
| Architecture | Latency (cycles) | Throughput (CPI) |
|---|---|---|
| Haswell | 3 | 2 |
__m256i _mm256_hadd_epi32 (__m256i a, __m256i b)
Synopsis
__m256i _mm256_hadd_epi32 (__m256i a, __m256i b)
#include "immintrin.h"
Instruction: vphaddd ymm, ymm, ymm
CPUID Flags: AVX2
Description
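Horizontally add adjacent pairs of 32-bit integers in a and b, and pack the signed 32-bit results in dst. The sums wrap around on overflow (no saturation), and each 128-bit lane is processed independently: within a lane, the two sums from a come first, followed by the two sums from b.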
Operation
dst[31:0] := a[63:32] + a[31:0]
dst[63:32] := a[127:96] + a[95:64]
dst[95:64] := b[63:32] + b[31:0]
dst[127:96] := b[127:96] + b[95:64]
dst[159:128] := a[191:160] + a[159:128]
dst[191:160] := a[255:224] + a[223:192]
dst[223:192] := b[191:160] + b[159:128]
dst[255:224] := b[255:224] + b[223:192]
dst[MAX:256] := 0
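One use of this instruction is adding up all eight elements of a vector. The sketch below shows one possible way to do that, under the same assumptions as any illustration here: `hadd_epi32_model`, `hsum_epi32`, and `hsum_epi32_demo` are hypothetical names, the inputs are made up, and the intrinsic path is only compiled when the compiler defines `__AVX2__` (for example with `-mavx2` on GCC or Clang). It is not claimed to be the fastest reduction.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#if defined(__AVX2__)
#include <immintrin.h>
#endif

/* Hypothetical helper (not an Intel API): scalar model of vphaddd on
   256-bit operands. Element 0 corresponds to bits [31:0]. */
static void hadd_epi32_model(const int32_t a[8], const int32_t b[8],
                             int32_t dst[8])
{
    int32_t out[8];
    for (int lane = 0; lane < 2; ++lane) {   /* two 128-bit lanes */
        int base = lane * 4;                 /* 4 elements per lane */
        for (int i = 0; i < 2; ++i) {
            /* Unsigned addition wraps and avoids signed-overflow UB. */
            out[base + i] = (int32_t)
                ((uint32_t)a[base + 2 * i] + (uint32_t)a[base + 2 * i + 1]);
            out[base + 2 + i] = (int32_t)
                ((uint32_t)b[base + 2 * i] + (uint32_t)b[base + 2 * i + 1]);
        }
    }
    memcpy(dst, out, sizeof out);
}

/* Hypothetical helper: sum of all eight elements of v (wrapping). */
int32_t hsum_epi32(const int32_t v[8])
{
#if defined(__AVX2__)
    __m256i x = _mm256_loadu_si256((const __m256i *)v);
    x = _mm256_hadd_epi32(x, x);  /* each lane: pair sums, repeated twice */
    x = _mm256_hadd_epi32(x, x);  /* each lane: lane total in every slot */
    __m128i lo = _mm256_castsi256_si128(x);       /* low-lane total  */
    __m128i hi = _mm256_extracti128_si256(x, 1);  /* high-lane total */
    return _mm_cvtsi128_si32(_mm_add_epi32(lo, hi));
#else
    int32_t t[8];
    hadd_epi32_model(v, v, t);
    hadd_epi32_model(t, t, t);
    return (int32_t)((uint32_t)t[0] + (uint32_t)t[4]);
#endif
}

/* Small self-check with made-up inputs. */
void hsum_epi32_demo(void)
{
    const int32_t v[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    int32_t t[8];
    hadd_epi32_model(v, v, t);
    assert(t[0] == 3 && t[1] == 7);    /* low lane, sums taken from a */
    assert(t[2] == 3 && t[3] == 7);    /* low lane, sums taken from b */
    assert(t[4] == 11 && t[5] == 15);  /* high lane starts at element 4 */
    assert(hsum_epi32(v) == 36);
}
```

Because the horizontal add never crosses the 128-bit lane boundary, two applications leave one total per lane, and a final add of the two lanes is still needed.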
Performance
| Architecture | Latency (cycles) | Throughput (CPI) |
|---|---|---|
| Haswell | 3 | 2 |