
SIMD (8)

SIMD: Transpose
/**
 * @[Description] : Transpose
 * @[Param] : pResidual
 * @[Param] : buffer for result of transpose.
 * @[Param] : Transform size
 */
void Transpose(short* pResi, short *pBuf, int iTrSize){
    short *pSrc;
    short *pDst;
    int iWidthBlk;
    int iHeightBlk;
    int i, j;
    __m128i Line0, Line1, Line2, Line3, Line4, Line5, Line6, Line7;
    __m128i L0, L1, L2, L3, H0, H1, H2, H3;
    iHeightBlk = iWidthBlk = (iTrSize) >> 3..
2015. 1. 26.
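The excerpt above is cut off before the transpose itself, so the following is only a minimal sketch of one common way to do it with SSE2, not the code from the post. It assumes the matrix is square, stored row by row, and that its size is a multiple of 8 (matching the `iTrSize >> 3` block count). The names `transpose8x8_epi16` and `transpose_blocks` are hypothetical.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Transpose one 8x8 block of 16-bit values. Strides are in elements.
 * Hypothetical helper, not the function from the post. */
static void transpose8x8_epi16(const short *src, int src_stride,
                               short *dst, int dst_stride)
{
    __m128i r[8], a[8], b[8];
    int k;

    for (k = 0; k < 8; ++k)
        r[k] = _mm_loadu_si128((const __m128i *)(src + k * src_stride));

    /* Stage 1: interleave 16-bit elements of adjacent rows. */
    for (k = 0; k < 4; ++k) {
        a[2 * k]     = _mm_unpacklo_epi16(r[2 * k], r[2 * k + 1]);
        a[2 * k + 1] = _mm_unpackhi_epi16(r[2 * k], r[2 * k + 1]);
    }

    /* Stage 2: interleave 32-bit pairs; each b[] holds two columns of four rows. */
    b[0] = _mm_unpacklo_epi32(a[0], a[2]);  /* cols 0-1, rows 0-3 */
    b[1] = _mm_unpackhi_epi32(a[0], a[2]);  /* cols 2-3, rows 0-3 */
    b[2] = _mm_unpacklo_epi32(a[1], a[3]);  /* cols 4-5, rows 0-3 */
    b[3] = _mm_unpackhi_epi32(a[1], a[3]);  /* cols 6-7, rows 0-3 */
    b[4] = _mm_unpacklo_epi32(a[4], a[6]);  /* cols 0-1, rows 4-7 */
    b[5] = _mm_unpackhi_epi32(a[4], a[6]);  /* cols 2-3, rows 4-7 */
    b[6] = _mm_unpacklo_epi32(a[5], a[7]);  /* cols 4-5, rows 4-7 */
    b[7] = _mm_unpackhi_epi32(a[5], a[7]);  /* cols 6-7, rows 4-7 */

    /* Stage 3: join 64-bit halves so each stored register is one full column. */
    for (k = 0; k < 4; ++k) {
        _mm_storeu_si128((__m128i *)(dst + (2 * k) * dst_stride),
                         _mm_unpacklo_epi64(b[k], b[k + 4]));
        _mm_storeu_si128((__m128i *)(dst + (2 * k + 1) * dst_stride),
                         _mm_unpackhi_epi64(b[k], b[k + 4]));
    }
}

/* Transpose a size x size matrix block by block.
 * Assumes size is a multiple of 8 and that src and dst do not overlap. */
static void transpose_blocks(const short *src, short *dst, int size)
{
    int blocks = size >> 3;  /* same block count as iTrSize >> 3 in the excerpt */
    int bi, bj;

    assert(size > 0 && (size & 7) == 0);
    for (bi = 0; bi < blocks; ++bi)
        for (bj = 0; bj < blocks; ++bj)
            transpose8x8_epi16(src + (bi * 8) * size + bj * 8, size,
                               dst + (bj * 8) * size + bi * 8, size);
}
```

Unaligned loads and stores are used so the sketch makes no assumption about buffer alignment; if both buffers are known to be 16-byte aligned, `_mm_load_si128` and `_mm_store_si128` could be substituted.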
_mm256_hadd_epi16(), _mm256_hadd_epi32()
Intrinsics Guide
__m256i _mm256_hadd_epi16 (__m256i a, __m256i b)
Synopsis
    __m256i _mm256_hadd_epi16 (__m256i a, __m256i b)
    #include "immintrin.h"
    Instruction: vphaddw ymm, ymm, ymm
    CPUID Flags: AVX2
Description
    Horizontally add adjacent pairs of 16-bit integers in a and b, and pack the signed 16-bit results in dst.
Operation
    dst[15:0] := a[31:16] + a[15:0]
    dst[31:16] := a[63:48] + a[47:32]
    dst[47:32]..
2015. 1. 23.
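Because the Operation listing is cut off, here is a scalar model in plain C of what `vphaddw` computes on 256-bit registers. It is a sketch for checking lane ordering, not a replacement for the intrinsic. Results from `a` and `b` are interleaved per 128-bit lane rather than across the whole register, and sums wrap instead of saturating. The name `hadd_epi16_model` and the array representation of the registers (element 0 is bits 15:0) are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of _mm256_hadd_epi16: a, b and dst each stand for one
 * 256-bit register viewed as 16 signed 16-bit elements (index 0 = bits 15:0).
 * Hypothetical helper for illustration only. */
static void hadd_epi16_model(const int16_t a[16], const int16_t b[16],
                             int16_t dst[16])
{
    int lane, k;

    assert(a && b && dst);
    for (lane = 0; lane < 2; ++lane) {        /* two independent 128-bit lanes */
        const int16_t *la = a + 8 * lane;
        const int16_t *lb = b + 8 * lane;
        int16_t *ld = dst + 8 * lane;

        for (k = 0; k < 4; ++k) {
            /* Low half of the lane: pair sums from a. */
            ld[k] = (int16_t)(uint16_t)(la[2 * k] + la[2 * k + 1]);
            /* High half of the lane: pair sums from b. */
            ld[k + 4] = (int16_t)(uint16_t)(lb[2 * k] + lb[2 * k + 1]);
        }
    }
}
```

With `a = {0, 1, ..., 15}` and `b = {100, 101, ..., 115}`, the model gives `dst = {1, 5, 9, 13, 201, 205, 209, 213, 17, 21, 25, 29, 217, 221, 225, 229}`. The sums from `b` sit in elements 4 to 7 and 12 to 15, which is the in-lane behavior that tends to surprise people porting 128-bit code to AVX2.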
_mm_set_epi16
Intrinsics Guide
__m128i _mm_set1_epi16 (short a)
Synopsis
    __m128i _mm_set1_epi16 (short a)
    #include "emmintrin.h"
    CPUID Flags: SSE2
Description
    Broadcast 16-bit integer a to all elements of dst. This intrinsic may generate vpbroadcastw.
Operation
    FOR j := 0 to 7
        i := j*16
        dst[i+15:i] := a[15:0]
    ENDFOR
__m128i _mm_set_epi16 (short e7, short e6, short e5, short e4, short e3, short e2, short e1, sho..
2015. 1. 22.
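The argument order of `_mm_set_epi16` is easy to get backwards: the first argument lands in the highest element and the last in the lowest. The short sketch below stores both a broadcast and an explicitly set vector to memory so the ordering can be inspected. The helper names `store8` and `demo_set` are hypothetical.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Copy the eight 16-bit elements of v to out (out[0] = bits 15:0). */
static void store8(__m128i v, short out[8])
{
    assert(out);
    _mm_storeu_si128((__m128i *)out, v);
}

/* Fill bcast with a broadcast value and ordered with explicit elements. */
static void demo_set(short bcast[8], short ordered[8])
{
    /* Every element becomes 7. */
    store8(_mm_set1_epi16(7), bcast);

    /* Arguments run from the highest element down to the lowest,
     * so memory order after the store is 0, 1, 2, ..., 7. */
    store8(_mm_set_epi16(7, 6, 5, 4, 3, 2, 1, 0), ordered);
}
```

After `demo_set(bcast, ordered)`, every element of `bcast` is 7 and `ordered[i] == i` for each index. If memory-order arguments are preferred, `_mm_setr_epi16` takes the same values in reverse.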
_mm_srai_epi16(), _mm_srai_epi32(): Shift
Intrinsics Guide
__m128i _mm_srai_epi16 (__m128i a, int imm8)
Synopsis
    __m128i _mm_srai_epi16 (__m128i a, int imm8)
    #include "emmintrin.h"
    Instruction: psraw xmm, imm
    CPUID Flags: SSE2
Description
    Shift packed 16-bit integers in a right by imm8 while shifting in sign bits, and store the results in dst.
Operation
    FOR j := 0 to 7
        i := j*16
        IF imm8[7:0] > 15
            dst[i+15:i] := SignBit
        ELSE
            dst[i+15:i] := Sig..
2015. 1. 22.
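As a small illustration of the two branches in the Operation listing, the sketch below applies `_mm_srai_epi16` once with a count of 2 and once with a count of 16. The first rounds negative values toward negative infinity, like a signed `>>` on typical compilers. The second exceeds 15, so each element collapses to all sign bits (0 or -1). The function names are hypothetical, and the shift counts are compile-time constants to match the immediate form of the instruction.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Arithmetic shift right by 2 on eight 16-bit elements. */
static void srai2_epi16(const short in[8], short out[8])
{
    __m128i v;

    assert(in && out);
    v = _mm_loadu_si128((const __m128i *)in);
    _mm_storeu_si128((__m128i *)out, _mm_srai_epi16(v, 2));
}

/* Count above 15: every element becomes 0 (non-negative) or -1 (negative). */
static void sign_fill_epi16(const short in[8], short out[8])
{
    __m128i v;

    assert(in && out);
    v = _mm_loadu_si128((const __m128i *)in);
    _mm_storeu_si128((__m128i *)out, _mm_srai_epi16(v, 16));
}
```

For the input `{-8, -7, -1, 0, 1, 7, 8, -32768}`, `srai2_epi16` yields `{-2, -2, -1, 0, 0, 1, 2, -8192}` and `sign_fill_epi16` yields `{-1, -1, -1, 0, 0, 0, 0, -1}`.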