h264/aarch64: add intra loop filter neon asm
author Janne Grunau <janne-libav@jannau.net>
Mon, 13 Aug 2018 18:43:19 +0000 (20:43 +0200)
committer Janne Grunau <janne-libav@jannau.net>
Sat, 26 Jan 2019 11:05:10 +0000 (12:05 +0100)
commit 28a8b5413b64b831dfb8650208bccd8b78360484
tree 090b0141e8734a31fca45d432cf7c266d34ed852
parent 846c3d6aca5484904e60946c4fe8b8833bc07f92

Add my NEON asm from x264, relicensed under the LGPL 2.1 or later. The code
is ported (x264 uses NV12 chroma) and optimized.
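
To illustrate why the chroma functions need porting, here is a minimal scalar sketch in C of the intra (strong) chroma edge filter as defined by H.264. It is not the NEON code from this commit. The function name `chroma_intra_filter_line` and the `xstride` parameter are hypothetical. The sketch assumes planar chroma (separate U and V planes, neighbouring samples 1 byte apart) on the libavcodec side and interleaved NV12 chroma (U and V alternating, samples of one component 2 bytes apart) on the x264 side.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Filter one line of samples across a block edge (intra chroma filter).
 * pix points at q0, the first sample on the far side of the edge.
 * xstride is the distance in bytes between neighbouring samples of the
 * same component across the edge: 1 for a planar row, 2 for an NV12 row.
 * Hypothetical helper for illustration only.
 */
static void chroma_intra_filter_line(uint8_t *pix, ptrdiff_t xstride,
                                     int alpha, int beta)
{
    const int p1 = pix[-2 * xstride];
    const int p0 = pix[-1 * xstride];
    const int q0 = pix[0];
    const int q1 = pix[1 * xstride];

    /* Only filter when the step across the edge looks like a blocking
     * artifact rather than a real image edge. */
    if (abs(p0 - q0) < alpha &&
        abs(p1 - p0) < beta &&
        abs(q1 - q0) < beta) {
        pix[-1 * xstride] = (uint8_t)((2 * p1 + p0 + q1 + 2) >> 2); /* p0' */
        pix[0]            = (uint8_t)((2 * q1 + q0 + p1 + 2) >> 2); /* q0' */
    }
}
```

For a planar row the call would use `xstride = 1`. For an NV12 row the same helper would be called twice with `xstride = 2`, once starting at the U sample and once at the adjacent V sample. A vectorized NV12 routine processes both components from one interleaved load, so it cannot be reused unchanged on separate planes.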

Cycle counts from checkasm --bench on a Snapdragon 820e:
h264_h_loop_filter_luma_intra_8bpp_c: 60.0
h264_h_loop_filter_luma_intra_8bpp_neon: 54.2
h264_v_loop_filter_luma_intra_8bpp_c: 148.3
h264_v_loop_filter_luma_intra_8bpp_neon: 73.8
h264_h_loop_filter_chroma_intra_8bpp_c: 27.8
h264_h_loop_filter_chroma_intra_8bpp_neon: 21.4
h264_h_loop_filter_chroma_mbaff_intra_8bpp_c: 15.8
h264_h_loop_filter_chroma_mbaff_intra_8bpp_neon: 15.7
h264_v_loop_filter_chroma_intra_8bpp_c: 45.8
h264_v_loop_filter_chroma_intra_8bpp_neon: 17.3
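
As a rough way to read these numbers, the speedup of each NEON function is its C count divided by its NEON count. A minimal sketch in C follows. The helper name `speedup` is illustrative and not part of checkasm.

```c
/*
 * Speedup of the NEON version over the C version, computed as the ratio
 * of the two counts reported by checkasm --bench.
 * Hypothetical helper for illustration only.
 */
static double speedup(double c_count, double neon_count)
{
    return c_count / neon_count;
}
```

Applied to the figures above, this gives about 2.0x for v luma intra (148.3 / 73.8), about 2.6x for v chroma intra (45.8 / 17.3), about 1.3x for h chroma intra, about 1.1x for h luma intra, and essentially no change for h chroma mbaff intra (15.8 / 15.7).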
libavcodec/aarch64/h264dsp_init_aarch64.c
libavcodec/aarch64/h264dsp_neon.S