One could remove the recomputation of the perm
- vector by assuming (stride % 16) == 0, unfortunately
- this is not always true. Quite a lot of load/stores
- can be removed by assuming proper alignement of
+ vector by assuming (stride % 16) == 0; unfortunately
+ this is not always true. Quite a lot of loads/stores
+ can be removed by assuming proper alignment of
src & stride :-(
*/
uint8_t *src2 = src;
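
For readers unfamiliar with the idiom: the "recomputation of the perm
vector" refers to the standard AltiVec unaligned-load sequence. A minimal
sketch of that idiom, assuming AltiVec is enabled; this is illustrative,
not the template's actual code:

    #include <altivec.h>
    #include <stdint.h>

    /* Load 16 bytes from a possibly unaligned address: two aligned loads
     * plus a permute.  The permute vector depends on the low bits of the
     * address, so it must be recomputed whenever alignment can change. */
    static inline vector unsigned char load_unaligned(const uint8_t *src)
    {
        vector unsigned char perm = vec_lvsl(0, src); /* alignment permute vector */
        vector unsigned char lo   = vec_ld(0, src);   /* aligned load, low half   */
        vector unsigned char hi   = vec_ld(15, src);  /* aligned load, high half  */
        return vec_perm(lo, hi, perm);                /* splice out the 16 bytes  */
    }

If src were 16-byte aligned and stride a multiple of 16, perm would be
loop-invariant and the second load redundant, which is exactly the saving
the comment laments.
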
const vector signed short dornotd = vec_sel((vector signed short)zero,
dclampedfinal,
vec_cmplt(absmE, vqp));
- /* add/substract to l4 and l5 */
+ /* add/subtract to l4 and l5 */
const vector signed short vb4minusd = vec_sub(vb4, dornotd);
const vector signed short vb5plusd = vec_add(vb5, dornotd);
/* finally, stores */
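
The vec_sel above is the usual branchless conditional: dornotd holds the
correction in lanes where |mE| < qp and zero elsewhere, so the add/subtract
can then be applied unconditionally. A standalone sketch of the idiom
(names here are illustrative, not the template's):

    #include <altivec.h>

    /* Per-lane equivalent of:
     * if (abs_mE[i] < qp[i]) { b4[i] -= d[i]; b5[i] += d[i]; } */
    static inline void apply_correction(vector signed short *b4,
                                        vector signed short *b5,
                                        vector signed short d,
                                        vector signed short abs_mE,
                                        vector signed short qp)
    {
        const vector signed short zero   = vec_splat_s16(0);
        vector bool short         lt     = vec_cmplt(abs_mE, qp); /* lanes where |mE| < qp */
        vector signed short       d_or_0 = vec_sel(zero, d, lt);  /* d there, 0 elsewhere  */
        *b4 = vec_sub(*b4, d_or_0);
        *b5 = vec_add(*b5, d_or_0);
    }

Doing the compare/select once and applying it to both rows keeps the
filter entirely branch-free, which is what the vector unit wants.
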
One could remove the recomputation of the perm
- vector by assuming (stride % 16) == 0, unfortunately
- this is not always true. Quite a lot of load/stores
- can be removed by assuming proper alignement of
+ vector by assuming (stride % 16) == 0; unfortunately
+ this is not always true. Quite a lot of loads/stores
+ can be removed by assuming proper alignment of
src & stride :-(
*/
uint8_t *srcCopy = src;
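
To make concrete what assuming (stride % 16) == 0 would buy: every row
would then share src's alignment, so the permute vector could be hoisted
out of the per-row loop. A hypothetical sketch under that assumption, not
something the file can actually do, since the comment notes the stride
guarantee does not always hold:

    #include <altivec.h>
    #include <stdint.h>

    /* Hypothetical: with stride % 16 == 0, all rows share src's
     * alignment, so one vec_lvsl result serves every iteration. */
    static void process_rows(const uint8_t *src, int stride, int rows)
    {
        const vector unsigned char perm = vec_lvsl(0, src); /* computed once */
        for (int i = 0; i < rows; i++) {
            vector unsigned char lo  = vec_ld(0, src);
            vector unsigned char hi  = vec_ld(15, src);
            vector unsigned char row = vec_perm(lo, hi, perm);
            (void)row;                      /* ... filter the row here ... */
            src += stride;                  /* alignment offset unchanged  */
        }
    }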