libweston: Optimise matrix multiplication

The previous version used div() to split the index of the current element into its row and column. div() is implemented as a libc call, which prevented the compiler from vectorising the loop and made matrix multiplication appear quite high in profiles.

With div() removed, we go from 64 vfmadd132ss instructions, each acting on one float at a time, to just 8 vfmadd132ps instructions when compiled with AVX2 support (or 16 mulps and 16 addps with SSE2 support only), and the function isn’t a hot spot any more.

Signed-off-by: Emmanuel Gil Peyrot <linkmauve@linkmauve.fr>
author: Emmanuel Gil Peyrot <linkmauve@linkmauve.fr> 2023-01-02 17:46:18 +0100
committer: Derek Foreman <derek.foreman@collabora.com> 2023-01-11 20:03:36 +0000
commit: e13c99690b7e1c4175fef7a58200694a2f3509ba (patch)
tree: de236a97b8aa1bdd4274086e986b3fad8283e815 /shared
parent: 102acac6a9c758d47d876ddcecc4ad9a10b0c34b (diff)
1 files changed, 9 insertions, 9 deletions
diff --git a/shared/matrix.c b/shared/matrix.c
index f301c5fa..72717f37 100644
--- a/shared/matrix.c
+++ b/shared/matrix.c
@@ -61,16 +61,16 @@ weston_matrix_multiply(struct weston_matrix *m, const struct weston_matrix *n)
 {
 	struct weston_matrix tmp;
 	const float *row, *column;
-	div_t d;
-	int i, j;
+	int i, j, k;
 
-	for (i = 0; i < 16; i++) {
-		tmp.d[i] = 0;
-		d = div(i, 4);
-		row = m->d + d.quot * 4;
-		column = n->d + d.rem;
-		for (j = 0; j < 4; j++)
-			tmp.d[i] += row[j] * column[j * 4];
+	for (i = 0; i < 4; i++) {
+		row = m->d + i * 4;
+		for (j = 0; j < 4; j++) {
+			tmp.d[4 * i + j] = 0;
+			column = n->d + j;
+			for (k = 0; k < 4; k++)
+				tmp.d[4 * i + j] += row[k] * column[k * 4];
+		}
 	}
 	tmp.type = m->type | n->type;
 	memcpy(m, &tmp, sizeof tmp);
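
For readers who want to try the change outside the Weston tree, the two loop shapes from the diff can be sketched side by side as below. This is only an illustration under stated assumptions: `struct mat4` is a hypothetical stand-in for `struct weston_matrix` that holds just the 16 floats (the `type` field is left out), and the names `mat4_multiply_div`, `mat4_multiply_nested` and `mat4_equal` are invented for this sketch. Both functions accumulate the four products of each element in the same order, so they should agree on the same inputs.

```c
#include <stdlib.h>   /* div(), div_t */
#include <string.h>   /* memcpy() */

/* Hypothetical stand-in for struct weston_matrix: only the 16 floats,
 * without the type field. */
struct mat4 {
	float d[16];
};

/* Old loop shape: one flat loop over the 16 output elements, with div()
 * splitting the flat index into a row (quot) and a column (rem). */
static void
mat4_multiply_div(struct mat4 *m, const struct mat4 *n)
{
	struct mat4 tmp;
	const float *row, *column;
	div_t d;
	int i, j;

	for (i = 0; i < 16; i++) {
		tmp.d[i] = 0;
		d = div(i, 4);
		row = m->d + d.quot * 4;
		column = n->d + d.rem;
		for (j = 0; j < 4; j++)
			tmp.d[i] += row[j] * column[j * 4];
	}
	memcpy(m, &tmp, sizeof tmp);
}

/* New loop shape: i and j are plain loop counters, so no library call
 * remains inside the loop nest. */
static void
mat4_multiply_nested(struct mat4 *m, const struct mat4 *n)
{
	struct mat4 tmp;
	const float *row, *column;
	int i, j, k;

	for (i = 0; i < 4; i++) {
		row = m->d + i * 4;
		for (j = 0; j < 4; j++) {
			tmp.d[4 * i + j] = 0;
			column = n->d + j;
			for (k = 0; k < 4; k++)
				tmp.d[4 * i + j] += row[k] * column[k * 4];
		}
	}
	memcpy(m, &tmp, sizeof tmp);
}

/* Returns 1 when all 16 elements compare equal, 0 otherwise. */
static int
mat4_equal(const struct mat4 *a, const struct mat4 *b)
{
	int i;

	for (i = 0; i < 16; i++) {
		if (a->d[i] != b->d[i])
			return 0;
	}
	return 1;
}
```

To compare code generation, one option is to compile both functions with optimisation enabled and inspect the assembly (for example with `-S`); the exact instructions emitted will depend on the compiler version and target flags.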