Commit Graph

6 Commits

Author SHA1 Message Date
Nigel Tao
f72412cfe3 vector: fix overflow when rasterizing a 30 degree line.
There are some obvious TODOs, but they will be follow-up commits.

This is about correctness, not performance, but for the record:

name                              old time/op  new time/op   delta
GlyphAlpha16Over-8                3.16µs ± 0%   3.38µs ± 0%   +6.96%         (p=0.000 n=9+10)
GlyphAlpha16Src-8                 3.06µs ± 0%   3.28µs ± 0%   +7.21%        (p=0.000 n=10+10)
GlyphAlpha32Over-8                4.92µs ± 1%   5.23µs ± 1%   +6.24%        (p=0.000 n=10+10)
GlyphAlpha32Src-8                 4.53µs ± 1%   4.83µs ± 0%   +6.69%        (p=0.000 n=10+10)
GlyphAlpha64Over-8                9.60µs ± 1%  10.21µs ± 0%   +6.36%        (p=0.000 n=10+10)
GlyphAlpha64Src-8                 8.04µs ± 0%   8.68µs ± 1%   +7.99%         (p=0.000 n=9+10)
GlyphAlpha128Over-8               23.1µs ± 0%   24.2µs ± 1%   +5.08%         (p=0.000 n=9+10)
GlyphAlpha128Src-8                16.8µs ± 0%   18.0µs ± 1%   +6.76%        (p=0.000 n=10+10)
GlyphAlpha256Over-8               68.6µs ± 1%   70.3µs ± 0%   +2.50%        (p=0.000 n=10+10)
GlyphAlpha256Src-8                43.6µs ± 0%   45.4µs ± 0%   +4.08%         (p=0.000 n=10+8)
GlyphRGBA16Over-8                 4.92µs ± 0%   5.14µs ± 0%   +4.48%          (p=0.000 n=9+9)
GlyphRGBA16Src-8                  4.39µs ± 0%   4.59µs ± 0%   +4.60%          (p=0.000 n=8+9)
GlyphRGBA32Over-8                 11.8µs ± 0%   12.2µs ± 1%   +2.89%         (p=0.000 n=9+10)
GlyphRGBA32Src-8                  9.79µs ± 1%  10.03µs ± 0%   +2.49%         (p=0.000 n=10+7)
GlyphRGBA64Over-8                 36.7µs ± 1%   37.5µs ± 1%   +2.23%        (p=0.000 n=10+10)
GlyphRGBA64Src-8                  28.5µs ± 0%   29.1µs ± 0%   +2.09%        (p=0.000 n=10+10)
GlyphRGBA128Over-8                 133µs ± 0%    135µs ± 0%   +1.51%         (p=0.000 n=10+9)
GlyphRGBA128Src-8                 99.1µs ± 0%  100.5µs ± 1%   +1.47%         (p=0.000 n=9+10)
GlyphRGBA256Over-8                 505µs ± 0%    511µs ± 0%   +1.18%         (p=0.000 n=9+10)
GlyphRGBA256Src-8                  372µs ± 0%    374µs ± 0%   +0.69%         (p=0.000 n=9+10)

Change-Id: Ice1d77de5bc2649f8cd88366bcae3c00e78d65c2
Reviewed-on: https://go-review.googlesource.com/31113
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2016-10-14 22:25:13 +00:00
Nigel Tao
beb9675609 vector: fix overflow when rasterizing wide lines.
Change-Id: Iea92b74ca9533de2ef17534ee3acf4f40c3d03ef
Reviewed-on: https://go-review.googlesource.com/30899
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2016-10-14 02:01:44 +00:00
Nigel Tao
8edbaf3f9e vector: add SIMD versions of xxxAccumulateOpOver.
name                             old time/op  new time/op  delta
GlyphAlpha16Over-8               3.55µs ± 0%  3.17µs ± 0%  -10.58%  (p=0.000 n=10+10)
GlyphAlpha32Over-8               6.73µs ± 1%  4.94µs ± 0%  -26.55%   (p=0.000 n=10+9)
GlyphAlpha64Over-8               16.4µs ± 0%   9.6µs ± 0%  -41.30%   (p=0.000 n=9+10)
GlyphAlpha128Over-8              47.3µs ± 0%  23.1µs ± 1%  -51.09%    (p=0.000 n=9+9)
GlyphAlpha256Over-8               159µs ± 0%    69µs ± 0%  -56.82%    (p=0.000 n=9+8)

A comparison of the non-SIMD and SIMD versions:

name                              time/op
FixedAccumulateOpOver16-8          579ns ± 0%
FixedAccumulateOpOverSIMD16-8      183ns ± 0%
FloatingAccumulateOpOver16-8       670ns ± 1%
FloatingAccumulateOpOverSIMD16-8   242ns ± 0%
FixedAccumulateOpOver64-8         9.61µs ± 0%
FixedAccumulateOpOverSIMD64-8     2.72µs ± 0%
FloatingAccumulateOpOver64-8      11.1µs ± 0%
FloatingAccumulateOpOverSIMD64-8  3.65µs ± 0%

Change-Id: I08273c40ac5445f39b77a88fb8b6b07fd3e5f84b
Reviewed-on: https://go-review.googlesource.com/30831
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2016-10-13 03:04:36 +00:00
Nigel Tao
746988e7a2 vector: add SIMD versions of xxxAccumulateOpSrc.
name                         old time/op  new time/op  delta
GlyphAlpha16Src-8            3.37µs ± 0%  3.07µs ± 1%   -8.86%    (p=0.000 n=9+9)
GlyphAlpha32Src-8            6.01µs ± 1%  4.55µs ± 0%  -24.28%   (p=0.000 n=10+9)
GlyphAlpha64Src-8            13.2µs ± 0%   8.1µs ± 0%  -38.69%   (p=0.000 n=10+9)
GlyphAlpha128Src-8           32.9µs ± 0%  16.9µs ± 0%  -48.85%   (p=0.000 n=10+9)
GlyphAlpha256Src-8           98.0µs ± 0%  43.6µs ± 1%  -55.50%  (p=0.000 n=10+10)

A comparison of the non-SIMD and SIMD versions:

name                             time/op
FixedAccumulateOpSrc16-8          368ns ± 0%
FixedAccumulateOpSrcSIMD16-8     86.8ns ± 1%
FloatingAccumulateOpSrc16-8       434ns ± 0%
FloatingAccumulateOpSrcSIMD16-8   119ns ± 0%
FixedAccumulateOpSrc64-8         6.12µs ± 0%
FixedAccumulateOpSrcSIMD64-8     1.17µs ± 0%
FloatingAccumulateOpSrc64-8      7.15µs ± 0%
FloatingAccumulateOpSrcSIMD64-8  1.68µs ± 1%

Change-Id: I58e5c7a3ecd12e536aab8e765e94275453d0eac8
Reviewed-on: https://go-review.googlesource.com/30431
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2016-10-10 07:32:32 +00:00
Nigel Tao
72141d56a2 vector: re-order some functions.
There are no code changes, just a re-ordering so that these files are
consistent with others in this package: OpOver, OpSrc, Mask.

Change-Id: Ib1d46a8e912dae0c760af655e919b77023688189
Reviewed-on: https://go-review.googlesource.com/30111
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2016-10-02 02:10:30 +00:00
Nigel Tao
992afa5d48 vector: add a fixed point math implementation.
name                      old time/op  new time/op  delta
GlyphAlpha16Over-8        4.48µs ± 1%  3.56µs ± 0%  -20.70%   (p=0.000 n=9+10)
GlyphAlpha16Src-8         4.17µs ± 0%  3.38µs ± 1%  -19.09%  (p=0.000 n=10+10)
GlyphAlpha32Over-8        9.03µs ± 0%  6.74µs ± 0%  -25.33%   (p=0.000 n=9+10)
GlyphAlpha32Src-8         7.46µs ± 1%  5.98µs ± 0%  -19.80%   (p=0.000 n=10+9)
GlyphAlpha64Over-8        21.3µs ± 0%  16.4µs ± 0%  -22.84%  (p=0.000 n=10+10)
GlyphAlpha64Src-8         16.2µs ± 1%  13.1µs ± 0%  -19.33%  (p=0.000 n=10+10)
GlyphAlpha128Over-8       59.8µs ± 0%  47.2µs ± 0%  -21.11%    (p=0.000 n=9+9)
GlyphAlpha128Src-8        41.3µs ± 1%  33.0µs ± 0%  -20.26%   (p=0.000 n=9+10)
GlyphAlpha256Over-8        197µs ± 0%   158µs ± 0%  -19.44%   (p=0.000 n=9+10)
GlyphAlpha256Src-8         124µs ± 0%    98µs ± 0%  -21.17%    (p=0.000 n=9+9)

GlyphAlphaLoose16Over-8   4.73µs ± 0%  3.97µs ± 1%  -16.06%  (p=0.000 n=10+10)
GlyphAlphaLoose16Src-8    4.41µs ± 0%  3.64µs ± 1%  -17.50%  (p=0.000 n=10+10)
GlyphAlphaLoose32Over-8   9.62µs ± 0%  8.47µs ± 0%  -11.95%  (p=0.000 n=10+10)
GlyphAlphaLoose32Src-8    8.25µs ± 0%  7.19µs ± 0%  -12.88%    (p=0.000 n=9+9)
GlyphAlphaLoose64Over-8   25.6µs ± 0%  22.2µs ± 0%  -13.01%    (p=0.000 n=9+9)
GlyphAlphaLoose64Src-8    20.2µs ± 0%  17.2µs ± 1%  -14.98%  (p=0.000 n=10+10)
GlyphAlphaLoose128Over-8  83.4µs ± 1%  68.2µs ± 0%  -18.27%  (p=0.000 n=10+10)
GlyphAlphaLoose128Src-8   59.8µs ± 0%  47.4µs ± 0%  -20.77%   (p=0.000 n=10+9)
GlyphAlphaLoose256Over-8   273µs ± 1%   239µs ± 0%  -12.52%   (p=0.000 n=10+9)
GlyphAlphaLoose256Src-8    187µs ± 0%   155µs ± 1%  -16.91%   (p=0.000 n=9+10)

GlyphRGBA16Over-8         5.99µs ± 0%  5.24µs ± 1%  -12.60%   (p=0.000 n=9+10)
GlyphRGBA16Src-8          5.48µs ± 0%  4.68µs ± 0%  -14.68%   (p=0.000 n=9+10)
GlyphRGBA32Over-8         14.6µs ± 0%  13.5µs ± 0%   -7.60%    (p=0.000 n=9+9)
GlyphRGBA32Src-8          12.6µs ± 0%  11.4µs ± 0%   -9.62%    (p=0.000 n=9+9)
GlyphRGBA64Over-8         44.8µs ± 0%  42.2µs ± 0%   -5.69%    (p=0.000 n=9+9)
GlyphRGBA64Src-8          36.6µs ± 1%  33.5µs ± 1%   -8.55%    (p=0.000 n=9+9)
GlyphRGBA128Over-8         162µs ± 0%   148µs ± 1%   -8.85%   (p=0.000 n=10+9)
GlyphRGBA128Src-8          129µs ± 1%   114µs ± 0%  -11.61%   (p=0.000 n=9+10)
GlyphRGBA256Over-8         588µs ± 0%   573µs ± 0%   -2.53%   (p=0.000 n=9+10)
GlyphRGBA256Src-8          455µs ± 0%   426µs ± 1%   -6.51%   (p=0.000 n=9+10)

GlyphNRGBA16Over-8        27.0µs ± 4%  26.3µs ± 2%   -2.65%   (p=0.001 n=9+10)
GlyphNRGBA16Src-8         19.4µs ± 3%  18.6µs ± 1%   -4.35%   (p=0.000 n=9+10)
GlyphNRGBA32Over-8        97.4µs ± 3%  96.8µs ± 2%     ~      (p=0.447 n=9+10)
GlyphNRGBA32Src-8         66.6µs ± 3%  64.5µs ± 1%   -3.21%   (p=0.000 n=10+9)
GlyphNRGBA64Over-8         372µs ± 3%   368µs ± 1%     ~     (p=0.105 n=10+10)
GlyphNRGBA64Src-8          235µs ± 1%   234µs ± 1%     ~       (p=0.130 n=8+8)
GlyphNRGBA128Over-8       1.45ms ± 2%  1.48ms ± 3%   +2.06%    (p=0.014 n=9+9)
GlyphNRGBA128Src-8         926µs ± 3%   937µs ± 1%     ~      (p=0.113 n=10+9)
GlyphNRGBA256Over-8       5.76ms ± 2%  5.90ms ± 3%   +2.29%   (p=0.001 n=9+10)
GlyphNRGBA256Src-8        3.59ms ± 1%  3.86ms ± 1%   +7.46%   (p=0.000 n=9+10)

Change-Id: I72f25193b5be4e57af09e9eea4eee50545a34cbf
Reviewed-on: https://go-review.googlesource.com/29972
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2016-09-30 06:47:22 +00:00