Issue 2813823002: Adding new functionality for SIMD optimizations in AEC3

peah-webrtc

Patchset #1 (id:1) has been deleted

3 years, 8 months ago (2017-04-11 08:03:42 UTC) #1

peah-webrtc

Description was changed from ========== MoreSSe2Optimizations MoreSSe2Optimizations BUG= ========== to ========== Adding new functionality for ...

3 years, 8 months ago (2017-04-11 08:10:21 UTC) #2

peah-webrtc

Description was changed from ========== Adding new functionality for SIMD optimizations in AEC3 Most of ...

3 years, 8 months ago (2017-04-11 08:16:08 UTC) #4

peah-webrtc

peah@webrtc.org changed reviewers: + aleloi@webrtc.org, ivoc@webrtc.org

3 years, 8 months ago (2017-04-11 08:16:38 UTC) #5

peah-webrtc

Hi, This is a CL that replaces some of the computations in AEC3 using ...

3 years, 8 months ago (2017-04-11 08:16:39 UTC) #6

ivoc

See some minor remarks below. Also, I didn't test this, but I think it might ...

3 years, 8 months ago (2017-04-11 12:03:02 UTC) #7

peah-webrtc

Thanks for the comments! I've uploaded a new patch. https://codereview.webrtc.org/2813823002/diff/40001/webrtc/modules/audio_processing/aec3/suppression_gain.cc File webrtc/modules/audio_processing/aec3/suppression_gain.cc (right): https://codereview.webrtc.org/2813823002/diff/40001/webrtc/modules/audio_processing/aec3/suppression_gain.cc#newcode52 webrtc/modules/audio_processing/aec3/suppression_gain.cc:52: ...

3 years, 8 months ago (2017-04-11 13:09:26 UTC) #8

aleloi

lgtm! Nice that the complexity could be moved from suppression_gain.cc. https://codereview.webrtc.org/2813823002/diff/40001/webrtc/modules/audio_processing/aec3/vector_math.h File webrtc/modules/audio_processing/aec3/vector_math.h (right): https://codereview.webrtc.org/2813823002/diff/40001/webrtc/modules/audio_processing/aec3/vector_math.h#newcode14 ...

3 years, 8 months ago (2017-04-11 14:27:30 UTC) #9

peah-webrtc

On 2017/04/11 12:03:02, ivoc wrote: > See some minor remarks below. Also, I didn't test ...

3 years, 8 months ago (2017-04-11 15:12:02 UTC) #10

On 2017/04/11 12:03:02, ivoc wrote:
> See some minor remarks below. Also, I didn't test this, but I think it might be
> good for performance to split the loop into 3 parts:
>
> 1. Process start of the array which is potentially unaligned, up to the first
> alignment boundary.
> 2. Process the main chunk of aligned memory (using _mm_load_ps instead of
> _mm_loadu_ps)
> 3. Process the remainder of unaligned memory.
>
> Also, I think these math functions have a pretty low number of computations per
> memory access, which means that the SSE speedup could be lower than expected
> (since memory is likely to be the bottleneck).
>
> https://codereview.webrtc.org/2813823002/diff/40001/webrtc/modules/audio_proc...
> File webrtc/modules/audio_processing/aec3/suppression_gain.cc (right):
>
> https://codereview.webrtc.org/2813823002/diff/40001/webrtc/modules/audio_proc...
> webrtc/modules/audio_processing/aec3/suppression_gain.cc:52: // TODO(peah): Add
> further optimizations, in particular for the divisions.
> Is this still relevant?
>
> https://codereview.webrtc.org/2813823002/diff/40001/webrtc/modules/audio_proc...
> File webrtc/modules/audio_processing/aec3/vector_math.h (right):
>
> https://codereview.webrtc.org/2813823002/diff/40001/webrtc/modules/audio_proc...
> webrtc/modules/audio_processing/aec3/vector_math.h:45: for (size_t k = 0; k <
> kVectorLimit; ++k, j += 4) {
> I think the 2 different loop variables are a bit confusing. An alternative is to
> do it like this:
>
> int j = 0;
> for (; j<kVectorLimit*4; j+=4) {
>   ...
> }
> for (; j<x.size(); j++) {
>   ...
> }
>
> https://codereview.webrtc.org/2813823002/diff/40001/webrtc/modules/audio_proc...
> webrtc/modules/audio_processing/aec3/vector_math.h:74: for (size_t k = 0; k <
> kVectorLimit; ++k, j += 4) {
> Same here.
>
> https://codereview.webrtc.org/2813823002/diff/40001/webrtc/modules/audio_proc...
> webrtc/modules/audio_processing/aec3/vector_math.h:102: for (size_t k = 0; k <
> kVectorLimit; ++k, j += 4) {
> And here.
>
> https://codereview.webrtc.org/2813823002/diff/40001/webrtc/modules/audio_proc...
> File webrtc/modules/audio_processing/aec3/vector_math_unittest.cc (right):
>
> https://codereview.webrtc.org/2813823002/diff/40001/webrtc/modules/audio_proc...
> webrtc/modules/audio_processing/aec3/vector_math_unittest.cc:28: x[k] = 2.f /
> 3.f * k;
> I would add some braces to make the order of operations more clear (although I
> think the code is correct).
>
> https://codereview.webrtc.org/2813823002/diff/40001/webrtc/modules/audio_proc...
> webrtc/modules/audio_processing/aec3/vector_math_unittest.cc:35: EXPECT_EQ(z,
> z_sse2);
> Does this check the contents of the arrays as well?
>
> https://codereview.webrtc.org/2813823002/diff/40001/webrtc/modules/audio_proc...
> webrtc/modules/audio_processing/aec3/vector_math_unittest.cc:51: y[k] = 2.f /
> 3.f * k;
> Braces would be nice.

Sorry, I did not see the initial remarks:
Yes, the code makes no assumptions about alignment at all (it uses loadu_
instead of load_). The reason is that on newer platforms the alignment does not
seem to matter (my benchmarks showed no difference). I think it matters on older
platforms, though, but this actually matches how it is done in AEC2. Since it
was not straightforward to align the signals, I did not do that, and until the
alignment is done I'd prefer to use 2 loops. Furthermore, I'm not sure how to
really verify that the alignment is done properly if I cannot show any
difference in the benchmarks.

On the platforms where I tested, loadu_ps gave no difference compared to
load_ps. That is in line with what I read on the net.

WDYT?
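
For readers following the thread, here is a minimal sketch of the two-loop structure being discussed: a vectorized loop that uses unaligned loads, followed by a scalar loop for the leftover elements, both driven by a single index as ivoc suggested. This is not the code from vector_math.h; the function name `SqrtInPlace` and the macro `SKETCH_HAS_SSE2` are made up for illustration, and the choice of a square-root operation is an assumption. On targets without SSE2 the sketch falls back to the scalar loop alone.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define SKETCH_HAS_SSE2 1  // Hypothetical macro, only for this sketch.
#endif

// Hypothetical helper: replaces every element of x with its square root.
void SqrtInPlace(std::vector<float>& x) {
  std::size_t j = 0;
#ifdef SKETCH_HAS_SSE2
  // Largest multiple of 4 that fits in x.size().
  const std::size_t vector_limit = x.size() / 4 * 4;
  // Vectorized loop: four floats per iteration. _mm_loadu_ps and
  // _mm_storeu_ps place no alignment requirement on &x[j].
  for (; j < vector_limit; j += 4) {
    __m128 v = _mm_loadu_ps(&x[j]);
    v = _mm_sqrt_ps(v);
    _mm_storeu_ps(&x[j], v);
  }
#endif
  // Scalar loop: handles the remaining 0 to 3 elements (or everything when
  // SSE2 is unavailable), continuing from the same index j.
  for (; j < x.size(); ++j) {
    x[j] = std::sqrt(x[j]);
  }
}
```

Because the index `j` carries over from the first loop to the second, there is no separate counter `k` to keep in sync.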

peah-webrtc

The CQ bit was checked by peah@webrtc.org to run a CQ dry run

3 years, 8 months ago (2017-04-12 05:49:06 UTC) #11

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.webrtc.org/2813823002/60001

3 years, 8 months ago (2017-04-12 05:49:14 UTC) #12

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

3 years, 8 months ago (2017-04-12 05:59:22 UTC) #13

commit-bot: I haz the power

Dry run: Try jobs failed on following builders: win_x64_clang_rel on master.tryserver.webrtc (JOB_FAILED, http://build.chromium.org/p/tryserver.webrtc/builders/win_x64_clang_rel/builds/12663)

3 years, 8 months ago (2017-04-12 05:59:23 UTC) #14

peah-webrtc

The CQ bit was checked by peah@webrtc.org to run a CQ dry run

3 years, 8 months ago (2017-04-12 06:10:57 UTC) #15

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.webrtc.org/2813823002/80001

3 years, 8 months ago (2017-04-12 06:11:02 UTC) #16

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

3 years, 8 months ago (2017-04-12 06:48:00 UTC) #17

commit-bot: I haz the power

Dry run: This issue passed the CQ dry run.

3 years, 8 months ago (2017-04-12 06:48:01 UTC) #18

ivoc

On 2017/04/11 15:12:02, peah-webrtc wrote: > On 2017/04/11 12:03:02, ivoc wrote: > > See some ...

3 years, 8 months ago (2017-04-12 07:26:50 UTC) #19

On 2017/04/11 15:12:02, peah-webrtc wrote:
> Sorry, I did not see the initial remarks:
> Yes, the code makes no assumptions about alignment at all (it uses loadu_
> instead of load_). The reason is that on newer platforms the alignment does not
> seem to matter (my benchmarks showed no difference). I think it matters on older
> platforms, though, but this actually matches how it is done in AEC2. Since it
> was not straightforward to align the signals, I did not do that, and until the
> alignment is done I'd prefer to use 2 loops. Furthermore, I'm not sure how to
> really verify that the alignment is done properly if I cannot show any
> difference in the benchmarks.
>
> On the platforms where I tested, loadu_ps gave no difference compared to
> load_ps. That is in line with what I read on the net.
>
> WDYT?

Ok, let's keep it like this for now. LGTM.
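
For reference, the three-part split that was deferred here could look roughly like the sketch below: a scalar head up to the first 16-byte boundary, an aligned SSE body, and a scalar tail. It is only an illustration under stated assumptions and not code from this CL; `ScaleInPlace`, its parameters, and `SKETCH_HAS_SSE2` are hypothetical names, and an in-place scaling operation was chosen so that only one buffer's alignment has to be considered.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define SKETCH_HAS_SSE2 1  // Hypothetical macro, only for this sketch.
#endif

// Hypothetical helper: multiplies x[0..size) by gain in place.
void ScaleInPlace(float* x, std::size_t size, float gain) {
  std::size_t j = 0;
#ifdef SKETCH_HAS_SSE2
  // 1. Scalar head: advance until x + j sits on a 16-byte boundary.
  while (j < size && reinterpret_cast<std::uintptr_t>(x + j) % 16 != 0) {
    x[j] *= gain;
    ++j;
  }
  // 2. Aligned body: four floats per iteration. _mm_load_ps and
  //    _mm_store_ps require the 16-byte alignment established above.
  const __m128 g = _mm_set1_ps(gain);
  for (; j + 4 <= size; j += 4) {
    _mm_store_ps(x + j, _mm_mul_ps(_mm_load_ps(x + j), g));
  }
#endif
  // 3. Scalar tail: whatever is left after the last full group of four.
  for (; j < size; ++j) {
    x[j] *= gain;
  }
}
```

With two or more buffers (for example an element-wise multiply of x and y into z), the buffers may sit at different offsets from a 16-byte boundary, so only one of them can be guaranteed aligned this way. That is part of why aligning the signals themselves, as mentioned above, would be the cleaner route.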

peah-webrtc

The patchset sent to the CQ was uploaded after l-g-t-m from aleloi@webrtc.org Link to the ...

3 years, 8 months ago (2017-04-12 08:18:26 UTC) #21

commit-bot: I haz the power

CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.webrtc.org/2813823002/80001

3 years, 8 months ago (2017-04-12 08:18:33 UTC) #22

commit-bot: I haz the power

CQ is committing da patch. Bot data: {"patchset_id": 80001, "attempt_start_ts": 1491985106317250, "parent_rev": "9f28b1d354a2f457032fec5f1a0b3aeac31f7275", "commit_rev": "5e79b293137f9022322331c9644743203d246ba3"}

3 years, 8 months ago (2017-04-12 08:20:46 UTC) #23

commit-bot: I haz the power

Description was changed from ========== Adding new functionality for SIMD optimizations in AEC3 Most of ...

3 years, 8 months ago (2017-04-12 08:20:50 UTC) #24

commit-bot: I haz the power

3 years, 8 months ago (2017-04-12 08:20:51 UTC) #25

Message was sent while issue was closed.

Committed patchset #3 (id:80001) as
https://chromium.googlesource.com/external/webrtc/+/5e79b293137f9022322331c96...

Issue 2813823002: Adding new functionality for SIMD optimizations in AEC3 (Closed)

Description

Patch Set 1 #

Patch Set 2 : Changes in response to reviewer comments #

Patch Set 3 : Fixed build error on windows #

Messages