|
|
Index: webrtc/modules/audio_processing/agc/legacy/analog_agc.c |
diff --git a/webrtc/modules/audio_processing/agc/legacy/analog_agc.c b/webrtc/modules/audio_processing/agc/legacy/analog_agc.c |
index 36c67c282a02a1b9096f01aa8bda572624f8b67b..030077afe30aae60d2a611670cbaf0a0323e3015 100644 |
--- a/webrtc/modules/audio_processing/agc/legacy/analog_agc.c |
+++ b/webrtc/modules/audio_processing/agc/legacy/analog_agc.c |
@@ -476,16 +476,20 @@ void WebRtcAgc_ZeroCtrl(LegacyAgc* stt, int32_t* inMicLevel, int32_t* env) { |
int16_t i; |
int32_t tmp32 = 0; |
int32_t midVal; |
+ const int kZeroThreshold = 500; |
peah-webrtc
2016/05/23 04:59:36
This solution definitely works, but another varian
This solution definitely works, but another variant would have been just to
replace
int32_t tmp32 = 0;
by
int64_t tmp64 = 0;
and then do the additions in 64 bit instead.
that should probably also be faster than having the if-statement inside the
for-loop.
Or is there a reason not to use 64 bit values here?
minyue-webrtc
2016/05/23 07:40:48
Yes, I think early exiting the loop is more effici
On 2016/05/23 04:59:36, peah-webrtc wrote:
> This solution definitely works, but another variant would have been just to
> replace
> int32_t tmp32 = 0;
> by
> int64_t tmp64 = 0;
> and then do the additions in 64 bit instead.
>
> that should probably also be faster than having the if-statement inside the
> for-loop.
>
> Or is there a reason not to use 64 bit values here?
Yes, I think early exiting the loop is more efficient, too.
peah-webrtc
2016/05/23 12:00:19
I think that this would be a clear case where a 64
On 2016/05/23 07:40:48, minyue-webrtc wrote:
> On 2016/05/23 04:59:36, peah-webrtc wrote:
> > This solution definitely works, but another variant would have been just to
> > replace
> > int32_t tmp32 = 0;
> > by
> > int64_t tmp64 = 0;
> > and then do the additions in 64 bit instead.
> >
> > that should probably also be faster than having the if-statement inside the
> > for-loop.
> >
> > Or is there a reason not to use 64 bit values here?
>
> Yes, I think early exiting the loop is more efficient, too.
I think that this would be a clear case where a 64 bit value should be used. It
will lead to cleaner code and I doubt that the early return will save anything,
as this is a very short loop of fixed size, which should be ideal to pipeline
(meaning that an early return won't save computations as it anyway breaks the
pipeline).
The only reason I can see for not using a 64 bit value for the sum is if it is
significantly more complex. Do you know if that is the case?
kwiberg-webrtc
2016/05/23 12:44:40
Are the operations int64 += int32 and int64 >= int
On 2016/05/23 12:00:19, peah-webrtc wrote:
> On 2016/05/23 07:40:48, minyue-webrtc wrote:
> > On 2016/05/23 04:59:36, peah-webrtc wrote:
> > > This solution definitely works, but another variant would have been just
to
> > > replace
> > > int32_t tmp32 = 0;
> > > by
> > > int64_t tmp64 = 0;
> > > and then do the additions in 64 bit instead.
> > >
> > > that should probably also be faster than having the if-statement inside
the
> > > for-loop.
> > >
> > > Or is there a reason not to use 64 bit values here?
> >
> > Yes, I think early exiting the loop is more efficient, too.
>
> I think that this would be a clear case where a 64 bit value should be used.
It
> will lead to cleaner code and I doubt that the early return will save
anything,
> as this is a very short loop of fixed size, which should be ideal to pipeline
> (meaning that an early return won't save computations as it anyway breaks the
> pipeline).
>
> The only reason I can see for not using a 64 bit value for the sum is if it is
> significantly more complex. Do you know if that is the case?
Are the operations int64 += int32 and int64 >= int32_constant? That ought to be
pretty efficient even on a 32-bit platform---I'm guessing ~two instructions for
the addition (add low half, add carry to high half), and ~three for the
comparison (test low half, test if high half is zero, combine the results).
On 64-bit platforms, 64-bit operations ought to be just as cheap as 32-bit
operations as long as they operate on registers.
So my guess is that using a 64-bit accumulator is a good idea.
minyue-webrtc
2016/05/24 01:32:26
Loop in 481-488 is really to check if env[0..9] ar
On 2016/05/23 12:00:19, peah-webrtc wrote:
> On 2016/05/23 07:40:48, minyue-webrtc wrote:
> > On 2016/05/23 04:59:36, peah-webrtc wrote:
> > > This solution definitely works, but another variant would have been just
to
> > > replace
> > > int32_t tmp32 = 0;
> > > by
> > > int64_t tmp64 = 0;
> > > and then do the additions in 64 bit instead.
> > >
> > > that should probably also be faster than having the if-statement inside
the
> > > for-loop.
> > >
> > > Or is there a reason not to use 64 bit values here?
> >
> > Yes, I think early exiting the loop is more efficient, too.
>
> I think that this would be a clear case where a 64 bit value should be used.
It
> will lead to cleaner code and I doubt that the early return will save
anything,
> as this is a very short loop of fixed size, which should be ideal to pipeline
> (meaning that an early return won't save computations as it anyway breaks the
> pipeline).
>
> The only reason I can see for not using a 64 bit value for the sum is if it is
> significantly more complex. Do you know if that is the case?
Loop in 481-488 is really to check if env[0..9] are generally zeros. In most
cases, this should be invalidated at index 0, it is nice that we return early. I
do not fully understand the pipeline argument.
peah-webrtc
2016/05/24 06:16:20
What I mean by pipelining is that as the loop is s
On 2016/05/24 01:32:26, minyue-webrtc wrote:
> On 2016/05/23 12:00:19, peah-webrtc wrote:
> > On 2016/05/23 07:40:48, minyue-webrtc wrote:
> > > On 2016/05/23 04:59:36, peah-webrtc wrote:
> > > > This solution definitely works, but another variant would have been just
> to
> > > > replace
> > > > int32_t tmp32 = 0;
> > > > by
> > > > int64_t tmp64 = 0;
> > > > and then do the additions in 64 bit instead.
> > > >
> > > > that should probably also be faster than having the if-statement inside
> the
> > > > for-loop.
> > > >
> > > > Or is there a reason not to use 64 bit values here?
> > >
> > > Yes, I think early exiting the loop is more efficient, too.
> >
> > I think that this would be a clear case where a 64 bit value should be used.
> It
> > will lead to cleaner code and I doubt that the early return will save
> anything,
> > as this is a very short loop of fixed size, which should be ideal to
pipeline
> > (meaning that an early return won't save computations as it anyway breaks
the
> > pipeline).
> >
> > The only reason I can see for not using a 64 bit value for the sum is if it
is
> > significantly more complex. Do you know if that is the case?
>
> Loop in 481-488 is really to check if env[0..9] are generally zeros. In most
> cases, this should be invalidated at index 0, it is nice that we return early.
I
> do not fully understand the pipeline argument.
What I mean by pipelining is that as the loop is so predictable for the compiler
that it should be extremely efficient to perform for the CPU and it utilizes the
CPU pipeline in a really good manner. Possibly it could even be done using SIMD
instructions. Since the loop limit is fixed, and as low as 10, I expect the
compiler would probably do loop-unrolling on the loop, e.g, replace the loop by.
tmp32 += env[0];
tmp32 += env[1];
...
tmp32 += env[9];
Therefore, this loop is extremely cheap and doing an early return is probably
not worth it, and would potentially even increase the complexity.
The compiler would possibly be able to do good pipelining of the early-return
variant of the loop but it is at least not as straightforward to do that, and
the break statement will anyway break the pipeline for the loop. Also, loop
unrolling would be less nice, e.g.,
if (env[0] >= kZeroThreshold || tmp32 >= kZeroThreshold) {
goto early_return_label;
}
tmp32 += env[0];
if (env[1] >= kZeroThreshold || tmp32 >= kZeroThreshold) {
goto early_return_label;
}
tmp32 += env[1];
...
if (env[9] >= kZeroThreshold || tmp32 >= kZeroThreshold) {
goto early_return_label;
}
tmp32 += env[9];
early_return_label:
(The compiler would probably optimize the loop unrolling above to look very
different though).
If you expect the test to often be invalidated at index 0, I'd then instead
change the code to be,
int64_t tmp64 = env[0];
if (tmp64 < kZeroThreshold) {
for (i = 1; i < 10; ++i) {
tmp64 = env[i];
}
}
That would provide a really efficient early return, and would give a very nice
optimize able loop. I don't think it is worth the extra code though, as this is
an extremely cheap loop for the compiler to perform, and as these types of
optimization only affect the average level of the complexity which means that it
is mainly a way to save electrical power of the CPU (i.e., does not affect the
algorithmic complexity).
As I see it, the only reason for not going for 64 bit variables should be a high
cost of the 64 bit additions. Do you know if that is the case?
minyue-webrtc
2016/05/24 07:16:12
I can basically follow your suggestion, but I thin
On 2016/05/24 06:16:20, peah-webrtc wrote:
> On 2016/05/24 01:32:26, minyue-webrtc wrote:
> > On 2016/05/23 12:00:19, peah-webrtc wrote:
> > > On 2016/05/23 07:40:48, minyue-webrtc wrote:
> > > > On 2016/05/23 04:59:36, peah-webrtc wrote:
> > > > > This solution definitely works, but another variant would have been
just
> > to
> > > > > replace
> > > > > int32_t tmp32 = 0;
> > > > > by
> > > > > int64_t tmp64 = 0;
> > > > > and then do the additions in 64 bit instead.
> > > > >
> > > > > that should probably also be faster than having the if-statement
inside
> > the
> > > > > for-loop.
> > > > >
> > > > > Or is there a reason not to use 64 bit values here?
> > > >
> > > > Yes, I think early exiting the loop is more efficient, too.
> > >
> > > I think that this would be a clear case where a 64 bit value should be
used.
> > It
> > > will lead to cleaner code and I doubt that the early return will save
> > anything,
> > > as this is a very short loop of fixed size, which should be ideal to
> pipeline
> > > (meaning that an early return won't save computations as it anyway breaks
> the
> > > pipeline).
> > >
> > > The only reason I can see for not using a 64 bit value for the sum is if
it
> is
> > > significantly more complex. Do you know if that is the case?
> >
> > Loop in 481-488 is really to check if env[0..9] are generally zeros. In most
> > cases, this should be invalidated at index 0, it is nice that we return
early.
> I
> > do not fully understand the pipeline argument.
>
> What I mean by pipelining is that as the loop is so predictable for the
compiler
> that it should be extremely efficient to perform for the CPU and it utilizes
the
> CPU pipeline in a really good manner. Possibly it could even be done using
SIMD
> instructions. Since the loop limit is fixed, and as low as 10, I expect the
> compiler would probably do loop-unrolling on the loop, e.g, replace the loop
by.
> tmp32 += env[0];
> tmp32 += env[1];
> ...
> tmp32 += env[9];
> Therefore, this loop is extremely cheap and doing an early return is probably
> not worth it, and would potentially even increase the complexity.
>
> The compiler would possibly be able to do good pipelining of the early-return
> variant of the loop but it is at least not as straightforward to do that, and
> the break statement will anyway break the pipeline for the loop. Also, loop
> unrolling would be less nice, e.g.,
> if (env[0] >= kZeroThreshold || tmp32 >= kZeroThreshold) {
> goto early_return_label;
> }
> tmp32 += env[0];
> if (env[1] >= kZeroThreshold || tmp32 >= kZeroThreshold) {
> goto early_return_label;
> }
> tmp32 += env[1];
>
> ...
> if (env[9] >= kZeroThreshold || tmp32 >= kZeroThreshold) {
> goto early_return_label;
> }
> tmp32 += env[9];
> early_return_label:
>
>
> (The compiler would probably optimize the loop unrolling above to look very
> different though).
>
> If you expect the test to often be invalidated at index 0, I'd then instead
> change the code to be,
> int64_t tmp64 = env[0];
> if (tmp64 < kZeroThreshold) {
> for (i = 1; i < 10; ++i) {
> tmp64 = env[i];
> }
> }
>
> That would provide a really efficient early return, and would give a very nice
> optimize able loop. I don't think it is worth the extra code though, as this
is
> an extremely cheap loop for the compiler to perform, and as these types of
> optimization only affect the average level of the complexity which means that
it
> is mainly a way to save electrical power of the CPU (i.e., does not affect the
> algorithmic complexity).
>
>
> As I see it, the only reason for not going for 64 bit variables should be a
high
> cost of the 64 bit additions. Do you know if that is the case?
>
>
I can basically follow your suggestion, but I think it is a chance to learn
something. Let me continue the discussion a bit.
I think pipelining is useful if the lines in the loop can be parallized, this
wiki page has good explanation.
https://en.wikipedia.org/wiki/Software_pipelining
For summation, parallization is not possible.
peah-webrtc
2016/05/24 10:46:29
I don't know this as well as I should do, but I me
On 2016/05/24 07:16:12, minyue-webrtc wrote:
> On 2016/05/24 06:16:20, peah-webrtc wrote:
> > On 2016/05/24 01:32:26, minyue-webrtc wrote:
> > > On 2016/05/23 12:00:19, peah-webrtc wrote:
> > > > On 2016/05/23 07:40:48, minyue-webrtc wrote:
> > > > > On 2016/05/23 04:59:36, peah-webrtc wrote:
> > > > > > This solution definitely works, but another variant would have been
> just
> > > to
> > > > > > replace
> > > > > > int32_t tmp32 = 0;
> > > > > > by
> > > > > > int64_t tmp64 = 0;
> > > > > > and then do the additions in 64 bit instead.
> > > > > >
> > > > > > that should probably also be faster than having the if-statement
> inside
> > > the
> > > > > > for-loop.
> > > > > >
> > > > > > Or is there a reason not to use 64 bit values here?
> > > > >
> > > > > Yes, I think early exiting the loop is more efficient, too.
> > > >
> > > > I think that this would be a clear case where a 64 bit value should be
> used.
> > > It
> > > > will lead to cleaner code and I doubt that the early return will save
> > > anything,
> > > > as this is a very short loop of fixed size, which should be ideal to
> > pipeline
> > > > (meaning that an early return won't save computations as it anyway
breaks
> > the
> > > > pipeline).
> > > >
> > > > The only reason I can see for not using a 64 bit value for the sum is if
> it
> > is
> > > > significantly more complex. Do you know if that is the case?
> > >
> > > Loop in 481-488 is really to check if env[0..9] are generally zeros. In
most
> > > cases, this should be invalidated at index 0, it is nice that we return
> early.
> > I
> > > do not fully understand the pipeline argument.
> >
> > What I mean by pipelining is that as the loop is so predictable for the
> compiler
> > that it should be extremely efficient to perform for the CPU and it utilizes
> the
> > CPU pipeline in a really good manner. Possibly it could even be done using
> SIMD
> > instructions. Since the loop limit is fixed, and as low as 10, I expect the
> > compiler would probably do loop-unrolling on the loop, e.g, replace the loop
> by.
> > tmp32 += env[0];
> > tmp32 += env[1];
> > ...
> > tmp32 += env[9];
> > Therefore, this loop is extremely cheap and doing an early return is
probably
> > not worth it, and would potentially even increase the complexity.
> >
> > The compiler would possibly be able to do good pipelining of the
early-return
> > variant of the loop but it is at least not as straightforward to do that,
and
> > the break statement will anyway break the pipeline for the loop. Also, loop
> > unrolling would be less nice, e.g.,
> > if (env[0] >= kZeroThreshold || tmp32 >= kZeroThreshold) {
> > goto early_return_label;
> > }
> > tmp32 += env[0];
> > if (env[1] >= kZeroThreshold || tmp32 >= kZeroThreshold) {
> > goto early_return_label;
> > }
> > tmp32 += env[1];
> >
> > ...
> > if (env[9] >= kZeroThreshold || tmp32 >= kZeroThreshold) {
> > goto early_return_label;
> > }
> > tmp32 += env[9];
> > early_return_label:
> >
> >
> > (The compiler would probably optimize the loop unrolling above to look very
> > different though).
> >
> > If you expect the test to often be invalidated at index 0, I'd then instead
> > change the code to be,
> > int64_t tmp64 = env[0];
> > if (tmp64 < kZeroThreshold) {
> > for (i = 1; i < 10; ++i) {
> > tmp64 = env[i];
> > }
> > }
> >
> > That would provide a really efficient early return, and would give a very
nice
> > optimize able loop. I don't think it is worth the extra code though, as this
> is
> > an extremely cheap loop for the compiler to perform, and as these types of
> > optimization only affect the average level of the complexity which means
that
> it
> > is mainly a way to save electrical power of the CPU (i.e., does not affect
the
> > algorithmic complexity).
> >
> >
> > As I see it, the only reason for not going for 64 bit variables should be a
> high
> > cost of the 64 bit additions. Do you know if that is the case?
> >
> >
> I can basically follow your suggestion, but I think it is a chance to learn
> something. Let me continue the discussion a bit.
>
> I think pipelining is useful if the lines in the loop can be parallized, this
> wiki page has good explanation.
>
> https://en.wikipedia.org/wiki/Software_pipelining
>
> For summation, parallization is not possible.
I don't know this as well as I should do, but I meant primarily hardware
pipelining https://en.wikipedia.org/wiki/Instruction_pipelining
For summation, the lines in the loop can definitely be parallellized in hardware
(probably in this case via loop-unrolling though.
E.g.
sum = A+B+C+D can be parallelized as
E=A+B
F=C+D
sum=E+F
where E and F can be computed in parallel independently of each other.
minyue-webrtc
2016/05/24 13:05:45
ok, I can do int64 sum.
On 2016/05/24 10:46:29, peah-webrtc wrote:
> On 2016/05/24 07:16:12, minyue-webrtc wrote:
> > On 2016/05/24 06:16:20, peah-webrtc wrote:
> > > On 2016/05/24 01:32:26, minyue-webrtc wrote:
> > > > On 2016/05/23 12:00:19, peah-webrtc wrote:
> > > > > On 2016/05/23 07:40:48, minyue-webrtc wrote:
> > > > > > On 2016/05/23 04:59:36, peah-webrtc wrote:
> > > > > > > This solution definitely works, but another variant would have
been
> > just
> > > > to
> > > > > > > replace
> > > > > > > int32_t tmp32 = 0;
> > > > > > > by
> > > > > > > int64_t tmp64 = 0;
> > > > > > > and then do the additions in 64 bit instead.
> > > > > > >
> > > > > > > that should probably also be faster than having the if-statement
> > inside
> > > > the
> > > > > > > for-loop.
> > > > > > >
> > > > > > > Or is there a reason not to use 64 bit values here?
> > > > > >
> > > > > > Yes, I think early exiting the loop is more efficient, too.
> > > > >
> > > > > I think that this would be a clear case where a 64 bit value should be
> > used.
> > > > It
> > > > > will lead to cleaner code and I doubt that the early return will save
> > > > anything,
> > > > > as this is a very short loop of fixed size, which should be ideal to
> > > pipeline
> > > > > (meaning that an early return won't save computations as it anyway
> breaks
> > > the
> > > > > pipeline).
> > > > >
> > > > > The only reason I can see for not using a 64 bit value for the sum is
if
> > it
> > > is
> > > > > significantly more complex. Do you know if that is the case?
> > > >
> > > > Loop in 481-488 is really to check if env[0..9] are generally zeros. In
> most
> > > > cases, this should be invalidated at index 0, it is nice that we return
> > early.
> > > I
> > > > do not fully understand the pipeline argument.
> > >
> > > What I mean by pipelining is that as the loop is so predictable for the
> > compiler
> > > that it should be extremely efficient to perform for the CPU and it
utilizes
> > the
> > > CPU pipeline in a really good manner. Possibly it could even be done using
> > SIMD
> > > instructions. Since the loop limit is fixed, and as low as 10, I expect
the
> > > compiler would probably do loop-unrolling on the loop, e.g, replace the
loop
> > by.
> > > tmp32 += env[0];
> > > tmp32 += env[1];
> > > ...
> > > tmp32 += env[9];
> > > Therefore, this loop is extremely cheap and doing an early return is
> probably
> > > not worth it, and would potentially even increase the complexity.
> > >
> > > The compiler would possibly be able to do good pipelining of the
> early-return
> > > variant of the loop but it is at least not as straightforward to do that,
> and
> > > the break statement will anyway break the pipeline for the loop. Also,
loop
> > > unrolling would be less nice, e.g.,
> > > if (env[0] >= kZeroThreshold || tmp32 >= kZeroThreshold) {
> > > goto early_return_label;
> > > }
> > > tmp32 += env[0];
> > > if (env[1] >= kZeroThreshold || tmp32 >= kZeroThreshold) {
> > > goto early_return_label;
> > > }
> > > tmp32 += env[1];
> > >
> > > ...
> > > if (env[9] >= kZeroThreshold || tmp32 >= kZeroThreshold) {
> > > goto early_return_label;
> > > }
> > > tmp32 += env[9];
> > > early_return_label:
> > >
> > >
> > > (The compiler would probably optimize the loop unrolling above to look
very
> > > different though).
> > >
> > > If you expect the test to often be invalidated at index 0, I'd then
instead
> > > change the code to be,
> > > int64_t tmp64 = env[0];
> > > if (tmp64 < kZeroThreshold) {
> > > for (i = 1; i < 10; ++i) {
> > > tmp64 = env[i];
> > > }
> > > }
> > >
> > > That would provide a really efficient early return, and would give a very
> nice
> > > optimize able loop. I don't think it is worth the extra code though, as
this
> > is
> > > an extremely cheap loop for the compiler to perform, and as these types of
> > > optimization only affect the average level of the complexity which means
> that
> > it
> > > is mainly a way to save electrical power of the CPU (i.e., does not affect
> the
> > > algorithmic complexity).
> > >
> > >
> > > As I see it, the only reason for not going for 64 bit variables should be
a
> > high
> > > cost of the 64 bit additions. Do you know if that is the case?
> > >
> > >
> > I can basically follow your suggestion, but I think it is a chance to learn
> > something. Let me continue the discussion a bit.
> >
> > I think pipelining is useful if the lines in the loop can be parallized,
this
> > wiki page has good explanation.
> >
> > https://en.wikipedia.org/wiki/Software_pipelining
> >
> > For summation, parallization is not possible.
>
> I don't know this as well as I should do, but I meant primarily hardware
> pipelining https://en.wikipedia.org/wiki/Instruction_pipelining
>
> For summation, the lines in the loop can definitely be parallellized in
hardware
> (probably in this case via loop-unrolling though.
>
> E.g.
> sum = A+B+C+D can be parallelized as
> E=A+B
> F=C+D
> sum=E+F
> where E and F can be computed in parallel independently of each other.
>
>
ok, I can do int64 sum.
|
/* Is the input signal zero? */ |
for (i = 0; i < 10; i++) { |
+ if (env[i] >= kZeroThreshold || tmp32 >= kZeroThreshold) { |
+ break; |
+ } |
tmp32 += env[i]; |
} |
/* Each block is allowed to have a few non-zero |
* samples. |
*/ |
- if (tmp32 < 500) { |
+ if (i == 10 && tmp32 < kZeroThreshold) { |
stt->msZero += 10; |
} else { |
stt->msZero = 0; |