Issue 2003623003: Re-enable UBsan on AGC.

minyue-webrtc

Patchset #1 (id:1) has been deleted

4 years, 7 months ago (2016-05-21 08:28:48 UTC) #1

minyue-webrtc

Description was changed from ========== fixing BUG= ========== to ========== Re-enable UBsan on AGC. BUG=webrtc:5530 ...

4 years, 7 months ago (2016-05-21 12:30:32 UTC) #8

minyue-webrtc

minyue@webrtc.org changed reviewers: + peah@webrtc.org

4 years, 7 months ago (2016-05-21 12:30:32 UTC) #9

minyue-webrtc

Hi Per, Would you take a look at this fix on AGC overflow? https://codereview.webrtc.org/2003623003/diff/140001/webrtc/modules/audio_processing/agc/legacy/digital_agc.c File ...

4 years, 7 months ago (2016-05-21 12:33:37 UTC) #10

peah-webrtc

https://codereview.webrtc.org/2003623003/diff/140001/webrtc/modules/audio_processing/agc/legacy/analog_agc.c File webrtc/modules/audio_processing/agc/legacy/analog_agc.c (right): https://codereview.webrtc.org/2003623003/diff/140001/webrtc/modules/audio_processing/agc/legacy/analog_agc.c#newcode479 webrtc/modules/audio_processing/agc/legacy/analog_agc.c:479: const int kZeroThreshold = 500; This solution definitely works, ...

4 years, 7 months ago (2016-05-23 04:59:37 UTC) #11

https://codereview.webrtc.org/2003623003/diff/140001/webrtc/modules/audio_pro...
File webrtc/modules/audio_processing/agc/legacy/analog_agc.c (right):

https://codereview.webrtc.org/2003623003/diff/140001/webrtc/modules/audio_pro...
webrtc/modules/audio_processing/agc/legacy/analog_agc.c:479: const int
kZeroThreshold = 500;
This solution definitely works, but another variant would have been just to
replace 
int32_t tmp32 = 0;
by
int64_t tmp64 = 0;
and then do the additions in 64 bit instead.

that should probably also be faster than having the if-statement inside the
for-loop.

Or is there a reason not to use 64 bit values here?

https://codereview.webrtc.org/2003623003/diff/140001/webrtc/modules/audio_pro...
File webrtc/modules/audio_processing/agc/legacy/digital_agc.c (right):

https://codereview.webrtc.org/2003623003/diff/140001/webrtc/modules/audio_pro...
webrtc/modules/audio_processing/agc/legacy/digital_agc.c:194: zeros =
WEBRTC_SPL_MIN(WebRtcSpl_NormW32(numFIX * 2),
How do you know that numFIX * 2 does not overflow? Normalizing as
WebRtcSpl_NormW32(numFIX) which was done before seems safer.

Alternatively, if you want the norm to be one less than before, it is better to
compute zeros as
WebRtcSpl_NormW32(numFIX) - 1

https://codereview.webrtc.org/2003623003/diff/140001/webrtc/modules/audio_pro...
webrtc/modules/audio_processing/agc/legacy/digital_agc.c:194: zeros =
WEBRTC_SPL_MIN(WebRtcSpl_NormW32(numFIX * 2),
The norm computation is on some platforms fairly complex. It probably does not
matter that much, but it is likely faster to do instead do:
int64_t tmp64 = numFIX * 256;  // Q8.
if (tmp64 > den)  // |den| is Q8. 
{
 zeros = WebRtcSpl_NormW32(numFIX);
} else {
 zeros = WebRtcSpl_NormW32(den) + 8;
}

https://codereview.webrtc.org/2003623003/diff/140001/webrtc/modules/audio_pro...
webrtc/modules/audio_processing/agc/legacy/digital_agc.c:201: numFIX -= tmp32no1
/ 2;
Does changing WebRtcSpl_NormW32(numFIX) to WebRtcSpl_NormW32(numFIX * 2) solve
the overflow on these on 201+203?

minyue-webrtc

Thank you for the review! I have a better solution now. PTAL. https://codereview.webrtc.org/2003623003/diff/140001/webrtc/modules/audio_processing/agc/legacy/analog_agc.c File webrtc/modules/audio_processing/agc/legacy/analog_agc.c ...

4 years, 7 months ago (2016-05-23 07:40:48 UTC) #13

Thank you for the review! I have a better solution now. PTAL.

https://codereview.webrtc.org/2003623003/diff/140001/webrtc/modules/audio_pro...
File webrtc/modules/audio_processing/agc/legacy/analog_agc.c (right):

https://codereview.webrtc.org/2003623003/diff/140001/webrtc/modules/audio_pro...
webrtc/modules/audio_processing/agc/legacy/analog_agc.c:479: const int
kZeroThreshold = 500;
On 2016/05/23 04:59:36, peah-webrtc wrote:
> This solution definitely works, but another variant would have been just to
> replace 
> int32_t tmp32 = 0;
> by
> int64_t tmp64 = 0;
> and then do the additions in 64 bit instead.
> 
> that should probably also be faster than having the if-statement inside the
> for-loop.
> 
> Or is there a reason not to use 64 bit values here?

Yes, I think early exiting the loop is more efficient, too.

https://codereview.webrtc.org/2003623003/diff/140001/webrtc/modules/audio_pro...
File webrtc/modules/audio_processing/agc/legacy/digital_agc.c (right):

https://codereview.webrtc.org/2003623003/diff/140001/webrtc/modules/audio_pro...
webrtc/modules/audio_processing/agc/legacy/digital_agc.c:194: zeros =
WEBRTC_SPL_MIN(WebRtcSpl_NormW32(numFIX * 2),
Good question, after a second thought, I see that |den| being too large can also
overflow. A better way is to deal with rounding in another way. See my new
patch.

On 2016/05/23 04:59:37, peah-webrtc wrote:
> How do you know that numFIX * 2 does not overflow? Normalizing as
> WebRtcSpl_NormW32(numFIX) which was done before seems safer.
> 
> Alternatively, if you want the norm to be one less than before, it is better
to
> compute zeros as
> WebRtcSpl_NormW32(numFIX) - 1

https://codereview.webrtc.org/2003623003/diff/180001/webrtc/modules/audio_pro...
File webrtc/modules/audio_processing/agc/legacy/digital_agc.c (right):

https://codereview.webrtc.org/2003623003/diff/180001/webrtc/modules/audio_pro...
webrtc/modules/audio_processing/agc/legacy/digital_agc.c:192: if (numFIX > (den
>> 8) || -numFIX > (den >> 8))  // |den| is Q8.
this part is switched back to the old style, but added a new condition "||
-numFIX > (den >> 8)", which is indeed an error of the old code (error when
numFIX is negative)

https://codereview.webrtc.org/2003623003/diff/180001/webrtc/modules/audio_pro...
webrtc/modules/audio_processing/agc/legacy/digital_agc.c:203: int remainder =
numFIX % tmp32no1;
this part is a new solution to rounding numFIX / tmp32no1.

The bitexactness test showed that this solution is equivalent in normal
conditions, but it is safer since it avoids overflow risk in line 201/203 in the
old code.

peah-webrtc

https://codereview.webrtc.org/2003623003/diff/140001/webrtc/modules/audio_processing/agc/legacy/analog_agc.c File webrtc/modules/audio_processing/agc/legacy/analog_agc.c (right): https://codereview.webrtc.org/2003623003/diff/140001/webrtc/modules/audio_processing/agc/legacy/analog_agc.c#newcode479 webrtc/modules/audio_processing/agc/legacy/analog_agc.c:479: const int kZeroThreshold = 500; On 2016/05/23 07:40:48, minyue-webrtc ...

4 years, 7 months ago (2016-05-23 12:00:19 UTC) #14

https://codereview.webrtc.org/2003623003/diff/140001/webrtc/modules/audio_pro...
File webrtc/modules/audio_processing/agc/legacy/analog_agc.c (right):

https://codereview.webrtc.org/2003623003/diff/140001/webrtc/modules/audio_pro...
webrtc/modules/audio_processing/agc/legacy/analog_agc.c:479: const int
kZeroThreshold = 500;
On 2016/05/23 07:40:48, minyue-webrtc wrote:
> On 2016/05/23 04:59:36, peah-webrtc wrote:
> > This solution definitely works, but another variant would have been just to
> > replace 
> > int32_t tmp32 = 0;
> > by
> > int64_t tmp64 = 0;
> > and then do the additions in 64 bit instead.
> > 
> > that should probably also be faster than having the if-statement inside the
> > for-loop.
> > 
> > Or is there a reason not to use 64 bit values here?
> 
> Yes, I think early exiting the loop is more efficient, too.

I think that this would be a clear case where a 64 bit value should be used. It
will lead to cleaner code and I doubt that the early return will save anything,
as this is a very short loop of fixed size, which should be ideal to pipeline
(meaning that an early return won't save computations as it anyway breaks the
pipeline).

The only reason I can see for not using a 64 bit value for the sum is if it is
significantly more complex. Do you know if that is the case?

https://codereview.webrtc.org/2003623003/diff/180001/webrtc/modules/audio_pro...
File webrtc/modules/audio_processing/agc/legacy/digital_agc.c (right):

https://codereview.webrtc.org/2003623003/diff/180001/webrtc/modules/audio_pro...
webrtc/modules/audio_processing/agc/legacy/digital_agc.c:192: if (numFIX > (den
>> 8) || -numFIX > (den >> 8))  // |den| is Q8.
On 2016/05/23 07:40:48, minyue-webrtc wrote:
> this part is switched back to the old style, but added a new condition "||
> -numFIX > (den >> 8)", which is indeed an error of the old code (error when
> numFIX is negative)

Nice!

https://codereview.webrtc.org/2003623003/diff/180001/webrtc/modules/audio_pro...
webrtc/modules/audio_processing/agc/legacy/digital_agc.c:202: y32 = numFIX /
tmp32no1;  // in Q14
I don't think this is the right approach, even though it may work.
It is quite complicated and it also makes use of the % operator which I expect
to be as complex as a division in fixed point code.

The standard way of solving this is to normalize one step less, which will give
you headroom in the addition. I think you should do that, as it leads to much
simpler code and is cheaper as well.

Furthermore, I don't think the addition of
tmp32no1 / 2 is really needed. It should be sufficient to add 1<< (zeros-1) (or
subtract that, dependent on the sign of numFix). 
That is the standard way of doing rounding in fixed point code.

kwiberg-webrtc

https://codereview.webrtc.org/2003623003/diff/140001/webrtc/modules/audio_processing/agc/legacy/analog_agc.c File webrtc/modules/audio_processing/agc/legacy/analog_agc.c (right): https://codereview.webrtc.org/2003623003/diff/140001/webrtc/modules/audio_processing/agc/legacy/analog_agc.c#newcode479 webrtc/modules/audio_processing/agc/legacy/analog_agc.c:479: const int kZeroThreshold = 500; On 2016/05/23 12:00:19, peah-webrtc ...

4 years, 7 months ago (2016-05-23 12:44:40 UTC) #15

minyue-webrtc

https://codereview.webrtc.org/2003623003/diff/140001/webrtc/modules/audio_processing/agc/legacy/analog_agc.c File webrtc/modules/audio_processing/agc/legacy/analog_agc.c (right): https://codereview.webrtc.org/2003623003/diff/140001/webrtc/modules/audio_processing/agc/legacy/analog_agc.c#newcode479 webrtc/modules/audio_processing/agc/legacy/analog_agc.c:479: const int kZeroThreshold = 500; On 2016/05/23 12:00:19, peah-webrtc ...

4 years, 7 months ago (2016-05-24 01:32:27 UTC) #16

https://codereview.webrtc.org/2003623003/diff/140001/webrtc/modules/audio_pro...
File webrtc/modules/audio_processing/agc/legacy/analog_agc.c (right):

https://codereview.webrtc.org/2003623003/diff/140001/webrtc/modules/audio_pro...
webrtc/modules/audio_processing/agc/legacy/analog_agc.c:479: const int
kZeroThreshold = 500;
On 2016/05/23 12:00:19, peah-webrtc wrote:
> On 2016/05/23 07:40:48, minyue-webrtc wrote:
> > On 2016/05/23 04:59:36, peah-webrtc wrote:
> > > This solution definitely works, but another variant would have been just
to
> > > replace 
> > > int32_t tmp32 = 0;
> > > by
> > > int64_t tmp64 = 0;
> > > and then do the additions in 64 bit instead.
> > > 
> > > that should probably also be faster than having the if-statement inside
the
> > > for-loop.
> > > 
> > > Or is there a reason not to use 64 bit values here?
> > 
> > Yes, I think early exiting the loop is more efficient, too.
> 
> I think that this would be a clear case where a 64 bit value should be used.
It
> will lead to cleaner code and I doubt that the early return will save
anything,
> as this is a very short loop of fixed size, which should be ideal to pipeline
> (meaning that an early return won't save computations as it anyway breaks the
> pipeline).
> 
> The only reason I can see for not using a 64 bit value for the sum is if it is
> significantly more complex. Do you know if that is the case?

Loop in 481-488 is really to check if env[0..9] are generally zeros. In most
cases, this should be invalidated at index 0, it is nice that we return early. I
do not fully understand the pipeline argument.

https://codereview.webrtc.org/2003623003/diff/180001/webrtc/modules/audio_pro...
File webrtc/modules/audio_processing/agc/legacy/digital_agc.c (right):

https://codereview.webrtc.org/2003623003/diff/180001/webrtc/modules/audio_pro...
webrtc/modules/audio_processing/agc/legacy/digital_agc.c:202: y32 = numFIX /
tmp32no1;  // in Q14
On 2016/05/23 12:00:19, peah-webrtc wrote:
> I don't think this is the right approach, even though it may work.
> It is quite complicated and it also makes use of the % operator which I expect
> to be as complex as a division in fixed point code.
> 
> The standard way of solving this is to normalize one step less, which will
give
> you headroom in the addition. I think you should do that, as it leads to much
> simpler code and is cheaper as well.
> 
> Furthermore, I don't think the addition of
> tmp32no1 / 2 is really needed. It should be sufficient to add 1<< (zeros-1)
(or
> subtract that, dependent on the sign of numFix). 
> That is the standard way of doing rounding in fixed point code.

+ 1<<(zeros-1) is to do ceil(), what we want here is round(). WDYT

peah-webrtc

https://codereview.webrtc.org/2003623003/diff/140001/webrtc/modules/audio_processing/agc/legacy/analog_agc.c File webrtc/modules/audio_processing/agc/legacy/analog_agc.c (right): https://codereview.webrtc.org/2003623003/diff/140001/webrtc/modules/audio_processing/agc/legacy/analog_agc.c#newcode479 webrtc/modules/audio_processing/agc/legacy/analog_agc.c:479: const int kZeroThreshold = 500; On 2016/05/24 01:32:26, minyue-webrtc ...

4 years, 7 months ago (2016-05-24 06:16:20 UTC) #17

https://codereview.webrtc.org/2003623003/diff/140001/webrtc/modules/audio_pro...
File webrtc/modules/audio_processing/agc/legacy/analog_agc.c (right):

https://codereview.webrtc.org/2003623003/diff/140001/webrtc/modules/audio_pro...
webrtc/modules/audio_processing/agc/legacy/analog_agc.c:479: const int
kZeroThreshold = 500;
On 2016/05/24 01:32:26, minyue-webrtc wrote:
> On 2016/05/23 12:00:19, peah-webrtc wrote:
> > On 2016/05/23 07:40:48, minyue-webrtc wrote:
> > > On 2016/05/23 04:59:36, peah-webrtc wrote:
> > > > This solution definitely works, but another variant would have been just
> to
> > > > replace 
> > > > int32_t tmp32 = 0;
> > > > by
> > > > int64_t tmp64 = 0;
> > > > and then do the additions in 64 bit instead.
> > > > 
> > > > that should probably also be faster than having the if-statement inside
> the
> > > > for-loop.
> > > > 
> > > > Or is there a reason not to use 64 bit values here?
> > > 
> > > Yes, I think early exiting the loop is more efficient, too.
> > 
> > I think that this would be a clear case where a 64 bit value should be used.
> It
> > will lead to cleaner code and I doubt that the early return will save
> anything,
> > as this is a very short loop of fixed size, which should be ideal to
pipeline
> > (meaning that an early return won't save computations as it anyway breaks
the
> > pipeline).
> > 
> > The only reason I can see for not using a 64 bit value for the sum is if it
is
> > significantly more complex. Do you know if that is the case?
> 
> Loop in 481-488 is really to check if env[0..9] are generally zeros. In most
> cases, this should be invalidated at index 0, it is nice that we return early.
I
> do not fully understand the pipeline argument.

What I mean by pipelining is that as the loop is so predictable for the compiler
that it should be extremely efficient to perform for the CPU and it utilizes the
CPU pipeline in a really good manner. Possibly it could even be done using SIMD
instructions. Since the loop limit is fixed, and as low as 10, I expect the
compiler would probably do loop-unrolling on the loop, e.g, replace the loop by.
tmp32 += env[0];
tmp32 += env[1];
...
tmp32 += env[9];
Therefore, this loop is extremely cheap and doing an early return is probably
not worth it, and would potentially even increase the complexity.

The compiler would possibly be able to do good pipelining of the early-return
variant of the loop but it is at least not as straightforward to do that, and
the break statement will anyway break the pipeline for the loop. Also, loop
unrolling would be less nice, e.g.,
if (env[0] >= kZeroThreshold || tmp32 >= kZeroThreshold) {
  goto early_return_label;
}
tmp32 += env[0];
if (env[1] >= kZeroThreshold || tmp32 >= kZeroThreshold) {
  goto early_return_label;
}
tmp32 += env[1];

...
if (env[9] >= kZeroThreshold || tmp32 >= kZeroThreshold) {
  goto early_return_label;
}
tmp32 += env[9];
early_return_label:

(The compiler would probably optimize the loop unrolling above to look very
different though).

If you expect the test to often be invalidated at index 0, I'd then instead
change the code to be, 
int64_t tmp64 = env[0];
if (tmp64 < kZeroThreshold) {
  for (i = 1; i < 10; ++i) {
    tmp64 = env[i];
  }
}

That would provide a really efficient early return, and would give a very nice
optimize able loop. I don't think it is worth the extra code though, as this is
an extremely cheap loop for the compiler to perform, and as these types of
optimization only affect the average level of the complexity which means that it
is mainly a way to save electrical power of the CPU (i.e., does not affect the
algorithmic complexity).

As I see it, the only reason for not going for 64 bit variables should be a high
cost of the 64 bit additions. Do you know if that is the case?

https://codereview.webrtc.org/2003623003/diff/180001/webrtc/modules/audio_pro...
File webrtc/modules/audio_processing/agc/legacy/digital_agc.c (right):

https://codereview.webrtc.org/2003623003/diff/180001/webrtc/modules/audio_pro...
webrtc/modules/audio_processing/agc/legacy/digital_agc.c:202: y32 = numFIX /
tmp32no1;  // in Q14
On 2016/05/24 01:32:26, minyue-webrtc wrote:
> On 2016/05/23 12:00:19, peah-webrtc wrote:
> > I don't think this is the right approach, even though it may work.
> > It is quite complicated and it also makes use of the % operator which I
expect
> > to be as complex as a division in fixed point code.
> > 
> > The standard way of solving this is to normalize one step less, which will
> give
> > you headroom in the addition. I think you should do that, as it leads to
much
> > simpler code and is cheaper as well.
> > 
> > Furthermore, I don't think the addition of
> > tmp32no1 / 2 is really needed. It should be sufficient to add 1<< (zeros-1)
> (or
> > subtract that, dependent on the sign of numFix). 
> > That is the standard way of doing rounding in fixed point code.
> 
> + 1<<(zeros-1) is to do ceil(), what we want here is round(). WDYT

Sorry, I realize that I was wrong about that the addition of tmp32no1 / 2 not
being needed. My comment about that was not correct at all.

However, I still think that setting zeros to one less than it is done now is the
correct solution, as that will should give headroom for the addition of
tmp32nol/2. 

Changing the value of zeros to one less does not change the rounding to a ceil.
It only affects the accuracy of the division.
WDYT?

minyue-webrtc

https://codereview.webrtc.org/2003623003/diff/140001/webrtc/modules/audio_processing/agc/legacy/analog_agc.c File webrtc/modules/audio_processing/agc/legacy/analog_agc.c (right): https://codereview.webrtc.org/2003623003/diff/140001/webrtc/modules/audio_processing/agc/legacy/analog_agc.c#newcode479 webrtc/modules/audio_processing/agc/legacy/analog_agc.c:479: const int kZeroThreshold = 500; On 2016/05/24 06:16:20, peah-webrtc ...

4 years, 7 months ago (2016-05-24 07:16:12 UTC) #18

https://codereview.webrtc.org/2003623003/diff/140001/webrtc/modules/audio_pro...
File webrtc/modules/audio_processing/agc/legacy/analog_agc.c (right):

https://codereview.webrtc.org/2003623003/diff/140001/webrtc/modules/audio_pro...
webrtc/modules/audio_processing/agc/legacy/analog_agc.c:479: const int
kZeroThreshold = 500;
On 2016/05/24 06:16:20, peah-webrtc wrote:
> On 2016/05/24 01:32:26, minyue-webrtc wrote:
> > On 2016/05/23 12:00:19, peah-webrtc wrote:
> > > On 2016/05/23 07:40:48, minyue-webrtc wrote:
> > > > On 2016/05/23 04:59:36, peah-webrtc wrote:
> > > > > This solution definitely works, but another variant would have been
just
> > to
> > > > > replace 
> > > > > int32_t tmp32 = 0;
> > > > > by
> > > > > int64_t tmp64 = 0;
> > > > > and then do the additions in 64 bit instead.
> > > > > 
> > > > > that should probably also be faster than having the if-statement
inside
> > the
> > > > > for-loop.
> > > > > 
> > > > > Or is there a reason not to use 64 bit values here?
> > > > 
> > > > Yes, I think early exiting the loop is more efficient, too.
> > > 
> > > I think that this would be a clear case where a 64 bit value should be
used.
> > It
> > > will lead to cleaner code and I doubt that the early return will save
> > anything,
> > > as this is a very short loop of fixed size, which should be ideal to
> pipeline
> > > (meaning that an early return won't save computations as it anyway breaks
> the
> > > pipeline).
> > > 
> > > The only reason I can see for not using a 64 bit value for the sum is if
it
> is
> > > significantly more complex. Do you know if that is the case?
> > 
> > Loop in 481-488 is really to check if env[0..9] are generally zeros. In most
> > cases, this should be invalidated at index 0, it is nice that we return
early.
> I
> > do not fully understand the pipeline argument.
> 
> What I mean by pipelining is that as the loop is so predictable for the
compiler
> that it should be extremely efficient to perform for the CPU and it utilizes
the
> CPU pipeline in a really good manner. Possibly it could even be done using
SIMD
> instructions. Since the loop limit is fixed, and as low as 10, I expect the
> compiler would probably do loop-unrolling on the loop, e.g, replace the loop
by.
> tmp32 += env[0];
> tmp32 += env[1];
> ...
> tmp32 += env[9];
> Therefore, this loop is extremely cheap and doing an early return is probably
> not worth it, and would potentially even increase the complexity.
> 
> The compiler would possibly be able to do good pipelining of the early-return
> variant of the loop but it is at least not as straightforward to do that, and
> the break statement will anyway break the pipeline for the loop. Also, loop
> unrolling would be less nice, e.g.,
> if (env[0] >= kZeroThreshold || tmp32 >= kZeroThreshold) {
>   goto early_return_label;
> }
> tmp32 += env[0];
> if (env[1] >= kZeroThreshold || tmp32 >= kZeroThreshold) {
>   goto early_return_label;
> }
> tmp32 += env[1];
> 
> ...
> if (env[9] >= kZeroThreshold || tmp32 >= kZeroThreshold) {
>   goto early_return_label;
> }
> tmp32 += env[9];
> early_return_label:
> 
> 
> (The compiler would probably optimize the loop unrolling above to look very
> different though).
> 
> If you expect the test to often be invalidated at index 0, I'd then instead
> change the code to be, 
> int64_t tmp64 = env[0];
> if (tmp64 < kZeroThreshold) {
>   for (i = 1; i < 10; ++i) {
>     tmp64 = env[i];
>   }
> }
> 
> That would provide a really efficient early return, and would give a very nice
> optimize able loop. I don't think it is worth the extra code though, as this
is
> an extremely cheap loop for the compiler to perform, and as these types of
> optimization only affect the average level of the complexity which means that
it
> is mainly a way to save electrical power of the CPU (i.e., does not affect the
> algorithmic complexity).
> 
> 
> As I see it, the only reason for not going for 64 bit variables should be a
high
> cost of the 64 bit additions. Do you know if that is the case?
> 
> 
I can basically follow your suggestion, but I think it is a chance to learn
something. Let me continue the discussion a bit.

I think pipelining is useful if the lines in the loop can be parallized, this
wiki page has good explanation.

https://en.wikipedia.org/wiki/Software_pipelining

For summation, parallization is not possible.

https://codereview.webrtc.org/2003623003/diff/180001/webrtc/modules/audio_pro...
File webrtc/modules/audio_processing/agc/legacy/digital_agc.c (right):

https://codereview.webrtc.org/2003623003/diff/180001/webrtc/modules/audio_pro...
webrtc/modules/audio_processing/agc/legacy/digital_agc.c:202: y32 = numFIX /
tmp32no1;  // in Q14
On 2016/05/24 06:16:20, peah-webrtc wrote:
> On 2016/05/24 01:32:26, minyue-webrtc wrote:
> > On 2016/05/23 12:00:19, peah-webrtc wrote:
> > > I don't think this is the right approach, even though it may work.
> > > It is quite complicated and it also makes use of the % operator which I
> expect
> > > to be as complex as a division in fixed point code.
> > > 
> > > The standard way of solving this is to normalize one step less, which will
> > give
> > > you headroom in the addition. I think you should do that, as it leads to
> much
> > > simpler code and is cheaper as well.
> > > 
> > > Furthermore, I don't think the addition of
> > > tmp32no1 / 2 is really needed. It should be sufficient to add 1<<
(zeros-1)
> > (or
> > > subtract that, dependent on the sign of numFix). 
> > > That is the standard way of doing rounding in fixed point code.
> > 
> > + 1<<(zeros-1) is to do ceil(), what we want here is round(). WDYT
> 
> Sorry, I realize that I was wrong about that the addition of tmp32no1 / 2 not
> being needed. My comment about that was not correct at all.
> 
> However, I still think that setting zeros to one less than it is done now is
the
> correct solution, as that will should give headroom for the addition of
> tmp32nol/2. 
> 
> Changing the value of zeros to one less does not change the rounding to a
ceil.
> It only affects the accuracy of the division.
> WDYT?

I also realized that +1<<(zero-1) does not do ceil(). But that is not the core
part of the discussion. What worth discuss is weather we want a precise round()
or not. In my opinion, finding the nearest integer should be most reasonable. 1)
it is well defined, 2) we do all shifting above to improve accuracy, and we
should not be too relaxed here, 3) Computationally, it should not be much more
complex. and if you think % will do another /, we may calculate remainder by
numFIX - y32 * tmp32nol.

peah-webrtc

https://codereview.webrtc.org/2003623003/diff/140001/webrtc/modules/audio_processing/agc/legacy/analog_agc.c File webrtc/modules/audio_processing/agc/legacy/analog_agc.c (right): https://codereview.webrtc.org/2003623003/diff/140001/webrtc/modules/audio_processing/agc/legacy/analog_agc.c#newcode479 webrtc/modules/audio_processing/agc/legacy/analog_agc.c:479: const int kZeroThreshold = 500; On 2016/05/24 07:16:12, minyue-webrtc ...

4 years, 7 months ago (2016-05-24 10:46:29 UTC) #19

https://codereview.webrtc.org/2003623003/diff/140001/webrtc/modules/audio_pro...
File webrtc/modules/audio_processing/agc/legacy/analog_agc.c (right):

https://codereview.webrtc.org/2003623003/diff/140001/webrtc/modules/audio_pro...
webrtc/modules/audio_processing/agc/legacy/analog_agc.c:479: const int
kZeroThreshold = 500;
On 2016/05/24 07:16:12, minyue-webrtc wrote:
> On 2016/05/24 06:16:20, peah-webrtc wrote:
> > On 2016/05/24 01:32:26, minyue-webrtc wrote:
> > > On 2016/05/23 12:00:19, peah-webrtc wrote:
> > > > On 2016/05/23 07:40:48, minyue-webrtc wrote:
> > > > > On 2016/05/23 04:59:36, peah-webrtc wrote:
> > > > > > This solution definitely works, but another variant would have been
> just
> > > to
> > > > > > replace 
> > > > > > int32_t tmp32 = 0;
> > > > > > by
> > > > > > int64_t tmp64 = 0;
> > > > > > and then do the additions in 64 bit instead.
> > > > > > 
> > > > > > that should probably also be faster than having the if-statement
> inside
> > > the
> > > > > > for-loop.
> > > > > > 
> > > > > > Or is there a reason not to use 64 bit values here?
> > > > > 
> > > > > Yes, I think early exiting the loop is more efficient, too.
> > > > 
> > > > I think that this would be a clear case where a 64 bit value should be
> used.
> > > It
> > > > will lead to cleaner code and I doubt that the early return will save
> > > anything,
> > > > as this is a very short loop of fixed size, which should be ideal to
> > pipeline
> > > > (meaning that an early return won't save computations as it anyway
breaks
> > the
> > > > pipeline).
> > > > 
> > > > The only reason I can see for not using a 64 bit value for the sum is if
> it
> > is
> > > > significantly more complex. Do you know if that is the case?
> > > 
> > > Loop in 481-488 is really to check if env[0..9] are generally zeros. In
most
> > > cases, this should be invalidated at index 0, it is nice that we return
> early.
> > I
> > > do not fully understand the pipeline argument.
> > 
> > What I mean by pipelining is that as the loop is so predictable for the
> compiler
> > that it should be extremely efficient to perform for the CPU and it utilizes
> the
> > CPU pipeline in a really good manner. Possibly it could even be done using
> SIMD
> > instructions. Since the loop limit is fixed, and as low as 10, I expect the
> > compiler would probably do loop-unrolling on the loop, e.g, replace the loop
> by.
> > tmp32 += env[0];
> > tmp32 += env[1];
> > ...
> > tmp32 += env[9];
> > Therefore, this loop is extremely cheap and doing an early return is
probably
> > not worth it, and would potentially even increase the complexity.
> > 
> > The compiler would possibly be able to do good pipelining of the
early-return
> > variant of the loop but it is at least not as straightforward to do that,
and
> > the break statement will anyway break the pipeline for the loop. Also, loop
> > unrolling would be less nice, e.g.,
> > if (env[0] >= kZeroThreshold || tmp32 >= kZeroThreshold) {
> >   goto early_return_label;
> > }
> > tmp32 += env[0];
> > if (env[1] >= kZeroThreshold || tmp32 >= kZeroThreshold) {
> >   goto early_return_label;
> > }
> > tmp32 += env[1];
> > 
> > ...
> > if (env[9] >= kZeroThreshold || tmp32 >= kZeroThreshold) {
> >   goto early_return_label;
> > }
> > tmp32 += env[9];
> > early_return_label:
> > 
> > 
> > (The compiler would probably optimize the loop unrolling above to look very
> > different though).
> > 
> > If you expect the test to often be invalidated at index 0, I'd then instead
> > change the code to be, 
> > int64_t tmp64 = env[0];
> > if (tmp64 < kZeroThreshold) {
> >   for (i = 1; i < 10; ++i) {
> >     tmp64 = env[i];
> >   }
> > }
> > 
> > That would provide a really efficient early return, and would give a very
nice
> > optimize able loop. I don't think it is worth the extra code though, as this
> is
> > an extremely cheap loop for the compiler to perform, and as these types of
> > optimization only affect the average level of the complexity which means
that
> it
> > is mainly a way to save electrical power of the CPU (i.e., does not affect
the
> > algorithmic complexity).
> > 
> > 
> > As I see it, the only reason for not going for 64 bit variables should be a
> high
> > cost of the 64 bit additions. Do you know if that is the case?
> > 
> > 
> I can basically follow your suggestion, but I think it is a chance to learn
> something. Let me continue the discussion a bit.
> 
> I think pipelining is useful if the lines in the loop can be parallized, this
> wiki page has good explanation.
> 
> https://en.wikipedia.org/wiki/Software_pipelining
> 
> For summation, parallization is not possible.

I don't know this as well as I should do, but I meant primarily hardware
pipelining https://en.wikipedia.org/wiki/Instruction_pipelining

For summation, the lines in the loop can definitely be parallellized in hardware
(probably in this case via loop-unrolling though. 

E.g.
sum = A+B+C+D can be parallelized as
E=A+B
F=C+D
sum=E+F 
where E and F can be computed in parallel independently of each other.

https://codereview.webrtc.org/2003623003/diff/180001/webrtc/modules/audio_pro...
File webrtc/modules/audio_processing/agc/legacy/digital_agc.c (right):

https://codereview.webrtc.org/2003623003/diff/180001/webrtc/modules/audio_pro...
webrtc/modules/audio_processing/agc/legacy/digital_agc.c:202: y32 = numFIX /
tmp32no1;  // in Q14
On 2016/05/24 07:16:12, minyue-webrtc wrote:
> On 2016/05/24 06:16:20, peah-webrtc wrote:
> > On 2016/05/24 01:32:26, minyue-webrtc wrote:
> > > On 2016/05/23 12:00:19, peah-webrtc wrote:
> > > > I don't think this is the right approach, even though it may work.
> > > > It is quite complicated and it also makes use of the % operator which I
> > expect
> > > > to be as complex as a division in fixed point code.
> > > > 
> > > > The standard way of solving this is to normalize one step less, which
will
> > > give
> > > > you headroom in the addition. I think you should do that, as it leads to
> > much
> > > > simpler code and is cheaper as well.
> > > > 
> > > > Furthermore, I don't think the addition of
> > > > tmp32no1 / 2 is really needed. It should be sufficient to add 1<<
> (zeros-1)
> > > (or
> > > > subtract that, dependent on the sign of numFix). 
> > > > That is the standard way of doing rounding in fixed point code.
> > > 
> > > + 1<<(zeros-1) is to do ceil(), what we want here is round(). WDYT
> > 
> > Sorry, I realize that I was wrong about that the addition of tmp32no1 / 2
not
> > being needed. My comment about that was not correct at all.
> > 
> > However, I still think that setting zeros to one less than it is done now is
> the
> > correct solution, as that will should give headroom for the addition of
> > tmp32nol/2. 
> > 
> > Changing the value of zeros to one less does not change the rounding to a
> ceil.
> > It only affects the accuracy of the division.
> > WDYT?
> 
> I also realized that +1<<(zero-1) does not do ceil(). But that is not the core
> part of the discussion. What worth discuss is weather we want a precise
round()
> or not. In my opinion, finding the nearest integer should be most reasonable.
1)
> it is well defined, 2) we do all shifting above to improve accuracy, and we
> should not be too relaxed here, 3) Computationally, it should not be much more
> complex. and if you think % will do another /, we may calculate remainder by
> numFIX - y32 * tmp32nol.

I strongly doubt that we really need a rounding division here as this is a
gaintable computation. Do you know what negative impact a truncating division
would have?

The drawback of doing a zeros-1 normalization instead of a zeros normalization
is that you lose 1 bit accuracy in the 32 bit divisions. I would say that is
acceptable, in particular as the rest of the computations are in a fixed Q
value. 

Furthermore, looking at your code for this, I'm actually not sure it is
sufficient to avoid overflow. How can you guarantee that y32++ does not
overflow? (I could not come up with a good example for that though.)

minyue-webrtc

thanks for the explanation! see my comments and I'd like to see your preference and ...

4 years, 7 months ago (2016-05-24 13:05:45 UTC) #20

thanks for the explanation! see my comments and I'd like to see your preference
and I will go for it.

https://codereview.webrtc.org/2003623003/diff/140001/webrtc/modules/audio_pro...
File webrtc/modules/audio_processing/agc/legacy/analog_agc.c (right):

https://codereview.webrtc.org/2003623003/diff/140001/webrtc/modules/audio_pro...
webrtc/modules/audio_processing/agc/legacy/analog_agc.c:479: const int
kZeroThreshold = 500;
On 2016/05/24 10:46:29, peah-webrtc wrote:
> On 2016/05/24 07:16:12, minyue-webrtc wrote:
> > On 2016/05/24 06:16:20, peah-webrtc wrote:
> > > On 2016/05/24 01:32:26, minyue-webrtc wrote:
> > > > On 2016/05/23 12:00:19, peah-webrtc wrote:
> > > > > On 2016/05/23 07:40:48, minyue-webrtc wrote:
> > > > > > On 2016/05/23 04:59:36, peah-webrtc wrote:
> > > > > > > This solution definitely works, but another variant would have
been
> > just
> > > > to
> > > > > > > replace 
> > > > > > > int32_t tmp32 = 0;
> > > > > > > by
> > > > > > > int64_t tmp64 = 0;
> > > > > > > and then do the additions in 64 bit instead.
> > > > > > > 
> > > > > > > that should probably also be faster than having the if-statement
> > inside
> > > > the
> > > > > > > for-loop.
> > > > > > > 
> > > > > > > Or is there a reason not to use 64 bit values here?
> > > > > > 
> > > > > > Yes, I think early exiting the loop is more efficient, too.
> > > > > 
> > > > > I think that this would be a clear case where a 64 bit value should be
> > used.
> > > > It
> > > > > will lead to cleaner code and I doubt that the early return will save
> > > > anything,
> > > > > as this is a very short loop of fixed size, which should be ideal to
> > > pipeline
> > > > > (meaning that an early return won't save computations as it anyway
> breaks
> > > the
> > > > > pipeline).
> > > > > 
> > > > > The only reason I can see for not using a 64 bit value for the sum is
if
> > it
> > > is
> > > > > significantly more complex. Do you know if that is the case?
> > > > 
> > > > Loop in 481-488 is really to check if env[0..9] are generally zeros. In
> most
> > > > cases, this should be invalidated at index 0, it is nice that we return
> > early.
> > > I
> > > > do not fully understand the pipeline argument.
> > > 
> > > What I mean by pipelining is that as the loop is so predictable for the
> > compiler
> > > that it should be extremely efficient to perform for the CPU and it
utilizes
> > the
> > > CPU pipeline in a really good manner. Possibly it could even be done using
> > SIMD
> > > instructions. Since the loop limit is fixed, and as low as 10, I expect
the
> > > compiler would probably do loop-unrolling on the loop, e.g, replace the
loop
> > by.
> > > tmp32 += env[0];
> > > tmp32 += env[1];
> > > ...
> > > tmp32 += env[9];
> > > Therefore, this loop is extremely cheap and doing an early return is
> probably
> > > not worth it, and would potentially even increase the complexity.
> > > 
> > > The compiler would possibly be able to do good pipelining of the
> early-return
> > > variant of the loop but it is at least not as straightforward to do that,
> and
> > > the break statement will anyway break the pipeline for the loop. Also,
loop
> > > unrolling would be less nice, e.g.,
> > > if (env[0] >= kZeroThreshold || tmp32 >= kZeroThreshold) {
> > >   goto early_return_label;
> > > }
> > > tmp32 += env[0];
> > > if (env[1] >= kZeroThreshold || tmp32 >= kZeroThreshold) {
> > >   goto early_return_label;
> > > }
> > > tmp32 += env[1];
> > > 
> > > ...
> > > if (env[9] >= kZeroThreshold || tmp32 >= kZeroThreshold) {
> > >   goto early_return_label;
> > > }
> > > tmp32 += env[9];
> > > early_return_label:
> > > 
> > > 
> > > (The compiler would probably optimize the loop unrolling above to look
very
> > > different though).
> > > 
> > > If you expect the test to often be invalidated at index 0, I'd then
instead
> > > change the code to be, 
> > > int64_t tmp64 = env[0];
> > > if (tmp64 < kZeroThreshold) {
> > >   for (i = 1; i < 10; ++i) {
> > >     tmp64 = env[i];
> > >   }
> > > }
> > > 
> > > That would provide a really efficient early return, and would give a very
> nice
> > > optimize able loop. I don't think it is worth the extra code though, as
this
> > is
> > > an extremely cheap loop for the compiler to perform, and as these types of
> > > optimization only affect the average level of the complexity which means
> that
> > it
> > > is mainly a way to save electrical power of the CPU (i.e., does not affect
> the
> > > algorithmic complexity).
> > > 
> > > 
> > > As I see it, the only reason for not going for 64 bit variables should be
a
> > high
> > > cost of the 64 bit additions. Do you know if that is the case?
> > > 
> > > 
> > I can basically follow your suggestion, but I think it is a chance to learn
> > something. Let me continue the discussion a bit.
> > 
> > I think pipelining is useful if the lines in the loop can be parallized,
this
> > wiki page has good explanation.
> > 
> > https://en.wikipedia.org/wiki/Software_pipelining
> > 
> > For summation, parallization is not possible.
> 
> I don't know this as well as I should do, but I meant primarily hardware
> pipelining https://en.wikipedia.org/wiki/Instruction_pipelining
> 
> For summation, the lines in the loop can definitely be parallellized in
hardware
> (probably in this case via loop-unrolling though. 
> 
> E.g.
> sum = A+B+C+D can be parallelized as
> E=A+B
> F=C+D
> sum=E+F 
> where E and F can be computed in parallel independently of each other.
> 
>  

ok, I can do int64 sum.

https://codereview.webrtc.org/2003623003/diff/180001/webrtc/modules/audio_pro...
File webrtc/modules/audio_processing/agc/legacy/digital_agc.c (right):

https://codereview.webrtc.org/2003623003/diff/180001/webrtc/modules/audio_pro...
webrtc/modules/audio_processing/agc/legacy/digital_agc.c:202: y32 = numFIX /
tmp32no1;  // in Q14
On 2016/05/24 10:46:29, peah-webrtc wrote:
> On 2016/05/24 07:16:12, minyue-webrtc wrote:
> > On 2016/05/24 06:16:20, peah-webrtc wrote:
> > > On 2016/05/24 01:32:26, minyue-webrtc wrote:
> > > > On 2016/05/23 12:00:19, peah-webrtc wrote:
> > > > > I don't think this is the right approach, even though it may work.
> > > > > It is quite complicated and it also makes use of the % operator which
I
> > > expect
> > > > > to be as complex as a division in fixed point code.
> > > > > 
> > > > > The standard way of solving this is to normalize one step less, which
> will
> > > > give
> > > > > you headroom in the addition. I think you should do that, as it leads
to
> > > much
> > > > > simpler code and is cheaper as well.
> > > > > 
> > > > > Furthermore, I don't think the addition of
> > > > > tmp32no1 / 2 is really needed. It should be sufficient to add 1<<
> > (zeros-1)
> > > > (or
> > > > > subtract that, dependent on the sign of numFix). 
> > > > > That is the standard way of doing rounding in fixed point code.
> > > > 
> > > > + 1<<(zeros-1) is to do ceil(), what we want here is round(). WDYT
> > > 
> > > Sorry, I realize that I was wrong about that the addition of tmp32no1 / 2
> not
> > > being needed. My comment about that was not correct at all.
> > > 
> > > However, I still think that setting zeros to one less than it is done now
is
> > the
> > > correct solution, as that will should give headroom for the addition of
> > > tmp32nol/2. 
> > > 
> > > Changing the value of zeros to one less does not change the rounding to a
> > ceil.
> > > It only affects the accuracy of the division.
> > > WDYT?
> > 
> > I also realized that +1<<(zero-1) does not do ceil(). But that is not the
core
> > part of the discussion. What worth discuss is weather we want a precise
> round()
> > or not. In my opinion, finding the nearest integer should be most
reasonable.
> 1)
> > it is well defined, 2) we do all shifting above to improve accuracy, and we
> > should not be too relaxed here, 3) Computationally, it should not be much
more
> > complex. and if you think % will do another /, we may calculate remainder by
> > numFIX - y32 * tmp32nol.
> 
> I strongly doubt that we really need a rounding division here as this is a
> gaintable computation. Do you know what negative impact a truncating division
> would have?
> 
> The drawback of doing a zeros-1 normalization instead of a zeros normalization
> is that you lose 1 bit accuracy in the 32 bit divisions. I would say that is
> acceptable, in particular as the rest of the computations are in a fixed Q
> value. 
> 
> Furthermore, looking at your code for this, I'm actually not sure it is
> sufficient to avoid overflow. How can you guarantee that y32++ does not
> overflow? (I could not come up with a good example for that though.)
> 

1) y32++ and -- is surely safe, since as long as |den| > 1,  the ratio will be
able to add/minus 1.

2) Regarding shifting zero -1 bits, I have thought about it. It is not totally
safe. what numFIX is already large (zero = 0)?

3) Not adding tmp32nol / 2 is good, but we loose the precision of nearest
integer.

If you like, I can go for 3).

peah-webrtc

https://codereview.webrtc.org/2003623003/diff/180001/webrtc/modules/audio_processing/agc/legacy/digital_agc.c File webrtc/modules/audio_processing/agc/legacy/digital_agc.c (right): https://codereview.webrtc.org/2003623003/diff/180001/webrtc/modules/audio_processing/agc/legacy/digital_agc.c#newcode202 webrtc/modules/audio_processing/agc/legacy/digital_agc.c:202: y32 = numFIX / tmp32no1; // in Q14 On ...

4 years, 7 months ago (2016-05-24 13:37:10 UTC) #21

https://codereview.webrtc.org/2003623003/diff/180001/webrtc/modules/audio_pro...
File webrtc/modules/audio_processing/agc/legacy/digital_agc.c (right):

https://codereview.webrtc.org/2003623003/diff/180001/webrtc/modules/audio_pro...
webrtc/modules/audio_processing/agc/legacy/digital_agc.c:202: y32 = numFIX /
tmp32no1;  // in Q14
On 2016/05/24 13:05:45, minyue-webrtc wrote:
> On 2016/05/24 10:46:29, peah-webrtc wrote:
> > On 2016/05/24 07:16:12, minyue-webrtc wrote:
> > > On 2016/05/24 06:16:20, peah-webrtc wrote:
> > > > On 2016/05/24 01:32:26, minyue-webrtc wrote:
> > > > > On 2016/05/23 12:00:19, peah-webrtc wrote:
> > > > > > I don't think this is the right approach, even though it may work.
> > > > > > It is quite complicated and it also makes use of the % operator
which
> I
> > > > expect
> > > > > > to be as complex as a division in fixed point code.
> > > > > > 
> > > > > > The standard way of solving this is to normalize one step less,
which
> > will
> > > > > give
> > > > > > you headroom in the addition. I think you should do that, as it
leads
> to
> > > > much
> > > > > > simpler code and is cheaper as well.
> > > > > > 
> > > > > > Furthermore, I don't think the addition of
> > > > > > tmp32no1 / 2 is really needed. It should be sufficient to add 1<<
> > > (zeros-1)
> > > > > (or
> > > > > > subtract that, dependent on the sign of numFix). 
> > > > > > That is the standard way of doing rounding in fixed point code.
> > > > > 
> > > > > + 1<<(zeros-1) is to do ceil(), what we want here is round(). WDYT
> > > > 
> > > > Sorry, I realize that I was wrong about that the addition of tmp32no1 /
2
> > not
> > > > being needed. My comment about that was not correct at all.
> > > > 
> > > > However, I still think that setting zeros to one less than it is done
now
> is
> > > the
> > > > correct solution, as that will should give headroom for the addition of
> > > > tmp32nol/2. 
> > > > 
> > > > Changing the value of zeros to one less does not change the rounding to
a
> > > ceil.
> > > > It only affects the accuracy of the division.
> > > > WDYT?
> > > 
> > > I also realized that +1<<(zero-1) does not do ceil(). But that is not the
> core
> > > part of the discussion. What worth discuss is weather we want a precise
> > round()
> > > or not. In my opinion, finding the nearest integer should be most
> reasonable.
> > 1)
> > > it is well defined, 2) we do all shifting above to improve accuracy, and
we
> > > should not be too relaxed here, 3) Computationally, it should not be much
> more
> > > complex. and if you think % will do another /, we may calculate remainder
by
> > > numFIX - y32 * tmp32nol.
> > 
> > I strongly doubt that we really need a rounding division here as this is a
> > gaintable computation. Do you know what negative impact a truncating
division
> > would have?
> > 
> > The drawback of doing a zeros-1 normalization instead of a zeros
normalization
> > is that you lose 1 bit accuracy in the 32 bit divisions. I would say that is
> > acceptable, in particular as the rest of the computations are in a fixed Q
> > value. 
> > 
> > Furthermore, looking at your code for this, I'm actually not sure it is
> > sufficient to avoid overflow. How can you guarantee that y32++ does not
> > overflow? (I could not come up with a good example for that though.)
> > 
> 
> 1) y32++ and -- is surely safe, since as long as |den| > 1,  the ratio will be
> able to add/minus 1.
> 
> 2) Regarding shifting zero -1 bits, I have thought about it. It is not totally
> safe. what numFIX is already large (zero = 0)?
> 
> 3) Not adding tmp32nol / 2 is good, but we loose the precision of nearest
> integer.
> 
> If you like, I can go for 3).

Looking at it again, I see that the division is wrongly done. It should be
zeros = WebRtcSpl_NormW32(numFIX);
numFIX *= 1 << zeros;  // Q(14+zeros)
int sft = WebRtcSpl_NormW32(den) - 16;
tmp32no1 = WEBRTC_SPL_SHIFT_W32(den, -sft); // Q(8-sft) (ensures 16 bit
precision of the division)
y32 = numFIX / tmp32no1;  // Q(14+zeros-8+sft)
sft = 8 - (14+zeros-8+sft);
y32 += (1<<(sft-1)); // Adding 0.5 Q8 to ensure proper rounding 
y32 = WEBRTC_SPL_SHIFT_W32(y32, -sft);  // Q(8) (rounded)

Please doublecheck the above though.

minyue-webrtc

https://codereview.webrtc.org/2003623003/diff/180001/webrtc/modules/audio_processing/agc/legacy/digital_agc.c File webrtc/modules/audio_processing/agc/legacy/digital_agc.c (right): https://codereview.webrtc.org/2003623003/diff/180001/webrtc/modules/audio_processing/agc/legacy/digital_agc.c#newcode202 webrtc/modules/audio_processing/agc/legacy/digital_agc.c:202: y32 = numFIX / tmp32no1; // in Q14 On ...

4 years, 6 months ago (2016-05-26 02:28:20 UTC) #22

https://codereview.webrtc.org/2003623003/diff/180001/webrtc/modules/audio_pro...
File webrtc/modules/audio_processing/agc/legacy/digital_agc.c (right):

https://codereview.webrtc.org/2003623003/diff/180001/webrtc/modules/audio_pro...
webrtc/modules/audio_processing/agc/legacy/digital_agc.c:202: y32 = numFIX /
tmp32no1;  // in Q14
On 2016/05/24 13:37:10, peah-webrtc wrote:
> On 2016/05/24 13:05:45, minyue-webrtc wrote:
> > On 2016/05/24 10:46:29, peah-webrtc wrote:
> > > On 2016/05/24 07:16:12, minyue-webrtc wrote:
> > > > On 2016/05/24 06:16:20, peah-webrtc wrote:
> > > > > On 2016/05/24 01:32:26, minyue-webrtc wrote:
> > > > > > On 2016/05/23 12:00:19, peah-webrtc wrote:
> > > > > > > I don't think this is the right approach, even though it may work.
> > > > > > > It is quite complicated and it also makes use of the % operator
> which
> > I
> > > > > expect
> > > > > > > to be as complex as a division in fixed point code.
> > > > > > > 
> > > > > > > The standard way of solving this is to normalize one step less,
> which
> > > will
> > > > > > give
> > > > > > > you headroom in the addition. I think you should do that, as it
> leads
> > to
> > > > > much
> > > > > > > simpler code and is cheaper as well.
> > > > > > > 
> > > > > > > Furthermore, I don't think the addition of
> > > > > > > tmp32no1 / 2 is really needed. It should be sufficient to add 1<<
> > > > (zeros-1)
> > > > > > (or
> > > > > > > subtract that, dependent on the sign of numFix). 
> > > > > > > That is the standard way of doing rounding in fixed point code.
> > > > > > 
> > > > > > + 1<<(zeros-1) is to do ceil(), what we want here is round(). WDYT
> > > > > 
> > > > > Sorry, I realize that I was wrong about that the addition of tmp32no1
/
> 2
> > > not
> > > > > being needed. My comment about that was not correct at all.
> > > > > 
> > > > > However, I still think that setting zeros to one less than it is done
> now
> > is
> > > > the
> > > > > correct solution, as that will should give headroom for the addition
of
> > > > > tmp32nol/2. 
> > > > > 
> > > > > Changing the value of zeros to one less does not change the rounding
to
> a
> > > > ceil.
> > > > > It only affects the accuracy of the division.
> > > > > WDYT?
> > > > 
> > > > I also realized that +1<<(zero-1) does not do ceil(). But that is not
the
> > core
> > > > part of the discussion. What worth discuss is weather we want a precise
> > > round()
> > > > or not. In my opinion, finding the nearest integer should be most
> > reasonable.
> > > 1)
> > > > it is well defined, 2) we do all shifting above to improve accuracy, and
> we
> > > > should not be too relaxed here, 3) Computationally, it should not be
much
> > more
> > > > complex. and if you think % will do another /, we may calculate
remainder
> by
> > > > numFIX - y32 * tmp32nol.
> > > 
> > > I strongly doubt that we really need a rounding division here as this is a
> > > gaintable computation. Do you know what negative impact a truncating
> division
> > > would have?
> > > 
> > > The drawback of doing a zeros-1 normalization instead of a zeros
> normalization
> > > is that you lose 1 bit accuracy in the 32 bit divisions. I would say that
is
> > > acceptable, in particular as the rest of the computations are in a fixed Q
> > > value. 
> > > 
> > > Furthermore, looking at your code for this, I'm actually not sure it is
> > > sufficient to avoid overflow. How can you guarantee that y32++ does not
> > > overflow? (I could not come up with a good example for that though.)
> > > 
> > 
> > 1) y32++ and -- is surely safe, since as long as |den| > 1,  the ratio will
be
> > able to add/minus 1.
> > 
> > 2) Regarding shifting zero -1 bits, I have thought about it. It is not
totally
> > safe. what numFIX is already large (zero = 0)?
> > 
> > 3) Not adding tmp32nol / 2 is good, but we loose the precision of nearest
> > integer.
> > 
> > If you like, I can go for 3).
> 
> Looking at it again, I see that the division is wrongly done. It should be
> zeros = WebRtcSpl_NormW32(numFIX);
> numFIX *= 1 << zeros;  // Q(14+zeros)
> int sft = WebRtcSpl_NormW32(den) - 16;
> tmp32no1 = WEBRTC_SPL_SHIFT_W32(den, -sft); // Q(8-sft) (ensures 16 bit
> precision of the division)
> y32 = numFIX / tmp32no1;  // Q(14+zeros-8+sft)
> sft = 8 - (14+zeros-8+sft);
> y32 += (1<<(sft-1)); // Adding 0.5 Q8 to ensure proper rounding 
> y32 = WEBRTC_SPL_SHIFT_W32(y32, -sft);  // Q(8) (rounded)
> 
> Please doublecheck the above though.
>  

not really. y32 is supposed to be Q14, and the rounding by "+tmp32no1 / 2"
(before my change) is only to +/-2^(-14), rather than +/-2^(-8). So the division
was meant to be very precise. What you propose makes y32 Q8 rounded, that is
different from the original code.

peah-webrtc

https://codereview.webrtc.org/2003623003/diff/180001/webrtc/modules/audio_processing/agc/legacy/digital_agc.c File webrtc/modules/audio_processing/agc/legacy/digital_agc.c (right): https://codereview.webrtc.org/2003623003/diff/180001/webrtc/modules/audio_processing/agc/legacy/digital_agc.c#newcode202 webrtc/modules/audio_processing/agc/legacy/digital_agc.c:202: y32 = numFIX / tmp32no1; // in Q14 On ...

4 years, 6 months ago (2016-05-26 07:48:38 UTC) #23

https://codereview.webrtc.org/2003623003/diff/180001/webrtc/modules/audio_pro...
File webrtc/modules/audio_processing/agc/legacy/digital_agc.c (right):

https://codereview.webrtc.org/2003623003/diff/180001/webrtc/modules/audio_pro...
webrtc/modules/audio_processing/agc/legacy/digital_agc.c:202: y32 = numFIX /
tmp32no1;  // in Q14
On 2016/05/26 02:28:20, minyue-webrtc wrote:
> On 2016/05/24 13:37:10, peah-webrtc wrote:
> > On 2016/05/24 13:05:45, minyue-webrtc wrote:
> > > On 2016/05/24 10:46:29, peah-webrtc wrote:
> > > > On 2016/05/24 07:16:12, minyue-webrtc wrote:
> > > > > On 2016/05/24 06:16:20, peah-webrtc wrote:
> > > > > > On 2016/05/24 01:32:26, minyue-webrtc wrote:
> > > > > > > On 2016/05/23 12:00:19, peah-webrtc wrote:
> > > > > > > > I don't think this is the right approach, even though it may
work.
> > > > > > > > It is quite complicated and it also makes use of the % operator
> > which
> > > I
> > > > > > expect
> > > > > > > > to be as complex as a division in fixed point code.
> > > > > > > > 
> > > > > > > > The standard way of solving this is to normalize one step less,
> > which
> > > > will
> > > > > > > give
> > > > > > > > you headroom in the addition. I think you should do that, as it
> > leads
> > > to
> > > > > > much
> > > > > > > > simpler code and is cheaper as well.
> > > > > > > > 
> > > > > > > > Furthermore, I don't think the addition of
> > > > > > > > tmp32no1 / 2 is really needed. It should be sufficient to add
1<<
> > > > > (zeros-1)
> > > > > > > (or
> > > > > > > > subtract that, dependent on the sign of numFix). 
> > > > > > > > That is the standard way of doing rounding in fixed point code.
> > > > > > > 
> > > > > > > + 1<<(zeros-1) is to do ceil(), what we want here is round(). WDYT
> > > > > > 
> > > > > > Sorry, I realize that I was wrong about that the addition of
tmp32no1
> /
> > 2
> > > > not
> > > > > > being needed. My comment about that was not correct at all.
> > > > > > 
> > > > > > However, I still think that setting zeros to one less than it is
done
> > now
> > > is
> > > > > the
> > > > > > correct solution, as that will should give headroom for the addition
> of
> > > > > > tmp32nol/2. 
> > > > > > 
> > > > > > Changing the value of zeros to one less does not change the rounding
> to
> > a
> > > > > ceil.
> > > > > > It only affects the accuracy of the division.
> > > > > > WDYT?
> > > > > 
> > > > > I also realized that +1<<(zero-1) does not do ceil(). But that is not
> the
> > > core
> > > > > part of the discussion. What worth discuss is weather we want a
precise
> > > > round()
> > > > > or not. In my opinion, finding the nearest integer should be most
> > > reasonable.
> > > > 1)
> > > > > it is well defined, 2) we do all shifting above to improve accuracy,
and
> > we
> > > > > should not be too relaxed here, 3) Computationally, it should not be
> much
> > > more
> > > > > complex. and if you think % will do another /, we may calculate
> remainder
> > by
> > > > > numFIX - y32 * tmp32nol.
> > > > 
> > > > I strongly doubt that we really need a rounding division here as this is
a
> > > > gaintable computation. Do you know what negative impact a truncating
> > division
> > > > would have?
> > > > 
> > > > The drawback of doing a zeros-1 normalization instead of a zeros
> > normalization
> > > > is that you lose 1 bit accuracy in the 32 bit divisions. I would say
that
> is
> > > > acceptable, in particular as the rest of the computations are in a fixed
Q
> > > > value. 
> > > > 
> > > > Furthermore, looking at your code for this, I'm actually not sure it is
> > > > sufficient to avoid overflow. How can you guarantee that y32++ does not
> > > > overflow? (I could not come up with a good example for that though.)
> > > > 
> > > 
> > > 1) y32++ and -- is surely safe, since as long as |den| > 1,  the ratio
will
> be
> > > able to add/minus 1.
> > > 
> > > 2) Regarding shifting zero -1 bits, I have thought about it. It is not
> totally
> > > safe. what numFIX is already large (zero = 0)?
> > > 
> > > 3) Not adding tmp32nol / 2 is good, but we loose the precision of nearest
> > > integer.
> > > 
> > > If you like, I can go for 3).
> > 
> > Looking at it again, I see that the division is wrongly done. It should be
> > zeros = WebRtcSpl_NormW32(numFIX);
> > numFIX *= 1 << zeros;  // Q(14+zeros)
> > int sft = WebRtcSpl_NormW32(den) - 16;
> > tmp32no1 = WEBRTC_SPL_SHIFT_W32(den, -sft); // Q(8-sft) (ensures 16 bit
> > precision of the division)
> > y32 = numFIX / tmp32no1;  // Q(14+zeros-8+sft)
> > sft = 8 - (14+zeros-8+sft);
> > y32 += (1<<(sft-1)); // Adding 0.5 Q8 to ensure proper rounding 
> > y32 = WEBRTC_SPL_SHIFT_W32(y32, -sft);  // Q(8) (rounded)
> > 
> > Please doublecheck the above though.
> >  
> 
> not really. y32 is supposed to be Q14, and the rounding by "+tmp32no1 / 2"
> (before my change) is only to +/-2^(-14), rather than +/-2^(-8). So the
division
> was meant to be very precise. What you propose makes y32 Q8 rounded, that is
> different from the original code.

Good find, sorry about that.  I also realized that I did not add 1 bit of
headroom for the rounding addition With this, I think that the below should be
the correct code. Note, however, that furthermore, adding 0.5 Q14 only is valid
for positive numbers so the code example needs to be modified accordingly. 

zeros = WebRtcSpl_NormW32(abs(numFIX)) -1;
numFIX = abs(numFix) * (1 << zeros);  // Q(14+zeros)
int sft = WebRtcSpl_NormW32(den) - 16;
tmp32no1 = WEBRTC_SPL_SHIFT_W32(den, -sft); // Q(8-sft) (ensures 16 bit
precision of the division)
y32 = numFIX / tmp32no1;  // Q(14+zeros-8+sft)
sft = 14 - (14+zeros-8+sft);
y32  += ((numFix > 0) - (numFix < 0)) * (1<<(sft-1)); // Adding 0.5 Q14 to
ensure proper rounding 
y32 = WEBRTC_SPL_SHIFT_W32(y32, -sft);  // Q(14) (rounded)


My point with the above is, however, that adding tmp32no1 / 2 is not necessary
for the rounding as the rounding is based on the bit that is shifted out when
placing y32 into Q14. 

To illustrate
5 (Q0)  is
3 (Q1) if rounded
2 (Q1) if not rounded

For this the rounding operation would be
101 (Q0) => (101 + 001) >> 1 (Q1) = 11 (Q1)
and the non-rounding operation would be
101 (Q0) => 101 >> 1 (Q1) = 10 (Q1)

If you really want to avoid the extra bit of headroom that avoids overflow when
doing the rounding addition you could check the value of numFIX = abs(numFix) *
(1 << zeros); and if that is so large that the addition will overflow then re-do
the shift with zeros-1. I do, however, not think that is worthwhile.

The above described scheme is the standard fixed-point way of doing this. If you
want to do a simple fix, I would go for zeros-1 shifts. It is, as you point out,
not perfect either if numFix is already normalized (zeros=0), but you will then
anyway have potential problems with cases when den < 2^8 as then zeros==0 will
cause den to be shifted to zero which will cause a division by zero. I think
this division have several implicit assumptions about the magnitude of the
variables anyway and those needs to be taken into account when designing the
division.

peah-webrtc

https://codereview.webrtc.org/2003623003/diff/180001/webrtc/modules/audio_processing/agc/legacy/digital_agc.c File webrtc/modules/audio_processing/agc/legacy/digital_agc.c (right): https://codereview.webrtc.org/2003623003/diff/180001/webrtc/modules/audio_processing/agc/legacy/digital_agc.c#newcode202 webrtc/modules/audio_processing/agc/legacy/digital_agc.c:202: y32 = numFIX / tmp32no1; // in Q14 On ...

4 years, 6 months ago (2016-05-26 12:27:54 UTC) #24

https://codereview.webrtc.org/2003623003/diff/180001/webrtc/modules/audio_pro...
File webrtc/modules/audio_processing/agc/legacy/digital_agc.c (right):

https://codereview.webrtc.org/2003623003/diff/180001/webrtc/modules/audio_pro...
webrtc/modules/audio_processing/agc/legacy/digital_agc.c:202: y32 = numFIX /
tmp32no1;  // in Q14
On 2016/05/26 07:48:38, peah-webrtc wrote:
> On 2016/05/26 02:28:20, minyue-webrtc wrote:
> > On 2016/05/24 13:37:10, peah-webrtc wrote:
> > > On 2016/05/24 13:05:45, minyue-webrtc wrote:
> > > > On 2016/05/24 10:46:29, peah-webrtc wrote:
> > > > > On 2016/05/24 07:16:12, minyue-webrtc wrote:
> > > > > > On 2016/05/24 06:16:20, peah-webrtc wrote:
> > > > > > > On 2016/05/24 01:32:26, minyue-webrtc wrote:
> > > > > > > > On 2016/05/23 12:00:19, peah-webrtc wrote:
> > > > > > > > > I don't think this is the right approach, even though it may
> work.
> > > > > > > > > It is quite complicated and it also makes use of the %
operator
> > > which
> > > > I
> > > > > > > expect
> > > > > > > > > to be as complex as a division in fixed point code.
> > > > > > > > > 
> > > > > > > > > The standard way of solving this is to normalize one step
less,
> > > which
> > > > > will
> > > > > > > > give
> > > > > > > > > you headroom in the addition. I think you should do that, as
it
> > > leads
> > > > to
> > > > > > > much
> > > > > > > > > simpler code and is cheaper as well.
> > > > > > > > > 
> > > > > > > > > Furthermore, I don't think the addition of
> > > > > > > > > tmp32no1 / 2 is really needed. It should be sufficient to add
> 1<<
> > > > > > (zeros-1)
> > > > > > > > (or
> > > > > > > > > subtract that, dependent on the sign of numFix). 
> > > > > > > > > That is the standard way of doing rounding in fixed point
code.
> > > > > > > > 
> > > > > > > > + 1<<(zeros-1) is to do ceil(), what we want here is round().
WDYT
> > > > > > > 
> > > > > > > Sorry, I realize that I was wrong about that the addition of
> tmp32no1
> > /
> > > 2
> > > > > not
> > > > > > > being needed. My comment about that was not correct at all.
> > > > > > > 
> > > > > > > However, I still think that setting zeros to one less than it is
> done
> > > now
> > > > is
> > > > > > the
> > > > > > > correct solution, as that will should give headroom for the
addition
> > of
> > > > > > > tmp32nol/2. 
> > > > > > > 
> > > > > > > Changing the value of zeros to one less does not change the
rounding
> > to
> > > a
> > > > > > ceil.
> > > > > > > It only affects the accuracy of the division.
> > > > > > > WDYT?
> > > > > > 
> > > > > > I also realized that +1<<(zero-1) does not do ceil(). But that is
not
> > the
> > > > core
> > > > > > part of the discussion. What worth discuss is weather we want a
> precise
> > > > > round()
> > > > > > or not. In my opinion, finding the nearest integer should be most
> > > > reasonable.
> > > > > 1)
> > > > > > it is well defined, 2) we do all shifting above to improve accuracy,
> and
> > > we
> > > > > > should not be too relaxed here, 3) Computationally, it should not be
> > much
> > > > more
> > > > > > complex. and if you think % will do another /, we may calculate
> > remainder
> > > by
> > > > > > numFIX - y32 * tmp32nol.
> > > > > 
> > > > > I strongly doubt that we really need a rounding division here as this
is
> a
> > > > > gaintable computation. Do you know what negative impact a truncating
> > > division
> > > > > would have?
> > > > > 
> > > > > The drawback of doing a zeros-1 normalization instead of a zeros
> > > normalization
> > > > > is that you lose 1 bit accuracy in the 32 bit divisions. I would say
> that
> > is
> > > > > acceptable, in particular as the rest of the computations are in a
fixed
> Q
> > > > > value. 
> > > > > 
> > > > > Furthermore, looking at your code for this, I'm actually not sure it
is
> > > > > sufficient to avoid overflow. How can you guarantee that y32++ does
not
> > > > > overflow? (I could not come up with a good example for that though.)
> > > > > 
> > > > 
> > > > 1) y32++ and -- is surely safe, since as long as |den| > 1,  the ratio
> will
> > be
> > > > able to add/minus 1.
> > > > 
> > > > 2) Regarding shifting zero -1 bits, I have thought about it. It is not
> > totally
> > > > safe. what numFIX is already large (zero = 0)?
> > > > 
> > > > 3) Not adding tmp32nol / 2 is good, but we loose the precision of
nearest
> > > > integer.
> > > > 
> > > > If you like, I can go for 3).
> > > 
> > > Looking at it again, I see that the division is wrongly done. It should be
> > > zeros = WebRtcSpl_NormW32(numFIX);
> > > numFIX *= 1 << zeros;  // Q(14+zeros)
> > > int sft = WebRtcSpl_NormW32(den) - 16;
> > > tmp32no1 = WEBRTC_SPL_SHIFT_W32(den, -sft); // Q(8-sft) (ensures 16 bit
> > > precision of the division)
> > > y32 = numFIX / tmp32no1;  // Q(14+zeros-8+sft)
> > > sft = 8 - (14+zeros-8+sft);
> > > y32 += (1<<(sft-1)); // Adding 0.5 Q8 to ensure proper rounding 
> > > y32 = WEBRTC_SPL_SHIFT_W32(y32, -sft);  // Q(8) (rounded)
> > > 
> > > Please doublecheck the above though.
> > >  
> > 
> > not really. y32 is supposed to be Q14, and the rounding by "+tmp32no1 / 2"
> > (before my change) is only to +/-2^(-14), rather than +/-2^(-8). So the
> division
> > was meant to be very precise. What you propose makes y32 Q8 rounded, that is
> > different from the original code.
> 
> Good find, sorry about that.  I also realized that I did not add 1 bit of
> headroom for the rounding addition With this, I think that the below should be
> the correct code. Note, however, that furthermore, adding 0.5 Q14 only is
valid
> for positive numbers so the code example needs to be modified accordingly. 
> 
> zeros = WebRtcSpl_NormW32(abs(numFIX)) -1;
> numFIX = abs(numFix) * (1 << zeros);  // Q(14+zeros)
> int sft = WebRtcSpl_NormW32(den) - 16;
> tmp32no1 = WEBRTC_SPL_SHIFT_W32(den, -sft); // Q(8-sft) (ensures 16 bit
> precision of the division)
> y32 = numFIX / tmp32no1;  // Q(14+zeros-8+sft)
> sft = 14 - (14+zeros-8+sft);
> y32  += ((numFix > 0) - (numFix < 0)) * (1<<(sft-1)); // Adding 0.5 Q14 to
> ensure proper rounding 
> y32 = WEBRTC_SPL_SHIFT_W32(y32, -sft);  // Q(14) (rounded)
> 
> 
> My point with the above is, however, that adding tmp32no1 / 2 is not necessary
> for the rounding as the rounding is based on the bit that is shifted out when
> placing y32 into Q14. 
> 
> To illustrate
> 5 (Q0)  is
> 3 (Q1) if rounded
> 2 (Q1) if not rounded
> 
> For this the rounding operation would be
> 101 (Q0) => (101 + 001) >> 1 (Q1) = 11 (Q1)
> and the non-rounding operation would be
> 101 (Q0) => 101 >> 1 (Q1) = 10 (Q1)
> 
> If you really want to avoid the extra bit of headroom that avoids overflow
when
> doing the rounding addition you could check the value of numFIX = abs(numFix)
*
> (1 << zeros); and if that is so large that the addition will overflow then
re-do
> the shift with zeros-1. I do, however, not think that is worthwhile.
> 
> The above described scheme is the standard fixed-point way of doing this. If
you
> want to do a simple fix, I would go for zeros-1 shifts. It is, as you point
out,
> not perfect either if numFix is already normalized (zeros=0), but you will
then
> anyway have potential problems with cases when den < 2^8 as then zeros==0 will
> cause den to be shifted to zero which will cause a division by zero. I think
> this division have several implicit assumptions about the magnitude of the
> variables anyway and those needs to be taken into account when designing the
> division.

Having looked at it again, I think the proper code should be as below. But
please doublecheck it as it is really easy to get that wrong.

int sign_numfix = (numFix > 0) - (numFix < 0);                                  
                                                          
numFIX = abs(numFIX);                                                           
                                                          
zeros = WebRtcSpl_NormW32(numFIX) - 1;                                          
                                                          

// Doing this in proper shifts is faster than using WEBRTC_SPL_SHIFT_W32 as that
one both shifts and multiplies.
if (zeros < 0)
  numFIX = numFix >> abs(zeros);            // Q(14+zeros)                      
                                                          
else if (sft > 0)
  numFIX = numFix << zeros;                 // Q(14+zeros)                      
                                                          

int sign_den = (den > 0) - (den < 0);                                           
                                                          
den = abs(den);                             // Q(8)                             
                                                          
int sft = WebRtcSpl_NormW32(den) - 16;      // -16 to ensure 16 bit precision in
the result of the division.                               

// Doing this in proper shifts is faster than using WEBRTC_SPL_SHIFT_W32 as that
one both shifts and multiplies.
if (sft < 0)
  tmp32no1 = den >> abs(sft);               // Q(8-sft)                         
                                                          
else if (sft > 0)
  tmp32no1 = den << sft;                    // Q(8-sft)                         
                                                          


y32 = numFIX / tmp32no1;                    // Q(14+zeros-(8-sft))              
                                                          
sft = 14 - (14+zeros-8+sft);                                                    
                                                          

// Only round if shifting downwards.
if ((sft-1) <= 0)                                                               
                                                          
  y32  +=  (1<<abs((sft-1)));                      // Adding 0.5 Q14 to ensure
proper rounding                                             

if (sft < 0)
  y32 = y32 >> abs(sft);                    // Q(14) (rounded)                  
                                                          
else if (sft > 0)
  y32 = y32 << sft;                         // Q(14) (rounded)                  
                                                          

y32 *= int sign_numfix * int sign_den;

minyue-webrtc

Thank you for the inspiring idea! I simplified your approach and uploaded a new patch. ...

4 years, 6 months ago (2016-06-01 04:45:11 UTC) #25

Thank you for the inspiring idea! I simplified your approach and uploaded a new
patch. Bit exactness shows that it gives the same result as my earlier (more
complicated) solution.

https://codereview.webrtc.org/2003623003/diff/180001/webrtc/modules/audio_pro...
File webrtc/modules/audio_processing/agc/legacy/digital_agc.c (right):

https://codereview.webrtc.org/2003623003/diff/180001/webrtc/modules/audio_pro...
webrtc/modules/audio_processing/agc/legacy/digital_agc.c:202: y32 = numFIX /
tmp32no1;  // in Q14
On 2016/05/26 12:27:54, peah-webrtc wrote:
> On 2016/05/26 07:48:38, peah-webrtc wrote:
> > On 2016/05/26 02:28:20, minyue-webrtc wrote:
> > > On 2016/05/24 13:37:10, peah-webrtc wrote:
> > > > On 2016/05/24 13:05:45, minyue-webrtc wrote:
> > > > > On 2016/05/24 10:46:29, peah-webrtc wrote:
> > > > > > On 2016/05/24 07:16:12, minyue-webrtc wrote:
> > > > > > > On 2016/05/24 06:16:20, peah-webrtc wrote:
> > > > > > > > On 2016/05/24 01:32:26, minyue-webrtc wrote:
> > > > > > > > > On 2016/05/23 12:00:19, peah-webrtc wrote:
> > > > > > > > > > I don't think this is the right approach, even though it may
> > work.
> > > > > > > > > > It is quite complicated and it also makes use of the %
> operator
> > > > which
> > > > > I
> > > > > > > > expect
> > > > > > > > > > to be as complex as a division in fixed point code.
> > > > > > > > > > 
> > > > > > > > > > The standard way of solving this is to normalize one step
> less,
> > > > which
> > > > > > will
> > > > > > > > > give
> > > > > > > > > > you headroom in the addition. I think you should do that, as
> it
> > > > leads
> > > > > to
> > > > > > > > much
> > > > > > > > > > simpler code and is cheaper as well.
> > > > > > > > > > 
> > > > > > > > > > Furthermore, I don't think the addition of
> > > > > > > > > > tmp32no1 / 2 is really needed. It should be sufficient to
add
> > 1<<
> > > > > > > (zeros-1)
> > > > > > > > > (or
> > > > > > > > > > subtract that, dependent on the sign of numFix). 
> > > > > > > > > > That is the standard way of doing rounding in fixed point
> code.
> > > > > > > > > 
> > > > > > > > > + 1<<(zeros-1) is to do ceil(), what we want here is round().
> WDYT
> > > > > > > > 
> > > > > > > > Sorry, I realize that I was wrong about that the addition of
> > tmp32no1
> > > /
> > > > 2
> > > > > > not
> > > > > > > > being needed. My comment about that was not correct at all.
> > > > > > > > 
> > > > > > > > However, I still think that setting zeros to one less than it is
> > done
> > > > now
> > > > > is
> > > > > > > the
> > > > > > > > correct solution, as that will should give headroom for the
> addition
> > > of
> > > > > > > > tmp32nol/2. 
> > > > > > > > 
> > > > > > > > Changing the value of zeros to one less does not change the
> rounding
> > > to
> > > > a
> > > > > > > ceil.
> > > > > > > > It only affects the accuracy of the division.
> > > > > > > > WDYT?
> > > > > > > 
> > > > > > > I also realized that +1<<(zero-1) does not do ceil(). But that is
> not
> > > the
> > > > > core
> > > > > > > part of the discussion. What worth discuss is weather we want a
> > precise
> > > > > > round()
> > > > > > > or not. In my opinion, finding the nearest integer should be most
> > > > > reasonable.
> > > > > > 1)
> > > > > > > it is well defined, 2) we do all shifting above to improve
accuracy,
> > and
> > > > we
> > > > > > > should not be too relaxed here, 3) Computationally, it should not
be
> > > much
> > > > > more
> > > > > > > complex. and if you think % will do another /, we may calculate
> > > remainder
> > > > by
> > > > > > > numFIX - y32 * tmp32nol.
> > > > > > 
> > > > > > I strongly doubt that we really need a rounding division here as
this
> is
> > a
> > > > > > gaintable computation. Do you know what negative impact a truncating
> > > > division
> > > > > > would have?
> > > > > > 
> > > > > > The drawback of doing a zeros-1 normalization instead of a zeros
> > > > normalization
> > > > > > is that you lose 1 bit accuracy in the 32 bit divisions. I would say
> > that
> > > is
> > > > > > acceptable, in particular as the rest of the computations are in a
> fixed
> > Q
> > > > > > value. 
> > > > > > 
> > > > > > Furthermore, looking at your code for this, I'm actually not sure it
> is
> > > > > > sufficient to avoid overflow. How can you guarantee that y32++ does
> not
> > > > > > overflow? (I could not come up with a good example for that though.)
> > > > > > 
> > > > > 
> > > > > 1) y32++ and -- is surely safe, since as long as |den| > 1,  the ratio
> > will
> > > be
> > > > > able to add/minus 1.
> > > > > 
> > > > > 2) Regarding shifting zero -1 bits, I have thought about it. It is not
> > > totally
> > > > > safe. what numFIX is already large (zero = 0)?
> > > > > 
> > > > > 3) Not adding tmp32nol / 2 is good, but we loose the precision of
> nearest
> > > > > integer.
> > > > > 
> > > > > If you like, I can go for 3).
> > > > 
> > > > Looking at it again, I see that the division is wrongly done. It should
be
> > > > zeros = WebRtcSpl_NormW32(numFIX);
> > > > numFIX *= 1 << zeros;  // Q(14+zeros)
> > > > int sft = WebRtcSpl_NormW32(den) - 16;
> > > > tmp32no1 = WEBRTC_SPL_SHIFT_W32(den, -sft); // Q(8-sft) (ensures 16 bit
> > > > precision of the division)
> > > > y32 = numFIX / tmp32no1;  // Q(14+zeros-8+sft)
> > > > sft = 8 - (14+zeros-8+sft);
> > > > y32 += (1<<(sft-1)); // Adding 0.5 Q8 to ensure proper rounding 
> > > > y32 = WEBRTC_SPL_SHIFT_W32(y32, -sft);  // Q(8) (rounded)
> > > > 
> > > > Please doublecheck the above though.
> > > >  
> > > 
> > > not really. y32 is supposed to be Q14, and the rounding by "+tmp32no1 / 2"
> > > (before my change) is only to +/-2^(-14), rather than +/-2^(-8). So the
> > division
> > > was meant to be very precise. What you propose makes y32 Q8 rounded, that
is
> > > different from the original code.
> > 
> > Good find, sorry about that.  I also realized that I did not add 1 bit of
> > headroom for the rounding addition With this, I think that the below should
be
> > the correct code. Note, however, that furthermore, adding 0.5 Q14 only is
> valid
> > for positive numbers so the code example needs to be modified accordingly. 
> > 
> > zeros = WebRtcSpl_NormW32(abs(numFIX)) -1;
> > numFIX = abs(numFix) * (1 << zeros);  // Q(14+zeros)
> > int sft = WebRtcSpl_NormW32(den) - 16;
> > tmp32no1 = WEBRTC_SPL_SHIFT_W32(den, -sft); // Q(8-sft) (ensures 16 bit
> > precision of the division)
> > y32 = numFIX / tmp32no1;  // Q(14+zeros-8+sft)
> > sft = 14 - (14+zeros-8+sft);
> > y32  += ((numFix > 0) - (numFix < 0)) * (1<<(sft-1)); // Adding 0.5 Q14 to
> > ensure proper rounding 
> > y32 = WEBRTC_SPL_SHIFT_W32(y32, -sft);  // Q(14) (rounded)
> > 
> > 
> > My point with the above is, however, that adding tmp32no1 / 2 is not
necessary
> > for the rounding as the rounding is based on the bit that is shifted out
when
> > placing y32 into Q14. 
> > 
> > To illustrate
> > 5 (Q0)  is
> > 3 (Q1) if rounded
> > 2 (Q1) if not rounded
> > 
> > For this the rounding operation would be
> > 101 (Q0) => (101 + 001) >> 1 (Q1) = 11 (Q1)
> > and the non-rounding operation would be
> > 101 (Q0) => 101 >> 1 (Q1) = 10 (Q1)
> > 
> > If you really want to avoid the extra bit of headroom that avoids overflow
> when
> > doing the rounding addition you could check the value of numFIX =
abs(numFix)
> *
> > (1 << zeros); and if that is so large that the addition will overflow then
> re-do
> > the shift with zeros-1. I do, however, not think that is worthwhile.
> > 
> > The above described scheme is the standard fixed-point way of doing this. If
> you
> > want to do a simple fix, I would go for zeros-1 shifts. It is, as you point
> out,
> > not perfect either if numFix is already normalized (zeros=0), but you will
> then
> > anyway have potential problems with cases when den < 2^8 as then zeros==0
will
> > cause den to be shifted to zero which will cause a division by zero. I think
> > this division have several implicit assumptions about the magnitude of the
> > variables anyway and those needs to be taken into account when designing the
> > division.
> 
> Having looked at it again, I think the proper code should be as below. But
> please doublecheck it as it is really easy to get that wrong.
> 
> int sign_numfix = (numFix > 0) - (numFix < 0);                                
 
>                                                           
> numFIX = abs(numFIX);                                                         
 
>                                                           
> zeros = WebRtcSpl_NormW32(numFIX) - 1;                                        
 
>                                                           
> 
> // Doing this in proper shifts is faster than using WEBRTC_SPL_SHIFT_W32 as
that
> one both shifts and multiplies.
> if (zeros < 0)
>   numFIX = numFix >> abs(zeros);            // Q(14+zeros)                    
 
>                                                           
> else if (sft > 0)
>   numFIX = numFix << zeros;                 // Q(14+zeros)                    
 
>                                                           
> 
> int sign_den = (den > 0) - (den < 0);                                         
 
>                                                           
> den = abs(den);                             // Q(8)                           
 
>                                                           
> int sft = WebRtcSpl_NormW32(den) - 16;      // -16 to ensure 16 bit precision
in
> the result of the division.                               
> 
> // Doing this in proper shifts is faster than using WEBRTC_SPL_SHIFT_W32 as
that
> one both shifts and multiplies.
> if (sft < 0)
>   tmp32no1 = den >> abs(sft);               // Q(8-sft)                       
 
>                                                           
> else if (sft > 0)
>   tmp32no1 = den << sft;                    // Q(8-sft)                       
 
>                                                           
> 
> 
> y32 = numFIX / tmp32no1;                    // Q(14+zeros-(8-sft))            
 
>                                                           
> sft = 14 - (14+zeros-8+sft);                                                  
 
>                                                           
> 
> // Only round if shifting downwards.
> if ((sft-1) <= 0)                                                             
 
>                                                           
>   y32  +=  (1<<abs((sft-1)));                      // Adding 0.5 Q14 to ensure
> proper rounding                                             
> 
> if (sft < 0)
>   y32 = y32 >> abs(sft);                    // Q(14) (rounded)                
 
>                                                           
> else if (sft > 0)
>   y32 = y32 << sft;                         // Q(14) (rounded)                
 
>                                                           
> 
> y32 *= int sign_numfix * int sign_den;  
> 
> 
> 
I see the point: do division so that the quotient is Q15 or more, then add 1 and
right shift. I think Q15 is just right, no need for more. I simplify it and see
the new patch set.

peah-webrtc

https://codereview.webrtc.org/2003623003/diff/200001/webrtc/modules/audio_processing/agc/legacy/analog_agc.c File webrtc/modules/audio_processing/agc/legacy/analog_agc.c (right): https://codereview.webrtc.org/2003623003/diff/200001/webrtc/modules/audio_processing/agc/legacy/analog_agc.c#newcode483 webrtc/modules/audio_processing/agc/legacy/analog_agc.c:483: if (env[i] >= kZeroThreshold || tmp32 >= kZeroThreshold) { ...

4 years, 6 months ago (2016-06-03 05:10:35 UTC) #27

peah-webrtc

https://codereview.webrtc.org/2003623003/diff/200001/webrtc/modules/audio_processing/agc/legacy/digital_agc.c File webrtc/modules/audio_processing/agc/legacy/digital_agc.c (right): https://codereview.webrtc.org/2003623003/diff/200001/webrtc/modules/audio_processing/agc/legacy/digital_agc.c#newcode203 webrtc/modules/audio_processing/agc/legacy/digital_agc.c:203: y32 = (y32 + 1) / 2; // in ...

4 years, 6 months ago (2016-06-03 05:11:05 UTC) #28

minyue-webrtc

https://codereview.webrtc.org/2003623003/diff/200001/webrtc/modules/audio_processing/agc/legacy/digital_agc.c File webrtc/modules/audio_processing/agc/legacy/digital_agc.c (right): https://codereview.webrtc.org/2003623003/diff/200001/webrtc/modules/audio_processing/agc/legacy/digital_agc.c#newcode203 webrtc/modules/audio_processing/agc/legacy/digital_agc.c:203: y32 = (y32 + 1) / 2; // in ...

4 years, 6 months ago (2016-06-03 06:40:06 UTC) #29

peah-webrtc

https://codereview.webrtc.org/2003623003/diff/220001/webrtc/modules/audio_processing/agc/legacy/analog_agc.c File webrtc/modules/audio_processing/agc/legacy/analog_agc.c (right): https://codereview.webrtc.org/2003623003/diff/220001/webrtc/modules/audio_processing/agc/legacy/analog_agc.c#newcode483 webrtc/modules/audio_processing/agc/legacy/analog_agc.c:483: if (env[i] >= kZeroThreshold || tmp32 >= kZeroThreshold) { ...

4 years, 6 months ago (2016-06-03 06:51:12 UTC) #30

minyue-webrtc

On 2016/06/03 06:51:12, peah-webrtc wrote: > https://codereview.webrtc.org/2003623003/diff/220001/webrtc/modules/audio_processing/agc/legacy/analog_agc.c > File webrtc/modules/audio_processing/agc/legacy/analog_agc.c (right): > > https://codereview.webrtc.org/2003623003/diff/220001/webrtc/modules/audio_processing/agc/legacy/analog_agc.c#newcode483 > ...

4 years, 6 months ago (2016-06-03 07:29:21 UTC) #31

peah-webrtc

https://codereview.webrtc.org/2003623003/diff/240001/webrtc/modules/audio_processing/agc/legacy/analog_agc.c File webrtc/modules/audio_processing/agc/legacy/analog_agc.c (right): https://codereview.webrtc.org/2003623003/diff/240001/webrtc/modules/audio_processing/agc/legacy/analog_agc.c#newcode483 webrtc/modules/audio_processing/agc/legacy/analog_agc.c:483: if (env[i] >= kZeroThreshold || tmp32 >= kZeroThreshold) { ...

4 years, 6 months ago (2016-06-03 08:00:56 UTC) #32

peah-webrtc

https://codereview.webrtc.org/2003623003/diff/260001/webrtc/modules/audio_processing/agc/legacy/analog_agc.c File webrtc/modules/audio_processing/agc/legacy/analog_agc.c (right): https://codereview.webrtc.org/2003623003/diff/260001/webrtc/modules/audio_processing/agc/legacy/analog_agc.c#newcode484 webrtc/modules/audio_processing/agc/legacy/analog_agc.c:484: break; You still have not revised this according to ...

4 years, 6 months ago (2016-06-03 08:27:12 UTC) #34

minyue-webrtc

true! thanks https://codereview.webrtc.org/2003623003/diff/260001/webrtc/modules/audio_processing/agc/legacy/analog_agc.c File webrtc/modules/audio_processing/agc/legacy/analog_agc.c (right): https://codereview.webrtc.org/2003623003/diff/260001/webrtc/modules/audio_processing/agc/legacy/analog_agc.c#newcode484 webrtc/modules/audio_processing/agc/legacy/analog_agc.c:484: break; On 2016/06/03 08:27:12, peah-webrtc wrote: > ...

4 years, 6 months ago (2016-06-03 08:32:32 UTC) #35

peah-webrtc

https://codereview.webrtc.org/2003623003/diff/280001/webrtc/modules/audio_processing/agc/legacy/analog_agc.c File webrtc/modules/audio_processing/agc/legacy/analog_agc.c (right): https://codereview.webrtc.org/2003623003/diff/280001/webrtc/modules/audio_processing/agc/legacy/analog_agc.c#newcode477 webrtc/modules/audio_processing/agc/legacy/analog_agc.c:477: int64_t tmp32 = 0; I think the variable name ...

4 years, 6 months ago (2016-06-03 08:34:46 UTC) #36

minyue-webrtc

On 2016/06/03 08:34:46, peah-webrtc wrote: > https://codereview.webrtc.org/2003623003/diff/280001/webrtc/modules/audio_processing/agc/legacy/analog_agc.c > File webrtc/modules/audio_processing/agc/legacy/analog_agc.c (right): > > https://codereview.webrtc.org/2003623003/diff/280001/webrtc/modules/audio_processing/agc/legacy/analog_agc.c#newcode477 > ...

4 years, 6 months ago (2016-06-03 10:20:32 UTC) #37

peah-webrtc

On 2016/06/03 10:20:32, minyue-webrtc wrote: > On 2016/06/03 08:34:46, peah-webrtc wrote: > > > https://codereview.webrtc.org/2003623003/diff/280001/webrtc/modules/audio_processing/agc/legacy/analog_agc.c ...

4 years, 6 months ago (2016-06-03 11:26:50 UTC) #38

minyue-webrtc

On 2016/06/03 11:26:50, peah-webrtc wrote: > On 2016/06/03 10:20:32, minyue-webrtc wrote: > > On 2016/06/03 ...

4 years, 6 months ago (2016-06-03 11:32:26 UTC) #39

commit-bot: I haz the power

CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/2003623003/300001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/2003623003/300001

4 years, 6 months ago (2016-06-03 11:32:42 UTC) #41

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

4 years, 6 months ago (2016-06-03 11:38:00 UTC) #42

commit-bot: I haz the power

Try jobs failed on following builders: presubmit on tryserver.webrtc (JOB_FAILED, http://build.chromium.org/p/tryserver.webrtc/builders/presubmit/builds/6065)

4 years, 6 months ago (2016-06-03 11:38:01 UTC) #43

minyue-webrtc

minyue@webrtc.org changed reviewers: + kjellander@webrtc.org

4 years, 6 months ago (2016-06-03 11:39:22 UTC) #44

minyue-webrtc

On 2016/06/03 11:38:01, commit-bot: I haz the power wrote: > Try jobs failed on following ...

4 years, 6 months ago (2016-06-03 11:39:53 UTC) #45

kjellander_webrtc

On 2016/06/03 11:39:53, minyue-webrtc wrote: > On 2016/06/03 11:38:01, commit-bot: I haz the power wrote: ...

4 years, 6 months ago (2016-06-03 11:48:35 UTC) #46

commit-bot: I haz the power

CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/2003623003/300001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/2003623003/300001

4 years, 6 months ago (2016-06-03 11:56:52 UTC) #48

commit-bot: I haz the power

Description was changed from ========== Re-enable UBsan on AGC. BUG=webrtc:5530 ========== to ========== Re-enable UBsan ...

4 years, 6 months ago (2016-06-03 13:07:19 UTC) #49

commit-bot: I haz the power

Description was changed from ========== Re-enable UBsan on AGC. BUG=webrtc:5530 ========== to ========== Re-enable UBsan ...

4 years, 6 months ago (2016-06-03 13:07:49 UTC) #51

commit-bot: I haz the power

Patchset 8 (id:??) landed as https://crrev.com/293c86d67384c15f46b8296096a62a14b4a58d33 Cr-Commit-Position: refs/heads/master@{#13034}

4 years, 6 months ago (2016-06-03 13:07:51 UTC) #52

åsapersson

4 years, 6 months ago (2016-06-09 11:00:26 UTC) #53

Message was sent while issue was closed.

A revert of this CL (patchset #8 id:300001) has been created in
https://codereview.webrtc.org/2056683002/ by asapersson@webrtc.org.

The reason for reverting is: Breaks bot..

Issue 2003623003: Re-enable UBsan on AGC. (Closed)

Description

Patch Set 1 : #

Patch Set 2 : safer method #

Patch Set 3 : new solution #

Patch Set 4 : fixing negative case #

Patch Set 5 : fix #

Patch Set 6 : on per's suggestion #

Patch Set 7 : addressing another place #

Patch Set 8 : change variable name #

Messages