|
|
Created:
4 years, 7 months ago by minyue-webrtc Modified:
4 years, 6 months ago Reviewers:
kjellander_webrtc, peah-webrtc CC:
webrtc-reviews_webrtc.org, Andrew MacDonald, tterriberry_mozilla.com, audio-team_agora.io, hlundin-webrtc, kwiberg-webrtc, the sun, aluebs-webrtc, bjornv1 Base URL:
https://chromium.googlesource.com/external/webrtc.git@master Target Ref:
refs/pending/heads/master Project:
webrtc Visibility:
Public. |
DescriptionRe-enable UBsan on AGC.
BUG=webrtc:5530
Committed: https://crrev.com/293c86d67384c15f46b8296096a62a14b4a58d33
Cr-Commit-Position: refs/heads/master@{#13034}
Patch Set 1 : #
Total comments: 14
Patch Set 2 : safer method #
Total comments: 14
Patch Set 3 : new solution #
Total comments: 5
Patch Set 4 : fixing negative case #
Total comments: 2
Patch Set 5 : fix #
Total comments: 2
Patch Set 6 : on per's suggestion #
Total comments: 2
Patch Set 7 : addressing another place #
Total comments: 1
Patch Set 8 : change variable name #
Created: 4 years, 6 months ago
Messages
Total messages: 53 (16 generated)
Patchset #1 (id:1) has been deleted
Patchset #1 (id:20001) has been deleted
Patchset #1 (id:40001) has been deleted
Patchset #1 (id:60001) has been deleted
Patchset #1 (id:80001) has been deleted
Patchset #1 (id:100001) has been deleted
Patchset #1 (id:120001) has been deleted
Description was changed from ========== fixing BUG= ========== to ========== Re-enable UBsan on AGC. BUG=webrtc:5530 ==========
minyue@webrtc.org changed reviewers: + peah@webrtc.org
Hi Per, Would you take a look at this fix on AGC overflow? https://codereview.webrtc.org/2003623003/diff/140001/webrtc/modules/audio_pro... File webrtc/modules/audio_processing/agc/legacy/digital_agc.c (left): https://codereview.webrtc.org/2003623003/diff/140001/webrtc/modules/audio_pro... webrtc/modules/audio_processing/agc/legacy/digital_agc.c:192: if (numFIX > (den >> 8)) // |den| is Q8. Two problems were with this part 1. if |numFIX| is negative, it will take 196 and the shift can be too large 2. line 203/205 may overflow. See my solution.
https://codereview.webrtc.org/2003623003/diff/140001/webrtc/modules/audio_pro... File webrtc/modules/audio_processing/agc/legacy/analog_agc.c (right): https://codereview.webrtc.org/2003623003/diff/140001/webrtc/modules/audio_pro... webrtc/modules/audio_processing/agc/legacy/analog_agc.c:479: const int kZeroThreshold = 500; This solution definitely works, but another variant would have been just to replace int32_t tmp32 = 0; by int64_t tmp64 = 0; and then do the additions in 64 bit instead. that should probably also be faster than having the if-statement inside the for-loop. Or is there a reason not to use 64 bit values here? https://codereview.webrtc.org/2003623003/diff/140001/webrtc/modules/audio_pro... File webrtc/modules/audio_processing/agc/legacy/digital_agc.c (right): https://codereview.webrtc.org/2003623003/diff/140001/webrtc/modules/audio_pro... webrtc/modules/audio_processing/agc/legacy/digital_agc.c:194: zeros = WEBRTC_SPL_MIN(WebRtcSpl_NormW32(numFIX * 2), How do you know that numFIX * 2 does not overflow? Normalizing as WebRtcSpl_NormW32(numFIX) which was done before seems safer. Alternatively, if you want the norm to be one less than before, it is better to compute zeros as WebRtcSpl_NormW32(numFIX) - 1 https://codereview.webrtc.org/2003623003/diff/140001/webrtc/modules/audio_pro... webrtc/modules/audio_processing/agc/legacy/digital_agc.c:194: zeros = WEBRTC_SPL_MIN(WebRtcSpl_NormW32(numFIX * 2), The norm computation is on some platforms fairly complex. It probably does not matter that much, but it is likely faster to do instead do: int64_t tmp64 = numFIX * 256; // Q8. if (tmp64 > den) // |den| is Q8. { zeros = WebRtcSpl_NormW32(numFIX); } else { zeros = WebRtcSpl_NormW32(den) + 8; } https://codereview.webrtc.org/2003623003/diff/140001/webrtc/modules/audio_pro... webrtc/modules/audio_processing/agc/legacy/digital_agc.c:201: numFIX -= tmp32no1 / 2; Does changing WebRtcSpl_NormW32(numFIX) to WebRtcSpl_NormW32(numFIX * 2) solve the overflow on these on 201+203?
Patchset #2 (id:160001) has been deleted
Thank you for the review! I have a better solution now. PTAL. https://codereview.webrtc.org/2003623003/diff/140001/webrtc/modules/audio_pro... File webrtc/modules/audio_processing/agc/legacy/analog_agc.c (right): https://codereview.webrtc.org/2003623003/diff/140001/webrtc/modules/audio_pro... webrtc/modules/audio_processing/agc/legacy/analog_agc.c:479: const int kZeroThreshold = 500; On 2016/05/23 04:59:36, peah-webrtc wrote: > This solution definitely works, but another variant would have been just to > replace > int32_t tmp32 = 0; > by > int64_t tmp64 = 0; > and then do the additions in 64 bit instead. > > that should probably also be faster than having the if-statement inside the > for-loop. > > Or is there a reason not to use 64 bit values here? Yes, I think early exiting the loop is more efficient, too. https://codereview.webrtc.org/2003623003/diff/140001/webrtc/modules/audio_pro... File webrtc/modules/audio_processing/agc/legacy/digital_agc.c (right): https://codereview.webrtc.org/2003623003/diff/140001/webrtc/modules/audio_pro... webrtc/modules/audio_processing/agc/legacy/digital_agc.c:194: zeros = WEBRTC_SPL_MIN(WebRtcSpl_NormW32(numFIX * 2), Good question, after a second thought, I see that |den| being too large can also overflow. A better way is to deal with rounding in another way. See my new patch. On 2016/05/23 04:59:37, peah-webrtc wrote: > How do you know that numFIX * 2 does not overflow? Normalizing as > WebRtcSpl_NormW32(numFIX) which was done before seems safer. > > Alternatively, if you want the norm to be one less than before, it is better to > compute zeros as > WebRtcSpl_NormW32(numFIX) - 1 https://codereview.webrtc.org/2003623003/diff/180001/webrtc/modules/audio_pro... File webrtc/modules/audio_processing/agc/legacy/digital_agc.c (right): https://codereview.webrtc.org/2003623003/diff/180001/webrtc/modules/audio_pro... webrtc/modules/audio_processing/agc/legacy/digital_agc.c:192: if (numFIX > (den >> 8) || -numFIX > (den >> 8)) // |den| is Q8. this part is switched back to the old style, but added a new condition "|| -numFIX > (den >> 8)", which is indeed an error of the old code (error when numFIX is negative) https://codereview.webrtc.org/2003623003/diff/180001/webrtc/modules/audio_pro... webrtc/modules/audio_processing/agc/legacy/digital_agc.c:203: int remainder = numFIX % tmp32no1; this part is a new solution to rounding numFIX / tmp32no1. The bitexactness test showed that this solution is equivalent in normal conditions, but it is safer since it avoids overflow risk in line 201/203 in the old code.
https://codereview.webrtc.org/2003623003/diff/140001/webrtc/modules/audio_pro... File webrtc/modules/audio_processing/agc/legacy/analog_agc.c (right): https://codereview.webrtc.org/2003623003/diff/140001/webrtc/modules/audio_pro... webrtc/modules/audio_processing/agc/legacy/analog_agc.c:479: const int kZeroThreshold = 500; On 2016/05/23 07:40:48, minyue-webrtc wrote: > On 2016/05/23 04:59:36, peah-webrtc wrote: > > This solution definitely works, but another variant would have been just to > > replace > > int32_t tmp32 = 0; > > by > > int64_t tmp64 = 0; > > and then do the additions in 64 bit instead. > > > > that should probably also be faster than having the if-statement inside the > > for-loop. > > > > Or is there a reason not to use 64 bit values here? > > Yes, I think early exiting the loop is more efficient, too. I think that this would be a clear case where a 64 bit value should be used. It will lead to cleaner code and I doubt that the early return will save anything, as this is a very short loop of fixed size, which should be ideal to pipeline (meaning that an early return won't save computations as it anyway breaks the pipeline). The only reason I can see for not using a 64 bit value for the sum is if it is significantly more complex. Do you know if that is the case? https://codereview.webrtc.org/2003623003/diff/180001/webrtc/modules/audio_pro... File webrtc/modules/audio_processing/agc/legacy/digital_agc.c (right): https://codereview.webrtc.org/2003623003/diff/180001/webrtc/modules/audio_pro... webrtc/modules/audio_processing/agc/legacy/digital_agc.c:192: if (numFIX > (den >> 8) || -numFIX > (den >> 8)) // |den| is Q8. On 2016/05/23 07:40:48, minyue-webrtc wrote: > this part is switched back to the old style, but added a new condition "|| > -numFIX > (den >> 8)", which is indeed an error of the old code (error when > numFIX is negative) Nice! https://codereview.webrtc.org/2003623003/diff/180001/webrtc/modules/audio_pro... webrtc/modules/audio_processing/agc/legacy/digital_agc.c:202: y32 = numFIX / tmp32no1; // in Q14 I don't think this is the right approach, even though it may work. It is quite complicated and it also makes use of the % operator which I expect to be as complex as a division in fixed point code. The standard way of solving this is to normalize one step less, which will give you headroom in the addition. I think you should do that, as it leads to much simpler code and is cheaper as well. Furthermore, I don't think the addition of tmp32no1 / 2 is really needed. It should be sufficient to add 1<< (zeros-1) (or subtract that, dependent on the sign of numFix). That is the standard way of doing rounding in fixed point code.
https://codereview.webrtc.org/2003623003/diff/140001/webrtc/modules/audio_pro... File webrtc/modules/audio_processing/agc/legacy/analog_agc.c (right): https://codereview.webrtc.org/2003623003/diff/140001/webrtc/modules/audio_pro... webrtc/modules/audio_processing/agc/legacy/analog_agc.c:479: const int kZeroThreshold = 500; On 2016/05/23 12:00:19, peah-webrtc wrote: > On 2016/05/23 07:40:48, minyue-webrtc wrote: > > On 2016/05/23 04:59:36, peah-webrtc wrote: > > > This solution definitely works, but another variant would have been just to > > > replace > > > int32_t tmp32 = 0; > > > by > > > int64_t tmp64 = 0; > > > and then do the additions in 64 bit instead. > > > > > > that should probably also be faster than having the if-statement inside the > > > for-loop. > > > > > > Or is there a reason not to use 64 bit values here? > > > > Yes, I think early exiting the loop is more efficient, too. > > I think that this would be a clear case where a 64 bit value should be used. It > will lead to cleaner code and I doubt that the early return will save anything, > as this is a very short loop of fixed size, which should be ideal to pipeline > (meaning that an early return won't save computations as it anyway breaks the > pipeline). > > The only reason I can see for not using a 64 bit value for the sum is if it is > significantly more complex. Do you know if that is the case? Are the operations int64 += int32 and int64 >= int32_constant? That ought to be pretty efficient even on a 32-bit platform---I'm guessing ~two instructions for the addition (add low half, add carry to high half), and ~three for the comparison (test low half, test if high half is zero, combine the results). On 64-bit platforms, 64-bit operations ought to be just as cheap as 32-bit operations as long as they operate on registers. So my guess is that using a 64-bit accumulator is a good idea.
https://codereview.webrtc.org/2003623003/diff/140001/webrtc/modules/audio_pro... File webrtc/modules/audio_processing/agc/legacy/analog_agc.c (right): https://codereview.webrtc.org/2003623003/diff/140001/webrtc/modules/audio_pro... webrtc/modules/audio_processing/agc/legacy/analog_agc.c:479: const int kZeroThreshold = 500; On 2016/05/23 12:00:19, peah-webrtc wrote: > On 2016/05/23 07:40:48, minyue-webrtc wrote: > > On 2016/05/23 04:59:36, peah-webrtc wrote: > > > This solution definitely works, but another variant would have been just to > > > replace > > > int32_t tmp32 = 0; > > > by > > > int64_t tmp64 = 0; > > > and then do the additions in 64 bit instead. > > > > > > that should probably also be faster than having the if-statement inside the > > > for-loop. > > > > > > Or is there a reason not to use 64 bit values here? > > > > Yes, I think early exiting the loop is more efficient, too. > > I think that this would be a clear case where a 64 bit value should be used. It > will lead to cleaner code and I doubt that the early return will save anything, > as this is a very short loop of fixed size, which should be ideal to pipeline > (meaning that an early return won't save computations as it anyway breaks the > pipeline). > > The only reason I can see for not using a 64 bit value for the sum is if it is > significantly more complex. Do you know if that is the case? Loop in 481-488 is really to check if env[0..9] are generally zeros. In most cases, this should be invalidated at index 0, it is nice that we return early. I do not fully understand the pipeline argument. https://codereview.webrtc.org/2003623003/diff/180001/webrtc/modules/audio_pro... File webrtc/modules/audio_processing/agc/legacy/digital_agc.c (right): https://codereview.webrtc.org/2003623003/diff/180001/webrtc/modules/audio_pro... webrtc/modules/audio_processing/agc/legacy/digital_agc.c:202: y32 = numFIX / tmp32no1; // in Q14 On 2016/05/23 12:00:19, peah-webrtc wrote: > I don't think this is the right approach, even though it may work. > It is quite complicated and it also makes use of the % operator which I expect > to be as complex as a division in fixed point code. > > The standard way of solving this is to normalize one step less, which will give > you headroom in the addition. I think you should do that, as it leads to much > simpler code and is cheaper as well. > > Furthermore, I don't think the addition of > tmp32no1 / 2 is really needed. It should be sufficient to add 1<< (zeros-1) (or > subtract that, dependent on the sign of numFix). > That is the standard way of doing rounding in fixed point code. + 1<<(zeros-1) is to do ceil(), what we want here is round(). WDYT
https://codereview.webrtc.org/2003623003/diff/140001/webrtc/modules/audio_pro... File webrtc/modules/audio_processing/agc/legacy/analog_agc.c (right): https://codereview.webrtc.org/2003623003/diff/140001/webrtc/modules/audio_pro... webrtc/modules/audio_processing/agc/legacy/analog_agc.c:479: const int kZeroThreshold = 500; On 2016/05/24 01:32:26, minyue-webrtc wrote: > On 2016/05/23 12:00:19, peah-webrtc wrote: > > On 2016/05/23 07:40:48, minyue-webrtc wrote: > > > On 2016/05/23 04:59:36, peah-webrtc wrote: > > > > This solution definitely works, but another variant would have been just > to > > > > replace > > > > int32_t tmp32 = 0; > > > > by > > > > int64_t tmp64 = 0; > > > > and then do the additions in 64 bit instead. > > > > > > > > that should probably also be faster than having the if-statement inside > the > > > > for-loop. > > > > > > > > Or is there a reason not to use 64 bit values here? > > > > > > Yes, I think early exiting the loop is more efficient, too. > > > > I think that this would be a clear case where a 64 bit value should be used. > It > > will lead to cleaner code and I doubt that the early return will save > anything, > > as this is a very short loop of fixed size, which should be ideal to pipeline > > (meaning that an early return won't save computations as it anyway breaks the > > pipeline). > > > > The only reason I can see for not using a 64 bit value for the sum is if it is > > significantly more complex. Do you know if that is the case? > > Loop in 481-488 is really to check if env[0..9] are generally zeros. In most > cases, this should be invalidated at index 0, it is nice that we return early. I > do not fully understand the pipeline argument. What I mean by pipelining is that as the loop is so predictable for the compiler that it should be extremely efficient to perform for the CPU and it utilizes the CPU pipeline in a really good manner. Possibly it could even be done using SIMD instructions. Since the loop limit is fixed, and as low as 10, I expect the compiler would probably do loop-unrolling on the loop, e.g, replace the loop by. tmp32 += env[0]; tmp32 += env[1]; ... tmp32 += env[9]; Therefore, this loop is extremely cheap and doing an early return is probably not worth it, and would potentially even increase the complexity. The compiler would possibly be able to do good pipelining of the early-return variant of the loop but it is at least not as straightforward to do that, and the break statement will anyway break the pipeline for the loop. Also, loop unrolling would be less nice, e.g., if (env[0] >= kZeroThreshold || tmp32 >= kZeroThreshold) { goto early_return_label; } tmp32 += env[0]; if (env[1] >= kZeroThreshold || tmp32 >= kZeroThreshold) { goto early_return_label; } tmp32 += env[1]; ... if (env[9] >= kZeroThreshold || tmp32 >= kZeroThreshold) { goto early_return_label; } tmp32 += env[9]; early_return_label: (The compiler would probably optimize the loop unrolling above to look very different though). If you expect the test to often be invalidated at index 0, I'd then instead change the code to be, int64_t tmp64 = env[0]; if (tmp64 < kZeroThreshold) { for (i = 1; i < 10; ++i) { tmp64 = env[i]; } } That would provide a really efficient early return, and would give a very nice optimize able loop. I don't think it is worth the extra code though, as this is an extremely cheap loop for the compiler to perform, and as these types of optimization only affect the average level of the complexity which means that it is mainly a way to save electrical power of the CPU (i.e., does not affect the algorithmic complexity). As I see it, the only reason for not going for 64 bit variables should be a high cost of the 64 bit additions. Do you know if that is the case? https://codereview.webrtc.org/2003623003/diff/180001/webrtc/modules/audio_pro... File webrtc/modules/audio_processing/agc/legacy/digital_agc.c (right): https://codereview.webrtc.org/2003623003/diff/180001/webrtc/modules/audio_pro... webrtc/modules/audio_processing/agc/legacy/digital_agc.c:202: y32 = numFIX / tmp32no1; // in Q14 On 2016/05/24 01:32:26, minyue-webrtc wrote: > On 2016/05/23 12:00:19, peah-webrtc wrote: > > I don't think this is the right approach, even though it may work. > > It is quite complicated and it also makes use of the % operator which I expect > > to be as complex as a division in fixed point code. > > > > The standard way of solving this is to normalize one step less, which will > give > > you headroom in the addition. I think you should do that, as it leads to much > > simpler code and is cheaper as well. > > > > Furthermore, I don't think the addition of > > tmp32no1 / 2 is really needed. It should be sufficient to add 1<< (zeros-1) > (or > > subtract that, dependent on the sign of numFix). > > That is the standard way of doing rounding in fixed point code. > > + 1<<(zeros-1) is to do ceil(), what we want here is round(). WDYT Sorry, I realize that I was wrong about that the addition of tmp32no1 / 2 not being needed. My comment about that was not correct at all. However, I still think that setting zeros to one less than it is done now is the correct solution, as that will should give headroom for the addition of tmp32nol/2. Changing the value of zeros to one less does not change the rounding to a ceil. It only affects the accuracy of the division. WDYT?
https://codereview.webrtc.org/2003623003/diff/140001/webrtc/modules/audio_pro... File webrtc/modules/audio_processing/agc/legacy/analog_agc.c (right): https://codereview.webrtc.org/2003623003/diff/140001/webrtc/modules/audio_pro... webrtc/modules/audio_processing/agc/legacy/analog_agc.c:479: const int kZeroThreshold = 500; On 2016/05/24 06:16:20, peah-webrtc wrote: > On 2016/05/24 01:32:26, minyue-webrtc wrote: > > On 2016/05/23 12:00:19, peah-webrtc wrote: > > > On 2016/05/23 07:40:48, minyue-webrtc wrote: > > > > On 2016/05/23 04:59:36, peah-webrtc wrote: > > > > > This solution definitely works, but another variant would have been just > > to > > > > > replace > > > > > int32_t tmp32 = 0; > > > > > by > > > > > int64_t tmp64 = 0; > > > > > and then do the additions in 64 bit instead. > > > > > > > > > > that should probably also be faster than having the if-statement inside > > the > > > > > for-loop. > > > > > > > > > > Or is there a reason not to use 64 bit values here? > > > > > > > > Yes, I think early exiting the loop is more efficient, too. > > > > > > I think that this would be a clear case where a 64 bit value should be used. > > It > > > will lead to cleaner code and I doubt that the early return will save > > anything, > > > as this is a very short loop of fixed size, which should be ideal to > pipeline > > > (meaning that an early return won't save computations as it anyway breaks > the > > > pipeline). > > > > > > The only reason I can see for not using a 64 bit value for the sum is if it > is > > > significantly more complex. Do you know if that is the case? > > > > Loop in 481-488 is really to check if env[0..9] are generally zeros. In most > > cases, this should be invalidated at index 0, it is nice that we return early. > I > > do not fully understand the pipeline argument. > > What I mean by pipelining is that as the loop is so predictable for the compiler > that it should be extremely efficient to perform for the CPU and it utilizes the > CPU pipeline in a really good manner. Possibly it could even be done using SIMD > instructions. Since the loop limit is fixed, and as low as 10, I expect the > compiler would probably do loop-unrolling on the loop, e.g, replace the loop by. > tmp32 += env[0]; > tmp32 += env[1]; > ... > tmp32 += env[9]; > Therefore, this loop is extremely cheap and doing an early return is probably > not worth it, and would potentially even increase the complexity. > > The compiler would possibly be able to do good pipelining of the early-return > variant of the loop but it is at least not as straightforward to do that, and > the break statement will anyway break the pipeline for the loop. Also, loop > unrolling would be less nice, e.g., > if (env[0] >= kZeroThreshold || tmp32 >= kZeroThreshold) { > goto early_return_label; > } > tmp32 += env[0]; > if (env[1] >= kZeroThreshold || tmp32 >= kZeroThreshold) { > goto early_return_label; > } > tmp32 += env[1]; > > ... > if (env[9] >= kZeroThreshold || tmp32 >= kZeroThreshold) { > goto early_return_label; > } > tmp32 += env[9]; > early_return_label: > > > (The compiler would probably optimize the loop unrolling above to look very > different though). > > If you expect the test to often be invalidated at index 0, I'd then instead > change the code to be, > int64_t tmp64 = env[0]; > if (tmp64 < kZeroThreshold) { > for (i = 1; i < 10; ++i) { > tmp64 = env[i]; > } > } > > That would provide a really efficient early return, and would give a very nice > optimize able loop. I don't think it is worth the extra code though, as this is > an extremely cheap loop for the compiler to perform, and as these types of > optimization only affect the average level of the complexity which means that it > is mainly a way to save electrical power of the CPU (i.e., does not affect the > algorithmic complexity). > > > As I see it, the only reason for not going for 64 bit variables should be a high > cost of the 64 bit additions. Do you know if that is the case? > > I can basically follow your suggestion, but I think it is a chance to learn something. Let me continue the discussion a bit. I think pipelining is useful if the lines in the loop can be parallized, this wiki page has good explanation. https://en.wikipedia.org/wiki/Software_pipelining For summation, parallization is not possible. https://codereview.webrtc.org/2003623003/diff/180001/webrtc/modules/audio_pro... File webrtc/modules/audio_processing/agc/legacy/digital_agc.c (right): https://codereview.webrtc.org/2003623003/diff/180001/webrtc/modules/audio_pro... webrtc/modules/audio_processing/agc/legacy/digital_agc.c:202: y32 = numFIX / tmp32no1; // in Q14 On 2016/05/24 06:16:20, peah-webrtc wrote: > On 2016/05/24 01:32:26, minyue-webrtc wrote: > > On 2016/05/23 12:00:19, peah-webrtc wrote: > > > I don't think this is the right approach, even though it may work. > > > It is quite complicated and it also makes use of the % operator which I > expect > > > to be as complex as a division in fixed point code. > > > > > > The standard way of solving this is to normalize one step less, which will > > give > > > you headroom in the addition. I think you should do that, as it leads to > much > > > simpler code and is cheaper as well. > > > > > > Furthermore, I don't think the addition of > > > tmp32no1 / 2 is really needed. It should be sufficient to add 1<< (zeros-1) > > (or > > > subtract that, dependent on the sign of numFix). > > > That is the standard way of doing rounding in fixed point code. > > > > + 1<<(zeros-1) is to do ceil(), what we want here is round(). WDYT > > Sorry, I realize that I was wrong about that the addition of tmp32no1 / 2 not > being needed. My comment about that was not correct at all. > > However, I still think that setting zeros to one less than it is done now is the > correct solution, as that will should give headroom for the addition of > tmp32nol/2. > > Changing the value of zeros to one less does not change the rounding to a ceil. > It only affects the accuracy of the division. > WDYT? I also realized that +1<<(zero-1) does not do ceil(). But that is not the core part of the discussion. What worth discuss is weather we want a precise round() or not. In my opinion, finding the nearest integer should be most reasonable. 1) it is well defined, 2) we do all shifting above to improve accuracy, and we should not be too relaxed here, 3) Computationally, it should not be much more complex. and if you think % will do another /, we may calculate remainder by numFIX - y32 * tmp32nol.
https://codereview.webrtc.org/2003623003/diff/140001/webrtc/modules/audio_pro... File webrtc/modules/audio_processing/agc/legacy/analog_agc.c (right): https://codereview.webrtc.org/2003623003/diff/140001/webrtc/modules/audio_pro... webrtc/modules/audio_processing/agc/legacy/analog_agc.c:479: const int kZeroThreshold = 500; On 2016/05/24 07:16:12, minyue-webrtc wrote: > On 2016/05/24 06:16:20, peah-webrtc wrote: > > On 2016/05/24 01:32:26, minyue-webrtc wrote: > > > On 2016/05/23 12:00:19, peah-webrtc wrote: > > > > On 2016/05/23 07:40:48, minyue-webrtc wrote: > > > > > On 2016/05/23 04:59:36, peah-webrtc wrote: > > > > > > This solution definitely works, but another variant would have been > just > > > to > > > > > > replace > > > > > > int32_t tmp32 = 0; > > > > > > by > > > > > > int64_t tmp64 = 0; > > > > > > and then do the additions in 64 bit instead. > > > > > > > > > > > > that should probably also be faster than having the if-statement > inside > > > the > > > > > > for-loop. > > > > > > > > > > > > Or is there a reason not to use 64 bit values here? > > > > > > > > > > Yes, I think early exiting the loop is more efficient, too. > > > > > > > > I think that this would be a clear case where a 64 bit value should be > used. > > > It > > > > will lead to cleaner code and I doubt that the early return will save > > > anything, > > > > as this is a very short loop of fixed size, which should be ideal to > > pipeline > > > > (meaning that an early return won't save computations as it anyway breaks > > the > > > > pipeline). > > > > > > > > The only reason I can see for not using a 64 bit value for the sum is if > it > > is > > > > significantly more complex. Do you know if that is the case? > > > > > > Loop in 481-488 is really to check if env[0..9] are generally zeros. In most > > > cases, this should be invalidated at index 0, it is nice that we return > early. > > I > > > do not fully understand the pipeline argument. > > > > What I mean by pipelining is that as the loop is so predictable for the > compiler > > that it should be extremely efficient to perform for the CPU and it utilizes > the > > CPU pipeline in a really good manner. Possibly it could even be done using > SIMD > > instructions. Since the loop limit is fixed, and as low as 10, I expect the > > compiler would probably do loop-unrolling on the loop, e.g, replace the loop > by. > > tmp32 += env[0]; > > tmp32 += env[1]; > > ... > > tmp32 += env[9]; > > Therefore, this loop is extremely cheap and doing an early return is probably > > not worth it, and would potentially even increase the complexity. > > > > The compiler would possibly be able to do good pipelining of the early-return > > variant of the loop but it is at least not as straightforward to do that, and > > the break statement will anyway break the pipeline for the loop. Also, loop > > unrolling would be less nice, e.g., > > if (env[0] >= kZeroThreshold || tmp32 >= kZeroThreshold) { > > goto early_return_label; > > } > > tmp32 += env[0]; > > if (env[1] >= kZeroThreshold || tmp32 >= kZeroThreshold) { > > goto early_return_label; > > } > > tmp32 += env[1]; > > > > ... > > if (env[9] >= kZeroThreshold || tmp32 >= kZeroThreshold) { > > goto early_return_label; > > } > > tmp32 += env[9]; > > early_return_label: > > > > > > (The compiler would probably optimize the loop unrolling above to look very > > different though). > > > > If you expect the test to often be invalidated at index 0, I'd then instead > > change the code to be, > > int64_t tmp64 = env[0]; > > if (tmp64 < kZeroThreshold) { > > for (i = 1; i < 10; ++i) { > > tmp64 = env[i]; > > } > > } > > > > That would provide a really efficient early return, and would give a very nice > > optimize able loop. I don't think it is worth the extra code though, as this > is > > an extremely cheap loop for the compiler to perform, and as these types of > > optimization only affect the average level of the complexity which means that > it > > is mainly a way to save electrical power of the CPU (i.e., does not affect the > > algorithmic complexity). > > > > > > As I see it, the only reason for not going for 64 bit variables should be a > high > > cost of the 64 bit additions. Do you know if that is the case? > > > > > I can basically follow your suggestion, but I think it is a chance to learn > something. Let me continue the discussion a bit. > > I think pipelining is useful if the lines in the loop can be parallized, this > wiki page has good explanation. > > https://en.wikipedia.org/wiki/Software_pipelining > > For summation, parallization is not possible. I don't know this as well as I should do, but I meant primarily hardware pipelining https://en.wikipedia.org/wiki/Instruction_pipelining For summation, the lines in the loop can definitely be parallellized in hardware (probably in this case via loop-unrolling though. E.g. sum = A+B+C+D can be parallelized as E=A+B F=C+D sum=E+F where E and F can be computed in parallel independently of each other. https://codereview.webrtc.org/2003623003/diff/180001/webrtc/modules/audio_pro... File webrtc/modules/audio_processing/agc/legacy/digital_agc.c (right): https://codereview.webrtc.org/2003623003/diff/180001/webrtc/modules/audio_pro... webrtc/modules/audio_processing/agc/legacy/digital_agc.c:202: y32 = numFIX / tmp32no1; // in Q14 On 2016/05/24 07:16:12, minyue-webrtc wrote: > On 2016/05/24 06:16:20, peah-webrtc wrote: > > On 2016/05/24 01:32:26, minyue-webrtc wrote: > > > On 2016/05/23 12:00:19, peah-webrtc wrote: > > > > I don't think this is the right approach, even though it may work. > > > > It is quite complicated and it also makes use of the % operator which I > > expect > > > > to be as complex as a division in fixed point code. > > > > > > > > The standard way of solving this is to normalize one step less, which will > > > give > > > > you headroom in the addition. I think you should do that, as it leads to > > much > > > > simpler code and is cheaper as well. > > > > > > > > Furthermore, I don't think the addition of > > > > tmp32no1 / 2 is really needed. It should be sufficient to add 1<< > (zeros-1) > > > (or > > > > subtract that, dependent on the sign of numFix). > > > > That is the standard way of doing rounding in fixed point code. > > > > > > + 1<<(zeros-1) is to do ceil(), what we want here is round(). WDYT > > > > Sorry, I realize that I was wrong about that the addition of tmp32no1 / 2 not > > being needed. My comment about that was not correct at all. > > > > However, I still think that setting zeros to one less than it is done now is > the > > correct solution, as that will should give headroom for the addition of > > tmp32nol/2. > > > > Changing the value of zeros to one less does not change the rounding to a > ceil. > > It only affects the accuracy of the division. > > WDYT? > > I also realized that +1<<(zero-1) does not do ceil(). But that is not the core > part of the discussion. What worth discuss is weather we want a precise round() > or not. In my opinion, finding the nearest integer should be most reasonable. 1) > it is well defined, 2) we do all shifting above to improve accuracy, and we > should not be too relaxed here, 3) Computationally, it should not be much more > complex. and if you think % will do another /, we may calculate remainder by > numFIX - y32 * tmp32nol. I strongly doubt that we really need a rounding division here as this is a gaintable computation. Do you know what negative impact a truncating division would have? The drawback of doing a zeros-1 normalization instead of a zeros normalization is that you lose 1 bit accuracy in the 32 bit divisions. I would say that is acceptable, in particular as the rest of the computations are in a fixed Q value. Furthermore, looking at your code for this, I'm actually not sure it is sufficient to avoid overflow. How can you guarantee that y32++ does not overflow? (I could not come up with a good example for that though.)
thanks for the explanation! see my comments and I'd like to see your preference and I will go for it. https://codereview.webrtc.org/2003623003/diff/140001/webrtc/modules/audio_pro... File webrtc/modules/audio_processing/agc/legacy/analog_agc.c (right): https://codereview.webrtc.org/2003623003/diff/140001/webrtc/modules/audio_pro... webrtc/modules/audio_processing/agc/legacy/analog_agc.c:479: const int kZeroThreshold = 500; On 2016/05/24 10:46:29, peah-webrtc wrote: > On 2016/05/24 07:16:12, minyue-webrtc wrote: > > On 2016/05/24 06:16:20, peah-webrtc wrote: > > > On 2016/05/24 01:32:26, minyue-webrtc wrote: > > > > On 2016/05/23 12:00:19, peah-webrtc wrote: > > > > > On 2016/05/23 07:40:48, minyue-webrtc wrote: > > > > > > On 2016/05/23 04:59:36, peah-webrtc wrote: > > > > > > > This solution definitely works, but another variant would have been > > just > > > > to > > > > > > > replace > > > > > > > int32_t tmp32 = 0; > > > > > > > by > > > > > > > int64_t tmp64 = 0; > > > > > > > and then do the additions in 64 bit instead. > > > > > > > > > > > > > > that should probably also be faster than having the if-statement > > inside > > > > the > > > > > > > for-loop. > > > > > > > > > > > > > > Or is there a reason not to use 64 bit values here? > > > > > > > > > > > > Yes, I think early exiting the loop is more efficient, too. > > > > > > > > > > I think that this would be a clear case where a 64 bit value should be > > used. > > > > It > > > > > will lead to cleaner code and I doubt that the early return will save > > > > anything, > > > > > as this is a very short loop of fixed size, which should be ideal to > > > pipeline > > > > > (meaning that an early return won't save computations as it anyway > breaks > > > the > > > > > pipeline). > > > > > > > > > > The only reason I can see for not using a 64 bit value for the sum is if > > it > > > is > > > > > significantly more complex. Do you know if that is the case? > > > > > > > > Loop in 481-488 is really to check if env[0..9] are generally zeros. In > most > > > > cases, this should be invalidated at index 0, it is nice that we return > > early. > > > I > > > > do not fully understand the pipeline argument. > > > > > > What I mean by pipelining is that as the loop is so predictable for the > > compiler > > > that it should be extremely efficient to perform for the CPU and it utilizes > > the > > > CPU pipeline in a really good manner. Possibly it could even be done using > > SIMD > > > instructions. Since the loop limit is fixed, and as low as 10, I expect the > > > compiler would probably do loop-unrolling on the loop, e.g, replace the loop > > by. > > > tmp32 += env[0]; > > > tmp32 += env[1]; > > > ... > > > tmp32 += env[9]; > > > Therefore, this loop is extremely cheap and doing an early return is > probably > > > not worth it, and would potentially even increase the complexity. > > > > > > The compiler would possibly be able to do good pipelining of the > early-return > > > variant of the loop but it is at least not as straightforward to do that, > and > > > the break statement will anyway break the pipeline for the loop. Also, loop > > > unrolling would be less nice, e.g., > > > if (env[0] >= kZeroThreshold || tmp32 >= kZeroThreshold) { > > > goto early_return_label; > > > } > > > tmp32 += env[0]; > > > if (env[1] >= kZeroThreshold || tmp32 >= kZeroThreshold) { > > > goto early_return_label; > > > } > > > tmp32 += env[1]; > > > > > > ... > > > if (env[9] >= kZeroThreshold || tmp32 >= kZeroThreshold) { > > > goto early_return_label; > > > } > > > tmp32 += env[9]; > > > early_return_label: > > > > > > > > > (The compiler would probably optimize the loop unrolling above to look very > > > different though). > > > > > > If you expect the test to often be invalidated at index 0, I'd then instead > > > change the code to be, > > > int64_t tmp64 = env[0]; > > > if (tmp64 < kZeroThreshold) { > > > for (i = 1; i < 10; ++i) { > > > tmp64 = env[i]; > > > } > > > } > > > > > > That would provide a really efficient early return, and would give a very > nice > > > optimize able loop. I don't think it is worth the extra code though, as this > > is > > > an extremely cheap loop for the compiler to perform, and as these types of > > > optimization only affect the average level of the complexity which means > that > > it > > > is mainly a way to save electrical power of the CPU (i.e., does not affect > the > > > algorithmic complexity). > > > > > > > > > As I see it, the only reason for not going for 64 bit variables should be a > > high > > > cost of the 64 bit additions. Do you know if that is the case? > > > > > > > > I can basically follow your suggestion, but I think it is a chance to learn > > something. Let me continue the discussion a bit. > > > > I think pipelining is useful if the lines in the loop can be parallized, this > > wiki page has good explanation. > > > > https://en.wikipedia.org/wiki/Software_pipelining > > > > For summation, parallization is not possible. > > I don't know this as well as I should do, but I meant primarily hardware > pipelining https://en.wikipedia.org/wiki/Instruction_pipelining > > For summation, the lines in the loop can definitely be parallellized in hardware > (probably in this case via loop-unrolling though. > > E.g. > sum = A+B+C+D can be parallelized as > E=A+B > F=C+D > sum=E+F > where E and F can be computed in parallel independently of each other. > > ok, I can do int64 sum. https://codereview.webrtc.org/2003623003/diff/180001/webrtc/modules/audio_pro... File webrtc/modules/audio_processing/agc/legacy/digital_agc.c (right): https://codereview.webrtc.org/2003623003/diff/180001/webrtc/modules/audio_pro... webrtc/modules/audio_processing/agc/legacy/digital_agc.c:202: y32 = numFIX / tmp32no1; // in Q14 On 2016/05/24 10:46:29, peah-webrtc wrote: > On 2016/05/24 07:16:12, minyue-webrtc wrote: > > On 2016/05/24 06:16:20, peah-webrtc wrote: > > > On 2016/05/24 01:32:26, minyue-webrtc wrote: > > > > On 2016/05/23 12:00:19, peah-webrtc wrote: > > > > > I don't think this is the right approach, even though it may work. > > > > > It is quite complicated and it also makes use of the % operator which I > > > expect > > > > > to be as complex as a division in fixed point code. > > > > > > > > > > The standard way of solving this is to normalize one step less, which > will > > > > give > > > > > you headroom in the addition. I think you should do that, as it leads to > > > much > > > > > simpler code and is cheaper as well. > > > > > > > > > > Furthermore, I don't think the addition of > > > > > tmp32no1 / 2 is really needed. It should be sufficient to add 1<< > > (zeros-1) > > > > (or > > > > > subtract that, dependent on the sign of numFix). > > > > > That is the standard way of doing rounding in fixed point code. > > > > > > > > + 1<<(zeros-1) is to do ceil(), what we want here is round(). WDYT > > > > > > Sorry, I realize that I was wrong about that the addition of tmp32no1 / 2 > not > > > being needed. My comment about that was not correct at all. > > > > > > However, I still think that setting zeros to one less than it is done now is > > the > > > correct solution, as that will should give headroom for the addition of > > > tmp32nol/2. > > > > > > Changing the value of zeros to one less does not change the rounding to a > > ceil. > > > It only affects the accuracy of the division. > > > WDYT? > > > > I also realized that +1<<(zero-1) does not do ceil(). But that is not the core > > part of the discussion. What worth discuss is weather we want a precise > round() > > or not. In my opinion, finding the nearest integer should be most reasonable. > 1) > > it is well defined, 2) we do all shifting above to improve accuracy, and we > > should not be too relaxed here, 3) Computationally, it should not be much more > > complex. and if you think % will do another /, we may calculate remainder by > > numFIX - y32 * tmp32nol. > > I strongly doubt that we really need a rounding division here as this is a > gaintable computation. Do you know what negative impact a truncating division > would have? > > The drawback of doing a zeros-1 normalization instead of a zeros normalization > is that you lose 1 bit accuracy in the 32 bit divisions. I would say that is > acceptable, in particular as the rest of the computations are in a fixed Q > value. > > Furthermore, looking at your code for this, I'm actually not sure it is > sufficient to avoid overflow. How can you guarantee that y32++ does not > overflow? (I could not come up with a good example for that though.) > 1) y32++ and -- is surely safe, since as long as |den| > 1, the ratio will be able to add/minus 1. 2) Regarding shifting zero -1 bits, I have thought about it. It is not totally safe. what numFIX is already large (zero = 0)? 3) Not adding tmp32nol / 2 is good, but we loose the precision of nearest integer. If you like, I can go for 3).
https://codereview.webrtc.org/2003623003/diff/180001/webrtc/modules/audio_pro... File webrtc/modules/audio_processing/agc/legacy/digital_agc.c (right): https://codereview.webrtc.org/2003623003/diff/180001/webrtc/modules/audio_pro... webrtc/modules/audio_processing/agc/legacy/digital_agc.c:202: y32 = numFIX / tmp32no1; // in Q14 On 2016/05/24 13:05:45, minyue-webrtc wrote: > On 2016/05/24 10:46:29, peah-webrtc wrote: > > On 2016/05/24 07:16:12, minyue-webrtc wrote: > > > On 2016/05/24 06:16:20, peah-webrtc wrote: > > > > On 2016/05/24 01:32:26, minyue-webrtc wrote: > > > > > On 2016/05/23 12:00:19, peah-webrtc wrote: > > > > > > I don't think this is the right approach, even though it may work. > > > > > > It is quite complicated and it also makes use of the % operator which > I > > > > expect > > > > > > to be as complex as a division in fixed point code. > > > > > > > > > > > > The standard way of solving this is to normalize one step less, which > > will > > > > > give > > > > > > you headroom in the addition. I think you should do that, as it leads > to > > > > much > > > > > > simpler code and is cheaper as well. > > > > > > > > > > > > Furthermore, I don't think the addition of > > > > > > tmp32no1 / 2 is really needed. It should be sufficient to add 1<< > > > (zeros-1) > > > > > (or > > > > > > subtract that, dependent on the sign of numFix). > > > > > > That is the standard way of doing rounding in fixed point code. > > > > > > > > > > + 1<<(zeros-1) is to do ceil(), what we want here is round(). WDYT > > > > > > > > Sorry, I realize that I was wrong about that the addition of tmp32no1 / 2 > > not > > > > being needed. My comment about that was not correct at all. > > > > > > > > However, I still think that setting zeros to one less than it is done now > is > > > the > > > > correct solution, as that will should give headroom for the addition of > > > > tmp32nol/2. > > > > > > > > Changing the value of zeros to one less does not change the rounding to a > > > ceil. > > > > It only affects the accuracy of the division. > > > > WDYT? > > > > > > I also realized that +1<<(zero-1) does not do ceil(). But that is not the > core > > > part of the discussion. What worth discuss is weather we want a precise > > round() > > > or not. In my opinion, finding the nearest integer should be most > reasonable. > > 1) > > > it is well defined, 2) we do all shifting above to improve accuracy, and we > > > should not be too relaxed here, 3) Computationally, it should not be much > more > > > complex. and if you think % will do another /, we may calculate remainder by > > > numFIX - y32 * tmp32nol. > > > > I strongly doubt that we really need a rounding division here as this is a > > gaintable computation. Do you know what negative impact a truncating division > > would have? > > > > The drawback of doing a zeros-1 normalization instead of a zeros normalization > > is that you lose 1 bit accuracy in the 32 bit divisions. I would say that is > > acceptable, in particular as the rest of the computations are in a fixed Q > > value. > > > > Furthermore, looking at your code for this, I'm actually not sure it is > > sufficient to avoid overflow. How can you guarantee that y32++ does not > > overflow? (I could not come up with a good example for that though.) > > > > 1) y32++ and -- is surely safe, since as long as |den| > 1, the ratio will be > able to add/minus 1. > > 2) Regarding shifting zero -1 bits, I have thought about it. It is not totally > safe. what numFIX is already large (zero = 0)? > > 3) Not adding tmp32nol / 2 is good, but we loose the precision of nearest > integer. > > If you like, I can go for 3). Looking at it again, I see that the division is wrongly done. It should be zeros = WebRtcSpl_NormW32(numFIX); numFIX *= 1 << zeros; // Q(14+zeros) int sft = WebRtcSpl_NormW32(den) - 16; tmp32no1 = WEBRTC_SPL_SHIFT_W32(den, -sft); // Q(8-sft) (ensures 16 bit precision of the division) y32 = numFIX / tmp32no1; // Q(14+zeros-8+sft) sft = 8 - (14+zeros-8+sft); y32 += (1<<(sft-1)); // Adding 0.5 Q8 to ensure proper rounding y32 = WEBRTC_SPL_SHIFT_W32(y32, -sft); // Q(8) (rounded) Please doublecheck the above though.
https://codereview.webrtc.org/2003623003/diff/180001/webrtc/modules/audio_pro... File webrtc/modules/audio_processing/agc/legacy/digital_agc.c (right): https://codereview.webrtc.org/2003623003/diff/180001/webrtc/modules/audio_pro... webrtc/modules/audio_processing/agc/legacy/digital_agc.c:202: y32 = numFIX / tmp32no1; // in Q14 On 2016/05/24 13:37:10, peah-webrtc wrote: > On 2016/05/24 13:05:45, minyue-webrtc wrote: > > On 2016/05/24 10:46:29, peah-webrtc wrote: > > > On 2016/05/24 07:16:12, minyue-webrtc wrote: > > > > On 2016/05/24 06:16:20, peah-webrtc wrote: > > > > > On 2016/05/24 01:32:26, minyue-webrtc wrote: > > > > > > On 2016/05/23 12:00:19, peah-webrtc wrote: > > > > > > > I don't think this is the right approach, even though it may work. > > > > > > > It is quite complicated and it also makes use of the % operator > which > > I > > > > > expect > > > > > > > to be as complex as a division in fixed point code. > > > > > > > > > > > > > > The standard way of solving this is to normalize one step less, > which > > > will > > > > > > give > > > > > > > you headroom in the addition. I think you should do that, as it > leads > > to > > > > > much > > > > > > > simpler code and is cheaper as well. > > > > > > > > > > > > > > Furthermore, I don't think the addition of > > > > > > > tmp32no1 / 2 is really needed. It should be sufficient to add 1<< > > > > (zeros-1) > > > > > > (or > > > > > > > subtract that, dependent on the sign of numFix). > > > > > > > That is the standard way of doing rounding in fixed point code. > > > > > > > > > > > > + 1<<(zeros-1) is to do ceil(), what we want here is round(). WDYT > > > > > > > > > > Sorry, I realize that I was wrong about that the addition of tmp32no1 / > 2 > > > not > > > > > being needed. My comment about that was not correct at all. > > > > > > > > > > However, I still think that setting zeros to one less than it is done > now > > is > > > > the > > > > > correct solution, as that will should give headroom for the addition of > > > > > tmp32nol/2. > > > > > > > > > > Changing the value of zeros to one less does not change the rounding to > a > > > > ceil. > > > > > It only affects the accuracy of the division. > > > > > WDYT? > > > > > > > > I also realized that +1<<(zero-1) does not do ceil(). But that is not the > > core > > > > part of the discussion. What worth discuss is weather we want a precise > > > round() > > > > or not. In my opinion, finding the nearest integer should be most > > reasonable. > > > 1) > > > > it is well defined, 2) we do all shifting above to improve accuracy, and > we > > > > should not be too relaxed here, 3) Computationally, it should not be much > > more > > > > complex. and if you think % will do another /, we may calculate remainder > by > > > > numFIX - y32 * tmp32nol. > > > > > > I strongly doubt that we really need a rounding division here as this is a > > > gaintable computation. Do you know what negative impact a truncating > division > > > would have? > > > > > > The drawback of doing a zeros-1 normalization instead of a zeros > normalization > > > is that you lose 1 bit accuracy in the 32 bit divisions. I would say that is > > > acceptable, in particular as the rest of the computations are in a fixed Q > > > value. > > > > > > Furthermore, looking at your code for this, I'm actually not sure it is > > > sufficient to avoid overflow. How can you guarantee that y32++ does not > > > overflow? (I could not come up with a good example for that though.) > > > > > > > 1) y32++ and -- is surely safe, since as long as |den| > 1, the ratio will be > > able to add/minus 1. > > > > 2) Regarding shifting zero -1 bits, I have thought about it. It is not totally > > safe. what numFIX is already large (zero = 0)? > > > > 3) Not adding tmp32nol / 2 is good, but we loose the precision of nearest > > integer. > > > > If you like, I can go for 3). > > Looking at it again, I see that the division is wrongly done. It should be > zeros = WebRtcSpl_NormW32(numFIX); > numFIX *= 1 << zeros; // Q(14+zeros) > int sft = WebRtcSpl_NormW32(den) - 16; > tmp32no1 = WEBRTC_SPL_SHIFT_W32(den, -sft); // Q(8-sft) (ensures 16 bit > precision of the division) > y32 = numFIX / tmp32no1; // Q(14+zeros-8+sft) > sft = 8 - (14+zeros-8+sft); > y32 += (1<<(sft-1)); // Adding 0.5 Q8 to ensure proper rounding > y32 = WEBRTC_SPL_SHIFT_W32(y32, -sft); // Q(8) (rounded) > > Please doublecheck the above though. > not really. y32 is supposed to be Q14, and the rounding by "+tmp32no1 / 2" (before my change) is only to +/-2^(-14), rather than +/-2^(-8). So the division was meant to be very precise. What you propose makes y32 Q8 rounded, that is different from the original code.
https://codereview.webrtc.org/2003623003/diff/180001/webrtc/modules/audio_pro... File webrtc/modules/audio_processing/agc/legacy/digital_agc.c (right): https://codereview.webrtc.org/2003623003/diff/180001/webrtc/modules/audio_pro... webrtc/modules/audio_processing/agc/legacy/digital_agc.c:202: y32 = numFIX / tmp32no1; // in Q14 On 2016/05/26 02:28:20, minyue-webrtc wrote: > On 2016/05/24 13:37:10, peah-webrtc wrote: > > On 2016/05/24 13:05:45, minyue-webrtc wrote: > > > On 2016/05/24 10:46:29, peah-webrtc wrote: > > > > On 2016/05/24 07:16:12, minyue-webrtc wrote: > > > > > On 2016/05/24 06:16:20, peah-webrtc wrote: > > > > > > On 2016/05/24 01:32:26, minyue-webrtc wrote: > > > > > > > On 2016/05/23 12:00:19, peah-webrtc wrote: > > > > > > > > I don't think this is the right approach, even though it may work. > > > > > > > > It is quite complicated and it also makes use of the % operator > > which > > > I > > > > > > expect > > > > > > > > to be as complex as a division in fixed point code. > > > > > > > > > > > > > > > > The standard way of solving this is to normalize one step less, > > which > > > > will > > > > > > > give > > > > > > > > you headroom in the addition. I think you should do that, as it > > leads > > > to > > > > > > much > > > > > > > > simpler code and is cheaper as well. > > > > > > > > > > > > > > > > Furthermore, I don't think the addition of > > > > > > > > tmp32no1 / 2 is really needed. It should be sufficient to add 1<< > > > > > (zeros-1) > > > > > > > (or > > > > > > > > subtract that, dependent on the sign of numFix). > > > > > > > > That is the standard way of doing rounding in fixed point code. > > > > > > > > > > > > > > + 1<<(zeros-1) is to do ceil(), what we want here is round(). WDYT > > > > > > > > > > > > Sorry, I realize that I was wrong about that the addition of tmp32no1 > / > > 2 > > > > not > > > > > > being needed. My comment about that was not correct at all. > > > > > > > > > > > > However, I still think that setting zeros to one less than it is done > > now > > > is > > > > > the > > > > > > correct solution, as that will should give headroom for the addition > of > > > > > > tmp32nol/2. > > > > > > > > > > > > Changing the value of zeros to one less does not change the rounding > to > > a > > > > > ceil. > > > > > > It only affects the accuracy of the division. > > > > > > WDYT? > > > > > > > > > > I also realized that +1<<(zero-1) does not do ceil(). But that is not > the > > > core > > > > > part of the discussion. What worth discuss is weather we want a precise > > > > round() > > > > > or not. In my opinion, finding the nearest integer should be most > > > reasonable. > > > > 1) > > > > > it is well defined, 2) we do all shifting above to improve accuracy, and > > we > > > > > should not be too relaxed here, 3) Computationally, it should not be > much > > > more > > > > > complex. and if you think % will do another /, we may calculate > remainder > > by > > > > > numFIX - y32 * tmp32nol. > > > > > > > > I strongly doubt that we really need a rounding division here as this is a > > > > gaintable computation. Do you know what negative impact a truncating > > division > > > > would have? > > > > > > > > The drawback of doing a zeros-1 normalization instead of a zeros > > normalization > > > > is that you lose 1 bit accuracy in the 32 bit divisions. I would say that > is > > > > acceptable, in particular as the rest of the computations are in a fixed Q > > > > value. > > > > > > > > Furthermore, looking at your code for this, I'm actually not sure it is > > > > sufficient to avoid overflow. How can you guarantee that y32++ does not > > > > overflow? (I could not come up with a good example for that though.) > > > > > > > > > > 1) y32++ and -- is surely safe, since as long as |den| > 1, the ratio will > be > > > able to add/minus 1. > > > > > > 2) Regarding shifting zero -1 bits, I have thought about it. It is not > totally > > > safe. what numFIX is already large (zero = 0)? > > > > > > 3) Not adding tmp32nol / 2 is good, but we loose the precision of nearest > > > integer. > > > > > > If you like, I can go for 3). > > > > Looking at it again, I see that the division is wrongly done. It should be > > zeros = WebRtcSpl_NormW32(numFIX); > > numFIX *= 1 << zeros; // Q(14+zeros) > > int sft = WebRtcSpl_NormW32(den) - 16; > > tmp32no1 = WEBRTC_SPL_SHIFT_W32(den, -sft); // Q(8-sft) (ensures 16 bit > > precision of the division) > > y32 = numFIX / tmp32no1; // Q(14+zeros-8+sft) > > sft = 8 - (14+zeros-8+sft); > > y32 += (1<<(sft-1)); // Adding 0.5 Q8 to ensure proper rounding > > y32 = WEBRTC_SPL_SHIFT_W32(y32, -sft); // Q(8) (rounded) > > > > Please doublecheck the above though. > > > > not really. y32 is supposed to be Q14, and the rounding by "+tmp32no1 / 2" > (before my change) is only to +/-2^(-14), rather than +/-2^(-8). So the division > was meant to be very precise. What you propose makes y32 Q8 rounded, that is > different from the original code. Good find, sorry about that. I also realized that I did not add 1 bit of headroom for the rounding addition With this, I think that the below should be the correct code. Note, however, that furthermore, adding 0.5 Q14 only is valid for positive numbers so the code example needs to be modified accordingly. zeros = WebRtcSpl_NormW32(abs(numFIX)) -1; numFIX = abs(numFix) * (1 << zeros); // Q(14+zeros) int sft = WebRtcSpl_NormW32(den) - 16; tmp32no1 = WEBRTC_SPL_SHIFT_W32(den, -sft); // Q(8-sft) (ensures 16 bit precision of the division) y32 = numFIX / tmp32no1; // Q(14+zeros-8+sft) sft = 14 - (14+zeros-8+sft); y32 += ((numFix > 0) - (numFix < 0)) * (1<<(sft-1)); // Adding 0.5 Q14 to ensure proper rounding y32 = WEBRTC_SPL_SHIFT_W32(y32, -sft); // Q(14) (rounded) My point with the above is, however, that adding tmp32no1 / 2 is not necessary for the rounding as the rounding is based on the bit that is shifted out when placing y32 into Q14. To illustrate 5 (Q0) is 3 (Q1) if rounded 2 (Q1) if not rounded For this the rounding operation would be 101 (Q0) => (101 + 001) >> 1 (Q1) = 11 (Q1) and the non-rounding operation would be 101 (Q0) => 101 >> 1 (Q1) = 10 (Q1) If you really want to avoid the extra bit of headroom that avoids overflow when doing the rounding addition you could check the value of numFIX = abs(numFix) * (1 << zeros); and if that is so large that the addition will overflow then re-do the shift with zeros-1. I do, however, not think that is worthwhile. The above described scheme is the standard fixed-point way of doing this. If you want to do a simple fix, I would go for zeros-1 shifts. It is, as you point out, not perfect either if numFix is already normalized (zeros=0), but you will then anyway have potential problems with cases when den < 2^8 as then zeros==0 will cause den to be shifted to zero which will cause a division by zero. I think this division have several implicit assumptions about the magnitude of the variables anyway and those needs to be taken into account when designing the division.
https://codereview.webrtc.org/2003623003/diff/180001/webrtc/modules/audio_pro... File webrtc/modules/audio_processing/agc/legacy/digital_agc.c (right): https://codereview.webrtc.org/2003623003/diff/180001/webrtc/modules/audio_pro... webrtc/modules/audio_processing/agc/legacy/digital_agc.c:202: y32 = numFIX / tmp32no1; // in Q14 On 2016/05/26 07:48:38, peah-webrtc wrote: > On 2016/05/26 02:28:20, minyue-webrtc wrote: > > On 2016/05/24 13:37:10, peah-webrtc wrote: > > > On 2016/05/24 13:05:45, minyue-webrtc wrote: > > > > On 2016/05/24 10:46:29, peah-webrtc wrote: > > > > > On 2016/05/24 07:16:12, minyue-webrtc wrote: > > > > > > On 2016/05/24 06:16:20, peah-webrtc wrote: > > > > > > > On 2016/05/24 01:32:26, minyue-webrtc wrote: > > > > > > > > On 2016/05/23 12:00:19, peah-webrtc wrote: > > > > > > > > > I don't think this is the right approach, even though it may > work. > > > > > > > > > It is quite complicated and it also makes use of the % operator > > > which > > > > I > > > > > > > expect > > > > > > > > > to be as complex as a division in fixed point code. > > > > > > > > > > > > > > > > > > The standard way of solving this is to normalize one step less, > > > which > > > > > will > > > > > > > > give > > > > > > > > > you headroom in the addition. I think you should do that, as it > > > leads > > > > to > > > > > > > much > > > > > > > > > simpler code and is cheaper as well. > > > > > > > > > > > > > > > > > > Furthermore, I don't think the addition of > > > > > > > > > tmp32no1 / 2 is really needed. It should be sufficient to add > 1<< > > > > > > (zeros-1) > > > > > > > > (or > > > > > > > > > subtract that, dependent on the sign of numFix). > > > > > > > > > That is the standard way of doing rounding in fixed point code. > > > > > > > > > > > > > > > > + 1<<(zeros-1) is to do ceil(), what we want here is round(). WDYT > > > > > > > > > > > > > > Sorry, I realize that I was wrong about that the addition of > tmp32no1 > > / > > > 2 > > > > > not > > > > > > > being needed. My comment about that was not correct at all. > > > > > > > > > > > > > > However, I still think that setting zeros to one less than it is > done > > > now > > > > is > > > > > > the > > > > > > > correct solution, as that will should give headroom for the addition > > of > > > > > > > tmp32nol/2. > > > > > > > > > > > > > > Changing the value of zeros to one less does not change the rounding > > to > > > a > > > > > > ceil. > > > > > > > It only affects the accuracy of the division. > > > > > > > WDYT? > > > > > > > > > > > > I also realized that +1<<(zero-1) does not do ceil(). But that is not > > the > > > > core > > > > > > part of the discussion. What worth discuss is weather we want a > precise > > > > > round() > > > > > > or not. In my opinion, finding the nearest integer should be most > > > > reasonable. > > > > > 1) > > > > > > it is well defined, 2) we do all shifting above to improve accuracy, > and > > > we > > > > > > should not be too relaxed here, 3) Computationally, it should not be > > much > > > > more > > > > > > complex. and if you think % will do another /, we may calculate > > remainder > > > by > > > > > > numFIX - y32 * tmp32nol. > > > > > > > > > > I strongly doubt that we really need a rounding division here as this is > a > > > > > gaintable computation. Do you know what negative impact a truncating > > > division > > > > > would have? > > > > > > > > > > The drawback of doing a zeros-1 normalization instead of a zeros > > > normalization > > > > > is that you lose 1 bit accuracy in the 32 bit divisions. I would say > that > > is > > > > > acceptable, in particular as the rest of the computations are in a fixed > Q > > > > > value. > > > > > > > > > > Furthermore, looking at your code for this, I'm actually not sure it is > > > > > sufficient to avoid overflow. How can you guarantee that y32++ does not > > > > > overflow? (I could not come up with a good example for that though.) > > > > > > > > > > > > > 1) y32++ and -- is surely safe, since as long as |den| > 1, the ratio > will > > be > > > > able to add/minus 1. > > > > > > > > 2) Regarding shifting zero -1 bits, I have thought about it. It is not > > totally > > > > safe. what numFIX is already large (zero = 0)? > > > > > > > > 3) Not adding tmp32nol / 2 is good, but we loose the precision of nearest > > > > integer. > > > > > > > > If you like, I can go for 3). > > > > > > Looking at it again, I see that the division is wrongly done. It should be > > > zeros = WebRtcSpl_NormW32(numFIX); > > > numFIX *= 1 << zeros; // Q(14+zeros) > > > int sft = WebRtcSpl_NormW32(den) - 16; > > > tmp32no1 = WEBRTC_SPL_SHIFT_W32(den, -sft); // Q(8-sft) (ensures 16 bit > > > precision of the division) > > > y32 = numFIX / tmp32no1; // Q(14+zeros-8+sft) > > > sft = 8 - (14+zeros-8+sft); > > > y32 += (1<<(sft-1)); // Adding 0.5 Q8 to ensure proper rounding > > > y32 = WEBRTC_SPL_SHIFT_W32(y32, -sft); // Q(8) (rounded) > > > > > > Please doublecheck the above though. > > > > > > > not really. y32 is supposed to be Q14, and the rounding by "+tmp32no1 / 2" > > (before my change) is only to +/-2^(-14), rather than +/-2^(-8). So the > division > > was meant to be very precise. What you propose makes y32 Q8 rounded, that is > > different from the original code. > > Good find, sorry about that. I also realized that I did not add 1 bit of > headroom for the rounding addition With this, I think that the below should be > the correct code. Note, however, that furthermore, adding 0.5 Q14 only is valid > for positive numbers so the code example needs to be modified accordingly. > > zeros = WebRtcSpl_NormW32(abs(numFIX)) -1; > numFIX = abs(numFix) * (1 << zeros); // Q(14+zeros) > int sft = WebRtcSpl_NormW32(den) - 16; > tmp32no1 = WEBRTC_SPL_SHIFT_W32(den, -sft); // Q(8-sft) (ensures 16 bit > precision of the division) > y32 = numFIX / tmp32no1; // Q(14+zeros-8+sft) > sft = 14 - (14+zeros-8+sft); > y32 += ((numFix > 0) - (numFix < 0)) * (1<<(sft-1)); // Adding 0.5 Q14 to > ensure proper rounding > y32 = WEBRTC_SPL_SHIFT_W32(y32, -sft); // Q(14) (rounded) > > > My point with the above is, however, that adding tmp32no1 / 2 is not necessary > for the rounding as the rounding is based on the bit that is shifted out when > placing y32 into Q14. > > To illustrate > 5 (Q0) is > 3 (Q1) if rounded > 2 (Q1) if not rounded > > For this the rounding operation would be > 101 (Q0) => (101 + 001) >> 1 (Q1) = 11 (Q1) > and the non-rounding operation would be > 101 (Q0) => 101 >> 1 (Q1) = 10 (Q1) > > If you really want to avoid the extra bit of headroom that avoids overflow when > doing the rounding addition you could check the value of numFIX = abs(numFix) * > (1 << zeros); and if that is so large that the addition will overflow then re-do > the shift with zeros-1. I do, however, not think that is worthwhile. > > The above described scheme is the standard fixed-point way of doing this. If you > want to do a simple fix, I would go for zeros-1 shifts. It is, as you point out, > not perfect either if numFix is already normalized (zeros=0), but you will then > anyway have potential problems with cases when den < 2^8 as then zeros==0 will > cause den to be shifted to zero which will cause a division by zero. I think > this division have several implicit assumptions about the magnitude of the > variables anyway and those needs to be taken into account when designing the > division. Having looked at it again, I think the proper code should be as below. But please doublecheck it as it is really easy to get that wrong. int sign_numfix = (numFix > 0) - (numFix < 0); numFIX = abs(numFIX); zeros = WebRtcSpl_NormW32(numFIX) - 1; // Doing this in proper shifts is faster than using WEBRTC_SPL_SHIFT_W32 as that one both shifts and multiplies. if (zeros < 0) numFIX = numFix >> abs(zeros); // Q(14+zeros) else if (sft > 0) numFIX = numFix << zeros; // Q(14+zeros) int sign_den = (den > 0) - (den < 0); den = abs(den); // Q(8) int sft = WebRtcSpl_NormW32(den) - 16; // -16 to ensure 16 bit precision in the result of the division. // Doing this in proper shifts is faster than using WEBRTC_SPL_SHIFT_W32 as that one both shifts and multiplies. if (sft < 0) tmp32no1 = den >> abs(sft); // Q(8-sft) else if (sft > 0) tmp32no1 = den << sft; // Q(8-sft) y32 = numFIX / tmp32no1; // Q(14+zeros-(8-sft)) sft = 14 - (14+zeros-8+sft); // Only round if shifting downwards. if ((sft-1) <= 0) y32 += (1<<abs((sft-1))); // Adding 0.5 Q14 to ensure proper rounding if (sft < 0) y32 = y32 >> abs(sft); // Q(14) (rounded) else if (sft > 0) y32 = y32 << sft; // Q(14) (rounded) y32 *= int sign_numfix * int sign_den;
Thank you for the inspiring idea! I simplified your approach and uploaded a new patch. Bit exactness shows that it gives the same result as my earlier (more complicated) solution. https://codereview.webrtc.org/2003623003/diff/180001/webrtc/modules/audio_pro... File webrtc/modules/audio_processing/agc/legacy/digital_agc.c (right): https://codereview.webrtc.org/2003623003/diff/180001/webrtc/modules/audio_pro... webrtc/modules/audio_processing/agc/legacy/digital_agc.c:202: y32 = numFIX / tmp32no1; // in Q14 On 2016/05/26 12:27:54, peah-webrtc wrote: > On 2016/05/26 07:48:38, peah-webrtc wrote: > > On 2016/05/26 02:28:20, minyue-webrtc wrote: > > > On 2016/05/24 13:37:10, peah-webrtc wrote: > > > > On 2016/05/24 13:05:45, minyue-webrtc wrote: > > > > > On 2016/05/24 10:46:29, peah-webrtc wrote: > > > > > > On 2016/05/24 07:16:12, minyue-webrtc wrote: > > > > > > > On 2016/05/24 06:16:20, peah-webrtc wrote: > > > > > > > > On 2016/05/24 01:32:26, minyue-webrtc wrote: > > > > > > > > > On 2016/05/23 12:00:19, peah-webrtc wrote: > > > > > > > > > > I don't think this is the right approach, even though it may > > work. > > > > > > > > > > It is quite complicated and it also makes use of the % > operator > > > > which > > > > > I > > > > > > > > expect > > > > > > > > > > to be as complex as a division in fixed point code. > > > > > > > > > > > > > > > > > > > > The standard way of solving this is to normalize one step > less, > > > > which > > > > > > will > > > > > > > > > give > > > > > > > > > > you headroom in the addition. I think you should do that, as > it > > > > leads > > > > > to > > > > > > > > much > > > > > > > > > > simpler code and is cheaper as well. > > > > > > > > > > > > > > > > > > > > Furthermore, I don't think the addition of > > > > > > > > > > tmp32no1 / 2 is really needed. It should be sufficient to add > > 1<< > > > > > > > (zeros-1) > > > > > > > > > (or > > > > > > > > > > subtract that, dependent on the sign of numFix). > > > > > > > > > > That is the standard way of doing rounding in fixed point > code. > > > > > > > > > > > > > > > > > > + 1<<(zeros-1) is to do ceil(), what we want here is round(). > WDYT > > > > > > > > > > > > > > > > Sorry, I realize that I was wrong about that the addition of > > tmp32no1 > > > / > > > > 2 > > > > > > not > > > > > > > > being needed. My comment about that was not correct at all. > > > > > > > > > > > > > > > > However, I still think that setting zeros to one less than it is > > done > > > > now > > > > > is > > > > > > > the > > > > > > > > correct solution, as that will should give headroom for the > addition > > > of > > > > > > > > tmp32nol/2. > > > > > > > > > > > > > > > > Changing the value of zeros to one less does not change the > rounding > > > to > > > > a > > > > > > > ceil. > > > > > > > > It only affects the accuracy of the division. > > > > > > > > WDYT? > > > > > > > > > > > > > > I also realized that +1<<(zero-1) does not do ceil(). But that is > not > > > the > > > > > core > > > > > > > part of the discussion. What worth discuss is weather we want a > > precise > > > > > > round() > > > > > > > or not. In my opinion, finding the nearest integer should be most > > > > > reasonable. > > > > > > 1) > > > > > > > it is well defined, 2) we do all shifting above to improve accuracy, > > and > > > > we > > > > > > > should not be too relaxed here, 3) Computationally, it should not be > > > much > > > > > more > > > > > > > complex. and if you think % will do another /, we may calculate > > > remainder > > > > by > > > > > > > numFIX - y32 * tmp32nol. > > > > > > > > > > > > I strongly doubt that we really need a rounding division here as this > is > > a > > > > > > gaintable computation. Do you know what negative impact a truncating > > > > division > > > > > > would have? > > > > > > > > > > > > The drawback of doing a zeros-1 normalization instead of a zeros > > > > normalization > > > > > > is that you lose 1 bit accuracy in the 32 bit divisions. I would say > > that > > > is > > > > > > acceptable, in particular as the rest of the computations are in a > fixed > > Q > > > > > > value. > > > > > > > > > > > > Furthermore, looking at your code for this, I'm actually not sure it > is > > > > > > sufficient to avoid overflow. How can you guarantee that y32++ does > not > > > > > > overflow? (I could not come up with a good example for that though.) > > > > > > > > > > > > > > > > 1) y32++ and -- is surely safe, since as long as |den| > 1, the ratio > > will > > > be > > > > > able to add/minus 1. > > > > > > > > > > 2) Regarding shifting zero -1 bits, I have thought about it. It is not > > > totally > > > > > safe. what numFIX is already large (zero = 0)? > > > > > > > > > > 3) Not adding tmp32nol / 2 is good, but we loose the precision of > nearest > > > > > integer. > > > > > > > > > > If you like, I can go for 3). > > > > > > > > Looking at it again, I see that the division is wrongly done. It should be > > > > zeros = WebRtcSpl_NormW32(numFIX); > > > > numFIX *= 1 << zeros; // Q(14+zeros) > > > > int sft = WebRtcSpl_NormW32(den) - 16; > > > > tmp32no1 = WEBRTC_SPL_SHIFT_W32(den, -sft); // Q(8-sft) (ensures 16 bit > > > > precision of the division) > > > > y32 = numFIX / tmp32no1; // Q(14+zeros-8+sft) > > > > sft = 8 - (14+zeros-8+sft); > > > > y32 += (1<<(sft-1)); // Adding 0.5 Q8 to ensure proper rounding > > > > y32 = WEBRTC_SPL_SHIFT_W32(y32, -sft); // Q(8) (rounded) > > > > > > > > Please doublecheck the above though. > > > > > > > > > > not really. y32 is supposed to be Q14, and the rounding by "+tmp32no1 / 2" > > > (before my change) is only to +/-2^(-14), rather than +/-2^(-8). So the > > division > > > was meant to be very precise. What you propose makes y32 Q8 rounded, that is > > > different from the original code. > > > > Good find, sorry about that. I also realized that I did not add 1 bit of > > headroom for the rounding addition With this, I think that the below should be > > the correct code. Note, however, that furthermore, adding 0.5 Q14 only is > valid > > for positive numbers so the code example needs to be modified accordingly. > > > > zeros = WebRtcSpl_NormW32(abs(numFIX)) -1; > > numFIX = abs(numFix) * (1 << zeros); // Q(14+zeros) > > int sft = WebRtcSpl_NormW32(den) - 16; > > tmp32no1 = WEBRTC_SPL_SHIFT_W32(den, -sft); // Q(8-sft) (ensures 16 bit > > precision of the division) > > y32 = numFIX / tmp32no1; // Q(14+zeros-8+sft) > > sft = 14 - (14+zeros-8+sft); > > y32 += ((numFix > 0) - (numFix < 0)) * (1<<(sft-1)); // Adding 0.5 Q14 to > > ensure proper rounding > > y32 = WEBRTC_SPL_SHIFT_W32(y32, -sft); // Q(14) (rounded) > > > > > > My point with the above is, however, that adding tmp32no1 / 2 is not necessary > > for the rounding as the rounding is based on the bit that is shifted out when > > placing y32 into Q14. > > > > To illustrate > > 5 (Q0) is > > 3 (Q1) if rounded > > 2 (Q1) if not rounded > > > > For this the rounding operation would be > > 101 (Q0) => (101 + 001) >> 1 (Q1) = 11 (Q1) > > and the non-rounding operation would be > > 101 (Q0) => 101 >> 1 (Q1) = 10 (Q1) > > > > If you really want to avoid the extra bit of headroom that avoids overflow > when > > doing the rounding addition you could check the value of numFIX = abs(numFix) > * > > (1 << zeros); and if that is so large that the addition will overflow then > re-do > > the shift with zeros-1. I do, however, not think that is worthwhile. > > > > The above described scheme is the standard fixed-point way of doing this. If > you > > want to do a simple fix, I would go for zeros-1 shifts. It is, as you point > out, > > not perfect either if numFix is already normalized (zeros=0), but you will > then > > anyway have potential problems with cases when den < 2^8 as then zeros==0 will > > cause den to be shifted to zero which will cause a division by zero. I think > > this division have several implicit assumptions about the magnitude of the > > variables anyway and those needs to be taken into account when designing the > > division. > > Having looked at it again, I think the proper code should be as below. But > please doublecheck it as it is really easy to get that wrong. > > int sign_numfix = (numFix > 0) - (numFix < 0); > > numFIX = abs(numFIX); > > zeros = WebRtcSpl_NormW32(numFIX) - 1; > > > // Doing this in proper shifts is faster than using WEBRTC_SPL_SHIFT_W32 as that > one both shifts and multiplies. > if (zeros < 0) > numFIX = numFix >> abs(zeros); // Q(14+zeros) > > else if (sft > 0) > numFIX = numFix << zeros; // Q(14+zeros) > > > int sign_den = (den > 0) - (den < 0); > > den = abs(den); // Q(8) > > int sft = WebRtcSpl_NormW32(den) - 16; // -16 to ensure 16 bit precision in > the result of the division. > > // Doing this in proper shifts is faster than using WEBRTC_SPL_SHIFT_W32 as that > one both shifts and multiplies. > if (sft < 0) > tmp32no1 = den >> abs(sft); // Q(8-sft) > > else if (sft > 0) > tmp32no1 = den << sft; // Q(8-sft) > > > > y32 = numFIX / tmp32no1; // Q(14+zeros-(8-sft)) > > sft = 14 - (14+zeros-8+sft); > > > // Only round if shifting downwards. > if ((sft-1) <= 0) > > y32 += (1<<abs((sft-1))); // Adding 0.5 Q14 to ensure > proper rounding > > if (sft < 0) > y32 = y32 >> abs(sft); // Q(14) (rounded) > > else if (sft > 0) > y32 = y32 << sft; // Q(14) (rounded) > > > y32 *= int sign_numfix * int sign_den; > > > I see the point: do division so that the quotient is Q15 or more, then add 1 and right shift. I think Q15 is just right, no need for more. I simplify it and see the new patch set.
Hi Per, Would you take a look at the new patch?
https://codereview.webrtc.org/2003623003/diff/200001/webrtc/modules/audio_pro... File webrtc/modules/audio_processing/agc/legacy/analog_agc.c (right): https://codereview.webrtc.org/2003623003/diff/200001/webrtc/modules/audio_pro... webrtc/modules/audio_processing/agc/legacy/analog_agc.c:483: if (env[i] >= kZeroThreshold || tmp32 >= kZeroThreshold) { What happened to the int64 sum variant? https://codereview.webrtc.org/2003623003/diff/200001/webrtc/modules/audio_pro... File webrtc/modules/audio_processing/agc/legacy/digital_agc.c (right): https://codereview.webrtc.org/2003623003/diff/200001/webrtc/modules/audio_pro... webrtc/modules/audio_processing/agc/legacy/digital_agc.c:192: if (numFIX > (den >> 8) || -numFIX > (den >> 8)) // |den| is Q8. The if-condition will compute den >> 8 twice. Please pre-compute it outside the if-statement. https://codereview.webrtc.org/2003623003/diff/200001/webrtc/modules/audio_pro... webrtc/modules/audio_processing/agc/legacy/digital_agc.c:203: y32 = (y32 + 1) / 2; // in Q14 rounded Nice! This should work apart from that the rounding is wrong if y32<0. Furthermore, I think you can do the division by 2 as a left shift, but please take care to ensure that a negative sign does not get into the way.
https://codereview.webrtc.org/2003623003/diff/200001/webrtc/modules/audio_pro... File webrtc/modules/audio_processing/agc/legacy/digital_agc.c (right): https://codereview.webrtc.org/2003623003/diff/200001/webrtc/modules/audio_pro... webrtc/modules/audio_processing/agc/legacy/digital_agc.c:203: y32 = (y32 + 1) / 2; // in Q14 rounded On 2016/06/03 05:10:35, peah-webrtc wrote: > Nice! > This should work apart from that the rounding is wrong if y32<0. > > Furthermore, I think you can do the division by 2 as a left shift, but please > take care to ensure that a negative sign does not get into the way. Sorry, meant right shift.
https://codereview.webrtc.org/2003623003/diff/200001/webrtc/modules/audio_pro... File webrtc/modules/audio_processing/agc/legacy/digital_agc.c (right): https://codereview.webrtc.org/2003623003/diff/200001/webrtc/modules/audio_pro... webrtc/modules/audio_processing/agc/legacy/digital_agc.c:203: y32 = (y32 + 1) / 2; // in Q14 rounded On 2016/06/03 05:10:35, peah-webrtc wrote: > Nice! > This should work apart from that the rounding is wrong if y32<0. > > Furthermore, I think you can do the division by 2 as a left shift, but please > take care to ensure that a negative sign does not get into the way. Nice catch! I was careless there. To round negative value (a), is it either (a - 1) / 2, or a >> 1, no "-1" if using shift. See my new patch.
https://codereview.webrtc.org/2003623003/diff/220001/webrtc/modules/audio_pro... File webrtc/modules/audio_processing/agc/legacy/analog_agc.c (right): https://codereview.webrtc.org/2003623003/diff/220001/webrtc/modules/audio_pro... webrtc/modules/audio_processing/agc/legacy/analog_agc.c:483: if (env[i] >= kZeroThreshold || tmp32 >= kZeroThreshold) { What about my comment on what happened to the int64 sum variant? https://codereview.webrtc.org/2003623003/diff/220001/webrtc/modules/audio_pro... File webrtc/modules/audio_processing/agc/legacy/digital_agc.c (right): https://codereview.webrtc.org/2003623003/diff/220001/webrtc/modules/audio_pro... webrtc/modules/audio_processing/agc/legacy/digital_agc.c:203: y32 = (y32 + (y32 > 0 ? 1 : 0)) >> 1; // This is to do rounding in Q14. As far as I know, right-shifting negative numbers is implementation dependent so I don't think that is a good idea. Therefore, when y32 is negative I think you need to apply the shift to -y32 and then negate the result. But please doublecheck this. Furthermore, the rounding is still wrong, I think. You should subtract -1 if y32 is negative, right? Please doublecheck though.
On 2016/06/03 06:51:12, peah-webrtc wrote: > https://codereview.webrtc.org/2003623003/diff/220001/webrtc/modules/audio_pro... > File webrtc/modules/audio_processing/agc/legacy/analog_agc.c (right): > > https://codereview.webrtc.org/2003623003/diff/220001/webrtc/modules/audio_pro... > webrtc/modules/audio_processing/agc/legacy/analog_agc.c:483: if (env[i] >= > kZeroThreshold || tmp32 >= kZeroThreshold) { > What about my comment on what happened to the int64 sum variant? > > https://codereview.webrtc.org/2003623003/diff/220001/webrtc/modules/audio_pro... > File webrtc/modules/audio_processing/agc/legacy/digital_agc.c (right): > > https://codereview.webrtc.org/2003623003/diff/220001/webrtc/modules/audio_pro... > webrtc/modules/audio_processing/agc/legacy/digital_agc.c:203: y32 = (y32 + (y32 > > 0 ? 1 : 0)) >> 1; // This is to do rounding in Q14. > As far as I know, right-shifting negative numbers is implementation dependent so > I don't think that is a good idea. Therefore, when y32 is negative I think you > need to apply the shift to -y32 and then negate the result. But please > doublecheck this. > > Furthermore, the rounding is still wrong, I think. You should subtract -1 if y32 > is negative, right? Please doublecheck though. since the right shift of negative number is implementation dependent, the best way to do it is "/2". compiler can optimize it. see a new patch.
https://codereview.webrtc.org/2003623003/diff/240001/webrtc/modules/audio_pro... File webrtc/modules/audio_processing/agc/legacy/analog_agc.c (right): https://codereview.webrtc.org/2003623003/diff/240001/webrtc/modules/audio_pro... webrtc/modules/audio_processing/agc/legacy/analog_agc.c:483: if (env[i] >= kZeroThreshold || tmp32 >= kZeroThreshold) { Again: what about my comment on what happened to the int64 sum variant? https://codereview.webrtc.org/2003623003/diff/240001/webrtc/modules/audio_pro... File webrtc/modules/audio_processing/agc/legacy/digital_agc.c (right): https://codereview.webrtc.org/2003623003/diff/240001/webrtc/modules/audio_pro... webrtc/modules/audio_processing/agc/legacy/digital_agc.c:203: y32 = (y32 + (y32 >= 0 ? 1 : -1)) / 2; // This is to do rounding in Q14. Why not use: if (y32 < 0) y32 = -(((-y32) + 1) >> 1); else y32 = (y32 + 1) >> 1); The above statement is not more complex than the above, as that also contains a short-hand if statement. I doubt that the compiler will optimize the division by two into a shift, as it too need to make the assumption on the sign to achieve that.
I like your idea
https://codereview.webrtc.org/2003623003/diff/260001/webrtc/modules/audio_pro... File webrtc/modules/audio_processing/agc/legacy/analog_agc.c (right): https://codereview.webrtc.org/2003623003/diff/260001/webrtc/modules/audio_pro... webrtc/modules/audio_processing/agc/legacy/analog_agc.c:484: break; You still have not revised this according to your earlier comment about moving to using a 64 bit variable.
true! thanks https://codereview.webrtc.org/2003623003/diff/260001/webrtc/modules/audio_pro... File webrtc/modules/audio_processing/agc/legacy/analog_agc.c (right): https://codereview.webrtc.org/2003623003/diff/260001/webrtc/modules/audio_pro... webrtc/modules/audio_processing/agc/legacy/analog_agc.c:484: break; On 2016/06/03 08:27:12, peah-webrtc wrote: > You still have not revised this according to your earlier comment about moving > to using a 64 bit variable. Ah! true. Thank you so much!
https://codereview.webrtc.org/2003623003/diff/280001/webrtc/modules/audio_pro... File webrtc/modules/audio_processing/agc/legacy/analog_agc.c (right): https://codereview.webrtc.org/2003623003/diff/280001/webrtc/modules/audio_pro... webrtc/modules/audio_processing/agc/legacy/analog_agc.c:477: int64_t tmp32 = 0; I think the variable name should be changed as well, as the "32" part indicates that it is a 32 bit value.
On 2016/06/03 08:34:46, peah-webrtc wrote: > https://codereview.webrtc.org/2003623003/diff/280001/webrtc/modules/audio_pro... > File webrtc/modules/audio_processing/agc/legacy/analog_agc.c (right): > > https://codereview.webrtc.org/2003623003/diff/280001/webrtc/modules/audio_pro... > webrtc/modules/audio_processing/agc/legacy/analog_agc.c:477: int64_t tmp32 = 0; > I think the variable name should be changed as well, as the "32" part indicates > that it is a 32 bit value. right! I changed it to tmp (without nr of bits)
On 2016/06/03 10:20:32, minyue-webrtc wrote: > On 2016/06/03 08:34:46, peah-webrtc wrote: > > > https://codereview.webrtc.org/2003623003/diff/280001/webrtc/modules/audio_pro... > > File webrtc/modules/audio_processing/agc/legacy/analog_agc.c (right): > > > > > https://codereview.webrtc.org/2003623003/diff/280001/webrtc/modules/audio_pro... > > webrtc/modules/audio_processing/agc/legacy/analog_agc.c:477: int64_t tmp32 = > 0; > > I think the variable name should be changed as well, as the "32" part > indicates > > that it is a 32 bit value. > > right! I changed it to tmp (without nr of bits) Awesome! lgtm (but I think you will probably need to update the testvectors in the unittest as well).
On 2016/06/03 11:26:50, peah-webrtc wrote: > On 2016/06/03 10:20:32, minyue-webrtc wrote: > > On 2016/06/03 08:34:46, peah-webrtc wrote: > > > > > > https://codereview.webrtc.org/2003623003/diff/280001/webrtc/modules/audio_pro... > > > File webrtc/modules/audio_processing/agc/legacy/analog_agc.c (right): > > > > > > > > > https://codereview.webrtc.org/2003623003/diff/280001/webrtc/modules/audio_pro... > > > webrtc/modules/audio_processing/agc/legacy/analog_agc.c:477: int64_t tmp32 = > > 0; > > > I think the variable name should be changed as well, as the "32" part > > indicates > > > that it is a 32 bit value. > > > > right! I changed it to tmp (without nr of bits) > > Awesome! lgtm > > (but I think you will probably need to update the testvectors in the unittest as > well). You mean the reference levels in the bit exactness tests? No, if I do need so, I should have made mistakes in the last patches, since from ps2 the changes are mainly about complexity and should be bit exact.
The CQ bit was checked by minyue@webrtc.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/2003623003/300001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/2003623003/300001
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: presubmit on tryserver.webrtc (JOB_FAILED, http://build.chromium.org/p/tryserver.webrtc/builders/presubmit/builds/6065)
minyue@webrtc.org changed reviewers: + kjellander@webrtc.org
On 2016/06/03 11:38:01, commit-bot: I haz the power wrote: > Try jobs failed on following builders: > presubmit on tryserver.webrtc (JOB_FAILED, > http://build.chromium.org/p/tryserver.webrtc/builders/presubmit/builds/6065) +kjellander to approve the ubsan blacklist.
On 2016/06/03 11:39:53, minyue-webrtc wrote: > On 2016/06/03 11:38:01, commit-bot: I haz the power wrote: > > Try jobs failed on following builders: > > presubmit on tryserver.webrtc (JOB_FAILED, > > http://build.chromium.org/p/tryserver.webrtc/builders/presubmit/builds/6065) > > +kjellander to approve the ubsan blacklist. tools/ubsan/blacklist.txt: lgtm
The CQ bit was checked by minyue@webrtc.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/2003623003/300001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/2003623003/300001
Message was sent while issue was closed.
Description was changed from ========== Re-enable UBsan on AGC. BUG=webrtc:5530 ========== to ========== Re-enable UBsan on AGC. BUG=webrtc:5530 ==========
Message was sent while issue was closed.
Committed patchset #8 (id:300001)
Message was sent while issue was closed.
Description was changed from ========== Re-enable UBsan on AGC. BUG=webrtc:5530 ========== to ========== Re-enable UBsan on AGC. BUG=webrtc:5530 Committed: https://crrev.com/293c86d67384c15f46b8296096a62a14b4a58d33 Cr-Commit-Position: refs/heads/master@{#13034} ==========
Message was sent while issue was closed.
Patchset 8 (id:??) landed as https://crrev.com/293c86d67384c15f46b8296096a62a14b4a58d33 Cr-Commit-Position: refs/heads/master@{#13034}
Message was sent while issue was closed.
A revert of this CL (patchset #8 id:300001) has been created in https://codereview.webrtc.org/2056683002/ by asapersson@webrtc.org. The reason for reverting is: Breaks bot.. |