|
|
Created:
3 years, 9 months ago by AleBzk Modified:
3 years, 8 months ago Reviewers:
aleloi CC:
webrtc-reviews_webrtc.org, peah-webrtc, Andrew MacDonald, aleloi, tterriberry_mozilla.com, audio-team_agora.io, hlundin-webrtc, kwiberg-webrtc, minyue-webrtc, the sun, aluebs-webrtc, bjornv1 Target Ref:
refs/heads/master Project:
webrtc Visibility:
Public. |
DescriptionExport library that generates an HTLM file with the scores organized in tables.
BUG=webrtc:7218
NOTRY=True
Review-Url: https://codereview.webrtc.org/2717973006
Cr-Commit-Position: refs/heads/master@{#17512}
Committed: https://chromium.googlesource.com/external/webrtc/+/29e3330139ff1dc070aad94e31955a006038d3e4
Patch Set 1 #
Total comments: 17
Patch Set 2 : comments from Alex addressed #Patch Set 3 : rebase #
Depends on Patchset: Dependent Patchsets: Messages
Total messages: 14 (6 generated)
Description was changed from ========== Export library that generates an HTLM file with the scores organized in tables. BUG=webrtc:7218 NOTRY=True ========== to ========== Export library that generates an HTLM file with the scores organized in tables. BUG=webrtc:7218 NOTRY=True ==========
alessiob@webrtc.org changed reviewers: + aleloi@webrtc.org
Hi Alex, This is a script that collects the scores computed while evaluating APM with different flags, noise conditions, and probing signals. It then generates an HTML file. This CL is only about the script and the interface of the export class.
lgtm after _get_score_descriptions looks better https://codereview.webrtc.org/2717973006/diff/1/webrtc/modules/audio_processi... File webrtc/modules/audio_processing/test/py_quality_assessment/apm_quality_assessment-export.py (right): https://codereview.webrtc.org/2717973006/diff/1/webrtc/modules/audio_processi... webrtc/modules/audio_processing/test/py_quality_assessment/apm_quality_assessment-export.py:67: score_name = parts[4][6:-4] The indexes are difficult to follow. I think it would be clearer with regexs. What about (cfg, inp, noise, noisep, score) = score_filepath.split(os.sep)[-5:] config_part = re.match("cfg-(.*)", cfg).groups(0) # plus 4 lines for input_name, noise_name, etc? Or something different: an inner function def strip_prefix(a_str, prefix): assert a_str.starts_with(prefix) return a_str[len(prefix):] config_name = strip_prefix(parts[0], "cfg-") # plus 4 lines https://codereview.webrtc.org/2717973006/diff/1/webrtc/modules/audio_processi... webrtc/modules/audio_processing/test/py_quality_assessment/apm_quality_assessment-export.py:108: nested_dict = lambda: collections.defaultdict(nested_dict) It's a dict which returns a reference to itself for missing keys. Seems complex, why is it needed? Is this a common pattern? https://codereview.webrtc.org/2717973006/diff/1/webrtc/modules/audio_processi... webrtc/modules/audio_processing/test/py_quality_assessment/apm_quality_assessment-export.py:116: args.output_dir, 'cfg-*', 'input-*', 'noise-*', '*', 'score-*.txt') This should filter out scores like these, right: /path/to/cfg-default/input-1/noise-0/something/score-1.txt Are we certain that os.path always works with globs? https://codereview.webrtc.org/2717973006/diff/1/webrtc/modules/audio_processi... webrtc/modules/audio_processing/test/py_quality_assessment/apm_quality_assessment-export.py:121: _get_score_descriptors(score_filepath)) I would argue that from a readability POW, using regexs instead of the impl of _get_score_descriptors is better. Something along the lines of (cfg, inp, noise, noisep, score) = re.search(".*/(cfg-.*)/(input-.*)/(noise-.*)/(.*)/(score-.*)[.]txt", filepath).groups() is better. https://codereview.webrtc.org/2717973006/diff/1/webrtc/modules/audio_processi... webrtc/modules/audio_processing/test/py_quality_assessment/apm_quality_assessment-export.py:137: scores[score_name][config_name][input_name][noise_name][noise_params] = { I'm not convinced about the nested dictionary. It's neat, but is it necessary? Can the config be a tuple instead? scores[(score, cfg, inp, ...)] = {...} If constructing the table in the dependent CL is easier with the grouping from the nested dict, or if it's a common pattern it should stay. https://codereview.webrtc.org/2717973006/diff/1/webrtc/modules/audio_processi... File webrtc/modules/audio_processing/test/py_quality_assessment/quality_assessment/export.py (right): https://codereview.webrtc.org/2717973006/diff/1/webrtc/modules/audio_processi... webrtc/modules/audio_processing/test/py_quality_assessment/quality_assessment/export.py:22: f.write('APM Quality Assessment scores\n') And the actual table generation happens in the next CL here, right https://codereview.webrtc.org/2723703002#ps20001
https://codereview.webrtc.org/2717973006/diff/1/webrtc/modules/audio_processi... File webrtc/modules/audio_processing/test/py_quality_assessment/apm_quality_assessment-export.py (right): https://codereview.webrtc.org/2717973006/diff/1/webrtc/modules/audio_processi... webrtc/modules/audio_processing/test/py_quality_assessment/apm_quality_assessment-export.py:26: def _instance_arguments_parser(): Noticed from the comments to the dependent patch set that we use the Chromium python style guide. Names should be CamelCase. Not sure about what it says about module privates starting with underscores, though. Maybe it should be _InstanceArgumentParser(). Please fix this too before submitting!
Hi Alex, thanks a lot for all your comments! You can see my replies below. For some, I made the corrections you proposed. https://codereview.webrtc.org/2717973006/diff/1/webrtc/modules/audio_processi... File webrtc/modules/audio_processing/test/py_quality_assessment/apm_quality_assessment-export.py (right): https://codereview.webrtc.org/2717973006/diff/1/webrtc/modules/audio_processi... webrtc/modules/audio_processing/test/py_quality_assessment/apm_quality_assessment-export.py:26: def _instance_arguments_parser(): On 2017/03/09 12:15:06, aleloi wrote: > Noticed from the comments to the dependent patch set that we use the Chromium > python style guide. Names should be CamelCase. Not sure about what it says about > module privates starting with underscores, though. Maybe it should be > _InstanceArgumentParser(). Please fix this too before submitting! Done. https://codereview.webrtc.org/2717973006/diff/1/webrtc/modules/audio_processi... webrtc/modules/audio_processing/test/py_quality_assessment/apm_quality_assessment-export.py:67: score_name = parts[4][6:-4] On 2017/03/09 12:06:05, aleloi wrote: > The indexes are difficult to follow. I think it would be clearer with regexs. > > What about > > (cfg, inp, noise, noisep, score) = score_filepath.split(os.sep)[-5:] > config_part = re.match("cfg-(.*)", cfg).groups(0) > # plus 4 lines for input_name, noise_name, etc? > > > Or something different: an inner function > > def strip_prefix(a_str, prefix): > assert a_str.starts_with(prefix) > return a_str[len(prefix):] > > config_name = strip_prefix(parts[0], "cfg-") > # plus 4 lines Done. https://codereview.webrtc.org/2717973006/diff/1/webrtc/modules/audio_processi... webrtc/modules/audio_processing/test/py_quality_assessment/apm_quality_assessment-export.py:108: nested_dict = lambda: collections.defaultdict(nested_dict) On 2017/03/09 12:06:06, aleloi wrote: > It's a dict which returns a reference to itself for missing keys. Seems complex, > why is it needed? Is this a common pattern? Right, that's how it works. This is a known practice to avoid the manual creation of intermediate keys when they don't exist. It's like "mkdir" vs "mkdir -p" and it improves the readability when nested items are added (just one line). One completely different way to go is using pandas to store all the data in a DataFrame object and then exploiting pandas native indexing. However, I don't expect that the way I nest data will change anytime soon, so I would keep relying on nested dictionaries via collections, which is standard Python. https://codereview.webrtc.org/2717973006/diff/1/webrtc/modules/audio_processi... webrtc/modules/audio_processing/test/py_quality_assessment/apm_quality_assessment-export.py:116: args.output_dir, 'cfg-*', 'input-*', 'noise-*', '*', 'score-*.txt') On 2017/03/09 12:06:05, aleloi wrote: > This should filter out scores like these, right: > > /path/to/cfg-default/input-1/noise-0/something/score-1.txt > > Are we certain that os.path always works with globs? Thanks for asking. In this case * is not treated as wildcard, I'm using os.path.join() to build the search pattern passed to glob.iglob(). The latter will then treat starts in src_path as wildcards. Briefly, os.path.join() is a mere string concatenation. https://codereview.webrtc.org/2717973006/diff/1/webrtc/modules/audio_processi... webrtc/modules/audio_processing/test/py_quality_assessment/apm_quality_assessment-export.py:121: _get_score_descriptors(score_filepath)) On 2017/03/09 12:06:05, aleloi wrote: > I would argue that from a readability POW, using regexs instead of the impl of > _get_score_descriptors is better. > > Something along the lines of > > (cfg, inp, noise, noisep, score) = > re.search(".*/(cfg-.*)/(input-.*)/(noise-.*)/(.*)/(score-.*)[.]txt", > filepath).groups() > > is better. Thanks a lot, you're right. I addressed this point in _GetScoreDescriptors() using pre-compiled regexprs. https://codereview.webrtc.org/2717973006/diff/1/webrtc/modules/audio_processi... webrtc/modules/audio_processing/test/py_quality_assessment/apm_quality_assessment-export.py:137: scores[score_name][config_name][input_name][noise_name][noise_params] = { On 2017/03/09 12:06:05, aleloi wrote: > I'm not convinced about the nested dictionary. It's neat, but is it necessary? > Can the config be a tuple instead? scores[(score, cfg, inp, ...)] = {...} That's a possible alternative - using frozenset() since keys must be immutable. But... (see below) > > If constructing the table in the dependent CL is easier with the grouping from > the nested dict, or if it's a common pattern it should stay. ... you're right: constructing the table does take advantage from having a nested structure. It is by far more efficient. As pointed out in one of my previous comments, a clean way to go would be relying on pandas DataFrames exploiting the indices you get for free. Which is also good to decouple the way data is organized from the way it's accessed for rendering purpose. But relying on third-party libraries can be too much for the APM QA tool at this stage. https://codereview.webrtc.org/2717973006/diff/1/webrtc/modules/audio_processi... File webrtc/modules/audio_processing/test/py_quality_assessment/quality_assessment/export.py (right): https://codereview.webrtc.org/2717973006/diff/1/webrtc/modules/audio_processi... webrtc/modules/audio_processing/test/py_quality_assessment/quality_assessment/export.py:22: f.write('APM Quality Assessment scores\n') On 2017/03/09 12:06:06, aleloi wrote: > And the actual table generation happens in the next CL here, right > https://codereview.webrtc.org/2723703002#ps20001 Yup!
Great work, ship it! https://codereview.webrtc.org/2717973006/diff/1/webrtc/modules/audio_processi... File webrtc/modules/audio_processing/test/py_quality_assessment/apm_quality_assessment-export.py (right): https://codereview.webrtc.org/2717973006/diff/1/webrtc/modules/audio_processi... webrtc/modules/audio_processing/test/py_quality_assessment/apm_quality_assessment-export.py:108: nested_dict = lambda: collections.defaultdict(nested_dict) On 2017/03/09 14:45:01, AleBzk wrote: > On 2017/03/09 12:06:06, aleloi wrote: > > It's a dict which returns a reference to itself for missing keys. Seems > complex, > > why is it needed? Is this a common pattern? > > Right, that's how it works. > This is a known practice to avoid the manual creation of intermediate keys when > they don't exist. > It's like "mkdir" vs "mkdir -p" and it improves the readability when nested > items are added (just one line). > > One completely different way to go is using pandas to store all the data in a > DataFrame object and then exploiting pandas native indexing. However, I don't > expect that the way I nest data will change anytime soon, so I would keep > relying on nested dictionaries via collections, which is standard Python. Acknowledged. https://codereview.webrtc.org/2717973006/diff/1/webrtc/modules/audio_processi... webrtc/modules/audio_processing/test/py_quality_assessment/apm_quality_assessment-export.py:116: args.output_dir, 'cfg-*', 'input-*', 'noise-*', '*', 'score-*.txt') On 2017/03/09 14:45:01, AleBzk wrote: > On 2017/03/09 12:06:05, aleloi wrote: > > This should filter out scores like these, right: > > > > /path/to/cfg-default/input-1/noise-0/something/score-1.txt > > > > Are we certain that os.path always works with globs? > > Thanks for asking. In this case * is not treated as wildcard, I'm using > os.path.join() to build the search pattern passed to glob.iglob(). The latter > will then treat starts in src_path as wildcards. Briefly, os.path.join() is a > mere string concatenation. Acknowledged. https://codereview.webrtc.org/2717973006/diff/1/webrtc/modules/audio_processi... webrtc/modules/audio_processing/test/py_quality_assessment/apm_quality_assessment-export.py:137: scores[score_name][config_name][input_name][noise_name][noise_params] = { On 2017/03/09 14:45:01, AleBzk wrote: > On 2017/03/09 12:06:05, aleloi wrote: > > I'm not convinced about the nested dictionary. It's neat, but is it necessary? > > Can the config be a tuple instead? scores[(score, cfg, inp, ...)] = {...} > > That's a possible alternative - using frozenset() since keys must be immutable. > But... (see below) > > > > > If constructing the table in the dependent CL is easier with the grouping from > > the nested dict, or if it's a common pattern it should stay. > > ... you're right: constructing the table does take advantage from having a > nested structure. > It is by far more efficient. As pointed out in one of my previous comments, a > clean way to go would be relying on pandas DataFrames exploiting the indices you > get for free. Which is also good to decouple the way data is organized from the > way it's accessed for rendering purpose. But relying on third-party libraries > can be too much for the APM QA tool at this stage. Acknowledged.
The CQ bit was checked by alessiob@webrtc.org
The patchset sent to the CQ was uploaded after l-g-t-m from aleloi@webrtc.org Link to the patchset: https://codereview.webrtc.org/2717973006/#ps40001 (title: "rebase")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.webrtc.org/...
CQ is committing da patch. Bot data: {"patchset_id": 40001, "attempt_start_ts": 1491233772334710, "parent_rev": "c42f54057050c933008a49d57582577bfb9aed25", "commit_rev": "29e3330139ff1dc070aad94e31955a006038d3e4"}
Message was sent while issue was closed.
Description was changed from ========== Export library that generates an HTLM file with the scores organized in tables. BUG=webrtc:7218 NOTRY=True ========== to ========== Export library that generates an HTLM file with the scores organized in tables. BUG=webrtc:7218 NOTRY=True Review-Url: https://codereview.webrtc.org/2717973006 Cr-Commit-Position: refs/heads/master@{#17512} Committed: https://chromium.googlesource.com/external/webrtc/+/29e3330139ff1dc070aad94e3... ==========
Message was sent while issue was closed.
Committed patchset #3 (id:40001) as https://chromium.googlesource.com/external/webrtc/+/29e3330139ff1dc070aad94e3...
Message was sent while issue was closed.
A revert of this CL (patchset #3 id:40001) has been created in https://codereview.webrtc.org/2791293002/ by alessiob@webrtc.org. The reason for reverting is: Issue with rebasing from depending CL. |