Issue 2717973006: Export library that generates an HTLM file with the scores organized in tables.

Issue 2717973006: Export library that generates an HTLM file with the scores organized in tables. (Closed)

Created:
3 years, 9 months ago by AleBzk

Modified:
3 years, 8 months ago

Reviewers:
aleloi

CC:
webrtc-reviews_webrtc.org, peah-webrtc, Andrew MacDonald, aleloi, tterriberry_mozilla.com, audio-team_agora.io, hlundin-webrtc, kwiberg-webrtc, minyue-webrtc, the sun, aluebs-webrtc, bjornv1

Target Ref:
refs/heads/master

Project:
webrtc

Visibility:
Public.

More Reviews

Description

Export library that generates an HTLM file with the scores organized in tables. BUG=webrtc:7218 NOTRY=True Review-Url: https://codereview.webrtc.org/2717973006 Cr-Commit-Position: refs/heads/master@{#17512} Committed: https://chromium.googlesource.com/external/webrtc/+/29e3330139ff1dc070aad94e31955a006038d3e4

Patch Set 2 : comments from Alex addressed #

Patch Set 3 : rebase #

Created: 3 years, 8 months ago

Download [raw] [tar.bz2]

	Unified diffs	Side-by-side diffs	Delta from patch set	Stats (+179 lines, -0 lines)			Patch
M	webrtc/modules/audio_processing/test/py_quality_assessment/BUILD.gn	View	1 2	1 chunk	+2 lines, -0 lines	0 comments	Download
M	webrtc/modules/audio_processing/test/py_quality_assessment/apm_quality_assessment_export.py	View	1 2	1 chunk	+151 lines, -0 lines	0 comments	Download
A	webrtc/modules/audio_processing/test/py_quality_assessment/quality_assessment/export.py	View	1	1 chunk	+26 lines, -0 lines	0 comments	Download

Depends on Patchset:

Issue 2720203003 Patch 20001

Dependent Patchsets:

Issue 2723703002 Patch 40001

Messages

Total messages: 14 (6 generated)

Expand Messages | Collapse Messages | Show Generated Messages | Hide Generated Messages

AleBzk

Hi Alex, This is a script that collects the scores computed while evaluating APM with ...

3 years, 9 months ago (2017-02-28 16:32:17 UTC) #3

aleloi

lgtm after _get_score_descriptions looks better https://codereview.webrtc.org/2717973006/diff/1/webrtc/modules/audio_processing/test/py_quality_assessment/apm_quality_assessment-export.py File webrtc/modules/audio_processing/test/py_quality_assessment/apm_quality_assessment-export.py (right): https://codereview.webrtc.org/2717973006/diff/1/webrtc/modules/audio_processing/test/py_quality_assessment/apm_quality_assessment-export.py#newcode67 webrtc/modules/audio_processing/test/py_quality_assessment/apm_quality_assessment-export.py:67: score_name = parts[4][6:-4] The ...

3 years, 9 months ago (2017-03-09 12:06:06 UTC) #4

lgtm after _get_score_descriptions looks better

https://codereview.webrtc.org/2717973006/diff/1/webrtc/modules/audio_processi...
File
webrtc/modules/audio_processing/test/py_quality_assessment/apm_quality_assessment-export.py
(right):

https://codereview.webrtc.org/2717973006/diff/1/webrtc/modules/audio_processi...
webrtc/modules/audio_processing/test/py_quality_assessment/apm_quality_assessment-export.py:67:
score_name = parts[4][6:-4]
The indexes are difficult to follow. I think it would be clearer with regexs.

What about

(cfg, inp, noise, noisep, score) = score_filepath.split(os.sep)[-5:]
config_part = re.match("cfg-(.*)", cfg).groups(0)
# plus 4 lines for input_name, noise_name, etc?


Or something different: an inner function

def strip_prefix(a_str, prefix):
   assert a_str.starts_with(prefix)
   return a_str[len(prefix):]

config_name = strip_prefix(parts[0], "cfg-")
# plus 4 lines

https://codereview.webrtc.org/2717973006/diff/1/webrtc/modules/audio_processi...
webrtc/modules/audio_processing/test/py_quality_assessment/apm_quality_assessment-export.py:108:
nested_dict = lambda: collections.defaultdict(nested_dict)
It's a dict which returns a reference to itself for missing keys. Seems complex,
why is it needed? Is this a common pattern?

https://codereview.webrtc.org/2717973006/diff/1/webrtc/modules/audio_processi...
webrtc/modules/audio_processing/test/py_quality_assessment/apm_quality_assessment-export.py:116:
args.output_dir, 'cfg-*', 'input-*', 'noise-*', '*', 'score-*.txt')
This should filter out scores like these, right:

/path/to/cfg-default/input-1/noise-0/something/score-1.txt

Are we certain that os.path always works with globs?

https://codereview.webrtc.org/2717973006/diff/1/webrtc/modules/audio_processi...
webrtc/modules/audio_processing/test/py_quality_assessment/apm_quality_assessment-export.py:121:
_get_score_descriptors(score_filepath))
I would argue that from a readability POW, using regexs instead of the impl of
_get_score_descriptors is better.

Something along the lines of 

(cfg, inp, noise, noisep, score) =
re.search(".*/(cfg-.*)/(input-.*)/(noise-.*)/(.*)/(score-.*)[.]txt",
filepath).groups()

is better.

https://codereview.webrtc.org/2717973006/diff/1/webrtc/modules/audio_processi...
webrtc/modules/audio_processing/test/py_quality_assessment/apm_quality_assessment-export.py:137:
scores[score_name][config_name][input_name][noise_name][noise_params] = {
I'm not convinced about the nested dictionary. It's neat, but is it necessary?
Can the config be a tuple instead? scores[(score, cfg, inp, ...)] = {...}

If constructing the table in the dependent CL is easier with the grouping from
the nested dict, or if it's a common pattern it should stay.

https://codereview.webrtc.org/2717973006/diff/1/webrtc/modules/audio_processi...
File
webrtc/modules/audio_processing/test/py_quality_assessment/quality_assessment/export.py
(right):

https://codereview.webrtc.org/2717973006/diff/1/webrtc/modules/audio_processi...
webrtc/modules/audio_processing/test/py_quality_assessment/quality_assessment/export.py:22:
f.write('APM Quality Assessment scores\n')
And the actual table generation happens in the next CL here, right
https://codereview.webrtc.org/2723703002#ps20001

aleloi

https://codereview.webrtc.org/2717973006/diff/1/webrtc/modules/audio_processing/test/py_quality_assessment/apm_quality_assessment-export.py File webrtc/modules/audio_processing/test/py_quality_assessment/apm_quality_assessment-export.py (right): https://codereview.webrtc.org/2717973006/diff/1/webrtc/modules/audio_processing/test/py_quality_assessment/apm_quality_assessment-export.py#newcode26 webrtc/modules/audio_processing/test/py_quality_assessment/apm_quality_assessment-export.py:26: def _instance_arguments_parser(): Noticed from the comments to the dependent ...

3 years, 9 months ago (2017-03-09 12:15:06 UTC) #5

AleBzk

Hi Alex, thanks a lot for all your comments! You can see my replies below. ...

3 years, 9 months ago (2017-03-09 14:45:02 UTC) #6

Hi Alex, thanks a lot for all your comments! You can see my replies below. For
some, I made the corrections you proposed.

https://codereview.webrtc.org/2717973006/diff/1/webrtc/modules/audio_processi...
File
webrtc/modules/audio_processing/test/py_quality_assessment/apm_quality_assessment-export.py
(right):

https://codereview.webrtc.org/2717973006/diff/1/webrtc/modules/audio_processi...
webrtc/modules/audio_processing/test/py_quality_assessment/apm_quality_assessment-export.py:26:
def _instance_arguments_parser():
On 2017/03/09 12:15:06, aleloi wrote:
> Noticed from the comments to the dependent patch set that we use the Chromium
> python style guide. Names should be CamelCase. Not sure about what it says
about
> module privates starting with underscores, though. Maybe it should be
> _InstanceArgumentParser(). Please fix this too before submitting!

Done.

https://codereview.webrtc.org/2717973006/diff/1/webrtc/modules/audio_processi...
webrtc/modules/audio_processing/test/py_quality_assessment/apm_quality_assessment-export.py:67:
score_name = parts[4][6:-4]
On 2017/03/09 12:06:05, aleloi wrote:
> The indexes are difficult to follow. I think it would be clearer with regexs.
> 
> What about
> 
> (cfg, inp, noise, noisep, score) = score_filepath.split(os.sep)[-5:]
> config_part = re.match("cfg-(.*)", cfg).groups(0)
> # plus 4 lines for input_name, noise_name, etc?
> 
> 
> Or something different: an inner function
> 
> def strip_prefix(a_str, prefix):
>    assert a_str.starts_with(prefix)
>    return a_str[len(prefix):]
> 
> config_name = strip_prefix(parts[0], "cfg-")
> # plus 4 lines

Done.

https://codereview.webrtc.org/2717973006/diff/1/webrtc/modules/audio_processi...
webrtc/modules/audio_processing/test/py_quality_assessment/apm_quality_assessment-export.py:108:
nested_dict = lambda: collections.defaultdict(nested_dict)
On 2017/03/09 12:06:06, aleloi wrote:
> It's a dict which returns a reference to itself for missing keys. Seems
complex,
> why is it needed? Is this a common pattern?

Right, that's how it works.
This is a known practice to avoid the manual creation of intermediate keys when
they don't exist.
It's like "mkdir" vs "mkdir -p" and it improves the readability when nested
items are added (just one line).

One completely different way to go is using pandas to store all the data in a
DataFrame object and then exploiting pandas native indexing. However, I don't
expect that the way I nest data will change anytime soon, so I would keep
relying on nested dictionaries via collections, which is standard Python.

https://codereview.webrtc.org/2717973006/diff/1/webrtc/modules/audio_processi...
webrtc/modules/audio_processing/test/py_quality_assessment/apm_quality_assessment-export.py:116:
args.output_dir, 'cfg-*', 'input-*', 'noise-*', '*', 'score-*.txt')
On 2017/03/09 12:06:05, aleloi wrote:
> This should filter out scores like these, right:
> 
> /path/to/cfg-default/input-1/noise-0/something/score-1.txt
> 
> Are we certain that os.path always works with globs?

Thanks for asking. In this case * is not treated as wildcard, I'm using
os.path.join() to build the search pattern passed to glob.iglob(). The latter
will then treat starts in src_path as wildcards. Briefly, os.path.join() is a
mere string concatenation.

https://codereview.webrtc.org/2717973006/diff/1/webrtc/modules/audio_processi...
webrtc/modules/audio_processing/test/py_quality_assessment/apm_quality_assessment-export.py:121:
_get_score_descriptors(score_filepath))
On 2017/03/09 12:06:05, aleloi wrote:
> I would argue that from a readability POW, using regexs instead of the impl of
> _get_score_descriptors is better.
> 
> Something along the lines of 
> 
> (cfg, inp, noise, noisep, score) =
> re.search(".*/(cfg-.*)/(input-.*)/(noise-.*)/(.*)/(score-.*)[.]txt",
> filepath).groups()
> 
> is better.

Thanks a lot, you're right. I addressed this point in _GetScoreDescriptors()
using pre-compiled regexprs.

https://codereview.webrtc.org/2717973006/diff/1/webrtc/modules/audio_processi...
webrtc/modules/audio_processing/test/py_quality_assessment/apm_quality_assessment-export.py:137:
scores[score_name][config_name][input_name][noise_name][noise_params] = {
On 2017/03/09 12:06:05, aleloi wrote:
> I'm not convinced about the nested dictionary. It's neat, but is it necessary?
> Can the config be a tuple instead? scores[(score, cfg, inp, ...)] = {...}

That's a possible alternative - using frozenset() since keys must be immutable.
But... (see below)

> 
> If constructing the table in the dependent CL is easier with the grouping from
> the nested dict, or if it's a common pattern it should stay.

... you're right: constructing the table does take advantage from having a
nested structure.
It is by far more efficient. As pointed out in one of my previous comments, a
clean way to go would be relying on pandas DataFrames exploiting the indices you
get for free. Which is also good to decouple the way data is organized from the
way it's accessed for rendering purpose. But relying on third-party libraries
can be too much for the APM QA tool at this stage.

https://codereview.webrtc.org/2717973006/diff/1/webrtc/modules/audio_processi...
File
webrtc/modules/audio_processing/test/py_quality_assessment/quality_assessment/export.py
(right):

https://codereview.webrtc.org/2717973006/diff/1/webrtc/modules/audio_processi...
webrtc/modules/audio_processing/test/py_quality_assessment/quality_assessment/export.py:22:
f.write('APM Quality Assessment scores\n')
On 2017/03/09 12:06:06, aleloi wrote:
> And the actual table generation happens in the next CL here, right
> https://codereview.webrtc.org/2723703002#ps20001

Yup!

aleloi

Great work, ship it! https://codereview.webrtc.org/2717973006/diff/1/webrtc/modules/audio_processing/test/py_quality_assessment/apm_quality_assessment-export.py File webrtc/modules/audio_processing/test/py_quality_assessment/apm_quality_assessment-export.py (right): https://codereview.webrtc.org/2717973006/diff/1/webrtc/modules/audio_processing/test/py_quality_assessment/apm_quality_assessment-export.py#newcode108 webrtc/modules/audio_processing/test/py_quality_assessment/apm_quality_assessment-export.py:108: nested_dict = lambda: collections.defaultdict(nested_dict) On ...

3 years, 9 months ago (2017-03-09 14:49:24 UTC) #7

Great work, ship it!

https://codereview.webrtc.org/2717973006/diff/1/webrtc/modules/audio_processi...
File
webrtc/modules/audio_processing/test/py_quality_assessment/apm_quality_assessment-export.py
(right):

https://codereview.webrtc.org/2717973006/diff/1/webrtc/modules/audio_processi...
webrtc/modules/audio_processing/test/py_quality_assessment/apm_quality_assessment-export.py:108:
nested_dict = lambda: collections.defaultdict(nested_dict)
On 2017/03/09 14:45:01, AleBzk wrote:
> On 2017/03/09 12:06:06, aleloi wrote:
> > It's a dict which returns a reference to itself for missing keys. Seems
> complex,
> > why is it needed? Is this a common pattern?
> 
> Right, that's how it works.
> This is a known practice to avoid the manual creation of intermediate keys
when
> they don't exist.
> It's like "mkdir" vs "mkdir -p" and it improves the readability when nested
> items are added (just one line).
> 
> One completely different way to go is using pandas to store all the data in a
> DataFrame object and then exploiting pandas native indexing. However, I don't
> expect that the way I nest data will change anytime soon, so I would keep
> relying on nested dictionaries via collections, which is standard Python.

Acknowledged.

https://codereview.webrtc.org/2717973006/diff/1/webrtc/modules/audio_processi...
webrtc/modules/audio_processing/test/py_quality_assessment/apm_quality_assessment-export.py:116:
args.output_dir, 'cfg-*', 'input-*', 'noise-*', '*', 'score-*.txt')
On 2017/03/09 14:45:01, AleBzk wrote:
> On 2017/03/09 12:06:05, aleloi wrote:
> > This should filter out scores like these, right:
> > 
> > /path/to/cfg-default/input-1/noise-0/something/score-1.txt
> > 
> > Are we certain that os.path always works with globs?
> 
> Thanks for asking. In this case * is not treated as wildcard, I'm using
> os.path.join() to build the search pattern passed to glob.iglob(). The latter
> will then treat starts in src_path as wildcards. Briefly, os.path.join() is a
> mere string concatenation.

Acknowledged.

https://codereview.webrtc.org/2717973006/diff/1/webrtc/modules/audio_processi...
webrtc/modules/audio_processing/test/py_quality_assessment/apm_quality_assessment-export.py:137:
scores[score_name][config_name][input_name][noise_name][noise_params] = {
On 2017/03/09 14:45:01, AleBzk wrote:
> On 2017/03/09 12:06:05, aleloi wrote:
> > I'm not convinced about the nested dictionary. It's neat, but is it
necessary?
> > Can the config be a tuple instead? scores[(score, cfg, inp, ...)] = {...}
> 
> That's a possible alternative - using frozenset() since keys must be
immutable.
> But... (see below)
> 
> > 
> > If constructing the table in the dependent CL is easier with the grouping
from
> > the nested dict, or if it's a common pattern it should stay.
> 
> ... you're right: constructing the table does take advantage from having a
> nested structure.
> It is by far more efficient. As pointed out in one of my previous comments, a
> clean way to go would be relying on pandas DataFrames exploiting the indices
you
> get for free. Which is also good to decouple the way data is organized from
the
> way it's accessed for rendering purpose. But relying on third-party libraries
> can be too much for the APM QA tool at this stage.

Acknowledged.

commit-bot: I haz the power

CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.webrtc.org/2717973006/40001

3 years, 8 months ago (2017-04-03 15:36:15 UTC) #10

commit-bot: I haz the power

Committed patchset #3 (id:40001) as https://chromium.googlesource.com/external/webrtc/+/29e3330139ff1dc070aad94e31955a006038d3e4

3 years, 8 months ago (2017-04-03 15:39:10 UTC) #13

AleBzk

3 years, 8 months ago (2017-04-03 15:46:49 UTC) #14

Message was sent while issue was closed.

A revert of this CL (patchset #3 id:40001) has been created in
https://codereview.webrtc.org/2791293002/ by alessiob@webrtc.org.

The reason for reverting is: Issue with rebasing from depending CL.

Expand Messages | Collapse Messages | Show Generated Messages | Hide Generated Messages