Issue 2656493002: Register metric Histogram names. (Closed)

Created:
3 years, 11 months ago by benjhayden

Modified:
3 years, 2 months ago

Reviewers:

CC:
catapult-reviews_chromium.org, tracing-review_chromium.org

Target Ref:
refs/heads/master

Project:
catapult

Visibility:
Public.


Description

Register metric Histogram names.

Currently, if anything at all goes wrong on the long and twisty path to computing metrics, then the value is simply missing. In order to figure out why it's missing, we need to dig manually into unstructured logs.

Previously, telemetry could sometimes produce FailureValues, but FailureValues are specified at the wrong granularity, don't actually solve the problem, and aren't produced at every level of the long and twisty path.

If metrics register the names of the values that they produce, then, in the case of failure, each step of the long and twisty path can still produce all of the values that should have been produced.

Failure Histograms are the new FailureValues. The secret is that Failure Histograms are just Histograms with a FailureInfo diagnostic. Everything that applies to Histograms in the successful case also applies to Failure Histograms, e.g. TelemetryInfo, BuildbotInfo, DeviceInfo, etc.

Since Failure Histograms have the same names as successful Histograms, they can be surfaced on the dashboard in the timeseries charts. This is a huge improvement over the status quo, in which charts often have huge unhelpful gaping holes.

FailureInfo Diagnostics can also be displayed beautifully on sheriff-o-matic and aggregated to compute long-term big data statistics about failure modes.

BUG=catapult:#3076
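The core idea of the description, that a metric declares its Histogram names up front so that any failing step can still emit one Histogram per declared name, can be sketched in Python. This is a minimal, hypothetical model and not the code in this CL: `register_metric`, `run_metric`, `failure_histograms`, the Python function `all_histogram_names`, the `'failure'` diagnostic key, and the `FailureInfo` fields (`stage`, `message`) are all illustrative assumptions. The real change touches `metric_registry.html` and adds `all_histogram_names.py`, whose contents are not shown on this page.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class FailureInfo:
    # Hypothetical diagnostic: which step failed and why.
    stage: str
    message: str


@dataclass
class Histogram:
    name: str
    samples: List[float] = field(default_factory=list)
    diagnostics: Dict[str, object] = field(default_factory=dict)

    @property
    def is_failure(self) -> bool:
        # A Failure Histogram is an ordinary Histogram that carries a
        # FailureInfo diagnostic (stored under the hypothetical 'failure' key).
        return isinstance(self.diagnostics.get('failure'), FailureInfo)


@dataclass
class RegisteredMetric:
    func: Callable[[object], List[Histogram]]
    histogram_names: List[str]


# Hypothetical registry: metric name -> metric function + the names it declares.
_METRICS: Dict[str, RegisteredMetric] = {}


def register_metric(name: str, histogram_names: List[str]):
    """Decorator recording a metric together with the Histogram names it produces."""
    def decorator(func):
        _METRICS[name] = RegisteredMetric(func, list(histogram_names))
        return func
    return decorator


def all_histogram_names(metric_names: List[str]) -> List[str]:
    """Every Histogram name that the given metrics promise to produce."""
    return [n for m in metric_names for n in _METRICS[m].histogram_names]


def failure_histograms(names: List[str], stage: str, message: str) -> List[Histogram]:
    """One empty Histogram per name, each tagged with a FailureInfo diagnostic."""
    return [Histogram(n, diagnostics={'failure': FailureInfo(stage, message)})
            for n in names]


def run_metric(metric_name: str, model: object) -> List[Histogram]:
    metric = _METRICS[metric_name]
    try:
        produced = metric.func(model)
    except Exception as e:  # the metric crashed: every declared value fails
        return failure_histograms(metric.histogram_names, 'metric', repr(e))
    # The metric ran but may have silently skipped some declared values.
    produced_names = {h.name for h in produced}
    missing = [n for n in metric.histogram_names if n not in produced_names]
    return produced + failure_histograms(missing, 'metric', 'value not produced')


# Usage: 'foo' is computed normally; 'bar' becomes a Failure Histogram
# instead of silently disappearing.
@register_metric('sampleMetric', ['foo', 'bar'])
def sample_metric(model):
    return [Histogram('foo', samples=[model['foo']])]


results = run_metric('sampleMetric', {'foo': 1.0})
print([(h.name, h.is_failure) for h in results])  # prints [('foo', False), ('bar', True)]

# An earlier step (e.g. loading the trace) can fail before any metric runs
# and still emit every expected name, because the names were registered.
upstream = failure_histograms(all_histogram_names(['sampleMetric']),
                              'import', 'trace could not be loaded')
print([h.name for h in upstream])  # prints ['foo', 'bar']
```

Because a Failure Histogram keeps the same name as its successful counterpart, a consumer such as a timeseries chart can plot a marked failure point at that name instead of leaving a gap.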

Stats: +218 lines, -35 lines

M	telemetry/telemetry/internal/story_runner.py	2 chunks	+9 lines, -1 line
M	telemetry/telemetry/web_perf/timeline_based_measurement.py	3 chunks	+11 lines, -6 lines
A	tracing/tracing/metrics/all_histogram_names.py	1 chunk	+33 lines, -0 lines
A	tracing/tracing/metrics/all_histogram_names_cmdline.html	1 chunk	+23 lines, -0 lines
M	tracing/tracing/metrics/blink/gc_metric.html	1 chunk	+3 lines, -1 line
M	tracing/tracing/metrics/cpu_process_metric.html	1 chunk	+3 lines, -1 line
M	tracing/tracing/metrics/metric_map_function.html	1 chunk	+47 lines, -4 lines
M	tracing/tracing/metrics/metric_registry.html	1 chunk	+19 lines, -1 line
M	tracing/tracing/metrics/metric_registry_test.html	4 chunks	+14 lines, -3 lines
M	tracing/tracing/metrics/sample_metric.html	1 chunk	+3 lines, -1 line
M	tracing/tracing/metrics/system_health/clock_sync_latency_metric.html	1 chunk	+3 lines, -1 line
M	tracing/tracing/metrics/system_health/cpu_time_metric.html	1 chunk	+2 lines, -1 line
M	tracing/tracing/metrics/system_health/estimated_input_latency_metric.html	1 chunk	+3 lines, -1 line
M	tracing/tracing/metrics/system_health/loading_metric.html	1 chunk	+3 lines, -1 line
M	tracing/tracing/metrics/system_health/long_tasks_metric.html	1 chunk	+2 lines, -1 line
M	tracing/tracing/metrics/system_health/memory_metric.html	1 chunk	+2 lines, -1 line
M	tracing/tracing/metrics/system_health/power_metric.html	1 chunk	+3 lines, -1 line
M	tracing/tracing/metrics/system_health/responsiveness_metric.html	1 chunk	+2 lines, -1 line
M	tracing/tracing/metrics/system_health/webview_startup_metric.html	1 chunk	+3 lines, -1 line
M	tracing/tracing/metrics/tracing_metric.html	1 chunk	+3 lines, -1 line
M	tracing/tracing/metrics/v8/execution_metric.html	1 chunk	+3 lines, -1 line
M	tracing/tracing/metrics/v8/gc_metric.html	1 chunk	+3 lines, -1 line
M	tracing/tracing/metrics/v8/runtime_stats_metric.html	1 chunk	+6 lines, -2 lines
M	tracing/tracing/metrics/v8/v8_metrics.html	1 chunk	+3 lines, -1 line
M	tracing/tracing/ui/side_panel/metrics_side_panel_test.html	1 chunk	+3 lines, -1 line
M	tracing/tracing_project.py	2 chunks	+9 lines, -1 line

Messages

Total messages: 5 (5 generated)


benjhayden

Description was changed from ========== Register metric Histogram names. BUG=catapult:#3076 ========== to ========== Register metric ...

3 years, 11 months ago (2017-01-23 23:19:24 UTC) #4

benjhayden

3 years, 11 months ago (2017-01-24 05:20:07 UTC) #5

Description was changed from

==========
Register metric Histogram names.

Currently, if anything at all goes wrong on the long and twisty path to
computing metrics, then the value is simply missing. In order to figure out
why it's missing, we need to dig manually into unstructured logs.

Previously, telemetry could sometimes produce FailureValues, but FailureValues
are specified at the wrong granularity, and don't actually solve the problem,
and aren't produced at every level of the long and twisty path.

If metrics register the names of the values that they produce, then, in the
event of failure, each step of the long and twisty path can still produce all
of the values that should have been produced.

Failure Histograms are the new FailureValues. The secret is that Failure
Histograms are just Histograms, with a FailureInfo diagnostic. Everything that
applies to Histograms in the successful case also applies to Failure
Histograms, i.e. TelemetryInfo, BuildbotInfo, DeviceInfo, etc.

Since Failure Histograms have the same names as successful Histograms, they
can be surfaced on the dashboard in the timeseries charts. This is a huge
improvement over the status quo, in which charts often have huge unhelpful
gaping holes.

FailureInfo Diagnostics can also be displayed beautifully on sheriff-o-matic
and aggregated to compute long-term big data statistics about failure modes.

BUG=catapult:#3076
==========

to

==========
Register metric Histogram names.

Currently, if anything at all goes wrong on the long and twisty path to
computing metrics, then the value is simply missing. In order to figure out
why it's missing, we need to dig manually into unstructured logs.

Previously, telemetry could sometimes produce FailureValues, but FailureValues
are specified at the wrong granularity, and don't actually solve the problem,
and aren't produced at every level of the long and twisty path.

If metrics register the names of the values that they produce, then, in the
case of failure, each step of the long and twisty path can still produce all
of the values that should have been produced.

Failure Histograms are the new FailureValues. The secret is that Failure
Histograms are just Histograms with a FailureInfo diagnostic. Everything that
applies to Histograms in the successful case also applies to Failure
Histograms, i.e. TelemetryInfo, BuildbotInfo, DeviceInfo, etc.

Since Failure Histograms have the same names as successful Histograms, they
can be surfaced on the dashboard in the timeseries charts. This is a huge
improvement over the status quo, in which charts often have huge unhelpful
gaping holes.

FailureInfo Diagnostics can also be displayed beautifully on sheriff-o-matic
and aggregated to compute long-term big data statistics about failure modes.

BUG=catapult:#3076
==========
