Chromium Code Reviews

Issue 2656493002: Register metric Histogram names. (Closed)

Created:
3 years, 11 months ago by benjhayden
Modified:
3 years, 2 months ago
Reviewers:
CC:
catapult-reviews_chromium.org, tracing-review_chromium.org
Target Ref:
refs/heads/master
Project:
catapult
Visibility:
Public.

Description

Register metric Histogram names.

Currently, if anything at all goes wrong on the long and twisty path to
computing metrics, then the value is simply missing. In order to figure out
why it's missing, we need to dig manually into unstructured logs.

Previously, telemetry could sometimes produce FailureValues, but FailureValues
are specified at the wrong granularity, and don't actually solve the problem,
and aren't produced at every level of the long and twisty path.

If metrics register the names of the values that they produce, then, in the
case of failure, each step of the long and twisty path can still produce all
of the values that should have been produced.

Failure Histograms are the new FailureValues. The secret is that Failure
Histograms are just Histograms with a FailureInfo diagnostic. Everything that
applies to Histograms in the successful case also applies to Failure
Histograms, i.e. TelemetryInfo, BuildbotInfo, DeviceInfo, etc.

Since Failure Histograms have the same names as successful Histograms, they
can be surfaced on the dashboard in the timeseries charts. This is a huge
improvement over the status quo, in which charts often have huge unhelpful
gaping holes.

FailureInfo Diagnostics can also be displayed beautifully on sheriff-o-matic
and aggregated to compute long-term big data statistics about failure modes.

BUG=catapult:#3076
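The registration idea in the description can be illustrated with a minimal Python sketch. This is not the code in this CL and not the catapult API: `Histogram`, `register`, `run_metric`, the `'failure'` diagnostic key, and the sample metric are all hypothetical names chosen for illustration. The sketch only shows the principle that a metric which declares its histogram names up front can still yield one placeholder histogram per declared name when it fails.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Histogram:
    """Hypothetical stand-in for a tracing Histogram."""
    name: str
    samples: List[float] = field(default_factory=list)
    diagnostics: Dict[str, object] = field(default_factory=dict)


# Hypothetical registry: metric name -> (metric function, declared histogram names).
_REGISTRY: Dict[str, Tuple[Callable, Tuple[str, ...]]] = {}


def register(metric_name: str, histogram_names: List[str]):
    """Decorator that records which histogram names a metric produces."""
    def decorator(fn: Callable) -> Callable:
        _REGISTRY[metric_name] = (fn, tuple(histogram_names))
        return fn
    return decorator


def run_metric(metric_name: str, trace: dict) -> List[Histogram]:
    """Run a metric; on failure, emit one failure histogram per declared name."""
    fn, names = _REGISTRY[metric_name]
    try:
        return fn(trace)
    except Exception as exc:  # Any failure becomes structured data, not a gap.
        failure = {'metric': metric_name, 'reason': repr(exc)}
        # Same names as the successful case, no samples, plus a failure diagnostic.
        return [Histogram(name, diagnostics={'failure': failure})
                for name in names]


@register('sample_metric', ['foo'])
def sample_metric(trace: dict) -> List[Histogram]:
    # Raises KeyError when the trace has no 'events' entry.
    return [Histogram('foo', samples=[float(len(trace['events']))])]


ok = run_metric('sample_metric', {'events': [1, 2, 3]})
bad = run_metric('sample_metric', {})  # Missing 'events' triggers the failure path.
```

Because `bad` carries the same histogram name as `ok`, a consumer that keys on names (such as a timeseries chart) would see an entry in both cases, and the diagnostic records why the samples are missing.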

Patch Set 1 : .

Patch Set 2 : rebase

Patch Set 3 : validateHistogramNames

Patch Set 4 : rebase

Stats: +218 lines, -35 lines (0 comments on every file)

M telemetry/telemetry/internal/story_runner.py: 2 chunks, +9 lines, -1 line
M telemetry/telemetry/web_perf/timeline_based_measurement.py: 3 chunks, +11 lines, -6 lines
A tracing/tracing/metrics/all_histogram_names.py: 1 chunk, +33 lines, -0 lines
A tracing/tracing/metrics/all_histogram_names_cmdline.html: 1 chunk, +23 lines, -0 lines
M tracing/tracing/metrics/blink/gc_metric.html: 1 chunk, +3 lines, -1 line
M tracing/tracing/metrics/cpu_process_metric.html: 1 chunk, +3 lines, -1 line
M tracing/tracing/metrics/metric_map_function.html: 1 chunk, +47 lines, -4 lines
M tracing/tracing/metrics/metric_registry.html: 1 chunk, +19 lines, -1 line
M tracing/tracing/metrics/metric_registry_test.html: 4 chunks, +14 lines, -3 lines
M tracing/tracing/metrics/sample_metric.html: 1 chunk, +3 lines, -1 line
M tracing/tracing/metrics/system_health/clock_sync_latency_metric.html: 1 chunk, +3 lines, -1 line
M tracing/tracing/metrics/system_health/cpu_time_metric.html: 1 chunk, +2 lines, -1 line
M tracing/tracing/metrics/system_health/estimated_input_latency_metric.html: 1 chunk, +3 lines, -1 line
M tracing/tracing/metrics/system_health/loading_metric.html: 1 chunk, +3 lines, -1 line
M tracing/tracing/metrics/system_health/long_tasks_metric.html: 1 chunk, +2 lines, -1 line
M tracing/tracing/metrics/system_health/memory_metric.html: 1 chunk, +2 lines, -1 line
M tracing/tracing/metrics/system_health/power_metric.html: 1 chunk, +3 lines, -1 line
M tracing/tracing/metrics/system_health/responsiveness_metric.html: 1 chunk, +2 lines, -1 line
M tracing/tracing/metrics/system_health/webview_startup_metric.html: 1 chunk, +3 lines, -1 line
M tracing/tracing/metrics/tracing_metric.html: 1 chunk, +3 lines, -1 line
M tracing/tracing/metrics/v8/execution_metric.html: 1 chunk, +3 lines, -1 line
M tracing/tracing/metrics/v8/gc_metric.html: 1 chunk, +3 lines, -1 line
M tracing/tracing/metrics/v8/runtime_stats_metric.html: 1 chunk, +6 lines, -2 lines
M tracing/tracing/metrics/v8/v8_metrics.html: 1 chunk, +3 lines, -1 line
M tracing/tracing/ui/side_panel/metrics_side_panel_test.html: 1 chunk, +3 lines, -1 line
M tracing/tracing_project.py: 2 chunks, +9 lines, -1 line

Messages

Total messages: 5 (5 generated)
benjhayden
Patchset #1 (id:1) has been deleted
3 years, 11 months ago (2017-01-23 22:49:32 UTC) #1
benjhayden
Patchset #1 (id:20001) has been deleted
3 years, 11 months ago (2017-01-23 22:53:29 UTC) #2
benjhayden
Patchset #1 (id:40001) has been deleted
3 years, 11 months ago (2017-01-23 23:03:34 UTC) #3
benjhayden
Description was changed from ========== Register metric Histogram names. BUG=catapult:#3076 ========== to ========== Register metric ...
3 years, 11 months ago (2017-01-23 23:19:24 UTC) #4
benjhayden
3 years, 11 months ago (2017-01-24 05:20:07 UTC) #5
Description was changed from

==========
Register metric Histogram names.

Currently, if anything at all goes wrong on the long and twisty path to
computing metrics, then the value is simply missing. In order to figure out
why it's missing, we need to dig manually into unstructured logs.

Previously, telemetry could sometimes produce FailureValues, but FailureValues
are specified at the wrong granularity, and don't actually solve the problem,
and aren't produced at every level of the long and twisty path.

If metrics register the names of the values that they produce, then, in the
event of failure, each step of the long and twisty path can still produce all
of the values that should have been produced.

Failure Histograms are the new FailureValues. The secret is that Failure
Histograms are just Histograms, with a FailureInfo diagnostic. Everything that
applies to Histograms in the successful case also applies to Failure
Histograms, i.e. TelemetryInfo, BuildbotInfo, DeviceInfo, etc.

Since Failure Histograms have the same names as successful Histograms, they
can be surfaced on the dashboard in the timeseries charts. This is a huge
improvement over the status quo, in which charts often have huge unhelpful
gaping holes.

FailureInfo Diagnostics can also be displayed beautifully on sheriff-o-matic
and aggregated to compute long-term big data statistics about failure modes.

BUG=catapult:#3076
==========

to

==========
Register metric Histogram names.

Currently, if anything at all goes wrong on the long and twisty path to
computing metrics, then the value is simply missing. In order to figure out
why it's missing, we need to dig manually into unstructured logs.

Previously, telemetry could sometimes produce FailureValues, but FailureValues
are specified at the wrong granularity, and don't actually solve the problem,
and aren't produced at every level of the long and twisty path.

If metrics register the names of the values that they produce, then, in the
case of failure, each step of the long and twisty path can still produce all
of the values that should have been produced.

Failure Histograms are the new FailureValues. The secret is that Failure
Histograms are just Histograms with a FailureInfo diagnostic. Everything that
applies to Histograms in the successful case also applies to Failure
Histograms, i.e. TelemetryInfo, BuildbotInfo, DeviceInfo, etc.

Since Failure Histograms have the same names as successful Histograms, they
can be surfaced on the dashboard in the timeseries charts. This is a huge
improvement over the status quo, in which charts often have huge unhelpful
gaping holes.

FailureInfo Diagnostics can also be displayed beautifully on sheriff-o-matic
and aggregated to compute long-term big data statistics about failure modes.

BUG=catapult:#3076
==========
