Created: 3 years, 2 months ago by rnephew (Reviews Here)
Modified: 3 years, 2 months ago
CC: catapult-reviews_chromium.org
Target Ref: refs/heads/master
Project: catapult
Visibility: Public.
Description: [Soundwave] Add tool that displays alerts and related noise and bug data.
BUG=chromium:769027
Patch Set 1
Patch Set 2 : [Soundwave] Add tool that displays alerts and related noise and bug data.
Patch Set 3 : Fix error when metric is None when generating noise data

Messages
Total messages: 16 (2 generated)
rnephew@chromium.org changed reviewers: + charliea@chromium.org, nednguyen@google.com, perezju@chromium.org, sullivan@chromium.org
On 2017/09/26 21:45:13, rnephew (Reviews Here) wrote: The JSON file is example output from running the tool: ./alert_analyzer -b system_health.memory_desktop --days=30
On 2017/09/26 21:46:17, rnephew (Reviews Here) wrote:
> On 2017/09/26 21:45:13, rnephew (Reviews Here) wrote:
> > The json file is example output from the tool running ./alert_analyzer -b system_health.memory_desktop --days=30

You should file a tracking bug for this effort, Randy. I suggest using crbug so that we can express bug dependencies easily.
Description was changed from ========== [Soundwave] Add tool that displays alerts and related noise and bug data. ========== to ========== [Soundwave] Add tool that displays alerts and related noise and bug data. BUG=chromium:769027 ==========
> You should file a tracking bug for this effort, Randy. I suggest using crbug so that we can express bug dependencies easily.

Done.
I feel it's probably a bit too early to start writing code, and we should start with a design doc instead.

What do we want this script to produce? Which specific questions are we looking to answer? Which data do we need to collect to answer those questions?

Some high-level comments on the code given:

- We're probably going to need a ton of statistics/numerical analysis, so let's go for numpy/pandas and not reinvent the wheel (e.g. no _GetVariance).

- I've found that JSON is not a great format for these kinds of analyses. Ideally we should get all the data needed for a particular analysis and dump it into one or more CSV files. Those offer great flexibility when working with pandas, Google Sheets, and internal tools too.
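As a rough illustration of the "don't reinvent the wheel" point, here is a minimal sketch that computes variance and coefficient of variation with library calls instead of a hand-rolled _GetVariance. It uses Python's standard `statistics` module so that it runs without extra dependencies; `numpy.var` and `numpy.std` compute the same population statistics by default. The function name `noise_summary` and the sample values are made up for this example and do not come from the patch.

```python
import statistics


def noise_summary(values):
    """Return mean, population variance, and coefficient of variation (CV).

    Hypothetical helper for illustration; not the tool's actual code.
    """
    mean = statistics.mean(values)
    variance = statistics.pvariance(values, mu=mean)
    stdev = statistics.pstdev(values, mu=mean)
    # CV is undefined when the mean is zero, so report NaN in that case.
    cv = stdev / mean if mean else float("nan")
    return {"mean": mean, "variance": variance, "cv": cv}


# Made-up sample values standing in for a metric's historic data points.
samples = [10.0, 12.0, 11.0, 9.0, 13.0]
summary = noise_summary(samples)
print(summary)
```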
On 2017/09/27 08:10:49, perezju wrote:
> I feel it's probably a bit too early to start writing code, and we should start with a design doc instead.
>
> What do we want this script to produce? Which specific questions are we looking to answer? Which data we need to collect to answer those questions?
>
> Some high level comments on the code given:
>
> - we're probably going to need a ton of statistics/numerical analysis, so let's go for numpy/pandas and not reinvent the wheel (e.g. no _GetVariance).
>
> - I've found that json is not a great format for these kinds of analyses, ideally we should get all data needed for a particular analysis and dump it all into one or more csv files. Those offer great flexibility working with pandas, Google Sheets, and internal tools too.

+1 to what Juan said. My flow is usually: Prototype to prove a workable concept --> Design --> Design approval --> CLs
The major problem with writing a design first is that you have to have some semblance of an idea of what you are going for. This is more of an investigation tool to look at alerts, and the bug and noise data on those alerts. No one has any real idea how to do metric noise analysis, me included. This is intended as a way to look at existing alerts and see if some sort of pattern emerges. I've noticed that alerts tend to happen on metrics that have a relatively low CV. That's not surprising; a low coefficient of variation means that there is less variation in the historic data, after all. Without this tool, though, I wouldn't have known that was a trend. It still might not be; my sample size is still a bit small to draw any conclusions. bit.ly/chromium-project-soundwave would work as a high-level design document that explains what goals this is moving towards, but I can write a short doc à la go/design-sketch for this tool if you guys want.

There's a chicken-and-egg problem. You have to have some idea where the goal line is to write a design document... but you can't run towards the goal line until you have spotted it.

If you also look at the data in the JSON file, I don't see a good way to store that as a CSV. There is alert data, bug data, and noise data inside of it. I could come up with a CSV schema for it, but ordered JSON just seems like an easier way to interact with it.

I'll work on switching to numpy/pandas.
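To make the JSON-versus-CSV trade-off concrete, here is a minimal sketch of one possible flattening scheme: nested alert, bug, and noise fields become dotted column names in a single CSV. The record shape, key names, and values below are hypothetical, since the tool's real output schema is not shown in this thread.

```python
import csv
import io

# Hypothetical records; the real JSON schema is an assumption here.
alerts = [
    {"metric": "metric_a",
     "bug": {"id": 1, "status": "Fixed"},
     "noise": {"mean": 11.0, "cv": 0.13}},
    {"metric": "metric_b",
     "bug": {},  # alert with no associated bug
     "noise": {"mean": 5.0, "cv": 0.02}},
]


def flatten(record, prefix=""):
    """Turn nested dicts into one flat dict with dotted column names."""
    flat = {}
    for key, value in record.items():
        name = prefix + key
        if isinstance(value, dict):
            flat.update(flatten(value, name + "."))
        else:
            flat[name] = value
    return flat


rows = [flatten(alert) for alert in alerts]
# Use the union of all keys so rows with missing fields still fit.
columns = sorted({column for row in rows for column in row})

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=columns, restval="")
writer.writeheader()
writer.writerows(rows)
csv_text = buf.getvalue()
print(csv_text)
```

The cost of this approach is visible in the second row: an alert without a bug leaves empty cells, which is part of why nested JSON can feel more natural for this data.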
On 2017/09/27 15:14:59, rnephew (Reviews Here) wrote:
> The major problem with writing a design first is that you have to have some semblance of an idea on what you are going for.

This is a valid case of not needing a design doc. I recall we have an experimental/ folder for these use cases?
lgtm so you can iterate on this. For the next step, I suggest:
1) Sync up with Juan on the directions. He has a lot of experience with these analyses, so he can probably share valuable insights.
2) Improve the script so that you don't need to hit the real network every time you iterate on your analysis. I suggest making it possible to fetch data from the network once and reuse it many times when iterating.
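The fetch-once, reuse-many-times idea in step 2 can be sketched roughly as follows. This is only an illustration under stated assumptions: `load_or_fetch` and `fake_fetch` are hypothetical names, and the fake fetcher stands in for whatever network call the real script makes.

```python
import json
import os
import tempfile


def load_or_fetch(cache_path, fetch):
    """Return cached JSON data if present; otherwise fetch and cache it."""
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            return json.load(f)
    data = fetch()
    with open(cache_path, "w") as f:
        json.dump(data, f, indent=2)
    return data


calls = []


def fake_fetch():
    """Stand-in for a real network request (hypothetical payload)."""
    calls.append(1)
    return {"alerts": [{"metric": "metric_a"}]}


cache_path = os.path.join(tempfile.mkdtemp(), "alerts.json")
first = load_or_fetch(cache_path, fake_fetch)   # fetches and writes the cache
second = load_or_fetch(cache_path, fake_fetch)  # served from the cache file
print(len(calls))
```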
On 2017/09/27 18:47:43, nednguyen wrote:
> lgtm so you can iterate on this. For the next step, I suggest:
> 1) Sync up with Juan on the directions. He has a lot of experience with these analyses, so he can probably share valuable insights.
> 2) Improve the script so that you don't need to hit the real network every time you iterate on your analysis. I suggest making it possible to fetch data from the network once and reuse it many times when iterating.

Part 2 is exactly the purpose of this script. I can pull down the data and reuse it over and over again, as it is saved in a JSON file.
On 2017/09/27 18:57:59, rnephew (Reviews Here) wrote:
> Part 2 is exactly the purpose of this script. I can pull down the data and reuse it over and over again, as it is saved in a json file.

Which script analyzes the local JSON file? I don't see alert_analyzer.py's command-line args supporting an existing JSON file as input.
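One way the script could accept an existing JSON file is sketched below. The `-b` and `--days` options appear earlier in this thread; the long name `--benchmark`, the default of 30 days, the `--input-json` flag, and the `needs_network` helper are all assumptions made for illustration and are not options the tool is known to have.

```python
import argparse


def build_parser():
    """Build a parser with a hypothetical option for reusing saved data."""
    parser = argparse.ArgumentParser(prog="alert_analyzer")
    parser.add_argument("-b", "--benchmark",
                        help="benchmark to pull alerts for")
    parser.add_argument("--days", type=int, default=30,
                        help="how many days of alerts to look at")
    # Hypothetical flag: reuse a previously saved JSON file.
    parser.add_argument("--input-json", default=None,
                        help="path to saved output; skips the network fetch")
    return parser


def needs_network(args):
    """Only hit the network when no saved JSON file was supplied."""
    return args.input_json is None


args = build_parser().parse_args(["--input-json", "alerts.json"])
fresh = build_parser().parse_args(
    ["-b", "system_health.memory_desktop", "--days=30"])
print(needs_network(args), needs_network(fresh))
```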
On 2017/09/27 19:02:45, nednguyen wrote:
> Which script analyze local json file? I don't see alert_analyzer.py's commandline args support taking in an existing json file.

I have not created any helper script for analyzing the data it pulls down and processes. I think that maybe I should rename it from 'alert_analyzer' to 'alert_processor' or something similar, since it's not really analyzing anything, just pulling data and generating noise data for the metrics involved in the alerts.
On 2017/09/27 19:07:18, rnephew (Reviews Here) wrote:
> I have not created any helper script for analyzing the data it pulls down and processes. I think that maybe I should rename it from 'alert_analyzer' to 'alert_processor' or something similar, since its not really analyzing anything just pulling data and generating noise data for the metrics involved in the alerts.

fetch_perf_data (our convention is that scripts don't have a .py suffix)
On 2017/09/27 19:07:18, rnephew (Reviews Here) wrote:
> I have not created any helper script for analyzing the data it pulls down and processes. I think that maybe I should rename it from 'alert_analyzer' to 'alert_processor' or something similar, since its not really analyzing anything just pulling data and generating noise data for the metrics involved in the alerts.

Since Rietveld is going read-only soon, we should move this discussion to the email Juan started. I will be moving this over to Gerrit soon.