DAOS-19150 bio: add percentage_used to nvme stats #18486
johannlombardi wants to merge 14 commits into
Report percentage_used via the nvme health metrics. Signed-off-by: Johann Lombardi <johann.lombardi@hpe.com>
|
Ticket title is 'Add NVMe Controller Lifetime Percentage Used to Health Statistics' |
|
Test stage Build on EL 8 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-18486/1/execution/node/269/log |
|
Test stage Build on EL 9 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-18486/1/execution/node/263/log |
|
Test stage Build on Leap 15 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-18486/1/execution/node/344/log |
|
Test stage Build on EL 8 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-18486/2/execution/node/280/log |
|
Test stage Build on EL 9 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-18486/2/execution/node/273/log |
|
Test stage Build on Leap 15 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-18486/2/execution/node/344/log |
Signed-off-by: Johann Lombardi <johann.lombardi@hpe.com>
tanabarr
left a comment
The health stats printers in the Go code need to be updated, and possibly the metrics side as well. Would you like me to add the missing parts, @johannlombardi?
|
Test stage Functional Hardware Medium MD on SSD completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-18486/3/testReport/ |
Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
Features: control metrics Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
|
Test stage Functional Hardware Medium MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos/job/PR-18486/5/display/redirect |
Test-tag: pr control hw,medium,test_nvme_telemetry_metrics Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
|
Test stage Functional Hardware Medium MD on SSD completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-18486/7/testReport/ |
// versions:
// protoc-gen-go v1.34.1
// protoc v3.5.0
// protoc v3.14.0
Judging from the generated outputs it is not a problem. But for the record: can we change the version of these generators willy-nilly or should we strive to stick to one agreed version?
We should stick to the one in go.mod. I will fix it.
The instructions for developing Go code are at https://docs.daos.io/latest/dev/development/. libprotoc isn't tracked in go.mod, so the rule of thumb is to use the distro version, which is 3.14.0 in Rocky >= 9.7. Does that sound okay?
daltonbohning
left a comment
ftest changes LGTM
|
Test stage Functional Hardware Medium MD on SSD completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-18486/8/testReport/ |
Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
Features: control Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
|
@daltonbohning do the changes look sufficient for coverage in ftest? I got stuck adding extra coverage because I didn't know which value limits to apply without introducing intermittent failures. |
|
The implementation needs fixing: health stats are not being updated. |
Features: control Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
|
Test stage Functional on EL 9 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-18486/9/execution/node/1041/log |
daltonbohning
left a comment
ftest LGTM. Some of the changes were backed out so just let me know if you still need me to look into something else ftest-related
|
Test stage Unit Test with memcheck completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-18486/10/testReport/ |
I couldn't get the extra changes that added coverage to work, mainly because of the range requirements and the difficulty of choosing an upper limit that would work for all permutations. If you think the coverage is okay in its existing form, I will leave it as is. If you think other changes need to be made, please let me know. |
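For reference, one possible way to choose limits that do not depend on how worn a given drive is: the NVMe specification defines Percentage Used as a one-byte field that may exceed 100 and saturates at 255, so a check against the full 0 to 255 range should hold on any device. The sketch below only illustrates that idea. It is not the ftest code, the names `percentageUsedInRange`, `minPercentageUsed` and `maxPercentageUsed` are hypothetical, and it assumes the metric is exported as the raw spec value.

```go
package main

import "fmt"

// Bounds taken from the NVMe definition of Percentage Used: a one-byte
// value that is allowed to exceed 100 and saturates at 255.
// These constant names are hypothetical, not DAOS identifiers.
const (
	minPercentageUsed = 0
	maxPercentageUsed = 255
)

// percentageUsedInRange reports whether a metric sample lies inside the
// range the spec allows. It takes int64 on the assumption that the
// telemetry layer hands back a wider integer than the raw byte.
func percentageUsedInRange(v int64) bool {
	return v >= minPercentageUsed && v <= maxPercentageUsed
}

func main() {
	// Made-up samples: new drive, worn drive, past rated endurance,
	// saturated, and two values that can only come from a bug.
	for _, v := range []int64{0, 100, 180, 255, 256, -1} {
		fmt.Printf("%d in range: %t\n", v, percentageUsedInRange(v))
	}
}
```

A tighter upper limit such as 100 would fail on a drive that has passed its rated endurance, which is the kind of intermittent failure mentioned above.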
Features: control Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
…spdk-stats Features: control Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
|
Test stage Unit Test with memcheck completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-18486/11/testReport/ |
Add the (lifetime) percentage_used NVMe statistic to the nvme_stats
structure, which is returned by the dmg storage scan --nvme-health and
dmg storage query list-devices --health commands. This value is a
SMART stat and is returned through an SPDK health page fetch. Add the same
value to the list of stored NVMe health metrics.
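To illustrate where the value comes from: in the NVMe SMART / Health Information log page (512 bytes), Percentage Used is the single byte at offset 5. The following is a minimal sketch of pulling that byte out of a raw page buffer. It is not the code in this PR, which obtains the value through SPDK in C, and the helper name `PercentageUsed` and the constants are assumptions made for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// smartLogPageSize is the size of the NVMe SMART / Health Information
// log page in bytes.
const smartLogPageSize = 512

// percentageUsedOffset is the byte offset of the Percentage Used field
// within that log page.
const percentageUsedOffset = 5

// PercentageUsed is a hypothetical helper (not a DAOS or SPDK function)
// that returns the Percentage Used byte from a raw SMART log page.
func PercentageUsed(page []byte) (uint8, error) {
	if len(page) < smartLogPageSize {
		return 0, errors.New("short SMART log page")
	}
	return page[percentageUsedOffset], nil
}

func main() {
	// Build a fake page with a made-up sample value for demonstration.
	page := make([]byte, smartLogPageSize)
	page[percentageUsedOffset] = 37

	pct, err := PercentageUsed(page)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("percentage_used=%d%%\n", pct)
}
```

Because the field is a vendor estimate of consumed endurance, values above 100 are legal and do not by themselves indicate a failed device.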