DAOS-19150 bio: add percentage_used to nvme stats#18486

Open
johannlombardi wants to merge 14 commits into master from jlo/spdk-stats

@johannlombardi johannlombardi commented Jun 11, 2026

Contributor

Add the (lifetime) percentage_used NVMe statistic to the nvme_stats
structure, which is returned by the dmg storage scan --nvme-health and
dmg storage query list-devices --health commands. This value is a
SMART statistic retrieved through an SPDK health page fetch. Also add
the same value to the list of stored NVMe health metrics.
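
For context, here is a minimal sketch of where this value lives, assuming the standard NVMe SMART / Health Information log page layout, in which Percentage Used is the single byte at offset 5. The helper name `parse_percentage_used` and the sample buffer are hypothetical and only illustrate the field. This PR obtains the value through SPDK rather than by parsing raw bytes like this.

```python
# Hypothetical helper for illustration only; not part of DAOS or SPDK.
PERCENTAGE_USED_OFFSET = 5  # byte offset within the SMART / Health Information log page


def parse_percentage_used(smart_log: bytes) -> int:
    """Return the Percentage Used byte from a raw SMART / Health log page."""
    if len(smart_log) <= PERCENTAGE_USED_OFFSET:
        raise ValueError("SMART log page is too short")
    # The field is a single unsigned byte, so indexing yields an int in 0..255.
    return smart_log[PERCENTAGE_USED_OFFSET]


# Example: a zero-filled 512-byte page with Percentage Used set to 3.
page = bytearray(512)
page[PERCENTAGE_USED_OFFSET] = 3
print(parse_percentage_used(bytes(page)))
```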

Steps for the author:

  • Commit message follows the guidelines.
  • Appropriate Features or Test-tag pragmas were used.
  • Appropriate Functional Test Stages were run.
  • At least two positive code reviews including at least one code owner from each category referenced in the PR.
  • Testing is complete. If necessary, forced-landing label added and a reason added in a comment.

After all prior steps are complete:

  • Gatekeeper requested (daos-gatekeeper added as a reviewer).

Report percentage_used via the nvme health metrics.

Signed-off-by: Johann Lombardi <johann.lombardi@hpe.com>
@johannlombardi johannlombardi requested review from a team as code owners June 11, 2026 20:10

github-actions Bot commented Jun 11, 2026


Ticket title is 'Add NVMe Controller Lifetime Percentage Used to Health Statistics'
Status is 'In Review'
Labels: 'linkedin'
https://daosio.atlassian.net/browse/DAOS-19150

Signed-off-by: Johann Lombardi <johann.lombardi@hpe.com>
Signed-off-by: Johann Lombardi <johann.lombardi@hpe.com>

@NiuYawei NiuYawei left a comment

Contributor

I suppose it requires some control plane changes to display the new metric? @tanabarr

@tanabarr tanabarr left a comment

Contributor

The health stats printers in the Go code need to be updated, and possibly the metrics side as well. Would you like me to add the missing parts, @johannlombardi?

@daosbuild3

Collaborator

Test stage Functional Hardware Medium MD on SSD completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-18486/3/testReport/

tanabarr added 3 commits June 18, 2026 15:32
Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
@tanabarr tanabarr requested review from a team as code owners June 18, 2026 14:58
tanabarr added 2 commits June 18, 2026 19:02
Features: control metrics
Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
@tanabarr tanabarr requested review from a team as code owners June 19, 2026 09:49
@daosbuild3

Collaborator

Test stage Functional Hardware Medium MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos/job/PR-18486/5/display/redirect

Test-tag: pr control hw,medium,test_nvme_telemetry_metrics
Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
@daosbuild3

Collaborator

Test stage Functional Hardware Medium MD on SSD completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-18486/7/testReport/

janekmi previously approved these changes Jun 22, 2026
 // versions:
 // protoc-gen-go v1.34.1
-// protoc v3.5.0
+// protoc v3.14.0

Contributor

Judging from the generated outputs it is not a problem. But for the record: can we change the version of these generators willy-nilly or should we strive to stick to one agreed version?

Contributor

We should stick to the one in go.mod; I will fix it.

Contributor

The instructions for developing Go code are at https://docs.daos.io/latest/dev/development/. libprotoc isn't tracked in go.mod, so the rule of thumb is to use the distro version, which is 3.14.0 in Rocky >= 9.7. Does that sound okay?

@daltonbohning daltonbohning left a comment

Contributor

ftest changes LGTM

@daosbuild3

Collaborator

Test stage Functional Hardware Medium MD on SSD completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-18486/8/testReport/

tanabarr added 2 commits June 24, 2026 12:51
Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
Features: control
Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
@tanabarr

Contributor

@daltonbohning do the changes look sufficient for coverage in ftest? I got stuck adding extra coverage because I didn't know which value limits to apply without introducing intermittent failures.

@tanabarr

Contributor

The implementation needs fixing: health stats are not being updated.

[nabarrot@edaos-15 daos]$ install/bin/dmg storage scan -i -l edaos-[15] --nvme-health| grep ercent
    Percentage Used:0%
    Percentage Used:0%
    Percentage Used:0%
    Percentage Used:0%
[nabarrot@edaos-15 daos]$ install/bin/dmg storage query usage -i -l edaos-[15]
Hosts    SCM-Total SCM-Free SCM-Used NVMe-Total NVMe-Free NVMe-Used
-----    --------- -------- -------- ---------- --------- ---------
edaos-15 1.6 TB    798 GB   50 %     32 TB      16 TB     49 %
[nabarrot@edaos-15 daos]$
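
One way to spot this symptom in a script is to extract the reported values from the command output and flag the case where every device reports zero. The sketch below assumes the `Percentage Used:<n>%` line format shown in the output above. The helper name `percentage_used_values` is hypothetical and not part of dmg or the ftest framework. An all-zero result is only a hint, since new drives can legitimately report 0%.

```python
import re
from typing import List

# Matches lines such as "    Percentage Used:0%" from the --nvme-health output.
PERCENT_USED_RE = re.compile(r"Percentage Used:\s*(\d+)%")


def percentage_used_values(output: str) -> List[int]:
    """Return every Percentage Used value found in the given text (hypothetical helper)."""
    return [int(match) for match in PERCENT_USED_RE.findall(output)]


sample = """\
    Percentage Used:0%
    Percentage Used:0%
    Percentage Used:0%
    Percentage Used:0%
"""

values = percentage_used_values(sample)
if values and all(value == 0 for value in values):
    print(f"all {len(values)} devices report 0% used; check whether stats are refreshed")
```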

Features: control
Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>

@daltonbohning daltonbohning left a comment

Contributor

ftest LGTM. Some of the changes were backed out so just let me know if you still need me to look into something else ftest-related

@daosbuild3

Collaborator

Test stage Unit Test with memcheck completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-18486/10/testReport/

@tanabarr

Contributor

> ftest LGTM. Some of the changes were backed out so just let me know if you still need me to look into something else ftest-related

I couldn't get the extra changes that added coverage to work, mainly because of the range requirements and the difficulty of choosing an upper limit that would work for all permutations. If you think the coverage is okay in its existing form, I will leave it as is. If you think other changes need to be made, please let me know.
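
On the question of an upper limit: the NVMe specification defines Percentage Used as a single byte that is allowed to exceed 100 and saturates at 255, so 0 to 255 is the only bound that holds for every device regardless of wear. A minimal sketch of such a check follows. `percentage_used_in_range` is a hypothetical helper, not an existing ftest function, and whether a bound this loose adds useful coverage is a judgment call.

```python
# Bounds taken from the NVMe definition of the Percentage Used field (one unsigned byte).
PERCENTAGE_USED_MIN = 0
PERCENTAGE_USED_MAX = 255


def percentage_used_in_range(value: int) -> bool:
    """Return True if value is a representable Percentage Used reading (hypothetical helper)."""
    return PERCENTAGE_USED_MIN <= value <= PERCENTAGE_USED_MAX


# A worn drive may report more than 100, which is still a valid reading.
for reading in (0, 49, 100, 130, 255):
    assert percentage_used_in_range(reading)
```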

tanabarr added 2 commits June 28, 2026 05:18
Features: control
Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
…spdk-stats

Features: control
Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
@daosbuild3

Collaborator

Test stage Unit Test with memcheck completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-18486/11/testReport/

6 participants