NA tcp: Add FI_TAG_RPC support #784

Open: jxiong wants to merge 2 commits into base: master

jxiong commented Feb 13, 2025

This flag lets libfabric drop messages with unmatched tags. It fixes an issue where an endpoint would be choked when it receives a stale reply that libfabric doesn't know how to handle.

Signed-off-by: Jinshan Xiong <jinshanx@google.com>

github-actions bot commented Feb 13, 2025

CLA Assistant Lite bot All contributors have signed the CLA ✍️ ✅

jxiong (Author) commented Feb 13, 2025

ofi pr: ofiwg/libfabric#10783

jxiong (Author) commented Feb 13, 2025

I have read the CLA Document and I hereby sign the CLA

soumagne (Member) commented

I would move the code to https://github.com/mercury-hpc/mercury/blob/master/src/na/na_ofi.c#L3506 and instead do:

#if FI_VERSION_GE(FI_COMPILE_VERSION, FI_VERSION(2, 0))
        if (FI_VERSION_GE(fi_version(), FI_VERSION(2, 0)) &&
            (prov_type == NA_OFI_PROV_TCP)) {
            char *env = getenv("NA_OFI_TCP_TAG_RPC");
            if (env == NULL || atoi(env) != 0) /* Enabled by default */
                hints->ep_attr->mem_tag_format = FI_TAG_RPC;
        }
#endif
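
To make the environment-variable check in the snippet above easier to reason about in isolation, here is a minimal sketch of just that test. The helper name `tag_rpc_enabled` is hypothetical and does not exist in Mercury or libfabric; only `NA_OFI_TCP_TAG_RPC` and the `getenv`/`atoi` logic come from the snippet.

```c
#include <stdlib.h>

/* Hypothetical helper (not part of Mercury): returns 1 when the
 * NA_OFI_TCP_TAG_RPC environment variable leaves FI_TAG_RPC enabled,
 * and 0 when it disables it. */
static int
tag_rpc_enabled(void)
{
    const char *env = getenv("NA_OFI_TCP_TAG_RPC");

    /* Unset means enabled by default; otherwise the value must parse
     * as a non-zero integer under atoi() to stay enabled. */
    return (env == NULL || atoi(env) != 0);
}
```

One consequence of using `atoi` here: a non-numeric value such as `off` or `on` parses as 0, so either one would disable the flag. Only an unset variable or a non-zero number keeps it enabled.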

jxiong added a commit to daos-stack/daos that referenced this pull request Feb 25, 2025
Otherwise it would choke the underlying tcp connection.

** a temporary fix on our end while waiting for the libfabric decision **

internal ticket: b/395943619
ofi PR: https://github.com/ofiwg/libfabric/pull/{10792,10783}
mercury PR: mercury-hpc/mercury#784

Change-Id: I108417b19f02d027e3bf7ee55165edd43c23d15b
Signed-off-by: Jinshan Xiong <jinshanx@google.com>
jxiong added a commit to daos-stack/daos that referenced this pull request Feb 26, 2025
Otherwise it would choke the underlying tcp connection.

** a temporary fix on our end while waiting for the libfabric decision **

internal ticket: b/395943619
ofi PR: https://github.com/ofiwg/libfabric/pull/{10792,10783}
mercury PR: mercury-hpc/mercury#784

Skip-func-hw-test-medium: false
Skip-func-hw-tests-large-md-on-ssd: false
Test-provider: ofi+tcp

Change-Id: I108417b19f02d027e3bf7ee55165edd43c23d15b
Signed-off-by: Jinshan Xiong <jinshanx@google.com>
jolivier23 pushed a commit to daos-stack/daos that referenced this pull request Feb 27, 2025
Otherwise it would choke the underlying tcp connection.

** a temporary fix on our end while waiting for the libfabric decision **

internal ticket: b/395943619
ofi PR: https://github.com/ofiwg/libfabric/pull/{10792,10783}
mercury PR: mercury-hpc/mercury#784

Change-Id: I108417b19f02d027e3bf7ee55165edd43c23d15b

Signed-off-by: Jinshan Xiong <jinshanx@google.com>