ERROR: Invalid requirement: 'swarmlearning==client': Expected end or semicolon (after name and no valid version specifier) #249

Closed
PNg-HA opened this issue Jul 15, 2024 · 3 comments

Comments

@PNg-HA

PNg-HA commented Jul 15, 2024

Issue description

  • issue description: While running the MNIST example, every SWOP container fails with the error below right after the build step "Step 5/5 : RUN pip3 install /tmp/hpe-swarmcli-pkg/swarmlearning-client-py3-none-manylinux_2_24_x86_64.whl".

  • occurrence - consistent or rare: consistent

  • error messages: "ERROR: Invalid requirement: 'swarmlearning==client': Expected end or semicolon (after name and no valid version specifier)"

  • commands used for starting containers:

SN1: ./scripts/bin/run-sn -d --rm --name=sn1
--network=host-1-net --host-ip=${HOST_1_IP}
--sentinel --sn-p2p-port=${SN_P2P_PORT}
--sn-api-port=${SN_API_PORT}
--key=workspace/mnist/cert/sn-1-key.pem
--cert=workspace/mnist/cert/sn-1-cert.pem
--capath=workspace/mnist/cert/ca/capath
--apls-ip=${APLS_IP}

SN2: ./scripts/bin/run-sn -d --rm --name=sn2
--network=host-2-net --host-ip=${HOST_2_IP}
--sentinel-ip=${SN_1_IP} --sn-p2p-port=${SN_P2P_PORT}
--sn-api-port=${SN_API_PORT} --key=workspace/mnist/cert/sn-2-key.pem
--cert=workspace/mnist/cert/sn-2-cert.pem
--capath=workspace/mnist/cert/ca/capath
--apls-ip=${APLS_IP}

SWOP1: ./scripts/bin/run-swop -d --name=swop1 --network=host-1-net
--sn-ip=${SN_1_IP} --sn-api-port=${SN_API_PORT}
--usr-dir=workspace/mnist/swop --profile-file-name=swop1_profile.yaml
--key=workspace/mnist/cert/swop-1-key.pem
--cert=workspace/mnist/cert/swop-1-cert.pem
--capath=workspace/mnist/cert/ca/capath -e SWOP_KEEP_CONTAINERS=True -e http_proxy= -e https_proxy=
--apls-ip=${APLS_IP}

SWOP2: ./scripts/bin/run-swop -d --name=swop2 --network=host-2-net
--sn-ip=${SN_2_IP} --sn-api-port=${SN_API_PORT}
--usr-dir=workspace/mnist/swop --profile-file-name=swop2_profile.yaml
--key=workspace/mnist/cert/swop-2-key.pem
--cert=workspace/mnist/cert/swop-2-cert.pem
--capath=workspace/mnist/cert/ca/capath -e SWOP_KEEP_CONTAINERS=True -e http_proxy= -e https_proxy=
--apls-ip=${APLS_IP}

SWCI: ./scripts/bin/run-swci --name=swci1 --network=host-1-net
--usr-dir=workspace/mnist/swci --init-script-name=swci-init
--key=workspace/mnist/cert/swci-1-key.pem
--cert=workspace/mnist/cert/swci-1-cert.pem
--capath=workspace/mnist/cert/ca/capath
-e http_proxy= -e https_proxy= --apls-ip=${APLS_IP}
-e SWCI_RUN_TASK_MAX_WAIT_TIME=960

Swarm Learning Version: 2.2

  • Find the docker tag of the Swarm images ( $ docker images | grep hub.myenterpriselicense.hpe.com/hpe_eval/swarm-learning )

OS and ML Platform

  • details of host OS: Ubuntu server 22.04.02 LTS
  • details of ML platform used: Keras (Tensorflow)
  • details of Swarm learning Cluster (Number of machines, SL nodes, SN nodes): 2 machines (1 SL and 1 SN per machine)

Quick Checklist: Respond [Yes/No]

  • APLS server web GUI shows available Licenses? Yes
  • If Multiple systems are used, can each system access every other system?
  • Is Password-less SSH configuration setup for all the systems? No
  • If GPU or other protected resources are used, does the account have sufficient privileges to access and use them?
  • Is the user id a member of the docker group? Yes

Additional notes

  • Are you running documented example without any modification? Yes
  • Add any additional information about use case or any notes which supports for issue investigation:

NOTE: Create an archive with supporting artifacts and attach to issue, whenever applicable.

sn2.txt
sn1.txt
swop2.txt
swop1.txt
swci.txt

@FrancescaFrattini

I get the same error with the CIFAR10 example when I run these commands:
cp -L lib/swarmlearning-client-py3-none-manylinux_2_24_x86_64.whl workspace/cifar10/ml-context/
docker build -t user-ml-env-tf2.7.0 workspace/cifar10/ml-context

@PNg-HA
Author

PNg-HA commented Jul 16, 2024

I searched for the error message on Google and found the same message in posit-dev/rsconnect-python#605. If it is the same problem, I think the real issue lies in the line "RUN pip3 install --upgrade pip" in user_env_tf_build_task.yaml, which upgrades pip to the latest version, and that version is the one that rejects the wheel.
I will try pinning a suitable pip version and report the results later.
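
The wheel's filename hints at why a newer pip might reject it. Wheel filenames follow the pattern name-version-pythontag-abitag-platformtag, so `swarmlearning-client-py3-none-manylinux_2_24_x86_64.whl` parses with `client` in the version slot, which matches the `swarmlearning==client` in the error. As far as the pip changelog indicates, releases from 24.1 onward no longer accept versions that are not valid under PEP 440. The sketch below only illustrates that parsing; `wheel_version_field` and `looks_like_pep440` are hypothetical helper names, and the version check is a rough heuristic, not a full PEP 440 validator.

```shell
# Extract the version field (second dash-separated field) from a wheel filename.
# Hypothetical helper for illustration; uses plain POSIX parameter expansion.
wheel_version_field() {
  base=${1##*/}                 # drop any directory prefix
  base=${base%.whl}             # drop the .whl extension
  rest=${base#*-}               # drop the distribution name and its dash
  printf '%s\n' "${rest%%-*}"   # keep everything up to the next dash
}

# Rough heuristic (assumption): a PEP 440 version starts with a digit,
# optionally preceded by "v". Anything else is treated as invalid.
looks_like_pep440() {
  case $1 in
    [0-9]*|[vV][0-9]*) return 0 ;;
    *) return 1 ;;
  esac
}

whl=/tmp/hpe-swarmcli-pkg/swarmlearning-client-py3-none-manylinux_2_24_x86_64.whl
ver=$(wheel_version_field "$whl")
if looks_like_pep440 "$ver"; then
  echo "version field '$ver' looks like a PEP 440 version"
else
  echo "version field '$ver' is not a PEP 440 version"
fi
```

If this reading is right, the wheel itself is unchanged and only pip's strictness differs between releases, which is why pinning pip is a plausible workaround.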

@PNg-HA
Author

PNg-HA commented Jul 16, 2024

It works. The pip version I used is 24.0. Here is the working user_env_tf_build_task.yaml:
user_env_tf_build_task.txt
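
The attachment's contents are not reproduced in this thread, so the following is only a sketch of one way to apply the same pin. It assumes the task file contains the unpinned line "RUN pip3 install --upgrade pip" quoted earlier, with nothing after `pip` on that line. `pin_pip_upgrade` and `PIP_PIN` are hypothetical names introduced here.

```shell
# Version to pin; 24.0 is the version reported to work in this thread.
PIP_PIN="${PIP_PIN:-24.0}"

# Print the task file with the unpinned "pip3 install --upgrade pip" step
# rewritten to "pip3 install --upgrade pip==<PIP_PIN>". Lines that already
# name a version or install other packages are left untouched.
pin_pip_upgrade() {
  # $1: path to the task file, e.g. user_env_tf_build_task.yaml
  sed "s/\(pip3 install --upgrade\) pip\([[:space:]]*\)\$/\1 pip==${PIP_PIN}\2/" "$1"
}

# Usage (writes a patched copy and leaves the original file as-is):
#   pin_pip_upgrade user_env_tf_build_task.yaml > user_env_tf_build_task.pinned.yaml
```

Writing to a separate file keeps the documented example intact, so the change is easy to diff and revert once a wheel with a valid version is available.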

PNg-HA closed this as completed Jul 17, 2024