Permission issue - Failed to open file HASharedWorkPath #4

Open
ddsat opened this issue Jun 14, 2021 · 15 comments
@ddsat commented Jun 14, 2021

Hi, I'm trying your approach but am hitting the issue below at the maven-build step of the maven-ace-build task (Tekton pipeline, OCP 4.6). Any idea what may be missing from your script and how to resolve it? Thanks.

Compile log:
[INFO] ---------------< ace-demo-pipeline:demo-infrastructure >----------------
[INFO] Building demo-infrastructure 0.0.1 [6/7]
[INFO] --------------------------------[ jar ]---------------------------------
[INFO]
[INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ demo-infrastructure ---
[INFO] Changes detected - recompiling the module!
[WARNING] File encoding has not been set, using platform encoding ANSI_X3.4-1968, i.e. build is platform dependent!
[INFO] Compiling 1 source file to /work/ace-demo-pipeline/demo-infrastructure/target/classes
[INFO]
[INFO] --- exec-maven-plugin:3.0.0:exec (create-work-dir) @ demo-infrastructure ---
mqsicreateworkdir: Copying sample server.config.yaml to work directory
1 file(s) copied.
Failed to open file /var/mqsi/registry/utility/HASharedWorkPath with error Permission denied
BIP2113E: IBM App Connect Enterprise internal error: diagnostic information ''Permission denied'', '13', ''/var/mqsi/registry/utility/HASharedWorkPath''.
An internal software error has occurred in IBM App Connect Enterprise. Further messages may indicate the effect of this error on the component.
Shutdown and restart the component. If the problem continues to occur, then restart the system. If the problem still continues to occur contact your IBM support center.
BIP8081E: An error occurred while processing the command.
An error occurred while the command was running; the command has cleaned up and ended.
Use messages prior to this one to determine the cause of the error.
Check for some common problems:
Does the user id have the correct authorities (for example a member of the mqbrkrs group)?
Is any operating system limit set too low to allow the command to run?
Is the environment correctly set up?
Correct the problem and retry the command, otherwise, contact your IBM support center.
[ERROR] Command execution failed.
org.apache.commons.exec.ExecuteException: Process exited with an error: 81 (Exit value: 81)
    at org.apache.commons.exec.DefaultExecutor.executeInternal (DefaultExecutor.java:404)
    at org.apache.commons.exec.DefaultExecutor.execute (DefaultExecutor.java:166)
    at org.codehaus.mojo.exec.ExecMojo.executeCommandLine (ExecMojo.java:982)
    at org.codehaus.mojo.exec.ExecMojo.executeCommandLine (ExecMojo.java:929)
    at org.codehaus.mojo.exec.ExecMojo.execute (ExecMojo.java:457)
    at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo (DefaultBuildPluginManager.java:137)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:210)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:156)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:148)
    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:117)
    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:81)
    at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build (SingleThreadedBuilder.java:56)
    at org.apache.maven.lifecycle.internal.LifecycleStarter.execute (LifecycleStarter.java:128)
    at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:305)
    at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:192)
    at org.apache.maven.DefaultMaven.execute (DefaultMaven.java:105)
    at org.apache.maven.cli.MavenCli.execute (MavenCli.java:957)
    at org.apache.maven.cli.MavenCli.doMain (MavenCli.java:289)
    at org.apache.maven.cli.MavenCli.main (MavenCli.java:193)
    at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:90)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:55)
    at java.lang.reflect.Method.invoke (Method.java:508)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced (Launcher.java:282)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launch (Launcher.java:225)
    at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode (Launcher.java:406)
    at org.codehaus.plexus.classworlds.launcher.Launcher.main (Launcher.java:347)
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:

@tdolby-at-uk-ibm-com (Contributor)

This looks like it might be related to userids and permissions; it's entirely possible that OCP 4.6 is randomising the userid (or group memberships) in some way that breaks the Maven script.

To confirm this, it's probably worth running the "id" and "whoami" commands from the script that runs Maven, as I suspect it may not be a member of the mqbrkrs group. As for fixing the problem, I think the easiest way to fix it would be to run "chmod -R 777 /var/mqsi" as root in the container Dockerfile, as that would certainly eliminate the permissions issues.
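For example, a throwaway diagnostic step (the step name here is just a sketch, reusing the same image as the maven-build step) would be something like:

  - name: check-user
    image: tdolby/experimental:pipeline-travis-build-maven
    script: |
      #!/bin/bash
      # Print the effective userid and group memberships seen inside the pod
      id
      whoami
      # Show ownership and permissions on the ACE registry directory the command needs
      ls -ld /var/mqsi /var/mqsi/registry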

It's also possible to set the MQSI_REGISTRY env var to somewhere else that's just been created with mqsicreateworkdir; I had to do this when running buildah with RedHat's s2i, and the script is here: https://github.com/tdolby-at-uk-ibm-com/ace-s2i-demo/blob/main/ace-build.sh

@ddsat (Author) commented Jun 16, 2021

Thank you @tdolby-at-uk-ibm-com. I tried setting the serviceaccount that executes the pipelinerun to "pipeline" and added it to a group that can RunAsAny, yet the result is still the same.

For the chmod, did you mean rebuilding tdolby/experimental:pipeline-travis-build-maven?

The issue is now in the step below: it fails when executing mqsicreateworkdir in the Maven script under demo-infrastructure. Can you suggest a simple change to the script (based on the current script in this project) to make it work?
  - name: maven-build
    image: tdolby/experimental:pipeline-travis-build-maven
    script: |
      #!/bin/bash
      export LICENSE=accept
      . /opt/ibm/ace-11/server/bin/mqsiprofile
      mkdir /work/maven-output
      cd /work/ace-demo-pipeline
      mvn -Dinstall.work.directory=/work/maven-output/ace-server install
    volumeMounts:
      - mountPath: /work
        name: work

@tdolby-at-uk-ibm-com (Contributor)

Yes, you'd have to build another container on top of the original one, with a chmod in the Dockerfile.
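Roughly something like this, as an untested sketch (the USER root line may be unnecessary if the base image already runs as root):

FROM tdolby/experimental:pipeline-travis-build-maven
# Switch to root so the chmod is permitted, then open up /var/mqsi so any
# randomised userid can use the ACE registry and work path
USER root
RUN chmod -R 777 /var/mqsi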

However, it seems possible that if you add

export MQSI_REGISTRY=/tmp/perms-work-dir/config
mqsicreateworkdir /tmp/perms-work-dir
export MQSI_WORKPATH=$MQSI_REGISTRY

just before the mkdir /work/maven-output line, then it should work. This is what I had to do with buildah in the s2i repo, and so it seemed like a good bet.

Even though I can't recreate your setup, I've tested this solution locally by deliberately making /var/mqsi unreadable and running Maven; originally, it showed the same error as you see, and after those three lines it started working.
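For clarity, the maven-build step from your snippet with those three lines slotted in would look roughly like this (everything else unchanged):

  - name: maven-build
    image: tdolby/experimental:pipeline-travis-build-maven
    script: |
      #!/bin/bash
      export LICENSE=accept
      . /opt/ibm/ace-11/server/bin/mqsiprofile
      # Point the ACE registry and work path at a freshly created, writable directory
      export MQSI_REGISTRY=/tmp/perms-work-dir/config
      mqsicreateworkdir /tmp/perms-work-dir
      export MQSI_WORKPATH=$MQSI_REGISTRY
      mkdir /work/maven-output
      cd /work/ace-demo-pipeline
      mvn -Dinstall.work.directory=/work/maven-output/ace-server install
    volumeMounts:
      - mountPath: /work
        name: work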

@ddsat (Author) commented Jun 16, 2021

Thanks for your good bet. It works. There are some other issues with the TeaTests job, but I can skip that part.

I've got another permission issue at the next step. Could you help me rectify it, please?
STEP-DOCKER-BUILD-AND-PUSH
Error: error resolving dockerfile path: copying dockerfile: open /kaniko/Dockerfile: permission denied

One more thing: I notice you use *deployment.yaml and *service.yaml to deploy the image. If my environment already contains the Platform Navigator and App Connect Dashboard, should it work too? It seems I just need to deploy an integration server (from an image packaged with BAR files). Any advice on this, please?

@tdolby-at-uk-ibm-com (Contributor)

Glad to hear the initial fix works (though ignoring tests might be unwise in the long term!).

From a quick search, it looks like OpenShift with random userids doesn't work with kaniko. It seems that Google and RedHat have different ideas about how this should work, according to GoogleContainerTools/kaniko#681 where the issue has been closed after some investigation.

They suggest using securityContext: runAsUser: 0 (as is also suggested here https://stackoverflow.com/questions/60911478/permission-denied-for-the-kaniko-job-in-openshift-cluster ) at which point presumably the permissions problems go away, so that's definitely worth a try.

While this demo is really aimed at public cloud IKS (and others), OpenShift ought to work outside the Code-Ready Container install that I run locally to try things out. I'll have to try and get buildah working again (broken during my upgrade to ACE v12) along with s2i and see where that gets us.

@tdolby-at-uk-ibm-com (Contributor)

> One more thing: I notice you use *deployment.yaml and *service.yaml to deploy the image. If my environment already contains the Platform Navigator and App Connect Dashboard, should it work too? It seems I just need to deploy an integration server (from an image packaged with BAR files). Any advice on this, please?

In theory, you should be able to run the custom image using the IntegrationServer operator and you shouldn't need to run the deployment/service/route files I'm using.

However, the custom image produced by the pipeline in this repo doesn't build on top of the certified container (because that container is too big to fit in the IBM Cloud Container Registry's free tier of 512MB) and so I fear you'll have some issues trying to go down this path.

I don't think it would be hard to change that, though I haven't tried it myself. You'd need to change tekton/Dockerfile and adjust the BASE_IMAGE passed to kaniko so that it points to the certified container, and it should be possible to put the application into the usual /home/aceuser/ace-server work directory that way so it starts automatically.
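Untested, but the shape of the change would be roughly this; the COPY source path is a placeholder that would need checking against what the earlier pipeline task actually puts into the build context:

# Hypothetical sketch of tekton/Dockerfile rebased onto the certified container
ARG BASE_IMAGE
FROM ${BASE_IMAGE}
# Placeholder source path; put the pipeline-built server into the work
# directory the certified container starts automatically
COPY maven-output/ace-server /home/aceuser/ace-server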

@ddsat (Author) commented Jun 16, 2021

> One more thing: I notice you use *deployment.yaml and *service.yaml to deploy the image. If my environment already contains the Platform Navigator and App Connect Dashboard, should it work too? It seems I just need to deploy an integration server (from an image packaged with BAR files). Any advice on this, please?
>
> In theory, you should be able to run the custom image using the IntegrationServer operator and you shouldn't need to run the deployment/service/route files I'm using.
>
> However, the custom image produced by the pipeline in this repo doesn't build on top of the certified container (because that container is too big to fit in the IBM Cloud Container Registry's free tier of 512MB) and so I fear you'll have some issues trying to go down this path.
>
> I don't think it would be hard to change that, though I haven't tried it myself. You'd need to change tekton/Dockerfile and adjust the BASE_IMAGE passed to kaniko so that it points to the certified container, and it should be possible to put the application into the usual /home/aceuser/ace-server work directory that way so it starts automatically.

Really appreciate your help.

In the UI with Platform Navigator and App Connect Dashboard, I'm using the image cp.icr.io/cp/appc/ace-server-prod.

Would it be possible to use this as the BASE_IMAGE in tekton/Dockerfile while still running the same maven-ace-build task?
And instead of running *deployment.yaml, I will just need to prepare a YAML file that creates an integration server with the new image. *service.yaml isn't necessary, is it? Because the IntegrationServer YAML is the only one I need if I create it from the App Connect Dashboard?

With my environment, do you think I can still go down this path?

@ddsat (Author) commented Jun 16, 2021

> Glad to hear the initial fix works (though ignoring tests might be unwise in the long term!).
>
> From a quick search, it looks like OpenShift with random userids doesn't work with kaniko. It seems that Google and RedHat have different ideas about how this should work, according to GoogleContainerTools/kaniko#681 where the issue has been closed after some investigation.
>
> They suggest using securityContext: runAsUser: 0 (as is also suggested here https://stackoverflow.com/questions/60911478/permission-denied-for-the-kaniko-job-in-openshift-cluster ) at which point presumably the permissions problems go away, so that's definitely worth a try.
>
> While this demo is really aimed at public cloud IKS (and others), OpenShift ought to work outside the Code-Ready Container install that I run locally to try things out. I'll have to try and get buildah working again (broken during my upgrade to ACE v12) along with s2i and see where that gets us.

Thanks. The service account I'm using is already RunAsAny. For runAsUser: 0, did you mean putting it in the running pod?

ddsat closed this as completed Jun 16, 2021
ddsat reopened this Jun 16, 2021
@tdolby-at-uk-ibm-com (Contributor)

> In the UI with Platform Navigator and App Connect Dashboard, I'm using the image cp.icr.io/cp/appc/ace-server-prod.
>
> Would it be possible to use this as the BASE_IMAGE in tekton/Dockerfile while still running the same maven-ace-build task?
> And instead of running *deployment.yaml, I will just need to prepare a YAML file that creates an integration server with the new image. *service.yaml isn't necessary, is it? Because the IntegrationServer YAML is the only one I need if I create it from the App Connect Dashboard?
>
> With my environment, do you think I can still go down this path?

Yes, I think that has a good chance of succeeding, though it's possible more modifications will be needed to the Dockerfile if the certified container has anything in a different location or whatever.

You should certainly be able to use a single yaml file for the IntegrationServer, and not require the deployment or service yaml files, as the operator should create everything you need.
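I haven't tried it here, but the single IntegrationServer CR would be along these lines; the name, image reference, version, and license values below are placeholders to check against the App Connect operator docs for your release:

apiVersion: appconnect.ibm.com/v1beta1
kind: IntegrationServer
metadata:
  name: tea-tekton-is                  # placeholder name
spec:
  license:
    accept: true
    license: <license-id>              # license ID matching your ACE version
    use: CloudPakForIntegrationNonProduction
  pod:
    containers:
      runtime:
        # Point at the image the pipeline pushed to your registry
        image: image-registry.openshift-image-registry.svc:5000/default/<image>:latest
  replicas: 1
  version: 11.0.0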

@tdolby-at-uk-ibm-com (Contributor)

> Thanks. The service account I'm using is already RunAsAny. For runAsUser: 0, did you mean putting it in the running pod?

I think that's what they meant, yes: I don't have an easy way to test this, but the Stack Overflow link shows someone suggesting

apiVersion: v1
kind: Pod
metadata:
  name: security-context-demo
spec:
  securityContext:
    runAsUser: 0

as the solution.

@ddsat (Author) commented Jun 16, 2021

> Thanks. The service account I'm using is already RunAsAny. For runAsUser: 0, did you mean putting it in the running pod?
>
> I think that's what they meant, yes: I don't have an easy way to test this, but the Stack Overflow link shows someone suggesting
>
> apiVersion: v1
> kind: Pod
> metadata:
>   name: security-context-demo
> spec:
>   securityContext:
>     runAsUser: 0
>
> as the solution.

The running pod is gcr.io/kaniko-project/executor:v0.16.0. Can you suggest the change in the script? I'll try to run it.

@ddsat (Author) commented Jun 16, 2021

> In the UI with Platform Navigator and App Connect Dashboard, I'm using the image cp.icr.io/cp/appc/ace-server-prod.
> Would it be possible to use this as the BASE_IMAGE in tekton/Dockerfile while still running the same maven-ace-build task?
> And instead of running *deployment.yaml, I will just need to prepare a YAML file that creates an integration server with the new image. *service.yaml isn't necessary, is it? Because the IntegrationServer YAML is the only one I need if I create it from the App Connect Dashboard?
> With my environment, do you think I can still go down this path?
>
> Yes, I think that has a good chance of succeeding, though it's possible more modifications will be needed to the Dockerfile if the certified container has anything in a different location or whatever.
>
> You should certainly be able to use a single yaml file for the IntegrationServer, and not require the deployment or service yaml files, as the operator should create everything you need.

I'll give it a try once I manage to execute both tasks in this pipeline successfully.

@tdolby-at-uk-ibm-com (Contributor)

After a bit of effort, I managed to break my local OpenShift cluster the same way yours broke, and I believe what's needed is a different pipelinerun. Instead of the ace-pipeline-run-crc.yaml as-is, I think the following should work:

apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  name: ace-pipeline-run-1
spec:
  serviceAccountName: ace-tekton-service-account
  pipelineRef:
    name: ace-pipeline
  podTemplate:
    securityContext:
      runAsNonRoot: false
      runAsUser: 0
  params:
    - name: dockerRegistry
      value: "image-registry.openshift-image-registry.svc:5000/default"

which should set the pod to running as root, which should in turn allow kaniko to operate successfully. The YAML changes came from https://github.com/tektoncd/pipeline/blob/main/docs/pipelineruns.md#specifying-a-pod-template and appear to be completely standard.

This seemed to work for me (after I made sure the docker credentials in "regcred" were present and correct) and hopefully will work for you and not fall foul of any restrictions on running pods as root . . .

@ddsat (Author) commented Jun 22, 2021

Thanks @tdolby-at-uk-ibm-com for your detailed suggestions. I'll try it today.

@ddsat (Author) commented Jun 23, 2021

I've updated the securityContext and the secret for my image registry in the "pipeline" serviceaccount that I'm using (shown below). It seems you've updated the ACE minimal version to ace-minimal-build-12.0.1.0-alpine too. Both tasks of ace-demo-pipeline now work successfully thanks to your suggestion.

However, the deployment YAML file can't successfully create the pod. I'm not sure whether it's because of my environment, with a certified Platform Navigator instance and certified ACE Dashboard image (11.0.0.11-r2), or not. I will try to look into this more.

$ oc get serviceaccount pipeline
NAME       SECRETS   AGE
pipeline   2         12d

$ oc get serviceaccount pipeline -o yaml
apiVersion: v1
imagePullSecrets:
- name: pipeline-dockercfg-fnffp
kind: ServiceAccount
metadata:
  creationTimestamp: "2021-06-08T11:11:17Z"
  name: pipeline
  namespace: ace-dev
  resourceVersion: "20040872"
  selfLink: /api/v1/namespaces/ace-dev/serviceaccounts/pipeline
  uid: 522707d8-f6c5-4f17-ba3a-124a98735ca6
secrets:
- name: pipeline-dockercfg-fnffp
- name: pipeline-token-rzpwv
