NameNode HA: HDFS access gives error "Operation category READ is not supported in state standby" #54
Comments
You should configure a client to work with HDFS in HA mode manually; you can get the params from the hdfs-client pod. Homegrown apps can be configured via code changes: https://stackoverflow.com/a/35911455
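For reference, the client-side HA settings in hdfs-site.xml look roughly like the following (a sketch; the nameservice name hdfs-k8s and the NameNode addresses are illustrative and must match your cluster, with fs.defaultFS set to hdfs://hdfs-k8s in core-site.xml):

```xml
<!-- Minimal HA client config sketch; nameservice and hostnames are examples. -->
<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>hdfs-k8s</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.hdfs-k8s</name>
    <value>nn0,nn1</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.hdfs-k8s.nn0</name>
    <value>hdfs-namenode-0.hdfs-namenode.default.svc.cluster.local:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.hdfs-k8s.nn1</name>
    <value>hdfs-namenode-1.hdfs-namenode.default.svc.cluster.local:8020</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.hdfs-k8s</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
</configuration>
```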
I did the same, mounting secrets with core-site.xml and hdfs-site.xml into the Spark app.
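A rough illustration of that approach (the pod, secret, and image names are hypothetical): mount the two config files from a secret and point HADOOP_CONF_DIR at the mount.

```yaml
# Hypothetical pod spec: mounts core-site.xml and hdfs-site.xml from a secret.
apiVersion: v1
kind: Pod
metadata:
  name: spark-app
spec:
  containers:
    - name: spark
      image: apache/spark:3.5.0   # illustrative image
      env:
        - name: HADOOP_CONF_DIR   # Hadoop clients read config from this dir
          value: /etc/hadoop/conf
      volumeMounts:
        - name: hadoop-conf
          mountPath: /etc/hadoop/conf
  volumes:
    - name: hadoop-conf
      secret:
        secretName: hdfs-ha-config   # contains core-site.xml and hdfs-site.xml
```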
We have a similar idea to @juv's and I am implementing it. Basically, we create a ZooKeeper watcher to get notified of changes to the znode /hadoop-ha/ns/ActiveStandbyElectorLock, which tells us which NameNode is active. We then label the active NameNode pod so that the k8s service only routes requests to the active NameNode. This way, clients don't need to retry to figure out which one is active.
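A minimal sketch of such a watcher, assuming the kazoo client library; the ZooKeeper address, pod names, and the nn-state label are illustrative, not the project's actual implementation:

```python
# Rough sketch: watch the ActiveStandbyElectorLock znode and relabel pods.
# Assumes kazoo (pip install kazoo) and kubectl in PATH; znode path,
# ZooKeeper address, and pod names are all illustrative.
import subprocess
import time

from kazoo.client import KazooClient

LOCK_ZNODE = "/hadoop-ha/ns/ActiveStandbyElectorLock"
PODS = ("namenode-0", "namenode-1")

zk = KazooClient(hosts="zk-0.zk-svc.default.svc.cluster.local:2181")
zk.start()

@zk.DataWatch(LOCK_ZNODE)
def on_lock_change(data, stat):
    if data is None:
        return  # znode deleted mid-failover; wait for the new holder to recreate it
    # The lock data is a serialized ActiveNodeInfo protobuf that embeds the
    # active NameNode's hostname, so a substring check is enough for a sketch.
    for pod in PODS:
        state = "active" if pod.encode() in data else "standby"
        subprocess.run(
            ["kubectl", "label", "pod", pod, f"nn-state={state}", "--overwrite"],
            check=True,
        )

while True:  # keep the process alive; kazoo delivers watch events on its own threads
    time.sleep(60)
```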
@maver1ck were you able to get this working with configmaps? I am also trying with configmaps and failing as well. My HDFS HA cluster name is not resolvable even though all my info is properly mounted in configmaps and the core-site.xml and hdfs-site.xml files are in place:
Caused by: java.lang.IllegalArgumentException: java.net.UnknownHostException: hdfs-k8s
@dennischin Can you connect to the hdfs-client pod and traverse files on HDFS? You can find the configuration on the hdfs-client pod.
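For example (the pod name and config path are assumptions; adjust to your deployment):

```sh
# Verify HDFS access and inspect the mounted HA config from the client pod.
kubectl exec -it hdfs-client -- hdfs dfs -ls /
kubectl exec -it hdfs-client -- cat /etc/hadoop/conf/hdfs-site.xml
```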
@vvbogdanov87, yes, I can interact with HDFS (outside of Spark) without issue.
In spark-env.sh:
export SPARK_HISTORY_OPTS="-Dspark.history.ui.port=18080 -Dspark.history.retainedApplications=30 -Dspark.history.fs.logDirectory=hdfs://mycluster/spark-job-log"
I've set up NameNode HA. A problem is that my service will route to either the active NameNode or the standby NameNode. For example, I have a Spark History Server running with the parameter SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=hdfs://hadoop-hdfs-nn.my-namespace.svc:9000/shared/spark-logs".
The following error occurs:
Operation category READ is not supported in state standby
I guess that the Spark History Server has accessed the k8s service and got routed to the standby NameNode instead of the active one. Have you thought about this problem @kimoonkim? I have an idea to solve this: label the active NameNode pod and add a second service, hadoop-hdfs-active-nn, that selects only that pod. Then, let all clients connect to the hadoop-hdfs-active-nn service instead of hadoop-hdfs-nn. I am not sure if it is possible to set the label of a service from within a pod though...
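For what it's worth, the proposed service could look something like this hypothetical manifest, where the nn-state label would be maintained by an external watcher such as the one sketched earlier in this thread:

```yaml
# Hypothetical hadoop-hdfs-active-nn Service; selects only the pod currently
# labeled nn-state=active (label names and port are assumptions).
apiVersion: v1
kind: Service
metadata:
  name: hadoop-hdfs-active-nn
spec:
  selector:
    app: hdfs-namenode
    nn-state: active
  ports:
    - name: rpc
      port: 9000
      targetPort: 9000
```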