
When the error "ERROR cluster.YarnScheduler: Lost executor 1 on xxx-Xxxx: Slave lost" occurs

leebaro 2021. 8. 9.

When I run a Spark Streaming job with spark-submit, the batch stops every two hours and the error shown below appears.

 

The job runs without problems in local mode, but the error occurs when it runs in client mode or cluster mode.


[2021-08-09 02:19:05,746] {bash_operator.py:128} INFO - 21/08/09 02:19:05 ERROR cluster.YarnScheduler: Lost executor 1 on xxx-Xxxx: Slave lost
[2021-08-09 02:19:05,911] {bash_operator.py:128} INFO - 21/08/09 02:19:05 ERROR client.TransportClient: Failed to send RPC RPC 8387377159996559940 to /xxx.xxx.xxx.xxx:59188: java.nio.channels.ClosedChannelException
[2021-08-09 02:19:05,911] {bash_operator.py:128} INFO - java.nio.channels.ClosedChannelException
[2021-08-09 02:19:05,911] {bash_operator.py:128} INFO - at io.netty.channel.AbstractChannel$AbstractUnsafe.newClosedChannelException(AbstractChannel.java:957)
[2021-08-09 02:19:05,911] {bash_operator.py:128} INFO - at io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:865)
[2021-08-09 02:19:05,911] {bash_operator.py:128} INFO - at io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1367)
[2021-08-09 02:19:05,911] {bash_operator.py:128} INFO - at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:717)
[2021-08-09 02:19:05,911] {bash_operator.py:128} INFO - at io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:764)
[2021-08-09 02:19:05,911] {bash_operator.py:128} INFO - at io.netty.channel.AbstractChannelHandlerContext$WriteTask.run(AbstractChannelHandlerContext.java:1104)
[2021-08-09 02:19:05,911] {bash_operator.py:128} INFO - at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
[2021-08-09 02:19:05,912] {bash_operator.py:128} INFO - at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
[2021-08-09 02:19:05,912] {bash_operator.py:128} INFO - at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
[2021-08-09 02:19:05,912] {bash_operator.py:128} INFO - at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
[2021-08-09 02:19:05,912] {bash_operator.py:128} INFO - at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
[2021-08-09 02:19:05,912] {bash_operator.py:128} INFO - at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
[2021-08-09 02:19:05,912] {bash_operator.py:128} INFO - at java.lang.Thread.run(Thread.java:748)
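
To check whether the interruption really recurs on a two-hour cycle, the lost-executor lines can be pulled out of the task log along with their timestamps. The following is a minimal sketch, assuming the log lines carry the same Airflow prefix as the output above; the function name `find_lost_executors` and both regular expressions are my own and were written only against this sample.

```python
import re

# Airflow's BashOperator prefixes every line, e.g.
# "[2021-08-09 02:19:05,746] {bash_operator.py:128} INFO - "
AIRFLOW_PREFIX = re.compile(r"^\[[^\]]*\] \{[^}]*\} [A-Z]+ - ")

# Matches the scheduler message shown in the log above.
LOST_EXECUTOR = re.compile(
    r"ERROR cluster\.YarnScheduler: Lost executor (\S+) on (\S+): (.*)$"
)


def find_lost_executors(lines):
    """Return (timestamp, executor_id, host, reason) for each lost-executor line."""
    events = []
    for line in lines:
        body = AIRFLOW_PREFIX.sub("", line.rstrip("\n"))
        match = LOST_EXECUTOR.search(body)
        if match:
            # Spark's own timestamp is the text in front of " ERROR".
            timestamp = body.split(" ERROR", 1)[0]
            events.append((timestamp, *match.groups()))
    return events


sample = [
    "[2021-08-09 02:19:05,746] {bash_operator.py:128} INFO - "
    "21/08/09 02:19:05 ERROR cluster.YarnScheduler: "
    "Lost executor 1 on xxx-Xxxx: Slave lost",
    "[2021-08-09 02:19:05,911] {bash_operator.py:128} INFO - "
    "java.nio.channels.ClosedChannelException",
]
print(find_lost_executors(sample))
```

Running this over the full log of several runs and comparing the timestamps should show whether the interval is consistently about two hours.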

The post referenced below suggests changing the following options in yarn-site.xml.

<property>
    <name>yarn.nodemanager.pmem-check-enabled</name>
    <value>false</value>
</property>

<property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
</property>
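
Before and after editing the file, it can help to confirm what the two properties are actually set to. This is a rough sketch that reads a yarn-site.xml document and reports both values; the helper name `read_memory_checks` and the inline sample are illustrative assumptions, not part of Hadoop.

```python
import xml.etree.ElementTree as ET

# The two NodeManager memory-check properties from the snippet above.
CHECK_PROPERTIES = (
    "yarn.nodemanager.pmem-check-enabled",
    "yarn.nodemanager.vmem-check-enabled",
)


def read_memory_checks(yarn_site_xml):
    """Map each memory-check property to its configured value, or None if unset."""
    root = ET.fromstring(yarn_site_xml)
    configured = {
        prop.findtext("name", "").strip(): prop.findtext("value", "").strip()
        for prop in root.iter("property")
    }
    return {name: configured.get(name) for name in CHECK_PROPERTIES}


# Illustrative input: only the pmem check has been overridden.
sample = """<configuration>
  <property>
    <name>yarn.nodemanager.pmem-check-enabled</name>
    <value>false</value>
  </property>
</configuration>"""

print(read_memory_checks(sample))
```

A value of `None` means the property is not set in the file, so YARN falls back to its default. For a file on disk, `ET.parse(path).getroot()` would replace `ET.fromstring`, with `path` pointing at wherever yarn-site.xml lives in your installation.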

 

 

Attempt 1.

I changed the option below to 2g when running spark-submit, but the same error occurred.

 

| Property Name | Default | Meaning | Since Version |
| --- | --- | --- | --- |
| spark.yarn.am.memory | 512m | Amount of memory to use for the YARN Application Master in client mode, in the same format as JVM memory strings (e.g. 512m, 2g). In cluster mode, use spark.driver.memory instead. Use lower-case suffixes, e.g. k, m, g, t, and p, for kibi-, mebi-, gibi-, tebi-, and pebibytes, respectively. | 1.3.0 |
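
As the description above says, the property to set depends on the deploy mode. The sketch below builds the spark-submit arguments accordingly, assuming the option is passed with `--conf`; the helper `build_spark_submit` and the file name `streaming_job.py` are hypothetical placeholders, not the actual job.

```python
def build_spark_submit(app, deploy_mode="client", memory="2g"):
    """Build spark-submit arguments with the memory option matching the deploy mode."""
    if deploy_mode == "client":
        # In client mode the Application Master is sized by spark.yarn.am.memory.
        memory_conf = f"spark.yarn.am.memory={memory}"
    elif deploy_mode == "cluster":
        # In cluster mode the documentation says to use spark.driver.memory instead.
        memory_conf = f"spark.driver.memory={memory}"
    else:
        raise ValueError(f"unsupported deploy mode: {deploy_mode}")
    return [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", deploy_mode,
        "--conf", memory_conf,
        app,  # hypothetical application file
    ]


print(" ".join(build_spark_submit("streaming_job.py")))
```

Note that this option only sizes the Application Master, while the log reports a lost executor, so executor memory is configured separately. Whether that explains the unchanged result here is not confirmed.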

 

References

https://khstar.tistory.com/entry/YARN%EC%9D%84-%EC%9D%B4%EC%9A%A9%ED%95%9C-Spark-cluster%EA%B5%AC%EC%84%B1-%EC%A4%91-%EC%97%90%EB%9F%AC

 

YARN을 이용한 Spark cluster구성 중 에러 (Errors while setting up a Spark cluster on YARN)

https://stackoverflow.com/questions/39467761/how-to-know-what-is-the-reason-for-closedchannelexceptions-with-spark-shell-in-y

 

How to know what is the reason for ClosedChannelExceptions with spark-shell in YARN client mode?

