Fixing the "index 2 is out of bounds for axis 0 with size" error when viewing a DataFrame after toPandas()

leebaro 2021. 3. 16. 15:37

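Sometimes you build a DataFrame with Spark or Koalas and then need to convert it to a pandas DataFrame with toPandas().

In my case, I was drawing a heatmap from a DataFrame. Building it directly from the Koalas DataFrame raised an error, so I converted it to a pandas DataFrame first.

The problem was that displaying the DataFrame after the conversion raised an error like "index 2 is out of bounds for axis 0 with size".

Specifically, the DataFrame contained NaN values, and the error appeared after I replaced them with 0 using df.fillna(0).

In this case, add the line below somewhere above the toPandas() call. If you set the parameter to -1 instead, you can confirm that the same error occurs.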

pd.set_option('display.max_columns', 0)
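
As a rough sketch of where that line sits relative to the conversion, the example below sets the option first, then converts and fills NaN values. FakeSparkFrame and to_pandas_filled are hypothetical names introduced only so the example runs without a Spark session. With a real Spark DataFrame, the only assumption is that it exposes toPandas().

```python
import numpy as np
import pandas as pd

# Workaround from the post: set the display option before calling toPandas().
pd.set_option('display.max_columns', 0)


def to_pandas_filled(sdf, fill_value=0):
    """Hypothetical helper: convert to pandas, then replace NaN values.

    `sdf` is assumed to be any object with a toPandas() method,
    such as a Spark DataFrame.
    """
    pdf = sdf.toPandas()
    return pdf.fillna(fill_value)


class FakeSparkFrame:
    """Stand-in for a Spark DataFrame (assumption, for illustration only)."""

    def __init__(self, pdf):
        self._pdf = pdf

    def toPandas(self):
        # Return a copy so the caller cannot mutate the wrapped data.
        return self._pdf.copy()


# Sample data containing NaN, mirroring the situation described above.
sdf = FakeSparkFrame(pd.DataFrame({"a": [1.0, np.nan], "b": [np.nan, 4.0]}))
pdf = to_pandas_filled(sdf)
print(pdf)
```

In a notebook, the last line would usually be a bare `pdf` expression, which is the display step where the error was seen.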

Reference

pyspark toPandas() IndexError: index is out of bounds (stackoverflow.com)

프로도의 블로그
