Splitting a List Column into Rows with Spark explode()

Posted by 남제이입니다! on 2023. 10. 1. 17:57


A Spark DataFrame may contain a column whose values are lists (arrays), like the one below.

scala> val df = Seq(("Nam", List("A", "B", "C", "D"))).toDF("name", "grade")
df: org.apache.spark.sql.DataFrame = [name: string, grade: array<string>]

scala> df.show()
+----+------------+
|name|       grade|
+----+------------+
| Nam|[A, B, C, D]|
+----+------------+

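Sometimes you need to split the grade column so that each element of the list becomes its own row.
The explode() function does exactly this: it returns a new row for every element of the array.

Applying explode() to the target column expands the list, and each element now appears in its own row. In spark-shell this works as-is. In a standalone application, import explode and col from org.apache.spark.sql.functions.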

scala> df.select(explode(col("grade")).alias("score")).show()
+-----+
|score|
+-----+
|    A|
|    B|
|    C|
|    D|
+-----+
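In practice you usually want to keep other columns next to the exploded values. The following is a minimal standalone sketch, assuming Spark 3.x with spark-sql on the classpath and a local master. The object and method names and the extra "Kim" row with an empty list are hypothetical and exist only for illustration. The sketch also uses two related functions from org.apache.spark.sql.functions. posexplode adds each element's position. explode_outer keeps rows whose array is null or empty.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode, explode_outer, posexplode}

object ExplodeKeepColumns {
  // Keep `name` next to each exploded element: the other selected columns
  // are repeated once per array element.
  def explodeGrades(df: DataFrame): DataFrame =
    df.select(col("name"), explode(col("grade")).alias("score"))

  // posexplode emits the element position (default column name `pos`)
  // together with the element itself (default column name `col`).
  def explodeWithPosition(df: DataFrame): DataFrame =
    df.select(col("name"), posexplode(col("grade")))

  // explode drops rows whose array is null or empty; explode_outer keeps
  // them and emits a single row with a null element instead.
  def explodeKeepingEmpty(df: DataFrame): DataFrame =
    df.select(col("name"), explode_outer(col("grade")).alias("score"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("explode-examples")
      .master("local[*]") // assumption: local run for illustration
      .getOrCreate()
    import spark.implicits._

    // "Kim" is a hypothetical extra row with an empty grade list.
    val df = Seq(
      ("Nam", List("A", "B", "C", "D")),
      ("Kim", List.empty[String])
    ).toDF("name", "grade")

    explodeGrades(df).show()       // 4 rows, all for Nam; Kim is dropped
    explodeWithPosition(df).show() // columns: name, pos, col
    explodeKeepingEmpty(df).show() // 5 rows; Kim appears once with score = null

    spark.stop()
  }
}
```

Inside spark-shell you can paste just the select(...) expressions, because `spark` and its implicits are already available there.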

Reference: Spark official documentation, pyspark.sql.functions.explode (PySpark 3.1.3). The Scala API provides the same function in org.apache.spark.sql.functions.

https://spark.apache.org/docs/3.1.3/api/python/reference/api/pyspark.sql.functions.explode.html

"Returns a new row for each element in the given array or map. Uses the default column name col for elements in the array and key and value for elements in the map unless specified otherwise."
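The documentation also covers map columns: exploding a map produces one row per entry, split into key and value columns. The following is a minimal sketch that uses the same assumptions as before (Spark 3.x, local master). The `scores` map and the `subject`/`grade` aliases are hypothetical names chosen for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode}

object ExplodeMapExample {
  // One output row per map entry; the generated columns are named
  // `key` and `value` by default.
  def explodeScores(df: DataFrame): DataFrame =
    df.select(col("name"), explode(col("scores")))

  // Column.as(Seq(...)) names the multiple columns produced by a
  // generator such as explode on a map.
  def explodeScoresRenamed(df: DataFrame): DataFrame =
    df.select(col("name"), explode(col("scores")).as(Seq("subject", "grade")))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("explode-map")
      .master("local[*]") // assumption: local run for illustration
      .getOrCreate()
    import spark.implicits._

    // Hypothetical subject -> grade map.
    val df = Seq(("Nam", Map("math" -> "A", "english" -> "B"))).toDF("name", "scores")

    explodeScores(df).show()        // columns: name, key, value (2 rows)
    explodeScoresRenamed(df).show() // columns: name, subject, grade (2 rows)

    spark.stop()
  }
}
```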


License: Attribution, NonCommercial, NoDerivatives (CC BY-NC-ND)