Orc table creation from spark sql with snappy compression

11/18/2023

Then error as 18/04/26 21:03:44 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, .th, executor 1): : Task failed while writing rows.Īt .$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:285)Īt .$$anonfun$write$1.apply(FileFormatWriter.scala:197)Īt .$$anonfun$write$1.apply(FileFormatWriter.scala:196)Īt .nTask(ResultTask.scala:87)Īt .n(Task.scala:109)Īt .Executor$n(Executor.scala:345)Īt .runWorker(ThreadPoolExecutor.java:1149)Īt $n(ThreadPoolExecutor.java:624)Ĭaused by: : .maxCompressedLength(I)IĪt .maxCompressedLength(Native Method)Īt .maxCompressedLength(Snappy.java:316)Īt 圜press(Snapp圜ompressor.java:67)Īt .(CompressorStream.java:81)Īt .(CompressorStream.java:92)Īt $press(CodecFactory.java:112)Īt $ColumnChunkPageWriter.writePage(ColumnChunkPageWriteStore.java:89)Īt 1.writePage(ColumnWriterV1.java:153)Īt 1.flush(ColumnWriterV1.java:241)Īt 1.flush(ColumnWriteStoreV1.java:126)Īt (InternalParquetRecordWriter.java:159)Īt (InternalParquetRecordWriter.java:111)Īt (ParquetRecordWriter.java:112)Īt .close(ParquetOutputWriter.scala:42)Īt .$SingleDirectoryWriteTask.releaseResources(FileFormatWriter.scala:405)Īt .$SingleDirectoryWriteTask.execute(FileFormatWriter.scala:396)Īt .$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:269)Īt .$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:267)Īt .Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1411)Īt .$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:272)ġ8/04/26 21:03:44 ERROR scheduler.TaskSetManager: Task 0 in stage 0.0 failed 4 times aborting jobġ8/04/26 21:03:44 ERROR datasources.FileFormatWriter: Aborting job null. Sql("INSERT INTO parquet_table_name VALUES(1, 'test')") Sql("CREATE TABLE parquet_table_name (x INT, y STRING) STORED AS PARQUET") So the only thing you can set is the compression codec, using dataframe.write().format("orc").option("compression","snappy").I'm trying to create Hive table with snappy compression via Spark2. Note that the default compression codec has changed with Spark 2 before that it was zlib This can be one of the known case-insensitive shorten

compression (default snappy): compression codec to use when.
You can set the following ORC-specific option(s) for writing ORC But again, these properties must be set before creating (or re-creating) the hiveContext.įor ORC and the other formats, you have to resort to format-specific DataFrameWriter options quoting the latest JavaDoc. There are some Spark-specific properties for Parquet, and they are well documented. Spark uses its own SerDe libraries for ORC (and Parquet, JSON, CSV, etc) so it does not have to honor the standard Hadoop/Hive properties. ("orc.compress","") // will now be Snappy Val sparkAlt = .SparkSession.builder().config("orc.compress","snappy").getOrCreate() ("orc.compress","") // depends on Hadoop conf Val hiveContextAlt = new .SQLContext(scAlt) Val scAlt = new ((new ).set("orc.compress","snappy"))

Sc.getConf.get("orc.compress","") // depends on Hadoop conf or in your code, by re-creating the SparkContext.

either in the hive-site.xml available to Spark at launch time.They are Hive configuration properties, that must be defined before creating the hiveContext object. Orc.compress and the rest are not Spark DataFrameWriter options. You are making two different errors here.

0 Comments

Orc table creation from spark sql with snappy compression

Leave a Reply.

Author

Archives

Categories