sparkavro: Manipulate Apache Avro files with sparklyr

2017-03-26

Aki Ariga


I created a simple sparklyr extension for handling Apache Avro files. It is just a thin wrapper around Databricks' spark-avro, and it is listed in the official documentation of sparklyr extensions.

chezou/sparkavro: Load Avro data into Spark with sparklyr (github.com/chezou/sparkavro)

Installation

Use {devtools} to install sparkavro.

devtools::install_github("chezou/sparkavro")

Simple usage

You can read and write Avro files as follows:

library(sparklyr)
library(sparkavro)
sc <- spark_connect(master = "spark://HOST:PORT")
df <- spark_read_avro(sc, "test_table", "/user/foo/test.avro")
spark_write_avro(df, "/tmp/output")
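
If you do not have a cluster at hand, the same round trip can be tried against a local Spark session. The following is only a minimal sketch under a few assumptions: Spark is installed locally (for example via sparklyr::spark_install()), and the input and output paths are placeholders reused from the example above, not files shipped with the package.

```r
library(sparklyr)
library(sparkavro)  # load before connecting so the extension is registered

# Assumption: a local Spark installation is available.
sc <- spark_connect(master = "local")

# Assumption: this Avro file exists; the path is a placeholder.
df <- spark_read_avro(sc, "test_table", "/user/foo/test.avro")

# Check how many rows were loaded into the Spark table.
print(sdf_nrow(df))

# Write the table back out as Avro, then close the connection.
spark_write_avro(df, "/tmp/output")
spark_disconnect(sc)
```

The row count is a quick sanity check that the file was actually read before anything is written back.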

This is the very first version, so there may be bugs, especially around options. If you find one, please open an issue on GitHub.

Last updated on 2017-03-26
