sparkavro: Manipulate Apache Avro files with sparklyr

I created a simple sparklyr extension for handling Apache Avro files. It is a thin wrapper around Databricks' spark-avro, and it is listed in the official documentation of sparklyr extensions.

[chezou/sparkavro - Load Avro data into Spark with sparklyr](https://github.com/chezou/sparkavro)

Installation

Use {devtools} to install sparkavro.

devtools::install_github("chezou/sparkavro")

Simple usage

You can read and write Avro files as follows:

library(sparklyr)
library(sparkavro)
sc <- spark_connect(master = "spark://HOST:PORT")
df <- spark_read_avro(sc, "test_table", "/user/foo/test.avro")
spark_write_avro(df, "/tmp/output")
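
If you want to try the same read and write steps without a cluster, they can be wrapped in a small function that runs against a local Spark instance. This is only a rough sketch: `copy_avro` and the temporary table name `avro_copy_tmp` are hypothetical names that are not part of sparkavro, and it assumes that sparklyr, sparkavro, and a local Spark installation are available and that `master = "local"` suits your setup.

```r
# Hypothetical helper, not part of sparkavro: copy an Avro file through Spark
# and return the number of rows that were read.
copy_avro <- function(in_path, out_path, master = "local") {
  # Load sparkavro before connecting so that its Spark package dependency
  # is registered with sparklyr when the connection is created.
  loadNamespace("sparkavro")

  sc <- sparklyr::spark_connect(master = master)
  # Always close the connection, even if reading or writing fails.
  on.exit(sparklyr::spark_disconnect(sc), add = TRUE)

  # Read the Avro file into a Spark table, count its rows, and write it back out.
  df <- sparkavro::spark_read_avro(sc, "avro_copy_tmp", in_path)
  n_rows <- sparklyr::sdf_nrow(df)
  sparkavro::spark_write_avro(df, out_path)

  n_rows
}

# Example call (both paths are placeholders):
# copy_avro("/user/foo/test.avro", "/tmp/output")
```

Creating the connection inside the function keeps the load order explicit, since the extension has to be loaded before `spark_connect()` is called.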

This is the very first version, so there may be bugs, especially around options. If you find one, please open an issue on GitHub.

Aki Ariga
Machine Learning Engineer

Interested in Machine Learning, MLOps, and data-driven business.