sparkavro: Manipulate Apache Avro files with sparklyr

2017-03-26·
Aki Ariga

I created a simple sparklyr extension for handling Apache Avro files. It is just a thin wrapper around Databricks' spark-avro, and it is listed in the official sparklyr extensions documentation.

chezou/sparkavro: Load Avro data into Spark with sparklyr (github.com)

Installation

Use {devtools} to install sparkavro.

devtools::install_github("chezou/sparkavro")

Simple usage

You can read and write Avro files as follows:

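library(sparklyr)
library(sparkavro)  # load before connecting so the extension is registered
sc <- spark_connect(master = "spark://HOST:PORT")
# Read an Avro file into a Spark table named "test_table"
df <- spark_read_avro(sc, "test_table", "/user/foo/test.avro")
# Write the Spark DataFrame back out as Avro
spark_write_avro(df, "/tmp/output")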
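
If you do not have a cluster or an existing Avro file at hand, a local round trip is a quick way to try the two functions. The following is only a minimal sketch, not a tested recipe: it assumes a local Spark installation that sparklyr can find (for example one installed with `spark_install()`) and a Spark version that the bundled spark-avro package supports. The table names `cars` and `cars_back` and the temporary output path are made up for illustration.

```r
library(sparklyr)
library(sparkavro)  # attach before spark_connect() so the Avro dependency is picked up

# Assumption: a local Spark installation is available to sparklyr.
sc <- spark_connect(master = "local")

# Copy a built-in R data frame to Spark, so no existing Avro file is needed.
cars_tbl <- sdf_copy_to(sc, mtcars, "cars", overwrite = TRUE)

# Write to a fresh temporary path (tempfile() only generates a name,
# it does not create the directory), then read the data back.
out_path <- tempfile("cars_avro")
spark_write_avro(cars_tbl, out_path)
cars_back <- spark_read_avro(sc, "cars_back", out_path)

# The row count read back is expected to equal nrow(mtcars).
n_back <- sdf_nrow(cars_back)
print(n_back)

spark_disconnect(sc)
```

Note that the row names of `mtcars` are not copied to Spark, so only the columns survive the round trip.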

This is the very first version, so there may be bugs, especially around options. If you find one, please open an issue on GitHub.

