sparkavro: Manipulate Apache Avro files with sparklyr

I created a simple sparklyr extension for handling Apache Avro files. It is a thin wrapper around Databricks' spark-avro, and it is listed in the official sparklyr extensions documentation.

sparkavro - Load Avro data into Spark with


Use {devtools} to install sparkavro.
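
As a minimal sketch, installation might look like the following. The GitHub repository path `chezou/sparkavro` is an assumption here, so check the project page before running it.

```r
# Install devtools first if it is not already available
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}

# Assumption: the package source is hosted at chezou/sparkavro on GitHub
devtools::install_github("chezou/sparkavro")
```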


Simple usage

You can read and write Avro files as follows:

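library(sparklyr)
library(sparkavro)

sc <- spark_connect(master = "spark://HOST:PORT")  # replace HOST:PORT with your Spark master
df <- spark_read_avro(sc, "test_table", "/user/foo/test.avro")  # load the Avro file as the Spark table "test_table"
spark_write_avro(df, "/tmp/output")  # write the table back out in Avro format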

This is the very first version, so there may be bugs, especially around options. If you find one, please open an issue on GitHub.

Aki Ariga
Machine Learning Engineer