Blog Posts


py> operator development guide for Python users

This article shows how to develop a digdag Python workflow task efficiently.

How to release Python package from GitHub Actions

Photo by Hitesh Choudhary on Unsplash. Recently, I changed my CI from Travis to GitHub Actions. GitHub Actions is handy for testing and publishing Python packages. Testing Python code on GitHub Actions: migrating from Travis is easy and only requires writing a simple workflow. The benefits of GitHub Actions for Python are: we can use a build matrix (e.g., OS and Python versions) as on Travis; jobs launch faster than on Travis; and additional dependencies are easy to install with the uses syntax, which pulls in another action. For example, installing the JDK can be written as:

How to test a new Docker image for digdag workflow on CircleCI?

Photo by Campaign Creators on Unsplash. Testing whether a workflow can actually run is important when we build a complex workflow. digdag is a workflow engine whose syntax is simple and which can run tasks written in SQL, Python, Ruby, shell script, etc. digdag has a Docker executor, and it works like a charm with the py>, rb>, and sh> operators. How can we ensure that a new Docker image runs with an existing digdag workflow? I’ll show how to check this on CircleCI.

The first conference of Operational Machine Learning: OpML ’19

I attended OpML ’19, a conference for “Operational Machine Learning” held in Santa Clara on Monday, May 20, 2019. The scope of this conference is broad and does not seem to be clearly defined yet, even after attending it. I’ll borrow the description from the OpML website. The 2019 USENIX Conference on Operational Machine Learning (OpML ’19) provides a forum for both researchers and industry practitioners to develop and bring impactful research advances and cutting edge solutions to the pervasive challenges of ML production lifecycle management.

Ruby for Data Science and Machine Learning

I attended RubyKaigi 2019, held in Fukuoka from Apr 18 to Apr 21. This year’s RubyKaigi was a really great opportunity for me to learn about the possibilities of Data Science and Machine Learning in Ruby. Data Science and Ruby: as many of you may know, Ruby is widely known for web applications built with frameworks such as Ruby on Rails, but there is also momentum behind Ruby and other non-Python languages. Here is the list of the sessions about Data Science.

A recent update of tabula-py

Photo by Joshua Rawson-Harris on Unsplash. This article is a repost of a Patreon article published last December. I’m planning to release the next version of tabula-py within a few weeks. (Note: Oct 7th, 2019) As of Oct. 2019, I launched a documentation site and a Google Colab notebook for tabula-py. The FAQ is a good place to look for tips on accurate extraction. This is my first post on Patreon. Apologies for the delayed announcement of the recent update of tabula-py.

Python basics: package management

Python is a very popular programming language for machine learning. In this article, I will introduce the basic Python environment. Glossary: I will introduce basic terms about Python package management.
pip: A tool for package installation. It retrieves Python packages from PyPI. pip corresponds to the gem command of Ruby.
virtualenv: A package isolation tool for Python. It has a similar function to bundler of Ruby, but it also has the function to change Python
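As a rough illustration of the isolation idea in the glossary, here is a minimal sketch. It uses venv from the Python standard library instead of virtualenv itself, an assumption made so the example needs no extra installation. The helper names create_isolated_env and in_virtual_env are hypothetical and chosen only for this example.

```python
import sys
import tempfile
import venv
from pathlib import Path


def create_isolated_env(target: Path) -> Path:
    """Create a virtual environment at `target` and return its config file path."""
    # with_pip=False keeps the sketch fast and avoids bootstrapping pip.
    venv.create(target, with_pip=False)
    return target / "pyvenv.cfg"


def in_virtual_env() -> bool:
    """Report whether the running interpreter belongs to a virtual environment."""
    # Inside a virtual environment, sys.prefix differs from sys.base_prefix.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cfg = create_isolated_env(Path(tmp) / "demo-env")
        # pyvenv.cfg records which base interpreter the environment was built from.
        print(cfg.read_text())
        print("running inside a virtual environment:", in_virtual_env())
```

Packages installed with pip from inside such an environment stay in that environment's own directory, which is the isolation the glossary entry describes.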

Why OSS based machine learning is good?

This article is a translation of the Japanese version. Since the release of TensorFlow, the movement toward OSS-based machine learning has been accelerating. François Chollet, the creator of Keras, has stated the essential point of this change. I think his words are enough, but in this article, I would like to lay out why open source machine learning is great and what the recent trends are. tl;dr: Machine learning and deep learning frameworks have become standard tools for software engineers. Since arXiv has become very well known, many papers are published before peer review at international conferences.

How to run Cloudera Director on your macOS/Windows 10

Cloudera Director is a provisioning tool for CDH and Cloudera Enterprise. We can launch a cluster with the Web GUI or the CLI tool. Using the Cloudera Director CLI tool, you can manage your cluster with a configuration file, which lets you keep configurations in git. In this article, I will introduce how to install Cloudera Director on your local macOS or Windows 10. For usage of Cloudera Director, see also the documentation. Cloudera Director 2.

tabula-py now able to extract remote PDF and multiple tables at once

(Note: Oct 7th, 2019) As of Oct. 2019, I launched a documentation site and a Google Colab notebook for tabula-py. The FAQ is a good place to look for tips on accurate extraction. tabula-py is a Python library that enables you to extract tables from PDFs into pandas DataFrames. Today, I released v0.8.0. In this post, I will introduce the improvements since the previous post about tabula-py. If you aren’t familiar with tabula-py, you can read the previous one.
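To give a feel for the two features named in the title, here is a minimal sketch of reading every table from a PDF given as a path or URL. The wrapper extract_tables and its reader parameter are hypothetical and exist only to keep the example testable without Java. The underlying call is tabula.read_pdf with the pages and multiple_tables options, and the URL in the usage note is a placeholder, not a real document.

```python
from typing import Callable, Optional


def extract_tables(source: str, reader: Optional[Callable[..., list]] = None) -> list:
    """Return every table found in `source` (a local path or a URL) as a list."""
    if reader is None:
        # Imported lazily so this module loads even where tabula-py or Java is missing.
        import tabula

        reader = tabula.read_pdf
    # pages="all" scans the whole document;
    # multiple_tables=True asks for one DataFrame per detected table.
    return reader(source, pages="all", multiple_tables=True)
```

With tabula-py and a Java runtime installed, a call such as extract_tables("https://example.com/report.pdf") would be expected to return a list of pandas DataFrames, one per table found.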