What’s data lineage?

Data lineage describes where a piece of data comes from and where it goes. I learned the term at my previous job, where the team provided Cloudera Navigator, which includes data lineage built from the execution logs of Hive, Spark, and so on.

Lineage in Cloudera Navigator, via https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/cn_lineage_generation.html

sqllineage is an awesome open source tool for visualizing lineage

Recently, I learned there is a Python package called sqllineage, that makes analyze and