Note: Views expressed here are my own and do not represent my employer in any fashion.
It is without a doubt that containers have become the default standard for running cloud-native applications. The same extends to data pipelines, which are finally reproducible and can be run anywhere in the same fashion. With the advent of Kubernetes, we can now deploy these repeatable data pipelines at scale with little management overhead.
In this article, we will look at how to set up dbt to run as a container on Kubernetes with GitHub Actions integration. The idea is that whenever we adjust our dbt models, a GitHub Actions workflow is triggered that builds and deploys the updated container onto Kubernetes. Let’s begin!
We have written a simple dbt model based on dbt’s Jaffle Shop example, in which we perform some simple aggregations (dim_customers.sql) over the CUSTOMERS table (stg_customers.sql) and ORDERS table (stg_orders.sql). For reference, I am going to paste the stg_orders.sql and stg_customers.sql files below.
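The exact files in the upstream Jaffle Shop project may differ slightly; the following is a minimal sketch of what the two staging models and the aggregation look like (file paths, source table names, and column names are illustrative, not prescriptive):

```sql
-- models/staging/stg_customers.sql — standardise the raw customers table
select
    id as customer_id,
    first_name,
    last_name
from raw.jaffle_shop.customers

-- models/staging/stg_orders.sql (separate file) — standardise the raw orders table
select
    id as order_id,
    user_id as customer_id,
    order_date,
    status
from raw.jaffle_shop.orders

-- models/marts/dim_customers.sql (separate file) — simple per-customer aggregation,
-- referencing the staging models via dbt's ref() function
select
    c.customer_id,
    c.first_name,
    c.last_name,
    min(o.order_date)  as first_order_date,
    count(o.order_id)  as number_of_orders
from {{ ref('stg_customers') }} c
left join {{ ref('stg_orders') }} o
    on c.customer_id = o.customer_id
group by 1, 2, 3
```

Each model lives in its own .sql file; they are shown together here only for brevity.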
We will also create a profiles.yml file containing the profile data of our dbt environment. We are going to make profiles.yml read its credentials from environment variables in our Kubernetes cluster, which we will configure later as Kubernetes secrets.
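As a sketch, profiles.yml can pull each credential through dbt’s built-in env_var() function. The profile name must match the profile key in your dbt_project.yml, and the environment variable names below are my own choices, not required ones:

```yaml
# profiles.yml — credentials come from environment variables,
# which we will later inject via Kubernetes secrets
jaffle_shop:
  target: prod
  outputs:
    prod:
      type: snowflake
      account: "{{ env_var('SNOWFLAKE_ACCOUNT') }}"
      user: "{{ env_var('SNOWFLAKE_USER') }}"
      password: "{{ env_var('SNOWFLAKE_PASSWORD') }}"
      role: "{{ env_var('SNOWFLAKE_ROLE') }}"
      database: "{{ env_var('SNOWFLAKE_DATABASE') }}"
      warehouse: "{{ env_var('SNOWFLAKE_WAREHOUSE') }}"
      schema: "{{ env_var('SNOWFLAKE_SCHEMA') }}"
      threads: 4
```

With this in place, dbt fails fast at startup if any of the variables is missing, which is exactly what we want inside a cluster.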
We also have our Dockerfile, which we will use to containerise our dbt project.
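A minimal Dockerfile might look like the following; the Python base image version, the dbt-snowflake adapter, and the paths are assumptions you should adjust to your own setup:

```dockerfile
# Dockerfile — a minimal image for running dbt against Snowflake
FROM python:3.9-slim

# Install the Snowflake adapter (pulls in dbt-core as a dependency)
RUN pip install --no-cache-dir dbt-snowflake

# Copy the dbt project into the image
WORKDIR /dbt
COPY . /dbt

# Tell dbt to look for profiles.yml inside the project directory
ENV DBT_PROFILES_DIR=/dbt

ENTRYPOINT ["dbt"]
CMD ["run"]
```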
Let us now configure our GitHub Actions workflow to build our Docker image and push it to Docker Hub.
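A sketch of such a workflow, using the official docker/login-action and docker/build-push-action; the secret names and the image tag are placeholders you would choose yourself:

```yaml
# .github/workflows/build.yml — build and push the dbt image on every push to main
name: Build and push dbt image

on:
  push:
    branches: [main]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - uses: docker/login-action@v2
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}

      - uses: docker/build-push-action@v4
        with:
          context: .
          push: true
          tags: ${{ secrets.DOCKERHUB_USERNAME }}/dbt-jaffle:latest
```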
Also remember to set your Actions secrets in your GitHub repository, and you can then trigger the workflow on git push!
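On the Kubernetes side, one way to wire this up is to store the Snowflake credentials in a secret (for example with kubectl create secret generic dbt-secrets --from-literal=SNOWFLAKE_USER=myuser and so on for each variable) and run the container on a schedule with a CronJob. The manifest below is a hedged sketch; the image name, secret name, and schedule are all illustrative:

```yaml
# dbt-cronjob.yml — run the dbt container hourly,
# injecting credentials from the dbt-secrets Kubernetes secret
apiVersion: batch/v1
kind: CronJob
metadata:
  name: dbt-run
spec:
  schedule: "0 * * * *"   # every hour, on the hour
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: dbt
              image: <your-dockerhub-user>/dbt-jaffle:latest
              args: ["run"]
              envFrom:
                - secretRef:
                    name: dbt-secrets
```

Because profiles.yml reads everything through env_var(), the envFrom block is all the plumbing the container needs.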
If we take a look at our Snowflake environment, we can see that the DIM_CUSTOMERS table has materialized.
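As a quick sanity check you can query the table directly in Snowflake; the database and schema names here are hypothetical and depend on your profile configuration:

```sql
-- confirm the model materialised and has rows
select count(*) as customer_count
from analytics.dim_customers;
```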