Data Warehouse as Code
Infrastructure as Code for the Snowflake data warehouse platform
PS: Anything which has been written here is entirely in my own capacity and knowledge.
More often than not we have heard of running Infrastructure-As-Code (IAC) to maintain the infrastructure state for cloud native applications. However, this got me thinking. With Data Warehouse playing a critical piece in many critical cloud native applications for enterprise companies, how can we manage the state of these warehouse infrastructure? Are we able to have a set of repeatable code for data warehouse infrastructure for operations?
Turns out there is a way to do so. In this article I will be creating a simple Data Warehouse DevOps Demo. For IAC, I will be using Terraform alongside with Jenkins to help automate our pipeline. Terraform is one of the most popular IAC which strongly resonates with a lot of DevOps engineers.
For my Data Warehouse, I will be using Snowflake as a Data Warehouse platform. If you do not know already, Snowflake is steadily becoming a critical piece of companies’ data warehouse strategy.
With the rise in popularity in Snowflake and Terraform, it is great to know that a Terraform provider plugin for managing Snowflake accounts was created by chanzuckerberg. (https://registry.terraform.io/providers/chanzuckerberg/snowflake/latest/docs).
An overview of our architecture is given here. For this, we will be creating our end state for our warehouse defined in Terraform. After checking in our source code into Github, Jenkins will pull from our repo to deploy the Snowflake Data Warehouse as defined in our IAC.
Looking at our code state, I am creating a simple Data Warehouse of xsmall, deploying a database entitled “TF_DEMO_DB” with a schema (no Fail-safe period enabled) and my table having 2 simple columns.
In my Jenkins pipeline, I have set up and deployed the Jenkins Plugin and created a Jenkins File with a parameterised build of my SNOWFLAKE_USER, SNOWFLAKE_ACCOUNT, SNOWFLAKE_REGION and SNOWFLAKE_PASSWORD.
After triggering my pipeline, we can see that Jenkins has fired off and created my warehouse, schema, database and associated tables
So what happens if I want to add in an additional column say description with the variant data type?
Going back to our Terraform code, I have added in one more column stating description with type variant.
After checking in my code into Github and rerunning the Jenkins workflow, going back to our Snowflake console, we can see that the column description has been added. Pretty neat!
As such, from this we can see how we can use Terraform, Jenkins and Github to create a Day 1 creation+ Day 2 continuous Data Warehouse Devops Pipeline for Snowflake.
Hope this was a good educational read. Have fun coding!