Skip to content

Analyse Apache Airflow with MSSQL DB.

Notifications You must be signed in to change notification settings

a-shirude/airflow-mssql

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

9 Commits
 
 
 
 

Repository files navigation

airflow-mssql

Analyse Apache Airflow with MSSQL DB. Official docs

Purpose

Create a DAG which will connect to mssql server and run simple stored procedure to create table.

System Requirements

Airflow doesn't officially support running on Windows. Need a setup on a Linux VM. Install Oracle VM VirtualBox on Windows system and setup Ubuntu OS.

Installation Guide

Install python on Ubuntu :

sudo apt update

sudo apt-get install python3.6

From sqlAlchemy docs - The SQL Server dialect uses pyodbc as the default DBAPI.

Install pyodbc :

Download link

pip3 install pyodbc

Install Apache-Airflow :

Refer : Tutorial

pip3 install apache-airflow[mssql]

Now Airflow will create the $AIRFLOW_HOME folder and lay an “airflow.cfg” file with defaults that get you going fast.

Inside airflow.cfg file :

Set load_examples = False Set sql_alchemy_conn = mssql+pyodbc://'login':'password'@'serverIP'/'airflow_test_DB'?driver='pathOf_odbcDriver'

NOTE : Do not install pymssql. This gives error - 'str' object has no attribute 'tzinfo'. Tried all timezone aware methods but got no success. Probably missed some airflow or sqlAlchemy configurations. Airflow timezone

Setup DAG and Airflow UI

Airflow - Initiation of DB in MSSQL Server. Creates all metadata tables for the application, it is responsible to setup backend

airflow initdb

start the web server, default port is 8080

airflow webserver -p 8080

  1. Edit coonnections inside Airflow UI --> Admin --> Connections
  2. Create DAG : repo contains an example DAG. Inside DAG task,

Set mssql_conn_id = '<newly_created_connection>'

Checkpoints while running DAG

  • check if airflow scheduler is running
  • check if airflow webserver is running
  • check if all DAGs are set to On in the web UI
  • check if the DAGs have a start date which is in the past
  • check if the DAGs have a proper schedule (before the schedule date) which is shown in the web UI
  • check if the dag has the proper pool and queue.

About

Analyse Apache Airflow with MSSQL DB.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 100.0%