
prabh8331/Data-Engineering-with-Azure


Data Engineering 3.0 with Azure

Pre-Read Material:

Python

Linux

Docker fundamentals & installation: https://www.youtube.com/watch?v=jPdIRX6q4jA&list=PLy7NrYWoggjzfAHlUusx2wuDwfCrmJYcs

GitHub fundamentals: https://youtu.be/8JJ101D3knE

Quick setup:

  1. Windows with GitHub and 2. Linux with GitHub

Install Git on Windows: https://git-scm.com/
# open Git Bash

git config --global user.name "prabh8331"
git config --global user.email "prabh8331@gmail.com"

# SSH setup for Git and GitHub
mkdir -p ~/.ssh && cd ~/.ssh
ssh-keygen -t ed25519 -C "prabh8331@gmail.com"

eval "$(ssh-agent -s)"
ssh-add ~/.ssh/id_ed25519
cat id_ed25519.pub   # copy this output
# On GitHub: Settings > SSH and GPG keys > New SSH key > paste the key

ssh -T git@github.com
  3. Windows with Linux
# Windows SSH setup
# -- Ubuntu server part
cd ~/.ssh
ssh-keygen -t ed25519 -f ~/.ssh/windows   # name the key "windows"
eval "$(ssh-agent -s)"
ssh-add ~/.ssh/windows

cat windows.pub   # copy the key from here
nano authorized_keys   # paste it here

# -- Windows part
# Copy the private key "windows" to "C:\Users\Komalpreet Kaur\.ssh".
# Then open VS Code and edit the SSH config file at "C:\Users\Komalpreet Kaur\.ssh\config":

Host ubuntu_server
  HostName 192.168.1.111
  User userver
  IdentityFile "C:\Users\Komalpreet Kaur\.ssh\windows"


MySQL Workbench setup on Windows: https://youtu.be/8JJ101D3knE

Create a GCP account: https://www.youtube.com/watch?v=m5hwU0jD0qc

What interview questions others are getting:

Basic DSA (LeetCode level) is required.

Incremental data refresh in Snowflake and Databricks.
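A minimal sketch of one common incremental-refresh pattern (a high-water mark plus an upsert), using Python's built-in sqlite3 as a stand-in for the warehouse. The table and column names (`source_orders`, `target_orders`, `etl_watermark`, `updated_at`) are hypothetical. In Snowflake or Databricks the upsert step would usually be a `MERGE INTO` statement rather than SQLite's `INSERT ... ON CONFLICT`.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create hypothetical source, target and watermark tables in memory."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE source_orders (id INTEGER PRIMARY KEY, amount REAL, updated_at INTEGER);
        CREATE TABLE target_orders (id INTEGER PRIMARY KEY, amount REAL, updated_at INTEGER);
        CREATE TABLE etl_watermark (table_name TEXT PRIMARY KEY, last_loaded INTEGER);
        INSERT INTO source_orders VALUES (1, 10.0, 100), (2, 20.0, 101);
    """)
    return conn


def incremental_refresh(conn: sqlite3.Connection) -> int:
    """Load only rows changed since the last run; return how many were upserted."""
    cur = conn.cursor()
    # 1. Read the high-water mark saved by the previous run (0 on the first run).
    row = cur.execute(
        "SELECT last_loaded FROM etl_watermark WHERE table_name = 'orders'"
    ).fetchone()
    watermark = row[0] if row else 0

    # 2. Extract only rows updated after the watermark.
    changed = cur.execute(
        "SELECT id, amount, updated_at FROM source_orders WHERE updated_at > ?",
        (watermark,),
    ).fetchall()

    # 3. Upsert: insert new ids, update existing ones (MERGE-like behaviour).
    cur.executemany(
        """INSERT INTO target_orders (id, amount, updated_at) VALUES (?, ?, ?)
           ON CONFLICT(id) DO UPDATE SET amount = excluded.amount,
                                         updated_at = excluded.updated_at""",
        changed,
    )

    # 4. Advance the watermark to the newest timestamp just loaded.
    if changed:
        cur.execute(
            """INSERT INTO etl_watermark (table_name, last_loaded) VALUES ('orders', ?)
               ON CONFLICT(table_name) DO UPDATE SET last_loaded = excluded.last_loaded""",
            (max(r[2] for r in changed),),
        )
    conn.commit()
    return len(changed)


if __name__ == "__main__":
    conn = build_demo_db()
    print(incremental_refresh(conn))  # 2: the first run loads every row
    conn.execute("UPDATE source_orders SET amount = 25.0, updated_at = 102 WHERE id = 2")
    print(incremental_refresh(conn))  # 1: only the changed row is reloaded
```

The watermark keeps each run proportional to the amount of new data rather than the table size. The upsert makes re-running a batch safe.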

SQL: window functions, common table expressions (CTEs), how to create functions in SQL, stored procedures (know the theory), views, indexing, iterative and recursive CTEs.
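You can practice these SQL topics locally with Python's built-in sqlite3 module (SQLite 3.25+ supports window functions). This sketch uses a hypothetical `employees` table to show a window function, a plain CTE, and a recursive CTE. It does not cover stored procedures, because SQLite does not support them.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Hypothetical employees table; manager_id points at another employee."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE employees (
            id INTEGER PRIMARY KEY, name TEXT, dept TEXT,
            salary INTEGER, manager_id INTEGER
        );
        INSERT INTO employees VALUES
            (1, 'Asha', 'Eng',   150, NULL),
            (2, 'Ben',  'Eng',   120, 1),
            (3, 'Chen', 'Eng',   120, 1),
            (4, 'Dina', 'Sales',  90, 1),
            (5, 'Eli',  'Sales',  80, 4);
    """)
    return conn


# Window function: rank salaries inside each department without collapsing rows.
RANK_BY_DEPT = """
    SELECT name, dept, salary,
           DENSE_RANK() OVER (PARTITION BY dept ORDER BY salary DESC) AS dept_rank
    FROM employees
    ORDER BY dept, dept_rank, name
"""

# Plain CTE: compute department averages once, then join them back.
ABOVE_DEPT_AVG = """
    WITH dept_avg AS (
        SELECT dept, AVG(salary) AS avg_salary FROM employees GROUP BY dept
    )
    SELECT e.name
    FROM employees AS e JOIN dept_avg AS d ON e.dept = d.dept
    WHERE e.salary > d.avg_salary
    ORDER BY e.name
"""

# Recursive CTE: walk the reporting chain down from the top-level manager.
REPORTING_CHAIN = """
    WITH RECURSIVE chain(id, name, depth) AS (
        SELECT id, name, 0 FROM employees WHERE manager_id IS NULL
        UNION ALL
        SELECT e.id, e.name, c.depth + 1
        FROM employees AS e JOIN chain AS c ON e.manager_id = c.id
    )
    SELECT name, depth FROM chain ORDER BY depth, name
"""


if __name__ == "__main__":
    conn = build_demo_db()
    for query in (RANK_BY_DEPT, ABOVE_DEPT_AVG, REPORTING_CHAIN):
        print(conn.execute(query).fetchall())
```

`DENSE_RANK` gives Ben and Chen the same rank because their salaries tie. The recursive CTE keeps joining new rows onto `chain` until no employee reports to anyone already found.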

Kubernetes is more a part of DevOps.

NoSQL: many NoSQL databases do not fully support ACID properties, so keep this in mind when we need consistency. NoSQL is best for high traffic, scalability, parallel processing, and analytical queries.

Python: pandas (NumPy not required); DSA (LeetCode, 2 questions every day); system design is not needed, but the basics of data pipelines are; Scala.

Tokenization in Oracle Streams?

How to practice SQL

  1. LeetCode
  2. search for SQL case studies on GitHub

Microsoft Fabric

DP-203 certification (Data Engineering on Microsoft Azure)

After the course, cover the DevOps part.

Data modeling and data warehousing, data lakes, Apache Iceberg, Apache Hudi, Kubernetes, DevOps, etc.
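For data modeling and warehousing, the star schema is a widely used pattern. A central fact table holds measures such as quantity and revenue, and it joins to dimension tables that hold descriptive attributes such as product and date. Below is a minimal sketch with sqlite3 and hypothetical tables (`fact_sales`, `dim_product`, `dim_date`).

```python
import sqlite3


def build_star_schema() -> sqlite3.Connection:
    """Create a tiny hypothetical star schema in memory."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Dimension tables: descriptive attributes, one row per entity.
        CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
        CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
        -- Fact table: numeric measures plus a foreign key to each dimension.
        CREATE TABLE fact_sales (
            product_key INTEGER REFERENCES dim_product(product_key),
            date_key INTEGER REFERENCES dim_date(date_key),
            quantity INTEGER,
            revenue REAL
        );
        INSERT INTO dim_product VALUES
            (1, 'Laptop', 'Electronics'), (2, 'Desk', 'Furniture'), (3, 'Monitor', 'Electronics');
        INSERT INTO dim_date VALUES (20240101, 2024, 1), (20240201, 2024, 2);
        INSERT INTO fact_sales VALUES
            (1, 20240101, 2, 2000.0),
            (2, 20240101, 1, 300.0),
            (3, 20240201, 3, 600.0),
            (1, 20240201, 1, 1000.0);
    """)
    return conn


def revenue_by_category_month(conn: sqlite3.Connection) -> list:
    """Typical warehouse query: join the fact to its dimensions, then aggregate."""
    return conn.execute("""
        SELECT p.category, d.year, d.month, SUM(f.revenue) AS revenue
        FROM fact_sales AS f
        JOIN dim_product AS p ON f.product_key = p.product_key
        JOIN dim_date AS d ON f.date_key = d.date_key
        GROUP BY p.category, d.year, d.month
        ORDER BY p.category, d.year, d.month
    """).fetchall()


if __name__ == "__main__":
    for row in revenue_by_category_month(build_star_schema()):
        print(row)
```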

Interviews ask about ETL and data processing with respect to Databricks.
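Here is a minimal extract-transform-load sketch that uses only the Python standard library. It works on a hypothetical orders CSV with hypothetical cleaning rules: drop rows with a missing amount, and keep only completed orders. In Databricks you would typically write the same steps with PySpark DataFrames reading from and writing to Delta tables. The function names here are illustrative, not a Databricks API.

```python
import csv
import io
import sqlite3

# Hypothetical raw input with messy whitespace, mixed-case status and a missing amount.
RAW_CSV = """order_id,customer,amount,status
1, alice ,100.50,COMPLETE
2,bob,,COMPLETE
3,carol,75.00,cancelled
4,alice,20.00,complete
"""


def extract(text: str) -> list[dict[str, str]]:
    """Extract: parse CSV text into one dict per row."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: list[dict[str, str]]) -> list[tuple[int, str, float]]:
    """Transform: drop bad rows, normalise text, cast types."""
    clean = []
    for r in rows:
        if not r["amount"]:  # drop rows with a missing amount
            continue
        if r["status"].strip().lower() != "complete":  # keep completed orders only
            continue
        clean.append((int(r["order_id"]), r["customer"].strip(), float(r["amount"])))
    return clean


def load(conn: sqlite3.Connection, rows: list[tuple[int, str, float]]) -> int:
    """Load: write into the target table; INSERT OR REPLACE keeps re-runs idempotent."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    loaded = load(conn, transform(extract(RAW_CSV)))
    print(loaded, conn.execute(
        "SELECT customer, SUM(amount) FROM orders GROUP BY customer").fetchall())
```

Each stage is a separate function, so you can test it in isolation. That is the same reason notebook ETL code in Databricks is often split into extract, transform, and load cells or tasks.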
