Data Engineering 3.0 with Azure

Pre-Read Material:

Python

Linux

Docker Fundamentals & Installation: https://www.youtube.com/watch?v=jPdIRX6q4jA&list=PLy7NrYWoggjzfAHlUusx2wuDwfCrmJYcs

GitHub fundamentals: https://youtu.be/8JJ101D3knE

Quick setup:

1. Windows with GitHub
2. Linux with GitHub

Install Git on Windows: https://git-scm.com/

# open git bash

git config --global user.name "prabh8331"
git config --global user.email "prabh8331@gmail.com"

# SSH setup for Git and GitHub
mkdir -p ~/.ssh && cd ~/.ssh
ssh-keygen -t ed25519 -C "prabh8331@gmail.com"

eval "$(ssh-agent -s)"
ssh-add ~/.ssh/id_ed25519
cat ~/.ssh/id_ed25519.pub   # copy the printed public key
# On GitHub: Settings > SSH and GPG keys > New SSH key > paste the key

ssh -T git@github.com

Windows with Linux

# Windows SSH setup
# -- Ubuntu server part
mkdir -p ~/.ssh && cd ~/.ssh
ssh-keygen -t ed25519
# when prompted for the file name, enter: windows
eval "$(ssh-agent -s)"
ssh-add ~/.ssh/windows

cat windows.pub        # copy the key from here
nano authorized_keys   # paste the key here

# Windows part
# Copy the private key file "windows" to "C:\Users\Komalpreet Kaur\.ssh".
# Then open VS Code and edit the SSH config file ("C:\Users\Komalpreet Kaur\.ssh\config"):

Host ubuntu_server
  HostName 192.168.1.111
  User userver
  IdentityFile "C:\Users\Komalpreet Kaur\.ssh\windows"

MySQL Workbench setup on Windows: https://youtu.be/8JJ101D3knE

Create a GCP account: https://www.youtube.com/watch?v=m5hwU0jD0qc

Interview questions others are getting:

Basic DSA is required (practice on LeetCode).

Incremental data refresh in Snowflake and Databricks.
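
As a rough illustration of what an incremental refresh means, the sketch below copies only rows changed since the last run, using a high-water mark on a timestamp column. It is not Snowflake or Databricks code: it runs against an in-memory SQLite database, `INSERT OR REPLACE` stands in for the `MERGE INTO` statement those platforms provide, and the table names (`source`, `target`), the columns, and the `incremental_refresh` helper are all hypothetical.

```python
import sqlite3


def incremental_refresh(conn: sqlite3.Connection) -> int:
    """Apply only rows changed since the last refresh; return how many were applied."""
    cur = conn.cursor()
    # High-water mark: the newest change already present in the target.
    (watermark,) = cur.execute(
        "SELECT COALESCE(MAX(updated_at), '') FROM target"
    ).fetchone()
    # Pull only the source rows that changed after the watermark.
    changed = cur.execute(
        "SELECT id, amount, updated_at FROM source WHERE updated_at > ?",
        (watermark,),
    ).fetchall()
    # Upsert by primary key (assumption: stands in for MERGE INTO on a warehouse).
    cur.executemany(
        "INSERT OR REPLACE INTO target (id, amount, updated_at) VALUES (?, ?, ?)",
        changed,
    )
    conn.commit()
    return len(changed)


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE source (id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT);
    CREATE TABLE target (id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT);
    INSERT INTO source VALUES (1, 10.0, '2024-01-01'), (2, 20.0, '2024-01-02');
""")
first_run = incremental_refresh(conn)   # initial load picks up both rows

# One existing row changes and one new row arrives in the source.
conn.execute("UPDATE source SET amount = 15.0, updated_at = '2024-01-03' WHERE id = 1")
conn.execute("INSERT INTO source VALUES (3, 30.0, '2024-01-04')")
second_run = incremental_refresh(conn)  # only the changed and the new row

target_rows = conn.execute(
    "SELECT id, amount, updated_at FROM target ORDER BY id"
).fetchall()
print(first_run, second_run, target_rows)
```

A timestamp watermark like this misses deletes and rows whose timestamp is not updated on change, which is why real pipelines often rely on change data capture instead.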

SQL: window functions; common table expressions (CTEs); how to create a function in SQL; stored procedures (know the theory); views; indexing; iterative and recursive CTEs.
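
Two of the topics above, window functions and recursive CTEs, can be tried without installing a database. The sketch below uses Python's built-in `sqlite3` module and assumes the bundled SQLite is 3.25 or newer (needed for window functions). The `sales` table and its rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, day TEXT, amount INTEGER);
    INSERT INTO sales VALUES
        ('east', '2024-01-01', 100),
        ('east', '2024-01-02', 50),
        ('west', '2024-01-01', 70),
        ('west', '2024-01-02', 30);
""")

# Window function: a running total per region, without collapsing rows
# the way GROUP BY would.
running_totals = conn.execute("""
    SELECT region, day,
           SUM(amount) OVER (PARTITION BY region ORDER BY day) AS running_total
    FROM sales
    ORDER BY region, day
""").fetchall()

# Recursive CTE: the anchor row (1) is extended by the recursive step
# until the WHERE condition stops it.
numbers = [n for (n,) in conn.execute("""
    WITH RECURSIVE counter(n) AS (
        SELECT 1
        UNION ALL
        SELECT n + 1 FROM counter WHERE n < 5
    )
    SELECT n FROM counter ORDER BY n
""")]

print(running_totals)
print(numbers)
```

The same `OVER (PARTITION BY ... ORDER BY ...)` and `WITH RECURSIVE` syntax carries over to most warehouse SQL dialects with small differences, so the queries are worth practicing in whichever engine the interview targets.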

Kubernetes is more a part of DevOps.

NoSQL: many NoSQL databases do not provide full ACID guarantees, so prefer a relational database when strong consistency is required. NoSQL is best suited to high traffic, scalability, parallel processing, and analytical queries.

Python: pandas (NumPy not required). DSA: LeetCode, 2 questions every day. System design is not needed, but the basics of data pipelines are. Scala.

Tokenization in Oracle stream?

How to practice SQL:

LeetCode
Search for case studies on GitHub

Azure Fabric (Microsoft Fabric)

DP-203 certificate

After the course, the DevOps part can be covered.

Data modeling and data warehousing, data lakes, Iceberg, Hudi, Kubernetes, DevOps, etc.

Interviews ask about the ETL part and the data processing part with respect to Databricks.

Folders and files:

Interview-prep
Linux-Basics-main
Module1-SQL
Module13a-AWS
Module2-BigData Fundamentals-and-Hadoop
Module4-Kafka
Module7-Spark
Python_Fundamentals-main
architecture
aws_dev_env/lambda
.gitignore
readme.md

prabh8331/Data-Engineering-with-Azure