-
Notifications
You must be signed in to change notification settings - Fork 0
/
x.1. database_fundamentals.txt
183 lines (171 loc) · 10.8 KB
/
x.1. database_fundamentals.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
// LESSON 1.1 - NORMALIZATION
- organize data and reduce redundant data
- improves data integrity
- this is useful if you're the one creating the database but you're a database engineer!!! you modify existing database already
- there's still alot of thigns i havents finished yet, ill tackle them all when i need them
data anomalies
- insertional anomaly
- update anomalies:
- deletion anomaly:
1st normal form (1nf)
- eliminate repeating groups
- attribute domain does not change
- there is a unique name for every attribute/column
- order in whcih data is stored does not matter
- all relation must have singled valued attribute only
example: not 1nf since there are columns which contains multiple value in a single row
ID Name Courses
------------------
1 A c1, c2
2 E c3
3 M C2, c3
converted into 1nf
ID Name Course
------------------
1 A c1
1 A c2
2 E c3
3 M c2
3 M c3
2nd normal form (2nf)
- based on the concept of full functional dependency
- should not contain partial dependency
partial dependency: occurs when a non-prime attribute is functionally dependent on part of a candidate key
candidate key:
- applies to relations with composite keys
- composite keys are multiple primary keys which can be used to uniquely identify a row
- relation with a single attribute primary key is automatically atleast 2nf
- to be in 2nf, it has to be in 1nf first
example: not 2nf since stud_no and course_no are both primary keys thus forming (composite key) | candidate keys are also not unique for each rows so we also have to fix that
STUD_NO COURSE_NO COURSE_FEE
1 C1 1000
2 C2 1500
1 C4 2000
4 C3 1000
4 C1 1000
2 C5 2000
converted into 2nf
TABLE 1 | TABLE 2
STUD_NO COURSE_NO | COURSE_NO COURSE_FEE
1 C1 | C1 1000
2 C2 | C2 1500
1 C4 | C3 1000
4 C3 | C4 2000
4 C1 | C5 2000
2 C5 |
3rd normal form (3nf)
- if no transitive dependency for non-prime attributes
transitive dependency: e.g. if we know the book then we know the author but if we know the author we cannot know the book, that is transitive dependency
non prime attributes: if a value for an attribute can be repeated, or in other words if values isn't unique
boyce-codd normal form (bcnf)
4th normal form
5th normal form (pj/nf or pjnf)
CONCLUSION
1NF - NO MULTI VALUE ATTRIBUTE
2NF - NO COMPOSITE KEY, SPLIT THE TABLE | NO PARTIAL DEPENDENCY
3NF - NO TRANSITIVE DEPENDENCY
// LESSON 1.2 - ACID TRANSACTIONS
transaction : single logical unit fo work which accesses and possibly modifies the content of a database
- in other words: a single transaction is equal to single modification of the db
- In order to maintain consistency in a database, before and after the transaction, certain properties are followed. These are called ACID properties
ACID - Atomicity, Consistency, Isolation, Durability
Atomicity
- entire transaction takes place at once or doesn't happen at all
- transaction must not occur partially
- all or nothing
abort
- changes made are not visible
commit
- changes made are visible
Consistency
- databse must be consistent before and after the transaction
Isolation
- multiple transactions occur independently without interference
Durability
- changes of a successful transaction occurs even if the system failure occurs
// LESSON 1.3 - CAP THEOREM
CAP - Consistency, Availability, Partition Tolerance
- it is not possible to guarantee all three of the desirable properties at the same time in a distributed system with data replication
- theorem states that networked shared data system can only strong support two of the following three properties
- CAP and ACID do not refer to the same identical concept
Consistency
- nodes will have the same copies of a replicated data item visible for various transaction
Availability
- each read or write request for a data item will either be processedd successfully or will receive a message that the operation cannot be completed
Partition Tolerant
- means that the system can continue operating if the network connecting the nodes has a fault that results in two or more partitions
// LESSON 1.4 - OLTP VS OLAP
OLAP - Online Analyitical Processing
- type of software tools that are used for data analysis for business decisions
- get insights from the database retrieved from multiple database systems at one time
- any type of data warehouse system is an olap system
- used for data mining, analytics, decision making, etc.
OLTP - Online Transaction Processing
- OLTP administers day to day transaction of an organization
- used for business task
- example: atm center is an oltp
- another example: online banking, online airline ticket booking, etc.
// LESSON 1.5 - HORIZONTAL VS VERTICAL SCALING
- scaling either compress or exapnd the system to meet the expected needs
- can be achieved by adding resources to meet the smaller expectaion in the current system, or by adding a new in the existine one, or both
vertical scaling
- e.g. buying bigger machine to meet the expectation
- when new resources are added in the existing system to meet the expectation
- cheaper than horizontal scaling and also requires less time to be fixed
horizontal scaling
- e.g. buying more machine to meet the expectation
- when new server racks are added in the existing system to meet the higher expectation
// LESSON 1.6 - DIMENSIONAL MODELING
- one of the data modelling techniques used in data warehouse design
- goal is to improve the data retrieval
- advantage of this model is so that we can store data in such a way that it is easier to store and retrieve the data once stored in a data warehouse
- used by many olap systems
steps for dimensional model
identify the business object
identify granularity
- granularity is the lowest level of information stored in the table
- the level of detail for business problem and its solution is described by Grain
idetify dimension and attribute
dimensions
- dimensions are objects or things
- dimensions categorize and describe data warehouse facts and measures in a way that supports meaningful answers to business questions.
- e.g. data dimension may contain data like a year, motnh and weekday
identify the fact
- measurable data is helf by the fact table
- e.g. numerical values like price or cost per unit, etc.
build schema
- we implment dimension model in this step
- a schema is a database structure (how the table must look like somehow like an interface)
// LESSON X.1 - DATABASE TERMS
DATABASE - collection of relations
DATAFRAME - labeled data structure just like a json, or an object in js or a dict in python
RELATIONS - multiple tables are connected to each other
TABLE - collection of columns
COLUMNS - collection of individual fields/attribute
FIELD - single individual column it describes a particular cell found on any row
ATTRIBUTE - single individual value in a column that describes an entity
PRIME ATTRIBUTE - unique value
NON-PRIME ATTRIBUTE - not unique value
ROWS - data object
TUPLE - one record/row
RECORD - just like tuple, row
ENTITIY - single row | in er diagram entity is the table
DATA - individual cell data
SCHEMA - like a format on how a columns datatype on a table should look like
KEYS
PRIMARY KEYS - uniquely identifies every row in that table | e.g. { stud_id }
FOREIGN KEYS - creates relationship between two tables, allow navigation between two different instance of an entity. This is the primary key from a different table that is in your current table. | e.g. table1(other) - {stud_id}, table2(current) - {stud_id}
COMPOSITE KEY - key that consists of two or more attributes that uniquely identify any row in the table | e.g. { cust_id, product_code }
CANDIDATE KEY - unique and no repeating values just like primary key. A field that can uniquely identify a row | e.g. {stud_id}, {roll_no}, {email} - these single fields alone can identify a unique row
SUPER KEY - group of single or multiple keys which can identify rows in a table | e.g. { emp_ssn } - this column alone can identify the row, { emp_id } - this column alone can identify the row, { emp_ssn,emp_id } - these two columns alone can idenfity a row, etc.
ALTERNATE KEY - Remaining candidate key after primary key is chosen. | e.g. let's say there are three candidate keys:{stud_id,roll_no,email} then {stud_id} was chosen as primary key therefore {roll_no,email} becomes the alternate key
COMPOUND KEY - has two or more foreign keys in your current table which could be used to uniquely identify a row | e.g. {student_id} and {course_id} are primary key in other table but is a foreign key in our current table, we could combine these two fields {student_id,course_id} to form a compound key
SURROGATE KEY - an artificial or non natural key which aims to uniquely identify each record is called a surrogate key, pretty much like priamry except its system generated | e.g. 12asdfq23h
DEPENDENCIES
PARTIAL DEPENDENCIES - occurs when a non prime attribute is dependent on part of a candidate key( or a key which could tell if a an entity is unique )
TRANSITIVE DEPENDENCIES - e.g. if we know the book then we know the author but if we know the author we cannot know the book, that is transitive dependency
FUNCTIONAL DEPENDENCIES -
DATA PIPELINE - general process by which data moves from one system into another
DAG (DIRECTED ACYCLIC GRAPH) - like a data pipeline but then it has subtasks for different cases | like waterfall method once it reaches a level it cannot go back
UPSTREAM DEPENDENCY - one where something must happen before your project can start something else
DOWNSTREAM DEPENDENCY - something your project must deliver before somethin else can start