- Loop Invariant: true at beginning of an iteration, and remains true at the beginning of the next iteration
- Initialisation: invariant is true before the first iteration of the loop
- Maintenance: invariant remains true before the next iteration
- Termination: when the algo terminates, the invariant provides a useful property for showing correctness
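As a concrete illustration of the three properties above, here is a minimal Python sketch (my own example, not from the notes) that asserts a loop invariant at the start of every iteration of a simple summation loop:

```python
def array_sum(a):
    """Sum a list while checking a loop invariant at the start of each iteration."""
    total = 0
    for i in range(len(a)):
        # Invariant: total equals the sum of a[0:i]
        # - Initialisation: before the first iteration, i == 0 and total == 0 == sum(a[:0])
        # - Maintenance: adding a[i] keeps the invariant true for the next iteration
        assert total == sum(a[:i])
        total += a[i]
    # Termination: i == len(a), so the invariant gives total == sum(a), showing correctness
    return total

print(array_sum([3, 1, 4, 1, 5]))  # 14
```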
Prove by strong induction, eg. with a statement $P(k)$:

- Base case: prove $P(k_0)$ is true
    - $k_0$ is commonly $0$ or $1$
- Induction step: prove $P(k+1)$ is true under the assumption that $P(k_0), P(k_0+1), \dots, P(k)$ are all true
- When the problem only occurs a few times and is small, we prefer a simple algorithm
- When the problem occurs many times and is big, we prefer an efficient algorithm
- Big $O$: asymptotic upper bound (may or may not be tight)
    - $f(n) = O(g(n))$ if $\exists\, c, n_0 > 0$ st. $0 \leq f(n) \leq cg(n)$, $\forall n\geq n_0$
- Big $\Omega$: asymptotic lower bound (may or may not be tight)
    - $f(n) = \Omega(g(n))$ if $\exists\, c, n_0 > 0$ st. $0 \leq cg(n) \leq f(n)$, $\forall n\geq n_0$
- Big $\Theta$: asymptotically tight bound
    - $f(n) = \Theta(g(n))$ if $\exists\, c_1, c_2, n_0 > 0$ st. $0 \leq c_1g(n) \leq f(n) \leq c_2g(n)$, $\forall n\geq n_0$
- Small $o$: asymptotic upper bound that is not tight (strictly $<$)
    - $f(n) = o(g(n))$ if $\forall c > 0$, $\exists\, n_0 > 0$ st. $0 \leq f(n) < cg(n)$, $\forall n\geq n_0$
- Small $\omega$: asymptotic lower bound that is not tight (strictly $>$)
    - $f(n) = \omega(g(n))$ if $\forall c > 0$, $\exists\, n_0 > 0$ st. $0 \leq cg(n) < f(n)$, $\forall n\geq n_0$
- $a = b^{\log_b a}$
- $\log_c(ab) = \log_c a + \log_c b$
- $\log_b a^n = n\log_b a$
- $\log_b a = \frac{\log_c a}{\log_c b}$
- $\log_b \frac{1}{a} = -\log_b a$
- $\log_b a = \frac{1}{\log_a b}$
- $a^{\log_b c} = c^{\log_b a}$
- $\lg n = \Theta(\ln n) = \Theta(\log_{10} n)$: the base of the logarithm does not matter in asymptotics
- $\lg(n!) = \Theta(n\lg n)$
- $a^{-1}=\frac{1}{a}$, $(a^m)^n = a^{mn}$, $a^m a^n = a^{m+n}$, $e^x \geq 1+x$
- $n^k=o(a^n)$, $\forall k>0, a>1$: any exponential function $a^n$ with base $a > 1$ grows faster than any polynomial function $n^k$
- Arithmetic: $\sum_{k=1}^{n}k = \frac{1}{2}n(n+1) = \Theta(n^2)$
- Geometric: $\sum_{k=0}^{\infty}x^k = \frac{1}{1-x}$, when $|x| < 1$
- Harmonic: $H_n = \sum_{k=1}^n\frac{1}{k} = \ln n + O(1) = \Theta(\log n)$
    - $H_{\log n} = \sum_{k=1}^{\log n}\frac{1}{k} = \ln(\log n) + O(1) = \Theta(\log\log n)$
- Telescoping: $\sum_{k=0}^{n-1}(a_k-a_{k+1}) = a_0 - a_n$
- $n! = \sqrt{2\pi n}(\frac{n}{e})^n(1+\Theta(\frac{1}{n}))$ (Stirling's approximation)
- $n! = \omega(2^n)$ and $n! = o(n^n)$
    - $n!$ is asymptotically lower bounded by $2^n$ and upper bounded by $n^n$
Assume the limit $\lim_{n\rightarrow \infty}\frac{f(n)}{g(n)}$ exists:

- $\lim_{n\rightarrow \infty}\frac{f(n)}{g(n)}=0 \implies f(n)=o(g(n))$
- $\lim_{n\rightarrow \infty}\frac{f(n)}{g(n)}<\infty \implies f(n)=O(g(n))$
- $0<\lim_{n\rightarrow \infty}\frac{f(n)}{g(n)}<\infty \implies f(n)=\Theta(g(n))$
    - As $n \rightarrow\infty$, $\frac{f(n)}{g(n)}$ converges to a defined positive number
- $\lim_{n\rightarrow \infty}\frac{f(n)}{g(n)}>0 \implies f(n)=\Omega(g(n))$
- $\lim_{n\rightarrow \infty}\frac{f(n)}{g(n)}=\infty \implies f(n)=\omega(g(n))$
- L'Hopital's rule: $\lim_{x\rightarrow \infty}\frac{f(x)}{g(x)} = \lim_{x\rightarrow \infty}\frac{f'(x)}{g'(x)}$
    - Assumes the limit is an indeterminate form (eg. $\frac{\infty}{\infty}$) and that $f$ and $g$ are differentiable
- Useful derivatives:
    - $\frac{d}{dx}\log_bx = \frac{1}{x\ln b}$
    - $\frac{d}{dn}\lg \lg n = \frac{1}{n\ln 2 \ln n}$
    - $\frac{d}{dx}\lg^px = \frac{p\ln^{p-1}x}{x\ln^p 2}$
    - $\frac{d}{dx}e^{nx} = ne^{nx}$
- Transitivity:
    - $f(n) = \Theta(g(n)) \wedge g(n) = \Theta(h(n)) \implies f(n) = \Theta(h(n))$
    - $f(n) = O(g(n)) \wedge g(n) = O(h(n)) \implies f(n) = O(h(n))$
    - $f(n) = \Omega(g(n)) \wedge g(n) = \Omega(h(n)) \implies f(n) = \Omega(h(n))$
    - $f(n) = o(g(n)) \wedge g(n) = o(h(n)) \implies f(n) = o(h(n))$
    - $f(n) = \omega(g(n)) \wedge g(n) = \omega(h(n)) \implies f(n) = \omega(h(n))$
- Reflexivity: $f(n) = \Theta(f(n))$, $f(n) = O(f(n))$, $f(n) = \Omega(f(n))$
- Symmetry: $f(n) = \Theta(g(n)) \iff g(n) = \Theta(f(n))$
- Transpose symmetry: $f(n) = O(g(n)) \iff g(n) = \Omega(f(n))$; $f(n) = o(g(n)) \iff g(n) = \omega(f(n))$
- $O(\lg(n!)) \equiv O(n\lg n) \ll O(n^2) \ll O((\lg n)!) \ll O(2^n) \ll O(n!)$
- $O(\lg n) \ll O(n^\epsilon)$ for any $\epsilon > 0$ (eg. $\epsilon = 0.1$)
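A quick numeric sanity check of that last point (a hypothetical experiment, not part of the notes): $n^{0.1}$ eventually overtakes $\lg n$, even though it grows very slowly:

```python
# Compare lg n with n**0.1 at powers of two; the polynomial wins once n is large enough.
for e in [10, 20, 40, 80, 200]:
    n = 2 ** e
    print(f"lg n = {e:4d}   n**0.1 = {n ** 0.1:10.1f}")
```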
For a recurrence of the form $T(n)=aT(\frac{n}{b})+f(n)$ with constants $a \geq 1$ and $b > 1$:

- Case 1: $f(n) = O(n^{\log_ba-\epsilon})$ for some constant $\epsilon > 0 \implies T(n) = \Theta(n^{\log_ba})$
    - $f(n)$ must be polynomially smaller than $n^{\log_ba}$ by a factor of $n^\epsilon$ for some constant $\epsilon > 0$
- Case 2: $f(n) = \Theta(n^{\log_ba}\lg^kn)$ for some $k\geq 0 \implies T(n) = \Theta(n^{\log_ba}\lg^{k+1} n)$
- Case 3: $[f(n) = \Omega(n^{\log_ba+\epsilon})$ for some constant $\epsilon > 0] \wedge [af(\frac{n}{b}) \leq cf(n)$ for some constant $c < 1$ and all sufficiently large $n] \implies T(n) = \Theta(f(n))$
    - $f(n)$ must be polynomially larger than $n^{\log_ba}$ AND satisfy the regularity condition $af(\frac{n}{b}) \leq cf(n)$

Intuitively, we compare the function $f(n)$ against $n^{\log_ba}$; the asymptotically larger of the two dominates the solution.

The Master theorem can fail to work: these 3 cases do not cover all possibilities of $f(n)$.

Eg. for $T(n)=2T(\frac{n}{2})+n\lg n$, we have $n^{\log_ba}=n$ and $f(n)=n\lg n$; case 3 looks plausible but does not apply:
- Let $g(n) = n^{\log_ba+\epsilon} = n^{1+\epsilon} = n \times n^\epsilon$, for some constant $\epsilon > 0$
- Assume $f(n)$ is indeed polynomially larger than $n^{\log_ba}$
    - ie. $\exists\, \epsilon>0$ st. $f(n) = \Omega(n^{\log_ba+\epsilon}) = \Omega(g(n))$
- To find the asymptotic relationship, take $\lim_{n\rightarrow\infty}\frac{f(n)}{g(n)}$
    - $\frac{f(n)}{g(n)} = \frac{n\lg n}{n\times n^\epsilon} = \frac{\lg n}{n^\epsilon}$
    - Differentiating numerator and denominator: $\frac{d}{dn}\lg n = \frac{1}{n\ln 2}$, $\frac{d}{dn}n^\epsilon = \epsilon n^{\epsilon-1}$
    - Using L'Hopital's, $\lim_{n\rightarrow\infty}\frac{f(n)}{g(n)}=\lim_{n\rightarrow\infty}\frac{\lg n}{n^\epsilon} = \lim_{n\rightarrow\infty}\frac{1}{n\ln 2 \times \epsilon n^{\epsilon-1}} = \lim_{n\rightarrow\infty}\frac{1}{\epsilon n^\epsilon \ln 2} = 0$
- Since $\lim_{n\rightarrow\infty}\frac{f(n)}{g(n)} = 0$, $f(n)=o(g(n))$, a contradiction
    - ie. $f(n)$ is asymptotically smaller than $g(n)$, which contradicts our assumption above
- Thus, $f(n)$ is not polynomially larger than $g(n)$ and case 3 does not apply
Case 1:

- Let $g(n) = n^{\log_ba-\epsilon}$, for some constant $\epsilon > 0$
- Check if $f(n)$ is polynomially smaller than $n^{\log_ba}$, ie. $f(n)=O(g(n))$
    - $\lim_{n\rightarrow\infty}\frac{f(n)}{g(n)} < \infty \implies f(n)=O(g(n))$
    - $f(n)=O(g(n)) \implies T(n) = \Theta(n^{\log_ba})$
- If $f(n)$ is not polynomially smaller, case 1 does not apply; try the other cases
Case 2:

- Check if $f(n) = \Theta(n^{\log_ba}\lg^kn)$ for some $k\geq 0$
    - If so, $T(n) = \Theta(n^{\log_ba}\lg^{k+1} n)$
Case 3:

- Let $g(n) = n^{\log_ba+\epsilon}$, for some constant $\epsilon > 0$
- Check if $f(n)$ is polynomially larger than $n^{\log_ba}$, ie. $f(n)=\Omega(g(n))$
    - $\lim_{n\rightarrow\infty}\frac{f(n)}{g(n)} > 0 \implies f(n)=\Omega(g(n))$
- Check if the regularity condition holds, ie. $af(\frac{n}{b}) \leq cf(n)$ for some $c < 1$ and for all sufficiently large $n$
    - $f(n)=\Omega(g(n)) \wedge af(\frac{n}{b}) \leq cf(n) \implies T(n) = \Theta(f(n))$
- If either condition is not satisfied, case 3 does not apply; if none of the three cases applies, the recurrence cannot be solved by the Master theorem
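The limit tests in this checklist can be carried out symbolically. A sketch using sympy (assuming it is available), applied to the gap example $T(n)=2T(\frac{n}{2})+n\lg n$ where $n^{\log_ba}=n$:

```python
import sympy as sp

n = sp.symbols('n', positive=True)
f = n * sp.log(n, 2)        # f(n) = n lg n for T(n) = 2T(n/2) + n lg n
eps = sp.Rational(1, 10)    # a concrete epsilon to try

# Case 1 test: a finite limit would mean f(n) = O(n^(log_b a - eps))
print(sp.limit(f / n**(1 - eps), n, sp.oo))  # oo -> case 1 does not apply
# Case 3 test: a positive limit would mean f(n) = Omega(n^(log_b a + eps))
print(sp.limit(f / n**(1 + eps), n, sp.oo))  # 0  -> case 3 does not apply for this eps
```

Since $n\lg n$ dominates $n$ but is smaller than $n^{1+\epsilon}$ for every $\epsilon>0$, no choice of $\epsilon$ rescues case 1 or case 3, matching the contradiction argument above.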
Example: consider the recurrence $T(n)=2T(\frac{n}{2})+an$ (this is the recurrence substituted into below).

- Guess: $T(n)=O(n\lg n)$
    - ie. $\exists\, c>0$ st. $T(n)\leq cn\lg n$
- Assume: the guess is true for all $m<n$
    - ie. $\exists\, c>0$ st. $T(m)\leq cm\lg m$
- Prove: $T(n)\leq cn\lg n$ under the above assumption
    - Let $m=\frac{n}{2}$ (since $\frac{n}{2}$ is clearly less than $n$, our assumption holds)
        - $T(\frac{n}{2}) \leq c\frac{n}{2}\lg\frac{n}{2}$
        - $2T(\frac{n}{2}) \leq cn\lg\frac{n}{2}$
    - Substituting into the original recurrence:
        - $T(n)\leq cn\lg\frac{n}{2}+an$
        - $T(n)\leq cn(\lg n - \lg 2)+an$
        - $T(n)\leq cn(\lg n - 1)+an$
        - $T(n)\leq cn\lg n - cn +an$
    - $T(n)\leq cn\lg n - (c-a)n \leq cn\lg n$ for any constant $c \geq a$
    - Thus, we have proven the statement, and the initial guess must be true
The same recurrence with a guess of $O(n)$ fails:

- Guess: $T(n)=O(n)$
- Assume: the guess is true for all $m<n$
- Prove: $T(n)\leq cn$ under the above assumption
    - Let $m=\frac{n}{2}$ (since $\frac{n}{2}$ is clearly less than $n$, our assumption holds)
        - $T(\frac{n}{2}) \leq c\frac{n}{2}$
        - $2T(\frac{n}{2}) \leq cn$
    - Substituting into the original recurrence: $T(n)\leq cn+an$
    - However, $cn+an \not\leq cn$, thus we have failed to prove our statement
- Divide the problem into a number of subproblems that are smaller instances of the same problem
- Conquer the subproblems by solving them recursively. If the subproblem sizes are small enough, just solve the subproblems in a straightforward manner
- Combine the solutions to the subproblems into the solution for the original problem
- Make 2 subproblems become just 1 subproblem
    - eg. from $T(n)=2T(\frac{n}{2})+1 = \Theta(n)$ to $T(n)=T(\frac{n}{2})+1 = \Theta(\log n)$
Given $n$ elements, any comparison-based sort requires $\Omega(n\lg n)$ comparisons in the worst case.

View comparison sorts in terms of a decision tree. Because any correct sorting algorithm must be able to produce each permutation of its input, there are at least $n!$ leaves in the decision tree.

Consider a decision tree of height $h$ with $l$ reachable leaves:

- $n! \leq l \leq 2^h$
- $h \geq \lg(n!)$
- $h = \Omega(n \lg n)$, by Stirling's Approximation
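The Stirling step can also be avoided with a short elementary bound showing $\lg(n!) = \Omega(n\lg n)$ directly:

$$\lg(n!) = \sum_{k=1}^{n}\lg k \;\geq\; \sum_{k=\lceil n/2\rceil}^{n}\lg k \;\geq\; \frac{n}{2}\lg\frac{n}{2} \;=\; \Omega(n\lg n)$$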
Assume that each of the $n$ input elements is an integer in the range $0$ to $k$.
CountingSort(A, B, k)
let C[0...k] = new array
for i = 0 to k
C[i] = 0
# Initialise array C
for j = 1 to A.length
C[A[j]] = C[A[j]] + 1
# C[i] now contains the number of elements == i
for i = 1 to k
C[i] = C[i] + C[i - 1]
# C[i] now contains the number of elements <= i
for j = A.length downto 1
B[C[A[j]]] = A[j]
C[A[j]] = C[A[j]] - 1
- Time: $\Theta(n+k)$, or $\Theta(n)$ if $k = O(n)$
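A minimal runnable Python translation of the pseudocode above (0-indexed output array; assumes all keys are integers in the range $0..k$):

```python
def counting_sort(a, k):
    """Stable counting sort of a list of integers drawn from 0..k."""
    count = [0] * (k + 1)
    for x in a:                    # count[i] = number of elements equal to i
        count[x] += 1
    for i in range(1, k + 1):      # prefix sums: count[i] = number of elements <= i
        count[i] += count[i - 1]
    out = [0] * len(a)
    for x in reversed(a):          # scan right-to-left so equal keys keep their order (stability)
        count[x] -= 1
        out[count[x]] = x
    return out

print(counting_sort([2, 5, 3, 0, 2, 3, 0, 3], 5))  # [0, 0, 2, 2, 3, 3, 3, 5]
```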
- Let $A$ be an integer array of length $n$ where the largest integer is $k$
- Let each integer be represented in base $b$
- Let $d$ be the number of digits in the largest integer $k$
    - In base $b$, $d = \lceil\log_b k\rceil$
RadixSort(A, d)
for i = 1 to d
stable sort array A on digit i (via counting sort)
In the algorithm, notice that we are simply doing counting sort on each digit from least significant to most significant. Since a digit can take on $b$ possible values, each pass of counting sort takes $\Theta(n+b)$ time, and there are $d$ passes, for $\Theta(d(n+b))$ in total.

The question then becomes choosing the correct value for the base $b$ (which in turn determines $d$).
Theorem: if we choose the base $b = \Theta(n)$, each pass of counting sort takes $\Theta(n+b) = \Theta(n)$ time.

Proof: we can set $b = n$, so each per-digit counting sort runs over $n$ elements whose digits take at most $n$ possible values.

Theorem: given $n$ integers each of value at most $k = n^{O(1)}$ (ie. polynomial in $n$), radix sort with base $b = n$ runs in $O(n)$ total time.

Proof: each key has $d = \lceil\log_b k\rceil = \lceil\log_n k\rceil = O(1)$ digits, and each of the $d$ passes takes $\Theta(n)$ time.
The subroutine used (ie. counting sort) must be stable.

Sorting by the most significant digit (MSD) first can also sort correctly, but only if the algorithm recurses within each bucket. For plain LSD radix sort, stability is what guarantees correctness: ties on the current digit preserve the order established by the earlier, less significant passes.
Source: https://www.quora.com/Why-is-it-wrong-for-Radix-Sort-to-sort-by-the-most-significant-digit
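A sketch of LSD radix sort in Python, reusing a stable per-digit counting sort as the subroutine (assumes non-negative integers; the base $b$ is a parameter):

```python
def radix_sort(a, base=10):
    """LSD radix sort of non-negative integers via a stable counting sort on each digit."""
    if not a:
        return a
    exp = 1
    while max(a) // exp > 0:
        count = [0] * base
        for x in a:                          # stable counting sort keyed on digit (x // exp) % base
            count[(x // exp) % base] += 1
        for i in range(1, base):
            count[i] += count[i - 1]
        out = [0] * len(a)
        for x in reversed(a):
            d = (x // exp) % base
            count[d] -= 1
            out[count[d]] = x
        a = out
        exp *= base
    return a

print(radix_sort([170, 45, 75, 90, 802, 24, 2, 66]))  # [2, 24, 45, 66, 75, 90, 170, 802]
```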
An algorithm is randomised if its behaviour is determined not only by its input but also by values produced by a random-number generator
- Monte Carlo Algorithm: a randomised algorithm whose output may be incorrect with a certain probability; runtime bounds hold deterministically
- Las Vegas Algorithm: a randomised algorithm that always gives correct results, or informs about the failure; runtime varies depending on the random numbers used
- Running time of a randomised algo depends on the RNG
- For the same input, the running time can be different depending on the random numbers
- The "average" running time of a randomised algo over all possible random numbers is called the expected running time
- Non-randomised: standard QuickSort
    - Guaranteed to find the exact solution
    - Running time is dependent on the input
- Monte Carlo: estimating $\pi$ by randomly filling a square with dots and taking the ratio of dots falling inside and outside the circle
    - Only gives an approximation of $\pi$ (which is technically not the correct/definite answer)
- Las Vegas: binary search on a sorted array but using random indices
    - Guaranteed to find the correct answer eventually, but running time depends on the random numbers and is independent of the input array
    - Worst case $O(n)$; expected $\Theta(\log n)$
Let $A$ be an event that occurs with probability $Pr(A) = p$, and let $I[A]$ be its indicator random variable, so $E[I[A]] = Pr(A)$. Over $n$ trials:

- $X = \sum_1^n I[A]$
- $E[X] = E[\sum_1^n I[A]]$
- $E[X] = \sum_1^n E[I[A]]$, by linearity of expectation
- $E[X] = \sum_1^n Pr(A) = \sum_1^n p = np$
Given a hash table $T$ with $m$ slots storing $n$ keys using chaining, define the load factor $\alpha = \frac{n}{m}$.

The hash function $h$ is assumed to satisfy simple uniform hashing, so for any two distinct keys $x \neq y$, $Pr(h(y) = h(x)) \leq \frac{1}{m}$.

For a key $x$ in the table, what is the expected length of the chain at slot $h(x)$?

- Let $C_y = I[h(y) = h(x)]$
- Note that $E[C_y] = Pr(h(y) = h(x)) \leq \frac{1}{m}$
- Length of chain at $h(x) = 1 + \sum_{y \neq x, y \in T}C_y$
- Thus, the expected length of the chain at $h(x)$ is $E[1 + \sum_{y \neq x, y \in T}C_y] = 1 + \sum_{y \neq x, y \in T}E[C_y] = 1 + \sum_{y \neq x, y \in T} Pr(h(y) = h(x)) = 1 + \frac{n-1}{m} \leq 1 + \alpha$
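A small simulation sketch of this bound (bin count, key count and trial count are arbitrary choices of mine): insert $n$ keys into $m$ bins uniformly at random and measure the size of the bin containing one fixed key:

```python
import random

def chain_length_of_fixed_key(n=10_000, m=4_096, trials=200):
    """Average size of the bin containing key 0, under the uniform hashing assumption."""
    total = 0
    for _ in range(trials):
        slots = [random.randrange(m) for _ in range(n)]  # h(key) for each of the n keys
        bins = [0] * m
        for s in slots:
            bins[s] += 1
        total += bins[slots[0]]        # length of the chain holding key 0
    return total / trials

n, m = 10_000, 4_096
print(chain_length_of_fixed_key(n, m), 1 + (n - 1) / m)   # both are approximately 1 + alpha
```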
Given $n$ people, each with a birthday chosen uniformly at random from $m = 365$ days, what is the expected number of pairs sharing a birthday?

Given a hash table of size $m$ into which $n$ keys are hashed uniformly at random, the same question asks for the expected number of colliding pairs.

Note that collisions are counted over distinct pairs: if four people share the same exact birthday (or if four keys hash to the exact same bin), we count it as $\binom{4}{2} = 6$ collisions.

For each pair $(i, j)$ with $i < j$, let $X_{ij} = I[\text{person } i \text{ and person } j \text{ share a birthday}]$ (equivalently, keys $i$ and $j$ hash to the same bin).

By the property of indicator variables, we also have $E[X_{ij}] = Pr(X_{ij} = 1) = \frac{1}{m}$.

We now let $X = \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} X_{ij}$ be the total number of collisions; by linearity of expectation, $E[X] = \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} E[X_{ij}] = \binom{n}{2}\frac{1}{m}$.

Note that the two summations are just a result of the number of unique pairs of people (or keys), $\binom{n}{2} = \frac{n(n-1)}{2}$.

As such, given $n$ keys and $m$ bins, the expected number of collisions is $\frac{n(n-1)}{2m}$.
The running time of Quicksort is dominated by time spent in the Partition procedure. Each time the Partition procedure is called, it selects a pivot element, and this element is never included in any future recursive calls to Quicksort and Partition. Thus, there can be at most $n$ calls to Partition over the entire execution.
Let $X$ be the total number of comparisons performed over all calls to Partition, so the total running time is $O(n + X)$. Label the elements of the array $z_1, z_2, \dots, z_n$, where $z_i$ is the $i^{th}$ smallest element.
Observe that each pair of elements is compared at most once. Elements are compared only to the pivot element and, after a particular call of Partition finishes, the pivot element used in that call is never again compared to any other elements.
- Let $X_{ij} = I[z_i$ is compared to $z_j]$
- Since each pair is compared at most once, $X= \sum_{i=1}^{n-1}\sum_{j=i+1}^{n}X_{ij}$
- $E[X]= E[\sum_{i=1}^{n-1}\sum_{j=i+1}^{n}X_{ij}]$
- $E[X]= \sum_{i=1}^{n-1}\sum_{j=i+1}^{n}E[X_{ij}]$, by linearity of expectation
- $E[X]= \sum_{i=1}^{n-1}\sum_{j=i+1}^{n}Pr(z_i$ is compared to $z_j)$
Thus, our work now is to find $Pr(z_i$ is compared to $z_j)$. Let $Z_{ij} = \{z_i, z_{i+1}, \dots, z_j\}$.

In general, once a pivot $x$ with $z_i < x < z_j$ is chosen, $z_i$ and $z_j$ fall into different partitions and are never compared. On the other hand, if $z_i$ or $z_j$ is the first element of $Z_{ij}$ to be chosen as a pivot, it is compared to every other element of $Z_{ij}$. Since each of the $j-i+1$ elements of $Z_{ij}$ is equally likely to be the first one chosen as a pivot, $Pr(z_i$ is compared to $z_j) = \frac{2}{j-i+1}$.

Thus, combining the equations, $E[X] = \sum_{i=1}^{n-1}\sum_{j=i+1}^{n}\frac{2}{j-i+1} < \sum_{i=1}^{n-1}\sum_{k=1}^{n}\frac{2}{k} = \sum_{i=1}^{n-1}O(\lg n) = O(n\lg n)$.
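A quick empirical check (my own sketch, not from the notes): count the comparisons made by a simple randomised quicksort and compare against the $O(n\lg n)$ bound (roughly $2n\ln n$):

```python
import math
import random

def quicksort_comparisons(a):
    """Return the number of element-vs-pivot comparisons made on list a (distinct elements)."""
    if len(a) <= 1:
        return 0
    pivot = random.choice(a)
    less = [x for x in a if x < pivot]
    greater = [x for x in a if x > pivot]
    return len(a) - 1 + quicksort_comparisons(less) + quicksort_comparisons(greater)

n = 2000
data = random.sample(range(10 * n), n)
avg = sum(quicksort_comparisons(data) for _ in range(20)) / 20
print(round(avg), round(2 * n * math.log(n)))   # same order of magnitude
```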
The $i^{th}$ order statistic of a set of $n$ elements is the $i^{th}$ smallest element.

- Minimum: first order statistic, $i = 1$
- Maximum: $n^{th}$ order statistic, $i = n$
- Median: lower, $i=\lfloor \frac{n+1}{2} \rfloor$ OR upper, $i=\lceil \frac{n+1}{2} \rceil$
Given an array $A$ of $n$ distinct elements and an integer $i$ with $1 \leq i \leq n$, find the $i^{th}$ smallest element of $A$.
randomisedSelect(A, l, h, i)
if l == h # base case, one element array
return A[l]
p = randomisedPartition(A, l, h)
k = p - l + 1
if i == k # pivot value is the answer
return A[p]
elif i < k
# only search low partition
return randomisedSelect(A, l, p - 1, i)
else
# only search high partition
return randomisedSelect(A, p + 1, h, i - k)
- $A[p]$ is the pivot element
- The call to `randomisedPartition` partitions $A[l,\dots,h]$ into 2 possibly empty subarrays $A[l,\dots,p-1]$ and $A[p+1,\dots,h]$
- $k$ is the number of elements in $A[l,\dots,p]$, ie. the number of elements in the low side of the partition including $A[p]$
Absolute worst case running time is $\Theta(n^2)$ (eg. if the pivot chosen is always the largest or smallest remaining element); the expected running time, however, is $O(n)$.

Any element is equally likely to be selected as the pivot. Thus, for each $k = 1, \dots, n$, the low side of the partition $A[l,\dots,p]$ contains exactly $k$ elements with probability $\frac{1}{n}$.

To obtain an upper bound, we assume the $i^{th}$ smallest element always falls in the larger side of the partition, so the recursion always continues on the larger side.
For the expected running time, this gives the recurrence $E[T(n)] \leq \frac{1}{n}\sum_{k=1}^{n} E[T(\max(k-1,\, n-k))] + O(n)$.

If $n$ is even, each term from $T(\lceil\frac{n}{2}\rceil)$ up to $T(n-1)$ appears exactly twice in the summation; if $n$ is odd, these terms appear twice and $T(\lfloor\frac{n}{2}\rfloor)$ appears once. Thus $E[T(n)] \leq \frac{2}{n}\sum_{k=\lfloor n/2\rfloor}^{n-1} E[T(k)] + O(n)$.

We want to prove that $E[T(n)] = O(n)$, ie. $E[T(n)] \leq cn$ for some constant $c > 0$, by the substitution method.

Thus, to prove that $E[T(n)] \leq cn$, substitute the assumption $T(k) \leq ck$ into the summation; the leftover terms must then be absorbed by the choice of constant.

As long as we choose a constant $c$ large enough relative to the constant hidden in the $O(n)$ term, the inequality $E[T(n)] \leq cn$ holds.

Thus, if we assume that $T(k) \leq ck$ for all $k < n$, we conclude $E[T(n)] \leq cn$, so the expected running time of randomised select is $O(n)$.
Can an adversary construct an input of size $n$ to force randomised quickselect to run in $\Omega(n^2)$ time?

False! A randomised algorithm's behaviour depends on the randomisation, not only on the input. Since the adversary has no way of controlling the randomisation, it has no way to force the algo to run in $\Omega(n^2)$ time.
An approximate selection algorithm for the median, used as a subroutine to get a good-enough pivot for quickselect to achieve a worst case running time of $O(n)$.

- Divide the $n$ elements of the input array into $\lfloor\frac{n}{5}\rfloor$ groups of 5 elements each, and at most one group made up of the remaining $n \mod 5$ elements
- Find the median of each of the $\lceil\frac{n}{5}\rceil$ groups via insertion sort of each group
- Now, find the lower median-of-medians $x$ of the $\lceil\frac{n}{5}\rceil$ medians found in step 2

At least half of the medians found in step 2 are greater than or equal to the median-of-medians $x$.
Thus, with a pivot chosen using median-of-medians, at least $3(\lceil\frac{1}{2}\lceil\frac{n}{5}\rceil\rceil - 2) \geq \frac{3n}{10} - 6$ elements are greater than $x$ (and, symmetrically, at least $\frac{3n}{10} - 6$ elements are less than $x$), so the recursive quickselect call is on at most $\frac{7n}{10} + 6$ elements.

Steps 1 and 2 take $O(n)$ time (insertion sort on a constant-size group is $O(1)$), and step 3 is a recursive selection on $\lceil\frac{n}{5}\rceil$ elements, giving the recurrence $T(n) \leq T(\lceil\frac{n}{5}\rceil) + T(\frac{7n}{10}+6) + an$, where $an$ denotes the $O(n)$ work outside the recursive calls.

We need to prove $T(n) = O(n)$, ie. $T(n) \leq cn$ for some constant $c$. Substituting into the recurrence, $T(n) \leq \frac{cn}{5} + c + \frac{7cn}{10} + 6c + an = \frac{9cn}{10} + 7c + an$,

which is at most $cn$ provided $-\frac{cn}{10} + 7c + an \leq 0$.

Because we assume $n$ is at least some constant (CLRS uses $n \geq 140$), choosing $c$ large enough relative to $a$ satisfies this, so $T(n) = O(n)$.
- Show that a sequence of $n$ operations takes worst-case time $T(n)$ in total
- The amortised cost per operation is thus worst-case $\frac{T(n)}{n}$
- The amount we charge an operation is its amortised cost; different operations may have different amortised costs
- When the amortised cost $>$ actual cost, assign the difference as credit
- Credit can help pay for later operations whose amortised cost $<$ actual cost
- Credit must not become negative
Invariant: credit never drops below 0
Observe: at any point, every 1 bit in the counter will contribute 1 credit
Proof:
- Every time we set a bit $0 \rightarrow 1$, we pay $2$ credits
    - $1$ credit is used to flip the bit, while $1$ credit is stored on the bit
- Every time we reset a bit from $1 \rightarrow 0$, we use its stored $1$ credit to flip the bit
- Hence, the amount of credit equals the number of $1$'s in the binary counter
- Since the number of $1$'s is non-negative, the credit is always non-negative
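A runnable sketch of the binary counter (my own illustration): the total number of bit flips over $n$ increments never exceeds $2n$, matching the credit argument:

```python
def increment(bits):
    """Increment a binary counter stored LSB-first; return the number of bit flips."""
    flips, i = 0, 0
    while i < len(bits) and bits[i] == 1:
        bits[i] = 0          # each 1 -> 0 reset is paid by the credit stored on that bit
        flips += 1
        i += 1
    if i == len(bits):
        bits.append(0)
    bits[i] = 1              # the single 0 -> 1 flip; pay 2 credits (1 to flip, 1 stored)
    flips += 1
    return flips

bits, total, n = [], 0, 1000
for _ in range(n):
    total += increment(bits)
print(total, "<=", 2 * n)    # e.g. 1994 <= 2000
```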
We will perform $n$ operations on an initial data structure $D_0$. For $i = 1, \dots, n$, let $c_i$ be the actual cost of the $i^{th}$ operation and $D_i$ be the data structure that results from applying the $i^{th}$ operation to $D_{i-1}$.

- $\Phi$: potential function which maps each data structure $D_i$ to a real number $\Phi(D_i)$
- $\Phi(D_i)$: potential associated with data structure $D_i$
Amortised cost of the $i^{th}$ operation:

$$\hat{c}_i = c_i + [\Phi(D_i) - \Phi(D_{i-1})]$$

Thus, the total amortised cost of the $n$ operations is

$$
\begin{aligned}
\sum_{i=1}^n \hat{c}_i &= \sum_{i=1}^n (c_i + \Phi(D_i) - \Phi(D_{i-1}))\\
&= \sum_{i=1}^n c_i + \Phi(D_n) - \Phi(D_0)
\end{aligned}
$$

We need to show that $\Phi(D_n) \geq \Phi(D_0)$ (typically by defining $\Phi(D_0) = 0$ and showing $\Phi(D_i) \geq 0$ for all $i$), so that the total amortised cost $\sum\hat{c}_i$ is an upper bound on the total actual cost $\sum c_i$.
- Select a suitable $\Phi$ so that for the costly operation, $\Delta\Phi_i$ is negative to such an extent that it nullifies or reduces the effect of the actual cost
- Push($x$): pushes an element $x$ onto the stack
- PopAll(): pops all elements from the stack

Define the potential function $\Phi$ to be the number of objects in the stack.

Since the number of objects in the stack is never negative, the stack $D_i$ after the $i^{th}$ operation satisfies $\Phi(D_i) \geq 0 = \Phi(D_0)$ (starting from an empty stack).

The total amortised cost of the $n$ operations is therefore an upper bound on the total actual cost.

If the $i^{th}$ operation is a Push on a stack containing $s$ objects, the actual cost is $1$ and the potential difference is $\Phi(D_i) - \Phi(D_{i-1}) = (s+1) - s = 1$.

Thus, by the equation for the amortised cost of the $i^{th}$ operation,

$$
\begin{aligned}
\hat{c}_i &= c_i + [\Phi(D_i) - \Phi(D_{i-1})]\\
&= 1 + 1\\
&= 2
\end{aligned}
$$

If the $i^{th}$ operation is a PopAll on a stack containing $s'$ objects, the actual cost is $s'$ and the potential difference is $\Phi(D_i) - \Phi(D_{i-1}) = 0 - s' = -s'$.

Thus, by the equation for the amortised cost of the $i^{th}$ operation,

$$
\begin{aligned}
\hat{c}_i &= c_i + [\Phi(D_i) - \Phi(D_{i-1})]\\
&= s' + (-s')\\
&= 0
\end{aligned}
$$

Since the amortised cost of each of the two operations is $O(1)$, the total cost of any sequence of $n$ Push and PopAll operations is $O(n)$.
- Double the size of the array every time the array overflows
- Worst case time to execute one insertion is $O(n)$
- Observe that only when the table overflows is a new array of double the size created
- Insertions take $O(1)$ time until the next overflow

Thus, over $n$ insertions the total cost is at most $n$ (for the element writes) plus $\sum_{j\geq 0,\, 2^j \leq n} 2^j \leq 2n$ (for copying during the doublings), ie. $O(n)$ in total.

Thus, the amortised cost of each insertion is $\frac{O(n)}{n} = O(1)$.
Invariant: credit never drops below 0
Observe: at any point, every insert operation will contribute $5$ credits
Proof:
- For every Insert operation at a cell, charge 5 credits:
    - 1 credit to actually insert into the current table
    - 2 credits for creating another 2 cells
    - 1 credit for copying itself
    - 1 credit for copying the left half
- When the table overflows, it is doubled and the whole table has to be copied again, which is paid for by the additional credits stored by the insert operations
- Thus, when every insert operation is charged 5 credits, the credit never becomes negative and the amortised running time per insertion is constant
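A small simulation of the doubling table (again my own sketch): counting every element write shows the amortised cost per insertion staying below a small constant:

```python
def total_insert_cost(n):
    """Total element writes for n appends into an array that doubles when full."""
    capacity, size, cost = 1, 0, 0
    for _ in range(n):
        if size == capacity:
            cost += size       # copy every existing element into the new, doubled array
            capacity *= 2
        cost += 1              # write the newly inserted element
        size += 1
    return cost

n = 100_000
print(total_insert_cost(n) / n)   # stays below 3, i.e. O(1) amortised per insertion
```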
fib(n):
if n <= 2: f = 1
else: f = fib(n - 1) + fib(n - 2)
return f
Let $T(n)$ be the number of instructions executed by fib(n). Then $T(n) = T(n-1) + T(n-2) + O(1) \geq 2T(n-2)$.

Thus, $T(n) \geq 2^{\frac{n}{2}}$, ie. the naive recursion takes exponential time.

- Intuition: multiplying by $2$ each time, and subtracting $2$ from $n$ each time. We can subtract $2$ from $n$ exactly $\frac{n}{2}$ times. Thus, $2^\frac{n}{2}$.
- During execution of the recursive fib, observe that the whole recursion tree is visited
- The number of instructions must be at least the number of nodes in this recursion tree
memo = {} # initialise empty dictionary
fib(n):
if n in memo: return memo[n]
if n <= 2: f = 1
else: f = fib(n - 1) + fib(n - 2)
memo[n] = f
return f
Notice that the recursive case is executed at most once for each distinct value of n (later calls hit the memo), so the memoized fib(n) takes $O(n)$ time.
fib = {} # initialise empty dictionary
for k in range(1, n + 1):
if k <= 2: f = 1
else: f = fib[k - 1] + fib[k - 2]
fib[k] = f
return fib[n]
Notice again that each loop iteration does $O(1)$ work, so the bottom-up computation of fib[n] takes $O(n)$ time.
Let $A = a_1a_2\dots a_n$ and $B = b_1b_2\dots b_m$ be two sequences. A subsequence of a sequence is obtained by deleting zero or more elements without changing the order of the remaining elements.

Given: 2 sequences $A$ and $B$

Aim: compute a (not "the", since there can be multiple) longest sequence that is a subsequence of both $A$ and $B$

Given: 2 sequences $A$ and $B$, a direct recursive solution that returns an LCS itself:

Base Cases: if either sequence is empty, the LCS is the empty sequence

General Case: if $a_n = b_m$, take an LCS of $a_1\dots a_{n-1}$ and $b_1\dots b_{m-1}$ followed by $a_n$; otherwise take the longer of an LCS of $(a_1\dots a_{n-1}, B)$ and an LCS of $(A, b_1\dots b_{m-1})$

Thus total time is exponential in $n$ and $m$ without memoization
Given: 2 sequences $A[1..n]$ and $B[1..m]$

Aim: find the length of the LCS of $A$ and $B$, denoted $L(n, m)$

Base Cases: $L(i, 0) = 0$ and $L(0, j) = 0$

General Case: if $A[i] = B[j]$, then $L(i, j) = L(i-1, j-1) + 1$; otherwise $L(i, j) = \max(L(i-1, j), L(i, j-1))$
L(n, m):
# base case
if n == 0 or m == 0:
return 0
# general case
if A[n] == B[m]:
return L(n - 1, m - 1) + 1
else:
return Max(L(n - 1, m), L(n, m - 1))
This time complexity is no different from the brute force approach, as we are solving the same overlapping subproblems multiple times: the plain recursion takes exponential time.

Thus, we can apply a dynamic programming approach:
L(n, m):
    # dp is a (n+1) x (m+1) table; dp[i, j] = length of LCS of A[1..i] and B[1..j]
    dp = Array[0..n, 0..m]
    for i = 0 to n:
        dp[i, 0] = 0 # first column: LCS with an empty prefix of B
    for j = 0 to m:
        dp[0, j] = 0 # first row: LCS with an empty prefix of A
    for j = 1 to m:
        for i = 1 to n:
            # general case in the recursion
            if A[i] == B[j]:
                dp[i, j] = dp[i - 1, j - 1] + 1
            else:
                dp[i, j] = Max(dp[i - 1, j], dp[i, j - 1])
    return dp[n, m]
- Total time $= O(nm)$
- Total space $= O(nm)$
A problem exhibits optimal substructure if an optimal solution to the problem contains within it optimal solutions to subproblems. For example, an optimal solution to the LCS of $A[1..n]$ and $B[1..m]$ contains within it optimal LCSs of shorter prefixes of $A$ and $B$.
To prove that an optimal substructure exists:
- Show that a solution to the problem consists of making a choice, and making this choice leaves one or more subproblems to be solved
- Assume that the choice that leads to an optimal solution is given (do not concern yourself yet with how to determine this choice, just assume it has been given)
- Given this choice, determine which subproblems ensue and how to best characterise the resulting space of subproblems
- Show that the solutions to the subproblems used within an optimal solution to the problem must themselves be optimal by using a "cut-and-paste" technique:
    - Assume that some subproblem solution used within the optimal solution is not optimal
    - By "cutting out" the non-optimal subproblem solution and "pasting in" an optimal one, we get a better solution to the original problem
    - This contradicts the assumption in step 2 that we already have an optimal solution
    - Thus, the assumption that the subproblem solution is not optimal must be incorrect; the subproblem solutions must themselves be optimal
- Find the optimal substructure
- Express the solutions recursively
- Find the overlapping subproblems
    - Observe that there are only a polynomial number of distinct subproblems
    - However, the recursive algorithm takes non-polynomial/exponential time because it solves the same subproblems multiple times
- Thus, we compute the recursive solution iteratively in a bottom-up fashion (or top-down with memoization)
Use the LCS algorithm to give the longest palindromic subsequence of a string: eg. given the input `character`, the output should be `carac`.

Solution: run the LCS algorithm on the input string and the reverse of the input string. Thus, running time $= O(n^2)$.
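A short runnable sketch of this solution (helper names are mine): the LCS-length DP from above applied to the string and its reverse gives the length of the longest palindromic subsequence:

```python
def lcs_length(a, b):
    """Bottom-up O(n*m) DP for the length of the LCS of sequences a and b."""
    n, m = len(a), len(b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[n][m]

def longest_palindromic_subsequence_length(s):
    return lcs_length(s, s[::-1])   # run LCS on the string and its reverse

print(longest_palindromic_subsequence_length("character"))  # 5 ("carac")
```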
Given: weight-value pairs $(w_1, v_1), (w_2, v_2), \dots, (w_n, v_n)$ and a knapsack capacity $W$ (all integers)

Output: a subset $S$ of the items maximising the total value $\sum_{i\in S} v_i$ subject to $\sum_{i\in S} w_i \leq W$

Observe that there are $2^n$ possible subsets, so brute force takes exponential time.

Find the optimal substructure:

Let $m[i, j]$ be the maximum value attainable using only the first $i$ items with capacity $j$.

Base case: $m[0, j] = 0$ for all $j$, and $m[i, 0] = 0$ for all $i$

Case 1: item $i$ is not included, so $m[i, j] = m[i-1, j]$

Case 2: item $i$ is included (only possible if $w_i \leq j$), so $m[i, j] = m[i-1, j-w_i] + v_i$; take the maximum of the two cases

Compute the solution iteratively in a bottom-up fashion:
knapsack(v, w, W):
    m = Array[0..n, 0..W] # DP table: m[i, j] = best value using the first i items with capacity j
    for j = 0 to W:
        m[0, j] = 0
for i = 1 to n:
m[i, 0] = 0
for i = 1 to n:
for j = 0 to W:
if j >= w[i]:
m[i, j] = Max(m[i - 1, j - w[i]] + v[i], m[i - 1, j])
else:
m[i, j] = m[i - 1, j]
return m[n, W]
- Time taken to fill up a table entry $= O(1)$
- Total time $= O(nW)$
- Total space $= O(nW)$
Given: directed graph $G=(V,E)$ with edge weights $\omega(u,v)$ (possibly negative) and a source vertex $s$

Notations:

- $n=|V|$, $m=|E|$
- $\delta(u,v)$: distance from $u$ to $v$
- $P(u,v)$: shortest path from $u$ to $v$

Objectives:

- Compute $\delta(s,v)$ for all $v\in V\backslash\{s\}$
- Compute $P(s,v)$ for all $v\in V\backslash\{s\}$
If all edge weights are non-negative, Dijkstra's algorithm solves SSSP by greedily growing the set of vertices whose shortest distance from $s$ is known.

Two crucial facts exploited:

- The nearest neighbour of $s$ is also the vertex nearest to $s$
- Optimal subpath property: given a shortest path $P(s,v)$ that passes through $u$, $P(s,u)$ and $P(u,v)$ are themselves shortest paths
However, if there are negative edge weights, these properties are violated:

- The nearest neighbour of $s$ is not necessarily the vertex nearest to $s$; there can exist some other path, using negative edge weights, through vertices that are not direct neighbours of $s$
- (In the example graph) $\delta(s,y)=5$ by going $s \rightarrow u \rightarrow v \rightarrow x \rightarrow y$, but $\delta(s,v)=40$ (green path), which is not the subpath $s \rightarrow u \rightarrow v$ of $P(s,y)$, which would yield a distance of $60$
Theorem: if there are no negative-weight cycles, all shortest paths in $G$ use at most $n-1$ edges.

Let $L(v,i)$ denote the length of a shortest path from $s$ to $v$ that uses at most $i$ edges.

SSSP can now be converted to computing $L(v, n-1)$ for every $v \in V$.

Recursive formulation:

- If the shortest path counted by $L(v,i)$ has $<i$ edges: $L(v,i)=L(v,i-1)$
- If the shortest path counted by $L(v,i)$ has exactly $i$ edges: $L(v,i)=\min_{(x,v)\in E}(L(x,i-1)+\omega(x,v))$
- For each $v \in V \backslash\{s\}$:
    - If $(s,v)\in E$: $L[v,1] = \omega(s,v)$
    - Else: $L[v,1] = \infty$
- $L[s,1]=0$
- For $i=2$ to $n-1$:
    - For each $v\in V$: $L[v,i]=L[v,i-1]$
    - For each $(x,v)\in E$: $L[v,i] = \min(L[v,i],\, L[x,i-1]+\omega(x,v))$

Time: $O(n(n+m))$, ie. $O(nm)$ when $m \geq n$ (each of the $n-2$ rounds touches every vertex and every edge)

Space: $O(n^2)$ for the full table $L[v,i]$; this can be reduced to $O(n)$ by keeping only the previous round's values
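A compact Python version of the table-filling above (a sketch under the stated assumptions: directed edges, possibly negative weights, no negative-weight cycle reachable from $s$):

```python
import math

def sssp(n, edges, s):
    """DP over path lengths: dist[v] after round i equals L[v, i] from the formulation above.
    edges is a list of (u, v, w) triples for directed edges u -> v; vertices are 0..n-1."""
    dist = [math.inf] * n
    dist[s] = 0
    for _ in range(n - 1):                    # shortest paths use at most n-1 edges
        prev = dist[:]                        # L[v, i] is initialised to L[v, i-1]
        for u, v, w in edges:
            if prev[u] + w < dist[v]:
                dist[v] = prev[u] + w
    return dist

print(sssp(5, [(0, 1, 6), (0, 2, 7), (1, 3, 5), (2, 3, -4), (3, 4, 2)], 0))
# [0, 6, 7, 3, 5]
```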
A greedy algorithm obtains an optimal solution to a problem by making a sequence of choices. At each decision point, the algorithm makes a choice that seems best at the moment (ie. the local best). This heuristic strategy does not always produce an optimal solution.
Example: Longest Increasing Subsequence (LIS). Given a sequence of $n$ numbers.

Output: a longest increasing subsequence.

A DP solution:

- Sort the $n$ numbers to form a sorted list, $O(n\log n)$
- Use the DP approach to find the LCS between the sorted and original sequences, $O(n^2)$
You are given some shuffled cards. Cards must be drawn individually and placed in a pile such that the order of cards in the pile is in decreasing rank. Find the least number of piles required.
Greedy strategy: always try to place a card on the leftmost allowed pile. If no such pile exists, create a new pile on the right side.
Observe that:

- Cards in each pile are in decreasing order
- Any increasing subsequence of the deck contains at most one card from each pile
    - Piles are in decreasing order, and any card at the top of a pile came after the cards at the bottom
- Thus, length of LIS $\leq$ minimum number of piles
We know that:

- Minimum number of piles $\leq$ number of piles in the greedy strategy
- Number of piles in the greedy strategy $\leq$ length of LIS

Combining these and the observation above, minimum number of piles $=$ number of piles in the greedy strategy $=$ length of LIS.
- Use a stack to implement each pile
    - At most $n$ stacks, $1$ for each of the $n$ cards
- Place each card on the leftmost allowed pile, else create a new pile
    - Since the top cards of the piles are in sorted order, binary search can be used, $O(\log n)$ per card

Time: $O(n\log n)$
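A sketch of the pile strategy in Python: only the top card of each pile is needed, and `bisect` performs the binary search for the leftmost allowed pile:

```python
from bisect import bisect_left

def lis_length(seq):
    """Length of a longest (strictly) increasing subsequence via the greedy pile strategy."""
    tops = []                          # tops[i] = top card of pile i; always sorted
    for x in seq:
        i = bisect_left(tops, x)       # leftmost pile whose top card is >= x
        if i == len(tops):
            tops.append(x)             # no allowed pile: start a new pile on the right
        else:
            tops[i] = x                # place x on that pile
    return len(tops)                   # number of piles == length of LIS

print(lis_length([3, 1, 4, 1, 5, 9, 2, 6]))  # 4 (e.g. 1, 4, 5, 9)
```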
- Cast the optimisation problem as one in which we make a choice and are left with one subproblem to solve
- Prove that there is always an optimal solution to the original problem that makes the greedy choice, so that the greedy choice is always safe
- Demonstrate optimal substructure by showing that, having made the greedy choice, what remains is a subproblem with the property that if we combine an optimal solution to the subproblem with the greedy choice we have made, we arrive at an optimal solution to the original problem
Beneath every greedy algorithm, there is almost always a more cumbersome dynamic-programming solution
Proof by contradiction:
- Assume that there exists an optimal solution that is better than our greedy solution
- Show that some parts of the optimal solution are different from the greedy solution (assuming the optimal is better, some steps taken by the optimal must differ from the greedy)
- Show that these differing steps result in the optimal solution being worse than the greedy solution, contradicting our assumption that the optimal is better than the greedy
Refer here for detailed proof of Fractional Knapsack and Activity Scheduling: https://www2.cs.duke.edu/courses/fall17/compsci330/lecture6note.pdf
Alternatively, proof by induction and other methods learnt can be used too
Given: alphabet set $A = \{a_1, a_2, \dots, a_n\}$, where each alphabet $a_i$ occurs with frequency $f(a_i)$ in the text.

How many bits are needed to encode a text file with $m$ characters drawn from $A$?

Trivial answer: use a fixed-length code of $\lceil\lg|A|\rceil$ bits per character, for $m\lceil\lg|A|\rceil$ bits in total.

Can we use fewer bits to store the set $A$? No.

Can we use fewer bits to store a text file? Yes, due to the huge variation in the frequency of alphabets in a text.

Intuitively, use shorter bit encodings for more frequent alphabets and longer bit encodings for less frequent alphabets.
A coding $\gamma$ maps each alphabet in $A$ to a binary string (its codeword). A prefix coding is one in which no codeword is a prefix of another codeword, so encoded text can be decoded unambiguously.

Problem: given a set of alphabets $A$ and their frequencies, find a coding $\gamma$ such that:

- $\gamma$ is a prefix coding
- $ABL(\gamma)$, the average bit length $\sum_{a\in A} f(a)\cdot|\gamma(a)|$, is minimum

Solution: a labelled binary tree, where each edge to a left child is labelled $0$, each edge to a right child is labelled $1$, and each alphabet corresponds to a leaf; the codeword of an alphabet is the sequence of labels on the path from the root to its leaf.
Similarly, given a prefix code, we can also work backwards to find the labelled binary tree.
Theorem: for each prefix code of a set $A$ of alphabets, there is a corresponding labelled binary tree such that:

- There is a bijective mapping between the alphabets and the leaves
- The label of a path from the root to a leaf node corresponds to the prefix code of the corresponding alphabet

Thus, we can express the average bit length in terms of the tree $T$: $ABL(\gamma) = \sum_{a\in A} f(a)\cdot\text{depth}_T(a)$.

In other words, if we are able to find the smallest correct labelled binary tree, we would also solve the problem of optimal prefix codes.

Lemma: the binary tree corresponding to the optimal prefix coding must be a full binary tree (every internal node must have degree exactly $2$, ie. exactly two children).
Intuitively, more frequent alphabets should be closer to the root.
Let $a_1$ and $a_2$ be the two alphabets in $A$ with the lowest frequencies.

Let $T^*$ be an optimal tree; swapping $a_1$ and $a_2$ with the two sibling leaves at the greatest depth of $T^*$ cannot increase $ABL$, since lower-frequency alphabets are pushed deeper.

Thus, there exists an optimal tree in which $a_1$ and $a_2$ are siblings at maximum depth, so it is safe for the greedy algorithm to merge them first.
OPT($A$):

- If $|A| = 2$:
    - return $(a_1 \leftarrow^0 o \rightarrow^1 a_2)$
- Else:
    - Let $a_1$ and $a_2$ be the two alphabets with the least frequencies [initial $O(n\log n)$ time to sort]
    - Remove $a_1$ and $a_2$ from $A$
    - Create a new alphabet $a'$, where $f(a') = f(a_1) + f(a_2)$
    - Insert $a'$ into $A$ to form $A'$ [binary search to insert, $O(\log n)$]
    - Build the optimum tree $T = OPT(A')$
    - Replace the leaf $a'$ in $T$ by $(a_1 \leftarrow^0 o \rightarrow^1 a_2)$
    - return $T$

Time: $O(n\log n)$
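A runnable sketch of the same greedy idea, using a binary min-heap instead of the sorted list with binary-search insertion from the pseudocode above (both are $O(n\log n)$; the symbols and frequencies in the demo are arbitrary):

```python
import heapq
from itertools import count

def huffman_codes(freqs):
    """Return {symbol: codeword} built by repeatedly merging the two least frequent subtrees."""
    tie = count()                                        # tie-breaker keeps heap tuples comparable
    heap = [(f, next(tie), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)                # the two subtrees with least frequency
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tie), (left, right)))
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):                      # internal node: label edges 0 and 1
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                                            # leaf: the path label is the codeword
            codes[node] = prefix or "0"
    walk(heap[0][2], "")
    return codes

print(huffman_codes({"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}))
```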
Given the tree $T$ returned for $A$ and the tree $T'=OPT(A')$ it was built from, replacing the leaf $a'$ by an internal node with children $a_1$ and $a_2$ deepens exactly those two alphabets by one level, so $ABL(T) = ABL(T') + f(a_1) + f(a_2)$.

Given any tree $Z$ for $A$ in which $a_1$ and $a_2$ are siblings, collapsing them into a single leaf $a'$ yields a tree $Z'$ for $A'$ with $ABL(Z') = ABL(Z) - f(a_1) - f(a_2)$.

Thus, by the equation above, if $T'$ is optimal for $A'$ then $T$ is optimal for $A$: a strictly better tree for $A$ (which, by the lemma, can be assumed to have $a_1$ and $a_2$ as siblings) would collapse to a strictly better tree for $A'$, a contradiction. By induction, the algorithm outputs an optimal prefix code.
Given input weight-value pairs $(w_1, v_1), \dots, (w_n, v_n)$ and a capacity $W$, where items may be taken fractionally.

Optimal Substructure: if we remove a weight $w$ of one item $i$ from the optimal load, the remaining load must be the most valuable load weighing at most $W - w$ that can be taken from the remaining items plus the remaining $w_i - w$ of item $i$.

Greedy Choice Property: an optimal solution is one that has the highest total value density (highest value per kg). Since we are always adding as much of the highest value density item as we can, we end up with the highest total value density. Suppose otherwise that some other optimal solution used some amount of a lower value density object; we could substitute that amount with higher value density objects, so that solution could not have been optimal.
- Sort items by decreasing order of value density: $O(n\log n)$
- While the capacity $W$ is not yet filled, repeatedly take as much as possible of the remaining item with the highest value density: $O(n)$

Total time: $O(n\log n)$
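A short sketch of the greedy procedure (the item list and capacity in the demo are arbitrary):

```python
def fractional_knapsack(items, capacity):
    """items: list of (weight, value) pairs. Returns the maximum attainable total value."""
    total = 0.0
    # consider items in decreasing order of value density v/w
    for w, v in sorted(items, key=lambda wv: wv[1] / wv[0], reverse=True):
        if capacity <= 0:
            break
        take = min(w, capacity)        # take as much of this item as still fits
        total += v * (take / w)
        capacity -= take
    return total

print(fractional_knapsack([(10, 60), (20, 100), (30, 120)], 50))  # 240.0
```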
Consider a decision problem $A$ that we would like to solve in polynomial time; call a particular input to the problem an instance $\alpha$.

Now, suppose that we already know how to solve a different decision problem $B$ in polynomial time, and that we have a procedure that transforms any instance $\alpha$ of $A$ into some instance $\beta$ of $B$ such that:
- The transformation takes polynomial time
- The answers are the same: the answer for $\alpha$ is "yes" if and only if the answer for $\beta$ is also "yes"
We call such a procedure a polynomial-time reduction algorithm, and it provides us a way to solve problem $A$:

- Given an instance $\alpha$ of problem $A$, use a polynomial-time reduction algorithm to transform it to an instance $\beta$ of problem $B$
- Run the polynomial-time decision algorithm for $B$ on the instance $\beta$
- Use the answer for $\beta$ as the answer for $\alpha$
If there is a polynomial-time reduction from $A$ to $B$ (written $A \leq_p B$), then a polynomial-time algorithm for $B$ gives a polynomial-time algorithm for $A$.

Thus, if $A$ is known to have no polynomial-time algorithm, then $B$ cannot have one either; reductions let us transfer hardness from $A$ to $B$.
In practice, poly-time algorithms generally have runtime $O(n)$, $O(n^2)$, or $O(n^3)$, and not some arbitrarily large one like $O(n^{100})$.
For polynomial time, we mean that the runtime is polynomial in the length of the encoding of the problem instance.
An algorithm that runs in time polynomial in the numeric value of the input but is exponential in the length of the input is called a pseudo-polynomial time algorithm.
- Decision Problems: output a boolean value (ie. `true` or `false`)
- Optimisation Problems: output either a maximal or minimal quantity
Given an optimisation problem, we can convert it into a decision problem. For example, instead of finding the length of a longest path, ask whether a path of length at least $k$ exists.
Thus, if we can provide evidence that a decision problem is hard, we also provide evidence that its related optimisation problem is hard. Likewise, if we can prove that the optimisation problem is easy, its related decision problem is easy as well.
Given an undirected graph $G=(V,E)$, a Hamiltonian Cycle (HC) is a sequence of vertices such that:

- It originates and terminates at the same vertex
- Each vertex in $V$ is visited exactly once

Optimisation: compute a cycle of maximum length

Decision: does there exist a cycle of size $n = |V|$, ie. a Hamiltonian cycle?
Given an undirected complete graph $G=(V,E)$ with a cost on each edge, a TSP tour is a sequence of vertices such that:

- It originates and terminates at the same vertex
- Each vertex in $V$ is visited exactly once
- There is an edge between every consecutive pair of vertices in the sequence

Optimisation: compute a TSP tour of minimum total cost

Decision: does there exist a TSP tour of cost at most $k$?
At first glance, these 2 problems look quite different. HC takes as instance an undirected graph, while TSP takes as instance an undirected complete graph with weighted edges.
However, we can transform any instance of HC into an instance of TSP in polynomial time:

- Let $G=(V,E)$ be the instance of the original HC problem, with $n = |V|$
- Build a complete graph $G'$ on the same $n$ vertices
- For each pair $(u,v)$ in $G'$:
    - If $(u,v)\in E$, set cost $c(u,v)=1$
    - Else, set cost $c(u,v)=2$ (any cost $>1$ will work)
Theorem: $G$ has a Hamiltonian cycle if and only if $G'$ has a TSP tour of cost at most $n$.

Proof (forward direction): if $G$ has a Hamiltonian cycle, then $G'$ has a TSP tour of cost at most $n$:

- Let $C$ be a HC in $G$
- $G$ is a subgraph of $G'$
- So $C$ must be present in $G'$ as well
- $C$ is a tour since each vertex appears exactly once in $C$
- The cost of each edge of $C$ is $1$ since each edge of $C$ is present in $G$ as well
- So the cost of the tour $C$ is exactly $n$
Proof (backward direction): if $G'$ has a TSP tour of cost at most $n$, then $G$ has a Hamiltonian cycle:

- Let $C$ be a TSP tour of cost at most $n$ in $G'$
- The cost of each edge in $G'$ is at least $1$
- There are $n$ edges in $C$
- By 1, 2, and 3, each edge of $C$ must have cost exactly $1$
- Thus, each edge of $C$ is present in $G$ as well
- So $C$ is present in $G$ as well
- Since each vertex appears exactly once in $C$, $C$ is a HC as well
Reduction: on an original HC instance $G$, construct $G'$ as above (polynomial time), run the TSP decision algorithm on $(G', k=n)$, and return its answer as the answer for HC.
The set of all decision problems that can be solved in polynomial time.
The set of all decision problems for which a candidate solution (certificate) for an instance can be verified in polynomial time; for many problems in this class, no polynomial-time algorithm is known.
Note that there is no proof that no polynomial-time algorithm exists for this class.
Proving NP: given a problem $X$, exhibit a certificate for any "yes" instance and show that the certificate can be verified in polynomial time.
Theorem: If any NP-complete problem is polynomial-time solvable, then P $=$ NP.
The set of all decision problems $X$ such that $X \in$ NP and every problem in NP is polynomial-time reducible to $X$ (ie. $A \leq_p X$ for every $A \in$ NP).
Proving NP-completeness: let $X$ be the problem we wish to prove NP-complete:
- Prove that $X \in$ NP
- Pick a problem $A$ which is already known to be NP-complete
- Show that $A \leq_p X$
The set of all (not necessarily decision) problems $X$ such that every problem in NP is polynomial-time reducible to $X$; $X$ itself need not be in NP.
Theorem: every problem from NP can be reduced to circuit satisfiability.
Proof sketch: consider any problem $X$ in NP.

- What we know is that it has an efficient certifier, say $Q$
- Any algorithm which outputs "yes"/"no" can be represented as a DAG where:
    - Internal nodes are gates
    - Leaves are binary inputs
    - The output is `1`/`0`
- Thus $Q$ itself can be encoded as such a circuit, with the certificate bits as free inputs; deciding whether some certificate makes $Q$ output `1` is exactly circuit satisfiability
- Literal: a boolean variable or its negation
    - $x_i$, $\neg{x_i}$
- Clause: a disjunction (OR) of literals
    - $C_i=x_1 \vee \neg{x_2}$
- Conjunctive Normal Form (CNF): a formula $\Phi$ that is a conjunction (AND) of clauses
    - $\Phi = C_1 \wedge C_2 \wedge C_3$
- CNF-SAT: given a CNF formula $\Phi$, does it have a satisfying truth assignment?
    - ie. is there a way to assign the variables such that $\Phi$ evaluates to true?
- 3-SAT: SAT where each clause contains exactly 3 literals corresponding to different variables
    - ie. $\Phi = (x_1 \vee \neg{x_2} \vee x_3) \wedge (\neg{x_1} \vee x_2 \vee x_3)$
So, 3-SAT is NP-complete.
Proof of NP: let the certificate be the actual subset found. It is trivial to verify the certificate by just summing up the elements in the subset and checking if the sum is the target value.
Proof of NP-completeness: we shall use the Partition problem (known to be NP-complete) and reduce it to Subset Sum, ie. Partition $\leq_p$ Subset Sum.

Transformation: given an input instance for Partition (eg. a multiset of positive integers $A = \{a_1, a_2, \dots, a_n\}$):
- Calculate $T = \frac{\sum_{i=1}^n a_i}{2}$ (trivially, this takes polynomial time)
- Use $(A,T)$ as the instance for Subset Sum
Theorem: there is a valid Partition for $A$ if and only if there is a valid Subset Sum for $(A, T)$.

Proof (forward direction): if there is a valid Partition for $A$, then there is a valid Subset Sum for $(A, T)$:
- Since there is a valid partition, $A$ can be split into two mutually disjoint subsets such that the sum of the integers in each subset is equal
- Thus, the sum of each subset must be half the total sum of all integers in $A$, ie. $T$
- Thus, there exists a subset that sums to $T$ in $A$, and Subset Sum for $(A,T)$ is valid
Proof (backward direction): if there is a valid Subset Sum for $(A, T)$, then there is a valid Partition for $A$:

- Let $S_1$ be the valid subset found by Subset Sum
    - $\sum_{s \in S_1} s = T$
- Let $S_2$ be the set of integers in $A$ excluding the integers in $S_1$ (ie. $S_2 = A \setminus S_1$)
- Clearly, the sum of all integers in $S_2$ is $2T - T = T$
- Thus, $S_1$ and $S_2$ form a Partition of $A$ (ie. two mutually disjoint subsets of $A$ such that the sums of the integers in each subset are equal)
An algorithm has an approximation ratio of $\rho(n)$ if, for any input of size $n$, the cost $C$ of the solution it produces is within a factor of $\rho(n)$ of the cost $C^*$ of an optimal solution:

$$
\begin{aligned}
&\frac{C}{C^*} \leq \rho(n) \rightarrow \text{ for minimisation, } \rho(n) \geq 1 \\
&\frac{C}{C^*} \geq \rho(n) \rightarrow \text{ for maximisation, } \rho(n) \leq 1
\end{aligned}
$$
Basically, the approximation algorithm whose $\rho(n)$ is as close to $1$ as possible is the better algorithm.
An approximation scheme for an optimisation problem is an approximation algorithm that takes as input not only an instance of the problem, but also a value $\epsilon > 0$ such that, for any fixed $\epsilon$, the scheme is a $(1+\epsilon)$-approximation algorithm.

We say that an approximation scheme is a polynomial-time approximation scheme (PTAS) if, for any fixed $\epsilon > 0$, it runs in time polynomial in the size $n$ of its input instance.

We say that an approximation scheme is a fully polynomial-time approximation scheme (FPTAS) if its running time is polynomial in both $\frac{1}{\epsilon}$ and the size $n$ of the input instance.
A vertex cover of an undirected graph $G=(V,E)$ is a subset $V' \subseteq V$ such that every edge in $E$ has at least one endpoint in $V'$. The optimisation problem is to find a vertex cover of minimum size.

Approx-Vertex-Cover:

- Initialise $C = \emptyset$
- $E' = G.E$ (set $E'$ to be a copy of the edges in $G$)
- While $E' \neq \emptyset$:
    - Let $(u,v)$ be an arbitrary edge of $E'$
    - $C = C \cup \{u,v\}$
    - Remove from $E'$ every edge incident on $u$ or $v$
- Return $C$

Using adjacency lists to represent $E'$, the algorithm runs in $O(V + E)$ time.
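A minimal Python version of Approx-Vertex-Cover (edge-list representation; iterating the edges in order is one way of picking an arbitrary uncovered edge):

```python
def approx_vertex_cover(edges):
    """Greedy 2-approximation: add both endpoints of any edge not yet covered."""
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:   # this edge is still uncovered
            cover.add(u)
            cover.add(v)
    return cover

# Path 1-2-3-4-5-6: the algorithm may return 6 vertices, while an optimal cover has 3.
print(approx_vertex_cover([(1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]))
```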
Theorem: Approx-Vertex-Cover is a polynomial-time 2-approximation algorithm
Proof: firstly, the set $C$ returned is a vertex cover, since the loop runs until every edge in $E'$ (and hence every edge in $E$) has been covered by some vertex in $C$.

Let $A$ denote the set of edges that were picked as the arbitrary edge $(u,v)$. Any vertex cover, in particular an optimal cover $C^*$, must include at least one endpoint of each edge in $A$; since no two edges in $A$ share an endpoint, $|C^*| \geq |A|$.

For the approximation algorithm, each iteration picks an edge for which neither of its endpoints is already in $C$ and adds both endpoints, so $|C| = 2|A|$.
Combining the two equations,

$$
\begin{aligned}
|C| &= 2|A| \leq 2|C^*|\\
\frac{|C|}{|C^*|} &\leq 2
\end{aligned}
$$