| title | type | duration | creator | competencies |
|---|---|---|---|---|
| Data Structures | lesson | 2:30 | | Programming |
After this lesson, students will be able to:
- Identify common data structures
  - Linked lists, Trees, Heaps, Tries, Graphs
- Be familiar with their common methods
  - And be able to determine their run-times
- Know use-cases for each data structure discussed
Before this lesson, students should:
Linked lists are used to represent an ordered set. They are similar to arrays but do NOT live contiguously in memory.
We have a reference to the head (and possibly the tail). Each node has a reference (or pointer) to the next node.
In a doubly-linked-list each node also has a reference to the prev node.
We cannot get an element at a given index in constant time. We must iterate over the list.
We can, however, insert or remove an element at the head in constant time. This is not the case for an array (hence the cost of `shift`/`unshift`).
Linked Lists are often used to represent Stacks and Queues.
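As a rough illustration of the points above, here is a minimal singly linked list sketch in Ruby. The class name, the `Node` struct, and the `at` method are made up for this example; `unshift` and `shift` borrow the array method names mentioned earlier. Because both work on the head, the same object can serve as a simple stack.

```ruby
# A minimal singly linked list sketch. All names here are illustrative.
class LinkedList
  Node = Struct.new(:value, :next_node)

  def initialize
    @head = nil
  end

  # Constant time: the new node simply points at the old head.
  def unshift(value)
    @head = Node.new(value, @head)
    self
  end

  # Constant time: drop the head and return its value (nil if empty).
  def shift
    return nil if @head.nil?
    value = @head.value
    @head = @head.next_node
    value
  end

  # Linear time: we must walk the list node by node to reach an index.
  def at(index)
    node = @head
    index.times { node = node&.next_node }
    node&.value
  end
end

list = LinkedList.new
list.unshift(3).unshift(2).unshift(1) # list is now 1 -> 2 -> 3
list.at(2)  # => 3
list.shift  # => 1
list.at(0)  # => 2
```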
Why might we prefer to use a linked list to implement a stack or queue? Why not an array?
Why would we choose to use a linked list over an array? What are the advantages and disadvantages?
A tree is a collection of nodes, where each node has some number of children. We cannot see every node at once, but we do have a pointer to the root node.
Leaf nodes are those that have no children.
Trees can be used to represent a family tree (where the old people are on top), or a file structure.
We can traverse a tree using Depth first search or Breadth first search. More on this below.
A binary tree is a tree such that every node has at most 2 children.
A binary search tree (BST) is a binary tree such that, for every node:
- Every value in its left subtree is less than the node's value
- Every value in its right subtree is greater than the node's value
This makes searching for elements in a BST very fast.
To insert into a BST, we compare the new value with the root value. If it is less than the root, we repeat the process for the left subtree. If it is greater than the root, we repeat the process for the right subtree. We keep repeating this process until we reach an empty spot at the bottom of the tree, where the new node is attached as a leaf.
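The insertion steps above could be sketched roughly as follows. This is only one possible shape: the `BSTNode` class and its method names are assumptions made for the example, and duplicates are silently ignored because the lesson does not say how to handle them.

```ruby
# A minimal BST node sketch. Names are illustrative, not from the lesson.
class BSTNode
  attr_reader :value, :left, :right

  def initialize(value)
    @value = value
    @left = nil
    @right = nil
  end

  # Go left for smaller values and right for larger ones until an empty
  # spot is found. Equal values are ignored (an assumption).
  def insert(new_value)
    if new_value < value
      if left
        left.insert(new_value)
      else
        @left = BSTNode.new(new_value)
      end
    elsif new_value > value
      if right
        right.insert(new_value)
      else
        @right = BSTNode.new(new_value)
      end
    end
    self
  end

  # Searching follows the same left/right decisions as inserting.
  def include?(target)
    return true if target == value
    child = target < value ? left : right
    child ? child.include?(target) : false
  end
end

root = BSTNode.new(8)
[3, 10, 1, 6].each { |v| root.insert(v) }
root.include?(6) # => true
root.include?(7) # => false
root.left.value  # => 3
```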
Look up how to remove a node if you are interested ;)
    Tree.instanceMethods.traverse(callback) :=
      left.traverse(callback) if hasLeftChild?
      callback(value)
      right.traverse(callback) if hasRightChild?
The above pseudocode is called in-order traversal. Check out pre-order and post-order as well.
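For reference, a runnable Ruby version of that pseudocode might look like the sketch below. The `Tree` class and its constructor are assumptions made for the example; the callback is passed as a block.

```ruby
# A small binary tree with an in-order traversal. Names are illustrative.
class Tree
  attr_reader :value, :left, :right

  def initialize(value, left = nil, right = nil)
    @value = value
    @left = left
    @right = right
  end

  # In-order: left subtree first, then this node, then the right subtree.
  def traverse(&callback)
    left.traverse(&callback) if left
    callback.call(value)
    right.traverse(&callback) if right
  end
end

tree = Tree.new(8, Tree.new(3, Tree.new(1), Tree.new(6)), Tree.new(10))
visited = []
tree.traverse { |v| visited << v }
visited # => [1, 3, 6, 8, 10]
```

When the tree is a BST, as in this example, in-order traversal visits the values in sorted order.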
Breadth-first traversal is a little more complicated. Look it up if you are interested ;)
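For the curious, here is a minimal sketch of breadth-first traversal, which visits a tree level by level using a queue. The `TreeNode` struct and the `breadth_first` function are hypothetical names, and a plain Ruby Array stands in for the queue.

```ruby
# Nodes with optional left and right children. Names are illustrative.
TreeNode = Struct.new(:value, :left, :right)

# Visit nodes level by level: take a node from the front of the queue,
# record it, then enqueue its children at the back.
def breadth_first(root)
  return [] if root.nil?
  visited = []
  queue = [root]
  until queue.empty?
    node = queue.shift
    visited << node.value
    queue << node.left if node.left
    queue << node.right if node.right
  end
  visited
end

root = TreeNode.new(8,
                    TreeNode.new(3, TreeNode.new(1), TreeNode.new(6)),
                    TreeNode.new(10))
breadth_first(root) # => [8, 3, 10, 1, 6]
```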
How quickly (in Big-O) can we find an element in a BST with `n` elements?
How quickly can we find the min element? The max element?
Check out my TypeScript and compiled JavaScript implementations
An AVL tree (named for its inventors, Adelson-Velsky and Landis) is a self-balancing binary search tree. You do not need to understand how these work but know that they exist.
Self-balancing means that whenever we insert or remove a node, we re-organize the tree so that, for every node, the heights of its left and right subtrees differ by at most 1.
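To make "balanced" concrete, one common way to state the AVL condition is that, at every node, the heights of the left and right subtrees differ by at most 1. The sketch below only checks that condition; it does not do any re-balancing. The `Node` struct and function names are assumptions for the example, and the check is written for clarity rather than speed.

```ruby
Node = Struct.new(:value, :left, :right)

# An empty subtree has height 0; otherwise 1 plus the taller child.
def height(node)
  return 0 if node.nil?
  1 + [height(node.left), height(node.right)].max
end

# True when every node's subtrees differ in height by at most 1.
def balanced?(node)
  return true if node.nil?
  (height(node.left) - height(node.right)).abs <= 1 &&
    balanced?(node.left) && balanced?(node.right)
end

# Inserting 1, 2, 3 in order into a plain BST produces a chain.
chain = Node.new(1, nil, Node.new(2, nil, Node.new(3)))
# A self-balancing tree would end up with this shape instead.
bushy = Node.new(2, Node.new(1), Node.new(3))

balanced?(chain) # => false
balanced?(bushy) # => true
```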
Why would we want a self-balancing tree? Is it worth the implementation? Are the run-times different than a tree that is not self-balancing?
A Min Heap is a binary tree such that the value of every node is less than or equal to the values of its children. Thus, the smallest value is always at the root of the tree.
They are used when we want to keep track of the smallest element in a set.
You can probably guess what a Max Heap is.
To get the min element we just look at the root. (Duh)
To insert an element, we add a node at the first available position (the leftmost open spot on the bottom level) and "bubble it up". This means we keep swapping it with its parent until its value is no longer smaller than its parent's (or it reaches the root).
How do we know inserting in this way will always leave us with a Min Heap?
To extract the min element, we replace the root with the last element (the rightmost node on the bottom level). Then we keep bubbling that element down. This means we keep swapping it with its smaller child as long as that child is smaller than it.
How do we know extracting in this way will always leave us with a Min Heap?
A heap is a complete binary tree. That is, every level is full except possibly the last, which is filled in from left to right. Because of this, we can store the nodes of the tree in an array. Let's look at the example from above to see how this works.
For each element at index `i`, its children are at `2*i` and `2*i + 1`. Even simpler, the parent of any element at index `i` is at index `i/2` (rounding down). This works as long as we put a dummy at index 0. (Thus the root is at index 1.)
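Putting the pieces together, an array-backed min heap could be sketched as below. It follows the index arithmetic described above, with a dummy at index 0. The class name and the method names `insert`, `get_min`, and `extract_min` are assumptions chosen for the example.

```ruby
# A minimal array-backed min heap sketch. Names are illustrative.
class MinHeap
  def initialize
    @items = [nil] # dummy at index 0, so the root lives at index 1
  end

  def size
    @items.length - 1
  end

  # The smallest element is always at the root.
  def get_min
    @items[1]
  end

  # Append at the first free position, then bubble up.
  def insert(value)
    @items << value
    bubble_up(size)
    self
  end

  # Move the last element to the root, then bubble it down.
  def extract_min
    return nil if size.zero?
    min = @items[1]
    last = @items.pop
    unless size.zero?
      @items[1] = last
      bubble_down(1)
    end
    min
  end

  private

  # Swap with the parent (index i / 2) while smaller than it.
  def bubble_up(i)
    while i > 1 && @items[i] < @items[i / 2]
      @items[i], @items[i / 2] = @items[i / 2], @items[i]
      i /= 2
    end
  end

  # Swap with the smaller child (2i or 2i + 1) while it is smaller.
  def bubble_down(i)
    loop do
      smallest = i
      [2 * i, 2 * i + 1].each do |child|
        smallest = child if child <= size && @items[child] < @items[smallest]
      end
      break if smallest == i
      @items[i], @items[smallest] = @items[smallest], @items[i]
      i = smallest
    end
  end
end

heap = MinHeap.new
[5, 3, 8, 1].each { |v| heap.insert(v) }
heap.get_min     # => 1
heap.extract_min # => 1
heap.extract_min # => 3
```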
Besides keeping track of the min/max of a set, heaps can be used to keep track of medians. To do this we keep the smaller half of our elements in a max heap and the larger half in a min heap, so the median is always at the roots. High-level implementation here
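One way the two-heap median idea might look in Ruby is sketched below. Ruby has no built-in heap, so the example includes a tiny generic `Heap` whose block decides which of two values belongs nearer the root. Every name here (`Heap`, `RunningMedian`, `add`, `median`) is made up for the example, and this heap is 0-indexed for brevity.

```ruby
# Tiny binary heap; the block says whether a should sit above b.
class Heap
  def initialize(&comes_first)
    @items = []
    @comes_first = comes_first
  end

  def size
    @items.size
  end

  def peek
    @items.first
  end

  def push(value)
    @items << value
    i = @items.size - 1
    while i > 0 && @comes_first.call(@items[i], @items[(i - 1) / 2])
      parent = (i - 1) / 2
      @items[i], @items[parent] = @items[parent], @items[i]
      i = parent
    end
  end

  def pop
    return nil if @items.empty?
    top = @items.first
    last = @items.pop
    return top if @items.empty?
    @items[0] = last
    i = 0
    loop do
      best = i
      [2 * i + 1, 2 * i + 2].each do |c|
        best = c if c < @items.size && @comes_first.call(@items[c], @items[best])
      end
      break if best == i
      @items[i], @items[best] = @items[best], @items[i]
      i = best
    end
    top
  end
end

class RunningMedian
  def initialize
    @low = Heap.new { |a, b| a > b }  # max heap holding the smaller half
    @high = Heap.new { |a, b| a < b } # min heap holding the larger half
  end

  def add(value)
    if @low.size.zero? || value <= @low.peek
      @low.push(value)
    else
      @high.push(value)
    end
    # Rebalance so @low has the same size as @high or one more.
    @high.push(@low.pop) if @low.size > @high.size + 1
    @low.push(@high.pop) if @high.size > @low.size
    self
  end

  def median
    return nil if @low.size.zero?
    return @low.peek if @low.size > @high.size
    (@low.peek + @high.peek) / 2.0
  end
end

stream = RunningMedian.new
stream.add(5).add(1).add(9)
stream.median # => 5
stream.add(2)
stream.median # => 3.5
```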
Ever hear of heap sort? This is a very efficient and easy algorithm for sorting arrays. Fun!
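A minimal heap sort sketch, for reference. It builds a max heap inside the array itself and then repeatedly swaps the max to the end. Note that, unlike the dummy-at-index-0 layout used elsewhere in this lesson, this sketch is 0-indexed, so the children of `i` are at `2*i + 1` and `2*i + 2`. The function names are assumptions for the example.

```ruby
# Push the element at index i down until neither child is larger.
# Only indices below `size` are treated as part of the heap.
def sift_down(array, i, size)
  loop do
    largest = i
    [2 * i + 1, 2 * i + 2].each do |child|
      largest = child if child < size && array[child] > array[largest]
    end
    break if largest == i
    array[i], array[largest] = array[largest], array[i]
    i = largest
  end
end

# Sorts the array in place (ascending) and returns it.
def heap_sort!(array)
  n = array.length
  # Build a max heap, starting from the last node that has a child.
  (n / 2 - 1).downto(0) { |i| sift_down(array, i, n) }
  # Move the current max to the end, shrink the heap, and repair it.
  (n - 1).downto(1) do |last|
    array[0], array[last] = array[last], array[0]
    sift_down(array, 0, last)
  end
  array
end

heap_sort!([5, 3, 8, 1, 9, 2]) # => [1, 2, 3, 5, 8, 9]
```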
What would be the run-time of `insert`? `get_min?`? `extract_min`?
Check out my Ruby implementation here
Trie is pronounced "try" or "tree" depending on who you ask. The word comes from "retrieval", but many pronounce it like "try" to distinguish it from trees.
Anyway. This is a Trie
Tries are often used to represent a set of words. Each node's children are the possible next letters of the words in our set.
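A minimal trie sketch in Ruby, under a few assumptions: each node stores its children in a Hash keyed by letter, and a `word_end` flag marks where a stored word finishes (so that a prefix like "ca" is not mistaken for a word). The class layout is made up for this example and is not the implementation linked from this lesson.

```ruby
# A minimal trie sketch. Names and layout are illustrative.
class Trie
  class Node
    attr_reader :children
    attr_accessor :word_end

    def initialize
      @children = {}    # letter => child Node
      @word_end = false # true if a stored word ends at this node
    end
  end

  def initialize(words = [])
    @root = Node.new
    words.each { |word| insert(word) }
  end

  # Walk down letter by letter, creating nodes that do not exist yet.
  def insert(word)
    node = @root
    word.each_char { |ch| node = (node.children[ch] ||= Node.new) }
    node.word_end = true
    self
  end

  # Walk down letter by letter; fail as soon as a letter is missing.
  def includes?(word)
    node = @root
    word.each_char do |ch|
      node = node.children[ch]
      return false if node.nil?
    end
    node.word_end
  end
end

trie = Trie.new(%w[car cat dog])
trie.includes?("cat") # => true
trie.includes?("ca")  # => false
trie.includes?("cow") # => false
```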
Check out my Ruby implementation here
Let's assume we have a `trie` of `n` words where the longest word has length `m`.
What would be the run-time of `includes?`? `insert`? What about initializing our trie?
Why would we use a trie compared to a hash or BST? How much space do we need compared to other data structures?
A Graph is a collection of nodes (or vertices) and edges. They can represent a network where each node is an element and each edge represents a connection between those elements.
Types of graphs:
- Directed: Graphs can be directed or undirected
  - directed: a graph of Twitter users and followers
  - undirected: a graph of Facebook users and friends
- Weighted: Graphs can also have weights
  - A graph of cities and the distances between them may be weighted
  - A weighted graph can also have direction.
- Multi-edged: A multigraph can have multiple edges between nodes.
Depending on the complexity of what we want to represent, many different types of graphs will do the trick.
Depending on the type of graph, many different types of objects can represent it.
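As one possible representation among many, the sketch below stores a graph as an adjacency list: a Hash mapping each node to a Hash of its neighbors and edge weights. The `Graph` class, its `directed:` option, and the default weight of 1 are assumptions for this example. It does not support multigraphs, since a second edge between the same two nodes would overwrite the first.

```ruby
# An adjacency-list graph sketch. Names are illustrative.
class Graph
  def initialize(directed: false)
    @directed = directed
    # node => { neighbor => weight }
    @edges = Hash.new { |hash, node| hash[node] = {} }
  end

  # Unweighted graphs can simply leave every weight at the default.
  def add_edge(from, to, weight = 1)
    @edges[from][to] = weight
    @edges[to][from] = weight unless @directed
    self
  end

  def neighbors(node)
    @edges.key?(node) ? @edges[node].keys : []
  end

  def weight(from, to)
    @edges.key?(from) ? @edges[from][to] : nil
  end
end

# Directed and unweighted: who follows whom.
follows = Graph.new(directed: true)
follows.add_edge("ann", "bob")
follows.neighbors("ann") # => ["bob"]
follows.neighbors("bob") # => []

# Undirected and weighted: hypothetical cities and distances.
roads = Graph.new
roads.add_edge("A", "B", 5)
roads.weight("B", "A") # => 5
```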
What kind of networks can we use a graph to represent? What type of graph would these be?
What kind of objects could we use to represent a graph? What if it has weights? What if it has direction?
How can we represent a tree using a graph? How can we represent a linked list using a graph?
Like graphs? Like money? Find a poly-time solution to Traveling Salesman and win a million bucks! ;)
Leaving it out this time. There are many good resources out there for this material. Google, Wikipedia, etc. Practice self discovery!
It is OK to use Wikipedia to get a basic level of understanding of data structures and algorithms. Why? Because you can prove to yourself that the information is correct! For a more advanced level of understanding, however, books and published papers are probably the way to go.
- Why do we care about how we represent data?
- How do we determine which data structures to use for different purposes?
- How can we determine the run-time of a data structure's method?