A Segment Tree for Christmas

Firm grounding in the Basics of Data Structures
Algorithmic complexity

Welcome to a festive new installment of Algosaurus! ‘Tis the season of Christmas trees and holly, so today, we’ll be talking about a special tree for the occasion, called the Segment Tree.

Mind you, this isn’t such an easy-to-understand data structure, but is indeed one of the most elegant and useful ones out there. It’s a long-ish post, so we’ve kept a few checkpoints to keep track of where you are:

Peep the duck and her family is decorating their Christmas tree with shiny new baubles today.


She has some mild OCD, so she divides her tree into different sections and wants to calculate the number of baubles between any 2 sections. At the same time, her babies keep adding baubles to any section whenever they like.


She decides to keep track of the number of baubles in each section through an array A.


Let’s call the operation of finding the number of baubles between section x and section y, query x y; the operation of Peep’s babies adding y baubles to section x, add x y

Basically, query x y asks the sum of the values A_x to A_y, and add x y asks to add the value y at the index x.


Of course, that doesn’t sound hard at all! Peep can just iterate from A_i to A_j each time she decides to query, and updates are even simpler, she can just set A_x = A_x + y.

Let us analyze her solution. She takes O(N) time per query, and O(1) time per update.

But what if there are a lot of queries? We’d need to iterate over a range multiple times, so that doesn’t sound so good in terms of complexity.

Enter, the prefix sum array. Instead of merely storing A, we store an array B, which stores the *cumulative sum* up till that index of A, like so:


Despite the super fancy name and definition, prefix sums are simple creatures. Calculating them is easy-peasy.

def prefix_calc(A):
    B = []
    curr = 0
    for i in A:
        curr += i
    return B

Note how we’re calculating each element of B in O(1), for a total run-time of O(N). Now, we can answer queries in O(1), by simply returning B_y - B_{x-1}.

Yay! But wait, what happens when her babies add some new baubles, ie. we get an update?

For every update in A, we need to update the elements in B as well, since the sum changes accordingly. We need to run over O(N) elements of B, so updates are O(N).


When her tree is tiny, she can still afford to count the baubles individually. But what if she has to decorate the Christmas tree at Rockefeller Center?


Hmm, looks like we need to think a bit about this.

So the prefix sum is very quick for queries, and the generic array is very quick for updates. An important question to ask at this point is, why?

It’s because in the prefix sum, any given query can be broken into 2 values in the structure we have. And in the generic array, each update is to change a single element. These are all constant values which don’t change with the *indexes* of the queries or the updates.

What if we found a middle ground, one that does both things reasonably quickly?

The beauty of the prefix sum array B, is that it functions using a running sum broken into intervals, but the downside is that you have to update every element of B when even a single element of the original array A is updated.

In other words, the cumulative sum of every element up till the given index in B, is strongly dependent on the elements of array A. We need to decouple intervals from updates as much as we can, because we’re lazy and don’t want to update our interval values unless we *really* need to.

Why? Because that speeds up updates of course! Let’s play around with array B a bit.


Hmm, this isn’t particularly helpful. We’ll still have to loop over all the elements in A to produce this value, in case of an update. Let’s split this into 2 and see what happens.


Slightly better. If any update happens between indices 1…4 of A, we don’t have to calculate the running sum of indices 5…8 anymore. We can simply recalculate the running sum of 1…4 and add to it the run sum of 5..8.


Now we’re talking. Any change in the interval, say, 3…4 won’t affect the run sums of 1..2 and 5…8. Simply recalculate the run sum of 3…4 and let the change in 3…4 ‘percolate’ to 1…4 and then to 1…8.

And finally…


The last level of ‘decoupling’ updates from intervals, and reducing the dependence of pre-calculated run sums from the elements of A. Any change in a single element won’t affect the rest of the elements.

This is called a segment tree.

Since this kind of structure is formed by the *union* of intervals or segments, it supports any function which supports the union operator.

That means Peep can find the greatest number of baubles between any 2 sections as well! (Since max supports the union operator as well, ie. if max(a, b) = a and max(c, d) = d, then max(a, b, c, d) = max(a, d))


Why is this nice? Note that each index in the original array appears in ceil(\log N) + 1 nodes (one on each level), which is O(\log N). This means that if we change the value of a single element, O(\log N) nodes in the segment tree change. Which implies that we can do updates quickly.

Note that any interval can be expressed as a union of some nodes of the tree. As an example look at the nodes we need for 1\dots 7.


We claim that any interval can be represented by O(\log N) nodes in the tree, so we can conceivably do queries quickly too. We’ll prove this after talking about the query function itself right here, so stay put!

Let’s now try to count the number of nodes. There’s N leaves (leaves are nodes with no children), N/2 nodes in the level above, N/4 above that, and so on, until 1. This gives us the sum 1 + 2 + 4 + \dots + N, which sums to 2\cdot N - 1 = O(N)

Summing up, this means we can do updates and queries in O(\log N), in O(N) space.


Let’s concretely make our functions now.


Now, for each node indexed v, if v isn’t a leaf node, its children are indexed 2\cdot v and 2\cdot v + 1. These observations suffice to describe a build method, with which we can initialize the tree. Note that we’ve used an array of size 4 \cdot N to represent the tree. This is necessary because of the full binary-heap like representation of the segment tree we’re using.

max_N = (1 << 20) #This is the maximum size of array our tree will support
seg = [0] * (4 * max_N) #Initialises the ’tree’ to all zeroes.

def build(someList, v, L, R):
    global seg
    if (L == R):
        seg[v] = someList[L]
        mid = (L + R) / 2 #Integer division is equivalent to flooring.
        build(someList, 2 * v, L, mid)
        build(someList, 2 * v + 1, mid + 1, R)
        seg[v] = seg[2 * v] + seg[2 * v + 1]

For the update procedure, the simplest way is to travel down to the lowest level, make the necessary update, and let it travel back up the tree as we add the left and right children for every node.

Let’s use an example!


This procedure can be described very cutely with a recursive function.

def update(v, L, R, x, y):
    global seg
    if (L <= x <= R): #the index to be updated is part of this node
        if (L == R): #this is a leaf node
            seg[v] += y
            mid = (L + R) / 2
            update(2 * v, L, mid, x, y)
            update(2 * v + 1, mid + 1, R, x, y)

            #update node using new values of children
            seg[v] = seg[2 * v] + seg[2 * v + 1]

Consider a query. Say that the query asks for the sum of values in the interval x \dots y. Say, further, that we are on node v, which represents the interval l \dots r. There are three possibilities:

1. l \dots r and x \dots y do not intersect. In this case, we are not interested in the value of this node.
2. x \leq l \leq r \leq y. In this case, we will use the value of the node as is.
3. There is a part of l \dots r, which resides outside x \dots y. Here, it makes sense to examine the children of v, to find the intervals that fit better, or not at all (see cases 1 and 2).


Let’s look at a concrete example!


So the answer to the query is stored in run sum (or whatever you call it).

This can be represented in a recursive method as follows.

def query(v, L, R, x, y):
    if (R < x or y < L):
        #case 1
        return 0
    elif (x <= L <= R <= y):
        #case 2
        return seg[v]
        #case 3
        mid = (L + R) / 2
        return query(2 * v, L, mid, x, y) + query(2 * v + 1, mid + 1, R, x, y)

We’ve put up a global list-based version and a class-based version of the segment tree code on Gist, so go check them out!

Now, some interesting stuff. Let’s say that instead of the sum function, we had some function f, which takes as input a list of values and outputs a single real number. What properties should it have so that our segment tree can support it?

We are combining intervals, in O(1) for the sum function, which gives us a runtime of O(\log N) per operation, so if combining intervals takes time T, then each operation is O(T\cdot\log N). So we can use this structure for any f whose value of T is low enough to be acceptable for the given situation.

For example, for fmax function, T = O(1), so the segment tree works quickly for that.

We’ll now prove that the query function runs in O(\log N) for any arbitrary interval.

At the heart of the proof is the fact that the query function visits at most 4 nodes in each level.

We always visit the root level, and it has only 1 node, so we have a base case. Also, let’s assume that the hypothesis holds true for all levels \leq k.

Now, if we visit 1 node on the k^{\text{th}} level, then we can visit only 2 in the next (since a single node has 2 children). Note how the function only visits continuous nodes in the next level.

If we visit 2, then they have only 4 children, which means we’re still fine visiting all 4.

We will never visit 3 nodes on a level. This is because if an interval does not fit well, we visit both of its children, which implies that we never visit an odd number of nodes on a level (except on root).

If we visit 4 nodes on a level, then by the previous argument, they must be continuous. We claim, by way of contradiction, that we will need to visit all 8 children from these nodes. This implies that we have case 3 for every one of these nodes. Note, however, that if any two nodes have case 3, the query interval spans every node between them, which contradicts our supposition. So we’ll only visit the children of at-most 2 nodes on this level, and hence at most 4 nodes on the next level.

Now, since the hypothesis holds for all levels \leq k implies it holds for level k + 1, by strong induction, it holds for every level.

Now, since there are O(\log N) levels, the query function will use at most 4\cdot\log N = O(\log N) nodes per query.

And we’re done!

What if we had updates of the type:

add x y z

where one has to add z to every element between x and y (both inclusive)?

We could do y - x + 1 updates on our tree, but that would be O(N \log N), which is atrocious.

Alternatively, note that we can break up the update interval into some nodes (exactly like the query function).

But instead of reporting the sums of all the nodes which exhibit case 2, we will update all of them. If z is added to every element of a node of size R - L + 1, its sum increases by (R - L + 1) \cdot z, which means we can update each node in O(1), for an overall runtime of O(\log N).

But *wait*.

The children of every non-leaf node we changed do not have the new value. What if we get a query that uses those nodes? Let’s create an additional variable for every node, called lazy, so that lazy_v stores the pending addition to node v.

We’ll only do additions to a node when we *need* to use the node, since we’re lazy folks and don’t want to do any work unless we need to.


What do we mean by pending addition? The idea is that when a node falls in case 2 (from the cases we defined around the query function), we update it, and add z to the pending values of its children. This adds an additional procedure to both the update and the query functions.

Before we do any work on a node, we need to apply any pending add updates, and propagate these updates to its children, by adding the updates to their lazy values.

This part is magical! All the pending changes for every update are squashed into a single update. No multiple updates are required anymore and we can make do with only 1, that too only when we need to.

Since we’re only doing an additional constant amount of work per node, everything is still O(\log N)!

Let’s make this procedure clearer with an example.


Now, let’s see how the lazy values come into play. Say that we call get a query 4 5.


The idea is hopefully precise enough to translate to code now. We’ve put up the code right here, so go take a look at it!

This concludes our article on segment tree. The segment tree is a rather wonderful data structure, so we hope we’ve done justice to it. As usual, we love hearing from you, so do send us your feedback at rawrr@algosaur.us!

Stack Exchange thread on proof of time complexity of query operation on segment tree

A Heap of (less) Complexity: Binomial Heaps

Firm grounding in the Basics of Data Structures
Algorithmic complexity

Hi, welcome to another installment of Algosaurus, this time on Binomial Heaps. They are a more efficient form of heaps than the simple binary heaps we looked at earlier, not to mention they have a beautiful intuition to them. Even though they aren’t frequently used, they’re still good for expanding your mind.

Let’s compare the complexities of various operations on Binary, Binomial, and Fibonacci Heaps respectively.


Need I say anything further?

As a disclaimer, this article is definitely not aimed at beginners to programming. The Binomial Heap concept is somewhat complex and you might look like this after your first reading:


That said, let’s begin!

First, a quick recap of binary heaps. In my last article, I implemented binary heaps as max heaps, where the largest element was stored at the root of the tree.

Today, we’re going to show them as min-heaps, where the root element is the smallest.


Heaps were created to improve the complexity of graph algorithms like Dijkstra’s Algorithm, by executing the algorithm using a heap. Notably, binary min-heaps are used as priority queues there.

Thing is, two more operations are often used in graph algorithms.

One, merging two heaps together to form a new heap.

Two, decreasing the value stored in a node, called decrease-key.

There’s no easy way to merge regular binary heaps apart from reconstructing the heap from scratch, making the complexity O(m \log n). That’s no good.

Once again, the time complexities for operations on binary heaps are as follows:


We can perform most operations in O(\log n) time or less.


Question is, can we do better?


Don’t worry Algosaurus, we can make that reality.

How do we merge nicely then?

Algosaurus then remembers that game he used to love playing on the phone, which merged tiles of the same power of 2 together.


You’ve played 2048 too, right?

Let’s get some intuition using binary arithmetic now.


What if we could adapt this approach to store our elements in ‘packets’ whose sizes are powers of 2?


With this, we have three properties fixed:

  • Sizes must be powers of two
  • Can efficiently fuse packets of the same size
  • Can efficiently find the minimum element of each packet

Let’s get down to deletion now.



Relax Algosaurus. Remember how any number can be expressed in terms of powers of 2?

Let’s ‘fracture’ the packet from which we deleted the element.


Then put it all back together.


With that, our four properties are:

  • Sizes must be powers of two
  • Can efficiently fuse packets of the same size
  • Can efficiently find the minimum element of each packet
  • Can efficiently ‘fracture’ a packet of 2^k elements into similar packets of smaller powers of 2

In what form should we express our packets in?

With binomial trees of course!

Quoting directly from the slides referenced in the Acknowledgements:

“A binomial tree of order k is a type of tree recursively defined as follows:
A binomial tree of order k is a single node whose children are binomial trees of order 0, 1, 2, …, k – 1.”


Let’s apply the heap property to the binomial trees.


A binomial heap is basically a forest of heap-ordered binomial trees. Now, let’s check whether our binomial trees adhere to the 4 properties we set earlier.

    • Sizes must be powers of two – Pretty obvious.
    • Can efficiently fuse packets of the same size – As shown below, it’s a trivial operation!


    • Can efficiently find the minimum element of each packet – Each tree has a pointer to its minimum element, so it can be retrieved in O(1) time.
    • Can efficiently ‘fracture’ a packet of 2^k elements into similar packets of smaller powers of 2


Since we’ve finally got an intuitive understanding of merging and deletion for the ‘packets’, let’s step through doing so in Binomial Heaps.





For now, I’m just going to say that the amortized time of insertion into a binomial heap is O(1). Feel free to take my word, or check them out in the slides given in the Acknowledgements.

But we still have a couple of problems…

When we intermix insert() and extractMin(), we force expensive insertions to happen repeatedly, wrecking our amortized time for insertion from O(1) to O(\log n), because we have to merge \log n binomial trees together at the same time.


Hmm. How do we make the speed of our insertion operations independent of whether we do deletions in the middle?

In Bengali we have a saying, “If you don’t have a head, you won’t have a headache”.

What if we just don’t merge the O(\log n) binomial trees together, avoiding the step altogether?

That is, just add the isolated, unmerged binomial tree to the forest with every insertion and do nothing else.

We’ll only coalesce the trees together when we call extractMin(), only when we need to. The number of trees required to coalesce all the disjoint trees, is \log n, just like the number of bits required to represent a decimal number n, is \log n.

We can’t just ‘merge’ all the mini-heaps together because the ‘merge’ operation assumes that all the trees in the binomial heap are in ascending order, and that isn’t the case here.

Let’s go back to our 2048-analogy for some intuition.


And we’re done!

Here are the final amortized complexities of our ‘lazy’ binomial heap:
insert(): O(1)
merge(): O(1)
findMin(): O(1)
extractMin(): O(\log n)

I’m not proving them here, in interest of keeping this proof-free. Feel free to check them out in the slides given in the Acknowledgements.

This concludes my article on Binomial Heaps. Although this may not have been as fun as my other articles, I hope it managed to demystify Binomial Heaps for you.

This article simply wouldn’t have been possible without this series of slides CS166 from Stanford. These slides were the template for my article, so full props to them.

“It’s turtles all the way down.” – A guide to the Basics of Data Structures



Patience Algosaurus.

Data structures by themselves aren’t all that useful, but they’re indispensable when used in specific applications, like finding the shortest path between points in a map, or finding a name in a phone book with say, a billion elements (no, binary search just doesn’t cut it sometimes!).

Oh, and did I mention that they’re used just about everywhere in software systems and competitive programming?

This time, we only have two levels and a bonus, since this is an article on just the basics of data structures. Having a Mastery level just doesn’t make sense when there’s a ridiculous number of complicated data structures.

Say hello to Loopie.

dsb2Loopie enjoys playing Hockey with her family. By playing, I mean…


When the turtles are shucked into the goal, they are deposited back on top of the pile.

Evidently, Loopie’s family likes sliding on ice.

Notice how the first turtle added on the pile, is the first turtle to be ejected.

This is called a queue.

Similar to a real queue, the first element which was added to the list, will be the first element out. This is called a FIFO (First In First Out) structure.

Insertion and deletion operations?


q = []

def insert(elem):
    q.append(elem) #inserts elem into the end of the queue
    print q

def delete():
    q.pop(0) #removes 0th element of the queue
    print q

After a fun-filled afternoon playing Hockey, Loopie is making pancakes for everyone. She places all the pancakes in a similar pile.


Then serves them to the family one by one.

Notice how the first pancake she made, is the last one she serves.

This is called a stack.

The first element which was added to the list, will be the last one out. This is called a LIFO (Last In First Out) structure.

Insertion and deletion operations?


s = []

def push(elem): #insertion in a stack is called 'pushing' into a stack
    print s

#deletion from a stack is called 'popping' from a stack
#pop is already a predefined function in Python for all arrays, but we'll still define it here for learning purposes as customPop()

def customPop():
    print s

Ever seen a density column?

All the items from top to bottom, are in ascending order of their densities. What happens when you drop an object of arbitrary density into the column?

It settles to the correct position on its own, due to difference in densities in the layers above and below it.

Heaps are something of that ilk.

Consider a heap like so.

A heap is a complete binary tree, meaning every parent has two children. Even though we visualize it as a heap, it is implemented through a regular array.

Also, a binary tree is always of height log n, where n is the number of elements. By the way, log n in Computer Science is always considered to be log_2 n by default, because we really like binary stuff.

This is a max-heap, where the fundamental heap property is that the children of any parent node, will be smaller than the parent node itself. In min-heaps, the children are always larger than the parent node.

A few basic function definitions:

global heap
global currSize

def parent(i): #returns parent index of ith index
    return i/2

def left(i): #returns left child of ith index
    return 2*i

def right(i): #returns right child of ith index
    return (2*i + 1)

Let’s tackle this part-by-part.

1) Inserting an element into a pre-existing heap

We first insert the element into the bottom of the heap, ie. last index in the array. Then we repeated apply the heap property on the index of the element till it reaches the appropriate position.


The algorithm is as follows:

1. Add the element to the bottom level of the heap.
2. Compare the added element with its parent; if they are in the correct order, stop.
3. If not, swap the element with its parent and return to the previous step.


def swap(a, b): #to swap a-th and b-th elements in heap
    temp = heap[a]
    heap[a] = heap[b]
    heap[b] = temp

def insert(elem):
    global currSize
    index = len(heap)
    currSize += 1
    par = parent(index)
    flag = 0
    while flag != 1:
        if index == 1: #we have reached the root of the heap
            flag = 1
        elif heap[par] > elem: #if parent index is larger than index of elem, then elem has now been inserted into the right place
            flag = 1
        else: #swaps the parent and the index itself
            swap(par, index)
            index = par
            par = parent(index)
    print heap

The maximum number of times this while loop can run, is the height of the tree itself, or log n. Hence the time complexity is O(log n).

2) Extracting the largest element from the heap

The first element of the heap is always the largest, so we just remove that and replace the top element with the bottom one. Then we restore the heap property back to the heap, through a function called maxHeapify().

1. Replace the root of the heap with the last element on the last level.
2. Compare the new root with its children; if they are in the correct order, stop.
3. If not, swap the element with one of its children and return to Step 2. (Swap with its smaller child in a min-heap and its larger child in a max-heap.)


def extractMax():
    global currSize
    if currSize != 0:
        maxElem = heap[1]
        heap[1] = heap[currSize] #replaces root element with the last element
        heap.pop(currSize) #deletes last element present in heap
        currSize -= 1 #reduces size of heap
        return maxElem

def maxHeapify(index):
    global currSize
    lar = index
    l = left(index)
    r = right(index)

    #print heap

    #finds the larger child of the index; if larger child exists, swaps it with the index
    if l <= currSize and heap[l] > heap[lar]:
        lar = l
    if r <= currSize and heap[r] > heap[lar]:
        lar = r
    if lar != index:
        swap(index, lar)

Again, the maximum number of times maxHeapify() can be executed, is the height of the tree itself, or log n. Hence the time complexity is O(log n).

3) How to make a heap out of any random array

Okay, so there’s two ways to go about it. The first way is to just repeatedly insert every element into the previously empty heap.

This is easy, but relatively inefficient. The time complexity of this comes out to be O(n log n) because of an O(log n) function being run n times.

There’s a better, more efficient way of doing this, where we simply maxHeapify() every ‘sub-heap’ from the (currSize/2)th element back to the 1st one.

This runs at O(n) complexity and it’s beyond the scope of this article to prove why. Just understand that any element past the halfway point in a binary tree, will have no children and that most of the ‘sub-heaps’ formed will be of height less than log n.


def buildHeap():
    global currSize
    for i in range(currSize/2, 0, -1): #third argument in range() shows increment factor, here -1
        print heap
    currSize = len(heap)-1


Ah, we’ve been leading up to this question this entire time.

Heaps are used to implement an efficient sort of sort, unsurprisingly called, the Heapsort.

Unlike the sorely inefficient Insertion Sort and Bubble Sort with their measly O(n^2) complexities, Heapsort runs in O(n log n) time, beating them to the dust.

It’s not even complicated, just keep extracting the largest element from the heap till the heap is empty, placing them sequentially at back of the array where the heap is stored.


def heapSort():
    for i in range(1, len(heap)):
        print heap
        heap.insert(len(heap)-i, extractMax()) #inserting the greatest element at the back of the array
    currSize = len(heap)-1

To tie it all together, I’ve written a few lines of helper code to input elements into the heap and try out all the functions. Check it out right here. Oh, and for all the people who are acquainted with classes in Python, I’ve also written a Heap class here.

Voila! Wasn’t that easy? Here’s a partying Loopie just for coming this far.


We also use heaps in the form of priority queues to find the shortest path between points in a graph using Dijkstra’s Algorithm, but that’s a post for another day.

Loopie wants to teach her baby turtles how to identify shapes and colours, so she brings home a large number of pieces of different colours.


This took them a lot of time, as well as confusion.

So she gets another toy to make the process easier.
dsb8This was easier because the babies already knew that the pieces were categorized according to shape. What if we labeled each of the poles as follows?


The babies would just have to check for the pole number, then look through a far smaller number of pieces on the pole.

Now imagine one pole for every combination of colour and shape possible.

Let the pole number be calculated as follows:

purple triangle:
p+u+r+t+r+i = 16+21+18+20+18+9 = Pole #102

red rectangle:
r+e+d+r+e+c = 18+5+4+18+5+3 = Pole #53


We know that 6*26 = 156 combinations are possible (why?), so we’ll have 156 poles in total.

Let’s call this formula to calculate pole numbers, the hash function.

In code:

def hashFunc(piece):
    words = piece.split(" ") #splitting string into words
    colour = words[0]
    shape = words[1]
    poleNum = 0
    for i in range(0, 3):
        poleNum += ord(colour[i]) - 96
        poleNum += ord(shape[i]) - 96
    return poleNum

If we ever need to finish where ‘pink square’ is kept, we just use

hashFunc('pink square')

and check the pole number, which happens to be pole #96.


This is an example of a hash table, where the location of an element is stored in terms of a hash function. The poles here are analogous to buckets in proper terminology.

This makes time taken to search for a particular element independent of the total number of elements, ie. O(1) time.

Let this sink in. Searching in a hash table can be done in constant time.


What if we’re searching for a ‘dreary-blue rectangle’, assuming a colour called ‘dreary-blue’ exists?

hashFunc('dreary-blue rectangle')

returns pole #53, which clashes with the pole number for ‘red rectangle’. This is called a collision.

How do we resolve it?

We use a method called separate chaining, which is a fancy way of saying every bucket consists of a list, and we simply search through the list if we ever find multiple entries.

Here, we’ll just put the dreary-blue rectangle on top of the red rectangle, and just pick either one whenever we need to.

The key in any good hash table, is to choose the appropriate hash function for the data. This is unarguably the most important thing in making a hash table, so people spend a lot of time on designing a good hash function for their purpose.

In a good hash table, no bucket should have more than 2-3 entries. If there are more, then the hashing isn’t working well, and we need to change the hash function.


Searching is independent of the number of elements for god’s sake. We can use hash tables for just about anything involving a gigantic number of elements, like database entries or phone books.

We also use hash functions in searching for strings or sub-strings in large collections of texts using the Rabin-Karp algorithm. This is useful for detecting plagiarism in academic papers by comparing them against source material. Again, a post for another day!


I plan on writing more on advanced data structures like Fibonacci Heaps and Segment Trees and their uses, so subscribe to Algosaurus for updates!

I hope you found this informal guide to the basics of data structures useful. If you liked it, want to insult me, or if you want to talk about dinosaurs or whatever, shoot me a mail at rawrr@algosaur.us

Introduction to Algorithms – Cormen, Leiserson, Rivest, and Stein (pages 151 – 161)

“To iterate is human, to recurse, divine.” – A guide to Recursion


  • Logical progression of thought
  • Some mathematical aptitude
  • Basic knowledge of functions

Recursion is possibly one of the most challenging concepts in Computer Science to get the hang of, and something which has fascinated me for a very long time. Sure, we all know the textbook definition:

“Recursion is a method where the solution to a problem depends on the solutions to smaller instances of the same problem.”

But it still takes an “Aha! Moment” to get an intuitive grasp on it, which is why I’ll finally take a stab at it.

As usual, we have three levels, so switch to them as you see fit:

Algosaurus coincidentally happens to be part of an obscure death metal band you’ve never heard of. He goes to a concert venue with a poorly-designed audio system and starts off with his smash-hit song “Rock Tore My World Apart”. He growls into the mic as he descends into the chorus and…

Evidently Photoshop isn’t exactly my strong suit. Apologies.

…is rudely interrupted by a loud screeching sound from the amplifier.
rec2What just happened was a feedback loop. While Algosaurus was singing (growling?) his heart out, his mic picked up the output of the surrounding amplifiers, and fed it right back into the input.
rec6Although the above example doesn’t reduce to a meaningful output, rather it results in a stack overflow (the screeching sound), this is the essential idea of recursion.
rec7In recursion, we have:

Sub-problem (which is Problem with a smaller input)
Base Case (a Problem with a constant output)

Confused by the jargon?
rec4So was I.

No article on recursion is complete without talking about Fibonacci Numbers.

Fibonacci series: 0, 1, 1, 2, 3, 5, 8, 13, 21…

You’ll notice that the nth Fibonacci number, after n = 0 and n = 1, is the sum of the previous two Fibonacci numbers.

Namely, fibo(n) = fibo(n-1) + fibo(n-2)

Notice how this is a recursive definition? Let’s break it down into a recursion tree.

So, let’s define Fibonacci numbers in terms of Problems and Base Cases.

Problem: fibo(n)
Sub-problem(s): fibo(n-1) + fibo(n-2)
Constant Base Case: fibo(1) == 1 and fibo(0) == 0

This is also known as the divide-and-conquer paradigm of algorithms, where we segregate a problem into smaller sub-problems, and finally into a simple base case.

Final code for Fibonacci numbers?

def fibo(n):
    if n == 0:
        result = 0 #Base case
    elif n == 1:
        result = 1 #Base case
        result = fibo(n-1) + fibo(n-2) #Feeding the 'output' right back into the input function
    return result

Lo and behold, you just learnt recursion.

Let’s take another classic example to hit the point home.

Once upon a time, in the Cretaceous Era, there lived a family of Algosauri in a land far, far away. Baby Algosaurus had a toy with 3 rods and 7 discs.

His Dad challenged him to a game, where the objective was to transfer all the discs to another rod, obeying the following rules:

1. Only one disk may be moved at a time.
2. Only the uppermost disk from one rod can be moved at once.
3. It can be put on another rod, only if that rod is empty or if the uppermost disk of that rod is larger than the disc being moved.

How did Baby Algosaurus do it?

Let’s assume there to be n number of discs, and we’ll be moving the tower from source to target using the helper rod:

For n = 1:
rec8 1 step. Easy enough.

For n = 2:


3 steps. We’re moving along now.
In Step 1, notice how after we shift the 1st disc to Helper, we have a tower of n – 1 = 1 disc, on Helper.

For n = 3:
And we’re done in 7 steps!
Observe that in Step 3, we have a tower of n – 1 = 2 discs, on Helper now. Notice anything similar to the previous case?

In case you haven’t noticed…

T_1 = 2^1 - 1
T_2 = 2^2 - 1
T_3 = 2^3 - 1

In fact, we can prove by mathematical induction, that the least number of moves required to solve a tower of n discs is:

T_n = 2^n - 1

Click here if you’d like to see the proof for yourself.

Back? Good.

We now need to solve the problem for n = 7. We can of course, figure it out by hand as above, but what’re we learning recursion for then?

A quick recap of the divide-and-conquer method:

Sub-problem (which is a Problem with a smaller input)
Base Case (a Problem with a constant output)

So here, in the Tower of Hanoi problem (yeah, that’s the official name), we have:

The general rule for moving towers:
rec12Problem: Moving tower of n discs from Source to Target
Sub-Problem: Moving tower of n – 1 discs from Source to Helper, then moving last disc to Target
Base Case: Moving tower of n = 1

Great. Let’s finally translate this into code.

def hanoi(n, source, helper, target):
    if n > 0:
        hanoi(n - 1, source, target, helper)    #moving tower of size n - 1 to helper:
        #moving disk from source peg to target peg
        if source:    #checks whether source is empty or not
            target.append(source.pop())    #appends the last disc from source, to target
        hanoi(n - 1, helper, source, target)    #moving tower of size n-1 from helper to target
source = [4,3,2,1]
target = []
helper = []

print source, helper, target


Feel free to go through this entire section once more to digest it, because it’s pretty mentally intensive if you aren’t yet used to thinking this way.

Proof by induction:

The base cases for this problem are pretty simple…

T_0 = 0 = 2^0 - 1
T_1 = 1 = 2^1 - 1

Let’s assume T_k = 2^k - 1 to be true for now.

To prove: T_{k+1} = T^{k+1} - 1 is true.

So we have,
= 2T_k + 1
= 2(2^k - 1) + 1
= 2^{k+1} - 2 + 1
= 2^{k+1} - 1

Hence proved. Click here to go back up.

Self-similarity is symmetry across scale. It implies recursion, pattern inside of pattern. Monstrous shapes like the Koch Curve display self-similarity because they look exactly the same, even under high magnification. Self-similarity is an easily recognizable quality. It’s images are everywhere in the culture: in the infinitely deep reflection of a person standing between two mirrors, or in the cartoon notion of a fish eating a smaller fish eating a smaller fish eating a smaller fish.

– Chaos – The Amazing Science of the Unpredictable, James Gleick

In 1976, a biologist named Robert May published a seminal paper on the population of animals. This paper would soon create a storm, and not just in biology.

Let’s take a look at what it was all about.

May proposed that variation in population of animals could be modeled by a simple equation, formally called the logistic equation:


This is a recurrence relation, an equation which recursively spits out values as a function of its previous values, ie. takes the output and shoves it back into the input.

It’s such a cute equation, isn’t it? Who’d guess that its interpretation could transform into *this*…

This is called a bifurcation diagram, and is also a kind of fractal. It shows the values the population of the species evaluates to finally (as time tends to infinity), depending on the value of r, the growth parameter.

Let’s dissect this.

r between 0 and 1 – Population dies out
r between 1 and 3 – Population converges to a single value
r between 3 and 3.44 – Population oscillates between two values
r between 3.44 and 3.54 – Oscillates between four values
r between 3.54 and 3.56 – Oscillates between 8 values

All of the above are pretty much independent of initial conditions, ie. the initial x_n we take.

What about 3.56 onward?

Slight differences in initial conditions result in dramatic changes in population, with no particular order. (In science-y language, ‘Extreme sensitivity to initial conditions’)

This particular paper was path-breaking because it showed how Chaos Theory and fractals aren’t obscure theoretical constructs; they’re things which explain real-life phenomena and its glorious imperfections.

This is a bit beyond the scope of this article, but it goes to show how wide-ranging a role does Recursion play, not just in computer science, but in nature itself.


I hope you now have enough a grasp on thinking recursively to understand algorithms like Quick Sort and Merge Sort, and attempt a few basic ad-hoc problems from Codechef et al.

If you are just as fascinated as I am about recursion, then you might enjoy reading about the following:
Chaos Theory
Infinite Series and Loops
Feedback loops in Op-amps (they’re an essential part of electronics)

I plan on writing more about the above topics, so do subscribe to Algosaurus for updates.

If you liked this guide, want to insult me, or if you want to talk about dinosaurs or whatever, shoot me a mail at rawrr@algosaur.us


Such complex, very wow – A guide to Algorithmic Complexity


  • Exponentiation
  • Basic algebra
  • Functions and asymptotes
  • Prior knowledge of Insertion Sort and other basic sorts of sorts

This post is roughly divided into three parts, so feel free to switch to either part depending on your level:

Algorithmic complexity and growth of functions form the very foundation of algorithms. They give us an idea of how fast or slow an algorithm is, so you can’t go anywhere near designing and analyzing ’em without having these as tools.

Algosaurus wants to know what ‘growth of functions’ means. So he decides to get two bunnies. And the bunnies start having kids, hypothetically four per couple. Lots of them, because well, erm…

If we examine the growth of bunnies, we find that it grows at the rate of 2^n, where n is the number of generations. This isn’t very good news for Algosaurus, because despite bunnies being cute and cuddly, there is something called having too many bunnies.

Roughly speaking, Bunny Breeding Algorithm has a complexity of O(2^n).

New Doc_1

This is because the number of bunnies in next generation (the output) grows exponentially with respect to the number of bunnies we started with (the input).

Let’s dissect this.

“Order of growth of the running time of an algorithm gives a simple characterization of the algorithm’s efficiency and also allow us to compare the relative performance of alternative algorithms”.

– Holy Gospel of CLRS

Damn, that’s a mouthful.

Imagine Algosaurus is descending a very odd staircase with n stairs. To descend each stair, he has to walk n tiles horizontally, and only then he can step down.

So for each stair, he has to walk n tiles. Therefore, for n stairs, he must walk n^2 tiles.

def weirdStairs(nofStairs):
    steps = 0
    for i in range(0, nofStairs):
        steps += nofStairs
        steps += 1
    print steps

Total time taken to walk the entire flight of stairs?

Similarly, let’s take the example of the Insertion Sort.

def insertionSort(list):
    for i in range(1, len(list)):                                     
        currentValue = list[i]                                         
        position = index                                

        while position > 0 and list[position - 1] > currentValue:
            list[position] = list[position - 1]
        list[position] = currentValue

Suppose the list is in descending order and every number has to be “inserted” into the right place, ie. each line of code in the loops is executed for every element.

As you go through the code, you’ll notice that for each element present in the list, it will have to iterate through the rest of elements in the list.

If n is the length of the list, the performance isn’t linearly proportional to n, it’s proportional to the square of n.

So, the worst case running time of the Insertion Sort comes out to be O(n^2), the ‘big-oh’ being part of standard notation and ‘n’ being the total number of elements in the list.

This is the worst case performance of the Insertion Sort, and usually the only kind of performance we care about.

We’ve only talked about lists with 10, maybe 100 elements. What about those with 1000, or even a million elements? Something like the total number of Facebook users?

Oh boy, then algorithmic complexity really begins to play a role.

The number of elements is so large in these cases, that only the highest power, ie. degree, of the polynomial matters, and nothing more. The rest of the terms are rendered insignificant.

The priority order for growth of functions is as follows:
9Also, roughly speaking, we can estimate the complexity of an algorithm by analyzing its loop structure.

Traversing a list is an O(n) operation, so is finding the length of one. This means running time is directly proportional to the number of elements.

def traversal(list):
    for i in range(len(list)):
        print list[i]

So at this point, Algosaurus knows enough about the growth of functions to figure out whether one algorithm is faster than the other. Yaaaaay.

For a more formal treatment of complexity analysis, read on. Or maybe come back to it after a cup of tea when all this has sunk in. *Subtle plug for bookmarking this page*

Opening CLRS page 43, Algosaurus sees…

2 Let’s first talk about ‘asymptotic notation’. As you probably know, when the input of certain functions tends to infinity, the output sometimes approaches a line, but doesn’t quite touch it. This line is called an asymptote. 8 4 You should care because when designing a particular algorithm or implementing one, it is important to know how it will perform with huge input sizes. You already know what big-oh notation means: it tells you the worst-case performance of an algorithm. Let’s switch to big-omega. I’ll use the last example again. Suppose our input list is sorted to begin with, ie. the best-case performance.

def insertionSort(someList):
    for i in range(1, len(someList)):                                     
        currentValue = someList[i]                                         
        position = i                                

        while position > 0 and someList[position - 1] > currentValue:
            someList[position] = someList[position - 1]
            position -= 1
        someList[position] = currentValue
    return someList

We’ll still have to traverse every element in the list to determine whether it is sorted or not, even if we don’t go into the nested while loop. The outer loop is still executed ‘n’ times in the best case scenario.

So the best case performance turns out to Ω(n).

To digest what the rest of the godforsaken symbols mean, we need some math, namely…

Functions! And graphs!

Formally speaking, we can define Θ, O, and Ω as the following.

This doesn’t really mean much, to be honest. Let’s bring out the big guns: graphs. (Plugged from page 45 of CLRS)

A few finer points. These functions are defined only when f(n) is monotonically adhering to the required constraints after n_0.

For example, for big-omega, Ω(g(n)) is formally defined only when it is less than the runtime of the function (c*g(n)) for all values greater than n_0.

a) This is for big-theta. It is sandwiched by two functions, showing that Θ(g(n)) is defined only when it is bound from the upper and the lower sides. You can’t just say it, you gotta prove it. Hard work.

b) This is for big-oh. The runtime function is always less than O(g(n)), showing that big-oh is the *upper bound* for the runtime. Actual performance won’t be worse than this.

c) For big-omega. The runtime function is always greater than Ω(g(n)), showing that big-omega is the *lower bound* for the runtime. You can’t do much better than this.


Patience Algosaurus. Here are the practical implications:

1. Big-omega is useful when you need a *lower bound* for the time, to show that an algorithm will always be slower than Ω(g(n)). Best-case performance.
Meh, sometimes used.

2. Big-oh is used when you need an *upper bound* for the time, to show that an algorithm will always be faster than O(g(n)). Worst-case performance.
Used all the time.

3. Finding big-theta is usually harder than big-oh, so big-oh is used more frequently, despite big-theta being more mathematically accurate. Math folks must find it blasphemous, but no can do.

Phew, that was a lot of theory.


I hope you found this informal guide to algorithmic complexity analysis useful. If you liked it, want to insult me, or if you want to talk about dinosaurs or whatever, shoot me a mail at rawrr@algosaur.us

Introduction to Algorithms – Cormen, Leiserson, Rivest, and Stein (pages 43 – 64)
Algorithms – Dasgupta, Papadimitrou, and Vazirani (For the priority order)
Numbers: The Key to the Universe – Kjartan Poskitt (Bunny example inspired by this book)