[The Beauty of Programming] Find the Kth biggest number within list of N numbers

Again it’s been a while after previous post…

O(nlogn) – Sort
Sort the list, and retrieve the nth one.

O(n*k) – Partial Sort
There can be multiple ways to do partial sort, one of them being selective sort: loop the list – first cycle get the biggest one; second cycle get the second one etc. till the Kth cycle. On each cycle we can move the number to the front, so that l[0:K] is sorted after Kth cycle.

O(n*logk) – Binary Search
This can be a little bit tricky but the idea is like binary search – divide and conquer:

def findKth(l, left, right, k):
    pivot_pos = partition(l, left, right)
    la = pivot_pos - left + 1
    lb = right - pivot_pos
    if la > k:
        return findKth(l, left, pivot_pos, k)
    else:
        return findKth(l, pivot_pos + 1, right, k - la)

O(n*logk) – Use a data structure to track K numbers
The idea is: use a data structure to store first K numbers. Initially, put first K numbers in L to the data structure; then loop through the list: for each element, check if the min number of the data structure is less than it, if not then pass, else remove the min number in the datastructure, and put the number in. Apparently min heap is a valid data structure to do it. O(n*logk)

O(n) – bitvec
Again make use of bitvec if possible….

[The Beauty of Programming] Max Sum of Subarray

Algorithm 1 – O(n^2): enumerate all possible subarray
This is pretty straightforward, but second program improves the performance by not repeating calculating sum.

def max_subarray_enum(l):
    """
    >>> l = [43, 23, -3, 32, -43, -22, 12, 3, -4, 5]
    >>> print(max_subarray_enum(l))
    95
    >>> l = [-3, 32, -43, -22, 12, 3, -4, 5]
    >>> print(max_subarray_enum(l))
    32
    """
    length = len(l)
    def sum(i, len):
        # assertion
        val = 0
        for n in range(0, len):
            val += l[i + n]
        return val
    # loop invariant:
    # l[i:i+len] is subarray which we need to calculate the sum
    # termination: enumerated all possible i j combinations
    max_val = l[0]
    for i in range(0, length):
        for subarray_len in range(1, length - i + 1):
            val = sum(i, subarray_len)
            if val > max_val: max_val = val
    return max_val
def max_subarray_enum2(l):
    """
    >>> l = [43, 23, -3, 32, -43, -22, 12, 3, -4, 5]
    >>> print(max_subarray_enum2(l))
    95
    >>> l = [-3, 32, -43, -22, 12, 3, -4, 5]
    >>> print(max_subarray_enum2(l))
    32
    """
    length = len(l)
    # loop invariant:
    # l[i:i+len] is subarray which we need to calculate the sum
    # termination: enumerated all possible i j combinations
    max_val = l[0]
    for i in range(0, length):
        val = 0
        for subarray_len in range(1, length - i + 1):
            val += l[i + subarray_len - 1]
            if val > max_val: max_val = val
    return max_val

Algorithm 2 – O(nlogn): divide and conquer

def max_subarray_divide(l, left, right):
    """
    >>> l = [43, 23, -3, 32, -43, -22, 12, 3, -4, 5]
    >>> print(max_subarray_divide(l, 0, len(l) - 1))
    95
    >>> l = [-3, 32, -43, -22, 12, 3, -4, 5]
    >>> print(max_subarray_divide(l, 0, len(l) - 1))
    32
    """
    # divide and conquer:
    # max = max_val(
    #   recur(firsthalf),
    #   recur(secondhalf),
    #   max_sum_left + max_sum_right
    # )
    def max_val(vals):
        # TODO: assert
        max = vals[0]
        for val in vals:
            if max < val: max = val
        return max
    def max_sum(start, end, step):
        if step > 0 and start > end:
            return 0
        elif step < 0 and end > start:
            return 0
        index_range = range(start, end + step, step)
        max = l[index_range[0]]
        sum = max
        for i in index_range[1:]:
            sum += l[i]
            if max < sum: max = sum
        return max

    if left > right:
        return 0  # actually, this one should be minus infinity
    if left == right:
        return l[left]

    mid = left + ((right - left) >> 1)
    # NOTE for the following statement, if we put max_sum(mid, right, 1)
    # then there will be dead recursive loop
    max_sum_right = max_sum(mid + 1, right, 1)
    max_sum_left = max_sum(mid, left, -1)
    max_sum_first = max_subarray_divide(l, left, mid)
    max_sum_right = max_subarray_divide(l, mid + 1, right)

    return max_val([max_sum_first, max_sum_right, max_sum_left + max_sum_right])

Algorithm 3 – O(n) with O(n) space: dynamic programming

def max_subarray(l):
    """
    >>> l = [43, 23, -3, 32, -43, -22, 12, 3, -4, 5]
    >>> print(max_subarray(l))
    95
    >>> l = [-3, 32, -43, -22, 12, 3, -4, 5]
    >>> print(max_subarray(l))
    32
    """
    # following are the dynamic programming vars:
    # - tail_max[n] means max subarray that ends at position n
    # - max[n] means max subarray in array l[0:n+1]
    #
    # following are the dynamic programming equations:
    # - tail_max[n] = max(tail_max[n-1]+l[n], l[n])
    # - max[n] = max(max[n-1], tail_max[n])
    #
    def max_val(a, b):
        return a if a > b else b
    # init data structures
    length = len(l)
    max = [None] * length
    tail_max = [None] * length
    # initial values
    if len(l) == 0:
        return 0
    max[0] = l[0]
    tail_max[0] = l[0]
    # dynamic programming: loop
    for n in range(1, length):
        tail_max[n] = max_val(tail_max[n-1]+l[n], l[n])
        max[n] = max_val(max[n-1], tail_max[n])
    return max[n]

Algorithm 4 – O(n) with O(1) space: dynamic programming with improvement

def max_subarray_o1space(l):
    """
    >>> l = [43, 23, -3, 32, -43, -22, 12, 3, -4, 5]
    >>> print(max_subarray_o1space(l))
    95
    >>> l = [-3, 32, -43, -22, 12, 3, -4, 5]
    >>> print(max_subarray_o1space(l))
    32
    """
    def max_val(a, b):
        return a if a > b else b
    # initial values
    if len(l) == 0:
        return 0
    max_prev = l[0]
    tail_max_prev = l[0]
    # init data structures
    max = max_prev
    tail_max = tail_max_prev
    # dynamic programming: loop
    for n in range(1, len(l)):
        tail_max = max_val(tail_max_prev+l[n], l[n])
        max = max_val(max_prev, tail_max)
        tail_max_prev = tail_max
        max_prev = max
    return max

2D array?…
OK, I’ll ignore the code here….but it can be converted to 1D array like:

for upper_bound in range(0, first_dim_length(matrix)):
  for lower_bound in range(upper_bound, first_dim_length(matrix):
    merge the 2D array in within upper and lower bound to 1D array
    use dynamic programming approach to solve it

[Fight] Sort Algorithms

Ran into tons of production issues yesterday…still lots of ongoing issue. So I am going to keep the recent posts short. This one is for sort algorithms.

Insert Sort 1: use binary search to find the insert point

def insert_sort(l):
    """
    >>> l = [32, 64, 15, 38, 87, 13, 53, 12]
    >>> insert_sort(l)
    >>> print(l)
    [12, 13, 15, 32, 38, 53, 64, 87]
    >>> l = prog_1_4_gen_list_2(1 << 8, 99)
    >>> insert_sort(l)
    >>> test_verify_sorted(l)
    True
    """
    # loop invariant:
    # i indicates l[0:i] is sorted and l[i] needs to be inserted
    # init: i = 1
    # terminate: i = len(l)
    for i in range(1, len(l)):
        val = l[i]
        # find insert point
        insert_point = prog_2_A_binary_search(l[0:i], l[i], True)
        # right shift the l[insert_point:i] by 1
        # loop invariant:
        # j indicates the current empty position, and l[j-1] needs to be shifted
        # init: j = i
        # termination: j = insert_point
        for j in range(i, insert_point, -1):
            l[j] = l[j - 1]
        # insert the value into the position
        l[insert_point] = val

Insert Sort 2: scan back while shifting

def insert_sort2(l):
    """
    >>> l = [32, 64, 15, 38, 87, 13, 53, 12]
    >>> insert_sort2(l)
    >>> print(l)
    [12, 13, 15, 32, 38, 53, 64, 87]
    """
    # loop invariant:
    # i indicates l[0:i] is sorted and l[i] needs to be inserted
    # init: i = 1
    # termination: i = len(l)
    for i in range(1, len(l)):
        val = l[i]
        # find insert point, while shifting from left
        # loop invariant, j indicates:
        # - current empty position, and l[j-1] needs to be shifted
        # - l[j:i] contains all elements > val
        # - l[0:j] are to be scanned
        # termination: j == 0 (no l[j - 1]), or l[j - 1] > val
        j = i
        while j > 0 and val < l[j - 1]:
            l[j] = l[j - 1]
            j -= 1
        # need to insert the value to the insert point
        l[j] = val

Heap Sort: heap_insert and heap_pop primitives

def heap_sort(l):
    """
    >>> l = [106, 136, 124, 63, 76, 147, 92, 35]
    >>> heap_sort(l)
    >>> print(l)
    [35, 63, 76, 92, 106, 124, 136, 147]
    >>> l = prog_1_4_gen_list_2(1 << 8, 99)
    >>> heap_sort(l)
    >>> test_verify_sorted(l)
    True
    """
    def heap_insert(l, end):
        # input contract:
        # - l[0:end] is already heap
        # - end is the new element to insert
        # output contact:
        # l[0:end+1] is heap
        #
        # process: loop to 'bubble up' the l[end] element.
        # invariant:
        # - p to indicate the current parent node to compare
        # - pos to indicate the current position of the new element
        # - terminal condition: p is out of scope, or l[p] is already bigger
        def parent(pos):
            return (pos-1) >> 1
        assert(end >= 1)
        p = parent(end)
        pos = end
        val = l[pos]
        while p >= 0 and l[p] < val:
            l[p], l[pos] = l[pos], l[p]
            pos = p
            p = parent(pos)
        return
    def heap_pop(l, end):
        # input contract (assuming len = len(l)):
        # - l[0:end+1] is heap
        # output contract:
        # - l[0:end] is heap
        # - l[end] is the element that is pop up
        #
        # firstly, move the last element to be the new heap top;
        l[0], l[end] = l[end], l[0]
        # secondly, 'bubble down' the top element
        # loop invariant:
        # - n to indicate the current location of the element
        # termination: n is the leaf node, or n is already bigger
        def left_child(pos):
            return (n << 1) + 1
        n = 0
        val = l[n]
        left_child_n = left_child(n)
        while left_child_n < end:
            right_child_n = left_child_n + 1
            # case 1: only left child exists
            if right_child_n >= end:
                if l[left_child_n] > val:
                    l[left_child_n], l[n] = l[n], l[left_child_n]
                    # maintain loop invariant
                    n = left_child_n
                    left_child_n = left_child(n)
                else:
                    # termination condition: val is bigger than all its children
                    break
            # case 2: both left and right exists
            else:
                if val >= l[right_child_n] and val >= l[left_child_n]:
                    # termination condition: val is bigger than all its children
                    break
                else:
                    # need to swap n with the bigger one
                    big_child_n = left_child_n
                    if l[left_child_n] < l[right_child_n]:
                        big_child_n = right_child_n
                    l[big_child_n], l[n] = l[n], l[big_child_n]
                    # maintain loop invariant
                    n = big_child_n
                    left_child_n = left_child(n)
        return
    # heap sort
    # step 1 - heaplify
    length = len(l)
    for i in range(1, len(l)):
        heap_insert(l, i)
    # step 2 - pop
    for i in range(0, len(l)):
        heap_pop(l, length - 1 - i)
    return

Quick Sort: loop invariant to help track the pivot

def qsort(l, left, right):
    """
    >>> l = [32, 64, 15, 38, 87, 13, 53, 12]
    >>> qsort(l, 0, len(l) - 1)
    >>> qsort(l, 0, len(l) - 1)
    >>> print(l)
    [12, 13, 15, 32, 38, 53, 64, 87]
    >>> l = prog_1_4_gen_list_2(1 << 8, 99)
    >>> qsort(l, 0, len(l) - 1)
    >>> test_verify_sorted(l)
    True
    """
    def partition(l, left, right):
        # choose random pivot position and swap it to the first one
        pivot_pos = random.randint(left, right)
        l[pivot_pos], l[left] = l[left], l[pivot_pos]
        pivot_pos = left
        pivot_val = l[left]
        # loop to move all the big numbers to the right side of pivot,
        # and all the small numbers to the left side
        #
        # invariants:
        # - [low, high] indicates the range where the numbers are not handled,
        #   i.e. [left, low) contains all small numbers and (high, right] contains all big ones.
        # - pivot_pos indicates the pivot position, it should be equivalent to either low or high
        # - initially low, high are set to be left, right to indicate no numbers are handled
        # - termination condition: low = high. it also indicates the final pivot pos
        low, high = left, right
        while low != high:
            if pivot_pos == low:
                # compare at higher end because at lower there's nothing to compare
                if l[high] < pivot_val:
                    # move small number to right, and increment low to indicate
                    # the left handled range extends
                    l[low], l[high] = l[high], l[low]
                    pivot_pos = high
                    low += 1
                else:
                    # big number already at left, decrease high to indicate the
                    # right handled range extends
                    high -= 1
            elif pivot_pos == high:
                # compare at lower end
                if l[low] > pivot_val:
                    l[low], l[high] = l[high], l[low]
                    pivot_pos = low
                    high -= 1
                else:
                    low += 1
            else:
                # assert could not happen
                assert(false)
        return pivot_pos
    # corner case: when left == right, partition function will work and return the only
    # element as pivot. Then the logic won't be broken
    if left > right:
        return
    pivot_pos = partition(l, left, right)
    qsort(l, left, pivot_pos - 1)
    qsort(l, pivot_pos + 1, right)
    return