# [The Beauty of Programming] Find the Kth biggest number within list of N numbers

Again it’s been a while after previous post…

O(nlogn) – Sort
Sort the list, and retrieve the nth one.

O(n*k) – Partial Sort
There can be multiple ways to do partial sort, one of them being selective sort: loop the list – first cycle get the biggest one; second cycle get the second one etc. till the Kth cycle. On each cycle we can move the number to the front, so that l[0:K] is sorted after Kth cycle.

O(n*logk) – Binary Search
This can be a little bit tricky but the idea is like binary search – divide and conquer:

```def findKth(l, left, right, k):
pivot_pos = partition(l, left, right)
la = pivot_pos - left + 1
lb = right - pivot_pos
if la > k:
return findKth(l, left, pivot_pos, k)
else:
return findKth(l, pivot_pos + 1, right, k - la)
```

O(n*logk) – Use a data structure to track K numbers
The idea is: use a data structure to store first K numbers. Initially, put first K numbers in L to the data structure; then loop through the list: for each element, check if the min number of the data structure is less than it, if not then pass, else remove the min number in the datastructure, and put the number in. Apparently min heap is a valid data structure to do it. O(n*logk)

O(n) – bitvec
Again make use of bitvec if possible….

# [The Beauty of Programming] Max Sum of Subarray

Algorithm 1 – O(n^2): enumerate all possible subarray
This is pretty straightforward, but second program improves the performance by not repeating calculating sum.

```def max_subarray_enum(l):
"""
>>> l = [43, 23, -3, 32, -43, -22, 12, 3, -4, 5]
>>> print(max_subarray_enum(l))
95
>>> l = [-3, 32, -43, -22, 12, 3, -4, 5]
>>> print(max_subarray_enum(l))
32
"""
length = len(l)
def sum(i, len):
# assertion
val = 0
for n in range(0, len):
val += l[i + n]
return val
# loop invariant:
# l[i:i+len] is subarray which we need to calculate the sum
# termination: enumerated all possible i j combinations
max_val = l[0]
for i in range(0, length):
for subarray_len in range(1, length - i + 1):
val = sum(i, subarray_len)
if val > max_val: max_val = val
return max_val
```
```def max_subarray_enum2(l):
"""
>>> l = [43, 23, -3, 32, -43, -22, 12, 3, -4, 5]
>>> print(max_subarray_enum2(l))
95
>>> l = [-3, 32, -43, -22, 12, 3, -4, 5]
>>> print(max_subarray_enum2(l))
32
"""
length = len(l)
# loop invariant:
# l[i:i+len] is subarray which we need to calculate the sum
# termination: enumerated all possible i j combinations
max_val = l[0]
for i in range(0, length):
val = 0
for subarray_len in range(1, length - i + 1):
val += l[i + subarray_len - 1]
if val > max_val: max_val = val
return max_val
```

Algorithm 2 – O(nlogn): divide and conquer

```def max_subarray_divide(l, left, right):
"""
>>> l = [43, 23, -3, 32, -43, -22, 12, 3, -4, 5]
>>> print(max_subarray_divide(l, 0, len(l) - 1))
95
>>> l = [-3, 32, -43, -22, 12, 3, -4, 5]
>>> print(max_subarray_divide(l, 0, len(l) - 1))
32
"""
# divide and conquer:
# max = max_val(
#   recur(firsthalf),
#   recur(secondhalf),
#   max_sum_left + max_sum_right
# )
def max_val(vals):
# TODO: assert
max = vals[0]
for val in vals:
if max < val: max = val
return max
def max_sum(start, end, step):
if step > 0 and start > end:
return 0
elif step < 0 and end > start:
return 0
index_range = range(start, end + step, step)
max = l[index_range[0]]
sum = max
for i in index_range[1:]:
sum += l[i]
if max < sum: max = sum
return max

if left > right:
return 0  # actually, this one should be minus infinity
if left == right:
return l[left]

mid = left + ((right - left) >> 1)
# NOTE for the following statement, if we put max_sum(mid, right, 1)
# then there will be dead recursive loop
max_sum_right = max_sum(mid + 1, right, 1)
max_sum_left = max_sum(mid, left, -1)
max_sum_first = max_subarray_divide(l, left, mid)
max_sum_right = max_subarray_divide(l, mid + 1, right)

return max_val([max_sum_first, max_sum_right, max_sum_left + max_sum_right])
```

Algorithm 3 – O(n) with O(n) space: dynamic programming

```def max_subarray(l):
"""
>>> l = [43, 23, -3, 32, -43, -22, 12, 3, -4, 5]
>>> print(max_subarray(l))
95
>>> l = [-3, 32, -43, -22, 12, 3, -4, 5]
>>> print(max_subarray(l))
32
"""
# following are the dynamic programming vars:
# - tail_max[n] means max subarray that ends at position n
# - max[n] means max subarray in array l[0:n+1]
#
# following are the dynamic programming equations:
# - tail_max[n] = max(tail_max[n-1]+l[n], l[n])
# - max[n] = max(max[n-1], tail_max[n])
#
def max_val(a, b):
return a if a > b else b
# init data structures
length = len(l)
max = [None] * length
tail_max = [None] * length
# initial values
if len(l) == 0:
return 0
max[0] = l[0]
tail_max[0] = l[0]
# dynamic programming: loop
for n in range(1, length):
tail_max[n] = max_val(tail_max[n-1]+l[n], l[n])
max[n] = max_val(max[n-1], tail_max[n])
return max[n]
```

Algorithm 4 – O(n) with O(1) space: dynamic programming with improvement

```def max_subarray_o1space(l):
"""
>>> l = [43, 23, -3, 32, -43, -22, 12, 3, -4, 5]
>>> print(max_subarray_o1space(l))
95
>>> l = [-3, 32, -43, -22, 12, 3, -4, 5]
>>> print(max_subarray_o1space(l))
32
"""
def max_val(a, b):
return a if a > b else b
# initial values
if len(l) == 0:
return 0
max_prev = l[0]
tail_max_prev = l[0]
# init data structures
max = max_prev
tail_max = tail_max_prev
# dynamic programming: loop
for n in range(1, len(l)):
tail_max = max_val(tail_max_prev+l[n], l[n])
max = max_val(max_prev, tail_max)
tail_max_prev = tail_max
max_prev = max
return max
```

2D array?…
OK, I’ll ignore the code here….but it can be converted to 1D array like:

```for upper_bound in range(0, first_dim_length(matrix)):
for lower_bound in range(upper_bound, first_dim_length(matrix):
merge the 2D array in within upper and lower bound to 1D array
use dynamic programming approach to solve it
```

# [Fight] Sort Algorithms

Ran into tons of production issues yesterday…still lots of ongoing issue. So I am going to keep the recent posts short. This one is for sort algorithms.

Insert Sort 1: use binary search to find the insert point

```def insert_sort(l):
"""
>>> l = [32, 64, 15, 38, 87, 13, 53, 12]
>>> insert_sort(l)
>>> print(l)
[12, 13, 15, 32, 38, 53, 64, 87]
>>> l = prog_1_4_gen_list_2(1 << 8, 99)
>>> insert_sort(l)
>>> test_verify_sorted(l)
True
"""
# loop invariant:
# i indicates l[0:i] is sorted and l[i] needs to be inserted
# init: i = 1
# terminate: i = len(l)
for i in range(1, len(l)):
val = l[i]
# find insert point
insert_point = prog_2_A_binary_search(l[0:i], l[i], True)
# right shift the l[insert_point:i] by 1
# loop invariant:
# j indicates the current empty position, and l[j-1] needs to be shifted
# init: j = i
# termination: j = insert_point
for j in range(i, insert_point, -1):
l[j] = l[j - 1]
# insert the value into the position
l[insert_point] = val
```

Insert Sort 2: scan back while shifting

```def insert_sort2(l):
"""
>>> l = [32, 64, 15, 38, 87, 13, 53, 12]
>>> insert_sort2(l)
>>> print(l)
[12, 13, 15, 32, 38, 53, 64, 87]
"""
# loop invariant:
# i indicates l[0:i] is sorted and l[i] needs to be inserted
# init: i = 1
# termination: i = len(l)
for i in range(1, len(l)):
val = l[i]
# find insert point, while shifting from left
# loop invariant, j indicates:
# - current empty position, and l[j-1] needs to be shifted
# - l[j:i] contains all elements > val
# - l[0:j] are to be scanned
# termination: j == 0 (no l[j - 1]), or l[j - 1] > val
j = i
while j > 0 and val < l[j - 1]:
l[j] = l[j - 1]
j -= 1
# need to insert the value to the insert point
l[j] = val
```

Heap Sort: heap_insert and heap_pop primitives

```def heap_sort(l):
"""
>>> l = [106, 136, 124, 63, 76, 147, 92, 35]
>>> heap_sort(l)
>>> print(l)
[35, 63, 76, 92, 106, 124, 136, 147]
>>> l = prog_1_4_gen_list_2(1 << 8, 99)
>>> heap_sort(l)
>>> test_verify_sorted(l)
True
"""
def heap_insert(l, end):
# input contract:
# - l[0:end] is already heap
# - end is the new element to insert
# output contact:
# l[0:end+1] is heap
#
# process: loop to 'bubble up' the l[end] element.
# invariant:
# - p to indicate the current parent node to compare
# - pos to indicate the current position of the new element
# - terminal condition: p is out of scope, or l[p] is already bigger
def parent(pos):
return (pos-1) >> 1
assert(end >= 1)
p = parent(end)
pos = end
val = l[pos]
while p >= 0 and l[p] < val:
l[p], l[pos] = l[pos], l[p]
pos = p
p = parent(pos)
return
def heap_pop(l, end):
# input contract (assuming len = len(l)):
# - l[0:end+1] is heap
# output contract:
# - l[0:end] is heap
# - l[end] is the element that is pop up
#
# firstly, move the last element to be the new heap top;
l[0], l[end] = l[end], l[0]
# secondly, 'bubble down' the top element
# loop invariant:
# - n to indicate the current location of the element
# termination: n is the leaf node, or n is already bigger
def left_child(pos):
return (n << 1) + 1
n = 0
val = l[n]
left_child_n = left_child(n)
while left_child_n < end:
right_child_n = left_child_n + 1
# case 1: only left child exists
if right_child_n >= end:
if l[left_child_n] > val:
l[left_child_n], l[n] = l[n], l[left_child_n]
# maintain loop invariant
n = left_child_n
left_child_n = left_child(n)
else:
# termination condition: val is bigger than all its children
break
# case 2: both left and right exists
else:
if val >= l[right_child_n] and val >= l[left_child_n]:
# termination condition: val is bigger than all its children
break
else:
# need to swap n with the bigger one
big_child_n = left_child_n
if l[left_child_n] < l[right_child_n]:
big_child_n = right_child_n
l[big_child_n], l[n] = l[n], l[big_child_n]
# maintain loop invariant
n = big_child_n
left_child_n = left_child(n)
return
# heap sort
# step 1 - heaplify
length = len(l)
for i in range(1, len(l)):
heap_insert(l, i)
# step 2 - pop
for i in range(0, len(l)):
heap_pop(l, length - 1 - i)
return
```

Quick Sort: loop invariant to help track the pivot

```def qsort(l, left, right):
"""
>>> l = [32, 64, 15, 38, 87, 13, 53, 12]
>>> qsort(l, 0, len(l) - 1)
>>> qsort(l, 0, len(l) - 1)
>>> print(l)
[12, 13, 15, 32, 38, 53, 64, 87]
>>> l = prog_1_4_gen_list_2(1 << 8, 99)
>>> qsort(l, 0, len(l) - 1)
>>> test_verify_sorted(l)
True
"""
def partition(l, left, right):
# choose random pivot position and swap it to the first one
pivot_pos = random.randint(left, right)
l[pivot_pos], l[left] = l[left], l[pivot_pos]
pivot_pos = left
pivot_val = l[left]
# loop to move all the big numbers to the right side of pivot,
# and all the small numbers to the left side
#
# invariants:
# - [low, high] indicates the range where the numbers are not handled,
#   i.e. [left, low) contains all small numbers and (high, right] contains all big ones.
# - pivot_pos indicates the pivot position, it should be equivalent to either low or high
# - initially low, high are set to be left, right to indicate no numbers are handled
# - termination condition: low = high. it also indicates the final pivot pos
low, high = left, right
while low != high:
if pivot_pos == low:
# compare at higher end because at lower there's nothing to compare
if l[high] < pivot_val:
# move small number to right, and increment low to indicate
# the left handled range extends
l[low], l[high] = l[high], l[low]
pivot_pos = high
low += 1
else:
# big number already at left, decrease high to indicate the
# right handled range extends
high -= 1
elif pivot_pos == high:
# compare at lower end
if l[low] > pivot_val:
l[low], l[high] = l[high], l[low]
pivot_pos = low
high -= 1
else:
low += 1
else:
# assert could not happen
assert(false)
return pivot_pos
# corner case: when left == right, partition function will work and return the only
# element as pivot. Then the logic won't be broken
if left > right:
return
pivot_pos = partition(l, left, right)
qsort(l, left, pivot_pos - 1)
qsort(l, pivot_pos + 1, right)
return
```