[Programming Pearls] Column 2 – Binary Search

So, just a simple implementation of binary search…

def prog_2_A_binary_search(l, a, return_insertion_point = False):
    """
    >>> l = [1,20,34,45,76,87,90,100,103,298,500]
    >>> print(prog_2_A_binary_search(l, 298))
    9
    >>> print(prog_2_A_binary_search(l, 999))
    -1
    >>> print(prog_2_A_binary_search(l, 999, True))
    11
    >>> print(prog_2_A_binary_search(l, -22, True))
    0
    """
    lower = 0
    upper = len(l) - 1
    # on loop end condition:
    # [corner case] when lower == upper, we can still continue with
    # the logic as then mid will equal to lower / upper, so it won't break the logic
    while lower <= upper:
        # on calculation of mid:
        # 1. it will probably only happen to C: if we use (lower + upper) / 2,
        #    there will be risk where lower + upper exceeds the integer limitation
        # 2. as the remainder will be truncated, the mid is always either the middle one,
        #    or the one to the left of middle - which means if numbers cannot be found,
        #    upper can be within range [-1, len(l) - 1], lower can be within range [0, len(l)]
        mid = lower + ((upper - lower) >> 1)
        if a > l[mid]:
            lower = mid + 1
        elif a < l[mid]:
            upper = mid - 1
        else:
            return mid
    # on return value:
    # if we want to return a 'insertion point', meaning if
    # a is not in l, return i where i is either
    # 1. equivalent to len(l) if l[len - 1] < a
    # 2. l[i] > a
    # we can use return lower
    return lower if return_insertion_point else -1

[Programming Pearls] Column 1 – uninitialized space

Source:
Problem 1.9

Problem Description:
Reserve a set of memory without initialization, meaning the data in there is random. Name the memory as ‘data’.
Design a mechanism to track and tell whether given index in data is initialized or not.

Algorithm 1:
1. Use a set data structure (naming it to initialized_set) to store initialized indices. Initially it is empty.
2. Write value to data[i]: need to initialized_set.add(i)
3. Access value at data[i]: need to check initialized_set.has(i)
Flaw: ops on initialized_set are O(logN).

Algorithm 2:
Purpose: to reduce op cost on initialized_set from O(logN) to O(N).
1. Use array initialized_arr to store initialized indices. Initially it has length of len(data).
2. Use array track_positions: track_positions[i] stores index in initialized_arr, i.e. initialized_arr[track_positions[i]] should equal to i if data[i] is initialized. Initially it has length of len(data).
3. Use a int variable ‘track’ to indicate (assert) initialized_set[0:track] is valid set.
4. Write value to data[i]: need to set initialized_arr[track] = i; track_positions[i] = track; track += 1.
5. Access value at data[i]: need to check initialized_arr[track_positions[i]] == i && track > track_positions[i]

[Programming Pearls] Column 1 – Non Dup Random Numbers

Source:
Problem 4

Problem:
Generate k random numbers with no duplicates

Solution 1:

def prog_1_4_gen_list(max, count):
    # TODO: avoid dead loop....
    i = 0
    v = prog_1_2_bitvec(max)
    l = []
    while (i < count):
        num = random.randint(0, max - 1)
        if not v.has(num):
            v.set(num)
            i += 1
            l.append(num)
    return l

Solution 2:
In each cycle of the loop:
- numbers[i + 1 :] is non-dup number pool
- numbers[i] will be a number that randomly chosen from number pool

def prog_1_4_gen_list_2(max, count):
    numbers = [n for n in range(0, max)]
    for i in range(0, count):
        randnum = random.randint(0, max - 1)
        numbers[i], numbers[randnum] = numbers[randnum], numbers[i]
    return numbers[: count]