[Programming Pearls] Column 2 – Find anagrams

Problem:
Given a list of words, collect anagrams.

Solution:
For each word, generate its signature by sorting its chars. And then collect words with same signature.

Code:

def prog_2_C_find_anagrams(l):
    """
    >>> prog_2_C_find_anagrams(['spot', 'stop', 'tops', 'deal', 'lead'])
    defaultdict(<class 'list'>, {'opst': ['spot', 'stop', 'tops'], 'adel': ['deal', 'lead']})
    """
    from collections import defaultdict
    result = defaultdict(list)
    for word in l:
        word = word.lower()
        charlist = list(word)
        sorted_charlist = sorted(charlist)
        normalized_word = "".join(sorted_charlist)
        result[normalized_word].append(word)
    return result

[Programming Pearls] Column 2 – Shift list

Problem:
(Right) shift given string by n characters.

Solution:
Case: Start with a concrete sample like “abcdefgh”, shift by 2 chars. Result will be like “ghabcdef”.

Model #1:
Note that “gh” will be finally at leftmost – so the process can be like
1. swap “gh” and “ab” => “ghcdefab”
2. shift “cdefab” by 2
Note that above process is recursive. Basically if we name “ab” as A, “cdef” as B, “gh” as C. Then the problem is to change ABC to CAB. So after step 1 we got CBA. Then the problem is converted to handle substring “BA” in same way – basically, separate “BA” to “B1B2A” where B1 and A have equivalent length, then swap to get AB2B1, and then the problem is on B2B1, etc.
There are some points that worth attention, like if we model the string to ABC (len(A) == len(C), what if len(A) > len(ABC) / 2?

Model #2:
Based on what’s being analyzed in model #1, if we model the string to AB then the problem is how to convert AB to BA. With the introduction of primitive ‘revert’ operation, it can be modelled like: revert AB to get B1A1, revert B1 and A1 one by one to get BA.

Code for model #2

def prog_2_B_shift(s, n):
    """
    >>> print(prog_2_B_shift('abcdefgh', 5))
    defghabc
    >>> print(prog_2_B_shift('abcdefgh', 2))
    ghabcdef
    >>> print(prog_2_B_shift('abcdefgh', 8))
    abcdefgh
    >>> print(prog_2_B_shift('abcdefgh', 11))
    fghabcde
    """
    def reverse(l, left, right):
        # TODO: check out of range
        # better use assertion since the expected caller will
        # be from internal
        while right > left:
            l[left], l[right] = l[right], l[left]
            left += 1
            right -= 1
    # TODO: check if l / s is empty
    # TODO: check if n < 0
    # better use if - return or if - throw because caller is
    # from outside - unless we got confirmation on the input data
    l = list(s)
    count = len(l)
    if (n > count): n = n - count
    reverse(l, 0, count - 1)
    reverse(l, 0, n - 1)
    reverse(l, n, count - 1)
    return ''.join(l)

[Programming Pearls] Column 2 – Find Missing Number

Problem:
Similarly to the problem A described in column 2: given a range [lower_bound, upper_bound], and a sorted list l; elements in l are within given range; some numbers within given range are missing in the list; find out one.

Solution #1:
Similarly to binary search, the key is to divide the list and refine the range in cycle. On each loop cycle:
1. Check end condition #1: the first element > lower_bound, or last element < upper_bound. 2. Divide the 'list' into two sub lists (find pivot index in the middle, and split the list to [0, pivot] and [pivot + 1, last]. 3. Check end condition #2: compare l[pivot] and l[pivot + 1] and check if difference is 1. 4. Compare the length of each sublist, to the value difference (sublist[len(sublist)-1] - sublist[0]). If length is less, then update the loop variables: list <= sublist, range [sublist[0], sublit[last]]. Code for solution #1:

def prog_2_A_find_missing(l, lower_bound, upper_bound):
    """
    >>> print(prog_2_A_find_missing([1, 2, 3, 5, 7, 8], 1, 8))
    4
    >>> print(prog_2_A_find_missing([1, 2, 3, 4, 6, 7], 1, 7))
    5
    """
    lower = 0
    upper = len(l) - 1
    missing = -1
    # on loop condition:
    # '=' means the list to be scanned can only contain one element
    # loop assertions:
    # - l[lower : upper] (including l[upper])indicates sorted list that contains missing numbers.
    # - range(lower_bound, upper_bound) indicates the range that missing numbers are within
    # - on each cycle, problem can be defined as:
    #   sorted list l[lower : upper] contains numbers within range(lower_bound, upper_bound),
    #   try to find the missing number (a number that is within range but not present in the list)
    while lower <= upper:
        # end conditions: found that given boundaries exceed list boundaries
        if l[lower] > lower_bound:
            missing = lower_bound
            break
        if l[upper] < upper_bound:
            missing = upper_bound
            break
        # deal with case where pivot_right > upper
        # this could happen when there is only one element in the list
        pivot_left = lower + ((upper - lower) >> 1)
        pivot_right = pivot_left + 1
        if pivot_right > upper:
            break
        # another end condition:
        # the missing one is between pivot_left and pivot_right
        if l[pivot_right] - l[pivot_left] > 1:
            missing = l[pivot_right] - 1
            break
        gap_left = l[pivot_left] - l[lower] + 1
        gap_right = l[upper] - l[pivot_right] + 1
        len_left = pivot_left - lower + 1
        len_right = upper - pivot_right + 1
        # compare length to find out which section contains missing element
        if gap_left > len_left:
            # find missing in left section
            upper = pivot_left
            upper_bound = l[upper]
            # does not really matter whether we have following statement
            # lower_bound = l[lower]
        elif gap_right > len_right:
            # find missing in right section
            lower = pivot_right
            lower_bound = l[lower]
            # does not really matter whether we have following statement
            # upper_bound = l[upper]
        # if they all equal, no missing element
        else:
            break
    return missing

Solution #2:
The solution is similar, except it does not require the list is sorted.
On each cycle, the list will be divided to two groups, one containing even numbers and the other containing odd numbers.

def prog_2_A_find_missing_bitwise(l, lower_bound, upper_bound):
    """
    >>> print(prog_2_A_find_missing_bitwise([1, 2, 3, 5, 7, 8], 1, 8))
    6
    >>> print(prog_2_A_find_missing_bitwise([1, 2, 3, 4, 6, 7], 1, 7))
    5
    """
    def split(numbers):
        set_odd = [num >> 1 for num in numbers if num & 1 == 1]
        set_even = [num >> 1 for num in numbers if num & 1 == 0]
        return (set_odd, set_even)
    # TODO: odd count of numbers
    # corner case: when numbers set has only one element, it will still work
    # cause either set_odd or set_even will be empty and that one will be the missing one
    numbers = l
    numbers_range = range(lower_bound, upper_bound + 1)
    # count = (upper_bound - lower_bound + 1) >> 1
    bit_index = 0
    num_missing = 0
    # loop assertions:
    # - the list numbers contains missing numbers
    # - bit_index and num_missing records the path to calculate missing number
    while len(numbers) > 0:
        set_odd, set_even = split(numbers)
        range_odd, range_even = split(numbers_range)
        if len(set_odd) < len(range_odd):
            numbers = set_odd
            numbers_range = range_odd
            num_missing += (1 << bit_index)
        elif len(set_even) < len(range_even):
            numbers = set_even
            numbers_range = range_even
            num_missing += (0 << bit_index) # can be ignored, though...
        #count >>= 1
        bit_index += 1
    return num_missing