Category Archives: TTTech

T_T Tech…

[Programming Pearls] Column 4 – Writing Correct Programs

The column mainly describes the topic on how to write a correct program.

Writing a program
On writing programs, it illustrates a top-down way: write high-level descriptive (declarative) p-code; and then for each p-code statement, break it into next level with more details; continue the process until the program is composed.

While top-down way in general is effective especially when you are solving algorithm problems where problem can be clearly defined (at some level…). Paul Graham and other lisp hackers usually like to mention the way bottom-up. IMHO, the bottom-up way, is generally applied in cases where the problem you try to solve is too big (like for example, design a CAD software) and unclear at some point. For such kind of problems, it is hard to grasp or define the problem clearly. The bottom-up will start by exploring problem domain, defining domain primitives and ‘buzzword’ (like to to rotate a line, then you discovered there will be strong need on matrix calculation features), helping you to capture and really understand the domain, and then finally solve the problem while defining / discovering it (meaning, define a IT solution for given business requirement, mostly described by business / domain people).

That went too far so I’ll stop here…

Validate a program
Mostly it’s bottom up and is based on assertions on invariants. According to different program structures, the invariants vary. Like typically for a loop structure, the invariants can be divided into – init, preserve and termination. For a function, (contract-based) it is more on input params and return values (output params). Anyway, words do not mean anything so I’ll just give an example of an incorrect program where the error is actually caused because function invariable is not maintained. It’s a simple recursive based binary search.

def prog_2_A_binary_search_recur_incorrect(l, a):
    # termination
    if len(l) == 0:
        return -1
    # binary search
    mid = (len(l) - 1) >> 1
    mid_val = l[mid]
    if a > mid_val:
        return prog_2_A_binary_search_recur(l[mid + 1 : ], a)
    elif a == mid_val:
        return mid
    else:
        return prog_2_A_binary_search_recur(l[ : mid], a)

The error is “mid” is actually related to current input l related to current recursive level, not the original l. So the fix is to add a ‘ref’ in there and maintain the offset like:

def prog_2_A_binary_search_recur(l, a, ref = 0):
    """
    >>> l = [1,20,34,45,76,87,90,100,103,298,500]
    >>> print(prog_2_A_binary_search_recur(l, 298))
    9
    >>> print(prog_2_A_binary_search_recur(l, 999))
    -1
    """
    # termination
    if len(l) == 0:
        return -1
    # binary search
    mid = (len(l) - 1) >> 1
    mid_val = l[mid]
    if a > mid_val:
        return prog_2_A_binary_search_recur(l[mid + 1 : ], a, ref + mid + 1)
    elif a == mid_val:
        return ref + mid
    else:
        return prog_2_A_binary_search_recur(l[ : mid], a, ref)

[Programming Pearls] Column 3 – Questionary

Problem
Given a list of feedbacks to questionary, output the statistics report, grouped by ethnic group.
One feedback record contains info like ethnic group, gender, citizenship, etc.

Solution
This is a simple problem. The key here is to choose and clearly define the data structures. Instinctively,

1. Feedback record: One feedback record is naturally a dictionary like {‘ethnic’:'African American’, ‘gender’:'Male’, ‘citizenship’:'US’ }.
2. Statistics: It is a list of statistics record, where each record is identified by ethnic group and contains several statistics fields. Likewise, one statistics record is naturally a dictionary like {‘ethnic’:'African American’, ‘Gender_Male’:3, ‘Gender_Female’:3, ‘citizenship_US’: 4, ‘citizenship_permvisa’: 1, …}

Side note: the statistics table can be generated by SQL-like statement (cause it’s actually aggregate query) like: select ethnic, count(gender == ‘male’) as gender_male, … group by ethnic

Code:
Based on data structure definition, code is straightforward:

def prog_3_A_questionary(l):
    """
    >>> l = []
    >>> l.append(dict(ethnic='African American', gender='Male', citizenship='Perm Visa'))
    >>> l.append(dict(ethnic='African American', gender='Male', citizenship='US Citizen'))
    >>> l.append(dict(ethnic='African American', gender='Male', citizenship='Perm Visa'))
    >>> l.append(dict(ethnic='African American', gender=None, citizenship='Perm Visa'))
    >>> l.append(dict(ethnic='African American', gender='Female', citizenship='Perm Visa'))
    >>> l.append(dict(ethnic='African American', gender='Female', citizenship='Temp'))
    >>> l.append(dict(ethnic='African American', gender='Female', citizenship='Perm Visa'))
    >>> l.append(dict(ethnic='African American', gender='Female', citizenship=None))
    >>> l.append(dict(ethnic='Spanish Surname', gender='Female', citizenship='Perm Visa'))
    >>> l.append(dict(ethnic='Spanish Surname', gender='Female', citizenship='Temp'))
    >>> l.append(dict(ethnic='Spanish Surname', gender='Female', citizenship='Perm Visa'))
    >>> l.append(dict(ethnic='Asian American', gender='Male', citizenship='Perm Visa'))
    >>> from pprint import pprint
    >>> pprint(prog_3_A_questionary(l))
    {'African American': {'Total': 8,
                          'citizenship_Perm Visa': 5,
                          'citizenship_Temp': 1,
                          'citizenship_US Citizen': 1,
                          'ethnic_African American': 8,
                          'gender_Female': 4,
                          'gender_Male': 3},
     'Asian American': {'Total': 1,
                        'citizenship_Perm Visa': 1,
                        'ethnic_Asian American': 1,
                        'gender_Male': 1},
     'Spanish Surname': {'Total': 3,
                         'citizenship_Perm Visa': 2,
                         'citizenship_Temp': 1,
                         'ethnic_Spanish Surname': 3,
                         'gender_Female': 3}}
    """
    # TODO: actually, we should write code to initialize the stat fields like
    # 'Gender_Male' to be 0. This will help:
    # - reduce the runtime check logic to see if the field is already there or not
    # - initialize all the stat_fields to 0, in order to prevent any missing stat_fields.
    from collections import defaultdict
    stat = {}
    for record in l:
        # get stat record by 'group key' - ethnic
        ethnic = record['ethnic']
        if not ethnic in stat:
            stat_record = {}
            stat[ethnic] = stat_record
        stat_record = stat[ethnic]
        # increase 'Total'
        if not 'Total' in stat_record:
            stat_record['Total'] = 0
        stat_record['Total'] += 1
        # increase stat fields like gender_Male
        for key, val in record.items():
            # handle missing answer
            if val == None:
                continue
            stat_field = '%s_%s' % (key, val)
            if not stat_field in stat_record:
                stat_record[stat_field] = 0
            stat_record[stat_field] += 1
    return stat

[Programming Pearls] Column 2 – Find anagrams

Problem:
Given a list of words, collect anagrams.

Solution:
For each word, generate its signature by sorting its chars. And then collect words with same signature.

Code:

def prog_2_C_find_anagrams(l):
    """
    >>> prog_2_C_find_anagrams(['spot', 'stop', 'tops', 'deal', 'lead'])
    defaultdict(<class 'list'>, {'opst': ['spot', 'stop', 'tops'], 'adel': ['deal', 'lead']})
    """
    from collections import defaultdict
    result = defaultdict(list)
    for word in l:
        word = word.lower()
        charlist = list(word)
        sorted_charlist = sorted(charlist)
        normalized_word = "".join(sorted_charlist)
        result[normalized_word].append(word)
    return result