Tag Archives: Fight

[Programming Pearls] Column 1 – uninitialized space

Source:
Problem 1.9

Problem Description:
Reserve a set of memory without initialization, meaning the data in there is random. Name the memory as ‘data’.
Design a mechanism to track and tell whether given index in data is initialized or not.

Algorithm 1:
1. Use a set data structure (naming it to initialized_set) to store initialized indices. Initially it is empty.
2. Write value to data[i]: need to initialized_set.add(i)
3. Access value at data[i]: need to check initialized_set.has(i)
Flaw: ops on initialized_set are O(logN).

Algorithm 2:
Purpose: to reduce op cost on initialized_set from O(logN) to O(N).
1. Use array initialized_arr to store initialized indices. Initially it has length of len(data).
2. Use array track_positions: track_positions[i] stores index in initialized_arr, i.e. initialized_arr[track_positions[i]] should equal to i if data[i] is initialized. Initially it has length of len(data).
3. Use a int variable ‘track’ to indicate (assert) initialized_set[0:track] is valid set.
4. Write value to data[i]: need to set initialized_arr[track] = i; track_positions[i] = track; track += 1.
5. Access value at data[i]: need to check initialized_arr[track_positions[i]] == i && track > track_positions[i]

[Programming Pearls] Column 1 – Non Dup Random Numbers

[Programming Pearls] Column 1 – bitvec

Leave a reply

持续到2月，坚持至少一周2更, on algorithms…

Use case of a bitvec:
v = bitvec(capacity)
v.set(45322)
v.set(1000234)
…
if v.has(453222): print(‘ok it’s there’)

Data structure:
bytearray

Key concepts on storage:
- 1M bytes == 8M bits, meaning bitvec on 1MB memory will be able to handle 8M numbers, with the max number to be 8M – 1 (assuming we need to store 0 in the bitvec)
- 10M numbers requests 10/8M bytes – Assuming ‘M’ is always 2^20, not necessarily 1,000,000.

Key implementation problems:
In the case of v.set(num),
- What is the corresponding position (naming it as pos) for num in the underlying bytearray?
pos = num / 8 (or num >> 3)
- What is the corresponding mask index (naming it as r, remainder) in bytearray[pos]?
r = num % 8 (or num & 0×7)

The implementation of bit vector:

class prog_1_2_bitvec:
    """
    >>> v = prog_1_2_bitvec(6)
    >>> v.set(2)
    >>> v.set(5)
    >>> print('%x' % v.bytes[0]) # should be 100100
    24
    >>> v.has(2)
    True
    >>> v.has(5)
    True
    >>> v.unset(2)
    >>> v.has(2)
    False
    >>> v.unset(5)
    >>> v.has(5)
    False
    >>> v.set(3)
    >>> v.has(3)
    True
    """

    def __init__(self, max):
        if max <= 0: raise Exception('invalid max - length <= 0')
        self.bytes = bytearray(int(max / 8 + 1))
        self.max = max

    def set(self, num):
        self._valid(num)
        pos, mask = self._divide(num)
        self.bytes[pos] |= mask

    def has(self, num):
        self._valid(num)
        pos, mask = self._divide(num)
        return self.bytes[pos] & mask == mask

    def unset(self, num):
        self._valid(num)
        pos, mask = self._divide(num)
        self.bytes[pos] &= ~mask

    def count(self):
        # lie: TODO: best way to count the number of 1
        # will cover it later
        pass

    def _divide(self, num):
        d = num >> 3
        r = num & 0x7
        return (d, 1 << r)

    def _valid(self, num):
        if num < 0: raise Exception('invalid num - negative number')
        if num >= self.max: raise Exception('invalid num - num > %d' % self.max)

The implementation of multi-pass bitvec sort:

def prog_1_3_bitsort_multipass(l, max, k):
    """
    TODO: margin conditions...

    >>> max = 1 << 8
    >>> l = prog_1_4_gen_list(max, max >> 1)
    >>> print(l[0:20])  # doctest:+ELLIPSIS
    [...]
    >>> l = prog_1_3_bitsort_multipass(l, max, 2)
    >>> print(l[0:20])  # doctest:+ELLIPSIS
    [...]
    >>> test_verify_sorted(l)
    True
    """
    def sort_singlepass(l, max, i, k):
        # assumptions:
        # 1. let slice = max / k, the func is to handle numbers from range i * slice ~ (i + 1) * slice
        # 2. The mapping relationship is n => n - i * slice
        # exception cases (needs assertion):
        # 1. max / k has remainder
        slice = int(max / k)
        upper = (i + 1) * slice
        lower = i * slice
        v = prog_1_2_bitvec(slice)
        for n in l:
            if n < lower or n >= upper: continue
            mapped_n = n - lower
            v.set(mapped_n)
        sortedlist = [n + i * slice for n in range(slice) if v.has(n)]
        return sortedlist

    sortedlist = []
    for i in range(0, k):
        sortedlist += sort_singlepass(l, max, i, k)
    return sortedlist

Hints:
- Overall: define use case – UT, method signature, etc
- Write logic: write mainstraem logic with notes (in comments) or assertions for exception / corner cases.
- Organize mind: cases!!

Tests (TODO):
- Areas: UT – like UT for sort_singlepass, basic E2E (positive), negative like to pass numbers less than 0 or exceeding data volume, code coverage (previous test cases may have covered most), performance

Overskill…

每天灌水一点点

Tag Archives: Fight

[Programming Pearls] Column 1 – uninitialized space

[Programming Pearls] Column 1 – Non Dup Random Numbers

[Programming Pearls] Column 1 – bitvec