持续到2月,坚持至少一周2更, on algorithms…
Use case of a bitvec:
v = bitvec(capacity)
v.set(45322)
v.set(1000234)
…
if v.has(453222): print(‘ok it’s there’)
Data structure:
bytearray
Key concepts on storage:
- 1M bytes == 8M bits, meaning bitvec on 1MB memory will be able to handle 8M numbers, with the max number to be 8M – 1 (assuming we need to store 0 in the bitvec)
- 10M numbers requests 10/8M bytes – Assuming ‘M’ is always 2^20, not necessarily 1,000,000.
Key implementation problems:
In the case of v.set(num),
- What is the corresponding position (naming it as pos) for num in the underlying bytearray?
pos = num / 8 (or num >> 3)
- What is the corresponding mask index (naming it as r, remainder) in bytearray[pos]?
r = num % 8 (or num & 0×7)
The implementation of bit vector:
class prog_1_2_bitvec: """ >>> v = prog_1_2_bitvec(6) >>> v.set(2) >>> v.set(5) >>> print('%x' % v.bytes[0]) # should be 100100 24 >>> v.has(2) True >>> v.has(5) True >>> v.unset(2) >>> v.has(2) False >>> v.unset(5) >>> v.has(5) False >>> v.set(3) >>> v.has(3) True """ def __init__(self, max): if max <= 0: raise Exception('invalid max - length <= 0') self.bytes = bytearray(int(max / 8 + 1)) self.max = max def set(self, num): self._valid(num) pos, mask = self._divide(num) self.bytes[pos] |= mask def has(self, num): self._valid(num) pos, mask = self._divide(num) return self.bytes[pos] & mask == mask def unset(self, num): self._valid(num) pos, mask = self._divide(num) self.bytes[pos] &= ~mask def count(self): # lie: TODO: best way to count the number of 1 # will cover it later pass def _divide(self, num): d = num >> 3 r = num & 0x7 return (d, 1 << r) def _valid(self, num): if num < 0: raise Exception('invalid num - negative number') if num >= self.max: raise Exception('invalid num - num > %d' % self.max)
The implementation of multi-pass bitvec sort:
def prog_1_3_bitsort_multipass(l, max, k): """ TODO: margin conditions... >>> max = 1 << 8 >>> l = prog_1_4_gen_list(max, max >> 1) >>> print(l[0:20]) # doctest:+ELLIPSIS [...] >>> l = prog_1_3_bitsort_multipass(l, max, 2) >>> print(l[0:20]) # doctest:+ELLIPSIS [...] >>> test_verify_sorted(l) True """ def sort_singlepass(l, max, i, k): # assumptions: # 1. let slice = max / k, the func is to handle numbers from range i * slice ~ (i + 1) * slice # 2. The mapping relationship is n => n - i * slice # exception cases (needs assertion): # 1. max / k has remainder slice = int(max / k) upper = (i + 1) * slice lower = i * slice v = prog_1_2_bitvec(slice) for n in l: if n < lower or n >= upper: continue mapped_n = n - lower v.set(mapped_n) sortedlist = [n + i * slice for n in range(slice) if v.has(n)] return sortedlist sortedlist = [] for i in range(0, k): sortedlist += sort_singlepass(l, max, i, k) return sortedlist
Hints:
- Overall: define use case – UT, method signature, etc
- Write logic: write mainstraem logic with notes (in comments) or assertions for exception / corner cases.
- Organize mind: cases!!
Tests (TODO):
- Areas: UT – like UT for sort_singlepass, basic E2E (positive), negative like to pass numbers less than 0 or exceeding data volume, code coverage (previous test cases may have covered most), performance