持续到2月,坚持至少一周2更, on algorithms…
Use case of a bitvec:
v = bitvec(capacity)
v.set(45322)
v.set(1000234)
…
if v.has(453222): print(‘ok it’s there’)
Data structure:
bytearray
Key concepts on storage:
- 1M bytes == 8M bits, meaning bitvec on 1MB memory will be able to handle 8M numbers, with the max number to be 8M – 1 (assuming we need to store 0 in the bitvec)
- 10M numbers requests 10/8M bytes – Assuming ‘M’ is always 2^20, not necessarily 1,000,000.
Key implementation problems:
In the case of v.set(num),
- What is the corresponding position (naming it as pos) for num in the underlying bytearray?
pos = num / 8 (or num >> 3)
- What is the corresponding mask index (naming it as r, remainder) in bytearray[pos]?
r = num % 8 (or num & 0×7)
The implementation of bit vector:
class prog_1_2_bitvec:
"""
>>> v = prog_1_2_bitvec(6)
>>> v.set(2)
>>> v.set(5)
>>> print('%x' % v.bytes[0]) # should be 100100
24
>>> v.has(2)
True
>>> v.has(5)
True
>>> v.unset(2)
>>> v.has(2)
False
>>> v.unset(5)
>>> v.has(5)
False
>>> v.set(3)
>>> v.has(3)
True
"""
def __init__(self, max):
if max <= 0: raise Exception('invalid max - length <= 0')
self.bytes = bytearray(int(max / 8 + 1))
self.max = max
def set(self, num):
self._valid(num)
pos, mask = self._divide(num)
self.bytes[pos] |= mask
def has(self, num):
self._valid(num)
pos, mask = self._divide(num)
return self.bytes[pos] & mask == mask
def unset(self, num):
self._valid(num)
pos, mask = self._divide(num)
self.bytes[pos] &= ~mask
def count(self):
# lie: TODO: best way to count the number of 1
# will cover it later
pass
def _divide(self, num):
d = num >> 3
r = num & 0x7
return (d, 1 << r)
def _valid(self, num):
if num < 0: raise Exception('invalid num - negative number')
if num >= self.max: raise Exception('invalid num - num > %d' % self.max)
The implementation of multi-pass bitvec sort:
def prog_1_3_bitsort_multipass(l, max, k):
"""
TODO: margin conditions...
>>> max = 1 << 8
>>> l = prog_1_4_gen_list(max, max >> 1)
>>> print(l[0:20]) # doctest:+ELLIPSIS
[...]
>>> l = prog_1_3_bitsort_multipass(l, max, 2)
>>> print(l[0:20]) # doctest:+ELLIPSIS
[...]
>>> test_verify_sorted(l)
True
"""
def sort_singlepass(l, max, i, k):
# assumptions:
# 1. let slice = max / k, the func is to handle numbers from range i * slice ~ (i + 1) * slice
# 2. The mapping relationship is n => n - i * slice
# exception cases (needs assertion):
# 1. max / k has remainder
slice = int(max / k)
upper = (i + 1) * slice
lower = i * slice
v = prog_1_2_bitvec(slice)
for n in l:
if n < lower or n >= upper: continue
mapped_n = n - lower
v.set(mapped_n)
sortedlist = [n + i * slice for n in range(slice) if v.has(n)]
return sortedlist
sortedlist = []
for i in range(0, k):
sortedlist += sort_singlepass(l, max, i, k)
return sortedlist
Hints:
- Overall: define use case – UT, method signature, etc
- Write logic: write mainstraem logic with notes (in comments) or assertions for exception / corner cases.
- Organize mind: cases!!
Tests (TODO):
- Areas: UT – like UT for sort_singlepass, basic E2E (positive), negative like to pass numbers less than 0 or exceeding data volume, code coverage (previous test cases may have covered most), performance