Problem
Given a list of feedback records from a questionnaire, output a statistics report grouped by ethnic group.
Each feedback record contains fields such as ethnic group, gender, and citizenship.
Solution
This is a simple problem. The key is to choose and clearly define the data structures. The natural choices are:
1. Feedback record: a feedback record is naturally a dictionary like {'ethnic': 'African American', 'gender': 'Male', 'citizenship': 'US'}.
2. Statistics: a collection of statistics records, one per ethnic group, each holding several count fields. A statistics record is likewise naturally a dictionary like {'ethnic': 'African American', 'gender_Male': 3, 'gender_Female': 3, 'citizenship_US': 4, 'citizenship_permvisa': 1, …}. The code below stores these records in a dictionary keyed by ethnic group.
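To make the naming of the statistics fields concrete, here is a minimal sketch of how one feedback record maps to field names. The helper name stat_fields is hypothetical and used only for illustration; unanswered questions (None) contribute no field.

```python
def stat_fields(record):
    """Return the statistics field names that one feedback record adds to.

    Hypothetical helper, for illustration only.
    """
    # Each answered question becomes '<question>_<answer>'; None is skipped.
    return ['%s_%s' % (key, val) for key, val in record.items() if val is not None]


record = {'ethnic': 'African American', 'gender': 'Male', 'citizenship': None}
print(stat_fields(record))
# prints ['ethnic_African American', 'gender_Male']
```

Each name returned here is a counter that gets incremented in the statistics record of the respondent's ethnic group.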
Side note: the statistics table can also be produced by an SQL-like statement, because it is really an aggregate query: select ethnic, sum(case when gender = 'Male' then 1 else 0 end) as gender_male, … group by ethnic
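The aggregate-query idea can be tried out with Python's built-in sqlite3 module. This is only a sketch: the table name feedback, its column layout, and the sample rows are assumptions made for the example.

```python
import sqlite3

# Table name, columns, and rows are assumptions made for this sketch.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE feedback (ethnic TEXT, gender TEXT, citizenship TEXT)')
rows = [
    ('African American', 'Male', 'Perm Visa'),
    ('African American', 'Female', 'US Citizen'),
    ('African American', None, 'Perm Visa'),  # gender left unanswered
    ('Spanish Surname', 'Female', 'Temp'),
]
conn.executemany('INSERT INTO feedback VALUES (?, ?, ?)', rows)

# One output row per ethnic group; each SUM(CASE ...) counts one answer value.
query = """
    SELECT ethnic,
           COUNT(*) AS total,
           SUM(CASE WHEN gender = 'Male' THEN 1 ELSE 0 END) AS gender_male,
           SUM(CASE WHEN gender = 'Female' THEN 1 ELSE 0 END) AS gender_female
    FROM feedback
    GROUP BY ethnic
    ORDER BY ethnic
"""
report = conn.execute(query).fetchall()
conn.close()
print(report)
# prints [('African American', 3, 1, 1), ('Spanish Surname', 1, 0, 1)]
```

A NULL gender matches neither CASE branch, so a missing answer still counts toward total but toward no gender column.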
Code:
With the data structures defined, the code is straightforward:
def prog_3_A_questionary(l):
    """
    >>> l = []
    >>> l.append(dict(ethnic='African American', gender='Male', citizenship='Perm Visa'))
    >>> l.append(dict(ethnic='African American', gender='Male', citizenship='US Citizen'))
    >>> l.append(dict(ethnic='African American', gender='Male', citizenship='Perm Visa'))
    >>> l.append(dict(ethnic='African American', gender=None, citizenship='Perm Visa'))
    >>> l.append(dict(ethnic='African American', gender='Female', citizenship='Perm Visa'))
    >>> l.append(dict(ethnic='African American', gender='Female', citizenship='Temp'))
    >>> l.append(dict(ethnic='African American', gender='Female', citizenship='Perm Visa'))
    >>> l.append(dict(ethnic='African American', gender='Female', citizenship=None))
    >>> l.append(dict(ethnic='Spanish Surname', gender='Female', citizenship='Perm Visa'))
    >>> l.append(dict(ethnic='Spanish Surname', gender='Female', citizenship='Temp'))
    >>> l.append(dict(ethnic='Spanish Surname', gender='Female', citizenship='Perm Visa'))
    >>> l.append(dict(ethnic='Asian American', gender='Male', citizenship='Perm Visa'))
    >>> from pprint import pprint
    >>> pprint(prog_3_A_questionary(l))
    {'African American': {'Total': 8,
                          'citizenship_Perm Visa': 5,
                          'citizenship_Temp': 1,
                          'citizenship_US Citizen': 1,
                          'ethnic_African American': 8,
                          'gender_Female': 4,
                          'gender_Male': 3},
     'Asian American': {'Total': 1,
                        'citizenship_Perm Visa': 1,
                        'ethnic_Asian American': 1,
                        'gender_Male': 1},
     'Spanish Surname': {'Total': 3,
                         'citizenship_Perm Visa': 2,
                         'citizenship_Temp': 1,
                         'ethnic_Spanish Surname': 3,
                         'gender_Female': 3}}
    """
    # TODO: ideally, initialize stat fields such as 'gender_Male' to 0 up front.
    # This would:
    # - remove the runtime checks for whether a field already exists
    # - guarantee that no stat field is missing from a record
    stat = {}
    for record in l:
        # get the stat record by the group key, 'ethnic'
        ethnic = record['ethnic']
        if ethnic not in stat:
            stat[ethnic] = {}
        stat_record = stat[ethnic]
        # increase 'Total'
        if 'Total' not in stat_record:
            stat_record['Total'] = 0
        stat_record['Total'] += 1
        # increase stat fields like gender_Male
        for key, val in record.items():
            # skip missing answers
            if val is None:
                continue
            stat_field = '%s_%s' % (key, val)
            if stat_field not in stat_record:
                stat_record[stat_field] = 0
            stat_record[stat_field] += 1
    return stat
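
The TODO in the code above suggests initializing every stat field to 0. One possible way to do that is sketched below, assuming the set of valid answers is known in advance. The function name questionary_stats and the options parameter are hypothetical, and unlike the code above this variant does not emit an 'ethnic_…' field and ignores answers not listed in options.

```python
def questionary_stats(records, options):
    """Group feedback by ethnic group with every stat field pre-set to 0.

    `options` maps each question to its possible answers. Both the function
    name and this parameter are assumptions made for this sketch.
    """
    # Every field a stat record may contain, e.g. 'gender_Male'.
    fields = ['%s_%s' % (key, val) for key, vals in options.items() for val in vals]
    stat = {}
    for record in records:
        ethnic = record['ethnic']
        if ethnic not in stat:
            # New group: start from zeroed counters so no field is ever missing.
            stat[ethnic] = dict.fromkeys(['Total'] + fields, 0)
        stat_record = stat[ethnic]
        stat_record['Total'] += 1
        for key, vals in options.items():
            val = record.get(key)
            # Missing (None) or unlisted answers are not counted.
            if val in vals:
                stat_record['%s_%s' % (key, val)] += 1
    return stat


options = {
    'gender': ['Male', 'Female'],
    'citizenship': ['US Citizen', 'Perm Visa', 'Temp'],
}
records = [
    dict(ethnic='Asian American', gender='Male', citizenship='Perm Visa'),
    dict(ethnic='Asian American', gender=None, citizenship='Temp'),
]
stats = questionary_stats(records, options)
```

With zeroed fields, every group's record has the same keys, so a report can be printed as a table without checking for missing columns.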