#!/bin/sh
""""exec ${PYTHON:-python} -t $0 "$@";" """
# vim: filetype=python expandtab smarttab shiftwidth=4

import sys
import string
from operator import itemgetter as nth

alphanumerics = string.lowercase + string.digits


def emit_top_new(paragraph, show_count, previous, keep_previous_count):
    """Given some text and a list of disallowed characters, return two
    samples of the most common letters. One sample is likely to be fed
    back in as a disallowed list.

    >>> emit_top_new("", 3, ['1', '2'], 2)
    ([], [])
    >>> emit_top_new("abcdefghighighijkljklmnopppp", 3, [], 2)
    (['p', 'h', 'i'], ['p', 'h'])
    >>> emit_top_new("abcdefghighighijkljklmnopppp", 3, ['p', 'h'], 2)
    (['i', 'g', 'l'], ['i', 'g'])
    """
    character_count = dict([c, 0] for c in alphanumerics)
    for c in previous:
        character_count.pop(c, None)

    for c in paragraph.lower():
        try:
            character_count[c] += 1
        except KeyError:
            pass  # not a character we care about

    sortable = character_count.items()
    sortable.sort(key=nth(1))
    sortable.reverse()
    sortable = [k for k, v in sortable if v > 0]
    return sortable[:show_count], sortable[:keep_previous_count]


if __name__ == "__main__":
    import doctest
    doctest.testmod()

    show_count = 10
    keep_previous_count = 5
    previous = set()
    lines = []

    for line in sys.stdin:
        if line.strip() == "":
            if lines:
                new, previous = emit_top_new("".join(lines), show_count,
                                             previous, keep_previous_count)
                print new
                lines = []
        else:
            lines.append(line)

    if lines:
        new, previous = emit_top_new("".join(lines), show_count,
                                     previous, keep_previous_count)
        print new