Python to parse fields in Amazon S3 logs

The log format for Amazon S3 is slightly annoying. Not overwhelmingly so, but the date field contains the field separator (a space) in the middle of it and isn't enclosed in quote characters. Here's some code to split the fields up, assuming you've downloaded the log file already (it's easy enough to list all the logs and retrieve them with boto):

import csv

log_entries = []
with open('logfilename') as f:
    r = csv.reader(f, delimiter=' ', quotechar='"')
    for i in r:
        i[2] = i[2] + " " + i[3]  # repair date field split by the space
        del i[3]
        log_entries.append(i)
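To see the repair step in isolation, here's the same parsing applied to a single made-up line in the S3 access log format (the owner hashes, request ID, and key name below are all invented for illustration):

```python
import csv
import io

# A made-up S3 access log line: the timestamp
# "[06/Feb/2010:00:00:38 +0000]" contains a space, so the
# csv reader splits it into two fields.
sample = ('ownerhash mybucket [06/Feb/2010:00:00:38 +0000] 127.0.0.1 '
          'requesterhash REQID REST.GET.OBJECT mykey '
          '"GET /mykey HTTP/1.1" 200 - 1024 1024 10 9 "-" "curl/7.19" -\n')

r = csv.reader(io.StringIO(sample), delimiter=' ', quotechar='"')
row = next(r)
row[2] = row[2] + " " + row[3]  # rejoin the two halves of the timestamp
del row[3]

print(row[2])  # the repaired date field: [06/Feb/2010:00:00:38 +0000]
```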

Filtering Python dictionaries

Here’s a little Python snippet I just made up that is immensely useful, because I couldn’t find an obvious method like filter that applies to dictionaries instead of lists. This code pulls out specific key-value pairs from a dictionary and puts them in a new dictionary.

>>> x = {'test1': 1, 'test2': 2, 'test3': 3}
>>> my_keys = ('test1', 'test2')
>>> y = dict(filter(lambda t: t[0] in my_keys, x.items()))
>>> y
{'test1': 1, 'test2': 2}

Obviously the performance characteristics of this won’t scale, since it iterates over every item in the dictionary rather than just the keys you want, but if you only need a few keys out of a dictionary then you should be fine.
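In Python 2.7 and later, a dict comprehension does the same thing a bit more directly, and only visits the keys you ask for (the `in x` guard skips any key that happens to be missing):

```python
x = {'test1': 1, 'test2': 2, 'test3': 3}
my_keys = ('test1', 'test2')

# Build a new dict containing only the wanted keys.
y = {k: x[k] for k in my_keys if k in x}
print(y)  # {'test1': 1, 'test2': 2}
```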