Entries Tagged 'geek' ↓

The weird and wonderful world of UAVs

Recently, my long-time friend and colleague Ben Goertzel came to Hong Kong to help advise on the AI project I’m working on. He also happened to bring a “Parrot” quadcopter (warning, this link autoplays a YouTube video), which is an awesome Wi-Fi-controlled toy that has quad rotors. It’s not much different from a radio-controlled helicopter, except it’s much cheaper and also more stable.

There are some vague plans to do autonomous control of these devices using vision processing and voice recognition, although the actual hardware used may be different since quadcopter drones have gone hobbyist and you can build your own from scratch.

I will now leave you with two YouTube videos of them in action…

Using a Kinect hacked onto a drone to allow it to build a 3D model of the environment and do its own path finding:

Others have used motion capture to track a ball, which is then juggled between two quadcopters with trampolines:

Variadic template example in C++0x

I must have been living under a rock because I’d been unaware of all the cool features being added in C++0x. I thought I’d do a series of examples showing simple use of several of these. This one is about variadic templates, namely, templates that can take any number of type arguments.

So not only can you do:

template<typename T> int my_function(T arg) {...}

But you can do this:

template<typename T, typename... Args> int my_function(T arg, Args... the_rest) {...}

Wikipedia has a good overview, so I’ll direct you there for more. I’m essentially copying their printf example, but I had a few issues compiling it, so for completeness I’ll mention them here:

#include <iostream>
#include <stdexcept>
#include <string>

// Base case: no arguments left, so a lone '%' placeholder is an error.
void mprintf(const char* s)
{
    while (*s) {
        if (*s == '%' && *(++s) != '%')
            throw std::runtime_error("invalid format string: missing arguments");
        std::cout << *s++;
    }
}

// Recursive case: print up to the next '%', substitute value, then recurse.
template<typename T, typename... Args>
void mprintf(const char* s, T value, Args... args)
{
    while (*s) {
        if (*s == '%' && *(++s) != '%') {
            std::cout << value;
            mprintf(s, args...); // call even when *s == 0 to detect extra arguments
            return;
        }
        std::cout << *s++;
    }
    throw std::logic_error("extra arguments provided to printf");
}

int main(int argc, char** argv) {
    std::string s("test");
    std::string s2("test2");
    mprintf("% - %", s, s2);
    return 0;
}

Compile with:

gcc -std=c++0x file.cc -lstdc++ -o vtemplate-example

If you run the executable vtemplate-example you should get the output “test - test2”.

A couple of notes:

  • I renamed printf (used in the Wikipedia example) to mprintf because the compiler complained about the call to the overloaded printf being ambiguous.
  • If you are on OSX, make sure you have gcc-4.4 installed and that you invoke it instead of the Apple gcc. If you try to use the Apple version of gcc it won’t understand the -std=c++0x option.

I hope to play with lambda functions sometime soon too.

Python to parse fields in Amazon S3 logs

The log format for Amazon S3 is slightly annoying. Not overwhelmingly so, but the date field has the field separator (a space) in the middle of it and it isn’t encapsulated by quote characters. Here’s some code to split the fields up, assuming you’ve downloaded the log file already (it’s easy enough to list all logs and retrieve them with boto):

import csv
r = csv.reader(open('logfilename'),
        delimiter=' ', quotechar='"')
log_entries = []
for i in r:
    i[2] = i[2] + " " + i[3] # repair date field
    del i[3]
    log_entries.append(i) # keep the repaired row
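As a minimal sketch of what the repair does, here is the same approach wrapped in a function and run against a single made-up log line. It is written for Python 3, and the sample line, its field values and the helper name parse_s3_log_line are assumptions for illustration, not real log data:

```python
import csv

def parse_s3_log_line(line):
    """Split one S3 log line into fields and rejoin the date field."""
    # csv.reader accepts any iterable of strings, so wrap the line in a list
    fields = next(csv.reader([line], delimiter=' ', quotechar='"'))
    fields[2] = fields[2] + " " + fields[3]  # repair date field
    del fields[3]
    return fields

# Hypothetical log line, shortened and anonymised
sample = (
    'OWNER mybucket [06/Feb/2019:00:00:38 +0000] 192.0.2.3 REQUESTER '
    'REQID REST.GET.OBJECT photo.jpg "GET /photo.jpg HTTP/1.1" 200 - '
    '1024 1024 10 9 "-" "curl/7.0" -'
)

entry = parse_s3_log_line(sample)
print(entry[2])  # the rejoined date field
print(entry[8])  # the quoted request line, kept as one field
```

The quoted request line survives as a single field because the csv module honours the quote character, so only the date needs patching up by hand.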

SizeUp behaviour using Compiz

Recently I found out about SizeUp in OSX and found it really useful. Basically it gives you hot keys for window positions, such that you can maximise them vertically and attach them to the left or right of the screen. Great for placing terminal windows and browsers. This is similar to the behaviour in Windows 7 (don’t know what they call it or care, they are just copying this stuff from existing window managers and getting all the credit). You can also send a window to a corner, or maximise horizontally and attach to top/bottom.

I knew it must be possible in Linux somehow. For one thing, there’s wmctrl, a command-line program for scripting window positions, and I found some scripts made by others in the Ubuntu forums that act similarly to the way I wanted.
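To give a feel for the wmctrl route, here is a rough sketch (not one of the forum scripts) that builds the command for pushing the active window to one half of the screen. The function name and the hard-coded 1920x1080 screen size are assumptions, and it ignores panels and window decorations:

```python
def half_screen_args(side, screen_width, screen_height):
    """Return wmctrl arguments that move the active window to one half."""
    if side not in ('left', 'right'):
        raise ValueError("side must be 'left' or 'right'")
    width = screen_width // 2
    x = 0 if side == 'left' else screen_width - width
    # -e takes gravity,x,y,width,height; a gravity of 0 uses the window's default
    geometry = '0,%d,0,%d,%d' % (x, width, screen_height)
    return ['wmctrl', '-r', ':ACTIVE:', '-e', geometry]

# Assumed screen size; print the command rather than running it
print(' '.join(half_screen_args('left', 1920, 1080)))
```

The returned list can be handed to subprocess.call() and bound to a hot key. A maximised window may need to be unmaximised first before it will accept the new geometry.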

However, it turns out there is something already available if you’re using Compiz as your window manager.

To change to using Compiz and get the required config tool, run:

sudo aptitude install compizconfig-settings-manager compiz-fusion-plugins-extra

And then open the menu System → Preferences → Appearance. Go to the Visual Effects tab and choose “Extra”.

Then fire up the CompizConfig Settings Manager that’s also under System → Preferences. When the dialog loads, go to the filter and type “grid”. This is the module of Compiz that gives you almost the same behaviour as SizeUp (you can get the rest of the behaviour using other modules in the “Window Management” category).

Update: changed the install command to also install compiz-fusion-plugins-extra as the grid plugin is no longer part of the core package.

Upload a file to S3 with Boto

Amazon S3 is a distributed storage service which I’ve recently been working with. Boto is a Python library for working with Amazon Web Services, of which S3 is one facet. This post will demonstrate how to upload a file using boto (a future post will demonstrate how to create the parameters for a POST multi-part request that another client can use to upload to S3 without knowing your AWS key id or secret access key).

I will assume you’ve signed up for S3 already, and successfully downloaded and installed boto.

import boto

# Fill these in - you get them when you sign up for S3
AWS_ACCESS_KEY_ID = ''
AWS_SECRET_ACCESS_KEY = ''

bucket_name = AWS_ACCESS_KEY_ID.lower() + '-mah-bucket'
conn = boto.connect_s3(AWS_ACCESS_KEY_ID,
        AWS_SECRET_ACCESS_KEY)

If you’ve previously learnt about S3 you’ll know that bucket names need to be unique, so that’s why we’ve used the AWS Key ID as a prefix (as this is your unique id it’s unlikely someone else will be using it as a prefix). We’ve also converted the bucket name to lowercase, as DNS is case-insensitive and it’s nice to use vanity domains of the form http://[bucket_name].s3.amazonaws.com/.

We create the connection object with boto.connect_s3() and this object lets us interact with S3. We now create ourselves a bucket and upload a file:

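import boto.s3
bucket = conn.create_bucket(bucket_name,
        location=boto.s3.connection.Location.DEFAULT)

testfile = "replace this with an actual filename"
print('Uploading %s to Amazon S3 bucket %s' %
      (testfile, bucket_name))

import sys
def percent_cb(complete, total):
    # print a dot each time boto reports progress
    sys.stdout.write('.')
    sys.stdout.flush()

from boto.s3.key import Key
k = Key(bucket)
k.key = 'my test file'
k.set_contents_from_filename(testfile,
        cb=percent_cb, num_cb=10)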

There you have it! The set_contents_from_filename method is particularly nifty, as it simplifies all the streaming of data to S3. Amazon’s prototype Python S3 library required that the file be loaded into memory, which doesn’t work too well if you are working with large media files.

Oh, and the percent_cb function is a callback that gets called as the upload progresses.
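The callback is handed two integers: the number of bytes sent so far and the total size. As a rough sketch of a slightly fancier callback, this version shows a percentage instead. The make_percent_cb helper is a made-up name, and the loop at the end only simulates the calls boto would make, so it runs without an S3 account:

```python
import sys

def make_percent_cb(out=sys.stdout):
    """Build a progress callback taking (complete, total) byte counts."""
    def percent_cb(complete, total):
        # total can be 0 for an empty file, so guard the division
        percent = 100 if total == 0 else int(100 * complete / total)
        out.write('\r%3d%%' % percent)  # \r redraws on the same line
        out.flush()
    return percent_cb

# Simulated progress for a hypothetical 2000-byte upload
cb = make_percent_cb()
for sent in (0, 500, 1000, 1500, 2000):
    cb(sent, 2000)
sys.stdout.write('\n')
```

To use it for real you would pass cb=make_percent_cb() to set_contents_from_filename in place of percent_cb.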


I have the code-brain.

Now that I live centrally, I’ve been finding myself more frequently excusing myself for having code-brain. What is code-brain? It’s when I’ve spent a day immersed in hacking code without social interaction… and getting out of this state generally just requires time. Previously, going anywhere required a drive, which gave me enough temporal separation to revert to a more sociable state. Now that I live in town, it’s about a 5-minute walk to meet up with people, which is not enough time, and words come with difficulty.

Given the plasticity of the brain, I sometimes wonder if coding for a living is psychologically stunting for one’s social behaviour. However, I can quickly reject that because many coders I know are very social and I don’t see them having the same difficulty.

I sometimes liken it to a mild form of Asperger’s, since when dealing with code, for the most part it’s possible to keep everything you need to know about in front of you and it’s finite. It’s not overwhelming except when you’re thrown into a new project with a large existing code-base. Coming from being immersed in such a controlled environment, it’s hard to adapt to being in a room full of people because it’s impossible to predict exactly what everyone else will do.

I don’t want to predict what everyone else will do; that’d make life boring. I just know it personally takes time to adjust between the two environments. If anyone has read or seen anything about this phenomenon then I’d appreciate links/comments – mostly so I can understand how to speed up the transition and get more enjoyment from social situations without the painful adjustment.

Possibly I just need to make the transition more often and it’ll become easier 😉

Filtering Python dictionaries

Here’s a little Python snippet I just made up. It’s immensely useful because I couldn’t find an obvious method like filter that applies to dictionaries instead of lists. This code pulls out specific key-value pairs from a dictionary and puts them in a new dictionary.

>>> x={ 'test1':1, 'test2':2, 'test3':3 }
>>> my_keys = ('test1','test2')
>>> y=dict(filter(lambda t: t[0] in my_keys, x.items()))
>>> y
{'test1': 1, 'test2': 2}

Obviously this won’t scale well, since it walks every item in the dictionary and checks each key against the tuple, but if you just want a few keys out of a dictionary then you should be fine.
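If the dictionary is large and you only want a handful of keys, one alternative is to loop over the wanted keys rather than over the whole dictionary. A minimal sketch, assuming a Python version with dict comprehensions (2.7 or later); the 'missing' key is only there to show that absent keys are skipped:

```python
x = {'test1': 1, 'test2': 2, 'test3': 3}
my_keys = ('test1', 'test2', 'missing')

# One dictionary lookup per wanted key instead of one tuple scan per item
y = {k: x[k] for k in my_keys if k in x}
print(y)
```

The work now depends on the number of keys you ask for, not on the size of the dictionary.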

The problems with OSX

I’ve recently bought a MacBook Pro 13″ for work and because I thought it’d make the music process much easier. Windows and Linux are fun for hacking/games, but sound stuff usually results in spending time on configuration rather than things just working. Anyhow, I’ve had it a couple of weeks, and although the MacBook Pro is a nice machine, I have some gripes about the OS:

  • No consistent method for a keyboard shortcut to beginning of line or end of line. Seriously guys, what’s up with that? There are ways to go to start and end of lines, but some apps interpret these actions as start/end of buffer, others ignore them. Even Apple software isn’t consistent with this behaviour. Same goes for page-up and page-down short-cuts.
  • Control and command keys. Given that their names are almost synonyms, the extra key is kind of annoying because it’s often used in place of where Control would be used in any other OS. Difference for the sake of difference does not make you a beautiful and unique snowflake. And yet, Apple goes to all this trouble to simplify and allow the interface to be controlled with one mouse button, or to remove apparently superfluous home/end/pg-up/pg-down keys.
  • No consistent policy for click-through (what is click-through? See an anti-click-through person here). As far as I can tell it seems to be about making the computer nanny you. For me, the general lack of click-through (except in rare cases) means I have to click an extra 200 or so times a day. I could get used to that… eventually. But some (Apple) apps don’t follow this policy, so it just ends up as a confusing mess. It’s even less productive than having direct mouse focus.
  • I thought Apple just couldn’t program multi-threaded applications on Windows. iTunes would stall frequently… turns out that’s normal. Clicking the help menu in most apps also causes the menu to lock up. (And not just the first time… every time you click, it seems to be regenerating an index or something. This leads to clicking elsewhere while waiting and then having to wait AGAIN. Exasperate the user looking for help? Not exactly a smart user relations design!)
  • A money-grubbing $50 for a miniplug to DVI adaptor. Another $50 for a VGA adaptor because they don’t give you a DVI-I with your DVI adaptor. Also the miniplug doesn’t support audio if you get the DVI to HDMI cable. My 1.5-year-old Dell has HDMI out with audio and VGA (the only difference between HDMI and DVI is the plug and HDMI also supporting audio, so a basic HDMI to DVI cable is all that’s needed). For $50 an adaptor, I’d at least expect the adaptor to match the style of the MacBook Pro’s aluminium case, instead of being a white plastic thing that clashes.
  • Try to move a file using Finder without having to drag things. You can’t; the cut and paste commands are disabled. In fact, why even have a cut item under the edit menu? It’s always greyed out as far as I’ve seen, as if to taunt anyone that’s used a decent file browser! Instead it’s “NO, YOU ARBITRARILY CAN’T DO THAT.” Thanks Apple, I love you too.

Maybe my expectations were too high, and maybe I’ll get used to the quirks of Apple software, but I currently miss Ubuntu. If anybody can point me somewhere that solves these issues though, then I’d be most appreciative.

Having said all that… things I like: multi-screen support done right, dock is pretty cool, the Macbook feels nice.

Designing robust lyric and cover art retrievers

One of the very last sessions at the GSoC Mentor Summit was about media players. There were lead devs from Amarok and XMMS2, and it was cool to speak with them in person. One frequent issue for Amarok (I can’t remember if it was also an issue for XMMS2) was that lyric sites keep going down and changing their format, sometimes adding ads in the middle of the lyrics. Another was that Amazon no longer lets them use the album cover art, and the substitute, last.fm, has very small cover art images.

My suggestion for both, though they would need to be implemented in somewhat different ways, would be to draw on several sources. For lyrics, use a variety of lyrics sites, then use text similarity matching to work out which part of the page is the actual lyrics. For images, you could use Google image search and return the image that was most frequent, with some heuristic for preferring square images. I think that, although not perfect, this would make the system a lot more robust against further changes.

Finding text similarity and overlaps is a well-understood computer science problem. It’s used by the shotgun sequencing approach for DNA sequencing… as well as a variety of search and indexing problems. Hopefully I’ll release a usable library for it over the summer – I’ll call it libshotgun-lyrics 😉
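To make the lyrics idea a bit more concrete, here is a very rough sketch using Python’s standard difflib module. It is not what Amarok or XMMS2 do. The two “pages” are made-up text, the shared_lines name is hypothetical, and a real version would have to strip HTML and compare more than two sites:

```python
from difflib import SequenceMatcher

def shared_lines(page_a, page_b):
    """Return the lines two pages have in common, in order."""
    a = [line.strip() for line in page_a.splitlines() if line.strip()]
    b = [line.strip() for line in page_b.splitlines() if line.strip()]
    # autojunk=False stops difflib from treating frequently repeated
    # lines (such as a chorus) as junk on long pages
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    shared = []
    for block in matcher.get_matching_blocks():
        shared.extend(a[block.a:block.a + block.size])
    return shared

# Two made-up pages carrying the same lyrics with different clutter
site_a = """Site A header
Line one
Line two
Line three
Site A footer"""

site_b = """Other banner
Line one
Line two
BUY RINGTONES
Line three
Copyright B"""

print(shared_lines(site_a, site_b))
```

Only the lines both pages agree on survive, so the per-site header, footer and the ad dropped into the middle of the second page all fall away. Clutter that is identical across sites would still slip through, which is where comparing more sites would help.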

Mathematics are pretty

I just uploaded several fractal videos to YouTube on behalf of a colleague who works on OpenCog with me: Linas Vepstas.

The interior measure of the circle map, with the parameters and values as explained here:

Polylogarithm function (more info):