Empathy in the machine

A draft post/idea from the archives that I thought it was about time I released. Funnily enough, this was written entirely before I started working on NetEmpathy – maybe it’s not as disconnected from AGI as I thought after all!

It is my belief that empathy is a prerequisite to consciousness.

I recently read Hofstadter’s I Am a Strange Loop, whose central theme is that recursive representations of self give rise to our perception of consciousness. For some, the idea that our consciousness is something of an illusion might be hard to swallow – but then, quite likely, so are all the other qualia. They seem real to us because our minds make them real. To me, it’s not a huge hurdle to believe. I find the idea that our minds are endlessly representing themselves via self-reflection beautiful in its simplicity. Very strange things can happen when systems start self-reflecting.

For example, Gödel’s incompleteness theorem originally broke Principia Mathematica, and it can do the same to any sufficiently expressive formal system once you force that system to reason about itself. One day I’ll commit to explaining this in a post, but people have written entire books just to make Gödel’s theorem and its consequences easy to understand!

And as an example of self-reflection and recursion being beautiful, I merely have to point to fractals which exhibit self-similarity at arbitrary levels of recursion. Or perhaps the recursive and repeating hallucinations induced by psychedelics give us some clue about the recursive structures within the brain.
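As an aside (a sketch of my own, not anything from Hofstadter), the self-similarity of fractals falls out of a few lines of recursive code. Here is the classic Cantor set construction, where each level of the structure contains two shrunken copies of the whole – the definition literally refers to itself at a smaller scale:

```python
def cantor(segment, depth):
    """Recursively remove the open middle third of each segment.

    At every depth the result is two scaled-down copies of the whole
    construction -- self-similarity produced by nothing more than a
    function calling itself on smaller inputs.
    """
    lo, hi = segment
    if depth == 0:
        return [segment]
    third = (hi - lo) / 3
    return (cantor((lo, lo + third), depth - 1)
            + cantor((hi - third, hi), depth - 1))

# Two levels of recursion: 4 segments, each 1/9 of the original length.
print(cantor((0.0, 1.0), 2))
```

Doubling the depth doubles nothing about the code – the same rule simply applies itself again, which is exactly the flavour of recursion I find so beautiful.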

Later in the book, Hofstadter delves into slightly murky mystical waters, which I find quite entertaining and not without merit. He argues that, because we model the behaviour of others, we also start representing their consciousness too. The eventual conclusion, explained in much greater philosophical detail in his book, is that our “consciousness” isn’t just the sum of what’s in our own head but the holistic total of ourselves and everyone’s representation of us in their heads.

I don’t think the Turing test will really be passed until a machine can model humans as individuals and make insightful comments on their motivations. OK, so that wouldn’t formally be the Turing test any more, but as a judgement of conscious intelligence, I think the artificial agent needs at least to be able to reflect the motivations of others and understand the representation of itself within others. Lots of recursive representations!

The development of consciousness within AI via empathy is what, in my opinion, will allow us to create friendly AI. Formal proofs won’t work due to the computational irreducibility of complex systems. In an admittedly strained analogy, this is like trying to formally prove where a toy sailboat will end up after dropping it in a river upstream: trying to prove that it won’t get caught in an eddy before it reaches the ocean of friendliness (or, if you’re pessimistic, you might view the eddy as the small space of possibilities for friendly AI). Sure, computers and silicon act deterministically (for the most part), but any useful intelligence will interact with an uncertain universe. It will also have to model humans out of necessity, as humans are among the primary agents on Earth that it will need to interact with… perhaps not if it becomes all-powerful, but certainly initially. By modelling humans, it’s effectively empathising with our motivations and causing parts of our consciousness to be represented inside it[1].

Given that a machine could increase its computational capacity exponentially via Moore’s law (not to mention via potentially large investment and subsequently rapid datacenter expansion), it could eventually model many more individuals than any one human does. So if the AI held a large number of simulated human minds – which would, if accurately modelled, probably balk at killing their originals – then any actions the AI performed would likely benefit the largest number of individuals.

Or perhaps the AI would become neurotic trying to satisfy the desires and wants of conflicting opinions.

In some ways this is similar to Eliezer’s Coherent Extrapolated Volition (as I remember it, at least… it was a long time ago that I read it. I should do so again to see how/if it fits with what I’ve said here).

[1] People might claim that this won’t be an issue because digital minds designed from scratch will be able to box up individual representations to prevent a bleed-through of beliefs. Unfortunately, I don’t think this is a tractable design for AI, even if it were desirable. AI is about efficiency of computation and representation, so these concepts and beliefs will blend. Besides, conceptual blending is quite likely a strong source of new ideas and hypotheses in the human brain.