Empathy in the machine

A draft post/idea from the archives that I thought it was about time I released. Funnily enough, this was written entirely before I started working on NetEmpathy – maybe it’s not as disconnected from AGI as I thought after all!

It is my belief that empathy is a prerequisite to consciousness.

I recently read Hofstadter’s I Am a Strange Loop, whose central theme is that recursive representations of self give rise to our perception of consciousness. For some, the idea that our consciousness is somewhat of an illusion might be hard to swallow – but then, quite likely, so are all the other qualia. They seem real to us because our minds make them real. To me, it’s not a huge hurdle to believe. I find the idea that our minds are infinitely representing themselves via self-reflection beautiful in its simplicity. Some very strange things start happening when systems begin reflecting on themselves.

For example, Gödel’s incompleteness theorem originally broke Principia Mathematica, and it can do the same to any sufficiently expressive formal system once you force that system to reason about itself. One day I’ll commit to explaining this in a post, but people have written entire books just to make Gödel’s theorem and its consequences easy to understand!
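To get a small, concrete taste of that kind of self-reference (a toy illustration in Python, not anything from Gödel’s actual construction), here is a quine – a program whose only output is its own source code, the programming analogue of a sentence that describes itself:

```python
# A quine: ignoring these comment lines, the two lines below print
# themselves verbatim. The string is a template that gets filled in
# with its own (escaped) representation -- self-reference in miniature.
s = 's = {!r}\nprint(s.format(s))'
print(s.format(s))
```

The string plays both roles at once – it is the data being printed and the description of how to print it – which is, loosely, the trick Gödel pulled with statements that encode facts about their own provability.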

And as an example of self-reflection and recursion being beautiful, I merely have to point to fractals, which exhibit self-similarity at arbitrary levels of recursion. Or perhaps the recursive, repeating hallucinations induced by psychedelics give us some clue about the recursive structures within the brain.
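To make that self-similarity concrete, here’s a small Python sketch (purely illustrative, nothing from the original post) that builds an ASCII Sierpinski triangle by stacking three half-size copies of itself:

```python
def sierpinski(depth):
    """Return the Sierpinski triangle of the given depth as rows of text.

    Each level is literally three copies of the previous level: one on
    top, two side by side underneath -- the whole shape recurs at every
    scale, which is exactly the self-similarity of a fractal."""
    if depth == 0:
        return ["*"]
    smaller = sierpinski(depth - 1)
    width = len(smaller[-1])
    top = [row.center(2 * width + 1) for row in smaller]
    bottom = [row + " " + row for row in smaller]
    return top + bottom


print("\n".join(sierpinski(4)))
```

Every call contains the whole pattern again at half the scale, so zooming into any corner of the output shows the same triangle you started with.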

Later in the book, Hofstadter also delves into slightly murky mystical waters, which I find quite entertaining and not without merit. He says that, because we model the behaviour of others, we also start representing their consciousness. The eventual conclusion, which is explained in much greater philosophical detail in his book, is that our “consciousness” isn’t just the sum of what’s in our heads but is a holistic total of ourselves and everyone’s representation of us in their heads.

I don’t think the Turing test will really be complete until a machine can model humans as individuals and make insightful comments on their motivations. Okay, so that wouldn’t formally be the Turing test any more, but as a judgement of conscious intelligence, I think the artificial agent needs at least to be able to reflect the motivations of others and to understand the representation of itself within others. Lots of recursive representations!

The development of consciousness within AI via empathy is what, in my opinion, will allow us to create friendly AI. Formal proofs won’t work due to the computational irreducibility of complex systems. In an admittedly strained analogy, this is similar to trying to formally prove where a toy sailboat will end up after dropping it in a river upstream: proving that it won’t get caught in an eddy before it reaches the ocean of friendliness (or, if you’re pessimistic, you might view the eddy as the small space of possibilities for friendly AI). Sure, computers and silicon act deterministically (for the most part), but any useful intelligence will interact with an uncertain universe. It will also have to model humans out of necessity, as humans are one of the primary agents on Earth that it will need to interact with… perhaps not if it becomes all-powerful, but certainly initially. By modelling humans, it is effectively empathising with our motivations and causing parts of our consciousness to be represented inside it[1].

Given that a machine could increase its computational capacity exponentially via Moore’s law (not to mention via potentially large investment and subsequent rapid datacenter expansion), it could eventually model many more individuals than any one human does. So if the AI held a large number of simulated human minds, which would, if accurately modelled, probably baulk at killing their originals, then any actions the AI performed would likely benefit the largest number of individuals.

Or perhaps the AI would become neurotic trying to satisfy the desires and wants of conflicting opinions.

In some ways this is similar to Eliezer’s Coherent Extrapolated Volition (as I remember it, at least… it was a long time ago that I read it, and I should do so again to see how/if it fits with what I’ve said here).

[1] People might claim that this won’t be an issue because digital minds designed from scratch will be able to box up individual representations to prevent a bleed-through of beliefs. Unfortunately, I don’t think this is a tractable design for AI, even if it were desirable. AI is about efficiency of computation and representation, so these concepts and beliefs will blend. Besides, conceptual blending is quite likely a strong source of new ideas and hypotheses in the human brain.



6 comments

#1   Schneider on 03.19.10 at 11:47 am

Quite interesting. But, as with many human beings, couldn’t we build a “depressed empathic machine”, which wouldn’t, for example, be able to cause any injury to humans?
I mean, we’re talking about “machine psychology” hehehe, we could build a not-so-healthy one :)
I think it’s quite safe then.

#2   Vladimir Nesov on 03.19.10 at 9:36 pm

A “proof of Friendliness” doesn’t let you know object-level facts about what specifically the AI will do; it only tells you the abstract fact that whatever the AI does is preferable from the point of view of humans. If you have two identical complicated programs, of which you don’t and can’t know what they compute on what possible inputs, you can still be sure that they’ll compute exactly the same thing on all occasions. Similarly, a Friendly AI needs to prefer the same course of action as humans would (on reflection), in all specific circumstances. This might be proved even if we have no idea of what exactly our Friendly AI’s preference might be.

#3   Joel on 03.20.10 at 10:46 am

@Vladimir Can you point me to the proof for your statement:

“If you have two identical complicated programs, of which you don’t and can’t know what they compute on what possible inputs, you can still be sure that they’ll compute exactly the same thing on all occasions.”

If you mean that they receive exactly the same inputs at the same times, then yes, I agree. But an intelligent agent doesn’t exist in a world with an invariant temporal chain of inputs.

So two identical programs could diverge depending on the relative ordering of inputs. Not necessarily diverge from friendliness, but in terms of eventual knowledge and belief.

#4   Vladimir Nesov on 03.21.10 at 1:11 am

Identical programs will have identical outputs on identical inputs. Thus, placed in the same situations (which include the inputs), the identical programs will do the same thing. Note that this is a theoretical relation: you don’t need to have both programs around, or do experiments on them, or have one of the programs examine the other, to know that the relation holds true.

With Friendly AI (FAI), you care about a certain relation between it and (say) a human: in each situation X (incl. all input, etc.), the FAI needs to prefer to do the same action as the human would prefer in the same situation X (on reflection, if they were both capable of understanding X fully, etc.). This is more complicated than the same behavior of two identical programs, since preferring the same thing doesn’t necessarily mean that they’ll actually do the same thing, or do the most preferred thing: when solving a difficult problem, your goal might be to find the right solution, but you’ll only be able to find some approximation to a solution, or a candidate solution that you aren’t sure is the right solution. This holds both for us stupid apes and for superintelligent AIs, given appropriately difficult problems.

This is to reply to two ideas you voiced in this post: first, you don’t need the FAI to actually interact with or observe humans, to “empathize”, in order to have a knowable property of being Friendly, at least not during the actual operation (you of course need to confer the info about humans into the FAI at the start, but this is again info about humans as systems, not magic knowledge of what they’ll actually do, or explicit info about what they will do in all possible situations). Second, uncomputability or chaotic behavior of the environment are not game-stoppers, as you only need to have a certain abstract property of the FAI as a system, which can then be left to freely develop for any unanticipated circumstance it happens to encounter. Friendliness needs to be shown in general, as a property of a system, from which Friendly behavior in the specific situation (which is not knowable in advance) follows as a special case.

You might want to read the current sequence I’m writing on my blog, to get a better idea of where I’m coming from.

#5   James MM on 03.23.10 at 12:48 am

“Hofstadter… says that, because we model the behaviour of others, we also start representing their consciousness. The eventual conclusion, which is explained in much greater philosophical detail in his book, is that our “consciousness” isn’t just the sum of what’s in our heads but is a holistic total of ourselves and everyone’s representation of us in their heads.”

This sounds to me very much like the ‘collective unconscious’ that Jung talks about—the cultural paradigm that we use to filter our ideas and actions through in daily life. We need to empathise in order to be successful and to attain happiness in our lives, and we use this paradigm as a tool of interpretation. If we hold in our minds an accurate image of the collective unconscious, then we interact in more meaningful and beneficial ways with others, and thus become more successful and achieve more happiness.

Seems only fair that any AI entity should have the same goals, using empathy as a tool to that end.

#6   Joel on 03.23.10 at 11:41 am

@Vladimir I can agree with what you’ve mentioned, in particular the reference to the system having a “preference” for friendly behaviour. As I’ve understood it in the past, there seemed to be a section of the friendly AI community that was focused on guaranteeing friendliness.

I guess it depends on whether friendliness is to be judged by actions or by intention. But as the saying goes, the road to hell is paved with good intentions.

@James I think that’s a separate but related idea, which I wouldn’t mind learning more about. The representation of other people’s consciousness is more of a two-way thing, and at a slightly more direct level than my impression of Jung’s ‘collective unconscious’ – perhaps we should call it our ‘collective consciousness’? ;-)
