Reinforcement learning has been used extensively in robotics for many years, ranging from the simple obstacle avoidance of the Seven Dwarfs robots all the way up to Gerald Edelman's elaborate brain-based Darwin automata, which were capable of more sophisticated associative learning. This approach is predicated upon a psychological school of thought known as behaviorism, which was extremely popular for most of the 20th century but has fallen out of favor in recent decades.
In the realm of learning robotics there don't seem to have been many attempts thus far to go beyond the limited bounds of behaviorism. Classical conditioning clearly does play an important role in animals, but it's by no means the whole story. For example, young children don't learn language by being explicitly rewarded or punished every time a new concept is acquired. Much of learning seems to be self-organized: the creature has a spontaneous ability to classify and engages in a kind of synchronization with its environment, where the resulting conceptual framework converges towards stable forms, like eddies in the flow of causality.
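To make that idea concrete, here is a minimal sketch of what reward-free, self-organized classification might look like, using simple competitive learning. Everything here is illustrative rather than a proposal for a specific architecture: prototypes drift towards the inputs they best match and settle into stable forms without any reinforcement signal at all.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sensory stream: samples drawn from three hidden clusters.
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
stream = centers[rng.integers(0, 3, 3000)] + rng.normal(0, 0.5, (3000, 2))

# Competitive learning: each prototype drifts towards whatever it best
# matches. No reward signal is involved; structure emerges from the
# statistics of the input alone.
prototypes = rng.normal(0, 1, (3, 2))
lr = 0.1
for x in stream:
    winner = np.argmin(np.linalg.norm(prototypes - x, axis=1))
    prototypes[winner] += lr * (x - prototypes[winner])
    lr *= 0.999  # annealing lets the prototypes settle into stable forms

# By now the prototypes have typically settled near the hidden cluster
# centers (simple competitive learning can suffer "dead units", but with
# three prototypes and three well-separated clusters it usually converges).
print(np.round(prototypes, 2))
```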
Is human language Turing complete?
Even reinforcement learning combined with self-organization isn't sufficient to fully explain the complex behaviors we describe as "intelligent". To be capable of a wide range of adaptation, the system needs to be Turing complete in a manner which supports the generation of a population of virtual machines capable of implementing arbitrary transformations or behaviors.
Turing machines can of course be implemented in many different ways, and there are many possible systems which are computationally equivalent to them. As Chomsky and others have noted, human language exhibits qualities suggestive of Turing completeness, such as the ability to generate recursive structures. If human language is a Turing complete system, as I suspect it is, then this may explain why it is such a powerful tool for regimenting thought processes, and why it exhibits class 4 behavior in Wolfram's sense. Interactions between multiple such systems, both within a single mind and between minds, could be the way in which the problems of classical incompleteness are overcome: at the interfaces between machines undefined behavior can occur, and the systems themselves may comprise heterogeneous but symbiotic formalisms which, like the bimetallic strip of a mechanical clock, work against each other in a complementary way.
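As a concrete illustration of the recursive structures in question, here is a toy recursive grammar (the rules and vocabulary are invented for illustration). One caveat worth making explicit: a context-free grammar like this is strictly weaker than a Turing machine, so recursion of this kind is suggestive of, rather than proof of, Turing completeness; full equivalence would require something stronger, such as an unrestricted rewriting system.

```python
import random

random.seed(1)

# Toy recursive grammar: NP can embed another S inside itself, so there
# is no upper bound on the depth of structures the grammar can generate.
grammar = {
    "S":  [["NP", "VP"]],
    "NP": [["the", "N"], ["the", "N", "that", "S", "liked"]],  # recursion
    "VP": [["V", "NP"]],
    "N":  [["robot"], ["child"], ["dog"]],
    "V":  [["saw"], ["chased"]],
}

def expand(symbol, depth=0):
    if symbol not in grammar:
        return [symbol]  # terminal word
    # Bias against the recursive production as depth grows, so that a
    # random expansion always terminates.
    options = grammar[symbol][:1] if depth > 3 else grammar[symbol]
    production = random.choice(options)
    return [w for part in production for w in expand(part, depth + 1)]

print(" ".join(expand("S")))  # e.g. a sentence with center-embedded clauses
```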
The language-augmented nature of human thought is probably also the reason why reinforcement learning can have more extensive consequences in humans than in other creatures, up to and including the complete reorganization of a person's world view on the basis of one or a small number of learning instances. Instead of merely reinforcing a single isolated categorization, the reinforcing event becomes the input to a Turing machine, with consequences that are not always easy to foresee (without running the program) and which may spawn new virtual machines or erase existing ones.
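The contrast between the two regimes might be sketched like this. It is a deliberately toy model: the rules, events, and thresholds are all invented for illustration, not drawn from any real architecture.

```python
# Regime 1: behaviorist update. A reward nudges one isolated association,
# with a local and predictable consequence.
weights = {"approach_light": 0.5}

def behaviorist_update(reward, key="approach_light", lr=0.1):
    weights[key] += lr * reward

# Regime 2: the reward event is input to a program operating over the
# agent's own rule set, so a single event can rewrite many rules at once.
rules = {
    "approach_light": lambda: "move towards light",
    "flee_noise": lambda: "move away from noise",
}

def program_update(event):
    # One strongly negative event doesn't just reweight an association:
    # it can erase rules and install new ones, reorganizing behavior in
    # ways that are hard to foresee without running the program.
    if event["reward"] < -0.9 and event["context"] == "light":
        rules.pop("approach_light", None)                     # erase a machine
        rules["avoid_light"] = lambda: "keep to the shadows"  # spawn a new one

program_update({"reward": -1.0, "context": "light"})
print(sorted(rules))  # ['avoid_light', 'flee_noise']
```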
Dimensions of Affect
Another issue with reinforcement learning as typically practiced in robotics is that learning itself tends to be unidimensional. The learning rate may vary, but there is rarely more than one learning parameter. This makes it difficult to simulate phenomena such as the fading affect bias, where memories associated with positive or negative consequences are differentially weighted over time and can have multidimensional results. A possible research programme would be to apply the biologically based affect theory of Silvan Tomkins to robotics and try to replicate learning scenarios which have outcomes similar to those seen in people.
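One way such a programme might begin is with valence-dependent memory dynamics. Here is a minimal sketch assuming a simple exponential decay model with separate parameters for positive and negative affect; the decay values are arbitrary illustrations, not figures derived from Tomkins.

```python
# Two decay parameters instead of a single learning rate: affect attached
# to negative memories fades faster than affect attached to positive ones,
# mirroring the fading affect bias. The values are illustrative only.
DECAY_POSITIVE = 0.99   # positive affect fades slowly
DECAY_NEGATIVE = 0.90   # negative affect fades quickly

memories = []  # [description, affect] pairs

def store(description, affect):
    memories.append([description, affect])

def rehearse():
    """One time step: each memory's affect decays at a valence-dependent rate."""
    for m in memories:
        rate = DECAY_POSITIVE if m[1] >= 0 else DECAY_NEGATIVE
        m[1] *= rate

store("praised for finding the charger", +1.0)
store("collided with the table leg", -1.0)

for _ in range(50):
    rehearse()

for description, affect in memories:
    print(f"{description}: {affect:+.3f}")
# After 50 steps the positive memory retains far more affect (about +0.605)
# than the negative one (about -0.005): a fading-affect-style asymmetry that
# a single shared learning parameter could not produce.
```

A robot built along these lines would weight its remembered experiences differently depending on their affective sign, which is the kind of multidimensional outcome the single-parameter schemes above cannot express.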