How Do We Teach Robots to Move Objects?

By now, you most likely understand the basics of how AI systems function.

In essence, they are usually built to mimic how humans think and to learn on the fly. What may still be unclear is how this instinctive style of learning relates to how AIs teach themselves to move.

On a basic level, learning any sort of movement draws on skills and habits rooted in deep learning. MCube, one team working on this sort of problem, runs with the added assumption that the more accurate a system's training data is, the fewer errors it will make. In their view, the best way to improve an AI is to equip it with the best available hardware, sensors, and cameras. In doing so, the team theorizes, it will over time be able to create a truly autonomous AI system. What definitely stands in their favor is that they now have Amazon's resources at their disposal, thanks to their participation and placing in the 2017 Amazon Robotics Challenge.

With this, they seem to have achieved a reasonably high level of execution on their theory of autonomous robotic manipulation. After one of MCube's explanations of this idea, it became clear to me that understanding the parts of this theory makes it much easier to understand how AIs are taught to move. To that end, I've taken the example below of Princeton University research on learning synergies between pushing and grasping. While this work is not directly from the MCube team, they list it as a source of inspiration for what they do.

From the get-go, it's clear that to some extent, MCube's research is driven by letting its systems run on trial and error. If the AI fails to push or grasp an object as it should, it tries again and again until it succeeds. Several issues can arise with such an approach, including how to incentivize the system to continue its efforts. In a nutshell, the usual answer is to have the system chase an ideal numerical value, a reward, though this is not without its own limitations. According to this particular Princeton research team, with such an incentive comes a high level of structure, and therefore restrictions, on how the system is built.
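To make "chasing an ideal numerical value" concrete, here's a toy Python sketch of what such an incentive can look like. The 1.0/0.5 split loosely echoes the push-and-grasp reward scheme described in the Princeton paper listed in the resources below; the function itself is my own illustration, not MCube's code.

```python
# A toy scoring function for push/grasp attempts. The 1.0 / 0.5 split
# loosely echoes the Princeton paper's reward scheme; the exact values
# and structure here are illustrative, not taken from MCube's systems.

def reward(action: str, succeeded: bool) -> float:
    """Assign a numerical score to one attempt so attempts can be ranked."""
    if not succeeded:
        return 0.0   # a failed attempt earns nothing
    if action == "grasp":
        return 1.0   # a successful grasp is the real goal
    if action == "push":
        return 0.5   # a push that helps gets partial credit
    return 0.0
```

The whole incentive problem boils down to numbers like these: the system's only notion of "doing well" is driving that score upward.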

Over the course of my research in this space, I've come to a conclusion that most AI professionals seem to share: keeping AIs in a sandbox of rules leaves them ill-prepared for the real world and the quick decision-making that success in it requires. In response to this, MCube and the Princeton team that helped to inspire them have pointed to a self-supervised flavor of deep reinforcement learning built around a technique called Q-learning.

Q-learning, as used here, is really nothing more than self-supervised deep reinforcement learning with a unique rule set: the system learns a numerical value, called Q, for each action it might take in each situation it encounters. So if you understand that style of learning, you already grasp most of what matters about Q-learning. For MCube, this framework is the set of rules governing how their AI systems learn to move in various ways. Typically, when an AI is structured with this foundation in mind, it is made to look, through trial and error, for the shortest but most efficient path to its goal. It's really no accident that this kind of self-taught learning has been popularly compared to how children begin to learn to speak. They try and try again, while also mimicking others, until the result is as it should be. Language learning is not forced along by a strict set of rules; it is, rather, encouraged by the brain to progress through trial and error.
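To make that concrete, here is the textbook Q-learning update written out in Python. This is a minimal tabular sketch, not anything from MCube's codebase, and the ALPHA and GAMMA values are generic defaults I've chosen for illustration.

```python
from collections import defaultdict

ALPHA = 0.1   # learning rate: how strongly new experience revises old estimates
GAMMA = 0.9   # discount factor: rewards further away are worth less

Q = defaultdict(float)   # Q[(state, action)] -> estimated future reward

def q_update(state, action, reward, next_state, possible_actions):
    """One Q-learning step: nudge Q(state, action) toward the observed
    reward plus the best value reachable from the next state."""
    best_next = max(Q[(next_state, a)] for a in possible_actions)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
```

The discount factor is where the "shortest path" behavior comes from: because GAMMA is less than one, a reward two steps away is worth less than the same reward one step away, so routes with fewer steps naturally score higher.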

For our purposes, this idea of almost encouraging failure is most important. In any form of self-supervised learning, an AI system does not receive human-generated training data. Instead, it enters whatever environment it was built to work in and learns by doing. In an overarching sense, it fails until it achieves the desired result, while still respecting the specifications its creators have built into its behavior.
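Here is a rough sketch of that loop in Python. Every function in it is a hypothetical placeholder standing in for real cameras and actuators; the point is simply that the labels in the dataset come from the robot's own sensors, never from a person.

```python
import random

# Hypothetical stand-ins for the robot's real cameras and actuators;
# in an actual system these would talk to hardware.
def observe():            return "camera_frame"          # placeholder image
def choose_grasp(image):  return random.random()         # placeholder policy
def attempt_grasp(point): pass                           # placeholder motion
def object_lifted():      return random.random() > 0.7   # placeholder sensor check

# The self-supervised loop: the robot grades its own attempts, so the
# training labels come from sensor feedback rather than a human annotator.
dataset = []
for episode in range(1000):
    frame = observe()
    grasp_point = choose_grasp(frame)
    attempt_grasp(grasp_point)
    label = 1 if object_lifted() else 0   # success judged by the robot itself
    dataset.append((frame, grasp_point, label))
```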

A good example is the mechanism at the heart of Q-learning. As suggested above, in solving the problem it was built to solve, the AI should always look for the shortest yet most efficient path. At the same time, it is penalized for each error, so that it learns which actions do not lead toward its goal. If you find it hard to conceptualize punishing an AI, you're not alone. Understanding how it's done, however, is actually quite simple: everything is based on the numbers. Think about Google's DeepMind playing the game of Go for a moment.

It was built to chase, move after move, the numbers that maximized its chances of victory, and it succeeded. AIs like those involved in MCube's projects can learn to move objects in the same fashion. Through trial and error, they accumulate quantitative data on which actions do and do not bring them closer to the desired movement. That learning sticks because every past result nudges the weights of their neural networks. With this, we can also circle back to our earlier point about the best possible tools letting an AI generate the best possible training data on its own: cameras, sensors, and other hardware continuously record everything the AI does.
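One standard way deep reinforcement learning systems hold on to those past trials is an experience replay buffer. I'm not claiming this is MCube's exact mechanism, but it illustrates how raw sensor readings become reusable training data; the sizes and field names here are illustrative.

```python
import random
from collections import deque

replay_buffer = deque(maxlen=50_000)   # oldest trials eventually roll off

def record_trial(observation, action, reward):
    """Log one trial exactly as the cameras and sensors captured it."""
    replay_buffer.append((observation, action, reward))

def sample_batch(batch_size=32):
    """Draw a random batch of past trials to train the network on."""
    return random.sample(replay_buffer, min(batch_size, len(replay_buffer)))
```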

If this process sounds time-consuming, that's because it is. According to MCube, shortening the time it takes an AI to learn any movement depends on curtailing the uncertainty inherent in its learning. That goal seems to be what is driving most of the lab's work right now. In the end, their success will hinge on the reliability of the Q-learning framework, which you can expect us to dig into in future posts.

Resources:

Primary Source: https://news.mit.edu/2018/teaching-robots-how-move-objects-maria-bauza-0613

MCube Main Site: http://mcube.mit.edu/

Princeton Study: https://arxiv.org/pdf/1803.09956.pdf

Cornell Studies:

https://arxiv.org/abs/1808.03246

https://arxiv.org/abs/1710.01330

Self-Supervised Deep Learning and Q-Learning:

https://hackernoon.com/self-supervised-learning-gets-us-closer-to-autonomous-learning-be77e6c86b5a

https://medium.freecodecamp.org/an-introduction-to-q-learning-reinforcement-learning-14ac0b4493cc
