Level 5 autonomy: full self-driving cars don’t need a driver’s seat, because everyone is a passenger. But deep learning’s long-tail problem means we won’t get Level 5 self-driving cars anytime soon. Level 5 is defined as follows: “The vehicle can do all the driving in all circumstances, and the human occupants are just passengers and need never be involved in driving.”
Current neural networks can at best replicate a rough imitation of the human vision system. Deep learning has distinct limits that prevent it from making sense of the world in the way humans do. Neural networks require huge amounts of training data to work reliably, and they don’t have the flexibility of humans when facing a novel situation not included in their training data.
Human drivers also need to adapt to new settings and environments, such as a new city or town, or weather conditions they haven’t experienced before (snow- or ice-covered roads, dirt tracks, heavy mist). But unlike deep learning systems, we use intuitive physics, common sense, and our knowledge of how the world works to make rational decisions in new situations. We understand causality and can determine which events cause others. We also understand the goals and intents of other rational actors in our environment and can reliably predict their next moves. For the time being, deep learning algorithms have none of these capabilities.
Interpolation and extrapolation are at the heart of any approach that tries to solve self-driving cars simply by making software improvements.
Extrapolation tries to extract rules from big data and apply them to the entire problem space. Interpolation relies on rich sampling of the problem space to calculate the spaces between samples.
Deep learning is fundamentally flawed because it can only interpolate. Deep neural networks extract patterns from data, but they don’t develop causal models of their environment. This is why they need to be precisely trained on the different nuances of the problem they are meant to solve. The human mind, on the other hand, extracts high-level rules, symbols, and abstractions from each environment, and uses them to extrapolate to new settings and scenarios without the need for explicit training.
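This distinction can be made concrete with a toy experiment (my own illustrative sketch, not from any self-driving system): a model fit by curve-fitting stands in for a neural network, trained on samples of sin(x) from one interval. It interpolates well inside that interval and fails badly just outside it.

```python
# Illustrative sketch: a model that interpolates well can still extrapolate
# badly. We fit a degree-7 polynomial to samples of sin(x) drawn from [0, 6]
# and compare its errors inside vs. outside that training range.
import numpy as np

rng = np.random.default_rng(0)
x_train = rng.uniform(0, 6, 200)    # rich sampling of the problem space
y_train = np.sin(x_train)

coeffs = np.polyfit(x_train, y_train, deg=7)  # curve fit stands in for a neural net
model = np.poly1d(coeffs)

x_interp = np.linspace(1, 5, 100)   # inside the training distribution
x_extrap = np.linspace(8, 12, 100)  # outside it: a "novel situation"

err_interp = np.max(np.abs(model(x_interp) - np.sin(x_interp)))
err_extrap = np.max(np.abs(model(x_extrap) - np.sin(x_extrap)))

print(f"max error inside training range:  {err_interp:.4f}")
print(f"max error outside training range: {err_extrap:.4f}")
```

Inside the sampled interval the error is tiny; a few units past it, the error explodes, because the model has no rule for what sin(x) *is*, only a fit to where it has been observed.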
One proposed way to improve deep learning systems is the vision-only approach: we humans, too, mostly rely on our vision system to drive. We don’t have 3D mapping hardware wired to our brains to detect objects and avoid collisions. The key here is that the human brain is a direct-fit machine: it fills the space between the data points it has previously seen. On this view, success depends on finding the right distribution of data to cover a vast area of the problem space. But I think that without some sort of abstraction and symbol manipulation, deep learning algorithms won’t be able to reach human-level driving capabilities.
There are many efforts to improve deep learning systems. One example is hybrid artificial intelligence, which combines neural networks and symbolic AI to give deep learning the capability to deal with abstractions.
Another notable area of research is “system 2 deep learning.” This approach, endorsed by deep learning pioneer Yoshua Bengio, uses a pure neural network–based approach to give symbol-manipulation capabilities to deep learning. Yann LeCun, a longtime colleague of Bengio, is working on “self-supervised learning,” deep learning systems that, like children, can learn by exploring the world by themselves and without requiring a lot of help and instructions from humans. And Geoffrey Hinton, a mentor to both Bengio and LeCun, is working on “capsule networks,” another neural network architecture that can create a quasi-three-dimensional representation of the world by observing pixels.
These are all promising directions that will hopefully integrate much-needed common sense, causality, and intuitive physics into deep learning algorithms. But they are still in the early research phase and are not nearly ready to be deployed in self-driving cars and other AI applications.
One of the arguments I hear a lot is that human drivers make a lot of mistakes too. Humans get tired, distracted, reckless, and drunk, and they cause more accidents than self-driving cars. The first part, about human error, is true. But I’m not so sure that comparing accident frequency between human drivers and AI is valid. I believe the sample size and data distribution do not paint an accurate picture yet.
But more importantly, I think comparing numbers is misleading at this point. What matters more is the fundamental difference between how humans and AI perceive the world. Our eyes receive a lot of information, but our visual cortex is sensitive to specific things, such as movement, shapes, and specific colors and textures. Through hundreds of millions of years of evolution, our vision has been honed to fulfil goals that are crucial to our survival, such as spotting food and avoiding danger. But perhaps more importantly, our cars, roads, sidewalks, road signs, and buildings have evolved to accommodate our own visual preferences. Think about the color and shape of stop signs, lane dividers, flashers, etc. We have made all these choices—consciously or not—based on the general preferences and sensibilities of the human vision system.
Therefore, while we make a lot of mistakes, our mistakes are less weird and more predictable than those of the AI algorithms that power self-driving cars. We either have to wait for AI algorithms that exactly replicate the human vision system (which I think is unlikely any time soon), or we can take other pathways to make sure current AI algorithms and hardware can work reliably.
One such pathway is to change roads and infrastructure to accommodate the hardware and software present in cars. For instance, we can embed smart sensors in roads, lane dividers, cars, road signs, bridges, buildings, and objects. This will allow all these objects to identify each other and communicate through radio signals. Computer vision will still play an important role in autonomous driving, but it will be complementary to all the other smart technology that is present in the car and its environment. This is a scenario that is becoming increasingly possible as 5G networks are slowly becoming a reality and the price of smart sensors and internet connectivity decreases.
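A minimal sketch of what such sensor fusion might look like, with hypothetical names and message fields (this is not a real V2X protocol): a smart stop sign broadcasts its identity over radio, and the car trusts the beacon when the camera is unsure (fog, glare), falling back to vision otherwise.

```python
# Hypothetical vehicle-to-infrastructure fusion sketch. The message format,
# thresholds, and function names are illustrative assumptions, not a standard.
from dataclasses import dataclass
from typing import Optional


@dataclass
class BeaconMessage:
    object_type: str   # e.g. "stop_sign", "lane_divider"
    position_m: tuple  # (x, y) relative to the receiving car, in meters
    confidence: float  # beacon's self-reported sensor health, 0..1


def detect_stop_sign(camera_confidence: float,
                     beacon: Optional[BeaconMessage]) -> bool:
    """Fuse computer vision with an infrastructure beacon, if one is present."""
    if beacon is not None and beacon.object_type == "stop_sign" \
            and beacon.confidence > 0.9:
        return True                    # the road itself reports a stop sign
    return camera_confidence > 0.8     # otherwise rely on vision alone


# Foggy day: the camera is only 40% sure, but the sign's beacon is healthy.
foggy = detect_stop_sign(0.4, BeaconMessage("stop_sign", (30.0, 2.0), 0.98))
print(foggy)  # True
```

The design point is that vision stays in the loop but no longer has to carry the long tail alone; the environment actively announces itself.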
We’re still exploring the privacy and security threats of putting an internet-connected chip in everything. An intermediate scenario is the “geofenced” approach: self-driving technology is only allowed to operate in areas where its functionality has been fully tested and approved, where there’s smart infrastructure, and where the regulations have been tailored for autonomous vehicles (e.g., pedestrians are not allowed on roads, human drivers are limited, etc.). Some experts describe these approaches as “moving the goalposts” or redefining the problem. But given the current state of deep learning, the prospect of an overnight rollout of self-driving technology is not very promising. Such measures could enable a smooth and gradual transition to autonomous vehicles as the technology improves, the infrastructure evolves, and regulations adapt.
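At its core, geofencing reduces to a containment check: is the vehicle inside an approved operating zone? A minimal sketch, assuming the zone is given as a polygon of (lat, lon) vertices and using the standard ray-casting algorithm (a real deployment would use a geospatial library and proper map data):

```python
# Minimal geofence check via ray casting. The zone coordinates below are
# made-up illustrative values, not any real operating area.
def in_geofence(point, polygon):
    """Return True if the (lat, lon) point lies inside the polygon."""
    lat, lon = point
    inside = False
    n = len(polygon)
    for i in range(n):
        lat1, lon1 = polygon[i]
        lat2, lon2 = polygon[(i + 1) % n]
        # Count crossings of a ray cast from the point in the +lon direction.
        if (lat1 > lat) != (lat2 > lat):
            cross_lon = lon1 + (lat - lat1) / (lat2 - lat1) * (lon2 - lon1)
            if lon < cross_lon:
                inside = not inside
    return inside


# A square test zone; full autonomy is only enabled inside it.
zone = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0)]
print(in_geofence((0.5, 0.5), zone))  # True: inside the approved area
print(in_geofence((2.0, 2.0), zone))  # False: hand control back to the human
```

The hard part, of course, is not the containment test but everything it gates: certifying the zone, instrumenting it, and deciding what happens at its boundary.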