How Neural Networks Power Robots at Starship | by Tanel Pärnamaa | Starship Technologies
Starship is building a fleet of robots to deliver packages locally on demand. To achieve this, robots need to be safe, polite, and fast. But how do you get there with low computing resources and without expensive sensors such as LIDARs? This is the technical reality that you need to tackle, unless you live in a world where customers willingly pay $ 100 for a delivery.
To begin with, the robots start by detecting the world with radars, a multitude of cameras and ultrasound.
However, the challenge is that most of this knowledge is low level and not semantic. For example, a robot may sense that an object is ten meters away, but without knowing the object category, it is difficult to make careful driving decisions.
Machine learning through neural networks is surprisingly useful in converting this unstructured low-level data into higher-level information.
Spaceship robots mainly drive on sidewalks and side streets when they need to. This poses a different set of challenges compared to self-driving cars. Traffic on automobile roads is more structured and predictable. Cars move along the lanes and don’t change direction too often while humans frequently stop suddenly, meander, may be accompanied by a dog on a leash, and do not signal their intentions with turn signals.
To understand the surrounding environment in real time, a central component of the robot is an object detection module – a program that captures images and returns a list of boxes of objects.
That’s great, but how do you write such a program?
An image is a large three-dimensional array made up of a myriad of numbers representing the intensity of the pixels. These values change significantly when the image is taken at night instead of day; when the color, scale or position of the object changes, or when the object itself is truncated or occluded.
For some complex problems, teaching is more natural than programming.
In the robot software we have a set of trainable units, mainly neural networks, where the code is written by the model itself. The program is represented by a set of weights.
At first, these numbers are initialized randomly, and the program’s output is also random. Engineers present sample models of what they would like to predict and ask the network to improve the next time it sees a similar entry. By changing the weights iteratively, the optimization algorithm searches for programs that more and more accurately predict bounding boxes.
However, careful consideration should be given to the examples used to train the model.
- Should the model be penalized or rewarded when it detects a car in a reflection of window?
- What should he do when he detects an image of a human being on a poster?
- Should a car trailer full of cars be annotated as one entity or should each of the cars be annotated separately?
These are all examples that occurred during the construction of the object detection module in our robots.
When learning a machine, big data is simply not enough. The data collected must be rich and varied. For example, using only uniformly sampled images and then annotating them would show many pedestrians and cars, but the model would lack examples of motorcycles or skaters to reliably detect these categories.
The team should specifically look for difficult examples and rare cases, otherwise the model would not progress. Starship operates in several different countries and varying weather conditions enrich the set of examples. Many people were surprised when the Starship delivery robots ran during the snowstorm ‘Emma’ UK, however, airports and schools remained closed.
At the same time, annotating data takes time and resources. Ideally, it is better to train and improve models with less data. This is where architectural engineering comes in. We encode prior knowledge into architecture and optimization processes to narrow search space to programs that are more likely to be in the real world.
In some computer vision applications such as pixel level segmentation, it is useful for the model to know whether the robot is on a sidewalk or a railway crossing. To provide a clue, we encode global clues at the image level into the architecture of the neural network; the model then determines whether to use it or not without having to learn it from scratch.
After data and architecture engineering, the model might work well. However, deep learning models require significant computing power, and this is a big challenge for the team as we cannot take advantage of the more powerful graphics cards on the low cost delivery robots powered by drums.
Starship wants our shipments to be low cost, which means our equipment has to be inexpensive. This is the same reason Starship does not use LIDAR (a detection system that works on the principle of radar, but uses light from a laser) which would make it much easier to understand the world – but we don’t want to that our customers pay more. than they need for delivery.
State-of-the-art object detection systems published in academic papers operate at around 5 frames per second [MaskRCNN], and real-time object detection papers do not report rates significantly above 100 FPS [Light-Head R-CNN, tiny-YOLO, tiny-DSOD]. In addition, these figures are shown on a single image; however, we need a 360 degree understanding (the equivalent of processing about 5 individual images).
To give perspective, the Starship models run at over 2000 FPS when measured on a consumer GPU and process a full 360 degree panoramic image in a single forward pass. This equates to 10,000 FPS when processing 5 individual images with a batch size of 1.
Neural networks are better than humans at many visual issues, although they can still contain bugs. For example, a bounding box may be too wide, the trust too low, or an object may be hallucinated in a place that is actually empty.
Fixing these bugs is a challenge.
Neural networks are seen as black boxes that are difficult to analyze and understand. However, to improve the model, engineers must understand the failure cases and delve into the details of what the model has learned.
The model is represented by a set of weights, and one can visualize what each specific neuron is trying to detect. For example, the first layers of the Starship network activate in standard patterns such as horizontal and vertical edges. The next block of layers detects more complex textures, while the upper layers detect car parts and complete objects.
Technical debt takes on another meaning with machine learning models. Engineers are constantly improving architectures, optimization processes, and datasets. The model thus becomes more precise. However, changing the detection model for a better one does not necessarily guarantee the success of a robot’s overall behavior.
There are dozens of components that use the output of the object detection model, each of which requires different precision and level of recall that are set based on the existing model. However, the new model can act differently in a number of ways. For example, the exit probability distribution could be biased towards higher values or be wider. Even though the average performance is better, it can be worse for a specific group like big cars. To avoid these obstacles, the team calibrates the probabilities and checks the regressions on several sets of stratified data.
Monitoring trainable software components poses a different set of challenges than monitoring standard software. Little concern is given regarding inference time or memory usage as they are mostly constant.
However, moving the dataset becomes the primary concern: the distribution of data used to train the model is different from where the model is currently deployed.
For example, all of a sudden, electric scooters can be driven on sidewalks. If the model did not take this class into account, the model will have trouble classifying it correctly. The information derived from the object detection module will disagree with other sensory information, which will result in a request for assistance from human operators and thus a slowdown in deliveries.
Neural networks allow Starship robots to be safe on level crossings avoiding obstacles like cars, and on sidewalks understanding all the different directions humans and other obstacles may choose to go.
Spacecraft robots achieve this using inexpensive hardware that poses a lot of engineering challenges, but makes robot deliveries a reality today. Starship robots make real deliveries seven days a week to cities around the world, and it is gratifying to see how our technology is continually bringing more convenience to people in their lives.
Please feel free to contact us for more detail about us, visiting our Contact page.