Documente Academic
Documente Profesional
Documente Cultură
highlights
• An end-to-end learning system for UGV indoor corridor tracking using camera and LiDAR.
• A deep learning framework for sensor fusion that is robust to sensor failure.
• Gating based dropout regularization to enable robustness to various data corruptions.
• Experimental demonstration of the efficacy of the proposed system on our in-house UGV.
article info a b s t r a c t
Article history: In this paper, we introduce a novel methodology for fusing sensors and improving robustness to
Received 6 July 2018 sensor failures in end-to-end learning based autonomous navigation of ground vehicles in unknown
Received in revised form 22 January 2019 environments. We propose the first learning based camera–LiDAR fusion methodology for autonomous
Accepted 7 March 2019
in-door navigation. Specifically, we develop a multimodal end-to-end learning system, which maps raw
Available online 18 March 2019
depths and pixels from LiDAR and camera, respectively, to the steering commands. A novel gating based
Keywords: dropout regularization technique is introduced which effectively performs multimodal sensor fusion
Robustness to sensor failures and reliably predicts steering commands even in the presence of various sensor failures. The robustness
Deep learning for autonomous navigation of our network architecture is demonstrated by experimentally evaluating its ability to autonomously
Vision/LiDAR based navigation navigate in the indoor corridor environment. Specifically, we show through various empirical results
Learning from demonstration
that our framework is robust to sensor failures, partial image occlusions, modifications of the camera
Sensor fusion
image intensity, and the presence of noise in the camera or LiDAR range images. Furthermore, we show
Autonomous vehicles
that some aspects of obstacle avoidance are implicitly learned (while not being specifically trained for
it); these learned navigation capabilities are shown in ground vehicle navigation around static and
dynamic obstacles.
© 2019 Published by Elsevier B.V.
https://doi.org/10.1016/j.robot.2019.03.001
0921-8890/© 2019 Published by Elsevier B.V.
N. Patel, A. Choromanska, P. Krishnamurthy et al. / Robotics and Autonomous Systems 116 (2019) 80–97 81
ω ω
ton = 1.5 + 0.5 × . (2)
100
The mapping from differential PWM command to angular
velocity of the vehicle has its own dynamics due to power ampli-
fiers, motor dynamics, and gear train and ground friction. Hence,
the plots are in terms of the percentage of the differential PWM
from its neutral value (0% corresponds to no turn, i.e., moving
straight forward) given to the motors driving the wheels on the
opposite sides of the vehicle. For example, the vehicle’s angular
velocity corresponding to differential PWM commands (steering
Fig. 4. Reconstructed ground vehicle trajectory using monocular ORB-SLAM
commands) 4%, 8%, and 16% in steady state are 0.15, 0.56, and
(yellow) [65] and ORB-SLAM2 (black) [64] (with camera and LiDAR) tracking
the center (blue) of a rectangular indoor corridor environment (anti-clockwise 1.89 rad/s.
trajectory) . (For interpretation of the references to color in this figure legend,
the reader is referred to the web version of this article.) 5. Proposed sensor fusion framework
Table 1
NetEmb: Deep learning based modality fusion architecture using embeddings. The left side of the table is for processing of the RGB image from the camera and
the right side of the table is for processing of the depth range image from the LiDAR. The feature vectors (of length 512) constructed from camera and LiDAR are
concatenated at layer 24.
Layer name Layer input (for Layer output Kernel Stride No. Layer name Layer input (for Layer output Kernel Stride No.
RGB image) size kernels LiDAR range size kernels
image)
1 Spatial 3 × 120 × 160 16 × 120 × 160 3 × 3 1 16 Spatial 1 × 900 × 16 16 × 900 × 16 3 × 3 1 16
convolution convolution
2 Rectified linear 16 × 120 × 160 16 × 120 × 160 – – – Rectified linear 16 × 900 × 16 16 × 900 × 16 – – –
unit unit
3 Spatial 16 × 120 × 160 16 × 120 × 160 3 × 3 1 16 Spatial 16 × 900 × 16 16 × 900 × 16 3 × 3 1 16
convolution convolution
4 Rectified linear 16 × 120 × 160 16 × 120 × 160 – – – Rectified linear 16 × 900 × 16 16 × 900 × 16 – – –
unit unit
5 Max pooling 16 × 120 × 160 16 × 60 × 80 2 × 2 2 – Max pooling 16 × 900 × 16 16 × 450 × 8 2 × 2 2 –
6 Spatial 16 × 60 × 80 32 × 60 × 80 3 × 3 1 32 Spatial 16 × 450 × 8 32 × 450 × 8 3 × 3 1 32
convolution convolution
7 Rectified linear 32 × 60 × 80 32 × 60 × 80 – – – Rectified linear 32 × 450 × 8 32 × 450 × 8 – – –
unit unit
8 Spatial 32 × 60 × 80 32 × 60 × 80 3 × 3 1 32 Spatial 32 × 450 × 8 32 × 450 × 8 3 × 3 1 32
convolution convolution
9 Rectified linear 32 × 60 × 80 32 × 60 × 80 – – – Rectified linear 32 × 450 × 8 32 × 450 × 8 – – –
unit unit
10 Max pooling 32 × 60 × 80 32 × 30 × 40 2 × 2 2 – Max pooling 32 × 450 × 8 32 × 225 × 4 2 × 2 2 –
11 Spatial 32 × 30 × 40 48 × 30 × 40 3 × 3 1 48 Spatial 32 × 225 × 4 48 × 225 × 4 3 × 3 1 48
convolution convolution
12 Rectified linear 48 × 30 × 40 48 × 30 × 40 – – – Rectified linear 48 × 225 × 4 48 × 225 × 4 – – –
unit unit
13 Spatial 48 × 30 × 40 48 × 30 × 40 3 × 3 1 48 Spatial 48 × 225 × 4 48 × 225 × 4 3 × 3 1 48
convolution convolution
14 Rectified linear 48 × 30 × 40 48 × 30 × 40 – – – Rectified linear 48 × 225 × 4 48 × 225 × 4 – – –
unit unit
15 Max pooling 48 × 30 × 40 48 × 15 × 20 2 × 2 2 – Max pooling 48 × 225 × 4 48 × 113 × 2 2 × 2 2 –
16 Spatial 48 × 15 × 20 64 × 15 × 20 3 × 3 1 64 Spatial 48 × 113 × 2 64 × 113 × 2 3 × 3 1 64
convolution convolution
17 Rectified linear 64 × 15 × 20 64 × 15 × 20 – – – Rectified linear 64 × 113 × 2 64 × 113 × 2 – – –
unit unit
18 Spatial 64 × 15 × 20 64 × 15 × 20 3 × 3 1 64 Spatial 64 × 113 × 2 64 × 113 × 2 3 × 3 1 64
convolution convolution
19 Rectified linear 64 × 15 × 20 64 × 15 × 20 – – – Rectified linear 64 × 113 × 2 64 × 113 × 2 – – –
unit unit
20 Max pooling 64 × 15 × 20 64 × 8 × 10 2 × 2 2 – Max pooling 64 × 113 × 2 64 × 57 × 1 2 × 2 2 –
21 Flatten 64 × 8 × 10 5120 – – – Flatten 64 × 57 × 1 3648 – – –
22 Fully connected 5120 512 – – – Fully connected 3648 512 – – –
23 Rectified linear 512 512 – – – Rectified linear 512 512 – – –
unit unit
24 Concatenate 512,512 1024 – – – – – – – – –
25 Fully connected 1024 32 – – – – – – – – –
26 Rectified linear 32 32 – – – – – – – – –
unit
27 Fully connected 32 10 – – – – – – – – –
28 Rectified linear 10 10 – – – – – – – – –
unit
29 Fully connected 10 1 – – – – – – – – –
Table 2
NetConEmb: Fusion architecture where the convolution feature maps are directly passed through
a fully connected network instead of first converting them into feature embeddings as done in
NetEmb. The first 20 layers and layers 25 to 29 are identical to NetEmb and the layers in bold are
the unique part of the network.
Layer name Layer input Layer output Layer name Layer input Layer output
1 .. 20 Same as Table 1
21 Flatten 64 × 8 × 10 5120 Flatten 64 × 57 × 1 3648
22 Concatenate 5120,3648 8768 – – –
23 Fully connected 8768 1024 – – –
24 Rectified linear unit 1024 1024 – – –
25 Fully connected 1024 32 – – –
26 Rectified linear unit 32 32 – – –
27 Fully connected 32 10 – – –
28 Rectified linear unit 10 10 – – –
29 Fully connected 10 1 – – –
N. Patel, A. Choromanska, P. Krishnamurthy et al. / Robotics and Autonomous Systems 116 (2019) 80–97 85
Table 3
NetGated: Fusion architecture with gating mechanism based on computing scalar weights from the feature embeddings and then
constructing a combination of the feature embeddings based on the scalar weights. The first 24 layers are identical to NetEmb and
layers in bold are the unique part of the architecture.
Layer name Layer input Layer output Layer name Layer input Layer output
1 .. 20 Same as Table 1
21 Flatten 64 × 8 × 10 5120 Flatten 64 × 57 × 1 3648
22 Fully connected 5120 512 Fully connected 3648 512
23 Rectified linear unit 512 512 Rectified linear unit 512 512
24 Concatenate 512,512 1024 – – –
25 Fully connected 1024 64 – – –
26 Rectified linear unit 64 64 – – –
27 Fully connected 64 2 – – –
28 Split 2 1,1 – – –
29 Multiplication with output 23 1 512 Multiplication with output 23 1 512
30 Addition 512,512 512 – – –
31 Fully connected 512 32 – – –
32 Rectified linear unit 32 32 – – –
33 Fully connected 32 1 – – –
Table 4
NetGatedDropout: Fusion architecture with gating mechanism based on computing scalar weights from the feature embeddings and
then constructing a combination of the feature embeddings based on the scalar weights. This network architecture is the same as
NetGated except that one additional layer (layer 28 shown in bold) is introduced.
Layer name Layer input Layer output Layer name Layer input Layer output
1 .. 20 Same as Table 1
21 Flatten 64 × 8 × 10 5120 Flatten 64 × 57 × 1 3648
22 Fully connected 5120 512 Fully connected 3648 512
23 Rectified linear unit 512 512 Rectified linear unit 512 512
24 Concatenate 512,512 1024 – – –
25 Fully connected 1024 64 – – –
26 Rectified linear unit 64 64 – – –
27 Fully connected 64 2 – – –
28 Dropout (with p = 0.5) 2 2 – – –
29 Split 2 1,1 – – –
30 Multiplication with output 23 1 512 Multiplication with output 23 1 512
31 Addition 512, 512 512 – – –
32 Fully connected 512 32 – – –
33 Rectified linear unit 32 32 – – –
34 Fully connected 32 1 – – –
In NetGated, the feature embedding architecture is similar to generalized Bernouilli distribution) are 2N − 1 for N sensors as
NetEmb. The embeddings are passed through a gating network we do not include the configuration corresponding to when both
to fuse the information from both the modalities to generate the modalities are turned off.
steering command. The gating network takes the two embeddings
obtained from RGB image and range image as input and outputs 5.2. Implementation and training
the corresponding two weights. These weights are then used
to perform a weighted sum of the embeddings. This weighted
The inputs to the networks are the normalized RGB image with
sum is then passed through two fully connected networks to
a field of view of 72◦ and the LiDAR range image which is cropped
obtain the steering command. Each of the considered network
such that the front half with a field of view of 180◦ is visible.
architectures is an end-to-end deep learning system that takes
Both the modalities are normalized by making each channel of
an RGB image and a LiDAR depth range image as input and
fuses the modalities using a deep neural network to predict the modality in the training dataset zero mean with a standard
the appropriate steering command of the ground vehicle for deviation of 1. At testing time, the mean and standard deviation
autonomous navigation. NetGated has 4,802,945 parameters. In calculated during training are used to normalize the input.
NetGatedDropout, we add an extra dropout layer to the two To train the networks, camera and LiDAR datasets were ob-
weights which randomly makes them zero during training. This tained by manually driving the vehicle (with constant speed)
essentially regularizes the network by making the network not through the corridor environment obtaining approximately the
dependent on necessarily having both the modalities to predict same amount of training data for straight motion, left turns,
the steering command and thus essentially results in the network and right turns. The network was trained on a dataset of 14 456
learning to perform robust sensor fusion. The proposed dropout images and the corresponding range images (around 24 min of
regularization is different from the usually utilized dropout. The data). It was trained using Adagrad optimizer with a base learning
usually utilized dropout layer is based on Bernouilli distribution rate of 0.01. Bias terms for all the layers in the networks are
whereas the proposed dropout regularization is based on gen- disabled. The network is implemented using the PyTorch frame-
eralized Bernoulli distribution where a configuration is selected work [66] and trained using an NVIDIA Titan X Pascal GPU based
based on the given probability. The number of configurations (for workstation.
86 N. Patel, A. Choromanska, P. Krishnamurthy et al. / Robotics and Autonomous Systems 116 (2019) 80–97
Fig. 5. NetGatedDropout: Our proposed end-to-end learning based architecture for fusion of camera and LiDAR sensors.
Our end-to-end learning framework learns to predict the ap- also compare the performance of the original NetGated, Net-
propriate steering command by learning the weights of the net- Gated retrained with training procedure described in the previ-
work which minimize the Huber loss between the predicted ous section, and NetGatedDropout for various conditions of the
steering commands and the recorded human steering commands. modalities (e.g., sensor data corruption).
We use the Huber loss instead of mean squared error since an
instability due to the divergence of the gradients was noted with 6.1. Performance of different network architectures
mean squared error loss. The Huber loss [67] for bounding box
regression and is given by To evaluate the performance of the original architectures
⎧
1 (namely NetEmb, NetConEmb, and NetGated as described in Ta-
⎨ (y − f (x))2 ,
⎪ for ∥y − f (x)∥ ≤ δ bles 1, 2, and 3, respectively), steering command predictions of
L(y, f (x)) = 2 (3) each network were compared with the steering commands of
⎩ ∥y − f (x)∥ − 1 δ 2 ,
⎪
other w ise. a human operator. To evaluate the performance of the above
2 fusion architecture impartially, all the networks have similar
We follow the training procedure explained in our previous structure except for their respective fusion mechanisms. This
work [8] for NetEmb, NetConEmb, and NetGated. To train the evaluation was done using a different dataset in a different cor-
network to utilize either sensors when available and also to be ridor environment (test dataset) than the one used for training.
robust to the possibility of sensor failure, the training of these The results of each of the architectures compared to the human
networks was performed in two stages. In the first stage of operator are shown in Fig. 6 where the steering commands
training, the network is trained with the corresponding LiDAR given by a human operator (during teleoperation of the UGV)
depth range images and camera RGB images for each time step are denoted as the ground truth. Additionally, Fig. 7 also shows
as inputs. In the second stage, the training of the network is the error (between network steering command prediction and
continued with corrupted data (i.e., with one modality shut down ground truth) frequency plots of the steering command to better
to mimic sensor failure). Specifically, the network is trained with interpret the performance of each architecture. The left and
60% corrupted data for each epoch out of which 50% of the data right plot of both the figures show clockwise (right turns) and
is with the camera shut off (i.e., zero values for all elements in counterclockwise (left turns) navigations of a complete floor of a
the RGB image) and 50% is with the LiDAR shut off. corridor environment.
Due to the architecture of NetGatedDropout which consists of As shown in Figs. 6 and 7, the utilization in NetEmb of an
the dropout layer, modality embeddings are randomly dropped equal-size embedding (constructed using a fully connected layer)
(to zero) which is similar to randomly shutting down one of the for each modality after the last convolution layer provides better
networks. As a result, retraining is not required for the NetGated- performance than NetConEmb as hypothesized in Section 5.1. The
Dropout architecture and it learns to fuse modalities through NetEmb architecture performs better when one of the modalities
end-to-end learning. The probability of dropping either weight is is switched off and also oscillates less compared to the Net-
set to 50% for the dropout layer. ConEmb architecture. As discussed in Section 5.1, for NetConEmb,
It is seen in Section 6 that the proposed network architecture the number of features for the camera after the last convolution
provides robust performance under sensor failures and various layer is much larger than for the LiDAR. This causes the output to
data corruptions and implicitly learns to use the relevant infor- become more dependent on one modality resulting in unbalanced
mation from both modalities to generate steering commands. We fusion. As a result, the steering commands oscillate more, similar
compare the NetGated and NetGatedDropout networks trained to the behavior of the camera-only network. Motivated by the
only on the original dataset and the NetGated network retrained observations above, fully connected layer based embeddings for
with the corrupted dataset and show that the NetGatedDropout each modality were also used in the NetGated architecture. An
network provides superior performance than both the original additional advantage of using an equal-size embedding for each
NetGated network and the retrained NetGated network (retrained modality is that it is then easier and more natural to fuse the
with the corrupted dataset) when one of the sensor modalities embeddings by the learned gated weights by simply taking a
fails or is partially occluded or noisy. Also, the NetGatedDropout weighted linear combination. As shown in Fig. 6, the NetGated
network retains the performance of the original/retrained Net- architecture based network learns to move straight with fewer
Gated networks when both the sensors are present. oscillations than even the human operator. The fusion of camera
and LiDAR results in a smoother output than a LiDAR-only system
6. Experimental studies as shown in Fig. 6.
Since a desirable characteristic of motion in the indoor corri-
In this section, experimental results are presented for the de- dor environment is that the ground vehicle should approximately
scribed architectures (NetEmb, NetConEmb, NetGated, NetGated- track the center of the corridor and should not come too close
Dropout), which are trained using both camera and LiDAR. We to walls when turning, a useful metric for the performance of
N. Patel, A. Choromanska, P. Krishnamurthy et al. / Robotics and Autonomous Systems 116 (2019) 80–97 87
Fig. 6. Steering command predictions of NetEmb, NetConEmb, and NetGated architecture when both camera and LiDAR are working.
Fig. 7. Error frequency plots for steering commands of NetEmb, NetConEmb, and NetGated architecture when both camera and LiDAR are working.
the system is the distance of the vehicle to the left side and Table 5
Standard deviation of minimum distances to the wall for a clockwise trajectory
right side walls/objects. The closest distances on the left and
under fully autonomous mode.
right sides vary quite significantly even for an ‘‘ideal’’ motion
Network type Network Left wall distance Right wall distance
due to several objects such as trash cans, boxes, empty spaces input standard deviation standard deviation
and open office doors at some locations. To remove such ‘‘noise’’ (in m) (in m)
effects, an effective performance metric is the standard deviation 1 NetConEmb Camera and 0.48 0.43
(rather than mean) of distances to the left side and right side network LiDAR
2 NetEmb Camera and 0.38 0.35
walls/objects. In addition to standard deviation, Fig. 8 shows the network LiDAR
histogram plots of left and right distances (computed from LiDAR 3 NetGated Camera and 0.32 0.31
range images) for human teleoperation and different architecture. network LiDAR
4 Human Camera and 0.34 0.3
These standard deviations were recorded under fully
Teleoperation LiDAR
autonomous mode (i.e., with the network providing the com-
mands to the autopilot) with the different networks for both
clockwise and counterclockwise directions. The measured stan-
dard deviations for a clockwise motion through the building 6.2. Saliency map visualization
corridor environment (one complete floor of the building) are
shown in Table 5 and it is noted that the NetGated network A saliency analysis of the proposed framework was performed
architecture provides the best (lowest) standard deviation; a sim- to determine which parts of the camera image and LiDAR range
ilar observation was also noted for a counterclockwise motion. image were important for prediction. These salient features of the
NetGatedDropout performs similarly to NetGated in the absence image can be visualized as a grayscale image (as shown in Figs. 9
of sensor noise/dropouts. The performance of NetGatedDropout and 10) with the same dimension as the input. For both camera
image and LiDAR range image, saliency maps are shown in Figs. 9
is shown in Table 6.
and 10. The brightness of each pixel of the saliency map is directly
For all the considered network architectures, it is noted that
proportional to its importance in determining the output of the
the system trained on a dataset with both camera and LiDAR
network.
data is not directly robust to the possibility of a sensor failing
The saliency map for a particular input is computed through
(i.e., only one sensor modality available and the other zeroed
gradient of the output with respect to the input as it determines
out). For example, NetGated places much more trust on the the change in output value with respect to the change in input.
LiDAR input than on the camera input and does not provide any The saliency maps were computed based on the guided back-
reasonable performance in the event of a LiDAR failure. Hence, in propagation method [68,69] where during backpropagation for
order to achieve robustness to sensor failure, we introduce the computing gradients, only positive gradients for positive acti-
training strategy described in Section 5.2 to continue retraining vations are propagated. Thus, to compute the saliency map for
of the network with corrupted data generated by synthetically each input, the ReLU activations are determined in the forward
turning off either of the two modalities. The performance of the pass and positive gradients are determined during the back-
retrained NetGated network (after retraining with this corrupted propagation. Next, using these positive gradients and positive
data based technique) is compared in Section 6.3 with the original activation as switches in backpropagation, the gradient of the
trained NetGated network and the human operator. output with respect to the inputs are determined. These gradients
88 N. Patel, A. Choromanska, P. Krishnamurthy et al. / Robotics and Autonomous Systems 116 (2019) 80–97
Fig. 8. Histogram of distances of the ground vehicle from the left wall and right wall for trajectories of different networks and human teleoperation in the corridor
environment.
Fig. 9. Saliency map visualization of activations of the NetGatedDropout architecture for various camera images. We have camera image inputs for the UGV in
various scenarios on the top and their corresponding saliency maps in the bottom.
are visualized by normalizing it through subtracting and dividing compared with ground truth. Fig. 12 shows the error frequency
the minimum and maximum gradient value from each pixel. plots for the above conditions. We simulate the conditions of
In the top row of Fig. 14, we have the camera image inputs camera or LiDAR being shut off by turning all the pixels of that
for the UGV in various scenarios and the corresponding activation modality to zero. The left and right plot of both the figures
saliency maps in the bottom. The most salient parts of the images show clockwise (right turns) and counterclockwise (left turns)
are near the intersection of walls and floors and edges of objects navigations of a complete floor of a corridor environment.
in the corridor environment like trashcans and doors. The LiDAR The performance characteristics (in terms of trajectory tra-
range images (during straight, left and right turns) and their versed) of the retrained NetGated network and NetGatedDropout
corresponding saliency maps are shown in Fig. 15 in the left and were also evaluated (under the possibilities of both camera and
right column respectively. As shown in the figure, the salient LiDAR available, only camera available, and only LiDAR available)
parts of the range image are around the wall and obstacles near using the distance standard deviation based metric as discussed
the UGV. Thus, from Figs. 14 and 15 it can be inferred that the above under fully autonomous operation of the UGV (Fig. 13). It is
proposed framework intuitively attends to nearby walls, floor and noted in Table 6 that both the NetGatedDropout and the retrained
obstacles to decide the output. NetGated network achieve autonomous navigation through the
corridor although the camera-only and LiDAR-only situations pro-
6.3. Performance of networks during data corruption vide lower performance (i.e., higher distance standard deviation)
than the camera + LiDAR situation. We also observe that Net-
We compare the performance of the original NetGated archi- GatedDropout has lower standard deviation than the retrained
tecture, the NetGated architecture when retrained with corrupted NetGated network in all the cases.
data as explained in 5.2, and the NetGatedDropout architecture We also compare the predictions of all three networks (Net-
for various conditions of sensor failures as described below. Gated, NetGatedDropout, and retrained NetGated) with the hu-
man teleoperation based ground truths under the conditions of
6.3.1. Either modality turned off only camera working, only LiDAR working, and both modalities
As shown in Figs. 11 and 12, when both camera and LiDAR working using root mean squared error and discretized accu-
are working, all the network architectures perform well and racy metric. As discussed before, all three network architectures
have very similar performance; but, when one of the modalities predict the duty cycle of the PWM signal, which in turn is con-
is shut off, the original NetGated architecture gives erroneous verted to the steering command. To compare ‘‘correctness’’ of
predictions. In Fig. 11, the predictions of the various architectural the predicted outputs, we discretize the outputs as described
frameworks when only camera is working, only LiDAR is working, below. We set a threshold of 10% to differentiate between straight
and when both sensor modalities are working are shown and movement, left turn, and right turn. A duty cycle between −10%
N. Patel, A. Choromanska, P. Krishnamurthy et al. / Robotics and Autonomous Systems 116 (2019) 80–97 89
Fig. 10. Saliency map visualization of the LiDAR range image for the activations of the NetGatedDropout network. We have the LiDAR range image input for the
UGV in three different scenarios on the left and their corresponding saliency maps on the right.
Fig. 11. Steering command predictions using the NetGated and NetGatedDropout architectures trained only with the camera + LiDAR dataset and the NetGated
retrained with corrupted data under cases of only camera working (top), only LiDAR working (middle), and both camera and LiDAR working (bottom).
and 10% corresponds to the ground vehicle moving straight. If 6.3.2. Partial occlusion
the duty cycle is greater than 10%, then it is equivalent to a right One of the most common instances of failures in vision and
LiDAR-based navigation is when there are partial occlusions of
turn and if the duty cycle is less than −10%, it is equivalent to a
camera or LiDAR range image. Our framework takes into consid-
left turn. Thus, by using the above methodology, the predictions eration performance of the network when either of the modalities
and ground truth are discretized into left/straight/right. The dis- is partially occluded. We show that our proposed framework pro-
cretized accuracy metrics of all three networks on the test dataset vides robustness to partial occlusion of the sensor data without
ever being specifically trained for it.
are shown in Table 7. The root mean squared error between
For a camera image, various types of partial occlusions were
the network predictions and human teleoperation ground truth considered and some of the worst-case scenarios are shown in the
for the same trajectory are shown in Table 8. It is observed last two images of the bottom row of Fig. 14. To test our network,
from the tables that NetGatedDropout performs better than both we selected a part of the camera image and made the values of
the pixels in that part zero. The steering command predictions
the retrained NetGated and the original NetGated architecture.
compared to the ground truth on a test dataset in a different
The NetGated architecture fails to predict the steering command corridor environment for the three networks (NetGated, retrained
when either of the modalities is not present. NetGated, and NetGatedDropout) are shown in the top row of
90 N. Patel, A. Choromanska, P. Krishnamurthy et al. / Robotics and Autonomous Systems 116 (2019) 80–97
Fig. 12. Error frequency plots for steering commands of the NetGated and NetGatedDropout architectures trained only with the camera + LiDAR dataset and the
NetGated retrained with corrupted data under cases of only camera working (top), only LiDAR working (middle), and both camera and LiDAR working (bottom).
Fig. 13. Distances of the ground vehicle from the left wall and right wall (in m) for counterclockwise (left turns) navigations in the corridor environment for
NetGatedDropout architecture.
Fig. 16. The top row of Fig. 17 shows the error frequency metric To empirically verify the results, we also compare the predic-
for the same. As shown in the figures, the NetGatedDropout tions of all three networks with the ground truth from human
architecture performs the best out of the three architectures. teleoperation using discretized accuracy metric and root mean
The performance of the three architectures was also tested squared error metric as explained in the previous subsection. The
for partially occluded LiDAR range image and some of the test results are shown in Tables 7 and 8 from which we can ascertain
that the NetGatedDropout architecture performs better than both
cases of occlusion are shown in the middle row of Fig. 15. Partial
the retrained NetGated and original NetGated architecture when
occlusion is simulated for a LiDAR range image during testing by
the camera image or the range image is partially occluded.
making the pixels of a part of the range image zero. We observe
from the graphs in the bottom row of Fig. 16 and error frequency 6.3.3. Camera image with varying image formations
graph in Fig. 17 that both NetGated and the retrained NetGated As shown in the top row of Fig. 14, the pixel intensities of the
fail to give accurate steering command predictions on the test camera image are varied by changing brightness, contrast, and
dataset and only NetGatedDropout is able to correctly predict saturation. The brightness of the image is varied by alpha blend-
steering commands. ing the original camera image with an image with all its pixels
N. Patel, A. Choromanska, P. Krishnamurthy et al. / Robotics and Autonomous Systems 116 (2019) 80–97 91
Fig. 14. Examples of various types of data corruption for a camera image. In the top row, we have the original camera image, image with brightness changed, image
with varying contrast, and image with varying saturation, respectively. In the bottom row, we have the image with additive random noise, multiplicative random
noise, and partially occluded images in different directions, respectively.
Fig. 15. Examples of various types of data corruption for a LiDAR range image. In the top row, we have the original LiDAR range image. We have partially occluded
LiDAR range image in various directions in the second row and image with additive random noise and multiplicative random noise in the bottom row.
Fig. 16. Steering command predictions using the different network architectures when camera images are partially occluded (top) and when LiDAR range images
are partially occluded (bottom).
zero. The saturation of the image is changed by alpha blending the of its own grayscale image. Alpha used for alpha blending or alpha
original camera image with its own grayscale image. The contrast
compositing is randomly generated between 0 and 1 for each
of the image is modified by alpha blending the original camera
image with an image whose pixel values are equal to the mean camera image.
92 N. Patel, A. Choromanska, P. Krishnamurthy et al. / Robotics and Autonomous Systems 116 (2019) 80–97
Fig. 17. Error frequency plots for steering commands of the different network architectures when camera images are partially occluded (top) and when LiDAR range
images are partially occluded (bottom).
Fig. 18. Steering command predictions using the different network architectures when camera brightness is changed.
Fig. 19. Error frequency plots for steering commands of the different network architectures when camera brightness is changed.
The performance of each network is shown in Fig. 18 and test the proposed network for various types of noise, namely
error frequency plot in Fig. 19, when the brightness of the camera additive random noise, multiplicative random noise, and salt and
image is modified. As observed from the figures, we see that pepper noise.
the original NetGated network gives oscillating and inaccurate For the camera image, we generate an image with noise am-
steering command predictions. The retrained NetGated performs plitude of half the maximum intensity and add it to the original
better than the original NetGated but does not perform as well image, resulting in an image such as the first image in the bottom
as the NetGatedDropout network. Similar results were observed row of Fig. 14. We also test our network on images whose pixels
when contrast or saturation was varied. are multiplied by random numbers generated between 0 and 1.
An example of this multiplicative random noise is shown in the
6.3.4. Data corruption by random noise second image in the bottom row of Fig. 14. The other type of noise
Inaccurate predictions due to image noise are one of the that we experimented with is the salt and pepper noise wherein
prevalent issues for any vision-based method. The noise is usually randomly selected sets of pixels in the image are made black or
caused by problems in the electronic circuitry of the sensors. We white.
N. Patel, A. Choromanska, P. Krishnamurthy et al. / Robotics and Autonomous Systems 116 (2019) 80–97 93
Fig. 20. Steering command predictions using the different network architectures when random noise is added to the camera images (top) and when random noise
is added to LiDAR range images (bottom).
Fig. 21. Error frequency plots for steering commands of the different network architectures when random noise is added to the camera images (top) and when
random noise is added to LiDAR range images (bottom).
Similar to the generation of the noisy camera images, three is observed from both the figures that the NetGatedDropout
different types of noisy LiDAR images were generated: (1) by architecture performs the best out of all the considered archi-
adding random noise with amplitude of half the maximum range tectures. This observation is empirically verified by comparing
value to each pixel of the range image, (2) with multiplica-
the predictions of all three networks with the ground truth using
tive random noise, (3) with salt and pepper noise. The random
noise in 2nd and 3rd cases are generated following the same discretized accuracy metric as explained in the previous subsec-
procedures as for generating camera images with multiplicative tion and root mean square error metric. The results are shown in
random noise and with salt and pepper noise, respectively. Tables 7 and 8 from which we can validate that the NetGated-
The performance of the three architectures (NetGated, re- Dropout architecture provides the best results in both the cases
trained NetGated, and NetGatedDropout) when the camera and
of the noisy camera images and the noisy LiDAR range images.
LiDAR images are corrupted with additive random noise is shown
in Fig. 20 (top and bottom rows, respectively). The error fre- Similar results were observed for the cases when the camera and
quency plots for the same, are shown in top and bottom rows LiDAR range images were corrupted with multiplicative random
of Fig. 21 for camera and LiDAR range images, respectively. It noise and salt and pepper noise.
94 N. Patel, A. Choromanska, P. Krishnamurthy et al. / Robotics and Autonomous Systems 116 (2019) 80–97
Table 6
Standard deviation of minimum distances to the walls using the retrained Net-
Gated and NetGatedDropout architectures when various modalities are turned
off for a counterclockwise trajectory.
Network type Network Left wall distance Right wall distance
input standard deviation standard deviation
(in m) (in m)
1 NetGated Camera 0.43 0.39
(Retrained)
architecture
2 NetGated- Camera 0.33 0.36
Dropout
architecture
3 NetGated LiDAR 0.37 0.32
(Retrained)
architecture
4 NetGated- LiDAR 0.31 0.32
Dropout
architecture
5 NetGated Camera and 0.33 0.32
(Retrained) LiDAR
architecture
6 NetGated- Camera and 0.31 0.3
Dropout LiDAR
architecture
Fig. 22. Example trajectories of a UGV autonomously navigating in an indoor
environment: left turn (top row), straight motion (middle row), right turn
(bottom row). These pictures were taken from behind the UGV.
6.4. Autonomous indoor navigation of the ground vehicle
The NetGatedDropout architecture based framework is able and right turns while the LiDAR enables the fused network to
to successfully navigate through the indoor corridor environ- detect the appropriate turn. When passing an open door or other
ment and is robust to sensor failures and various other data open spaces (such as a short corridor leading to a dead end),
corruptions. Autonomous navigation through corridors is shown the LiDAR being a more geometric sensor measuring distances
in Fig. 22. The ground vehicle is able to appropriately make
to points tends to make a LiDAR-only system move towards the
turns at corners enabling it to be equidistant from the walls
open space. However, the visual features implicitly detected from
after the turn. It is also able to navigate through narrower spaces
the camera enable the fused network to completely ignore such
(e.g., between trash cans) as shown in the middle two rows of
an ‘‘unintended’’ open space and remain at the center of the
Fig. 22.
corridor (Fig. 24).
Furthermore, the system implicitly learns to avoid static as
well as dynamic obstacles as shown in Fig. 23 without ever being
specifically trained for this purpose, i.e., the training dataset did 7. Conclusion
not include any specific demonstrations of moving around obsta-
cles. Also, the fused camera + LiDAR network performs better in A convolutional neural network based architecture was intro-
several scenarios than the camera-only or LiDAR-only situations. duced for fusing vision and depth measurements from camera
While a LiDAR-only network can enable avoiding of obstacles and LiDAR, respectively, for learning an autonomous navigation
such as humans, it does not typically detect small (low-profile) policy for a ground robot operating in an indoor environment.
objects since these register only a few points in the LiDAR scan. Multiple network architectures were considered including a novel
In such situations, the camera image enables the fused network dropout regularization aided gating based network architecture.
to avoid the obstacle. When approaching a visually featureless This architecture enables the ground vehicle navigation to be
wall, a camera-only system cannot disambiguate between left robust to the possibility of sensor failure or data corruption and to
Table 7
Comparison of accuracy metric values for the steering command predictions of NetGated, NetGated retrained by randomly switching off modalities, and
NetGatedDropout with the steering command ground truth (UGV driven by human operator).
Network type Both modalities Camera LiDAR additive Camera only LiDAR only Camera image LiDAR range
(in %) additive noise noise (in %) (in %) (in %) partially image partially
(in %) occluded (in %) occluded (in %)
NetGated 92.25 88.83 86.93 8.81 1.76 72.57 31.42
NetGated (Retrained) 92.24 88.85 80.83 91.84 88.09 90.09 45.39
NetGatedDropout 93.65 93.40 86.95 95.234 90.235 94.22 92.89
Table 8
Comparison of root mean squared error metric values for the steering command predictions of NetGated, NetGated retrained by randomly switching off modalities,
and NetGatedDropout with the steering command ground truth (UGV driven by human operator).
Network type Both modalities Camera LiDAR Camera only LiDAR only Camera image LiDAR range image
(in PWM %) additive noise additive noise (in PWM %) (in PWM %) partially occluded partially occluded
(in PWM %) (in PWM %) (in PWM %) (in PWM %)
NetGated 4.71 5.28 4.96 14.41 25.04 10.16 15.56
NetGated (Retrained) 4.88 5.40 5.54 5.35 8.24 7.02 13.35
NetGatedDropout 4.40 4.63 5.39 4.67 5.85 4.88 4.59
N. Patel, A. Choromanska, P. Krishnamurthy et al. / Robotics and Autonomous Systems 116 (2019) 80–97 95
[33] I. Kostavelis, A. Gasteratos, Learning spatially semantic representations for [56] J. Sung, I. Lenz, A. Saxena, Deep multimodal embedding: manipulat-
cognitive robot navigation, Robot. Auton. Syst. 61 (12) (2013) 1460–1475. ing novel objects with point-clouds, language and trajectories, in: 2017
[34] M. Bojarski, A. Choromanska, K. Choromanski, B. Firner, L.D. Jackel, U. IEEE International Conference on Robotics and Automation, ICRA 2017,
Muller, K. Zieba, Visualbackprop: visualizing CNNs for autonomous driving, Singapore, Singapore, May 29 - June 3, 2017, 2017, pp. 2794–2801.
CoRR, abs/1611.05418, 2016. [57] C. Hazirbas, L. Ma, C. Domokos, D. Cremers, Fusenet: Incorporating depth
[35] S. Lefevre, A. Carvalho, F. Borrelli, A learning-based framework for velocity into semantic segmentation via fusion-based CNN architecture, in: Pro-
control in autonomous driving, IEEE Trans. Autom. Sci. Eng. 13 (1) (2016) ceedings of the Asian Conference on Computer Vision, Taipei, Taiwan,
32–42. 2016, pp. 213–228.
[36] C. Richter, N. Roy, Learning to plan for visibility in navigation of un- [58] S. Song, J. Xiao, Deep sliding shapes for amodal 3D object detection in
known environments, in: Proceedings of the International Symposium on RGB-D images, in: Proceedings of the IEEE Conference on Computer Vision
Experimental Robotics, Springer, 2016, pp. 387–398. and Pattern Recognition (CVPR), Las Vegas, USA, 2016, pp. 808–816.
[37] C. Richter, N. Roy, Safe visual navigation via deep learning and novelty [59] M. Giering, V. Venugopalan, K. Reddy, Multi-modal sensor registration for
detection, in: Proceedings of the Robotics: Science and Systems Conference, vehicle perception via deep neural networks, in: Proceedings of the High
Cambridge, Massachusetts, 2017. Performance Extreme Computing Conference (HPEC), Waltham, USA, 2015,
[38] A. Dairi, F. Harrou, M. Senouci, Y. Sun, Unsupervised obstacle detection pp. 1–6.
in driving environments using deep-learning-based stereovision, Robot. [60] A. Eitel, J.T. Springenberg, L. Spinello, M.A. Riedmiller, W. Burgard, Multi-
Auton. Syst. 100 (2018) 287–301. modal deep learning for robust RGB-D object recognition, in: 2015 IEEE/RSJ
[39] H.F.M. Zaki, F. Shafait, A.S. Mian, Learning a deeply supervised multi-modal International Conference on Intelligent Robots and Systems, IROS 2015,
RGB-D embedding for semantic scene and object category recognition, Hamburg, Germany, September 28 - October 2, 2015, 2015, pp. 681–687.
Robot. Auton. Syst. 92 (2017) 41–52. [61] A. Valada, G.L. Oliveira, T. Brox, W. Burgard, Deep multispectral semantic
[40] G. Liu, A. Siravuru, S. Prabhakar, M.M. Veloso, G. Kantor, Learning end-to-
scene understanding of forested environments using multimodal fusion,
end multimodal sensor policies for autonomous navigation, in: Proceedings
in: International Symposium on Experimental Robotics, ISER 2016, Tokyo,
of the 1st Annual Conference on Robot Learning (CoRL), in: ser. Proceedings
Japan, October (2016) 3-6, 2016, pp. 465–477.
of Machine Learning Research, vol. 78, PMLR, California, USA, 2017, pp.
[62] O. Mees, A. Eitel, W. Burgard, Choosing smartly: Adaptive multimodal fu-
249–261.
sion for object detection in changing environments, CoRR, abs/1707.05733,
[41] A. Giusti, J. Guzzi, D.C. Cireşan, F.-L. He, J.P. Rodríguez, F. Fontana, M.
2017.
Faessler, C. Forster, J. Schmidhuber, G. Di Caro, et al., A machine learning
[63] A. Valada, J. Vertens, A. Dhall, W. Burgard, Adapnet: Adaptive semantic
approach to visual perception of forest trails for mobile robots, IEEE Robot.
segmentation in adverse environmental conditions, in: 2017 IEEE Inter-
Automat. Lett. 1 (2) (2016) 661–667.
national Conference on Robotics and Automation, ICRA 2017, Singapore,
[42] F. Sadeghi, S. Levine, CAD2RL: Real single-image flight without a single real
Singapore, May 29 - June 3, 2017, 2017, pp. 4644–4651.
image, in: Proceedings of the Robotics: Science and Systems Conference,
[64] R. Mur-Artal, J.D. Tardós, ORB-SLAM2: an open-source SLAM system for
Cambridge, Massachusetts, 2017.
monocular, stereo, and RGB-D cameras, IEEE Trans. Robot. 33 (5) (2017)
[43] D. Gandhi, L. Pinto, A. Gupta, Learning to fly by crashing, in: Proceedings
1255–1262.
of the IEEE/RSJ International Conference on Intelligent Robots and Systems
[65] R. Mur-Artal, J.M.M. Montiel, J.D. Tardos, Orb-slam: a versatile and accurate
(IROS), Vancouver, Canada, 2017, pp. 3948–3955.
monocular slam system, IEEE Trans. Robot. 31 (5) (2015) 1147–1163.
[44] W.D. Smart, L.P. Kaelbling, Effective reinforcement learning for mobile
robots, in: Proceedings of the IEEE International Conference on Robotics [66] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A.
and Automation (ICRA), Washington, DC, USA, 2002, pp. 3404–3410. Desmaison, L. Antiga, A. Lerer, Automatic differentiation in pytorch, 2017.
[45] J. Michels, A. Saxena, A.Y. Ng, High speed obstacle avoidance using [67] R.B. Girshick, Fast R-CNN, in: Proceedings of the IEEE International
monocular vision and reinforcement learning, in: Proceedings of the Conference on Computer Vision (ICCV), Santiago, Chile, 2015, pp.
International Conference on Machine Learning (ICML), in: ser. ACM In- 1440–1448.
ternational Conference Proceeding Series, vol. 119, ACM, Bonn, Germany, [68] M.D. Zeiler, R. Fergus, Visualizing and understanding convolutional net-
2005, pp. 593–600. works, in: Proceedings of the European Conference on Computer Vision,
[46] Y. Zhu, R. Mottaghi, E. Kolve, J.J. Lim, A. Gupta, L. Fei-Fei, A. Farhadi, Zurich, Switzerland, 2014, pp. 818–833.
Target-driven visual navigation in indoor scenes using deep reinforcement [69] J.T. Springenberg, A. Dosovitskiy, T. Brox, M. Riedmiller, Striving for
learning, in: Proceedings of the IEEE International Conference on Robotics simplicity: The all convolutional net, in: Proceedings of the International
and Automation (ICRA), Singapore, Singapore, 2017, pp. 3357–3364. Conference on Learning Representations (ICLR) Workshop, San Diego, USA,
[47] W. Gao, D. Hsu, W.S. Lee, S. Shen, K. Subramanian, Intention-net: Integrat- 2015.
ing planning and deep learning for goal-directed autonomous navigation,
in: Proceedings of the 1st Annual Conference on Robot Learning (CoRL), in:
ser. Proceedings of Machine Learning Research, vol. 78, PMLR, California,
Naman Patel received his Master degree in Electrical
USA, 2017, pp. 185–194. Engineering from NYU Tandon School of Engineering,
[48] Y.F. Chen, M. Everett, M. Liu, J.P. How, Socially aware motion planning with Brooklyn, New York, in 2016, where he is cur-
deep reinforcement learning, in: Proceedings of the IEEE/RSJ International rently working towards his Ph.D. degree with the
Conference on Intelligent Robots and Systems (IROS), Vancouver, Canada, Control/Robotics Research Laboratory (CRRL) headed
2017, pp. 1343–1350. by Professor Farshad Khorrami. His research inter-
[49] C. Xia, A.E. Kamel, Neural inverse reinforcement learning in autonomous ests include developing algorithms for autonomous
navigation, Robot. Auton. Syst. 84 (2016) 1–14. unmanned ground and aerial vehicles, simultaneous
[50] A. Khan, C. Zhang, N. Atanasov, K. Karydis, V. Kumar, D.D. Lee, Mem- localization and mapping systems, computer vision and
ory augmented control networks, in: Proceedings of the International artificial intelligence.
Conference on Learning Representations (ICLR), 2018.
[51] S. Gupta, J. Davidson, S. Levine, R. Sukthankar, J. Malik, Cognitive mapping
and planning for visual navigation, in: Proceedings of the IEEE Conference Anna Choromonska received her Ph.D. degree from
the Department of Electrical Engineering at Columbia
on Computer Vision and Pattern Recognition, 2017.
University in the City of New York in 2014, and a
[52] J. Zhang, L. Tai, J. Boedecker, W. Burgard, M. Liu, Neural SLAM, CoRR,
M.Sc. degree with distinctions from the Department
abs/1706.09520, 2017.
of Electronics and Information Technology at Warsaw
[53] D. Pathak, P. Mahmoudieh, G. Luo, P. Agrawal, D. Chen, F. Shentu, E.
University of Technology. She did her Post-Doctoral
Shelhamer, J. Malik, A.A. Efros, T. Darrell, Zero-shot visual imitation, in:
studies in the Computer Science Department at Courant
International Conference on Learning Representations (ICLR), 2018. Institute of Mathematical Sciences in NYU. She joined
[54] H. Zhu, J.-B. Weibel, S. Lu, Discriminative multi-modal feature fusion for the Department of Electrical and Computer Engineering
RGBD indoor scene recognition, in: Proceedings of the IEEE Conference on at NYU Tandon School of Engineering in the Spring
Computer Vision and Pattern Recognition (CVPR), Las Vegas, USA, 2016, 2017 as an Assistant Professor. Prof. Choromanska’s
pp. 2969–2976. research interests focus on numerical optimization, deep learning, and large
[55] S. Gupta, R. Girshick, P. Arbeláez, J. Malik, Learning rich features from RGB- data analysis with applications to autonomous car driving. She is also working
D images for object detection and segmentation, in: Proceedings of the on learning from data streams, learning with expert advice, supervised and
European Conference on Computer Vision, Springer, Zurich, Switzerland, unsupervised online learning, clustering, and structured prediction. Prof. Choro-
2014, pp. 345–360. manska has co-authored several international conference papers and refereed
N. Patel, A. Choromanska, P. Krishnamurthy et al. / Robotics and Autonomous Systems 116 (2019) 80–97 97
journal publications, as well as book chapters. She is also a contributor to Farshad Khorrami received his Bachelors degrees in
the open source fast out-of-core learning system Vowpal Wabbit (aka VW). Mathematics and Electrical Engineering in 1982 and
Prof. Choromanska gave over 50 invited and conference talks and serves as a 1984 respectively from The Ohio State University. He
book editor (MIT Press volume), organizer of machine learning events at top also received his Master’s degree in Mathematics and
venues like NIPS, and a reviewer and area chair for several top machine learning Ph.D. in Electrical Engineering in 1984 and 1988 from
conferences (e.g., ICML, NIPS, and AISTATS) and journals (e.g., Transactions on The Ohio State University. Dr. Khorrami is currently a
Pattern Analysis and Machine Intelligence and Machine Learning). professor of Electrical & Computer Engineering Depart-
ment at NYU where he joined as an assistant professor
in Sept. 1988. His research interests include adaptive
Prashanth Krishnamurthy received his B.Tech. degree and nonlinear controls, robotics and automation, un-
in electrical engineering from Indian Institute of Tech- manned vehicles (fixed-wing and rotary wing aircrafts
nology, Chennai in 1999, and M.S. and Ph.D. degrees in as well as underwater vehicles and surface ships), resilient control for industrial
electrical engineering from Polytechnic University (now control systems, cyber security for cyber–physical systems, large-scale systems
NYU), Brooklyn, NY in 2002 and 2006, respectively. He and decentralized control, and real-time embedded instrumentation and control.
is currently a Research Scientist and Adjunct Faculty Prof. Khorrami has published more than 240 refereed journal and conference
with the Department of Electrical and Computer En- papers in these areas and holds thirteen U.S. patents. His book on ‘‘Modeling and
gineering at NYU Tandon School of Engineering, NY, adaptive nonlinear control of electric motors’’ was published by Springer Verlag
and a Senior Researcher with FarCo Technologies, NY. in 2003. He also has thirteen U.S. patents on novel smart micropositioners and
He has co-authored over 110 journal and conference actuators, control systems, and wireless sensors and actuators. He has developed
papers in the broad areas of autonomous systems, and directed the Control/Robotics Research Laboratory at Polytechnic University
robotics, and control systems. He has also co-authored the book ‘‘Modeling and (Now NYU). His research has been supported by the Army Research Office,
Adaptive Nonlinear Control of Electric Motors’’ published by Springer Verlag National Science Foundation, Office of Naval Research, DARPA, Air Force Research
in 2003. His research interests include robust and adaptive nonlinear control, Laboratory, Sandia National Laboratory, Army Research Laboratory, NASA, and
resilient control, autonomous vehicles and robotic systems, path planning and several corporations. Prof. Khorrami has served as general chair and conference
obstacle avoidance, sensor data fusion, machine learning, real-time embedded organizing committee member of several international conferences.
systems, electromechanical systems modeling and control, cyber–physical sys-
tems and cyber-security, decentralized and large-scale systems, high-fidelity and
hardware-in-the-loop simulation, real-time software implementations for robotic
systems, and distributed multi-agent systems.