Posted on 2021-05-24, 12:09 · Authored by Hasan W. Almawi
This thesis introduces a method for combining static and dynamic features in a
convolutional neural network (CNN) to produce a motion and object boundary
prediction map. The approach supplies the CNN with both static and dynamic
cues, improving its predictions. The spatial stream of the CNN learns to
compute an object boundary prediction map from a single RGB frame, while the
temporal stream learns to compute a motion boundary prediction map from the
corresponding optical flow map. The two streams are then combined through an
encoder-decoder architecture, in which the decoder learns to fuse the features
from both streams to obtain a task-specific output.
The proposed method yields state-of-the-art results on a motion boundary
benchmark, and systematic improvements on object boundary benchmarks over
methods that rely solely on static features extracted from a single RGB
frame.
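
To make the two-stream fusion idea concrete, below is a minimal PyTorch sketch of a spatial stream over an RGB frame and a temporal stream over an optical flow map, fused by channel concatenation and decoded to a per-pixel boundary map. All names (`BoundaryFusionNet`, `conv_block`), the layer sizes, and the concatenation-based fusion are illustrative assumptions, not the thesis's actual architecture.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    """Two 3x3 convolutions followed by 2x downsampling (assumed encoder unit)."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(2),
    )

class BoundaryFusionNet(nn.Module):
    """Hypothetical two-stream encoder-decoder for boundary prediction."""

    def __init__(self):
        super().__init__()
        # Spatial stream: encodes a single RGB frame (3 channels).
        self.spatial = nn.Sequential(conv_block(3, 32), conv_block(32, 64))
        # Temporal stream: encodes the corresponding optical flow map
        # (2 channels: horizontal and vertical flow).
        self.temporal = nn.Sequential(conv_block(2, 32), conv_block(32, 64))
        # Decoder: fuses the concatenated stream features and upsamples
        # back to input resolution, ending in a per-pixel boundary score.
        self.decoder = nn.Sequential(
            nn.Conv2d(128, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(64, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(32, 1, kernel_size=1),
        )

    def forward(self, rgb, flow):
        # Fuse the two streams by concatenating their feature maps,
        # then let the decoder learn the task-specific combination.
        fused = torch.cat([self.spatial(rgb), self.temporal(flow)], dim=1)
        return torch.sigmoid(self.decoder(fused))

# Example: one 128x128 frame plus its flow map -> 128x128 boundary map.
net = BoundaryFusionNet()
rgb = torch.randn(1, 3, 128, 128)
flow = torch.randn(1, 2, 128, 128)
print(net(rgb, flow).shape)  # torch.Size([1, 1, 128, 128])
```

The key design point the sketch illustrates is that fusion happens in feature space rather than at the prediction level: the decoder sees both streams' features jointly and can learn where motion cues should override or reinforce static appearance cues.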