Human Action Recognition Using Convolutional Neural Network and Depth Sensor Data
This paper proposes a technique for Human Action Recognition (HAR) based on a Convolutional Neural Network (CNN). Rather than relying on conventional or statistical methods, depth data sequences from motion-sensing devices are converted into images and fed to the CNN. The data comprise 10 actions performed by six subjects, captured with a Kinect v2 sensor, and 20 actions performed by seven subjects from the MSR 3D Action data set. A custom CNN architecture consisting of three convolutional layers, three max-pooling layers, and a fully connected layer was used. Training, validation, and testing were carried out on a total of 39,715 images. The method achieved an accuracy of 97.23% on the Kinect data set and 87.1% on the MSR data set.
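The described architecture (three convolutional layers, each followed by max pooling, then a fully connected classifier) can be sketched as below. This is a hypothetical reconstruction, not the authors' exact model: the input resolution (64x64 single-channel depth images), channel widths, kernel sizes, and activation choices are all assumptions, since the abstract does not specify them.

```python
import torch
import torch.nn as nn

class DepthActionCNN(nn.Module):
    """Sketch of a 3-conv / 3-maxpool CNN for depth-image action recognition.

    All hyperparameters here (channels, kernel sizes, input size) are
    illustrative assumptions, not values taken from the paper.
    """

    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),   # 64x64 -> 32x32
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),   # 32x32 -> 16x16
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),   # 16x16 -> 8x8
        )
        # Single fully connected layer mapping features to action classes.
        self.classifier = nn.Linear(64 * 8 * 8, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        return self.classifier(x.flatten(1))

model = DepthActionCNN(num_classes=10)
logits = model(torch.randn(4, 1, 64, 64))  # batch of 4 depth-map images
print(tuple(logits.shape))
```

With a batch of four 64x64 depth images, the network produces one logit vector per image, i.e. an output of shape (4, 10) for the assumed 10 action classes.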