Hand Movement Identification Using Single-Stream Spatial Convolutional Neural Networks

Aldi Sidik Permana, Esmeralda Contessa Djamal, Fikri Nugraha, Fatan Kasyidi


Human-robot interaction can take several forms, such as device control, sound, brain signals, body movement, or hand gestures. Two main issues arise: the ability to adapt to extreme settings, and the number of frames that can be processed within memory limits. The number of frames must therefore be chosen carefully so as not to overburden memory. This paper proposes identifying hand gestures in video using a spatial Convolutional Neural Network (CNN). Spatial features are extracted from the sequence of frames in the video so that each frame can be identified as part of one of the hand movements. The research used the VGG16 architecture, a deep CNN with 13 convolutional layers and three fully connected layers for identification. Hand gestures are identified as one of four movements only: 'right', 'left', 'grab', and 'phone'. Hand gesture identification on video using the spatial CNN reached an initial accuracy of 87.97%, which increased to 98.05% after a second training run. These accuracies were obtained after training with 5600 training data and 1120 test data, and the improvement occurred after manual noise reduction was performed.
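As a rough illustration of the frame-level approach described above, the following Python sketch shows one way to limit the number of frames held in memory and to combine per-frame predictions into a single label for the video. It is a sketch under stated assumptions, not the authors' implementation: the abstract does not say how frames are selected or how frame-level results are combined, so the evenly spaced sampling, the majority vote, and the names `sample_frame_indices` and `video_label` are all hypothetical. The per-frame labels would come from the trained CNN, which is not reproduced here.

```python
from collections import Counter

# The four gesture classes named in the abstract.
GESTURES = ("right", "left", "grab", "phone")


def sample_frame_indices(total_frames, num_samples):
    """Pick evenly spaced frame indices so at most num_samples frames are processed.

    Assumption: uniform sampling; the paper may select frames differently.
    """
    if total_frames <= 0 or num_samples <= 0:
        return []
    if num_samples >= total_frames:
        return list(range(total_frames))
    step = total_frames / num_samples
    # Take the frame at the middle of each of the num_samples equal-width segments.
    return [int(step * i + step / 2) for i in range(num_samples)]


def video_label(frame_labels):
    """Combine per-frame predictions into one video-level label by majority vote.

    Assumption: majority voting; the paper does not state its aggregation rule.
    """
    if not frame_labels:
        raise ValueError("no frame predictions to aggregate")
    unknown = set(frame_labels) - set(GESTURES)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    return Counter(frame_labels).most_common(1)[0][0]
```

For example, `sample_frame_indices(100, 4)` keeps four frames spread across a 100-frame clip, and `video_label` applied to the CNN's predictions for those frames returns the most frequent gesture. Lowering `num_samples` reduces memory use at the cost of fewer votes per video.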


hand gesture identification; video processing; spatial-stream; convolutional neural networks
