video caption, computer vision, dataset, natural language processing, action recognition, machine learning