A dataset of 11,035 unique frames was created from 55 men's and women's cricket match highlight videos on YouTube, which together contained more than 1,300,000 frames. To extract the unique frames, a simple denoising autoencoder with an architecture of 128*128*3 - 64 - 128*128*3 was trained for semantic hashing. As depicted in the figure below, the extracted frames were then categorized into three distinct classes:
Bowling (3676 frames): All frames associated with the delivery of the ball with the batter facing the camera.
Field (3645 frames): All frames which focused on the field and fielding activities without being a close-up of a player.
Miscellaneous (3714 frames): All frames which included diverse elements, such as audience shots, replays, and player zoom-ins, among others.
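
The deduplication step described above can be illustrated with a minimal sketch. It assumes the 64-unit bottleneck of the autoencoder produces activations in [0, 1] (for example from a sigmoid layer) and that each frame's hash is obtained by thresholding those activations at 0.5; neither detail is stated in the description, and the function names `binary_codes` and `unique_frame_indices` are hypothetical. The hand-built `latents` array stands in for real encoder outputs.

```python
import numpy as np


def binary_codes(latents, threshold=0.5):
    """Binarize (N, 64) bottleneck activations and pack them into (N, 8) bytes.

    Assumption: activations lie in [0, 1], so 0.5 is a reasonable cut-off.
    """
    bits = (np.asarray(latents) > threshold).astype(np.uint8)
    return np.packbits(bits, axis=1)


def unique_frame_indices(latents, threshold=0.5):
    """Return the index of the first frame seen for each distinct hash code."""
    codes = binary_codes(latents, threshold)
    seen = set()
    keep = []
    for i, code in enumerate(codes):
        key = code.tobytes()  # hashable form of the 64-bit code
        if key not in seen:
            seen.add(key)
            keep.append(i)
    return keep


# Stand-in latents: frame 1 is a near-duplicate of frame 0,
# frame 2 differs in one bit, frame 3 repeats frame 0 exactly.
frame_a = np.full(64, 0.1)
frame_b = frame_a.copy()
frame_b[0] = 0.9
latents = np.stack([frame_a, frame_a + 0.05, frame_b, frame_a])

kept = unique_frame_indices(latents)
print(kept)
```

Frames whose binarized codes collide are treated as duplicates, so only the first frame of each group is kept. With a 64-bit code, visually similar frames tend to share a hash while distinct scenes do not, although the exact behavior depends on how the encoder was trained.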
Copyright © Aarat