This document proposes a key frame extraction method for efficient and accurate activity recognition in videos using deep learning. It introduces a multi-stream convolutional neural network (CNN) architecture whose streams consume different representations of the video, including multiple color spaces (RGB, YCbCr) and optical flow. Key frames are selected from each video based on optical flow, capturing important moments while reducing the computational cost relative to processing all frames. The method is evaluated on the UCF-101 dataset and shows promising classification results for several combinations of color-space and optical-flow inputs to the multi-stream CNN, as sketched below.
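
To make the key frame selection step concrete, the following is a minimal sketch of optical-flow-based key frame extraction. It assumes OpenCV's Farneback dense flow and a simple selection rule (keep the frames with the highest mean flow magnitude); the paper's exact scoring and selection criteria are not spelled out here, so the function name `keyframe_indices` and the `num_keyframes` parameter are illustrative assumptions, not the authors' implementation.

```python
import cv2
import numpy as np

def keyframe_indices(video_path, num_keyframes=16):
    """Score each frame by mean optical-flow magnitude and return the
    indices of the highest-motion frames (a proxy for 'important moments')."""
    cap = cv2.VideoCapture(video_path)
    ok, prev = cap.read()
    if not ok:
        raise ValueError(f"Cannot read video: {video_path}")
    prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

    scores = []  # one motion score per frame transition
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Dense optical flow between consecutive frames (Farneback method).
        flow = cv2.calcOpticalFlowFarneback(
            prev_gray, gray, None,
            pyr_scale=0.5, levels=3, winsize=15,
            iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
        magnitude = np.linalg.norm(flow, axis=2)  # per-pixel flow magnitude
        scores.append(magnitude.mean())           # mean motion into this frame
        prev_gray = gray
    cap.release()

    # Keep the frames with the largest motion scores, in temporal order.
    top = np.argsort(scores)[-num_keyframes:]
    return sorted(int(i) + 1 for i in top)  # score i belongs to frame i + 1
```

Under this assumed scheme, the selected key frames (and their flow fields) would then be fed to the respective CNN streams in place of the full frame sequence, which is what yields the computational savings described above.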