Learning continually from sequential data is a difficult task, especially if the stream is divided into many small batches that contains non-i.i.d. data. Stochastic gradient descent usually fails in this settings, and the models experience high forgetting of past acquired knowledge. In this talk I'll try to address the problem of continually acquire new knowledge from many (almost 400) small batches (300 highly correlated frames each) of images from natural video streams.