The analysis of cluttered cells in high-throughput time-lapse microscopy sequences is a challenging task, particularly in the presence of complex spatial structures and complicated temporal changes. We present a Deep Neural Network framework that addresses two aspects of cell segmentation in microscopy videos, namely, limited annotated training data and the inherent dependencies between video frames. To compensate for the limited training data and avoid overfitting, we introduce an adversarial loss, inspired by Goodfellow et al., and propose a unique discriminator architecture, termed the Rib-Cage network. The Rib-Cage network is designed such that multi-level features of both the image and the segmentation maps are compared at multiple scales, allowing for the extraction of complex joint representations. Furthermore, we propose the integration of the U-Net architecture (Ronneberger et al.) with Convolutional Long Short Term Memory (C-LSTM). The segmentation network's unique architecture enables it to capture a multi-scale, compact, spatio-temporal encoding of the cells in the C-LSTM memory units. The proposed network exploits temporal cues, which facilitate the individual segmentation of touching or partially occluded cells. The method was applied to live cell microscopy data and evaluated on a common cell segmentation benchmark, the Cell Tracking Challenge (www.celltrackingchallenge.net), where it ranked 1st and 2nd on two challenging datasets.
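
To make the U-Net/C-LSTM integration concrete, the following is a minimal sketch, in TensorFlow/Keras, of one way a ConvLSTM unit can be placed at each encoder scale of a U-Net-style encoder-decoder so that each level keeps a temporal memory of the frame sequence. It is an illustrative assumption, not the authors' released implementation; the number of levels, filter counts, input resolution, and the name conv_lstm_unet are chosen only for brevity.

    import tensorflow as tf
    from tensorflow.keras import layers

    def conv_lstm_unet(input_shape=(None, 128, 128, 1), filters=(32, 64, 128)):
        # Input is a sequence of frames: (time, height, width, channels).
        inputs = tf.keras.Input(shape=input_shape)
        x, skips = inputs, []

        # Encoder: a ConvLSTM2D at each scale maintains a per-level memory
        # of the sequence, giving a multi-scale spatio-temporal encoding.
        for f in filters:
            x = layers.ConvLSTM2D(f, 3, padding="same", return_sequences=True)(x)
            skips.append(x)
            x = layers.TimeDistributed(layers.MaxPooling2D(2))(x)

        # Bottleneck over the coarsest scale.
        x = layers.ConvLSTM2D(filters[-1] * 2, 3, padding="same",
                              return_sequences=True)(x)

        # Decoder: per-frame upsampling with U-Net-style skip connections.
        for f, skip in zip(reversed(filters), reversed(skips)):
            x = layers.TimeDistributed(
                layers.Conv2DTranspose(f, 2, strides=2, padding="same"))(x)
            x = layers.Concatenate()([x, skip])
            x = layers.TimeDistributed(
                layers.Conv2D(f, 3, padding="same", activation="relu"))(x)

        # Per-frame segmentation map.
        outputs = layers.TimeDistributed(
            layers.Conv2D(1, 1, activation="sigmoid"))(x)
        return tf.keras.Model(inputs, outputs)

    model = conv_lstm_unet()
    model.summary()

In the framework described above, the adversarial loss would be added on top of such a segmentation network during training, with a discriminator (the Rib-Cage network) that compares image and segmentation features at multiple scales.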