This document summarizes a research paper on end-to-end joint learning of natural language understanding (NLU) and dialogue management. The paper proposes a deep hierarchical model trained end to end with multi-task learning over three supervised tasks: user intent classification, slot tagging, and system action prediction. Unlike previous pipelined models, the joint model has access to contextual dialogue history, and supervision signals from dialogue management can refine the NLU components through backpropagation. Evaluated on a dialogue state tracking dataset, the joint model outperforms the baselines on dialogue management and also improves NLU performance.
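The multi-task setup described above is typically trained by summing one supervised loss per task over a shared encoder. The sketch below illustrates that joint objective with plain cross-entropy; the logits, class indices, and task dimensions are hypothetical placeholders, not values from the paper.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of raw scores.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, gold):
    # Negative log-likelihood of the gold class index.
    return -math.log(probs[gold])

# Hypothetical outputs of three task heads that share one encoder.
intent_logits = [2.0, 0.5, -1.0]            # user intent classes
slot_logits = [[1.5, 0.2], [0.1, 2.2]]      # per-token slot-tag scores
action_logits = [0.3, 1.8, 0.0, -0.5]       # system action classes

loss_intent = cross_entropy(softmax(intent_logits), 0)
loss_slots = sum(cross_entropy(softmax(tok), gold)
                 for tok, gold in zip(slot_logits, [0, 1]))
loss_action = cross_entropy(softmax(action_logits), 1)

# Joint objective: gradients from all three tasks flow back into the
# shared encoder, so dialogue-management supervision can refine the
# NLU representations, as the summarized paper argues.
total_loss = loss_intent + loss_slots + loss_action
```

In a real implementation the three losses are often weighted before summing; equal weights are assumed here for simplicity.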