Deep Content Learning in Traffic Prediction and Text Classification

As part of the 2018 HPCC Systems Community Day event:

In this talk, Jingqing will introduce recent advances at the Data Science Institute, Imperial College London, focusing on a general framework named Deep Content Learning. Two recent projects will be discussed as examples. In the traffic prediction project, we released a new large-scale traffic dataset with auxiliary information, including search queries from the Baidu Map app, and proposed hybrid models that achieve state-of-the-art prediction accuracy. The other project, on zero-shot text classification, integrates semantic knowledge and uses a two-phase architecture to tackle the challenging problem of zero-shot learning on textual data. The talk will also discuss the integration of TensorLayer and HPCC Systems.

Jingqing Zhang is a first-year PhD student (HiPEDS) at the Data Science Institute, Imperial College London, under the supervision of Prof. Yi-Ke Guo. His research interests include text mining, data mining, deep learning, and their applications. He received his MRes degree in Computing from Imperial College with Distinction in 2017 and his BEng in Computer Science and Technology from Tsinghua University in 2016.

  1. 2018 HPCC Systems Summit Community Day. Deep Content Learning in Traffic Prediction and Text Classification. Jingqing Zhang, Prof. Yike Guo, Data Science Institute, Imperial College London
  2. Outline: Imperial DSI; Deep Content Learning; Research Projects (Traffic Prediction, Zero-shot Text Classification); TensorLayer; HPCC Systems + TensorLayer
  3. The Success of Deep Learning: applications in CV & NLP, medicine, and games. CV & NLP example from Johnson, Justin, Andrej Karpathy, and Li Fei-Fei. "DenseCap: Fully Convolutional Localization Networks for Dense Captioning." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
  4. Deep Learning + Content Providers = Deep Content Learning
  5. Deep Content Learning: Environment Perception, Reasoning, and Decision Making, built on machine learning, deep learning, data, knowledge, and logics and rules, supported by Content Providers, and producing a decision or suggestion. Example: perception recognizes a "Dog"; knowledge ("Huskies usually have a thick double coat that can be gray, black, copper red, or white. Their eyes are typically pale blue, although they may also be brown, green, blue, yellow, or heterochromic.") refines it to "Husky".
  6. Concrete Projects (Completed So Far): P1, Traffic Prediction: "Deep Sequence Learning with Auxiliary Information for Traffic Prediction", KDD 2018. P2, Zero-shot Text Classification: "Integrating Semantic Knowledge to Tackle Zero-shot Text Classification", submitted for review.
  7. P1: Deep Sequence Learning with Auxiliary Information for Traffic Prediction. How does online information affect traffic? Example: navigation to Marriott Buckhead (HPCC Systems Summit) via map apps; Spearman's rank correlation coefficient ρ = −0.52, p-value = 1.23 × 10⁻⁴. Paper: Binbing Liao, Jingqing Zhang, Chao Wu, Douglas McIlwraith, Tong Chen, Shengwen Yang, Yike Guo, and Fei Wu. "Deep Sequence Learning with Auxiliary Information for Traffic Prediction." KDD 2018.
  8. Solution: Environment Perception (traffic data, sequence learning), Reasoning (query impact, event discovery), and Decision Making (traffic prediction).
  9. Event Discovery in Query Records: events discovered from query records can correspond to real events.
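  10. Modelling: (a) Traffic encoder: at each step i = 1, …, t, the traffic feature v_i is concatenated with the output of a Graph CNN over its neighbours NB(v_i) and fed to an LSTM encoder. (b) Encoder for Query Impact: an LSTM over QI(t+1), QI(t+2), …, QI(t+t′). (c) Decoder: an LSTM that takes <START>, v_{t+1}, …, v_{t+t′−1} and produces v_{t+1}, …, v_{t+t′}, <END>; its outputs are combined with AT(v_{t+1}), …, AT(v_{t+t′}) through concatenation and fully connected (FC) layers. The diagram labels these parts Traffic Perception, Sequence Learning, Query Impact, Reasoning, and Decision Making.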
  11. Result: It is more challenging to predict traffic when events happen. The query impact is more informative and more closely related to real-time traffic. More information is available at https://github.com/JingqingZ/BaiduTraffic
  12. P2: Integrating Semantic Knowledge to Tackle Zero-shot Text Classification. Zero-shot learning: learning about a new category without any training instance. Which one is the "Okapi"? Hint: a zebra-striped, four-legged animal with a brown torso and a deer-like face.
  13. Zero-shot Text Classification: Environment Perception, Reasoning, and Decision Making applied to text. Traditional text classification uses text documents; zero-shot text classification additionally uses knowledge. Example: "Imperial College London is a public research university located in London." → Education
  14. Reasoning – Relationship Vectors: relationship vectors built from ConceptNet find the relation between words and classes without any training data, using particular types of relations and the length of the shortest path.
  15. Reasoning – Topic Translation: In the learning stage, there is no information about unseen classes. In the inference stage, the unseen classes are known (label and description), but there is still no training data. Can we infer what documents from unseen classes would look like? Can we generate fake documents that look like real data from unseen classes? Analogy in the vector space: class c = Germany has word w = Berlin; for class c′ = France, w′ = ?
  16. Example of Translated Documents. Animal (original): "Mitra perdulca is a species of sea snail a marine gastropod mollusk in the family Mitridae the miters or miter snails." Animal → Plant: "Arecaceae perdulca is a flowering of port aster a naval mollusk gastropod in the fabaceae Clusiaceae the tiliaceae or rockery amaryllis." Animal → Athlete: "Mira perdulca is a swimmer of sailing sprinter an Olympian limpets gastropod in the basketball Middy the miters or miter skater." The translated documents are not completely understandable, but they carry the tone of the target class.
  17. Decision Making – Two-phase Inference: phase 1, binary classification (seen vs. unseen class); phase 2, fine-grained classification. Example: "Plants, also called green plants, are multicellular eukaryotes of the kingdom Plantae." is first classified as seen or unseen and then labelled Plant.
  18. Result – Overall Performance: The proposed two-phase inference with integrated semantic knowledge is a promising approach to the challenging zero-shot text classification problem. More information about this project will be released soon.
  19. Implementation of Deep Learning
  20. Gaps: TensorFlow offers low-level APIs, while deep learning works with high-level neural networks (abstraction gap) and industry needs high performance (performance gap).
  21. TensorLayer – What is TensorLayer? TensorLayer is a unique TensorFlow wrapper library that can (I) teach deep learning, (II) help cutting-edge research, and (III) run in the real world. On GitHub from late 2016 to present: > 4000 stars, > 1000 forks, > 70 contributors.
  22. HPCC Systems + TensorLayer: a stack of HPCC Systems (via Py3embed), TensorLayer (high-level wrapper), Horovod (distributed framework), and TensorFlow, running across Server 1 and Server 2. Features: data parallelism; synchronous distributed training; GPU acceleration + CPU input pipeline.
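  23. HPCC Systems + TensorLayer: benchmark results.
      | GPUs | Dataset | Epochs | Batch size | Time (s) | Images | Images/sec | Accuracy | GPU memory (MB) |
      |------|---------|--------|------------|----------|--------|------------|----------|-----------------|
      | 1 | MNIST | 50 | 512 | 135 | 2.5M | 18.4K | 0.98 | ~315 |
      | 2 | MNIST | 50 | 512 | 122 | 2.5M | 20.4K | 0.99 | ~315 |
      | 1 | CIFAR-10 | 50 | 512 | 232 | 2.5M | 10.6K | 0.69 | ~1435 |
      | 2 | CIFAR-10 | 50 | 512 | 221 | 2.5M | 11.1K | 0.71 | ~1435 |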
  24. HPCC Systems + TensorLayer: It is still too early to draw a conclusion. Future work: larger models to test distributed training, e.g. OpenPose; closer integration of HPCC Systems and TensorLayer. Stages: Data Processing, Distributed Training, Deployment. Links: https://github.com/tensorlayer/openpose-plus and https://github.com/tensorlayer/tensorlayer/tree/master/examples/distributed_training
  25. Summary: Deep Content Learning (Environment Perception, Reasoning, Decision Making) and HPCC Systems + TensorLayer (Data Processing, Distributed Training, Deployment).
  26. Q & A. Thanks! Jingqing Zhang, Prof. Yike Guo, Data Science Institute, Imperial College London. For more information, please visit http://www.doc.ic.ac.uk/~jz9215/
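
Slide 7 reports a Spearman's rank correlation of ρ = −0.52 (p-value 1.23 × 10⁻⁴) for the Marriott Buckhead example. As a reminder of what that statistic measures, here is a minimal pure-Python sketch: Spearman's ρ is the Pearson correlation of the two rank vectors, with tied values given the average of the ranks they occupy. The query and speed numbers are made up for illustration and are not the talk's data, and the sketch does not compute a p-value (SciPy's `scipy.stats.spearmanr` returns both the coefficient and a p-value if SciPy is available).

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation: Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


# Hypothetical data: hourly navigation queries vs. average road speed (km/h).
queries = [10, 25, 40, 60, 80]
speed = [55, 50, 42, 30, 28]
print(round(spearman_rho(queries, speed), 6))  # -1.0 (perfectly opposite ordering)
```

A negative ρ, as on the slide, means that higher values of one series tend to coincide with lower values of the other; because only ranks are used, the measure captures monotone rather than strictly linear relationships.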
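
Slide 10 shows each encoder step concatenating the traffic features v_i with a Graph CNN over the neighbourhood NB(v_i). The talk does not give the Graph CNN's exact form, so the sketch below assumes a single mean-aggregation graph convolution, ReLU(W · mean of neighbour features), as a stand-in. The function names, the toy road graph, and the weight matrix are hypothetical. The sketch only builds the per-step input vectors; the LSTM encoder that would consume them is not implemented.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def matvec(weights: Sequence[Sequence[float]], x: Sequence[float]) -> Vector:
    """Dense matrix-vector product (rows of `weights` times `x`)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def graph_conv(feats: Dict[str, Vector], neighbours: Sequence[str],
               weights: Sequence[Sequence[float]]) -> Vector:
    """Assumed Graph CNN stand-in: ReLU(W · mean of neighbour feature vectors)."""
    if not neighbours:
        return [0.0] * len(weights)  # isolated segment: no neighbourhood signal
    dim = len(feats[neighbours[0]])
    mean = [sum(feats[n][d] for n in neighbours) / len(neighbours) for d in range(dim)]
    return [max(0.0, h) for h in matvec(weights, mean)]


def encoder_inputs(snapshots: Sequence[Dict[str, Vector]], target: str,
                   graph: Dict[str, List[str]],
                   weights: Sequence[Sequence[float]]) -> List[Vector]:
    """For each time step i, Concat(v_i, GraphConv(NB(v_i))) for the target segment."""
    return [snap[target] + graph_conv(snap, graph[target], weights) for snap in snapshots]


# Hypothetical road graph and two time steps of 1-D speed features.
graph = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
snapshots = [{"a": [1.0], "b": [2.0], "c": [4.0]},
             {"a": [0.5], "b": [1.0], "c": [3.0]}]
weights = [[1.0], [-1.0]]  # maps the 1-D neighbour mean to 2 hidden units
print(encoder_inputs(snapshots, "a", graph, weights))
# [[1.0, 3.0, 0.0], [0.5, 2.0, 0.0]]
```

Concatenating rather than summing keeps the segment's own reading and its neighbourhood summary in separate coordinates, so a downstream LSTM can weigh them independently.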
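
Slide 14 builds relationship vectors from ConceptNet using particular relation types and the length of the shortest path between a word and a class. The sketch below is a hypothetical encoding of that idea, not the paper's exact feature definition. It produces one indicator per relation type for a direct word-to-class edge, plus a proximity score 1 / (1 + hops) from a breadth-first search over an undirected view of the graph (0.0 when no path is found within `max_depth` hops). IsA, PartOf, AtLocation, and RelatedTo are real ConceptNet relation names, but the toy triples are made up.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

Edge = Tuple[str, str, str]  # (head, relation, tail)
RELATIONS = ("IsA", "PartOf", "AtLocation", "RelatedTo")  # illustrative subset


def shortest_path_length(edges: List[Edge], start: str, goal: str,
                         max_depth: int = 3) -> Optional[int]:
    """Breadth-first search on an undirected view of the graph; None if no path within max_depth."""
    if start == goal:
        return 0
    adj: Dict[str, Set[str]] = {}
    for head, _, tail in edges:
        adj.setdefault(head, set()).add(tail)
        adj.setdefault(tail, set()).add(head)
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if dist == max_depth:
            continue  # do not expand beyond the depth limit
        for nxt in adj.get(node, ()):
            if nxt == goal:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def relationship_vector(edges: List[Edge], word: str, cls: str) -> List[float]:
    """Hypothetical encoding: one flag per relation type for a direct word->class edge,
    followed by a proximity score 1 / (1 + hops), or 0.0 if unreachable."""
    edge_set = set(edges)
    flags = [1.0 if (word, rel, cls) in edge_set else 0.0 for rel in RELATIONS]
    hops = shortest_path_length(edges, word, cls)
    flags.append(0.0 if hops is None else 1.0 / (1 + hops))
    return flags


# Toy ConceptNet-style triples (made up for illustration).
edges = [("okapi", "IsA", "animal"), ("okapi", "RelatedTo", "giraffe"),
         ("giraffe", "IsA", "animal"), ("berlin", "IsA", "city"),
         ("city", "AtLocation", "country")]
print(relationship_vector(edges, "okapi", "animal"))  # [1.0, 0.0, 0.0, 0.0, 0.5]
```

Because the vector depends only on the knowledge graph, it can be computed for unseen classes without any labelled documents, which is the property the slide emphasises.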
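
Slide 15 phrases topic translation as an analogy in a word-vector space: Berlin is to Germany as w′ is to France. A common way to answer such analogies, assumed here rather than taken from the talk, is to compute w − c + c′ and return the nearest vocabulary word by cosine similarity, excluding the query words themselves. The three-dimensional vectors below are hand-made toy values (roughly a German axis, a French axis, and a country-versus-city axis), not real embeddings. `translate_document` applies the same step word by word and leaves out-of-vocabulary words unchanged, which is a simplification.

```python
from math import sqrt
from typing import Dict, List, Sequence

Vector = List[float]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))


def translate_word(vecs: Dict[str, Vector], word: str, src_cls: str, dst_cls: str) -> str:
    """Answer 'word is to src_cls as ? is to dst_cls' via the offset word - src_cls + dst_cls."""
    target = [w - c + d for w, c, d in zip(vecs[word], vecs[src_cls], vecs[dst_cls])]
    candidates = [k for k in vecs if k not in (word, src_cls, dst_cls)]
    return max(candidates, key=lambda k: cosine(vecs[k], target))


def translate_document(vecs: Dict[str, Vector], words: Sequence[str],
                       src_cls: str, dst_cls: str) -> List[str]:
    """Translate word by word; out-of-vocabulary words are kept unchanged (a simplification)."""
    return [translate_word(vecs, w, src_cls, dst_cls) if w in vecs else w for w in words]


# Hand-made toy vectors: [German-ness, French-ness, country (1) vs. city (~0)].
vecs = {
    "germany": [1.0, 0.0, 1.0], "france": [0.0, 1.0, 1.0],
    "berlin": [1.0, 0.0, 0.2], "paris": [0.0, 1.0, 0.2],
    "munich": [0.9, 0.0, 0.1], "lyon": [0.0, 0.9, 0.1],
}
print(translate_word(vecs, "berlin", "germany", "france"))  # paris
print(translate_document(vecs, ["munich", "is", "large"], "germany", "france"))
# ['lyon', 'is', 'large']
```

Applied to every word of a document, this kind of offset produces the partially garbled but class-flavoured text shown on slide 16.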
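
Slide 17 describes two-phase inference: a binary step first decides whether a document belongs to a seen or an unseen class, and a fine-grained step then picks a class within that group. The sketch below shows only that control flow. The keyword-overlap scorers and the rule "unseen if no seen-class keyword matches" are hypothetical stand-ins for trained classifiers, chosen so the example runs without any model.

```python
from typing import Callable, Dict, Set

Scores = Dict[str, float]


def keyword_scorer(class_keywords: Dict[str, Set[str]]) -> Callable[[str], Scores]:
    """Hypothetical stand-in classifier: score = number of class keywords in the document."""
    def score(doc: str) -> Scores:
        tokens = set(doc.lower().replace(",", " ").replace(".", " ").split())
        return {label: float(len(tokens & kws)) for label, kws in class_keywords.items()}
    return score


def two_phase_predict(doc: str, is_unseen: Callable[[str], bool],
                      seen_clf: Callable[[str], Scores],
                      unseen_clf: Callable[[str], Scores]) -> str:
    """Phase 1: seen vs. unseen. Phase 2: best class within the chosen group."""
    scores = unseen_clf(doc) if is_unseen(doc) else seen_clf(doc)
    return max(scores, key=lambda label: scores[label])


seen_clf = keyword_scorer({"Animal": {"species", "snail", "mollusk"},
                           "Athlete": {"olympian", "swimmer"}})
unseen_clf = keyword_scorer({"Plant": {"plants", "plantae", "flowering"},
                             "Education": {"university", "college"}})


def is_unseen(doc: str) -> bool:
    # Assumed rule: route to the unseen classes when no seen-class keyword matches.
    return max(seen_clf(doc).values()) == 0


doc = "Plants, also called green plants, are multicellular eukaryotes of the kingdom Plantae."
print(two_phase_predict(doc, is_unseen, seen_clf, unseen_clf))  # Plant
```

Separating the two decisions means the fine-grained step only has to compare classes within one group, so seen classes with plenty of training data do not crowd out unseen ones.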
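
Slide 22 lists data parallelism and synchronous distributed training via Horovod. The toy simulation below illustrates what synchronous data parallelism means under simplifying assumptions: a one-parameter linear model, equally sized data shards, and a plain Python average standing in for the allreduce that Horovod performs across servers. Each worker computes a gradient on its own shard, every worker receives the same averaged gradient, and so every replica applies an identical update and stays in sync.

```python
from typing import List, Sequence, Tuple

Example = Tuple[float, float]  # (x, y)


def local_gradient(w: float, shard: Sequence[Example]) -> float:
    """Gradient of the mean of 0.5 * (w*x - y)**2 over this worker's shard."""
    return sum((w * x - y) * x for x, y in shard) / len(shard)


def allreduce_mean(values: Sequence[float]) -> List[float]:
    """Stand-in for an allreduce: every worker receives the same average."""
    mean = sum(values) / len(values)
    return [mean] * len(values)


def train_sync(data: Sequence[Example], num_workers: int = 2,
               lr: float = 0.1, steps: int = 50) -> List[float]:
    shards = [data[i::num_workers] for i in range(num_workers)]  # data parallelism
    replicas = [0.0] * num_workers  # each worker holds its own copy of w
    for _ in range(steps):
        grads = [local_gradient(w, s) for w, s in zip(replicas, shards)]
        avg = allreduce_mean(grads)  # synchronisation point: needs every worker's gradient
        replicas = [w - lr * g for w, g in zip(replicas, avg)]
    return replicas


data = [(x, 3.0 * x) for x in (0.5, 1.0, 1.5, 2.0)]  # toy data from y = 3x
print(train_sync(data))  # both replicas hold the same value, close to 3.0
```

Because the shards here are equal in size, the average of per-shard mean gradients equals the full-batch gradient; with unequal shards a size-weighted average would be needed to keep that equivalence.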
