2018 HPCC Systems Summit Community Day
Deep Content Learning in Traffic Prediction and Text Classification
Jingqing Zhang
Prof. Yike Guo
Data Science Institute
Imperial College London
Outline
• Imperial DSI
• Deep Content Learning
• Research Projects
– Traffic Prediction
– Zero-shot Text Classification
• TensorLayer
• HPCC Systems + TensorLayer
The Success of Deep Learning
Johnson, Justin, Andrej Karpathy, and Li Fei-Fei. "Densecap: Fully convolutional localization networks for dense captioning." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
[Figure panels: CV & NLP, Medical, Game]
Deep Learning + Content Providers
[Figure: Deep Learning + Content Providers → Deep Content Learning]
Deep Content Learning
• Environment perception: machine learning and deep learning on data
• Reasoning: knowledge, logics and rules from content providers
• Decision making: a decision or suggestion
• Example: an image is perceived as "Dog"; reasoning with content-provider knowledge ("Huskies usually have a thick double coat that can be gray, black, copper red, or white. Their eyes are typically pale blue, although they may also be brown, green, blue, yellow, or heterochromic.") leads to the decision "Husky".
Concrete Projects (Completed So Far)
• P1: Traffic Prediction
– Deep Sequence Learning with Auxiliary Information for Traffic Prediction, KDD 2018
• P2: Zero-shot Text Classification
– Integrating Semantic Knowledge to Tackle Zero-shot Text Classification, submitted for review
P1: Deep Sequence Learning with Auxiliary Information for Traffic Prediction
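• How does online information affect traffic?
[Figure: navigation queries to the Marriott Buckhead, venue of the HPCC Systems Summit, made through map apps, shown against nearby traffic speed (blue: normal conditions; red: abnormal conditions)]
• Spearman's rank correlation coefficient between query count and traffic speed: ρ = −0.52, p-value = 1.23 × 10⁻⁴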
Deep Sequence Learning with Auxiliary Information for Traffic Prediction, Binbing Liao, Jingqing Zhang, Chao Wu, Douglas McIlwraith, Tong Chen, Shengwen Yang, Yike Guo, and Fei Wu, KDD 2018
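A minimal sketch of how a Spearman coefficient like the one above can be computed: Spearman's ρ is the Pearson correlation of the two variables' ranks, with tied values given their average rank. The query counts and speeds below are made-up values, not data from the paper, and the sketch omits the p-value; scipy.stats.spearmanr returns both if SciPy is available.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover every element tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical hourly query counts for a venue and the nearby traffic speed (km/h).
queries = [10, 40, 80, 120, 60]
speeds = [55, 48, 30, 22, 41]
print(round(spearman_rho(queries, speeds), 6))  # -1.0: speed falls strictly as queries rise
```

With real data the relationship is noisier, which is why a moderate value such as ρ = −0.52 is what one expects to see rather than −1.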
Solution
• Environment perception: sequence learning on traffic data
• Reasoning: query impact and event discovery
• Decision making: traffic prediction
Event Discovery in Query Records
• Events discovered from query records correspond to real public events such as concerts, forums and attractions.
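The slide does not specify the detection rule, so the following is only a hedged illustration of the idea in the speaker notes (a location's query count in some period is much higher than its normal count): flag a location when its current count exceeds its historical mean by more than k standard deviations. The threshold rule, the parameter k and all counts are assumptions for illustration, not the method or data from the paper.

```python
from statistics import mean, stdev


def discover_events(history, current, k=3.0):
    """Return locations whose current query count exceeds mean + k * stdev of their history.

    history: location -> past query counts for the same period (at least two values)
    current: location -> query count in the period being checked
    """
    events = []
    for place, counts in history.items():
        threshold = mean(counts) + k * stdev(counts)
        if current.get(place, 0) > threshold:
            events.append(place)
    return sorted(events)


# Hypothetical counts: a spike at Capital Gym, ordinary variation at a shopping mall.
history = {
    "Capital Gym": [100, 110, 90, 105, 95],
    "Shopping Mall": [300, 310, 290, 305, 295],
}
current = {"Capital Gym": 900, "Shopping Mall": 320}
print(discover_events(history, current))  # ['Capital Gym']
```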
Modelling
[Figure: encoder-decoder model]
• Encoder: LSTMs over the traffic sequence v_1, …, v_t; at each step the input is concatenated with a Graph CNN over the neighbours NB(v_i).
• Encoder for query impact: LSTMs over QI(t+1), QI(t+2), …, QI(t+t′).
• Decoder: LSTMs from <START> to <END> that output v_{t+1}, …, v_{t+t′} through FC layers, with attribute features AT(v_{t+1}), …, AT(v_{t+t′}) concatenated.
• Module mapping: traffic perception (sequence learning), reasoning (query impact), decision making.
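To make the encoder's "Concat" step concrete, here is a small sketch of how per-step encoder inputs could be assembled. It replaces the Graph CNN with a plain mean over neighbouring road segments and uses hypothetical speeds and a hypothetical road graph; it illustrates the data flow only, contains no LSTM, and is not the paper's model.

```python
def neighbour_mean(speeds, neighbours, t):
    """Mean speed of the neighbouring segments at step t.

    A deliberately simple stand-in for the Graph CNN over NB(v); 0.0 if there are no neighbours.
    """
    if not neighbours:
        return 0.0
    return sum(speeds[n][t] for n in neighbours) / len(neighbours)


def encoder_inputs(speeds, graph, road, steps):
    """One vector per encoder step: [v_i, aggregate of NB(v_i)], i.e. the encoder 'Concat'."""
    return [[speeds[road][t], neighbour_mean(speeds, graph[road], t)] for t in range(steps)]


# Hypothetical speeds (km/h) for three road segments over two time steps.
speeds = {"a": [50.0, 40.0], "b": [30.0, 20.0], "c": [70.0, 60.0]}
graph = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}  # adjacency list: NB(v)
print(encoder_inputs(speeds, graph, "a", 2))  # [[50.0, 50.0], [40.0, 40.0]]
```

In the actual model the neighbour aggregation is learned and each vector feeds one LSTM step; the mean here only shows the shape of the data.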
Result
• Traffic is harder to predict when events happen.
• The query impact is more informative and more closely related to real-time traffic.
• More information is available at https://github.com/JingqingZ/BaiduTraffic
P2: Integrating Semantic Knowledge to Tackle Zero-shot Text Classification
• Zero-shot learning: learning about a new category without any training instance
– Which one is the “Okapi”?
– A zebra-striped, four-legged animal with a brown torso and a deer-like face
Zero-shot Text Classification
• Traditional text classification: text documents
• Zero-shot text classification: text documents + knowledge, across environment perception, reasoning and decision making
• Example: "Imperial College London is a public research university located in London." → Education
Reasoning – Relationship Vectors
• Relationship vectors built from ConceptNet
– Find the relation between words and classes without any training data
– Based on particular types of relations
– Based on the length of the shortest path
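A rough sketch of a relationship vector between a word and a class, assuming a tiny hand-made graph in the style of ConceptNet: one indicator per relation type for a direct edge, plus the shortest-path length found by breadth-first search (capped at max_hops + 1 when no path exists). The triples, the chosen relation names and the cap are illustrative assumptions; the paper's exact features may differ.

```python
from collections import defaultdict, deque

# Hypothetical ConceptNet-style triples (word, relation, concept); not real ConceptNet data.
TRIPLES = {
    ("university", "IsA", "education"),
    ("london", "RelatedTo", "university"),
    ("rose", "IsA", "plant"),
}
RELATIONS = ("IsA", "RelatedTo", "PartOf")  # relation types used as indicator features (assumed)


def build_graph(triples):
    """Undirected adjacency sets, ignoring relation labels, for path-length queries."""
    graph = defaultdict(set)
    for head, _, tail in triples:
        graph[head].add(tail)
        graph[tail].add(head)
    return graph


def shortest_path_length(graph, source, target, max_hops):
    """Breadth-first search; hop count, or None if no path of at most max_hops exists."""
    if source == target:
        return 0
    seen, queue = {source}, deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if dist == max_hops:
            continue
        for nxt in graph.get(node, ()):
            if nxt == target:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def relationship_vector(word, cls, triples, graph, max_hops=3):
    direct = [int((word, rel, cls) in triples) for rel in RELATIONS]
    hops = shortest_path_length(graph, word, cls, max_hops)
    return direct + [hops if hops is not None else max_hops + 1]


graph = build_graph(TRIPLES)
print(relationship_vector("university", "education", TRIPLES, graph))  # [1, 0, 0, 1]
print(relationship_vector("london", "education", TRIPLES, graph))      # [0, 0, 0, 2]
print(relationship_vector("rose", "education", TRIPLES, graph))        # [0, 0, 0, 4]
```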
Reasoning – Topic Translation
• In the learning stage, there is no information about unseen classes.
• In the inference stage, the unseen classes are known (label, description), but there is still no training data.
• Can we infer what documents from unseen classes would look like?
• Can we generate fake documents that look like real data from unseen classes?
[Figure: analogy in vector space. c: Germany, w: Berlin; c′: France, w′: ?]
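The Germany/Berlin/France picture is the familiar word-analogy arithmetic in an embedding space. As an illustration only (toy three-dimensional vectors, not trained embeddings, and not necessarily the exact translation procedure of the paper), w′ can be found as the vocabulary word closest, by cosine similarity, to w − c + c′:

```python
import math

# Toy embeddings; the dimensions loosely mean (capital city, German, French).
VECTORS = {
    "germany": [0.0, 1.0, 0.0],
    "france": [0.0, 0.0, 1.0],
    "berlin": [1.0, 1.0, 0.0],
    "paris": [1.0, 0.0, 1.0],
    "rome": [1.0, 0.0, 0.0],
    "lyon": [0.3, 0.0, 1.0],
}


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def translate_word(w, c, c_new, vectors):
    """Return the word w' whose vector is most similar to w - c + c_new."""
    target = [a - b + d for a, b, d in zip(vectors[w], vectors[c], vectors[c_new])]
    candidates = [x for x in vectors if x not in (w, c, c_new)]
    return max(candidates, key=lambda x: cosine(vectors[x], target))


print(translate_word("berlin", "germany", "france", VECTORS))  # paris
```

Real embeddings are high-dimensional and noisy, so the nearest neighbour is not always the intended word.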
Example of Translated Documents
Animal (original): Mitra perdulca is a species of sea snail a marine gastropod mollusk in the family Mitridae the miters or miter snails.
Animal → Plant: Arecaceae perdulca is a flowering of port aster a naval mollusk gastropod in the fabaceae Clusiaceae the tiliaceae or rockery amaryllis.
Animal → Athlete: Mira perdulca is a swimmer of sailing sprinter an Olympian limpets gastropod in the basketball Middy the miters or miter skater.
• Although not fully understandable, the translated documents carry the tone of the target class.
Decision Making – Two-phase Inference
• Phase 1: binary classification (seen vs. unseen class)
• Phase 2: fine-grained classification
• Example: "Plants, also called green plants, are multicellular eukaryotes of the kingdom Plantae." → unseen → Plant
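The control flow of two-phase inference can be sketched as follows. The keyword-overlap scorer and the confidence threshold used for the seen/unseen decision are hypothetical stand-ins for trained classifiers, chosen only so the example runs; they are not the paper's models.

```python
import re


def keyword_scores(doc, class_keywords):
    """Toy scorer: fraction of each class's keywords that occur in the document."""
    tokens = set(re.findall(r"[a-z]+", doc.lower()))
    return {c: len(tokens & kws) / len(kws) for c, kws in class_keywords.items()}


def two_phase_predict(doc, seen, unseen, threshold=0.5):
    # Phase 1 (binary): keep the document among seen classes only if one of them is confident.
    seen_scores = keyword_scores(doc, seen)
    best_seen = max(seen_scores, key=seen_scores.get)
    if seen_scores[best_seen] >= threshold:
        return "seen", best_seen
    # Phase 2 (fine-grained): otherwise choose among the unseen classes only.
    unseen_scores = keyword_scores(doc, unseen)
    return "unseen", max(unseen_scores, key=unseen_scores.get)


SEEN = {
    "Animal": {"species", "mollusk", "family"},
    "Athlete": {"olympian", "sprinter", "swimmer"},
}
UNSEEN = {
    "Plant": {"plants", "plantae", "eukaryotes", "flowering"},
    "Education": {"university", "research", "school"},
}

doc = "Plants, also called green plants, are multicellular eukaryotes of the kingdom Plantae."
print(two_phase_predict(doc, SEEN, UNSEEN))  # ('unseen', 'Plant')
```

Separating the seen/unseen decision from the fine-grained one gives unseen classes a chance even when a model would otherwise be biased toward the seen classes it was trained on.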
Result – Overall Performance
• The proposed two-phase inference with integrated semantic knowledge is a promising approach to the challenging problem of zero-shot text classification.
• More information about this project will be released soon.
Implementation of Deep Learning
Gaps
• Abstraction gap: TensorFlow provides low-level APIs, while deep learning is expressed in high-level neural networks.
• Performance gap: industry requires high performance.
TensorLayer – What is TensorLayer?
• TensorLayer is a unique TensorFlow wrapper library that can
I. teach deep learning
II. support cutting-edge research
III. run in real-world applications
• On GitHub since late 2016:
– > 4000 stars
– > 1000 forks
– > 70 contributors
HPCC Systems + TensorLayer
[Figure: software stack deployed across Server 1 and Server 2: HPCC Systems, connected through Py3embed, TensorLayer (high-level wrapper), Horovod (distributed framework), TensorFlow]
• Data parallelism
• Synchronous distributed training
• GPU acceleration + CPU input pipeline
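Synchronous data-parallel training, as provided by Horovod, keeps one model replica per worker, gives each worker a different shard of the data, and averages the gradients across workers (an allreduce) before every update, so all replicas stay identical. The pure-Python simulation below illustrates that loop on a one-parameter least-squares model; the allreduce is a plain average rather than Horovod's ring-allreduce, and no HPCC Systems, TensorLayer or Horovod API is used.

```python
def local_gradient(w, shard):
    """Gradient of the mean squared error of y ≈ w * x on one worker's shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)


def allreduce_mean(values):
    """Simulated allreduce: every worker receives the average of all workers' values."""
    avg = sum(values) / len(values)
    return [avg] * len(values)


def train_sync(data, num_workers=2, lr=0.01, steps=200, w0=0.0):
    # Data parallelism: worker i gets every num_workers-th example.
    shards = [data[i::num_workers] for i in range(num_workers)]
    weights = [w0] * num_workers  # one replica of the parameter per worker
    for _ in range(steps):
        grads = [local_gradient(w, s) for w, s in zip(weights, shards)]
        # Synchronous step: nobody updates until every worker's gradient is averaged.
        avg_grads = allreduce_mean(grads)
        weights = [w - lr * g for w, g in zip(weights, avg_grads)]
    return weights


data = [(float(x), 3.0 * x) for x in range(1, 9)]  # true slope is 3
print([round(w, 4) for w in train_sync(data)])  # [3.0, 3.0]
```

Because every step waits for the slowest worker and for gradient communication, adding a second GPU does not necessarily halve training time, which is worth keeping in mind when reading the benchmark that follows.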
HPCC Systems + TensorLayer
#GPU  Dataset   #Epoch  Batch size  Time (s)  #Images  Images/sec  Accuracy  GPU mem (MB)
1     MNIST     50      512         135       2.5M     18.4K       0.98      ~315
2     MNIST     50      512         122       2.5M     20.4K       0.99      ~315
1     CIFAR-10  50      512         232       2.5M     10.6K       0.69      ~1435
2     CIFAR-10  50      512         221       2.5M     11.1K       0.71      ~1435
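The table's times can be turned into speedup (1-GPU time divided by 2-GPU time) and scaling efficiency (speedup divided by the number of GPUs). A small helper, with the times copied from the table:

```python
# Wall-clock training times in seconds, copied from the table: (dataset, #GPU) -> time.
TIMES = {
    ("MNIST", 1): 135,
    ("MNIST", 2): 122,
    ("CIFAR-10", 1): 232,
    ("CIFAR-10", 2): 221,
}


def speedup(dataset, gpus):
    """How many times faster than the single-GPU run."""
    return TIMES[(dataset, 1)] / TIMES[(dataset, gpus)]


def scaling_efficiency(dataset, gpus):
    """1.0 would mean perfect linear scaling."""
    return speedup(dataset, gpus) / gpus


for name in ("MNIST", "CIFAR-10"):
    print(name, round(speedup(name, 2), 2), round(scaling_efficiency(name, 2), 2))
# MNIST 1.11 0.55
# CIFAR-10 1.05 0.52
```

Speedups of about 1.11× (MNIST) and 1.05× (CIFAR-10) on two GPUs, i.e. roughly 55% and 52% efficiency, fit the caution that it is too early to draw conclusions; small models like these plausibly leave too little computation per step to outweigh synchronisation overhead.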
HPCC Systems + TensorLayer
• It is still too early to draw a conclusion.
• Future work
– Larger models, e.g. OpenPose, to test distributed training.
– Closer integration of HPCC Systems and TensorLayer.
• https://github.com/tensorlayer/openpose-plus
• https://github.com/tensorlayer/tensorlayer/tree/master/examples/distributed_training
[Figure: data processing, distributed training and deployment]
Summary
• Deep Content Learning: environment perception, reasoning, decision making
• HPCC Systems + TensorLayer: data processing, distributed training, deployment
Q & A
Thanks
Jingqing Zhang
Prof. Yike Guo
Data Science Institute
Imperial College London
For more information, please visit
http://www.doc.ic.ac.uk/~jz9215/


Editor's Notes

  • #2 Hello everyone, it's my great pleasure to celebrate this community day and introduce research advances at the Data Science Institute, Imperial College London. I hope you will enjoy my talk.
  • #3 This is the outline of my talk. I will first introduce who we are: the Data Science Institute at Imperial College. Then I will propose the idea of deep content learning, with two projects we have conducted so far. Next, I will introduce TensorLayer, a development tool for deep learning models. Finally, I would like to share some of the work we have done to integrate HPCC Systems with TensorLayer.
  • #4 The Data Science Institute at Imperial College London was launched in 2014. The DSI aims to enhance Imperial's excellence in data-driven research across its faculties, and therefore receives support from the Faculties of Engineering, Medicine and Natural Sciences as well as the Business School. The DSI consists of seven parts: one hub and six labs. Each lab has its own focus, as you can see in this figure, and the Hub mainly focuses on data management, data analysis and machine learning. I am doing my PhD at the DSI Hub, so my research focuses on machine learning, deep learning and their applications. As you may notice, [click to next page]
  • #5 Deep learning has achieved great success in many scenarios, including computer vision, natural language processing, medical imaging and game playing. In many tasks, such as object recognition in images, deep learning models perform even better than humans. Those tasks need to be well defined and, most importantly, a huge amount of data is necessary. However, tasks that require semantic understanding, inference and reasoning, such as question answering, chatbots and medical diagnosis, can be very challenging for deep learning models. So current AI systems are still far from the ultimate goal of AI, which is to do what humans can do. [click to next page]
  • #6 The good news is that nowadays we are not only in the era of big data; we also have lots of content providers. Content providers organise and provide knowledge in general or specific domains. The content they provide is also a kind of data, but it tends to be better organised, more structured and of higher quality. [click] Good examples are the content provided by Elsevier and LexisNexis. [click] Therefore, we believe that combining deep learning with such content will be essential in future AI research. We call this Deep Content Learning. [click next page]
  • #7 We think Deep Content Learning has at least three key modules: perception, reasoning and decision making. The perception module extracts features and representations from data, which is what machine learning and deep learning were initially designed to do. The reasoning module brings in additional knowledge from content providers to infer information related to the scenario. The final decision-making module combines all the results and makes the right decision, driven by utility. For example, [click] given an image of a dog, the perception module extracts the features of this dog, such as its coat colour and eye colour. The reasoning module finds the knowledge that describes this specific kind of dog. And the decision-making module predicts that this is a husky instead of just saying that it is a dog. [click next page]
  • #8 We have conducted two concrete projects under the idea of Deep Content Learning: traffic prediction with auxiliary information, and zero-shot text classification with semantic knowledge.
  • #9 As we know, traffic is normally periodic. There are peak hours when traffic is heavy and off-peak hours when it is light, so it is easy for models to predict the traffic. However, if a place is holding a public event, like the HPCC Systems Summit here today, the traffic may no longer be normal: a crowd of people will come here, the traffic nearby will be abnormally heavy, and a classic traffic prediction model may fail. But how can we detect such conditions? I believe most people nowadays can't drive without a navigation app like Google Maps. If one person is searching for this hotel, maybe everything is fine. [click] But if a lot of people are searching for this hotel, I am a little worried about the traffic here. [click] This figure shows how search queries from a map app are related to traffic speed. The blue lines are normal conditions and the red lines are abnormal conditions. As you can see, there is a very clear negative correlation between traffic speed and online search queries, and the statistics confirm this idea. [click to next page]
  • #10 In this project, we used conventional sequence learning as the perception module. We quantified the query impact on traffic and performed event discovery in the reasoning module. The decision-making module integrated all the information and predicted traffic speed. [click next page]
  • #11 This table shows the events we discovered from the query records. [click] For example, in this row of the table, the number of queries searching for Capital Gym during this period is much higher than the normal query count. We find some other popular locations as well. [click] These events actually correspond to real public events such as concerts, forums and attractions. [click next page]
  • #12 This slide introduces the modelling: pure temporal → spatial relations → attributes → query impact.
  • #14 The key challenge is to transfer knowledge from familiar to unfamiliar classes (generalisation). Research on zero-shot learning can be very useful when training data for emerging classes is insufficient or even unavailable, and the problem of emerging classes is common in many domains, such as research topics, social media, advertisement, object recognition and medical diagnosis. Few previous studies have investigated zero-shot text classification.
  • #15 Recognising text documents from categories that were never seen in the training stage.
  • #17 As we have no training data for unseen classes, the model can be biased towards the data we have. So the model may not be able to differentiate between seen and unseen classes, especially when there is some semantic overlap between classes.
  • #21 Reasons to use TensorFlow: the largest user base, the widest production adoption, well-maintained documentation and battle-tested quality. But it is hard to master because of its low-level interfaces.
  • #24 We hope in the future TensorLayer can be integrated into HPCC Systems to provide powerful
  • #26 The improvement from distributed training on 2 GPUs is not yet significant.