8th TUC Meeting – Yinglong Xia (Huawei), Big Graph Analytics Engine

Yinglong started his talk by introducing his new position at Huawei, what the company does, and more specifically how it is involved in Big Data research and graphs. He also explained that his research center currently works on Big Data Analytics and Management from four sides: Natural Language Processing, Graph Analytics, Machine Learning, and Deep Learning.

Published in: Technology

  1. Big Graph Analytics Engine. Yinglong Xia, 6/23/2016. 8th Linked Data Benchmark Council TUC Meeting @ Oracle Conference Center
  2. Introduction
  3. Introduction. http://www.huawei.com/en/about-huawei/corporate-governance/corporate-governance
  4. Recent Growth: Revenue, Net Profits, Cash flow from operating activities. http://www.huawei.com/en/about-huawei
  5. Collaboration: Industrial Partners, Universities, Standards Technical organizations, Global Research Institute & Labs, Open Source
  6. Graph Analytics for Smart Big Data. Big Data Analytics & Management: Graph, Machine Learning, NLP, Deep Learning
  7. Graph in ONOS (HotSDN’2014)
  8. Topology Impact on Information Propagation
  9. Explore the Variety in Graph Analytics (figure label: Graph)
  10. Challenges
      ● Very large scale graphs for analysis
        • 10B~1000B in terms of the number of vertices
        • a few hundreds of properties, static and dynamic
        • distributed communication introduces additional overhead
      ● Irregularity in graph data access
        • Low data locality results in high disk/communication IO overhead
        • Data access patterns are diverse among graph analysis algorithms
      ● Near real-time requirement
        • Incorporate with incremental graph updates
        • Approximate query & analysis should be considered
      ● Efficiency and productivity to balance
  11. Graph Platform for Smart Big Data. Layers: Infrastructure, Data Management, Graph engines, Visualization, Analytics. Components: Single Machine, Cluster, GPU Server, Cloud; Structure Management, Property Management, Metadata Management, Permission Control; Basic Engine, Streaming Graph, Graphical Model, Hyper Graph; Bayes Net, Community, Label propagation, Centrality, Anomaly detection, Matching, Ego Feature, Max Flow; Dynamic Graph Vis, Property Vis, Large Graph Vis, Incremental Update
  12. Graph Platform. Diagram labels: bi-temp query; E→Edge Prop; Data Source; Graph Topology and Property; V→Adjacency; KC/KV Store; V→Vertex Prop; Prop Idx Encoding; External (Solr/CLucene); Concurrency Control; Main Storage; Dynamic Graph; Online query/modification; Property indices; Ingestion; V→TimeStamp→Adjacency; Streaming graph storage; V→TimeStamp→Vprop; V→TimeStamp→Eprop; Sliding Window; KC/KV Store; CSR; Sparse subgraphs; Dense subgraphs; GPU Offload; Direct Solver; Iterative Solver; Snapshots; Double buffering; Batch processing; Streaming Graph; Triple Store; Streaming algorithms; Graph Inference; Inference Tools (Virtuoso, Jena, etc.); Knowledge Graph; Online update property graph; Periodically updated static graph snapshots; Probabilistic Graphical Model & Inference; Offline Batch Processing; online/offline analysis; MVCC KV Store; Snapshot Management
  13. Unified Graph Data Access Patterns. Figure: a six-vertex weighted graph and its equivalent sharded edge tables (src, dst, value), repeated for step 1, step 2, and step 3 of iteration i. Rows with a blank src cell take the src of the row above.
      shard 1 (1, 2): (1, 2, 0.3), (3, 2, 0.2), (4, 1, 1.4), (5, 1, 0.5), (5, 2, 0.6), (6, 2, 0.8)
      shard 2 (3, 4): (1, 3, 0.4), (2, 3, 0.3), (3, 4, 0.8), (5, 3, 0.2), (6, 4, 1.9)
      shard 3 (5, 6): (2, 5, 0.6), (3, 5, 0.9), (3, 6, 1.2), (4, 5, 0.3), (5, 6, 1.1)
      Observation on PSW data access patterns inspires highly efficient sharding representation
  14. Construct Edge-set Flows. Figure labels: row permutation; column permutation (vertex order 3 5 1 2 4 6 shown against 1 2 3 4 5 6); Physical edge-sets, numbered 1 to 9; Flow direction
  15. Preliminary Experiments - Preproc. Graph Ingestion/Preprocessing Time. Create the data in our format
  16. Preliminary Experiments - Comp. PageRank w/o Loading Time. Decent speedup achieved w/ or w/o loading time
  17. Preliminary Experiments. PageRank Total Time
  18. Conclusion
      ● Many big data problems involve links among a lot of entities, naturally represented as a graph
      ● Property graph is highly expressive
      ● Industry is looking for graph/graphical model engines for complex network analysis, streaming graph, probabilistic graphical models, and RDF graph computing
      ● Efficiency is the key in many industry graph analysis systems, especially when the data volume is big
      ● Eventually, the graph engine should serve for AI Business systems
  19. Thanks. Yinglong Xia, yinglong.xia.2010@ieee.org
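
The sharded edge tables on slide 13 can be illustrated with a short sketch. It assumes what the tables appear to show: each shard holds the edges whose destination falls in one vertex interval, (1, 2), (3, 4), or (5, 6), and edges inside a shard are sorted by source. The edge list is read off the slide, with blank src cells taken from the row above. The names `build_shards` and `window` are hypothetical and are not taken from the talk or from any library.

```python
from collections import defaultdict

# Edge list (src, dst, value) read off the slide 13 tables.
# Assumption: a blank src cell repeats the src of the previous row.
EDGES = [
    (1, 2, 0.3), (3, 2, 0.2), (4, 1, 1.4), (5, 1, 0.5), (5, 2, 0.6), (6, 2, 0.8),
    (1, 3, 0.4), (2, 3, 0.3), (3, 4, 0.8), (5, 3, 0.2), (6, 4, 1.9),
    (2, 5, 0.6), (3, 5, 0.9), (3, 6, 1.2), (4, 5, 0.3), (5, 6, 1.1),
]


def build_shards(edges, interval_size):
    """Group edges by the interval that contains dst, then sort each shard by src."""
    shards = defaultdict(list)
    for src, dst, value in edges:
        shard_id = (dst - 1) // interval_size  # vertex ids are 1-based
        shards[shard_id].append((src, dst, value))
    return [sorted(shards[i]) for i in sorted(shards)]


def window(shard, lo, hi):
    """Return the edges of a shard whose src lies in [lo, hi].

    Because the shard is sorted by src, these edges form one contiguous run.
    """
    return [edge for edge in shard if lo <= edge[0] <= hi]


shards = build_shards(EDGES, interval_size=2)

# Processing interval (3, 4): in-edges come from shard 2 as a whole,
# out-edges come from the src-in-(3, 4) window of every shard.
in_edges = shards[1]
out_edges = [edge for shard in shards for edge in window(shard, 3, 4)]
```

Sorting by source is what makes each per-interval window a contiguous block, so one pass over an interval touches one full shard plus one block per other shard.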

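Slide 12 names CSR next to the sparse-subgraph storage. As a reminder of what that layout looks like, here is a minimal sketch of compressed sparse row (CSR) arrays built from an edge list. It assumes 0-based vertex ids and a small made-up edge list; `to_csr` and `neighbors` are hypothetical names and do not describe the engine presented in the talk.

```python
def to_csr(num_vertices, edges):
    """Build CSR arrays (row_ptr, col_idx, vals) from (src, dst, value) edges."""
    # Count out-edges per source vertex, shifted by one slot.
    row_ptr = [0] * (num_vertices + 1)
    for src, _, _ in edges:
        row_ptr[src + 1] += 1
    # Prefix sum turns counts into row start offsets.
    for v in range(num_vertices):
        row_ptr[v + 1] += row_ptr[v]
    col_idx = [0] * len(edges)
    vals = [0.0] * len(edges)
    next_slot = row_ptr[:-1].copy()  # next free position in each row
    for src, dst, value in edges:
        slot = next_slot[src]
        col_idx[slot] = dst
        vals[slot] = value
        next_slot[src] += 1
    return row_ptr, col_idx, vals


def neighbors(row_ptr, col_idx, vals, v):
    """Return the (dst, value) pairs of the out-edges of vertex v."""
    start, end = row_ptr[v], row_ptr[v + 1]
    return list(zip(col_idx[start:end], vals[start:end]))


# Illustrative edge list, not data from the slides.
edges = [(0, 1, 0.3), (2, 1, 0.2), (0, 2, 0.4), (1, 2, 0.3)]
row_ptr, col_idx, vals = to_csr(3, edges)
```

The adjacency of a vertex is a single slice of `col_idx`, which is why CSR suits static or snapshot graphs better than graphs under frequent online modification.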