A Platform for Analyzing and Visualizing Growing Network Graphs with Apache Kafka and a Graph Database

Lightning talk slides for Apache Kafka Meetup Japan #4: https://kafka-apache-jp.connpass.com/event/77889/

  1. Apache Kafka / @laclefyoshi / ysaeki@r.recruit.co.jp
  2. • Apache Kafka
     • Graph database: JanusGraph
     • Apache Kafka + JanusGraph
  3. Apache Kafka
  4. Apache Kafka
     • Version 1.0.0
     • Web
       {"browser_id": "XXXX001",
        "user_id": "YYYY001",
        "client_ip": "ZZ.Z.ZZ.ZZZ",
        "page_url": "http://serviceA/a1001/a10012.html",
        "referrer": "http://serviceA/a1001/a10011.html",
        "browser": "Mozilla/5.0 (X11; Linux x86_64; rv:10.0) Gecko/20100101 Firefox/10.0",
        "...": "..."}
  5. • SQL
     • SELECT … JOIN (JOIN (JOIN …
     • https://www.slideshare.net/laclefyoshi/ss-74398007
  6. {"browser_id": "XXXX001",
      "user_id": "YYYY001",
      "client_ip": "ZZ.Z.ZZ.ZZZ",
      "page_url": "http://serviceA/a1001/a10012.html",
      "referrer": "http://serviceA/a1001/a10011.html",
      "browser": "Mozilla/5.0 (X11; Linux x86_64; rv:10.0) Gecko/20100101 Firefox/10.0",
      "...": "..."}
     [Figure: the record drawn as a graph. Vertices: XXXX001, YYYY001, ZZ.Z.ZZ.ZZZ, http://serviceA/a1001/a10012.html, http://serviceA/a1001/a10011.html, and the browser string. Edge labels: browser_id, user_id, client_ip, browser, page_url, link_to.]
  7. [Same record and graph figure as slide 6.]
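The graph figure on slides 6 and 7 turns the fields of one log record into vertices joined by labeled edges. A minimal sketch of such a mapping follows. The edge labels come from the slide, but the topology used here (the `browser_id` value as the hub, plus a `link_to` edge from the referrer to the page) is an assumption, since the figure itself did not survive extraction. The names `record_to_edges` and `LINKED_FIELDS` are hypothetical.

```python
from typing import Dict, List, Tuple

# (source vertex ID, edge label, destination vertex ID)
Edge = Tuple[str, str, str]

# Fields linked from the browser_id vertex. This topology is an assumption;
# only the edge labels themselves appear on the slide.
LINKED_FIELDS = ("user_id", "client_ip", "browser", "page_url")


def record_to_edges(record: Dict[str, str]) -> List[Edge]:
    """Turn one access-log record into labeled edges between field values."""
    edges: List[Edge] = []
    hub = record.get("browser_id")
    if hub:
        for field in LINKED_FIELDS:
            value = record.get(field)
            if value:
                edges.append((hub, field, value))
    # A page view with a referrer becomes a link between the two pages.
    if record.get("referrer") and record.get("page_url"):
        edges.append((record["referrer"], "link_to", record["page_url"]))
    return edges


sample = {
    "browser_id": "XXXX001",
    "user_id": "YYYY001",
    "client_ip": "ZZ.Z.ZZ.ZZZ",
    "page_url": "http://serviceA/a1001/a10012.html",
    "referrer": "http://serviceA/a1001/a10011.html",
    "browser": "Mozilla/5.0 (X11; Linux x86_64; rv:10.0) Gecko/20100101 Firefox/10.0",
}

for edge in record_to_edges(sample):
    print(edge)
```

Because each vertex is identified by the field value itself, two records that share a `user_id` or a `page_url` attach to the same vertex, which is what lets the graph grow as records arrive.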
  8. FlockDB, Stardog
  9. • Edge, Vertex or Node
     • Property
     [Figure: two Vertex boxes joined by an Edge; property labels: id, name, type]
  10. JanusGraph
      • Fork of Titan
      • January 2017: OSS
      • Titan: June 2016
      • DataStax Enterprise Graph
      • Linux Foundation
      • Google, IBM, Hortonworks
  11. JanusGraph
      • HBase / GCP Cloud BigTable
      • Cassandra
      • Oracle Berkeley DB
      • InMemory
      • Elasticsearch/Solr
      • Gremlin
      • IBM, Google "SQLGraph: An Efficient Relational-Based Property Graph Store" (SIGMOD 2015)
  12. Apache Kafka + JanusGraph
  13. • Apache Kafka + Neo4j (Web)
      • Neo4j Use Case: Low Latency Graph Analytics & OLTP - Update 1M Nodes in 90 secs with Kafka and Neo4j Bolt
        https://gist.github.com/graphadvantage/a148613f75818897e396a64957dc6ef1
      • Managing Genetic Ancestry at Scale with Neo4j and Kafka
        https://stampedecon.com/big-data-conference-2015/sessions/managing-genetic-ancestry-at-scale-with-neo4j-and-kafka/
  14. [Figure: Users, Developers, Scientists]
  15. • AppEngine, HTTP
      • https://cloud.google.com/pubsub/docs/tutorials
      • AppEngine, PubSub API, UrlFetch
      • https://cloud.google.com/appengine/quotas#UrlFetch
  16. PubSub2Kafka
      • PubSub, Kafka
      • fluentd, Kafka
        • PubSub Input: https://github.com/mia-0032/fluent-plugin-gcloud-pubsub-custom
        • Kafka Output: https://github.com/fluent/fluent-plugin-kafka
      • Go
        • PubSub: "cloud.google.com/go/pubsub"
        • Kafka: "github.com/segmentio/kafka-go"
      • PubSub, fluentd
      • Dataflow
  17. JanusGraph 1/2
      • JanusGraph Server (Gremlin Server)
      • conf/janusgraph-hbase-es.properties, BigTable: conf/janusgraph-bigtable-es.properties
          storage.backend=hbase
          storage.hbase.ext.hbase.client.connection.impl=com.google.cloud.bigtable.hbase1_x.BigtableConnection
          storage.hbase.ext.google.bigtable.project.id=janus-poc
          storage.hbase.ext.google.bigtable.instance.id=janus-be
      • WebSocket, HTTP: conf/gremlin-server/gremlin-server.yaml
          host: 0.0.0.0
          port: 8182
          channelizer: org.apache.tinkerpop.gremlin.server.channel.HttpChannelizer
  18. JanusGraph 2/2
      • Nginx / Reverse Proxy
      • JanusGraph Server, Cross-Origin Resource Sharing
        location /janus/ {
            add_header Access-Control-Allow-Origin *;
            add_header Access-Control-Allow-Methods "POST, GET, OPTIONS";
            add_header Access-Control-Allow-Headers "Origin, X-Requested-With, Content-Type, Accept";
            add_header Access-Control-Allow-Credentials true;
            proxy_set_header Origin "";
            proxy_pass http://127.0.0.1:8182/;
            if ($request_method = OPTIONS ) {
                add_header Access-Control-Allow-Origin *;
                add_header Access-Control-Allow-Methods "POST, GET, OPTIONS";
                add_header Access-Control-Allow-Headers "Origin, X-Requested-With, Content-Type, Accept";
                add_header Access-Control-Allow-Credentials true;
                add_header Content-Length 0;
                add_header Content-Type text/plain;
                return 200;
            }
        }
  19. Kafka2Janusgraph
      • fluentd
      • JanusGraph Output Plugin
        Kafka Input JSON
        ↓
        Vertex ID/Edge
        ↓
        Gremlin
        ↓
        HTTP POST JanusGraph Server

        query = <<-'QUERY'
        sv = g.V(src_id).tryNext().orElseGet { g.addV(T.id, src_id).next() }
        dv = g.V(dst_id).tryNext().orElseGet { g.addV(T.id, dst_id).next() }
        sv.addEdge(edge_label, dv)
        QUERY
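The flow on slide 19 ends with an HTTP POST of a Gremlin script to JanusGraph Server. The sketch below illustrates that last step only. It is a Python stand-in, not the fluentd (Ruby) output plugin from the talk, whose source is not shown. Gremlin Server's HTTP endpoint accepts a JSON body with `gremlin` and `bindings` keys; passing `src_id`, `dst_id`, and `edge_label` as bindings rather than interpolating them into the script is a choice made here, and the names `build_payload` and `post_edge` are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict

# Same get-or-create script as on the slide; src_id, dst_id and edge_label
# are supplied per request through the "bindings" object (an assumption).
QUERY = """
sv = g.V(src_id).tryNext().orElseGet { g.addV(T.id, src_id).next() }
dv = g.V(dst_id).tryNext().orElseGet { g.addV(T.id, dst_id).next() }
sv.addEdge(edge_label, dv)
"""


def build_payload(src_id: Any, dst_id: Any, edge_label: str) -> bytes:
    """Encode one edge insertion as a Gremlin Server HTTP request body."""
    body = {
        "gremlin": QUERY,
        "bindings": {"src_id": src_id, "dst_id": dst_id, "edge_label": edge_label},
    }
    return json.dumps(body).encode("utf-8")


def post_edge(url: str, src_id: Any, dst_id: Any, edge_label: str,
              timeout: float = 10.0) -> Dict[str, Any]:
    """POST the script to the server and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=build_payload(src_id, dst_id, edge_label),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call against a local server (hypothetical IDs, not executed here):
# post_edge("http://127.0.0.1:8182/", 1, 2, "link_to")
```

Keeping the script text constant and varying only the bindings also avoids quoting problems when an ID or label contains characters that are special in Groovy.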
  20. Web UI
      • Vue.js + Nuxt
      • JanusGraph Server, axios, vis.js
      • Gremlin, JanusGraph Server
  21. 1
      • Gremlin: JanusGraph Server, POST BODY
      • JanusGraph Server API / Web UI
      • 2 JanusGraph Servers
        • BigTable
        • 1 for Kafka2Janusgraph
            storage.read-only=false  # default
        • 1 for Web UI
            storage.read-only=true
  22. 2
      • Edge
      • fluentd Output
      • JanusGraph, Edge ID
      • Edge ID
      • Edge
        v1.addEdge("follow", v2, T.id, 20000)  # Gremlin
        ↓
        "Edge does not support user supplied identifiers",
        "Exception-Class":"java.lang.UnsupportedOperationException",
        "exceptions":["java.lang.UnsupportedOperationException"]
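Slide 22 shows that JanusGraph rejects a user-supplied edge ID, so an edge cannot be made unique by choosing its ID on the client. The surviving slide text does not say how the talk handles this. As one possible alternative, offered purely as an assumption and not as the author's method, the Gremlin script can look for an existing edge with the same label between the same two vertices before adding one. The sketch below only builds the request body; `EDGE_UPSERT` and `build_edge_upsert_payload` are hypothetical names, and both vertices are assumed to exist already.

```python
import json
from typing import Any

# Check-then-insert script (an assumption, not taken from the slides).
# It reuses the tryNext().orElseGet { ... } idiom from slide 19.
EDGE_UPSERT = """
existing = g.V(src_id).outE(edge_label).where(__.inV().hasId(dst_id)).tryNext()
existing.orElseGet { g.V(src_id).next().addEdge(edge_label, g.V(dst_id).next()) }
"""


def build_edge_upsert_payload(src_id: Any, dst_id: Any, edge_label: str) -> bytes:
    """Encode the check-then-insert script as a Gremlin Server HTTP body."""
    body = {
        "gremlin": EDGE_UPSERT,
        "bindings": {"src_id": src_id, "dst_id": dst_id, "edge_label": edge_label},
    }
    return json.dumps(body).encode("utf-8")


print(build_edge_upsert_payload(1, 2, "follow").decode("utf-8"))
```

A check followed by an insert is not atomic, so two writers sending the same edge at the same moment could still both add it; with a single writer the script behaves idempotently on retries.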
  23. • Kafka
      • JanusGraph
      • Kafka Connect

