TABLE_DUMP2|1332345590|B|195.66.224.97|1299|1.11.64.0/21|1299 6461 9318 38091|EGP|195.66.224.97|0|0||NAG||

[Figure: the AS path from the line above drawn as a graph with nodes AS1299, AS6461, AS9318, AS38091]
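A minimal sketch of how one such dump line could be split into named fields and how its AS path could become graph edges. The field names are the 14 columns listed in the Cascading flow plan in this deck; the helper names `parse_dump_line` and `path_to_edges` are hypothetical, and treating every adjacent pair of AS numbers as one directed edge is an assumption, not necessarily what the original code does.

```python
# The 14 column names as they appear in the flow plan for the text dump.
FIELDS = ["proto", "time", "type", "peerip", "peeras", "prefix", "path",
          "origin", "nexthop", "localpref", "MED", "community", "AAGG",
          "aggregator"]


def parse_dump_line(line):
    """Split one pipe-delimited dump line into a dict keyed by field name."""
    values = line.strip().split("|")
    # The line ends with a '|', so split() yields a 15th empty value;
    # zip() stops at the 14 known field names and drops it.
    return dict(zip(FIELDS, values))


def path_to_edges(path):
    """Assumption: each adjacent pair of AS numbers in the path is one edge."""
    hops = path.split()
    return list(zip(hops, hops[1:]))


line = ("TABLE_DUMP2|1332345590|B|195.66.224.97|1299|1.11.64.0/21|"
        "1299 6461 9318 38091|EGP|195.66.224.97|0|0||NAG||")
record = parse_dump_line(line)
edges = path_to_edges(record["path"])
print(edges)
# [('1299', '6461'), ('6461', '9318'), ('9318', '38091')]
```

All values stay strings here; converting `time` or `peeras` to integers is left out to keep the sketch short.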
http://www.cascading.org/
[Figure: Cascading flow plan, from [head] to [tail]]

Source: GlobHfs[/Users/friso/Downloads/bview/alltxt.txt]
Fields: [{14}:'proto', 'time', 'type', 'peerip', 'peeras', 'prefix', 'path', 'origin', 'nexthop', 'localpref', 'MED', 'community', 'AAGG', 'aggregator']

'nodes' branch, fields [{2}:'id', 'name']:
• Each('nodes')[PathToNodes[decl:'id', 'name']]
• Each('nodes')[FilterPartialDuplicates[decl:'id', 'name']]
• GroupBy('nodes')[by:['id']]
• Every('nodes')[First[decl:'id', 'name']]
• Hfs['TextDelimited[['id', 'name']]']['/tmp/nodes']']

'edges' branch, fields [{3}:'from', 'to', 'updatecount']:
• Each('edges')[PathToEdges[decl:'from', 'to', 'updatecount']]
• GroupBy('edges')[by:['from', 'to']]
• Every('edges')[Sum[decl:'updatecount'][args:1]]
• Hfs['TextDelimited[['from', 'to', 'updatecount']]']['/tmp/edges']']
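The two branches of the flow plan (unique nodes by 'id', edges grouped by 'from' and 'to' with 'updatecount' summed) can be approximated in memory as below. This is a rough sketch and not the Cascading job itself: `build_graph` is a hypothetical name, the "AS" + number node name is an assumption based on the labels in the AS path figure, and each occurrence of an edge in a path is assumed to count as one update.

```python
from collections import Counter


def build_graph(paths):
    """Collect unique nodes and per-edge update counts from AS path strings."""
    nodes = {}         # id -> name; first name seen wins, like GroupBy + First
    edges = Counter()  # (from, to) -> updatecount, like GroupBy + Sum
    for path in paths:
        hops = path.split()
        for asn in hops:
            # Assumed naming scheme: "AS" followed by the AS number.
            nodes.setdefault(asn, "AS" + asn)
        for src, dst in zip(hops, hops[1:]):
            edges[(src, dst)] += 1
    return nodes, edges


# The first path is the one from the dump line; the second is made up
# purely to show an edge being counted twice.
paths = ["1299 6461 9318 38091", "3356 6461 9318"]
nodes, edges = build_graph(paths)

for (src, dst), count in sorted(edges.items()):
    print(src, dst, count)
# 1299 6461 1
# 3356 6461 1
# 6461 9318 2
# 9318 38091 1
```

A dictionary and a Counter only work while everything fits in memory; the grouping steps in the flow plan do the same kind of aggregation on Hadoop.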
http://gephi.org/
https://gephi.org/plugins/openord-layout/
http://bit.ly/IzWvcT and http://bit.ly/HHNNIb
http://thejit.org/




http://neo4j.org/
org.neo4j.kernel.impl.batchinsert.BatchInserter
org.neo4j.graphdb.index.BatchInserterIndexProvider



        30M nodes + 250M edges, < 20 minutes
• No SQL was used throughout the entire codebase
• (Even though it was tempting to use Hive at one point)

• You can find code here: https://github.com/friso/graphs
• You can find me on Twitter here: @fzk
• You can reach me by e-mail here: fvanvollenhoven@xebia.com

NoSQL War Stories preso: Hadoop and Neo4j for networks
