ETL into Neo4j


Learn some of the ways to load data into Neo4j quickly.

Published in: Technology

  1. ETL into Neo4j, Max De Marzi
  2. About Me
     Built the Neography gem (a Ruby wrapper for the Neo4j REST API)
     Playing with Neo4j since 10/2009
     • My blog:
     • Find me on Twitter: @maxdemarzi
     • Email me:
     • GitHub:
  3. Agenda
     • ETL your mind
     • ETL with Batch and the REST API
     • ETL with Gremlin and Groovy
     • ETL with the Batch Importer
     • ETL from SQL
  4. ETL your Mind: you have to start there.
  5. More Relational than Relational: Stop thinking about how tables are related. Start thinking about relationships.
  6. Objects like to mingle. Optimized for "trees" of data. Optimized for seeing the forest and the trees, and the branches, and the trunks.
  7. SELECT skills.*, user_skill.*
     FROM users
     JOIN user_skill ON users.id = user_skill.user_id
     JOIN skills ON user_skill.skill_id = skills.id
     WHERE users.id = 1
  8. START user = node(1)
     MATCH user -[user_skill]-> skill
     RETURN skill, user_skill
  9. Property Graph
  10. Relational tables:
        Language (language_code, language_name, word_count)
        LanguageCountry (language_code, country_code, primary)
        Country (country_code, country_name, flag_uri)
      Property graph:
        Language {name, code, word_count} -[IS_SPOKEN_IN {as_primary}]-> Country {name, code, flag_uri}
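As a rough illustration of the mapping above, the sketch below turns rows from the three tables into Language and Country nodes and IS_SPOKEN_IN relationships. It is plain Python over in-memory dicts, not a Neo4j API; the `to_property_graph` helper and the sample rows are hypothetical.

```python
# Illustrative rows shaped like the slide's three relational tables.
# All values (codes, word counts, flag paths) are made up for this sketch.
languages = [
    {"language_code": "en", "language_name": "English", "word_count": 100},
    {"language_code": "fr", "language_name": "French", "word_count": 80},
]
countries = [
    {"country_code": "CA", "country_name": "Canada", "flag_uri": "flags/ca.png"},
    {"country_code": "US", "country_name": "USA", "flag_uri": "flags/us.png"},
    {"country_code": "FR", "country_name": "France", "flag_uri": "flags/fr.png"},
]
language_country = [
    {"language_code": "en", "country_code": "US", "primary": True},
    {"language_code": "en", "country_code": "CA", "primary": True},
    {"language_code": "fr", "country_code": "CA", "primary": True},
    {"language_code": "fr", "country_code": "FR", "primary": True},
]


def to_property_graph(languages, countries, language_country):
    """Map the relational rows onto nodes and relationships (hypothetical helper)."""
    nodes = {}
    for row in languages:
        nodes[("Language", row["language_code"])] = {
            "label": "Language",
            "name": row["language_name"],
            "code": row["language_code"],
            "word_count": row["word_count"],
        }
    for row in countries:
        nodes[("Country", row["country_code"])] = {
            "label": "Country",
            "name": row["country_name"],
            "code": row["country_code"],
            "flag_uri": row["flag_uri"],
        }
    # The join table disappears: each row becomes one relationship, and its
    # extra column ("primary") becomes a relationship property.
    rels = [
        {
            "start": ("Language", row["language_code"]),
            "end": ("Country", row["country_code"]),
            "type": "IS_SPOKEN_IN",
            "as_primary": row["primary"],
        }
        for row in language_country
    ]
    return nodes, rels


nodes, rels = to_property_graph(languages, countries, language_country)
```

The join table contributes no nodes of its own; only its foreign keys and its one extra column survive, as the endpoints and property of each relationship.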
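  11. As an array property:
        Country {name: "Canada", languages_spoken: "['English', 'French']"}
      As nodes and relationships:
        Language {language: "English"} -[spoken_in]-> Country {name: "USA"}, Country {name: "Canada"}
        Language {language: "French"} -[spoken_in]-> Country {name: "Canada"}, Country {name: "France"}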
  12. One wide table:
        Country (name, flag_uri, language_name, number_of_words, yes_in_language, no_in_language, currency_code, currency_name)
      As a graph:
        Country {name, flag_uri} -[SPEAKS]-> Language {name, number_of_words, yes, no}
        Currency {code, name}
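A similar sketch for the wide-table case: each denormalized row is split into Country, Language, and Currency nodes, and repeated languages and currencies collapse into one node each. The `split_wide_rows` helper, the sample rows, and the `USES` relationship type are assumptions; the slide does not name the link between Country and Currency.

```python
def split_wide_rows(rows):
    """Split denormalized Country rows into Country, Language and Currency
    nodes plus (start, type, end) relationships. "USES" is a hypothetical
    relationship type; only SPEAKS appears on the slide."""
    countries, languages, currencies = {}, {}, {}
    rels = set()  # a set, because a country repeats once per language it speaks
    for row in rows:
        countries.setdefault(row["name"], {"name": row["name"], "flag_uri": row["flag_uri"]})
        # setdefault keeps the first copy, so a language shared by several
        # countries becomes a single node instead of a repeated column value.
        languages.setdefault(row["language_name"], {
            "name": row["language_name"],
            "number_of_words": row["number_of_words"],
            "yes": row["yes_in_language"],
            "no": row["no_in_language"],
        })
        currencies.setdefault(row["currency_code"], {
            "code": row["currency_code"],
            "name": row["currency_name"],
        })
        rels.add((row["name"], "SPEAKS", row["language_name"]))
        rels.add((row["name"], "USES", row["currency_code"]))
    return countries, languages, currencies, sorted(rels)


# Made-up sample rows in the wide layout (one row per country/language pair).
rows = [
    {"name": "Canada", "flag_uri": "flags/ca.png", "language_name": "English",
     "number_of_words": 100, "yes_in_language": "yes", "no_in_language": "no",
     "currency_code": "CAD", "currency_name": "Canadian dollar"},
    {"name": "Canada", "flag_uri": "flags/ca.png", "language_name": "French",
     "number_of_words": 80, "yes_in_language": "oui", "no_in_language": "non",
     "currency_code": "CAD", "currency_name": "Canadian dollar"},
    {"name": "France", "flag_uri": "flags/fr.png", "language_name": "French",
     "number_of_words": 80, "yes_in_language": "oui", "no_in_language": "non",
     "currency_code": "EUR", "currency_name": "Euro"},
]
countries, languages, currencies, rels = split_wide_rows(rows)
```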
  13. ETL with Batch and the REST API
  14. Batch command from the REST API
      • Great for importing Facebook/Twitter friends
      • Keep each request under 10k commands
      • Preferably send a request every 2k to 5k commands
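A minimal sketch of the chunking advice, assuming the Neo4j 1.x REST batch format, in which each job is a JSON object with `method`, `to`, `body`, and `id`. It only builds the request bodies; sending them (one POST per chunk to the server's batch endpoint) is left out, and the helper names and friend records are hypothetical.

```python
import json

MAX_COMMANDS = 10_000   # slide: keep each request under 10k commands
DEFAULT_CHUNK = 2_000   # slide: preferably 2k to 5k commands per request


def create_node_jobs(records):
    """One "create node" job per record, shaped like a Neo4j REST batch job.
    The relative "/node" path is an assumption about the server setup."""
    return [{"method": "POST", "to": "/node", "body": rec, "id": i}
            for i, rec in enumerate(records)]


def chunk(jobs, size=DEFAULT_CHUNK):
    """Split jobs into request-sized lists, refusing sizes at or above 10k."""
    if not 0 < size < MAX_COMMANDS:
        raise ValueError(f"chunk size must be between 1 and {MAX_COMMANDS - 1}")
    return [jobs[i:i + size] for i in range(0, len(jobs), size)]


# Hypothetical friend list; each payload would be POSTed as one request.
friends = [{"name": f"friend{i}"} for i in range(4500)]
payloads = [json.dumps(c) for c in chunk(create_node_jobs(friends))]
```

With the default size, 4,500 friends become three requests of 2,000, 2,000, and 500 commands, all comfortably under the 10k ceiling.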
  15. Using Batch from Neography
  16. Why Batch?
      • Transactional: if anything fails, nothing is committed.
      • Ordered: responses are guaranteed to come back in the same order the commands were sent.
      • Continuous: load or update nodes and relationships in spurts or as a stream.
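Because a batch is ordered, a later job can point at an earlier one, and the client can pair results with its requests by position. The sketch below assumes the Neo4j REST batch convention that `"{0}"` refers to the result of job 0 in the same request and that each result echoes its job's `id`. The responses are simulated rather than fetched from a server, and `FRIENDS`, `alice`, and `bob` are made up.

```python
def friendship_batch(a, b, rel_type="FRIENDS"):
    """Create two nodes and a relationship between them in one batch.
    "{0}" and "{1}" refer to the results of jobs 0 and 1 in the same batch;
    FRIENDS is a hypothetical relationship type."""
    return [
        {"method": "POST", "to": "/node", "body": {"name": a}, "id": 0},
        {"method": "POST", "to": "/node", "body": {"name": b}, "id": 1},
        {"method": "POST", "to": "{0}/relationships",
         "body": {"to": "{1}", "type": rel_type}, "id": 2},
    ]


def pair_responses(jobs, responses):
    """Responses arrive in request order, so zip lines them up; the id check
    only guards against a mismatch."""
    if len(jobs) != len(responses):
        raise ValueError("response count does not match job count")
    for job, resp in zip(jobs, responses):
        if resp.get("id") != job["id"]:
            raise ValueError(f"job {job['id']} got response {resp.get('id')}")
    return list(zip(jobs, responses))


jobs = friendship_batch("alice", "bob")
# Simulated results; a real server returns these from the batch request.
responses = [
    {"id": 0, "location": "/node/1"},
    {"id": 1, "location": "/node/2"},
    {"id": 2, "location": "/relationship/1"},
]
pairs = pair_responses(jobs, responses)
created = {job["body"]["name"]: resp["location"]
           for job, resp in pairs if job["to"] == "/node"}
```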
  17. ETL with Gremlin and Groovy
  18. Commit every 1000 changes or so, and make sure to stop the transaction at the very end so the last few changes are committed.
      Look into auto-indexing to make life easier. It is disabled by default; see the docs for a trick to make it a full-text index instead of an exact index.
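The commit-every-N pattern from the slide, sketched in Python rather than Groovy. `RecordingTransaction` is a hypothetical stand-in that only counts changes, so the control flow can run without a database; in a real loader, `begin` would open a graph transaction.

```python
class RecordingTransaction:
    """Hypothetical stand-in for a graph transaction: it counts applied
    changes and records how many each commit flushed."""

    def __init__(self, commit_log):
        self.commit_log = commit_log
        self.pending = 0

    def apply(self, change):
        self.pending += 1  # a real loader would create a vertex or edge here

    def commit(self):
        self.commit_log.append(self.pending)
        self.pending = 0


def load_in_chunks(changes, begin, every=1000):
    """Apply changes, committing every `every` changes, then once more at
    the end so the last partial chunk is not lost."""
    tx = begin()
    for count, change in enumerate(changes, start=1):
        tx.apply(change)
        if count % every == 0:
            tx.commit()
            tx = begin()
    tx.commit()  # the final few changes


log = []
load_in_chunks(range(2500), lambda: RecordingTransaction(log))
```

After this run, `log` holds `[1000, 1000, 500]`: dropping the final `commit()` would silently lose the last 500 changes, which is the mistake the slide warns about.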
  19. A crazy format is OK: Id :: Title :: Genre|Genre|Genre. But it's preferable to stay clear of characters that need escaping, like "|".
      The String location of the data file is converted to a URL, then processed one line at a time.
      A movie vertex is created, a genre vertex is created unless it already exists (index lookup), and an edge from the movie to the genre is created.
      Full walk-through on
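A Python sketch of the same line-by-line load, with dicts standing in for the graph and its index. The `load_movies` helper, the `hasGenre` edge label, and the sample lines are assumptions for illustration.

```python
import io


def load_movies(lines):
    """Parse "Id :: Title :: Genre|Genre|Genre" lines into movie vertices,
    genre vertices (created only if not already indexed) and movie->genre edges.
    The edge label "hasGenre" is a hypothetical name."""
    movies, genre_index, edges = {}, {}, []
    for line in lines:  # one line at a time
        line = line.strip()
        if not line:
            continue
        movie_id, rest = line.split("::", 1)
        title, genre_field = rest.rsplit("::", 1)  # tolerate "::" inside titles
        movie_id = movie_id.strip()
        movies[movie_id] = {"title": title.strip()}
        for genre in genre_field.split("|"):
            genre = genre.strip()
            if genre not in genre_index:  # index lookup before creating
                genre_index[genre] = {"name": genre}
            edges.append((movie_id, "hasGenre", genre))
    return movies, genre_index, edges


# Sample input in the slide's format; a real run would read the data file.
data = io.StringIO(
    "1::Toy Story (1995)::Animation|Children's|Comedy\n"
    "2::Jumanji (1995)::Adventure|Children's|Fantasy\n"
)
movies, genres, edges = load_movies(data)
```

Python's `str.split` treats `"|"` as a literal character. In Java or Groovy, `String.split` takes a regular expression, so `"|"` must be escaped there, which is one reason to avoid it as a separator.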
  20. ETL with the Batch Importer
  21. Installation Walk-Through
  22. Testing it: 7.5M nodes and 42M relationships in just over 3 minutes on a laptop.
  23. Loading it into Neo4j
      Full walk-through on
  24. When to use the Batch Importer?
      • First-time loading or periodic reloading
      • When you need speed
      • When you don't mind a little Java
  25. ETL from SQL
  26. Identities who vouched for each other: row_number() and INTO are our friends.
  27. The term "vouched for" will serve as our relationship type; status is a relationship property.
  28. Notice there are no node ids. These are automatic: clkao is node 1.
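The slides produce the import files with SQL; the same idea in Python is sketched below, assuming the batch importer reads tab-separated files in which the node file's header lists property names and the relationship file's header is `start`, `end`, `type`, followed by property names. Node ids are never written: they come from row order, so the first data row (clkao here) is node 1. The `write_import_files` helper and the sample vouch rows are hypothetical.

```python
import csv
import io


def write_import_files(identities, vouches, nodes_out, rels_out):
    """Write tab-separated node and relationship files. Node ids are implicit:
    the first data row is node 1, like row_number(). The header layout is an
    assumption about the importer's expected format."""
    node_id = {}
    nodes = csv.writer(nodes_out, delimiter="\t", lineterminator="\n")
    nodes.writerow(["name"])
    for row_number, name in enumerate(identities, start=1):
        node_id[name] = row_number
        nodes.writerow([name])
    rels = csv.writer(rels_out, delimiter="\t", lineterminator="\n")
    rels.writerow(["start", "end", "type", "status"])
    for voucher, vouchee, term, status in vouches:
        # term -> relationship type, status -> relationship property
        rels.writerow([node_id[voucher], node_id[vouchee], term, status])


# Made-up rows: clkao comes first, so clkao becomes node 1.
nodes_file, rels_file = io.StringIO(), io.StringIO()
write_import_files(
    ["clkao", "alice", "bob"],
    [("alice", "clkao", "vouched for", "active"),
     ("bob", "alice", "vouched for", "pending")],
    nodes_file, rels_file,
)
```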
  29. No time to get coffee >8-[
  30. What about multiple types of nodes? No problem: just add the MAX(node_id) from the first table.
      Full walk-through at:
      Need help? E-mail me, catch me on Google Chat or Skype. Please don't be shy... and read my blog:
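A sketch of that id offset, assuming node ids are numbered contiguously from 1 within each table: every later table is shifted by the highest id already handed out. The `number_nodes` helper and the `Project` node type with its keys are hypothetical.

```python
def number_nodes(*tables):
    """Assign node ids across several node types. Each table is numbered
    from 1 (like row_number()) and shifted by the highest id handed out so
    far, i.e. the MAX(node_id) of the tables before it."""
    ids = {}
    max_node_id = 0
    for label, keys in tables:
        for row_number, key in enumerate(keys, start=1):
            ids[(label, key)] = max_node_id + row_number
        max_node_id += len(keys)  # ids are contiguous, so this is the new MAX
    return ids


# Hypothetical second node type ("Project") loaded after the identities.
ids = number_nodes(("Identity", ["clkao", "alice"]),
                   ("Project", ["p1", "p2", "p3"]))
```

Here the identities keep ids 1 and 2, and the projects continue at 3, 4, and 5, so both node types can share one node file sequence without collisions.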
  31. Thank you!