ETL into Neo4j
Learn some of the ways to load data into Neo4j quickly.

Published in: Technology

Transcript

  • 1. ETL into Neo4j, by Max De Marzi
  • 2. About Me: Built the Neography gem (a Ruby wrapper for the Neo4j REST API) • Playing with Neo4j since 10/2009 • My blog: http://maxdemarzi.com • Find me on Twitter: @maxdemarzi • Email me: maxdemarzi@gmail.com • GitHub: http://github.com/maxdemarzi
  • 3. Agenda • ETL your mind • ETL with Batch and the REST API • ETL with Gremlin and Groovy • ETL with the Batch Importer • ETL from SQL
  • 4. ETL your Mind: You have to start there
  • 5. More Relational than Relational: Stop thinking about how tables are related. Start thinking about relationships.
  • 6. Objects like to mingle: Optimized for “trees” of data, versus optimized for seeing the forest and the trees, and the branches, and the trunks
  • 7. SELECT skills.*, user_skill.* FROM users JOIN user_skill ON users.id = user_skill.user_id JOIN skills ON user_skill.skill_id = skills.id WHERE users.id = 1
  • 8. START user = node(1) MATCH user -[user_skill]-> skill RETURN skill, user_skill
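
To make the comparison between slides 7 and 8 concrete, here is a minimal sketch that runs the relational query against an in-memory SQLite database, with the last join written against `skills.id`. The table contents and the `name` and `level` columns are assumptions made up for illustration; only the table names and join columns come from the slide.

```python
import sqlite3

# Hypothetical schema and rows, chosen only to match the names used in the query.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users      (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE skills     (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE user_skill (user_id INTEGER, skill_id INTEGER, level TEXT);
    INSERT INTO users VALUES (1, 'max'), (2, 'ann');
    INSERT INTO skills VALUES (10, 'ruby'), (11, 'sql');
    INSERT INTO user_skill VALUES (1, 10, 'expert'), (1, 11, 'good'), (2, 11, 'expert');
""")

# Two joins are needed to walk from a user, through the join table, to the skills.
rows = conn.execute("""
    SELECT skills.*, user_skill.*
    FROM users
    JOIN user_skill ON users.id = user_skill.user_id
    JOIN skills     ON user_skill.skill_id = skills.id
    WHERE users.id = 1
    ORDER BY skills.id
""").fetchall()

for row in rows:
    print(row)
```

In the graph version, the `user_skill` join table turns into the relationship itself, so the query only has to follow it from the starting node.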
  • 9. Property Graph
  • 10. Relational tables: Language (language_code, language_name, word_count); LanguageCountry (language_code, country_code, primary); Country (country_code, country_name, flag_uri). Graph: Language (name, code, word_count) -[IS_SPOKEN_IN (as_primary)]-> Country (name, code, flag_uri)
  • 11. name: “Canada”, languages_spoken: “[ 'English', 'French' ]”. language: “English” spoken_in name: “USA”; name: “Canada”; language: “French” spoken_in name: “France”
  • 12. Single table: Country (name, flag_uri, language_name, number_of_words, yes_in_language, no_in_language, currency_code, currency_name). Graph: Country (name, flag_uri) -[SPEAKS]-> Language (name, number_of_words, yes, no); Currency (code, name)
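
The modeling step on slide 10 can be sketched as a small transform: each Language and Country row becomes a node, and each LanguageCountry row becomes an IS_SPOKEN_IN relationship that carries `as_primary`. The rows below, including the `word_count` and `primary` values, are placeholders invented for the example, and the dict-based node and relationship shapes are an assumption rather than any Neo4j API.

```python
# Hypothetical rows for the three relational tables (placeholder values).
languages = [
    {"language_code": "en", "language_name": "English", "word_count": 1},
    {"language_code": "fr", "language_name": "French", "word_count": 2},
]
countries = [
    {"country_code": "CA", "country_name": "Canada", "flag_uri": "ca.png"},
    {"country_code": "FR", "country_name": "France", "flag_uri": "fr.png"},
]
language_country = [
    {"language_code": "en", "country_code": "CA", "primary": True},
    {"language_code": "fr", "country_code": "CA", "primary": False},
    {"language_code": "fr", "country_code": "FR", "primary": True},
]


def to_graph(languages, countries, language_country):
    """Turn entity rows into nodes and join-table rows into relationships."""
    nodes = {}
    for row in languages:
        nodes[("Language", row["language_code"])] = {
            "name": row["language_name"],
            "code": row["language_code"],
            "word_count": row["word_count"],
        }
    for row in countries:
        nodes[("Country", row["country_code"])] = {
            "name": row["country_name"],
            "code": row["country_code"],
            "flag_uri": row["flag_uri"],
        }
    relationships = []
    for row in language_country:
        relationships.append({
            "start": ("Language", row["language_code"]),
            "type": "IS_SPOKEN_IN",
            "end": ("Country", row["country_code"]),
            # The extra join-table column becomes a relationship property.
            "properties": {"as_primary": row["primary"]},
        })
    return nodes, relationships


nodes, relationships = to_graph(languages, countries, language_country)
print(len(nodes), "nodes,", len(relationships), "relationships")
```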
  • 13. ETL with Batch and the REST API
  • 14. Batch command from REST API: Great for importing Facebook/Twitter friends. Keep each request under 10k commands. Preferably send a request every 2k to 5k commands.
  • 15. Using Batch from Neography
  • 16. Why Batch? Transactional: if any command fails, nothing is committed. Ordered: responses are guaranteed to be in the same order as sent. Continuous: load or update nodes and relationships in spurts or as a stream.
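
The sizing advice for batch requests can be sketched as a chunking helper. This is only an illustration under assumptions: the job shape (`method`, `to`, `body`, `id`) follows the legacy Neo4j REST batch format as I understand it, the default of 2,500 commands per request is simply a value inside the 2k to 5k range from the slides, and nothing is actually sent over HTTP here.

```python
import json


def node_commands(names):
    """One batch job per person: create a node with a name property."""
    return [{"method": "POST", "to": "/node", "body": {"name": name}} for name in names]


def batches(commands, size=2500):
    """Split commands into requests of at most `size` jobs (kept under 10k)."""
    if not 0 < size < 10000:
        raise ValueError("keep each request under 10k commands")
    for start in range(0, len(commands), size):
        chunk = commands[start:start + size]
        # Job ids restart in every request; this sketch assumes an id is only
        # meaningful inside the request that contains it.
        yield [dict(job, id=i) for i, job in enumerate(chunk)]


# Hypothetical friend list, standing in for a Facebook/Twitter import.
people = ["friend-%d" % i for i in range(6000)]
payloads = [json.dumps(batch) for batch in batches(node_commands(people))]
sizes = [len(json.loads(payload)) for payload in payloads]
print(sizes)
```

Each payload would then be POSTed as one request to the server's batch endpoint (assumed here to be `/db/data/batch`), so a failure only rolls back that one chunk.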
  • 17. ETL with Gremlin and Groovy
  • 18. Commit every 1000 changes or so, and make sure to stop the transaction at the very end to commit the last few changes. Look into auto-indexing to make life easier; it is disabled by default. See the docs for a trick to make it a full-text index instead of an exact index: http://docs.neo4j.org/chunked/milestone/auto-indexing.html
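
The commit rhythm described on slide 18 can be sketched independently of Gremlin. `FakeGraph` below is a hypothetical stand-in that only counts commits; it is not a real graph API, and the point is just the loop shape: commit every N changes, then once more at the end for the leftovers.

```python
class FakeGraph:
    """Hypothetical stand-in for a transactional graph; it only counts commits."""

    def __init__(self):
        self.pending = 0
        self.committed = 0
        self.commits = 0

    def add_vertex(self, properties):
        self.pending += 1

    def commit(self):
        self.committed += self.pending
        self.pending = 0
        self.commits += 1


def load(graph, records, commit_every=1000):
    """Add every record, committing periodically and once more at the end."""
    for count, record in enumerate(records, start=1):
        graph.add_vertex(record)
        if count % commit_every == 0:
            graph.commit()
    if graph.pending:
        # Without this final commit the last few changes would be lost.
        graph.commit()


graph = FakeGraph()
load(graph, ({"id": i} for i in range(2500)))
print(graph.commits, graph.committed)
```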
  • 19. A crazy format is OK: Id :: Title :: Genre|Genre|Genre. But it’s preferable to stay clear of escape characters like “|”. The string location of the data file is converted to a URL, then processed one line at a time. A movie vertex is created, a genre vertex is created unless it already exists (index lookup), and an edge from movie to genre is created. Full walk-through on http://maxdemarzi.com/2012/01/13/neo4j-on-heroku-part-one/
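
The line-by-line load on slide 19 can be sketched in plain Python. This is not the Gremlin/Groovy script from the talk: the sample lines, the `hasGenre` edge label, and the list and dict structures standing in for the graph and its index are all assumptions for illustration.

```python
def parse_line(line):
    """Split 'Id :: Title :: Genre|Genre|Genre' into its three parts."""
    movie_id, title, genres = [part.strip() for part in line.split("::")]
    return int(movie_id), title, genres.split("|")


def load_movies(lines):
    vertices = []     # position in this list acts as the vertex id
    genre_index = {}  # genre name -> vertex id, standing in for an index lookup
    edges = []
    for line in lines:
        if not line.strip():
            continue
        movie_id, title, genres = parse_line(line)
        vertices.append({"type": "Movie", "movieId": movie_id, "title": title})
        movie_vertex = len(vertices) - 1
        for genre in genres:
            # Create the genre vertex only if the index does not know it yet.
            if genre not in genre_index:
                vertices.append({"type": "Genre", "genre": genre})
                genre_index[genre] = len(vertices) - 1
            edges.append((movie_vertex, "hasGenre", genre_index[genre]))
    return vertices, edges


# Hypothetical sample data in the slide's format.
sample = [
    "1 :: Movie A :: Comedy|Drama",
    "2 :: Movie B :: Drama",
]
vertices, edges = load_movies(sample)
print(len(vertices), "vertices,", len(edges), "edges")
```

A title that itself contains `::` would break this simple split, which is one reason to keep delimiter characters out of the data where possible.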
  • 20. ETL with the Batch Importer
  • 21. Installation Walk-Through
  • 22. Testing it: 7.5M nodes, 42M relationships in just over 3 minutes on a laptop.
  • 23. Loading it into Neo4j: Full walk-through on http://maxdemarzi.com/2012/02/28/batch-importer-part-1/
  • 24. When to use the Batch Importer? • First-time loading or periodic reloading • When you need speed • When you don’t mind a little Java
  • 25. ETL from SQL
  • 26. Identities who vouched for each other. row_number() and INTO are our friends.
  • 27. The “term” vouched for will serve as our relationship type, status is a relationship property.
  • 28. Notice there are no node ids. These are automatic; clkao is node 1.
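
Slides 26 to 28 can be approximated outside SQL with a short sketch: number each identity by first appearance (the role row_number() plays in the talk), write a node file with no id column, and write a relationship file that refers to nodes by row number. Only the name clkao comes from the slides; the other rows, the tab-separated layout, and the `start`, `end`, `type` header names are assumptions about the importer's input format.

```python
import csv
import io

# Hypothetical vouches: (voucher, vouchee, term, status).
vouches = [
    ("clkao", "alice", "ruby", "accepted"),
    ("alice", "bob", "sql", "pending"),
    ("bob", "clkao", "perl", "accepted"),
]


def assign_row_numbers(names):
    """Mimic row_number(): the first distinct name gets 1, the next 2, and so on."""
    ids = {}
    for name in names:
        ids.setdefault(name, len(ids) + 1)
    return ids


def build_files(vouches):
    ids = assign_row_numbers(name for vouch in vouches for name in vouch[:2])

    nodes = io.StringIO()
    writer = csv.writer(nodes, delimiter="\t", lineterminator="\n")
    writer.writerow(["name"])  # no id column: the row position is the node id
    for name in ids:           # dicts keep insertion order, so row n is node n
        writer.writerow([name])

    rels = io.StringIO()
    writer = csv.writer(rels, delimiter="\t", lineterminator="\n")
    writer.writerow(["start", "end", "type", "status"])
    for voucher, vouchee, term, status in vouches:
        # The term becomes the relationship type; status is a property.
        writer.writerow([ids[voucher], ids[vouchee], term, status])
    return nodes.getvalue(), rels.getvalue()


nodes_tsv, rels_tsv = build_files(vouches)
print(nodes_tsv)
print(rels_tsv)
```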
  • 29. No time to get coffee >8-[
  • 30. What about multiple types of nodes? No problem, just add the MAX(node_id) from the first table. Full walk-through at: http://maxdemarzi.com/2012/02/28/batch-importer-part-2/ Need help? E-mail me, catch me on Google chat or Skype. Please don’t be shy… and read my blog: http://maxdemarzi.com
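
The MAX(node_id) trick for a second node type can be sketched as an offset on the row numbers. The `projects` list is a hypothetical second node type and the names in both lists are placeholders; the idea shown is only that the second table starts numbering where the first one stopped.

```python
def number_rows(rows, offset=0):
    """Give each row an id equal to its 1-based position plus an offset."""
    return {key: offset + position for position, key in enumerate(rows, start=1)}


identities = ["clkao", "alice", "bob"]    # first node type
projects = ["project-a", "project-b"]     # hypothetical second node type

identity_ids = number_rows(identities)
# Start the second table after the highest id used by the first one.
project_ids = number_rows(projects, offset=max(identity_ids.values()))

print(identity_ids)
print(project_ids)
```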
  • 31. Thank you! http://maxdemarzi.com
