ETL into Neo4j
 


Learn some of the ways to load data into Neo4j quickly.

Usage Rights

© All Rights Reserved

    ETL into Neo4j Presentation Transcript

    • ETL into Neo4j, by Max De Marzi
    • About Me
      • Built the Neography Gem (Ruby wrapper to the Neo4j REST API)
      • Playing with Neo4j since 10/2009
      • My blog: http://maxdemarzi.com
      • Find me on Twitter: @maxdemarzi
      • Email me: maxdemarzi@gmail.com
      • GitHub: http://github.com/maxdemarzi
    • Agenda
      • ETL your mind
      • ETL with Batch and the REST API
      • ETL with Gremlin and Groovy
      • ETL with the Batch Importer
      • ETL from SQL
    • ETL your Mind: You have to start there
    • More Relational than Relational
      • Stop thinking about how tables are related
      • Start thinking about relationships
    • Objects like to mingle
      • Optimized for “trees” of data
      • Optimized for seeing the forest and the trees, and the branches, and the trunks
    • SELECT skills.*, user_skill.*
      FROM users
      JOIN user_skill ON users.id = user_skill.user_id
      JOIN skills ON user_skill.skill_id = skills.id
      WHERE users.id = 1
    • START user = node(1)
      MATCH user -[user_skill]-> skill
      RETURN skill, user_skill
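To make the comparison concrete, here is a minimal Python sketch that runs the join query against an in-memory SQLite database and then answers the same question by following relationships out of a single node. The sample rows, the `level` column, and the dictionary-based "graph" are illustrative assumptions and do not come from the slides.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the slide's query.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE skills (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE user_skill (user_id INTEGER, skill_id INTEGER, level TEXT);
    INSERT INTO users VALUES (1, 'max'), (2, 'anna');
    INSERT INTO skills VALUES (10, 'ruby'), (11, 'graphs');
    INSERT INTO user_skill VALUES (1, 10, 'expert'), (1, 11, 'expert'), (2, 10, 'novice');
""")

# Relational view: two joins through the link table.
sql_rows = conn.execute("""
    SELECT skills.name, user_skill.level
    FROM users
    JOIN user_skill ON users.id = user_skill.user_id
    JOIN skills ON user_skill.skill_id = skills.id
    WHERE users.id = 1
    ORDER BY skills.id
""").fetchall()

# Graph view: each user node keeps a list of (relationship, skill) pairs,
# so the link-table row becomes a relationship with its own properties.
skill_names = dict(conn.execute("SELECT id, name FROM skills"))
outgoing = {}
for user_id, skill_id, level in conn.execute(
    "SELECT user_id, skill_id, level FROM user_skill ORDER BY skill_id"
):
    outgoing.setdefault(user_id, []).append(
        ({"level": level}, {"name": skill_names[skill_id]})
    )

# Same idea as: START user = node(1) MATCH user -[user_skill]-> skill
graph_rows = [(skill["name"], rel["level"]) for rel, skill in outgoing[1]]
```

Both `sql_rows` and `graph_rows` hold the same skill and level pairs; the difference is that the graph version starts at one node and walks outward instead of joining whole tables.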
    • Property Graph
    • Tables: Language (language_code, language_name, word_count); LanguageCountry (language_code, country_code, primary); Country (country_code, country_name, flag_uri)
      Graph: Language (name, code, word_count) -[IS_SPOKEN_IN (as_primary)]-> Country (name, code, flag_uri)
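The move from three tables to two node types and one relationship type can be sketched in a few lines of Python. This is only an illustration of the mapping: the function name `to_graph`, the tuple-based node keys, and all sample values are assumptions made for the example.

```python
def to_graph(languages, countries, language_country):
    """Turn relational rows into property-graph nodes and relationships."""
    nodes = {}
    for row in languages:
        nodes[("Language", row["language_code"])] = {
            "name": row["language_name"],
            "code": row["language_code"],
            "word_count": row["word_count"],
        }
    for row in countries:
        nodes[("Country", row["country_code"])] = {
            "name": row["country_name"],
            "code": row["country_code"],
            "flag_uri": row["flag_uri"],
        }
    # Each join-table row becomes one relationship; its extra column
    # ("primary") becomes a relationship property ("as_primary").
    rels = [
        {
            "start": ("Language", row["language_code"]),
            "type": "IS_SPOKEN_IN",
            "end": ("Country", row["country_code"]),
            "as_primary": row["primary"],
        }
        for row in language_country
    ]
    return nodes, rels


# Hypothetical sample rows (values are placeholders, not real statistics).
languages = [
    {"language_code": "en", "language_name": "English", "word_count": 1000},
    {"language_code": "fr", "language_name": "French", "word_count": 900},
]
countries = [
    {"country_code": "CA", "country_name": "Canada", "flag_uri": "flags/ca.png"},
    {"country_code": "FR", "country_name": "France", "flag_uri": "flags/fr.png"},
]
language_country = [
    {"language_code": "en", "country_code": "CA", "primary": True},
    {"language_code": "fr", "country_code": "CA", "primary": False},
    {"language_code": "fr", "country_code": "FR", "primary": True},
]

nodes, rels = to_graph(languages, countries, language_country)
```

The join table disappears as a separate entity: its rows survive only as relationships between the nodes they used to link.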
    • As a property: name: “Canada”, languages_spoken: “[ ‘English’, ‘French’ ]”
      As relationships: language: “English” and language: “French” nodes, linked by spoken_in to name: “USA”, name: “Canada” and name: “France”
    • Table: Country (name, flag_uri, language_name, number_of_words, yes_in_language, no_in_language, currency_code, currency_name)
      Graph: Country (name, flag_uri) -[SPEAKS]-> Language (name, number_of_words, yes, no); Currency (code, name)
    • ETL with Batch and the REST API
    • Batch command from REST API
      • Great for importing Facebook/Twitter friends
      • Keep each request under 10k commands
      • Preferably send a request every 2k to 5k commands
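The sizing advice above can be sketched as follows. The example only builds the JSON payloads and sends nothing. It assumes the job format of the legacy Neo4j REST batch endpoint (`/db/data/batch`), where each job is an object with `method`, `to`, `body` and `id`. The helper names, the chunk size of 2000, and the sample people are hypothetical.

```python
import json


def node_jobs(people):
    """One 'create node' job per person, in the legacy batch job format (assumed)."""
    return [
        {"method": "POST", "to": "/node", "body": props, "id": i}
        for i, props in enumerate(people)
    ]


def chunked(jobs, size):
    """Yield consecutive slices of at most `size` jobs."""
    for start in range(0, len(jobs), size):
        yield jobs[start:start + size]


# Hypothetical data: 4500 friends to import.
people = [{"name": f"friend-{n}"} for n in range(4500)]

# One JSON document per request, each well under the 10k-command ceiling.
payloads = [json.dumps(chunk) for chunk in chunked(node_jobs(people), 2000)]
```

Each element of `payloads` would be the body of one POST to the batch endpoint. One caveat to keep in mind: placeholders such as `{0}` that point at an earlier job only resolve inside the same request, so relationships between nodes created in different chunks need the node locations returned by the server.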
    • Using Batch from Neography
    • Why Batch
      • Transactional: if any command fails, nothing is committed.
      • Ordered: responses are guaranteed to be in the same order as sent.
      • Continuous: loading/updating nodes and relationships in spurts or streaming.
    • ETL with Gremlin and Groovy
    • Commit every 1000 changes or so, and make sure to stop the transaction at the very end to commit the last few changes.
      Look into auto-indexing to make life easier. It is disabled by default. See the docs for a trick to make it a full-text index instead of an exact index: http://docs.neo4j.org/chunked/milestone/auto-indexing.html
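The commit pattern described above is independent of Gremlin, so here is a small Python sketch of it. `RecordingStore` is a hypothetical stand-in for a transactional graph that only records how many changes each commit contained; a real loader would call the graph's own transaction API instead.

```python
class RecordingStore:
    """Hypothetical stand-in for a transactional graph."""

    def __init__(self):
        self.pending = 0      # changes made since the last commit
        self.commits = []     # size of every commit, in order

    def add(self, item):
        self.pending += 1

    def commit(self):
        self.commits.append(self.pending)
        self.pending = 0


def load(store, items, commit_every=1000):
    """Add every item, committing after each full batch and once more at the end."""
    for count, item in enumerate(items, start=1):
        store.add(item)
        if count % commit_every == 0:
            store.commit()
    if store.pending:
        # Without this final commit the last partial batch would be lost.
        store.commit()


store = RecordingStore()
load(store, range(2500))
```

After the call, `store.commits` shows two full batches followed by the remainder, which is exactly the "last few changes" the slide warns about.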
    • A crazy format is OK: Id :: Title :: Genre|Genre|Genre. But it’s preferable to stay clear of escape characters like “|”.
      The location of the data file is a string, converted to a URL, then processed one line at a time.
      A movie vertex is created, a genre vertex is created unless it already exists (index lookup), and an edge from movie to genre is created.
      Full walk-through on http://maxdemarzi.com/2012/01/13/neo4j-on-heroku-part-one/
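The loading steps above (parse a line, create the movie, look up or create each genre, connect them) can be sketched in plain Python. The original slide uses Gremlin and Groovy against a graph database; this version swaps in in-memory lists and a dictionary as the "index", reads from a list of strings rather than a URL, and uses a made-up edge label `hasGenre`, so treat it as an outline of the logic only.

```python
def parse_movie(line):
    """Split 'Id :: Title :: Genre|Genre|Genre' into its three parts."""
    movie_id, title, genres = [part.strip() for part in line.split("::")]
    return int(movie_id), title, genres.split("|")


def load_movies(lines):
    vertices = []      # property dicts; a vertex id is its position in the list
    genre_index = {}   # genre name -> vertex id (stand-in for an index lookup)
    edges = []         # (movie vertex id, label, genre vertex id)
    for line in lines:
        if not line.strip():
            continue
        movie_id, title, genres = parse_movie(line)
        vertices.append({"type": "Movie", "movieId": movie_id, "title": title})
        movie_vertex = len(vertices) - 1
        for genre in genres:
            # Create the genre vertex only if the index does not know it yet.
            if genre not in genre_index:
                vertices.append({"type": "Genre", "genre": genre})
                genre_index[genre] = len(vertices) - 1
            edges.append((movie_vertex, "hasGenre", genre_index[genre]))
    return vertices, genre_index, edges


sample = [
    "1 :: Toy Story (1995) :: Animation|Children's|Comedy",
    "2 :: Jumanji (1995) :: Adventure|Children's|Fantasy",
]
vertices, genre_index, edges = load_movies(sample)
```

The genre shared by both sample movies ends up as a single vertex with two incoming edges, which is the point of the index lookup.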
    • ETL with the Batch Importer
    • Installation Walk-Through
    • Testing it: 7.5M nodes, 42M relationships in just over 3 minutes on a laptop.
    • Loading it into Neo4j: Full walk-through on http://maxdemarzi.com/2012/02/28/batch-importer-part-1/
    • When to use the Batch Importer?
      • 1st time loading or periodic reloading
      • When you need speed
      • When you don’t mind a little Java
    • ETL from SQL
    • Identities who vouched for each other: row_number() and INTO are our friends
    • The “term” vouched for will serve as our relationship type, status is a relationship property.
    • Notice there are no node ids. These are automatic; clkao is node 1
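As a rough illustration of the two export files, the following Python sketch numbers identities by first appearance and writes tab-separated node and relationship data. It assumes the batch importer reads a nodes file whose line order determines the node id and a relationships file with `start`, `end`, `type` and property columns; the sample vouches (other than the name clkao), the term values, and the status values are hypothetical.

```python
import csv
import io


def build_import_files(vouches):
    """Return (nodes_tsv, rels_tsv) for rows of (voucher, vouchee, term, status)."""
    # Implicit ids: the first identity seen is node 1, the next is node 2, ...
    node_ids = {}
    for voucher, vouchee, _term, _status in vouches:
        for name in (voucher, vouchee):
            node_ids.setdefault(name, len(node_ids) + 1)

    nodes_out = io.StringIO()
    writer = csv.writer(nodes_out, delimiter="\t", lineterminator="\n")
    writer.writerow(["name"])          # no id column: position is the id
    for name in node_ids:              # dicts keep insertion order
        writer.writerow([name])

    rels_out = io.StringIO()
    writer = csv.writer(rels_out, delimiter="\t", lineterminator="\n")
    writer.writerow(["start", "end", "type", "status"])
    for voucher, vouchee, term, status in vouches:
        # The term becomes the relationship type, status a relationship property.
        writer.writerow([node_ids[voucher], node_ids[vouchee], term, status])

    return nodes_out.getvalue(), rels_out.getvalue()


# Hypothetical sample rows.
vouches = [
    ("clkao", "user_b", "perl", "accepted"),
    ("user_b", "clkao", "perl", "accepted"),
    ("user_c", "clkao", "ruby", "pending"),
]
nodes_tsv, rels_tsv = build_import_files(vouches)
```

In the SQL version of this step, `row_number()` plays the role of the `node_ids` dictionary.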
    • No time to get coffee >8-[
    • What about multiple types of nodes? No problem, just add the MAX(node_id) from the first table.
      Full walk-through at: http://maxdemarzi.com/2012/02/28/batch-importer-part-2/
      Need help? E-mail me, catch me on Google chat or Skype. Please don’t be shy… and read my blog: http://maxdemarzi.com
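The MAX(node_id) trick for a second node type amounts to continuing the numbering where the first table stopped. A minimal Python sketch, with a hypothetical helper `assign_ids` and made-up sample keys:

```python
def assign_ids(keys, offset=0):
    """Number keys 1, 2, 3, ... in order, shifted by `offset`."""
    return {key: offset + n for n, key in enumerate(keys, start=1)}


# First node type: ids start at 1.
identity_ids = assign_ids(["clkao", "user_b", "user_c"])

# Second node type: start after the highest id used so far,
# the equivalent of adding MAX(node_id) from the first table.
offset = max(identity_ids.values())
term_ids = assign_ids(["perl", "ruby"], offset=offset)
```

Because the two ranges never overlap, relationships can refer to either kind of node by a single numeric id.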
    • Thank you! http://maxdemarzi.com