How Medium uses Neo4j
Nathaniel Felsen
May 20th, 2015
About Me
Nathaniel Felsen
Data & DevOps engineer
nathanielfelsen
n@medium.com
@faitlezen
Agenda
• What is Medium, and what problem are we solving with Neo4j?
• Why did we pick Neo4j?
• Our architecture
• Steps taken and obstacles encountered while going live
• Improving Neo4j’s performance
• Live Demo
Medium is a beautiful publishing experience.
• Easiest, fastest way to create a beautiful story
• Seamless integration of photos, audio & video
• Optimized for web, tablet & mobile
Medium is a home for influential contributors.
Medium is a place for important ideas.
Medium is a network that builds audience.
• Follow, share, recommend
• Personalized story feed
• Customized daily emails & notifications
Datastore Selection Process
DynamoDB
Pros
• Expertise
• Already used to store user info
• No maintenance
• No hardware
Cons
• Need to nail the schema ≠ Experimentation
• Limited ways of querying data
• Things like shortest path between users won’t perform well
Relational database
Pros
• Used by lots of people and heavily vetted
• Less ramp-up for learning the query language
• Strong community
Cons
• Using a relational database to create graphs
• Sharding
Flock DB
Pros
• Expertise
• SQL Lite syntax
• Open Source / Free
Cons
• Not maintained anymore
• 2 tiers model
• Deal with sharding in the near future
Neo4j
Pros
• Easy to start
• Easy to experiment
• Good community
• Enterprise edition: HA, Backup, Support
Cons
• Not free
• No expertise in house
• Requires hardware
Architecture
Our Social Service Architecture
Nodes
• User
• Post
• Collection
Relationships
• Edited
• Wrote
• Published
• Recommended
• Followed
• …
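To make the model above concrete, here is a minimal in-memory sketch in Python. It is not Medium's code: the `Graph` class, its method names, and the sample ids are hypothetical. Only the node labels and relationship types come from the slide, and writing the relationship types in upper case is an assumed naming convention.

```python
from collections import defaultdict

# Labels and relationship types are taken from the slide; the rest is hypothetical.
NODE_LABELS = {"User", "Post", "Collection"}
REL_TYPES = {"EDITED", "WROTE", "PUBLISHED", "RECOMMENDED", "FOLLOWED"}


class Graph:
    """Tiny directed graph with labeled nodes and typed relationships."""

    def __init__(self):
        self.labels = {}              # node id -> label
        self.out = defaultdict(set)   # (source, rel_type) -> {target}
        self.inc = defaultdict(set)   # (target, rel_type) -> {source}

    def add_node(self, node_id, label):
        if label not in NODE_LABELS:
            raise ValueError(f"unknown label: {label}")
        self.labels[node_id] = label

    def relate(self, src, rel_type, dst):
        if rel_type not in REL_TYPES:
            raise ValueError(f"unknown relationship type: {rel_type}")
        if src not in self.labels or dst not in self.labels:
            raise KeyError("both endpoints must exist before relating them")
        self.out[(src, rel_type)].add(dst)
        self.inc[(dst, rel_type)].add(src)

    def outgoing(self, src, rel_type):
        # .get avoids creating empty entries in the defaultdict on lookups
        return set(self.out.get((src, rel_type), ()))

    def incoming(self, dst, rel_type):
        return set(self.inc.get((dst, rel_type), ()))


g = Graph()
g.add_node("alice", "User")
g.add_node("bob", "User")
g.add_node("post-1", "Post")
g.relate("alice", "WROTE", "post-1")
g.relate("bob", "FOLLOWED", "alice")
g.relate("bob", "RECOMMENDED", "post-1")
```

Keeping both an outgoing and an incoming index mirrors the way a graph database lets a relationship be traversed in either direction, for example from a post back to the users who recommended it.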
Use queues for the writes
Writes are done to the master only
If you lose your master, you need to wait for a new election
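The reason a queue helps here can be sketched in a few lines of Python: writes wait in the queue while no master is available and are applied in order once one is elected. This is an illustration under assumptions, not the actual service: `FakeCluster`, `MasterUnavailable`, `drain`, and the sample statements are all hypothetical names, and a real worker would back off between retries.

```python
from collections import deque


class MasterUnavailable(Exception):
    """Raised by the hypothetical cluster while no master is elected."""


class FakeCluster:
    """Stand-in for an HA cluster; rejects a set number of writes to mimic an election."""

    def __init__(self, failures_before_election=0):
        self.failures_left = failures_before_election
        self.applied = []

    def write_to_master(self, statement):
        if self.failures_left > 0:
            self.failures_left -= 1
            raise MasterUnavailable("election in progress")
        self.applied.append(statement)


def drain(pending, cluster, max_attempts=5):
    """Apply queued writes in FIFO order and return how many were applied.

    A write is removed from the queue only after the master accepted it,
    so nothing is lost if the master stays unavailable.
    """
    applied = 0
    while pending:
        statement = pending[0]            # peek, do not pop yet
        for _ in range(max_attempts):
            try:
                cluster.write_to_master(statement)
            except MasterUnavailable:
                continue                  # a real worker would sleep here
            pending.popleft()
            applied += 1
            break
        else:
            break                         # still no master: leave the rest queued
    return applied


pending = deque(["bob FOLLOWED alice", "bob RECOMMENDED post-1"])
cluster = FakeCluster(failures_before_election=2)
applied_count = drain(pending, cluster)
```

In this run the first write is rejected twice, succeeds on the third attempt, and the second write goes through immediately, so both end up applied in their original order.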
Productionising Neo4j
Capacity planning
Initial Data Import
Metrics / Monitoring
• Architecture
• Systems
• Neo4j
• Dataset
• Java
• Services that interact with Neo4j
Log aggregation / Indexing
• Elasticsearch
• Logstash
• Logstash forwarder
• Kibana
Backups
• Incremental Backup
• Full Backup
Runbook / Playbook
Getting optimal performance with Neo4j
Talk to support
What Neo4j is good at and not as good at
Long traversals, where nodes are NOT dense (no supernodes)
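One way to act on the point about dense nodes is to look for them in the data before they end up on a frequently traversed path. The sketch below is a hypothetical helper, not something from the talk: `find_dense_nodes`, the sample edge list, and the threshold of 100 are all assumptions chosen for illustration.

```python
from collections import Counter


def find_dense_nodes(edges, threshold):
    """Return {node: degree} for nodes whose total degree (in + out) exceeds threshold."""
    degree = Counter()
    for src, dst in edges:
        degree[src] += 1
        degree[dst] += 1
    return {node: d for node, d in degree.items() if d > threshold}


# 1000 users all following one very popular account, plus a couple of ordinary edges.
edges = [(f"user-{i}", "celebrity") for i in range(1000)]
edges += [("user-1", "user-2"), ("user-2", "user-3")]

dense = find_dense_nodes(edges, threshold=100)
```

Here only `celebrity` is flagged, with a degree of 1000; every other node has a degree of 3 or less.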
Cypher Tricks
http://watch.neo4j.org/video/84900121
Tune the configuration over time
• Java garbage collection (stop-the-world pauses)
• Neo4j settings
Cache Settings
Neo4j 2.0 and 2.1
Neo4j 2.2
Server Plugins & Unmanaged Extensions
• Easy to Deploy
• The server’s functionality can be extended by adding plugins.
• RESTful Web Services (JAX-RS)
• Put more logic in the code, such as caching
• Sharp tool
Demo
Followers who recommended a story
Top Recommended stories
People Recommended to follow
Collaborative Filtering
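The four demo queries can be approximated on toy data with plain Python. This is a sketch of the ideas only, not the queries shown in the demo: the users, posts, and function names are made up, "followers who recommended a story" is read here as the followers of a given author who recommended a given post, and the collaborative filtering score (number of shared recommendations) is one simple choice among many.

```python
from collections import Counter

# Toy data; user and post ids are invented for this example.
follows = {        # user -> users they follow
    "ann": {"bob", "cat"},
    "bob": {"cat", "dan"},
    "cat": {"dan"},
    "dan": set(),
}
recommends = {     # user -> posts they recommended
    "ann": {"p1", "p2"},
    "bob": {"p1", "p3"},
    "cat": {"p1", "p3", "p4"},
    "dan": {"p4"},
}


def ranked(scores):
    """Sort (item, score) pairs by descending score, then by name so ties are stable."""
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


def followers_who_recommended(author, post):
    """Followers of `author` who recommended `post`."""
    return {user for user, followed in follows.items()
            if author in followed and post in recommends[user]}


def top_recommended(n):
    """The n posts with the most recommendations."""
    counts = Counter(p for posts in recommends.values() for p in posts)
    return ranked(counts)[:n]


def people_to_follow(user):
    """Users followed by the people `user` follows, ranked by how many do so."""
    counts = Counter()
    for friend in follows[user]:
        for candidate in follows[friend]:
            if candidate != user and candidate not in follows[user]:
                counts[candidate] += 1
    return ranked(counts)


def stories_for(user):
    """Collaborative filtering: posts liked by users with overlapping taste."""
    mine = recommends[user]
    scores = Counter()
    for other, theirs in recommends.items():
        overlap = len(mine & theirs)
        if other == user or overlap == 0:
            continue
        for post in theirs - mine:
            scores[post] += overlap   # weight by shared recommendations
    return ranked(scores)
```

For example, `people_to_follow("ann")` returns `[("dan", 2)]` because both accounts ann follows also follow dan, and `stories_for("ann")` returns `[("p3", 2), ("p4", 1)]`.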
n@medium.com
Questions? Feedback?