Inside Wordnik's Architecture
Tony Tam, @fehguy

Slides about Wordnik's architecture

Published in: Technology, Education

  1. Inside Wordnik's Architecture
     Tony Tam, @fehguy
  2. Who is Wordnik?
     • Founded in 2008 by Erin McKean
     • "Understand meaning of words automatically"
     • Patented "Free-Range Definition" technology
     • Constructed largest (known) English Word Graph
     We do Discovery
  3. It's all about Data!
  4. Data?
     • Word Graph is built by data
     • Runtime answers needed fast
     80 S reads! 50M+ Nodes! 80M+ Edges!
  7. What we do with Data
     • Update the Graph constantly
     • Augment our NLP pipeline
     • "Reality-based Annotation" with current, real-world data
     Language is NOT static
     Next??? Twitter? Tumblr? Wordpress
  8. Is a 20-year-old corpus good enough?
  9. How we do it
     • Amazon EC2-based deployment
     • Efficiency through constraint-based architecture
       • Small is Big!
     • Horizontal scaling by adding servers!
       • Yea, we can always go vertical
     • Blah, blah, more details!
  10. Micro Services
     • Services are stand-alone building blocks
     • Increase capacity through a "more like this" button
  11. Micro Services
     • Big application => micro services
     Monolithic application
     "Isn't this just SOA?"
  15. Not PO-SOA
     • This is different
       • No proprietary message bus
       • Decoupled objects
       • Dedicated storage***
     • Speak REST
       • Develop your services in…
         • Java
         • Scala
         • Ruby
         • PHP
  17. Speak REST?
     • Sounds good but…
       • REST semantics vary wildly
       • HATEOAS vs. practical REST?
     /api/pet.json/1?delete (GET)
     /api/pet.json/1 (DELETE)
     /api/pet.json/1 (POST empty)
     All valid!
     So… Peer Review! Better Docs! API Styleguide! API Council!
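To make the "all valid" point concrete, here is a minimal sketch of what a service would have to do if it accepted all three delete styles from the slide above. The function name `is_delete` and the rule "an empty POST means delete" are illustrative assumptions, not Wordnik's implementation.

```python
from urllib.parse import urlsplit


def is_delete(method: str, url: str, body: bytes = b"") -> bool:
    """Return True if the request is one of the three delete styles on the slide.

    Hypothetical helper: real services usually pick ONE style and document it.
    """
    method = method.upper()
    # "?delete" arrives as a bare query flag with no value.
    query_flags = urlsplit(url).query.split("&")
    if method == "DELETE":
        return True
    if method == "GET":
        return "delete" in query_flags
    if method == "POST":
        return len(body) == 0
    return False


for method, url in [
    ("GET", "/api/pet.json/1?delete"),
    ("DELETE", "/api/pet.json/1"),
    ("POST", "/api/pet.json/1"),
]:
    print(method, url, is_delete(method, url))
```

Every consumer has to know which of these branches a given provider honors, which is the gap a style guide and a machine-readable contract are meant to close.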
  18. SOA makes new Challenges
     • It's communication (not easy)
     • Need a consumer & provider contract
     • Driving force to create Swagger
  19. What is Swagger?
     • Swagger is…
       • Spec for declaring and documenting an API
       • A framework for auto-generating the spec
       • A library for client library generation
       • A JSON-based test framework
     • It's open source!
       • http://swagger.wordnik.com
  20. How?
     • Swagger Codegen
       • Creates a client based on your Swagger Spec
     scala src/main/scala/Codegen.scala ${swagger-spec-url}
     Scala, Ruby
  21. In the Wordnik Workflow
     • Jenkins will…
       • Build a service library
       • Build a stand-alone application distro
       • Build an installable image (RPM)
       • Build a compatible client library
     • Consumers will…
       • Declare dependency on a service version
       • Use a client for that version
       • Be given a list of compatible services, by cluster, version
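The consumer side of that workflow could be sketched as a lookup against a service registry. Everything below is hypothetical: the `ServiceInstance` record, the example URLs, and in particular the rule that "compatible" means "same major version" are assumptions made for illustration; the slides do not say how Wordnik decides compatibility.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ServiceInstance:
    name: str
    version: str  # e.g. "2.1.0"
    cluster: str
    url: str


def compatible_services(
    registry: List[ServiceInstance], name: str, version: str, cluster: str
) -> List[ServiceInstance]:
    """List instances a consumer may call for the declared service version.

    Assumption: instances sharing the declared major version are compatible.
    """
    major = version.split(".")[0]
    return [
        s
        for s in registry
        if s.name == name and s.cluster == cluster and s.version.split(".")[0] == major
    ]


registry = [
    ServiceInstance("word", "2.1.0", "prod", "http://word-a.example:8000"),
    ServiceInstance("word", "2.3.1", "prod", "http://word-b.example:8000"),
    ServiceInstance("word", "3.0.0", "prod", "http://word-c.example:8000"),
    ServiceInstance("word", "2.1.0", "staging", "http://word-d.example:8000"),
]
for instance in compatible_services(registry, "word", "2.0.0", "prod"):
    print(instance.url)
```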
  22. Back to Data
     • Micro services have small(ish) databases
       • Share nothing across services
       • YES to replica sets
     • Deployed to ephemeral storage
       • (more in a bit)
       • Small by design
     • How to keep them small?
  23. Keeping Databases Small
     • Some easy tricks
       • Schema-less => "schema per document"
       • Keep field names short!
     db.foo.save({user_name:"Tony"})
     db.foo.save({un:"Tony"})
     Repeat 10e9 times!
     • Indexes
       • They can get *huge*
       • Make _id matter!
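The field-name advice is simple arithmetic: because every document carries its own field names, each byte trimmed from a name is saved once per document. The sketch below works through the slide's `user_name` versus `un` example at 10e9 documents. The helper name `bytes_saved` is made up here, and the figure is raw name bytes only, ignoring any compression or padding the storage layer may apply.

```python
def bytes_saved(long_name: str, short_name: str, documents: int) -> int:
    """Raw bytes saved across a collection by shortening one field name.

    Field names are stored inside every document, so the saving per document
    is the difference in encoded name length.
    """
    per_document = len(long_name.encode("utf-8")) - len(short_name.encode("utf-8"))
    return per_document * documents


# "user_name" is 9 bytes, "un" is 2 bytes: 7 bytes saved per document.
saved = bytes_saved("user_name", "un", 10 * 10**9)
print(f"{saved / 10**9:.0f} GB saved")  # prints "70 GB saved"
```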
  25. Keeping Databases Small
     • Don't make _id just an "auto increment"
       • User collection? Try _id: username
       • Email collection? Try _id: email
       • Date-driven collection? How about _id: "20120502"
       • db.logins.find({_id:/^201205/})
     You're stuck with it! Be smart
     Be lazy until you can't anymore!
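Why a date-shaped `_id` pays off: an anchored prefix pattern such as `/^201205/` can be answered as a range scan over the sorted `_id` index rather than a full collection scan. The sketch below imitates that with a sorted Python list and `bisect`; it is an analogy for the index behavior, not MongoDB code, and it assumes ids never contain the sentinel character `\uffff`.

```python
from bisect import bisect_left
from typing import List


def prefix_range(sorted_ids: List[str], prefix: str) -> List[str]:
    """Return every id starting with `prefix` using two binary searches."""
    lo = bisect_left(sorted_ids, prefix)
    # The sentinel sorts after any id that shares the prefix.
    hi = bisect_left(sorted_ids, prefix + "\uffff")
    return sorted_ids[lo:hi]


ids = ["20120430", "20120501", "20120502", "20120531", "20120601"]
print(prefix_range(ids, "201205"))  # ['20120501', '20120502', '20120531']
```

An auto-increment `_id` gives the same index no such second use, which is the point of "make _id matter".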
  26. Keeping Databases Small
     • DAO or die!
       • Fancy index scheme => control access to collections
     NO!!!! Yes
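A minimal sketch of the "DAO or die" idea, assuming a logins collection keyed by day as in the previous slide. The class `LoginDao` and its methods are hypothetical, and a plain dict stands in for the real collection. The point is that callers can only issue the queries the index scheme was designed for.

```python
import re
from typing import Dict, List, Optional


class LoginDao:
    """Hypothetical DAO: the only code allowed to touch the logins collection."""

    _MONTH = re.compile(r"^\d{6}$")

    def __init__(self) -> None:
        # A dict keyed by _id stands in for the real collection.
        self._collection: Dict[str, dict] = {}

    def record(self, day: str, count: int) -> None:
        self._collection[day] = {"_id": day, "count": count}

    def find_by_day(self, day: str) -> Optional[dict]:
        # Exact _id lookup: always served by the _id index.
        return self._collection.get(day)

    def find_by_month(self, month: str) -> List[dict]:
        # Only anchored YYYYMM prefixes are allowed, so the query stays index-friendly.
        if not self._MONTH.match(month):
            raise ValueError("month must look like YYYYMM")
        return [doc for _id, doc in sorted(self._collection.items()) if _id.startswith(month)]


dao = LoginDao()
dao.record("20120502", 3)
dao.record("20120601", 1)
print(dao.find_by_month("201205"))
```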
  27. Keeping Databases Small
     • If/when you need to shard…
     Don't make your clients do this!
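One way to keep sharding away from clients is to bury the routing inside the data-access layer, so callers see the same `save` and `find` calls whether there is one shard or many. This is a sketch under assumptions: the class name, the MD5-modulo routing, and the in-memory dicts standing in for shards are all illustrative, not how Wordnik or MongoDB route data.

```python
import hashlib
from typing import Dict, List, Optional


class ShardedUserDao:
    """Hypothetical DAO that hides shard selection from its callers."""

    def __init__(self, shard_count: int) -> None:
        # Each dict stands in for a separate database.
        self._shards: List[Dict[str, dict]] = [{} for _ in range(shard_count)]

    def _shard_for(self, username: str) -> Dict[str, dict]:
        # A stable hash keeps a given user on the same shard across processes.
        digest = hashlib.md5(username.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % len(self._shards)
        return self._shards[index]

    def save(self, username: str, profile: dict) -> None:
        self._shard_for(username)[username] = profile

    def find(self, username: str) -> Optional[dict]:
        return self._shard_for(username).get(username)


users = ShardedUserDao(shard_count=4)
users.save("tony", {"un": "Tony"})
print(users.find("tony"))  # the caller never names a shard
```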
  29. Keeping Databases Small
     • Again, why keep them small?
     • Starting a new replica
       • Initial sync
       • Index rebuilding
     • Backups
     • Index Compaction
     • Speed
     • TCO
     Everything is easier
     This can take DAYS
  31. Ephemeral Storage?
     • Every EC2 instance type has some (except micro)
     • Only available via EC2 API
     • Less prone to issues than EBS
     • Faster ***
     • Included in cost of server
     But dies on host reboot!
  32. Keeping Data Safe
  34. Which Zone? Which Region?
     Arbiter handles external connectivity issue detection
  35. How does this really stack up?
     • Tuned indexes & access, split with services
       • Was: 3 DAS Devices w/18 TB disk
       • Now: 21 M1.large + M1.xlarge instances
       • 3 Zones, 2 regions
     • The Gory Details
     blog.wordnik.com/with-software-small-is-the-new-big
  36. As for Services
     • ~1,000 requests/sec via Swagger-enabled micro services
     • Direct to Consumer via SwaggerSocket
  37. What's Next
     • Migrating all services to SwaggerSocket
       • OSS WebSocket subprotocol
     https://github.com/wordnik/swaggersocket
       • 25%-100% speed increase (sync & async)
     • Discovery via Wordnik
  38. If you're Interested…
  45. See more:
     developer.wordnik.com
     swagger.wordnik.com
     github.com/wordnik
     Questions?
