How it's made - MyGet (CloudBurst)


Ever wonder how some applications are built? Ever wonder how to combine components of the Windows Azure platform? Stop wondering and learn how we’ve built MyGet, a multi-tenant software-as-a-service. In this session we’ll discuss architecture, commands, events, access control, multi-tenancy and how to mix and match those things together. Learn about the growing pains and misconceptions we had on the Windows Azure platform. The result just may be a reliable, cost-effective solution that scales.

Published in: Technology

    1. 1. How it’s made: MyGet Maarten Balliauw @maartenballiauw
    2. 2. Who am I? Maarten Balliauw Daytime: Technical Evangelist, JetBrains Co-founder of MyGet Author – Pro NuGet AZUG Focus on web ASP.NET MVC, Windows Azure, SignalR, ... MVP Windows Azure & ASPInsider @maartenballiauw
    5. 5. Agenda NuGet? MyGet? How we started What we did not know Our first architecture Our second architecture Multi-tenancy ACS Tough times (learning moments) When business meets technology Conclusion
    6. 6. NuGet? MyGet?
    9. 9. Why MyGet? Safely store your IP with us Creating packages is hard. We have Build Services! Granular security Activity streams Symbol server Analytics
    10. 10. I’m not alone! Xavier Decoster @xavierdecoster Yves Goeleven @yvesgoeleven Also known as @MyGetTeam
    11. 11. How we started
    12. 12. The real beginning? May 09, 2011
    13. 13. NuPack! Using OData as their feeds Which is some sort of WCF… Multiple feeds? Exchanged some ideas with Xavier Prototyped something during TechDays Belgium, 2011
    14. 14. Prototype online! May 31, 2011
    15. 15. Technologies used? Windows Azure Windows Azure Table Storage & Blob Storage Windows Azure ACS (no way I’m typing another user registration) ASP.NET MVC 2 MEF
    16. 16. Here’s some code from back then…

            [Authorize]
            public class FeedController : Controller
            {
                public ActionResult List()
                {
                    var privateFeedTable = PrivateFeedTable.Create();
                    var privateFeeds = privateFeedTable.GetAll(
                        f => f.PartitionKey == User.Identity.Name.ToBase64());

                    var model = new PrivateFeedListViewModel();
                    foreach (var privateFeed in privateFeeds.Where(f => f.IsVisible))
                    {
                        var privateFeedViewModel = new PrivateFeedViewModel();
                        model.Items.Add(AutoMapper.Mapper.Map(privateFeed, privateFeedViewModel));
                    }

                    return View(model);
                }
            }

    17. 17. How about this one?

            try
            {
                privateFeedNuGetPackageTable.Add(privateFeedPackage);
            }
            catch
            {
                // Omnomnom!
            }
    18. 18. Best practices used back then?
    19. 19. Architecture at the time? Cloud Services: one web role doing all work Storage: one storage account Windows Azure Access Control Service
    20. 20. What we did not know…
    21. 21. Users would come! Grew from 5 feeds to 70 feeds in a few weeks 10 feeds per week added thereafter
    22. 22. Data would come! One user pushed 1,300 packages worth 1 GB of storage Others started pushing CI packages Others seemed to be copying
    23. 23. ReSharper time!
    24. 24. ReSharper time! A lot of refactoring done Direct data access -> repositories Repositories used by services Services used by controllers Using best practices SOLID and DRY (well, not everywhere but refactoring takes time) Running on two instances (availability, yay!)
    25. 25. We became a startup Someone mentioned they would pay for our service Think about business model Volume of feeds and packages kept going up Users in EU and US
    26. 26. Our first architecture
    27. 27. Our first architecture - code
    28. 28. Awesome! Best practices! Layers! Typical business application architecture!
    29. 29. Not so awesome… Best practices! Are they? Layers! No spaghetti code but lasagna code Typical business application architecture! Proved to be very inflexible
    30. 30. Our first architecture - infrastructure
    31. 31. Awesome! Datacenters near our users Centralizes storage Packages on CDN for faster throughput DNS fail-over if one of the DCs went down
    32. 32. Not so awesome… Datacenters near our users Or not? Centralizes storage Speed of light! USA was slow! Packages on CDN for faster throughput Sync issues, downtime, … DNS fail-over if one of the DCs went down Seems not every ISP follows DNS standards
    33. 33. We persisted! Local caching in USA added 2 instances in EU, 3 in the USA Speed of light! Syncing all data kept being slow Populating cache was a nightmare CDN kept having issues Of 3 instances, only 1 was being used with enough load (60%)
    34. 34. We were growing! We had public subscription plans We added enterprise tenants (multi-tenancy added) Resulting in… Architecture became complex Caching and syncing became complex
    35. 35. ReSharper time!
    36. 36. Our second architecture
    37. 37. We had a look at our workloads Managing feeds and packages Doesn’t matter much where (sync vs. bandwidth) Downloading packages May matter where, let the tenant decide Builds Who cares where!
    38. 38. Our 2nd architecture - infrastructure
    39. 39. Our 2nd architecture - code
    40. 40. Our first architecture… … was scaled across the globe … but as synchronous as it could be … prone to all issues with latency vs. synchrony Event Driven Architecture?* *disclaimer: we borrowed some concepts from EDA
    41. 41. EDA in MyGet Some actions put an ICommand on a queue (ground rule: if it can’t be done in 1 write, use ICommand) All actions complete with an IEvent on a queue Handlers can subscribe to ICommand and IEvent Handlers are idempotent and do not depend on one another
    42. 42. Example: log in 2 operations: 1 read, 1 write Read the profile Store the profile with LastLogin date No use of ICommand Finishes with UserLoggedInEvent
    43. 43. Example: change feed owner Many operations! Read two user profiles Read current access rights Change access rights Push new privileges to One command, one event ChangeFeedOwnerCommand FeedOwnerChangedEvent
    44. 44. Example: change feed owner
    45. 45. Gain? We now run on 2 instances, mostly for redundancy Average CPU usage? 20% (across machines) Flexibility! Way easier to implement new features! New feature: activity log Simply subscribe to events we want to see in that log
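The command/event flow from slides 41 to 45 can be sketched roughly as follows. This is a minimal in-memory illustration, not MyGet's actual code: `ICommand`, `IEvent`, `ChangeFeedOwnerCommand` and `FeedOwnerChangedEvent` are names from the slides, while `IMessage`, `InMemoryBus`, `ActivityLog` and every member on them are hypothetical, and a plain `Queue<T>` stands in for the real queue.

```csharp
using System;
using System.Collections.Generic;

public interface IMessage { Guid Id { get; } }
public interface ICommand : IMessage { }
public interface IEvent : IMessage { }

public sealed class ChangeFeedOwnerCommand : ICommand
{
    public ChangeFeedOwnerCommand(string feed, string newOwner)
    {
        Id = Guid.NewGuid(); Feed = feed; NewOwner = newOwner;
    }
    public Guid Id { get; }
    public string Feed { get; }
    public string NewOwner { get; }
}

public sealed class FeedOwnerChangedEvent : IEvent
{
    public FeedOwnerChangedEvent(string feed, string newOwner)
    {
        Id = Guid.NewGuid(); Feed = feed; NewOwner = newOwner;
    }
    public Guid Id { get; }
    public string Feed { get; }
    public string NewOwner { get; }
}

// Hypothetical in-memory stand-in for the queue plus handler subscriptions.
public sealed class InMemoryBus
{
    private readonly Queue<IMessage> _queue = new Queue<IMessage>();
    private readonly Dictionary<Type, List<Action<IMessage>>> _handlers =
        new Dictionary<Type, List<Action<IMessage>>>();

    public void Subscribe<T>(Action<T> handler) where T : IMessage
    {
        if (!_handlers.TryGetValue(typeof(T), out var list))
        {
            list = new List<Action<IMessage>>();
            _handlers[typeof(T)] = list;
        }
        list.Add(message => handler((T)message));
    }

    public void Enqueue(IMessage message) => _queue.Enqueue(message);

    // Drains the queue; handlers may enqueue follow-up messages while it runs.
    public void ProcessAll()
    {
        while (_queue.Count > 0)
        {
            var message = _queue.Dequeue();
            if (!_handlers.TryGetValue(message.GetType(), out var list)) continue;
            foreach (var handler in list) handler(message);
        }
    }
}

// Idempotent handler: a message delivered twice is only recorded once.
public sealed class ActivityLog
{
    private readonly HashSet<Guid> _seen = new HashSet<Guid>();
    public List<string> Entries { get; } = new List<string>();

    public void Handle(FeedOwnerChangedEvent e)
    {
        if (!_seen.Add(e.Id)) return; // already handled, ignore the redelivery
        Entries.Add(e.Feed + " is now owned by " + e.NewOwner);
    }
}
```

Under these assumptions, the command handler would do the multi-step work and finish by enqueueing the event, e.g. `bus.Subscribe<ChangeFeedOwnerCommand>(c => bus.Enqueue(new FeedOwnerChangedEvent(c.Feed, c.NewOwner)));`, and the activity log would be wired up with `bus.Subscribe<FeedOwnerChangedEvent>(log.Handle);` without the command handler knowing about it.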
    46. 46. Storage No relational database (why not?) Event-driven architecture How do you store a feed’s packages and versions in an optimal way? Three important values: feed name, package id, package version Table per feed Package id = PartitionKey Package version = RowKey
    47. 47. Storage Reading 1,000 rows and deserializing them is SLOW (many seconds) We cache some tables on blob storage 1,000 rows in serialized JSON = small Loading one file over HTTP = fast Searching in memory through 1,000 rows = fast Cache update subscribed to IEvent
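The key layout and the event-driven cache from slides 46 and 47 might look something like the sketch below. It keeps only the idea: package id as `PartitionKey`, package version as `RowKey`, one snapshot per feed, and a cache entry dropped when a change event arrives. The JSON-on-blob cache described on the slide is swapped for an in-memory dictionary here, the slow table read is a delegate supplied by the caller, and `PackageRow`, `FeedSnapshotCache` and their members are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// One row in a feed's table: package id = PartitionKey, package version = RowKey.
public sealed class PackageRow
{
    public PackageRow(string packageId, string version)
    {
        PartitionKey = packageId;
        RowKey = version;
    }
    public string PartitionKey { get; }
    public string RowKey { get; }
}

// Hypothetical per-feed snapshot cache (not thread-safe, for illustration only).
public sealed class FeedSnapshotCache
{
    private readonly Func<string, IReadOnlyList<PackageRow>> _loadFromTable;
    private readonly Dictionary<string, IReadOnlyList<PackageRow>> _snapshots =
        new Dictionary<string, IReadOnlyList<PackageRow>>();

    // loadFromTable stands in for the slow "read all rows of the feed's table" call.
    public FeedSnapshotCache(Func<string, IReadOnlyList<PackageRow>> loadFromTable)
    {
        _loadFromTable = loadFromTable;
    }

    public IReadOnlyList<PackageRow> Get(string feedName)
    {
        if (!_snapshots.TryGetValue(feedName, out var rows))
        {
            rows = _loadFromTable(feedName);
            _snapshots[feedName] = rows;
        }
        return rows;
    }

    // Meant to be called from an event handler when the feed's packages change.
    public void OnFeedChanged(string feedName) => _snapshots.Remove(feedName);

    // Searching the cached snapshot happens in memory.
    public IEnumerable<string> VersionsOf(string feedName, string packageId) =>
        Get(feedName).Where(r => r.PartitionKey == packageId).Select(r => r.RowKey);
}
```

Dropping the snapshot on an event and reloading lazily is only one possible policy; rebuilding the snapshot eagerly inside the event handler would keep reads fast at the cost of more work per change.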
    48. 48. Multi-tenancy
    49. 49. How to bring this into code Just like Request, Response and User: a Tenant is contextual All those are potentially different for every request DI containers with lifetimes exist…
    50. 50. Resolving a tenant

            public interface ITenantContext
            {
                Tenant Tenant { get; }
            }

            // Registration in container
            builder.RegisterType<RequestTenantContext>()
                .As<ITenantContext>()
                .InstancePerLifetimeScope();

            public class RequestTenantContext
            {
                public Tenant Resolve(RequestContext context, IEnumerable<Tenant> tenants)
                {
                    var hostname = context.HttpContext.Request.Url.Host;
                    return tenants.FirstOrDefault(t => t.HostName == hostname);
                }
            }
    51. 51. Windows Azure Access Control Service
    52. 52. Imagine managing this! Multiple applications localhost:1196 … Multiple identity providers Who wants Microsoft Account? Google anyone? Oh, your custom ADFS? Sure!
    53. 53. ACS = identity orchestration
    54. 54. ACS for MyGet No more user registration One single trust relationship (= less coding) Microsoft Account, Yahoo!, Google, Facebook Other IdPs (tenants and our own)* *We built many others and are working on a spin-off (Twitter, LinkedIn, Microsoft Account, …)
    55. 55. One small trick…

            var realm = TenantContext.Tenant.Realm;
            var allowedAudienceUris = FederatedAuthentication.FederationConfiguration
                .IdentityConfiguration
                .AudienceRestriction
                .AllowedAudienceUris;

            if (allowedAudienceUris.All(audience => audience.ToString() != realm))
            {
                allowedAudienceUris.Add(new Uri(realm));
            }
    56. 56. Tough times (learning moments)
    57. 57. Huge downtime on July 2nd, 2012 Symptoms: Users complaining about “downtime” No monitoring SMS alert Half an hour later: “site up!”, “site down!”, “site up!”, “site down!” SMS alerts No sign of issues in the Windows Azure Management portal But what’s the cause? We just deployed our multi-tenant architecture We just enabled storage analytics ELMAH was showing storage throttling 16,000 unprocessed commands and events in queue Full story at
    58. 58. Huge downtime on July 2nd, 2012 One, simple piece of code… GetHashCode() on Package object faulty GetHashCode() used to track object in data context (new vs. update) 2 objects with the same hashcode = UnhandledException Full story at
    59. 59. An exception killed the site? WTF?!? No. Back then we caught any Exception and blindly retried the operation Resulting in 16,000 commands and events being retried continuously Causing storage throttling Causing the website to retry reads Causing more throttling Starving IIS worker threads Lessons learned? A simple bug can halt the entire application Only retry transient errors Our monitoring wasn’t optimal Our code wasn’t optimal (code from back when MyGet was a blog post…)
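The "only retry transient errors" lesson could be expressed along these lines. This is a hedged sketch rather than the code MyGet ended up with: `Retry.Execute`, its parameters and the doubling back-off are all hypothetical, and deciding what counts as transient is left to the caller through the `isTransient` predicate.

```csharp
using System;
using System.Threading;

public static class Retry
{
    // Runs the operation, retrying only when isTransient says the failure may go away.
    // Any other exception, or a transient one on the last attempt, propagates to the caller.
    public static T Execute<T>(
        Func<T> operation,
        Func<Exception, bool> isTransient,
        int maxAttempts = 3,
        TimeSpan? initialDelay = null)
    {
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException(nameof(maxAttempts));

        var delay = initialDelay ?? TimeSpan.FromMilliseconds(200);
        for (var attempt = 1; ; attempt++)
        {
            try
            {
                return operation();
            }
            catch (Exception ex) when (attempt < maxAttempts && isTransient(ex))
            {
                Thread.Sleep(delay);                          // back off before the next attempt
                delay = TimeSpan.FromTicks(delay.Ticks * 2);  // double the wait each time
            }
        }
    }
}
```

A caller might pass something like `ex => ex is TimeoutException` as the predicate. A bug such as the faulty `GetHashCode()` would then surface immediately instead of being retried, and the bounded attempt count plus back-off keeps a struggling storage account from being hammered further.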
    60. 60. Huge downtime February 23rd, 2013 Symptoms: Everything down Furious users on social media Windows Azure Management Portal Down Furious tweets about #WindowsAzure The cause? Global outage of Windows Azure due to an expired SSL certificate on storage Full story at
    61. 61. Considerations and lessons learned Move storage to HTTP instead of HTTPS? Windows Azure down globally impacts us quite a bit Fail-over to another solution costs money and lots of effort Decided against it for now Considering off-Windows Azure backups of at least all packages Full story at
    62. 62. One more! New features… “Retention policies” introduced Seemed to be a success! 3+ million commands and events in queue Solution: scale out 20 instances did it in a few minutes Solution for the future: feature toggling
    63. 63. But overall…
    64. 64. When business meets technology
    65. 65. We’re constantly being bitten Introduce a new beta feature Come up with a revenue model See the feature needs serious rewriting (metering) Lesson learned? Think revenue early on.
    66. 66. Measure everything, test assumptions “The Lean Startup” book says this Don’t build it yourself: Google Analytics
    67. 67. This is why we built username/password registration: it seems a lot of people prefer typing over one click. We must keep investing in Build Services. Feed discovery is more popular than we imagined from zero reactions on our blog and Twitter. The technical fear we had about “download as ZIP” consuming too much server resources? Seems 19 people used it this month. *yawn*
    68. 68. Conclusion
    69. 69. Conclusion NuGet? MyGet? How we started What we did not know Our architecture Multi-tenancy Tough times provide learning Measurement too
    70. 70. @maartenballiauw