Architecting Extremely Large Scale Web Applications

An overview of the much-needed shift in architectural thinking, from monolithic to microservices architecture

  1. 1. ARCHITECTING EXTREMELY LARGE SCALE WEB APPLICATIONS. A MUST read for every architect.
        ABSTRACT: An overview of the much-needed shift in architectural thinking, from monolithic to microservices architecture.
        PRASHANTH B PANDURANGA, Director of Technology
  2. 2. The New York Times auto-scaled to 500,000 users, HipChat has about 1.2 billion messages/documents stored, Salesforce deals with 1,300,000,000 daily transactions with over 24,000 database transactions per second and over 22PB of raw SAN storage capacity, CinchCast has over 50 million page views a month, Pinterest has over 18 million visitors with a 10X growth rate, Amazon has over 55 million active customer accounts, Flickr has over 4 billion queries per day, Netflix has 48 million members with over 50,000 requests per second, and the list goes on.
        Thanks to HighScalability, from which the above statistics are derived, you get a picture of the websites and the scale that I am referring to! I covered the SaaS requirements in my previous blog. In this blog I shall provide a bird's-eye view of the technologies used by a few large-scale websites and their impact on the multi-tier web application architecture.
        Reference: HighScalability has some awesome architecture blogs. I have derived the technology stacks from a multitude of articles hosted by HighScalability and a few from the Netflix blogs.
        Technology Stack: This section is not meant to be an extensive overview of each of the below websites and their technology stack, but rather provides an overview of the variety of tools used and the logical layers that form the architecture.
        NOTE: The tools listed below for each company are point-in-time and may have changed since.
        Figure 1 New York Times Technology stack
  3. 3. Figure 2 HipChat Technology stack
        Figure 3 Salesforce Technology stack
  4. 4. Figure 4 Netflix Technology stack
        I have elaborated on SaaS requirements and general architectural requirements in my previous blog. engineering/
        The requirements which apply the most to large-scale web applications:
        Performance: There are a lot of statistics published relating to the performance implications of web applications. One such statistic: users abandon a website even if there is a 2-second delay during a transaction. Large-scale web applications obviously have very high performance requirements, and a very large percentage of applications get redesigned primarily for a better user experience. Average page load time as a fraction of server and client time, network time, page views, bounce rate, percentage exit, average redirection time, average domain lookup time, average server connection time, average server response time, average page download time, average content load time, average session time, DNS resolution time, TCP connection time, time to first byte, full page object load time, requests per second, error rates, peak response time, uptime, CPU utilization and memory utilization are just a few metrics to look for.
        Availability: Business continuity is of utmost importance. Various availability techniques can be applied on every layer: AlwaysOn Failover Cluster Instances, AlwaysOn Availability Groups, database mirroring, log shipping, redundancy models (active-active, active-passive), redundancy methods (hot standby, warm standby, cold standby), the measurement of the same expressed as mean time to failure (MTTF) and mean time to repair (MTTR), with availability measured as MTTF / (MTTF + MTTR), eliminating single points of failure,
  5. 5. accelerating fault detection, isolation and resolution, hot spares, warm spares, cold spares, clustering, RAID, redundancy, heartbeats, watermarking resources, checkpointing, watchdogs and more.
        Monitoring and Diagnostics: 26 front-end proxy servers and double that in back-end app servers [Atlassian HipChat]; 15,000+ hardware systems [Salesforce]; 100 hardware nodes in production [CinchCast]; 180 web engines + 240 API engines, 88 MySQL DBs (cc2.8xlarge) + 1 slave each, 110 Redis instances, 200 Memcache instances, 4 Redis task managers + 80 task processors, sharded Solr [Pinterest]; 1000+ supported devices [Netflix]. Imagine maintaining those. When innumerable servers are involved, failure of system components is common, so monitoring and diagnostics play a very important role in taking appropriate, timely action.
        Scalability: The capability to support increasing workloads while optimizing resource utilization across various dimensions such as memory, cores, data structures, throughput and more. It goes without saying that the application needs to scale in order to perform well, and unless that scaling is automated, the application cannot stay inexpensive.
        Automation: Identifying failures and automating the re-provisioning of the failed components/servers is extremely important.
        Architecture has evolved significantly from monolithic architecture to microservices architecture. Let's take a look at the logical segregation of layers in current large-scale web applications.
        Figure 5 Minimalistic layers (labels: Application Data, Presentation, Service/Business Logic, Data)
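
The availability measure mentioned under Availability, MTTF / (MTTF + MTTR), is easy to try out with numbers. Below is a minimal sketch in Python. The function names, the sample MTTF/MTTR values, and the redundancy formula 1 - (1 - A)^n are my assumptions for illustration and are not taken from any of the websites discussed here. The redundancy formula only holds if replicas fail independently, which real clusters only approximate.

```python
MINUTES_PER_YEAR = 365 * 24 * 60


def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability as MTTF / (MTTF + MTTR)."""
    if mttf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTTF must be positive and MTTR non-negative")
    return mttf_hours / (mttf_hours + mttr_hours)


def redundant_availability(single: float, replicas: int) -> float:
    """Availability of a group that is up while at least one replica is up.

    Assumption: replica failures are independent. Shared power, network,
    or software faults break this assumption in practice.
    """
    if not 0.0 <= single <= 1.0 or replicas < 1:
        raise ValueError("single must be in [0, 1] and replicas >= 1")
    return 1.0 - (1.0 - single) ** replicas


def downtime_minutes_per_year(avail: float) -> float:
    """Expected unavailable minutes in a 365-day year."""
    return (1.0 - avail) * MINUTES_PER_YEAR


if __name__ == "__main__":
    # Hypothetical node: fails on average every 999 hours, takes 1 hour to repair.
    single = availability(mttf_hours=999.0, mttr_hours=1.0)
    pair = redundant_availability(single, replicas=2)
    for label, value in (("single node", single), ("active-active pair", pair)):
        print(f"{label}: {value:.6f} available, "
              f"{downtime_minutes_per_year(value):.1f} min/year down")
```

The sketch also shows why shortening MTTR (faster fault detection and repair) raises availability just as surely as lengthening MTTF does.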
  6. 6. In the following table we look at some of the technologies used by high-scale websites and compare them to other similar tools/technologies.
        Note: The technologies used by the above websites were taken as the benchmark for comparison. Comments are provided only where I wanted to provide additional details.
        Tools/software | Usage/Description | Top few similar tools/software to consider | Comments
        RabbitMQ | Message broker system; message-oriented middleware; queuing software; ESB | ActiveMQ, Amazon SQS, HornetQ, HiveMQ, JMS, Kafka, ZeroMQ, MSMQ, NServiceBus, Azure Service Bus, OpenMQ, Redis, Storm, Akka, Apache Camel and Spring, OFM, Fuse ESB, WebSphere MQ, Windows Service Bus, BizTalk, WSO2, Mule, Talend ESB, Gearman, JBoss, ServiceMix, OpenESB, Apache Qpid | Look out for AMQP compliance. Some of the tools referred to aren't message brokers but are used in conjunction to perform the same.
        Other AMQP: PIKA, Shovel | | |
        Kinesis | Real-time data processing | Kafka, Storm |
        Tornado | Web server; HTTP server | Nginx, Apache, IIS, lighttpd, HAProxy, Varnish, GlassFish, Jetty, Geronimo, Tomcat | Some are even used as reverse proxies or proxies.
        Figure (logical layers diagram) labels: Presentation, Service/Business Logic, Data, Security, Analytics, Monitoring & Diagnostics, Storage, Events, Notification, Virtualization, Network, Compute, Hardware Layer, Cloud, Cloud Adapter, Automation/Batch, Integration, Processing Engine, Cache, Management, Ad-hoc Layer, Auditing, Configuration, Distributed Processing, Ingestion, Data Management, Data Rules, Metadata, Data Quality, Clickstream, Data Sources, Chat, Sensor, Social, Logs, CRM, ERP, Application Data, Channels, EDW, Metadata Management, Multi-Tenant Processing, Parallel Compute, Parallel Operations, Complex Event Processing, Governance, Workflow, Reporting, Stream Processing, In-Memory Processing, Message Transports, Message Queues, Message Brokers, Integration Frameworks, Enterprise Service Bus, Integration Suites, Operating System
  7. 7. Cassandra | Distributed database management system | MongoDB, Aerospike, Accumulo, Azure Table Storage, Bigtable, Couchbase, CouchDB, DynamoDB, DataStax, Elasticsearch, Greenplum, Vertica, HBase, InfiniDB, InnoDB, MariaDB, Neo4j, Netezza, Teradata, Redshift, Riak, RavenDB, Solr, Spark, VoltDB | With over 200 DBs, it's difficult to list them all. Check out the below link: https://prashanthp anduranga.wordpre /why-nosql-ok-but- why-so-many/ Some of them are data warehousing solutions, while some are data processing engines.
        Hana | In-memory database | GemFire, Hekaton, Aerospike, BigMemory, DataBlitz, EhCache, eXtremeDB, FuelDB, Hazelcast, MonetDB, Coherence, VoltDB | Any key-value store can be used for the same. Some enterprises have experimented with a NoSQL store used as both a cache and an unstructured DB solution.
        Linux | Operating system | SUSE, FreeBSD, Solaris, Debian/Ubuntu, Windows Server, Mac OS X, RHEL | Only server OSes are included in the list.
        SockJS | WebSocket-like object | WebSocket, Atmosphere, SignalR, Alchemy, Fleck | A couple of them listed are open source.
        Libev | Event loop | LibEvent, Asio, Nginx, epoll |
        Fabrik | Visual programming IDE; also known for Content Construction Kits (CCK) | IDE: Visual Studio, Eclipse, NetBeans, Aptana. CCKs: Seblod, K2, ChronoForms, Zoo, BreezingForms, Cobalt, FLEXIcontent |
        Java | Programming language | C, C++, Python, C#, PHP, JavaScript, Ruby, R, Matlab, Objective-C, Visual Basic, Perl, Swift, Scala, Shell, Go, Lisp, SAS, F#, Groovy, Lua | Some of them listed are web programming languages. Additional: HTML, SQL, Haskell
        Twisted | Event-driven network programming framework | Tornado, Django, Asyncio |
        AWS | Cloud provider | Azure, Rackspace, CenturyLink, Salesforce, Engine Yard, Google, OpenStack, SAP, CloudBees, CumuLogic, Eucalyptus, GigaSpaces, MuleSoft, Parallels, Pivotal, Puppet Labs, Ravello, RightScale, Software AG, Xively, AT&T, Cisco, Comcast, EMC, GoGrid, CSC, HP, IBM SmartCloud, Joyent | The list includes infrastructure, platform, storage and security cloud providers.
  8. 8. Lucene | Text search engine library | Azure Search, Autonomy, Solr, GSA, Attivio, dtSearch, Elasticsearch, Endeca, FAST, MarkLogic, Nutch, Sphinx, Sketchy, Scumblr | A few NoSQL databases have been used for the same. This list does not include all the NoSQL databases that could be used.
        Adobe Air | Cross-platform runtime | Cordova (PhoneGap), Appcelerator, Qt, Sencha, cocos2d-x, Xamarin, Ionic, Kony, Mono, Xcode | The ones listed here are cross-platform as well as mobile development platforms.
        Sensu | Monitoring framework | Zabbix, Nagios, Icinga, Monit, Riemann, StatsD, Graphite, Zenoss, collectd, Munin, Cacti, New Relic, Ganglia, Splunk, Sentry, Dynatrace, Datadog, Skylight, Observium, Spiceworks, SolarWinds, Fiddler, Wireshark, HttpWatch, Firebug, SoapUI, OpManager | The list includes some of the: infrastructure monitoring; searching, monitoring and analysing; network monitoring; scalable distributed monitoring systems.
        PagerDuty, NeoLoad | Incident management system, and performance testing and monitoring | OpsGenie, VictorOps, xMatters, Pingdom, Gomez, WebPagetest, Monitis, Uptrends, Keynote, Opsview, Apache JMeter, LoadRunner, WebLOAD, Appvance, NeoLoad, LoadUI, WAPT, Loadster, LoadImpact, SOASTA, Rational Performance Tester, Testing Anywhere, OpenSTA, QEngine (ManageEngine), LoadStorm, CloudTest, Httperf, SilkPerformer, BlazeMeter, Visual Studio Test Suite | Also includes website monitoring, cloud-based quality testing, performance monitoring.
        Chef | IT automation | Puppet, Ansible, Salt, Docker, Jenkins, Capistrano, SaltStack | Configuration management, SCCM
        memCache | Distributed memory object caching | APC, memcached, DynaCache, Ehcache, XCache | Key-value-based NoSQL databases are also used.
        Razor | Physical and virtual hardware provisioning solution | Axemblr, Cobbler, Juju, SaltCloud, Dell Crowbar, Ansible, CFEngine, Chef |
        Perforce | Version management and content collaboration | Git, SVN, TFS, Bitbucket, ClearCase, Subversion |
  9. 9. Pytheas | ITIL assets management software | Remedy (BMC), Assyst (Axios), FrontRange, EasyVista, Hornbill, HP Service Manager, SmartCloud Control Desk (IBM), ServiceNow | IT incident management, IT problem management, IT change management, IT release governance, IT user self-service, IT request management, IT knowledge management, IT service support analytics and reporting, IT SLA management. Ref: Gartner
        ZUUL | Service that provides dynamic routing, monitoring, resiliency and security | Nginx, lighttpd, NetScaler, HAProxy, Radware, Coyote Point, Barracuda, Kemp, Varnish, Avast, Norton, Kaspersky, McAfee, AVG, Bitdefender, F5, Palo Alto, Cisco ASA, Cisco ACE, Foundry, Juniper SSG, MS TMG | Can be a firewall, router, web load-balancing server, proxy server, etc.
        Feign | Java HTTP client binder | Retrofit, JAX-RS, WebSocket, Jersey, CXF, Apache HC | Includes transport libraries
        Hive | Querying and managing large datasets residing in distributed storage | Impala, BigSQL, HAWQ |
        AWS ELB | Elastic load balancing | Nginx, HAProxy, Route53, Azure Traffic Manager, F5 | Port-bound servers, sticky sessions, TCP session reassignment, automatic unfail, slow start, SynGuard, dynamic feedback protocol, NAT, maximum connections, Round Robin, Least Connections, Weighted Round Robin, Weighted Least Connections, Fastest Response. Layer 4 and Layer 7 load balancing. Cloud load balancing features: dedicated (static) IP address, SSL termination, multiple protocols, advanced access control, connection logging, advanced algorithmic routing, session persistence, connection throttling, node management, high availability, content caching, persistent connections, Gzip compression, regionalized load balancers
  10. 10. gZip | Application used for file compression and decompression | httpZip, deflate, 7zip, bzip2, zlib |
          Akamai | Content delivery network | Azure CDN, CloudFront, Torbit, Incapsula, Cotendo, Fastly |
          HTML 5 frameworks | JavaScript frameworks | panduranga/frameworks/10152107517972934 |
          OpenStack | Open-source cloud computing platform | | OpenStack currently has the following features: Compute (Nova), Object Storage (Swift), Block Storage (Cinder), Networking (Neutron), Dashboard (Horizon), Identity Service (Keystone), Image Service (Glance), Telemetry (Ceilometer), Orchestration (Heat), Database (Trove), Bare Metal Provisioning (Ironic), Multiple-tenant cloud messaging (Zaqar), Elastic MapReduce (Sahara)
          Hadoop | Distributed storage and distributed processing of very large data sets on computer clusters
          Aegisthus | Bulk data pipeline out of Cassandra
  11. 11. Eureka | Eureka is a REST (Representational State Transfer) based service that is primarily used in the AWS cloud for locating services for the purpose of load balancing and failover of middle-tier servers
          Genie | Federated job execution engine
          Clojure | Dynamic programming language that targets the Java Virtual Machine
          PigPen | Map-Reduce for Clojure
          Governator | Governator is a library of extensions and utilities that enhance Google Guice to provide: classpath scanning and automatic binding, lifecycle management, configuration to field mapping, field validation and parallelized object warmup
          Inviso | Visualize Hadoop performance
          Ribbon | Ribbon is an inter-process communication (remote procedure calls) library with built-in software load balancers
          Hystrix | Hystrix is a latency and fault tolerance library designed to isolate points of access to remote systems, services and 3rd-party libraries, stop cascading failure and enable resilience in complex distributed systems where failure is inevitable
          Suro | Distributed data pipeline
          Aminator | A tool for creating EBS AMIs
          Lipstick | Pig visualization framework
          Zeno | In-memory data propagation framework
          Blesk | Lightweight client for pushing notifications to web-based applications/sites
          Turbine | Turbine is a tool for aggregating streams of Server-Sent Event (SSE) JSON data into a single stream. The targeted use case is metrics streams from instances in an SOA being aggregated for dashboards
          Priam | Co-process for backup/recovery, token management, and centralized configuration management for Cassandra
          Workflowable | Workflowable is a Ruby gem that allows adding flexible workflow functionality to Ruby on Rails applications
          s3mper | S3mper is a library that provides an additional layer of consistency checking on top of Amazon's S3 index through use of a consistent, secondary index
          Astyanax | Java client for Apache Cassandra
          Denominator | Denominator is a portable Java library for manipulating DNS clouds. Denominator has pluggable back-ends, including AWS Route53, Neustar Ultra, DynECT, Rackspace Cloud DNS, OpenStack Designate, and a mock for testing
          GCViz | Garbage collector visualization framework
          Curator | The Curator Framework is a high-level API that greatly simplifies using ZooKeeper. It adds many features that build on ZooKeeper and handles the complexity of managing connections to the ZooKeeper cluster and retrying operations
          Staash | A language-agnostic as well as storage-agnostic web interface for storing data into persistent storage systems. The metadata layer abstracts a lot of storage details and the pattern automation APIs take care of automating common data access patterns
          Edda | Edda is a service to track changes in cloud deployments
          Brutal | An async-centered chat bot framework for Python programmers, written using the Twisted framework
          CassJMeter | JMeter plugin to run Cassandra tests
          Glisten | Groovy library for building JVM applications with Amazon Simple Workflow (SWF)
          Pig | Platform for analyzing large data sets
          Spark | Engine for big data processing, with built-in modules for streaming, SQL, machine learning and graph processing
  12. 12. Karyon | Framework and a library for a cloud-ready web service. Blueprint for the services. It contains bootstrapping, libraries and lifecycle management, runtime insights and diagnostics, pluggable web resources, cloud-ready hooks
          EBS | Elastic Block Store, persistent block-level storage volume
          Curler | A Gearman worker which cURLs to do work
          Archaius | Library for configuration management API
          ZooKeeper | ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services
          Parallel processing (explicit and implicit parallelism, batch parallelism), asynchronous programming, segregating layers, distributing workloads, load balancing, multi-tenancy, scaling out on all layers, sharding, partitioning, CAP preference, reads, writes, statelessness, logging and telemetry, automating, SOA adoption, caching, throttling, distributing requests across multiple zones, effective usage of CDNs, auto provisioning, autoscaling, compression, queuing, workload distribution, batch processing, designing systems with fault tolerance, redundancy, consistency, availability, partition tolerance, event processing, web sockets, cloud computing, fog computing, grid computing, client-side workload distribution, in-memory processing, proxies, no single points of failure, resilience to failure, graceful degradation, recoverability from failure, design for failure, database transactions, client-side transactions, two-phase commit, auto-commit, partition everything, DB operations ordering, considerations for eventual consistency, functional segmentation, application pools, prevention of session state, async everywhere, and indexes (structured indexes, text indexes, entity indexes, fuzzy match indexes, pre-aggregated indexes, pre-calculated indexes, embedded value indexes, join indexes, link indexes, de-normalized indexes of all kinds) are all important considerations for a highly successful and scalable website.
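
Load balancing is one of the considerations above that is easier to picture in code. The sketch below shows three of the selection strategies named in the AWS ELB row of the table (Weighted Round Robin, Least Connections, Weighted Least Connections) in plain Python. The `Backend` class, the function names and the sample weights are hypothetical and for illustration only. This is not how ELB, Nginx or HAProxy implement these strategies internally.

```python
import itertools
from dataclasses import dataclass
from typing import Iterator, Sequence


@dataclass
class Backend:
    """A hypothetical server behind the load balancer."""
    name: str
    weight: int = 1              # relative capacity, must be >= 1
    active_connections: int = 0  # connections currently being served

    def __post_init__(self) -> None:
        if self.weight < 1:
            raise ValueError("weight must be >= 1")


def weighted_round_robin(backends: Sequence[Backend]) -> Iterator[Backend]:
    """Cycle through backends, repeating each one `weight` times per cycle.

    This is the simplest (non-interleaved) variant: a backend with weight 2
    receives two consecutive requests before the next backend gets one.
    """
    expanded = [b for b in backends for _ in range(b.weight)]
    if not expanded:
        raise ValueError("at least one backend is required")
    return itertools.cycle(expanded)


def least_connections(backends: Sequence[Backend]) -> Backend:
    """Pick the backend with the fewest active connections (first wins ties)."""
    return min(backends, key=lambda b: b.active_connections)


def weighted_least_connections(backends: Sequence[Backend]) -> Backend:
    """Pick the backend with the fewest active connections per unit of weight."""
    return min(backends, key=lambda b: b.active_connections / b.weight)


if __name__ == "__main__":
    big = Backend("big", weight=2, active_connections=3)
    small = Backend("small", weight=1, active_connections=2)
    pool = [big, small]

    picker = weighted_round_robin(pool)
    print("weighted round robin:", [next(picker).name for _ in range(6)])
    print("least connections:", least_connections(pool).name)
    print("weighted least connections:", weighted_least_connections(pool).name)
```

With plain round robin every weight is simply 1. The weighted variants matter when the servers behind the balancer are not the same size, which is common when scaling out incrementally.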
Rest assured, if you have considered all the above factors in your architecture, you are on your way to creating a scalable one. Do let me know if you have questions regarding any particular subject and I will be glad to write up on the same.