Building Highly Scalable Java Applications on Windows Azure - JavaOne S313978
Presentation delivered at JavaOne 2010. Talks about how to use Java to build highly scalable and reliable applications on Windows Azure.

  • Microsoft's Windows Azure platform is a virtualized and abstracted application platform that can be used to build highly scalable and reliable applications, with Java. The environment consists of a set of services such as NoSQL table storage, blob storage, queues, relational database service, internet service bus, access control, and more. Java applications can be built using these services via Web services APIs, and your own Java Virtual Machine, without worrying about the underlying server OS and infrastructure. Highlights of this session will include: • An overview of the Windows Azure environment • How to develop and deploy Java applications in Windows Azure • How to architect horizontally scalable applications in Windows Azure
  • To build for big scale – use more of the same pieces, not bigger pieces; though a different approach may be needed
  • Transcript

    • 1. Building Highly Scalable Java Applications on Windows Azure
      David Chou
      david.chou@microsoft.com
      blogs.msdn.com/dachou
    • 2. >Introduction
      Agenda
      Overview of Windows Azure
      Java How-to
      Architecting for Scale
      What’s Next
    • 3. >Azure Overview
      What is Windows Azure?
      A cloud computing platform (as-a-service)
      on-demand application platform capabilities
      geo-distributed Microsoft data centers
      automated, model-driven services provisioning and management
      You manage code, data, content, policies, service models, etc.
      not servers (unless you want to)
      We manage the platform
      application containers and services, distributed storage systems
      service lifecycle, data replication and synchronization
      server operating system, patching, monitoring, management
      physical infrastructure, virtualization, networking
      security
      “fabric controller” (automated, distributed service management system)
    • 4. >Azure Overview
      How this may be interesting to you
      Not managing and interacting with server OS
      less work for you
      don’t have to care it is “Windows Server” (you can if you want to)
      but have to live with some limits and constraints
      Some level of control
      process isolation (runs inside your own VM/guest OS)
      service and data geo-location
      allocated capacity, scale on-demand
      full spectrum of application architectures and programming models
      You can run Java!
      plus PHP, Python, Ruby, MySQL, memcached, etc.
      and eventually anything that runs on Windows
    • 5. > Azure Overview >Anatomy of a Windows Azure instance
      Compute – instance types: Web Role & Worker Role. Windows Azure applications are built with web role instances, worker role instances, or a combination of both.
      Storage – distributed storage systems that are highly consistent, reliable, and scalable.
      Anatomy of a Windows Azure instance
      HTTP/HTTPS
      Each instance runs on its own VM (virtual machine) and local transient storage; replicated as needed
      Guest VM
      Guest VM
      Guest VM
      Host VM
      Maintenance OS,
      Hardware-optimized hypervisor
      The Fabric Controller communicates with every server within the Fabric. It manages Windows Azure, monitors every application, decides where new applications should run – optimizing hardware utilization.
    • 6. > Azure Overview > Application Platform Services
      Application Platform Services
      Application
      Marketplace
      Information Marketplace
      Personal Data Repository
      Application Services
      Workflow Hosting
      Distributed Cache
      Services Hosting
      Frameworks
      Claims-Based Identity
      Federated Identities
      Secure Token Service
      Declarative Policies
      Security
      Registry
      On-Premise Bridging
      Service Bus
      Connectivity
      Transact-SQL
      Data Synchronization
      Relational Database
      ADO.NET, ODBC, PHP
      Data
      Compute
      C / C++
      Win32
      VHD
      Dynamic Tabular Data
      Blobs
      Message Queues
      Distributed File System
      Content Distribution
      Storage
    • 7. > Azure Overview >Application Platform Services
      Application Platform Services
      Application Services
      Hosting
      Caching
      Frameworks
      WIF, ADFS2, MFG
      Security
      Access Control
      “Sydney”
      Connectivity
      Service Bus
      SQL Azure Data Sync
      “Houston”
      Data
      VM Role
      Compute
      Table Storage
      Blob Storage
      Queue
      Drive
      Content Delivery Network
      Storage
    • 8. >Java How-To
      Java and Windows Azure
      Provide your JVM
      any version or flavor that runs on Windows
      Provide your code
      no programming constraints (e.g., whitelisting libraries, execution time limit, multi-threading, etc.)
      use existing frameworks
      use your preferred tools (Eclipse, emacs, etc.)
      File-based deployment
      no OS-level installation (conceptually, extracting a tar/zip with run.bat)
      Windows Azure “Worker Role” sandbox
      standard user (non-admin privileges; “full trust” environment)
      native code execution (via launching sub-processes)
      service end points (behind VIPs and load balancers)
    • 9. > Java How-To > Boot-strapping
      Some boot-strapping in C#
      Kick-off process in WorkerRole.run()
      get environment info (assigned end point ports, file locations)
      set up local storage (if needed; for configuration, temp files, etc.)
      configure diagnostics (Windows Server logging subsystem for monitoring)
      launch sub-process(es) to run executable (launch the JVM)
      Additional hooks (optional)
      Manage role lifecycle
      Handle dynamic configuration changes
      Free tools
      Visual Studio Express
      Windows Azure Tools for Visual Studio
      Windows Azure SDK
    • 10. > Java How-To > Tomcat
      Running Tomcat in Windows Azure
      [Diagram: the Fabric Controller and Load Balancer expose http://app:80 and route requests to service instances at http://instance:x and http://instance:y. In each Worker Role, the RoleEntryPoint gets runtime info, binds port(x), and calls new Process() to start a JVM sub-process running Tomcat (Catalina, server.xml, index.jsp) that listens on port(x). Also shown: SQL Database, Service Bus, Access Control, Table Storage, Blob Storage, and Queue.]
    • 11. > Java How-To > Jetty
      Running Jetty in Windows Azure
      Boot-strapping code in WorkerRole.run()
      Service end point(s) in ServiceDefinition.csdef
      string response = "";
      try
      {
          System.IO.StreamReader sr;
          // Port assigned by the fabric to the "HttpIn" endpoint
          string port = RoleEnvironment.CurrentRoleInstance.InstanceEndpoints["HttpIn"].IPEndpoint.Port.ToString();
          string roleRoot = Environment.GetEnvironmentVariable("RoleRoot");
          string jettyHome = roleRoot + @"\approot\app\jetty7";
          string jreHome = roleRoot + @"\approot\app\jre6";
          // Launch the JVM as a sub-process and capture its output
          Process proc = new Process();
          proc.StartInfo.UseShellExecute = false;
          proc.StartInfo.RedirectStandardOutput = true;
          proc.StartInfo.FileName = String.Format("\"{0}\\bin\\java.exe\"", jreHome);
          proc.StartInfo.Arguments = String.Format("-Djetty.port={0} -Djetty.home=\"{1}\" -jar \"{1}\\start.jar\"", port, jettyHome);
          proc.EnableRaisingEvents = false;
          proc.Start();
          sr = proc.StandardOutput;
          response = sr.ReadToEnd();
      }
      catch (Exception ex)
      {
          response = ex.Message;
          Trace.TraceError(response);
      }
      <Endpoints>
          <InputEndpoint name="HttpIn" port="80" protocol="tcp" />
      </Endpoints>
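On the Java side, the launched process can pick up the assigned port from a system property, in the same way Jetty reads `-Djetty.port`. The following is only a minimal sketch: the class name `PortFromProperty` and the property name `app.port` are hypothetical, and a plain `java.net.ServerSocket` stands in for a real servlet container.

```java
import java.io.IOException;
import java.net.ServerSocket;

// Hypothetical stand-in for the Java process that the role bootstrapper launches.
class PortFromProperty {

    // Read the listen port from a -D system property, falling back to a default.
    static int resolvePort(String propertyName, int defaultPort) {
        String value = System.getProperty(propertyName);
        if (value == null || value.trim().isEmpty()) {
            return defaultPort;
        }
        return Integer.parseInt(value.trim());
    }

    public static void main(String[] args) throws IOException {
        // "app.port" is an assumed name; the bootstrapper would pass -Dapp.port=<port>.
        int port = resolvePort("app.port", 8080);
        try (ServerSocket socket = new ServerSocket(port)) {
            System.out.println("Bound to port " + socket.getLocalPort());
        }
    }
}
```

Under these assumptions the bootstrapper would start it with something like `java -Dapp.port=<assigned port> PortFromProperty`, so the Java code never hard-codes the port the platform chose.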
    • 12. > Java How-To > Limitations
      Current constraints
      Platform
      Dynamic networking
      <your app>.cloudapp.net
      no naked domain
      CNAME re-direct from custom domain
      sending traffic to loopback addresses not allowed and cannot open arbitrary ports
      No OS-level access
      Non-persistent local file system
      allocate local storage directory
      read-only: Windows directory, machine configuration files, service configuration files
      Available registry resources
      read-only: HKEY_CLASSES_ROOT, HKEY_LOCAL_MACHINE, HKEY_USERS, HKEY_CURRENT_CONFIG
      full access: HKEY_CURRENT_USER
      Java
      Sandboxed networking
      NIO (java.nio) not supported
      engine and host-level clustering
      JNDI, JMS, JMX, RMI, etc.
      need to configure networking
      Non-persistent local file system
      logging, configuration, etc.
      REST-based APIs to services
      Table Storage – schema-less (noSQL)
      Blob Storage – large files (<200GB block blobs; <1TB page blobs)
      Queues
      Service Bus
      Access Control
    • 13. > Java How-To > Platform Enhancements
      Improvements on the way
      Platform
      Networking control
      fixed VM ports (e.g., match external ports)
      fixed and ranges for inter-role ports
      inter-role communication access control
      Plugins
      diagnostics
      Intellitrace
      remote desktop
      Full IIS
      multiple websites in same role
      virtual directories
      applications, modules
      Admin mode
      OS-level installations and configurations
      VM Role
      run your own Windows Server-based VM
      Java
      Traditional deployment models
      deploy your own Java EE stack
      configure internal networking
      More frameworks, packages, and extended languages
      verify deployment and configuration
      Solution accelerators
      with bootstrapping and configuration
      Java API support
      map to lower-level services
    • 14. > Java How-To >Platform Enhancements
      Running Jetty with fixed VM ports
      Boot-strapping code in WorkerRole.run()
      Service end point(s) in ServiceDefinition.csdef
      string response = "";
      try
      {
          System.IO.StreamReader sr;
          // With a fixed VM port, the internal port matches the external one
          string port = "80";
          string roleRoot = Environment.GetEnvironmentVariable("RoleRoot");
          string jettyHome = roleRoot + @"\approot\app\jetty7";
          string jreHome = roleRoot + @"\approot\app\jre6";
          Process proc = new Process();
          proc.StartInfo.UseShellExecute = false;
          proc.StartInfo.RedirectStandardOutput = true;
          proc.StartInfo.FileName = String.Format("\"{0}\\bin\\java.exe\"", jreHome);
          proc.StartInfo.Arguments = String.Format("-Djetty.port={0} -Djetty.home=\"{1}\" -jar \"{1}\\start.jar\"", port, jettyHome);
          proc.EnableRaisingEvents = false;
          proc.Start();
          sr = proc.StandardOutput;
          response = sr.ReadToEnd();
      }
      catch (Exception ex)
      {
          response = ex.Message;
          Trace.TraceError(response);
      }
      <Endpoints>
          <InputEndpoint name="HttpIn" protocol="http" port="80" localPort="80" />
      </Endpoints>
    • 15. > Java How-To >Platform Enhancements
      Running Jetty with admin mode
      Execute startup script in ServiceDefinition.csdef
      <Startup>
          <Task commandLine="runme.cmd" executionContext="elevated" taskType="simple"></Task>
      </Startup>
    • 16. > Java How-To >Platform Enhancements
      Deployment Options
      Worker Role
      fabric sandbox native deployment
      automated, need additional code
      available now
      Admin Mode
      script-based installation and execution
      automated, need scripts
      available shortly
      Remote Desktop
      login remotely and manually install
      manual, full control
      available shortly
      VM Role
      host your own pre-configured VM image
      automated, full control
      available later
    • 17. > Azure Overview >Ideal Scenarios
      What’s this good for?
      Web Applications
      • massive scale infrastructure
      • burst & overflow capacity
      • temporary, ad-hoc sites
      Service Applications
      • composite applications
      • mobile/client connected services
      • Web APIs
      Hybrid Applications
      • component services
      • distributed processing
      • distributed data
      • external storage
      Media Applications
      • CGI rendering
      • content transcoding
      • media streaming
      Information Sharing
      • reference data
      • common data repositories
      • knowledge discovery & management
      Collaborative Processes
      • multi-enterprise integration
      • B2B & e-commerce
      • supply chain management
      • health & life sciences
      • domain-specific services
    • > Architecting for Scale
      Size matters
      Facebook (2009)
      +200B pageviews /month
      >3.9T feed actions /day
      +300M active users
      >1B chat messages /day
      100M search queries /day
      >6B minutes spent /day (ranked #2 on Internet)
      +20B photos, +2B/month growth
      600,000 photos served /sec
      25TB log data /day processed thru Scribe
      120M queries /sec on memcache
      Twitter (2009)
      600 requests /sec
      avg 200-300 connections /sec; peak at 800
      MySQL handles 2,400 requests /sec
      30+ processes for handling odd jobs
      process a request in 200 milliseconds in Rails
      average time spent in the database is 50-100 milliseconds
      +16 GB of memcached
      Google (2007)
      +20 petabytes of data processed /day by +100K MapReduce jobs
      1 petabyte sort took ~6 hours on ~4K servers replicated onto ~48K disks
      +200 GFS clusters, each at 1-5K nodes, handling +5 petabytes of storage
      ~40 GB /sec aggregate read/write throughput across the cluster
      +500 servers for each search query < 500ms
      >1B views /day on YouTube (2009)
      MySpace (2007)
      115B pageviews /month
      5M concurrent users @ peak
      +3B images, mp3, videos
      +10M new images/day
      160 Gbit/sec peak bandwidth
      Flickr (2007)
      +4B queries /day
      +2B photos served
      ~35M photos in squid cache
      ~2M photos in squid’s RAM
      38k req/sec to memcached (12M objects)
      2 PB raw storage
      +400K photos added /day
    • 33. > Architecting for Scale > Vertical Scaling
      Traditional scale-up architecture
      Common characteristics
      synchronous processes
      sequential units of work
      tight coupling
      stateful
      pessimistic concurrency
      clustering for HA
      vertical scaling
      [Diagram: units of work across web, app server, and data store tiers]
    • 34. > Architecting for Scale >Vertical Scaling
      Traditional scale-up architecture
      To scale, get bigger servers
      expensive
      has scaling limits
      inefficient use of resources
      [Diagram: web, app server, and data store tiers]
    • 35. > Architecting for Scale >Vertical Scaling
      Traditional scale-up architecture
      When problems occur
      bigger failure impact
      [Diagram: web, app server, and data store tiers]
    • 36. > Architecting for Scale >Vertical Scaling
      Traditional scale-up architecture
      When problems occur
      bigger failure impact
      more complex recovery
      [Diagram: web, app server, and data store tiers]
    • 37. > Architecting for Scale > Horizontal scaling
      Use more pieces, not bigger pieces
      LEGO 7778 Midi-scale Millennium Falcon
      • 9.3 x 6.7 x 3.2 inches (L/W/H)
      • 356 pieces
      LEGO 10179 Ultimate Collector's Millennium Falcon
      • 33 x 22 x 8.3 inches (L/W/H)
      • 5,195 pieces
    • > Architecting for Scale > Horizontal scaling
      Scale-out architecture
      Common characteristics
      small logical units of work
      loosely-coupled processes
      stateless
      event-driven design
      optimistic concurrency
      partitioned data
      redundancy fault-tolerance
      re-try-based recoverability
      [Diagram: web, app server, and data store nodes]
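Re-try-based recoverability, one of the characteristics listed on this slide, can be illustrated with a small helper. This is a sketch only, not code from the talk: the `Retry` class and its parameters are hypothetical, and it assumes the wrapped operation is safe to repeat.

```java
import java.util.function.Supplier;

// Hypothetical helper: re-run a failing operation with exponential backoff.
class Retry {

    // Calls the task up to maxAttempts times, doubling the pause after each failure.
    static <T> T withBackoff(Supplier<T> task, int maxAttempts, long initialDelayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long delay = initialDelayMillis;
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.get();
            } catch (RuntimeException e) {
                last = e; // remember the failure and try again
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delay);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting to retry", ie);
                }
                delay *= 2;
            }
        }
        throw last; // every attempt failed
    }
}
```

A caller would wrap a remote call, for example `Retry.withBackoff(() -> fetchProfile(id), 5, 100)`, where `fetchProfile` is a placeholder for any operation that can fail transiently.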
    • 40. > Architecting for Scale > Horizontal scaling
      Scale-out architecture
      To scale, add more servers
      not bigger servers
      [Diagram: web, app server, and data store nodes]
    • 41. > Architecting for Scale > Horizontal scaling
      Scale-out architecture
      When problems occur
      smaller failure impact
      higher perceived availability
      [Diagram: web, app server, and data store nodes]
    • 42. > Architecting for Scale > Horizontal scaling
      Scale-out architecture
      When problems occur
      smaller failure impact
      higher perceived availability
      simpler recovery
      [Diagram: web, app server, and data store nodes]
    • 43. > Architecting for Scale > Horizontal scaling
      Scale-out architecture + distributed computing
      parallel tasks
      Scalable performance at extreme scale
      asynchronous processes
      parallelization
      smaller footprint
      optimized resource usage
      reduced response time
      improved throughput
      [Diagram: web, app server, and data store nodes; labels: perceived response time, async tasks]
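The parallelization idea on this slide can be sketched inside a single JVM, with a thread pool standing in for multiple servers. The `ParallelSum` class below is hypothetical and the workload (summing a range) is chosen only because it divides cleanly into independent units of work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example: divide one job into small independent tasks and combine the results.
class ParallelSum {

    // Splits the range [1, n] into at most `parts` chunks and sums them concurrently.
    static long sumTo(long n, int parts) {
        ExecutorService pool = Executors.newFixedThreadPool(parts);
        try {
            List<Callable<Long>> tasks = new ArrayList<>();
            long chunk = (n + parts - 1) / parts; // ceiling division
            for (long start = 1; start <= n; start += chunk) {
                final long from = start;
                final long to = Math.min(n, start + chunk - 1);
                tasks.add(() -> {
                    long sum = 0;
                    for (long i = from; i <= to; i++) {
                        sum += i;
                    }
                    return sum;
                });
            }
            long total = 0;
            for (Future<Long> partial : pool.invokeAll(tasks)) {
                total += partial.get(); // combine partial results
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Because each chunk shares no state with the others, the same split could in principle be handed to separate worker instances instead of threads.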
    • 44. > Architecting for Scale > Horizontal scaling
      Scale-out architecture + distributed computing
      When problems occur
      smaller units of work
      decoupling shields impact
      [Diagram: web, app server, and data store nodes]
    • 45. > Architecting for Scale > Horizontal scaling
      Scale-out architecture + distributed computing
      When problems occur
      smaller units of work
      decoupling shields impact
      even simpler recovery
      [Diagram: web, app server, and data store nodes]
    • 46. > Architecting for Scale >Cloud Architecture Patterns
      LiveJournal (from Brad Fitzpatrick, then Founder at LiveJournal, 2007)
      Web Frontend
      Apps & Services
      Partitioned Data
      Distributed
      Cache
      Distributed Storage
    • 47. > Architecting for Scale >Cloud Architecture Patterns
      Flickr (from Cal Henderson, then Director of Engineering at Yahoo, 2007)
      Web Frontend
      Apps & Services
      Distributed Storage
      Distributed
      Cache
      Partitioned Data
    • 48. > Architecting for Scale >Cloud Architecture Patterns
      SlideShare (from Jonathan Boutelle, CTO at SlideShare, 2008)
      Web
      Frontend
      Apps &
      Services
      Distributed Cache
      Partitioned Data
      Distributed Storage
    • 49. > Architecting for Scale >Cloud Architecture Patterns
      Twitter (from John Adams, Ops Engineer at Twitter, 2010)
      Web
      Frontend
      Apps &
      Services
      Partitioned
      Data
      Queues
      Async
      Processes
      Distributed
      Cache
      Distributed
      Storage
    • 50. > Architecting for Scale >Cloud Architecture Patterns
      Distributed
      Storage
      Facebook
      (from Jeff Rothschild, VP Technology at Facebook, 2009)
      2010 stats (Source: http://www.facebook.com/press/info.php?statistics)
      People
      +500M active users
      50% of active users log on in any given day
      people spend +700B minutes /month
      Activity on Facebook
      +900M objects that people interact with
      +30B pieces of content shared /month
      Global Reach
      +70 translations available on the site
      ~70% of users outside the US
      +300K users helped translate the site through the translations application
      Platform
      +1M developers from +180 countries
      +70% of users engage with applications /month
      +550K active applications
      +1M websites have integrated with Facebook Platform
      +150M people engage with Facebook on external websites /month
      Web
      Frontend
      Apps &
      Services
      Distributed
      Cache
      Parallel
      Processes
      Partitioned
      Data
      Async
      Processes
    • 51. > Architecting for Scale > Cloud Architecture Patterns
      Windows Azure platform components
      Apps & Services
      Web Frontend
      Distributed
      Cache
      Partitioned Data
      Distributed Storage
      Queues
      Content Delivery Network
      Load Balancer
      IIS
      Web Server
      VM Role
      Worker Role
      Web Role
      Caching
      Queues
      Access Control
      Hosting
      Blobs
      Relational Database
      Tables
      Drives
      Service Bus
      Reporting & Analysis
      Data Synchronization
      Virtual Private Network
      Services
    • 52. >Architecting for Scale
      Fundamental concepts
      Vertical scaling still works
    • 53. >Architecting for Scale
      Fundamental concepts
      Horizontal scaling for cloud computing
      Small pieces, loosely coupled
      Distributed computing best practices
      asynchronous processes (event-driven design)
      parallelization
      idempotent operations (handle duplicates)
      de-normalized, partitioned data (sharding)
      shared nothing architecture
      optimistic concurrency
      fault-tolerance by redundancy and replication
      etc.
    • 54. > Architecting for Scale >Fundamental Concepts
      Asynchronous processes & parallelization
      Defer work as late as possible
      return to user as quickly as possible
      event-driven design (instead of request-driven)
      Cloud computing friendly
      distributes work to more servers (divide & conquer)
      smaller resource usage/footprint
      smaller failure surface
      decouples process dependencies
      Windows Azure platform services
      Queue Service
      AppFabric Service Bus
      inter-node communication
      [Diagram: Web Role and Worker Role instances connected through Queues and Service Bus]
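The "return to user as quickly as possible" pattern can be sketched with an in-memory queue. This is an illustration under stated assumptions, not the Windows Azure Queue Service API: `DeferredWork` is a hypothetical class, and a `LinkedBlockingQueue` stands in for the durable queue that would sit between a web role and a worker role.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: the front end enqueues work, a worker processes it later.
class DeferredWork {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final List<String> completed = new CopyOnWriteArrayList<>();

    // "Web role" side: record the job and return immediately.
    String accept(String jobId) {
        queue.add(jobId);
        return "accepted:" + jobId;
    }

    // "Worker role" side: process jobs until the queue stays empty for the timeout.
    void drain(long idleTimeoutMillis) {
        try {
            String jobId;
            while ((jobId = queue.poll(idleTimeoutMillis, TimeUnit.MILLISECONDS)) != null) {
                completed.add("done:" + jobId);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    List<String> completed() {
        return completed;
    }
}
```

In a running system the worker loop would sit on its own thread or instance, for example `new Thread(() -> work.drain(1000)).start()`, so the caller of `accept` never waits for processing.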
    • 55. > Architecting for Scale >Fundamental Concepts
      Partitioned data
      Shared nothing architecture
      transaction locality (partition based on an entity that is the “atomic” target of majority of transactional processing)
      loosened referential integrity (avoid distributed transactions across shard and entity boundaries)
      design for dynamic redistribution and growth of data (elasticity)
      Cloud computing friendly
      divide & conquer
      size growth with virtually no limits
      smaller failure surface
      Windows Azure platform services
      Table Storage Service
      SQL Azure
      AppFabric Caching (coming soon)
      SQL Azure DB federation (coming soon)
      [Diagram: Web Role and Worker Role instances, Queues, and three Relational Database partitions; labels: read, write]
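Partitioning by an entity key can be sketched as a simple router that sends every row with the same partition key to the same shard. The `ShardedStore` class is hypothetical, in-memory maps stand in for separate databases, and the partition key and row key names only echo Table Storage terminology.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: route each entity to one of N independent shards.
class ShardedStore {
    private final List<Map<String, String>> shards = new ArrayList<>();

    ShardedStore(int shardCount) {
        if (shardCount < 1) {
            throw new IllegalArgumentException("shardCount must be at least 1");
        }
        for (int i = 0; i < shardCount; i++) {
            shards.add(new HashMap<>());
        }
    }

    // Map a partition key to a shard index; floorMod keeps the result non-negative.
    int shardFor(String partitionKey) {
        return Math.floorMod(partitionKey.hashCode(), shards.size());
    }

    // All rows that share a partition key land on the same shard (transaction locality).
    void put(String partitionKey, String rowKey, String value) {
        shards.get(shardFor(partitionKey)).put(partitionKey + "/" + rowKey, value);
    }

    String get(String partitionKey, String rowKey) {
        return shards.get(shardFor(partitionKey)).get(partitionKey + "/" + rowKey);
    }
}
```

Plain modulo hashing is the simplest choice but moves most keys when the shard count changes; the dynamic redistribution mentioned on the slide would need something like a lookup directory or consistent hashing instead.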
    • 56. > Architecting for Scale >Fundamental Concepts
      Idempotent operations
      Repeatable processes
      allow duplicates (additive)
      allow re-tries (overwrite)
      reject duplicates (optimistic locking)
      stateless design
      Cloud computing friendly
      resiliency
      Windows Azure platform services
      Queue Service
      AppFabric Service Bus
      [Diagram: Worker Role instances connected to Service Bus]
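Two of the strategies on this slide, rejecting duplicates and allowing re-tries through overwrites, can be sketched in a few lines. The `IdempotentLedger` class is hypothetical and keeps its state in memory; a real handler would persist the seen message IDs alongside the data they protect.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: a message handler that is safe to call more than once.
class IdempotentLedger {
    private final Set<String> seenMessageIds = new HashSet<>();
    private final Map<String, Long> balances = new HashMap<>();

    // Reject duplicates: a credit is applied only the first time its message ID is seen.
    synchronized boolean applyCredit(String messageId, String account, long amount) {
        if (!seenMessageIds.add(messageId)) {
            return false; // duplicate delivery, already applied
        }
        balances.merge(account, amount, Long::sum);
        return true;
    }

    // Allow re-tries: writing an absolute value gives the same result however often it runs.
    synchronized void setBalance(String account, long value) {
        balances.put(account, value);
    }

    synchronized long balance(String account) {
        return balances.getOrDefault(account, 0L);
    }
}
```

With this shape, a worker that crashes after processing a message but before acknowledging it can simply process the redelivered message again without double-counting.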
    • 57. > Architecting for Scale >Fundamental Concepts
      CAP (Consistency, Availability, Partition tolerance) Theorem
      A shared-data system can have at most two of these three properties
      Consistency + Availability
      • High data integrity
      • Single site, cluster database, LDAP, xFS file system, etc.
      • 2-phase commit, data replication, etc.
      Consistency + Partition tolerance
      • Distributed database, distributed locking, etc.
      • Pessimistic locking, minority partition unavailable, etc.
      Availability + Partition tolerance
      • High scalability
      • Distributed cache, DNS, etc.
      • Optimistic locking, expiration/leases, etc.
      “Towards Robust Distributed Systems”, Dr. Eric A. Brewer, UC Berkeley
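Optimistic locking, listed above as a trait of systems that favor availability, can be sketched as a versioned compare-and-set. The `VersionedCell` class is hypothetical and lives in one JVM; it is similar in spirit to a conditional update guarded by a version tag, where a write succeeds only if nothing changed since the value was read.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: writers never block; a stale writer is told to re-read and retry.
class VersionedCell {

    // Immutable snapshot of a value together with its version number.
    static final class Snapshot {
        final long version;
        final String value;

        Snapshot(long version, String value) {
            this.version = version;
            this.value = value;
        }
    }

    private final AtomicReference<Snapshot> current;

    VersionedCell(String initialValue) {
        current = new AtomicReference<>(new Snapshot(0, initialValue));
    }

    Snapshot read() {
        return current.get();
    }

    // Succeeds only if no other write has happened since `expected` was read.
    boolean tryWrite(Snapshot expected, String newValue) {
        return current.compareAndSet(expected, new Snapshot(expected.version + 1, newValue));
    }
}
```

A caller whose `tryWrite` returns false would read the latest snapshot, reapply its change, and try again, which is the retry loop that optimistic concurrency relies on.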
    • 63. > Architecting for Scale >Fundamental Concepts
      Hybrid architectures
      Scale-out (horizontal)
      BASE: Basically Available, Soft state, Eventually consistent
      availability first; best effort
      aggressive (optimistic)
      shared nothing
      favor extreme size
      e.g., user requests, data collection & processing, etc.
      Scale-up (vertical)
      ACID: Atomicity, Consistency, Isolation, Durability
      focus on “commit”
      conservative (pessimistic)
      transactional
      favor accuracy/consistency
      e.g., BI & analytics, financial processing, etc.
      Most distributed systems employ both approaches
    • 64. > What’s Next
      Roadmap (high-level sampling; subject to change)
      2010
      2011
      • VM role
      • data synchronization service
      • virtual private network
      • service bus v2
      • distributed caching
      • pipeline & container (service hosting)
      • data reporting services
      • database federation
      2012
      • data analysis services
      • data cleansing service
      • Windows Azure Appliance
    • Thank you!
      David Chou
      david.chou@microsoft.com
      blogs.msdn.com/dachou
      © 2010 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.
      The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
