#evolve19
RUNNING AEM
WORKLOADS ON
MICROSOFT AZURE
Jayan Kandathil (Adobe Inc.)
August 5, 2019
#evolve19 2
• [Cloud Engineer] with Adobe Managed Services
• [Adobe Managed Services] runs AEM on AWS and Azure for customers
• 600+ AEM customers (130+ on Azure)
• 2000+ VMs on Azure
• CSE model – one point person for everything AEM
• Global team of about 200
• San Jose, Lehi (UT), London, Bucharest, New Delhi, Bangalore, Sydney
ABOUT ADOBE MANAGED SERVICES
#evolve19 3
• “Strategic Partnership” with Microsoft (announced at IGNITE in Atlanta, Sep 2016)
• AEM Managed Services chosen as part of the vanguard
THE AZURE CONTEXT
https://channel9.msdn.com/Events/Ignite/2016/KEY01 (timecode 11:36)
https://blogs.adobe.com/conversations/2016/09/microsoft-partnership.html
#evolve19 4
• “Premier Mission Critical” (PMC) Support
• Dedicated solutions architects from Azure
• Dedicated liaison with Azure Engineering teams
• First customer went live in Oct 2017
PARTNERSHIP
WHAT IT MEANS IN PRACTICE
#evolve19 5
• Built Connector for Azure Blob Storage
PRODUCT CHANGES MADE
#evolve19 6
• Gartner Magic Quadrant
CURRENT STATE OF IAAS
https://mspoweruser.com/microsoft-azure-continues-to-lag-behind-amazon-in-the-cloud-infrastructure-market/
https://www.gartner.com/en/documents/3875999/magic-quadrant-for-cloud-infrastructure-as-a-service-wor0
#evolve19 7
• Virtual Machine
• Virtual Network (VNET)
• Network Interface
• Network Security Group
• Managed Disk (~ EBS volume)
• Blob Storage Container (~ S3 bucket)
• Application Gateway load-balancer (~ ALB)
• CDN
KEY AZURE SERVICES WE USE
#evolve19 8
• CPU (sysbench --test=cpu --cpu-max-prime=100000 --num-threads=4 run)
• Memory (READ) (ops/sec)
• Memory (READ) (MB/sec)
• Memory (READ) (sec, total time)
• Memory (WRITE) (ops/sec)
• Memory (WRITE) (MB/sec)
• Memory (WRITE) (sec, total time)
EVALUATION - COMPUTE
SYSBENCH 0.4.12
sysbench --test=memory --memory-block-size=4K --memory-scope=global --memory-total-size=1024G --num-threads=100 --memory-oper=read run
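As a rough check on what the memory command above asks for, the sketch below works out how many block-sized operations are implied by --memory-total-size=1024G and --memory-block-size=4K. It assumes binary units (1G = 2^30 bytes, 1K = 2^10 bytes); the function name is illustrative and not part of sysbench.

```python
def memory_block_operations(total_size_gib: int, block_size_kib: int) -> int:
    """Number of block-sized transfers needed to move total_size_gib of data.

    Assumption: G and K are binary units (2**30 and 2**10 bytes).
    """
    total_bytes = total_size_gib * 1024**3
    block_bytes = block_size_kib * 1024
    return total_bytes // block_bytes


# Values taken from the sysbench command line above.
ops = memory_block_operations(1024, 4)
print(ops)  # 268435456
```

Dividing this count by the reported total time gives the ops/sec figure listed in the bullets.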
#evolve19 9
• Storage (SEQuential WRITE) (MB/sec)
• Storage (SEQuential READ) (GB/sec)
• Storage (RANDOM WRITE) (MB/sec)
• Storage (RANDOM READ) (MB/sec)
EVALUATION - STORAGE
SYSBENCH, DD
WRITE : dd if=/dev/zero of=/mnt/crx/sysbench/file.img bs=8k count=1310720
READ : dd if=/mnt/crx/sysbench/file.img of=/dev/zero bs=8k
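The dd commands above fix the amount of data moved, so throughput follows directly from the elapsed time. A minimal sketch of that arithmetic, assuming bs=8k means 8192 bytes and that throughput is reported in decimal megabytes (10^6 bytes); the helper names are hypothetical.

```python
def dd_total_bytes(block_size_bytes: int, count: int) -> int:
    """Total bytes transferred by dd for a given block size and block count."""
    return block_size_bytes * count


def throughput_mb_per_sec(total_bytes: int, elapsed_seconds: float) -> float:
    """Throughput in decimal megabytes per second (assumption: 1 MB = 10**6 bytes)."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return total_bytes / elapsed_seconds / 1_000_000


# bs=8k count=1310720, as in the WRITE command above.
total = dd_total_bytes(8 * 1024, 1310720)
print(total)             # 10737418240
print(total / 1024**3)   # 10.0 (GiB)
```

Pass the elapsed time measured on the VM under test to throughput_mb_per_sec to get a figure comparable across instance types.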
#evolve19 10
• Loss of “Availability Zone”
• Loss of Region
EVALUATION – APPLICATION AVAILABILITY
CANARY REGION
#evolve19 11
• Ingest 6,000 [1 MB] JPGs (95th percentile) (client-side) (ms)
• Ingest 4,000 [5 MB] PNGs (95th percentile) (client-side) (ms)
• (Transient) Workflow processing 6,000 JPGs and 4,000 PNGs (total time) (server-side) (minutes)
• Install package with 300,000 cq:Page nodes (total time) (seconds)
• Create 2 million web pages with two JCR properties each (ACS Tools - Test Page Generator) (total time) (minutes)
• Ingest 1,000 [1 MB] PDFs via WebDAV (95th percentile) (server-side) (ms)
EVALUATION - AEM
AEM 6.3
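Several of the ingest measurements above are reported as 95th percentiles. The slides do not say which percentile definition was used; the sketch below assumes the nearest-rank method, and the function name is illustrative.

```python
import math


def percentile_nearest_rank(samples, pct):
    """Return the pct-th percentile of samples using the nearest-rank method.

    Assumption: nearest-rank is one common definition; the original
    benchmark may have used a different (e.g. interpolating) method.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(samples)
    # Smallest rank such that at least pct percent of samples are <= that value.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]
```

For example, with per-upload latencies in milliseconds collected on the client, percentile_nearest_rank(latencies_ms, 95) gives the value that 95% of uploads met or beat.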
#evolve19 12
• Query for 6,000 PNG images (QueryBuilder) (total time) (seconds)
• Query for 4,000 JPG images (QueryBuilder) (total time) (seconds)
• Query for those 2 million web pages (QueryBuilder) (total time) (seconds)
• Query (JCR-SQL2) for page property (node traversal of 5.02 million nodes), count the results as well (ACS Tools - Explain Query) (total time) (seconds)
• Query for 1,000 PDF documents (QueryBuilder) (total time) (seconds)
EVALUATION – AEM - SEARCH
AEM 6.3
#evolve19 13
• 125 TB ASSETS repository
• List Folders and Assets
• Download Assets
• READ Assets Metadata
• Search for Assets
• Update Assets Metadata
• Upload File
EVALUATION – AEM – SITES/ASSETS/FORMS
AEM 6.3
#evolve19 14
• Hyper-threading* turned off on hosts
• 30% better AEM performance
VM CHOICE – DS_V2
GENERAL PURPOSE, SSD-CAPABLE, GENERATION 2
* CPU splits each of its physical cores into virtual cores, which are known as threads
[Diagram: Intel Xeon CPU 4-core socket hosting virtual machines. HT turned off: four physical cores. HT turned on: four physical cores exposed as eight logical cores.]
#evolve19 15
INTER-AVAILABILITY ZONE NETWORK BANDWIDTH
DEC 13, 2018 : [WEST US 2] : DS3_V2 : IPERF3
#evolve19 16
INTER-REGION NETWORK BANDWIDTH
DEC 13, 2018 : [WEST US 2] – [EAST US 2] : DS3_V2 : IPERF3
#evolve19 17
• Single-Tenant
• Each customer gets a dedicated Azure “subscription”
• Each environment (QA/Stage/Prod) mapped to an Azure “resource group”
DEPLOYMENT ARCHITECTURE
https://docs.microsoft.com/en-us/azure/azure-subscription-service-limits
Resource group customer1-dev
Resource group customer1-stage
Resource group customer1-qa
Resource group customer1-prod
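The naming pattern in the diagram (one resource group per environment, prefixed with the customer name) can be sketched as below. This is only an illustration of the convention shown on the slide; the helper and its defaults are hypothetical and not taken from the provisioning tooling.

```python
from typing import List, Sequence

# Environment suffixes as they appear in the diagram.
ENVIRONMENTS = ("dev", "stage", "qa", "prod")


def resource_group_names(customer: str,
                         environments: Sequence[str] = ENVIRONMENTS) -> List[str]:
    """Build one resource group name per environment for a customer."""
    return [f"{customer}-{env}" for env in environments]


print(resource_group_names("customer1"))
# ['customer1-dev', 'customer1-stage', 'customer1-qa', 'customer1-prod']
```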
#evolve19 18
• AEM-based portal called “MS-Central”
• Azure SDK for Java
• Azure Resource Manager (ARM) Templates (~ CloudFormation templates)
• Chef
AUTOMATED PROVISIONING
https://github.com/Azure/azure-sdk-for-java
https://docs.microsoft.com/en-us/azure/azure-resource-manager/resource-group-authoring-templates
#evolve19 19
• OSGI Bundle
• Oak Azure Cloud Blob Store (org.apache.jackrabbit.oak-blob-cloud-azure)
• (AEM 6.5) 1.10.2
AEM CONNECTOR FOR AZURE BLOB STORE
https://mvnrepository.com/artifact/org.apache.jackrabbit/oak-blob-cloud-azure
https://helpx.adobe.com/experience-manager/6-5/sites/deploying/using/data-store-config.html
#evolve19 20
• CI/CT/CD as a managed service
• Azure Logic Apps (orchestrator)
• Azure Container Instances (compute)
• Azure Functions (functions as a service – FaaS – “server-less”)
• Happy with Logic Apps
• Make sure observability is designed in
AEM CLOUD MANAGER
#evolve19 21
• Dedicated, private, leased lines
• No Internet “weather” issues
• Predictable bandwidth
• Strictly between customer and Microsoft
• We do ExpressRoute gateways
• Private Peering, not the other one
EXPRESSROUTE
#evolve19 22
• Availability Zones
• Three per (some) Regions
• Availability Sets
• Fault Domain
• Update Domain
AVAILABILITY
#evolve19 23
• AEM on containers instead of VMs
• Autoscaling
• Azure Frontdoor
• Azure Stack
LOOKING AHEAD
#evolve19 24
TAKEAWAYS
ADVICE FOR YOU
#evolve19 25
• In many cases, choice of cloud provider is a business decision, not a technical one
• Differences are nuanced
#1 : CLOUD PROVIDER COMPARISON
UNNECESSARY
#evolve19 26
• Azure has 54 Regions globally
• Survey content creator locations, and deploy near them
• If no CDN, deploy near content consumers
• Tools for measuring network latency available
• Let them run for at least a minute
#2 : DEPLOY WHERE YOUR USERS ARE
http://www.azurespeed.com/
https://azurespeedtest.azurewebsites.net/
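Once latency samples have been collected for each candidate Region (for example by letting one of the tools above run for a minute or more), picking a Region can be sketched as below. The median is used here as an assumption, so that a few slow outliers do not dominate; the function name is hypothetical and the region labels in the usage note are placeholders.

```python
from statistics import median


def closest_region(samples_by_region):
    """Return the region whose latency samples have the lowest median.

    samples_by_region maps a region name to a list of latency samples (ms).
    Regions with no samples are ignored.
    """
    medians = {
        region: median(samples)
        for region, samples in samples_by_region.items()
        if samples
    }
    if not medians:
        raise ValueError("no latency samples provided")
    return min(medians, key=medians.get)
```

With placeholder data such as {"region-a": [40, 42, 300], "region-b": [55, 56, 57]}, the function returns "region-a": its single 300 ms outlier does not outweigh the otherwise lower latencies, which is one reason to sample for longer than a few seconds.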
#evolve19 27
• Azure “Canary” regions are great for testing failovers and failbacks
#3 : LEVERAGE CANARY REGIONS FOR TESTING
http://www.azurespeed.com/
https://azurespeedtest.azurewebsites.net/
#evolve19 28
• Azure was a latecomer to the AZ bandwagon, but is catching up
• 2 ms latency between AZs within a single Region
• IGNITE 2018 session on how AEM and Adobe Sign leverage AZs
• “Availability Sets” are NOT enough
#4 : LEVERAGE AVAILABILITY ZONES
https://www.youtube.com/watch?v=XoTDybIrazw (2018, timecode 54:02)
https://azure.microsoft.com/en-ca/resources/videos/azure-availability-zones-customer-testimonial/ (testimonial video, Mitch Nelson - Adobe)
https://www.youtube.com/watch?v=ilXx0cmmGz0 (2015, John Savill)
#evolve19 29
• 200 subscriptions per account
• Multiple accounts possible
#5 : PAY ATTENTION TO QUOTAS/LIMITS
https://docs.microsoft.com/en-us/azure/azure-subscription-service-limits
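With one dedicated subscription per customer, the 200-subscription limit translates directly into a number of accounts. A small sketch of that arithmetic, using the limit stated on the slide; the function name is illustrative.

```python
import math

# Limit quoted on the slide.
SUBSCRIPTIONS_PER_ACCOUNT = 200


def accounts_needed(customers: int,
                    limit: int = SUBSCRIPTIONS_PER_ACCOUNT) -> int:
    """Accounts required when each customer gets one dedicated subscription."""
    if customers < 0 or limit <= 0:
        raise ValueError("customers must be >= 0 and limit must be > 0")
    return math.ceil(customers / limit)


print(accounts_needed(130))  # 1
print(accounts_needed(201))  # 2
```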
#evolve19 30
• Application Gateway is a layer 7 load-balancer (~ALB)
• BIG performance difference between Application Gateway v1 and v2
• Please upgrade to v2
• Test auto-scaling agility (100 RPM to 20,000 RPM in 1 minute may not be possible)
• Define upper limit on auto-scale (125 nodes) to avoid surprise bills
#6 : APPLICATION GATEWAY V2
https://docs.microsoft.com/en-us/azure/azure-subscription-service-limits
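The effect of an upper auto-scale limit can be illustrated with a simple clamp. This is not how Application Gateway computes its instance count; it is a sketch under the assumption of a fixed, hypothetical per-instance capacity (rpm_per_instance), with the 125-node cap taken from the slide.

```python
import math

# Upper limit quoted on the slide.
MAX_INSTANCES = 125


def target_instances(requests_per_minute: float,
                     rpm_per_instance: float,
                     min_instances: int = 1,
                     max_instances: int = MAX_INSTANCES) -> int:
    """Instances needed for a load, clamped to [min_instances, max_instances].

    rpm_per_instance is a hypothetical capacity figure; measure it with
    your own load tests.
    """
    if rpm_per_instance <= 0:
        raise ValueError("rpm_per_instance must be positive")
    needed = math.ceil(requests_per_minute / rpm_per_instance)
    return max(min_instances, min(needed, max_instances))
```

If the load would need more instances than the cap allows, the result stays at 125, which bounds the bill but also means the excess traffic has to be absorbed some other way (for example by a CDN).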
#evolve19 31
• Azure has a very capable log analytics service
• Application Gateway (load-balancer) logs
• CDN logs
#7 : LOG ANALYTICS
https://docs.microsoft.com/en-us/azure/azure-subscription-service-limits
#evolve19 32
• ACI for short-lived workloads (billed by the minute)
• AKS for long-running workloads (billed by the hour)
• Cloud Manager uses [Azure Container Instances]
• Disable connection pooling of Maven’s [Wagon provider for HTTP access] to avoid NAT timeout (4 minutes)
• Performance penalty – since Maven would now reconnect to the repository servers for each request
#8 : ACI VS KUBERNETES
https://maven.apache.org/wagon/wagon-providers/wagon-http/
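One way to disable pooling is the maven.wagon.http.pool system property documented for the Wagon HTTP provider. The sketch below builds a Maven command line with that property set; it assumes a Maven version that uses the Wagon HTTP provider for repository access, and the helper name and goals are hypothetical.

```python
from typing import List, Sequence


def maven_command(goals: Sequence[str],
                  disable_http_pool: bool = True) -> List[str]:
    """Build an mvn argument list, optionally disabling Wagon HTTP pooling."""
    cmd = ["mvn"]
    if disable_http_pool:
        # Without pooling, idle pooled connections are not reused after
        # a NAT timeout has silently dropped them.
        cmd.append("-Dmaven.wagon.http.pool=false")
    cmd.extend(goals)
    return cmd


print(" ".join(maven_command(["clean", "install"])))
# mvn -Dmaven.wagon.http.pool=false clean install
```

The returned list can be passed to subprocess.run inside the build container.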
#evolve19 33
• AEM on Azure works fine
• Deploy to a Region near where your users are
#10 : FINAL TAKEAWAY
https://azurespeedtest.azurewebsites.net/
http://www.azurespeed.com/
#evolve19
THANK YOU!

Evolve 19 | Jayan Kandathil | Running AEM Workloads on Microsoft Azure
