Infra Yarou Azure Team Night (インフラ野郎Azureチーム Night)
Published on 2016/12/26, at NHN Techorus (NHNテコラス)


  1. #infrayarou
  2. { “Name” : “Toru Makabe (真壁 徹)”, “Affiliation” : “Microsoft Japan Co., Ltd.”, “Role” : “Cloud Solution Architect”, “Career” : “Daiwa Institute of Research, HP Enterprise”, “Specialty” : “Cloud & open source” }
  3. https://docs.microsoft.com/ja-jp/azure/
  4. [Microsoft Global Datacenters and Network Infrastructure] https://www.youtube.com/watch?v=bqZrejosqWU
  5. 32 Regions Worldwide, 24 Generally Available…
     Regions on the map (region: location): Central US: Iowa; West US: California; East US 2: Virginia; US Gov: Virginia; North Central US: Illinois; US Gov: Iowa; South Central US: Texas; Brazil South: Sao Paulo State; West Europe: Netherlands; China North*: Beijing; China South*: Shanghai; Japan East: Tokyo, Saitama; Japan West: Osaka; India South: Chennai; East Asia: Hong Kong; SE Asia: Singapore; Australia South East: Victoria; Australia East: New South Wales; India Central: Pune; Canada East: Quebec City; Canada Central: Toronto; India West: Mumbai; Germany North East: Magdeburg; Germany Central: Frankfurt; United Kingdom Regions (2); North Europe: Ireland; US DoD West: TBA; US DoD East: TBA; East US: Virginia; Korea Regions (2)
     Legend: *Operated by 21Vianet; Announced/not operational; Operational
     32 regions announced / 24 operational: the placement as of that time (*)
     (*) Currently 38 announced / 30 operational
  6. $ azure location list --details
     info: Executing command location list
     + Getting ARM registered providers
     info: Getting locations...
     data:
     data: Location : eastasia
     data: DisplayName : East Asia
     data:
     data: […]
  7. https://blogs.technet.microsoft.com/hybridcloud/2016/05/26/microsoft-and-facebook-to-build-subsea-cable-across-atlantic/
     https://azure.microsoft.com/ja-jp/blog/microsoft-invests-in-subsea-cables-to-connect-datacenters-globally/
  8. Datacenter generations and Power Usage Effectiveness (PUE):
     • Generation 1: Colocation, 2.0+ PUE. Discrete servers; Capacity; 20 year technology
     • Generation 2: Density, 1.4 – 1.6 PUE. Rack; Density & deployment; Minimized resource impact
     • Generation 3: Containment, 1.2 – 1.5 PUE. Containers, PODs; Scalability & sustainability; Air & water Economization; Differentiated SLAs
     • Generation 4: Modular, 1.12 – 1.20 PUE. Deployment Areas & ITPACs; No more traditional IT; Right-sized; Faster time-to-market; Outside air cooled
     • Generation 5: Hyper-scale, 1.07 – 1.19 PUE. Fully integrated; Resilient software; Common infrastructure; Operational simplicity; Flexible & scalable
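PUE is the ratio of total facility power to the power delivered to IT equipment, so the ranges on this slide translate directly into overhead per unit of IT load. The following is a minimal sketch of that arithmetic. It uses the best-case PUE of each range, and the 10 MW IT load is a made-up illustration value, not a figure from the talk.

```python
def facility_power_mw(it_load_mw: float, pue: float) -> float:
    """Total facility power implied by a PUE figure (PUE = total / IT)."""
    return it_load_mw * pue


def overhead_mw(it_load_mw: float, pue: float) -> float:
    """Power spent on cooling, distribution, etc. rather than on IT gear."""
    return facility_power_mw(it_load_mw, pue) - it_load_mw


# Best-case PUE per generation, taken from the ranges on the slide.
BEST_PUE = {"Gen 1": 2.0, "Gen 2": 1.4, "Gen 3": 1.2, "Gen 4": 1.12, "Gen 5": 1.07}

IT_LOAD_MW = 10.0  # hypothetical IT load, for illustration only

for gen, pue in BEST_PUE.items():
    print(f"{gen}: PUE {pue:.2f} -> {overhead_mw(IT_LOAD_MW, pue):.1f} MW overhead")
```

Under these assumptions, the same IT load needs roughly as much overhead power as IT power at PUE 2.0, and well under a tenth of it at PUE 1.07.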
  9. Inlet Temperature and Impact on Hard Disk Failure Rates
     Columns: Inlet Temp | HDDs in front, ΔT 1˚C (HDD Case Temp, Relative AFR) | Buried HDDs design, ΔT 20˚C cold de-rated to ΔT 10˚C hot (HDD Case Temp, Relative AFR)
     10 C / 50 F  | 11 C, 100% | 30 C, 100%
     15 C / 59 F  | 16 C, 100% | 34 C, 100%
     20 C / 68 F  | 21 C, 100% | 38 C, 100%
     25 C / 77 F  | 26 C, 100% | 41 C, 106%
     30 C / 86 F  | 31 C, 100% | 45 C, 131%
     35 C / 95 F  | 36 C, 100% | 49 C, 153%
     40 C / 104 F | 41 C, 106% | 53 C, 189%
     45 C / 113 F | 46 C, 138% | 56 C, 231%
     50 C / 122 F | 51 C, 179% | 60 C, 281%
     Sources: S. Sankar, K. Vaid, M. Shaw, “Impact of Temperature on Hard Disk Drive Reliability in Large Datacenters”, Microsoft, IEEE, 2011; “Azure Network and Datacenter Infrastructure: Enterprise Quality at Cloud Scale”, Microsoft Ignite 2015
  10. http://natick.research.microsoft.com/
  11. • Microsoft as a company achieved carbon neutrality in 2014 ( https://blogs.microsoft.com/green/category/renewable-energy/ )
      https://news.microsoft.com/2016/11/14/microsoft-announces-largest-wind-energy-purchase-to-date
  12. Hierarchy diagram: a Geo contains Regions; each Region contains DCs/Zones
  13. General-purpose and flexible / Efficient and high-performance
  14. ( https://docs.microsoft.com/ja-jp/azure/virtual-machines/virtual-machines-linux-sizes )
  15. October 15, 2016
  16. October 15, 2016
  17. • Hyper-V VMSwitch extension
      • Core function for implementing SDN in Azure
        • Address Virtualization for VNET
        • VIP -> DIP Translation for SLB
        • ACLs, Metering, and Security Guards
      • Per-packet actions are defined through programmable rule/flow tables
      • Available in Windows Server 2016
      Diagram labels: NIC; vNIC; VM Switch; VFP; VM; VFP layers: ACLs, Metering, Security; VNET; SLB (NAT)
      Source: “Microsoft's Production Configurable Cloud”, Mark Russinovich, Chief Technology Officer, Microsoft Azure, SCS Distinguished Lecture, 11/15/2016
  18. Host: 10.4.1.5
      • VMSwitch exposes a Match-Action-Table style API to the controller
      • The controller defines the policies
      • There is one table per policy
      • The tables define strictly how each packet must be processed
      Diagram labels: Controller (Tenant Description, VNet Description, VNet Routing Policy, ACLs, NAT Endpoints); VFP; VM1 10.1.1.2; NIC
      Flow/Action tables:
      • VNET: TO: 10.2/16 -> Encap to GW; TO: 10.1.1.5 -> Encap to 10.5.1.7; TO: !10/8 -> NAT out of VNET
      • LB NAT: TO: 79.3.1.2 -> DNAT to 10.1.1.2; TO: !10/8 -> SNAT to 79.3.1.2
      • ACLS: TO: 10.1.1/24 -> Allow; 10.4/16 -> Block; TO: !10/8 -> Allow
      Source: “Microsoft's Production Configurable Cloud”, Mark Russinovich, Chief Technology Officer, Microsoft Azure, SCS Distinguished Lecture, 11/15/2016
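The per-policy match-action tables on slide 18 can be sketched in a few lines. This is only an illustration of the idea, not VFP's actual API: the `Rule` class, the first-match-wins ordering, the default drop, the ACL-before-VNET order, and reading every rule (including `10.4/16 Block`) as a destination match are all assumptions of this sketch. Only the prefixes and action strings come from the slide.

```python
import ipaddress
from dataclasses import dataclass
from typing import Callable, List, Optional

TEN_SLASH_8 = ipaddress.ip_network("10.0.0.0/8")


@dataclass
class Rule:
    matches: Callable[[ipaddress.IPv4Address], bool]
    action: str


def to_net(cidr: str) -> Callable[[ipaddress.IPv4Address], bool]:
    """Build a matcher for 'destination is inside cidr'."""
    net = ipaddress.ip_network(cidr)
    return lambda dst: dst in net


def not_in_10_8(dst: ipaddress.IPv4Address) -> bool:
    """Matcher for the slide's 'TO: !10/8' rules."""
    return dst not in TEN_SLASH_8


# One table per policy; first matching rule wins (an assumption of this sketch).
VNET_TABLE = [
    Rule(to_net("10.1.1.5/32"), "Encap to 10.5.1.7"),
    Rule(to_net("10.2.0.0/16"), "Encap to GW"),
    Rule(not_in_10_8, "NAT out of VNET"),
]
ACL_TABLE = [
    Rule(to_net("10.1.1.0/24"), "Allow"),
    Rule(to_net("10.4.0.0/16"), "Block"),
    Rule(not_in_10_8, "Allow"),
]


def lookup(table: List[Rule], dst: str) -> Optional[str]:
    """Return the action of the first rule matching dst, or None."""
    addr = ipaddress.ip_address(dst)
    for rule in table:
        if rule.matches(addr):
            return rule.action
    return None


def process(dst: str) -> List[str]:
    """Run an outbound packet through the ACL table, then the VNET table."""
    acl = lookup(ACL_TABLE, dst)
    if acl != "Allow":
        return [acl or "Drop (no ACL match)"]
    vnet = lookup(VNET_TABLE, dst)
    return ["Allow"] + ([vnet] if vnet else [])


for dst in ("10.1.1.5", "10.4.2.2", "8.8.8.8"):
    print(dst, "->", process(dst))
```

Keeping one small table per policy means the controller can update, say, the ACLs without touching the VNET routing rules.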
  19. Diagram labels:
      • Controllers; Northbound API; VFP; VMSwitch; VM
      • VFP layers, each a Rule/Action table: SLB Decap (Decap*), SLB NAT (DNAT*), VNET (Rewrite*), ACL (Allow*), Metering (Meter*); Rewrite; Encap
      • GFT Transposition Engine: Flow 1.2.3.1->1.3.4.1, 62362->80; Action Decap, DNAT, Rewrite, Meter
      • Southbound API; GFT Offload API (NDIS)
      • SmartNIC (50G): GFT Offload Engine; GFT Table (Flow 1.2.3.1->1.3.4.1, 62362->80; Action Decap, DNAT, Rewrite, Meter); First Packet; QoS; Crypto; RDMA
      Source: “Microsoft's Production Configurable Cloud”, Mark Russinovich, Chief Technology Officer, Microsoft Azure, SCS Distinguished Lecture, 11/15/2016
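One way to read slide 19: the first packet of a flow is evaluated against every rule layer, the resulting actions are combined into a single entry keyed by the flow, and later packets of that flow hit the entry directly. The sketch below illustrates that caching pattern only. `FlowCache` and `layered_lookup` are hypothetical names, and the stand-in slow path simply returns the action list shown on the slide.

```python
from typing import Callable, Dict, List, Tuple

FlowKey = Tuple[str, str, int, int]  # (src_ip, dst_ip, src_port, dst_port)


class FlowCache:
    """Exact-match flow table in front of a slower layered rule lookup."""

    def __init__(self, slow_path: Callable[[FlowKey], List[str]]) -> None:
        self.slow_path = slow_path
        self.table: Dict[FlowKey, List[str]] = {}
        self.slow_path_hits = 0

    def actions_for(self, key: FlowKey) -> List[str]:
        cached = self.table.get(key)
        if cached is not None:
            return cached          # later packets: one exact-match lookup
        self.slow_path_hits += 1   # first packet: evaluate every layer
        actions = self.slow_path(key)
        self.table[key] = actions
        return actions


def layered_lookup(key: FlowKey) -> List[str]:
    # Stand-in for the per-layer rule tables; returns the slide's example result.
    return ["Decap", "DNAT", "Rewrite", "Meter"]


cache = FlowCache(layered_lookup)
flow = ("1.2.3.1", "1.3.4.1", 62362, 80)
first = cache.actions_for(flow)   # goes through the slow path
second = cache.actions_for(flow)  # served from the flow table
print(first, cache.slow_path_hits)
```

An exact-match table like this is simple enough to place in hardware, which is presumably why the slide pairs it with a SmartNIC.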
  20. • Available on the IaaS virtual machine sizes D15v2 and DS15v2
      • Currently in private preview
      Source: “Microsoft's Production Configurable Cloud”, Mark Russinovich, Chief Technology Officer, Microsoft Azure, SCS Distinguished Lecture, 11/15/2016
  21. Diagram labels: under each ToR, four servers, each with its own NIC and FPGA; CS0 CS1 CS2 CS3; SP0 SP1 SP2 SP3; L0; L1/L2
      Source: “Microsoft's Production Configurable Cloud”, Mark Russinovich, Chief Technology Officer, Microsoft Azure, SCS Distinguished Lecture, 11/15/2016
      October 15, 2016
  22. Diagram labels:
      • Elastic Router (multi-VC on-chip router): Credits, Virtual Channel, Data, Header
      • Send side: Send Connection Table; Transmit State Machine; Send Frame Queue; Connection Lookup; Packetizer and Transmit Buffer; Unack'd Frame Store; Ethernet Encap
      • Receive side: Ethernet Decap; Receive Connection Table; Depacketizer; Credit Management; Ack Receiver; Ack Generation; Receive State Machine
      • 40G MAC+PHY; Datacenter Network
      Solid links show data flow, dotted links show ACK flow
      Source: “Microsoft's Production Configurable Cloud”, Mark Russinovich, Chief Technology Officer, Microsoft Azure, SCS Distinguished Lecture, 11/15/2016
  23. Source: “Microsoft's Production Configurable Cloud”, Mark Russinovich, Chief Technology Officer, Microsoft Azure, SCS Distinguished Lecture, 11/15/2016
  24. https://www.sdxcentral.com/articles/news/microsoft-azure-will-use-intel-silicon-photonics/2016/08/
      Microsoft expects to deploy silicon photonics in Azure data centers soon, “initially going for switch-to-switch connectivity,” said Kushagra Vaid, Azure’s general manager of hardware engineering, speaking at the Intel Developer Forum.
  25. “The problem I have right now? It is supply chain. I am not so worried about technology. We have our Open Cloud Server, which I think is very compelling in that it offers some real economic capabilities. But I have got to nurture my supply chain because traditionally we bought from OEMs and now we are designing with ODMs so we can take advantage of prices and lower our overall costs. So I am moving very, very quickly to build out new capacity, and I want to do it in a very efficient and effective way and it is really about the commoditization of the infrastructure.”
      Rick Bakken, Sr. Director, Data Center Evangelism, Microsoft ( https://www.nextplatform.com/2016/09/26/rare-tour-microsofts-hyperscale-datacenters/ )
  26. https://azure.microsoft.com/en-us/blog/microsoft-reimagines-open-source-cloud-hardware/
  27. Azure Storage https://infrayarou.blob.core.windows.net/vhds/myubuntu.vhd
  28. Diagram labels: FE 2; Partition 3 (F-J), in the Partition Layer; Stream 2, in the Stream Layer
  29. Path: FE 2; Partition 3 (F-J); Stream 2
      Request 1: Partition F; Row 102
      Request 1: a simple example
  30. Path: FE 1; Partition 3 (F-J); Stream 4
      Request 1: Partition F; Row 102
      Request 2: Partition F; Row 507
      Request 2: a different Front End, the same Partition Server, a different Stream Server
  31. Path: FE 4; Partition 4 (K-T); Stream 2
      Request 1: Partition F; Row 102
      Request 2: Partition F; Row 507
      Request 3: Partition T; Row 356
      Request 3: a different Front End, a different Partition Server, the same Stream Server
  32. Path: FE 4; Partition 5 (U-Z); Stream 3 and Stream 4
      Request 1: Partition F; Row 102
      Request 2: Partition F; Row 507
      Request 3: Partition T; Row 356
      Request 4: Partition W; Rows 213 & 672
      Request 4: a transaction example. A single Partition Server atomically updates data that lives on multiple Stream Servers
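The routing in slides 29 to 32 depends on mapping a partition key to the Partition Server that owns its key range, independent of which Front End received the request. Below is a minimal sketch of such a range lookup using binary search. The ranges F-J, K-T and U-Z come from the slides; the table layout, the function name, the use of the key's first letter, and the handling of keys outside those ranges are assumptions of this sketch.

```python
import bisect
from typing import List, Optional, Tuple

# (first key of range, partition server), sorted by first key.
PARTITION_MAP: List[Tuple[str, str]] = [
    ("F", "Partition 3 (F-J)"),
    ("K", "Partition 4 (K-T)"),
    ("U", "Partition 5 (U-Z)"),
]
RANGE_STARTS = [start for start, _ in PARTITION_MAP]


def partition_for(partition_key: str) -> Optional[str]:
    """Return the partition server whose key range covers partition_key."""
    first_letter = partition_key[:1].upper()
    idx = bisect.bisect_right(RANGE_STARTS, first_letter) - 1
    if idx < 0 or first_letter > "Z":
        return None  # outside the ranges shown on the slides
    return PARTITION_MAP[idx][1]


# Requests 1-4 from the slides, identified by partition key only.
for key in ("F", "F", "T", "W"):
    print(key, "->", partition_for(key))
```

Because the lookup depends only on the key, Requests 1 and 2 land on the same Partition Server even though they enter through different Front Ends.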
  33. https://docs.microsoft.com/ja-jp/azure/storage/storage-scalability-targets
  34. Diagram labels: Disk (Page Blob); C:\, /dev/sda; copy; Image; Cache
  35. Diagram labels: L3; L2; L3
      • East/West traffic takes a long path
      • Routers become large and expensive
      • LB/FW becomes the bottleneck
  36. Network tiers (diagram labels):
      • Regional Spine: T3-1, T3-2, T3-3, T3-4
      • Data Center Spine: T2-1-1, T2-1-2 … T2-1-8; T2-4-1, T2-4-2 … T2-4-4
      • Row Spine: T1-1, T1-2 … T1-7, T1-8 (repeated per group)
      • Rack: T0-1, T0-2 … T0-20, with Servers below (repeated per group)
      Source: “Microsoft's Production Configurable Cloud”, Mark Russinovich, Chief Technology Officer, Microsoft Azure, SCS Distinguished Lecture, 11/15/2016
  37. https://azure.github.io/SONiC/
  38. Albert Greenberg, Distinguished Engineer, Director of Networking, Microsoft, SIGCOMM 2015
  39. P802.3by)
  40. • Today’s Server to Tier 0
        • Interconnect is based on 25G technology
        • Links are 50G Ethernet - 2x25G based on 25G Ethernet Consortium spec
        • Bandwidth growth drove us to use 50G
        • Don’t require an 802.3 specification here
      • Tomorrow’s Server to Tier 0
        • Interconnect will be based upon 50G PAM4 technology
        • Expect links will be 100G Ethernet (2x50G)
      • Choice for 802.3:
        • Create the specification
        • Let a consortium do it
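  41. What Azure wants to achieve vs. what an implementation based on LB products offers
      • Scale
        • Azure: 100 Gbps per VIP; on failure, thousands of VIPs should be reconfigured quickly
        • LB products: 20 Gbps for $80,000; 20 Gbps per VIP; reconfiguration takes 1 second per VIP
      • Availability
        • Azure: N+1 redundancy and quick failover
        • LB products: 1+1 redundancy or slow failover
      • Placement flexibility
        • Azure: servers and LB/NAT should be placed flexibly across L2 boundaries
        • LB products: NAT and DSR (Direct Server Return) are supported only within the same L2
      • Tenant isolation
        • Azure: overload caused by one user tenant should not affect other tenants
        • LB products: excessive SNAT requests from one user tenant affect other tenants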
  42. Diagram labels: Controllers; Ananta Manager; VIP Configuration: VIP, ports, # DIPs; Multiplexer, Multiplexer … Multiplexer; hosts, each with VM Switch, Host Agent, VM1 … VMN
      Source: “Ananta: Cloud Scale Load Balancing”, Microsoft, SIGCOMM 2013
  43. • 1st Tier: Provides packet-level (layer-3) load spreading, implemented in routers via ECMP.
      • 2nd Tier: Provides connection-level (layer-4) load spreading, implemented in servers.
      • 3rd Tier: Provides stateful NAT implemented in the virtual switch in every server.
      Diagram labels: Multiplexer, Multiplexer … Multiplexer; hosts, each with VM Switch, Host Agent, VM1 … VMN
      Source: “Ananta: Cloud Scale Load Balancing”, Microsoft, SIGCOMM 2013
  44. Diagram labels: Client; Routers; MUXes; Host; Host Agent; VM; DIP; steps 1 to 8
      Packet Headers (inbound at the router): Dest: VIP, Src: Client
      Packet Headers (after the Mux, encapsulated): Dest: DIP, Src: Mux, around Dest: VIP, Src: Client
      Packet Headers (return): Dest: Client, Src: VIP
      Source: “Ananta: Cloud Scale Load Balancing”, Microsoft, SIGCOMM 2013
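The three tiers on slides 43 and 44 can be sketched as three small functions. This is a simplified illustration, not Ananta's implementation: plain modulo hashing stands in for router ECMP and for the Mux's DIP selection, the function names are hypothetical, and the addresses are documentation-range placeholders.

```python
import hashlib
from typing import Dict, List, Tuple

# (src_ip, src_port, dst_ip, dst_port, protocol)
FiveTuple = Tuple[str, int, str, int, str]

MUXES = ["mux-1", "mux-2", "mux-3"]                      # placeholder names
VIP_MAP: Dict[str, List[str]] = {
    "203.0.113.10": ["10.0.0.4", "10.0.0.5", "10.0.0.6"],  # VIP -> DIPs
}


def stable_hash(flow: FiveTuple) -> int:
    """Deterministic hash so every packet of a flow maps the same way."""
    digest = hashlib.sha256(repr(flow).encode()).digest()
    return int.from_bytes(digest[:8], "big")


def ecmp_pick_mux(flow: FiveTuple, muxes: List[str]) -> str:
    """Tier 1: the router spreads packets over Muxes by hashing the flow."""
    return muxes[stable_hash(flow) % len(muxes)]


def mux_pick_dip(flow: FiveTuple, vip_map: Dict[str, List[str]]) -> str:
    """Tier 2: the Mux picks a DIP for the VIP and encapsulates toward it."""
    dips = vip_map[flow[2]]
    return dips[stable_hash(flow) % len(dips)]


def host_agent_reply(flow: FiveTuple) -> Tuple[str, str]:
    """Tier 3: the host rewrites the reply so it leaves as VIP -> client."""
    src_ip, _, vip, _, _ = flow
    return (vip, src_ip)  # (source, destination) of the return packet


flow = ("198.51.100.7", 40000, "203.0.113.10", 80, "tcp")
print(ecmp_pick_mux(flow, MUXES), mux_pick_dip(flow, VIP_MAP), host_agent_reply(flow))
```

Since the DIP choice here depends only on the flow, any Mux the router picks reaches the same DIP, and the reply carries the VIP as its source without passing back through a Mux, which matches the return-path headers on slide 44.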
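  45. Diagram labels: Server; mapping VIP:1025 -> DIP2
      Packet Headers (toward the server): Dest: Server:80, Src: VIP:1025
      Packet Headers (from the VM): Dest: Server:80, Src: DIP2:5555
      Source: “Ananta: Cloud Scale Load Balancing”, Microsoft, SIGCOMM 2013
  46. Source: “Ananta: Cloud Scale Load Balancing”, Microsoft, SIGCOMM 2013
  47. • When capacity runs short, simply add servers; do not re-engineer every time
      • At this scale and pace of change, manual expansion and configuration cannot keep up
      • In a world beyond 50 Gbps, CPUs alone cannot cope
      • Many kinds of chips are used, but the FPGA is the key
  48. LinkedIn's engineering team is cutting-edge too, and actively publishes what it does ( https://engineering.linkedin.com/blog )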
