Hello and welcome. Thanks to Patrick and Fujitsu for providing this great opportunity.
How many of you know who ARM is and what we do? OK, for those that don't know, I'm here to tell you that I can almost guarantee that you are a user of ARM technology. Your mobile phone...
So why are we here at a seminar on enterprise storage? Because the efficiency that you expect in your mobile device is now being delivered to enterprise applications...in this case storage.
Xu Luo
Server Segment Manager
Akira Shimizu
Segment Marketing Manager
As the scale of compute continues to grow we find ourselves at a time of major disruption. Not a bad disruption but an opportunity to leverage new levels of access to deliver new services across more connected devices with significantly less latency.
To take advantage of this opportunity means delivering new levels of scalability and portability in the network enabling compatible services to be deployed whether from the data center or at the edge. And the scale will require significantly improved levels of efficiency and compute density.
This broad range of solutions addressing such a diverse set of requirements can only be delivered by the breadth of ARM and the ARM ecosystem
Today I want to make you aware of the opportunity for increased efficiency and scale from leveraging the ARM ecosystem for Ceph, first by showing you why it will be crucial to tomorrow’s data center, and then by sharing some examples of the scale and efficiency being delivered by the ARM ecosystem…
But for those of you who may not know ARM, let me introduce you to ARM and our Ecosystem
First thing to understand is that “ARM” is thought of in many different ways depending on the topic.
ARM as a company has been around for just over 25 years now
We license technology, processor designs and other IP to our customers, enabling them to add their own IP and create the appropriate Systems on Chip, or SoCs, for their markets and customers
As a company we have enabled so many of our customers to create SOCs that in 2015, about 15 Billion chips were sold into the market with ARM technology.
Now these go into different markets, from a few billion chips powering smartphones, to Bluetooth chips, to chips in disk drives.
ARM also has a very technical meaning.
It is a computer architecture… The ARM Architecture
As a RISC the ARM Architecture was defined with efficiency in mind from the beginning...
The architecture has undergone a few revisions over the years and today you will mostly hear about ARM v8 as it is now powering over 50% of the smartphones.
And with the 64 bit support it brings, it is what is being used for Servers and other enterprise applications.
And we continue to improve the architecture with new extensions, so there is v8.1 and v8.2. v8.2 added RAS support to better serve enterprise applications.
Now just this week, we announced SVE, the Scalable Vector Extension, to better address HPC, including Fujitsu's Post-K machine...
And finally and most importantly, ARM is an Ecosystem. For ARM works with our partners to jointly develop and enable new markets, products, and opportunities…
At the same time, many members of the ecosystem compete with each other
For instance, a few years ago we were able to get the likes of HP and Dell, Microsoft and Red Hat, and Cavium and AMD to work together to create a base system specification for ARM servers
This collaboration and competition allows companies to work together where there is little differentiation, and to focus their resources and investments on the key technologies which differentiate them from their competitors
In this manner there is the opportunity for success for everyone...
Together with our Connected Community®, we are breaking down barriers to innovation for developers, designers and engineers, and enabling competition and choice across technology markets. We share success with our partners….By leveraging our technology and working together, our partners have the opportunity to succeed far beyond what any of them would alone.
Andrew Carnegie couldn't have said it better…
Fundamentally ARM and our partners are driving an almost continuous pace of innovation. Innovation in business model, technology, and product development has been a part of ARM and our partners since the beginning
And as we said, our business model is built on partnership. Our success is dependent upon our customers’ success
And this is all built upon a foundation of energy efficiency..
Disrupting existing markets
And creating new opportunities through innovation
Now… I am sure that many of you have seen this or other similar data. IP traffic continuing to grow from 1 ZB this year to 2.3 ZB in 2020, and storage ballooning from 7 zettabytes today to over 44 zettabytes by 2020
But this is only part of it... it is not just the amount of data being moved, it is how and where and the type of data
Consider the impact of the internet of things. Analysts have predicted 10s to 100s of billions of internet connected things over the next few years.
And looking at the network specs for 5G and predictions about the impact of its deployment...
...there will be a 30 fold increase in access nodes
<click>
...and the amount of bandwidth required to deal with all of the data we mentioned. In the UK, virtual reality and the move to HD and 4K content will drive up bandwidth requirements by 22x, with more than 75% of the traffic being video
<click>
...To support IoT and the access and control of real-time sensors, the 5G specs call for a maximum of 1 ms end-to-end latency. This is a major driver for more compute at the edge of the network.
<click>
...And finally it is the massive volume of devices, not just the growing number of consumer devices we all have but also those IoT nodes throughout our infrastructure that is driving the requirement for massive increases in the density of connections. Consider autonomous driving and smart cities. Especially with the growing densities of cities just like Tokyo...
-------------------------- OLD NOTES - USE AS YOU SEE FIT
The point is that it’s not just a bigger hammer – it’s a question of scale and specialisation
Need to be able to express the requirements in a simple definition – from streaming high-speed video to support for 10^6 low-BW connections per sq km for sensors…
The need here is to quantify some of the claims made.
The latest hype topic whether this be driven from IoT, Connected things, connected home, health, augmented reality etc all will place a load on the network.
Some data points:
Three axis of pressure:
MTC MC (Machine Type Comms Mission Critical)
MTC NMC (Machine Type Comms Non-Mission Critical)
Mobile Broadband (subscriber driven)
Use cases for each and explain diverse set of challenges.
From EE UK market studies (Mobile Broadband):
Video will be the driver of long term growth. There will be a 22x increase in data over the UK network between 2015 and 2030. The network in the UK will be required to carry 2200PBytes per month. 76% of the traffic will be video related. 4K video will be the majority of traffic in this timeframe with a data rate per stream of over 18Mbps. This places a demand on the network and forces carriers to consider caching this content on the edge of the network to meet latency demands.
Augmented reality and gaming drive about 2/3 of the remaining traffic over the cellular network and amount to approximately 600PetaBytes per month.
MTC NMC – drives bandwidth, throughput and connectivity everywhere. Two foils below is an old slide, but it represents the amount of data that needs to be handled. This is not latency sensitive and can be handled in the core cloud (bring in NFV/virtualisation), but we require connectivity and bandwidth/throughput.
Another to highlight is the development of Narrow-Band IoT (NB-IoT) in 3GPP that is expected to support massive machine connectivity in wide area applications. NB-IoT will most likely be deployed in bands below 2GHz and will provide high capacity and deep coverage for an enormous number of connected devices.
MTC MC – Need equivalent data on Machine critical requirements that would drive reduction in latency and require data to be processed and cached at the edge of the network in the edge cloud. To support such latency-critical applications, 5G should allow for an application end-to-end latency of 1ms or less. Many services will distribute computational capacity and storage close to the air interface. This will create new capabilities for real-time communication and will allow ultra-high service reliability in a variety of scenarios, ranging from entertainment, automotive, health to industrial process control.
In addition to very low latency, 5G should also enable connectivity with ultra-high reliability and ultra-high availability. For critical services, such as control of critical infrastructure and traffic safety, connectivity with certain characteristics, such as a specific maximum latency, should not merely be ‘typically available.’ Loss of connectivity and deviation from quality of service requirements must be extremely rare. For example, some industrial applications might need to guarantee successful packet delivery within 1 ms with a probability higher than 99.9999 percent.
This then allows us to bridge to the so-whats of what we are doing…
So the devices that normally communicate back to the servers grow and grow.
<click>
With more consumer devices, more mobile phones, and of course the 10s of Billions of IoT devices predicted by so many
<click>
We will see the increasing demands placed on the required compute; the compute density, and the need for increased data throughput both on chip and off-chip to memory. There will also be a significant need for acceleration in the network.
In addition the future of cloud and network infrastructure is not just about more servers and disks...The future will be driven by the types of SERVICES that need to be delivered…. And the specific needs of that service.
It means that if you are a cloud or network provider you need datacenters and networks that are highly reusable, highly reconfigurable and highly flexible.
Where the topology, the compute, the storage can be adjusted to match the service being delivered.
You will need an Intelligent Flexible Cloud…
Until now, there’s been a clear distinction between what runs in the cloud… and what runs in the network.
THAT WILL CHANGE.
Because of the amount of traffic, the number of nodes, 5G requirements, and most importantly the new services they enable, there is a fundamental need for an intelligent flexible cloud.
No longer will everything be able to be run from a central server out to the edge. And I believe there will be new players and business models emerging to support this….
...the future is more about pushing server functionality throughout the network, with some being pushed further to the edge and some staying within the core datacenter. This diagram is actually wrong, because while some services will move to the edge, others will still stay in the central data center. It will be throughout...
This means that networking and cloud capability is delivered closer to where it's actually needed…Another way to think about this is that we will have data centers throughout the network…throughout this Intelligent Flexible Cloud
This means that you will need storage solutions that can scale to be deployed as needed. Already we see some CDNs deploying content at the edge to minimize the bandwidth impacts of streaming HD content today...
For this to happen computing, storage, and networking will have to be...
Scalable – we need to make sure we are delivering HW that can scale from the data center to the smallest edge and still run the same SW, services, manageability…
Portable – it needs to be portable across a diverse range of workload optimized hardware. Essentially running standard server software on right-sized or workload optimized hardware without any impact or even knowledge by the developer of those apps and services. The software can take advantage of the underlying acceleration as needed without impact to the delivery or rollout of the service. Really leverage truly open hardware and software interface standards.
And Optimized – Leveraging workload acceleration to meet the performance demands without breaking the power or size limitations but also deliver a significant increase in compute density for general workloads to be run within those constraints.
Let me show you...
This is an example of some containerized network Services running in the data center. It is running NAT and IPsec and some other services. Leveraging docker or some other containers as well as the OS and orchestration.
<Click>
Now to be truly scalable, it should also run on hardware you might find at the edge. In this case it is a small cluster of 5 ODROID boards. The server in this case is 182 times larger than the ODROID cluster
The point being you can deliver compatible services from the core to Edge to Access. Run the exact same software on the small cluster as you run in the data center to deliver the appropriate level of services
Now we also need to make sure the solutions are both optimized and portable.
Optimized, potentially using hardware acceleration. Maybe network acceleration or some other workload accelerator built into the chip. In one particular case, acceleration was used for deep packet inspection, and they were able to demonstrate significant gains in performance, power efficiency, and density by taking advantage of HW acceleration.
Now while optimization is important, portability is equally or maybe even more important. The application has to be able to just run. If the SW has to be rewritten every time based on the hardware acceleration that is needed, then it will be completely impractical. The SW MUST be portable to different hardware with different accelerators.
This can be achieved by using open and standard SW APIs and HW interfaces.
On the SW side we have ODP
Enabling application developers to leverage the full capabilities of SoCs
Agnostic to the DataPlane API – can make use of DPDK on systems without DataPlane-specific accelerators
As a HW interface for different accelerators, ARM is supporting CCIX
Enabling SoC manufacturers to integrate application-specific accelerators
Enabling an accelerator IP marketplace
Reducing cost and time to market for network-accelerated SoCs
-------------------------BACKGROUND
Based on Axxia data from the VNF Acceleration PoC at ETSI; this is running Open vSwitch and applications like Deep Packet Inspection.
Accelerated solution vs. plain x86 server with software
Kontron Data points
This level of scalability is already being delivered by many of our partners in networking, servers, and storage. With different numbers of cores and different accelerators, they are already demonstrating huge gains in efficiency…
So let's look a little closer at storage and see how this applies…
As the scale of computing continues to grow with almost no limits, we are also seeing the need to deliver an increased amount of that compute at the edge...
Historically ARM processors have been in storage but really for the function of controlling the disks themselves or managing solid state storage…a typical HDD will have multiple ARM based chips...
But that is NOT what we are talking about today…Today is about enterprise storage at petabyte scale...
Here is an example where we are seeing companies innovate storage solutions by taking advantage of the benefits of ARM-based systems. In this case, Cynny Space is a European cloud provider delivering storage as a service. With the ARM-based platform they are able to deliver the required interfaces with a high level of reliability but at a significant cost saving. That cost saving is realized as a benefit of the power savings and the less expensive hardware configuration...
Another example is Huawei where they have shown significant power savings by deploying ARM based storage solutions…
This is their Ocean UDS storage system, where the ARM processor enables them to realize a 50% reduction in system power, or a MAXIMUM of 4.2 W per TB…
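As a sanity check on that figure, here is a back-of-the-envelope sketch. The deployment size is an illustrative assumption, not Huawei's published sizing:

```python
# Back-of-the-envelope: what "4.2 W per TB" means at petabyte scale.
# The 1 PB capacity below is an assumed example, not a published figure.
WATTS_PER_TB = 4.2      # claimed worst-case power per terabyte
CAPACITY_TB = 1000      # assume a 1 PB deployment for illustration

total_watts = WATTS_PER_TB * CAPACITY_TB
print(f"~{total_watts / 1000:.1f} kW for {CAPACITY_TB} TB of storage")
```

So even at the stated maximum, a full petabyte of this storage would draw on the order of 4 kW.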
We have even seen rack level reference designs from ODM and system providers like Gigabyte...
In this case they leverage 5 Cavium based servers for compute with another 20 Annapurna based storage servers.
<click>
The 1U x 16 storage servers leverage the integration in the Annapurna chip, not only for the I/O peripherals but also for hardware support for RAID and erasure coding. In this way they can increase the density and lower the cost of the storage system…
… And just this week HPE announced a new addition to their StoreVirtual product line to deliver more affordable enterprise-class storage for SMB…
Delivering the same level of service at a fraction of the cost compared to using a legacy architecture. This article from The Register compares an existing model with the new 3200 model. Same 2U form factor, same 14.4 TB, but 58% less cost.
Now many of these examples are proprietary solutions so you might ask…
…What about Ceph?
<CLICK>
Well I’m here to tell you that these benefits are applicable to Ceph based systems… TODAY!
There is no reason the benefits from higher efficiency and integration cannot be realized leveraging open source SW storage like Ceph.
So let’s take a look…
First of all, Ceph on ARM has been in the community for some time, but as of this year, with the Jewel release, it is an officially supported part of the upstream community. If you haven’t read the blogs on this, you should…
Now as I said, Ceph on ARM has been in the open source long enough that SUSE has discussed bringing enterprise storage to ARM.
With this release, the momentum will build even more for Ceph on ARM.
…
There are already solutions in the market using ARM and Ceph in storage solutions.
Ambedded, for example, is delivering a Ceph based solution putting several microservers together to deliver a Ceph Cluster Storage Appliance… Earlier this year that product won a storage award at Interop 2016…
This solution is a different sort of architecture than legacy based solutions in that it distributes the work a little differently…
------------------------------
As you will mention Ambedded’s ARM solution during your world tour, here are some highlights of our product.
1. We are a fully distributed storage solution (HW+SW) – thanks to the ARM SoC we are able to achieve this micro-server architecture with much less power consumption, high density, and HA.
2. On the performance data (the last 2 pages), it shows:
Uplink bandwidth matters for data exchange performance: Mars 200 has a 40G uplink; we have done testing at 20G/40G to see the performance difference
True scale-out in terms of “performance” and “capacity”: with our ARM micro-server architecture working with optimized Ceph, our Mars 200/201 shows linear aggregation of performance and capacity.
They use a micro server approach where they have a cluster of 8 ARM based micro servers each connected to a drive in a rack. They include the networking, switches, and power supplies while minimizing power…
They still deliver the reliability and redundancy that is required for enterprise storage but are able to significantly reduce the power…
But the power savings are only 1 of the benefits…
Another benefit is apparent when you start looking at the failure domains. In the case of the legacy approach, where each server is connected to a number of drives running multiple OSDs, if that server should fail, you take out N drives.
With the micro server approach, since it is one micro server per drive, you only take out the single drive. Consider the resultant rebuild time in these two approaches. Instead of a whole server with 10s of drives going down, now you only have a single drive failure.
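The difference in failure-domain size is easy to quantify. This sketch uses assumed drive sizes and drive counts purely for illustration:

```python
# Data that Ceph must re-replicate after a node failure, comparing a
# legacy node hosting many OSDs with one-micro-server-per-drive.
# Drive size (8 TB) and drives-per-node (12) are illustrative assumptions.
DRIVE_TB = 8
LEGACY_DRIVES_PER_NODE = 12   # one server runs 12 OSDs
MICRO_DRIVES_PER_NODE = 1     # one micro server runs 1 OSD

legacy_rebuild_tb = DRIVE_TB * LEGACY_DRIVES_PER_NODE
micro_rebuild_tb = DRIVE_TB * MICRO_DRIVES_PER_NODE

print(f"legacy node failure: re-replicate {legacy_rebuild_tb} TB")
print(f"micro server failure: re-replicate {micro_rebuild_tb} TB")
```

Under these assumptions a legacy node failure forces the cluster to recover 12x the data of a single micro server failure, with a correspondingly longer rebuild and a bigger window of reduced redundancy.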
And from a performance standpoint, it delivers …
Now this data, like much of what I will show you was generated from our partners so I don’t have all the specifics but much of it is self explanatory.
In this case as more nodes are added the performance scales linearly in true “scale out” fashion…
Looking back at the power savings…power equals $$, and a quick estimate of the difference in server power shows the potential for $10k per rack in savings just on the power. This does not even include the impact of cooling costs.
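One plausible way to arrive at a number in that range; every input here is an assumption for illustration, not a measured figure:

```python
# Rough rack-level power cost estimate. All inputs are assumptions:
# the power delta, electricity rate, and service life are illustrative.
WATTS_SAVED_PER_RACK = 3000    # assumed steady-state saving per rack
HOURS_PER_YEAR = 24 * 365
USD_PER_KWH = 0.10             # assumed commercial electricity rate
YEARS = 4                      # assumed equipment service life

kwh_saved = WATTS_SAVED_PER_RACK / 1000 * HOURS_PER_YEAR * YEARS
savings_usd = kwh_saved * USD_PER_KWH
print(f"~${savings_usd:,.0f} saved per rack over {YEARS} years")
```

With these assumed inputs the saving comes out at roughly $10,500 per rack, before any cooling benefit is counted.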
Now if you want to look at an architecture that is more traditional, with a number of drives per processor, then you only have to look at solutions from a few of our server chip partners. The next few slides are excerpts from some presentations on Ceph storage from both Applied Micro with their X-Gene System on Chip and Cavium with their ThunderX System on Chip…
Let’s start by looking at some of the recent work APM has done…
Applied Micro has their X-Gene 1 processor. It is an SoC with 8 cores and a number of integrated peripherals such as SATA and networking. It also supports a large number of memory controllers. In fact, to get the same memory footprint, a legacy system requires you to purchase a more expensive processor whose additional performance you don't really need. After all, storage is about the balance of memory, IO, and processor. This also happens to be the same SoC that is in the HPE platform I mentioned earlier.
APM has created a 1U storage platform called Mudan. It includes 12 HDDs with 2 SSDs for journaling. They have recently gone through a process similar to the reference architectures with Red Hat to better characterize the performance of Mudan…
Now Mudan is actually a product that you can purchase for small deployments or just to test as a POC.
The results of the testing have been documented in an application note on how to deploy a Ceph cluster using X-Gene 1.
Essentially they have created a well balanced configuration that equally saturates the SSDs, HDDs, and network…
On the Cavium front, this is a storage platform from Penguin. It actually leverages the 16 integrated SATA ports that the ThunderX SoC provides to minimize cost and power at a system level. They recently did a head to head comparison of a 24 core ThunderX ST with a legacy x86 based server…
Here are some of the results where it is clear that the ThunderX was able to perform as good or better than an equivalent x86 legacy solution. Both RADOS Read and Writes as well as Block Writes.
But when you look at the complexity of the HW solutions you immediately see the areas for savings at a system level, above and beyond just the processor costs (which can be quite high).
The Ceph monitors require not only the CPU but often a chipset and an external network card.
And the OSD nodes will need those as well as an HBA or expander to reach the required number of drives. Remember, the ThunderX solution includes 16 SATA interfaces on the chip..
In fact..
With SoC integration, the Cavium system is able to cost 40-60% less than the legacy systems. Keep in mind this is consistent with the HPE system that was just announced; they were offering the ARM based system at 58% less than the previous model.
Yet the performance is the same or better…
Now, do you remember this HDD
<CLICK>
Well it is actually much more than just an HDD. It is actually what WDLabs calls a Converged Micro Server.
They have integrated a processor SoC onto the control board of the HDD, creating a micro server…So what looks like a standard HDD and fits in the same form factor as a standard HDD is actually a self contained micro server…
This converged micro server adds the processor to the HDD. It includes dual Gigabit ethernet as well as memory so that the OS, storage, and networking are all integrated with the HDD.
As a sort of prototype, they are investigating a range of potential uses. Besides running the OSDs – 1 OSD per drive, as Ceph was originally conceived – they also see the opportunity to run other applications on the micro server
Here are some of the specifications, but essentially it has everything on board that a server would…In its current incarnation it is ARMv7 with only 1 GB of DDR, but they are looking to the next generation to update those.
Now they have used their helium-filled drive, which consumes less power, so that with the addition of 3 W for the micro server portion of the board, they are still within the power consumption of a traditional HDD.
Now this is a first generation system and they have done some preliminary testing …
With Ceph running on each micro server, they pulled together 4 PB into a single cluster.
With the dual Ethernet, they have one port connecting all of the drives as the private network for Ceph, with the other serving as the public network for client access to the cluster.
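This split maps directly onto Ceph's standard public/cluster network configuration. A minimal ceph.conf sketch of the idea, with hypothetical subnets:

```
# ceph.conf (fragment) -- subnets are hypothetical examples
[global]
public network  = 192.168.10.0/24   ; client-facing traffic on one NIC
cluster network = 192.168.20.0/24   ; OSD replication/recovery on the other
```

With this in place, client I/O and backend replication traffic never compete for the same link.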
In this case the cluster was built up using 1U x 12 chassis. But they have also demonstrated this in a 4U x 72 chassis.
Much of this is documented in a Ceph blog from May but here is some of the data…
They tested performance from a read and write standpoint
The impact of the network
And the comparison of using erasure coding vs. replication.
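The raw-capacity stakes of that comparison are easy to sketch. The 4+2 erasure-code profile below is an illustrative choice, not necessarily the one WDLabs tested:

```python
# Raw storage consumed per usable TB: 3x replication vs. k=4, m=2
# erasure coding. The 4+2 profile is an assumed example.
def raw_per_usable_tb(scheme: str) -> float:
    if scheme == "replication-3x":
        return 3.0            # three full copies of every object
    if scheme == "ec-4+2":
        k, m = 4, 2           # 4 data chunks + 2 coding chunks
        return (k + m) / k    # 1.5x raw per usable TB
    raise ValueError(scheme)

print(raw_per_usable_tb("replication-3x"))  # 3.0
print(raw_per_usable_tb("ec-4+2"))          # 1.5
```

Erasure coding halves the raw capacity needed here, at the cost of more CPU work on reads and recovery, which is exactly why it is worth benchmarking on these micro servers.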
I don’t expect you to read these graphs, but please take a look at the Ceph community blog where the initial testing was described…
So in summary…
We encourage you to explore Ceph on ARM for yourself…
Tomorrow's cloud will be an intelligent flexible cloud, distributing workloads throughout the network to deliver the required services.
Standard server SW will be running on workload optimized hardware to most efficiently deliver the services to customers.
And the ARM ecosystem is making innovative solutions available for servers, networking and of course storage, to provide scalable, portable, and optimized solutions, with already demonstrated benefits around cost, energy efficiency and even simplicity.
So please explore ARM based solutions for yourself…
Thank you…