See how Inoreader migrated from bare-metal servers to OpenNebula + StorPool. Inoreader reached a tipping point where it was no longer sustainable to keep adding hardware servers to store the billions of articles that hundreds of thousands of users read every day across the globe. With OpenNebula and StorPool we can now utilize those servers far more efficiently and no longer worry about performance and downtime.
OpenNebulaConf2018 - How Inoreader Migrated from Bare-Metal Containers to OpenNebula and StorPool - Yordan Yordanov - Innologica
1.
2. I have 10+ years of experience in the Telco IT sector, working with large enterprise solutions as well as building specialized solutions from scratch.
I founded a company called Innologica in 2013 with the mission of developing Next-Gen OSS and BSS solutions. A side project called Inoreader was born back then, which quickly turned into a leading platform for content consumption and is now a core product of the company.
Yordan Yordanov
CEO Innologica
3. Introduction
Agenda
Presenter and company intro
Who are we and what we do?
Inoreader
What is Inoreader?
Infrastructure issues
We were facing numerous scalability issues while at the same time we had an array of servers doing nothing, mostly because of filled storage. At a certain point we hit a brick wall.
Migration to OpenNebula and StorPool
In order to fix our scalability problems we pinpointed the need for a virtualization layer and distributed storage. After thorough research we ended up with OpenNebula and StorPool.
Tips
Some useful takeaways for you.
QA
If you have any questions I will gladly answer them.
4.
5. Who Are We?
Product company
We are not a sweatshop. We make successful products.
International market
Our customers are all over the globe.
Relaxed environment
We do not push the devs, but we cherish top performers.
Smart team
The team is small, but each member brings great value.
6. Inoreader
RSS aggregation platform and information hub
200,000 MAU
We have 200k monthly active users (MAU) and more than
30k simultaneous sessions in peak times. Recently passed 1M
registrations. 10k+ premium subscribers.
17,000,000,000 articles in MySQL and ES
We keep the full archive in enormous MySQL Databases and
a separate Elasticsearch cluster just for searching. Around
20TB of data without the replicas. 10M+ new articles per day.
1,300,000 feed updates per hour
We need to update our 15+ Million feeds in a timely manner.
A lot of machines are dedicated for this task only.
60 VMs and 14 physical hosts
The platform is currently running on 60 Virtual Machines, mainly in our main DC. A few physical hosts, mainly for Elasticsearch, were not good candidates for virtualization.
8. Hardware capacity
Our problem
We needed to constantly buy new servers just to keep up with the growing databases, because local storage was being quickly exhausted. We were using expensive RAID cards and RAID-10 setups for all databases. Those servers never used more than 10% of their CPUs, so it was a complete waste of resources.
CPU: 10%
Memory: 50%
Storage: 90%
Rack space: 100%
9. Hardware failures
Not so common but always hair-pulling
Problem description
All components are bound to fail. Whenever we lost a server, there was always at least some service disruption, if not a whole outage. All databases needed replicas, which skyrocketed server costs and still didn't provide automatic HA. If a hard drive fails in a RAID-10 setup you need to replace it ASAP, and bigger drives are more prone to errors while rebuilding.
Large databases on RAID-10 are slow to recover from crashes, so replicas have to be carefully set up and kept on identical (expensive) hardware in case one needs to be promoted to master.
Nobody likes to go to a DC on Saturday to replace a failed drive, reinstall the OS and rotate replicas. We much prefer to ride bikes!
11. Project Timeline
2017: PROJECT START
We knew for quite a while that we needed a solution to the growth problem.
Nov 2017: CHOOSING A SOLUTION
We held some meetings with vendors and researched different solutions.
Dec 2017 – Jan 2018: PLANNING AND FIRST TESTS
While the hardware was in transit we took our time to learn OpenNebula and test it as much as possible. We also started our first VMs.
Feb 2018: EXECUTION
We migrated all servers through several iterations, which will be described in more detail here.
Mar 2018: SUCCESS
We finally migrated our last server and all VMs were happily running on OpenNebula and StorPool.
12. Hardware
StorPool nodes
We chose three standard Supermicro SC836 3U servers.
Switches
As recommended by StorPool we chose Quanta LB8 for the
10G network and Quanta LB4-M for the Gigabit network.
Hosts
We have reused our old servers, but modified their CPUs and
memory.
Others
10G LAN cards and cables
13. StorPool Nodes
StorPool recommends using commodity hardware. Supermicro offers a good platform without vendor-specific requirements for RAID cards, etc., and is very budget friendly.
Our setup:
• Supermicro CSE-836B chassis
• Supermicro X10SRL-F motherboard
• 1x Intel Xeon E5-1620 v4 CPU (8 threads @ 3.5 GHz)
• 64GB DDR4-2666 RAM
• Avago 3108L RAID controller with 2G cache
• Intel X520-DA2 10G Ethernet card
• 8x 4TB HDD LFF SATA3 7200 RPM
• 8x 2TB HDD LFF SATA3 7200 RPM (reused from older servers)
14. Gigabit Network – Quanta LB4M
We were struggling with some old TP-Link SG2424 switches that we
wanted to upgrade, so we used the opportunity to upgrade the regular 1G
network too. We chose the Quanta LB4M.
Key aspects
• 48x Gigabit RJ45 ports
• 2x 10G SFP+ ports
• Redundant power supplies
• Very cheap!
• EOL – You might want to stack up some spare switches!
• Stable (4 months without a single flop for now)
15. 10G Network – Quanta LB8
Again, on StorPool's recommendation, we procured three Quanta LB8 switches. They seem to be performing great so far.
Key aspects
• 48x 10G SFP+ ports
• Redundant power supplies
• Very cheap for what they offer!
• EOL – You might want to stack up some spare switches!
• Stable (4 months without a single flop for now)
16. Hosts
We have reused our old servers, but with some significant upgrades. We
currently have 14 hosts, all with the following configuration:
• Supermicro 1U chassis with X9DRW motherboards
• 2x Intel Xeon E5-2650 v2 CPU (32 total threads)
• Dual power supply
• 128G DDR3 12800R Memory
• Intel X520-DA2 10G card
• 2xHDD in mdraid for OS only
18. Preparation and OpenNebula learning
While waiting for our hardware to arrive we installed OpenNebula on two hosts with a shared NFS datastore and tried everything we could think of to battle-test it.
After we were happy with how things looked and worked, we started moving some small things like name servers, SMTP servers, ticketing systems, etc. to dedicated VMs to decouple servers from services, which made our lives easier later.
19. New Rack
We rented a new rack in our colocation center since we didn't have any more space available in the old rack.
The idea was simple: deploy StorPool in the new rack only and gradually migrate hosts.
20. StorPool Nodes
The servers landed in our office in late January.
It was Friday afternoon, but we quickly installed them in the lab and let the
StorPool guys do their magic over the weekend.
22. Installation Day
Fast forward several hours and we had our first StorPool cluster up and running. Still no hosts. StorPool needed to perform a full cluster check in the real environment to see that everything worked well.
23. First hosts
The very next day we installed our first hosts – the temporary ones that
were holding VMs installed during our test period. Those VMs were still
running on local storage and NFS.
The next step was to migrate them to StorPool.
24. VM Migration to StorPool
StorPool helps their customers with this step, but here's a summary of what we did.
01 Shut down the VM
Use Sunstone or the CLI to shut down the VM.
02 Create StorPool volumes
On the host, use the storpool CLI to create volume(s) for the VM with the exact size of the original images.
03 Copy the volumes
Use dd or qemu-img convert (for raw and qcow2 images respectively) to copy the images to the StorPool volumes.
04 Reattach images
Detach the local images and attach the StorPool ones. Mind the order. There's a catch with large images*.
05 Power up the VM
Check that the VM boots properly. We're not done yet…
06 Finalize the migration
To fully migrate persistent VMs, use the Recover -> delete-recreate function to redeploy all files to StorPool.
* Large images (100G+) take forever to detach on slow local storage, so we had to kill the cp process and use the onevm recover success option to lie to OpenNebula that the detach actually completed. This is risky but saves a LOT of downtime.
After all VMs are migrated, you can delete the old system and image datastores and leave only the StorPool DSs.
At this point we are completely on StorPool!
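To make the steps more concrete, here is a rough sketch of 01–03 and 05 as host-side commands. The VM ID, volume name, size and image path are made up for illustration, and the exact StorPool commands (volume creation and how the block device appears on the host) should be confirmed with StorPool for your setup:
# 01: shut down the VM (VM ID 42 is hypothetical)
onevm poweroff 42
# 02: create a StorPool volume with the exact size of the original image
#     (illustrative syntax -- verify against your StorPool CLI documentation)
storpool volume one-img-42 create size 100G
# 03: copy the image onto the StorPool block device exposed on the host;
#     dd for raw images, qemu-img convert for qcow2 images
qemu-img convert -O raw /var/lib/one/datastores/1/<image>.qcow2 /dev/storpool/one-img-42
# 04: detach the local images and attach the StorPool ones (Sunstone or CLI)
# 05: power the VM back up and check that it boots
onevm resume 42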
25. Next hosts
From here on we had several iterations that consisted of roughly the
following:
• Create a list of servers for migration. The more hosts we have, the more servers we can move in a single iteration
• Create VMs and migrate the services there
• Use the opportunity to untangle microservices running on the same
machine
• Make sure servers are completely drained of any services.
• Shut down the servers and plan a visit to the DC the next day
• Continue on the next slide…
31. RINSE AND REPEAT
At each iteration we move more servers at once
because we have more capacity for VMs
32. Current capacity
In the end we achieved a 3x capacity boost in terms of processing power and memory with just a fraction of our previous servers, because with virtualization we can distribute the resources however we'd like. In terms of storage we are on a completely different level: we are no longer restricted to a single machine's capacity, we have 3x redundancy and all the performance we need.
We did it!
Allocated CPU: 37%
Allocated Memory: 32%
Storage: 67%
Rack space: 70%
33. Extreme Makeover
The old and the new setup
100% Virtualized
No more services running
directly on bare-metal.
Lighter power footprint
300% more capacity with 60% of the previous servers, with room for expansion.
Performance gains
Huge compute and storage performance gains. Maintainability is a breeze too.
34. Our Dashboard
A glimpse at our OpenNebula dashboard.
400 CPU cores and 1.5TB of RAM in just 14 hosts.
35. Hosts view
All hosts are nicely balanced using the default scheduler.
There’s always enough room to move VMs around in case a host
crashes or if we need to reboot a host.
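Draining a host before maintenance is then just a matter of live-migrating its VMs with the CLI or Sunstone, e.g. (the VM and host IDs below are made up):
onevm migrate --live 142 7    # move VM 142 to host 7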
37. Optimize CPU for homogenous clusters
Available as a template setting since OpenNebula 5.4.6. Set it to host-passthrough.
This option presents the real CPU model to the VMs instead of the default QEMU CPU. It can substantially increase performance, especially if CPU instructions like aes are needed.
Do not use it if you have different CPU models across the cluster since it
will cause the VMs to crash after live migration.
For older OpenNebula setups set this as RAW DATA in the template:
<cpu mode="host-passthrough"/>
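For 5.4.6 and newer, the template attribute itself looks roughly like this (a minimal sketch; check the template reference for your version):
CPU_MODEL = [ MODEL = "host-passthrough" ]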
38. Beware of mkfs.xfs on large StorPool volumes inside VMs
We noticed that when doing mkfs.xfs on large StorPool volumes (e.g. 4TB) there was a big delay before the command completed. What's worse, during this time all VMs on this host starve for IO, because the storpool_block.bin process is using 100% CPU time.
The image shown on the left is for a 1TB volume.
The reason is that mkfs uses TRIM by default and the StorPool driver supports that.
To remedy it, use the -K option for mkfs.xfs or -E nodiscard for mkfs.ext4, e.g.:
• mkfs.xfs -K /dev/sdb1
• mkfs.ext4 -E nodiscard /dev/sdb1
39. Use the 10G network for OpenNebula too
This is probably an obvious one, but it deserves to be mentioned. By default your hosts will probably resolve each other via the regular Gigabit network. Forcing them to talk over the 10G storage network will drastically improve live VM migration. The migration is not IO bound, so it will completely saturate the network.
Usually this is a simple /etc/hosts modification.
Consult with StorPool for your specific use case before doing that.
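As an illustration (the hostnames and addresses below are made up; use your own storage-network IPs):
# /etc/hosts on every host: resolve peer hosts via their 10G storage-network IPs
10.10.10.11  kvm-host1
10.10.10.12  kvm-host2
10.10.10.13  kvm-host3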
Live migrating a VM with 8G of RAM takes 7 seconds on 10G. The same VM will take about 1.5 minutes on a Gigabit network and will probably disturb VM communications if the network is saturated.
Live migration on highly loaded VMs can take significantly longer and
should be monitored. In some cases it’s enough to stop busy services for
just a second for the migration to complete.
40. Other tips
Those are the more obvious ones that probably everyone uses in
production, but still worth mentioning.
• Use cache=none, io=native when attaching volumes
• Use virtio networking instead of the default rtl8139 NIC. The latter has performance issues and drops packets when host IO is high
• Measure IO latency instead of IO load to judge saturation. We have
several machines with constant 99% IO load which are doing perfectly
fine.
/etc/one/vmm_exec/vmm_exec_kvm.conf:
…
DISK = [ driver = "raw" , cache = "none", io = "native",
discard = "unmap", bus = "scsi" ]
NIC = [ filter = "clean-traffic", model="virtio" ]
….
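If you would rather set the NIC model per VM template instead of host-wide, a minimal sketch (assuming the standard NIC_DEFAULT template attribute) would be:
NIC_DEFAULT = [ MODEL = "virtio" ]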
42. Grafana Dashboards
We have adapted the OpenNebula Dashboards with Graphite
and Grafana scripts by Sebastian Mangelkramer and used them
to create our own Grafana dashboards so we can see at a glance
which hosts are most loaded and how much overall capacity we
have.
43. Grafana TV Dashboard
Why not have a master dashboard on the TV at the office? This gives our
team a very quick and easy way to tell if everything is working smoothly.
If all you see is green, we're good :)
This dashboard shows our main DC on the first row, our backup DC on the second, and then some other critical aspects of our system. It's still a WIP,
hence the empty space.
At the top is our Geckoboard that we use for more business KPIs.
44. Server Power Usage in Grafana
Part of our virtualization project was to optimize the electricity bill by using fewer servers. We were able to easily measure our power usage by using Graphite and Grafana.
If you are interested, the script for getting the data into Graphite is here:
https://gist.github.com/Jacketbg/6973efdb41a2ecfcf2a83ea84c086887
The Grafana dashboard can be found here:
https://gist.github.com/Jacketbg/7255b4f81ebb2de0e8a5708b4335c9d7
Obviously you will need to tweak it, especially the formula for the power bill.
45. StorPool’s Grafana
StorPool were kind enough to give us access to their own Grafana instance, where they collect a lot of internal data about the system and KPIs. It gives us great insights that we couldn't get otherwise, so we can plan and estimate the system load very well.
46. What’s Left?
SSD Pool
We are currently using only an HDD pool, but we could benefit from a smaller SSD pool for picky MySQL databases.
Add more hosts
As the service grows our needs will too. We will probably have rack space for the coming years.
Add more StorPool nodes
We have maxed out the HDD bays on our current nodes, so we'll probably need to add more nodes in the future.
47. THANK YOU !
READ MORE ON BLOG.INOREADER.COM
GET THIS PRESENTATION FROM ino.to/one-amsterdam