Reliability, Availability and Serviceability on Linux
Mauro Carvalho Chehab
Linux Kernel Expert
Samsung Open Source Group
Sep 16, 2013

Open Source Group – Silicon Valley

Not to be used for commercial purpose without getting permission
All information, opinions and ideas herein are exclusively the author's own opinion

© 2013 SAMSUNG Electronics Co.
What is RAS (1)
● Used originally by IBM to measure mainframe robustness
● Reliability
  – Probability that a system will produce correct outputs
  – Generally measured as Mean Time Between Failures (MTBF)
  – Enhanced by features that help to avoid, detect and repair hardware faults
● Availability
  – Probability that a system is operational at a given time
  – Generally measured as the percentage of uptime over a period of time
    ● Examples:
      – 99.9% (“three nines”) means 8.76 hours unavailable per year
      – 99.999% (“five nines”) means 5.26 minutes of downtime per year
      – Minimal down-time for service and repair
      – Detect and correct hardware faults, as opposed to detect and repair
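
The downtime figures for an availability target follow from simple arithmetic. The sketch below assumes a 365-day year; the function name is illustrative and not part of any tool mentioned in these slides.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: non-leap year


def downtime_hours_per_year(availability_percent: float) -> float:
    """Return the yearly downtime, in hours, allowed by an availability target."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability must be between 0 and 100")
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * HOURS_PER_YEAR


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.999):
        hours = downtime_hours_per_year(target)
        print(f"{target}% -> {hours:.2f} h/year ({hours * 60:.2f} min/year)")
```

With these assumptions, "five nines" works out to roughly 5.26 minutes of downtime per year.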

What is RAS (2)
● Serviceability (or maintainability)
  – Simplicity and speed with which a system can be repaired or maintained
  – Generally measured as Mean Time To Repair (MTTR)
  – Can be increased with redundant parts and a higher support grade (24/7/365)
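
Reliability and serviceability metrics combine into availability. A common textbook relation, which the slides do not state, is availability = MTBF / (MTBF + MTTR), taking MTBF as the mean operating time between failures and MTTR as the mean time to repair. A minimal sketch under that assumption:

```python
def steady_state_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time a system is up, given mean uptime and mean repair time."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    # Assumption: MTBF counts only operating time, so one failure cycle
    # lasts mtbf_hours + mttr_hours.
    return mtbf_hours / (mtbf_hours + mttr_hours)


if __name__ == "__main__":
    # Hypothetical numbers: 999 hours of operation, then a 1 hour repair.
    print(f"{steady_state_availability(999, 1):.3%}")
```

Halving the repair time has roughly the same effect on unavailability as doubling the time between failures, which is why serviceability matters as much as reliability.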

Improving RAS (1)
● In order to improve RAS, both IT services and hardware require improvements
● Examples of hardware measures
  – CPU – detect errors in instruction execution and in the L1/L2/L3 caches;
  – Memory – add error correction logic (ECC) to detect and correct errors;
  – I/O – add CRC checksums for transferred data (PCIe has such a feature);
  – Storage – RAID, journaling file systems, checksums;
  – Power/cooling – component duplication, over-design, surge protector, UPS
  – System – hot swap of components, predictive failure analysis, partitioning of system components, virtual machines running on redundant servers, clustering, dynamic software update, independent CPU for RAS
  – RAS servers have features to hot add/replace/remove I/O cards, which reduce down-time when adding new hardware; failing I/O cards can be replaced based on PCIe AER features
  – Memory mirroring and active/active and active/standby configurations that reduce down-time
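
To illustrate what "detect and correct" means for ECC, here is a toy Hamming(7,4) code that corrects any single flipped bit in a 7-bit codeword. This is only a sketch of the principle: real ECC memory uses wider codes implemented in hardware, and the function names are illustrative.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # parity over codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword):
    """Return (data_bits, error_position); error_position is 0 if no error was seen."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_position = s1 + 2 * s2 + 4 * s3  # syndrome = 1-based index of the bad bit
    if error_position:
        c[error_position - 1] ^= 1  # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]], error_position


if __name__ == "__main__":
    word = hamming74_encode([1, 0, 1, 1])
    word[4] ^= 1  # simulate a single-bit memory fault
    print(hamming74_decode(word))
```

The syndrome both detects the fault and locates it, which is the information a memory controller needs to report a corrected error.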

Improving RAS (2)
● Examples of IT measures
  – 24x7x365 days on-site support; low latency support from their vendors
● Usage of Virtual Machines
  – VM migration with minimal application down-time
  – Cloud computing
● Predictive analysis
  – Hardware/OS should provide data to detect degradation of systems/components
  – It should have tools to analyze and (hot)replace those degraded components
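
One simple form of predictive analysis is to flag a component once its corrected-error count within a time window crosses a threshold. The sketch below is a hypothetical policy, not something the slides or any specific tool prescribe; the event format, window and threshold are all assumptions.

```python
from collections import Counter


def degraded_components(events, now, window_seconds, threshold):
    """Return labels with at least `threshold` corrected errors in the window.

    events: iterable of (timestamp_seconds, component_label) pairs.
    """
    recent = Counter(
        label for timestamp, label in events
        if 0 <= now - timestamp <= window_seconds
    )
    return sorted(label for label, count in recent.items() if count >= threshold)


if __name__ == "__main__":
    # Hypothetical corrected-error log: (time, DIMM label)
    log = [(10, "DIMM_A1"), (20, "DIMM_B1"), (30, "DIMM_A1"), (40, "DIMM_A1")]
    print(degraded_components(log, now=50, window_seconds=60, threshold=3))
```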

RAS features on Linux (1)
● Reporting of storage errors has been supported since early versions, as RAID/SAS/SAN controllers/drivers offer measurements.
  – There are also userspace tools to manage it.
● Machine Check Architecture – MCA
  – CPU errors have been provided on x86 machines since the Pentium 4
  – Depending on the processor, it can also provide memory and bus errors
  – The kernel implements it in the mcelog subsystem
    ● Fatal errors produce panic() and are reported at the console;
  – In userspace, the mcelog tool periodically reads the corrected/non-fatal error data and reports it at the console
    ● The kernel-userspace API is obscure: userspace receives a dump of a series of registers;
      – Decoding those errors is CPU-specific;
      – The kernel decodes those errors for fatal errors;
      – The userspace tool decodes those errors for non-fatal/corrected ones
      – Errors are also reported via a kernel trace event;
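
To give an idea of what decoding a register dump involves, the sketch below extracts the architectural flag bits of an x86 machine-check bank status register (IA32_MCi_STATUS), using the bit positions documented in the Intel SDM. The CPU-specific meaning of the error codes is not decoded here; the helper name and the sample value are made up for illustration.

```python
# Architectural flag bits of IA32_MCi_STATUS (bit positions per the Intel SDM).
STATUS_FLAGS = {
    "VAL": 63,    # register contains a valid error
    "OVER": 62,   # an earlier error was overwritten
    "UC": 61,     # error was not corrected by hardware
    "EN": 60,     # error reporting was enabled
    "MISCV": 59,  # MCi_MISC holds extra information
    "ADDRV": 58,  # MCi_ADDR holds the faulting address
    "PCC": 57,    # processor context may be corrupted
}


def decode_mci_status(status: int) -> dict:
    """Split a raw 64-bit status value into flags and error-code fields."""
    decoded = {name: bool((status >> bit) & 1) for name, bit in STATUS_FLAGS.items()}
    decoded["mca_error_code"] = status & 0xFFFF
    decoded["model_specific_code"] = (status >> 16) & 0xFFFF
    return decoded


if __name__ == "__main__":
    sample = (1 << 63) | (1 << 60) | (1 << 58) | 0x0090  # made-up value
    for field, value in decode_mci_status(sample).items():
        print(field, value)
```

Everything beyond these fields, in particular what a given error code means, depends on the CPU model, which is why the decoding logic lives in CPU-specific tables.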

RAS features on Linux (2)
● EDAC (Error Detection and Correction) subsystem
  – Provides a way to report errors detected by memory controllers to userspace;
  – Some (old) drivers also report PCI errors via EDAC;
  – The kernel decodes the error into the labels of the DIMMs affected by it;
    ● Errors are reported at the console and via a kernel trace event
    ● The association between the memory architecture and the DIMMs is done via some files that are loaded by a userspace tool (edac-utils or rasdaemon)
  – Most drivers talk directly with the memory controller (MC)
    ● That provides a more reliable error report
    ● BIOS data is not very reliable: in several cases, the same BIOS is used on different machines
      – The DMI BIOS tables may contain the wrong DIMM labels
      – Race conditions may happen on a BIOS that also collects error data
  – There's one driver (ghes_edac) on Kernel 3.9+ that gets errors from the BIOS
    ● “firmware first” mode: the BIOS tells the OS not to talk with the MC directly
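
The EDAC error counters can be read from sysfs without any special tool. A minimal sketch, assuming the usual EDAC layout in which each memory controller directory (`mc0`, `mc1`, ...) under `/sys/devices/system/edac/mc` exposes `ce_count` (corrected) and `ue_count` (uncorrected) files; the function name is illustrative.

```python
import glob
import os


def read_edac_counts(edac_root="/sys/devices/system/edac/mc"):
    """Return {"mc0": {"ce_count": N, "ue_count": M}, ...} from EDAC sysfs."""
    counts = {}
    for mc_dir in sorted(glob.glob(os.path.join(edac_root, "mc[0-9]*"))):
        entry = {}
        for name in ("ce_count", "ue_count"):
            try:
                with open(os.path.join(mc_dir, name)) as counter_file:
                    entry[name] = int(counter_file.read().strip())
            except (OSError, ValueError):
                entry[name] = None  # counter missing or unreadable
        counts[os.path.basename(mc_dir)] = entry
    return counts


if __name__ == "__main__":
    # Prints an empty dict when no EDAC driver is loaded.
    print(read_edac_counts())
```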

RAS features on Linux (3)
● PCIe AER (Advanced Error Reporting)
  – Some PCIe hardware provides ways to get AER error reports on the OS;
  – AER logs data at the console;
  – It also reports errors via a kernel trace event
● Userspace tools:
  – mcelog – collects and decodes MCA error events on x86;
  – edac-utils – fills DIMM labels data and summarizes memory errors;
  – rasdaemon
    ● Collects errors via kernel trace events from several sources:
      – MCA, EDAC and PCIe AER events
    ● Fills DIMM labels data
    ● Stores error data in a persistent database (SQLite 3)
    ● Allows errors to be queried/summarized later
    ● Uses new resources available on Kernel 3.10
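
Because the error data ends up in an SQLite database, it can be summarized with ordinary SQL. The sketch below assumes a table named `mc_event` with `label`, `err_type` and `err_count` columns; these names are an assumption about the rasdaemon schema and should be checked against the installed version (for example with `.schema` in the `sqlite3` shell).

```python
import sqlite3


def summarize_mc_events(connection):
    """Return [(label, err_type, total_errors), ...] from an mc_event table."""
    query = (
        "SELECT label, err_type, SUM(err_count) "
        "FROM mc_event GROUP BY label, err_type ORDER BY label, err_type"
    )
    return [tuple(row) for row in connection.execute(query)]


if __name__ == "__main__":
    # Demo against an in-memory database using the assumed columns.
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE mc_event (label TEXT, err_type TEXT, err_count INTEGER)")
    demo.executemany(
        "INSERT INTO mc_event VALUES (?, ?, ?)",
        [("DIMM_A1", "Corrected", 1), ("DIMM_B1", "Corrected", 1)],
    )
    print(summarize_mc_events(demo))
```

In normal use the `ras-mc-ctl` summary options are the supported way to get this report; direct queries are mainly useful for custom monitoring.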

Typical D-RAM implementation
[Figure: a DRAM rank (Rank 1) with Banks 0 to 7; each bank (Bank n) is a DRAM memory matrix addressed through a row decoder and a column decoder]
Source: http://lwn.net/Articles/250967/

A DIMM can have 1, 2 or 4 ranks
Memory Arrangements on PC

[Figures: IBM PC original architecture; classic server (most common) architecture; AMD-64 and newer Intel CPU architecture (Nehalem, Sandy Bridge and upcoming)]

NOTE: Nehalem-EX has an additional buffer chip between the RAM and the CPU, called Intel SMB (Intel Scalable Memory Buffer)
● It means that the CPU memory controller doesn't see the DIMMs directly
● This is to improve performance when there are lots of CPU sockets (-EX machines)
● Only the BIOS knows how the memory is organized on Nehalem-EX

Images taken from: http://lwn.net/Articles/250967/
Evolution of RAS on Kernel (1)
● Before Kernel 2.6.32
  – On EDAC, memory controllers are assumed to be rank-based
  – EDAC reports errors via dmesg
  – mcelog reports errors via its own interface only
● Kernel 2.6.32
  – Added kernel trace events on MCE;
● Kernel 3.5
  – EDAC/HERM patches added support for modern memory architectures
    ● Proper support for modern Intel CPUs/MCs (Intel systems from 2002 onwards)
    ● Memory controllers are DIMM-based,
      – Memory controllers can be grouped in branches (FB-DIMM)
  – Added kernel trace events for EDAC;

Evolution of RAS on Kernel (2)
● Kernel 3.9
  – Added firmware first EDAC driver (ghes_edac);
  – Added trace events for PCIe AER;
● Kernel 3.10 brings a series of new features useful for tracing RAS events:
  – Added blocking functionality to trace_pipe_raw;
  – Allows creating an independent tracing facility for each process that uses traces;
  – Added “uptime” clock reference for tracing events;
● While the rasdaemon tool works with kernels below 3.10, it is optimized to use those new features found on Kernel 3.10.
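
The "independent tracing facility" is read here as the kernel's tracing instances: creating a directory under `instances/` in the tracing filesystem yields a private set of buffers and event switches. The sketch below shows how a process could enable one event in its own instance. It assumes debugfs is mounted at `/sys/kernel/debug` and needs root on a real system; the function and instance names are hypothetical.

```python
import os


def enable_event_in_instance(instance, system, event,
                             tracing_root="/sys/kernel/debug/tracing"):
    """Create (or reuse) a tracing instance and enable one event inside it."""
    instance_dir = os.path.join(tracing_root, "instances", instance)
    # In the tracing filesystem, mkdir here makes the kernel populate the instance.
    os.makedirs(instance_dir, exist_ok=True)
    enable_path = os.path.join(instance_dir, "events", system, event, "enable")
    with open(enable_path, "w") as enable_file:
        enable_file.write("1")
    return enable_path


if __name__ == "__main__":
    try:
        print(enable_event_in_instance("rasdemo", "ras", "mc_event"))
    except OSError as error:
        print("could not enable the event (tracing not mounted, or not root?):", error)
```

Events enabled this way are read from the instance's own per-CPU `trace_pipe_raw` files, so other users of tracing on the same machine are not disturbed.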

Firmware First vs. Hardware First
● Hardware-first approach
  – Errors come directly from the hardware
  – The BIOS doesn't handle them
    ● It is faster
    ● Can help to avoid long SMI interrupts
  – Requires deep knowledge of the hardware
● Firmware-first approach
  – The BIOS and/or dedicated CPUs collect errors
  – The OS doesn't need deep knowledge of the hardware
  – The BIOS can mask/group errors, apply proprietary algorithms, avoid spurious reports
  – Unfortunately, the current ACPI API doesn't expose the memory slot label, which makes it harder for the system admin to use

RASDAEMON (1)
● It is a new tool
  – Currently provided on Fedora 18, Fedora 19 and rawhide
  – Has:
    ● A daemon that waits for kernel trace events (rasdaemon)
    ● A tool to configure DIMMs and do RAS reports (ras-mc-ctl)
    ● Some contrib tools to test EDAC and to inject fake errors
  – Example:
    ● Dell T620 with 2 Sandy Bridge-EP Xeon CPUs (E5-2670)
    ● 2 8GB dual-rank DIMMs (Samsung M393B1K70DH0-YK0)
    ● Driver: sb-edac

$ ras-mc-ctl --layout
       +-----------------------------------------------------------------------------------------------+
       |                      mc0                      |                      mc1                      |
       |  channel0 |  channel1 |  channel2 |  channel3 |  channel0 |  channel1 |  channel2 |  channel3 |
-------+-----------------------------------------------------------------------------------------------+
slot2: |      0 MB |      0 MB |      0 MB |      0 MB |      0 MB |      0 MB |      0 MB |      0 MB |
slot1: |      0 MB |      0 MB |      0 MB |      0 MB |      0 MB |      0 MB |      0 MB |      0 MB |
slot0: |   8192 MB |      0 MB |      0 MB |      0 MB |   8192 MB |      0 MB |      0 MB |      0 MB |
-------+-----------------------------------------------------------------------------------------------+

RASDAEMON (2)
$ ras-mc-ctl --print-labels
LOCATION                CONFIGURED LABEL    SYSFS CONTENTS
mc0 channel 0 slot 0    DIMM_A1             CPU_SrcID#0_Channel#0_DIMM#0
                        DIMM_A2             0:0:1 missing
                        DIMM_A3             0:0:2 missing
                        DIMM_A4             0:0:3 missing
                        DIMM_A5             0:1:0 missing
                        DIMM_A6             0:1:1 missing
                        DIMM_A7             0:1:2 missing
                        DIMM_A8             0:1:3 missing
                        DIMM_A9             0:2:0 missing
                        DIMM_A10            0:2:1 missing
                        DIMM_A11            0:2:2 missing
                        DIMM_A12            0:2:3 missing
mc1 channel 0 slot 0    DIMM_B1             CPU_SrcID#1_Channel#0_DIMM#0
                        DIMM_B2             1:0:1 missing
                        DIMM_B3             1:0:2 missing
                        DIMM_B4             1:0:3 missing
                        DIMM_B5             1:1:0 missing
                        DIMM_B6             1:1:1 missing
                        DIMM_B7             1:1:2 missing
                        DIMM_B8             1:1:3 missing
                        DIMM_B9             1:2:0 missing
                        DIMM_B10            1:2:1 missing
                        DIMM_B11            1:2:2 missing
                        DIMM_B12            1:2:3 missing

# ras-mc-ctl --register-labels

RASDAEMON (3)
$ util/ras-mc-ctl --print-labels
LOCATION                CONFIGURED LABEL    SYSFS CONTENTS
mc0 channel 0 slot 0    DIMM_A1             DIMM_A1
                        DIMM_A2             0:0:1 missing
                        DIMM_A3             0:0:2 missing
                        DIMM_A4             0:0:3 missing
                        DIMM_A5             0:1:0 missing
                        DIMM_A6             0:1:1 missing
                        DIMM_A7             0:1:2 missing
                        DIMM_A8             0:1:3 missing
                        DIMM_A9             0:2:0 missing
                        DIMM_A10            0:2:1 missing
                        DIMM_A11            0:2:2 missing
                        DIMM_A12            0:2:3 missing
mc1 channel 0 slot 0    DIMM_B1             DIMM_B1
                        DIMM_B2             1:0:1 missing
                        DIMM_B3             1:0:2 missing
                        DIMM_B4             1:0:3 missing
                        DIMM_B5             1:1:0 missing
                        DIMM_B6             1:1:1 missing
                        DIMM_B7             1:1:2 missing
                        DIMM_B8             1:1:3 missing
                        DIMM_B9             1:2:0 missing
                        DIMM_B10            1:2:1 missing
                        DIMM_B11            1:2:2 missing
                        DIMM_B12            1:2:3 missing

RASDAEMON (4)
# rasdaemon -r -f
overriding event (931) ras:mc_event with new print handler
rasdaemon: ras:mc_event event enabled
rasdaemon: Enabled event ras:mc_event
overriding event (847) ras:aer_event with new print handler
rasdaemon: ras:aer_event event enabled
rasdaemon: Enabled event ras:aer_event
overriding event (56) mce:mce_record with new print handler
rasdaemon: mce:mce_record event enabled
rasdaemon: Enabled event mce:mce_record
rasdaemon: Listening to events for cpus 0 to 31
Calling ras_mc_event_opendb()
rasdaemon: cpu 0: Recording events at /var/lib/rasdaemon/ras-mc_event.db
cpu 12:rasdaemon: mc_event store: 0x19c6968
rasdaemon: register inserted at db
<...>-2742 [732433178] 2507.782000: mc_event:
2013-08-15 19:58:50 -0300 1
Corrected error: FAKE ERROR on DIMM_A1 (mc: 0 location: 0:0:0 grain: 7 for EDAC testing only)
cpu 12:rasdaemon: mc_event store: 0x19c6968
<...>-2742 [732433178] 2507.864000: mc_event:
2013-08-15 19:58:50 -0300 1
Corrected error: FAKE ERROR on DIMM_B1 (mc: 1 location: 0:0:0 grain: 7 for EDAC testing only)
cpu 12:rasdaemon: mc_event store: 0x19c6968
# ras-mc-ctl --summary
Memory controller events summary:
Corrected on DIMM Label(s): 'DIMM_A1' location: 0:0:0:0 errors: 1
Corrected on DIMM Label(s): 'DIMM_B1' location: 1:0:0:0 errors: 1
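
For scripts that watch the daemon output, the memory controller event lines shown above can be split into fields with a regular expression. The pattern below is inferred only from the sample output on this slide, so it is an assumption about the format rather than a documented interface; other rasdaemon versions may print different fields.

```python
import re

# Pattern inferred from the sample "Corrected error: ..." lines; not an official format.
MC_EVENT_RE = re.compile(
    r"(?P<severity>\w+) error: (?P<message>.+?) on (?P<label>\S+) "
    r"\(mc: (?P<mc>\d+) location: (?P<location>[\d:]+) grain: (?P<grain>\d+)"
)


def parse_mc_event(line):
    """Return a dict of fields for an mc_event report line, or None if it does not match."""
    match = MC_EVENT_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["mc"] = int(fields["mc"])
    fields["grain"] = int(fields["grain"])
    return fields


if __name__ == "__main__":
    sample = ("Corrected error: FAKE ERROR on DIMM_A1 "
              "(mc: 0 location: 0:0:0 grain: 7 for EDAC testing only)")
    print(parse_mc_event(sample))
```

For anything beyond ad-hoc monitoring, querying the database written by rasdaemon is likely more robust than parsing console output.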

Thank you.
Questions?

Open Source Group – Silicon Valley

© 2013 SAMSUNG Electronics
Co.

More Related Content

What's hot

BKK16-317 How to generate power models for EAS and IPA
BKK16-317 How to generate power models for EAS and IPABKK16-317 How to generate power models for EAS and IPA
BKK16-317 How to generate power models for EAS and IPA
Linaro
 
spinlock.pdf
spinlock.pdfspinlock.pdf
spinlock.pdf
Adrian Huang
 
SFO15-TR9: PSCI, ACPI (and UEFI to boot)
SFO15-TR9: PSCI, ACPI (and UEFI to boot)SFO15-TR9: PSCI, ACPI (and UEFI to boot)
SFO15-TR9: PSCI, ACPI (and UEFI to boot)
Linaro
 
MemVerge: The Software Stack for CXL Environments
MemVerge: The Software Stack for CXL EnvironmentsMemVerge: The Software Stack for CXL Environments
MemVerge: The Software Stack for CXL Environments
Memory Fabric Forum
 
Reliability, Availability, and Serviceability (RAS) on ARM64 status - SFO17-203
Reliability, Availability, and Serviceability (RAS) on ARM64 status - SFO17-203Reliability, Availability, and Serviceability (RAS) on ARM64 status - SFO17-203
Reliability, Availability, and Serviceability (RAS) on ARM64 status - SFO17-203
Linaro
 
Page cache in Linux kernel
Page cache in Linux kernelPage cache in Linux kernel
Page cache in Linux kernel
Adrian Huang
 
Trusted firmware deep_dive_v1.0_
Trusted firmware deep_dive_v1.0_Trusted firmware deep_dive_v1.0_
Trusted firmware deep_dive_v1.0_
Linaro
 
Linux Kernel Booting Process (1) - For NLKB
Linux Kernel Booting Process (1) - For NLKBLinux Kernel Booting Process (1) - For NLKB
Linux Kernel Booting Process (1) - For NLKB
shimosawa
 
Physical Memory Management.pdf
Physical Memory Management.pdfPhysical Memory Management.pdf
Physical Memory Management.pdf
Adrian Huang
 
Virtualization Support in ARMv8+
Virtualization Support in ARMv8+Virtualization Support in ARMv8+
Virtualization Support in ARMv8+
Aananth C N
 
HKG15-505: Power Management interactions with OP-TEE and Trusted Firmware
HKG15-505: Power Management interactions with OP-TEE and Trusted FirmwareHKG15-505: Power Management interactions with OP-TEE and Trusted Firmware
HKG15-505: Power Management interactions with OP-TEE and Trusted Firmware
Linaro
 
LCU13: An Introduction to ARM Trusted Firmware
LCU13: An Introduction to ARM Trusted FirmwareLCU13: An Introduction to ARM Trusted Firmware
LCU13: An Introduction to ARM Trusted Firmware
Linaro
 
ACPI Debugging from Linux Kernel
ACPI Debugging from Linux KernelACPI Debugging from Linux Kernel
ACPI Debugging from Linux Kernel
SUSE Labs Taipei
 
qemu + gdb: The efficient way to understand/debug Linux kernel code/data stru...
qemu + gdb: The efficient way to understand/debug Linux kernel code/data stru...qemu + gdb: The efficient way to understand/debug Linux kernel code/data stru...
qemu + gdb: The efficient way to understand/debug Linux kernel code/data stru...
Adrian Huang
 
Kernel Recipes 2017 - Understanding the Linux kernel via ftrace - Steven Rostedt
Kernel Recipes 2017 - Understanding the Linux kernel via ftrace - Steven RostedtKernel Recipes 2017 - Understanding the Linux kernel via ftrace - Steven Rostedt
Kernel Recipes 2017 - Understanding the Linux kernel via ftrace - Steven Rostedt
Anne Nicolas
 
Continguous Memory Allocator in the Linux Kernel
Continguous Memory Allocator in the Linux KernelContinguous Memory Allocator in the Linux Kernel
Continguous Memory Allocator in the Linux Kernel
Kernel TLV
 
SK hynix CXL Disaggregated Memory Solution
SK hynix CXL Disaggregated Memory SolutionSK hynix CXL Disaggregated Memory Solution
SK hynix CXL Disaggregated Memory Solution
Memory Fabric Forum
 
LCA14: LCA14-306: CPUidle & CPUfreq integration with scheduler
LCA14: LCA14-306: CPUidle & CPUfreq integration with schedulerLCA14: LCA14-306: CPUidle & CPUfreq integration with scheduler
LCA14: LCA14-306: CPUidle & CPUfreq integration with scheduler
Linaro
 
Linux Memory Management with CMA (Contiguous Memory Allocator)
Linux Memory Management with CMA (Contiguous Memory Allocator)Linux Memory Management with CMA (Contiguous Memory Allocator)
Linux Memory Management with CMA (Contiguous Memory Allocator)
Pankaj Suryawanshi
 
Linux Porting to a Custom Board
Linux Porting to a Custom BoardLinux Porting to a Custom Board
Linux Porting to a Custom Board
Patrick Bellasi
 

What's hot (20)

BKK16-317 How to generate power models for EAS and IPA
BKK16-317 How to generate power models for EAS and IPABKK16-317 How to generate power models for EAS and IPA
BKK16-317 How to generate power models for EAS and IPA
 
spinlock.pdf
spinlock.pdfspinlock.pdf
spinlock.pdf
 
SFO15-TR9: PSCI, ACPI (and UEFI to boot)
SFO15-TR9: PSCI, ACPI (and UEFI to boot)SFO15-TR9: PSCI, ACPI (and UEFI to boot)
SFO15-TR9: PSCI, ACPI (and UEFI to boot)
 
MemVerge: The Software Stack for CXL Environments
MemVerge: The Software Stack for CXL EnvironmentsMemVerge: The Software Stack for CXL Environments
MemVerge: The Software Stack for CXL Environments
 
Reliability, Availability, and Serviceability (RAS) on ARM64 status - SFO17-203
Reliability, Availability, and Serviceability (RAS) on ARM64 status - SFO17-203Reliability, Availability, and Serviceability (RAS) on ARM64 status - SFO17-203
Reliability, Availability, and Serviceability (RAS) on ARM64 status - SFO17-203
 
Page cache in Linux kernel
Page cache in Linux kernelPage cache in Linux kernel
Page cache in Linux kernel
 
Trusted firmware deep_dive_v1.0_
Trusted firmware deep_dive_v1.0_Trusted firmware deep_dive_v1.0_
Trusted firmware deep_dive_v1.0_
 
Linux Kernel Booting Process (1) - For NLKB
Linux Kernel Booting Process (1) - For NLKBLinux Kernel Booting Process (1) - For NLKB
Linux Kernel Booting Process (1) - For NLKB
 
Physical Memory Management.pdf
Physical Memory Management.pdfPhysical Memory Management.pdf
Physical Memory Management.pdf
 
Virtualization Support in ARMv8+
Virtualization Support in ARMv8+Virtualization Support in ARMv8+
Virtualization Support in ARMv8+
 
HKG15-505: Power Management interactions with OP-TEE and Trusted Firmware
HKG15-505: Power Management interactions with OP-TEE and Trusted FirmwareHKG15-505: Power Management interactions with OP-TEE and Trusted Firmware
HKG15-505: Power Management interactions with OP-TEE and Trusted Firmware
 
LCU13: An Introduction to ARM Trusted Firmware
LCU13: An Introduction to ARM Trusted FirmwareLCU13: An Introduction to ARM Trusted Firmware
LCU13: An Introduction to ARM Trusted Firmware
 
ACPI Debugging from Linux Kernel
ACPI Debugging from Linux KernelACPI Debugging from Linux Kernel
ACPI Debugging from Linux Kernel
 
qemu + gdb: The efficient way to understand/debug Linux kernel code/data stru...
qemu + gdb: The efficient way to understand/debug Linux kernel code/data stru...qemu + gdb: The efficient way to understand/debug Linux kernel code/data stru...
qemu + gdb: The efficient way to understand/debug Linux kernel code/data stru...
 
Kernel Recipes 2017 - Understanding the Linux kernel via ftrace - Steven Rostedt
Kernel Recipes 2017 - Understanding the Linux kernel via ftrace - Steven RostedtKernel Recipes 2017 - Understanding the Linux kernel via ftrace - Steven Rostedt
Kernel Recipes 2017 - Understanding the Linux kernel via ftrace - Steven Rostedt
 
Continguous Memory Allocator in the Linux Kernel
Continguous Memory Allocator in the Linux KernelContinguous Memory Allocator in the Linux Kernel
Continguous Memory Allocator in the Linux Kernel
 
SK hynix CXL Disaggregated Memory Solution
SK hynix CXL Disaggregated Memory SolutionSK hynix CXL Disaggregated Memory Solution
SK hynix CXL Disaggregated Memory Solution
 
LCA14: LCA14-306: CPUidle & CPUfreq integration with scheduler
LCA14: LCA14-306: CPUidle & CPUfreq integration with schedulerLCA14: LCA14-306: CPUidle & CPUfreq integration with scheduler
LCA14: LCA14-306: CPUidle & CPUfreq integration with scheduler
 
Linux Memory Management with CMA (Contiguous Memory Allocator)
Linux Memory Management with CMA (Contiguous Memory Allocator)Linux Memory Management with CMA (Contiguous Memory Allocator)
Linux Memory Management with CMA (Contiguous Memory Allocator)
 
Linux Porting to a Custom Board
Linux Porting to a Custom BoardLinux Porting to a Custom Board
Linux Porting to a Custom Board
 

Similar to Reliability, Availability and Serviceability on Linux

Windows Debugging Tools - JavaOne 2013
Windows Debugging Tools - JavaOne 2013Windows Debugging Tools - JavaOne 2013
Windows Debugging Tools - JavaOne 2013
MattKilner
 
UNIT 1 ERTS-1.pptusbce18ugxy8vsxysqvyexv
UNIT 1 ERTS-1.pptusbce18ugxy8vsxysqvyexvUNIT 1 ERTS-1.pptusbce18ugxy8vsxysqvyexv
UNIT 1 ERTS-1.pptusbce18ugxy8vsxysqvyexv
eeerithanya
 
My First 100 days with an Exadata (PPT)
My First 100 days with an Exadata (PPT)My First 100 days with an Exadata (PPT)
My First 100 days with an Exadata (PPT)
Gustavo Rene Antunez
 
HKG18-116 - RAS Solutions for Arm64 Servers
HKG18-116 - RAS Solutions for Arm64 ServersHKG18-116 - RAS Solutions for Arm64 Servers
HKG18-116 - RAS Solutions for Arm64 Servers
Linaro
 
Faults inside System Software
Faults inside System SoftwareFaults inside System Software
Faults inside System Software
National Cheng Kung University
 
LCE13: Android Graphics Upstreaming
LCE13: Android Graphics UpstreamingLCE13: Android Graphics Upstreaming
LCE13: Android Graphics Upstreaming
Linaro
 
[db tech showcase Tokyo 2018] #dbts2018 #B17 『オラクル パフォーマンス チューニング - 神話、伝説と解決策』
[db tech showcase Tokyo 2018] #dbts2018 #B17 『オラクル パフォーマンス チューニング - 神話、伝説と解決策』[db tech showcase Tokyo 2018] #dbts2018 #B17 『オラクル パフォーマンス チューニング - 神話、伝説と解決策』
[db tech showcase Tokyo 2018] #dbts2018 #B17 『オラクル パフォーマンス チューニング - 神話、伝説と解決策』
Insight Technology, Inc.
 
Introduction to Computer Hardware slides ppt
Introduction to Computer Hardware slides pptIntroduction to Computer Hardware slides ppt
Introduction to Computer Hardware slides ppt
Osama Yousaf
 
system unit and Motherboard
system unit and Motherboardsystem unit and Motherboard
system unit and Motherboard
romeodait
 
Visão geral do hardware do servidor System z e Linux on z - Concurso Mainframe
Visão geral do hardware do servidor System z e Linux on z - Concurso MainframeVisão geral do hardware do servidor System z e Linux on z - Concurso Mainframe
Visão geral do hardware do servidor System z e Linux on z - Concurso Mainframe
Anderson Bassani
 
Chs nc2 reviewer - with oral questioning 0
Chs nc2 reviewer - with oral questioning 0Chs nc2 reviewer - with oral questioning 0
Chs nc2 reviewer - with oral questioning 0
ronan213
 
Chs nc2 reviewer - with oral questioning
Chs nc2 reviewer - with oral questioningChs nc2 reviewer - with oral questioning
Chs nc2 reviewer - with oral questioning
Adolfo Nasol
 
Embedded Systems
Embedded SystemsEmbedded Systems
Embedded Systems
Benjim Thomas Mathew
 
AIX Performance Tuning Session at STU2017
AIX Performance Tuning Session at STU2017AIX Performance Tuning Session at STU2017
AIX Performance Tuning Session at STU2017
Paulo Sergio Lemes Queiroz
 
Spike yuan server ras and uefi cper final
Spike yuan  server ras and uefi cper finalSpike yuan  server ras and uefi cper final
Spike yuan server ras and uefi cper final
parth bera
 
Oracle R12 EBS Performance Tuning
Oracle R12 EBS Performance TuningOracle R12 EBS Performance Tuning
Oracle R12 EBS Performance Tuning
Scott Jenner
 
Future of Power: PureFlex and IBM i - Erik Rex
Future of Power: PureFlex and IBM i - Erik RexFuture of Power: PureFlex and IBM i - Erik Rex
Future of Power: PureFlex and IBM i - Erik Rex
IBM Danmark
 
5120224.ppt
5120224.ppt5120224.ppt
5120224.ppt
dedanndege
 
”Bare-Metal Container" presented at HPCC2016
”Bare-Metal Container" presented at HPCC2016”Bare-Metal Container" presented at HPCC2016
”Bare-Metal Container" presented at HPCC2016
Kuniyasu Suzaki
 
TSRT Crashes
TSRT CrashesTSRT Crashes
TSRT Crashes
ashiesh0007
 

Similar to Reliability, Availability and Serviceability on Linux (20)

Windows Debugging Tools - JavaOne 2013
Windows Debugging Tools - JavaOne 2013Windows Debugging Tools - JavaOne 2013
Windows Debugging Tools - JavaOne 2013
 
UNIT 1 ERTS-1.pptusbce18ugxy8vsxysqvyexv
UNIT 1 ERTS-1.pptusbce18ugxy8vsxysqvyexvUNIT 1 ERTS-1.pptusbce18ugxy8vsxysqvyexv
UNIT 1 ERTS-1.pptusbce18ugxy8vsxysqvyexv
 
My First 100 days with an Exadata (PPT)
My First 100 days with an Exadata (PPT)My First 100 days with an Exadata (PPT)
My First 100 days with an Exadata (PPT)
 
HKG18-116 - RAS Solutions for Arm64 Servers
HKG18-116 - RAS Solutions for Arm64 ServersHKG18-116 - RAS Solutions for Arm64 Servers
HKG18-116 - RAS Solutions for Arm64 Servers
 
Faults inside System Software
Faults inside System SoftwareFaults inside System Software
Faults inside System Software
 
LCE13: Android Graphics Upstreaming
LCE13: Android Graphics UpstreamingLCE13: Android Graphics Upstreaming
LCE13: Android Graphics Upstreaming
 
[db tech showcase Tokyo 2018] #dbts2018 #B17 『オラクル パフォーマンス チューニング - 神話、伝説と解決策』
[db tech showcase Tokyo 2018] #dbts2018 #B17 『オラクル パフォーマンス チューニング - 神話、伝説と解決策』[db tech showcase Tokyo 2018] #dbts2018 #B17 『オラクル パフォーマンス チューニング - 神話、伝説と解決策』
[db tech showcase Tokyo 2018] #dbts2018 #B17 『オラクル パフォーマンス チューニング - 神話、伝説と解決策』
 
Introduction to Computer Hardware slides ppt
Introduction to Computer Hardware slides pptIntroduction to Computer Hardware slides ppt
Introduction to Computer Hardware slides ppt
 
system unit and Motherboard
system unit and Motherboardsystem unit and Motherboard
system unit and Motherboard
 
Visão geral do hardware do servidor System z e Linux on z - Concurso Mainframe
Visão geral do hardware do servidor System z e Linux on z - Concurso MainframeVisão geral do hardware do servidor System z e Linux on z - Concurso Mainframe
Visão geral do hardware do servidor System z e Linux on z - Concurso Mainframe
 
Chs nc2 reviewer - with oral questioning 0
Chs nc2 reviewer - with oral questioning 0Chs nc2 reviewer - with oral questioning 0
Chs nc2 reviewer - with oral questioning 0
 
Chs nc2 reviewer - with oral questioning
Chs nc2 reviewer - with oral questioningChs nc2 reviewer - with oral questioning
Chs nc2 reviewer - with oral questioning
 
Embedded Systems
Embedded SystemsEmbedded Systems
Embedded Systems
 
AIX Performance Tuning Session at STU2017
AIX Performance Tuning Session at STU2017AIX Performance Tuning Session at STU2017
AIX Performance Tuning Session at STU2017
 
Spike yuan server ras and uefi cper final
Spike yuan  server ras and uefi cper finalSpike yuan  server ras and uefi cper final
Spike yuan server ras and uefi cper final
 
Oracle R12 EBS Performance Tuning
Oracle R12 EBS Performance TuningOracle R12 EBS Performance Tuning
Oracle R12 EBS Performance Tuning
 
Future of Power: PureFlex and IBM i - Erik Rex
Future of Power: PureFlex and IBM i - Erik RexFuture of Power: PureFlex and IBM i - Erik Rex
Future of Power: PureFlex and IBM i - Erik Rex
 
5120224.ppt
5120224.ppt5120224.ppt
5120224.ppt
 
”Bare-Metal Container" presented at HPCC2016
”Bare-Metal Container" presented at HPCC2016”Bare-Metal Container" presented at HPCC2016
”Bare-Metal Container" presented at HPCC2016
 
TSRT Crashes
TSRT CrashesTSRT Crashes
TSRT Crashes
 

More from Samsung Open Source Group

The Complex IoT Equation (and FLOSS solutions)
The Complex IoT Equation (and FLOSS solutions)The Complex IoT Equation (and FLOSS solutions)
The Complex IoT Equation (and FLOSS solutions)
Samsung Open Source Group
 
Easy IoT with JavaScript
Easy IoT with JavaScriptEasy IoT with JavaScript
Easy IoT with JavaScript
Samsung Open Source Group
 
Spawny: A New Approach to Logins
Spawny: A New Approach to LoginsSpawny: A New Approach to Logins
Spawny: A New Approach to Logins
Samsung Open Source Group
 
Rapid SPi Device Driver Development over USB
Rapid SPi Device Driver Development over USBRapid SPi Device Driver Development over USB
Rapid SPi Device Driver Development over USB
Samsung Open Source Group
 
Tizen RT: A Lightweight RTOS Platform for Low-End IoT Devices
Tizen RT: A Lightweight RTOS Platform for Low-End IoT DevicesTizen RT: A Lightweight RTOS Platform for Low-End IoT Devices
Tizen RT: A Lightweight RTOS Platform for Low-End IoT Devices
Samsung Open Source Group
 
IoTivity: Smart Home to Automotive and Beyond
IoTivity: Smart Home to Automotive and BeyondIoTivity: Smart Home to Automotive and Beyond
IoTivity: Smart Home to Automotive and Beyond
Samsung Open Source Group
 
IoTivity for Automotive: meta-ocf-automotive tutorial
IoTivity for Automotive: meta-ocf-automotive tutorialIoTivity for Automotive: meta-ocf-automotive tutorial
IoTivity for Automotive: meta-ocf-automotive tutorial
Samsung Open Source Group
 
GENIVI + OCF Cooperation
GENIVI + OCF CooperationGENIVI + OCF Cooperation
GENIVI + OCF Cooperation
Samsung Open Source Group
 
Framework for IoT Interoperability
Framework for IoT InteroperabilityFramework for IoT Interoperability
Framework for IoT Interoperability
Samsung Open Source Group
 
Open Source Metrics to Inform Corporate Strategy
Open Source Metrics to Inform Corporate StrategyOpen Source Metrics to Inform Corporate Strategy
Open Source Metrics to Inform Corporate Strategy
Samsung Open Source Group
 
IoTivity for Automotive IoT Interoperability
IoTivity for Automotive IoT InteroperabilityIoTivity for Automotive IoT Interoperability
IoTivity for Automotive IoT Interoperability
Samsung Open Source Group
 
JerryScript: An ultra-lighteweight JavaScript Engine for the Internet of Thin...
JerryScript: An ultra-lighteweight JavaScript Engine for the Internet of Thin...JerryScript: An ultra-lighteweight JavaScript Engine for the Internet of Thin...
Reliability, Availability and Serviceability on Linux

  • 1. Reliability, Availability and Serviceability on Linux Mauro Carvalho Chehab Linux Kernel Expert Samsung Open Source Group Sep 16, 2013 Open Source Group – Silicon Valley Not to be used for commercial purpose without getting permission All information, opinions and ideas herein are exclusively the author's own opinion © 2013 SAMSUNG Electronics Co.
  • 2. What is RAS (1)
    ● Used originally by IBM to measure mainframe robustness
    ● Reliability
      – Probability that a system will produce correct outputs
      – Generally measured as Mean Time Between Failures (MTBF)
      – Enhanced by features that help to avoid, detect and repair hardware faults
    ● Availability
      – Probability that a system is operational at a given time
      – Generally measured as the percentage of uptime over a period of time
        ● Examples:
          – 99.9% (“three nines”) means 8.76 hours unavailable per year
          – 99.999% (“five nines”) means 5.26 minutes of downtime per year
          – Minimal down-time for service and repair.
          – Detect and correct hardware faults as opposed to detect and repair
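The "nines" arithmetic can be checked with a few lines of Python. This is only a sketch: it assumes a 365-day year, and the function name is illustrative rather than taken from the slides.

```python
# Sketch: convert an availability percentage into yearly downtime.
# Assumption: a 365-day year; the names below are illustrative only.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the yearly downtime, in minutes, implied by an availability level."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability must be between 0 and 100")
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * MINUTES_PER_YEAR


if __name__ == "__main__":
    for level in (99.0, 99.9, 99.999):
        minutes = downtime_minutes_per_year(level)
        print(f"{level}% -> {minutes:.2f} min/year ({minutes / 60:.2f} h/year)")
```

With these assumptions, 99% corresponds to about 3.65 days per year, 99.9% to about 8.76 hours, and 99.999% to about 5.26 minutes.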
  • 3. What is RAS (2)
    ● Serviceability (or maintainability)
      – Simplicity and speed with which a system can be repaired or maintained
      – Generally measured as Mean Time To Repair (MTTR)
      – Can be increased with redundant parts and a higher support grade (24/7/365)
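The slides do not give a formula that ties these metrics together. A commonly used steady-state relation is availability = MTBF / (MTBF + MTTR), where MTTR is the mean time to repair and MTBF is taken as the mean operating time between failures. The sketch below assumes that relation; the function name is illustrative.

```python
# Sketch: steady-state availability from MTBF and MTTR.
# Assumption: availability = MTBF / (MTBF + MTTR), with MTBF taken as
# the mean operating time between failures. Not stated in the slides.


def steady_state_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Return the fraction of time a system is expected to be operational."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR must be non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


if __name__ == "__main__":
    # A system that runs 999 hours between failures and takes 1 hour to repair.
    print(f"{steady_state_availability(999.0, 1.0):.4%}")
```

Under this relation, shortening repair time raises availability just as lengthening the time between failures does, which is why serviceability is part of RAS.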
  • 4. Improving RAS (1)
    ● In order to improve RAS, both IT services and hardware require improvements
    ● Examples of hardware measures
      – CPU – detect errors at instruction execution and in the L1/L2/L3 caches;
      – Memory – add error correction logic (ECC) to detect and correct errors;
      – I/O – add CRC checksums for transferred data (PCIe has such a feature);
      – Storage – RAID, journal file systems, checksums;
      – Power/cooling – component duplication, over-design, surge protector, UPS;
      – System – hot swap of components, predictive failure analysis, partitioning of system components, virtual machines running on redundant servers, clustering, dynamic software update, independent CPU for RAS;
      – RAS servers have features to hot add/replace/remove I/O cards that reduce down-time for adding new hardware. Replacing failing I/O cards based on PCIe AER features;
      – Memory mirroring and active/active and active/standby configurations that reduce down-time
  • 5. Improving RAS (2)
    ● Examples of IT measures
      – 24x7x365 days on-site support; low latency support from their vendors
    ● Usage of Virtual Machines
      – VM migration with minimal application down-time
      – Cloud computing
    ● Predictive analysis
      – Hardware/OS should provide data to detect systems/components degradation
      – It should have tools to analyze and (hot)replace those degraded components
  • 6. RAS features on Linux (1)
    ● Storage error reporting has been supported since early versions, as RAID/SAS/SAN controllers/drivers offer measurements.
      – There are also userspace tools to manage it.
    ● Machine Check Architecture – MCA
      – CPU errors are provided on x86 machines since Pentium 4
      – Depending on the processor, it can also provide memory and bus errors
      – Kernel implements it at mcelog subsystem
        ● Fatal errors produce panic() and are reported at console;
      – At userspace, the mcelog tool reads the corrected/non-fatal error data from time to time and reports at console
        ● The Kernel-userspace API is obscure: userspace receives a dump of a series of registers;
          – Decoding those errors is CPU-specific;
          – Kernel decodes those errors for fatal errors;
          – The userspace tool decodes those errors for non-fatal/corrected ones
      – Errors are reported also via a kernel trace event;
  • 7. RAS features on Linux (2)
    ● EDAC (Error Detection and Correction) subsystem
      – Provides a way to report errors detected by memory controllers to userspace;
      – Some (old) drivers also report PCI errors via EDAC;
      – Kernel decodes the error into the DIMM labels affected by an error;
        ● Errors are reported at console and via a kernel trace event
        ● The association between the memory architecture and DIMM is done via some files that are loaded by a userspace tool (edac-utils or rasdaemon)
      – Most drivers talk directly with the memory controller (MC)
        ● That provides a more reliable error report
        ● BIOS data is not very reliable:
          – In several cases, the same BIOS is used on different machines. The DMI BIOS tables may contain the wrong DIMM labels
          – Race conditions may happen on BIOS that also collect error data
      – There's one driver (ghes_edac) on Kernel 3.9+ that gets errors from BIOS
        ● “firmware first” mode: BIOS tells the OS not to talk with the MC directly
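Besides the console and trace events, EDAC drivers also expose per-memory-controller counters through sysfs. The sketch below reads the corrected (`ce_count`) and uncorrected (`ue_count`) totals for each controller. It assumes the usual `/sys/devices/system/edac/mc/mcN/` layout and that an EDAC driver is loaded; the function name is illustrative.

```python
# Sketch: read EDAC corrected/uncorrected error counters from sysfs.
# Assumptions: an EDAC driver is loaded and exposes the usual
# /sys/devices/system/edac/mc/mcN/{ce_count,ue_count} files.
from pathlib import Path
from typing import Dict

EDAC_MC_ROOT = Path("/sys/devices/system/edac/mc")


def read_edac_counts(root: Path = EDAC_MC_ROOT) -> Dict[str, Dict[str, int]]:
    """Return {"mc0": {"ce_count": N, "ue_count": M}, ...}; empty if EDAC is absent."""
    counts: Dict[str, Dict[str, int]] = {}
    if not root.is_dir():
        return counts
    for mc_dir in sorted(root.glob("mc[0-9]*")):
        entry: Dict[str, int] = {}
        for name in ("ce_count", "ue_count"):
            try:
                entry[name] = int((mc_dir / name).read_text().strip())
            except (OSError, ValueError):
                # Counter missing or unreadable: skip it rather than guess a value.
                continue
        counts[mc_dir.name] = entry
    return counts


if __name__ == "__main__":
    for mc, entry in read_edac_counts().items():
        print(mc, entry)
```

On a machine without EDAC support the function simply returns an empty dictionary.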
  • 8. RAS features on Linux (3)
    ● PCIe AER (Advanced Error Reporting)
      – Some PCIe hardware provides ways to get AER error reports on OS;
      – AER logs data at console;
      – It also reports errors via a kernel trace event
    ● Userspace tools:
      – mcelog – collects and decodes MCA error events on x86;
      – edac-utils – fills DIMM labels data and summarizes memory errors;
      – rasdaemon
        ● Collects errors via kernel trace events from several sources: MCA, EDAC and PCIe AER events
        ● Fills DIMM labels data
        ● Stores error data into a persistent database (sqlite3)
        ● Allows errors to be queried/summarized later
        ● Uses new resources available on Kernel 3.10
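Storing events in sqlite3 means they can be summarized later with plain SQL. The sketch below illustrates the idea on an in-memory database. The table name, columns and sample rows are simplified assumptions made for illustration and are not claimed to match the schema rasdaemon actually creates.

```python
# Sketch: summarizing stored memory error events per DIMM label with SQL.
# Assumptions: the table name, columns and sample rows are illustrative;
# they are NOT claimed to match rasdaemon's real database schema.
import sqlite3
from typing import List, Tuple


def summarize_by_label(conn: sqlite3.Connection) -> List[Tuple[str, str, int]]:
    """Return (label, error type, total count) rows, ordered by label."""
    cursor = conn.execute(
        "SELECT label, err_type, SUM(err_count) "
        "FROM mc_event GROUP BY label, err_type ORDER BY label, err_type"
    )
    return [(label, err_type, int(total)) for label, err_type, total in cursor]


def demo() -> List[Tuple[str, str, int]]:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mc_event (timestamp TEXT, err_count INTEGER, "
        "err_type TEXT, label TEXT)"
    )
    # Hypothetical sample events, for illustration only.
    conn.executemany(
        "INSERT INTO mc_event VALUES (?, ?, ?, ?)",
        [
            ("2013-08-15 19:58:50", 1, "Corrected", "DIMM_A1"),
            ("2013-08-15 19:58:50", 1, "Corrected", "DIMM_B1"),
            ("2013-08-15 20:10:00", 1, "Corrected", "DIMM_A1"),
        ],
    )
    conn.commit()
    return summarize_by_label(conn)


if __name__ == "__main__":
    for label, err_type, total in demo():
        print(f"{err_type} on DIMM Label(s): '{label}' errors: {total}")
```

Grouping by label is what makes the report actionable: the administrator sees which physical DIMM to replace instead of a controller/channel/slot tuple.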
  • 9. Typical D-RAM implementation
    [Figure: a rank (Rank 1) made of Banks 0 to 7; each bank is a DRAM memory matrix addressed through a row decoder and a column decoder. Source: http://lwn.net/Articles/250967/]
    A DIMM can have 1, 2 or 4 ranks
  • 10. Memory Arrangements on PC
    [Figure panels: IBM PC original architecture; Classic server (most common) architecture; AMD-64 and newer Intel CPU architecture (Nehalem, Sandy Bridge and upcoming)]
    NOTE: Nehalem-EX has an additional buffer chip between the RAM and CPU, called Intel SMB (Intel Scalable Memory Buffer)
    ● It means that the CPU memory controller doesn't see the DIMMs directly
    ● This is to improve performance when there are lots of CPU sockets (-EX machines)
    ● Only BIOS knows how the memory is organized on Nehalem-EX
    Images taken from: http://lwn.net/Articles/250967/
  • 11. Evolution of RAS on Kernel (1)
    ● Before Kernel 2.6.32
      – EDAC reports errors via dmesg
      – On EDAC, memory controllers are assumed to be rank-based
      – mcelog reports errors via its own interface only
    ● Kernel 2.6.32
      – Added kernel trace events on MCE;
    ● Kernel 3.5
      – EDAC/HERM patches added support for modern memory architectures
        ● Proper support for modern Intel CPUs/MCs (Intel systems from 2002 onwards)
        ● Memory controllers are DIMM-based
        ● Memory controllers can be grouped in branches (FB-DIMM)
      – Added kernel trace events for EDAC;
  • 12. Evolution of RAS on Kernel (2)
    ● Kernel 3.9
      – Added firmware first EDAC driver (ghes_edac);
      – Added trace events for PCIe AER;
    ● Kernel 3.10 brings a series of new features useful for RAS events tracing:
      – Allows creating an independent tracing facility for each process using traces;
      – Added blocking functionality to trace_pipe_raw;
      – Added “uptime” clock reference for tracing events;
    ● While the rasdaemon tool works with kernels below 3.10, it is optimized to use the new features found on Kernel 3.10.
  • 13. Firmware First x Hardware First
    ● Hardware-first approach
      – Errors come directly from the hardware
      – BIOS doesn't handle it
        ● It is faster
        ● Can help to avoid long SMI interrupts
      – Requires a deep knowledge of the hardware
    ● Firmware-first approach
      – BIOS and/or dedicated CPUs collect errors
      – OS doesn't need deep knowledge of the hardware
      – BIOS can mask/group errors, apply proprietary algorithms, avoid spurious reports
      – Unfortunately, the current ACPI API doesn't expose the memory slot label, which makes it harder for the system admin to use
  • 14. RASDAEMON (1)
    ● It is a new tool
      – Currently provided on Fedora 18, Fedora 19 and rawhide
      – Has:
        ● A daemon that waits for kernel trace events (rasdaemon)
        ● A tool to configure DIMMs and do RAS reports (ras-mc-ctl)
        ● Some contrib tools to test EDAC and to fake inject errors
      – Example:
        ● Dell T620 with 2 Sandy Bridge-EP Xeon CPUs (E5-2670)
        ● 2 8GB dual-rank DIMMs (Samsung M393B1K70DH0-YK0)
        ● Driver: sb-edac
    ras-mc-ctl --layout
           +-----------------------------------------------------------------------------------------------+
           |                      mc0                      |                      mc1                      |
           | channel0  | channel1  | channel2  | channel3  | channel0  | channel1  | channel2  | channel3  |
    -------+-----------------------------------------------------------------------------------------------+
    slot2: |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |
    slot1: |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |
    slot0: |  8192 MB  |     0 MB  |     0 MB  |     0 MB  |  8192 MB  |     0 MB  |     0 MB  |     0 MB  |
    -------+-----------------------------------------------------------------------------------------------+
  • 15. RASDAEMON (2)
    $ ras-mc-ctl --print-labels
    LOCATION               CONFIGURED LABEL   SYSFS CONTENTS
    mc0 channel 0 slot 0   DIMM_A1            CPU_SrcID#0_Channel#0_DIMM#0
                           DIMM_A2            0:0:1 missing
                           DIMM_A3            0:0:2 missing
                           DIMM_A4            0:0:3 missing
                           DIMM_A5            0:1:0 missing
                           DIMM_A6            0:1:1 missing
                           DIMM_A7            0:1:2 missing
                           DIMM_A8            0:1:3 missing
                           DIMM_A9            0:2:0 missing
                           DIMM_A10           0:2:1 missing
                           DIMM_A11           0:2:2 missing
                           DIMM_A12           0:2:3 missing
    mc1 channel 0 slot 0   DIMM_B1            CPU_SrcID#1_Channel#0_DIMM#0
                           DIMM_B2            1:0:1 missing
                           DIMM_B3            1:0:2 missing
                           DIMM_B4            1:0:3 missing
                           DIMM_B5            1:1:0 missing
                           DIMM_B6            1:1:1 missing
                           DIMM_B7            1:1:2 missing
                           DIMM_B8            1:1:3 missing
                           DIMM_B9            1:2:0 missing
                           DIMM_B10           1:2:1 missing
                           DIMM_B11           1:2:2 missing
                           DIMM_B12           1:2:3 missing
    # ras-mc-ctl --register-labels
  • 16. RASDAEMON (3)
    $ util/ras-mc-ctl --print-labels
    LOCATION               CONFIGURED LABEL   SYSFS CONTENTS
    mc0 channel 0 slot 0   DIMM_A1            DIMM_A1
                           DIMM_A2            0:0:1 missing
                           DIMM_A3            0:0:2 missing
                           DIMM_A4            0:0:3 missing
                           DIMM_A5            0:1:0 missing
                           DIMM_A6            0:1:1 missing
                           DIMM_A7            0:1:2 missing
                           DIMM_A8            0:1:3 missing
                           DIMM_A9            0:2:0 missing
                           DIMM_A10           0:2:1 missing
                           DIMM_A11           0:2:2 missing
                           DIMM_A12           0:2:3 missing
    mc1 channel 0 slot 0   DIMM_B1            DIMM_B1
                           DIMM_B2            1:0:1 missing
                           DIMM_B3            1:0:2 missing
                           DIMM_B4            1:0:3 missing
                           DIMM_B5            1:1:0 missing
                           DIMM_B6            1:1:1 missing
                           DIMM_B7            1:1:2 missing
                           DIMM_B8            1:1:3 missing
                           DIMM_B9            1:2:0 missing
                           DIMM_B10           1:2:1 missing
                           DIMM_B11           1:2:2 missing
                           DIMM_B12           1:2:3 missing
  • 17. RASDAEMON (4)
    # rasdaemon -r -f
    overriding event (931) ras:mc_event with new print handler
    rasdaemon: ras:mc_event event enabled
    rasdaemon: Enabled event ras:mc_event
    overriding event (847) ras:aer_event with new print handler
    rasdaemon: ras:aer_event event enabled
    rasdaemon: Enabled event ras:aer_event
    overriding event (56) mce:mce_record with new print handler
    rasdaemon: mce:mce_record event enabled
    rasdaemon: Enabled event mce:mce_record
    rasdaemon: Listening to events for cpus 0 to 31
    Calling ras_mc_event_opendb()
    rasdaemon: cpu 0: Recording events at /var/lib/rasdaemon/ras-mc_event.db
    cpu 12:rasdaemon: mc_event store: 0x19c6968
    rasdaemon: register inserted at db
    <...>-2742 [732433178] 2507.782000: mc_event: 2013-08-15 19:58:50 -0300 1 Corrected error: FAKE ERROR on DIMM_A1 (mc: 0 location: 0:0:0 grain: 7 for EDAC testing only)
    cpu 12:rasdaemon: mc_event store: 0x19c6968
    <...>-2742 [732433178] 2507.864000: mc_event: 2013-08-15 19:58:50 -0300 1 Corrected error: FAKE ERROR on DIMM_B1 (mc: 1 location: 0:0:0 grain: 7 for EDAC testing only)
    cpu 12:rasdaemon: mc_event store: 0x19c6968
    # ras-mc-ctl --summary
    Memory controller events summary:
    Corrected on DIMM Label(s): 'DIMM_A1' location: 0:0:0:0 errors: 1
    Corrected on DIMM Label(s): 'DIMM_B1' location: 1:0:0:0 errors: 1
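The `mc_event` lines printed by rasdaemon in foreground mode follow a regular shape, so a script can pull the interesting fields out of them. The sketch below parses lines formatted like the two shown in this slide. The regular expression is derived only from that sample output, so other drivers or rasdaemon versions may print different fields; the function name is illustrative.

```python
# Sketch: extract fields from an mc_event line as printed in the slide.
# Assumption: the pattern is derived only from the sample output shown
# in this deck; other rasdaemon versions may format lines differently.
import re
from typing import Dict, Optional, Union

MC_EVENT_RE = re.compile(
    r"mc_event:\s+"
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} [+-]\d{4})\s+"
    r"(?P<count>\d+)\s+"
    r"(?P<severity>\w+) error:\s+"
    r"(?P<message>.*?) on (?P<label>\S+)\s+"
    r"\(mc: (?P<mc>\d+) location: (?P<location>[\d:]+)"
)


def parse_mc_event(line: str) -> Optional[Dict[str, Union[str, int]]]:
    """Return the parsed fields of an mc_event line, or None if it does not match."""
    match = MC_EVENT_RE.search(line)
    if match is None:
        return None
    fields: Dict[str, Union[str, int]] = dict(match.groupdict())
    fields["count"] = int(match.group("count"))
    fields["mc"] = int(match.group("mc"))
    return fields


if __name__ == "__main__":
    sample = (
        "<...>-2742 [732433178] 2507.782000: mc_event: 2013-08-15 19:58:50 -0300 "
        "1 Corrected error: FAKE ERROR on DIMM_A1 "
        "(mc: 0 location: 0:0:0 grain: 7 for EDAC testing only)"
    )
    print(parse_mc_event(sample))
```

Lines that are not memory controller events, such as the `rasdaemon:` status messages, return `None` and can be skipped by the caller.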
  • 18. Thank you. Questions?