On an X Architecture compute node, both hardware alerts and UEFI alerts are detected and sent to the Lenovo xClarity Administrator.
The following diagram shows the flow of alerts from an X Architecture compute node to the Lenovo xClarity Administrator:
Alert flow (hardware released 11/2014 or later)
To understand how alerts flow through the Flex System chassis when an error is detected on an X Architecture compute node, consider an example in which the correctable Error Correction Code (ECC) memory error logging threshold for an X Architecture compute node is reached. This event is a predictive failure alert (PFA): the X Architecture compute node continues to function, but a memory failure could occur later. When the management processor on the X Architecture compute node detects this problem, the following actions occur:
- The problem is logged by the management processor in the system event log on the X Architecture compute node.
The following example shows how the event might appear in the system event log:
58001 - The PFA Threshold limit (correctable error logging limit) has been exceeded on DIMM number %
at address %. MCx_Status contains % and MCx_Misc contains %.
UEFI also detects this problem and logs the same event in the system event log:
58001 - The PFA Threshold limit (correctable error logging limit) has been exceeded on DIMM number %
at address %. MCx_Status contains % and MCx_Misc contains %.
- An alert is sent from the management processor on the X Architecture compute node to the CMM and posted to the event log on the CMM.
The following example shows the error that might be posted to the CMM event log:
58001 - The PFA Threshold limit (correctable error logging limit) has been exceeded on DIMM number %
at address %. MCx_Status contains % and MCx_Misc contains %.
- An alert is also sent from the management processor on the X Architecture compute node to the Lenovo xClarity Administrator and posted
to the event log on the Lenovo xClarity Administrator.
The following example shows the error that might be posted to the event log on the Lenovo xClarity Administrator:
58001 - The PFA Threshold limit (correctable error logging limit) has been exceeded on DIMM number %
at address %. MCx_Status contains % and MCx_Misc contains %.
Note: If call home is enabled on the Lenovo xClarity Administrator, the Lenovo xClarity Administrator sends a notification, including the collected service data, to the Support team.
- An alert is sent from the CMM to the Lenovo xClarity Administrator. The Lenovo xClarity Administrator ignores this alert because it already received the same event directly from the X Architecture compute node.
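The four actions above can be modeled as a small simulation, which makes it easier to see why the management server ends up with one copy of the event even though it receives two. This is a minimal sketch, not the Lenovo xClarity Administrator implementation or API: the names `Alert`, `ManagementServer`, and `pfa_flow` are hypothetical, and it assumes that duplicates are recognized by matching the compute node and event ID, and that the alert from the node arrives before the copy relayed by the CMM. The real matching criteria are not documented here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    event_id: str  # event ID from the message, for example "58001"
    node: str      # hypothetical identifier for the compute node
    source: str    # "node" (management processor) or "cmm"


class ManagementServer:
    """Hypothetical model of the event log on the management server."""

    def __init__(self) -> None:
        self.event_log: list[Alert] = []
        # Assumption: (node, event ID) pairs already reported directly by a node.
        self._from_node: set[tuple[str, str]] = set()

    def receive(self, alert: Alert) -> bool:
        """Post the alert and return True, or ignore a CMM duplicate and return False."""
        key = (alert.node, alert.event_id)
        if alert.source == "cmm" and key in self._from_node:
            # The node already sent this event directly, so the CMM copy is dropped.
            return False
        if alert.source == "node":
            self._from_node.add(key)
        self.event_log.append(alert)
        return True


def pfa_flow(node: str, event_id: str) -> tuple[list[str], list[str], ManagementServer]:
    """Walk through the actions listed above for a single PFA event."""
    node_sel: list[str] = []  # system event log on the compute node
    cmm_log: list[str] = []   # event log on the CMM
    server = ManagementServer()

    # 1. The management processor logs the event in the node's system event log,
    #    and UEFI logs the same event there as well.
    node_sel.append(event_id)
    node_sel.append(event_id)
    # 2. The management processor sends the alert to the CMM, which posts it.
    cmm_log.append(event_id)
    # 3. The management processor also sends the alert to the management server.
    server.receive(Alert(event_id, node, "node"))
    # 4. The CMM forwards its copy, which the management server ignores.
    server.receive(Alert(event_id, node, "cmm"))
    return node_sel, cmm_log, server


sel, cmm, server = pfa_flow("node01", "58001")
print(len(sel), len(cmm), len(server.event_log))  # prints "2 1 1"
```

Keying on the node and event ID alone is a simplification: it would also suppress a later, genuinely new CMM alert with the same ID, so a real deduplication rule would likely add a time window or sequence information.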
Alert flow (hardware released before 11/2014)
To understand how alerts flow through the Flex System chassis when an error is detected on an X Architecture compute node, consider an example in which the correctable Error Correction Code (ECC) memory error logging threshold for an X Architecture compute node is reached. This event is a predictive failure alert (PFA): the X Architecture compute node continues to function, but a memory failure could occur later. When the management processor on the X Architecture compute node detects this problem, the following actions occur:
- The problem is logged by the management processor in the system event log on the X Architecture compute node.
The following example shows how the event might appear in the system event log:
806F050C - 2001xxxx - Memory Logging Limit Reached for DIMM_number on MemoryElementName.
UEFI also detects this problem and logs its own event in the system event log. The following example shows how a UEFI diagnostic event might look in the system event log:
E.0058001 - PFA Threshold Exceeded.
- An alert is sent from the management processor on the X Architecture compute node to the CMM and posted to the event log on the CMM.
The following example shows the error that might be posted to the CMM event log:
0x77777773 - Compute_node_bay: The correctable Error Correct Code (ECC) memory logging threshold for the specified blade server was reached. The system will continue to run. Refer to the steps in the user response before replacing a DIMM.
In addition, the IMM message is listed in the details of this event to enable you to correlate the message on the CMM with the message that appears in the system event log for the X Architecture compute node.
- An alert is also sent from the management processor on the X Architecture compute node to the Lenovo xClarity Administrator and posted
to the event log on the Lenovo xClarity Administrator.
The following example shows the error that might be posted to the event log on the Lenovo xClarity Administrator:
806F050C - 2001xxxx - Memory Logging Limit Reached for DIMM 1 on MemoryElementName.
Note: If call home is enabled on the Lenovo xClarity Administrator, the Lenovo xClarity Administrator sends a notification, including the collected service data, to the Support team.
- An alert is sent from the CMM to the Lenovo xClarity Administrator. The Lenovo xClarity Administrator ignores this alert because it already received the same event directly from the X Architecture compute node.
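On hardware released before 11/2014, the CMM reports the problem under its own event ID (0x77777773) while the compute node logs it as 806F050C, so the two messages cannot be matched by ID directly. Because the CMM event details include the IMM message, that message can serve as the correlation key. The following is a minimal sketch under stated assumptions, not a documented interface: the `CmmEvent` record, its fields, and the functions `imm_event_id` and `correlate` are hypothetical, and it assumes the IMM message in the details begins with an eight-digit hexadecimal event ID followed by " - ", as in the examples above.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CmmEvent:
    event_id: str          # CMM event ID, for example "0x77777773"
    compute_node_bay: str  # hypothetical field for the bay of the compute node
    message: str           # CMM message text
    details: str           # assumed to contain the IMM message text


# Assumption: an IMM event ID is eight uppercase hex digits followed by " - ".
# The leading \b keeps "0x77777773" from matching, because "x7" has no word boundary.
IMM_ID = re.compile(r"\b([0-9A-F]{8})\s+-\s")


def imm_event_id(details: str) -> Optional[str]:
    """Extract the IMM event ID from the CMM event details, if one is present."""
    match = IMM_ID.search(details)
    return match.group(1) if match else None


def correlate(cmm_event: CmmEvent, node_sel: List[str]) -> List[str]:
    """Return the system event log entries that match the IMM message in the CMM event."""
    imm_id = imm_event_id(cmm_event.details)
    if imm_id is None:
        return []
    return [entry for entry in node_sel if entry.startswith(imm_id)]


node_sel = [
    "806F050C - 2001xxxx - Memory Logging Limit Reached for DIMM 1 on MemoryElementName.",
    "E.0058001 - PFA Threshold Exceeded.",
]
event = CmmEvent(
    event_id="0x77777773",
    compute_node_bay="1",
    message="The correctable Error Correct Code (ECC) memory logging threshold ...",
    details=node_sel[0],
)
print(correlate(event, node_sel))
```

In this sketch only the IMM entry is returned; the UEFI diagnostic event (E.0058001) uses a different ID format and would need its own matching rule.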