11002: [11002] A processor mismatch has been detected between one or more processors in the system.
User Response:
Complete the following steps:
- This message might occur together with messages about other processor configuration problems. Resolve those messages first.
- If the problem persists, ensure that matching processors are installed (for example, processors with matching option part numbers).
- Verify that the processors are installed in the correct sockets. If not, correct that problem.
- Check the IBM support site for an applicable service bulletin or firmware update that applies to this processor error.
- Replace the mismatched processor. Inspect the processor socket, and replace the system board first if the socket is damaged.
11004: [11004] A processor within the system has failed the BIST.
User Response:
Complete the following steps:
- If the processor or firmware was just updated, check the IBM support site for an applicable service bulletin or firmware update that applies to this processor error.
- If there are multiple processors, swap processors to move the affected processor to another processor socket and retry. If the problem follows the affected processor, or if this is a single-processor system, replace the processor. Inspect the processor socket each time a processor is removed, and replace the system board first if the socket is damaged or misaligned pins are found.
- Replace the system board.
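The swap test in the steps above follows a simple isolation rule. If a fault moves with a part, the part is at fault. If the fault stays in the same place, the socket or system board is at fault. The following Python sketch restates that rule for the 11004 procedure. The function name and its arguments are hypothetical and are not part of any IBM tool. The sketch encodes only the decision that the documented steps describe.

```python
from typing import Optional


def bist_next_step(num_processors: int, original_socket: int,
                   moved_to: Optional[int],
                   failing_socket_after_swap: Optional[int]) -> str:
    """Restate the 11004 isolation rule (hypothetical helper, not an IBM tool).

    original_socket: socket that first reported the BIST failure.
    moved_to: socket the suspect processor was moved to (None if not swapped).
    failing_socket_after_swap: socket reporting the failure after the swap.
    """
    # A single-processor system has nothing to swap with: replace the processor.
    if num_processors == 1:
        return "replace processor"
    # The failure moved with the processor, so the processor is at fault.
    if moved_to is not None and failing_socket_after_swap == moved_to:
        return "replace processor"
    # The failure stayed in the original socket with a different processor in it,
    # so suspect the socket and system board (inspect the pins on every removal).
    if failing_socket_after_swap == original_socket:
        return "replace system board"
    # Any other outcome is not covered by the documented steps.
    return "inconclusive"


if __name__ == "__main__":
    # Processor moved from socket 0 to socket 1, and the failure now shows on socket 1.
    print(bist_next_step(2, 0, 1, 1))
```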
1100C: [1100C] An uncorrectable error has been detected on processor %.
User Response:
Complete the following steps:
- Check the IBM support site for an applicable service bulletin or firmware update that applies to this error.
- Reboot the system. If the problem persists, contact Support.
18005: [18005] A discrepancy has been detected in the number of cores reported by one or more processor packages within the system.
User Response:
Complete the following steps:
- If this is a newly installed option, ensure that matching processors are installed in the correct processor sockets.
- Check the IBM support site for an applicable service bulletin that applies to this processor error.
- Replace the processor. Inspect the processor socket and replace the system board first if the socket is damaged.
18006: [18006] A mismatch between the maximum allowed QPI link speed has been detected for one or more processor packages.
User Response:
Complete the following steps:
- If this is a newly installed option, ensure that matching processors are installed in the correct processor sockets.
- Check the IBM support site for an applicable service bulletin or firmware update that applies to this processor error.
- Replace the processor. Inspect the processor socket and replace the system board first if the socket is damaged.
18007: [18007] A power segment mismatch has been detected for one or more processor packages.
User Response:
Complete the following steps:
- The installed processors do not have the same power requirements.
- Ensure that all processors have matching power requirements (such as 65, 95, or 130 watts).
- If power requirements match, check the IBM support site for an applicable service bulletin or firmware update that applies to this processor error.
- Replace the processor. Inspect the processor socket and replace the system board first if the socket is damaged.
18009: [18009] A core speed mismatch has been detected for one or more processor packages.
User Response:
Complete the following steps:
- Verify that matching processors are installed in the correct processor sockets. Correct any mismatch issues found.
- Check the IBM support site for an applicable service bulletin or firmware update that applies to this processor error.
- Replace the processor. Inspect the processor socket and replace the system board first if the socket is damaged.
1800A: [1800A] A mismatch has been detected between the speed at which a QPI link has trained between two or more processor packages.
User Response:
Complete the following steps:
- Verify that the processor is a valid option that is listed as a Server Proven device for this system. If it is not, remove the processor and install one that is listed on the Server Proven website.
- Verify that matching processors are installed in the correct processor sockets. Correct any mismatch found.
- Check the IBM support site for an applicable service bulletin or firmware update that applies to this processor error.
- Replace the processor. Inspect the processor socket and replace the system board first if the socket is damaged.
1800B: [1800B] A cache size mismatch has been detected for one or more processor packages.
User Response:
Complete the following steps:
- Verify that matching processors are installed in the correct processor sockets. Correct any mismatch found.
- Check the IBM support site for an applicable service bulletin or firmware update that applies to this processor error.
- Replace the system board.
1800C: [1800C] A cache type mismatch has been detected for one or more processor packages.
User Response:
Complete the following steps:
- Verify that matching processors are installed in the correct processor sockets. Correct any mismatch found.
- Check the IBM support site for an applicable service bulletin or firmware update that applies to this processor error.
- Replace the system board.
1800D: [1800D] A cache associativity mismatch has been detected for one or more processor packages.
User Response:
Complete the following steps:
- Verify that matching processors are installed in the correct processor sockets. Correct any mismatch found.
- Check the IBM support site for an applicable service bulletin or firmware update that applies to this processor error.
- Replace the system board.
1800E: [1800E] A processor model mismatch has been detected for one or more processor packages.
User Response:
Complete the following steps:
- Verify that matching processors are installed in the correct processor sockets. Correct any mismatch found.
- Check the IBM support site for an applicable service bulletin or firmware update that applies to this processor error.
- Replace the system board.
1800F: [1800F] A processor family mismatch has been detected for one or more processor packages.
User Response:
Complete the following steps:
- Verify that matching processors are installed in the correct processor sockets. Correct any mismatch found.
- Check the IBM support site for an applicable service bulletin or firmware update that applies to this processor error.
- Replace the system board.
18010: [18010] A processor stepping mismatch has been detected for one or more processor packages.
User Response:
Complete the following steps:
- Verify that matching processors are installed in the correct processor sockets. Correct any mismatch found.
- Check the IBM support site for an applicable service bulletin or firmware update that applies to this processor error.
- Replace the system board.
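Each of codes 18005, 18006, 18007, 18009, and 1800B through 18010 corresponds to one processor attribute that differs between packages. You might collect those attributes for each installed processor, for example from the option documentation or the Setup Utility. A comparison like the following Python sketch then shows which codes to expect. The `ProcessorInfo` field names and the sample values are illustrative assumptions. Rated wattage stands in for the power segment, and code 1800A (trained link speed) is not modeled. This sketch does not show how UEFI itself performs the check.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class ProcessorInfo:
    """Attributes compared by the mismatch checks (illustrative field names)."""
    cores: int
    max_qpi_gts: float   # maximum allowed QPI link speed, GT/s
    power_watts: int     # stand-in for the power segment (e.g. 65, 95, 130 W)
    core_mhz: int
    cache_kb: int
    cache_type: str
    cache_assoc: int
    model: int
    family: int
    stepping: int


# Event code reported when the given attribute differs between packages.
MISMATCH_CODES = {
    "cores": "18005",
    "max_qpi_gts": "18006",
    "power_watts": "18007",
    "core_mhz": "18009",
    "cache_kb": "1800B",
    "cache_type": "1800C",
    "cache_assoc": "1800D",
    "model": "1800E",
    "family": "1800F",
    "stepping": "18010",
}


def mismatch_codes(packages: List[ProcessorInfo]) -> List[str]:
    """Return the code for every attribute that is not identical across packages."""
    codes = []
    for f in fields(ProcessorInfo):
        if len({getattr(p, f.name) for p in packages}) > 1:
            codes.append(MISMATCH_CODES[f.name])
    return codes


if __name__ == "__main__":
    a = ProcessorInfo(8, 8.0, 95, 2600, 20480, "unified", 20, 45, 6, 7)
    b = ProcessorInfo(6, 8.0, 95, 2600, 20480, "unified", 20, 45, 6, 4)
    print(mismatch_codes([a, b]))  # ['18005', '18010']
```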
2011001: [2011001] An Uncorrected PCIe Error has Occurred at Bus % Device % Function %. The Vendor ID for the device is % and the Device ID is %.
User Response:
Complete the following steps:
- If this compute node and/or any attached cables were recently installed, moved, serviced, or upgraded:
- Reseat the adapter and any attached cables.
- Reload the device driver.
- If the device is not recognized, you might need to reconfigure the slot to Gen1 or Gen2. Gen1/Gen2 settings can be configured through F1 Setup -> System Settings -> Devices and I/O Ports -> PCIe Gen1/Gen2/Gen3 Speed Selection, or with the ASU utility.
- Check the IBM support site for an applicable device driver, firmware update, or other information that applies to this error. Load the new device driver and install any required firmware updates.
- If the problem persists, remove the adapter. If the system reboots successfully without the adapter, replace that adapter.
- Replace the processor.
2018001: [2018001] An Uncorrected PCIe Error has Occurred at Bus % Device % Function %. The Vendor ID for the device is % and the Device ID is %.
User Response:
Complete the following steps:
- If this compute node and/or any attached cables were recently installed, moved, serviced, or upgraded:
- Reseat the adapter and any attached cables.
- Reload the device driver.
- If the device is not recognized, you might need to reconfigure the slot to Gen1 or Gen2. Gen1/Gen2 settings can be configured through F1 Setup -> System Settings -> Devices and I/O Ports -> PCIe Gen1/Gen2/Gen3 Speed Selection, or with the ASU utility.
- Check the IBM support site for an applicable device driver, firmware update, or other information that applies to this error. Load the new device driver and install any required firmware updates.
- If the problem persists, remove the adapter. If the system reboots successfully without the adapter, replace that adapter.
- Replace the processor.
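The placeholders in the 2011001 and 2018001 messages identify the failing PCIe function. The following Python sketch extracts those values from a logged message and formats the address the way Linux `lspci` prints it. You can then match the entry to an installed adapter. The sketch assumes that the logged values are hexadecimal, which is the usual notation for PCI bus, device, vendor, and device IDs. The function name and the sample values are hypothetical.

```python
import re
from typing import Dict

# Matches the 2011001/2018001 message text after the placeholders are filled in.
PCIE_ERROR = re.compile(
    r"\[(?P<code>[0-9A-F]+)\] An Uncorrected PCIe Error has Occurred at "
    r"Bus (?P<bus>[0-9A-Fa-f]+) Device (?P<dev>[0-9A-Fa-f]+) "
    r"Function (?P<fn>[0-9A-Fa-f]+)\. The Vendor ID for the device is "
    r"(?P<vid>[0-9A-Fa-f]+) and the Device ID is (?P<did>[0-9A-Fa-f]+)\."
)


def parse_pcie_error(message: str) -> Dict[str, str]:
    """Extract the event code, lspci-style bus address, and vendor:device pair."""
    m = PCIE_ERROR.search(message)
    if m is None:
        raise ValueError("not an uncorrected PCIe error message")
    # Assumption: the logged bus/device/function values are hexadecimal.
    bus, dev, fn = (int(m.group(g), 16) for g in ("bus", "dev", "fn"))
    return {
        "code": m.group("code"),
        "bdf": f"{bus:02x}:{dev:02x}.{fn:x}",
        "vendor_device": f"{m.group('vid').lower()}:{m.group('did').lower()}",
    }


if __name__ == "__main__":
    # Hypothetical log line with the placeholders filled in.
    sample = ("[2018001] An Uncorrected PCIe Error has Occurred at Bus 1A Device 00 "
              "Function 0. The Vendor ID for the device is 8086 and the Device ID is 1528.")
    print(parse_pcie_error(sample))
```

On a Linux operating system running on the node, `lspci -s 1a:00.0` or `lspci -d 8086:1528` would then list the matching device. This assumes that the operating system numbers the buses the same way UEFI did.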
2018002: [2018002] The device found at Bus % Device % Function % could not be configured due to resource constraints. The Vendor ID for the device is % and the Device ID is %.
User Response:
Complete the following steps:
- If this PCIe device and/or any attached cables were recently installed, moved, serviced, or upgraded, reseat the adapter and any attached cables.
- Check the IBM support site for any applicable service bulletin or UEFI or adapter firmware update that applies to this error.
NOTE: You might need to disable unused option ROMs by using UEFI F1 Setup, ASU, or the adapter manufacturer's utilities so that the adapter firmware can be updated.
- Move the adapter to a different slot. If a slot is not available or error recurs, replace the adapter.
- If the adapter was moved to a different slot and the error did not recur, verify that this is not a system limitation. Then replace the system board.
Also, if this is not the initial installation and the error persists after adapter replacement, replace the system board.
2018003: [2018003] A bad option ROM checksum was detected for the device found at Bus % Device % Function %. The Vendor ID for the device is % and the Device ID is %.
User Response:
Complete the following steps:
- If this PCIe device and/or any attached cables were recently installed, moved, serviced, or upgraded, reseat the adapter and any attached cables.
- Move the adapter to a different system slot, if available.
- Check the IBM support site for any applicable service bulletin or UEFI or adapter firmware update that applies to this error.
NOTE: You might need to configure the slot to Gen1 or use special utility software so that the adapter firmware can be upgraded. Gen1/Gen2 settings can be configured through F1 Setup -> System Settings -> Devices and I/O Ports -> PCIe Gen1/Gen2/Gen3 Speed Selection, or with the ASU utility.
- Replace the adapter.
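A legacy PCI option ROM image has three defining features:

- It begins with the signature bytes 55h AAh.
- It stores the image length in 512-byte units at offset 2.
- It is built so that all bytes of the image sum to zero modulo 256.

A bad checksum means that this sum is not zero. The following Python sketch performs the check on a ROM image that you have already saved to a file, for example with the adapter vendor's utility. It assumes a legacy (non-UEFI) image and checks only the first image in the file. It is not the check that UEFI itself runs.

```python
import sys
from pathlib import Path


def option_rom_checksum_ok(image: bytes) -> bool:
    """Verify a legacy PCI option ROM: 55h AAh signature and zero 8-bit checksum."""
    if len(image) < 3 or image[0] != 0x55 or image[1] != 0xAA:
        return False  # no option ROM signature
    length = image[2] * 512  # byte 2 holds the image size in 512-byte blocks
    if length == 0 or length > len(image):
        return False  # truncated or malformed image
    # All bytes of the first image must sum to 0 modulo 256.
    return sum(image[:length]) % 256 == 0


if __name__ == "__main__" and len(sys.argv) > 1:
    rom = Path(sys.argv[1]).read_bytes()
    print("checksum OK" if option_rom_checksum_ok(rom) else "bad checksum")
```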
3020007: [3020007] A firmware fault has been detected in the UEFI image.
User Response:
Complete the following steps:
- Check the IBM support site for an applicable service bulletin or firmware update that applies to this error.
- Update the UEFI image.
- Replace the system board.
3028002: [3028002] Boot permission timeout detected.
User Response:
Complete the following steps:
- Check the CMM and IMM logs for communication errors, and resolve them.
- Reseat the system.
- If the problem persists, contact Support.
3030007: [3030007] A firmware fault has been detected in the UEFI image.
User Response:
Complete the following steps:
- Check the IBM support site for an applicable service bulletin or firmware update that applies to this error.
- Update the UEFI image.
- Replace the system board.
3040007: [3040007] A firmware fault has been detected in the UEFI image.
User Response:
Complete the following steps:
- Check the IBM support site for an applicable service bulletin or firmware update that applies to this error.
- Update the UEFI image.
- Replace the system board.
3048005: [3048005] UEFI has booted from the backup flash bank.
User Response:
Complete the following steps:
Return the system to the primary bank.
3048006: [3048006] UEFI has booted from the backup flash bank due to an Automatic Boot Recovery (ABR) event.
User Response:
Complete the following steps:
- Check the IBM support site for an applicable service bulletin or firmware update that applies to this error.
- Update the primary UEFI image.
- Replace the system board.
3050007: [3050007] A firmware fault has been detected in the UEFI image.
User Response:
Complete the following steps:
- Check the IBM support site for an applicable service bulletin or firmware update that applies to this error.
- Update the UEFI image.
- Replace the system board.
305000A: [305000A] An invalid date and time have been detected.
User Response:
Complete the following steps:
- Check the IMM or chassis event log. This event should immediately precede a 0068002 error. Resolve that event or any other battery-related errors.
- Use F1 Setup to reset the date and time. If the problem returns after a system reset, replace the CMOS battery.
- If the problem persists, check the IBM support site for an applicable service bulletin or firmware update that applies to this error.
- Replace the system board.
3058004: [3058004] A Three Strike boot failure has occurred. The system has booted with default UEFI settings.
User Response:
Complete the following steps:
- This event resets UEFI to the default settings for the next boot. If successful, the Setup Utility is displayed. The original UEFI settings are still present.
- If you did not intentionally trigger the reboots, check logs for probable cause.
- Undo recent system changes (settings or devices added). If there were no recent system changes, remove all options, and then remove the CMOS battery for 30 seconds to clear CMOS contents. Verify that the system boots. Then, reinstall the options one at a time to locate the problem.
- Check the IBM support site for an applicable service bulletin or firmware update that applies to this error.
- Update the UEFI firmware.
- Remove the CMOS battery for 30 seconds to clear CMOS contents, and then reinstall it.
- Replace the system board.
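The recovery behavior described for 3058004 can be modeled as a small counter. Each failed boot increments it, and a successful boot clears it. When it reaches three strikes, the next boot uses default settings while the saved settings are kept. The following Python sketch is a toy model under those assumptions. The class name, the reset-on-success rule, and the `limit` parameter are illustrative and do not describe IBM's firmware internals.

```python
class ThreeStrikeModel:
    """Toy model of a three-strike boot policy (assumed behavior, not IBM firmware)."""

    def __init__(self, limit: int = 3) -> None:
        self.limit = limit   # consecutive failed boots that trigger recovery
        self.failures = 0    # current count of consecutive failed boots

    def settings_for_next_boot(self, last_boot_completed: bool) -> str:
        """Return "saved" or "defaults" for the boot that follows."""
        if last_boot_completed:
            self.failures = 0  # a good boot clears the strike count
            return "saved"
        self.failures += 1
        if self.failures >= self.limit:
            self.failures = 0
            # Defaults are used for the next boot only; saved settings are kept.
            return "defaults"
        return "saved"


if __name__ == "__main__":
    model = ThreeStrikeModel()
    print([model.settings_for_next_boot(False) for _ in range(3)])
    # ['saved', 'saved', 'defaults']
```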
3058009: [3058009] DRIVER HEALTH PROTOCOL: Missing Configuration. Requires Change Settings From F1.
User Response:
Complete the following steps:
- Go to the Setup Utility (System Settings -> Settings -> Driver Health Status List) and find a driver or controller that reports Configuration Required status.
- Find the driver menu under System Settings, and change the settings as needed.
- Save the settings and restart the system.
305800A: [305800A] DRIVER HEALTH PROTOCOL: Reports 'Failed' Status Controller.
User Response:
Complete the following steps:
- Reboot the system.
- If the problem persists, switch to the backup UEFI image or update the current UEFI image.
- Replace the system board.
305800B: [305800B] DRIVER HEALTH PROTOCOL: Reports 'Reboot' Required Controller.
User Response:
Complete the following steps:
- No action required. The system will reboot at the end of POST.
- If the problem persists, switch to the backup UEFI image or update the current UEFI image.
- Replace the system board.
305800C: [305800C] DRIVER HEALTH PROTOCOL: Reports 'System Shutdown' Required Controller.
User Response:
Complete the following steps:
- Reboot the system.
- If the problem persists, switch to the backup UEFI image or update the current UEFI image.
- Replace the system board.
305800D: [305800D] DRIVER HEALTH PROTOCOL: Disconnect Controller Failed. Requires 'Reboot'.
User Response:
Complete the following steps:
- Reboot the system to reconnect the controller.
- If the problem persists, switch to the backup UEFI image or update the current UEFI image.
- Replace the system board.
305800E: [305800E] DRIVER HEALTH PROTOCOL: Reports Invalid Health Status Driver.
User Response:
Complete the following steps:
- Reboot the system.
- If the problem persists, switch to the backup UEFI image or update the current UEFI image.
- Replace the system board.
3060007: [3060007] A firmware fault has been detected in the UEFI image.
User Response:
Complete the following steps:
- Check the IBM support site for an applicable service bulletin or firmware update that applies to this error.
- Update the UEFI image.
- Replace the system board.
3070007: [3070007] A firmware fault has been detected in the UEFI image.
User Response:
Complete the following steps:
- Check the IBM support site for an applicable service bulletin or firmware update that applies to this error.
- Update the UEFI image.
- Replace the system board.
3108007: [3108007] The default system settings have been restored.
User Response:
Complete the following steps:
Check the IBM support site for an applicable service bulletin or firmware update that applies to this error.
3808000: [3808000] An IMM communication failure has occurred.
User Response:
Complete the following steps:
- Reset the IMM from the CMM.
- Use the CMM to remove auxiliary power from the compute node. This will reboot the compute node.
- Check the IBM support site for an applicable service bulletin or firmware update that applies to this error.
- Update the UEFI firmware.
- Replace the system board.
3808002: [3808002] An error occurred while saving UEFI settings to the IMM.
User Response:
Complete the following steps:
- Use the Setup Utility to verify and save the settings (which will recover the settings).
- Reset the IMM from the CMM.
- Use the CMM to remove auxiliary power from the compute node. This will reboot the compute node.
- Check the IBM support site for an applicable service bulletin or firmware update that applies to this error.
- Update the IMM firmware.
- Remove the CMOS battery for 30 seconds to clear CMOS contents, and then reinstall it.
- Replace the system board.
3808003: [3808003] Unable to retrieve the system configuration from the IMM.
User Response:
Complete the following steps:
- Use the Setup Utility to verify and save the settings (which will recover the settings).
- Reset the IMM from the CMM.
- Use the CMM to remove auxiliary power from the compute node. This will reboot the compute node.
- Check the IBM support site for an applicable service bulletin or firmware update that applies to this error.
- Update the IMM firmware.
- Remove the CMOS battery for 30 seconds to clear CMOS contents, and then reinstall it.
- Replace the system board.
3808004: [3808004] The IMM System Event log (SEL) is full.
User Response:
Complete the following steps:
- Use the IMM Web Interface to clear the event log.
- If IMM communication is unavailable, use the Setup Utility to access the System Event Logs Menu and choose Clear IMM System Event Log and Restart Server.
3818001: [3818001] The firmware image capsule signature for the currently booted flash bank is invalid.
User Response:
Complete the following steps:
- Reboot the system. The system will come up on the backup UEFI image. Update the primary UEFI image.
- If the error does not persist, no additional recovery action is required.
- If the error persists, or if the boot is unsuccessful, replace the system board.
3818002: [3818002] The firmware image capsule signature for the non-booted flash bank is invalid.
User Response:
Complete the following steps:
- Update the backup UEFI image.
- If the error does not persist, no additional recovery action is required.
- If the error persists, or if the boot is unsuccessful, replace the system board.
3818003: [3818003] The CRTM flash driver could not lock the secure flash region.
User Response:
Complete the following steps:
- If the system failed to boot successfully, DC-cycle the system.
- If the system boots to F1 Setup, update the UEFI image and reset the bank to primary (if required). If the system boots without error, recovery is complete and no additional action is required.
- If the system fails to boot, or if the firmware update attempt fails, replace the system board.
3818004: [3818004] The CRTM flash driver could not successfully flash the staging area. A failure occurred.
User Response:
Complete the following steps:
- Continue booting the system. If the system does not reset, manually reset the system.
- If the error is not reported on the subsequent boot, no additional recovery action is required.
- If the error persists, continue booting the system and update the UEFI image.
- Replace the system board.
3818005: [3818005] The CRTM flash driver could not successfully flash the staging area. The update was aborted.
User Response:
Complete the following steps:
- Continue booting the system. If the system does not reset, manually reset the system.
- If the error is not reported on the subsequent boot, no additional recovery action is required.
- If the error persists, continue booting the system and update the UEFI image.
- Replace the system board.
3818007: [3818007] The firmware image capsules for both flash banks could not be verified.
User Response:
Complete the following steps:
- If the system failed to boot successfully, DC-cycle the system.
- If the system boots to F1 Setup, update the UEFI image and reset the bank to primary (if required). If the system boots without error, recovery is complete and no additional action is required.
- If the system fails to boot, or if the firmware update attempt fails, replace the system board.
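Codes 3818001, 3818002, and 3818007 differ only in which flash bank failed capsule signature verification. The following Python sketch maps the two verification results to the event code and to the first recovery step listed for that code. The function name and its boolean inputs are hypothetical, and the firmware does not expose such an interface. Like the 3818001 steps, the sketch assumes that the system booted from the primary bank.

```python
from typing import Optional, Tuple


def capsule_event(booted_bank_ok: bool, other_bank_ok: bool) -> Optional[Tuple[str, str]]:
    """Map capsule signature results to (event code, first recovery step)."""
    if not booted_bank_ok and not other_bank_ok:
        # Neither bank verified.
        return ("3818007", "DC-cycle; if F1 Setup is reached, update the UEFI image")
    if not booted_bank_ok:
        # Booted (primary) bank invalid: the next boot uses the backup bank.
        return ("3818001", "reboot onto the backup image, then update the primary image")
    if not other_bank_ok:
        # Non-booted (backup) bank invalid.
        return ("3818002", "update the backup UEFI image")
    return None  # both banks verified: no capsule event


if __name__ == "__main__":
    print(capsule_event(True, False))
```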
3818009: [3818009] The TPM could not be properly initialized.
User Response:
Complete the following steps:
- Check the IBM support site for an applicable service bulletin or firmware update that applies to this error.
- Reboot the system.
- If the error continues, replace the system-board assembly (see Removing the system-board assembly and Installing the system-board assembly).
3868000: [3868000] IFM: System reset performed to reset adapter
User Response:
Complete the following steps:
Information only; no action is required.
3868001: [3868001] IFM: Reset loop avoided - Multiple resets not allowed.
User Response:
Complete the following steps:
- Update all firmware (including adapter firmware) to the latest level.
- If the problem persists, contact Support.
3868002: [3868002] BOFM: Error communicating with the IMM - BOFM may not be deployed correctly.
User Response:
Complete the following steps:
- Update all firmware (including adapter firmware) to the latest level.
- If the problem persists, contact Support.
3868003: [3868003] BOFM: Configuration too large for compatibility mode.
User Response:
Complete the following steps:
Information only; no action is required.
3938002: [3938002] A boot configuration error has been detected.
User Response:
Complete the following steps:
- Open F1 Setup and save the settings.
- Retry the configuration update.
50001: [50001] A DIMM has been disabled due to an error detected during POST.
User Response:
Complete the following steps:
- If the DIMM was disabled because of a memory fault, follow the procedure for that event.
- If no memory fault is recorded in the logs and no DIMM connector error LEDs are lit, re-enable the DIMM through the Setup utility or the Advanced Settings Utility (ASU).
- If the problem persists, power-cycle the compute node from the management console.
- Reset the IMM to default settings.
- Reset UEFI to default settings.
- Update IMM and UEFI firmware.
- Replace the system board.
51003: [51003] An uncorrectable memory error was detected in DIMM slot % on rank %.
User Response:
Complete the following steps:
- If the compute node has recently been installed, moved, serviced, or upgraded, verify that the DIMM is properly seated and visually verify that there is no foreign material in any DIMM connector on that memory channel. If either of these conditions is found, correct it and retry with the same DIMM. (Note: The event log might contain a recent 00580A4 event denoting a detected change in DIMM population that could be related to this problem.)
- If no problem is observed on the DIMM connectors or the problem persists, replace the DIMM identified by LightPath and/or the event log entry.
- If the problem recurs on the same DIMM connector, replace the other DIMMs on the same memory channel.
- Check the IBM support site for an applicable service bulletin or firmware update that applies to this memory error.
- If the problem recurs on the same DIMM connector, inspect the connector for damage. If damage is found, replace the system board.
- Replace the affected processor.
- Replace the system board.
51006: [51006] A memory mismatch has been detected. Please verify that the memory configuration is valid.
User Response:
Complete the following steps:
- This event could follow an uncorrectable memory error or a failed memory test. Check the log and service that event first. DIMMs disabled by other errors or actions could cause this event.
- Verify that the DIMMs are installed in the correct population sequence.
- Disable memory mirroring and sparing. If this action eliminates the mismatch, check the IBM Support site for information related to this problem.
- Update UEFI firmware.
- Replace the DIMM.
- Replace the processor.
51009: [51009] No system memory has been detected.
User Response:
Complete the following steps:
- If any memory errors are logged other than this one, take actions indicated for those codes first.
- If no other memory diagnostic codes appear in the logs, verify that all DIMM connectors are enabled using the Setup utility or the Advanced Settings Utility (ASU).
- If the problem remains, shut down the node, remove it from the chassis, and physically verify that one or more DIMMs are installed and that all DIMMs are installed in the correct population sequence.
- If DIMMs are present and properly installed, check for any lit DIMM-connector LEDs, and if any are found, reseat those DIMMs.
- Reinstall the node in the chassis, power on the node, and then check the logs for memory diagnostic codes.
- If the problem remains, replace the processor.
- If the problem remains, replace the system board.
5100B: [5100B] An unqualified DIMM serial number has been detected: serial number % found in slot % of memory card %.
User Response:
Complete the following steps:
- If this information event is logged in the IMM event log, the server does not have qualified memory installed.
- The memory installed may not be covered under warranty.
- Without qualified memory, speeds supported above industry standards will not be enabled.
- Contact your local sales representative or authorized business partner to order qualified memory to replace the unqualified DIMMs.
- After you install qualified memory and power up the server, check to make sure this informational event is not logged again.
58001: [58001] The PFA Threshold limit (correctable error logging limit) has been exceeded on DIMM number % at address %. MCx_Status contains % and MCx_Misc contains %.
User Response:
Complete the following steps:
- If the compute node has recently been installed, moved, serviced, or upgraded, verify that the DIMM is properly seated and visually verify that there is no foreign material in any DIMM connector on that memory channel. If either of these conditions is found, correct it and retry with the same DIMM. (Note: The event log might contain a recent 00580A4 event denoting a detected change in DIMM population that could be related to this problem.)
- Check the IBM support site for an applicable firmware update that applies to this memory error. The release notes list the known problems that the update addresses.
- If the previous steps do not resolve the problem, at the next maintenance opportunity, replace the affected DIMM (as indicated by LightPath and/or the failure log entry).
- If PFA recurs on the same DIMM connector, swap the other DIMMs on the same memory channel, one at a time, to a different memory channel or processor. If PFA follows a moved DIMM to any DIMM connector on the different memory channel, replace the moved DIMM.
- Check the IBM support site for an applicable service bulletin that applies to this memory error.
- If the problem continues to recur on the same DIMM connector, inspect the DIMM connector for foreign material and remove it, if found. If the connector is damaged, replace the system board.
- Remove the affected processor and inspect the processor socket for damaged or misaligned pins. If damage is found or the processor is an upgrade part, replace the system board.
- Replace the affected processor.
- Replace the system board.
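The DIMM swap test in the steps above is a process of elimination: an error that moves with the DIMM implicates the DIMM, while an error that stays on the original connector implicates the connector and, after that, the processor socket. The following Python sketch encodes only that decision, as an illustration. The function name, its arguments, and the returned strings are hypothetical and are not part of any IBM tool, log format, or firmware interface.

```python
def next_dimm_action(fault_follows_moved_dimm: bool, connector_damaged: bool = False) -> str:
    """Return the next service action after the DIMM swap test for a 58001 event.

    Hypothetical helper for illustration only; the argument names and the
    returned strings are not part of any IBM tool or log format.
    """
    # The error moved with the DIMM to another channel: the DIMM is at fault.
    if fault_follows_moved_dimm:
        return "replace the moved DIMM"
    # The error stayed on the original connector and the connector is damaged.
    if connector_damaged:
        return "replace the system board"
    # Undamaged connector: clean it; if the error still recurs, check the processor socket.
    return "remove foreign material from the connector, then inspect the processor socket pins"
```

For example, `next_dimm_action(True)` returns `"replace the moved DIMM"`. The checks run in the same order as the written steps, so the DIMM is implicated before the system board.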
58007: [58007] Invalid memory configuration (Unsupported DIMM Population) detected. Please verify memory configuration is valid.
[58007] Invalid memory configuration (Unsupported DIMM Population) detected. Please verify memory configuration is valid.
User Response:
Complete the following steps:
- This event could follow an uncorrectable memory error or a failed memory test. Check the log and resolve that event first. DIMMs disabled by other errors or actions could cause this event.
- Ensure that the DIMM connectors are populated in the correct sequence.
58008: [58008] A DIMM has failed the POST memory test.
[58008] A DIMM has failed the POST memory test.
User Response:
Complete the following steps:
- You must AC-cycle the system to re-enable the affected DIMM connector, or re-enable it manually using the Setup utility.
- If the compute node has been recently installed, serviced, moved, or upgraded, check that the DIMMs are firmly seated and that no foreign material can be seen in the DIMM connector. If either condition is observed, correct it and retry with the same DIMM. (Note: The event log might contain a recent 00580A4 event denoting a detected change in DIMM population that could be related to this problem.)
- If the problem persists, replace the DIMM identified by LightPath and/or the event log entry.
- If the problem recurs on the same DIMM connector, swap the other DIMMs on the same memory channel, one at a time, to a different memory channel or processor. If the problem follows a moved DIMM to a different memory channel, replace that DIMM.
- Check the IBM support site for an applicable service bulletin or firmware update that applies to this memory error.
- If the problem stays with the original DIMM connector, re-inspect the DIMM connector for foreign material and remove any that is found. If the connector is damaged, replace the system board.
- Remove the affected processor and inspect the processor socket pins for damaged or misaligned pins. If damage is found, or this is an upgrade processor, replace the system board. If there are multiple processors, swap them to move the affected processor to another processor socket and retry. If the problem follows the affected processor (or there is only one processor), replace the affected processor.
- Replace the system board.
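The processor step above combines three checks: socket damage or an upgrade processor points to the system board, a single-processor system points directly to the processor, and on a multi-processor system a swap between sockets decides between the processor and the board. The Python sketch below captures that ordering as an illustration only. The function and argument names are hypothetical, and falling back to the system board when the error stays with the socket is an assumption based on the final step of the list.

```python
def next_processor_action(
    socket_damaged: bool,
    upgrade_processor: bool,
    processor_count: int,
    fault_follows_processor: bool = False,
) -> str:
    """Return the next service action for the processor steps of a 58008 event.

    Hypothetical helper for illustration only. fault_follows_processor is the
    result of swapping processors between sockets and retrying; it is ignored
    on a single-processor system, where no swap is possible.
    """
    # Damaged or misaligned socket pins, or an upgrade processor: replace the board.
    if socket_damaged or upgrade_processor:
        return "replace the system board"
    # One processor (nothing to swap against), or the error moved with the processor.
    if processor_count == 1 or fault_follows_processor:
        return "replace the affected processor"
    # Assumption: the error stayed with the socket, so the final step applies.
    return "replace the system board"
```

For example, `next_processor_action(False, False, 2, fault_follows_processor=True)` returns `"replace the affected processor"`.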
58015: [58015] Memory spare copy initiated.
[58015] Memory spare copy initiated.
User Response:
Complete the following steps:
Information only; no action is required.
580A1: [580A1] Invalid memory configuration for Mirror Mode. Please correct memory configuration.
[580A1] Invalid memory configuration for Mirror Mode. Please correct memory configuration.
User Response:
Complete the following steps:
- If a DIMM connector error LED is lit, resolve the failure.
- Make sure that the DIMM connectors are correctly populated for mirroring mode.
580A2: [580A2] Invalid memory configuration for Sparing Mode. Please correct memory configuration.
[580A2] Invalid memory configuration for Sparing Mode. Please correct memory configuration.
User Response:
Complete the following steps:
Make sure that the DIMM connectors are correctly populated for sparing mode.
580A4: [580A4] Memory population change detected.
[580A4] Memory population change detected.
User Response:
Complete the following steps:
Check the system event log for uncorrected DIMM failures and replace those DIMMs.
580A5: [580A5] Mirror Fail-over complete. DIMM number % has failed over to the mirrored copy.
[580A5] Mirror Fail-over complete. DIMM number % has failed over to the mirrored copy.
User Response:
Complete the following steps:
Check the system event log for uncorrected DIMM failures and replace those DIMMs.
580A6: [580A6] Memory spare copy has completed successfully.
[580A6] Memory spare copy has completed successfully.
User Response:
Complete the following steps:
Check the system log for related DIMM failures and replace those DIMMs.
68002: [68002] A CMOS battery error has been detected
[68002] A CMOS battery error has been detected
User Response:
Complete the following steps:
- If the system was recently installed, moved, or serviced, make sure the battery is properly seated.
- Check the IBM support site for an applicable service bulletin or firmware update that applies to this error.
- Replace the CMOS battery.
- Replace the system board.
68005: [68005] An error has been detected by the IIO core logic on Bus %. The Global Fatal Error Status register contains %. The Global Non-Fatal Error Status register contains %. Please check error logs for the presence of additional downstream device error data.
[68005] An error has been detected by the IIO core logic on Bus %. The Global Fatal Error Status register contains %. The Global Non-Fatal Error Status register contains %. Please check error logs for the presence of additional downstream device error data.
User Response:
Complete the following steps:
- Check the log for a separate error related to an associated PCIe device and resolve that error.
- Check the IBM support site for an applicable service bulletin or firmware update for the system or adapter that applies to this error.
- Replace the system board.
680B8: [680B8] Internal QPI Link Failure Detected.
[680B8] Internal QPI Link Failure Detected.
User Response:
Complete the following steps:
- Check the IBM support site for an applicable service bulletin or firmware update that applies to this error.
- Inspect the processor socket for foreign debris or damage. If debris is found, remove the debris.
- If the error recurs, or socket damage is found, replace the system board.
680B9: [680B9] External QPI Link Failure Detected.
[680B9] External QPI Link Failure Detected.
User Response:
Complete the following steps:
- Check the IBM support site for an applicable service bulletin or firmware update that applies to this error.
- Inspect the processor socket for foreign debris or damage. If debris is found, remove the debris.
- If the error recurs, or socket damage is found, replace the system board.