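Dell 12th-generation server reports "CPU 1 has an internal error (IERR)"
[Summary] A Dell 12th-generation PowerEdge R420 server suddenly hung and stopped responding. iDRAC was still reachable, but a reset issued through iDRAC had no effect at all. I remember an identical machine hanging once before; no useful system logs were captured then, so I did not pay much attention at the time.
This time the log showed an error: "CPU 1 has an internal error (IERR)". The service runs in a keepalived high-availability setup, so losing one machine does not affect it. There was no rush, which made this a good chance to track down the cause.
While searching Google, I also called Dell gold-level support (400-886-8618). After hearing my description, the support engineer suggested the following:
(1) In the BIOS, go to System Profile Settings -> System Profile and set it to Performance
(2) Upgrade the BIOS version
The Google results also said that Dell 12th-generation servers have power management problems and recommended using the acpi-cpufreq power management module: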
# modprobe -r p4-clockmod
# modprobe acpi-cpufreq
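Since iDRAC could not restart the machine, I asked the data center's remote hands to power-cycle it. It actually came back up, so the power supply and motherboard appear to be fine. From here on things are easy, because everything else can be done through iDRAC.
Taking it step by step, I first changed System Profile to Performance in the BIOS.
Then I upgraded the BIOS from version 1.5.2 to 2.1.2.
The process was as follows: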
# ./BIOS_R5R32_LN_2.1.2.BIN
Collecting inventory...
Running validation...
The version of this Update Package is newer than the currently installed version.
Software application name: BIOS
Package version: 2.1.2
Installed version: 1.5.2
Continue? Y/N:Y
Executing update...
WARNING: DO NOT STOP THIS PROCESS OR INSTALL OTHER DELL PRODUCTS WHILE UPDATE IS IN PROGRESS.
THESE ACTIONS MAY CAUSE YOUR SYSTEM TO BECOME UNSTABLE!
.............................................................................
The BIOS image file is successfully loaded. To successfully apply the BIOS update, do not shut down, cold reboot, power cycle, or turn off the system before the
BIOS update is complete. Reboot the system for the update to take effect. Note:
If OMSA is installed on the system, the OMSA data manager service stops if it
is already running.
Would you like to reboot your system now?
Continue? Y/N:Y
Broadcast message from root@sudops.com
(/dev/pts/0) at 23:16 ...
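Before running an update package like the one above, it can help to confirm that the installed BIOS really is older than the target. The following is a minimal sketch, not part of the original procedure: the helper name version_lt is hypothetical, and it assumes GNU sort with the -V (version sort) option is available.

```shell
#!/bin/sh
# Return success (0) when version $1 is strictly older than version $2.
# Assumes GNU sort, whose -V option orders dotted version strings.
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Hypothetical usage on the server itself (dmidecode needs root):
#   installed=$(dmidecode -s bios-version)
#   if version_lt "$installed" "2.1.2"; then echo "BIOS upgrade needed"; fi
```

A plain string comparison would rank 1.10.0 below 1.5.2, which is why the sketch leans on version sort instead.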
After the reboot I logged in over ssh and found many entries like this in dmesg:
p4-clockmod: Warning: EST-capable CPU detected. The acpi-cpufreq module offers voltage scaling in addition of frequency scaling. You should use that instead of p4-clockmod, if possible.
p4-clockmod: Warning: EST-capable CPU detected. The acpi-cpufreq module offers voltage scaling in addition of frequency scaling. You should use that instead of p4-clockmod, if possible.
So the fix found on Google does seem necessary. I ran the two commands:
# modprobe -r p4-clockmod
# modprobe acpi-cpufreq
FATAL: Error inserting acpi_cpufreq (/lib/modules/2.6.32-220.el6.x86_64/kernel/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.ko): No such device
It failed with "No such device". My first reading was that the module file could not be found, but the file is clearly there:
# ls -l /lib/modules/2.6.32-220.el6.x86_64/kernel/arch/x86/kernel/cpu/cpufreq/*
-rwxr--r--. 1 root root 23672 Nov  9  2011 /lib/modules/2.6.32-220.el6.x86_64/kernel/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.ko
-rwxr--r--. 1 root root  5824 Nov  9  2011 /lib/modules/2.6.32-220.el6.x86_64/kernel/arch/x86/kernel/cpu/cpufreq/mperf.ko
-rwxr--r--. 1 root root 12160 Nov  9  2011 /lib/modules/2.6.32-220.el6.x86_64/kernel/arch/x86/kernel/cpu/cpufreq/p4-clockmod.ko
-rwxr--r--. 1 root root 18552 Nov  9  2011 /lib/modules/2.6.32-220.el6.x86_64/kernel/arch/x86/kernel/cpu/cpufreq/pcc-cpufreq.ko
-rwxr--r--. 1 root root 41704 Nov  9  2011 /lib/modules/2.6.32-220.el6.x86_64/kernel/arch/x86/kernel/cpu/cpufreq/powernow-k8.ko
-rwxr--r--. 1 root root 13120 Nov  9  2011 /lib/modules/2.6.32-220.el6.x86_64/kernel/arch/x86/kernel/cpu/cpufreq/speedstep-lib.ko
# modprobe -l acpi-cpufreq
kernel/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.ko
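Since the module file exists, a quick way to see whether any frequency-scaling driver is actually bound is to look at the kernel's cpufreq sysfs interface. This is a minimal sketch under assumptions: the function name cpufreq_driver is hypothetical, and the optional directory argument exists only so the check can be pointed at a test tree instead of /sys.

```shell
#!/bin/sh
# Print the cpufreq driver bound to cpu0, or "none" when the kernel
# exposes no cpufreq interface for that CPU.
# $1 (optional) overrides the sysfs CPU directory; useful only for testing.
cpufreq_driver() {
    cpu_root=${1:-/sys/devices/system/cpu}
    driver_file="$cpu_root/cpu0/cpufreq/scaling_driver"
    if [ -r "$driver_file" ]; then
        cat "$driver_file"
    else
        echo "none"
    fi
}

cpufreq_driver
```

An answer of "none" would suggest that no driver found hardware it could manage, which fits a "No such device" failure better than a missing file does.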
Back to Google...
I found an article explaining that no cpufreq module can be loaded in Performance mode:
1. Performance Per Watt(DAPC): System DBPM(DAPC)
No module can be loaded in this mode:
# cpuspeed
Error: Could not find any CPUFreq controlled CPU cores to manage
# /etc/init.d/cpuspeed status
cpuspeed is stopped
2. Performance Per Watt(OS): OS DBPM
After booting, the system loads acpi_cpufreq automatically:
# lsmod | grep cpu
cpufreq_ondemand       10544  24
acpi_cpufreq            7891  1
freq_table              4881  2 cpufreq_ondemand,acpi_cpufreq
mperf                   1557  1 acpi_cpufreq
# /etc/init.d/cpuspeed status
Frequency scaling enabled using ondemand governor
3. Performance: Maximum Performance
No module can be loaded in this mode either.
So I went back into the BIOS and changed System Profile to Performance Per Watt(OS): OS DBPM.
After another reboot, dmesg looked normal. The problem appears to be solved, though only time will tell!
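To make "dmesg looks normal" checkable rather than a visual impression, the p4-clockmod warnings can be counted. This is a minimal sketch; the function name count_p4_warnings is hypothetical, and it simply matches the warning prefix seen in the log lines quoted earlier.

```shell
#!/bin/sh
# Count p4-clockmod warnings in kernel log text read from stdin.
# grep -c exits non-zero when the count is 0, so "|| true" keeps the
# function usable under "set -e" while still printing the count.
count_p4_warnings() {
    grep -c 'p4-clockmod: Warning' || true
}

# Usage on the server:
#   dmesg | count_p4_warnings
```

A result of 0 after the reboot would match what I observed; a non-zero count would mean p4-clockmod is still being loaded.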
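While troubleshooting, I found that knowing how to use cpufreq_setup is quite valuable.
Also, the iDRAC command line really does have a lot of options.
For example, the SEL log retrieved through iDRAC looks like this: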
racadm>>getsel
racadm getsel
-------------------------------------------------------------------------------
Record:      2
Date/Time:   05/22/2014 12:44:33
Source:      system
Severity:    Critical
Description: CPU 1 has an internal error (IERR).
-------------------------------------------------------------------------------
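When the SEL holds many records, it is handy to pull out only the critical ones. The following is a minimal sketch that assumes the record layout shown above (Date/Time, Severity, then Description, with dashed separator lines); the function name sel_critical is hypothetical, and the output is expected to be piped in or saved to a file first.

```shell
#!/bin/sh
# Print "timestamp<TAB>description" for every Critical record found in
# "racadm getsel" style text read from stdin.
sel_critical() {
    awk '
        /^Date\/Time:/ {
            sub(/^Date\/Time:[ \t]*/, ""); sub(/[ \t\r]+$/, "")
            when = $0
        }
        /^Severity:/ {
            sub(/^Severity:[ \t]*/, ""); sub(/[ \t\r]+$/, "")
            sev = $0
        }
        /^Description:/ {
            sub(/^Description:[ \t]*/, ""); sub(/[ \t\r]+$/, "")
            if (sev == "Critical") print when "\t" $0
        }
        /^-+$/ { when = ""; sev = "" }   # separator line: start a new record
    '
}

# Usage, assuming the SEL text was saved to a file named sel.txt:
#   sel_critical < sel.txt
```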
Other help options:
/admin1-> help
[Usage]
    show   [<options>] [<target>] [<properties>]
           [<propertyname>== <propertyvalue>]
    set    [<options>] [<target>] <propertyname>=<value>
    cd     [<options>] [<target>]
    create [<options>] <target> [<property of new target>=<value>]
           [<property of new target>=<value>]
    delete [<options>] <target>
    exit   [<options>]
    reset  [<options>] [<target>]
    start  [<options>] [<target>]
    stop   [<options>] [<target>]
    version [<options>]
    help   [<options>] [<help topics>]
    load -source <URI> [<options>] [<target>]
    dump -destination <URI> [<options>] [<target>]

/admin1-> racadm
racadm>>help
racadm help
 help [subcommand]    -- display usage summary for a subcommand
 arp                  -- display the networking ARP table
 clearasrscreen       -- clear the last ASR (crash) screen
 closessn             -- close a session
 clrraclog            -- clear the RAC log
 clrsel               -- clear the System Event Log (SEL)
 config               -- Deprecated: modify RAC configuration properties
 coredump             -- display the last RAC coredump
 coredumpdelete       -- delete the last RAC coredump
 eventfilters         -- Alerts configuration commands
 fwupdate             -- update the RAC firmware
 get                  -- display RAC configuration properties
 getconfig            -- Deprecated: display RAC configuration properties
 getled               -- Get the state of the LED on a module.
 getniccfg            -- display current network settings
 getraclog            -- display the RAC log
 getractime           -- display the current RAC time
 getsel               -- display records from the System Event Log (SEL)
 getsensorinfo        -- display system sensors
 getssninfo           -- display session information
 getsvctag            -- display service tag information
 getsysinfo           -- display general RAC and system information
 gettracelog          -- display the RAC diagnostic trace log
 getuscversion        -- display the current USC version details
 getversion           -- display the current version details
 ifconfig             -- display network interface information
 inlettemphistory     -- inlet temperature history operations
 lclog                -- LCLog operations
 frontpanelerror      -- hide LCD errors - color amber to blue
 netstat              -- display routing table and network statistics
 ping                 -- send ICMP echo packets on the network
 ping6                -- send ICMP echo packets on the network
 racdump              -- display RAC diagnostic information
 racreset             -- perform a RAC reset operation
 racresetcfg          -- restore the RAC configuration to factory defaults
 remoteimage          -- make a remote ISO image available to the server
 serveraction         -- perform system power management operations
 set                  -- modify RAC configuration properties
 setled               -- Set the state of the LED on a module.
 setniccfg            -- modify network configuration properties
 sshpkauth            -- manage SSH PK authentication keys on the RAC
 sslcertdelete        -- delete an SSL certificate on the iDRAC
 sslcertview          -- view SSL certificate information
 sslcsrgen            -- generate a certificate CSR from the RAC
 sslresetcfg          -- resets the web certificate to default and restarts the web server.
 testemail            -- test RAC e-mail notifications
 testtrap             -- test RAC SNMP trap notifications
 testalert            -- test RAC SNMP - FQDN trap notifications
 traceroute           -- print the route packets trace to network host
 traceroute6          -- print the route packets trace to network host
 usercertview         -- view user certificate information
 vflashpartition      -- manage partitions on the vFlash SD card
 vflashsd             -- perform vFlash SD Card initialization
 vmdisconnect         -- disconnect Virtual Media connections
 vmkey                -- Deprecated: perform vFlash operations
 license              -- License Manager commands
 debug                -- Field Service Debug Authorization facility commands
 raid                 -- Monitoring and Inventory of H/W RAID connected to the server.
 hwinventory          -- Monitoring and Inventory of H/W NICs connected to the server.
 nicstatistics        -- Statistics for NICs connected to the server.
 fcstatistics         -- Statistics for FCs connected to the server.
 update               -- Platform Update of the devices on the server
 jobqueue             -- Jobqueue of of the jobs currently scheduled
 systemconfig         -- Backup &/or Restore of iDRAC Config and Firmware

Groups
 idRacInfo            -- Information about iDRAC being queried
 cfgRemoteHosts       -- Properties for configuration of the SMTP server
 cfgUserAdmin         -- Information about iDRAC users
 cfgEmailAlert        -- Parameters to configure e-mail alerting capabilities
 cfgSessionManagement -- Information of the session Properties
 cfgSerial            -- Provides configuration parameters for the iDRAC
 cfgOobSnmp           -- Configuration of the SNMP agent and trap capabilities
 cfgRacTuning         -- Configuration for various iDRAC properties.
 ifcRacManagedNodeOs  -- Properties of the managed server OS
 cfgRacSecurity       -- Configure SSL certificate signing request settings
 cfgRacVirtual        -- Configuration Properties for iDRAC Virtual Media
 cfgActiveDirectory   -- Configuration of the iDRAC Active Directory feature
 cfgLDAP              -- Configuration properties for LDAP settings
 cfgLdapRoleGroup     -- Configuration of role groups for LDAP
 cfgLogging           -- Group Description for group cfgLogging
 cfgStandardSchema    -- Configuration of AD standard schema settings
 cfgIpmiSerial        -- Properties to configure the IPMI serial interface
 cfgIpmiSol           -- Configuration the SOL capabilities of the system
 cfgIpmiLan           -- Configuration the IPMI over LAN of the system
 cfgIpmiPef           -- Configuration the platform event filters
 cfgServerPower       -- Provides power management features
 cfgServerPowerSupply -- Provides information related to the power supplies
 cfgVFlashSD          -- Configure the properties for the vFlash SD card
 cfgVFlashPartition   -- Configure partitions on the vFlash SD Card
 cfgUserDomain        -- Configure the Active Directory user domain names
 cfgSmartCard         -- Properties to access iDRAC using a smart card
 cfgServerInfo        -- Configuration of first boot device
 cfgSensorRedundancy  -- Configure the power supply redundancy
 cfgLanNetworking     -- Parameters to configure the iDRAC NIC
 cfgStaticLanNetworking -- Parameters to configure the iDRAC NIC
 cfgNetTuning         -- Group Description for group cfgNetTuning
 cfgIPv6LanNetworking -- Configuration of the IPv6 over LAN networking
 cfgIPv6StaticLanNetworking -- Configuration of the IPv6 over LAN networking
 cfgIPv6URL           -- Configuration of the iDRAC IPv6 URL.

For Help on configuring the properties of a group - racadm help config
-----------------------------------------------------------------------