Exadata (Half Rack) Image Upgrade (Rolling)



Successfully Upgraded Exadata Half Rack Image to 12.1.2.3.4


Component                        Current Image Version    Target Image Version    Image Patch

Database Server / Compute Node   12.1.2.1.1               12.1.2.3.4              25093501

Cell Storage Server              12.1.2.1.1               12.1.2.3.4              25031476

InfiniBand Switch                2.1.5-1                  2.1.8-1                 25031476

Backup Current Configurations:
 echo "Executing prechecks specific to cell nodes..."
This is a one-time precheck; the configuration is collected for all 7 cell nodes in the cell_group file.
#cd /root
echo ""  >>/u01/exa_img_upg/prechecks/cell_node_prechecks.txt
echo "************************" >>/u01/exa_img_upg/prechecks/cell_node_prechecks.txt
echo "Cell specific prechecks" >>/u01/exa_img_upg/prechecks/cell_node_prechecks.txt
echo "************************" >>/u01/exa_img_upg/prechecks/cell_node_prechecks.txt
echo -e "\n# Need at least 1.5gb + size of ISO file (approx 3gb total for Jan-July releases) space on / partition of cells to do cell update"  >>/u01/exa_img_upg/prechecks/cell_node_prechecks.txt
dcli -l root -g cell_group df -h / >>/u01/exa_img_upg/prechecks/cell_node_prechecks.txt
echo -e "\n# Check all cells are up"  >>/u01/exa_img_upg/prechecks/cell_node_prechecks.txt
dcli -l root -g cell_group cellcli -e list cell >>/u01/exa_img_upg/prechecks/cell_node_prechecks.txt
echo -e "\n# Check cell network configuration"  >>/u01/exa_img_upg/prechecks/cell_node_prechecks.txt
dcli -l root -g cell_group "/opt/oracle.cellos/ipconf -verify" >>/u01/exa_img_upg/prechecks/cell_node_prechecks.txt
echo -e "\n# Validate cell disks for valid physicalInsertTime (should be no output)"  >>/u01/exa_img_upg/prechecks/cell_node_prechecks.txt
dcli -l root -g cell_group cellcli -e 'list physicaldisk attributes luns where physicalInsertTime = null' >>/u01/exa_img_upg/prechecks/cell_node_prechecks.txt
echo -e "\n# Check for WriteBack Flash Cache"  >>/u01/exa_img_upg/prechecks/cell_node_prechecks.txt
dcli -l root -g cell_group cellcli -e "list cell attributes flashcachemode" >>/u01/exa_img_upg/prechecks/cell_node_prechecks.txt
echo -e "\n# Check for Flash Cache Compression"  >>/u01/exa_img_upg/prechecks/cell_node_prechecks.txt
dcli -l root -g cell_group cellcli -e "list cell attributes flashCacheCompress" >>/u01/exa_img_upg/prechecks/cell_node_prechecks.txt

echo "Executing prechecks specific to compute nodes..."
This is a one-time precheck; the configuration is collected for all 4 database nodes in the dbs_group file.
#cd /root
echo ""  >>/u01/exa_img_upg/prechecks/compute_node_prechecks.txt
echo "************************" >>/u01/exa_img_upg/prechecks/compute_node_prechecks.txt
echo "Compute specific prechecks" >>/u01/exa_img_upg/prechecks/compute_node_prechecks.txt
echo "************************" >>/u01/exa_img_upg/prechecks/compute_node_prechecks.txt
echo -e "\n# need 3-5 gb on / partition"  >>/u01/exa_img_upg/prechecks/compute_node_prechecks.txt
dcli -l root -g dbs_group df -h /  >>/u01/exa_img_upg/prechecks/compute_node_prechecks.txt
echo -e "\n# need ~40 mb on /boot partition"  >>/u01/exa_img_upg/prechecks/compute_node_prechecks.txt
dcli -l root -g dbs_group df -h /boot  >>/u01/exa_img_upg/prechecks/compute_node_prechecks.txt
echo -e "\n# check freespace on /u01 partition"  >>/u01/exa_img_upg/prechecks/compute_node_prechecks.txt
dcli -l root -g dbs_group df -h /u01  >>/u01/exa_img_upg/prechecks/compute_node_prechecks.txt
echo -e "\n# Make sure not snaps are still active"  >>/u01/exa_img_upg/prechecks/compute_node_prechecks.txt
dcli -l root -g dbs_group lvs >>/u01/exa_img_upg/prechecks/compute_node_prechecks.txt
echo -e "\n# Need at least 1.5gb gb free PE"  >>/u01/exa_img_upg/prechecks/compute_node_prechecks.txt
dcli -l root -g dbs_group vgdisplay >>/u01/exa_img_upg/prechecks/compute_node_prechecks.txt

echo -e "\n# Mounted filesystems"  >>/u01/exa_img_upg/prechecks/compute_node_prechecks.txt
dcli -l root -g dbs_group mount  >>/u01/exa_img_upg/prechecks/compute_node_prechecks.txt
echo -e "\n# Contents of /etc/fstab"  >>/u01/exa_img_upg/prechecks/compute_node_prechecks.txt
dcli -l root -g dbs_group cat /etc/fstab  >>/u01/exa_img_upg/prechecks/compute_node_prechecks.txt
echo -e "\n# Contents of /etc/exports"  >>/u01/exa_img_upg/prechecks/compute_node_prechecks.txt
dcli -l root -g dbs_group cat /etc/exports  >>/u01/exa_img_upg/prechecks/compute_node_prechecks.txt
#
echo "Done."



Cell Node Image Upgrade


Pre Validation
==============
1. Validate whether any critical, stateful alerts are left over on the cell servers.
cd /root

[root@abcxyzadm01 ~]# dcli -g  cell_group -l root "cellcli -e list alerthistory attributes name,beginTime,alertShortName,alertDescription,severity where alerttype=stateful and severity=critical"

We can drop all alert history:
CellCLI> drop alerthistory all

OR
====
To drop an individual alert, use the commands below.
Example:
CellCLI> list alerthistory <6_1>
CellCLI> list alerthistory <6_1>  detail

CellCLI> drop alerthistory <6_1>
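
When the same cleanup is needed on every cell, a minimal dcli sketch (assuming the cell_group file and the CellCLI syntax shown above) can review and then drop the alerts across all cells in one pass:

# a minimal sketch, assuming the cell_group file and the CellCLI syntax shown above
dcli -l root -g cell_group "cellcli -e list alerthistory where severity=critical and alertType=stateful"
dcli -l root -g cell_group "cellcli -e drop alerthistory all"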


Precheck
========
[root@abcxyzadm01 patch_12.1.2.3.4.170111]# ./patchmgr -cells <cell_group> -patch_check_prereq -rolling

-rw-r--r-- 1 root   root             10 May  4 15:53 cell_node_1
-rw-r--r-- 1 root   root             11 May  4 16:10 cell_node_2
-rw-r--r-- 1 root   root             10 May  4 16:11 cell_node_3
-rw-r--r-- 1 root   root             10 May  4 16:11 cell_node_4
-rw-r--r-- 1 root   root             11 May  4 16:11 cell_node_5
-rw-r--r-- 1 root   root             10 May  4 16:11 cell_node_6
-rw-r--r-- 1 root   root             10 May  4 16:12 cell_node_7
Note: Individual cell group files (one cell per file, listed above) are used for the rolling upgrade; change the cell_group name to the file for the cell you are patching.
Example : For first cell node
[root@abcxyzadm01 patch_12.1.2.3.4.170111]# ./patchmgr -cells cell_node_1 -patch_check_prereq -rolling
2017-05-08 11:24:52 -0700 [INFO] Disabling /var/log/cellos cleanup on this node for the duration of the patchmgr session.

2017-05-08 11:25:56 -0700        :Working: DO: Check cells have ssh equivalence for root user. Up to 10 seconds per cell ...
2017-05-08 11:25:56 -0700        :SUCCESS: DONE: Check cells have ssh equivalence for root user.
2017-05-08 11:26:00 -0700        :Working: DO: Initialize files, check space and state of cell services. Up to 1 minute ...
2017-05-08 11:26:44 -0700        :SUCCESS: DONE: Initialize files, check space and state of cell services.
2017-05-08 11:26:44 -0700        :Working: DO: Copy, extract prerequisite check archive to cells. If required start md11 mismatched partner size correction. Up to 40 minutes ...
2017-05-08 11:26:59 -0700 Wait correction of degraded md11 due to md partner size mismatch. Up to 30 minutes.


2017-05-08 11:27:00 -0700        :SUCCESS: DONE: Copy, extract prerequisite check archive to cells. If required start md11 mismatched partner size correction.
2017-05-08 11:27:00 -0700        :Working: DO: Check prerequisites on all cells. Up to 2 minutes ...
2017-05-08 11:27:39 -0700        :SUCCESS: DONE: Check prerequisites on all cells.
2017-05-08 11:27:39 -0700        :Working: DO: Execute plugin check for Patch Check Prereq ...
2017-05-08 11:27:39 -0700 :INFO: Patchmgr plugin start: Prereq check for exposure to bug 22909764 v1.0. Details in logfile /u01/exa_img_upg/CELL/patch_12.1.2.3.4.170111/patchmgr.stdout.
2017-05-08 11:27:39 -0700 :INFO: Patchmgr plugin start: Prereq check for exposure to bug 17854520 v1.3. Details in logfile /u01/exa_img_upg/CELL/patch_12.1.2.3.4.170111/patchmgr.stdout.
2017-05-08 11:27:41 -0700 :WARNING: ACTION REQUIRED: Cells to be upgraded pass version check, however other cells not being upgraded may be at version 11.2.3.1.x or 11.2.3.2.x, exposing the system to bug 17854520.  Manually check other cells for version 11.2.3.1.x or 11.2.3.2.x.
2017-05-08 11:27:41 -0700 :INFO: Checking database homes for remote db nodes with oracle-user ssh equivalence to the local system.
2017-05-08 11:27:41 -0700 :INFO: Database homes that exist only on remote nodes must be checked manually.
2017-05-08 11:27:47 -0700 :SUCCESS: Patchmgr plugin complete: Prereq check passed - no exposure to bug 17854520
2017-05-08 11:27:47 -0700 :INFO: Patchmgr plugin start: Prereq check for exposure to bug 22468216 v1.0. Details in logfile /u01/exa_img_upg/CELL/patch_12.1.2.3.4.170111/patchmgr.stdout.
2017-05-08 11:27:48 -0700 :SUCCESS: Patchmgr plugin complete: Prereq check passed for the bug 22468216
2017-05-08 11:27:48 -0700        :INFO   : Patchmgr plugin start: Prereq check for exposure to bug 24625612 v1.0.
2017-05-08 11:27:48 -0700        :INFO   : Details in logfile /u01/exa_img_upg/CELL/patch_12.1.2.3.4.170111/patchmgr.stdout.
2017-05-08 11:27:48 -0700        :SUCCESS: Patchmgr plugin complete: Prereq check passed for the bug 24625612
2017-05-08 11:27:48 -0700        :SUCCESS: DONE: Execute plugin check for Patch Check Prereq.


Actual Steps
============

Cell Node/  Cell Server Patch Plan (Rolling)


Start one screen session and run the image upgrade using the patchmgr utility. Unzip the cell software p25031476_121234_Linux-x86-64.zip; it extracts into the patch_12.1.2.3.4.170111 directory. Log in to the compute node where you staged the cell software.

screen -RR cell-patch
cd /u01/exa_img_upg/CELL/
unzip p25031476_121234_Linux-x86-64.zip
cd /u01/exa_img_upg/CELL/patch_12.1.2.3.4.170111


 Stop agents if running (Rolling)

#ps -ef | grep agent | grep java | sed 's/\s\+/ /g' | cut -d " " -f 1,8 | sed 's/\/jdk.*//'

sudo su - oracle -c "/u01/app/em12c/core/12.1.0.4.0/bin/emctl status agent"
sudo su - oracle -c "/u01/app/em12c/core/12.1.0.4.0/bin/emctl stop agent"
sudo su - oracle -c "/u01/app/em12c/core/12.1.0.4.0/bin/emctl status agent"
Command Output
[root@abcxyzadm01 patch_12.1.2.3.4.170111]# ps -ef | grep agent | grep java | sed 's/\s\+/ /g' | cut -d " " -f 1,8 | sed 's/\/jdk.*//'
oracle /u01/app/em12c/core/12.1.0.4.0

[root@abcxyzadm01 patch_12.1.2.3.4.170111]# sudo su - oracle -c "/u01/app/em12c/core/12.1.0.4.0/bin/emctl status agent"
Oracle Enterprise Manager Cloud Control 12c Release 4
Copyright (c) 1996, 2014 Oracle Corporation.  All rights reserved.
---------------------------------------------------------------
Agent Version          : 12.1.0.4.0
OMS Version            : 12.1.0.4.0
Protocol Version       : 12.1.0.1.0
Agent Home             : /u01/app/em12c/agent_inst
Agent Log Directory    : /u01/app/em12c/agent_inst/sysman/log
Agent Binaries         : /u01/app/em12c/core/12.1.0.4.0
Agent Process ID       : 20987
Parent Process ID      : 20811
Agent URL              : https://abcxyzadm01.xxxx:3872/emd/main/
Local Agent URL in NAT : https://abcxyzadm01.xxxx:3872/emd/main/
Repository URL         : https://XYZ-oemprdapp01.xxxx:4900/empbs/upload
Started at             : 2015-08-28 20:49:00
Started by user        : oracle
Operating System       : Linux version 2.6.39-400.248.3.el6uek.bug21692254.x86_64 (amd64)
Last Reload            : 2015-08-31 10:37:11
Last successful upload                       : 2017-05-08 11:51:23
Last attempted upload                        : 2017-05-08 11:51:23
Total Megabytes of XML files uploaded so far : 2,641
Number of XML files pending upload           : 0
Size of XML files pending upload(MB)         : 0
Available disk space on upload filesystem    : 16.91%
Collection Status                            : Collections enabled
Heartbeat Status                             : Ok
Last attempted heartbeat to OMS              : 2017-05-08 11:50:32
Last successful heartbeat to OMS             : 2017-05-08 11:50:32
Next scheduled heartbeat to OMS              : 2017-05-08 11:51:32

---------------------------------------------------------------
Agent is Running and Ready
[root@abcxyzadm01 patch_12.1.2.3.4.170111]#


Restart ILOM on all cell nodes (optional)
dcli -l root -g <cell_group> "ipmitool bmc reset cold"
Note: Change the <cell_group> to the cell server you are patching as below
cell_node_1
cell_node_2
cell_node_3
cell_node_4
cell_node_5
cell_node_6
cell_node_7
Example
dcli -l root -g cell_node_1 "ipmitool bmc reset cold"


Check the repair times for all mounted disk groups in the Oracle ASM instance and adjust if needed. Note: set disk_repair_time to 8.5 hours for the duration of the patching.


sqlplus / as sysasm
select dg.name,a.value from v$asm_diskgroup dg, v$asm_attribute a where dg.group_number=a.group_number and a.name='disk_repair_time';
NAME         VALUE
------------ ----------
DATA_DG      3.6h
DBFS_DG      3.6h
RECO_DG      3.6h
If the repair time is not 8.5 hours, note the current value and the diskgroup names. Replace <diskgroup_name> in the following statement to adjust: alter diskgroup <diskgroup_name> set attribute 'disk_repair_time'='8.5h';
Repeat the above statement for each diskgroup as below
alter diskgroup DATA_DG set attribute 'disk_repair_time'='8.5h';
alter diskgroup DBFS_DG set attribute 'disk_repair_time'='8.5h';
alter diskgroup RECO_DG set attribute 'disk_repair_time'='8.5h';


Also increase the rebalance power to 5

alter diskgroup RECO_DG rebalance power 5;
alter diskgroup DATA_DG rebalance power 5;
alter diskgroup DBFS_DG rebalance power 5;
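
As a convenience, both settings can be applied and verified in one sqlplus session. The following is a minimal sketch, assuming the ASM environment is already set for the grid home and using the three diskgroup names shown above:

# a minimal sketch: apply the pre-patch ASM settings in one pass
# (assumes the ASM environment is set and the diskgroup names above)
sqlplus -S / as sysasm <<'EOF'
alter diskgroup DATA_DG set attribute 'disk_repair_time'='8.5h';
alter diskgroup DBFS_DG set attribute 'disk_repair_time'='8.5h';
alter diskgroup RECO_DG set attribute 'disk_repair_time'='8.5h';
alter diskgroup DATA_DG rebalance power 5;
alter diskgroup DBFS_DG rebalance power 5;
alter diskgroup RECO_DG rebalance power 5;
select dg.name, a.value from v$asm_diskgroup dg, v$asm_attribute a
 where dg.group_number = a.group_number and a.name = 'disk_repair_time';
exit
EOF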

Check uptime and reboot if needed
                                                         
cd /u01/exa_img_upg/CELL/patch_12.1.2.3.4.170111
dcli -l root -g cell_group "uptime"

[root@abcxyzadm01 ~]# dcli -l root -g cell_group "uptime"
xyzceladm01: 12:51:44 up 222 days, 17:17,  0 users,  load average: 1.45, 1.33, 1.35
xyzceladm02: 12:51:44 up 222 days, 17:17,  0 users,  load average: 1.64, 1.21, 1.25
xyzceladm03: 12:51:44 up 222 days, 17:18,  0 users,  load average: 1.34, 1.45, 1.43
xyzceladm04: 12:51:44 up 222 days, 17:17,  0 users,  load average: 0.98, 1.25, 1.35
xyzceladm05: 12:51:44 up 222 days, 17:17,  0 users,  load average: 1.10, 1.33, 1.42
xyzceladm06: 12:51:44 up 222 days, 17:17,  0 users,  load average: 0.88, 1.21, 1.34
xyzceladm07: 12:51:44 up 222 days, 17:17,  0 users,  load average: 0.99, 1.21, 1.30
[root@abcxyzadm01 ~]#
Note: If the cells have been up for more than 7 days, reboot each cell in a rolling fashion using the note below.
Steps to shut down or reboot an Exadata storage cell without affecting ASM (Doc ID 1188080.1)
Note: In our case we need to reboot the cell servers.
cd /u01/exa_img_upg/CELL/patch_12.1.2.3.4.170111

Cell Node Reboot Steps Without Affecting ASM
Step 1: Check that disk_repair_time is set to 8.5 hours for all mounted disk groups in the Oracle ASM instance; if not, set it using the steps above. Note: we have just set it to 8.5 hours in the previous steps.
                                                 
sqlplus / as sysasm
select dg.name,a.value from v$asm_diskgroup dg, v$asm_attribute a where dg.group_number=a.group_number and a.name='disk_repair_time';
Step 2:
Next, check whether ASM will be OK if the grid disks go OFFLINE. The following command should return 'Yes' for every grid disk listed. SSH to the cell node that you are about to reboot for the image upgrade:
ssh <Cell_Node>
cellcli -e list griddisk attributes name,asmmodestatus,asmdeactivationoutcome
Note: double-check that you are on the correct cell, i.e. the one you are about to upgrade.


Step 3: Proceed to the next step only if all disks return asmdeactivationoutcome='Yes'.


Step 4: Run the cellcli command to inactivate all grid disks on the cell you wish to power down or reboot:
cellcli -e list griddisk

cellcli -e alter griddisk all inactive
Note: This action could take 10 minutes or longer depending on activity. It is very important to confirm that all disks were taken offline successfully before shutting down the cell services. Inactivating the grid disks automatically takes the disks OFFLINE in the ASM instance.


Step 5: Confirm that the griddisks are now offline by performing the following actions:
(a) Execute the command below and the output should show either asmmodestatus=OFFLINE or asmmodestatus=UNUSED and asmdeactivationoutcome=Yes for all griddisks once the disks are offline in ASM. Only then is it safe to proceed with shutting down or restarting the cell:
cellcli -e list griddisk attributes name,asmmodestatus,asmdeactivationoutcome
(There has also been a reported case of asmmodestatus=OFFLINE, which means Oracle ASM has taken that grid disk offline; this status is also fine and you can proceed with the remaining instructions.)
(b) List the griddisks to confirm all now show inactive:
cellcli -e list griddisk
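
To avoid polling by hand, a minimal loop like the sketch below (run on the cell itself after the inactive command in Step 4, assuming the cellcli syntax above) waits until every grid disk reports OFFLINE or UNUSED:

# a minimal sketch, run on the cell after "alter griddisk all inactive"
while true; do
  pending=$(cellcli -e "list griddisk attributes name,asmmodestatus" | grep -cvE 'OFFLINE|UNUSED')
  [ "$pending" -eq 0 ] && echo "All grid disks are offline in ASM" && break
  echo "$pending grid disk(s) not yet offline, waiting 60 seconds..."
  sleep 60
done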
Step 6: We can now reboot the cell. Double-check that the steps above have been completed.
#hostname
#reboot

or
#shutdown -F -r now
Step 7: Once the cell comes back online, reactivate the grid disks:
cellcli -e alter griddisk all active
Step 8:  Issue the command below and all disks should show 'active':
cellcli -e list griddisk


Step 9: Verify grid disk status:
(a) Verify all grid disks have been successfully put online using the following command:
cellcli -e list griddisk attributes name, asmmodestatus
(b) Wait until asmmodestatus is ONLINE for all grid disks. Each disk will go to a 'SYNCING' state first then 'ONLINE'. The following is an example of the output:
DATA_CD_00_xyzcel01 ONLINE    <<=========
DATA_CD_01_xyzcel01 SYNCING  <<========
DATA_CD_02_xyzcel01 OFFLINE   <<========
DATA_CD_03_xyzcel01 OFFLINE
DATA_CD_04_xyzcel01 OFFLINE
DATA_CD_05_xyzcel01 OFFLINE
DATA_CD_06_xyzcel01 OFFLINE
DATA_CD_07_xyzcel01 OFFLINE
DATA_CD_08_xyzcel01 OFFLINE
DATA_CD_09_xyzcel01 OFFLINE
DATA_CD_10_xyzcel01 OFFLINE
DATA_CD_11_xyzcel01 OFFLINE
(c) Oracle ASM synchronization is only complete when all grid disks show asmmodestatus=ONLINE.
( Please note:  this operation uses Fast Mirror Resync operation - which does not trigger an ASM rebalance. The Resync operation restores only the extents that would have been written while the disk was offline.)
The output will look like the following:
[root@xyzcel07 ~]# cellcli -e list griddisk attributes name, asmmodestatus
         DATA_DG_CD_00_xyzcel07      ONLINE
         DATA_DG_CD_01_xyzcel07      ONLINE
         DATA_DG_CD_02_xyzcel07      ONLINE
         DATA_DG_CD_03_xyzcel07      ONLINE
         DATA_DG_CD_04_xyzcel07      ONLINE
         DATA_DG_CD_05_xyzcel07      ONLINE
         DATA_DG_CD_06_xyzcel07      ONLINE
         DATA_DG_CD_07_xyzcel07      ONLINE
         DATA_DG_CD_08_xyzcel07      ONLINE
         DATA_DG_CD_09_xyzcel07      ONLINE
         DATA_DG_CD_10_xyzcel07      ONLINE
         DATA_DG_CD_11_xyzcel07      ONLINE
         DBFS_DG_CD_02_xyzcel07         ONLINE
         DBFS_DG_CD_03_xyzcel07         ONLINE
         DBFS_DG_CD_04_xyzcel07         ONLINE
         DBFS_DG_CD_05_xyzcel07         ONLINE
         DBFS_DG_CD_06_xyzcel07         ONLINE
         DBFS_DG_CD_07_xyzcel07         ONLINE
         DBFS_DG_CD_08_xyzcel07         ONLINE
         DBFS_DG_CD_09_xyzcel07         ONLINE
         DBFS_DG_CD_10_xyzcel07         ONLINE
         DBFS_DG_CD_11_xyzcel07         ONLINE
         RECO_DG_CD_00_xyzcel07      ONLINE
         RECO_DG_CD_01_xyzcel07      ONLINE
         RECO_DG_CD_02_xyzcel07      ONLINE
         RECO_DG_CD_03_xyzcel07      ONLINE
         RECO_DG_CD_04_xyzcel07      ONLINE
         RECO_DG_CD_05_xyzcel07      ONLINE
         RECO_DG_CD_06_xyzcel07      ONLINE
         RECO_DG_CD_07_xyzcel07      ONLINE
         RECO_DG_CD_08_xyzcel07      ONLINE
         RECO_DG_CD_09_xyzcel07      ONLINE
         RECO_DG_CD_10_xyzcel07      ONLINE
         RECO_DG_CD_11_xyzcel07      ONLINE
[root@xyzcel07 ~]#
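
Instead of re-running the command by hand, a minimal polling loop (a sketch, run on the cell after reactivating the grid disks, using the cellcli syntax above) can wait for the resync to finish:

# a minimal sketch, run on the cell after "alter griddisk all active"
while true; do
  not_online=$(cellcli -e "list griddisk attributes name,asmmodestatus" | grep -cv 'ONLINE')
  [ "$not_online" -eq 0 ] && echo "All grid disks are ONLINE - resync complete" && break
  echo "$not_online grid disk(s) still SYNCING/OFFLINE, checking again in 60 seconds..."
  sleep 60
done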
Step 10: The cell node reboot steps without affecting ASM are now complete.


Cleanup space from any previous runs
The -reset_force command is only done the first time the cells are patched to this release.
It is not necessary to use the command for subsequent cell patching, even after rolling back the patch.
In our case we need to use the -cleanup option, not the -reset_force option.
[root@abcxyzadm05 ~]#cd /u01/exa_img_upg/CELL/patch_12.1.2.3.4.170111

 ./patchmgr -cells <cell_group> -reset_force

Note : Always use the -cleanup option before retrying a failed or halted run of the patchmgr utility.
[root@abcxyzadm05 ~]#cd /u01/exa_img_upg/CELL/patch_12.1.2.3.4.170111
  ./patchmgr -cells <cell_group> -cleanup
Note: Change <cell_group> to the cell group file for the cell server you are patching, as below
cell_node_1
cell_node_2
cell_node_3
cell_node_4
cell_node_5
cell_node_6
cell_node_7
Example : ./patchmgr -cells cell_node_1 -cleanup


Download and install latest plugins
 #cd /u01/exa_img_upg/CELL
 chmod +x /u01/exa_img_upg/CELL/patch_12.1.2.3.4.170111/plugins/*


Run prerequisites check
=======================
 #cd /u01/exa_img_upg/CELL/patch_12.1.2.3.4.170111
 ./patchmgr -cells <cell_group> -patch_check_prereq -rolling

Note: Change the <cell_group> to the cell server you are patching as below

cell_node_1
cell_node_2
cell_node_3
cell_node_4
cell_node_5
cell_node_6
cell_node_7
Example : ./patchmgr -cells cell_node_1 -patch_check_prereq -rolling

Patch the cell nodes

 #cd /u01/exa_img_upg/CELL/patch_12.1.2.3.4.170111
        dcli -l root -g <cell_group> imageinfo

 nohup ./patchmgr -cells <cell_group> -patch -rolling &

To Check Progress of Image Upgrade

Monitor the patch progress
Monitor the ILOM console for each cell being patched. You may want to download the ilom-login.sh script from note 1616791.1 to assist with logging into the ILOMs.
 cd /u01/exa_img_upg/CELL/patch_12.1.2.3.4.170111
 tail -f nohup.out
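
In a second terminal, a minimal sketch like the one below (assuming the per-cell group file, for example cell_node_1) reports the image version and activation status every five minutes while patchmgr runs:

# a minimal sketch for a second terminal (assumes the per-cell group file, e.g. cell_node_1)
while true; do
  date
  dcli -l root -g cell_node_1 "imageinfo -version; imageinfo -status"
  sleep 300
done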

Post Patch Space Cleanup
 Cleanup space
 ./patchmgr -cells <cell_group> -cleanup


Post Image Upgrade Validations
==============================
Post Checks
 dcli -l root -g <cell_group> imageinfo -version
 dcli -l root -g <cell_group> imageinfo -status
 dcli -l root -g <cell_group> "uname -r"
 dcli -l root -g <cell_group> cellcli -e list cell
 dcli -l root -g <cell_group> /opt/oracle.cellos/CheckHWnFWProfile
Also, as part of the post-checks, verify grid disk status:
(a) Verify all grid disks have been successfully put online using the following command:
dcli -l root -g cell_group cellcli -e  list griddisk attributes name, asmmodestatus
[root@abcxyzadm01 patch_12.1.2.3.4.170111]# dcli -l root -g <cell_group> cellcli -e  list griddisk attributes name, asmmodestatus
xyzcel07: DATA_DG_CD_00_xyzcel07    ONLINE
xyzcel07: DATA_DG_CD_01_xyzcel07    ONLINE
xyzcel07: DATA_DG_CD_02_xyzcel07    ONLINE
xyzcel07: DATA_DG_CD_03_xyzcel07    ONLINE
xyzcel07: DATA_DG_CD_04_xyzcel07    ONLINE
xyzcel07: DATA_DG_CD_05_xyzcel07    ONLINE
xyzcel07: DATA_DG_CD_06_xyzcel07    ONLINE
xyzcel07: DATA_DG_CD_07_xyzcel07    ONLINE
xyzcel07: DATA_DG_CD_08_xyzcel07    ONLINE
xyzcel07: DATA_DG_CD_09_xyzcel07    ONLINE
xyzcel07: DATA_DG_CD_10_xyzcel07    ONLINE
xyzcel07: DATA_DG_CD_11_xyzcel07    ONLINE
xyzcel07: DBFS_DG_CD_02_xyzcel07       ONLINE
xyzcel07: DBFS_DG_CD_03_xyzcel07       ONLINE
xyzcel07: DBFS_DG_CD_04_xyzcel07       ONLINE
xyzcel07: DBFS_DG_CD_05_xyzcel07       ONLINE
xyzcel07: DBFS_DG_CD_06_xyzcel07       ONLINE
xyzcel07: DBFS_DG_CD_07_xyzcel07       ONLINE
xyzcel07: DBFS_DG_CD_08_xyzcel07       ONLINE
xyzcel07: DBFS_DG_CD_09_xyzcel07       ONLINE
xyzcel07: DBFS_DG_CD_10_xyzcel07       ONLINE
xyzcel07: DBFS_DG_CD_11_xyzcel07       ONLINE
xyzcel07: RECO_DG_CD_00_xyzcel07    ONLINE
xyzcel07: RECO_DG_CD_01_xyzcel07    ONLINE
xyzcel07: RECO_DG_CD_02_xyzcel07    ONLINE
xyzcel07: RECO_DG_CD_03_xyzcel07    ONLINE
xyzcel07: RECO_DG_CD_04_xyzcel07    ONLINE
xyzcel07: RECO_DG_CD_05_xyzcel07    ONLINE
xyzcel07: RECO_DG_CD_06_xyzcel07    ONLINE
xyzcel07: RECO_DG_CD_07_xyzcel07    ONLINE
xyzcel07: RECO_DG_CD_08_xyzcel07    ONLINE
xyzcel07: RECO_DG_CD_09_xyzcel07    ONLINE
xyzcel07: RECO_DG_CD_10_xyzcel07    ONLINE
xyzcel07: RECO_DG_CD_11_xyzcel07    ONLINE
[root@abcxyzadm01 patch_12.1.2.3.4.170111]#


(b) Wait until asmmodestatus is ONLINE for all grid disks. Each disk will go to a 'SYNCING' state first then 'ONLINE'. The following is an example of the output:
DATA_CD_00_xyzcel01 ONLINE
DATA_CD_01_xyzcel01 SYNCING
DATA_CD_02_xyzcel01 OFFLINE
DATA_CD_03_xyzcel01 OFFLINE
DATA_CD_04_xyzcel01 OFFLINE
DATA_CD_05_xyzcel01 OFFLINE
DATA_CD_06_xyzcel01 OFFLINE
DATA_CD_07_xyzcel01 OFFLINE
DATA_CD_08_xyzcel01 OFFLINE
DATA_CD_09_xyzcel01 OFFLINE
DATA_CD_10_xyzcel01 OFFLINE
DATA_CD_11_xyzcel01 OFFLINE
(c) Oracle ASM synchronization is only complete when all grid disks show asmmodestatus=ONLINE.
( Please note:  this operation uses Fast Mirror Resync operation - which does not trigger an ASM rebalance. The Resync operation restores only the extents that would have been written while the disk was offline.)
Note: In the command above, change <cell_group> to the cell group file for the cell server you are patching, as below
cell_node_1
cell_node_2
cell_node_3
cell_node_4
cell_node_5
cell_node_6
cell_node_7


Post-Execution Steps After Image Upgrade Validation


 1. Change disk_repair_time back to its original value.


sqlplus / as sysasm
select dg.name,a.value from v$asm_diskgroup dg, v$asm_attribute a where dg.group_number=a.group_number and a.name='disk_repair_time';
NAME                           VALUE
------------------------------ ------------
DATA_DG                        8.5h
DBFS_DG                        8.5h
RECO_DG                        8.5h
Reset the repair time to the original value if it was changed at the start of patching. Replace <diskgroup_name> in the following statement to adjust: alter diskgroup <diskgroup_name> set attribute 'disk_repair_time'='<original value>';
alter diskgroup <diskgroup_name> set attribute 'disk_repair_time'='3.6h';
Repeat the above statement for each diskgroup as below
alter diskgroup DATA_DG set attribute 'disk_repair_time'='3.6h';
alter diskgroup DBFS_DG set attribute 'disk_repair_time'='3.6h';
alter diskgroup RECO_DG set attribute 'disk_repair_time'='3.6h';


Also put back the rebalance power to 2

alter diskgroup RECO_DG rebalance power 2;
alter diskgroup DATA_DG rebalance power 2;
alter diskgroup DBFS_DG rebalance power 2;


 2. Start the agents that were stopped before the image upgrade, if they are not already running.
sudo su - oracle -c "/u01/app/em12c/core/12.1.0.4.0/bin/emctl status agent"
sudo su - oracle -c "/u01/app/em12c/core/12.1.0.4.0/bin/emctl start agent"
sudo su - oracle -c "/u01/app/em12c/core/12.1.0.4.0/bin/emctl status agent"

Known Issues (in case the image upgrade fails)
 Additional checks (if there were problems)
 cd /u01/exa_img_upg/CELL/patch_12.1.2.3.4.170111
 cat  patchmgr.stdout
 cat  _wip_stdout file
 ssh <cell-node>
 cd /var/log/cellos
 grep -i 'fail' validations.log
 grep -i 'fail' vldrun*.log
 cat validations.log
 cat vldrun.upgrade_reimage_boot.log
 cat vldrun.first_upgrade_boot.log
 cat CheckHWnFWProfile.log
 cat cell.bin.install.log
 cat cellFirstboot.log
 cat exachkcfg.log
 cat patch.out.place.sh.log
 cat install.sh.log

  Rolling Back Successfully Patched Exadata Cells
 (This section describes how to roll back successfully-patched Exadata Cells. Cells with incomplete or failed patching cannot be rolled back.)
 
 Do not run more than one instance of the patchmgr utility at a time in the deployment.
 
 Check the prerequisites using the following command:
 
 ./patchmgr -cells cell_group -rollback_check_prereq [-rolling]
 
 Perform the rollback using the following command:
 
 ./patchmgr -cells cell_group -rollback [-rolling]



Switch Firmware Upgrade
Note : Switch firmware is upgraded in a rolling manner.
Step 1: Identify the master Subnet Manager (its firmware is upgraded first)
[root@xyzsw-iba01 ~]# getmaster
Local SM enabled and running, state STAND BY
20150813 00:20:59 Master SubnetManager on sm lid 1 sm guid 0x10e035c2e0a0a0 : SUN DCS 36P QDR xyzsw-ibb01 192.168.xx.xx
[root@xyzsw-iba01 ~]#

[root@xyzsw-ibb01 ~]# getmaster
Local SM enabled and running, state MASTER
20150813 00:21:00 Master SubnetManager on sm lid 1 sm guid 0x10e035c2e0a0a0 : SUN DCS 36P QDR xyzsw-ibb01 192.168.xx.xx
[root@xyzsw-ibb01 ~]#
Here xyzsw-ibb01 is the Subnet Manager master.


Step 2: Create ibswitches.lst
cd /u01/exa_img_upg/CELL/patch_12.1.2.3.4.170111

# vi ibswitches.lst
xyzsw-ibb01
xyzsw-iba01
Step 3: Run the prerequisite checks
#./patchmgr -ibswitches ibswitches.lst -upgrade -ibswitch_precheck
Note: If the output from the command shows the overall status is SUCCESS, then proceed with the upgrade.
If the output from the command shows the overall status is FAIL, review the error summary in the output to determine which checks failed, then correct the errors. After the errors have been corrected, rerun the prerequisite checks until they succeed.
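
A small wrapper (a sketch, assuming the SUCCESS/FAIL "overall status" wording described above) can capture the precheck output and make the go/no-go decision explicit:

# a minimal sketch, assuming the "overall status" SUCCESS/FAIL wording described above
./patchmgr -ibswitches ibswitches.lst -upgrade -ibswitch_precheck 2>&1 | tee ibswitch_precheck.log
if grep -qi 'overall status.*FAIL' ibswitch_precheck.log; then
  echo "IB switch precheck reported failures - review ibswitch_precheck.log before upgrading"
else
  echo "IB switch precheck looks clean - proceed with the -upgrade step"
fi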



Step 4: Upgrade the switches
#./patchmgr -ibswitches ibswitches.lst -upgrade



Step 5: Check the output from the command and verify the upgrade.
The output should show SUCCESS. If there are errors, correct them and run the upgrade command again.



Compute Node Image Upgrade
==========================
==========================


Compute Node / DB Node / YUM Patch Plan (Rolling)
=================================================

Black out the Compute Node which you are going to patch and disable the crontab entry.
Pre-checks: make sure the prechecks above have been completed.
Compute Nodes
abcxyzadm01
abcxyzadm02
abcxyzadm03
abcxyzadm04


Check image version
===================
cd /root
 dcli -l root -g dbs_group imageinfo -version
 dcli -l root -g dbs_group imageinfo -status
 dcli -l root -g dbs_group uname -r



 Verify the dbnodeupdate script version
 Download the latest version of the dbnodeupdate script from patch 21634633.
 Download dbserver.patch.zip as p21634633_122110_Linux-x86-64.zip; it contains dbnodeupdate.zip and patchmgr for dbnodeupdate orchestration.

 cd /u01/exa_img_upg/YUM
 unzip -o p21634633_122110_Linux-x86-64.zip
 Should be at least version 5.151022
 ./dbnodeupdate.sh -V
 ver=$(./dbnodeupdate.sh -V | awk '{print $3}'); if (( $(echo "$ver < 5.151022" | bc -l) )); then echo -e "\nFAIL: dbnodeupdate version too low. Update before proceeding.\n"; elif (( $(echo "$ver > 5.151022" | bc -l) )); then echo -e "\nPASS: dbnodeupdate version OK\n"; else echo -e "\nWARN: dbnodeupdate minimum version ($ver) detected. Check if there is a newer version before proceeding.\n"; fi
 The dbnodeupdate script is updated frequently (sometimes daily). If your copy is not current, download the latest version.



 Check databases running before stopping CRS
 /u01/app/12.1.0.2/grid/bin/crsctl status resource -t -w "TYPE = ora.database.type"
 ps -ef | grep pmon_ | grep -v grep



 Stop the CRS (Rolling)
Execute on the compute node you are about to upgrade.
/u01/app/12.1.0.2/grid/bin/crsctl disable crs
/u01/app/12.1.0.2/grid/bin/crsctl stop crs
/u01/app/12.1.0.2/grid/bin/crsctl check crs
ps -ef | grep grid | grep -v grep

Reboot server and reset ILOM
uptime

If uptime is more than 7 days, reboot the server
reboot

Reset the ILOM
ipmitool bmc reset cold



Unmount NFS partitions
 umount -a -t nfs -f -l
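
A quick check (a sketch, relying only on the standard mount listing) confirms nothing is still mounted over NFS before continuing:

# a minimal sketch: confirm no NFS filesystems remain mounted
if mount | grep -q ' type nfs'; then
  echo "NFS filesystems are still mounted - unmount them before continuing:"
  mount | grep ' type nfs'
else
  echo "No NFS filesystems mounted"
fi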


Run precheck
==============
 cd /u01/exa_img_upg/YUM/dbserver_patch_5.170420
 ./dbnodeupdate.sh -u -l /u01/exa_img_upg/YUM/p25093501_121234_Linux-x86-64.zip -t 12.1.2.3.4.170111  -v

Perform backup and upgrade
==========================
 Make sure to check known issues section above prior to executing dbnodeupdate.sh
        cd /u01/exa_img_upg/YUM/dbserver_patch_5.170420
 nohup ./dbnodeupdate.sh -u -l /u01/exa_img_upg/YUM/p25093501_121234_Linux-x86-64.zip -t 12.1.2.3.4.170111 -q &



Monitor the reboot
==================
 Monitor the reboot of each node by logging into the ilom console.


After reboot completes
Before running the completion step, run the CheckHWnFWProfile script to make sure it passes. If not, shut the system down and power cycle it from the ILOM (stop /SYS, wait 5 minutes, start /SYS).
 /opt/oracle.cellos/CheckHWnFWProfile

 cd /u01/exa_img_upg/YUM

 umount -a -t nfs -f -l

        cd /u01/exa_img_upg/YUM/dbserver_patch_5.170420
 ./dbnodeupdate.sh -t 12.1.2.3.4.170111 -c

 mount -a

 Verify fuse RPMs are Installed
 yum list installed | grep fuse
 There should be 3 fuse RPMs. If not, check the note "Fuse packages removed as part of dbnodeupdate prereq check (Doc ID 2066488.1)".
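
A quick count (a sketch, assuming the three expected packages all have names starting with fuse) can confirm this:

# a minimal sketch: warn if fewer than the expected 3 fuse packages are installed
count=$(rpm -qa | grep -ci '^fuse')
if [ "$count" -lt 3 ]; then
  echo "Only $count fuse package(s) found - see Doc ID 2066488.1 before proceeding"
else
  echo "fuse packages installed: $count"
fi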



Check version and status
========================
cd /root
 dcli -l root -g dbs_group imageinfo -version
 dcli -l root -g dbs_group imageinfo -status
 dcli -l root -g dbs_group uname -r


Enable CRS
 /u01/app/12.1.0.2/grid/bin/crsctl enable crs
 /u01/app/12.1.0.2/grid/bin/crsctl check crs

If CRS is not already started, start it on the node that was just upgraded.

 /u01/app/12.1.0.2/grid/bin/crsctl start crs


Post checks
===========
 /u01/app/12.1.0.2/grid/bin/crsctl status resource -t -w "TYPE = ora.database.type"


Additional checks (if there were problems)
 ssh <database-node>
 cd /var/log/cellos/
 cat dbnodeupdate.log
 cat dbserver_backup.sh.log
 cat CheckHWnFWProfile.log
 cat exadata.computenode.post.log
 cat cellFirstboot.log
 cat exachkcfg.log
 cat vldrun.each_boot.log
 cat validations.log


Skip starting resources if applying Cell Patch next
 Check agents and restart if not running
 ps -ef | grep agent | grep java | sed 's/\s\+/ /g' | cut -d " " -f 1,8 | sed 's/\/jdk.*//'
        sudo su -l oracle -c "/u01/app/em12c/core/12.1.0.4.0/bin/emctl status agent"
 sudo su -l oracle -c "/u01/app/em12c/core/12.1.0.4.0/bin/emctl start agent"
 sudo su -l oracle -c "/u01/app/em12c/core/12.1.0.4.0/bin/emctl status agent" | grep 'Agent is'
[root@abcxyzadm01 ~]# sudo su -l oracle -c "/u01/app/em12c/core/12.1.0.4.0/bin/emctl status agent"
Oracle Enterprise Manager Cloud Control 12c Release 4
Copyright (c) 1996, 2014 Oracle Corporation.  All rights reserved.
---------------------------------------------------------------
Agent Version          : 12.1.0.4.0
OMS Version            : 12.1.0.4.0
Protocol Version       : 12.1.0.1.0
Agent Home             : /u01/app/em12c/agent_inst
Agent Log Directory    : /u01/app/em12c/agent_inst/sysman/log
Agent Binaries         : /u01/app/em12c/core/12.1.0.4.0
Agent Process ID       : 20987
Parent Process ID      : 20811
Agent URL              : https://abcxyzadm01.xxxx:3872/emd/main/
Local Agent URL in NAT : https://abcxyzadm01.xxxx:3872/emd/main/
Repository URL         : https://XYZ-oemprdapp01.xxxx:4900/empbs/upload
Started at             : 2015-08-28 20:49:00
Started by user        : oracle
Operating System       : Linux version 2.6.39-400.248.3.el6uek.bug21692254.x86_64 (amd64)
Last Reload            : 2015-08-31 10:37:11
Last successful upload                       : 2017-05-09 16:05:20
Last attempted upload                        : 2017-05-09 16:05:20
Total Megabytes of XML files uploaded so far : 2,643.81
Number of XML files pending upload           : 0
Size of XML files pending upload(MB)         : 0
Available disk space on upload filesystem    : 15.97%
Collection Status                            : Collections enabled
Heartbeat Status                             : Ok
Last attempted heartbeat to OMS              : 2017-05-09 16:04:54
Last successful heartbeat to OMS             : 2017-05-09 16:04:54
Next scheduled heartbeat to OMS              : 2017-05-09 16:05:54

---------------------------------------------------------------
Agent is Running and Ready
[root@abcxyzadm01 ~]#


 Rollback Steps in case required
 ===============================

 1. Rolling back the update with the dbnodeupdate.sh utility:
 ./dbnodeupdate.sh -r

 2. Reboot the server using the reboot command.

 # reboot

 3. Run the dbnodeupdate.sh utility in 'completion mode' to finish the post-patching steps.
 As with regular or one-time updates, when switching OS binaries within the same Oracle Home the database kernel should be relinked, so the 'post completion' step must be performed.

 ./dbnodeupdate.sh -c
