Manual Node Recovery Guide
This runbook describes what steps node providers need to take during a manual node recovery.
Security warning
⚠️⚠️⚠️ Don’t get tricked into compromising your nodes. Only complete a manual node recovery if all of the following conditions are met:
- A subnet recovery is announced on the Internet Computer Status Page.
- The DFINITY team has reached out on the dedicated Matrix channel #ic-node-providers-incident-response:matrix.org.
  - Normally, only the DFINITY team can send messages on this channel. During an incident, permissions are changed so that everyone can send messages.
Prerequisites
- The recovery coordinator should have communicated the following to you:
  - The recovery parameters:
    - The VERSION: the commit ID of the recovery-GuestOS update image.
    - The VERSION-HASH: the SHA256 sum of the recovery-GuestOS update image.
    - The RECOVERY-HASH: the SHA256 sum of the recovery.tar.zst.
  - The node(s): which specific nodes managed by the NP/NO are part of the target subnet.
- Obtain console access to all nodes you run that are part of the target subnet.
  - Note that the recovery can be completed from a physical console or from the node's remote BMC virtual console view.
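Before going to the console, it can help to sanity-check the shape of the parameters you received. The following is a minimal sketch meant for your own workstation, not for the node. It assumes that VERSION is a 40-character lowercase hexadecimal Git commit ID and that both hashes are 64-character lowercase hexadecimal SHA256 sums. The function names are hypothetical and are not part of any IC tooling.

```shell
#!/usr/bin/env bash
# Hypothetical helper: checks only the *shape* of the recovery parameters.
# Assumptions (not stated in this guide): VERSION is a 40-character hex
# commit ID, and the two hashes are 64-character hex SHA256 sums.

is_hex_of_length() {
  local value="$1" length="$2"
  # Explicit character list avoids locale-dependent range matching.
  [[ "$value" =~ ^[0123456789abcdef]+$ ]] && [ "${#value}" -eq "$length" ]
}

check_recovery_params() {
  local version="$1" version_hash="$2" recovery_hash="$3" status=0
  is_hex_of_length "$version" 40 \
    || { echo "VERSION: expected 40 hex characters" >&2; status=1; }
  is_hex_of_length "$version_hash" 64 \
    || { echo "VERSION-HASH: expected 64 hex characters" >&2; status=1; }
  is_hex_of_length "$recovery_hash" 64 \
    || { echo "RECOVERY-HASH: expected 64 hex characters" >&2; status=1; }
  return "$status"
}
```

A passing check only means the values look well formed. It does not confirm that they are the values the recovery coordinator announced.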
Recovery Steps
For each node to recover, you should perform the following process.
Obtain console access
Again, note that the recovery can be completed from a physical console or from the node's remote BMC virtual console view.
You should see the limited-console> prompt. Type help to see the full list of limited-console commands.
Initiate manual recovery TUI
Type manual-recovery to initiate the manual recovery TUI.
You should then be taken to the manual recovery text user interface (TUI).
If you fail to enter the manual recovery TUI, see the Manual Recovery Fallback section.
Input recovery parameters
Input the VERSION, VERSION-HASH, and RECOVERY-HASH provided by the recovery coordinator.
Please take great care to type the characters precisely. If a single character is wrong, the recovery will not succeed and you will have to restart.
Note: certain BMCs offer a Virtual Clipboard within the Console Controls to paste text to the console, which you may find useful.
Confirm recovery parameters
Please take a moment to verify that the recovery parameters you entered exactly match those provided by the recovery coordinator. Again, if a single character is wrong, the recovery will not succeed and you will have to restart.
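Long hexadecimal strings are easier to compare by eye when split into short groups. As a small aid, the sketch below prints a value in space-separated groups of 8 characters so you can check the console against the coordinator's message one group at a time. It is intended for your workstation. The function name group_hex is hypothetical, and the group size of 8 is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a string in space-separated groups of 8
# characters to make visual comparison easier.

group_hex() {
  local value="$1" out="" i
  for (( i = 0; i < ${#value}; i += 8 )); do
    out+="${value:i:8} "
  done
  # Drop the trailing space added after the last group.
  printf '%s\n' "${out% }"
}
```

For example, `group_hex 0123456789abcdef01` prints `01234567 89abcdef 01`.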
Monitor the recovery process
Once you have initiated the recovery process, monitor the recovery logs.
After ~30 seconds, you should see the log:
========================================================================
SUCCESS: Recovery completed successfully!
========================================================================
The system should then output standard boot logs:
Congratulations! You have successfully completed the manual node recovery!
If you reach a recovery error page instead, this is almost certainly the result of incorrectly entered recovery parameters.
If you reach the recovery error page, do not worry. Press Enter, return to the “Initiate manual recovery” step, and try again. If errors persist, please contact the recovery coordinator in the Matrix channel and post a screenshot of your recovery error page.
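If you happen to capture the console output to a text file (for example through a BMC or serial-over-LAN logging feature, which this guide does not cover), the success marker can be searched for rather than spotted by eye. The sketch below assumes such a capture file exists. The function name and the file path are hypothetical, and the only string taken from this guide is the success line itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a captured console log contains the
# success line shown in this guide. How the log is captured is an assumption
# and depends on your BMC or console setup.

recovery_succeeded() {
  local log_file="$1"
  # -F: match the text literally; -q: no output, exit status only.
  grep -qF 'SUCCESS: Recovery completed successfully!' "$log_file"
}

# Example usage (path is a placeholder):
#   if recovery_succeeded /path/to/console.log; then
#     echo "Success marker found"
#   else
#     echo "Success marker not found yet"
#   fi
```

Absence of the marker in a capture does not by itself indicate failure. The console remains the authoritative view.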
Notify of a successful recovery
Send a message in the Matrix channel confirming that you have successfully completed recovery.
Wait for recovery confirmation
Once the recovery process on your node is complete and you have notified the Matrix channel, continue to monitor the channel until the subnet is back online and the recovery is complete.
⚠️ Manual Recovery Fallback ⚠️
A manual recovery fallback is available if the manual recovery TUI fails to render.
Enter the rbash-console
Type rbash-console to enter the rbash-console.
Run the manual recovery fallback command:
sudo /opt/ic/bin/guestos-recovery-launcher.sh version=<VERSION> version-hash=<VERSION-HASH> recovery-hash=<RECOVERY-HASH>
You may then resume the recovery instructions from the Monitor the recovery process step.
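Because the fallback command is long and must be typed exactly, you may want to assemble it on your workstation first and then type or paste it into the console. The sketch below only prints the command line from this guide with your three values substituted. It does not run anything on the node. The function name build_fallback_command is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the fallback command from this guide with the
# three recovery parameters filled in. It only prints text and runs nothing.

build_fallback_command() {
  local version="$1" version_hash="$2" recovery_hash="$3"
  printf 'sudo /opt/ic/bin/guestos-recovery-launcher.sh version=%s version-hash=%s recovery-hash=%s\n' \
    "$version" "$version_hash" "$recovery_hash"
}

# Example usage, with the values announced by the recovery coordinator
# stored in shell variables beforehand:
#   build_fallback_command "$VERSION" "$VERSION_HASH" "$RECOVERY_HASH"
```

Compare the printed line against the coordinator's parameters once more before entering it on the node.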