Subscribe to detoo

As Ethereum PoS validators, we want to stay online as much as we can. We can do a lot to build a reliable node that keeps itself available, but one thing we don’t have much control over is the internet provider.
Unfortunately, where I set up the node I have only Comcast, which is the nation’s most beloved provider and is famous for its highly reliable uptime as well as its hassle-free cancellation process (kek).
But no High Availability is too high, which is why I have also subscribed to Starlink. Because hey, sending blocks to space! How cool is that?
The idea is to hot-switch providers whenever the active one fails to reach the internet. Sounds simple, but it turned out to be less straightforward than I thought.
For one, simply having both interfaces online is not enough. When a provider fails, it usually fails on the WAN side rather than the LAN side. From the node OS’s (e.g. Ubuntu’s) point of view the connection is still up, so it will not switch to the other interface and keeps sending packets through the failing one.
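To make the distinction concrete, here is a small sketch that combines two signals: the link state the kernel reports in /sys/class/net/<iface>/operstate, and the result of a ping sent out of that interface. The function name classify_link and its labels are hypothetical, not part of any tool. The "wan-down" case is the one a network manager does not react to, because the link still reads "up".

```shell
#!/bin/bash
# Hypothetical helper: combine the kernel's link state with a ping result.
# $1: contents of /sys/class/net/<iface>/operstate (anything other than "up" counts as down)
# $2: exit code of a ping sent out of that interface (0 = reply received)
classify_link() {
  local operstate="$1" ping_rc="$2"
  if [ "$operstate" != "up" ]; then
    echo "link-down"   # LAN side is down; the OS notices this on its own
  elif [ "$ping_rc" -ne 0 ]; then
    echo "wan-down"    # link looks fine locally, but the internet is unreachable
  else
    echo "ok"
  fi
}

# Usage on the node (requires the interface to exist):
#   state=$(cat /sys/class/net/eno1/operstate)
#   ping -I eno1 -q -c 2 www.yahoo.com > /dev/null 2>&1; rc=$?
#   classify_link "$state" "$rc"
```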
It seems that failing over based on internet/DNS reachability requires higher-layer logic than a network manager normally supports. I also tried netplan bonding, but it seemed to get stuck for the same reason.
Although the network manager alone might not be suitable for our needs, it is good at setting interface priority. We can combine it with a simple cronjob that checks the internet connection periodically and then tells the network manager to switch interfaces accordingly.
The following example assumes two internet-facing interfaces: eno1, which is Ethernet-based, and wlp3s0, which is WiFi-based. Each is connected to its own regular consumer router with DHCP enabled.
Again, we use netplan as the network manager. First, create the following config files (they are split into multiple files for easier management):
/etc/netplan/00-installer-config-eno1.yaml: basic configs for eno1
network:
  version: 2
  ethernets:
    eno1:
      dhcp4: true
/etc/netplan/01-installer-config-wlp3s0.yaml: basic configs for wlp3s0 (remember to update the access-point name and password)
network:
  version: 2
  wifis:
    wlp3s0:
      access-points:
        YourWiFiName:
          password: YourWiFiPassword
      dhcp4: true
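/etc/netplan/02-prioritize-eno1.yaml: configs for prioritizing eno1
network:
  version: 2
  ethernets:
    eno1:
      dhcp4-overrides:
        route-metric: 100 # lower number = higher priority
  wifis:
    wlp3s0:
      dhcp4-overrides:
        route-metric: 200
/etc/netplan/02-prioritize-wlp3s0.yaml.disabled: configs for prioritizing wlp3s0 (starts out disabled)
network:
  version: 2
  ethernets:
    eno1:
      dhcp4-overrides:
        route-metric: 200
  wifis:
    wlp3s0:
      dhcp4-overrides:
        route-metric: 100 # lower number = higher priority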
The first two files are basic configs that always apply. The last two files are mutually exclusive, and only one of them should be enabled at any given time: netplan only reads files ending in .yaml, so renaming the inactive one to .yaml.disabled keeps it out of the picture. This way we can tell the network manager to prefer either eno1 or wlp3s0, whichever is available. Then we run a cronjob to perform the check periodically.
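To check which interface is actually winning after a netplan apply, one option is to look at the default routes and their metrics: the kernel prefers the default route with the lowest metric, which is what route-metric sets. The sketch below assumes the usual `ip route show default` output format (`default via <gw> dev <iface> ... metric <n>`); the function name preferred_default_dev is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read "ip route show default" output on stdin and
# print the device of the default route with the lowest metric.
preferred_default_dev() {
  awk '
    $1 == "default" {
      dev = ""; metric = 0              # an IPv4 route without "metric" has metric 0
      for (i = 2; i < NF; i++) {
        if ($i == "dev") dev = $(i + 1)
        if ($i == "metric") metric = $(i + 1) + 0
      }
      if (dev != "" && (best == "" || metric < best_metric)) {
        best = dev; best_metric = metric
      }
    }
    END { if (best != "") print best }
  '
}

# Usage on the node:
#   ip route show default | preferred_default_dev
```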
Create the script for the cronjob at ~/bin/internet-failover/internet-failover.sh (or anywhere else you prefer) and make it executable with chmod +x, since cron runs it directly:
#!/bin/bash
# choose a reliable URL or IP to check against.
# we use a hostname here so that DNS failures are also detected
CHECK_URL="www.yahoo.com"
# remember to update the interface IDs if necessary
IF0="eno1"
IF1="wlp3s0"
PRI_PLAN_PATH_PREFIX="/etc/netplan/02-prioritize-"
PRI_PLAN_PATH_POSTFIX=".yaml"

# determine the current primary interface by finding the enabled plan file
if [ -f "$PRI_PLAN_PATH_PREFIX$IF0$PRI_PLAN_PATH_POSTFIX" ]; then
  echo "current primary plan: $IF0"
  CUR_IF="$IF0"
  NEXT_IF="$IF1"
else
  echo "current primary plan: $IF1"
  CUR_IF="$IF1"
  NEXT_IF="$IF0"
fi

echo "checking if internet is available..."
# send 2 pings out of the current primary interface only
if ! ping -I "$CUR_IF" -q -c 2 "$CHECK_URL" > /dev/null 2>&1; then
  echo "[WARN] internet not available"
  echo "switching primary plan to: $NEXT_IF ..."
  # disable the current plan and enable the other one
  mv "$PRI_PLAN_PATH_PREFIX$CUR_IF$PRI_PLAN_PATH_POSTFIX" "$PRI_PLAN_PATH_PREFIX$CUR_IF$PRI_PLAN_PATH_POSTFIX.disabled"
  mv "$PRI_PLAN_PATH_PREFIX$NEXT_IF$PRI_PLAN_PATH_POSTFIX.disabled" "$PRI_PLAN_PATH_PREFIX$NEXT_IF$PRI_PLAN_PATH_POSTFIX"
  /usr/sbin/netplan apply
fi
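Before letting cron rename files in /etc/netplan, it can be worth rehearsing the swap. The sketch below is a hypothetical refactor (swap_plans is not part of the original script) that takes the plan directory as a parameter, skips netplan apply, and adds a guard the original lacks: it refuses to rename anything unless the current plan and the other interface's .disabled plan both exist, so a missing file cannot leave the node with no priority plan at all.

```shell
#!/bin/bash
# Hypothetical refactor of the swap step. It only renames files; it does NOT run "netplan apply".
# $1: plan directory, $2: current interface, $3: next interface
swap_plans() {
  local dir="$1" cur="$2" next="$3"
  local cur_plan="$dir/02-prioritize-$cur.yaml"
  local next_plan="$dir/02-prioritize-$next.yaml"
  # only swap when both files are in the expected state
  if [ ! -f "$cur_plan" ] || [ ! -f "$next_plan.disabled" ]; then
    echo "[ERROR] unexpected plan files in $dir" >&2
    return 1
  fi
  mv "$cur_plan" "$cur_plan.disabled"
  mv "$next_plan.disabled" "$next_plan"
}

# Usage: rehearse a failover from eno1 to wlp3s0 in a temp directory
#   tmp=$(mktemp -d)
#   touch "$tmp/02-prioritize-eno1.yaml" "$tmp/02-prioritize-wlp3s0.yaml.disabled"
#   swap_plans "$tmp" eno1 wlp3s0 && ls "$tmp"
```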
Create the cronjob and set it to run every minute:
sudo crontab -e
# add the following line (remember to update the path to the script if necessary):
*/1 * * * * /home/username/bin/internet-failover/internet-failover.sh 2>&1 | logger -t internet-failover
# save it
# check execution
tail -f /var/log/syslog | grep internet-failover
# after a minute or so we should see the logs
CRON: (root) CMD (/home/username/bin/internet-failover/internet-failover.sh 2>&1 | logger -t internet-failover)
internet-failover: current primary plan: eno1
internet-failover: checking if internet is available...
If we are feeling lucky, we can cut the internet on eno1 (for example, by unplugging the WAN cable from its router) and watch the cronjob detect it and switch providers:
CRON: (root) CMD (/home/username/bin/internet-failover/internet-failover.sh 2>&1 | logger -t internet-failover)
internet-failover: current primary plan: eno1
internet-failover: checking if internet is available...
internet-failover: [WARN] internet not available
internet-failover: switching primary plan to: wlp3s0 ...
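One thing to keep in mind: as written, the script switches on the first failed check, so a single lost ping or a brief DNS hiccup moves traffic over, and if both providers are down at once it flips back and forth every minute. A possible refinement, sketched below under our own assumptions (should_switch, the state file, and the threshold are all hypothetical), is to switch only after N consecutive failed checks, counting them in a small file between cron runs.

```shell
#!/bin/bash
# Hypothetical debounce: print "switch" only after N consecutive failed checks.
# $1: state file, $2: exit code of this run's ping, $3: threshold N
should_switch() {
  local state_file="$1" ping_rc="$2" threshold="$3"
  local fails=0
  [ -f "$state_file" ] && fails=$(cat "$state_file")
  if [ "$ping_rc" -eq 0 ]; then
    echo 0 > "$state_file"   # a successful check resets the streak
    echo "stay"
    return
  fi
  fails=$((fails + 1))
  if [ "$fails" -ge "$threshold" ]; then
    echo 0 > "$state_file"   # reset once we decide to switch
    echo "switch"
  else
    echo "$fails" > "$state_file"
    echo "stay"
  fi
}
```

In the failover script this would replace the direct ping condition, e.g. run the ping, capture `rc=$?`, and switch only when `[ "$(should_switch /var/tmp/internet-failover.fails "$rc" 3)" = "switch" ]`. With cron running every minute, a threshold of 3 means roughly three minutes of outage before switching, which trades a little extra downtime for fewer spurious switches.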
Stake long and prosper!