# Linux Bonding Configuration Guide



By [likizju](https://paragraph.com/@likizju) · 2021-11-11

---


1.  Introduction to bonding
    

The Linux bonding driver provides a way to aggregate multiple network interfaces into a single logical interface. For critical servers, bonding provides failover via hot standby, improving system reliability; for network-intensive hosts such as file servers, bonding can greatly increase total network I/O bandwidth and performance.

2.  Network topology
    

             |uplink                        uplink|
        +----+-----+                   +-----+----+
        |          |       stack       |          |
        | switch A +-------------------+ switch B |
        |          |                   |          |
        +----+-----+                   +-----+----+
             |port1                    port1|
             |          +-------+           |
             +----------+ host1 +-----------+
                    eth0+-------+eth1
    
3.  Network configuration
    

Recent Linux kernels ship with bonding support built in; installing the ifenslave package is all that is needed before configuring it. Debian is used as the example here.

The host 106.2.49.195 has two NICs, and the goal is as follows:

bond eth0 and eth1 in mode 0, using IP 106.2.49.195

Edit /etc/network/interfaces:

    auto bond0
    iface bond0 inet static
        bond-slaves eth0 eth1
        bond-mode balance-rr
        bond-xmit-hash-policy layer2+3
        bond-miimon 100
        bond-downdelay 200
        bond-updelay 200
        address 106.2.49.195
        netmask 255.255.255.0
        gateway 106.2.49.1

After restarting the network, the virtual interface bond0 appears. Details such as its working mode and current state can be read under /sys/class/net/bond0/bonding.
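The status files below are part of the bonding driver's standard procfs/sysfs interface; reading them is a quick way to confirm the mode and the per-slave link state (requires an existing bond0):

```shell
# full status report: mode, MII status, per-slave link state
cat /proc/net/bonding/bond0

# individual attributes via sysfs
cat /sys/class/net/bond0/bonding/mode       # e.g. "balance-rr 0"
cat /sys/class/net/bond0/bonding/slaves     # e.g. "eth0 eth1"
cat /sys/class/net/bond0/bonding/mii_status
```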

4.  Notes
    

1) The most commonly used bonding modes are:

0, or balance-rr

1, or active-backup

4, or 802.3ad

2) When configuring bonding on the host, the corresponding ports on the upstream switch must first be configured as a port-channel in the matching mode; otherwise the bond will not work properly.
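For illustration only, a static port-channel on an IOS-style switch might be configured roughly as follows; the syntax and the interface names are hypothetical and vary by vendor and model:

```
! hypothetical IOS-style example; Gi0/1-2 are assumed port names
interface range GigabitEthernet0/1 - 2
 channel-group 1 mode on          ! static aggregation, matches mode 0 (balance-rr)
!
! for mode 4 (802.3ad), negotiate with LACP instead:
! channel-group 1 mode active
```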

downdelay

        Specifies the time, in milliseconds, to wait before disabling
        a slave after a link failure has been detected.  This option is
        only valid for the miimon link monitor.  The downdelay value
        should be a multiple of the miimon value; if not, it will be
        rounded down to the nearest multiple.  The default value is 0.
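The rounding rule can be illustrated with a line of shell arithmetic; the numbers are arbitrary examples:

```shell
miimon=100       # link monitoring interval, ms
downdelay=250    # requested value, not a multiple of miimon
# integer division rounds the value down to the nearest multiple of miimon
effective=$(( downdelay / miimon * miimon ))
echo "effective downdelay: ${effective} ms"   # prints "effective downdelay: 200 ms"
```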
    

lacp\_rate

        Option specifying the rate at which we'll ask our link partner
        to transmit LACPDU packets in 802.3ad mode.  Possible values
        are:

        slow or 0
               Request partner to transmit LACPDUs every 30 seconds

        fast or 1
               Request partner to transmit LACPDUs every 1 second

        The default is slow.
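lacp\_rate can also be inspected and, while the bond is down and running in 802.3ad mode, changed through sysfs; bond0 here is assumed to already exist:

```shell
cat /sys/class/net/bond0/bonding/lacp_rate    # e.g. "slow 0"
ip link set bond0 down
echo fast > /sys/class/net/bond0/bonding/lacp_rate
ip link set bond0 up
```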

miimon

         Specifies the MII link monitoring frequency in milliseconds.
    
         This determines how often the link state of each slave is
    
         inspected for link failures.  A value of zero disables MII
    
         link monitoring.  A value of 100 is a good starting point.
    
         The use_carrier option, below, affects how the link state is
    
         determined.  See the High Availability section for additional
    
         information.  The default value is 0.
    

mode

        Specifies one of the bonding policies. The default is
        balance-rr (round robin).  Possible values are:
    
        balance-rr or 0
    
               Round-robin policy: Transmit packets in sequential
               order from the first available slave through the
               last.  This mode provides load balancing and fault
               tolerance.
    
        active-backup or 1
    
               Active-backup policy: Only one slave in the bond is
               active.  A different slave becomes active if, and only
               if, the active slave fails.  The bond's MAC address is
               externally visible on only one port (network adapter)
               to avoid confusing the switch.
    
               In bonding version 2.6.2 or later, when a failover
               occurs in active-backup mode, bonding will issue one
               or more gratuitous ARPs on the newly active slave.
               One gratuitous ARP is issued for the bonding master
               interface and each VLAN interface configured above
               it, provided that the interface has at least one IP
               address configured.  Gratuitous ARPs issued for VLAN
               interfaces are tagged with the appropriate VLAN id.
    
               This mode provides fault tolerance.  The primary
               option, documented below, affects the behavior of this
               mode.
    
        balance-xor or 2
    
               XOR policy: Transmit based on the selected transmit
               hash policy.  The default policy is a simple [(source
                MAC address XOR'd with destination MAC address XOR
               packet type ID) modulo slave count].  Alternate transmit
               policies may be selected via the xmit_hash_policy option,
               described below.
    
               This mode provides load balancing and fault tolerance.
    
        broadcast or 3
    
               Broadcast policy: transmits everything on all slave
               interfaces.  This mode provides fault tolerance.
    
        802.3ad or 4
    
               IEEE 802.3ad Dynamic link aggregation.  Creates
               aggregation groups that share the same speed and
               duplex settings.  Utilizes all slaves in the active
               aggregator according to the 802.3ad specification.
    
               Slave selection for outgoing traffic is done according
               to the transmit hash policy, which may be changed from
               the default simple XOR policy via the xmit_hash_policy
               option, documented below.  Note that not all transmit
               policies may be 802.3ad compliant, particularly in
               regards to the packet mis-ordering requirements of
               section 43.2.4 of the 802.3ad standard.  Differing
               peer implementations will have varying tolerances for
               noncompliance.
    
               Prerequisites:
    
               1. Ethtool support in the base drivers for retrieving
               the speed and duplex of each slave.
    
               2. A switch that supports IEEE 802.3ad Dynamic link
               aggregation.
    
               Most switches will require some type of configuration
               to enable 802.3ad mode.
    
        balance-tlb or 5
    
               Adaptive transmit load balancing: channel bonding that
               does not require any special switch support.
    
               In tlb_dynamic_lb=1 mode; the outgoing traffic is
               distributed according to the current load (computed
               relative to the speed) on each slave.
    
               In tlb_dynamic_lb=0 mode; the load balancing based on
               current load is disabled and the load is distributed
               only using the hash distribution.
    
               Incoming traffic is received by the current slave.
               If the receiving slave fails, another slave takes over
               the MAC address of the failed receiving slave.
    
               Prerequisite:
    
               Ethtool support in the base drivers for retrieving the
               speed of each slave.
    
        balance-alb or 6
    
               Adaptive load balancing: includes balance-tlb plus
               receive load balancing (rlb) for IPV4 traffic, and
               does not require any special switch support.  The
               receive load balancing is achieved by ARP negotiation.
               The bonding driver intercepts the ARP Replies sent by
               the local system on their way out and overwrites the
               source hardware address with the unique hardware
               address of one of the slaves in the bond such that
               different peers use different hardware addresses for
               the server.
    
               Receive traffic from connections created by the server
               is also balanced.  When the local system sends an ARP
               Request the bonding driver copies and saves the peer's
               IP information from the ARP packet.  When the ARP
               Reply arrives from the peer, its hardware address is
               retrieved and the bonding driver initiates an ARP
               reply to this peer assigning it to one of the slaves
               in the bond.  A problematic outcome of using ARP
               negotiation for balancing is that each time that an
               ARP request is broadcast it uses the hardware address
               of the bond.  Hence, peers learn the hardware address
               of the bond and the balancing of receive traffic
               collapses to the current slave.  This is handled by
               sending updates (ARP Replies) to all the peers with
               their individually assigned hardware address such that
               the traffic is redistributed.  Receive traffic is also
               redistributed when a new slave is added to the bond
               and when an inactive slave is re-activated.  The
               receive load is distributed sequentially (round robin)
               among the group of highest speed slaves in the bond.
    
               When a link is reconnected or a new slave joins the
               bond the receive traffic is redistributed among all
               active slaves in the bond by initiating ARP Replies
               with the selected MAC address to each of the
               clients. The updelay parameter (detailed below) must
               be set to a value equal or greater than the switch's
               forwarding delay so that the ARP Replies sent to the
               peers will not be blocked by the switch.
    
               Prerequisites:
    
               1. Ethtool support in the base drivers for retrieving
               the speed of each slave.
    
               2. Base driver support for setting the hardware
               address of a device while it is open.  This is
               required so that there will always be one slave in the
               team using the bond hardware address (the
               curr_active_slave) while having a unique hardware
               address for each slave in the bond.  If the
               curr_active_slave fails its hardware address is
               swapped with the new curr_active_slave that was
               chosen.
    

updelay

        Specifies the time, in milliseconds, to wait before enabling a
        slave after a link recovery has been detected.  This option is
        only valid for the miimon link monitor.  The updelay value
        should be a multiple of the miimon value; if not, it will be
        rounded down to the nearest multiple.  The default value is 0.
    

use\_carrier

        Specifies whether or not miimon should use MII or ETHTOOL
        ioctls vs. netif_carrier_ok() to determine the link
        status. The MII or ETHTOOL ioctls are less efficient and
        utilize a deprecated calling sequence within the kernel.  The
        netif_carrier_ok() relies on the device driver to maintain its
        state with netif_carrier_on/off; at this writing, most, but
        not all, device drivers support this facility.
    
        If bonding insists that the link is up when it should not be,
        it may be that your network device driver does not support
        netif_carrier_on/off.  The default state for netif_carrier is
        "carrier on," so if a driver does not support netif_carrier,
        it will appear as if the link is always up.  In this case,
        setting use_carrier to 0 will cause bonding to revert to the
        MII / ETHTOOL ioctl method to determine the link state.
    
        A value of 1 enables the use of netif_carrier_ok(), a value of
        0 will use the deprecated MII / ETHTOOL ioctls.  The default
        value is 1.
    

xmit\_hash\_policy

        Selects the transmit hash policy to use for slave selection in
        balance-xor, 802.3ad, and tlb modes.  Possible values are:
    
        layer2
    
               Uses XOR of hardware MAC addresses and packet type ID
               field to generate the hash. The formula is
    
               hash = source MAC XOR destination MAC XOR packet type ID
               slave number = hash modulo slave count
    
               This algorithm will place all traffic to a particular
               network peer on the same slave.
    
               This algorithm is 802.3ad compliant.
    
        layer2+3
    
               This policy uses a combination of layer2 and layer3
               protocol information to generate the hash.
    
               Uses XOR of hardware MAC addresses and IP addresses to
               generate the hash.  The formula is
    
               hash = source MAC XOR destination MAC XOR packet type ID
               hash = hash XOR source IP XOR destination IP
               hash = hash XOR (hash RSHIFT 16)
               hash = hash XOR (hash RSHIFT 8)
               And then hash is reduced modulo slave count.
    
               If the protocol is IPv6 then the source and destination
               addresses are first hashed using ipv6_addr_hash.
    
               This algorithm will place all traffic to a particular
               network peer on the same slave.  For non-IP traffic,
               the formula is the same as for the layer2 transmit
               hash policy.
    
               This policy is intended to provide a more balanced
               distribution of traffic than layer2 alone, especially
               in environments where a layer3 gateway device is
               required to reach most destinations.
    
               This algorithm is 802.3ad compliant.
    
        layer3+4
    
               This policy uses upper layer protocol information,
               when available, to generate the hash.  This allows for
               traffic to a particular network peer to span multiple
               slaves, although a single connection will not span
               multiple slaves.
    
               The formula for unfragmented TCP and UDP packets is
    
               hash = source port, destination port (as in the header)
               hash = hash XOR source IP XOR destination IP
               hash = hash XOR (hash RSHIFT 16)
               hash = hash XOR (hash RSHIFT 8)
               And then hash is reduced modulo slave count.
    
               If the protocol is IPv6 then the source and destination
               addresses are first hashed using ipv6_addr_hash.
    
               For fragmented TCP or UDP packets and all other IPv4 and
               IPv6 protocol traffic, the source and destination port
               information is omitted.  For non-IP traffic, the
               formula is the same as for the layer2 transmit hash
               policy.
    
               This algorithm is not fully 802.3ad compliant.  A
               single TCP or UDP conversation containing both
               fragmented and unfragmented packets will see packets
               striped across two interfaces.  This may result in out
               of order delivery.  Most traffic types will not meet
               this criteria, as TCP rarely fragments traffic, and
               most UDP traffic is not involved in extended
               conversations.  Other implementations of 802.3ad may
               or may not tolerate this noncompliance.
    
        encap2+3
    
               This policy uses the same formula as layer2+3 but it
               relies on skb_flow_dissect to obtain the header fields
               which might result in the use of inner headers if an
               encapsulation protocol is used. For example this will
               improve the performance for tunnel users because the
               packets will be distributed according to the encapsulated
               flows.
    
        encap3+4
    
               This policy uses the same formula as layer3+4 but it
               relies on skb_flow_dissect to obtain the header fields
               which might result in the use of inner headers if an
               encapsulation protocol is used. For example this will
               improve the performance for tunnel users because the
               packets will be distributed according to the encapsulated
               flows.
    
        The default value is layer2.  This option was added in bonding
        version 2.6.3.  In earlier versions of bonding, this parameter
        does not exist, and the layer2 policy is the only policy.  The
        layer2+3 value was added for bonding version 3.2.2.
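As a rough sketch of the layer2 and layer3+4 recipes above, the arithmetic can be mimicked in shell. All MAC octets, addresses, and ports below are made-up examples, and the real kernel packs the fields differently, so this only illustrates the shape of the computation:

```shell
slave_count=2

# layer2: (source MAC XOR destination MAC XOR packet type ID) modulo slave count
src_mac=0x5e; dst_mac=0x1f; ptype=0x0800   # last MAC octets + EtherType, hypothetical
l2_hash=$(( src_mac ^ dst_mac ^ ptype ))
echo "layer2 slave index:   $(( l2_hash % slave_count ))"

# layer3+4: fold ports and IPs together, then mix the upper bits down
sport=443; dport=51234            # example TCP flow
sip=$(( (10 << 24) | 1 ))         # 10.0.0.1 as a 32-bit integer
dip=$(( (10 << 24) | 2 ))         # 10.0.0.2
l34_hash=$(( sport ^ dport ))
l34_hash=$(( l34_hash ^ sip ^ dip ))
l34_hash=$(( l34_hash ^ (l34_hash >> 16) ))
l34_hash=$(( l34_hash ^ (l34_hash >> 8) ))
echo "layer3+4 slave index: $(( l34_hash % slave_count ))"
```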
    

Creating and Destroying Bonds
-----------------------------

To add a new bond foo:

    echo +foo > /sys/class/net/bonding_masters

To remove an existing bond bar:

    echo -bar > /sys/class/net/bonding_masters

To show all existing bonds:

    cat /sys/class/net/bonding_masters
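Combining the sysfs knobs above, a bond can in principle be assembled entirely from the shell (as root, with the bonding module loaded; bond1 and eth2 are hypothetical names):

```shell
echo +bond1 > /sys/class/net/bonding_masters           # create the bond
echo balance-rr > /sys/class/net/bond1/bonding/mode    # set mode while it has no slaves
echo +eth2 > /sys/class/net/bond1/bonding/slaves       # enslave an interface
cat /proc/net/bonding/bond1                            # verify
```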

Ref: [https://www.kernel.org/doc/Documentation/networking/bonding.txt](https://www.kernel.org/doc/Documentation/networking/bonding.txt)

Bonding modes: 0 through 6, i.e. seven modes in total.

Mode 0 (balance-rr, round-robin policy). Packets are transmitted sequentially across the slaves (the first packet goes out eth0, the next out eth1, and so on, cycling until the last packet has been sent). This mode provides load balancing and fault tolerance. However, if the packets of a single connection or session leave through different interfaces and then traverse different links, they may arrive out of order at the client; out-of-order packets have to be retransmitted, so network throughput drops.

Mode 1 (active-backup policy). Only one device is active at a time; when it fails, the backup immediately takes over as the active device. The MAC address is externally visible on only one port, so the bond's MAC address appears unique from the outside and the switch is not confused. This mode provides fault tolerance only. Its advantage is high availability of the network connection, but resource utilization is low: with N interfaces, only one is working, so utilization is 1/N.

Mode 2 (balance-xor, XOR policy). Packets are transmitted according to the selected transmit hash policy. The default policy is (source MAC address XOR destination MAC address) modulo slave count; other policies can be selected via the xmit\_hash\_policy option. This mode provides load balancing and fault tolerance.

Mode 3 (broadcast policy). Every packet is transmitted on every slave interface. This mode provides fault tolerance.

Mode 4 (802.3ad, IEEE 802.3ad dynamic link aggregation). Creates an aggregation group whose members share the same speed and duplex settings, and uses all slaves in the active aggregator according to the 802.3ad specification. Slave selection for outgoing traffic is based on the transmit hash policy, which can be changed from the default XOR policy via the xmit\_hash\_policy option. Note that not all transmit policies are 802.3ad compliant, in particular with respect to the packet mis-ordering requirements of section 43.2.4 of the 802.3ad standard; different implementations tolerate noncompliance to different degrees. Prerequisites: 1) ethtool support for retrieving each slave's speed and duplex settings; 2) a switch that supports IEEE 802.3ad dynamic link aggregation; 3) most switches require specific configuration before 802.3ad mode will work.

Mode 5 (balance-tlb, adaptive transmit load balancing). Channel bonding that requires no special switch support. Outgoing traffic is distributed across the slaves according to the current load on each (computed relative to its speed). If the slave that is receiving traffic fails, another slave takes over the failed slave's MAC address. Prerequisite: ethtool support for retrieving each slave's speed.

Mode 6 (balance-alb, adaptive load balancing). Includes balance-tlb plus receive load balancing (rlb) for IPv4 traffic, and requires no switch support. Receive load balancing is achieved through ARP negotiation: the bonding driver intercepts the ARP replies sent by the local host and rewrites the source hardware address with the unique hardware address of one of the slaves in the bond, so that different peers use different hardware addresses to reach the server. Receive traffic for connections initiated by the server is balanced as well: when the local host sends an ARP request, the bonding driver copies and saves the peer's IP information from the packet; when the ARP reply arrives from the peer, the driver extracts its hardware address and issues an ARP reply assigning that peer to one of the slaves in the bond. One problem with ARP-based balancing is that every broadcast ARP request uses the bond's hardware address; once the peers learn that address, all receive traffic flows to the current slave. This is handled by sending updates (ARP replies) to all peers with their individually assigned hardware addresses, which redistributes the traffic. Receive traffic is also redistributed when a new slave joins the bond or an inactive slave is reactivated; the receive load is distributed sequentially (round robin) among the highest-speed slaves in the bond. When a link is reconnected or a new slave joins the bond, receive traffic is redistributed among all active slaves by issuing ARP replies with the assigned MAC address to each client. The updelay parameter (described above) must be set to a value greater than or equal to the switch's forwarding delay, so that the ARP replies sent to the peers are not blocked by the switch. Prerequisites: 1) ethtool support for retrieving each slave's speed; 2) base driver support for changing a device's hardware address while it is up, so that one slave (curr\_active\_slave) always uses the bond's hardware address while every slave in the bond keeps a unique hardware address; if curr\_active\_slave fails, its hardware address is taken over by the newly chosen curr\_active\_slave.

In practice, the difference between mode 6 and mode 0: mode 6 fills up eth0 first, then eth1, and so on through ethX, so the first interface carries a high load while the second carries only a small share of the traffic; with mode 0, both interfaces show steady, roughly equal bandwidth.

Source: [http://wushank.blog.51cto.com/3489095/1147864](http://wushank.blog.51cto.com/3489095/1147864)

---

*Originally published on [likizju](https://paragraph.com/@likizju/linux-bonding)*
