Implementing a transparent proxy with the IP_TRANSPARENT socket option

Sockets have an IP_TRANSPARENT option, which lets a server program accept connections addressed to any IP address, even addresses that are not configured on the local machine. This is very useful when implementing a transparent proxy server, and it is simple to use:

int opt = 1;
setsockopt(server_socket, SOL_IP, IP_TRANSPARENT, &opt, sizeof(opt));
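For context, here is a minimal sketch of where that call could sit in a listener's setup. The helper name make_transparent_listener and the use of SO_REUSEADDR are assumptions made for illustration, not part of the original code. Setting IP_TRANSPARENT requires privileges (CAP_NET_ADMIN according to ip(7)), so an unprivileged process should expect EPERM.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the original article): create a TCP
 * listener on 0.0.0.0:port with IP_TRANSPARENT set.
 * Returns the listening fd, or -1 with errno set (EPERM when the
 * process lacks the privilege needed for IP_TRANSPARENT).
 */
int make_transparent_listener(uint16_t port)
{
    int opt = 1;
    int saved_errno;
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY); /* bind to 0.0.0.0 */
    addr.sin_port = htons(port);

    /* Set the options before bind(), then bind and listen. */
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt)) < 0 ||
        setsockopt(fd, SOL_IP, IP_TRANSPARENT, &opt, sizeof(opt)) < 0 ||
        bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, SOMAXCONN) < 0) {
        saved_errno = errno; /* keep the first error across close() */
        close(fd);
        errno = saved_errno;
        return -1;
    }
    return fd;
}
```

A caller would typically pass port 80 for the scenario in this article and treat a return value of -1 with errno == EPERM as "run me with the required capability".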


0. Introduction: a TCP server bound to 0.0.0.0

It is well known that a TCP server can bind to 0.0.0.0. But which address does it actually use, and when is that decided? The answer is "it is determined by a reverse route lookup based on the address of the connection source". If a host with address A connects to the server, then after receiving the SYN the server looks up the route whose destination is A and determines its source address from that lookup. However, if the IP_TRANSPARENT option is not set, the address the client connected to must be found in the local routing table, otherwise the connection cannot be established.


So suppose I have a TCP server without the IP_TRANSPARENT option, bound to 0.0.0.0 port 80, and I want it to intercept traffic that passes through this machine on its way to 56.56.56.56:80. What do I do? Once we know how TCP source address selection works, we only need to add the following route:


ip route add local 56.56.56.56 dev lo tab local

With this route, all traffic to 56.56.56.56 that passes through this machine is delivered to local_in, because the lookup finds the route in the local table. No interface on this machine actually has the address 56.56.56.56, but when the server on local port 80 replies with the SYN-ACK, the reverse route lookup finds the 56.56.56.56 route in the local table and succeeds, so the connection between A and 56.56.56.56:80 is established.
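The same effect can be observed without any extra configuration on a standard Linux system, because 127.0.0.0/8 is already a local route on lo in the local table. The sketch below (the helper name accepted_local_addr is an assumption made for illustration) binds a listener to 0.0.0.0, connects to an address of the caller's choice, and reports the local address of the accepted socket via getsockname(). For an address covered by a local route, the accepted socket takes on exactly the address the client connected to, even though that address was never assigned to an interface.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: listen on 0.0.0.0 (ephemeral port), connect to
 * dst_ip on that port, and write the accepted socket's local address
 * into out. Returns 0 on success, -1 on any failure.
 */
int accepted_local_addr(const char *dst_ip, char *out, size_t outlen)
{
    int rc = -1, lfd = -1, cfd = -1, afd = -1;
    struct sockaddr_in addr, local;
    socklen_t len = sizeof(addr);

    lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0)
        goto done;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = 0; /* let the kernel pick a free port */
    if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(lfd, 1) < 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &len) < 0)
        goto done;

    /* Reuse addr (it now holds the chosen port) as the connect target. */
    cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (cfd < 0 || inet_pton(AF_INET, dst_ip, &addr.sin_addr) != 1)
        goto done;
    if (connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto done;

    afd = accept(lfd, NULL, NULL);
    if (afd < 0)
        goto done;
    len = sizeof(local);
    if (getsockname(afd, (struct sockaddr *)&local, &len) < 0)
        goto done;
    if (inet_ntop(AF_INET, &local.sin_addr, out, (socklen_t)outlen) == NULL)
        goto done;
    rc = 0;
done:
    if (afd >= 0)
        close(afd);
    if (cfd >= 0)
        close(cfd);
    if (lfd >= 0)
        close(lfd);
    return rc;
}
```

Calling accepted_local_addr("127.0.0.2", buf, sizeof(buf)) on a typical Linux host fills buf with 127.0.0.2. A proxy that relies on the local-route trick can use the same getsockname() call on each accepted socket to learn which destination the client was originally trying to reach.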


However, if there are N destinations, do we have to add N addresses to the local routing table? Is there a way to intercept all traffic arriving on port 80 with only a few rules? Yes: set the IP_TRANSPARENT option on the proxy server's socket.


1. General Configuration

Steering the route lookup process still requires policy routing. The following configuration meets the requirement above.


a). Identify the traffic to be intercepted (proxied)

Optional configuration 1: For NICs only

ip rule add iif <incoming NIC> tab proxy

Optional configuration 2: for more complex five-tuple information

iptables -t mangle -A PREROUTING <match traffic for a specific port> -j MARK --set-mark <mark>

ip rule add fwmark <mark set above> iif <incoming NIC> tab proxy

b). Add routing table entries to the policy routing table

ip route add local 0.0.0.0/0 dev lo tab proxy

Note: 0.0.0.0/0 can also be written as default. When adding the routing table entry, pay attention to the type local. It is a route type, and once traffic matches a route of this type, all of it is delivered to the local machine for processing.


It is worth noting that the configuration above does not require adding any entries to the local table, because we set the IP_TRANSPARENT option. For the exact restriction, see the ip_route_output_slow function in net/ipv4/route.c in the Linux kernel source:


if (oldflp->oif == 0
    && (ipv4_is_multicast(oldflp->fl4_dst) ||
        oldflp->fl4_dst == htonl(0xFFFFFFFF))) {
    dev_out = ip_dev_find(net, oldflp->fl4_src); // force a lookup in the local table
    ...
}
if (!(oldflp->flags & FLOWI_FLAG_ANYSRC)) { // taken only when IP_TRANSPARENT is NOT set: the source address must then be local
    ...
}


2. Summary

That completes the configuration. It shows how Linux handles routing; to understand it in depth you still need to dig into the iproute2 tools, which provide a good entry point for understanding the problem. However, just as other Linux subsystems exhibit some puzzling behavior, the network subsystem has even more of it. Some of it follows the recommendations of standards bodies such as the RFCs or IEEE, and some of it is Linux's own implementation technique. For this article, that concerns the following topics.


a. The next hop you configure is not always used

If your local eth1 address is 4.4.4.1/24 and you want to steer transit traffic to the local application layer, an intuitive configuration that does not work is


ip route add 1.2.3.4 via 4.4.4.1

However, when traffic to 1.2.3.4 passes through the local machine and you check the cache with route -C, you find that the gateway is still the machine's default gateway, not 4.4.4.1 as you specified. To understand why, you have to look at how the code is implemented. Traffic to 1.2.3.4 obviously matches the route configured above, but the kernel fine-tunes a route before actually using it, and the resulting route cache entry is the route that is really used. The kernel first initializes the rt_gateway field of the route cache entry to the destination address, here 1.2.3.4:


rth->rt_gateway = fl->fl4_dst;

Then rt_set_nexthop decides whether to keep using the destination address for direct forwarding or to use the gateway you configured in the routing table.


if (FIB_RES_GW(*res) &&
    FIB_RES_NH(*res).nh_scope == RT_SCOPE_LINK)
    rt->rt_gateway = FIB_RES_GW(*res);

Clearly, the gateway you configured, 4.4.4.1, is only used if FIB_RES_NH(*res).nh_scope == RT_SCOPE_LINK is true.
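The decision can be restated as a small userspace model. This is a sketch for illustration only, not kernel code: the function name effective_gateway and the SKETCH_SCOPE_* constants are made up here, although the numeric values mirror the RT_SCOPE_* values from linux/rtnetlink.h.

```c
#include <stdint.h>

/* Illustrative constants; values mirror RT_SCOPE_* in linux/rtnetlink.h. */
enum {
    SKETCH_SCOPE_UNIVERSE = 0,
    SKETCH_SCOPE_LINK = 253,
    SKETCH_SCOPE_HOST = 254
};

/*
 * Userspace model of the gateway choice described above.
 * dst:      destination address of the packet
 * fib_gw:   gateway stored in the matched route (0 if none)
 * nh_scope: scope the kernel assigned to that next hop
 */
uint32_t effective_gateway(uint32_t dst, uint32_t fib_gw, int nh_scope)
{
    uint32_t gw = dst; /* models rth->rt_gateway = fl->fl4_dst */

    /* The configured gateway only wins when the next hop has link scope. */
    if (fib_gw != 0 && nh_scope == SKETCH_SCOPE_LINK)
        gw = fib_gw;
    return gw;
}
```

With a next hop of host scope, which is presumably what a gateway that is itself a local address ends up with, the function returns dst unchanged, matching the behavior observed with route -C.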


The next step is to make FIB_RES_NH(*res).nh_scope == RT_SCOPE_LINK true, so add the following routing table entry.


ip route add 1.2.3.4/32 scope global via 4.4.4.1 dev eth1 onlink

The scope global is given explicitly because the next hop must be closer than the destination, so the route's scope value must be smaller than the next hop's scope value. This still doesn't work, because there is another restriction: the next-hop gateway address must be unicast.


if (nh->nh_flags&RTNH_F_ONLINK) {
    struct net_device *dev;

    if (cfg->fc_scope >= RT_SCOPE_LINK)
        return -EINVAL;
    if (inet_addr_type(net, nh->nh_gw) != RTN_UNICAST)
        return -EINVAL;
    if ((dev = __dev_get_by_index(net, nh->nh_oif)) == NULL)
        return -ENODEV;
    if (!(dev->flags&IFF_UP))
        return -ENETDOWN;
    nh->nh_dev = dev;
    dev_hold(dev);
    nh->nh_scope = RT_SCOPE_LINK;
    return 0;
}
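To make the order of the checks easier to follow, here is a userspace model of the onlink branch, written as a sketch. The function check_onlink_nexthop, the ONLINK_* constants and the gw_type enum are hypothetical names introduced for illustration, and the device lookup is reduced to two booleans.

```c
#include <errno.h>
#include <stdbool.h>

/* Illustrative constants; values mirror RT_SCOPE_* in linux/rtnetlink.h. */
enum { ONLINK_SCOPE_UNIVERSE = 0, ONLINK_SCOPE_LINK = 253 };

/* Simplified stand-in for the result of inet_addr_type() on the gateway. */
enum gw_type { GW_UNICAST, GW_LOCAL };

/*
 * Userspace model of the onlink checks above. Returns 0 when the next
 * hop would be accepted, or a negative errno value otherwise.
 */
int check_onlink_nexthop(int route_scope, enum gw_type type,
                         bool dev_exists, bool dev_up)
{
    if (route_scope >= ONLINK_SCOPE_LINK)
        return -EINVAL;   /* route scope must be wider than link */
    if (type != GW_UNICAST)
        return -EINVAL;   /* gateway must not be a local address */
    if (!dev_exists)
        return -ENODEV;
    if (!dev_up)
        return -ENETDOWN;
    return 0;             /* the kernel would set nh_scope to link here */
}
```

In this model, a route with link scope is rejected, a route with global scope whose gateway is still a local address is rejected as well, and only a global-scope route with a unicast gateway on an interface that is up is accepted.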

For inet_addr_type to return RTN_UNICAST, the address must not match any entry in the local table.


local_table = fib_get_table(net, RT_TABLE_LOCAL);
if (local_table) {
    ret = RTN_UNICAST;
    if (!local_table->tb_lookup(local_table, &fl, &res)) {
        if (!dev || dev == res.fi->fib_dev)
            ret = res.type;
        fib_res_put(&res);
    }
}

So the next step is to remove the 4.4.4.1 entry from the local table.


ip route del 4.4.4.1/32 tab local

OK, this works, but sending data from the local machine to 1.2.3.4 now fails, because source address selection breaks: the address 4.4.4.1 is no longer in the local table. Take telnet 1.2.3.4 6666 as an example. When TCP connect runs, it first determines the source address by calling ip_route_connect, before the packet actually enters the routing module. However, once a source address has been chosen, the output route lookup requires either that the address be in the local table or that the socket have the IP_TRANSPARENT option set. Standard telnet does not set this option, so the connect fails with:


telnet: Unable to connect to remote host: Invalid argument

To avoid this error, the route needs an explicit source address:


ip route add 1.2.3.4/32 scope global via 4.4.4.1 dev eth1 onlink src 7.7.7.7

where 7.7.7.7 is another address temporarily added to eth1, which therefore exists in the local table.


At this point it turns out there is a simpler way. The if (nh->nh_flags&RTNH_F_ONLINK) code above is part of fib_check_nh and handles the case where the onlink flag is set. But what if it is not set? The key point is that only two conditions need to hold:


1. inet_addr_type(net, nh->nh_gw) == RTN_UNICAST

2. cfg->fc_scope < RT_SCOPE_LINK

So look at the second part of fib_check_nh:


{
    struct flowi fl = {
        .nl_u = {
            .ip4_u = {
                .daddr = nh->nh_gw,
                .scope = cfg->fc_scope + 1,
            },
        },
        .oif = nh->nh_oif,
    };

    if (fl.fl4_scope < RT_SCOPE_LINK)
        fl.fl4_scope = RT_SCOPE_LINK;
    // If the gateway is a local address, this lookup hits the local table,
    // so 4.4.4.1 must still be removed from the local table to make sure
    // that res is an RTN_UNICAST route
    if ((err = fib_lookup(net, &fl, &res)) != 0)
        return err;
}
err = -EINVAL;
if (res.type != RTN_UNICAST && res.type != RTN_LOCAL)
    goto out;
nh->nh_scope = res.scope;
...

Success: no more errors are reported. However, the data still does not reach the application layer.




b. The kernel does not perform another route lookup for the next-hop gateway

After all this effort to direct the traffic to 4.4.4.1, is it really delivered to the local machine? If you try it, you will find that the kernel sends an ARP request for 4.4.4.1 directly out of the NIC that owns 4.4.4.1.


ARP, Request who-has 4.4.4.1 tell 4.4.4.1, length 28

The kernel never performs a second route lookup for the next-hop gateway. It simply resolves the gateway's address directly, based on the scope of the route and the scope of the next hop. Here the next hop 4.4.4.1 has link scope, so the kernel ARPs for it directly, because it believes 4.4.4.1 is another host on the same link.


c. Routing works at the network layer

Avoiding complex iptables rules and relying on pure routing is considered an elegant trick, and blackhole/unreachable routes and steering arbitrary traffic are staples of network administration. However, routing works only at the network layer. If filtering and steering need to take higher-layer parameters into account, other means are required, and on Linux the iptables tool provides them. You do not have to use its firewall or NAT features; a mark alone is enough, because iptables interacts with policy routing through marks.

