Channel: SRX Services Gateway topics

SRX1400 - large number of hosts, SNAT allocation failure, TCP retransmission


I'm kind of new here, but I've been studying Junos on the SRX for about 6 months. Whenever I run into a caveat, this forum is still my main support, and I'm thankful for that and for the community's effort.

However, there's one problem I've been studying a lot:

We, a public university, have two SRX1400s in HA with a 10 Gbps link and a 1 Gbps Internet connection. There are about 30 SNATs for different purposes, each with roughly 10 to 100 hosts/clients, and Internet access through them is pretty good.

However, there's one SNAT for a public WiFi network that can easily reach 3000 hosts/clients, and its Internet access is really poor (~76% packet loss), while packet loss on the link between host/client and the SRX1400 gateway is 0%. The first problem was with DNS queries over UDP: they didn't reach the outside DNS servers, so domain resolution failed, and then TCP connections to external servers couldn't be established. So I brought an interface of our DNS server inside the network, and the DNS query success rate rose to 100%. That made the problem more "tangible".

Next, I checked the CPU load (~0.30, OK), memory (~30% free, OK), and our NAT logs, where I see a lot of this message:

            RT_FLOW_SESSION_CLOSE: session closed source NAT allocation failure
            
Another symptom is the large number of TCP ACK retransmissions.
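
For reference, the allocation failure message relates to source NAT resources, so operational-mode commands along the following lines are the usual place to look at pool and port usage. This is only a sketch: the pool names of this setup are not given, so `all` is used, and the exact output and availability of each command may vary by Junos release.

```
show security nat source summary
show security nat resource-usage source-pool all
show security flow session summary
```

The first two list the configured source NAT pools and how many translated ports each one is using; the last one gives the session counts by state.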
            
            So...
            
First, I increased the early-ageout timeout for flow sessions:
                set security flow aging early-ageout 20
                
But no success.
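
One point that may matter here: according to the Junos documentation, `early-ageout` is part of aggressive aging, which only starts once session table usage crosses the configured high watermark. A minimal sketch of the full set of aging statements follows; the watermark percentages are illustrative assumptions, not values from this setup.

```
# Assumption: the 80/60 watermark percentages are illustrative only.
set security flow aging early-ageout 20
set security flow aging high-watermark 80
set security flow aging low-watermark 60
```

With a session table that is far from full, these watermarks would likely never be reached, so this setting alone may change nothing.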

So I tried to understand how the SRX creates sessions and learned that each SNAT has a default destination-based limit of 128 concurrent sessions. I created a screen to raise this destination-based limit in the INTERNAL_OPENWIFI zone, so that a large number of clients can access the same host at the "same" time. I adapted the instructions described here:
                https://www.juniper.net/documentation/en_US/junos/topics/concept/denial-of-service-firewall-destination-based-session-limit-understanding.html
                https://www.juniper.net/documentation/en_US/junos/topics/example/denial-of-service-firewall-destination-based-session-limit-setting-cli.html
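
A screen of this kind would look roughly like the sketch below. The screen name `OPENWIFI-SCREEN` and the limit value are hypothetical placeholders, not the configuration actually applied; only the zone name comes from the setup described above.

```
# Hypothetical screen name; the limit value is illustrative only.
set security screen ids-option OPENWIFI-SCREEN limit-session destination-ip-based 2000
set security zones security-zone INTERNAL_OPENWIFI screen OPENWIFI-SCREEN
```

This screen caps concurrent sessions per destination IP address for traffic entering the zone; it does not by itself add translated ports to a source NAT pool.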
            
But I'm still getting these SNAT flow errors (no success).
            
The number of sessions is ~80000, with ~7000 invalidated sessions (I think this number is pretty high), but the session limit of the SRX is about 2^20 (1048576), so the number of sessions is way below the maximum (I think this is good).
            
I have the impression that the SRX is doing some kind of WFQ (Weighted Fair Queuing) between the SNATs' traffic (INTERNAL_{Zone1|Zone2|...|ZoneN} -> UNTRUST), so it could be reserving the same bandwidth for SNATs with fewer hosts. However, I haven't found any source that confirms this or explains how to "tame" it, if it really exists.
            
If someone could help me with this, it would help us and a lot of users :)

