Hi, all
We have an IPsec connection with our partner. Because of increasing traffic, the SRX can no longer handle the encryption/decryption, so we decided to migrate to a direct connection. I put both the st0 interface and the physical direct-connection interface in the same security zone so that I don't have to touch existing security policies or NAT rules. For the migration, I thought I would just deactivate the VPN and lift the BGP import filter so that routing to the partner-side prefixes would go out of the physical interface, and everything should just work. Easy enough, right? Not so much... Somehow TCP sessions cannot be established from either direction after the cutover. The security flow session output shows that sessions were created by connections initiated from either side, but there is no return traffic. Here is the diagram:
[zone trust ge-1/0/1] ------[SRX]----(zone untrust, interface st0.1, interface ge-1/0/0)
## Here is the show security flow session interface output when the VPN was deactivated
For inbound traffic, internal host 172.18.63.122 is statically mapped to 28.8.12.129. For outbound traffic, an internal host is PAT'd to 28.8.12.135 if it does not have a static NAT address assigned.
Session ID: 115841114, Policy name: allow_inbound/26, State: Active, Timeout: 8, Valid
In: 13.20.21.192/53944 --> 28.8.12.129/25;tcp, Conn Tag: 0x0, If: ge-1/0/0, Pkts: 1, Bytes: 60, CP Session ID: 113526337
Out: 172.18.63.122/25 --> 13.20.21.192/53944;tcp, Conn Tag: 0x0, If: ge-1/0/1, Pkts: 0, Bytes: 0, CP Session ID: 113526337
Session ID: 115842757, Policy name: allow_outbound/25, State: Active, Timeout: 6, Valid
In: 172.18.25.36/54664 --> 13.20.17.137/8051;tcp, Conn Tag: 0x0, If: ge-1/0/1, Pkts: 1, Bytes: 60, CP Session ID: 112611196
Out: 13.20.17.137/8051 --> 28.8.12.135/43157;tcp, Conn Tag: 0x0, If: ge-1/0/0, Pkts: 0, Bytes: 0, CP Session ID: 15897414
Look at the inbound session: the SRX obviously received the TCP SYN from the partner, but it seems the SRX did not receive a SYN-ACK from our internal host. In the outbound session, the SRX received the TCP SYN from the internal host but did not receive a SYN-ACK from the partner side.
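To make that reading of the two sessions explicit, here is a minimal Python sketch that parses session text in the format pasted above and lists every wing whose packet counter is zero. The function names (parse_sessions, silent_wings) and the regular expressions are my own assumptions for illustration, not anything provided by Junos, and the parser only understands the exact line layout shown in this post.

```python
import re

# Session text copied verbatim from the output in this post.
SAMPLE = """\
Session ID: 115841114, Policy name: allow_inbound/26, State: Active, Timeout: 8, Valid
In: 13.20.21.192/53944 --> 28.8.12.129/25;tcp, Conn Tag: 0x0, If: ge-1/0/0, Pkts: 1, Bytes: 60, CP Session ID: 113526337
Out: 172.18.63.122/25 --> 13.20.21.192/53944;tcp, Conn Tag: 0x0, If: ge-1/0/1, Pkts: 0, Bytes: 0, CP Session ID: 113526337
Session ID: 115842757, Policy name: allow_outbound/25, State: Active, Timeout: 6, Valid
In: 172.18.25.36/54664 --> 13.20.17.137/8051;tcp, Conn Tag: 0x0, If: ge-1/0/1, Pkts: 1, Bytes: 60, CP Session ID: 112611196
Out: 13.20.17.137/8051 --> 28.8.12.135/43157;tcp, Conn Tag: 0x0, If: ge-1/0/0, Pkts: 0, Bytes: 0, CP Session ID: 15897414
"""

# Hypothetical patterns, written only against the layout shown above.
SESSION_RE = re.compile(r"^Session ID: (\d+), Policy name: ([^,]+),")
WING_RE = re.compile(
    r"^(In|Out): (\S+) --> ([^;\s]+);(\w+),.*?If: ([^,]+), Pkts: (\d+), Bytes: (\d+)"
)


def parse_sessions(text):
    """Parse session headers and their In/Out wings into a list of dicts."""
    sessions = []
    for raw in text.splitlines():
        line = raw.strip()
        header = SESSION_RE.match(line)
        if header:
            sessions.append(
                {"id": int(header.group(1)), "policy": header.group(2), "wings": {}}
            )
            continue
        wing = WING_RE.match(line)
        if wing and sessions:
            direction, src, dst, proto, ifname, pkts, nbytes = wing.groups()
            sessions[-1]["wings"][direction] = {
                "src": src,
                "dst": dst,
                "proto": proto,
                "ifname": ifname,
                "pkts": int(pkts),
                "bytes": int(nbytes),
            }
    return sessions


def silent_wings(sessions):
    """Return (session id, policy, wing, interface, expected source) for each
    wing on which the firewall has counted zero packets."""
    result = []
    for session in sessions:
        for direction, wing in session["wings"].items():
            if wing["pkts"] == 0:
                result.append(
                    (session["id"], session["policy"], direction,
                     wing["ifname"], wing["src"])
                )
    return result


if __name__ == "__main__":
    for sid, policy, direction, ifname, src in silent_wings(parse_sessions(SAMPLE)):
        print(f"session {sid} ({policy}): no packets on {direction} wing, "
              f"expected from {src} via {ifname}")
```

A zero-packet wing only tells us that the SRX counted nothing arriving on that interface for that flow. By itself it cannot distinguish between the far end never replying and the reply being sent but taking a different path.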
This is purely a network-layer routing change. There are no application-side configuration changes, and both the partner and I verified that routing is correct, yet the two sessions above contradict each other. From the flow sessions alone, I am not sure which leg has the problem. For example, for the inbound session we can conclude that the path from the partner to the SRX works, but how do I know whether the return traffic fails because our internal host is not sending a SYN-ACK to the SRX, because the SRX fails to forward the SYN-ACK to the partner, or because the partner side received the SYN-ACK but failed to send its reply back to the SRX? Unfortunately, I don't have the luxury of taking my time to run a flow trace on my side in production. Where else should I look?