Proposal 8 — ApConnection.reconnectImplConnect(): Socket-steal causes infinite reconnect loop #10

@scottf

Problem

reconnectImplConnect() steals passive.dataPort into this.dataPort, then calls newPassive(). newPassive() calls passive.close(false, true) on the old passive. close() calls closeSocketImpl() which calls dataPort.forceClose().

Because this.dataPort and passive.dataPort refer to the same object after the steal, this closes the socket that the active connection is now using. The active connection's reader throws an exception, which triggers handleCommunicationIssue and then reconnectImplConnect() again. The new passive is already connected, so the steal runs again and closes the socket again, producing an infinite loop.
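The aliasing problem can be reproduced in isolation. The following is a minimal sketch, not the library code: FakePort, FakeConnection, buggySteal and countReconnects are hypothetical stand-ins invented for illustration, and the reconnect cycle is modelled as a plain loop with a cap so that it terminates.

```java
// Hypothetical model of the socket-steal bug; none of these types exist in the library.
final class SocketStealLoopSketch {

    /** Stand-in for DataPort: only records whether it has been force-closed. */
    static final class FakePort {
        private boolean closed;
        void forceClose() { closed = true; }
        boolean isClosed() { return closed; }
    }

    /** Stand-in for a connection that owns a port and closes it on close(). */
    static final class FakeConnection {
        FakePort dataPort;
        FakeConnection(FakePort port) { this.dataPort = port; }
        void close() {
            if (dataPort != null) { dataPort.forceClose(); }
        }
    }

    /** Buggy steal: the reference is copied, but the old owner still holds it when closed. */
    static FakePort buggySteal(FakeConnection passive) {
        FakePort stolen = passive.dataPort; // active and passive now share one port
        passive.close();                    // models newPassive() closing the old passive
        return stolen;                      // the "active" port is already closed
    }

    /** Counts reconnect attempts; the cap stands in for "until the application is restarted". */
    static int countReconnects(int cap) {
        int attempts = 0;
        FakeConnection passive = new FakeConnection(new FakePort());
        while (attempts < cap) {
            attempts++;
            FakePort active = buggySteal(passive);
            if (!active.isClosed()) { break; }             // stable state, never reached
            passive = new FakeConnection(new FakePort());  // a fresh passive is connected again
        }
        return attempts;
    }

    public static void main(String[] args) {
        System.out.println("reconnect attempts before the cap: " + countReconnects(5));
    }
}
```

In this model every attempt ends with a closed port, so the loop only stops at the cap, which mirrors the repeated failover behaviour described above.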

Observed in practice

Every failover (LDM or crash) triggered a rapid-fire loop of failover_complete log lines with no stable connected state, until the application was restarted.

Proposed fix

Save the DataPort reference before the steal, then null oldPassive.dataPort so close() cannot touch it. Move the connecting guard before any passive state is modified:

@Override
protected void reconnectImplConnect() throws InterruptedException {
    if (passive == null) { return; }
    if (!passive.isConnected()) { /* ... fall back to standard reconnect ... */ return; }

    // Check connecting flag BEFORE touching passive state (early-exit guard)
    statusLock.lock();
    try {
        if (this.connecting) { return; }
        this.connecting = true;
        statusChanged.signalAll();
    } finally { statusLock.unlock(); }

    NatsConnection oldPassive = this.passive;
    this.passive = null;                         // re-entrant guard

    DataPort stolenDataPort = oldPassive.dataPort;
    NatsUri  stolenServer   = oldPassive.currentServer;
    oldPassive.dataPort       = null;            // prevent close() from closing stolen socket
    oldPassive.dataPortFuture = null;

    oldPassive.reader.stop(false);  // running=false, no shutdownInput — socket stays live
    oldPassive.writer.stop();

    // ... steal stolenDataPort into this.dataPort, updateStatus(CONNECTED) ...

    apServerPool.setActiveServer(currentServer);
    newPassive(); // this.passive is null, so old passive's close() is not called here
}
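The ordering in the proposed fix (guard first, detach the port from the old owner, then close) can be checked with a self-contained sketch. This is an illustration under assumptions, not the actual patch: Port, Conn and promotePassive are hypothetical names, and resetting the connecting flag in a finally block is an assumption, since the snippet above elides that part.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model of "detach before close"; not the library's classes.
final class DetachBeforeCloseSketch {

    /** Stand-in for DataPort. */
    static final class Port {
        private boolean closed;
        void forceClose() { closed = true; }
        boolean isClosed() { return closed; }
    }

    /** Stand-in for the passive connection. */
    static final class Conn {
        Port dataPort;
        Conn(Port port) { this.dataPort = port; }
        void close() {
            if (dataPort != null) { dataPort.forceClose(); dataPort = null; }
        }
    }

    private final ReentrantLock statusLock = new ReentrantLock();
    private boolean connecting;
    Conn passive = new Conn(new Port());
    Port dataPort;

    /** Promotes the passive port to active; returns false if a promotion is already running. */
    boolean promotePassive() {
        statusLock.lock();
        try {
            if (connecting) { return false; } // guard runs before any passive state is touched
            connecting = true;
        } finally {
            statusLock.unlock();
        }
        try {
            Conn oldPassive = passive;
            if (oldPassive == null) { return false; }
            passive = null;                   // re-entrant guard
            Port stolen = oldPassive.dataPort;
            oldPassive.dataPort = null;       // detach so close() cannot reach the stolen port
            oldPassive.close();               // harmless now: nothing left to close
            dataPort = stolen;
            passive = new Conn(new Port());   // models newPassive()
            return true;
        } finally {
            statusLock.lock();
            try { connecting = false; } finally { statusLock.unlock(); } // assumption: reset here
        }
    }
}
```

Because the old passive no longer references the port when it is closed, the promoted port stays open, which is the property the fix relies on.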

Current workaround

The fix above is applied directly to a local copy of the java-active-passive source until a release is available.

Labels: bug