# Post-Mortem: Resolving PanGu Testnet Bridge Issues

By [Snapchain](https://paragraph.com/@snapchain) · 2024-04-01

---

We launched our PanGu testnet last Monday, and on Wednesday afternoon we found that some L1→L2 bridge transactions were stuck.

### Problem 1: Insufficient Funds

Looking into the logs, we found that the bridge service had not sent out the “claim” transactions on L2 because the `claimtxmanager` account did not have enough funds to pay gas fees.

Traffic since the launch was higher than we expected, and the ETH in that account was quickly used up.

So we topped up the account with more ETH, but this did not fix the issue.
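
A balance check of the following kind could flag this condition before claims stop. This is a minimal sketch and not part of the bridge service: the function names, the gas figures, and the threshold are all hypothetical, and the balance itself would come from an RPC call that is not shown here.

```python
def claims_affordable(balance_wei: int, gas_per_claim: int, gas_price_wei: int) -> int:
    """Return how many claim transactions the balance can still pay for."""
    if gas_per_claim <= 0 or gas_price_wei <= 0:
        raise ValueError("gas_per_claim and gas_price_wei must be positive")
    cost_per_claim = gas_per_claim * gas_price_wei
    return balance_wei // cost_per_claim


def needs_top_up(balance_wei: int, gas_per_claim: int, gas_price_wei: int,
                 min_claims: int) -> bool:
    """True when the account can pay for fewer than `min_claims` claims."""
    return claims_affordable(balance_wei, gas_per_claim, gas_price_wei) < min_claims
```

With an assumed 100,000 gas per claim at 1 gwei, 1 ETH covers 10,000 claims, so the alert threshold is best chosen relative to the expected claim rate.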

### Problem 2: Rate Limiting

Looking into the logs again, we found a new issue: the `claimtxmanager` had sent a large number of “claim” transactions to catch up and hit the per-account queue limit enforced by the RPC service.

We then increased the limit and restarted the RPC service, but still saw:

    ERROR   claimtxman/claimtxman.go:317    maximum retries and the tx is still missing in the pool.
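
One way to avoid overrunning the queue while catching up is to cap how many claims are in flight at once. The sketch below is an assumption about how such a throttle might look, not how `claimtxmanager` actually works; `next_batch`, `queue_limit`, and `headroom` are hypothetical names, and the real limit depends on the RPC node configuration.

```python
def next_batch(waiting: list[str], in_flight: int, queue_limit: int,
               headroom: int = 0) -> list[str]:
    """Pick the claims that can be sent now without exceeding the queue limit.

    waiting     -- claim identifiers not yet sent, oldest first (hypothetical)
    in_flight   -- claims already sent but not yet mined
    queue_limit -- per-account limit enforced by the RPC service
    headroom    -- slots deliberately left unused as a safety margin
    """
    free_slots = max(queue_limit - headroom - in_flight, 0)
    return waiting[:free_slots]
```

The remaining claims stay in the waiting list and are retried once earlier ones are mined, so the account never has more pending transactions than the node accepts.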
    

### Problem 3: Dangling Transaction Reference

Our theory was that once the rate limit was hit, new transactions from the account were ignored and not added to the pool, but they were still recorded in the bridge service database.

So we looked into the `monitored_txs` table in the database and indeed found that many “claim” transactions were stuck in the `created` status.

We deleted those transactions, and the bridge service started to process the pending bridge transactions.

We thought the issue was resolved. However, on Thursday morning, we received a few reports of bridge transactions that had been pending for a long time.
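
The distinction that matters here is between a row that is merely in `created` status and a row whose transaction is truly absent from the pool. A minimal sketch of that check follows; the `MonitoredTx` fields and the `find_dangling` helper are hypothetical and do not reflect the actual `monitored_txs` schema, and the set of pool hashes would have to be fetched from the RPC service separately.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MonitoredTx:
    # Hypothetical subset of a monitored_txs row.
    deposit_id: int
    status: str
    tx_hash: str


def find_dangling(rows: list[MonitoredTx], pool_hashes: set[str]) -> list[MonitoredTx]:
    """Return rows still in `created` status whose transaction is not in the pool."""
    return [
        row for row in rows
        if row.status == "created" and row.tx_hash not in pool_hashes
    ]
```

Rows in `created` status whose hash is still in the pool are left alone, since those transactions may simply be waiting to be mined.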

### Problem 4: Skipped Blocks

Our theory was that while deleting the dangling transactions from the database, we mistakenly also deleted “claim” transactions that were still being processed, since the bridge service was running during the deletion.

At the same time, some L1 blocks containing such bridge transactions were marked as processed by the bridge service and added to the `block` table.

As a result, those deposits were never actually processed.

Because the bridge service does not scan back to find deposits it missed in older blocks, we had to manually identify the missed blocks and “claim” any deposit transactions in them.

One way is to run SQL queries and RPC calls to figure out the delta. But because the bridge service is permissionless, meaning any account can help with the “claim” process that sends the funds to the designated L2 accounts, we instead ran a separate bridge service with a different `claimtxmanager` account and scanned from an L1 block mined before the incident began.
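
The delta mentioned above amounts to a set difference between deposits seen on L1 and deposits claimed on L2. The sketch below illustrates that idea only; the function names and the input shapes (a mapping from deposit id to L1 block number, and a set of claimed deposit ids) are assumptions, and gathering those inputs from the database and the RPC service is not shown.

```python
from collections import defaultdict
from typing import Optional


def missed_deposits(deposit_blocks: dict[int, int],
                    claimed_ids: set[int]) -> dict[int, list[int]]:
    """Group unclaimed deposit ids by the L1 block that contains them."""
    missed = defaultdict(list)
    for deposit_id, block_number in sorted(deposit_blocks.items()):
        if deposit_id not in claimed_ids:
            missed[block_number].append(deposit_id)
    return dict(missed)


def rescan_start(missed: dict[int, list[int]]) -> Optional[int]:
    """Earliest L1 block holding a missed deposit, or None if nothing was missed."""
    return min(missed) if missed else None
```

The block returned by `rescan_start` is the latest safe starting point for a rescan; starting from any earlier block also works, which is why a block from before the incident is sufficient.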

After that, all stuck bridge transactions were backfilled and processed.

### Summary

The issue was caused by unexpectedly high traffic and insufficient funds in our operational account. Our remediation steps led to a few more issues, which were resolved with a custom backfill operation.

The incident did not affect any transactions on the L2 itself. It only delayed a few deposit transactions from L1 to L2.

The bridge is now fully functional, and people can permissionlessly deposit their funds from L1 to L2 and withdraw them back from L2 to L1.

---

*Originally published on [Snapchain](https://paragraph.com/@snapchain/post-mortem-resolving-pangu-testnet-bridge-issues)*
