Guidance for AI agents writing new acceptance tests in the Optimism monorepo. For building and running them, see acceptance-tests.md.
An acceptance test exists to describe a user-visible behaviour of the OP Stack and fail loudly when that behaviour breaks. A reader who is not a domain expert should be able to open any test in op-acceptance-tests/tests/ and understand what the system is supposed to do, without running anything.
Tests express requirements. The DSL expresses how those requirements are checked. When the technical details change, the DSL is updated once and every test benefits — including tests added in the future. When a test is flaky, the fix usually belongs in the DSL (a better wait, a missing precondition), not in the test.
This guide covers both sides: how to write a test that reads as a requirement, and how to grow the DSL so future tests stay that way.
Complexity in a test is duplicated across every test that follows. Complexity in the DSL is centralised and encapsulated. Push difficulty downward.
The DSL is not plain English and should not try to be. Its domain experts are test authors, not non-technical readers. A statement should clearly describe what it is doing without reading like a sentence.
The “language” of the DSL emerges from consistent naming and structure. Follow established patterns even when a new one would be marginally nicer for your specific test — the cognitive cost of divergent patterns outweighs the win.
If the pattern you need does not exist, extend the DSL rather than invent a one-off in the test file.
A test sets up the minimum state, performs the minimum action, and asserts the minimum outcome required to verify one behaviour. If the name wants to say “and”, split it.
// Good: one behaviour, one failure mode
func TestTransferMovesFunds(gt *testing.T) { ... }
func TestTransferChargesGasToSender(gt *testing.T) { ... }
// Bad: ambiguous failure
func TestTransferMovesFundsAndChargesGasAndUpdatesNonce(gt *testing.T) { ... }
When two behaviours always change together, the second is part of the first behaviour’s definition — fold it into an action or verification method on the DSL, not a second assertion in the test body.
Test names describe the user-visible behaviour, not the implementation.
// Good
func TestCrossChainWithdrawalFinalisesAfterProofWindow(gt *testing.T) { ... }
func TestSequencerHaltsWhenBatcherLagsBeyondSafeHead(gt *testing.T) { ... }
// Bad
func TestFlow1(gt *testing.T) { ... }
func Test_L2_CL_Adv_v2(gt *testing.T) { ... }
A name that requires reading the body to interpret is broken — rename it.
Most acceptance tests follow the same shape:
func TestSomething(gt *testing.T) {
t := devtest.ParallelT(gt)
sys := presets.NewMinimal(t)
// 1. Arrange: seed state via DSL entry points (users, contracts, nodes)
alice := sys.FunderL2.NewFundedEOA(eth.OneEther)
bob := sys.Wallet.NewEOA(sys.L2EL)
// 2. Act: call a single DSL action method
alice.Transfer(bob.Address(), eth.OneHundredthEther)
// 3. Assert: verify the user-visible outcome via DSL verification methods
bob.VerifyBalance(eth.OneHundredthEther)
}
Keep the test body at this level. If you find yourself reaching into low-level clients, RPC calls, or raw receipts to build assertions, that is a signal the DSL needs a new method — not that the test needs more lines.
As a test author, this means you should be able to call an action once and trust that what it says it did, it did. If that isn’t true for the action you need, fix the DSL method.
Verification methods in the DSL do the fetching, waiting, retrying, and asserting. Tests should never need to build their own wait/retry loops around a raw getter.
Use verification methods only to assert the behaviour the test is actually covering. Do not re-verify that setup worked — that belongs inside the setup action method. Extra verifications in the test body obscure intent and increase the number of places that need updating when behaviour changes.
The system state is asynchronous; fetching raw values and asserting on them creates flakes. A verification method bundles the fetch with a bounded wait and an assertion.
// Avoid: async state fetched and compared directly
bal := node.GetBalance(user)
require.Equal(t, expected, bal)
// Good
node.VerifyBalance(user, expected)
// Better: let the entry point decide where to verify
user.VerifyBalance(expected) // e.g. verifies across every node
Returning objects that represent entities in the system is fine — they expose further action and verification methods:
claim := game.RootClaim()
counter := claim.VerifyCountered() // waits for op-challenger to counter
counter.VerifyClaimant(honestChallenger)
counter.Attack()
Required inputs are normal parameters — the type system enforces presence. Optional inputs use a config struct plus a vararg of functions that mutate it. With* helpers exist only for the most common knobs; tests usually pass a one-off function that sets everything they need at once.
alice.Transfer(bob.Address(), eth.OneEther,
func(opts *TransferOpts) {
opts.GasLimit = 100_000
opts.AccessList = myAccessList
},
)
Reliability comes from waiting for the right condition, not from waiting longer.
Banned:
time.Sleep(...), fixed delays, “give it a second” waits.MarkFlaky() as a substitute for fixing the underlying missing wait.Required:
// Bad: timing-based hope
alice.Transfer(bobAddr, amount)
time.Sleep(5 * time.Second)
require.Equal(t, expected, bob.GetBalance())
// Good: wait for the real post-condition
alice.Transfer(bobAddr, amount)
bob.WaitForBalance(expected)
If the DSL lacks a wait for the condition you care about, add one — don’t sprinkle time.Sleep into the test. If a test only passes on rerun, it is broken, not “flaky”; treat the retry as a missing post-condition somewhere upstream and find it.
Production code paths and test code paths must be the same paths. The only acceptable variation points are explicit, observable seams — preset selection, DSL-injected doubles, launch flags read once at startup.
Banned:
isTest, skipChecksInTests).if os.Getenv("UNDER_TEST") != "" branches scattered through business logic.If you find yourself wanting one of these to make a test pass, the real fix is at the DSL or preset layer — seed different state, use a different preset, inject a different component via the existing provider. Once production code branches on “am I being tested?”, the test stops covering the production path and bugs hide in the gap.
DSL methods (including ones you add) should log what they are doing. Waiters should log what they are waiting for and the current state of the system on every poll cycle. When an acceptance test fails in CI, the logs are often the only evidence — make them speak.
Inside a test body, prefer expressive DSL calls over scattered t.Log statements. If you need a comment or log line to explain what a block of test code is doing, the DSL method is either poorly named or missing.
A failing test must give the reader enough information to diagnose without re-running.
"expected L2 balance to reflect transfer minus gas" beats "".just acceptance-test already writes logs to op-acceptance-tests/logs/testrun-<timestamp>/ (see acceptance-tests.md) — make sure the log output your test produces via the DSL is enough to understand a failure from those files alone.Re-running to “see what happened” is a sign the failure artefact is missing.
Smells are signals that the DSL is at the wrong level of abstraction for this test. They are not hard rules — they are invitations to stop and consider whether the DSL should grow.
A comment explaining what a block of test code is doing usually means the DSL method is either misnamed or too low-level.
// Smelly: test explains low-level mechanics in prose
// Deploy test contract
storeProgram := program.New().Sstore(0, 0xbeef).Bytes()
walletv2, err := system.NewWalletV2FromWalletAndChain(ctx, wallet, l2Chain)
require.NoError(t, err)
storeAddr, err := DeployProgram(ctx, walletv2, storeProgram)
require.NoError(t, err)
code, err := l2Client.CodeAt(ctx, storeAddr, nil)
require.NoError(t, err)
require.NotEmpty(t, code, "Store contract not deployed")
require.Equal(t, code, storeProgram, "Store contract code incorrect")
// Good: a DSL method captures the intent
contract := contracts.SStoreContract.Deploy(l2Node, 0xbeef)
Not every comment is bad — explaining why a specific number was chosen is useful:
// operatorFeeCharged = gasUsed * operatorFeeScalar == 1000 * 5 == 5000
tx.VerifyOperatorFeeCharged(5000)
Raw RPC clients, direct receipt manipulation, or imports of internal packages from a test file usually indicate a missing DSL method. Add it, then use it.
A test that verifies its own setup completed (“did the deposit arrive?”) before running its real assertions is duplicating work that belongs inside the setup action method. Move it.
If two tests open with the same ten lines of boilerplate, that boilerplate belongs in a helper or preset — not pasted.
Write a new DSL method when:
require.Eventually or retry loop.When extending the DSL, apply the same patterns this guide prescribes for tests: action methods check/act/assert, verification methods wait internally, optional args use the opts-struct vararg pattern, and no method should require the caller to add their own wait or retry loop.
time.Sleep, no hand-rolled retry loops, no MarkFlaky band-aid.