pkg/fuzzer: possible improvements #4591

a-nogikh · 2024-03-21T14:38:22Z

This issue focuses on fuzzing engine improvements, while #1541 lists more generic applications of the fuzzer in host mode (which will hopefully soon become the only supported one).

If you want to add something, please edit this comment directly.

Refactorings

- Merge syz-fuzzer and syz-runner. After "all: move fuzzer to the host" (#4579), they effectively do the same thing.
- Instead of transferring the textual representations of programs, transfer the SerializeForExec() output directly.
- Factor out the handling of fuzzer.Requests with ipc.Env. It is de facto shared by syz-fuzzer/proc.go, tools/syz-execprog and pkg/fuzzer/fuzzer_test.go.
- The HostFuzzer mode still remains for targets without a Go runtime. The name is now a bit misleading, so we should rename it.

Optimizations

resourceCentric() still takes the majority of CPU time once the fuzzer begins to mostly do exec fuzz operations. We need to cache the resource list of each corpus program.
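
A minimal sketch of the caching idea, assuming corpus programs are not modified after they are added. `Prog`, `resourceCache` and the `collect` callback are hypothetical names used for illustration, not syzkaller's actual types:

```go
package main

import "sync"

// Prog is a hypothetical stand-in for a corpus program; syzkaller's real
// prog.Prog looks different.
type Prog struct {
	ID        string
	Resources []string // resources the program's calls create or use
}

// resourceCache memoizes the per-program resource list so that the expensive
// walk over a program runs once per corpus entry.
type resourceCache struct {
	mu      sync.Mutex
	entries map[string][]string
	misses  int // how many times collect actually ran
}

func newResourceCache() *resourceCache {
	return &resourceCache{entries: make(map[string][]string)}
}

// get returns the cached list for p and calls collect only on the first request.
func (c *resourceCache) get(p *Prog, collect func(*Prog) []string) []string {
	c.mu.Lock()
	defer c.mu.Unlock()
	if res, ok := c.entries[p.ID]; ok {
		return res
	}
	res := collect(p)
	c.entries[p.ID] = res
	c.misses++
	return res
}
```

With such a cache, repeated lookups for the same corpus program cost a map access instead of a full traversal.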

Statistics collection

Aggregate per-syscall statistics on average execution time, blocking probability, KCOV array overflows.
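
One possible shape for such an aggregate, as a sketch only. The `callStats` and `statsTable` types and their field names are assumptions made for this example and do not exist in syzkaller:

```go
package main

import "time"

// callStats accumulates hypothetical per-syscall counters.
type callStats struct {
	Execs     int
	TotalTime time.Duration
	Blocked   int // executions in which the call blocked
	Overflows int // executions in which the KCOV array overflowed
}

// AvgTime returns the mean execution time, or 0 if nothing was recorded.
func (s *callStats) AvgTime() time.Duration {
	if s.Execs == 0 {
		return 0
	}
	return s.TotalTime / time.Duration(s.Execs)
}

// BlockProb returns the fraction of executions that blocked.
func (s *callStats) BlockProb() float64 {
	if s.Execs == 0 {
		return 0
	}
	return float64(s.Blocked) / float64(s.Execs)
}

// statsTable maps a syscall name to its aggregated statistics.
type statsTable map[string]*callStats

// record folds the outcome of one call execution into the table.
func (t statsTable) record(call string, d time.Duration, blocked, overflow bool) {
	s := t[call]
	if s == nil {
		s = &callStats{}
		t[call] = s
	}
	s.Execs++
	s.TotalTime += d
	if blocked {
		s.Blocked++
	}
	if overflow {
		s.Overflows++
	}
}
```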

Bug reproducers

Instead of printing the executed programs in syz-fuzzer and then parsing the output, collect them in syz-manager right away.

Fuzzing algorithm improvements

Corpus program selection (aka seed selection)

We currently use quite a primitive approach -- input selection probability is linearly proportional to the total signal triggered by the input.
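
For reference, signal-proportional selection can be sketched with cumulative sums and a binary search. The `weightedCorpus` type below is a hypothetical illustration of the technique, not the code pkg/fuzzer actually runs:

```go
package main

import (
	"math/rand"
	"sort"
)

// weightedCorpus is a hypothetical container, not syzkaller's corpus type.
type weightedCorpus struct {
	progs []string // program identifiers
	sums  []int64  // sums[i] is the total signal of progs[0..i]
}

// add appends a program together with the amount of signal it triggered.
func (c *weightedCorpus) add(id string, signal int64) {
	total := signal
	if n := len(c.sums); n > 0 {
		total += c.sums[n-1]
	}
	c.progs = append(c.progs, id)
	c.sums = append(c.sums, total)
}

// chooseAt maps a point x in [0, total signal) to the program whose
// cumulative range contains it. It returns "" if x is out of range.
func (c *weightedCorpus) chooseAt(x int64) string {
	i := sort.Search(len(c.sums), func(i int) bool { return c.sums[i] > x })
	if i == len(c.progs) {
		return ""
	}
	return c.progs[i]
}

// choose picks a program with probability proportional to its signal.
func (c *weightedCorpus) choose(r *rand.Rand) string {
	if len(c.sums) == 0 || c.sums[len(c.sums)-1] <= 0 {
		return ""
	}
	return c.chooseAt(r.Int63n(c.sums[len(c.sums)-1]))
}
```

A program with three times the signal of another occupies a range three times as wide, so it is selected three times as often.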

Possible experiment: randomly select one covered PC and then pick one of the programs that cover it.
- It was attempted once, but it did not bring statistically significant benefits in the old distributed model.
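
The experiment can be sketched as a two-step draw: first a uniformly random covered PC, then a uniformly random program among those covering it. `pcIndex` and its methods are hypothetical names, and the sketch assumes each program's coverage list contains no duplicate PCs:

```go
package main

// pcIndex is a hypothetical reverse index from covered PCs to programs.
type pcIndex struct {
	pcs  []uint64            // every covered PC once, for uniform sampling
	byPC map[uint64][]string // PC -> identifiers of programs that cover it
}

func newPCIndex() *pcIndex {
	return &pcIndex{byPC: make(map[uint64][]string)}
}

// add registers a program under every PC it covers.
func (idx *pcIndex) add(id string, cover []uint64) {
	for _, pc := range cover {
		if _, ok := idx.byPC[pc]; !ok {
			idx.pcs = append(idx.pcs, pc)
		}
		idx.byPC[pc] = append(idx.byPC[pc], id)
	}
}

// choose draws a PC and then a program covering it. intn must return a
// value in [0, n); rand.Intn from math/rand has the required signature.
func (idx *pcIndex) choose(intn func(n int) int) string {
	if len(idx.pcs) == 0 {
		return ""
	}
	pc := idx.pcs[intn(len(idx.pcs))]
	progs := idx.byPC[pc]
	return progs[intn(len(progs))]
}
```

Compared with signal-proportional selection, this scheme favors programs that cover PCs reached by few other inputs.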

Reuse the programs discarded during triage?

These programs likely made use of the previously accumulated state, and since there was no such state on other VMs/procs, the fuzzer just could not reliably reproduce that newly discovered coverage.

Still, these programs were doing something interesting -- they did reach new coverage in the first place. We may want to make use of them:

- Reuse them during fuzzing (easy): in exec fuzz, sometimes take one of those programs and prepend some new calls to it via mutator.insertCall().
- Try to reconstruct the accumulated state (slow): bisect over the programs previously executed on the VM. We have the necessary generic libraries, but we'd need to have per-VM (and per-proc?) lists of previously executed programs.
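
The per-VM lists mentioned in the second option could be kept as a bounded history. This is only a sketch: `execHistory` is a hypothetical type, programs are represented by plain strings, and the size limit is an assumption added to keep memory use bounded:

```go
package main

// execHistory keeps the most recently executed programs for each VM.
type execHistory struct {
	limit int              // maximum number of programs remembered per VM
	perVM map[int][]string // VM index -> programs in execution order
}

func newExecHistory(limit int) *execHistory {
	return &execHistory{limit: limit, perVM: make(map[int][]string)}
}

// record appends prog to the VM's history and drops the oldest entries
// once the limit is exceeded.
func (h *execHistory) record(vm int, prog string) {
	list := append(h.perVM[vm], prog)
	if len(list) > h.limit {
		list = list[len(list)-h.limit:]
	}
	h.perVM[vm] = list
}

// executed returns a copy of the VM's history, oldest first. These are the
// candidates a bisection would search for the state-creating programs.
func (h *execHistory) executed(vm int) []string {
	return append([]string(nil), h.perVM[vm]...)
}
```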

Rewrite the comparison argument substitution logic

Currently we only do hint mutations for newly added corpus items (and don't do them for the programs loaded from corpus.db). The process is rather lengthy, and we're quite likely not to have tried all mutations before a syz-manager instance is restarted.

- Revise prog/hints.go to only do more meaningful substitutions.
- Explicitly keep in corpus.db whether the hintsJob has been fully completed for every input.
- Do hint substitutions not just for new inputs, but e.g. also as a part of exec fuzz: mutate some corpus program, then substitute the comparison arguments for it, even if the mutation itself did not bring new coverage. It might help discover more signal in the long run.

Repetitive crash avoidance

If we have already found a crashing program, it's very likely that its mutations will also crash the kernel. Alternatively, if one system call is very seriously broken, we may get an insane number of VM crashes.

This was already a problem before, but now that the fuzzing engine is separated from the DUT, the negative consequences are much more visible.

It would be great for pkg/fuzzer (or some separate package) to be able to learn from VM crashes and circumvent them.


a-nogikh added the enhancement label Mar 21, 2024
