RFC: vm: split no output from test machine errors #4808

a-nogikh · 2024-05-14T17:21:46Z

It's quite confusing that we in fact combine two kinds of problems into one:

1) We are no longer able to execute syzkaller-generated programs.
2) The VM has hung.

Let's use different error messages and different timeouts for these issues.

In the first case, keep on monitoring for the program execution logs. In the second case, look at any output from the VM. Additionally, attempt periodic SSH connections.
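The "periodic SSH connections" idea can be approximated with a plain TCP reachability probe. The following is only a minimal sketch and not the code from this PR: `isReachable` and `probeUpAndDown` are hypothetical names, and a successful TCP connect is a weaker signal than a completed SSH login. A monitor loop could call such a probe on each tick and treat repeated failures as a sign that the VM no longer accepts connections.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// isReachable reports whether a TCP connection to addr can be opened
// within timeout. Hypothetical helper: a real check would target the
// VM's SSH port and might need a full SSH handshake.
func isReachable(addr string, timeout time.Duration) bool {
	conn, err := net.DialTimeout("tcp", addr, timeout)
	if err != nil {
		return false
	}
	conn.Close()
	return true
}

// probeUpAndDown starts a throwaway local listener (standing in for the
// VM), probes it, shuts it down and probes again.
func probeUpAndDown() (up, down bool) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return false, false
	}
	addr := ln.Addr().String()
	up = isReachable(addr, time.Second)
	ln.Close()
	down = isReachable(addr, time.Second)
	return up, down
}

func main() {
	up, down := probeUpAndDown()
	fmt.Println("reachable while listening:", up)
	fmt.Println("reachable after shutdown:", down)
}
```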

Now it becomes possible to test (2) using C reproducers.

TODO:

Update the documentation.
Update tests.


dvyukov · 2024-05-15T07:24:42Z

vm/vm.go

-	vmDiagnosisStart    = "\nVM DIAGNOSIS:\n"
+	lostConnectionCrash   = "lost connection to test machine"
+	noOutputCrash         = "no output from test machine"
+	executionStalledCrash = "execution stalled"


This needs to be understandable by kernel developers who know nothing about syzkaller. All they will get is this string and likely not much else (no reproducer, no report).

dvyukov · 2024-05-15T07:25:39Z

vm/vm.go

 }

 func (mon *monitor) monitorExecution() *report.Report {
 	ticker := time.NewTicker(tickerPeriod * mon.inst.pool.timeouts.Scale)
 	defer ticker.Stop()
+
+	alive := make(chan bool)


This change needs a test.

dvyukov · 2024-05-15T07:28:44Z

> It's quite confusing that we in fact combine two kinds of problems into one:
>
> 1) We are no longer able to execute syzkaller-generated programs.
> 2) The VM has hung.
>
> Let's use different error messages and different timeouts for these issues.
>
> In the first case, keep on monitoring for the program execution logs. In the second case, look at any output from the VM. Additionally, attempt periodic SSH connections.
>
> Now it becomes possible to test (2) using C reproducers.

I don't understand the exact difference. In the first case the VM has likely hung as well, and in the second case it is not executing programs either. So both cases amount to no output and a hung machine.

We can create C reproducers for both of these failures already, no?

a-nogikh · 2024-05-15T08:42:33Z

The periodic "executing program" output (more precisely, the absence of it) is not a 100% reliable indicator of a kernel problem.

1) We may have hit some problem in syz-executor/syz-execprog/syz-fuzzer and it just stopped execution.
2) We don't print "executing program" in C reproducers. So we should not be looking for "executing program" in the output in the first place, it will never be there.

In light of (2), and given that we have explicitly tried to avoid running C reproducers for more than 5 minutes so as not to hit the "no output" timeout, it's very strange to see C reproducers in https://syzkaller.appspot.com/bug?extid=2e40940976be9f8fce8ba3d1d03b77aee9f4df9d. They should have remained syz reproducers.

So the idea here is to detect kernel hangs more precisely: there must be no output at all and the VM should not accept any new connections anymore. That is the actual "no output from test machine" case, or maybe "machine stalled".

Independently of that, if we test a syz reproducer, we can check for the absence of "executing program" substrings in the output. If there are signs that the VM is alive but there are no executions, that tells us there are syzkaller bugs we should look into. Prefixing the title with SYZFATAL: would probably make this clearer.
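The distinction described above can be sketched as a small classification function. This is an illustration under stated assumptions rather than the PR's implementation: the title strings come from the diff under review, but `vmState`, `classify`, both timeout values and the mapping of titles to conditions are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// Crash titles as they appear in the diff under review.
const (
	noOutputCrash         = "no output from test machine"
	executionStalledCrash = "execution stalled"
)

// Hypothetical timeouts; real values would come from the VM pool settings.
const (
	noOutputTimeout = 5 * time.Minute
	noExecTimeout   = 3 * time.Minute
)

// vmState is a hypothetical snapshot of what the monitor has observed.
type vmState struct {
	sinceAnyOutput time.Duration // time since the VM printed anything at all
	sinceExecuting time.Duration // time since the last "executing program" line
	acceptsConns   bool          // whether the latest connection probe succeeded
}

// classify returns a crash title, or "" if the run still looks healthy.
func classify(s vmState) string {
	// Likely kernel hang: complete silence and no new connections accepted.
	if s.sinceAnyOutput >= noOutputTimeout && !s.acceptsConns {
		return noOutputCrash
	}
	// The VM shows signs of life (recent output or a successful probe),
	// yet no programs are being executed: more likely a syzkaller problem.
	if s.sinceExecuting >= noExecTimeout {
		return executionStalledCrash
	}
	return ""
}

func main() {
	hung := vmState{sinceAnyOutput: 10 * time.Minute, sinceExecuting: 10 * time.Minute}
	stalled := vmState{sinceAnyOutput: time.Second, sinceExecuting: 4 * time.Minute, acceptsConns: true}
	fmt.Printf("hung VM: %q\n", classify(hung))
	fmt.Printf("alive VM without executions: %q\n", classify(stalled))
}
```

Per the comment above, the second check would only make sense for runs that are expected to print "executing program" in the first place.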

dvyukov · 2024-05-15T08:59:32Z

We print "executing program" in C reproducers:

syzkaller$ git grep "executing program" pkg/csource/
pkg/csource/csource.go:                 fmt.Fprintf(buf, "\tif (write(1, \"executing program\\n\", sizeof(\"executing program\\n\") - 1)) {}\n")

a-nogikh · 2024-05-15T09:19:03Z

Hmm, interesting. I've just looked at ~10 random C reproducers from syzbot and I see "executing program" in none of them.

dvyukov · 2024-05-15T09:33:00Z

Either they were not repeating, or something broke.

a-nogikh · 2024-05-17T09:39:48Z

The flag is indeed set before we start reproduction:

syzkaller/pkg/csource/options.go

Line 172 in 4130c19

Repro: true,

But then we clear it right after we have found a reproducer:

syzkaller/pkg/repro/repro.go

Lines 238 to 242 in 4130c19

    
	defer func() {
		if res != nil {
			res.Opts.Repro = false
		}
	}()

We form the actual repro C code right before sending it to the dashboard, now with Repro=false.

syzkaller/syz-manager/manager.go

Line 1068 in 4130c19

cprog, err := csource.Write(repro.Prog, repro.Opts)

It has been this way for a very long time already (?), so most likely none of the currently open bugs have "executing program" in their C reproducers.
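The sequence described above can be reproduced in a reduced model. The sketch below is a toy, not syzkaller code: `options`, `result`, `findRepro` and `writeLoopBody` are hypothetical stand-ins for the real `csource` and `repro` types, the generator is assumed to emit the marker only when `Repro` is set, and `execute_one();` is merely a placeholder for the rest of the generated loop body.

```go
package main

import (
	"fmt"
	"strings"
)

// options is a hypothetical stand-in holding only the flag discussed here.
type options struct {
	Repro bool
}

// result is a hypothetical stand-in for the reproduction result.
type result struct {
	Opts options
}

// writeLoopBody models a generator that adds the "executing program"
// write to the C source only when Repro is set (an assumption).
func writeLoopBody(opts options) string {
	var buf strings.Builder
	if opts.Repro {
		buf.WriteString("\tif (write(1, \"executing program\\n\", sizeof(\"executing program\\n\") - 1)) {}\n")
	}
	buf.WriteString("\texecute_one();\n") // placeholder for the remaining body
	return buf.String()
}

// findRepro models the deferred reset quoted above: the flag is set
// during reproduction and cleared once a result is returned.
func findRepro() (res *result) {
	defer func() {
		if res != nil {
			res.Opts.Repro = false
		}
	}()
	return &result{Opts: options{Repro: true}}
}

func main() {
	res := findRepro()
	src := writeLoopBody(res.Opts)
	fmt.Println("Repro after findRepro:", res.Opts.Repro)
	fmt.Println("marker present:", strings.Contains(src, "executing program"))
}
```

In this model, any source generated after `findRepro` returns lacks the marker, which matches the observation that the sampled C reproducers do not contain it.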

UPD: sent #4816

a-nogikh requested a review from dvyukov May 14, 2024 17:21

dvyukov reviewed May 15, 2024


a-nogikh closed this May 23, 2024
