Code coverage metrics for libFuzzer #41

Open
JulianVolodia opened this issue May 8, 2020 · 15 comments


@JulianVolodia

Hi!

I would like to know how experienced people measure coverage for fuzzing nowadays.
There was a quite nice method using sancov and the libFuzzer -dump_coverage=1 flag in older libFuzzer versions, but it is now deprecated.
I saw that 15 months and 2 years ago @kcc was involved in it, so maybe you know what should be done instead?

I haven't managed to get Clang Coverage working with the libxml2 fuzzing example from the 8th lesson of Dor1s/libfuzzer-workshop, so could you tell me:

  1. what is the 'rule of thumb' for managing code coverage now?
  2. is there an example of Clang Coverage used with a complex library and a fuzzer, so I can see how it was done and learn from it?
  3. which libFuzzer version is used in the OSS-Fuzz project?

Best regards!

@kcc
Contributor

kcc commented May 11, 2020 via email

@JulianVolodia
Author

I think I am lacking some basic experience and understanding, and until I dig deeper I don't think I can get through it. Give me one more week; after that I would be very happy if you could help me with my doubts. Still, thanks for the info you gave me, @kcc

@JulianVolodia
Author

If you have any resources worth reading about this, or could send me a link to something that works well and generates that lovely graph, that would be awesome.

@Dor1s
Contributor

Dor1s commented Sep 10, 2020

The https://clang.llvm.org/docs/SourceBasedCodeCoverage.html page has instructions on how to generate a code coverage report for a single file.

If you want to generate code coverage report for a fuzz target linked with some library (e.g. libxml), you need to make sure that all files are compiled with -fprofile-instr-generate -fcoverage-mapping.
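
A minimal sketch of what that can look like for a library build, assuming an autotools-style project that honours CC, CFLAGS, CXXFLAGS and LDFLAGS. The commented-out build commands and the names fuzz_target.cc and libfoo.a are placeholders, not taken from libxml2 or the workshop:

```shell
#!/bin/sh
# Coverage flags must reach every compile step AND the link step.
COV_FLAGS="-fprofile-instr-generate -fcoverage-mapping"

export CC=clang
export CXX=clang++
export CFLAGS="-g -O1 $COV_FLAGS"
export CXXFLAGS="-g -O1 $COV_FLAGS"
export LDFLAGS="$COV_FLAGS"

# Hypothetical build steps; adapt them to the library's real build system.
# ./configure --disable-shared && make
# $CXX $CXXFLAGS -fsanitize=fuzzer fuzz_target.cc path/to/libfoo.a -o fuzzer

# Print the effective flags so they can be compared with the build log.
echo "CFLAGS=$CFLAGS"
echo "LDFLAGS=$LDFLAGS"
```

If some object files are built without these flags, they simply do not show up in the report, so checking the build log for the flags on every compile line is a cheap sanity test.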

@damgut

damgut commented Feb 1, 2022

I have the same problem. I would like a visual coverage report (e.g. in html), like the one the tool "gcovr" produces for gcc.
As @Dor1s wrote, I can generate a simple coverage in text mode by doing:

clang -fprofile-instr-generate -fcoverage-mapping hello.c
LLVM_PROFILE_FILE="coverage.profraw" ./a.out # this command creates file "coverage.profraw"
llvm-profdata-10 merge -sparse coverage.profraw -o coverage.profdata # this command creates file "coverage.profdata"
llvm-cov-10 show --format=html ./a.out -instr-profile=coverage.profdata > coverage.html

But the problem is that if I use "-fprofile-instr-generate -fcoverage-mapping" together with "-fsanitize=address,fuzzer", no ".profraw" file is created after the execution stops (crash or exit). I guess the reason is that the sanitizer aborts the program before ".profraw" is written.

Any ideas?

@maflcko
Contributor

maflcko commented Feb 1, 2022

You don't need the address sanitizer enabled to create coverage for your source code. -fsanitize=fuzzer together with the coverage flags should be enough and works around any sanitizer issues. Though I recommend addressing the address sanitizer reports regardless.

@damgut

damgut commented Feb 1, 2022

Hi @MarcoFalke !
I need address sanitizer. But nevertheless this is not the problem. If I use only "-fsanitize=fuzzer" the result is the same.
Here is the code "hello.cc":

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

//int main() {
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    printf("Hello World\n");
    exit(0);
}

And this is the compilation and execution:

# Compilation:
> clang -fprofile-instr-generate -fcoverage-mapping -g -fsanitize=fuzzer hello.cc

# Execution:
> ./a.out -print_coverage=1
INFO: Seed: 1320401412
INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 4096 bytes
Hello World
==106162== ERROR: libFuzzer: fuzz target exited
    #0 0x4af000 in __sanitizer_print_stack_trace (.../a.out+0x4af000)
    #1 0x45b308 in fuzzer::PrintStackTrace() (.../a.out+0x45b308)
    #2 0x44050c in fuzzer::Fuzzer::ExitCallback() (.../a.out+0x44050c)
    #3 0x7f167468ca26 in __run_exit_handlers /build/glibc-eX1tMB/glibc-2.31/stdlib/exit.c:108:8
    #4 0x7f167468cbdf in exit /build/glibc-eX1tMB/glibc-2.31/stdlib/exit.c:139:3
    #5 0x4af310 in LLVMFuzzerTestOneInput .../hello.cc:8:5
    #6 0x441b11 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) (.../a.out+0x441b11)
    #7 0x44384a in fuzzer::Fuzzer::ReadAndExecuteSeedCorpora(std::__Fuzzer::vector<fuzzer::SizedFile, fuzzer::fuzzer_allocator<fuzzer::SizedFile> >&) (.../a.out+0x44384a)
    #8 0x443ed9 in fuzzer::Fuzzer::Loop(std::__Fuzzer::vector<fuzzer::SizedFile, fuzzer::fuzzer_allocator<fuzzer::SizedFile> >&) (.../a.out+0x443ed9)
    #9 0x432bae in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) (.../a.out+0x432bae)
    #10 0x45b9f2 in main (.../a.out+0x45b9f2)
    #11 0x7f167466a0b2 in __libc_start_main /build/glibc-eX1tMB/glibc-2.31/csu/../csu/libc-start.c:308:16
    #12 0x40794d in _start (.../a.out+0x40794d)

SUMMARY: libFuzzer: fuzz target exited
MS: 0 ; base unit: 0000000000000000000000000000000000000000
artifact_prefix='./'; Test unit written to ./crash-da39a3ee5e6b4b0d3255bfef95601890afd80709
Base64:

COVERAGE:
UNCOVERED_FUNC: hits: 0 edges: 0/1 LLVMFuzzerTestOneInput .../hello_fuzzer.cc:5

⚠️ The last 2 lines show a primitive coverage text report. Perhaps someone can tell me how to convert it to a friendly format (.gcda .gcno or .html)?

Anyway, the file "default.profraw" is created but with size 0:

> ls -l
-rwxrwxr-x 1 developer developer 1356920 Feb  1 12:36 a.out
-rw-rw-r-- 1 developer developer       0 Feb  1 12:36 crash-da39a3ee5e6b4b0d3255bfef95601890afd80709
-rw-rw-r-- 1 developer developer       0 Feb  1 12:36 default.profraw
-rw-rw-r-- 1 developer developer     206 Feb  1 12:30 hello.cc

If I remove "-fsanitize=fuzzer" (and use main() in hello.cc), then the file "default.profraw" is created (and I can see the coverage):

> clang -fprofile-instr-generate -fcoverage-mapping -g  hello.cc
> ./a.out
Hello World
> ls -l
total 92
-rwxrwxr-x 1 developer developer 78968 Feb  1 12:43 a.out*
-rw-rw-r-- 1 developer developer   152 Feb  1 12:44 default.profraw
-rw-rw-r-- 1 developer developer   206 Feb  1 12:43 hello.cc
> file default.profraw
default.profraw: LLVM raw profile data, version 5

@Dor1s
Contributor

Dor1s commented Feb 2, 2022

@damgut you need to exclude any crash inputs when generating a code coverage report. You're right that something interrupts the program before the .profraw is dumped: it is the fuzzer crash. To get a good coverage report for your fuzzer, let it run for a while and then use the generated corpus for coverage generation. It will also be faster that way.

  1. run a normal fuzzer build (without coverage instrumentation)
./fuzzer -any_other_runtime_flags=1 ./your_corpora_directory
  2. minimize the corpora
mkdir corpora_minimized
./fuzzer -merge=1 ./corpora_minimized ./your_corpora_directory
  3. run the coverage-instrumented build over the minimized corpora
./fuzzer -runs=0 ./corpora_minimized

The .profraw generated in the last step will have the most accurate code coverage that your fuzzer was able to achieve while fuzzing in step 1.
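
The last step, plus turning the resulting .profraw into a report, could be wrapped roughly as follows. This is only a sketch: coverage_report, fuzzer_cov and the output file names are made-up names, and the tool names are assumed to be overridable through the LLVM_PROFDATA and LLVM_COV variables (for example llvm-profdata-10 on Ubuntu):

```shell
#!/bin/sh
# Tool names can be overridden, e.g. LLVM_PROFDATA=llvm-profdata-10.
: "${LLVM_PROFDATA:=llvm-profdata}"
: "${LLVM_COV:=llvm-cov}"

# usage: coverage_report <coverage-instrumented-fuzzer> <minimized-corpus-dir> <out-dir>
coverage_report() {
    cov_fuzzer=$1; corpus=$2; out=$3
    mkdir -p "$out" || return 1
    # -runs=0 executes every corpus file once and then exits normally,
    # so the profile runtime gets a chance to write the .profraw file.
    LLVM_PROFILE_FILE="$out/fuzzer.profraw" "$cov_fuzzer" -runs=0 "$corpus" || return 1
    # Convert the raw profile into the indexed format llvm-cov expects.
    "$LLVM_PROFDATA" merge -sparse "$out/fuzzer.profraw" -o "$out/fuzzer.profdata" || return 1
    # Render a single-file HTML report.
    "$LLVM_COV" show --format=html "$cov_fuzzer" \
        -instr-profile="$out/fuzzer.profdata" > "$out/coverage.html"
}

# Example (hypothetical binary and directory names):
# coverage_report ./fuzzer_cov ./corpora_minimized ./coverage
```

Keeping the fuzzing build and the coverage build as two separate binaries means the long-running fuzzing step does not pay for the profiling instrumentation.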

@damgut

damgut commented Feb 2, 2022

Thanks @Dor1s, I tried what you proposed and it works. That means 2 runs: a first run until the fuzzer crashes, and a second run using the corpus files to generate the coverage in .profraw. The only thing missing from the coverage will be the last path if the fuzz test crashes, but this is not terrible.

Nevertheless, I still have difficulties generating .profraw in a big project. I see I can also get a text coverage report by passing the option -print_coverage=1 to the executable. The output goes to the console and looks like this:

COVERAGE:
UNCOVERED_FUNC: hits: 0 edges: 0/3 init foo.cpp:97
UNCOVERED_FUNC: hits: 0 edges: 0/1 start foo.cpp:0
UNCOVERED_FUNC: hits: 0 edges: 0/3 open() foo.h:162
COVERED_FUNC: hits: 5 edges: 4/7 bla(int a) foo.cpp:118
UNCOVERED_PC: foo.cpp:0
UNCOVERED_PC: foo.cpp:119
....

Do you know if there is a way to convert this text output into a friendly format (like gcovr, which produces html output)?

@Dor1s
Contributor

Dor1s commented Feb 5, 2022

-print_coverage=1 has nothing to do with LLVM Source-based coverage instrumentation (which generates .profraw files). What are the difficulties you're having with a big project? Are you sure all the files were instrumented with -fprofile-instr-generate -fcoverage-mapping? Is the application you're running single process or does it spawn multiple processes?

@damgut

damgut commented Feb 7, 2022

Thanks @Dor1s and everybody for the fast answers!

My last problem was that I had left out the options -fprofile-instr-generate -fcoverage-mapping when calling the linker. Now everything works as expected.

Summary

I've found 3 different ways to get coverage when using -fsanitize=address together with fuzzing:

Using -print_coverage=1

Build with -fsanitize=fuzzer (as @Dor1s noted above, this report does not depend on the -fprofile-instr-generate -fcoverage-mapping options), then pass -print_coverage=1 to the executable. When execution finishes (even by abort or crash), a very long list is printed to the console showing which functions and lines were reached. Unfortunately I didn't find any tool that can parse this output and display it in a friendly manner.
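
Lacking such a tool, a few lines of awk can at least condense the report into totals. A minimal sketch, assuming only the COVERED_FUNC / UNCOVERED_FUNC line layout shown in the sample earlier in this thread; summarize_print_coverage is a made-up helper name:

```shell
#!/bin/sh
# Summarise libFuzzer's -print_coverage=1 text report read from stdin.
summarize_print_coverage() {
    awk '
        $1 == "COVERED_FUNC:" || $1 == "UNCOVERED_FUNC:" {
            if ($1 == "COVERED_FUNC:") covered++; else uncovered++
            # Field 5 looks like "4/7": covered edges / total edges.
            split($5, e, "/")
            hit += e[1]; total += e[2]
        }
        END {
            printf "functions: %d covered, %d uncovered\n", covered, uncovered
            printf "edges: %d/%d\n", hit, total
        }'
}

# Example (hypothetical binary and corpus names); 2>&1 captures both streams:
# ./fuzzer -print_coverage=1 -runs=0 ./corpora_minimized 2>&1 | summarize_print_coverage
```

For the five-line foo.cpp sample earlier in the thread this prints "functions: 1 covered, 3 uncovered" followed by "edges: 4/14". It is a summary only; it does not replace an annotated source view.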

Using .profraw file

Compile and link with the -fprofile-instr-generate -fcoverage-mapping options. When the fuzzer crashes or aborts, no usable .profraw is written, so once the fuzzing run is finished, a second run over only the corpus files is needed, as @Dor1s proposed above: ./fuzzer -runs=0 ./corpora_minimized
Then, to generate an html view, I used:

# create "coverage.profdata"
llvm-profdata-10 merge -sparse coverage.profraw -o coverage.profdata
# Generate output
llvm-cov-10 show --format=html ./a.out -instr-profile=coverage.profdata > coverage.html

The disadvantage here is that coverage.html is a single big file containing all the source files, with no summary or statistics.
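
For what it is worth, llvm-cov can also write a directory of HTML pages with an index when given -output-dir, and its report subcommand prints a per-file summary table. A sketch, assuming the same a.out and coverage.profdata as above; html_report_dir and the output directory name are made-up, and LLVM_COV can be set to a versioned binary such as llvm-cov-10:

```shell
#!/bin/sh
: "${LLVM_COV:=llvm-cov}"

# usage: html_report_dir <instrumented-binary> <profdata-file> <out-dir>
html_report_dir() {
    binary=$1; profdata=$2; out=$3
    # With -output-dir, llvm-cov writes an index.html plus one page per source file.
    "$LLVM_COV" show "$binary" -instr-profile="$profdata" \
        -format=html -output-dir="$out" || return 1
    # "report" prints a per-file summary table (regions, functions, lines) to stdout.
    "$LLVM_COV" report "$binary" -instr-profile="$profdata"
}

# Example:
# html_report_dir ./a.out coverage.profdata coverage_html
```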

Using gcovr

This is my favorite, since the generated html consists of separate files, one per source file, together with a summary and nice statistics. Here, too, 2 runs are needed:

  1. First run to generate the corpus files (no specific coverage options are needed for this). You can start as many runs as you want to fill the corpus.
  2. Compile again using the option --coverage. I would not recommend using this option for the first run, since it introduces additional overhead to generate the .gcno and .gcda files (in my measurement the execution took 35% longer). Since the coverage run is not done frequently, and normally with few files in the corpus, this additional time is not critical.
    To generate the coverage:
./fuzzer -runs=0 ./corpora_minimized
mkdir coverage
gcovr --gcov-executable "llvm-cov-10 gcov" --html --html-details \
--object-directory=[directory where .o .gcno .gcda are located] \
-r [root directory, normally .] \
-f [filter for source files as regex, for example .*src/.*] \
-o coverage/coverage.html

See also: https://stackoverflow.com/questions/60840386/how-do-i-produce-a-graphical-code-profile-report-for-c-code-compiled-with-clan

@chinggg

chinggg commented Jul 3, 2022

Thanks to everyone involved in the discussion! I found this issue really helpful, since there seems to be no official documentation on generating code coverage reports for libFuzzer. Just FYI, I found another tool for getting a libFuzzer HTML coverage overview: https://github.com/vanhauser-thc/libfuzzer-cov

@vors

vors commented Feb 28, 2023

Hi friends! I have trouble with empty coverage. I tried running @damgut's simple example from the comment above (thank you for documenting it) in the latest clang docker container and it doesn't produce the COVERAGE output.

Here is the repro (based on @damgut 's post).

create hello.cc file

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

//int main() {
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    printf("Hello World\n");
    exit(0);
}

Run docker container with latest clang

docker run -v$(pwd):/app -it silkeh/clang:15-bullseye

Compile in it and run

clang -fprofile-instr-generate -fcoverage-mapping -g -fsanitize=fuzzer hello.cc
./a.out -print_coverage=1

Output that I'm getting

root@bed059cbaae9:/app# ./a.out -print_coverage=1
INFO: Running with entropic power schedule (0xFF, 100).
INFO: Seed: 1659063642
INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 4096 bytes
Hello World
==12== ERROR: libFuzzer: fuzz target exited
    #0 0x561dd2bf6784 in __sanitizer_print_stack_trace (/app/a.out+0x68784) (BuildId: d8b0663adf9142cf63d5e69ce28e312bd0471ab9)
    #1 0x561dd2bcdb37 in fuzzer::PrintStackTrace() (/app/a.out+0x3fb37) (BuildId: d8b0663adf9142cf63d5e69ce28e312bd0471ab9)
    #2 0x561dd2bb3eac in fuzzer::Fuzzer::ExitCallback() (/app/a.out+0x25eac) (BuildId: d8b0663adf9142cf63d5e69ce28e312bd0471ab9)
    #3 0x7f3b0aaba4d6  (/lib/x86_64-linux-gnu/libc.so.6+0x3b4d6) (BuildId: b503275bf9fee51581fdceef97533b194035b4f7)
    #4 0x7f3b0aaba679 in exit (/lib/x86_64-linux-gnu/libc.so.6+0x3b679) (BuildId: b503275bf9fee51581fdceef97533b194035b4f7)
    #5 0x561dd2bf6a96 in LLVMFuzzerTestOneInput /app/hello.cc:8:5
    #6 0x561dd2bb5512 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) (/app/a.out+0x27512) (BuildId: d8b0663adf9142cf63d5e69ce28e312bd0471ab9)
    #7 0x561dd2bb67d0 in fuzzer::Fuzzer::ReadAndExecuteSeedCorpora(std::vector<fuzzer::SizedFile, std::allocator<fuzzer::SizedFile>>&) (/app/a.out+0x287d0) (BuildId: d8b0663adf9142cf63d5e69ce28e312bd0471ab9)
    #8 0x561dd2bb6e93 in fuzzer::Fuzzer::Loop(std::vector<fuzzer::SizedFile, std::allocator<fuzzer::SizedFile>>&) (/app/a.out+0x28e93) (BuildId: d8b0663adf9142cf63d5e69ce28e312bd0471ab9)
    #9 0x561dd2ba51f2 in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) (/app/a.out+0x171f2) (BuildId: d8b0663adf9142cf63d5e69ce28e312bd0471ab9)
    #10 0x561dd2bce462 in main (/app/a.out+0x40462) (BuildId: d8b0663adf9142cf63d5e69ce28e312bd0471ab9)
    #11 0x7f3b0aaa2d09 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x23d09) (BuildId: b503275bf9fee51581fdceef97533b194035b4f7)
    #12 0x561dd2b99cc9 in _start (/app/a.out+0xbcc9) (BuildId: d8b0663adf9142cf63d5e69ce28e312bd0471ab9)

SUMMARY: libFuzzer: fuzz target exited
MS: 0 ; base unit: 0000000000000000000000000000000000000000


artifact_prefix='./'; Test unit written to ./crash-da39a3ee5e6b4b0d3255bfef95601890afd80709
Base64: 
COVERAGE:

Notice nothing is printed at the end.

@Dor1s
Contributor

Dor1s commented Mar 6, 2023

@vors you have three options:

  1. remove inputs that trigger exit() from your corpus
  2. if you really need exit() to be invoked as part of the expected behavior, call __llvm_profile_dump before exiting the program
  3. on Mac, you can try using the %c pattern in the LLVM_PROFILE_FILE value (see https://clang.llvm.org/docs/SourceBasedCodeCoverage.html#running-the-instrumented-program - it also has a bit more context about program exits / crashes)
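
One way to approximate option 1 without deleting anything from the corpus is to run the coverage build on one input at a time and keep only the profiles from runs that exit cleanly. This is a sketch under assumptions: profile_non_crashing and the directory layout are made-up, and running inputs one by one can be slow for large corpora:

```shell
#!/bin/sh
: "${LLVM_PROFDATA:=llvm-profdata}"

# usage: profile_non_crashing <coverage-instrumented-fuzzer> <corpus-dir> <out-dir>
profile_non_crashing() {
    fuzzer=$1; corpus=$2; out=$3
    mkdir -p "$out/raw" || return 1
    n=0
    for input in "$corpus"/*; do
        [ -f "$input" ] || continue
        n=$((n + 1))
        raw="$out/raw/$n.profraw"
        # Passing a file instead of a directory makes libFuzzer run just that input.
        if LLVM_PROFILE_FILE="$raw" "$fuzzer" "$input" >/dev/null 2>&1 && [ -s "$raw" ]; then
            :   # clean exit and a non-empty profile: keep it
        else
            echo "skipping $input (crash, exit() or empty profile)" >&2
            rm -f "$raw"
        fi
    done
    # Merge whatever profiles survived into one indexed profile.
    set -- "$out"/raw/*.profraw
    [ -e "$1" ] || { echo "no usable profiles" >&2; return 1; }
    "$LLVM_PROFDATA" merge -sparse "$@" -o "$out/merged.profdata"
}

# Example (hypothetical names):
# profile_non_crashing ./fuzzer_cov ./corpora_minimized ./coverage
```

The skipped inputs are listed on stderr, which doubles as a list of the corpus entries that hit exit() or crash.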

@vors

vors commented Mar 6, 2023

@Dor1s oh interesting, thank you for the reply! FWIW this doesn't affect the .profraw flow: I just realized that I was still able to create this data file.

zduthie-unimelb added a commit to zduthie-unimelb/connectedhomeip that referenced this issue May 4, 2023
Fuzzing binary now searches for environment variable `FUZZ_CAMPAIGN_MINUTES` to automatically limit, halt execution, and dump gcov data once X minutes have elapsed. This was required to extract gcov data from a fuzzing binary as under normal circumstances manually aborting the execution did not produce any gcov data.
google/fuzzing#41
zduthie-unimelb added a commit to zduthie-unimelb/connectedhomeip that referenced this issue May 5, 2023
Fuzzing binary now searches for environment variable `FUZZ_CAMPAIGN_MINUTES` to automatically limit, halt execution, and dump gcov data once X minutes have elapsed. This was required to extract gcov data from a fuzzing binary as under normal circumstances manually aborting the execution did not produce any gcov data.
google/fuzzing#41