google/atheris

Name already in use.

Use Git or checkout with SVN using the web URL.

Work fast with our official CLI. Learn more about the CLI .

  • Open with GitHub Desktop
  • Download ZIP

Sign In Required

Please sign in to use Codespaces.

Launching GitHub Desktop

If nothing happens, download GitHub Desktop and try again.

Launching Xcode

If nothing happens, download Xcode and try again.

Launching Visual Studio Code

Your codespace will open once ready.

There was a problem preparing your codespace, please try again.

Latest commit

@AidenRHall

  • 267 commits

Atheris: A Coverage-Guided, Native Python Fuzzer

Atheris is a coverage-guided Python fuzzing engine. It supports fuzzing of Python code, but also native extensions written for CPython. Atheris is based on libFuzzer. When fuzzing native code, Atheris can be used in combination with AddressSanitizer or UndefinedBehaviorSanitizer to catch additional bugs.

Installation Instructions

Atheris supports Linux (32- and 64-bit) and Mac OS X, Python versions 3.6-3.10.

You can install prebuilt versions of Atheris with pip:

These wheels come with a built-in libFuzzer, which is fine for fuzzing Python code. If you plan to fuzz native extensions, you may need to build from source to ensure the libFuzzer version in Atheris matches your Clang version.

Building from Source

Atheris relies on libFuzzer, which is distributed with Clang. If you have a sufficiently new version of clang on your path, installation from source is as simple as:

If you don't have clang installed or it's too old, you'll need to download and build the latest version of LLVM. Follow the instructions in Installing Against New LLVM below.

Apple Clang doesn't come with libFuzzer, so you'll need to install a new version of LLVM from head. Follow the instructions in Installing Against New LLVM below.

Installing Against New LLVM

Using Atheris

When fuzzing Python, Atheris will report a failure if the Python code under test throws an uncaught exception.
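Here is a minimal sketch of a harness that follows this rule. The hypothetical parse_config function stands in for the code under test. TestOneInput swallows only the exceptions that represent an expected rejection of malformed input, so any other exception (here, a KeyError on a missing key) would be reported as a failure. The run_fuzzer helper is also hypothetical. It imports atheris lazily so the harness logic can be exercised without Atheris installed. It then instruments both functions in place with atheris.instrument_func and calls atheris.Setup() followed by atheris.Fuzz().

```python
import json


def parse_config(data: bytes) -> dict:
    """Hypothetical code under test: parse a JSON object with a "name" key."""
    obj = json.loads(data)  # accepts bytes; raises ValueError on bad input
    if not isinstance(obj, dict):
        raise ValueError("top-level value must be a JSON object")
    # Latent bug: a missing "name" key raises KeyError, which the harness
    # below does not catch, so the fuzzer would report it as a failure.
    return {"name": str(obj["name"])}


def TestOneInput(data: bytes) -> None:
    """Fuzzer entry point: called repeatedly with a single bytes argument."""
    try:
        parse_config(data)
    except ValueError:
        # Expected rejections: json.JSONDecodeError and UnicodeDecodeError
        # are both subclasses of ValueError.
        return


def run_fuzzer(argv) -> None:
    """Hypothetical helper: instrument the harness and start Atheris."""
    import atheris  # imported lazily so TestOneInput can be tested alone

    atheris.instrument_func(parse_config)  # instruments in place
    atheris.instrument_func(TestOneInput)
    atheris.Setup(argv, TestOneInput)  # must come before Fuzz()
    atheris.Fuzz()  # does not return
```

Call run_fuzzer(sys.argv) from your script's entry point to start fuzzing. Calling TestOneInput directly with a few hand-picked inputs is a quick way to check which exceptions it lets through.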

Python coverage

Atheris collects Python coverage information by instrumenting bytecode. There are 3 options for adding this instrumentation to the bytecode:

You can instrument the libraries you import:

This will add instrumentation to foo and bar, as well as to any libraries they import.

Or, you can instrument individual functions:

Or finally, you can instrument everything:

Put this right before atheris.Setup(). This will find every Python function currently loaded in the interpreter and instrument it. This might take a while.

Atheris can additionally instrument regular expression checks, e.g. re.search. To enable this feature, add atheris.enabled_hooks.add("RegEx") to your script before your code calls re.compile. Internally, this will import the re module and instrument the necessary functions. This is currently an experimental feature.

Similarly, Atheris can instrument str methods; currently only str.startswith and str.endswith are supported. To enable this feature, add atheris.enabled_hooks.add("str"). This is currently an experimental feature.

Why am I getting "No interesting inputs were found"?

You might see this error:

You'll get this error if the first 2 calls to TestOneInput didn't produce any coverage events. Even if you have instrumented some Python code, this can happen if the instrumentation isn't reached in those first 2 calls (for example, because you have a nontrivial TestOneInput that returns before calling into instrumented code). You can resolve this by adding an atheris.instrument_func decorator to TestOneInput, using atheris.instrument_all(), or moving your TestOneInput function into an instrumented module.

Visualizing Python code coverage

Examining which lines are executed is helpful for understanding the effectiveness of your fuzzer. Atheris is compatible with coverage.py: you can run your fuzzer using the coverage.py module as you would for any other Python program. Here's an example:

Coverage reports are only generated when your fuzzer exits gracefully. This happens if:

  • you specify -atheris_runs=<number>, and that many runs have elapsed.
  • your fuzzer exits by Python exception.
  • your fuzzer exits by sys.exit().

No coverage report will be generated if your fuzzer exits due to a crash in native code, or due to libFuzzer's -runs flag (use -atheris_runs instead). If your fuzzer exits via other methods, such as SIGINT (Ctrl+C), Atheris will attempt to generate a report but may be unable to (depending on your code). For consistent reports, we recommend always using -atheris_runs=<number>.

If you'd like to examine coverage when running with your corpus, you can do that with the following command:

This will cause Atheris to run on each file in <corpus_dir>, then exit. Note: Atheris uses an empty input as the first input even if there is no empty file in <corpus_dir>, so the number of runs needed to cover the whole corpus is one more than the number of files in it. Importantly, if you leave off the -atheris_runs argument, no coverage report will be generated.
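You can compute the run count instead of typing it by hand. The sketch below assumes that Atheris performs one run for the initial empty input plus one run per corpus file. The hypothetical coverage_argv helper counts only regular files, unlike ls | wc -l, which also counts subdirectories. It then builds the coverage.py command line in the order fuzzer script, -atheris_runs, corpus directory.

```python
import os
import sys
from typing import List


def coverage_argv(fuzzer_script: str, corpus_dir: str) -> List[str]:
    """Build a coverage.py command that replays each corpus file once.

    Assumes Atheris performs one run for the initial empty input plus one
    run per file in corpus_dir.
    """
    with os.scandir(corpus_dir) as entries:
        num_files = sum(1 for entry in entries if entry.is_file())
    runs = num_files + 1  # +1 for the empty input Atheris tries first
    return [
        sys.executable, "-m", "coverage", "run",
        fuzzer_script,
        f"-atheris_runs={runs}",
        corpus_dir,
    ]
```

For example, subprocess.run(coverage_argv("my_fuzzer.py", "corpus"), check=True) followed by python3 -m coverage html would replay a hypothetical corpus directory and render the report.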

Using coverage.py will significantly slow down your fuzzer, so only use it for visualizing coverage; don't use it all the time.

Fuzzing Native Extensions

In order for fuzzing native extensions to be effective, your native extensions must be instrumented. See Native Extension Fuzzing for instructions.

Structure-aware Fuzzing

Atheris is based on a coverage-guided mutation-based fuzzer (LibFuzzer). This has the advantage of not requiring any grammar definition for generating inputs, making its setup easier. The disadvantage is that it will be harder for the fuzzer to generate inputs for code that parses complex data types. Often the inputs will be rejected early, resulting in low coverage.

Atheris supports custom mutators (as offered by LibFuzzer) to produce grammar-aware inputs.

Example (Atheris equivalent of the example in the LibFuzzer docs):

To reach the RuntimeError crash, the fuzzer needs to be able to produce inputs that are valid compressed data and satisfy the checks after decompression. It is very unlikely that Atheris will be able to produce such inputs: mutations on the input data will most probably result in invalid data that will fail at decompression-time.

To overcome this issue, you can define a custom mutator function (equivalent to LLVMFuzzerCustomMutator). This example produces valid compressed data. To enable Atheris to make use of it, pass the custom mutator function to the invocation of atheris.Setup.

As seen in the example, the custom mutator may request Atheris to mutate data using atheris.Mutate() (this is equivalent to LLVMFuzzerMutate).

You can experiment with custom_mutator_example.py and see that without the mutator Atheris would not be able to find the crash, while with the mutator this is achieved in a matter of seconds.
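Here is a sketch of that idea under stated assumptions. The callback is assumed to take (data, max_size, seed), mirroring LLVMFuzzerCustomMutator. The hypothetical _mutate_bytes helper stands in for atheris.Mutate() so the code runs without Atheris. The mutator decompresses the input, mutates the payload, and recompresses it, so every input it returns is valid zlib data. Inputs that fail to decompress are replaced with a small valid payload.

```python
import random
import zlib


def _mutate_bytes(data: bytes, max_size: int, rng: random.Random) -> bytes:
    """Hypothetical stand-in for atheris.Mutate(): flip, insert, or delete."""
    buf = bytearray(data)
    choice = rng.randrange(3)
    if choice == 0 and buf:
        i = rng.randrange(len(buf))
        buf[i] ^= 1 << rng.randrange(8)  # flip one bit
    elif choice == 1 and len(buf) < max_size:
        buf.insert(rng.randrange(len(buf) + 1), rng.randrange(256))
    elif buf:
        del buf[rng.randrange(len(buf))]  # drop one byte
    return bytes(buf[:max_size])


def CustomMutator(data: bytes, max_size: int, seed: int) -> bytes:
    """Keep inputs valid zlib streams by mutating the decompressed payload."""
    rng = random.Random(seed)
    try:
        decompressed = zlib.decompress(data)
    except zlib.error:
        decompressed = b"Hi"  # restart from a small valid payload
    else:
        decompressed = _mutate_bytes(decompressed, max_size, rng)
    return zlib.compress(decompressed)


def TestOneInput(data: bytes) -> None:
    """Crashes only on compressed data that decompresses to b"FU"."""
    try:
        decompressed = zlib.decompress(data)
    except zlib.error:
        return
    if decompressed == b"FU":
        raise RuntimeError("Boom")
```

In a real harness you would replace _mutate_bytes with atheris.Mutate() and pass CustomMutator to atheris.Setup (assumed here to accept it as the custom_mutator keyword argument). As a simplification, the recompressed output is not checked against max_size.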

Custom crossover functions (equivalent to LLVMFuzzerCustomCrossOver) are also supported. You can pass the custom crossover function to the invocation of atheris.Setup. See its usage in custom_crossover_fuzz_test.py.
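Here is a sketch of a simple one-point crossover. The argument order (data1, data2, max_out_size, seed) is an assumption that mirrors the parameters of LLVMFuzzerCustomCrossOver; check custom_crossover_fuzz_test.py for the exact signature Atheris expects. The child is a prefix of the first parent followed by a suffix of the second, truncated so it never exceeds the size limit.

```python
import random


def CustomCrossover(data1: bytes, data2: bytes, max_out_size: int, seed: int) -> bytes:
    """One-point crossover: a prefix of data1 followed by a suffix of data2."""
    rng = random.Random(seed)  # deterministic for a given seed
    cut1 = rng.randint(0, len(data1))  # randint is inclusive on both ends
    cut2 = rng.randint(0, len(data2))
    child = data1[:cut1] + data2[cut2:]
    return child[:max_out_size]  # never exceed the size limit
```

Seeding a private random.Random from the seed argument keeps the result reproducible for a given seed, which helps when replaying a fuzzing session.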

Structure-aware Fuzzing with Protocol Buffers

libprotobuf-mutator has bindings to use it together with Atheris to perform structure-aware fuzzing using protocol buffers.

See the documentation for atheris_libprotobuf_mutator.

Integration with OSS-Fuzz

Atheris is fully supported by OSS-Fuzz, Google's continuous fuzzing service for open source projects. For integrating with OSS-Fuzz, please see https://google.github.io/oss-fuzz/getting-started/new-project-guide/python-lang.

The atheris module provides three key functions: instrument_imports(), Setup() and Fuzz().

In your source file, import all libraries you wish to fuzz inside a with atheris.instrument_imports(): block, like this:

Generally, it's best to import atheris first and then import all other libraries inside of a with atheris.instrument_imports() block.

Next, define a fuzzer entry point function and pass it to atheris.Setup() along with the fuzzer's arguments (typically sys.argv). Finally, call atheris.Fuzz() to start fuzzing. You must call atheris.Setup() before atheris.Fuzz().

instrument_imports(include=[], exclude=[])

  • include: A list of fully-qualified module names that shall be instrumented.
  • exclude: A list of fully-qualified module names that shall NOT be instrumented.

This should be used together with a with statement. All modules imported in said statement will be instrumented. However, because Python imports all modules only once, this cannot be used to instrument any previously imported module, including modules required by Atheris. To add coverage to those modules, use instrument_all() instead.

A full list of unsupported modules can be retrieved as follows:
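The following is a rough illustration of the rule above, not Atheris's own listing mechanism. A module cannot be instrumented by instrument_imports() if it is already present in sys.modules when the with block runs. The hypothetical helper already_imported checks a list of candidate module names against sys.modules.

```python
import sys
from typing import Iterable, List


def already_imported(module_names: Iterable[str]) -> List[str]:
    """Return the names in module_names that are already in sys.modules.

    Such modules were loaded before any instrument_imports() block could
    run, so importing them again inside the block does not instrument them.
    """
    return sorted(name for name in module_names if name in sys.modules)
```

Calling it right after import atheris shows which of your target modules were already loaded and would need instrument_all() instead.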

instrument_func(func)

  • func: The function to instrument.

This will instrument the specified Python function and then return func. This is typically used as a decorator, but it can also be called directly on an existing function. Note that func is instrumented in place, so this will affect all call points of the function.

This cannot be called on a bound method - call it on the unbound version.
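In Python 3, the "unbound version" of a method is simply the function object stored on the class. A bound method is a wrapper around that same function plus an instance. The plain-Python sketch below uses a hypothetical Parser class to show the relationship. Because instrumentation happens in place on the function object, instrumenting the class attribute also covers calls made through any instance.

```python
class Parser:
    """Hypothetical class whose method we want to instrument."""

    def parse(self, data: bytes) -> int:
        return len(data)


parser = Parser()
bound = parser.parse    # bound method object: wraps function + instance
unbound = Parser.parse  # in Python 3, this is the plain function object

# The bound method wraps the very same function object, so instrumenting
# the unbound function in place also affects calls through any instance.
assert bound.__func__ is unbound
assert parser.parse(b"abc") == 3
```

With Atheris, that means calling atheris.instrument_func(Parser.parse) rather than atheris.instrument_func(parser.parse).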

instrument_all()

This will scan over all objects in the interpreter and call instrument_func on every Python function. This works even on core Python interpreter functions, something which instrument_imports cannot do.

This function is experimental.

Setup(args, test_one_input, internal_libfuzzer=None)

  • args: A list of strings: the process arguments to pass to the fuzzer, typically sys.argv. This argument list may be modified in place to remove arguments consumed by the fuzzer. See the LibFuzzer docs for a list of such options.
  • test_one_input: your fuzzer's entry point. Must take a single bytes argument. This will be repeatedly invoked with a single bytes container.
  • internal_libfuzzer: Indicates whether libFuzzer will be provided by Atheris or by an external library (see native_extension_fuzzing.md). If unspecified, Atheris will determine this automatically. If fuzzing pure Python, leave this unspecified or set it to True.

Fuzz()

This starts the fuzzer. You must have called Setup() before calling this function. This function does not return.

In many cases Setup() and Fuzz() could be combined into a single function, but they are separated because you may want the fuzzer to consume the command-line arguments it handles before passing any remaining arguments to another setup function.

FuzzedDataProvider

Often, a bytes object is not a convenient input for the code being fuzzed. Similar to libFuzzer, we provide a FuzzedDataProvider to translate these bytes into other input forms.

You can construct the FuzzedDataProvider with:

The FuzzedDataProvider then supports the following functions:

Consume count bytes.

Consume unicode characters. Might contain surrogate pair characters, which according to the specification are invalid in this situation. However, many core software tools (e.g. Windows file paths) support them, so other software often needs to support them too.

Consume unicode characters, but never generate surrogate pair characters.

Alias for ConsumeBytes in Python 2, or ConsumeUnicode in Python 3.

Consume a signed integer of the specified size (when written in two's complement notation).

Consume an unsigned integer of the specified size.

Consume an integer in the range [min, max].

Consume a list of count integers of size bytes.

Consume a list of count integers in the range [min, max].

Consume an arbitrary floating-point value. Might produce weird values like NaN and Inf.

Consume an arbitrary numeric floating-point value; never produces a special type like NaN or Inf.

Consume a floating-point value in the range [0, 1].

Consume a floating-point value in the range [min, max].

Consume a list of count arbitrary floating-point values. Might produce weird values like NaN and Inf.

Consume a list of count arbitrary numeric floating-point values; never produces special types like NaN or Inf.

Consume a list of count floats in the range [0, 1].

Consume a list of count floats in the range [min, max].

Given a list, pick a random value from it.

Consume either True or False.
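To show how a provider turns raw bytes into typed values, here is a deliberately simplified, hypothetical TinyDataProvider. It is not Atheris's implementation: it consumes bytes front to back, treats missing bytes as zero once the input is exhausted, and maps ranges with a modulo, which slightly biases small values. Atheris's own provider is constructed as atheris.FuzzedDataProvider(data), and its byte order and exhaustion behavior may differ.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


class TinyDataProvider:
    """Hypothetical, simplified data provider (not Atheris's implementation)."""

    def __init__(self, data: bytes) -> None:
        self._data = data
        self._pos = 0

    def remaining_bytes(self) -> int:
        return len(self._data) - self._pos

    def consume_bytes(self, count: int) -> bytes:
        chunk = self._data[self._pos:self._pos + count]
        self._pos += len(chunk)
        return chunk

    def consume_uint(self, byte_size: int) -> int:
        # Missing bytes are treated as zero once the input is exhausted.
        raw = self.consume_bytes(byte_size).ljust(byte_size, b"\x00")
        return int.from_bytes(raw, "little")

    def consume_int(self, byte_size: int) -> int:
        # Two's-complement interpretation of byte_size bytes.
        raw = self.consume_bytes(byte_size).ljust(byte_size, b"\x00")
        return int.from_bytes(raw, "little", signed=True)

    def consume_int_in_range(self, lo: int, hi: int) -> int:
        if lo > hi:
            raise ValueError("lo must be <= hi")
        span = hi - lo
        if span == 0:
            return lo
        byte_size = (span.bit_length() + 7) // 8  # enough bytes to cover span
        return lo + self.consume_uint(byte_size) % (span + 1)

    def consume_bool(self) -> bool:
        return (self.consume_uint(1) & 1) == 1

    def pick_value_in_list(self, values: Sequence[T]) -> T:
        return values[self.consume_int_in_range(0, len(values) - 1)]
```

Inside a harness, each call carves the next few bytes off the fuzzer's input, so libFuzzer's byte-level mutations translate into changes in the typed values your code receives.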


atheris 1.0.3

pip install atheris==1.0.3

Released: Dec 4, 2020

A coverage-guided fuzzer for Python and Python extensions.


Author: Ian Eldred Pudney


Project description

Atheris: a coverage-guided, native python fuzzer.


Installation Instructions

Atheris supports Linux (32- and 64-bit) and Mac OS X.

Atheris relies on libFuzzer, which is distributed with Clang. If you have a sufficiently new version of clang on your path, installation is as simple as:

If you don't have clang installed or it's too old, you'll need to download and build the latest version of LLVM. Follow the instructions in Installing Against New LLVM below.

Atheris relies on libFuzzer, which is distributed with Clang. However, Apple Clang doesn't come with libFuzzer, so you'll need to install a new version of LLVM from head. Follow the instructions in Installing Against New LLVM below.

Installing Against New LLVM

Using Atheris

Atheris supports fuzzing Python code, and uses Python code coverage information for this purpose.

Fuzzing Python Code

While Atheris supports Python 2.7 and Python 3.3+, its Python code coverage support is significantly better when used with Python 3.8+, as it supports opcode-by-opcode coverage. If fuzzing Python code, we strongly recommend using Python 3.8+ where possible.

When fuzzing Python, Atheris will report a failure if the Python code under test throws an uncaught exception.

Be sure to pass enable_python_coverage=True as an argument to Setup(). You can additionally pass enable_python_opcode_coverage=[True/False] to turn opcode coverage on and off. Opcode coverage is typically beneficial, but on large Python projects its performance cost may outweigh its benefit. This option defaults to True on Python 3.8+, or False otherwise.

Opcode coverage must be enabled to support features like intelligent string comparison fuzzing for Python code.

Fuzzing Native Extensions

In order for native fuzzing to be effective, such native extensions must be built with Clang, using the argument -fsanitize=fuzzer-no-link . They should be built with the same clang as was used when building Atheris.

The mechanics of building with Clang depend on your native extension. However, if your library is built with setuptools (e.g. pip and setup.py), the following is often sufficient:

When fuzzing a native extension, you must LD_PRELOAD the atheris dynamic library. Otherwise, you will receive an error such as undefined symbol: __sancov_lowest_stack . Atheris provides a feature to do this: you can find the atheris dynamic library with the following command:

Then, run Python with LD_PRELOAD :

If fuzzing a native extension without a significant Python component, you'll get better performance by specifying enable_python_coverage=False as an argument to Setup() .

Using Sanitizers

We strongly recommend using a Clang sanitizer, such as -fsanitize=address , when fuzzing native extensions. However, there are complexities involved in doing this; see using_sanitizers.md for details.

Main Interface

The atheris module provides two key functions: Setup() and Fuzz() .

In your source file, define a fuzzer entry point function, and pass it to atheris.Setup(), along with the fuzzer's arguments (typically sys.argv). Finally, call atheris.Fuzz() to start fuzzing. Here's an example:

Configure the Atheris Python Fuzzer. You must call atheris.Setup() before atheris.Fuzz().

  • args: A list of strings: the process arguments to pass to the fuzzer, typically sys.argv. This argument list may be modified in place to remove arguments consumed by the fuzzer.
  • test_one_input: your fuzzer's entry point. Must take a single bytes argument (str in Python 2). This will be repeatedly invoked with a single bytes container.

Optional Args:

  • enable_python_coverage: boolean. Controls whether to collect coverage information on Python code. Defaults to True. If fuzzing a native extension with minimal Python code, set to False for a performance increase.
  • enable_python_opcode_coverage: boolean. Controls whether to collect Python opcode trace events. You typically want this enabled. Defaults to True on Python 3.8+, and False otherwise. Ignored if enable_python_coverage=False, or if using a version of Python prior to 3.8.

Fuzz()

This starts the fuzzer. You must have called Setup() before calling this function. This function does not return.


Use with Hypothesis

The Hypothesis library for property-based testing is also useful for writing fuzz harnesses. As well as a great library of "strategies" which describe the inputs to generate, using Hypothesis makes it trivial to reproduce failures found by the fuzzer - including automatically finding a minimal reproducing input. For example:

See the Hypothesis documentation for more details, or its strategies reference for what you can generate.


Security Blog

How the Atheris Python Fuzzer Works

```python
import sys

import atheris


def TestOneInput(data):  # Our entry point
  if data == b"bad":
    raise RuntimeError("Badness!")


atheris.Setup(sys.argv, TestOneInput)
atheris.Fuzz()
```

Atheris is a native Python extension, and uses libFuzzer to provide its code coverage and input generation capabilities. The entry point passed to atheris.Setup() is wrapped in the C++ entry point that’s actually passed to libFuzzer. This wrapper will then be invoked by libFuzzer repeatedly, with its data proxied back to Python.

Python Code Coverage 

Atheris is a native Python extension, and is typically compiled with libFuzzer linked in. When you initialize Atheris, it registers a tracer with CPython to collect information about Python code flow. This tracer can keep track of every line reached and every function executed.

We need to get this trace information to libFuzzer, which is responsible for generating code coverage information. There’s a problem, however: libFuzzer assumes that the amount of code is known at compile time. The two primary code coverage mechanisms are __sanitizer_cov_pcs_init (which registers a set of program counters that might be visited) and __sanitizer_cov_8bit_counters_init (which registers an array of 8-bit counters that are incremented when a basic block is visited). Both of these need to know at initialization time how many program counters or basic blocks exist. But in Python, that isn’t possible, since code isn’t loaded until well after Python starts. We can’t even know it when we start the fuzzer: it’s possible to dynamically import code later, or even generate code on the fly.

Thankfully, libFuzzer supports fuzzing shared libraries loaded at runtime. Both __sanitizer_cov_pcs_init and __sanitizer_cov_8bit_counters_init can safely be called from a shared library’s constructor (which runs when the library is loaded). So, Atheris simulates loading shared libraries! When tracing is initialized, Atheris first calls those functions with an array of 8-bit counters and completely made-up program counters. Then, whenever a new Python line is reached, Atheris allocates a PC and 8-bit counter to that line; Atheris will always report that line the same way from then on. Once Atheris runs out of PCs and 8-bit counters, it simply loads a new “shared library” by calling those functions again. Of course, exponential growth is used to ensure that the number of shared libraries doesn’t become excessive.
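The lazy allocation scheme described above can be simulated in pure Python. This is a toy model, not Atheris's C++ implementation. The hypothetical CounterRegions class gives each newly seen location (for example a (filename, line) pair) a permanent slot. When the current region is full, it "registers" a new region twice as large. Each region stands in for one fake shared library that would be announced through __sanitizer_cov_8bit_counters_init and __sanitizer_cov_pcs_init. As a simplification, counters are clamped at 255 to stay within 8 bits.

```python
from typing import Dict, Hashable, List, Tuple


class CounterRegions:
    """Toy model of lazily assigning 8-bit coverage counters to locations."""

    def __init__(self, initial_size: int = 4) -> None:
        self._next_size = initial_size
        self.regions: List[List[int]] = []
        self._slots: Dict[Hashable, Tuple[int, int]] = {}
        self._used_in_last = 0
        self._register_region()

    def _register_region(self) -> None:
        # Stand-in for "loading" a new fake shared library.
        self.regions.append([0] * self._next_size)
        self._used_in_last = 0
        self._next_size *= 2  # exponential growth keeps the region count low

    def hit(self, location: Hashable) -> None:
        """Record that a location (e.g. (filename, lineno)) was reached."""
        slot = self._slots.get(location)
        if slot is None:
            if self._used_in_last == len(self.regions[-1]):
                self._register_region()
            slot = (len(self.regions) - 1, self._used_in_last)
            self._used_in_last += 1
            self._slots[location] = slot  # same slot from then on
        region, index = slot
        counters = self.regions[region]
        counters[index] = min(counters[index] + 1, 255)  # clamp to 8 bits
```

Because each new region doubles in size, reaching n distinct locations needs only on the order of log2(n) regions.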

What's Special about Python 3.8+?

In the README, we advise users to use Python 3.8+ where possible. This is because Python 3.8 added a new feature: opcode tracing. Not only can we monitor when every line is visited and every function is called, but we can actually monitor every operation that Python performs, and what arguments it uses. This allows Atheris to find its way through if statements much better.

When a COMPARE_OP opcode is encountered, indicating a boolean comparison between two values, Atheris inspects the types of the values. If the values are bytes or Unicode, Atheris is able to report the comparison to libFuzzer via __sanitizer_weak_hook_memcmp. For integer comparison, Atheris uses the appropriate function to report integer comparisons, such as __sanitizer_cov_trace_cmp8.

In recent Python versions, a Unicode string is actually represented as an array of 1-byte, 2-byte, or 4-byte characters, based on the size of the largest character in the string. The obvious solution for coverage is to:

  • First, compare the two strings’ character sizes and report that as an integer comparison with __sanitizer_cov_trace_cmp8.
  • Second, if the sizes are equal, call __sanitizer_weak_hook_memcmp to report the actual string comparison.
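The two-step scheme above can be sketched in pure Python. For illustration, the sanitizer hooks are replaced by appending tuples to an events list; Atheris itself calls the real hooks from C++. char_width follows PEP 393, which uses 1, 2, or 4 bytes per character depending on the largest code point. raw_buffer approximates the matching raw storage with the latin-1, UTF-16-LE, and UTF-32-LE codecs, which matches the in-memory layout on little-endian platforms.

```python
from typing import List, Tuple

Event = Tuple[str, object, object]


def char_width(s: str) -> int:
    """Bytes per character CPython (PEP 393) would use for s."""
    largest = max(map(ord, s), default=0)
    if largest < 0x100:
        return 1
    if largest < 0x10000:
        return 2
    return 4


def raw_buffer(s: str, width: int) -> bytes:
    """Approximate the raw PEP 393 buffer for s at the given width."""
    encoding = {1: "latin-1", 2: "utf-16-le", 4: "utf-32-le"}[width]
    return s.encode(encoding, "surrogatepass")  # keep lone surrogates


def report_str_compare(a: str, b: str, events: List[Event]) -> None:
    """Two-step reporting, with the sanitizer hooks replaced by a list."""
    wa, wb = char_width(a), char_width(b)
    events.append(("cmp8", wa, wb))  # stand-in for __sanitizer_cov_trace_cmp8
    if wa == wb:
        # Stand-in for __sanitizer_weak_hook_memcmp on the raw buffers.
        events.append(("memcmp", raw_buffer(a, wa), raw_buffer(b, wb)))
```

In CPython, two equal strings always have the same width, so a width mismatch already decides the comparison. Reporting it as an integer comparison gives libFuzzer a signal to steer toward inputs of the right width before the byte-level comparison can help.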



Google Open Source Blog


Announcing the Atheris Python Fuzzer

Friday, December 4, 2020

By the Google Information Security team
