Update README.md
strfx authored Aug 27, 2021
1 parent 290f22c commit 71b1175
Showing 1 changed file with 54 additions and 3 deletions.
</h1>

<h4 align="center">Generate Code-Based <a href="https://virustotal.github.io/yara/" target="_blank">Yara</a> Rules using Machine Learning.</h4>



<p align="center">
<strong>.</strong>
<img width="650px" src="https://github.com/strfx/clava/blob/main/docs/cli.png?raw=true" alt="clava CLI"/>
</p>

# About

clava was developed during an industry project at Hochschule Luzern with the goal of automatically creating Yara rules from a given malware sample. Rules created with clava should **not** be used in production, but can assist during rule development. Since this project is heavily inspired by [yarGen](https://github.com/Neo23x0/yarGen), see also Florian Roth's [blog post](https://cyb3rops.medium.com/how-to-post-process-yara-rules-generated-by-yargen-121d29322282) on *"How to post-process YARA rules generated by yarGen"*.

We've kept the machine learning part intentionally rudimentary to demonstrate how much can be achieved with simple techniques: in this first iteration we used a simple logistic regression classifier trained on the term-frequency weights of mnemonic n-grams. See _Summary_ for a quick roundup of the research. As a next step, one could explore more sophisticated techniques to improve the results; see the section *Contribute* below for some ideas. If you are interested in the written report, feel free to contact me.
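
To make the idea above concrete, here is a minimal sketch of a logistic regression classifier trained on term-frequency weights of mnemonic n-grams. It is **not** clava's actual implementation: all function names (`mnemonic_ngrams`, `term_frequencies`, `train`, `predict`) are hypothetical, the mnemonic sequences are made-up toy data rather than real disassembly, and the classifier is a plain standard-library gradient descent loop rather than whatever library the project uses.

```python
import math
from collections import Counter


def mnemonic_ngrams(mnemonics, n=3):
    """Return all contiguous n-grams of a mnemonic sequence as tuples."""
    return [tuple(mnemonics[i:i + n]) for i in range(len(mnemonics) - n + 1)]


def term_frequencies(mnemonics, n=3):
    """Map each n-gram to its relative frequency within one sample."""
    grams = mnemonic_ngrams(mnemonics, n)
    if not grams:
        return {}
    counts = Counter(grams)
    return {gram: count / len(grams) for gram, count in counts.items()}


def predict(weights, bias, features):
    """Probability that a sample is malicious under the current model."""
    z = bias + sum(weights.get(g, 0.0) * v for g, v in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, labels, epochs=500, lr=1.0):
    """Fit logistic regression with plain stochastic gradient descent."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for features, label in zip(samples, labels):
            # Gradient of the log loss w.r.t. the linear score is (p - y).
            error = predict(weights, bias, features) - label
            bias -= lr * error
            for gram, value in features.items():
                weights[gram] = weights.get(gram, 0.0) - lr * error * value
    return weights, bias


# Toy, made-up mnemonic sequences (assumption); label 1 = malicious.
malicious = [["push", "mov", "xor", "xor", "call"], ["xor", "xor", "call", "jmp"]]
benign = [["push", "mov", "add", "ret"], ["mov", "add", "add", "ret"]]

samples = [term_frequencies(seq, n=2) for seq in malicious + benign]
labels = [1, 1, 0, 0]
weights, bias = train(samples, labels)

# N-grams with the largest positive weights are the most "malicious-looking".
top = sorted(weights, key=weights.get, reverse=True)[:3]
```

In a sketch like this, the n-grams in `top` would be the natural candidates to turn into rule conditions, since they are the ones the model associates most strongly with the malicious class.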

clava was heavily inspired by these projects:

* [yarGen](https://github.com/Neo23x0/yarGen)
* [yara-signator](https://github.com/fxb-cocacoding/yara-signator)
* [binsequencer](https://github.com/karttoon/binsequencer/)
* [yabin](https://github.com/AlienVault-OTX/yabin)

**Note:** At the moment, the models are not public. However, you can easily train a model on your own dataset. Instructions will follow.


# Getting Started

To install `clava`, clone this repository and run:

```sh
$ python setup.py install
```

clava offers a simple CLI. To list all available options, run:

```sh
$ clava -h
```

To generate a Yara rule based on a sample:

```sh
$ clava yara <path/to/sample>
```


# Development

During development, I recommend installing `clava` in editable mode:

```sh
$ pip install -e ".[dev]"
```

## Running the tests

clava uses [pytest](https://docs.pytest.org/en/6.2.x/). To run the test suite with a set of predefined settings, run:

```sh
$ make tests
```

Alternatively, you can run pytest against the `tests/` directory with your own settings.
