Introduction: Why YARA?

YARA (Yet Another Ridiculous Acronym) is the industry-standard tool for pattern-based malware detection. Originally designed by Victor Alvarez at VirusTotal, it lets analysts write rules that describe malware families using strings, binary patterns, and boolean logic.

It is used daily by SOC teams, DFIR responders, threat hunters, and red/blue teams worldwide.



1. Anatomy of a YARA Rule

rule RuleName : tag1 tag2 {
    meta:
        author      = "Analyst"
        date        = "2024-01-01"
        description = "Detects something sketchy"
        severity    = "high"
        reference   = "https://mlab.sh"

    strings:
        $s1 = "MaliciousString" nocase
        $s2 = { 4D 5A 90 00 03 00 00 00 }  // MZ header
        $re1 = /https?:\/\/[a-z0-9]{8,}\.onion/

    condition:
        uint16(0) == 0x5A4D and
        filesize < 2MB and
        ($s1 or $re1) and
        #s2 >= 1
}

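The parts of a rule:

  • meta: free-form metadata (not evaluated at scan time)
  • strings: pattern declarations (optional if the condition uses none)
  • condition: mandatory boolean expression that decides whether the rule matches
  • tags: optional labels after the rule name, used for categorization and filtering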


2. String Types


Text strings

$plain   = "cmd.exe"                  // exact, case-sensitive
$nocase  = "powershell" nocase        // case-insensitive
$wide    = "explorer" wide            // UTF-16LE (Windows internals)
$both    = "malware" wide ascii       // both encodings
$full    = "exactmatch" fullword      // whole word only, no substrings
$xored   = "payload" xor             // every single-byte XOR key, 0x00-0xFF
$xored2  = "payload" xor(0x01-0x0F)  // restricted XOR range
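
To make the wide and xor modifiers concrete, the following Python sketch (standard library only, not part of YARA) builds the byte sequences those modifiers search for. The helper names wide and xor_variants are illustrative, and the sketch assumes plain ASCII input.

```python
def wide(text):
    """Bytes matched by `wide`: every ASCII byte followed by 0x00 (UTF-16LE)."""
    return text.encode("utf-16-le")


def xor_variants(text, lo=0x00, hi=0xFF):
    """Map each single-byte key in [lo, hi] to the XOR-encoded form of text."""
    raw = text.encode("ascii")
    return {key: bytes(b ^ key for b in raw) for key in range(lo, hi + 1)}


print(wide("explorer"))                # the pattern behind `"explorer" wide`
restricted = xor_variants("payload", 0x01, 0x0F)
print(len(restricted))                 # one candidate per key in xor(0x01-0x0F)
print(restricted[0x01])                # "payload" encoded with key 0x01
```

Each extra key is another byte sequence the scanner has to look for, which is why restricting the range with xor(0x01-0x0F) can be worthwhile.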

Hex strings (binary patterns)

$mz    = { 4D 5A }                        // MZ header
$wild  = { 4D ?? 5A }                     // wildcard byte
$range = { 4D [2-4] 5A }                  // 2 to 4 arbitrary bytes
$jump  = { 4D [0-] 5A }                   // unlimited jump
$alt   = { (4D|4E) 5A }                   // byte alternative
$mask  = { 4? 5A }                        // nibble wildcard
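
One way to reason about hex-string syntax is to translate it into a byte-level regular expression. The Python sketch below does this for the tokens shown above (full bytes, ??, nibble wildcards, jumps, and alternatives). The function hex_pattern_to_regex is a hypothetical helper written for this illustration, it is not how YARA matches internally, and it covers only this subset of the syntax.

```python
import re


def hex_pattern_to_regex(pattern):
    """Translate a subset of YARA hex-string syntax into a compiled bytes regex."""
    spaced = pattern.replace("(", " ( ").replace(")", " ) ").replace("|", " | ")
    parts = []
    for tok in spaced.split():
        if tok == "(":
            parts.append(b"(?:")
        elif tok in ("|", ")"):
            parts.append(tok.encode())
        elif tok.startswith("["):            # jump: [n], [lo-hi] or [lo-]
            lo, dash, hi = tok[1:-1].partition("-")
            if dash:
                parts.append(b".{%d,%s}" % (int(lo or 0), hi.encode()))
            else:
                parts.append(b".{%d}" % int(lo))
        elif tok == "??":                    # any byte
            parts.append(b".")
        elif "?" in tok:                     # nibble wildcard such as 4? or ?A
            values = [
                v for v in range(256)
                if all(c in ("?", h) for c, h in zip(tok.upper(), "%02X" % v))
            ]
            parts.append(b"[" + b"".join(re.escape(bytes([v])) for v in values) + b"]")
        else:                                # literal byte
            parts.append(re.escape(bytes([int(tok, 16)])))
    return re.compile(b"".join(parts), re.DOTALL)


sample = b"\x00\x4d\x01\x02\x03\x5a\x00"
print(bool(hex_pattern_to_regex("4D [2-4] 5A").search(sample)))  # three bytes between 4D and 5A
print(bool(hex_pattern_to_regex("4? 5A").search(b"\x4f\x5a")))   # high nibble fixed to 4
```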

Regular expressions

$re1 = /[a-f0-9]{32}/               // MD5 hash pattern
$re2 = /https?:\/\/\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}/  // IP in URL
$re3 = /[A-Za-z0-9+\/]{40,}={0,2}/  // base64 blob
$re4 = /CreateRemoteThread|VirtualAllocEx|WriteProcessMemory/i


3. Advanced Conditions


Core operators

condition:
    $s1 and $s2           // both present
    $s1 or $s2            // at least one
    not $s1               // absent
    $s1 and not $s2
    any of ($s*)          // any string matching $s*
    all of ($s*)          // every string matching $s*
    2 of ($s1, $s2, $s3)  // at least 2 of the 3
    any of them           // any declared string

Cardinality and offsets

condition:
    #s1 >= 3              // $s1 appears at least 3 times
    #s1 == 1              // exactly once
    @s1 < 0x1000          // first occurrence within the first 4KB
    @s1[2] > 0x500        // 2nd occurrence past offset 0x500
    !s1 > 100             // match length > 100 bytes
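
The three operators above map onto simple facts about a list of matches: how many there are (#), where each one starts (@), and how long each one is (!). The Python sketch below reproduces them for a literal pattern. The occurrences helper is illustrative only; note that re.finditer reports non-overlapping matches, which is a simplification of what YARA does.

```python
import re


def occurrences(data, literal):
    """Return (offset, length) for each non-overlapping match of a literal."""
    return [(m.start(), m.end() - m.start())
            for m in re.finditer(re.escape(literal), data)]


data = b"xxABCxxxxABCDxxABC"
hits = occurrences(data, b"ABC")

count = len(hits)            # like #s1
first_offset = hits[0][0]    # like @s1 (same as @s1[1]; indexes are 1-based)
second_offset = hits[1][0]   # like @s1[2]
first_length = hits[0][1]    # like !s1

print(count, first_offset, second_offset, first_length)  # 3 2 9 3
```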

File inspection functions

condition:
    filesize > 10KB and filesize < 5MB
    uint8(0) == 0x4D                    // first byte = 'M'
    uint16(0) == 0x5A4D                 // little-endian 'MZ'
    uint32(0) == 0x464C457F             // ELF magic
    uint32be(0) == 0xCAFEBABE           // Java class file
    int32(filesize - 4) == 0x41424344  // last 4 bytes check
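
The byte order of these functions is a frequent source of mistakes: uint16 and uint32 read little-endian, while the be variants read big-endian. A minimal Python equivalent using struct (the function names simply mirror the YARA ones for readability) shows why the file bytes "MZ" compare equal to 0x5A4D:

```python
import struct


def uint16(data, offset):
    return struct.unpack_from("<H", data, offset)[0]   # little-endian, unsigned


def uint32(data, offset):
    return struct.unpack_from("<I", data, offset)[0]   # little-endian, unsigned


def uint32be(data, offset):
    return struct.unpack_from(">I", data, offset)[0]   # big-endian, unsigned


def int32(data, offset):
    return struct.unpack_from("<i", data, offset)[0]   # little-endian, signed


print(hex(uint16(b"MZ\x90\x00", 0)))                   # 0x5a4d
print(hex(uint32(b"\x7fELF", 0)))                      # 0x464c457f
print(hex(uint32be(bytes.fromhex("cafebabe"), 0)))     # 0xcafebabe

tail = b"....DCBA"
print(hex(int32(tail, len(tail) - 4)))                 # 0x41424344
```

The last line shows that a check such as int32(filesize - 4) == 0x41424344 matches a file ending in the bytes "DCBA", not "ABCD".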


4. YARA Modules


PE module — Windows binary analysis

import "pe"
import "math"

rule Packed_PE_Suspicious {
    meta:
        description = "PE with suspicious sections and few imports"

    condition:
        pe.is_pe and
        pe.number_of_sections > 8 and
        pe.imports("kernel32.dll", "VirtualAlloc") and
        pe.imports("kernel32.dll", "WriteProcessMemory") and
        pe.number_of_imports < 5 and
        for any section in pe.sections : (
            section.name == ".text" and
            math.entropy(section.raw_data_offset, section.raw_data_size) > 7.0
        )
}
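
import "pe"

rule Signed_But_Suspicious {
    meta:
        description = "Signed PE whose certificate was not valid at build time"

    condition:
        pe.number_of_signatures > 0 and
        pe.timestamp > 1577836800 and  // built after 2020-01-01 UTC
        for any i in (0..pe.number_of_signatures - 1) : (
            // e.g. an expired certificate; YARA cannot check revocation
            not pe.signatures[i].valid_on(pe.timestamp)
        )
}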
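
pe.timestamp is a Unix epoch value, so thresholds such as 1577836800 have to be computed rather than typed from memory. A small Python sketch for converting in both directions (the helper names are illustrative, and UTC is assumed):

```python
from datetime import datetime, timezone


def epoch_utc(year, month, day):
    """Unix timestamp for midnight UTC on the given date."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


def from_epoch(ts):
    """ISO 8601 string (UTC) for a Unix timestamp."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).isoformat()


print(epoch_utc(2020, 1, 1))     # 1577836800
print(from_epoch(1577836800))    # 2020-01-01T00:00:00+00:00
```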

Math module — entropy and statistics

import "math"

rule High_Entropy_Section {
    meta:
        description = "High whole-file entropy, typical of packed or encrypted payloads (UPX, custom packer)"

    condition:
        filesize > 50KB and
        math.entropy(0, filesize) > 7.2
}
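
For intuition about thresholds such as 7.0 or 7.2, the following Python sketch computes Shannon entropy in bits per byte, which is the quantity math.entropy is documented to return. It is a standalone illustration, not YARA's implementation.

```python
import math
import os
from collections import Counter


def shannon_entropy(data):
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(data).values())


print(shannon_entropy(b"\x00" * 4096))             # constant data: no entropy
print(shannon_entropy(bytes(range(256)) * 16))     # uniform bytes: the 8.0 maximum
print(shannon_entropy(os.urandom(65536)) > 7.2)    # random data looks "packed"
print(shannon_entropy(b"The quick brown fox jumps over the lazy dog" * 100) < 7.2)
```

Plain text and ordinary compiled code usually sit well below the threshold, while compressed or encrypted data approaches 8.0, which is why whole-file entropy works as a coarse packer indicator.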

Hash module — fingerprinting

import "hash"

rule Known_Dropper_Hash {
    condition:
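        // Placeholder values: the MD5 below is the digest of empty input.
        // Substitute the lowercase hex digests of the real sample.
        hash.md5(0, filesize) == "d41d8cd98f00b204e9800998ecf8427e" or
        hash.sha256(0, filesize) == "e3b0c44298fc1c149afb...0655"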
}
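
The values compared in a hash rule are lowercase hex digests, which can be produced with Python's hashlib. The sketch below uses a hypothetical digests helper; it also shows that the MD5 string used in Known_Dropper_Hash is what hashing zero bytes yields, so it should be read as a placeholder rather than a real indicator.

```python
import hashlib


def digests(data):
    """Lowercase hex digests in the form the hash module compares against."""
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


print(digests(b"")["md5"])        # d41d8cd98f00b204e9800998ecf8427e
print(digests(b"MZ")["sha256"])   # 64 hex characters, ready to paste into a rule
```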

ELF module — Linux / Android

import "elf"

rule ELF_Mirai_Variant {
    meta:
        description = "Mirai variant targeting IoT devices"

    strings:
        $s1 = "LCOGQGPTGP" // "nameserver" XORed with 0x22 (Mirai string table)
        $s2 = "/proc/net/tcp"
        $s3 = "router"

    condition:
        elf.type == elf.ET_EXEC and
        elf.machine == elf.EM_ARM and
        2 of ($s*)
}
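
Strings like $s1 come from single-byte XOR obfuscation, which is easy to reproduce when building or checking a rule. The Python sketch below assumes the key 0x22 that is commonly reported for Mirai's string table (treat the key as an assumption to verify against your own sample); xor_bytes is an illustrative helper.

```python
def xor_bytes(data, key):
    """Apply a single-byte XOR key to every byte (encoding and decoding are the same)."""
    return bytes(b ^ key for b in data)


encoded = xor_bytes(b"nameserver", 0x22)
print(encoded)                      # b'LCOGQGPTGP'
print(xor_bytes(encoded, 0x22))     # b'nameserver'
```

Running a suspected obfuscated string back through xor_bytes is a quick sanity check that the literal in a rule decodes to something meaningful.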


5. Advanced Rules with Loops


Iterating over PE sections

import "pe"

rule PE_Injector_Classic {
    meta:
        description = "Classic process injection pattern"

    strings:
        $api1 = "OpenProcess" nocase
        $api2 = "VirtualAllocEx" nocase
        $api3 = "WriteProcessMemory" nocase
        $api4 = "CreateRemoteThread" nocase

    condition:
        pe.is_pe and
        3 of ($api*) and
        pe.imports("kernel32.dll") and
        for any i in (0..pe.number_of_sections - 1) : (
            pe.sections[i].name == ".text" and
            pe.sections[i].virtual_size > pe.sections[i].raw_data_size * 2
        )
}

Polymorphic shellcode detection

rule Shellcode_Decoder_Stub {
    meta:
        description = "Generic XOR decoder stub"

    strings:
        // x86 XOR loop patterns
        $x86_xor1 = { 8A ?? ?? 34 ?? 88 ?? ?? 4? 75 F? }
        $x86_xor2 = { 30 ?? ?? 4? 83 ?? ?? 75 F? }
        // x64 XOR loop patterns
        $x64_xor1 = { 48 8B ?? ?? 48 33 ?? 48 89 ?? ?? 48 FF ?? }

    condition:
        filesize < 10KB and
        any of ($x86_xor*, $x64_xor*)
}


6. Real-World Examples by Malware Family


Ransomware — generic detection

import "pe"

rule Ransomware_Generic_Behavior {
    meta:
        description = "Ransomware behavior: file enumeration + crypto APIs"
        severity     = "critical"
        tags         = "ransomware"

    strings:
        // Encryption APIs
        $crypt1 = "CryptEncrypt" nocase
        $crypt2 = "CryptGenKey" nocase
        $crypt3 = "BCryptEncrypt" nocase
        $crypt4 = "CryptAcquireContext" nocase

        // Shadow copy deletion
        $shadow1 = "vssadmin" nocase
        $shadow2 = "delete shadows" nocase
        $shadow3 = "wbadmin delete" nocase
        $shadow4 = { 76 73 73 61 64 6D 69 6E } // "vssadmin" hex

        // Target file extensions
        $ext1 = ".docx" wide ascii
        $ext2 = ".xlsx" wide ascii
        $ext3 = ".pdf" wide ascii

        // Ransom note indicators
        $note1 = "bitcoin" nocase
        $note2 = "decrypt" nocase wide ascii
        $note3 = "your files" nocase wide ascii

    condition:
        pe.is_pe and
        2 of ($crypt*) and
        (1 of ($shadow*) or 3 of ($ext*)) and
        any of ($note*)
}

Cobalt Strike Beacon

rule CobaltStrike_Beacon_Config {
    meta:
        description = "Cobalt Strike Beacon — in-memory config block"
        author      = "Mlab Team"
        severity    = "critical"

    strings:
        // Config block magic bytes
        $cfg_start = { 00 00 00 BE EF }
        // Checksum 0x5d3c8f39 (XORed default key)
        $xor_key   = { 69 68 68 68 }
        // Default trial watermark
        $watermark = { 00 00 CA FE }
        // Default named pipes
        $pipe1 = "\\\\.\\pipe\\msagent_" wide
        $pipe2 = "\\MSSE-" wide

    condition:
        // uint32be reads bytes 00 00 BE EF, the same byte order as $cfg_start
        ((uint32be(0) == 0x0000BEEF or $cfg_start) and any of ($pipe*)) or
        ($xor_key and $watermark and filesize < 500KB)
}

Mimikatz

rule Mimikatz_Generic {
    meta:
        description = "Mimikatz or derivative tool"
        severity    = "critical"

    strings:
        $s1  = "sekurlsa::logonpasswords" nocase
        $s2  = "lsadump::dcsync" nocase
        $s3  = "kerberos::golden" nocase
        $s4  = "privilege::debug" nocase
        $s5  = "mimikatz" nocase wide ascii
        $s6  = "Benjamin DELPY" nocase
        $api1 = "LsaIQueryInformationPolicyTrusted" nocase
        $api2 = "SamIGetPrivateData" nocase

    condition:
        2 of ($s*) or
        any of ($api*)
}


7. Rules for Malicious Documents


Office VBA macro dropper

rule Office_Macro_Dropper {
    meta:
        description = "Office document with dropper macro"

    strings:
        // OLE magic
        $ole = { D0 CF 11 E0 A1 B1 1A E1 }
        // VBA module markers
        $vba1 = "VBA7" wide
        $vba2 = "ThisDocument" wide
        // Suspicious execution techniques
        $exec1 = "Shell" nocase wide
        $exec2 = "WScript.Shell" nocase wide
        $exec3 = "CreateObject" nocase wide
        $exec4 = "PowerShell" nocase wide
        $exec5 = "cmd.exe" nocase wide
        // Download functions
        $dl1   = "XMLHTTP" nocase wide
        $dl2   = "WinHttp" nocase wide
        $dl3   = "URLDownloadToFile" nocase
        // Basic obfuscation
        $obf1  = "Chr(" nocase
        $obf2  = "StrReverse" nocase

    condition:
        $ole at 0 and
        any of ($vba*) and
        2 of ($exec*) and
        (any of ($dl*) or all of ($obf*))
}

PDF with malicious JavaScript

rule PDF_Malicious_JavaScript {
    meta:
        description = "PDF with obfuscated JS and potential shellcode"

    strings:
        $pdf_hdr    = { 25 50 44 46 }  // %PDF
        $js1        = "/JavaScript" nocase
        $js2        = "/JS" nocase
        $launch     = "/Launch"
        $openaction = "/OpenAction"
        $exploit1   = "util.printf" nocase
        $exploit2   = "getAnnots" nocase
        $exploit3   = "getIcon" nocase
        $obf        = "unescape" nocase
        $heap       = "%u0c0c%u0c0c"

    condition:
        $pdf_hdr at 0 and
        ($js1 or $js2) and
        ($openaction or $launch) and
        (any of ($exploit*) or $obf or $heap)
}


8. Performance and Optimization


Golden rules

Place the cheapest, most selective checks first so that boolean short-circuit evaluation can skip expensive ones such as entropy or hashing.

// BAD - evaluates entropy on every single file
condition:
    math.entropy(0, filesize) > 7.0 and pe.is_pe

// GOOD - pe.is_pe filters out 99% of non-PE files first
condition:
    pe.is_pe and math.entropy(0, filesize) > 7.0
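
The effect of ordering can be imitated in plain Python, whose and operator short-circuits in the same left-to-right way. The sketch below counts how often an expensive check runs under each ordering; is_pe and the entropy check are simplified stand-ins written for this example, not YARA's own code.

```python
import math
from collections import Counter


def is_pe(data):
    """Cheap stand-in for pe.is_pe: only looks at the first two bytes."""
    return data[:2] == b"MZ"


def count_expensive_calls(samples, cheap_first):
    """Evaluate both orderings and return how often the entropy check ran."""
    calls = 0

    def high_entropy(data):
        nonlocal calls
        calls += 1
        total = len(data)
        ent = -sum((n / total) * math.log2(n / total)
                   for n in Counter(data).values())
        return ent > 7.0

    for data in samples:
        if cheap_first:
            is_pe(data) and high_entropy(data)    # entropy skipped for non-PE
        else:
            high_entropy(data) and is_pe(data)    # entropy computed for every file
    return calls


samples = [b"MZ" + bytes(range(256)), b"%PDF-1.7", b"\x7fELF\x01", b"PK\x03\x04"]
print(count_expensive_calls(samples, cheap_first=False))  # 4
print(count_expensive_calls(samples, cheap_first=True))   # 1
```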

Optimization checklist

  • Use filesize as the very first filter (near-zero cost)
  • Check magic bytes early: uint16(0) == 0x5A4D
  • Avoid complex regexes on large files without a pre-filter
  • Prefer hex strings over text strings for binary patterns
  • Avoid any of them — be specific with any of ($s*)
  • Benchmark with yara -p 4 (4 threads) on a large corpus


9. Integration and Tooling


Command line

# Basic scan
yara rules.yar /path/to/suspect/

# Recursive scan
yara -r rules.yar /malware/corpus/

# Show matched strings
yara -s rules.yar sample.exe

# Show metadata
yara -m rules.yar sample.exe

# Multi-rule, multi-threaded
yara -p 8 rules/*.yar /corpus/

# Scan from stdin
cat sample.bin | yara rules.yar -

# Abort scanning after 30 seconds (guards against runaway regexes)
yara --timeout=30 rules.yar /scan/

YARA with Python (yara-python)

import psutil  # third-party: pip install psutil
import yara    # yara-python 4.3+ (StringMatch / instances API)

# Compile rules
rules = yara.compile(filepath='rules.yar')

# Scan a file
matches = rules.match('/tmp/sample.exe')
for match in matches:
    print(f"[!] {match.rule} (tags: {match.tags})")
    for string in match.strings:
        # Offsets and matched bytes live on each instance, not on the string
        for inst in string.instances:
            print(f"    @ {hex(inst.offset)}: {string.identifier} = {inst.matched_data[:64]}")

# Scan live processes (DFIR); usually needs elevated privileges
for proc in psutil.process_iter(['pid', 'name']):
    try:
        matches = rules.match(pid=proc.info['pid'])
    except yara.Error:
        continue  # process exited or access was denied
    if matches:
        print(f"ALERT: PID {proc.info['pid']} ({proc.info['name']}) matches {[m.rule for m in matches]}")

MISP / OpenCTI integration

from pymisp import PyMISP, MISPEvent

api = PyMISP("https://misp.corp/", "API_KEY")
event = MISPEvent()
event.info = "YARA rule: Cobalt Strike Beacon"  # MISP requires an event title
with open('rule.yar') as fh:
    event.add_attribute('yara', value=fh.read(),
                        comment='Cobalt Strike Beacon v4.x')
api.add_event(event)


10. Resources and Rule Repositories

  • Mlab.sh: threat intelligence platform with built-in YARA scanning
  • YARA GitHub: github.com/VirusTotal/yara (official source and documentation)
  • Awesome YARA: github.com/InQuest/awesome-yara
  • YARAify: yaraify.abuse.ch (community rules)
  • Elastic Security: rules.elastic.co
  • Mandiant / Google: github.com/mandiant/red_team_tool_countermeasures