YARA Rules: The Complete Guide for Threat Hunters
Master YARA from zero to hero: full syntax, advanced conditions, modules, optimization and real-world examples to detect malware, ransomware and APTs. A no-nonsense technical guide.
Introduction: Why YARA?
YARA (Yet Another Ridiculous Acronym) is the industry-standard tool for pattern-based malware detection. Originally designed by Victor Alvarez at VirusTotal, it lets analysts write rules that describe malware families using strings, binary patterns, and boolean logic.
It is used daily by SOC teams, DFIR responders, threat hunters, and red/blue teams worldwide.
1. Anatomy of a YARA Rule
rule RuleName : tag1 tag2 {
meta:
author = "Analyst"
date = "2024-01-01"
description = "Detects something sketchy"
severity = "high"
reference = "https://mlab.sh"
strings:
$s1 = "MaliciousString" nocase
$s2 = { 4D 5A 90 00 03 00 00 00 } // MZ header
$re1 = /https?:\/\/[a-z0-9]{8,}\.onion/
condition:
uint16(0) == 0x5A4D and
filesize < 2MB and
($s1 or $re1) and
#s2 >= 1
}
The 4 sections:
- meta: free-form metadata (not evaluated at scan time)
- strings: pattern declarations
- condition: mandatory boolean logic
- Tags: optional categorization
2. String Types
Text strings
$plain = "cmd.exe" // exact, case-sensitive
$nocase = "powershell" nocase // case-insensitive
$wide = "explorer" wide // UTF-16LE (Windows internals)
$both = "malware" wide ascii // both encodings
$full = "exactmatch" fullword // whole word only, no substrings
$xored = "payload" xor // all single-byte XOR keys 0x00-0xFF
$xored2 = "payload" xor(0x01-0x0F) // restricted XOR range
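To make the modifiers concrete, the Python sketch below builds the byte sequences that wide and xor would search for. The helper names wide_bytes and xor_variants are illustrative, not part of any YARA API, and the code only approximates the matching semantics for ASCII input.

```python
def wide_bytes(text: str) -> bytes:
    """Bytes targeted by the `wide` modifier: each ASCII char followed by 0x00."""
    return text.encode("utf-16-le")


def xor_variants(text: str, low: int = 0x00, high: int = 0xFF) -> dict[int, bytes]:
    """Map every single-byte key in [low, high] to the XOR-encoded form of text."""
    raw = text.encode("ascii")
    return {key: bytes(b ^ key for b in raw) for key in range(low, high + 1)}


if __name__ == "__main__":
    print(wide_bytes("explorer"))
    restricted = xor_variants("payload", 0x01, 0x0F)  # mirrors xor(0x01-0x0F)
    print(len(restricted), restricted[0x01])
```

Each extra key is one more variant the scanner has to consider, which is why a restricted range is usually preferable when the key space is known.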
Hex strings (binary patterns)
$mz = { 4D 5A } // MZ header
$wild = { 4D ?? 5A } // wildcard byte
$range = { 4D [2-4] 5A } // 2 to 4 arbitrary bytes
$jump = { 4D [0-] 5A } // unlimited jump
$alt = { (4D|4E) 5A } // byte alternative
$mask = { 4? 5A } // nibble wildcard
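One way to reason about hex-string syntax is to translate it into a byte-level regular expression. The sketch below handles only the constructs listed above (plain bytes, ??, nibble wildcards, jumps, alternatives). The function name hex_pattern_to_regex is hypothetical, and Python's greedy regex engine is not guaranteed to pick the same match boundaries as YARA.

```python
import re


def hex_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a small subset of YARA hex-string syntax into a bytes regex."""
    # Pad grouping characters so split() yields them as separate tokens.
    for ch in "()|":
        pattern = pattern.replace(ch, f" {ch} ")
    parts = []
    for tok in pattern.split():
        if tok in ("(", ")", "|"):
            parts.append(tok)
        elif tok == "??":                      # any byte
            parts.append(".")
        elif tok.startswith("["):              # jump: [n], [n-m] or [n-]
            low, dash, high = tok[1:-1].partition("-")
            parts.append(f".{{{low},{high}}}" if dash else f".{{{low}}}")
        elif tok[1] == "?":                    # high nibble fixed, e.g. 4?
            base = int(tok[0], 16) << 4
            parts.append("[\\x%02x-\\x%02x]" % (base, base | 0x0F))
        elif tok[0] == "?":                    # low nibble fixed, e.g. ?A
            nib = int(tok[1], 16)
            members = "".join("\\x%02x" % ((h << 4) | nib) for h in range(16))
            parts.append(f"[{members}]")
        else:                                  # literal byte
            parts.append("\\x%02x" % int(tok, 16))
    return re.compile("".join(parts).encode("ascii"), re.DOTALL)


if __name__ == "__main__":
    rx = hex_pattern_to_regex("4D [2-4] 5A")
    print(bool(rx.search(b"\x4d\x00\x00\x5a")))
```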
Regular expressions
$re1 = /[a-f0-9]{32}/ // MD5 hash pattern
$re2 = /https?:\/\/\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}/ // IP in URL
$re3 = /[A-Za-z0-9+\/]{40,}={0,2}/ // base64 blob
$re4 = /CreateRemoteThread|VirtualAllocEx|WriteProcessMemory/i
3. Advanced Conditions
Core operators
condition:
$s1 and $s2 // both present
$s1 or $s2 // at least one
not $s1 // absent
$s1 and not $s2
any of ($s*) // any string matching $s*
all of ($s*) // every string matching $s*
2 of ($s1, $s2, $s3) // at least 2 of the 3
any of them // any declared string
Cardinality and offsets
condition:
#s1 >= 3 // $s1 appears at least 3 times
#s1 == 1 // exactly once
@s1 < 0x1000 // first occurrence within the first 4KB
@s1[2] > 0x500 // 2nd occurrence past offset 0x500
!s1 > 100 // match length > 100 bytes
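The three operators can be pictured as views over one list of match positions. The Python sketch below mimics them for a fixed byte string. The occurrences helper and the sample buffer are made up for illustration, and this is not how YARA implements matching internally.

```python
def occurrences(data: bytes, needle: bytes) -> list[int]:
    """Offsets of every occurrence of needle in data, overlapping ones included."""
    offsets = []
    start = data.find(needle)
    while start != -1:
        offsets.append(start)
        start = data.find(needle, start + 1)
    return offsets


data = b"..cmd.exe....cmd.exe"
needle = b"cmd.exe"
hits = occurrences(data, needle)

count = len(hits)           # plays the role of #s1
first_offset = hits[0]      # plays the role of @s1 (same as @s1[1])
second_offset = hits[1]     # plays the role of @s1[2]; YARA indexes from 1
first_length = len(needle)  # plays the role of !s1 for a fixed-length string

print(count, first_offset, second_offset, first_length)
```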
File inspection functions
condition:
filesize > 10KB and filesize < 5MB
uint8(0) == 0x4D // first byte = 'M'
uint16(0) == 0x5A4D // little-endian 'MZ'
uint32(0) == 0x464C457F // ELF magic
uint32be(0) == 0xCAFEBABE // Java class file
int32(filesize - 4) == 0x41424344 // last 4 bytes check
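The byte-order detail is easy to get wrong: uint16 and uint32 read little-endian, the be variants read big-endian. A short Python sketch with the standard struct module shows why "MZ" compares against 0x5A4D. The function names mirror the YARA ones only for readability.

```python
import struct


def uint16(data: bytes, offset: int) -> int:
    """Little-endian unsigned 16-bit read."""
    return struct.unpack_from("<H", data, offset)[0]


def uint32(data: bytes, offset: int) -> int:
    """Little-endian unsigned 32-bit read."""
    return struct.unpack_from("<I", data, offset)[0]


def uint32be(data: bytes, offset: int) -> int:
    """Big-endian unsigned 32-bit read."""
    return struct.unpack_from(">I", data, offset)[0]


if __name__ == "__main__":
    print(hex(uint16(b"MZ\x90\x00", 0)))                  # bytes 4D 5A
    print(hex(uint32(b"\x7fELF", 0)))                     # bytes 7F 45 4C 46
    print(hex(uint32be(bytes.fromhex("cafebabe"), 0)))    # bytes CA FE BA BE
```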
4. YARA Modules
PE module — Windows binary analysis
import "pe"
import "math"
rule Packed_PE_Suspicious {
meta:
description = "PE with suspicious sections and few imports"
condition:
pe.is_pe and
pe.number_of_sections > 8 and
pe.imports("kernel32.dll", "VirtualAlloc") and
pe.imports("kernel32.dll", "WriteProcessMemory") and
pe.number_of_imports < 5 and
for any section in pe.sections : (
section.name == ".text" and
math.entropy(section.raw_data_offset, section.raw_data_size) > 7.0
)
}
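import "pe"
rule Signed_But_Suspicious {
meta:
description = "Signed PE whose certificate was not valid at build time"
condition:
pe.number_of_signatures > 0 and
pe.timestamp > 1577836800 and // after 2020-01-01
// YARA cannot query revocation status; it can only compare validity dates
for any i in (0..pe.number_of_signatures - 1) : (
not pe.signatures[i].valid_on(pe.timestamp)
)
}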
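pe.timestamp is a Unix epoch value, so the constant in the rule has to be computed in UTC. A small Python sketch for converting in both directions (the helper name epoch_utc is illustrative):

```python
from datetime import datetime, timezone


def epoch_utc(year: int, month: int, day: int) -> int:
    """Seconds since 1970-01-01 UTC for midnight on the given date."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


print(epoch_utc(2020, 1, 1))  # 1577836800
print(datetime.fromtimestamp(1577836800, tz=timezone.utc).isoformat())
```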
Math module — entropy and statistics
import "math"
rule High_Entropy_Section {
meta:
description = "High whole-file entropy, typical of packed or encrypted content (UPX, custom packer)"
condition:
filesize > 50KB and
math.entropy(0, filesize) > 7.2
}
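For intuition about the thresholds, here is a minimal Python sketch of Shannon entropy in bits per byte, the quantity math.entropy reports. The result ranges from 0.0 (one repeated byte) to 8.0 (all 256 byte values equally frequent). The function name is an assumption for this example.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of data in bits per byte (0.0 for empty input)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())


if __name__ == "__main__":
    print(shannon_entropy(b"A" * 1024))            # constant data
    print(shannon_entropy(bytes(range(256)) * 4))  # uniform byte distribution
```

Compressed or encrypted data tends to sit close to the upper bound, while plain text and ordinary machine code score noticeably lower.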
Hash module — fingerprinting
import "hash"
rule Known_Dropper_Hash {
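condition:
// Placeholder digests (this MD5 is the digest of empty input); substitute real sample hashes in lowercase hex
hash.md5(0, filesize) == "d41d8cd98f00b204e9800998ecf8427e" or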
hash.sha256(0, filesize) == "e3b0c44298fc1c149afb...0655"
}
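The digests for such a rule have to come from the sample itself. The Python sketch below computes both values with hashlib in lowercase hex, the form the hash module compares against. The stream_digests helper and the commented-out path are placeholders.

```python
import hashlib
import io
from typing import BinaryIO


def stream_digests(stream: BinaryIO, chunk_size: int = 1 << 20) -> dict[str, str]:
    """Return lowercase hex MD5 and SHA-256 digests of a binary stream."""
    md5, sha256 = hashlib.md5(), hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        md5.update(chunk)
        sha256.update(chunk)
    return {"md5": md5.hexdigest(), "sha256": sha256.hexdigest()}


# Empty input produces the MD5 shown in the rule above, so that value is
# only a placeholder and not a usable indicator.
empty = stream_digests(io.BytesIO(b""))
print(empty["md5"])  # d41d8cd98f00b204e9800998ecf8427e

# For a real sample, pass an open file instead (hypothetical path):
# with open("/tmp/sample.exe", "rb") as fh:
#     print(stream_digests(fh))
```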
ELF module — Linux / Android
import "elf"
rule ELF_Mirai_Variant {
meta:
description = "Mirai variant targeting IoT devices"
strings:
$s1 = "LCOGQGPTGR" // common XOR-obfuscated strings
$s2 = "/proc/net/tcp"
$s3 = "router"
condition:
elf.type == elf.ET_EXEC and
elf.machine == elf.EM_ARM and
2 of ($s*)
}
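The two header fields tested in the condition sit at fixed offsets right after the 16-byte ELF identification block. As a rough illustration of what the module reads, this Python sketch pulls e_type and e_machine out of a header. The constants follow the ELF specification (ET_EXEC = 2, EM_ARM = 40), and the function name and synthetic header are invented for the example.

```python
import struct
from typing import Optional, Tuple

ET_EXEC = 2   # executable file
EM_ARM = 40   # 32-bit ARM


def elf_type_and_machine(header: bytes) -> Optional[Tuple[int, int]]:
    """Return (e_type, e_machine) from an ELF header, or None if not ELF."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        return None
    # EI_DATA (offset 5): 1 = little-endian, 2 = big-endian.
    endian = "<" if header[5] == 1 else ">"
    return struct.unpack_from(endian + "HH", header, 16)


if __name__ == "__main__":
    # Synthetic little-endian header: magic, class/data/version, padding, fields.
    fake = b"\x7fELF" + bytes([1, 1, 1]) + bytes(9) + struct.pack("<HH", ET_EXEC, EM_ARM)
    print(elf_type_and_machine(fake) == (ET_EXEC, EM_ARM))
```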
5. Advanced Rules with Loops
Iterating over PE sections
import "pe"
rule PE_Injector_Classic {
meta:
description = "Classic process injection pattern"
strings:
$api1 = "OpenProcess" nocase
$api2 = "VirtualAllocEx" nocase
$api3 = "WriteProcessMemory" nocase
$api4 = "CreateRemoteThread" nocase
condition:
pe.is_pe and
3 of ($api*) and
pe.imports("kernel32.dll") and
for any i in (0..pe.number_of_sections - 1) : (
pe.sections[i].name == ".text" and
pe.sections[i].virtual_size > pe.sections[i].raw_data_size * 2
)
}
Polymorphic shellcode detection
rule Shellcode_Decoder_Stub {
meta:
description = "Generic XOR decoder stub"
strings:
// x86 XOR loop patterns
$x86_xor1 = { 8A ?? ?? 34 ?? 88 ?? ?? 4? 75 F? }
$x86_xor2 = { 30 ?? ?? 4? 83 ?? ?? 75 F? }
// x64 XOR loop patterns
$x64_xor1 = { 48 8B ?? ?? 48 33 ?? 48 89 ?? ?? 48 FF ?? }
condition:
filesize < 10KB and
any of ($x86_xor*, $x64_xor*)
}
6. Real-World Examples by Malware Family
Ransomware — generic detection
import "pe"
rule Ransomware_Generic_Behavior {
meta:
description = "Ransomware behavior: file enumeration + crypto APIs"
severity = "critical"
tags = "ransomware"
strings:
// Encryption APIs
$crypt1 = "CryptEncrypt" nocase
$crypt2 = "CryptGenKey" nocase
$crypt3 = "BCryptEncrypt" nocase
$crypt4 = "CryptAcquireContext" nocase
// Shadow copy deletion
$shadow1 = "vssadmin" nocase
$shadow2 = "delete shadows" nocase
$shadow3 = "wbadmin delete" nocase
$shadow4 = { 76 73 73 61 64 6D 69 6E } // "vssadmin" hex
// Target file extensions
$ext1 = ".docx" wide ascii
$ext2 = ".xlsx" wide ascii
$ext3 = ".pdf" wide ascii
// Ransom note indicators
$note1 = "bitcoin" nocase
$note2 = "decrypt" nocase wide ascii
$note3 = "your files" nocase wide ascii
condition:
pe.is_pe and
2 of ($crypt*) and
(1 of ($shadow*) or 3 of ($ext*)) and
any of ($note*)
}
Cobalt Strike Beacon
rule CobaltStrike_Beacon_Config {
meta:
description = "Cobalt Strike Beacon — in-memory config block"
author = "Mlab Team"
severity = "critical"
strings:
// Config block magic bytes
$cfg_start = { 00 00 00 BE EF }
// Checksum 0x5d3c8f39 (XORed default key)
$xor_key = { 69 68 68 68 }
// Default trial watermark
$watermark = { 00 00 CA FE }
// Default named pipes
$pipe1 = "\\\\.\\pipe\\msagent_" wide
$pipe2 = "\\MSSE-" wide
condition:
((uint32(0) == 0x0000BEEF or $cfg_start) and
any of ($pipe*)) or
($xor_key and $watermark and filesize < 500KB)
}
Mimikatz
rule Mimikatz_Generic {
meta:
description = "Mimikatz or derivative tool"
severity = "critical"
strings:
$s1 = "sekurlsa::logonpasswords" nocase
$s2 = "lsadump::dcsync" nocase
$s3 = "kerberos::golden" nocase
$s4 = "privilege::debug" nocase
$s5 = "mimikatz" nocase wide ascii
$s6 = "Benjamin DELPY" nocase
// UTF-16LE "mimikatz" is covered by $s5 (wide); a separate hex copy would satisfy "2 of ($s*)" from a single indicator
$api1 = "LsaIQueryInformationPolicyTrusted" nocase
$api2 = "SamIGetPrivateData" nocase
condition:
2 of ($s*) or
any of ($api*)
}
7. Rules for Malicious Documents
Office VBA macro dropper
rule Office_Macro_Dropper {
meta:
description = "Office document with dropper macro"
strings:
// OLE magic
$ole = { D0 CF 11 E0 A1 B1 1A E1 }
// VBA module markers
$vba1 = "VBA7" wide
$vba2 = "ThisDocument" wide
// Suspicious execution techniques
$exec1 = "Shell" nocase wide
$exec2 = "WScript.Shell" nocase wide
$exec3 = "CreateObject" nocase wide
$exec4 = "PowerShell" nocase wide
$exec5 = "cmd.exe" nocase wide
// Download functions
$dl1 = "XMLHTTP" nocase wide
$dl2 = "WinHttp" nocase wide
$dl3 = "URLDownloadToFile" nocase
// Basic obfuscation
$obf1 = "Chr(" nocase
$obf2 = "StrReverse" nocase
condition:
$ole at 0 and
any of ($vba*) and
2 of ($exec*) and
(any of ($dl*) or all of ($obf*))
}
PDF with malicious JavaScript
rule PDF_Malicious_JavaScript {
meta:
description = "PDF with obfuscated JS and potential shellcode"
strings:
$pdf_hdr = { 25 50 44 46 } // %PDF
$js1 = "/JavaScript" nocase
$js2 = "/JS" nocase
$launch = "/Launch"
$openaction = "/OpenAction"
$exploit1 = "util.printf" nocase
$exploit2 = "getAnnots" nocase
$exploit3 = "getIcon" nocase
$obf = "unescape" nocase
$heap = "%u0c0c%u0c0c"
condition:
$pdf_hdr at 0 and
($js1 or $js2) and
($openaction or $launch) and
(any of ($exploit*) or $obf or $heap)
}
8. Performance and Optimization
Golden rules
Always place the most restrictive conditions first to benefit from boolean short-circuit evaluation.
// BAD - evaluates entropy on every single file
condition:
math.entropy(0, filesize) > 7.0 and pe.is_pe
// GOOD - pe.is_pe rejects non-PE files before the entropy computation runs
condition:
pe.is_pe and math.entropy(0, filesize) > 7.0
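The effect of ordering can be demonstrated outside YARA. In this Python sketch, two stand-in predicates count how often they run; with the cheap check first, the costly one executes only for files that pass it. Both functions and the sample buffers are invented for illustration.

```python
calls = {"cheap": 0, "expensive": 0}


def is_pe(data: bytes) -> bool:
    """Cheap check: two-byte magic comparison."""
    calls["cheap"] += 1
    return data[:2] == b"MZ"


def high_entropy(data: bytes) -> bool:
    """Stand-in for a costly whole-file computation."""
    calls["expensive"] += 1
    return len(set(data)) > 200


samples = [b"MZ" + bytes(range(256)), b"%PDF-1.7", b"\x7fELF", b"PK\x03\x04"]

# Cheap test first: `and` stops as soon as is_pe() returns False.
matched = [s for s in samples if is_pe(s) and high_entropy(s)]
print(calls)  # {'cheap': 4, 'expensive': 1}
```

Swapping the operands would run the expensive predicate on all four buffers for the same result.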
Optimization checklist
- Use filesize as the very first filter (near-zero cost)
- Check magic bytes early: uint16(0) == 0x5A4D
- Avoid complex regexes on large files without a pre-filter
- Prefer hex strings over text strings for binary patterns
- Avoid any of them; be specific with any of ($s*)
- Benchmark with yara -p 4 (4 threads) on a large corpus
9. Integration and Tooling
Command line
# Basic scan
yara rules.yar /path/to/suspect/
# Recursive scan
yara -r rules.yar /malware/corpus/
# Show matched strings
yara -s rules.yar sample.exe
# Show metadata
yara -m rules.yar sample.exe
# Multi-rule, multi-threaded
yara -p 8 rules/*.yar /corpus/
# Scan from stdin
cat sample.bin | yara rules.yar -
# Abort scanning after 30 seconds (guards against pathological regexes)
yara --timeout=30 rules.yar /scan/
YARA with Python (yara-python)
import yara
# Compile rules
rules = yara.compile(filepath='rules.yar')
# Scan a file
matches = rules.match('/tmp/sample.exe')
for match in matches:
    print(f"[!] {match.rule} (tags: {match.tags})")
    # yara-python 4.3+: each string match exposes .identifier and .instances
    for string in match.strings:
        for instance in string.instances:
            print(f"    @ {hex(instance.offset)}: {string.identifier} = {instance.matched_data[:64]}")
# Scan live processes (DFIR)
import psutil
for proc in psutil.process_iter(['pid', 'name']):
    try:
        matches = rules.match(pid=proc.info['pid'])
    except (yara.Error, psutil.Error):
        # Process exited or access was denied; move on to the next one
        continue
    if matches:
        print(f"ALERT: PID {proc.info['pid']} ({proc.info['name']}) matches {[m.rule for m in matches]}")
MISP / OpenCTI integration
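from pymisp import PyMISP, MISPEvent
api = PyMISP("https://misp.corp/", "API_KEY")
event = MISPEvent()
event.info = "Cobalt Strike Beacon YARA rule"  # MISP requires an event title
# Read the rule text and attach it as a 'yara' attribute
with open('rule.yar') as fh:
    event.add_attribute('yara', value=fh.read(),
                        comment='Cobalt Strike Beacon v4.x')
api.add_event(event)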
10. Resources and Rule Repositories
- Mlab.sh: threat intelligence platform with built-in YARA scanning
- YARA GitHub: github.com/VirusTotal/yara (official repository and documentation)
- Awesome YARA: github.com/InQuest/awesome-yara
- YARAify: yaraify.abuse.ch (community rules)
- Elastic Security: rules.elastic.co
- Mandiant / Google: github.com/mandiant/red_team_tool_countermeasures