std::timed_mutex, a C++ class used for thread synchronization, can be used for sleep evasion on Windows by delaying execution just long enough to trick antivirus software into classifying a malicious payload as benign. In this article we explore which system APIs are used to implement timed mutexes, aiming to debloat the result and make it compatible with other evasion techniques.
This article revisits https://ghostline.neocities.org/avEvasion/#timed-mutexes, published a few months ago. That technique used timed mutexes in C++ to delay malware execution long enough to time out dynamic analysis in sandboxes such as ANY.RUN or VirusTotal. This type of technique is useful in red team or pentest engagements, since organizations will usually have some sort of antivirus/EDR software running that blocks any payload that looks too obviously like malware.
Recalling the definitions (https://en.cppreference.com/w/cpp/thread/timed_mutex.html), a mutex is a synchronization primitive used in parallel computation to protect data in critical regions and avoid problems like race conditions. The std::timed_mutex class (like the std::mutex class) implements a synchronization mechanism, but it also lets a thread attempt to claim ownership of a timed_mutex object for a limited duration through member functions such as try_lock_for or try_lock_until.
Let’s consider two threads, A and B. A enters a critical section and locks a timed_mutex until it is done processing. B now tries to enter the same critical section and attempts to lock the same timed_mutex by calling try_lock_for with a duration of 10 min. Execution in thread B then halts for up to 10 min while it tries to acquire the lock. If for some reason A does not unlock in that time, try_lock_for times out and B continues to execute without the lock.
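The two-thread scenario above can be sketched in portable C++ with much shorter durations. This is a minimal illustration and not the code from the original article: the names gate, Outcome and run_scenario are hypothetical, and the calling thread plays the role of thread A.

```cpp
#include <chrono>
#include <mutex>
#include <thread>

// Hypothetical names throughout; an illustration, not the original code.
std::timed_mutex gate;  // the timed_mutex guarding the critical section

struct Outcome {
    bool acquired;        // did thread B get the lock before its timeout?
    long long waited_ms;  // how long thread B was blocked in try_lock_for
};

// The calling thread plays A: it holds the lock for `hold`.
// A second thread plays B: it waits up to `timeout` for the same lock.
Outcome run_scenario(std::chrono::milliseconds hold,
                     std::chrono::milliseconds timeout) {
    Outcome out{false, 0};
    gate.lock();  // A enters the critical section
    std::thread b([&out, timeout] {
        const auto start = std::chrono::steady_clock::now();
        out.acquired = gate.try_lock_for(timeout);  // B blocks here
        const auto end = std::chrono::steady_clock::now();
        out.waited_ms =
            std::chrono::duration_cast<std::chrono::milliseconds>(end - start)
                .count();
        if (out.acquired) gate.unlock();  // B leaves the critical section
    });
    std::this_thread::sleep_for(hold);  // A is "processing"
    gate.unlock();                      // A leaves the critical section
    b.join();
    return out;
}
```

Calling run_scenario with a hold of 300 ms and a timeout of 100 ms models the case in which A does not unlock in time: B's try_lock_for gives up after roughly 100 ms and B carries on without the lock. With a hold shorter than the timeout, B acquires the lock as soon as A releases it.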
If used in a single-threaded program, this primitive translates into an execution delay of a predetermined duration. Since it isn’t that well known or documented, it slipped past detection rules, which usually catch more obvious WinAPI functions like Sleep.
But the original method was bloated and unoptimized: it relied too heavily on the C++ runtime and OOP features to perform the delay.
So, to optimize the technique, I needed to distill it down to the Windows API functions that the C++ runtime uses to implement this behaviour. Enter reverse engineering.

By analyzing the call stack of the main thread of a binary implementing the initial version of the timed_mutex evasion technique, we can get some insight into what is happening ‘under the hood’.
This specific trace was dumped after the process was already halted and sleeping. In it we can observe at least two functions with ‘Sleep’ in their names, as well as NtWaitForAlertByThreadId, which was the last call before entering the sleep delay.
Functions defined in ntdll.dll, such as NtWaitForAlertByThreadId, are in most cases not officially documented, and Microsoft warns against their use since they can change without notice in system updates. Calling NT functions directly is also risky because AVs typically treat this behaviour as a malware indicator, so let’s take a top-down approach and start with documented functions.
In Windows, when a user-level application calls an OS API function, it is most likely calling a function exported by kernel32.dll or kernelbase.dll; these are the high-level native Windows functions used for system programming. To interact with the Windows kernel, ntdll.dll exposes another set of API functions and effectively acts as a user-level frontend for the kernel. The layers link together as follows: the kernel32.dll function calls a lower-level function from ntdll.dll, which passes execution to the kernel through a syscall.
So, our starting point for putting the thread to sleep is SleepConditionVariableCS, the documented function that appears in the trace.
Looking at its definition it seems easy to use:
BOOL SleepConditionVariableCS(
[in, out] PCONDITION_VARIABLE ConditionVariable,
[in, out] PCRITICAL_SECTION CriticalSection,
[in] DWORD dwMilliseconds
);
We need to initialize a CONDITION_VARIABLE and a CRITICAL_SECTION variable and pass pointers to them into the function, with the dwMilliseconds argument containing the delay duration in milliseconds.
The full program looked like this:
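#include <windows.h>

CONDITION_VARIABLE BufferNotFull;
CRITICAL_SECTION BufferLock;

int main(void) {
    InitializeConditionVariable(&BufferNotFull);
    InitializeCriticalSection(&BufferLock);
    DWORD dwMilliseconds = 20000; // 20 s
    // The documentation requires the caller to own the critical section.
    EnterCriticalSection(&BufferLock);
    // Nothing wakes the condition variable, so the call is expected to
    // return once the timeout elapses.
    SleepConditionVariableCS(&BufferNotFull, &BufferLock, dwMilliseconds);
    LeaveCriticalSection(&BufferLock);
    DeleteCriticalSection(&BufferLock);
    return 0;
}
In the code above we initialize the first and second arguments as required by the documentation, enter the critical section (the documentation states that the caller must have entered it exactly once when SleepConditionVariableCS is called), and define a DWORD variable with a delay of 20 s.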
Looking at the call stack for the sleeping thread we can see it has correctly replicated the timed mutex behaviour, implemented in pure C using only windows API functions:

Using MinGW-w64 without any compiler optimizations, the resulting executable had a size of 2.4 MB with C++ timed mutexes, in contrast to 111 KB with the SleepConditionVariableCS approach.
We can still take this further and invoke NtWaitForAlertByThreadId directly, although in my testing and compilation conditions it did not provide further size reduction.
#include <windows.h>
#include <winternl.h> // NTSTATUS

typedef NTSTATUS (NTAPI* FnNtWaitForAlertByThreadId)(
    _In_opt_ PVOID Address,
    _In_opt_ PLARGE_INTEGER Timeout
);

int main(void) {
    HMODULE nt = GetModuleHandleW(L"ntdll.dll");
    if (nt == NULL) return 1;
    // Resolve the undocumented export at run time.
    FnNtWaitForAlertByThreadId pnNtWaitForAlertByThreadId =
        (FnNtWaitForAlertByThreadId)GetProcAddress(nt, "NtWaitForAlertByThreadId");
    if (pnNtWaitForAlertByThreadId == NULL) return 1;
    LARGE_INTEGER timeout = { 0 };
    timeout.QuadPart = -30000000LL; // negative = relative, 100 ns units: ~3 sec
    pnNtWaitForAlertByThreadId(NULL, &timeout);
    return 0;
}
The call stack trace is noticeably smaller.
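The timeout value passed to NtWaitForAlertByThreadId deserves a note. NT native timeouts are commonly expressed in 100-nanosecond units, with a negative value meaning an interval relative to now; since this function is undocumented, treat that convention here as an assumption that is consistent with the ~3 sec comment in the code. The helper below is a hypothetical, portable sketch of the conversion from milliseconds and is not part of the original program.

```cpp
#include <cstdint>

// Number of 100-nanosecond ticks in one millisecond.
constexpr std::int64_t kTicksPerMs = 10000;

// Hypothetical helper: encode a relative delay given in milliseconds as a
// negative count of 100 ns ticks, the form stored in LARGE_INTEGER::QuadPart.
constexpr std::int64_t relative_timeout_ticks(std::int64_t ms) {
    return -(ms * kTicksPerMs);
}

// 3000 ms corresponds to the literal used in the program above.
static_assert(relative_timeout_ticks(3000) == -30000000LL,
              "3 s expressed as a relative 100 ns timeout");
```

With such a helper, the 20 s delay from the SleepConditionVariableCS example would be written as relative_timeout_ticks(20000), which avoids counting zeros by hand.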

References
https://en.cppreference.com/w/cpp/thread/timed_mutex.html
https://ghostline.neocities.org/avEvasion/#timed-mutexes
https://learn.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-sleep
https://learn.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-sleepconditionvariablecs