//==// // // /|| // //==== //==// //| // // // // // //|| // // // // //|| // //==// //==// //=|| // // // // // || // // // // // || // // // // // ||// // // // // || //==== //==== //==// // ||/ /==== // // // /==== /| /| // // // // // //| //| ===\ // // // ===\ //|| //|| // // \\ // // // ||// || ====/ // \\ // ====/ // ||/ || ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ DISCLAIMER: This file is 100% guaranteed to exist. The author makes no claims to the existence or nonexistence of the reader. ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ This space intentionally left blank. ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ GREETS: Welcome home, Hellraiser! Hello to the rest of the PHALCON/SKISM crew: Count Zero, Demogorgon, Garbageheap, as well as everyone else I failed to mention. ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ Dark Angel's Clumpy Virus Writing Guide ÄÄÄÄ ÄÄÄÄÄÄÄ ÄÄÄÄÄÄ ÄÄÄÄÄ ÄÄÄÄÄÄÄ ÄÄÄÄÄ "It's the cheesiest" - Kraft ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ INSTALLMENT IV: RESIDENT VIRII, PART I ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ Now that the topic of nonresident virii has been addressed, this series now turns to memory resident virii. This installment covers the theory behind this type of virus, although no code will be presented. With this knowledge in hand, you can boldly write memory resident virii confident that you are not fucking up too badly. ÄÄÄÄÄÄÄÄÄÄ INTERRUPTS ÄÄÄÄÄÄÄÄÄÄ DOS kindly provides us with a powerful method of enhancing itself, namely memory resident programs. Memory resident programs allow for the extention and alteration of the normal functioning of DOS. To understand how memory resident programs work, it is necessary to delve into the intricacies of the interrupt table. The interrupt table is located from memory location 0000:0000 to 0000:0400h (or 0040:0000), just below the BIOS information area. It consists of 256 double words, each representing a segment:offset pair. When an interrupt call is issued via an INT instruction, two things occur, in this order: 1) The flags are pushed onto the stack. 2) A far call is issued to the segment:offset located in the interrupt table. To return from an interrupt, an iret instruction is used. The iret instruction reverses the order of the int call. It performs a retf followed by a popf. This call/return procedure has an interesting sideeffect when considering interrupt handlers which return values in the flags register. Such handlers must directly manipulate the flags register saved in the stack rather than simply directly manipulating the register. The processor searches the interrupt table for the location to call. For example, when an interrupt 21h is called, the processor searches the interrupt table to find the address of the interrupt 21h handler. The segment of this pointer is 0000h and the offset is 21h*4, or 84h. In other words, the interrupt table is simply a consecutive chain of 256 pointers to interrupts, ranging from interrupt 0 to interrupt 255. To find a specific interrupt handler, load in a double word segment:offset pair from segment 0, offset (interrupt number)*4. The interrupt table is stored in standard Intel reverse double word format, i.e. the offset is stored first, followed by the segment. For a program to "capture" an interrupt, that is, redirect the interrupt, it must change the data in the interrupt table. This can be accomplished either by direct manipulation of the table or by a call to the appropriate DOS function. If the program manipulates the table directly, it should put this code between a CLI/STI pair, as issuing an interrupt by the processor while the table is half-altered could have dire consequences. Generally, direct manipulation is the preferable alternative, since some primitive programs such as FluShot+ trap the interrupt 21h call used to set the interrupt and will warn the user if any "unauthorised" programs try to change the handler. An interrupt handler is a piece of code which is executed when an interrupt is requested. The interrupt may either be requested by a program or may be requested by the processor. Interrupt 21h is an example of the former, while interrupt 8h is an example of the latter. The system BIOS supplies a portion of the interrupt handlers, with DOS and other programs supplying the rest. Generally, BIOS interrupts range from 0h to 1Fh, DOS interrupts range from 20h to 2Fh, and the rest is available for use by programs. When a program wishes to install its own code, it must consider several factors. First of all, is it supplanting or overlaying existing code, that is to say, is there already an interrupt handler present? Secondly, does the program wish to preserve the functioning of the old interrupt handler? For example, a program which "hooks" into the BIOS clock tick interrupt would definitely wish to preserve the old interrupt handler. Ignoring the presence of the old interrupt handler could lead to disastrous results, especially if previously-loaded resident programs captured the interrupt. A technique used in many interrupt handlers is called "chaining." With chaining, both the new and the old interrupt handlers are executed. There are two primary methods for chaining: preexecution and postexecution. With preexecution chaining, the old interrupt handler is called before the new one. This is accomplished via a pseudo-INT call consisting of a pushf followed by a call far ptr. The new interrupt handler is passed control when the old one terminates. Preexecution chaining is used when the new interrupt handler wishes to use the results of the old interrupt handler in deciding the appropriate action to take. Postexecution chaining is more straightforward, simply consisting of a jmp far ptr instruction. This method doesn't even require an iret instruction to be located in the new interrupt handler! When the jmp is executed, the new interrupt handler has completed its actions and control is passed to the old interrupt handler. This method is used primarily when a program wishes to intercept the interrupt call before DOS or BIOS gets a chance to process it. ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ AN INTRODUCTION TO DOS MEMORY ALLOCATION ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ Memory allocation is perhaps one of the most difficult concepts, certainly the hardest to implement, in DOS. The problem lies in the lack of official documentation by both Microsoft and IBM. Unfortunately, knowledge of the DOS memory manager is crucial in writing memory-resident virii. When a program asks DOS for more memory, the operating system carves out a chunk of memory from the pool of unallocated memory. Although this concept is simple enough to understand, it is necessary to delve deeper in order to have sufficient knowledge to write effective memory-resident virii. DOS creates memory control blocks (MCBs) to help itself keep track of these chunks of memory. MCBs are paragraph-sized areas of memory which are each devoted to keeping track of one particular area of allocated memory. When a program requests memory, one paragraph for the MCB is allocated in addition to the memory requested by the program. The MCB lies just in front of the memory it controls. Visually, a MCB and its memory looks like: ÚÄÄÄÄÄÄÄÂÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ¿ ³ MCB 1 ³ Chunk o' memory controlled by MCB 1 ³ ÀÄÄÄÄÄÄÄÁÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÙ When a second section of memory is requested, another MCB is created just above the memory last allocated. Visually: ÚÄÄÄÄÄÄÄÂÄÄÄÄÄÄÄÄÄÂÄÄÄÄÄÄÄÂÄÄÄÄÄÄÄÄÄ¿ ³ MCB 1 ³ Chunk 1 ³ MCB 2 ³ Chunk 2 ³ ÀÄÄÄÄÄÄÄÁÄÄÄÄÄÄÄÄÄÁÄÄÄÄÄÄÄÁÄÄÄÄÄÄÄÄÄÙ In other words, the MCBs are "stacked" one on top of the other. It is wasteful to deallocate MCB 1 before MCB 2, as holes in memory develop. The structure for the MCB is as follows: Offset Size Meaning ÄÄÄÄÄÄ ÄÄÄÄÄÄÄ ÄÄÄÄÄÄÄ 0 BYTE 'M' or 'Z' 1 WORD Process ID (PSP of block's owner) 3 WORD Size in paragraphs 5 3 BYTES Reserved (Unused) 8 8 BYTES DOS 4+ uses this. Yay. If the byte at offset 0 is 'M', then the MCB is not the end of the chain. The 'Z' denotes the end of the MCB chain. There can be more than one MCB chain present in memory at once and this "feature" is used by virii to go resident in high memory. The word at offset 1 is normally equal to the PSP of the MCB's owner. If it is 0, it means that the block is free and is available for use by programs. A value of 0008h in this field denotes DOS as the owner of the block. The value at offset 3 does NOT include the paragraph allocated for the MCB. It reflects the value passed to the DOS allocation functions. All fields located after the block size are pretty useless so you might as well ignore them. When a COM file is loaded, all available memory is allocated to it by DOS. When an EXE file is loaded, the amount of memory specified in the EXE header is allocated. There is both a minimum and maximum value in the header. Usually, the linker will set the maximum value to FFFFh paragraphs. If the program wishes to allocate memory, it must first shrink the main chunk of memory owned by the program to the minimum required. Otherwise, the pathetic attempt at memory allocation will fail miserably. Since programs normally are not supposed to manipulate MCBs directly, the DOS memory manager calls (48h - 4Ah) all return and accept values of the first program-usable memory paragraph, that is, the paragraph of memory immediately after the MCB. It is important to keep this in mind when writing MCB-manipulating code. ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ METHODS OF GOING RESIDENT ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ There are a variety of memory resident strategies. The first is the use of the traditional DOS interrupt TSR routines, either INT 27h or INT 21h/Function 31h. These routines are undesirable when writing virii, because they do not return control back to the program after execution. Additionally, they show up on "memory walkers" such as PMAP and MAPMEM. Even a doorknob can spot such a blatant viral presence. The traditional viral alternative to using the standard DOS interrupt is, of course, writing a new residency routine. Almost every modern virus uses a routine to "load high," that is, to load itself into the highest possible memory location. For example, in a 640K system, the virus would load itself just under the 640K but above the area reserved by DOS for program use. Although this is technically not the high memory area, it shall be referred to as such in the remainder of this file in order to add confusion and general chaos into this otherwise well-behaved file. Loading high can be easily accomplished through a series of interrupt calls for reallocation and allocation. The general method is: 1. Find the memory size 2. Shrink the program's memory to the total memory size - virus size 3. Allocate memory for the virus (this will be in the high memory area) 4. Change the program's MCB to the end of the chain (Mark it with 'Z') 5. Copy the virus to high memory 6. Save the old interrupt vectors if the virus wishes to chain vectors 7. Set the interrupt vectors to the appropriate locations in high memory When calculating memory sizes, remember that all sizes are in paragraphs. The MCB must also be considered, as it takes up one paragraph of memory. The advantage of this method is that it does not, as a rule, show up on memory walkers. However, the total system memory as shown by such programs as CHKDSK will decrease. A third alternative is no allocation at all. Some virii copy themselves to the memory just under 640K, but fail to allocate the memory. This can have disastrous consequences, as any program loaded by DOS can possibly use this memory. If it is corrupted, unpredictable results can occur. Although no memory loss is shown by CHKDSK, the possible chaos resulting from this method is clearly unacceptable. Some virii use memory known to be free. For example, the top of the interrupt table or parts of video memory all may be used with some assurance that the memory will not be corrupted. Once again, this technique is undesirable as it is extremely unstable. These techniques are by no means the only methods of residency. I have seen such bizarre methods as going resident in the DOS internal disk buffers. Where there's memory, there's a way. It is often desirable to know if the virus is already resident. The simplest method of doing this is to write a checking function in the interrupt handler code. For example, a call to interrupt 21h with the ax register set to 7823h might return a 4323h value in ax, signifying residency. When using this check, it is important to ensure that no possible conflicts with either other programs or DOS itself will occur. Another method, albeit a costly process in terms of both time and code length, is to check each segment in memory for the code indicating the presence of the virus. This method is, of course, undesirable, since it is far, far simpler to code a simple check via the interrupt handler. By using any type of check, the virus need not fear going resident twice, which would simply be a waste of memory. ÄÄÄÄÄÄÄÄÄÄÄÄÄ WHY RESIDENT? ÄÄÄÄÄÄÄÄÄÄÄÄÄ Memory resident virii have several distinct advantages over runtime virii. o Size Memory resident virii are often smaller than their runtime brethern as they do not need to include code to search for files to infect. o Effectiveness They are often more virulent, since even the DIR command can be "infected." Generally, the standard technique is to infect each file that is executed while the virus is resident. o Speed Runtime virii infect before a file is executed. A poorly written or large runtime virus will cause a noticible delay before execution easily spotted by users. Additionally, it causes inordinate disk activity which is detrimental to the lifespan of the virus. o Stealth The manipulation of interrupts allows for the implementation of stealth techniques, such as the hiding of changes in file lengths in directory listings and on-the-fly disinfection. Thus it is harder for the average user to detect the virus. Additionally, the crafty virus may even hide from CRC checks, thereby obliterating yet another anti- virus detection technique. ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ STRUCTURE OF THE RESIDENT VIRUS ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ With the preliminary information out of the way, the discussion can now shift to more virus-related, certainly more interesting topics. The structure of the memory resident virus is radically different from that of the runtime virus. It simply consists of a short stub used to determine if the virus is already resident. If it is not already in memory, the stuf loads it into memory through whichever method. Finally, the stub restores control to the host program. The rest of the code of the resident virus consists of interrupt handlers where the bulk of the work is done. The stub is the only portion of the virus which needs to have delta offset calculations. The interrupt handler ideally will exist at a location which will not require such mundane fixups. Once loaded, there should be no further use of the delta offset, as the location of the variables is preset. Since the resident virus code should originate at offset 0 of the memory block, originate the source code at offset 0. Do not include a jmp to the virus code in the original carrier file. When moving the virus to memory, simply move starting from [bp+startvirus] and the offsets should work out as they are in the source file. This simplifies (and shortens) the coding of the interrupt handlers. Several things must be considered in writing the interrupt handlers for a virus. First, the virus must preserve the registers. If the virus uses preexecution chaining, it must save the registers after the call to the original handler. If the virus uses postexecution chaining, it must restore the original registers of the interrupt call before the call to the original handler. Second, it is more difficult, though not impossible, to implement encryption with memory resident virii. The problem is that if the interrupt handler is encrypted, that interrupt handler cannot be called before the decryption function. This can be a major pain in the ass. The cheesy way out is to simply not include encryption. I prefer the cheesy way. The noncheesy readers out there might wish to have the memory simultaneously hold two copies of the virus, encrypt the unused copy, and use the encrypted copy as the write buffer. Of course, the virus would then take twice the amount of memory it would normally require. The use of encryption is a matter of personal choice and cheesiness. A sidebar to preservation of interrupt handlers: As noted earlier, the flags register is restored from the stack. It is important in preexecution chaining to save the new flags register onto the stack where the old flags register was stored. Another important factor to consider when writing interrupt handlers, especially those of BIOS interrupts, is DOS's lack of reentrance. This means that DOS functions cannot be executed while DOS is in the midst of processing an interrupt request. This is because DOS sets up the same stack pointer each time it is called, and calling the second DOS interrupt will cause the processing of one to overwrite the stack of the other, causing unpredictable, but often terminal, results. This applies regardless of which DOS interrupts are called, but it is especially true for interrupt 21h, since it is often tempting to use it from within an interrupt handler. Unless it is certain that DOS is not processing a previous request, do NOT use a DOS function in the interrupt handler. It is possible to use the "lower" interrupt 21h functions without fear of corrupting the stack, but they are basically the useless ones, performing functions easily handled by BIOS calls or direct hardware access. This entire discussion only applies to hooking non-DOS interrupts. With hooking DOS interrupts comes the assurance that DOS is not executing elsewhere, since it would then be corrupting its own stack, which would be a most unfortunate occurence indeed. The most common interrupt to hook is, naturally, interrupt 21h. Interrupt 21h is called by just about every DOS program. The usual strategy is for a virus to find potential files to infect by intercepting certain DOS calls. The primary functions to hook include the find first, find next, open, and execute commands. By cleverly using pre and postexecution chaining, a virus can easily find the file which was found, opened, or executed and infect it. The trick is simply finding the appropriate method to isolate the filename. Once that is done, the rest is essentially identical to the runtime virus. When calling interrupts hooked by the virus from the virus interrupt code, make sure that the virus does not trap this particular call, lest an infinite loop result. For example, if the execute function is trapped and the virus wishes, for some reason, to execute a particular file using this function, it should NOT use a simple "int 21h" to do the job. In cases such as this where the problem is unavoidable, simply simulate the interrupt call with a pushf/call combination. The basic structure of the interrupt handler is quite simple. The handler first screens the registers for either an identification call or for a trapped function such as execute. If it is not one of the above, the handler throws control back to the original interrupt handler. If it is an identification request, the handler simply sets the appropriate registers and returns to the calling program. Otherwise, the virus must decide if the request calls for pre or postexecution chaining. Regardless of which it uses, the virus must find the filename and use that information to infect. The filename may be found either through the use of registers as pointers or by searching thorugh certain data structures, such as FCBs. The infection routine is the same as that of nonresident virii, with the exception of the guidelines outlined in the previous few paragraphs. ÄÄÄÄÄÄÄÄÄÄÄÄÄÄ WHAT'S TO COME ÄÄÄÄÄÄÄÄÄÄÄÄÄÄ I apologise for the somewhat cryptic sentences used in the guide, but I'm a programmer, not a writer. My only suggestion is to read everything over until it makes sense. I decided to pack this issue of the guide with theory rather than code. In the next installment, I will present all the code necessary to write a memory-resident virus, along with some techniques which may be used. However, all the information needed to write a resident virii has been included in this installment; it is merely a matter of implementation. Have buckets o' fun!