#
# RRRACC 1.03 (C) 1999 Rajaat
#
# Description:  Random Line & Variable substituting preprocessor for assembler
#               sources.
# Language:     Perl 5
# Platform:     Any that runs Perl, but RRRACC is targeted at Turbo Assembler
#               sources, so should be restricted to Dos/Wintel platform.
# Invocation:   perl rrracc.pl infile.r outfile.asm     (for processing)
#               perl rrracc.pl --help                   (for docs)
# Compilation:  Consult the Perl 5 documents in order to learn how to generate
#               C code from a Perl program, and how to compile it with DJGPP
#               and link in the Perl library. In general, this should do:
#               perl -MO=C rrracc.pl > rrracc.c
#               gcc -s -O3 -o rrracc.exe rrracc.c -lperl
#
# About the source:
#
# Here is the PERL sourcecode of RRRACC 1.03, the final version of my very
# first kind of mutating source preprocessor.
#
# I am well aware that by releasing this source, I will get replies from Perl
# mongers stating they could do this in just xxx lines of code instead
# of my perhaps not too well structured source, but please take into
# account that this is my very first Perl source that goes beyond a
# "print sort <STDIN>;"
#
# You are free to make any changes/bugfixes/enhancements, as long as
# you:
#       - comment the changes to the source code
#       - send the changed source to rajaat@itookmyprozac.com
#       - leave my name in there, too
#       - if you distribute it on your homepage, please provide a link
#         to http://www.sourceofkaos.com/homes/rajaat
#
# To understand some things in the source you must at least have some
# basic knowledge of PERL, like the fact that with a foreach (@arr) every
# element gets automatically assigned to $_ during the loop, which you
# don't need to specify in regular expressions or if statements.
#
# Recommended books for reading to understand PERL:
#
# Programming Perl - 2nd edition (covers Perl 5)
#       Larry Wall, Tom Christiansen & Randal L. Schwartz
#       ISBN 1-56592-149-6      O'Reilly
#       Price: US $39.95        Approx. Pages: 645
#
# Perl Cookbook - Solutions and Examples for Perl Programmers
#       Tom Christiansen & Nathan Torkington
#       ISBN 1-56592-243-3      O'Reilly
#       Price: US $39.95        Approx. Pages: 757
#
# Mastering Regular Expressions - Powerful Techniques for Perl and Other Tools
#       Jeffrey E.F. Friedl
#       ISBN 1-56592-257-3      O'Reilly
#       Price: US $29.95        Approx. Pages: 342
#
# Where can you order these books?
#       http://www.oreilly.com
#
# Where can I get Perl?
#       http://www.perl.com
#       http://perl.oreilly.com
#       http://www.activestate.com
#

$idname = "RRRACC1.03";         # set id name for output
$id = pack "u",$idname;         # and UUENCODE to $id
chomp($id);                     # remove trailing newline

# print HEAD message stdout (inline document)
print <<HEAD;
Rajaats Recursive Random Assembler Code Creator (RRRACC) Version 1.03
Internal Version Id $idname

To see usage, use with --help

HEAD
# end of HEAD inline document

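# anchored so that a file name merely containing "-h" or "?" is not
# mistaken for a help request
if (defined $ARGV[0] && $ARGV[0] =~ /^(?:-{1,2}h(?:elp)?|[-\/]?\?)$/)   # user asks for help?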
{ # yes, print EOT message to stdout (inline document)
  print <<EOT;

A word of warning
~~~~~~~~~~~~~~~~~
    This tool is not intended for use by novice users, since you will
    need in-depth assembler knowledge to know what kind of operands
    cause which flags to trigger, or what operands you can change
    without creating a possibility that one of the generated source
    codes will not compile properly, or even worse, produce a virus that
    is defective! RRRACC also ain't fool proof yet, and it could benefit
    from a lot of improvements, so if you don't use it with care,
    disasters could happen. In other words,


                        YOU HAVE BEEN WARNED!!!!!


General
~~~~~~~
    Rajaats Recursive Random Assembler Code Creator, RRRACC for short
    (and if you still don't like it you can pronounce it as ROCK), is a
    utility that processes text files, recognizes special tokens which
    it uses to randomize the input, and writes the result to an output
    file. This does not have to mean assembler code, but my primary
    intention was to make it work mainly with assemblers.

Invocation
~~~~~~~~~~
    To invoke RRRACC on a file, use the following syntax:
        RRRACC infile.ext outfile.ext

    RRRACC does not perform any extension checking, so I suggest you
    use .rrr for source file extension, and .asm for the RRRACC
    generated files.

    The output will usually show something like this:

    ----------
      Rajaats Recursive Random Assembler Code Creator (RRRACC) Version 1.03
      Internal Version Id RRRACC1.03

      To see usage, use with --help

      Pass one - reading input file dsa2.rrr...
      Pass two - processing file...
      Pass three - write the processed file to dsa2.asm...
      Done.

      Statistics:
        1 mutation out of 8.88520724322027e+62 possible complete mutations
        1 mutation out of 589824 possible line mutations
        1 mutation out of 1.50641670112106e+57 possible variable mutations
        1 mutation out of 1 possible variable ranges

      Please be nice and do not remove the header of the generated file, it is
      for educational purpose only. :-)

      Comments to rajaat\@itookmyprozac.com
    ----------

    Please note that RRRACC doesn't have strong error checking, so you
    had better get familiar with the inner workings of the tokens, which
    I will explain to you in the next section. You also might get
    shocked at the large numbers in the statistics. These are
    mathematically correct, assuming the instructions are very different
    in nature. A 1 out of 1 mutation means that there weren't any
    mutations performed of this type.

ASM to RRRACC
~~~~~~~~~~~~~
    The most common use of RRRACC is probably converting an existing
    virus source to a RRRACC parseable one, which is easiest to do in
    the beginning. If you grow more accustomed to the use of RRRACC you
    might want to write code directly that can be parsed by it.

    RRRACC is able to do only three things:

    1.  Randomly reorder a consecutive range of lines
    2.  Pick a random string from a selection and put it in a variable
    3.  Replace variable references in the asm source with the randomly
        chosen selection

    This might not seem a lot, and indeed I think it is only a start, but
    if properly applied, this can create tons of variants from one
    single RRRACC source.

    How to swap lines
    ~~~~~~~~~~~~~~~~~
        This is the easiest part to understand, I bet even macro virus
        writers understand this (oh oh, I feel I'll get my arse kicked
        for saying this). All you have to do is put a ! (that's right,
        an exclamation mark) at the start of the lines you wish to
        randomize. For example

                !               mov ax,4202h
                !               xor cx,cx
                !               xor dx,dx               ; DON'T USE CWD!
                                int 21h

        would mean that the first three lines can be changed in random
        order. I also hereby present you the first caveat you could get
        yourself entangled in. If you are an optimizing fanatic and
        change the  xor dx,dx  to a  cwd, there is a chance of
        generating a sequence like this

                                cwd
                                mov ax,4202h
                                xor cx,cx

        where you don't know what value ax had before it was
        sign-extended into dx. You are in luck if ax already happens to
        be less than 8000h, since only then does cwd leave dx zero. If
        you don't understand this, go buy a book about assembler or surf
        the internet to get one.

        Well as you see, it's not hard to use RRRACC, it's more a
        problem of knowing assembler right. Another example

                !               pop ds
                !               cmp ax,4b00h
                                jz infect

        would be perfectly right, since the  pop ds  opcode does not
        affect the flags, so the  cmp ax,4b00h  is allowed to be swapped
        before it. But look out for this

                !               pop ds
                !               mov word ptr ds:[old_21],bx

        since the second line depends on ds set properly, you can't swap
        these.

    Assigning a random string to a variable
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        Variables are assigned on a separate line of their own, which
        will be omitted in the output. Variables are recognized by their
        ~ prefix, and can contain letters, digits and _, however they
        may not start with a digit (I could support this, but it's not a
        good habit of using them anyway). I'll show a silly example and
        explain what it does:

                ~rndreg = ( "bx" "cx" "dx" )            ; 1
                ~rndmov = ( "mov" "xchg" )              ; 2
                        ~rndmov ~rndreg,ax              ; 3
                        push ~rndreg                    ; 4
                ~rndreg = ( "bx" "cx" "dx" "si" "di" )  ; 5
                ~rndopen = ( "3d02h" "3d82h" )          ; 6
                        mov ~rndreg,~rndopen            ; 7
                ~rndmov = ( "mov" "xchg" )              ; 8
                        ~rndmov ax,~rndreg              ; 9
                        int 21h                         ; 10
                        pop cx                          ; 11

        At line 1, the variable rndreg gets a value chosen at random
        from its parameters. Let's say ~rndreg becomes bx. At line 2
        we'll do the same, but then we don't assign a random register,
        but a random instruction. In this example, we'll assume ~rndmov
        gets "xchg" assigned. Now lines 3 and 4 will be parsed and
        RRRACC sees the variables and will replace them with their
        randomly assigned value. The above example *could* generate:

                xchg bx,ax  or  mov dx,ax  or  xchg cx,ax
                push bx         push dx        push cx

        A random value stays assigned to a variable until the variable
        is reused/reinitialized. Once you know this, understanding the
        rest of the above example is very easy.

    Assigning a random hex number to a variable
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        Since version 1.01, assigning variables has been extended with
        the possibility of generating a random hexadecimal number
        between two numbers. Hexadecimal numbers are specified in C
        notation, but the output will be TASM compatible.

                ~maxlen = ( 0xe000 0xf400 )     ; maximum host size
                ~cjump = ( "ja" "jae" )
                cmp cx,~maxlen
                ~cjump too_big
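
        As a worked example of the range above: both bounds are
        inclusive, so 0xe000 to 0xf400 covers
        0xf400 - 0xe000 + 1 = 5121 possible values. The chosen value is
        written with a leading 0 and a trailing h, so a pick of 0xe800
        would show up in the output as 0e800h.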

    Combining
    ~~~~~~~~~
        The true power of RRRACC (yuck) comes in view when you combine
        both the line randomizer and the random variable substitution.
        I'll show you a little example again:

                ~rndzero1 = ( "xor" "sub" )
                ~rndzero2 = ( "xor" "sub" )
                !       mov ax,4202h            ; seek eof
                !       ~rndzero1 cx,cx
                !       ~rndzero2 dx,dx

        This is very annoying for an antivirus researcher to analyse if
        you create enough possibilities in your source code to get
        randomized.

A small word on the mutation mathematics
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    Here is a small table that shows the amount of mutations possible
    with a certain amount of consecutive lines set to randomized order:

        Consecutive Lines               Possible amount of mutations
        ============================================================
        1                                       1
        2                                       2
        3                                       6
        4                                      24
        5                                     120
        6                                     720
        7                                   5,040
        8                                  40,320
        9                                 362,880
        10                              3,628,800

    The formula to calculate this is fairly simple and straightforward
    (it is the factorial): take the number of mutations possible for the
    previous line count and multiply it by the next line count, so 11
    consecutive lines give a possible 11*3,628,800 = 39,916,800
    mutations, and so on. To calculate the total number of line
    mutations possible in a file, multiply together the mutation counts
    of all the blocks encountered in the source code. I think I am
    correct in my calculations, but I would love to be proved wrong by
    Alan Solomon, after all he's a doctor in mathematics. :-)
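
    As a worked example (a hypothetical file, not one of my sources): a
    file with three randomized blocks of 3, 2 and 4 lines has, per the
    table above, 6 * 2 * 24 = 288 possible line orderings.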

Future
~~~~~~
    RRRACC is very powerful in its simplicity, still it lacks a lot of
    things I would like to add before I start on my GRACE project, like

        + random ordering blocks of code in addition to lines
        + variable assignments and parameters that can span more than a
          line (and can contain newline characters, possibly with a new
          line starting with a !, which should be randomized)
        + recursive variable assignment (by using variables in
          variables, I didn't call it rRracc for nothing <grin>)
        + parameter exclusion, so you can say that it can't choose a
          parameter that has been assigned to another variable
        + random line ordering reversed stack, so you can push and pop
          registers in the right order
        + stronger error checking - this version has some, but not
          enough to my liking
        + multiple file processing and bulk output (using numbering
          instead of an extension)
        + optional automatic compilation, by calling a batch file (to
          invoke assembler/linker with your own parameters), ideal for
          bulk generation

History
~~~~~~~
    02-28-1999: Release of RRRACC 1.03, including PERL source
                + Final release of RRRACC, since it is very bulky, when
                  compiled, and I have finally started on coding GRACE,
                  so this should become obsolete. Maybe I will release
                  newer versions if I still think there should be an
                  intermediate release before the first beta release of
                  GRACE. At least I hope the source code will be
                  interesting to study for PERL fanatics.

    01-23-1999: Release of RRRACC 1.02
                + Shows amount of possible random mutations in a source
                  file, don't get shocked at the monstrous numbers,
                  these numbers are mathematically correct (at least
                  they are, I think, if you doubt it, just try to
                  generate some samples and prove me it calculates
                  wrong)

    01-21-1999: Release of RRRACC 1.01
                + Support for random hex ranges added

    01-20-1999: Release of RRRACC 1.0

    01-19-1999: Release of SwapLine, actually RRRACC 0.01 :-)

Contact
~~~~~~~
    If you have any questions or bug reports (this does not include
    invalid use, read the docs), feel free to mail them to
    rajaat\@itookmyprozac.com and I'll try to answer them. You can also
    check my website for new updates or new programs, which is located
    at
                http://www.sourceofkaos.com/homes/rajaat

Hint
~~~~
    You might want to print out this help, which you can simply do by
    redirecting the output, like : RRRACC --help > lpt1.

EOT
# end of inline document EOT (I should separate this from the main source)

exit 0;         # end program after printing documents
}

# pass one, opening the files and reading in the source
open (INFILE,"<$ARGV[0]") or die "Can't open input file $ARGV[0] : $!\n";
open (OUTFILE,">$ARGV[1]") or die "Can't open output file $ARGV[1] : $!\n";
print "Pass one - reading input file $ARGV[0]...\n";

@source = <INFILE>;                     # read all input from file

# pass two, expanding and initializing the randomizing macros
print "Pass two - processing file...\n";

foreach (@source)                       # process all lines
{
  $lineno++;                            # keep counter of lines processed
  if (/\~(\w+)\s*=\s*\(/)               # regexp: ~ascii *= *( ?
  {
    $line = $_;                         # yes, copy line
    $var = $1;                          # get variable name (without ~)
    if (@params = ($line =~ /\"(.*?)\"/g)) {    # get all quoted strings (if any)
      # one of N strings is picked, so this assignment has N possible
      # outcomes (N, not N factorial)
      $mutvar = scalar @params;         # count the quoted alternatives
      push @mutvars, $mutvar;           # and store in the var mutation list
      $param = $params[int(rand(@params))];     # choose a random string from the list
      $var{$var} = $param;              # assign chosen line to variable (hash)
    }
    elsif (($min, $max) = ($line =~ /0x([0-9A-Fa-f]{1,8})\s*0x([0-9A-Fa-f]{1,8})/g)) {
      # if two hex numbers found (0x0000 format) store in $min and $max
      # convert min and max number to a decimal
      $decmax = hex $max;
      $decmin = hex $min;
      push @mutrnds,($decmax+1-$decmin);        # store possible random numbers

      # convert random chosen number to TASM hex format and store in $param
      $param = sprintf("0%lxh",int (rand ($decmax+1-$decmin)+$decmin));
      $var{$var} = $param;              # assign chosen number to variable (hash)
    }
    else
    { # wrong type of parameter found (no quoted strings or hex numbers)
      die "No recognized data type at line $lineno\n";  # exit program
    }
    next;       # go on to process next line
  }
  else
  { # not a correct assignment if () not found in assignment
    die "Incorrect assignment at line $lineno\n" if /\~\w+\s*=/;
    if (/\~(\w+)/)              # if a variable reference found (like ~var)
    {
      @params = /\~(\w+)/g;     # get all variable references in a line
      foreach $param (@params) {        # replace each with the value
        s/\~$param/$var{$param}/g;      # assigned to the variable in the
      }                                 # hash table
    }
  }
  if (/^!(.*)/s)                        # if the line starts with a !
  {
    push @rndlines, $1;                 # push line (not the !) on the stack
  }
  else                                  # otherwise
  {
    $line = $_;                         # get current line
    $rnds = $#rndlines;                 # get amount of lines in stack
    $mutations = 1;                     # initialise line mutation count
    for ($i = 0; $i <= $rnds; $i++)     # process random lines
    {
      $mutations *= ($i+1);             # calculate possible line mutations
      $rnd = int(rand($#rndlines+1));   # choose a random line from stack
      push @target, $rndlines[$rnd];    # push random line in destination array
      splice(@rndlines,$rnd,1);         # and remove item from the stack
    }
    push @mutations,$mutations;         # store line mutation count
    push @target, $line;                # push fixed line in destination array
  }
}

# pass three, writing out the processed file to the destination
print "Pass three - write the processed file to $ARGV[1]...\n";

# but first calculate the amounts of possible mutations in a source
$mutlines = 1;
$mutvariables = 1;
$mutranges = 1;
foreach $mutations (@mutations)
{
  $mutlines *= $mutations;
}
foreach $mutvars (@mutvars)
{
  $mutvariables *= $mutvars;
}
foreach $mutrnd (@mutrnds)
{
  $mutranges *= $mutrnd;
}
$mutcount = $mutlines * $mutvariables * $mutranges;

# array @header will have the header asm (inline document)
@header = <<HEADER;
;=====( RRRACC generated file )================================================
; Source : $ARGV[0]
; Target : $ARGV[1]
; Processed by RRRACC Version : [[[ $id ]]]
; 1 mutation out of $mutcount possible complete mutations
; 1 mutation out of $mutlines possible line mutations
; 1 mutation out of $mutvariables possible variable mutations
; 1 mutation out of $mutranges possible variable ranges
; RRRACC written by Rajaat (comments to rajaat\@itookmyprozac.com)
;==============================================================================
HEADER
# end of HEADER inline document

print OUTFILE @header;          # write header to outfile

foreach (@target)               # write all lines in the @target array
{
  print OUTFILE;                # to the outfile
}
close (INFILE);                 # close input
close (OUTFILE);                # close output

# print end message of RRRACC (inline document)
print <<FOOT;
Done.

Statistics:
  1 mutation out of $mutcount possible complete mutations
  1 mutation out of $mutlines possible line mutations
  1 mutation out of $mutvariables possible variable mutations
  1 mutation out of $mutranges possible variable ranges

Please be nice and do not remove the header of the generated file, it is
for educational purpose only. :-)

Comments to rajaat\@itookmyprozac.com
FOOT
# end of inline document FOOT
# the program exits here
