Dissecting a Compiled Program

As a programmer, I always wondered how a compiled program operates.

Questions like:

Where are our variables stored?

How does a computer organize our program's code and data in virtual memory?


I will try to give you most, if not all, of the vital information you will find useful in your programming career. And yes, memory segmentation is the right topic if we want to dive deeper into a compiled program's memory.

Don't worry, this won't be a long, booooring post.


* Who should read this:

    -- Any programmer who wants to understand a compiled program's memory. ("Why and how does it work?" is every good programmer's question.)

    -- This is a must-read for those who are into security research / cybersecurity.


* Any prerequisites?

    -- Yes, of course: basic programming knowledge. (This post is light on code, but you will understand it better if you know the basics of a programming language.)

    -- Basic assembly language. (It's OK if you don't know assembly, but it's always better to know at least the basics.)







A compiled program's memory is typically divided into five segments. They are:

TEXT segment,

DATA segment,

BSS segment,

STACK segment and

HEAP segment, each with its own special purpose.




Text segment:

The text segment (also called the code segment) is where our program's machine-language instructions are stored. When we run our program, the EIP register (told you to learn assembly) is set to the program's first instruction, its entry point, in this text segment.


Here is how the CPU executes the machine code stored in the text segment:

1. Read and execute the instruction pointed to by the EIP register (EIP is just a special variable inside the CPU).

2. Add the length of that instruction to EIP so that it points to the next instruction.

3. Go back to step 1.

Sometimes, the instruction EIP points to will be a jump, which moves execution to a different address in memory (within the text segment). The processor doesn't really care about this change, because it already expects a program's execution to be non-linear: a jump simply loads a new address into EIP.

Even if EIP has jumped to a different address (near or far), the processor still follows the same steps above.

Remember: the text segment is read-only, since it stores only code and not variables. Making it non-writable protects the program's instructions from being modified while it runs. Also note that this segment has a fixed size, because nothing in it changes at run time.


Data and BSS segments:

These segments store the program's global and static variables. The data segment holds initialized global and static variables, while the BSS segment holds their uninitialized counterparts (which start out as zero).

Global and static variables keep their values for the whole lifetime of the program, because they live in these dedicated segments rather than in a function's temporary storage.

That's it for these segments? Yes. Easy, right? NO... just wait a while! (Just kidding.)


Heap segment:

The heap segment is a region of memory that the programmer controls directly. You can request as much memory as you need from it, up to what the system can provide (like for storing your unlimited wishes). Naturally, given its purpose, its size is not fixed, so it can grow larger and smaller as needed.

Remember the malloc, realloc and free functions you use in your C programs? They use this segment to allocate, resize and release memory on the fly.

Note: the heap traditionally grows toward higher memory addresses. (What? That's natural. Yes, I know, but keep it in mind.)


Stack segment: 

This segment is used as a temporary scratch pad to store functions' local variables and context during function calls. And obviously, its size is also not fixed. But unlike the heap, the stack grows toward lower memory addresses (it's OK if this confuses you; it will make more sense later).




Another image I stole from the internet. (I am a hacker ;)

As you can see from the figure above, the heap grows toward higher memory addresses and the stack does the opposite. Don't worry about the stack part of the figure yet.




More on the stack:

When a program calls a function, the actual code of the function is in the text (or code) segment, while all of that function's local variables live at a different location, in the stack segment.


The stack is made up of stack frames. What is a stack frame?

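A stack frame can be thought of as a function call's private storage area. A frame is created only when a function is called and is discarded when it returns, so the number of frames on the stack at any moment equals the number of calls that have not yet returned, not the number of functions in the program.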


Since the context and EIP (the instruction pointer) change when a function is called, the stack is used to remember the arguments passed to the function and the location EIP should return to once the function is finished, while the calling function's own variables stay intact in its own frame.

Confused? Don't worry, all your doubts will be cleared as you read along.

When you google "stack", you will find that it is an abstract data structure. It is FILO (First In, Last Out) in nature, which means that the first item put into a stack is the last item to come out of it. Let's think of a stack with an example. Suppose you have a pile, or stack, of books. To remove the first book (the bottom one), you need to remove all the others on top of it first. But can't you just pull out the bottom book? NO, you can't, and that's how a stack works: first in, last out (or, equivalently, last in, first out).

Putting an item onto a stack is called pushing, and removing one is called popping.



Note: the stack figure above is the standard stack diagram (without any security features).



The figure above shows the contents of a stack frame. The top of the frame (the lowest addresses, since the stack grows downward) holds all of the function's local variables.

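The ESP register always points to the top of the stack. The EBP register (the frame pointer) stays fixed while the function runs, so the function references its arguments and local variables as offsets from EBP. For example, in the classic 32-bit x86 cdecl layout, EBP+4 holds the return address, the first argument is at EBP+8, the second at EBP+12 and the third at EBP+16, while local variables sit at negative offsets such as EBP-4.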

The RETURN (saved return address) holds the address at which execution will continue once the current function finishes.

The SAVED EBP is the calling function's EBP value, stored so it can be restored when the current function returns.

The ARGS are the values passed to the function (its arguments).



So, now we have a brief idea of how the above stack frame is used.

Let's recap: 

After the current function finishes executing:

1. ESP is moved back to EBP (discarding the local variables), and the saved EBP is restored into the EBP register.

2. The saved RETURN address is popped off the stack frame into EIP.

3. Execution continues from the RETURN address.



That's it for the stack. There is a lot more to explain about memory segmentation. If you want more detail on the stack or any other memory-segmentation topic, let me know. I hope I answered some of your questions.

Congratulations, you now have a basic understanding of the internals of a compiled program (some people don't even know this yet).

What topic do you want to see next?
