Introduction to Linux Intel Assembly Language

Norman Matloff

February 5, 2002

2001, 2002, N.S. Matloff

Contents

1 Overview

2 Different Assemblers

3 Assembler Command-Line Syntax

4 Sample Program

5 16-Bit, 8-Bit and String Operations

6 Linking into an Executable File

7 What If You Compile a C Program?

8 How to Execute Those Sample Programs

8.1 ``Normal'' Execution Won't Work

8.2 Running Our Programs Using gdb/ddd

8.2.1 Use a Debugging Tool for ALL of Your Programming, in EVERY Class

8.2.2 Using ddd for Executing Our Assembly Programs

8.2.3 Using gdb for Executing Our Assembly Programs

8.3 An Assembly-Language Specific Debugger: ald

1 Overview

This document introduces the use of assembly language on Linux systems. The intended audience is students in the first week or two of a computer systems or assembly language course. It is assumed that the reader is already familiar with Unix and has had some exposure to the Intel registers and instruction set.

2 Different Assemblers

Our emphasis will be on as (sometimes written gas, for ``GNU assembler''), the assembler used by gcc (it is distributed with GNU binutils). Its syntax is commonly referred to as the ``AT&T syntax,'' alluding to Unix's Bell Labs origins.

However, we will also be using another commonly-used assembler, NASM. It uses Intel's syntax, which is similar to that of as but does differ in some ways. For example, for two-operand instructions, as has us specify the source first while NASM wants the destination first.

It is very important to note, though, that the two assemblers will produce the same machine code. Unlike a compiler, whose output is unpredictable, we know ahead of time what machine code an assembler will produce, because the assembly-language mnemonics are merely handy abbreviations for specific machine-language bit fields.

Suppose for instance we wish to copy the contents of the AX register to the BX register. In as we would write

mov %ax,%bx

while in NASM it would be

MOV BX,AX

but the same machine code will be produced in both cases, 0x6689c3 (that is, the three bytes 66 89 c3).
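To see why the two assemblers have to agree here, it can help to build those three bytes by hand. The following Python sketch is not part of either toolchain; the helper name encode_mov_r16_r16 and the REG16 table are made up for illustration. It assumes 32-bit mode, where the 0x66 prefix selects 16-bit operands, and it assumes the assembler chooses opcode 0x89 (MOV r/m16, r16), which is the encoding that matches the result quoted above.

```python
# Register numbers used in the ModRM byte for the 16-bit general registers.
REG16 = {"ax": 0, "cx": 1, "dx": 2, "bx": 3, "sp": 4, "bp": 5, "si": 6, "di": 7}

def encode_mov_r16_r16(src, dst):
    """Encode 'copy src to dst' for two 16-bit registers (32-bit mode assumed)."""
    prefix = 0x66   # operand-size override: use 16-bit operands
    opcode = 0x89   # MOV r/m16, r16: the source register goes in the reg field
    # ModRM byte: mod=11 (register direct), reg=source, r/m=destination
    modrm = (0b11 << 6) | (REG16[src] << 3) | REG16[dst]
    return bytes([prefix, opcode, modrm])

# "mov %ax,%bx" in as, "MOV BX,AX" in NASM
print(encode_mov_r16_r16("ax", "bx").hex())   # 6689c3
```

The source-first versus destination-first difference between the two syntaxes disappears at this level: both spellings name the same source and destination, so the same bit fields get filled in.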

3 Assembler Command-Line Syntax

To assemble an AT&T-syntax source file, say x.s (UNIX custom is that assembly-language files end with a .s suffix), we will type

as -a --gstabs -o x.o x.s

The -o option specifies the name of the object file, i.e. the machine-code file, which is the primary output of the assembler. In effect we are telling the assembler, ``The name we want for the .o file immediately follows,'' in this case x.o.

The -a option tells the assembler to display to the screen the source code, machine code and segment offsets side-by-side, for easier correlation.

The --gstabs option tells the assembler to retain the symbol table, a list of the locations of whatever labels are in x.s, in the object file x.o. This is used by symbolic debuggers, in our case gdb or ddd.

If the file were instead in Intel syntax, our command would be

nasm -f elf -o x.o -l x.l x.s

The -f option instructs the assembler to set up the x.o file so that the executable file constructed from it later on will be of the ELF format, which is a common executable format on Linux platforms. The -l option plays a similar role to -a in as, in that a side-by-side listing of source and machine code will be written to the file x.l.

Things are similar under other operating systems. Using the Microsoft or Turbo compilers, for example, assembly-language source files have the suffix .asm, object files have the suffix .obj, etc.

4 Sample Program

In this very simple example, we find the sum of the elements in a 4-word array, x.

First, the program using AT&T syntax:

# introductory example; finds the sum of the elements of an array

.data   # start of data segment

x:
      .long 1
      .long 5
      .long 2
      .long 18

sum:
      .long 0

.text   # start of code segment

.globl _start
_start:
      movl $4, %eax      # EAX will serve as a counter for
                         # the number of words left to be summed
      movl $0, %ebx      # EBX will store the sum
      movl $x, %ecx      # ECX will point to the current
                         # element to be summed
top:  addl (%ecx), %ebx
      addl $4, %ecx      # move pointer to next element
      decl %eax          # decrement counter
      jnz top            # if counter not 0, then loop again
done: movl %ebx, sum     # done, store result in "sum"

And the version using Intel syntax:

; introductory example; finds the sum of the elements of an array

SECTION .data   ; start of data segment

global x
x:
      dd 1
      dd 5
      dd 2
      dd 18

sum:
      dd 0

SECTION .text   ; start of code segment

      mov eax,4       ; EAX will serve as a counter for
                      ; the number of words left to be summed
      mov ebx,0       ; EBX will store the sum
      mov ecx, x      ; ECX will point to the current
                      ; element to be summed
top:  add ebx, [ecx]
      add ecx,4       ; move pointer to next element
      dec eax         ; decrement counter
      jnz top         ; if counter not 0, then loop again
done: mov [sum],ebx   ; done, store result in "sum"
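Before going through the program line by line, it may help to trace what it computes. The Python sketch below is only a rough model, not something the assembler or the CPU runs: the variables eax, ebx and ecx stand in for the registers, a bytearray stands in for the data segment, and the addresses X_ADDR and SUM_ADDR are hypothetical (the real ones are fixed when the program is linked and loaded).

```python
import struct

# Data segment: four 32-bit little-endian words for x, then one for sum.
memory = bytearray(struct.pack("<5i", 1, 5, 2, 18, 0))
X_ADDR, SUM_ADDR = 0, 16   # hypothetical addresses, for illustration only

eax = 4        # counter: number of words left to be summed
ebx = 0        # running sum
ecx = X_ADDR   # pointer to the current element

while True:                                           # top:
    ebx += struct.unpack_from("<i", memory, ecx)[0]   #   addl (%ecx), %ebx
    ecx += 4                                          #   addl $4, %ecx
    eax -= 1                                          #   decl %eax
    if eax == 0:                                      #   jnz top falls through
        break                                         #   once EAX reaches 0

struct.pack_into("<i", memory, SUM_ADDR, ebx)         # done: movl %ebx, sum
print(ebx)   # 26
```

Note that the pointer advances by 4 each time around the loop because each element occupies four bytes, while the counter goes down by 1.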

Let's discuss this in the context of the AT&T syntax.

First, we have the line

.data # start of data segment

The fact that this begins with `.' signals to the assembler that this is a directive, meaning a command to the assembler rather than something the assembler will translate into an instruction. (The `#' character means that it and the remainder of the line are to be treated as a comment.) This directive indicates that what follows will be data rather than code.

Next

x:
      .long 1
      .long 5
      .long 2
      .long 18

This tells the assembler to make a note in x.o saying that when this program is later loaded for execution, there will be four consecutive ``long'' (i.e. 32-bit) words in memory set with initial values 1, 5, 2 and 18 (decimal). Moreover, we are telling the assembler that in our assembly code below, the first of these four long words will be referred to as x. We say that x is a label for this word. Similarly, immediately following those four long words in memory will be a long word which we will refer to in our assembly code below as sum.
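One way to picture this is as a table of label offsets plus an image of the bytes that will sit in memory. The Python sketch below is a toy model of that bookkeeping, not what as actually does internally; the function lay_out and its input format are invented for illustration. It relies on the fact that Intel machines store a 32-bit word with its least significant byte first.

```python
import struct

def lay_out(directives):
    """Return (image, labels): the data bytes and each label's offset into them."""
    image, labels = bytearray(), {}
    for label, value in directives:
        if label is not None:
            labels[label] = len(image)     # a label names the next location
        image += struct.pack("<i", value)  # one .long = 4 bytes, little-endian
    return bytes(image), labels

image, labels = lay_out([("x", 1), (None, 5), (None, 2), (None, 18), ("sum", 0)])
print(labels)        # {'x': 0, 'sum': 16}
print(image.hex())   # 0100000005000000020000001200000000000000
```

The label sum lands 16 bytes after x simply because four 4-byte words come before it. The offsets here are relative to the start of the data; actual addresses are assigned later, at link and load time.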

By the way, what if x had been an array of 1,000 long words instead of four, with all words to be initialized to, say, 8? Would we need 1,000 lines? No, we could do it this way:

x:
      .rept 1000
      .long 8
      .endr

The .rept directive tells the assembler to act as if the lines following .rept, up to the one just before .endr, are repeated the specified number of times.
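The effect is the same as if the source text had been expanded before assembly. The Python sketch below mimics that expansion for simple, non-nested blocks; expand_rept is a hypothetical helper written for this illustration, not part of as.

```python
def expand_rept(lines):
    """Expand non-nested .rept/.endr blocks into repeated copies of their bodies."""
    out, body, count = [], None, 0
    for line in lines:
        words = line.split()
        if words and words[0] == ".rept":
            count, body = int(words[1]), []   # start collecting the block body
        elif words and words[0] == ".endr":
            out.extend(body * count)          # emit the body `count` times
            body = None
        elif body is not None:
            body.append(line)                 # inside a .rept block
        else:
            out.append(line)                  # ordinary line, passed through
    return out

source = ["x:", ".rept 1000", ".long 8", ".endr"]
expanded = expand_rept(source)
print(len(expanded))   # 1001: the label line plus 1000 copies of ".long 8"
```

So the four source lines stand for a label followed by 1,000 .long directives, which in turn reserve 4,000 bytes of initialized data.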

Next we have a directive signalling the start of the text segment, meaning actual program code. Look at the first two lines:

_start:
      movl $4, %eax

Here _start is another label, in this case for the location in memory at which execution of the program is to begin, called the entry point , in this case that movl instruction. We did not choose the name for this label arbitrarily, in contrast to all the others; the UNIX linker takes this as the default.

The movl instruction copies the constant 4 to the EAX register. The `l' in ``movl'' means ``long.'' The corresponding Intel syntax,

mov eax,4

has no su
