Introduction to Linux Intel Assembly Language

Norman Matloff
February 5, 2002
©2001, 2002, N.S. Matloff

Contents

1 Overview
2 Different Assemblers
3 Assembler Command-Line Syntax
4 Sample Program
5 16-Bit, 8-Bit and String Operations
6 Linking into an Executable File
7 What If You Compile a C Program?
8 How to Execute Those Sample Programs
    8.1 "Normal" Execution Won't Work
    8.2 Running Our Programs Using gdb/ddd
        8.2.1 Use a Debugging Tool for ALL of Your Programming, in EVERY Class
        8.2.2 Using ddd for Executing Our Assembly Programs
        8.2.3 Using gdb for Executing Our Assembly Programs
    8.3 An Assembly-Language Specific Debugger: ald

1 Overview

This document introduces the use of assembly language on Linux systems. The intended audience is students in the first week or two of a computer systems/assembly language course. It is assumed that the reader is already familiar with Unix and has had some exposure to the Intel register and instruction set.

2 Different Assemblers

Our emphasis will be on as (sometimes also written gas, for "GNU assembler"), the assembler that comes with the GNU binutils package and that gcc itself invokes. Its syntax is commonly referred to as the "AT&T syntax," alluding to Unix's Bell Labs origins.

However, we will also be using another commonly used assembler, NASM. It uses Intel syntax, which is similar to that of as but differs in some ways. For example, in two-operand instructions as has us specify the source first, while NASM wants the destination first; as also marks register names with % and constants with $, as the examples below show.

It is very important to note, though, that for a given instruction the two assemblers will produce the same machine code. Unlike a compiler, whose output depends on its code-generation choices and optimization settings, an assembler translates each instruction in a way we can know ahead of time, because assembly-language mnemonics are merely handy abbreviations for specific machine-language bit fields.

Suppose for instance we wish to copy the contents of the AX register to the BX register. In as we would write

mov %ax,%bx

while in NASM it would be

MOV BX,AX

but the same machine code will be produced in both cases: the three bytes 0x66, 0x89 and 0xc3 (when assembling 32-bit code, as we do here).

3 Assembler Command-Line Syntax

To assemble an AT&T-syntax source file, say x.s (UNIX custom is that assembly-language files end with a .s suffix), we will type

as -a --gstabs -o x.o x.s

The -o option specifies what to call the object file, i.e. the machine-code file, which is the primary output of the assembler. In effect we are telling the assembler, "The name we want for the .o file immediately follows," in this case x.o.

The -a option tells the assembler to display to the screen the source code, machine code and segment offsets side-by-side, for easier correlation.

The --gstabs option tells the assembler to include debugging information in x.o, in the "stabs" format. Together with the symbol table, a list of the locations of whatever labels are in x.s, this lets symbolic debuggers, in our case gdb or ddd, relate machine-code addresses back to the labels and lines of x.s.

If the file were instead in Intel syntax, our command would be

nasm -f elf -o x.o -l x.l x.s

The -f option specifies the format of the object file x.o; here we ask for ELF, the standard object and executable file format on Linux, so that the executable file later built from x.o will be in ELF format too. The -l option plays a role similar to that of -a in as, in that a side-by-side listing of source and machine code will be written to the file x.l.

Things are similar under other operating systems. With the Microsoft or Turbo assemblers, for example, assembly-language source files have the suffix .asm, object files have the suffix .obj, and so on.

4 Sample Program

In this very simple example, we find the sum of the elements in a 4-word array, x.

First, the program using AT&T syntax:

# introductory example; finds the sum of the elements of an array
.data                 # start of data segment
x:
      .long 1
      .long 5
      .long 2
      .long 18
sum:
      .long 0
.text                 # start of code segment
.globl _start
_start:
      movl $4, %eax   # EAX will serve as a counter for
                      # the number of words left to be summed
      movl $0, %ebx   # EBX will store the sum
      movl $x, %ecx   # ECX will point to the current
                      # element to be summed
top:  addl (%ecx), %ebx
      addl $4, %ecx   # move pointer to next element
      decl %eax       # decrement counter
      jnz top         # if counter not 0, then loop again
done: movl %ebx, sum  # done, store result in "sum"

And the version using Intel syntax:

; introductory example; finds the sum of the elements of an array
SECTION .data         ; start of data segment
global x
x:
      dd 1
      dd 5
      dd 2
      dd 18
sum:
      dd 0
SECTION .text         ; start of code segment
global _start
_start:
      mov eax,4       ; EAX will serve as a counter for
                      ; the number of words left to be summed
      mov ebx,0       ; EBX will store the sum
      mov ecx,x       ; ECX will point to the current
                      ; element to be summed
top:  add ebx,[ecx]
      add ecx,4       ; move pointer to next element
      dec eax         ; decrement counter
      jnz top         ; if counter not 0, then loop again
done: mov [sum],ebx   ; done, store result in "sum"

Let's discuss this in the context of the AT&T syntax.

First, we have the line

.data # start of data segment

The fact that this begins with '.' signals to the assembler that this is a directive, meaning a command to the assembler rather than something the assembler will translate into a machine instruction. (The '#' character means that it and the remainder of the line are to be treated as a comment.) This particular directive indicates that what follows will be data rather than code.

x:
      .long 1
      .long 5
      .long 2
      .long 18

This tells the assembler to make a note in x.o saying that when this program is later loaded for execution, there will be four consecutive "long" (i.e. 32-bit) words in memory, set to the initial values 1, 5, 2 and 18 (decimal). Moreover, we are telling the assembler that in our assembly code below, the first of these four long words will be referred to as x. We say that x is a label for this word. Similarly, immediately following those four long words in memory will be a long word which we will refer to in our assembly code below as sum.

By the way, what if x had been an array of 1,000 long words instead of four, with all words to be initialized to, say, 8? Would we need 1,000 lines? No, we could do it this way:

x:
      .rept 1000
      .long 8
      .endr

The .rept directive tells the assembler to act as if the lines following .rept, up to the one just before .endr, are repeated the specified number of times.

Next we have a directive signalling the start of the text segment, meaning actual program code. Look at the first two lines:

_start:
      movl $4, %eax

Here _start is another label, in this case for the location in memory at which execution of the program is to begin, called the entry point, which here is that movl instruction. Unlike all the other labels, this name was not chosen arbitrarily: the UNIX linker, ld, uses _start as the entry point by default. (That is also why the program declares it with .globl _start; the label must be global for the linker to see it.)

The movl instruction copies the constant 4 to the EAX register. The 'l' in "movl" means "long." The corresponding Intel syntax,

mov eax,4 has no su
