Windows assembly language primer

Because Intel x86 family processors use variable-length instructions, some instructions are longer than others. The following references can be consulted for computing the opcode of a given instruction.

We will come to appreciate the amount of work done by the assembler in performing these translations. For that we use the debugger's single-step command, so that the program can be executed in steps. If we call a subroutine, each instruction of that subroutine will also be executed in steps.

Under the Windows x64 calling convention, the first four arguments are passed in registers; the fifth and higher arguments are passed on the stack. A scalar return value that fits into 64 bits is returned through RAX. Floats, doubles, and vector types are returned in XMM0.

Volatile registers: They can be changed by a function that uses them, so a caller cannot rely on their contents after a call.

Nonvolatile registers: They must be saved and restored by a function that uses them.
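The argument and register rules above can be sketched as a small lookup. This is only an illustration and assumes the Microsoft x64 calling convention for integer and pointer arguments; the function names `integer_arg_location` and `must_preserve` are made up for this example.

```python
# Assumption: Microsoft x64 calling convention, integer/pointer arguments only.
INT_ARG_REGISTERS = ("RCX", "RDX", "R8", "R9")

# Volatile registers may be changed by a called function.
VOLATILE = {"RAX", "RCX", "RDX", "R8", "R9", "R10", "R11"}
# Nonvolatile registers must be saved and restored by a function that uses them.
NONVOLATILE = {"RBX", "RBP", "RDI", "RSI", "RSP", "R12", "R13", "R14", "R15"}


def integer_arg_location(position):
    """Return where the integer argument at 1-based `position` is passed."""
    if position < 1:
        raise ValueError("argument positions start at 1")
    if position <= len(INT_ARG_REGISTERS):
        return INT_ARG_REGISTERS[position - 1]
    return "stack"  # the fifth and higher arguments


def must_preserve(register):
    """True if a function that uses `register` has to save and restore it."""
    return register.upper() in NONVOLATILE


for n in range(1, 7):
    print(n, integer_arg_location(n))
```

A function that needs a scratch register without extra bookkeeping would pick from `VOLATILE`; anything in `NONVOLATILE` costs a save and a restore.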

If there is no error, you are good to go. To debug the salam program line by line, we first need to add breakpoints in the program. Has no effect on program execution. If a breakpoint is added, the program will run until that breakpoint.

When the program is run, it will halt at the first breakpoint encountered. nexti (ni): Execute the next instruction. Starting address is main. Starting address is the name of the string. End at the zero byte, since zero is the string termination character.
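The idea of reading a string from a starting address up to the terminating zero byte can be sketched as follows. The buffer contents and the offset named `msg` are invented for this example; a debugger does the equivalent against real process memory.

```python
def read_zero_terminated(memory, start):
    """Return the ASCII text from `start` up to, but not including, the zero byte."""
    end = memory.index(0, start)  # raises ValueError if no terminator follows
    return memory[start:end].decode("ascii")


# Hypothetical data section: two filler bytes, then the string, then other data.
data_section = b"\x90\x90Salam\x00junk"
msg = 2  # assumed offset of the string's label

print(read_zero_terminated(data_section, msg))
```

Without the zero byte there is nothing to tell the reader where the string stops, which is why the terminator matters.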

The msg is defined in the data section. Any other register can be used in place of RAX.

Abu Bakar.

GDB will disassemble the next line of code after executing the present instruction. Print the flags from the status register. Print 10 values in hexadecimal format, each value being one byte.

Because some nybble patterns can look just like a number, it's often necessary to somehow indicate that we're talking about a pattern.

A more common convention is to just add the letter H to the end of the pattern: H. In both conventions, the H is referring to hexadecimal. Eventually you'll want to learn about using the hexadecimal number system, since it is an important way to use bit patterns. I'm not going to discuss it in this primer, because a number of books have much better treatments of this topic than I could produce.

Consider this an advanced topic you'll want to fill in later.

Some of the possible manipulations are copying patterns from one place to another, turning on or turning off certain bits, or interpreting the patterns as numbers and performing arithmetic operations on them. To perform any of these actions, the processor has to know what part of memory is to be worked on. An address is a pointer into memory. Each address points to the beginning of a byte-long chunk of memory.

The processor has the capability to distinguish 1,048,576 different bytes of memory. By this point, it probably comes as no surprise to hear that addresses are represented as patterns of bits. It takes 20 bits to get a total of 1,048,576 different patterns, and thus an address may be written down as a series of 5 nybble codes. For example, DOS stores a pattern which encodes information about what equipment is installed on your IBM PC in a word at a fixed location. Interpreting the address as a hex number, the second byte of this word has an address 1 greater than that of the first byte. The processor isn't very happy handling 20 bits at a time.

The biggest chunk that's convenient for it to use is a 16 bit word. The processor actually calculates 20 bit addresses as the combination of two words, a segment word and an offset word. The combination process involves interpreting the two patterns as hexadecimal numbers and adding them.

The way that two 16 bit patterns can be combined to give one 20 bit pattern is that the two patterns are added out of alignment by one nybble (each letter below stands for one nybble):

    SSSS      4 nybble segment
     OOOO     4 nybble offset
    -----
    AAAAA     5 nybble address

Because of this mechanism for calculating addresses, they will often be written down in what may be called segment:offset form.
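The out-of-alignment addition amounts to shifting the segment left by one nybble (4 bits) and adding the offset. The sketch below is a minimal illustration; the sample values are arbitrary, and masking the result to 20 bits assumes a machine with only 20 address lines.

```python
def physical_address(segment, offset):
    """Combine a 16 bit segment and a 16 bit offset into a 20 bit address."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must each fit in 16 bits")
    # Shift the segment one nybble to the left, add the offset,
    # and keep only 20 bits (assumed wrap-around at the top of memory).
    return ((segment << 4) + offset) & 0xFFFFF


def segment_offset_form(segment, offset):
    """Write an address in segment:offset form using nybble codes."""
    return f"{segment:04X}:{offset:04X}"


seg, off = 0x0040, 0x0010  # arbitrary sample values
print(segment_offset_form(seg, off), "->", f"{physical_address(seg, off):05X}")
```

Note that different segment:offset pairs can name the same byte: with these sample values, 0040:0010 and 0041:0000 both produce the address 00410.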

The significance of the patterns is determined by what the computer is being used for at any given time. The processor can look at memory and interpret a pattern it sees there as specifying one of the fundamental operations it knows how to do. Note that there is no way for the processor to know whether a given pattern is meant to be an instruction, or a piece of data to operate on. It is quite possible for the chip to accidentally begin reading what was intended to be data, and interpret it as a program.

Some pretty bizarre things can occur when this happens. In assembly language programming circles, this is known as "crashing the system".

For example, the pattern which tells the processor to flip all the bits in the byte at address 5555 is:

    F6 16 55 55

which is not very informative, although you can see the address in there. In ancient history, the old wood-burning and vacuum tube computers were programmed by laboriously figuring out bit patterns which represented the series of instructions desired.

Needless to say, this technique was incredibly tedious, and very prone to errors. It finally occurred to these ancestral programmers that they could give the task of figuring out the proper patterns to the computer itself, and assembly language programming was born. For example, in boolean algebra, the logical operation which inverts the state of a bit is called "not", and hence the assembly language equivalent of the preceding machine language pattern is:

    NOTB [5555H]

The brackets around the 5555H roughly mean "the memory location addressed by".
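To see how much information is packed into the four bytes F6 16 55 55, here is a rough decoding sketch. It handles only this one instruction shape (opcode F6 with a direct 16 bit address) and is not a general disassembler; the function name is made up for the example.

```python
# Opcode F6 is a group of byte operations; a 3 bit field in the second byte
# selects which one. Only two members of the group are listed here.
F6_OPERATIONS = {2: "NOT", 3: "NEG"}


def decode_f6_direct(code):
    """Decode F6 <modrm> <low> <high> where the operand is a direct address."""
    opcode, modrm, low, high = code
    if opcode != 0xF6:
        raise ValueError("this sketch only understands opcode F6")
    mod = modrm >> 6             # top two bits: addressing mode
    operation = (modrm >> 3) & 7  # middle three bits: which operation
    rm = modrm & 7               # bottom three bits: operand selector
    if (mod, rm) != (0, 6):
        raise ValueError("this sketch only understands direct addresses")
    address = low | (high << 8)  # the address is stored low byte first
    return F6_OPERATIONS[operation], address


name, address = decode_f6_direct(bytes.fromhex("F6165555"))
print(name, f"[{address:04X}H]")
```

The address bytes are stored low byte first, which is invisible in this case because both bytes are 55.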

Unfortunately, the processor can't make head nor tail of the string of characters "NOTB". What's needed is a special program to run on the processor which converts the string "NOTB" into the pattern that begins with F6. This program is called an assembler. A good analogy is that an assembler program is like a meat grinder which takes in assembly language and gives out machine language. Typically, an assembler reads a file of assembly language and translates it one line at a time, outputting a file of machine language. The assembler's listing shows each line from the source file, along with the shorthand "nybble code" representation of the object code produced.

In the event that the assembler was unable to understand any of the source lines, it inserts error messages in the listing, pointing out the problem.
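The meat grinder can be sketched in miniature. The toy below knows only two instructions, NOP (the single byte 90) and the NOTB form shown earlier, and its listing layout is invented for illustration; it is not how CHASM itself is organized.

```python
def assemble_line(line):
    """Translate one source line into machine language bytes."""
    parts = line.split()
    if not parts:
        return b""
    mnemonic = parts[0].upper()
    if mnemonic == "NOP" and len(parts) == 1:
        return bytes([0x90])
    if mnemonic == "NOTB" and len(parts) == 2:
        operand = parts[1].upper()
        if operand.startswith("[") and operand.endswith("H]"):
            address = int(operand[1:-2], 16)  # ValueError if not hex digits
            if 0 <= address <= 0xFFFF:
                # F6 16, then the address with its low byte first
                return bytes([0xF6, 0x16, address & 0xFF, address >> 8])
    raise ValueError(f"cannot understand {line.strip()!r}")


def assemble(source):
    """Return (object code, listing lines) for a multi-line source text."""
    object_code = bytearray()
    listing = []
    for line in source.splitlines():
        try:
            code = assemble_line(line)
        except ValueError as error:
            listing.append(f"{'':<12}{line}")
            listing.append(f"*** error: {error}")
            continue
        object_code += code
        listing.append(f"{code.hex(' ').upper():<12}{line}")
    return bytes(object_code), listing


code, listing = assemble("NOTB [5555H]\nNOP\nFROB AX")
print("\n".join(listing))
```

Lines the toy cannot translate contribute no object code and get an error message in the listing directly beneath them.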

The primeval assembly language programmers had to write their assembler programs in machine language, because they had no other choice. When you think about it, there's a sort of circular logic in action here. Someday, I hope to use the present version of CHASM to produce a machine language version, which will run about a hundred times faster, and at the same time bring this crazy process full circle.

The preceding discussions have, I hope, given you some very general background, a world view if you will, about assembly and machine language programming. At this point, I'd like to get into a little more detail, beginning by examining the internal structure of the microprocessor, from the programmer's point of view.

Once you've digested this, I'd recommend going to The Book for a deeper treatment. To use the CHASM assembler, you're going to need The Book anyway, to tell you the different instructions and their mnemonics.

In assembly language, each of the registers has a two letter mnemonic name. The general purpose registers hold patterns pulled in from memory which are to be worked on within the processor. You can use these registers for just about anything you want. Each of the general purpose registers can be broken down into two 8 bit registers, which have names of their own.

The "H" and "L" stand for high and low respectively. The AX register, and its 8 bit low half, the AL register, are somewhat special. Some operations of the processor can only be carried out on the contents of these accumulators, and many others are faster when used in conjunction with these registers.

The segment registers hold segment values for use in calculating memory addresses.
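The relationship between a 16 bit register and its two 8 bit halves can be modelled in a few lines. The class below is only an illustration of the idea; the name `Register16` is invented here.

```python
class Register16:
    """A 16 bit register whose high and low bytes can be used separately."""

    def __init__(self, value=0):
        self.value = value & 0xFFFF

    @property
    def high(self):
        return self.value >> 8

    @high.setter
    def high(self, byte):
        self.value = ((byte & 0xFF) << 8) | (self.value & 0x00FF)

    @property
    def low(self):
        return self.value & 0xFF

    @low.setter
    def low(self, byte):
        self.value = (self.value & 0xFF00) | (byte & 0xFF)


ax = Register16(0x1234)
print(f"AX={ax.value:04X} AH={ax.high:02X} AL={ax.low:02X}")
ax.low = 0xFF  # writing the low half leaves the high half alone
print(f"AX={ax.value:04X}")
```

The two halves are not copies of the full register: they are the same 16 bits, viewed 8 at a time.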

The CS, or code segment register, is used every time the processor accesses memory to read an instruction pattern. The DS, or data segment register, is used for bringing data patterns in. The SS register is used to access the stack (more about the stack later). The ES is the extra segment register. A very few special instructions use the ES register to access memory, plus you can override use of the DS register and substitute the ES register, if you need to maintain two separate data areas.

Indirect addressing is beyond the scope of this little primer, but is discussed in The Book. The SP register is used to implement a stack in memory. Although it's physically possible to directly manipulate the value in the SP register, it's best to leave it alone, since you could wipe out the stack.

Finally, there are two registers which are relatively inaccessible to direct manipulation. The first always contains the offset part of the address of the next instruction to be executed. The last register, the status register, is also relatively inaccessible; its bits serve as individual flags. One of its names is somewhat steeped in history, since it was the name given to a special location in memory which served a similar function on the antique IBM mainframe.

There are special instructions which allow you to set or clear each of these flags. In addition, many instructions affect the state of the flags, depending on the outcome of the instruction. For example, one of the bits of the status register is called the Zero flag. Any operation which ends up generating a bit pattern of all 0's automatically sets the Zero flag on.

In assembly language, the only way to make a decision and branch accordingly is via this flag testing mechanism. It's very common to see a comparison operation used to set the flags just before a conditional branch.
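The compare-then-branch pattern can be imitated as follows. This is a loose model, assuming 16 bit patterns and tracking only the Zero flag; the function names are invented for the example.

```python
def compare(a, b):
    """Subtract b from a as 16 bit patterns and return the Zero flag."""
    result = (a - b) & 0xFFFF
    return result == 0  # the flag is on only if every bit of the result is 0


def branch_on_equal(value, expected):
    """Imitate a comparison followed by a conditional branch."""
    zero_flag = compare(value, expected)  # the comparison sets the flag
    if zero_flag:                         # the branch only tests the flag
        return "branch taken"
    return "fell through"


print(branch_on_equal(7, 7))
print(branch_on_equal(7, 8))
```

The branch never looks at the two values themselves. It sees only the flag left behind by the comparison, which is why the comparison has to come immediately before it.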

Each line may consist of one or more of the following parts: First, a label, which is just a marker for the assembler to use. If you want to branch to an instruction from some other part of the program, you put a label on the instruction. When you want to branch, you refer to the label. In general, the label can be any string of characters you want. A good practice is to use a name which reminds you what that particular part of the program does.

CHASM will assume that any string of characters which starts in the first column of a line is intended to be a label. After the label, or if the text of the line starts to the right of the first column, at the beginning of the text, comes an instruction mnemonic.

This specifies the operation that the line is asking for. For a list of the mnemonics, along with the instructions they stand for, see The Book.

The operands are what the operation is to work on, and are listed after the instruction mnemonic. There are a number of possible operands. Probably the most common are registers, specified by their two letter mnemonics.

Generally, immediate data is specified by its nybble code representation, marked as such by following it with the letter "H". Some assemblers allow alternate ways to specify immediate data which emphasize the pattern's intended use. CHASM recognizes five different ways to represent immediate data.

A memory location can be used as an operand. We've seen one way to do this, by enclosing its address in brackets. You can now see why the brackets are needed. Without them, you couldn't distinguish between an address and immediate data. If you've asked the assembler to set aside a section of memory for data (more on this later), and put a label on the request, you can specify that point in memory by using the label. Finally, there are a number of indirect ways to address memory locations, which you can read about in The Book.

The last major type of operands are labels. In assembly language, you specify locations which may be branched to by putting a label on them. You can then use the label as an operand on branches. Often times, the order in which the operands are listed can be important. For example, when moving a pattern from one place to another, you need to specify where the pattern is to come from, and where it's going.

Thus, to move the pattern in the DX register into the AX register, you would write:

    MOV AX,DX

The destination is written first and the source second. This may take some getting used to, since when reading from left to right it seems reasonable to assume that the transfer goes in this direction as well.

However, since this convention is pretty well entrenched in the assembly language community, CHASM goes along with it. Assembly language programs tend to be very hard to follow, and so it's particularly important to put in lots of comments so that you'll remember just what it was you were trying to do with a given piece of code. Since the assembler ignores the comments, they cost you nothing in terms of size or speed of execution in the resulting machine language program.
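The parts of a source line described above (an optional label starting in the first column, a mnemonic, operands, and a comment) can be pulled apart with a short routine. This is a simplified sketch that assumes a semicolon starts the comment, as in CHASM, and that no operand itself contains a semicolon or a comma; the function name and the sample label are invented.

```python
def parse_line(line):
    """Split a source line into label, mnemonic, operands, and comment."""
    code, _, comment = line.partition(";")  # the comment runs to end of line
    fields = code.split()
    label = mnemonic = None
    operands = []
    # Text that starts in the first column is taken to be a label.
    if fields and not code[0].isspace():
        label = fields.pop(0)
    if fields:
        mnemonic = fields.pop(0).upper()
    if fields:
        operands = [item.strip() for item in " ".join(fields).split(",")]
    return {
        "label": label,
        "mnemonic": mnemonic,
        "operands": operands,
        "comment": comment.strip(),
    }


print(parse_line("COPY MOV AX,DX ; copy DX into AX"))
print(parse_line("     MOV AX, DX"))
```

Because the comment is cut off before anything else is examined, its text never influences the code that gets generated.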

This is in sharp contrast to BASIC, where each remark slows your program down and eats up precious memory. Generally, a character is set aside to indicate to the assembler the beginning of a comment, so that it knows to skip over the rest of the line. CHASM follows a common convention of reserving the semicolon (;) for marking comments.

The stack is just a portion of memory which has been temporarily set aside to be used in a special way.

To get a picture of how the stack works, think of the spring loaded contraptions you sometimes see holding trays in a cafeteria. As each tray is washed, the busboy puts it on top of the stack in the contraption. Because the thing is spring loaded, the whole stack sinks down from the weight of the new tray, and the top of the stack ends up always being the same height off the floor.

When a customer takes a tray off the stack, the next one rises up to take its place. In the computer, the stack is used to hold data patterns, which are generally being passed from one program or subroutine to another. By putting things on the stack, the receiving routine doesn't need to know a particular address to look for the information it needs; it just pulls them off the top of the stack.

There is some jargon associated with use of the stack. Because you don't need to keep track of where the patterns are actually being kept, the stack is often used as a scratch pad area, patterns being pushed when the register they're in is needed for some other purpose, then popped out when the register is free. It's very common for the first few instructions of a subroutine to be a series of pushes to save the patterns which are occupying the registers it's about to use.

Following the analogy of the cafeteria contraption, when you pop the stack, the pattern you get is the last one which was pushed. When you pop a pattern off, the next-to-last thing pushed automatically moves to the top, just as the trays rise up when a customer removes one. Everything comes off the stack in the reverse of the order in which it went on. Of course, there are no special spring loaded memory locations inside the computer. The stack is implemented using a register which keeps track of where the top of the stack is currently located.

When you push something, the pointer is moved to the next available memory location, and the pattern is put in that spot. When something is popped, it is copied from the location pointed at, then the pointer is moved back. You don't have to worry about moving the pointer because it's all done automatically with the push and pop instructions.
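The pointer bookkeeping just described can be imitated in a few lines. In this sketch each memory cell holds one 16 bit word and the stack grows toward lower addresses, as it does on x86 processors; the class name and the size of the reserved area are arbitrary choices for the example.

```python
class Stack:
    """A stack kept in a fixed block of memory, tracked by a single pointer."""

    def __init__(self, size=16):
        self.memory = [0] * size
        self.sp = size  # empty stack: the pointer sits just past the block

    def push(self, pattern):
        if self.sp == 0:
            raise OverflowError("stack area is full")
        self.sp -= 1                           # move to the next free location
        self.memory[self.sp] = pattern & 0xFFFF  # then store the pattern there

    def pop(self):
        if self.sp == len(self.memory):
            raise IndexError("stack is empty")
        pattern = self.memory[self.sp]  # copy from the location pointed at
        self.sp += 1                    # then move the pointer back
        return pattern


stack = Stack()
for value in (0x1111, 0x2222, 0x3333):
    stack.push(value)
print([f"{stack.pop():04X}" for _ in range(3)])
```

A pop does not erase anything. The old pattern stays in memory until a later push overwrites it; only the pointer moves.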


