@database "GameDev"
@Author "Sam Jordan"
@$VER: GameDev.guide V1.1 (23.7.97)
@REM 1.1 ;New at V1.1
@REM -1.1 ;End of new block at V1.1

@Node Main "System-compliant games development on the PPC"


                   System-compliant games development on the PPC
                              1997/98 by Sam Jordan
                          © HAAGE&PARTNER Computer GmbH

Tips, thought and suggestions regarding the development of games for the PowerAMIGA


        @{"          Preface             " link Prologue}     @{"        Configurability        " link Configuration}
        @{"        Introduction          " link Introduction}     @{"            Control            " link Control}
        @{"         Philosophy           " link Philosophy}     @{"           Difficulty          " link Difficulty}
        @{"Choice of programming language" link Language}     @{"      The necessary change     " link Change}
        @{"   Structure of a PPC-game    " link Structure}     @{"     Playability / Fairness    " link Playability}
        @{"    Graphics programming      " link Graphics}     @{"         Demo-Versions         " link Demos}
        @{"         Interaction          " link Interaction}     @{"         Thoughts on 3D        " link 3D}
        @{"         RAM is slow          " link RAM}     @{"            Support            " link Address}
        @{"        MMU and Cache         " link MMU_Cache}
        @{"       Multiprocessing        " link Multiprocessing}
        @{"   Scheduling / Optimizing    " link Scheduling}


@EndNode

@Node Prologue "Preface"

The first floppy disk any newly bought AMIGA swallowed was most likely a
games-disk in most of the cases. It was in my case, at least, and I still
remember the game very well: Silkworm. That was around the end of 1991.

Games have always been one of the supporting legs of the AMIGA - together with
the operating system and the famous custom-chips. It must be for this reason
that the AMIGA got stuck with the image of a games-machine - and this
prejudice is still uttered by uninformed users of different models. But his
prejudice contains a grain of truth for sure.

To me, the prime time of AMIGA games development was at the beginning of the
nineties. The AMIGA was still doing good and games were high in demand. With
the problems and the final demise of Commodore the games began their downward
spiral as well. Quantity as well as quality started to diminish. The short
interlude with AMIGA Technologies under Escom wasn't enough to bring new
momentum to the games business. Nevertheless, the games scene was still alive
although it had of course a lot of its importance.

During all this time the games developers on competing systems did all but
sleep. In the past years gaming technology has experienced a literal boom
which was mostly caused by the ever-improving hardware. In the shadow of this
huge technology jump, the AMIGA fell by the wayside. Only in the areas where
it had been generations ahead of the competition it could now, at best, keep
competing.

The demands of gamers have fundamentally changed. Today graphics, music and
speed are what matters while a few years back values such as playability and
atmosphere were stressed a lot more. This isn't even meant to imply that
today's games don not have these qualities anymore. It rather means that the
priorities have changed. And in order to be successful in the highly
fought-over games market, one HAS to pay attention to what the player actually
wants.

The AMIGA lacked one thing: speed. Exactly this missing speed is having a
highly negative impact on the games business. If you wanted speed, you went
out and bought a competing system that boasts 3-digit CPU clocks and got your
kick out of the speed frenzy.

So let me get to the point of this document. The AMIGA is now getting what it
used to lack: speed!

By employing the PowerPC-processors the AMIGA can leap-frog ahead and be at
equal terms with the competition again. The AMIGA is given another chance at
being at the very top. The same holds true for the games.

A PowerAMIGA makes games possible that achieve the same quality as those on
the competing machines. The PowerPC-processor is state-of-the-art and can even
outpeform for example the Intel-processors.

However, in order to get the maximum performance out of games for the
PowerAMIGA, a lot of know-how is necessary. The AMIGA is not only a processor,
it consists of many single components that closely work together. As important
as the processor is the graphics hardware. If access to the video memory
completely bogs down a game even the fastest processor on this planet is of no
use. The key is to spot existing bottlenecks and to circumvent them. This
document in front of you contains some very valuable tips for
games-programming. The biggest bottlenecks are described and solutions are
presented.

The WarpOS-archive contains two demo-programs that have a high significance in
this respect. 'Cybermand' and, first and foremost, 'voxelspace'. Both programs
show how you can and should develop a game for the PPC. And both programs
impressively demonstrate what amazing speeds can be achieved in a 100%
system-compliant fashion if you have enough experience.

This document not only covers programming-techniques but also something even
more important: game-design. A game only gets bought if is capable of
convincing customers of its quality. An important matter in this context is
the creation of suitable demo-version. The very biggest part of all demo
version I have seen to this date were not worth a thing. It really is a pity
that such mistakes can destroy the large amount of work invested in a game.

Why do I take the trouble of writing such a document?
Simple: we at HAAGE&PARTNER like good games. And I want the AMIGA to step
forward into a better future - and I want the games to take that step with it.

Sam Jordan

@EndNode

@Node Introduction "Introduction"

The purpose of this document is to provide new impulses for games development
on the AMIGA. It describes what must be considered when developing games
especially for the PowerPC-processor and what the strengths and weaknesses of
the PPC are as well as how to use these to your advantage.

At first, technical matters are discussed and the biggest problems that may
occur are described. After that the aspects of game-design are covered - I
will discuss what has to be considered during the development process in order
to create a game of high playability and quality.

What exactly can be achieved with the PPC is demonstrated by the two
demo-programs 'cybermand' and, above all, 'voxelspace'. These two programs
demonstrate how a PPC-game can be structured and impressively prove what high
speed can be achieved in a completely system-compliant way.

What is necessary for super-fast games: they must be able to run under WarpOS.
WarpOS is currently the only chance to run fast games/demos on the
dual-processor-board. WarpOS was also specially optimized for games and offers
features which can be very helpful especially when developing games.

@EndNode

@Node Philosophy "Philosophy"

Right at the beginning I want to state which philosophy I think should be
applied when developing games in the future:

Future games should be completely compliant with the respective operating
system they run on, but they should also offer the user the option of
switching on certain non-compliant options.

AMIGA games used to be fast, even very fast. The secret of this high speed is
easily uncovered: the majority of all games switched off the operating system
and took over the hardware. Result: A game that ran on every AMIGA without
problems was rarely seen. Especially AMIGAs that were not yet built at the
time a game was written made a lot of problems as the programmers didn't (and
couldn't) know anything about those new machines.

As a counter-argument to my above statement, one could say:
System-compliant games are too slow.

What anybody who utters this argument should first do is to take a look at the
provided 'cybermand' and 'voxelspace' demos. After that, the argument looses
any credibility. The 'cybermand' and 'voxelspace' programs run 100%
system-compliant and are very fast nevertheless. It IS possible!

'Voxelspace' also demonstrates the second part of my philosophy: it offers the
user really 'evil' hacks as additional parameters. The user must explicitly
activate them and can then enjoy even higher speed if the program still runs
at all. This is exactly where the problem is: the hacks will possibly not work
on every machine anymore. And that is the reason why it is absolutely
mandatory that the game runs completely system-compliant before all other
things. Because if it is system compliant it will always run.

There are even more arguments in favour of the 'system-compliant + optional
hacks' philosophy. First of all: Hardware is getting ever faster.  If a
program runs system-compliant today, it will still run with later hardware but
at increased performance. If the game relies on hacks chances are quite good
it will not run at all on later hardware.

Developing system-compliant games is a lot more efficient than developing
games that keep re-inventing the wheel all the time and are only marginally
faster. At the time we are living one has to work efficiently to satisfy the
quality expectations of the users and still release the game in time. Very
often games aren't released until they are already outdated...

It is important that AMIGA-games receive a new image. The potential customer
must not be afraid that the game might not run on his machine at home. This
fear often leads to the game not getting bought at all. I know this from
personal experience. As a HighEnd-user I very often considered it too risky to
buy a game which was quite likely to be incompatible with some kind of
hardware in my system.

@EndNode

@Node Language "Choice of programming language"

If you wanted fast games in the past, there was no question on what language
to use for that. Assembly language. Anything else was too slow.

Nowadays this has changed fundamentally. Games are getting ever more complex
and bigger and bigger at the same time. As things stand today, the part that
is really responsible for the speed is only a small fraction of the entire
project.

It is absolutely obvious that it is pure madness to create projects of these
sizes in 100% assembly language. This is simply too much work and results in
too little advantage.

On the other hand, assembler is not an obsolete language. If a game was
developed entirely in a high-level language it usually is a lot slower than if
select subroutines have been written in assembly language. In my opinion, that
is what comprises the ideal mixture.

For a high-level programming language one need not look any further than C, as
it is a very portable language. For this reason a project should first be done
entirely in C. If the program is done, it gets tested. This is where a
so-called profiler can be a valuable aid: it measures the load of the single
functions and then generates a statistic that shows which functions are
time-critical and which are not. On the basis of such a profiler-statistic it
is decided which functions should be implemented in assembly language. The key
is to find a reasonable limit here - not too many functions should be adapted
but neither too few.

After choosing the appropriate functions to be adapted you should first take a
look at the assembler-output of the compiler. You should do this because very
often you need not rewrite the entire function but rather only have to edit
the assembler-output by hand. After adapting a function the game gets tested
again and the effect is analysed. This analysis is possibly very useful to
help with the decision which further functions should be re-written in
assembly language.

Let me use this opportunity to make another, quite unconventional, suggestion:
an interesting idea would be to provide such core routines as external modules
or as very small programs. If these mini-programs are well documented,
assembler-specialists could re-write these routines and then simply exchange
the module. This would also be of advantaged to the creator of the game. This
idea was used a few times when it came to ChunkyToPlanar-conversion algorithms
which could simply be exchanged as modules.

@EndNode

@Node Structure "Structure of a PPC-game"

When developing software for the dual-processor-system there is always one
decision to be made: which parts of the program will be compiled for the 68K
and which ones for the PPC?

A first approach is of course to run all computing-intensive program parts on
the PPC while all parts that call a lot of system-function and have little or
no influence on the speed are compiled for the 68K. If you are equipped with a
good developer's environment such as 'StormC', you can recompile any source
code for the respective other CPU without having to change the source code at
all.

However, this approach must be viewed in a wider context. The
dual-processor-solution is most likely to be only a first step on the way to a
pure PPC-AMIGA. It can be expected that the AMIGA-OS (or just parts of it)
will be ported to the PPC in the future and 68K-software is then executed
through an emulator.

If you then also consider that developing a game takes a significant amount of
time, it makes a lot more sense to compile most if not all parts of the
software for the PPC. The performance-losses on the dual-processor board that
are caused by the many CPU-switches when calling system functions are reduced
to a bare minimum due to the high-speed communication-interface of WarpOS.

If it should show that the performance loss is too big in certain areas of the
program, you can still compile that part for the 68K. When there is a
completely native AMIGA-OS in the future, the game will still run and the
games developer can still release an update that has the appropriate parts
compiled for the PPC.

When designing the inner loops that are most time-critical in the case of most
games, special care must be taken. You have to consider very carefully how to
design such a main loop and for which CPU to compile it. Your options are the
following approaches:

1. The entire main loop is compiled for the PPC. This results in a CPU-switch
   becoming necessary for every system call (e.g. for the message-handling).
   If a game only needs few system-calls, this approach is the most ideal one.

2. The main loop is compiled for the 68K and the computing-intensive functions
   are done on the PowerPC. When taking this approach you should be able to
   make do with very few CPU-switches. This is most likely to be the ideal
   approach very often. This is also the approach taken for the two demo-
   programs 'cybermand' and 'voxelspace', both of which make do with one or
   two PPC-calls.

3. The main loop and the computing-intensive functions run in two different
   tasks on two different CPUs. By the use of signals the two tasks can be
   synchronized. This approach is very delicate as multiprocessing very often
   has counter-productive effects.

You should always choose the most ideal approach for the inner loops in order
to ensure maximum performance. If this approach is not ideal for a pure
PPC-system, you can still create an update that achieves maximum performance
on the target system.

As a general rule: system-calls are a large performance-factor if they cause a
CPU-switch. If only few calls are necessary, this has very little effect - if
a lot of CPU-switches occur, however, this has a detrimental effect on
performance. For this reason it is vital to create the structure of the
game-core in a way that minimizes the amount of CPU-switches.

@EndNode

@Node Graphics "Graphics programming"

In this section we will cover the technical aspects of games programming,
especially graphics programming. If you want to proceed to the technical part
immediately, please go to the @{"Main Graphics Menu" link Graphics 61}.

Nowadays the main weakness of the AMIGA is graphics access - exactly that
which use to be one of its strengths in the past. The famous custom-chips of
the AMIGA were mainly designed for scrolling and sprite handling - areas in
which the AMIGA is still able to shine today. In the age of 3D-games, the
custom chips are unfortunately useless.

On top of that, the custom-chips can only operate in Chip-RAM. That means that
if the processor were to provide the custom-chips with data, it would have to
write all data into the unbelievably slow Chip-RAM. The access to the
Chip-Memory was ideally suited to the 68000 ages ago. Needless to say that
this kind of access speed is in-acceptable for a processor that is more than a
hundred times faster.

Where is the way out of this problem? If a high performance is desired, a
graphics board is necessary. And exactly that is currently the main weakness
of the AMIGA:

1. The so-called 'standard' AMIGA does not have a graphics board as of today.

2. AMIGA graphics-boards are incredibly expensive compared to similar boards
   on competing systems and offer a worse performance at the same time. The
   high price seems to be related to the small market volume as graphics-board
   owners still are a minority.

3. The normal AMIGA graphics-system is based on bitplanes. However, many games
   (especially 3D-games) rely on the chunky-format used by most graphics
   boards for the internal representation of graphics. This results in further
   performance loss of the software because the graphics data has to be converted
   to bitplane-format first. In this case the performance increase caused by
   the PPC-processor becomes minimal because the algorithm is heavily memory-
   dependent and of course especially because it has to write into the slow
   Chip-RAM all of the time.

4. The largest part of games does not support graphics boards. As a direct
   result, games run hopelessly slower even on fast 68K-systems equipped with
   a 68040 or 68060 than on competing systems that don't even have a much faster
   processor. A PowerPC-processor doesn't even make sense for any of those
   games.

When using a PowerPC-processor, a graphics board becomes an absolute must. For
this reason games written for the PPC must support graphics boards in the most
optimal way possible. In the following part, progamming graphics boards is
described as well the system-compliant programming of ECS/AGA-graphics.

At this point I want to place a remark to a software which already supports
a lot of the techniques explained below and which offers appropriate
functions to the programmer: RTGMaster from Steffen Haeuser (found on
Aminet under gfx/board). Game programmers should really consider to
do the graphics programming with RTGMaster instead of coding everything
new. RTGMaster is still developed further intensively and supports
in future new graphics hardware.

The following explanations are of interest for these programmers who
can't use RTGMaster due to certain reasons.


                        @{"      ECS/AGA-programming     " link GR_Old}
                        @{"  Graphics-boards programming " link GR_CGFX}
                        @{"          CyberGFX+           " link GR_CGFXPlus}
                        @{"          TurboGFX            " link GR_TurboGFX}


@EndNode

@Node GR_Old "ECS/AGA-Programming"

The major part of all AMIGAs still has no graphics card. When developing a
game you should always analyse your target market - if a large part of this
market does not own a graphics board, ECS and AGA must be supported. It is
important that a game not only supports the LowEnd-AMIGAs but also the
HighEnd-AMIGAs. So a modern game must be able to run on different graphics
systems.

In the following part, no distinction will be made between AGA and ECS as this
difference is negligible when programming in a system-compliant way. This kind
of programming is referred to as PAL-programming as because games programmed
in the way described will run in PAL-mode.

Most of the games for ECS/AGA used to be programmed in a non-system-compliant
way because the AMIGA-OS didn't offer very good support for games programming.
Below I will show how to develop system-compliant games with the operating
system still running.

The following examples are shown in assembly language. Sense and purpose of
these examples can of course be transferred to any high-level language as
well. The assembley-parts are direct excerpts from the source code of the
voxelspace-demo.

Generally speaking, a system-compliant ECS/AGA-game with double-buffering
works in the same way as a game that was programmed close to the hardware.
Therefore the examples below always operate on two bitplanes. The order of the
following menu items corresponds to the order of programming.

The following statements take it as granted that the game uses all of the
available screen area and that there is only one window present. Games such as
strategy games or industry simulations can of course use several windows. In
these cases the display-speed doesn't play a vital role anyways. That is why
this case won't be covered any further in this place. The following statements
are mostly of interest for 3D-games and similar ones.

                        @{"  Allocating the bitplanes    " link PAL_AllocBM}
                        @{"      Opening a screen        " link PAL_OpenScr}
                        @{"    Assigning the colors      " link PAL_SetCols}
                        @{"      Opening a window        " link PAL_OpenWin}
                        @{" Clearing the mouse pointer   " link PAL_ClearPointer}
                        @{"   Generating the graphics    " link PAL_CreateGFX}
                        @{" Switching the image buffers  " link PAL_Switch}
                        @{"      Closing the window      " link PAL_CloseWin}
                        @{"      Closing the screen      " link PAL_CloseScr}
                        @{"     Freeing the bitplanes    " link PAL_FreeBM}


@EndNode

@Node PAL_AllocBM "Allocating the bitplanes"

First of all you have to allocate two image buffers that in turn consist of
several bitplanes. The number of bitplanes depends on the number of colors to
be displayed. In the case of AGA, eight bitplanes are commonly used.

I will now discuss a method that also works with earlier version of the
operating system. There are also alternative ways that require a higher
version number.

At first you have to create space for two bitmap-structures. This can be done
statically as well as dynamically. The organisation of the bitmap-structures
is described in the 'graphics/gfx.i' include-file.

After that both of the bitmap structures are initialized by using
'graphics/InitBitMap'. This function is passed these arguments: address of a
bitmap-structure; height, width and depth of the bitmap.

Finally the 'AllocRaster' function is called for every bitmap to be allocated.
'AllocRaster' allocates the necessary memory for the bitplanes. After that,
the return values are entered into the bitmap-structure as PlanePtrs. You can
also allocate several bitmaps at the same time if you proceed as described
below.

From now on we will name the two bitmap-structures as follows:

ActualBitmap : The bitmap that is currently active. The bitplanes that are
               part of this bitmap-structure are currently displayed.
HiddenBitmap : The bitmap that is currently inactive. The bitplanes that are
               part of this bitmap structure are available for creating the
               graphics.

Below you find an excerpt from the voxelspace source-code that allocates the
bitplanes:


******************************************************************************
*
*       d0 = SetupBitmaps
*
*       prepares two bitmaps for double buffering
*
*       Out:
*       d0 = error code
*
*       error codes:    -1 = success
*                        4 = not enough memory
******************************************************************************
SetupBitmaps
                movem.l d1/d5-a2,-(sp)
                moveq   #2-1,d7                 ;two bitmap-structures
                lea     bitmap1,a0              ;a0 -> first bitmap-structure
                move.l  a0,ActualBitmap         ;bitmap1 is ActualBitmap
.loop
                move.l  a0,a2
                lea     bm_Planes(a2),a2        ;a2 -> bitplane-pointer-array
                moveq   #8,d0                   ;color depth (256 colors)
                move.l  #320,d1                 ;width of the bitplanes
                move.l  #256,d2                 ;height of the bitplanes
                CALLGRAF        InitBitMap      ;initialize bitmap-structure
                move.l  #320,d0                 ;Width of a bitplane
                move.l  #256*8,d1               ;Height of a bitplane * 8
                CALLGRAF        AllocRaster     ;allocate a 8 bitplanes
                tst.l   d0                      ;enough memory available?
                beq.b   .error                  ;no -> error message
                move.l  d0,a0
                move.l  #(256*8*320/8/4)-1,d5
.clear
                clr.l   (a0)+                   ;an image-buffer is cleared
                subq.l  #1,d5
                bne.b   .clear
                moveq   #8-1,d6                 ;all 8 bitplane-Ptr are entered
.loop2                                          ;into the bitmap-structure
                move.l  d0,(a2)+
                add.l   #320/8*256,d0           ;jump to next bitplane
                dbra    d6,.loop2
                lea     bitmap2,a0              ;now repeat everything for
                move.l  a0,HiddenBitmap         ;the 2nd bitmap (HiddenBitmap)
                dbra    d7,.loop
                moveq   #-1,d0                  ;function successful
                bra.b   .end
.error
                moveq   #4,d0                   ;return error code
.end
                movem.l (sp)+,d1/d5-a2
                rts


@EndNode

@Node PAL_OpenScr "Opening a screen"

After allocating the bitmaps, a screen is opened. Best suited to this purpose
is the 'OpenScreenTagList' function of intuition.library. This function
requires a NewScreen-structure or a taglist as parameters. In our example we
will only use a taglist.

If the taglist is implemented in assembley language, it can be implemented as
follows (example from the voxelspace source-code):


ScreenTags
                dc.l    SA_Left                 ;left screen border
                dc.l    0
                dc.l    SA_Top                  ;upper screen border
                dc.l    0
                dc.l    SA_Width                ;height of screen (here: 320)
                dc.l    320
                dc.l    SA_Height               ;width of screen (here: 256)
                dc.l    256
                dc.l    SA_Depth                ;depth of screen (here: 256
                dc.l    8                       ;colors)
                dc.l    SA_BitMap               ;pointer to a bitmap-structure
                dc.l    bitmap1
                dc.l    SA_Quiet                ;Keeps intuition from messing
                dc.l    TRUE                    ;with the screen
                dc.l    SA_Type                 ;ScreenType
                dc.l    CUSTOMSCREEN
                dc.l    TAG_DONE

It is important that the longword after SA_Bitmap contains a pointer to an
initialized bitmap-structure. If the bitmap was allocated dynamically, the
pointer still remains to be entered.

'OpenScreenTagList' can now be called. This function returns a pointer to a
screen-structure which will be used later. The screen-structure can be used to
determine a pointer to the ViewPort that will also be needed later. The
following code demonstrates the opening of a screen:

                sub.l   a0,a0                   ;not a NewScreen-structure
                lea     ScreenTags,a1           ;pointer to above taglist
                CALLINT OpenScreenTagList       ;open screen
                moveq   #5,d1                   ;provide error code
                move.l  d0,_Screen              ;save screen-pointer
                beq.w   .error                  ;Is it Null? -> error
                move.l  d0,a0
                move.l  d0,ScreenAddress        ;enter in window-taglist
                lea     sc_ViewPort(a0),a0      ;get address of viewports
                move.l  a0,_VPort               ;save viewport-address

The 'move.l d0,ScreenAddress' line will be discussed later in the
@{"'Opening a window'" link PAL_OpenWin} chapter.

@EndNode

@Node PAL_SetCols "Assigning the colors"

This section is valid for ECS/AGA as well as for graphics-board programming.

After the screen has been opened, the address of the ViewPort is also
available. This enables you to assign the desired colors to the screen. There
are two cases that have to be distinguished:

1. ECS/OCS

If the game has to support ECS/OCS, the following path must be taken:

The assigning of the colors is done using the 'graphics/LoadRGB4' system
function. This functions requires a pointer to the ViewPort, a table of all
colors and the amount of colors as parameters. The color table is a simple
array of USHORT (2 bytes) the elements of which signify the RGB-value of the
color. An example of a color table with 4 colors:

ColorTable
                        dc.w    0               ;black
                        dc.w    $f00            ;red
                        dc.w    $00f            ;blue
                        dc.w    $fff            ;white

The code for assigning these colors then looks as follows (example for 32
colors):

                move.l  _VPort,a0               ;a0 -> ViewPort
                lea     ColorTable,a1           ;a1 -> color table
                moveq   #32,d0                  ;d0 = number of colors
                CALLGRAF        LoadRGB4


2. AGA / CyberGFX

If AGA/CyberGFX are to be supported, the matter has to be handled differently:

The assigning of the colors is done by using the 'graphics/LoadRGB32' system
function. This function requires a pointer to the ViewPort and a pointer to
the color table as parameters. However, this table has a different
organisation than that for the ECS/OCS-variant:

The first value of the table signifies the amount of colors to be loaded. The
second word is the first color number to be loaded (usually zero). After that
follows the actual table. Each color consists of 3 longwords with the first
longword signifying the red-component, the second longword the green-component
and the third longword the blue-component. Another important peculiarity: the
color values must be provided LEFT-ALIGNED! In the case of 8-bit color-values
the actual color value must be left-shifted by 24 bits. The end of the table
is marked by a null-longword.

Example for a color table with 4 colors:

ColorTable
        dc.w    4                               ;4 colors
        dc.w    0                               ;first color is color number 0
        dc.l    $ff000000,$ff000000,0           ;yellow
        dc.l    0,0,0                           ;black
        dc.l    $7f000000,$7f000000,$7f000000   ;grey
        dc.l    0,0,$40000000                   ;dark blue
        dc.l    0                               ;end of table

The code for assigning the colors then looks as follows:

                move.l  _VPort,a0               ;a0 -> ViewPort
                lea     ColorTable,a1           ;a1 -> color table
                CALLGRAF        LoadRGB32

@EndNode

@Node PAL_OpenWin "Opening a window"

This section is valid for ECS/AGA as well as for graphics-board programming.

Now the window can be opened. Best suited to this purpose is the
'OpenWindowTagList' function of intuition.library. This function requires a
NewWindow-structure or a taglist as arguments. In our example we will only use
a taglist.

If the taglist is implemented in assembley language, it can look as follows
(example from the voxelsspace source-code):

WindowTags
                dc.l    WA_Left                 ;left window border
                dc.l    0
                dc.l    WA_Top                  ;upper window border
                dc.l    0
                dc.l    WA_Width                ;window width (PAL: 320)
WinWidth
                dc.l    320                     ;CyberGFX : dynamically
                dc.l    WA_Height               ;height of window (PAL: 256)
WinHeight
                dc.l    256                     ;CyberGFX : dynamically
                dc.l    WA_Activate             ;window is to be activated
                dc.l    TRUE                    ;immediately
                dc.l    WA_Borderless           ;the window has no border
                dc.l    TRUE
                dc.l    WA_RMBTrap              ;right mouse button will be
                dc.l    TRUE                    ;trapped
                dc.l    WA_ReportMouse          ;mouse movement will be
                dc.l    TRUE                    ;reported
                dc.l    WA_IDCMP                ;IDCMP-flags to be supported
                dc.l    IDCMP_MOUSEBUTTONS!IDCMP_RAWKEY!IDCMP_MOUSEMOVE!IDCMP_DELTAMOVE!IDCMP_ACTIVEWINDOW!IDCMP_INACTIVEWINDOW
                dc.l    WA_CustomScreen         ;address of the parent screen
                dc.l    0
                dc.l    TAG_DONE

The longword after WA_CustomScreen must be filled with the address of the
screen as described in the 'Opening a screen' chapter.

This window-structure is of course only an example. Depending on the
application it will look differently. It depends on e.g. which interaction by
the user should be handled. If mouse control is not provided for it does not
make sense to specify the WA_ReportMouse and the corresponding IDCMP-flags.

Here we will cover the IDCMP-flags a little further. The following flags might
be of interest for games:

IDCMP_MOUSEBUTTONS      : The user pressed a mouse button
IDCMP_MOUSEMOVE         : The User moved the mouse. IDCMP_MOUSEMOVE should
                          always be used together with IDCMP_DELTAMOVE.
IDCMP_RAWKEY            : The user has pressed a key. In this mode the keys
                          are passed unmodified. This can be used to test
                          special keys such as Ctrl, Alt, Shift, etc.
IDCMP_ACTIVEWINDOW      : Might be used if a game was put into pause-mode by
                          switching screens. By activating the window with the
                          mouse the game can be resumed.
IDCMP_INACTIVEWINDOW    : Can be used to put the game into pause-mode if the
                          user switches screens an de-activates the window.
IDCMP_DELTAMOVE         : Should always be used when IDCMP_MOUSEMOVE is used.
IDCMP_VANILLAKEY        : The user has pressed a key. The data has already
                          been processed in this mode. Therefore it can not
                          be used to watch all keys.


Now 'OpenWindowTagList' can be called. This function returns a pointer to a
window-structure which will be needed later. The window-structure can be used
to determine the pointer to the RastPort that will also be used later. The
following code demonstrates the opening of a window:

                sub.l   a0,a0                   ;Not a NewWindow-structure
                lea     WindowTags,a1           ;Pointer to above taglist
                CALLINT OpenWindowTagList       ;Open windows
                moveq   #6,d1                   ;provide error codes
                move.l  d0,_Window              ;save window-pointer
                beq.b   .error                  ;Is it zero? -> error
                move.l  d0,a0
                move.l  wd_RPort(a0),_RPort     ;Save RastPort address


@EndNode

@Node PAL_ClearPointer "Clearing the mouse pointer"

This section will show you how to make the mouse pointer disappear. The reason
to do this is that an ever-persistent mouse pointer can be quite annoying
under certain circumstances.

First of all, a small memory area of Chip-RAM is allocated (e.g. 16 bytes).
This area is then used as the mouse pointer which becomes completely
transparent and thus invisible.

After that the 'intuition/SetPointer' function is called. This can be done as
seen below:


                move.l  #16,d0                  ;allocate 16 bytes
                move.l  #MEMF_CHIP!MEMF_CLEAR,d1
                CALLEXEC        AllocVec        ;allocate Chip-RAM
                move.l  d0,NullPointer          ;store address
                move.l  d0,a1                   ;to a1 for SetPointer
                move.l  _Window,a0              ;window address to a0
                moveq   #1,d0
                moveq   #1,d1
                moveq   #0,d2
                moveq   #0,d3
                CALLINT SetPointer              ;clear mouse pointer

If you want the mouse pointer to re-appear, this can be done by calling
'intuition/ClearPointer'.

If the window is closed, the mouse pointer automatically re-appears.

@EndNode

@Node PAL_CreateGFX "Creating the graphics"

After all preparations have been made, the game enters the main loop where,
iteration for iteration, the next frame is computed and displayed.

In modern games (such as 3D-games, for example) the graphics are first created
in Fast-RAM using the chunky-format (i.e. one byte per pixel). After that the
picture is converted using a conversion algorithm. These so-called C2P or
ChunkyToPlanar-converters are available by the score. Special attention has to
be paid to selecting an algorithm that fits the task to be accomplished. Very
often such algorithms only work for a certain color depth.

Any such C2P-function should be implemented for the PPC. The algorithm is
actually not at all suited to demonstrate the power of the PPC because it is
very memory-intensive and writes to Chip-RAM. On the other hand, it is
possible to achieve small performance-improvements over a 68K-processor. This
could possibly have visible impact on the overall result.

After generating the actual background-graphics, any status bars and texts are
displayed. These can be written into the invisible image buffer using
conventional graphics.library functions (e.g. Move, Text, ...) before the
buffers are swapped and the changes become visible. An example for this can be
seen in the 'voxelspace' demo in which, after computing the landscape, the
status bar is generated using 'Move' and 'Text'.

Important: When using standard-functions of graphics.library, the
RastPort-structure and the the RasInfo-structure within the ViewPort must be
adapted before invoking the functions. The reason for this is that these
functions operate directly on the current image buffer. Details can be found
in the @{"'Switching the image buffers'" link PAL_Switch} chapter.

@EndNode

@Node PAL_Switch "Switching the image buffers"

A typical main loop could be structured similar to this:

                - compute image and put it into Fast-RAM
                - copy image to invisible image buffer/convert
                - swap visible and invisible image buffers

In this way you can create smooth and flowing animations (double buffering).
Now we have to find a way to swap the two image buffers.

As mentioned in the previous chapter, @{"'Creating the graphics'" link PAL_CreateGFX}, some
preparations must be made before calling standard-functions of
graphics.library. These function always affect the current image buffer but we
want their output to be made into the invisible buffer in order for the
animation to proceed cleanly.

Switching is done as follows:

1. adapt RastPort and RasInfo-structures
2. use graphics.library functions
3. make above changes visible using 'ScrollVPort'
4. swap bitmap-pointers


First of all, care must be taken that the graphics.library functions actually
affect the invisible buffer. This is done by making the invisible bitmap
visible for these function without actually switching the bitplanes. This is
done as described below:

                move.l  _RPort,a0               ;get RastPort address
                move.l  HiddenBitmap,a1         ;put address of the hidden bitmap into a1
                move.l  a1,rp_BitMap(a0)        ;enter into RastPort
                move.l  _VPort,a0               ;get ViewPort address
                move.l  vp_RasInfo(a0),a0       ;get RasInfo address
                move.l  a1,ri_BitMap(a0)        ;enter into RasInfo


After that, functions such as 'Move' and 'Text' can be applied. These
functions therefore still affect the invisible buffer.


As the third step, the changes to the RastPort and RasInfo are made visible.
By doing so, the two image buffers are effectively switched:

                move.l  _VPort,a0               ;get ViewPort address
                CALLGRAF        ScrollVPort     ;swap image buffers


Finally the pointers to the actual and hidden bitmaps are exchanged with each
other so that the game can create the next image in the other image buffer
that has now become invisible:

                move.l  ActualBitmap,d0
                move.l  HiddenBitmap,ActualBitmap
                move.l  d0,HiddenBitmap


@EndNode

@Node PAL_CloseWin "Closing the window"

After the game was finished a clean exit should been made. This means that
e.g. all memory should be freed again. The window should of course be closed,
too. This is done by using the 'intuition/CloseWindow' function:

                move.l  _Window,d0              ;get window-structure address
                beq.b   .nowindow               ;was the window opened?
                move.l  d0,a0                   ;address to a0
                CALLINT CloseWindow             ;close window
.nowindow

@EndNode

@Node PAL_CloseScr "Closing the Screen"

It is needless to say that the screen should be closed as well. This is done
through 'intuition/CloseScreen':

                move.l  _Screen,d0              ;get screen-structure address
                beq.b   .noscreen               ;was the screen opened?
                move.l  d0,a0                   ;address to a0
                CALLINT CloseScreen             ;close screen
.noscreen

@EndNode

@Node PAL_FreeBM "Freeing the bitplanes"

The bitplanes that were allocated should be freed. First of all, the
'FreeRaster' function is called for every bitplane. 'FreeRaster' is the
counterpart to 'AllocRaster'.

If the bitmap-structures were allocated dynamically, these should be freed as
well.

Below you find an excerpt from the voxelspace-demo. It is the part that frees
the bitplanes:

******************************************************************************
*
*       FreeBitmaps
*
*       frees all the memory allocated by 'AllocRaster'
*
******************************************************************************
FreeBitmaps
                movem.l d0/d1/d6-a2,-(sp)
                moveq   #2-1,d7                 ;two bitmap-structures
                lea     bitmap1,a0              ;a0 -> 1st Bitmap
.loop
                move.l  a0,a2
                lea     bm_Planes(a2),a2        ;a2 -> Array of bitplane-pointer
                moveq   #8-1,d6                 ;8 bitplanes to be freed
.loop2
                move.l  (a2)+,d0                ;read bitplane-pointer
                beq.b   .next                   ;Null? -> nothing is freed
                move.l  d0,a0
                move.l  #320,d0                 ;width of bitplane
                move.l  #256,d1                 ;height of bitplane
                CALLGRAF        FreeRaster      ;free bitplane
.next
                dbra    d6,.loop2
                lea     bitmap2,a0              ;repeat for 2nd bitmap
                dbra    d7,.loop
                movem.l (sp)+,d0/d1/d6-a2
                rts

@EndNode

@Node GR_CGFX "Graphics-board programming"

System-compliant graphics-board programming is slightly different from
ECS/AGA-programming. One significant difference: no double-buffering is done.
Usually a game/demo that does not use double-buffering starts to flicker, but
if CyberGFX is programmed in the right way these effects almost never occur.

The programming process is similar to that for ECS/AGA:

                        @{"   Choosing the screen mode   " link CGFX_ScrMode}
                        @{"       Opening a screen       " link CGFX_OpenScr}
                        @{"     Assigning the colors     " link PAL_SetCols}
                        @{"       Opening a window       " link PAL_OpenWin}
                        @{"  Clearing the mouse pointer  " link PAL_ClearPointer}
                        @{"  Creating the temp. RastPort " link CGFX_TempRP}
                        @{"     Creating the graphics    " link CGFX_CreateGFX}
                        @{"      Closing the window      " link PAL_CloseWin}
                        @{"      Closing the screen      " link PAL_CloseScr}
                        @{"  Freeing the temp. RastPort  " link CGFX_FreeTempRP}

@EndNode

@Node CGFX_ScrMode "Choosing the screen mode"

Graphics boards usually allow choosing the screen mode quite freely. Therefore
AMIGA-users that own a graphics-board will most likely use different screen
modes. It is important especially for games to work with any sensible screen
mode.

The choice of screen mode has also direct influence on the game performance.
The smaller the resolution, the faster the game will run. In this way it is
even possible to play existing 68K-games super-smoothly (e.g. by choosing a
192*128 screen mode). The graphics will look quite blocky in this case, but
due to the smooth animation this will very often go unnoticed.

A game should offer the user the option of choosing his preferred screen mode.
CyberGFX offers a function for this purpose which displays a screen
mode-requester and returns the mode chosen by the user. In order to use this
function, cybergraphics.library must be opened successfully.

The 'CModeRequestTagList' function requires a taglist as parameter and returns
the DisplayID of the selected screen mode which is used later. The taglist can
be used to filter the screen modes to be displayed in the requester. If the
game does not work with certain screen dimensions, these should be filtered
out of the list. Further information on this can be found in the autodocs for
cybergraphics.library.

Most of the time the screen mode-entries will be filtered by their color
depth. Games will only work with 16- or 24-bit-modes in the rarest of cases.
The taglist will then look as follows (the include-file for
cybergraphics.library must also have been loaded):

CyberModeTags   dc.l    CYBRMREQ_CModelArray
                dc.l    ColorModel
                dc.l    TAG_DONE
ColorModel
                dc.w    PIXFMT_LUT8
                dc.w    -1

Now the screen mode-requester can be displayed:

                sub.l   a0,a0                   ;must be NULL
                lea     CyberModeTags,a1        ;pointer to taglist
                CALLCYBERGFX    CModeRequestTagList     ;show requester
                moveq   #8,d1                   ;provide error code
                tst.l   d0                      ;function successful?
                beq.w   .error                  ;no -> error
                moveq   #0,d1                   ;provide error code
                cmp.l   #-1,d0                  ;did the user choose 'Cancel'?
                beq.w   .error                  ;then -> quit game
                move.l  d0,DispID               ;store DisplayID

@EndNode

@Node CGFX_OpenScr "Opening a screen"

Opening a screen works almost as described above, by calling
'OpenScreenTagList'. The only addition is that the screen mode can now be
chosen by the user, therefore data such as height, width and DisplayID is
dynamic. These values must first be determined and then placed in the taglist
for 'OpenScreenTagList'.

The DisplayID is returned by 'CModeRequestTagList' (see the earlier chapter,
@{"'Choosing the screen mode'" link CGFX_ScrMode}). Width and height must now be determined. This can
be done with another CyberGFX-function: 'GetCyberIDAttr'. It requires a
DisplayID and a mode as parameters. The mode specifies which information is
desired. We will call this function twice: At first with the
'CYBRIDATTR_HEIGHT' parameter to get the height, then with the
'CYBRIDATTR_WIDTH' parameter to determine the width. All these values are
placed in the screen-taglist after that.

After opening the screen, the width and height should be read from the
screen-structure and placed in the window-taglist. Additionally, the address
of the ViewPort is determined from the screen-structure and then stored.

The screen-taglist can be organized as follows:

ScreenTags_C
                dc.l    SA_Quiet                ;keeps intuition from
                dc.l    TRUE                    ;messing with the screen
                dc.l    SA_Width                ;screen width
ScreenWidth
                dc.l    0
                dc.l    SA_Height               ;screen height
ScreenHeight
                dc.l    0
                dc.l    SA_Depth                ;screen depth
                dc.l    8
                dc.l    SA_DisplayID            ;screen-DisplayID
DispID
                dc.l    0
                dc.l    TAG_DONE


The code then looks as follows:

                move.l  DispID,d1               ;get DisplayID
                move.l  #CYBRIDATTR_HEIGHT,d0   ;height will be determined
                CALLCYBERGFX    GetCyberIDAttr  ;get height
                move.l  d0,ScreenHeight         ;and place in taglist
                move    d0,AreaHeight           ;store height
                move.l  DispID,d1               ;get DisplayID
                move.l  #CYBRIDATTR_WIDTH,d0    ;width will be determined
                CALLCYBERGFX    GetCyberIDAttr  ;get width
                move.l  d0,ScreenWidth          ;and place in taglist
                sub.l   a0,a0                   ;not a NewScreen-structure
                lea     ScreenTags_C,a1         ;pointer to taglist
                CALLINT OpenScreenTagList       ;open screen
                moveq   #5,d1                   ;provide error code
                move.l  d0,_Screen              ;store pointer to screen
                beq.w   .error                  ;pointer is NULL? -> error
                move.l  d0,a0
                move    sc_Width(a0),AreaWidth  ;store width
                move    sc_Width(a0),WinWidth+2 ;width -> in Window-Taglist
                move    sc_Height(a0),WinHeight+2 ;height -> in Window-Taglist
                move.l  d0,ScreenAddress        ;pointer to screen in Window-TL
                lea     sc_ViewPort(a0),a0      ;get ViewPort address
                move.l  a0,_VPort               ;store address


@EndNode

@Node CGFX_TempRP "Creating a  temp. RastPort"

The actual copying of the image data into the graphics-memory is done by the
'graphics/WritePixelArray8' function. Further details on this will be given at
a later point. What now is important is that this function requires a
temporary RastPort.

First of all, memory must be provided for the RastPort-structure (the
structure is defined in the 'graphics/rastport.i' include-file). This can be
done either statically or dynamically. In the following example this RastPort
will be named 'tempRP'.

The RastPort is now initialized using the 'graphics/InitRastPort' function.
After that, a bitmap-structure and the appropriate bitplanes are allocated
using 'graphics/AllocBitMap'. This function requires AMIGA-OS V3.0. If you
want to support graphics boards, you can safely assume that OS V3.0 is
present.

The code then looks as follows:

                lea     tmpRP,a1                ;a1 -> space for temp. RastPort
                CALLGRAF        InitRastPort    ;initialise RastPort
                moveq   #0,d0
                move    AreaWidth,d0            ;determine screen width
                moveq   #1,d1                   ;Height = 1 row
                moveq   #8,d2                   ;depth = 256 colors
                move.l  #BMF_MINPLANES,d3       ;special flag
                move.l  _RPort,a0               ;get window-RastPort
                move.l  rp_BitMap(a0),a0        ;pass Bitmap as 'Friend'
                CALLGRAF        AllocBitMap     ;allocate Bitmap + Bitplanes
                lea     tmpRP,a0                ;address of temp. RastPort -> a0
                move.l  d0,rp_BitMap(a0)        ;enter new bitmap


@EndNode

@Node CGFX_CreateGFX "Creating the graphics"

Within the main loop the image is (as before) first created in FAST-RAM. It
now must be copied into the graphics-RAM in some way so it can be displayed.

There are two functions that can take care of this copying:

1. cybergraphics/WritePixelArray
2. graphics/WritePixelArray8

The first function has the advantage that it is also able to copy just parts
of the screen. The disadvantage: This function is INFINITELY slow.

The second function is as fast as possible. Unfortunately it has the
disadvantage of offering only very few parameters. This function is commonly
used to copy the entire image into the graphics memory.

The very biggest problem when programming graphics-boards in this way: as no
double-buffering is possible, it is not possible anymore to add further
graphical elements using the standard graphics.library functions without the
game starting to flicker heavily. All additional graphical elements must
already be created in the 'ChunkyBuffer' (picture in FAST-RAM) which can be
quite tedious as the standard graphics.library functions can not be used to do
this.

The code for copying the entire picture from the 'ChunkyBuffer' (Fast-RAM)
into the graphics-RAM could look as follows:

                move.l  _RPort,a0               ;get RastPort address
                moveq   #0,d0                   ;xstart = 0
                moveq   #0,d1                   ;ystart = 0
                move    AreaWidth,d2            ;xstop = width - 1
                subq    #1,d2
                move    AreaHeight,d3           ;ystop = height - 1
                subq    #1,d3
                move.l  ChunkyBuffer,a2         ;a2 -> image-data in Fast-RAM
                lea     tmpRP,a1                ;a1 -> temp. RastPort
                CALLGRAF        WritePixelArray8 ;copy image


@EndNode

@Node CGFX_FreeTempRP "Freeing the temp. RastPort"

The temporary RastPort that was created for the 'WritePixelArray8' function
should be freed again. This can easily be done using the 'FreeBitMap'
function.

                lea     tmpRP,a0                ;RastPort address -> a0
                move.l  rp_BitMap(a0),a0        ;bitmap address -> a0
                CALLGRAF        FreeBitMap      ;free bitmap

Finally, the temp. RastPort itself should be freed if it was allocated
dynamically.

@EndNode

@Node GR_CGFXPlus "CyberGFX+"

A very big disadvantage of the graphics-board programming described above is
the lack of double- or multi-buffering support. This also results in the
situation that no additional elements can be added to the already computed
graphics using the standard-graphics-functions. This poses of course a severe
limitation.

Below I will describe an extension to system-compliant graphics programming
that is able to do real multi-buffering by employing a few tricks. Adding
additional graphics elements isn't a problem anymore, either.

This technique is system-compliant although switching the buffers using
'ScrollVPort' is not guaranteed anywhere. However, this method has always
worked for me. In order to avoid any problems, games should always support
graphics-board access without multi-buffering whenever possible.

The programming itself is done very similar to the conventional graphics-board
programming described earlier:

                        @{"   Choosing the screen mode   " link CGFX_ScrMode}
                        @{"       Triple Buffering       " link TG_Triple}
                        @{"       Opening a screen       " link CGFXPlus_OpenScr}
                        @{"     Assigning the colors     " link PAL_SetCols}
                        @{"       Opening a window       " link PAL_OpenWin}
                        @{"  Clearing the mouse pointer  " link PAL_ClearPointer}
                        @{"  Creating the temp. RastPort " link CGFX_TempRP}
                        @{"    Creating the graphics     " link CGFXPlus_CreateGFX}
                        @{"      Closing the window      " link PAL_CloseWin}
                        @{"      Closing the screen      " link PAL_CloseScr}
                        @{"  Freeing the temp. RastPort  " link CGFX_FreeTempRP}
                        @{"           Problems           " link TG_Problems}

@EndNode

@Node CGFXPlus_OpenScr "Opening a screen"

Opening a screen works almost as described above, by calling
'OpenScreenTagList'. The only addition is that the screen mode can now be
chosen by the user, therefore data such as height, width and DisplayID is
dynamic. These values must first be determined and then placed in the taglist
for 'OpenScreenTagList'.

As described in the @{"'Triple Buffering'" link TG_Triple} section, the screen is opened at three
times the height that was originally specified.

The DisplayID is returned by 'CModeRequestTagList' (see the earlier chapter,
@{"'Choosing the screen mode'" link CGFX_ScrMode}). Width and height must now be determined. This can
be done with another CyberGFX-function: 'GetCyberIDAttr'. It requires a
DisplayID and a mode as parameters. The mode specifies which information is
desired. The function is then called twice. At first with the
'CYBRIDATTR_HEIGHT' parameter to get the height, then with the
'CYBRIDATTR_WIDTH' parameter to determine the width. All these values are
placed in the screen-taglist after that.

After opening the screen the width and height should be read from the
screen-structure and placed in the window-taglist. Additionally, the address
of the ViewPort is determined from the screen-structure and then stored.

In addition to the conventional graphics-card programming, the vertical
positions of the three buffers are now determined. The position of the first
buffer is always zero, the position of the second buffer is equal to the
buffer height and the position of the third buffer is equal to twice the
buffer height.

The screen-taglist can be organized as follows:

ScreenTags_C
                dc.l    SA_Quiet                ;keeps intuition from
                dc.l    TRUE                    ;messing with the screen
                dc.l    SA_Width                ;screen width
ScreenWidth
                dc.l    0
                dc.l    SA_Height               ;screen height
ScreenHeight
                dc.l    0
                dc.l    SA_Depth                ;screen depth
                dc.l    8
                dc.l    SA_DisplayID            ;screen DisplayID
DispID
                dc.l    0
                dc.l    TAG_DONE


The code then looks as follows:

                move.l  DispID,d1               ;get DisplayID
                move.l  #CYBRIDATTR_HEIGHT,d0   ;height will be determined
                CALLCYBERGFX    GetCyberIDAttr  ;get height
                move    d0,AreaHeight           ;store height
                move.l  d0,d1                   ;multiply height by three
                add.l   d0,d0
                add.l   d1,d0
                move.l  d0,ScreenHeight         ;and place in Taglist
                move.l  DispID,d1               ;get DisplayID
                move.l  #CYBRIDATTR_WIDTH,d0    ;width will be determined
                CALLCYBERGFX    GetCyberIDAttr  ;get width
                move.l  d0,ScreenWidth          ;and place in Taglist
                sub.l   a0,a0                   ;not a NewScreen-structure
                lea     ScreenTags_C,a1         ;pointer to taglist
                CALLINT OpenScreenTagList       ;open screen
                moveq   #5,d1                   ;provide error code
                move.l  d0,_Screen              ;store pointer to screen
                beq.w   .error                  ;Is it NULL? -> error
                move.l  d0,a0
                move    sc_Width(a0),AreaWidth  ;store width
                move    sc_Width(a0),WinWidth+2 ;width -> in window-Taglist
                move    sc_Height(a0),WinHeight+2 ;height -> in window-Taglist
                move.l  d0,ScreenAddress        ;pointer to screen in window-TL
                lea     sc_ViewPort(a0),a0      ;get ViewPort address
                move.l  a0,_VPort               ;store address
                move    AreaHeight,d1           ;read buffer height
                clr     ActualOffset            ;y-Pos. buffer 1 = 0
                move    d1,HiddenOffset         ;y-Pos. buffer 2 = d1
                add     d1,d1
                move    d1,ThirdOffset          ;y-Pos. buffer 3 = d1*2

@EndNode

@Node CGFXPlus_CreateGFX "Creating the Graphics"

Within the main loop the image is (as before) first created in FAST-RAM. It
now must be copied into the graphics-RAM in some way so it can be displayed.

There are two functions that can take care of this copying:

1. cybergraphics/WritePixelArray
2. graphics/WritePixelArray8

The first function has the advantage that it is also able to copy just parts
of the screen. The disadvantage: This function is INFINITELY slow.

The second function is as fast as possible. Unfortunately it has the
disadvantage of offering only very few parameters. This function is commonly
used to copy the entire image into the graphics memory.

After copying the graphics-data, the standard functions of graphics.library
can be used to add additional graphical elements to the invisible buffer.

Now the buffers are switched. This is done through the 'graphics/ScrollVPort'
function. After switching, the position values (ActualOffset, HiddenOffset,
and ThirdOffset) are rotated so that the correct buffer can be chosen in the
next iteration.

The code for copying the entire image from the 'ChunkyBuffer' (Fast-RAM) into
the graphics-RAM could look as follows:

                move.l  _RPort,a0               ;get RastPort address
                moveq   #0,d0                   ;xstart = 0
                moveq   #0,d1                   ;ystart = 0
                move    AreaWidth,d2            ;xstop = width - 1
                subq    #1,d2
                move    AreaHeight,d3           ;ystop = height - 1
                subq    #1,d3
                add     HiddenOffset,d2         ;select invisible buffer
                add     HiddenOffset,d3         ;select invisible buffer
                move.l  ChunkyBuffer,a2         ;a2 -> image-data in Fast-RAM
                lea     tmpRP,a1                ;a1 -> temp. RastPort
                CALLGRAF        WritePixelArray8 ;copy image

At this point the additional graphical objects can be created. After that, the
buffers are switched and the position values rotated:

                move.l  _VPort,a0               ;get ViewPort address
                move    HiddenOffset,d0         ;determine and negate position
                neg     d0                      ;of the invisible buffer
                move    d0,vp_DyOffset(a0)      ;enter into ViewPort
                CALLGRAF        ScrollVPort     ;switch buffers
                move    ActualOffset,d0         ;rotate position values of
                move    HiddenOffset,ActualOffset ;the buffers
                move    ThirdOffset,HiddenOffset
                move    d0,ThirdOffset

@EndNode

@Node GR_TurboGFX "TurboGFX"

It is quite likely that some people already asked themselves why the image has
to be created in FAST-RAM first and then copied into the graphics memory when
programming graphics-boards. It should be faster to create the image directly
within the graphics memory. That should make games even faster.

This section will now show that this is indeed possible and how to program and
implement it.


Hinweis: Diese hier vorgestellte Technik ist im Prinzip systemkonform. Der
direkte Zugriff auf das Grafik-RAM bzw. auf die Bitmap des Screens wird durch
Locking-Mechanismen geschuetzt, wie es auch von CyberGFX verlangt wird. Diese
Technik ist allerdings als LowLevel zu betrachten und es ist ueberhaupt keine
gute Idee, sich völlig darauf abzustuetzen. Zudem kann es durchaus sein, dass
diese Technik auf anderen Systemen mit anderer Gfx-Software nicht funktioniert.
Deswegen sollten Spiele und Demos diese Technik als Ergänzung zu den anderen
Techniken anbieten.

Note: the techniques presented here are basically system-compliant. The
direct access to the graphics RAM resp. to the bitmap of the screen is
enclosed by locking mechanisms, as required by CyberGFX. This technique
has to be considered lowlevel and it's not a good idea only to support
this technique without supporting other ones. And it's possible that
this technique doesn't work with other gfx-interface-software. Therefore
games and demos should only use this technique as supplement to the
other ones.

The term TURBOGFX is derived from a CLI-parameter of the voxelspace-demo. This
demo was my prototype for this new technique. The voxelspace-demo supports
direct writing into graphics memory with both the 68K as well as the PowerPC.

Wenn ein Spiel mit TurboGFX läuft, dann ist es wichtig, dass die Screens
nicht umgeschaltet werden, weil sonst Grafik-Fehler auf der Workbench
erscheinen können. Bei richtiger Programmierung von TURBOGFX sollte das
theoretisch nicht passieren, beim Voxelspace-Demo ist es aber schon
vorgekommen.

If a game is using TurboGFX it is important that no screens are swapped during
the game, otherwise graphics failures can occur on the workbench. When
programming TURBOGFX correctly this problem shouldn't occur, but it
didn't occur sometimes with the voxelspace demo.

The following topics will be discussed:

                        @{"     Triple Buffering     " link TG_Triple}
                        @{" The address of the image " link TG_MapPtr}
                        @{"      Modulo-Problems     " link TG_Modulo}
                        @{"       Implementation     " link TG_Code}
                        @{"         Optimizing       " link TG_Z3}
                        @{"         Problems         " link TG_Problems}

@EndNode

@Node TG_Triple "Triple Buffering"

A graphics-frame of a game is usually not built from the top left to the
bottom right in a linear way. For this reason it is an absolute necessity to
use several buffers when using TurboGFX. In order to completely avoid any
flicker-effects, Triple-Buffering is used, i.e. three buffers that are rotated
after each iteration.

Unfortunately there is no immediate way to program this Triple-Buffering (e.g.
through a library function). Instead, a few tricks must be used in order for
this to work:

When opening the screen its height is tripled. Now this over-height screen is
vertically broken down into three parts. Each of these parts will be treated
as an image buffer in its own right from now on. Furthermore, the fact that
these buffers are located directly above each other can be used to our
advantage.

One of these three image buffers will always be displayed and the next image
will be created in one of the other invisible buffers. Switching is done using
graphics/ScrollVPort (very similar to ECS/AGA).

@EndNode

@Node TG_MapPtr "The address of the image"

One important question must still be answered: where is the left upper corner
of the image located in the graphics-memory?

Now we have a problem: To this date I have not found a 100 percent reliable
method to determine this pointer. Below I will discuss several methods and
want to encourage the programmer to implement as many of these methods as
possible and let the user choose among them with appropriate switches.

Now following are all methods that are known to me:

1. Determine the pointer to the graphics-RAM by using cybergraphics/
   GetCyberMapAttr (CYBRMATTR_DISPADR parameter)

This method is used in the voxelspace-demo, if the option "MODE2" is
enabled. It has worked almost every time for me.

However, I have the suspicion that the pointer to the beginning of the gfx-RAM
need not be identical to the pointer to the beginning of the image in gfx-RAM.
I have also heard about displacements of the image on screen - a problem that
might be directly related to this.

2. Determine the pointer using cybergraphics/LockBitmapTagList

The mentioned function is called with the LBMI_BASEADDRESS parameter, followed
by a call to cybergraphics/UnLockBitmap.

This method is the one authorised by CyberGFX and should be used as
default method, just like in the voxelspace demo.

Judged by its definition this method seems to be a good one. It is used by
the voxelspace-demo if the 'MODE2' CLI-parameter is activated - in case the
first method should cause problems.

However, in certain older versions of CyberGFX the necessary function was
broken. Therefore you should not rely 100 percent on this method.

3. cybergraphics/DoCDrawMethodTagList

This method was never tested by me. It should also be usable to determine the
pointer. This function was also completely broken in earlier version of
CyberGfx.

4. Use a library-function that returns the pointer

Such a function is rumoured to exist already. If such a library is well
supported and maintained, this is a method that is recommended over all the
others. If it suddenly stops working, only the library needs to be replaced.

@EndNode

@Node TG_Modulo "Modulo-Problems"

A further problem remains to be solved. The second row of an image is not
necessarily located directly after the first row. This means an image can have
a horizontal modulo. If this is not taken into account, the graphics-display
might be come completely distorted.

There are two ways to find out how large the distance (in bytes) between two
rows is. To do this, the 'cybergraphics/GetCyberMapAttr' function is called
with the CYBRMATTR_XMOD argument. This function returns the width of the
bitmap used - this is identical to the desired difference between to rows.

The algorithm for doing the actual image-calculations must therefore always
assume that such a modulo-value exists.

@EndNode

@Node TG_Code "Implementation"

Below you find a few code fragments that illustrate programming in
TurboGFX-mode.

First of all, the screen height must be multiplied by three when opening the
screen.

After that you find the code to determine the address of the image as well as
the modulo-value. In the example below, two methods are shown how to get the
address of the bitmap.

Explanation of the variables:

BitmapWidth     : Bitmap width (distance between two rows)
ActualOffset    : number of rows between the beginning of the image and the
                  1st image buffer
HiddenOffset    : number of rows between the beginning of the image and the
                  2nd image buffer
ThirdOffset     : number of rows between the beginning of the image and the
                  3rd image buffer
ActualBitmap    : Address of the 1st image buffer
HiddenBitmap    : Address of the 2nd image buffer
ThirdBitmap     : Address of the 3rd image buffer
ChunkyBuffer    : ImageBuffer that is used for creating the graphics
AreaHeight      : Height of one image-buffer
Mode2           : Is 0, if the bitmap's address should be evaluated using
                  'LockBitMapTagList' and -1, if 'GetCyberMapAttr' should be
                  used.

                move.l  _RPort,a0               ;evalute rastport address
                move.l  rp_BitMap(a0),a0        ;address of the bitmap structure
                tst.b   Mode2                   ;which mode is enabled?
                bne.b   .nomode2                ;GetCyberMapAttr -> jump
                lea     CyberLBTLTags,a1        ;pointer to taglist for LBTL
                CALLCYBERGFX    LockBitmapTagList ;lock bitmap
                move.l  LBMI_Addr,d3            ;evaluate address of bitmap
                move.l  d3,BitMapAddr
                move.l  d0,a0
                CALLCYBERGFX    UnLockBitmap    ;unlock bitmap
                bra.b   .mode2
.nomode2
                move.l  #CYBRMATTR_DISPADR,d0
                CALLCYBERGFX    GetCyberMapAttr ;evaluate address of bitmap
                move.l  d0,d3
.mode2
                move.l  _RPort,a0               ;get RastPort address
                move.l  rp_BitMap(a0),a0        ;Bitmap-structure address
                move.l  #CYBRMATTR_XMOD,d0
                CALLCYBERGFX    GetCyberMapAttr ;find out Bitmap width
                move    d0,BitmapWidth          ;and store it
                move    AreaHeight,d1           ;find out height of image buffer
                clr     ActualOffset            ;vert. position of 1st buffer
                move    d1,HiddenOffset         ;vert. position of 2nd buffer
                move    d1,d2
                add     d1,d1
                move    d1,ThirdOffset          ;vert. position of 3rd buffer
                mulu    d0,d2
                move.l  d3,ActualBitmap         ;address of 1st buffer
                add.l   d2,d3
                move.l  d3,HiddenBitmap         ;address of 2nd buffer
                move.l  d3,ChunkyBuffer         ;2nd buffer used first
                add.l   d2,d3
                move.l  d3,ThirdBitmap          ;address of 3rd buffer

The taglist, which is passed to the function 'LockBitMapTagList', can look
like this:

CyberLBTLTags   dc.l    LBMI_BASEADDRESS
                dc.l    LBMI_Addr               ;-> free space for bitmap address
                dc.l    TAG_DONE
LBMI_Addr       dc.l    0                       ;this is the place for the addr.


The voxelspace demo assumes that the new screen is in the foreground when
'LockBitMapTagList' is called, so the bitmap address points to the graphics
RAM, not to a backup buffer. This should always be the case, because the
screen was just opened a short time ago.

During the main loop the program should test after each 'locking' if the
returned bitmap address is identical to the one evaluated at startup.
If, for example, the screens are switched, the bitmap is copied to the
FAST-RAM. A game/demo should then usually enter waiting state until the
screen has been switched to the foreground again.


The main loop then roughly follows this schematic:

                1. Lock bitmap using 'LockBitmapTagList'
                2. Compute image in the buffer that 'ChunkyBuffer' points to
                3. Create additional elements in the invisible buffer using
                   graphics.library functions.
                4. Unlock bitmap using 'UnLockBitmap'
                5. Rotate image buffer using 'graphics/ScrollVPort'
                6. Rotate necessary pointers and offsets as well


1. First the bitmap should be locked using 'LockBitMapTagList'. If this isn't
   done, the game/demo will still work but it isn't legal anymore from the
   view of CyberGFX. A game/demo should maybe offer the possibility not to
   lock/unlock the bitmap as an option, in the case that the locking mechanisms
   produce problems.

   In the following example it is also tested, if the bitmap address returned
   is identical to the one evaluated at startup. In this case the voxelspace
   demo enters a waiting mode and tests periodically if the screen is at the
   foreground again.

   Note: The voxeldemo hangs up itself in the following function if the
   locking mechanims don't work. A game/demo should offer a possiblity to
   leave the program with an error message instead.

                tst.b   Mode2
                bne.b   .noLBTL
.retry
                move.l  _RPort,a0               ;evaluate rastport address
                move.l  rp_BitMap(a0),a0        ;address of bitmap structure
                lea     CyberLBTLTags,a1        ;pointer to the taglist for LBTL
                CALLCYBERGFX    LockBitmapTagList ;lock bitmap
                move.l  d0,d2
                beq.b   .delay                  ;if error, then jump
                move.l  LBMI_Addr,d0            ;evaluate bitmap address
                move.l  BitMapAddr,d1           ;get original bitmap address
                cmp.l   d1,d0                   ;compare addresses
                beq.b   .noLBTL                 ;if equal than proceed
                move.l  d2,a0
                CALLCYBERGFX    UnLockBitmap    ;unlock bitmap
.delay
                moveq   #5,d1
                CALLDOS Delay                   ;wait a bit
                bra.b   .retry                  ;restart the procedure
.noLBTL


2. This item of course depends on the game being developed. Take care that
   the modulo-value is taken into account.


3. The TurboGFX-mode allows using standard-functions of graphics.library again.
   You must pay attention to adapting the vertical position in a way that
   ensures that the correct image buffer is selected. The vertical position 0
   is always the first row of the first image buffer.
   Any such vertical shift is easiest achieved by adding the 'HiddenOffset'
   value.

4. After all accesses to the bitmap are completed, the bitmap has to be
   unlocked if it was really locked previously . The following code
   assumes that the handle returned by 'LockBitmapTagList' is situated in d2.


                tst.b   Mode2
                bne.b   .noLBTL2
                move.l  d2,a0
                CALLCYBERGFX    UnLockBitmap
.noLBTL2


5. The switching of the image buffers is done as follows:


                move.l  _VPort,a0               ;get ViewPort address
                move    HiddenOffset,d0         ;negate vert. Offset of
                neg     d0                      ;the 2nd image buffer
                move    d0,vp_DyOffset(a0)      ;and enter it into the Viewport
                CALLGRAF        ScrollVPort     ;switch image buffers


6. Only the necessary pointers and offsets remain to be rotated so that the
   next image will be created in the correct buffer:


                move.l  ActualBitmap,d0
                move.l  HiddenBitmap,ActualBitmap
                move.l  ThirdBitmap,HiddenBitmap
                move.l  d0,ThirdBitmap
                move.l  HiddenBitmap,ChunkyBuffer
                move    ActualOffset,d0
                move    HiddenOffset,ActualOffset
                move    ThirdOffset,HiddenOffset
                move    d0,ThirdOffset


@EndNode

@Node TG_Z3 "Optimizing"

If the TurobGFX-technique is used, a couple of additional things must be
considered in order to reach optimum performance. The access to the RAM of the
graphics board will usually be done through the Zorro3-bus (Zorro2 in earlier
AMIGA-models). Exactly this access of the processor via the Z3-bus to the
graphics board is significantly slower than the access of the processor to the
Fast-RAM which is often located on the processor-board itself.

In addition, the Fast-RAM is usually accessed in copyback-mode while the
graphics-board RAM is always accessed 'noncachable' - that is with the cache
turned off.

This leads to the conclusion that it is very important whether the graphics
are written into the RAM in byte- or in longword-increments. In the former
case the performance may drop by a significant amount.

As a more 'hands-on' example let me mention the voxelspace-algorithm here.
This algorithm projects the landscape-data onto the screen in strips. The code
writes vertical columns into the graphics-RAM from left to right. If these
columns are 4 pixels wide, this results in longword-accesses which are
optimal. In case of 2- or even only 1-pixel columns these accesses are not
optimal at all anymore.

This access can still be optimized, though. In the voxelspace-demo in case of
1-pixel columns groups of 4 strips are formed and created in a
Fast-RAM-buffer. After that this buffer is copied into graphics-RAM in
longword-units and then the next 4 strips are created. As this buffer is
relatively small, these accesses still profit from the processors data cache.
The byte-accesses to Fast-RAM are optimized by the copyback-mode of the cache.

When using the TurboGFX-technique you should always make sure to access the
graphics-RAM in longword-units. 'Detours' through Fast-RAM can often lead to
tremendous performance improvements.

@EndNode

@Node TG_Problems "Problems"

At this point I want to point out a well-known problem with
CyberGfx+/TurboGFX.

It may happen that pressing mouse buttons can lead to the machine hanging or
even crashing. In this case it is recommended to de-activate all
commodity-programs running in the system (if you know which program causes the
problem you can of course simply deactivate that one). An example for these
kind of programs are screen-blankers.

@EndNode

@Node Interaction "Interaction"

The essence of every game is the interaction with the player. The player uses
the keyboard, mouse or joystick to direct the game in the way he desires.

The following section covers how to evaluate user-input in a system-compliant
way. Joystick-programming is not covered here as I have never used it myself.
In case joystick-input is desired, you should resort to the documentation of
the 'gameport.device' which is used for that purpose.

Generally the user input occurs in the active window. The operating system
notes this input and sends messages to the program that allow it to evaluate
what kind of input occurred. The game must decide which kinds of input to
evaluate when @{"'Opening the window'" link PAL_OpenWin}, this is done by specifying the appropriate
IDCMP-flags in the window-taglist.

The main loop of the game will then look similar to this:

                        1. Get window-message
                        2. Evaluate window-message and decide which actions
                           to take (if any)
                        3. Answer window-message
                        4. Execute actions
                        5. Compute and display image


1. Get window-message

A message is read using the 'exec/GetMsg' system function. It expects a
message-port as parameter. The UserPort of the window is passed. This then
looks as follows:


                move.l  _Window,a0              ;get window-address
                move.l  wd_UserPort(a0),a0      ;find out UserPort-address
                CALLEXEC        GetMsg          ;get message
                tst.l   d0                      ;is there a message?
                beq.w   .loop                   ;no -> no evaluation
                move.l  d0,d4                   ;save message for ReplyMsg
                move.l  d0,a0                   ;put message into a0 for evaluation


2. Evaluate window-message

A window-message has a defined structure (which can be found in the
'IntuiMessage' structure in the 'intuition/intuition.i' include-file). Some
elements of this structure can only be used for evaluation. First of all the
'im_Class' field which classifies the type of input is evaluated.

After getting the message all interesting elements are read:

                move.l  im_Class(a0),d0         ;get message-class
                move    im_Code(a0),d1          ;get message-subclass
                move    im_MouseX(a0),d2        ;get mouse-delta-position
                move    im_MouseY(a0),d3        ;and for Y-direction

Now the message-class is checked (the possible values correspond to the
IDCMP-flags that were specified when opening the window). Example:

                cmp.l   #IDCMP_MOUSEBUTTONS,d0  ;mouse buttons pressed?
                beq.b   .checkmouse
                cmp.l   #IDCMP_RAWKEY,d0        ;key pressed?
                beq.w   .checkrawkey
                cmp.l   #IDCMP_MOUSEMOVE,d0     ;mouse was moved?
                beq.w   .checkdeltamove
                cmp.l   #IDCMP_ACTIVEWINDOW,d0  ;window activated?
                beq.b   .activewindow
                cmp.l   #IDCMP_INACTIVEWINDOW,d0 ;window deactivated?
                beq.b   .inactivewindow
                bra.w   .reply                  ;else reply to message

Depending on the message-class further information is now extracted. This is
most often done in the 'im_Code' filed that now is located in d1:

IDCMP-Flag MOUSEBUTTONS:

                cmp     #IECODE_LBUTTON,d1      ;left mouse button pressed?
                beq.b   .leftdown
                cmp     #IECODE_LBUTTON+IECODE_UP_PREFIX,d1 ;released?
                beq.b   .leftup
                cmp     #IECODE_RBUTTON,d1      ;right mouse button pressed?
                beq.b   .rightdown
                cmp     #IECODE_RBUTTON+IECODE_UP_PREFIX,d1 ;released?
                beq.b   .rightup
                bra.w   .reply

IDCMP-Flag RAWKEY:

The 'im_Code' field contains the key-code of the key that was pressed. This
code is NOT identical to the ASCII-code (the IDCMP_VANILLAKEY IDCMP-flag must
be used to do this). The codes must be taken from a table or be determined by
trial&error.

Bit 7 of the key-code specifies whether the key was pressed or released. This
can be tested using the IECODEF_UP_PREFIX tag.

Example:

                cmp     #$45,d1                 ;ESC pressed?
                beq.w   .esc
                btst    #IECODEB_UP_PREFIX,d1
                bne.w   .keyup
                cmp.b   #$50,d1                 ;F1 pressed?
                beq.w   .F1pressed
                cmp.b   #$51,d1                 ;F2 pressed?
                beq.w   .F2pressed
                ...
                bra.w   .reply
.keyup
                bclr    #IECODEB_UP_PREFIX,d1
                cmp.b   #$55,d1                 ;F6 released?
                beq.w   .F6released
                cmp.b   #$56,d1                 ;F7 released?
                beq.w   .F7released
                ...
                bra.w   .reply

IDCMP-Flag MOUSEMOVE:

This IDCMP-flag should always be used together with IDCMP_DELTAMOVE. The
'im_MouseX' and 'im_MouseY' fields then contain the number of units the mouse
was moved by. The scale of these values must be determined simply by trying
out. Very often these values are scaled for further use in the game.

ACTIVEWINDOW and INACTIVEWINDOW IDCMP_Flags:

No further information can be gained for these flags.


3. Reply to window-message

After evaluating the message it must be replied to. This is done through the
'ReplyMsg' exec-function:

                move.l  d4,a1                   ;put message into a1
                CALLEXEC        ReplyMsg        ;reply to message


The items 4 and 5 are program-specific and do not belong to the interaction
process as such.

@EndNode

@Node RAM "RAM is slow"

In the past years CPU power has risen almost exponentially. On the contrary,
RAM-chips of the kind used in conventional computers have only improved a
little bit. The access to RAM has become more and more of a bottleneck.

This bottleneck is successfully being fought using ever-increasing cache
memories. The caches have become an important performance factor for
conventional applications.

But games are no conventional applications. Games are very memory-intensive as
they have to cope with larger and larger amounts of data. Caches are often
even counter-productive in certain areas. For this reason it is important to
pay attention to minimizing memory access during the development process.

In this context some programming philosophies as they are know from older
processors must be turned upside down. When programming the 68000, games were
optimized by doing as many calculations as possible in advance and storing the
results in tables from where they were read in the realtime-part of the
program.

As a lot of games were developed using this philosophy, many of them did not
experience the expected performance increase when running on a faster
processor. Even the very fastest processor accesses RAM only a little faster
than a conventional 68000.

Modern processors can execute 50 to 100 or even more commands in the same time
they need to do a memory access. This leads to the conclusion that it is often
faster to compute data in realtime than to read it from a table. On top of
that, large tables are very cache-unfriendly. They lead to a large
efficiency-loss of the cache and therefore decrease performance a lot.

The voxelspace-demo puts this new philosophy into action. Many
voxelspace-algorithms described in scientific journals were optimized for
older processors by creating many structures in advance. The voxelspace-demo
does nearly all of these calculations in realtime. The main loop contains only
one memory access: the reading of the height/color data of the landscape. Of
course all accesses for creating the graphics come on top of that. These kinds
of algorithms that almost entirely consists of calculation commands also
further the pipelining and thus increase the throughput of the commands.

It is of decisive importance for the programming of PPC-games to optimize the
algorithms for a minimum of memory accesses in order to utilize the full power
of these processors.

@EndNode

@Node MMU_Cache "MMU and Cache"

This section contains information on how the cache and the MMU can be used to
achieve significant performance improvements.

The caches are most efficient if the same memory areas are accessed very
often. In that case time-consuming memory-accesses can be avoided. The best
example for this is the processor-stack.

When it comes to game programsb it is often the case that huge amounts of data
must be accessed but only rarely the same memory area is accessed several times.
In these cases the caches actually become counter-productive and slow the
game down. Caches are managed in segments of 32 bytes internally. As soon as a
data element is accessed one such cacheline is loaded into the cache as a
whole. This memory access takes as long as 8 conventional memory accesses
(this applies to processors with a 32-bit bus). This extra time is usually
compensated for because all following accesses to this cacheline only have to
be done on the cache.

For games that only use such data elements once this leads to memory access
becoming slower than if the data cache is turned off.

Now we have a problem: if the data cache is switched off globally, you have
taken care of the above problem but now you are losing performance because
access to those areas that could take advantage of the cache is now slowed
down. The only solution is the creation of an optimized MMU-setup that assigns
the separate memory areas different cache-modes. Unfortunately AMIGA-OS offers
no MMU-support at all. The only solution is to use 'evil hacks', which means
direct access to the hardware - something that actually shouldn't be done
anymore.

When employing the PowerPC and the WarpOS operating systems this is different.
WarpOS offers applications the option of allocating memory areas with a
certain cache-mode. This offer a system-compliant way to use the MMU and the
cache in a most optimal way.

The 'AllocVecPPC' function of the powerpc.library supports additional memory
attributes that can be used to specify the desired cache-mode. Games can now
simply allocate a memory area and mark it as 'noncachable' (using the
MEMF_NOCACHE attribute). For a detailed description of this function I
recommend the 'WarpOS.guide' and 'powerpc.doc' documents.

Here a futher note to this problem: The local disable of the data cache can
accelerate a program or it can even brake it down, dependant on the processor.
The greater the data cache and the faster the memory access the greater the
probability that a program is braked down. So it should be always tested on
several machines and eventually there should be several versions for
different processors. The voxelspace demo enables the local disable of the
data cache for the 603E, but enables it on 604E (with CyberStormPPC)
because the 604E was indeed braked down.

I now want to point out a further problem. The MMU has the task of doing
address translations. It looks up in a table which logical address matches
which physical address and then performs the translation. On every
68K-processor that has an MMU this is done automatically in hardware. This
process is called the tablesearch or tablewalk.

Not all of the PowerPC-processors support the hardware-tablesearch. The PPC603
and the PPC603E do not know any hardware tablesearch. On these processors the
tablesearch is done in software. Such a software-tablesearch is of course
extremely slow. However, this is not relevant most of the time as a
'MMU-cache' takes care that these tablesearches occur seldom enough to not
pose a performance-problem.

Again, this can be completely different for games. Let's take a look at
'voxelspace', for example: The voxelspace-algorithm is based on a geographic
map that is 2 MByte in size. The algorithm now uses a certain method to read
the map data. In this process it traverses an extremely large address space.
This results (similar to the effect on the cache) in the efficiency of the
MMU-cache dropping close to zero. As a result a tablesearch has to be done for
almost every memory access which can lead to the game stalling.

This effect can even be observed on systems that have hardware-supported table
search, but the effects are not very severe in that case.

Again, there is no system-compliant solution to this problem on 68K-systems.
The voxelspace-demo offers an additional option for 68K-systems which must be
considered a 'hack' and is therefore not entirely system-compliant. This hack
sets up the MMU using the transparent translation registers and in this way
keeps tablesearches from occurring. This can result in as much as 40-50
percent performance increase.

On the PPC, WarpOS offers a system-compliant way to solve this problem. WarpOS
supports the so-called BAT registers that work similar to the transparent
translation registers. When running WarpOS, each PPC-task has the option of
filling the 4 BAT-registers with a memory-area of your choice. There are 4
BAT-registers available, one of which will most often be used by the operating
system to cover the graphics-RAM. By employing the BAT-registers,
tablesearches for the specified memory areas can be avoided.

The 'AllocVecPPC' function of the powerpc.library knows the MEMF_BAT memory
attribute. If this is specified the allocated memory area will be controlled
by a BAT-register.

The voxelspace uses this additional feature if the TURBOPPC option is switched
on. It allocates the memory for the map as well the memory for the sky using
the MEMF_NOCACHE and MEMF_BAT attributes.

In order to demonstrate the power of this feature, I want to point out the
following:

The voxelspace-demo runs about twice as fast on a PPC603E/150 than on a
68060/50 if the standard options are used and both processors are driven in
the most optimal way. In this case the 68060 does not run system-compliant.

If the 'MMU-hack' for the 68060 is switched off, the PowerPC runs as much as
three times as fast. This means that games that are programmed in a
system-compliant way can squeeze even more power out of the PowerPC.

@EndNode

@Node Multiprocessing "Multiprocessing"

To put first things first: this section does not explain how to use
multiprocessing in order to increase performance but rather explains why
multiprocessing can not be used to increase performance.

A dual-processor-board might give reason to hope to be able to run both
processors at the same time in order to achieve a higher performance. The
problem is: both processors use the same bus. During every memory access by
one processor the bus is blocked for the other one. Algorithms that rely on a
lot of memory accesses then result in the overall performance of both
processors decreasing.

Now let's assume that both processors are executing algorithms that are not
dependant on memory access - in this case a performance increase could
actually be achieved. But especially in this case the PPC outperforms the 68K
by such a large margin that it doesn't pay off anymore to run both CPUs in
parallel. On top of that, pure computing-intensive algorithms are very rare.
The mandelbrot-algorithm is an example for that. And even in the case of
'cybermand' parallel-processing does not lead to a noticeable performance
increase.

Conclusion: when designing a game you must take care that it runs sequentially
on both processors. You should also not use several tasks as this can result
in similar problems.

@EndNode

@Node Scheduling "Scheduling / Optimizing"

Scheduling is the art of arranging commands in a way that optimally exploits
the internal execution units of the processor. Scheduling has an increased
importance for modern processors. Scheduling should not be overrated, however.
Only applying the scheduling-rules systematically (as described in the PPC
user-manuals, for example), can lead to a performance-increase. And that is a
task for high-level programming language compilers.

Nevertheless, I want to point out some possibilities to optimally arrange
commands. The voxelspace-demo contains such a case. The innermost main loop
contains a floating-point division as well as a memory access. Both command
have a long execution time, which means they are very slow. The performance
can now be optimized by placing the floating-point command directly before the
memory access (that is the FP-command followed by the memory access). Now the
PPC can execute both commands in parallel and quite some time can be saved.

Avoiding dependencies is the most efficient scheduling method which can be
implemented best. The powerpc works most efficiently if its pipelines are
filled as good as possible. But if there are commands one after the other
which need the result of the predecessor, such a command has to wait until
the result is available. Clever placement of commands can result in much
better performance in many cases.

As a general rule it is advisable to split algorithms among several execution
units. Integer- and FPU-commands should be placed alternately so that they can
be executed in parallel. This makes sense if the algorithm has at least a
certain size. In the case of small algorithms you will face the problem that
the floating-point<->integer conversion will eat up all the time gained by
scheduling the commands.

When programming in assembly language you should therefore always try to place
integer-, floating-point- and memory-access-commands alternately.

If a programmer has to choose if a program should be based on integer or
FPU algorithms, the FPU version is mostly the more optimal one. The FPU of
the PowerPC is extremely powerful and offers many powerful commands, such
as the combined Multiply-Add/Sub commands.

But it has also to be considered that the PPC604E has several integer units.
In this case, an integer algorithms is maybe faster than the corresponding
FPU algorithm.

Here some hints on optimizing: thanks to the many registers of the PPC, memory
accesses can often be avoided. Before executing the actual function, all
constants and variable can be loaded into registers and all calculations in
the main loop can be done on the registers. This of course requires a clean
documentation as one still has to be able to determine which register contains
what value/variable/constant.

@EndNode

@Node Configuration "Configurability"

Now we leave the technical area and turn our attention towards game-design.
First of all we will discuss the configurability of games.

Games often gain a lot of attractiveness if they have a lot of options and
switches that allow playing the game from a different perspective or under
different conditions. This includes the screen resolution which should be
freely selectable by the player (if a graphics card is present in the system).

The control should also be as flexible as possible. Several types of control
should be supported as well as parameters that define speed, inertia, etc.

The keyboard map should be as freely selectable as possible as every player
has different preferences in this respect. Too many games got a bad review
because of the simple reason that they had an absolutely unusable keyboard
layout.

Very often technical parameters are among the favourite toys of the player.
These include screen resolution, screen detail, window size, approximations,
landscape parameters, displaying the framerate, to just name a few. The more
of the these toys are available, the more interesting it gets to play the game
several times.

As a general rule you should use your imagination to the fullest in order to
enable the player to play the game in as many different ways as possible.

@EndNode

@Node Control "Control"

Just too many promising games failed due to a miserable control. In most cases
the control is hard-coded into the game and the player has no influence on it.

Generally speaking, your game should support as many control-modes as
possible, e.g. joystick, mouse and keyboard. Every gamer has a preference for
one these control modes.

The control-speed also takes a central role. Here some negative examples from
well-known games:

A popular car-racing game used a system of time-limits, i.e. a certain
distance must be covered within a certain time. The time expired absolutely,
which is exactly as a real clock. If the game was played on a slower machine,
the hardware was only able to display fewer frames in the same amount of
time. Result: the car covered less distance in the same time and at the same
speed. This is of course very bad as the user of slower machines are at an
disadvantage and it is also very unfair in itself.

A popular 3D-shooting-game uses a well though-out mechanism for rotating the
player if he presses the left and right cursor keys. The actual rotation speed
was supposed to be constant, however - i.e. the player was supposed to be able
to turn the same angle in the same time, regardless of the game speed. Result:
in order to cover the same angle in the same amount of frames on slower
machines, the angle hat to be increased dramatically. As a direct result, a
short press on the cursor key sufficed to rotate the player by a half turn.
This of course made any precise control virtually impossible and the game was
not playable anymore on most systems (except when you artificially increased
the game speed by choosing a smaller screen size/resolution).

These examples show how important it is that the control is adapted to the
respective hardware. Positive examples exist as well, certain games allow
setting parameters such as walking- and rotation speed and even the movement
inertia.

Another important thing is to make the control as precise as possible. Many
games became un-playable simply because certain tasks that had to be
accomplished could only be mastered through sheer luck due to the un-precise
control of the game.

A game should also take into account the limited movement accuracy of the
player. If e.g. collisions are to be detected, any such checks should not
react immediately if only one pixel of an object touches one pixel of another
object. A more generous check is needed here in order to avoid detecting
collisions if actually none have occurred (if two cars touch this is quite
different from when they crash head-on).

@EndNode

@Node Difficulty "Difficulty"

Gamers can be broken down into several categories: those who play a game
every now and then, those who play more or less regularly and the
professionals. At best, a game should be attractive for all of these potential
customers and in order to accomplish this it needs several levels of
difficulty.

A lot of games have difficulty levels - and almost none of them actually
employ this option in a sensible way. If a popular action-game offers four
levels of difficulty and calls the hardest one 'Maniac', then it is simply not
right that a professional action gamer walks through half of the game on the
first try and then quits the game himself in order to be able see something
new the next time he plays it.

Absolute beginners are neglected in the same way as often. The most simple
level of difficulty is very often still too hard for those that have never
seen a game of the genre. Rule:

The programmer should choose the most simple level of difficulty so that he
thinks even a toddler could manage it - after that the difficulty is halved.

For the most difficult level you should proceed analogously. The higher the
difference, the better. Even top-profis will be offered an additional
challenge.

'Simple' and 'difficult' are often interpreted in a completely wrong way. A
game is often made more difficult my simply increasing the amount of luck
needed to get past a certain point. Very often the strength of the player is
simply reduced and this is then sold as a separate level of difficulty.

An easy level of difficulty should interpreted in that way that mistakes made
by the player have less dire consequences. There is no sense in extremely
reducing the number of enemies if an unskilled player keeps ramming the
landscape and looses a life in the process. It would be more advisable to
lessen the consequences of such collisions.

Different levels of difficulty should not only change the game parameters.
They should also introduce new elements into the game in order for advanced
gamers to be required to really think and change their strategy. If you use a
lot of creativity on that area, you can greatly increase the persistant
attractivity of a game.

You should also check whether the owners of system with different speeds are
at an advantage or disadvantage. This should never be the case and must be
compensated for by employing appropriate measures.

@EndNode

@Node Change "The necessary change"

More and more often you can read things such as 'over 33 levels', 'more than
100 tasks', etc. in advertisements or on game boxes.

It is actually very good if a game is large enough to keep the player
entertained for a long time. Unfortunately it is most often the case that
games developers shoot all of their powder in the first 10 percent of the
game. During this time the player enjoys the game only to become completely
disappointed once nothing new turns up and no change is provided anymore.

It is important to spread all elements evenly throughout the game. You should
not introduce all elements right at the beginning of the game as very often
the player will simply be overtaxed by this. It is better to confront the
player with the aspects of the game step by step so that he has time to adapt
to each new feature.

A game should also keep up the motivation of the player to continue playing.
Special events (e.g. super-enemies) that really impress the player should
occur, or bonus sequences - always different ones if possible. Also, it is
often interesting to put a real "barrier" into the game that truly taxes the
player (a task often accomplished by super-enemies). Defeating a strong enemy
is an impressive event and poses a high motivating force as often several
tries are needed until the player finally makes it. It also requires the
player to concentrate during all parts of the game as he has to save resources
for defeating the super-enemy.

Most of the super-enemies I have seen to this date were very unimaginative and
easy to see through. Most of the time they can be defeated using a very
primitive strategy. I would be best if in addition to innovative
attack-strategies (through which the player will still see after three or four
times), dexterity would also be required so that the player must muster the
same concentration every time he challenges this particular enemy. By the way
- it is always advisable to keep the challenging the players' dexterity as
dexterity isn't something you'll be able to 'learn by heart'. Most of the
games get boring as time passes because the player has 'seen everything'.

@EndNode

@Node Playability "Playability / Fairness"

Playability must be the one thing which many games lack most. A game is
playable if it is actually fun to play. For this reason playability should be
of high importance during games development.

One classic example of un-playable elements are the famous backside-attacks.
These tactics are often used to artificially increase the level of difficulty.
But especially these kinds of attacks have no effect in the long run except
making the game more boring as the player will soon know when to expect those
unfair attacks.

Many games boast about the speed at which the objects move. All too often this
results in a cool effect in the short run but this effect is not all so cool
anymore after having seen it two or three times. What remains is most often a
scene that has nothing to offer in terms of actual game action.

Another common error: a game is programmed in a way that makes it behave
virtually the same every time it is run. Most often the player is forced to
always take one path, to always fight the same enemies which always attack in
the same way and which can, of course, always be defeated in the same way. In
the long run this is nothing but highly uninteresting.

Modern games should give the player as much freedom of action as possible and
also employ random elements to spice up things a little. Special care should
be taken regarding random elements, however, because otherwise they will
backfire and result in unfair situations occuring.

Games should keep offering dexterity-elements on a regular basis in order to
force the player to concentrate. The goal of a game should be to capture the
player and draw his undivided attention in order to create a thrilling
atmosphere. As soon as the player is able to lean back and easily defeat the
game 'from a distance', any motivation to play this game will soon be lost.

@EndNode

@Node Demos "Demo-Versions"

90 percent of all demo-versions I have seen and tested to this day were
complete and utter trash.

It is hard to believe how much energy is lost simply because the presentation
of a game was done with an un-satisfying demo version. Demo-versions should
offer a first step into the game and also demonstrate the advantages of the
game. Demo-versions are also often used as a measure whether a game will be
bought or not. In times when well-know Amiga-magazines rate almost every game
as 'good', this is likely to be the only chance to get an impression of the
game.

These kinds of demos receive only a very short residence-permit on my hard drive:

- Demos in which I get killed after 10 seconds without ever knowing what just
  happened.
- Demos that within the first 2 seconds throw tons of objects at me that want
  to finish me off.
- Demos that want to present all elements of the full game at one time and
  completely overtax me in the process.
- Demos that have a bad, un-precise control
- Demos that have no (reasonable) documentation
- Demos that don't run or run unstable

This at the same times says a lot about how demos should look. Demos are often
confused with prototypes. A prototype may violate all of the above items. But
it is not meant for the public. The potential customers deserve better.

Demos should fulfil a tutorial-function. They should slowly introduce the
player to the game an present some of its elements as well as confront the
player with tasks he can handle.

Demos should offer enough playing time for the player to be able to get an
idea of the the game.

It is important to invest a sufficient amount of time in creating
demo-versions. This time will pay for itself as a good demo-version motivates
the potential customer into buying the game.

@EndNode

@Node 3D "Thoughts on 3D"

I want to take the opportunity to utter my thoughts on the 3D-topic. 3D-games
have been experiencing a boom for quite some time already since the hardware
of competing systems could reach the necessary speed. For a very long time,
not a single game of this kind could be found on the AMIGA.

Nowadays the technology of 3D-games on competing systems has already advanced
very far. And only now old 3D-games are slowly being copied on the AMIGA.

Most of the 3D-games for the AMIGA still use the antiquated
floor-wall-technique which severely hampers the players' freedom of movement.
On competing systems complete freedom of movement has been the standard for
quite some time now.

At his place I want to summon everybody familiar with 3D-techniques to not
simply keep the delay on the competing systems by simply pulling old source
codes over to the AMIGA without any creativity but rather use the newest
techniques so that this delay simply won't exist for much longer.

The PowerPC-processor offers us the opportunity to achieve the same speeds as
competing systems, therefore we should also be using the same new techniques.
For this reason all 3D-specialists must join forces in order to define new
3D-techniques the target of which must me to top the competition. Today the
games of tomorrow should be created - not those of yesterday!

@EndNode

@Node Address "Address of the author"

I have invested a lot of time to put all thoughts and all know-how into this
document. By doing so I want to show that it is about that time to make the
technical know-how available to everyone in order for the AMIGA to be able to
catch up with the competition again in a joint effort.

On the AMIGA-sector there is absolutely no sense in hiding technical know-how
from the 'evil' competition. I want to remind everybody again: the technology
is only a tool. A game is measured by its quality, not by the technology used
to create it. Therefore it is my goal that the games developers allow full
access to all technical matters and then again concentrate on the actual
gameplay again - an area where they can still keep honing their business
secrets.

I would be very happy to get in contact with people who aim at developing
unique and innovative games for the AMIGA. Through discussion and exchange of
technical refinements the lead of competing systems could be diminished and
the quality of games greatly enhanced as not so much time would have to be
spent on technical details.

Anybody who wants to contact me can do so by one of the following ways:


        regular mail:                   eMail:

        HAAGE&PARTNER GmbH              s.jordan@haage-partner.com
        Sam Jordan
        Schlossborner Weg 7
        61479 Glashuetten
        Germany
        Tel: ++49/(0)6174/966100
        Fax: ++49/(0)6174/966101

@EndNode