Cat's Out of The Bag...

  • May. 13th, 2008 at 6:47 AM
I'd been thinking about dumping the 3rd Generation iPod Nano flash memory for
a while now, and recently saw that the Linux4Nano guys have already dumped the
2nd Generation flash... Which is BGA btw!

Originally I was going to wait until I actually had the dump in hand before
posting anything, considering it's my first project with an FPGA that's above
the trivial bitstream hacking level. However, I felt I should let the
Linux4Nano guys know, at least...

I'll pick up the iPod from work on Thursday (staff sale :-) and probably a
camera that does decent macro shots; my current camera is pretty good, but
does terrible macro shots.

So until then... ;-)

R300 VAP_CNTL Fix Pushed...

  • May. 4th, 2008 at 11:18 AM
I just pushed Markus Amsler's VAP_CNTL patch, which cleans up a lot of the
nastiness we previously had with this register. It shouldn't break anything,
but you know the drill, test and report. :-)

P.S. Sorry about the delay, real life got in the way...

Testing the R300 Driver...

  • May. 2nd, 2008 at 4:20 AM
I just merged Markus Amsler's VAP_CNTL patch into my local Mesa repo, and
tested it with glxgears, Quake 3, and Doom 3. Doom 3 actually has some serious
bugs, but it's mostly related to shadow volumes. I'll push Markus' patch some
time today and close the bug.

I'm currently running on Alex's DRM which has some additional commits correcting
the purge/flush code, as well as some other changes. I haven't tested this with
code that is known to cause lockups yet, but I'm pretty amazed that Doom 3 ran
without a single lockup... :-)

Here's a screenshot of the Doom 3 rendering bugs, for your viewing
(dis-)pleasure. ;-)



Actually I had to disable the two-sided stencil extension just to get that
screenshot, otherwise software rendering would be used... So some work is needed
on the stencil code...

Random Update...

  • May. 1st, 2008 at 2:11 PM
I found out a few days ago that I didn't get the Google job, which I kind of
half-expected after reading about Google mass-targeting people on various
mailing lists, such as the Linux Kernel Mailing List.

Part of the email from the recruiter was "... though we don't have a position
that is a strong match with your qualifications at this time, we will
definitely be keeping your resume active in our system, and I'd love to stay
in touch with you in the future about opportunities." So maybe some time in
the future... :-)

I've been thinking about getting back into some hardware hacking. I have a
couple of FPGA boards sitting around here which I really should put to use. I
have a little project in mind, with time permitting. At least it's something
to motivate me to finally learn Verilog. I have already ordered a few parts
for this project, but I'll see how it goes before posting further details...

Google Interview!

  • Apr. 19th, 2008 at 5:02 AM
A few days ago I was contacted by an engineering recruiter from Google. I have
to admit my first reaction was that someone was playing a strange practical
joke, but after checking it out and replying to the recruiter, it's definitely
legitimate. :-)

I've got a phone interview scheduled for Tuesday. Hopefully this goes well;
I'm really excited to be given a chance to work at Google, but I'm usually
quite nervous about interviews, so I'm hoping everything goes okay...

On Governments...

  • Mar. 31st, 2008 at 1:28 PM
I normally don't care for or have any interest in governments or politics;
however, I found this troubling enough to write about... I apologize in advance
for this post not being filled with interesting Radeon hacking topics. :-)

Apparently the New Zealand government feels it's acceptable to automatically
enroll new employees in a KiwiSaver scheme, without prior permission or
agreement, and deduct 4% from my pay every pay day.

Fortunately it's possible to request an "opt-out" of this scheme, which I have
done. I still find it rather unsettling that you may be automatically enrolled
in a saving scheme, without any prior permission or agreement, and that you
must request an opt-out, as opposed to being guaranteed the
ability to opt out.

Should I not be entitled to opt-out at will, and decide where my own money
goes? I would rather have the money in my bank account, where I may do with it
as I choose, than some semi-forced government saving scheme.

I don't think there will be any problems getting my opt-out request approved.
As far as I can tell, I meet all the criteria required... However, the fact
that you must request an opt-out, and are not guaranteed one, is
somewhat troubling...

Pushed Some Fixes...

  • Mar. 30th, 2008 at 4:25 PM
I've pushed some fixes from Markus Amsler which fix two bugs introduced by the
vertex program branch merge. The first was caused by a core Mesa change cherry
picked from the Gallium branch, and the second was a copy-and-paste error.

This should fix a couple of problems reported by various people. :-)

R300 Vertex Program Branch Merged...

  • Mar. 29th, 2008 at 6:13 PM
I finally got around to merging my vertex program branch into the mainline
Mesa tree. This changes the R300 driver's vertex program code to match AMD's
names and semantics; for example, the separation of Vector and Math engine
opcodes, etc.

This doesn't add any additional optimization of the vertex programs, but does
clean up the code, and make it easier to compare against the documentation.
Hopefully there are no regressions from this merge. I've tested with the Mesa
vertex program tests, glxgears, Quake 3, etc.

I've also pushed a patch to the DRM which corrects the R300_CMD_WAIT command;
you can read the commit log for the details, but briefly we would set the
wrong idle conditions, which led to some lockups. Actually this was only seen
recently during an IRC discussion between myself, Jerome Glisse, and John
Bridgman. Thanks to John for pointing this out! :-)

I don't know exactly which lockups are fixed by this patch, so please test
your favorite OpenGL programs.

Anyway, I'm going to get some more sleep and hopefully get over this horrible
cold; hopefully before I get fired for taking too much sick leave. ;-)

Companion to Revenge...

  • Mar. 12th, 2008 at 9:28 AM
I've started thinking about doing some work on a companion tool to Revenge,
which would be similar to r300_demo, but designed more for testing
implementation of the 3D documentation.

Currently we don't have an easy way to take something from the 3D
documentation, try a quickly written implementation, and see whether it works
before writing a proper implementation for the Mesa driver.

While this is possible by hacking the Mesa driver directly, you do not have
fine-grained control over exactly which commands are sent to the GPU, and
debugging ability is somewhat limited.

I apologise that I've been kind of slack with getting the vertex program
branch merged. Between starting a job (working at Dick Smith Electronics),
moving from my Opteron to a Sempron box (about time I upgraded to PCI-E), and
cleaning up my home directory, I've been a bit pressed for time.

I'll try to get it merged before or over the weekend. :-)

R300 Vertex Program Branch...

  • Mar. 3rd, 2008 at 3:02 PM
I've published my work on the R300 DRI vertex program code, and hopefully
fixed the last couple of bugs. The MAD and SUB instructions have some
weirdness, but they should be working correctly now...

You're welcome to test and report any problems. Assuming no major problems,
I'll finish off by reworking the Vector/Math Engine macros, and pushing into
the main tree.

I also have some thoughts on how the vertex/fragment program code could be
further improved, but I'm not sure it's worth the effort with Gallium around
the corner... Really a full optimizing compiler is required, rather than a
simple Mesa-to-Hardware translator; this may still be the case with Gallium,
but I'm not sure.

I would be very interested to talk to AMD about their implementation. Though I
suspect they would have an easier time as they can target the whole graphics
stack specifically to ATI hardware...

R300 Hacking...

  • Feb. 24th, 2008 at 10:43 PM
I've started hacking the vertex program code in R300 DRI based on the newly
released 3D documentation. Initially this started out as just register/define
renaming, but I'll probably end up restructuring the code again as currently
it's still a bit messy...

Christoph Brill has also made some patches renaming R300 DRI register names to
the names used in AMD's documentation, which was generally agreed to be a Good
Thing (TM).

I will probably finish the vertex program hacking today and commit either
today or tomorrow. There is currently a bug on the master branch that causes
my card to instantly lock up with any 3D program, so I need to track that down
first...

So there is work going on, even though not much of it has been pushed yet. :-)

That Was Fun...

  • Feb. 14th, 2008 at 9:30 PM
I just got back from driving to the map point for a warehouse party, through
the morning traffic from hell, only to find out I was supposed to be there
tomorrow after 4 PM.

Awesome... :(

AtomBIOS Experimentation...

  • Jan. 30th, 2008 at 8:59 AM
I've started looking into the AtomBIOS ROM format, and there is still a lot
that I don't understand completely, but I'm starting to get a better idea...

From what I can tell, there isn't any clear distinction between the bytecode
and data sections, but it's a bit confusing because the ParseTable function
seems to parse both tables and opcodes...

Anyway, just a quick update, hopefully I'll have something more concrete
soon...

Fast Reciprocal Square Root...

  • Dec. 17th, 2007 at 3:17 PM
I've been doing some work to optimize tangent space calculation in my
development engine, Trinity. A large amount of time is spent calculating the
reciprocal square root for vector normalization. I've done some profiling
(with the rdtsc instruction) of different ways to calculate the reciprocal
square root which might be interesting to read about.

Here are the results (clock cycles) of the profiling code running on a dual
AMD Opteron 246.

05: rdtsc_base
37: rdtsc_trinity_rsqrt
37: rdtsc_xreal_sse_rsqrt
43: rdtsc_quake3_rsqrt
44: rdtsc_doom3_rsqrt
51: rdtsc_xreal_rsqrt
60: rdtsc_doom3_invsqrt16_rsqrt
76: rdtsc_slow_rsqrt


The rsqrtps instruction (operates on 4 floats simultaneously) would probably
be even faster than the rsqrtss instruction, but that's an optimization for
later...

You may be interested in Chris Lomont's Fast Inverse Square Root and Apple
Computer, Inc's Computing the Inverse Square Root (the Doom 3 function) paper
as further reading.
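
For readers who have not seen these functions before, here is a minimal sketch of two of the approaches measured above. This is not the Trinity code or the exact code that was profiled; the function names quake3_rsqrt and sse_rsqrt are made up for illustration. The first is the well-known Quake 3 magic-constant approximation with one Newton-Raphson step, and the second uses the rsqrtss instruction through the _mm_rsqrt_ss intrinsic, refined with the same Newton-Raphson step (an assumption on my part; the hardware estimate alone is only good to about 12 bits).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Quake 3 style reciprocal square root (hypothetical name).
 * The float is reinterpreted as an integer to form an initial guess,
 * then refined with one Newton-Raphson iteration. memcpy is used for
 * the reinterpretation instead of pointer casts to avoid undefined
 * behaviour under strict aliasing. */
float quake3_rsqrt(float x)
{
    assert(x > 0.0f);

    float half = 0.5f * x;
    uint32_t bits;
    float y;

    memcpy(&bits, &x, sizeof bits);
    bits = 0x5f3759dfu - (bits >> 1);   /* magic-constant initial guess */
    memcpy(&y, &bits, sizeof y);

    return y * (1.5f - half * y * y);   /* one Newton-Raphson step */
}

#if defined(__SSE__)
/* rsqrtss based version (hypothetical name): take the hardware estimate
 * and apply the same Newton-Raphson step to improve its accuracy. */
float sse_rsqrt(float x)
{
    assert(x > 0.0f);

    float y = _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));

    return y * (1.5f - 0.5f * x * y * y);
}
#endif
```

Normalizing a vector is then just a matter of multiplying each component by the result for x*x + y*y + z*z. Both functions assume a strictly positive input; zero-length vectors need to be handled by the caller.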

Interesting Lockup...

  • Dec. 16th, 2007 at 7:24 PM
I started doing some checking for floating point exceptions (NaN, inf, etc) in
my engine for performance reasons; these degenerate values can stall the CPU
quite badly.
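
A check along these lines can be sketched with the standard fpclassify macro. The function name find_degenerate is hypothetical, and treating subnormal (denormal) values as degenerate is an assumption about what the "etc" covers; this is not the engine's actual code.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Scan an array of floats and return the index of the first NaN,
 * infinite, or subnormal value, or -1 if every value is a normal
 * number or zero. (Hypothetical helper, for illustration only.) */
long find_degenerate(const float *values, size_t count)
{
    assert(values != NULL || count == 0);

    for (size_t i = 0; i < count; i++) {
        switch (fpclassify(values[i])) {
        case FP_NAN:
        case FP_INFINITE:
        case FP_SUBNORMAL:
            return (long)i;
        default:
            break;
        }
    }

    return -1;
}
```

Running something like this over vertex or tangent data in a debug build makes it possible to catch the bad value where it is produced, rather than where it finally causes a stall.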

After the GPU decided to lock up I didn't get my usual repeating buffer of
music, but rather a constant tone. Probably something to do with how the
kernel was buffering the audio...

It's kind of weird to see your PC grind to a halt and turn into a tone
generator. ;-) AMD, where are the 3D specs!

More Hacking!

  • Dec. 9th, 2007 at 11:14 PM
Revenge 1.1 has a bug on X1600 Mobility PCI-E cards which probably affects all
RV530 (aka M56) hardware. The framebuffer is not detected correctly on this
hardware; you can work around this problem by disabling framebuffer mapping
with the following patch.

I have also decided that it would be a good idea to convert Revenge to use
PCI-ID identification. This means that hardware detection should be more
reliable, but that Revenge will only work on your hardware after you have
added your PCI-ID and associated information.

Without PCI-ID identification there will probably be too many subtle variances
between families and even individual chips for Revenge to work reliably on
every card...
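
To give a rough idea of what PCI-ID identification could look like, here is a minimal sketch of a lookup table with per-card quirks. Nothing here is taken from Revenge: the struct, field names, and lookup_card function are hypothetical. The vendor ID 0x1002 is ATI's; the 0x4E40/0xFFF0 pair assumes the first two columns of an R300/R350 flashing-table entry are a device ID and mask; and 0x71C5 is believed to be the Mobility X1600 device ID, so check it against pci.ids before relying on it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PCI_VENDOR_ATI 0x1002

/* Per-card information keyed on PCI device ID (hypothetical layout). */
struct card_info {
    uint16_t device_id;       /* device ID after masking */
    uint16_t device_mask;     /* bits of the device ID that must match */
    const char *family;       /* human readable family name */
    int map_framebuffer;      /* 0 = skip framebuffer mapping (quirk) */
};

static const struct card_info known_cards[] = {
    /* Assumed ID/mask pair covering the R300/R350 range. */
    { 0x4E40, 0xFFF0, "R300/R350",   1 },
    /* Assumed Mobility X1600 ID; framebuffer mapping disabled. */
    { 0x71C5, 0xFFFF, "RV530 (M56)", 0 },
};

/* Return the table entry for a card, or NULL if it is not listed. */
const struct card_info *lookup_card(uint16_t vendor_id, uint16_t device_id)
{
    if (vendor_id != PCI_VENDOR_ATI)
        return NULL;

    for (size_t i = 0; i < sizeof known_cards / sizeof known_cards[0]; i++) {
        const struct card_info *card = &known_cards[i];

        /* Table sanity check: the stored ID must already be masked. */
        assert((card->device_id & card->device_mask) == card->device_id);

        if ((device_id & card->device_mask) == card->device_id)
            return card;
    }

    return NULL;
}
```

The trade-off is the one described above: a card that is not in the table gets NULL and is refused outright, instead of being probed with guesses that may or may not hold for that chip.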

I've also sent an email to the LinuxBIOS mailing list in response to some of
my AtomBIOS related entries; there does seem to be some interest... No code
has been written yet and it's just research at this point, but who knows, it
might lead to something... :-)

Barycentric Coordinates...

  • Nov. 25th, 2007 at 10:12 AM
I've been wondering for some time how the newer id Software games (Doom 3,
Quake 4, ETQW, etc) create interactive GUI surfaces in the world.

I peeked inside the ETQW beta SDK and found they are using barycentric
coordinates. I found almost identical (but much more complete) code in
Christer Ericson's Real-Time Collision Detection book.

So I finally know how it's done, and since the code is published in a book, it
should be fair game. :-)
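
For reference, the standard way to compute barycentric coordinates of a point with respect to a triangle can be sketched as follows. This is the usual dot-product formulation, not a copy of the SDK or book code, and all names here are hypothetical. The interpolate_uv helper reflects an assumption about how the result would be used for a GUI surface: once the hit point on the triangle is known, the same weights blend the triangle's texture coordinates to give a cursor position on the GUI.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y, z; } vec3;
typedef struct { float s, t; } vec2;

static vec3 vec3_sub(vec3 a, vec3 b)
{
    vec3 r = { a.x - b.x, a.y - b.y, a.z - b.z };
    return r;
}

static float vec3_dot(vec3 a, vec3 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

/* Compute barycentric coordinates (u, v, w) of point p with respect to
 * triangle (a, b, c), so that p = u*a + v*b + w*c. The point is assumed
 * to lie in the plane of the triangle. Returns 0 for a degenerate
 * (zero area) triangle, 1 otherwise. */
int barycentric(vec3 p, vec3 a, vec3 b, vec3 c, float *u, float *v, float *w)
{
    assert(u != NULL && v != NULL && w != NULL);

    vec3 v0 = vec3_sub(b, a);
    vec3 v1 = vec3_sub(c, a);
    vec3 v2 = vec3_sub(p, a);

    float d00 = vec3_dot(v0, v0);
    float d01 = vec3_dot(v0, v1);
    float d11 = vec3_dot(v1, v1);
    float d20 = vec3_dot(v2, v0);
    float d21 = vec3_dot(v2, v1);

    float denom = d00 * d11 - d01 * d01;
    if (denom == 0.0f)
        return 0;

    *v = (d11 * d20 - d01 * d21) / denom;
    *w = (d00 * d21 - d01 * d20) / denom;
    *u = 1.0f - *v - *w;
    return 1;
}

/* Blend the per-vertex texture coordinates with the barycentric weights
 * to get the texture (GUI) coordinate at the point. */
vec2 interpolate_uv(vec2 ta, vec2 tb, vec2 tc, float u, float v, float w)
{
    vec2 r = {
        u * ta.s + v * tb.s + w * tc.s,
        u * ta.t + v * tb.t + w * tc.t
    };
    return r;
}
```

If all three weights are between 0 and 1 the point is inside the triangle, which doubles as the hit test for the surface.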

Now I need to make some time for R5XX hacking on r300_demo...

New Shadow Code...

  • Nov. 21st, 2007 at 11:14 AM
I just finished all the major refactoring of the shadow code in my unreleased
development engine. The new code is based on indices, rather than triangle
structures and is much easier to work with.

I still have to change the tessellation code for the bezier patch surface as
it's handled specially, but the majority of the work is complete. :-)

The next step is to implement J.M.P. van Waveren's optimizations described in
the Shadow Volume Construction paper (vertex culling information, SIMD, etc)
and Eric Lengyel's shadow volume caps and scissor optimizations described in
The Mechanics of Robust Stencil Shadows.

Figuring out the scissor optimization math is going to be fun...

More BIOS Flashing Research...

  • Nov. 18th, 2007 at 12:17 PM
I discovered some more information useful for figuring out Radeon BIOS
flashing. One of the flashing programs, flashrom, contains a plain-text file
listing some register information.

The following is the definition for one of my Radeon cards:

ASIC_SF    4E40 FFF0 R300/R350 FFFF 00000000 00000000
INIT_REG   0050 FFDFFFFF 04000000 FFFFFFFF 00000000
INIT_REG   0008 FFFFFF00 0000008D FFFFFF00 0000008D
INIT_REG   000C FFFFFFF8 00000000 FFFFFFFF 00000000
INIT_REG   01c0 00FFFFFF 01000000 00FFFFFF 08000000
INIT_REG   019c 00000000 00000000 FFFFFFFF 00000000
INIT_REG   01a0 00000000 00000000 FFFFFFFF 00000000
INIT_REG   0198 00000000 00000000 FFFFFFFF 00000000
INIT_REG   01ac 00000000 00000000 FFFFFFFF 00000000
INIT_REG   01b0 00000000 00000000 FFFFFFFF 00000000
INIT_REG   01a8 00000000 00000000 FFFFFFFF 00000000
INIT_REG   0c40 FFDFFFFF 00000000 FFFFFFFF 00000000
WR_EN      0
WR_DIS     0


There are also definitions for cards with parallel ROMs, and definitions for
the ROM chips themselves.

The documentation released by AMD seems to suggest the GPIO registers are used
for flashing. There are a few matches for the above registers in the M56
documentation, so the registers may have remained the same across multiple
families:

INIT_REG   SEPROM_CNTL1 00FFFFFF 01000000 00FFFFFF 08000000
INIT_REG   GPIOPAD_A    00000000 00000000 FFFFFFFF 00000000
INIT_REG   GPIOPAD_EN   00000000 00000000 FFFFFFFF 00000000
INIT_REG   GPIOPAD_MASK 00000000 00000000 FFFFFFFF 00000000


I still have to figure out how these registers are manipulated to perform the
flashing; the table above only gives initialization values.

Ideally I would like to get a trace of the hardware access from one of the
flashing tools; however, dosemu fails horribly when attempting to emulate them,
even with hardware access enabled. If anyone has a bright idea for tracing a
DOS program, leave a comment. :-)

Fun with R600...

  • Nov. 11th, 2007 at 1:03 AM
It looks like it's going to be more difficult to get R6XX series cards to work
with Revenge than I expected... Apparently R6XX series hardware uses a more
complex memory controller than previous generations.

This seems to be confirmed by a line in Xorg.0.log:

(--) fglrx(0): Using per-process page tables (PPPT) as GART.


I haven't been able to find anything in the documentation released by AMD that
would explain this new GART setup. They have probably omitted this
information.

Perhaps examining the process maps may find something...