Kodewerx


All times are UTC - 8 hours [ DST ]




Posted: Fri May 15, 2009 6:19 pm
Krew (Admin)

Joined: Sun Oct 01, 2006 9:26 pm
Posts: 3768
Title: All in a day's work.
Hackers familiar with ring-0 debuggers know how useful breakpoints can be. I was able to implement breakpoint support in GCNrd because the PowerPC supports breakpoints in hardware. That is, there is an on-chip debugging facility that allows software to enable a breakpoint on a given address.

For clarity, the term "watchpoint" used here refers to read/write breakpoints, and "breakpoint" refers to execute breakpoints.
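To make "on-chip debugging facility" a little more concrete: on the PowerPC 750 family (which the GameCube's Gekko is derived from), an execute breakpoint lives in one special-purpose register (IABR) and a data breakpoint, i.e. a watchpoint, in another (DABR). The sketch below only computes the values you would write to them, assuming the bit layout documented in IBM's 750 user manual. Actually loading them requires `mtspr` from supervisor code, which isn't shown, and none of this is taken from GCNrd's source.

```c
#include <stdint.h>
#include <stdbool.h>

/* Bit layout assumed from the IBM PowerPC 750 user manual. IBM numbers
 * bits from the MSB, so IBM bit 31 is the value 1 here. */
#define IABR_BE 0x2u  /* IBM bit 30: breakpoint enabled */
#define IABR_TE 0x1u  /* IBM bit 31: match only when MSR[IR] (translation) is on */
#define DABR_BT 0x4u  /* IBM bit 29: match only when MSR[DR] (translation) is on */
#define DABR_DW 0x2u  /* IBM bit 30: break on data write */
#define DABR_DR 0x1u  /* IBM bit 31: break on data read */

/* Execute breakpoint: the address is word-aligned, low 2 bits hold flags. */
uint32_t make_iabr(uint32_t addr, bool translated)
{
    return (addr & ~0x3u) | IABR_BE | (translated ? IABR_TE : 0u);
}

/* Watchpoint: the address is doubleword-aligned, low 3 bits hold flags. */
uint32_t make_dabr(uint32_t addr, bool on_read, bool on_write, bool translated)
{
    return (addr & ~0x7u)
         | (translated ? DABR_BT : 0u)
         | (on_write ? DABR_DW : 0u)
         | (on_read ? DABR_DR : 0u);
}
```

Note how the low address bits double as flag bits: that is why an execute breakpoint can only match a word and a watchpoint only an 8-byte-aligned doubleword.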

Unfortunately, not all CPUs support hardware breakpoints or watchpoints. ARM9 is an example of a modern CPU without hardware debugging support (not counting JTAG). This is sad, because it makes supporting breakpoints and watchpoints in a ring-0 debugger very difficult. There are only two solutions for this kind of situation: 1) replace the CPU with an ICE, or 2) use software breakpoints and watchpoints.

Option one can be extremely difficult, but it can also be the only way to provide breakpoint/watchpoint support on some architectures (especially older CPUs, and architectures that run all code from ROM, where a breakpoint cannot be written into the code). ICE support is beyond the scope of this discussion.

Option two provides an interesting alternative: dynamically modify the running code to keep watch on specific conditions at every point of the program.

For watchpoints, the basic idea can be thought of as "self emulation" (instruction interpretation): the ring-0 debugger reads instructions in the program flow, and either fully or partially emulates them. In a partial-emulation system, many instructions can run entirely natively, just as they would without the debugger present. Only certain instructions are "trapped": typically program-control instructions (jumps, branches, and other PC-modifying instructions, so the debugger keeps control when execution moves elsewhere) and any "interesting" instructions that may read or write memory. To avoid unnecessary performance loss, it's also recommended to use some form of read/write prediction. For example, if the watched address is not within the stack, it's probably safe to skip trapping push/pop instructions.
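A toy illustration of the watchpoint half of this, assuming a made-up four-instruction ISA rather than any real CPU: only the instructions that touch memory are checked against the watched range, and the offending instruction is reported before it executes. In a real partial-emulation debugger the non-memory instructions would run natively instead of being interpreted as they are here. All names (`run_with_watch`, `struct watch`, and so on) are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

#define MEM_WORDS 64

/* Hypothetical toy ISA (not any real CPU). */
enum op { OP_LOAD, OP_STORE, OP_ADDI, OP_HALT };

struct insn {
    enum op op;
    uint8_t reg;    /* register operand, 0..3 */
    uint32_t arg;   /* word address for LOAD/STORE, immediate for ADDI */
};

struct watch {
    uint32_t lo, hi;          /* inclusive range of watched word addresses */
    int on_read, on_write;
};

struct cpu {
    uint32_t pc;
    uint32_t r[4];
    uint32_t mem[MEM_WORDS];
};

static int in_watch(const struct watch *w, uint32_t addr)
{
    return addr >= w->lo && addr <= w->hi;
}

/* Interpret until HALT or a watch hit. Returns the PC of the instruction
 * that would access the watched range (not executed, so the state is from
 * just before the access), -1 on halt, or -2 on an out-of-range access. */
long run_with_watch(struct cpu *c, const struct insn *prog, size_t n,
                    const struct watch *w)
{
    while (c->pc < n) {
        const struct insn *i = &prog[c->pc];
        switch (i->op) {
        case OP_LOAD:               /* "interesting": reads memory */
            if (i->arg >= MEM_WORDS) return -2;
            if (w->on_read && in_watch(w, i->arg)) return (long)c->pc;
            c->r[i->reg & 3] = c->mem[i->arg];
            break;
        case OP_STORE:              /* "interesting": writes memory */
            if (i->arg >= MEM_WORDS) return -2;
            if (w->on_write && in_watch(w, i->arg)) return (long)c->pc;
            c->mem[i->arg] = c->r[i->reg & 3];
            break;
        case OP_ADDI:               /* no memory access: no check */
            c->r[i->reg & 3] += i->arg;
            break;
        case OP_HALT:
            return -1;
        }
        c->pc++;
    }
    return -1;
}
```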

Software breakpoints are a lot easier. Many CPUs support software breakpoints in the form of a specific instruction which causes an exception when executed. This kind of instruction is usually called a "break" or "trap" instruction. CPUs which do not support this specific kind of instruction (but do support instruction exceptions) can be tricked into causing an exception by using any invalid instruction. CPUs which support neither must resort to jump instructions.

The idea here is to overwrite the instruction(s) at the target address with your "breakpoint instruction"; when it executes, an exception handler notifies the debugger of the breakpoint hit. (Jump breakpoints just need a subroutine to jump to.)
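A minimal sketch of the bookkeeping for this, assuming a flat, writable code space and using 0xE1200070 (the ARMv5 `BKPT #0` encoding) purely as an example break opcode. The exception handler that would call `bp_hit()` is hypothetical, and real hardware also needs the instruction and data caches kept coherent after code is patched, which isn't shown.

```c
#include <stdint.h>

#define BREAK_OPCODE 0xE1200070u  /* example only: ARMv5 "BKPT #0" */
#define MAX_BP 8

struct sw_bp {
    uint32_t *addr;   /* patched location */
    uint32_t saved;   /* original instruction */
    int active;
};

static struct sw_bp bp_table[MAX_BP];

/* Patch the break opcode over *addr, remembering the original instruction.
 * Returns the slot index, or -1 if the table is full. */
int bp_insert(uint32_t *addr)
{
    int free_slot = -1;
    for (int i = 0; i < MAX_BP; i++) {
        if (bp_table[i].active && bp_table[i].addr == addr)
            return i;             /* already set: don't save the break opcode */
        if (!bp_table[i].active && free_slot < 0)
            free_slot = i;
    }
    if (free_slot < 0)
        return -1;
    bp_table[free_slot].addr = addr;
    bp_table[free_slot].saved = *addr;
    bp_table[free_slot].active = 1;
    *addr = BREAK_OPCODE;
    /* On real hardware: clean the D-cache and invalidate the I-cache here. */
    return free_slot;
}

/* Called by a (hypothetical) exception handler with the faulting address.
 * Restores the original instruction so execution can resume at addr.
 * Returns 1 if addr was one of ours, 0 otherwise. */
int bp_hit(uint32_t *addr)
{
    for (int i = 0; i < MAX_BP; i++) {
        if (bp_table[i].active && bp_table[i].addr == addr) {
            *addr = bp_table[i].saved;
            bp_table[i].active = 0;   /* breakpoint is now disabled */
            return 1;
        }
    }
    return 0;
}
```

`bp_hit()` takes the "put the original instruction back" route, so the breakpoint is gone until `bp_insert()` is called again.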

The important thing to remember (with both software breakpoints and watchpoints) is to execute the replaced instruction in the same context in which it originally ran, or to emulate the instruction and its context. "Context" refers to the register set, CPU status flags, program counter, stack, etc. Executing the original instruction in its original context can be difficult with software breakpoints. For example, a PC-relative instruction computes a different result when it is executed from a different address, so on some CPUs pure instruction interpretation of PC-relative instructions is the best bet. The simplest solution is to put the original instruction(s) back and return to the address where the exception occurred. But this means your breakpoint is now disabled.
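As an example of why PC-relative instructions are awkward: an ARM-state `B`/`BL` encodes its target as a signed 24-bit word offset from the instruction's address plus 8. If the displaced instruction is copied somewhere else to be executed (for instance, into a small trampoline), the offset has to be recomputed, and it may not fit at all. The helper names below are hypothetical, and other PC-relative forms (`LDR rd, [pc, #imm]`, `ADD rd, pc, ...`) would need the same kind of fix-up or full interpretation.

```c
#include <stdint.h>

/* ARM-state B or BL: bits 27..25 == 101. Condition 0xF is excluded because
 * on ARMv5 that encoding is BLX (immediate), which also switches to Thumb. */
int arm_is_branch(uint32_t insn)
{
    return ((insn >> 25) & 0x7u) == 0x5u && (insn >> 28) != 0xFu;
}

/* Sign-extend the 24-bit offset field without relying on signed shifts. */
static int32_t arm_branch_offset(uint32_t insn)
{
    return (int32_t)((insn & 0x00FFFFFFu) ^ 0x00800000u) - 0x00800000;
}

/* Target of a B/BL located at address pc (PC reads as pc + 8 in ARM state). */
uint32_t arm_branch_target(uint32_t insn, uint32_t pc)
{
    return pc + 8u + ((uint32_t)arm_branch_offset(insn) << 2);
}

/* Re-encode a B/BL that originally lived at old_pc so that, executed from
 * new_pc, it still reaches the same target. Returns 0 if out of range. */
int arm_relocate_branch(uint32_t insn, uint32_t old_pc, uint32_t new_pc,
                        uint32_t *out)
{
    int64_t target = (int64_t)arm_branch_target(insn, old_pc);
    int64_t delta = target - ((int64_t)new_pc + 8);

    /* The offset is a signed 24-bit word count: -2^25 .. 2^25 - 4 bytes. */
    if (delta % 4 != 0 || delta < -(INT64_C(1) << 25) ||
        delta > (INT64_C(1) << 25) - 4)
        return 0;
    *out = (insn & 0xFF000000u) | ((uint32_t)(delta / 4) & 0x00FFFFFFu);
    return 1;
}
```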

As you may have guessed, the additional code from software breakpoints/watchpoints adds a lot of overhead to the running program. It can be mitigated with some clever prediction (e.g., not trapping within "obvious" loops), but the program will still run slower than it does natively. Another issue with software breakpoints is that other code can overwrite them at will. Code that checks its own integrity (CRCs, checksums, etc.) will also fail when software breakpoints are set within the checked ranges. Using the control-flow (interpretation) approach for software breakpoints, which leaves the code in memory unmodified, effectively bypasses both of these problems with code that reads or writes itself.
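A small illustration of the checksum problem, assuming a simple additive checksum as a stand-in for whatever integrity check the target program actually uses: patching a break opcode into the checked range changes the result, while the control-flow approach (comparing each PC the interpreter is about to execute against a list) leaves memory, and therefore the checksum, untouched. Both functions are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

/* Stand-in for a program's self-check: a plain 32-bit additive sum. */
uint32_t code_sum(const uint32_t *code, size_t n)
{
    uint32_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += code[i];
    return s;
}

/* Control-flow style breakpoint: nothing is patched; the interpreter asks
 * this before executing the instruction at pc. */
int pc_is_breakpoint(uint32_t pc, const uint32_t *bps, size_t nbps)
{
    for (size_t i = 0; i < nbps; i++)
        if (bps[i] == pc)
            return 1;
    return 0;
}
```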

Yet another way to implement software breakpoints/watchpoints is with timers. Some CPUs contain watchdog timers that can be exploited to cause an interrupt after every clock cycle. Some architectures contain external timers that can also be used. The main problems with these ideas are making the timing accurate, and sharing the timer resources with the software being debugged. A slightly different approach uses a hardware modification to wire a clock directly to the CPU's interrupt input pin(s). Some hardware/software tricks can be used to enable and disable this "custom timer hack" at will. This eliminates the resource-sharing problem, but the timing issue remains.
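The timing problem can be shown with a simple simulation, assuming a periodic interrupt that is taken just before instruction `i` of an execution trace whenever `i` is a multiple of `period`, and a handler that compares the interrupted PC against the breakpoint address. With a period of one instruction every PC is seen; with anything longer, a breakpoint can fall between two samples. No real timer hardware is modeled, and the function name is hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

/* trace: the sequence of PCs the program executes, in order.
 * Returns the trace index at which the handler caught bp_pc, or -1. */
long sample_for_breakpoint(const uint32_t *trace, size_t n,
                           size_t period, uint32_t bp_pc)
{
    if (period == 0)
        return -1;
    for (size_t i = 0; i < n; i += period)
        if (trace[i] == bp_pc)   /* handler: compare the interrupted PC */
            return (long)i;
    return -1;
}
```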

For some examples of the emulation ideas on an ARM platform, Zeld has been doing some work with the NDS. We discussed some implementation details in private messages (specifically, exploiting the condition field of ARM instructions to interpret conditional branches extremely quickly). If there is any interest in this, I'll post some excerpts from the conversation, with Zeld's blessing. I've also offered him some Mercurial (hg) hosting, so if that pans out, he'll have a source code repository available with his work.
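The conversation excerpts aren't in this thread, so here is only a generic sketch of the underlying idea, not Zeld's code: the top four bits of every ARM instruction select a condition on the N, Z, C and V flags, and precomputing a 16-bit "which flag combinations pass" mask per condition turns the check into one shift and one AND. Treating condition 0xF as "not passed" is a simplification; on ARMv5 it marks special unconditional encodings (such as `BLX` immediate) that an interpreter has to decode separately.

```c
#include <stdint.h>

/* CPSR flag bits: N, Z, C, V live in bits 31..28. */
#define FLAG_N (1u << 31)
#define FLAG_Z (1u << 30)
#define FLAG_C (1u << 29)
#define FLAG_V (1u << 28)

/* Reference version: does condition `cond` pass under `cpsr`? */
int arm_cond_passed(uint32_t cond, uint32_t cpsr)
{
    int n = (cpsr & FLAG_N) != 0, z = (cpsr & FLAG_Z) != 0;
    int c = (cpsr & FLAG_C) != 0, v = (cpsr & FLAG_V) != 0;
    switch (cond & 0xFu) {
    case 0x0: return z;             /* EQ */
    case 0x1: return !z;            /* NE */
    case 0x2: return c;             /* CS/HS */
    case 0x3: return !c;            /* CC/LO */
    case 0x4: return n;             /* MI */
    case 0x5: return !n;            /* PL */
    case 0x6: return v;             /* VS */
    case 0x7: return !v;            /* VC */
    case 0x8: return c && !z;       /* HI */
    case 0x9: return !c || z;       /* LS */
    case 0xA: return n == v;        /* GE */
    case 0xB: return n != v;        /* LT */
    case 0xC: return !z && n == v;  /* GT */
    case 0xD: return z || n != v;   /* LE */
    case 0xE: return 1;             /* AL */
    default:  return 0;             /* 0xF: special encodings, see above */
    }
}

/* cond_table[cond] has bit k set if flags NZCV == k pass that condition. */
static uint16_t cond_table[16];

void arm_cond_table_init(void)
{
    for (uint32_t cond = 0; cond < 16; cond++) {
        cond_table[cond] = 0;
        for (uint32_t nzcv = 0; nzcv < 16; nzcv++)
            if (arm_cond_passed(cond, nzcv << 28))
                cond_table[cond] |= (uint16_t)(1u << nzcv);
    }
}

/* Fast check straight from the instruction word: one lookup, shift, AND. */
int arm_cond_passed_fast(uint32_t insn, uint32_t cpsr)
{
    return (cond_table[insn >> 28] >> (cpsr >> 28)) & 1;
}
```

For a conditional branch, a failed check means the interpreter can simply step to the next instruction without decoding anything else.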

_________________
I have to return some video tapes.

Feed me a stray cat.


Posted: Fri May 15, 2009 8:25 pm
Komrade

Joined: Tue Mar 27, 2007 10:18 am
Posts: 1328
I'll get on that soon (I hope; if not, I may just host the code through different means temporarily).

Anyway, it seems what I mentioned about a timer is different from what you've discussed. For one, the timer I was talking about was stored in WRAM and operated manually by user (my) code.

It was just a lame hack to make sure that an amount of time that was "long" relative to the speed of the processor but "short" relative to the debugger (here referring to the person doing the debugging, not the debugging tool) would elapse before any attempt to locate another instruction to trap was made.

And as I've mentioned, the code I had running (and tested with small success in one case) advanced through memory, not through execution, so it was prone to difficulties with dynamic code. Because the instructions it trapped this way were not always executed, there was less slowdown, but also the possibility of missing hits.

If you'll pardon the oxymoron.
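To illustrate the "advancing through memory" approach, here is a hypothetical sketch (not Zeld's code) of the scanning step on ARM: it walks forward by address to the next load/store instruction, whether or not that instruction will ever execute, which is where both the reduced slowdown and the missed hits come from. It also shows another weakness: literal-pool data sitting between functions can decode as a load or store. Only word/byte `LDR`/`STR` and `LDM`/`STM` are recognized here; halfword transfers are ignored.

```c
#include <stdint.h>
#include <stddef.h>

/* Word/byte single data transfer (LDR/STR/LDRB/STRB): bits 27..26 == 01,
 * excluding the undefined space (bits 27..25 == 011 with bit 4 set). */
static int is_ldr_str(uint32_t insn)
{
    if ((insn >> 28) == 0xFu)
        return 0;                            /* e.g. PLD, not a real access */
    if ((insn & 0x0C000000u) != 0x04000000u)
        return 0;
    return (insn & 0x0E000010u) != 0x06000010u;
}

/* Block data transfer (LDM/STM): bits 27..25 == 100. */
static int is_ldm_stm(uint32_t insn)
{
    return (insn >> 28) != 0xFu && (insn & 0x0E000000u) == 0x08000000u;
}

/* Scan forward in memory order from index `start` and return the index of
 * the next memory-access instruction, or -1 if there is none. */
long next_mem_insn(const uint32_t *code, size_t n, size_t start)
{
    for (size_t i = start; i < n; i++)
        if (is_ldr_str(code[i]) || is_ldm_stm(code[i]))
            return (long)i;
    return -1;
}
```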

Post whatever you like from the conversation, though hopefully the code will speak for itself. I want to warn you (again?) that it's not my best work, and that even if it had been, it would still be subpar. Or rather, I imagine its reception would suggest so.

Edit: Just checked.

The code is from 2 Christmases ago (yet another Christmas I spent coding in assembly instead of spending time with friends and/or family - at least this time I wasn't hand assembling my code!).

So, yeah, that's my excuse, if you don't like it. :P

Edit, again: Actually, Para, can you go ahead and set up and/or explain how to use a repository at the hg.kodewerx domain for this particular component of the SRDP's reach?

Posted: Fri May 22, 2009 2:28 am
Komrade

Joined: Tue Mar 27, 2007 6:23 pm
Posts: 1354
Location: Mario Raceway, 1509.831, 217.198, -564.429
Title: Mario Kart 64 Hacker
I'm curious why this is all being done in private.

Also, re: instruction interpretation: all you really need to say is Valgrind.

_________________
HyperNova Software (currently down): check out my PSP/DS/Game Boy/Windows/Linux homebrew, ROM hacks, and Gameshark codes!


Posted: Fri May 22, 2009 9:43 pm
Krew (Admin)

Joined: Sun Oct 01, 2006 9:26 pm
Posts: 3768
Title: All in a day's work.
It's not being done in private; that's the reason for this thread.

P.S. Yes, Valgrind does a lot of static and dynamic analysis of compiled code.

Posted: Sat May 23, 2009 8:47 am
Komrade

Joined: Tue Mar 27, 2007 10:18 am
Posts: 1328
The code, if anyone's interested.

What I did was use 0xAAAAAAA as a placeholder for all of the variable addresses, assemble the code, and then leave it as an exercise for the user to replace the placeholder values with suitable ones.

You'll have to do that again; the TT code I had formatted that way uses a less optimized assembly so I don't feel inclined to post it.
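For anyone doing the replacement by machine rather than by hand, a hypothetical helper like the one below swaps each occurrence of the placeholder word for the next real address, in order. It assumes each placeholder survives as a literal 32-bit word in the assembled image. Some assemblers merge identical literal-pool entries, so several variables may end up sharing a single placeholder word; check the disassembly before trusting the order.

```c
#include <stdint.h>
#include <stddef.h>

/* Replace occurrences of `placeholder` in an assembled image, in order,
 * with successive entries of `addrs`. Returns how many were patched. */
size_t patch_placeholders(uint32_t *words, size_t n, uint32_t placeholder,
                          const uint32_t *addrs, size_t naddrs)
{
    size_t used = 0;
    for (size_t i = 0; i < n && used < naddrs; i++)
        if (words[i] == placeholder)
            words[i] = addrs[used++];
    return used;
}
```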


