Debugging
At some point while developing a MOOSE-based application you will probably need to use a debugger. A debugger will allow you to stop your program at certain points (or if things happen such as a memory segfault). Once stopped you can then carefully step through the program, inspecting variable values as you go in order to find the source of the problem.
In particular, if you ever see a "Segfault" or a "Signal 11" that means it's time to pull out the debugger. Any debugger will automatically stop once a segfault is reached, showing you exactly where the invalid memory access occurred.
For a good tutorial on debugging, see Example 21.
Debug Executable
The first step to debugging anything is to build a debug executable. By default MOOSE-based applications are built in "optimized" (opt) mode. That ensures the fastest solves. However, an optimized executable is missing a lot of information that is useful to a debugger, and the optimization process itself can cause code to get reordered (or even skipped!), making it difficult to step through a program.
To build an executable suitable for debugging you need to set the METHOD environment variable to dbg. You can export it in your environment, but it's usually simpler to use a UNIX shortcut that allows you to define environment variables at the same time you run a command, like so:
METHOD=dbg make -j 8
Always remember that the 8 should be modified to reflect the number of processors you want to use for the build (usually the number of cores in your computer).
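The two points above (the one-off METHOD assignment and the choice of job count) can be tried out with a small sketch. It only prints the build command instead of running it. The variable names inline, after and njobs are made up for this illustration, and it assumes that either nproc (typical on Linux) or sysctl -n hw.ncpu (typical on Mac OSX) is available, falling back to 1 otherwise.

```shell
# An inline assignment is visible only to the one command it prefixes.
# Note: this unsets METHOD in the current shell so the demo starts clean.
unset METHOD
inline=$(METHOD=dbg sh -c 'printf "%s" "$METHOD"')
after=${METHOD:-unset}
echo "inside the command: $inline; afterwards: $after"

# Pick a job count from the number of cores (assumption: nproc or sysctl exists).
if command -v nproc >/dev/null 2>&1; then
  njobs=$(nproc)
elif command -v sysctl >/dev/null 2>&1; then
  njobs=$(sysctl -n hw.ncpu 2>/dev/null || echo 1)
else
  njobs=1
fi

# Print the build command rather than running it.
echo "METHOD=dbg make -j $njobs"
```

If the printed command looks right, run it from your application directory.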
Once the build is complete you should end up with a "debug executable" that looks like: yourapp-dbg. That executable is perfect for loading into a debugger. However, that executable will run VERY slowly, so before you begin debugging make sure you come up with a problem that is as small as possible but still shows the problem you're trying to fix.
Debuggers
Many different debuggers exist: lldb, gdb, ddd, Totalview and Intel Debugger are just a few. While a command-line debugger (like lldb or gdb) might seem daunting at first, it is an invaluable tool for quick debugging and for debugging in complicated scenarios such as when you're running on a cluster. Learning one should be essential to any computational scientist.
For debugging MOOSE-based applications we recommend lldb if you're using the clang compiler (default on Mac OSX) and gdb for the gcc compiler (default on Linux).
LLDB and GDB
lldb and gdb are very similar. They work on the "command-line", taking text input and moving through your program as it executes.
With our MOOSE package on Mac OSX you actually need to run lldb using sudo so it has the elevated privileges it needs to attach to your program. To invoke lldb with a MOOSE-based application on Mac OSX you would do:
sudo lldb -- ./yourapp-dbg -i inputfile.i
The -- tells lldb that any command-line options after that point need to be passed to the executable you are running. sudo will ask you for your password so that it can elevate the privileges of lldb. You can also look below to see how to allow lldb to run with sudo without using a password.
gdb can be run with a similar command:
gdb --args ./yourapp-dbg -i inputfile.i
(On Linux this will generally work without the need for sudo.)
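The choice between the two invocations above can be captured in a small helper. This is only a sketch: the function name debugger_prefix is hypothetical, and it assumes that uname -s reporting Darwin means you are on the Mac OSX setup described above (lldb under sudo), with gdb used everywhere else. It prints the command rather than launching the debugger.

```shell
# Hypothetical helper: map an OS name (as printed by `uname -s`) to the
# debugger invocation recommended in the text above.
debugger_prefix() {
  case "$1" in
    Darwin) echo "sudo lldb --" ;;  # Mac OSX: lldb needs sudo to attach
    *)      echo "gdb --args" ;;    # Linux and others: gdb, no sudo needed
  esac
}

os=$(uname -s)
# Print the full command line for this machine.
echo "$(debugger_prefix "$os") ./yourapp-dbg -i inputfile.i"
```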
Once this is done, your executable will be loaded but won't start running. This is an opportune time to set breakpoints. We usually recommend setting a breakpoint on MPI_Abort using the b command:
b MPI_Abort
Then to start running your executable use the r command (type r then hit Enter).
If a breakpoint (or fault) is reached, I recommend first using the bt command to output a "back-trace" so you can see exactly what the current call-stack looks like and figure out where you are.
To quit the debugger use Ctrl+d (by the way: that's a normal way to quit lots of command-line interpreters on Unix, including Python).
Another useful command is p (for 'print'), which allows you to print out the value of a variable. Just do:
p variablename
You can learn about the full set of commands by using help or by looking at any number of tutorials online.
Parallel Debugging
Firstly, if you don't have to debug in parallel DON'T! Only do parallel debugging when you have a problem that can only be reproduced when running in parallel. If your problem will show up in serial, it is MUCH easier to debug in serial. If it takes a long time for your problem to run so you are wanting to run it in parallel: don't do that... instead, try to make your problem smaller so that you can debug it in serial.
With all of that said: if you actually do need to debug in parallel, MOOSE has a couple of command-line arguments to help. However, before we get there, we need to do a bit of setup on Mac OSX:
Mac OSX Parallel Debugging Setup
As noted above, when running lldb on Mac OSX we need to run with sudo to give it permission to attach to our running program. sudo will ask for your password, but unfortunately when doing parallel debugging there is no way to enter that password. Therefore, we need to make it so that you can run lldb with sudo without a password.
The first thing to do is get the full path to lldb using the command-line:
which lldb
If you are using our package on OSX this should return something like /opt/moose/llvm-5.0.1/bin/lldb. Make note of this location (copy it, or set that Terminal aside) because you'll need it in the next step.
To allow lldb to run with sudo without a password we need to modify the sudoers file. To do that, issue this command in a terminal (preferably a new one so you can still see the path to lldb in the first one):
sudo visudo
Most likely this is going to use vi (a text editor) to open the sudoers file (I say "most likely" because it technically can open any editor based on the EDITOR environment variable). If you are unfamiliar with vi don't worry. Just press i to go into "insert" mode. Navigate to the bottom of the file and add a line that looks like:
username ALL= NOPASSWD: /opt/moose/llvm-5.0.1/bin/lldb
Where username MUST be replaced with your username! And the path to lldb needs to reflect what you got back from which lldb above. Once that line is in place press Esc (to exit "insert mode"), then type :wq (that's a colon then w then q - it's a command that says "write and quit") and press Enter.
Once you've completed that you should be able to run sudo lldb on the command-line and not need to enter your password. If you still need to enter your password, email moose-users so we can figure out what's wrong before you go further.
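To avoid typos when composing the sudoers entry, the line can be generated from your actual username and lldb path and then pasted into visudo. This is a sketch only: the function name sudoers_line is hypothetical, and if lldb is not on your PATH it falls back to the example path used above, which may not match your installation. It does not touch the sudoers file itself.

```shell
# Hypothetical helper: format a sudoers entry in the same shape as the
# example line above. $1 = username, $2 = full path to lldb.
sudoers_line() {
  printf '%s ALL= NOPASSWD: %s\n' "$1" "$2"
}

user=$(id -un)
# Assumption: fall back to the document's example path if lldb is not found.
lldb_path=$(command -v lldb || echo /opt/moose/llvm-5.0.1/bin/lldb)

# Print the line to paste at the bottom of the file opened by `sudo visudo`.
sudoers_line "$user" "$lldb_path"

# Afterwards, `sudo -n lldb --version` is one way to check the result:
# sudo's -n flag makes it fail instead of prompting for a password.
```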
Actually Parallel Debugging
With all of that setup we are ready to debug in parallel. There are two options:
1. Launch a terminal for each MPI process
What we're going to do is launch a terminal (using xterm) for each MPI process. That terminal will run our debugger and attach to the running MPI process. For this to work, you either need to be on your local box or have X-forwarding set up over SSH (which I'm not going to go into here).
Let's assume that you're working on your local Mac workstation using our package. In order to launch your program with 4 MPI processes and 4 terminals for debugging you would do:
mpiexec -n 4 ./yourapp-dbg -i inputfile.i --start-in-debugger='sudo lldb'
(If you are on Linux, you will most likely want to put gdb where sudo lldb is.)
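Since this option depends on xterm windows being able to open, a quick pre-flight check before running the command above can save some confusion. The sketch below is an assumption-laden heuristic, not something MOOSE provides: the function name can_open_xterm is hypothetical, and it treats a non-empty DISPLAY variable plus an xterm on the PATH as a sign that windows can open.

```shell
# Hypothetical pre-flight check: succeed only if DISPLAY is set and
# an xterm executable can be found on the PATH.
can_open_xterm() {
  [ -n "${DISPLAY:-}" ] && command -v xterm >/dev/null 2>&1
}

if can_open_xterm; then
  echo "xterm and DISPLAY look usable"
else
  echo "no xterm or no DISPLAY: debugger windows probably cannot open"
fi
```

If the check fails on a remote machine, option 2 below is likely the better fit.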
If everything is set up correctly you should see 4 XTerm windows show up with lldb command-line prompts. Those debugger prompts are already attached to your running executable, but the executable is paused. This is an opportune time to set breakpoints, but you have to do it in each terminal separately. For instance, you might want to go through each one and do:
b MPI_Abort
to be able to stop if MOOSE encounters an error.
Once you are ready to continue - you do just that. Use the c command (type c and hit Enter) in each terminal window to tell it to continue. Once you go through all of the open terminal windows you should see your application start to run in your original terminal. If a breakpoint (or any fault) is reached on any process, that terminal window will show the command-prompt again, allowing you to inspect variables, etc.
Passing debugger commands on the command line
lldb accepts debugger commands through the -o command line option; they are executed as soon as the executable is loaded up and ready. This can be used to set breakpoints and immediately resume the execution of the app.
mpirun -n 4 ./yourapp-dbg -i inputfile.i --start-in-debugger "sudo lldb -o 'break set -E C++' -o cont"
The above command will set breakpoints that halt on thrown exceptions. Replace break set -E C++ with b MPI_Abort to break on MOOSE errors. The -o cont option will automatically run the app after the breakpoints are set.
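When you want more than one breakpoint, the chain of -o options gets tedious to type. The following sketch assembles the debugger string from a list of lldb commands and prints the full launch line. The function name lldb_with_commands is hypothetical, and the quoting is deliberately simple: it assumes none of the commands contain a single quote.

```shell
# Hypothetical helper: wrap each argument in its own -o '...' option and
# finish with -o cont so the app resumes once the breakpoints are set.
# Assumption: no argument contains a single quote.
lldb_with_commands() {
  cmd="sudo lldb"
  for c in "$@"; do
    cmd="$cmd -o '$c'"
  done
  printf '%s -o cont\n' "$cmd"
}

dbg=$(lldb_with_commands 'break set -E C++' 'b MPI_Abort')
# Print the launch line instead of running it.
echo "mpirun -n 4 ./yourapp-dbg -i inputfile.i --start-in-debugger \"$dbg\""
```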
2. Launch your application and tell it to wait so you can manually attach a debugger
This is going to be used in cases where you need to debug using LOTS of MPI processes, but you don't want a terminal window for each one. This is also handy if you're working on a cluster that doesn't have any way of doing X-forwarding for option #1 to work. The idea is to launch your application and have it wait during the initialization phase so you have time to attach a debugger manually to one of the processes.
To do this simply run your application like so:
mpiexec -n 4 ./yourapp-dbg -i inputfile.i --stop-for-debugger
This will cause your application to print something like the following and then wait 30 seconds (by default - if you need more time use --stop-for-debugger=75 where 75 is the number of seconds you want it to wait):
> mpiexec -n 4 ../../../moose_test-opt -i simple_diffusion.i --stop-for-debugger=2
Stopping for 2 seconds to allow attachment from a debugger.
All of the processes you can connect to:
rank - hostname - pid
0 - dereksmacpro.local - 53403
1 - dereksmacpro.local - 53404
2 - dereksmacpro.local - 53405
3 - dereksmacpro.local - 53406
Waiting...
The message there is telling you where each process is running and what its "process ID" (pid) is. That is the relevant information you need to be able to attach a debugger to that process. In this case (on my local Mac) if I want to connect to "rank 2", in a separate Terminal I would do:
sudo lldb -p 53405
(gdb has a similar mechanism: gdb -p 53405 attaches to a running process by its pid. See its docs for details.)
That will launch lldb and attach to my running program. Attaching to the program "pauses" it, allowing me to set breakpoints and then use the c command to tell it to continue.
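With many ranks, picking the right pid out of the table by eye is error-prone, especially with the clock running. As a sketch, the table can be saved (or piped) and filtered with awk. The function name pid_for_rank is hypothetical, and it assumes the "rank - hostname - pid" layout shown in the sample output above; if your MOOSE version prints the table differently, the field separator will need adjusting.

```shell
# Hypothetical helper: read the "rank - hostname - pid" table on stdin
# and print the pid for the rank given as $1.
pid_for_rank() {
  awk -v r="$1" -F' - ' '
    # Only consider rows whose first field is a number (skips the header
    # and lines such as "Waiting...").
    $1 ~ /^[ \t]*[0-9]+[ \t]*$/ && $1 + 0 == r + 0 { print $3 + 0 }
  '
}

# Sample table copied from the output above.
table='rank - hostname - pid
0 - dereksmacpro.local - 53403
1 - dereksmacpro.local - 53404
2 - dereksmacpro.local - 53405
3 - dereksmacpro.local - 53406
Waiting...'

pid=$(printf '%s\n' "$table" | pid_for_rank 2)
# Print the attach command for that rank.
echo "sudo lldb -p $pid"
```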