Debugging and profiling quick guide

strace -c -f -p 2021

Count (-c) the system calls made by the process with PID 2021, all of its threads, and any child processes it spawns while being traced (-f). Press Ctrl-C to stop tracing; strace then detaches and prints a summary table:

<Ctrl-C> 
strace: Process 31867 detached
strace: Process 31868 detached
strace: Process 31869 detached
strace: Process 31870 detached
strace: Process 18996 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 31.06    0.854875        3143       272           io_getevents
 26.19    0.720795       34324        21         4 restart_syscall
 14.65    0.403190        2965       136        19 futex
 14.38    0.395702        2105       188           poll
 12.12    0.333484       10421        32           select
  0.76    0.020852         348        60           fsync
  0.28    0.007772          13       605       201 recvfrom
  0.21    0.005732          24       236           sendto
  0.15    0.004123          29       140           io_submit
  0.06    0.001635          51        32           pwrite64
  0.04    0.001057          31        34           read
  0.03    0.000827          44        19           madvise
  0.02    0.000640          13        48           lseek
  0.02    0.000582          34        17           write
  0.01    0.000320          80         4           getdents
  0.01    0.000215          15        14           fcntl
  0.01    0.000138          23         6           setsockopt
  0.00    0.000090          23         4           close
  0.00    0.000061          31         2           accept
  0.00    0.000059          30         2           shutdown
  0.00    0.000057          29         2           openat
  0.00    0.000052          26         2           access
  0.00    0.000047          24         2           getpeername
  0.00    0.000032          16         2           fstat
  0.00    0.000030          15         2           gettid
  0.00    0.000023          12         2           sched_yield
------ ----------- ----------- --------- --------- ----------------
100.00    2.752390                  1884       224 total
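If the summary is saved to a file (strace writes it to stderr, so 2> summary.txt works), a small awk filter can pull out only the syscalls that returned errors and show what share of their calls failed. This is a sketch that assumes the column layout shown above, where rows without errors leave the errors column empty and so have only five fields; the function name syscall_errors and the file name summary.txt are illustrative.

```shell
# Print each syscall that returned errors, with the fraction of failed calls.
# Reads a saved `strace -c` summary on stdin. Assumes the columns
# "% time, seconds, usecs/call, calls, errors, syscall"; the errors column
# is blank when zero, so only rows with errors have six fields.
syscall_errors() {
    awk '$1 ~ /^[0-9]+\.[0-9]+$/ && NF == 6 && $6 != "total" {
        printf "%-16s %5d of %5d calls failed (%.1f%%)\n", $6, $5, $4, 100 * $5 / $4
    }'
}
```

Usage: syscall_errors < summary.txt. For the table above it would list restart_syscall, futex and recvfrom; lines such as "strace: Process ... detached" and the header are skipped because their first field is not a percentage.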

strace -f -p 2021 -o strace.log -s 500

Write the trace of process 2021 and its children (-f) to the file strace.log (-o), showing up to 500 characters of each string argument (-s 500) instead of the default 32.
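With the full log on disk, failed calls are easy to find because strace ends them with "= -1 ERRNO (description)". The hypothetical helper below counts failures per errno name, most frequent first. It is a sketch built on grep, awk, sort and uniq, and assumes that output format; with -f each line may also start with a PID, which does not affect the match.

```shell
# Tally the errno names of failed system calls in a strace log.
# A failed call ends in "= -1 ERRNO (description)".
errno_tally() {
    grep -oE '= -1 E[A-Z0-9]+' "$1" |   # keep only "= -1 ENOENT" etc.
        awk '{ print $3 }' |            # the errno name is the third word
        sort | uniq -c | sort -rn       # count and order by frequency
}
```

Usage: errno_tally strace.log | head. A large EAGAIN count is often just non-blocking I/O, so the interesting entries are usually the rarer ones.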

mkdir traces
cd traces
strace -ff -p 2021 -s 1000 -tt -o strace

For process 2021 and all its child processes, trace system calls into one file per process, named strace.<pid> (that’s what -ff does together with -o strace). Show up to 1000 characters of each string argument (-s 1000) and prefix each line with the wall-clock time, including microseconds (-tt).
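Because -tt stamps every line, the per-process files can be interleaved back into a single timeline. The sketch below prefixes each line with the PID taken from the file name and sorts on the timestamp. It assumes all calls happened on the same day (HH:MM:SS.micro stamps sort correctly as text but wrap at midnight); merge_traces is a made-up name. strace also ships a strace-log-merge script for this job, which may be preferable where it is installed.

```shell
# Merge strace -ff -tt output files (<prefix>.<pid>) into one timeline,
# with each line prefixed by its PID. Assumes every line starts with an
# HH:MM:SS.microseconds timestamp from the same day.
merge_traces() {
    prefix=${1:-strace}
    for f in "$prefix".*; do
        [ -e "$f" ] || continue          # glob matched nothing
        pid=${f##*.}                     # "strace.31867" -> "31867"
        sed "s/^/$pid /" "$f"            # prepend the PID to every line
    done | LC_ALL=C sort -s -k2,2        # stable sort on the timestamp field
}
```

Usage, inside the traces directory: merge_traces strace | less. Keep only trace files in that directory, since any strace.* file matches the pattern.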

strace -p 2021 -T -o strace.log

Trace process 2021 and log to strace.log, appending to each line the time spent in that system call (-T), shown in angle brackets such as <0.000031>.
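The -T durations make it straightforward to rank the slowest calls. This sketch assumes each completed call line ends in <seconds> as described above; slowest_calls is a hypothetical name, and the second argument (how many lines to show) defaults to 10.

```shell
# List the slowest system calls in a log written with strace -T.
# Each output line is "<duration><TAB><original strace line>".
slowest_calls() {
    awk 'match($0, /<[0-9]+\.[0-9]+>$/) {
        # strip the angle brackets around the duration
        print substr($0, RSTART + 1, RLENGTH - 2) "\t" $0
    }' "$1" | LC_ALL=C sort -rn | head -n "${2:-10}"
}
```

Usage: slowest_calls strace.log 20. LC_ALL=C makes sort treat "." as the decimal point regardless of the system locale. Blocking calls like poll or futex usually top the list, so slowness there means waiting rather than work.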

strace -p 2021 -e trace=file

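Trace only system calls that take a file name as an argument (trace=file). Other classes, such as network, process, signal and desc, are described in the man page: http://man7.org/linux/man-pages/man1/strace.1.html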
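A common use of trace=file is finding the configuration files a program looks for but cannot find. Assuming the output was saved with -o (for example strace -p 2021 -e trace=file -o files.log), the sketch below prints the first quoted argument of every call that failed with ENOENT, which for openat, stat, access and similar calls is the path. missing_files is a hypothetical name, and paths containing escaped quotes are not handled.

```shell
# Print the unique paths that file-related calls failed to find (ENOENT).
# Takes the first "..." argument of each failing line as the path.
missing_files() {
    awk '/= -1 ENOENT / && match($0, /"[^"]*"/) {
        print substr($0, RSTART + 1, RLENGTH - 2)   # drop the quotes
    }' "$1" | LC_ALL=C sort -u
}
```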

FreeIPA

With quite a few physical machines and even more virtual machines, it can be tricky to keep file permissions on shared storage like Ceph in sync. Local UIDs are handed out in the order users are created, so adding two users in a different order on two nodes gives them swapped UIDs. This was less of a problem when I accessed files via CIFS/Samba, but now I only use that for Windows machines. Linux machines use CephFS, so UIDs and GIDs have to match on all nodes. Therefore I use FreeIPA to keep them in sync. I spent what felt like months (though it might have been a weekend) trying to get a multi-master LDAP cluster to work, but gave up.
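UID drift of this kind can be spotted by collecting the passwd database from each node, for example with ssh node1 getent passwd > node1.passwd, and comparing the copies. The sketch below reports every user name that has more than one UID across the given files; the function name and the node file names are hypothetical.

```shell
# Report users whose UID differs between passwd-format files
# (one file per node, e.g. collected with `getent passwd`).
uid_mismatches() {
    awk -F: '
        !($1 in first)    { first[$1] = $3 }                  # first UID seen
        first[$1] != $3   { bad[$1] = 1 }                     # differs later
                          { seen[$1] = seen[$1] " " FILENAME "=" $3 }
        END { for (u in bad) print u ":" seen[u] }
    ' "$@" | LC_ALL=C sort
}
```

Usage: uid_mismatches node1.passwd node2.passwd prints lines such as "alice: node1.passwd=1001 node2.passwd=1002".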

It turns out FreeIPA has built-in replication. freeipa01 runs on a Ceph RBD disk, which keeps it available even while physical nodes are rebooted. freeipa02 runs on a local disk on one of my servers, in case Ceph stops working. FreeIPA also serves DNS, which it replicates to BIND daemons on three virtual machines that also run on local storage. DNS NOTIFY messages tell the BIND slaves to fetch changes right away, which keeps replication lag down.

What else does FreeIPA do? Kerberos and single sign-on, which I currently use only for SSH; certificate issuing; and centralized PAM login against the user directory. I’ve come to use it quite a lot.