28 February 2012

85. Nvidia bug causes evolution to crash/segmentation fault. Temporary and permanent fixes on Debian Testing

The nvidia-tls bug is affecting evolution too...
(and it's not just GNOME - http://www.linuxmintusers.de/index.php?topic=6859.0)


UPDATE: I have two nvidia boxes running debian testing. Only the one with GT 430 is exhibiting problems. My GT 520 box is unaffected.


UPDATE: Here's how to downgrade your drivers:
http://verahill.blogspot.com.au/2012/03/debian-testing-downgrading-nvidia.html

If you don't want to read the entire post, here's the summary:
1. I think the only semi-permanent solution is to downgrade from 295.20 to nvidia driver version 290.10
2. you can run evolution with
strace -o evolution.log evolution
and IT WILL NOT CRASH
3. It doesn't matter whether you use the nvidia binary straight from nvidia, using sgfxi, or use the nvidia-kernel-dkms/glx debian way. Evolution still dies.

PS strace is normally used to track system calls for the purpose trouble shooting. That it prevents evolution from crashing is completely unintended. But it works as a quick-fix.

PPS  What it does:

"The nvidia-tls libraries (/usr/lib/libnvidia-tls.so.x.y.z and /usr/lib/tls/libnvidia-tls.so.x.y.z); these files provide thread local storage support for the NVIDIA OpenGL libraries (libGL, libGLcore, and libglx). Each nvidia-tls library provides support for a particular thread local storage model (such as ELF TLS), and the one appropriate for your system will be loaded at run time."




The symptoms:
Start  evolution, and it will crash with a segmentation fault within the first ten seconds or so

dmesg points to the nvidia bug:

[19690.606196] evolution[13032]: segfault at 10 ip 00007f5a0f53ac0f sp 00007f59ddde6508 error 6 in libnvidia-tls.so.295.20[7f5a0f53a000+3000]
[21476.236668] evolution[18197]: segfault at 10 ip 00007fd4389c2c0f sp 00007fd418d56508 error 6 in libnvidia-tls.so.295.20[7fd4389c2000+3000]
[21513.224145] evolution[18387]: segfault at 10 ip 00007f2cd3e85c0f sp 00007f2cb3a44508 error 6 in libnvidia-tls.so.295.20[7f2cd3e85000+3000]
[21954.867694] evolution[19803]: segfault at 10 ip 00007f1680aa9c0f sp 00007f165bffe508 error 6 in libnvidia-tls.so.295.20[7f1680aa9000+3000]
[22129.426444] evolution[20435]: segfault at 10 ip 00007f2a05bf8c0f sp 00007f29e5725508 error 6 in libnvidia-tls.so.295.20[7f2a05bf8000+3000]


Running
CAMEL_DEBUG=all evolution >& evolution.log
three times had it crash with

First time:

DB SQL operation [BEGIN] started
Camel SQL Exec:
BEGIN
Camel SQL Exec:
COMMIT
DB Operation ended. Time Taken : 0.000060
###########
received: * LSUB (\HasNoChildren) "/" "INBOX"
received: B00005 OK Success
sending : B00006 LIST "" "*"
--> Segmentation fault

Second time:

===========
DB SQL operation [ATTACH DATABASE ':memory:' AS mem] started
Camel SQL Exec:
ATTACH DATABASE ':memory:' AS mem
POP3_STREAM_LINE (25): '-ERR unrecognized command'
DB Operation ended. Time Taken : 0.011516
###########

Database succesfully opened  

===========
DB SQL operation [ATTACH DATABASE ':memory:' AS mem] started
Camel SQL Exec:
ATTACH DATABASE ':memory:' AS mem
DB Operation ended. Time Taken : 0.010961
###########
**
GLib-GIO:ERROR:/tmp/buildd/glib2.0-2.30.2/./gio/gdbusmessage.c:1986:append_value_to_blob: assertion failed: (g_utf8_validate (v, -1, &end) && (end == v + len))
--> Segmentation fault

Third time:

===========
DB SQL operation [BEGIN] started
Camel SQL Exec:
BEGIN
Camel SQL Exec:
COMMIT
DB Operation ended. Time Taken : 0.000070
###########
sending : A00004 SELECT INBOX
--> Segmentation fault

strace:
I can't crash evolution with either strace or valgrind running. Now why is that?


Solution:
Downgrading. Which turns out to be more difficult than one would imagine.

UPDATE: Here's how to downgrade your drivers:
http://verahill.blogspot.com.au/2012/03/debian-testing-downgrading-nvidia.html

If you don't want to downgrade the nvidia drivers:
A temporary solution is, odd as it may seem, to use
strace -o evolution.log evolution
because it just refuses to crash. I don't know why, but it works. 

2 comments:

  1. Thanks for sharing. You might consider

    strace -o /dev/null evolution

    which prevents evolution cluttering your harddisk.

    ReplyDelete
    Replies
    1. Good point. I was logging with the intent of figuring out why it was crashing, but if you want to use the strace hack as a stop-gap measure, then dumping the data to /dev/null as you suggest makes excellent sense.

      Delete