Monday, March 15, 2010

Tdpkg 1.0 - speed up reading dpkg database

Hello,
you may have noticed that dpkg takes a long time reading the database the first time you run it (e.g. through apt). This is because of the huge number of /var/lib/dpkg/info/*.list files (1700+ on my desktop machines). At cold start it can take 14 seconds or more to install or remove a single package.
A first proposal to use sqlite as a cache was posted to the dpkg mailing list by Sean Finney back in 2007, and I reproposed it a couple of weeks ago. There has been no reply from the maintainers since then.
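
To check how many .list files your own system has, something like the following can be used. This is only a minimal sketch: the function name count_list_files is made up for illustration, and it assumes a find that supports -maxdepth (GNU and BSD find do).

```shell
# Count the *.list files directly inside a dpkg info directory.
# count_list_files is a hypothetical helper name, not part of dpkg or tdpkg.
count_list_files() {
    # Default to the standard dpkg info directory when no argument is given.
    dir=${1:-/var/lib/dpkg/info}
    find "$dir" -maxdepth 1 -type f -name '*.list' | wc -l
}
```

Calling count_list_files with no argument counts the files in /var/lib/dpkg/info; pass another directory as the first argument to count there instead.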

My first idea was to fork dpkg and change only the part that reads the list files. That would have meant installing another dpkg version, and I dropped the idea for two main reasons: most people wouldn't have replaced dpkg, and the fork would have been too hard to maintain.
The solution is tdpkg, a shared library that wraps the glibc function calls made by dpkg. The README tells you to back up your /var/lib/dpkg/info first, but tdpkg is robust enough not to fuck it up.
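
To get a feeling for which glibc entry points a wrapper like this has to cover, one can look at the symbols dpkg imports. The sketch below filters objdump -T output for the open and stat families. It is an inspection aid only, not a description of how tdpkg itself is implemented, and list_io_imports is a hypothetical name.

```shell
# Read `objdump -T` output on stdin and print the names of undefined
# (imported) symbols whose name contains "open" or "stat".
# list_io_imports is a hypothetical helper name.
list_io_imports() {
    awk '/\*UND\*/ && $NF ~ /open|stat/ { print $NF }' | LC_ALL=C sort -u
}
```

A typical call would be objdump -T /usr/bin/dpkg | list_io_imports. The exact list depends on the dpkg build and the C library it is linked against.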

Tdpkg comes with tokyocabinet (faster) and sqlite (handles concurrency better) cache backends. I've managed to bring cold startup time from about 14 seconds down to about 2 seconds. I will definitely have fun installing and removing applications again.
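
Since the library is loaded through LD_PRELOAD, one way to try it without relying on shell aliases is a small wrapper function. This is a sketch under assumptions: with_tdpkg is a made-up name, and the default path is a placeholder that has to be replaced with the location of your own libtdpkg.so.

```shell
# Path to the built library. This default is a placeholder, not a
# location tdpkg installs to; point it at your own libtdpkg.so.
TDPKG_LIB=${TDPKG_LIB:-/path/to/libtdpkg.so}

# Run a command with the library preloaded. with_tdpkg is a hypothetical
# helper name. Child processes inherit LD_PRELOAD, so a dpkg started by
# apt-get or aptitude should be covered as well.
with_tdpkg() {
    if [ "$#" -eq 0 ]; then
        echo "usage: with_tdpkg COMMAND [ARGS...]" >&2
        return 2
    fi
    env LD_PRELOAD="$TDPKG_LIB" "$@"
}
```

For example, with_tdpkg dpkg -S test runs dpkg with the library loaded. Note that sudo typically resets the environment, so under sudo the variable has to be set on the other side, for instance with sudo env LD_PRELOAD="$TDPKG_LIB" apt-get install somepackage.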

28 comments:

  1. Anonymous, 6:06 AM

    Is it compatible with "plain" dpkg? I mean, can I use either tdpkg or dpkg without any problem, or should I keep using tdpkg after the switch?

    Is it "aptitude-compatible"?

    D.

  2. For me the /var/lib/dpkg/info/tdpkg.cache file doesn't seem to be created for some reason. Any ideas what's wrong?

  3. Oops, I forgot to mention: I'm running Ubuntu Karmic 32-bit and tried both tdpkg 1.0 and the latest version from git, with the same result.

  4. Thanks, I've been waiting for (or thinking of implementing) something like this for a long time!

  5. It's 100% compatible: you can use tdpkg, then dpkg, then apt and all its variants. It only acts as a cache when possible. When you use dpkg and then tdpkg again, the cache is rebuilt for consistency.

    On Ubuntu it may be different, because Debian uses eglibc. You can help me by running objdump -T /usr/bin/dpkg|grep open and objdump -T /usr/bin/dpkg|grep stat

    Thanks.

  6. Anonymous, 1:14 PM

    objdump -T /usr/bin/dpkg|grep open
    00000000 DF *UND* 00000000 GLIBC_2.0 open
    00000000 DF *UND* 00000000 GLIBC_2.1 fopen64
    00000000 DF *UND* 00000000 GLIBC_2.1 fdopen
    00000000 DF *UND* 00000000 GLIBC_2.0 opendir
    00000000 DF *UND* 00000000 GLIBC_2.1 fopen
    00000000 DF *UND* 00000000 GLIBC_2.2 open64


    objdump -T /usr/bin/dpkg|grep stat


    00000000 DF *UND* 00000000 GLIBC_2.1 statfs64
    00000000 DF *UND* 00000000 GLIBC_2.0 __xstat
    00000000 DF *UND* 00000000 GLIBC_2.2 __xstat64
    00000000 DF *UND* 00000000 GLIBC_2.2 __lxstat64
    00000000 DF *UND* 00000000 GLIBC_2.2 __fxstat64

  7. Output looks ok. Do you get any message starting with tdpkg:... when you run LD_PRELOAD=./libtdpkg.so dpkg -S test ?

  8. david, 2:06 PM

    actually there's a message.
    I think it is something related to permissions of the cache file:
    (861)-~% dpkg -S cdsvbfcsdfgbvcds
    tdpkg tokio: no permission
    tdpkg tokio: no permission
    tdpkg: file /var/lib/dpkg/info/libexempi3.list not up-to-date in cache, rebuild cache
    tdpkg tokio: no permission
    tdpkg tokio: no permission
    tdpkg: can't rebuild cache, no wrapping
    tdpkg tokio: no permission


    and if you need:
    ls -alh /var/lib/dpkg/info/tdpkg.cache
    -rw-r--r-- 1 root root 12M 2010-03-16 08:20 /var/lib/dpkg/info/tdpkg.cache

  9. I found out why it wasn't working for me: the alias doesn't work, but running the command directly with LD_PRELOAD does. The alias is listed when running "alias", but for some reason it just isn't applied when running apt-get or dpkg. Any ideas?

  10. @david you need to be root, at least the first time, in order to create the cache.

    @Andrew could you post the alias you're using? Don't use quotes in LD_PRELOAD when using the alias.

    I've deleted Anonymous's message for privacy (he explicitly requested it).

  11. @Luca: Like I said, the alias shows up when running the command "alias" in a terminal but it just doesn't use it when running the actual dpkg.

    I've used it like this:
    alias dpkg='LD_PRELOAD=/home/andrei/tdpkg/libtdpkg.so /usr/bin/dpkg'

    but also like this:
    alias dpkg="LD_PRELOAD=/home/andrei/libtdpkg.so /usr/bin/dpkg"

  12. OK, I got it working by placing the alias in the /etc/bash.bashrc file instead of ~/.bashrc

    Now tdpkg is triggered when running "dpkg -i *.deb", but I don't see anything related to it when running "apt-get install". Does this mean it only works for "dpkg -i ...", or must I do something else to get it working with apt-get?

    Sorry for bothering you so much... I just wanted to present your tool to my blog's readers (and also use it myself, obviously) and I'm trying to make sure I have all the facts right.

  13. Nice, Andrew. You also have to alias aptitude and apt-get: they call dpkg directly, bypassing the shell, so the dpkg alias doesn't apply to them. Thanks for spreading the news :)

  14. Thanks for answering so fast. Hmmm I can't seem to get apt-get to work.

    Like so: alias apt-get='LD_PRELOAD=/root/tdpkg/libtdpkg.so /usr/bin/apt-get'

    ?

    It doesn't seem to pick up tdpkg...

  15. I don't know, Andrew; it works for me with apt-get as well as with aptitude. I'm using zsh; perhaps apt-get is incompatible with the alias?

  16. Anonymous, 2:14 AM

    Good job! It works well and speeds things up dramatically.

    jack

  17. Anonymous, 7:11 AM

    Could it depend on sudo?
    Does sudo use the user's aliases?
    Or should we put the alias in root's .bashrc and invoke aptitude as root?

    Just an idea.

    Ciao

  18. Yes, you should make sure root uses those aliases too.

  19. Damn.. it's lightning fast!! It used to take 24-25 seconds to load the database, and now the process has become instantaneous!! Wonderful! It's strange that it hasn't been officially introduced yet in the various Debian-based distros.
    Great job :-)

  20. It's a shame that the maintainers didn't consider your idea for so long. This bug was first reported in 2000 [1] and it's still there. Just awful.

    [1] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=69192

  21. Just managed to make it work with Ubuntu 9.10. What I did:

    1. I put the aliases in /etc/bash.bashrc (for dpkg, apt-get and aptitude)

    2. I needed to run some dpkg command as root to create the DB, but sudo *doesn't* work for some reason (perhaps it doesn't expand aliases). So just log in as root.

    3. To test, I ran sync and then echo 3 > /proc/sys/vm/drop_caches. (Just echo 1 didn't clear enough cached data: at first I thought tdpkg had kicked in, but it was still plain dpkg running with a warm cache.)

    FYI I have over 11000 files in /var/lib/dpkg/info and the "Reading database" operation was painfully slow (took like one minute) although my hardware is more than decent (Core 2 Duo @ 2.8GHz, 4GB RAM, 7200 rpm hard drive).

    Thanks for this great little tool, which the dpkg maintainers should have written years ago!

  22. Thanks for testing, Mishoo. There was a bug on Ubuntu too. Also keep in mind that new dpkg versions will finally be faster as well, taking up to 3 seconds on my PC without a cache.

  23. Anonymous, 5:44 AM

    Well, I just tried both flavours (sqlite and tokyo) without much luck. It seems to manage to cache all 3894 .list files in the info dir but then starts spitting loads - and I mean loads - of error messages:

    tdpkg: nested open(/var/lib/dpkg/info/libxcb-keysyms1.list, 0, 38) detected, no wrapping
    tdpkg: nested __fxstat64(10) detected, no wrapping
    tdpkg: nested read(10) detected, no wrapping
    tdpkg: close() on unknown fd 10, no wrapping
    tdpkg: nested open(/var/lib/dpkg/info/libxcb-event1-dev.list, 0, 29) detected, no wrapping
    tdpkg: nested __fxstat64(10) detected, no wrapping
    tdpkg: nested read(10) detected, no wrapping
    tdpkg: close() on unknown fd 10, no wrapping
    tdpkg: nested open(/var/lib/dpkg/info/wireshark-common.list, 0, 25) detected, no wrapping
    tdpkg: nested __fxstat64(10) detected, no wrapping
    tdpkg: nested read(10) detected, no wrapping
    tdpkg: close() on unknown fd 10, no wrapping
    tdpkg: nested open(/var/lib/dpkg/info/libunac1.list, 0, 34) detected, no wrapping
    tdpkg: nested __fxstat64(10) detected, no wrapping
    tdpkg: nested read(10) detected, no wrapping
    tdpkg: close() on unknown fd 10, no wrapping
    ...
    ...
    etc.

    Instead of being faster it ends up being much slower...

  24. Tdpkg is no longer compatible with the current dpkg, which has finally gained some speed, so you should not use it anymore. Although a tdpkg made compatible again would still be faster than dpkg, it is no longer worth the effort the way it was before.
