Monday, March 15, 2010

Tdpkg 1.0 - speed up reading dpkg database

Hello,
you may have noticed that dpkg takes a long time reading its database the first time you run it (e.g. through apt). This is because of the huge number of /var/lib/dpkg/info/*.list files (1700+ on my desktop machines). At cold start it can take 14 seconds or more just to install or remove a single package.
Back in 2007 Sean Finney posted a first proposal on the dpkg mailing list to use sqlite as a cache, and a couple of weeks ago I proposed it again. There has been no reply from the maintainers since then.

My first idea was to fork dpkg and change only the part that reads the list files. That would have meant installing a different dpkg version, and I didn't do it for two main reasons: most people wouldn't replace their dpkg, and the fork would have been too hard to maintain.
The solution is tdpkg, a shared library that wraps the glibc function calls made by dpkg. The README suggests backing up your /var/lib/dpkg/info, but tdpkg is robust enough not to fuck it up.
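The post doesn't show tdpkg's code, so here is only a rough illustration of the general LD_PRELOAD interposition technique it relies on, not tdpkg's actual implementation. The sketch defines its own open(), finds the real libc function with dlsym(RTLD_NEXT, "open"), and counts opens of *.list files at the point where a cache lookup could go. The library name libwrapdemo.so, the list_opens counter and is_list_file() are hypothetical. A real wrapper would have to cover more symbols than open() (dpkg also uses open64, fopen and the __xstat family). Build it with something like `gcc -shared -fPIC -o libwrapdemo.so wrapdemo.c -ldl` (the -ldl is only needed on glibc older than 2.34) and run `LD_PRELOAD=./libwrapdemo.so dpkg -S test`.

```c
#undef _FORTIFY_SOURCE   /* fortified headers inline open(); we need the plain symbol */
#define _GNU_SOURCE      /* for RTLD_NEXT */
#include <dlfcn.h>
#include <errno.h>
#include <fcntl.h>
#include <stdarg.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical statistic: how many *.list files went through the wrapper. */
int list_opens = 0;

/* Hypothetical filter: nonzero if path ends in ".list". */
static int is_list_file(const char *path)
{
    size_t len = strlen(path);
    return len > 5 && strcmp(path + len - 5, ".list") == 0;
}

/* Shadows libc's open(): when this library is preloaded, the dynamic
   linker resolves the program's calls to open() to this definition. */
int open(const char *path, int flags, ...)
{
    static int (*real_open)(const char *, int, ...) = NULL;
    mode_t mode = 0;

    if (real_open == NULL) {
        /* Find the next definition of open() after this object, i.e. libc's. */
        real_open = (int (*)(const char *, int, ...))dlsym(RTLD_NEXT, "open");
        if (real_open == NULL) {
            errno = ENOSYS;
            return -1;
        }
    }

    /* The third (mode) argument is only passed when a file may be created. */
    if ((flags & O_CREAT) != 0
#ifdef O_TMPFILE
        || (flags & O_TMPFILE) == O_TMPFILE
#endif
    ) {
        va_list ap;
        va_start(ap, flags);
        mode = (mode_t)va_arg(ap, int);
        va_end(ap);
    }

    if (is_list_file(path)) {
        list_opens++;
        /* A cache-backed tool would serve this file's contents from its
           database here instead of reading it from disk. */
    }
    return real_open(path, flags, mode);
}
```

The lazily initialised static real_open pointer is not thread-safe. That is acceptable for a single-threaded program like dpkg, but a general-purpose wrapper would need more care.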

Tdpkg comes with two cache backends: tokyocabinet (faster) and sqlite (handles concurrency better). I've managed to bring cold startup time down from about 14 seconds to about 2 seconds. Installing and removing applications is definitely going to be fun again.

26 comments:

Anonymous said...

Is it compatible with "plain" dpkg? I mean, can I use either tdpkg or dpkg without any problem, or do I have to keep using tdpkg after the switch?

Is it "aptitude-compatible"?

D.

Andrew said...

For me the /var/lib/dpkg/info/tdpkg.cache file doesn't seem to be created for some reason. Any ideas what's wrong?

Andrew said...

Oops, I forgot to mention: I'm running Ubuntu Karmic 32-bit and tried both tdpkg 1.0 and the latest version from Git, with the same result.

Treviño said...

Thanks, I've been waiting for (or thinking of implementing) something like this for a long time!

Luca Bruno aka Lethalman said...

It's 100% compatible: you can use tdpkg, then dpkg, then apt and all its variants. It only acts as a cache when possible. When you use plain dpkg and then tdpkg again, the cache is rebuilt for consistency.
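The comment doesn't say how tdpkg detects that a cached file is out of date (a later log line reads "not up-to-date in cache, rebuild cache"). One plausible approach, assumed here rather than taken from tdpkg, is to record each .list file's modification time and size when caching it and compare them with stat() before trusting the cache. The struct cache_entry type and cache_entry_fresh() are hypothetical names.

```c
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical per-file record a cache backend (sqlite or tokyocabinet)
   might keep alongside the cached contents of a .list file. */
struct cache_entry {
    const char *path;   /* e.g. "/var/lib/dpkg/info/foo.list" */
    time_t mtime;       /* st_mtime observed when the entry was cached */
    off_t size;         /* st_size observed when the entry was cached */
};

/* True if the cached copy still matches the file on disk. If plain dpkg
   rewrote or deleted the file in the meantime, this returns false and the
   caller would rebuild the cache (or fall back to no wrapping). */
bool cache_entry_fresh(const struct cache_entry *e)
{
    struct stat st;
    if (stat(e->path, &st) != 0)
        return false;   /* missing or unreadable: treat as stale */
    return st.st_mtime == e->mtime && st.st_size == e->size;
}
```

Comparing st_mtime at one-second granularity can miss a rewrite that keeps the same size within the same second; the nanosecond st_mtim field or a content hash would be stricter, at some extra cost.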

On Ubuntu it may be different, because Debian uses eglibc. You can help me by running objdump -T /usr/bin/dpkg|grep open and objdump -T /usr/bin/dpkg|grep stat

Thanks.

Anonymous said...

objdump -T /usr/bin/dpkg|grep open
00000000 DF *UND* 00000000 GLIBC_2.0 open
00000000 DF *UND* 00000000 GLIBC_2.1 fopen64
00000000 DF *UND* 00000000 GLIBC_2.1 fdopen
00000000 DF *UND* 00000000 GLIBC_2.0 opendir
00000000 DF *UND* 00000000 GLIBC_2.1 fopen
00000000 DF *UND* 00000000 GLIBC_2.2 open64

objdump -T /usr/bin/dpkg|grep stat
00000000 DF *UND* 00000000 GLIBC_2.1 statfs64
00000000 DF *UND* 00000000 GLIBC_2.0 __xstat
00000000 DF *UND* 00000000 GLIBC_2.2 __xstat64
00000000 DF *UND* 00000000 GLIBC_2.2 __lxstat64
00000000 DF *UND* 00000000 GLIBC_2.2 __fxstat64

Luca Bruno aka Lethalman said...

Output looks OK. Do you get any message starting with "tdpkg:" when you run LD_PRELOAD=./libtdpkg.so dpkg -S test?

david said...

Actually, there's a message.
I think it's related to the permissions of the cache file:
(861)-~% dpkg -S cdsvbfcsdfgbvcds
tdpkg tokio: no permission
tdpkg tokio: no permission
tdpkg: file /var/lib/dpkg/info/libexempi3.list not up-to-date in cache, rebuild cache
tdpkg tokio: no permission
tdpkg tokio: no permission
tdpkg: can't rebuild cache, no wrapping
tdpkg tokio: no permission

And in case you need it:
ls -alh /var/lib/dpkg/info/tdpkg.cache
-rw-r--r-- 1 root root 12M 2010-03-16 08:20 /var/lib/dpkg/info/tdpkg.cache

Andrew said...

I found out why it wasn't working for me: the alias doesn't work, but running the command directly with LD_PRELOAD does. The alias is listed when I run "alias", but it just isn't applied when I run apt-get or dpkg. Any ideas?

Luca Bruno aka Lethalman said...

@david you need to be root, at least the first time, in order to create the cache.

@Andrew could you post the alias you're using? Don't put quotes around the LD_PRELOAD value when using the alias.

I've deleted Anonymous's message for privacy (he explicitly requested it).

Andrew said...

@Luca: Like I said, the alias shows up when running the command "alias" in a terminal but it just doesn't use it when running the actual dpkg.

I've used it like this:
alias dpkg='LD_PRELOAD=/home/andrei/tdpkg/libtdpkg.so /usr/bin/dpkg'

but also like this:
alias dpkg="LD_PRELOAD=/home/andrei/libtdpkg.so /usr/bin/dpkg"

Andrew said...

OK, I got it working by placing the alias in the /etc/bash.bashrc file instead of ~/.bashrc

Now tdpkg is triggered when running "dpkg -i *.deb", but I don't see anything related to it when running "apt-get install". Does this mean it only works for "dpkg -i ...", or must I do something else to get it working with apt-get?

Sorry for bothering you so much... I just wanted to present your tool to my blog's readers (and also use it myself, obviously) and I'm trying to make sure I have all the facts right.

Luca Bruno aka Lethalman said...

Nice, Andrew. You have to alias aptitude and apt-get as well, because they call dpkg directly, bypassing the shell, so a dpkg alias has no effect there. Thanks for spreading the news :)

Andrew said...

Thanks for answering so fast. Hmm, I can't seem to get apt-get to work.

Like so: alias apt-get='LD_PRELOAD=/root/tdpkg/libtdpkg.so /usr/bin/apt-get'

?

It doesn't seem to pick up tdpkg...

Luca Bruno aka Lethalman said...

I don't know, Andrew; it works for me with apt-get just as it does with aptitude. I'm using zsh, though; perhaps apt-get doesn't play well with the alias in your setup?

Anonymous said...

Good job! It works well and speeds things up dramatically.

jack

Anonymous said...

Could it depend on sudo?
Does sudo use the user alias?
Or should we put the alias in root's .bashrc and invoke aptitude as root?

Just an idea.

Ciao

Luca Bruno aka Lethalman said...

Yes, you should make sure root uses those aliases too.

red_mage said...

Damn... it's lightning fast!! Before, it took 24-25 seconds to load the database; now the process is instantaneous!! Wonderful! It's strange that it still hasn't been officially adopted by the various Debian-based distros.
Great work :-)

Mishoo said...

It's a shame that the maintainers didn't consider your idea for so long. This bug was first reported in 2000 [1] and it's still there. Just awful.

[1] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=69192

Mishoo said...

Just managed to make it work with Ubuntu 9.10. What I did:

1. I put the aliases in /etc/bash.bashrc (for dpkg, apt-get and aptitude)

2. I needed to run a dpkg command as root to create the DB, but sudo *doesn't* work for some reason (perhaps it doesn't expand aliases). So just log in as root.

3. To test, I did a sync and then echo 3 > /proc/sys/vm/drop_caches (echo 1 alone didn't clear enough cached data: at first I thought tdpkg was kicking in, but it was still plain dpkg benefiting from the OS cache).

FYI, I have over 11000 files in /var/lib/dpkg/info and the "Reading database" step was painfully slow (it took about a minute), even though my hardware is more than decent (Core 2 Duo @ 2.8GHz, 4GB RAM, 7200 rpm hard drive).

Thanks for this great little tool that the DPKG maintainers should have done for years!

Luca Bruno aka Lethalman said...

Thanks for testing, Mishoo. There was a bug affecting Ubuntu too. Also note that newer dpkg versions will finally be faster as well: down to 3 seconds on my PC without the cache.

Anonymous said...

Well, I just tried both flavours (sqlite and tokyo) without much luck. It seems to manage to cache all 3894 .list files in the info dir, but then it starts spitting out loads (and I mean loads) of error messages:

tdpkg: nested open(/var/lib/dpkg/info/libxcb-keysyms1.list, 0, 38) detected, no wrapping
tdpkg: nested __fxstat64(10) detected, no wrapping
tdpkg: nested read(10) detected, no wrapping
tdpkg: close() on unknown fd 10, no wrapping
tdpkg: nested open(/var/lib/dpkg/info/libxcb-event1-dev.list, 0, 29) detected, no wrapping
tdpkg: nested __fxstat64(10) detected, no wrapping
tdpkg: nested read(10) detected, no wrapping
tdpkg: close() on unknown fd 10, no wrapping
tdpkg: nested open(/var/lib/dpkg/info/wireshark-common.list, 0, 25) detected, no wrapping
tdpkg: nested __fxstat64(10) detected, no wrapping
tdpkg: nested read(10) detected, no wrapping
tdpkg: close() on unknown fd 10, no wrapping
tdpkg: nested open(/var/lib/dpkg/info/libunac1.list, 0, 34) detected, no wrapping
tdpkg: nested __fxstat64(10) detected, no wrapping
tdpkg: nested read(10) detected, no wrapping
tdpkg: close() on unknown fd 10, no wrapping
...
...
etc.

Instead of being faster it ends up being much slower...

Luca Bruno aka Lethalman said...

Tdpkg is no longer compatible with current dpkg, which has finally gained some speed. So you must not use it anymore. Even though a version made compatible again would still be faster than dpkg, it's no longer worth it the way it was before.
