Friday, August 08, 2014

Nix pill 9: automatic runtime dependencies

Welcome to the 9th Nix pill. In the previous pill we wrote a generic builder for autotools projects. We feed it build dependencies and a source tarball, and we get a Nix derivation as a result.

Today we stop by the GNU hello world program to analyze build and runtime dependencies, and enhance the builder in order to avoid unnecessary runtime dependencies.

Build dependencies


Let's start by analyzing the build dependencies of our GNU hello world package:
$ nix-instantiate hello.nix
/nix/store/z77vn965a59irqnrrjvbspiyl2rph0jp-hello.drv
$ nix-store -q --references /nix/store/z77vn965a59irqnrrjvbspiyl2rph0jp-hello.drv
/nix/store/0q6pfasdma4as22kyaknk4kwx4h58480-hello-2.9.tar.gz
/nix/store/1zcs1y4n27lqs0gw4v038i303pb89rw6-coreutils-8.21.drv
/nix/store/2h4b30hlfw4fhqx10wwi71mpim4wr877-gnused-4.2.2.drv
/nix/store/39bgdjissw9gyi4y5j9wanf4dbjpbl07-gnutar-1.27.1.drv
/nix/store/7qa70nay0if4x291rsjr7h9lfl6pl7b1-builder.sh
/nix/store/g6a0shr58qvx2vi6815acgp9lnfh9yy8-gnugrep-2.14.drv
/nix/store/jdggv3q1sb15140qdx0apvyrps41m4lr-bash-4.2-p45.drv
/nix/store/pglhiyp1zdbmax4cglkpz98nspfgbnwr-gnumake-3.82.drv
/nix/store/q9l257jn9lndbi3r9ksnvf4dr8cwxzk7-gawk-4.1.0.drv
/nix/store/rgyrqxz1ilv90r01zxl0sq5nq0cq7v3v-binutils-2.23.1.drv
/nix/store/qzxhby795niy6wlagfpbja27dgsz43xk-gcc-wrapper-4.8.3.drv
/nix/store/sk590g7fv53m3zp0ycnxsc41snc2kdhp-gzip-1.6.drv
It has exactly the derivations referenced in the derivation function, nothing more, nothing less. Some of them might not be used at all. However, since our generic mkDerivation function always pulls in these dependencies (think of it like Debian's build-essential), every package you build from now on will bring these packages into the Nix store.

Why are we looking at .drv files? Because the hello.drv file is the representation of the build action to perform in order to build the hello out path, and as such it also contains the input derivations needed to be built before building hello.

Digression about NAR files


NAR is the Nix ARchive. First question: why not tar? Why another archiver? Because commonly used archivers are not deterministic. They add padding, they do not sort files, they add timestamps, etc. Hence NAR, a very simple deterministic archive format used by Nix for deployment.
NARs are also used extensively within Nix itself as we'll see below.

For the rationale and implementation details, see Dolstra's PhD thesis.

To create a NAR archive use nix-store --dump, and to unpack one use nix-store --restore. Both commands work regardless of /nix/store.
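As a rough sketch of how the two commands fit together, the helper below dumps a path to a temporary NAR file and restores it somewhere else. The name nar_roundtrip is our own and not part of Nix; the sketch assumes nix-store is on the PATH and that the destination does not exist yet:

```shell
# Hypothetical helper (not part of Nix): serialize SRC as a NAR,
# then unpack that NAR into DEST, which must not exist yet.
nar_roundtrip() {
    src=$1
    dest=$2
    nar=$(mktemp)
    # --dump writes the NAR to stdout, --restore reads one from stdin
    nix-store --dump "$src" > "$nar" && nix-store --restore "$dest" < "$nar"
    status=$?
    rm -f "$nar"
    return "$status"
}
```

For instance, nar_roundtrip ./somedir ./somedir-copy should leave a copy of the directory tree in ./somedir-copy.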

Runtime dependencies


Runtime dependencies, however, are a different matter. Build dependencies are automatically recognized by Nix once they are used in any derivation call, but we never specify what the runtime dependencies of a derivation are.

There's really black magic involved. It's something that at first glance makes you think "no, this can't work in the long term", but at the same time it works so well that a whole operating system is built on top of this magic.

In other words, Nix automatically computes all the runtime dependencies of a derivation, and it's possible thanks to the hash of the store paths.

Steps:
  1. Dump the derivation as NAR, a serialization of the derivation output. Works fine whether it's a single file or a directory.
  2. For each build dependency .drv and its corresponding out path, search the contents of the NAR for this out path.
  3. If found, then it's a runtime dependency.
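
The steps above can be imitated with ordinary tools. What follows is only a simplified sketch, not Nix's actual implementation: instead of dumping a NAR it greps the files of the output tree directly, and it searches for the hash part of each candidate path. The function name find_references and the store paths in the demo are made-up placeholders, not real hashes:

```shell
# Sketch: print every candidate store path whose hash part occurs
# somewhere in the files under OUT.
# Usage: find_references OUT CANDIDATE...
find_references() {
    out=$1
    shift
    for candidate in "$@"; do
        base=${candidate##*/}   # e.g. aaaa1111-glibc-2.19
        hash=${base%%-*}        # the part before the first dash
        # -r recurse, -a read binaries as text, -q quiet, -F fixed string
        if grep -raqF -- "$hash" "$out"; then
            printf '%s\n' "$candidate"
        fi
    done
}

# Demo with a fake output tree that mentions only the first candidate.
demo=$(mktemp -d)
mkdir -p "$demo/out/bin"
printf 'rpath=/nix/store/aaaa1111-glibc-2.19/lib\n' > "$demo/out/bin/hello"
find_references "$demo/out" \
    /nix/store/aaaa1111-glibc-2.19 \
    /nix/store/bbbb2222-gnumake-3.82
# prints /nix/store/aaaa1111-glibc-2.19
```

Only the candidate whose hash appears in the output is reported, which is the whole idea: a build dependency that leaves no trace in the output is not a runtime dependency.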

You really get all the runtime dependencies, and that's why Nix deployments are so easy.
$ nix-instantiate hello.nix
/nix/store/z77vn965a59irqnrrjvbspiyl2rph0jp-hello.drv
$ nix-store -r /nix/store/z77vn965a59irqnrrjvbspiyl2rph0jp-hello.drv
/nix/store/a42k52zwv6idmf50r9lps1nzwq9khvpf-hello
$ nix-store -q --references /nix/store/a42k52zwv6idmf50r9lps1nzwq9khvpf-hello
/nix/store/94n64qy99ja0vgbkf675nyk39g9b978n-glibc-2.19
/nix/store/8jm0wksask7cpf85miyakihyfch1y21q-gcc-4.8.3
/nix/store/a42k52zwv6idmf50r9lps1nzwq9khvpf-hello
OK, glibc and gcc. Well, gcc really should not be a runtime dependency!
$ strings result/bin/hello|grep gcc
/nix/store/94n64qy99ja0vgbkf675nyk39g9b978n-glibc-2.19/lib:/nix/store/8jm0wksask7cpf85miyakihyfch1y21q-gcc-4.8.3/lib64
Oh, Nix added gcc because its out path is mentioned in the "hello" binary. Why is that? That's the ld rpath: the list of directories where libraries can be found at runtime. In other distributions the rpath is not heavily used. In Nix, however, we have to refer to particular versions of libraries, so the rpath plays an important role.

The build process adds that gcc lib path thinking it may be useful at runtime, but really it's not. How do we get rid of it? Nix authors have written another magical tool called patchelf, which is able to reduce the rpath to the paths that are really used by the binary.

That's not all: even after reducing the rpath, the hello binary would still depend upon gcc because of debugging information. For that, the well-known strip can be used.

Another phase in the builder


We will add a new phase to our autotools builder. The builder has these phases already:
  1. First the environment is set up
  2. Unpack phase: we unpack the sources in the current directory (remember, Nix changes dir to a temporary directory first)
  3. Change source root to the directory that has been unpacked
  4. Configure phase: ./configure
  5. Build phase: make
  6. Install phase: make install
We add a new phase after the installation phase, which we call the fixup phase. We append the following to the end of builder.sh:
find $out -type f -exec patchelf --shrink-rpath '{}' \; -exec strip '{}' \; 2>/dev/null
That is, for each file we run patchelf --shrink-rpath and strip. Note that we used two new commands here, find and patchelf. These two deserve a place in baseInputs of autotools.nix as findutils and patchelf.
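
The one-liner above tries every file and hides the resulting errors with 2>/dev/null. A more selective variant could look at each file's magic bytes first and only touch ELF files. This is just a sketch under our own assumptions: is_elf and fixup_elf are hypothetical helper names, and patchelf and strip are assumed to be on the PATH when fixup_elf is called:

```shell
# Succeeds if the file starts with the ELF magic bytes 0x7f 'E' 'L' 'F'.
is_elf() {
    magic=$(head -c 4 < "$1" | od -An -t x1 | tr -d ' \n')
    [ "$magic" = "7f454c46" ]
}

# Shrink the rpath and strip debug info of every ELF file under DIR.
fixup_elf() {
    find "$1" -type f | while IFS= read -r f; do
        if is_elf "$f"; then
            patchelf --shrink-rpath "$f" && strip "$f"
        fi
    done
}
```

In builder.sh this would be called as fixup_elf "$out" after the install phase. Scripts and data files are skipped, so real errors from patchelf or strip stay visible.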

Rebuild hello.nix and...:
$ nix-build hello.nix
[...]
$ nix-store -q --references result
/nix/store/94n64qy99ja0vgbkf675nyk39g9b978n-glibc-2.19
/nix/store/md4a3zv0ipqzsybhjb8ndjhhga1dj88x-hello
...only glibc is the runtime dependency. Exactly what we wanted.

The package is self-contained: copy its closure to another machine and you will be able to run it. Recall how few components under /nix/store were necessary to run Nix when we installed it. The hello binary will use that exact version of the glibc library and interpreter, not the system one:
$ ldd result/bin/hello
 linux-vdso.so.1 (0x00007fff11294000)
 libc.so.6 => /nix/store/94n64qy99ja0vgbkf675nyk39g9b978n-glibc-2.19/lib/libc.so.6 (0x00007f7ab7362000)
 /nix/store/94n64qy99ja0vgbkf675nyk39g9b978n-glibc-2.19/lib/ld-linux-x86-64.so.2 (0x00007f7ab770f000)
Of course, the executable runs fine as long as everything is under the /nix/store path.

Conclusion


Short post compared to previous ones as I'm still on vacation, but I hope you enjoyed it. Nix provides tools with cool features. In particular, Nix is able to compute all runtime dependencies automatically for us. Not only shared libraries, but also referenced executables, scripts, Python libraries, etc.

This makes packages self-contained, because we're sure (apart from data and configuration) that copying the runtime closure to another machine is sufficient to run the program. That's why Nix has one-click install, or reliable deployment in the cloud. All with one tool.


Next pill


...we will introduce nix-shell. With nix-build we always build derivations from scratch: the source gets unpacked, configured, built and installed. But this may take a long time; think of WebKit. What if we want to apply some small changes and compile incrementally instead, while keeping a self-contained environment similar to nix-build?

Pill 10 is available for reading here.

To be notified about the new pill, stay tuned on #NixPills, follow @lethalman or subscribe to the nixpills rss.

Comments:

Unknown said...

Hello,
Great post series!
I just have a question at this point, what about runtime dependencies which are not compile time dependencies? Such as for interpreted languages?
I don't see how there would be a mechanism to ensure that these are satisfied. Is there a policy instead?

Luca Bruno said...

Let's say you need to depend on python. Most probably, the packaged executable will have a shebang like #!/usr/bin/env python.
This is what happens with the common stdenv utilities in nix:
1. Put python in buildInputs, so this is a build time dep.
2. Check every executable in the output for the shebang.
3. Determine the shebang needs python from the env, rewrite it using the full nix store path as something like: #!/nix/store/..../bin/python

It's not exactly like this perhaps for python, but that's the idea. Basically rewrite non-absolute interpreters to absolute nix store paths. Hence build dependencies become runtime dependencies.

Of course, it's possible that we miss some runtime dependencies in general.

The statement is: runtime dependencies are a subset of build time dependencies. But there's no way to determine runtime dependencies from a software automatically unfortunately :)

DavidS said...

Thank you for the great post!

Following your instructions, I made a docker image using Nix. After loading it into docker and running it, I see many unnecessary dependencies stored at /nix. The glibc libraries are of course needed by most of the included binaries, but the compilers themselves, such as gcc-5.4, gcc-6.4, dmd-2.075, dmd-2067, and ldc-1.3 are not needed and they are adding a lot of extra space.

I have two questions:

1. How could I modify the Nix script for the package or the docker image in order to preclude or remove the above compilers as runtime dependencies?

2. The CREATED field of the image as shown as `docker images` or `docker inspect` is incorrect. How could I change the CREATED field value?

Here is a minimal example of what I did:
https://github.com/djhshih/muscle-nix

Thanks so much!

DavidS said...

Sorry, to follow up with my last post...

It appears that it is the `ldc` package that is causing the compiler-inclusion/size-inflation problem. I re-made the package that required `ldc` using a pre-compiled binary so that `ldc` is no longer needed as one of its buildInputs. In the final docker image, none of the compilers I mentioned above are included...

(Since the minimal example does not depend on `ldc`, it cannot be used to reproduce this problem.)

So to rephrase my first question:

1. How could the `ldc` package or the dependency on `ldc` be changed in order to avoid pulling in the ldc compiler (along with gcc compilers) into the docker image?

Luca Bruno said...

Hi DavidS, if you want to create a docker image you may try this instead: http://lethalman.blogspot.ie/2016/04/cheap-docker-images-with-nix_15.html
