Sunday, July 7, 2013

Automatic Binding Generation With Verrazano

Verrazano is a library for automatically creating Foreign Function Interface (FFI) bindings for Common Lisp. It is easy to use, as long as it works, and it can save you a lot of busy work. However, it is not well documented, and the documentation that is available is incorrect. Hopefully this post will help out people that don't want to figure it out themselves (including myself in 3 months).

As a sample of how to use Verrazano, I'm going to generate a set of bindings for the NLopt library from our friends over at MIT. This library is a great tool for optimization, whether it be minimization/maximization, fitting, or any optimization as it is very flexible in the constraints that the problem can have. It puts many of the optimization facilities in GSL to shame. I first started using it because of its derivative free local optimization routines and really went on to appreciate it when I realized I needed non-trivial constraints on the parameters in my model fits (GSL can't even handle trivial ones, really).

Up until now, I have been using some hand crafted bindings. This library is pretty small, so it actually isn't much of a burden to maintain for my purposes. But let's see how to generate them with Verrazano. We will use the function generate-binding which takes a list which represents a backend. The only backend available right now is CFFI, which just happens is a good portable FFI, so this is okay. To generate the NLopt bindings:

(generate-binding
 '(:cffi
   :package-name :nlopt-cffi-bindings
   :input-files ("nlopt.h")
   :gccxml-flags "-I /usr/local/include/"))

After this, we have a file called "nlopt-cffi-bindings.lisp". These represent a very low-level interface for the library. It basically gets you to the level of functionality you would have if programming in C. As such, this could be used as is, but that wouldn't be very Lispy, would it? I wrapped some of these utilities and will later post my end results as a proper set of Lispy bindings on GitHub.

This is already very useful, but Verrazano can do much more.

Generating GSL Bindings

There are already several GSL bindings available for CL (I even have my own private library that I like). The most popular, and deservedly so, is GSLL. The shear level of completeness is inspiring considering Liam wrote the bindings by hand (an admirable amount of work, but in my opinion a bit wasteful). One thing I don't like about GSLL is that it feels like it was brought just to the point of usefulness, but no further. Coding with GSLL feels like you are coding in C. This has changed a bit recently, with the introduction of Antik, but the library is still more difficult to use than my hand rolled GSL library (in fact, perhaps I'm dumb or something, but I have yet to actually productively used GSLL in a project for work or otherwise).

But never mind all that, let's make our own:

(generate-binding
 (list
  :cffi
  :package-name :cffi-gsl-bindings
  :input-files (mapcar #'namestring (directory #p"/usr/include/gsl/*.h"))
  :gccxml-flags "-I /usr/include/gsl"
  :standard-name-transformer-replacements
  `(,(cl-ppcre:create-scanner "(^gsl_?)") "")))

This basically looks like the example above but with a few things added. First, we are including all of the GSL header files. Also, we use the :standard-name-transformer-replacements option to remove the prefix on GSL functions using a regular expression. If you are familiar with C, you know that this prefix is necessary because the C programming language doesn't have a way to divide the symbol namespace. This means that without these prefixes the likelihood of a name collision is very high. In Common Lisp we don't have to worry about this because we have a package system.

However, when you run the command above, you get an error and the bindings were not generated. This is because GSL changed some naming conventions a while back and we need to explicitly define that we want to use the new (non-backwards compatible) functions by passing an extra argument to GCCXML. We need to pass -D GSL_DISABLE_DEPRECATED to GCCXML. Well, let's try it again:

(verrazano:generate-binding
 (list
  :cffi
  :package-name :cffi-gsl-bindings
  :input-files (mapcar #'namestring (directory #p"/usr/include/gsl/*.h"))
  :gccxml-flags "-I /usr/include/gsl -DGSL_DISABLE_DEPRECATED"
  :standard-name-transformer-replacements
  `(,(cl-ppcre:create-scanner "(^gsl_?)") "")))

Bingo, this time it worked. If we look at the binding file we see that basically all of the stuff is there that we want. However there is a bunch of stuff that we don't care about and, quite frankly, probably doesn't work. Indeed, if you define the library and attempt to load that file, then you will get several errors. Also, the file is some 30 thousand lines long, making it difficult to track them all down.

The errors seem to come in a few different flavors. First, sometimes Verrazano will find a type that that it doesn't know how to deal with. These are associated with comments in the generated source file that look like:

;;; No entry found for gccxml type "long double" in *GCCXML-FUNDAMENTAL-TYPE->CFFI-TYPE*

Indeed, long doubles are typically seen as an extension to C and not standard or necessarily available on every platform, and is certainly not available in CFFI (maybe someday?). The best way to get around this is to not include those headers that provide long double support in GSL. But even that doesn't clear it all up. It turns out that if you also exclude gsl_sys.h, gsl_complex.h, gsl_types.h, gsl_test.h, and gsl_version.h then it clears all of these errors up. Which brings up two points:

  • If there is an error, it might show up as a comment in the generated source file without so much as a warning at the REPL. This is weird, but it is the way things are right now. These will typically manifest as errors at load time.
  • Only include the headers that you really need. The headers that we ended up removing turned out to be ones that we really didn't want or need in the first place. It takes some time up front to pick the right header files, but it can save you a pain later.
(verrazano:generate-binding
   (list
    :cffi
    :package-name :cffi-gsl-bindings
    :input-files (remove-if (lambda (x)
                              (or (ppcre:scan "gsl_sys\\.h$" x)
                                  (ppcre:scan "gsl_complex\\.h$" x)
                                  (ppcre:scan "gsl_types\\.h$" x)
                                  (ppcre:scan "gsl_version\\.h$" x)
                                  (ppcre:scan "gsl_test\\.h$" x)
                                  (ppcre:scan "long_double\\.h$" x)))
                            (mapcar #'namestring (directory #p"/usr/include/gsl/*.h")))
    :gccxml-flags "-I /usr/include/gsl -DGSL_DISABLE_DEPRECATED"
    :standard-name-transformer-replacements
  `(,(cl-ppcre:create-scanner "(^gsl_?)") "")))

Now we run into the next error, case sensitivity. Apparently Verrazano isn't really designed to handle cases where a function has multiple identifiers that differ by nothing more than case. Seems like a bad programming practice, but GSL does it, and Verrazano screws it up, so we have to work around it. One place where this seems appropriate but unfortunate is with the Bessel functions where the cylindrical functions are written as upper case letters (J, K, I, and Y) and the spherical functions are written as lower case (j, k, i, and y). We can deal with it by adding a new translation rule:

(verrazano:generate-binding
   (list
    :cffi
    :package-name :cffi-gsl-bindings
    :input-files (remove-if (lambda (x)
                              (or (ppcre:scan "gsl_sys\\.h$" x)
                                  (ppcre:scan "gsl_complex\\.h$" x)
                                  (ppcre:scan "gsl_types\\.h$" x)
                                  (ppcre:scan "gsl_version\\.h$" x)
                                  (ppcre:scan "gsl_test\\.h$" x)
                                  (ppcre:scan "long_double\\.h$" x)))
                            (mapcar #'namestring (directory #p"/usr/include/gsl/*.h")))
    :gccxml-flags "-I /usr/include/gsl -DGSL_DISABLE_DEPRECATED"
    :standard-name-transformer-replacements
  `(,(cl-ppcre:create-scanner "(^gsl_?)")
     ""
     ,(cl-ppcre:create-scanner "bessel_(zero_)?([jkiy])")
     "spherical_bessel_\\1\\2")))

This will translate the lower case Bessel functions of the form gsl_sf_bessel_j into gsl_sf_spherical_bessel_j while leaving gsl_sf_bessel_J alone.

The much more troubling situation is when GSL uses both N and n, or F and f, in a single functions argument list. This cannot work, as I'm sure is clear as day. In order to fix this, I needed to actually hack on Verrazano a bit. I had to change the write-cffi-function function from it's current version to:

;; From...
(do-arguments-of-function (argument node)
  (incf index)
  (if (typep argument 'gccxml:ellipsis)
      (write-string "common-lisp:&rest")
      (bind ((argument-name (aif (name-of argument)
                                 (transform-name (name-of argument) :variable)
                                 (format nil "arg~A" index)))
             (argument-type (type-of argument)))
        (push argument-name previous-args)
        (pprint-newline :fill)
        (format t " (~A " argument-name)
        (write-cffi-type argument-type)
        (format t ")"))))

;; ...to...

(let ((previous-args nil))
  (do-arguments-of-function (argument node)
    (incf index)
    (if (typep argument 'gccxml:ellipsis)
        (write-string "common-lisp:&rest")
        (bind ((argument-name (if (and (name-of argument)
                                       (not (member (name-of argument)
                                                    previous-args
                                                    :test #'equal)))
                                  (transform-name (name-of argument) :variable)
                                  (format nil "arg~A" index)))
               (argument-type (type-of argument)))
          (push argument-name previous-args)
          (pprint-newline :fill)
          (format t " (~A " argument-name)
          (write-cffi-type argument-type)
          (format t ")")))))

This change keeps track of what symbols we have already used in the argument list and replace it with a generic argument of the form "arg<n>". This gets rid of the last error that stops this file from loading. Now we can load this file and, again, we are basically where we would be if were a C programmer, but we can work from Lisp to build a good interface on top of this set of generated bindings. This brings up another point:

  • Verrazano is not production ready; it is hacker ready. Verrazano is useful because it is a tool that developers use every once in a while for generating bindings. It is not ready for every end user to use.

Okay, so we have something that will load. However, you may have noticed that there are crap-load of warnings that are produced when those bindings are loaded. These warnings are regarding a pretty new improvement to CFFI. Not so long ago, if you wanted to pass a structure to a function using CFFI, your only option was to pass it by address. If you needed to pass it by value you were out of luck. Luckily basically nobody uses pass by value, but it is a little limiting. Now CFFI supports this, so you need to specify structs by (:struct struct-name) and pointers to structs as (:pointer (:struct struct-name)). In order to fix this we are going to have to do a bit more hacking, but for now these are just warnings.

If you have been perusing the generated bindings, you may be wondering what all these weird functions that Verrazano is making are coming from. Some of the functions look good, but some seem very odd, for instance _ZN17gsl_vector_ushortaSERKS_. These functions look very odd and they show up because Verrazano is, fundamentally, a tool for making C++ bindings, not C bindings. It is interpreting these header files as C++ header files. It doesn't offer a "assume this is C" option. This isn't Verrazano's limitation, really, the GCCXML people are not interested in working with anything but C++. Often this doesn't matter as C++ is mostly a superset of C. However, there are differences. One difference is that when it sees structs, it has to assume that these are C++ sturcts, which are basically classes in C++ with all of their members public by default. This means that there will be a bunch of code that is generated by Verrazano like constructors, destructors, and comparison operators that have special GCC style "mangled" names. These symbols are actually not in the library at all as a simple nm -D /usr/lib/libgsl.so.0 will demonstrate. I suppose that these don't really do much harm, and I don't know how to get rid of them at the time short of post-processing the file and removing them. Plus we may want them if we are linking against a library with a C++ interface some day. At least Verrazano seems kind enough to place all these oddities at the end of our bindings file.

To prove that this works:

;; Define our library and use it (stolen from GSLL)
(cffi:define-foreign-library libgslcblas
  (:darwin #+ccl #.(ccl:native-translated-namestring
                    (gsl-config-pathname "libgslcblas.dylib"))
           #-ccl #.(gsl-config-pathname "libgslcblas.dylib"))
  (:cygwin "cyggslcblas-0.dll")
  (:unix (:or "libgslcblas.so.0" "libgslcblas.so"))
  (t (:default "libgslcblas")))

(cffi:use-foreign-library libgslcblas)

(cffi:define-foreign-library libgsl
  (:darwin #+ccl #.(ccl:native-translated-namestring
                     (gsl-config-pathname "libgsl.dylib"))
           #-ccl #.(gsl-config-pathname "libgsl.dylib"))
  (:cygwin "cyggsl-0.dll")
  (:unix (:or "libgsl.so.0" "libgsl.so"))
  (t (:default "libgsl")))

(cffi:use-foreign-library libgsl)

;; Load the bindings
(load #p"gsl-cffi-bindings.lisp")

;; Define our callback function
(cffi:defcallback csin :double
    ((x :double)
     (params :pointer))
  (declare (ignore params))
  (float (sin x) 0d0))

(let ((workspace (gsl-cffi-bindings:integration-workspace-alloc 10)))
  (unwind-protect
       (cffi:with-foreign-objects ((gsl-fn '(:struct
                                             gsl-cffi-bindings:function-struct))
                                   (result :double)
                                   (abs-error :double))
         ;; Setup the GSL function object
         (setf
          (cffi:foreign-slot-value
           gsl-fn '(:struct gsl-cffi-bindings:function-struct)
           'gsl-cffi-bindings:function)
          (cffi:callback csin))
         ;; Call the quadrature routine
         (gsl-cffi-bindings:integration-qag
          (cffi:mem-aptr
           gsl-fn '(:struct gsl-cffi-bindings:function-struct))
          0d0 (/ pi 2d0) 1d-10 1d-5 10
          (cffi:convert-to-foreign :gsl-integ-gauss-61 'gsl-cffi-bindings:.-155)
          workspace result abs-error)
         ;; Return the result and estimate of the error of the result
         (values (cffi:mem-ref result :double)
                 (cffi:mem-ref abs-error :double)))
    ;; Clean up
    (gsl-cffi-bindings:integration-workspace-free workspace)))
==>
1.0
1.1102230246251565e-14

It may be ugly, but it works. Note that everything here translates pretty directly to the C code you would write to do this using GSL. In the future we'll explore how to make this a bit easier to use.

Till next time.

No comments :

Post a Comment