Tuesday, September 30, 2014

Ubuntu Upgrade Woes

I tried to upgrade to the newest LTS release last week. Let's get this out of the way, relevant XKCD. It took three solid, long days, which is more than anybody should ever be expected to spend installing an operating system for a desktop system (though I must admit that I spent a full week working through Linux From Scratch in my off time). All said and done, I was left with a (almost) fully functional Ubuntu installation, so I am pretty happy.

To put this in context, I have a whole bunch of rants about how much Ubuntu sucks filed away on my computer, but this will probably be the first one that will actually see the light of day. The only reason I bring it up is that it seems that I, sort of, figured out why this was causing me problems and how to fix it. However, I don't really know of a place to put this. I know what worked for me, but I really don't understand it fully and I really don't think I am in the position to give technical advice to anybody.

So, I'll get to the short of it with the hopes that anybody in my same shoes that is frantically Googling about might land on this page. If you:

  1. Are running Ubuntu (or probably any other GNU/Linux that supports EFI boot) on a MacBook Pro (at least the 6,2 model, but perhaps others that use hybrid video technology).
  2. Are having a tough time getting the Nvidia drivers to work (the install but you end up with a blank screen on the next boot).

You might be suffering from the same issue I had; you need to ensure that you are booting in emulated BIOS mode rather than using Apple's EFI. For me, this means that I need to hold the "Option Key" on boot and hand select the Ubuntu drive, but Google says that there are other ways to do this.

It actually solves one of the big mysteries I've had since starting to use Ubuntu on the Apple computers: the poor battery life in Ubuntu versus OS X (in my and my friend's experience this basically cuts the idle lifetime by a factor of two to three). At least one of the reasons for this poor battery life (and perhaps the major contributor) is that the video card is stuck in the discrete Nvidia GPU mode and never switches to the integrated i915 card.

This caught me by surprise as I wasn't aware that booting using different methods would leave the hardware in different states (i.e. boot one way and the system is stable and usable, boot another way and you will get crappy performance at best and very likely an install that will crash every time it tries to start Xorg). It is a shame that Ubuntu picks the unstable EFI way to set up the booting process… however…

You really can't blame Ubuntu for their decision. Apple used a non-standard EFI method for (at least) this series of MacBook Pros. They also used a non-standard hybrid video technology based on Optimus but sufficiently different that none of the bumblebee stuff works, yet. I think people are working on it and Youtube shows some videos with working proof-of-concept graphics switching, but it is a pretty old laptop so I'm not holding my breath.

As a side note, while the whole installation debacle should have soured me on 14.04, I have to say that it is extremely well polished compared to 12.04. Of course it could be better, but I absolutely love how Unity looks once you install a few themes and fiddle with Unity Tweak Tool to bend it to your taste.

Also, I know many people that would ask, "was it worth it?" Is it worth it to work for three days to get a GNU/Linux installation up and running? Absolutely. I am vastly more productive when working within GNU/Linux. Should it have taken less time? Absolutely, and that is the real question to ask. Would another distribution provide similar features for much less investment?

Tuesday, April 1, 2014

Some Emacs hacks for GDB and other stuff

I am fighting with GDB in Emacs today. Thought I would share a few hacks that make things more livable. These get hackier as you go, so… be warned.

First, GDB in many windows mode makes a bunch of dedicated windows. This can be pretty annoying. In fact, dedicated windows in general can be pretty annoying. If I want to change the buffer of a window, don't you dare stop me. Emacs Wiki has some advice code that allows you to disable the setting of dedicated windows in GDB startup (bottom of the page). But to be clear, this is a problem with the Emacs interface, not the GDB interface. The GDB interface has a good reason to not want those buffers to change, and it is annoying to use it if they do change. The problem is when I know that I really do want these to change but Emacs makes me jump through hoops to do it. So, let's fix the underlying problem. Here is a bit of code to add to your init files that will allow you to change buffers even if they are dedicated.

(defun undedicate-window (&optional window)
  (set-window-dedicated-p (or window (get-buffer-window)) nil))

;; Removing annoying dedicated buffer nonsense
(defun switch-to-buffer! (buffer-or-name &optional norecord force-same-window)
  "Like switch-to-buffer but works for dedicated buffers \(though
it will ask first)."
   (list (read-buffer-to-switch "Switch to buffer: ") nil 'force-same-window))
  (when (and (window-dedicated-p (get-buffer-window))
             (yes-or-no-p "This window is dedicated, undedicate it? "))
  (switch-to-buffer buffer-or-name norecord force-same-window))

I just set a global key binding of (kbd "C-x b") to switch-to-buffer! (actually I use a global minor mode to keep track of all of my keybindings, but the result is the same). This will now act exactly like switch-to-buffer unless the window is dedicated, in which case it will ask if you want to "undedicate" the window first. Now Emacs wont reuse these buffers willy nilly, but you can still do what you think is best.

Second, dedicated windows are so convenient (now that they are convenient to undedicate) that you might find that you want to start setting the dedicated flag on random windows that you don't want Emacs to change. So here is a function for that. If you want a dedicated completion window, well, just set it on that window and you won't have to worry about it getting taken over by some other Emacs pop-up.

(defun toggle-window-dedication (&optional window)
  (let ((window (or window (get-buffer-window))))
    (set-window-dedicated-p window (not (window-dedicated-p window)))))

(global-set-key (kbd "C-x d") 'toggle-window-dedication)

Third, I really hate that performing any act in GUD tends to make the source buffer you are working with jump back to the execution point. This is a problem if you are setting several breakpoints, or printing out several values from the source file. Thus I came up with this hack (and this really is a hack) to make this problem go away.

(defun nice-gud-print (arg)
  (interactive "p")
   (gud-print arg)
   (sleep-for .1)))
(global-set-key (kbd "C-x C-a C-p") 'nice-gud-print)
(defun nice-gud-break (arg)
  (interactive "p")
   (gud-break arg)
   (sleep-for .1)))
(global-set-key (kbd "C-x C-a C-b") 'nice-gud-break)
(defun nice-gud-tbreak (arg)
  (interactive "p")
   (gud-tbreak arg)
   (sleep-for .1)))
(global-set-key (kbd "C-x C-a C-t") 'nice-gud-tbreak)

The sleep-for call is necessary to make save-excursion actually work. My guess here is that GUD queues requests and then executes them later. So without the sleep, my call to GUD returns, and with it the save-excursion environment exits, long before GUD processes the command and then resets the view to the execution point. Not pretty, but it works for me at least.

Lastly, I find that it is helpful to have a key binding for gdb-restore-windows. This way you can easily jump to a particular window, make the buffer full screen, and then return to the debugger view later. However, if you like to use a vertically split screen like I do (i.e. you like 80 column width programs), it is often even better to have a toggle between just the two center windows and the full debugger view. So, here is a function to do that:

(defun gdb-windows-toggle (&optional full-restore)
  (interactive "P")
  (let ((window-tree (first (window-tree))))
    (cond (full-restore
          ((and (listp window-tree)
                (= (length window-tree) 5))
           (let ((buffers (if (listp (fourth window-tree))
                              (mapcar 'window-buffer
                                      (rest (rest (fourth window-tree))))
                              (list (window-buffer (fourth window-tree)))))
                 (first-selected (or (windowp (fourth window-tree))
                                     (eql (first (window-list))
                                          (third (fourth window-tree))))))
             (when (> (length buffers) 1)
             (cl-loop for buffer in buffers
                      for window in (window-list)
                      do (progn (undedicate-window window)
                                (set-window-buffer window buffer)))
             (unless first-selected
               (select-window (second (window-list))))))
          ((or (windowp window-tree)
               (and (= (length window-tree) 4)
                    ;; horizontal split
                    (not (first window-tree))
                    ;; No further splits
                    (windowp (third window-tree))
                    (windowp (fourth window-tree))))
           (let ((current-buffers
                   (if (windowp window-tree)
                       (list (window-buffer window-tree))
                       (mapcar 'window-buffer (rest (rest window-tree)))))
                 (first-selected (or (windowp window-tree)
                                     (eql (first (window-list))
                                          (third window-tree)))))
             (let ((windows (rest (rest (fourth (first (window-tree)))))))
               (when (= (length current-buffers) 1)
                 (delete-window (second windows)))
               (cl-loop for buffer in current-buffers
                        for window in windows
                        do (progn (undedicate-window window)
                                  (set-window-buffer window buffer)))
               (if first-selected
                   (select-window (first windows))
                   (select-window (second windows))))))
          (t ;; If all else fails, just restore the windows

(global-set-key (kbd "C-c g") 'gdb-windows-toggle)

That is long and ugly, and probably could be made much simpler, but it does the trick and does it pretty well. Maybe if these really work out I can get someone involved with Emacs to include some of these little hacks (cleaned up of course). It would be nice to deal with the fact that GDB is fundamentally broken for me right now, but… we'll see where those bug reports go.

Monday, January 13, 2014

Visualization of the Game of Life

Working with visualization a bit today and needed a system to visualize, so I did the Game of Life and it turned out pretty, so I decided I should share.

The game is played in the XY plane. Older generations are moved down in Z and their color is darkened by 20%. Each system is started with random live and dead cells. The first runs are with free boundary conditions while after around 2:50 I switch to a toroidal system of size 20 in X and Y. You can clearly see gliders at several places in the video, for instance there is one at the 2:00 mark.

If you care, the simulation and visualization were all done from inside a Lisp session and takes around 90 lines of code (including blank ones) to get this effect.  The stuttering frame rate is due to the screen capturing software, RecordMyDesktop.  I'm not sure how to do this better other than grabbing the frames from the OpenGL display buffer, writing them to a bitmap file, and making my own video with FFMPEG (which is what I do for presentations).

I need to figure out how to make this a live wallpaper for my phone...

Friday, October 25, 2013

CL-Libsvm Usage

CL-libsvm usage

At work we decided that we needed to use a bit of machine learning to automate a problem. I recommended using LIBSVM as it very established (read old with many bindings available). I decided to see what was involved in using it this morning, but found that it took an embarrassingly long time to get it working like I expected it to out of the box. This is my short-coming, I had forgotten too much of the machine learning that I used to know. But this post will hopefully help others get it working more quickly.

First, SVMs are pretty versatile, and there are a lot of choices that you can make when you start using LIBSVM. It is often not clear what you should use for what task, and I don't have the expertise to tell you what is best. For that information, you should check out this guide by the library's author. What I will show here is a quick and dirty approach.

I will be using Gabor Melis' bindings for LIBSVM. They work well enough, but they will issue many warnings when you load them up and use them as they use a now deprecated syntax of CFFI. Just be aware.

We will be using a radial basis function (RBF) kernel to model the "a1a" training data and test data that is available on the LIBSVM website. Already this is a bit odd, as the LIBSVM people have split their data such that ~5% is in the training data set while ~95% is in the test data set. This is probably due to the somewhat expensive nature of SVMs? But, whatever the case, we will stick with it.

Here is a simple procedure to read in the data:

(ql:quickload '(:iterate :cl-ppcre :alexandria))

(use-package :iterate)

(defun parse-data-file (file)
  "Read in a data-file, return list of target values with input as a sparse
vector suitable for cl-libsvm."
  (iter (for line :in-file file :using 'read-line)
     (list (read-from-string line)
           (let ((result nil))
             (ppcre:do-matches-as-strings (match "\\w+:\\w+" line)
                (apply 'cons (mapcar 'read-from-string
                                     (let ((separator (position #\: match)))
                                        (subseq match 0 separator)
                                        (subseq match (+ 1 separator))))))
             (coerce (reverse result) 'vector))))))

(defun randomize-order (data)
  "Return a random permutation of the data."
  (alexandria:shuffle (copy-seq data)))

The parse-data-file returns a vector where each element is a line in the data file mapped into a list of the form (target-val sparse-vector-input). A sparse vector is a vector where each element is a cons cell containing an index as its car and a value as its cdr.

We need to further divide the testing data so that we have a data set that we can use to validate the parameters that the RBF SVM uses (the "validation" set) while we use the rest of the test set to determine the accuracy of the model (the "test" set). We could just cut the training data in half, but that would make our validation calculations very slow, and as we will see is a moment, this becomes the bottleneck of the calculation. I have chosen my validation set to be 2000 point. Whatever size you choose will be a trade off between accuracy and speed. Having 2000 points in your validation set is about my limit when it comes to playing around at the REPL (I would choose something higher if it showed benefits for the models accuracy). If we are dividing the data, we must first first randomize it to ensure that no structure of how the data was collected shows up in the analysis. This can be done with the randomize-order function above and a couple calls to subseq.

(let ((test-data (randomize-order (parse-data-file #p"~/Desktop/a1a.t"))))
  (defparameter *validation-set* (subseq test-data 0 2000))
  (defparameter *test-set* (subseq test-data 2000)))

Now, to get into the nitty-gritty of CL-Libsvm. First, you'll need to install it, and it is not available in Quicklisp yet. So, clone it from Gabor's Git repo. Naturally you will also need to have LIBSVM installed (you don't actually need libsvm-dev or libsvm-tools, but why not have them anyway):

cd ~/quicklisp/local-projects
git clone http://quotenil.com/git/cl-libsvm.git

# Install LIBSVM if needed
sudo aptitude install libsvm3 libsvm-dev libsvm-tools

CL-Libsvm is basically structured like LIBSVM, so the same documentation applies here. We must first create a problem which contains inputs and the expected outputs, then create a parameter structure that contains the parameters that define how the SVM operates (numerical constants as well as the type of SVM and kernel), and finally combine these two into a model which can be used to make predictions.

This can be done easily enough for any particular RBF parameters \(C\) and \(\gamma\).

(ql:quickload :cl-libsvm)

(let ((data (parse-data-file #p"~/Desktop/a1a")))
  (defparameter *problem* (libsvm:make-problem (map 'vector 'first data)
                                               (map 'vector 'second data))))

(defparameter *parameter* (libsvm:make-parameter :c c :gamma gamma))

(defparameter *model* (libsvm:train *problem* *parameter*))

But we want to do this for many different values of \(C\) and \(\gamma\) in order to find what parameters give us the best performance. We could do several things to find the optimum values. We will be following the LIBSVM procedure and just search the parameter space on a grid. We should note from the manual that it is probably in our best interests to search in \(\log C\) and \(\log \gamma\).

(let ((data (parse-data-file #p"~/Desktop/a1a")))
  (defparameter *problem* (libsvm:make-problem (map 'vector 'first data)
                                               (map 'vector 'second data))))

(defparameter *optimization-grid*
  (iter :outer
    (for log-c :from 1.5 :to 3.5 :by .1)
    (iter (for log-gamma :from -5 :to -3.5 :by .1)
      (in :outer (collecting
                  (list* log-c
                         (let* ((params (libsvm:make-parameter
                                         :c (exp log-c)
                                         :gamma (exp log-gamma)))
                                (model (libsvm:train *problem* params)))
                           (list (quality model *validation-set* log-c log-gamma)

Note that there is a missing function here. We never defined quality. This function is meant to take a model and some testing data and determine a measure of how good the model is performing. For this I chose to use the Matthews Correlation Coefficient with the threshold for the prediction set to \(0.5\).

(defun logistic (x)
  (/ (+ 1 (exp (- x)))))

(defun quality (model test-data log-c log-gamma)
  "Use the Matthews Correlation Coefficient to measure how well the model does"
  (iter (for (target input) :in test-data)
    (let ((p (if (< 0.5 (logistic (libsvm:predict model input)))
                 1 -1)))
      (cond ((and (= p 1) (/= target p))
             (summing 1 :into false-positives))
            ((and (= p -1) (/= target p))
             (summing 1 :into false-negatives))
            ((and (= p 1) (= target p))
             (summing 1 :into true-positives))
            ((and (= p -1) (= target p))
             (summing 1 :into true-negatives))))
     (let ((quality
             ;; Compute quality of model
             (if (= 0 (- (* true-positives true-negatives)
                          (* false-positives false-negatives)))
                  (/ (- (* true-positives true-negatives)
                        (* false-positives false-negatives))
                       (* (+ true-positives false-positives)
                          (+ true-positives false-negatives)
                          (+ true-negatives false-positives)
                          (+ true-negatives false-negatives))
       ;; Print some output so we know what it's doing
        t "log-c = ~A, log-gamma = ~A~@
           TP = ~A, TN = ~A, FP = ~A, FN = ~A~@
           Quality = ~A~%"
        log-c log-gamma
        true-positives true-negatives false-positives false-negatives
       (return quality)))))

When we put this all together and playing around with the ranges of the plots, we get a plot that looks like this:

From this we can tell that there is likely some kind of optimum around \(\log C = 2.10\) and \(\log \gamma = -4.27\).

You might wonder how I was able to determine, looking at those blocky heat maps, where that tiny maximum was. While you can do what LIBSVM docs suggest and move to finer and finer grids to find optimal points, I find this pretty annoying, and finicky. I opted to use some, as yet unreleased, bindings for the NLOpt optimization library. With NLOpt we can do a global optimization followed by a local optimization, which requires next to zero human intervention and finds a pretty good optimum that I doubt I would be able to otherwise. (Small caveat here, it is entirely possible that the rough nature of my MCC heat map is merely an artifact of the small validation set size. I don't have the patience to test this for a silly example)

;; Global optimization (better grab a snack)
 (lambda (log-c log-gamma)
   (let* ((params (libsvm:make-parameter
                   :c (exp log-c)
                   :gamma (exp log-gamma)))
          (model (libsvm:train *problem* params)))
     (- (quality model *validation-set*))))
 '(1 1)
 :lower-bounds '(-5 -5)
 :upper-bounds '(7 7)
 :abs-xtol 1d-1)

;; Fit parameters = (2.071364331304816d0 -4.265683211751565d0)
;; Validation MCC = -0.5509856550306286d0

;; Local optimization (this should be quick)
 (lambda (log-c log-gamma)
   (let* ((params (libsvm:make-parameter
                   :c (exp log-c)
                   :gamma (exp log-gamma)))
          (model (libsvm:train *problem* params)))
     (- (quality model *validation-set*))))
 '(2.071364331304816d0 -4.265683211751565d0)
 :lower-bounds '(-5 -5)
 :upper-bounds '(7 7)
 :abs-xtol 1d-5)

;; Fit parameters = (2.096969188326027d0 -4.268553108908674d0)
;; Validation MCC = -0.5522135970232868d0

(let* ((params (libsvm:make-parameter
                :c (exp 2.096969188326027d0)
                :gamma (exp -4.268553108908674d0)))
       (model (libsvm:train *problem* params)))
  (quality model *test-set*))

;; Test set MCC = 0.5497032935038368d0

This gives the optimum at \(C = 2.10\) and \(\gamma = -4.27\) and a Matthew's Correlation Coefficient of \(0.55\) (measured against the test set, naturally).

Wednesday, July 31, 2013

Rudel Survival Guide

We are gearing up for our collaborative effort for ICFP and so I figured it might be nice to write out a little "how to make it work" guide for Rudel, the collaborative editing framework for Emacs. To get this out of the way first, Rudel doesn't work out of the box. In order to use it, we had to hack a bit on the source to correct what we can only guess are legitimate errors. That said, I certainly don't have the expertise on the system in order to dictate the correct way to solve these problem.

As a note, throughout this contest our setup will be:

  • Rudel v0.3
  • Obby backend
  • TCP transport
  • Port 6522
  • not using the built in Rudel/Obby encryption
  • all connections must be tunneled over SSH (for encryption/access control)
  • No global or user passwords

The first step for using Rudel is to, naturally, install Rudel. If you have Emacs v24, you may use the package manager via M-x package-list-packages. This will make sure that it also gets any dependencies, however I don't think there are any that aren't already part of Emacs. If you are not using Emacs v24, you will need to install package.el (and this will actually be useful as I believe it will have to track down some dependencies). This can be done like this:

cd .emacs.d
wget http://repo.or.cz/w/emacs.git/blob_plain/1a0a666f941c99882093d7bd08ced15033bc3f0c:/lisp/emacs-lisp/package.el

Then from Emacs, M-x load-file that package.el file. Then you can use continue as if you were using v24 (except where noted).

Rudel is available via the the Marmalade repository. In order to enable the Marmalade repository, you should add something like this early in your .emacs file:

;; (load "~/.emacs.d/package.el") ;; If not v24

;; Load up the new package manager
(require 'package)
;; Add the Marmalade repo
(add-to-list 'package-archives
             '("marmalade" . "http://marmalade-repo.org/packages/") t)

The install/compile should seemingly complete without any issues. (If you are not on v24, then there might be an issue with package displaying the info page of certain packages, Rudel amongst them. Instead of using M-x package-list-packages, use M-x package-install and specify "rudel" when it asks).

Now the trouble fun begins.

1.1 We broke your Emacs

Try closing and restarting Emacs. On your next start, Emacs will fail to read your .emacs file with an error like:

Symbol's function definition is void: eieio-defclass-autoload

This is because there are some bugs in Rudel that make it not load properly. My solution is to not use the standard method of loading Rudel. The first order of business is stopping it from loading the correct but broken way via the package manager. Go into the shell and move the rudel directory somewhere where it cannot be found by the package manager:

cd .emacs.d
mv elpa/rudel-0.3 ./

Now Emacs should read your .emacs without issue (because it no longer is trying to load Rudel).

The next order of business, we want to be able to load and use Rudel. In order to do this, we will run Rudel's compile script. Do a M-x load-file on .emacs.d/rudel-0.3/rudel-compile.el. This will compile and generate some files, most importantly for our purposes, it will generate rudel-loaddefs.el. Perform a M-x load-file on .emacs.d/rudel-0.3/rudel-loaddefs.el and Rudel should be operational. Try it out. Use M-x rudel-host-session to start a session (protocol obby, transport tcp, port 6522). Then join that session via M-x rudel-join-session. Try publishing a buffer with M-x rudel-publish-buffer. This should all work.

We want to make this permanent, so we should add something like:

(load-file "~/.emacs.d/rudel-0.3/rudel-loaddefs.el")

…to our .emacs file after the (package-initialize) statement.

Look, I know this is extremely hackish, but I think it will work for everybody. It is the only way I have consistently been able to get things working.

1.2 Joining and leaving sessions

So, as best I can see, this is the status of this functionality in Rudel: you can join sessions and you can leave, but as far as the server cares, you can never leave. This doesn't seem like too much of a problem at first, but here is how problems start to manifest.

  1. A username must be unique: This means that each time you log in, you have to pick a unique name, not the one you used last time. This manifests as a lot of "smithzv", "smithzv2", "smithzv3", etc. Luckily, you shouldn't be disconnected often.
  2. A color must be unique at login time. This one isn't as bad as you can change colors once you are in the session using rudel-change-color. This means that a good practice is to log in and pick a garish and otherwise infrequently used color and then immediately change it to something more appropriate. No need to worry about conflicts after you have logged in.

1.3 Undo and Rudel

So, one of the biggest problems that I have with collaborative editing in Rudel is that undo are treated very literally. I you undo, you implicitly expect it to apply to your code. However, with Rudel, where someone could be editing somewhere else in the file, undo is happy to go about undoing their work as they type rather than the edits you just made.

The end result is that in a collaborative editing environment, undo is a very dangerous thing; doubly so when undo means what it does in Emacs land (i.e. you can redo by undoing an undo). Basically, if you are using Rudel, you probably should not be using undo, at all. This is a pretty tall order if you have deeply internalized undo into your editing commands (as I have).

In strikes me that a helpful heuristic would be to undo only within a region that contains only your edits (or even using something like multiple-regions to allow for undo of anything that you have edited but not things that others have). This means that if you are the only person editing a buffer, undo works exactly as expected, but if there are others editing the buffer, it will only undo your edits. Note that this isn't always what you want.

However, I'm not sure that such a heuristic is possible (or more likely, it is possible but is hard to do). I'll take a look. It seems that for safe, useful undoing, you need to tell everybody that is working on that buffer that they need to stop what they are doing so you may perform your undos.

I realize that other undo systems can be used with Emacs. For instance, there is Undo-Tree. I am not sure how well these work (or really how they work). Perhaps someone who is better versed in these tools can enlighten us.

1.4 When things go wrong

There are times when, try as it might, Rudel screws up and the files become out of sync. This happens fairly infrequently, thank goodness, but when it does, Rudel does not handle this gracefully. There is no "resync" function that I can see. This means that if you find that your buffer says one thing, but someone elses says something else (this usually manifests a what looks like a typo, but if you fix it, another person goes back and changes it back to its typo state), you will have do something pretty drastic in order to get Rudel to work correctly again. There are a couple of things that must work, but they both are pretty annoying:

  1. Ditch this buffer, have everybody unsubscribe, rename the buffer so it will share under a different name, then republish. This way everything has started fresh with this buffer.
  2. Restart the entire Rudel session.

Of course, the first method is preferred to the second.

1.5 Dealing with Annoying Colors

If you start using Rudel, sometimes a collaborator will pick a color that doesn't work with your theme or general aesthetic. Luckily, there is something you can do about it even if they refuse to change their color (or are perhaps away from keyboard), you can turn off all coloring by author. Simply specialize the variable rudel-overlay-author-display (set it to nil) and no more colors. This is pretty necessary right now because Rudel is ignorant of the difference between light and dark themes. Thus two locally appropriate choices might be completely inappropriate for the remote party.

Thursday, July 25, 2013

A Bit About Ubuntu Edge

A couple days ago, Canonical announced a crowd funding campaign for the Ubuntu Edge, a phone sized device that is able to function as a phone but is firmly designed to be a PC replacement. Just in case someone stumbles on this page after finding such crappy reporting as this, here is the deal with Ubuntu Edge as I see it: The Ubuntu Edge is about shaping the future.

Now, it is no secret that I like Canonical, one of the only companies fighting for (and throwing money at) GNU/Linux on the desktop. Frankly, despite recent missteps in quality control and their tendency to not go with 100% Free Software, I think they are doing more for the Free Software Movement than basically anybody else out there. But I have to say that Ubuntu Edge, and it's associated crowd funding campaign, is a particular point of brilliance for the company.

Canonical has been working for the last few years on something they call converged computing. The idea is to have every consumer electronic centered around a screen controlled on a very fundamental level by one device (one device that happens to run nearly 100% Free Software). As most of you are aware, every piece of consumer electronics out there has a small, typically underpowered, computer in it. The Ubuntu Edge vision is to take all of those computers out of the electronics and simply have the manufacturers build screens with an HDMI port and power supply attached. For all of the computation, something like the Ubuntu Edge will handle it.

The way I see Canonical's end game vision is something much like the technology seen in the Total Recall remake from a few years ago. In this movie, people have technology that they carry with them (actually implanted inside their body) and may use any piece of glass as a display for that technology by pressing their hand against it. Canonical's vision is similar. Computational devices will be mobile, private, secure, personal device that you carry with you while displays will be stationary, plentiful, pseudo-public devices you hook into when needed. In such a world, every hotel room you rent, every office, every coffee shop table, and every room in your house will have a display and input devices (keyboard and mouse, or touchscreen) and a cradle to place your "phone" into. The way I see Ubuntu Edge is the way it is presented, as a taste of the future.

But that is not entirely fair. There is one place where the Ubuntu Edge fits perfectly in the current world. If you have a computer at home and a computer at work (or if you have one laptop that you use for both and you lug it back and forth), the Ubuntu Edge makes a lot of sense for you. You should think of the Ubuntu Edge as a PC tower and UPS that fits in your pocket. There is no need to sync between different disk drives because you bring your drive with you. If you have ever emailed a file to yourself, or left your computer on at home so that you would have SSH access to it in case you needed it, or if you simply have so much data that you cannot store it in something like Dropbox or some analogous cloud file storage, an Ubuntu Edge like system is a great fit. In addition, there is no need to worry about software differences between your applications and files (no Excel 07 can't read Excel 09 XML files file problems) or incompatibilities during software development because you bring the hardware and software with you. If you have ever wasted a few hours or days configuring a new computer or tweaking a computers configuration, Ubuntu Edge seems like a useful platform. Canonical isn't the first company to sniff around this idea, but they seem to be the closest to actually achieving this.

An extra benefit of something like Ubuntu Edge is that it solves many problems that people that value freedom, privacy, and freedom from vendor lock-in have felt mounting. Ubuntu Edge resonates very well with us freedom loving people as it is a solution that is centered around a well standardized interface. It means that anybody can use this with whatever device they like. It is of course not lost on me that this device will be close to a completely freedom respecting device. There will probably be proprietary device drivers, but I expect that user land will be Free as in Freedom. This also resonates with us people that support competition in the market. Any time we have a technology where any new inventor can enter the market and leverage it for their invention, my free market loving self smiles a little bit. Standardized interfaces are great for competition (e.g. see the Internet).

Even further, this also resonates very well with us privacy loving people and people that haven't drank the "cloud" Kool-Aid. The current solution to the woes of syncing multiple computers (be they PCs, tablets, phone, or whatever) is to have something like Dropbox shuffle files around. This is actually a pretty crappy solution, if we are being honest. Don't get me wrong, it is good for the world we have constructed, but it is far from optimal. For example, on any system that I am familiar with, it is not possible to sync installed applications via something like Dropbox.

Of course there is nothing wrong with cloud services in principle, and in many instances they are extremely convenient for transferring files to friends. I'm not proposing that we move away from Internet services in total, but I think there is an argument to be made for only using Internet services for things that need to be Internet services. The problem is that cloud services are run by corporations and corporations only have allegiances to their share holders and, if required by law, the government that they operate under. In addition, in the world we currently live in corporations tend to also have a disturbing level of control over "mobile" devices. In this data syncing/sharing scheme there are many points of weakness when it comes to privacy. By buying into the "cloud", we have transitioned from the normal state where files that are not shared with others are basically safe into a world where our files are more vulnerable (to either attackers, corporations, or government agencies) because we have implicitly shared them in order to sync them. Something like Ubuntu Edge is a way out of this, it is a solution for many of the problems that the cloud was designed to solve. If my files are local (i.e. they are in my pocket) and third party devices that I interact with are nothing more than a screen with no general purpose processing capabilities, I have much less to worry about.

All of these aspects packed together makes this a pretty awesome project. But now the brilliant part. By making this a crowd funding campaign, Canonical has made it clear to the consumers what is possible. I have been hearing rumblings about a system like this for years, but perhaps the ordinary people out there have not. This is certainly a way to get the word out. On the flip side, this tells the manufacturers that there are people that are willing to pay money for something like this. Just the fact that this crowd funding campaign existed, regardless of whether it is actually funded, will affect the trajectory of technology in the future. This is a win-win scenario and it is brilliant, pure and simple. I would love to see Canonical make their goal and make their phone, but in my estimation, they have already changed the world for the better.

So, no, buying an Ubuntu Edge isn't the best fit for everyone (but to be honest, neither is a tablet, which try as you might, is a poor platform for anything but consuming information). Ubuntu Edge doesn't fit in perfectly with the world as it is. It isn't even designed to fill a gap, or at least not a gap that exists today. It is designed to be that device that is necessary to make the future that many of us envision come true. That said, it is priced competitively for the package that they promise and, for people that lug a laptop back and forth from work or for people that have a two or more computers that they struggle to keep in sync (e.g. me), this will be a useful device right now. If I had $700-830 to spend on a speculative device, I would. The real question is whether they can actually deliver what they promise.

Sunday, July 7, 2013

Automatic Binding Generation With Verrazano

Verrazano is a library for automatically creating Foreign Function Interface (FFI) bindings for Common Lisp. It is easy to use, as long as it works, and it can save you a lot of busy work. However, it is not well documented, and the documentation that is available is incorrect. Hopefully this post will help out people that don't want to figure it out themselves (including myself in 3 months).

As a sample of how to use Verrazano, I'm going to generate a set of bindings for the NLopt library from our friends over at MIT. This library is a great tool for optimization, whether it be minimization/maximization, fitting, or any optimization as it is very flexible in the constraints that the problem can have. It puts many of the optimization facilities in GSL to shame. I first started using it because of its derivative free local optimization routines and really went on to appreciate it when I realized I needed non-trivial constraints on the parameters in my model fits (GSL can't even handle trivial ones, really).

Up until now, I have been using some hand crafted bindings. This library is pretty small, so it actually isn't much of a burden to maintain for my purposes. But let's see how to generate them with Verrazano. We will use the function generate-binding which takes a list which represents a backend. The only backend available right now is CFFI, which just happens is a good portable FFI, so this is okay. To generate the NLopt bindings:

   :package-name :nlopt-cffi-bindings
   :input-files ("nlopt.h")
   :gccxml-flags "-I /usr/local/include/"))

After this, we have a file called "nlopt-cffi-bindings.lisp". These represent a very low-level interface for the library. It basically gets you to the level of functionality you would have if programming in C. As such, this could be used as is, but that wouldn't be very Lispy, would it? I wrapped some of these utilities and will later post my end results as a proper set of Lispy bindings on GitHub.

This is already very useful, but Verrazano can do much more.

Generating GSL Bindings

There are already several GSL bindings available for CL (I even have my own private library that I like). The most popular, and deservedly so, is GSLL. The shear level of completeness is inspiring considering Liam wrote the bindings by hand (an admirable amount of work, but in my opinion a bit wasteful). One thing I don't like about GSLL is that it feels like it was brought just to the point of usefulness, but no further. Coding with GSLL feels like you are coding in C. This has changed a bit recently, with the introduction of Antik, but the library is still more difficult to use than my hand rolled GSL library (in fact, perhaps I'm dumb or something, but I have yet to actually productively used GSLL in a project for work or otherwise).

But never mind all that, let's make our own:

  :package-name :cffi-gsl-bindings
  :input-files (mapcar #'namestring (directory #p"/usr/include/gsl/*.h"))
  :gccxml-flags "-I /usr/include/gsl"
  `(,(cl-ppcre:create-scanner "(^gsl_?)") "")))

This basically looks like the example above but with a few things added. First, we are including all of the GSL header files. Also, we use the :standard-name-transformer-replacements option to remove the prefix on GSL functions using a regular expression. If you are familiar with C, you know that this prefix is necessary because the C programming language doesn't have a way to divide the symbol namespace. This means that without these prefixes the likelihood of a name collision is very high. In Common Lisp we don't have to worry about this because we have a package system.

However, when you run the command above, you get an error and the bindings were not generated. This is because GSL changed some naming conventions a while back and we need to explicitly define that we want to use the new (non-backwards compatible) functions by passing an extra argument to GCCXML. We need to pass -D GSL_DISABLE_DEPRECATED to GCCXML. Well, let's try it again:

  :package-name :cffi-gsl-bindings
  :input-files (mapcar #'namestring (directory #p"/usr/include/gsl/*.h"))
  :gccxml-flags "-I /usr/include/gsl -DGSL_DISABLE_DEPRECATED"
  `(,(cl-ppcre:create-scanner "(^gsl_?)") "")))

Bingo, this time it worked. If we look at the binding file we see that basically all of the stuff is there that we want. However there is a bunch of stuff that we don't care about and, quite frankly, probably doesn't work. Indeed, if you define the library and attempt to load that file, then you will get several errors. Also, the file is some 30 thousand lines long, making it difficult to track them all down.

The errors seem to come in a few different flavors. First, sometimes Verrazano will find a type that that it doesn't know how to deal with. These are associated with comments in the generated source file that look like:

;;; No entry found for gccxml type "long double" in *GCCXML-FUNDAMENTAL-TYPE->CFFI-TYPE*

Indeed, long doubles are typically seen as an extension to C and not standard or necessarily available on every platform, and is certainly not available in CFFI (maybe someday?). The best way to get around this is to not include those headers that provide long double support in GSL. But even that doesn't clear it all up. It turns out that if you also exclude gsl_sys.h, gsl_complex.h, gsl_types.h, gsl_test.h, and gsl_version.h then it clears all of these errors up. Which brings up two points:

  • If there is an error, it might show up as a comment in the generated source file without so much as a warning at the REPL. This is weird, but it is the way things are right now. These will typically manifest as errors at load time.
  • Only include the headers that you really need. The headers that we ended up removing turned out to be ones that we really didn't want or need in the first place. It takes some time up front to pick the right header files, but it can save you a pain later.
    :package-name :cffi-gsl-bindings
    :input-files (remove-if (lambda (x)
                              (or (ppcre:scan "gsl_sys\\.h$" x)
                                  (ppcre:scan "gsl_complex\\.h$" x)
                                  (ppcre:scan "gsl_types\\.h$" x)
                                  (ppcre:scan "gsl_version\\.h$" x)
                                  (ppcre:scan "gsl_test\\.h$" x)
                                  (ppcre:scan "long_double\\.h$" x)))
                            (mapcar #'namestring (directory #p"/usr/include/gsl/*.h")))
    :gccxml-flags "-I /usr/include/gsl -DGSL_DISABLE_DEPRECATED"
  `(,(cl-ppcre:create-scanner "(^gsl_?)") "")))

Now we run into the next error, case sensitivity. Apparently Verrazano isn't really designed to handle cases where a function has multiple identifiers that differ by nothing more than case. Seems like a bad programming practice, but GSL does it, and Verrazano screws it up, so we have to work around it. One place where this seems appropriate but unfortunate is with the Bessel functions where the cylindrical functions are written as upper case letters (J, K, I, and Y) and the spherical functions are written as lower case (j, k, i, and y). We can deal with it by adding a new translation rule:

    :package-name :cffi-gsl-bindings
    :input-files (remove-if (lambda (x)
                              (or (ppcre:scan "gsl_sys\\.h$" x)
                                  (ppcre:scan "gsl_complex\\.h$" x)
                                  (ppcre:scan "gsl_types\\.h$" x)
                                  (ppcre:scan "gsl_version\\.h$" x)
                                  (ppcre:scan "gsl_test\\.h$" x)
                                  (ppcre:scan "long_double\\.h$" x)))
                            (mapcar #'namestring (directory #p"/usr/include/gsl/*.h")))
    :gccxml-flags "-I /usr/include/gsl -DGSL_DISABLE_DEPRECATED"
  `(,(cl-ppcre:create-scanner "(^gsl_?)")
     ,(cl-ppcre:create-scanner "bessel_(zero_)?([jkiy])")

This will translate the lower case Bessel functions of the form gsl_sf_bessel_j into gsl_sf_spherical_bessel_j while leaving gsl_sf_bessel_J alone.

The much more troubling situation is when GSL uses both N and n, or F and f, in a single functions argument list. This cannot work, as I'm sure is clear as day. In order to fix this, I needed to actually hack on Verrazano a bit. I had to change the write-cffi-function function from it's current version to:

;; From...
(do-arguments-of-function (argument node)
  (incf index)
  (if (typep argument 'gccxml:ellipsis)
      (write-string "common-lisp:&rest")
      (bind ((argument-name (aif (name-of argument)
                                 (transform-name (name-of argument) :variable)
                                 (format nil "arg~A" index)))
             (argument-type (type-of argument)))
        (push argument-name previous-args)
        (pprint-newline :fill)
        (format t " (~A " argument-name)
        (write-cffi-type argument-type)
        (format t ")"))))

;; ...to...

(let ((previous-args nil))
  (do-arguments-of-function (argument node)
    (incf index)
    (if (typep argument 'gccxml:ellipsis)
        (write-string "common-lisp:&rest")
        (bind ((argument-name (if (and (name-of argument)
                                       (not (member (name-of argument)
                                                    :test #'equal)))
                                  (transform-name (name-of argument) :variable)
                                  (format nil "arg~A" index)))
               (argument-type (type-of argument)))
          (push argument-name previous-args)
          (pprint-newline :fill)
          (format t " (~A " argument-name)
          (write-cffi-type argument-type)
          (format t ")")))))

This change keeps track of what symbols we have already used in the argument list and replace it with a generic argument of the form "arg<n>". This gets rid of the last error that stops this file from loading. Now we can load this file and, again, we are basically where we would be if were a C programmer, but we can work from Lisp to build a good interface on top of this set of generated bindings. This brings up another point:

  • Verrazano is not production ready; it is hacker ready. Verrazano is useful because it is a tool that developers use every once in a while for generating bindings. It is not ready for every end user to use.

Okay, so we have something that will load. However, you may have noticed that there are crap-load of warnings that are produced when those bindings are loaded. These warnings are regarding a pretty new improvement to CFFI. Not so long ago, if you wanted to pass a structure to a function using CFFI, your only option was to pass it by address. If you needed to pass it by value you were out of luck. Luckily basically nobody uses pass by value, but it is a little limiting. Now CFFI supports this, so you need to specify structs by (:struct struct-name) and pointers to structs as (:pointer (:struct struct-name)). In order to fix this we are going to have to do a bit more hacking, but for now these are just warnings.

If you have been perusing the generated bindings, you may be wondering what all these weird functions that Verrazano is making are coming from. Some of the functions look good, but some seem very odd, for instance _ZN17gsl_vector_ushortaSERKS_. These functions look very odd and they show up because Verrazano is, fundamentally, a tool for making C++ bindings, not C bindings. It is interpreting these header files as C++ header files. It doesn't offer a "assume this is C" option. This isn't Verrazano's limitation, really, the GCCXML people are not interested in working with anything but C++. Often this doesn't matter as C++ is mostly a superset of C. However, there are differences. One difference is that when it sees structs, it has to assume that these are C++ sturcts, which are basically classes in C++ with all of their members public by default. This means that there will be a bunch of code that is generated by Verrazano like constructors, destructors, and comparison operators that have special GCC style "mangled" names. These symbols are actually not in the library at all as a simple nm -D /usr/lib/libgsl.so.0 will demonstrate. I suppose that these don't really do much harm, and I don't know how to get rid of them at the time short of post-processing the file and removing them. Plus we may want them if we are linking against a library with a C++ interface some day. At least Verrazano seems kind enough to place all these oddities at the end of our bindings file.

To prove that this works:

;; Define our library and use it (stolen from GSLL)
(cffi:define-foreign-library libgslcblas
  (:darwin #+ccl #.(ccl:native-translated-namestring
                    (gsl-config-pathname "libgslcblas.dylib"))
           #-ccl #.(gsl-config-pathname "libgslcblas.dylib"))
  (:cygwin "cyggslcblas-0.dll")
  (:unix (:or "libgslcblas.so.0" "libgslcblas.so"))
  (t (:default "libgslcblas")))

(cffi:use-foreign-library libgslcblas)

(cffi:define-foreign-library libgsl
  (:darwin #+ccl #.(ccl:native-translated-namestring
                     (gsl-config-pathname "libgsl.dylib"))
           #-ccl #.(gsl-config-pathname "libgsl.dylib"))
  (:cygwin "cyggsl-0.dll")
  (:unix (:or "libgsl.so.0" "libgsl.so"))
  (t (:default "libgsl")))

(cffi:use-foreign-library libgsl)

;; Load the bindings
(load #p"gsl-cffi-bindings.lisp")

;; Define our callback function
(cffi:defcallback csin :double
    ((x :double)
     (params :pointer))
  (declare (ignore params))
  (float (sin x) 0d0))

(let ((workspace (gsl-cffi-bindings:integration-workspace-alloc 10)))
       (cffi:with-foreign-objects ((gsl-fn '(:struct
                                   (result :double)
                                   (abs-error :double))
         ;; Setup the GSL function object
           gsl-fn '(:struct gsl-cffi-bindings:function-struct)
          (cffi:callback csin))
         ;; Call the quadrature routine
           gsl-fn '(:struct gsl-cffi-bindings:function-struct))
          0d0 (/ pi 2d0) 1d-10 1d-5 10
          (cffi:convert-to-foreign :gsl-integ-gauss-61 'gsl-cffi-bindings:.-155)
          workspace result abs-error)
         ;; Return the result and estimate of the error of the result
         (values (cffi:mem-ref result :double)
                 (cffi:mem-ref abs-error :double)))
    ;; Clean up
    (gsl-cffi-bindings:integration-workspace-free workspace)))

It may be ugly, but it works. Note that everything here translates pretty directly to the C code you would write to do this using GSL. In the future we'll explore how to make this a bit easier to use.

Till next time.