
Friday, October 25, 2013

CL-Libsvm Usage


At work we decided that we needed to use a bit of machine learning to automate a problem. I recommended using LIBSVM, as it is very established (read: old, with many bindings available). I decided to see what was involved in using it this morning, but found that it took an embarrassingly long time to get it working the way I expected it to out of the box. This is my shortcoming; I had forgotten too much of the machine learning that I used to know. But this post will hopefully help others get it working more quickly.

First, SVMs are pretty versatile, and there are a lot of choices that you can make when you start using LIBSVM. It is often not clear what you should use for what task, and I don't have the expertise to tell you what is best. For that information, you should check out this guide by the library's author. What I will show here is a quick and dirty approach.

I will be using Gabor Melis' bindings for LIBSVM. They work well enough, but they use a now-deprecated CFFI syntax, so they will issue many warnings when you load and use them. Just be aware.

We will be using a radial basis function (RBF) kernel to model the "a1a" training data and test data that are available on the LIBSVM website. Already this is a bit odd, as the LIBSVM people have split their data such that ~5% is in the training set while ~95% is in the test set. This is probably due to the somewhat expensive nature of training SVMs on large data sets, but whatever the case, we will stick with it.
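For reference, LIBSVM data files use a sparse text format: each line holds a target value followed by index:value pairs for the non-zero features. The lines below only illustrate the format; they are not actual rows from a1a:

```
-1 3:1 11:1 14:1 19:1 39:1 42:1 55:1 64:1
+1 5:1 7:1 19:1 39:1 40:1 51:1 63:1 77:1
```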

Here is a simple procedure to read in the data:

(ql:quickload '(:iterate :cl-ppcre :alexandria))

(use-package :iterate)

(defun parse-data-file (file)
  "Read in a data-file, return list of target values with input as a sparse
vector suitable for cl-libsvm."
  (iter (for line :in-file file :using 'read-line)
    (collecting
     (list (read-from-string line)
           (let ((result nil))
             (ppcre:do-matches-as-strings (match "\\w+:\\w+" line)
               (push
                (apply 'cons (mapcar 'read-from-string
                                     (let ((separator (position #\: match)))
                                       (list 
                                        (subseq match 0 separator)
                                        (subseq match (+ 1 separator))))))
                result))
             (coerce (reverse result) 'vector))))))

(defun randomize-order (data)
  "Return a random permutation of the data."
  (alexandria:shuffle (copy-seq data)))

The parse-data-file function returns a list where each element corresponds to a line in the data file, mapped into a list of the form (target-val sparse-vector-input). A sparse vector is a vector where each element is a cons cell containing an index as its car and a value as its cdr.
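To make that representation concrete, here is a sketch of what a single (hypothetical) data line turns into:

```lisp
;; A line like "-1 3:1 11:1 14:1" parses to the list
;;   (-1 #((3 . 1) (11 . 1) (14 . 1)))
;; i.e. the target value followed by a sparse vector of (index . value) conses.
(first (parse-data-file #p"~/Desktop/a1a"))
```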

We need to further divide the test data so that we have a set we can use to validate the parameters that the RBF SVM uses (the "validation" set), while we use the rest of the test data to determine the accuracy of the model (the "test" set). We could just cut the test data in half, but that would make our validation calculations very slow, and as we will see in a moment, this becomes the bottleneck of the calculation. I have chosen my validation set to be 2000 points. Whatever size you choose will be a trade-off between accuracy and speed. Having 2000 points in your validation set is about my limit when it comes to playing around at the REPL (I would choose something higher if it showed benefits for the model's accuracy). If we are dividing the data, we must first randomize it to ensure that no structure from how the data was collected shows up in the analysis. This can be done with the randomize-order function above and a couple of calls to subseq.

(let ((test-data (randomize-order (parse-data-file #p"~/Desktop/a1a.t"))))
  (defparameter *validation-set* (subseq test-data 0 2000))
  (defparameter *test-set* (subseq test-data 2000)))

Now, to get into the nitty-gritty of CL-Libsvm. First, you'll need to install it; it is not yet available in Quicklisp, so clone it from Gabor's Git repo. Naturally you will also need to have LIBSVM itself installed (you don't actually need libsvm-dev or libsvm-tools, but why not have them anyway):

cd ~/quicklisp/local-projects
git clone http://quotenil.com/git/cl-libsvm.git

# Install LIBSVM if needed
sudo aptitude install libsvm3 libsvm-dev libsvm-tools

CL-Libsvm is basically structured like LIBSVM, so the same documentation applies here. We must first create a problem which contains inputs and the expected outputs, then create a parameter structure that contains the parameters that define how the SVM operates (numerical constants as well as the type of SVM and kernel), and finally combine these two into a model which can be used to make predictions.

This can be done easily enough for any particular RBF parameters \(C\) and \(\gamma\).

(ql:quickload :cl-libsvm)

(let ((data (parse-data-file #p"~/Desktop/a1a")))
  (defparameter *problem* (libsvm:make-problem (map 'vector 'first data)
                                               (map 'vector 'second data))))

;; Here c and gamma stand for whatever particular parameter values you are trying
(defparameter *parameter* (libsvm:make-parameter :c c :gamma gamma))

(defparameter *model* (libsvm:train *problem* *parameter*))
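As a quick sanity check at the REPL, you can compare a prediction against a known target. This is just a sketch; it assumes the *validation-set* defined earlier and uses only the libsvm:predict call that the quality measure below relies on:

```lisp
;; Predict the first validation point and show it next to its target value.
(destructuring-bind (target input) (first *validation-set*)
  (list :target target
        :prediction (libsvm:predict *model* input)))
```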

But we want to do this for many different values of \(C\) and \(\gamma\) in order to find what parameters give us the best performance. We could do several things to find the optimum values. We will be following the LIBSVM procedure and just search the parameter space on a grid. We should note from the manual that it is probably in our best interests to search in \(\log C\) and \(\log \gamma\).

(defparameter *optimization-grid*
  (iter :outer
    (for log-c :from 1.5 :to 3.5 :by .1)
    (iter (for log-gamma :from -5 :to -3.5 :by .1)
      (in :outer (collecting
                  (list* log-c
                         log-gamma
                         (let* ((params (libsvm:make-parameter
                                         :c (exp log-c)
                                         :gamma (exp log-gamma)))
                                (model (libsvm:train *problem* params)))
                           (list (quality model *validation-set* log-c log-gamma)
                                 model))))))))

Note that there is a missing function here: we never defined quality. This function is meant to take a model and some testing data and return a measure of how well the model is performing. For this I chose to use the Matthews Correlation Coefficient, with the threshold for the prediction set to \(0.5\).
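For reference, in terms of the true/false positive/negative counts, the coefficient is defined as

\[ \mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}} \]

It ranges from \(-1\) (total disagreement) to \(+1\) (perfect prediction), with \(0\) meaning no better than chance. Whenever the denominator is zero, one of the four sums is zero, which forces the numerator to zero as well, so checking the numerator (as the code below does) safely covers the degenerate case.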

(defun logistic (x)
  "The logistic function, 1/(1 + exp(-x)).  Note that (/ y) is
shorthand for (/ 1 y) in Common Lisp."
  (/ (+ 1 (exp (- x)))))

(defun quality (model test-data &optional log-c log-gamma)
  "Use the Matthews Correlation Coefficient to measure how well the model does.
The optional log-c and log-gamma arguments are used only for logging."
  (iter (for (target input) :in test-data)
    (let ((p (if (< 0.5 (logistic (libsvm:predict model input)))
                 1 -1)))
      (cond ((and (= p 1) (/= target p))
             (summing 1 :into false-positives))
            ((and (= p -1) (/= target p))
             (summing 1 :into false-negatives))
            ((and (= p 1) (= target p))
             (summing 1 :into true-positives))
            ((and (= p -1) (= target p))
             (summing 1 :into true-negatives))))
    (finally
     (let ((quality
             ;; Compute quality of model
             (if (= 0 (- (* true-positives true-negatives)
                          (* false-positives false-negatives)))
                  0d0
                  (/ (- (* true-positives true-negatives)
                        (* false-positives false-negatives))
                     (sqrt
                      (float
                       (* (+ true-positives false-positives)
                          (+ true-positives false-negatives)
                          (+ true-negatives false-positives)
                          (+ true-negatives false-negatives))
                       0d0))))))
       ;; Print some output so we know what it's doing
       (format
        t "log-c = ~A, log-gamma = ~A~@
           TP = ~A, TN = ~A, FP = ~A, FN = ~A~@
           Quality = ~A~%"
        log-c log-gamma
        true-positives true-negatives false-positives false-negatives
        quality)
       (return quality)))))

When we put this all together and play around with the ranges of the plots, we get a plot that looks like this:

From this we can tell that there is likely some kind of optimum around \(\log C = 2.10\) and \(\log \gamma = -4.27\).

You might wonder how I was able to determine, looking at those blocky heat maps, where that tiny maximum was. While you can do what the LIBSVM docs suggest and move to finer and finer grids to find optimal points, I find this pretty annoying and finicky. I opted to use some as-yet-unreleased bindings for the NLopt optimization library. With NLopt we can do a global optimization followed by a local optimization, which requires next to zero human intervention and finds a pretty good optimum that I doubt I would have found otherwise. (A small caveat here: it is entirely possible that the rough nature of my MCC heat map is merely an artifact of the small validation-set size. I don't have the patience to test this for a silly example.)

;; Global optimization (better grab a snack)
(nlopt:nlopt-apply
 (lambda (log-c log-gamma)
   (let* ((params (libsvm:make-parameter
                   :c (exp log-c)
                   :gamma (exp log-gamma)))
          (model (libsvm:train *problem* params)))
     (- (quality model *validation-set*))))
 '(1 1)
 :nlopt-gn-crs2-lm
 :lower-bounds '(-5 -5)
 :upper-bounds '(7 7)
 :abs-xtol 1d-1)

;; Fit parameters = (2.071364331304816d0 -4.265683211751565d0)
;; Validation MCC = -0.5509856550306286d0

;; Local optimization (this should be quick)
(nlopt:nlopt-apply
 (lambda (log-c log-gamma)
   (let* ((params (libsvm:make-parameter
                   :c (exp log-c)
                   :gamma (exp log-gamma)))
          (model (libsvm:train *problem* params)))
     (- (quality model *validation-set*))))
 '(2.071364331304816d0 -4.265683211751565d0)
 :nlopt-ln-bobyqa
 :lower-bounds '(-5 -5)
 :upper-bounds '(7 7)
 :abs-xtol 1d-5)

;; Fit parameters = (2.096969188326027d0 -4.268553108908674d0)
;; Validation MCC = -0.5522135970232868d0

(let* ((params (libsvm:make-parameter
                :c (exp 2.096969188326027d0)
                :gamma (exp -4.268553108908674d0)))
       (model (libsvm:train *problem* params)))
  (quality model *test-set*))

;; Test set MCC = 0.5497032935038368d0

This gives the optimum at \(\log C = 2.10\) and \(\log \gamma = -4.27\) and a Matthews Correlation Coefficient of \(0.55\) (measured against the test set, naturally).

Thursday, November 1, 2012

No More Coursera Starter Kit

I just finished the latest Coursera assignment minutes before the deadline (a deadline that was extended due to Hurricane Sandy).  The assignment was not hard, but it certainly was time consuming.  Clearly I will be unable to put out a starter kit in any timely manner.  But, more importantly, I realized something when working through the assignment: this code that they have offered is a mess.  The act of porting this stuff to Common Lisp would be a nightmare; it is much better to rewrite it.  The problem with this is that the actual assignments more or less require you to use the Octave code, as it does things like sample from a random distribution.  It would be annoyingly difficult to guarantee that any code I wrote would sample the distribution the same way, especially since they use a built-in MatLab/Octave routine.  I would have to research what PRNG Octave uses, likely implement it, and make sure to seed it with whatever Octave is using.

The other thing I realized when working with the perceptron code is that writing starter code sucks for the programmer.  Writing code that has trivial pieces missing from it on purpose is not how I want to spend my time.  Creating teaching materials requires a crazy amount of prep-time and I don't care enough about education to spend all that time and, in the end, am left with nothing but a classroom exercise that cannot even be used for tinkering at the REPL.

This has led me to the decision to stop my already feeble attempts to maintain a Common Lisp port of the neural network starter code.  However, I still find that porting code is extremely useful in making sure you understand the concepts involved.  I have always felt that once I could write the code for something, I tend to truly understand it.  So I will be taking the provided code from the course as a starting point and creating my own neural network related code, closely in parallel with the course, in Common Lisp.  I will not be leaving out the missing pieces as they do in the class.

I have decided that this won't really violate any rules of the course.  Looking at my perceptron code and the code that was released by the class organizers, I doubt that any person would see one as derivative of the other if they didn't already know I was working off the other.  Further, I doubt that any person who couldn't answer the questions in the programming assignments using the MatLab code could look at my code and identify the pieces that needed to be ported back.  And it isn't as if executing the Lisp code will help them at all.  As I said, I doubt I could actually write code that would give the correct numerical answers for the homework even if I wanted to, because of the heavy MatLab/Octave emphasis in the provided code.  I was just barely able to do it in the perceptron case, and it probably took about half the development time to figure out how to load the MatLab data and exactly match their form of the learning algorithm.

If you are following this work, I will probably stay on the same git repository for a while, but might switch to another more appropriately named one in due time.  Hopefully I will have a little toy neural network library to play with in a few weeks.

Friday, October 19, 2012

Neural Networks, Coursera, and Common Lisp

There is a course on machine learning using artificial neural networks being offered at Coursera this "semester".  I am taking it, and it is the first class I have taken with Coursera.  I had forgotten how nice it is to learn a new subject.

Besides the fact that artificial neural networks have always interested me, one of the reasons I decided to take this class is to become somewhat fluent in Python, which for some reason I thought would be one of the supported languages.  Turns out it isn't, but no matter.  Python is popular enough that someone will always port to Python.

I figure I might as well do the same for Common Lisp.  This is very late for the first assignment, and I make no promises that it will be more timely in the future.  I decided not to do a line-by-line porting of the Octave or Python code to Lisp because, well, that's just not how we Lispers do things.  It is cleaner in its current form, but harder to prove correct at a glance.  Also, I will be using some packages that I have been working on but have not officially released, such as my zgnuplot library.  I have been using this plotting library with in-house code for a long time, but it is still pretty rough around the edges and the interface is still in flux (hopefully headed towards something a bit better).  But whatever, it's out there; feel free to do what you will with it.

Note to people that might fork this: the rules on Coursera (well, at least in this course) are that you shouldn't post solutions.  So don't work off this repo, plug your solutions into it, and publish it online.  You should probably work in a private branch that you don't push to Github.