directed procrastination
A blog about programming, mathematics, physics, and free software.
Thursday, May 16, 2013
ICFP Contest 2013 Anyone?
The contest seems to be starting sometime on August 8th and should run for the following 72 hours. If you are at all interested in programming Common Lisp on a team ICFP attempt, please comment below, or email, or use whatever method of communication you prefer.
This year I have a few new tools that should work better than last year, including a way that spectators can follow along without video streams that are susceptible to crappy resolutions (anyway, I now know someone that works on the YouTube team at Google, so he could probably get the old setup to work better). I plan to post on those tools later to try and drum up interest as the contest approaches.
Wednesday, May 15, 2013
What is wrong with Python strings?
- Python actually only allows ASCII characters? Really?!? If you include some odd (or not so odd) character like a lambda, it will choke. If this doesn't seem like a big deal to you, you have clearly never dealt with any language other than English, nor have you dealt with the various fancy forms of punctuation. You can instruct Python to accept a different encoding by putting a magic comment at the beginning of your source file that looks like "# coding: <encoding>". As a reference, the last time this was an issue in any Lisp I tried, was around 5 years ago with CLISP (a fairly out of date Lisp at the time) and it was remedied shortly after.
- What is going on with your string type? If I index into the string type, I can land in the middle of a multi-byte character. The very fact that this is possible is evidence that this string type is fundamentally screwed up.
- Why is the printed representation of a Unicode string unreadable? Seriously, we have terminals that can handle pretty much anything that you can throw at them, why print a bunch of line noise instead of the actual character?
- What is wrong with the way you output in Python? Why is it that I get an encoding error when I try to pipe or redirect output from my program to another program and a non-ASCII character is encountered? I have no idea what is going on here, or how to get around it. This is quite possibly the biggest hiccup in terms of productivity. I cannot even send my output to a file or to less in order to carefully review the output. How hard is it to dump the output of a program into a file? In the case of pipes, you might think that this is a shortcoming of the pipe itself, or of the program on the other side of the pipe, but this is not the case. I routinely pipe all sorts of crazy text through pipes and to shell commands without any problems; whatever the issue is, it is rooted in Python.
People say (and I believe) that Common Lisp is simultaneously the highest level and lowest level programming language you will ever use. In Lisp you can write code that is so abstracted you have no idea of what code is actually being passed to the CPU, much less the data-structures or memory usage. But, in the same program, you can write code that translates directly to machine instructions in a predictable way (or even put inline C or assembly). Because of its high level nature, I have never had to worry about strings with odd characters in them; they just work. And, because of its low level nature, it is simple to convert strings down to ASCII when you have to, which is a very rare need in my experience. You would think that Python, whose earliest versions were a full 5 years after the most recent standardization of Lisp (coming up on 20 years since then), would have done this better.
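To make the "they just work" claim a little more concrete, here is a minimal sketch. It assumes a Unicode-aware implementation (a recent SBCL, say) and a source file read as UTF-8; ascii-only is a name made up for this illustration, not a standard function.

```lisp
;; Sketch only: assumes a Unicode-aware implementation and UTF-8 source.
;; ASCII-ONLY is a hypothetical helper written for this example.
(defun ascii-only (string &optional (replacement #\?))
  "Return a copy of STRING with every non-ASCII character replaced."
  (map 'string
       (lambda (c) (if (< (char-code c) 128) c replacement))
       string))

(let ((s "λ-calculus"))
  (list (length s)        ; counts characters, not bytes
        (char s 0)        ; a whole character, never half of one
        (ascii-only s)))  ; lossy down-conversion, only when asked for
```

Indexing works on characters, so there is no way to land in the middle of one, and the trip down to ASCII is something you opt into rather than something that ambushes you at output time.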
I could actually go on and on about things that I perceive Python sucking at, for instance the fact that there is a REPL but no iterative development, or the fact that the error messages I get are basically garbage (however, Lisp is not that much better in this regard), or the fact that you must explicitly have a "return" statement on your functions, and of course, of course, the accursed whitespace dependence (which isn't actually annoying because of the whitespace, but because I actually have to tab around to get the correct indentation level in Emacs). But those things are probably just matters of taste, something that usually works out with time. The string handling in Python, however, is basically inexcusable.
Friday, April 26, 2013
Rogoff, Reinhart, and Research: Four Questions
This has been irking me for the last week or so. From Krugman's article in the NY Times:
Finally, Ms. Reinhart and Mr. Rogoff allowed researchers at the University of Massachusetts to look at their original spreadsheet — and the mystery of the irreproducible results was solved. First, they omitted some data; second, they used unusual and highly questionable statistical procedures; and finally, yes, they made an Excel coding error. Correct these oddities and errors, and you get what other researchers have found: some correlation between high debt and slow growth, with no indication of which is causing which, but no sign at all of that 90 percent “threshold.”
After reading about this a bit, as a person that performs publicly funded research for a living (at least right now), I find myself with four questions. First...
Why are you using Excel for important, presumably mathematically involved, statistical work?
Now this is a minor issue, more of just a curiosity really and a particular pet peeve of mine: Excel is a spreadsheet program. It is meant for things like making invoices, calculating grades, and balancing your check book, and even for those purposes it is not the best tool for the job. Each of those tasks has specialized programs written expressly for it. Excel is a "jack of all trades" sort of program, which is good (I tend to like those types of programs), but if your profession is to run statistics on data, is that the tool you should use? We have tools that are designed specifically for statistical analysis. We even have tools that look basically exactly like Excel but have a bunch of statistical tools built in as well.
Yes, Excel can be used for more complicated things. I knew a guy who used Excel to perform a discrete element method calculation of the heat flow throughout the complex geometry of an internal combustion engine. It worked, but it is not what Excel is meant to do. I'll even go so far as to say that you should not use Excel to solve PDEs in complicated geometries; just a blanket no, don't do it. Excel is certainly Turing complete, but that doesn't mean that you should use it for everything. I can use LaTeX to compute as well (TeX is also Turing complete), but I would never use it for something that wasn't document formatting.
Should you use tools for things that they are not intended to be used for? Sure, if the tool is well suited for that task. Sometimes tools are well designed and they can be used for tasks that the designers never intended them for, sometimes wildly outside the target use case. This usually falls under what we would call a hack. Should you use hacks routinely in your professional code? Probably not.
Perhaps I am off base. I'm not a statistician or an economist, perhaps the mathematics/calculations involved in economics is so brain-dead simple that Excel or a pocket calculator is the perfect tool for the job.
Why is it that the simple coding error resonates so well with the public whereas the general bad statistics falls flat?
The answer (which Krugman draws attention to elsewhere) is actually pretty obvious: one is embarrassing because people understand it and have probably made a similar mistake themselves, while the other is considered complicated. Thus, people tend to give a pass on the latter. There is also the fact that the coding error resonates better with the media. It is a much easier story to tell, and media outlets routinely talk to the lowest common denominator (how else are you going to make money on the news?).
My understanding is that the paper did some pretty sketchy things regarding selecting what data to include and what to exclude in their analysis. There was also some pretty bad logic involved as well: the paper was published based on a correlation found in some data but never made a causality argument. I think that this is actually pretty common in social sciences, but data mining for correlations and then never attempting to figure out the reason behind them is a pretty shoddy research method.
Of all of the errors that seem to have gone into this research, the error in the spreadsheet seems to be the least offensive. That is until you consider my next question.
Why did this take so long to uncover?
The answer to this is almost certainly the fact that the code that was used for these calculations was not made available to the public. This was a contentious result and others have attempted and failed to reproduce the results. This is exactly the time when having source available to others would have helped this dispute get resolved much faster.
I know that there are people that don't share my opinion that Free/Libre Software is (to first approximation) always to be preferred over proprietary alternatives, but there are places where it is wholly inappropriate to use proprietary software. Perhaps the most important place for Free/Libre Software is the source code and interpreters/compilers for that source code that is used in public research. Note that I'm not talking about Open Source development; we are talking about the freedom to inspect the code, use it in your own research, and distribute it to others, not a development model, though that might be a good fit for some projects.
In my opinion, and hopefully more and more researchers share this opinion, this research is flawed largely because it is not available for inspection. It is not available for inspection for two reasons: 1) Excel, the interpreter, is not Free Software nor gratis, and 2) the Excel document that they performed the calculation in was not made immediately available. Though, to their credit, it was made available to another researcher upon request, and Excel is widely used, with several Free Software spreadsheet programs that would very likely run these files. Think of how much better things would have been if the files were posted where anybody could get at them on a whim rather than by request. Anybody could execute them without buying a software license (due to gratis software) and understand what is happening in the software (freedom 1) and have the right to pass on altered versions to other researchers (freedom 2 and 3).
In addition, the proprietary nature of Excel causes concern as well as there may be internal errors in Excel that invalidate user programs that are correct. I should point out that errors within Excel itself become more and more likely as you start using Excel for things that it really wasn't made for, like advanced statistical analysis/modelling.
Thus, I think that people performing public research should publish any associated code to the public. And yes, I know that this is a scary prospect. I recently found an error in some code that I had written for a paper after it had been submitted to a journal. I had neglected to initialize a floating point variable in my program. Luckily, it did not affect the end result, and I really appreciate the luck in being able to fix this before someone else found it. If I am honest with myself, I am grateful that I didn't get caught making a huge mistake; I am grateful that I avoided shame, and that I got to analyze the situation and (if it had been necessary) get a head start on damage control.
But scary as it is, let's face it, the right thing to do for the advancement of research is to have open access of source code to other researchers and have that source code licensed in such a way that others can use it in their work. After all, what is worse, your work misleading people for years before it goes down in flames, or someone pointing it out early before anything bad happens? More eyes mean errors get caught faster, even if some of those eyes are out to completely discredit your work; perhaps even more so when this is the case.
Do we need a "Free/Libre Research" Movement?
When pondering this last question, I initially thought that it might be useful to define a sort of "Free/Libre Research" movement, where publications themselves and all source code and data associated with it are made available to the public (or at least the portion of the public that funded the research) at approximately the cost required to package and deliver it if not free and can be freely reused in derivative work. There are hints of this happening, particularly in the field of Computer Science. After some thought, I realized that such a movement shouldn't actually be necessary.
Defining such a movement is akin to saying that we want to do scientific research, i.e. we should be doing it this way already. The conditions I've described in a so-called "Free/Libre Research" movement clearly fall under the very definition of scientific research. It is all part of reproducibility (i.e. have other researchers attempt to reproduce your results) and falsifiability (allowing other researchers to potentially challenge your hypothesis). And, while it is not absolutely necessary to provide source to people in order to maintain reproducibility and falsifiability, shouldn't we be actively trying to make the scientific process work better rather than hindering it?
Wednesday, April 10, 2013
Programming Competition News
If you are into programming, and programming competitions, and programming in Lisp, three things that are probably highly correlated with the readership of this blog, then this is probably of interest to you. The Lisp in Small Projects competition has been announced and is slated to run from the end of June to the end of September. This probably encompasses ICFP this year, but what is one weekend out of three months?
This looks like a very fun activity and I very much plan to take part if time permits (alas, time is always the enemy). There is something to be said for long term programming competitions. As I have written before, programming in the extreme short term, while fun, tends to breed certain otherwise undesirable development habits. This competition will promote a better development style.
In other awesome (semi) news, HackerRank, a website that is about solving fun programming tasks, has allowed Lisp submissions for a few months now. They allow you to submit programs that will be compiled and executed using a relatively recent version of SBCL (v1.0.55, I think). I have been fairly silent about this in the past as I have been working on something that should make HackerRank a whole lot more fun for us Lispers and a post about why it is necessary in the greater scheme of things, which I will be posting shortly. I just need to test it a bit first. But as it stands, you can use Lisp, and some of their tasks are, indeed, fun (they tend to be under the Artificial Intelligence category). There are also fairly boring problems (which tend to be under Algorithm Challenges, though still some good stuff there) and the extremely silly category of Code Golf (which has taught me that Lisp is a relatively verbose language, and that I am more than okay with that).
To people that are "working through Project Euler", unless you are a mathematician by trade, HackerRank is probably a much better use of your time if you are interested in improving your programming skills for the real world in any language.
Tuesday, February 5, 2013
And So It Came To Be
Some time ago Denis Budyak posted on the Iterate devel mailing list regarding a patch that would allow Iterate to use keyword symbols for drivers and clauses. The Iterate devel mailing list was in general, and the current maintainer in particular, very quiet regarding this matter, scarily so, even. But I and one other user spoke out that we thought this was a bad idea. This is not exactly a consensus, but here were the basic arguments. Denis wanted to patch Iterate such that this was valid syntax.
(iterate (:for (x y) :in seq)
         (:while (some-predicate-p y))
         (:collect (list (f x) (g y))))
This removes one of the common annoyances that new users of Iterate have to grapple with: the iterate package exports a lot of symbols with very common names. For instance, people oftentimes implement their own while macro as part of an introductory text. Iterate has its own while loop macro and this leads people to complain that they cannot have their homebrew:while and iterate:while imported into the same package (why they don't complain about other symbol name conflicts is a bit lost on me, I remember writing my own with-gensyms that I was relatively proud of). This patch would be a way around this.
My argument for not doing this was primarily that it would bifurcate the method by which Iterate is extended. For those that are not aware, Iterate is a DSL like Loop, but it is not as disjoint from standard Lisp as Loop is. Iterate is fairly transparently built out of standard Lisp macros and functions and, as such, Iterate is extended by writing standard Lisp macros and functions. Because of this "Lispy" design, symbols are not just syntactic markers for the DSL, they actually have meaning in the Lisp system at large. In this framework, the package system gives us the benefits of a divided name-space (as much benefit as it can give until the nickname business is figured out).
But in the end, I had to admit that this change wasn't likely to actually break anything so long as the original syntax was still allowed. I still oppose this change on the basis that it would muddy the waters of extension, raising the predefined drivers and aggregation clauses like for, while, and collect to a higher, irreplaceable status. For example, this would mean that I couldn't redefine the for driver. Even with all of this in consideration, I had to admit, this was a weak argument and these problems would never actually come up. Turns out, I was wrong.
Let's say, for instance, we find something that is more or less a bug in iterate:for. Let's say we find something that is basically just not the way that it should be done. For instance, why doesn't for accept normal destructuring-bind style argument lists?
The destructuring-bind construct will take apart a lambda list like the ones used in the defmacro forms. The destructuring-bind construct is a basic part of Lisp. And yet this will produce an error in standard Iterate:
(iterate (for (x &rest y) in-sequence #((1 2) (3)))
         (for (&key (a 3) b) in '((:a 1 :b 2) (:b 3)))
         (collect (list x y a b)))
That just seems like it should work. Not only does it seem like it should work, it would be pretty darn useful if it did work. For instance, it would be easier to use lists as general purpose data containers because we can use keyword meta-data without an extra destructuring-bind cluttering the code. We can also use optional bindings, something that shows up a lot in the destructuring I do. This method of destructuring is inherently more powerful than the facility that Iterate currently has and it's more consistent with standard Lisp syntax.
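For comparison, here is roughly what has to be written today to get that result. This is only a sketch: it uses the standard Loop macro rather than Iterate so that it runs in any conforming Lisp without extra libraries, and the two nested destructuring-bind forms are exactly the clutter I would like the driver to absorb.

```lisp
;; Plain Common Lisp, no Iterate needed. Walks the vector and the list in
;; lockstep and destructures each element by hand.
(loop for row across #((1 2) (3))
      for plist in '((:a 1 :b 2) (:b 3))
      collect (destructuring-bind (x &rest y) row
                ;; A defaults to 3 when the plist has no :a entry
                (destructuring-bind (&key (a 3) b) plist
                  (list x y a b))))
;; => ((1 (2) 1 2) (3 NIL 3 3))
```

The second row shows the two features in action: &rest binds y to NIL when nothing is left over, and the &key default supplies a when the plist omits it.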
This brings me to the point. Iterate has a defined built in behavior which I don't agree with. To me this feels like a bug, something that should be fixed. However, this fix is not compatible with the current behavior. One of the interesting aspects of Common Lisp is that any enhancement at all is usually not backward compatible because errors have a defined run time consequence. This means that there is a pretty good argument to not "fix" this. This seems at first like an impasse, and it would be if we were using Loop or a version of Iterate with the proposed patch. The extensibility of Iterate can save us here.
To summarize my options, I could:
- Patch my local Iterate so this works. This means that my code will not work with anybody else's version of Iterate. This is a non-option really.
- Write a patch and petition the Iterate developers and community at large to accept this change. This is more attractive, but still deeply flawed. First, because errors have defined run-time behavior in Common Lisp, this is technically an incompatible change. Second, seeing as the maintainer didn't even weigh in on the previous discussion, I don't hold high hopes that any patch would be accepted in a timely manner.
- Or, simply replace the for driver with my own.
This last option means I am able to make the change I want without approval from a community or maintainer (which may or may not be absent) and am able to do so in a completely portable, completely backward/forward/concurrently compatible way. With this method, I can make the change and start using it immediately in my own code. Meanwhile, I can publish the change and the maintainer can contemplate the benefits of such a change, and the community can make the decision for themselves on an individual by individual basis. In order to use it, just (shadowing-import 'smithzv.destructuring-bind-iterate::for), or when you define your package:
(defpackage my-package
  (:use :cl :iterate)
  (:shadowing-import-from :smithzv.destructuring-bind-iterate #:for))
If you want both behaviors, simply use smithzv.destructuring-bind-iterate which will import the new macro under the name dfor. Further, if the maintainer ever wanted to include this in Iterate, they have a working implementation that can be easily adapted to their code.
I am able to do all this because of the framework of extension that the proposed patch would uproot, or at the very least weaken. With the built-in macros and functions in Iterate implemented as keyword symbols, it would be impossible to create a drop-in replacement like this. I guess that in the end, this is the best argument I have as to why this change shouldn't be made.
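The mechanism can be seen in isolation with a toy example. The three toy-* packages below are invented for this illustration and have nothing to do with the real Iterate sources; they only show how the package system lets one for stand in for another, which a keyword can never do.

```lisp
;; Hypothetical packages, made up for this sketch.
(defpackage :toy-iterate     (:use :cl) (:export #:for))
(defpackage :toy-replacement (:use :cl) (:export #:for))

;; The user package inherits everything from TOY-ITERATE, but FOR is
;; taken from TOY-REPLACEMENT instead.
(defpackage :toy-user
  (:use :cl :toy-iterate)
  (:shadowing-import-from :toy-replacement #:for))

(list
 ;; FOR as seen from TOY-USER is the replacement's symbol...
 (eq (find-symbol "FOR" :toy-user) (find-symbol "FOR" :toy-replacement))
 ;; ...and no longer the one exported by TOY-ITERATE.
 (eq (find-symbol "FOR" :toy-user) (find-symbol "FOR" :toy-iterate)))
;; => (T NIL)
```

A keyword such as :for always names the same symbol in the KEYWORD package no matter where it is read, so there is nothing to shadow and nothing to swap.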
Thursday, January 17, 2013
Soft-Semicolons: A little Emacs hack
I have been waging an all out war against the "Shift" key. I find that for programmers these keys, and in particular the left shift key, are used way too much. In my case, this overuse (paired with the common use of control key modifiers and the fact that my keyboard only has a control key on the left side) produced some numbness in my left pinky. This has since been eliminated via removing most of the "shifting" I do on a daily basis by using the Programmers Dvorak keyboard layout. In the process of doing this, however, I realized how annoying and disrupting it is to actually type a shift+key sequence in general. I realized that the annoyance of typing a colon before a symbol is one of the primary reasons that I tend to use the slightly problematic standard symbol notation in Loop or Iterate:
(loop for i below 10 by 2 collect i)
(iter (for i below 10 by 2) (collect i))
…rather than the more syntactically and stylistically pure keyword notation:
(loop :for i :below 10 :by 2 :collect i)
(iter (for i :below 10 :by 2) (collect i))
So, partly as an exercise in Emacs Lisp and partly just to scratch this personal itch, I decided to modify Emacs behavior in order to make colons very cheap to type. One option is to switch the semicolon and colon on your key map. This makes semicolons more expensive to type, which would be a pretty big loss for C coding where semicolons are much more common than colons. This might lead to the unfortunate situation where your key bindings are not the same between different modes (first layer colons in Lisp, first layer semicolons in C). This is a pretty messy solution.
What I really wanted was to have certain semicolons be converted to colons in certain situations. For instance, if I write a semicolon and then follow it with text (with no whitespace in between), it is very likely that I am trying to write a keyword, so I would like the preceding semicolon to be converted into a colon.
;keyword -> :keyword
But it is certainly the case that if I write a semicolon, then whitespace, then some text, I am trying to write a comment. In this case I want to leave the semicolon alone.
;; Some comment -> ;; Some comment
Naturally, since this is a little hack to save using the shift key, I would like these conversions to be transient, i.e. they only attempt to convert the semicolons if the very next character decides it. For instance, if I move the cursor to the front of some semicolons and start typing, those semicolons should be unaffected (the '_' marks the cursor):
_ ;; ;;_ ;;some text -> ;;some text
There were a few ways I could think about doing this, but the aim is to be unobtrusive. My solution was to rebind the semicolon key and have it read the characters and commands you give until you give one that isn't "type a semicolon", in which case it decides if it should convert the semicolons it typed or not. This is based very closely on the code in kmacro, namely the function kmacro-end-and-call-macro, which uses the same mechanism to temporarily bind a key (typically "e") to repeat the macro you just performed.
(defcustom *soft-semicolons-also-convert-on* '(9 tab)
  "This variable marks characters that will trigger semicolon
conversion in addition to the non-whitespace printable
character requirement.")

(defcustom *soft-semicolons-dont-convert-on* '(?\( ?\))
  "This variable allows you to exclude certain characters from
triggering conversion.")

(defun soft-semicolons (arg)
  "Type semicolons like normal except if they are immediately
followed by a non-whitespace character, in which case convert all
of the consecutive semicolons you were typing into colons."
  (interactive "p")
  (let ((keep-going t)
        (start-point (point)))
    ;; 59 is the character code for a semicolon
    (insert 59)
    (while keep-going
      (let ((event (read-event)))
        (cond ((equal 59 event)
               ;; A semicolon, insert and keep going
               (clear-this-command-keys t)
               (insert 59)
               (setq last-input-event nil))
              ((member event *soft-semicolons-dont-convert-on*)
               ;; For these special cases, don't do any conversion
               (setq keep-going nil))
              ;; A non-whitespace printable character or something in
              ;; *soft-semicolons-also-convert-on*
              ((or (member event *soft-semicolons-also-convert-on*)
                   ;; `characterp' rather than `integerp': events with
                   ;; modifier bits are integers but not valid indices
                   ;; into the `printable-chars' char-table
                   (and (characterp event)
                        ;; See if this is a printable character
                        (aref printable-chars event)
                        ;; Rule out whitespace characters (which might also be
                        ;; printable)
                        (not (member event '(9 10 13 32)))))
               ;; 58 is the character code for a colon
               (let ((length (- (point) start-point)))
                 (delete-region start-point (point))
                 (insert-char 58 length))
               ;; ...and exit...
               (setq keep-going nil))
              ;; Exit on anything else
              (t (setq keep-going nil)))))
    ;; Push any residual command back onto unread-command-events to be read and
    ;; processed
    (when last-input-event
      (clear-this-command-keys t)
      (setq unread-command-events (list last-input-event)))))
Ironically enough, Common Lisp is one of the only languages I can think of where this soft-semicolon thing interferes with standard syntax. Logical pathname namestrings use semicolons as the delimiter between directories. These directories are necessarily whitespace dependent, and this means that this little hack will make it very annoying to insert logical pathname namestrings. I have never used a logical pathname, I'm pretty sure I never intend to, so I guess this isn't a huge concern for me.
I threw the code up on Github in case you'd like it. This is one of my first forays into Emacs Lisp coding, so I am even more grateful than usual for any comments on how this is implemented or how it could be implemented in a better way.
Sunday, December 16, 2012
A Flat Dark Theme for Unity/Gnome 3
What About The Rest Of The World?
There is one issue with all this: while you might enjoy a dark theme, the rest of the world seems to have decided on bright backgrounds (something about thinking they look simple, or clean, or minimal). When you combine that with the fact that probably more than half your time is going to be spent in a browser window, you are going to be staring at and interacting with a lot of brightly themed interfaces anyway. But there is something we can do; we can extend our little hack for Firefox above to all web content and do it for Chrome as well using Stylish. Stylish is an extension for Firefox and Chrome/Chromium that allows users to fiddle with CSS styles (and perhaps more) on web pages that you visit. Think of it as a limited version of GreaseMonkey that is geared towards tweaking the visual style of pages. With Stylish installed you just click a button when you encounter a page that you don't like and find a style that fixes what bugs you. There are over 41,000 user submitted styles out there, but importantly they don't usually work forever (pages constantly change and the Stylish scripts will need to be tweaked). This means that the vast majority of these actually don't work completely. It is a process of trial and error, but it is easy to turn off malfunctioning style scripts. I guess I feel that they are worth the effort as they are exactly the best possible solution to this problem if they work, and sometimes they do. They allow you to do the following:
When all else fails
I find that it is good to have a quick and dirty method to save your eyes if you have a non-GTK app or your Stylish script is broken or things are just generally not working. For this, I use Compiz's "Negative" plugin which you can configure using Compiz Configuration Settings Manager or ccsm. This plugin allows you to invert the video on a per window basis at the press of a key chord. I have bound "Super-n" to this functionality. This is far from ideal. This inverts all video, which means that all of the colors are screwed up when we use it, which makes video and images look like crap. But it is a useful fall-back method.
Update: Firefox can really look the part if you install the "Dark Bright-Aero" theme.

