That feature might take a while

Alice: my editor who just shipped out to afghanistan (sic) for six months said to me before she left, 'when i come back, i expect you to have a life'

me: lol! well. you have 6 months to work on it

me: and since we know that everything takes twice as long as the time allocated for it, that means it should only take a year to implement this feature, right?

 

I shipped my second patch for hsopenid last weekend. Elliot Trevor had his bag of code cleanliness tricks going. I'm starting to feel really good about all this.

A Categorical Treatment of Romance

Or, why Information Theory is more touchy-feely than classical E&M.

You folks are familiar with the concept of a duality, yes? Then what is the usual loser scenario for guys? You dote on some girl with all this attention but then she runs away with some motorcycle dude, right? And to add insult to injury, she complains about him to you afterwards.

What would be the dual of this situation? Just reverse the arrows, yes? I don't want you to get to an answer without thinking about it, so take some time to muse a little, then highlight the text after this sentence. You're a girl. No one calls you, not even that motorcycle dude you gave it up for. "I hope he dies in an accident. Asshole." ;-)

Now there is a catch here, because this seems to have violated conservation of attention. It's like integrating the magnetic flux over a closed continuous surface, yknow? Should sum to zero.* Alas, the physically correct description of the world is totally useless to us. We have a glimmer of hope, however. See, if you step back and think about it,... If you give unconditional attention, is it still attention? ;-) Remember, it takes only one bit to encode unconditional attention.** One. Bit. :-) Admittedly, this reconciliation requires Information Theory, but what matters is that the math works!! :-D ***

I wonder how many analogies I can butcher with dualities, and which supplementary theories I will need to grease the axe uh I mean logic...

[*] Maxwell's equation for magnetic flux works particularly well here when you imagine how the positive and negative nature of the interaction would play out. ;-) And the fact that it sums to zero. Face it, kid. There is no magnetic monopole.

[**] Kindly consider that girls may find one-bit guys to be cute, but ultimately a little annoying. And that giving it up to those cool boys riding motorcycles can also be encoded in one bit. Happy romancing, and try to not say anything stupid! :-)

[***] The solution to maths is… MOAR MATHS!

 

The Information Density of Code

While bitching about make with a friend...

This is rather nonintuitive at first glance, but how clean a piece of code looks has relatively little to do with how much the code does, or what it does.

What makes code easy to read is its information density. This is as much a cognitive psych measure as an engineering matter. All that Python and Haskell code where the whitespace is part of the syntax? A lot of people hate it, but the adoption of that practice is gaining traction. I strongly suspect that this is because it is easier for your eyes to perceive where one routine ends and where another begins.

There is a Goldilocks situation with regard to information density. Just like most people can optimally remember about 5-7 things at a time, we can optimally process information at a narrow range of rates, so the code should be neither too dense nor too sparse. If it is too dense, it's tricky. If it's too sparse, then you have to scroll through too much of it to get the idea of what it does, such that you may actually need your memory buffer in order to process it.

So how might we make some recommendations based on this and other tenets of Information Design?

Corner cases and nontrivial wrappers need to be kept track of, so they tally up into the 5-7 things you can keep in your memory buffer. No function or routine should have more than a few of these.

People perceive similar looking items to be similar in function, regardless of whether they are or not, so...
A. If you have a bunch of routines that are similar, they should be presented in list format.
B. Anytime you have something that is different, that breaks the pattern, you should separate it away with whitespace.
C. On the other hand, if all the routines are markedly different in function, it should also be ok to present them in listlike imperative style, so long as there aren't more than 5 or 6 of them at a time.
D. Those cascading nested Nothing/Just matches? Please don't. That should really be aligned imperative style, even if it means more lines will be necessary. The same reasoning goes for nested If/Else, but this is where Haskell's monads really shine through. If you are doing imperative work, it's much cleaner to write a wrapper, Bool -> m (), that escapes to the appropriate exception, and such a wrapper would be uneconomical in most other languages.
E. This applies particularly to all those pieces of code where you have some pure stuff being wrapped in something monadic, and the pure stuff could've stood on its own. Changes in purpose count towards information density, so in those cases where there actually is a separation between pure and monadic, it should be seen visually in the whitespace.
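To make point D concrete, here is a minimal sketch of what I mean. The names (ageNested, ageAligned, ensure, checkAge) are made up for illustration, and the wrapper is specialised to Either String rather than to any particular exception monad.

```haskell
import Control.Monad (unless)
import qualified Data.Map as M

-- Cascading style: every Nothing branch repeats itself and the logic drifts right.
ageNested :: M.Map Int String -> M.Map String Int -> Int -> Maybe Int
ageNested names ages uid =
  case M.lookup uid names of
    Nothing   -> Nothing
    Just name ->
      case M.lookup name ages of
        Nothing  -> Nothing
        Just age -> Just age

-- Aligned imperative style: the Maybe monad threads the Nothing case for us.
ageAligned :: M.Map Int String -> M.Map String Int -> Int -> Maybe Int
ageAligned names ages uid = do
  name <- M.lookup uid names
  M.lookup name ages

-- A Bool -> m () wrapper (hypothetical name), here with m = Either String.
-- It escapes with the given message when the condition does not hold.
ensure :: String -> Bool -> Either String ()
ensure msg ok = unless ok (Left msg)

-- The checks line up one per row instead of nesting If/Else.
checkAge :: Int -> Either String Int
checkAge n = do
  ensure "age must not be negative" (n >= 0)
  ensure "age must be below 150"    (n < 150)
  return n
```

Both lookups return the same answers; the second one just keeps the happy path in a straight line.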

Anytime you have long lists of anything, please dump them elsewhere in their own file. People shouldn't have to switch gears within a single file. The information density of a single file should be relatively uniform, because information density is a meta variable.

More is less sometimes. Thanks to arrows, functors and monads, you can do all sorts of stuff with (.), <$>, second and return? Just because you can doesn't mean you shouldn't label it with intermediate terms. This applies to partial function applications especially. When you label it, people only have to remember the label, not the detail of the implementation. All that glue can obscure what something does.
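A small made-up example of the difference, assuming a list of (name, score) pairs. Both functions are hypothetical and compute the same thing; the second one only adds labels.

```haskell
import Data.Char (toUpper)
import Data.List (sortBy)
import Data.Ord (comparing)

-- All glue: correct, but the reader has to unpick the pipeline to see the intent.
topNamesDense :: [(String, Int)] -> [String]
topNamesDense = map (map toUpper . fst) . take 3 . sortBy (flip (comparing snd))

-- Same logic with intermediate labels: the reader only has to remember the names.
topNamesLabelled :: [(String, Int)] -> [String]
topNamesLabelled scores = map shout winners
  where
    byScoreDescending = sortBy (flip (comparing snd))
    winners           = take 3 (byScoreDescending scores)
    shout             = map toUpper . fst
```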

Long lines that smack or exceed the 80-column limit are not terribly bad if they are either the function definition/args or the "return statement" (if you are using let..in syntax). Long lines, where they exist, should delimit the boundaries of a function, unless you are using where syntax, in which case they should just delimit the start of the function. Avoid multiple, consecutive long lines, because one long line is already visually attention grabbing. You can re-think the 80-column rule under these considerations.

That's all I can think of for now, but if you remember form follows function, then you should understand that what your code looks like to a non-programmer should reflect what it actually means, structurally, to you.

An import trick too useful to pass up

So here is a trick I learned from reading the Snap Framework code over the last week.

Namespace collisions suck. Data.Map, Data.Set and Data.List all have fairly similar functions that we all know and love to use, and they differ subtly, so people often import them qualified, i.e.

import qualified Data.Map as M
import qualified Data.Set as S
import qualified Data.List as L

Now the annoying thing about this is that you then have to qualify the types in your signatures too, e.g.

foobar :: S.Set a -> a -> S.Set a

This is pissy, so what some genius who worked on Snap did was:

import Data.Map (Map)
import qualified Data.Map as M
import Data.Either (Either(Left, Right))

Now this sounds simple and all, but it actually works Much Better in practice than in theory, partly because type names like Map and data constructors like Left and Right rarely collide from module to module.
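Here is the trick applied to the foobar signature from earlier, as a small sketch. The function bodies and the tally helper are my own filler, and note the Ord constraint that Data.Set and Data.Map need.

```haskell
import Data.Set (Set)
import qualified Data.Set as S
import Data.Map (Map)
import qualified Data.Map as M

-- The type name is unqualified; the functions stay qualified.
foobar :: Ord a => Set a -> a -> Set a
foobar s x = S.insert x s

-- Hypothetical helper: count how often each element occurs.
tally :: Ord a => [a] -> Map a Int
tally = M.fromListWith (+) . map (\x -> (x, 1))
```

The signatures read like plain English again, while S.insert and M.fromListWith still say which module they come from.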

How *do* you tell good code from bad anyway?

So here's one straightforward metric for code cleanliness. Is there code that could've been purely functional that can't easily be extracted from the monadic wrappers you placed it in? I guess a similar thing for OCaml would've applied to the OO and functional code.

This is probably not as easy as it seems though, because there's plenty of stuff that would've looked like it could've been pure but is actually much better off effectful. Sometimes there's no substitute for experience.
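A toy illustration of the metric, with made-up function names. In the first version the arithmetic can only be reached by running the IO action; in the second it stands on its own and the IO part is just the printing.

```haskell
-- Tangled: the pure computation is buried inside the effectful routine.
reportTangled :: [Int] -> IO ()
reportTangled xs = do
  let total = sum xs
      mean  = fromIntegral total / fromIntegral (length xs) :: Double
  putStrLn ("total " ++ show total ++ ", mean " ++ show mean)

-- Extracted: the pure part can be tested and reused without touching IO.
summarize :: [Int] -> (Int, Double)
summarize xs = (total, mean)
  where
    total = sum xs
    mean  = fromIntegral total / fromIntegral (length xs)

-- The effectful wrapper now only formats and prints.
report :: [Int] -> IO ()
report xs = putStrLn ("total " ++ show total ++ ", mean " ++ show mean)
  where
    (total, mean) = summarize xs
```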

Easy type-level programming hack to make DB calls type-safe

A few months ago, there was a useful tutorial on using CouchDB with Haskell. You can find the original here.

One weakness of these DB layers is that you have to verify your data, and the APIs to the input data are usually not typesafe. As it turns out, it's incredibly easy to use type-level programming to make your DB calls typesafe.

Let's take an idealized version of a typical DB get call:

getDBUnsafe :: (JSON a) => DB -> String -> IO a
getDBUnsafe = undefined

There are two weak spots here. The first is the part where we pass in the DB key, and the second is when we use the value. The latter is more insidious than the former, because whatever it is you're using to parse your JSON, it's probably a pure function, so if you're coding in the usual expedient way, you'll have no idea why the parse is failing, when the real cause is that you're fetching from the wrong DB.

So here's the framework for a solution that uses FunctionalDependencies and MultiParamTypeClasses to impose a constraint on the type of the DB key and stored value, based on the type of the DB.

-- this makes k and v uniquely determined by a
class (JSON v) => DBTy a k v | a -> k v where
   getDBName :: a -> String
   getKey :: a -> k -> String

getDB :: (DBTy a k v) => a -> k -> IO v
getDB db k =
  getDBUnsafe (getDBName db) (getKey db k)

So how do we use this? Well, you just define a dummy type for a DB like so:

type UserId = Int

data Avatar
   = Avatar ByteString
   deriving (Eq, Show, Ord, Typeable, Data)

instance JSON Avatar where
   showJSON = toJSON
   readJSON = fromJSON

-- a dummy type with a single value; only its type matters
data AvatarsDB = AvatarsDB

instance DBTy AvatarsDB UserId Avatar where
  getDBName _ = "avatars"
  getKey _ = show

avatarsDB :: AvatarsDB
avatarsDB = AvatarsDB

 

 

After that, you just replace your unsafe DB calls that look like this:

v <- getDBUnsafe "avatars" uid

with this:

-- because the type of avatarsDB belongs to DBTy AvatarsDB UserId Avatar,
-- the type of "uid" here will be inferred to be UserId, and
-- the return value "v" will be inferred to be Avatar
v <- getDB AvatarsDB uid

 

Oh and here's the stuff you need to paste at the top of all this to get it to compile.

{-# LANGUAGE FunctionalDependencies, MultiParamTypeClasses, DeriveDataTypeable, EmptyDataDecls, TypeSynonymInstances #-}
module FundepsExample where

import Data.ByteString.Lazy (ByteString)

import Text.JSON
import Text.JSON.Generic

type DB = String

 

I know it's a pretty silly example and use case, but I was surprised that there were folks in my local Haskell meetup who hadn't seen it, so I thought I should share it. It's saved me no end of errors ever since I put it to use.
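If you want to poke at the idea without CouchDB or the json package, here is a cut-down sketch that should run on its own. Everything about the storage side is an assumption for the sake of the demo: FakeStore is a hypothetical in-memory Map standing in for the DB, Read stands in for JSON, and the results come back in Maybe instead of IO. Only the fundep trick itself is the same.

```haskell
{-# LANGUAGE FunctionalDependencies, MultiParamTypeClasses, TypeSynonymInstances #-}

import qualified Data.Map as M
import Text.Read (readMaybe)

-- Hypothetical stand-in for a real DB: (database name, key) -> serialized value.
type FakeStore = M.Map (String, String) String

-- As before, k and v are uniquely determined by a.
class (Read v) => DBTy a k v | a -> k v where
  getDBName :: a -> String
  getKey    :: a -> k -> String

-- The unsafe call: nothing ties the DB name to the key or result type.
getDBUnsafe :: Read v => FakeStore -> String -> String -> Maybe v
getDBUnsafe store dbName key = M.lookup (dbName, key) store >>= readMaybe

-- The safe call: the DB's type pins down both the key and the value type.
getDB :: DBTy a k v => FakeStore -> a -> k -> Maybe v
getDB store db k = getDBUnsafe store (getDBName db) (getKey db k)

type UserId = Int

newtype Avatar = Avatar String
  deriving (Eq, Show, Read)

data AvatarsDB = AvatarsDB

instance DBTy AvatarsDB UserId Avatar where
  getDBName _ = "avatars"
  getKey _    = show

sampleStore :: FakeStore
sampleStore = M.fromList [(("avatars", "42"), "Avatar \"cat.png\"")]
```

With this in place, getDB sampleStore AvatarsDB 42 needs no type annotations: the key is inferred to be a UserId and the result an Avatar, and passing a String key is a compile error rather than a silent miss.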

Now in practice, I've found it more useful to create a KeyStringTy class instead of using that getKey bit. As it so happens, the sort of stuff I typically use to index my bit bucket database are also the sort of stuff that I use in RPC calls from the web. It's probably less correct, but it sure was dandier to code. Watch it bite me in the ass someday.
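Roughly, the KeyStringTy variant could look like the sketch below. This is a simplified guess at the shape rather than production code: the value type is dropped to keep it short, and Username, ProfilesDB and docPath are hypothetical names.

```haskell
{-# LANGUAGE FunctionalDependencies, MultiParamTypeClasses #-}

-- Anything that can be rendered as a DB key (or an RPC parameter).
class KeyStringTy k where
  toKeyString :: k -> String

instance KeyStringTy Int where
  toKeyString = show

newtype Username = Username String

instance KeyStringTy Username where
  toKeyString (Username s) = s

-- The DB class no longer carries getKey; it leans on KeyStringTy instead.
class KeyStringTy k => DBTy a k | a -> k where
  getDBName :: a -> String

-- Hypothetical helper: build the document path for a key in a given DB.
docPath :: DBTy a k => a -> k -> String
docPath db k = getDBName db ++ "/" ++ toKeyString k

data AvatarsDB = AvatarsDB
instance DBTy AvatarsDB Int where
  getDBName _ = "avatars"

data ProfilesDB = ProfilesDB
instance DBTy ProfilesDB Username where
  getDBName _ = "profiles"
```

The tradeoff is that every DB keyed by the same type now renders its keys the same way, which is the "less correct" part.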

It's also really dandy to create a State monad that stores the docrevs of the get calls, caches the fetched values, and gives you an API that's similar to Data.Map's API. Specifically, I'm thinking of Data.Map's alter function, but that's less of a howto and more of a library.

Mike Rowe Celebrates Dirty Jobs

 
I couldn't agree with this guy more. In tech, the cultural dynamic has led to more than a few awkward conversations about how programming doesn't interest you anymore. Then there's some equally silly remark on how one could remedy the problem with -insert-deus-ex-machina-here-. Or an even more awkward defensive remark about how X is still interesting, the other guy just hasn't seen aspect A of X.
 
It's nearly impossible to talk about things like work ethic and the psychology of motivation with such a dynamic. People tend to fall into one of three camps: that they do work simply because it has to be done (i.e. it's a responsibility, so shut up), that they do their work because of the money it earns them, or that they'll only do those things that pull at their heartstrings. You don't get very far when a conversation usually devolves into some sort of argument over philosophies or personality traits.
 
Perhaps that's why Self-Determination Theory works so well: it bypasses the problem entirely. In the context of a conversation however, I wonder whether the smart thing to do is to keep quiet because there does not seem to be a right answer. I personally came from the third camp, but doing a startup teaches you that passion alone can't carry you the whole way (at least if it was a certain problem or passion that sent you down that path).
 
I wonder how many other creative industries suffer from this problem, and in what way, because I'm sure that the problem manifests itself differently depending on the field.
 

Haskell & STM, Why no Applicative?

There was a fellow on #haskell the other day who was apprehensive about learning STM. Should he learn it after learning category theory? We assured him that Haskell's STM was fairly simple, whereas category theory is a dense liberal art that you study to enrich your mind. Someone pointed him to the wiki and he went on his way.

Afterwards, I perused the wiki (again). What struck me was that there was no example there that shows you how to convert a plain old bunch of IO routines into STM routines. When I went to make one myself, I realized how brain-dead easy it is, but that there's something missing...

Anyway, here's a chalked up example where someone has to perform multiple time-consuming tasks (preferably in parallel), with the interesting routine marked by a comment:


import Control.Applicative
import Control.Monad
import Control.Concurrent
import Control.Concurrent.STM
import Control.Concurrent.STM.TMVar
import Data.DateTime
 
main :: IO ()
main =
    do putStrLn "Without STM"
       t1s <- getCurrentTime
       stuff <- withoutSTM
       t1e <- getCurrentTime
       putStrLn $ show stuff
       putStrLn $ "That took " ++ show (diffSeconds t1e t1s) ++ " seconds"
 

-- this is the routine that actually does stuff
withoutSTM :: IO GroceryStore
withoutSTM =
    do a <- getTomatoesCountFromDB
       b <- haveFreshBerries
       c <- getNameOfCurrentStore
       return $ GroceryStore a b c
 
getTomatoesCountFromDB :: IO Int
getTomatoesCountFromDB =
    do milliSleep 1000  -- simulate slow DB read
       return 5

haveFreshBerries :: IO Bool
haveFreshBerries =
    do milliSleep 1000
       return True

getNameOfCurrentStore :: IO String
getNameOfCurrentStore =
    do milliSleep 1000
       return "Tom's Produce"

data GroceryStore
    = GroceryStore
      { numTomatoes :: Int
      , freshBerriesInStock :: Bool
      , nameOfStore :: String
      }
    deriving (Eq, Show, Ord)
 
-- helpers
milliSleep = threadDelay . (*) 1000
 
 
 
So there you have the plain old imperative code.
 
And here is the same code modified to use STM.
 
 
-- getTomatoesCountFromDB, haveFreshBerries, getNameOfCurrentStore are unchanged from before

-- this is the modified routine
withSTM :: IO GroceryStore
withSTM =
    do a <- stmFork getTomatoesCountFromDB
       b <- stmFork haveFreshBerries
       c <- stmFork getNameOfCurrentStore

       GroceryStore <$> (stmWait a)
                    <*> (stmWait b)
                    <*> (stmWait c)

main :: IO ()
main =
    do putStrLn "With STM"
       t2s <- getCurrentTime
       stuffWithSTM <- withSTM
       t2e <- getCurrentTime
       putStrLn $ show stuffWithSTM
       putStrLn $ "That took " ++ show (diffSeconds t2e t2s) ++ " seconds"
       return ()

-- more helpers
stmWait = atomically

stmFork :: IO a -> IO (STM a)
stmFork m =
    do tmv <- newEmptyTMVarIO
       let m' = m >>= (atomically . putTMVar tmv)
       forkIO m'
       return $ readTMVar tmv
 
 
 
Now the folks who've been playing with this for a while won't find this particularly remarkable, but I wouldn't have used it a year ago when I was first learning Haskell, so I thought someone should make note of it. There are also two important things to take away from this.
 
1. You can only do this if imperative routines are treated as values. The pons asinorum that Eric S. Raymond was referring to is not without its benefits.
 
2. Would an applicative instance for STM be nicer? At the least, we might be able to do code like this instead.
 
       stmWait $ GroceryStore <$> a <*> b <*> c
 

What would be wrong with the naive implementation of this?  e.g.:
 
(<*>) :: (Functor m, Monad m) => m (a -> b) -> m a -> m b
(<*>) mf ma =
    do f <- mf   -- run the function's effects first, as Control.Monad.ap does
       a <- ma
       return $ f a
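For what it's worth, on a current GHC Applicative is a superclass of Monad, so STM already comes with the instance and the one-liner above compiles as written. Here is a minimal sketch to try it out. The slowly helper and the 10 ms delays are my own stand-ins for the DB calls, and the record fields are dropped for brevity.

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.STM (STM, atomically, newEmptyTMVarIO, putTMVar, readTMVar)

data GroceryStore = GroceryStore Int Bool String
  deriving (Eq, Show)

-- Same helper as before: run the action in its own thread and hand back
-- an STM action that blocks until the result has been written.
stmFork :: IO a -> IO (STM a)
stmFork m = do
  tmv <- newEmptyTMVarIO
  _ <- forkIO (m >>= atomically . putTMVar tmv)
  return (readTMVar tmv)

-- Hypothetical stand-in for a slow DB read: wait 10 ms, then return the value.
slowly :: a -> IO a
slowly x = threadDelay 10000 >> return x

-- One transaction that waits for all three results via the Applicative instance.
withApplicativeSTM :: IO GroceryStore
withApplicativeSTM = do
  a <- stmFork (slowly 5)
  b <- stmFork (slowly True)
  c <- stmFork (slowly "Tom's Produce")
  atomically (GroceryStore <$> a <*> b <*> c)
```

The difference from the earlier withSTM is that there is a single atomically, so the whole record is read in one transaction that retries until all three TMVars are full.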
 
 
Again, I'm probably way out of my depth here, so I apologize profusely if this is asinine.