Sunday, December 31, 2006

IRC messages and 2 more Haskell functions

An IRC (Internet Relay Chat) message is composed of fields separated by spaces, except for an optional last argument, which may itself contain spaces. This last argument is marked by a field whose first character is a colon: from the colon up to the terminating CR-LF pair, every character belongs to that one argument. This is what lets chat messages contain spaces, of course. A message to the person with nickname lovecraft is sent (from the client to the server) like this:
"PRIVMSG lovecraft :hi there, man!\r\n"
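To make the framing concrete, here is a small F# sketch that builds such a line. The helper name privmsg is hypothetical, not part of any IRC library:

```fsharp
// Hypothetical helper: build a PRIVMSG line as a client would send it.
// Everything after " :" is the last argument, so it may contain spaces;
// the line is terminated by the CR-LF pair.
let privmsg (target: string) (text: string) : string =
    sprintf "PRIVMSG %s :%s\r\n" target text

let line = privmsg "lovecraft" "hi there, man!"
```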

A message may also include a sender field as its first field, which is indicated by the very first character of the message being a colon. When user cthulhu sends the PRIVMSG above to the server, the server relays it to the recipient, now indicating who sent it, like this:
":cthulhu PRIVMSG lovecraft :hi there, man!\r\n"
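A sketch of the relaying step, under the same assumptions: relay and hasSender are hypothetical helpers, and the sender field here is just the nickname, as in the example above (real servers typically send a fuller nick!user@host prefix).

```fsharp
// Hypothetical sketch of the server's relay: prepend the sender's
// nickname as a first field, marked by a leading colon.
let relay (sender: string) (line: string) : string =
    ":" + sender + " " + line

// A line carries a sender field exactly when its first character is ':'.
let hasSender (line: string) : bool =
    line.Length > 0 && line.[0] = ':'

let fromClient = "PRIVMSG lovecraft :hi there, man!\r\n"
let toRecipient = relay "cthulhu" fromClient
```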

So the task is to break an incoming IRC message into its constituent fields. There are two complications: 1) it's not possible to just split on spaces, because the last argument may contain them; and 2) both the first and the last field may begin with a colon, so it's not sufficient to stop splitting at the first field that starts with a colon.
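Both complications are easy to see with two naive splitters. The names naiveSplit and stopAtFirstColon are hypothetical, written only to show what goes wrong:

```fsharp
let msg = ":cthulhu PRIVMSG lovecraft :hi there, man!"

// Naive approach 1: split on every space. The last argument falls apart.
let naiveSplit (line: string) : string list =
    line.Split(' ') |> List.ofArray

// Naive approach 2: split on spaces, but treat the first field starting
// with ':' as the start of the last argument. This fires too early,
// because the sender field also starts with ':'.
let stopAtFirstColon (line: string) : string list =
    let rec go (ws: string list) =
        match ws with
        | [] -> []
        | w :: _ when w.Length > 0 && w.[0] = ':' -> [String.concat " " ws]
        | w :: rest -> w :: go rest
    go (naiveSplit line)

let fields1 = naiveSplit msg        // six fields
let fields2 = stopAtFirstColon msg  // one field: the whole line
```

The first returns six fields, breaking the chat text apart; the second returns the whole line as a single field.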

I first defined a recursive function that did most of the work after a call to String.split, but then thought about using higher-order functions instead. Two Haskell functions fit the job: takeWhile and dropWhile. They are like take and drop, but instead of a numeric index they use a predicate on the elements to decide where to stop taking (or dropping). So takeWhile p l returns the elements x at the front of l for which p x is true, stopping at the first element that fails, and dropWhile p l returns the rest of the list from that element on. Here is the code for these functions, in F#:

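    let rec takeWhile p l =
        match l with
        | [] -> []
        | x :: rl when p x -> x :: takeWhile p rl   // keep x, go on with the rest
        | _ -> []                                   // first failing element: stop

    let rec dropWhile p l =
        match l with
        | [] -> []
        | x :: rl when p x -> dropWhile p rl        // discard x, go on with the rest
        | _ -> l                                    // first failing element: keep it and what follows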
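A quick check of how the two functions behave, repeating them so the block runs by itself. Note that takeWhile stops at the first failing element rather than filtering the whole list. Newer F# versions (4.0 and later) ship the same functions as List.takeWhile and List.skipWhile, which the last definition compares against:

```fsharp
// The two functions from above, repeated so this block is self-contained.
let rec takeWhile p l =
    match l with
    | [] -> []
    | x :: rl when p x -> x :: takeWhile p rl
    | _ -> []

let rec dropWhile p l =
    match l with
    | [] -> []
    | x :: rl when p x -> dropWhile p rl
    | _ -> l

let isSmall x = x < 3
let xs = [1; 2; 3; 1; 2]

let taken = takeWhile isSmall xs     // [1; 2]: the later 1 is not taken
let dropped = dropWhile isSmall xs   // [3; 1; 2]

// Together they split the list at the first element where the test fails.
let rejoined = taken @ dropped

// Compare with the FSharp.Core versions (F# 4.0+).
let sameAsLibrary =
    taken = List.takeWhile isSmall xs && dropped = List.skipWhile isSmall xs
```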

Now it's easy to define the function splitLine, which splits an incoming IRC message into fields while keeping the last field intact. It assumes that the trailing CR-LF characters were already stripped from the message.

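    let splitLine (line: string) =
        // A field has "no colon" if it is empty (two spaces in a row) or doesn't start with ':'.
        let noColon (s: string) = s.Length = 0 || s.[0] <> ':'
        // Rejoin the trailing fields into one argument, or nothing if there are none.
        let concat l = match l with [] -> [] | _ -> [String.concat " " l]
        let words = line.Split(' ') |> List.ofArray
        match words with
        | [] -> []
        // The first field is never tested, so a ':' sender field doesn't end the split.
        | w :: wds -> w :: (takeWhile noColon wds @ concat (dropWhile noColon wds))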
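Haskell's Prelude also has span, where span p l gives the same pair as (takeWhile p l, dropWhile p l) in one pass over the list. Here is a sketch of splitLine built on a hand-written F# span; both span and splitLineSpan are hypothetical names, not from the original code:

```fsharp
// Hypothetical one-pass version of Haskell's span:
// span p l = (takeWhile p l, dropWhile p l), computed in a single walk.
let rec span p l =
    match l with
    | x :: rest when p x ->
        let (front, back) = span p rest
        (x :: front, back)
    | _ -> ([], l)

// splitLine rewritten on top of span (hypothetical name).
let splitLineSpan (line: string) : string list =
    // Empty fields (from repeated spaces) are treated as "no colon".
    let noColon (s: string) = s.Length = 0 || s.[0] <> ':'
    match line.Split(' ') |> List.ofArray with
    | [] -> []
    | w :: wds ->
        // The first field is never tested, so a ':' sender field is kept as-is.
        match span noColon wds with
        | (middle, []) -> w :: middle
        | (middle, trailing) -> w :: (middle @ [String.concat " " trailing])
```

The trade-off is small: the two-function version reads closer to the description in the text, while span avoids walking the front of the list twice.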

So, to show an example:

    > splitLine ":cthulhu PRIVMSG lovecraft :hi there, man!";;
    val it : string list = [":cthulhu";
                            "PRIVMSG";
                            "lovecraft";
                            ":hi there, man!"]
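The same function handles the other common shapes of a line. This sketch repeats the definitions (with an empty-field guard in noColon, and .NET's String.Split in place of String.split) so it runs on its own; the input lines are made-up examples:

```fsharp
let rec takeWhile p l =
    match l with
    | [] -> []
    | x :: rl when p x -> x :: takeWhile p rl
    | _ -> []

let rec dropWhile p l =
    match l with
    | [] -> []
    | x :: rl when p x -> dropWhile p rl
    | _ -> l

let splitLine (line: string) =
    let noColon (s: string) = s.Length = 0 || s.[0] <> ':'
    let concat l = match l with [] -> [] | _ -> [String.concat " " l]
    match line.Split(' ') |> List.ofArray with
    | [] -> []
    | w :: wds -> w :: (takeWhile noColon wds @ concat (dropWhile noColon wds))

// No sender field: the first field is the command itself.
let noSender = splitLine "PRIVMSG lovecraft :hi there, man!"
// No last argument with a colon: plain splitting on spaces.
let noTrailing = splitLine "JOIN #haskell"
// A one-word last argument still keeps its colon at this stage.
let ping = splitLine "PING :irc.example.net"
```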

At this stage it's necessary to keep the colon on the sender field (it signals that the message has one), but it could be removed from the last field. In any case, splitLine will be used by another function that builds a Message record and removes all the colons added by the protocol.
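A minimal sketch of what that next function could look like. The real Message record isn't shown here, so the type, its fields, and the names hasColon, stripColon and toMessage are all assumptions:

```fsharp
// Hypothetical record; the actual Message type may differ.
type Message =
    { Sender : string option   // nickname from the sender field, if any
      Command : string
      Args : string list }     // arguments, last one without its colon

// True when a field starts with ':'.
let hasColon (s: string) = s.Length > 0 && s.[0] = ':'

// Drop one leading ':' if present.
let stripColon (s: string) = if hasColon s then s.Substring(1) else s

// Build a Message from the fields returned by splitLine.
// Middle fields never start with ':' (splitLine stops there), so stripping
// every argument only affects the last one.
let toMessage (fields: string list) : Message option =
    match fields with
    | s :: cmd :: rest when hasColon s ->
        Some { Sender = Some (stripColon s); Command = cmd; Args = List.map stripColon rest }
    | s :: _ when hasColon s -> None   // sender field but no command: malformed
    | cmd :: rest ->
        Some { Sender = None; Command = cmd; Args = List.map stripColon rest }
    | [] -> None
```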
