Sunday, December 31, 2006

EOPL - 1.2.2 - Some Important Examples

The first 2 chapters of the book are quite basic, but in this section we get some interesting tidbits regarding static versus dynamic typing.

Beginning with two easy functions, remove and remove-first. The type definition for a list of symbols is

<list-of-symbols> ::= [] | <symbol> :: <list_of_symbols>

And the implementation of the functions are:

let rec remove_first s los =
match los with
[] -> []
| (smb :: los') when smb = s -> los'
| (smb :: los') -> smb :: (remove_first s los')

let rec remove s los =
match los with
[] -> []
| (smb :: los') when smb = s -> remove s los'
| (smb :: los') -> smb :: (remove s los')

Of course, the structural equality operator (=) here implies that these functions work not only with lists of symbols, but lists of any type. The type checker correctly infers the type 'a -> 'a list -> 'a list for both lists. In Scheme too it's possible to get generic functions using the structural equality function equal? -- and as it is the function is guaranteed to work with symbols, numbers and characters because it uses eqv? -- but the authors probably didn't want to talk about parametric polymorphism at this point in the discussion.

Now to two more functions, subst and notate-depth, that work on symbol expressions. A symbol expression is either a symbol or a list of symbol expressions, called an slist:

<s-list> ::= [] | <symbol-expression> :: <s-list>
<symbol-expression> ::= <symbol> | <s-list>

So now we can't directly define in F# functions that work either on a symbol or a list of similar data. We must define a new type that is a disjoint union of symbols and lists of symbol expressions:

type SymbolExp = Symbol of string | SList of SymbolExp list

And now define the function subst as in the book:

let rec subst n old slist =
match slist with
[] -> []
| (s :: sls) -> (subst_in_symbol_expression n old s) ::
(subst n old sls)
and subst_in_symbol_expression n old sexp =
match sexp with
Symbol s when s = old -> Symbol n
| Symbol s -> Symbol s
| SList sl -> SList (subst n old sl)

Note that subst receives plain strings as the new and old parameters, not a Symbol str where str is a string. Also, the list that subst receives is a SymbolExp list and not a SymbolExp that is a SList. The former fact is a difference in relation to the Scheme version (but could easily be changed) and the latter is consistent with the Scheme version; both may seem a bit confusing.

For notate-depth there's still another problem: the values returned by it are not really slists. While in Scheme this causes no problem, in F# we have to define a new type that includes the depth information with every symbol. So we get

type NotatedSymbolExp = NSymbol of string * int
| NSList of NotatedSymbolExp list

And finally, the function:

let rec notate_depth_in_slist slist n =
match slist with
[] -> []
| (s :: sls) -> (notate_depth_in_symbol_expression s n) ::
(notate_depth_in_slist sls n)
and notate_depth_in_symbol_expression sexp n =
match sexp with
Symbol s -> NSymbol (s, n)
| SList sl -> NSList (notate_depth_in_slist sl (n + 1))

let notate_depth slist = notate_depth_in_slist slist 0

The rest of the code are solutions to the exercises that require coding.

Exercise 1.11

let rec subst n o slist =
match slist with
[] -> []
| ((Symbol s)::sls) when s=old -> ((Symbol n)::subst n o sls)
| ((SList sl)::sls) -> ((SList (subst n o sl))::subst n o sls)
| (s::sls) -> s :: subst n o sls

Exercise 1.12

let rec subst_in_symbol_expression n old sexp =
match sexp with
Symbol s when s = old -> Symbol n
| Symbol s -> Symbol s
| SList sl -> SList (subst3 n old sl)
and subst n old slist = (subst_in_symbol_expression n old) slist

Exercise 1.13

let rec notate_depth_in_slist slist n = (notate_depth_in_symbol_expression n) slist
and notate_depth_in_symbol_expression n sexp =
match sexp with
Symbol s -> NSymbol (s, n)
| SList sl -> NSList (notate_depth_in_slist sl (n + 1))

EOPL - 1.2.1 - Deriving Programs from BNF Data Specifications

It's not necessary to have functions like list-of-numbers? in F#, because types are checked statically. An int list will always be a list of integers if the code compiles. The type checker takes care of that. So here are versions of nth-elt and list-length:

let rec nth_elt l n =
match (l, n) with
([], _) -> failwith ("List too short by " +
string_of_int (n + 1) + " elements")
| (x :: rl, 0) -> x
| (_ :: rl, _) -> nth_elt rl (n - 1)

let rec list_length l =
match l with
[] -> 0
| (_ :: rl) -> 1 + list_length rl

Just for fun, here is a non-recursive version of list-length, using a fold:

let list_length l = List.fold_left (fun n _ -> n + 1) 0 l

Unfortunately the F# compiler doesn't let us write this version in point-free style because of some trouble with the value restriction.

Exercises 1.5 and 1.6 make no sense in F#, once again because of the static type checker. Exercise 1.7 is quite easy, thanks to any_to_string:

let nth_elt l n =
let rec loop l' n' =
match (l', n') with
([], _) -> failwith ((any_to_string l) +
" does not have an element " +
string_of_int n)
| (x :: rl, 0) -> x
| (_ :: rl, _) -> loop rl (n' - 1) in
loop l n

IRC messages and 2 more Haskell functions

An IRC (Internet Relay Chat) message is composed of fields delimited by a space, up to an optional last argument which may include spaces. This last argument is delimited by a field whose first character is a colon; in this case, from the colon to the terminating CR-LF pair, every character is part of the same argument. This is done to send chat messages that may include spaces, of course. A message to the person with nickname lovecraft is sent (from the client to the server) like this:
"PRIVMSG lovecraft :hi there, man!\r\n"

Also, the message may include a sender field as its first field. This is indicated by the very first character of the message being a colon. When the above PRIVMSG is sent by a user cthulhu to the server, it redirects the message to the recipient, but now indicating who sent it; like this:
":cthulhu PRIVMSG lovecraft :hi there, man!\r\n"

So the task is to break an incoming IRC message into its constituent fields. There are two complications: 1) it's not possible to just split using spaces as separators, because of the last argument and 2) the first and last argument both may begin with a colon, and so it's not sufficient to just stop splitting at the first sight of a colon in the first character of a field.

I first defined a recursive function that did most of the work after a call to String.split, but then thought about using higher-order functions. Two Haskell functions could be used, called takeWhile and dropWhile. They are like take and drop, but instead of using a numeric index to decide where to stop taking (or dropping) elements to (or from) the list, it uses a predicate testing over the elements. So, takeWhile p l will return a list taking elements x from l while p x is true. Here is the code for these functions:

let rec takeWhile p l =
match l with
[] -> []
| (x :: rl) when p x -> x :: (takeWhile p rl)
| _ -> []

let rec dropWhile p l =
match l with
[] -> []
| (x :: rl) when p x -> dropWhile p rl
| _ -> l

Now it's easy to define the function splitLine that splits an incoming IRC message into fields, keeping the last field intact. It assumes that the trailing CR-LF characters where already stripped from the message.

let splitLine line =
let noColon s = s.[0] <> ':' in
let concat l = match l with [] -> [] | _ -> [String.concat " " l] in
let words = line |> String.split [' '] in
match words with
[] -> []
| (w :: wds) -> w :: ((takeWhile noColon wds) @
(concat (dropWhile noColon wds)))

So, to show an example:

> splitLine
":cthulhu PRIVMSG lovecraft :hi there, man!\r\n";;
val it : string list = [":cthulhu";
":hi there, man!"]

It's necessary, at this stage, to keep the colon in the sender field (to signal that the message has a sender field), but it could be removed from the last field. Anyway, splitLine will be used by another function that builds a Message record, removing all the colons that are added by the protocol.

Wednesday, December 27, 2006

Good news

It seems will be added to the F# library in its next release. Also, there were quite good responses to the points I raised about library functions in the F# mailing list. Don Syme noted that zip is already there, and it's called List.combine. I hadn't noticed this.

Also, Robert Pickering, author of the upcoming APress book Foundations of F#, responded on his blog with a short version of the ROT13 algorithm, using only functions already present in the current library:

It basically uses List.mapi instead of my combo of zip, drop and take. List.mapi maps a function over a list, but this function gets as arguments the list element and its index. And mapping over a string is implemented by converting the string to a char array, using over it and then creating a new string based on the result.

Monday, December 25, 2006

Still more ROT13

This is the last post on ROT13, I promise. The shortest version in F# is a low-level one, working with character codes. This is the most obvious solution to C programmers, and I considered it from the start, but I feel it loses something from the other versions. We have to "break" the abstraction of characters and think about their representation, and this lowers the level we are working. Here it is:


let strmap f (s : string) =
let sb = new System.Text.StringBuilder(s)
let rec aux i =
if i = sb.Length then () else
(sb.Chars(i) <- f (sb.Chars(i)) ; aux (i + 1))
( aux 0; sb.ToString() )

// still another one
let rot13 s =
(fun c -> Char.chr ((((Char.code c) - 97 + 13) mod 26) + 97))

I should probably substitute Char.code 'a' for 97 there, and this would work as long as the encoding guaranteed that letters a to z were assigned contiguous codes, which I believe is valid in most or all current text encodings. Still, we have to think about text encodings and character representations. And we still need a function to map over strings.

Sunday, December 24, 2006

ROT13 in F#, revisited

So after seeing the Haskell version I thought, lookup is actually assoc, in F# included in the module List. So how about trying a F# version similar to the Haskell one? Well, the problem is, as I mentioned, that F# doesn't have zip, take and drop, and there is no way to map over a string. So we have to define all these functions, and then define rot13 as in the Haskell version. I defined zipWith from the Haskell prelude as well, and then used it to define zip (as it is in Haskell).


let strmap f (s : string) =
let sb = new System.Text.StringBuilder(s)
let rec aux i =
if i = sb.Length then () else
(sb.Chars(i) <- f (sb.Chars(i)) ; aux (i + 1))
( aux 0; sb.ToString() )

let rec drop n l =
match (l, n) with
([], _) -> []
| (l, 0) -> l
| (_ :: rl, n) -> drop (n - 1) rl

let rec take n l =
match (l, n) with
([], _) -> []
| (l, 0) -> []
| (x :: rl, n) -> x :: (take (n - 1) rl)

let rec zipWith f l1 l2 =
match (l1, l2) with
([], _) -> []
| (_, []) -> []
| (x1 :: rl1, x2 :: rl2) -> (f x1 x2) :: (zipWith f rl1 rl2)

let zip l1 l2 =
zipWith (fun a b -> (a, b)) l1 l2

// and now for something completely different
let rot13 s =
let letters = ['a' .. 'z']
let transp = zip letters ((drop 13 letters) @ (take 13 letters))
let rotchar c = List.assoc c transp
strmap rotchar s

Works as expected, but we ended with code that is still longer than the first version, only because it was necessary to define some general-use functions. I guess I will create a Prelude module to use with F#, containing the Haskell functions I use most.

EOPL - 1.2 - Recursively Specified Programs

The first program is easy enough:

let rec e n x =
if n = 0 then 1 else x * (e (n-1) x)

But the second one uses the definition of bintrees from section 1.1. I should have noted there that some differences are caused because of static vs. dynamic typing. Scheme is "dynamically typed" (that's the usual term, anyway -- maybe it should be called "dynamically checked"), so it can define a bintree data type that may be a number or a list. Parts of the program that use this type must check if it received a number or a list, and act accordingly. But F# is statically typed, so we can't do it like this, because a number is a member of a type and a list is from another one. The way to solve this in F# is by creating a disjoint union data type that includes numbers and lists; well, as we know the second possibility has only 3 components, we'll do this as a tuple. So we get

type Bintree = Leaf of int | Node of string * Bintree * Bintree

With this definition, and some pattern matching, it's easy to write count-nodes:

let rec count_nodes s =
match s with
Leaf _ -> 1
| Node (_, bt1, bt2) -> (count_nodes bt1) + (count_nodes bt2)

The program still follows the same structure as the proof given earlier for the property.

It's interesting to note that we have to define a type, where in Scheme it wouldn't need any definition. Of course, the Scheme program is harder to understand because of the use of implicit list structure (cadr and caddr are used to decompose the list). But in F# we have to define a representation in terms of a tagged union type, and to hide the representation we would have to define an interface with an opaque type. A bit more of work but we get static checking and better runtime performance.

ROT13 in Haskell

To drive home the point of F#/OCaml lacking some predefined functions that make life easier, here is a version of the same ROT13 encoding algorithm, now in Haskell. Of course the whole thing is easier not only because of functions like zip, take and drop, but also because strings in Haskell are actually lists of characters. Anyway, here it is. Compare with the F# version.

rot13 s = map rotchar s
where rotchar c = maybe '#' id (lookup c transp)
transp = zip letters ((drop 13 letters) ++ (take 13 letters))
letters = ['a' .. 'z']

And now for the use case:

*Rot13> rot13 "feijoada"

ROT13 in F#

So, this is nothing really exciting, I wrote a simple program to do ROT13 text encoding, and in the way bumped into some minor quirks of the standard library. One of the problems with OCaml is its idiosyncratic standard library which is missing a lot of simple and useful functions. Granted, most of them you can write in less than five minutes, but it would be better if they were already done and standardized. I guess I became spoiled by Haskell. Anyway, F# seems to have inherited this as well. Below you can see the code, and a sample use case.


let letters = Array.of_list(['a' .. 'z'] @ ['a' .. 'z'])

let index a c =
let rec aux i = if a.(i) = c then i else aux (i + 1) in
aux 0

let strmap f (s : string) =
let sb = new System.Text.StringBuilder(s) in
let rec aux i =
if i = sb.Length then () else
(sb.Chars(i) <- f (sb.Chars(i)) ; aux (i + 1)) in
( aux 0; sb.ToString() )

let rotchar c =
let i = index letters c in
letters.(i + 13)

let rot13 s =
strmap rotchar s

And now for the use case:

> rot13 "feijoada";;
val it : string = "srvwbnqn"

Saturday, December 23, 2006

EOPL - 1.1 - Recursive Sets of Data

The ML languages, unlike Scheme, don't have a "cons pair" type. For pairs one can use tuples, and for proper lists there are lists. This means it is not possible to build an improper list in ML. So it is with F#, and this implies a simple change to the definition of lists of numbers (Definition 1.1.2, page 2).

Definition 1.1.2 The set list-of-numbers is the smallest set of values satisfying the two properties:
  1. The empty list is a list-of-numbers, and
  2. If l is a list-of-numbers and n is a number, then the cons expression (n :: l) is a list-of-numbers
Another point of difference is that F# lacks a native symbol type. Regarding the examples in the book, it seems we can safely use strings in place of symbols, although I don't know if later chapters make use of the fact that the symbols may have associated values. Regarding types, we can use a simple type synonym:

type Symbol = string

Or a tagged type:

type Symbol = Symbol of string

The latter will be more useful when defining other types of expressions, like the examples for the lambda-calculus. In this case there will be symbols, applications and abstractions, and not only a single variant.

With this in mind, it's easy to define a list-of-symbols type. At this point in the book, though, there's not much attention to representation of the data types, as the goal is to become familiar with recursive definitions. It uses the lisp pair notation, however, and lisp symbols, and these dependencies would be thus represented in F#. The same would hold for Standard ML or OCaml, of course.

Friday, December 22, 2006

ML type systems and Hindley-Milner type inference

Two texts I've come across in the last few days dealing with type systems for ML-like languages and Hindley-Milner type inference.

First, from a rather simple question about principal typings and principal types in the Caml list came a link to the paper ML Has Principal Typings by Carlos Camarão and Lucília Figueiredo; it shows a type system for ML that has principal typings, whereas the usual Damas-Milner type system found in ML languages has not.

Then, LtU links today to A modern eye on ML type inference, some seemingly interesting lecture notes that explain Hindley-Milner type inference in current terms, including extensions. I didn't read it yet, but will do.

Just beginning

This will be used to collect some code, mostly snippets I'm writing from time to time, often as exercices in books. One of the ideas is to translate code in some interesting books to other languages, like the SICP in other languages stuff. The language I'll be tackling will be, mostly, F#, but I guess some Haskell will appear too. Initially the idea is to work through the book Essentials of Programming Languages, but there are other candidates.

And maybe something else beyond book translations, I don't know. I'll try to comment on the code where I think it's necessary.