Deriving magic and parsing csv

Parsing CSV with "NaN" values

Another question popped up today on #haskell-beginners. The use case is pretty simple: there is a CSV file containing the bytestring NaN, and you want to read these fields as Double. The first thing is to actually aim for Maybe Double: NaN is evil, and it's better to be upfront about an invalid Double.

Let's first see how to solve this problem, and then explore a couple of options to get GHC to "write" code for us. This is meant as a beginner/intermediate tutorial.

Straightforward solution

Csv and Types

Here's the CSV we're going to be concerned with in this post. I omitted the header to simplify the parsing. The first field is a tag, and the second field is a number.

coucou,42
boom,NaN

Here's the record we want to map each row to:

The expected result is to get Nothing when NaN is encountered.
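
As a minimal sketch, such a record and the values we hope to get back from the two rows above could look like this. The names Foo, tag, number and expectedRows are placeholders chosen for illustration, and using Text for the tag is an assumption:

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)

-- Hypothetical record for one CSV row.
data Foo = Foo
  { tag    :: Text          -- first field, e.g. "coucou"
  , number :: Maybe Double  -- second field, Nothing when the CSV says NaN
  } deriving (Show, Eq)

-- What we would like to obtain from the two sample rows.
expectedRows :: [Foo]
expectedRows =
  [ Foo "coucou" (Just 42)
  , Foo "boom" Nothing
  ]
```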

Parsing

cassava is the de facto standard library for parsing CSV. It provides two typeclasses: FromRecord to parse a row into a record, and FromField to parse a given field from a raw ByteString.

So let's write these instances, by hand first.

And then put everything together:
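
The hand-written parsing and the decoding step can be sketched together as follows. This is one possible way to do it, not necessarily the only one: since cassava already ships a FromField instance for Maybe a, the sketch handles NaN in a plain helper function (parseNaN, a made-up name) instead of a second instance. Foo, decodeFoos and the error message are also assumptions:

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.ByteString (ByteString)
import qualified Data.ByteString.Lazy as BL
import Data.Csv (FromField (..), FromRecord (..), HasHeader (NoHeader), Parser, decode, (.!))
import Data.Text (Text)
import Data.Vector (Vector)

-- Hypothetical record for one CSV row.
data Foo = Foo
  { tag    :: Text
  , number :: Maybe Double
  } deriving (Show, Eq)

-- Hypothetical helper: the literal NaN becomes Nothing, anything else
-- has to parse as a Double.
parseNaN :: ByteString -> Parser (Maybe Double)
parseNaN "NaN" = pure Nothing
parseNaN raw   = Just <$> parseField raw

-- Hand-written row parser: exactly two fields, tag first, number second.
instance FromRecord Foo where
  parseRecord v
    | length v == 2 = Foo <$> v .! 0 <*> (v .! 1 >>= parseNaN)
    | otherwise     = fail "expected exactly two fields"

-- Decode a whole CSV document that has no header line.
decodeFoos :: BL.ByteString -> Either String (Vector Foo)
decodeFoos = decode NoHeader
```

Calling decodeFoos on the two sample rows (for instance from GHCi) should give back both records, with Nothing as the number of the second one.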

Automatic FromRecord instance with Generics

Cassava provides a way to automatically get a FromRecord instance if the datatype derives Generic. If you're new to the Generic mechanism, the idea is that GHC can produce an internal representation of data types, and it's possible to pattern match on this structure to do plenty of useful things. This tutorial is very nice as an introduction to the concept.

I'm also going to throw in the DerivingStrategies language extension, more for explanatory purposes than anything else. It allows the developer to specify how each instance should be derived.
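
A sketch of what this can look like, assuming the same hypothetical Foo record and a made-up decodeFoos helper. The stock strategy asks GHC to write the instance itself, while anyclass produces an empty instance whose methods come from cassava's Generic-based defaults:

```haskell
{-# LANGUAGE DeriveAnyClass     #-}
{-# LANGUAGE DeriveGeneric      #-}
{-# LANGUAGE DerivingStrategies #-}
{-# LANGUAGE OverloadedStrings  #-}

import qualified Data.ByteString.Lazy as BL
import Data.Csv (FromRecord, HasHeader (NoHeader), decode)
import Data.Text (Text)
import Data.Vector (Vector)
import GHC.Generics (Generic)

-- Hypothetical record for one CSV row.
data Foo = Foo
  { tag    :: Text
  , number :: Maybe Double
  }
  deriving stock (Show, Eq, Generic)  -- instances GHC generates on its own
  deriving anyclass (FromRecord)      -- empty instance, cassava's Generic default does the work

-- Decode a whole CSV document that has no header line.
decodeFoos :: BL.ByteString -> Either String (Vector Foo)
decodeFoos = decode NoHeader
```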

So that's an improvement: we don't need to write the FromRecord instance ourselves, thanks to cassava providing the machinery for that through Generics.

Newtypes and DerivingVia

Now, the problem with the previous solution is that it doesn't work /o\. The FromField instance that cassava provides for Maybe Double returns Nothing if the field is empty, but here the field contains NaN, so it fails to parse. As an aside, don't forget to write tests for these kinds of situations ;)

So let's fix that with a newtype and a custom FromField instance so we can keep the automatic FromRecord instance.
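
One way to sketch this, assuming a newtype called MaybeNaN wrapping Maybe Double and the same hypothetical Foo and decodeFoos as before. The custom FromField instance treats the literal NaN as Nothing and otherwise defers to cassava's existing Maybe Double parser:

```haskell
{-# LANGUAGE DeriveAnyClass     #-}
{-# LANGUAGE DeriveGeneric      #-}
{-# LANGUAGE DerivingStrategies #-}
{-# LANGUAGE OverloadedStrings  #-}

import qualified Data.ByteString.Lazy as BL
import Data.Csv (FromField (..), FromRecord, HasHeader (NoHeader), decode)
import Data.Text (Text)
import Data.Vector (Vector)
import GHC.Generics (Generic)

-- A Double that may be missing, either as an empty field or as NaN.
newtype MaybeNaN = MaybeNaN (Maybe Double)
  deriving stock (Show, Eq)

instance FromField MaybeNaN where
  parseField "NaN" = pure (MaybeNaN Nothing)
  parseField raw   = MaybeNaN <$> parseField raw  -- reuse the Maybe Double parser

-- Hypothetical record; FromRecord is still derived through Generic.
data Foo = Foo
  { tag    :: Text
  , number :: MaybeNaN
  }
  deriving stock (Show, Eq, Generic)
  deriving anyclass (FromRecord)

-- Decode a whole CSV document that has no header line.
decodeFoos :: BL.ByteString -> Either String (Vector Foo)
decodeFoos = decode NoHeader
```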

Now everything works!

The name MaybeNaN isn't very nice though. Quite often you want a name that better describes the meaning of the field. This is where you can use the DerivingVia extension to get the behavior you want without having to write a custom instance for each newtype.
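
A sketch of the idea, where Temperature is a made-up domain name and parseTemperature is only a small helper for trying the instance on a single field. The deriving clause asks GHC to reuse the MaybeNaN instance, which works because both newtypes wrap the same Maybe Double:

```haskell
{-# LANGUAGE DerivingStrategies #-}
{-# LANGUAGE DerivingVia        #-}
{-# LANGUAGE OverloadedStrings  #-}

import Data.ByteString (ByteString)
import Data.Csv (FromField (..), runParser)

-- The newtype carrying the NaN-aware parsing logic.
newtype MaybeNaN = MaybeNaN (Maybe Double)

instance FromField MaybeNaN where
  parseField "NaN" = pure (MaybeNaN Nothing)
  parseField raw   = MaybeNaN <$> parseField raw

-- Hypothetical domain-specific newtype: no hand-written instance needed.
newtype Temperature = Temperature (Maybe Double)
  deriving stock (Show, Eq)
  deriving FromField via MaybeNaN

-- Run the field parser on one raw field, e.g. from GHCi.
parseTemperature :: ByteString -> Either String Temperature
parseTemperature = runParser . parseField
```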

Or if you prefer, you can use a standalone deriving like so:
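
The standalone form can be sketched like this, again with Temperature and parseTemperature as made-up names. The MaybeNaN definition is repeated so the snippet compiles on its own:

```haskell
{-# LANGUAGE DerivingStrategies #-}
{-# LANGUAGE DerivingVia        #-}
{-# LANGUAGE OverloadedStrings  #-}
{-# LANGUAGE StandaloneDeriving #-}

import Data.ByteString (ByteString)
import Data.Csv (FromField (..), runParser)

-- The newtype carrying the NaN-aware parsing logic.
newtype MaybeNaN = MaybeNaN (Maybe Double)

instance FromField MaybeNaN where
  parseField "NaN" = pure (MaybeNaN Nothing)
  parseField raw   = MaybeNaN <$> parseField raw

-- Hypothetical domain-specific newtype.
newtype Temperature = Temperature (Maybe Double)
  deriving stock (Show, Eq)

-- Same effect as a deriving-via clause, written apart from the type.
deriving via MaybeNaN instance FromField Temperature

-- Run the field parser on one raw field, e.g. from GHCi.
parseTemperature :: ByteString -> Either String Temperature
parseTemperature = runParser . parseField
```

This form can be handy when the newtype and the instance live in different modules.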

Conclusion

The Generic deriving mechanism is very powerful and pretty widely used. Aeson is the other very popular library exploiting this mechanism. It's an easy way to get a lot for free. The most impressive usage of Generic I know of is the generic-lens library, which gives you lenses and prisms, all thanks to Generic. The DerivingVia mechanism is a nice addition for writing concise yet extensible instances. It really shines when you want the same instance on multiple different types.