POSTED: 21 Sep 2018
TAGGED: #work #aws #clojure

Bye Bye Youview

I just left YouView and will be joining Habito shortly. Here's a quick (mostly tech focused) recap about these nearly two years.

Clojure and the jvm are really good

When I started, I knew nearly nothing about clojure. Pretty quickly though, I was quite comfortable, and once I got used to repl-driven development I was productive and pretty happy. Although I still miss type when I write clojure, the language and the community is very much focused on getting things done, very pragmatic and no non-sense.

The java interop is also very nice and straightfoward, so it's easy to leverage the gigantic jvm based ecosystem. Special mention to Hikari-cp, a really good connection pool and the Kinesis Client Library (KCL) which gives AWS kinesis queue semantic. KCL is especially nice since it does automatic load balancing and failover, allowing multiple instances to consume a kinesis stream.

Rust is awesome

I didn't really used rust, but once I had a big list of uuid in a file. The file was around 30GB big. I wanted to sort and deduplicate every line. sort --unique was taking forever. I wrote a simple python program, but it would blow up my memory. So I turned to rust in anger, and without any experience after 1h I got the following:

use std::collections::HashSet;
use std::vec::Vec;
use std::io::BufReader;
use std::io::BufRead;
use std::fs::File;
use std::env;

fn main() {
    let args: Vec<_> = env::args().collect();
    if args.len() > 1 {
        let path = &args[1];
        let f = File::open(path).unwrap();
        let mut s = HashSet::new();
        let fd = BufReader::new(&f);
        for line in fd.lines() {
            let l = line.unwrap();
            s.insert(l);
        }

        let mut v: Vec<String> = s.into_iter().collect();
        v.sort_unstable();
        for elem in &v {
            println!("{}", elem);
        }
    } else {
        println!("Need an argument");
    }

}

Which process my 30GB file in less than 2 minutes, with very minimal memory footprint.

DynamoDB is alright, PostgreSQL Aurora is awesome

This job reinforced my belief that your primary databases should be SQL based. DynamoDB is really good, but for some narrow usecase. When someone comes to you and ask some question like "what's the average number of recordings per user?", and dynamo hasn't been designed to answer that kind of question, you're in trouble. DynamoDB autoscaling and now the ability to downscale and reshard makes it really robust though. There are definitely some usecases for that.

PostgreSQL aurora on the other hand is really good. Since it's based on S3, the write contention is much lower. For write-heavy application, it performs really well. The downside is: it's expensive.

Cloudformation and terraform are meh

Declarative infrastructure is a really nice idea, but the execution is subpar. Cloudformation is often very slow, and if it gets out of sync for whatever reason (including some cloudformation bug), you're in a world of trouble. We had a go with terraform, but I don't have a lot of experience with it. Ultimetely it was decided to stay with cloudformation. While it was better for some things (smaller and more modular configurations), it has its fair share of drawback as well.

Be careful with microservice hype

Everything cloud at youview is very microservice oriented. There are definitely some nice aspects:

having a dedicated service for centralised identity is pretty nice. No need to think about it in other services.
Each service is quite small, so it's fairly easy to understand it well.
Language freedom (to some extend). We ended up having services written in node.js, go and clojure. Pick what you like.

However, I also became aware of some downsides:

Imposing a network barrier to ensure modularity is a really heavy price. Distributed systems are really really hard, and the failure modes are very much not easy to deal with. It definitely make the applications more complex to write when you have many downstream dependencies.
Getting the big picture is more difficult. Especially since most of the service expose a JSON api, and the documentation we are supposed to write is not always up to date or accurate. Mapping the inputs and outputs for a given service and tracing the dependencies chain is crucial but often quite difficult and this requires quite a bit of time.

When I went to a clojure conference, this talk really resonated with me. I believe a mixed approach between monolith and microservice is the way to go. Group similar services together at the code level (and perhaps at the artifact level as well), to simplify code reuse and give a clearer idea of dependencies.

Hello Habito

One of my main reason to move to Habito is their heavy use of strongly statically typed language: haskell and purescript. At last, I'm going to be paid to write bug in haskell and see how it works in a real production settings. Compared to the side projects I have, I'm bound to learn quite a few things. \o/