Applicative Options Parsing in Haskell

Posted 3 months back at GIANT ROBOTS SMASHING INTO OTHER GIANT ROBOTS - Home

I’ve just finished work on a small command line client for the Heroku Build API written in Haskell. It may be a bit overkill for the task, but it allowed me to play with a library I was very interested in but hadn’t had a chance to use yet: optparse-applicative.

In figuring things out, I again noticed something I find common to many Haskell libraries:

  1. It’s extremely easy to use and solves the problem exactly as I need.
  2. It’s woefully under-documented and appears incredibly difficult to use at first glance.

Note that when I say under-documented, I mean it in a very specific way. The Haddocks are stellar. Unfortunately, what I find lacking are blogs and example-driven tutorials.

Rather than complain about the lack of tutorials, I’ve decided to write one.

Applicative Parsers

Haskell is known for its great parsing libraries and this is no exception. For some context, here’s an example of what it looks like to build a Parser in Haskell:

type CSV = [[String]]

csvFile :: Parser CSV
csvFile = do
    lines <- many csvLine
    eof

    return lines

  where
    csvLine = do
        cells <- csvCell `sepBy` comma
        eol

        return cells

    csvCell = quoted (many anyChar)

    comma = char ','

    eol = string "\n" <|> string "\r\n"

    -- etc...

As you can see, Haskell parsers have a fractal nature. You make tiny parsers for simple values and combine them into slightly larger parsers for slightly more complicated values. You continue this process until you reach the top level csvFile which reads like exactly what it is.

When combining parsers from a general-purpose library like parsec (as we’re doing above), we typically do it monadically. This means that each parsing step is sequenced together (that’s what do-notation does) and that sequencing will be respected when the parser is ultimately executed on some input. Sequencing parsing steps in an imperative way like this allows us to make decisions mid-parse about what to do next or to use the results of earlier parses in later ones. This ability is essential in most cases.

When using libraries like optparse-applicative and aeson we’re able to do something different. Instead of treating parsers as monadic, we can treat them as applicative. The Applicative type class is a lot like Monad in that it’s a means of describing combination. Crucially, it differs in that it has no ability to define an order – there’s no sequencing.

If it helps, you can think of applicative parsers as atomic or parallel while monadic parsers would be incremental or serial. Yet another way to say it is that monadic parsers operate on the result of the previous parser and can only return something to the next; the overall result is then simply the result of the last parser in the chain. Applicative parsers, on the other hand, operate on the whole input and contribute directly to the whole output – when combined and executed, many applicative parsers can run “at once” to produce the final result.

Taking values and combining them into a larger value via some constructor is exactly how normal function application works. The Applicative type class lets you construct things from values wrapped in some context (say, a Parser State) using a very similar syntax. By using Applicative to combine smaller parsers into larger ones, you end up with a very convenient situation: the constructed parsers resemble the structure of their output, not their input.

When you look at the CSV parser above, it reads like the document it’s parsing, not the value it’s producing. It doesn’t look like an array of arrays, it looks like a walk over the values and down the lines of a file. There’s nothing wrong with this structure per se, but contrast it with this parser for creating a User from a JSON value:

data User = User String Int

-- Value is a type provided by aeson to represent JSON values.
parseUser :: Value -> Parser User
parseUser (Object o) = User <$> o .: "name" <*> o .: "age"

It’s hard to believe the two share any qualities at all, but they are in fact the same thing, just constructed via different means of combination.

In the CSV case, parsers like csvLine and eof are combined monadically via do-notation:

You will parse many lines of CSV, then you will parse an end-of-file.

In the JSON case, parsers like o .: "name" and o .: "age" each contribute part of a User and those parts are combined applicatively via (<$>) and (<*>) (pronounced fmap and apply):

You will parse a user from the value for the “name” key and the value for the “age” key.

Just by virtue of how Applicative works, we find ourselves with a Parser User that looks surprisingly like a User.

I go through all of this not because you need to know about it to use these libraries (though it does help with understanding their error messages), but because I think it’s a great example of something many developers don’t believe: not only can highly theoretical concepts have tangible value in real world code, but they in fact do in Haskell.

Let’s see it in action.

Options Parsing

My little command line client has the following usage:

% heroku-build [--app COMPILE-APP] [start|status|release]

Where each sub-command has its own set of arguments:

% heroku-build start SOURCE-URL VERSION
% heroku-build status BUILD-ID
% heroku-build release BUILD-ID RELEASE-APP

The first step is to define a data type for what you want out of options parsing. I typically call this Options:

import Options.Applicative -- Provided by optparse-applicative

type App = String
type Version = String
type Url = String
type BuildId = String

data Command
    = Start Url Version
    | Status BuildId
    | Release BuildId App

data Options = Options App Command

If we assume that we can build a Parser Options, using it in main would look like this:

main :: IO ()
main = run =<< execParser
    (parseOptions `withInfo` "Interact with the Heroku Build API")

parseOptions :: Parser Options
parseOptions = undefined

-- Actual program logic
run :: Options -> IO ()
run opts = undefined

Where withInfo is just a convenience function to add --help support given a parser and description:

withInfo :: Parser a -> String -> ParserInfo a
withInfo opts desc = info (helper <*> opts) $ progDesc desc

So what does an Applicative Options Parser look like? Well, if you remember the discussion above, it’s going to be a series of smaller parsers combined in an applicative way.

Let’s start by parsing just the --app option using the library-provided strOption helper:

parseApp :: Parser App
parseApp = strOption $
    short 'a' <> long "app" <> metavar "COMPILE-APP" <>
    help "Heroku app on which to compile"

Next we make a parser for each sub-command:

parseStart :: Parser Command
parseStart = Start
    <$> argument str (metavar "SOURCE-URL")
    <*> argument str (metavar "VERSION")

parseStatus :: Parser Command
parseStatus = Status <$> argument str (metavar "BUILD-ID")

parseRelease :: Parser Command
parseRelease = Release
    <$> argument str (metavar "BUILD-ID")
    <*> argument str (metavar "RELEASE-APP")

Looks familiar, right? These parsers are made up of simpler parsers (like argument) combined in much the same way as our parseUser example. We can then combine them further via the subparser function:

parseCommand :: Parser Command
parseCommand = subparser $
    command "start"   (parseStart   `withInfo` "Start a build on the compilation app") <>
    command "status"  (parseStatus  `withInfo` "Check the status of a build") <>
    command "release" (parseRelease `withInfo` "Release a successful build")

By re-using withInfo here, we even get sub-command --help flags:

% heroku-build start --help
Usage: heroku-build start SOURCE-URL VERSION
  Start a build on the compilation app

Available options:
  -h,--help                Show this help text

Pretty great, right?

All of this comes together to make the full Options parser:

parseOptions :: Parser Options
parseOptions = Options <$> parseApp <*> parseCommand

Again, this looks just like parseUser. You might’ve thought that o .: "name" was some kind of magic, but as you can see, it’s just a parser. It was defined in the same way as parseApp, designed to parse something simple, and is easily combined into a more complex parser thanks to its applicative nature.

Finally, with option handling thoroughly taken care of, we’re free to implement our program logic in terms of meaningful types:

run :: Options -> IO ()
run (Options app cmd) = do
    case cmd of
        Start url version  -> -- ...
        Status build       -> -- ...
        Release build rApp -> -- ...

Wrapping Up

To recap, optparse-applicative allows us to do a number of things:

  • Implement our program input as a meaningful type
  • State how to turn command-line options into a value of that type in a concise and declarative way
  • Do this even in the presence of something complex like sub-commands
  • Handle invalid input and get a really great --help message for free

Hopefully, this post has piqued some interest in Haskell’s deeper ideas which I believe lead to most of these benefits. If not, at least there’s some real world examples that you can reference the next time you want to parse command-line options in Haskell.

Phusion Passenger 4.0.45: major Node.js and Meteor compatibility improvements

Posted 3 months back at Phusion Corporate Blog


Phusion Passenger is a fast and robust web server and application server for Ruby, Python, Node.js and Meteor. Passenger takes a lot of complexity out of deploying web apps, and adds powerful enterprise-grade features that are useful in production. High-profile companies such as Apple, New York Times, AirBnB, Juniper and American Express are already using it, as well as over 350,000 websites.

Phusion Passenger is under constant maintenance and development. Version 4.0.45 is a bugfix release.

Phusion Passenger also has an Enterprise version which comes with a wide array of additional features. By buying Phusion Passenger Enterprise you will directly sponsor the development of the open source version.

Recent changes

  • Major improvements in Node.js and Meteor compatibility. Older Phusion Passenger versions implemented Node.js support by emulating Node.js’ HTTP library. This approach was found to be unsustainable, so we’ve abandoned it and replaced it with a much simpler approach that does not involve emulating the HTTP library.
  • Introduced support for sticky sessions. Sticky sessions are useful — or even required — for apps that store state inside process memory. Prominent examples include SockJS, Socket.io, faye-websocket and Meteor. Sticky sessions are required to make the aforementioned examples work in multi-process scenarios. By introducing sticky sessions support, we’ve much improved WebSocket support and support for the aforementioned libraries and frameworks.
  • Due to user demand, GET requests with request bodies are once again supported. Support for these kinds of requests was removed in 4.0.42 in an attempt to increase the strictness and robustness of our request handling code. It has been determined that GET requests with request bodies can be adequately supported without degrading robustness in Phusion Passenger. However, GET requests with both request bodies and WebSocket upgrade headers are unsupported. Fixes issue #1092.
  • [Enterprise] The Flying Passenger feature is now also available on Apache.
  • Fixed some issues with RVM mixed mode support, issue #1121.
  • Fixed Passenger Standalone complaining about not finding PassengerHelperAgent during startup.
  • Fixed various minor issues such as #1190 and #1197.
  • The download timeout for passenger-install-nginx-module has been increased. Patch by 亀田 義裕.

Installing or upgrading to 4.0.45

Installation and upgrade instructions are available for OS X, Debian, Ubuntu, Heroku, the Ruby gem and the tarball.

Final

Fork us on Github! Phusion Passenger’s core is open source. Please fork or watch us on Github. :)


If you would like to stay up to date with Phusion news, please fill in your name and email address below and sign up for our newsletter. We won’t spam you, we promise.



Shared Terminology Yet Different Concepts Between Ember.js and Rails

Posted 3 months back at GIANT ROBOTS SMASHING INTO OTHER GIANT ROBOTS - Home

Developers who are well versed in Ruby on Rails (or other MVC implementations) and start learning Ember.js may find it surprising that, even though there’s shared vocabulary, the concepts it denotes are sometimes very different.

The first and obvious difference comes from the kind of web framework each one is. Rails facilitates the creation of web apps that offer mainly an HTTP interface to interact with. Ember, on the other hand, helps create web apps that interface directly with humans (through clicks, taps, key presses, etc). Both are web application frameworks, but the former is server side and the latter client side.

A look into their workflows will shed light on the main differences.

Rails: Request Life Cycle

The Rails request life cycle works as follows:

  1. The Router receives an HTTP request from the browser/client, and calls the controller that will handle it.
  2. The Controller receives the HTTP parameters, and instantiates the necessary Model or Models.
  3. The Model fetches the requested objects from the database.
  4. The Controller passes Models to the View, and renders them.
  5. The View generates a text response (HTML, JSON, etc), interpolating Ruby objects where necessary.
  6. The Controller response is sent back to the Router, and from there to the client.

Rails Request Life Cycle
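To make steps 2 through 5 above concrete, here is a minimal sketch of a Rails controller and view pair. The ArticlesController and Article model are hypothetical examples for illustration, not taken from the post:

# app/controllers/articles_controller.rb (hypothetical example)
class ArticlesController < ApplicationController
  def index
    # Steps 2-3: the Controller reads the HTTP parameters and asks the Model,
    # which fetches the requested objects from the database.
    @articles = Article.where(category: params[:category])

    # Steps 4-5: the Controller hands the Models to the View, which renders
    # the text response (HTML here) that travels back through the Router.
    render :index
  end
end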

Ember.js: Run Loop

The Ember.js run loop works as follows (don’t forget that Model and Controller now refer to Ember concepts rather than Rails ones):

  1. The Router instantiates a Model and sets it in a Controller.
  2. The Router renders the Template.
  3. The Template gets its properties from the Controller, which acts as a decorator for the Model. The Template doesn’t know where a property it displays is defined; the Controller provides it either by itself or through its Model.

Ember Run Loop

At this point the cycle ends, and it can be restarted by:

  • An event (like a click on a link) that triggers an action that updates the route.
  • A new URL being visited directly by typing in the browser’s address bar.

Model

Models are similar in both frameworks. It is a common situation that a model in Ember maps one-to-one with a model in Rails.

In Rails a Model is almost always backed by a database like PostgreSQL, whereas in Ember it is common that a model lives only in memory and is fetched, changed or deleted via a JSON API.

Note that in Ember the Template and the Model are always automatically in sync thanks to the two-way binding feature. This means that if we edit a Model’s attribute in a form, the attribute will change in real time anywhere the model is rendered (let’s say, in the title of the page), even if we don’t submit the form or persist the changes. This is another surprise coming from Rails, where a change in a form is stateless and nothing really changes until we successfully send it.

View

It is a good Rails practice to have simple views, with presenters/decorators providing any necessary logic. Ember enforces this good practice, with its Templates being logic-less by nature of its engine, Handlebars. A similar enforcement may be used in Rails via gems like curly.

Ember has both the concept of Views and of Templates, though Templates are more akin to Rails Views. An Ember View renders Templates, and it provides re-usable components and more complex event handling.

Controller

A Controller in Rails is a Rack application that talks to Models to return an HTTP response. A Controller in Ember is a Model decorator, and it’s called from the templates.

Router

A Router in Rails is responsible for HTTP requests/responses. In Ember a Route (and not a Router, which is just a mapping between strings and Routes) is concerned with the current state of the application (what models and controllers should be set up), and with keeping the URL up to date as the application’s state changes (like after a click on a link).

Final Thoughts

As you dig more into Ember you’ll find more similarities and differences. This blog post should provide a good start toward avoiding possible confusion caused by similar vocabulary that refers to different things.

Swift Sequences

Posted 3 months back at GIANT ROBOTS SMASHING INTO OTHER GIANT ROBOTS - Home

We’re incredibly excited about the new Swift programming language announced by Apple at this year’s WWDC. As a way of experimenting, we’ve begun looking into what it would be like if we rewrote Liftoff, our command line Xcode project generation/configuration tool, in Swift.

Liftoff supports a few options on the command line, so the first thing we’re trying to do is write a small command line parsing library in Swift.

We want to avoid importing Foundation, so we are relying on the top-level constants C_ARGV and C_ARGC to get the arguments passed on the command line. Instead of working with these primitive types, we’d really rather have our own object that can hold onto a native String[]. By implementing the Sequence protocol, we could quickly iterate over these options to do whatever we need to do with them.

Creating the Argument List

The requirements for the ArgumentList object are as follows:

  • Instantiate it with C_ARGV and C_ARGC
  • Transform those into a native property with the type String[]

C_ARGV is of the type UnsafePointer<CString>. It contains all of the arguments passed to our process from the command line. From the type definition alone, we know that the internal contents of the object are instances of CString. This is good, because it means that once we get to those contents, we can use the method fromCString() on String to convert them to a nicer type. We also know that we’ll be able to access the contents via subscripting, but since UnsafePointer doesn’t conform to Sequence itself, we can’t iterate through it.

C_ARGC is of the type CInt. It represents the number of arguments that were passed to our object on the command line. We can use this to generate a loop so that we can convert each CString inside C_ARGV into a String.

We can start with a struct:

struct ArgumentList {
    var arguments: String[]

    init(argv: UnsafePointer<CString>, count: CInt) {
    }
}

Here, we’ve defined a basic constructor that will take C_ARGV and C_ARGC, and a property named arguments of the type String[]. So now, we can implement our constructor to loop through the provided input from the command line and convert the arguments into String instances:

init(argv: UnsafePointer<CString>, count: CInt) {
    arguments = []  // the property must be initialized before we can append to it

    // Start at 1 to skip the process name stored in argv[0]
    for i in 1..count {
        let index = Int(i)
        let arg = String.fromCString(argv[index])
        arguments.append(arg)
    }
}

This gives us an object that satisfies our basic requirements. Now we can start to look into what it would take to conform this object to Sequence.

Inspecting Sequence

Now that we have an object that behaves how we want as a container, we can start to implement the methods that will let us transparently iterate through the internal list.

The protocol that lets us do this is called Sequence, and although it seems very straightforward, it took three of us in a room watching the Advanced Swift session video, looking through the session slides, and implementing it three times to fully understand what we needed to do.

So here’s a quick overview of how the protocol works when used with for in:

When you use the for <object> in <sequence> syntax, Swift actually does some re-writing of the code under the covers. As described in the Advanced Swift session, when you write:

for x in mySequence {
    // iterations here
}

Swift actually turns that into:

var __g: Generator = mySequence.generate()
while let x = __g.next() {
    // iterations here
}

So, breaking this down:

  • Swift calls generate() on the provided Sequence, returning a Generator. This object is stored in a private variable, __g.
  • __g then gets called with next(), which returns a type of Element?. This object is conditionally unwrapped in the while let statement, and assigned to x.
  • It then performs this operation until next() has nothing left to return.

For the record, I’m not crazy about the naming here. I think it’s probably best to think Enumerator instead of Generator, at least in this use case. I’ve filed a radar to this effect, but have already gotten some feedback that this change might not be so simple.

So it looks like we’ll need to actually implement two protocols to conform to Sequence. We’ll need our ArgumentList to conform to Sequence, and we’ll need another object to conform to Generator. We can start with Generator, since it’s the one that’s actually going to be doing the work.

Implementing Generator

As previously shown, we’ll need to implement one method for Generator: next(). This method has the return type of Element?, which is really just a catch-all type defined internally due to some weirdness with protocols and Generics. For now, we’ll ignore that, and think of it as being <T>?. The important thing to get is that we need to return an Optional.

In order to iterate through our array of arguments, we’re going to use a new type: Slice. This type holds a reference to a range of an existing array. This is a bit odd, but essentially, if we create a Slice with a range from an Array, and then update that Array, the Slice is updated as well:

let array: Array = ["foo", "bar", "baz"]
let slice: Slice<String> = array[1...2]
println(slice) // prints ["bar", "baz"]

array[1] = "bat"
println(slice) // prints ["bat", "baz"]

Note that I’m adding the type signatures for those constants for illustrative purposes. The return type of a range of an array is already Slice<T>, so Swift is able to infer this information.

We’ll create a lightweight ArgumentListGenerator that conforms to Generator, and has an internal items property:

struct ArgumentListGenerator: Generator {
    var items: Slice<String>
}

If you try to compile, you’ll see that the compiler throws an error, because we haven’t implemented Generator properly. We need to implement next() for the compiler to be happy:

mutating func next() -> String? {
  if items.isEmpty { return .None }
  let element = items[0]
  items = items[1..items.count]
  return element
}

Our implementation performs a quick check to see if our Slice is empty, and performs an early return with Optional.None if so. Note that since the return type is already Optional<String>, we can omit the Optional prefix for the enum.

We can then grab the top item from items, then reset items to the rest of the Slice. This is why we declared items as mutable, and also why we declared next() as mutating.

Now note that none of this implementation is specific to Strings, or even to our ArgumentList. In fact, with a quick refactor, we can modify this object to use Generics:

struct CollectionGenerator<T>: Generator {
    var items: Slice<T>

    mutating func next() -> T? {
        if items.isEmpty { return .None }
        let item = items[0]
        items = items[1..items.count]
        return item
    }
}

This is such a generic object, and it seems to solve so much of the common use case here, that I’m a bit baffled as to why it hasn’t been included as part of the standard library. I’ve already filed a radar on the issue.

Now that we have our Generator, we can finally conform our ArgumentList to Sequence.

Implementing Sequence

We can start by creating an extension on ArgumentList to hold the required method:

extension ArgumentList: Sequence {
}

We can then declare the required method, generate():

extension ArgumentList: Sequence {
    func generate() -> CollectionGenerator<String> {
    }
}

Note that we’re using our generic CollectionGenerator<T> type as the return type here. All that’s left is to create a CollectionGenerator with our arguments:

extension ArgumentList: Sequence {
    func generate() -> CollectionGenerator<String> {
        return CollectionGenerator(items: arguments[0..arguments.endIndex])
    }
}

Now, we can quickly and easily create a list of arguments passed on the command line, and iterate through them using for in:

let arguments = ArgumentList(argv: C_ARGV, count: C_ARGC)

for argument in arguments {
    println(argument)
}

What’s next

Episode #471 - June 10th, 2014

Posted 3 months back at Ruby5

The Rails/Merb Merge in Retrospect, Opinionated Rails Application Templates with orats, Why Swift Will Never Replace RubyMotion, RubyMotion 3.0 Sneak Peek, Docker 1.0 and RubyConf Portugal taking place in October.

Listen to this episode on Ruby5

Sponsored by Codeship.io

Codeship is a hosted Continuous Deployment Service that just works.

Set up Continuous Integration in a few steps and automatically deploy when all your tests have passed. Integrate with GitHub and BitBucket and deploy to cloud services like Heroku and AWS, or your own servers.

Visit http://codeship.io/ruby5 and sign up for free. Use discount code RUBY5 for a 20% discount on any plan for 3 months.

Codeship

The Rails/Merb Merge in Retrospect

Giles Bowkett wrote an article last week called "The Rails/Merb Merge in Retrospect" where he looks back at the result of merging the Merb framework into what would become Rails 3.0. Giles recognizes improvements, but he argues that despite all of the hard work put into making Rails more modular, most developers haven't embraced this modularity on their projects.
The Rails/Merb Merge in Retrospect

Orats

Orats is a Ruby gem by Nick Janetakis that stands for Opinionated Rails Application Templates. It's a wrapper around the rails command for creating new apps and it does a bunch of stuff for us like setting up redis, sidekiq, puma and a lot more. It can also set up authentication with Devise and a playbook for Ansible.
Orats

Why Swift Will Never Replace RubyMotion

Jack Watson-Hamblin wrote a blog post with pretty solid arguments on why RubyMotion will not die anytime soon. In short, he says that RubyMotion is not just a *language*, but a whole toolchain, with a command line tool that doesn't tie you to an editor like Xcode. There's also the fact that it's Ruby, so we still have access to all the existing gems out there. Lastly, he points out that life is going to go on for a while without any large portion of the App Store containing apps written in Swift.
Why Swift Will Never Replace RubyMotion

RubyMotion 3.0

The upcoming RubyMotion 3.0 will add support for Android. You will be able to build and run your apps in the Android emulator by running `rake emulator`, or on a USB-connected Android device by running `rake device`. To prepare and sign builds for a Google Play submission, you run `rake release`. All pretty similar to how it works with iOS and OS X.
RubyMotion 3.0

Docker 1.0

This week Docker released version 1.0. This release signifies a level of quality, feature completeness, backward compatibility and API stability to meet enterprise IT standards.
Docker 1.0

RubyConf Portugal

RubyConf Portugal is happening on October 13th and 14th in Braga. This is the first RubyConf taking place in Portugal, and Braga is one of the oldest Portuguese cities, established in Roman times. Tickets are going fast, so make sure to grab yours at http://rubyconf.pt/ and use the special discount code RUBY5<3PT
RubyConf Portugal

Sponsored by Top Ruby Jobs

If you're looking for a top Ruby job or for top Ruby talent, then you should check out Top Ruby Jobs. Top Ruby Jobs is a website dedicated to the best jobs available in the Ruby community.
Top Ruby Jobs

Sponsored by Ruby5

Ruby5 is released Tuesday and Friday mornings. To stay informed about and active with this podcast, we encourage you to do one of the following:

Thank You for Listening to Ruby5

The Hampshire Hub: open and linked data for Hampshire

Posted 3 months back at RicRoberts :

Some great news - we’ve recently been selected to work on a new project called the Hampshire Hub, which is a partnership project for Hampshire County Council and public service providers in and around the area.

The Hub will be powered by PublishMyData and will centre around a linked datastore for open data on the web, just like our work with DCLG and Glasgow (which is coming soon as part of their future cities project). And as well as just publishing the raw data, of course we’ll be creating some cool stuff with it too.

So how will it work? Well, if you’re technical and want to use the data for an app or web site then it’ll be available in a variety of computer formats via APIs. And for non-technical users, the whole thing will be human friendly too with filtered searches, visualisations and tools so you can publish, share and consume any data you need.

Whatever type of user you are, the whole point of the Hub is to make it easy to find and use the exact data you want in the format that you want it. You can link to the exact data you need because all of the data (including metadata and attachments) has a URL. And, because we’ll be publishing 5-star open data, you can also combine the exact data you want - whether it’s used for reference or to create apps with.

We’re really pleased to be working on this with Hampshire Council and all the other partners. They already have lots of ideas on their Protohub and this project will be an evolution of that.

Protohub site screenshot

The whole project’s a great example of how public sector organisations can use open data to their advantage. And, it’s great for PublishMyData too because we’ll be adding loads of new features to support new functionality for the Hub. If you want to read more, check out both Bill’s guest post for the current Hub and Hampshire’s post too.

The Definition of Garbage

Posted 3 months back at Jake Scruggs

The views and opinions expressed here are my own and don’t necessarily represent positions, strategies, or opinions of Backstop Solutions Group.

Recently we released episode 3 of the Software Apprenticeship Podcast but had to pull it back for re-editing because of some problems with how developers talk to each other. Developers are not kind to ANY code. Even our own. Especially our own. Sitting next to a dev while he or she discusses the code they are working on can be a shocking experience. Words like “Crap”, “Junk”, “Garbage” and many worse are used often. A lot of this type of talk was on episode 3, and when someone at Backstop (whose job it is to protect us from ourselves and from comments taken out of context) heard it, they asked us to edit the podcast to take out some of the more offensive comments. This is why episode 3 sometimes fades into music and then comes back mid-conversation. Sorry about that.

I don’t know where I first heard the definition of developer as “Whiny Optimist” but it is uncannily accurate.  We developers are forever complaining about previously written code.  Code is awful. Code is crap.  Code is the worst spaghetti wrapped around horse manure we’ve ever seen. 

And yet…

We couldn’t go on if we thought we’d have to live out our lives fighting the very thing we create.  There is this optimism about future code.  It  will be bright and shiny.  The next project to re-write the <whatever> is going to make everything better.  So much better…  The code will be pristine and new features slide in like rum into coke.  Ponies and rainbows are coming.

Also…

Every year I get better at what I do, so even code I thought wonderful 3 years ago can be “crap” to me today. I look back and see a developer who didn’t keep orthogonal concepts separate, who coupled code that should not be coupled, and I am sad. I regret my past inefficiencies and curse them.

But…

How bad is this code really? Backstop’s code is rigorously tested many times automatically before being pored over by humans. Any code change in my product gets tested first on my machine (by automated tests), then on another “Build Server” (which runs the tests I was supposed to run and a bunch more), then another series of “Regression Servers” will run some even longer regression tests that literally use the app as our customers do. If it passes all that then we’ll have our Quality Assurance people go over it again to make sure the machines haven’t missed anything. The last thing the Q.A. people do is write a new automated regression test to make sure this functionality doesn’t break in the future.

What the heck are we complaining about then?  The software works!  It helps many people make a lot of money, it makes the company money, and is a leader in the industry.  We developers are, in some ways, a bunch of ungrateful jerks.

Let me see if I can explain why. Writing software that solves hard problems is hard. Duh. There are only so many people who can do it and we struggle through. Writing software that solves hard problems and can continue to accept new features easily is the HOLY GRAIL of software development. Rarely has it been done, even though every company claims their code is the “best in the industry.” If you were to get your hands on the unedited version of episode 3 you would hear a lot of developers complain about how we wish we had written code in the past that could be easily changed today. We might even call such code “garbage”, but what does “garbage” really mean? In our app it has come to mean code that works, is well tested, but resists change more than we would like. We are whining about having to do more work. If only our past selves had properly separated the concerns more, if only there was more time for refactoring. But some day we will reach that shining castle on the hill. And there will be ponies and rainbows for all.

Expanding the Raleigh Office

Posted 3 months back at GIANT ROBOTS SMASHING INTO OTHER GIANT ROBOTS - Home

Last September, we opened an office in Raleigh, NC. Jason moved down from the Boston office, and I moved back to the Triangle after a tour of Atlanta, Charlotte, and Boston. We couldn’t wait to get involved with the fast-growing tech scene, enjoy the small-town feel, and settle into the best city in the country for raising a family.

We moved into the American Underground in Raleigh, downtown on Fayetteville Street. We’re in a private office in an awesome co-working space. It’s an exciting environment, with folks from Uber, Google, Mozilla, and local startups like Bandwidth Labs, Groundfloor, and Photofy.

American Underground Raleigh

We’ve met hundreds of people in the community, given talks at the JavaScript and Ruby meetups, consulted with teams out of The Startup Factory, and argued about barbecue.

We finally feel like we’ve put down our roots, and now we’re looking to grow.

We’re looking for an experienced developer

Web developers at thoughtbot are able to rapidly build high-quality, fully test-driven Ruby on Rails applications. Well-qualified candidates have an excellent knowledge of HTML, CSS, JavaScript, SQL, Unix, deployment, performance, debugging, refactoring, design patterns, and other best practices.

We’re looking for an experienced designer

Product designers at thoughtbot are fully capable of creating great visual design as well as doing great product design and user experience, and then implementing their designs with HTML and CSS (Sass). We don’t have front-end developers, preferring smaller, more integrated teams that can directly realize their vision.

Benefits and perks

These are full-time positions with competitive salary and exceptional benefits, which include unlimited time off, paid conference expenses, and 100% of medical premiums paid.

Quit your job and come work with us. Head to thoughtbot.com/jobs and get in touch.

Three Things To Know About Composition

Posted 4 months back at Luca Guidi - Home

A few days ago Aaron Patterson wrote an interesting article about composition vs inheritance in Ruby.

He says that when we inherit our classes directly from Ruby’s core objects such as Array, the public API of that object becomes too large and difficult to maintain.

Consider a powerful object like String, which has 164 public methods: once our library is released, we have to maintain all of that surface area. It isn’t worth the trouble, especially since we probably only wanted to pick a few methods from it. It’s better to compose an object that hides all the complexity derived from String, and to expose only the wanted behaviors.
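As a minimal sketch of that idea (the Name class below is hypothetical, not from Aaron’s article or from Lotus), we can wrap a String and expose only the couple of behaviors we actually care about:

class Name
  def initialize(value)
    @value = value.to_s
  end

  # Expose only the behaviors we want, instead of all 164 String methods
  def upcase
    @value.upcase
  end

  def to_s
    @value.dup
  end
end

Name.new('lotus').upcase # => "LOTUS"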

I was already aware of these issues, but that article was a reminder for fixing my OSS projects. For this reason I refactored Lotus::Utils::LoadPaths. It used to inherit from Array (169 methods), but after breaking the inheritance structure, I discovered that I only needed 2 methods.

However, there are some hidden corners that are worth sharing.

Information escape

A characteristic that I want for LoadPaths is the ability to add paths to it. After the refactoring, for the sake of consistency, I decided to name this method after Array’s #push, and to mimic its behavior.

The initial implementation of this method was:

it 'returns self so multiple operations can be performed' do
  paths = Lotus::Utils::LoadPaths.new

  paths.push('..').
        push('../..')

  paths.must_include '..'
  paths.must_include '../..'
end

class Lotus::Utils::LoadPaths
  # ...

  def push(*paths)
    @paths.push(*paths)
  end
end

When we use Ruby’s #push, the return value is the array itself, because the language’s designers wanted to make chained calls possible. If we look at the implementation of our method, the implicit return value was @paths (instead of self), so subsequent invocations were directly manipulating @paths.

The test above was passing because arrays are referenced by their memory address, so the changes that happened on the outside (after the accidental escape) were also visible to the wrapping object (LoadPaths). Because our main goal is to encapsulate that object, we want to prevent situations like this.

it 'returns self so multiple operations can be performed' do
  paths = Lotus::Utils::LoadPaths.new

  returning = paths.push('.')
  returning.must_be_same_as(paths)

  paths.push('..').
        push('../..')

  paths.must_include '.'
  paths.must_include '..'
  paths.must_include '../..'
end

class Lotus::Utils::LoadPaths
  # ...

  def push(*paths)
    @paths.push(*paths)
    self
  end
end

Dup and Clone

LoadPaths is used by other Lotus libraries, such as Lotus::View. This framework can be “duplicated” with the goal of easing a microservices architecture, where a developer can define MyApp::Web::View and MyApp::Api::View as “copies” of Lotus::View that can independently coexist in the same Ruby process. In other words, the configurations of one “copy” shouldn’t be propagated to the others.

While LoadPaths inherited from Array, a simple call to #dup was enough to get a fresh, decoupled copy of the same data. Now the object is duplicated, but not the variables that it encapsulates (@paths).

paths1 = Lotus::Utils::LoadPaths.new
paths2 = paths1.dup

paths2.push '..'
paths1.include?('..') # => true, which is an unwanted result

The reason for this failure is the same as for the information escaping problem: we’re referencing the same array. Ruby has a special callback method that is designed for cases like this.

class Lotus::Utils::LoadPaths
  # ...

  def initialize_copy(original)
    @paths = original.instance_variable_get(:@paths).dup
  end
end

Now, when paths1.dup is called, the @paths instance variable is duplicated as well, and we can safely change paths2 without affecting paths1.
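With initialize_copy in place, the earlier scenario behaves as expected; a quick sketch, reusing the same include? check from above:

paths1 = Lotus::Utils::LoadPaths.new
paths2 = paths1.dup

paths2.push '..'
paths1.include?('..') # => false, each copy now works on its own @paths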

Freeze

A similar problem arises for #freeze. I want Lotus::View to freeze its configurations after the application is loaded. This immutability will prevent accidental changes that may lead to software defects.

When we try to alter the state of a frozen object, Ruby raises a RuntimeError, but this wasn’t the case for LoadPaths.

paths = Lotus::Utils::LoadPaths.new
paths.freeze
paths.frozen? # => true

paths.push '.' # => It wasn't raising RuntimeError

This had an easy fix:

class Lotus::Utils::LoadPaths
  # ...

  def freeze
    super
    @paths.freeze
  end
end
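With this in place, mutating a frozen LoadPaths fails loudly. A quick sketch of the expected behavior (the exact error message depends on your Ruby version):

paths = Lotus::Utils::LoadPaths.new
paths.freeze

paths.push '.' # => raises RuntimeError: can't modify frozen Array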

Conclusion

Composition should be preferred over inheritance, but beware of the unexpected behaviors.

I discovered these problems in a matter of minutes, because the client code of this object (Lotus::View) has integration tests that assert all these features without assuming anything about the underlying objects. For instance, it checks all the attributes of a configuration one by one after its duplication, without trusting that they can safely duplicate themselves. This double-layered testing strategy is fundamental for me while building Lotus.

Testing from the Outside-In

Posted 4 months back at GIANT ROBOTS SMASHING INTO OTHER GIANT ROBOTS - Home

A few months ago my colleague Josh Steiner wrote a comprehensive post on How We Test Rails Applications, detailing the different types of tests we write and the various technologies that go with them. In this follow up, we will take a closer look at thoughtbot’s testing workflow.

We use a process known as “Outside-in testing”, driving our development from high-level tests and working our way down to lower-level concerns. Say we are working on an e-commerce site and want to implement the following story:

As a guest, I can add items to my shopping cart so that I can keep on shopping

Before we start thinking about models, controllers, or other architectural concerns we write a high-level RSpec feature test that describes the behavior from the user’s perspective.

# spec/features/guest_adds_items_to_shopping_cart_spec.rb
feature 'Guest adds items to shopping cart' do
  scenario 'via search' do
    item = create(:item)

    visit root_path
    fill_in 'Search', with: item.name
    click_on 'Search Catalogue'

    click_on item.name
    click_on 'Add to Cart'
    click_on 'Shopping Cart'

    expect(page).to have_content(item.name)
    expect(page).to have_content("Subtotal: #{item.price}")
  end
end

Depending on how much of the application is implemented, this test could break in multiple places. If this were a newly-generated application we might need to implement a home page. Once we have a home page we would probably get an error while attempting to use the search bar saying that ‘No such route exists’. This leads us to implement a /items route.

# config/routes.rb
# ...
resources :items, only: [:index]
# ...

The next few errors walk us through creating an ItemsController with an empty index action and corresponding view. Now that we can successfully click on “Search Catalogue”, we get an error saying that the desired item does not appear in the search results, so we expose some items in the controller and display them in the view.

# app/controllers/items_controller.rb
#...
def index
  @items = Item.search(params[:search_query])
end
#...
# app/views/items/index.html.erb
<% @items.each do |item| %>
  <%= link_to item.name, item %>
<% end %>

This gives us a new error saying that there is no method search defined on Item. At this point, we drop down a level of abstraction and write a unit test for Item.

# spec/models/item_spec.rb
describe Item, '.search' do
  it 'filters items by the search term' do
    desired_item = create(:item)
    other_item = create(:item)

    expect(Item.search(desired_item.name)).to eq [desired_item]
  end
end

This test leads us to correctly implement Item.search:

# app/models/item.rb
#...
def self.search(term)
  where(name: term)
end
#...

Now the unit test passes so we go back up to our feature test. We can successfully click on the item’s name in the search results!

We keep following this pattern for the remaining test failures, dropping down to the unit test level when necessary, until we have a green test suite. Now our story has been successfully implemented!

Mocking and Stubbing

The goal of a feature test is to test the real system from end-to-end from the user’s perspective. To do this, we use real database records and don’t mock or stub any of our objects. We do stub calls to external websites (via webmock or a fake) since the network can be unreliable. Our tests should run without an internet connection.

When dropping down to the unit test level, we aggressively mock and stub out dependencies and collaborators. The goal of a unit test is to prove the functionality of the object being tested, not the functionality of its collaborators. Difficulty in testing two objects in isolation from each other often points to overly tight coupling between them.
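For example, a unit test for a hypothetical Order that charges a PaymentGateway collaborator might stub the gateway entirely. These classes are illustrative sketches, not part of the story above:

# spec/models/order_spec.rb
describe Order, '#purchase' do
  it 'charges the gateway for the order total' do
    # Stand in for the real gateway so the test never touches the network
    gateway = double('PaymentGateway')
    allow(gateway).to receive(:charge).and_return(true)
    order = Order.new(total: 100, gateway: gateway)

    order.purchase

    # Prove only Order's behavior: it asked its collaborator to charge the total
    expect(gateway).to have_received(:charge).with(100)
  end
end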

Further Reading

For some more great articles on testing, check out:

Analyzing Minard's Visualization Of Napoleon's 1812 March

Posted 4 months back at GIANT ROBOTS SMASHING INTO OTHER GIANT ROBOTS - Home

In The Visual Display of Quantitative Information, Edward Tufte calls Minard’s graphic of Napoleon in Russia one of the “best statistical drawings ever created.” But what makes it so good?

Before we analyze this graphic, we need to know a bit of history.

The year is 1812, and Napoleon is doing pretty well for himself. He has most of Europe under his control, except for the UK. No matter how many times he has tried to invade, he can’t break through their defenses. His plan is to place an embargo on them, forcing the other European countries to stop trade with the UK, which would weaken them enough that Napoleon could invade and take over easily.

Czar Alexander of Russia sees that Napoleon is becoming too powerful, so he refuses to participate in this embargo. Angry at Czar Alexander’s decision, Napoleon gathers a massive army of over 400,000 to attack Russia in June of 1812. While Russia’s troops are not as numerous as France’s, Russia has a plan. Russian troops keep retreating as Napoleon’s troops move forward, burning everything they pass and ensuring that the French forces cannot take anything from their environment. Eventually the French army follows the Russian army all the way to Moscow during October, suffering major losses from lack of food. By the time Napoleon gets to Moscow, he knows he has to retreat. As winter settles into Europe and the temperature drops, Napoleon’s troops suffer even more losses on the return to France, from lack of food, disease, and weather conditions.

Let’s look at all the data we have for this battle.

We have the numbers of Napoleon’s troops by location (longitude), organized by group and direction. We can plot them on line graphs like so.

Next, the temperature experienced by his troops when winter settled in on the return trip.

We also have the path that his troops took to and from Moscow. We can display this information by plotting the paths on a map.

Finally, here is Minard’s graphic.

Source

We have many dimensions of data that take several individual graphs to represent. Minard’s graphic is quite clever because of its ability to combine all of these dimensions: loss of life at a time and location, temperature, geography, and historical context, all in one single graphic. He shows these various details without distracting text or labels as well. For example, he displays the points where Napoleon’s troops divide into subgroups by breaking out the main bar into branches. He adds thin lines to represent river crossings on the return trip that further decimated Napoleon’s diminishing troops. And he is able to show the drastic loss of life from Napoleon’s decision in just a single corner of the diagram.

Beginning vs end of the campaign

The beginning of Napoleon’s march vs the end of his retreat.

Equally important as what’s shown is what’s not shown — here’s an earlier example of a well-published data visualization:

Source

This graphic was created by William Playfair, largely considered to be the father of information design, in 1786 — about 100 years before Minard’s diagram was made. Playfair is the inventor of the pie chart, the bar graph, and the line graph - statistical graphics we use regularly today. This graphic has gridlines to mark the years and the number of exports. There are five colors, each color representing something different, as well as a number of specific labels in large text. Compare this to Minard’s graphic: when he draws the map, there are no geographical borders and only very minimal geographical plotting. There are many labels describing cities and number of troops, but they are very small. Minard only uses two colors to represent all of the data in the graphic. Additional labels, gridlines, and geographical markings would have made the graphic too overwhelming for the eye, which makes the impact of the data so much stronger.

There are some similarities between designing print data graphics and designing modern interfaces for mobile and web. When we need to translate numbers into graphics for users, we want to focus on communicating lots of information without overwhelming the users with extraneous content, much like Minard did with his visualization of Napoleon’s March. Both designers created effective graphics to turn numbers into a narrative, but Minard was able to tell a much more detailed story with his design techniques.

What’s next?

If you would like to learn more about information design, check out Edward Tufte’s The Visual Display of Quantitative Information or any of his other books. You can also take one of his courses in your city.

You can also read:

Episode #470 - June 6th, 2014

Posted 4 months back at Ruby5

Rails and Sinatra learn to share, Code Climate make a case for Protocol Buffers, read up on HTTP API design, and dat science.

Listen to this episode on Ruby5

Sponsored by New Relic

New Relic is _the_ all-in-one web performance analytics product. It lets you manage and monitor web application performance, from the browser down to the line of code. With Real User Monitoring, New Relic users can see browser response times by geographical location of the user, or by browser type.
This episode is sponsored by New Relic

Share Rails & Sinatra Sessions

Learn a few ways to share authentication between Rails and Sinatra apps.
Share Rails & Sinatra Sessions

Protocol Buffers vs JSON

Read this comparison of JSON and Protocol Buffers and decide for yourself which is right for your service.
Protocol Buffers vs JSON

HTTP API Design Guide

A guide that describes a set of HTTP+JSON API design practices, originally extracted from work on the Heroku Platform API.
HTTP API Design Guide

dat-science

A Ruby library for carefully refactoring critical paths.
dat-science

Episode #469 - June 3rd, 2014

Posted 4 months back at Ruby5

This week we talk about the brand new RSpec 3 and Git 2, but also how to use bower-rails instead of JavaScript gems, JSON responses with PostgreSQL, and the quirks of Ruby serialization & enumeration.

Listen to this episode on Ruby5

Sponsored by Codeship

Codeship is a hosted Continuous Deployment Service that just works. Set up Continuous Integration in a few steps and automatically deploy when all your tests have passed. Codeship has great support for lots of languages and test frameworks. It integrates with GitHub and BitBucket. You can start with their free plan. Setup only takes 1 minute. You can find them on codeship.io/ruby5 and remember check out their great blog.
Codeship

RSpec 3

Whether you test-drive or not, you should be excited about the release of RSpec 3 yesterday. Myron Marston — the project maintainer — wrote a humongous and deliciously detailed blog post about all these Notable Changes. First, RSpec 3.0 adds better support for Ruby 2.0 features like keyword arguments and Module prepending. That said they dropped Ruby 1.8.6 and 1.9.1 support, but at this point who cares. The RSpec team provided version 2.99 which will output deprecation warnings for anything that changes in RSpec 3.0. It’s pretty great and if you’re worried that your build logs will be littered with deprecations, you can configure RSpec to dump those warnings into a separate file. before(:each) has been aliased to before(:example) and before(:all) to before(:context) in order to make the scope of those hooks more explicit. Every method in the RSpec DSL now yields the RSpec example object instead of exposing an `example` method that could interfere with your own specs. This version also brings an eagerly awaited Zero Monkey Patching mode which should silence one of the biggest criticisms of RSpec: that it monkey patched the Object class with tons of RSpec-specific methods. You can now use expect(object).to equal something, which to me reads quite well and allows for cleaner specs because it makes the expectation part of an example painfully obvious. Zero monkey patching also extends to rspec-mocks, so you now allow(object).to receive(:methodname).and_return(value). Same thing for stubs. You can even expose the RSpec DSL globally in order to avoid monkey patching Ruby’s `main` and Module to call the describe method for example.
RSpec 3

Git 2.0.0

Felipe Contreras took the time to decipher the Git developers’ cryptic prose to figure out what changed in Git 2.0.0 and why you should care. Instead of pushing all branches that exist locally and on your remote, Git now defaults to “simple” pushing by only dealing with the branch currently checked out. `git add` now defaults to also staging removed files, so no need to add the uppercase `A` flag to force it to add all changes. `git add` with no arguments also has more predictable behavior now and will add anything that was modified within the repo.
Git 2.0.0

bower-rails

In the JavaScript world, Bower seems to be the preferred package manager, so why not use Bower inside your Rails app instead? This is something the bower-rails gem lets you do. If you’re not a fan of JSON you can use bower-rails’ own Ruby DSL, which even allows you to use the familiar group method from Bundler’s Gemfile. Oh, and just like Bundler, you can declare dependencies stored on Bower’s repository, any public Git repository, or even with a simple GitHub user/repo format. The gem comes with a bunch of rake tasks that allow you to install, update, and list dependencies as you need to.
bower-rails

Avoid Rails when Generating JSON responses with PostgreSQL

In “Avoid Rails when generating JSON responses with Postgres” Dan McClain shows you all the steps that Rails typically has to go through to get data from Postgres and into your custom JSON format. Then he shows you how to craft a huge Postgres query that will generate pure JSON much quicker than Rails. Dan also created a new PostgresEXT-Serializers gem. This gem monkey patches ActiveModel Serializers, and then every time you try to serialize an ActiveRecord Relation it will take over and push the work down to Postgres. The only caveat is that this is a VERY new library, so use with caution.
Avoid Rails when Generating JSON responses with PostgreSQL

Ruby Serialization & Enumeration

It was refreshing to see Xavier Shay from the Square team write about dealing with a weird serialization error with ActiveSupport. In the blog post titled Ruby Serialization & Enumeration, he praises the transparency of Ruby’s libraries and mentions how easy it is to pry them open, jump inside and twiddle stuff around to figure out what’s going wrong. If you’ve never used debugger or pry to jump inside of some of the gems that you use every day, this post shows how useful it can be. That, and it forces you to read more code written by other, potentially more skilled, developers.
Ruby Serialization & Enumeration

Sponsored by Top Ruby Jobs

If you're looking for a top Ruby job or for top Ruby talent, then you should check out Top Ruby Jobs. Top Ruby Jobs is a website dedicated to the best jobs available in the Ruby community.
Top Ruby Jobs

Sponsored by Ruby5

Ruby5 is released Tuesday and Friday mornings. To stay informed about and active with this podcast, we encourage you to do one of the following:

Thank You for Listening to Ruby5

Episode #468 - May 30th, 2014

Posted 4 months back at Ruby5

Understanding the magic of routes in Rails, a simple rest-client, using PostGIS with Google Maps, setting up SSL with the latest toys, an examination of Rails vs. Sinatra, and how to get your developers blogging, all in this episode of the Ruby5!

Listen to this episode on Ruby5

Sponsored by New Relic

New Relic is _the_ all-in-one web performance analytics product. It lets you manage and monitor web application performance, from the browser down to the line of code. With Real User Monitoring, New Relic users can see browser response times by geographical location of the user, or by browser type.
This episode is sponsored by New Relic

Magical Routes

Routing in Rails...let me count the ways. The URL helpers are very flexible, and this blog post walks you through the different parts, their options, and when to use them.
Magical Routes

REST Client

Probably the simplest REST client for Ruby.
REST Client

PostGIS and Google Maps

This two part blog series walks you through using PostGIS with Rails and Google Maps!
PostGIS and Google Maps

SSL for Rails with Heroku and DNSimple

Setting up SSL doesn't have to be a hassle if you're using the right services. Check out this blog post from thoughtbot to see what a wonderful world it can be!
SSL for Rails with Heroku and DNSimple

Rails vs. Sinatra

Which do you choose? Rails or Sinatra? Unfortunately for us, it is never quite that simple. This blog post from Engine Yard gives a good explanation of the differences in getting started with the two frameworks.
Rails vs. Sinatra

How to Get Developers to Write a Blogpost

Been wanting to write a blog post but letting fear, uncertainty, and doubt get in the way? Check out this post from Netguru to get over yourself.
How to Get Developers to Write a Blogpost

iOS Text Kit Basics

Posted 4 months back at GIANT ROBOTS SMASHING INTO OTHER GIANT ROBOTS - Home

In the iOS 7 SDK, Apple gave developers programmatic access to Text Kit, an advanced text layout system that lets you easily display and edit rich text in your applications. Although these capabilities are new to iOS, they’ve existed in other contexts (OS X since its first release, and OpenStep before that) for nearly two decades, in what’s called the Cocoa text system. In fact, if you’ve used OS X at all, there is a near 100% chance that you’ve run apps built with the Cocoa text system.

However, for iOS developers, who often aren't steeped in OS X development, the supplied text layout system is new territory and may seem mysterious at first. I intend to help you understand the basics of how it works, and show how you can add rich text features to your own apps.

The simplest thing

Let’s start off by making the simplest possible app that shows off some of what Text Kit can do. In Xcode, create a new project using the Single View Application template, and name it Simple Text View. Select Main.storyboard, use the Object Library to find a UITextView object, and drag it out to fill the view of the available view controller; you’ll see blue guidelines appear and the whole thing snap into place when it’s properly centered. Then use the Attributes Inspector to change the text view’s Text attribute from Plain to Attributed. This tells the text view to allow rich text by using an attributed string.

An attributed string, represented in iOS by the NSAttributedString class, is simply a string with some attached metadata describing its attributes. This metadata may contain any number of character ranges, each with its own set of attributes. For example, you could specify that starting at the fifth character, the next six characters are bold, and that starting at the tenth character, the next five characters are italicized; in that case, the tenth character would be both bold and italicized. In effect, you’d be layering overlapping styled ranges on top of a plain string like this one:

0123456789ABCDEF
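
To make the idea concrete, here is a minimal sketch, written for this article rather than taken from the sample project, of building that attributed string in code. It uses the obliqueness attribute to stand in for italics so that the effect composes with the bold font attribute on the overlapping character:

// Build the example string and style two overlapping ranges.
NSMutableAttributedString *string =
    [[NSMutableAttributedString alloc] initWithString:@"0123456789ABCDEF"];

// Characters five through ten (zero-based range {4, 6}) become bold.
[string addAttribute:NSFontAttributeName
               value:[UIFont boldSystemFontOfSize:17]
               range:NSMakeRange(4, 6)];

// Characters ten through fourteen (zero-based range {9, 5}) are slanted to
// read as italics; the character at index 9 ends up both bold and slanted.
[string addAttribute:NSObliquenessAttributeName
               value:@0.25
               range:NSMakeRange(9, 5)];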

However, plenty of rich text content is created not by programmatically specifying ranges and attributes, but by users working in an editor that lets them create rich text. That’s a use case that is fully supported by UITextView starting in iOS 7.

UITextView in Interface Builder

To prove this, use the Attributes Inspector to modify parts of the “Lorem ipsum” text that the view contains by default. Use the controls in the inspector to change some fonts, adjust paragraph alignment, set foreground and background colors, whatever you want. When you hit cmd-R to run the app in the iOS Simulator or on a device, you’ll see that all the formatting changes you made show up on the device. You can tap to edit the text at any point, and the formatting in effect where the cursor sits will carry over to new characters you type, just as you’d expect from any word processor.

The innards

So far, so good. Even better, it turns out that a few other popular UIKit classes, namely UILabel and UITextField, also allow the use of attributed strings in iOS 7. This means that if you just want to display some rich text in a single rectangular box, you’re all set. Just put a properly configured UILabel where you want to show your rich text, and you’re done! This simple task was remarkably hard to accomplish before iOS 7, so right there we’ve made a huge leap.
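
As a quick illustration, here is a sketch of handing a UILabel some rich text; the label, frame, and text are made up for the example rather than coming from the project above:

// Create a label (it could just as easily come from a storyboard) and give
// it an attributed string with a bold red font.
UILabel *label = [[UILabel alloc] initWithFrame:CGRectMake(20, 40, 280, 44)];

NSDictionary *attributes = @{
    NSFontAttributeName: [UIFont boldSystemFontOfSize:18],
    NSForegroundColorAttributeName: [UIColor redColor]
};
label.attributedText =
    [[NSAttributedString alloc] initWithString:@"Rich text in a label"
                                    attributes:attributes];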

But, what if you want to do more? There are certain kinds of layout tricks that none of the UIKit classes can do on their own, out of the box. For example, if you want to make text flow around a graphic, or make a single string fill up one rectangle before spilling into another (as in the case of multiple columns), you’ll have to do more. Fortunately, the innards of Text Kit, which are used by UITextView and the rest, are at your disposal in the form of the NSTextStorage, NSLayoutManager, and NSTextContainer classes. Let’s talk about these one by one:

  • NSTextStorage is actually a subclass of NSMutableAttributedString, which itself is a subclass of NSAttributedString. It adds some functionality that is useful for dealing with a user editing text, and nothing more.
  • NSTextContainer is an abstract description of a two-dimensional box that text could be rendered into. Basically, this class is little more than a glorified size. It contains a few parameters for describing how text should behave when rendered within a box of its size, and that’s about it.
  • NSLayoutManager is the real brains of the operation. It knows how to take an NSTextStorage instance and lay out all the characters it contains into the virtual boxes described by one or more NSTextContainers.

A class like UITextView uses these components to do all its text layout. In fact, UITextView has three properties called textStorage, textContainer, and layoutManager for just this purpose. When UITextView wants to draw its content, it tells its layoutManager to figure out which glyphs (the graphical representations of the characters it contains) from its textStorage can fit within its textContainer, then it tells the layoutManager to actually draw those glyphs at a point inside the text view’s frame. So you see that the design of UITextView itself is inherently limited to a single rectangle. In order to get a feel for how these innards work, I’ll now show you a UIView subclass that will display rich text in multiple columns, a trick that UITextView really can’t pull off in its current form.
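
To make those relationships concrete before we build the multi-column view, here is a quick sketch (the frame and string are arbitrary, and this code isn’t part of the sample project) of wiring the three classes together by hand and handing the container to a UITextView:

// The storage holds the (attributed) text.
NSTextStorage *storage =
    [[NSTextStorage alloc] initWithString:@"Hello, Text Kit."];

// The layout manager turns characters from the storage into glyphs.
NSLayoutManager *layoutManager = [[NSLayoutManager alloc] init];
[storage addLayoutManager:layoutManager];

// The container describes the box those glyphs are laid out into.
NSTextContainer *container =
    [[NSTextContainer alloc] initWithSize:CGSizeMake(300, 400)];
[layoutManager addTextContainer:container];

// A text view can be created around an existing container (iOS 7 and up).
UITextView *textView =
    [[UITextView alloc] initWithFrame:CGRectMake(0, 0, 300, 400)
                        textContainer:container];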

Create TBTMultiColumnTextView

In your open Xcode project, create a new subclass of UIView called TBTMultiColumnTextView. Like UITextView, this class will have textStorage and layoutManager properties. Unlike UITextView, it will keep track of multiple independent text containers, along with an origin point for drawing each one. The first thing to do is create a class extension at the top of the file containing the following properties:

@interface TBTMultiColumnTextView ()

@property (copy, nonatomic) NSTextStorage *textStorage;
@property (strong, nonatomic) NSArray *textOrigins;
@property (strong, nonatomic) NSLayoutManager *layoutManager;

@end

Besides the NSTextStorage and NSLayoutManager instances, we’re also going to maintain an array of origins, each corresponding to an NSTextContainer. We don’t have to hang onto the text containers themselves, because the layout manager keeps its own list, which we can access.

Now, let’s get started with the methods for this class. First, override awakeFromNib as shown here:

- (void)awakeFromNib {
    [super awakeFromNib];

    // Create the layout manager that will do all of this view's text layout.
    self.layoutManager = [[NSLayoutManager alloc] init];

    // Load an RTF file from the app bundle into our text storage. The
    // textStorage setter below hooks the storage up to the layout manager.
    NSURL *fileURL = [[NSBundle mainBundle] URLForResource:@"constitution"
                                             withExtension:@"rtf"];
    self.textStorage = [[NSTextStorage alloc]
           initWithFileURL:fileURL
                   options:@{NSDocumentTypeDocumentAttribute: NSRTFTextDocumentType}
        documentAttributes:nil
                     error:nil];

    [self createColumns];
}

This method is pretty straightforward. It starts off by creating a layout manager, which we’ll use every time we need to draw this object’s content. Then it reads an RTF file that we’ve included in the project into an NSTextStorage instance. Ours contains the U.S. Constitution, but you can use any RTF document you have at hand. Since this object will need to be redrawn any time the text storage changes, we implement the setter ourselves, like this:

- (void)setTextStorage:(NSTextStorage *)textStorage {
    // Sending -copy to an NSTextStorage yields an immutable attributed string,
    // so explicitly build a fresh NSTextStorage from the incoming value.
    _textStorage = [[NSTextStorage alloc] initWithAttributedString:textStorage];
    [self.textStorage addLayoutManager:self.layoutManager];
    [self setNeedsDisplay];
}

Note that we have a special way of making a new copy of the object that’s passed in. As it turns out, just sending copy to an instance of NSTextStorage actually returns an instance of an immutable parent class (just like you’d expect with, say, an NSMutableString). That’s why we take the step of explicitly creating a new instance based on the received parameter.

At the end of awakeFromNib, we called the createColumns method, which is where most of this class’s work really happens. It looks like this:

- (void)createColumns {
    // Remove any existing text containers, since we will recreate them.
    for (NSUInteger i = [self.layoutManager.textContainers count]; i > 0;) {
        [self.layoutManager removeTextContainerAtIndex:--i];
    }

    // Capture some frequently-used geometry values in local variables.
    CGRect bounds = self.bounds;
    CGFloat x = bounds.origin.x;
    CGFloat y = bounds.origin.y;

    // These are effectively constants. If you want to make this class more
    // extensible, turning these into public properties would be a nice start!
    NSUInteger columnCount = 2;
    CGFloat interColumnMargin = 10;

    // Calculate sizes for building a series of text containers.
    CGFloat totalMargin = interColumnMargin * (columnCount - 1);
    CGFloat columnWidth = (bounds.size.width - totalMargin) / columnCount;
    CGSize columnSize = CGSizeMake(columnWidth, bounds.size.height);

    NSMutableArray *origins = [NSMutableArray arrayWithCapacity:columnCount];

    for (NSUInteger i = 0; i < columnCount; i++) {
        // Create a new container of the appropriate size. The layout manager
        // keeps its own list of containers, so we don't store it ourselves.
        NSTextContainer *container = [[NSTextContainer alloc] initWithSize:columnSize];

        // Create a new origin point for the container we just added.
        NSValue *originValue = [NSValue valueWithCGPoint:CGPointMake(x, y)];
        [origins addObject:originValue];

        [self.layoutManager addTextContainer:container];
        x += columnWidth + interColumnMargin;
    }
    self.textOrigins = origins;
}

This method is honestly a little longer than we’d like, but for this example it does the job. It may be called again whenever the view’s geometry changes (such as when the device rotates), so it needs to be able to run repeatedly without ending up in a weird state. That’s why it starts off by removing any old text containers that may be attached to the layout manager: the whole point of this method is to create a fresh set of text containers, and having old ones lying around will only give us grief. It then calculates an appropriate text container size from the view’s size and some hard-coded values for the number of columns and the margin between them. For example, in a 320-point-wide view with two columns and a 10-point margin, each column comes out to (320 - 10) / 2 = 155 points wide. Finally, it creates and configures that number of containers and an equal number of origin points (wrapped in NSValue objects).

Next we’re going to make use of all those containers and points we just created. The drawRect: method tells the layout manager to finally draw its content into each text container. It looks like this:

- (void)drawRect:(CGRect)rect {
    for (NSUInteger i = 0; i < [self.layoutManager.textContainers count]; i++) {
        NSTextContainer *container = self.layoutManager.textContainers[i];
        CGPoint origin = [self.textOrigins[i] CGPointValue];

        NSRange glyphRange = [self.layoutManager glyphRangeForTextContainer:container];

        [self.layoutManager drawGlyphsForGlyphRange:glyphRange atPoint:origin];
    }
}

All we do here is loop over all available text containers and origin points, each time asking the layout manager which glyphs it can fit into the container, then telling the layout manager to draw those glyphs at the origin point.

That’s just about all we need. To keep things working after device rotation, however, there is one more step. By overriding layoutSubviews, which is called whenever the view’s size changes (including when the device rotates), we can make sure the columns are regenerated for the new size:

- (void)layoutSubviews {
    [super layoutSubviews];
    [self createColumns];
    [self setNeedsDisplay];
}

That’s all we need to make this class draw rich text in two columns and automatically adjust to changes in view geometry. To see it in action, go back to the storyboard and follow these steps:

  • Remove the UITextView you added at the start.
  • Find a UIView in the object library, drag it into the view and make it fill the view completely.
  • Use the Identity Inspector to change this object’s class to TBTMultiColumnTextView.
  • To make sure the view’s geometry changes along with its superview (e.g. when the device rotates), add constraints from the view to its superview for top, bottom, leading, and trailing space. This is most easily accomplished by clicking the Pin button at the bottom of Interface Builder’s editing area, and selecting each of the four red, dashed-line symbols surrounding the little square (which represents the selected view). That sounds complicated, but once you see it, you’ll get it.

Once you’ve taken those final steps, you can build and run in the simulator or on a device, and see your multicolumn display in all its glory!

Multi Column Text View running on the iOS Simulator

Closing remarks

This class demonstrates a technique for creating a view that lets rich text flow across multiple columns in just a few lines of code. But we’re really just scratching the surface here. Besides flowing across multiple rectangles, Text Kit will let you do plenty of other things, including drawing text inside the path of an arbitrary shape, making text flow around other paths, and more. You can learn more about these techniques by looking at Apple’s iOS Text Kit Overview, as well as their Mac documentation for the Cocoa text system, which is where much of Text Kit’s functionality originated.
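
As a parting example, here is a quick sketch of one of those tricks: flowing text around a shape by giving a text view’s container an exclusion path. The textView variable is assumed to be an existing UITextView, and this snippet is not part of the columns example above:

// Carve a circular hole out of the text container; the layout manager will
// flow the text view's content around it.
UIBezierPath *circle =
    [UIBezierPath bezierPathWithOvalInRect:CGRectMake(100, 100, 120, 120)];
textView.textContainer.exclusionPaths = @[circle];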