Notes on 'Notes on "Counting Tree Nodes"'

Posted about 1 month back at - Home

Having now finished watching Tom’s episode of Peer to Peer, I finally got around to reading his Notes on “Counting Tree Nodes” supplementary blog post. There are a couple of ideas he presents that are so interesting that I wanted to highlight them again here.

If you haven’t seen the video, then I’d still strongly encourage you to read the blog post. While I can now see the inspiration for wanting to discuss these ideas [1], the post really does stand on its own.

Notes on ‘Enumerators’

Here’s the relevant section of the blog post. Go read it now!

I’m not going to re-explain it here, so yes, really, go read it now.

What I found really interesting here was the idea of building new enumerators by re-combining existing enumerators. I’ll use a different example, one that is perhaps a bit too simplistic (there are more concise ways of doing this in Ruby), but hopefully it will illustrate the point.

Let’s imagine you have an Enumerator which enumerates the numbers from 1 up to 10:

> numbers = 1.upto(10)
=> #<Enumerator: 1:upto(10)>
> numbers.next
=> 1
> numbers.next
=> 2
> numbers.next
=> 3
...
> numbers.next
=> 9
> numbers.next
=> 10
> numbers.next
StopIteration: iteration reached an end

You can now use that to do all sorts of enumerable things like mapping, selecting, injecting and so on. But you can also build new enumerables using it. Say, for example, we now only want to iterate over the odd numbers between 1 and 10.

We can build a new Enumerator that re-uses our existing one:

> odd_numbers = Enumerator.new do |yielder|
    numbers.each do |number|
      yielder.yield number if number.odd?
    end
  end
=> #<Enumerator: #<Enumerator::Generator:0x007fc0b38de6b0>:each>

Let’s see it in action:

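> odd_numbers.next
=> 1
> odd_numbers.next
=> 3
> odd_numbers.next
=> 5
> odd_numbers.next
=> 7
> odd_numbers.next
=> 9
> odd_numbers.next
StopIteration: iteration reached an end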

So, that’s quite neat (albeit somewhat convoluted compared to 1.upto(10).select(&:odd?)). To extend this further, let’s imagine that I hate the lucky number 7, so I also don’t want that to be included. In fact, somewhat perversely, I want to stick it right in the face of superstition by replacing 7 with the unluckiest number, 13.

Yes, I know this is weird, but bear with me. If you have read Tom’s post (go read it), you’ll already know that this can also be achieved with a new enumerator:

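> odd_numbers_that_arent_lucky = Enumerator.new do |yielder|
    odd_numbers.each do |odd_number|
      if odd_number == 7
        yielder.yield 13
      else
        yielder.yield odd_number
      end
    end
  end
=> #<Enumerator: #<Enumerator::Generator:0x007fc0b38de6b0>:each>
> odd_numbers_that_arent_lucky.next
=> 1
> odd_numbers_that_arent_lucky.next
=> 3
> odd_numbers_that_arent_lucky.next
=> 5
> odd_numbers_that_arent_lucky.next
=> 13
> odd_numbers_that_arent_lucky.next
=> 9
> odd_numbers_that_arent_lucky.next
StopIteration: iteration reached an end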

In Tom’s post he shows how this works, and how you can further compose enumerators to produce new enumerations with specific elements inserted at specific points, or elements removed, or even transformed, and so on.
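As a rough sketch of that kind of composition, here is a hypothetical helper of my own, insert_after (it is not code from Tom’s post), which wraps any enumerator and yields one extra element after a chosen one:

```ruby
# Hypothetical helper: wraps an existing enumerator and yields `extra`
# immediately after every element equal to `target`.
def insert_after(source, target, extra)
  Enumerator.new do |yielder|
    source.each do |element|
      yielder.yield element
      yielder.yield extra if element == target
    end
  end
end

numbers = 1.upto(5)
with_extra = insert_after(numbers, 3, :three_and_a_half)

with_extra.to_a # => [1, 2, 3, :three_and_a_half, 4, 5]
numbers.to_a    # => [1, 2, 3, 4, 5] (the original enumerator is untouched)
```

Because the helper takes and returns an enumerator, calls to it can be nested to stack up several insertions.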


A hidden history of enumerable transformations

What I find really interesting here is that somewhere in our odd_numbers enumerator, all the numbers still exist. We haven’t actually thrown anything away permanently; the numbers we don’t want just don’t appear while we are enumerating.

The enumerator odd_numbers_that_arent_lucky still contains (in a sense) all of the numbers between 1 and 10, and so in the tree composition example in Tom’s post, all the trees he creates with new nodes, or with nodes removed, still contain (in a sense) all those nodes.

It’s almost as if the history of the tree’s structure is encoded within the nesting of Enumerator instances, or as if those blocks passed to Enumerator.new act as a runnable description of the transformations to get from the original tree to the tree we have now, invoked each time any new tree’s children are enumerated over.
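One way to see this for yourself is to record every element the innermost enumerator produces. This is just a sketch under my own assumptions (the visited array is a hypothetical bit of instrumentation), but it suggests that the filtered-out numbers are still visited on every enumeration:

```ruby
visited = []

# The source enumerator, instrumented to record everything it produces.
numbers = Enumerator.new do |yielder|
  1.upto(10) do |number|
    visited << number
    yielder.yield number
  end
end

# The derived enumerator only lets odd numbers through.
odd_numbers = Enumerator.new do |yielder|
  numbers.each { |number| yielder.yield number if number.odd? }
end

result = odd_numbers.to_a

result  # => [1, 3, 5, 7, 9]
visited # => [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```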

I think that’s pretty interesting.

Notes on ‘Catamorphisms’

In the section on Catamorphisms (go read it now!), Tom goes on to show that recognising similarities in some methods points at a further abstraction that can be made – the fold – which opens up new possibilities when working with different kinds of structures.
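If you haven’t met the idea before, here is a minimal sketch of a fold over a tree. It is my own illustration with a hypothetical Node struct, not the code from Tom’s post, but it shows how counting, summing and measuring depth become the same traversal with a different block:

```ruby
# A hypothetical tree node: a value plus a list of child nodes.
Node = Struct.new(:value, :children) do
  # Fold: combine this node's value with the already-folded results
  # of its children, using whatever block the caller supplies.
  def fold(&combine)
    combine.call(value, children.map { |child| child.fold(&combine) })
  end
end

tree = Node.new(1, [
  Node.new(2, [Node.new(4, [])]),
  Node.new(3, [])
])

count = tree.fold { |_value, child_counts| 1 + child_counts.sum }
total = tree.fold { |value, child_totals| value + child_totals.sum }
depth = tree.fold { |_value, child_depths| 1 + (child_depths.max || 0) }

count # => 4
total # => 10
depth # => 3
```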

What’s interesting to me here isn’t anything about the code, but about the ability to recognise patterns and then exploit them. I am very jealous of Tom, because he’s not only very good at doing this, but also very good at explaining the ideas to others.

Academic vs pragmatic programming

This touches on the tension between the ‘academic’ and ‘pragmatic’ nature of working with software. This is something that comes up time and time again in our little sphere.

Now I’m not going to argue that anyone working in software development should have a degree in Computer Science. I’m pretty sympathetic with the idea that many “Computer Science” degrees don’t actually bear much direct resemblance to the kinds of work that most software developers do [2].

Ways to think

What I think university study provides, more than anything else, is exposure and training in ways to think that aren’t obvious or immediately accessible via our direct experience of the world. Many areas of study provide this, including those outside of what you might consider “science”. Learning a language can be learning a new way to think. Learning to interpret art, or poems, or history is learning a new way to think too.

Learning and internalising those ways to think give perspectives on problems that can yield insights and new approaches, and I propose that that, more than any other thing, is the hallmark of a good software developer.

Going back to the blog post which, as far as I know, sparked the tweet storm about “programming and maths”, I’d like to highlight this section:

At most academic CS schools, the explicit intent is that students learn programming as a byproduct of learning CS. Programming itself is seen as rather pedestrian, a sort of exercise left to the reader.

For actual developer jobs, by contrast, the two main skills you need these days are programming and communication. So while CS still does have strong ties to math, the ties between CS and programming are more tenuous. You might be able to say that math skills are required for computer science success, but you can’t necessarily say that they’re required for developer success.

What a good computer science (or maths or any other logic-focussed) education should teach you are ways to think that are specific to computation, algorithms and data manipulation, which then

  • provide the perspective to recognise patterns in problems and programs that are not obvious, or even easily intuited, and might otherwise be missed.
  • provide experience applying techniques to formulate solutions to those problems, and refactorings of those programs.

Plus, it’s fun to achieve that kind of insight into a problem. It’s the “a-ha!” moment that flips confusion and doubt into satisfaction and certainty. And these insights are also interesting in and of themselves, in the very same way that, say, study of art history or Shakespeare can be.

So, to be crystal clear, I’m not saying that you need this perspective to be a great programmer. I’m really not. You can build great software that both delights users and works elegantly underneath without any formal training. That is definitely true.

Back to that quote:

the ties between CS and programming are more tenuous … you can’t necessarily say that they’re required for developer success.

All I’m saying is this: the insights and perspectives gained by studying computer science are both useful and interesting. They can help you recognise existing, well-understood problems, and apply robust, well-understood and powerful solutions.

That’s the relevance of computer science to the work we do every day, and it would be a shame to forget that.

  1. In the last 15 minutes or so of the video, the approach Tom uses to add a “child node” to a tree is interesting but there’s not a huge amount of time to explore some of the subtle benefits of that approach

  2. Which is, and let’s be honest, a lot of “Get a record out of a database with an ORM, turn it into some strings, save it back into the database”.

Baseimage-docker 0.9.12 released

Posted about 1 month back at Phusion Corporate Blog

Baseimage-docker is a special Docker image that is configured for correct use within Docker containers. It is Ubuntu, plus modifications for Docker-friendliness. You can use it as a base for your own Docker images. Learn more at the Github repository and the website, which explain in detail what the problems are with the stock Ubuntu base image, and why you should use baseimage-docker.

Changes in this release

  • We now officially support nsenter as an alternative way to login to the container. With official support, we mean that we’ve provided extensive documentation on how to use nsenter, as well as related convenience tools. However, because nsenter has various issues, and for backward compatibility reasons, we still support SSH. Please refer to the README for details about nsenter, and what the pros and cons are compared to SSH.
    • The docker-bash tool has been modified to use nsenter instead of SSH.
    • The tool previously named docker-bash has been renamed to docker-ssh. It now also works in a regular sh shell, instead of bash specifically.
  • Added a workaround for Docker’s inability to modify /etc/hosts in the container (Docker bug 2267). Please refer to the README for details.
  • Fixed an issue with SSH X11 forwarding. Thanks to Anatoly Bubenkov. Closes GH-105.
  • The init system now prints its own log messages to stderr. Thanks to mephi42. Closes GH-106.

Using baseimage-docker

Please learn more at the README.


Tmux Only For Long-Running Processes


This post describes a minimal Tmux workflow, used only for long-running processes. It is intended to reduce the cognitive load imposed by administrative debris of open tabs, panes, or windows.

Set up Tmux for a Rails project

From within a full-screen shell (to hide window chrome, status bars, notifications, the system clock, and other distractions), create a Tmux session for a Rails project:

cd project-name
tat

tat (short for “tmux attach”) is a command from thoughtbot/dotfiles that names the Tmux session after the project’s directory name. That naming convention will help us re-attach to the session later using the same tat command.

At this point, tat is the same thing as:

tmux new -s `basename $PWD`

Run the Rails app’s web and background processes with Foreman:

foreman start

The process manager is a long-running process. It is therefore a great candidate for Tmux. Run it inside Tmux, then forget it.

After only running one command inside Tmux, detach immediately:

<tmux-prefix> d

Ctrl+b is the default Tmux prefix. Many people change it to Ctrl+a to match the prefix used by GNU Screen, another popular terminal multiplexer.

Perform ad-hoc tasks

Back in a full-screen shell, we perform ad-hoc tasks such as:

vim .
git status
git add --patch
git commit --verbose

Those commands are targeted, “right now” actions. They are executed in a split second and focus us immediately on the task at hand.

Doing most of the work from inside Vim

A majority of our work is done from within a text editor, such as fast grepping in Vim:

\ string-i-am-searching-for

Or, running specs from Vim:

<Leader> s

In thoughtbot/dotfiles, <Leader> is <Space>.

Suspending the Vim process when necessary

To return control from Vim to the command line, suspend the process:

Ctrl+z

Run this command to see suspended processes for this shell session:

jobs

It will output something like:

[1]  + suspended  vim spec/models/user_spec.rb

This is when we might do some Git work:

git fetch origin
git rebase -i origin/master
git push --force origin <branch-name>
git log origin/master..<branch-name>
git diff --stat origin/master
git checkout master
git merge <branch-name> --ff-only
git push
git push origin --delete <branch-name>
git branch --delete <branch-name>

When we’re ready to edit in Vim again, we foreground the process:

fg

Re-attach to the Tmux session quickly

When we need to restart the process manager or start a new long-running process, we re-attach to the Tmux session:

tat

At this point, tat is the same thing as:

tmux attach -t `basename $PWD`

Compared to other Tmux workflows, this workflow does involve more attaching and detaching from Tmux sessions. That is why the tat shortcut is valuable.

Back inside Tmux, we can kill the Foreman process:

Ctrl+c

Or, we might want to open a long-running Rails console in order to maintain a history of queries:

<tmux-prefix> c
rails console

After poking around in the database, we might detach from Tmux again:

<tmux-prefix> d

Get things done

At that point, we might take a break, go home, or move on to another project.

The next time we sit (or stand!) at our desks, we start fresh by creating a branch, opening Vim, or doing whatever ad-hoc task is necessary in a clean slate, distraction-free environment.

Meanwhile, Tmux handles one responsibility for us: quietly managing long-running processes.

Solitary Unit Test

Posted about 1 month back at Jay Fields Thoughts

Originally found in Working Effectively with Unit Tests

It’s common to unit test at the class level. The Foo class will have an associated FooTests class. Solitary Unit Tests follow two additional constraints:

  1. Never cross boundaries.
  2. The Class Under Test should be the only concrete class found in a test.

Never cross boundaries is a fairly simple, yet controversial piece of advice. In 2004, Bill Caputo wrote about this advice, and defined a boundary as: ”...a database, a queue, another system...”. The advice is simple: accessing a database, network, or file system significantly increases the time it takes to run a test. When the aggregate execution time impacts a developer’s decision to run the test suite, the effectiveness of the entire team is at risk. A test suite that isn’t run regularly is likely to have negative ROI.

In the same entry, Bill also defines a boundary as: ”... or even an ordinary class if that class is ‘outside’ the area your [sic] trying to work with or are responsible for”. Bill’s recommendation is a good one, but I find it too vague. Bill’s statement fails to give concrete advice on where to draw the line. My second constraint is a concrete (and admittedly restrictive) version of Bill’s recommendation. The concept of constraining a unit test such that ‘the Class Under Test should be the only concrete class found in a test’ sounds extreme, but it’s actually not that drastic if you assume a few things.

  1. You’re using a framework that allows you to easily stub most concrete classes.
  2. This constraint does not apply to any primitive or class that has a literal (e.g. int, Integer, String, etc).
  3. You’re using some type of automated refactoring tool.

There are pros and cons to this approach, both of which are examined in Working Effectively with Unit Tests.

Solitary Unit Test can be defined as:
Solitary Unit Testing is an activity by which methods of a class or functions of a namespace are tested to determine if they are fit for use. The tests used to determine if a class or namespace is functional should isolate the class or namespace under test by stubbing all collaboration with additional classes and namespaces.
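
As a minimal sketch of what that isolation can look like, here is a Ruby example that uses a hand-rolled stub rather than a stubbing framework. Invoice and tax_for are hypothetical names chosen for illustration, not examples from the book:

```ruby
# Class under test (hypothetical). Its collaborator is injected,
# so a test can replace it with a stub.
class Invoice
  def initialize(amount, tax_calculator)
    @amount = amount
    @tax_calculator = tax_calculator
  end

  def total
    @amount + @tax_calculator.tax_for(@amount)
  end
end

# A hand-rolled stub standing in for the real collaborator. It answers
# with a canned value and touches no database, network or file system.
stub_calculator = Object.new
def stub_calculator.tax_for(_amount)
  20
end

invoice = Invoice.new(100, stub_calculator)
invoice.total # => 120
```

The only concrete class with real behaviour here is Invoice, so a failure points at Invoice and nothing else.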

Knowledge Base updates

Posted about 1 month back at entp hoth blog - Home

Howdy everyone!

Today I would like to highlight two updates we deployed to the Knowledge Base in the past few weeks.


KB articles are now versioned. You can see changes between versions, restore versions, and see who made the changes. This should make it safer to update your articles regularly and allow you to recover from mistakes more easily :)

You can see the versions on the KB list:

Showing the number of versions in the KB listing

On the KB page:

Showing the number of versions on the KB page

And look at the history:

Showing the versions history


On the KB admin page, in the left sidebar, there is an option to export your whole KB as an HTML page:

Export your KB

We just improved this feature to add a Table of Contents, and fix all links between the different articles. This means that if you use the same anchor names in different articles, they will now work flawlessly in the exported file. If you moved or renamed articles but still have links to the old address, the export will take care of that as well.

This change allows you to export your entire KB as one page and print it as a PDF, giving you a complete manual for your application or service. Some of our customers do just that!

This last part is a bit experimental (the re-linking of everything), so if you experience any issue, just let us know.


The hard graft of Linked Data ETL

Posted about 1 month back at RicRoberts :

Allowing a range of users to repeatably and reliably produce high quality Linked Data requires a new approach to Extract, Transform, Load (ETL), and we’re working on a solution called Grafter.

At Swirrl, our aims are simple: we’re helping regional and national governments publish the highest quality open data imaginable, whilst helping consumers access and use this data in rich and meaningful ways. This means visualisations, browsing, metadata, statistical rigour, data modelling, data integration and lots of design; and it’s why our customers choose to buy PublishMyData.

This is great, but unlocking the data from the “ununderstood dark matter of IT” and the dozens of other formats in use within the bowels of government is currently a highly skilled job.

Extract, Transform and Load

For us this job involves converting the data from its source form (which is almost always tabular in nature) to a linked data graph. Once this is done the data then needs to be loaded into our publishing platform. This whole life-cycle is known as “Extract, Transform and Load”, or ETL for short.

Recently, thanks to our continuing growth in terms of employees, customers and data volume, we’ve come to identify ETL as a barrier. It’s a barrier to both our own and our customers’ ability to produce high quality, high volume data in a repeatable, efficient manner.

ETL is not a new problem; it dates back to the code-breaking work of Turing at Bletchley Park in the 1940s, though solutions to it were largely popularised by the likes of IBM with the birth of mainframe, batch processing and database computing in the 1950s and 60s.

The Bletchley Park ETL Pipeline

Unfortunately when it comes to Linked Data, the ETL tools available tend to be immature, unusable, flawed in a critical dimension, or too inflexible to be useful for the work we encounter on a day to day basis.

Scripting Transformations

For lack of a better, flexible option we’ve solved all of our ETL needs until now by writing bespoke Ruby scripts. Though this is flexible and we manage a certain amount of code re-use, it can be problematic and costly to do.

The scripts typically end up doing data cleaning and data conversion: they are often quite large and take time to develop, and they can be awkward to maintain and document. So if they need to be re-run again with new data, another time consuming step is required to check exactly how the script works and what inputs it needs.

But even deeper problems can occur. Is the script robust to structural changes in the source data? Does it identify errors and report them properly? Does the script need to be modified to accommodate any new assumptions in the data? And, when it’s finally done, how can we be sure it actually did the right thing?

Typically once we’ve run the script, we’ll have a file of linked data in an appropriate RDF serialisation (usually n-triples). This file will then need to be loaded and indexed by the triple store; an operation made awkward by having to load and transfer large files over the network.

Answering all of these questions and validating the result is time consuming - it’s clear that a better way is needed. We realised it was time to start addressing some of these thorny issues for ourselves and our customers and so are busy developing a suite of software we’re calling Grafter. We’re grateful for support from two EU FP7 projects: DaPaas and OpenCube.


ETL Users

The kinds of transformation we encounter broadly fall into two camps: simple or complex. With an appropriate tool, simple transformations could be built by the talented Excel users we meet working within local authorities and government. These represent perhaps 60-80% of the datasets we encounter.

The remaining datasets will sometimes require a significantly more complicated transformation; one that requires a Turing complete programming language to process.

There’s a tendency for many existing graphical ETL tools to become Turing complete, i.e. to introduce loops and conditional branches. But we feel this is a big mistake because the complexities of user interface workflows around Turing-complete systems are so great that they become almost useless.

Graphical Interfaces to Turing complete systems are unwieldy

Discriminating between user types

At the one extreme of the spectrum are the software developers, like ourselves, who need to fall back on a Turing complete language to tackle the bigger, thornier transformation problems.

At the other end of that spectrum are the talented Excel workers who produce detailed and rigorous stats for Local Authorities and government. These users need tools that let them graphically perform the bulk of the data conversions necessary to publish their data online - without having to be programmers and deal with the many problems that come with software development.

It’s also worth mentioning that even experienced developers will prefer a graphical tool for simple cases, as GUIs can make some aspects of the problem significantly easier, especially by introducing rapid feedback and interfaces for data exploration.

Finally there is another class of users, who are less willing to get into low level data transformation, but are responsible for putting the data online, curating it and loading it into the system. These users should be able to use the transformations built by the other users, by simply providing the source data through a file upload.

We are hoping to target these three classes of user by building a suite of tools which are built on a common basis and which target the different types of users within the data publication industry.

Clear & Coherent Transformation Model

The mental model for designing transformations should be clear, flexible and encourage people to build transformations that are unsurprising.

The set of operations for expressing transformations should help users fall into the pit of success, encouraging users to express their transformations in a way that is both natural and internally consistent.


Many of the existing data cleaning and transformation tools we’ve seen aren’t efficient enough at processing the data for robust, commercial grade ETL. For example Open Refine, though a great tool for data cleaning, isn’t suited to ETL because its processing model requires files to be eagerly loaded into RAM rather than streamed or lazily evaluated. This makes it unsuitable for large datasets, which can easily consume all available memory.

Also, it’s common for tools to make multiple unnecessary passes over the data - for example, performing an additional pass for each operation in the pipeline. We believe most data can be processed in a single pass, and that a good ETL tool needs to be efficient in terms of memory and minimizing the work it does.
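
As a rough illustration of the streaming, single-pass model we mean, here is a Ruby sketch using the standard library’s lazy enumerators. It is not Grafter’s actual API, and the sample rows and column names are made up:

```ruby
require 'stringio'

# Stand-in for a large CSV file; a real script would use File.open(path).
# The rows are made-up sample data.
source = StringIO.new("area,value\nArea A,100\nmalformed row\nArea B,250\n")

rows = source.each_line
             .lazy                                    # nothing is read yet
             .drop(1)                                 # skip the header row
             .map    { |line| line.chomp.split(',') }
             .select { |fields| fields.length == 2 }  # discard malformed rows
             .map    { |fields| [fields[0], Integer(fields[1])] }

result = rows.to_a # the source is read once, one line at a time

result # => [["Area A", 100], ["Area B", 250]]
```

Each line flows through every step before the next line is read, so the three operations cost one pass over the data rather than three.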

Removing unnecessary intermediate steps

Our old way of doing things (with Ruby scripts) would frequently export the data into an RDF serialisation (such as n-triples) which would then need to be imported separately to the database via HTTP.

Sometimes you might want to serialise the output data locally to disk, but often we’d like to wire transformations to go directly into our triple store, because it minimises the number of manual steps required and means that data can be processed as a stream from source to database.


We believe that ETL has to be a process you can trust: partial failures are common in distributed systems, particularly when dealing with large volumes of data. So detecting failures, recovering from them and failing into well known recoverable error states is extremely important for large scale ETL pipelines.

Robustness requires facilities for developers and pipeline builders to easily incorporate error detection into their transformations. It makes sense to support this kind of validation and assertions on the source data, at arbitrary intermediate steps and on the final output.

A big part of being robust is reporting good errors at different levels of severity, and handling them appropriately. Sometimes you might just want to warn the user of an error; other times you might want to fail the whole import. Sometimes if pipelines are to be reused, error handling may need to be overridden from outside of the pipeline itself.

Likewise, being robust to structural changes in the source data is critically important. Sometimes a transformation might expect a spreadsheet to grow in particular dimensions, but not others. For example, one transformation might need to be tolerant of new columns being added to the end of the sheet, whilst another should throw an error if that ever happened.
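
To make that concrete, here is a small Ruby sketch of a header check that one transformation could run in a tolerant mode and another in a strict mode. The check_headers helper and the column names are hypothetical, invented for this illustration:

```ruby
# Hypothetical check: compares a sheet's header row with the columns a
# transformation expects. Returns a list of [severity, message] pairs.
def check_headers(actual, expected, allow_extra_columns:)
  problems = []

  missing = expected - actual
  problems << [:error, "missing columns: #{missing.join(', ')}"] unless missing.empty?

  extra = actual - expected
  unless extra.empty?
    severity = allow_extra_columns ? :warning : :error
    problems << [severity, "unexpected columns: #{extra.join(', ')}"]
  end

  problems
end

expected = %w[area year value]
actual   = %w[area year value notes]

tolerant = check_headers(actual, expected, allow_extra_columns: true)
strict   = check_headers(actual, expected, allow_extra_columns: false)

tolerant # => [[:warning, "unexpected columns: notes"]]
strict   # => [[:error, "unexpected columns: notes"]]
```

Returning severities rather than raising immediately leaves the decision to warn or to fail the whole import with whoever runs the pipeline.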

Layered architecture with a component interface

We believe that having a suite of interoperable ETL tools that operate and build on each other in a cleanly architected manner is the right way to solve the ETL problem. There is never a one size fits all solution to ETL, so having a suite of layered APIs, DSLs, file formats, services and tools that let users change levels of abstraction is important to ensure both flexibility and reuse.

It’s also important to be able to specify transformations abstractly without too much concern over the underlying file format (for example, lots of formats are inherently tabular in nature but have different serialisations). So you need an appropriate abstraction that lets you convert arbitrary file formats into a unit of operation on your pipeline.

Import services

Transformations themselves should be loadable into a generic import service, that will inspect the transformation for its required inputs and generate a web form that allows other users to import and process the raw data.

Once a spreadsheet’s structure has been decided (and a transformation built for sheets with that structure), import services become essential to gaining reuse out of transformations and lowering barriers to repeatable publication.
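
One way such inspection could work, sketched here in Ruby purely as an assumption about the general idea (it is not how Grafter does it, and population_transform is a made-up transformation), is to read the inputs straight off a method signature:

```ruby
# A hypothetical transformation: its keyword arguments are its inputs.
def population_transform(source_file:, year:, geography: 'ward')
  # ... would read source_file and emit linked data ...
end

# Inspect the method signature and describe one form field per input.
# Keyword arguments without a default are reported as required.
def form_fields_for(transformation)
  transformation.parameters.map do |kind, name|
    { name: name, required: kind == :keyreq }
  end
end

fields = form_fields_for(method(:population_transform))

fields
# => [{ name: :source_file, required: true },
#     { name: :year, required: true },
#     { name: :geography, required: false }]
```

An import service could render a file upload for source_file and plain inputs for the rest, without the person uploading ever seeing the transformation itself.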


A lot of our users face similar data transformation challenges. It’s important that the transformation pipelines can be easily shared between users, so that rather than starting from scratch, a user can tweak a transformation that someone else has already built for a similar purpose.

Commercial Grade ETL

We see a need for a different kind of ETL tool and we’re currently working on delivering on this vision. We’re starting small and have a good understanding of the problems we’re tackling, and for whom we are solving them.

We already have the fundamentals of this approach up and running and we are using it to process hundreds of datasets far more efficiently than we have before.

The next step is to start wrapping Grafter (our solution) in a user interface that will make high performance repeatable linked data creation easier and quicker for experts and more accessible to non-experts.

Phusion Passenger 4.0.46 released

Posted about 1 month back at Phusion Corporate Blog

Phusion Passenger is a fast and robust web server and application server for Ruby, Python, Node.js and Meteor. Passenger takes a lot of complexity out of deploying web apps, and adds powerful enterprise-grade features that are useful in production. High-profile companies such as Apple, New York Times, AirBnB, Juniper and American Express are already using it, as well as over 350,000 websites.

Phusion Passenger is under constant maintenance and development. Version 4.0.46 is a bugfix release.

Phusion Passenger also has an Enterprise version which comes with a wide array of additional features. By buying Phusion Passenger Enterprise you will directly sponsor the development of the open source version.

Recent changes

Most notable changes:

  • Further improved Node.js and compatibility.
  • Sticky session cookies have been made more reliable.
  • Fixed WebSocket upgrade issues on Firefox. Closes GH-1232.
  • Improved Python compatibility.
  • Logging of application spawning errors has been much improved. Full details
    about the error, such as environment variables, are saved to a private log file.
    In the past, these details were only viewable in the browser. This change also
    fixes a bug on Phusion Passenger Enterprise, where enabling Deployment Error
    Resistance causes error messages to get lost. Closes GH-1021 and GH-1175.
  • Passenger Standalone no longer, by default, loads shell startup files before
    loading the application. This is because Passenger Standalone is often invoked
    from the shell anyway. Indeed, loading shell startup files again can interfere
    with any environment variables already set in the invoking shell. You can
    still tell Passenger Standalone to load shell startup files by passing
    --load-shell-envvars. Passenger for Apache and Passenger for Nginx still
    load shell startup files by default.
  • If you are a Union Station customer, then
    Phusion Passenger will now also log application spawning errors to Union Station.
    This data isn’t shown in the Union Station interface yet, but it will be
    implemented in the future.

Minor changes:

  • The Python application loader now inserts the application root into sys.path.
    The fact that this was not done previously caused a lot of confusion amongst
    Python users, who wondered why they could not import any
    modules from the same directory.
  • Fixed a compatibility problem with Django, which could cause Django apps to
    freeze indefinitely. Closes GH-1215.
  • Fixed a regression in Node.js support. When a Node.js app is deployed on
    a HTTPS host, the X-Forwarded-Proto header wasn’t set in 4.0.45.
    Closes GH-1231.
  • Passenger Standalone now works properly when the HOME environment variable
    isn’t set. Closes GH-713.
  • Passenger Standalone’s package-runtime command has been removed. It has
    been broken for a while and has since been made obsolete by our automatic
    binary generation system.
    Closes GH-1133.
  • The passenger_startup_file option now also works on Python apps. Closes GH-1233.
  • Fixed compilation problems on OmniOS and OpenIndiana. Closes GH-1212.
  • Fixed compilation problems when Nginx is configured with OpenResty.
    Thanks to Yichun Zhang. Closes GH-1226.
  • Fixed Nginx HTTP POST failures on ARM platforms. Thanks to nocelic for the fix.
    Closes GH-1151.
  • Documentation contributions by Tim Bishop and Tugdual de Kerviler.
  • Minor Nginx bug fix by Feng Gu. Closes GH-1235.

Installing or upgrading to 4.0.46

OS X, Debian, Ubuntu, Heroku, Ruby gem, Tarball


Fork us on Github! Phusion Passenger’s core is open source. Please fork or watch us on Github. :)



Let Your Code Speak For Itself


Let’s say you have some code whose intent is clear to you but you can see a world where someone else may be confused. Often after writing such code, you realize this and add some comments to clarify the intent.

We add code comments for other developers who will interact with whatever we wrote, because we are courteous and thoughtful teammates:

class HostsController < ApplicationController
  def show
    # make sure user belongs to host if name is internal
    # set current host in session

    if current_user.hosts.include?(host) && host.match(/
      session[:current_host_id] =
      raise 'something went horribly wrong oh nooooo'

The Telephone Game

Remember the telephone game? Messages passed through intermediaries can get garbled in transmission. Particularly unreliable, logic-free intermediaries like code comments.

On their face, comments should be super helpful - I mean, you’ve left helpful notes for the next person! Isn’t that a good thing?

Yes, although now we have duplication - the comment and the code itself both speak to what this bit of logic should do.

Comments are a code smell which means “something may be improved here and we should dig deeper to see if that’s true.” In this case, the smell is “hey something is probably more complicated than it needs to be.”

Later on, when someone moves the session-setting behavior somewhere else, they have to remember to move this comment or update it. As humans, this is easy to forget.

Instead, let’s use intention-revealing method names to encapsulate the behavior. We’ll also move this logic into private methods since we don’t want other classes calling these methods:

class HostsController < ApplicationController
  def show
    if user_belongs_to_host? && host_name_is_internal?
      raise 'something went horribly wrong oh nooooo'


    def user_belongs_to_host?

    def host_name_is_internal?

    def set_current_host_in_session
      session[:current_host_id] =

Other Smelly Comments

  • TODOs, like # Remember to fix this terrible method
  • Commented out dead code. Just delete it - that’s what Git is for.
  • Comments which restate the method name in English.

Comments are one of the code smells we address in our Ruby Science ebook.

When Are Comments Useful?

This isn’t a hard and fast rule. Some comments are useful:

  • Class-level comments: Adding a comment at the top of a class to describe its responsibilities can be helpful.
  • Open-Source: Ruby gems and other open-source libraries are good places to add more detail, because we can use tools such as YARD to automatically generate documentation from the comments. Here’s an example in Paperclip. If you’re providing a library for others to use, lightly commenting the public interface is typically encouraged so that documentation for the library can be auto-generated. Here’s an example in Golang.

Episode #480 - July 15th, 2014

Posted about 1 month back at Ruby5

In this episode we cover fun with iBeacons and PunchClock, visually starting a Rails app with Prelang, a Ruby Queue Pop method with Timeout, text translations from the command line with Termit and Diving into the Rails request handling.

Listen to this episode on Ruby5

Sponsored by

Codeship is a hosted Continuous Delivery Service that just works.

Set up Continuous Integration in a few steps and automatically deploy when all your tests have passed. Integrate with GitHub and BitBucket and deploy to cloud services like Heroku and AWS, or your own servers.

Visit and sign up for free. Use discount code RUBY5 for a 20% discount on any plan for 3 months.

Also check out the Codeship Blog!



PunchClock is a combination of software applications that use Apple’s iBeacon tracking, geo-­fencing, and Sinatra to automatically mark employees as being either in or out of the office.


Prelang is a service that lets us visually generate Rails applications through a web interface. You add features and configure settings through web forms, with buttons, text inputs and dropdowns, and at the end, the project is published to your GitHub account.

Ruby Queue Pop with Timeout

Job Vranish created a Ruby Queue Pop method with support for timeout. It uses a Ruby Mutex combined with a ConditionVariable to create a blocking queue pop interface.
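
To make the Mutex plus ConditionVariable idea concrete, here is a minimal sketch of a blocking pop with a timeout. It is not Job Vranish's actual code; the class name `TimeoutQueue` and its method signatures are assumptions made for illustration, and only standard library classes are used.

```ruby
# Hypothetical illustration; not the implementation from the linked article.
class TimeoutQueue
  def initialize
    @items = []
    @mutex = Mutex.new
    @item_added = ConditionVariable.new
  end

  # Add an item and wake up one waiting consumer, if any.
  def push(item)
    @mutex.synchronize do
      @items << item
      @item_added.signal
    end
    self
  end

  # Block until an item is available. With a timeout (in seconds),
  # return nil if nothing arrives before the deadline.
  def pop(timeout = nil)
    deadline = timeout && (now + timeout)
    @mutex.synchronize do
      # Loop to guard against spurious wakeups.
      while @items.empty?
        remaining = deadline && (deadline - now)
        return nil if remaining && remaining <= 0
        @item_added.wait(@mutex, remaining)
      end
      @items.shift
    end
  end

  private

  # A monotonic clock avoids problems when the wall clock is adjusted.
  def now
    Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end
end

queue = TimeoutQueue.new
queue.push(:job)
queue.pop(0.1)  # returns :job immediately
queue.pop(0.05) # waits about 50ms, then returns nil
```

Returning nil on timeout is one possible design; raising an exception instead would let callers store nil as a legitimate queue item.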


Termit is a Ruby gem that allows you to use Google Translate from your terminal. The gem depends on Ruby 1.9.2 or higher. To use the text to speech feature, you must have mpg123 installed.

Diving in Rails - The request handling

Adrien Siami published an article last week where he performs a deep dive into request handling in Ruby on Rails, specifically version 4.1. He details how Rack and Rails work together, the middleware that Rails introduces, how Routes work and a whole lot more.

Sponsored by Top Ruby Jobs

PeerStreet is looking for a Rails Developer in Los Angeles, CA.
Top Ruby Jobs

Sponsored by Ruby5

Ruby5 is released Tuesday and Friday mornings.

Thank You for Listening to Ruby5

Preload Resource Data into AngularJS


Preloading, or bootstrapping, data resources from the server via HTML is a common technique used in JavaScript web development. This technique allows the application to avoid making an extra HTTP request to get that initial data for rendering on the page.

Unfortunately, AngularJS has no built-in mechanism for preloading resources into the initial HTML page. However, the following implementation can serve that purpose.

First, a directive needs to exist:

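angular.module("myApp", []).
  directive("preloadResource", function() {
    return {
      link: function(scope, element, attrs) {
        // Parse the JSON from the attribute, then drop the element from the DOM.
        scope.preloadResource = JSON.parse(attrs.preloadResource);
        element.remove();
      }
    };
  });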

Now, this directive can be used in the HTML that is sent by the initial request, and populated with JSON data:

<div ng-cloak preload-resource="<%= %>"></div>

The preloadResource directive reads the JSON structure into the preloadResource variable on the scope, and removes the element from the DOM to clean up the final HTML.
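
On the server side, the JSON placed in the attribute has to be HTML-escaped, otherwise its double quotes would end the attribute early. Here is a minimal sketch in plain Ruby, assuming ERB from the standard library and a hypothetical `user` hash; neither the variable name nor the data comes from the original post, and a Rails view would normally rely on its own escaping helpers instead.

```ruby
require "json"
require "erb"

# Hypothetical resource data; in a real app this would come from a model.
user = { id: 1, name: "Ada" }

# Serialize to JSON, then HTML-escape so the double quotes in the JSON
# cannot terminate the attribute value.
template = ERB.new(
  '<div ng-cloak preload-resource="<%= ERB::Util.html_escape(user.to_json) %>"></div>'
)
html = template.result(binding)

puts html
# => <div ng-cloak preload-resource="{&quot;id&quot;:1,&quot;name&quot;:&quot;Ada&quot;}"></div>
```

The browser decodes the entities before AngularJS reads the attribute, so `JSON.parse` in the directive receives the original JSON string.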

The resource data is now available on the scope, and can be used as desired. For example, it can be passed to a corresponding AngularJS factory, like so:

// Look up the existing module (no dependency array) so the directive stays registered.
angular.module("myApp").
  controller("MainCtrl", ["$scope", "User", function($scope, User) {
    $scope.user = new User($scope.preloadResource);
  }]);

This method requires very little code and reduces the number of HTTP requests made by an Angular application.

Episode #479 - July 11th, 2014

Posted about 1 month back at Ruby5

Time Travel Movies explained in git, a free online book to learn programming, better controllers with adequate_exposure, Avdi's Sinatra testing adventure, Engine Yard's App Server Arena, and the Informant Heroku add-on, all in this episode of Ruby5!

Listen to this episode on Ruby5

Sponsored by New Relic

New Relic is _the_ all-in-one web performance analytics product. It lets you manage and monitor web application performance, from the browser down to the line of code. With Real User Monitoring, New Relic users can see browser response times by geographical location of the user, or by browser type.
This episode is sponsored by New Relic

Time Travel Movies Explained in Git

Trying to understand all these time travel movies in the theater recently but at a loss? Well Rocketeer Vic Ramon has explained two of them via git commands!

Introduction to Programming with Ruby

If you are trying to learn Ruby or know someone who is, there is a brand new online book available for free from the fine folks at Tealeaf Academy!


Want to dramatically improve the readability and maintainability of your controllers? Check out the adequate_exposure gem from Rocketeer Pavel Pravosud!

Zero to Smoke Test with Sinatra

This new blog post from Avdi walks you through the process of adding high level regression tests to an existing Sinatra app that has none.

App Server Arena

Engine Yard pits all the heavy hitters in the application server space against one another for a fight to the death!


This new Heroku add-on tracks form submissions on the server side and provides metrics about which forms are causing the most pain for your users.

Open Data Communities: New Features

Posted about 1 month back at RicRoberts :

Over the last few months we’ve added some new features to OpenDataCommunities, the linked open data site that we continue to run and develop for our friends at DCLG. The first is the Statistics Selector, and the second is the Geography Selector. Both make it easy to cherry-pick and combine the data you want from across all the datasets on the site.

The Statistics Selector was created late last year. It allows you to build your own table of data by uploading a CSV file of geography codes (GSS codes or postcodes) and then creating a table of your choice by combining columns from multiple datasets in the OpenDataCommunities database. The resulting table can be downloaded as CSV (so you can load it into a spreadsheet or other computer program), and conveniently, has a permalink so you can refer to it later.


But what if you don’t know the GSS code (or the postcode) of the area you’re interested in? To help address this, last month we launched the Geography Selector. This interactive map is nice and easy to navigate and lets you find the area you’re interested in quickly. When you select an area, its details appear alongside and you can launch the Statistics Selector straight from there. Lovely and quick. As a bonus, you can also download the GSS code(s) of your selection (these are useful to know as they’re often used to identify geographic areas in government data).


Both of these features make it easier for users to slice and dice the data so they can get what they need, download it, save it and do with it what they will. Watch this space for more developments with OpenDataCommunities: in the coming months we’ll be updating it to the latest version of PublishMyData and it’s going to get a new look. Exciting!

What Not to Ask


Despite our best intentions, spending time developing software that people won’t use can be devastating to morale and to a product’s long-term viability. How can we stop wasting time on developing features (or whole products!) that won’t be used? How can your customers' most painful problems be identified? Most teams turn to user interviews to flesh these ideas out early on in the product life cycle. With user interviews we can understand what users want and how to best alleviate their problems.

Except it doesn’t work out that way for most teams. Without realizing it we ask questions that yield biased results. That means that if we aren’t careful, we might actually build features or entire products that customers say they want, but actually won’t use.

The mind is a maze.

“If I had asked people what they wanted, they would have said faster horses.” - Henry Ford

There are over 150 biases listed on Wikipedia that may come into play during a conversation with your users. The difference between the answers you will get when asking, “Would a feature that lets you add task templates be useful?” and “What problems do you have with managing tasks?” could mean the difference between spending weeks on a feature or realizing that people won’t even use it (and spending no time at all). Without realizing it, we may build something based on biased feedback.

Why do our minds adopt these patterns of bias? Do they even serve a purpose? Cognitive biases enable faster decisions when timeliness is more valuable than accuracy. They can help us to make decisions quickly when accuracy is not as important or when situations are dangerous. When we are asking questions about our products, we want accuracy, not speed. Are the questions you tend to ask during user interviews invoking these cognitive biases or not? Are they causing your users to answer with speed, or with accuracy?

Without realizing it, we often ask questions that trigger these mental shortcuts and produce biased responses. Biased responses lead us to build software that is not really what our users need or want. Luckily, research on cognitive biases and logical fallacies has helped us ask questions that produce honest and actionable results.

Hidden in plain sight.

Let’s go over some common pitfalls and how to avoid them. In the process we’ll figure out what kind of questions will help us identify the information we need, while avoiding questions that lead to inaccurate responses.

  • Avoid giving away too much information. A good question should be goal-oriented. Asking someone to “add a task for sending a proposal to your client” makes it difficult to see how a real person would think through a problem, because you have told them how to map their problem to an action in the software: “add a task”. Instead of telling them what to do in the interface, give them a problem and see how they think through it: “You want to send a proposal to a client later in the week. What would you do?” This helps you see how your customer thinks through a problem and uses your software to solve it. It may be that they don’t need your software to solve their problem.
  • Avoid asking questions that plant an idea in their head and lead them to a conclusion. This is called anchoring. “How useful would task templates be in your project management software?” is a leading question. It plants the idea that task templates would be useful. Why is asking this a problem? Task templates probably are useful, right? So is Tylenol, but only if you have pain. Before prescribing a possible solution, try asking about their pain, the problem they want to solve. “What problems do you have with managing your team’s work?” is the type of question that will help you find your customers’ most painful problems. These are the problems that customers really care about.
  • Avoid asking questions that limit responses (false dilemmas). These include yes/no questions and questions where you give the person only a subset of possible responses. For example, instead of asking, “Would you rather have a task template feature first or would you rather be able to bulk delete tasks?” you could ask, “If you could choose one feature to be included in the next release, what would it be?”
  • Beware of the current moment bias. When asking if a feature would be useful, there is a good chance that they can see it being more useful in the moment because they are sitting there with the software in their hands. When someone is experiencing normal life, the pain of the problem they need to solve may not be bad enough to cause them to use your software. Instead ask about past experiences and problems to see if they have felt and noticed the pain point you are trying to solve. Ask what they currently do to solve the problem. It is also helpful to do field tests where you observe people using software in the real world.
  • Don’t be afraid to stray from the script and ask, “why?” Knowing why someone says or does something can be huge. You may realize that their mind took a mental leap to a conclusion that wasn’t really what they thought. Asking “why?” can help you be sure that your findings are accurate. It will require extra work to stray from the script and interpret the responses, but it will be worth the extra effort.
  • Be mindful of your own confirmation bias. We tend to agree with and remember what users say when it matches what we already believe. Resist this. Being actively aware of this bias is essential when speaking with your users. Be open to viewpoints that don’t align with what you were thinking.

There are many reasons that a product can fail to gain traction, but products that are based on users’ real needs have a fighting chance. The best user interview questions avoid biases and attempt to find the real problem people are trying to solve. With these tips you’ll be a bit further ahead in creating your next hit product.

Episode #478 - July 8th, 2014

Posted about 1 month back at Ruby5

From small releases of Rails, to Awesome Ruby, to MessageEncryptor, to BHF, and Inch CI, Olivier and Gregg fourchette their way through the Ruby world.

Listen to this episode on Ruby5


Rails 4.0.8 & 4.1.4 released

Last week Karle & Chris told you about the Rails 4.0.7 and 4.1.3 security releases that patched a vulnerability in the Postgres adapter for Rails. Well, the same day, actually a mere two hours later, versions 4.0.8 and 4.1.4 were also released.

Awesome Ruby

For a list of awesome Ruby libraries, tools, frameworks and software, check out Awesome Ruby, curated by Marc Anguera Insa. There are perhaps over 100 libraries listed on this single GitHub page, and I suspect they might be the most common gems used in Ruby applications today.

Reading Rails - How Does MessageEncryptor Work?

After diving into the Rails implementation of MessageVerifier, Adam Sanderson took to his blog again to tackle MessageEncryptor, a useful class from ActiveSupport. It’s a great blog post that takes you line by line through an internal Rails tool that saves you the hassle of interfacing directly with OpenSSL, which doesn’t feel very Ruby-ish.


Anton Pawlik dropped us a line to let us know about the release of his BHF gem, a Rails engine that generates an admin interface for you.


If anybody out there is using Heroku as their production stack, there’s an interesting gem called Fourchette. It uses Heroku’s Fork feature to create a clone of your production environment and deploy a pull request to it, so you can poke around as if you had deployed to your real production environment.

Inch CI

On episode 441 of Ruby5 we introduced Inch, a library that will grade how well your code is documented, created by Rene Fohring. Today I discovered Inch CI, a project also created by Rene, which will automatically run Inch every time you push new code.

Sponsored by TopRubyJobs

The Advisory Board Company / Crimson is looking for a Senior Software Engineer in Austin or Houston, Texas, and Metabahn is looking for a Web Application Developer in Huntsville, Alabama. So if you’re looking for a top Ruby gig or top Ruby talent, check out TopRubyJobs.

Episode #477 - July 4th, 2014

Posted about 1 month back at Ruby5

Happy Birthday `Murica. Rails security, modular migrations and N+1 with custom Arel

Listen to this episode on Ruby5

Sponsored by New Relic

New Relic just released Ruby Agent 3.9.0, which now automatically instruments Rack middleware.

Rails 4.0.7, 4.1.3 and 3.2.19

Rails 4.0.7, 4.1.3 and 3.2.19 address two distinct but related vulnerabilities in the PostgreSQL adapter for Active Record.

Modular Migrations

swordray has released the Modular_migrations gem to help with OBD

Rails and the Warped Asset Pipeline

According to Risa, to most Rails developers the Asset Pipeline is like the magical warp pipe.

Build a Custom Query with AREL

Arel is another great feature of Rails, but sometimes you can fall into an N+1 problem.

Goal! Detecting the most important World Cup moments

Luis Cipriani built a cool project to ring a bell when big moments happen during the World Cup.