Open Cube: Making publishing statistics easier

Posted 4 months back at RicRoberts :

Open Cube logo

We’ve recently started work on another European collaborative project called OpenCube. This project is all about making open statistical data easier to publish, link and reuse so it can reach its full potential. The technical approach concentrates on the RDF Data Cube vocabulary, which recently reached W3C Recommendation status.

This project is very relevant to our work: much of the data that our customers publish using PublishMyData is statistical data, and we are big users of the RDF Data Cube approach. We can already see the power of this approach: for example, how it enables combining datasets, as in our Stats Selector tool. But we also have first-hand experience of the challenges our customers face in creating RDF Data Cube data, and we can see many further opportunities for enriching access to data in this form.

A real strength of OpenCube is the close involvement of a group of ‘use case partners’. These are organisations that have agreed to participate in a series of pilot studies, assisting us with setting the requirements of the project and evaluating the tools we create. The pilots will involve the UK Department for Communities and Local Government, the Government of Flanders, the Irish Central Statistics Office and a (yet to be confirmed) Swiss bank.

As part of the project we’ll be creating an OpenCube extension of PublishMyData. This extension will add lots of cool stuff, including:

  • tools for transforming CSV files and Excel spreadsheets to RDF Data Cube datasets (without the need for programming)
  • tools for managing a collection of concept schemes
  • tools for selecting URIs from external reference data
  • quality checks that make sure any generated RDF meets the required standards

So, like the DaPaaS project, we’re excited about being part of this because it will help us make Linked Data easier to create and easier to exploit, bringing the benefits to a larger community of users.

Again, we’re lucky to be working alongside some great data people on this project. Take a look at the links below to find out more about who’s involved:

Is Ruby Manor X happening this year?

Posted 4 months back at interblah.net - Home

It’s been almost six years since the first Ruby Manor. Last year we ran Ruby Manor 4.

The observant among you will notice that four conferences into six years leaves some years where a Ruby Manor did not happen. However, it might not be obvious why this is the case.

I’ve already written at length here about why Ruby Manor is a conference, just not a traditional one. With each iteration of the event, we have tried to stay true to the manifesto while making careful changes to see if we can improve on the previous event.

As I’ve said before, the right way to think about Ruby Manor is as an experiment. We make hypotheses (e.g. ‘community feedback during the proposal process will result in more relevant presentations’ or ‘not spending money on t-shirts will not detract from the conference experience’), then we run the experiment and evaluate the results.

In most cases the experiment has been largely successful. I’m pretty sure most people who’ve come to one or more Ruby Manors have enjoyed the day, enjoyed the majority of the content, enjoyed not having to spend hundreds of pounds or dollars, not minded the lack of swag and so on. It’s probably true to say that we could run a basically similar event every year, and most people would be largely happy.

However – and here I speak mainly for myself, although I know that Murray and Tom feel largely similarly – I’m not desperately interested in putting bums-on-seats once per solar revolution, just for the sake of it.

I don’t know what the next experiment is.

Introducing Lotus::Model

Posted 4 months back at Luca Guidi - Home

Almost all the Ruby frameworks for the Model layer mix up business logic with database details. This kind of architecture leads to god classes, slow build times and generally bad design. These problems are well known to the maintainers of legacy projects.

What if we assigned these roles to smaller components that collaborate together? Imagine how life-changing it would be to work just with objects, without worrying about how to persist them. How easy and fast would it be to test them? How small and well defined would your objects be?

Let me introduce Lotus::Model.

It brings a new level of decoupling that is a huge step in that direction. The framework constitutes a boundary that offers a convenient public API to execute queries and commands to persist and fetch entities.

Entities

An entity is the core of an application, where part of the domain logic is implemented. It’s a small, cohesive object that expresses coherent and meaningful behaviors.

It deals with one and only one responsibility that is pertinent to the domain of the application, without caring about details such as persistence or validations.

This simplicity of design allows you to focus on behaviors, or message passing if you will, which is the quintessence of Object Oriented Programming.

Consider this object:

class Article
  attr_accessor :id, :title, :text

  def initialize(attributes = {})
    @id, @title, @text =
      attributes.values_at(:id, :title, :text)
  end
end

It can be optionally expressed as:

require 'lotus/model'

class Article
  include Lotus::Entity
  self.attributes = :title, :text
end

Yes, optionally. Lotus::Model can work with pure objects, as long as they implement that small interface above.

But how does the framework know how to handle these objects?

Data Mapper

We use a data mapper for the job. It’s a persistence mapper that keeps entities unaware of schema details. The good news is that it’s database independent: it can work with SQL databases, document stores, and even key/value stores.

The role of a data mapper is to translate database columns into the corresponding attribute of an entity.

require 'lotus/model'

mapper = Lotus::Model::Mapper.new do
  collection :articles do
    entity Article

    attribute :id,    Integer
    attribute :title, String
    attribute :text,  String
  end
end

For simplicity’s sake, imagine that the mapper above is used with a SQL database. We use #collection to indicate the table that we want to map and #entity to indicate the kind of object to persist. Finally, each #attribute call specifies which Ruby type we want to associate with a given column.

Repositories

Once we have in place all the entities and a mapping for them, we can use a repository to talk with a database.

A repository is an object that mediates between entities and the persistence layer. It offers a standardized API to query and execute commands on a database.

A repository is storage independent; all the queries and commands are delegated to the current adapter.

This architecture has several advantages:

* An application depends on a standard API, instead of low-level details (Dependency Inversion Principle)

* An application depends on a stable API that doesn’t change if the storage changes

* You can postpone storage decisions

* It confines persistence logic to a low level

* Multiple data sources can easily coexist in an application

require 'lotus/model'

class ArticleRepository
  include Lotus::Repository
end

When a class includes Lotus::Repository it will expose CRUD methods such as .create, .update and .find, plus a powerful private query API. This decision forces developers to define intention-revealing APIs, instead of leaking storage details outside of the repository.
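
To make that concrete, here is a minimal sketch of the CRUD side, using the entity and repository defined above; the exact return values are assumptions based on the description here rather than exhaustive documentation:

article = Article.new(title: 'Introducing Lotus::Model', text: '...')

article = ArticleRepository.create(article) # persists it; assumes the entity comes back with its id set
ArticleRepository.find(article.id)          # fetches it back by its id
ArticleRepository.update(article)           # updates the corresponding record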

Look at the following code:

ArticleRepository.
  where(author_id: 23).
  order(:published_at).
  limit(8)

This is an example of an implicit API: it means nothing in terms of the behavior of the domain model. It’s just a chain of method calls, and every caller has to be aware of the internal query mechanisms.

There is a better way to write it:

require 'lotus/model'

class ArticleRepository
  include Lotus::Repository

  def self.most_recent_by_author(author, limit = 8)
    query do
      where(author_id: author.id).
        order(:published_at)
    end.limit(limit)
  end
end

Look at how revealing the name of that method is. It encapsulates the implementation details in favor of a clear and testable API.

If we change the type of database, the callers of that method will be unaffected.
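
With that in place, callers never touch the query internals; they just ask for what they want (here, author is any entity that responds to #id):

ArticleRepository.most_recent_by_author(author)    # uses the default limit of 8
ArticleRepository.most_recent_by_author(author, 3) # or an explicit limit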

Adapters

As mentioned above, Lotus::Model is database agnostic. A repository forwards method calls to its current adapter. An adapter is a concrete implementation of persistence logic for a specific database. The framework is shipped with two adapters:

* SqlAdapter
* MemoryAdapter

An adapter can be associated with one or more repositories, and different repositories can have different data sources. For instance, an application can have an ArticleRepository that uses a SQL database and a TweetRepository that talks to a third-party JSON service.
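
As a rough sketch of how that wiring might look, reusing the mapper defined earlier; the adapter class names, constructor arguments and the .adapter= setter below are assumptions based on the framework's early README, so double-check them against the current docs:

require 'lotus/model'
require 'lotus/model/adapters/sql_adapter'
require 'lotus/model/adapters/memory_adapter'

# ArticleRepository talks to a SQL database...
ArticleRepository.adapter = Lotus::Model::Adapters::SqlAdapter.new(mapper, 'postgres://localhost/articles')

# ...while TweetRepository keeps its data in memory (handy for tests or prototypes).
TweetRepository.adapter = Lotus::Model::Adapters::MemoryAdapter.new(mapper)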

Roadmap

For the next two months, I will focus on Lotus (the gem). The main goal is to make all the frameworks work together in a full-stack app. This will require improving the existing libraries and empowering them with the missing features.

On June 23rd I will release Lotus.

To stay updated with the latest releases, to receive code examples, implementation details and announcements, please consider subscribing to the Lotus mailing list.


Episode #458 - April 22nd, 2014

Posted 4 months back at Ruby5

This week we cover Searchlight's 3.0, dumping code in models, this week in Rails commits, Whiny Validations, Bootstrap 3 support for Simple Form, and how to be an open source gardener.

Listen to this episode on Ruby5

Sponsored by Top Ruby Jobs

If you're looking for a top Ruby job or for top Ruby talent, then you should check out Top Ruby Jobs. Top Ruby Jobs is a website dedicated to the best jobs available in the Ruby community.
This episode is sponsored by Top Ruby Jobs

Searchlight 3.0 released

Nathan Long dropped us an email recently to let us know about the 3.0 release of the Searchlight library, which we originally covered about a year ago. Searchlight gives you a low-magic way to build database searches. It allows you to encapsulate your search options in an object which is more testable and database agnostic. It also plays really well with Rails forms, so if you find yourself building an advanced search page, check it out.
Searchlight 3.0 released

Don’t just dump code into your models

Jakub Arnold of Sensible.io has an interesting post where he takes you through the consequences of dumping coupled code that depends on the Mailchimp gem inside of an ActiveRecord model’s callback cycle. As Jakub explains, this dependency may cause exceptions when saving the model, and even if you handle the exception, your call to Mailchimp may fail. He goes through several steps, eventually extracting the Mailchimp logic into a separate class, and to handle possible network issues, he uses a queue which will retry in case of failure. It’s a nice, short example of something that can definitely bite you if you casually dump code inside of a Rails model.
Don’t just dump code into your models

This week in Rails

A few weeks ago Godfrey Chan started a new weekly newsletter called This Week in Rails, where he covers the most interesting commits and pull requests on the Rails code base. In his most recent newsletter he mentioned a commit that introduced a new helper called “sanitize_sql_like” to help escape characters with special meaning in a LIKE clause, another that added a `to_io` method to `UploadedFile`, which allows you to directly manipulate an uploaded (temp) file, and a bunch of fixes, including getting rid of the wrapping divs on hidden form fields and some optimizations to the PostgreSQL adapter. If you enjoy living on the edge of Rails, this could be a great weekly newsletter to subscribe to.
This week in Rails
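
As a quick illustration of what sanitize_sql_like is for, here’s a hypothetical model method (the Post class and search method are made up for this example):

class Post < ActiveRecord::Base
  def self.search(term)
    # sanitize_sql_like escapes % and _ so user input is matched literally
    where("title LIKE ?", "%#{sanitize_sql_like(term)}%")
  end
end

Post.search("50%_off") # the % and _ inside the term won't act as LIKE wildcards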

Whiny Validation

Sometimes when you run a spec on a Rails app, it fails silently and it isn’t clear why. It could be a validation error, but the log may not even tell you that. This is a problem Brian Morearty set out to solve with his Whiny Validation gem, which adds a log message so that when you debug you’ll know why your test failed. Admittedly, it’s a pretty simple gem, but often enough simple gems like this end up being merged into Rails when people realize they’re lightweight enough and make so much sense. What’s also great is that Brian wrote a blog post which walks us through the internals of developing the gem, how to use ActiveSupport notifications, and how to add your own log subscriber. It’s great to have the confidence to get in under the hood of these libraries when you need to, and simple examples can go a long way.
Whiny Validation
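
If you haven’t used ActiveSupport notifications before, the underlying mechanism looks roughly like this (the event name and payload here are illustrative, not the gem’s actual ones):

require 'active_support/notifications'

# Subscribe to an event and do something with its payload...
ActiveSupport::Notifications.subscribe("validation_failed.example") do |name, start, finish, id, payload|
  puts "#{name}: #{payload[:errors].inspect}"
end

# ...and instrument that event from anywhere in your code.
ActiveSupport::Notifications.instrument("validation_failed.example", errors: ["Title can't be blank"])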

Bootstrap 3 Support for Simple Form

It’s been a while since we talked about Simple Form on the podcast, and just this week it received support for Bootstrap 3. Simple Form is the gem by Plataformatec which gives your Rails app a fully featured DSL for creating form elements. It can be especially useful when you have complex forms which may be stored across multiple models. Aside from adding support for Bootstrap 3, the 3.1 RC1 version of Simple Form which was just released has a bunch of improvements, which you can read about in the changelog.
Bootstrap 3 Support for Simple Form

How to be an open source gardener

Steve Klabnik wrote a great piece called How to be an open source gardener in which he talks about his experience working on Rails issue triaging. Everyone who uses Rails — or any open source project really — should give this a read. Steve does a good job of showing that open source often involves hard, tedious, and not-so-creative work, and I really love the gardening analogy. For a beautiful project to keep growing healthily we need people to pull the weeds out, so please check out Richard Schneeman’s CodeTriage and pick an open source project you’d like to occasionally help with issue triaging.
How to be an open source gardener

Sponsored by Ruby5

Ruby5 is released Tuesday and Friday mornings. To stay informed about and active with this podcast, we encourage you to do one of the following:

Thank You for Listening to Ruby5

Sass's @content Directive Use Cases

Posted 4 months back at GIANT ROBOTS SMASHING INTO OTHER GIANT ROBOTS - Home

Sass 3.2 added the @content directive, which allows us to pass a content block into a mixin.

@mixin apply-to-ie6-only {
  * html {
    @content
  }
}

@include apply-to-ie6-only {
  #logo {
    background-image: url(/logo.gif);
  }
}

Generates:

* html #logo {
  background-image: url(/logo.gif);
}

We can use it anywhere that declarations need to be wrapped in outer-scoped selectors, and many places where declarations are duplicated.

Media Queries

We can inline our media queries rather than maintaining separate, device-specific stylesheets, but writing queries over and over can get pretty unwieldy. We can simplify them by passing a content block into a mixin that wraps a query.

@mixin media($width) {
  @media only screen and (max-width: $width) {
    @content;
  }
}

@include media(320px) {
  background: red;
}

Generates:

@media only screen and (max-width: 320px) {
  background: red;
}

This becomes especially helpful for long, highly specific media queries, like the HiDPI mixin we use in Bourbon.

You can see our full @media mixin in Neat.

Keyframes

Keyframes are a good example of content duplication. Rather than rewriting the declarations for each vendor-specific selector, we can instead write a mixin to do it for us.

@mixin keyframes($name) {
  @-webkit-keyframes #{$name} {
    @content;
  }

  @-moz-keyframes #{$name} {
    @content;
  }

  @keyframes #{$name} {
    @content;
  }
}

@include keyframes(fadeIn) {
  from {
    opacity: 0%;
  }
  to {
    opacity: 100%;
  }
}

Generates:

@-webkit-keyframes fadeIn {
  from {
    opacity: 0%;
  }
  to {
    opacity: 100%;
  }
}

@-moz-keyframes fadeIn {
  from {
    opacity: 0%;
  }
  to {
    opacity: 100%;
  }
}

@keyframes fadeIn {
  from {
    opacity: 0%;
  }
  to {
    opacity: 100%;
  }
}

This is also used in Bourbon.

Context Specificity

I just picked up a project from Reda Lemeden, who wrote a pair of clever mixins to modify components for a given context.

Instead of creating many .component--modifiers or chaining modifying classes, we can better separate our concerns by defining a style's context specificity.

@mixin create-context($classes...) {
  @each $class in $classes {
    .#{$class} & {
      @content;
    }
  }
}

@mixin context--alternate-template {
  @include create-context(about, blog) {
    @content
  }
}

.header {
  height: 12em;
  background: red;

  @include context--alternate-template {
    background: green;
  }
}

Generates:

  .header {
    height: 12em;
    background: red;
  }

  .about .header {
    background: green;
  }

  .blog .header {
    background: green;
  }

Getting BEMy

Sass 3.3 adds the @at-root directive and improved &s. The former allows us to nest declarations in Sass, but compile them to the stylesheet's root. The latter appends any following text directly to the parent's selector.

These can be used with @content to simplify writing BEM syntax. Thanks to Scott Kellum for the original implementation.

@mixin element($name) {
  @at-root #{&}__#{$name} {
    @content;
  }
}

@mixin modifier($name) {
  @at-root #{&}--#{$name} {
    @content;
  }
}

.block {
  color: red;

  @include element(element) {
    color: green;

    @include modifier(modifier) {
      color: blue;
    }
  }
}

Generates:

.block {
  color: red;
}

.block__element {
  color: green;
}

.block__element--modifier {
  color: blue;
}

In Conclusion

@content is just one of many Sass directives that can empower us to remove duplication in our SCSS, and think more creatively about its organization and implementation. Learn more by reading the Sass directives documentation.

Uplink Latency of WiFi and 4G Networks

Posted 4 months back at igvita.com

The user opens your application on their device and triggers an action requiring that we fetch a remote resource: the application invokes the appropriate platform API (e.g. XMLHttpRequest), the runtime serializes the request (e.g. translates it into a well-formed HTTP request) and passes the resulting byte buffer to the OS, which then fragments it into one or more TCP packets and finally passes the buffer to the link layer.

So far, so good, but what happens next? As you can guess, the answer depends on the properties of the current link layer in use on the device. Let's dig a bit deeper...

Transmitting over WiFi

If the user is on WiFi, then the link layer breaks up the data into multiple frames and (optimistically) begins transmitting data one frame at a time: it waits until the radio channel is "silent," transmits the WiFi frame, and then waits for an acknowledgement from the receiver before proceeding with transmission of the next frame. Yes, you've read that right, each frame requires a full roundtrip between the sender and receiver! 802.11n is the first standard to introduce "frame aggregation," which allows multiple frames to be sent in a single transmission.

Of course, not all transmissions will succeed on their first attempt. If two peers transmit at the same time and on the same channel then a collision will happen and both peers will have to retransmit data: both peers sleep for a random interval and then repeat the process. The WiFi access model is simple to understand and implement, but as you can guess, also doesn't provide any guarantees about the latency costs of the transmission. If the network is mostly idle, then transmission times are nice and low, but if the network is congested, then all bets are off.

In fact, don't be surprised to see 100ms+ delays just for the first hop between the WiFi sender and the access point - e.g. see the histogram above, showing 180ms+ first-hop latency tails on my own (home) WiFi network. That said, note that there is no "typical" or "average" uplink WiFi latency: the latency will depend on the conditions and load of the particular WiFi network. In short, expect high variability and long latency tails, with an occasional chance of network collapse if too many peers are competing for access.

If your WiFi access point is also your gateway then you can run a simple ping command to measure your first hop latency.

Uplink scheduling on 4G networks

In order to make better use of the limited capacity of the shared radio channel and optimize energy use on the device, 4G/LTE standards take a much more hands-on approach to scheduling and resource assignment: the radio tower (eNodeB) tells the device when it should listen for inbound data, and also when it is allowed to transmit data. As you can imagine, this can incur a lot of coordination overhead (read: latency), but such is the cost of achieving higher channel and energy efficiency.

  1. The radio network has a dedicated Physical Uplink Control Channel (PUCCH) which is used by the device to notify the radio network that it wants to transmit data: each device has a periodic timeslot (typically on a 5, 10, or 20 ms interval) where it is allowed to send a Scheduling Request (SR) that consists of a single bit indicating that it needs uplink access.

  2. The SR request bit is received by the radio tower (eNodeB) but the SR request on its own is not sufficient to assign uplink resources as it doesn't tell the scheduler the amount of data that the device intends to transfer. So, the eNodeB responds with a small "uplink grant" that is just large enough to communicate the size of the pending buffer.

  3. Once the device receives its first uplink grant, it waits for its turn to transmit (up to ~5 ms), and sends a Buffer Status Report (BSR) indicating the amount of application data pending in its upload buffers. Finally, the eNodeB receives the BSR message, allocates the necessary uplink resources and sends back another uplink grant that will allow the device to drain its buffer.

What happens if additional data is added to the device buffer while the above process is underway? Simple: the device sends another BSR message and waits for a new uplink grant! If timed correctly, the BSR requests and uplink grants can be pipelined with existing data transfers from the device, allowing us to minimize first-hop delays. On the other hand, once the device buffer is drained and new data then arrives, the entire process is repeated once over: SR, uplink grant, BSR, uplink grant, data transmission.

So, what does this all mean in practice? Let's do the math:

  • If the network is configured to use a 10 ms periodic interval for communicating SR messages then we would expect a ~5 ms average delay before the SR request is sent.
  • There are two full roundtrips between the device and the eNodeB to negotiate the uplink resource assignment to transmit pending application data. The latency incurred by these roundtrips will vary for each network, but as a rule of thumb each exchange is ~5 ms.

Add it all up, and we're looking at 20+ ms of delay between application data arriving at the (empty buffer) of the link layer on the device and the same data being available at the link layer of the eNodeB. From there the packet needs to traverse the carrier network, exit onto the public network, and get routed to your server.
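
Putting rough numbers on that sum (the values below are just the rules of thumb from above, not measurements):

sr_wait   = 5  # ms: average wait for the next periodic SR slot (10 ms interval / 2)
sr_grant  = 10 # ms: SR -> small uplink grant roundtrip (~5 ms each way)
bsr_grant = 10 # ms: BSR -> full uplink grant roundtrip (~5 ms each way)

sr_wait + sr_grant + bsr_grant # => 25 ms before the first byte of data leaves the device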

The above uplink latency overhead is one reason why low-latency applications, such as delivering voice, can be a big challenge over 4G networks. In fact, for voice specifically, there is ongoing work on Voice over LTE (VoLTE), which aims to address this problem. How? Well, one way is to provide a persistent uplink grant: transmit up to X bytes on a Y periodic interval. Believe it or not, today most 4G networks still fall back to old 3G infrastructure to transmit voice!

Optimizing for WiFi and 4G networks

As you can tell, both WiFi and 4G have their challenges. WiFi can deliver low latency first hop if the network is mostly idle: no coordination is required and the device can transmit whenever it senses that the radio channel is idle. On the other hand, WiFi is subject to high variability and long latency tails if the network has many peers competing for access - and most networks do.

By contrast, 4G networks require coordination between the device and the radio tower for each uplink transfer, which translates to higher minimum latency, but the upside is that 4G can rein in the latency tails, providing more predictable performance and reducing congestion.

So, how does all this impact application developers? First off, latency aside and regardless of wireless technology, consider the energy costs of your network transfers! Periodic transfers incur high energy overhead due to the need to wake up the radio on each transmission. Second, the same periodic transfers also incur high uplink coordination overhead - 4G in particular. In short, don't trickle data. Aggregate your network requests and fire them in one batch: you will reduce energy costs and reduce latency by amortizing scheduling overhead.
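
As a concrete (and entirely hypothetical) sketch of the "aggregate, don't trickle" advice, here is what a simple client-side event buffer might look like in Ruby; the Beacon class and endpoint are made up for illustration:

require 'net/http'
require 'json'
require 'uri'

class Beacon
  def initialize(url, flush_size: 20)
    @uri        = URI(url)
    @buffer     = []
    @flush_size = flush_size
  end

  # Queue events locally instead of firing one request per event.
  def track(event)
    @buffer << event
    flush if @buffer.size >= @flush_size
  end

  # Send everything in a single batch, amortizing radio wake-ups and
  # uplink scheduling overhead across many events.
  def flush
    return if @buffer.empty?
    Net::HTTP.start(@uri.host, @uri.port) do |http|
      http.post(@uri.path, @buffer.to_json, "Content-Type" => "application/json")
    end
    @buffer.clear
  end
end

beacon = Beacon.new("http://example.com/events")
100.times { |i| beacon.track(type: "click", id: i) } # flushes every 20 events
beacon.flush                                         # drain whatever is left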

Back to Basics: Writing SQL Queries

Posted 4 months back at GIANT ROBOTS SMASHING INTO OTHER GIANT ROBOTS - Home

Almost all applications store data in one format or another somewhere. We developers spend a lot of time thinking about our data. In most cases we end up storing that data in a relational database. The advent of the ORM has made writing raw SQL much less common, but when we do, it's good to have an understanding of the fundamental elements of the language. Today we'll go over some basic queries and the syntax required to write them.

Setup

For the sake of simplicity we'll use SQLite3 for this blog. It's important to note that none of the topics or commands we'll look at are specific to SQLite3. All the query examples we'll go over below are standard SQL (ISO 9075) compliant and will work in any of the major databases (PostgreSQL, MySQL, Oracle, etc).

From the command line fire up SQLite3 and create a database named back_to_basics:

% sqlite3 back_to_basics

Now let's create three tables. We'll call them players, teams and players_teams:

SQLite3 version 3.7.12 2012-04-03 19:43:07
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> CREATE TABLE players (id INTEGER PRIMARY KEY ASC, name TEXT, seasons_played INTEGER);
sqlite> CREATE TABLE teams (id INTEGER PRIMARY KEY ASC, name TEXT);
sqlite> CREATE TABLE players_teams (player_id INTEGER, team_id INTEGER, won_championship BOOLEAN);
sqlite> .tables
players        players_teams  teams

Before we're done with the setup stage let's turn on explain mode. This will let us see our column names in query results:

sqlite> .explain on

Our Data

In order to write any queries we'll need some data, which means we need some players and teams. We'll use baseball since players tend to move around more in that sport.

Let's enter information on three baseball players and the first two teams they played for in their careers (just to keep the data set small).

Player           First Team       Second Team
Nolan Ryan       New York Mets    California Angels
Jim Sundberg     Texas Rangers    Milwaukee Brewers
Ivan Rodriguez   Texas Rangers    Florida Marlins

INSERT

To get our player data into the database we'll use the INSERT statement:

sqlite> INSERT INTO players (name, seasons_played) VALUES ('Nolan Ryan', 27);
sqlite> INSERT INTO players (name, seasons_played) VALUES ('Jim Sundberg', 16);
sqlite> INSERT INTO players (name, seasons_played) VALUES ('Ivan Rodriguez', 21);

SELECT

Now that we have data in our first table let's run a query and make sure it looks right. We'll use the SELECT statement to do this. For our first query we'll just ask the database to return all rows and columns from our players table. We'll use the * operator to do this:

sqlite> SELECT *
   ...> FROM players;
id    name           seasons_played
----  -------------  --------------
1     Nolan Ryan     27
2     Jim Sundberg   16
3     Ivan Rodriguez 21

In place of the * operator we can also give the SELECT statement a list of columns. This will limit the result set to just the columns we're interested in:

sqlite> SELECT name
   ...> FROM players;
name
----
Nolan Ryan
Jim Sundberg
Ivan Rodriguez

ORDER BY

We can also order our results how we'd like. This is done by using the ORDER BY clause:

sqlite> SELECT *
   ...> FROM players
   ...> ORDER BY seasons_played;
id    name           seasons_played
----  -------------  --------------
2     Jim Sundberg   16
3     Ivan Rodriguez 21
1     Nolan Ryan     27

Previously our results were always ordered by id. Because we used the ORDER BY clause we get results ordered by the seasons_played column.

You can also specify you would like to order results descending:

sqlite> SELECT *
   ...> FROM players
   ...> ORDER BY seasons_played DESC;
id    name           seasons_played
----  -------------  --------------
1     Nolan Ryan     27
3     Ivan Rodriguez 21
2     Jim Sundberg   16

A Little More Setup

Now that we know how to insert data, and query to ensure we entered it correctly, let's add data to our teams table:

sqlite> INSERT INTO teams (name) VALUES ('Texas Rangers');
sqlite> INSERT INTO teams (name) VALUES ('Florida Marlins');
sqlite> INSERT INTO teams (name) VALUES ('New York Mets');
sqlite> INSERT INTO teams (name) VALUES ('California Angels');
sqlite> INSERT INTO teams (name) VALUES ('Milwaukee Brewers');
sqlite> INSERT INTO teams (name) VALUES ('New York Yankees');

Junction Table

Now we need to connect our players and teams. We'll do this in our players_teams table. This is what is called a junction table. Junction tables are used to create many-to-many relationships in relational databases. They achieve this by combining common data from multiple tables. In our case we're going to include the id columns from our teams and players tables, allowing us to relate rows from one to the other.

We'll start with Nolan. His first two teams were the New York Mets and the California Angels. First we need to get our ids:

sqlite> SELECT *
   ...> FROM players;
id    name           seasons_played
----  -------------  --------------
1     Nolan Ryan     27
2     Jim Sundberg   16
3     Ivan Rodriguez 21

Nolan's id is 1. Let's find out what the Mets and Angels ids are:

sqlite> SELECT *
   ...> FROM teams;
id    name
----  -------------
1     Texas Rangers
2     Florida Marlins
3     New York Mets
4     California Angels
5     Milwaukee Brewers
6     New York Yankees

The Mets have an id of 3 and the Angels have an id of 4. Nolan won a World Series with the Mets. We now have enough information to write our INSERT statements:

sqlite> INSERT INTO players_teams (player_id, team_id, won_championship) VALUES (1, 4, 0);
sqlite> INSERT INTO players_teams (player_id, team_id, won_championship) VALUES (1, 3, 1);

In the INSERT statements above we associate the player with id 1 (Nolan) with the teams with ids 4 (Angels) and 3 (Mets), providing a 0, which represents false, for the won_championship column for the Angels and a 1, for true, for the won_championship column for the Mets. As you can see, we're able to create relationships between tables by using the ids our database is generating for each of our rows. This is one of the cornerstones of the relational database and is called a primary key.

Now we just need to finish up building our players_teams table.

Jim played for the Rangers and Brewers and didn't win a championship with either:

sqlite> INSERT INTO players_teams (player_id, team_id, won_championship) VALUES (2, 1, 0);
sqlite> INSERT INTO players_teams (player_id, team_id, won_championship) VALUES (2, 5, 0);

Ivan played for the Rangers and the Marlins and won a championship with the Marlins:

sqlite> INSERT INTO players_teams (player_id, team_id, won_championship) VALUES (3, 1, 0);
sqlite> INSERT INTO players_teams (player_id, team_id, won_championship) VALUES (3, 2, 1);

INNER JOIN

Now that we have some data to play with let's write some multi-table queries.

In order to do this we'll have to use joins. The most common join we have in our tool box is the INNER JOIN. An INNER JOIN allows us to combine two tables based on a condition we provide. The only rows from both tables that remain after the join are the rows that satisfy the condition.

Let's join our players table to our players_teams table and take a look at the results, on the condition that the players table's id matches the players_teams table's player_id column:

sqlite> SELECT *
   ...> FROM players
   ...> INNER JOIN players_teams ON players.id = players_teams.player_id;
id    name           seasons_played  player_id  team_id  won_championship
----  -------------  --------------  ---------  -------  -------------
1     Nolan Ryan      27             1          4        0
1     Nolan Ryan      27             1          3        1
2     Jim Sundberg    16             2          1        0
2     Jim Sundberg    16             2          5        0
3     Ivan Rodriguez  21             3          1        0
3     Ivan Rodriguez  21             3          2        1

The condition we discussed above is what comes after the ON keyword.

Notice that the rows in the players table have all been doubled. This is because the INNER JOIN preserves all of the matching rows from both tables: since there are two rows in the players_teams table for every one row in the players table (because we're looking at each player's first two teams), we see the rows from the players table twice.

Earlier when we inserted data into our teams table we added a row for the New York Yankees, but we actually don't have a player who spent one of their first two seasons with the Yankees.

Let's see what happens if we INNER JOIN the teams table and players_teams table:

sqlite> SELECT *
   ...> FROM teams
   ...> INNER JOIN players_teams ON teams.id = players_teams.team_id;
id    name               player_id  team_id  won_championship
----  -------------      ---------  -------  ----------------
4     California Angels  1          4        0
3     New York Mets      1          3        1
1     Texas Rangers      2          1        0
5     Milwaukee Brewers  2          5        0
1     Texas Rangers      3          1        0
2     Florida Marlins    3          2        1

Because we used an INNER JOIN the only rows present are the rows that satisfy our condition and in this case the New York Yankees do not, so that row isn't present in our result set.

We're not limited to joining just two tables either. Let's inner join all three of our tables together and see what that looks like:

sqlite> SELECT *
   ...> FROM players
   ...> INNER JOIN players_teams ON players.id = players_teams.player_id
   ...> INNER JOIN teams ON players_teams.team_id = teams.id;
id    name           seasons_played  player_id  team_id  won_championship  id  name
----  -------------  --------------  ---------  -------  ----------------  --  -------------
1     Nolan Ryan      27             1          4        0                 4   California Angels
1     Nolan Ryan      27             1          3        1                 3   New York Mets
2     Jim Sundberg    16             2          1        0                 1   Texas Rangers
2     Jim Sundberg    16             2          5        0                 5   Milwaukee Brewers
3     Ivan Rodriguez  21             3          1        0                 1   Texas Rangers
3     Ivan Rodriguez  21             3          2        1                 2   Florida Marlins

We can also take advantage of the SELECT statement we learned about above to create more readable result sets. Let's look at a list of just the players and the teams they played on:

sqlite> SELECT players.name, teams.name
   ...> FROM players
   ...> INNER JOIN players_teams ON players.id = players_teams.player_id
   ...> INNER JOIN teams ON players_teams.team_id = teams.id;
name            name
----            -------------
Nolan Ryan      California Angels
Nolan Ryan      New York Mets
Jim Sundberg    Texas Rangers
Jim Sundberg    Milwaukee Brewers
Ivan Rodriguez  Texas Rangers
Ivan Rodriguez  Florida Marlins

LEFT OUTER JOIN

A LEFT OUTER JOIN is very similar to an INNER JOIN, with one big exception: if a row in the table being joined onto does not match the condition specified in the join, it still remains in the result set.

Let's look again at the query from above where we joined players_teams to teams. As we recall, when we used an INNER JOIN on the two tables, the Yankees were omitted from the result set. Let's see what happens when we LEFT OUTER JOIN the two tables:

sqlite> SELECT *
   ...> FROM teams
   ...> LEFT OUTER JOIN players_teams ON teams.id = players_teams.team_id;
id    name               player_id  team_id  won_championship
----  -------------      ---------  -------  ----------------
1     Texas Rangers      2          1        0
1     Texas Rangers      3          1        0
2     Florida Marlins    3          2        1
3     New York Mets      1          3        1
4     California Angels  1          4        0
5     Milwaukee Brewers  2          5        0
6     New York Yankees

The Yankees show up in our result set, but with no values in the columns associated with the players_teams table.

As with INNER JOIN, we can specify columns in the SELECT, we can LEFT OUTER JOIN multiple tables, and we can mix INNER JOIN and LEFT OUTER JOIN in the same query.

WHERE

The WHERE clause gives us the ability to specify a condition that will be applied to every row in our final result set. If the condition is not met the row will not remain part of the result set.

Let's take a look at a list of all the players in our database that spent over 20 years in the league:

sqlite> SELECT *
   ...> FROM players
   ...> WHERE seasons_played > 20;
id    name           seasons_played
----  -------------  --------------
1     Nolan Ryan     27
3     Ivan Rodriguez 21

Our result set only includes the two players with over 20 years.

There are lots of different operators we can use in our WHERE clauses. Above we used the greater than operator. Let's take a look at a few more of our options.

We can look for rows that meet equality conditions:

sqlite> SELECT *
   ...> FROM players
   ...> WHERE seasons_played = 16;
id    name           seasons_played
----  -------------  --------------
2     Jim Sundberg   16

We can look for rows that contain values between two values:

sqlite> SELECT *
   ...> FROM players
   ...> WHERE seasons_played BETWEEN 20 and 22;
id    name           seasons_played
----  -------------  --------------
3     Ivan Rodriguez 21

We can look for rows that contain a value in a set we provide:

sqlite> SELECT *
   ...> FROM players
   ...> WHERE seasons_played IN (16, 27);
id    name           seasons_played
----  -------------  --------------
1     Nolan Ryan     27
2     Jim Sundberg   16

We can also use the WHERE clause on result sets that have been created by joining multiple tables. Let's look at a list of players that have won a championship:

sqlite> SELECT *
   ...> FROM players
   ...> INNER JOIN players_teams ON players.id = players_teams.player_id
   ...> WHERE won_championship = 1;
id    name           seasons_played  player_id  team_id  won_championship
----  -------------  --------------  ---------  -------  ----------------
1     Nolan Ryan      27             1          3        1
3     Ivan Rodriguez  21             3          2        1

We can also use the WHERE clause with columns from two different tables by combining conditions with AND or OR:

sqlite> SELECT *
   ...> FROM players
   ...> INNER JOIN players_teams ON players.id = players_teams.player_id
   ...> WHERE players_teams.won_championship = 1
   ...> AND players.seasons_played > 21;
id    name           seasons_played  player_id  team_id  won_championship
----  -------------  --------------  ---------  -------  ----------------
1     Nolan Ryan      27             1          3        1

DELETE

Deleting from a table is done using the DELETE command. Above we made a mistake and inserted the New York Yankees into our teams table. None of the players we have in our database played for that team, so, like I wish we could in real life, we need to delete the Yankees.

First let's get the Yankees' id:

sqlite> SELECT *
   ...> FROM teams;
id    name
----  -------------
1     Texas Rangers
2     Florida Marlins
3     New York Mets
4     California Angels
5     Milwaukee Brewers
6     New York Yankees

We'll use their id, 6, to safely delete them from the database:

sqlite> DELETE FROM teams WHERE id = 6;
sqlite> SELECT * 
   ...> FROM teams;
id    name
----  -------------
1     Texas Rangers
2     Florida Marlins
3     New York Mets
4     California Angels
5     Milwaukee Brewers

I like that result set much better!

GROUP BY

We can also group results together and aggregate values. We'll list our teams and aggregate the total number of players in our database that spent one of their first two seasons playing there. This query will require us to GROUP BY team name and aggregate the number of players using the COUNT function. We'll count the number of players on a team by counting the number of players.name values in each group:

sqlite> SELECT teams.name, COUNT(players.name)
   ...> FROM teams
   ...> INNER JOIN players_teams ON teams.id = players_teams.team_id
   ...> INNER JOIN players ON players.id = players_teams.player_id
   ...> GROUP BY teams.name;
name               COUNT(players.name)
----               -----------------
California Angels  1
Florida Marlins    1
Milwaukee Brewers  1
New York Mets      1
Texas Rangers      2

Let's look at what the result set would look like without grouping (we'll order by team name to make the results more obvious):

sqlite> SELECT teams.name, players.name
   ...> FROM teams
   ...> INNER JOIN players_teams ON teams.id = players_teams.team_id
   ...> INNER JOIN players ON players.id = players_teams.player_id
   ...> ORDER BY teams.name;
name               players.name
----               -----------------
California Angels  Nolan Ryan
Florida Marlins    Ivan Rodriguez
Milwaukee Brewers  Jim Sundberg
New York Mets      Nolan Ryan
Texas Rangers      Jim Sundberg
Texas Rangers      Ivan Rodriguez

The Texas Rangers show up twice in the non-grouped result set.

One thing to note about the GROUP BY clause: anything we leave in the SELECT statement must either be aggregated (which is what we're doing with the COUNT function) or appear in the GROUP BY clause (in this case, teams.name).

There are many other types of aggregate functions we can use and I encourage you to research more. Here is a list of aggregate functions available in Postgres.

HAVING

The HAVING clause works like a WHERE clause, but on grouped result sets. If we go back to our list of teams and player counts from above, we can use a HAVING clause to limit the result set to only teams that have more than one player from our list on them in their first two seasons, in other words a COUNT(players.id) greater than one:

sqlite> SELECT teams.name, COUNT(players.id)
   ...> FROM teams
   ...> INNER JOIN players_teams ON teams.id = players_teams.team_id
   ...> INNER JOIN players ON players.id = players_teams.player_id
   ...> GROUP BY teams.name
   ...> HAVING COUNT(players.id) > 1;
name           COUNT(players.id)
----           -----------------
Texas Rangers  2

Our result set has now been limited to only the teams we're interested in.

Sub Queries

We can also embed queries in our queries to create more useful result sets. Let's use a sub query to get a list of players who have won a championship:

sqlite> SELECT *
   ...> FROM players
   ...> INNER JOIN (
   ...>   SELECT player_id, MAX(won_championship)
   ...>   FROM players_teams
   ...>   GROUP BY player_id) sub_query_players_teams ON players.id = sub_query_players_teams.player_id;
id    name           seasons_played  player_id  MAX(won_championship)
----  -------------  --------------  ---------  ---------------------
1     Nolan Ryan      27             1          1
2     Jim Sundberg    16             2          0
3     Ivan Rodriguez  21             3          1

In this case we're just doing an inner join, but instead of joining to an existing table, we're creating a new result set and joining to that. Our subquery consists of only a list of player ids and the max of all of their won_championship columns. If a player has won a championship, the max will be one; otherwise it will be zero. Taking advantage of the subquery, we're able to get a nice list of players and whether they won a championship (the last column in our result set). If we tried to do this without a subquery, we would have to include all the information from our players_teams table, which, as we recall from our initial INNER JOIN, would lead to the players in the result set being doubled. In addition, we couldn't say definitively whether a player had won or not. We would be forced to look at several different rows to deduce the information.

What's next?

If you found this useful, you might also enjoy:

Episode #457 - April 18th, 2014

Posted 5 months back at Ruby5

URL parsing with Rippersnapper, awesome APIs with Pliny, thread-safe utilities from Charles Nutter, a revival of the invoicing gem, info about recursion and memoization, querying git with gitql, and refactoring bad controllers all in this episode of the Ruby5 podcast!

Listen to this episode on Ruby5

Sponsored by New Relic

New Relic is _the_ all-in-one web performance analytics product. It lets you manage and monitor web application performance, from the browser down to the line of code. With Real User Monitoring, New Relic users can see browser response times by geographical location of the user, or by browser type.
This episode is sponsored by New Relic

Rippersnapper

Working with URLs in your app has never been easier with Rippersnapper. Give it a URL and get back all your favorite parts!
Rippersnapper

Pliny

Base Sinatra app for writing excellent APIs in Ruby.
Pliny

thread_safe

A collection of utilities for thread-safe programming in Ruby.
thread_safe

Invoicing

Before you roll your own invoicing framework, have a look at the invoicing gem from Codemancers. It might be just what you need!
Invoicing

Recursion and Memoization

Ray Hightower on avoiding some recursion performance pitfalls in Ruby with memoization.
Recursion and Memoization

gitql

A git query language. Such git! So query!
gitql

Refactoring Controllers

Ever needed to refactor a bad controller? Justin Weiss walks you through his process in this new blog post.
Refactoring Controllers

Swirrl and the ODI

Posted 5 months back at RicRoberts :

ODI logo

Some great news - we’ve recently become members of the Open Data Institute. Founded in 2012 by Sir Tim Berners-Lee and Professor Sir Nigel Shadbolt, the ODI is an independent, non-profit, non-partisan company limited by guarantee. And they’re all about promoting innovation and collaboration when it comes to Open Data, not only in the UK but around the world.

So we’re happy as sand boys (and girls) to be members of an organisation that shares our passion for data. And, if that wasn’t enough, we’re also working with the ODI on the European DaPaaS project that we recently blogged about.

You can find us in their member directory and, of course, we’re always here if you want to get in touch.

Mid-Senior Ruby on Rails Developer up to 60K central London

Posted 5 months back at Ruby on Rails, London - The Blog by Dynamic50

Hiring a Mid-Senior Ruby on Rails Developer. You will be an experienced Ruby and Rails engineer with the opportunity to work in central London with a great company.


Essential skills
Ruby
Rails
Exposure to Rack Middleware and implementation
Gems
RSpec/Cucumber/Capybara
Creating/consuming RESTful APIs
Git


Experience that would be nice to have
Cloud: AWS, Heroku
Elasticsearch, Lucene, Solr

Central London, great company!


up to 60K based on experience


Get in touch now alex.tealdi@dynamic50.com 020 3286 2879

DaPaaS: simplifying open data publishing and use

Posted 5 months back at RicRoberts :

DaPaas logo

Towards the end of last year, we started work on a European collaborative project called DaPaaS (Data-and-Platform-as-a-Service). It aims to optimise and simplify both publication and use of Open Data across different platforms.

Being part of this is great for us because it’s very well aligned to our development objectives for our PublishMyData platform.

Our main role in the project is to simplify data publishing for non-experts. So, we’ll be developing tools to help users create and publish Linked Data themselves. And we’ll also be working towards creating a clear methodology for data publishing that can be used in the DaPaaS system. Exciting stuff, because it’ll make Linked Data more accessible to people who don’t usually work directly with the advanced, techy side, which is one of our goals.

But the good stuff doesn’t end there. We’re fortunate to be working with some really talented people on this project including our friends at the ODI. The full list of project partners is:

You can read more about DaPaaS and our consortium on the project website.

Test Driving iOS - A Primer

Posted 5 months back at GIANT ROBOTS SMASHING INTO OTHER GIANT ROBOTS - Home

In case you missed it, our own Britt Ballard (a fellow native Texan) published a fantastic post in our Back to Basics series on test-first methodology. My only issue with it is that the entire post is written in this super obscure language called Ruby! So I wanted to take a look at the same methodology from the point of view of an iOS developer, in tried and true Objective-C.

Our desired code

We would like to build a system that includes the following:

  • A category method for reversing strings
  • A custom UITableViewCell subclass with a method that:
    • Takes a string
    • Reverses the string
    • Sets the string as the text on its label

Tools and techniques

We're going to use the same techniques as laid out in Britt's post. We'll use red/green/refactor methodology, and we'll write unit tests that are fast, isolated, repeatable, self-verifying, and timely. But instead of using RSpec, we're going to use Specta and Expecta to test our method's output. These are modular spec and matcher frameworks (respectively) that make XCTest not horrible. We'll also use OCMock to mock out any external dependencies we might run into.

A quick aside on using Specta

Assuming you're using CocoaPods, you'll want to create a group inside your Podfile for our dependencies:

target :unit_tests, :exclusive => true do
  link_with 'UnitTests'
  pod 'Specta'
  pod 'Expecta'
  pod 'OCMock'
end

This will keep our testing dependencies out of our main bundle, and the :exclusive => true flag will keep any normal dependencies out of our testing bundle. This is to avoid duplicate symbols when the testing bundle is injected into the host application.

Specta has a syntax you might not be immediately familiar with. I highly recommend installing their file templates, but all you really need to know is that instead of your normal pattern for defining a class:

@implementation MyTestClass : XCTestCase

// Tests go here

@end

Specta uses some nifty macros to clean things up:

SpecBegin(ClassUnderTest)

// Tests go here

SpecEnd

Convention (stolen from Ruby) says that we should name our spec files ClassUnderTestSpec.m. Specs for categories will be named FoundationClass_ExtensionSpec.m. We don't need public headers for specs.

The testing syntax itself is very block heavy. But other than that, it follows RSpec's syntax almost exactly.

The category method: Starting with the desired end state

We'll start in the same place as we would in Ruby. Given a string ("example string"), we expect to be able to call a method to get a reversed version of that string ("gnirts elpmaxe"):

// NSString_ReversedSpec.m

#import "NSString+Reversed.h"

SpecBegin(NSString_Reversed)

describe(@"NSString+Reversed", ^{
    it(@"reverses strings", ^{
        NSString *originalString = @"example string";

        NSString *reversedString = [originalString reversedString];

        expect(reversedString).to.equal(@"gnirts elpmaxe");
    });
});

SpecEnd

Now, here's where Ruby and Objective-C diverge a bit. If we look at Britt's original post, this is when he starts running the tests. But since we have a compiler, and Xcode generates warnings and errors in real time, we're able to lean on those a bit to get us moving forward.

Right now we should have 2 errors displayed:

  • 'NSString+Reversed.h' file not found for our #import statement
  • No visible @interface for 'NSString' declares the selector 'reversedString'

This is the roadmap for our first few steps. Looking at the errors, we can see that we need to do 2 things:

  • Create a new Reversed category on NSString
  • Define a public reversedString method for that category

Let's go ahead and take care of the first one:

// NSString+Reversed.h

@interface NSString (Reversed)
@end

// NSString+Reversed.m

#import "NSString+Reversed.h"

@implementation NSString (Reversed)
@end

Once Xcode catches up, we should see one of the errors disappear. The only remaining error is telling us that we need to define reversedString. We know by looking at the usage that it's going to need to return an NSString. So let's go ahead and declare it:

// NSString+Reversed.h

@interface NSString (Reversed)

- (NSString *)reversedString;

@end

And hey! Just like that, we've gotten rid of all of the errors that were preventing us from building. But there's still one more thing we can use the compiler to lead us to, before actually running the tests.

Once the errors went away, we should have seen a warning pop up (you might have to build the project to get Xcode to catch up). The warning tells us that while we're saying that NSString has a method named reversedString, there isn't actually any implementation for a method with that name anywhere. Let's implement that method just enough to get rid of the warning:

// NSString+Reversed.m

#import "NSString+Reversed.h"

@implementation NSString (Reversed)

- (NSString *)reversedString
{
    return nil;
}

@end

Blam. No warnings, no errors. We haven't run the tests once, but we've used the compiler to help us get to the point where we know that our class is set up the way we said it would be. Let's finally run the tests! I recommend getting used to smashing ⌘U in Xcode. It's a handy shortcut. In the test navigator we see the failure:

And we see the error inline:

expected: gnirts elpmaxe, got: nil/null

So now we know that everything is hooked up properly, but we aren't returning the right thing. Let's make the test pass with the least amount of work possible:

// NSString+Reversed.m

#import "NSString+Reversed.h"

@implementation NSString (Reversed)

- (NSString *)reversedString
{
    return @"gnirts elpmaxe";
}

@end

And now we have our first passing test!

Refactoring

Returning a hardcoded value got our tests to pass, but it surely isn't what we want in production. It's pretty obvious that this method won't work for any string value, but now that we've gotten our test suite (small though it may be) to green, we can use it to help us complete the implementation without changing the expected behavior:

// NSString+Reversed.m

#import "NSString+Reversed.h"

@implementation NSString (Reversed)

- (NSString *)reversedString
{
    NSMutableString *reversedString = [NSMutableString string];

    NSRange range = NSMakeRange(0, [self length]);
    NSStringEnumerationOptions options = (NSStringEnumerationReverse | NSStringEnumerationByComposedCharacterSequences);

    [self enumerateSubstringsInRange:range
                               options:options
                            usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop) {
                                [reversedString appendString:substring];
                            }];

    return [reversedString copy];
}

@end

Now we have a nice method that we can use on any NSString instance that will return a reversed version of that string, and the whole thing is backed by tests!

UITableViewCell subclass

We start our journey the same as before. We'll state our expected behavior and then work towards getting a clean build, and a green test suite.

We'll start with the initial requirement of reversing the string provided to the method. We're going to use a mock object to ensure that we're only dealing with our Foundation class and the class under test:

// TBReverseStringCellSpec.m

#import <Specta/Specta.h>
#import <OCMock/OCMock.h>
#import "TBReverseStringCell.h"

SpecBegin(TBReverseStringCell)

describe(@"TBReverseStringCell", ^{
    it(@"reverses the string provided to it", ^{
        TBReverseStringCell *cell = [TBReverseStringCell new];
        id stringMock = [OCMockObject mockForClass:[NSString class]];
        [[stringMock expect] reversedString];

        [cell setReversedText:stringMock];

        [stringMock verify];
    });
});

SpecEnd

Right away, we notice that we've got a few errors to deal with. We'll start by creating the class itself:

// TBReverseStringCell.h

@interface TBReverseStringCell : UITableViewCell
@end

// TBReverseStringCell.m

#import "TBReverseStringCell.h"

@implementation TBReverseStringCell
@end

Now we're left with the warning about the undeclared selector, so we'll fix that:

// TBReverseStringCell.h

@interface TBReverseStringCell : UITableViewCell

- (void)setReversedText:(NSString *)string;

@end

And now we're left with the same kind of warning we saw earlier: setReversedText: is declared, but there's no implementation for it. Once again, we're only going to add enough code to get our project to build cleanly. In this case, that means creating an empty method definition:

// TBReverseStringCell.m

#import "TBReverseStringCell.h"

@implementation TBReverseStringCell

- (void)setReversedText:(NSString *)string
{
}

@end

Now that our build is clean, we can run our tests, and get some meaningful feedback:

test failure: -[TBReverseStringCellSpec TBReverseStringCell_reverses_the_string_provided_to_it] failed:
OCMockObject[NSString]: expected method was not invoked: reversedString

So let's go ahead and make that test pass:

// TBReverseStringCell.m

#import "TBReverseStringCell.h"
#import "NSString+Reversed.h"

@implementation TBReverseStringCell

- (void)setReversedText:(NSString *)string
{
    [string reversedString];
}

@end

And now our suite is green again. We can move on to our final requirement, that the cell sets the reversed string value on the internal label:

// TBReverseStringCellSpec.m

it(@"sets a string value on the internal label", ^{
    TBReverseStringCell *cell = [TBReverseStringCell new];
    id labelMock = [OCMockObject mockForClass:[UILabel class]];
    cell.reverseLabel = labelMock;
    [[labelMock expect] setText:@"gnirts elpmaxe"];

    [cell setReversedText:@"example string"];

    [labelMock verify];
});

Again, we get a build error because reverseLabel isn't yet part of our public interface, so declaring it is our first step:

// TBReverseStringCell.h

@interface TBReverseStringCell : UITableViewCell

@property (nonatomic) UILabel *reverseLabel;

- (void)setReversedText:(NSString *)string;

@end

With our build clean, we can see what we need to do to fix the test:

test failure: -[TBReverseStringCellSpec TBReverseStringCell_sets_a_string_value_on_the_internal_label] failed:
OCMockObject[UILabel]: expected method was not invoked: setText:@"gnirts elpmaxe"

A quick modification to our implementation gets us to our desired result:

// TBReverseStringCell.m

- (void)setReversedText:(NSString *)string
{
    NSString *reverseString = [string reversedString];
    self.reverseLabel.text = reverseString;
}

And there we have it. We were able to satisfy our requirements, and we did it by leading ourselves with tests. This example is simple and contrived, but this same pattern can be applied to requirements of any size.
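
For a sense of how the cell might be consumed in practice, here's a minimal sketch of a table view data source method. It isn't part of the original example: the reuse identifier, the class registration it implies, and the stringAtIndexPath: helper are all hypothetical.

// Hypothetical usage from a view controller that imports TBReverseStringCell.h
// and has registered the class for the @"TBReverseStringCell" reuse identifier.
- (UITableViewCell *)tableView:(UITableView *)tableView cellForRowAtIndexPath:(NSIndexPath *)indexPath
{
    TBReverseStringCell *cell = [tableView dequeueReusableCellWithIdentifier:@"TBReverseStringCell"
                                                                forIndexPath:indexPath];

    // The cell takes care of reversing the string before displaying it.
    [cell setReversedText:[self stringAtIndexPath:indexPath]]; // stringAtIndexPath: is a hypothetical helper

    return cell;
}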

What's next?

If you're excited about using TDD in your app, you should check out:

Episode #456 - April 15th, 2014

Posted 5 months back at Ruby5

In this episode we cover the results of the CloudFlare Heartbleed challenge, tracking trends in the Ruby community with the Ruby Survey, Rails 4.1 ActiveRecord enums, iStats for CPU temperature on OS X, and some Insanely Useful ActiveAdmin Customizations.

Listen to this episode on Ruby5

Sponsored by ElixirSips

If you're interested in learning Elixir but don't know where to start, ElixirSips.com is perfect for you. It delivers two short screencasts each week, each between 5 and 15 minutes long. ElixirSips currently consists of over 7 hours of densely packed videos, with more added every week.
This episode is sponsored by ElixirSips

Heartbleed Verified

Last Friday, CloudFlare set up a honeypot server and invited the world to steal its private SSL key. The goal was to test their belief that private SSL keys could not be extracted through the Heartbleed vulnerability. The result: two different people managed to steal the private key before the end of the day, and they proved it to CloudFlare by emailing the key back to them.
Heartbleed Verified

Ruby Survey

The Ruby Survey for 2014 was recently posted; its objective is to track trends in the Ruby community. If you're curious about the results from previous years, they are available on the website as well.
Ruby Survey

Enums

Andrea Longhi put together an article on Rails 4.1 ActiveRecord enums. She shows the methods that are automatically created for us and points out that if we try to define an enum value that clashes with a reserved word, ActiveRecord is smart enough to blow up with an error (see the sketch after the link for a flavor of the generated methods).
Enums
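
As a minimal sketch, modelled on the Rails 4.1 documentation rather than on Andrea's article itself (it assumes a model with an integer status column), declaring an enum gives you predicate, bang, and scope methods for free:

class Conversation < ActiveRecord::Base
  # Requires an integer "status" column; 0 maps to :active, 1 to :archived.
  enum status: [ :active, :archived ]
end

# Methods Rails generates automatically:
conversation.active?      # => true when status is "active"
conversation.archived!    # sets status to "archived" and saves
Conversation.archived     # scope returning archived records
Conversation.statuses     # => { "active" => 0, "archived" => 1 }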

iStats

iStats is a Ruby gem that displays the temperature for your computer's CPU. The gem is a Ruby wrapper for a small C library that interacts with Apple's IOKit library.
iStats

ActiveAdmin Customizations

Mike Ackerman wrote an article over at Viget Labs last Friday where he covers eight insanely useful ActiveAdmin customizations. Mike details how to add custom logic to your admin controllers, add query scoping to disallow some people from seeing *All The Things*, manipulate top-level menus, display images in upload forms, and more.
ActiveAdmin Customizations

Refactoring Ruby Iteration Patterns to the Database

Posted 5 months back at GIANT ROBOTS SMASHING INTO OTHER GIANT ROBOTS - Home

Frequently on projects we need to run a calculation through an ActiveRecord association. For example, we might want a user's all-time purchases, a company's total amount of products sold, or, in our case, the total amount of loans made to a campaign. This sort of calculation is ripe for a map and an inject, but frequently the solution is more elegant and faster if we have SQL do the work instead.

Here is what our domain looks like.

class Campaign < ActiveRecord::Base
  has_many :loans
end

class Loan < ActiveRecord::Base
  belongs_to :campaign
end

The goal

We want to be able to ask a campaign how much money it has raised. An initial implementation might look like this.

class Campaign < ActiveRecord::Base
  # ...
  def funds_raised
    loans.map(&:amount).inject(:+)
  end
end

We are using the association to get the campaign's loans and then summing up the amount of each loan. If we look at our logs we'll see that the SQL generated is grabbing all of the loans for our campaign and pulling them into memory.

SELECT "loans".* FROM "loans" WHERE "loans"."campaign_id" = <target_campaign_id>

Then in Ruby (at the application layer), we run the map to get the amounts and the inject to get the sum. Done. Not that bad but we can do better.

Using SQL sum

We want to avoid pulling all this data into memory just to get a single number. Let's have the database do the heavy lifting.

class Campaign < ActiveRecord::Base
  # ...
  def funds_raised
    loans.sum(:amount)
  end
end

With this implementation we have the database do the calculation. We get access to sum from ActiveRecord. Check out the docs for more details.

Looking at the logs we can see the following SQL.

SELECT SUM("loans"."amount") AS sum_id FROM "loans" WHERE "loans"."campaign_id" = <target_campaign_id>

This SQL is going to give us the same answer but in a single step. Through this refactoring we have code that is easier to read, we use less application memory, and we see a performance boost because we're pushing the calculation down to the database. Based on benchmarks where each campaign has 1000 loans, the sum approach can be more than 20x faster than using map / inject.
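
If you want to sanity-check the speed difference against your own data, a rough console benchmark might look like the sketch below. It's illustrative only, not from the original post: it assumes a seeded development database, and the numbers will vary with your data and database engine.

require "benchmark"

Benchmark.bm(12) do |x|
  # Re-fetch the campaign inside each block so the loans association
  # isn't cached between iterations.
  x.report("map/inject") { 50.times { Campaign.first.loans.map(&:amount).inject(:+) } }
  x.report("SQL sum")    { 50.times { Campaign.first.loans.sum(:amount) } }
end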

Next time we reach for a map and inject on associated ActiveRecord objects, we'll check to see if there is a way to have the database do the work instead.
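
The same idea extends beyond sum. As a sketch (these helper methods are illustrative, not part of the original example), ActiveRecord's other calculation methods cover most common aggregates without pulling records into memory:

class Campaign < ActiveRecord::Base
  has_many :loans

  def average_loan_amount
    loans.average(:amount)   # SELECT AVG("loans"."amount") ...
  end

  def largest_loan_amount
    loans.maximum(:amount)   # SELECT MAX("loans"."amount") ...
  end

  def backer_count
    loans.count              # SELECT COUNT(*) ...
  end
end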

What's next?

If you would like some more background on additional SQL aggregate functions, check out the following articles:

For some more information about Ruby's Enumerable class and Enumerable#inject in particular, check out:

Smart Data Hack Edinburgh

Posted 5 months back at RicRoberts :

In February, our very own Bill went to advise at the Smart Data Hack in Edinburgh. This was a five-day event in which undergraduates split into 16 teams to build apps using open data. Ewan Klein arranged it and it was well attended; eager to encourage our love of data and its uses, we put up a £250 prize for the best app that used government data.

And Bill said competition was high. You can read more detail on the different projects here, but the winning team was Go4itPrincess with their MSP Involvement Project. Their app examines how much Members of the Scottish Parliament (MSPs) speak up and on what topics, then aims to analyse any correlation between that and their constituencies' rankings on the Scottish Index of Multiple Deprivation. Pretty creative data use - kudos, princess(es?)!

And projects aside, one of the things that Bill found most impressive was that the students could get to grips with SPARQL for extracting their data. And they were generally happier retrieving data from APIs than having to download files and manage them themselves - showing just how accessible API use can be.

If you want to get more of an insight about the hackathon, check out tweets from the event and the screenshots below (thanks to the OKF Scotland blog), which show some of the great apps created.

CareForEd project:

MSP Involvement project: