Breaking into Orkut with Mechanize

Posted about 7 years back at schadenfreude

Alternate title: Using Mechanize to Scrape Orkut for Love Letters from Lonely Brazilian Beauties and Translating their Wantings to English

Shout outs to Natai for the request.

In this tutorial we will use mechanize to log in to Orkut and pull our scrap messages. We will assume each message is written in Portuguese and translate it to English with Google Translate.

FUN!

Axiom Of The Empty Set

Posted about 7 years back at zerosum dirt(nap) - Home

A list without any items in it is still a list, I must insist. Just like a box without anything in it is still a box, or a Smurf village without any Smurfs (Smurves?) is still a frickin Smurf village.

But not according to the W3C, apparently. File this one under Rant Of The Day: none of the standard XHTML doctypes acknowledge the existence of the empty list.

<!ELEMENT ul (li)+>

That is, neither the <ul></ul> nor the <ul/> representation is valid XHTML. This seems pretty broken to me at first glance, although I’m certainly willing to hear a rebuttal if anyone has one. Even if you don’t think it’s busted, it certainly adds complication to a very common web dev scenario…

So we have a list of resources in our view and there’s a good possibility that our list is empty. We want to be able to display that list, and have a button on the page that lets me add new items to the list with a little Ajax love so that we don’t have to reload the page. If we can represent an empty list in the page, it’s simple: we can just render :update in Rails…

render :update do |page|
  page.insert_html(:bottom, :my_list, "<li>#{item}</li>")
end

The code above assumes that there’s a DOM ID ‘my_list’ that has 0 or more elements. But since XHTML won’t let us represent a 0-element list, the obvious thing to do is to bake some extra smarts into the update code.

Now it’s the job of the update to determine if a list exists, and if not to create it, but only for the special one-off case of the zero element list? Yuck, bleh. No thanks. This is probably what most people do, but it just strikes me as putting too much intelligence into something that’s supposed to be pretty dumb.

The alternate solution listed here also qualifies as a hack, no bones about it. But, at least in my opinion, it’s a somewhat more elegant hack than what was described above. File under simple Workaround Of The Day:

<ul id="list_things">
  <li class="invisible" style="display: none"/>
</ul>

Note that the W3C does support empty list item elements :-). Thanks to azta for the suggestion.
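
For illustration, here is a minimal plain-Ruby sketch of the same workaround done on the server side. The helper name list_with_placeholder is hypothetical (it is not a Rails helper); it always emits the hidden placeholder item, so the markup has at least one li even when the collection is empty.

```ruby
require "erb"

# Hypothetical helper, not part of Rails: renders a <ul> that always
# contains a hidden placeholder <li>, so the list satisfies the
# (li)+ content model even when +items+ is empty.
def list_with_placeholder(dom_id, items)
  rows = items.map { |item| "  <li>#{ERB::Util.html_escape(item.to_s)}</li>" }
  [
    %(<ul id="#{dom_id}">),
    %(  <li class="invisible" style="display: none"/>),
    *rows,
    "</ul>"
  ].join("\n")
end

puts list_with_placeholder("list_things", [])
puts list_with_placeholder("list_things", ["first", "a < b"])
```

With this in place the Ajax update never needs a special case for the first item; it can always append to the existing list.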

Super Mathematics

Posted about 7 years back at zerosum dirt(nap) - Home


Do the math. No matter what you think, nobody’s perfect.

Instance variables, class variables, and inheritance in Ruby

Posted about 7 years back at Sporkmonger

I’m seeing a lot of code that indicates to me that people don’t have a complete grasp of when to use class variables in Ruby.

Here’s a quick example script that should give people a better idea of what’s going on:

class ExampleClass
  @variable = "foo"
  @@variable = "bar"
  
  def initialize
    @variable = "baz"
  end
  
  def self.test
    puts @variable
  end
  
  def test
    self.class.test
    puts @@variable
    puts @variable
  end
end

class ExampleSubclass < ExampleClass
  @variable = "1"
  @@variable = "2"
  
  def initialize
    @variable = "3"
  end
end

first_example = ExampleClass.new
first_example.test

puts "---"

second_example = ExampleSubclass.new
second_example.test

Output:

foo
2
baz
---
1
2
3

Essentially, when you declare a class variable in a base class, it’s shared with all subclasses. Changing its value in a subclass will affect the base class and all of its subclasses all the way down the inheritance tree. This behavior is often exactly what’s desired. But equally often, this behavior is not what was intended by the programmer, and it leads to bugs, especially if the programmer did not originally expect for the class to be subclassed by someone else.

I strongly encourage Ruby developers to sit down and think about how they want their classes to behave if subclassed by someone else. Be careful about using class variables, because often what you actually wanted was an instance variable on the class object.
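
As a rough sketch of that alternative, the following uses an instance variable on the class object, exposed through a class-level accessor. The names Base, Child, Other, and setting are made up for the example.

```ruby
class Base
  # Instance variable on the Base class object itself (not a class variable).
  @setting = "base default"

  class << self
    # Reader and writer for the class-level instance variable.
    attr_accessor :setting
  end
end

class Child < Base
  # Child has its own, separate @setting.
  @setting = "child default"
end

class Other < Base
  # No @setting assigned here, so Other.setting is nil:
  # class-level instance variables are not inherited.
end

Child.setting = "changed in child"

puts Base.setting          # still "base default"
puts Child.setting         # "changed in child"
puts Other.setting.inspect # nil
```

Unlike @@variable in the script above, a change made in the subclass does not leak into the base class. The trade-off is that a subclass that wants the parent's value has to ask for it explicitly.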

NH Ruby UG Meeting.002 Tuesday

Posted about 7 years back at zerosum dirt(nap) - Home

If you’re a Ruby developer (or just an interested outsider) living in southern Maine/NH or the northern Mass area, don’t forget to attend the next meeting of the NH Ruby/Rails User Group. Discussion topic this week is RJS and Ajax. Come hang out with us in Portsmouth tomorrow and make sure to stick around afterwards for drinks, discussion, and merriment.

Oh yeah, and Scott has some free stuff to give away at the gathering too. You like free stuff, don’t you?

Initial reactions to Rails 1.2... annoyed

Posted about 7 years back at Wood for the Trees

I’ve held off my opinion about Rails 1.2 until I had a little more experience with it. First of all, I’m going to say I’m glad Rails has put out this release, because some of the features in Edge I’ve been needing for a while and, in cases where I couldn’t wait, like my simply_restful_backport, I hacked a ported version to 1.1.x. I also had unpublished ports for REST, routing, and a few deprecations to prepare myself. Most of the added features are bang on what is needed when following the Rails methodology of web development: singleton resources, nesting resources, the emphasis on REST, the respond_to(&block), etc.

But my optimism and preparations were in vain. I’m annoyed. My initial reaction is that Rails 1.2 is a fascist. The front-facing libraries, ActiveRecord and ActionPack, are not the cause of my annoyance. It’s the underbelly, ActiveSupport, which has acquired a rank smell.

How Deprecations are handled

I’ll be brief. Backward compatibility is beautiful and when you can’t maintain it, you prepare developers for the transition. Deprecations are usually the answer: you announce well in advance that something is going to be removed, encourage people away from it, then finally remove it in either the next major release or the one after that, safe in the knowledge that everyone who will be affected by the change has got the message.
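
To make the idea concrete, here is a small sketch of what a gentle deprecation can look like in plain Ruby. It is not how Rails implements deprecations; the Deprecation module, the deprecate method, and the Report class are all invented for this example. The old method keeps working, but each call prints a warning that names the replacement and the release in which the old name goes away.

```ruby
# Hypothetical helper, not Rails' actual deprecation machinery.
module Deprecation
  # Defines old_name as a thin wrapper that warns on stderr and then
  # forwards the call (arguments and block included) to new_name.
  def deprecate(old_name, new_name, removal:)
    define_method(old_name) do |*args, &block|
      warn "DEPRECATION: #{old_name} is deprecated and will be removed in " \
           "#{removal}; use #{new_name} instead."
      send(new_name, *args, &block)
    end
  end
end

class Report
  extend Deprecation

  def render_text
    "report body"
  end

  # The old API keeps working for now, with a warning.
  deprecate :to_text, :render_text, removal: "2.0"
end

puts Report.new.to_text
```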

Rails has not held to the time-honoured traditions of deprecation. It does one of three things: fill up the logs with warning messages, raise annoying exceptions that apologise something is gone, or just plain crash because the old API had changed so much. That’s not deprecation. That’s just API upheaval.

And it’s unacceptable to have so many differences in the framework which, first, are undocumented, and second, change development so much. I would expect this behaviour from a 1.3 to 2.0 upgrade, but not between two minor releases. Maybe Rails Core should have re-numbered the release to warn users that Rails has changed quite drastically under the hood. The changes are fine, but it’s the way they came unannounced that is annoying.

Especially after so much time between the two minor versions, there was ample opportunity to tell developers… 1.1.6 was the last opportunity, and a better time was back at 1.1.2, when these additions already existed or were being considered. That is still too quick, but to give no warning, not even a proviso ‘this may change’, until it is too late is just insufferable.

ActiveSupport::Dependencies < ActiveSupport::Devil

For example, I don’t know why an innocuous ObjectSpace block in routes.rb should cause a blocking exception from ActiveSupport::Dependencies. It prevented my application from even loading and took half a day to track down. Or why I needed to alter ExceptionNotifier to require all its libs. Or track down 6 other ‘dependencies’ and make them explicit—I thought Rails was supposed to make this stuff easier, not more of a pain than C’s #include <hardtofind.h>?

It’s ridiculous for an ‘improvement’ release to be so fragile. Rails 1.1.x was pretty flexible on these terms. However, Rails, in its growing focus on encouraging programmers into standardised practices, is looking more like 37signals practices. And to say they are ‘best practices’ or good ‘conventions’ is a bit rich, because Rails is not the cleanest set of libraries. Just run it with $VERBOSE = true and you’ll see what I mean.[1]

In fact, after using ActiveSupport::Dependencies for a while, I’ve come to the conclusion that it is too clever for its own good and I foresee lots of bugs in Rails Core Trac about it. It’s too prying. It’s too observant of everything that is happening. It infects the production environment. It’s a glorified const_set trace_func.

ActiveSupport.include :bloat

What kind of overhead does ActiveSupport::Dependencies have? How many orders of magnitude slower is it than the previous Reloadable module, which just included a method, reloadable?, and was found via ObjectSpace? First of all, it didn’t need a revolution like Dependencies; it was clean, simple, elegant and extendable. It didn’t solve all the problems and had room for improvement, but removing it I think was a mistake. ActiveSupport::Dependencies, unlike its predecessor, is a beast. Here’s a quick LOC comparison between activesupport 1.3.1 and 1.4.1 (ignoring hooks in the other gems):

$ grep -v '^\s*#' activesupport-1.3.1/lib/active_support/reloadable.rb | wc -l
28
$ grep -v '^\s*#' activesupport-1.3.1/lib/active_support/dependencies.rb | wc -l
172

$ grep -v '^\s*#' activesupport-1.4.1/lib/active_support/reloadable.rb | wc -l
56
$ grep -v '^\s*#' activesupport-1.4.1/lib/active_support/dependencies.rb | wc -l
538

Not the most reliable comparison, but it gives a good idea of the bloat introduced by, first, dependency tracking and, second, simple deprecating. Most of those lines of code in Reloadable are about how it is not just deprecated, but downright crippled and how evil it was. This is not a good practice for developing an open source framework which is not easy to upgrade (and we live with it, because Rails is useful). The difficulty of upgrading is precisely because of these badly planned deprecations and backward incompatibilities.

Enough ranting

In this sense, Rails 1.2 is by far the worst gem update I’ve experienced so far. How many blogs have posted about the many little things that needed to be changed to upgrade? How many forward-thinking developers looked in cringing, abject horror when their tests failed left and right after that (apparently) over-optimistic gem update rails? How many people flew to IRC and the mailing lists for help? Let me be so conceited as to introduce a ‘best practice’: discuss major changes with your users.

I realise that Rails is first and foremost a 37signals project, evolved from DHH’s work there and is primarily altered and improved for that company’s purposes, but as an open source project trying to get more users, it’s important to either branch those changes which are special to that company or to be more considerate to the user base. I think Rails development would benefit from:

  • a separate experimental branch (different from Edge trunk) for API changes—Radiant does this quite nicely
  • a highly visible discussion board on rubyonrails.org to discuss major changes with a link to it in IRC and regular updates to the Rails mailing lists

Granted, many of my qualms with Rails 1.2 are with Dependencies, and with some good old-fashioned hacking, they can fix it. However, I think it and the library from which it has spawned are worth my comment. Dependencies is the most invasive addition to the framework I’ve seen to date. I appreciate the attempt at innovation, but Rails needs to stop doing this kind of widespread deprecation and fundamentalist intolerance to the unconventional.

I wish the development team would take more pleasure in the diversity of uses of Rails and less perverse pleasure in the militant drilling of young, transitional and experienced programmers. I like the framework, but I don’t work (or want to work) at 37signals.

1 I’ve been aware of my own transgressions for a while. I’m in complete agreement with Mauricio on the topic of warnings. They are there for a reason: to create semantically correct Ruby. We should all be able to run our fantastically short, readable, meta-programmed, overloaded and mixin-loving code with $VERBOSE = true. It isn’t difficult to achieve and it helps in the long run to make that lovely little code more readable, usable and quiet.

8.5 Minutes

Posted about 7 years back at zerosum dirt(nap) - Home

What were you doing for those eight and a half minutes?
Was it mean, was it petty, or did you realize you were sorry
And that you love them?

It’d be nice to think we could get it right down here just once.
G*d bless the Plan.

Inheritance vs Relational Databases in RoR

Posted about 7 years back at zerosum dirt(nap) - Home

There are three patterns in common use that deal with mapping object inheritance to relational databases. We didn’t discuss any of them in my graduate databases course (sigh), and our friend ActiveRecord implements only one of them: Single Table Inheritance (STI).

STI is what you use when you want to represent an object hierarchy by mapping the union of all attributes found in that class hierarchy to a single underlying database table. Yup, it’s a mouthful. And it works great if the attributes available on the subclasses all tend to be very similar. It’s very fast (comparatively), but makes poor use of space since unused fields are just left null. ActiveRecord’s implementation uses a type field in the database to identify the subclass that the row belongs to.
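
The mechanics can be sketched without a database. The following is a plain-Ruby illustration, not ActiveRecord code: the Vehicle hierarchy, the ROWS data, and the instantiate helper are all invented for the example. One "table" holds the union of all attributes, unused columns stay nil, and a type value picks the class for each row.

```ruby
# Pure-Ruby illustration of the single-table idea; no ActiveRecord involved.
class Vehicle
  attr_reader :name, :wheels, :cargo_capacity

  def initialize(row)
    @name = row[:name]
    @wheels = row[:wheels]
    @cargo_capacity = row[:cargo_capacity]
  end
end

class Car < Vehicle; end
class Truck < Vehicle; end

# One table for the whole hierarchy. cargo_capacity only makes sense
# for trucks, so it is simply left nil on car rows.
ROWS = [
  { type: "Car",   name: "hatchback", wheels: 4, cargo_capacity: nil },
  { type: "Truck", name: "hauler",    wheels: 6, cargo_capacity: 12_000 }
].freeze

# Looks up the class named in the type column and builds an instance of it.
def instantiate(row)
  Object.const_get(row[:type]).new(row)
end

ROWS.map { |row| instantiate(row) }.each do |vehicle|
  puts "#{vehicle.class}: #{vehicle.name}"
end
```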

So what about alternative approaches? There are a couple…

First up is Concrete Table Inheritance, in which there are n database tables for n concrete classes in the hierarchy. Although I’m sure there are cases where this must be useful, I can’t think of a single one. The problem is the silly amount of replication that happens here — fields common to classes will get replicated across tables. This means a lot of replication, more and more as the object hierarchy gets deeper, and a lot of difficulty when it gets to be refactoring time. Bleh, no thanks.

Class Table Inheritance, by contrast, has a table for each class, including interior nodes in the hierarchy (classes that have derivations). A subclass is represented at the database layer as a table that has some number of additive fields plus a foreign key to its parent class, which contains the fields that are common to all ancestors.

Conceptually, this seems like the cleanest strategy to me, and the most “OO” of the approaches. However, when we think about the implementation, we realize that performance is going to be a big issue: the class-table strategy is going to require n joins for an object nested n levels into a class hierarchy. Ouch.
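
To see where the joins come from, here is a small sketch that builds the query for one object under an assumed schema. The table names and the parent_id foreign key column are hypothetical, and class_table_query is not part of any library.

```ruby
# Builds the SELECT needed to load one object stored with class table
# inheritance. +tables+ is ordered from the most specific class up to
# the root. Assumed (hypothetical) schema: every child table has a
# parent_id column pointing at the id of its parent table.
def class_table_query(tables)
  leaf = tables.first
  joins = tables.each_cons(2).map do |child, parent|
    "JOIN #{parent} ON #{parent}.id = #{child}.parent_id"
  end
  (["SELECT * FROM #{leaf}"] + joins).join(" ")
end

# A class two levels below the root needs two joins.
puts class_table_query(%w[sports_cars cars vehicles])

# The root class itself needs none.
puts class_table_query(%w[vehicles])
```

Every extra level in the hierarchy adds one more JOIN clause to every load, which is the performance cost described above.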

Despite that admittedly ugly problem, there has been some interest in implementing Class Table Inheritance in ActiveRecord. In fact there’s some contributed code on the Rails wiki that should allow you to use it as an alternative to STI in Rails. Someone should plugin-ize it, perhaps.

I initially set out to look at alternative DB/object mappings because I was frustrated by the fact that STI wasn’t “accommodating” enough for me. At one point I thought that Class Table Inheritance might be a better match for a specific problem we were looking to solve. It turns out that wasn’t really the case at all; the problem wasn’t STI, it was the way our object hierarchy was structured.

My Lesson Of The Day for February 16th, 2007: If your object hierarchy is relatively shallow, and the single table strategy produces ugly results for you (lots of null fields on subclasses), take another look at the relationships between your objects. STI is a great fit for things that are very closely related to the parent class and require few extra attributes to express that. It’s a lousy fit, otoh, for object relations that aren’t tight like rockstar pants. If both CarPart and DogLeash share the parent class Purchase, they’re probably not good candidates for STI (wink, wink).

Embarrassed as I am to admit it, that was sort of the problem. More on that later, maybe.

Scraping Gmail with Mechanize and Hpricot

Posted about 7 years back at schadenfreude

This quick tutorial will show you how to use mechanize and hpricot to log in to Gmail and return a list of unread emails.

Silence is golden

Posted about 7 years back at The Hobo Blog

Just in case you’re wondering, there are some new screencasts and a pretty cool Hobo update on their way very soon now. Just thought we’d let you know. :-)
