Knowledge and Power

Posted over 8 years back at Ryan Tomayko's Writings

Heard today on lwn.net:

… knowledge is exactly like power – something to be distributed as widely as humanly possible, for the betterment of all. — jd

Word!

Watching people watch stuff at the Magic Kingdom

Posted over 8 years back at Ryan Tomayko's Writings

Neal Stephenson’s In The Beginning was the Command Line was recently put online (legally and with permission from Stephenson, of course). I've always heard this was a must read for anyone with even a mild interest in the history of the computer industry. My vertical scrollbar is at about 37% and already I have to agree.

This alone will be worth the read (and it doesn’t even have anything to do with computing):

I was in Disney World recently, specifically the part of it called the Magic Kingdom, walking up Main Street USA. This is a perfect gingerbready Victorian small town that culminates in a Disney castle. It was very crowded; we shuffled rather than walked. Directly in front of me was a man with a camcorder. It was one of the new breed of camcorders where instead of peering through a viewfinder you gaze at a flat-panel color screen about the size of a playing card, which televises live coverage of whatever the camcorder is seeing. He was holding the appliance close to his face, so that it obstructed his view. Rather than go see a real small town for free, he had paid money to see a pretend one, and rather than see it with the naked eye he was watching it on television.

And rather than stay home and read a book, I was watching him.

I fell in love with Stephenson’s writing style with Snow Crash. The day after I finished Snow Crash, I purchased Cryptonomicon – in hardcover. I have not yet found the time to take on the Baroque trilogy.

Oh yeah. There are some mixed feelings about the copy of Command Line linked to. The guy who got Stephenson’s okay to put it online felt it necessary to annotate some bits that were slightly out of date. I read the first couple of annotations, couldn’t really follow, and skipped the rest. They aren’t overly intrusive and might even make sense to a different monkey.


The Static Method Thing

Posted over 8 years back at Ryan Tomayko's Writings

Phillip J. Eby has a recent weblog entry, Python is not Java, wherein he points out a few aspects of Python that are notably different from Java but that are similar enough that Java coders working in Python mistake them for their Java counterparts.

He touches on a few of the characteristics that led me to make Python my general purpose language of choice, a title that had belonged to Java as recently as a year ago. I thought it might be useful to explain these characteristics for readers who have significant Java experience but not a whole lot of Python experience. Something like this would have helped me determine much sooner whether Python was right for me, and I hope it can be of use to someone else.

I'm planning on covering each of the items Phillip points out in his article over the next month and if that works out, I may extend the series to a few other things that may be useful.

This is not a Python is better than Java thing. Instead of bashing aspects of Java’s language and library that are obviously valued by the Java community, or insulting the intelligence of Java coders, or using any number of other techniques you might find in Every Language War Ever, Phillip is simply asking Java programmers who may be dabbling in Python to take a closer look at some of the features that make Python unique and attractive instead of attempting to force-fit concepts from Java. If you base your expectations of Python on Java concepts, you are likely going to have a bad experience with Python.

Continue on for the first part of the series, which talks about some of the differences between Python class methods / class variables and Java static methods / static variables.

Python Class Methods are not Java Static Methods

I’ll start from the top with the comment on static variables and methods in Java and why the equivalent Python code looks a bit different:

A static method in Java does not translate to a Python classmethod. Oh sure, it results in more or less the same effect, but the goal of a classmethod is actually to do something that’s usually not even possible in Java (like inheriting a non-default constructor).

We’ll get into the inheriting a non-default constructor bit in a second. For now, let’s take a look at a static variable and method in Java:

public class X {
  public static String message = "Hello";
  public static void sayHello() {
    System.out.println(X.message);
  }
}

Calling this method from another class looks like this:

// X is in the same (default) package, so no import is needed
public class Main {
  public static void main(String[] args) {
    X.sayHello();
  }
}

Now, if you’re teaching someone to program and using Java to illustrate, this situation kind of sucks. You've introduced classes, the static modifier, and access modifiers – all before you can even start to talk about the basic concept of a function (let alone a method) or writing output to the screen. (Note: this makes Java only mildly less shitty than C++ for teaching programming because at least you don’t have to deal with pointers and operator overloading: cout << "Hello".)

Whether Java’s constraint of requiring everything to be written within a class was a good idea is debatable (and it has been debated to death). On the one hand, it forces programmers to think about programming using OOP concepts, which generally helps ensure that code is somewhat reusable (so the theory goes). On the other hand, it forces programmers to think about programming using OOP concepts, even when they may be overkill. This leads to code like the example above that has a lot of extra scenery to distract you from the intent.

The idiomatic translation of a Java static method is usually a module-level function, not a classmethod or staticmethod. (And static final fields should translate to module-level constants.)

Here’s the idiomatic translation to Python of the Java example above. The following is in a module file named X.py:

message = "Hello"
def say_hello():
    print(message)

And calling the function from another module (in Main.py):

import X
X.say_hello()

Now let’s look at an example of what not to do in Python:

class X:
    message = "Hello"
    def say_hello(cls):
        print(X.message)
    say_hello = classmethod(say_hello)

Calling the above from another module looks like this:

import X
X.X.say_hello()

Here’s why: in Python, each module is itself an object without having to define an explicit class. A single module in Python maps to a single .py file on disk. Modules may contain functions, attributes (variables), multiple class definitions, etc. There is no equivalent of the Python module in Java – each .java file may contain at most one public top-level class, and that class must match the filename. Imports in Java work at the class level; imports in Python work at the module level.
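To make the "module is an object" point concrete, here is a small sketch. It uses the standard library's math module as a stand-in for X.py; that choice is only for illustration and is not part of the example above.

```python
import math
import types

# A module is an ordinary object of type ModuleType.
assert isinstance(math, types.ModuleType)

# Top-level functions and variables are simply attributes of that object.
sqrt = getattr(math, "sqrt")
four = sqrt(16.0)
assert "pi" in vars(math)

# Like any other object, a module can be bound to another name or passed around.
m = math
two = m.floor(2.7)
```

Nothing here required writing a class, which is the whole point: the module itself plays the role that a purely static class would play in Java.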

This may just confuse things further but it may help for the Java inclined to think of each Python module as acting like a purely static Java class. Functions defined at the top level of a module are static methods on the class. Variables defined at the top level of a module are static members. Classes defined at the top level of a module act like public static inner classes. In other words, the idiomatic Python example maps exactly to the Java example without having to write an explicit class at all. The example of what not to do in Python would look like this if you ported it directly to Java:

public class X {
  public static class X {
    public static String message = "Hello";
    public static void sayHello() {
        System.out.println(X.X.message);
    }
  }
}

It should now be obvious why the second form is incorrect in Python: it’s incorrect in Java!

Class Method Inheritance

So why have a Java-static-method-like construct in Python at all? Phillip begins to answer that question in his post:

the goal of a classmethod is actually to do something that’s usually not even possible in Java (like inheriting a non-default constructor).

In a Python classmethod, the class object that the method is invoked on is passed as the first parameter to the method. This can be used for a variety of weird and arcane things but the most common use is for factory-style constructors.

Like static methods in Java, Python classmethods are inherited by subclasses. Unlike Java, it is possible for subclasses to override classmethods. Python also passes the class object that a classmethod is called on to the method as the first parameter. This gives superclasses access to the subclass' methods (such as the constructor).
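A minimal sketch of both points, inheritance and overriding, might look like the following. The Shape and Circle classes and their method names are made up for illustration; the @classmethod decorator is just shorthand for the name = classmethod(name) form used elsewhere in this post.

```python
class Shape(object):
    def __init__(self, name):
        self.name = name

    @classmethod
    def from_text(cls, text):
        # cls is whichever class the method was called on, so
        # subclasses inherit this factory-style constructor for free.
        return cls(text.strip().lower())

    @classmethod
    def describe(cls):
        return "generic shape"


class Circle(Shape):
    @classmethod
    def describe(cls):
        # A subclass can override a classmethod like any other method.
        return "round shape"


shape = Shape.from_text("  Blob ")
circle = Circle.from_text("  Circle ")
```

Circle never defines from_text, yet Circle.from_text() returns a Circle rather than a Shape, because cls is bound to Circle for that call.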

Here’s an example that provides a caching mechanism for new classes:

class Cacheable:
    _cache = {}  # shared by Cacheable and every subclass
    def create_cached(cls, value):
        # cls is the class the method was called on (X below)
        if value not in cls._cache:
            print('cache miss')
            rslt = cls(value)
            cls._cache[value] = rslt
        else:
            print('cache hit')
            rslt = cls._cache[value]
        return rslt
    create_cached = classmethod(create_cached)

class X(Cacheable):
    def __init__(self, value):
        self.value = value

x = X.create_cached('test')  # X instance created
x = X.create_cached('test2') # X instance created
x = X.create_cached('test')  # X instance taken from cache

When we call X.create_cached(), the method that runs is the one defined on Cacheable, but the cls argument is set to the X class object, so cls(value) builds an X. This means that many classes can subclass Cacheable and inherit the class methods it defines (__new__ also receives the class as its first argument, but that is something we won’t get into).

This technique is actually used very rarely in Python as there are usually simpler ways of accomplishing the same thing, and simplicity is highly valued in Python.
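As a rough sketch of what one such simpler way might look like (this is an illustration, not something taken from Phillip's post), the same caching can be done with a plain module-level function that takes the class as an ordinary argument. The names create_cached and _cache are reused here only for familiarity.

```python
_cache = {}


def create_cached(cls, value):
    """Return a cached cls(value), constructing it on first use."""
    key = (cls, value)  # keyed by class too, so classes never share entries
    if key not in _cache:
        _cache[key] = cls(value)
    return _cache[key]


class X(object):
    def __init__(self, value):
        self.value = value


a = create_cached(X, 'test')   # X instance created
b = create_cached(X, 'test2')  # X instance created
c = create_cached(X, 'test')   # X instance taken from cache
```

No base class is needed, and X does not have to know it is being cached.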

I hope this helped illuminate the difference between Python class methods/variables and Java static methods/variables. Remember, the point isn’t to convince you that Python classmethods are somehow cooler than Java static methods or vice versa, just that the two concepts are not equivalent.

Fedora Project Shaping Up

Posted over 8 years back at Ryan Tomayko's Writings

I've had a few conversations recently where someone expressed interest in GNU/Linux and asked about getting involved. I really wanted to suggest that they consider joining the Fedora project but I couldn’t do that comfortably because, well, there are some pretty massive issues.

So I'm really excited to see that Redhat is finally getting around to really supporting the excellent volunteer community that has dedicated themselves to making Fedora, and by extension Redhat’s commercial offerings, excellent distributions. Herewith some recent news that led to this post…

Seth Vidal gave a promising update on the status of Fedora Extras. For those not familiar with the breakdown of the Fedora community, Extras is to be a repository of additional packages for Fedora that aren’t part of the core distro but are packaged and tested using the same tools and processes. It was actually a separate project at one point (fedora.us). To my knowledge, Extras consists almost entirely of volunteer work and has been seriously neglected by Redhat leading to all kinds of disgruntlement among the community.

I'm not sure if Seth is going to be taking on some kind of formal leadership role for the volunteer side but, having worked with him on Yum, I can say that something like that would be hugely valuable to the community. His leadership skills are exceptional and I've always thought he’s done a tremendous job of keeping the crack out of yum.

The other bit is from Colin Charles who declares that this is Fedora’s most productive week (and it’s only mid-week). Again, the emphasis here is that the community is finally being paid for all their hard work and dedication—with tools to do more hard work no less.

It has always baffled me why Redhat hasn’t thrown these guys a bigger bone. It would be like going to Nike and saying, “Hey, if you build a factory, we’ll work for free as long as we can have a couple pairs of shoes.” Nike would have that factory up the next day. It took Redhat almost two years. Crazy.

How I Explained REST to My Wife

Posted over 8 years back at Ryan Tomayko's Writings

Translations of the following dialog available in Japanese, French, Vietnamese, Italian, Spanish, Portuguese, and Chinese. Huge thanks to YAMAMOTO Yohei, Karl Dubost, jishin, Barbz, Tordek, Edgard Arakaki, keven lw, respectively. If you know of additional translations, please leave a comment with the location.

Wife: Who is Roy Fielding?

Ryan: Some guy. He’s smart.

Wife: Oh? What did he do?

Ryan: He helped write the first web servers and then did a ton of research explaining why the web works the way it does. His name is on the specification for the protocol that is used to get pages from servers to your browser.

Wife: How does it work?

Ryan: The web?

Wife: Yeah.

Ryan: Hmm. Well, it’s all pretty amazing really. And the funny thing is that it’s all very undervalued. The protocol I was talking about, HTTP, it’s capable of all sorts of neat stuff that people ignore for some reason.

Wife: You mean http like the beginning of what I type into the browser?

Ryan: Yeah. That first part tells the browser what protocol to use. That stuff you type in there is one of the most important breakthroughs in the history of computing.

Wife: Why?

Ryan: Because it is capable of describing the location of something anywhere in the world from anywhere in the world. It’s the foundation of the web. You can think of it like GPS coordinates for knowledge and information.

Wife: For web pages?

Ryan: For anything really. That guy, Roy Fielding, he talks a lot about what those things point to in that research I was talking about. The web is built on an architectural style called REST. REST provides a definition of a resource, which is what those things point to.

Wife: A web page is a resource?

Ryan: Kind of. A web page is a representation of a resource. Resources are just concepts. URLs—those things that you type into the browser…

Wife: I know what a URL is..

Ryan: Oh, right. Those tell the browser that there’s a concept somewhere. A browser can then go ask for a specific representation of the concept. Specifically, the browser asks for the web page representation of the concept.

Wife: What other kinds of representations are there?

Ryan: Actually, representations is one of these things that doesn’t get used a lot. In most cases, a resource has only a single representation. But we’re hoping that representations will be used more in the future because there’s a bunch of new formats popping up all over the place.

Wife: Like what?

Ryan: Hmm. Well, there’s this concept that people are calling Web Services. It means a lot of different things to a lot of different people but the basic concept is that machines could use the web just like people do.

Wife: Is this another robot thing?

Ryan: No, not really. I don’t mean that machines will be sitting down at the desk and browsing the web. But computers can use those same protocols to send messages back and forth to each other. We've been doing that for a long time but none of the techniques we use today work well when you need to be able to talk to all of the machines in the entire world.

Wife: Why not?

Ryan: Because they weren’t designed to be used like that. When Fielding and his buddies started building the web, being able to talk to any machine anywhere in the world was a primary concern. Most of the techniques we use at work to get computers to talk to each other didn’t have those requirements. You just needed to talk to a small group of machines.

Wife: And now you need to talk to all the machines?

Ryan: Yes – and more. We need to be able to talk to all machines about all the stuff that’s on all the other machines. So we need some way of having one machine tell another machine about a resource that might be on yet another machine.

Wife: What?

Ryan: Let’s say you’re talking to your sister and she wants to borrow the sweeper or something. But you don’t have it – your Mom has it. So you tell your sister to get it from your Mom instead. This happens all the time in real life and it happens all the time when machines start talking too.

Wife: So how do the machines tell each other where things are?

Ryan: The URL, of course. If everything that machines need to talk about has a corresponding URL, you've created the machine equivalent of a noun. That you and I and the rest of the world have agreed on talking about nouns in a certain way is pretty important, eh?

Wife: Yeah.

Ryan: Machines don’t have a universal noun – that’s why they suck. Every programming language, database, or other kind of system has a different way of talking about nouns. That’s why the URL is so important. It lets all of these systems tell each other about each other’s nouns.

Wife: But when I'm looking at a web page, I don’t think of it like that.

Ryan: Nobody does. Except Fielding and a handful of other people. That’s why machines still suck.

Wife: What about verbs and pronouns and adjectives?

Ryan: Funny you asked because that’s another big aspect of REST. Well, verbs are anyway.

Wife: I was just joking.

Ryan: It was a funny joke but it’s actually not a joke at all. Verbs are important. There’s a powerful concept in programming and CS theory called polymorphism. That’s a geeky way of saying that different nouns can have the same verb applied to them.

Wife: I don’t get it.

Ryan: Well.. Look at the coffee table. What are the nouns? Cup, tray, newspaper, remote. Now, what are some things you can do to all of these things?

Wife: I don’t get it…

Ryan: You can get them, right? You can pick them up. You can knock them over. You can burn them. You can apply those same exact verbs to any of the objects sitting there.

Wife: Okay… so?

Ryan: Well, that’s important. What if instead of me being able to say to you, “get the cup,” and “get the newspaper,” and “get the remote”; what if instead we needed to come up with different verbs for each of the nouns? I couldn’t use the word “get” universally, but instead had to think up a new word for each verb/noun combination.

Wife: Wow! That’s weird.

Ryan: Yes, it is. Our brains are somehow smart enough to know that the same verbs can be applied to many different nouns. Some verbs are more specific than others and apply only to a small set of nouns. For instance, I can’t drive a cup and I can’t drink a car. But some verbs are almost universal like GET, PUT, and DELETE.

Wife: You can’t DELETE a cup.

Ryan: Well, okay, but you can throw it away. That was another joke, right?

Wife: Yeah.

Ryan: So anyway, HTTP—this protocol Fielding and his friends created—is all about applying verbs to nouns. For instance, when you go to a web page, the browser does an HTTP GET on the URL you type in and back comes a web page.

Web pages usually have images, right? Those are separate resources. The web page just specifies the URLs to the images and the browser goes and does more HTTP GETs on them until all the resources are obtained and the web page is displayed. But the important thing here is that very different kinds of nouns can be treated the same. Whether the noun is an image, text, video, an mp3, a slideshow, whatever. I can GET all of those things the same way given a URL.

Wife: Sounds like GET is a pretty important verb.

Ryan: It is. Especially when you’re using a web browser because browsers pretty much just GET stuff. They don’t do a lot of other types of interaction with resources. This is a problem because it has led many people to assume that HTTP is just for GETing. But HTTP is actually a general purpose protocol for applying verbs to nouns.

Wife: Cool. But I still don’t see how this changes anything. What kinds of nouns and verbs do you want?

Ryan: Well the nouns are there but not in the right format.

Think about when you’re browsing around amazon.com looking for things to buy me for Christmas. Imagine each of the products as being nouns. Now, if they were available in a representation that a machine could understand, you could do a lot of neat things.

Wife: Why can’t a machine understand a normal web page?

Ryan: Because web pages are designed to be understood by people. A machine doesn’t care about layout and styling. Machines basically just need the data. Ideally, every URL would have a human readable and a machine readable representation. When a machine GETs the resource, it will ask for the machine readable one. When a browser GETs a resource for a human, it will ask for the human readable one.

Wife: So people would have to make machine formats for all their pages?

Ryan: If it were valuable.

Look, we've been talking about this with a lot of abstraction. How about we take a real example. You’re a teacher – at school I bet you have a big computer system, or three or four computer systems more likely, that let you manage students: what classes they’re in, what grades they’re getting, emergency contacts, information about the books you teach out of, etc. If the systems are web-based, then there’s probably a URL for each of the nouns involved here: student, teacher, class, book, room, etc. Right now, getting the URL through the browser gives you a web page. If there were a machine readable representation for each URL, then it would be trivial to latch new tools onto the system because all of that information would be consumable in a standard way. It would also make it quite a bit easier for each of the systems to talk to each other. Or, you could build a state or country-wide system that was able to talk to each of the individual school systems to collect testing scores. The possibilities are endless.

Each of the systems would get information from each other using a simple HTTP GET. If one system needs to add something to another system, it would use an HTTP POST. If a system wants to update something in another system, it uses an HTTP PUT. The only thing left to figure out is what the data should look like.
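As an aside for readers who want to see those three verbs in code, here is a small sketch using Python's standard urllib.request. The host school.example, the paths, and the XML payloads are all placeholders invented for this example, and the requests are only constructed, never sent.

```python
from urllib.request import Request

# Hypothetical resource URL; school.example is a placeholder host.
url = "http://school.example/students/42"

# GET: ask for a machine readable representation of the student.
get_req = Request(url, headers={"Accept": "application/xml"}, method="GET")

# POST: add a new student to the collection.
post_req = Request("http://school.example/students",
                   data=b"<student><name>Pat</name></student>",
                   headers={"Content-Type": "application/xml"},
                   method="POST")

# PUT: update an existing student by sending its new state.
put_req = Request(url,
                  data=b"<student><name>Pat</name><grade>A</grade></student>",
                  headers={"Content-Type": "application/xml"},
                  method="PUT")

for req in (get_req, post_req, put_req):
    print(req.get_method(), req.full_url)
```

The noun (the URL) stays the same for the GET and the PUT; only the verb and the body change.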

Wife: So this is what you and all the computer people are working on now? Deciding what the data should look like?

Ryan: Sadly, no. Instead, the large majority are busy writing layers of complex specifications for doing this stuff in a different way that isn’t nearly as useful or eloquent. Nouns aren’t universal and verbs aren’t polymorphic. We’re throwing out decades of real field usage and proven technique and starting over with something that looks a lot like other systems that have failed in the past. We’re using HTTP but only because it helps us talk to our network and security people less. We’re trading simplicity for flashy tools and wizards.

Wife: Why?

Ryan: I have no idea.

Wife: Why don’t you say something?

Ryan: Maybe I will.

Blasphemy!

Posted over 8 years back at Ryan Tomayko's Writings

At this very moment, Linus Torvalds has a massive 42 vote lead on the next closest contender for the Twenty Top Software People in the World. Unfortunately, that next closest contender is Alan Turing. Blasphemy! This is just wrong on so many levels. I have a ton of respect for Linus and all but come on people, if you go to this site and don’t use a vote on Turing you need to get your head examined. The closest Linus should come to Turing is a tie: if every single visitor to the site votes for both.

The whole thing is stupid, really. What might be more interesting would be to allow readers to submit names (not select them) with a short one or two paragraph justification for the submission. Then, show all the names and let people read the justifications.

But the world doesn't work that way

Posted over 8 years back at Ryan Tomayko's Writings

I was searching for a specific piece of Dive into Python when I ran into this classic from diveintomark.org:

Tom: “This is really good. You could probably make some money off this someday.”
Mark: “Maybe, but I'm not going to. I'm giving it away for free.”
Tom: “Why would you do that?”
Mark: “Because this is the way I want the world to work.”
Tom: “But the world doesn’t work that way.”
Mark: “Mine does.”

I had to pour out a little liquor for my homies.

Transformation Templates in Kid

Posted over 8 years back at Ryan Tomayko's Writings

This is the second post in response to cross-breeding ZPT and XSLT. I'd like to dig into how I'd like templating to work in Kid and Leslie opens the door for me:

… maybe this is the sort of thing Ryan’s thinking about— I wonder how hard it would be to hack this into Kid? It would give only a subset of XSLT’s capabilities in trade for simplicity, and would only offer the pull approach, but it would give XML-pipelining to a ZPT-ish technology.

I'm not sure where to start so I’ll just dump out a little piece of a Kid template I've written as a unit test (i.e. there’s no code that processes this yet but it’s what I want to happen):

<p kid:match="test">
  This is a test. This paragraph was inserted in place of a 
  &lt;test&gt; element found in the source document.
</p>

The XSLT equivalent would be:

<xsl:template match="test">
  <p>This is a test. This paragraph was inserted in place of a
  &lt;test&gt; element found in the source document.</p>
</xsl:template>

Both examples transform this document:

<testdoc>
  <test>bla bla bla</test>
</testdoc>

… into this:

<testdoc>
  <p>This is a test. This paragraph was inserted in place of a
  &lt;test&gt; element found in the source document.</p>
</testdoc>

That’s a simple example but serves to illustrate the general concept of specifying match criteria to trigger template invocation. To me this is far superior to, let’s say, the ZPT/METAL model where the source document must specify a macro explicitly:

<testdoc>
  <span metal:use-macro="test" tal:omit-tag=""/>
</testdoc>
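To make the match-driven model from the Kid and XSLT examples above a little more concrete, here is a rough Python sketch built on the standard library's ElementTree. It is not how Kid works (no such code exists yet); apply_template is a made-up helper that only handles matching by tag name.

```python
import copy
import xml.etree.ElementTree as ET

SOURCE = "<testdoc><test>bla bla bla</test></testdoc>"
TEMPLATE = ("<p>This is a test. This paragraph was inserted in place of a "
            "&lt;test&gt; element found in the source document.</p>")


def apply_template(root, match_tag, template):
    """Replace every child element named match_tag with a copy of template.

    The root element itself is never replaced because it has no parent.
    """
    # Snapshot the elements first so the tree can be edited safely.
    for parent in list(root.iter()):
        for index, child in enumerate(list(parent)):
            if child.tag == match_tag:
                replacement = copy.deepcopy(template)
                replacement.tail = child.tail  # keep surrounding whitespace
                parent[index] = replacement
    return root


doc = apply_template(ET.fromstring(SOURCE), "test", ET.fromstring(TEMPLATE))
result = ET.tostring(doc, encoding="unicode")
print(result)
```

The source document never mentions the template; the template declares what it matches, which is the property being argued for here.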

Bringing XSLTish templates to Kid gets complex fast. In XSLT, the match value is an XPath match expression and lets you get pretty crazy with specifying when a template matches a node in the source document:

<xsl:template match="pet[@kind = 'dog' and ../@name =  'Jack']">
...

That matches all <pet> elements with a kind attribute of dog and whose parent has a name attribute of Jack. I'm not sure what the best way to do this in Kid would be. We could make kid:match a Python expression but there are a couple of issues with that. First, the expression would be verbose:

<p kid:match="context.tag == 'pet' and context.items['kind'] == 'dog'
              and context.parent.items['name'] == 'Jack'">

Too loud.

Second, there’s a performance penalty with using expressions. Each element in the source document would have to be run through n expressions to see if any templates match. If kid:match were just a tag name, we could do a dictionary lookup, but we'd lose a valuable piece of what makes XSLT templates so powerful.
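The trade-off can be sketched in a few lines of Python. Everything below is hypothetical: the template names are plain strings standing in for real templates, and neither function is part of Kid.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("<testdoc><test/><other/><test/></testdoc>")

# Strategy 1: every template carries an arbitrary predicate, so each
# element has to be tested against the templates one by one.
predicate_templates = [
    (lambda el: el.tag == "test", "test-template"),
    (lambda el: el.tag == "other", "other-template"),
]


def match_by_predicate(element):
    for predicate, template in predicate_templates:
        if predicate(element):
            return template
    return None


# Strategy 2: templates are keyed by tag name, so matching is a single
# dictionary lookup no matter how many templates exist.
tag_templates = {"test": "test-template", "other": "other-template"}


def match_by_tag(element):
    return tag_templates.get(element.tag)


by_predicate = [match_by_predicate(el) for el in doc.iter()]
by_tag = [match_by_tag(el) for el in doc.iter()]
```

Both strategies pick the same templates for this document, but only the first could express a condition like "whose parent has a name attribute of Jack".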

The other option would be to try to use XPath match expressions in Kid exactly like they’re used in XSLT. I'm not completely opposed to this idea but ElementTree, which I'm pretty committed to at this point, supports only light XPath functionality.
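For a sense of what "light" means in practice, here is a sketch against the xml.etree.ElementTree module that ships with current Python, which may well support more than the ElementTree release this post was written against. The sample document is invented for the example.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<people>"
    "<person name='Jack'><pet kind='dog'/><pet kind='cat'/></person>"
    "<person name='Jill'><pet kind='dog'/></person>"
    "</people>"
)

# Simple attribute predicates are supported by findall().
all_dogs = doc.findall(".//pet[@kind='dog']")

# A test on the parent's attribute (../@name = 'Jack') cannot be written
# inside the predicate, so walk the candidate parents explicitly instead.
jacks_dogs = [
    pet
    for parent in doc.iter()
    if parent.get("name") == "Jack"
    for pet in parent.findall("pet[@kind='dog']")
]
```

The second query is the pet[@kind = 'dog' and ../@name = 'Jack'] match from the XSLT example, spelled out by hand.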

But yeah, I plan to steal as much from XSLT’s template approach as possible while keeping things somewhat sane and simple. If there are any ideas or opinions out there, please let me know.

I've got one more post in me on this topic that I hope to get out either tonight or tomorrow. It will be on the ElementTree based pullparser I've been working on and how it plays into Kid’s templating system.

Why isn't there a simple XSLT?

Posted over 8 years back at Ryan Tomayko's Writings

Leslie de 0xDECAFBAD talks a bit about cross-breeding ZPT and XSLT and mentions Kid along the way. This is the first in a series of response posts.

More than once, I've wished that XSLT was as simple as ZPT (i.e. less verbose and intrusive, more document centered), and I've wished that ZPT had some of the features of XSLT (i.e. ability to be used as a transforming filter).

This is pretty much the exact thing going through my head when I decided to start in on Kid. The first half, the TAL/ZPT feel, is already quite visible in the Kid Language Reference. The second part, constructs for transformation, is what has been eating my brane for the past month or so. I'd like to expound on that a bit but first I want to get through a couple of other things Leslie mentioned that I've spent some time wondering about myself:

This strikes me as such an obvious idea that someone has to already have done it and possibly rejected it for good reason.

Here’s my take on this: XSLT people value XSLT’s programming language neutrality. Baking in support for a specific language would mean doing it in a manner that is language agnostic. XSLT 1.1 (abandoned) was to provide a programming language neutral mechanism for adding XPath extension functions. This proved to be fairly difficult and I think this may have led to general apathy in the XSLT crowd for mixing XSLT with more traditional programming languages. A little bit later XSLT was proven Turing complete. XSLT people stopped thinking about making things simpler since anything was technically possible and started looking to XSLT 2.0 to provide the types of ease-of-use features you would get with a programming language.

And then there’s XQuery. From a recent conversation on xml-dev, I'm getting the feeling that XQuery may become a general purpose programming language that is extremely XML aware. I've not spent much time on XQuery or XSLT 2.0 because casual reads through the early specs left a bad taste in my mouth as they both seemed waaaaaaaayy more complex than anything I would ever consider working with – especially in a Python environment.

But in some senses, the idea of mixing a general purpose language with XSLT has been tried and proven somewhat useful. Also, libxml2/libxslt has a fairly straightforward method for writing XPath extension functions. Still, this seems like a duct-tape solution that just masks XSLT’s complexity and limitations in limited scenarios. XPath extension functions are a bit like putting a pillow over your head right before someone rockets a cannonball at your face. They make things a bit easier but you’re still in a lot of pain. IMO, you need to start fresh valuing simplicity over completeness but keeping in mind the concepts that make XSLT attractive.

The Day Tim Bray Saved Java

Posted over 8 years back at Ryan Tomayko's Writings

In Weapons and Coding, I made a prediction:

The first environment [Java / .NET] to successfully mesh static and dynamic languages into a coherent platform will win the interpreted byte-code market.

Tim Bray must have come to a similar conclusion because he recently organized a meeting of the minds at Sun to talk about Dynamic Java. Here’s a great pic of the BDFL and Larry Wall Tim shot right before they pulled out their katanas to settle the Python vs. Perl debate like gentlemen.

Katana

In my mind, Tim is moving into this small classification of people labeled, Hero. His past work on XML (as in, his name is on the Recommendation), recent work on Atom, declaration of The Loyal WS-Opposition, contributions through W3C TAG on the excellent Architecture of the World Wide Web, and now this seemingly unrelated initiative to get Sun to wake up about dynamic languages puts Tim on the right side of almost every major area of innovation I'm interested in. Go Tim, go!

I really hope this leads to some serious discussion on how we can bring static and dynamic languages together into a single cohesive platform. Drop the MFL is better than YFL talk found in Every Language War Ever. We need to start having these types of conversations: MFL is a complement to YFL. This is happening only in very small pockets right now and until today, they were pockets without a lot of potential for real impact.

FC2 to FC3 upgrade with Yum

Posted over 8 years back at Ryan Tomayko's Writings

I finally got around to upgrading one of my non-critical FC2 desktop boxes to FC3 using yum and figured I'd dump my notes for Google. This covers the basic upgrade process.

You may also want to take a peek at Seth’s original post to fedora-test-list on FC2 to FC3 upgrades or Brandon Hutchinson’s notes on various Red Hat and Fedora upgrades with yum.

Before we begin

Commands with sudo assume you have root privileges on the box. You can use sudo, su over to root, or something else – whatever. Just make sure these commands are executed as root.

Install GPG Keys

You may already have the Fedora GPG keys installed. If not, or just to be safe, you can run this command:

$ sudo rpm --import http://fedora.redhat.com/about/security/4F2A6FD2.txt

Install fedora-release and yum packages

You will need to do this the old-fashioned way.

$ rpmbase=http://redhat.secsup.org/fedora/core/3/i386/os/Fedora/RPMS/
$ wget $rpmbase/yum-2.1.11-3.noarch.rpm
$ wget $rpmbase/fedora-release-3-8.i386.rpm
$ sudo rpm -Uvh --force fedora-release*.rpm
$ sudo rpm -Uvh yum-*.rpm

Upgrade Yum, Python, and RPM

Upgrade yum, python and rpm. This will help make sure the rest of the upgrade goes smoothly. There are some issues in the rpm provided with FC2 that have been worked out in FC3.

 $ sudo yum upgrade rpm python yum

Just for fun, and with the memory of yum 2.0 fresh in your mind, run the following:

 $ sudo yum list updates

This should just fly compared to yum 2.0. We all supported Seth in his decision to sell his soul to the devil for this speed-up. Seth, your soul has saved thousands of people thousands of minutes of waiting; we thank you!

Upgrade everything else

 $ sudo yum upgrade

This will take a while. Even if you get lucky and hit on a fast mirror, it’s still going to take a while.

Fix conflicts if needed

If you stuck with the Fedora Core, Extras (fedora.us), and Livna repositories you shouldn’t have any problems here. However, if you've installed RPMs from other third party repositories or RPMs found in the wild, you may run into a few conflicts.

For example, I got the following on my initial run of yum upgrade:

Error: missing dep: libgtop-2.0.so.2 for pkg gdesklets

The gdesklets RPM I had installed was built by me and likely had multiple packaging issues. The easiest way to proceed is to just remove the offending packages. Worry about reinstalling them after the upgrade:

$ sudo rpm -ev gdesklets
error: Failed dependencies:
    gdesklets is needed by (installed) gdesklets-starterbar-0.30-0.noarch
$ sudo rpm -ev gdesklets gdesklets-starterbar

After clearing up conflicts start the upgrade again:

$ sudo yum upgrade

Wrapping Up

You may want to go into /etc/grub.conf as root and change the default kernel:

default=0
timeout=10
splashimage=(hd0,0)/grub/splash.xpm.gz
title Fedora Core (2.6.9-1.681_FC3smp)
    root (hd0,0)
    kernel /vmlinuz-2.6.9-1.681_FC3smp ro root=LABEL=root2 rhgb quiet
    initrd /initrd-2.6.9-1.681_FC3smp.img
title Fedora Core (2.6.8-1.521smp)
    root (hd0,0)
    kernel /vmlinuz-2.6.8-1.521smp ro root=LABEL=root2 rhgb quiet
    initrd /initrd-2.6.8-1.521smp.img

I've selected the newly installed 2.6.9 kernel as my default.
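If you want to double check which entry that default line actually points at, here is a small sketch. It assumes a GRUB legacy style file where default= is a plain number counted from zero over the title lines (it does not handle default=saved), and the default_title helper is a made-up name for illustration only:

```python
def default_title(grub_conf_text):
    """Return the title that a numeric default= entry selects.

    Assumes GRUB legacy syntax: entries are counted from 0 in the
    order their "title" lines appear in the file.
    """
    default = 0          # GRUB legacy boots the first entry if unset
    titles = []
    for line in grub_conf_text.splitlines():
        line = line.strip()
        if line.startswith("default="):
            default = int(line.split("=", 1)[1])
        elif line.startswith("title "):
            titles.append(line[len("title "):].strip())
    return titles[default]


SAMPLE = """\
default=0
timeout=10
title Fedora Core (2.6.9-1.681_FC3smp)
    root (hd0,0)
title Fedora Core (2.6.8-1.521smp)
    root (hd0,0)
"""
```

Calling default_title(SAMPLE) returns the 2.6.9 entry; changing the first line to default=1 would select the old 2.6.8 kernel instead.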

All that’s left to do now is to reboot. I haven’t had a chance to dive into changes but nothing has jumped out as being completely broken.

XML Pull-chaining with Python

Posted over 8 years back at Ryan Tomayko's Writings

So this is pretty crazy. I'm messing around with ElementTree (which has been nothing less than perfect) and trying to get it to act like an xml.dom.pulldom/XmlTextReader style pull-parser. But I'd like to be able to assemble a chain of generator producing/consuming functions (or other callables) so that the file can be read, parsed, filtered/mutated, encoded, and written all incrementally.

Check it out:

import sys
import pulltree    # that's what I'm working on :)

def upper_filter(source):
    for (ev, item) in source:
        if ev == pulltree.CHARACTERS:
            item = item.upper()
        yield (ev, item)

reader = pulltree.reader(sys.stdin)
filter = upper_filter(reader)
writer = pulltree.writer(filter, sys.stdout)

for (ev, item) in writer:
    pass

C-z

$ echo "<hello>world</hello>" | python test_filter.py 
<hello>WORLD</hello>

That felt good. More functional than a chain of SAX XMLFilters, almost as efficient, and muuuuch perdier.
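Since pulltree is still a work in progress, here is a rough approximation of the same chain using only the standard library's xml.dom.pulldom. Treat it as a sketch under assumptions: the reader/upper_filter/writer names just mirror the example above, the run helper is made up for illustration, and the hand-rolled serializer only handles elements, attributes, and text.

```python
import io
from xml.dom import pulldom
from xml.sax.saxutils import escape, quoteattr


def reader(stream):
    """Yield (event, node) pairs as the parser works through the stream."""
    for event, node in pulldom.parse(stream):
        yield event, node


def upper_filter(source):
    """Upper-case character data and pass everything else through."""
    for event, node in source:
        if event == pulldom.CHARACTERS:
            node.data = node.data.upper()
        yield event, node


def writer(source, out):
    """Serialize events to out as they arrive, then pass them along."""
    for event, node in source:
        if event == pulldom.START_ELEMENT:
            attrs = "".join(
                " %s=%s" % (name, quoteattr(value))
                for name, value in node.attributes.items()
            )
            out.write("<%s%s>" % (node.tagName, attrs))
        elif event == pulldom.END_ELEMENT:
            out.write("</%s>" % node.tagName)
        elif event == pulldom.CHARACTERS:
            out.write(escape(node.data))
        yield event, node


def run(text):
    """Hypothetical helper: push a string through the whole chain."""
    out = io.StringIO()
    for _ in writer(upper_filter(reader(io.StringIO(text))), out):
        pass  # draining the last generator is what drives the pipeline
    return out.getvalue()
```

Nothing happens until the final loop pulls on the writer, so each event flows through the parser, the filter, and the serializer before the next one is read.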

Something like this might work someday soon:

import urllib2
import pulltree

XINCLUDE = '{http://www.w3.org/2001/XInclude}include'

def xinclude_filter(source):
    events = iter(source)
    for (ev, item) in events:
        if ev == pulltree.START_ELEMENT and item.tag == XINCLUDE:
            # splice in the events of the referenced document
            href = item.attrib['href']
            for woot in pulltree.reader(urllib2.urlopen(href)):
                yield woot
            pulltree.eat(item, events) # eat events to the end of the element
        else:
            yield (ev, item)

Granted, that’s as basic as an XInclude processor could be and still be useful but you get the point.
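For the curious, the same idea can be tried today against the standard library's xml.dom.pulldom. This is only a sketch under assumptions: the load argument is a made-up callable that maps an href to XML text (standing in for a URL fetch), and a depth counter replaces the hypothetical pulltree.eat.

```python
from xml.dom import pulldom

XINCLUDE_NS = "http://www.w3.org/2001/XInclude"


def xinclude_filter(source, load):
    """Replace each xi:include element with the events of its target.

    load is an assumed callable: href -> XML text of the document.
    """
    depth = 0  # greater than zero while inside an xi:include element
    for event, node in source:
        if depth:
            # Swallow everything up to the matching end of xi:include.
            if event == pulldom.START_ELEMENT:
                depth += 1
            elif event == pulldom.END_ELEMENT:
                depth -= 1
            continue
        if (event == pulldom.START_ELEMENT
                and node.namespaceURI == XINCLUDE_NS
                and node.localName == "include"):
            depth = 1
            included = pulldom.parseString(load(node.getAttribute("href")))
            for inner_event, inner_node in included:
                # Document start/end markers of the included file are noise.
                if inner_event not in (pulldom.START_DOCUMENT,
                                       pulldom.END_DOCUMENT):
                    yield inner_event, inner_node
            continue
        yield event, node
```

Passing something like a dict's __getitem__ as load makes it easy to try without touching the network. There is no recursion into nested includes, no fallback handling, and no xpointer support.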

Is BoingBoing a Legal Honeypot?

Posted over 8 years back at Ryan Tomayko's Writings

It occurred to me today that Cory Doctorow et al. may be using BoingBoing as a legal honeypot: a sort of tractor beam for litigation the EFF may be interested in testing in court…

Update pending on this. Xeni Jardin emailed me and I'm completely stoked right now. Too star struck to get a proper paragraph out. Short version is that it’s not and there’s no hidden agenda.

Since I've had a weblog, I've wondered about limitations to things like deep linking and using images/excerpts from other sites. If you follow BoingBoing, you know that almost every article is accompanied by images, audio, video, excerpts, links, etc. all taken from some other site.

There is some question as to what the law is surrounding these things. Advocates of the Free Culture movement, and Doctorow is one of the biggest, will tell you that having the ability to use and remix content is a cornerstone of the web. Indeed, this is a large part of what the EFF is fighting for.

It is important to note that the EFF is not a lobbying organization. Organizations like IPac work in the interest of the general public by lobbying Congress to pass—or more often, reject—legislation related to intellectual property. The EFF, on the other hand, does all its fighting in the courts. This is an important distinction because it means that the EFF requires that something hit the courts before it can move into action.

I think Doctorow and the EFF are tired of waiting around while companies use FUD tactics instead of real legal action and are trying to find ways to push some of this stuff into the courts.

Take this BoingBoing post on FastCompany’s linking policy, which basically states that you can’t link to their pages without express written permission. In the post, Doctorow blatantly links to their site four times as he’s writing about how it’s against their policy.

Translation: “bring it on!”

Doctorow and the EFF know that the policy won’t hold up in court but as long as FastCompany is selective in who they threaten to litigate, it never has to go to court – individuals will simply comply with FastCompany demands to avoid the costs in taking on a lawsuit.

BoingBoing has been diligent in linking to every site that pops up with a restrictive linking policy (NPR, Sellotape, etc). No one seems to be biting though. Hmmm, odd, maybe it has to do with all those Stanford lawyers working for the EFF.

Yesterday was probably the best example of this I've seen to date. Xeni Jardin (Correction: I originally wrote that she is also a member of the EFF. My bad. Xeni is not a member of the EFF. Her work as a tech journalist precludes her from belonging to the EFF because she sometimes covers them.) posts to BoingBoing on the recent Sony v. Kottke fiasco where Sony is threatening litigation against Jason Kottke for hosting some audio excerpts and text transcripts from the show Jeopardy. Long story short, Kottke (whose readership, btw, is likely in the hundreds of thousands) removed the content and is scared shitless because he doesn’t know what he might get sued over next. This obviously falls within the type of rights the EFF believes the public should have and so the BoingBoing post includes links to the video, audio, text transcripts, and other stuff that should trigger a notice from Sony. But Sony isn’t going to take on BoingBoing because, well, there’s that whole lawyer thing (i.e. BoingBoing has them).

This same trend can be seen elsewhere and people are getting tired of it. There’s an excellent article at NetworkWorldFusion, entitled Vendors: Stop threatening us, about the recent copyright/patent FUDfest surrounding Linux:

What we should fear is possible litigation over unlicensed, patented material that may or may not be in Linux.

[Snip a bunch of background on Sun being silly and Ballmer’s recent threats about IP indemnification]

Ptui, I say. If you've got patents, play that card, and play it now. We’re sick, mighty sick, of listening to it. Tell us, chapter and verse, what you think is the problem. Do it now. Stop the threats – go in to action. We’re the customers, remember? Stop harassing us with vague threats.

This chest thumping, lawyer-enriching hubris, and cold intimidation is causing this industry to slink around as though we’re all guilty of something that we didn’t do. If there’s a license to pay, then we’ll pay it. If you as vendors have manipulated the patent offices into granting a patent for something that you didn’t invent, then back off. If you so thoroughly patent your future products so that no one can work with them, then you've lost compatibility, interoperability, and therefore any worth to us.

Possible threat of litigation (not actual litigation) has become an accepted competitive force. This is something separate from actual copyright or patent; those just happen to be the vehicles being used right now for this new thing. Copyright/patent works great because the law around them is fuzzy to the general public. To label this force “intellectual property” is a fallacy, though; it has nothing to do with IP.

We talk about how IP is part of a company’s asset portfolio, and copyrights and patents should be a big part of a company’s assets. But the real asset that’s been such a boon of late isn’t the copyright or patent that grants specific limited monopoly; we've had those since Ancient Egypt and they've generally worked extremely well in society to balance the rights of the public against the rights of artists, authors, and inventors. The cool new asset is the capacity at which a company can create cost/liability for negatively impacting entities out of thin air, where a negatively impacting entity might be a competitor (Microsoft v. Linux) or a consumer (Sony v. Kottke, RIAA/MPAA v their customers, etc).

Organizations with significant legal capabilities can make up laws by simply threatening potential litigation so long as the potential cost of challenging the threat in court is greater than the challenger can afford. That’s what needs reforming. Not the basic copyright and patent law (well, some of that is shaky too) but the process for determining whether someone is, in fact, infringing. We've seen in the case of BoingBoing that the effectiveness of an IP threat is reduced significantly given capable legal backing. If there were a low cost means of estimating the validity of a copyright/patent litigation threat, I believe a lot of this nonsense would just disappear.

Kid 0.2 and a note on Template Design

Posted over 8 years back at Ryan Tomayko's Writings

I pushed up a much needed 0.2 release of Kid today. I hadn’t meant for my previous post to be an announcement but I got quite a few comments showing interest and was surprised to see some people actually grabbing the 0.1.1 release, which was, for all intents and purposes, not a release at all. The 0.2 release should be somewhat more stable. At least enough to dive in and play around.

I usually don’t enjoy documentation but for the most part, the doc has been writing itself on this project. I think the key is making sure you are concentrating on something small enough that you can see the end of the documentation.

When you’re documenting a year’s worth of coding – you've lost. It just isn’t going to happen.

One of the topics that has come up in comments and email is the ability to embed blocks of Python code in the XML template, much like you would with PHP, JSP, ASP, etc. but actually not like those at all because it’s not useful for controlling output, only for priming the context for template execution.

But code in templates is generally considered bad form as it can lead to a mash of concerns. I agree with this as a best practice and subscribe to something pretty close to a standard MVC design pattern myself. However, my experience with TAL, which allows only Python expressions, and those only in controlled circumstances when you have no other choice, was that I often wanted to just drop a couple of quick formatting-specific functions somewhere. Instead of putting them in the template, I was forced to pile them into a cruft module that just pissed me off every time I had to use it.

One of the reasons I'm so attracted to Python is that there’s very little “saving people from themselves.” Whether it be the liberal take on private attributes in classes, the ability to modify an object’s __dict__, the disregard for interface as contract, or a myriad of other aspects of the language, Python’s utter disrespect for all things sacred and holy in protecting machines from the idiot programmers that abuse them really flipped a switch for me.

I've grown to expect this from my tools and libraries. The “force them to be good and proper programmers” attitude around Java programming has become a drag.

But I still program like that in Java because it makes sense in Java, which is even more puzzling. One might assume that upon appreciating the ability to get at pseudo-private members in Python that the practice would transfer to other languages. My Java classes’ instance variables should be at the very most protected, right? It doesn’t work that way. The concepts aren’t portable between languages.

This goes for other aspects of the language as well. Unchecked exceptions immediately come to mind as something I value tremendously but cannot, even though technically feasible, bring myself to advocate seriously on the Java side.

I'm rambling now so I'm going to end this with a bit I finished up on tonight from the Kid Tutorial, where an attempt is made to convince the programmer that there is rarely, if ever, a situation where making blanket rules about what types of ways you might allow yourself to manipulate a machine is in your best interest. You can always refrain from doing something you’re able to do, but you can never do something you’re not able to. That’s not to say a system should not have constraints, just that it should not be constrained needlessly.

And of course, the following references to the Worse is Better mindfck are attributed to the masterful Richard Gabriel (or whatever his real name is).

A Note on Template Design

It is possible to embed blocks of Python code using the <?kid?> processing instruction. However, the practice of embedding object model, data persistence, and business logic code in templates is highly discouraged. In most cases, these types of functionality should be moved into external Python modules and imported into the template.

Worse is often not better.

Keeping large amounts of code out of templates is important for a few reasons:

  • Separation of concerns. Templates are meant to model a document format and should not be laden with code whose main concern is something else.

  • Editors with Python support (like Emacs) will not recognize Python code embedded in Kid templates.

  • People will call you names.

Clearly, worse is not better.

That being said, circumstances requiring somewhat complex formatting or presentation logic arise often enough to incline us to include the ability to embed blocks of real code in Templates. Template languages that help by hindering one’s ability to write a few lines of code when needed lead to even greater convolution and general distress.

Worse is sometimes better.

That being said, there are some limitations to what types of usage the <?kid?> PI may be put to. Specifically, you cannot generate output within a code block (without feeling dirty), and all nested blocks end with the PI.

You cannot do stuff like this:

<table>
  <?kid #
  for row in rows: ?>
    <tr><td kid:content="row.colums[0]">...</td></tr>
</table>
<p><?kid print 'some text and <markup/> too'?></p>

This is a feature.

Worse is sometimes better but oft is just worse.

Not exactly sure what I'm trying to say with that whole deal but I think that’s the point. You need to be able to make decisions when you need to make decisions.

Thanks for listening!

In search of a Pythonic, XML-based Templating Language

Posted over 8 years back at Ryan Tomayko's Writings

I've been searching for the perfect Python based XML template language. I was happy to find TAL (and specifically, SimpleTAL) a while back but, although neither of us wants to admit it, we've been growing apart for some time now. I spent last week looking for options and, after careful consideration and planning (read: beer and a nap), decided to just build the XML template language I really wanted.

There are at least four billion and nine text-based template languages for Python but there aren’t a lot of options that fit nicely into the XML tool-chain. Or, if they do fit nicely into the XML tool-chain, they don’t fit nicely with Python.

My dreamboat XML template language would combine the pythonicness and simplicity of PTL, the templating features and pipeline-ability of XSLT, and the terseness of Zope’s TAL. I'm building it, it’s called Kid, and I'm making good progress to be honest.

But I have this overwhelming NIH feeling so I've decided the best thing to do is to run through the current set of tools and take a professional, objective look at why each isn’t getting it done for me (i.e. make fun of minor flaws and limitations until I feel better about myself). Herewith, a look at the good and the bad in the Python XML templating space…

TAL (Template Attribute Language)

I like TAL and have used it fairly extensively for XML templating. In fact, the current Kid Language Spec looks nefariously like TAL sans METAL and TALES.

The things I kept:

  • Attribute based

    TAL and Kid consist entirely of attributes and no elements. It latches onto whatever markup you’re already dealing with. I like this approach as opposed to XSLT’s verbose element model because it results in a lot less clutter. TAL documents are ~80% primary markup and 20% TAL markup. XSLT documents are ~80% XSLT markup and 20% primary markup.

  • Tool Friendliness

    TAL survives round-trips through most tools. This is important.

The things I left:

  • TALES

    I liked TALES at first but have come to the point where I think having different types of expressions (python, path, string, etc.) just gets in the way. Python expressions do the job just fine (replace “/” by “.” and you usually come out okay).

  • METAL

    Maybe I just don’t get it but I was hoping for more of an XSLT <template> type construct or even a taglib approach. METAL is overly complex and doesn’t seem to do much other than insert stuff. I really want to be able to build up my own element definitions and have something to process them.

XSLT (XSL Transformations)

XSLT is the strange lover you can’t seem to leave behind. While I'm infatuated with the language from a theoretical point-of-view, I can’t stand typing all that crap over and over and over again. It’s too verbose, and the lack of a clear extension mechanism makes it hard to do things outside of document in, transform, document out.

That being said, I like XSLT. James Clark is my hero and I would totally have his babies if I were biologically capable of producing them. Concepts I will be stealing from XSLT are:

  • Templating

    Where XSLT has always shined for me was simple transformation based on matching element names in the source document and then turning them into something else in the output. In other words, making up your own tags or modifying the behavior of existing tags.

    Kid will have some kind of facility to provide a rough equivalent of XSLT’s <template> functionality.

  • Attribute Interpolation

    I always wished TAL would have gone this route instead of having tal:attributes.
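To make the attribute interpolation idea concrete, here is a minimal sketch on top of ElementTree. The {expr} syntax is borrowed from XSLT's attribute value templates and is an assumption for illustration, not a statement of what Kid's final syntax is. The interpolate and expand_attributes names are made up, and eval should only ever see templates you trust.

```python
import re
import xml.etree.ElementTree as ET

# Matches one {expression} with no nested braces (assumed syntax).
_EXPR = re.compile(r"\{([^{}]+)\}")


def interpolate(value, context):
    """Replace each {expr} in value with the str() of its Python result."""
    def evaluate(match):
        # eval runs arbitrary code: fine for trusted templates only.
        return str(eval(match.group(1), {}, context))
    return _EXPR.sub(evaluate, value)


def expand_attributes(root, context):
    """Interpolate every attribute value in the tree, in place."""
    for elem in root.iter():
        for name, value in list(elem.attrib.items()):
            elem.set(name, interpolate(value, context))
    return root


template = ET.fromstring(
    '<a href="/users/{user_id}" title="{name.upper()}">profile</a>'
)
expand_attributes(template, {"user_id": 42, "name": "ryan"})
```

After the call, the href attribute reads /users/42 and the title reads RYAN, with no separate tal:attributes style declaration needed.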

A few things I dislike about XSLT and will be trying to avoid:

  • It’s silly complex for simple templating.

  • The whole world isn’t available in XML just yet. Anything not XML is off-limits without extension functions.

  • Once you start using the document() xpath function, your happy world begins crashing down upon you. I still can’t explain this all that clearly but every time I've tried to composite documents using the document() function, complexity increases significantly.

PXTL (Python XML Template Language)

I have very little experience with PXTL but a run through the language specification gave me the impression that it was stunningly thorough, correct, and quite complete. It is a lot of stuff though. I just can’t see this being picked up by someone quickly due to the large number of constructs and apparatus.

PTL (Python Template Language)

Quixote’s PTL is cool but is text based so it doesn’t really satisfy the basic requirements. There is one thing I really like about PTL that I was able to bring over into Kid: the ability to compile templates down to native Python modules, import them, use them like normal objects, etc. Kid does the same thing with XML documents. In fact, I ripped a ton of nice import hook code from Quixote to add the ability to import Kid templates without having to pre-compile them. They work just like .py files in that they are compiled the first time they are imported and written to a .pyc file.
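To give a feel for the import trick, here is a rough sketch using today's importlib rather than the Quixote hook code mentioned above. Everything in it is an assumption for illustration: the .tmpl extension, the TemplateFinder/TemplateLoader/install names, and especially the "compiler", which is just str.format standing in for a real template compiler. It also skips the .pyc-style caching entirely.

```python
import importlib.abc
import importlib.util
import os
import sys


class TemplateLoader(importlib.abc.Loader):
    """Turn a template file into a module with a render() function."""

    def __init__(self, path):
        self.path = path

    def create_module(self, spec):
        return None  # let the import system create a plain module

    def exec_module(self, module):
        with open(self.path, encoding="utf-8") as f:
            text = f.read()
        # Stand-in for real compilation: keep the source around and
        # render by filling str.format placeholders.
        module.source = text
        module.render = lambda **context: text.format(**context)


class TemplateFinder(importlib.abc.MetaPathFinder):
    """Look for <module name><ext> files along the import path."""

    def __init__(self, ext=".tmpl"):
        self.ext = ext

    def find_spec(self, fullname, path=None, target=None):
        filename = fullname.rpartition(".")[2] + self.ext
        for directory in (path or sys.path):
            candidate = os.path.join(directory or os.curdir, filename)
            if os.path.isfile(candidate):
                return importlib.util.spec_from_loader(
                    fullname, TemplateLoader(candidate), origin=candidate
                )
        return None  # not ours; let other finders try


def install(ext=".tmpl"):
    """Register the finder after the standard ones."""
    sys.meta_path.append(TemplateFinder(ext))
```

With install() called and a greeting.tmpl file containing Hello, {name}! somewhere on sys.path, import greeting followed by greeting.render(name="Kid") would produce the filled-in text, which is the "templates are just modules" feel described above.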

I have a feeling Quixote with PTL for text based media types and Kid for XML based media types will make a really solid, simple, pythonic web framework.