Wednesday, December 3, 2008

Summary of November Sponsored Twisted Development

I've just completed another round of sponsored Twisted development (the sixth so far). This period sees a lot of documentation improvements, as well as various bug fixes and continued work on a new HTTP client for Twisted Web (#886).

These are the documentation issues which were resolved:

#2281 - Annotations for Twisted Finger Tutorial
#3548 - twisted.conch.client.knownhosts.PlainEntry misdocuments its "_hostnames" attribute.
#3455 - CONNECTION_LOST not an Integer...
#3537 - Conch's public key authentication process is confusing.
#3490 - FTPClient errors should provide ftp errorcode

There were some improvements to Twisted Web:

#1878 - twisted.web.monitor traceback_ AttributeError: class IChangeNotified has no attribute '__class__'
#2402 - client.py crashes on URL's that would be no problem for most browsers
#3192 - HTTPClientFactory sets followRedirect on the HTTPPageGetter class
#3469 - Exception is rendered when NotFound is more appropriate.

A problem with Conch's SFTP server's reporting of modification times was fixed (and then a problem with the unit tests for the fix was fixed):

#3503 - Wrong date format delivered by twisted.conch.ls.lsLine
#3551 - TZ=Pacific/Auckland python2.4 ./bin/trial twisted.conch.test.test_cftp.ListingTests fails

With #3551, I was once again reminded that working with timezones is extremely challenging. This time, I found that the range of local renderings of any particular UTC timestamp is greater than 24 hours, so you cannot rely on any particular timestamp falling on a particular day in an arbitrary timezone. So like other unit tests in Twisted, the unit tests for lsLine explicitly set the timezone when they want to make assertions which depend on it, then reset it to its original value. Unfortunately, this means they can't run on Windows, since Windows apparently lacks APIs for doing this. I also rediscovered that time.tzset() is a no-op, at least on Linux (but the unit tests call it anyway, in case Solaris or HP-UX or some other POSIX implementation requires it).
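
To make the timezone point concrete, here is a minimal sketch of the set-then-restore pattern, assuming a POSIX platform where `time.tzset()` exists. The helper names `timezone` and `local_date` are hypothetical and are not taken from Twisted's test suite. The zone strings are POSIX-style `TZ` values, whose sign is inverted: `AAA12` means UTC-12 and `BBB-14` means UTC+14.

```python
import calendar
import os
import time
from contextlib import contextmanager


@contextmanager
def timezone(name):
    """Temporarily set the process timezone, then restore the original."""
    original = os.environ.get("TZ")
    os.environ["TZ"] = name
    time.tzset()  # POSIX only; not available on Windows
    try:
        yield
    finally:
        # Put the environment back exactly as it was found.
        if original is None:
            del os.environ["TZ"]
        else:
            os.environ["TZ"] = original
        time.tzset()


def local_date(timestamp, tz):
    """Render a UTC timestamp as a calendar date in the given timezone."""
    with timezone(tz):
        return time.strftime("%Y-%m-%d", time.localtime(timestamp))


# 2008-12-03 11:00:00 UTC, rendered at UTC-12, UTC and UTC+14.
stamp = calendar.timegm((2008, 12, 3, 11, 0, 0))
dates = [local_date(stamp, tz) for tz in ("AAA12", "UTC0", "BBB-14")]
```

The same instant falls on December 2, 3 and 4 in those three zones, which is why an assertion about a rendered date only makes sense once the test has pinned the timezone.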

An important issue with Twisted's Jabber support was fixed as well (#3463). It prevented Twisted's Jabber client from successfully negotiating TLS when connecting to Google Talk (and possibly other Java-based Jabber servers). The problem was ultimately caused by a bug in Google Talk's TLS support: negotiation failed if the client included a session ticket (RFC 5077) section in the handshake. Including it is allowed, and servers which do not support session tickets should ignore the section, but for some reason it causes problems with Google Talk. OpenSSL (on which Twisted's SSL support is based) 0.9.9 enables session tickets, and the 0.9.8 package distributed with some platforms (e.g. Ubuntu 8.10) includes a backport of this feature. So without this fix, Twisted's Jabber client cannot communicate with Google Talk if one of those versions of OpenSSL is installed.
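
For illustration, one way a client can avoid offering session tickets is to set OpenSSL's "no ticket" option on its TLS context. This is a sketch using the Python standard library's `ssl` module (Python 3.6 or later), not Twisted's own SSL code, and it is only an assumption that a workaround of this shape applies here; the paragraph above does not describe how #3463 was actually resolved.

```python
import ssl

# Build an ordinary client-side context, then ask OpenSSL not to use
# RFC 5077 session tickets on connections made with it.
context = ssl.create_default_context()
context.options |= ssl.OP_NO_TICKET

tickets_disabled = bool(context.options & ssl.OP_NO_TICKET)
```

The trade-off is that ticket-based session resumption is given up for every server, not just the one that mishandles it.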

A couple of issues related to Python 2.6 support were fixed:

#2763 - md5 and sha module will be deprecated in python 2.6
#3545 - Support Python 2.6 in the Windows build system.
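
As background for #2763: the standard library's `hashlib` module is the replacement for the deprecated `md5` and `sha` modules. The snippet below is a generic sketch of the newer spelling, not the actual change made in Twisted.

```python
import hashlib

# Formerly md5.new(data) and sha.new(data).
data = b""
md5_digest = hashlib.md5(data).hexdigest()
sha1_digest = hashlib.sha1(data).hexdigest()
```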

A bug in Twisted Mail's IMAP4 client which prevented the unseen part of a server's response to a select or examine command from being made available to applications was fixed (#3550).

And there were several other assorted fixes:

#3521 - Documentation for `processExited()` conflicts with the implementation
#3315 - t.p.reflect.safe_repr includes the wrong traceback and misformats the return value
#3541 - twisted.internet.abstract.FileDescriptor.loseConnection drops reason (will be reported as clean shutdown)
#3544 - bin/admin/change-versions should update the main README file

I also spent some time working on converting Twisted Lore from using microdom, Twisted's XML parser and DOM implementation (circa 2002), to using minidom, the XML parser and DOM implementation in the Python standard library. For a long time, microdom was better than any of the alternatives, but it's seen very little maintenance in the past several years and there are some problems with Lore (e.g. #414) which are caused by microdom behavior that would be difficult to change.

Unfortunately, switching to minidom brings a new set of problems. It's still probably worthwhile, but it seems like it's harder to use than it should be. Some of the issues I've run into so far (and I'm not done yet):

  • minidom's constructors are less convenient than microdom's constructors. For example, the Text constructor doesn't accept a string to use as the text node's value. Instead, you have to instantiate Text and then set an attribute. This expands code which should have been one line into three lines. And to make things worse, a Text instance with no data set raises an exception from the __repr__ method.
  • The parse error exceptions raised by expat are less informative than the parse error exceptions raised by microdom. For example, if a document contains mismatched tags, microdom reports the name and location of both the opening and closing tags. minidom will report only the location of the closing tag. It's easy enough to find out the name of the closing tag by finding that location in the input document, but finding the offending start tag means parsing the document in your head. For any non-trivial document this is ridiculous. Fortunately, by switching to xml.sax and providing custom error and content handlers, most of the information can be recovered. This information is always useful though, and it would be better if minidom provided it by default.
  • Once you switch to xml.sax, you have to remember to disable its validation features or it will try to retrieve DTDs from the internet every time you parse something. This is bad, bad default behavior.
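
The points in the list above can be sketched roughly as follows. This is a minimal illustration written against a current Python 3 standard library, not the code from the Lore conversion. The names `make_text`, `OpenTagTracker` and `check_document` are hypothetical, and treating `feature_external_ges` as the switch that stops external DTD retrieval is an assumption.

```python
import io
import xml.sax
from xml.dom import minidom
from xml.sax.handler import ContentHandler, feature_external_ges


def make_text(data):
    """Build a minidom Text node in one call."""
    # Text() accepts no value, so the data is assigned afterwards.
    node = minidom.Text()
    node.data = data
    return node


class OpenTagTracker(ContentHandler):
    """Remember the name and position of every element still open."""

    def __init__(self):
        super().__init__()
        self.locator = None
        self.open_tags = []  # stack of (name, line, column)

    def setDocumentLocator(self, locator):
        self.locator = locator

    def startElement(self, name, attrs):
        self.open_tags.append(
            (name, self.locator.getLineNumber(), self.locator.getColumnNumber())
        )

    def endElement(self, name):
        self.open_tags.pop()


def check_document(text):
    """Return None if text parses, else a message locating both tags."""
    handler = OpenTagTracker()
    parser = xml.sax.make_parser()
    # Assumption: this is the feature that controls external entity loading.
    parser.setFeature(feature_external_ges, False)
    parser.setContentHandler(handler)
    try:
        parser.parse(io.BytesIO(text.encode("utf-8")))
    except xml.sax.SAXParseException as e:
        message = "%s at line %d" % (e.getMessage(), e.getLineNumber())
        if handler.open_tags:
            name, line, column = handler.open_tags[-1]
            message += "; innermost open tag <%s> opened at line %d, column %d" % (
                name, line, column)
        return message
    return None


report = check_document("<a>\n  <b>\n    <c>text\n  </b>\n</a>\n")
```

Here `report` names the unclosed `<c>` element and where it was opened, alongside the position of the closing tag that triggered the error. The stack of open tags is the only extra state needed to recover what the parser's own exception leaves out.
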

Some of these issues may turn into Python bug reports once I've made more progress on converting Lore. A much bigger difficulty with the conversion than the problems minidom has is the fact that Lore is largely pre-UQDS code. Some of it has tests, but they're mostly whole-system tests which compare gigantic XHTML strings and are subject to extremely obscure failures. And most of it doesn't have any tests at all.

Switching away from microdom should be worth the effort though. minidom is a bit faster and Lore will generate better looking output once the switch is done.

Itamar points out that this time I didn't separate the tickets into those I reviewed and those I developed myself. So let me point out that of the above tickets, many were developed by others and reviewed by me. The Twisted development process is highly collaborative. I couldn't accomplish anything without the help of all the other great Twisted developers who volunteer to contribute to Twisted in their free time. If you want to find out which are which, head over to the Twisted issue tracker, where you can look up the development (authorship, etc.) history of any ticket.

That's all for now. Thanks again to the Software Freedom Conservancy, all of the Twisted Sponsors, and all the other developers who contribute to Twisted (hi Michael!).
