Tuesday, December 30, 2008

Summary of December Sponsored Twisted Development

With a lot of help from Allen Short, two more weeks of sponsored Twisted development have just been completed. During this round, I continued to work on the new HTTP client and spent some time trying to resolve some long-standing issues around Twisted's support for starting and controlling child processes. Meanwhile, Allen developed a plan for merging the SIP code which Divmod, Inc. has maintained outside of Twisted for several years into Twisted's existing SIP support.

The SIP-related work that Allen did encompassed these tickets:

#2194 - small bug in SIP Via header generation
#3575 - Common implementation of RFC 2617 digest authentication
#3582 - Improve SIP URI parsing/formatting
#3583 - Include SIP message-parsing changes from Sine
#3584 - SIP transport layer and transaction layer

These are the process tickets that I worked on:

#733 - twisted's SIGCHLD handler breaks popen
#1997 - perhaps wakeUp could be slightly simpler (closed)
#2967 - Reaping child processes has superlinear complexity on POSIX
#3571 - intermittent spawnProcess failure in test_process on Linux (closed)
#3576 - add high-level, cross-platform close-on-exit togglers to t.i.fdesc

A lot of this work consisted of taking some old code which Glyph had started but never completed, updating it, and bringing it closer to being ready for a real code review. The summary of #1997 is a bit misleading: while the change did simplify the reactor's "wake up" mechanism, it also removed a race condition, fixing a bug which could cause certain events to fall into a limbo where they could only ever be processed after another event arrived.
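
For anyone who hasn't run into the idea before, a "wake up" mechanism of this general kind is often built with the self-pipe pattern: the event loop waits on the read end of a pipe, and anything that needs the loop's attention writes a byte to the other end. The sketch below only illustrates that general pattern. It is not Twisted's code and it says nothing about what #1997 actually changed. The names Waker and wait_for_events are made up for this example, and it assumes a POSIX platform.

```python
import os
import select


class Waker:
    """Hypothetical self-pipe waker (illustration only, not Twisted's)."""

    def __init__(self):
        self.read_fd, self.write_fd = os.pipe()
        # Non-blocking on both ends so neither side can get stuck.
        os.set_blocking(self.read_fd, False)
        os.set_blocking(self.write_fd, False)

    def wake_up(self):
        """Ask the loop to return from select() as soon as possible."""
        try:
            os.write(self.write_fd, b"x")
        except BlockingIOError:
            # The pipe is full, so a wake-up is already pending.
            pass

    def drain(self):
        """Discard all pending wake-up bytes."""
        try:
            while os.read(self.read_fd, 4096):
                pass
        except BlockingIOError:
            # Nothing left to read.
            pass


def wait_for_events(waker, timeout):
    """Return True if the loop was woken up before the timeout expired."""
    readable, _, _ = select.select([waker.read_fd], [], [], timeout)
    if waker.read_fd in readable:
        waker.drain()
        return True
    return False
```

Because the wake-up request is stored in the pipe itself, a request made just before the loop calls select() is still seen by that call rather than being lost until some unrelated event arrives.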

Aside from process-related tickets and the HTTP client, I worked on a few other tickets as well:

#2808 - AMP should raise MissingArgument (or other) if a callRemote is called with wrong arguments (closed)
#3246 - remove all mentions of plugins.tml from the documentation (closed)
#3562 - Setup Python 3.0 buildslave(s) (closed)
#3568 - ERROR from conch test when pycrypto is not installed
#3569 - Twisted Web WSGI container sometimes emits too many (or duplicate) headers (closed)

The Python 3.0 buildslave will probably garner the most interest of the bunch. The resolution of this ticket does not mean that Twisted supports Python 3.0 now. It just means that we've added a column to our buildbot (continuous integration system). We can now tell at any given time that Twisted does not support Python 3.0. ;) But seriously, with this slave set up, we can accept contributions which move Twisted towards Python 3.0 compatibility. I don't plan to spend any time doing such development myself in the near future, though. There are a ton of other more pressing issues, so I'll leave Python 3.0 work to people who think they'll benefit from it.

As for the new HTTP client, this round of development moves it inexorably towards completion. Itamar Shtull-Trauring and I spent several days improving its error handling, simplifying some of the more hideous parts of the implementation, and dealing with various corner cases (and HTTP 1.1 sure has a lot of them). The development branch also includes a sketch (only a sketch) of the higher-level API we're planning to provide on top of the low-level protocol implementation (the sketch is currently undocumented and a bit obscure, so it may not make sense without me or Itamar looking over your shoulder) and an example which uses the APIs provided by the low-level protocol to implement a simple web client (something along the lines of wget or curl, but obviously much, much, much more rudimentary).

That's it for now. It'll be 2009 when I post the next one of these. 2008 has been a great year for Twisted development, and I know that things are just going to get better. :) Thanks again to the Software Freedom Conservancy, all of the Twisted Sponsors, and all the other developers who contribute to Twisted.

Wednesday, December 3, 2008

Summary of November Sponsored Twisted Development

I've just completed another round of sponsored Twisted development (the sixth so far). This period sees a lot of documentation improvements, as well as various bug fixes and continued work on a new HTTP client for Twisted Web (#886).

These are the documentation issues which were resolved:

#2281 - Annotations for Twisted Finger Tutorial
#3548 - twisted.conch.client.knownhosts.PlainEntry misdocuments its "_hostnames" attribute.
#3455 - CONNECTION_LOST not an Integer...
#3537 - Conch's public key authentication process is confusing.
#3490 - FTPClient errors should provide ftp errorcode

There were some improvements to Twisted Web:

#1878 - twisted.web.monitor traceback_ AttributeError: class IChangeNotified has no attribute '__class__'
#2402 - client.py crashes on URL's that would be no problem for most browsers
#3192 - HTTPClientFactory sets followRedirect on the HTTPPageGetter class
#3469 - Exception is rendered when NotFound is more appropriate.

A problem with Conch's SFTP server's reporting of modification times was fixed (and then a problem with the unit tests for the fix was fixed):

#3503 - Wrong date format delivered by twisted.conch.ls.lsLine
#3551 - TZ=Pacific/Auckland python2.4 ./bin/trial twisted.conch.test.test_cftp.ListingTests fails

With #3551, I was once again reminded that working with timezones is extremely challenging. This time, I found that the range of local renderings of any particular UTC timestamp is greater than 24 hours, so you cannot rely on any particular timestamp falling on a particular day in an arbitrary timezone. So like other unit tests in Twisted, the unit tests for lsLine explicitly set the timezone when they want to make assertions which depend on it, then reset it to its original value. Unfortunately, this means they can't run on Windows, since Windows apparently lacks APIs for doing this. I also rediscovered that time.tzset() is a no-op, at least on Linux (but the unit tests call it anyway, in case Solaris or HP-UX or some other POSIX implementation requires it).
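
Here's a rough sketch of the set-then-restore approach described above, and of why the day can't be assumed. It is not the actual lsLine test code. The helper name render_local_date is made up, and the zone names "AAA12" and "BBB-14" are POSIX-style TZ strings chosen for illustration (twelve hours behind UTC and fourteen hours ahead of it), so the example doesn't depend on a timezone database being installed. Like the tests, it relies on time.tzset(), so it won't run on Windows.

```python
import calendar
import os
import time


def render_local_date(timestamp, tz_name):
    """Render timestamp as a local date in tz_name, then restore TZ."""
    original = os.environ.get("TZ")
    os.environ["TZ"] = tz_name
    time.tzset()
    try:
        return time.strftime("%Y-%m-%d", time.localtime(timestamp))
    finally:
        # Put the process back the way we found it.
        if original is None:
            del os.environ["TZ"]
        else:
            os.environ["TZ"] = original
        time.tzset()


# 2008-12-01 11:00:00 UTC
timestamp = calendar.timegm((2008, 12, 1, 11, 0, 0))

# POSIX TZ offsets are "hours west of UTC", so the sign looks backwards.
west = render_local_date(timestamp, "AAA12")   # twelve hours behind UTC
east = render_local_date(timestamp, "BBB-14")  # fourteen hours ahead of UTC
print(west, east)
```

The two zones are 26 hours apart, so this one instant lands on November 30 in one of them and on December 2 in the other, with December 1 in UTC in between.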

An important issue with Twisted's Jabber support was fixed as well (#3463). This one prevented Twisted's Jabber client from successfully negotiating TLS when connected to Google Talk (and possibly other Java-based Jabber servers). This problem was ultimately caused by a bug in the TLS support on Google Talk which caused TLS negotiation to fail if the client included a session ticket (RFC 5077) section in the handshake. This is allowed and servers which do not support session tickets should ignore the section, but for some reason it causes problems with Google Talk. OpenSSL (on which Twisted's SSL support is based) 0.9.9 enables session tickets and the 0.9.8 package distributed with some platforms (eg Ubuntu 8.10) includes a backport of this feature. So Twisted's Jabber client cannot communicate with Google Talk if one of those versions of OpenSSL is installed.
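
One way for a client to stay clear of this kind of server bug is to tell OpenSSL not to use session tickets at all. The sketch below shows that with the Python standard library's ssl module. Twisted itself uses pyOpenSSL rather than this module, and this is not necessarily how #3463 was resolved. The function name make_ticketless_client_context is made up for the example.

```python
import ssl


def make_ticketless_client_context():
    """Build a client-side TLS context with session tickets turned off."""
    context = ssl.create_default_context()
    # OP_NO_TICKET maps to OpenSSL's SSL_OP_NO_TICKET option.
    context.options |= ssl.OP_NO_TICKET
    return context


context = make_ticketless_client_context()
print(bool(context.options & ssl.OP_NO_TICKET))
```

The cost is losing ticket-based session resumption, which seems like a fair trade for being able to connect at all.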

A couple of issues related to Python 2.6 support were fixed:

#2763 - md5 and sha module will be deprecated in python 2.6
#3545 - Support Python 2.6 in the Windows build system.
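
For context on #2763: the standalone md5 and sha modules are replaced by hashlib, which has been in the standard library since Python 2.5. The sketch below shows the hashlib spelling of the two digests. It is not taken from the Twisted change itself, and the helper names md5_hex and sha1_hex are made up for illustration.

```python
import hashlib


def md5_hex(data):
    """Hex MD5 digest of a byte string, via hashlib instead of the md5 module."""
    return hashlib.md5(data).hexdigest()


def sha1_hex(data):
    """Hex SHA-1 digest of a byte string, via hashlib instead of the sha module."""
    return hashlib.sha1(data).hexdigest()


print(md5_hex(b"twisted"))
print(sha1_hex(b"twisted"))
```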

A bug in Twisted Mail's IMAP4 client which prevented the unseen part of a server's response to a select or examine command from being made available to applications was fixed (#3550).

And there were several other assorted fixes:

#3521 - Documentation for `processExited()` conflicts with the implementation
#3315 - t.p.reflect.safe_repr includes the wrong traceback and misformats the return value
#3541 - twisted.internet.abstract.FileDescriptor.loseConnection drops reason (will be reported as clean shutdown)
#3544 - bin/admin/change-versions should update the main README file

I also spent some time working on converting Twisted Lore from using microdom, Twisted's XML parser and DOM implementation (circa 2002), to using minidom, the XML parser and DOM implementation in the Python standard library. For a long time, microdom was better than any of the alternatives, but it's seen very little maintenance in the past several years and there are some problems with Lore (eg #414) which are caused by behavior of microdom that it would be difficult to change.

Unfortunately, switching to minidom brings a new set of problems. It's still probably worthwhile, but it seems like it's harder to use than it should be. Some of the issues I've run into so far (and I'm not done yet):

  • minidom's constructors are less convenient than microdom's constructors. For example, the Text constructor doesn't accept a string to use as the text node's value. Instead, you have to instantiate Text and then set an attribute. This expands code which should have been one line into three lines. And to make things worse, a Text instance with no data set raises an exception from the __repr__ method.
  • The parse error exceptions raised by expat are less informative than the parse error exceptions raised by microdom. For example, if a document contains mismatched tags, microdom reports the name and location of both the opening and closing tags. minidom will report only the location of the closing tag. It's easy enough to find out the name of the closing tag by finding that location in the input document, but finding the offending start tag means parsing the document in your head. For any non-trivial document this is ridiculous. Fortunately, by switching to sax and providing custom error and content handlers, most of the information can be recovered. This information is always useful though, and it would be better if minidom provided it by default.
  • Once you switch to xml.sax, you have to remember to disable its validation features or it will try to retrieve DTDs from the internet every time you parse something. This is bad, bad default behavior.
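
To make those three points a little more concrete, here is a rough sketch written against the current Python standard library. It is not the Lore conversion code, and the names make_text, TagTracker and find_unclosed_tag are made up for the example. It wraps the instantiate-then-assign dance for Text nodes in a helper, turns off validation and external entity loading on the SAX parser, and uses a custom content handler to remember where each open tag started so that a mismatched-tag error can be reported along with the offending start tag.

```python
import io
import xml.sax
from xml.dom import minidom
from xml.sax.handler import (
    ContentHandler, feature_external_ges, feature_validation)


def make_text(data):
    """Create a minidom Text node; the constructor takes no data argument."""
    text = minidom.Text()
    text.data = data
    return text


class TagTracker(ContentHandler):
    """Remember the name and position of every currently open tag."""

    def __init__(self):
        super().__init__()
        self.open_tags = []
        self.locator = None

    def setDocumentLocator(self, locator):
        self.locator = locator

    def startElement(self, name, attrs):
        self.open_tags.append((name, self.locator.getLineNumber()))

    def endElement(self, name):
        self.open_tags.pop()


def find_unclosed_tag(source):
    """Return (tag name, start line, error line) for a bad document, else None."""
    parser = xml.sax.make_parser()
    # Don't validate and don't go fetching external entities such as DTDs.
    parser.setFeature(feature_validation, False)
    parser.setFeature(feature_external_ges, False)
    tracker = TagTracker()
    parser.setContentHandler(tracker)
    try:
        parser.parse(io.StringIO(source))
    except xml.sax.SAXParseException as error:
        if not tracker.open_tags:
            # The error happened before any tag was opened.
            return (None, None, error.getLineNumber())
        name, line = tracker.open_tags[-1]
        return (name, line, error.getLineNumber())
    return None


print(find_unclosed_tag("<a>\n<b>\n</a>"))
```

For the three-line document at the end, the innermost open tag when the parser gives up is the b element on the second line, which is exactly the piece of information missing from the stock error.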

Some of these issues may turn into Python bug reports once I've made more progress on converting Lore. A much bigger difficulty with the conversion than the problems minidom has is the fact that Lore is largely pre-UQDS code. Some of it has tests, but they're mostly whole-system tests which compare gigantic xhtml strings and are subject to extremely obscure failures. And most of it doesn't have any tests at all.

Switching away from microdom should be worth the effort though. minidom is a bit faster and Lore will generate better looking output once the switch is done.

Itamar points out I didn't separate tickets into groups for those I reviewed and those I did development on myself this time. So let me point out that of the above tickets, many were developed by others and reviewed by me. The Twisted development process is highly collaborative. I couldn't accomplish anything without the help of all the other great Twisted developers who volunteer to contribute to Twisted in their free time. If you want to find out which are which, head over to the Twisted issue tracker where you can look up the development(/authorship/etc) history of any ticket.

That's all for now. Thanks again to the Software Freedom Conservancy, all of the Twisted Sponsors, and all the other developers who contribute to Twisted (hi Michael!).