Coder Who Says Py

A place for me to babble on about Python development, Python itself, and coding in general. The title is inspired by some knights who enjoy a good shrubbery.

Holy crap, 3.0 is done!

December 3, 2008 - 6:03pm
Python 3.0 is out! As Jeremy Hylton has pointed out, Python 3000 was first discussed nine years ago. When I first started following python-dev back in June 2002, Python 3000 was this somewhat mythical version that pretty much any idea that seemed somewhat reasonable somehow to someone was attached to. To an extent I viewed it as a joke for a while since the Py3K title was tossed about so loosely.

But no longer. Python 3.0 is a real piece of software for the world to use. It's been quite the ride. With the amount of wild hacking everyone did on the p3yk/py3k branch and just the sheer amount of fundamental changes underneath the hood, I think python-dev has done a helluva job in getting 3.0 out the door within the release schedule we set for ourselves (we only slipped three months from a schedule we set quite a while ago). I still wish I could have gotten my import rewrite in, but the standard library reorganization was more important.

I am very happy with the way the language has turned out. I think the language is cleaner and easier to use. It is a good step forward and will help keep Python relevant for a long time. I for one plan to pretty much focus on Python 3 from this point forward in terms of new features since I like it so much more.

So now what? Well, we have a transition to deal with. I am willing to wager that 2.7/3.1 will be pretty much just bug fixes and helping make the transition from 2 to 3 even cleaner. We also have to continue to move the community over to 3 and off of 2. My hope is that by the time 3.2 is out the door the 2 series is pretty much just there for the stodgy, stubborn people who refuse to upgrade.

Oh, and I love the fact we beat Perl 6 out the door. =)

Using the AST and decorators for L10n translations

November 22, 2008 - 2:21pm
I had a thought the other day about L10n and how I have never thought the gettext approach was optimal. Having to wrap every string literal in a call to the '_' function seems needlessly burdensome. What happens if you forget to wrap a string? What about the overhead of having to call that function for every string literal used by your code? And don't you have to run your app completely to create the translation corpus, hoping every string literal is exercised?

There has to be a better way for both creating the corpus and handling the translation. For the corpus, I think the AST could be used to extract every non-docstring string literal from code. Simply compile your Python source to an AST and you can walk it to find all string literals. You also have the perk of knowing exactly where a string came from since the AST is annotated with that kind of information. Since it is done statically it doesn't rely on testing to give 100% coverage of your string literals (which you should have anyway, but we all know the world is not perfect).

As for translation, I see two possibilities based on whether you want to allow for translations between executions. If you only care about installations supporting only a single translation, you can take the AST idea and simply rewrite the AST such that string literals are replaced with their translations. Then you simply write out the translated AST to bytecode. As long as the bytecode is always used there is absolutely no overhead in using a translation of your application.
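The single-translation variant might look roughly like the following, assuming the translations are available as a plain dict mapping source strings to translated strings (the catalog argument and both names below are hypothetical). This sketch stops at producing a code object; writing it out as a .pyc file is left aside.

```python
import ast


class TranslateStrings(ast.NodeTransformer):
    """Replace string literals that have an entry in the catalog."""

    def __init__(self, catalog):
        self.catalog = catalog

    def visit_Constant(self, node):
        if isinstance(node.value, str) and node.value in self.catalog:
            new_node = ast.Constant(value=self.catalog[node.value])
            return ast.copy_location(new_node, node)
        return node


def compile_translated(source, catalog, filename="<translated>"):
    """Parse source, swap in translated literals, and compile to a code object."""
    tree = ast.parse(source, filename=filename)
    tree = TranslateStrings(catalog).visit(tree)
    ast.fix_missing_locations(tree)
    return compile(tree, filename, "exec")
```

Note that this version does not special-case docstrings: a docstring that happens to appear in the catalog is translated as well.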

But that seems rather heavy-handed. This is where decorators come in. Let's say you decorate every function that contains a string literal. I know this doesn't solve the issue of the '_' function accidentally being left off, since a decorator can be forgotten too, but a function decorator has to be put in fewer places than a call around every string literal. With this decorator in place, it checks to see what the locale is. If it differs from the locale the string literals are written in, the decorator takes the function object and re-creates the code object with a new co_consts that contains the translated strings. Now your overhead is simply the creation of new function objects, which should be no worse, and is probably better, than calling a function for every string literal.

And you can have this translation be a one-time cost or fully dynamic per execution. If you simply have the decorator return the new function object then the translation is done at function creation time and is never worried about again. But if you instead return a closure that checks the locale on every call, you can decide at call time whether a new function object needs to be created. Since people tend not to change their locale constantly in the middle of execution, the usual cost is just finding out the current locale and checking a variable to see if they differ, so it's still not very expensive.
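Both decorator variants can be sketched together. Everything named here is made up for illustration: CATALOGS, SOURCE_LOCALE, set_locale, and current_locale stand in for however a real application would load translations and find its locale. The sketch relies on CPython's code.replace() (Python 3.8+), and it only handles top-level string constants. Strings inside nested functions, or folded into tuples by the compiler, are left untouched.

```python
import functools
import types

# Hypothetical stand-ins for a real catalog loader and locale lookup.
SOURCE_LOCALE = "en"
CATALOGS = {"fr": {"Hello": "Bonjour"}}
_state = {"locale": SOURCE_LOCALE}


def set_locale(name):
    _state["locale"] = name


def current_locale():
    return _state["locale"]


def translated_copy(func, catalog):
    """Build a new function whose co_consts holds translated strings."""
    code = func.__code__
    new_consts = tuple(
        catalog.get(const, const) if isinstance(const, str) else const
        for const in code.co_consts
    )
    new_func = types.FunctionType(
        code.replace(co_consts=new_consts),
        func.__globals__,
        func.__name__,
        func.__defaults__,
        func.__closure__,
    )
    new_func.__kwdefaults__ = func.__kwdefaults__
    return functools.update_wrapper(new_func, func)


def localized(func):
    """One-time variant: translate once, when the function is created."""
    locale = current_locale()
    if locale == SOURCE_LOCALE:
        return func
    return translated_copy(func, CATALOGS.get(locale, {}))


def localized_dynamic(func):
    """Dynamic variant: check the locale on each call, caching per locale."""
    cache = {SOURCE_LOCALE: func}

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        locale = current_locale()
        impl = cache.get(locale)
        if impl is None:
            impl = cache[locale] = translated_copy(func, CATALOGS.get(locale, {}))
        return impl(*args, **kwargs)

    return wrapper
```

With the dynamic variant, a call in an already-seen locale costs one locale lookup and one dictionary lookup on top of the call itself. A new function object is only built the first time a locale is encountered.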

But what if this idea is ridiculous and you still want to use gettext and its approach? Well, the AST trick can still be used to generate the corpus. But it can also be used to verify that every string literal is wrapped in a call to '_'. If it isn't you can simply raise an error in your build step and add the appropriate call.
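For the gettext case, a build-step check could be sketched like this. The name unwrapped_strings is hypothetical, and the check assumes the translation function is always called by the bare name '_' with the literal passed directly as a positional argument.

```python
import ast

_DOCSTRING_OWNERS = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)


def unwrapped_strings(source, filename="<string>"):
    """Return sorted (lineno, text) for string literals not passed directly to _()."""
    tree = ast.parse(source, filename=filename)

    allowed = set()
    for node in ast.walk(tree):
        # Docstrings are exempt from the check.
        if isinstance(node, _DOCSTRING_OWNERS) and node.body:
            first = node.body[0]
            if (isinstance(first, ast.Expr)
                    and isinstance(first.value, ast.Constant)
                    and isinstance(first.value.value, str)):
                allowed.add(id(first.value))
        # So are literals that appear as direct arguments to _().
        elif (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == "_"):
            for arg in node.args:
                if isinstance(arg, ast.Constant):
                    allowed.add(id(arg))

    return sorted(
        (node.lineno, node.value)
        for node in ast.walk(tree)
        if isinstance(node, ast.Constant)
        and isinstance(node.value, str)
        and id(node) not in allowed
    )
```

A build script could run this over each source file and fail if the returned list is non-empty. In practice some strings (dictionary keys, format specifiers, and the like) would probably need a way to be marked as deliberately untranslated.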

Now since this is just an idea I had the other day and I am not a huge user of L10n, I have not researched this in any way. It is possible someone has already thought of this and has a package out there that does this exact thing, which would be neat as it means my idea is not completely ludicrous.

Social software graph

November 16, 2008 - 3:51pm

AMK did a post where he graphed his social software connections. I thought that was rather interesting, so I decided to do it myself. Doing this put a couple of things into perspective for me.

One is that I like the microblogging services. At the moment I swap between identi.ca and Jaiku. I started to use the former after hearing the creator, Evan Prodromou, interviewed on FLOSS Weekly. Liked his reasoning behind choosing PHP and appreciated it being open source. But I still prefer Jaiku. Plus I have talked to some people while here at Google who are still working on it so I have a personal connection to the app. Twitter is only there because I started to use it for a second time just before the whole rash of fail whales earlier this year hit. And even FriendFeed can do microblogging. But I really should settle down to a single account instead of jumping around.

My second revelation is how empowering FriendFeed is. I love the site and consider it one of only two invaluable web services I use that are not run by Google (the other being Instapaper). Having all of my services come together in a single location is nice. Plus having various ways to export the information back out (e.g. Twitter and my personal blog) makes it even handier.

Third, Facebook is just a walled garden with an information blackhole in the center; information might go in, but nothing comes back out of it to contribute to my presence online. Since I don't use the service to keep in touch with friends and nothing I do within the service can come back out, I have pretty much decided I am going to delete my account. I would rather find some other service where my information can be more useful to me outside of the service itself.

It's rather amazing the amount of information that people are able (and willing) to share online these days. I think people will continue to tear down the barriers of data as time goes on and become more and more comfortable sharing information online. I know I use the basic rule that if a stranger asked me about something and I am willing to answer, then it's okay to toss it online. Turns out I am willing to share a lot. =)

Do people prefer FAQs or HOWTOs?

November 14, 2008 - 10:32pm
There is currently an initial draft for a document that is to be considered a Getting Set Up doc for people wanting to hack on Python. But I have a dilemma: how does this and future docs I plan to write tie into the dev FAQ?

For as long as I have been the unofficial maintainer of the dev FAQ it has been primarily used for helping out with svn and SSH. While both tools obviously have their own FAQs all over the Net, the dev FAQ collects exactly what a typical contributor or core developer might need to know in order to work on Python.

But I am beginning to realize there is some overlap in goals. I need to decide what balance I want to strike between the dev FAQ and these narrative docs. One approach is to push all technical details into the dev FAQ, allowing the dev docs to basically be tools-agnostic. This has a certain appeal to me as our switch from CVS to Subversion or from SourceForge to Roundup caused documentation issues for some time after the switch-overs. That way any tool changes can be focused in the dev FAQ.

But not everything is necessarily structured for a FAQ. While I can outline the questions in a certain order to help facilitate people learning what to do, having a basic narrative can be more helpful. So what am I to do?

Take as an example checking out the trunk. Now I can explain how we typically have multiple versions of Python under development in the narrative and then point to the dev FAQ for details on where to find the repositories and how to check out the code. Or I can outline directly in the Setup doc what steps one needs so that it is more like a HOWTO that people just follow.

I could attempt a hybrid where I list the steps at a high-level in the Setup doc with links to the specific questions in the dev FAQ, although that is still brittle in the case where the dev FAQ changes (e.g. we move off of svn to some distributed VCS).

I think I have convinced myself to have the docs be tool-agnostic and move all of it into the dev FAQ. Organization of the information makes sense to me as well as centralizing stuff so that any changes in the future will not be too painful. If people (dis)agree, feel free to say so.

Oplop 0.3 released

November 8, 2008 - 2:51pm
The password-generating program that I created, Oplop, has now hit version 0.3. The biggest change since the last version is that the whole idea of having various restriction filters placed on your eventual password is now gone. From practical experience it turned out that most sites are happy with what Oplop generates; the only demand that comes up now and then is that the password contain a digit.

So now a digit is always included in any generated password. Simplifies the UI immensely. Now all one needs to do is input a label and a master password to get their generated label password back; no more checkboxes!
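To make the label-plus-master-password idea concrete, here is a rough sketch of a generator in that style. It is not taken from Oplop's source: the use of MD5 (mentioned later in this post), the URL-safe base64 encoding, the length of 8, and the way a digit is forced in are all assumptions made for illustration.

```python
import base64
import hashlib

ASSUMED_LENGTH = 8  # assumption, not a documented Oplop value


def label_password(label, master_password, length=ASSUMED_LENGTH):
    """Derive a per-label password that always contains at least one digit."""
    digest = hashlib.md5((master_password + label).encode("utf-8")).digest()
    encoded = base64.urlsafe_b64encode(digest).decode("ascii")
    candidate = encoded[:length]
    if any(ch.isdigit() for ch in candidate):
        return candidate
    # No digit in the candidate: borrow the first digit found later in the
    # encoded hash, or fall back to "1", and put it at the front.
    digit = next((ch for ch in encoded[length:] if ch.isdigit()), "1")
    return (digit + candidate)[:length]
```

The same label and master password always produce the same result, so nothing needs to be stored. MD5 is used here only to mirror the post; a new design would likely pick a purpose-built key derivation function instead.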

And thanks to the power of App Engine, Oplop is online. I kept needing passwords at work and since I have not gotten around to writing an iPhone or Nokia S60 app, I decided I needed something. Well, since I am interning with the App Engine team and I already had Oplop implemented in Python, I figured I would just do an App Engine application where I skip using Django and just do a bare-metal web app.

Right now the online solution is not ideal. While I use HTTPS for everything to prevent online snooping, there is no protection from someone peering over your shoulder to notice the password that is shown on the screen. Besides, if I didn't hate JavaScript so much and my housemate hadn't mentioned that he has had issues at his work with differing MD5 implementations, I could have a JS version that does everything client-side.

But I would like to harness the ability to store info on a per-user level for this web app. The thing holding me up is deciding how much to store. If I store labels along with any caveats for the label password for that label, then I don't have to think about having that stuff written down anywhere. Plus I could have completion suggestions for someone who is logged in. And if there is a way for someone to be logged into their Google account from an application then the label completion could extend to even application implementations of Oplop.

The security implication, though, is that if someone gets a hold of someone's label list, will that provide too much information? Chances are you could figure out where someone had an account from the label name (e.g. it's obvious what account the "amazon.com" label is for), but the labels are worthless without the master password. But if you got a hold of the master password, your accounts could all be compromised (still requires knowing the username for each account).

If I store just a list of sites where one has labels along with any password caveats for the site, then I could have a bookmarklet that tells you what the caveat is. You then still have to remember your label for the site, but at least that bit of information would not be available to anyone who manages to get a hold of your account information. But it does guarantee people will know where you have an account.

Storing both the label and corresponding site for the label gives you the perks of both, but with the combined drawbacks. I have not decided if any of these potential security issues are worth the benefits they bring at the moment.
