Alex Martelli on the Future of Python
O’Reilly’s Josette Garcia had a cosy chat with Alex Martelli. Together they discussed the past and future of Python, Martelli’s role at Google, what he misses about Italy, working with his wife and how he feels about Erlang:
Q. 1) Python 3000! What is the story behind this huge number?
A. It started as an in-joke, back at the time Windows 2000 was having serious delays -- Guido joked that when we did the next major release we should name it “Python 3000” so we could be sure to not miss the target date;-). When we eventually did get started on it, the PEPs (Python Enhancement Proposals) about it were numbered starting with 3000 (to ensure no conflicts with the PEPs for Python 2.*, which are in the low hundreds). Now Guido’s “vanity” license plate (for his beautiful red Prius) is “PY3K”, which of course refers to the same famous number. (Anna and I, for our mundane silver Prius, have a wider-applying “P♥THON” -- I love how the heart can be read as “Y” or “love”, since both apply, but there are no futuristic references there, alas;-).
But, as you note below, 3.0, 3.1 and so forth are how we refer to the actual code releases — the “3K” needs to recede back to a quirky in-joke, as it was all the way from the beginning.
Q. 2) I believe we are on Python 3.0 with 3.1 coming soon?
3.1 is now out (my bad, as it took me a while to answer these interview questions!) and it’s a very substantial improvement on 3.0, to the point that there’s really no reason to even consider 3.0 any more — while Python releases are normally maintained for quite a while after they’re superseded, this is NOT the case for 3.0, which was somewhat in the nature of an “experimental” release for the new Python 3 language. 3.1 is solid and suitable for production use, and WILL be maintained as previous releases were.
Q. 3) Can you tell us what are the differences between Python 2.x and Python 3?
Comparing 2.6 (which already gained most of the new 3.0 features -- some of the backwards incompatible changes listed below can be had in 2.6 with `from __future__ import`) with 3.0, there are several key differences (and a host of minor ones):
- print is now a function (not a statement), so ‘print’ is not a keyword any more, and many options have nicer syntax — ‘print(x,file=y)’ instead of the old syntax ‘print>>y, x’, for example
- legacy classes are finally gone, all classes are what since 2.2 we have been calling “new-style” ones (you should never use legacy classes anyway unless your code somehow needs to support Python 2.1 or earlier…!)
- “strings” are now Unicode, like in Java and C# (there are specific new types for strings of binary bytes, mutable and not), rather than the 2.* str/unicode distinction; so for example opening a file in text vs binary mode (always important on Windows, but not on Unix-like systems) is now crucial everywhere (‘.read()’ will return different types -- text strings [in Unicode] from a file opened in text mode, byte strings from a file opened in binary mode…!)
- lots of methods and functions that used to return lists now return views or iterators (usually no need to materialize the list, call ‘list(whatever)’ in the unusual case where you do need an immediately materialized list) and the special ways to ask for iterators (xrange vs range, somedict.iteritems vs somedict.items) have blessedly disappeared
- comparison semantics are much simpler and sharper -- the strange behavior of 2.* whereby e.g. all ints were less than all strings is gone, now ‘1<"foo"’ raises an exception -- cmp and __cmp__ are gone, as is the old cmp= argument to ‘sort’/‘sorted’ (use key= instead, it’s faster on 2.6 too!-)
- the int/long distinction is gone — all ‘int’s are now unbounded (as long’s, only, used to be in Python 2.* — we’ve been slowly and gradually “fading” the distinction for quite a few releases, but it’s finally totally gone!).
- 1/2 now returns 0.5, NOT 0 (“true division”), use // for truncating division (same as `from __future__ import division` in 2.*)
- many small syntax improvements: identifiers can include non-ASCII letters, annotations are allowed (and preserved but not otherwise used by Python itself) for function arguments and return values, dict comprehensions, set literals and comprehensions, new literal syntax for octal and binary ints (and the new `bytes` type), keywords allowed in `class` statements, new syntax in `def` to specify keyword-only arguments, new keyword `nonlocal`, extended-unpacking (‘a, *b, c = someiterable’), improved syntax for raising and catching exceptions, many syntax simplifications (e.g. ‘<>’ as a synonym of ‘!=’ has disappeared, backticks as a synonym for repr have disappeared, …).
- many built-ins were removed: reload, reduce, coerce, cmp, callable, apply…
3.1 adds many new (typically relatively small) features, roughly the typical “size” of “delta” in a minor release (2.6 doesn’t have those because it was released together with 3.0; no doubt 2.7, once it arrives together with 3.2, will again incorporate as many new features as can be had in ways that are both backwards and forwards compatible).
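Several of the differences listed above can be seen in a few lines. The following is only a minimal sketch for a Python 3 interpreter; the variable names and the small `make_counter` helper are illustrative assumptions, not anything from the interview.

```python
import io

# print is a function; file= replaces the old "print >>y, x" form.
buffer = io.StringIO()
print("hello", "world", sep=", ", file=buffer)
printed = buffer.getvalue()

# Text (str) and binary data (bytes) are distinct types; conversion is explicit.
text = "caffè"
data = text.encode("utf-8")
round_trip = data.decode("utf-8")

# dict.items() returns a live view; sorted() or list() materializes it.
counts = {"a": 1, "b": 2}
items_view = counts.items()
counts["c"] = 3                      # the view reflects this later change
materialized = sorted(items_view)

# Ordering comparisons between unrelated types raise TypeError.
try:
    1 < "foo"
    mixed_comparison_failed = False
except TypeError:
    mixed_comparison_failed = True

# key= takes the place of the removed cmp= argument.
words = sorted(["pear", "Apple", "fig"], key=str.lower)

# "/" is true division, "//" is truncating (floor) division.
half = 1 / 2
floored = 1 // 2

# There is a single unbounded int type.
big = 2 ** 100

# Extended unpacking.
first, *middle, last = range(5)


def make_counter(*, start=0):
    """Hypothetical helper: `start` is keyword-only because it follows the bare *."""
    total = start

    def bump(step=1):
        nonlocal total               # rebind the enclosing function's variable
        total += step
        return total

    return bump


bump = make_counter(start=10)
after_two_calls = (bump(), bump(5))

# Set literal, dict comprehension, new octal and binary literals.
squares = {n: n * n for n in {1, 2, 3}}
octal_and_binary = (0o17, 0b101)
```

Calling `make_counter(10)` with a positional argument raises `TypeError`, which is the point of the keyword-only syntax.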
Q. 4) I previously heard that Python 3 is not backward compatible. Is it true and what does it mean for the people who are using Python 2.x?
The whole point of making Python 3, rather than 2.(N+1), was the ability to break backwards compatibility and finally remove things that we long thought should be removed — redundancies such as <> equivalent to !=, old stuff like legacy classes, the apply built-in, coercion, int/long distinction, …
Users of Python 2.x for x<6 should first migrate to 2.6 (no harder than any other minor-version migration) to gain most of the new features and other “migration to 3” helpers. Removing deadwood like apply, ‘<>’, etc, should have been done quite a while ago, but now is better than never ;-).
Python 2.6 has a new switch ‘-3’ that warns about likely incompatibilities, and a `2to3` source-to-source translator to make warning-free Python 2.6 source into working Python 3 source -- inasmuch as we can, but with a good suite of unit-tests for the application, that should be a pretty painless migration, all in all. (If you DON’T have a good suite of unit tests, forget ALL other tasks until you have developed one -- seriously, I mean it: code NOT covered by good unit tests is a disaster waiting to happen… actually, on second thoughts, no need for any waiting !-).
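To make the unit-test advice concrete, here is a minimal sketch of the kind of test suite that can guard a migration. The functions `split_evenly` and `mean` are hypothetical stand-ins for application code, written so that they behave the same on 2.6 and 3.x; the tests pin down exactly the behavior (division, iterators) that changes between the two.

```python
import unittest


def split_evenly(total, parts):
    """Hypothetical application code: split `total` items into `parts`
    buckets whose sizes differ by at most one."""
    base, extra = divmod(total, parts)   # divmod behaves the same on 2.x and 3.x
    return [base + 1 if i < extra else base for i in range(parts)]


def mean(values):
    """Hypothetical application code: arithmetic mean of a non-empty iterable.
    float() keeps '/' from truncating under Python 2."""
    values = list(values)
    return sum(values) / float(len(values))


class MigrationGuardTest(unittest.TestCase):
    def test_split_sizes(self):
        self.assertEqual(split_evenly(10, 3), [4, 3, 3])

    def test_mean_is_not_truncated(self):
        self.assertEqual(mean([1, 2]), 1.5)

    def test_mean_accepts_iterators(self):
        self.assertEqual(mean(iter([2, 4, 6])), 4.0)


def run_suite():
    """Run the tests programmatically and return the unittest result object."""
    suite = unittest.TestLoader().loadTestsFromTestCase(MigrationGuardTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

With a suite like this in place, one plausible workflow is to run it under 2.6 with the ‘-3’ switch, fix the warnings, apply `2to3`, and run the same tests again under Python 3.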
The one factor that’s likely to slow down application migration, now that Python 3.1 is a solid production-worthy release, is that Python 3 is likely to be missing for quite a while some of the huge numbers of extensions available for 2.* — porting C-coded extensions has no nice helpers like porting Python sources do. If your extensions are coded in Cython, they can support Python 3 as well as 2.* even today — but I think other popular extension-writing frameworks (SWIG, SIP, Boost::Python, …) do not yet support Python 3, so all extensions written using those will have to wait for their particular framework to gain Python 3 support; extensions coded in pure C will need work on their authors’ or maintainers’ part.
Q. 5. At least two of the best-known Python people work for Google, what does it mean for Google, what does it mean for Python?
This is an important point! Even when Googlers are developing key new Python things like Unladen Swallow (http://code.google.com/p/unladen-swallow/) [[and please note, as an important aside, that none of the owners, committers and contributors of that absolutely crucial project are those you were probably thinking of as "at least two" above ;-) ]], this is done _in Open Source_ — the result of such efforts is NOT secret, Google-proprietary technology, but rather it’s made and kept available for the whole community’s benefit.
[[Note that one of the committers, Fredrik Lundh, a Googler who was also in Florence [presenting Unladen Swallow at Pycon Italia Tre], is a really major Python figure, involved in it since well before I fell in love with it — also the first Pythonista to be ever honored by being named a “bot”, “effbot” in his case — I was third, as “martellibot”, with Tim Peters, “timbot”, chronologically in second place.]]
Q. 6. Your job at Google is described as “Über Technical Lead”. What does this involve and what languages and applications does it cover?
I actually switched almost a year ago from UTL (a highly technical but mostly managerial role) to being an individual contributor (“senior staff technical solutions engineer” if you want the whole scoop — yeah, nowhere as cool a job title, it’s just too verbose !-) — a parallel career step. It’s great to work at a company that makes this easy, rather than one where management is a kind of “trap” from which you just can’t go back to being an IC ;-).
As it happens, I’ve also switched application areas: earlier it was software for cluster management, now it’s software for business intelligence. Some people have a double-take when they think of such a drastic change, but hey, the languages, development methodologies, tools, and most other supporting technologies are the same — the maths are very similar, statistics, machine learning, data mining, Bayesian logic, etc etc — plus, I’ve always been more of a generalist than a specialist anyway :-).
To put it in perhaps more sound-bitey form: I was mostly “building the cloud” (helping build some of the SW tools to control it, administer it, keep a watchful eye on it, keep it in good shape, etc, etc), now I’m helping build the SW tools to keep an eye on how well the cloud (and the apps running on it) perform business-wise, optimize that performance, and so forth. Cloud Computing and Business Intelligence are two key application areas for software today (of course, there are many other important ones!), and I’m pretty happy about playing in these areas (and kind of proud about the role that Python plays there, and how important Bayesian logic is in both, etc, etc).
Q. 7. What is the market share of Python?
Essentially, your guess is as good as mine: there are no reliable statistics. The numbers no doubt vary a lot depending on what application area you’re talking about, of course. For example, I’ve seen a site trying to guess at the technologies used in publicly accessible web sites, based on various hints and artifacts, and Python appears to be the #2 language for that specific purpose -- but that’s _way_ below PHP, with PHP at 60% or so and Python in the teens (of the sites using Python, 80% or so appear to be using Django) -- still above a host of famous others, including Ruby, Perl, ASP, Java, etc. But that’s for publicly accessible websites: who can guess what the proportion is on enterprises’ internal intranets? If I had to guess I’d say that on such intranets there’s going to be a lot more Java and Microsoft technologies and a lot less PHP (after all, the IT department may have to approve _those_ internal sites), but how would Python’s share vary? No idea.
TIOBE currently has Python at #7 with 4.4% of the market — among dynamic languages, that’s way below PHP’s #4 and 9.3%, but above all others (Perl being down to #8 with 4.2%); but, as you can see from their diagrams, the month by month results are quite noisy — the trends are more stable, with Java around 20%, C around 16%, C++/PHP/VB around 10% each, and Python just below (though by a monthly glitch C# seems to have just passed it this month ;-).
For a brief summary of the issues involved in the near-impossible feat of estimating languages’ market shares, and a short list of URLs of people still attempting this near-impossible feat of estimation (many, alas, very old — guess most have stopped even trying !-), see e.g. the wikipedia entry at http://en.wikipedia.org/wiki/Measuring_programming_language_popularity.
Q.8. When does one use Python and when is it best to use a different language?
C++ (or C when one needs to respect that constraint, e.g. when working in the Linux or BSD kernels, or on existing open source code that sticks with C, such as the CPython interpreter itself) gives you full responsibility and full control over your memory consumption: when every bit counts, it would be wrong to rely on a garbage-collected language (most modern languages except C++ and C are garbage-collected). If you can possibly afford garbage collection, then by all means avoid non-GC languages — if you can’t possibly afford it, though, then non-GC languages are the only game in town, obviously.
I can’t think of a “normal” case where I, personally, would want to use Java, C#, or other JVM or .NET languages, since Jython and IronPython are so good in those environments.
Functional programming languages are a whole different kettle of fish, and I wouldn’t mind an opportunity one day to write a really challenging application in a FP language, preferably Haskell — so far, though, I’ve been sniffing at them for 25 years, but never really had an occasion to use one in a production environment. Erlang, which you mention later, may be the closest a FP language has come to the mainstream so far (I’m not counting special-purpose languages such as XPath/XSLT), showing that FP languages may have a chance if they can join to their intellectual fascination some real-world, down-to-earth practical advantages (probably in the realm of concurrency, as Erlang has — Haskell for “software transactional memory” approaches might get there too).
Q. 9. What is the relationship between Erlang and Python? (see Erlang + Python, l’unione di due mondi by Lawrence Oluyede)
No relationship, really, despite Lawrence’s excellent attempts: Erlang is its own world, very interesting for its scalability and robustness in highly concurrent workloads (showing its long background as a language born squarely inside the telecom industry, I guess). However, Erlang’s clumsy syntax and some of its approaches (strings as lists of characters — like, alas, Haskell) have very little reason for being the way they are, except, of course, history. There is clearly a lot of mutual interest — 1.8 million search hits for the two words together, almost half of the hits for Erlang alone and more than for, say, Python and Fortran together! — and I recommend e.g. http://muharem.wordpress.com/2007/07/31/erlang-vs-stackless-python-a-first-benchmark/, Muharem Hrnjadovic’s benchmarking of them (specifically using Stackless Python) from a couple years ago, and Reia, http://wiki.reia-lang.org/wiki/Reia_Programming_Language, an attempt to build a Pythonic syntax (and some Pythonic semantics) on top of the Erlang virtual machine (BEAM).
Q. 10. In Florence, you said that you spent 365 days speaking English and now you need to wash your mouth in the Arno – do you miss Italy?
Well, 365 days/year speaking English was an overbid on my part if that’s exactly what I said (as long as I do manage to spend a couple weeks a year in Italy, it’s more like 350 days/year of English !-)
“Lavare i panni nell’Arno” (your clothes, not your mouth, and I’m 100% sure I couldn’t misquote or mistranslate THAT one!-) was Alessandro Manzoni’s expression (he was our greatest novelist according to the most widespread opinion) as to why he needed to rewrite his masterpiece, cleansing it of northern-Italian semi-dialectal influences in favor of a purer “Tuscan” kind of Italian. So he did, and the result is indeed an awesome book, though I may doubt how important the exact choice of slightly dialectal inflection may have been (other superb Italian writers – Calvino, Gadda, Pirandello, Fo, Tombari, Verga, … – wrote in very-recognizably non-Tuscan inflections, after all !-). I guess that in Manzoni’s time, with Italy just having been reunited into a single state, promoting a single “Italian” image was overwhelmingly important.
There are a lot of things I miss about Italy, though California has its own charms — our incredibly sweet language most of all, probably. Those who know me might be surprised that other things don’t rank higher — what about the food, the wine, the Alps…?
Amazingly enough, I find the Sierras, the Cascades range, and the Rockies, to be almost as awesome and breathtaking as the Alps — sure, Cervino at dawn is one sight you’ll never forget, but Shasta at sunset isn’t ALL that far !-)
Availability of superb Italian wines and food (at very reasonable prices) is surprisingly good in this part of California, too (especially since Anna, in her brief residence in Italy, developed a great skill in Italian cooking, and she hones it regularly with US Food Network superstar chefs such as Mario Batali and Giada De Laurentiis -- Italians, of course !-) -- our favorite grocery store, easy walking distance from home, is Piazza’s, a family-run grocery currently owned by second-generation Americans but founded by their granddad, an immigrant from Italy -- great selection of wines and foods of all kinds, but Italian ones in particular; and sometimes we shop at Ferrari’s (that one was founded by an emigrant from very close to my hometown, just like the famous car company of the same name ;-).
Donato Scotti just opened a new and absolutely delightful place in Redwood City, about 10 miles from my home, and, besides the great Italian, esp. northern-Italian, cooking that made him famous at La Strada in Palo Alto – a place which still runs just fine – that’s finally a place to get real Italian “aperitivi” (mildly alcoholic pre-dinner drinks and lots of yummy fingerfood munchies), which I _had_ been missing a bit.
I get excellent operas (most of them Italian ones) in San Jose and at the local “grassroots” West Bay Opera in Palo Alto and Mountain View; I can get really decent gelato (if I ever miss it compared with American ice cream, which, however, has become awesome in its own way at places such as Rick’s in Palo Alto — next door to Piazza’s, as it happens ;-) also in easy walking distance from home; north-Eastern Italy coffee (I’m from the north-East) dominates all around (including the free espresso machines at work ;-) ; YouTube gives me more access to my favorite Italian singers (Guccini, Ligabue, Branduardi, Battiato, …) than I ever had back when I was living in Italy… !-)
When I get a day in Bologna, I splurge on “Pane Comune” (the “common bread” of Bologna, impossible to find anywhere in Italy outside of a 20-miles radius or so from my hometown), “ragnini” (another kind of bread, dry, thin, vaguely grissini-like), and “tagliatelle al ragu” (one pasta sauce that’s almost impossible to do right anywhere BUT in Bologna — so this isn’t so much about Italy as about specifically my hometown !-) — but even in Florence I can’t get _those_, so… ;-)
Q.11. I believe you are writing a new edition of Python in a Nutshell. What is the publishing process?
Anna and I are writing “Python 3 in a Nutshell” (that’s a new book, not a new edition of the existing “Python in a Nutshell”; the latter may be warranted, in the future, to cover newer Python 2.* versions since the 2.4 covered in the 2nd ed of “Python in a Nutshell”), to be published at first as a “rough cut”, an ebook-only edition; we hope to make the “rough cut” by Christmas (though Python 3.1 threw a bit of a spanner in the works;-), eventually leading to a paper edition next year. We’re using XML Docbook, and XML-Mind as our main editor (with a few plugins from O’Reilly), though XML is easy to analyze and process via Python scripts when we want to check something in particular, of course;-); and hg (Mercurial) to keep track of revisions, as opposed to svn which we used in the past (hg, like other DVCS, is vastly superior if you ever find yourself writing/developing on, say, an isolated laptop with no net access: you can keep committing at key “worth saving” points, rather than having to wait until you have connectivity again -- I’m told that git and bazaar are just as good [better, their proponents say;-)] but hg’s the one I’ve been trying and it makes me very happy -- it’s also what Python is switching to [from svn], and code.google.com hosting is now supporting [as an alternative to svn, which we still do support of course]).
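As an illustration of the remark that XML is easy to check from a Python script, here is a minimal sketch using the standard library's `xml.etree.ElementTree`. The sample fragment, the `find_problems` function, and the two checks it performs are assumptions made up for this example (a DocBook 4-style fragment without namespaces); they are not the authors' actual scripts.

```python
import xml.etree.ElementTree as ET

# Hypothetical DocBook-like fragment, used only for illustration.
SAMPLE = """\
<chapter id="ch01">
  <title>Getting Started</title>
  <section id="s1"><title>Installation</title><para>...</para></section>
  <section id="s2"><para>Missing a title.</para></section>
  <section><title>No id here</title></section>
</chapter>
"""


def find_problems(xml_text):
    """Return (section label, problem) pairs for sections lacking an id or a title."""
    root = ET.fromstring(xml_text)
    problems = []
    for section in root.iter("section"):
        label = section.get("id", "<no id>")
        if section.get("id") is None:
            problems.append((label, "missing id"))
        title = section.find("title")
        if title is None or not (title.text or "").strip():
            problems.append((label, "missing title"))
    return problems


if __name__ == "__main__":
    for label, problem in find_problems(SAMPLE):
        print(label, problem)
```

A real manuscript would be read with `ET.parse(path)` instead of `ET.fromstring`, and namespaced DocBook 5 files would need qualified tag names.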
Q. 12. You have co-authored books with your wife, Anna Martelli Ravenscroft. Does this cause strains on marital harmony?
Absolutely not -- for the right kind of marriage. Remember we read the Zen of Python as part of our wedding readings 5 years ago -- we DO believe it has great ideas that would help any marriage. Beautiful is better than ugly, simple is better than complex, explicit is better than implicit, and so on.
We fought epic, no-holds-barred battles over the wording of certain paragraphs, the structure of some chapters, the order in which to present a few of the many sets of examples and recipes -- and, as a result, we both emerged loving each other even more deeply, AND the book came out much better than if we were practicing the normal courteous restraint and compromise that co-authors generally have to practice with each other.
Pre-reqs for such an approach, and towards such wonderful results, are that both spouses/co-authors must be in love with the English language (in both cases, a love dating from far before we ever met), in love with the subject matter (Python, in our case), AND much more interested in having the "true" or "most effective" solution and approach emerge, than in "being right"; being in love with each other, with enormous mutual respect, also helps, as does having subtly different focus (very oriented to the user/reader, for her; very oriented to the "plumbing", the inner working of the technology and its logic, for me) -- differences that are synergistic and complementary, not inimical.
We've had the opportunity to chat with other married couples who had engaged in similarly successful endeavors (always technical books, simply because that's the field we're both in and so these are the kinds of people we tend to meet!, but on a very wide spread of subjects), and most of these aspects do appear to generalize.
Q. 13. Python continues to attract more and more interest from programmers skilled in other languages. Have you any learning recommendations for people moving to Python?
My top recommendation is to consider that you probably don't have to MOVE to Python, in many cases: Python can probably play nicely with the languages you used to prefer (C++, Java, Objective C, C#, Fortran, ...), so you can "have your cake and eat it too" -- keep using the frameworks, libraries and tools you know and love, use Python to keep it all together and flowing smoothly towards your applications' goals. This is probably not the case if you're coming from Perl, Tcl, Scheme, PHP, Lua, or Ruby -- these are languages whose favorite niches vastly overlap with Python's own, so in this case a clean break, "a move" as you put it, may in fact be preferable.
The second-from-the-top recommendation is to bend over backwards to NOT program in Python "as if" you were programming in your previous favorite language -- this applies to ANY case where a programmer is adding a new language to their quiver, but more so when the new language is especially powerful and flexible: I've seen people "do Cobol in Java", "do Fortran in C++", "do C in Perl", etc, but "do X in Python" is scarily widespread for many values of X. The more different languages you're skilled with, and the more idiomatically you've learned to employ each of them, the less this particular risk becomes -- but especially if Python is just your second language, *beware* of striving to use it "just as if it was" PHP, or Perl, or Java, or... it _isn't_ -- books like Python in a Nutshell and Python Cookbook will help you pick up the idioms and their reason for being, so you can internalize them and use them properly.
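A small, hedged illustration of what "doing X in Python" can look like: each pair of functions below computes the same result, first in a style carried over from an index-based language, then with the built-ins a Python programmer would normally reach for. The function names are made up for this sketch.

```python
def pair_up_c_style(keys, values):
    """Build a dict with manual index bookkeeping, as one might in C."""
    result = {}
    i = 0
    while i < len(keys):
        result[keys[i]] = values[i]
        i = i + 1
    return result


def pair_up_idiomatic(keys, values):
    """Same result: zip pairs the items, dict consumes the pairs."""
    return dict(zip(keys, values))


def longest_c_style(words):
    """Find the first longest word by looping over indices."""
    best = None
    for i in range(len(words)):
        if best is None or len(words[i]) > len(best):
            best = words[i]
    return best


def longest_idiomatic(words):
    """Same result: max with key=len returns the first longest item."""
    return max(words, key=len) if words else None
```

Both versions are correct; the idiomatic ones are shorter, state the intent directly, and work on any iterable rather than only on indexable sequences (apart from the emptiness check in `longest_idiomatic`).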
The third-from-the-top recommendation is kind of the counterpoint to the second one: don't imagine you have to forget all you've learned so far -- at a sufficiently high level of abstraction many of your existing best practices will stand you in good stead! Most design patterns are still very useful in Python (though some are superseded by the language's built-in facilities, that's the exception, not the rule) -- watch my YouTube videos on design patterns in Python to get an idea. For example, just because you CAN (in an emergency) "monkey-patch", doesn't mean you SHOULD rely on that troublesome technique where you can possibly avoid it -- Dependency Injection still has its extremely important role, for both testability and extendability. Your best practices in overall system architecture, testing, and development methodologies are still just as important -- spec-driven design, merciless refactoring, continuous build, pair programming if that's what you like, good release engineering, security audits, load-stress tests, etc, etc.
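The contrast between monkey-patching and Dependency Injection can be sketched briefly. In this minimal example, `Stopwatch` and `FakeClock` are hypothetical names invented for illustration: the class receives its clock as a constructor argument, so a test can hand it a fake clock instead of patching `time.monotonic` globally.

```python
import time


class Stopwatch:
    """Measures elapsed time using an injected clock (any zero-argument callable)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock          # the dependency is passed in, not looked up globally
        self._start = clock()

    def elapsed(self):
        return self._clock() - self._start


class FakeClock:
    """Test double: a callable clock whose time only moves when told to."""

    def __init__(self, now=0.0):
        self.now = now

    def __call__(self):
        return self.now

    def advance(self, seconds):
        self.now += seconds


# In a test, inject the fake; production code just calls Stopwatch().
clock = FakeClock()
watch = Stopwatch(clock=clock)
clock.advance(2.5)
measured = watch.elapsed()
```

The monkey-patching alternative would replace `time.monotonic` itself for the duration of the test, which also affects every other piece of code that uses it; injection keeps the substitution local and explicit.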
In the end, programming is something _human beings_ do -- Python has done its best to be the most pleasant, productive language for human beings, fitting their brains and their needs with a carefully balanced mix of flexibility and rigor, of simplicity and power, of readability and conciseness; but all practices that address human beings' characteristics (pervasive code reviews, automated testing/building/deployment, revision control and issue tracking, obsessively regular and frequent check-backs with the intended user[s], &c), are still just as indispensable in Python as they were in ANY other language!-)