
Initial Python Impressions

August 12, 2010

This post covers my initial impressions of Python in the context of an “official Python production project”. I have long used Unix shell scripts for basic scripting needs, and occasionally used Python (Perl less so) for more substantial tasks, but that use has always been “unofficial”. My latest gig involved deploying a Python program that listens for incoming AWS SQS messages and dispatches them to a downstream processing engine (business logic, MySQL database).

Though Java has been my bread ‘n butter since its inception, I am firmly in the camp of language non-bigots. I was a coder long before Java, and it is hardly the only show in town. It basically boils down to the best tool for the task at hand. After all, it is all about tools – that’s what launched us Homo sapiens onto our current trajectory towards ultimate civilization.

Python is certainly enticing, and I fully appreciate its appeal. There’s no question that a Python dictionary is much more convenient to define than a Java map. For example:

mydict = {}

instead of its Java equivalent:

Map<String,Integer> map = new HashMap<String,Integer>();

or the upcoming Java 7 syntax improvement, the “diamond” operator, which infers the type arguments:

Map<String,Integer> map = new HashMap<>();

You can also use the Google Guava utilities (for example, Maps.newHashMap()) to mitigate this issue for now.
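To make the comparison a little more concrete, here is a minimal sketch of a populated dictionary literal. The keys and values are made up purely for illustration.

```python
# Hypothetical word counts, purely for illustration.
word_counts = {"spam": 3, "eggs": 1}

# Add or update an entry with plain subscript syntax.
word_counts["ham"] = 2

# Look up a key, falling back to a default when it is missing.
missing = word_counts.get("bacon", 0)
```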

It is obviously so much easier to “whip out” a Python program to execute some basic functionality than a Java equivalent. The crux of the dilemma is: convenience for developers vs. long-term operations concerns.

It basically boils down to two issues (not necessarily unrelated):

  • Type safety
  • Size of team

If you’re one developer or a tight group of like-minded developers, then type safety issues can be mitigated by convention and mind-meld. However, as soon as the team grows and the life cycle of the application is extended (the original developers are no longer involved in maintenance), the problems begin. It’s hard to imagine a dynamically typed language such as Python comparing to Java for a large-scale development team where unrelated developers and hundreds of thousands of lines of code are involved.

For example, without explicit typing, new developers are forced to drill down into the source code to verify what a method accepts and returns. In the Java world this is typically handled by Javadoc, IDE magic or mere perusal of source signatures. In Python, a method’s signature gives you only the parameter names, not their types or the return type – you have to read the entire method body and follow every return path (cyclomatic complexity).
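As a rough illustration of the point, consider the sketch below. The function `find_user` and its data are hypothetical, invented here only to show how a signature can hide several different return types.

```python
def find_user(user_id, users):
    """Hypothetical lookup; nothing in the signature says what comes back."""
    if user_id not in users:
        return None            # caller must handle None
    record = users[user_id]
    if record.get("disabled"):
        return False           # caller must also handle a bool
    return record              # and, on the happy path, a dict


# Made-up data: one normal user, one disabled user, one missing id.
users = {1: {"name": "alice"}, 2: {"name": "bob", "disabled": True}}
results = [type(find_user(uid, users)).__name__ for uid in (1, 2, 3)]
```

Only by reading all three return statements does a caller learn that `results` mixes a dict, a bool and None.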

An interesting recent article looks at precisely these issues in the migration from Python to Java for Nuxeo’s CMS – see here.

In order to bullet-proof production code, the developer is forced to “play compiler”. To compensate for the lack of a compiler, much of the type-checking has to be done by unit tests; these unit tests would not exist in the Java world. They are basically accidental complexity – extra cost – and exist only for type safety. Here the chickens come home to roost – the trade-off between developer ease of use and run-time stability. Senior Python developers have told me that the “safe” way is to check function return values either by using “isinstance()” or by checking for specific attributes with “hasattr()”. Whew! This just doesn’t “smell right” to me – too dependent on the whims of individuals. The stuff of nightmares for operations folks trying to discern what went wrong at 3 AM!
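The kind of defensive check described above might look like the following sketch. The helper `describe` is hypothetical; `isinstance()` and `hasattr()` are the standard built-ins.

```python
import io


def describe(result):
    """Defensively inspect a value of unknown type (hypothetical helper)."""
    if isinstance(result, dict):
        return "dict, %d key(s)" % len(result)
    if hasattr(result, "read"):
        # Duck typing: anything with a read attribute is treated as file-like.
        return "file-like object"
    return "unexpected type: %s" % type(result).__name__


checks = [describe({"a": 1}), describe(io.StringIO("x")), describe(42)]
```

Every caller has to remember to write checks like these by hand, which is exactly the dependence on individual discipline that worries me.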

One particular place I noticed that this can cause run-time production problems is in the rarely executed “except” clause of a try/except (Java’s try/catch). I ran into unpleasant surprises because Python will not implicitly convert values of different types in a print statement. Where Java easily concatenates distinct types, Python requires you to convert everything to a string with the str() function if you wish to use the “+” operator – with “,” you don’t, but formatting suffers. A bit of inconsistency, I’d say. You’ll never know this is a problem until an error actually happens.
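A minimal sketch of that failure mode follows. All the function names and the message id are hypothetical; the point is only that the broken concatenation sits in an error path and stays hidden until that path runs.

```python
def log_line_buggy(message_id, retries):
    # "+" only joins str to str, so an int here raises TypeError at run time.
    return "message " + message_id + " failed after " + retries + " retries"


def log_line_safe(message_id, retries):
    # %s formatting converts each argument with str() for us.
    return "message %s failed after %s retries" % (message_id, retries)


def process(message_id, retries, make_line):
    """Simulate a handler whose error path is rarely exercised."""
    try:
        raise ValueError("bad payload")  # stand-in for a real failure
    except ValueError:
        # The log line is only built here, inside the except clause.
        return make_line(message_id, retries)
```

Calling `process("m-1", 3, log_line_safe)` returns the formatted line, while the same call with `log_line_buggy` raises a TypeError from inside the except clause, masking the original error.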

Another Python cultural issue that strikes me as “strange” is the lack of true multi-threading due to the GIL (Global Interpreter Lock) limitation. This limitation seems to be an arbitrary constraint due to the BDFL (Benevolent Dictator for Life). Sure, threading is a non-trivial issue – like any tool, it can be used or abused. But to summarily dismiss it and force people to spawn processes strikes me as arbitrary and ultimately retro.

Threading concerns can be divided into two basic types:

  • Threads that access shared resources that need to be synchronized. Care, diligence and discipline need to be exercised.
  • Threads that access external resources that require no synchronization. Goetz et al., in their seminal book Java Concurrency in Practice, call these deferred computations. Since there is no synchronization, programmer complexity is greatly reduced.

It is the latter that is used more often, and is thus more important. Forcing users to always spawn processes is unnecessary accidental complexity. For an interesting recent perspective on the subject, see Michele Simionato’s Artima post, “Threads, processes and concurrency in Python: some thoughts”.
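For the second kind of thread, a sketch might look like the one below. It assumes the `concurrent.futures` module, which was added to the standard library after this post was written, and `fetch` is a hypothetical stand-in for a call to an external resource, simulated here with a sleep.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch(name):
    """Stand-in for a blocking call to an external resource (hypothetical)."""
    time.sleep(0.2)  # a blocking wait releases the GIL in CPython
    return name.upper()


names = ["alpha", "beta", "gamma", "delta"]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    # map() preserves input order; no shared state, so no locks are needed.
    results = list(pool.map(fetch, names))
elapsed = time.perf_counter() - start
```

Each task touches only its own arguments and return value, so nothing needs to be synchronized. Because the waits overlap, `elapsed` should come out well below the time the four calls would take one after another.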