I completely failed to explain to a group why I think C# 3.0’s pile of features is a poor design. Somehow people keep missing my point. Can you believe English is my first language? If anyone is reading this post, please tell me if this makes any sense to you.

Imagine how Anders, head of the C# design team, might think about the next version of C#. Ten different groups within MS propose 10 brilliant features to add to C# 4.0. Things like pattern matching, concurrency, transactional memory, assertions, categories, etc. Big, complicated, cool features. How does he decide which one to implement? Anders would think long and hard about which feature will have the biggest impact. He’ll choose 1 or 2 and the rest will get shot down. C# 4.0 introduces a few big new features, and this process repeats itself for version 5.0.

If I were running the C# team, I would resist adding any features to C# for fear of feature creep. Instead, I would tell all 10 groups to implement their features as libraries. They will grumble and complain, then they will build a prototype using some insane hackery to get around some limitations in C# and the CLR. I would study those gross hacks, not the features themselves, because that’s where they ran into a roadblock with the language. The goal of language design is to reduce the number of gross hacks needed to implement complex libraries. Because if I can fix C# so these guys can implement their 10 great features as libraries more easily, then I’ve magically enabled hundreds of groups outside MS to also implement their very complex features.

The changes I envision making to C# would be much more subtle than a giant feature like LINQ. I would tinker with some dynamic typing features, better integration with code generation tools, and maybe a way to use attributes within the body of a method (e.g., a parallel loop declaration above a foreach statement). Small, subtle changes that would have wide impact on library writers, but not most programmers. I’m against adding feature X. Instead, I want to change C# so you can write feature X as a library. Does this make sense?
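To make the “feature X as a library” idea concrete, here’s a minimal sketch in Python rather than C# (the principle is the same): a parallel loop shipped as an ordinary library function, with no new language syntax needed. The `parallel_for` name is my own invention, not any real library’s API.

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_for(items, body, max_workers=4):
    """Run body(item) for each item on a thread pool; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(body, items))

# A "parallel loop" without any compiler support:
squares = parallel_for(range(5), lambda n: n * n)
print(squares)  # [0, 1, 4, 9, 16]
```

If the language is flexible enough to let a library express this cleanly, the compiler team never has to bless parallel loops as a built-in feature.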

I watched Linus Torvalds rant on Git and distributed source control management (SCM). First of all, Torvalds is a hard-core jerk. He is stunningly obnoxious and egotistical; the very embodiment of the poorly socialized nerd stereotype. As for the talk, he takes a lot of credit for some fairly well known ideas (much like Linux OS).


YouTube Architecture

August 1, 2007

Good summary of YouTube’s architecture. They also started with a monolithic database, threw more hardware at it, and finally partitioned the database so it could scale across many machines. This is an important lesson for any business: plan to split your database from the start. You might run everything on one DB at first, but you should be prepared to partition once you reach a reasonably large size.
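In practice, “prepared to partition” mostly means routing every row through a shard key from day one. A minimal sketch, assuming a simple modulo scheme on a numeric key (the names are illustrative, not YouTube’s actual design):

```python
NUM_SHARDS = 4  # number of database partitions

def shard_for(user_id: int) -> int:
    """Map a user ID to one of NUM_SHARDS partitions."""
    return user_id % NUM_SHARDS

# All rows for a given user land on the same shard,
# so single-user queries never have to cross partitions.
print(shard_for(42))    # 2
print(shard_for(1000))  # 0
```

If all data access already goes through a function like this, moving from one DB to many is a configuration change instead of a rewrite.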

Everyone tells me stored procedures are faster because the DB caches the execution plan. This always struck me as preposterous: a DB should obviously cache all SQL queries. Finally, someone on the Microsoft SQL Server team states, “The cached execution plan [was] used to give stored procedures a performance advantage over queries. However, for the last couple of versions of SQL Server, execution plans are cached for all T-SQL batches, regardless of whether or not they are in a stored procedure. Therefore, performance based on this feature is no longer a selling point for stored procedures. Any T-SQL batch with static syntax that is submitted frequently enough to prevent its execution plan from aging out of memory will receive identical performance benefits.” That was 2004, and she implies the behavior had been around for a few versions before that date. In most cases, I don’t think stored procedure performance is something that should concern most IT sheep.

NIH Syndrome

June 4, 2007

Not Invented Here (NIH) syndrome is where companies prefer to develop their own solution rather than reuse an existing one. In most cases this is an absolute disaster. Even though an existing software package has bugs, anything you develop will likely have more bugs. At least the existing package has been tested, used in the field, and updated to shake out many bugs. Many IT people have told me that their management will not pay for any additional software; instead, they build it all themselves. Their managers are morons. If an IT goon is paid $100K, the company is really paying ~$150K for salary + benefits + office space. Let’s say it takes 50 dev-weeks to write, debug, and test the data access layer for a moderately complex schema. Certainly one can purchase an object-relational mapping tool for far less than $150K. Hibernate is free, and most commercial tools offer source code access for a few thousand dollars. This is vastly cheaper than building it in-house. The same applies to windows and web controls, testing tools, IDEs and other libraries. NASA and the US military now strive to use off-the-shelf components where possible. So why aren’t IT shops buying more software components? I wanted to start a little software tools company, but was dissuaded after too many IT guys told me their companies don’t buy much 3rd-party software. What’s a compiler guy like me to do?
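The back-of-the-envelope math, spelled out (the $5K tool price is an assumed order of magnitude for a commercial ORM license, not a quote; I’m assuming ~50 working weeks per year):

```python
fully_loaded = 150_000            # salary + benefits + office space, per year
weeks_per_year = 50               # working weeks, roughly
cost_per_dev_week = fully_loaded / weeks_per_year   # $3,000/week

build_cost = 50 * cost_per_dev_week  # 50 dev-weeks for a hand-rolled data access layer
buy_cost = 5_000                     # assumed price of a commercial ORM with source access

print(build_cost)             # 150000.0
print(build_cost / buy_cost)  # building costs ~30x more than buying
```

Even if the estimate is off by a factor of five in either direction, buying still wins by a wide margin.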


April 30, 2007

The book Moneyball describes how the Oakland A’s used a rational analysis of baseball statistics to cheaply construct a competitive major league team (book summary here). Many people have been inspired to consider whether their own field can be improved by the “Moneyball” approach. To continue this trend, how would the Moneyball approach apply to software? If you believe the cliché that the best programmers are 10 times better than the average, then why aren’t companies willing to pay 10X (or 5X or even 2X) more for those guys? The problem is we don’t have public statistics on programmers, nor do we have valid statistical measures of productivity. So we have nothing so far. And that’s why it’s important to figure out some measure, however crude, that can help skim the best from the rest. Google is the only company I’m aware of that is making an attempt at doing this.


April 27, 2007

I’m going to write about software productivity and how it might impact software engineering, including programming languages. However, I first need to understand productivity by skimming the work in economics. Strangely, measuring productivity is a crude pseudoscience. [Read this mumbo-jumbo (pdf) about productivity.] To measure national improvements in productivity, they simply divide outputs by inputs. It once cost $1 to produce a $3 widget; now it costs $0.75 to produce that widget: that’s a 33% improvement (3/1 -> 3/0.75, i.e., a ratio of 3.0 rising to 4.0). However, if they simply raised the price to $4, that’s also a 33% improvement. But is that a productivity improvement? Productivity growth has slowed from 3% (’95-’04) to 1.5% lately, but economists have to guess as to the cause. If the economy loses steam and revenue falls, but a company hasn’t reduced its workforce, then (lower output) / (same workforce) = lower productivity. Another problem is that companies produce better products (e.g., faster processors) at the same price. Is that a productivity improvement? Economists argue about this all the time. Basically, measuring productivity is imprecise and involves lots of guessing. So measuring software productivity will likely be equally tough.
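The widget arithmetic, spelled out; this is just the outputs-divided-by-inputs definition from the text, nothing more:

```python
def productivity(output_value, input_cost):
    """Productivity as the ratio of output value to input cost."""
    return output_value / input_cost

base = productivity(3.00, 1.00)     # $3 widget, $1 to make: ratio 3.0
cheaper = productivity(3.00, 0.75)  # same widget, cheaper to make: ratio 4.0
pricier = productivity(4.00, 1.00)  # same cost, just raised the price: ratio 4.0

# Both changes register as the same ~33% "improvement" -- which is the problem:
print(cheaper / base - 1)  # ~0.33, from genuinely cutting costs
print(pricier / base - 1)  # ~0.33, from merely raising the price
```

The measure can’t distinguish a real efficiency gain from a price hike, which is exactly the objection raised above.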