Stream Programming

August 8, 2007

Stream programming is a style in which you describe a graph of interconnected actors that process streams of data; StreamIt is one language built around the idea. Because the data can be split and joined, the graph is more general than the simple linear stream found in typical I/O libraries. There are three types of parallelism to consider:

  • Task parallelism: two tasks running in parallel on separate data streams
  • Data parallelism: a task that can be applied independently to every element in a data stream
  • Pipeline parallelism: a linear chain of tasks, each consuming the previous one's output, that can run concurrently on successive elements or be merged together
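
To make the graph idea concrete, here is a minimal sketch in plain Python generators. It is not StreamIt syntax, and every name in it (source, scale, offset, split_roundrobin, join_roundrobin, run_graph) is made up for illustration. It also runs sequentially, so it only shows the shape of the graph and where each kind of parallelism could be exploited, not actual parallel execution.

```python
def source(n):
    """Emit the integers 0..n-1 as a stream."""
    yield from range(n)


def scale(stream, k):
    """Stateless filter: each element is handled independently,
    so copies of it could run in parallel (data parallelism)."""
    for x in stream:
        yield x * k


def offset(stream, d):
    """Another stateless filter, used as a second pipeline stage."""
    for x in stream:
        yield x + d


def split_roundrobin(stream, ways):
    """Deal the elements of one stream out to `ways` branches in turn."""
    branches = [[] for _ in range(ways)]
    for i, x in enumerate(stream):
        branches[i % ways].append(x)
    return branches


def join_roundrobin(branches):
    """Interleave the branches back into one stream, one element per turn."""
    iters = [iter(b) for b in branches]
    while iters:
        alive = []
        for it in iters:
            try:
                yield next(it)
                alive.append(it)
            except StopIteration:
                pass  # this branch is drained; drop it
        iters = alive


def run_graph(n):
    left, right = split_roundrobin(source(n), 2)
    # Task parallelism: the two branches apply different work to separate streams.
    left_out = scale(left, 10)
    # Pipeline parallelism: scale feeds offset; the two stages could overlap,
    # or be merged into a single filter computing 2*x + 1.
    right_out = offset(scale(right, 2), 1)
    return list(join_roundrobin([left_out, right_out]))


print(run_graph(6))  # [0, 3, 20, 7, 40, 11]
```

The splitter and joiner are what make this a graph rather than a single linear stream.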

I’d like to build a stream programming language that runs on top of Hadoop, an Apache project that implements something like Google’s distributed filesystem, MapReduce, and BigTable. I think MapReduce is a subset of stream programming, though I’m not sure how “reduce” fits. Either way, I need a benchmark large enough to exercise a group of machines on Amazon’s EC2.
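
One way to look at the "reduce" question is to write a word count as a stream graph. The sketch below is plain Python, not Hadoop's API, and the names map_words, split_by_key, reduce_counts and word_count are hypothetical. Under these assumptions, "map" is an ordinary stateless filter and the shuffle is a splitter that routes by key, but "reduce" has to see all of its input before it emits anything, which only works if the stream is finite.

```python
from collections import defaultdict


def map_words(lines):
    """'Map' as a stateless filter: one line in, zero or more (word, 1) pairs out."""
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)


def split_by_key(pairs, ways):
    """'Shuffle' as a splitter: every pair with the same key lands on the same branch.
    A byte sum is used as a simple, deterministic stand-in for a hash function."""
    branches = [[] for _ in range(ways)]
    for key, value in pairs:
        branches[sum(key.encode("utf-8")) % ways].append((key, value))
    return branches


def reduce_counts(pairs):
    """'Reduce' as a stateful filter: it accumulates per-key totals and can only
    emit once its input stream has ended."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    yield from sorted(totals.items())


def word_count(lines, ways=2):
    """Map, split by key, reduce each branch, then join the results."""
    result = {}
    for branch in split_by_key(map_words(lines), ways):
        result.update(reduce_counts(branch))
    return result


print(word_count(["a b a", "B c"]))
```

Each branch of the splitter could be reduced on a different machine, since no key appears on more than one branch.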
