Before Easter, I quietly released version 0.01 of the threads::tbb module to CPAN.  This week, I posted the white paper I’ve been working on, which demonstrates that the module does indeed permit high-performance threaded programs to be written.  The paper, included with the 0.02 CPAN release, is titled "Threading Perl with TBB".  I managed to demonstrate scaling to 7 out of 8 processors, which I think is good enough for a 0.01 release.

Writing a LaTeX article-style white paper is an unusual way to communicate a new idea in the Perl community, which normally focuses on CPAN releases, conference presentations, blog posts and the like.  However, I chose this format to reach a wider audience: the research itself applies to many of the dynamic language environments out there, and I try to characterize which interpreters it applies to, along with warnings about the difficulties of proceeding.

Actually, the first attempt at this effort was for PHP/Zend, to go alongside other OpenParallel successes with PHP/HipHop.  However, it became obvious early on that, given the depth of prior research and supporting framework I had access to, targeting PHP first was not a feasible option; at least, not for someone without the oral tradition of how the Zend internals work.

Here’s the abstract, with a little summary commentary:

Perl’s history has seen two threading models; one shared-state model now discontinued, and one heap-duplicating model with a very inefficient shared state approach. There is also an Erlang-style threading library on CPAN, threads::lite, more efficient and scalable but specific in functionality.

This summarizes the “Threading History” section of the paper.  Threading has had a bad rap to date, with many language communities considering it a flawed approach, inherently unstable, or unnecessary if you have fork().  I side with the TBB authors on this: it’s more of an API issue.  This section tries to give the relevant history for Perl, as that history explains why the situation in the Perl community is as it is.  As the person who compiled the Perl git history, I naturally focused this section on what is available in git and on CPAN.

A task-oriented parallelisation approach permits parallel operations on data sets as well as pipeline-based programming. threads::tbb, the core invention of the paper, uses Intel’s Threading Building Blocks (TBB) along with a system of lazy cloning for state, and is shown to result in speed-ups for embarrassingly parallel tasks to 8 processor cores or more.

There is a section in the paper where I summarize the TBB approach for the virtuously lazy Perl programmer.  It references James Reinders’ book, still the best source of information on TBB.  The abstract paragraph above also summarizes the results section, where I show that:

  • the approach works for “synthetic” tasks (i.e., inner loops), with the best result being a 7-times speed-up on an 8-core system.
  • the previously described effects of mutex contention in memory allocation are observed, and can be worked around using the TBB scalable memory allocator proxy.
  • the API is useful enough to retrofit onto a legacy program.
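To make the idea of lazy cloning a little more concrete, here is a minimal sketch in Python rather than Perl; it illustrates the concept only and does not use the threads::tbb API. It assumes a read-only master state that each worker thread deep-copies on its first access, instead of duplicating everything when the thread is created. The `LazyClone` class and `parallel_sum` function are hypothetical names, and the chunked `ThreadPoolExecutor` loop stands in for a TBB parallel_for over a blocked range. CPython's global interpreter lock means this will not actually run faster; it shows the cloning behaviour, not the speed-up.

```python
import copy
import threading
from concurrent.futures import ThreadPoolExecutor


class LazyClone:
    """Hypothetical helper: gives each thread its own deep copy of a master
    object, made on that thread's first access rather than at thread start."""

    def __init__(self, master):
        self._master = master              # treated as read-only by convention
        self._local = threading.local()    # per-thread storage for the clone
        self._lock = threading.Lock()      # protects clone_count only
        self.clone_count = 0

    def get(self):
        obj = getattr(self._local, "obj", None)
        if obj is None:
            # First access from this thread: clone now, not at thread creation
            obj = copy.deepcopy(self._master)
            with self._lock:
                self.clone_count += 1
            self._local.obj = obj
        return obj


def parallel_sum(data, master, grain=100, workers=4):
    """Split data into chunks of `grain` items and process them in a pool,
    loosely in the style of a parallel_for over a blocked range."""
    state = LazyClone(master)

    def process_chunk(chunk):
        cfg = state.get()  # at most one clone per worker thread
        return sum(x * cfg["scale"] + cfg["offsets"][x % 10] for x in chunk)

    chunks = [data[i:i + grain] for i in range(0, len(data), grain)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        total = sum(pool.map(process_chunk, chunks))
    return total, state.clone_count
```

Calling `parallel_sum(list(range(1000)), {"scale": 3, "offsets": list(range(10))})` gives a total of 1503000 with at most four clones: one per worker thread that actually picked up a chunk, regardless of how many chunks there are.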

The results are applicable in principle to other languages which are built around boxed variables and state machine interpreters, such as PHP/Zend or standard Python.

As I mentioned before the jump, most of the difficulties stem from the interpreter being a state machine, and they apply to a number of other virtual machines.  Of course, those other communities would need to test this assertion.

The cloning logic duplicates core Perl 5 code in intent, the API for which could be cleaned up to avoid some minor API intrusions. Green-field interpreter approaches would benefit from a const concept, both to avoid duplication in the first place and for safer operation.

The “lazy clone” approach I mentioned has some very large caveats.  In the paper I attempt to explain why they are necessary for achieving practical speed-ups, along with the factors that mitigate the decision and design approaches for avoiding the problems in the first place.
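The benefit of a const concept can be illustrated in a few lines of Python, again only as a sketch under stated assumptions. A hypothetical `Const` wrapper marks data the program promises never to mutate, and its `__deepcopy__` hook returns the object itself, so a per-thread deep copy shares const data instead of duplicating it while still copying the mutable state. Python does not enforce the promise; that enforcement is the "safer operation" an interpreter-level const could add.

```python
import copy


class Const:
    """Hypothetical marker for data the program promises never to mutate.

    Python does not enforce this; the marker only tells deepcopy that the
    wrapped value may be shared between threads rather than copied."""

    __slots__ = ("value",)

    def __init__(self, value):
        self.value = value

    def __deepcopy__(self, memo):
        # Shared, not duplicated: every clone refers to the same object
        return self


def clone_for_thread(state):
    """Per-thread copy: mutable parts are duplicated, Const parts are shared."""
    return copy.deepcopy(state)


master = {
    "lookup": Const(tuple(range(100_000))),  # large read-only table
    "counters": {"hits": 0},                 # mutable, needs its own copy
}
worker_state = clone_for_thread(master)
```

Here `worker_state["lookup"]` is the very same object as `master["lookup"]`, so the large table is never copied, while `worker_state["counters"]` is an independent dictionary that the worker can update freely.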

Transformations for task-orientation are similar to those required for event-oriented programming, with potential to parallelize event frameworks, or for APIs which span the two styles.

This is a possibility I want to expand upon in the future: I believe that TBB’s parallel_while API would allow event-style frameworks that process multiple events in parallel.  That would be interesting, and perhaps a new style of programming.
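As a rough illustration of that idea, the sketch below processes events concurrently while letting each handler feed follow-on events back into the pool, which is the role the feeder plays in TBB’s parallel_while. It is Python with `concurrent.futures`, not TBB or Perl; `run_events` and the `split` handler are hypothetical names, and each handler is assumed to return a `(value, follow_on_events)` pair.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def run_events(initial, handler, workers=4):
    """Handle events in parallel; handler(event) returns (value, follow_on),
    and follow-on events are fed back into the same pool until none remain."""
    total = 0
    processed = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pending = {pool.submit(handler, e) for e in initial}
        while pending:
            # Block until at least one event handler has finished
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for fut in done:
                value, follow_on = fut.result()
                total += value
                processed += 1
                # Feed new events back in, like a parallel_while feeder
                for e in follow_on:
                    pending.add(pool.submit(handler, e))
    return total, processed


def split(n):
    """Hypothetical handler: an event of size n > 1 spawns two smaller events;
    a size-1 event is a leaf and contributes 1 to the total."""
    if n > 1:
        return 0, [n // 2, n - n // 2]
    return 1, []
```

For example, `run_events([10, 5], split)` returns `(15, 28)`: the two initial events fan out into 15 leaf events, and 28 events are handled in total (2n - 1 for each starting size n). Only the coordinating thread updates `total` and `processed`, so no locking is needed for the bookkeeping.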