2009
1. Partial Memoization of Concurrency and Communication
Memoization is a well-known optimization technique used to eliminate
redundant calls for pure functions. If a call to a function f with
argument v yields result r, a subsequent call to f with v can
be immediately reduced to r without the need to re-evaluate f's
body.
Understanding memoization in the presence of concurrency and
communication is significantly more challenging. For example, if f
communicates with other threads, it is not sufficient to simply record
its input/output behavior; we must also track inter-thread
dependencies induced by these communication actions. Subsequent calls
to f can be elided only if we can identify an interleaving of
actions from these call-sites that leads to states in which these
dependencies are satisfied. Similar issues arise if f spawns
additional threads.
In this paper, we consider the memoization problem for a higher-order
concurrent language whose threads may communicate through synchronous
message-based communication. To avoid the need to perform unbounded
state space search that may be necessary to determine if all
communication dependencies manifested in an earlier call can be
satisfied in a later one, we introduce a weaker notion of memoization
called partial memoization that gives implementations the
freedom to avoid performing some part, if not all, of a previously
memoized call.
To validate the effectiveness of our ideas, we consider the benefits
of memoization for reducing the overhead of recomputation for
streaming, server-based, and transactional applications executed on a
multi-core machine. We show that on a variety of workloads,
memoization can lead to substantial performance improvements without
incurring high memory costs.
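As background, a minimal Java sketch of the conventional technique (the class and names below are illustrative, not from the paper, which targets a higher-order concurrent language rather than Java): a table of argument/result pairs suffices only because the memoized function is pure; once calls communicate or spawn threads, the recorded communication dependencies must also be satisfiable at the later call, which is the problem the paper addresses.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.function.Function;

    // Classic memoization for a pure function: reuse a previously computed
    // result r for a previously seen argument v instead of re-evaluating.
    public class Memo<A, R> {
        private final Map<A, R> cache = new ConcurrentHashMap<>();
        private final Function<A, R> f;

        public Memo(Function<A, R> f) { this.f = f; }

        public R apply(A arg) {
            // Evaluates f only on a cache miss; later calls reuse the result.
            return cache.computeIfAbsent(arg, f);
        }

        public static void main(String[] args) {
            Memo<Integer, Integer> square = new Memo<>(v -> v * v);
            System.out.println(square.apply(7));  // evaluates the body
            System.out.println(square.apply(7));  // elided: cached result
        }
    }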
2. Exceptionally Safe Futures
A future is a well-known programming construct used to introduce
concurrency to sequential programs. Computations
annotated as futures are executed asynchronously and run concurrently
with their continuations. Typically, futures are not
transparent annotations:
a program with futures need not produce the same result as the
sequential program from which it was derived.
Safe futures guarantee that a future-annotated program produces the
same result as its sequential counterpart. Ensuring safety is especially
challenging in the presence of constructs such as exceptions that permit the
expression of non-local control-flow. For example, a future may raise an
exception whose handler is in its continuation. To ensure safety, we must
guarantee the continuation does not discard this handler regardless of the
continuation's own internal control-flow (e.g. exceptions it raises or
futures it spawns). In this paper, we present a formulation of safe
futures for a higher-order functional language with first-class
exceptions. Safety can be guaranteed dynamically by stalling the
execution of a continuation that has an exception handler
potentially required by its future until the future completes. To
enable greater concurrency, we develop a static analysis and instrumentation,
and formalize the runtime behavior of instrumented programs, allowing
execution to discard handlers precisely when it is safe to do so.
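To illustrate the hazard with ordinary java.util.concurrent futures (this example is ours and uses plain, unsafe futures rather than the safe futures the paper formalizes): the handler lives in the continuation of the asynchronous computation, so an implementation that lets the continuation discard the handler before the future completes could lose an exception that sequential execution would have observed.

    import java.util.concurrent.*;

    public class FutureHandlerExample {
        public static void main(String[] args) throws InterruptedException {
            ExecutorService pool = Executors.newSingleThreadExecutor();
            // "Future-annotated" computation that may raise an exception.
            Future<Integer> f = pool.submit(() -> {
                if (args.length == 0) throw new IllegalStateException("boom");
                return 42;
            });
            try {
                // Continuation work runs concurrently with the future; the
                // enclosing handler must remain available until f completes.
                System.out.println("continuation running");
                System.out.println("result: " + f.get());  // exception surfaces here
            } catch (ExecutionException e) {
                System.out.println("handled: " + e.getCause().getMessage());
            } finally {
                pool.shutdown();
            }
        }
    }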
3. Semantics-Aware Trace Analysis
As computer systems continue to become more powerful and complex, so do programs. High-level abstractions introduced to deal with complexity in large programs, while simplifying human reasoning, can often obfuscate salient program properties gleaned from automated source-level analysis through subtle (often non-local) interactions. Consequently, the effects of program changes, and whether these changes violate intended protocols, become difficult to infer. Refactorings and feature additions, modifications, or removals can introduce hard-to-catch bugs that often go undetected until many revisions later.

To address these issues, this paper presents a novel dynamic program analysis that builds a semantic view of program executions. These views reflect program abstractions and aspects; however, views are not simply projections of execution traces, but are linked to each other to capture semantic interactions among abstractions at different levels of granularity in a scalable manner. We describe our approach in the context of Java and demonstrate its utility in improving regression analysis. We first formalize a subset of Java and a grammar for traces generated at program execution. We then introduce several types of views used to analyze regression bugs, along with a novel, scalable technique for semantic differencing of traces from different versions of the same program. Benchmark results on large open-source Java programs demonstrate that semantics-aware trace differencing can identify precise and useful details about the underlying cause of a regression, even in programs that use reflection, multithreading, or dynamic code generation, features that typically confound other techniques.
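As a rough illustration of the raw material such an analysis consumes (the event shape and grouping below are our own simplification, not the trace grammar or linked views formalized in the paper, which capture cross-abstraction interactions rather than mere projections):

    import java.util.*;
    import java.util.stream.Collectors;

    public class TraceViews {
        // One dynamic event from an execution trace (hypothetical shape).
        record Event(long seq, String kind, String method, long objectId) {}

        public static void main(String[] args) {
            List<Event> trace = List.of(
                new Event(1, "call",  "List.add",  7),
                new Event(2, "write", "List.add",  7),
                new Event(3, "call",  "List.size", 7),
                new Event(4, "call",  "Map.put",   9));

            // A simplified "view": group the trace by the method abstraction,
            // keeping event order so views from two program versions can be
            // compared during regression analysis.
            Map<String, List<Event>> byMethod = trace.stream()
                .collect(Collectors.groupingBy(Event::method,
                                               LinkedHashMap::new,
                                               Collectors.toList()));
            byMethod.forEach((m, evs) ->
                System.out.println(m + " -> " + evs.size() + " events"));
        }
    }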
4. Alchemist: A Transparent Dependence Distance Profiling Infrastructure
Effectively migrating sequential applications to take advantage of the parallelism available on multicore platforms is a well-recognized challenge. This paper addresses important aspects of this issue by proposing a novel profiling technique to automatically detect available concurrency in C programs. The profiler, called Alchemist, operates completely transparently to applications, and identifies constructs at various levels of granularity (e.g., loops, procedures, and conditional statements) as candidates for asynchronous execution. Various dependences, including read-after-write (RAW), write-after-read (WAR), and write-after-write (WAW), are detected between a construct and its continuation, the execution following the completion of the construct. The time-ordered distance between the program points forming a dependence gives a measure of the effectiveness of parallelizing that construct, as well as identifying the transformations necessary to facilitate such parallelization. Using the notion of post-dominance, our profiling algorithm builds an execution index tree at run-time. This tree is used to differentiate among multiple instances of the same static construct, and leads to improved accuracy in the computed profile, which helps to better identify constructs that are amenable to parallelization. Performance results indicate that the profiles generated by Alchemist pinpoint strong candidates for parallelization, and can help significantly ease the burden of application migration to multicore environments.
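A small Java analogy of the core measurement (Alchemist itself profiles C programs; the code and magnitudes here are ours): the loop writes sum, the final statement reads it, and the amount of independent work in between determines the dependence distance, i.e., how profitably the loop could run asynchronously with its continuation.

    public class DependenceDistance {
        public static void main(String[] args) {
            int[] data = new int[1_000_000];

            // Candidate construct for asynchronous execution.
            long sum = 0;
            for (int i = 0; i < data.length; i++) {
                data[i] = i;       // writes...
                sum += data[i];
            }

            // Continuation: a long stretch of independent work first...
            long unrelated = 0;
            for (int i = 0; i < 10_000_000; i++) unrelated += i;

            // ...and only here a read of 'sum' (a RAW dependence with the
            // loop). The gap between the write above and this read is the
            // time-ordered dependence distance; a large distance suggests the
            // loop could run asynchronously while the unrelated work proceeds.
            System.out.println(sum + " " + unrelated);
        }
    }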
5. Speculative N-Way Barriers
Speculative execution is an important technique that has historically been used to extract concurrency from sequential programs. While techniques to support speculation work well when computations perform relatively simple actions (e.g., reads and writes to known locations), understanding speculation for multi-threaded programs in which threads communicate through shared references is significantly more challenging, and is the focus of this paper. We use as our reference point a simple higher-order concurrent language extended with an n-way barrier and a fork/join execution model. Our technique permits the expression guarded by the barrier to speculatively proceed before the barrier has been satisfied (i.e., before all threads that synchronize on that barrier have done so), and allows participating threads that would normally block on the barrier to speculatively proceed as well. Our solution formulates safety properties under which speculation is correct in a fork/join model, and uses traces to validate these properties modularly on a per-thread and per-synchronization basis.
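For reference, a conventional (non-speculative) n-way barrier expressed with java.util.concurrent.CyclicBarrier; this shows only the baseline construct, since the speculative execution past the barrier that the paper enables is not expressible this way.

    import java.util.concurrent.CyclicBarrier;

    public class NWayBarrier {
        public static void main(String[] args) {
            final int n = 4;
            // All n threads block at await() until every participant arrives;
            // the paper's technique lets the guarded expression (and blocked
            // participants) proceed speculatively before that point.
            CyclicBarrier barrier =
                new CyclicBarrier(n, () -> System.out.println("all arrived"));
            for (int i = 0; i < n; i++) {
                final int id = i;
                new Thread(() -> {
                    System.out.println("thread " + id + " before barrier");
                    try {
                        barrier.await();
                    } catch (Exception e) {
                        Thread.currentThread().interrupt();
                    }
                    System.out.println("thread " + id + " after barrier");
                }).start();
            }
        }
    }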
2008
1. Flattening Tuples in an Intermediate SSA Representation
For functional programs, unboxing aggregate data structures such as tuples removes memory indirections and frees dead components of the decoupled structures. To explore the consequences of such optimizations in a whole-program compiler, this paper presents a tuple flattening transformation and a framework that allows the formal study and comparison of different flattening schemes. We present our transformation over functional SSA, a simply-typed, monomorphic language, and show that the transformation is type-safe. The flattening algorithm defined by our transformation has been incorporated into MLton, a whole-program, optimizing compiler for SML. Experimental results indicate that aggressive tuple flattening can lead to substantial improvements in runtime performance, a reduction in code size, and a decrease in total allocation without a significant increase in compilation time.
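As a loose source-level analogy in Java (the paper's transformation operates on MLton's functional SSA intermediate representation, not on source code, and the names here are ours): flattening replaces a heap-allocated pair with its components, removing an indirection and letting unused components die.

    public class FlattenPair {
        record Pair(int x, int y) {}   // boxed: one allocation, one indirection

        // Before: the callee receives a Pair but only ever uses x.
        static int beforeFlattening(Pair p) {
            return p.x() + 1;
        }

        // After: the tuple is flattened into its components; y is dead and
        // disappears, and no Pair is allocated at the call site.
        static int afterFlattening(int x) {
            return x + 1;
        }

        public static void main(String[] args) {
            System.out.println(beforeFlattening(new Pair(41, 99)));
            System.out.println(afterFlattening(41));
        }
    }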
2. A Uniform Transactional Execution Environment for Java
Transactional memory (TM) has recently emerged as an effective tool for extracting fine-grain parallelism from declarative critical sections. In order to make STM systems practical, significant effort has been made to integrate transactions into existing programming languages. Unfortunately, existing approaches fail to provide a simple implementation that permits lock-based and transaction-based abstractions to coexist seamlessly. Because of the fundamental semantic differences between locks and transactions, legacy applications or libraries written using locks cannot be transparently used within atomic regions. To address these shortcomings, we implement a uniform transactional execution environment for Java programs in which transactions can be integrated with more traditional concurrency control constructs. Programmers can run arbitrary programs that utilize traditional mutual-exclusion-based programming techniques, execute new programs written with explicit transactional constructs, and freely combine abstractions that use both coding styles.
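A schematic sketch of the intended programming model (Java has no built-in atomic blocks; atomicRun below is a hypothetical stand-in, emulated with a single global lock so the example runs, whereas the paper's environment would execute such regions transactionally): lock-based legacy code and transaction-style code update the same shared state.

    public class UniformExecution {
        static final Object GLOBAL = new Object();
        static int counter = 0;

        // Hypothetical stand-in for an atomic region; not a real Java/STM API.
        static void atomicRun(Runnable body) {
            synchronized (GLOBAL) { body.run(); }
        }

        public static void main(String[] args) throws InterruptedException {
            Thread legacy = new Thread(() -> {
                synchronized (GLOBAL) { counter++; }   // lock-based legacy code
            });
            Thread modern = new Thread(() ->
                atomicRun(() -> counter++));           // transaction-style code
            legacy.start(); modern.start();
            legacy.join(); modern.join();
            System.out.println(counter);               // both styles coexist: prints 2
        }
    }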
3. Protocol Inference Using Static Path Profiles
Specification inference tools typically mine commonalities among states at relevant program points. For example, inferring the invariants that must hold at all calls to a procedure p requires examining the state abstractions found at all call-sites to p. Unfortunately, existing approaches to building these abstractions require being able to explore all paths (either static or dynamic) to all of p's call-sites to derive specifications with any measure of confidence. Because programs that have complex control-flow structure may induce a large number of paths, naive path exploration is impractical.

In this paper, we propose a new specification inference technique that allows us to efficiently explore statically all paths to a program point. Our approach builds static path profiles, profile information constructed by a static analysis that accumulates predicates valid along different paths to a program point. To make our technique tractable, we employ a summarization scheme to merge predicates at join points based on the frequency with which they occur on different paths. For example, the predicates present on a majority of static paths to all call-sites of a procedure p form the pre-condition of p.

We have implemented a tool, Marga, based on static path profiling. Qualitative analysis of the specifications inferred by Marga indicates that it is more accurate than existing static mining techniques, can be used to derive useful specifications even for APIs that occur infrequently (statically) in the program, and is robust against imprecision that may arise from examination of infeasible or infrequently occurring dynamic paths. A comparison of the specifications generated using Marga with a dynamic specification inference engine based on CUTE, an automatic unit test generation tool, indicates that Marga generates comparably precise specifications at lower cost.
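A toy illustration of the summarization idea (the predicates and majority threshold below are ours, not Marga's implementation): predicates observed along different static paths to a call-site are counted, and those holding on a majority of paths form the inferred precondition.

    import java.util.*;

    public class PathProfileSummary {
        public static void main(String[] args) {
            // Predicates valid along each static path reaching calls to p.
            List<Set<String>> paths = List.of(
                Set.of("file != null", "isOpen(file)"),
                Set.of("file != null", "isOpen(file)", "mode == READ"),
                Set.of("file != null"));

            // Count how many paths each predicate appears on.
            Map<String, Integer> freq = new HashMap<>();
            for (Set<String> path : paths)
                for (String pred : path)
                    freq.merge(pred, 1, Integer::sum);

            // Keep predicates holding on a majority of paths as the precondition.
            int majority = paths.size() / 2 + 1;
            freq.forEach((pred, n) -> {
                if (n >= majority) System.out.println("precondition: " + pred);
            });
        }
    }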
4. Quasi-Static Scheduling for Safe Futures
Migrating sequential programs to effectively utilize next-generation multicore architectures is a key challenge facing application developers and implementors. Languages like Java that support complex control- and dataflow abstractions confound classical automatic parallelization techniques. On the other hand, introducing multithreading and concurrency control explicitly into programs can impose a high conceptual burden on the programmer, and may entail a significant rewrite of the original program.

In this paper, we consider a new technique to address this issue. Our approach makes use of futures, a simple annotation that introduces asynchronous concurrency into Java programs but provides no concurrency control. To ensure concurrent execution does not yield behavior inconsistent with sequential execution (i.e., the execution yielded by erasing all futures), we present a new interprocedural summary-based dataflow analysis. The analysis inserts lightweight barriers that block and resume threads executing futures if a dependency violation may ensue. There are no constraints on how threads execute other than those imposed by these barriers.

Our experimental results indicate futures can be leveraged to transparently ensure safety and profitably exploit parallelism; in contrast to earlier efforts, our technique is completely portable and requires no modifications to the underlying JVM.
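A rough sketch of the execution model (the barrier placement and names are illustrative; in the paper the analysis inserts such barriers automatically, and a safe-futures runtime rather than java.util.concurrent enforces sequential semantics): the continuation waits at an inserted barrier only when it is about to touch state the future may still write.

    import java.util.concurrent.*;

    public class QuasiStaticSketch {
        static int shared = 0;

        public static void main(String[] args) throws Exception {
            ExecutorService pool = Executors.newSingleThreadExecutor();

            // Stands in for: future { shared = 42; } in the annotated program.
            Future<?> future = pool.submit(() -> { shared = 42; });

            // Continuation: work independent of 'shared' proceeds freely.
            long local = 0;
            for (int i = 0; i < 1_000_000; i++) local += i;

            // Inserted lightweight barrier: only because the next statement
            // reads 'shared' does the continuation wait for the future here,
            // preserving the sequential (future-erased) behavior.
            future.get();
            System.out.println(shared + " " + local);

            pool.shutdown();
        }
    }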
2007