STCLang: State Thread Composition as a Foundation for Monadic Dataflow Parallelism
Sebastian Ertel, Justus Adam, Norman A. Rink, Andrés Goens, Jeronimo Castrillon
Proceedings of the 12th ACM SIGPLAN International Symposium on Haskell
Dataflow execution models are used to build highly scalable parallel systems. A programming model that targets parallel dataflow execution must answer the following question: How can parallelism between two dependent nodes in a dataflow graph be exploited? This is difficult when the dataflow language or programming model is implemented by a monad, as is common in the functional community, since expressing dependence between nodes by a monadic bind suggests sequential execution.
Even in monadic constructs that explicitly separate state from computation, problems arise due to the need to reason about opaquely defined state. Specifically, when abstractions of the chosen programming model do not enable adequate reasoning about state, it is difficult to detect parallelism between composed stateful computations.
In this paper, we propose a programming model that enables the composition
of stateful computations and still exposes opportunities for
parallelization. We also introduce
smap, a higher-order function that can
exploit parallelism in stateful computations. We present an implementation
of our programming model and
smap in Haskell and show that basic concepts
from functional reactive programming can be built on top of our programming
model with little effort. We compare these implementations to a
state-of-the-art approach using monad-par and LVars to expose parallelism
explicitly and reach the same level of performance, showing that our
programming model successfully extracts parallelism that is present in an
algorithm. Further evaluation shows that
smap is expressive enough to
implement parallel reductions and our programming model resolves
short-comings of the stream-based programming model for current
state-of-the-art big data processing systems.
Supporting Fine-grained Dataflow Parallelism in Big Data Systems.
Sebastian Ertel, Justus Adam, Jeronimo Castrillon
Big data systems scale with the number of cores in a cluster for the parts of an application that can be executed in data parallel fashion. It has been recently reported, however, that these systems fail to translate hardware improvements, such as increased network bandwidth, into a higher throughput. This is particularly the case for applications that have inherent sequential, computationally intensive phases. In this paper, we analyze the data processing cores of state-of-the-art big data systems to find the cause for these scalability problems. We identify design patterns in the code that are suitable for pipeline and task-level parallelism, potentially increasing application performance. As a proof of concept, we rewrite parts of the Hadoop MapReduce framework in an implicit parallel language that exploits this parallelism without adding code complexity. Our experiments on a data analytics workload show throughput speedups of up to 3.5x.
Compiling for concise code and efficient I/O
Sebastian Ertel, Andrés Goens, Justus Adam, Jeronimo Castrillon
Proceedings of the 27th International Conference on Compiler Construction
Large infrastructures of Internet companies, such as Facebook and Twitter, are composed of several layers of micro-services. While this modularity provides scalability to the system, the I/O associated with each service request strongly impacts its performance. In this context, writing concise programs which execute I/O efficiently is especially challenging. In this paper, we introduce Ÿauhau, a novel compile-time solution. Ÿauhau reduces the number of I/O calls through rewrites on a simple expression language. To execute I/O concurrently, it lowers the expression language to a dataflow representation. Our approach can be used alongside an existing programming language, permitting the use of legacy code. We describe an implementation in the JVM and use it to evaluate our approach. Experiments show that Ÿauhau can significantly improve I/O, both in terms of the number of I/O calls and concurrent execution. Ÿauhau outperforms state-of-the-art approaches with similar goals.
Control Flow and Side Effects support in a Framework for automatic I/O batching
Bachelors Thesis Justus Adam
Technische Universität Dresden
The largest source of latency for many modern systems, particularly services, is the inherent latency of I/O operations. Disk I/O, database access and network bound communication is frequent, especially in data processing systems and distributed service infrastructures. Making I/O efficient is cumbersome, involves asynchronous programming and caching and hinders modularity when attempting to perform batch jobs. There have been a few solutions proposed to automate this, first and foremost Haxl, a Haskell library. We proposed an alternate, improved implementation called Ÿauhau. It leverages dataflow to produce even better performance while being independent from any particular code writing style in a minimal Embedded Domain-Specific Language (EDSL). In this thesis we first explore the techniques used by Ÿauhau to make I/O more efficient. Secondly we identify, explain and solve issues in Ÿauhau arising from control flow. Furthermore we suggest new optimisations for the compiler and implement safety guards for handling side effects.
Ohua: Implicit Dataflow Programming for Concurrent Systems
Sebastian Ertel, Christoph Fetzer, Pascal Felber
Proceedings of the Principles and Practices of Programming on The Java Platform
Concurrent programming has always been a challenging task best left to expert developers. Yet, with the advent of multi-core systems, programs have to explicitly deal with multithreading to fully exploit the parallel processing capabilities of the underlying hardware. There has been much research over the last decade on abstractions to develop concurrent code that is both safe and efficient, e.g., using message passing, transactional memory, or event-based programming. In this paper, we focus on the dataflow approach as a way to design scalable concurrent applications. We propose a new dataflow engine and programming framework, named Ohua, that supports implicit parallelism. Ohua marries object-oriented and functional languages: functionality developed in Java can be composed with algorithms in Clojure. This allows us to use different programming styles for the tasks each language is best adapted for. The actual dataflow graphs are automatically derived from the Clojure code. We show that Ohua is not only powerful and easy to use for the programmer, but also produces applications that scale remarkably well: comparative evaluation indicates that a simple web server developed with Ohua outperforms the highly-optimized Jetty server in terms of throughput while being competitive in terms of latency. We also evaluate the impact on energy consumption to validate previous studies indicating that dataflow and message passing can be more energy-efficient than concurrency control based on shared-memory synchronization.