Greg Bronevetsky

News

6/1/2008: Did my duty to supercomputing by serving on the program committee for Supercomputing 2008. It was fun and we finished surprisingly quickly (~1 day), despite having to go through something like 70 papers. And on the second day, Esther's Follies!

5/28/2008: We got back responses to our DOE letters of intent. I batted 1 out of 3, with DOE inviting us to submit a full proposal for "Snooper Threads: Gaining Visibility Into Multi-Core Applications For Debug And Performance Analysis".

5/13/2008: Presented my paper "CLOMP: Accurately Characterizing OpenMP Application Overheads" at the International Workshop on OpenMP (IWOMP) at Purdue University.

5/12/2008: Submitted three letters of intent to the Department of Energy ASCR call for proposals for tools for petascale computing:

  • "Measuring Software Quality Via Bug Injection" focuses on using bug injection techniques to estimate the true and false negative rates (i.e. the probability of undetected bugs) of application test suites and debugging tools, which is useful for refining these suites and for improving future releases of the tools.
  • "Lightweight And Extensible Correctness Checkers For Explicitly Parallel Programs" will look at developing a variety of static compiler analyses to detect bugs in parallel applications. Furthermore, the analyses will be used to improve the performance and accuracy of dynamic bug detection tools. My work will focus on compiler techniques to detect the communication topology of parallel applications.
  • "Snooper Threads: Gaining Visibility Into Multi-Core Applications For Debug And Performance Analysis" will use extra cores on future many-core processors to monitor the operation of the application on a chip's primary cores. These "snooper threads" will be used to help with application debugging, to dynamically identify and correct load-balancing problems, and to predict future application actions.

5/10/2008: I'm working on an LLNL Strategic Initiative proposal on techniques for ensuring scalable performance and usability of petascale applications. This includes many of the ideas in my fault tolerance LDRD proposal as well as other things relating to debugging, scalable solvers and online application monitoring.

5/9/2008: I submitted a full proposal to the LLNL Laboratory Directed Research and Development (LDRD) program entitled "Scalable Fault Tolerance for Petascale Systems", focused on developing an infrastructure that enables applications to become fault-tolerant. This includes things like high-performance checkpoint storage and coordination as well as support for application-specific error detection and correction techniques. I will present the proposal in late June.

5/8/2008: EuroPVM has accepted "On the Performance of Transparent MPI Piggyback Messages", which I wrote in collaboration with Martin Schulz and Bronis de Supinski!

4/14/2008: I submitted a paper to Supercomputing 2008 with Adam Moody, entitled "Scalable I/O Systems via Node-Local Storage: Approaching 1 TB/sec File I/O". It makes the case that modern supercomputers are designed with poor bandwidth to the storage system, which results in poor checkpointing performance, among other things. The paper argues that we could get much better bandwidth if our machines had more node-local storage. It supports this argument with experimental performance numbers from a checkpoint storage library that reaches 1 TB/sec aggregate storage bandwidth by using node-local storage.

4/12/2008: I submitted a paper to EuroPVM with Martin Schulz and Bronis de Supinski, entitled "On the Performance of Transparent MPI Piggyback Messages". We argue that piggybacking is a critical capability for MPI tools and one that cannot be implemented efficiently using the mechanisms available in the current MPI standard. We're currently trying to add explicit support for piggybacking to the MPI 3 standard.

4/11/2008: My Fault Tolerance LDRD proposal was selected by the Computation Directorate to be submitted to the lab's LDRD committee.

3/31/2008: I am officially organizing the Workshop on the Analysis of System Logs (WASL) at this year's OSDI. The workshop website is up and I am currently assembling the program committee. Now I just need some sponsorship and we're set!

3/24/2008: Out of my two ICS submissions I got one accept and one reject (~25% conference acceptance rate). The acceptance was "Soft Error Vulnerability of Iterative Linear Algebra Methods", which is the first paper in a 6-paper series on analyzing the fault vulnerability of applications, with paper 2 scheduled for Supercomputing (Rolling Forwards!). The rejection was "Compiler-Enhanced Incremental Checkpointing for OpenMP Applications", which actually got good reviews, but the reviewers were confused by the prior LCPC version of the paper and the PPoPP poster and thought we were republishing. Oh well, it happens!

3/20/2008: Presented one of the proposals (Fault Tolerance for Petascale Systems) to the Computation Directorate pre-proposal committee. Seemed to go pretty well.

3/14/2008: Working on two funding proposals for LLNL's LDRD (Laboratory Directed Research and Development) program. One is on scalable checkpointing techniques, while the other focuses on using extra cores on many-core processors to monitor the primary computation.

2/23/2008: Presented the poster "Compiler-Enhanced Incremental Checkpointing for OpenMP Applications" at PPoPP 2008.

1/22/2008: Submitted "Soft Error Vulnerability of Iterative Linear Algebra Methods" and "Compiler-Enhanced Incremental Checkpointing for OpenMP Applications" to ICS 2008.