Final Report: GSoC '24 - Integrating Trixi.jl with Enzyme.jl

The goal of this GSoC project was to integrate Trixi.jl with compiler-based automatic differentiation via Enzyme.jl.

Project Overview

The core idea was to bring together two powerful Julia packages: Trixi.jl, a CFD framework for conservation laws, and Enzyme.jl, which performs automatic differentiation (AD) at the compiler level. This combination is attractive for several reasons: we can differentiate through complex numerical simulations efficiently, use both forward and reverse mode AD, and, most importantly, handle the mutation and caching that Trixi.jl relies on for performance. This work was undertaken as part of the Google Summer of Code 2024 program, and the progress is summarized below.

Please note that some aspects of the GPU integration remain in progress and will be completed in future work.

How to Set Up

CPU Version

To install the package, run the following commands in the Julia REPL:

]  # enter Pkg mode
(@v1.10) pkg> add https://github.com/junyixu/TrixiEnzyme.jl.git

Then simply run the following command to use the package:

using TrixiEnzyme

GPU Version

For GPU support, you'll need additional setup steps. Please refer to the detailed setup guide from GSoC 2023's GPU implementation: GPU Setup Guide.

Key Highlights

API Overview

Here are the main APIs we developed:

- CPU Differentiation
- GPU Differentiation

What We Achieved

One of our major accomplishments was implementing efficient automatic differentiation for DGSEM (the Discontinuous Galerkin Spectral Element Method). We implemented both forward and reverse mode: forward mode builds the Jacobian column by column, while reverse mode builds it row by row. Both compute the same Jacobian, so our implementation lets users choose whichever mode fits their workflow, as the sketch below shows.
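To give a feel for the user-facing side, here is a minimal sketch of computing a Jacobian with the package. The setup is the standard Trixi.jl workflow; the function names jacobian_enzyme_forward and jacobian_enzyme_reverse mirror Trixi.jl's jacobian_ad_forward convention, and their exact signatures should be treated as an assumption rather than a verbatim API reference.

using Trixi, TrixiEnzyme

# Standard Trixi.jl setup: 1D linear advection discretized with DGSEM
equations = LinearScalarAdvectionEquation1D(1.0)
solver = DGSEM(polydeg = 3, surface_flux = flux_lax_friedrichs)
mesh = TreeMesh(-1.0, 1.0; initial_refinement_level = 4, n_cells_max = 10^4)
semi = SemidiscretizationHyperbolic(mesh, equations,
                                    initial_condition_convergence_test, solver)

# Forward mode builds the Jacobian column by column,
# reverse mode row by row; the two results should agree
J_fwd = jacobian_enzyme_forward(semi)
J_rev = jacobian_enzyme_reverse(semi)
J_fwd ≈ J_rev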

Performance optimization was a big focus. We developed batching strategies that propagate several derivative directions per Enzyme call, which is what makes differentiating through Trixi's complex caching system efficient. Automatic batch size selection turned out to be crucial for balancing memory usage against computation speed; the sketch below shows the core mechanism.
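To make the batching idea concrete, here is a self-contained sketch of computing Jacobian columns in batches with Enzyme.BatchDuplicated. The names f! and jacobian_batched are illustrative; the package's actual batching code additionally threads Trixi's cache through the calls.

using Enzyme

# Simple mutating test function: y = x.^2, so the Jacobian is Diagonal(2x)
f!(y, x) = (y .= x .^ 2; nothing)

function jacobian_batched(f!, y, x; batch_size = 4)
    n, m = length(x), length(y)
    J = zeros(m, n)
    seeds = Enzyme.onehot(x)  # tuple of n onehot seed vectors
    for first in 1:batch_size:n
        idx = first:min(first + batch_size - 1, n)
        dx = seeds[idx]  # one seed per Jacobian column in this batch
        dy = ntuple(_ -> zero(y), length(idx))
        Enzyme.autodiff(Forward, f!,
                        BatchDuplicated(y, dy),
                        BatchDuplicated(x, dx))
        for (k, j) in enumerate(idx)
            J[:, j] .= dy[k]  # each shadow holds one column of J
        end
    end
    return J
end

x = rand(6); y = zeros(6)
jacobian_batched(f!, y, x)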

In the Enzyme4CUDA branch, we made significant progress on GPU support.

However, we encountered some technical challenges with GPU integration. Loading CUDA.jl together with Enzyme.jl currently triggers a circular dependency error during precompilation; this affects Julia 1.10.7, and a fix is expected in Julia 1.10.8. For now, avoiding module wrappers around CUDA + Enzyme test code works as a workaround (Issue #5), as sketched below.
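The workaround itself is simple but easy to miss: the test code runs at the top level rather than inside a module wrapper. The kernel below is adapted from the CUDA example in Enzyme.jl's documentation; note that the exact autodiff_deferred signature varies across Enzyme versions.

# On Julia 1.10.7, wrapping this in `module SomeTests ... end` can trigger
# the circular dependency error during precompilation, so the test code
# runs at the top level instead of inside a module wrapper
using CUDA, Enzyme, Test

# Differentiate a GPU kernel by calling autodiff_deferred inside a kernel
function mul_kernel(A)
    i = threadIdx().x
    if i <= length(A)
        A[i] *= A[i]
    end
    return nothing
end

function grad_mul_kernel(A, dA)
    Enzyme.autodiff_deferred(Reverse, mul_kernel, Const, Duplicated(A, dA))
    return nothing
end

A = CUDA.ones(64)
dA = similar(A); dA .= 1
@cuda threads = length(A) grad_mul_kernel(A, dA)
@test Array(dA) ≈ 2 .* ones(64)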

The Technical Nitty-Gritty

Here are some key technical insights we gained:

Enzyme Integration

When working with Enzyme.autodiff, naming conventions are crucial. We prefix the wrapper functions and the variables unpacked from semi.cache with enzyme_, which keeps the unpacking of semi.cache explicit and makes the interaction with Enzyme's APIs predictable. The pattern looks roughly like the following sketch.
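This sketch is illustrative only: enzyme_rhs! and the unpacked field names are hypothetical stand-ins for the package's actual wrappers.

using Enzyme

# Hypothetical wrapper: the cache fields arrive as plain arguments with
# the enzyme_ prefix instead of hiding inside semi.cache
function enzyme_rhs!(du, u, enzyme_elements, enzyme_interfaces, t)
    du .= -u  # placeholder for the real rhs! computation
    return nothing
end

function enzyme_jvp!(du, ddu, u, du_seed, cache, t)
    # unpack semi.cache once, outside of autodiff
    (; elements, interfaces) = cache
    Enzyme.autodiff(Forward, enzyme_rhs!,
                    Duplicated(du, ddu), Duplicated(u, du_seed),
                    Const(elements), Const(interfaces), Const(t))
end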

Forward vs Reverse Mode Implementation

The core difference between forward and reverse mode in Enzyme.jl comes down to whether you seed dx or dy with a onehot vector: seeding the input shadow dx in forward mode extracts one Jacobian column per evaluation, while seeding the output shadow dy in reverse mode extracts one row.
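Here is a tiny self-contained sketch of that difference; the function f! is illustrative, not Trixi code.

using Enzyme

f!(y, x) = (y[1] = x[1] * x[2]; y[2] = x[2]^2; nothing)

x = [3.0, 2.0]
y = zeros(2)

# Forward mode: a onehot seed on dx extracts one Jacobian column
dx = [1.0, 0.0]  # direction e1 in input space
dy = zeros(2)
Enzyme.autodiff(Forward, f!, Duplicated(y, dy), Duplicated(x, dx))
dy  # ≈ first column of J: [2.0, 0.0]

# Reverse mode: a onehot seed on dy extracts one Jacobian row
dy = [1.0, 0.0]  # covector e1 in output space
dx = zeros(2)
Enzyme.autodiff(Reverse, f!, Duplicated(y, dy), Duplicated(x, dx))
dx  # ≈ first row of J: [2.0, 3.0]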

Performance Characteristics

We observed several interesting performance patterns while benchmarking forward mode against reverse mode; measurements were taken along the lines of the sketch below.
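For reference, a measurement setup might look roughly like this, reusing the semi object and the assumed function names from the setup sketch above.

using BenchmarkTools

# Compare the two modes on the same semidiscretization
@benchmark jacobian_enzyme_forward($semi)
@benchmark jacobian_enzyme_reverse($semi)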

GPU Implementation Details

Our GPU implementation required careful attention to several low-level details.

Future Work

Some aspects of the GPU integration remain in progress. The main item of future work is completing that integration once the CUDA.jl + Enzyme.jl precompilation issue is resolved, which is expected with Julia 1.10.8.

Acknowledgements

This project was made possible through the support and guidance of many incredible people in the Julia community. My mentors played crucial roles throughout the project - Michael Schlottke-Lakemper (@sloede) spent numerous video calls helping me debug issues and guided me in seeking help on Slack, while Hendrik Ranocha (@ranocha) provided invaluable insights into type stability issues that significantly improved our implementation.

William Moses (@wsmoses) from the Enzyme.jl team deserves special thanks for his documentation examples and responsive support through both Slack discussions and GitHub issues. His work on Enzyme.jl has been foundational to this project.

I'm also grateful to Huiyu Xie (@huiyuxie) for her technical support regarding GPU implementation. Her expertise with CUDA.jl integration proved invaluable as we worked to extend TrixiEnzyme's capabilities to GPU computing.

The project also received helpful feedback from Benedict of the Trixi Framework community.

Finally, the newer versions of Enzyme.jl, with their much-improved error messages, made debugging far more manageable!