Building a JIT: Adding Optimizations -- An introduction to ORC Layers
=====================================================================

::: {.contents}
:::

**This tutorial is under active development. It is incomplete and
details may change frequently.** Nonetheless we invite you to try it out
as it stands, and we welcome any feedback.

Chapter 2 Introduction
----------------------

**Warning: This tutorial is currently being updated to account for ORC
API changes. Only Chapters 1 and 2 are up-to-date.**

**Example code from Chapters 3 to 5 will compile and run, but has not
been updated.**

Welcome to Chapter 2 of the "Building an ORC-based JIT in LLVM"
tutorial. In [Chapter 1](BuildingAJIT1.html) of this series we examined
a basic JIT class, KaleidoscopeJIT, that could take LLVM IR modules as
input and produce executable code in memory. KaleidoscopeJIT was able to
do this with relatively little code by composing two off-the-shelf *ORC
layers*, IRCompileLayer and RTDyldObjectLinkingLayer, to do much of the
heavy lifting.

In this chapter we'll learn more about the ORC layer concept by using a
new layer, IRTransformLayer, to add IR optimization support to
KaleidoscopeJIT.

Optimizing Modules using the IRTransformLayer
---------------------------------------------

In [Chapter 4](LangImpl04.html) of the "Implementing a language with
LLVM" tutorial series the llvm *FunctionPassManager* is introduced as a
means for optimizing LLVM IR. Interested readers may read that chapter
for details, but in short: to optimize a Module we create an
llvm::FunctionPassManager instance, configure it with a set of
optimizations, then run the PassManager on a Module to mutate it into a
(hopefully) more optimized but semantically equivalent form. In the
original tutorial series the FunctionPassManager was created outside the
KaleidoscopeJIT and modules were optimized before being added to it. In
this chapter we will make optimization a phase of our JIT instead. For
now this will provide us with a motivation to learn more about ORC
layers, but in the long term making optimization part of our JIT will
yield an important benefit: when we begin lazily compiling code
(i.e. deferring compilation of each function until the first time it is
run), having optimization managed by our JIT will allow us to optimize
lazily too, rather than having to do all our optimization up-front.

To add optimization support to our JIT we will take the KaleidoscopeJIT
from Chapter 1 and compose an ORC *IRTransformLayer* on top. We will
look at how the IRTransformLayer works in more detail below, but the
interface is simple: the constructor for this layer takes a reference to
the execution session and the layer below (as all layers do) plus an *IR
optimization function* that it will apply to each Module that is added
via addModule:

``` c++
class KaleidoscopeJIT {
private:
  ExecutionSession ES;
  RTDyldObjectLinkingLayer ObjectLayer;
  IRCompileLayer CompileLayer;
  IRTransformLayer TransformLayer;

  DataLayout DL;
  MangleAndInterner Mangle;
  ThreadSafeContext Ctx;

public:

  KaleidoscopeJIT(JITTargetMachineBuilder JTMB, DataLayout DL)
      : ObjectLayer(ES,
                    []() { return std::make_unique<SectionMemoryManager>(); }),
        CompileLayer(ES, ObjectLayer, ConcurrentIRCompiler(std::move(JTMB))),
        TransformLayer(ES, CompileLayer, optimizeModule),
        DL(std::move(DL)), Mangle(ES, this->DL),
        Ctx(std::make_unique<LLVMContext>()) {
    ES.getMainJITDylib().setGenerator(
        cantFail(DynamicLibrarySearchGenerator::GetForCurrentProcess(DL)));
  }
```

Our extended KaleidoscopeJIT class starts out the same as it did in
Chapter 1, but after the CompileLayer we introduce a new member,
TransformLayer, which sits on top of our CompileLayer. We initialize our
TransformLayer with a reference to the ExecutionSession and output layer
(standard practice for layers), along with a *transform function*. For
our transform function we supply our class's optimizeModule static
method.

Next we need to update our addModule method to replace the call to
`CompileLayer::add` with a call to `TransformLayer::add` instead:

``` c++
// ...
Error addModule(std::unique_ptr<Module> M) {
  return TransformLayer.add(ES.getMainJITDylib(),
                            ThreadSafeModule(std::move(M), Ctx));
}
// ...
```

``` c++
static Expected<ThreadSafeModule>
optimizeModule(ThreadSafeModule M, const MaterializationResponsibility &R) {
  // Create a function pass manager.
  auto FPM = std::make_unique<legacy::FunctionPassManager>(M.getModule());

  // Add some optimizations.
  FPM->add(createInstructionCombiningPass());
  FPM->add(createReassociatePass());
  FPM->add(createGVNPass());
  FPM->add(createCFGSimplificationPass());
  FPM->doInitialization();

  // Run the optimizations over all functions in the module being added to
  // the JIT.
  for (auto &F : *M.getModule())
    FPM->run(F);

  return M;
}
```

At the bottom of our JIT we add a private method to do the actual
optimization: *optimizeModule*. This function takes the module to be
transformed as input (as a ThreadSafeModule) along with a reference to a
new class: `MaterializationResponsibility`. The
MaterializationResponsibility argument can be used to query JIT state
for the module being transformed, such as the set of definitions in the
module that JIT'd code is actively trying to call/access. For now we
will ignore this argument and use a standard optimization pipeline. To
do this we set up a FunctionPassManager, add some passes to it, run it
over every function in the module, and then return the mutated module.
The specific optimizations are the same ones used in [Chapter
4](LangImpl04.html) of the "Implementing a language with LLVM"
tutorial series. Readers may visit that chapter for a more in-depth
discussion of these, and of IR optimization in general.

And that's it in terms of changes to KaleidoscopeJIT: when a module is
added via addModule the TransformLayer will call our optimizeModule
function before passing the transformed module on to the CompileLayer
below. Of course, we could have called optimizeModule directly in our
addModule function and not gone to the bother of using the
IRTransformLayer, but doing so gives us another opportunity to see how
layers compose. It also provides a neat entry point to the *layer*
concept itself, because IRTransformLayer is one of the simplest layers
that can be implemented.

``` c++
// From IRTransformLayer.h:
class IRTransformLayer : public IRLayer {
public:
  using TransformFunction = std::function<Expected<ThreadSafeModule>(
      ThreadSafeModule, const MaterializationResponsibility &R)>;

  IRTransformLayer(ExecutionSession &ES, IRLayer &BaseLayer,
                   TransformFunction Transform = identityTransform);

  void setTransform(TransformFunction Transform) {
    this->Transform = std::move(Transform);
  }

  static ThreadSafeModule
  identityTransform(ThreadSafeModule TSM,
                    const MaterializationResponsibility &R) {
    return TSM;
  }

  void emit(MaterializationResponsibility R, ThreadSafeModule TSM) override;

private:
  IRLayer &BaseLayer;
  TransformFunction Transform;
};

// From IRTransformLayer.cpp:

IRTransformLayer::IRTransformLayer(ExecutionSession &ES,
                                   IRLayer &BaseLayer,
                                   TransformFunction Transform)
    : IRLayer(ES), BaseLayer(BaseLayer), Transform(std::move(Transform)) {}

void IRTransformLayer::emit(MaterializationResponsibility R,
                            ThreadSafeModule TSM) {
  assert(TSM.getModule() && "Module must not be null");

  if (auto TransformedTSM = Transform(std::move(TSM), R))
    BaseLayer.emit(std::move(R), std::move(*TransformedTSM));
  else {
    R.failMaterialization();
    getExecutionSession().reportError(TransformedTSM.takeError());
  }
}
```

This is the whole definition of IRTransformLayer, from
`llvm/include/llvm/ExecutionEngine/Orc/IRTransformLayer.h` and
`llvm/lib/ExecutionEngine/Orc/IRTransformLayer.cpp`. This class is
concerned with two very simple jobs: (1) running every IR Module that is
emitted via this layer through the transform function object, and (2)
implementing the ORC `IRLayer` interface (which itself conforms to the
general ORC Layer concept; more on that below). Most of the class is
straightforward: a typedef for the transform function, a constructor to
initialize the members, a setter for the transform function value, and a
default no-op transform. The most important method is `emit`, as this is
half of our IRLayer interface. The emit method applies our transform to
each module that it is called on and, if the transform succeeds, passes
the transformed module to the base layer. If the transform fails, our
emit function calls `MaterializationResponsibility::failMaterialization`
(this lets JIT clients who may be waiting on other threads know that the
code they were waiting for has failed to compile) and logs the error
with the execution session before bailing out.

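The success-or-failure branching in `emit` follows the usual `llvm::Expected<T>` idiom: the result converts to true when it holds a value, and the error path must be handled otherwise. As a rough, self-contained sketch of that control flow (using a simplified stand-in type and the hypothetical names `toyTransform`/`emitLike`, not the real LLVM classes):

``` cpp
#include <cassert>
#include <optional>
#include <string>

// Toy stand-in for llvm::Expected<T>: holds either a value or an error
// string. The real class carries an llvm::Error that must be consumed.
struct ExpectedModule {
  std::optional<std::string> Value; // the (transformed) "module"
  std::string Err;                  // set only when Value is empty
  explicit operator bool() const { return Value.has_value(); }
};

// A transform that fails on empty input, mirroring a failing IR transform.
ExpectedModule toyTransform(const std::string &M) {
  if (M.empty())
    return {std::nullopt, "empty module"};
  return {M + " (optimized)", ""};
}

// Mirrors the shape of IRTransformLayer::emit: forward the transformed
// module on success; report the failure otherwise.
std::string emitLike(const std::string &M) {
  if (auto Transformed = toyTransform(M))
    return "emitted: " + *Transformed.Value; // BaseLayer.emit(...)
  else
    return "failed: " + Transformed.Err;     // failMaterialization + reportError
}
```

Note that declaring `Transformed` in the `if` condition keeps it in scope for the `else` branch, which is exactly why the real `emit` can call `takeError()` on the failed result.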
The other half of the IRLayer interface we inherit unmodified from the
IRLayer class:

``` c++
Error IRLayer::add(JITDylib &JD, ThreadSafeModule TSM, VModuleKey K) {
  return JD.define(std::make_unique<BasicIRLayerMaterializationUnit>(
      *this, std::move(K), std::move(TSM)));
}
```

This code, from `llvm/lib/ExecutionEngine/Orc/Layer.cpp`, adds a
ThreadSafeModule to a given JITDylib by wrapping it up in a
`MaterializationUnit` (in this case a
`BasicIRLayerMaterializationUnit`). Most layers that derive from
IRLayer can rely on this default implementation of the `add` method.

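The deferral that `add` sets up can be pictured with a small self-contained sketch (toy types and names, not the real ORC API): adding records a pending "materializer" keyed by the symbols the module defines, and the first lookup of any such symbol is what finally triggers compilation.

``` cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Toy dylib: symbol name -> pending materializer (a crude stand-in for a
// MaterializationUnit). The real ORC JITDylib is concurrent and far richer.
struct ToyDylib {
  std::map<std::string, std::function<std::string()>> Pending;
  std::map<std::string, std::string> Materialized;

  // Stand-in for IRLayer::add: record the module, deferring compilation.
  void add(const std::string &Sym, std::string ModuleIR) {
    Pending[Sym] = [ModuleIR] { return "compiled(" + ModuleIR + ")"; };
  }

  // Lookup: materialize on first use (the "emit" call), then cache.
  std::string lookup(const std::string &Sym) {
    auto It = Materialized.find(Sym);
    if (It != Materialized.end())
      return It->second;
    std::string Addr = Pending.at(Sym)(); // emit runs only now
    Pending.erase(Sym);
    Materialized[Sym] = Addr;
    return Addr;
  }
};
```

Until `lookup` is called nothing is compiled at all, which is the property the layer stack builds on when we move to lazy compilation in the next chapter.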
These two operations, `add` and `emit`, together constitute the layer
concept: a layer is a way to wrap a portion of a compiler pipeline (in
this case the "opt" phase of an LLVM compiler) whose API is opaque
to ORC in an interface that allows ORC to invoke it when needed. The add
method takes a module in some input program representation (in this
case an LLVM IR module) and stores it in the target JITDylib, arranging
for it to be passed back to the Layer's emit method when any symbol
defined by that module is requested. Layers can compose neatly by
calling the 'emit' method of a base layer to complete their work. For
example, in this tutorial our IRTransformLayer calls through to our
IRCompileLayer to compile the transformed IR, and our IRCompileLayer in
turn calls our ObjectLayer to link the object file produced by our
compiler.

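That composition can be sketched with a self-contained toy (the `Toy...` names are hypothetical; real layers pass ThreadSafeModules and MaterializationResponsibilities, not strings): each layer's `emit` does its one piece of work and then delegates to the base layer below it.

``` cpp
#include <cassert>
#include <string>

// Bottom of the stack: the "linker".
struct ToyObjectLayer {
  std::string emit(const std::string &Obj) { return "linked(" + Obj + ")"; }
};

// Middle: "codegen", then hand off to the linker below.
struct ToyCompileLayer {
  ToyObjectLayer &Base;
  std::string emit(const std::string &IR) {
    return Base.emit("obj(" + IR + ")");
  }
};

// Top: "optimize", then hand off to codegen below.
struct ToyTransformLayer {
  ToyCompileLayer &Base;
  std::string emit(const std::string &IR) {
    return Base.emit("opt(" + IR + ")");
  }
};
```

Calling `emit` on the top layer threads the module through optimize, codegen, and link in order, which is exactly the TransformLayer -> CompileLayer -> ObjectLayer flow described above.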
So far we have learned how to optimize and compile our LLVM IR, but we
have not focused on when compilation happens. Our current REPL is eager:
each function definition is optimized and compiled as soon as it is
referenced by any other code, regardless of whether it is ever called at
runtime. In the next chapter we will introduce fully lazy compilation,
in which functions are not compiled until they are first called at
run-time. At this point the trade-offs get much more interesting: the
lazier we are, the quicker we can start executing the first function,
but the more often we will have to pause to compile newly encountered
functions. If we only code-gen lazily, but optimize eagerly, we will
have a longer startup time (as everything is optimized) but relatively
short pauses as each function just passes through code-gen. If we both
optimize and code-gen lazily we can start executing the first function
more quickly, but we will have longer pauses as each function has to be
both optimized and code-gen'd when it is first executed. Things become
even more interesting if we consider interprocedural optimizations like
inlining, which must be performed eagerly. These are complex trade-offs,
and there is no one-size-fits-all solution to them, but by providing
composable layers we leave the decisions to the person implementing the
JIT, and make it easy for them to experiment with different
configurations.

[Next: Adding Per-function Lazy Compilation](BuildingAJIT3.html)

Full Code Listing
-----------------

Here is the complete code listing for our running example with an
IRTransformLayer added to enable optimization. To build this example,
use:

``` bash
# Compile
clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core orcjit native` -O3 -o toy
# Run
./toy
```

Here is the code:

::: {.literalinclude}
../../examples/Kaleidoscope/BuildingAJIT/Chapter2/KaleidoscopeJIT.h
:::