You may choose whatever storage duration you like for your own types, but built-in types are declared globally and are therefore shared between VM instances.
I see, so it refers to the C storage class. There can be several virtual machine instances referencing the same built-in integer type instance. This imposes on users the need to call lg_init() and lg_deinit() before and after using the library. I feel like this could have been avoided by statically enumerating the built-in types instead of initializing a data structure dynamically.
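To make the static alternative concrete, here is a minimal sketch. Every identifier in it (struct type, struct value, int_type, builtin_types, builtin_lookup, make_int) is hypothetical and not taken from the library. It also assumes that lg_init() mainly exists to fill in the built-in type structures at run time, which I have not verified.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout, not the library's real definitions. */
struct value;

struct type {
    const char *name;
    void (*add)(struct value *dst, const struct value *src);
    void (*sub)(struct value *dst, const struct value *src);
};

struct value {
    const struct type *type;
    union { long as_int; } as;
};

static void int_add(struct value *dst, const struct value *src)
{
    dst->as.as_int += src->as.as_int;
}

static void int_sub(struct value *dst, const struct value *src)
{
    dst->as.as_int -= src->as.as_int;
}

/* Static storage duration plus a constant initializer: the object is
 * fully set up before main() runs, so no init/deinit call is needed. */
static const struct type int_type = {
    .name = "Int",
    .add = int_add,
    .sub = int_sub,
};

/* Statically enumerated table of built-in types. */
enum builtin { BUILTIN_INT, BUILTIN_COUNT };

static const struct type *const builtin_types[BUILTIN_COUNT] = {
    [BUILTIN_INT] = &int_type,
};

static const struct type *builtin_lookup(enum builtin id)
{
    assert((unsigned)id < BUILTIN_COUNT);
    return builtin_types[id];
}

static struct value make_int(long v)
{
    struct value r = { .type = &int_type, .as.as_int = v };
    return r;
}
```

Because the table is const and never mutated, several VM instances could share it without any setup or teardown step.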
The data type structure contains pointers to functions that implement operations such as addition, subtraction, copying and cloning. An integer type instance is initialized with addition and subtraction functions. The integer value representation is part of the value data structure though. So how could a user of the library define new data types? It seems like it would be necessary to modify the library's source code in order to add new members to the tagged union.
Also, perhaps the virtual machine could be optimized further by using tagged pointers to make integer values immediate, avoiding the need to dereference the pointer.
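For anyone unfamiliar with the trick, a minimal sketch of tagged pointers follows. It assumes heap objects are at least 2-byte aligned, so the lowest bit of a real pointer is always 0 and can be used to mark immediates. All names (word, box_int, unbox_int, box_ptr, unbox_ptr, add_ints) are made up for illustration and have nothing to do with the library's actual representation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One machine word: either an immediate integer or an aligned pointer. */
typedef uintptr_t word;

#define TAG_MASK ((word)1)
#define TAG_INT  ((word)1)   /* low bit 1 = immediate int, 0 = pointer */

static bool is_int(word w)
{
    return (w & TAG_MASK) == TAG_INT;
}

/* Stores i in the upper bits; the top bit of i is lost, so the usable
 * range is one bit narrower than intptr_t. */
static word box_int(intptr_t i)
{
    return ((word)i << 1) | TAG_INT;
}

static intptr_t unbox_int(word w)
{
    assert(is_int(w));
    /* The unsigned-to-signed conversion is implementation-defined for
     * large values; this assumes the usual two's complement wraparound.
     * The tagged word is odd, so (x - 1) / 2 is exact and cannot overflow. */
    intptr_t x = (intptr_t)w;
    return (x - 1) / 2;
}

static word box_ptr(void *p)
{
    word w = (word)p;
    assert((w & TAG_MASK) == 0);   /* relies on the alignment assumption */
    return w;
}

static void *unbox_ptr(word w)
{
    assert(!is_int(w));
    return (void *)w;
}

/* Integer addition never touches memory: both operands live in the word. */
static word add_ints(word a, word b)
{
    return box_int(unbox_int(a) + unbox_int(b));
}
```

The cost is one bit of integer range and a tag check on every access, in exchange for not allocating or dereferencing anything for small integers.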
You would either have to reuse one of the existing representations or modify the union atm.
I'll add a void *as_any to the union eventually which means you're just one level of indirection from supporting any representation without touching the union.
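A rough sketch of how a user-defined type might look once the union has an as_any member. Only the name as_any comes from the comment above. The struct layouts, the deinit hook, and the point type are assumptions made up for the example.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical library side: a type is a table of operations. */
struct value;

struct type {
    const char *name;
    void (*add)(struct value *dst, const struct value *src);
    void (*deinit)(struct value *v);
};

struct value {
    const struct type *type;
    union {
        long as_int;
        void *as_any;   /* escape hatch for representations the union lacks */
    } as;
};

/* User side: a 2D point type defined without touching the union. */
struct point { double x, y; };

static void point_add(struct value *dst, const struct value *src)
{
    struct point *d = dst->as.as_any;
    const struct point *s = src->as.as_any;
    d->x += s->x;
    d->y += s->y;
}

static void point_deinit(struct value *v)
{
    free(v->as.as_any);
    v->as.as_any = NULL;
}

static const struct type point_type = {
    .name = "Point",
    .add = point_add,
    .deinit = point_deinit,
};

static struct value make_point(double x, double y)
{
    struct point *p = malloc(sizeof *p);
    assert(p != NULL);   /* sketch only: real code should report the failure */
    p->x = x;
    p->y = y;
    struct value v = { .type = &point_type, .as.as_any = p };
    return v;
}
```

The extra indirection means every point operation pays for a pointer dereference and the value needs a cleanup hook, which is the trade-off for not modifying the union.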
> ideas on how to improve its performance further without making a mess are most welcome.
One approach would be designing the input language for performance. In particular, having statically typed operations. A specialized iadd instruction for values that you are sure you want to treat as integers would save you a lot of indirecting through function pointers. A disadvantage of static typing is that you need to implement type checks if you want to guarantee well typedness.
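To illustrate the difference, here is a toy stack interpreter with both a generic add and a specialized iadd. The opcode names, struct layouts, and run() are all invented for this sketch, and it uses a plain switch rather than the computed goto the real VM uses.

```c
#include <assert.h>
#include <stddef.h>

struct value;

struct type {
    void (*add)(struct value *dst, const struct value *src);
};

struct value {
    const struct type *type;
    union { long as_int; } as;
};

static void int_add(struct value *dst, const struct value *src)
{
    dst->as.as_int += src->as.as_int;
}

static const struct type int_type = { .add = int_add };

enum op { OP_PUSH_INT, OP_ADD, OP_IADD, OP_HALT };

struct insn {
    enum op op;
    long arg;   /* only used by OP_PUSH_INT */
};

#define STACK_MAX 16

/* Runs code until OP_HALT and returns the integer on top of the stack. */
static long run(const struct insn *code)
{
    struct value stack[STACK_MAX];
    size_t sp = 0;

    for (const struct insn *pc = code;; pc++) {
        switch (pc->op) {
        case OP_PUSH_INT:
            assert(sp < STACK_MAX);
            stack[sp].type = &int_type;
            stack[sp].as.as_int = pc->arg;
            sp++;
            break;
        case OP_ADD:
            /* Generic: dispatches through the type's function pointer. */
            assert(sp >= 2);
            sp--;
            stack[sp - 1].type->add(&stack[sp - 1], &stack[sp]);
            break;
        case OP_IADD:
            /* Specialized: no indirection, but it trusts that both
             * operands are integers. A type checker would have to
             * guarantee that before the code is run. */
            assert(sp >= 2);
            sp--;
            stack[sp - 1].as.as_int += stack[sp].as.as_int;
            break;
        case OP_HALT:
            assert(sp >= 1);
            return stack[sp - 1].as.as_int;
        }
    }
}
```

Both instruction sequences compute the same result. The iadd path just skips the indirect call, which is where the saving would come from in a hot loop.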
Another (orthogonal) approach would be to consider a JIT backend. Not one you write yourself, that could definitely be considered "making a mess". But in the past I've had success using LibJIT (https://www.gnu.org/software/libjit/) for speeding up a stack interpreter. In that case, it was a subset of Python bytecode (see https://github.com/gergo-/pylibjit, the code has very probably bitrotted).
This piqued my interest. I’m a hobbyist C programmer so forgive me if this is a rudimentary question. What is the rationale/convention for naming a variable “_”? I’ve never seen this before.
An underscore is primarily used as an identifier to denote a variable/parameter whose name does not really matter. It's more commonly used when you're not planning on using the aforementioned variable, but it's also sometimes used when the identifier is used but its name doesn't matter.
One such instance is the abbreviated Scala lambda syntax, where
(x:Int) => x + 2
can be abbreviated to
_ + 2
in the same way that Kotlin would allow abbreviating it as
{ it + 2 }
with its equivalent default lambda parameter name, "it".
In the example you quoted, the identifier denotes the sole parameter, so in a sense its name does not matter, and as such people from certain programming circles might be inclined to use an underscore instead of taking the time to come up with an appropriate name. It's not like a more descriptive name would help in that example, the type name and function name already give sufficient context for it to be perfectly clear what the parameter is for, and it's not like parameter names have any semantic significance in C.
> It's not like a more descriptive name would help in that example, the type name and function name already give sufficient context for it to be perfectly clear what the parameter is for, and it's not like parameter names have any semantic significance in C.
I agree, and I’ve been thinking about this today. There is also a minimalistic quality to this style, the _ pointer is more prominent simply by not having a real name. I like it!
I think this is borrowing a convention from Perl, where the variable $_ is often given an implicit value of "the thing I'm talking about" when you didn't bother to give a real variable name for it.
I've seen variables named _ when you don't use them (in Rust or Go, for example, that is how you avoid an error for not using a variable after declaring it), but in this context I find it weird.
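Section 7.1.3 of the ISO C standard reserves identifiers that begin with an underscore for the implementation (the compiler and system libraries), to avoid the risk of conflict with names in user programs. Names starting with an underscore followed by an uppercase letter or a second underscore are reserved for any use; all other names starting with an underscore are reserved at file scope.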
I checked and indeed... lg_buf is a global variable.
But I've never seen anyone use just the underscore prefix without a name before.
Funny, I actually have some fib benchmarks for my RISC-V emulator! It uses fib(40), but I added one for 20. Interestingly, that is the one benchmark where LuaJIT crushes my emulator, so I'm still working on beating it, but I don't really have a plan. :)
libriscv: fib(20) median 317ns lowest: 310ns highest: 356ns
luajit: fib(20) median 146ns lowest: 145ns highest: 170ns
lua5.4: fib(20) median 631ns lowest: 598ns highest: 694ns
Running your emulator:
$ ./fibrec
567us
Modern compilers used with emulated machines can beat even v8 at times.
Cool project! I divided your number by 100, is that correct? Is there some additional overhead in setting up / tearing down something?
You might want to peruse the C sources for GForth which has been under continuous development for 20 years or so. It introduced a concept called super-instructions that speeds things up quite a bit. I am not an expert on the internals, just a casual user.
> The core loop uses computed goto, which means that new instructions must be added in identical order
> Values are represented as tagged unions.
> Fundamental types are global (as in not tied to a specific VM instance)
What exactly is a global fundamental type? Is there a local counterpart?