-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathNOTES
1189 lines (909 loc) · 39.1 KB
/
NOTES
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
These notes were, for the most part, written before the corresponding
features were implemented. So they're likely to be speculative and
inaccurate.
It's interesting (to me) to see the ideas become more refined.
----------------------------------
Note 1: Give up on saving a word on pairs. Just give them a normal
type record. Revisit later if you really want to.
Note 2: Unify the rdt and the mem_ops_t. Give them the same layout,
maybe with some fields unused. Then the heap copying code doesn't
have to special case record instances. Add an "I am not in the heap"
flag somewhere, so the copier can ignore out-of-heap rdts.
----------------------------------
2010.01.17
If we're going to use the C reader with Scheme I/O, we need a general
way for C to call Scheme. I think.
----------------------------------
2010.01.18
Time to refactor mem.h. First cut: three kinds of data -- whole heap
operations, things that are needed for obj definitions, and things
that are needed throughout the interpreter.
Whole heap stuff (heap.h?)
set_heap_size_bytes()
init_heap()
Obj definition stuff (mem.h?):
heap_object_t
OBJ_ALIGN
is_forward()
is_immediate()
is_special()
obj_heap_ptr()
obj_fwd_ptr()
heap_object_set_fwd_ptr()
mem ops
mem_ops_t definition
heap_object
obj_heap_object()
heap_obect_mem_ops()
obj_mem_ops()
is_primitive_obj()
is_record_instance()
CHECK_OBJ, check_obj
mem_alloc_obj
Global stuff (obj.h? object.h?):
obj_t
word_t
masks, tags, shifts, and bits
obj_bits()
macros to create chars, bools, and reader constants
FALSE_OBJ (obj_boolean.h?)
TRUE_OBJ (obj_boolean.h?)
EMPTY_LIST (obj_null.h?)
UNDEFINED (obj_undefined.h?)
END_OF_FILE (obj_???)
MEM_OPS_PRIMITIVE
READER_CONSTANT - rename to READER_ACTION?
is_heap()
is_null() (obj_null.h?)
is_undefined (obj_undefined.h)
mem_ops_t declaration
obj_type_name()
----------------------------------
2010.01.25
Work interferes.
Trying to decide what the next big step should be. I wanted to
implement continuations as records, which means I need records next.
Or I could just do continuations ad-hoc and move on to eval.
I've been thinking about the problem of calling Scheme from C. I
can't just call eval() from arbitrary places because (a) eval() uses a
static jmp_buf to recover from exceptions, and (b) I'm trying to have
a hard bound on C stack size.
So I reviewed the C coroutine literature.
http://en.wikipedia.org/wiki/Coroutine#Implementations_for_C
http://www.chiark.greenend.org.uk/~sgtatham/coroutines.html
Nothing there appears to be directly applicable. So for now, the rule
is, C does not call Scheme (except once at the top level).
So let's talk about eval. One question to answer is, what happens
when a procedure is applied to some arguments? The old Scheme
interpreter had some complex mumbo-jumbo where it built the arg list
in place as each arg was evaluated, and then the procedure was called.
I'm envisioning four registers.
CONT is the continuation register. It points to the current
eval's continuation. Think of it as a return PC.
TBE is to-be-evaluated. It points to a LIFO list of expressions
to evaluate.
VALUES points to a LIFO list of values of past evaluations.
ENV is the current environment.
Eval's basic operation is to pop one item off TBE, evaluate it,
push the result onto VALUES, and then jump to CONT.
For self-evaluating expressions, that's pretty easy.
def eval(expr):
expr = TBE.pop()
if expr.is_self_evaluating():
VALUES = cons(expr, VALUES)
elif ...
...
goto CONT
For symbols, it's just a lookup in the current environment.
...
elif expr.is_symbol():
VALUES = cons(ENV.lookup(expr), VALUES)
...
For applications (aka procedure calls), we need to push a continuation
to apply the operator to its args and then push more continuations to
evaluate each of the arguments. (Ignore special forms for now.)
...
elif expr.is_application():
CONT = make_continuation(code=apply,
arg=expr,
arg2=VALUES,
env=ENV,
cont=CONT)
for arg in expr.arguments():
CONT = make_continuation(code=eval,
arg=arg,
env=ENV,
cont=CONT)
...
Then apply looks something like this.
def apply(expr, eoargs):
# Reverse and null-terminate the argument list
args = nil
while VALUES != eoargs:
args = cons(car(VALUES), args)
VALUES = cdr(VALUES)
CONT = make_continuation(code=expr.operator(),
env=push_env(expr.arglist, args, ENV),
cont=CONT)
Something like that.
Meanwhile, there is a large number of primitives. It would be grand
if we had a simple ABI for simple primitives. Something like this.
obj_t my_primitive(obj_t arg1, obj_t arg2)
{
return <return value>;
}
Maybe wrap a macro around that to bind a symbol to the procedure.
DEFINE_SIMPLE_PROC("my-primitive", 2)(obj_t arg1, obj_t arg2)
{
return <return value>;
}
----------------------------------
2010.01.29
Alternative eval VM.
An "instruction" has a varying number of fields (operands). Two
fields are always included: proc is the C procedure that implements
the instruction, and cont is the address of the next instruction
(continuation).
An instruction manipulates the three global registers, VALUES, ENV,
and CONT. VALUES is a list of values of previously evaluated
expressions. ENV is the current environment. CONT points to the
next instruction.
A self-eval instruction evaluates a self-evaluating expression. It
has three fields. proc = eval_const, expr = the self-evaluated
expresion, and cont = whatever. The self_eval procedure looks like
this.
void eval_const(obj_t inst)
{
VALUES.push(inst.arg);
CONT = CONT.cont
}
Each instruction the result of evaluating its expression.
A eval_symbol instruction evaluates a symbol. It uses the global
register ENV. The three fields are proc = eval_sym, arg = the symbol,
and cont.
void eval_sym(obj_t inst)
{
VALUES.push(lookup(inst.arg, ENV);)
}
An application instruction applies a procedure to a symbol. It has
three fields: proc = eval_app, arg = VALUES, and cont = whatever.
eval_app looks like this.
void eval_app(obj_t inst)
{
CONT = make_instruction(finish_apply, VALUES, CONT);
/* go on to evaluate the first arg which pushes onto VALUES. */
}
void finish_apply(obj_t inst)
{
obj_t proc = VALUES.pop();
ENV.push(make_env(proc.arglist, VALUES, inst.arg));
proc = VALUES.pop();
}
This doesn't work. Need to check, after proc has been evaluated, whether
its args should be evaluated.
----------------------------------
2010.01.30
Need a new layer between except and eval. Something to handle
all the various reasons to call longjmp(). Call it low-level
exceptions. lowex.h.
Considering yesterday's idea. Wondering whether "instruction" is the
right term. Might also be appropriate to call it a "continuation".
---
Condition Index. Every place in r6rs that mentions raising an exception.
r6rs 4.2.6: &lexical - #\x0001z
- #\λx
- #\alarmx
- #\Alarm
- #\alert
- #\(x)
- #\(x
- #\x00110000
- #\xD800
r6rs 4.2.7: &lexical - "\x41"
- "\x;"
- "\x41bx;"
- "\x00110000;"
- "\xD800;"
r6rs 5.10: &assertion - writing immutable object.
- writing literal constant
- writing symbol name
- writing record w/ no mutable field
r6rs 6.1: &syntax - syntax called with wrong kind of argument
- syntax called with wrong shape of form
r6rs 6.2: &assertion - procedure called with wrong kind of argument
- procedure called with wrong number of arguments
r6rs 6.6: &assertion - (integer->char #xD800)
r6rs 9.1: &assertion - operator of application is not a procedure
r6rs 9.1: &assertion - application wrong number of arguments
r6rs 11.2.1: &assertion - continuation of rhs of define invoked more than once
r6rs 11.4.6: &assertion - <init> in letrec refers to other variable
- <init> in letrec* refers to later variable
- <init> in letrec* invoked more than once
- <formals> doesn't match <init> in let-values
r6rs 11.7.2: &no-infinities - if FPU doesn't support Inf
&no-nans - if FPU doesn't support NaN
r6rs 11.7.4: &implementation-violation - if exact->inexact fails
- if inexact->exact fails
(I think they meant implementation-restriction.)
&implementation-restriction
- if min or max fails to compare exact vs inexact
- if + or * called with mixed non-rational real and
non-real complex arguments
- if - called with mixed non-rational real and
non-real complex arguments
- if / called with mixed non-rational real and
non-real complex arguments
&assertion - (/ 0 0)
- (/ 3 0)
&assertion - if divisor is zero or dividend is Nan or Inf.
- (log 0.0)
&implementation-restriction - if can't raise 0.0 to negative power
- if can't convert float to string
r6rs 11.9: &assertion - on (car '())
- on (cdr '())
r6rs 11.11: &assertion - (integer->char #\xD800)
r6rs 11.13: &assertion - if vector-set! passed an immutable vector
- (vector-set! '#(0 1 2) 1 "doe")
r6rs 11.14: &assertion - if user assertion fails
- (fac 4.5)
- (fac -3)
r6rs 11.19: &syntax - (set! p.car 15)
r6rs-lib 3: &assertion - (for-all even? '(2 4 14 . 9))
- (exist even? '(3 1 1 5 9 . 2))
r6rs-lib 5: &assertion - if case-lambda doesn't match
r6rs-lib 6.2: &assertion - if two nongenerative record types don't match
r6rs-lib 6.3: &assertion - if a sealed record type is subclassed
- if two nongenerative record types don't match
- if record-rtd called for opaque record
- if mutator for immutable record field requested
- if record-mutator called for immutable field
- if record-rtd called for opaque record
r6rs-lib 7.1: &non-continuable - if exception handler returns
r6rs-lib 7.3: &undefined - unbound identifier
r6rs-lib 8.1: &i/o-read - read error
&i/o-write - write error
&i/o-invalid-position - seek error
&i/o-filename - error on a named file
&i/o-file-protection - EPERM
&i/o-file-is-read-only - tried to write read-only file
&i/o-file-already-exists - tried to create existing file
&i/o-file-does-not-exist - tried to open nonexistent file
&i/o-port - procedures accepting a port as an argument
r6rs-lib 8.2.2: &i/o-file-already-exists - sometimes when output file opened
&i/o-file-does-not-exist - sometimes when output file opened
r6rs-lib 8.2.4: &i/o-decoding - on character decoding error
&i/o-encoding - on character encoding error
r6rs-lib 8.2.6: &assertion - ftell on unseekable port
- fseek on unseekable port
&i/o-invalid-position - seek to invalid position
- seek on non-binary file
r6rs-lib 8.2.9: &lexical + &i/o-read - char decoding error in get-datum
- intra-char EOF in get-datum
r6rs-lib 9: &i/o-filename - delete nonexistent or undeletable file
r6rs-lib 11.2: &implementation-restriction - fixnum arithmetic overflow
- fx+ or fx* overflow
- fx- overflow
&assertion - (fx- (least-fixnum))
&implementation-restriction - fxarithmetic-shift overflow
r6rs-lib 11.3: &no-infinities - if FPU can't represent Inf
*no-nans - if FPU can't represent NaN
r6rs-lib 12.4: &syntax - (set! p.car 15)
r6rs-lib 12.5: &syntax - (rec 5 (lambda (x) x))
- (let ([a 3] [a 4]) (+ a a))
- (let ([else #f])
(case 0 [else (write "oops")]))
r6rs-lib 12.9: &syntax - raised by syntax-violation
r5rs-lib 14: &syntax - (color purpel)
r6rs-lib 16: &syntax - if eval passed a non-expression
- if eval passed a definition or splicing begin
containing a definition
&assertion - if eval assigns to its environment
r6rs-lib 17: &assertion - (set-car! (g) 3)
- if set-car! passed immutable pair
- if set-cdr! passed immutable pair
r6rs-lib 18: &assertion - if string-st! passed immutable string
- (string-set! (g) 0 #\?)
- (string-set! (symbol->string 'immutable)
0
#\?)
----------------------------------
2010.01.31
More Scheme papers. These are by Matt Might.
Lexing with derivatives
http://matt.might.net/articles/implementation-of-regular-expression-matching-in-scheme-with-derivatives/
Compiling Scheme directly to Java
http://matt.might.net/articles/compiling-to-java/
Flat closure conversion while compiling to C
http://matt.might.net/articles/compiling-scheme-to-c/
Church encoding
http://matt.might.net/articles/church-encodings-demo-in-scheme/
First-class macros and meta-circular evaluators
http://matt.might.net/articles/metacircular-evaluation-and-first-class-run-time-macros/
Other Scheme implementors who announced their projects on Scheme from Scratch
http://peter.michaux.ca/
Peter Michaux (C, Scheme from Scratch series)
git://github.com/petermichaux/bootstrap-scheme.git
Jim Ingram (C)
http://github.com/ingramj/bs
Chris Salch (C)
http://git.kc5vzm.com/?p=scheme;a=summary
Nick Fitzgerald (Ada)
http://github.com/fitzgen/ada-scheme
Stu (C)
git://github.com/stu/bootstrap-slip.git
Chris Lloyd (Go)
http://github.com/chrislloyd/go-scheme
Seems to be abandoned early.
Philipp (Java for Lego Mindstorms)
http://github.com/toelke/scheme-nxj
Matei Conovici
http://github.com/cmatei/yalfs
Franciso José Marín Pérez (pmarin)
http://github.com/pmarin/Muddy-Scheme
http://wiki.tcl.tk/25512
kbob
http://github.com/kbob/schetoo
---
Need to implement a handful of primitives before I can eval a
procedure call. Peter Michaux started with quote, but I want a
procedure, not a syntax keyword.
Let's suppose we declare primitive procedures like this.
DEFINE_PROC(L"+", 0-)(obj_t arglist)
{
word_t sum = 0;
while (arglist != EMPTY_LIST) {
sum += fixnum_value(CAR(arglist));
arglist = CDR(arglist);
}
return make_fixnum(sum);
}
DEFINE_PROC(L"cons", 2)(obj_t car, obj_t cdr)
{
return CONS(car, cdr);
}
Just like the old codebase, there will be three kinds of procs,
varying in their scope. The default scope will be anonymous.
#define DEFINE_PROC DEFINE_ANONYMOUS_PROC
#define DEFINE_ANONYMOUS_PROC(scheme_name, arg_range) ...
#define DEFINE_STATIC_PROC(C_name, scheme_name, arg_range) ...
#define DEFINE_EXTERN_PROC(C_name, scheme_name, arg_range) ...
----------------------------------
2010.02.02 (Groundhog Day)
What shall the calling convention for C special forms be?
Keep in mind that call/cc is a procedure, not a special form.
Examples:
define
define-syntax
quote
lambda
if
set!
let (and friends)
begin
obj_t my_special_form(obj_t cont, obj_t form)
{
return a_continuation;
}
So what about call/cc? It needs to be marked special somehow
so that it can access its continuation. apply probably needs
the same treatment.
So I think instead of the current two procedure flags,
PF_COMPILED_C
PF_SPECIAL_FORM
there should be three flags:
PF_COMPILED_C
PF_RAW
PF_ARGS_UNEVALUATED
RAW means that the procedure can be called directly from the
eval trampoline. (Maybe there's a better name.)
ARGS_UNEVALUATED means what it says.
What special forms wouldn't need the PF_RAW flag?
Not define, if, or set!. They each need to push a continuation.
Not begin. It needs to manipulate the continuation stack.
Not lambda. It needs a pointer to the current environment.
Not let. It needs to manipulate the environment.
How about quote? Sure, quote doesn't need PF_RAW.
DEFINE_COOKED_SPECIAL_FORM(L"quote", 1)(obj_t datum)
{
return datum;
}
Otherwise, it would be something like this.
DEFINE_SPECIAL_FORM(L"quote")(obj_t cont, obj_t *p_values, obj_t *p_env)
{
*p_values = CONS(cont3_arg1(cont), *p_values);
return cont_cont(cont);
}
What are the macros?
DEFINE_RAW_PROC(name, arg_range). (Still not happy with the name.)
DEFINE_SPECIAL_FORM() (implies RAW + ARGS_UNEVALUATED).
Anonymous, static and extern variants.
---
Cons cell optimization idea number 188^W33.
A heap object has three possible value for its first word.
If it's forwarded, it has a forwarding pointer, tag 0x02.
If it's a non-cons, non-forwarded object, it has a mem_ops_t
pointer, tag 0x00. If it's a non-forwarded cons cell, then
either the first word (car) is a fixnum/immediate or it has
a tag of 0x04.
bool is_pair(obj_t p)
{
if (!is_heap_obj(p))
return false;
word_t tag = *(word_t *)p & LONG_TAG_MASK;
return tag != HEAP_TAG; /* ignore forwarding pointers */
}
mem_ops_t *obj_mem_ops(obj_t p)
{
if (is_pair(p))
return &pair_ops;
else
return heap_object_mem_ops(obj_heap_object(obj));
}
is_pair() is basically the same cost as it is now. In the current
scheme, we do a memory access to load the mem_ops_t pointer. In the
new scheme, we'll do a memory access to load the car.
obj_mem_ops() has an extra test and branch.
In exchange, pairs save 8 bytes on 32 and 64 bit architectures.
On 32 bit, we go from 4 to 8 cons cells per cache line.
On 64 bit, we go from 2.67 to 4.
So I think it's a win.
Now, what about a mutability flag?
Some objects are inherently immutable: all the immediate objects,
continuations, rtds, symbols, and procedures.
The potentially mutable objects are pair, string, vector, bytevector,
binding, and record instance.
Binding is easy. It already has a mutability flag.
String - could jam a flag bit into the size field. Since elements are
multibyte, don't need all 32 or 64 bits for size.
Vector, bytevector - ditto.
Record instance - handled at a higher level - can't construct
a mutator for an immutable field.
Pair - ???
The only thing that comes to mind is to downgrade fixnums to 30 bits
so there are more bits to play with.
----------------------------------
2010.02.10
Records.
I started implementing records tonight. Got part way through
and found out I'm doing it wrong.
1. All rtd's must be stored in the heap. I was thinking that there
would be some in heap and some in C .data section. But the
contents of an rtd has to be GC'd, so it's easier to put it in the
heap than declare each field a root.
1a. Record instances don't need to check for a primitive
rtd. That makes instance mem_ops simpler.
2. r6rs-lib requires a specific format for the `field' argument
when it's passed (by reference) to make-record-type-descriptor.
So that's the format I'll use.
3. An instance should be laid out like a vector, except it doesn't
need a length field if the rtd has a length field.
3a. The rtd should have a number-of-fields field.
4. Accessors need to index into the fields. Sort of like
(lambda (rec) (vector-get rec 3))
4a. Should record_get_by_index()/record_set_by_index() functions.
4b. Accessors will be created by Scheme. C will export lower-
level procedures, maybe just accessors for parent and fields.
5. Field storage for a record instance should be in-line in the record.
It seems like it should be almost straightforward -- bad magic is
happening in the overloading, in a record instance, of
(heap_object).ho_ops and the rtd pointer and in the overloading, in an
rtd, of (rtd_obj_t).rtd_inst_ops.mo_start_marker and
(heap_object.ho_ops, but that was accounted for long ago when I
defined the basic heap object layout.
So what, exactly, is &syntax? I think syntax is a root pointer to the
syntax rtd. Or maybe it's a structure containing the root pointer so
there's a little C type safety.
And I'll need a DECLARE_RTD(name, ...) analogous to DECLARE_PROC()
which squirrels stuff away until the heap is initialized, then
creates the RTD and stores a pointer in a named C variable.
----------------------------------
2010.02.11
Think about auxilliary syntax.
Symbols with special meanings like else, =>, ..., mutable. Bind them
to a magic inevaluable type but with distinct values. Think about
what makes free-identifier=? evaluate true.
----------------------------------
2010.02.12
I claimed that I could pass NULL to raise with impunity.
Well, now it's time to figure out who &who is.
Back in the top-level, there's a continuation pointer.
code = cont_code(cont)
if (code == c_apply_proc)
operator = CAR(cont4_arg(cont));
else if (code == c_eval_operator)
operator = CAR(values);
else
operator = eval; //why not?
But that's all at top-level. I guess I should pack up the other stuff
and build &who and the compound condition back up there.
---
RAISE() builds a compound condition. Hands it off. Now I need
to store it in a static variable in eval, decode the &who (later),
and pass it to a pretty print routine. All the info is there,
but it's scattered around. Probably need some more condition
accessors too.
How about this for a target message format?
scheme: assertion-violation - divide by zero
div: 1 0
Source file and line number would be nice, but...
----------------------------------
2010.02.13
Time to 'blog.
First, introduce the new Scheme. List its major features.
(Lots of datatypes, lots of primitives, some cool stuff
like call/cc and exceptions.
Second, talk about the eval environment.
1. Bounded stack use (with exceptions).
1. It uses trampolines.
2. Pseudo-VM. cont, values, and dynamic environment.
Third, talk about the setjmp/longjmp hack.
--
Son of Scheme: Introducing Schetoo
I've started a new Scheme project. This one is called Schetoo. The
name has connotations of "Scheme Two", "me too", and "Gentoo" (a
hardcore Linux distribution and a breed of penguin).
A 'blogger named Peter Michaux started a series in January called <a
href="http://peter.michaux.ca/index#Scheme%20from%20Scratch">Scheme
from Scratch</a>. His goal was to write a very quick and dirty
Scheme, just enough to bootstrap a Scheme compiler. I read his
postings and decided to get started again.
Schetoo is similar in concept to my first Scheme, <a href="http://kernelbob.wordpress.com/tag/kbscheme/">kbscheme</a>. In fact,
I reused a fair amount of code to get Schetoo up and running faster.
But at the same time, I'm applying what I learned to try to get a
better design.
I've published the code on GitHub.
<a href="http://github.com/kbob/schetoo">http://github.com/kbob/schetoo</a>
---
Goals
The primary goal for Schetoo is pedagogical. I want to understand the
Scheme language and how to design language runtimes. I intend to
focus on performance more than last time, because kbscheme was running
for many seconds just initializing itself.
Another goal is to have no dynamic memory outside of the heap. No
recursion so stack size is bounded. No mallocs. At present, it falls
short of that in two ways. My I/O subsystem uses malloc for memory
buffers, and the print routine is recursive. Both were thrown
together early on to get something running, and both will eventually
be rewritten. I'll rewrite I/O in C and rewrite print in Scheme so it
can remain recursive.
The third goal is to integrate macro translation into Schetoo.
I'm not sure how that will work.
Status
Right now, Schetoo is usable for toy programs.
It has a heap with lots of primitive data types: bindings,
bytevectors, Booleans, Unicode characters, continuations, fixnums,
pairs, procedures, records, record type descriptors (RTDs), strings,
symbols, and vectors. The interpreter also has a hierarchy of
condition types constructed from records. (A Scheme condition is like
a Python exception object.)
It has an assortment of primitives for operating on and converting
among those types. I count 44 primitives at the moment.
The reader came from kbscheme, and just like kbscheme, it handles the
whole R6RS Scheme language except for non-fixnum numbers.
It also has an evaluator with a nice extension mechanism for adding
more procedures and special forms. Some of the more interesting forms
I've implemented are apply, call/cc, define, eval, if, lambda, quote
and set!.
There is an exception mechanism, and all the forms are wired to
generate exceptions as needed. There is no way to set an exception
handler yet -- the default handler just prints the message and pops
back to the REPL.
Finally, there's a clever mechanism to track stack-based heap roots
with zero overhead. That will be the subject of a future post.
I'm planning to implement dynamic-wind and multiple-value expressions
someday. The evaluator was designed with them in mind, but they're
not implemented yet.
Design
Once again, the implementation language is C. The overall structure
is a trampoline-style interpreter using explicit continuations instead
of a call stack. This time, I'm planning to write less C and more
Scheme. I'm almost at the point where the read-eval-print loop (REPL)
can be written in Scheme.
I used the same basic heap and garbage collector design as last time.
There are two significant differences, though.
First, small objects are stored as immediate pointers. Those include
fixnums, (Unicode) characters, Booleans, the empty list, and some
other constants. For example, the Boolean false is represented as a
pointer with the binary value 0x0000000E. There's nothing stored at
address 0x0000000E, it's just a bit pattern.
Second, I've implemented records. Records are Scheme's objects. Each
record has a class (called a record type descriptor or RTD), and
classes have single inheritance. Those both went into the heap design
fairly easily.
The evaluator is also similar at a high level, but completely
different in the details. There is a virtual machine (VM), and the VM
is a little better defined than last time. I think I'll write a
future post about the VM design, too.
And exceptions got designed in early. That's an important part of
building a usable interactive environment.
Next Steps
I'd like to get the REPL into Scheme. To do that, I need user
exception handlers and I/O primitives.
----------------------------------
2010.02.14
Scan and read need to be rewritten to be leaf routines. That will be
challenging. Right now, I'm severely limited
----------------------------------
2010.03.05
I have to turn the parser and scanner inside-out.
Let's assume that I have an I/O library with the equivalent of
getchar() and peekchar(), aka ungetc(getchar(), stdin).
read_datum() should repeatedly call peekchar(), and pass each
character to the scanner's scan_char(). If that succeeds, then the
scanner returns zero, one or two tokens which should be passed on to
the parser. If the parser succeeds, then the scanner can update its
state, and the io stream can commit to reading the character.
We could fail in peekchar(), which is not a problem, since peekchar()
is the first thing to run. getchar() after peekchar() must never
fail. (Should be easy enough -- getchar() just updates the file or
buffer position.)
We could fail in scan_char() when it tries to allocate a token.
If that happens, we want to retry peekchar() and we want to leave
the scanner's state unchanged.
After the scanner has returned zero, one or two tokens, we want to
commit that. Save the tokens, if any, and call getchar(). Can the
tokens be stored in a continuation? Yes, that's the best place.
"Calling" getchar() means pushing a continuation pointing to it
then returning into that continuation.
Then call the parser to update its state based on the new tokens.
(If no tokens, continue directly to another call to peekchar()
and scan_char().) If we fail in the parser, we'll retry the
parse step.
So I see four entry points.
getchar()
peekchar()
scan_char()
parse_token()
(R6RS uses the names peek-char and read-char.)
1. read-char can be a simple procedure. It takes 0-1 args.
It returns a character.
2. peek-char ditto. It can also fail, returning EOF.
Later, read-char and peek-char can be written in Scheme and use
get-u8 and eof-object as the primitives. But for now, I'll implement
read-char and peek-char in C.
3. scan_char() is a raw procedure. Its arguments, which it must
unpack itself, are a character and its starting state. It pushes
continuations onto the stack and returns. If it recognizes zero
tokens, here's the continuation stack it makes.
get-char -> discard-value -> peek-char -> scan_char -> [current TOS]
If there's one token, it looks like this.
get-char -> discard-value -> parse_token -> peek-char -> scan-char -> [TOS]
If scan_char() sees EOF, it doesn't push itelf. It just pushes
any parse_token calls and returns.
I'm thinking we need a primitive "get-char and discard value".
Or maybe I can set up the registers just so.
For now, the first thing to do is implement peek-char and get-char.
And their friends, current-input-port and open-input-file.
----------------------------------
2010.03.06
Okay, I wrote a minimal I/O library. Four procedures: read-char,
peek-char, eof-object, and eof-object?.
----------------------------------
2010.03.07
When EOF is reached, scan_char should exit the loop.
The continuation stack should look like this.
parse_token -> [TOS]
Expand the earlier stack a bit (zero tokens case).
get-char -> discard-value -> peek-char -> scan-char -> [TOS]
expands to
get-char w/ values = ()
discard-value w/ values = (char)
peek-char w/ values = ()
scan-char w/ values = reverse(char scan-ctx)
TOS w/ values = whatever
saved values are stored in CDR(cont4_arg(cont)).
----------------------------------
2010.03.08
It's time to make some changes to continuations before I delve into
the messy project of building them into the new reader.
Goals.
- (G.ENV) Remove env from continuation, keep as separate register,
saved and restored as needed.
- (G.SEP) Separate c_apply_proc's proc and saved_values parameters.
- (G.ORD) Rearrange make_cont's arguments like this.
[opcode, arg ..., env(?), next].
- (G.VFY) Add an assertion to every cont proc that it is called
with the right continuation type.
I'm not sure that G.ENV will actually be faster, though. The cv_t
will become a triple, cve_t, and each cont proc will be called with
three args instead of two. If we didn't have the trampoline, it
would definitely be faster.
Maybe I should write a benchmark comparing the current thing
vs. cve_t vs. keeping env in a global root.
---
Benchmark results are in. V3 is fastest. (with my compiler on my
PC today)
~/scheme2> time ./a.out
version 1 result 1000000000 time 10.4975
version 2 result 1000000000 time 10.3443
version 3 result 1000000000 time 9.79096
30.629u 0.000s 0:30.63 99.9% 0+0k 0+0io 0pf+0w
Version 1 is what we have today. Each cont_proc takes two arguments,
cont and values. It returns a cv_t structure which contains cont
and values. Env is in the continuation structure, and is copied
by every op.
Version 2 is the cve_t idea. Each cont_proc takes three arguments,
cont, values, and env. It returns a cve_t structure which contains
cont, values, and env.
Version 3 keeps env in a global variable. Every tenth operation
updates env.
The source code is in regbench.c.
Implementing version 3 is also cleaner -- less bookkeeping passing env
around in routines that don't care about env. Which is the real reason
to do it.
---
How to get there.
Step 0. Add test cases to exercise all relevant code paths.
Step 1. Add assertion to every cont_proc to verify it has the
right kind of continuation. "assert(is_cont4(cont));"
Step 2. Add global env, use it, and ignore the env passed in the
continuation.
Step 3. Rename cont4 to cont3, remove the env member, reorder args.
Step 4. Create cont4 for c_apply_proc and call/cc. Cut them over.
---
Complication. There is no obvious place to restore the environment
after evaluating a procedure application.
- I could link another cont_proc in to restore the environment.
But that would destroy tail call optimization.
- The environment is only used in c_eval. (Is that true? [update: No.]) So it
could be an explicit arg to c_eval and live in a global variable
the rest of the time.
- Or keep the current scheme where env is a field in each continuation.
Let's map all the places where the env is changed and all the places
where it is used. The interpreter is complete enough that they're
all accounted for, I think.
Modified:
- core_eval()
- c_apply_proc() if proc is not C.
- implicitly when eval of a procedure body finishes.
- the eval primitive.