{
title: Tutorial
description: Myrddin Tutorial
}

Myrddin Tutorial
----------------

Myrddin is a simple modern programming language. It allows you to write clear,
terse, and readable code with a powerful but comprehensible type system. The
compiler infers types globally, checking your code without getting in your
way. It is currently available on Linux, OSX, FreeBSD, OpenBSD, and Plan 9.

This tutorial will get a new user up to speed with Myrddin quickly. It comes
in three parts: the first discusses key concepts via several example programs,
the second covers parts of the language in more detail, and the third gives an
idea of what libraries exist and how to use them.
For deeper coverage, look at the [language specification](spec.html) and the
[library reference manual](doc/index.html).
We assume that you are already familiar with programming, and have installed
Myrddin on your machine already, following the instructions on the
[Environment Setup](setup.html) page.

A Simple Program
----------------

```{runmyr hello}
use std

const main = {
    std.put("hello world\n")
}
```
A program begins running at the first line of the function named `main`, and
proceeds line by line, executing statements one after the other. Each
statement is ended by a newline or semicolon.

Here, the first line of `main` invokes `std.put`. This function does formatted
output. We pass it the string `hello world`, and it dutifully prints out

    hello world

The `put` function can also handle more complex formatting. The first argument
to `std.put` can contain format specifiers (`{}`). These will be substituted
with the corresponding argument in the parameter list. Myrddin passes type
information to the format function, and tries to produce a reasonable output
for all arguments.

For example,

    std.put("{} + {} = {}\n", 2, 2, 5)

would output the string `2 + 2 = 5`. Additional parameters for specifying
the formatting can be passed between the `{` and `}`. These vary by type,
and are fully documented in the [library documentation](doc/libstd/fmt.html).
The `std.put` function comes from the `std` library, loaded via `use std` on
the first line of the program. Use statements will import a library, allowing
the program to access all of the functions and variables that the library
provides.
In order to compile this program, save it into a file with the extension
`.myr`. A good name for this program is `hello.myr`. Then, build it with
`mbld`:

    mbld -b hello hello.myr
    ./hello

There are other ways to invoke mbld, which will be covered later in this
tutorial.

Another Small Program
---------------------

This program computes factorials.
```{runmyr factorial}
use std

const main = {
    var x : int64

    x = factorial(10)
    std.put("factorial {} = {}\n", 10, x)
}

const factorial = {n
    var acc

    acc = 1
    for var i = 1; i < n + 1; i++
        acc *= i
    ;;
    -> acc
}
```

As before, it can be compiled and run with the following commands:

    mbld -b factorial factorial.myr
    ./factorial

Expressions are similar to other common programming languages, such as C,
Java, or Python. A full table of operators will be in the second half of
this document.
Declarations begin with the keyword `var`, `const`, or `generic`, followed by
a list of variable names, optionally with types and initializers. Variable
names are composed of the characters 'a-z', 'A-Z', '0-9', and '_'. The first
character of the variable must not be a digit.
If we want to provide a type for the variable, then the variable name can
be followed by a ':', and then the type we want to declare. Providing the
type explicitly is optional, because the compiler can usually infer the type
on its own.
Functions in Myrddin follow the pattern outlined above, with no special syntax
for declarations. Instead, we simply declare a `const`, and assign it a
function literal expression. Function literal expressions are chunks of code
with arguments and a body, and generally follow this form:

    {arg, list
        function
        body
    }

The argument list consists of a list of argument names. Like declarations,
types can be added with `:type`, but are usually not needed. Like statements,
the argument list is terminated with a line ending.

Functions are called using the function call operator, `()`. The number and
types of the arguments must match the declared or inferred types of the
function's parameters.
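
As a minimal sketch of these rules, the hypothetical `add` function below
spells out its argument types and its return type, and `main` calls it with
the `()` operator. The name `add` and the values are illustrative only; if the
rules above hold, the program should print `5`.

```{runmyr typedfn}
use std

/* hypothetical helper: both arguments and the return value are typed */
const add = {a : int, b : int -> int
    -> a + b
}

const main = {
    /* the call must supply two ints, matching the declaration */
    std.put("{}\n", add(2, 3))
}
```

Dropping the annotations should also work here, since the compiler can infer
the types from how `add` is used.
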

In our factorial program, the variable `x` is given the type `int64`. This
means that when we call `factorial(10)`, the compiler realizes that the
`factorial` function must return an `int64`. Because the factorial function
returns the variable `acc`, this means that it must also have the type
`int64`. Thus, the type of `acc` is fixed, in spite of the lack of explicit
type declaration. If we attempted to assign `acc` anything other than `int64`,
the compiler would reject the program.

For loops in Myrddin come in two forms: stepping and iterator. The type of
loop used in the `factorial` function is a stepping loop.
Stepping for loops will be familiar to anyone who has used C. This type of
loop has the form `for init; test; incr; body ;;`. The `init` expression is
executed before the loop is entered. The `test` expression is run at the start
of each loop iteration, and the `incr` expression is run at the end of every
loop iteration. The `test` expression is a boolean expression, and the loop is
exited when it returns false.
Iterator loops have the form `for pat : expr; body ;;`. These loops
operate on an iterable expression such as an array or a slice. Each time
that the loop runs, the next element in that iterable is stored into `pat`,
and the body is run. This continues until all elements of the iterable are
exhausted. `pat` need not be a plain variable; it may be a pattern.
Patterns are covered later in this tutorial.
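
The factorial program only shows the stepping form, so here is a small sketch
of the iterator form. The array `primes` and its contents are made up for
illustration; the loop walks a slice of it and should print each element on
its own line.

```{runmyr iterloop}
use std

const main = {
    var primes = [2, 3, 5, 7]

    /* p takes the value of each element of the slice in turn */
    for p : primes[:]
        std.put("{}\n", p)
    ;;
}
```
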

Myrddin also has other common control flow statements. If statements
are written as you'd expect:

    if cond
        thing()
    ;;

As usual, the control construct is separated from the body of the if statement
using a line ending or semicolon. The condition is a boolean typed expression,
which, if true, will enter the body of the if statement. Otherwise, it will
skip over it. If statements can also be expanded with `elif` and `else`
clauses:

    if cond
        thing()
    elif othercond
        otherthing()
    elif moreconds
        morethings()
    else
        fallback()
    ;;

While loops are also supported. These loops repeat as long as the condition
on the `while` is true:

    while cond
        thing()
    ;;

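
The fragments above can be combined into a complete program. This is only a
sketch: the counter `n` and the messages are invented for illustration. It
counts from 1 to 5 with a `while` loop and classifies each value with an
`if`/`elif`/`else` chain.

```{runmyr flow}
use std

const main = {
    var n = 1

    while n <= 5
        if n % 2 == 0
            std.put("{} is even\n", n)
        elif n == 5
            std.put("{} is the last value\n", n)
        else
            std.put("{} is odd\n", n)
        ;;
        n++
    ;;
}
```

Only the first branch whose condition is true runs, so 5 should be reported as
the last value rather than as odd.
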
The only other significant tool for controlling program flow is the match
statement, which is covered below.

Pattern Matching
----------------

This is a simple example demonstrating pattern matching.
```{runmyr match}
use std

const main = {
    var x = 11

    match x
    | 7: std.put("first\n")
    | 9: std.put("second\n")
    | n: std.put("got {}\n", n)
    ;;
}
```
This program will output `"got 11"`. Each pattern in the match statement is
checked against the value in sequence, and the first one that matches has its
body executed. Here, 7 and 9 are not equal to 11, so their bodies are not
executed. However, a free name matches any value, so matching against `n`
succeeds. Additionally, the free name captures the value that it is being
matched against, meaning that in the expression `std.put("got {}\n", n)`, the
variable `n` evaluates to 11.

This kind of matching can be applied to more than just integers. If `x` were
assigned the tuple `(11, 33)`, then in the code below, the pattern `(11, n)`
would match, and `n` would hold the value `33`:

    match x
    ...
    | (11, n): std.put("got {}\n", n)
    ...

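
For readers who want to run the tuple case, here is one way the fragment could
be completed. The first and last arms are assumptions added so that the match
is exhaustive; they are not part of the original fragment.

```{runmyr tuplematch}
use std

const main = {
    var x = (11, 33)

    match x
    | (7, n): std.put("seven, then {}\n", n)
    | (11, n): std.put("got {}\n", n)
    | _: std.put("something else\n")
    ;;
}
```

With `x` set to `(11, 33)`, the second arm should be the one that runs.
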
Pattern matches can descend into the structure of almost any type. Structures,
arrays, strings, unions, and even values on the other end of pointers are fair
game. Of these, matching on unions is likely to be the most common.

A union is a type that has two parts: a tag, and a body. The body is optional,
but the tag is always present. We could define a union type as:

    type u = union
        `Bodyless
        `Int int
        `Pair (int, char)
    ;;

The word after the \` (backtick) is the tag. A union can only hold one of
its variants at once. Unions are written out with the tag and body value,
as in:

    x = `Int 123

Once a value is in a union, the only way to extract it is by applying a
pattern match to it. The tag is matched on to decide which variant of the
union to extract, and the body is matched using the usual rules. For example:
```{runmyr umatch}
use std

type u = union
    `Bodyless
    `Int int
    `Pair (int, char)
;;

const main = {
    match `Pair (1, 'c')
    | `Bodyless: std.put("no body\n")
    | `Int i: std.put("int body is {}\n", i)
    | `Pair (a,b): std.put("pair body: first={}, second={}\n", a, b)
    ;;
}
```

In order for a match statement to compile, it must be exhaustive. This means
that for every possible value, at least one case must match. Additionally,
each pattern must be useful. This means that a pattern must not be fully
subsumed by the patterns that come before it.
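
A small sketch may make the two rules concrete. The `light` type and the
`describe` function are hypothetical, introduced only for this example. The
match below has exactly one arm per tag, so it is exhaustive and every arm is
useful.

```{runmyr exhaustive}
use std

/* hypothetical two-variant union */
type light = union
    `Stop
    `Go
;;

const describe = {l
    match l
    | `Stop: std.put("stop\n")
    | `Go: std.put("go\n")
    ;;
}

const main = {
    describe(`Stop)
    describe(`Go)
}
```

Deleting either arm would leave one tag unmatched, and appending a `| _:` arm
after the two existing ones would add a pattern that can never match. By the
rules above, the compiler should reject both variations.
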
Patterns also show up in iterator style for loops. In this context, only
a single pattern is allowed, on the loop variable. If a value does not
match the pattern, the loop body is skipped.
```{runmyr formatch}
use std

const main = {
    for (1, x) : [(1,1), (2, 4), (1, 3), (2, 7)]
        std.put("x = {}\n", x)
    ;;
}
```
This program will only print `x = 1` and `x = 3`, even though it is iterating
over 4 values. This is because the pattern `(1, x)` only matches the values
`(1,1)` and `(1,3)`.

A Marginally Useful Program
---------------------------

This program behaves like the Unix `wc` program. You'll have to run it on your
local machine -- it does input and output, and therefore will fail when run in
the playground.
```{runmyr wc}
use std
use bio

const main = {
    var lines = 0, words = 0, chars = 0
    var inword
    var f

    f = bio.mkfile(std.In, bio.Rd)
    inword = false
    while true
        match bio.getc(f)
        | `std.Err `bio.Eof: break
        | `std.Err e: std.fatal("error reading file: {}\n", e)
        | `std.Ok ' ': inword = false
        | `std.Ok '\t': inword = false
        | `std.Ok '\n':
            lines++
            inword = false
        | `std.Ok c:
            if !inword
                words++
            ;;
            inword = true
        ;;
        chars++
    ;;

    std.put("lines: {}\n", lines)
    std.put("words: {}\n", words)
    std.put("chars: {}\n", chars)
}
```
This program is a state machine centered around a pattern match statement.
It operates by keeping track of whether it's currently inside a word or not,
and every time it flips into a word, it increments the number of words
using a `++` expression.
We start off by initializing all of our counters to zero, and creating a
buffered wrapper around the `std.In` input stream. This buffered reader is
used to efficiently read and decode whole Unicode codepoints.

The main loop of the `wc` program matches over the result of `bio.getc`. The
`std.result` type is generic, but for our purposes right now we can assume it
is defined as:

    type std.result = union
        `Err bio.err
        `Ok char
    ;;

A value of `` `std.Err `bio.Eof `` indicates that the reader has reached the
end of the file. Any other `` `std.Err `` value carries a `bio.err` describing
an error encountered while reading the file. And a value of `` `std.Ok char ``
indicates that a single character was successfully read from the file.
Refer to the [API documentation](http://myrlang.org/doc/libbio/index.html) for the full
details of what the buffered I/O library provides.
The main loop first checks for the end of the file, exiting the loop and
printing the accumulated statistics if one is encountered. Then, it checks
for errors, bailing out of the program with a failure if one is encountered.
In all other cases it matches on the character that was encountered
to count up the lines, words, and characters.

There are four patterns that match on the `` `std.Ok `` union tag. The first two
match on spaces and tabs.

    | `std.Ok ' ': inword = false
    | `std.Ok '\t': inword = false

These patterns simply set the `inword` state variable to false. If we are in a
word, this records that we have left the word. Otherwise, the state is
unchanged.

The next pattern matches on `` `std.Ok '\n' ``. Here, in addition to recording
the end of a word, the program increments the line count.

    | `std.Ok '\n':
        lines++
        inword = false

And finally, the last case matches any character that was successfully read.
Since this character is not a space character or newline, we define it to be
a word character. If we are not currently in a word, then this must mark the
start of a new word, so we increment the word count. Finally, the fact that
the program is scanning along a word is recorded.

    | `std.Ok c:
        if !inword
            words++
        ;;
        inword = true
    ;;

The program then finishes the loop iteration, incrementing the total number
of characters read, and reads the next character, starting the cycle over
again.

Stacks
------

Here's a program that defines a stack. For simplicity, the stack is statically
sized, holding at most 100 elements.
```{runmyr stack}
use std

type fixstack(@a) = struct
    top : std.size
    data : @a[100]
;;

generic stkpush = {s : fixstack(@a)#, val : @a
    s.data[s.top++] = val
}

generic stkpop = {s : fixstack(@a)# -> @a
    -> s.data[--s.top]
}

generic mkstk = { -> fixstack(@a)
    -> [.top=0]
}

const main = {
    var intstk : fixstack(int)
    var strstk : fixstack(byte[:])

    /* create the stacks */
    intstk = mkstk()
    strstk = mkstk()

    /* initialize the integer stack */
    stkpush(&intstk, 0)
    stkpush(&intstk, 1)
    stkpush(&intstk, 2)
    /* type error: stkpush(&intstk, "foo") */

    /* initialize the string stack */
    stkpush(&strstk, "foo")
    stkpush(&strstk, "bar")
    stkpush(&strstk, "baz")
    /* type error: stkpush(&strstk, true) */

    for var i = 0; i < 3; i++
        std.put("{}\n", stkpop(&intstk))
        std.put("{}\n", stkpop(&strstk))
    ;;
}
```
User-defined types are created using the `type` keyword. Type definitions
may define new types based on existing ones, and may optionally take
parameters. For example:

    type flags = int32
    type slice(@a) = @a[:]

The `flags` type is a definition based off of the `int32` type. This definition
is a distinct type, and requires an explicit cast to be converted to an int32.
The `slice(@a)` type is parameterized, taking a single type parameter `@a`.
When this type is used, the type parameter must be passed in. This substitutes
the type parameter on the right hand side, producing a new type.

In the stack example, the type `fixstack` is generic. It gets specialized into
`fixstack(int)` and `fixstack(byte[:])` in the body of `main`. The `int` stack
can only contain ints, as verified by the compiler when type checking.
Similarly, the `byte[:]` stack can only contain `byte[:]`.
The functions `stkpush`, `stkpop`, and `mkstk` are declared with the keyword
`generic`. The `generic` keyword indicates that they may contain type
parameters in their signatures. This means that when `stkpush` is called with
a stack of `fixstack(int)`, the type `@a` is substituted with `int`.
Similarly, when called with `fixstack(byte[:])`, `@a` is substituted with
`byte[:]`. Note that `@a` is substituted with the same type throughout the
context, so if we defined a `max` function, we would not be able to mix
arguments:

    generic max = {a : @t::numeric, b : @t::numeric
        if a > b
            -> a
        else
            -> b
        ;;
    }

    max(1, 2)      /* ok, @t is replaced with int */
    max('x', 'y')  /* ok, @t is replaced with char */
    max('x', 2)    /* error: @t wants to be both int and char */

In the `max` example, we also used traits to restrict the types passed to
`max`, requiring them to be numeric. Traits are constraints on generic types,
requiring the type passed to have certain attributes. Numeric is a trait built
in to the language, and is defined for integer, floating point, and character
types. If a type has the numeric trait, it can be compared using relational
operators (`<`, `<=`, `>`, `>=`). It can also have the usual numeric operators
applied (`+`, `-`, `*`, `/`).

Turning Code into a Library
---------------------------

Often, code can be reused from multiple files. This example shows how to put
code into reusable libraries, available from a `use` statement.

    use std

    pkg stack =
        type fixed(@a) = struct
            top : std.size
            data : @a[100]
        ;;

        generic mk : (-> fixed(@a))
        generic push : (s : fixed(@a)#, val : @a -> void)
        generic pop : (s : fixed(@a)# -> @a)
    ;;

    generic push = {s, val
        s.data[s.top++] = val
    }

    generic pop = {s
        -> s.data[--s.top]
    }

    generic mk = {
        -> [.top=0]
    }

The library code is based on the stack example above, but repackaged so that
it can be used from multiple places. We removed the `main` function, and added
a `pkg` section to declare the exports. The `pkg` section contains the data
type that we are providing, and the function prototypes to expose in order
to manipulate that type.
There were also a few stylistic changes. Because the fully qualified name
of the functions (`stack.funcname`) must be used to refer to the library
exports, the `stk` prefix is redundant. It has been removed, replacing, for
example, `stkpush()` with `push()`.
The package name is unrelated to the file name that we decide to save this
code into, and as a general rule, packages consist of multiple files. However,
this example is small enough that a single file suffices.
This library is built and installed with mbld. If the file that the code was
in was named `stk.myr`, then we need to create a file named `bld.proj`, in the
same directory as `stk.myr`, containing the following:

    lib stack =
        stk.myr
    ;;

The `lib` clause produces a library named `stack` out of the files listed in
the package. In our case, there is only one file.

Running

    mbld

will build the library, and

    mbld install

will install it to a place where `use` statements in other code will be able
to find it. To use it, we might write a program similar to our previous one,
but using this library. For brevity, main is shortened:

    use std
    use stack

    const main = {
        var istk : stack.fixed(int)

        istk = stack.mk()
        stack.push(&istk, 123)
        std.put("{}\n", stack.pop(&istk))
    }

If `mbld install` has been run, then the usual `mbld -b main main.myr` would
produce a binary linked against the stack library that we just wrote.
Alternatively, `main.myr` may also be built with a `bld.proj` file. We can
put this into a bld.proj file in the same directory as `main.myr`:

    bin main =
        main.myr
    ;;

There is one problem that separate bld.proj files and installed libraries do
not address. We may want to have the binaries and libraries shipped as part of
the same project, implying that we want to build them all together as a unit.
To do this, we could put the two build targets into the same `bld.proj`, and
add a dependency from `main` to the `stack` library, as below:

    lib stack =
        stk.myr
    ;;

    bin main =
        main.myr
        lib stack
    ;;

Splitting code into multiple files is done in a similar way. Only two small
changes need to be made. First, because the files are being compiled into the
same unit, instead of dependent libraries, the use statements have to be
changed to the quote form:

    use std
    use "stk"

    const main = { ... }

Then, the bld.proj needs to be changed to put both files into a single
unit:

    bin stackdemo =
        stk.myr
        main.myr
    ;;

The distinction between quoted and unquoted use statements is how the
packages are looked up. An unquoted use looks for a fully compiled and
installed library with the requested name. A quoted use looks for a single
`.myr` file and imports the definitions from that. The quoted form is
used for dependencies within a single package, while the unquoted form
is used for dependencies between different packages.
There's a lot more to mbld, and the full documentation is available
in the [mbld tutorial](mbld.html).

Printing Roman Numerals
-----------------------

This program uses traits to decide how to stringify integers. Traits are a
powerful mechanism for attaching behavior to types that can be overridden at
compile time.
They add a lot of expressiveness, but the overloading that they imply can
heavily hurt readability. As a result, they are best used sparingly, and with
care.
```{runmyr trait}
use std

trait stringable @a =
    stringify : (buf : std.strbuf#, v : @a -> void)
;;

type roman = int64

const romanmap = [
    (1000, "M"), ( 900, "CM"),
    ( 500, "D"), ( 400, "CD"),
    ( 100, "C"), (  90, "XC"),
    (  50, "L"), (  40, "XL"),
    (  10, "X"), (   9, "IX"),
    (   5, "V"), (   4, "IV"),
    (   1, "I"),
]

impl stringable roman =
    stringify = {sb, n
        for (i, s) : romanmap
            while n >= i
                std.sbputs(sb, s)
                n -= i
            ;;
        ;;
    }
;;

impl stringable int32 =
    stringify = {sb, n
        std.sbfmt(sb, "{}", n)
    }
;;

const main = {
    var i32 : int32
    var r : roman
    var sb, s

    r = 1234
    i32 = 1234

    sb = std.mksb()
    std.sbputs(sb, "roman: ")
    stringify(sb, r)
    std.sbputs(sb, ", i32: ")
    stringify(sb, i32)
    s = std.sbfin(sb)

    std.put("traity conversion: {}\n", s)
    std.slfree(s)
}
```

This program begins by defining a trait `stringable @a`. The `stringable`
trait requires implementations to provide a `stringify` function with
the type `(buf : std.strbuf#, v : @a -> void)`. This function will put a
string version of the value `v` into the string buffer.

Next, a new type `roman` is defined. It's an integer, but we attach a
trait implementation to it that will cause `stringify` to render it as a
roman numeral. The implementation follows.

Then, a second implementation of the trait is defined to stringify `int32`
values. The `int32` impl just uses `std.sbfmt()` to render the integer into
the string buffer.

Finally, `main` uses the `stringify` function on the two types, demonstrating
that the roman numeral value indeed gets formatted as a roman numeral,
and the int32 gets formatted with boring old arabic numerals.

Traits are closely related to generics. However, instead of substituting
the type within the body of a function, the types are used to look up a
type specific implementation when the program is compiled.

Command Line Arguments
----------------------

This program implements the Unix `echo` program. When run on the command
line, it will echo all of the arguments given to it.
```{runmyr echo}
use std

const main = {args : byte[:][:]
    for a : args[1:]
        std.put("{} ", a)
    ;;
    std.put("\n")
}
```
Arguments given on the command line are passed to Myrddin programs as
the first argument to main. The type of the arguments is a `byte[:][:]`.
The first element of this slice is the program name. The second element
onwards are the arguments passed to the program.
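
To see the layout of `args` directly, a short sketch like the following could
be used. It relies only on the description above: element 0 is the program
name, and everything after it is an argument. What it prints depends on how
the program is invoked.

```{runmyr argcount}
use std

const main = {args : byte[:][:]
    /* the first element is the name the program was run as */
    std.put("program: {}\n", args[0])
    /* the remaining elements are the actual arguments */
    std.put("argument count: {}\n", args.len - 1)
}
```
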

This is the first program in this tutorial where an additional type annotation
is needed. Because the operations on `args` can be done on either a slice or
an array, type inference has too little information to disambiguate the two
cases. Therefore, the `args` parameter to `main` is annotated with a type.
By convention, options are flagged with a leading `-`. Flags which take no
arguments can be grouped together, so that `-a -b -c` is equivalent to `-abc`.
Flags that do take arguments are insensitive to spaces in the argument list,
so that `-o arg` is equivalent to `-oarg`. And option processing is stopped
after the first `--` seen in the input.

Following these rules yourself isn't difficult, but the standard library
provides code that handles these cases for you.

The example program above is incomplete: many implementations of `echo`
accept a `-n` option which suppresses the final newline. For the sake of
illustration, let's also extend it with a `-p prefix` argument, which adds a
prefix to each value printed.
```{runmyr echoargs}
use std

const main = {args
    var cmd
    var printnl, pfx

    printnl = true
    pfx = ""
    cmd = std.optparse(args, &[
        .argdesc="args...",
        .opts=[
            [.opt='n', .desc="suppress newlines"],
            [.opt='p', .arg="pfx", .desc="insert prefix"],
        ][:]
    ])

    for o : cmd.opts
        match o
        | ('n', ""): printnl = false
        | ('p', p): pfx = p
        | _: std.die("bug: unhandled arg\n")
        ;;
    ;;

    for a : cmd.args
        std.put("{}{} ", pfx, a)
    ;;
    if printnl
        std.put("\n")
    ;;
}
```
The `std.optparse` function takes two arguments. The first is the argument
list to parse. The second is a pointer to an argument description structure.
In this program, this is written out as a struct literal.
The argument description structure is used for two purposes. The primary
purpose is for describing to `std.optparse` what the command line should look
like. The second purpose is producing a useful help message for the user.

The `optparse` function parses the command line into two data structures. The
first is a slice of `(char, byte[:])` pairs that contains the options and their
values. The second is a slice of `byte[:]` that contains the non-option
arguments.
Once the options are parsed, the program loops over them and processes them,
storing the prefix and recording whether to print newlines.
This program only exercises a small portion of the command line parser.
The [API reference](doc/libstd/cli.html) covers the rest of the capabilities
in detail.

Declarations in Detail
----------------------

Declarations come in three flavors: constant declarations, variable
declarations, and generic declarations. Constant declarations are indicated
with `const`, variable declarations with `var`, and generic declarations with
`generic`.

The keyword is followed by the variable name. The type follows, optionally.
If the type is omitted, then it will be inferred. Finally, the initializer
follows. In the case of consts, the initializer is mandatory. Otherwise, it
can be omitted.

Here's an example of a fully specified declaration:

    var x : int = 123

The type can be omitted, and left up to type inference:

    const y = 123

And, if the declaration is a var, then the initializer can also be omitted:

    var z

Multiple declarations can be placed after a single keyword. Each type and
initializer is independent.

    var w, x = 123, y : char = 'a', z = "string"

Vars are mutable at runtime. The compiler prevents using them before they
are initialized. If the address of a variable is passed to a function, the
analysis assumes that it is being passed as an out parameter, and will
be initialized by that function.

    var a
    f(a)    /* illegal: used before defined */
    g(&a)   /* ok: assumption that g initializes a */

Consts are compile time constants, and are often placed in read only
memory by the compiler. Consts must be initialized with an expression that is
computable at compile time. Generics are closely related to constants,
although their type may contain type variables.
Myrddin has no special syntax for declaring functions. Functions are simply
declared by initializing a const or var with an anonymous function. For
example, to declare a function that takes a single argument and returns it
unmodified:

    const id = {a
        -> a
    }

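
The same function literal syntax can initialize a `var`. The sketch below is
an illustration under that assumption: `twice` is a hypothetical local
variable holding an anonymous function, which is then called like any other
function. If this works as described, it should print `42`.

```{runmyr funcvar}
use std

const main = {
    var twice

    /* a function literal assigned to an ordinary variable */
    twice = {x : int
        -> 2 * x
    }
    std.put("{}\n", twice(21))
}
```
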
Because it is desirable to make mutual recursion convenient, functions
may be declared in any order. But because there is no distinction between
functions and variables, this means that variables may also be declared in
any order. This leads to interesting effects, where it is possible to use
a variable before it is declared.

    const f = {
        y = 123
        -> y
        var y
    }

This is strongly discouraged, stylistically.

Literals in Detail
------------------

Many values in Myrddin can be written out directly in code, as literals.
Integers, characters, strings, arrays, structs, and slices are all examples.
#### Ints
Integer literals are usually written out as decimal numbers. Integers can
also be written out in hex, octal, or binary. These variants are specified
with the prefixes `0x`, `0o` or `0b`, respectively. For example:

    123     /* decimal 123 */
    0x123   /* hex 123 (291 decimal) */
    0o123   /* octal 123 (83 decimal) */
    0b101   /* binary 101 (5 decimal) */

Integer literals have a generic type, and can therefore be assigned to any
type with the `integral` and `numeric` traits. Integer suffixes can be used
to restrict the type. The integer suffixes 'b', 's', 'i', and 'l' respectively
indicate that the integer is a signed 8, 16, 32, or 64 bit integer. Adding a
`u` suffix indicates that the integer is unsigned.
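
As a quick check of the prefixes, the following sketch prints one literal in
each non-decimal base. The particular values are arbitrary; assuming the
prefixes behave as described, the output should be `291`, `15`, and `5`.

```{runmyr intlits}
use std

const main = {
    std.put("{}\n", 0x123)  /* hex: 1*256 + 2*16 + 3 */
    std.put("{}\n", 0o17)   /* octal: 1*8 + 7 */
    std.put("{}\n", 0b101)  /* binary: 4 + 0 + 1 */
}
```
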
#### Floats
Floating point literals are written using decimal notation, separating the
integer portion from the fractional portion with a period. Optionally, an
exponent may be written using either an 'e' or an 'E'. For example:

    0.5     /* 0.5 decimal */
    1.0e2   /* 100.0 decimal */

Floating point literals have a generic type, and can be assigned to any other
type with the `floating` and `numeric` traits.
#### Characters
Characters are quoted using single quotes. They represent a single Unicode
codepoint. Most characters can be written directly, but some are either
syntactically significant, or would combine with the quotes. As a result,
the following escape sequences are recognized:
<table>
<tr><td>\n</td><td>New line</td></tr>
<tr><td>\r</td><td>Carriage return</td></tr>
<tr><td>\b</td><td>Backspace</td></tr>
<tr><td>\"</td><td>Double quote</td></tr>
<tr><td>\'</td><td>Single quote</td></tr>
<tr><td>\\</td><td>Backslash</td></tr>
<tr><td>\v</td><td>Vertical tab</td></tr>
<tr><td>\0</td><td>Null character</td></tr>
<tr><td>\xDD</td><td>Hex byte. DD are two hex digits</td></tr>
<tr><td>\u{codepoint}</td><td>Unicode codepoint</td></tr>
</table>
The codepoint value for Unicode escapes is a hex encoded integer.
#### Strings
Strings are quoted using double quotes. They contain a byte slice, which
is conventionally a UTF-8 encoded string. The language, however, enforces
no such constraint on the contents of a string, and leaves the interpretation
up to the libraries using it.
The escape codes allowed in strings are the same as those allowed in
characters. Unicode escapes (`\u{codepoint}`) will be UTF-8 encoded. All other
escape codes, including hex escapes, will be inserted into the byte sequence
uninterpreted.
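
The difference between the two kinds of escape shows up in the length of the
resulting byte slice. In this sketch, the variable names are hypothetical;
U+00E9 takes two bytes in UTF-8, so the first length should be `2`, while the
hex escape contributes a single raw byte, giving `1`.

```{runmyr strlen}
use std

const main = {
    var accent = "\u{e9}"   /* codepoint U+00E9, UTF-8 encoded */
    var raw = "\xe9"        /* one uninterpreted byte */

    std.put("{}\n", accent.len)
    std.put("{}\n", raw.len)
}
```
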
#### Arrays and Slices
Array literals are written as comma separated sequences of values enclosed in
square brackets. Optionally, indexes can be given to the initialized values.
If there are gaps in an indexed initializer sequence, then the missing values
are zero initialized. For example:

    /* packed 3 element array */
    x = [1,2,3]
    /* 74 element array, with x[0]==1, x[73] == 2 */
    x = [0: 1, 73: 2]

There is no dedicated slice literal syntax in Myrddin, but slices can be taken
off of array literals, giving a compact syntax that serves the purpose.

    sl = [1,2,3][:]

Beware: array literals within functions are allocated on the stack, so a
slice taken from one is only valid for as long as the array literal itself.
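
Slices can also cover just part of an array. The following sketch uses
made-up names and values: it slices the middle two elements out of a local
array and iterates over them. The slice is only used while `arr` is still in
scope, which keeps it within the lifetime rule above.

```{runmyr slices}
use std

const main = {
    var arr = [10, 20, 30, 40]
    var sl

    /* elements at index 1 and 2; the upper bound is exclusive */
    sl = arr[1:3]
    std.put("{} elements\n", sl.len)
    for v : sl
        std.put("{}\n", v)
    ;;
}
```
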
#### Structs
Struct literals are written as comma separated sequences of initializers
enclosed in square brackets. Initializers come in the form `.membername =
value`. In order for the compiler to be able to tell apart a struct literal
and an array literal, at least one initializer is needed. For example:

    type example = struct
        a : int
        b : int
    ;;

    var x : example
    x = [.a=123]

If a member of a struct is not initialized by the literal, it is zeroed.
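
Wrapping the fragment in a `main` makes the zeroing visible. This is a sketch
that reuses the `example` type from above; the output format is an arbitrary
choice. Since only `a` is initialized, it should print `a=123, b=0`.

```{runmyr structlit}
use std

type example = struct
    a : int
    b : int
;;

const main = {
    var x : example

    /* b is not mentioned, so it is zeroed */
    x = [.a=123]
    std.put("a={}, b={}\n", x.a, x.b)
}
```
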
#### Unions
Unions are constructed by prefixing a value of the appropriate type with the
union tag. If the union has no value for the tag, then the tag stands on its
own as a constructor. For example:

    uval = `Tag2 123
    uval = `Tag1

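
The fragment above does not show the union that declares `Tag1` and `Tag2`.
One possible definition, assumed here purely for illustration, is a union in
which `Tag1` has no body and `Tag2` carries an `int`. The hypothetical `show`
function then takes the values apart again with a match.

```{runmyr unionlit}
use std

/* assumed definition; the tutorial fragment leaves the type unspecified */
type tagged = union
    `Tag1
    `Tag2 int
;;

const show = {uval
    match uval
    | `Tag1: std.put("Tag1, no body\n")
    | `Tag2 n: std.put("Tag2 with body {}\n", n)
    ;;
}

const main = {
    show(`Tag2 123)
    show(`Tag1)
}
```
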

Operators In Detail
-------------------