-
Notifications
You must be signed in to change notification settings - Fork 1
/
julia-basics.qmd
711 lines (525 loc) · 30.5 KB
/
julia-basics.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
---
date: "2024-07-01"
engine: julia
---
# Julia Basics {.unnumbered}
## Key Terms
- **REPL**: The Julia REPL is the Julia Read-Eval-Print-Loop.
This is the interactive command line interface for Julia.
When you start Julia in the command line (terminal in Mac/Linux, command prompt in Windows), you are in the REPL, and it is a common way to interact with Julia.
- **Package**: A package is a collection of code that can be used to extend the functionality of Julia and complete specific tasks.
Packages are installed using the `Pkg` package manager.
- **Variable**: A variable is a value or object that you have assigned a name.
This may be as simple as a number or a sentence (a string variable), or as complex as a model or a plot.
- **Function**: A function is a block of code that performs a specific task.
Functions are called by name and can take arguments, before completing some computation and returning a value or object.
Sometimes functions are written and called for their side effects, i.e. they do not directly return an object, but instead perform some action.
- **Method**: A method is a specific implementation of a function.
- **Multiple Dispatch**: Multiple dispatch is a really exciting feature of Julia, but also one that is more difficult to understand for newer programmers.
The basic premise is that in Julia, how functions behave depends on the types of the arguments that are passed to them.
For example, the `*` operator (function) will behave differently if you try to multiply two integers (whole numbers), two floats (numbers with decimals), two matrices, any combination of these etc.
Each of these different behaviors is a different **method** of the `*` function.
## What to Expect
As mentioned previously, this book (and page) is not meant to provide a ground-up description of everything you need to know about Julia.
Instead, we'll give an overview of some of the key concepts and features that should provide enough of an understanding that you can start using Julia with reasonable confidence.
It'll likely take a couple of passes through this page to really get a good understanding of the concepts, and that's okay!
It's meant to act as a reference so you can come back to it later if you don't understand something in the later, more-applied, sections.
At the bottom of this page are some additional resources that you can use to gain a deeper understanding of Julia.
## Data Types and Structures
There are a number of different data types and structures in Julia.
Here are the key ones for your to be aware of.
**Data Types**:
- **Integer**
- Whole numbers
- **Float**
- Numbers with decimals
- **Boolean**
- `true` or `false` (written in lowercase)
- `true` has equal value to `1` e.g. `1 == true`
- `false` has equal value to `0` e.g. `0 == false`
::: {.callout-note}
In Julia, the `==` operator is used to check if two values are equal.
It returns a boolean value, `true` or `false`, depending on whether the values are equal or not i.e. `1 == true` returns `true` because `1` and `true` are equal!
It is different to the `=` operator, which is used to assign a value to a variable (see [this section below](#assignment) for more details on variables).
In Julia there is also the `===` operator, which is used to check if two values are identical.
This is different to the `==` operator, which checks if two values are equal.
For example, `1 == true` returns `true` because `1` and `true` are equal, but `1 === true` returns `false` because `1` and `true` are not identical (they are not stored in the same location in memory in the computer).
:::
- **Char**
- A single character e.g. `"H"`
- **String**
- A sequence of characters e.g. `"Hello World!"`
**Data Structures**:
- **Array**
- An **array** is a collection of values that are all the same type.
Arrays can be one-dimensional (**vectors**), two-dimensional (**matrices**), or multi-dimensional.
Arrays are **mutable**, meaning that they can be changed after they are created.
An example of an array is `[1, 2, 3]`
- **DataFrame**
- A **DataFrame** is a special type of **array** created by the `{DataFrames}` package that is used to store tabular data.
It is a collection of columns, where each column is an **array** of the same type.**DataFrames** are **mutable**, meaning that they can be changed after they are created.
- **Tuple**
- A **tuple** is a collection of values that do not all have to be the same type.
**Tuples** are very useful because they require very little memory, so are fast to create and access.
They are also **immutable**, meaning that they cannot be changed after they are created, but because they are so fast to create, you can just create a new tuple with the values you want.
An example of a **tuple** is `("John", 25, 1.8)`
- **Dictionary**
- A dictionary is a collection of key-value pairs that do not need to be of the same type.
**Dictionaries** are **mutable**, and are very useful for storing data that you want to access by a key (i.e. name), rather than an index.
For example, you might want to store a person's name, age, and height e.g. `Dict("name" => "John", "age" => 25, "height" => 1.8)`
- **Named Tuple**
- A variant of the **tuple** is the **named tuple**.
It is a cross between a **tuple** and a **dictionary**, and therefore has the benefits of being able to access values with keys instead of indices (though you use indices), but it is **immutable** and much smaller and faster than a **dictionary**.
For our person example, a **named tuple** would look like `(; name = "John", age = 25, height = 1.8)`.
Note the `;` at the beginning of the tuple - the use of semicolons is common in Julia to separate named arguments from unnamed arguments in functions, and while it is not essential to create a named tuple with length > 1, it must be used for a named tuple with only one element (`(name = "John", )` with a `,` after the pair could similar be used for 1-element named tuples).
- **Structs**
- A **struct** is a custom data type that you can create to store data.
It is similar to a **named tuple** in that it is **immutable** and you can access values with keys instead of indices.
One reason you may prefer to use a **struct** over a **named tuple** is that you can define methods for a **struct** (see [the multiple dispatch section](#multiple-dispatch) for more details).
Creating **structs** are out of the scope of this book, but it is important to know that they exist and are a useful tool for organizing your data.
If you want to learn more about **structs**, check out the [documentation](https://docs.julialang.org/en/v1/manual/types/#Composite-Types) and [this tutorial](https://www.youtube.com/watch?v=pHQe3PiYY1w).
If you have an object and want to tell what type it is, you can use the `typeof()` function.
If you have an array and want to tell what type the elements of the array are, you can use the `eltype()` function.
## Variables
Variables really just stored pieces of information that you've given a name to.
This is useful because it allows you to run a calculation, for example, and then save it for use later on.
That way you don't need to run the calculation again, you can just pull the value out of storage!
A slightly different example is if you have a constant value that you use multiple times in your code, e.g. the size of a population.
Rather than typing out the value every time you need it, you can just store it in a variable and use the variable name instead.
This not only saves you time and makes your code more readable, but also can reduce the chance of making a mistake (e.g. if you accidentally type the wrong value when copying it to a new calculation).
### Assignment
Now we know what **variables** are, let's look at how to create them.
As mentioned earlier, we use the `=` operator to assign a value to a **variable**.
For example, if we wanted to create a variable called `x` and assign it the value `1`, we would write `x = 1`.
But we aren't just restricted to numbers, we can assign any type of value to a variable.
This includes **strings**, **arrays**, **tuples**, **dictionaries**, and **structs**.
Earlier, when talking about data structures, we used the example of a person's name, age, and height.
Let's see how we can create **tuples**, **dictionaries**, and **dataframes** to store this information.
```{julia}
using DataFrames # We need to load the DataFrames package to create a DataFrame
john_tuple = ("John", 25, 1.8)
john_ntuple = (name = "John", age = 25, height = 1.8)
john_dict = Dict("name" => "John", "age" => 25, "height" => 1.8)
john_df = DataFrame(name = "John", age = 25, height = 1.8);
```
::: {.callout-note}
When creating a dictionary, you can use the `=>` operator to assign a value to a key.
The key is always on the left, and the value is always on the right.
At the end of the test array assignment, we have a semicolon (`;`).
This has nothing to do with the array, but is used to suppress the output of the assignment, so when we run the code, we don't see the array printed to the screen.
:::
Because a person's name is a string, their age is an integer, and their height is a float, we cannot create an array to store this information, because arrays can only store values of the same type.
To show how we can create and access **arrays**, let's create a **vector** (1-D array) of multiple people's names, as well as a random **matrix** (2-D array).
```{julia}
people_vec = ["John", "Jane", "Joe"]
test_arr = [1 2 3; 4 5 6; 7 8 9]
```
::: {.callout-note}
When creating a matrix, you can use a semi-colon to separate rows in the matrix.
One alternative is to specify the exact positions of each value e.g.
```julia
test_arr = [
1 2 3
4 5 6
7 8 9
]
```
:::
### Accessing Values
To access the value stored in a variable, we can often use indices.
Julia, like R, is a 1-indexed language, meaning that the first element in an array has an index of 1, not 0 (like Python).
In our examples, the first element of the objects we created is the person's name, so we can access it with an index of 1.
```{julia}
john_tuple[1] # "John"
john_ntuple[1] # "John"
people_vec[1]
```
For **dataframes** and multi-dimensional **arrays**, we have to make a slight modification to use a comma that separates the indices for each dimension.
In an array/dataframe, the first index is the row number, and the second index is the column number.
To access the element in the first row and the first column of the array, we would use the following code.
```{julia}
test_arr[1, 1]
john_df[1, 1]
```
If we want to access an entire row or column, we can use the `:` operator.
For example, if we want to access the first column of the array, we can use the following code.
```{julia}
test_arr[:, 1]
```
```{julia}
john_df[:, 1]
```
If we want to access the first row of the array, we can use the following code.
```{julia}
test_arr[1, :]
```
::: {.callout-note}
In all cases where we used the `:` operator, we get a **column vector** as the output, not a single value, regardless of whether we are extracting a row or a column from the original array!
:::
However, none of these methods work for **dictionaries**.
For dictionaries, you need to specify the key of the value you want to access.
```{julia}
john_dict["name"]
```
You can also use the key (or column name) to access the value in **dataframes** and **named tuples**.
```{julia}
john_df[1, :name] # The : operator before the column name turns it into a symbol that can be used to index the dataframe
john_df[1, "name"]
john_ntuple.name
```
## Functions
### Overview
Functions are a core part of programming in Julia, and programming in general.
A function is a block of code that performs a specific task.
As has been said before, a function is like a recipe you might use to bake a cake.
The recipe tells you what ingredients you need, how to combine them, and how long to bake them for.
And like a recipe, a function can be used over and over again to produce the same result (assuming you have identical inputs).
This is a really powerful concept, and helps make your work and research reproducible by breaking up your code into small, reusable, and understandable chunks.
And because it is meant to be reused, it will save you time in cases when you need to do the same thing multiple times (you don't want to have to write the same code over and over again)!
So let's look at a simple example of a function in Julia, and use it to explore some of the key concepts of functions.
Say we want to take a number, multiply it by 2, and then divide the result by 3.
You could just write this out explicitly, but what if you want to do this for a bunch of different numbers?
This is where a function comes in handy.
```{julia}
function multiply_by_two_divide_by_three(x)
y = x * 2
z = y / 3
return z # it's good practice to explicitly return a value (or nothing in special cases)
end
```
This function takes a single **argument**, `x`, and then multiplies it by 2 and divides it by 3.
The `return` keyword tells Julia what value to return from the function.
It also tells Julia that the function is finished, and it will not execute any code after the `return` statement.
Let's try using this function.
```{julia}
multiply_by_two_divide_by_three(3)
```
```{julia}
multiply_by_two_divide_by_three(10)
```
Note that in both of these examples, a floating point number is returned i.e., a number with decimals.
::: {.callout-note}
Without going into too much detail, it is good practice to give functions short, descriptive names.
A good example would be `cumsum()` that is provided in Julia and calculates the cumulative sum of a vector.
If a function name is too long to write without separating the words, use snake case (words separated by underscores) e.g. `multiply_by_two_divide_by_three()` rather than leaving as a single block of text (`multiplybytwodividebythree()`), or using camelCase (`multiplyByTwoDivideByThree()`).
It is also good practice to add a docstring to your function.
This is a short description of what the function does, and can be accessed by typing `?` followed by the function name in the REPL.
This means that you can quickly understand exactly what a function does without having to work your way through the code, really helping others who may read your code, but also future you if you revisit a project.
An example of adding a docstring to a function may be as simple as adapting our original code to look like the following.
```julia
"""
multiply_by_two_divide_by_three(x)
Multiply `x` by 2 and divide by 3.
"""
function multiply_by_two_divide_by_three(x)
y = x * 2
z = y / 3
return z
end
```
Read more about docstrings [here](https://docs.julialang.org/en/v1/manual/documentation/).
:::
### Arguments & Keyword Arguments
Unlike R, Julia makes a distinction between arguments and keyword arguments.
Arguments are the values that are passed to a function.
In the example function above, `x` is an argument.
In Julia, arguments are **positional**, meaning that the order in which you pass them to a function matters.
To see this in practice, let's write a new function that takes two arguments, `x` and `y`, and multiplies them together after minusing one from argument `x` and adding one to argument `y`.
```{julia}
function multiply_together_offsets(x, y)
z = (x - 1) * (y + 1)
return z
end
```
Because Julia uses positional arguments, the following two function calls will return different values, even though the numbers `5` and `10` are used in both.
```{julia}
multiply_together_offsets(5, 10)
```
```{julia}
multiply_together_offsets(10, 5)
```
**Keyword arguments** are arguments that are passed to a function by name.
Generally speaking, keyword arguments are used to set default values for arguments that can be changed by the user.
Let's modify our `multiply_together_offsets` function to use keyword arguments.
```{julia}
function multiply_together_offsets(x, y; offset_x = 1, offset_y = 1)
z = (x - offset_x) * (y + offset_y)
return z
end
```
We have added two new arguments to the function, `offset_x` and `offset_y`, and given them default values of `1`.
::: {.callout-tip}
It is not necessary, but it is generally good style to place keyword arguments after all positional arguments, as well as separating them from positional arguments using a semi-colon (`;`), rather than a comma.
:::
Now, when we call the function, we can specify the values of these arguments by name.
```{julia}
multiply_together_offsets(5, 10)
```
```{julia}
multiply_together_offsets(5, 10; offset_x = 2, offset_y = 3)
```
### Scope
Scope is a relatively complicated concept, but it is important to understand it in order to write functions that are easy to understand and debug.
Scope refers to the visibility of variables within a function.
In Julia, variables that are defined within a function are not visible outside of the function.
The reverse is not true, however.
Variables that are defined outside of a function are visible within the function, but cannot be modified.
Let's look at some examples.
```{julia}
function add_one(x)
y = x + 1
return y
end
add_one(5)
```
```{julia}
#| error: true
y
```
In this case, `y` is defined within the function `add_one()` i.e. is a **local variable**, and is therefore not visible outside of the function, but it can be used within the function!
```{julia}
global_x = 5
function print_global_x()
return println(global_x)
end
print_global_x()
```
In this example, `global_x` is defined outside of the function `print_global_x()`, and is therefore visible within the function, but it cannot be modified.
::: {.callout-warning}
It's not good practice to access **global variables** in your functions.
Instead, if you want to use a variable in your function, pass it as an argument.
:::
```{julia}
function modify_global_x()
global_x = 10
return global_x
end
modify_global_x()
global_x
```
Here, we have tried to modify `global_x` within the function `modify_global_x()`, but this has not worked.
It looks like it worked when we called the function, but when we check the value of `global_x` outside of the function, it is still `5`.
### Multiple Dispatch
Multiple dispatch is the idea that the behavior of a function depends on the types of the arguments that are passed to it (as well as the number of arguments).
To illustrate this, let's go back to our original example function `multiply_by_two_divide_by_three()`.
In the example above, we passed a single argument to the function, and it returned a floating point number.
But what if we wanted to pass multiple numbers to the function, and have it return a vector of the results?
We could do this by specifying another **method** of the function that accepts a tuple of numbers as an argument.
```{julia}
function multiply_by_two_divide_by_three(x::Tuple)
y = zeros(Float64, length(x))
z = similar(y)
for i in eachindex(x)
y[i] = x[i] * 2
z[i] = y[i] / 3
end
return z
end
```
We have defined a new method of the function (i.e., a new way of using the function) by specifying the type of the argument `x` as a tuple (`::Tuple`), and this is illustrated in the printout `multiply_by_two_divide_by_three (generic function with 2 methods)`.
::: {.callout-note}
You don't have to understand exactly what the code is doing here (but have a look at the [for loop](#for-loop) section if you're interested).
Neither is the code particularly efficient, but it's a relatively readable way to illustrate the concept of multiple dispatch.
:::
Now let's test out our new function method
```{julia}
multiply_by_two_divide_by_three((1, 2, 3))
```
And we can see that the original method that just takes a single number as an argument still works.
```{julia}
multiply_by_two_divide_by_three(3)
```
::: {.callout-warning}
It is important to note that **keyword arguments** are not considered in multiple dispatch i.e., trying to define a new method of a function that differs only by keyword arguments *will not* create a new method, but just overwrite the old one.
So if you want/need a new method, use **positional arguments**.
:::
## Packages
Packages are an essential part of the Julia ecosystem.
You've already seen an example of a package in action: `{DataFrames}`.
At their core, a package is a way for someone to share code, data, and documentation with other people.
By design, Julia can't do everything for everyone straight out of the box.
Not only would it be an impossible task for the Julia developers to create a language that can do everything, but it would also be incredibly slow to load and run.
Instead, packages extend the abilities of Julia by providing additional features (through functions) that are not included in the base language.
The `{DataFrames}` package, for example, creates a special data structure that is very easy to read, as well as providing a number of functions that make it easy to manipulate and analyze data.
To add a package to your Julia environment (project), you can use the `add` command in the package manager (accessed by pressing `]` in the REPL).
Then, you can use the `using` command to load the package into your current Julia session.
See [this page](new-project.qmd) for more information in the context of setting up a new project.
## Control Flow
Control flow refers to the order in which the statements in a program are executed.
There are many different ways to control how a program is executed, but we will focus on the most common ones here.
See the [Julia documentation](https://docs.julialang.org/en/v1/manual/control-flow/) for more information.
### If Statements
If statements are a way to control whether or not a block of code is executed, and fall under the general category of **conditional evaluation** (but I think "if statements" gives you a more intuitive sense of what we're talking about in this section).
There are many uses for conditional evaluation, so we'll just show you some examples of how to use it, and you can explore further if you need to.
The following is an example from the [Julia documentation](https://docs.julialang.org/en/v1/manual/control-flow/#man-conditional-evaluation-1).
```{julia}
function number_relations(x, y)
if x < y
relation = "less than"
elseif x == y
relation = "equal to"
else
relation = "greater than"
end
return println("x is ", relation, " y.")
end
number_relations(2, 1)
```
In this example we are using an `if` statement to determine the relationship between two numbers.
It's important to note that the **conditional statements** are evaluated in sequence, and the first one that evaluates to `true` is executed i.e. `if` then `elseif` then `else` in this example.
::: {.callout-note}
`elseif` and `else` statements are both optional (i.e. just an `if` statement is valid), and you can have as many `elseif` statements as you like (including 0 i.e. just `if` and `else` statements).
:::
#### Short-Circuit Evaluation
If you want to check multiple conditions, you can use the `&&` (and) and `||` (or) operators.
These is known as **short-circuit evaluation**.
For example, if we wanted to check if a number is between 0 and 10, we could do the following.
```{julia}
function number_between(x)
if x > 0 && x < 10
println("x is between 0 and 10")
else
println("x is not between 0 and 10")
end
end
number_between(3)
```
```{julia}
number_between(11)
```
In `number_between(3)`, x is greater than 0 and less than 10, so both conditions evaluate to `true`, and the code in the `if` statement is executed.
In `number_between(11)`, x is greater than 0, but not less than 10, so while the first condition evaluates to `true`, the second condition evaluates to `false`, so the code in the `else` statement is executed.
This is important to understand - all conditions must evaluate to `true` for the code in the `if` statement to be executed!
Based on this, try to think about why the following code also works.
```{julia}
function number_between2(x)
if x > 0 && ((x > 10) == false)
println("x is between 0 and 10")
else
println("x is not between 0 and 10")
end
end
number_between2(3)
```
### Iteration
Iteration is a useful concept in programming, is a pretty intuitive way to think about many problems we come across in epidemiology (once you get used to it), and is very fast in Julia, so it's worth spending some time to understand it.
Do not expect to understand everything about iteration after reading this section, and you will likely need to come back to refer to it as you go through the book, but hopefully it will provide a good starting point for you to explore further.
#### For Loop
The most common way to iterate in Julia is using a `for` loop.
We have already seen a `for` loop in the [multiple dispatch](#multiple-dispatch) section, but let's look at a simpler example.
Let's say we want to calculate the sum of the numbers from 1 to 10 (cumulative sum) i.e. 1 + 2 + 3 + ... + 10.
Julia has an in-built function to do this (`cumsum()`), but let's write our own function to do it using a `for` loop.
There are multiple ways we could write this function, but the most intuitive way is to go through each of the numbers in 1 to 10, and add them to a running total.
```{julia}
function mycumsum(x)
y = 0 # Initialize our running total to 0
# For each number in x, add it to our running total
for i in x
y += x[i] # This is equivalent to y = y + x[i]
end
return y
end
mycumsum(1:10)
```
#### While Loop
Another way to iterate is using a `while` loop.
The difference between a `for` loop and a `while` loop is that a `for` loop iterates over a sequence of values, whereas a `while` loop iterates until a condition is met.
For example, let's say we want to keep adding numbers to our running total until the total is greater than 100 (and stop counting).
We might not know how many numbers we need to add to get to 100, so we can't use a `for` loop, but we can use a `while` loop.
```{julia}
function mycumsum2(x)
y = 0 # Initialize our running total to 0
i = 1 # Initialize our counter to 1
# While our running total is less than 100, add the next number to our running total
while y < 100
y += x[i] # This is equivalent to y = y + x[i]
i += 1 # Update our counter so we can add the next number
end
return println("We added ", i, " numbers to get to ", y)
end
mycumsum2(1:100)
```
::: {.callout-caution}
`while` loops are very useful in many situations, but are more dangerous than `for` loops, because it's easy to get stuck in an infinite loop.
For example, if we accidentally started our cumulative sum between 0:100 and forgot to update our counter, we would never reach our condition of y < 100, and the loop would never end.
To avoid this, people often add `break` statements to their `while` loops, which will break out of the loop if a certain condition is met i.e. if we added 100 numbers and still haven't reached 100, we can early exit out of the loop.
Generally speaking, use a `for` loop if you can, and be careful when using `while` loops.
:::
#### Map
An alternative to loops is the `map()` function.
If you are familiar with **functional programming** (or the [`{purrr}`](https://github.com/tidyverse/purrr) package and functions in **R**), the `map()` function will be easy to grasp.
If not, not to worry as it's just a different method of applying a set of functions to each element in a sequence.
The main difference to be aware of is that each application of the function(s) happen independently from each other, so you can't increment a counter and then update a value from the prior iteration of a loop.
Say we want to calculate take an array of integers and return an array of their squares.
We can use a for loop (shown in the folded code below) to do this.
Or we could use the `map()` function, which accepts an array and a function as arguments.
:::{.callout-note collapse="true"}
```{julia}
function squares(x)
y = zeros(eltype(x), length(x))
for i in eachindex(x)
y[i] = x[i]^2
end
return y
end
squares(1:10)
```
:::
```{julia}
map(x -> x^2, 1:10)
```
In the above example, we are using an anonymous function that takes each element of the array `1:10`, assigns it to the variable `x`, and then squares it before returning it in a Vector.
The `map()` function returns a vector of the same length as the input array, which is nice when we know want the output vector to be the same length as the input vector, as it removes the need for us to manually perform **bounds checking**.
In slightly more complicated scenarios, the anonymous function may become unwieldy.
Here, we can either write a named function that we use in the `map()` function, or use a `do` block.
```{julia}
function square_element(x)
return x^2
end
map(square_element, 1:10)
```
Much like the anonymous function, when we use a `do` block, we assign the elements being iterated over to a variable name that we can manipulate.
```{julia}
map(1:10) do x
x^2
end
```
The `do` block is useful when we only want to perform some operations once, so it's not necessary to create a named function.
It is also very helpful when we want to pass multiple arguments to a function, as we will see.
Here, we have two vectors of the same length and we want to perform element-wise addition i.e., add the first index of each vector together, the second elements, and so on.
Doing this without the `do` block is possible, but much more cumbersome.
Here, we can use the `zip()` function to combine the two vectors into a vector of tuples that can be iterated over by `map()`.
```{julia}
vec_a = 1:10
vec_b = 11:20
map(zip(vec_a, vec_b)) do (a, b)
a + b
end
```
::: {.callout-note collapse="true"}
```{julia}
collect(zip(vec_a, vec_b))
```
:::
::: {.callout-note collapse="true"}
Without the `do` block, we could write this.
Note the `,` between the two closing parentheses in the anonymous function i.e., `...b),)`
```{julia}
map(
((a, b),) -> a + b,
zip(vec_a, vec_b)
)
```
:::
## Additional Resources
I'd recommend checking out the following resources to learn more about Julia (roughly in descending order of preference due to complexity and target audience)
- [Scientific Computing for the Rest of Us](https://sciencecomputing.io)
- [Julia Academy](https://juliaacademy.com)
- [Julia Data Science](https://juliadatascience.io)
- [Doggo JL](https://www.youtube.com/@doggodotjl)
- [Julia Documentation](https://docs.julialang.org/en/v1/)
- [Quantitative Economics with Julia](https://julia.quantecon.org/intro.html)
- [Think Julia](https://benlauwens.github.io/ThinkJulia.jl/latest/book.html)
- [PumasAI Tutorials](https://tutorials.pumas.ai)
- [Julia For Economists Video Series](https://www.youtube.com/playlist?list=PLbuwVVKCI3sRW0Y5ehBFwdFVuyuy87ram)
- [Advanced Scientific Computing: Producing Better Code](https://www.youtube.com/playlist?list=PL-G47MxHVTewUm5ywggLvmbUCNOD2RbKA)