# Critical :
C1 - Line 230 of script 4 (data = response.read()) sometimes crashes.
Wrap it in a "try / except" block to handle the failure at this point.
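A minimal sketch of the try / except around the read. It assumes script 4 uses `urllib.request.urlopen` and that the crashes are transient network errors (truncated body, timeout, reset connection). The function name `read_with_retry`, the retry count, the delay and the set of caught exceptions are hypothetical choices. Returning `None` lets the caller log the failure and move on instead of stopping the whole run.

```python
import http.client
import socket
import time
import urllib.error
import urllib.request

# Exceptions assumed to be transient when reading a Gallica response.
# HTTPError is a subclass of URLError, so HTTP error statuses are retried too.
TRANSIENT_ERRORS = (
    http.client.IncompleteRead,  # body cut off before the announced length
    urllib.error.URLError,
    ConnectionError,
    TimeoutError,
    socket.timeout,  # separate class before Python 3.10, alias afterwards
)


def read_with_retry(url, retries=3, delay=5.0, opener=urllib.request.urlopen):
    """Return the body of `url`, or None if every attempt fails.

    `opener` defaults to urlopen; it is a parameter only so it can be swapped out.
    """
    for attempt in range(1, retries + 1):
        try:
            with opener(url, timeout=60) as response:
                return response.read()  # the read that sometimes crashes in script 4
        except TRANSIENT_ERRORS as exc:
            print(f"Attempt {attempt}/{retries} failed for {url}: {exc!r}")
            if attempt < retries:
                time.sleep(delay)  # wait before trying again
    return None
```

Assuming the URL is available at that point in script 4, `data = response.read()` could become `data = read_with_retry(url)` followed by an `if data is None:` branch that logs and skips the document.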
# Highly Required :
# Recommended :
R1 - Rename the temporary file using a common prefix + the name of the
input file?
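One possible reading of R1, as a sketch: keep the temporary file next to the input file and build its name from a shared prefix plus the input file's name, so each input gets its own temporary file and leftovers are easy to spot. The prefix `gallica_tmp_`, the example file name and both helper names are hypothetical.

```python
from pathlib import Path

TEMP_PREFIX = "gallica_tmp_"  # hypothetical common prefix


def temp_name_for(input_file, prefix=TEMP_PREFIX):
    """Return the temporary file path for `input_file`, in the same folder."""
    path = Path(input_file)
    return path.with_name(prefix + path.name)


def leftover_temp_files(folder, prefix=TEMP_PREFIX):
    """List temporary files left behind in `folder`, e.g. after a crash."""
    return sorted(Path(folder).glob(prefix + "*"))
```

For example, `temp_name_for("lists/journal_1893.txt")` gives `lists/gallica_tmp_journal_1893.txt`.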
# Why not :
WN1 - Create another script 3 variant that downloads TXT only
### Débats Chambre des Députés : pictures are obtained by adding another 'f' segment to the URL
https://gallica.bnf.fr/ark:/12148/bpt6k6494792j
https://gallica.bnf.fr/ark:/12148/bpt6k6494792j/f1.image/f2.jpeg?download=1
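The picture URL pattern above can be wrapped in a small builder. This is only a sketch: it copies the single example literally, and treating the two numbers as separate `view` and `page` parameters (both hypothetical names) is an assumption, since the note does not say how `f1.image` and `f2.jpeg` relate.

```python
GALLICA_ARK_BASE = "https://gallica.bnf.fr/ark:/12148/"


def debats_picture_url(ark_id, view=1, page=2):
    """Build the picture download URL for a Débats issue.

    Follows the single example in this note; `view` and `page` are
    hypothetical names for the two 'f' segments.
    """
    return f"{GALLICA_ARK_BASE}{ark_id}/f{view}.image/f{page}.jpeg?download=1"
```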
### Tables annuelles (some of them are paid, on Retronews) :
https://gallica.bnf.fr/ark:/12148/bpt6k3997476x
https://gallica.bnf.fr/ark:/12148/bpt6k3997476x/f2.highres
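A sketch that lists the `.highres` URLs for a range of pages, following the `f2.highres` example above. It assumes the number after `f` is the page (view) number and can simply be incremented; `highres_urls` is a hypothetical helper. As the heading notes, some volumes may only be available through Retronews.

```python
GALLICA_ARK_BASE = "https://gallica.bnf.fr/ark:/12148/"


def highres_urls(ark_id, first_page, last_page):
    """Return the .highres URL of every page from first_page to last_page inclusive."""
    return [
        f"{GALLICA_ARK_BASE}{ark_id}/f{page}.highres"
        for page in range(first_page, last_page + 1)
    ]
```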
# -- DONE --
HR1 - Put the date in front of the Ark ID to make the dates easier to manage
HR2 - Create another script 3 variant that downloads PDF only
HR3 - Check the logs and errors
HR4 - Modify script 2 to resolve multiple Ark IDs behind one date
Example:
https://gallica.bnf.fr/ark:/12148/cb371291967/date.item
https://gallica.bnf.fr/ark:/12148/cb371291967/date1940
https://gallica.bnf.fr/ark:/12148/cb371291967/date19400101
=>
https://gallica.bnf.fr/ark:/12148/bpt6k9702712r?rk=21459;2
https://gallica.bnf.fr/ark:/12148/bpt6k2095638x?rk=42918;4
=>
https://gallica.bnf.fr/ark:/12148/bpt6k9702712r
https://gallica.bnf.fr/ark:/12148/bpt6k2095638x
Instead of :
https://gallica.bnf.fr/ark:/12148/cb328020951/date.item
https://gallica.bnf.fr/ark:/12148/cb328020951/date1893
https://gallica.bnf.fr/ark:/12148/cb328020951/date18930201
=>
https://gallica.bnf.fr/ark:/12148/bpt6k64285087.item
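The HR4 resolution step can be sketched as follows, assuming script 2 has already collected the issue links found on a date page (such as the `?rk=` links above). It keeps each issue's Ark ID, drops the query string and the `.item` suffix, and removes duplicates while preserving order, so the one-issue and several-issue cases go through the same code. The function names are hypothetical, as is the assumption that issue Ark IDs start with `bpt6k` (true of every example in this note).

```python
import re

GALLICA_ARK_BASE = "https://gallica.bnf.fr/ark:/12148/"

# Assumption: issue Ark IDs are 'bpt6k' followed by lowercase letters/digits,
# as in every example of this note. The match stops at '?' or '.item'.
ISSUE_ARK_RE = re.compile(r"ark:/12148/(bpt6k[0-9a-z]+)")


def issue_arks(urls):
    """Return the distinct issue Ark IDs found in `urls`, in first-seen order."""
    arks = []
    for url in urls:
        match = ISSUE_ARK_RE.search(url)
        if match and match.group(1) not in arks:
            arks.append(match.group(1))
    return arks


def issue_urls(urls):
    """Rebuild clean issue URLs (no '?rk=' query, no '.item' suffix)."""
    return [GALLICA_ARK_BASE + ark for ark in issue_arks(urls)]
```

Links to the periodical itself (the `cb...` date pages) do not match the pattern, so they are ignored.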