NOREC4DNA is an all-in-one suite for analyzing, testing, and converting data into DNA chunks for use in a DNA storage system, using integrated DNA rules as well as the MOSLA DNA-Simulation-API.
NOREC4DNA implements LT, Online, and Raptor (RU10) Fountain Codes.
- Building the docker container from source:
git clone git@github.com:umr-ds/NOREC4DNA.git
docker build . --tag norec4dna
- Pulling the container from Docker Hub:
- TBA once NOREC4DNA is available on Docker Hub
- Clone the repository:
git clone git@github.com:umr-ds/NOREC4DNA.git
- OPTIONAL: create a virtual environment (recommended):
python3 -m venv <name_of_virtualenv>
- activate/source the newly created venv
- Install the dependencies:
- depending on your distro, you might need to manually install LLVM as well as gcc and build-essential
pip3 install -r requirements.txt
- if a package fails to install, you might need to install the python3-dev package using apt
- Install NOREC4DNA:
python3 setup.py install
If you plan to build NOREC4DNA from source on Windows, we recommend using Anaconda.
To get the encoded and decoded files out of NOREC4DNA when using Docker, you might need to map a volume into the container.
Alternatively, you could use the docker cp command to transfer the desired files.
Build the docker container:
docker build . --tag norec4dna_gd
Run:
docker run --name norec4dna_gd_multiple_files -d -t -v /tmp/norec4dna/:/norec4dna/tmp norec4dna_gd <Parameters>
Alternatively you can run the script directly:
python3 find_minimum_packets.py <Parameters>
First enter the filename of the file to generate the packets for.
FILE (--parameters)
The following parameters can be set:
--repair_symbols=[no_symbols]
Number of repair symbols for Reed-Solomon (default=2). This only applies if --error_correction is set to reedsolomon.
--list_size=[size]
Size of the operational list per thread, inferred from the number of cores if --sequential is set. The list size should always be greater than out_size to ensure optimal results (default=1000).
--out_size=[size]
Number of packets to save after combining the lists and sorting them by each packet's error_prob (default=1000).
--chunk_size=[size]
Size of the chunks to split the file into, inferred from the number of chunks and the file size if not set (default=0).
--number_of_chunks=[no_chunks]
Number of chunks to split the file into, ignored if chunk_size is set to a value != 0 (default=300).
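The interplay between --chunk_size and --number_of_chunks can be sketched as follows. This is only an illustration: the helper name infer_chunk_size is hypothetical and not part of NOREC4DNA, and it assumes the chunk size is derived by ceiling division of the file size by the number of chunks. The actual implementation may round differently.

```python
def infer_chunk_size(file_size: int, number_of_chunks: int = 300, chunk_size: int = 0) -> int:
    """Return the chunk size in bytes (hypothetical helper, not NOREC4DNA API).

    An explicit chunk_size != 0 takes precedence; otherwise the size is
    derived from the file size and the requested number of chunks.
    """
    if chunk_size != 0:
        return chunk_size
    if number_of_chunks <= 0:
        raise ValueError("number_of_chunks must be positive")
    # Integer ceiling division (assumption): the last chunk may be partially filled.
    return -(-file_size // number_of_chunks)


if __name__ == "__main__":
    print(infer_chunk_size(file_size=3001))                 # derived from number_of_chunks
    print(infer_chunk_size(file_size=3001, chunk_size=64))  # explicit size wins
```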
--sequential
If set, all seeds will be generated in a sequential manner. (Recommended!)
--spare1core
If activated, one core is not used for the calculation of the lists.
--method=[RU10/Online/LT]
Sets the method to generate the packets with. Available are RU10, Online and LT.
--seed_size_str=[I,H,...]
Set the struct-string for the seed field. See https://docs.python.org/3/library/struct.html#format-characters for more information
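To get a feeling for what a given struct-string means for the seed field, the following sketch uses Python's struct module to report how many bytes a format character occupies and how many distinct seeds fit into it. The helper name seed_field_info is hypothetical, and the little-endian prefix "<" (which forces standard sizes) is an assumption made for reproducibility; NOREC4DNA may use a different byte order. The calculation is only meaningful for unsigned integer formats such as B, H, I, and Q.

```python
import struct


def seed_field_info(fmt: str):
    """Return (size_in_bytes, number_of_distinct_seeds) for an unsigned integer format.

    Hypothetical helper for illustration; "<" selects standard sizes so the
    result does not depend on the platform.
    """
    size = struct.calcsize("<" + fmt)
    return size, 2 ** (8 * size)


if __name__ == "__main__":
    for fmt in ("B", "H", "I", "Q"):
        size, seeds = seed_field_info(fmt)
        print(f"{fmt}: {size} byte(s), {seeds} possible seeds")
```

A smaller seed field shortens every packet but limits the number of distinct packets that can be generated.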
--drop_above
Sets an upper limit for the error probability. WARNING: This might reduce the total number of sequences returned!
--optimization
Activates the automated optimization of the chunk distribution in the packets with different options.
--overhead=[overhead (0.1=10%)]
Overhead to use for the optimization, where 0.1 means 10% additional packets based on the number of packets needed to decode the file (default=0.1).
--overhead_factor=[factor (0.1=10%)]
If the overhead is not sufficient to optimize the packets, this factor allows exceeding the given overhead in order to optimize the chunk distribution (default=0.0).
--errorprob_factor=[factor (0.1=10%)]
A factor for the maximum allowed error_prob of the additional packets based on the average error_prob of the packets needed to decode (default=0.1). If set to 0.0 no more packets may be added since the packets with the lowest error_probs were already used to decode the file.
--plot
Generates and saves different plots to show the results.
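The effect of --overhead and --errorprob_factor described above can be illustrated with a small sketch. All function names are hypothetical, and the formulas are assumptions derived from the parameter descriptions: the budget is the overhead times the number of packets needed to decode (rounded up), and the error limit is the average error_prob of those packets scaled by (1 + errorprob_factor). The actual optimization in NOREC4DNA may work differently.

```python
import math


def extra_packet_budget(packets_needed: int, overhead: float = 0.1) -> int:
    """Number of additional packets allowed (assumption: rounded up)."""
    return math.ceil(packets_needed * overhead)


def max_allowed_error_prob(avg_error_prob: float, errorprob_factor: float = 0.1) -> float:
    """Upper error_prob limit for additional packets (assumed formula)."""
    return avg_error_prob * (1.0 + errorprob_factor)


def select_additional(used, candidates, overhead=0.1, errorprob_factor=0.1):
    """Pick additional packets by error_prob.

    used:       error_prob values of the packets needed to decode
    candidates: error_prob values of the remaining packets
    """
    if not used:
        return []
    budget = extra_packet_budget(len(used), overhead)
    limit = max_allowed_error_prob(sum(used) / len(used), errorprob_factor)
    # Prefer the candidates with the lowest error_prob that stay within the limit.
    eligible = sorted(p for p in candidates if p <= limit)
    return eligible[:budget]


if __name__ == "__main__":
    used = [0.125, 0.125, 0.25, 0.5]
    candidates = [0.5, 0.3125, 0.375, 0.75]
    print(select_additional(used, candidates, overhead=0.5, errorprob_factor=0.5))
```

Under these assumptions, a factor of 0.0 sets the limit to the average itself, so candidates that are worse than the packets already used are rejected.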
Demo applications for fast en- and decoding of sequences.
--error_correction [nocode, crc, reedsolomon]
Defines the error detection / correction algorithm to use per packet. (Default: nocode = no error-detection/correction)
--split_input
Sets the number of pre-splits to perform. Default: 1 (= do not split the input file into multiple NOREC rounds). WARNING: If set, this value must be known during decoding (although it might be reconstructed using a brute-force approach).
--store_as_fasta
If set, stores the result in a .fasta file instead of one file per sequence
--insert_header
If set, an additional header chunk is added alongside the created chunks. This chunk stores the filename and the correct padding for the last chunk. (Recommended!) WARNING: If not set, the reconstructed file will most likely be longer than the original due to the \00-padding at the end.
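Why a missing header chunk leads to a longer reconstructed file can be shown with a minimal sketch. The helpers split_with_padding and join_chunks are hypothetical and only mimic the idea of \00-padding the last chunk; they do not reflect NOREC4DNA's actual chunking or header format.

```python
def split_with_padding(data: bytes, chunk_size: int):
    """Split data into equally sized chunks; return (chunks, padding_length)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    padding = 0
    if chunks and len(chunks[-1]) < chunk_size:
        # Fill the last chunk with \00 bytes so all chunks have the same length.
        padding = chunk_size - len(chunks[-1])
        chunks[-1] += b"\x00" * padding
    return chunks, padding


def join_chunks(chunks, padding=None):
    """Reassemble the data; strip the padding only if its length is known."""
    data = b"".join(chunks)
    if padding:
        data = data[:-padding]
    return data


if __name__ == "__main__":
    original = b"hello world"  # 11 bytes
    chunks, padding = split_with_padding(original, 4)
    print(join_chunks(chunks))           # without header info: padding remains
    print(join_chunks(chunks, padding))  # with stored padding length: exact original
```

Simply stripping trailing \00 bytes would not be safe, since a file may legitimately end in zero bytes; storing the padding length avoids that ambiguity.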
Allows easy encoding and decoding using .ini files. Since the supplied encoder can create such .ini files, this is especially useful for easy decoding.
There are various more or less useful scripts inside helpful_scripts/.
python demo_raptor_encode.py Dorn --error_correction=reedsolomon --repair_symbols=3 --as_dna --insert_header
python demo_raptor_decode.py RU10_Dorn --use_header_chunk --error_correction=reedsolomon --repair_symbols=3 --number_of_chunks=145
(use the number of chunks as seen in the .ini file, unless --save_number_of_chunks was defined during encoding)
python ConfigWorker.py <name of the .ini-file>
The decoded file will be saved as DEC_RU10_Dorn if no header-chunk was added during encoding, otherwise the file will be saved under the correct filename.