Lzw compression #157

Draft
wants to merge 8 commits into
base: main

patricoferris

Hello! Thank you for the excellent library!

For ocaml-tiff I'm going to need to support LZW compression so I've started adding an implementation here for that compression scheme. This is still early and in draft whilst I'm still working out the kinks, but thought I would open it early in case people have opinions/views etc.

@dinosaure
Member

Thank you for your excellent work! Would it be possible to keep the same API as zl or gz? Namely, src: [ `Manual | `String ... ] (and the same for dst)?

@patricoferris
Author

Thanks!

> Would it be possible to keep the same API as zl or gz? Namely, src: [ `Manual | `String ... ] (and the same for dst)?

Yes, I think that's entirely possible. The `Manual mode could also be used to provide the API for the OCaml 5+ effect-based IO libraries.
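To make the shape of such an API concrete, here is a minimal sketch loosely modelled on the `Manual / `String convention mentioned above. Everything in it is hypothetical: the `decoder` record, `feed`, `decode`, and the `Await / `End results are illustrative names rather than the actual zl or gz signatures, and the "decoding" step is a pass-through copy rather than LZW.

```ocaml
(* Hypothetical source type, mirroring the shape discussed in this thread. *)
type src = [ `Manual | `String of string ]

type decoder = {
  src : src;
  mutable i : string;    (* current input chunk *)
  mutable i_pos : int;   (* next unread byte in [i] *)
  mutable eoi : bool;    (* true once end of input has been signalled *)
  out : Buffer.t;        (* stand-in for a real dst *)
}

let decoder (src : src) =
  match src with
  | `String s -> { src; i = s; i_pos = 0; eoi = true; out = Buffer.create 64 }
  | `Manual -> { src; i = ""; i_pos = 0; eoi = false; out = Buffer.create 64 }

(* In `Manual mode the caller supplies input; an empty chunk means
   end of input (an assumption of this sketch). *)
let feed d chunk =
  if chunk = "" then d.eoi <- true
  else begin
    d.i <- chunk;
    d.i_pos <- 0
  end

(* Placeholder "decoding": copy the pending bytes through unchanged,
   then report whether more input is needed. *)
let decode d =
  Buffer.add_substring d.out d.i d.i_pos (String.length d.i - d.i_pos);
  d.i_pos <- String.length d.i;
  if d.eoi then `End else `Await

(* Driver loop: the `Await branch is where an IO library would read. *)
let run_manual chunks =
  let d = decoder `Manual in
  let rec go chunks =
    match decode d, chunks with
    | `End, _ -> Buffer.contents d.out
    | `Await, [] -> feed d ""; go []
    | `Await, c :: rest -> feed d c; go rest
  in
  go chunks
```

In this sketch the `Await branch of `run_manual` is the only place that touches input, so an effect-based IO library could perform its read there without the decoder knowing about it.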

What do you think about the internal byte array issue (i.e. bytes vs. bigstring)? I think there's a desire for the ecosystem to rally around bytes with all the shiny new get_uint-style functions. But the final gap in the API is that bytes values can move in the heap (the main reason Eio still uses Cstruct/bigstring). It would be a shame to have to do a lot of userspace copying if someone chooses bytes (e.g. Miou). But maybe I'm missing something here?

@patricoferris
Author

I think I'll need to support different byte orders too: GIF is LSB-first and TIFF is MSB-first (see https://fuchsia.googlesource.com/third_party/wuffs/+/HEAD/std/lzw/README.md).
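As an illustration of what the two orders mean in practice, the sketch below packs the same fixed-width codes into bytes both ways. The function name `pack` and its parameters are hypothetical, and a real LZW stream uses a growing code width; a fixed 9-bit width is assumed here only to keep the example short.

```ocaml
(* Pack fixed-width codes into bytes, least-significant-bit first (`LSB)
   or most-significant-bit first (`MSB). Leftover bits are zero-padded. *)
let pack ~order ~width codes =
  let buf = Buffer.create 16 in
  let acc = ref 0 and nbits = ref 0 in
  List.iter
    (fun code ->
      match order with
      | `LSB ->
          (* The new code goes above the bits already pending. *)
          acc := !acc lor (code lsl !nbits);
          nbits := !nbits + width;
          while !nbits >= 8 do
            Buffer.add_char buf (Char.chr (!acc land 0xff));
            acc := !acc lsr 8;
            nbits := !nbits - 8
          done
      | `MSB ->
          (* The new code goes below the bits already pending. *)
          acc := (!acc lsl width) lor code;
          nbits := !nbits + width;
          while !nbits >= 8 do
            Buffer.add_char buf (Char.chr ((!acc lsr (!nbits - 8)) land 0xff));
            nbits := !nbits - 8;
            acc := !acc land ((1 lsl !nbits) - 1)
          done)
    codes;
  (* Flush any remaining bits as a final, zero-padded byte. *)
  if !nbits > 0 then begin
    let last =
      match order with
      | `LSB -> !acc land 0xff
      | `MSB -> (!acc lsl (8 - !nbits)) land 0xff
    in
    Buffer.add_char buf (Char.chr last)
  end;
  Buffer.contents buf
```

For the two 9-bit codes 256 and 65, `pack ~order:`LSB ~width:9 [256; 65]` gives the bytes 00 83 00, while the `MSB variant gives 80 10 40, so a decoder has to know which packing it is reading.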

@dinosaure
Member

> What do you think about the internal byte array issue (i.e. bytes vs. bigstring)? I think there's a desire for the ecosystem to rally around bytes with all the shiny new get_uint-style functions. But the final gap in the API is that bytes values can move in the heap (the main reason Eio still uses Cstruct/bigstring). It would be a shame to have to do a lot of userspace copying if someone chooses bytes (e.g. Miou). But maybe I'm missing something here?

It's a real question, but one that unfortunately goes beyond decompress and cannot be resolved here. The choice of bigarray is very open to criticism and my only argument would be that it's a historical choice. Once again, it's more a question of offering a coherent API than an efficient one. [1]

As far as Miou is concerned, there's nothing to stop you from offering functions that read into and write from bigarrays, as Lwt_bytes does!
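For readers unfamiliar with the trade-off: a bigarray's data lives outside the OCaml heap and is not moved by the GC, whereas a `bytes` value can be. Bridging the two therefore means a copy in user code. The sketch below shows that copy using only the standard library's `Bigarray` module; `bigstring`, `blit_to_bytes` and `bigstring_of_string` are names made up for this example and are not taken from decompress, Miou or Lwt_bytes.

```ocaml
(* A "bigstring" in the usual sense: a one-dimensional bigarray of chars. *)
type bigstring =
  (char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array1.t

(* Copy [len] bytes from a bigstring into [dst]. This is the kind of
   userspace copy discussed above. Bounds are checked by get/set. *)
let blit_to_bytes (src : bigstring) src_off dst dst_off len =
  for i = 0 to len - 1 do
    Bytes.set dst (dst_off + i) (Bigarray.Array1.get src (src_off + i))
  done

(* Helper for the example: build a bigstring from an OCaml string. *)
let bigstring_of_string s : bigstring =
  let b =
    Bigarray.Array1.create Bigarray.char Bigarray.c_layout (String.length s)
  in
  String.iteri (fun i c -> Bigarray.Array1.set b i c) s;
  b
```

A byte-at-a-time loop is used here for clarity only; a real implementation would presumably use a bulk blit primitive instead.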

> I think I'll need to support different byte orders too (GIF is LSB and TIFF is MSB)

You can probably add whether it's LSB or MSB to the parameters for creating a decoder. [2]

type decoder

val decoder: order:[ `LSB | `MSB ] -> src -> decoder

[1] Only LZO has a different API, but that's because the format isn't really streamable like zlib or gzip.
[2] Like uutf.
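A minimal sketch of how a decoder created with an `order` parameter might read codes is shown below. It follows the `decoder: order:[ `LSB | `MSB ] -> src -> decoder` signature suggested above, with two simplifying assumptions: `src` is just a `string`, and the record fields and `next_code` are hypothetical names. It only extracts codes of a caller-supplied width; the LZW dictionary logic is left out.

```ocaml
type order = [ `LSB | `MSB ]

(* Hypothetical decoder state: just enough to pull codes out of a string. *)
type decoder = {
  order : order;
  src : string;
  mutable pos : int;    (* next byte to read from [src] *)
  mutable acc : int;    (* pending bits *)
  mutable nbits : int;  (* number of valid bits in [acc] *)
}

let decoder ~order src = { order; src; pos = 0; acc = 0; nbits = 0 }

(* Read the next [width]-bit code, or [None] if the input runs out. *)
let next_code d ~width =
  let rec refill () =
    if d.nbits >= width then true
    else if d.pos >= String.length d.src then false
    else begin
      let byte = Char.code d.src.[d.pos] in
      d.pos <- d.pos + 1;
      (match d.order with
       | `LSB -> d.acc <- d.acc lor (byte lsl d.nbits)
       | `MSB -> d.acc <- (d.acc lsl 8) lor byte);
      d.nbits <- d.nbits + 8;
      refill ()
    end
  in
  if not (refill ()) then None
  else begin
    let mask = (1 lsl width) - 1 in
    let code =
      match d.order with
      | `LSB ->
          (* The code sits in the low bits of the accumulator. *)
          let c = d.acc land mask in
          d.acc <- d.acc lsr width;
          c
      | `MSB ->
          (* The code sits in the high bits of the accumulator. *)
          let c = (d.acc lsr (d.nbits - width)) land mask in
          d.acc <- d.acc land ((1 lsl (d.nbits - width)) - 1);
          c
    in
    d.nbits <- d.nbits - width;
    Some code
  end
```

Keeping the order in the decoder record means the choice is made once at construction time, and the same `next_code` loop could then serve both GIF-style and TIFF-style streams.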

@johnwhitington

> Hello! Thank you for the excellent library!
>
> For ocaml-tiff I'm going to need to support LZW compression so I've started adding an implementation here for that compression scheme. This is still early and in draft whilst I'm still working out the kinks, but thought I would open it early in case people have opinions/views etc.

There's an OCaml LZW decompressor here:

https://github.com/johnwhitington/camlpdf/blob/master/pdfcodec.ml#L294

It's probably not the fastest, but it's solid, including on some common kinds of malformed input. You might find it useful for comparison purposes.
