Best approach to read zip of gzip compressed files? #447
-
The real-world context is that I am attempting to read the response from the Amplitude Export API (https://www.docs.developers.amplitude.com/analytics/apis/export-api/#response) in a performant way (memory-constrained edge environment). The Amplitude API streams a zip of JSON files (which are themselves gzip-compressed). I am attempting to read the response stream and process the data from one JSON file in the response zip at a time, without writing anything to disk (still streaming; the compressed JSON files are jsonl, which makes it easy to process one JSON object at a time if I can get that far). Here's a simplified example zip that reflects the behavior of the Amplitude API (a zip of two gzip-compressed JSON files). I was hoping to do something like the code below, but can't quite wrap my head around how to marry what I know about web streams with the zip.js API. Any pointers would be much appreciated. Thank you so much for creating this library!

```ts
// @ts-expect-error ignore
import fs from 'node:fs';
// @ts-expect-error ignore
import { Readable } from 'node:stream';
import * as zip from '@zip.js/zip.js';

zip.configure({
  useWebWorkers: false,
});

export const debug = async () => {
  const nodeReadable = fs.createReadStream('./export.zip');
  const fsStream = Readable.toWeb(nodeReadable);
  const zipReader = new zip.ZipReader(fsStream);
  const entries = zipReader.getEntriesGenerator();
  for await (const entry of entries) {
    console.log('process entry', entry.filename);
    const { writable, readable } = new TransformStream();
    readable
      .pipeThrough(new DecompressionStream('gzip'))
      .pipeThrough(new TextDecoderStream('utf-8'))
      .pipeThrough(
        new TransformStream({
          transform(chunk, controller) {
            console.log('text chunk', chunk);
            // console.log(inflate.append(chunk));
            controller.enqueue(chunk);
          },
        }),
      );
    // .pipeThrough(splitStream('\n'))
    // .pipeThrough(parseJSON());
    await entry.getData!({ writable });
  }
  await zipReader.close();
};
```
-
Are you sure the entries are compressed with gzip? It's not mentioned in the documentation (https://www.docs.developers.amplitude.com/analytics/apis/export-api/#considerations).
-
I know it's not mentioned, but I am 100% sure :).
-
However, even if they were not, I'm still having trouble figuring out how to combine the stream utils I have with zip.js (for example, I have some TransformStream classes to process a stream of
You can get the `readable` from `getData()` once processed; it's the `ReadableStream` returned by `pipeThrough()` (which is ignored in your code).
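A minimal sketch of that suggestion (untested against the real Amplitude export; `readEntry` and `processLine` are hypothetical names, and the inline line-splitting stands in for the commented-out `splitStream`/`parseJSON` helpers): keep a reference to the `ReadableStream` returned by the last `pipeThrough()`, start consuming it, and run `getData({ writable })` concurrently so backpressure doesn't stall the pipeline.

```javascript
// Sketch: getData() writes the raw (still gzip-compressed) entry bytes into
// the writable side of the TransformStream while we consume the readable
// side through DecompressionStream/TextDecoderStream.
async function readEntry(entry, processLine) {
  const { writable, readable } = new TransformStream();

  // This is the ReadableStream returned by the last pipeThrough() --
  // the one that was ignored in the original code.
  const text = readable
    .pipeThrough(new DecompressionStream('gzip'))
    .pipeThrough(new TextDecoderStream('utf-8'));

  // Start consuming *before* awaiting getData(), otherwise the pipe can
  // stall: nothing drains the readable side while getData() is writing.
  const consumed = (async () => {
    let buffered = '';
    for await (const chunk of text) {
      buffered += chunk;
      const lines = buffered.split('\n');
      buffered = lines.pop() ?? ''; // keep any partial trailing line
      for (const line of lines) processLine(line);
    }
    if (buffered) processLine(buffered); // flush a final unterminated line
  })();

  await Promise.all([entry.getData({ writable }), consumed]);
}
```

The key design point is the `Promise.all`: the producer (`getData()`) and the consumer (the `for await` loop) must run concurrently, since each only makes progress while the other does.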