Package io.datarouter.bytes
Class GzipBlockStream
java.lang.Object
io.datarouter.bytes.GzipBlockStream
Gzip normally encodes and decodes in a single thread, which underutilizes multi-threaded hardware.
The data must be written to a single OutputStream and read from a single InputStream, neither of which is parallelizable.
Alternatively we can split the data into blocks and run the gzip encoding on each block in separate threads.
This class splits the incoming bytes into blocks and gzips each block independently.
Small tokens can be combined into GzipBlockStreamRow objects, where each row will be fully owned by one block.
When writing, it prepends a block length header before writing each gzipped data block.
When reading, the main thread can pull gzipped blocks from the InputStream and pass them to other threads to decode.
Besides un-gzipping, blocks can be further decoded in helper threads.
Note that this is not compatible with the normal Gzip file format.
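The framing described above (a length header followed by an independently gzipped block) can be sketched with the JDK alone. This is a minimal illustration, not the GzipBlockStream implementation: the class name LengthPrefixedGzipSketch, its methods, and the 4-byte big-endian length header are all assumptions, since this page does not specify the header encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch: not the GzipBlockStream implementation.
final class LengthPrefixedGzipSketch {

    // Gzip one raw block on its own so it can be decoded without any other block.
    static byte[] gzip(byte[] raw) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzipOut = new GZIPOutputStream(buffer)) {
            gzipOut.write(raw);
        }
        return buffer.toByteArray();
    }

    // Reverse of gzip(): inflate one independent block.
    static byte[] gunzip(byte[] gzipped) throws IOException {
        try (GZIPInputStream gzipIn = new GZIPInputStream(new ByteArrayInputStream(gzipped))) {
            return gzipIn.readAllBytes();
        }
    }

    // Assumed header: a 4-byte big-endian length of the gzipped block.
    static void writeBlock(OutputStream out, byte[] raw) throws IOException {
        byte[] gzipped = gzip(raw);
        DataOutputStream dataOut = new DataOutputStream(out);
        dataOut.writeInt(gzipped.length);
        dataOut.write(gzipped);
        dataOut.flush();
    }

    // Read framed blocks until the stream ends and return the raw bytes of each.
    static List<byte[]> readBlocks(InputStream in) throws IOException {
        DataInputStream dataIn = new DataInputStream(in);
        List<byte[]> blocks = new ArrayList<>();
        while (true) {
            int length;
            try {
                length = dataIn.readInt();
            } catch (EOFException e) {
                return blocks; // no further header: end of stream
            }
            byte[] gzipped = new byte[length];
            dataIn.readFully(gzipped);
            blocks.add(gunzip(gzipped));
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeBlock(out, "first block".getBytes(StandardCharsets.UTF_8));
        writeBlock(out, "second block".getBytes(StandardCharsets.UTF_8));
        for (byte[] block : readBlocks(new ByteArrayInputStream(out.toByteArray()))) {
            System.out.println(new String(block, StandardCharsets.UTF_8));
        }
    }
}
```

Because the reader learns each block's length before touching its contents, it can hand the still-compressed bytes to another thread without parsing them. The length headers are also why a standard gzip tool cannot read the resulting stream.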
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
static class | GzipBlockStream.GzipBlockStreamEncodedBlock | Returned while encoding with convenience methods.
static final record | GzipBlockStream.GzipBlockStreamRow | One or more tokens that make up a "row" of data.
Constructor Summary
Constructors
Constructor | Description
GzipBlockStream(int blockSize) |
GzipBlockStream(int blockSize, int encodeBufferSize, int gzipBufferSize, int decodeBufferSize) |
Method Summary
Modifier and Type | Method | Description
io.datarouter.scanner.Scanner<byte[]> | decode(InputStream inputStream) | Convert an InputStream containing gzip blocks into a Scanner of raw blocks.
io.datarouter.scanner.Scanner<byte[]> | decodeParallel(InputStream inputStream, io.datarouter.scanner.Threads threads) | Convert an InputStream containing gzip blocks into a Scanner of raw blocks.
io.datarouter.scanner.Scanner<GzipBlockStream.GzipBlockStreamEncodedBlock> | encode(io.datarouter.scanner.Scanner<GzipBlockStream.GzipBlockStreamRow> rows) | Split the provided rows into larger blocks.
io.datarouter.scanner.Scanner<GzipBlockStream.GzipBlockStreamEncodedBlock> | encodeParallel(io.datarouter.scanner.Scanner<GzipBlockStream.GzipBlockStreamRow> rows, io.datarouter.scanner.Threads threads) | Split the provided rows into larger blocks.
long | getNumBlocksEncoded() |
Constructor Details
GzipBlockStream
public GzipBlockStream()
GzipBlockStream
public GzipBlockStream(int blockSize)
GzipBlockStream
public GzipBlockStream(int blockSize, int encodeBufferSize, int gzipBufferSize, int decodeBufferSize)
Method Details
encode
public io.datarouter.scanner.Scanner<GzipBlockStream.GzipBlockStreamEncodedBlock> encode(io.datarouter.scanner.Scanner<GzipBlockStream.GzipBlockStreamRow> rows)
Split the provided rows into larger blocks. Encode the blocks to gzip.
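The "split the provided rows into larger blocks" step can be pictured as follows. This is a rough sketch under stated assumptions, not the library's logic: rows are modeled as plain byte arrays, RowBlockingSketch and toBlocks are hypothetical names, and the rule "close a block once it holds at least blockSize bytes" is a guess at how a block size threshold might be applied. The one property taken from the class description is that a row is never split across two blocks.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: groups whole rows into blocks of roughly blockSize bytes.
final class RowBlockingSketch {

    static List<byte[]> toBlocks(List<byte[]> rows, int blockSize) {
        List<byte[]> blocks = new ArrayList<>();
        ByteArrayOutputStream current = new ByteArrayOutputStream();
        for (byte[] row : rows) {
            // A row is always appended whole, so it is fully owned by one block.
            current.write(row, 0, row.length);
            // Assumed rule: close the block once it reaches the target size.
            if (current.size() >= blockSize) {
                blocks.add(current.toByteArray());
                current.reset();
            }
        }
        // Emit whatever is left as a final, possibly smaller, block.
        if (current.size() > 0) {
            blocks.add(current.toByteArray());
        }
        return blocks;
    }

    public static void main(String[] args) {
        List<byte[]> rows = List.of(new byte[3], new byte[3], new byte[3]);
        for (byte[] block : toBlocks(rows, 5)) {
            System.out.println("block of " + block.length + " bytes");
        }
    }
}
```

Under this rule a block can exceed blockSize by up to one row, which is the price of keeping rows intact.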
encodeParallel
public io.datarouter.scanner.Scanner<GzipBlockStream.GzipBlockStreamEncodedBlock> encodeParallel(io.datarouter.scanner.Scanner<GzipBlockStream.GzipBlockStreamRow> rows, io.datarouter.scanner.Threads threads)
Split the provided rows into larger blocks. Pass each block to the provided executor for parallel encoding to gzip.
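The idea of handing each block to an executor can be illustrated with a plain java.util.concurrent.ExecutorService standing in for the Threads parameter. This is a sketch only: ParallelGzipEncodeSketch and gzipAll are hypothetical names, and collecting results through a list of Futures is one simple way to keep the output in input order, not necessarily what the library does.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch: gzip independent blocks on a thread pool, keeping input order.
final class ParallelGzipEncodeSketch {

    // Runs on a helper thread: compress one block with its own gzip stream.
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzipOut = new GZIPOutputStream(buffer)) {
            gzipOut.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    static List<byte[]> gzipAll(List<byte[]> rawBlocks, ExecutorService executor)
            throws InterruptedException, ExecutionException {
        // Submit every block first so the pool can work on them concurrently.
        List<Future<byte[]>> futures = new ArrayList<>();
        for (byte[] raw : rawBlocks) {
            Callable<byte[]> task = () -> gzip(raw);
            futures.add(executor.submit(task));
        }
        // Collect in submission order, so output order matches input order.
        List<byte[]> gzipped = new ArrayList<>();
        for (Future<byte[]> future : futures) {
            gzipped.add(future.get());
        }
        return gzipped;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        try {
            List<byte[]> blocks = List.of(new byte[10_000], new byte[20_000]);
            for (byte[] gzipped : gzipAll(blocks, executor)) {
                System.out.println("gzipped block of " + gzipped.length + " bytes");
            }
        } finally {
            executor.shutdown();
        }
    }
}
```

This version submits all blocks before collecting any result, so it holds every block in memory at once. A streaming implementation would normally bound the number of in-flight blocks.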
decode
public io.datarouter.scanner.Scanner<byte[]> decode(InputStream inputStream)
Convert an InputStream containing gzip blocks into a Scanner of raw blocks.
decodeParallel
public io.datarouter.scanner.Scanner<byte[]> decodeParallel(InputStream inputStream, io.datarouter.scanner.Threads threads)
Convert an InputStream containing gzip blocks into a Scanner of raw blocks. Offload the gzip decoding to the provided executor.
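Offloading the decode step might look like the following, where the calling thread only reads length-prefixed blocks and the helper threads do the un-gzipping. This is a sketch under assumptions: ParallelGzipDecodeSketch, decodeAll, and frame are hypothetical names, a plain ExecutorService stands in for Threads, and the 4-byte big-endian length header is assumed because this page does not document the real header encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch: the caller reads framed blocks, a pool un-gzips them.
final class ParallelGzipDecodeSketch {

    // Runs on a helper thread: inflate one independent block.
    static byte[] gunzip(byte[] gzipped) {
        try (GZIPInputStream gzipIn = new GZIPInputStream(new ByteArrayInputStream(gzipped))) {
            return gzipIn.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static List<byte[]> decodeAll(InputStream in, ExecutorService executor)
            throws IOException, InterruptedException, ExecutionException {
        DataInputStream dataIn = new DataInputStream(in);
        List<Future<byte[]>> futures = new ArrayList<>();
        while (true) {
            int length;
            try {
                length = dataIn.readInt(); // assumed 4-byte big-endian header
            } catch (EOFException e) {
                break; // no further header: end of stream
            }
            byte[] gzipped = new byte[length];
            dataIn.readFully(gzipped); // cheap copy on the calling thread
            Callable<byte[]> task = () -> gunzip(gzipped);
            futures.add(executor.submit(task)); // expensive inflate elsewhere
        }
        List<byte[]> rawBlocks = new ArrayList<>();
        for (Future<byte[]> future : futures) {
            rawBlocks.add(future.get()); // submission order is preserved
        }
        return rawBlocks;
    }

    // Demo helper: gzip one block and prepend the assumed length header.
    static byte[] frame(byte[] raw) throws IOException {
        ByteArrayOutputStream gzipped = new ByteArrayOutputStream();
        try (GZIPOutputStream gzipOut = new GZIPOutputStream(gzipped)) {
            gzipOut.write(raw);
        }
        ByteArrayOutputStream framed = new ByteArrayOutputStream();
        DataOutputStream dataOut = new DataOutputStream(framed);
        dataOut.writeInt(gzipped.size());
        gzipped.writeTo(dataOut);
        dataOut.flush();
        return framed.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream stream = new ByteArrayOutputStream();
        stream.write(frame("alpha".getBytes(StandardCharsets.UTF_8)));
        stream.write(frame("beta".getBytes(StandardCharsets.UTF_8)));
        ExecutorService executor = Executors.newFixedThreadPool(2);
        try {
            for (byte[] block : decodeAll(new ByteArrayInputStream(stream.toByteArray()), executor)) {
                System.out.println(new String(block, StandardCharsets.UTF_8));
            }
        } finally {
            executor.shutdown();
        }
    }
}
```

Any further per-block parsing could be added inside the submitted task, which matches the note in the class description that blocks can be further decoded in helper threads.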
getNumBlocksEncoded
public long getNumBlocksEncoded()
resetCounters