Class GzipBlockStream

java.lang.Object
io.datarouter.bytes.GzipBlockStream

public class GzipBlockStream extends Object
Gzip normally encodes and decodes in a single thread, which underutilizes multi-core hardware. The data must be written to a single OutputStream and read from a single InputStream, neither of which can be parallelized. As an alternative, the data can be split into blocks and each block gzipped in a separate thread. This class splits the incoming bytes into blocks and gzips each block independently. Small tokens can be combined into GzipBlockStreamRow objects, where each row belongs entirely to one block. When writing, a block length header is prepended to each gzipped data block. When reading, the main thread can pull gzipped blocks from the InputStream and pass them to other threads to decode. Besides un-gzipping, blocks can be further decoded in helper threads. Note that the output is not compatible with the standard gzip file format.
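The framing described above (a length header followed by an independently gzipped block) can be illustrated with plain java.util.zip classes. This is a minimal sketch, not the GzipBlockStream implementation: the class name GzipBlockFramingSketch, its helper methods, and the 4-byte big-endian length header are all assumptions made for illustration, since the actual header encoding is not documented here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical illustration class; not part of io.datarouter.bytes.
final class GzipBlockFramingSketch {

    // Gzip a single block independently of every other block.
    static byte[] gzip(byte[] raw) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzipOut = new GZIPOutputStream(buffer)) {
            gzipOut.write(raw);
        }
        return buffer.toByteArray();
    }

    // Un-gzip a single block. This step needs no stream state, so any thread can run it.
    static byte[] gunzip(byte[] gzipped) throws IOException {
        try (GZIPInputStream gzipIn = new GZIPInputStream(new ByteArrayInputStream(gzipped))) {
            return gzipIn.readAllBytes();
        }
    }

    // Assumed framing: 4-byte big-endian length, then the gzipped bytes.
    static void writeBlock(DataOutputStream out, byte[] raw) throws IOException {
        byte[] gzipped = gzip(raw);
        out.writeInt(gzipped.length);
        out.write(gzipped);
    }

    // Read the next still-gzipped block, or return null at end of stream.
    // Only this cheap step has to happen on the thread that owns the InputStream.
    static byte[] readGzippedBlock(DataInputStream in) throws IOException {
        int length;
        try {
            length = in.readInt();
        } catch (EOFException e) {
            return null;
        }
        byte[] gzipped = new byte[length];
        in.readFully(gzipped);
        return gzipped;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(sink);
        writeBlock(out, "first block".getBytes(StandardCharsets.UTF_8));
        writeBlock(out, "second block".getBytes(StandardCharsets.UTF_8));
        out.flush();

        DataInputStream in = new DataInputStream(new ByteArrayInputStream(sink.toByteArray()));
        byte[] gzipped;
        while ((gzipped = readGzippedBlock(in)) != null) {
            System.out.println(new String(gunzip(gzipped), StandardCharsets.UTF_8));
        }
    }
}
```

Because each block carries its own length, a reader can slice blocks off the stream without inspecting their contents, which is what makes it possible to hand the expensive un-gzip step to other threads.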
  • Constructor Details

    • GzipBlockStream

      public GzipBlockStream()
    • GzipBlockStream

      public GzipBlockStream(int blockSize)
    • GzipBlockStream

      public GzipBlockStream(int blockSize, int encodeBufferSize, int gzipBufferSize, int decodeBufferSize)
  • Method Details

    • encode

      public io.datarouter.scanner.Scanner<GzipBlockStream.GzipBlockStreamEncodedBlock> encode(io.datarouter.scanner.Scanner<GzipBlockStream.GzipBlockStreamRow> rows)
      Split the provided rows into larger blocks. Encode the blocks to gzip.
    • encodeParallel

      public io.datarouter.scanner.Scanner<GzipBlockStream.GzipBlockStreamEncodedBlock> encodeParallel(io.datarouter.scanner.Scanner<GzipBlockStream.GzipBlockStreamRow> rows, io.datarouter.scanner.Threads threads)
      Split the provided rows into larger blocks. Pass each block to the provided executor for parallel encoding to gzip.
    • decode

      public io.datarouter.scanner.Scanner<byte[]> decode(InputStream inputStream)
      Convert an InputStream containing gzip blocks into a Scanner of raw blocks.
    • decodeParallel

      public io.datarouter.scanner.Scanner<byte[]> decodeParallel(InputStream inputStream, io.datarouter.scanner.Threads threads)
      Convert an InputStream containing gzip blocks into a Scanner of raw blocks. Offload the gzip decoding to the provided executor.
    • getNumBlocksEncoded

      public long getNumBlocksEncoded()
    • resetCounters

      public GzipBlockStream resetCounters()
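
The idea behind encodeParallel and decodeParallel (offload per-block gzip work to an executor while keeping blocks in their original order) can be sketched with a standard ExecutorService. This is an illustrative sketch only: the class ParallelGzipBlocksSketch and its methods are hypothetical, it does not use the datarouter Scanner or Threads types, and it submits all blocks up front rather than streaming them as a real implementation would.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical illustration class; not part of io.datarouter.bytes.
final class ParallelGzipBlocksSketch {

    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzipOut = new GZIPOutputStream(buffer)) {
            gzipOut.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    static byte[] gunzip(byte[] gzipped) {
        try (GZIPInputStream gzipIn = new GZIPInputStream(new ByteArrayInputStream(gzipped))) {
            return gzipIn.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Submit one task per block, then collect the futures in submission order,
    // so the output order matches the input order even if tasks finish out of order.
    static List<byte[]> mapInOrder(
            List<byte[]> blocks,
            ExecutorService executor,
            Function<byte[], byte[]> transform) throws Exception {
        List<Future<byte[]>> futures = new ArrayList<>();
        for (byte[] block : blocks) {
            Callable<byte[]> task = () -> transform.apply(block);
            futures.add(executor.submit(task));
        }
        List<byte[]> results = new ArrayList<>();
        for (Future<byte[]> future : futures) {
            results.add(future.get());
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(4);
        try {
            List<byte[]> rawBlocks = new ArrayList<>();
            for (int i = 0; i < 8; ++i) {
                rawBlocks.add(("block " + i).getBytes(StandardCharsets.UTF_8));
            }
            // Encode every block on the pool, then decode every block on the pool.
            List<byte[]> gzipped = mapInOrder(rawBlocks, executor, ParallelGzipBlocksSketch::gzip);
            List<byte[]> decoded = mapInOrder(gzipped, executor, ParallelGzipBlocksSketch::gunzip);
            for (byte[] block : decoded) {
                System.out.println(new String(block, StandardCharsets.UTF_8));
            }
        } finally {
            executor.shutdown();
        }
    }
}
```

Collecting futures in submission order is one simple way to preserve block order. Holding every block in memory at once is a simplification of this sketch; a streaming design would bound the number of in-flight blocks.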