module Ascii85

Ascii85 is an implementation of Adobe’s binary-to-text encoding of the same name in pure Ruby.

See en.wikipedia.org/wiki/Ascii85 for more information about the format.

Author

Johannes Holzfuß (johannes@holzfuss.name)

License

Distributed under the MIT License (see LICENSE file)

Constants

EMPTY_STRING
ENDING_MARKER
LINE_BREAK
START_MARKER
VERSION

Public Class Methods

decode(str, out: nil) click to toggle source

Searches through a String and decodes the first substring enclosed by ‘<~’ and ‘~>’.

@param str [String] The String containing Ascii85-encoded content @param out [IO, nil] An optional IO-like object to write the output to

@return [String, IO] The decoded String (in ASCII-8BIT encoding) or the output IO object (if it was provided)

@raise [Ascii85::DecodingError] When malformed input is encountered

@example Decoding Ascii85 content

Ascii85.decode("<~;KZGo~>")
# => "Ruby"

@example Decoding with multiple Ascii85 blocks present (ignores all but the first)

Ascii85.decode("Foo<~;KZGo~>Bar<~87cURDZ~>Baz")
# => "Ruby"

@example When no delimiters are found

Ascii85.decode("No delimiters")
# => ""

@example Decoding to an IO object

output = StringIO.new
Ascii85.decode("<~;KZGo~>", out: output)
# => output (with "Ruby" written to it)

@note This method only accepts a String, not an IO-like object, as the entire input

needs to be available to ensure validity.
# File lib/ascii85.rb, line 198
def decode(str, out: nil)
  decode_raw(extract(str), out: out)
end
decode_raw(str_or_io, out: nil) click to toggle source

Decodes the given raw Ascii85-encoded String or IO-like object.

@param str_or_io [String, IO] The Ascii85-encoded input to decode @param out [IO, nil] An optional IO-like object to write the output to

@return [String, IO] The decoded String (in ASCII-8BIT encoding) or the output IO object (if it was provided)

@raise [Ascii85::DecodingError] When malformed input is encountered

@example Decoding a raw Ascii85 String

Ascii85.decode_raw(";KZGo")
# => "Ruby"

@example Decoding from an IO-like object

input = StringIO.new(";KZGo")
Ascii85.decode_raw(input)
# => "Ruby"

@example Decoding to an IO object

output = StringIO.new
Ascii85.decode_raw(";KZGo", out: output)
# => output (with "Ruby" written to it)

@note The input must not be enclosed in ‘<~’ and ‘~>’ delimiters.

# File lib/ascii85.rb, line 228
def decode_raw(str_or_io, out: nil)
  reader = if io_like?(str_or_io)
             str_or_io
           else
             StringIO.new(str_or_io.to_s, 'rb')
           end

  # Return an unfrozen String on empty input
  return EMPTY_STRING.dup if reader.eof?

  # Setup buffered Reader and Writers
  bufreader = BufferedReader.new(reader, encoded_chunk_size)
  bufwriter = BufferedWriter.new(out || StringIO.new(String.new, 'wb'), unencoded_chunk_size)

  # Populate the lookup table (caches the exponentiation)
  lut = (0..4).map { |count| 85**(4 - count) }

  # Decode
  word   = 0
  count  = 0
  zeroes = unfrozen_binary_copy("\0\0\0\0")
  wordbuf = zeroes.dup

  bufreader.each_chunk do |chunk|
    chunk.each_byte do |c|
      case c.chr
      when ' ', "\t", "\r", "\n", "\f", "\0"
        # Ignore whitespace
        next

      when 'z'
        raise(Ascii85::DecodingError, "Found 'z' inside Ascii85 5-tuple") unless count.zero?

        # Expand z to 0-word
        bufwriter.write(zeroes)

      when '!'..'u'
        # Decode 5 characters into a 4-byte word
        word  += (c - 33) * lut[count]
        count += 1

        if count == 5 && word > 0xffffffff
          raise(Ascii85::DecodingError, "Invalid Ascii85 5-tuple (#{word} >= 2**32)")
        elsif count == 5
          b3 = word & 0xff; word >>= 8
          b2 = word & 0xff; word >>= 8
          b1 = word & 0xff; word >>= 8
          b0 = word

          wordbuf.setbyte(0, b0)
          wordbuf.setbyte(1, b1)
          wordbuf.setbyte(2, b2)
          wordbuf.setbyte(3, b3)

          bufwriter.write(wordbuf)

          word  = 0
          count = 0
        end

      else
        raise(Ascii85::DecodingError, "Illegal character inside Ascii85: #{c.chr.dump}")
      end
    end
  end

  # We're done if all 5-tuples have been consumed
  if count.zero?
    bufwriter.flush
    return out || bufwriter.io.string.force_encoding(Encoding::ASCII_8BIT)
  end

  raise(Ascii85::DecodingError, 'Last 5-tuple consists of single character') if count == 1

  # Finish last, partially decoded 32-bit word
  count -= 1
  word  += lut[count]

  bufwriter.write((word >> 24).chr) if count >= 1
  bufwriter.write(((word >> 16) & 0xff).chr) if count >= 2
  bufwriter.write(((word >> 8) & 0xff).chr) if count == 3
  bufwriter.flush

  out || bufwriter.io.string.force_encoding(Encoding::ASCII_8BIT)
end
encode(str_or_io, wrap_lines = 80, out: nil) click to toggle source

Encodes the bytes of the given String or IO-like object as Ascii85.

@param str_or_io [String, IO] The input to encode @param wrap_lines [Integer, false] The line length for wrapping, or false for no wrapping @param out [IO, nil] An optional IO-like object to write the output to

@return [String, IO] The encoded String or the output IO object that was passed in

@example Encoding a simple String

Ascii85.encode("Ruby")
# => <~;KZGo~>

@example Encoding with line wrapping

Ascii85.encode("Supercalifragilisticexpialidocious", 15)
# => <~;g!%jEarNoBkD
#    BoB5)0rF*),+AU&
#    0.@;KXgDe!L"F`R
#    ~>

@example Encoding without line wrapping

Ascii85.encode("Supercalifragilisticexpialidocious", false)
# => <~;g!%jEarNoBkDBoB5)0rF*),+AU&0.@;KXgDe!L"F`R~>

@example Encoding from an IO-like object

input = StringIO.new("Ruby")
Ascii85.encode(input)
# => "<~;KZGo~>"

@example Encoding to an IO object

output = StringIO.new
Ascii85.encode("Ruby", out: output)
# => output (with "<~;KZGo~>" written to it)
# File lib/ascii85.rb, line 56
def encode(str_or_io, wrap_lines = 80, out: nil)
  reader = if io_like?(str_or_io)
             str_or_io
           else
             StringIO.new(str_or_io.to_s, 'rb')
           end

  return EMPTY_STRING.dup if reader.eof?

  # Setup buffered Reader and Writers
  bufreader = BufferedReader.new(reader, unencoded_chunk_size)
  bufwriter = BufferedWriter.new(out || StringIO.new(String.new, 'wb'), encoded_chunk_size)
  writer = wrap_lines ? Wrapper.new(bufwriter, wrap_lines) : DummyWrapper.new(bufwriter)

  padding = unfrozen_binary_copy("\0\0\0\0")
  tuplebuf = unfrozen_binary_copy('!!!!!')
  exclamations = unfrozen_binary_copy('!!!!!')
  z = unfrozen_binary_copy('z')

  bufreader.each_chunk do |chunk|
    chunk.unpack('N*').each do |word|
      # Encode each big-endian 32-bit word into a 5-character tuple (except
      # for 0, which encodes to 'z')
      if word.zero?
        writer.write(z)
      else
        word, b0 = word.divmod(85)
        word, b1 = word.divmod(85)
        word, b2 = word.divmod(85)
        word, b3 = word.divmod(85)
        b4 = word

        tuplebuf.setbyte(0, b4 + 33)
        tuplebuf.setbyte(1, b3 + 33)
        tuplebuf.setbyte(2, b2 + 33)
        tuplebuf.setbyte(3, b1 + 33)
        tuplebuf.setbyte(4, b0 + 33)

        writer.write(tuplebuf)
      end
    end

    next if (chunk.bytesize & 0b11).zero?

    # If we have leftover bytes, we need to zero-pad to a multiple of four
    # before converting to a 32-bit word.
    padding_length = (-chunk.bytesize) % 4
    trailing = chunk[-(4 - padding_length)..]
    word = (trailing + padding[0...padding_length]).unpack1('N')

    # Encode the last word and cut off any padding
    if word.zero?
      writer.write(exclamations[0..(4 - padding_length)])
    else
      word, b0 = word.divmod(85)
      word, b1 = word.divmod(85)
      word, b2 = word.divmod(85)
      word, b3 = word.divmod(85)
      b4 = word

      tuplebuf.setbyte(0, b4 + 33)
      tuplebuf.setbyte(1, b3 + 33)
      tuplebuf.setbyte(2, b2 + 33)
      tuplebuf.setbyte(3, b1 + 33)
      tuplebuf.setbyte(4, b0 + 33)

      writer.write(tuplebuf[0..(4 - padding_length)])
    end
  end

  # If no output IO-object was provided, extract the encoded String from the
  # default StringIO writer. We force the encoding to 'ASCII-8BIT' to work
  # around a TruffleRuby bug.
  return writer.finish.io.string.force_encoding(Encoding::ASCII_8BIT) if out.nil?

  # Otherwise we make sure to flush the output writer, and then return it.
  writer.finish.io
end
extract(str) click to toggle source

Searches through a String and extracts the first substring enclosed by ‘<~’ and ‘~>’.

@param str [String] The String to search through

@return [String] The extracted substring, or an empty String if no valid delimiters are found

@example Extracting Ascii85 content

Ascii85.extract("Foo<~;KZGo~>Bar<~z~>Baz")
# => ";KZGo"

@example When no delimiters are found

Ascii85.extract("No delimiters")
# => ""

@note This method only accepts a String, not an IO-like object, as the entire input

needs to be available to ensure validity.
# File lib/ascii85.rb, line 152
def extract(str)
  input = str.to_s

  # Make sure the delimiter Strings have the correct encoding.
  opening_delim = '<~'.encode(input.encoding)
  closing_delim = '~>'.encode(input.encoding)

  # Get the positions of the opening/closing delimiters. If there is no pair
  # of opening/closing delimiters, return an unfrozen empty String.
  (start_pos = input.index(opening_delim))                or return EMPTY_STRING.dup
  (end_pos   = input.index(closing_delim, start_pos + 2)) or return EMPTY_STRING.dup

  # Get the String inside the delimiter-pair
  input[(start_pos + 2)...end_pos]
end

Private Class Methods

encoded_chunk_size() click to toggle source

@return [Integer] Buffer size for encoded output

# File lib/ascii85.rb, line 453
def encoded_chunk_size
  5 * 2048
end
io_like?(obj) click to toggle source

Check if an object is IO-like

@private

# File lib/ascii85.rb, line 440
def io_like?(obj)
  obj.respond_to?(:read) &&
    obj.respond_to?(:eof?)
end
unencoded_chunk_size() click to toggle source

@return [Integer] Buffer size for to-be-encoded input

# File lib/ascii85.rb, line 447
def unencoded_chunk_size
  4 * 2048
end
unfrozen_binary_copy(str) click to toggle source

Copies the given String and forces the encoding of the returned copy to be Encoding::ASCII_8BIT.

# File lib/ascii85.rb, line 318
def unfrozen_binary_copy(str)
  str.dup.force_encoding(Encoding::ASCII_8BIT)
end