Deno - UTF-8 Encoding & Decoding Examples

This tutorial shows you how to perform UTF-8 encoding & decoding in Deno.

UTF-8 is a variable-width character encoding. It's the most common encoding on the World Wide Web, and it can encode all 1,112,064 valid code points in Unicode.
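The figure 1,112,064 comes from the size of the Unicode code space. There are 17 planes of 65,536 code points each (U+0000 to U+10FFFF). From that total, subtract the 2,048 surrogate code points U+D800 to U+DFFF, which are reserved for UTF-16 and cannot be encoded in UTF-8:

$$17 \times 2^{16} - 2048 = 1{,}114{,}112 - 2{,}048 = 1{,}112{,}064$$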

UTF-8 encodes each character into one to four bytes, depending on the character's code point. Characters with lower code points are encoded to fewer bytes, and these include the most frequently used ones, such as ASCII letters and digits. The table below shows the byte patterns. The x positions are filled with the bits of the code point. For example, if a character's code point is in the U+0000 ~ U+007F range, it is encoded to one byte. If the code point is in the U+0800 ~ U+FFFF range, it is encoded to three bytes.

Number of bytes  Code point range      Byte 1    Byte 2    Byte 3    Byte 4
1                U+0000 ~ U+007F       0xxxxxxx
2                U+0080 ~ U+07FF       110xxxxx  10xxxxxx
3                U+0800 ~ U+FFFF       1110xxxx  10xxxxxx  10xxxxxx
4                U+10000 ~ U+10FFFF    11110xxx  10xxxxxx  10xxxxxx  10xxxxxx
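To make the bit layout concrete, here is a minimal sketch that packs a single code point into bytes following the table row by row. The function name encodeCodePoint is hypothetical and the code is for illustration only. In real code, use the built-in TextEncoder described later in this tutorial. The sketch rejects surrogate code points (U+D800 to U+DFFF) because UTF-8 cannot encode them.

```typescript
// Hypothetical helper: encodes one Unicode code point to UTF-8 bytes by
// filling the "x" bits of the matching row in the table above.
function encodeCodePoint(cp: number): number[] {
  if (cp < 0 || cp > 0x10ffff || (cp >= 0xd800 && cp <= 0xdfff)) {
    throw new RangeError(`Not an encodable code point: ${cp}`);
  }
  if (cp <= 0x7f) {
    // 1 byte: 0xxxxxxx
    return [cp];
  }
  if (cp <= 0x7ff) {
    // 2 bytes: 110xxxxx 10xxxxxx (top 5 bits, then low 6 bits)
    return [0xc0 | (cp >> 6), 0x80 | (cp & 0x3f)];
  }
  if (cp <= 0xffff) {
    // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    return [0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f)];
  }
  // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
  return [
    0xf0 | (cp >> 18),
    0x80 | ((cp >> 12) & 0x3f),
    0x80 | ((cp >> 6) & 0x3f),
    0x80 | (cp & 0x3f),
  ];
}

console.log(encodeCodePoint(0x77)); // [ 119 ]
console.log(encodeCodePoint(0xf0)); // [ 195, 176 ]
console.log(encodeCodePoint(0x10348)); // [ 240, 144, 141, 136 ]
```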

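As an example, let's encode the string 'wð𐍈lhå' using UTF-8. For each character, get its code point and find the matching row in the table above to see how many bytes it needs. Then fill the x positions of that row, from left to right, with the bits of the code point, padded with leading zeros. For example, ð (U+00F0, binary 11110000) falls in the two-byte row. Padded to 11 bits (00011110000) and split 5 + 6, it becomes 110 00011 and 10 110000, that is 11000011 10110000. The resulting bytes are shown below in binary and in decimal.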

Character  Code point  Code point (binary)    UTF-8 bytes (binary)                  UTF-8 bytes (decimal)
w          U+0077      1110111                01110111                              119
ð          U+00F0      11110000               11000011 10110000                     195, 176
𐍈          U+10348     000010000001101001000  11110000 10010000 10001101 10001000   240, 144, 141, 136
l          U+006C      1101100                01101100                              108
h          U+0068      1101000                01101000                              104
å          U+00E5      11100101               11000011 10100101                     195, 165
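The rows of this table can also be computed. The sketch below uses a hypothetical helper, describeUtf8, which is not part of Deno or its std. It iterates the string by code point with for...of, which does not split surrogate pairs, formats each code point with codePointAt and toString, and gets each character's bytes from the built-in TextEncoder. Note that toString(2) prints the binary value without the leading zeros shown for 𐍈 in the table.

```typescript
// One row of the worked example table.
interface Utf8Row {
  char: string;
  codePoint: string; // e.g. "U+00F0"
  binary: string; // code point in base 2, no zero padding
  utf8: number[]; // UTF-8 bytes as decimal numbers
}

// Hypothetical helper: builds one row per code point of `input`.
function describeUtf8(input: string): Utf8Row[] {
  const encoder = new TextEncoder();
  const rows: Utf8Row[] = [];
  for (const char of input) {
    const cp = char.codePointAt(0)!;
    rows.push({
      char,
      codePoint: "U+" + cp.toString(16).toUpperCase().padStart(4, "0"),
      binary: cp.toString(2),
      utf8: Array.from(encoder.encode(char)),
    });
  }
  return rows;
}

for (const row of describeUtf8("wð𐍈lhå")) {
  console.log(row.char, row.codePoint, row.binary, row.utf8.join(", "));
}
```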

To perform UTF-8 encoding and decoding in Deno, you don't have to implement the encode and decode functions yourself. Deno has TextEncoder and TextDecoder for that purpose. The usage examples are shown below.

Using TextEncoder and TextDecoder

Encode Using TextEncoder

TextEncoder has a method named encode that runs the UTF-8 encoder on a string and returns the resulting bytes as a Uint8Array.

  encode(input?: string): Uint8Array;

Example:

  const textEncoder = new TextEncoder();
  const encodedValue = textEncoder.encode('wð𐍈lhå');
  console.log(`encodedValue: ${encodedValue}`);

Output:

  encodedValue: 119,195,176,240,144,141,136,108,104,195,165
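Because encode returns bytes rather than characters, the length of its result usually differs from the length of the string. Here is a short sketch comparing three counts for the example string. String length counts UTF-16 code units, Array.from splits a string into code points, and the Uint8Array length counts UTF-8 bytes.

```typescript
const text = "wð𐍈lhå";
const bytes = new TextEncoder().encode(text);

// UTF-16 code units: 𐍈 is stored as a surrogate pair, so it counts twice.
console.log(text.length); // 7
// Unicode code points: Array.from iterates a string by code point.
console.log(Array.from(text).length); // 6
// UTF-8 bytes, matching the table: 1 + 2 + 4 + 1 + 1 + 2.
console.log(bytes.length); // 11
```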

TextEncoder also has a method named encodeInto. It encodes source and writes the result into destination, a Uint8Array that you allocate yourself. It returns an object with two fields: read, the number of UTF-16 code units of source that were converted, and written, the number of bytes written to destination.

  encodeInto(source: string, destination: Uint8Array): TextEncoderEncodeIntoResult;

Example:

  const textEncoder = new TextEncoder();
  const bytes = new Uint8Array(64);
  const result = textEncoder.encodeInto('wð𐍈lhå', bytes);
  console.log(bytes);
  console.log(result);

Output:

  Uint8Array(64) [
    119, 195, 176, 240, 144, 141, 136, 108, 104, 195, 165, 0, 0,
      0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0, 0, 0,
      0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0, 0, 0,
      0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0, 0, 0,
      0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0, 0
  ]
  { read: 7, written: 11 }
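The read and written fields are what make encodeInto useful with a fixed-size buffer. encodeInto never writes part of a character. When the next character doesn't fit, it stops, so read tells you where to resume in the string and written tells you how much of the buffer is filled. The following is a minimal sketch that assumes you want the output as a list of chunks of at most chunkSize bytes. The function name encodeInChunks is hypothetical. Destructuring with defaults is used because some TypeScript lib versions declare read and written as optional.

```typescript
// Hypothetical helper: encodes `input` to UTF-8 in chunks of at most
// `chunkSize` bytes, never splitting a character across chunks.
function encodeInChunks(input: string, chunkSize: number): Uint8Array[] {
  const encoder = new TextEncoder();
  const chunks: Uint8Array[] = [];
  let rest = input;
  while (rest.length > 0) {
    const buffer = new Uint8Array(chunkSize);
    const { read = 0, written = 0 } = encoder.encodeInto(rest, buffer);
    if (read === 0) {
      // The next character needs more bytes than chunkSize (up to 4).
      throw new RangeError("chunkSize is too small for the next character");
    }
    // Keep only the bytes that were actually written.
    chunks.push(buffer.subarray(0, written));
    // `read` counts UTF-16 code units, which is what slice() expects.
    rest = rest.slice(read);
  }
  return chunks;
}

console.log(encodeInChunks("wð𐍈lhå", 4).map((chunk) => chunk.length)); // [ 3, 4, 4 ]
```

The first chunk holds only 3 bytes because 𐍈 needs 4 bytes and only 1 is left, so it moves to the next chunk whole.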

Decode Using TextDecoder

TextDecoder's decode function can be used to decode a UTF-8 encoded value into a string.

  decode(input?: BufferSource, options?: TextDecodeOptions): string;

Example:

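  // Bytes produced the same way as in the encode example.
  const textEncoder = new TextEncoder();
  const encodedValue = textEncoder.encode('wð𐍈lhå');

  const textDecoder = new TextDecoder();
  const decodedValue = textDecoder.decode(encodedValue);
  console.log(`decodedValue: ${decodedValue}`);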

Output:

  decodedValue: wð𐍈lhå
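The options parameter in the decode signature matters when bytes arrive in pieces, for example from a network stream. If a piece ends in the middle of a multi-byte character, decoding each piece separately replaces the broken sequence with U+FFFD. Passing { stream: true } tells the decoder to hold the incomplete bytes until the next call. The final call without the option flushes any remaining state. Here is a small sketch, where the split point (after 5 bytes) is chosen only to cut 𐍈 in half:

```typescript
const encoded = new TextEncoder().encode("wð𐍈lhå");
// Split the 11 bytes so that the 4-byte sequence for 𐍈 is cut in half.
const part1 = encoded.subarray(0, 5); // 119, 195, 176, 240, 144
const part2 = encoded.subarray(5); // 141, 136, 108, 104, 195, 165

// Decoding each part on its own turns the broken sequence into U+FFFD.
const naive = new TextDecoder().decode(part1) + new TextDecoder().decode(part2);

// With { stream: true }, the decoder keeps the trailing partial bytes.
const streamingDecoder = new TextDecoder();
const streamed =
  streamingDecoder.decode(part1, { stream: true }) +
  streamingDecoder.decode(part2); // no option: flush at the end

console.log(naive); // contains replacement characters
console.log(streamed); // wð𐍈lhå
```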

Using Deno std UTF-8 Module

The above solution requires creating TextEncoder and TextDecoder instances in each function or file where you want to encode or decode. That is repetitive and creates extra objects if you need these operations in many files. A simpler approach is to create each instance once and reuse it in other files. The utf8 module of Deno std already implements that approach. To use the module, import its functions and re-export them from a deps.ts file. This follows the common Deno convention of keeping remote imports in one place.

deps.ts

  import {
    decode as utf8Decode,
    encode as utf8Encode,
  } from 'https://deno.land/std@0.82.0/encoding/utf8.ts';

  export { utf8Decode, utf8Encode };

Then, use it in another file.

  import { utf8Decode, utf8Encode } from './deps.ts';

  const encodedValue = utf8Encode('wð𐍈lhå');
  console.log(`encodedValue: ${encodedValue}`);

  const decodedValue = utf8Decode(encodedValue);
  console.log(`decodedValue: ${decodedValue}`);

If you don't want to import the remote module, you can use it as a reference to implement a similar approach.
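Here is a minimal sketch of such a local module, assuming a hypothetical file named utf8.ts in your project. It is written in the spirit of the std module, not copied from it. The encoder and decoder are created once, when the module is first loaded, and every file that imports the functions shares them.

```typescript
// utf8.ts (hypothetical local module).
// Module-level instances: created once and shared by all importers.
const encoder = new TextEncoder();
const decoder = new TextDecoder();

/** Encodes a string to UTF-8 bytes. */
export function utf8Encode(input: string): Uint8Array {
  return encoder.encode(input);
}

/** Decodes UTF-8 bytes back into a string. */
export function utf8Decode(input: Uint8Array): string {
  return decoder.decode(input);
}

// Usage from another file (hypothetical path):
//   import { utf8Decode, utf8Encode } from './utf8.ts';
```

Sharing one decoder works here because each non-streaming decode call starts from a fresh state. A decoder used with { stream: true } carries state between calls, so it is safer not to share that one.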

Summary

That's how to perform UTF-8 encoding and decoding in Deno. Use TextEncoder to encode a string to UTF-8 and TextDecoder to decode UTF-8 bytes back into a string. To avoid creating new instances everywhere, you can share the same TextEncoder and TextDecoder instances across files, for example through the utf8 module of Deno std.
