Internet-Draft TODO - Abbreviation October 2025
Goncharov Expires 19 April 2026 [Page]
Workgroup:
Network Working Group
Internet-Draft:
draft-ietf-goncharov-rfcregsimples-latest
Published:
Intended Status:
Informational
Expires:
Author:
V. Goncharov
Your Organization Here

CBOR Simple Values Range for Packing and Templating

Abstract

TODO Updates: 8949 Category: Draft Standard

The Concise Binary Object Representation (CBOR, RFC 8949 == STD 94) is a data format whose design goals include the possibility of extremely small code size, fairly small message size, and extensibility without the need for version negotiation.

CBOR does not provide any forms of data compression. While traditional data compression techniques such as DEFLATE (RFC 1951) can work well for CBOR encoded data items, their disadvantage is that the recipient needs to decompress the compressed form to make use of the data.

This document describes a set of CBOR simple values which can be used by different specifications utilizing them for CBOR transformations, such as compression or templating, in a non-conflicting way. This allows for future standards to reuse the smallest values range while inventing better ways to use them for achieving their goals.

About This Document

This note is to be removed before publishing as an RFC.

The latest revision of this draft can be found at https://nuclight.github.io/cbor-spec-rfcregsimples/draft-ietf-goncharov-rfcregsimples.html. Status information for this document may be found at https://datatracker.ietf.org/doc/draft-ietf-goncharov-rfcregsimples/.

Source for this draft and an issue tracker can be found at https://github.com/nuclight/cbor-spec-rfcregsimples.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 19 April 2026.

Table of Contents

1. Terminology and Conventions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

TBD any other terminology needed here?..

2. Motivation

In a world of constrained devices (IoT) such devices are often have very little memory. While these devices are also constrained in e.g. packet size so that compression is very desirable. Traditional generic compression algorithms like DEFLATE [RFC1951] perform well but require memory both for decompressed data and internal state of decompressor itself. Many constrained implementations would want to trade-off between compression ratio and required memory - tolerating worse compression by lowering memory usage, ideally to zero bytes (that is, using compressed data in-place as-is). In the ongoing IETF efforts the "MyLED" example in JSON [MyLED] takes 1210 bytes converted to CBOR which may be further reduced to about 350 bytes using DEFLATE, but this require more than 1.5 Kbytes of memory during decompression - in contrast, work in progress specifications like [draft-cbor-packed] and [CBAR] are able to reduce this example to about 500 bytes, however accessible in-place, without need for decompression buffer at all.

Sixteen simple values were intended to be used for compression purposes in the original CBOR proposal, but didn't get into the standard at the time being due to the complexity of the task. Since then, more than one proposal for compression techniques has appeared. While the usual way for Internet specifications is to acquire different codepoints (or ranges of them) from IANA, this does not work well in the compression world given the specifics of CBOR encoding - that is, different numbers are encoded in CBOR occupying different number of bytes. Compression efficiency, however, is very sensitive to length of codes used - the shorter is better. Thus, registration of short codes range for one standard exclusively would put other standards - especially future standards - into inequal and unfair position if there are no more short codes left for them. However, it is known that future inventions tend to be more effective in usage of same resources but are impossible to predict at the time of prior standard issued (otherwise enhancements would be already incorporated into it). As (at the time of writing) there are only 20 single-byte CBOR simple values available for registration, such situation can happen with just the first standard accepted.

Therefore, this document registers first sixteen CBOR simple values (0 to 15) in such a way that they could be used by different compression methods (different specifications), both existing and not yet invented. The way for decoder to distinguish them - that is, to interpret according to one specification or another - is done via CBOR Tag (acting as a "namespace") or by external means such as messages' media type.

This range registration is similar in spirit to [RFC 4251] allocating a range of message numbers e.g. 30 to 49 to be key exchange method specific, stating that numbers can be reused for different authentication methods - and examples of specifications using such range are [RFC 4419], [RFC 4432] or [RFC 5656].

3. Specification

This document registers CBOR values simple(0) to simple(15) for needs like packing or templating CBOR documents.

Following spirit of [RFC 4251], this specification could just declare this range to be packing/templating method specific (so text could end right here), but this would not be very useful to CBOR parsers implementers in terms of what could be expected from these values or what kind of changes may specific packing/templating methods require in their parsers. So in the text below we take slightly another approach.

This specification tries, speaking in object-oriented terms, to be the "Abstract Base Class" for other specifications to "inherit", in the sense of describing meaning as broad as possible, but concrete specifications do not need to implement every imaginable feature and MAY choose to define only subset.

Application protocols using CBOR simple values in 0..15 range, or generic packing/templating specifications, should include reference to this RFC about using range and then describe precise usage of these simple values.

The semantics of these simple values 0 to 15 is that they are substitutions for one or several other CBOR items. That is, when decoder sees values simple(0) to simple(15), it treats them as some mapping function (0 to 15 may be viewed as argument here) which takes argument and, depending on current decoder state, returns CBOR item(s) to substitute instead of simple value, as if simple value never occured in the stream, but returned CBOR item(s) occured instead.

Such mapping function may be very complex, taking into account even surrounding CBOR items context to decide. Or (especially expected for short-term solutions) "inheriting" specification may choose it to be rather simple, for example, simple(0)..simple(15) could be references (indexes) to some (kept by decoding process) table, or part of such table, so that instead of simple value an integral number of CBOR items is substituted from table entry. Note that this replacement items do not necessarily form a complete well-formed CBOR item if viewed separately as entry in the table. For example, a sequence

  [..., 1, 2, simple(2), "a", "b", "c", ...]

could expand to

  [..., 1, 2, ["foo", "bar", "a"], "b", "c", ...]

if the entry for simple(2) at this moment contained definite-length array start item (0x83) and two string items "foo" and "bar".

(Informally speaking, this could be described as "cut" in text editor a whole number of tokens (counting each string, brace, etc. as one token) in CBOR diagnostic notation and then "paste" it instead of each simple(N) occurrence).

Please note that "surrounding CBOR items context" (if specification goes complex way) may include e.g. absolute or relative position of simple value. In the (imaginary non-normative) example:

  10([simple(0), 16, "atom1",
     {
       "foo": simple(0),
       "bar": 10([simple(0), 16, "otheratom",
                 {
                    "baz": simple(0),
                    "quux": 10(122(0))
                 }
              ])
     }
  ])

the simple(0) at the very start of array expands to "function-name", but same simple(0) as the map value expands to (presumably) "atom1" under "foo" key and to "otheratom" under "baz" key. That is, if packing method specification chooses "index in table" approach, there could be several such tables.

Even more, "surrounding context" usage is not prohibited from explicit arguments and their "swallowing" after substitution, for example, in

  [..., 1, 2, "arg1", "arg2", simple(2), 3, ...]

or

  [..., 1, 2, simple(2), "arg1", "arg2", 3, ...]

simple(2) could be viewed as a function call telling to use 2 arguments from surrounding CBOR items, and then do something based on "arg1" and "arg2", so probably the whole triple will be replaced by function return value, not just simple(N) itself:

  [..., 1, 2, {"substituted": "sequence"}, ['frobnicate'], 3, ...]

However, such advanced usage is NOT RECOMMENDED for generic specifications because it is hard to do it right with generic CBOR parsers if such parsers are streaming / event-based. (This may be not true for application-specific parsers in application-specific packing/templating methods.)

Therefore, as usage of simple(0)..simple(15) can modify structure of CBOR document, they SHOULD be treated as error if used outside of an area where table (or part of such table) is not set up, e.g. outside of "namespace" tag or implied by media type. Exact mechanism to setup such tables (or modify it so that same simple value may expand to different content in different parts of CBOR stream) is left for definition by application protocol, or an application protocol may "inherit" it (implemented in a library) from generic specification from appropriate IETF RFC document (such as [draft-cbor-packed]).

3.1. Interoperability

Note that it is straightforward to have usage of simple values of different compression/templating specifications in the non-overlapping areas of a single CBOR document, because the primary method for specifications to manifest themselves is via using some CBOR Tag(s) as a "namespace" ("area"), like in this example:

  {
    "foo": 55510([..., simple(1), /meaning by spec 1/ ...]),
    "bar": 55513([..., simple(1), /meaning by spec 2/ ...])
  }

(the tag numbers here are fictitious to be used just as an example).

Tag nesting, however, which could be used for things like providing output of one decompression phase (by one specification) as input for another (by same or different specification) is more complicated case details of which are left out from this document to be defined by actual specifications. It is noted here for specifications' authors that possibility of such multi-phase (nested) processing is the reason for a "should" requirement above instead of "must" because simple values (of 0..15 range) produced as a result one phase may be fed to a different decoder.

Here, only some basic principle for tag nesting is defined: packing method tag MUST NOT be used as outer if it does not support preserving non-own simple values after applying it's transformations (unpacking). This is best illustrated by example: suppose that for imaginary tags above, specification of tag 55510 supports such preserving (in example, it will list integers corresponding to simple numbers and values corresponding to them) by leaving them as-is, and specification of tag 55513 uses simplistic implementation where simple(N) is just index N into table. Then, the following example is valid:

    55510([10, "foo", 11, "bar", 12, "baz", /* outer tag's setup / / outer tag's transformed data area / 55513([ / inner tag's / ["rgbValue", "rgbValueRed", "rgbValueGreen"], / setup / [/ inner tag's transformed data area */ simple(0), simple(1), simple(2), simple(10), simple(0), simple(1), simple(2), simple(11), simple(0), simple(1), simple(2), simple(12) ] ]) ])

As tags are processed from outside to inside, decoder of tag 55510 will make this fragment equivalent to:

  55513([                                    /* inner tag's */
     ["rgbValue", "rgbValueRed", "rgbValueGreen"], /* setup */
     [/* inner tag's transformed data area */
        simple(0), simple(1), simple(2), "foo",
        simple(0), simple(1), simple(2), "bar",
        simple(0), simple(1), simple(2), "baz"
     ]
  ])

which decoder of tag 55513 finally turns into unpacked form:

  [
     "rgbValue", "rgbValueRed", "rgbValueGreen", "foo",
     "rgbValue", "rgbValueRed", "rgbValueGreen", "bar",
     "rgbValue", "rgbValueRed", "rgbValueGreen", "baz"
  ]

However, the opposite is not possible because specification of tag 55513 does not make exceptions. The following example is NOT valid:

  55513([                                    /* outer tag's */
     ["rgbValue", "rgbValueRed", "rgbValueGreen"], /* setup */
     /* outer tag's transformed data area */
     55510([10, "foo", 11, "bar", 12, "baz", /* inner setup */
        [/* inner tag's transformed data area */
           simple(0), simple(1), simple(2), simple(10),
           simple(0), simple(1), simple(2), simple(11),
           simple(0), simple(1), simple(2), simple(12)
        ]
     ])
  ])

3.2. Validity checking.

Please note that the whole concept of CBOR document transformation, be it packing or templating, means that, in general case, it is not safe to do validity-checks even on known tags as described by Section 5.3.2 of [RFC 8949]. For example, Tag 32 means URI and requires text string as a content. However, the goal of packing is to reduce redundant information, which may mean replacing such text string with simple value, raising possibility of the following example:

  55513([..., 32(simple(1)), ...]

which would be considered as error by validity-checking decoder.

Note that Section 5.4 of [RFC 8949] does not cover this case, because it is focused on evolution, however, with packing mechanisms this may occur with not new but already known, stable tags.

Therefore, this document updates Section 5.4 of [RFC 8949] dictating that generic decoders performing validity-checking MUST provide a way to disable this validity-checking. Note that a mode, where decoder knows packing method and postpones validity-checking after unpacking, is possible - but is not sufficient because newer specifications, yet unknown to such decoder, may appear in the future after it's deployment.

4. IANA Considerations

In the registry "CBOR Simple Values" [IANA.cbor-simple-values], IANA is requested to allocate the simple values defined in Table 1.

   +=======+===================================+===========+
   | Value | Semantics                         | Reference |
   +=======+===================================+===========+
   | 0..15 | Packing and templating: shared by | RFC 9NNN  |
   |       | multiple specifications           |           |
   +-------+-----------------------------------+-----------+

                      Table 1: Simple Values

5. Security Considerations

The security considerations of [STD94] apply.

If the specification chooses to implement generic non-well-formed entries in table (modifying CBOR structure as in example above), care must be taken to avoid implementation bugs potentially leading to out-of-bounds accesses.

Specifications must decide what to do with possibility of decoding loops or infinite recursion, for example, if table entry for simple(M) includes another simple(N) values which then may be expanded recursively. Possible ways are either to not expand it at all, allowing for aforementioned multi-phase processing, or provide counter-measures similar to symlink processing in filesystems, or something else.

6. References.

TODO move it in YAML where appropriate

``` 6.1. Normative References

[BCP14] Best Current Practice 14, https://www.rfc-editor.org/info/bcp14. At the time of writing, this BCP comprises the following:

          Bradner, S., "Key words for use in RFCs to Indicate
          Requirement Levels", BCP 14, RFC 2119,
          DOI 10.17487/RFC2119, March 1997,
          <https://www.rfc-editor.org/info/rfc2119>.

          Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
          2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
          May 2017, <https://www.rfc-editor.org/info/rfc8174>.

[IANA.cbor-simple-values] IANA, "Concise Binary Object Representation (CBOR) Simple Values", https://www.iana.org/assignments/cbor-simple-values.

[STD94] Internet Standard 94, https://www.rfc-editor.org/info/std94. At the time of writing, this STD comprises the following:

          Bormann, C. and P. Hoffman, "Concise Binary Object
          Representation (CBOR)", STD 94, RFC 8949,
          DOI 10.17487/RFC8949, December 2020,
          <https://www.rfc-editor.org/info/rfc8949>.

[I-D.ietf-cbor-packed] Bormann, C. and M. Gütschow, "Packed CBOR", Work in Progress, Internet-Draft, draft-ietf-cbor-packed-13, 1 September 2024, <https://datatracker.ietf.org/doc/html/ draft-ietf-cbor-packed-13>.

6.2. Informative References

[RFC1951] Deutsch, P., "DEFLATE Compressed Data Format Specification version 1.3", RFC 1951, DOI 10.17487/RFC1951, May 1996, https://www.rfc-editor.org/rfc/rfc1951.

[RFC4251] Ylonen, T. and C. Lonvick, Ed., "The Secure Shell (SSH) Protocol Architecture", RFC 4251, January 2006.

[RFC4419] Friedl, M., Provos, N., and W. Simpson, "Diffie- Hellman Group Exchange for the Secure Shell (SSH) Transport Layer Protocol", RFC 4419, March 2006.

[RFC4432] Harris, B., "RSA Key Exchange for the Secure Shell (SSH) Transport Layer Protocol", RFC 4432, March 2006.

[RFC5656] Stebila, D. and J. Green, "Elliptic Curve Algorithm Integration in the Secure Shell Transport Layer", RFC 5656, December 2009.

[MyLED] <https://github.com/w3c/wot-thing-description/raw/ db8abb3655afc7f149db7976ba4e79149619f537/test-bed/data/ plugfest/2017-05-osaka/MyLED_f.jsonld>

[CBAR] Goncharov, V., "CBOR & generic BLOB by-Atoms Reducing", Work in Progress, https://github.com/nuclight/musctp/blob/main/cbar.txt

  1. Authors' Addresses

    Carsten Bormann Universität Bremen TZI Postfach 330440 D-28359 Bremen Germany Phone: +49-421-218-63921 Email: cabo@tzi.org

    Vadim Goncharov Email: vadimnuclight@gmail.com ```

7. Normative References

[RFC2119]
Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, , <https://www.rfc-editor.org/rfc/rfc2119>.
[RFC8174]
Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, , <https://www.rfc-editor.org/rfc/rfc8174>.

Acknowledgments

TODO acknowledge.

Author's Address

Vadim Goncharov
Your Organization Here