[cmucl-imp] string-to-octets

Raymond Toy toy.raymond at gmail.com
Sun Oct 30 17:30:26 CET 2011


On 10/30/11 12:04 AM, Helmut Eller wrote:
> * Raymond Toy [2011-10-30 03:28] writes:
>
>> On 10/29/11 4:18 AM, Helmut Eller wrote:
>>> I'm wondering how STRING-TO-OCTETS can be used with a fixed sized byte
>>> buffer.  For example we want to write a long string to a byte stream
>>> using a fixed size byte buffer.  STRING-TO-OCTETS seems to return the
>>> buffer and the number of bytes written.  E.g.
>>>
>>> (let ((buffer (make-array 20 :element-type '(unsigned-byte 8)))
>>>       (string (make-string 100  :initial-element #\a)))
>>>   (stream:string-to-octets string :external-format :utf32 :buffer buffer))
>>>
>>>
>> Is this better:
>>
>> (let ((buffer (make-array 19 :element-type '(unsigned-byte 8)))
>>            (string (make-string 100  :initial-element #\u+3b2)))
>>        (multiple-value-bind (b p i last)
>>            (stream:string-to-octets string :external-format :utf8
>>                                     :buffer buffer)
>>          (values b p i last)))
>>
>> #(206 178 206 178 206 178 206 178 206 178 206 178 206 178 206 178 206
>> 178 206)
>> 19
>> 9
>> 18
>>
>> 9 is the number of characters converted, 18 is the index+1 in the buffer
>> where the last valid octet was placed.  The last octet is the first
>> octet of the 2-octet utf8 encoding for #\u+3b2.
> Is 19 the number of octets written?  Or is it an index?
> Might be nice to either use counts or indexes consistently.
>
> I'm not sure that I understand the purpose of the 18.  Is this something
> that is needed later?

Yeah, that was kind of messy.  This is what I currently have:

* (let ((buffer (make-array 20 :element-type '(unsigned-byte 8)))
        (string (make-string 100 :initial-element #\u+f012)))
    (stream:string-to-octets string :external-format :utf8
                             :buffer buffer))

#(239 128 146 239 128 146 239 128 146 239 128 146 239 128 146 239 128 146
  239 128)
18
6

So 18 is the number of valid octets actually written.  (The last two
octets in the buffer, 239 and 128, are the start of a seventh character
whose 3-octet encoding did not fit.)  The 6 is the number of
characters consumed to produce those 18 octets.
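
To tie this back to the original question, here is a rough sketch of how
those return values could drive a write through a fixed-size buffer.  It
is untested and rests on two assumptions: that STRING-TO-OCTETS accepts a
:START keyword, and that it returns the buffer, the octet count and the
character count as shown above.  WRITE-STRING-IN-CHUNKS is a made-up name
for illustration, not something in CMUCL.

```lisp
;; Sketch only.  Assumes STRING-TO-OCTETS takes :START and returns
;; (values buffer octets-written chars-consumed) as described above.
(defun write-string-in-chunks (string byte-stream
                               &key (external-format :utf8) (buffer-size 20))
  "Hypothetical helper: encode STRING to BYTE-STREAM via a fixed-size buffer."
  (let ((buffer (make-array buffer-size :element-type '(unsigned-byte 8)))
        (start 0)
        (end (length string)))
    (loop while (< start end)
          do (multiple-value-bind (buf octets chars)
                 (stream:string-to-octets string
                                          :start start
                                          :external-format external-format
                                          :buffer buffer)
               ;; If not even one character fits, the loop would never advance.
               (when (zerop chars)
                 (error "Buffer of ~D octets cannot hold one encoded character."
                        buffer-size))
               ;; Write only the valid octets; skip any trailing partial encoding.
               (write-sequence buf byte-stream :end octets)
               ;; Resume at the first character that was not fully encoded.
               (incf start chars)))))
```

With the 20-octet buffer and the #\u+f012 string above, each pass would
write 18 octets and advance by 6 characters.  Formats that emit a byte
order mark or keep state between calls may need more care than this
sketch takes.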

For the case where no buffer is specified, a new buffer is created and
the second return value is the number of octets written (same as the
buffer length), and the third value is the number of characters (length
of the string).

Is that better?

Ray


