Length and Size of String in Encoding - C# #153
-
I wrote the above code. I have three questions:
(I can't upload an image here, so I uploaded it to an upload center; please see the image below from my console app.) Thanks in advance.
Replies: 2 comments 2 replies
-
With this line
you overwrite the encoding passed to the method, so UTF-8 is always used. If you remove this line, you'll get different results. In the string you pass, the arrow is a single character: 🡪 is one character, unlike ->, which is two characters. In UTF-8, M and y take one byte each and 🡪 takes four, which is why you see 6 bytes.
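Here is a minimal sketch of that difference. The original method isn't shown in this thread, so the CountBytes helper and the sample string "My🡪" are assumptions made for illustration. The point is only that the encoding argument is used as passed in, so each caller gets the byte count for the encoding it asked for.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical helper (not the code from the question): the encoding
// argument is used as passed in and is never reassigned inside the method.
static int CountBytes(string text, Encoding encoding) => encoding.GetByteCount(text);

string symbolArrow = "My🡪"; // assumed sample string with the single arrow symbol
string asciiArrow = "My->";  // the two-character arrow for comparison

// string.Length counts UTF-16 char values, so the symbol counts as 2 here.
Console.WriteLine(symbolArrow.Length);                               // 4
// StringInfo counts text elements, so the symbol counts as 1 here.
Console.WriteLine(new StringInfo(symbolArrow).LengthInTextElements); // 3

Console.WriteLine(CountBytes(symbolArrow, Encoding.UTF8));    // 6
Console.WriteLine(CountBytes(symbolArrow, Encoding.Unicode)); // 8
Console.WriteLine(CountBytes(symbolArrow, Encoding.ASCII));   // 4
Console.WriteLine(CountBytes(asciiArrow, Encoding.UTF8));     // 4
```

If the method reassigned its parameter to Encoding.UTF8, every call above would report the UTF-8 count regardless of the argument.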
-
When you write the preamble to the console, you don't access the bytes returned from encoding.GetPreamble(). Instead, you access the converted bytes. The GetBytes method doesn't add a preamble.

You can add the preamble yourself if you want to use it. For single strings you don't really need to do this. It's useful to prefix a complete text that you store on disk or send across the network, so it can easily be read again, also on other platforms.

In Chapter 18, Files and Streams (section Analyzing Text File Encodings, page 497), you can read about the different encodings and the reason for these byte order marks (BOM).

What you now get with ASCII is 4 bytes representing the 4 chars of the string: M, y, and the two chars (a surrogate pair) that .NET uses to store 🡪. M is translated to the ASCII capital letter M (77, 0x4D). y is translated to the small letter y (121, 0x79). The Unicode symbol 🡪 cannot be translated, so you get two question marks (63, 0x3F).

With the Unicode (UTF-16) encoding, 8 bytes are returned, which matches two bytes for each of these 4 chars. 77 (0x004D) is M. UTF-16 originally started with 2 bytes for each character. However, it turned out that 2 bytes are not always enough, so one character can take 2 bytes or 4 bytes. 🡪 takes 4 bytes here.

With UTF-8 you get one or more bytes for each character. That's why 6 bytes are returned for this string: one byte each for M and y, and four bytes for 🡪. Check the codepage layout for UTF-8.
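The preamble and the encoded content can be looked at separately, as in the sketch below. It assumes the string is "My🡪"; EncodeWithPreamble is a hypothetical helper name, not something from the book or the question. It only shows one way to prepend the bytes from GetPreamble() to the bytes from GetBytes().

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical helper: GetBytes never adds a preamble, so prepend it by hand.
static byte[] EncodeWithPreamble(string text, Encoding encoding) =>
    encoding.GetPreamble().Concat(encoding.GetBytes(text)).ToArray();

string sample = "My🡪"; // assumed sample string

foreach (Encoding enc in new[] { Encoding.ASCII, Encoding.Unicode, Encoding.UTF8 })
{
    byte[] preamble = enc.GetPreamble();   // the BOM; empty for ASCII
    byte[] content = enc.GetBytes(sample); // the converted characters only
    byte[] prefixed = EncodeWithPreamble(sample, enc);

    Console.WriteLine($"{enc.WebName}: preamble {preamble.Length} bytes, " +
                      $"content {content.Length} bytes, total {prefixed.Length} bytes");
    Console.WriteLine($"  content: {BitConverter.ToString(content)}");
}
```

With Encoding.ASCII the preamble is empty and the content is 4D-79-3F-3F. Encoding.Unicode has a 2-byte preamble (FF FE) and Encoding.UTF8 a 3-byte preamble (EF BB BF), each followed by the 8 or 6 content bytes described above.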