Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Showing 16 changed files with 105 additions and 2 deletions.
repository/Zinc-Character-Encoding-Core.package/ZnLossyUTF8Encoder.class/README.md
15 changes: 15 additions & 0 deletions

```text
I am ZnLossyUTF8Encoder.
I am a ZnUTF8Encoder.

I behave like my superclass but will not signal errors when I see illegal UTF-8 encoded input;
instead I will output a Unicode Replacement Character (U+FFFD) for each error.

In contrast to my superclass, I can read any random byte sequence, decoding both legal and illegal UTF-8 sequences.

Due to my stream-based design and usage, as well as my stateless implementation,
I will output multiple replacement characters when multiple illegal sequences occur.

My convenience method #decodeBytesSingleReplacement: shows how to decode bytes so that
only a single replacement character stands for any amount of illegal encoding between legal encodings.

Part of Zinc HTTP Components.
```
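The README's central claim (the superclass signals an error on illegal input, while this class substitutes U+FFFD and keeps going) can be checked in a Pharo playground. The sketch below is illustrative and not part of the commit; it reuses the input bytes of the commit's own `testLossyUTF8`, and it catches the strict failure generically as `Error` rather than assuming a specific Zinc exception class.

```smalltalk
| bytes strict lossy |
"$A (65) and $B (66) surround 160 (16rA0), a continuation byte
 that cannot start a UTF-8 sequence on its own."
bytes := #[65 160 66].

"The strict superclass refuses the input; catch whatever it signals."
strict := [ ZnUTF8Encoder new decodeBytes: bytes ]
	on: Error
	do: [ :error | #failed ].

"The lossy subclass substitutes U+FFFD for the bad byte and continues."
lossy := ZnLossyUTF8Encoder new decodeBytes: bytes.
```

Inspecting `strict` should show `#failed`, while `lossy` should be the three-character string `$A`, U+FFFD, `$B`.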
...y/Zinc-Character-Encoding-Core.package/ZnLossyUTF8Encoder.class/class/handlesEncoding..st
5 changes: 5 additions & 0 deletions

```smalltalk
accessing
handlesEncoding: string
	"Return true when my instances handle the encoding described by string"

	^ (self canonicalEncodingIdentifier: string) = 'utf8lossy'
```
...haracter-Encoding-Core.package/ZnLossyUTF8Encoder.class/class/knownEncodingIdentifiers.st
3 changes: 3 additions & 0 deletions

```smalltalk
accessing
knownEncodingIdentifiers
	^ #( utf8lossy )
```
...-Encoding-Core.package/ZnLossyUTF8Encoder.class/instance/decodeBytesSingleReplacement..st
22 changes: 22 additions & 0 deletions

```smalltalk
convenience
decodeBytesSingleReplacement: bytes
	"Decode bytes and return the resulting string.
	This variant of #decodeBytes: will only ever use a single
	replacement character for each run of consecutive illegal UTF-8 sequences"

	| byteStream replaced replacement char |
	byteStream := bytes readStream.
	replaced := false.
	replacement := self replacementCodePoint asCharacter.
	^ String streamContents: [ :stream |
		[ byteStream atEnd ] whileFalse: [
			char := self nextFromStream: byteStream.
			char = replacement
				ifTrue: [
					replaced
						ifFalse: [
							replaced := true.
							stream nextPut: replacement ] ]
				ifFalse: [
					replaced := false.
					stream nextPut: char ] ] ]
```
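Because the method compares each decoded character against the replacement character, it cannot tell a substituted U+FFFD from one that was legally encoded in the input (bytes EF BF BD): consecutive genuine U+FFFD characters are collapsed as well. A small playground sketch of that edge case, with input bytes chosen purely for illustration:

```smalltalk
| encoder bytes plain collapsed |
encoder := ZnLossyUTF8Encoder new.
"$A, two well-formed encodings of U+FFFD (EF BF BD each), then $B."
bytes := #[16r41 16rEF 16rBF 16rBD 16rEF 16rBF 16rBD 16r42].
"Plain decoding keeps both genuine U+FFFD characters: 4 characters."
plain := encoder decodeBytes: bytes.
"The single-replacement variant merges them into one: 3 characters."
collapsed := encoder decodeBytesSingleReplacement: bytes.
```

Whether this matters depends on the caller; for input that may legitimately contain U+FFFD, `#decodeBytes:` preserves it exactly.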
...r-Encoding-Core.package/ZnLossyUTF8Encoder.class/instance/errorIllegalContinuationByte.st
3 changes: 3 additions & 0 deletions

```smalltalk
error handling
errorIllegalContinuationByte
	^ self replacementCodePoint
```
...racter-Encoding-Core.package/ZnLossyUTF8Encoder.class/instance/errorIllegalLeadingByte.st
3 changes: 3 additions & 0 deletions

```smalltalk
error handling
errorIllegalLeadingByte
	^ self replacementCodePoint
```
...Zinc-Character-Encoding-Core.package/ZnLossyUTF8Encoder.class/instance/errorIncomplete.st
3 changes: 3 additions & 0 deletions

```smalltalk
error handling
errorIncomplete
	^ self replacementCodePoint
```
...nc-Character-Encoding-Core.package/ZnLossyUTF8Encoder.class/instance/errorOutsideRange.st
3 changes: 3 additions & 0 deletions

```smalltalk
error handling
errorOutsideRange
	^ self replacementCodePoint
```
...y/Zinc-Character-Encoding-Core.package/ZnLossyUTF8Encoder.class/instance/errorOverlong.st
3 changes: 3 additions & 0 deletions

```smalltalk
error handling
errorOverlong
	^ self replacementCodePoint
```
...tory/Zinc-Character-Encoding-Core.package/ZnLossyUTF8Encoder.class/instance/identifier.st
3 changes: 3 additions & 0 deletions

```smalltalk
accessing
identifier
	^ #utf8lossy
```
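Since the class answers `#utf8lossy` to `#identifier` and recognises that name in `#handlesEncoding:`, it should be usable wherever Zinc accepts an encoding name, not only through `ZnLossyUTF8Encoder new`. The sketch below assumes that `ZnCharacterReadStream class>>#on:encoding:` resolves its name argument through the same lookup as `#asZnCharacterEncoder` (the lookup the commit's test exercises); the input bytes are illustrative.

```smalltalk
| bytes stream text |
"$A, an illegal leading byte 16rA1, then $B."
bytes := #[16r41 16rA1 16r42].
"Wrap the raw byte stream in a character stream that decodes by encoding name."
stream := ZnCharacterReadStream on: bytes readStream encoding: 'utf8lossy'.
"Read everything; the illegal byte should come out as U+FFFD."
text := stream upToEnd.
```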
...Character-Encoding-Core.package/ZnLossyUTF8Encoder.class/instance/replacementCodePoint.st
5 changes: 5 additions & 0 deletions

```smalltalk
accessing
replacementCodePoint
	"Return the code point for the Unicode Replacement Character U+FFFD"

	^ 16rFFFD
```
repository/Zinc-Character-Encoding-Core.package/ZnLossyUTF8Encoder.class/properties.json
11 changes: 11 additions & 0 deletions

```json
{
	"commentStamp" : "<historical>",
	"super" : "ZnUTF8Encoder",
	"category" : "Zinc-Character-Encoding-Core",
	"classinstvars" : [ ],
	"pools" : [ ],
	"classvars" : [ ],
	"instvars" : [ ],
	"name" : "ZnLossyUTF8Encoder",
	"type" : "normal"
}
```
repository/Zinc-Character-Encoding-Core.package/monticello.meta/categories.st
2 changes: 1 addition & 1 deletion

```diff
@@ -1 +1 @@
-self packageOrganizer ensurePackage: #'Zinc-Character-Encoding-Core' withTags: #()!
+SystemOrganization addCategory: #'Zinc-Character-Encoding-Core'!
```
...c-Character-Encoding-Tests.package/ZnCharacterEncoderTest.class/instance/testLossyUTF8.st
18 changes: 18 additions & 0 deletions

```smalltalk
testing
testLossyUTF8
	| encoder replacement |
	encoder := ZnLossyUTF8Encoder new.
	self assert: #utf8lossy asZnCharacterEncoder equals: encoder.
	replacement := encoder replacementCodePoint asCharacter.
	self
		assert: (#[65 160 66] decodeWith: encoder)
		equals: ({ $A. replacement . $B } as: String).
	self
		assert: (#[16rE1 16rA0 16rC0] decodeWith: encoder)
		equals: replacement asString.
	self
		assert: (encoder decodeBytes: #[16r41 16rA1 16rA2 16rA3 16r42])
		equals: ({ $A. replacement . replacement . replacement . $B } as: String).
	self
		assert: (encoder decodeBytesSingleReplacement: #[16r41 16rA1 16rA2 16rA3 16r42])
		equals: ({ $A. replacement . $B } as: String).
```
...acter-Encoding-Tests.package/ZnCharacterEncoderTest.class/instance/testLossyUTF8Random.st
6 changes: 6 additions & 0 deletions

```smalltalk
testing
testLossyUTF8Random
	| bytes string |
	bytes := ((1 to: 10000) collect: [ :_ | 256 atRandom - 1 ]) asByteArray.
	string := bytes decodeWith: ZnLossyUTF8Encoder new.
	self assert: string isString
```
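`testLossyUTF8Random` only checks that arbitrary bytes never make the decoder fail. A complementary property, sketched here as a playground script and not part of the commit, is that on well-formed UTF-8 the lossy decoder agrees with the strict one and substitutes nothing. The sample strings are arbitrary choices that cover 1-, 2-, 3- and 4-byte encodings.

```smalltalk
| strict lossy samples |
strict := ZnUTF8Encoder new.
lossy := ZnLossyUTF8Encoder new.
"$c, U+00E9, U+20AC and U+1F600 need 1, 2, 3 and 4 bytes in UTF-8."
samples := {
	'plain ASCII'.
	(#(16r63 16rE9 16r20AC 16r1F600) collect: [ :each | each asCharacter ]) as: String }.
samples do: [ :each | | bytes |
	bytes := strict encodeString: each.
	"Well-formed input: no replacement characters, identical results."
	self assert: (lossy decodeBytes: bytes) = each.
	self assert: (lossy decodeBytes: bytes) = (strict decodeBytes: bytes) ].
```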
repository/Zinc-Character-Encoding-Tests.package/monticello.meta/categories.st
2 changes: 1 addition & 1 deletion

```diff
@@ -1 +1 @@
-self packageOrganizer ensurePackage: #'Zinc-Character-Encoding-Tests' withTags: #()!
+SystemOrganization addCategory: #'Zinc-Character-Encoding-Tests'!
```