@shortstories @Pi_rat (ASCII)
It's just a table that defines letter to number mappings.
He is not implying that not using it reduces number of lines of code, nor even that not using it reduces bloat; it's Terry, he changes topic pretty quickly.
ASCII *is* somewhat inefficient, and unicode inherited it, but it made sense at the time. The alteration I propose actually makes text longer, but fixes a large number of other problems.

**March 16th, The Hatkeshiator** @[email protected] · Dec 26, 2025, 07:36 *

@Zergling_man @Pi_rat @shortstories /projects/programming/newþur/replacements/eanrsam on my eepsite

**GNU/翠星石** @[email protected] · Dec 26, 2025, 12:00

@Zergling_man @Pi_rat @shortstories UTF-8 is very efficient, as all ASCII is 8 bits and characters are at most 4 bytes (meanwhile UTF-16 is 2 or 4 bytes and UTF-32 is always 4 bytes).
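
Those sizes are easy to check. A minimal sketch using Python's built-in codecs; the sample characters are arbitrary picks for illustration, not something from the thread:

```python
# One character from each UTF-8 length class (1, 2, 3 and 4 bytes).
samples = ["A", "\u00e9", "\u77f3", "\U0001f600"]  # A, e-acute, a CJK ideograph, an emoji

def encoded_sizes(ch):
    """Return the byte length of one character in UTF-8, UTF-16 and UTF-32."""
    # The "-le" codec variants are used so that no byte order mark is counted.
    return tuple(len(ch.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le"))

for ch in samples:
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))
```

ASCII stays at one byte in UTF-8, while UTF-32 spends four bytes on everything.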

**Zergling_man - fedicon 2026 @ C109** @[email protected] · Dec 26, 2025, 12:01

@Suiseiseki @shortstories @Pi_rat I am not advocating UTF-16 or anything. I am merely, once again, begging to replace the capital letters with an iscapital combining character.

**πρωτος** @[email protected] · Dec 26, 2025, 12:06 *

@Zergling_man @Pi_rat @Suiseiseki @shortstories this is especially important for other languages. case insensitive ascii search is just a flag in sqlite, case insensitive Кириллик is non-trivial, and this wouldn't be so if it worked that way
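
A rough illustration of that gap, assuming Python's bundled `sqlite3` module and SQLite's built-in `NOCASE` collation (which only folds the 26 ASCII letters). The helper name `nocase_equal` and the sample words are made up for this sketch:

```python
import sqlite3

con = sqlite3.connect(":memory:")

def nocase_equal(a, b):
    """Compare two strings using SQLite's built-in NOCASE collation."""
    row = con.execute("SELECT ? = ? COLLATE NOCASE", (a, b)).fetchone()
    return bool(row[0])

# ASCII: the "flag" is enough.
print(nocase_equal("SQLITE", "sqlite"))

# Cyrillic: NOCASE does not fold these letters, so the comparison fails.
print(nocase_equal("\u0416\u0423\u041a", "\u0436\u0443\u043a"))

# Outside the database, full Unicode case folding needs its own mapping table;
# Python exposes it as str.casefold().
print("\u0416\u0423\u041a".casefold() == "\u0436\u0443\u043a".casefold())
```

With a separate capital marker, the second comparison would reduce to stripping one marker code point, with no per-script case table involved.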

**Zergling_man - fedicon 2026 @ C109** @[email protected] · Dec 26, 2025, 12:34

@nigger @shortstories @Suiseiseki @Pi_rat https://eev.ee/blog/2015/09/12/dark-corners-of-unicode/

**πρωτος** @[email protected] · Dec 26, 2025, 12:45 *

@Zergling_man @Pi_rat @Suiseiseki @shortstories >sorting strings doesn't do anything useful other than enable binary search
no shit
the rest of it sounds like the same deal of not having a clue of what strings are for and what you can expect to do with them, or would even want to

**Zergling_man - fedicon 2026 @ C109** @[email protected] · Dec 26, 2025, 12:50

@nigger @shortstories @Suiseiseki @Pi_rat Sorting strings is pretty important for humans.

**πρωτος** @[email protected] · Dec 26, 2025, 13:00

@Zergling_man @Pi_rat @Suiseiseki @shortstories solely for the purpose of binary search e.g. in a dictionary, not that there is anything else to ever motivate sorting strings a human would look at. and this binary search does treat é differently from e even if done by a human. after all, malus and mālus are different words (bad and apple tree, respectively)
if you wanted ā to come between a and b, then you obviously have some logic too complicated and specific to be standardised
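
To make the two positions concrete, here is a small sketch comparing plain code point order with a diacritic-insensitive key. The key function is a rough stand-in built on `unicodedata`, not the Unicode Collation Algorithm, and the word list is only an example:

```python
import unicodedata

words = ["mnemonic", "m\u0101lus", "malus"]  # "mālus" written with U+0101

def strip_marks(word):
    """Decompose to NFD and drop combining marks, e.g. a-macron becomes a."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

def dictionary_key(word):
    # Primary: base letters only. Secondary: the original string, so that
    # words differing only by a diacritic still get a stable, distinct order.
    return (strip_marks(word), word)

by_code_point = sorted(words)
by_dictionary = sorted(words, key=dictionary_key)

print(by_code_point)   # a-macron sorts after every ASCII letter
print(by_dictionary)   # a-macron sorts next to plain a
```

The second ordering keeps malus and mālus distinct while still placing both before mnemonic.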

**Zergling_man - fedicon 2026 @ C109** @[email protected] · Dec 26, 2025, 13:18

@nigger @shortstories @Suiseiseki @Pi_rat I would expect to find mālus before mnemonic.

**GNU/翠星石** @[email protected] · Dec 26, 2025, 12:21

@Zergling_man @Pi_rat @shortstories You just turned 1 byte into 2 bytes and added processing complexity.

lowercase - 32 == uppercase meanwhile uses the same byte.

You can add an iscapital combining character to Unicode if you want to screw up unicode rendering even more.
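
The `lowercase - 32 == uppercase` trick works because ASCII upper and lower case letters differ only in one bit (0x20). A minimal sketch; the function names are made up here:

```python
def ascii_upper(ch):
    """Uppercase a single ASCII letter by subtracting 32; leave anything else alone."""
    if "a" <= ch <= "z":
        return chr(ord(ch) - 32)
    return ch

def ascii_toggle_case(ch):
    """Flip the case of a single ASCII letter by toggling bit 0x20."""
    if ch.isascii() and ch.isalpha():
        return chr(ord(ch) ^ 0x20)
    return ch

print("".join(ascii_upper(c) for c in "gnu/linux"))
print("".join(ascii_toggle_case(c) for c in "GNU/Linux"))

# The fixed offset is an ASCII property, not a Unicode one:
# U+0100 and U+0101 (A/a with macron) are only 1 apart.
print(ord("\u0101") - ord("\u0100"))
```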

**Zergling_man - fedicon 2026 @ C109** @[email protected] · Dec 26, 2025, 12:21

@Suiseiseki @shortstories @Pi_rat I am reasonably certain I have sent you Eevee's Unicode post before. It drastically *simplifies* processing.

**Zergling_man - fedicon 2026 @ C109** @[email protected] · Dec 26, 2025, 12:22

@shortstories @Pi_rat @Suiseiseki (And it's only adding a byte whenever you have a capital, which makes nugger very sad but is largely irrelevant otherwise.)

**Zergling_man - fedicon 2026 @ C109** @[email protected] · Dec 26, 2025, 12:23

@shortstories @Suiseiseki @Pi_rat (But, y'know, since you're saving 25 codepoints, you could just assign two of them to "start capital section" and "end capital section" and still be 23 up in the low end.)

**GNU/翠星石** @[email protected] · Dec 26, 2025, 12:30

@Zergling_man @Pi_rat @shortstories There would be only 25 codepoints left - and well you can't do much with those aside from capital letters. It's too late anyway - everything is ASCII (see poor extended ASCII compatibility).

The only practical way to achieve that would be to massively increase the size of ASCII works - for example, GNU/Linux would turn into: <2+ bytes start capital>gnu<2+ bytes end capital>/<2+ bytes start capital>l<2+ bytes end capital>inux.

**πρωτος** @[email protected] · Dec 26, 2025, 12:34

@Suiseiseki @Pi_rat @shortstories @Zergling_man if we could have had 6 bit words that might've been nice, but since there's no way that'll happen there's no space to save

**πρωτος** @[email protected] · Dec 26, 2025, 12:37

@Suiseiseki @Pi_rat @Zergling_man @shortstories having diacritics in ascii would've been good tho, those could go in the saved 25 chars, and would fit well with the capital modifier. latin vowel extension is annoying as is.

**Zergling_man - fedicon 2026 @ C109** @[email protected] · Dec 26, 2025, 12:48

@nigger @shortstories @Suiseiseki @Pi_rat Let's assume it's only 23. Having block capitalisation is a good idea, because generally you get either one capital or 3+ in a row, eg. NIGGERS.
You *could* do it with only two characters: iscapital/endcapital could be the same character, because there is no reason to say "this character is capital" when you are already in a capital block, but not doing this is safer when chunks get copied and pasted around. (Which is actually a great way to generate bugs in blocks, but ruby text already has this problem, among other things.)
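
A toy sketch of the three-marker variant (single capital, start of capital block, end of capital block). The marker code points `IC`, `SC` and `EC` are hypothetical placeholders picked for this example, and only ASCII letters are handled:

```python
# Hypothetical one-byte marker code points; nothing standardises these.
IC, SC, EC = "\x01", "\x02", "\x03"  # is-capital, start-capitals, end-capitals

def encode(text):
    """Replace ASCII capitals with lowercase letters plus markers."""
    out, i = [], 0
    while i < len(text):
        j = i
        while j < len(text) and "A" <= text[j] <= "Z":
            j += 1
        run = text[i:j]
        if len(run) == 1:
            out.append(IC + run.lower())        # lone capital: one marker
        elif len(run) > 1:
            out.append(SC + run.lower() + EC)   # run of capitals: wrap in a block
        else:
            out.append(text[i])                 # not a capital: copy through
            j = i + 1
        i = j
    return "".join(out)

def decode(encoded):
    """Invert encode(): apply markers and drop them from the output."""
    out, in_block, next_cap = [], False, False
    for ch in encoded:
        if ch == SC:
            in_block = True
        elif ch == EC:
            in_block = False
        elif ch == IC:
            next_cap = True
        else:
            out.append(ch.upper() if (in_block or next_cap) else ch)
            next_cap = False
    return "".join(out)

def strip_case(encoded):
    """Case-insensitive form: just delete the markers."""
    return "".join(ch for ch in encoded if ch not in (IC, SC, EC))

packed = encode("GNU/Linux")
print(len("GNU/Linux"), len(packed))
print(decode(packed))
print(strip_case(packed) == strip_case(encode("gnu/linux")))
```

Under these assumptions GNU/Linux costs three extra code points, and case-insensitive comparison becomes marker removal rather than a table lookup.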

**Zergling_man - fedicon 2026 @ C109** @[email protected] · Dec 26, 2025, 12:34

@Suiseiseki @shortstories @Pi_rat >and well you can't do much with those aside from capital letters
You have no imagination

>It's too late anyway - everything is ASCII (see poor extended ASCII compatibility).
This is why I am mad at Unicode

>2+ bytes start capital
You're retarded. I specifically said you could put them in the points freed up, so they'd only be one byte.
Also there's nothing stopping you wrapping the / in it, since it is not capitalisable. (Although someone, somewhere, is going to create a language where that's not true, so fuck that faggot for wasting a byte. Note, only one, because you'd write it as <sc>gnu<ec>/<ic>linux.)

**πρωτος** @[email protected] · Dec 26, 2025, 12:31

@Zergling_man @Pi_rat @Suiseiseki @shortstories it would motivate people to drop capitals altogether except in special occasions, which is based. start of sentence is denoted by the dot, the capital has no syntactic purpose there. most names likewise can't be mistaken for nouns

**Zergling_man - fedicon 2026 @ C109** @[email protected] · Dec 26, 2025, 12:37

@nigger @shortstories @Suiseiseki @Pi_rat Every day, people make the decision to log into twitter.com, which counts characters not by bytes at all, but by its own schizophrenic counting system that approximates the natural human count.

>start of sentence is denoted by the dot, the capital has no syntactic purpose there. most names likewise can't be mistaken for nouns
I hate this

**March 16th, The Hatkeshiator** @[email protected] · Dec 26, 2025, 12:38

@Zergling_man @Pi_rat @Suiseiseki @shortstories
you could solve many problems by only having cjk radicals and combining templates instead of precomposed everything. and separate diacritics instead of precomposed characters. and for heaven's sake, why is presentation information (fraktur this, bold that, italics) in the encoding layer? what are all these emoji for? cut it down to smiley, frowny, angry, sleepy, laugh, uoh. that's all you need.
like 90% of unicode disappears into vapor if you do this.
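
Both complaints can be seen through Unicode normalisation, which Python exposes in `unicodedata`. A small sketch; the sample strings are only illustrations:

```python
import unicodedata

# Precomposed vs. separate diacritic: U+00E9 decomposes to "e" + U+0301.
precomposed = "\u00e9"
decomposed = unicodedata.normalize("NFD", precomposed)
print([f"U+{ord(c):04X}" for c in decomposed])

# Presentation baked into the encoding: the Mathematical Bold Small letters
# start at U+1D41A and fold back to plain ASCII under NFKC.
bold = "".join(chr(0x1D41A + ord(c) - ord("a")) for c in "bold")
plain = unicodedata.normalize("NFKC", bold)
print(plain)
```

The first case is one character with two valid spellings; the second is styling that has to be normalised away before a search for "bold" can match.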

**Zergling_man - fedicon 2026 @ C109** @[email protected] · Dec 26, 2025, 12:43

@HatkeshiatorTND @shortstories @Suiseiseki @Pi_rat >you could solve many problems by only having cjk radicals and combining templates instead of precomposed everything
I don't believe this for a second. It's a good theory but like have you actually seen kanji
Have you seen how many variants of 心 there are for radicals? How do you select which one to use lol. You *could* do it with composition by having the character declare which version of each component it uses, but that still requires character-specific information to exist, and thus some way to reference it.
Also see 忙 and 忘 which are the same two components arranged differently (but the radical is still 心 for both, so the order you input them doesn't solve it - inputting it first, ie. making 亡 the radical - is strictly invalid).

**March 16th, The Hatkeshiator** @[email protected] · Dec 26, 2025, 12:50

@Zergling_man @Pi_rat @Suiseiseki @shortstories
>there are variants
even if you included every hanzi/hanja/kanji variant including ones only found in some obscure manuscript of questionable authenticity from 875, and as a bonus all variants of all seal script radicals, you'd still use at least an order of magnitude fewer codepoints.
>they don't always compose the same way
yeah, that's why you make different templates: CJK-COMPOSITE-TEMPLATE-BINARY-TOP-BOTTOM, ditto BOTTOM-TOP, LEFT-RIGHT, RIGHT-LEFT, INNER-OUTER, OUTER-INNER. then CJK-COMPOSITE-TEMPLATE-TERNARY-TOP-BOTTOMLEFT-BOTTOMRIGHT, ditto LEFT-CENTER-RIGHT, RIGHT-TOPLEFT-BOTTOMLEFT, etc.

**Zergling_man - fedicon 2026 @ C109** @[email protected] · Dec 26, 2025, 12:57

@HatkeshiatorTND @shortstories @Suiseiseki @Pi_rat >even if you included every hanzi/hanja/kanji variant including ones only found in some obscure manuscript of questionable authenticity from 875, and as a bonus all variants of all seal script radicals, you'd still use at least an order of magnitude fewer
You would have to specifically not include any of that, though?

>yeah, that's why you make different templates: CJK-COMPOSITE-TEMPLATE-BINARY-TOP-BOTTOM, ditto BOTTOM-TOP, LEFT-RIGHT, RIGHT-LEFT, INNER-OUTER, OUTER-INNER. then CJK-COMPOSITE-TEMPLATE-TERNARY-TOP-BOTTOMLEFT-BOTTOMRIGHT, ditto LEFT-CENTER-RIGHT, RIGHT-TOPLEFT-BOTTOMLEFT, etc.
Oh right now I'm getting it. So you say "this template with these components as arguments"? And that would go recursively?
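
Unicode already has something close to this: the Ideographic Description Characters (U+2FF0 to U+2FFB) are prefix operators taking two or three components, and they nest. They only describe a character; as far as I know renderers are not expected to compose glyphs from them. A minimal recursive parser as a sketch, with `ARITY` and `parse` being names made up here:

```python
# Arity of each Ideographic Description Character.
ARITY = {chr(cp): 2 for cp in range(0x2FF0, 0x2FFC)}
ARITY["\u2ff2"] = 3  # left / middle / right
ARITY["\u2ff3"] = 3  # above / middle / below

def parse(ids):
    """Parse a prefix-notation description into nested tuples."""
    def node(pos):
        ch = ids[pos]
        arity = ARITY.get(ch)
        if arity is None:
            return ch, pos + 1          # plain component
        children = []
        pos += 1
        for _ in range(arity):
            child, pos = node(pos)      # components may themselves be templates
            children.append(child)
        return (ch, *children), pos

    tree, end = node(0)
    if end != len(ids):
        raise ValueError("trailing characters after a complete description")
    return tree

LEFT_RIGHT, TOP_BOTTOM = "\u2ff0", "\u2ff1"

print(parse(LEFT_RIGHT + "忄亡"))                  # 忙: left-right, using the 忄 variant
print(parse(TOP_BOTTOM + "亡心"))                  # 忘: top-bottom, using full 心
print(parse(TOP_BOTTOM + LEFT_RIGHT + "木目心"))   # 想: (木 beside 目) over 心
```

Note that the 忙 and 忘 case still has to name which variant of 心 goes in the slot, so the variant question from earlier in the thread does not go away.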

**Zergling_man - fedicon 2026 @ C109** @[email protected] · Dec 26, 2025, 12:59

@shortstories @Suiseiseki @Pi_rat @HatkeshiatorTND It seems like, say, technical discussions could blow out quickly but similarly, you could have some topic marker that temporarily defines codepoints (have a section reserved for this use) that are constantly being reused.

**March 16th, The Hatkeshiator** @[email protected] · Dec 26, 2025, 13:00

@Zergling_man @Pi_rat @Suiseiseki @shortstories recursively would be fun. it would let you represent the old vietnamese chu nom.
