@Pi_rat

How does not having Askey reduce lines of code

Doesn't someone have to individually add code for every single letter then?

@shortstories maybe cumulatively that code would still be less in this particular use case. I particularly love the last line
@shortstories @Pi_rat (ASCII)
It's just a table that defines letter to number mappings.
He is not implying that not using it reduces number of lines of code, nor even that not using it reduces bloat; it's Terry, he changes topic pretty quickly.
ASCII *is* somewhat inefficient, and unicode inherited it, but it made sense at the time. The alteration I propose actually makes text longer, but fixes a large number of other problems.
@Zergling_man @Pi_rat @shortstories /projects/programming/newþur/replacements/eanrsam on my eepsite
@Zergling_man @Pi_rat @shortstories UTF-8 is very efficient, as every ASCII character is 1 byte and characters are at most 4 bytes (meanwhile UTF-16 is 2 or 4 bytes and UTF-32 is always 4 bytes).
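For anyone who wants to check the byte counts, here is a minimal sketch using Python's built-in codecs. The helper name `encoded_sizes` is made up for illustration. The `-le` codec variants are used so the byte-order mark is not counted.

```python
def encoded_sizes(ch: str) -> dict:
    """Return how many bytes a single character takes in each encoding."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        "utf-16": len(ch.encode("utf-16-le")),  # -le: no BOM in the count
        "utf-32": len(ch.encode("utf-32-le")),
    }

# An ASCII letter, a Cyrillic letter, a kanji, and an emoji outside the BMP.
for ch in ["a", "\u0418", "\u5fc3", "\U0001f600"]:
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))
```

ASCII stays at 1 byte in UTF-8, the kanji costs 3 bytes in UTF-8 but 2 in UTF-16, and only UTF-32 is fixed-width.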
@Suiseiseki @shortstories @Pi_rat I am not advocating UTF-16 or anything. I am merely, once again, begging to replace the capital letters with an iscapital combining character.
@Zergling_man @Pi_rat @Suiseiseki @shortstories this is especially important for other languages. case insensitive ascii search is just a flag in sqlite, case insensitive Кириллик is non-trivial, and this wouldn't be so if it worked that way
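To illustrate the sqlite point, here is a small sketch using Python's standard `sqlite3` module. It assumes a stock SQLite build without the ICU extension, where the built-in `NOCASE` collation only folds the 26 ASCII letters. The helper name `nocase_equal` is hypothetical.

```python
import sqlite3

def nocase_equal(a: str, b: str) -> bool:
    """Compare two strings using SQLite's built-in NOCASE collation."""
    con = sqlite3.connect(":memory:")
    try:
        (result,) = con.execute(
            "SELECT ? = ? COLLATE NOCASE", (a, b)
        ).fetchone()
    finally:
        con.close()
    return bool(result)

print(nocase_equal("ASCII", "ascii"))        # ASCII letters are folded
print(nocase_equal("Кириллик", "кириллик"))  # Cyrillic is not folded
# Outside the database, full Unicode case folding is needed instead:
print("Кириллик".casefold() == "кириллик".casefold())
```

With a stock build the first comparison matches and the second does not, which is the gap being complained about.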
@Zergling_man @Pi_rat @Suiseiseki @shortstories >sorting strings doesn't do anything useful other than enable binary search
no shit
the rest of it sounds like the same deal of not having a clue of what strings are for and what you can expect to do with them, or would even want to
@Zergling_man @Pi_rat @Suiseiseki @shortstories solely for the purpose of binary search e.g. in a dictionary, not that there is anything else to ever motivate sorting strings a human would look at. and this binary search does treat é differently from e even if done by a human. after all, malus and mālus are different words (bad and apple tree, respectively)
if you wanted ā to come between a and b, then you obviously have some logic too complicated and specific to be standardised
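A quick sketch of the dictionary case, assuming plain code-point ordering (what Python's default string comparison does). The word list and the helper name `contains` are made up for illustration.

```python
import bisect

# "m\u0101lus" is mālus with a precomposed a-macron (U+0101).
words = sorted(["mel", "m\u0101lus", "mare", "malus", "malum"])
print(words)

def contains(sorted_words, word):
    """Binary search for an exact match in a sorted list."""
    i = bisect.bisect_left(sorted_words, word)
    return i < len(sorted_words) and sorted_words[i] == word

print(contains(words, "malus"), contains(words, "m\u0101lus"))
```

Both words are found as distinct entries, but mālus sorts after mel rather than next to malus. Getting ā to sit between a and b would need a language-specific collation on top of this.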
@Zergling_man @Pi_rat @shortstories You just turned 1 byte into 2 bytes and added processing complexity.

Meanwhile, lowercase - 32 == uppercase, and it uses the same single byte.
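The offset trick, as a minimal sketch. The function name `ascii_upper` is hypothetical, and it only handles the 26 ASCII letters.

```python
def ascii_upper(ch: str) -> str:
    """Uppercase one ASCII lowercase letter by subtracting 32."""
    if "a" <= ch <= "z":
        return chr(ord(ch) - 32)
    return ch  # everything else is left alone

print(ascii_upper("a"), ord("a") - ord("A"))

# The same offset happens to hold for basic Cyrillic (U+0430 vs U+0410),
# but not for every letter: U+0451 (ё) and U+0401 (Ё) are 80 apart.
print(ord("\u0430") - ord("\u0410"), ord("\u0451") - ord("\u0401"))
```

So the fixed offset is an ASCII property, not something that carries over to other scripts in general.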

You can add an iscapital combining character to Unicode if you want to screw up unicode rendering even more.
@Suiseiseki @shortstories @Pi_rat I am reasonably certain I have sent you Eevee's Unicode post before. It drastically *simplifies* processing.
@shortstories @Pi_rat @Suiseiseki (And it's only adding a byte whenever you have a capital, which makes nugger very sad but is largely irrelevant otherwise.)
@shortstories @Suiseiseki @Pi_rat (But, y'know, since you're saving 25 codepoints, you could just assign two of them to "start capital section" and "end capital section" and still be 23 up in the low end.)
@Zergling_man @Pi_rat @shortstories There would be only 25 codepoints left, and well, you can't do much with those aside from capital letters. It's too late anyway - everything is ASCII (see poor extended ASCII compatibility).

The only practical way to achieve that would be to massively increase the size of ASCII works - for example, GNU/Linux would turn into: <2+ bytes start capital>gnu<2+ bytes end capital>/<2+ bytes start capital>l<2+ bytes end capital>inux.
@Suiseiseki @Pi_rat @shortstories @Zergling_man if we could have had 6 bit words that might've been nice, but since there's no way that'll happen there's no space to save
@Suiseiseki @Pi_rat @Zergling_man @shortstories having diacritics in ascii would've been good tho, those could go in the saved 25 chars, and would fit well with the captial modifier. latin vowel extension is annoying as is.
@nigger @shortstories @Suiseiseki @Pi_rat Let's assume it's only 23. Having block capitalisation is a good idea, because generally you get either one capital or 3+ in a row, eg. NIGGERS.
You *could* do it with only two characters: iscapital/endcapital could be the same character, because there is no reason to say "this character is capital" when you are already in a capital block, but not doing this is safer when chunks get copied and pasted around. (Which is actually a great way to generate bugs in blocks, but ruby text already has this problem, among other things.)
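A rough sketch of the three-marker variant, to make the size trade-off concrete. The marker values are pure stand-ins (three ASCII control characters) and not a real proposal for which codepoints to use. It only handles ASCII A-Z and assumes the input does not already contain the marker characters.

```python
import re

# Hypothetical single-byte markers; any three freed-up codepoints would do.
IS_CAPITAL = "\x01"     # capitalise the next letter only
START_CAPITAL = "\x02"  # begin a capital block
END_CAPITAL = "\x03"    # end a capital block

def encode(text: str) -> str:
    """Replace runs of ASCII capitals with lowercase plus markers."""
    def mark(match):
        run = match.group(0).lower()
        if len(run) == 1:
            return IS_CAPITAL + run
        return START_CAPITAL + run + END_CAPITAL
    return re.sub(r"[A-Z]+", mark, text)

def decode(encoded: str) -> str:
    """Invert encode(): apply the markers and drop them."""
    out, in_block, cap_next = [], False, False
    for ch in encoded:
        if ch == IS_CAPITAL:
            cap_next = True
        elif ch == START_CAPITAL:
            in_block = True
        elif ch == END_CAPITAL:
            in_block = False
        else:
            out.append(ch.upper() if (in_block or cap_next) else ch)
            cap_next = False
    return "".join(out)

sample = "GNU/Linux"
packed = encode(sample)
print(len(sample), len(packed), decode(packed) == sample)
```

Once every capital is encoded this way, case-insensitive comparison reduces to dropping three marker values, which is the simplification being argued for. The cost is one extra byte per lone capital and two per block.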
@Suiseiseki @shortstories @Pi_rat >and well, you can't do much with those aside from capital letters
You have no imagination

>It's too late anyway - everything is ASCII (see poor extended ASCII compatibility).
This is why I am mad at Unicode

>2+ bytes start capital
You're retarded. I specifically said you could put them in the points freed up, so they'd only be one byte.
Also there's nothing stopping you wrapping the / in it, since it is not capitalisable. (Although someone, somewhere, is going to create a language where that's not true, so fuck that faggot for wasting a byte. Note, only one, because you'd write it as <sc>gnu<ec>/<ic>linux.)
@Zergling_man @Pi_rat @Suiseiseki @shortstories it would motivate people to drop capitals altogether except in special occasions, which is based. start of sentence is denoted by the dot, the capital has no syntactic purpse there. most names likewise can't be mistaken for nouns
@nigger @shortstories @Suiseiseki @Pi_rat Every day, people make the decision to log into twitter.com, which counts characters not by bytes at all, but by its own schizophrenic counting system that approximates the natural human count.

>start of sentence is denoted by the dot, the capital has no syntactic purpose there. most names likewise can't be mistaken for nouns
I hate this
@Zergling_man @Pi_rat @Suiseiseki @shortstories
you could solve many problems by only having cjk radicals and combining templates instead of precomposed everything. and separate diacritics instead of precomposed characters. and for heaven's sake, why is presentation information (fraktur this, bold that, italics) in the encoding layer? what are all these emoji for? cut it down to smiley, frowny, angry, sleepy, laugh, uoh. that's all you need.
like 90% of unicode disappears into vapor if you do this.
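Unicode already exposes both forms for diacritics, which Python's standard `unicodedata` module can show. This is only a sketch of the precomposed versus separate-diacritic distinction. The helper names `codepoints` and `strip_marks` are made up.

```python
import unicodedata

def codepoints(s: str) -> list:
    """List the code points of a string as U+XXXX labels."""
    return [f"U+{ord(c):04X}" for c in s]

def strip_marks(s: str) -> str:
    """Decompose, then drop combining marks (for example the macron)."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

precomposed = "\u0101"  # ā as one code point
decomposed = unicodedata.normalize("NFD", precomposed)
print(codepoints(precomposed), codepoints(decomposed))
print(strip_marks("m\u0101lus"))

# Presentation variants fold away under compatibility normalisation:
# U+1D41B is MATHEMATICAL BOLD SMALL B.
print(unicodedata.normalize("NFKC", "\U0001d41b"))
```

The precomposed letter and the base-plus-combining sequence are canonically equivalent, so software has to normalise before comparing. A combining-only design would remove that step.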
@HatkeshiatorTND @shortstories @Suiseiseki @Pi_rat >you could solve many problems by only having cjk radicals and combining templates instead of precomposed everything
I don't believe this for a second. It's a good theory but like have you actually seen kanji
Have you seen how many variants of 心 there are for radicals? How do you select which one to use lol. You *could* do it with composition by having the character declare which version of each component it uses, but that still requires character-specific information to exist, and thus some way to reference it.
Also see 忙 and 忘 which are the same two components arranged differently (but the radical is still 心 for both, so the order you input them doesn't solve it - inputting it first, ie. making 亡 the radical - is strictly invalid).
@Zergling_man @Pi_rat @Suiseiseki @shortstories
>there are variants
even if you included every hanzi/hanja/kanji variant including ones only found in some obscure manuscript of questionable authenticity from 875, and as a bonus all variants of all seal script radicals, you'd still use at least an order of magnitude fewer codepoints.
>they don't always compose the same way
yeah, that's why you make different templates: CJK-COMPOSITE-TEMPLATE-BINARY-TOP-BOTTOM, ditto BOTTOM-TOP, LEFT-RIGHT, RIGHT-LEFT, INNER-OUTER, OUTER-INNER. then CJK-COMPOSITE-TEMPLATE-TERNARY-TOP-BOTTOMLEFT-BOTTOMRIGHT, ditto LEFT-CENTER-RIGHT, RIGHT-TOPLEFT-BOTTOMLEFT, etc.
@HatkeshiatorTND @shortstories @Suiseiseki @Pi_rat >even if you included every hanzi/hanja/kanji variant including ones only found in some obscure manuscript of questionable authenticity from 875, and as a bonus all variants of all seal script radicals, you'd still use at least an order of magnitude fewer
You would have to specifically not include any of that, though?

>yeah, that's why you make different templates: CJK-COMPOSITE-TEMPLATE-BINARY-TOP-BOTTOM, ditto BOTTOM-TOP, LEFT-RIGHT, RIGHT-LEFT, INNER-OUTER, OUTER-INNER. then CJK-COMPOSITE-TEMPLATE-TERNARY-TOP-BOTTOMLEFT-BOTTOMRIGHT, ditto LEFT-CENTER-RIGHT, RIGHT-TOPLEFT-BOTTOMLEFT, etc.
Oh right now I'm getting it. So you say "this template with these components as arguments"? And that would go recursively?
@shortstories @Suiseiseki @Pi_rat @HatkeshiatorTND It seems like, say, technical discussions could blow out quickly but similarly, you could have some topic marker that temporarily defines codepoints (have a section reserved for this use) that are constantly being reused.
@Zergling_man @Pi_rat @Suiseiseki @shortstories recursively would be fun. it would let you represent the old vietnamese chu nom.
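The "template with components as arguments, recursively" idea can be sketched as a small tree type. Everything here is hypothetical: the `Composite` class, the shortened template names, and the choice to write the heart component as 心 in both characters, which ignores the variant-selection problem raised earlier.

```python
from dataclasses import dataclass
from typing import Tuple, Union

@dataclass(frozen=True)
class Composite:
    template: str                # e.g. "LEFT-RIGHT", "TOP-BOTTOM"
    parts: Tuple["Glyph", ...]   # each part is a leaf or another Composite

Glyph = Union[str, Composite]

def components(glyph: Glyph) -> list:
    """Flatten a glyph into its leaf components, depth first."""
    if isinstance(glyph, str):
        return [glyph]
    leaves = []
    for part in glyph.parts:
        leaves.extend(components(part))
    return leaves

def describe(glyph: Glyph) -> str:
    """Render the tree as nested template calls."""
    if isinstance(glyph, str):
        return glyph
    inner = ", ".join(describe(p) for p in glyph.parts)
    return f"{glyph.template}({inner})"

busy = Composite("LEFT-RIGHT", ("心", "亡"))    # stands for 忙
forget = Composite("TOP-BOTTOM", ("亡", "心"))  # stands for 忘
# Recursion: 想 is 相 over 心, and 相 is itself 木 beside 目.
think = Composite("TOP-BOTTOM", (Composite("LEFT-RIGHT", ("木", "目")), "心"))

print(describe(busy), describe(forget), describe(think))
```

The two characters built from the same components stay distinct because the template is part of the identity, and nesting costs nothing extra in the data model. How a renderer picks the right shape for each component is left open here.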