ASCII, accents and Unicode | Liberty BASIC Community Forum

tenochtitlanuk
Global Moderator

Posts: 579

Post by tenochtitlanuk on Aug 4, 2020 14:42:06 GMT -5

LB4's use of extended ASCII gives many accented characters.

I just noticed a Rosetta Code task to 'de-vowel' some supplied text.

E.g.

lineEnd$ =chr$( 13) +chr$( 10)

source$ ="Norwegian, Icelandic, German, Turkish, French, Spanish, English" +lineEnd$+_
"Undervisningen skal være gratis, i det minste på de elementære og grunnleggende trinn." +lineEnd$+_
"Hochschulunterricht muß allen gleichermaßen entsprechend ihren Fähigkeiten offenstehen." +lineEnd$+_
"Ögrenim hiç olmazsa ilk ve temel safhalarinda parasizdir. Ilk ögretim mecburidir." +lineEnd$+_
"L'éducation doit être gratuite, au moins en ce qui concerne l'enseignement élémentaire et fondamental." +lineEnd$+_
"La instrucción elemental será obligatoria. La instrucción técnica y profesional habrá de ser generalizada."+lineEnd$+_
"Education shall be free, at least in the elementary and fundamental stages." +lineEnd$

... is turned into

Nrwgn, clndc, Grmn, Trksh, Frnch, Spnsh, nglsh
ndrvsnngn skl vr grts, dt mnst p d lmntr g grnnlggnd trnn.
Hchschlntrrcht mß lln glchrmßn ntsprchnd hrn Fhgktn ffnsthn.
grnm hç lmzs lk v tml sfhlrnd prszdr. lk grtm mcbrdr.
L'dctn dt tr grtt, mns n c q cncrn l'nsgnmnt lmntr t fndmntl.
L nstrccn lmntl sr blgtr. L nstrccn tcnc y prfsnl hbr d sr gnrlzd.
dctn shll b fr, t lst n th lmntry nd fndmntl stgs.

Easy enough to write in LB4 with the vowels defined as, say, "AEIOUIÖaeiouæáåäéêióöúÀÁÂÄÊËÌÍÖÙÚâêìíòóô"

I wonder how easy it'll be if there are Unicode multi-byte characters in the source string?
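
For what it's worth, here is a minimal sketch of what the multi-byte case might involve if the string holds raw UTF-8 bytes. It assumes LB4-style strings where one string character is one byte (LB5 may well behave differently), builds the accented letters from their UTF-8 byte values with chr$() so no particular editor encoding is needed, and uses two made-up helper names, utf8Len() and isVowel(). The vowel list is deliberately tiny.

```basic
' Sketch only: de-vowel a string that holds raw UTF-8 bytes.
' Assumes one LB string character = one byte (LB4 behaviour).
' utf8Len() and isVowel() are hypothetical helper names.

eAcute$ = chr$(195) + chr$(169)     ' UTF-8 bytes for e-acute (hex C3 A9)
oUml$   = chr$(195) + chr$(182)     ' UTF-8 bytes for o-umlaut (hex C3 B6)

source$ = "L'" + eAcute$ + "ducation; " + oUml$ + "gretim"
target$ = ""

i = 1
while i <= len(source$)
    n = utf8Len(asc(mid$(source$, i, 1)))   ' bytes in this character
    ch$ = mid$(source$, i, n)               ' the whole character
    if isVowel(ch$) = 0 then target$ = target$ + ch$
    i = i + n
wend

print target$       ' the text with its vowels removed
end

' Length of a UTF-8 sequence, judged from its lead byte.
function utf8Len(b)
    utf8Len = 1
    if b >= 192 then utf8Len = 2
    if b >= 224 then utf8Len = 3
    if b >= 240 then utf8Len = 4
end function

' Returns 1 if ch$ is in the (very short) vowel list, else 0.
function isVowel(ch$)
    isVowel = 0
    if len(ch$) = 1 then
        if instr("AEIOUaeiou", ch$) > 0 then isVowel = 1
    else
        ' multi-byte vowels, separated by | so partial matches cannot occur
        multi$ = "|" + chr$(195) + chr$(169) + "|" + chr$(195) + chr$(182) + "|"
        if instr(multi$, "|" + ch$ + "|") > 0 then isVowel = 1
    end if
end function
```

The point of stepping by utf8Len() rather than by 1 is that a single-byte instr() test could otherwise drop one byte of a two-byte character and leave the other behind.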

Long time LB user. See my website! diga.me.uk

cundo
Full Member

Muchas Gracias!!

Posts: 146

Post by cundo on Aug 4, 2020 16:25:42 GMT -5

What about the text? The Spanish doesn't say the same thing: the English says 'education must be free', but the Spanish says 'Elemental instruction will be mandatory'..., or so.
The French also says something about education being free, at least in the early stages.

Carl Gundel
Administrator

Posts: 1,535

Post by Carl Gundel on Aug 4, 2020 18:28:01 GMT -5

I wrote the following and ran it in LB5 build 351 on Windows 10.

lineEnd$ =chr$( 13)
vowel$ = "AEIOUIÖaeiouæáåäéêióöúÀÁÂÄÊËÌÍÖÙÚâêìíòóô"
source$ ="Norwegian, Icelandic, German, Turkish, French, Spanish, English" +lineEnd$+_
"Undervisningen skal være gratis, i det minste på de elementære og grunnleggende trinn." +lineEnd$+_
"Hochschulunterricht muß allen gleichermaßen entsprechend ihren Fähigkeiten offenstehen." +lineEnd$+_
"Ögrenim hiç olmazsa ilk ve temel safhalarinda parasizdir. Ilk ögretim mecburidir." +lineEnd$+_
"L'éducation doit être gratuite, au moins en ce qui concerne l'enseignement élémentaire et fondamental." +lineEnd$+_
"La instrucción elemental será obligatoria. La instrucción técnica y profesional habrá de ser generalizada."+lineEnd$+_
"Education shall be free, at least in the elementary and fundamental stages." +lineEnd$

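' Copy every character that is not in vowel$ across to target$.
target$ = ""
for x = 1 to len(source$)
  char$ = mid$(source$, x, 1)
  ' instr() returns 0 when char$ does not occur in vowel$
  if instr(vowel$, char$) = 0 then target$ = target$ + char$
next x
print target$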

The result is:

Nrwgn, clndc, Grmn, Trksh, Frnch, Spnsh, nglsh
ndrvsnngn skl vr grts,  dt mnst p d lmntr g grnnlggnd trnn.
Hchschlntrrcht mß lln glchrmßn ntsprchnd hrn Fhgktn ffnsthn.
grnm hç lmzs lk v tml sfhlrnd prszdr. lk grtm mcbrdr.
L'dctn dt tr grtt,  mns n c q cncrn l'nsgnmnt lmntr t fndmntl.
L nstrccn lmntl sr blgtr. L nstrccn tcnc y prfsnl hbr d sr gnrlzd.
dctn shll b fr, t lst n th lmntry nd fndmntl stgs.

Last Edit: Aug 4, 2020 18:29:15 GMT -5 by Carl Gundel

-Carl Gundel, author of Liberty BASIC
www.libertybasic.com

tenochtitlanuk
Global Moderator

Posts: 579

Post by tenochtitlanuk on Aug 5, 2020 2:56:44 GMT -5

Cundo: the texts are not translations but examples selected to have lots of vowels; see the Rosetta Code examples.

Carl: I guessed my example would run. I was curious to know whether it would run in LB5 when the source text was KNOWN to be Unicode with multi-byte characters. I'll now have to find some known Unicode files!!
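
One way to get a file whose encoding is known for certain might be to write the bytes out directly. The sketch below assumes LB4-style file I/O where each string character is written as one byte (untested in LB5), and the file name utf8test.txt is made up for the example. It writes "café" with the é as the UTF-8 pair C3 A9, reads the file back, and counts the bytes above 127, which in UTF-8 only occur inside multi-byte characters.

```basic
' Sketch only: write a small file of known UTF-8 bytes, read it back,
' and count the bytes above 127.
' Assumes one LB string character = one byte (LB4 behaviour).
' The file name utf8test.txt is made up for this example.

open "utf8test.txt" for output as #f
print #f, "caf" + chr$(195) + chr$(169)     ' e-acute as UTF-8 (hex C3 A9)
close #f

open "utf8test.txt" for input as #f
text$ = input$(#f, lof(#f))                 ' read the whole file as bytes
close #f

highBytes = 0
for i = 1 to len(text$)
    if asc(mid$(text$, i, 1)) > 127 then highBytes = highBytes + 1
next i

print "Bytes read: "; len(text$)
print "Bytes above 127: "; highBytes
```

If the count of high bytes differs between LB4 and LB5 for the same file, that would suggest the two versions are not treating the text the same way on input.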
