(boing!) Cnoocy Mosque O'Witz (cnoocy) wrote, 2024-12-01 12:08 pm

NaNoGenMo 2024: A sketch leading towards a Shavian primer

For NaNoGenMo 2024, I wanted to write a children's primer in Shavian. I did not succeed in doing that, but I produced something and I learned a lot.


Shavian is an alternate alphabet for English created by Kingsley Read in accordance with George Bernard Shaw's will. I was first inspired by this tweet by Carla Hurt:

Carla Hurt FoundnAntiquity Oct 8
I can find places that sell posters of the 44 English phonemes for kids, but has anyone made them into a book like the abc books? Each page with one phoneme shown through several words? (Why are there so many alphabet books with only 26 words, one per letter?)


And I figured it would be an interesting NaNoGenMo project. My initial plan was to do something similar to Gyo Fujikawa's A to Z Picture Book (https://www.biblio.com/book/gyo-fujikawas-z-picture-book-gyo/d/1397311738#gallery-6): each letter would get a set of things starting with that letter, with pictures, and then I would generate larger texts. My data sources were Wiktionary, the Kingsley Read Lexicon, and the wordfreq library. This ran into a few challenges:


  1. It was hard to get a generated list of appropriate objects out of my data sources. For one, Wiktionary doesn't have specific enough tagging. For example, one can't ask for "most frequently used words referring to animals" because "white" is tagged as a possible animal name and is more frequent than "weasel".

  2. As well, wordfreq has no part-of-speech information, meaning that when combined with Wiktionary's handling of all usages of a word, there are some false positives. For example, "part" (as in "a part") is the most popular adverb starting with 𐑐 (/p/), because the noun is so frequent.

  3. Fifty thousand words is a lot of words, so generating letter-specific text was more work than I expected. I got to one sentence of "Interjection, Name verbs the adverb adjective noun" per letter and realized that generating even a paragraph would be a huge effort. And I needed to average 625 transliterated words per letter to hit the goal.

  4. Given the amount of work, I never even really started on the image side of the work. I had some thoughts about finding the first image on the Wikipedia page for the item, but since I never got to "list of items" that didn't happen.
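The first two problems can be reproduced with a small sketch. Everything here is illustrative: the entries, tags, and frequency numbers are made up (they are not real Wiktionary tags or wordfreq values), and `rank_by_tag` is a hypothetical helper rather than the project's actual code.

```python
from typing import Callable, List, Set, Tuple

# Made-up toy data: each word carries the tags of ALL its senses merged together.
ENTRIES: List[Tuple[str, Set[str]]] = [
    ("white", {"adjective", "noun", "animal"}),
    ("weasel", {"noun", "verb", "animal"}),
    ("part", {"noun", "verb", "adverb"}),
    ("partly", {"adverb"}),
]

# Made-up relative frequencies: one number per spelling, with no
# part-of-speech breakdown, which is where the false positives come from.
TOY_FREQ = {"white": 300.0, "weasel": 1.0, "part": 400.0, "partly": 20.0}


def toy_freq(word: str) -> float:
    """Stand-in for a real frequency lookup (assumption for this sketch)."""
    return TOY_FREQ.get(word, 0.0)


def rank_by_tag(
    entries: List[Tuple[str, Set[str]]],
    tag: str,
    freq: Callable[[str], float],
) -> List[str]:
    """Return the words carrying `tag`, most frequent spelling first."""
    words = [word for word, tags in entries if tag in tags]
    return sorted(words, key=lambda w: (-freq(w), w))


print(rank_by_tag(ENTRIES, "animal", toy_freq))  # ['white', 'weasel']
print(rank_by_tag(ENTRIES, "adverb", toy_freq))  # ['part', 'partly']
```

With the real library, the `freq` argument could be something like `lambda w: word_frequency(w, "en")` from wordfreq. The ranking problem stays the same, because the frequency is still per spelling rather than per sense.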



So my eventual result is a book in two sections: The first is a listing for each letter, with traditional and new names and a single generated sentence, many of which are almost sensical. Here's a section of that:


𐑔


𐑔𐑲Thigh / 𐑔𐑹𐑯thorn



𐑔𐑨𐑙𐑒𐑕Thanks, ·𐑔𐑾𐑛𐑹Theodore 𐑔𐑰𐑥𐑟themes 𐑞the 𐑔𐑮𐑵through 𐑔𐑻𐑛third 𐑔𐑨𐑗𐑼thatcher.


𐑞


𐑞𐑱They / 𐑞𐑬thou



𐑞𐑦𐑕This, ·𐑣𐑧𐑞𐑼Heather 𐑳𐑞𐑼𐑟others 𐑞the 𐑞𐑺𐑓𐑹therefore 𐑑𐑩𐑜𐑧𐑞𐑼together 𐑢𐑧𐑞𐑼𐑥𐑨𐑯weatherman.
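The one-sentence-per-letter pattern ("Interjection, Name verbs the adverb adjective noun") can be sketched as a fill-in template. This is a minimal illustration, not the project's generator: `TEMPLATE`, `fill_template`, and the slot names are hypothetical, and the slot assignment is just one reading of the 𐑔 sample sentence above.

```python
from typing import Dict

# Slot names are assumptions chosen to mirror the pattern described in the post.
TEMPLATE = "{interjection}, {name} {verb} the {adverb} {adjective} {noun}."


def fill_template(slots: Dict[str, str]) -> str:
    """Fill the fixed sentence pattern; raises KeyError if a slot is missing."""
    return TEMPLATE.format(**slots)


# Latin-alphabet words taken from the 𐑔 sample sentence.
THIGH_SLOTS = {
    "interjection": "Thanks",
    "name": "Theodore",
    "verb": "themes",
    "adverb": "through",
    "adjective": "third",
    "noun": "thatcher",
}

print(fill_template(THIGH_SLOTS))
# Thanks, Theodore themes the through third thatcher.
```

In a fuller version, each slot would presumably be filled by the most frequent word of that part of speech for the letter, which is exactly where the tagging problems described earlier show up.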




The rest is an automatic Shavianization of English Fairy Tales by Flora Annie Steel, originally published in 1918, which conveniently is in the public domain and has 40 stories, the same as the number of letters in the Shavian alphabet. My original plan was to replace as many words as possible in each story with words containing the corresponding letter, but the problems above led to a text that was too nonsensical even for a Dada-inspired generative text art project. Here's a sample:


𐑕𐑩𐑯𐑑St. ·𐑡𐑹𐑡George 𐑝of ·𐑥𐑧𐑮𐑦Merrie ·𐑦𐑙𐑜𐑤𐑩𐑯𐑛England



𐑦𐑯In 𐑞the {darksome} 𐑛𐑧𐑐𐑔𐑕depths 𐑝of 𐑩a 𐑔𐑦𐑒thick 𐑓𐑪𐑮𐑦𐑕𐑑forest 𐑤𐑦𐑝𐑛lived {Kalyb} 𐑞the 𐑓𐑧𐑤fell 𐑦𐑯𐑗𐑭𐑯𐑑𐑮𐑩𐑕enchantress.
𐑑𐑧𐑮𐑩𐑚𐑩𐑤Terrible 𐑢𐑻were 𐑣𐑻her 𐑛𐑰𐑛𐑟deeds, 𐑯and 𐑓𐑿few 𐑞𐑺there 𐑢𐑻were 𐑣𐑵who 𐑣𐑨𐑛had 𐑞the 𐑣𐑸𐑛𐑦𐑣𐑫𐑛hardihood 𐑑to 𐑕𐑬𐑯𐑛sound 𐑞the 𐑚𐑮𐑱𐑟𐑩𐑯brazen 𐑑𐑮𐑳𐑥𐑐𐑩𐑑trumpet 𐑢𐑦𐑗which 𐑣𐑳𐑙hung 𐑴𐑝𐑼over 𐑞the 𐑲𐑼𐑯iron 𐑜𐑱𐑑gate 𐑞𐑨𐑑that 𐑚𐑸𐑛barred 𐑞the 𐑢𐑱way 𐑑to 𐑞the ·𐑩𐑚𐑴𐑛Abode 𐑝of ·𐑢𐑦𐑗𐑒𐑮𐑭𐑓𐑑Witchcraft.
𐑑𐑧𐑮𐑩𐑚𐑩𐑤Terrible 𐑢𐑻were 𐑞the 𐑛𐑰𐑛𐑟deeds 𐑝of {Kalyb} ; 𐑚𐑳𐑑but 𐑩𐑚𐑳𐑝above 𐑷𐑤all 𐑔𐑦𐑙𐑟things 𐑖𐑰she 𐑛𐑦𐑤𐑲𐑑𐑩𐑛delighted 𐑦𐑯in 𐑒𐑨𐑮𐑦𐑦𐑙carrying 𐑪𐑓off 𐑦𐑯𐑩𐑕𐑩𐑯𐑑innocent 𐑯𐑿new-𐑚𐑹𐑯born 𐑚𐑱𐑚𐑟babes, 𐑯and 𐑐𐑳𐑑𐑦𐑙putting 𐑞𐑧𐑥them 𐑑to 𐑛𐑧𐑔death.




I use ruby text for the Latin alphabet text above words, which works well other than being extremely tiny. I will definitely adjust that with CSS if I do more with this code.
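For readers unfamiliar with ruby text, here is a rough sketch of how such markup could be generated. The helper names and the `RT_STYLE` rule are assumptions for illustration; the 75% value is arbitrary and this is not necessarily the markup the published book uses.

```python
from html import escape
from typing import Iterable, Tuple


def ruby(shavian: str, latin: str) -> str:
    """Wrap one word so the Latin spelling is shown as a ruby annotation."""
    return f"<ruby>{escape(shavian)}<rt>{escape(latin)}</rt></ruby>"


def ruby_line(pairs: Iterable[Tuple[str, str]]) -> str:
    """Join annotated words with spaces to form one line of text."""
    return " ".join(ruby(shavian, latin) for shavian, latin in pairs)


# Assumed stylesheet tweak: enlarge the annotation text, which browsers
# typically render much smaller than the base text.
RT_STYLE = "<style>rt { font-size: 75%; }</style>"

print(ruby_line([("𐑞𐑱", "They"), ("𐑞", "the")]))
```

The size of the annotation is controlled entirely by the `rt` rule, so the fix for the tiny text would be a one-line stylesheet change rather than a change to the generated markup.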


As is customary/required for NaNoGenMo, my code is public.

Like I did last year, I relied on Wiktionary's data a lot. (In retrospect, this may have been a mistake, since the Kingsley Read tsv contains duplicates of a lot of what I wanted. But it's possible that really what would be necessary to do this correctly is a much richer and cleaner data source.) My setup steps (see instructions.txt) take the Wiktionary article dump, split it into individual article files, grab only the ones containing English words, and then pull out only the tags and templates I need from the English content. Then my parsing code steps through that content and correlates it with the Kingsley Read file and the wordfreq library, producing Word objects that are then serialized to a text file, which I then sort by frequency.
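The last part of that pipeline (correlate a spelling with its Shavian form and a frequency, then write the result sorted by frequency) might look roughly like the sketch below. It leaves out the Wiktionary step entirely. The two-column TSV layout, the `WordEntry` class, and the placeholder frequency numbers are all assumptions; the real Kingsley Read file and the repository's Word class may be shaped differently.

```python
import csv
import io
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class WordEntry:
    latin: str
    shavian: str
    frequency: float


def build_entries(tsv_text: str, freq: Callable[[str], float]) -> List[WordEntry]:
    """Parse assumed 'latin<TAB>shavian' rows and attach a frequency to each."""
    rows = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    entries = [WordEntry(row[0], row[1], freq(row[0])) for row in rows if len(row) == 2]
    # Most frequent first; ties broken alphabetically for a stable order.
    entries.sort(key=lambda e: (-e.frequency, e.latin))
    return entries


def serialize(entries: List[WordEntry]) -> str:
    """Write one tab-separated line per entry."""
    return "\n".join(f"{e.latin}\t{e.shavian}\t{e.frequency}" for e in entries)


# Spellings taken from the samples in this post; frequencies are placeholders.
SAMPLE_TSV = "thigh\t𐑔𐑲\nthe\t𐑞\nthey\t𐑞𐑱\n"
PLACEHOLDER_FREQ = {"the": 3.0, "they": 2.0, "thigh": 1.0}

entries = build_entries(SAMPLE_TSV, lambda w: PLACEHOLDER_FREQ.get(w, 0.0))
print(serialize(entries))
```

Sorting once at build time means the generation step can read the file top to bottom and treat "first match" as "most frequent match".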

The generation code then reads in that file to a Lexicon object that has various searching capabilities. It also reads in an extras file containing some words that I need that aren't in both of my main data sources. Then it reads the source text of English Fairy Tales into a Source object, dividing it into Chapter objects containing paragraph groups of Text objects representing sentences. The Text objects are made up of Token objects which can be Word or Punctuation objects. The Word objects have code to produce text in either alphabet, including handling of possessives and proper nouns, as well as spacing. Words that appear in the source but not the lexicon are stored as Punctuation so they can be called out in the printed text as in '{Kalyb}' above. I put the most frequent unfound words from the source into my extras file (sources/extras.txt), but there are still almost 200 remaining, and I ran out of time to do them all.
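The token model described above can be sketched in a few lines. This is a simplified illustration, not the repository's code: the class shapes, the `unfound` flag, and the tokenizing regex are assumptions, and it ignores possessives, proper nouns, and the Chapter and Source layers.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Word:
    latin: str
    shavian: str

    def render(self) -> str:
        return self.shavian


@dataclass
class Punctuation:
    text: str
    unfound: bool = False  # True for source words missing from the lexicon

    def render(self) -> str:
        return "{" + self.text + "}" if self.unfound else self.text


Token = Union[Word, Punctuation]
# A word (letters and apostrophes) or any single non-space character.
TOKEN_RE = re.compile(r"([A-Za-z][A-Za-z']*)|(\S)")


def tokenize(sentence: str, lexicon: Dict[str, str]) -> List[Token]:
    """Split a sentence into tokens, storing unknown words as Punctuation."""
    tokens: List[Token] = []
    for word, mark in TOKEN_RE.findall(sentence):
        if not word:
            tokens.append(Punctuation(mark))
        elif word.lower() in lexicon:
            tokens.append(Word(word, lexicon[word.lower()]))
        else:
            tokens.append(Punctuation(word, unfound=True))
    return tokens


def render(tokens: List[Token]) -> str:
    """Join tokens, putting a space before everything except real punctuation."""
    out: List[str] = []
    for tok in tokens:
        is_mark = isinstance(tok, Punctuation) and not tok.unfound
        if out and not is_mark:
            out.append(" ")
        out.append(tok.render())
    return "".join(out)


# Shavian spellings copied from the sample passage in this post.
LEXICON = {"lived": "𐑤𐑦𐑝𐑛", "the": "𐑞", "fell": "𐑓𐑧𐑤", "enchantress": "𐑦𐑯𐑗𐑭𐑯𐑑𐑮𐑩𐑕"}
print(render(tokenize("lived Kalyb the fell enchantress.", LEXICON)))
```

Storing unfound words as Punctuation keeps the rendering loop uniform: every token knows how to print itself, and the braces mark exactly the words that still need an entry in the extras file.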

Time pressures also kept me from doing any more in-depth formatting. I used bare-bones HTML to get something up, and didn't do any styling or font selection, much less write a formatted PDF or EPUB.


I would like, at some point, to produce a Shavian alphabet picture book without the length and time restrictions of NaNoGenMo, but it won't be today.
