NaNoGenMo 2024: A sketch leading towards a Shavian primer
For NaNoGenMo 2024, I wanted to write a children's primer in Shavian. I did not succeed in doing that, but I produced something and I learned a lot.
Shavian is an alternate alphabet for English created by Kingsley Read in accordance with George Bernard Shaw's will. I was first inspired by this tweet by Carla Hurt:
Carla Hurt (@FoundInAntiquity) · Oct 8
I can find places that sell posters of the 44 English phonemes for kids, but has anyone made them into a book like the abc books? Each page with one phoneme shown through several words? (Why are there so many alphabet books with only 26 words, one per letter?)
And I figured it would be an interesting NaNoGenMo project. My initial plan was to do something similar to Gyo Fujikawa's A to Z Picture Book (https://www.biblio.com/book/gyo-fujikawas-z-picture-book-gyo/d/1397311738#gallery-6), with each letter paired with a set of things starting with it, including pictures, and then to generate larger texts, using Wiktionary, the Kingsley Read Lexicon, and the wordfreq library as my data sources. This ran into a few challenges:
- It was hard to get a generated list of appropriate objects out of my data sources. For one, Wiktionary doesn't have specific enough tagging. For example, one can't ask for "most frequently used words referring to animals" because "white" is tagged as a possible animal name and is more frequent than "weasel".
- As well, wordfreq has no part-of-speech information, meaning that when combined with Wiktionary's handling of all usages of a word, there are some false positives. For example, "part" (as in "a part") is the most popular adverb starting with 𐑐 (/p/) because the noun sense is so frequent.
- Fifty thousand words is a lot of words, so generating letter-specific text was more work than I expected. I got to one sentence of "Interjection, Name verbs the adverb adjective noun" per letter and realized that generating even a paragraph would be a huge effort. And I needed to average 625 transliterated words per letter to hit the goal.
- Given the amount of work, I never really started on the images. I had some thoughts about grabbing the first image from the Wikipedia page for each item, but since I never got to a "list of items", that didn't happen.
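The first two problems above boil down to the same thing: wordfreq assigns one frequency per surface form, covering every sense and part of speech, so any word loosely tagged with a category dominates rarer words that actually belong to it. A minimal illustration with made-up Zipf-style numbers (wordfreq's real values differ):

```python
# Illustrative Zipf-style frequencies (invented; wordfreq's real values differ).
# One number per surface form covers ALL senses, so "white" (usually an
# adjective, but taggable as an animal name) outranks "weasel" in an
# animal-word query.
FREQ = {"white": 5.3, "wolf": 4.1, "weasel": 3.1}

# Words that Wiktionary-style tagging might return for the category "animal".
ANIMAL_TAGGED = ["weasel", "white", "wolf"]

def by_frequency(words, freq):
    """Rank candidate words by overall corpus frequency, highest first."""
    return sorted(words, key=lambda w: freq.get(w, 0.0), reverse=True)

ranked = by_frequency(ANIMAL_TAGGED, FREQ)
print(ranked[0])  # "white" wins, even though it is rarely an animal name
```

With the real library the lookup would be `wordfreq.zipf_frequency(word, "en")`, but the ranking problem is the same: frequency alone can't distinguish the animal sense from the adjective.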
So my eventual result is a book in two sections: The first is a listing for each letter, with traditional and new names and a single generated sentence, many of which are almost sensical. Here's a section of that:
𐑔
𐑔𐑲 /𐑔𐑹𐑯
𐑔𐑨𐑙𐑒𐑕 ,·𐑔𐑾𐑛𐑹 𐑔𐑰𐑥𐑟 𐑞 𐑔𐑮𐑵 𐑔𐑻𐑛 𐑔𐑨𐑗𐑼 .𐑞
𐑞𐑱 /𐑞𐑬
𐑞𐑦𐑕 ,·𐑣𐑧𐑞𐑼 𐑳𐑞𐑼𐑟 𐑞 𐑞𐑺𐑓𐑹 𐑑𐑩𐑜𐑧𐑞𐑼 𐑢𐑧𐑞𐑼𐑥𐑨𐑯 .
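The sentences above come from a fixed part-of-speech template filled with words containing the target letter. This is a sketch of that approach, not the project's actual code; the tiny lexicon here is hand-made for illustration, where the real one is built from Wiktionary, the Kingsley Read Lexicon, and wordfreq:

```python
import random

# Stand-in lexicon: frequent words containing a target letter, bucketed by
# part of speech. Hand-made sample data for illustration only.
LEXICON = {
    "th": {
        "interjection": ["thanks"],
        "name": ["Theodore"],
        "verb": ["themes"],
        "adverb": ["through"],
        "adjective": ["third"],
        "noun": ["thatcher"],
    }
}

# The "Interjection, Name verbs the adverb adjective noun" pattern.
TEMPLATE = ["interjection", "name", "verb", "adverb", "adjective", "noun"]

def letter_sentence(letter, lexicon, rng=random):
    """Fill the fixed POS template with words containing `letter`."""
    words = [rng.choice(lexicon[letter][pos]) for pos in TEMPLATE]
    return "{}, {} the {}.".format(
        words[0].capitalize(),
        " ".join(words[1:3]),
        " ".join(words[3:]),
    )

print(letter_sentence("th", LEXICON))
# → Thanks, Theodore themes the through third thatcher.
```

Even this single-sentence template took real effort to fill per letter; scaling it to whole paragraphs is where the project ran out of road.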
The rest is an automatic Shavianization of English Fairy Tales by Flora Annie Steel, originally published in 1918, which conveniently is in the public domain and has 40 stories, the same as the number of letters in the Shavian alphabet. My original plan was to replace as many words as possible in each story with words containing the corresponding letter, but the problems above led to a text that was too nonsensical even for a Dada-inspired generative text art project. Here's a sample:
𐑕𐑩𐑯𐑑 .·𐑡𐑹𐑡 𐑝 ·𐑥𐑧𐑮𐑦 ·𐑦𐑙𐑜𐑤𐑩𐑯𐑛
𐑦𐑯 𐑞 {darksome}𐑛𐑧𐑐𐑔𐑕 𐑝 𐑩 𐑔𐑦𐑒 𐑓𐑪𐑮𐑦𐑕𐑑 𐑤𐑦𐑝𐑛 {Kalyb}𐑞 𐑓𐑧𐑤 𐑦𐑯𐑗𐑭𐑯𐑑𐑮𐑩𐑕 .𐑑𐑧𐑮𐑩𐑚𐑩𐑤 𐑢𐑻 𐑣𐑻 𐑛𐑰𐑛𐑟 ,𐑯 𐑓𐑿 𐑞𐑺 𐑢𐑻 𐑣𐑵 𐑣𐑨𐑛 𐑞 𐑣𐑸𐑛𐑦𐑣𐑫𐑛 𐑑 𐑕𐑬𐑯𐑛 𐑞 𐑚𐑮𐑱𐑟𐑩𐑯 𐑑𐑮𐑳𐑥𐑐𐑩𐑑 𐑢𐑦𐑗 𐑣𐑳𐑙 𐑴𐑝𐑼 𐑞 𐑲𐑼𐑯 𐑜𐑱𐑑 𐑞𐑨𐑑 𐑚𐑸𐑛 𐑞 𐑢𐑱 𐑑 𐑞 ·𐑩𐑚𐑴𐑛 𐑝 ·𐑢𐑦𐑗𐑒𐑮𐑭𐑓𐑑 .𐑑𐑧𐑮𐑩𐑚𐑩𐑤 𐑢𐑻 𐑞 𐑛𐑰𐑛𐑟 𐑝 {Kalyb} ;𐑚𐑳𐑑 𐑩𐑚𐑳𐑝 𐑷𐑤 𐑔𐑦𐑙𐑟 𐑖𐑰 𐑛𐑦𐑤𐑲𐑑𐑩𐑛 𐑦𐑯 𐑒𐑨𐑮𐑦𐑦𐑙 𐑪𐑓 𐑦𐑯𐑩𐑕𐑩𐑯𐑑 𐑯𐑿 -𐑚𐑹𐑯 𐑚𐑱𐑚𐑟 ,𐑯 𐑐𐑳𐑑𐑦𐑙 𐑞𐑧𐑥 𐑑 𐑛𐑧𐑔 .
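Stripped to its core, Shavianization like the sample above is a word-by-word lexicon lookup with a braces fallback for unfound words. A rough sketch, with a tiny hand-made lexicon standing in for the real one:

```python
import re

# Tiny sample lexicon mapping Latin spellings to Shavian; the real one is
# built from Wiktionary and the Kingsley Read Lexicon.
SHAVIAN = {"in": "𐑦𐑯", "the": "𐑞", "depths": "𐑛𐑧𐑐𐑔𐑕", "of": "𐑝", "a": "𐑩"}

def shavianize(sentence, lexicon):
    """Replace each word with its Shavian spelling; mark unfound words
    with braces, as in '{darksome}' and '{Kalyb}' above."""
    out = []
    for token in re.findall(r"[A-Za-z]+|[^\sA-Za-z]+", sentence):
        if token.isalpha():
            out.append(lexicon.get(token.lower(), "{%s}" % token))
        else:
            out.append(token)  # punctuation passes through unchanged
    return " ".join(out)

print(shavianize("In the darksome depths", SHAVIAN))
# → 𐑦𐑯 𐑞 {darksome} 𐑛𐑧𐑐𐑔𐑕
```

The real code also handles Shavian punctuation-spacing conventions and the naming dot for proper nouns, which this sketch ignores.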
I use ruby text to put the Latin-alphabet text above the words, which works well apart from the ruby being extremely tiny. I will definitely adjust that with CSS if I do more with this code.
As is customary/required for NaNoGenMo, my code is public.
Like I did last year, I relied on Wiktionary's data a lot. (In retrospect, this may have been a mistake, since the Kingsley Read tsv contains duplicates of a lot of what I wanted. But it's possible that doing this correctly would really require a much richer and cleaner data source.) My setup steps (see instructions.txt) take the Wiktionary article dump, split it into individual article files, grab only the ones containing English words, and then pull out only the tags and templates I need from the English content. Then my parsing code steps through that content and correlates it with the Kingsley Read file and the wordfreq library, producing Word objects that are then serialized to a text file, which I then sort by frequency.
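The "English content only" step above can be sketched with two regexes: find the ==English== level-2 section of a page's wikitext, then pull the {{...}} templates out of it. This is a simplification of the real parsing (and assumes non-nested templates), not the project's actual code:

```python
import re

def english_section(wikitext):
    """Return just the ==English== level-2 section of a Wiktionary page,
    or "" if the page has no English entry."""
    m = re.search(r"==English==\n(.*?)(?=\n==[^=]|\Z)", wikitext, re.S)
    return m.group(1) if m else ""

def templates(section):
    """Pull out the names of non-nested {{...}} templates in a section."""
    return re.findall(r"\{\{(.*?)\}\}", section, re.S)

# A toy two-language page in Wiktionary's wikitext shape.
page = """==English==
===Noun===
{{en-noun}}
# A small mustelid.

==French==
{{fr-noun|m}}
"""
print(templates(english_section(page)))  # → ['en-noun']
```

The lookahead stops at the next level-2 heading (`==French==`) but not at level-3 headings like `===Noun===`, so the whole English entry survives while the French templates are dropped.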
The generation code then reads that file into a Lexicon object that has various searching capabilities. It also reads in an extras file containing some words that I need that aren't in both of my main data sources. Then it reads the source text of English Fairy Tales into a Source object, dividing it into Chapter objects containing paragraph groups of Text objects representing sentences. The Text objects are made up of Token objects, which can be Word or Punctuation objects. The Word objects have code to produce text in either alphabet, including handling of possessives and proper nouns, as well as spacing. Words that appear in the source but not the lexicon are stored as Punctuation so they can be called out in the printed text, as in '{Kalyb}' above. I put the most frequent unfound words from the source into my extras file (sources/extras.txt), but there are still almost 200 remaining, and I ran out of time to do them all.
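A guess at the shape of that token hierarchy: the class names match the post, but the fields and methods here are invented for illustration:

```python
from dataclasses import dataclass

@dataclass
class Word:
    """A lexicon word, renderable in either alphabet."""
    latin: str
    shavian: str

    def render(self, alphabet="shavian"):
        return self.shavian if alphabet == "shavian" else self.latin

@dataclass
class Punctuation:
    """Literal text: real punctuation, or an unfound word kept as '{word}'."""
    text: str

    def render(self, alphabet="shavian"):
        return self.text  # same in both alphabets

@dataclass
class Text:
    """A sentence: an ordered list of Word/Punctuation tokens."""
    tokens: list

    def render(self, alphabet="shavian"):
        return " ".join(t.render(alphabet) for t in self.tokens)

sentence = Text([Word("in", "𐑦𐑯"), Word("the", "𐑞"), Punctuation("{Kalyb}")])
print(sentence.render())         # → 𐑦𐑯 𐑞 {Kalyb}
print(sentence.render("latin"))  # → in the {Kalyb}
```

Storing unfound words as Punctuation is a neat trick: they render identically in both alphabets, so the braces survive into the printed text without any special-casing in the renderer.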
Time pressures also kept me from doing any more in-depth formatting. I used bare-bones HTML to get something up, and didn't do any styling or font selection, much less write a formatted PDF or EPUB.
I would like, at some point, to produce a Shavian alphabet picture book without the length and time restrictions of NaNoGenMo, but it won't be today.