Pseudo-words
July 2009 (index)
500 psuedo-words which aren't in the dictionary, but look and sound a bit like English words. (Details on how it's done after the list.)
unshott celants sanny thriste alies binated unded pennag zigant inworm priole enship heters multive simos anoes thwards nauses wittes selfies pronism calcid desiest baction hokines bemurid microny clite crunker arths moffer belism pneums outriva evenule conjuga octasis atters suborne stoon nonders lexing panded ences tened inded hered tusket oldier restas survets pooved encies rattack ching entals phonium dimino ophther longay convers spators spadas yette greavow pathism erments fulming kyogees jollers legics tearies fortle nanose nickest assed trism flavort misings raidine suspex semine undases lable racella desis baccose intess anties inclass mainery liness reces wellers lably hoopin unders subed extile predors penito abster isolans guish spookie limates beslady nattish frical enhear climan pleates redied semints rillish clodium unfeast herbird antions vinized ritors arship alched mescend rinkers frounds kaise cleane renfore chies epistas dispund aeolias austers ladding chery embos signe reting sennate blooder defore multish heteron cantory trame krates suffle pathier awares obtuses brated uplets parally tamandi sonate propies holytic hoative safara gastron mility renules ector cically inding ferting imploce jillian interos marging epicle ossily warter wouned spouths acous conting ashlams icelers hippled neoplex ciner perism simila endomes prefall chooley perist washets hemas chily timetry fernis unnah mella quacils parabi yello oversal unger acques kerbal espalds ouths tolers persed argos longle eering alumino typies panars retats ungings admonic helioma apeably terplea dissee moffend litudes engult berthy senzas anents fumetts strial semide austle ouses sachers diaphy fortile blung quadril faggons phthons copic empled discon mising quaddy etchies incuria lates tracism cution creting copulse chiere napase sible noning arred foxies hemics radic unquent yaour scable tuberos spher defted smicks neight svastic amers jettise miliac gaudons andraw comple funkies breechy prouse horness lacter assis numing eadily avoises befring inhook resine hinnie accides pedipal wrapsid peries ostes nette merlier antases bigails ructule effies anning centete existe hydring lixing sagogy anging balthy squeasy swined moting incoat reoppos konkers fisture dreggie saultic jugger undooks antical comed veilles spatchy outbraw exsicca nunche jitten oinomen postess pansing yukkies phyls trillas dework sache athetus thankle anors berds astros zebrian ontical fantre phaeing sipplet palamps amphies sootins ramid dourn ripey tetee spected perits outral beaves saling frenes appers ested antry kedgees parist salins eadis bedware burbase progest decling civilia frowing presine overy narging superel somes arble darnet cented zeching cannes anticed phils liqued uness steeing squels bicars dised satia dollis decher feneses hecked pretail leping retream dising unting flator duskers roiners pedest spoints dittily iatrial sping pillas resis intes filared revalls entate titas contic feckler darians dreided lamin sconcer brical polias liced vities reoccus meterms dissers pulists slatten flanky ragiler epheias misling unfilia cring meship sacles surably elities stoken kalia stric wylies propism vivism lassics discond bizars nonvict mesomed staming bacter twittes bankmen sendums nicote bacted legance atrism bricker rumming amalgre missoid cotting inting tonna plagic exothe moderno presini heter darka bodiest dialyte bolized ventum greas leproof betaker arkin espator intient reasses rallops chlores hedypha dosion gonists curners centias entions attered aders giaourt imming humatos idology teamier nonants forers coadax ories kerator
These are generated using a Markov process. Essentially we pick random letters based on their frequency in a list of dictionary words.
So if we've already got a string of letters to start our word "canto" then we take the last N as out prefix "anto", pick a letter randomly from all the letters that follow "anto" in a word in our dictionary ("i", "r", "m", "n", "n") and use that as the next letter of our word.
Ending a word works in a similar way. We think of our words as having a space at the end ("penguin_" rather than "penguin") and if we reach a space then we stop.
Starting a word is a little different. We take the first N letters from a random word of at least N letters. We do this because some combinations of letters can appear in the middle of a word "...ngin..." but not at the start.
Finally we filter out words that appear in the dictionary, and we can also filter by word length.
Varying N (the "order" of the process) gives a different feeling to the words.
With N=0 the letters are randomly chosen (but according to frequencies in English with lots of e,t,a,... and not many q,j,x).
aakrat raapoie cbucpos puoceo hiabbya temna ertrz garntes rcuup ealenr tieoilg teoei nilpst psaat eivygoo tsion tacepil racoqr ahrirlt erbrenr enohn rdtee nneersy eiaesiu grlevce gcheib ncsiro sscoute rhets epnst istna dmggrn oheir dteep cpasv sirdnf eysisnc rpldc vwereh lzioc toltein rbpceid inrmr ecgdtr ooseae axealc lainbr orgucth fdieip smeos
With N=1 pairs of letters can only appear if they exist in real dictionary words. So we should never see "...qj..." in a word. There's slightly more structure but it's often unpronounceable.
arind gispels paless rring fiamer thegud melis pwore demis mbson rinetr telwit pouly urngs alble tdlae benssus grilare ggicrd kenis tioin arlages cotrtas bovipl deltres prchy foles semaul alesed haioes erans eserr teauas cononrs plote lsthng secors wenge dseren nicuc rroung ppecs holvedr tingeld ymasows biede quntes tenkly atldd inthus
With N=2 we're getting closer to pronounceable words.
auchele spifs diong poolon inizoch bletrap catee uperang bifiser pstrete carreek flimins bacizes persahs nucke exads niatist paull rewing gatic oimmy ovening spewees coing ovency extrics rejol terim ching intions dulang bobed lises antrud dicasms cholds pinges cisers nomaner sonide atinte misku catters vally achies tarant surver phicoly fecti prists
With N=3
posised inting parised codic jokin squial bandles airgas oment scritis priners owroted rooth glowdy reless latory royable bured fleekly shrited klutive lapited vilings etyme countly tronite gallin agantly ubing dopic cutlus reble fianges borse contion palers croque suning starize rified vasotte bette moning wroopes agoned resis cogna rects pness collior
And with N=4 they're almost all pronounceable.
tsarity solites civile circuse sacrers patries cration imple peated ingly enfless falsis utiling taleon dyeably reling alence rehing hemical beques saucine guaning seldoms valised ravity escalds uprous prothed achinks iting refring nourine jorder unival albar tsari beeze malated muscate emises poric dipters oralows loating sliples petrots cabbas gambis ories unding
What is this useful for (other than entertaining silliness)? Initially I was thinking about internet domain names; ibbly was found this way. It could also be useful for coming up with memorable passwords that aren't in the dictionary (although they should be longer and contain some characters other than lower-case letters, and we may need to find a way of filtering out word that could be perceived as rude).
Maybe (extending word-length):
thologically cheesisterly oversaleably bombloadster myofilmmaker coquisitions physiographs bollocksures subscripting brickfishing gripmentures checkleburst irratologist antepasskeys auberosphers negrindlenut misfunctuous truebluefied gallectating emplorations prothoracled interoclised peevisitions bilberrility nanosecurias purelevanist schnologists harpenitages sorderivings ultractorice brillowwares pardownplant disclosiness microciously nonadicallic velornnesses microticians sheenerously amphibolossy delustereoed reventiously alternalisms pseudoallers mechangaring lankertorias ochrituratic deworthoaxis touchingling subpopulated retrographid portholobate extensinuses
I like these. I can imagine a doctor (or more likely some dodgy pseudo-scientific nutritionist) saying: "It's bad news Mr Smith. Your physiographs show symptoms of gallectating lankertorias so I'll have to refer you to the irratologist".
This technique might also suit a modern Edward Lear.
Guess the language
The same thing, but taken from word lists from languages other than English.
seesten gerung grenz riess zusamme wehrbar einkult westes beimen fauster eiger niedes alleres sorgung deges fehle blutend indung geldet schluss fehlern talem kulator demen stamen unempft chrieb minist gebenen genes kehrs voraben verzung magne gebung vorraet gewasse angen unterer katen lebens beche stapeln gebene trueck denkuer patts lebenen zahltem konsinn
domine operto pomodo tonassi stino peremo palavo fasti somate bozzano rarono guaste trafo perera granera rassino imboli roviste ereste operero spiata betullo scerei torsive orgati sotta obbie stanti smerai asciano inche giudi sconne frugher atteva integgi adagner diziose ormenti fegata sognosi affer testava serai conda capiti trate tentana osseda schiamo
hidach kanem takasan kokusin kessic shituz hishuto aisotu wadatsu mushio okumeta titui reigan kesitog sinkasa honzeki pastec utenr hange hidara sabet dourofu sintoub geino jyunint tazyoen shukut kuibuka shigae keiky inoute chinf tounob futuke pantuk bansai maresom genshik singamu konsy akumora shinb chirado yuisi eibitu konsh yasyo sabur kosurei nyuansi
Code
Python code is here. You'll need to give it a piece of text to get the letter frequencies from. For English I used sowpods.txt. Other languages are from Project Gutenberg.
ibbly.com contact