Discover how sylang optimizes communication between humans and machines
sylang is built on four foundational principles that guided every design decision.
These principles operate in a hierarchy, with computational benefits taking precedence when trade-offs were necessary, but never to the point of rendering the language impractical for human users.
sylang employs a streamlined phonological inventory chosen for maximum distinctiveness and ease of transcription. The sound system uses a subset of English-familiar sounds, avoiding those that are easily confusable in speech or result in complex spellings.
sylang uses an agglutinative morphology designed for both clarity and compression. Words are formed by stringing together morphemes in a fixed, logical order.
kari-su → "will read"
kari (read) + -su (future tense)
Each morpheme carries a single, clear meaning, and morphemes concatenate without changing form.
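The concatenation rule can be sketched in a few lines of Python. This is only an illustration, not an official sylang tool: the function name `attach` and the two lookup tables are hypothetical, and the only entries in them are the root `kari` and the suffix `-su` from the example above.

```python
# Hypothetical mini-lexicon; both entries are taken from the example above.
ROOTS = {"kari": "read"}
SUFFIXES = {"su": "future tense"}


def attach(root: str, *suffixes: str) -> str:
    """Join a root and its suffixes without altering any morpheme's form."""
    if root not in ROOTS:
        raise KeyError(f"unknown root: {root}")
    for suffix in suffixes:
        if suffix not in SUFFIXES:
            raise KeyError(f"unknown suffix: {suffix}")
    # Plain string concatenation: no vowel changes, no fusion.
    return root + "".join(suffixes)


print(attach("kari", "su"))  # karisu
```

Because no morpheme changes shape, the word can be built by plain string concatenation and taken apart again by matching known morphemes.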
sylang syntax is designed for maximum clarity and predictability, making it easy for both humans and machines to parse.
Taru meruta.
(The person spoke.)
Subject-Verb-Object order, no extra marking needed.
Taru karisu lema.
(The person will read a book.)
The sylang vocabulary is tight and efficient, making each word or morpheme count.
taru - person
doma - house
ani - water
meru - speak
kari - read
sylang is written using a streamlined orthography with the dual goals of being easy for humans to read/type and easy for machines to tokenize.
Sylang Prime is a constructed language designed for both human and computational efficiency, particularly for processing by Large Language Models (LLMs). Its core goals are to maximize LLM efficiency, achieve high conceptual precision, and maintain human learnability. It pursues them by balancing the competing demands of token efficiency, semantic precision, and structural clarity.
Sylang Prime uses a carefully selected set of only 21 ASCII characters (5 vowels and 16 consonants) for maximum visual distinctiveness and tokenizer-friendliness. Key sounds like /ʃ/ (sh), /ŋ/ (ng), and /tʃ/ (ch) are represented by single letters (x, q, c) instead of digraphs. It employs a simple CV(C) syllable structure, fixed stress on the penultimate syllable, and no tone or complex consonant clusters. This simplicity minimizes ambiguity and helps align word and morpheme boundaries with tokenizer splits, enhancing processing efficiency.
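The syllable and stress rules are regular enough to check mechanically. The Python sketch below is one possible reading, under stated assumptions: the vowels are taken to be a, e, i, o, u (the text says there are five but does not list them), every other letter is treated as a consonant because the 16-consonant inventory is not listed, and CV(C) is read strictly, so every syllable needs an onset consonant. The names `syllabify` and `stressed_index` and the test word `taxmi` are hypothetical.

```python
VOWELS = set("aeiou")  # assumption: the source says "5 vowels" without listing them


def syllabify(word: str) -> list[str]:
    """Split a lowercase word into CV or CVC syllables, or raise ValueError."""
    syllables = []
    i, n = 0, len(word)
    while i < n:
        # Every syllable needs one onset consonant followed by one vowel.
        if i + 1 >= n or word[i] in VOWELS or word[i + 1] not in VOWELS:
            raise ValueError(f"{word!r} is not a sequence of CV(C) syllables")
        j = i + 2
        # A consonant after the vowel is a coda unless a vowel follows it,
        # in which case it is the onset of the next syllable.
        if j < n and word[j] not in VOWELS and (j + 1 == n or word[j + 1] not in VOWELS):
            j += 1
        syllables.append(word[i:j])
        i = j
    return syllables


def stressed_index(syllables: list[str]) -> int:
    """Index of the penultimate syllable (the only one in a monosyllable)."""
    return max(len(syllables) - 2, 0)


parts = syllabify("taxmi")          # "taxmi" is a made-up test word
print(parts, stressed_index(parts))  # ['tax', 'mi'] 0
```

A word containing a consonant cluster inside a syllable, or one that begins with a vowel, is rejected with a `ValueError` under this strict reading.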
Sylang Prime uses an agglutinative morphology, where morphemes attach to roots without changing form, creating clear boundaries. Affixes are ordered hierarchically based on their scope (Root → Derivational → Valency → Aspect → Tense → Mood). Tense and aspect markers are typically single consonants (-t for past, -s for future, -p for perfective, -r for imperfective), and the present tense is often unmarked (zero morpheme). Derivational affixes are also ultra-compact (-mi for adjective, -ja for agent, -xa for abstract noun). This system minimizes redundancy (e.g., no gender or number agreement on verbs) and uses optional markers only when necessary for disambiguation, resulting in high information density per word and efficient tokenization.
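The fixed affix order lends itself to a slot-based builder. The following Python sketch is an assumption-laden illustration rather than a reference implementation: the function `inflect` and the feature names are hypothetical, the root `kari` is borrowed from the earlier sylang example and may not be a Sylang Prime root, and the valency and mood tables are left empty because the text names no markers for them.

```python
# Slots in the order given in the text, applied after the root.
SLOT_ORDER = ("derivational", "valency", "aspect", "tense", "mood")

AFFIXES = {
    "derivational": {"adjective": "mi", "agent": "ja", "abstract_noun": "xa"},
    "valency": {},  # no markers are listed in the source
    "aspect": {"perfective": "p", "imperfective": "r"},
    "tense": {"past": "t", "future": "s", "present": ""},  # present is unmarked
    "mood": {},  # no markers are listed in the source
}


def inflect(root: str, **features: str) -> str:
    """Append the affix for each requested feature in the fixed slot order."""
    unknown = set(features) - set(SLOT_ORDER)
    if unknown:
        raise ValueError(f"unknown slots: {sorted(unknown)}")
    word = root
    for slot in SLOT_ORDER:
        if slot in features:
            word += AFFIXES[slot][features[slot]]
    return word


print(inflect("kari", tense="past"))         # karit
print(inflect("kari", derivational="agent"))  # karija
print(inflect("kari", tense="present"))      # kari
```

The order of keyword arguments does not matter, since the builder always walks the slots in the same sequence. The text does not say how two stacked single-consonant suffixes (aspect plus tense) are reconciled with the CV(C) syllable shape, so the sketch simply concatenates them.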
The basic sentence structure in Sylang Prime is Subject-Verb-Object (SVO) for main clauses, providing predictable processing. Subordinate clauses (introduced by markers like ko for complement clauses or ze for relative clauses) use a verb-final order, explicitly marking their boundaries. Modifiers generally follow the elements they modify (head-initial phrases), such as adjectives following nouns and adverbs following verbs. The language explicitly disallows deeply nested clauses (center-embedding) to maintain linear parsing.
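The two word orders can be contrasted with a small linearizer. This Python sketch assumes that the clause marker comes first in a subordinate clause and that the object precedes the final verb; the function names are hypothetical, and the content words (`taru`, `karis`, `lema`) are borrowed from the earlier sylang examples purely as placeholders.

```python
from typing import Optional

CLAUSE_MARKERS = {"ko": "complement clause", "ze": "relative clause"}


def main_clause(subject: str, verb: str, obj: Optional[str] = None) -> str:
    """Main clauses are Subject-Verb-Object."""
    return " ".join(part for part in (subject, verb, obj) if part)


def subordinate_clause(marker: str, subject: str, verb: str,
                       obj: Optional[str] = None) -> str:
    """Subordinate clauses open with a marker and put the verb last."""
    if marker not in CLAUSE_MARKERS:
        raise ValueError(f"unknown clause marker: {marker}")
    return " ".join(part for part in (marker, subject, obj, verb) if part)


print(main_clause("taru", "karis", "lema"))               # taru karis lema
print(subordinate_clause("ko", "taru", "karis", "lema"))  # ko taru lema karis
```

With the marker at one edge and the verb at the other, a parser can find both ends of a subordinate clause without lookahead.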
Questions, commands, and suggestions are clearly marked at the beginning of the sentence using specific particles or prefixes. Yes/No questions use the particle ke at the start. "Wh-" questions (who, what, etc.) use the prefix ke- attached to the questioned element. Commands (imperatives) are marked with the initial particle du, and suggestions (hortatives, like "let's...") are marked with the initial particle xo.
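As a rough illustration, the four sentence types can be produced by small transformations on a list of words. The Python function names below are hypothetical, the example words come from the earlier sylang examples, and the sketch assumes that a questioned element stays in place when it takes the ke- prefix, which the text does not specify.

```python
def yes_no_question(words: list[str]) -> list[str]:
    """Yes/No question: particle 'ke' at the start of the sentence."""
    return ["ke", *words]


def wh_question(words: list[str], index: int) -> list[str]:
    """Wh-question: prefix 'ke-' on the questioned element (left in place)."""
    marked = list(words)
    marked[index] = "ke" + marked[index]
    return marked


def command(words: list[str]) -> list[str]:
    """Imperative: particle 'du' at the start of the sentence."""
    return ["du", *words]


def suggestion(words: list[str]) -> list[str]:
    """Hortative ("let's ..."): particle 'xo' at the start of the sentence."""
    return ["xo", *words]


sentence = ["taru", "karis", "lema"]
print(" ".join(yes_no_question(sentence)))  # ke taru karis lema
print(" ".join(wh_question(sentence, 2)))   # taru karis kelema
print(" ".join(command(["karis", "lema"])))  # du karis lema
```

Since `ke` as a free-standing particle and `ke-` as a prefix differ only in whether a space follows, a tokenizer can tell the two question types apart from the first token alone when the particle is sentence-initial.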
Sylang Prime uses explicit mechanisms to manage reference and information flow, crucial for both humans and AI. Anaphoric reference (referring back to a previously mentioned entity) is handled with the -je suffix added to the noun. Deixis (referring to things in the immediate context) uses specific words like ci (this) and ca (that). Optional prefixes like na- can mark the sentence topic, and za- can mark a focused or emphasized element. This clarity reduces ambiguity, especially when multiple entities are involved or when information is fronted for emphasis.
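A minimal Python sketch of these markers follows. It assumes that every repeated mention of a noun takes -je, which is a simplification the text does not confirm, and it attaches na- and za- directly to the word. The function names are hypothetical and the nouns are borrowed from the earlier sylang examples.

```python
def mark_anaphora(nouns: list[str]) -> list[str]:
    """Add the -je suffix to every noun that has already been mentioned."""
    seen: set[str] = set()
    marked = []
    for noun in nouns:
        marked.append(noun + "je" if noun in seen else noun)
        seen.add(noun)
    return marked


def mark_topic(word: str) -> str:
    """Optional topic prefix na-."""
    return "na" + word


def mark_focus(word: str) -> str:
    """Optional focus prefix za-."""
    return "za" + word


print(mark_anaphora(["taru", "lema", "taru"]))  # ['taru', 'lema', 'taruje']
print(mark_topic("lema"), mark_focus("taru"))   # nalema zataru
```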
Sylang Prime incorporates an evidentiality system using prefixes attached to the verb or clause to indicate the speaker's level of certainty or the source of the information. vi- indicates certainty or inference, ru- indicates hearsay or reported information, and mo- indicates uncertainty or speculation. This explicit marking helps distinguish between direct knowledge, reported facts, and speculative statements, enhancing the precision of communication.
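The three prefixes amount to a small lookup table. In the Python sketch below, the category labels and the function `mark_evidence` are hypothetical names chosen for illustration, and the verb `meru` is borrowed from the earlier sylang examples. Only the prefixes themselves (vi-, ru-, mo-) come from the text.

```python
from typing import Optional

EVIDENTIAL_PREFIXES = {
    "certain_or_inferred": "vi",
    "reported": "ru",
    "speculative": "mo",
}


def mark_evidence(verb: str, source: Optional[str] = None) -> str:
    """Prefix a verb with its evidential marker; no source means no marker."""
    if source is None:
        return verb
    return EVIDENTIAL_PREFIXES[source] + verb


print(mark_evidence("meru", "reported"))     # rumeru
print(mark_evidence("meru", "speculative"))  # momeru
```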
The Sylang Prime lexicon is engineered for tokenizer efficiency and semantic clarity. Root words are kept very short (typically 1-2 syllables). Complex concepts are frequently expressed through productive compounding of these short roots, rather than introducing long, novel words, allowing LLMs to infer meaning from known parts. High-frequency words are especially brief. The vocabulary design also considers semantic embedding space, aiming for related concepts to share phonological patterns and for conceptual distances to be reflected in word forms, aiding LLM prediction and generation.