Decipherment: Difference between revisions

imported>JJMC89 bot III
 
imported>Onceinawhile
 
Line 5: Line 5:
In [[philology]] and [[linguistics]], '''decipherment''' is the discovery of the meaning of the symbols found in extinct [[language]]s and/or [[alphabet]]s.<ref>Trask, R.L (2000). ''The Dictionary of Historical and Comparative Linguistics''. Fitzroy Dearborn Publishers, p. 82 ("The process of determining the relation between an extinct and unknown writing system and the language it represents. Strictly, decipherment is the elucidation of the ''script''—that is, determining the values of the written characters")</ref> Decipherment is possible with respect to languages and scripts. One can also study or try to decipher how spoken languages that no longer exist were once pronounced, or how living languages used to be pronounced in prior eras.


Notable examples of decipherment include the [[decipherment of ancient Egyptian scripts]] and the [[decipherment of cuneiform]]. A notable decipherment in recent years is that of the [[Linear Elamite]] script.<ref name=":4" /> Today, at least a dozen languages remain undeciphered.<ref name=":0">{{Cite journal |last1=Luo |first1=Jiaming |last2=Hartmann |first2=Frederik |last3=Santus |first3=Enrico |last4=Barzilay |first4=Regina |last5=Cao |first5=Yuan |date=2021 |title=Deciphering Undersegmented Ancient Scripts Using Phonetic Prior |url=https://direct.mit.edu/tacl/article/97780 |journal=Transactions of the Association for Computational Linguistics |language=en |volume=9 |pages=69–81 |doi=10.1162/tacl_a_00354 |issn=2307-387X|arxiv=2010.11054 }}</ref> Historically speaking, decipherments do not come suddenly through single individuals who "crack" ancient scripts. Instead, they emerge from the incremental progress brought about by a broader community of researchers.<ref name=":5" />
[[Maurice Pope (linguist)| Maurice Pope]] wrote that "Decipherments are by far the most glamorous achievements of scholarship… It is also a key to further knowledge, opening a treasure-vault of history through which for countless centuries no human mind has wandered."<ref name="z897">{{cite book | last=Pope | first=Maurice | title=The Story of Decipherment | publisher=Thames & Hudson | publication-place=London | date=1999 | isbn=978-0-500-28105-5 | page=9|quote= Decipherments are by far the most glamorous achievements of scholarship. There is a touch of magic about unknown writing, especially when it comes from the remote past, and a corresponding glory is bound to attach itself to the person who first solves its mystery. Moreover a decipherment is not just a mystery solved. It is also a key to further knowledge, opening a treasure-vault of history through which for countless centuries no human mind has wandered. Finally, it may be a dramatic personal triumph. Though many decipherments have been carried through by professional scholars as it were in the normal course of duty, this is not so for the three most famous: the decipherment of the Egyptian hieroglyphs by Champollion, of cuneiform by Rawlinson, and of Mycenaean Linear B by Ventris.}}</ref> Pope described the three most famous as the [[decipherment of ancient Egyptian scripts]], the [[decipherment of cuneiform]] and the decipherment of [[Linear B]].<ref name="z897"/> A notable decipherment in recent years is that of the [[Linear Elamite]] script, in 2022.<ref name=":4" /> Today, at least a dozen languages remain undeciphered.<ref name=":0">{{Cite journal |last1=Luo |first1=Jiaming |last2=Hartmann |first2=Frederik |last3=Santus |first3=Enrico |last4=Barzilay |first4=Regina |last5=Cao |first5=Yuan |date=2021 |title=Deciphering Undersegmented Ancient Scripts Using Phonetic Prior |url=https://direct.mit.edu/tacl/article/97780 |journal=Transactions of the Association for Computational Linguistics |language=en 
|volume=9 |pages=69–81 |doi=10.1162/tacl_a_00354 |issn=2307-387X|arxiv=2010.11054 }}</ref>  
 
Historically speaking, decipherments do not come suddenly through single individuals who "crack" ancient scripts. Instead, they emerge from the incremental progress brought about by a broader community of researchers.<ref name=":5" />


Decipherment should not be confused with [[cryptanalysis]], which aims to decipher special written codes or [[cipher]]s used in intentionally concealed secret communication (especially during war). It should also not be confused with determining the meaning of ambiguous text in a known language (interpretation).<ref name=":5">{{Cite book |url=https://books.google.com/books?id=sl_dDVctycgC&pg=PA417 |title=International encyclopedia of linguistics |date=2003 |publisher=Oxford University Press |isbn=978-0-19-513977-8 |editor-last=Frawley |editor-first=William |edition= |location= |pages=420}}</ref>


== Categories ==
== History ==
Gelb and Whiting classify the four situations of an undeciphered language and how difficult decipherment will be in each of them:<ref name=":1">{{Cite journal |last1=Gelb |first1=I. J. |last2=Whiting |first2=R. M. |date=1975 |title=Methods of Decipherment |url=https://www.cambridge.org/core/journals/journal-of-the-royal-asiatic-society/article/abs/methods-of-decipherment1/2DD45E0994C82E315715D106656E5293 |journal=Journal of the Royal Asiatic Society |language=en |volume=107 |issue=2 |pages=95–104 |doi=10.1017/S0035869X00132769 |issn=2051-2066|url-access=subscription }}</ref><ref name=":2">{{Cite journal |last1=Braović |first1=Maja |last2=Krstinić |first2=Damir |last3=Štula |first3=Maja |last4=Ivanda |first4=Antonia |date=2024-06-01 |title=A Systematic Review of Computational Approaches to Deciphering Bronze Age Aegean and Cypriot Scripts |url=https://direct.mit.edu/coli/article/50/2/725/119990/A-Systematic-Review-of-Computational-Approaches-to |journal=Computational Linguistics |language=en |volume=50 |issue=2 |pages=725–779 |doi=10.1162/coli_a_00514 |issn=0891-2017|doi-access=free }}</ref>
Interest in ancient scripts and dead languages began to arise by the [[Renaissance]], if not earlier. Extensive information began to be collected about these scripts in the 16th and 17th centuries, and a typology of writing was established in the 17th century. The first serious decipherments, however, did not take place until the 18th century. In 1754, Swinton and Barthélemy independently deciphered the Aramaic script as represented in [[Palmyrene inscriptions]], the first "dead" language to be deciphered.<ref name=Daniels/>
 
* Type O: known writing and known language. Although decipherment in this case is trivial, useful information can be gleaned when a known language is written in an alphabet other than the one it is commonly written in. Studying the writing of the [[Phoenician language|Phoenician]] or [[Sumerian language|Sumerian]] languages in the [[Greek alphabet]] allows information about pronunciation and vocalization to be gleaned that cannot be obtained when studying the expression of these languages in their normal writing system.
* Type I: unknown writing and known language. Deciphered languages in this category include [[Phoenician language|Phoenician]], [[Ugaritic]], [[Cypriot syllabary|Cypriot]], and [[Linear B]]. In this situation, [[alphabet]]ic systems are the easiest to decipher, followed by [[Syllabary|syllabic]] systems, with [[Logogram|logo-syllabic]] systems being the most difficult.
* Type II: known writing and unknown language. An example is [[Linear A]]. Strictly speaking, this situation is not one of decipherment but of linguistic analysis. Decipherment in this category is considered extremely difficult to achieve on the basis of internal information only.
* Type III: unknown writing and unknown language. Examples include the Archanes script and the Archanes formula, [[Phaistos Disc|Phaistos disk]], [[Cretan hieroglyphs]], and [[Cypro-Minoan syllabary]]. When this situation occurs in an isolated culture and without the availability of outside information, decipherment is typically considered impossible.
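
The four situations amount to a two-by-two classification over whether the writing and the language are known. A minimal sketch of that classification follows; the names <code>Situation</code> and <code>classify</code> are hypothetical and chosen only for illustration, and the type labels follow the list above.

```python
from enum import Enum


class Situation(Enum):
    """The four situations described by Gelb and Whiting (labels as listed above)."""
    TYPE_O = "known writing, known language"
    TYPE_I = "unknown writing, known language"
    TYPE_II = "known writing, unknown language"
    TYPE_III = "unknown writing, unknown language"


def classify(writing_known: bool, language_known: bool) -> Situation:
    """Map the two yes/no questions onto one of the four situations."""
    if writing_known and language_known:
        return Situation.TYPE_O
    if language_known:
        return Situation.TYPE_I
    if writing_known:
        return Situation.TYPE_II
    return Situation.TYPE_III


# Linear A is listed above under "known writing and unknown language".
print(classify(writing_known=True, language_known=False).name)
```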
 
== Methods ==
There is no single recipe or linear method for decipherment; instead, philologists and linguists rely on a set of established [[Heuristic|heuristic devices]]. Broadly, a decipherer needs familiarity with the texts in which the script or language occurs, access to accurate drawings or photographs of these texts, information about their relative chronology, and background information on where the texts were found (their geography, or whether they come from a context such as a funerary monument, etc.).<ref name=":5" />
 
These methods can be divided into approaches utilizing external or internal information.<ref name=":1" />
 
=== External information ===
Many successful decipherments have proceeded from the discovery of external information, a common example being through the use of [[multilingual inscription]]s, such as the [[Rosetta Stone]] (with the same text in three scripts: [[Demotic (Egyptian)|Demotic]], [[Egyptian hieroglyphs|hieroglyphic]], and [[Greek alphabet|Greek]]) that enabled the decipherment of Egyptian hieroglyphic. In principle, multilingual text may be insufficient for a decipherment as translation is not a linear and reversible process, but instead represents an encoding of the message in a different symbolic system. Translating a text from one language into a second, and then from the second language back into the first, rarely reproduces exactly the original writing. Likewise, unless a significant number of words are contained in the multilingual text, limited information can be gleaned from it.<ref name=":1" />
 
=== Internal information ===
Internal approaches are multi-step. One must first establish that the material under study represents real writing, as opposed to a grouping of pictorial representations or a modern-day forgery without further meaning. This is commonly approached with methods from the field of [[grammatology]]. Before attempting to decipher meaning, one can then determine the number of distinct [[grapheme]]s (which, in turn, indicates whether the writing system is alphabetic, syllabic, or logo-syllabic, because such writing systems typically do not overlap in the number of graphemes they use<ref name=":2" />), the direction of writing (left to right, right to left, top to bottom, etc.), and whether individual words are segmented in the script (for example by a space or a special mark). If a repetitive schematic arrangement can be identified, this can help in decipherment. For example, if the last line of a text contains a small number, it can reasonably be guessed to give a date, in which case one of the words means "year" and, sometimes, a royal name also appears. Another case is when the text contains many small numbers, followed by a word, followed by a larger number; here, the word likely means "total" or "sum". After exhausting the information that can be inferred from probable content, one must move to the systematic application of statistical tools. These include methods concerning the frequency of each symbol, the order in which symbols typically appear, whether some symbols appear at the beginning or end of words, etc. In some situations, orthographic features of a script make it difficult if not impossible to decipher specific features (especially without outside information), such as when an alphabet does not express double consonants. Additional, more complex methods also exist. Eventually, the application of such statistical methods becomes exceedingly laborious, at which point computers can be used to apply them automatically.<ref name=":1" />
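
The statistical steps described above (counting distinct graphemes, symbol frequency, symbol order, and position within words) can be sketched in a few lines. This is only an illustration under stated assumptions: the text is assumed to be already segmented into words, each given as a sequence of sign labels; the function names are hypothetical; and the inventory-size thresholds in <code>guess_system</code> are rough illustrative values, not figures taken from the cited sources.

```python
from collections import Counter


def grapheme_stats(words):
    """Collect basic counts from a corpus.

    `words` is a list of words, each a sequence of sign labels
    (assumed to be already segmented and transcribed).
    """
    freq = Counter(sign for w in words for sign in w)       # overall sign frequency
    initial = Counter(w[0] for w in words if w)              # signs that start words
    final = Counter(w[-1] for w in words if w)               # signs that end words
    bigrams = Counter((a, b) for w in words for a, b in zip(w, w[1:]))  # adjacent pairs
    return freq, initial, final, bigrams


def guess_system(n_signs, alphabet_max=40, syllabary_max=120):
    """Guess the type of writing system from the size of the sign inventory.

    The two thresholds are illustrative assumptions only.
    """
    if n_signs <= alphabet_max:
        return "alphabetic"
    if n_signs <= syllabary_max:
        return "syllabic"
    return "logo-syllabic"


# Toy corpus with made-up sign labels.
corpus = [("A", "B", "A"), ("B", "C")]
freq, initial, final, bigrams = grapheme_stats(corpus)
print(len(freq), guess_system(len(freq)))
```

With a real corpus, the positional counts would be compared across signs to look for markers that occur mainly at word boundaries.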
 
=== Computational approaches ===
Computational approaches towards the decipherment of unknown languages began to appear in the late 1990s.<ref>{{Cite journal |last1=Knight |first1=Kevin |last2=Yamada |first2=Kenji |date=1999 |title=A Computational Approach to Deciphering Unknown Scripts |url=https://aclanthology.org/W99-0906.pdf |journal=Unsupervised Learning in Natural Language Processing}}</ref> Typically, there are two types of computational approaches used in language decipherment: approaches meant to produce translations in known languages, and approaches used to detect new information that might enable future efforts at translation. The second type is more common, and includes tasks such as the detection of cognates or related words, discovery of the closest known language, and word alignment.<ref name=":2" />
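
As a rough illustration of the cognate-detection task mentioned above, the sketch below ranks words from a known-language lexicon by normalized edit distance to a word from the undeciphered text. It assumes both sides have already been transliterated into a shared set of symbols, which is a strong simplification; the function names, the threshold <code>max_ratio</code>, and the toy strings are hypothetical, and the published systems cited in this article use considerably more elaborate models.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def cognate_candidates(word, lexicon, max_ratio=0.5):
    """Return lexicon entries whose normalized distance to `word` is small.

    `max_ratio` is an illustrative cut-off, not an established value.
    """
    scored = []
    for cand in lexicon:
        ratio = edit_distance(word, cand) / max(len(word), len(cand), 1)
        if ratio <= max_ratio:
            scored.append((ratio, cand))
    return [cand for _, cand in sorted(scored)]


# Toy transliterated strings, used only to show the ranking.
print(cognate_candidates("mlk", ["malk", "bayt", "mlk"]))
```

Ranking each unknown word against lexicons from several known languages, and comparing how many close matches each lexicon yields, is one simple way to approach the related question of which known language is closest.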
 
=== Artificial intelligence ===
In recent years, there has been a growing emphasis on methods utilizing [[artificial intelligence]] for the decipherment of lost languages, especially through [[natural language processing]] (NLP) methods. Proof-of-concept methods have independently re-deciphered [[Ugaritic alphabet|Ugaritic]] and [[Linear B]] using data from similar languages, in this case [[Hebrew alphabet|Hebrew]] and [[Ancient Greek]].<ref>{{Cite book |last1=Luo |first1=Jiaming |last2=Cao |first2=Yuan |last3=Barzilay |first3=Regina |title=Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics |date=2019 |chapter=Neural Decipherment via Minimum-Cost Flow: From Ugaritic to Linear B |language=en |publisher=Association for Computational Linguistics |pages=3146–3155 |doi=10.18653/v1/P19-1303|arxiv=1906.06718 }}</ref>
 
== Deciphering pronunciation ==
Related to attempts to decipher the meaning of languages and alphabets are attempts to determine how extinct languages, or older stages of living languages (such as English in the 1600s), were pronounced. Several methods and criteria have been developed in this regard. Important criteria include (1) rhymes and the testimony of poetry, (2) evidence from occasional spellings and misspellings, (3) interpretations of material in one language by authors writing in foreign languages, (4) information obtained from related languages, and (5) grammatical changes in spelling over time.<ref name=":3">{{Cite book |last=Campbell |first=Lyle |title=Historical linguistics: an introduction |date=2021 |publisher=MIT Press |isbn=978-0-262-53159-7 |edition=4th |location= |pages=372–375}}</ref>
 
For example, analysis of poetry focuses on wordplay or literary techniques that depend on words having a similar sound. [[William Shakespeare|Shakespeare]]'s play ''[[Romeo and Juliet]]'' contains wordplay that relies on the similar sound of the words "soul" and "soles", allowing confidence that the similar pronunciation of the two words today also existed in Shakespeare's time. Another common source of information on pronunciation is the use of [[rhyme]] in earlier texts, such as when consecutive lines of poetry end in similar or identical sounds. This method has some limitations, however: texts may use rhymes that rely on visual similarities between words (such as 'love' and 'remove') rather than auditory similarities, and rhymes can be [[Perfect and imperfect rhymes|imperfect]]. Another source of information about pronunciation is explicit description of pronunciation in earlier texts, as in the ''Grammatica Anglicana'', which includes the following comment about the letter <o>: "In the long time it naturally soundeth sharp, and high; as in chósen, hósen, hóly, fólly [. . .] In the short time more flat, and a kin to u; as còsen, dòsen, mòther, bròther, lòve, pròve".<ref>{{Cite book |last1=Burridge |first1=Kate |title=Understanding language change |last2=Bergs |first2=Alexander |date=2017 |publisher=Routledge, Taylor & Francis Group |isbn=978-0-415-71339-9 |series=Understanding language series |location=London New York |pages=234–235}}</ref> Another example comes from detailed comments on the pronunciation of [[Sanskrit]] in the surviving works of Sanskrit grammarians.<ref name=":3" />
 
== Challenges ==
Many challenges can arise in the decipherment of languages:<ref name=":0" /><ref name=":2" />
 
* When it is not known which known language is closest to the undeciphered one.
* When the words in the script are not clearly segmented, as in some [[Iberian language]]s.
* When the writing system is not known. Specifically, if there is little certainty about the number of graphemes in a writing system, it cannot be determined whether that system is an alphabet, a syllabary, a logosyllabary, or something else.
* When the reading direction is not known. For example, it may not be clear if a writing system is meant to be read from left to right, or from right to left.
* When it is not known if a script uses punctuation or spaces between words.
* When the language of a script subject to decipherment efforts is not known.
* When there is a small dataset available to learn about the properties of a script. This could lead to issues such as an incomplete vocabulary being known for the script.
* When the typical order between subjects, objects, and verbs is not known.
* When it is not known whether or how certain words can change their form.
* When it is not known whether multiple symbols are used to represent the same sound, syllable, word, concept, or idea (allographs).
* When it is not clear how the handwriting or style of one scribe relates to that of another scribe working on the same text (the same letters or words might be written in ways that look different), in which case it is difficult to correlate information across multiple examples of the writing system.
* When it is not known if certain words change their meaning depending on the context they appear in (homonyms).
* When the context of discovery of a text is not known. The location in which a writing system was found can provide valuable information about its relationship to known languages.
* When adequate digital datasets for documented writing systems are not available, limiting the ability to use computational methods for decipherment.
* When sufficient hardware resources, such as [[High-performance computing|high-performance computing]], are not available (which might be necessary for more energy-intensive computational methods).


== Relationship to cryptanalysis ==
Between 1787 and 1791, [[Silvestre de Sacy]] deciphered the [[Pahlavi scripts]], which were used in [[Ancient Persia]] to write the [[Middle Iranian]] language of the [[Sasanian Empire|Sasanian empire]]. Both decipherments relied on bilingual texts where Greek was included as the second script. It was also in the 18th century that the methodological framework for deciphering scripts and languages began to be established. For example, in 1714, [[Leibniz]] proposed that parallel content in bilingual inscriptions could be identified by correlating the positions of personal names in both inscriptions.
Decipherment overlaps with another technical field known as [[cryptanalysis]], which aims to decipher writings used in secret communication, known as [[ciphertext]]. A famous case was the [[cryptanalysis of the Enigma]] during [[World War II]]. Many other ciphers from past wars have only recently been cracked.<ref>{{Cite journal |last=Bauer |first=Craig P. |date=2023-03-04 |title=The new golden age of decipherment |url=https://www.tandfonline.com/doi/full/10.1080/01611194.2023.2170158 |journal=Cryptologia |language=en |volume=47 |issue=2 |pages=97–100 |doi=10.1080/01611194.2023.2170158 |issn=0161-1194}}</ref> Unlike in language decipherment, however, actors using ciphertext intentionally lay obstacles to prevent outsiders from uncovering the meaning of the communication system.<ref name=":1" />


== History ==
By the 19th century, the prerequisites for decipherment were becoming widely available. These included extensive knowledge about the scripts themselves, adequate editions of known texts in those scripts, philological skills, and the ability to reconstruct linguistic forms from the limited available evidence. The 19th century saw two major successes in decipherment: that of [[Decipherment of ancient Egyptian scripts|Egyptian hieroglyphic]] and [[Decipherment of cuneiform|cuneiform]].<ref name=":5" />
Interest in ancient scripts and dead languages began to arise by the [[Renaissance]], if not earlier. Extensive information began to be collected about these scripts in the 16th and 17th centuries, and a typology of writing was established in the 17th century. The first serious decipherments, however, did not take place until the 18th century. In 1754, Swinton and Barthélemy independently deciphered the Aramaic script as represented in Palmyrene inscriptions from the 3rd century AD. In 1787, [[Silvestre de Sacy]] deciphered the [[Sasanian script]], which was the script used in [[Ancient Persia]] to write down the [[Middle Iranian]] language used in the [[Sasanian Empire|Sasanian empire]]. Both decipherments relied on bilingual texts where Greek was included as the second script. It was also in the 18th century when the methodological framework for deciphering scripts and languages began to be established. For example, in 1714, [[Leibniz]] advocated that parallel content in bilingual inscriptions could be specified by correlating where personal names occur in both inscriptions. By the 19th century, the prerequisites for decipherment began to become widely available. These included extensive knowledge about the scripts themselves, adequate editions of known texts from that script, philological skills, and the ability to reconstruct linguistic forms from the limited available evidence. The 19th century saw two major successes in decipherment: that of [[Decipherment of ancient Egyptian scripts|Egyptian hieroglyphic]] and [[Decipherment of cuneiform|cuneiform]].<ref name=":5" />


==Notable decipherers==
===Timeline of decipherments===
{| class="wikitable sortable"
!Script deciphered
!Name of scholar
!Date
|-
|[[Staveless runes|Staveless Runes]] (disputed as "decipherment")<ref name=Looijenga>{{cite book |last=Looijenga |first=Tineke |chapter=How the runes were lost and won... |editor-last=Moncunill Martí |editor-first=Noemí |editor2-last=Ramírez Sánchez |editor2-first=Manuel |title=Aprender la escritura, olvidar la escritura: Nuevas perspectivas sobre la historia de la escritura en el Occidente romano |year=2021 |pages=390 |isbn=978-84-1319-317-5|quote= Knowledge of runes did not get lost —it was not necessary to decipher runes, since the use of runes went on in Scandinavia until in the 16th century scholars started to study them (for a short history of runic research see Looijenga 2003, 2-5, and Barnes 2012: 197-212). Knowledge and use of runes appear to be strongest in Sweden. In [[Dalecarlian runes|some remote parts of the land]] runes lived on until the 19th century.}}</ref>
|[[Magnus Celsius]]
|1674
|-
|[[Cipher runes]] (disputed as "decipherment")<ref name=Looijenga/>
|[[Jón Ólafsson of Grunnavík]]
|1740s
|-
|[[Palmyrene alphabet]] (described as the first "dead" language decipherment)<ref name=Daniels>{{cite journal | last=Daniels | first=Peter T. | authorlink= Peter T. Daniels |title= "Shewing of Hard Sentences and Dissolving of Doubts": The First Decipherment | journal=Journal of the American Oriental Society | publisher=American Oriental Society | volume=108 | issue=3 | year=1988 | issn=00030279 | jstor=603863 | pages=419–436 | url=http://www.jstor.org/stable/603863 | access-date=2026-03-14|quote= The first dead language to be recovered when its script was deciphered was not Egyptian (as might be supposed from popular and most technical accounts of decipherment), but Palmyrene; the year was 1754, and the scholar was Jean-Jacques Barthélemy}}</ref>
|[[Jean-Jacques Barthélemy]]
|1754
|-
|[[Phoenician alphabet]]
|[[Jean-Jacques Barthélemy]]
|1758
|-
|[[Pahlavi script]]
|[[Antoine-Isaac Silvestre de Sacy]]
|1791
|-
|[[Demotic Egyptian script|Demotic script]]
|[[Thomas Young (scientist)|Thomas Young]]
|1816
|-
|[[Egyptian hieroglyphs|Egyptian Hieroglyphs]] ([[Decipherment of ancient Egyptian scripts|Decipherment]])
|[[Jean-François Champollion]]
|1822
|-
|[[Old Persian Cuneiform]] ([[Decipherment of cuneiform|Decipherment]])
|[[Georg Friedrich Grotefend]], [[Eugène Burnouf]], and [[Sir Henry Rawlinson, 1st Baronet|Henry Rawlinson]]
|1823
|-
|[[Brahmi]], [[Kharosthi]]
|[[James Prinsep]]
|1837
|-
|[[Nabataean script]]
|[[Eduard Friedrich Ferdinand Beer]]
|1840
|-
|[[Libyco-Berber]] script (almost fully)
|[[Louis Félicien de Saulcy]]
|1843
|-
|Mesopotamian [[Cuneiform]]
|[[Edward Hincks]]
|1857
|-
|[[Cypriot syllabary]]
|[[George Smith (assyriologist)|George Smith]] and [[Samuel Birch (Egyptologist)|Samuel Birch]], et al.<ref>{{Cite web|url=http://lila.sns.it/mnamon/index.php?page=Scrittura&id=4&lang=en|title=Cypro-Syllabic}}</ref>
|1871
|-
|[[Old Turkic]]
|[[Vilhelm Thomsen]]
|1893
|-
|[[Oracle bone script|Oracle Bone script]]
|[[Wang Yirong|Wáng Yìróng]], [[Liu E (writer)|Liú È]], [[Sun Yirang|Sūn Yíràng]], et al.
|1899
|-
|[[Safaitic|Safaitic script]]
|[[Enno Littmann]]<ref name=":12">{{Cite book |last1=Al-Jallad |first1=Ahmad |url=https://books.google.com/books?id=RI8cEAAAQBAJ&pg=PA1 |title=A Dictionary of the Safaitic Inscriptions |last2=Jaworska |first2=Karolina |date=2019 |publisher=Brill |isbn=978-90-04-40042-9 |series= |location= |pages=3}}</ref>
|1901
|-
|[[Hittite cuneiform|Hittite Cuneiform]]
|[[Bedřich Hrozný]]
|1915
|-
|[[Northeastern Iberian script]]
|[[Manuel Gómez-Moreno Martínez|Manuel Gómez-Moreno]]
|1922
|-
|[[Ugaritic alphabet]]
|[[Hans Bauer (semitist)|Hans Bauer]] and [[Édouard Paul Dhorme]]<ref>"Anatomy of a Decipherment", http://images.library.wisc.edu/WI/EFacs/transactions/WT1966/reference/wi.wt1966.adcorre.pdf {{Webarchive|url=https://web.archive.org/web/20201004170010/http://images.library.wisc.edu/WI/EFacs/transactions/WT1966/reference/wi.wt1966.adcorre.pdf |date=2020-10-04 }}</ref>
|1930
|-
|[[Tangut script]]
|[[Aleksei Ivanovich Ivanov]], [[Nikolai Aleksandrovich Nevsky]], et al.
|1930s
|-
|[[Linear B]]
|[[Michael Ventris]], [[John Chadwick]], and [[Alice Kober]]
|1952
|-
|[[Maya script|Maya]]
|[[Yuri Knorozov]] and [[Tatiana Proskouriakoff]], et al.
|1950s
|-
|"Enlarged opening script" of [[Ravenna]] (variant of the [[Latin alphabet]])
|[[Jan-Olof Tjäder]]
|1955
|-
|[[Caucasian Albanian alphabet]]
|[[Zaza Alexidze]]
|2001
|-
|[[Linear Elamite]]
|[[François Desset]]<ref name=":4">{{Cite journal |last1=Desset |first1=François |last2=Tabibzadeh |first2=Kambiz |last3=Kervran |first3=Matthieu |last4=Basello |first4=Gian Pietro |last5=Marchesi |first5=and Gianni |date=2022-07-01 |title=The Decipherment of Linear Elamite Writing |url=https://www.degruyter.com/document/doi/10.1515/za-2022-0003/html |journal=Zeitschrift für Assyriologie und vorderasiatische Archäologie |language=en |volume=112 |issue=1 |pages=11–60 |doi=10.1515/za-2022-0003 |issn=1613-1150}}</ref>
|2022
|}
==See also==
{{Portal|Language|Linguistics}}
===Deciphered scripts===
* [[Cuneiform]]
* [[Egyptian hieroglyphs]]
* [[Kharoshthi script|Kharoshthi]]
* [[Linear B script|Linear B]]
* [[Maya script|Mayan]]
* [[Staveless Runes]]
* [[Cypriot syllabary|Cypriot Syllabary]]


===Undeciphered scripts===
Line 188: Line 134:
* [[Espanca script|Espanca]]
* [[Espanca script|Espanca]]
* [[Numidian language]]{{efn|Although the script, [[Libyco-Berber]], has been almost fully deciphered, the language has not.}}
* Unnamed languages
** [[Phaistos Disc]]
** [[Rohonc Codex]]
** [[Voynich Manuscript]]


== Categories ==
Gelb and Whiting distinguish four situations in which an undeciphered language may be found, which differ in how difficult decipherment is:<ref name=":1">{{Cite journal |last1=Gelb |first1=I. J. |last2=Whiting |first2=R. M. |date=1975 |title=Methods of Decipherment |url=https://www.cambridge.org/core/journals/journal-of-the-royal-asiatic-society/article/abs/methods-of-decipherment1/2DD45E0994C82E315715D106656E5293 |journal=Journal of the Royal Asiatic Society |language=en |volume=107 |issue=2 |pages=95–104 |doi=10.1017/S0035869X00132769 |issn=2051-2066|url-access=subscription }}</ref><ref name=":2">{{Cite journal |last1=Braović |first1=Maja |last2=Krstinić |first2=Damir |last3=Štula |first3=Maja |last4=Ivanda |first4=Antonia |date=2024-06-01 |title=A Systematic Review of Computational Approaches to Deciphering Bronze Age Aegean and Cypriot Scripts |url=https://direct.mit.edu/coli/article/50/2/725/119990/A-Systematic-Review-of-Computational-Approaches-to |journal=Computational Linguistics |language=en |volume=50 |issue=2 |pages=725–779 |doi=10.1162/coli_a_00514 |issn=0891-2017|doi-access=free }}</ref>

* Type 0: known writing and known language. Although decipherment in this case is trivial, useful information can be obtained when a known language is written in a script other than the one it is commonly written in. For example, [[Phoenician language|Phoenician]] or [[Sumerian language|Sumerian]] written in the [[Greek alphabet]] reveals information about pronunciation and vocalization that cannot be obtained from these languages' usual writing systems.
* Type I: unknown writing and known language. Scripts deciphered in this category include [[Phoenician language|Phoenician]], [[Ugaritic]], [[Cypriot syllabary|Cypriot]], and [[Linear B]]. In this situation, [[alphabet]]ic systems are the easiest to decipher, followed by [[Syllabary|syllabic]] systems; [[Logogram|logo-syllabic]] systems are the most difficult.
* Type II: known writing and unknown language. An example is [[Linear A]]. Strictly speaking, this situation is not one of decipherment but of linguistic analysis. Decipherment in this category is considered extremely difficult to achieve on the basis of internal information only.
* Type III: unknown writing and unknown language. Examples include the Archanes script and the Archanes formula, the [[Phaistos Disc]], [[Cretan hieroglyphs]], and the [[Cypro-Minoan syllabary]]. When this situation occurs in an isolated culture and without the availability of outside information, decipherment is typically considered impossible.
 
== Methods ==
There is no single recipe or linear method for decipherment; instead, philologists and linguists rely on a set of established [[Heuristic|heuristic devices]]. Broadly, a decipherer needs familiarity with the texts in which the script or language occurs, access to accurate drawings or photographs of these texts, information about their relative chronology, and background information on the contexts in which the texts were found (their geography, whether they belong to a funerary monument, etc.).<ref name=":5" />
 
These methods can be divided into approaches utilizing external or internal information.<ref name=":1" />
 
=== External information ===
Many successful decipherments have proceeded from the discovery of external information. A common example is the use of [[multilingual inscription]]s such as the [[Rosetta Stone]] (which carries the same text in three scripts: [[Demotic Egyptian script|Demotic]], [[Egyptian hieroglyphs|hieroglyphic]], and [[Greek alphabet|Greek]]), which enabled the decipherment of Egyptian hieroglyphs. In principle, however, a multilingual text may be insufficient for a decipherment, because translation is not a linear, reversible process but an encoding of the message in a different symbolic system: translating a text into a second language and then back into the first rarely reproduces the original wording exactly. Likewise, unless the multilingual text contains a significant number of words, only limited information can be gleaned from it.<ref name=":1" />
 
=== Internal information ===
Internal approaches proceed in several steps. One must first establish that the material under study is real writing, as opposed to a grouping of pictorial representations or a modern forgery without further meaning. This is commonly approached with methods from the field of [[grammatology]]. Before attempting to decipher meaning, one can then determine the number of distinct [[grapheme]]s (which, in turn, indicates whether the writing system is alphabetic, syllabic, or logo-syllabic, because such writing systems typically do not overlap in the number of graphemes they use<ref name=":2" />), the direction of writing (left to right, right to left, top to bottom, etc.), and whether individual words are separated in writing (such as by a space or a special mark). If a repetitive schematic arrangement can be identified, this can help in decipherment. For example, if the last line of a text contains a small number, it can reasonably be guessed to give the date, in which case one of the words means "year" and, sometimes, a royal name also appears. Another case is when the text contains many small numbers, followed by a word, followed by a larger number; here, the word likely means "total" or "sum". Once the information that can be inferred from probable content has been exhausted, one must turn to the systematic application of statistical tools. These include methods concerning the frequency of each symbol, the order in which symbols typically appear, whether some symbols appear at the beginning or end of words, etc. In some situations, orthographic features of a language make specific features difficult if not impossible to decipher (especially without outside information), such as when an alphabet does not express double consonants. Additional, more complex methods also exist. Eventually, applying such statistical methods by hand becomes exceedingly laborious, at which point computers can be used to apply them automatically.<ref name=":1" />
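
The statistical steps described above (counting distinct graphemes, recording which symbols open or close words, and tallying symbol order) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the corpus is assumed to be already segmented into words and transcribed as sign labels, the function names are hypothetical, and the inventory-size thresholds are rough placeholders rather than values taken from the cited sources.

```python
from collections import Counter


def script_profile(words):
    """Summarise a corpus given as a list of words, each a list of sign labels."""
    signs = Counter(sign for word in words for sign in word)
    initial = Counter(word[0] for word in words if word)
    final = Counter(word[-1] for word in words if word)
    # Adjacent sign pairs within a word capture the typical order of symbols.
    bigrams = Counter(pair for word in words for pair in zip(word, word[1:]))
    return {
        "inventory": len(signs),  # number of distinct graphemes
        "signs": signs,           # overall frequency of each sign
        "initial": initial,       # how often each sign begins a word
        "final": final,           # how often each sign ends a word
        "bigrams": bigrams,
    }


def guess_script_type(inventory_size, alphabet_max=40, syllabary_max=120):
    """Guess the script type from the size of the sign inventory.

    The two thresholds are illustrative assumptions, not established values.
    """
    if inventory_size <= alphabet_max:
        return "alphabetic"
    if inventory_size <= syllabary_max:
        return "syllabic"
    return "logo-syllabic"


# Toy corpus with invented sign labels, not real inscription data.
toy_corpus = [["A", "B", "A"], ["B", "C"], ["A", "C"]]
profile = script_profile(toy_corpus)
print(profile["inventory"], guess_script_type(profile["inventory"]))
```

On a real corpus the counts would be far larger and the guess far less clear-cut, since damaged signs and unrecognised variant forms of the same sign can inflate the apparent inventory.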
 
=== Computational approaches ===
Computational approaches to the decipherment of unknown languages began to appear in the late 1990s.<ref>{{Cite journal |last1=Knight |first1=Kevin |last2=Yamada |first2=Kenji |date=1999 |title=A Computational Approach to Deciphering Unknown Scripts |url=https://aclanthology.org/W99-0906.pdf |journal=Unsupervised Learning in Natural Language Processing}}</ref> Typically, two types of computational approaches are used in language decipherment: approaches meant to produce translations into known languages, and approaches used to detect new information that might enable future efforts at translation. The second type is more common, and includes tasks such as the detection of cognates or related words, discovery of the closest known language, and word alignment.<ref name=":2" />
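
As a rough illustration of the second type of approach, the sketch below ranks candidate known languages by how closely their word lists match a set of transliterated words from the undeciphered text. It uses plain edit distance as the similarity measure, which is an assumption made for brevity and not the method of the cited studies. The function names, language labels, and word lists are hypothetical toy data.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def closest_distance(word, lexicon):
    """Smallest length-normalised distance from word to any lexicon entry."""
    return min(edit_distance(word, cand) / max(len(word), len(cand), 1)
               for cand in lexicon)


def rank_languages(words, lexicons):
    """Return (language, mean closest distance) pairs, best match first."""
    scores = {
        lang: sum(closest_distance(w, lex) for w in words) / len(words)
        for lang, lex in lexicons.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])


# Toy data: invented transliterations and two tiny candidate word lists.
unknown_words = ["mlk", "bt"]
candidate_lexicons = {
    "lang_x": ["melek", "bayit"],
    "lang_y": ["rex", "domus"],
}
for language, score in rank_languages(unknown_words, candidate_lexicons):
    print(language, round(score, 2))
```

A low score only flags a candidate worth closer study. Surface similarity alone cannot separate true cognates from chance resemblances or loanwords.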
 
=== Artificial intelligence ===
In recent years, there has been a growing emphasis on methods utilizing [[artificial intelligence]] for the decipherment of lost languages, especially through [[natural language processing]] (NLP) methods. Proof-of-concept methods have independently re-deciphered [[Ugaritic alphabet|Ugaritic]] and [[Linear B]] using data from similar languages, in this case [[Hebrew alphabet|Hebrew]] and [[Ancient Greek]].<ref>{{Cite book |last1=Luo |first1=Jiaming |last2=Cao |first2=Yuan |last3=Barzilay |first3=Regina |title=Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics |date=2019 |chapter=Neural Decipherment via Minimum-Cost Flow: From Ugaritic to Linear B |language=en |publisher=Association for Computational Linguistics |pages=3146–3155 |doi=10.18653/v1/P19-1303|arxiv=1906.06718 }}</ref>
 
== Deciphering pronunciation ==
Related to the decipherment of the meaning of languages and alphabets are attempts to determine how extinct languages, or older stages of living languages (such as English in the 1600s), were pronounced. Several methods and criteria have been developed in this regard. Important criteria include (1) rhymes and the testimony of poetry; (2) evidence from occasional spellings and misspellings; (3) interpretations of material in one language by authors writing in foreign languages; (4) information obtained from related languages; and (5) grammatical changes in spelling over time.<ref name=":3">{{Cite book |last=Campbell |first=Lyle |title=Historical linguistics: an introduction |date=2021 |publisher=MIT Press |isbn=978-0-262-53159-7 |edition=4th |location= |pages=372–375}}</ref>
 
For example, analysis of poetry focuses on wordplay or literary techniques that depend on words having a similar sound. [[William Shakespeare|Shakespeare]]'s play ''[[Romeo and Juliet]]'' contains wordplay that relies on a similar sound between the words "soul" and "soles", allowing confidence that the similar pronunciation of the two words today also existed in Shakespeare's time. Another common source of information on pronunciation is the use of [[rhyme]] in earlier texts, such as when consecutive lines of poetry end in similar or identical sounds. This method has limitations, however: texts may use rhymes that rely on visual rather than auditory similarity between words (such as 'love' and 'remove'), and rhymes can be [[Perfect and imperfect rhymes|imperfect]]. A further source of information is explicit description of pronunciation in earlier texts, as in the ''Grammatica Anglicana'', which makes the following comment about the letter <o>: "In the long time it naturally soundeth sharp, and high; as in chósen, hósen, hóly, fólly [. . .] In the short time more flat, and a kin to u; as còsen, dòsen, mòther, bròther, lòve, pròve".<ref>{{Cite book |last1=Burridge |first1=Kate |title=Understanding language change |last2=Bergs |first2=Alexander |date=2017 |publisher=Routledge, Taylor & Francis Group |isbn=978-0-415-71339-9 |series=Understanding language series |location=London New York |pages=234–235}}</ref> Another example comes from detailed comments on the pronunciation of [[Sanskrit]] in the surviving works of Sanskrit grammarians.<ref name=":3" />
 
== Challenges ==
Many challenges exist in the decipherment of languages, including the following situations:<ref name=":0" /><ref name=":2" />
 
* When it is not known which known language is closest to the language being deciphered.
* When the words in the script are not clearly segmented, as in some [[Iberian language]]s.
* When the writing system is not known. Specifically, if there is little certainty about the number of graphemes in a writing system, it cannot be determined whether that system is an alphabet, a syllabary, a logosyllabary, or something else.
* When the reading direction is not known. For example, it may not be clear if a writing system is meant to be read from left to right, or from right to left.
* When it is not known if a script uses punctuation or spaces between words.
* When the language of a script subject to decipherment efforts is not known.
* When there is a small dataset available to learn about the properties of a script. This could lead to issues such as an incomplete vocabulary being known for the script.
* When the typical order between subjects, objects, and verbs is not known.
* When it is not known whether or how certain words can change their form.
* When it is not known whether multiple symbols are used to represent the same sound, syllable, word, concept, or idea (allographs).
* When it is not clear how the penmanship or the style of writing of a particular scribe relates to the style of writing of another scribe working in the same text (the same letters or words might be written in a way that looks different), in which case it is difficult to correlate information across multiple examples of the use of the writing system.
* When it is not known if certain words change their meaning depending on the context they appear in (homonyms).
* When the context in which a piece of writing was discovered is not known. The location from which a writing system comes can provide valuable information about its relationship to known languages.
* When adequate digital datasets for documented writing systems are not available, limiting the ability to use computational methods for decipherment.
* When sufficient hardware resources, such as [[High-performance computing|high-performance computing]], are not available (which might be necessary for more computationally intensive methods).
 
== Relationship to cryptanalysis ==
Decipherment overlaps with another technical field, [[cryptanalysis]], which aims to decipher writings used in secret communication, known as [[ciphertext]]. A famous case is the [[cryptanalysis of the Enigma]] during [[World War II]]. Many other ciphers from past wars have only recently been cracked.<ref>{{Cite journal |last=Bauer |first=Craig P. |date=2023-03-04 |title=The new golden age of decipherment |url=https://www.tandfonline.com/doi/full/10.1080/01611194.2023.2170158 |journal=Cryptologia |language=en |volume=47 |issue=2 |pages=97–100 |doi=10.1080/01611194.2023.2170158 |issn=0161-1194}}</ref> Unlike in language decipherment, however, actors using ciphertext intentionally lay obstacles to prevent outsiders from uncovering the meaning of the communication system.<ref name=":1" />
 


==References==


== Further reading ==
* {{Cite book |last=Daniels |first=Peter T. |title=A Companion to Ancient Near Eastern Languages |date=2020 |publisher=Wiley |editor-last=Hasselbach-Andee |editor-first=Rebecca |pages=1–25 |chapter=The Decipherment of Ancient Near Eastern Languages}}
* {{Cite journal |last1=Ferrara |first1=Silvia |last2=Tamburini |first2=Fabio |date=2022 |title=Advanced techniques for the decipherment of ancient scripts |url=https://www.rivisteweb.it/doi/10.1418/105964 |journal=Lingue e Linguaggio |issue=2 |pages=239–259|doi=10.1418/105964 }}