Indus Valley Scripts
Can we use machine learning to decode ancient scripts?
The Indus Valley scripts were found in the early 1900s. They are still waiting to be decyphered. The problem is reminescient to the Egyptian scripts which required the Rosetta Stone. Rather than waiting to find a new rosetta stone for the Indus Valley scripts what if we could build one?
Natural Language Processing
Languages seem to follow a certain structure which allows translating from one language to another. For example, “king” minus “man” plus “woman” equals queen.
This vector relationship exists for all languages. We can even extend this idea. Suppose to use a t-SNE plot to visualize words in English, French, German, and Esperanto. With clever rotations we would find that the seemingly random cloud of dots would map from one language to another. So, we can use this vector approach for a pretty decent word-for-word translation from one language to another.
The idea that languages map so well had lead a group of scientists to decode animal language. The Earth Species Project is one such team and is oriented towards changing out ecological impact.
Screen shot of Earth Species Project mission statement from their website.
If they are so certain of decoding animal language then why can’t we decypher an ancient language?