Project Anuvaad has been conceptualised to provide translation capabilities for Indic languages. Project Anuvaad is open sourced under the MIT license and is funded by the EkStep Foundation.
What we do?
The goals of Project Anuvaad include:
Train deep learning models for Indic languages using the project's parallel corpora and other publicly available corpora. Our goal is to have high-quality Neural Machine Translation (NMT) models for all major Indian languages. As of May 2020, we have models for nine Indian languages.
Create parallel corpora that can be used to train NMT models, along with tools and utilities to help create such corpora. These corpora may be general or domain-specific. It is the stated goal of the project to create the largest publicly available parallel corpora in Indic languages.
Develop interactive translation tools to help users to obtain “final” translated output.
Develop and maintain open source implementations of OCR tools for Indic scripts, used to pre-process documents in Indic languages.
Why do this?
90% of India's population does not speak English. The Eighth Schedule of the Indian Constitution lists 22 official languages, and the country has 6,000-plus dialects and 55-plus languages with 1 million-plus speakers. Translation is therefore an important national priority.
Machine Translation (and Natural Language Processing (NLP) in general) is a field that has made dramatic progress in the last few years. While the core technology is available as open source, there is no credible open source translation alternative for Indic languages. Project Anuvaad hopes to fill this gap and help us take control of our own languages.
One example of a domain where our technology can impact society is the judicial system. Reducing the time and effort needed to obtain high-quality translations to and from Indian languages can significantly reduce pendencies in the judicial system. Project Anuvaad has assisted the Honorable Supreme Court of India in the launch of SUVAS to help make progress in this matter.
Project Anuvaad is leveraging various open source deep learning projects to build a ready-to-use translation solution, specifically for Indic languages. In addition, the tools and utilities we build will also be open sourced under the MIT license.
We are releasing parallel corpora that have been collected from public sources and converted into a form the NMT tools can use. The first release of parallel corpora will consist of data from legal and official documents. We hope these corpora will be useful for improving the state of the art in NMT tools and solutions.
We believe in openness and transparency. We invite language enthusiasts to experience what we have built.
Why not experience the legal document translation firsthand?
While we prepare Anuvaad for its beta launch, we are currently signing up invite-only users. Please fill in a short form to express your interest in participating, and Team Anuvaad will reach out to you!
We are Team Anuvaad, a home for Indic language translation in the legal domain.