Experimental Transformer-based AI models written in Java on top of the DeepLearning4J framework. The repository is located here.
This project is experimental work in the fields of Artificial Intelligence and Natural Language Processing (NLP). It aims to implement and explore models based on the Transformer architecture, with various modifications intended to improve overall model efficiency. The project is written in Java and uses the DeepLearning4J framework's SameDiff layers as the core of the neural networks behind each of the implemented models.
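As a purely illustrative sketch (not this project's actual code), the scaled dot-product attention at the heart of the Transformer can be expressed directly as a SameDiff graph. The shapes, variable names, and the un-batched single-sequence setup below are assumptions chosen for brevity:

```java
import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.api.buffer.DataType;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

import java.util.HashMap;
import java.util.Map;

public class AttentionSketch {
    public static void main(String[] args) {
        int seqLen = 4, dModel = 8;  // illustrative sizes
        SameDiff sd = SameDiff.create();

        // Placeholders for a single (un-batched) sequence: [seqLen, dModel]
        SDVariable q = sd.placeHolder("q", DataType.FLOAT, seqLen, dModel);
        SDVariable k = sd.placeHolder("k", DataType.FLOAT, seqLen, dModel);
        SDVariable v = sd.placeHolder("v", DataType.FLOAT, seqLen, dModel);

        // Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V
        SDVariable scores = sd.mmul(q, sd.transpose(k)).div(Math.sqrt(dModel));
        SDVariable weights = sd.nn().softmax("weights", scores, 1);
        SDVariable out = sd.mmul(weights, v);

        // Run the graph once on random data (self-attention: Q = K = V)
        INDArray x = Nd4j.rand(DataType.FLOAT, seqLen, dModel);
        Map<String, INDArray> placeholders = new HashMap<>();
        placeholders.put("q", x);
        placeholders.put("k", x);
        placeholders.put("v", x);
        System.out.println(sd.output(placeholders, out.name()).get(out.name()));
    }
}
```

SameDiff builds the computation graph declaratively and lets DL4J differentiate and execute it, which is what makes it a natural fit for custom Transformer-style layers.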
The project includes the following features:
Please note that this project is still a work in progress. While the core functionality is implemented, development and improvements are ongoing. As a result, the documentation is not yet complete, and some parts of the project may lack detailed explanations. There are also currently no unit or integration tests, due to limited resources.
To use the project, follow these steps:
This README file is the entry point and provides detailed information on how to run model training and/or inference.
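As a hedged illustration of what the inference side might look like, a trained SameDiff graph can be saved and reloaded via DL4J's built-in serialization. The file name, input shape, and the placeholder/output names below are hypothetical and depend on how the actual graph was built:

```java
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.api.buffer.DataType;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

import java.io.File;
import java.util.Collections;

public class InferenceSketch {
    public static void main(String[] args) {
        // Hypothetical path to a previously trained and saved SameDiff model
        File modelFile = new File("model.sd");
        SameDiff sd = SameDiff.load(modelFile, false);  // false: skip updater state for inference

        // Hypothetical placeholder/output names; the real names depend on the saved graph
        INDArray input = Nd4j.rand(DataType.FLOAT, 1, 16);
        INDArray prediction = sd.output(
                Collections.singletonMap("input", input), "output").get("output");
        System.out.println(prediction);
    }
}
```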
Contributions to the project are welcome. If you find any issues, have suggestions for improvements, or would like to add new features, please feel free to submit a pull request. However, since the project is a work in progress, please ensure that your contributions align with the project's direction and goals. To avoid formatting issues, please use the code style formatter config in the code_style folder.
As mentioned earlier, the documentation for this project is still under development. While some information and usage instructions are provided, please be aware that not all aspects of the project are thoroughly documented at this stage.
The code currently contains relatively few comments. However, efforts are being made to improve readability and add comments to aid understanding.
The project is released under the Apache 2.0 License. Feel free to modify and distribute the code within the terms of the license.
This project is inspired by the original “Attention Is All You Need” paper and is built upon the DeepLearning4J framework.
I would like to express my gratitude to the authors of the original “Attention Is All You Need” paper, as well as to the creators and contributors of the DeepLearning4J framework.
For any inquiries or additional information regarding the project, please contact me via e-mail.
If you notice any bugs or have enhancement suggestions, please create an issue. I check them on a regular basis and will try to respond as soon as possible.