Short Description
DLLava: a decentralised multimodal large language model that uses blockchain infrastructure to describe images securely and transparently.
Motivation Behind the Project
DLLava began with an ambitious goal: an AI tool to help doctors and patients analyse CT scans. Resource constraints forced us to narrow the scope to a system that describes images. The pivot does not change the core ambition, which is to address a pressing need in the AI space for secure, transparent, and decentralised AI solutions. Traditional large language models operate within centralised frameworks, which are exposed to censorship, data manipulation, and privacy breaches. DLLava seeks to mitigate these concerns by running on blockchain technology, combining multi-modal data analysis with security and transparency.
Cartesi's blockchain infrastructure is a good fit for DLLava because it offers the computational integrity and scalability needed to handle large datasets. It also supports the user privacy and data security goals that motivate the project.
Detailed Description
Tools & Technology Stack:
AI & Machine Learning: PyTorch for model training and inference.
Blockchain: Cartesi Rollups for decentralized computation, Ethereum for smart contracts.
Architecture:
DLLava's architecture is centered around a decentralized application (dApp) running on Cartesi, with a large language model at its core capable of analyzing and describing images.
Users interact with the dApp via a web interface.
Smart contracts manage requests and responses, ensuring integrity and transparency throughout the process.
The AI model runs within Cartesi's Linux environment, allowing complex computations without compromising the blockchain's security.
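The request flow above can be sketched as a small backend handler. This is a minimal illustration under stated assumptions, not DLLava's actual code: it assumes the usual Cartesi Rollups convention that inputs arrive as 0x-prefixed hex payloads and results are emitted as notices, and the names handle_advance, describe_image, and emit_notice are hypothetical. The model call is replaced by a stub so the sketch runs without PyTorch.

```python
import json
from typing import Callable, Dict


def hex_to_bytes(payload: str) -> bytes:
    """Decode a hex string, with or without a leading 0x."""
    return bytes.fromhex(payload[2:] if payload.startswith("0x") else payload)


def bytes_to_hex(raw: bytes) -> str:
    """Encode bytes as a 0x-prefixed hex string."""
    return "0x" + raw.hex()


def describe_image(image: bytes) -> str:
    """Hypothetical stand-in for the multimodal model.

    A real backend would run model inference here; this stub only
    reports the input size so the example stays self-contained.
    """
    return f"image of {len(image)} bytes"


def handle_advance(
    data: Dict[str, str],
    emit_notice: Callable[[str], None],
    describe: Callable[[bytes], str] = describe_image,
) -> str:
    """Process one request and return "accept" or "reject".

    `data["payload"]` is assumed to hold the hex-encoded image.
    `emit_notice` is an assumed callback that publishes the result;
    in a deployed dApp it would forward the payload to the rollup
    server, but here it is injected so the logic can be tested offline.
    """
    try:
        image = hex_to_bytes(data["payload"])
        description = describe(image)
    except (KeyError, ValueError):
        # Missing or malformed payload: reject the input.
        return "reject"
    notice = json.dumps({"description": description}).encode("utf-8")
    emit_notice(bytes_to_hex(notice))
    return "accept"
```

For example, collecting notices in a list with handle_advance({"payload": "0x616263"}, notices.append) accepts the three-byte input and appends one hex-encoded JSON notice. Injecting the notice callback keeps the handler independent of any network code.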
Challenges Faced
Our main challenge was the limited compatibility of Python libraries with the RISC-V architecture. Many of the libraries we intended to use did not support RISC-V out of the box, so we had to compile them for the target architecture ourselves. This process was difficult and time-consuming, and it significantly delayed our development timeline. It also led us to change the project's direction twice. Despite these hurdles, we persevered, adapting the project scope and learning valuable lessons about cross-compilation and about working with emerging technologies like RISC-V in decentralised applications.
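One way to find out early which libraries need this treatment is to check, inside the target environment, whether each module can be located at all. The sketch below is an illustration rather than the tooling we used; the function name check_modules is hypothetical, and the expectation that platform.machine() reports a value such as "riscv64" on a RISC-V Linux system is an assumption.

```python
import importlib.util
import platform
from typing import Dict, Iterable


def check_modules(names: Iterable[str]) -> Dict[str, bool]:
    """Report which modules can be located in the current environment.

    This only checks that Python can find the module. A native
    extension that was built for the wrong architecture can still
    fail when it is actually imported.
    """
    found = {}
    for name in names:
        try:
            found[name] = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised, for example, when a parent package is missing.
            found[name] = False
    return found


def report(names: Iterable[str]) -> str:
    """Build a short text report that includes the machine architecture."""
    lines = [f"architecture: {platform.machine()}"]
    for name, ok in check_modules(names).items():
        lines.append(f"{name}: {'found' if ok else 'missing'}")
    return "\n".join(lines)
```

Calling report(["json", "torch"]) inside the target machine lists the architecture followed by one line per module, which gives a starting list of packages to build from source.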