The decentralized finance (DeFi) sector is transforming traditional financial systems by enabling transparent and efficient financial transactions on the blockchain. Among the prominent protocols, Compound V2 allows users to lend and borrow cryptocurrencies. However, its dynamic supply and borrow interest rates depend heavily on market conditions, which makes predicting them crucial for maximizing returns and minimizing risks. Initially, this project relied on GizaTech's dataset to build predictive models. Now, by leveraging Squid's SDK to gather real-time blockchain transaction data for tokens like WBTC and USDT, the dataset has been significantly enriched, leading to improved prediction performance. This real-time data addresses the need for accurate, timely predictions, giving users a competitive edge in managing DeFi investments.
Squid Rate AI combines advanced machine learning models with enriched blockchain datasets to predict supply and borrow rates on Compound V2. Using GizaTech's Zero-Knowledge Machine Learning (ZKML) framework, the solution enhances privacy while ensuring accurate predictions. By integrating Squid's SDK, Squid Rate AI leverages real-time blockchain data on WBTC and USDT transactions, which enriches the dataset and enables more precise forecasting. These predictions allow users to make informed decisions, optimize their portfolios, and manage risks in the ever-evolving DeFi market. Squid Rate AI offers reliable insights, supporting users with predictive analytics that leverage both historical and real-time data.
Squid Rate AI provides DeFi users with accurate predictions of Compound V2 supply and borrow rates by combining the Compound V2 dataset with real-time blockchain data. Using Squid's SDK to gather token-specific transaction data makes the predictions more relevant and timely. This enriched dataset enhances the precision of the machine learning models, helping users optimize their lending and borrowing strategies. Whether for risk management, portfolio optimization, or algorithmic trading, Squid Rate AI empowers users to stay ahead of market fluctuations and make data-driven decisions that maximize their returns.
The following contract addresses were passed to the EvmBatchProcessor constructor to extract data specifically aimed at enriching the Compound V2 dataset from GizaTech. Given the volume of the dataset, USDT and WBTC were the primary focus during the data analysis, but other tokens were also considered during the mining process.
export const contractAddresses = {
  Matic: '0x7D1AfA7B718fb893dB30A3aBc0Cfc608AaCfeBB0'.toLowerCase(),
  WBTC: '0x2260FAC5E5542a773Aa44fBCfeDf7C193bc2C599'.toLowerCase(),
  USDT: '0xdAC17F958D2ee523a2206206994597C13D831ec7'.toLowerCase(),
  USDC: '0xA0b86991c6218b36c1d19D4a2e9Eb0cE3606eB48'.toLowerCase(),
  WETH: '0xC02aaA39b223FE8D0A0e5C4F27eAD9083C756Cc2'.toLowerCase(),
  DAI: '0x6B175474E89094C44Da98b954EedeAC495271d0F'.toLowerCase()
};
The total amount traded each day for each token was aggregated by Market Activity timeframe to enrich the information that can be drawn from the original Compound V2 dataset.
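As a rough illustration of this aggregation step, here is a minimal pandas sketch. It assumes the Squid-extracted transfers for a token are loaded into a DataFrame with 'timestamp' and 'amount' columns; all names below are placeholders, not the project's actual code.

import pandas as pd

def aggregate_daily_volume(transfers: pd.DataFrame) -> pd.DataFrame:
    # Parse timestamps as UTC so daily buckets align across tokens.
    out = transfers.copy()
    out['timestamp'] = pd.to_datetime(out['timestamp'], utc=True)
    # Sum the traded amount within each calendar day.
    daily = (
        out.set_index('timestamp')['amount']
           .resample('1D')
           .sum()
           .rename('daily_traded_amount')
           .reset_index()
    )
    return daily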
To synchronize the timeframes of the Squid-extracted dataset for the two tokens (USDT and WBTC) with the Compound V2 dataset's timeframe, we first extracted both the oldest and latest timestamps from the extracted data for each token:
Oldest Timestamps: {'USDT data': Timestamp('2017-11-28 15:38:10+0000', tz='UTC'), 'WBTC data': Timestamp('2023-08-26 16:30:59+0000', tz='UTC')}
Latest Timestamps: {'USDT data': Timestamp('2024-08-07 18:11:59+0000', tz='UTC'), 'WBTC data': Timestamp('2024-07-31 05:25:11+0000', tz='UTC')}
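These per-token bounds can be computed along the following lines (a sketch; token_frames is a placeholder dict mapping each token's label to its extracted DataFrame):

# token_frames ~ {'USDT data': usdt_df, 'WBTC data': wbtc_df},
# each DataFrame carrying a UTC 'timestamp' column.
oldest = {name: df['timestamp'].min() for name, df in token_frames.items()}
latest = {name: df['timestamp'].max() for name, df in token_frames.items()}
print('Oldest Timestamps:', oldest)
print('Latest Timestamps:', latest)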
We then derived the global oldest and latest timestamps shared by both tokens, taking the latest of the per-token oldest timestamps and the earliest of the per-token latest timestamps, which resulted in the following:
Global Oldest Timestamp: 2023-08-26 16:30:59+00:00
Global Latest Timestamp: 2024-07-31 05:25:11+00:00
Both token datasets must be synchronized to the same timeframe so that we avoid null values when integrating them with the Compound V2 dataset. We therefore use the global oldest and latest timestamps to truncate the original Compound V2 dataset to these bounds. The tradeoff is that we lose the information in the original dataset that falls outside this window, but in exchange we can focus on a specific timeframe and study how both the supply and borrow rates behave within it.
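A minimal sketch of this intersection-and-truncation step (compound_df and token_frames are placeholder names, each DataFrame carrying a UTC 'timestamp' column):

# Shared window: latest of the per-token minima, earliest of the per-token maxima.
global_oldest = max(df['timestamp'].min() for df in token_frames.values())
global_latest = min(df['timestamp'].max() for df in token_frames.values())

# Keep only the Compound V2 rows that fall inside the shared window.
in_window = (compound_df['timestamp'] >= global_oldest) & (compound_df['timestamp'] <= global_latest)
compound_df = compound_df.loc[in_window]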
Based on the Jupyter notebook in the code repository, here is a comparison of the (supply rate focused) models trained on the enriched Compound V2 dataset versus the original, non-enriched one.
Original Compound V2 Dataset Results:
Training RMSE: 0.37868252396583557
Training R-squared (R²) score: 0.8565995572197411
Validation RMSE: 0.663108229637146
Validation R-squared (R²) score: 0.7677438088501785
Test RMSE: 1.4575222730636597
Test R-squared (R²) score: -5.9971057203683245
Enriched Compound V2 Dataset (with Squid-Extracted Data) Results:
Training RMSE: 0.1319979578256607
Training R-squared (R²) score: 0.9825765381387938
Validation RMSE: 0.5409678220748901
Validation R-squared (R²) score: 0.8454242812656314
Test RMSE: 0.8830604553222656
Test R-squared (R²) score: -1.5684374082940638
Percentage Improvement from Original to Enriched Dataset (computed from the metrics above; RMSE as relative reduction, R² as absolute change):
Training RMSE Improvement: 65.1% lower (0.3787 → 0.1320)
Training R² Improvement: +0.126 (0.8566 → 0.9826)
Validation RMSE Improvement: 18.4% lower (0.6631 → 0.5410)
Validation R² Improvement: +0.078 (0.7677 → 0.8454)
Test RMSE Improvement: 39.4% lower (1.4575 → 0.8831)
Test R² Improvement: +4.429 (−5.997 → −1.568)
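These figures can be reproduced directly from the metrics listed above:

orig = {
    'Training':   {'rmse': 0.37868252396583557, 'r2': 0.8565995572197411},
    'Validation': {'rmse': 0.663108229637146,   'r2': 0.7677438088501785},
    'Test':       {'rmse': 1.4575222730636597,  'r2': -5.9971057203683245},
}
enriched = {
    'Training':   {'rmse': 0.1319979578256607,  'r2': 0.9825765381387938},
    'Validation': {'rmse': 0.5409678220748901,  'r2': 0.8454242812656314},
    'Test':       {'rmse': 0.8830604553222656,  'r2': -1.5684374082940638},
}
for split in orig:
    rmse_cut = (orig[split]['rmse'] - enriched[split]['rmse']) / orig[split]['rmse'] * 100
    r2_gain = enriched[split]['r2'] - orig[split]['r2']
    print(f"{split}: RMSE {rmse_cut:.1f}% lower, R² {r2_gain:+.3f}")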
Note: The R² score on the test set may not be ideal, but this is largely due to the data synchronization constraints with the Squid Extracted Dataset, limiting the timeframe we could capture. The primary goal of this analysis was to demonstrate whether integrating Squid's data could enhance the model's predictive performance. The results show that it does, as seen by the significant improvements in both training and validation scores when comparing the original and enriched datasets. This suggests that, in the future, as more data is extracted over time, the model's predictive capabilities can be further enhanced by capturing a broader and more comprehensive timeframe during training.
Training the Supply Rate Prediction Model Enriched with Squid-Extracted Data
The enriched supply rate model (built with PyTorch) is converted to ONNX format. This allows for seamless transpilation of the model into Cairo, the language used for generating proofs of computation for model inferences. The model must be transpiled and deployed to an endpoint before running inferences using the Giza CLI. The same process is also applied to the borrow rate model.
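A minimal sketch of the PyTorch-to-ONNX step (the tiny architecture and N_FEATURES below are placeholders, not the actual model from the notebook):

import torch
import torch.nn as nn

N_FEATURES = 10  # placeholder feature count
model = nn.Sequential(nn.Linear(N_FEATURES, 64), nn.ReLU(), nn.Linear(64, 1))
model.eval()

# Trace with a dummy batch and export; the 'input' name matches the
# input_feed key used at prediction time below.
dummy = torch.randn(1, N_FEATURES)
torch.onnx.export(model, dummy, 'supply_rate_model.onnx',
                  input_names=['input'], output_names=['output'])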
The following code snippet shows how we can generate a zero-knowledge proof from the inferences/predictions of our transpiled and deployed model using the GizaModel module from the giza-actions library.
from giza_actions.model import GizaModel  # assumed import path for the giza-actions package

# prediction_model is a GizaModel instance bound to the deployed endpoint.
prediction_result, proof_id = prediction_model.predict(
    input_feed={"input": data_for_prediction},
    verifiable=True,  # set to True to generate a proof id for your model
)
We can now verify the proof using the following command on the Giza CLI:
giza verify --proof-id PROOF_ID
Squid Rate AI Jupyter Notebook
"I am into Robotics, Electronics, Programming, Data Science, Data Analysis, Business Intelligence, Machine Learning, Deep Learning, and ZKML Solutions & Applications"
Recent Awards:
- Connext AI Solutions Hackathon 1st Place (2024) (Developed an AI chatbot using finance and payroll documents of the company)
- Gizathon Top AI Action 2nd Place (2024)
- Starknet Infra Hackathon Overall Best Project (2023)
- FWD Regional Insurtech Data Hackathon Top 10 Finalist (2022) (Developed an ML model for Insurance Products Preferences of Customers)
- Build the Future Hackathon First Placer (2022)
- Fishackathon (Wild Fisheries) Finalist (2022)
- Galileo Hackathon Philippines First Placer (2021)
- Planetary Health Hackathon First Placer (2021)
- Taikai Top 100 Builders (Rank 26)
I am Mark Lloyd Cuizon, a driven and dedicated tech enthusiast, with a strong foundation in AI, machine learning, data analysis, and software development. My passion lies in applying cutting-edge technology to solve real-world problems, particularly in the fast-evolving world of AI. I thrive in dynamic environments where I can leverage my strong problem-solving skills, keen attention to detail, and adaptability.
Prominent Hackathon Participations