Please describe your proposed solution.
Problem:
The recent rise of AI and ML has caused a shortage of available GPU computing power on a market dominated by big cloud service providers, whose offerings are often expensive and out of reach.
Currently there is no real-world solution for distributed scaling that works on a decentralized network. A globally distributed computing infrastructure is needed that works seamlessly on consumer devices and machines. This can be achieved by splitting large-scale models into separate containers, deployable on decentralized infrastructure, which execute in parallel and combine their results.
Unique solution:
NuNet currently enables a single task to use multiple GPU cards on one machine, a capability implemented as part of the funded Fund8 proposal NuNet: Decentralized GPU ML Cloud. The next step is splitting large-scale models into containerized components that run and communicate in parallel on decentralized hardware.
Detailed approach:
The general process of distributing a single GPU job (training, inference, or general-purpose computing) across multiple nodes in a decentralized network, using the specified tools and techniques, is briefly outlined as follows:
1. Job Preparation:
Prepare the specific task: it could be training a machine learning (ML) model or performing inference with a pre-trained model, with the Python program for the task at hand ready to run. It could also be a non-ML computational Python program.
2. Environment Setup:
Wrap the Python script and the necessary libraries (such as TensorFlow, PyTorch, or other machine learning or computational libraries) into a standardized unit, which we'll refer to as the 'distributed job'. The distributed job also includes tools for distributed processing (such as Horovod) and network communication (such as libp2p).
3. Node Configuration:
Arrange the nodes in the network and ensure they have the necessary tools to handle the 'distributed job'. All nodes should be connected through a communication protocol, such as libp2p.
4. Job Splitting:
Distribute the job across the nodes in the network. Each node now has an identical setup and is capable of executing the task independently.
5. Task Initialization:
Initiate the task using the distributed processing tools. For training, the data is split among nodes and the model is trained on all nodes simultaneously. For inference, each node makes predictions independently on its subset of the data. (A training sketch follows this list.)
6. Inter-container Communication:
As the nodes execute the task inside containers, they use peer-to-peer network communication, such as libp2p, to share and synchronize their work. (A communication sketch also follows this list.)
7. Task Finalization:
Once the task is complete, we gather the results. For training, the final model parameters can be read from any of the nodes; for inference, we collect and compile the prediction results from each node; for general-purpose computation, we simply collect the outputs.
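To make steps 4, 5, and 7 concrete, here is a minimal sketch of what a distributed training job could look like using Horovod with PyTorch. The model, dataset, and hyperparameters are placeholders rather than NuNet APIs; in practice the distributed job would ship whatever Python program is prepared in step 1.

```python
import torch
import torch.nn as nn
import torch.utils.data as data
import horovod.torch as hvd

hvd.init()                               # one Horovod process per node
torch.cuda.set_device(hvd.local_rank())  # pin this process to its GPU

model = nn.Linear(128, 10).cuda()        # placeholder model
dataset = data.TensorDataset(torch.randn(10_000, 128),
                             torch.randint(0, 10, (10_000,)))

# Steps 4-5: each node trains on its own shard of the data.
sampler = data.DistributedSampler(dataset, num_replicas=hvd.size(),
                                  rank=hvd.rank())
loader = data.DataLoader(dataset, batch_size=64, sampler=sampler)

optimizer = torch.optim.SGD(model.parameters(), lr=0.01 * hvd.size())
# Gradients are averaged across all nodes on every step (allreduce).
optimizer = hvd.DistributedOptimizer(optimizer,
                                     named_parameters=model.named_parameters())
# Start every node from identical weights.
hvd.broadcast_parameters(model.state_dict(), root_rank=0)

loss_fn = nn.CrossEntropyLoss()
for epoch in range(3):
    sampler.set_epoch(epoch)             # reshuffle shards each epoch
    for x, y in loader:
        x, y = x.cuda(), y.cuda()
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        optimizer.step()

# Step 7: after allreduce-based training every node holds the same final
# parameters, so any single node can publish the trained model.
if hvd.rank() == 0:
    torch.save(model.state_dict(), "model.pt")
```

Launched with, for example, `horovodrun -np 4 -H node1:1,node2:1,node3:1,node4:1 python job.py`, this runs one worker per node; NuNet's role is to provision those nodes and wire up the transport between them.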
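Step 6 depends on peer-to-peer messaging between the containers. Since libp2p bindings for Python are still maturing, the stand-in sketch below uses plain asyncio TCP streams purely to illustrate the shape of the exchange; in the actual system libp2p would supply peer discovery, NAT traversal, and encrypted streams. The port number and JSON message format are arbitrary assumptions.

```python
import asyncio
import json

async def handle_peer(reader, writer):
    # Receive a partial result (e.g. gradients or predictions) from a peer.
    update = json.loads(await reader.readline())
    print(f"received update from node {update['node_id']}")
    writer.write(b'{"ack": true}\n')     # acknowledge receipt
    await writer.drain()
    writer.close()

async def send_update(host, port, update):
    # Dial a peer and push this node's partial result to it.
    reader, writer = await asyncio.open_connection(host, port)
    writer.write((json.dumps(update) + "\n").encode())
    await writer.drain()
    await reader.readline()              # wait for the acknowledgement
    writer.close()

async def main():
    # Each container would both serve incoming updates and dial its peers;
    # only the server side is shown here for brevity.
    server = await asyncio.start_server(handle_peer, "0.0.0.0", 9000)
    async with server:
        await server.serve_forever()

if __name__ == "__main__":
    asyncio.run(main())
```

In the libp2p version the dial-and-serve pattern stays the same, but addresses become peer IDs and multiaddrs, and the transport handles encryption and routing across NATs.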
Benefits for the Cardano ecosystem:
The research is a continuation and expansion of the already completed Fund8 proposal. It will enable all dApps and use cases in the Web2 and Web3 space that need GPU computing power to source it via NuNet. The value of the compute provided will be exchanged via the NTX token, which is a Cardano Native Token (CNT).
Each transaction will be executed as a Smart Contract on the Cardano blockchain, which will directly increase transaction volume and CNT volume, and provide unique use cases to be built on top of it for the Cardano ecosystem.
How does your proposed solution address the challenge and what benefits will this bring to the Cardano ecosystem?
The proposal addresses the following directions of the challenge:
- Deployment, testing, and monitoring frameworks
- Knowledge base & Documentation
The research done in this proposal will lead to the development of the NuNet framework, made available as open source to all users in the Cardano ecosystem and beyond as development continues. For the open-source community to use NuNet, an extensive knowledge base, documentation, and step-by-step procedures will be prepared.
The current boom in AI and large-scale machine learning shows no sign of slowing down. GPU computing is at its core, and the results of this research and development will tap directly into that demand.
NuNet is building technology that will allow people to provision hardware for AI/ML jobs monetized via the Cardano ecosystem. In the short term, success may boost Cardano usage; in the long term, it would connect real-world assets (computing power) with the crypto payment space through Cardano integration.
NuNet is building a potentially disruptive technology that could capture a share of the global cloud computing market, currently valued at 548 B USD and projected to grow to 1,240 B USD. Capturing even a fraction of that would move potentially huge value through Cardano Smart Contracts. Implementation will proceed on the basis of this research, at which point a more precise estimate of the number of users can be made. Anyone in the Cardano ecosystem could deploy and use the cheaper GPU cluster resources for AI, ML, rendering, and many other applications. It is a fundamental enabling technology.
Source:
<https://www.marketsandmarkets.com/Market-Reports/cloud-computing-market-234.html#:~:text=The%20global%20Cloud%20Computing%20Market,at%20a%20CAGR%20of%2017.9%25>.
How do you intend to measure the success of your project?
This project will result in the implementation of a way to distribute large-scale models that need GPU computing resources (mostly machine learning and AI related) across a decentralized network of hardware owned by the community. If successful, it will give access to these resources to groups of users who are currently excluded (due to the high price and low availability of GPU resources, as explained in the problem statement).
After completing this project, we expect a substantial increase in deployment requests on the NuNet network, which uses Cardano Smart Contracts for its tokenomics and settlement layer. This in turn will increase transactions on the Cardano network and further develop real use cases in the Cardano ecosystem.
Some of the direct benefits to the Cardano ecosystem are:
- A growing number of projects using cheaper GPU resources for AI/ML tasks
- Computing resources used in the processes are to be compensated in NTX, which is a Cardano Native Token
- Each exchange of value will be done as a Smart Contract on Cardano
- Over 2,000 people are already in the NuNet Discord testing the various builds of the NuNet platform
Some of the indirect benefits to the Cardano ecosystem are:
- Cardano becomes the settlement layer for decentralized Open Source computing frameworks used in training AI/ML models
- Other solutions can be built on top of the framework, greatly expanding the potential business models
- With the right on-ramp/off-ramp solutions, Web2 users can utilize compute power without even noticing the Web3 layer underneath. NuNet is interested in joint work with experts in this field.
Please describe your plans to share the outputs and results of your project?
Spreading Outputs Over Time
Our project plan includes clear milestones and deliverables, which will be shared publicly as they are completed. This incremental release of outputs will ensure a continuous stream of updates for the community.
This approach lets us provide updates on a regular basis, and offers users the chance to provide feedback that we can use to guide subsequent development.
Sharing Outputs, Impacts, and Opportunities
We intend to leverage various communication channels to share our project's outputs, impacts, and opportunities:
- GitLab: The primary hub for our technical work, hosting our codebase, documentation, and issue tracking. This will be the main point of reference for the details of our project.
- Social Platforms: We plan to regularly post updates on our progress on platforms like Twitter, LinkedIn, and Reddit. This will include major milestones, bug fixes, and insights from our development work.
- Technical Discussions: We will continue to hold weekly technical discussions covering the details of our work. These provide a forum for live Q&A and discussion with our community.
- Blogs: Regular blog posts summarizing the progress we have made, highlighting key achievements and outlining the next steps in our project.
Testing and further research
As an open-source project, our outputs will be freely accessible for further research and development. We encourage the community's involvement in testing our solutions to enhance their real-world performance.
Community Testing: We'll invite our users to participate in alpha and beta testing phases, where they can help identify bugs and suggest improvements. We'll use GitLab's issue tracking for managing feedback and provide guidelines for issue reporting and feature suggestions.
Internally, we'll use project insights and community feedback to guide our future work, optimize performance, and prioritize new features. Our aim is to foster a collaborative development ecosystem that is robust, relevant, and of high quality.