Please describe your proposed solution.
Problem:
The recent rise of AI and ML has caused a shortage of available GPU computing power on a market dominated by big cloud service providers, whose offerings are often expensive and out of reach.
Currently there is no real-world solution for distributed scaling that works on a decentralized network. A globally distributed computing infrastructure is needed that works seamlessly on consumer devices and machines. This can be achieved by splitting large-scale models into separate containers, deployable on decentralized infrastructure, which execute in parallel and combine their results.
Unique solution:
NuNet currently enables a single task to use multiple GPU cards on one machine, a capability implemented as part of the funded Fund8 proposal NuNet: Decentralized GPU ML Cloud. The next step is splitting large-scale models into containerized components that run and communicate in parallel on decentralized hardware.
Detailed approach:
The general process of distributing a single GPU job (training, inference, or general-purpose computing) across multiple nodes in a decentralized network, using the specified tools and techniques, is briefly outlined as follows:
1. Job Preparation:
Prepare the specific task: it could be training a machine learning (ML) model or performing inference with a pre-trained model, with the Python program for the task at hand ready to run. It could also be a non-ML computational Python program.
2. Environment Setup:
Wrap the Python script and the necessary libraries (such as TensorFlow, PyTorch, or other machine learning or computational libraries) into a standardized unit, which we'll refer to as the 'distributed job'. The distributed job also includes tools for distributed processing (such as Horovod) and network communication (such as libp2p).
3. Node Configuration:
Arrange the nodes in the network and ensure they have the necessary tools to handle the 'distributed job'. All nodes should be connected through a communication protocol, such as libp2p.
4. Job Splitting:
Distribute the job across the nodes in the network. Each node now has an identical setup and is capable of executing the task independently.
5. Task Initialization:
Initiate the task using the distributed processing tools. For training, the data is split among nodes and the model is trained on all nodes simultaneously. For inference, each node makes predictions independently on its subset of the data. (A training sketch follows this list.)
6. Inter-container Communication:
As the nodes execute the task inside containers, they use peer-to-peer network communication, such as libp2p, to share and synchronize their work. (A communication sketch also follows this list.)
7. Task Finalization:
Once the task is complete, we gather the results. For training, the final model parameters can be read from any of the nodes; for inference, we collect and compile the prediction results from each node; for general-purpose computation, we simply collect the outputs.
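To make steps 4, 5, and 7 concrete, here is a minimal sketch of what a distributed training job could look like using Horovod with PyTorch. The model, dataset, and hyperparameters are placeholders rather than NuNet APIs; in practice the distributed job would ship whatever Python program is prepared in step 1.

```python
import torch
import torch.nn as nn
import torch.utils.data as data
import horovod.torch as hvd

hvd.init()                               # one Horovod process per node
torch.cuda.set_device(hvd.local_rank())  # pin this process to its GPU

model = nn.Linear(128, 10).cuda()        # placeholder model
dataset = data.TensorDataset(torch.randn(10_000, 128),
                             torch.randint(0, 10, (10_000,)))

# Steps 4-5: each node trains on its own shard of the data.
sampler = data.DistributedSampler(dataset, num_replicas=hvd.size(),
                                  rank=hvd.rank())
loader = data.DataLoader(dataset, batch_size=64, sampler=sampler)

optimizer = torch.optim.SGD(model.parameters(), lr=0.01 * hvd.size())
# Gradients are averaged across all nodes on every step (allreduce).
optimizer = hvd.DistributedOptimizer(optimizer,
                                     named_parameters=model.named_parameters())
# Start every node from identical weights.
hvd.broadcast_parameters(model.state_dict(), root_rank=0)

loss_fn = nn.CrossEntropyLoss()
for epoch in range(3):
    sampler.set_epoch(epoch)             # reshuffle shards each epoch
    for x, y in loader:
        x, y = x.cuda(), y.cuda()
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        optimizer.step()

# Step 7: after allreduce-based training every node holds the same final
# parameters, so any single node can publish the trained model.
if hvd.rank() == 0:
    torch.save(model.state_dict(), "model.pt")
```

Launched with, for example, `horovodrun -np 4 -H node1:1,node2:1,node3:1,node4:1 python job.py`, this runs one worker per node; NuNet's role is to provision those nodes and wire up the transport between them.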
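Step 6 depends on peer-to-peer messaging between the containers. Since libp2p bindings for Python are still maturing, the stand-in sketch below uses plain asyncio TCP streams purely to illustrate the shape of the exchange; in the actual system libp2p would supply peer discovery, NAT traversal, and encrypted streams. The port number and JSON message format are arbitrary assumptions.

```python
import asyncio
import json

async def handle_peer(reader, writer):
    # Receive a partial result (e.g. gradients or predictions) from a peer.
    update = json.loads(await reader.readline())
    print(f"received update from node {update['node_id']}")
    writer.write(b'{"ack": true}\n')     # acknowledge receipt
    await writer.drain()
    writer.close()

async def send_update(host, port, update):
    # Dial a peer and push this node's partial result to it.
    reader, writer = await asyncio.open_connection(host, port)
    writer.write((json.dumps(update) + "\n").encode())
    await writer.drain()
    await reader.readline()              # wait for the acknowledgement
    writer.close()

async def main():
    # Each container would both serve incoming updates and dial its peers;
    # only the server side is shown here for brevity.
    server = await asyncio.start_server(handle_peer, "0.0.0.0", 9000)
    async with server:
        await server.serve_forever()

if __name__ == "__main__":
    asyncio.run(main())
```

In the libp2p version the dial-and-serve pattern stays the same, but addresses become peer IDs and multiaddrs, and the transport handles encryption and routing across NATs.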
Benefits for the Cardano ecosystem:
The research is a continuation and expansion of the already completed Fund8 proposal. It will enable all dApps and use cases in the Web2 and Web3 space that need GPU computing power to source it via NuNet. The value of the compute provided will be exchanged via the NTX token, which is a Cardano Native Token (CNT).
Each transaction will be executed as a Smart Contract on the Cardano blockchain, which will directly increase transaction volume and CNT volume, and provide unique use cases to be built on top of it for the Cardano ecosystem.
How does your proposed solution address the challenge and what benefits will this bring to the Cardano ecosystem?
The proposal addresses the following directions of the challenge:
- Deployment, testing, and monitoring frameworks
- Knowledge base & Documentation
The research done in this proposal will lead to the development of the NuNet framework, made available as open source to all users in the Cardano ecosystem and beyond as development continues. For the open-source community to use NuNet, an extensive knowledge base, documentation, and step-by-step procedures will be prepared.
The current boom in AI and large-scale machine learning shows no sign of slowing down. GPU computing is at its core, and the results of this research and development will tap directly into that demand.
NuNet is building technology that will allow people to provision hardware for AI/ML jobs monetized via the Cardano ecosystem. In the short term, success may boost Cardano usage; in the long term, it would connect real-world assets (computing power) with the crypto payment space through Cardano integration.
NuNet is building a potentially disruptive technology that could capture a share of the global cloud computing market, currently valued at 548 B USD and projected to grow to 1,240 B USD. Capturing even a fraction of that would move potentially huge value through Cardano Smart Contracts. Implementation will proceed on the basis of this research, at which point a more precise estimate of the number of users can be made. Anyone in the Cardano ecosystem could deploy and use the cheaper GPU cluster resources for AI, ML, rendering, and many other applications. It is a fundamental enabling technology.
Source:
<https://www.marketsandmarkets.com/Market-Reports/cloud-computing-market-234.html#:~:text=The%20global%20Cloud%20Computing%20Market,at%20a%20CAGR%20of%2017.9%25>.
How do you intend to measure the success of your project?
This project will result in the implementation of a way to distribute large-scale models that need GPU computing resources (mostly machine learning and AI related) across a decentralized network of hardware owned by the community. If successful, it will give access to these resources to groups of users who are currently excluded (due to the high price and low availability of GPU resources, as explained in the problem statement).
After completing this project, we expect a substantial increase in deployment requests on the NuNet network, which uses Cardano Smart Contracts for its tokenomics and settlement layer. This in turn will increase transactions on the Cardano network and further develop real use cases in the Cardano ecosystem.
Some of the direct benefits to the Cardano ecosystem are:
- A growing number of projects using cheaper GPU resources for AI/ML tasks
- Computing resources used in the processes are to be compensated in NTX, which is a Cardano Native Token
- Each exchange of value will be done as a Smart Contract on Cardano
- Over 2,000 people are already in the NuNet Discord testing the various builds of the NuNet platform
Some of the indirect benefits to the Cardano ecosystem are:
- Cardano becomes the settlement layer for decentralized Open Source computing frameworks used in training AI/ML models
- Other solutions can be built on top of the framework, greatly expanding the potential business models
- With the right on-ramp/off-ramp solutions, Web2 users can utilize compute power without even noticing the Web3 layer underneath. NuNet is interested in joint work with experts in this field.
Please describe your plans to share the outputs and results of your project?
Spreading Outputs Over Time
Our project plan includes clear milestones and deliverables, which will be shared publicly as they are completed. This incremental release of outputs will ensure a continuous stream of updates for the community.
This approach lets us provide updates on a regular basis, and offers users the chance to provide feedback that we can use to guide subsequent development.
Sharing Outputs, Impacts, and Opportunities
We intend to leverage various communication channels to share our project's outputs, impacts, and opportunities:
- GitLab: The primary hub for our technical work, hosting our codebase, documentation, and issue tracking. This will be the main point of reference for the details of our project.
- Social Platforms: We plan to regularly post updates on our progress on platforms like Twitter, LinkedIn, and Reddit. This will include major milestones, bug fixes, and insights from our development work.
- Technical Discussions: We will continue to hold weekly technical discussions covering the details of our work. These provide a forum for live Q&A and discussion with our community.
- Blogs: Regular blog posts summarizing the progress we have made, highlighting key achievements and outlining the next steps in our project.
Testing and further research
As an open-source project, our outputs will be freely accessible for further research and development. We encourage the community's involvement in testing our solutions to enhance their real-world performance.
Community Testing: We'll invite our users to participate in alpha and beta testing phases, where they can help identify bugs and suggest improvements. We'll use GitLab's issue tracking for managing feedback and provide guidelines for issue reporting and feature suggestions.
Internally, we'll use project insights and community feedback to guide our future work, optimize performance, and prioritize new features. Our aim is to foster a collaborative development ecosystem that is robust, relevant, and of high quality.