Please describe your proposed solution.
Context
- On-chain data is massive and only keeps increasing
- The full ledger structure is very complex
- The vast majority of dApps only care about the small fraction of on-chain data that is relevant to their use-case. For example:
  - UTxOs locked in a particular script address
  - UTxOs generated by a particular script address
  - UTxOs with a particular asset policy or token
- The vast majority of dApps only care about particular projections of the ledger relevant to their use-case. For example:
  - The balance of a particular token per address
  - The Plutus data contained in UTxO datums
  - The Plutus data sent as redeemers
  - The metadata contained in a particular label
Current Scenario
- Current tools approach the problem with a one-size-fits-all solution.
- We have tools such as DB-Sync or Carp that approach the problem by replicating the ledger in a relational database. Pros: flexible enough to fulfill almost all query requirements. Cons: requires large amounts of storage and compute resources, and complex queries are slow.
- We have tools such as Kupo or Scrolls that approach the problem with an opinionated view of the data that needs to be indexed. Pros: they are lightweight, require few compute resources, and their queries are optimized. Cons: the available queries and supported use-cases are limited.
Ideal Scenario
- Each dApp should have a tailor-made database for its particular use-case.
- This database should only contain the subset of the chain relevant to the dApp.
- The schema of the database should be designed to fit the needs of the dApp.
- All of the boilerplate code and infrastructure plumbing required to sync data from the chain should already be available as an SDK.
- Developers should be able to focus solely on adapting the SDK to the particular requirements of their dApp.
- Querying the data should be easy and flexible. In particular, a GraphQL endpoint should be available for queries from the browser or any of the existing client SDKs (see the query sketch after this list).
- Deployment of the solution should be well documented and as simple as possible (without losing flexibility) so that dApp teams can run their own indexers.
- Optionally, infrastructure service providers should be able to offer “indexing” clusters that can plug-and-play custom code for client dApps.
- Other ecosystems have focused on similar solutions. To mention a few:
- <https://www.subsquid.io/>
- <https://thegraph.com>
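To make the querying story concrete, below is a minimal sketch of what a browser (or Deno) client could send to such a GraphQL endpoint. The endpoint URL, the `handles` table, and the Hasura-style filter syntax are all assumptions for illustration; the real schema would be whatever each dApp's plugin declares.

```ts
// Hypothetical sketch only: endpoint URL, table name, and fields are
// placeholders for whatever schema the dApp's plugin defines.
const ENDPOINT = "https://my-dapp-indexer.example.com/graphql";

const query = `
  query HandleAddress($handle: String!) {
    handles(where: { name: { _eq: $handle } }) {
      name
      address
    }
  }
`;

const res = await fetch(ENDPOINT, {
  method: "POST",
  headers: { "content-type": "application/json" },
  body: JSON.stringify({ query, variables: { handle: "alice" } }),
});

const { data } = await res.json();
console.log(data.handles); // e.g. [{ name: "alice", address: "addr1..." }]
```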
Example Use-Cases
- A DEX could create custom indexes representing their liquidity pools and the history of swap transactions.
- An NFT marketplace could create custom indexes representing available collections, bid prices, total supply and transaction history.
- An Oracle could create custom indexes representing the current value and history of past values for their fact statement catalogs.
- ADA Handles could be represented by a custom index that maps each handle to the address currently holding it (see the sketch after this list).
- A dApp that requires batching of UTxOs could create custom indexes to keep track of the relevant UTxOs required for its batching purposes.
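As an illustration of the ADA Handles use-case above, here is a minimal sketch of what a tailored index could extract from each block. The block model, the plugin contract, and the policy id are all hypothetical placeholders, and the sketch assumes a map call may emit more than one pair per block.

```ts
// Hypothetical "Map" function for an ADA Handle index. All shapes below are
// illustrative; the real block model would be provided by the engine.
interface Asset { policyId: string; name: string }
interface TxOutput { address: string; assets: Asset[] }
interface Tx { outputs: TxOutput[] }
interface Block { transactions: Tx[] }

type KV = { key: string; value: string };

const HANDLE_POLICY_ID = "<ada-handle-policy-id>"; // placeholder value

export function map(block: Block): KV[] {
  const pairs: KV[] = [];
  for (const tx of block.transactions) {
    for (const out of tx.outputs) {
      for (const asset of out.assets) {
        if (asset.policyId === HANDLE_POLICY_ID) {
          // a handle appearing in an output means that address now holds it
          pairs.push({ key: asset.name, value: out.address });
        }
      }
    }
  }
  return pairs;
}
```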
Technical Plan
- Build an open-source indexing engine based on a modified version of Scrolls that supports plugins.
- Plugins will run using either the Deno runtime or a WASM runtime.
- Each plugin will be responsible for providing two functions: “Map” and “Reduce” (see the plugin sketch after this list).
- The “Map” function will take a Cardano block as a parameter and output a custom key/value pair defined by the developer.
- The “Reduce” function will take an array of key/value pairs and aggregate them into a single key/value pair.
- The indexing engine will be responsible for crawling through the history of the chain and executing the Map/Reduce operations for each block.
- The output of the Map/Reduce steps will be persisted in a relational database (TBD, but probably PostgreSQL). The schema of the database will be provided declaratively by the plugin.
- Build an open-source SDK that contains the libraries and toolchains required to build the Map/Reduce plugins. It will contain code-generation utilities to scaffold the required functionality. Developers will only need to fill in the gaps, adapting the logic to their use-case.
- Integrate existing GraphQL libraries that automatically generate APIs from database schemas. This allows developers to query their data using modern and flexible technologies.
- Prepare installation instructions, Docker images and provisioning scripts so that DevOps teams that wish to run the system can do so on their own infrastructure.
- Integrate the indexing engine with the Demeter platform so that developers can have one-click deployments of their custom indexes running in the cloud.
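To tie the plan together, below is a hypothetical end-to-end plugin sketch, assuming the engine loads a Deno module exporting `schema`, `map`, and `reduce`, and that the map step may emit several pairs per block. None of these names or shapes are final; they only illustrate the intended developer experience.

```ts
// Hypothetical plugin contract: every name and shape here is an assumption.
interface TxOutput { address: string; lovelace: bigint }
interface Tx { outputs: TxOutput[] }
interface Block { transactions: Tx[] }

type KV = { key: string; value: bigint };

// Declarative schema: the engine would translate this into the relational
// tables (e.g. PostgreSQL) that back the auto-generated GraphQL API.
export const schema = {
  table: "ada_received_by_address",
  key: { name: "address", type: "text" },
  value: { name: "lovelace", type: "numeric" },
};

// Map: extract key/value pairs from a single block. (A real balance index
// would also account for spent inputs; omitted here for brevity.)
export function map(block: Block): KV[] {
  return block.transactions.flatMap((tx) =>
    tx.outputs.map((out) => ({ key: out.address, value: out.lovelace }))
  );
}

// Reduce: aggregate all pairs sharing the same key into a single pair.
export function reduce(pairs: KV[]): KV {
  return {
    key: pairs[0].key,
    value: pairs.reduce((sum, p) => sum + p.value, 0n),
  };
}
```

Given the `schema` declaration, the GraphQL integration described above could expose the resulting table for queries without any extra code from the developer.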
How does your proposed solution address the challenge and what benefits will this bring to the Cardano ecosystem?
Indexing on-chain data is one of the primary pain points of building dApps on Cardano. By providing a framework that strikes a good balance between flexibility, performance and TCO (Total Cost of Ownership), we allow dApp developers to:
- Write less boilerplate code, which results in fewer bugs
- Spend more time focusing on core business propositions
- Onboard more easily when starting to work in the Cardano ecosystem
The approach proposed here is already being applied in other blockchain ecosystems. By providing a similar solution that has already been validated, we:
- Reduce the developer-experience gap between Cardano and other ecosystems
- Simplify the migration of projects already building in other ecosystems that wish to integrate with Cardano
How do you intend to measure the success of your project?
We consider the following dimensions for measuring the success of the project:
- Activity in the open-source GitHub repository, through metrics such as number of issues, clones, external contributors, stars, and visitors.
- Number of external repositories that include this project as a dependency.
- Number of dApp projects using the indexing engine through the hosted version at Demeter.run
Please describe your plans to share the outputs and results of your project?
Since this is an open-source project, the outputs will be available to any developer in the ecosystem at every step of the development process:
- The latest version of the source code will be available in the GitHub repository.
- Source code changes will be applied through a pull-request process.
- Alpha and Beta versions will be released at every milestone.
Upon reaching the end of the development process, we’ll provide:
- An LTS release version of the engine available to download in multiple formats (binary, Docker image, etc.)
- A CLI (command-line interface) binary to serve as the entry point for developers
- A documentation website with instructions for usage and deployment
- A collection of examples of common indexers to use as starting points
- A tutorial video with a walkthrough of how to create a custom indexer
Our hosted version at Demeter.run will provide:
- A one-click deploy mechanism to host custom indexers
- A web-based monitoring dashboard for your indexers
- A playground UI to execute ad-hoc queries over your data