Please describe your proposed solution.
Context
- On-chain data is massive and only keeps increasing
- The full ledger structure is very complex
- The vast majority of dApps only care about the small fraction of on-chain data that is relevant to their use-case. For example:
  - UTxOs locked in a particular script address
  - UTxOs generated by a particular script address
  - UTxOs with a particular asset policy or token
- The vast majority of dApps only care about particular projections of the ledger relevant to their use-case. For example:
  - The balance of a particular token per address
  - The Plutus data contained in UTxO datums
  - The Plutus data sent as redeemers
  - The metadata contained in a particular label
Current Scenario
- Current tools approach the problem with a one-size-fits-all solution.
- We have tools such as DB-Sync or Carp that approach the problem by replicating the ledger in a relational database. Pros: flexible enough to fulfill almost all query requirements. Cons: requires large amounts of storage and compute resources, and complex queries are slow.
- We have tools such as Kupo or Scrolls that approach the problem with an opinionated view of the data that needs to be indexed. Pros: they are lightweight, require few compute resources, and their queries are optimized. Cons: the available queries and supported use-cases are limited.
Ideal Scenario
- Each dApp should have a tailor-made database for its particular use-case.
- This database should only contain the subset of the chain relevant to the dApp.
- The schema of the database should be designed to fit the needs of the dApp.
- All of the boilerplate code and infrastructure plumbing required to sync data from the chain should already be available as an SDK.
- Developers should be able to focus solely on adapting the SDK to the particular requirements of their dApp.
- Querying the data should be easy and flexible. In particular, a GraphQL endpoint should be available for queries from the browser or any of the existing client SDKs (see the query sketch after this list).
- Deployment of the solution should be well documented and as simple as possible (without losing flexibility) so that dApp teams can run their own indexers.
- Optionally, infrastructure service providers should be able to offer “indexing” clusters that can plug-and-play custom code for client dApps.
- Other ecosystems have focused on similar solutions. To mention a few:
- <https://www.subsquid.io/>
- <https://thegraph.com>
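To make the querying story concrete, below is a minimal sketch of what a browser (or Deno) client could send to such a GraphQL endpoint. The endpoint URL, the `handles` table, and the Hasura-style filter syntax are all assumptions for illustration; the real schema would be whatever each dApp's plugin declares.

```ts
// Hypothetical sketch only: endpoint URL, table name, and fields are
// placeholders for whatever schema the dApp's plugin defines.
const ENDPOINT = "https://my-dapp-indexer.example.com/graphql";

const query = `
  query HandleAddress($handle: String!) {
    handles(where: { name: { _eq: $handle } }) {
      name
      address
    }
  }
`;

const res = await fetch(ENDPOINT, {
  method: "POST",
  headers: { "content-type": "application/json" },
  body: JSON.stringify({ query, variables: { handle: "alice" } }),
});

const { data } = await res.json();
console.log(data.handles); // e.g. [{ name: "alice", address: "addr1..." }]
```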
Example Use-Cases
- A DEX could create custom indexes representing their liquidity pools and the history of swap transactions.
- An NFT marketplace could create custom indexes representing available collections, bid prices, total supply and transaction history.
- An Oracle could create custom indexes representing the current value and history of past values for their fact statement catalogs.
- ADA Handles could be represented by a custom index that maps each handle to the address currently holding it (see the sketch after this list).
- A dApp that requires batching of UTxOs could create custom indexes to keep track of the relevant UTxOs required for its batching purposes.
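As an illustration of the ADA Handles use-case above, here is a minimal sketch of what a tailored index could extract from each block. The block model, the plugin contract, and the policy id are all hypothetical placeholders, and the sketch assumes a map call may emit more than one pair per block.

```ts
// Hypothetical "Map" function for an ADA Handle index. All shapes below are
// illustrative; the real block model would be provided by the engine.
interface Asset { policyId: string; name: string }
interface TxOutput { address: string; assets: Asset[] }
interface Tx { outputs: TxOutput[] }
interface Block { transactions: Tx[] }

type KV = { key: string; value: string };

const HANDLE_POLICY_ID = "<ada-handle-policy-id>"; // placeholder value

export function map(block: Block): KV[] {
  const pairs: KV[] = [];
  for (const tx of block.transactions) {
    for (const out of tx.outputs) {
      for (const asset of out.assets) {
        if (asset.policyId === HANDLE_POLICY_ID) {
          // a handle appearing in an output means that address now holds it
          pairs.push({ key: asset.name, value: out.address });
        }
      }
    }
  }
  return pairs;
}
```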
Technical Plan
- Build an open-source indexing engine based on a modified version of Scrolls that supports plugins.
- Plugins will run using either the Deno runtime or a WASM runtime.
- Each plugin will be responsible for providing two functions: “Map” and “Reduce” (see the plugin sketch after this list).
- The “Map” function will take a Cardano block as a parameter and output a custom key/value pair defined by the developer.
- The “Reduce” function will take an array of key/value pairs and aggregate them into a single key/value pair.
- The indexing engine will be responsible for crawling through the history of the chain and executing the Map/Reduce operations for each block.
- The output of the Map/Reduce steps will be persisted in a relational database (TBD, but probably PostgreSQL). The schema of the database will be provided declaratively by the plugin.
- Build an open-source SDK that contains the libraries and toolchains required to build the Map/Reduce plugins. It will contain code-generation utilities to scaffold the required functionality. Developers will only need to fill in the gaps, adapting the logic to their use-case.
- Integrate existing GraphQL libraries that automatically generate APIs from database schemas. This allows developers to query their data using modern and flexible technologies.
- Prepare installation instructions, Docker images and provisioning scripts so that DevOps teams that wish to run the system can do so on their own infrastructure.
- Integrate the indexing engine with the Demeter platform so that developers can have one-click deployments of their custom indexes running in the cloud.
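To tie the plan together, below is a hypothetical end-to-end plugin sketch, assuming the engine loads a Deno module exporting `schema`, `map`, and `reduce`, and that the map step may emit several pairs per block. None of these names or shapes are final; they only illustrate the intended developer experience.

```ts
// Hypothetical plugin contract: every name and shape here is an assumption.
interface TxOutput { address: string; lovelace: bigint }
interface Tx { outputs: TxOutput[] }
interface Block { transactions: Tx[] }

type KV = { key: string; value: bigint };

// Declarative schema: the engine would translate this into the relational
// tables (e.g. PostgreSQL) that back the auto-generated GraphQL API.
export const schema = {
  table: "ada_received_by_address",
  key: { name: "address", type: "text" },
  value: { name: "lovelace", type: "numeric" },
};

// Map: extract key/value pairs from a single block. (A real balance index
// would also account for spent inputs; omitted here for brevity.)
export function map(block: Block): KV[] {
  return block.transactions.flatMap((tx) =>
    tx.outputs.map((out) => ({ key: out.address, value: out.lovelace }))
  );
}

// Reduce: aggregate all pairs sharing the same key into a single pair.
export function reduce(pairs: KV[]): KV {
  return {
    key: pairs[0].key,
    value: pairs.reduce((sum, p) => sum + p.value, 0n),
  };
}
```

Given the `schema` declaration, the GraphQL integration described above could expose the resulting table for queries without any extra code from the developer.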
How does your proposed solution address the challenge and what benefits will this bring to the Cardano ecosystem?
Indexing on-chain data is one of the primary pain points of building dApps on Cardano. By providing a framework that strikes a good balance between flexibility, performance and TCO (Total Cost of Ownership), we allow dApp developers to:
- Write less boilerplate code, which results in fewer bugs
- Spend more time focusing on core business propositions
- Onboard more easily when starting to work in the Cardano ecosystem
The approach proposed here is already being applied in other blockchain ecosystems. By providing a similar solution that has already been validated, we:
- Reduce the developer-experience gap between Cardano and other ecosystems
- Simplify the migration of projects already building in other ecosystems that wish to integrate with Cardano
How do you intend to measure the success of your project?
We consider the following dimensions for measuring the success of the project:
- Activity in the open-source GitHub repository, through metrics such as number of issues, clones, external contributors, stars, and visitors.
- Number of external repositories that include this project as a dependency.
- Number of dApp projects using the indexing engine through the hosted version at Demeter.run
Please describe your plans to share the outputs and results of your project?
Since this is an open-source project, the outputs will be available to any developer in the ecosystem at every step of the development process:
- The latest version of the source code will be available in the GitHub repository.
- Source code changes will be applied through a pull-request process.
- Alpha and Beta versions will be released at every milestone.
Upon reaching the end of the development process, we’ll provide:
- An LTS release version of the engine available to download in multiple formats (binary, Docker image, etc.)
- A CLI (command-line interface) binary to serve as the entry point for developers
- A documentation website with instructions for usage and deployment
- A collection of examples of common indexers to use as starting points
- A tutorial video with a walkthrough of how to create a custom indexer
Our hosted version at Demeter.run will provide:
- A one-click deploy mechanism to host custom indexers
- A web-based monitoring dashboard for your indexers
- A playground UI to execute ad-hoc queries over your data