Problem Overview
Currently, data relating to the Cardano ecosystem is available, but it is spread across multiple sources.
DBSync is the most detailed source of on-chain data, but it is neither easy nor cheap to run, and the data it contains is highly normalized, which makes it difficult to gain insights without deep knowledge of the database schema. Every project that consumes this data must account for the time and effort required to transform it, even though this is a repeatable process that should not have to be redone each time a project needs data.
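To illustrate the point about normalization: even a simple question like "how much value moved in each block" touches several tables. The sketch below uses a simplified, hypothetical schema whose table names (`block`, `tx`, `tx_out`) loosely follow cardano-db-sync; the data and the exact columns are invented for illustration only.

```python
# Toy illustration of why normalized, DBSync-style data needs joins:
# "total output per block" requires walking tx_out -> tx -> block.
# Table/column names loosely follow cardano-db-sync; data is invented.

block = [
    {"id": 1, "block_no": 100},
    {"id": 2, "block_no": 101},
]
tx = [
    {"id": 10, "block_id": 1},
    {"id": 11, "block_id": 1},
    {"id": 12, "block_id": 2},
]
tx_out = [  # values in lovelace (1 ADA = 1,000,000 lovelace)
    {"tx_id": 10, "value": 5_000_000},
    {"tx_id": 11, "value": 2_000_000},
    {"tx_id": 12, "value": 1_000_000},
]

def total_output_per_block(block, tx, tx_out):
    """Join tx_out -> tx -> block and sum output value per block number."""
    tx_to_block = {t["id"]: t["block_id"] for t in tx}
    block_no = {b["id"]: b["block_no"] for b in block}
    totals = {}
    for out in tx_out:
        bno = block_no[tx_to_block[out["tx_id"]]]
        totals[bno] = totals.get(bno, 0) + out["value"]
    return totals

print(total_output_per_block(block, tx, tx_out))
# {100: 7000000, 101: 1000000}
```

A pre-aggregated, analytics-ready data set would ship the joined result directly, so consumers never need to reconstruct these relationships themselves.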
Additionally, there are other sources of data that must be integrated to make the data optimally useful, including:
- Stake pool metadata server (SMASH)
- Extended metadata files, as per the adapools.org standard
- Token data from sources such as the Cardano Foundation token registry, and the NFT marketplaces (CNFT.io, JPG.store, etc.) policyid databases
- Market data
- Social metrics
- Many others…
Additionally, there is a wealth of data in the transaction metadata that is specific to certain use cases and is difficult to access without knowing how to query JSON data structures.
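As a concrete example of such metadata, NFT mints commonly attach JSON under metadata label "721" (the CIP-25 convention). The snippet below is a sketch with an invented policy ID and asset, showing the kind of nested traversal needed to pull out even a single field.

```python
import json

# Hypothetical transaction metadata in the CIP-25 NFT style (label "721").
# Structure: label -> policy ID -> asset name -> attributes.
# The policy ID and asset below are invented for illustration.
raw = """
{
  "721": {
    "policy_abc123": {
      "Canuck001": {"name": "Canuck #1", "image": "ipfs://example", "rarity": "rare"}
    }
  }
}
"""

def nft_names(metadata_json):
    """Extract every asset's display name from CIP-25-style metadata."""
    meta = json.loads(metadata_json)
    names = []
    for policy_assets in meta.get("721", {}).values():
        for asset in policy_assets.values():
            names.append(asset.get("name"))
    return names

print(nft_names(raw))  # ['Canuck #1']
```

Curated data sets can flatten structures like this into tabular columns so that consumers are not required to write JSON-traversal code at all.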
There are currently several excellent sites offering pool-specific data, such as adapools.org and pooltool.io, as well as several block explorers; however, none of these sites provide full historical data, custom queries, or data modelled specifically for analysis or machine learning use cases.
Solution
We propose to build the initial MVP of a community data hub that will provide consolidated, analytics-ready data to the Cardano ecosystem. We have already begun building the initial data sets (on-chain data and stake pool data), which will be integrated into the single data hub platform.
At a minimum, there will be data available from DBSync and the other sources listed above, modelled for various analytics activities. The DBSync data will include additional aggregated views such as the ones in the following repository: <https://github.com/cardanocanuck/db-sync-queries>
Additionally, we will continue to add special-purpose datasets for various domains within the Cardano ecosystem. We have submitted, or will be submitting, several smaller proposals for specialized datasets to be modelled and developed, such as:
- On-chain Analytics - Transactions, volume, rewards, etc. (funded in F6 and wrapping up)
- Pool Analytics - Machine Learning ready dataset on historical pool performance (funded in F7 and underway)
- NFT Analytics - information about several aspects of NFT projects
- Smart Contract Analytics
The initial MVP Data Hub will allow scheduled CSV data sets to be downloaded. In the future, the range of sharing methods will be expanded to include:
- API access
- Web based data explorer
- Community available Google Sheets
- Direct cloud database access
- Direct data sharing (Azure / Snowflake)
We will prioritize free community access methods, but some access methods such as direct database access or data sharing may be monetized with a subscription model. The purpose of monetizing premium aspects of the data hub is to fund future ongoing development and enhancement.
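To give a sense of how the initial CSV delivery method might be consumed, the sketch below reads a small hypothetical sample of a pool-analytics export; the actual column names and file layout are not yet defined by this proposal.

```python
import csv
import io

# Hypothetical sample of a downloadable pool-analytics CSV.
# Column names here are assumptions, not a committed schema.
sample = """epoch,pool_ticker,delegators,blocks_minted
350,CANUK,120,3
351,CANUK,125,2
"""

rows = list(csv.DictReader(io.StringIO(sample)))
total_blocks = sum(int(r["blocks_minted"]) for r in rows)
print(total_blocks)  # 5
```

Because the data arrives pre-modelled, a consumer can go straight from download to analysis with standard tooling, with no schema archaeology beforehand.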
This proposal is for the core functionality and backend infrastructure development of this community hub.
Project Plan
The requested funds will cover the first 3 months of development of the platform as well as the first 6 months of running costs.
We propose to follow a hybrid waterfall/agile methodology, starting with upfront architecture, design, and feature planning, followed by 4 sprints of feature development. The project plan will be updated throughout this Catalyst process as we find team members and refine our idea and feature set.
See attached diagram.
Budget
The budget we are requesting will fund the first 3 months of development, and 6 months of infrastructure costs.
The approximate budget breakdown by role is as follows:
- Architect / Senior Dev - 100h x $75 = $7,500
- Graphic Designer - 60h x $75 = $4,500
- Web Developer - 80h x $75 = $6,000
- Data Engineer - 280h x $75 = $21,000
- Project Manager - 80h x $75 = $6,000
- QA - 40h x $75 = $3,000
Total development costs: $48,000
Infrastructure costs estimated at $2,000/month x 6 months = $12,000
Total Budget: $60,000
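As a quick sanity check, the line items above can be verified programmatically; the role names and hours mirror the breakdown in this section, all billed at the stated $75/h rate.

```python
# Cross-check of the budget figures above (all roles billed at $75/h).
RATE = 75
hours = {
    "Architect / Senior Dev": 100,
    "Graphic Designer": 60,
    "Web Developer": 80,
    "Data Engineer": 280,
    "Project Manager": 80,
    "QA": 40,
}
development = sum(h * RATE for h in hours.values())
infrastructure = 2000 * 6  # $2,000/month for 6 months

assert development == 48_000        # matches total development costs
assert infrastructure == 12_000     # matches infrastructure costs
assert development + infrastructure == 60_000  # matches total budget
```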
Core Team Experience
Michael Stewart
- 17+ years of software development and architecture experience.
- 10+ years focused in the data and analytics space
- Led the development team of a boutique data/analytics firm, designing and architecting cloud-based data warehouse solutions for Fortune 500 companies
- Member of the Cardano community since 2017
- Co-Founder of Cardano Canucks stake pool and Canuckz NFTs
- Co-Founder of Canucks Publishing NFT Minting Platform and Service
- Co-Founder of CCSPA (Canadian Cardano Stake Pool Association)
Vivek Nankissoor
- 15+ years of experience in database requirements, design and development
- Established and grew web analytics, marketing automation and QA practices
- Engaged in marketing, data, and analytics strategy development with enterprise retail, CPG, banking, automotive, pharma, fintech, and other organizations
- Co-Founder of Cardano Canucks stake pool and Canuckz NFTs
- Co-Founder of CCSPA (Canadian Cardano Stake Pool Association)
- Co-Founder of Canucks Publishing NFT Minting Platform and Service
- Participant in community work such as financial literacy relating to crypto and raising awareness with various investment groups
This solution will address the challenge by providing a starting point for data and analytics projects within the developer ecosystem, removing the overhead of time and effort for creating a usable data set. In addition, developers will not need to ramp up on the nuances of raw data sets (e.g., structures and relationships within DBSync). Instead, they can start with curated data that lends itself to easy integration within developer applications.
Also, this solution will allow for previously funded projects to be integrated into a single place for the aggregation and distribution of curated data sets:
- on-chain dataset
- stake pool dataset
The risks are:
- Resource management - ensuring that the resources assigned have the proper skills and experience to complete the project
- Strict adherence to timelines - ensuring that the project stays on time and on budget
- Significant changes to data sources - the source data schema, format, etc. may change during the project. If this occurs, timelines may be extended. This is assumed to be a low risk, as past changes have been minor and accompanied by good notice and documentation