Problem Overview
Currently, data relating to the Cardano ecosystem is available, but it is spread across multiple sources.
DBSync is the most detailed source of on-chain data, but it is neither easy nor cheap to run, and the data it contains is highly normalized, which makes it difficult to gain insights without deep knowledge of the database schema. Every project that consumes this data must account for the time and effort required to transform it, even though this is a repeatable process that should not have to be redone each time a project needs data.
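To illustrate the point about normalization: even a simple question like "how much value moved in each block" touches several tables. The sketch below uses a simplified, hypothetical schema whose table names (`block`, `tx`, `tx_out`) loosely follow cardano-db-sync; the data and the exact columns are invented for illustration only.

```python
# Toy illustration of why normalized, DBSync-style data needs joins:
# "total output per block" requires walking tx_out -> tx -> block.
# Table/column names loosely follow cardano-db-sync; data is invented.

block = [
    {"id": 1, "block_no": 100},
    {"id": 2, "block_no": 101},
]
tx = [
    {"id": 10, "block_id": 1},
    {"id": 11, "block_id": 1},
    {"id": 12, "block_id": 2},
]
tx_out = [  # values in lovelace (1 ADA = 1,000,000 lovelace)
    {"tx_id": 10, "value": 5_000_000},
    {"tx_id": 11, "value": 2_000_000},
    {"tx_id": 12, "value": 1_000_000},
]

def total_output_per_block(block, tx, tx_out):
    """Join tx_out -> tx -> block and sum output value per block number."""
    tx_to_block = {t["id"]: t["block_id"] for t in tx}
    block_no = {b["id"]: b["block_no"] for b in block}
    totals = {}
    for out in tx_out:
        bno = block_no[tx_to_block[out["tx_id"]]]
        totals[bno] = totals.get(bno, 0) + out["value"]
    return totals

print(total_output_per_block(block, tx, tx_out))
# {100: 7000000, 101: 1000000}
```

A pre-aggregated, analytics-ready data set would ship the joined result directly, so consumers never need to reconstruct these relationships themselves.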
Additionally, there are other sources of data that must be integrated to make the data optimally useful, including:
- Stake pool metadata server (SMASH)
- Extended metadata files, as per the adapools.org standard
- Token data from sources such as the Cardano Foundation token registry, and the NFT marketplaces (CNFT.io, JPG.store, etc.) policyid databases
- Market data
- Social metrics
- Many others…
Additionally, there is a wealth of data in the transaction metadata that is specific to certain use cases and is difficult to access without knowing how to query JSON data structures.
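As a concrete example of such metadata, NFT mints commonly attach JSON under metadata label "721" (the CIP-25 convention). The snippet below is a sketch with an invented policy ID and asset, showing the kind of nested traversal needed to pull out even a single field.

```python
import json

# Hypothetical transaction metadata in the CIP-25 NFT style (label "721").
# Structure: label -> policy ID -> asset name -> attributes.
# The policy ID and asset below are invented for illustration.
raw = """
{
  "721": {
    "policy_abc123": {
      "Canuck001": {"name": "Canuck #1", "image": "ipfs://example", "rarity": "rare"}
    }
  }
}
"""

def nft_names(metadata_json):
    """Extract every asset's display name from CIP-25-style metadata."""
    meta = json.loads(metadata_json)
    names = []
    for policy_assets in meta.get("721", {}).values():
        for asset in policy_assets.values():
            names.append(asset.get("name"))
    return names

print(nft_names(raw))  # ['Canuck #1']
```

Curated data sets can flatten structures like this into tabular columns so that consumers are not required to write JSON-traversal code at all.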
There are currently several excellent sites offering pool-specific data, such as adapools.org and pooltool.io, as well as several block explorers; however, none of these sites provide full historical data, custom queries, or data modelled specifically for analysis or machine learning use cases.
Solution
We propose to build the initial MVP of a community data hub that will provide consolidated, analytics-ready data to the Cardano ecosystem. We have already begun building the initial data sets (on-chain data and stake pool data), which will be integrated into the single data hub platform.
At a minimum, there will be data available from DBSync and the other sources listed above, modelled for various analytics activities. The DBSync data will include additional aggregated views such as the ones in the following repository: <https://github.com/cardanocanuck/db-sync-queries>
Additionally, we will continue to add special-purpose datasets for various domains within the Cardano ecosystem. We have submitted, or will be submitting, several smaller proposals for specialized datasets to be modelled and developed, such as:
- On-chain Analytics - Transactions, volume, rewards, etc. (funded in F6 and wrapping up)
- Pool Analytics - Machine Learning ready dataset on historical pool performance (funded in F7 and underway)
- NFT Analytics - information about several aspects of NFT projects
- Smart Contract Analytics
The initial MVP Data Hub will allow scheduled CSV data sets to be downloaded. In the future, the range of sharing methods will be expanded to include:
- API access
- Web based data explorer
- Community available Google Sheets
- Direct cloud database access
- Direct data sharing (Azure / Snowflake)
We will prioritize free community access methods, but some access methods such as direct database access or data sharing may be monetized with a subscription model. The purpose of monetizing premium aspects of the data hub is to fund future ongoing development and enhancement.
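To give a sense of how the initial CSV delivery method might be consumed, the sketch below reads a small hypothetical sample of a pool-analytics export; the actual column names and file layout are not yet defined by this proposal.

```python
import csv
import io

# Hypothetical sample of a downloadable pool-analytics CSV.
# Column names here are assumptions, not a committed schema.
sample = """epoch,pool_ticker,delegators,blocks_minted
350,CANUK,120,3
351,CANUK,125,2
"""

rows = list(csv.DictReader(io.StringIO(sample)))
total_blocks = sum(int(r["blocks_minted"]) for r in rows)
print(total_blocks)  # 5
```

Because the data arrives pre-modelled, a consumer can go straight from download to analysis with standard tooling, with no schema archaeology beforehand.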
This proposal is for the core functionality and backend infrastructure development of this community hub.
Project Plan
The requested funds will cover the first 3 months of development of the platform as well as the first 6 months of running costs.
We propose to follow a hybrid waterfall/agile methodology, starting with upfront architecture, design, and feature planning, followed by 4 sprints of feature development. The project plan will be updated throughout this Catalyst process as we find team members and refine our idea and feature set.
See attached diagram.
Budget
The budget we are requesting will fund the first 3 months of development, and 6 months of infrastructure costs.
The approximate budget breakdown by role is as follows:
- Architect / Senior Dev - 100h x $75 = $7,500
- Graphic Designer - 60h x $75 = $4,500
- Web Developer - 80h x $75 = $6,000
- Data Engineer - 280h x $75 = $21,000
- Project Manager - 80h x $75 = $6,000
- QA - 40h x $75 = $3,000
Total development costs: $48,000
Infrastructure costs estimated at $2,000/month x 6 months = $12,000
Total Budget: $60,000
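As a quick sanity check, the line items above can be verified programmatically; the role names and hours mirror the breakdown in this section, all billed at the stated $75/h rate.

```python
# Cross-check of the budget figures above (all roles billed at $75/h).
RATE = 75
hours = {
    "Architect / Senior Dev": 100,
    "Graphic Designer": 60,
    "Web Developer": 80,
    "Data Engineer": 280,
    "Project Manager": 80,
    "QA": 40,
}
development = sum(h * RATE for h in hours.values())
infrastructure = 2000 * 6  # $2,000/month for 6 months

assert development == 48_000        # matches total development costs
assert infrastructure == 12_000     # matches infrastructure costs
assert development + infrastructure == 60_000  # matches total budget
```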
Core Team Experience
Michael Stewart
- 17+ years of software development and architecture experience.
- 10+ years focused in the data and analytics space
- Led the development team of a boutique data/analytics firm, designing and architecting cloud-based data warehouse solutions for Fortune 500 companies
- Member of the Cardano community since 2017
- Co-Founder of Cardano Canucks stake pool and Canuckz NFTs
- Co-Founder of Canucks Publishing NFT Minting Platform and Service
- Co-Founder of CCSPA (Canadian Cardano Stake Pool Association)
Vivek Nankissoor
- 15+ years of experience in database requirements, design and development
- Established and grew web analytics, marketing automation and QA practices
- Engaged in marketing, data, and analytics strategy development with enterprise retail, CPG, banking, automotive, pharma, fintech, and other organizations
- Co-Founder of Cardano Canucks stake pool and Canuckz NFTs
- Co-Founder of CCSPA (Canadian Cardano Stake Pool Association)
- Co-Founder of Canucks Publishing NFT Minting Platform and Service
- Participant in community work such as financial literacy relating to crypto and raising awareness with various investment groups
This solution will address the challenge by providing a starting point for data and analytics projects within the developer ecosystem, removing the overhead of time and effort for creating a usable data set. In addition, developers will not need to ramp up on the nuances of raw data sets (e.g., structures and relationships within DBSync). Instead, they can start with curated data that lends itself to easy integration within developer applications.
Also, this solution will allow for previously funded projects to be integrated into a single place for the aggregation and distribution of curated data sets:
- on-chain dataset
- stake pool dataset
The risks are:
- Resource management - ensuring that the resources assigned have the proper skills and experience to complete the project
- Strict adherence to timelines - ensuring that the project stays on time and on budget
- Significant changes to data sources - the source data schema, format, etc. may change during the project. If this occurs, timelines may be extended. This is assumed to be a low risk, as past changes have been minor and accompanied by good notice and documentation