Please describe your proposed solution.
<u>Making learning from Catalyst projects usable</u>
Our community’s 500+ completed Catalyst projects represent a large corpus of knowledge. However, much of this material remains locked up in closing reports in the form of videos and PDFs, which are unsearchable and undiscoverable; so once a project has presented its close-out, the learning from it is often forgotten. Even where material is held on searchable platforms, there is often no clear link between a specific piece of knowledge or developer-relevant information and the evidence that supports it, and no clear attribution to the person who discovered it. There is also no easy way to see connections (or even interesting contradictions) between what different projects have learnt, and no way for a developer making a new proposal to look at what has already been discovered and build on it. So we often end up either losing knowledge that is useful to the ecosystem, or reinventing it time and again.
The solution we propose is to build a searchable platform where developers and project teams can add individual, atomic learning points from their completed Catalyst projects, expressed in a CNL (controlled natural language) format, and supported by links to the evidence and attribution for each learning point. This will enable people to search for insights on specific topics, or from specific projects or types of projects, and immediately see connections or contradictions.
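To make this concrete, here is a rough sketch of the kind of record the platform would hold for each learning point. It is purely illustrative: the field names, example statement and links are our assumptions for this sketch, not a final schema.

```python
# A sketch of one learning point as the platform might store it.
# Field names and example content are illustrative assumptions, not a schema.
learning_point = {
    "statement": ("Light-wallet users abandon onboarding when seed-phrase "
                  "backup requires a second device."),      # AIDA-style claim
    "project": "Example Wallet UX Research",                # hypothetical project
    "fund": "F9",
    "attribution": "jane.doe",                              # who surfaced the insight
    "evidence": [
        "https://example.org/close-out-report.pdf#page=4",  # placeholder links
        "https://youtu.be/VIDEO_ID?t=312",
    ],
    "keywords": ["wallet", "onboarding", "user-testing"],
}
```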
<u>The theory and research behind our approach</u>
The idea is based on nanopublications: an approach that is popular in life-sciences research, but has yet to reach far beyond that field. Essentially, a nanopublication is “the smallest possible unit of publishable information” - a small, discrete, machine-readable assertion, supported by provenance information (i.e. what the assertion is derived from, and the research evidence that supports it). It’s an excellent way to share knowledge and make ecosystem-wide connections between the things we are building - the only drawback is that a “classic” nanopublication is expressed in RDF notation (a W3C standard originally designed as a data model for metadata). This presents a barrier to adoption, because many people find RDF difficult to work with, and because it may not even be an appropriate way to express some kinds of knowledge.
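For illustration only, here is a sketch (in Python, assuming the rdflib library; the claim and URIs are invented placeholders, not something we propose developers would write) of what the “classic” RDF-based structure looks like, with an assertion graph paired with a provenance graph:

```python
from rdflib import Dataset, Literal, Namespace, URIRef

# A rough sketch of the "classic" nanopublication structure: an assertion
# and its provenance held as named RDF graphs. All URIs are placeholders.
EX = Namespace("http://example.org/")
PROV = Namespace("http://www.w3.org/ns/prov#")

ds = Dataset()
assertion = ds.graph(EX.assertion)
provenance = ds.graph(EX.provenance)

# The assertion graph: a single machine-readable claim.
assertion.add((EX.walletOnboarding, EX.isSlowedBy, EX.seedPhraseBackup))
# The provenance graph: where the claim comes from.
provenance.add((EX.assertion, PROV.wasDerivedFrom,
                URIRef("https://example.org/close-out-report#findings")))

print(ds.serialize(format="trig"))
```

Even a one-sentence claim needs namespaces, URIs and named graphs here, which is exactly the adoption barrier described above.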
So in this proposal, we draw on research by Tobias Kuhn et al. from 2013, which looked at how to broaden the scope of the “nanopublication” concept by using CNLs (controlled natural languages) rather than RDF triples to express a research conclusion, so that people can essentially write their learning points in normal English.
Kuhn’s research developed a concept called the “AIDA statement” (an acronym for “Atomic, Independent, Declarative, Absolute”, and unrelated to the “AIDA” acronym used in the field of marketing!). AIDA is a simple framework for what a nanopublication statement in natural language should look like, and it is the approach we intend to use; a brief illustration follows the list below.
- Atomic: a sentence describing one thought that cannot be further broken down in a practical way
- Independent: a sentence that can stand on its own, without external references like “this effect” or “we”
- Declarative: a complete sentence ending with a full stop that could in theory be either true or false
- Absolute: a sentence describing the core of a claim, ignoring the (un)certainty about its truth and ignoring how it was discovered (no “probably” or “evaluation showed that”); typically in present tense
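As a small worked illustration of the criteria above (the sentences and word lists here are invented; a check like this would only flag obvious issues and would not replace human judgement about atomicity), here is a minimal sketch of how a draft statement could be screened against the Independent, Declarative and Absolute rules:

```python
# A minimal, illustrative AIDA check. The hedge and pronoun lists are
# assumptions for demonstration; they catch only obvious violations.
HEDGES = {"probably", "possibly", "might", "likely", "seems"}
DANGLING_REFERENCES = {"this", "these", "we", "our", "it"}

def aida_warnings(statement: str) -> list[str]:
    """Return warnings for obvious Independent/Declarative/Absolute issues."""
    words = {w.lower().strip(".,;") for w in statement.split()}
    warnings = []
    if not statement.strip().endswith("."):
        warnings.append("Declarative: the statement should end with a full stop.")
    if words & DANGLING_REFERENCES:
        warnings.append("Independent: avoid references like 'this' or 'we'.")
    if words & HEDGES:
        warnings.append("Absolute: drop certainty qualifiers such as 'probably'.")
    return warnings

# A raw report sentence versus an AIDA-style rewrite of it.
raw = "We found that this probably made onboarding faster"
aida = "In-app seed-phrase verification makes wallet onboarding faster."
print(aida_warnings(raw))   # three warnings
print(aida_warnings(aida))  # []
```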
<u>How we’ll put this research into practice</u>
Kuhn found that scientists were fairly easily able to create AIDA statements from the abstracts of published research papers. Based on this, we feel confident that with some supporting "how-to" documentation (which we will create), Catalyst developers will be able to do the same with material from their monthly or closing reports. Once the material is expressed in this atomic, declarative way, it can then be connected to the provenance that supports it - this could be any link, from a heading in a document or a timestamp in a video, to a GitHub commit, a Tweet, a cell in a spreadsheet, or anywhere else a project recorded its discoveries.
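For example (the report excerpt, statement and links below are placeholders, not real project data), one bug fix noted in a monthly report might be captured as a statement plus its provenance:

```python
# Illustrative translation of one report finding into a learning point.
# The excerpt, statement and URLs are placeholders, not real project data.
report_excerpt = (
    "In month 3 we fixed the race condition that was corrupting the local "
    "cache whenever two tabs were open."
)

aida_statement = (
    "Opening the dApp in two browser tabs corrupts the local cache unless "
    "writes are serialised."
)

provenance = [
    "https://example.org/monthly-report-3.pdf#bug-fixes",          # heading in a document
    "https://youtu.be/VIDEO_ID?t=754",                              # timestamp in a video
    "https://github.com/example-org/example-repo/commit/abc1234",   # the fix itself
]
```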
While the “nanopublications” approach has, until now, mainly been used for research-based projects, our initial explorations have shown that it is also very effective for developer projects, especially when, as they commonly do, they have documented their progress and noted bug fixes, results of user testing, and so on.
If a developer enters their material, expressed as AIDA statements, into our database via a dashboard-style frontend, it will then be searchable by project, by keyword, by developer, and so on; connections and similarities between different projects will become visible. We will also be able to see attribution (i.e. which project or person came up with a given insight), which will help us become more aware of where insights are coming from, and help developers working on similar things to find each other. Insights added to the tool might include pitfalls or problems, helping future developers avoid or address them. Note also that additions to the platform would not necessarily be restricted to material from Catalyst project reporting: potentially, developers could also capture the knowledge that surfaces in a meeting, a collaborative document, or a Twitter Space, by translating it into a series of AIDA statements and adding those to the platform.
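As a sketch of the kind of query the dashboard would support (the real platform would use a proper search backend; the field names follow the illustrative record sketched earlier and are assumptions, not a specification), filtering by keyword, project or author could look like this:

```python
from typing import Iterable

# A sketch of the kind of query the dashboard would run. Field names follow
# the illustrative learning-point record sketched earlier in this section.
def search(points: Iterable[dict], *, keyword: str = "",
           project: str = "", author: str = "") -> list[dict]:
    """Return learning points that match every filter supplied."""
    results = []
    for p in points:
        text = p["statement"].lower()
        tags = [k.lower() for k in p.get("keywords", [])]
        if keyword and keyword.lower() not in text and keyword.lower() not in tags:
            continue
        if project and p.get("project") != project:
            continue
        if author and p.get("attribution") != author:
            continue
        results.append(p)
    return results

# e.g. search(all_points, keyword="onboarding")
#      search(all_points, project="Example Wallet UX Research", author="jane.doe")
```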
Once the platform is launched, our intention is for developers to add learning points to it themselves, from their own proposals in F11 onwards. But in order to make the platform usable and valuable from the outset, we will add a corpus of searchable data from completed Catalyst projects from Fund 7 to Fund 10 (200 proposals, with 3 to 5 AIDA statements from each one - roughly 600 to 1,000 statements in total). While this data-population work is not typical of the process of building a tool, in this instance we consider it an essential component of the tool's functionality. If we simply built an empty database and waited for developers to fill it, it could be a long time before the tool was actually useful. Populating it with retrospective data not only lets us test its functionality, but also enables it to be used immediately by developers in the way we intend - i.e. to search for and build on prior learning in Catalyst.
We will then open the dashboard to the community. We'll offer simple documentation, as video and text, showing exactly how to shape one’s material into AIDA statements; 3 awareness sessions in different parts of the developer community to raise interest; and some ongoing user support via DM. Then proposers of finished projects will be incentivised via bounties to add the learning from their F10 and F9 proposals. We’ll also offer small bounties for people to send us a record of any useful and interesting connections they have discovered from searching the database, which we’ll collate on the project GitBook as a way of demonstrating what kind of insights the dashboard is helping the community to uncover.
Essentially, this approach frames the things we do in Catalyst (potentially, everything we do, from proposals, to After TownHalls, to discussions on Telegram or Twitter) as the “experiments” we have always said they are, complete with the research insights that characterise experimentation. It will help clarify and evidence what we're actually learning from projects, and will make that learning searchable and discoverable; it will also surface new insights and previously-unseen connections. Our approach turns development projects into a collaborative research pool that we can all draw on, and embeds attribution and recognition for developers. In this way, it supports Cardano's open-source ethos, by supporting developers to amplify and build on each other’s work.