Nanopublications Dashboard: a searchable natural language tool for atomic knowledge-sharing

not approved

Current Project Status: Unfunded

Amount Received: ₳0
Amount Requested: ₳82,800
Percentage Received: 0.00%

Solution

Build a dashboard to collate clear, atomic, searchable learning points from Catalyst projects (similar to nanopublications, but more accessible); and provide training in how to submit material to it.

Problem

Learning-points from Catalyst projects are hard to find - they’re often poorly-evidenced, and buried in unsearchable PDFs / videos. Connections between different projects’ discoveries are opaque.

Impact

Please describe your proposed solution.

Making learning from Catalyst projects usable

Our community’s 500 completed Catalyst projects represent a large corpus of knowledge. However, much of this material remains locked up in closeout reports in the form of videos and PDFs, which are unsearchable and undiscoverable; so once a project has presented its close-out, the learning from it is often forgotten. Even where material is held on searchable platforms, it often contains no clear link between a specific piece of knowledge and the evidence that supports it, and no clear attribution to the person who discovered it. There is also no easy way to see connections (or even interesting contradictions) between what different projects have learnt; and no way for a new proposer to look at what has been discovered already and build on it. So we often end up either losing knowledge that is useful to the ecosystem, or reinventing it time and again.

The solution we propose is to build a platform where project teams can add individual, atomic learning points from their completed Catalyst projects, expressed in a CNL (controlled natural language), and supported by links to the evidence and attribution for each learning point. We’ll also offer training in exactly how to shape your material into these individual learning points; and we’ll populate the database with learning from completed Catalyst projects from Fund 3 to Fund 8, so that we can a) test the database and the process, and b) provide some material that can be searched, to test the database’s ability to surface the hitherto-hidden connections between projects.

The theory and research behind our approach

The idea is based on nanopublications: an approach that is popular in life-sciences research, but has yet to reach very far beyond that field. Essentially, a nanopublication is “the smallest possible unit of publishable information” - a small, discrete, machine-readable assertion, supported by provenance information (i.e. what the assertion is derived from, and the research evidence that supports it). It’s an excellent way to share research, but a nanopublication is expressed in RDF (Resource Description Framework, a W3C standard originally designed as a data model for metadata). This presents a barrier to adoption, because many people find RDF difficult to write, and it may not even be appropriate for expressing some kinds of knowledge.

So in this proposal, we draw on research by Tobias Kuhn et al. (2013), which looked at how to broaden the scope of the “nanopublication” concept by using a CNL (controlled natural language) rather than RDF triples to express a research conclusion, so that people can essentially write their learning-points in normal English. Kuhn’s research developed a concept called the “AIDA statement” (an acronym for “Atomic, Independent, Declarative, Absolute”, and nothing to do with the “AIDA” acronym used in the field of marketing!) - a simple framework for what a nanopublication statement in natural language should look like. This is the approach we intend to use.

Atomic: a sentence describing one thought that cannot be further broken down in a practical way
Independent: a sentence that can stand on its own, without external references like “this effect” or “we”
Declarative: a complete sentence ending with a full stop that could in theory be either true or false
Absolute: a sentence describing the core of a claim, ignoring the (un)certainty about its truth and ignoring how it was discovered (no “probably” or “evaluation showed that”); typically in present tense
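As a rough illustration of how these four criteria might be screened automatically, here is a minimal Python sketch. It is not part of the proposal or of Kuhn's method: the function names and the marker lists are hypothetical, seeded only with the example phrases quoted in the criteria above, and a human reviewer would still need to judge each statement.

```python
import re

# Hypothetical marker lists, seeded only with the examples quoted in the criteria.
DEPENDENT_MARKERS = ("this effect", "we")
HEDGING_MARKERS = ("probably", "evaluation showed that")


def contains_phrase(text: str, phrase: str) -> bool:
    """Case-insensitive whole-word match for a word or phrase."""
    pattern = r"\b" + re.escape(phrase) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def check_aida(statement: str) -> list[str]:
    """Return heuristic warnings; an empty list means no check fired."""
    text = statement.strip()
    warnings = []
    # Atomic: more than one sentence suggests more than one thought.
    if len(re.findall(r"[.!?](?=\s|$)", text)) > 1:
        warnings.append("Atomic: looks like more than one sentence.")
    # Independent: no references that only make sense in context.
    if any(contains_phrase(text, marker) for marker in DEPENDENT_MARKERS):
        warnings.append("Independent: contains an external reference.")
    # Declarative: a complete sentence ending with a full stop.
    if not text.endswith("."):
        warnings.append("Declarative: should end with a full stop.")
    # Absolute: no hedging and no account of how the claim was found.
    if any(contains_phrase(text, marker) for marker in HEDGING_MARKERS):
        warnings.append("Absolute: contains hedging or discovery wording.")
    return warnings


# Invented example statements, for illustration only.
print(check_aida("Small bounties increase the number of reports that proposers submit."))
print(check_aida("We probably saw that this effect holds"))
```

The first call returns an empty list, while the second flags the Independent, Declarative and Absolute criteria. A check like this can only catch surface patterns; whether a statement is genuinely atomic is a judgment call.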

How we’ll put this research into practice

Kuhn found that scientists were fairly easily able to create AIDA statements from the abstracts of published research papers. Based on this, we feel confident that with a little training (which we will provide), Catalyst proposers will be able to do the same with material from their monthly or closing reports.

Note that while this process might most obviously be a fit for research-based projects, our initial explorations have shown that it is also very effective for developer projects, especially when (as they commonly do) they have documented their progress, and noted bug-fixes, results of user-testing, etc. It also works very well for education and community engagement projects, which usually use some degree of reflective practice.

Once the material is expressed in this atomic, declarative way, it can then be connected to the provenance that supports it - this could be any link, from a heading in a document or a timestamp in a video, to a GitHub commit, a Tweet, a Miro board, a cell in a spreadsheet, or anywhere else a project recorded its discoveries.
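One way to picture a learning point connected to its provenance is as a small record. The sketch below is an assumption about how such a record could look, not the dashboard's actual schema (which the proposal leaves to Milestone 1): the class names, field names and example.org links are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Provenance:
    """A link to evidence: any URL, plus a short note on what it points at."""
    url: str
    note: str = ""


@dataclass
class LearningPoint:
    """One AIDA statement with its attribution and supporting evidence."""
    statement: str
    project: str
    contributor: str
    provenance: list[Provenance] = field(default_factory=list)

    def is_evidenced(self) -> bool:
        """True when at least one provenance link is attached."""
        return bool(self.provenance)


# Invented example; the URLs are placeholders, not real project links.
point = LearningPoint(
    statement="Small bounties increase the number of reports that proposers submit.",
    project="Example Project",
    contributor="Example Contributor",
    provenance=[
        Provenance("https://example.org/closeout-report#findings", "heading in a document"),
        Provenance("https://example.org/video?t=754", "timestamp in a video"),
    ],
)
print(point.is_evidenced())
```

Keeping provenance as a list of plain links reflects the point made above that evidence can live anywhere a project recorded its discoveries.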

Entering this material into our database via a dashboard-style frontend means it will be searchable by project, by keyword, by person, etc; so connections and similarities between research conclusions from different proposals will become visible. We will also be able to see attribution (i.e. which project or person came up with this learning?), which will help us become more aware of where insights are coming from. Also, note that additions to the platform would not necessarily have to be restricted to material about Catalyst project reporting. Potentially, the community could also add the knowledge that surfaces in a meeting, a collaborative document, or a Twitter space, by translating it into a series of AIDA statements and adding it.
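The kind of search described here can be sketched with a simple in-memory filter. This is only an illustration under stated assumptions: the record layout, the `search` and `shared_keywords` functions, and the sample data are all hypothetical, and a real dashboard would query a database instead.

```python
from collections import defaultdict

# Invented sample records, for illustration only.
RECORDS = [
    {"statement": "Small bounties increase report submissions.",
     "project": "Project A", "person": "Alice", "keywords": {"bounties", "reporting"}},
    {"statement": "Bounties attract first-time contributors.",
     "project": "Project B", "person": "Bob", "keywords": {"bounties", "onboarding"}},
    {"statement": "Live training sessions shorten onboarding time.",
     "project": "Project A", "person": "Alice", "keywords": {"onboarding"}},
]


def search(records, project=None, person=None, keyword=None):
    """Return records matching every filter that was supplied."""
    results = []
    for record in records:
        if project is not None and record["project"] != project:
            continue
        if person is not None and record["person"] != person:
            continue
        if keyword is not None and keyword not in record["keywords"]:
            continue
        results.append(record)
    return results


def shared_keywords(records):
    """Map each keyword to the projects using it, keeping cross-project ones."""
    projects_by_keyword = defaultdict(set)
    for record in records:
        for keyword in record["keywords"]:
            projects_by_keyword[keyword].add(record["project"])
    return {
        keyword: sorted(projects)
        for keyword, projects in projects_by_keyword.items()
        if len(projects) > 1
    }


print(len(search(RECORDS, keyword="bounties")))
print(shared_keywords(RECORDS))
```

Here `shared_keywords` is a very crude stand-in for surfacing connections between projects; it only shows that once statements share a structure, such links become cheap to compute.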

To make the dashboard usable and valuable from the start, this proposal includes a process for our team to populate it with data from finished Catalyst proposals from F3 to F8 and run some test searches, thereby testing that the database is working as intended, and refining the methodology before trying to teach it to others. This work will cover 200 proposals, creating 3 to 5 AIDA statements from each one.

We will then open the dashboard to the community. We'll offer training and some ongoing support; and then proposers of finished projects will be incentivised via Dework bounties to add the learning from their F8 and F9 proposals. We’ll also offer small bounties for people to send us a record of any useful and interesting connections they have discovered from searching the database, which we’ll collate on the project GitBook as a way of demonstrating what kind of insights the dashboard is helping the community to uncover.

So our process will be:

1) Build the platform; meanwhile, the data-population team prepares AIDA statements from completed Catalyst projects.
2) Add prepared data to the dashboard and run test searches.
3) Train the community in how to use it: create a short training video and a text-based learning resource, and run some training sessions.
4) Offer bounties for community members to add their projects; and smaller bounties for people to record useful connections and insights that they have discovered from searching the database.

Essentially, this approach frames the things we do in Catalyst (potentially, everything we do, from proposals, to After TownHalls, to discussions on Telegram or Twitter) as the “experiments” we have always said they are. It will help clarify and evidence what the community has learnt from projects and conversations, and make that learning searchable and discoverable; it will also surface new insights and previously-unseen connections. Our approach is adaptable to both qualitative and quantitative insights, and it turns all our discussions and ideas into a collaborative research pool that we can all draw on.

How does your proposed solution address the challenge and what benefits will this bring to the Cardano ecosystem?

Our proposal addresses the core question of the challenge by offering both a tool, and community-led research and documentation to support it, to enhance the developer ecosystem. In the words of the challenge, we are bringing “standards, resources [and] documentation that bring … novel innovation to the ecosystem”. This proposal’s adaptation of the nanopublications standard will make it easier to develop on Cardano, by making it easier to research and build on existing knowledge.

Developers on Cardano, particularly newcomers, will be able to use the Nanopublications Dashboard to find out what has already been created in past projects, and iterate on it, very much in the way that traditional nanopublications help academics to discover and build on existing research. This helps Cardano developers to amplify existing discoveries, rather than reinventing the wheel. The Dashboard also enables developers to log insights from their own work, facilitating proper attribution, and helping them find and collaborate with others who are working on similar ideas. Insights added from completed projects might include pitfalls or problems, thus helping future developers avoid or address them. Overall, the proposal offers an approach that can help the developer ecosystem become more iterative and more collaborative.

The benefits to Cardano as a whole include helping ensure that we don’t lose or forget what we learn (whether from Catalyst funded proposals or anything else), and that we can continue to access and draw on it longterm. It will help us see the connections between different projects’ discoveries; it will also help us see any points of disagreement between proposals on similar topics, which could provide fruitful avenues for further exploration. In short, it forms part of our community's memory.

The Dashboard also has the potential to help Cardano with auditability and assessment of impact. It will enable us to audit core learning from a proposal more easily, and track exactly how the team derived that learning. Also, since the process of framing one’s work in the way required by the Dashboard will tend to emphasise conclusions and insights, this encourages us to look at the effects of what we do, and will help us as a community to see the impact that is being made across Catalyst on particular topics.

In the long term, the team hopes to integrate AI tooling into this concept, using LLMs both to create AIDA statements and to compare them/discover similarity. In order to enable this kind of work (which could have far-reaching beneficial effects for Catalyst and Cardano) we need to build this initial proof-of-concept and engage the community with how to use it.

How do you intend to measure the success of your project?

Number of GitHub commits during the build process
Number of AIDA statements created during the data-population process
Qualitative feedback from data population team on ease/difficulty of creating AIDA statements
Number of training sessions held with the community (we aim for 10)
Number of pageviews of our training material on GitBook
Qualitative feedback from training sessions on usefulness of the approach and how easy/difficult it is to use
Number of people claiming bounties to add material to the database
Amount and quality of material added
Number of people claiming bounties to report insights from searches

Please describe your plans to share the outputs and results of your project?

The dashboard build process will be fully open-source, and trackable on GitHub.

Our initial “mini-whitepaper” on our proposed methodology, plus our documentation of the data population team’s working process, and the training materials we create, will all be publicly available on the project's GitBook. We will share them widely in the Catalyst community via Discord, Telegram, Twitter, and the Cardano forum.

Our 10 training sessions, and the process of publicising our Dework bounties to the community, will enable us to share the dashboard and its underlying ideas widely.

Capability/ Feasibility

What is your capability to deliver your project with high levels of trust and accountability?

All the team members are skilled and experienced members of the Catalyst community, and all have considerable experience of working in transparent and open-source ways via GitHub, GitBook, and Dework, providing a trackable, accountable and trustworthy audit trail. See, for example:

Community Governance Oversight <https://quality-assurance-dao.gitbook.io/community-governance-oversight/>
Catalyst Voter Tool <https://cardanocataly.st/voter-tool/#/> and <https://github.com/Project-Catalyst/voter-tool>

Our proposal not only includes thorough documentation and sharing; it also rests on substantial community engagement with, and testing of, both the Nanopublications Dashboard and the nanopublication concept itself. This will offer a high level of trust and accountability, since the community itself tests the validity of the work.

Building a tool for the community is not the end of the process - often, the follow-up work of building a user base and increasing engagement is overlooked. It is for this reason that this relatively simple platform requires additional resources and thinking to populate the dashboard and bring the community along with training and engagement activities.

What are the main goals for the project and how will you validate if your approach is feasible?

The goals of the proposal are:

1) initial mini-whitepaper to outline our proposed methodology. We will validate that our approach is feasible by comparing our approach with existing research in the nanopublications field; testing our methodology (including timings) on some existing proposals; and drawing on our previous experience of building dashboard-type tools. The community can validate that this part of the work has been done by reading the mini-whitepaper itself. NOTE: this background work is already mostly done, and will just need to be written up in a “mini-whitepaper”.

2) build the dashboard. We will validate that our approach is feasible by testing at key stages in the build process, and checking that the build aligns with the mini-whitepaper on how the dashboard will be used. The community can validate that the work has been done by GitHub commits and documentation.

3) test the dashboard and the process, by populating it with data from finished proposals from previous Funds. This work can be validated because it will be viewable and searchable in the dashboard itself. Also, this process validates and tests our ideas about how the dashboard will work, and irons out bugs before we open it to the community.

4) offer training. We plan to deliver 10 sessions, in (for example) Swarm sessions, Gimbalabs Playgrounds, and various Town Halls in the ecosystem. We will validate that our approach to this is feasible by asking for feedback from training session attendees, to check that they feel confident in what they have learnt and found it useful; by providing some ongoing support for any questions that arise after a session; and by modifying future sessions in response to input if needed. The community will be able to validate that the work has been done via a list of attendees, and a record of the training sessions on Miro boards.

5) open the dashboard for the community to use. We can validate that our approach to this was effective by monitoring how many people take up bounties to add their material to the database; evaluating the kinds of insights that are emerging when people conduct searches; and monitoring the quality of the material that is being added. The community can validate that this work was done by looking at additions to the dashboard, and viewing the takeup of bounties on Dework.

Please provide a detailed breakdown of your project’s milestones and each of the main tasks or activities to reach the milestone plus the expected timeline for the delivery.

Milestone 1 (2 weeks after getting funded): Setup, scoping, and initial whitepaper. 12% of budget.

Data population team meet to scope exact methodology (including any variations from the original process created by Kuhn et al);
Development team meet to define exact structure of database
Initial payment of 20% to AIM developer team
Mini-whitepaper outlining our approach

Milestone 2 (2 months after getting funded): building and data collection. 30% of budget 

Dashboard build. 2nd payment to developer team, 20%.
Data population team create AIDA statements, supported by provenance, for 200 past Catalyst proposals, ready to import.
Data population team and build team will be in communication throughout, to ensure that their work dovetails.

Milestone 3 (3 months after getting funded): data population, and training plan. 28% of budget

Data from past proposals added to the database
Any necessary tweaks to the dashboard are made
Final payment to developer team, 60%
Create training materials (“how to” video; text-based “how to”; session plan for live sessions).

Milestone 4 (4 ½ months after getting funded): community engagement and bounties. 14% of budget

Training sessions delivered at 10 places in Catalyst (several Town Halls, Swarm session, Gimbalabs Playground, Catalyst Coordinator meeting, etc)
Dework Bounties created and published for proposers to add their project data to the Dashboard
Ongoing support offered via DM or Zoom to anyone wanting to take up a bounty

Milestone 5 (6 months after getting funded): close-out and learnings. 16% of budget

Bounties paid to c. 80 people adding project data to the dashboard
Ongoing maintenance payment to developer team
Project officially closed
Key learning points from the project added to the Dashboard

Post-funding:

The dashboard will be maintained for a year.
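The milestone budget shares above can be turned into ADA amounts with a short calculation. This sketch assumes each percentage applies to the total request of 82,800 ADA; the variable and function names are illustrative only.

```python
TOTAL_ADA = 82_800  # total amount requested in the proposal

# Budget share per milestone, in percent, as listed above.
MILESTONE_SHARES = {
    "Milestone 1": 12,
    "Milestone 2": 30,
    "Milestone 3": 28,
    "Milestone 4": 14,
    "Milestone 5": 16,
}


def milestone_amounts(total: int, shares: dict[str, int]) -> dict[str, int]:
    """Convert percentage shares into ADA, checking they cover the whole budget."""
    if sum(shares.values()) != 100:
        raise ValueError("Milestone shares must add up to 100%.")
    # Integer arithmetic is exact here because each product divides evenly by 100.
    return {name: total * pct // 100 for name, pct in shares.items()}


amounts = milestone_amounts(TOTAL_ADA, MILESTONE_SHARES)
for name, ada in amounts.items():
    print(f"{name}: {ada:,} ADA")
```

Under that assumption the five milestones come to 9,936, 24,840, 23,184, 11,592 and 13,248 ADA respectively, which together account for the full 82,800 ADA.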

Please describe the deliverables, outputs and intended outcomes of each milestone.

Milestone 1 (2 weeks after getting funded): Setup, scoping, and initial whitepaper. 12% of budget

Deliverables: “Mini-whitepaper” defining methodology that will be used for adding material to the database, and the database’s exact structure

Outcomes: The team will know exactly what we are working to.

Milestone 2 (2 months after getting funded): building and data collection. 30% of budget.

Deliverables: a working Dashboard; a spreadsheet of AIDA statements and provenance from c. 200 past Catalyst proposals

Outcomes: We will have refined the details of the process by practice, and we will have a dashboard that works with that process.

Milestone 3 (3 months after getting funded): data population and training plan. 28% of budget

Deliverables: a database populated with material from past proposals; a plan for a training session; “how to” training video and text

Outcomes: The dashboard will now have material that can be searched, so that people can begin to see its usefulness; the team will be ready to deliver training to encourage people to use it.

Milestone 4 (4 ½ months after getting funded): community engagement and bounties. 14% of budget

Deliverables: Record of 10 training sessions delivered, and training materials widely shared; Dework bounties created and widely publicised

Outcomes: A wide range of people will be ready to add their F9 proposal data to the dashboard

Milestone 5 (6 months after getting funded): close-out and learnings. 16% of budget

Deliverables: Data from c. 50 new projects added to the dashboard via bounties, including data from this project itself. Close-out report and video

Outcomes: project successfully closed, with ideas for further development.

Post-funding:

Deliverables: bug fixes and minor feature additions for 1 year

Outcomes: the platform will continue to be usable, and to develop in response to input from users.

Resources & Value For Money

Please provide a detailed budget breakdown of the proposed work and resources.

Dashboard: 34,400 ADA

Initial build: 80 hours @ 220 ADA/hr = 17,600 ADA
Frontend design: 40 hours @ 220 ADA/hr = 8,800 ADA
Maintenance for 1 year (covers service costs; bug resolution, minor feature additions): 8,000 ADA

Mini-whitepaper: 2,000 ADA

Planning meeting (3 people x 400 ADA) = 1,200 ADA
Writing = 800 ADA

Data population team: 23,600 ADA

onboarding/planning session (4 people x 400 ADA) = 1,600 ADA
create and add data from 200 proposals: 100 hours, @ 220 ADA/hr = 22,000 ADA

Creating training materials: 2,000 ADA

comprises short video; text-based “how-to” on GitBook; session-plan for training sessions

Training sessions delivery: 9,600 ADA

800 ADA /session, x10 sessions = 8,000 ADA
ongoing support for people uploading their own material: 8 hours @ 200 ADA /hr = 1,600 ADA

Community bounties: 3,200 ADA

We aim for 40 people adding their project info to the dashboard, x 60 ADA per project = 2,400 ADA (Note: this is a lower rate than for the data population team, because proposers know their own proposals and don’t have the extra overhead of reading and understanding the proposal first)
plus 40 people reporting insights from searches, x 20 ADA per insight = 800 ADA

Project management: 8,000 ADA

(comprises team coordination, project documentation, monthly reporting, milestone reporting, wallet management, close-out report and video.)

Total: 82,800 ADA
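For readers who want to check the arithmetic, the budget lines above can be re-added in a few lines of Python. The figures are taken directly from the breakdown; only the dictionary layout and names are assumptions made for this sketch.

```python
# Each category is rebuilt from its line items as stated in the breakdown.
BUDGET = {
    "Dashboard": 80 * 220 + 40 * 220 + 8_000,       # build + frontend + maintenance
    "Mini-whitepaper": 3 * 400 + 800,               # planning meeting + writing
    "Data population team": 4 * 400 + 100 * 220,    # onboarding + 200 proposals
    "Creating training materials": 2_000,
    "Training sessions delivery": 10 * 800 + 8 * 200,  # sessions + ongoing support
    "Community bounties": 40 * 60 + 40 * 20,        # project data + search insights
    "Project management": 8_000,
}

total = sum(BUDGET.values())

for category, ada in BUDGET.items():
    print(f"{category}: {ada:,} ADA")
print(f"Total: {total:,} ADA")
```

The category subtotals and the grand total computed this way match the figures stated in the breakdown.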

Who is in the project team and what are their roles?

AIM Development Team - dashboard build
Phil Khoo: experienced as an accountant, UI/UX frontend and graphic designer, and business advisor, amongst numerous other pursuits. He currently has a lead position in the development and direction of Cardano AIM and is co-creator of the Community Tools. Role: front-end and data design; delivery of training.
Vanessa Cardui: Community engagement professional with 20+ years' experience of working with communities to help them engage in grounded-theory research, and record and archive their lives. Part of QA-DAO, where she led on documenting Catalyst Circle; part of CGO (Community Governance Oversight), where she facilitated meetings and edited the F8 closing report; founding member of The Facilitators’ Collective. Role: managing data population team; training materials; project management and reporting.
Alokam Augusta Chinenyenwa: a dedicated and forward-thinking student of Computer Science, driven by a passion for ML, NLP, and blockchain technology. With knowledge of data science and community management, and a vision for applying and building ML tools on the blockchain, Augusta aims to drive innovation at the intersection of technology and decentralization. Role: Data population team.
Stephen Whitenstall (LinkedIn: <https://www.linkedin.com/in/stephen-whitenstall-166727210/>, Twitter: <https://twitter.com/qa_dao>) is the co-founder of QA-DAO, <https://qadao.io/>, and has provided project management consultancy for many Catalyst projects since Fund 4, including Catalyst Circle, Audit Circle, Community Governance Oversight, Training & Automation (with Treasury Guild), Governance Guild and Swarm. He is a Circle V2 representative for funded proposers, and is also engaged in cross-chain collaboration with SingularityNET, managing an Archive project. He has 30 years' experience in development, test management, project management, social enterprises in Investment Banking, Telecoms and Local Government. He is a philosophy honors graduate with an interest in blockchain governance. Role: Data population team; reporting.
We are awaiting confirmation from the last member of the data population team.

How does the cost of the project represent value for money for the Cardano ecosystem?

This project represents value for money because it combines building a tool with testing its practical use, and with community engagement and education around it, rather than building alone. It uses thinking that is new to Catalyst, and we believe it will produce something that could have far-reaching benefits to the Cardano ecosystem for a relatively low cost.

The pay rates given are standard freelance rates in the US and Europe in the relevant fields. (Note that freelance rates are higher than salary rates, since they take into account the employment overheads of the people contracted. For example, freelancers, unlike employees, do not get sick pay, holiday pay, or national insurance contributions, and have to pay all the overheads for their own workspaces.)

Additionally, everyone working on this project, like everyone in Fund 10, is taking on the currency risk of being paid in ADA. When converting our costs to ADA, we have anticipated that macro market conditions will continue to suppress ₳ prices. As of mid-July, we are using a conversion rate of $0.25 per ₳ ($/₳ = 0.25), based on the lower bound of the support channel established around the 200-day moving average ($0.24-$0.35).
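To make the stated conversion rate concrete, the sketch below converts a few ADA figures from the budget into US dollars at $0.25 per ADA. The helper name is hypothetical, and the dollar values are simple arithmetic at the assumed rate, not figures quoted by the proposal.

```python
from decimal import Decimal

USD_PER_ADA = Decimal("0.25")  # conversion rate stated in the proposal


def ada_to_usd(ada: int) -> Decimal:
    """Convert an ADA amount to US dollars at the assumed fixed rate."""
    return Decimal(ada) * USD_PER_ADA


total_usd = ada_to_usd(82_800)   # total amount requested
hourly_usd = ada_to_usd(220)     # hourly rate used for build and data work

print(f"Total request: ${total_usd:,.2f}")
print(f"Hourly rate: ${hourly_usd:,.2f}")
```

At this rate the total request is equivalent to $20,700 and the 220 ADA hourly rate to $55; any movement in the ADA price after funding changes those equivalents, which is the currency risk described above.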

Based on the above, we believe this proposal offers excellent value for money in a volatile cryptocurrency environment.
