Please describe your proposed solution.
<u>Making learning from Catalyst projects usable</u>
Our community’s 500+ completed Catalyst projects represent a large corpus of knowledge. However, much of this material remains locked up in closing reports in the form of videos and PDFs, which are unsearchable and undiscoverable; so once a project has presented its close-out, the learning from it is often forgotten. Even where material is held on searchable platforms, there is often no clear link between a specific piece of knowledge or developer-relevant information and the evidence that supports it, and no clear attribution to the person who discovered it. There is also no easy way to see connections (or even interesting contradictions) between what different projects have learnt, and no way for a developer making a new proposal to look at what has already been discovered and build on it. So we often end up either losing knowledge that is useful to the ecosystem, or reinventing it time and again.
The solution we propose is to build a searchable platform where developers and project teams can add individual, atomic learning points from their completed Catalyst projects, expressed in a CNL (controlled natural language) format, and supported by links to the evidence and attribution for each learning point. This will enable people to search for insights on specific topics, or from specific projects or types of projects, and immediately see connections or contradictions.
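To make this concrete, here is a rough sketch of the kind of record the platform would hold for each learning point. It is purely illustrative: the field names, example statement and links are our assumptions for this sketch, not a final schema.

```python
# A sketch of one learning point as the platform might store it.
# Field names and example content are illustrative assumptions, not a schema.
learning_point = {
    "statement": ("Light-wallet users abandon onboarding when seed-phrase "
                  "backup requires a second device."),      # AIDA-style claim
    "project": "Example Wallet UX Research",                # hypothetical project
    "fund": "F9",
    "attribution": "jane.doe",                              # who surfaced the insight
    "evidence": [
        "https://example.org/close-out-report.pdf#page=4",  # placeholder links
        "https://youtu.be/VIDEO_ID?t=312",
    ],
    "keywords": ["wallet", "onboarding", "user-testing"],
}
```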
<u>The theory and research behind our approach</u>
The idea is based on nanopublications: an approach that is popular in life-sciences research, but has yet to reach far beyond that field. Essentially, a nanopublication is “the smallest possible unit of publishable information” - a small, discrete, machine-readable assertion, supported by provenance information (i.e. what the assertion is derived from, and the research evidence that supports it). It’s an excellent way to share knowledge and make ecosystem-wide connections between the things we are building - the only drawback is that a “classic” nanopublication is expressed in RDF notation (a W3C standard originally designed as a data model for metadata). This presents a barrier to adoption, because many people find RDF difficult to work with, and because it may not even be an appropriate way to express some kinds of knowledge.
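For illustration only, here is a sketch (in Python, assuming the rdflib library; the claim and URIs are invented placeholders, not something we propose developers would write) of what the “classic” RDF-based structure looks like, with an assertion graph paired with a provenance graph:

```python
from rdflib import Dataset, Literal, Namespace, URIRef

# A rough sketch of the "classic" nanopublication structure: an assertion
# and its provenance held as named RDF graphs. All URIs are placeholders.
EX = Namespace("http://example.org/")
PROV = Namespace("http://www.w3.org/ns/prov#")

ds = Dataset()
assertion = ds.graph(EX.assertion)
provenance = ds.graph(EX.provenance)

# The assertion graph: a single machine-readable claim.
assertion.add((EX.walletOnboarding, EX.isSlowedBy, EX.seedPhraseBackup))
# The provenance graph: where the claim comes from.
provenance.add((EX.assertion, PROV.wasDerivedFrom,
                URIRef("https://example.org/close-out-report#findings")))

print(ds.serialize(format="trig"))
```

Even a one-sentence claim needs namespaces, URIs and named graphs here, which is exactly the adoption barrier described above.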
So in this proposal, we draw on research by Tobias Kuhn et al. from 2013, which looked at how to broaden the scope of the “nanopublication” concept by using CNLs (controlled natural languages) rather than RDF triples to express a research conclusion, so that people can essentially write their learning points in normal English.
Kuhn’s research developed a concept called the “AIDA statement” (an acronym for “Atomic, Independent, Declarative, Absolute”, and unrelated to the “AIDA” acronym used in the field of marketing!). AIDA is a simple framework for what a nanopublication statement in natural language should look like, and it is the approach we intend to use; a brief illustration follows the list below.
- Atomic: a sentence describing one thought that cannot be further broken down in a practical way
- Independent: a sentence that can stand on its own, without external references like “this effect” or “we”
- Declarative: a complete sentence ending with a full stop that could in theory be either true or false
- Absolute: a sentence describing the core of a claim, ignoring the (un)certainty about its truth and ignoring how it was discovered (no “probably” or “evaluation showed that”); typically in present tense
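As a small worked illustration of the criteria above (the sentences and word lists here are invented; a check like this would only flag obvious issues and would not replace human judgement about atomicity), here is a minimal sketch of how a draft statement could be screened against the Independent, Declarative and Absolute rules:

```python
# A minimal, illustrative AIDA check. The hedge and pronoun lists are
# assumptions for demonstration; they catch only obvious violations.
HEDGES = {"probably", "possibly", "might", "likely", "seems"}
DANGLING_REFERENCES = {"this", "these", "we", "our", "it"}

def aida_warnings(statement: str) -> list[str]:
    """Return warnings for obvious Independent/Declarative/Absolute issues."""
    words = {w.lower().strip(".,;") for w in statement.split()}
    warnings = []
    if not statement.strip().endswith("."):
        warnings.append("Declarative: the statement should end with a full stop.")
    if words & DANGLING_REFERENCES:
        warnings.append("Independent: avoid references like 'this' or 'we'.")
    if words & HEDGES:
        warnings.append("Absolute: drop certainty qualifiers such as 'probably'.")
    return warnings

# A raw report sentence versus an AIDA-style rewrite of it.
raw = "We found that this probably made onboarding faster"
aida = "In-app seed-phrase verification makes wallet onboarding faster."
print(aida_warnings(raw))   # three warnings
print(aida_warnings(aida))  # []
```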
<u>How we’ll put this research into practice</u>
Kuhn found that scientists were fairly easily able to create AIDA statements from the abstracts of published research papers. Based on this, we feel confident that with some supporting "how-to" documentation (which we will create), Catalyst developers will be able to do the same with material from their monthly or closing reports. Once the material is expressed in this atomic, declarative way, it can then be connected to the provenance that supports it - this could be any link, from a heading in a document or a timestamp in a video, to a GitHub commit, a Tweet, a cell in a spreadsheet, or anywhere else a project recorded its discoveries.
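For example (the report excerpt, statement and links below are placeholders, not real project data), one bug fix noted in a monthly report might be captured as a statement plus its provenance:

```python
# Illustrative translation of one report finding into a learning point.
# The excerpt, statement and URLs are placeholders, not real project data.
report_excerpt = (
    "In month 3 we fixed the race condition that was corrupting the local "
    "cache whenever two tabs were open."
)

aida_statement = (
    "Opening the dApp in two browser tabs corrupts the local cache unless "
    "writes are serialised."
)

provenance = [
    "https://example.org/monthly-report-3.pdf#bug-fixes",          # heading in a document
    "https://youtu.be/VIDEO_ID?t=754",                              # timestamp in a video
    "https://github.com/example-org/example-repo/commit/abc1234",   # the fix itself
]
```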
While the “nanopublications” approach has, until now, mainly been used for research-based projects, our initial explorations have shown that it is also very effective for developer projects, especially when, as they commonly do, they have documented their progress and noted bug fixes, results of user testing, and so on.
If a developer enters their material, expressed as AIDA statements, into our database via a dashboard-style frontend, it will then be searchable by project, by keyword, by developer, and so on; connections and similarities between different projects will become visible. We will also be able to see attribution (i.e. which project or person came up with a given insight), which will help us become more aware of where insights are coming from, and help developers working on similar things to find each other. Insights added to the tool might include pitfalls or problems, helping future developers avoid or address them. Note also that additions to the platform would not necessarily be restricted to material from Catalyst project reporting: potentially, developers could also capture the knowledge that surfaces in a meeting, a collaborative document, or a Twitter Space, by translating it into a series of AIDA statements and adding those to the platform.
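As a sketch of the kind of query the dashboard would support (the real platform would use a proper search backend; the field names follow the illustrative record sketched earlier and are assumptions, not a specification), filtering by keyword, project or author could look like this:

```python
from typing import Iterable

# A sketch of the kind of query the dashboard would run. Field names follow
# the illustrative learning-point record sketched earlier in this section.
def search(points: Iterable[dict], *, keyword: str = "",
           project: str = "", author: str = "") -> list[dict]:
    """Return learning points that match every filter supplied."""
    results = []
    for p in points:
        text = p["statement"].lower()
        tags = [k.lower() for k in p.get("keywords", [])]
        if keyword and keyword.lower() not in text and keyword.lower() not in tags:
            continue
        if project and p.get("project") != project:
            continue
        if author and p.get("attribution") != author:
            continue
        results.append(p)
    return results

# e.g. search(all_points, keyword="onboarding")
#      search(all_points, project="Example Wallet UX Research", author="jane.doe")
```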
Once the platform is launched, our intention is for developers to add learning points to it themselves, from their own proposals in F11 onwards. But in order to make the platform usable and valuable from the outset, we will add a corpus of searchable data from completed Catalyst projects from Fund 7 to Fund 10 (200 proposals, with 3 to 5 AIDA statements from each one - roughly 600 to 1,000 statements in total). While this data-population work is not typical of the process of building a tool, in this instance we consider it an essential component of the tool's functionality. If we simply built an empty database and waited for developers to fill it, it could be a long time before the tool was actually useful. Populating it with retrospective data not only lets us test its functionality, but also enables it to be used immediately by developers in the way we intend - i.e. to search for and build on prior learning in Catalyst.
We will then open the dashboard to the community. We'll offer simple documentation, as video and text, showing exactly how to shape one’s material into AIDA statements; 3 awareness sessions in different parts of the developer community to raise interest; and some ongoing user support via DM. Then proposers of finished projects will be incentivised via bounties to add the learning from their F10 and F9 proposals. We’ll also offer small bounties for people to send us a record of any useful and interesting connections they have discovered from searching the database, which we’ll collate on the project GitBook as a way of demonstrating what kind of insights the dashboard is helping the community to uncover.
Essentially, this approach frames the things we do in Catalyst (potentially, everything we do, from proposals, to After TownHalls, to discussions on Telegram or Twitter) as the “experiments” we have always said they are, complete with the research insights that characterise experimentation. It will help clarify and evidence what we're actually learning from projects, and will make that learning searchable and discoverable; it will also surface new insights and previously-unseen connections. Our approach turns development projects into a collaborative research pool that we can all draw on, and embeds attribution and recognition for developers. In this way, it supports Cardano's open-source ethos, by supporting developers to amplify and build on each other’s work.