Please describe your proposed solution.
The problem
The Self-Sovereign Identity (SSI) movement is gaining momentum. In light of recent events in the crypto space, trust is more important than ever. But outside the crypto community, too, we are in dire need of new solutions to the problems of trust, digital identity and the pressure of centralization. Initially the idealistic idea of a few, Decentralized Identifiers (DIDs) and Verifiable Credentials (VCs) are getting ready for prime time. Even big organizations like Microsoft are rolling out their first SSI solutions for general use. With Atala PRISM, IOG has presented its own technical implementation of that concept.
New projects, integrations, and businesses are starting to emerge (not only thanks to Catalyst) to leverage this promising new technology. Most prominently, IOG itself is collaborating with the Ministry of Education in Ethiopia to onboard students all over the country onto Atala PRISM. The US network provider DISH is in the process of rolling out its customer loyalty program based on PRISM, World Mobile is using DIDs in its network infrastructure, and many other projects (inside and outside of Catalyst) are on the verge of using PRISM in production, or so the statements read.
But what’s happening behind the curtain? How many DIDs are there on the blockchain? Are they really being used? What are the trust relationships between entities? In this proposal, we present an open, web-based analytics tool to answer these questions.
We’ve clustered the questions into three categories:
<u>Questions from the community and the public:</u>
- How many DIDs have been created on the Cardano Blockchain? How many credentials have been issued?
- What is the development over time? How is developer adoption progressing on the testnet? What is happening on the mainnet? How is the growth evolving? Are there any trends that can be identified?
- Is what we hear from the larger players and projects just hot air, or can the claims be backed up by raw data on the chain?
<u>Detail questions about the properties and history of individual DIDs or a set of DIDs:</u>
This is primarily of interest to organizations or developers who want to examine a set of DIDs/VCs.
- What are the DIDs with the highest number of issued credentials?
- How does a particular subset of DIDs compare to another? (engagement, issuing, key-rotation, …)
- Which Cardano payment addresses paid for which DIDs/transactions? (mapping of market participants and verification of billing data)
- How does the behavior of DIDs and linked VCs look on a timeline?
- What is the history of Operation-Hashes for specific DIDs? (Important missing feature of the SDK)
- What does the SSI-crosschain activity look like? (e.g., VC issued on Cardano to holder-DIDs on KILT)
<u>Questions about the relationship between DIDs</u>
The most interesting questions, however, arise from the representation of DIDs and Verifiable Credentials as a graph. Graphs are used to represent the connections of different entities to each other, e.g., as a social graph: Alice knows Bob. Bob is an employee of Charlie, etc.
In the domain of SSI, the DIDs can be mapped as nodes in a graph. The relationships between the DIDs are then represented by the Verifiable Credentials (the edges in the graph). But DIDs do not always have to represent people (they can also represent a technical device, e.g., a network node, a mobile phone, a location, a vehicle, etc.), and the relationship can also be very abstract and technical, not just personal.
On the blockchain, the DIDs are stored in their entirety (i.e. all DID metadata is stored on the blockchain apart from the private keys), while the VCs are stored on the blockchain only in hashed form. The contents of a credential are therefore not stored on the blockchain itself, only the hash and the information that a certain DID issued some kind of credential at a specific point in time. Only with the full, unmodified credential in hand can we prove that it was indeed that exact credential that was issued by that DID. This is a deliberate design decision to guarantee privacy. So even after reading all PRISM transactions from the blockchain, there is still no graph at this point: the nodes are there, but not the edges that define the relationship to another DID (node).
However, VCs were not designed to be private only. They are supposed to be verifiable by a third party (as the name implies). To accomplish this, the original credential must be passed to that third party. This is the only way the third party can verify that the VC's hash is on the blockchain. In some cases, this involves sharing the VC with only a specific person who can verify it. In other cases, nothing speaks against making the VC completely public (e.g. on a website). Good examples of this are endorsements ("Company X has a great product"), reviews ("The service was good, happy to come back") or many types of achievements ("Person X attended the course with best grades"). These types of credentials are only effective when they are made public and available for all to see and verify for themselves.
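The verification step described above can be sketched roughly as follows. This is a minimal illustration only: the `IssuanceRecord` type, the JSON canonicalisation and the plain SHA-256 digest are assumptions made for the example and do not reflect PRISM's actual encoding or hashing scheme.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class IssuanceRecord:
    """Hypothetical on-chain record: which DID anchored which hash, and when."""
    issuer_did: str
    credential_hash: str  # hex-encoded SHA-256 (assumed for illustration)
    issued_at: str        # ISO 8601 timestamp


def credential_hash(credential: dict) -> str:
    # Serialise deterministically so identical content yields identical bytes.
    # Assumption: the real PRISM encoding differs from this JSON form.
    canonical = json.dumps(credential, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def find_issuance(credential: dict,
                  records: List[IssuanceRecord]) -> Optional[IssuanceRecord]:
    """Return the matching on-chain record, or None if the hash is unknown."""
    digest = credential_hash(credential)
    for record in records:
        if record.credential_hash == digest:
            return record
    return None
```

Without the full credential, the chain only reveals that some DID issued something; with it, the verifier can recompute the hash and find the matching record. Any modification to the credential changes the hash, so a tampered copy finds no match.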
In addition, it is not possible to regain control of VCs once they have been shared for verification purposes. Nor can anyone enforce that VCs, once shared, are deleted or not passed on to third parties (ZK-proofs are left aside here for the time being, since they will not be available for PRISM in the foreseeable future).
The fact that these VCs are often public means that, in theory, anyone can collect and inspect them. By importing them into a graph, we can now supply the edges of the graph and see the relationships between the DIDs visually. But why, you might ask? A developer might be working on a project and want to verify a chain of credentials with DIDs that rapidly change keys. Or the graph might be used by a company that has issued thousands of VCs which it has to monitor regularly. Imagine the VCs representing security guarantees which third parties rely on, with SLAs regarding how current they are. But more importantly, it might be used by individuals who want to trace a chain of trust. In the world of SSI we frequently talk about emerging trust networks which are built bottom up: in these networks, trust is not derived from a single central authority which, some time ago, issued a certificate to e.g. a dentist, but from hundreds of people who write reviews, share their (sometimes professional) opinions and delegate some of their trust by endorsement. After a while, a complex network emerges which offers more guarantees than a single certificate on a wall or untraceable fake reviews on Google. With an ever-growing ecosystem, we also need tools to better understand what we are building.
The final building block of this graph is the trust registry. Trust registries represent trust anchor points, which may be the endpoints of a trust chain: I trust Alice because Alice got a VC from Bob, and Bob is listed in a trust registry. DIDs which are part of trust registries are an essential part of any graph representation.
So, ultimately, it all comes down to the question of
- What are the trust relationships between DIDs? What types of credentials are used? Are they valid and trustworthy?
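To make the idea of a trust chain concrete, here is a small sketch that treats DIDs as nodes and imported VCs as issuer-to-holder edges, then searches backwards from a subject DID until it reaches a DID listed in a trust registry. The function name, the pair-based edge format and the example DIDs are all hypothetical; a production version would presumably run as a query against the graph database instead.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Set, Tuple


def find_trust_chain(subject: str,
                     credentials: Iterable[Tuple[str, str]],
                     trust_registry: Set[str]) -> Optional[List[str]]:
    """Breadth-first search from `subject` towards a registered trust anchor.

    `credentials` holds (issuer_did, holder_did) pairs, one per imported VC.
    Returns the shortest chain [subject, ..., anchor], or None if no chain exists.
    """
    # Index the edges by holder so we can walk from a holder to its issuers.
    issuers_of: Dict[str, List[str]] = {}
    for issuer, holder in credentials:
        issuers_of.setdefault(holder, []).append(issuer)

    queue = deque([[subject]])
    visited = {subject}
    while queue:
        path = queue.popleft()
        current = path[-1]
        if current in trust_registry:
            return path
        for issuer in issuers_of.get(current, []):
            if issuer not in visited:  # guards against cycles of mutual endorsement
                visited.add(issuer)
                queue.append(path + [issuer])
    return None
```

For the example in the text, an edge from Bob to Alice plus a registry containing Bob yields the chain Alice, Bob. Checking whether each VC along the chain is still valid (not revoked, keys not rotated away) is deliberately left out of this sketch.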
A step towards a solution
This proposal presents a web-based analysis tool that can be used to answer all of the above questions. The tool is divided into three sections according to the questions:
- The statistical overview, with prepared, live-generated reports, to get an overview of the actual usage of PRISM.
- An analysis area to get to the bottom of certain specific questions that are relevant for individual users, companies, developers. E.g.: How many credentials were issued by this DID? How many DIDs associated with this DID show engagement in the last 30 days?
- A graph view section to perform dedicated analysis and investigate trust chains in complex networks.
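The two example questions from the analysis area could be answered along the following lines. This is only a sketch over an in-memory list: the `Operation` type and the operation labels such as `"issue_credential"` are assumed for illustration and are not the names used by PRISM or by the planned backend.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Set


@dataclass(frozen=True)
class Operation:
    """Hypothetical decoded PRISM operation."""
    did: str
    kind: str            # e.g. "create_did", "update_did", "issue_credential" (assumed labels)
    timestamp: datetime


def credentials_issued_per_did(ops: Iterable[Operation]) -> Counter:
    """How many credentials were issued by each DID?"""
    return Counter(op.did for op in ops if op.kind == "issue_credential")


def active_dids(ops: Iterable[Operation], dids: Set[str],
                now: datetime, window_days: int = 30) -> Set[str]:
    """Which DIDs of a given set show any operation within the last `window_days`?"""
    cutoff = now - timedelta(days=window_days)
    return {op.did for op in ops if op.did in dids and op.timestamp >= cutoff}
```

Calling `credentials_issued_per_did(ops).most_common(10)` would then give the DIDs with the highest number of issued credentials, which is one of the detail questions listed earlier.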
At present, there is no analysis tool that can sufficiently answer the above questions - not even internal tools of IOG. In many respects, we are all operating in the dark: everyone with a small candle in their hand which lights their DID and VCs, but little else.
It should be emphasized that this is not at all about acting as a data crawler that collects the imported VCs in order to build a complete graph of all DIDs and VCs (which is not possible anyway). Nor is it about developing a DID forensics tool to uncover statistical correlations by analyzing payment flows and temporal sequences. Rather, it is about building a toolkit that helps us to further develop the SSI ecosystem around Atala PRISM and to shed some more light into the darkness. We believe that professional tooling will be a major reason for the adoption of PRISM by large enterprises.
<u>What will the platform look like?</u>
First of all, it will be a public website that displays the most important statistical information on all PRISM-related operations from the testnet/mainnet in real time. It will also provide the ability to quickly search for DIDs to get an overview of publicly available information.
Additionally, there will be a secure private area (login with DID: already implemented by us as a proof of concept -> see demo of the blocktrust identity wallet). Here the user will be able to specify multiple sets of DIDs, import VCs, and manage and execute more complex queries. This private area is subdivided into:
- An area for statistical information on the given sets of DIDs and a timeline of the DIDs/VCs that are being analyzed, as well as engagement metrics, usage, etc. For professional users, this data might even be exposed so that it can be linked directly into Power BI or Tableau.
- The described graph view to examine individual trust chains.
Both the public and private sections are intended to be made completely free with the features described here (subject to rate limiting and implementation of API keys for excessive use, or advanced export functionality for enterprise users).
<u>The application consists of the following parts:</u>
- Blocktrust Connector: a .NET application running on Linux as a service that extracts all PRISM-related information from the PostgreSQL DB (-> db-sync), decodes and verifies it, and sends it to a graph-based Cosmos DB instance on Azure.
- Different Azure services, including the aforementioned Cosmos DB, as well as a scheduled application running Gremlin queries against the graph database to extract statistical information about the state of PRISM (usage, number of DIDs/VCs). A Redis cache stores frequent historical queries.
- A Web-API-based application backend behind an API Management layer (for rate limiting) that serves the different kinds of personalized queries from the frontend. It allows VCs to be imported into the graph and maintains the separation of user accounts with their respective DIDs and imported VCs.
- Blazor / TypeScript-based frontend
A visual representation of the architecture, the setup of PRISM and a mock-up of the graph view can be found here: <https://blocktrust.dev/analytics>
Please describe how your proposed solution will address the Challenge that you have submitted it in.
The proposed solution is an essential building block for everyone involved in SSI. It helps the community to monitor and verify the growth and usage of Atala PRISM. It also offers developers and businesses a tool to better understand trust relationships.
Since this is a tool not only for developers but for everyone involved with SSI, we believe this challenge is the best fit for this proposal.
What are the main risks that could prevent you from delivering the project successfully and please explain how you will mitigate each risk?
Technical risks
From a technical perspective, the risks can be considered to be relatively low. The proposed technologies have all been used before, and there is a good understanding of what can and cannot be achieved using the existing data. For testing purposes, PRISM data has already been extracted from the testnet and made analyzable. The aim of this proposal is therefore to build the infrastructure to make the data available to the public. All necessary libraries for handling PRISM data have already been developed by us.
There is already experience with large datasets of relational, non-relational and graph data. The analysis and presentation of these should not pose any major problems.
Planning risks
Software development is difficult to plan. Most of the time, delays are bigger than expected. With our experience in large software projects, we are fully aware of this and plan with appropriate margins. But as the project progresses, goals and requirements naturally shift, and features that were initially considered easy or great later become difficult or impossible to implement. Other, better ideas emerge and are developed instead. We wouldn't call this a risk, but it is something to be aware of if you're not familiar with agile software development.
Budget risks
Due to growing data volumes over time and complex queries, there is generally a risk of increasing costs of the cloud infrastructure. In this case, some kind of rate limiting may have to be introduced to make the service usable for all. However, possible budget overruns are covered by us, as outlined below, and therefore do not pose a risk to the completion or operation of the website.