MLabs - Purus: PureScript to Plutus Core compiler


funded


Current Project Status: In Progress

Amount Received: ₳490,239

Amount Requested: ₳619,761

Percentage Received: 79.10%

Solution

Create a PureScript compiler backend for UPLC. PureScript has a strict runtime and almost all Haskell features. Some of the tooling can be reused.

Problem

Cardano on-chain languages are either suboptimal (Plutus) or are too low-level (Aiken, Plutarch). We need a fully-featured functional language with good performance and script binary size.


Impact

Please describe your proposed solution.

Problem

Cardano smart contract development is far more painful than it needs to be. Our team has the utmost respect for the existing Cardano contract languages (PlutusTx/Plutarch/Aiken), which are impressive technical achievements. Nevertheless, our practical experience using these languages has shown that technical sophistication alone does not amount to an adequate solution to problems faced by developers working in the trenches of real-world development.

Some of the problems we have encountered with existing solutions are:

Why not PlutusTx?

PlutusTx employs the TemplateHaskell language extension to enable users to write vanilla Haskell code that compiles down to UPLC. On paper, this appears to be an excellent solution. The Haskell development ecosystem contains excellent tooling (mature build systems, a reliable language server that integrates with every mainstream editor/IDE, etc). Moreover, GHC is capable of performing remarkable optimizations when compiling Haskell programs, which presumably could be leveraged to generate efficient and compact UPLC.

Unfortunately, PlutusTx falls woefully short in practice. Most importantly, the UPLC emitted by PlutusTx is, in almost every case we are familiar with, significantly inefficient in terms of both script size and execution speed. While PlutusTx may be suitable for extremely simple scripts, we do not consider it to be a viable option for complex projects. Choosing PlutusTx for a complex project is just too risky - it may turn out that the design for a complex smart contract cannot possibly be implemented in PlutusTx without exceeding script size limits. Even if the script size can be minimized, execution performance is often so poor with PlutusTx that the resulting fees render a contract's protocol economically unviable. These are not merely theoretical objections; several members of our team have been involved in PlutusTx projects which ultimately required complete rewrites in Plutarch - at great expense to clients. We note that some of these projects ultimately failed due to the increased expense of a ground-up rewrite.

It may be possible to squeeze performance gains out of PlutusTx by performing certain optimizations in the Haskell code that serves as input to its compile functions. However, this is extremely unintuitive - a developer attempting to optimize PlutusTx is put in the incredibly awkward position where they must reason about how changes to Haskell code affect UPLC output in spite of the fact that Haskell (a lazy language) and UPLC (a strict language) have different semantics. Even if one could account for the different semantics, the machinery that performs the Haskell -> UPLC translation is incredibly obscure TemplateHaskell code. It is unreasonable to expect that even the most talented developers would be able to perform these tasks.

On a more mundane level, PlutusTx simply does not integrate well with the Haskell ecosystem's tooling. We have lost dozens - if not hundreds - of hours tracking down incredibly obscure errors emitted by the Haskell language server when using PlutusTx. Moreover, PlutusTx requires the use of a special standard library, which makes integration with existing Haskell libraries impossible and (again) gives rise to extremely confusing errors if one accidentally mixes functions from the Plutus standard library with functions from Haskell's prelude. These errors are not intractable, but solving them requires a large amount of folk-wisdom that makes it very difficult to onboard new developers and greatly increases development costs.

Why not Plutarch?

Plutarch eschews compilation-via-metaprogramming for a sophisticated embedding strategy that enables an embedded UPLC DSL. In many respects, Plutarch is a substantial improvement over PlutusTx: Plutarch, when used by an experienced developer, results in scripts with vastly smaller sizes and greatly improved performance compared to an equivalent PlutusTx implementation.

Although these improvements over PlutusTx have enabled more sophisticated smart contracts, the embedding strategy that powers these performance increases entails a severe cost in terms of ergonomics. At a glance, writing Plutarch requires:

Heeding the distinction between Plutarch-level and Haskell-level functions, which can be very confusing to developers who are not familiar with complex embedded DSLs
The capability to write performant "raw" Lambda calculus, which is a skill that not many developers (even experienced functional developers) possess. (To use a simple example, it is frequently necessary to use fixed-point combinators directly in Plutarch)
A slew of conversions between different data encodings and representations, such that it is frequently unclear which representation should be used in a particular context
Familiarity with dozens of complicated types and type classes, including (but not limited to) PLift, PLifted, PConstant, PConstanted, PIsData, PIsDataRepr, PIsDataField, PListLike, PTryFrom, PlutusType, PCon, PMatch, PAsData, PData, PDataSum, PDataRecord. Many of these types and type classes are implemented using cutting-edge Haskell type system features and type-level programming techniques, rendering the Plutarch source code inaccessible even to experienced Haskellers
The ability to reason about complex custom deriving strategies
Coping with a large amount of syntactical noise necessitated by the embedding strategy
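To make the point about "raw" lambda calculus concrete, here is a minimal sketch of a fixed-point combinator in a strict setting. It is written in Python as a stand-in and is not Plutarch code; the names `Z` and `factorial` are illustrative only. Under eager evaluation the classic Y combinator loops forever, so the eta-expanded Z variant is used.

```python
# Z combinator: a fixed-point combinator that terminates under strict
# (eager) evaluation because the self-application x(x) is wrapped in a
# lambda and only unfolded when the recursive call is actually made.
Z = lambda f: (lambda x: f(lambda v: x(x)(v)))(lambda x: f(lambda v: x(x)(v)))

# Factorial written without any named self-reference: recursion is
# obtained purely by passing the function to the combinator.
factorial = Z(lambda self: lambda n: 1 if n == 0 else n * self(n - 1))

print(factorial(5))
```

Writing every recursive function in this style, and keeping it performant, is the kind of skill the list above refers to.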

As type system enthusiasts, we find Plutarch to be a fascinating example of what is possible when the Haskell type system is pushed to its limits. As smart contract developers, however, our familiarity with Plutarch has made clear that it is not a viable general solution to the problem of Cardano smart contract development. The degree of niche expertise required to make full use of Plutarch is just too high.

Why not Aiken?

Aiken, a bespoke language for developing Cardano smart-contracts with a Rust-like syntax, is an excellent solution for developers who do not specialize in functional programming. However, Aiken's main strength - a much simpler type system than Haskell’s combined with conventional imperative syntax - is, at the same time, a limitation: While Aiken's simplicity enables imperative developers to build on Cardano without struggling to adopt a new paradigm, it also prevents functional programmers from leveraging a strong type system to create abstractions and express sophisticated invariants at the type level. Aiken does not support type classes, data-generic programming, effects systems (monads, monad transformers, etc), or optics (functional-style data manipulation). Admittedly, these language features can be difficult to master, but when mastered they provide developers with the power to significantly simplify codebases and write intrinsically secure code. In the context of smart-contract development, the absence of these features is likely to lead to verbose code (which increases auditing costs and increases the potential for bugs) that is less secure (because certain important invariants cannot be expressed without these features).

Nevertheless, we acknowledge that Aiken is the right choice for many projects, especially those with relatively simple on-chain logic. In our experience, many projects have relatively simple on-chain logic, and would do better to choose Aiken than PureUPLC. As functional programmers, however, we believe that the advanced features found in strongly-typed pure functional languages are sometimes the best tool for the job. Aiken, in our view, is one part of the solution to the problem of Cardano smart-contract development - but it is only one part of the solution.

Solution

We propose to build a UPLC backend for PureScript - a production-ready language that has a mature ecosystem and integrates well with existing Cardano development tools (as has been proven by cardano-transaction-lib, also funded by Catalyst in the past). PureScript, in our view, is uniquely well-suited for this task: It is a strict functional language with a strong (but not overcomplicated) type system, was designed from the outset to support multiple backends, and yields an abstract syntax tree which is particularly suitable for conversion to UPLC.

We believe that a UPLC compiler backend for the PureScript language is the best solution to the problem of smart contract development on Cardano because:

PureScript is a mature language with an extensive ecosystem and excellent tooling. A PureScript UPLC backend can make use of existing language servers, linters & formatters, and a rich standard library. Importantly, utilizing an existing mature language greatly reduces the maintenance burden - a large number of existing tools can be immediately employed by developers with little-to-no modification required on our part.
PureScript, like UPLC, employs an eager (i.e. strict) evaluation strategy, which greatly reduces the mental overhead required to reason about performance and simplifies the compilation process.
The PureScript compiler's abstract syntax tree is significantly less noisy than Haskell's AST, which, again, greatly simplifies the compilation process (particularly when compared to PlutusTx).
PureScript (unlike Aiken) has a rich type system that can be leveraged to express sophisticated invariants and enables the application of formal reasoning techniques…
… yet PureScript's type system is much simpler than Haskell's. PureScript, unlike Haskell, does not have a bevy of arcane language extensions, making it a much more suitable introductory functional language that imperative programmers can be onboarded to quickly.
PureScript offers the potential for optimizations which would be difficult or impossible to perform on UPLC itself. As a general rule, the number of optimizations a compiler can perform is proportional to the amount of information the compiler has access to. By operating on the PureScript CoreFn IR, we anticipate that we will be able to implement a number of sophisticated optimizations that are not possible with limited program information.
There is a backend optimizer for PureScript's CoreFn IR which we could adapt to our purposes, substantially reducing the amount of research required: <https://github.com/aristanetworks/purescript-backend-optimizer>
PureScript's builtin support for row types intrinsically reduces the amount of syntactical noise (especially when contrasted with Plutarch) required to implement and operate on different data encodings and representations.

Our solution is conceptually simple: We will implement a PureScript backend that transforms the PureScript compiler IR (internal representation) into UPLC, and spend the remainder of our budget implementing as many optimizations as possible. (See the sections below for a more detailed discussion of particular challenges and our strategy for overcoming them.)
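As an illustration of the kind of IR-level optimization meant here, the following is a minimal sketch of common subexpression elimination over a toy expression tree. It is written in Python and does not reflect the CoreFn data types or any planned implementation: the tuple-based expression format, the `t0`, `t1`, … variable names, and the absence of binders (lambdas, lets) in the input are all simplifying assumptions.

```python
from collections import Counter

# Toy IR (hypothetical): a compound term is a tuple (op, arg, ...);
# variables are strings and constants are ints. All terms are pure.

def subterms(expr):
    """Yield every compound subterm of expr, children before parents."""
    if isinstance(expr, tuple):
        for arg in expr[1:]:
            yield from subterms(arg)
        yield expr

def size(expr):
    """Number of operations (compound nodes) in expr."""
    if not isinstance(expr, tuple):
        return 0
    return 1 + sum(size(arg) for arg in expr[1:])

def cse(expr):
    """Return (bindings, body): repeated subterms are bound once and reused."""
    counts = Counter(subterms(expr))
    bindings = []  # (name, term) pairs, in dependency order
    names = {}     # original subterm -> name it was bound to

    def rewrite(e):
        if not isinstance(e, tuple):
            return e
        if e in names:
            return names[e]
        rewritten = (e[0],) + tuple(rewrite(arg) for arg in e[1:])
        if counts[e] > 1:
            name = f"t{len(bindings)}"  # assumed not to clash with input variables
            names[e] = name
            bindings.append((name, rewritten))
            return name
        return rewritten

    return bindings, rewrite(expr)

expr = ("add", ("mul", "x", "y"), ("mul", "x", "y"))
bindings, body = cse(expr)
print(bindings, body)
```

In this example the duplicated multiplication is computed once and referenced twice, which is the effect that matters when every evaluated operation is paid for in execution units.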

Risks involved

Row polymorphism: consider forall r. Record (foo :: Bar | r) -> Bar
We will not be able to use row type parameters easily, because we need to know the position of a field in a record in order to translate field access into access by index
Possible solution: abandon per-file compilation. Compile everything together until all row parameters are fully instantiated, and then translate record field accessors to index accessors
Another solution: make functions like this (implicitly) parameterizable by a numeric index. We could then apply this parameter where the position is known and leave it unapplied where it is unknown. An optimizer pass could then reduce the resulting lambda abstractions
Under-estimation of amount of work
Inability to optimize enough to be competitive with Plutus or Aiken
Failure to figure out monad desugaring
ADA price volatility that could put funding at risk
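The second approach to the row polymorphism risk can be sketched as follows. This is a Python illustration under stated assumptions, not the compiler's design: records are assumed to be laid out as tuples with fields ordered by label, and `layout`, `field_index` and `get_foo` are hypothetical names.

```python
def layout(record):
    """Lay a record (given as a dict) out as a tuple, ordering fields by label."""
    return tuple(record[label] for label in sorted(record))

def field_index(labels, field):
    """Position of `field` in the label-ordered layout of a closed record type."""
    return sorted(labels).index(field)

# Compiled form of `forall r. Record (foo :: Bar | r) -> Bar`.
# The layout of `r` is unknown at the definition site, so the accessor
# takes the index of `foo` as an extra (implicit) parameter.
def get_foo(index):
    return lambda record: record[index]

# At each call site the record type is concrete, so the index is known
# and can be supplied (and later inlined by an optimizer pass).
small = {"foo": "Bar#1", "bar": 0}
large = {"zap": 2, "foo": "Bar#2", "aaa": 1, "bar": 0}

get_foo_small = get_foo(field_index(small, "foo"))
get_foo_large = get_foo(field_index(large, "foo"))

print(get_foo_small(layout(small)))
print(get_foo_large(layout(large)))
```

The same accessor body serves both record shapes; only the index argument differs per call site.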

Market

The proposed solution can be useful for Cardano dApp developers who already have experience with Plutus or just functional programming in general.

How does your proposed solution address the challenge and what benefits will this bring to the Cardano ecosystem?

Intended Challenge: OSDE: Open Source Dev Ecosystem

Challenge statement: “Can we build a community-owned Open-Source Ecosystem that’s commercially viable to drive growth, increase opportunities, and increase project visibility?”

What does this proposal entail?

At a bare minimum, this proposal entails designing and implementing a UPLC backend for the PureScript compiler that developers can immediately use to quickly build efficient and secure smart-contracts. In order to achieve this goal, we may also have to design and implement PureScript utility libraries for contract development.

Concretely, the problems that we must solve to achieve our stated goals are:

Translation of PureScript's CoreFn IR into UPLC: Our initial research into this task indicates that it is eminently achievable (although it is certainly not trivial!), and we do not anticipate substantial difficulties.
Solving the data representation/encoding problem: One significant difference between PureScript and UPLC is that PureScript allows functions to be stored in native data structures, whereas UPLC does not. Attempting to prohibit functions in PureScript data types would cripple the language. Consequently, our tentative plan is to follow Plutarch and develop machinery for translating PureScript data types into Scott-encoded equivalents. Of course, we still must provide users of our language with the capability to work with data-encoded (PlutusData) types, and will need to design and implement a solution that allows them to do so. We suspect that it may be possible to implement a solution to this problem as a PureScript library that uses row-parameterized types to represent data-encoded objects, but this requires further research and experimentation.
Optimization: While the PureScript compiler is, in many respects, exceptionally suitable for a UPLC backend, there is one respect in which it is not: The PureScript compiler currently performs only minimal optimizations on the CoreFn IR (though it performs a larger number of optimizations in the JavaScript backend). In particular, the PureScript compiler performs very minimal CSE (Common Subexpression Elimination), which leads to a large amount of duplicated computation of pure terms. This is particularly pernicious when targeting UPLC, as duplicate computation can cause an explosion in execution costs and, consequently, transaction fees. Although other optimizations may be useful, it is absolutely essential that we develop a CSE framework that is tuned for UPLC. Fortunately, CSE is a very well-studied optimization - although this task will certainly require research, we believe it can be achieved.
FFI: Developers using our backend ought to (ideally) be able to make use of existing PureScript libraries. Unfortunately, nearly every useful PureScript library employs PureScript's JavaScript FFI (or depends on another library that does). In many cases, it should be possible to make existing libraries compatible with our backend simply by providing foreign UPLC imports where foreign JS imports now exist. Though not essential for the success of our project, we would like to explore the possibility of designing and implementing a mechanism that would allow "swappable" foreign imports to reduce the amount of work necessary to integrate with the existing PureScript library ecosystem.
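The Scott encoding mentioned under the data representation problem can be sketched as follows. This is a Python illustration of the general technique, not the encoding the compiler will emit; `nothing`, `just` and `from_maybe` are illustrative names for a Maybe-like type. Each value is represented as a function that receives one handler per constructor and calls the handler for its own constructor.

```python
# Scott-encoded Maybe: a value IS its own pattern match.

def nothing():
    # Handler for Nothing takes no fields; it is passed as a thunk so that
    # it is only run when this constructor is the one being matched.
    return lambda on_nothing, on_just: on_nothing()

def just(value):
    # Handler for Just receives the single field of the constructor.
    return lambda on_nothing, on_just: on_just(value)

def from_maybe(default, maybe):
    """Eliminate a Maybe by supplying one handler per constructor."""
    return maybe(lambda: default, lambda value: value)

print(from_maybe(0, just(41)))
print(from_maybe(0, nothing()))

# Because the encoding consists only of functions, a constructor field can
# itself be a function, which a PlutusData-encoded value could not hold.
wrapped = just(lambda n: n + 1)
increment = from_maybe(lambda n: n, wrapped)
print(increment(1))
```

The last three lines show why this encoding suits the stated goal of allowing functions inside data types.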

How do you intend to measure the success of your project?

The criterion for success of this project is delivery of a UPLC backend for the PureScript compiler that integrates with existing PureScript tooling, and that developers can immediately use to quickly and ergonomically develop Cardano smart-contracts.

Success can be measured by the number of projects using the tool.

Please describe your plans to share the outputs and results of your project?

MLabs maintains a social presence on Twitter and in the Plutonomicon Discord, where updates can be posted. Additionally, the release can be announced on the IOG technical Discord. All source code will be available on GitHub.

Capability/ Feasibility

What is your capability to deliver your project with high levels of trust and accountability?

MLabs employs dozens of Haskell and PureScript software developers and has delivered a number of Catalyst-funded projects in the past.

What are the main goals for the project and how will you validate if your approach is feasible?

The main goals are:

Build a proof of concept of a PureScript-to-UPLC compiler
Extend this PoC with the features needed to make it usable for development
Build supporting libraries, a website with documentation, and a project template

Feasibility of our approaches to achieve these goals will be evaluated by project managers and technical leadership of the project.

Please provide a detailed breakdown of your project’s milestones and each of the main tasks or activities to reach the milestone plus the expected timeline for the delivery.

Research - months 0-1.5
Figure out our approach to data encoding (Generics vs compiler built-in, Scott vs. PlutusData encodings)
Consider integration with LambdaBuffers (the goal is to integrate without changes at the compiler level)
Type classes, ad hoc polymorphism - (tc = records?)
Figure out how to deal with PS modules (a linker?)
Optimisations (look into plutonomy and techniques for lambda calculus optimisations from papers)
Proof-of-concept - basic CLI interface for a single file (PS -> CoreFn -> UPLC) - months 1.5-3
Repo setup, Nix, CI
Replicating compilation workflow from PS code to IR in our new compiler (PS -> CoreFn)
CoreFn -> UPLC (no plutus APIs yet, just basic primops)
Test engine setup using UPLC executor
Implementing language APIs - months 3-6
Plutus ledger API (tx info, plutus ledger api library)
Adapt PureScript prelude
Rework purescript built-ins (Prim* modules)
Provide UPLC primops as functions
Design an interface for validators and minting policies
The compiler must know the entry point
Ensure the entry point is of correct type
Testing
Integration testing, benchmarking and documentation - months 6-7
Integration tests
Golden tests
Comparisons with other languages (Plutus, Aiken)
A website with documentation
Reusable project template (including a demo) - months 7-8
Consider using or adapting spago (PureScript package manager)

Please describe the deliverables, outputs and intended outcomes of each milestone.

Design document should be created, outlining our approach to solving the mentioned problems
A repository with the proof-of-concept binary and tests should be made available
The PoC repo should be extended with support for the listed features
Tests and comparisons should be added, a website should be publicly deployed
A new repo containing a functional project template should be created

Resources & Value For Money

Please provide a detailed budget breakdown of the proposed work and resources.

Research - 160h for 2 people = 320 hours
Figure out our approach to data encoding (Generics vs compiler built-in, Scott vs. PlutusData encodings)
Consider integration with LambdaBuffers (the goal is to integrate without changes at the compiler level)
Type classes, ad hoc polymorphism - (tc = records?)
Figure out how to deal with PS modules (a linker?)
Optimisations (look into plutonomy and techniques for lambda calculus optimisations from papers)
Proof-of-concept - basic CLI interface for a single file (PS -> CoreFn -> UPLC) 320 hours
Repo setup, Nix, CI 32 hours
Replicating compilation workflow from PS code to IR in our new compiler (PS -> CoreFn) 40 hours
CoreFn -> UPLC (no plutus APIs yet, just basic primops) 208 hours
Test engine setup using UPLC executor 40 hours
Implementing language APIs 380 hours
Plutus ledger API (tx info, plutus ledger api library) 40 hours
Adapt PureScript prelude 80 hours
Rework purescript built-ins (Prim* modules) 80 hours
Provide UPLC primops as functions 40 hours
Design an interface for validators and minting policies 40 hours
The compiler must know the entry point
Ensure the entry point is of correct type
Testing 100 hours
Integration testing, benchmarking and documentation 240 hours
Integration tests 40 hours
Golden tests
Comparisons with other languages (Plutus, Aiken) 40 hours
A website + design 160 hours
Reusable project template (including a demo) 110 hours
Consider using or adapting spago (PureScript package manager) 30 hours
Change budget 80 hours

1370 hours * $95/h = 130150 USD

130150 / 0.21 = 619761 ADA
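The arithmetic above can be checked with a short Python sketch. The milestone hours, the hourly rate and the exchange rate are taken from the breakdown; the variable names are illustrative. Note that the stated 1370-hour total equals the sum of the five milestone lines, so the 80-hour change budget line appears not to be included in it and is left out here.

```python
# Hours per milestone, as listed in the budget breakdown.
milestone_hours = {
    "Research": 320,
    "Proof-of-concept": 320,
    "Implementing language APIs": 380,
    "Integration testing, benchmarking and documentation": 240,
    "Reusable project template": 110,
}

HOURLY_RATE_USD = 95
USD_CENTS_PER_ADA = 21  # the assumed rate of 0.21 USD per ADA

total_hours = sum(milestone_hours.values())
total_usd = total_hours * HOURLY_RATE_USD

# Integer arithmetic avoids floating-point rounding; the result is
# rounded down to a whole number of ADA, matching the requested amount.
total_ada = total_usd * 100 // USD_CENTS_PER_ADA

print(total_hours, total_usd, total_ada)
```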

In the interest of full transparency, please note we have applied a conservative USD/ADA exchange rate in pricing this proposal. This is to ensure our operations remain stable regardless of market conditions. Although we firmly believe the future of Cardano is bright, we recognize the price of ADA and all cryptocurrencies is inherently volatile. Our financial obligations are denominated in fiat. Most importantly, this includes the salary of our engineers whose hard work makes projects like this possible.

In the unlikely scenario of severe negative price movement beyond our forecasted rate, it is possible that MLabs may need to temporarily suspend work on this proposal until the market recovers. Rest assured, this decision would be made solely to protect our business's long-term viability and never taken lightly.

We appreciate your understanding and support, and we are excited to see what we can achieve together.

Who is in the project team and what are their roles?

MLabs

MLabs has quickly become one of the premier development firms in the Cardano Ecosystem. We are an IOG Plutus Partner and work regularly with IOG to develop the Cardano blockchain and ecosystem. Our team is composed of talented developers who have helped build community projects such as:

Liqwid
SundaeSwap
Minswap
Optim
Many others

Through our work with early-stage projects, we have one of the largest groups of Haskell/Plutus developers in the community.

Core Team

Drazen Popovic

Full-stack Cardano distributed application (dApp) developer and auditor, working on several Cardano dApps that span Haskell, PureScript and Nix language environments.
Technical lead on the Lambda Buffers (also known as Cardano DApp Schemas) project funded by Catalyst Fund9.
Technical lead on the Cardano Open Oracle Protocol project funded by Catalyst Fund8.
Worked on decentralized protocols based on the Cardano blockchain including decentralized exchange, synthetic asset and oracle protocols, and programmable money.

Chase Maity

Chase is a polyglot software developer with expertise in Haskell and C. He’s interested in type systems, programming language design and performance optimizing compilers. At MLabs, he has worked on both on-chain Plutarch code and off-chain infrastructure; as well as providing technical specialist assistance on Plutus Core and its intricacies. Outside of MLabs, Chase spends time contributing to open source, and learning more about Haskell and type systems.

Sean Hunter

Sean is an engineer with extensive Cardano smart-contract development experience. He has implemented and audited multiple complex projects written in both PlutusTx and Plutarch. Sean's functional programming journey began as an offshoot of his academic interest in formal logic, and to this day he maintains a strong interest in type theory (with a special interest in row types and their applications).

Vladimir Kalnitsky

Vladimir is a software developer with a number of contributions to the PureScript ecosystem and solid experience with Haskell. During his undergraduate years, Vladimir focused on functional programming and type theory. Vladimir is more of a 'hacker' than a scientist, but he still values formal reasoning about code and well-founded software development practices.

Tomasz Maciosowski

Tomasz, a Haskell/Plutus developer, has gained experience through involvement in projects such as Clarity and Charli3. Additionally, he has made contributions to different infrastructure projects like ogmios-datum-cache and cardano-transaction-lib.

How does the cost of the project represent value for money for the Cardano ecosystem?

How does this help developers?

This project will allow developers to use our Purus PureScript backend to generate efficient UPLC code from vanilla PureScript, which we believe will greatly reduce development costs and improve the overall Cardano smart contract development experience.
