Building an access framework using Cedar
A deep-dive into building a solid foundation for access control in applications.
I’ve been pondering how to provide a robust foundation for access control in atlas9. By “access control”, I mean answering questions like “can Alice edit document Y”.
Access control can be tricky – it’s simple at first, but the complexity reveals itself slowly over time, and one day the system is complicated, poorly understood, fragile, and changes become difficult, slow, and risky.
I wish we would collectively forget the terms authentication and authorization. We need one of those machines from Eternal Sunshine of the Spotless Mind to erase them from our memories. They’re too close in spelling, which causes confusion and complicates discussion. I recommend using “identity” and “access” instead.
Here are some of the factors to consider when thinking about access control:
Users and groups. In the beginning, projects may get away with granting access to specific users, but inevitably there are enough users and resources that users need to be organized into groups. Groups may even contain other groups. This creates permission inheritance (user A is in Group B which is in Group C, and each level has permissions attached).
Groups of resources. Similarly, projects may get away with having all resources in one namespace, but eventually they need to be organized into groups, and maybe even a hierarchy of groups. Think of folders of files, or account→organization→workspace hierarchy, etc. This also creates permission inheritance.
Mapping roles to permissions. Imagine an issue tracking system. Projects may start with “Admin, Editor, Viewer” roles. And then they add a “Metadata Editor” role which gives access to edit only issue metadata but not content, or a “Commenter” role that gives access to comment, but not edit. And then the system adds a “Discussion” feature, or a “Report” feature. Does “Editor” apply to discussion and reports? Is it possible to rename “Editor”?
Querying for permissions. The UI for a project usually needs to know what access a user has, so it can hide a button, show a message, or give hints. What about lists? Are you using object-level permissions, or even just showing a table of resources from multiple workspaces, with a “delete” button for each row? Is that going to be efficient?
Avoiding work. I usually try to do as little work as possible before access checks, so that unauthorized callers are not using up server and database resources. In the worst case, this could be considered an attack vector by malicious users. How does this work in systems with a hierarchy of resources? Is the “organization ID” or “workspace ID” present in the request, or do you need to look it up?
Default deny. Preferably, requests should be denied unless explicitly allowed. The benefit is that it’s harder to forget to add access checks. It can be difficult to apply this in practice. You can have a policy engine that defaults to deny, but you still have to apply access checks in all the right places – servers and databases are “default allow” by design, so it’s up to us to put checks in all the right places.
Multiple services? Multiple languages? Lots of systems have multiple backend services, sometimes written in a mix of programming languages. How will the access system scale across all this? Can anyone truly understand, learn, audit, or test an access system with rules and implementations spread across all this territory?
Object-level permissions are tempting, and tricky. In my experience, most systems get pretty far by managing permissions for broad scopes – they grant access at the “workspace” level, and that access applies to all resources in that workspace. Over time, there’s a constant temptation to add object-level permissions – e.g. “we need contractors to have edit access to the design documents for project foo”. Object-level permissions are a natural fit for so many use cases and it’s unfortunate when a project can’t easily provide them. The biggest hurdle is usually solving the “search/filter” use case, where a user can use a “list” endpoint with various filters and efficiently get only the resources they have permission to view.
Testing. Testing access control in a meaningful way is very difficult. If you mock parts of the system, you risk mocking out access checks. And even a simple system with a few resource types and roles can create a combinatorial explosion of test cases that most people don’t want to manually write and maintain.
Checking referenced resources. I think the focus of access checks is often on API endpoints – the update(id) handler checks whether the user has access to update the given resource. But what if the resource references another resource? Does it check access to that resource? What if there’s a bulkImport() handler? Does it reuse the same code to check access to all resources and all referenced resources? Projects need a reliable pattern for organizing this code so that it’s hard to miss access checks, especially for referenced resources.
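To make the “default deny” factor above concrete, here is a minimal Python sketch of an evaluator that allows a request only when some permit policy matches and no forbid policy matches. It is not Cedar’s implementation; the Request and Policy types and the is_allowed function are names I made up for illustration. Cedar documents the same two rules (deny by default, forbid overrides permit), but its real evaluator is far richer than this.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Request:
    principal: str
    action: str
    resource: str


@dataclass(frozen=True)
class Policy:
    effect: str  # "permit" or "forbid"
    matches: Callable[[Request], bool]


def is_allowed(policies: list[Policy], request: Request) -> bool:
    """Allow only if some permit matches and no forbid matches."""
    permitted = False
    for policy in policies:
        if not policy.matches(request):
            continue
        if policy.effect == "forbid":
            return False  # an explicit forbid always wins
        permitted = True
    return permitted  # still False when nothing matched: default deny


# Hypothetical policies, for illustration only.
policies = [
    Policy("permit", lambda r: r.action == "viewDocument"),
    Policy("forbid", lambda r: r.principal == "mallory"),
]
```

Note that this only covers the policy engine half of the problem. The application still has to call is_allowed (or its real equivalent) in every handler, which is the harder half.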
So, in summary, what do I want for atlas9?
easily support users, groups of users, and groups of groups
easily support nested groups of resources (like a filesystem tree)
support object-level permissions
check early, check often
check early to avoid work where possible
check often to encourage checks at multiple levels of abstraction, not just at API endpoints
checking often means we need good performance so we don’t have to worry about it
default deny policy engine. try to make the system
easy testing, perhaps even automated
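As a rough idea of what “automated” testing could mean here, the sketch below enumerates every (role, action) combination with itertools.product and compares the decision against a small table of expected allows, treating everything not listed as an expected deny. The roles come from the issue tracker example above; the actions, the EXPECTED_ALLOW table, and the check_access stand-in are all hypothetical. In a real system check_access would call the policy engine.

```python
import itertools

ROLES = ["admin", "editor", "viewer"]
ACTIONS = ["view", "edit", "delete"]

# Expected decisions, written as the (role, action) pairs that should be
# allowed. Every other combination is expected to be denied.
EXPECTED_ALLOW = {
    ("admin", "view"), ("admin", "edit"), ("admin", "delete"),
    ("editor", "view"), ("editor", "edit"),
    ("viewer", "view"),
}


def check_access(role: str, action: str) -> bool:
    """Hypothetical system under test (a stand-in for a policy engine call)."""
    rank = {"viewer": 1, "editor": 2, "admin": 3}
    needed = {"view": 1, "edit": 2, "delete": 3}
    return rank[role] >= needed[action]


def find_mismatches():
    """Return every combination where the decision differs from the table."""
    mismatches = []
    for role, action in itertools.product(ROLES, ACTIONS):
        expected = (role, action) in EXPECTED_ALLOW
        actual = check_access(role, action)
        if actual != expected:
            mismatches.append((role, action, expected, actual))
    return mismatches
```

The appeal of this shape is that adding a role or an action grows the test matrix automatically, and anything missing from the table fails closed instead of silently passing.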
Cedar: defining and evaluating policies
With a vague understanding of what I wanted, I set out to search for libraries or systems that would help me build this foundation. I had come across a few projects previously: OpenFGA, OPA/Rego, and Cedar. I liked the look of Cedar – the policy language is easy to read, it’s implemented as a library, it’s fast, it’s created by AWS, thoroughly tested and verified, and more.
Let’s design a “report” feature using Cedar. Users build reports to tell a story with data - a document with text and embedded charts that query data.
We’ll start simple – the report owner can take any action:
permit(principal, action, resource is Report)
when { resource.owner == principal };
Reports can be published publicly, allowing anonymous read access:
permit(
    principal == User::"anonymous",
    action == Action::"viewReport",
    resource is Report
)
when { resource.public };
The application code might look like:
func viewReport(user, reportId) {
    report = loadReport(reportId)
    checkAccess(user, "viewReport", report)
    return report
}
func editReport(user, report) {
    checkAccess(user, "editReport", report)
    saveReport(report)
}
Let’s add the ability to grant access to other users.
There are multiple ways to approach this (see Representing Relationships). We’re going to use the “attribute-based relationships” approach here, because it’s great for clearly demonstrating relationships in policies, but we’ll talk about “template-based relationships” and Policy Templates later in the post.
permit(principal, action == Action::"viewReport", resource is Report)
when { principal in resource.viewers };
permit(principal, action == Action::"editReport", resource is Report)
when { principal in resource.editors };
Relationships and Hierarchy
If there are more than a few users, it quickly becomes impractical to manage access for every individual user, so users belong to groups, and access can be granted to groups.
We don’t need to change the policies to support groups. The “in” operator tests hierarchy membership. All we need to do is tell Cedar about our parent-child relationships. That brings us to “entities”, which is how we tell Cedar about our principal and resource data. For example, some of the entities in our report system might look like:
instance Group::"marketing" = {};
instance Group::"contractors" = {};
instance User::"alice" in [Group::"marketing"] = {};
instance User::"mark" in [Group::"contractors"] = {};
instance Report::"1" = {
    editors: [User::"alice", Group::"marketing"],
    viewers: [Group::"contractors"]
};
permit(principal, action == Action::"viewReport", resource is Report)
when { principal in resource.viewers };
I’m using syntax for entity definitions that is not yet implemented, because it’s easier to read. See this RFC. Entities are usually described by JSON. In fact, everything in Cedar can be described with JSON: entities, policies, schema, etc.
When “mark” views the report, Cedar can work out that “mark” is in the “contractors” group, which is in the “viewers” set.
This already supports nested groups, so you could imagine groups matching an org chart, for example. It’s just up to our application data model to describe the relationships to Cedar.
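To illustrate what that hierarchy test amounts to, here is a small Python sketch that mimics the behaviour of “in” over a parent graph: an entity is “in” a target if it is the target itself or the target is one of its ancestors. This is only a model of the semantics, not how Cedar implements it. The PARENTS map mirrors the entities above, plus one made-up nested group (Group::"external") to show transitivity; is_in and in_any are hypothetical helper names.

```python
# Parent edges, mirroring the entity definitions above plus one nested group.
PARENTS = {
    'User::"alice"': ['Group::"marketing"'],
    'User::"mark"': ['Group::"contractors"'],
    'Group::"contractors"': ['Group::"external"'],  # hypothetical nesting
}


def is_in(entity: str, target: str) -> bool:
    """True if entity equals target or target is one of its ancestors."""
    seen = set()
    stack = [entity]
    while stack:
        current = stack.pop()
        if current == target:
            return True
        if current in seen:
            continue  # guards against cycles in bad data
        seen.add(current)
        stack.extend(PARENTS.get(current, []))
    return False


def in_any(entity: str, targets: list[str]) -> bool:
    """Mimics `principal in resource.viewers` for a set of entities."""
    return any(is_in(entity, t) for t in targets)


viewers = ['Group::"contractors"']
```

With this data, in_any('User::"mark"', viewers) is true because mark’s parent is the contractors group, while the same check for alice is false.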
Similarly, users might want to organize reports into groups, such as folders. You might support that by traversing the directory tree from the report to the root, getting the set of users/groups that have viewer/editor access at each level, and adding all that information to the “report.editors” and “report.viewers” sets.
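The traversal described above could look something like the following Python sketch. The PARENT and GRANTS tables are made-up stand-ins for application data (folder names and IDs are hypothetical), and effective_grants is a name I picked. The result is what you would hand to the policy engine as the report’s editors and viewers attributes.

```python
# Hypothetical application data: each node points to its parent folder.
PARENT = {
    "report:1": "folder:q3",
    "folder:q3": "folder:marketing",
    "folder:marketing": None,  # root
}

# Grants attached directly to each node.
GRANTS = {
    "report:1": {"editors": {'User::"alice"'}, "viewers": set()},
    "folder:q3": {"editors": set(), "viewers": {'Group::"contractors"'}},
    "folder:marketing": {"editors": {'Group::"marketing"'}, "viewers": set()},
}


def effective_grants(node_id):
    """Walk from the node up to the root, merging grants from each level."""
    merged = {"editors": set(), "viewers": set()}
    current = node_id
    visited = set()
    while current is not None and current not in visited:
        visited.add(current)  # stop if the parent chain ever loops
        level = GRANTS.get(current, {})
        for role in merged:
            merged[role] |= level.get(role, set())
        current = PARENT.get(current)
    return merged
```

One trade-off with flattening like this: the walk happens on every check (or has to be cached and invalidated when a folder’s grants change), so deep trees may want the hierarchy expressed as entity parents instead.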
Checking References
Let’s add a wrinkle: reports use datasets to provide charts, but some datasets are sensitive and need to be protected. A user can’t create a report using datasets they don’t have access to, and similarly, a user can’t view a report if it uses data they don’t have access to.
This wrinkle demonstrates references, which is one of those things that might be easy to miss in access control.
permit(principal, action == Action::"viewDataset", resource is Dataset)
when { principal in resource.viewers };
We’ll cover this more later, but note that while Cedar is great, you still need to ensure you execute the proper access checks at the application layer. So we’ll need to update our application code to check these dataset references:
func viewReport(user, reportId) {
    report = loadReport(reportId)
    checkAccess(user, "viewReport", report)
    for ds in report.datasets {
        checkAccess(user, "viewDataset", ds)
    }
    return report
}
func editReport(user, report) {
    checkAccess(user, "editReport", report)
    for ds in report.datasets {
        checkAccess(user, "viewDataset", ds)
    }
    saveReport(report)
}
Cedar seems limited in this respect – it can’t model this relationship, as far as I can tell. It seems like an important characteristic of real world data models. I haven’t done deep research on other policy frameworks like OpenFGA, SpiceDB, OPA/Rego yet. Claude tells me it’s possible to model this in those systems.
// SpiceDB
definition user {}
definition dataset {
    relation viewer: user
    permission view = viewer
}
definition report {
    relation datasets: dataset
    relation direct_viewer: user
    // User can view report if they can view ALL referenced datasets
    permission view = direct_viewer & datasets.all(view)
}
// OpenFGA
type user
type dataset
  relations
    define viewer: [user]
type report
  relations
    define datasets: [dataset]
    define viewer: [user] or viewer from datasets
Perhaps I need to do another deep dive on OpenFGA, OPA/Rego, or SpiceDB. If you want to see that, let me know in the comments.
One item on my TODO list is to come up with a way to model this generically across an application’s data model in Cedar, so that it’s harder to forget to check access to references like this.
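One possible shape for that, sketched in Python under heavy assumptions: declare, per resource type, which attributes hold references and which action guards them, then route every check through a single helper that walks those references. REFERENCES, check_access_deep, AccessDenied, and the toy simple_check function are all hypothetical names, and check is a stand-in for whatever calls the policy engine. This is an application-layer pattern, not something Cedar provides.

```python
class AccessDenied(Exception):
    pass


# Hypothetical registry: for each resource type, the attributes that hold
# references and the action required on each referenced resource.
REFERENCES = {
    "Report": {"datasets": "viewDataset"},
    "Dataset": {},
}


def check_access_deep(check, principal, action, resource):
    """Check the resource itself, then every resource it references.

    Assumes the reference graph has no cycles.
    """
    if not check(principal, action, resource):
        raise AccessDenied(f"{action} on {resource['type']} {resource['id']}")
    for attr, ref_action in REFERENCES[resource["type"]].items():
        for ref in resource.get(attr, []):
            check_access_deep(check, principal, ref_action, ref)


def simple_check(principal, action, resource):
    # Toy stand-in for a policy engine call: every action needs membership
    # in the resource's viewers set.
    return principal in resource["viewers"]


sales = {"type": "Dataset", "id": "sales", "viewers": {"alice", "mark"}}
payroll = {"type": "Dataset", "id": "payroll", "viewers": {"alice"}}
report = {
    "type": "Report",
    "id": "1",
    "viewers": {"alice", "mark"},
    "datasets": [sales, payroll],
}
```

Here alice passes the deep check on the report, while mark is rejected because of the payroll dataset even though he can view the report itself. If handlers only ever call the deep helper, forgetting a reference becomes a registry omission that can be reviewed in one place.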
Listing Reports
Now for something harder: we’ve implemented object-level access control, but we haven’t implemented a “list” function. Solving this problem efficiently can be tricky.
The brute-force approach is to execute the list query, and check the access for each result.
func listReports(principal, listRequest) {
    results = []
    q = buildReportsQueryWithFilters(listRequest.filters)
    for item in executeQuery(q) {
        if cedar.checkAccess(principal, "viewReport", item) {
            results.append(item)
        }
    }
    return results
}
That’s an incomplete solution – pagination adds significant complexity. You might need to run multiple queries to get a full page of results. Cursor-based pagination might be easier to implement than offset-based pagination. You might have to think carefully about the potential overhead of multiple queries – you might be filtering out lots of rows that the user doesn’t have access to. You might need to add some required filters to reduce the potential scope. OpenFGA has some docs with more detail.
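To show why pagination complicates the brute-force approach, here is a Python sketch of cursor-based paging with post-filtering: it keeps fetching batches after the cursor until the page is full or the source runs out. list_page, fetch_batch, and can_view are hypothetical names; fetch_batch stands in for a database query ordered by ascending id, and can_view stands in for the policy check.

```python
def list_page(fetch_batch, can_view, cursor, page_size, batch_size=50):
    """Return (items, next_cursor); next_cursor is None when exhausted.

    fetch_batch(after_id, limit) must return rows ordered by ascending id
    with id > after_id. Both callables are hypothetical stand-ins.
    """
    items = []
    while True:
        batch = fetch_batch(cursor, batch_size)
        for row in batch:
            cursor = row["id"]  # advance past every row we examined
            if can_view(row):
                items.append(row)
                if len(items) == page_size:
                    return items, cursor
        if len(batch) < batch_size:
            return items, None  # source exhausted


# Toy data: ten rows, of which only ids 3, 6 and 9 are visible.
ROWS = [{"id": i, "public": i % 3 == 0} for i in range(1, 11)]


def fetch_batch(after_id, limit):
    return [r for r in ROWS if r["id"] > after_id][:limit]
```

In this toy run, filling a page of two visible rows takes two queries, and most of the fetched rows are thrown away. That waste is the overhead to watch: the sparser the user’s access, the more queries each page costs.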
In our hypothetical system, we don’t need to worry about the overhead of Cedar itself – it can easily handle doing lots of policy evaluations, and it’s (hypothetically) running in-process, so there’s no network overhead or batching to worry about (although that could be a real world concern, depending on how you deploy Cedar).
I left a bug in the listReports() code. Can you see it? Imagine you’re reviewing my PR. We just talked about it. I forgot to check that the user can view the referenced datasets! I can easily imagine this bug happening in the real world – someone implements viewReport(), and then 2 months later someone else implements (or rewrites, or duplicates + tweaks, etc.) listReports().
Partial Evaluation
Cedar has a fascinating capability called partial evaluation. Partial evaluation allows you to evaluate policies with partial data, and Cedar will return Residuals that describe the missing parts. I can’t do a better job explaining it than the Cedar blog post, so I highly recommend checking it out.
The most interesting bit is that you can use partial evaluation for listing which resources a principal has access to, or which users have access to a given resource. This could give us a more efficient implementation of the listReports() function above – we’d convert the residuals into a SQL query, add the filters from the user’s request, and execute a SQL query that will return only the reports the user has access to. Theoretically, that’s a much more efficient implementation of listReports().
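To give a feel for the “residuals to SQL” step, here is a Python sketch that translates a tiny expression tree into a parameterized WHERE clause and runs it with sqlite3. The tuple format is something I made up as a stand-in and is not Cedar’s actual residual representation; to_sql, quote_column, ALLOWED_COLUMNS, and the reports table layout are likewise hypothetical. It only handles simple attribute comparisons, so set or hierarchy membership would need joins that are not shown.

```python
import sqlite3

ALLOWED_COLUMNS = {"id", "owner", "public"}


def quote_column(name):
    # Column names cannot be bound as parameters, so allow-list them.
    if name not in ALLOWED_COLUMNS:
        raise ValueError(f"unknown column: {name}")
    return name


def to_sql(expr):
    """Translate a tiny expression tree into (where_clause, params)."""
    op = expr[0]
    if op in ("and", "or"):
        left_sql, left_params = to_sql(expr[1])
        right_sql, right_params = to_sql(expr[2])
        return f"({left_sql} {op.upper()} {right_sql})", left_params + right_params
    if op == "eq":
        _, column, value = expr
        return f"{quote_column(column)} = ?", [value]
    if op == "in":
        _, column, values = expr
        if not values:
            return "0", []  # an empty set matches nothing
        marks = ", ".join("?" for _ in values)
        return f"{quote_column(column)} IN ({marks})", list(values)
    raise ValueError(f"unsupported operator: {op}")


# Made-up residual: "principal is the owner, or the report is public".
residual = ("or", ("eq", "owner", "alice"), ("eq", "public", 1))
where, params = to_sql(residual)

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE reports (id INTEGER, owner TEXT, public INTEGER)")
db.executemany(
    "INSERT INTO reports VALUES (?, ?, ?)",
    [(1, "alice", 0), (2, "bob", 1), (3, "bob", 0)],
)
visible = [
    row[0]
    for row in db.execute(f"SELECT id FROM reports WHERE {where} ORDER BY id", params)
]
```

The user’s own filters could be added by wrapping the result in another "and" node before translation, so the database does the access filtering and the pagination in one query.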
Policy Templates for Generic Access Control
Policy templates allow us to easily fill in policy details with data from our database. Instead of using the attribute-based relationships we described above, we could store all permissions in a single table, and link the data with policies at evaluation time. We’ll store the access grants in the database:
Table: report_permissions
principal      | report_id | action
-----------------------------------
User/alice     | 1         | edit
User/bob       | 1         | view
User/anonymous | 1         | view
Then we load this data when we check access:
func viewReport(user, reportId) {
    perms = loadReportPerms(user)
    checkAccess(user, "viewReport", perms)
    report = loadReport(reportId)
    return report
}
Access Control is Tricky
Perhaps the moral of this story is, access control is tricky, even at small scales. Take a good, hard look before you leap. Don’t back into it, unaware of the complexity ahead. Maybe Cedar will help you (and an atlas9 framework will help with that), maybe OpenFGA/OPA/etc, or maybe you write your own version, but keep it organized and robust.
Future Work
I’ll do a follow-up post in the future to walk through an actual implementation of all the concepts discussed above.
Bonus: Feature Flags, Entitlements, and more
Many projects make use of feature flags to control the rollout of a feature, to provide long-lived controls for operators (e.g. manually disable expensive actions during overload), to provide entitlements (i.e. access to special features you have to pay for), to run A/B tests, and more.
Many projects pay for a service like LaunchDarkly, or maybe they build their own solution. I wonder if Cedar could provide a good foundation for implementing feature flags.
Perhaps feature flags could be implemented as resources:
permit(
    principal,
    action == Action::Flags::"evaluate",
    resource is Flag
)
when {
    resource.deployed ||
    principal in resource.allow ||
    context.account_percentage < resource.percentage_enabled
}
unless {
    resource.blocked ||
    principal in resource.deny
};
Perhaps entitlements could be implemented as:
permit(
    principal,
    action in Action::"SSO_Actions",
    resource
)
when {
    Features::"SSO" in principal.plan
};
An “enterprise” plan can use entity hierarchy so that “SSO” is in the plan, or the plan could put the SSO feature directly in the “plan” set, for example, if SSO can be sold as an individual add-on feature.
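The flag policy in this section compares context.account_percentage with resource.percentage_enabled, which leaves open where that number comes from. One common approach, sketched here in Python as an assumption and not as anything Cedar provides, is to hash the flag name and account ID into a stable bucket from 0 to 99 and pass it in the request context. The account_percentage function name, the flag name, and the account ID format are hypothetical.

```python
import hashlib


def account_percentage(flag_name: str, account_id: str) -> int:
    """Map (flag, account) to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{flag_name}:{account_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


# The application would put this value into the request context before
# asking the policy engine to evaluate the flag.
context = {"account_percentage": account_percentage("new-editor", "acct-42")}
```

Including the flag name in the hash means each flag gets its own independent rollout population, so the same accounts are not always first in line. Because the bucket is stable, raising percentage_enabled only ever adds accounts and never flips an enabled account back off.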
More Resources
There’s a ton of content out there on this topic. Here are some links if you’re interested in diving deeper:
https://medium.com/intuit-engineering/authz-intuits-unified-dynamic-authorization-system-bea554d18f91
https://www.osohq.com/post/why-authorization-is-hard
https://medium.com/building-carta/authz-cartas-highly-scalable-permissions-system-782a7f2c840f
https://medium.com/airbnb-engineering/himeji-a-scalable-centralized-system-for-authorization-at-airbnb-341664924574
https://authzed.com/blog/casbin
https://dev.to/alex-ac-r/9-access-control-and-permission-management-for-modern-web-app-j6k
