GraphQL has come a long way since Facebook first announced this open-sourced spec in 2015. Since that debut, prominent companies have embraced this technology to enhance feature velocity and shrink product development cycles. To mention some, Fortune 500 enterprises, such as Intuit, Walmart, The New York Times, and NBC have employed GraphQL. Also, refusing to be left behind, global startups, such as Airbnb, Docker, Github, Uber and Yelp are just a few of those who have followed suite and added this technology to its long-term strategy.
To smooth out your implementation of the GraphQL backend at your company, here are five things you need to keep in mind:
- The learning curve when it comes to Implementing a GraphQL server from scratch is hard. But the ever-improving availability of community tooling is making the work easier.
- It is very important to figure out where a GraphQL server fits into your current back-end architecture.
- You need to carefully think through your GraphQL server's performance. Again, community tooling can help.
- Securing GraphQL APIs and queries is inherently different compared to REST APIs, especially with data access control lists (ACL).
- Support for specialized data, such as real-time data or data types such as geo-location, needs to be explicitly added.
GraphQL knowledge curve
To implement a GraphQL server you need to:
- Author a GraphQL schema that defines:
- The data types that your API will interact with and the relationships between them.
- Specification for queries (for reads) and mutations (for writes).
- Write resolvers for queries and mutations - the code that actually executes the queries (i.e. talks to and/or manipulates your sources of data such as databases, REST, GraphQL APIs, etc.). Resolvers need to handle errors and exceptions from the data source and, just as a with a normal GraphQL response, convert the corresponding responses to JSON. If you want to support source specific features like custom SQL functions, stored procedures, etc., this needs to be factored in as well.
- If you need event-driven real-time APIs (similar to the push equivalents in REST like Webhooks and Websockets: see How and Why To Provide Event-Driven Streaming APIs), then support for subscriptions (the GraphQL name for live queries) will have to be added.
Architecture of a basic GraphQL backend
All of this results in a lot of boilerplate, server-side code and complexity like passing context around for data sharing and handling relationships between different entities in the result set, etc.
Community tooling for GraphQL addresses this complexity, especially when dealing with popular relational or NoSQL databases. They do this in one of two ways:
- Beginning with the database: auto-generating a ready-to-use GraphQL server from your new or existing database schema. If you need to extend this schema to include other unsupported data sources, you will have to do so manually. And, there are some GraphQL tools on the market that work with Postgres databases.
- Going in the opposite direction and beginning with a GraphQL schema: if you handwrite a GraphQL schema, the database schema and the resolver code can be auto-generated. A potential downside of such a solution is the possibility of opaqueness at the database layer, and therefore not being able to take advantage of the underlying database.
Architecture
Let's say you have a standalone GraphQL server up and running. Now where exactly does this component fit into your back-end architecture? A few questions will need to be addressed:
- How will the GraphQL server interact with authorization middleware (new or existing)?
- If needed, how are multiple sources of data stitched together? The relationships between different sources of data may also need to be defined.
- What's the best way to handle a mix of REST and GraphQL APIs to support your front-end?
- What's a clean way to handle business logic (pre and post CRUD)? This is especially critical when you want to migrate a REST API that tightly couples business logic with the CRUD operations (using code or database stored procedures, etc.). For example, how do you handle the following story:
New signup request -> Validation -> Insert user details in DB -> Send welcome email - How does your backend scale with GraphQL APIs?
The answers to these questions vary greatly depending on your existing architecture and the GraphQL tools you decide to use. Broadly speaking, however, the following are good guidelines to stick to:
- Front your REST and GraphQL APIs with an API gateway.
- Use a proxy to stitch different GraphQL schemas together (or to leave the option open to do so in the future).
Stitching together different GraphQL schemas using a GraphQL proxy
- Handling business logic (Pre/Post CRUD): you may have to execute some business logic before or after interacting with your database. You can choose either one or both of these patterns, depending on your use-cases:
- Write custom GraphQL APIs that execute business logic along with CRUD (need to write back-end code).
- Separate CRUD and business logic — subscribe to changes in data models and then execute business logic (need subscriptions support).
- Handling authentication: depending on how your client apps interact with your GraphQL API, you can handle authentication by reusing your existing middleware in the following ways:
- Clients directly call your API: Use a webhook to delegate auth to your existing middleware. Your GraphQL API can pass the auth tokens from your client to a webhook that interacts with your middleware and returns sessions variables.
- Clients contact your GraphQL endpoint via an API Gateway: If you aren't already doing so, this is a good opportunity to centralize your session resolution, benefiting not just your GraphQL API, but also every other service in your back-end.
- Scalability: Your GraphQL server should never become a bottleneck for scale or performance. It should scale linearly with more back-end resources. Design and engineer your server accordingly to leverage the underlying infrastructure and optimally interact with your databases.
Performance
Poorly written GraphQL implementations are prone to running into the n+1 problem — superfluous calls to the database for nested objects, a problem well-known in the ORM world too. For example, when you want to fetch a list of authors and each of their respective articles, an inferior implementation will make one query for a list of n authors and then n queries for each author's articles.
While the n+1 problem and its solutions are well documented, ensuring compliance when extending your GraphQL schema is still an overhead. Most GraphQL tools that work with relational databases leverage SQL relationships (1:1 and 1:m based on foreign key constraints) to batch queries and avoid the problem for this part of the larger GraphQL schema. If you have access/visibility into the database layer, you can also use SQL views to solve this problem yourself.
Security
There are three major security concerns you should address:
- Authentication and session resolution for GraphQL queries: As highlighted above, handling authorization middleware for GraphQL API requests is essential. This concern is not very different compared to the REST scheme of things, but may have certain implications for handing data ACL, as we'll see below.
Granular access control for data: The advantage of being able to run any query on a GraphQL schema comes at the cost of having to granularly control access. Let's take the example of a To-Do application; you want users to be able to CRUD only their to-dos. Ideally, you should aim for the capability to configure ACL rules such as:
- For the role "user", allow read/write access to rows where value in "user_id" column = HTTP-header-user-ID.
- For the role "admin", allow read access to all rows.
So you will need a role-based permissioning system that controls access to data based on configurable rules. And your auth middleware must be able to resolve auth tokens to session variables that include information about the assigned roles for the requesting user.
- Query whitelisting: To prevent unauthorized use of GraphQL queries, in some cases, support for whitelisting allowed queries may be required. (Facebook does this; please note that this doesn't affect GraphQL's inherent flexibility and can be implemented in production alone.) This setup can also be leveraged for monitoring performance!
Special Data types
Real-time data: Real-time features in apps are handled by implementing GraphQL subscriptions. Subscriptions are usually implemented with websockets, which allow clients to subscribe to notifications and payload of changes to a dataset specified by a subscription query. Thankfully, most community tools come with varying degrees of support for subscriptions out of the box.
Niche data types: If you are coming from an RDBMS world and prefer working with special data types like geo-location (latitude/longitude from PostGIS), date/timestamp types, etc., you will have to explicitly add support for them. Again, some GraphQL server tools that leverage the underlying databases provide out-of-the box support for these data types.
Final Thoughts
GraphQL can be a game-changer for your organization; it can dramatically shrink product development cycles and improve feature throughput. However, implementing a GraphQL back-end from scratch is non-trivial.
Thankfully, one of GraphQL's best aspects is the continually innovating community that has sprung up around it. The community has developed tools, tutorials and best practices for most of the challenges described above.
If you're looking to build a GraphQL backend, it is advisable that you spend time examining the various community tools before you get started so that you can quickly get to evaluating its benefits.