# Clear everything
rm(list = ls())
# Install required packages if not already installed
options(repos = c(CRAN = "https://cloud.r-project.org"))
if (!requireNamespace("httr", quietly = TRUE)) install.packages("httr")
if (!requireNamespace("jsonlite", quietly = TRUE)) install.packages("jsonlite")
if (!requireNamespace("dotenv", quietly = TRUE)) install.packages("dotenv")
# Load packages
library(httr)
library(jsonlite)
library(dotenv)
# Load environment variables
load_dot_env()
1. The Serotype Database API: Introduction
Introduction to the Serotype Database
The Serotype Database (serotype.org) implements the concepts detailed in the paper by Osoegawa et al. (HLA. 2022 Sep;100(3):193-231). In that work, the authors describe a new strategy for mapping recently identified HLA alleles, especially those discovered by molecular typing but lacking serotype assignments: key amino acid residues are correlated with known serologic epitopes in order to define a serotype for each allele.
The Serotype Database builds these findings into an easy-to-use website with a robust API that streamlines the retrieval of serological assignments for HLA alleles, making it easier for researchers, clinicians, and laboratories to integrate these data into their analyses and to support transplant-related decision-making.
1.1 What is an API?
An API (Application Programming Interface) is a way for different software components or services to talk to each other. When you make a request to an API, you’re essentially sending some information or instructions to a server, which then processes those instructions and sends back data (or confirmation of a task). APIs allow developers to interact with external services without necessarily understanding all their internal details.
1.2 What is GraphQL?
GraphQL is a query language for APIs that provides a more flexible and efficient way to get the data you need. Instead of having multiple endpoints (as is common in RESTful APIs), GraphQL typically has a single endpoint where you send a query describing exactly what data you want. The query itself is written in GraphQL's own syntax, but it is typically sent to the server wrapped in a JSON object, and the results come back as JSON.
1.3 What is JSON and How Does GraphQL Use It?
JSON (JavaScript Object Notation) is a lightweight data-interchange format that is easy for humans to read and write, and easy for machines to parse and generate. It is built on two structures:
Objects (enclosed by curly braces) containing name-value pairs, for example:
{ "name": "John", "age": 30 }
Arrays (enclosed by square brackets), for example:
[ "apple", "banana", "cherry" ]
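As a minimal sketch of how these two structures map into R, assuming the jsonlite package loaded earlier, a JSON object becomes a named list and a JSON array of strings becomes a character vector:

```r
library(jsonlite)

# A JSON object becomes a named list in R
person <- fromJSON('{ "name": "John", "age": 30 }')
person$name
person$age

# A JSON array of strings becomes a character vector
fruits <- fromJSON('[ "apple", "banana", "cherry" ]')
fruits
```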
GraphQL typically sends and receives data in JSON format. When you send a GraphQL query or mutation to the server, you include the query string (and optionally, variables) in a JSON object. The GraphQL server processes your request, then sends back a JSON response containing the requested data (or any errors).
For example, if you run this query:
query {
  alleleToSerotype(
    alleles: ["A*02:01"],
    resolution: two_field
  ) {
    allele
    serotype
  }
}
The response from the GraphQL API is JSON, which might look like:
{
  "data": {
    "alleleToSerotype": [
      {
        "allele": "A*02:01",
        "serotype": "A-0201"
      }
    ]
  }
}
Because JSON is a standard format, many programming languages—including R, Python, and JavaScript—can parse this response easily for further analysis or display.
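Both halves of this exchange can be sketched offline in R. The example below wraps the query text in a JSON object and then parses a hand-written copy of the sample response; it does not contact the API, and the variable names are illustrative only.

```r
library(jsonlite)

# The GraphQL query is plain text; it travels inside a JSON object
# under the "query" key
query_string <- 'query { alleleToSerotype(alleles: ["A*02:01"], resolution: two_field) { allele serotype } }'
request_body <- toJSON(list(query = query_string), auto_unbox = TRUE)
request_body

# A hand-written copy of the sample response (not a live API call)
sample_response <- '{
  "data": {
    "alleleToSerotype": [
      { "allele": "A*02:01", "serotype": "A-0201" }
    ]
  }
}'

# fromJSON turns the array of objects into a data frame
parsed <- fromJSON(sample_response)
parsed$data$alleleToSerotype
```

Because the array of results becomes a data frame, each requested field (here allele and serotype) ends up as a column.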
1.4 How to call the serotype.org GraphQL API from R
Setup Required Packages
We’ll use the following packages:
- httr: to send HTTP requests.
- jsonlite: to parse JSON responses into R objects.
- dotenv: to load your Serotype Database API key from a local .env file.
Set API Key
To query the Serotype Database API, you will need an API key, which is available for free by signing up for an account at https://www.serotype.org/user.
If you are familiar with how to set R environment variables, you can save the key to your .env file as SEROTYPE_API_KEY=YOUR_API_KEY, replacing YOUR_API_KEY with your actual API key. This is considered best practice because it keeps your private API key separate from your code, enhancing security and making your code easier to share or collaborate on without exposing sensitive information.
Alternatively, you can set the value directly in the code block below by assigning your API key to apiKeyOverride. However, be cautious: if you choose to embed your API key in the code, ensure you remove it before sharing the file, as each user must use their own unique API key for security and proper functionality.
# Check for the API key in environment variables
apiKey <- Sys.getenv("SEROTYPE_API_KEY", unset = NA)
# Allow manual override of the API key by user here
apiKeyOverride <- "" # Set this to your API key manually if not using environment variables
# Use the override if provided, otherwise use the environment variable value
if (!is.null(apiKeyOverride) && nzchar(apiKeyOverride)) {
  apiKey <- apiKeyOverride
}
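If neither source provides a key, apiKey is left as NA and the request will fail later with a less obvious message. One way to fail early is sketched below; resolve_api_key is a hypothetical helper written for this tutorial, not part of dotenv or httr.

```r
# Hypothetical helper (not part of any package): use the override if set,
# otherwise fall back to the environment variable, and stop if neither exists
resolve_api_key <- function(override = "", env_var = "SEROTYPE_API_KEY") {
  if (!is.null(override) && !is.na(override) && nzchar(override)) {
    return(override)
  }
  env_value <- Sys.getenv(env_var, unset = NA)
  if (is.na(env_value) || !nzchar(env_value)) {
    stop("No API key found: set ", env_var, " in your .env file or pass an override.")
  }
  env_value
}

# Example with a placeholder value; a real key would come from your account
resolve_api_key(override = "demo-key")
```

With this helper defined, the final if block above could be replaced by apiKey <- resolve_api_key(apiKeyOverride).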
1.5 Example Serotype Database API Query
Based on the example in 1.3, we will query the API for a specific allele (A*02:01) at two_field resolution.
The JSON response is then parsed in R with the function fromJSON to create an R variable that contains this information.
# Define the API endpoint
url <- "https://serotype.org/api/graphql"
# Define the GraphQL query in a single string
query_string <- '
query {
  alleleToSerotype(
    alleles: ["A*02:01"],
    resolution: two_field
  ) {
    allele
    serotype
  }
}
'
# Make the POST request; encode = "json" wraps the body in a JSON object
response <- POST(
  url,
  body = list(query = query_string),
  encode = "json",
  add_headers(`x-api-key` = apiKey)
)
# Parse the JSON response (the encoding is stated explicitly to avoid guessing)
res_data <- fromJSON(content(response, "text", encoding = "UTF-8"), flatten = TRUE)
# Inspect the data
res_data
$data
$data$alleleToSerotype
allele serotype
1 A*02:01 A-0201
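The output above assumes the request succeeded. GraphQL servers conventionally report problems in a top-level errors field of the response, so it can help to check for that field before using the data. The sketch below uses a hypothetical helper, parse_graphql_response, and a hand-written sample response rather than a live call; the exact error messages returned by serotype.org are not shown in this tutorial and are not assumed here.

```r
library(jsonlite)

# Hypothetical helper: parse a GraphQL response body and stop if the server
# reported anything in the top-level "errors" field
parse_graphql_response <- function(json_text) {
  parsed <- fromJSON(json_text, flatten = TRUE)
  if (!is.null(parsed$errors)) {
    stop("GraphQL error: ", paste(parsed$errors$message, collapse = "; "))
  }
  parsed$data
}

# Hand-written sample response (not a live API call)
ok_text <- '{"data":{"alleleToSerotype":[{"allele":"A*02:01","serotype":"A-0201"}]}}'
ok_data <- parse_graphql_response(ok_text)
ok_data$alleleToSerotype
```

With a live request, the same helper could be called as parse_graphql_response(content(response, "text", encoding = "UTF-8")), optionally after httr's stop_for_status(response) to catch HTTP-level failures first.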
To extract just the alleleToSerotype part into a data frame:
df <- res_data$data$alleleToSerotype
df
allele serotype
1 A*02:01 A-0201
To extract just the serotype values as a character vector:
serotype <- res_data$data$alleleToSerotype$serotype
serotype
[1] "A-0201"
This shows how we can use R to make a simple GraphQL query to the Serotype Database, parse the result, and examine it.
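To look up more than one allele, the query string can be generated rather than typed by hand. The sketch below uses a hypothetical helper, build_allele_query, that reuses the query shape from the example; it assumes the alleles argument accepts more than one entry, as its list form suggests, and it only builds the string without contacting the API.

```r
# Hypothetical helper: build the query string for one or more alleles,
# reusing the query shape from the example above
build_allele_query <- function(alleles, resolution = "two_field") {
  # Quote each allele and join them into a comma-separated list
  allele_list <- paste0('"', alleles, '"', collapse = ", ")
  sprintf(
    "query { alleleToSerotype(alleles: [%s], resolution: %s) { allele serotype } }",
    allele_list, resolution
  )
}

# The second allele is included only to illustrate the list form
multi_query <- build_allele_query(c("A*02:01", "B*07:02"))
cat(multi_query, "\n")
```

The resulting string can be passed as query_string in the POST request shown earlier.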