SFS Error Handling
Context and Problem Statement
Error Sources
Filesystem.
ENOSPC, EIO, corruption, permissions, write / read / close / open errors.
SQLite.
See SQLite Docs: Result and Error Codes for details.
sqlite_orm makes the primary result codes available via std::system_error.
Critical: SQLITE_INTERNAL, SQLITE_PERM, SQLITE_NOMEM, SQLITE_READONLY, SQLITE_IOERR, SQLITE_CORRUPT, SQLITE_NOTFOUND, SQLITE_FULL, SQLITE_CANTOPEN, SQLITE_TOOBIG, SQLITE_MISMATCH, SQLITE_MISUSE, SQLITE_NOLFS, SQLITE_AUTH, SQLITE_RANGE, SQLITE_NOTADB
Not critical (transaction aborts, deadlocks, busy database, constraint violation, etc.): SQLITE_ABORT, SQLITE_BUSY, SQLITE_LOCKED, SQLITE_INTERRUPT, SQLITE_PROTOCOL, SQLITE_SCHEMA, SQLITE_CONSTRAINT
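The classification above could be captured in a small helper. The following is a minimal sketch, not the SFS implementation: the enumerator names, the Severity type, and classify() are hypothetical, and the numeric values are the primary result codes documented by SQLite, redefined here only so the sketch compiles without <sqlite3.h>. Codes that this document does not list are reported as Unclassified rather than guessed.

```cpp
#include <cassert>

// Primary result codes with the numeric values documented by SQLite.
// Production code would use the SQLITE_* macros from <sqlite3.h> instead.
enum SqliteCode : int {
  kInternal = 2, kPerm = 3, kAbort = 4, kBusy = 5, kLocked = 6, kNomem = 7,
  kReadonly = 8, kInterrupt = 9, kIoerr = 10, kCorrupt = 11, kNotfound = 12,
  kFull = 13, kCantopen = 14, kProtocol = 15, kSchema = 17, kToobig = 18,
  kConstraint = 19, kMismatch = 20, kMisuse = 21, kNolfs = 22, kAuth = 23,
  kRange = 25, kNotadb = 26
};

enum class Severity { Critical, NonCritical, Unclassified };

// Hypothetical helper: map a (possibly extended) result code to the
// classification listed in this document.
constexpr Severity classify(int code) {
  switch (code & 0xff) {  // extended codes keep the primary code in the low byte
    case kAbort: case kBusy: case kLocked: case kInterrupt:
    case kProtocol: case kSchema: case kConstraint:
      return Severity::NonCritical;
    case kInternal: case kPerm: case kNomem: case kReadonly:
    case kIoerr: case kCorrupt: case kNotfound: case kFull:
    case kCantopen: case kToobig: case kMismatch: case kMisuse:
    case kNolfs: case kAuth: case kRange: case kNotadb:
      return Severity::Critical;
    default:
      return Severity::Unclassified;  // not covered by the lists above
  }
}
```

Masking with 0xff means extended result codes are classified by their primary code.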
Failed transaction retries.
We retried a transaction that threw SQLITE_BUSY too often.
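A bounded retry loop of this kind might look as follows. This is a sketch under assumptions: run_with_retry(), TxResult, and the demo error category are hypothetical names, and the category is a stand-in for the one sqlite_orm attaches to its std::system_error exceptions, so that the example compiles without the library.

```cpp
#include <cassert>
#include <string>
#include <system_error>

constexpr int kSqliteBusy = 5;  // numeric value of SQLITE_BUSY

// Stand-in for the error category used by the SQLite wrapper.
class DemoSqliteCategory : public std::error_category {
 public:
  const char* name() const noexcept override { return "demo_sqlite"; }
  std::string message(int code) const override {
    return "sqlite result code " + std::to_string(code);
  }
};

inline const std::error_category& demo_sqlite_category() {
  static const DemoSqliteCategory instance;
  return instance;
}

enum class TxResult { Committed, TooManyRetries };

// Hypothetical helper: run `transaction` up to `max_attempts` times while it
// throws SQLITE_BUSY. Every other exception propagates unchanged.
template <typename Fn>
TxResult run_with_retry(Fn&& transaction, int max_attempts) {
  assert(max_attempts > 0);
  for (int attempt = 0; attempt < max_attempts; ++attempt) {
    try {
      transaction();
      return TxResult::Committed;
    } catch (const std::system_error& e) {
      const bool busy = e.code().category() == demo_sqlite_category() &&
                        (e.code().value() & 0xff) == kSqliteBusy;
      if (!busy) {
        throw;  // not a busy database: leave it to the caller
      }
      // Busy: try again. A real loop would likely wait between attempts.
    }
  }
  return TxResult::TooManyRetries;  // the "retried too often" error source
}
```

Checking the category as well as the value matters here, because the same integer means something else in other categories (5 is EIO as an errno value on Linux).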
Requests to non-existing data. Bucket, object, version, or user does not exist.
Out of scope. Rate limiting, broken requests (parse failures, etc.).
Layers
RGW OPs (this document). Translates RGW error codes to S3/HTTP compatible responses. We have a generic exception handler that translates exceptions into 500 / Internal Server Error.
SAL.
See rgw_sal.h.
Errors returned via negative return codes. See rgw_common.{cc,h}.
SFS: SAL Implementation (this document). This is where we use SFS SQLite to implement SAL logic. Example: Atomic Writer.
SFS: SQLite (this document).
Methods and functions that run SQLite queries and transactions.
Examples: SQLiteVersionedObjects::get_versioned_objects, Object::metadata_finish
Filesystem.
Typically errno-style errors. With the STL, sometimes exceptions.
sqlite_orm.
Throws std::system_error with the SQLite error code.
Decision
RGW OPs Layer
In addition to the regular RGW error handling, we have an exception handler in place.
Transforms critical errors into shutdowns / crashes. Critical errors may originate from sqlite_orm or filesystem operations.
Transforms non-critical errors into 500 / Internal Server Error.
Non-critical errors should not bubble up to this handler; if one does, it is considered a bug.
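The handler described above can be sketched roughly like this. All names are hypothetical: run_op_guarded() stands in for the code that wraps an RGW OP, and IsCritical is a caller-supplied predicate that would encode the critical / non-critical lists from this document. Treating exceptions that carry no error code as non-critical is an assumption of the sketch, not something this document decides.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <system_error>

// Hypothetical hook: decides whether an error code is critical.
using IsCritical = std::function<bool(const std::error_code&)>;

constexpr int kHttpInternalServerError = 500;

// Runs one operation and returns its HTTP status. `op` stands in for an RGW OP
// whose regular error handling has already produced a status code.
template <typename Op>
int run_op_guarded(Op&& op, const IsCritical& is_critical) {
  assert(is_critical && "a classification predicate is required");
  try {
    return op();
  } catch (const std::system_error& e) {
    if (is_critical(e.code())) {
      std::terminate();  // critical: stop the process instead of continuing
    }
    // A non-critical error reached the top-level handler. The request still
    // gets a response, but a lower layer should have handled this.
    return kHttpInternalServerError;
  } catch (const std::exception&) {
    // Assumption: exceptions without an error code are treated as non-critical.
    return kHttpInternalServerError;
  }
}
```

Keeping the classification in a predicate lets one handler cover sqlite_orm and filesystem errors, since both can surface as std::system_error.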
SFS: SQLite Layer
Must not throw non-critical errors. Critical errors are OK to bubble up.
Options to return errors:
boolean returns, where true means the operation was carried out and false means it was not.
Useful when the exact cause isn't important to the layer above.
negative integer style returns, where the integer should be something unique to SFS. It should not be an RGW error code, filesystem error, SQLite error, etc.
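The two return styles might look like this in practice. The sketch uses an in-memory map as a stand-in for a SQLite-backed table; DemoObjectTable, its methods, and the sfs_error constants are all hypothetical. The offset of -100000 is an arbitrary choice made here to stay clear of errno values, SQLite result codes, and RGW error codes.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical SFS-specific error values in a range of their own.
namespace sfs_error {
constexpr int kNotFound = -100001;
constexpr int kNameTaken = -100002;
}  // namespace sfs_error

class DemoObjectTable {  // in-memory stand-in for a SQLite-backed table
 public:
  void insert(const std::string& name, int size) { rows_[name] = size; }

  // Boolean style: true means the row was removed, false means it was not.
  // The caller does not learn why.
  bool remove(const std::string& name) { return rows_.erase(name) == 1; }

  // Negative integer style: 0 on success, otherwise an SFS-specific code
  // that tells the layer above what went wrong.
  int rename(const std::string& from, const std::string& to) {
    auto it = rows_.find(from);
    if (it == rows_.end()) return sfs_error::kNotFound;
    if (rows_.count(to) != 0) return sfs_error::kNameTaken;
    const int size = it->second;
    rows_.erase(it);
    rows_[to] = size;
    return 0;
  }

 private:
  std::map<std::string, int> rows_;
};
```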
SFS: SAL Implementation Layer
Must handle non-critical lower-level errors and return RGW error codes. May catch and rethrow critical exceptions.
Example:
A failed transaction from SFS SQLite returns false.
The SFS SAL implementation uses that to clean up the request and return an ERR_INTERNAL_ERROR.
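As a rough sketch of this example: complete_write() and its two callbacks are hypothetical, and kErrInternalError is a placeholder for the ERR_INTERNAL_ERROR definition in rgw_common.h, whose actual value should be taken from that header. The negative sign follows the SAL convention of returning errors as negative return codes.

```cpp
#include <cassert>
#include <functional>

// Placeholder for the RGW definition; the real value comes from rgw_common.h.
constexpr int kErrInternalError = 2200;

// Hypothetical SAL-level step: commit through the SFS SQLite layer and
// translate its boolean result into an RGW-style return code.
int complete_write(const std::function<bool()>& commit_transaction,
                   const std::function<void()>& cleanup) {
  assert(commit_transaction && cleanup);
  if (!commit_transaction()) {
    cleanup();                  // release whatever the request had set up
    return -kErrInternalError;  // SAL reports errors as negative codes
  }
  return 0;
}
```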
On this layer it is important that clients may retry on certain errors before failing a request. We can leverage this where it is easier or cheaper to let the client retry than to retry ourselves.