AVRO-4236: [doc] Update security model for Avro IPC #3681
RyanSkraba wants to merge 1 commit into apache:main
  Avro should be surrounded by security measures that prevent attackers from writing
- random data and otherwise interfering with the consumers of schemas.
+ random data and otherwise interfering with the consumers of schemas. In addition,
+ the Avro IPC mechanism should not be exposed on a public network to untrusted actors.
Maybe:
- the Avro IPC mechanism should not be exposed on a public network to untrusted actors.
+ the Avro IPC mechanism should not be exposed on a public network or to untrusted actors.
This is a simple addition, and at the very least it will warn people about the IPC mechanism.
I wonder, though: would it be feasible or useful to write a more elaborate paragraph describing what one would need to do to make IPC safer?
I'm thinking about this! I don't want to suggest that adding a layer of input validation or logging in users would solve the problem (and I don't particularly know how to document how to do that!)
But we could definitely be a bit more categorical:
Avro IPC is intended for use in inter-process communication between trusted components within a controlled and secure network environment. It does not implement input validation, authorization, or authentication, and therefore must not be deployed on public networks or made accessible to untrusted clients or services.
What do you think?
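The proposed wording confines Avro IPC to a controlled network. As a hedged illustration of that restriction only (not of input validation, authentication, or authorization), here is a minimal Python sketch that assumes the IPC endpoint sits behind a stdlib `socketserver.TCPServer`. The names `TRUSTED_NETWORKS`, `is_trusted_peer`, and `AllowlistedTCPServer` are hypothetical and are not Avro APIs. The sketch narrows who can connect; it does nothing about malformed or hostile payloads from a peer that passes the check.

```python
import ipaddress
import socketserver

# Hypothetical allowlist: only loopback and one private range are trusted.
TRUSTED_NETWORKS = [
    ipaddress.ip_network("127.0.0.0/8"),
    ipaddress.ip_network("10.0.0.0/8"),
]


def is_trusted_peer(peer_host: str) -> bool:
    """Return True only if peer_host is a literal IP inside a trusted network."""
    try:
        addr = ipaddress.ip_address(peer_host)
    except ValueError:
        # Hostnames and malformed strings are rejected rather than resolved.
        return False
    # Membership is simply False (not an error) when IP versions differ.
    return any(addr in net for net in TRUSTED_NETWORKS)


class AllowlistedTCPServer(socketserver.TCPServer):
    """Hypothetical TCPServer that drops connections from untrusted peers."""

    def verify_request(self, request, client_address):
        # client_address is (host, port) for IPv4 TCP sockets.
        return is_trusted_peer(client_address[0])
```

Binding the server to `127.0.0.1` instead of `0.0.0.0` is the simplest form of the same idea; an allowlist like this is defense in depth for a private network, not a way to make the endpoint safe for untrusted clients.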
I actually think that what separates gRPC from Avro IPC is not any of these things. It's simply that gRPC is implemented in a more hardened way.
Avro IPC could work on untrusted networks if:
- protocol canonicalization actually worked (maybe by converting to a strict Avro schema rather than JSON, to remove differences between languages?)
- clients couldn't send arbitrary-length nonsense or non-canonical content as the protocol: less detail would be admissible to send to servers (or it would be squashed on receipt)
- the spec had a lot more detail on how to canonicalize 100% consistently across languages
- one client couldn't fill the protocol caches forever: eviction mechanisms with sharding
- protocol hashing were implemented properly on the server side to prevent cache poisoning
- (or maybe you'd use something like a Confluent Schema Registry, but for protocols?)
- auditing were done for "allocating user-controlled buffer lengths" and similar bugs
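The size-limit, canonicalization, and server-side hashing points in the list above can be sketched together. This is a minimal Python sketch under stated assumptions, not how any Avro implementation behaves today: the limit `MAX_PROTOCOL_BYTES`, the helpers `_strip_docs` and `server_protocol_hash`, and the canonical form (sorted keys, compact separators, string-valued `doc` attributes dropped) are all hypothetical. MD5 is used only because the hash fields in the Avro handshake are 16-byte MD5 values.

```python
import hashlib
import json

# Hypothetical cap on the size of a client-supplied protocol, in bytes.
MAX_PROTOCOL_BYTES = 64 * 1024


class ProtocolRejected(ValueError):
    """Raised when a client-supplied protocol is refused before caching."""


def _strip_docs(node):
    # Drop string-valued "doc" attributes, but keep a map entry such as a
    # message that happens to be named "doc" (its value is an object).
    if isinstance(node, dict):
        return {
            k: _strip_docs(v)
            for k, v in node.items()
            if not (k == "doc" and isinstance(v, str))
        }
    if isinstance(node, list):
        return [_strip_docs(v) for v in node]
    return node


def server_protocol_hash(protocol_text: str) -> bytes:
    """Size-check, parse, and canonicalize a protocol, then hash it server-side.

    The client's own hash is deliberately not an input: the server never
    uses a client-asserted hash as a cache key.
    """
    raw = protocol_text.encode("utf-8")
    if len(raw) > MAX_PROTOCOL_BYTES:
        raise ProtocolRejected(f"protocol is {len(raw)} bytes; limit is {MAX_PROTOCOL_BYTES}")
    try:
        parsed = json.loads(raw)
    except ValueError as exc:  # includes json.JSONDecodeError
        raise ProtocolRejected("protocol is not valid JSON") from exc
    except RecursionError as exc:
        raise ProtocolRejected("protocol is nested too deeply") from exc
    if not isinstance(parsed, dict):
        raise ProtocolRejected("protocol must be a JSON object")
    try:
        # Hypothetical canonical form: sorted keys, no insignificant whitespace.
        canonical = json.dumps(_strip_docs(parsed), sort_keys=True, separators=(",", ":"))
    except RecursionError as exc:
        raise ProtocolRejected("protocol is nested too deeply") from exc
    # ensure_ascii defaults to True, so the canonical text is pure ASCII.
    return hashlib.md5(canonical.encode("ascii")).digest()
```

The design choice is that the server keys its cache by the hash it computes and never stores a protocol under a hash the client asserted. The catch is that a client's hash-only fast path can only hit such a cache if every language canonicalizes identically, which is exactly why the spec would need to pin the rules down.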
These aren't intractable, and Avro IPC looks very interesting to me! But the implementations just aren't designed for untrusted input today.
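The cache-eviction point from the list can be sketched as a bounded, sharded LRU. This Python sketch assumes the server has some per-connection `client_id` to shard on (a hypothetical input; Avro IPC does not define one) and that protocols are stored under a server-computed hash. `ShardedProtocolCache` and its parameters are illustrative names only. Memory is bounded by `max_shards * max_entries_per_shard` protocols, each of which should also be size-capped on receipt.

```python
from collections import OrderedDict


class ShardedProtocolCache:
    """Hypothetical protocol cache: one small LRU shard per client, bounded shard count."""

    def __init__(self, max_shards=1024, max_entries_per_shard=16):
        if max_shards < 1 or max_entries_per_shard < 1:
            raise ValueError("limits must be at least 1")
        self._max_shards = max_shards
        self._max_entries = max_entries_per_shard
        # client_id -> OrderedDict(protocol_hash -> protocol), in LRU order.
        self._shards = OrderedDict()

    def _shard_for(self, client_id):
        shard = self._shards.get(client_id)
        if shard is None:
            shard = OrderedDict()
            self._shards[client_id] = shard
            if len(self._shards) > self._max_shards:
                # Evict the least recently used client's whole shard.
                self._shards.popitem(last=False)
        else:
            self._shards.move_to_end(client_id)
        return shard

    def put(self, client_id, protocol_hash, protocol):
        shard = self._shard_for(client_id)
        shard[protocol_hash] = protocol
        shard.move_to_end(protocol_hash)
        if len(shard) > self._max_entries:
            # One client can only ever displace its own oldest entries.
            shard.popitem(last=False)

    def get(self, client_id, protocol_hash):
        shard = self._shards.get(client_id)
        if shard is None or protocol_hash not in shard:
            return None
        self._shards.move_to_end(client_id)
        shard.move_to_end(protocol_hash)
        return shard[protocol_hash]

    def __len__(self):
        return sum(len(shard) for shard in self._shards.values())
```

If `client_id` is something an attacker can vary freely (a source port, say), the shard cap still bounds memory, but one attacker can then push other clients' shards out; this limits the damage rather than preventing abuse.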
With the Avro SASL profile, there actually is authentication, though the documentation is very sketchy (to put it mildly). And when reading the code, I see that implementing an authentication method requires non-trivial work. Also, it is not immediately clear how to add encryption.
All in all, Avro IPC would benefit from the hardening that @lf- mentions and more documentation. Ideally with a detailed example that uses an SSL connection with DIGEST or OAUTHBEARER authentication.
What is the purpose of the change
Emphasize the importance of not exposing the Avro IPC mechanism on public networks.
Verifying this change
This is a documentation change.
Documentation