The Array + ArrayImpl traits still don't feel quite right. Implementing a new encoding involves quite a lot of surface area, the weird underscore functions that hide ArrayImpl from the public API are a little gross, and the design doesn't fit well with handling arrays from FFI.
Much of this can be fixed by creating an Array struct that wraps some opaque heap-allocated data. This could be something like Array(Arc<dyn ArrayImpl>).
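As a rough sketch of what that wrapper could look like (the trait methods and the `ToyPrimitive` encoding are hypothetical and trimmed down for illustration, not the real Vortex API):

```rust
use std::sync::Arc;

/// Hypothetical, heavily trimmed stand-in for the encoding-side trait.
trait ArrayImpl: Send + Sync {
    fn len(&self) -> usize;
    fn encoding_name(&self) -> &'static str;
}

/// Opaque, cheaply cloneable handle. The encoding is fully type-erased.
#[derive(Clone)]
struct Array(Arc<dyn ArrayImpl>);

impl Array {
    /// The public API lives on the struct and forwards to the impl.
    fn len(&self) -> usize {
        self.0.len()
    }

    fn is_empty(&self) -> bool {
        self.len() == 0
    }

    fn encoding_name(&self) -> &'static str {
        self.0.encoding_name()
    }
}

/// Toy encoding used only for illustration.
struct ToyPrimitive {
    values: Vec<i32>,
}

impl ArrayImpl for ToyPrimitive {
    fn len(&self) -> usize {
        self.values.len()
    }

    fn encoding_name(&self) -> &'static str {
        "toy.primitive"
    }
}

impl ToyPrimitive {
    /// Erase the concrete type behind the opaque handle.
    fn into_array(self) -> Array {
        Array(Arc::new(self))
    }
}
```

With this shape, callers only ever see `Array`, so the underscore functions are no longer needed to hide `ArrayImpl`, and an FFI-backed encoding would be just another `ArrayImpl` behind the `Arc`.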
In a former life, we had something more like this:
But this forces us to keep metadata serialized in memory, which makes creating arrays quite expensive. On the upside, it does give us a pretty much ABI-stable Array struct.
We could relax this slightly, and support heap-allocated metadata, i.e.
But there is still a question of whether something lives in metadata, or lives in a buffer. For example, a ConstantArray with a large scalar value should presumably store that value in a buffer, rather than metadata. Therefore we still have to serialize the value when we create the array. And so we end up pushing buffers into the dyn Any too.
We then face the question of whether we should hold children as an opaque Array, or whether it's better to hold a specific child type. For example, FSSTArray holds a VarBin child. And so perhaps we also push children into the dyn Any!
At this point, we may as well keep going, push everything into the opaque heap data, and we end up back with: Array(Arc<dyn ArrayImpl>).
The problem with this solution is what to do with PrimitiveArray::new. Should it return Array(Arc<dyn Any>) and be completely type erased? Should it return PrimitiveArray and only expose the functions on ArrayImpl, rather than Array? Both are bad solutions in my opinion.
So one final option we haven't considered is whether we have struct Array<Encoding>(Encoding) and we type alias type PrimitiveArray = Array<PrimitiveEncoding>. This solves many of the problems above in that the encoding can hold typed scalars and children, we don't expose the internal API (the ones without post-validation), and PrimitiveArray::new returns a PrimitiveArray. The slight issue is that we cannot add new functions to PrimitiveArray if it lives in a third-party crate without the slight annoyance of defining a PrimitiveArrayExt trait. The bigger issue is that we can't really pass these arrays around, e.g. into compute functions, since they hold a generic type.
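To make the trade-offs concrete, here is a minimal sketch of the generic-wrapper option. The `Encoding` trait, its `len` method, the `values` accessor, and the `sum` and `total_len` functions are all assumptions made up for the example:

```rust
/// Hypothetical trait that each encoding implements.
trait Encoding {
    fn len(&self) -> usize;
}

/// Generic wrapper: the public API lives here and dispatches to the encoding.
struct Array<E>(E);

impl<E: Encoding> Array<E> {
    fn len(&self) -> usize {
        self.0.len()
    }

    fn is_empty(&self) -> bool {
        self.len() == 0
    }
}

/// Toy encoding that holds typed data directly, with nothing serialized.
struct PrimitiveEncoding {
    values: Vec<i64>,
}

impl Encoding for PrimitiveEncoding {
    fn len(&self) -> usize {
        self.values.len()
    }
}

type PrimitiveArray = Array<PrimitiveEncoding>;

impl PrimitiveArray {
    /// The constructor returns the typed alias, not an erased handle.
    fn new(values: Vec<i64>) -> PrimitiveArray {
        Array(PrimitiveEncoding { values })
    }

    fn values(&self) -> &[i64] {
        &self.0.values
    }
}

/// A crate that does not own `Array` cannot add inherent methods to the
/// alias, so it would need an extension trait like this one.
trait PrimitiveArrayExt {
    fn sum(&self) -> i64;
}

impl PrimitiveArrayExt for PrimitiveArray {
    fn sum(&self) -> i64 {
        self.values().iter().sum()
    }
}

/// The bigger issue: every function that accepts arrays picks up a type
/// parameter per argument.
fn total_len<A: Encoding, B: Encoding>(a: &Array<A>, b: &Array<B>) -> usize {
    a.len() + b.len()
}
```

The `total_len` signature is where the pain shows up: the encoding type leaks into every compute function, and a `Vec` of arrays with mixed encodings cannot be expressed without erasing the type again.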
So, perhaps we actually have:
```rust
/// Object-safe trait exposing the API of a Vortex array.
trait Array { ... }

/// Type-def to pass around owned arc'd arrays.
type ArrayRef = Arc<dyn Array>;

/// Wrapper struct to dispatch array API functions over an encoding.
struct ArrayImpl<Encoding>(Encoding);

impl<E> Array for ArrayImpl<E> { ... }
```
The problem with this setup is that when implementing ArrayImpl, self no longer implements Array, so we cannot easily use any of the possibly useful Array functions. Blergh.
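One reading of this problem, sketched with a hypothetical `Encoding` trait and a made-up `describe` method (neither is from the real codebase): inside the encoding's own impl, `self` is the bare encoding rather than the wrapper, so helpers defined on `Array` are out of reach.

```rust
use std::sync::Arc;

/// Object-safe public API, trimmed to two illustrative methods.
trait Array {
    fn len(&self) -> usize;

    /// A "possibly useful" helper provided by the public trait.
    fn is_empty(&self) -> bool {
        self.len() == 0
    }
}

type ArrayRef = Arc<dyn Array>;

/// Hypothetical encoding-side trait.
trait Encoding {
    fn encoding_len(&self) -> usize;
    fn describe(&self) -> String;
}

/// Wrapper that dispatches the public API over an encoding.
struct ArrayImpl<E>(E);

impl<E: Encoding> Array for ArrayImpl<E> {
    fn len(&self) -> usize {
        self.0.encoding_len()
    }
}

/// Toy encoding used only for illustration.
struct ToyEncoding {
    values: Vec<u8>,
}

impl Encoding for ToyEncoding {
    fn encoding_len(&self) -> usize {
        self.values.len()
    }

    fn describe(&self) -> String {
        // `self` is a `ToyEncoding` here, not an `ArrayImpl<ToyEncoding>`,
        // so `self.is_empty()` would not compile: `Array` is only
        // implemented for the wrapper. The check has to be re-derived
        // from encoding-level methods instead.
        let empty = self.encoding_len() == 0;
        format!("toy(len={}, empty={})", self.encoding_len(), empty)
    }
}

/// Wrap an encoding and erase it into the shared reference type.
fn into_ref<E: Encoding + 'static>(e: E) -> ArrayRef {
    Arc::new(ArrayImpl(e))
}
```

Once wrapped, `into_ref(ToyEncoding { .. })` gives an `ArrayRef` on which `is_empty()` works fine. The helper is only unavailable inside the encoding implementation itself.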