Load index into Vector using highway ops #2544
Comments
Hi, our lane types are fixed-size whereas size_t can be either 32-bit or 64-bit.
@jan-wassenberg Thanks for the response! That sounds good. I'm currently trying to load elements of any data type from an array using an index vector. I believe this can be done with Highway's GatherIndex operation, but I'm confused about the following: say I want to load int32 elements into a 128-bit SIMD vector using an index vector. The index lanes are 64 bits wide, so a vector holds only 2 indices, whereas int32 elements are processed 4 at a time. Using GatherIndex to load 4 int32 elements with only 2 indices per vector seems impossible, right? How can this situation be handled?
Yes, mixed-width gathers are problematic. Although x86 supports them, other platforms do not. Thus I would advise ensuring the width of the index and the to-be-gathered elements match. If you really must have size_t indices, then you'd probably want to DemoteTo them to u32 (unless |
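To illustrate the advice about matching widths, here is a minimal sketch in plain C++ of narrowing size_t indices to 32 bits before they are used with 32-bit elements. The helper name NarrowIndicesToU32 is hypothetical and not part of Highway; it only shows the range check that a narrowing step would need.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical helper (not a Highway function): copies size_t indices into
// uint32_t so that index lanes have the same width as 32-bit elements.
// Returns false if any index does not fit in 32 bits.
inline bool NarrowIndicesToU32(const size_t* indices, size_t count,
                               std::vector<uint32_t>& out) {
  assert(indices != nullptr || count == 0);
  out.resize(count);
  for (size_t i = 0; i < count; ++i) {
    if (indices[i] > std::numeric_limits<uint32_t>::max()) return false;
    out[i] = static_cast<uint32_t>(indices[i]);
  }
  return true;
}
```

After narrowing, a 128-bit vector holds 4 indices, which matches the 4 int32 elements it is meant to gather.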
@jan-wassenberg Understood, but the array elements will not always be at least as wide as int32, right? In such cases we can't use gather for all dtypes with an index vector of either int32 or int64 type? This is what I am trying to achieve:
That's correct. CPUs usually do not support gather for elements narrower than 32 bits. Gather is slow anyway and best avoided if possible, unless you're going to use the resulting vector inside other SIMD code.
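For element types narrower than 32 bits, a scalar loop is the simplest fallback. The sketch below is plain C++ and the name ScalarGather is made up for illustration; it works for any element type and any integer index type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar fallback: out[i] = base[indices[i]] for each i.
// T is the element type (e.g. uint8_t or uint16_t), TI the index type.
template <typename T, typename TI>
void ScalarGather(const T* base, const TI* indices, size_t count, T* out) {
  assert(base != nullptr && indices != nullptr && out != nullptr);
  for (size_t i = 0; i < count; ++i) {
    out[i] = base[static_cast<size_t>(indices[i])];
  }
}
```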
@jan-wassenberg I am currently trying to make the base case of vqsort sort indices along with keys without using the KV structure. That's where I am trying to use gather and an index vector. Do you have any suggestions?
Hm, I'm not sure I understand. VQSort key+value only works with KV structures, unless you are defining u32 "keys" yourself which consist of the actual u16 key plus a u16 index. If you want to gather values, you can just use normal C++ code. This will be about as fast as vector code for that.
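The idea of u32 "keys" made of a u16 key plus a u16 index can be sketched as follows. This is an illustration only: the helper names are hypothetical, the choice of putting the key in the upper 16 bits is an assumption (it makes an ordinary u32 comparison order by key first), and std::sort stands in for whatever u32 sort is actually used.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Key in the upper 16 bits, original position in the lower 16 bits.
inline uint32_t PackKeyIndex(uint16_t key, uint16_t index) {
  return (static_cast<uint32_t>(key) << 16) | index;
}
inline uint16_t KeyOf(uint32_t packed) {
  return static_cast<uint16_t>(packed >> 16);
}
inline uint16_t IndexOf(uint32_t packed) {
  return static_cast<uint16_t>(packed & 0xFFFFu);
}

// Sorts keys ascending in place and returns the original position of each
// sorted key. std::sort is a stand-in for a u32 sort.
inline std::vector<uint16_t> SortWithIndices(std::vector<uint16_t>& keys) {
  assert(keys.size() <= 65536);  // positions must fit in 16 bits
  std::vector<uint32_t> packed(keys.size());
  for (size_t i = 0; i < keys.size(); ++i) {
    packed[i] = PackKeyIndex(keys[i], static_cast<uint16_t>(i));
  }
  std::sort(packed.begin(), packed.end());
  std::vector<uint16_t> indices(keys.size());
  for (size_t i = 0; i < keys.size(); ++i) {
    keys[i] = KeyOf(packed[i]);
    indices[i] = IndexOf(packed[i]);
  }
  return indices;
}
```

Because the index sits in the low bits, equal keys end up ordered by their original position.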
@jan-wassenberg To be precise, I am trying to introduce two separate pointers, one for keys and one for indices, and to modify the base case code so that indices are swapped along with the keys.
Ah, got it. That will involve changing traits-inl.h
to add an output argument for the indices, and instead of returning the min directly, getting a mask via |
@jan-wassenberg That sounds good! I believe the base case handles fewer than 256 elements, so the indices would fit in uint16. The one thing I am still stuck on is how to make the index type match the dtype of the input keys.
For the code above, indices would be the same size as the dtype. You can use |
@jan-wassenberg I was able to follow the VQSort calls for all dtypes, but how is KV actually handled? Since only the keys are used for comparison, how do the values get rearranged along with the keys?
@jan-wassenberg By the way, I tried implementing your suggestions and it worked, but when copying the sorted vectors back to the indices array, the wrong results are stored. I am not sure what is going wrong.
KV is handled by including the value inside the key, and changing the key comparator to ignore the value part of the key.
@jan-wassenberg Could you please share the exact implementation details for the above case?
@jan-wassenberg Are you referring to this piece of code? struct OrderAscendingKV64 : public KeyValue64 { HWY_INLINE bool Compare1(const LaneType* a, const LaneType* b) const { … // Not required to be stable (preserving the order of equivalent keys), so … // Same as for regular lanes. …
Yes, that's right. Compare is shifting out the value bits.
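As a rough scalar illustration of "value inside the key, comparator ignores the value", the sketch below packs a 32-bit key and a 32-bit value into one uint64_t and compares after shifting out the value bits. This is not the Highway implementation: the names are hypothetical, the layout (key in the upper 32 bits) is an assumption made for the example, and std::sort stands in for the vectorized sort.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Assumed layout for this sketch: key in the upper 32 bits, value below it.
inline uint64_t MakeKV(uint32_t key, uint32_t value) {
  return (uint64_t{key} << 32) | value;
}
inline uint32_t ValueOf(uint64_t kv) {
  return static_cast<uint32_t>(kv & 0xFFFFFFFFu);
}

// Comparator that shifts out the value bits, so only the keys are compared.
inline bool LessByKey(uint64_t a, uint64_t b) {
  return (a >> 32) < (b >> 32);
}

// The whole 64-bit item is moved on every swap, so each value travels with
// its key. Like the original, this is not required to be stable.
inline void SortByKey(std::vector<uint64_t>& items) {
  std::sort(items.begin(), items.end(), LessByKey);
}
```

The values are never inspected during the sort; they are rearranged simply because they occupy the same lane as their key.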
@jan-wassenberg, does the current VQSort implementation in [vqsort-inl.h] support all instruction sets that Highway supports, including x86 (AVX-512, AVX2), Arm (NEON, SVE), and RISC-V (V extension)? Are there any other instruction sets supported by VQSort? Thanks in advance!
Yes indeed, you can see the full list in the readme or detect_targets.
Hi Champs!
I am trying to load indices into a SIMD vector from, and store them back to, an index array given as a pointer of type size_t* (size_t* indices). Is there a relevant operation in Highway for such loads and stores?
Thanks