Include Offsets & Fringe Case Fix for outerSize > size && lda = {1, 1, ...} #33
Open: njh80 wants to merge 3 commits into springer13:master from njh80:master
Sub-tensors often omit their innermost dimension, meaning that they access their source data with an inner stride other than one. This commit adds a basic level of support for this, in a similar way to the support for offsets. Inner strides are optional arguments and are supplied as integers.
*benchmark/benchmark.cpp*
Amends the call to `transpose_ref` to pass nullptrs for the inner strides.
*benchmark/reference.cpp*
`transpose_ref` can now receive non-unit inner strides, which are used for evaluating tests.
*benchmark/reference.h*
Amends the template function.
*include/hptt.h*
Creates overloads including innerStrides (size_t) for create_plan calls.
*include/transpose.h*
Amends functions to receive innerStrides as inputs.
*src/hptt.cpp*
Includes the new overloads and amends the existing ones to pass nullptr where inner strides are not supplied.
*src/transpose.cpp*
Amends the behaviour of execution to include innerStrides. As the `transpose_int` functions are not part of the `Transpose` class, the innerStrides must be passed as new arguments, unchanged throughout, to the `macro_kernel_scalar` and `micro_kernel`, which then use the new strides. Support for the ARM and AVX architectures has been attempted, but it is untested because the author does not have access to these platforms. Further, no support has been included for the B buffer case in the macro-kernel, which in theory could be added. Finally, as a comment that applies to the offset commit as well, no changes have been made to the plan generation stages, but their effectiveness will likely be altered by these commits.
*testframework/testframework.cpp*
Tests have been added for innerStrides of 1 or 2 (in theory larger strides are fine, but they exceed the memory capacity of my device).
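To make the inner-stride idea concrete, here is a minimal scalar sketch of a 2D transpose in which consecutive elements along the innermost dimension are `innerStrideA` / `innerStrideB` elements apart. It is an illustration only, not the code in src/transpose.cpp: the function name `transposeScalarStrided` and its parameter list are assumptions, and the real `macro_kernel_scalar` and `micro_kernel` signatures are not shown in this commit message.

```cpp
#include <cstddef>

// Hypothetical scalar kernel (illustration only): writes B(j, i) = A(i, j)
// for an m x n block. lda/ldb are the distances, in elements, between
// consecutive outer indices; innerStrideA/innerStrideB are the distances
// between consecutive inner indices (1 for ordinary dense tensors).
void transposeScalarStrided(const float* A, std::size_t lda, std::size_t innerStrideA,
                            float* B, std::size_t ldb, std::size_t innerStrideB,
                            std::size_t m, std::size_t n)
{
    for (std::size_t j = 0; j < n; ++j) {
        for (std::size_t i = 0; i < m; ++i) {
            // i is the inner index of A, j is the inner index of B.
            B[j * innerStrideB + i * ldb] = A[i * innerStrideA + j * lda];
        }
    }
}
```

With both inner strides set to 1 this reduces to an ordinary scalar transpose, which mirrors how the overloads without inner strides are described as falling back to the existing behaviour.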
…en one and the number of dimensions (a small random number) and the number of dimensions any value between 1 and MAX_DIM again.
Headline
FEAT: Introduced increased flexibility for handling subtensors by including offset inputs for tensors. This feature includes backwards compatibility for calls to hptt::create_plan without offsets and is therefore not a breaking change.
Performance
Passes all testFramework.cpp tests.
Benchmark Output: hptt_benchmark.txt
Detail:
Makefiles (Makefile, benchmark/Makefile, testframework/Makefile)
FIX: If libomp is not discovered in LD_LIBRARY_PATH (an issue seen on macOS with M2), the user can specify its path for the build.
benchmark/benchmark.cpp
FEAT: `transpose_ref` is for internal use, so its changes are not backwards compatible; the function call is amended to pass the new nullptr arguments.
benchmark/maxFromFiles.py
FIX: The error print statement is given parentheses.
benchmark/reference.cpp
FEAT: First, the function receives new outerSize (A/B) and offset (A/B) arrays, which are initialised to mimic the size array when nullptrs are supplied. Next, the stepping through B is amended so that the outerSize is traversed whenever a row of size is exceeded. Offsets are also inserted into the traversal. Behaviour can be verified via DEBUG.
Pseudo-code:
for each dimension of B other than the innermost loop:
    divide the current position by the size of the next innermost loop of B that we want to traverse
    move across the offset distance as many times as we have exceeded it, plus the initial offset
    further move over any space that remains after the end of the block required by size, as many times as we exceed it
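The pseudo-code above can be read as a mapping from a position inside the dense block (extents `size`) to a position inside the enclosing buffer (extents `outerSize`), shifted by `offset`. The following is a minimal sketch of that mapping, not the code in benchmark/reference.cpp: the name `subTensorIndex` and the use of `std::vector` are assumptions made for illustration, and dimension 0 is assumed to be the fastest-varying one.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (illustration only): converts a linear index over the
// dense block into a linear index in the enclosing buffer.
std::size_t subTensorIndex(std::size_t linear,
                           const std::vector<std::size_t>& size,
                           const std::vector<std::size_t>& outerSize,
                           const std::vector<std::size_t>& offset)
{
    std::size_t result = 0;
    std::size_t stride = 1;  // stride of dimension d in the enclosing buffer
    for (std::size_t d = 0; d < size.size(); ++d) {
        const std::size_t coord = linear % size[d];  // coordinate inside the block
        linear /= size[d];                           // carry into the next dimension
        result += (coord + offset[d]) * stride;      // skip the offset in this dimension
        stride *= outerSize[d];                      // rows span outerSize, not size
    }
    return result;
}
```

For a 2 x 2 block placed at offset (1, 1) inside a 4 x 3 buffer, the four block elements map to buffer positions 5, 6, 9 and 10: each row starts one element in, and moving to the next row skips the remainder of the 4-element outer row.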
benchmark/reference.h
FEAT: Amended template to reflect new inputs of transpose_ref(), namely offsetA, offsetB, outerSizeA and outerSizeB.
include/compute_node.h
FEAT: Included three new members of a ComputeNode without exceeding the cache line size of 64 bytes (programmers be warned: exceeding this leaves the node unaligned in cache).
First, the offset difference (A - B), which reduces the number of calculations needed to adjust for the offsets during the execution of hptt. The plan is created with start and end positions that include the offset of B, and the difference is added to obtain the start and end values for A.
FIX: Second, the booleans indexA and indexB are true when the leading dimension of A/B is 1 and the index is 0. The original code faltered when the innermost dimensions of A or B were 1, causing the transpose_int functions to identify incorrect innermost indices; this was especially problematic with non-zero outerSizes.
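As a rough illustration of the size constraint and of the offset difference, the sketch below defines a stand-in node and checks its size at compile time. The struct is hypothetical: only the names offDiffAB, indexA and indexB come from this description, while the remaining members and all of the types are assumptions rather than the actual contents of include/compute_node.h.

```cpp
#include <cstddef>

// Hypothetical stand-in for a plan node (illustration only).
struct SketchComputeNode {
    std::size_t start;         // assumed: loop start, already including B's offset
    std::size_t end;           // assumed: loop end, already including B's offset
    std::size_t inc;           // assumed: loop increment
    std::size_t lda;           // assumed: stride of this loop in A
    std::size_t ldb;           // assumed: stride of this loop in B
    SketchComputeNode* next;   // assumed: link to the next loop level
    std::ptrdiff_t offDiffAB;  // offset of A minus offset of B (may be negative)
    bool indexA;               // true when this loop is A's stride-1 index
    bool indexB;               // true when this loop is B's stride-1 index
};

// Fails to compile if the node grows beyond one 64-byte cache line.
static_assert(sizeof(SketchComputeNode) <= 64,
              "node no longer fits in a 64-byte cache line");

// Because start/end are expressed in B's frame, the matching position in A
// is a single addition away.
inline std::size_t positionInA(const SketchComputeNode& node, std::size_t i)
{
    return static_cast<std::size_t>(static_cast<std::ptrdiff_t>(i) + node.offDiffAB);
}
```

A compile-time check of this kind is one way to guard the size limit; whether the PR uses one is not stated here.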
include/hptt.h
FEAT: Added new template functions that accept offsets in the various floatType contexts.
include/transpose.h
FEAT: Amended skipIndices and verifyParameter to take offset inputs, as these functions are affected by them. Also included the offsets as properties of the transpose class.
include/utils.h
FEAT: Amended the template of accountForRowMajor, as it needs to reorder the offsets in the same way as the other parameters.
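A small sketch of what such a reordering might look like follows. `reversedForRowMajor` is a hypothetical helper, not HPTT's accountForRowMajor, whose signature is not given in this description. It assumes that switching between row-major and column-major reverses the dimension order, so the offsets have to be reversed in step with the sizes and outerSizes.

```cpp
#include <vector>

// Hypothetical helper (illustration only): returns a per-dimension array in
// reversed dimension order.
std::vector<int> reversedForRowMajor(const std::vector<int>& perDimension)
{
    return std::vector<int>(perDimension.rbegin(), perDimension.rend());
}
```

Applying the same helper to size, outerSize and offset keeps the three arrays aligned dimension by dimension, which is the property the verification step relies on.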
src/hptt.cpp
FEAT: Implemented new offset templates and amended original templates to point to plan() with nullptrs or offsets where appropriate.
src/transpose.cpp
FEAT: Amended the plan assignment section to include assignments for the new ComputeNode members.
FEAT: Included offsets in the fuseIndices, skipIndices and verifyParameters functions, where the amendments affect offsets too; verification checks that offset + size <= outerSize for all dimensions.
FEAT: The axpy functions require offset differences as well, so these are calculated and the integer/array is passed to the respective functions for proper calculation. The axpy functions themselves are amended similarly.
FEAT: In the transpose_ functions, offDiffAB is always added to i to get the correct start/end. Where lda/ldb == 1 is checked, plan->indexA/B is also asserted to ensure the correct blocking is passed. As a result of the increased robustness, blockingA/B can always be passed confidently, and loops can be included for cases where the scalar case is reached and lda/ldb is not 1.
FEAT: Included a plethora of DEBUG statements (coding this was very fun).
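The bounds condition mentioned for verification, offset + size <= outerSize in every dimension, can be sketched as below. `offsetsFit` is a hypothetical name and the `std::vector<int>` parameters are assumptions; the actual verifyParameters function in src/transpose.cpp may be structured differently.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical check (illustration only): true when the block described by
// size and offset lies entirely inside the buffer described by outerSize.
bool offsetsFit(const std::vector<int>& size,
                const std::vector<int>& outerSize,
                const std::vector<int>& offset)
{
    if (size.size() != outerSize.size() || size.size() != offset.size()) {
        return false;  // all three arrays must describe the same dimensions
    }
    for (std::size_t d = 0; d < size.size(); ++d) {
        if (size[d] <= 0 || offset[d] < 0) {
            return false;  // reject empty extents and negative offsets
        }
        if (offset[d] + size[d] > outerSize[d]) {
            return false;  // block would run past the end of dimension d
        }
    }
    return true;
}
```

For example, a block of size (2, 2) at offset (1, 1) fits in an outer size of (4, 3), but the same block at offset (3, 1) does not, because 3 + 2 exceeds 4 in the first dimension.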
src/utils.cpp
FEAT: Implemented accountForRowMajor changes for offsets mirroring the behaviour for outerSizes.
testframework/testframework.cpp
FEAT: Improved testing to include triggerable outerSize != size and offsets, with strings printed for DEBUG cases.
FEAT: Error messages modified for clarity.