buffered performance improvements and bug fix #2890

Merged: 6 commits, Oct 11, 2018

Conversation

dipinhora
Contributor

This PR makes a number of enhancements to the buffered package.

  • Add benchmarks to the buffered package
    Follows the same convention as collections/persistent by creating a
    benchmarks subdirectory.

  • Efficient buffer reading and writing + numeric array conversions
    This commit enhances the performance of the buffered package by
    adding pointer-conversion-based reads and writes. The new functions
    only work for Array[U8]s. (A usage sketch follows this list.)

    This commit also adds the ability to convert numeric array types
    between each other for reading.

  • Fix Files.mixedwrite test for coalescing buffer behavior.

  • Fix bug related to searching within a Reader
    Prior to this commit, if an unsuccessful search was performed,
    the _distance_of function would cache the last node checked
    for a value, and future searches would resume from the node
    after the cached one. Unfortunately, this results in incorrect
    behavior (an error) if someone searches for a value that
    doesn't exist in the buffer and then searches for a value that
    does exist in the buffer: the search for the existing value
    picks up where the search for the non-existing value left off,
    reaches the end of the buffer, and concludes that the value
    doesn't exist. If the search were instead conducted starting
    from the current position in the buffer, the existing value
    would be correctly found by _distance_of.

    This commit fixes the bug by removing the caching and always having
    _distance_of restart its search from the current position in
    the buffer. It also adds a couple of test cases to explicitly
    check that the searches work correctly. (A sketch of the failure
    mode follows this list.)
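
For illustration only (not code from this PR), here is a minimal sketch of the kind of numeric round-trip these reads and writes speed up, using the existing buffered Writer/Reader API (u16_le, u32_be, u64_le, done, append):

use "buffered"

actor Main
  new create(env: Env) =>
    // Write a few numeric values, then read them back. These are the
    // operations whose hot path the pointer-conversion reads and writes
    // target when the backing chunk is a contiguous Array[U8].
    let w = Writer
    w.u16_le(0x1234)
    w.u32_be(0xDEADBEEF)
    w.u64_le(42)

    // done() hands back the accumulated chunks; feed them to a Reader.
    let chunks: Array[ByteSeq] val = w.done()
    let r = Reader
    for chunk in chunks.values() do
      match chunk
      | let s: String => r.append(s.array())
      | let a: Array[U8] val => r.append(a)
      end
    end

    try
      env.out.print(r.u16_le()?.string())
      env.out.print(r.u32_be()?.string())
      env.out.print(r.u64_le()?.string())
    end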

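Also for illustration only, a minimal sketch of the failure mode the last bullet describes; it assumes Reader.read_until as the public search entry point that goes through _distance_of:

use "buffered"

actor Main
  new create(env: Env) =>
    let r = Reader
    r.append("key=value\n".array())

    // Search for a byte that is NOT in the buffer: this errors, and prior
    // to the fix it also cached the last node scanned.
    try r.read_until(';')? end

    // Now search for a byte that IS in the buffer. Prior to the fix this
    // could also error, because the search resumed after the cached node
    // instead of restarting from the current position.
    try
      let key = r.read_until('=')?
      env.out.print(String.from_array(consume key))
    end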

The following are some very unscientific/uncontrolled benchmark results, taken before and after the changes, to illustrate the performance gains.

Benchmarks before these changes:

vagrant@ubuntu-xenial:~/dhp/packages/buffered/benchmarks$ ./benchmarks.before --ponythreads=1 --ponyminthreads=1
Benchmark results will have their mean and median adjusted for overhead.
You may disable this with --noadjust.

Benchmark                                   mean            median   deviation  iterations
_ReaderU8                                  33 ns             32 ns      ±7.27%     2000000
_ReaderU16LE                               70 ns             71 ns      ±2.34%     2000000
_ReaderU16LESplit                         464 ns            497 ns     ±15.03%     1000000
_ReaderU16BE                               90 ns             91 ns      ±6.01%     2000000
_ReaderU16BESplit                         488 ns            498 ns      ±9.77%     1000000
_ReaderU32LE                              180 ns            185 ns      ±7.52%     1000000
_ReaderU32LESplit                         909 ns            928 ns      ±7.78%      500000
_ReaderU32BE                              194 ns            198 ns      ±8.84%     1000000
_ReaderU32BESplit                         945 ns            972 ns      ±7.63%      500000
_ReaderU64LE                              420 ns            430 ns     ±11.09%      500000
_ReaderU64LESplit                        1838 ns           1855 ns     ±10.57%      100000
_ReaderU64BE                              402 ns            410 ns     ±13.34%      500000
_ReaderU64BESplit                        2083 ns           2138 ns     ±10.25%      100000
_ReaderU128LE                             903 ns            929 ns     ±14.41%      300000
_ReaderU128LESplit                       5529 ns           5369 ns     ±22.22%       30000
_ReaderU128BE                             816 ns            821 ns      ±8.14%      300000
_ReaderU128BESplit                       3684 ns           3806 ns     ±13.56%       50000
_WriterU8                                  10 ns             11 ns      ±6.24%     3000000
_WriterU16LE                               12 ns             10 ns      ±6.93%     3000000
_WriterU16BE                               13 ns             12 ns      ±7.85%     3000000
_WriterU32LE                               17 ns             16 ns      ±6.09%     2000000
_WriterU32BE                               17 ns             16 ns      ±7.94%     2000000
_WriterU64LE                               35 ns             34 ns      ±6.14%     2000000
_WriterU64BE                               42 ns             42 ns      ±5.08%     2000000
_WriterU128LE                              78 ns             75 ns      ±7.96%     1000000
_WriterU128BE                              73 ns             73 ns      ±6.70%     1000000

Benchmarks after these changes:

vagrant@ubuntu-xenial:~/dhp/packages/buffered/benchmarks$ ./benchmarks.after --ponythreads=1 --ponyminthreads=1
Benchmark results will have their mean and median adjusted for overhead.
You may disable this with --noadjust.

Benchmark                                   mean            median   deviation  iterations
_ReaderU8                                  34 ns             34 ns      ±2.76%     2000000
_ReaderU16LE                               33 ns             34 ns      ±1.62%     2000000
_ReaderU16LESplit                         459 ns            493 ns     ±14.70%     1000000
_ReaderU16BE                               43 ns             43 ns      ±4.41%     2000000
_ReaderU16BESplit                         502 ns            513 ns      ±8.88%     1000000
_ReaderU32LE                               45 ns             45 ns      ±1.58%     2000000
_ReaderU32LESplit                         925 ns            944 ns      ±6.64%      300000
_ReaderU32BE                               45 ns             45 ns      ±2.91%     2000000
_ReaderU32BESplit                        1022 ns           1040 ns      ±7.33%      500000
_ReaderU64LE                               49 ns             49 ns      ±2.69%     2000000
_ReaderU64LESplit                        1841 ns           1878 ns      ±8.32%      100000
_ReaderU64BE                               49 ns             49 ns      ±2.64%     2000000
_ReaderU64BESplit                        1934 ns           1919 ns      ±5.91%      100000
_ReaderU128LE                              51 ns             50 ns      ±2.20%     2000000
_ReaderU128LESplit                       3812 ns           3839 ns      ±7.86%       50000
_ReaderU128BE                              40 ns             40 ns      ±1.05%     2000000
_ReaderU128BESplit                       3055 ns           3252 ns     ±17.97%       50000
_WriterU8                                   8 ns              8 ns      ±1.28%     3000000
_WriterU16LE                                5 ns              5 ns      ±1.14%     5000000
_WriterU16BE                                7 ns              7 ns      ±1.47%     5000000
_WriterU32LE                                6 ns              6 ns      ±1.70%     3000000
_WriterU32BE                                7 ns              6 ns      ±1.61%     3000000
_WriterU64LE                                8 ns              8 ns      ±1.55%     3000000
_WriterU64BE                                7 ns              7 ns      ±1.73%     3000000
_WriterU128LE                               7 ns              7 ns      ±0.65%     3000000
_WriterU128BE                              10 ns             10 ns      ±0.98%     2000000

// into a single bigger array
if data.size() <= 64 then
match data
| let d: String => let a = d.array(); _current.copy_from(a, 0, _current.size(), a.size())
Member

This is wider than 80 columns, and it has two statements separated by a semicolon, which is usually a discouraged style.
You should probably split it onto separate lines.

Contributor Author

Fixed.
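
For reference, the flagged fragment split onto separate lines would read roughly like this (illustrative; the actual fix is in the PR diff):

// into a single bigger array
if data.size() <= 64 then
  match data
  | let d: String =>
    let a = d.array()
    _current.copy_from(a, 0, _current.size(), a.size())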

"""
let bytes = _element_size()
let u16_bytes = U16(0).bytewidth()
if ((offset*bytes) + u16_bytes) <= (_size*bytes) then
Member

Please surround the * operator with a space on each side, here and in the other places in this file below.

Contributor Author

Fixed.
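
For reference, the requested spacing would make the condition read:

if ((offset * bytes) + u16_bytes) <= (_size * bytes) then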

"""
Reads a U16 from offset. This is only allowed for an array of U8s.
"""
let bytes = _element_size()
Member

If this function is only allowed for Array[U8], why do we need _element_size() here? Won't it always be 1? Or am I missing something?

Contributor Author

Yes, _element_size() will always be 1. I had originally written these functions to work for all numeric array types and then later decided to limit them only to Array[U8].

I left _element_size() because I had assumed the compiler would optimize it away.

Let me know if you would prefer I replace _element_size() with 1.

Member

I'm not sure to what degree it will get optimized away at runtime - I'm mainly concerned with making sure the code is no more complex than it needs to be, for maintainability reasons.

So my vote would be to remove it, along with the multiplication operations below where the bytes local is used.

Contributor Author

Complexity removed.
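
Illustratively, with _element_size() and the multiplications gone, the guard from the earlier snippet reduces to something like (a sketch, not the exact diff):

let u16_bytes = U16(0).bytewidth()
if (offset + u16_bytes) <= _size then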

@@ -0,0 +1,13 @@
primitive NumericArrayConverter
Member

This doesn't seem to be required for the buffered perf improvements, and as a new feature it should probably go through the RFC process (or, at the very least, be in a separate PR).

Contributor Author

Removed. It was included because this was my initial plan/approach to solving the problem, before I added the functions on Array[U8].

@@ -752,7 +752,7 @@ class iso _TestFileMixedWriteQueue is UnitTest
line3 + "\n"
line5
line1
line3 + "\n"
line3
Member

Is this change related to the others?

Contributor Author

Yes, this is due to the Writer now coalescing write() buffers into one big buffer when the buffers being added are smaller than a cacheline (https://github.com/ponylang/ponyc/pull/2890/files#diff-761128221a9d08227d2ae82f79a30ea2R258).

The fact that the output of Writer.done() can be split across multiple ByteSeq chunks is an implementation detail (one which changes slightly due to the new coalescing behavior of Writer.write()).

The test being fixed relies on the printv() call adding a \n after every individual buffer returned from Writer.done(); the new coalescing produces 1 buffer instead of 2, so the test fails without this fix.
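
For illustration only (not code from this PR), the practical consequence of the coalescing is that callers must not assume a 1:1 mapping between write() calls and the chunks returned by done():

use "buffered"

actor Main
  new create(env: Env) =>
    let w = Writer
    // Two small writes, each well under a cacheline in size.
    w.write("line1\n")
    w.write("line2\n")

    // With the new coalescing these may come back from done() as a single
    // chunk rather than two, which is what tripped up the old test (it
    // counted on printv adding one "\n" per chunk).
    let chunks: Array[ByteSeq] val = w.done()
    env.out.print("chunks returned: " + chunks.size().string())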

@jemc added the "changelog - fixed" label (Automatically add "Fixed" CHANGELOG entry on merge) on Oct 3, 2018
@SeanTAllen
Member

@jemc are you good with these changes?

@SeanTAllen
Member

@dipinhora can you add release notes that should be included with this release into this PR?

@SeanTAllen mentioned this pull request on Oct 11, 2018
@SeanTAllen merged commit e7199b2 into ponylang:master on Oct 11, 2018
ponylang-main added a commit that referenced this pull request on Oct 11, 2018
@dipinhora
Contributor Author

@SeanTAllen is the commit message not adequate for what the release notes require?

@SeanTAllen
Member

@dipinhora that's a lot of information; something pared down to a short paragraph would be best.

@dipinhora
Contributor Author

@SeanTAllen How's the following?

Pony now has efficient reading and writing of numeric data types from Array[U8]s and from the buffered Reader and Writer classes. Also included is a fix for a bug in searching for bytes within a Reader: searches that previously failed incorrectly will now succeed.
