buffered performance improvements and bug fix #2890

Merged: 6 commits, Oct 11, 2018

Conversation

dipinhora
Contributor

This PR makes a number of enhancements to the buffered package.

  • Add benchmarks to the buffered package
    Follows the same convention as collections/persistent by creating a
    benchmarks subdirectory.

  • Efficient buffer reading and writing + numeric array conversions
    This commit enhances the performance of the buffered package by
    adding pointer-conversion-based reads and writes. The new functions
    only work for Array[U8]s. (A usage sketch follows this list.)

    This commit also adds the ability to convert numeric array types
    between each other for reading.

  • Fix Files.mixedwrite test for coalescing buffer behavior.

  • Fix bug related to searching within a Reader
    Prior to this commit, if an unsuccessful search was performed,
    the _distance_of function would cache the last node checked
    for a value, and future searches would resume from the node
    after the cached one. Unfortunately, this results in incorrect
    behavior (an error) if someone searches for a value that
    doesn't exist in the buffer and then searches for a value that
    does exist in the buffer: the search for the existing value
    picks up where the search for the non-existing value left off,
    reaches the end of the buffer, and concludes that the value
    doesn't exist. If the search were instead conducted starting
    from the current position in the buffer, the existing value
    would be correctly found by _distance_of.

    This commit fixes the bug by removing the caching and always having
    _distance_of restart its search from the current position in
    the buffer. It also adds a couple of test cases to explicitly
    check that the searches work correctly. (A sketch of the failure
    mode follows this list.)
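
For illustration only (not code from this PR), here is a minimal sketch of the kind of numeric round-trip these reads and writes speed up, using the existing buffered Writer/Reader API (u16_le, u32_be, u64_le, done, append):

use "buffered"

actor Main
  new create(env: Env) =>
    // Write a few numeric values, then read them back. These are the
    // operations whose hot path the pointer-conversion reads and writes
    // target when the backing chunk is a contiguous Array[U8].
    let w = Writer
    w.u16_le(0x1234)
    w.u32_be(0xDEADBEEF)
    w.u64_le(42)

    // done() hands back the accumulated chunks; feed them to a Reader.
    let chunks: Array[ByteSeq] val = w.done()
    let r = Reader
    for chunk in chunks.values() do
      match chunk
      | let s: String => r.append(s.array())
      | let a: Array[U8] val => r.append(a)
      end
    end

    try
      env.out.print(r.u16_le()?.string())
      env.out.print(r.u32_be()?.string())
      env.out.print(r.u64_le()?.string())
    end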

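Also for illustration only, a minimal sketch of the failure mode the last bullet describes; it assumes Reader.read_until as the public search entry point that goes through _distance_of:

use "buffered"

actor Main
  new create(env: Env) =>
    let r = Reader
    r.append("key=value\n".array())

    // Search for a byte that is NOT in the buffer: this errors, and prior
    // to the fix it also cached the last node scanned.
    try r.read_until(';')? end

    // Now search for a byte that IS in the buffer. Prior to the fix this
    // could also error, because the search resumed after the cached node
    // instead of restarting from the current position.
    try
      let key = r.read_until('=')?
      env.out.print(String.from_array(consume key))
    end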

The following are some very unscientific/uncontrolled benchmark results, taken before and after the changes, to illustrate the performance gains.

Benchmarks before these changes:

vagrant@ubuntu-xenial:~/dhp/packages/buffered/benchmarks$ ./benchmarks.before --ponythreads=1 --ponyminthreads=1
Benchmark results will have their mean and median adjusted for overhead.
You may disable this with --noadjust.

Benchmark                                   mean            median   deviation  iterations
_ReaderU8                                  33 ns             32 ns      ±7.27%     2000000
_ReaderU16LE                               70 ns             71 ns      ±2.34%     2000000
_ReaderU16LESplit                         464 ns            497 ns     ±15.03%     1000000
_ReaderU16BE                               90 ns             91 ns      ±6.01%     2000000
_ReaderU16BESplit                         488 ns            498 ns      ±9.77%     1000000
_ReaderU32LE                              180 ns            185 ns      ±7.52%     1000000
_ReaderU32LESplit                         909 ns            928 ns      ±7.78%      500000
_ReaderU32BE                              194 ns            198 ns      ±8.84%     1000000
_ReaderU32BESplit                         945 ns            972 ns      ±7.63%      500000
_ReaderU64LE                              420 ns            430 ns     ±11.09%      500000
_ReaderU64LESplit                        1838 ns           1855 ns     ±10.57%      100000
_ReaderU64BE                              402 ns            410 ns     ±13.34%      500000
_ReaderU64BESplit                        2083 ns           2138 ns     ±10.25%      100000
_ReaderU128LE                             903 ns            929 ns     ±14.41%      300000
_ReaderU128LESplit                       5529 ns           5369 ns     ±22.22%       30000
_ReaderU128BE                             816 ns            821 ns      ±8.14%      300000
_ReaderU128BESplit                       3684 ns           3806 ns     ±13.56%       50000
_WriterU8                                  10 ns             11 ns      ±6.24%     3000000
_WriterU16LE                               12 ns             10 ns      ±6.93%     3000000
_WriterU16BE                               13 ns             12 ns      ±7.85%     3000000
_WriterU32LE                               17 ns             16 ns      ±6.09%     2000000
_WriterU32BE                               17 ns             16 ns      ±7.94%     2000000
_WriterU64LE                               35 ns             34 ns      ±6.14%     2000000
_WriterU64BE                               42 ns             42 ns      ±5.08%     2000000
_WriterU128LE                              78 ns             75 ns      ±7.96%     1000000
_WriterU128BE                              73 ns             73 ns      ±6.70%     1000000

Benchmarks after these changes:

vagrant@ubuntu-xenial:~/dhp/packages/buffered/benchmarks$ ./benchmarks.after --ponythreads=1 --ponyminthreads=1
Benchmark results will have their mean and median adjusted for overhead.
You may disable this with --noadjust.

Benchmark                                   mean            median   deviation  iterations
_ReaderU8                                  34 ns             34 ns      ±2.76%     2000000
_ReaderU16LE                               33 ns             34 ns      ±1.62%     2000000
_ReaderU16LESplit                         459 ns            493 ns     ±14.70%     1000000
_ReaderU16BE                               43 ns             43 ns      ±4.41%     2000000
_ReaderU16BESplit                         502 ns            513 ns      ±8.88%     1000000
_ReaderU32LE                               45 ns             45 ns      ±1.58%     2000000
_ReaderU32LESplit                         925 ns            944 ns      ±6.64%      300000
_ReaderU32BE                               45 ns             45 ns      ±2.91%     2000000
_ReaderU32BESplit                        1022 ns           1040 ns      ±7.33%      500000
_ReaderU64LE                               49 ns             49 ns      ±2.69%     2000000
_ReaderU64LESplit                        1841 ns           1878 ns      ±8.32%      100000
_ReaderU64BE                               49 ns             49 ns      ±2.64%     2000000
_ReaderU64BESplit                        1934 ns           1919 ns      ±5.91%      100000
_ReaderU128LE                              51 ns             50 ns      ±2.20%     2000000
_ReaderU128LESplit                       3812 ns           3839 ns      ±7.86%       50000
_ReaderU128BE                              40 ns             40 ns      ±1.05%     2000000
_ReaderU128BESplit                       3055 ns           3252 ns     ±17.97%       50000
_WriterU8                                   8 ns              8 ns      ±1.28%     3000000
_WriterU16LE                                5 ns              5 ns      ±1.14%     5000000
_WriterU16BE                                7 ns              7 ns      ±1.47%     5000000
_WriterU32LE                                6 ns              6 ns      ±1.70%     3000000
_WriterU32BE                                7 ns              6 ns      ±1.61%     3000000
_WriterU64LE                                8 ns              8 ns      ±1.55%     3000000
_WriterU64BE                                7 ns              7 ns      ±1.73%     3000000
_WriterU128LE                               7 ns              7 ns      ±0.65%     3000000
_WriterU128BE                              10 ns             10 ns      ±0.98%     2000000

// into a single bigger array
if data.size() <= 64 then
match data
| let d: String => let a = d.array(); _current.copy_from(a, 0, _current.size(), a.size())
Member

This is wider than 80 columns, and it has two statements separated by a semicolon, which is usually a discouraged style.
You should probably split it onto separate lines.

Contributor Author

Fixed.
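
For reference, the flagged fragment split onto separate lines would read roughly like this (illustrative; the actual fix is in the PR diff):

// into a single bigger array
if data.size() <= 64 then
  match data
  | let d: String =>
    let a = d.array()
    _current.copy_from(a, 0, _current.size(), a.size())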

"""
let bytes = _element_size()
let u16_bytes = U16(0).bytewidth()
if ((offset*bytes) + u16_bytes) <= (_size*bytes) then
Member

Please surround the * operator with a space on each side, here and in the other places in this file below.

Contributor Author

Fixed.
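
For reference, the requested spacing would make the condition read:

if ((offset * bytes) + u16_bytes) <= (_size * bytes) then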

"""
Reads a U16 from offset. This is only allowed for an array of U8s.
"""
let bytes = _element_size()
Member

If this function is only allowed for Array[U8], why do we need _element_size() here? Won't it always be 1? Or am I missing something?

Contributor Author

Yes, _element_size() will always be 1. I had originally written these functions to work for all numeric array types and then later decided to limit them only to Array[U8].

I left _element_size() because I had assumed the compiler would optimize it away.

Let me know if you would prefer I replace _element_size() with 1.

Member

I'm not sure to what degree it will get optimized away at runtime - I'm mainly concerned with making sure the code is no more complex than it needs to be, for maintainability reasons.

So my vote would be to remove it, along with the multiplication operations below where the bytes local is used.

Contributor Author

Complexity removed.
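
Illustratively, with _element_size() and the multiplications gone, the guard from the earlier snippet reduces to something like (a sketch, not the exact diff):

let u16_bytes = U16(0).bytewidth()
if (offset + u16_bytes) <= _size then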

@@ -0,0 +1,13 @@
primitive NumericArrayConverter
Member

This doesn't seem to be required for the buffered perf improvements, and as a new feature it should probably go through the RFC process (or, at the very least, be in a separate PR).

Contributor Author

Removed. It was included because this was my initial plan/approach to solving the problem, before I added the functions on Array[U8].

@@ -752,7 +752,7 @@ class iso _TestFileMixedWriteQueue is UnitTest
line3 + "\n"
line5
line1
line3 + "\n"
line3
Member

Is this change related to the others?

Contributor Author

Yes, this is due to the Writer now coalescing write() buffers into one big buffer when the buffers being added are smaller than a cacheline (https://github.com/ponylang/ponyc/pull/2890/files#diff-761128221a9d08227d2ae82f79a30ea2R258).

The fact that the output of Writer.done() can be split across multiple ByteSeq chunks is an implementation detail (one which changes slightly due to the new coalescing behavior of Writer.write()).

The test being fixed relies on the printv() call adding a \n after every individual buffer returned from Writer.done(); the new coalescing produces 1 buffer instead of 2, so the test fails without this fix.
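
For illustration only (not code from this PR), the practical consequence of the coalescing is that callers must not assume a 1:1 mapping between write() calls and the chunks returned by done():

use "buffered"

actor Main
  new create(env: Env) =>
    let w = Writer
    // Two small writes, each well under a cacheline in size.
    w.write("line1\n")
    w.write("line2\n")

    // With the new coalescing these may come back from done() as a single
    // chunk rather than two, which is what tripped up the old test (it
    // counted on printv adding one "\n" per chunk).
    let chunks: Array[ByteSeq] val = w.done()
    env.out.print("chunks returned: " + chunks.size().string())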

@jemc added the "changelog - fixed" label (Automatically add "Fixed" CHANGELOG entry on merge) on Oct 3, 2018
@SeanTAllen
Member

@jemc are you good with these changes?

@SeanTAllen
Member

@dipinhora can you add release notes that should be included with this release into this PR?

@SeanTAllen mentioned this pull request on Oct 11, 2018
@SeanTAllen merged commit e7199b2 into ponylang:master on Oct 11, 2018
ponylang-main added a commit that referenced this pull request on Oct 11, 2018
@dipinhora
Contributor Author

@SeanTAllen is the commit message not adequate for what the release notes require?

@SeanTAllen
Member

@dipinhora that's a lot of information; something pared down to a short paragraph would be best.

@dipinhora
Contributor Author

@SeanTAllen How's the following?

Pony now has efficient reading and writing of numeric data types from Array[U8]s and from the buffered Reader and Writer classes. Also included is a fix for a bug in searching for bytes within a Reader: searches that previously failed incorrectly will now succeed.
