Performance Problem. #27
@zacksiri please give me more details about your environment and config, for instance: how many nodes, what cache topology you are rolling out (partitioned through the distributed adapter, maybe?), what operations you are testing, and how you are running the tests. In other words, if you can share some example code, that would be best.

On the other hand, a single Docker container with 1 core could be limiting; remember that one of the most important features of Erlang/Elixir is taking advantage of multicore architectures, so a single core, and a Docker container on top of that, might make for a very limited node.

Another thing to keep in mind: the distributed adapter uses "Distributed Erlang/Elixir", which is implemented on top of TCP, so in terms of performance it should be comparable with any other optimized protocol that uses TCP (like Redis or Memcached). But anyway, I wouldn't expect too much difference.

So if you can repeat the test with better resources for the Elixir (Nebulex) boxes as you mentioned, that would be great; also, give me as many details as possible. Thanks, I stay tuned!
@cabol I'm using the local adapter, to keep things simple for now. It's running on one node, and I'm not doing too much. I implemented a custom fetch function on the cache module. Here is my config:

```elixir
config :studio, MyApp.Cache,
  adapter: Nebulex.Adapters.Local,
  gc_interval: 86_400
```

And the fetch helper:

```elixir
@spec fetch(any(), keyword(), [{:do, any()}]) :: any()
def fetch(key, opts, do: calculate) do
  ttl = Keyword.get(opts, :ttl, 86_400)

  if __MODULE__.has_key?(key) do
    # Cache hit: return the stored value
    __MODULE__.get(key)
  else
    # Cache miss: run the computation and store the result
    {:ok, value} = calculate
    __MODULE__.set(key, value, ttl: ttl, return: :value)
  end
end
```

I'm going to allow the containers to use all the cores on the machine and see what happens.
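For reference, here is a hypothetical call site for the fetch/3 helper above (the key, TTL, and `load_token/0` are made up; `load_token/0` is assumed to return `{:ok, token}` so the `{:ok, value}` match succeeds):

```elixir
# Hypothetical usage of the custom fetch/3 helper; load_token/0 is an assumed
# function returning {:ok, token}. Since fetch/3 is a plain function (not a
# macro), the do: expression is evaluated before the call, even on a cache hit.
token = MyApp.Cache.fetch({:token, :external_service}, [ttl: 3_600], do: load_token())
```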
@zacksiri in that case it is weird; maybe, as you mentioned, there is something in the environment. Check this link: https://github.com/cabol/nebulex_examples/tree/master/nebulex_bench

There you will see the last bench I ran using the local adapter; you can run the bench tests again in your environment to compare (there is also a description of the environment I used). You can also run the local bench for Nebulex in your own environment. Let me know!
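In case it helps while reproducing, here is a minimal local benchmark sketch (this is not the nebulex_bench suite; it assumes Benchee is added as a dependency and MyApp.Cache is the local cache configured above):

```elixir
# Minimal sketch, assuming {:benchee, "~> 1.0"} (or similar) is in deps and
# MyApp.Cache is the Nebulex local cache from the config shown earlier.
MyApp.Cache.set("warm_key", "warm_value", ttl: 3_600)

Benchee.run(%{
  "set"        => fn -> MyApp.Cache.set("bench_key", "bench_value", ttl: 3_600) end,
  "get (hit)"  => fn -> MyApp.Cache.get("warm_key") end,
  "get (miss)" => fn -> MyApp.Cache.get("missing_key") end
})
```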
Yeah, I'm going to run some benches. With Redis the service is on a different node, but moving to Nebulex means all the load is in one place, so all the calls to the cache might add up and, for that reason, add to the response time. I will do some further research.
@zacksiri as far as I understand, you are using the local adapter, right? So you are running the tests on a single node, which makes sense: you are trying to bench Nebulex locally (single-node), which is fine. The problem is that the node you are using is very small (that's our premise), so what I'd do is run the same test, but using bigger instances or nodes. For example, on my laptop (4 cores, 8 GB RAM, 256 GB SSD) I got these results running the local bench.

I'd try to run the same in a Docker container (or a smaller instance), but meanwhile you can try the same thing. There is no need to move Nebulex to a single node; you can keep it on each node where you're running your app (that's the idea), but try to roll it out on bigger nodes, at least a bit bigger (2-4 cores and 8 GB of RAM, depending on how much data you want to cache).
I'm going to recreate the node using a container and limit the CPU to 1 core, to compare with your results. I haven't had the time today; I'll let you know what I find. I'll try scaling to 2 cores, then 4, and see how the results change.
That sounds good, stay tuned!!
Hey, here are the results for 1 core inside the container. It is significantly slower than when I run it on my laptop; on my laptop I'm getting similar results to yours. I think my cloud VM is really oversubscribed.

2 Cores

4 Cores
I just tried on 2 different cloud providers and I'm getting very similar results between them; the one I posted is actually better. Also, there doesn't seem to be a difference between container and VM. I think something doesn't add up: 2.5µs is fast, and if I make 100 calls it should still only add 0.25ms to my response time. Maybe I'm doing something wrong.
@cabol just wanted to let you know I've resolved the performance issue. It had nothing to do with Nebulex; it was my implementation making N+1 calls to the cache, which was saturating my CPU and making the response time go up. Once I removed the N+1, Nebulex is now faster than Redis, as expected. I'm so impressed with Nebulex that I'll be covering it in my Elixir Foundation video series and making it part of my company's standard tools. I'm also looking forward to the 1.0 release. Thank you for your hard work on this library; it's beautifully implemented.
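For anyone hitting something similar, here is a rough sketch of the difference between an N+1 cache access pattern and caching the collection in a single entry (module, function, and key names are hypothetical, not the reporter's actual code):

```elixir
# Hypothetical illustration of N+1 cache calls vs. a single cache entry;
# MyApp.Posts, load_posts/1, and the key shapes are made up for this sketch.
defmodule MyApp.Posts do
  # N+1: one cache call per id, which adds up (and burns CPU) for large lists
  def list_posts_n_plus_one(post_ids) do
    Enum.map(post_ids, &MyApp.Cache.get({:post, &1}))
  end

  # One cache call for the whole collection, computed once on a miss
  def list_posts(post_ids) do
    MyApp.Cache.get({:posts, post_ids}) ||
      MyApp.Cache.set({:posts, post_ids}, load_posts(post_ids), ttl: 3_600, return: :value)
  end

  # Placeholder for the real database query
  defp load_posts(_post_ids), do: []
end
```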
@zacksiri awesome, that's good to know. I like these kinds of issues a lot, because they help you improve or discover bugs or new issues, but I'm glad to hear you were able to fix yours. For any further issue(s), don't hesitate to ping me or open a new one. On the other hand, I'm also glad Nebulex is being useful for you (that's the goal), and I'd really appreciate it if you cover it in your Elixir Foundation series, that would be great!! And I'm working very hard to release the 1.0.0 version ASAP; there are a lot of improvements and new features. Thank you very much for your post and findings, it was very fruitful :)
So just to complete the issue with a screenshot: everything above 54e5fc6 is with Nebulex, everything below is Redis. Ignore the first request that says 156ms; it has to make the initial network call to cache a token, and everything after that call is using the Nebulex cache. I'm getting about a 3x improvement overall.
👍
I tried switching to Nebulex from my hand-rolled version using GenServer + Redis. There was quite a performance hit. The 3 bottom results (below 79734e2) were before Nebulex, and everything on top is Nebulex results.

Could it be that now the workload has shifted from Redis to the internal node, hence it is more affected by CPU (it's running in a container and I cap the CPU to 1 core)? I think further testing is needed.
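For context, here is a hypothetical sketch of what a hand-rolled GenServer-backed cache often looks like (this is not the reporter's actual implementation): a single process holding a map, which also means every read and write is serialized through that one process.

```elixir
# Hypothetical hand-rolled cache (not the reporter's code): one GenServer
# owning a map. Every get/put goes through this single process, so it can
# become a bottleneck under concurrent load.
defmodule MyApp.HandRolledCache do
  use GenServer

  def start_link(_opts), do: GenServer.start_link(__MODULE__, %{}, name: __MODULE__)

  def get(key), do: GenServer.call(__MODULE__, {:get, key})
  def put(key, value), do: GenServer.cast(__MODULE__, {:put, key, value})

  @impl true
  def init(state), do: {:ok, state}

  @impl true
  def handle_call({:get, key}, _from, state), do: {:reply, Map.get(state, key), state}

  @impl true
  def handle_cast({:put, key, value}, state), do: {:noreply, Map.put(state, key, value)}
end
```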