
Make lazy loading of the language models optional #79

Closed
davidecaroselli opened this issue Nov 23, 2020 · 4 comments


@davidecaroselli

Hello!

Running the very first detection takes a lot of time. Moreover, the cost depends on the actual language detected:

detector.detectLanguageOf("This is an example") // takes ~8 seconds
detector.detectLanguageOf("This is an example") // takes ~4 ms
detector.detectLanguageOf("Questo è un esempio") // takes ~14 seconds

So my question is: how can I create a "warmup" procedure that runs once and loads all the models?
A very trivial implementation would be to run detection on a multi-language sample, as sketched below. Is there anything more elegant than that?
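A minimal sketch of the trivial approach I mean, assuming the detector is built from a fixed set of languages and that one short sentence per language is enough to trigger loading:

```kotlin
import com.github.pemistahl.lingua.api.Language
import com.github.pemistahl.lingua.api.LanguageDetectorBuilder

fun main() {
    val detector = LanguageDetectorBuilder
        .fromLanguages(Language.ENGLISH, Language.ITALIAN, Language.GERMAN)
        .build()

    // One short sample per language; detecting each one forces the
    // corresponding models to be loaded before real traffic arrives.
    listOf(
        "This is an example",
        "Questo è un esempio",
        "Das ist ein Beispiel"
    ).forEach { detector.detectLanguageOf(it) }
}
```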

Thank you!

@pemistahl
Owner

Hi Davide, thanks for your question.

The behavior you are describing is intentional. The language models are loaded only for those languages that are plausible for the given input text. That is why the first detection takes more time than the second one. The third detection again takes more time because the text contains special characters for which all matching models have to be loaded.
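Conceptually, this amounts to keeping one lazily initialized model per language, along these lines (a toy sketch of the general pattern only, not Lingua's actual internals):

```kotlin
// Toy illustration; `NgramModel` and `loadModelFromResources`
// are invented names for this sketch, not Lingua's real internals.
class NgramModel

fun loadModelFromResources(language: String): NgramModel {
    // Expensive deserialization would happen here, once per language.
    return NgramModel()
}

val models: Map<String, Lazy<NgramModel>> =
    listOf("ENGLISH", "ITALIAN").associateWith { language ->
        lazy { loadModelFromResources(language) }
    }

// The first read of models.getValue("ITALIAN").value pays the loading
// cost; every later read returns the cached instance.
```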

I could change this behavior and load all models before the first detection, but why should I do that? The total running time of your three detections would stay the same.

Do you have any specific problem with the current behavior?

@davidecaroselli
Author

Hi @pemistahl !

No doubt this behavior is perfect for some use cases! I don't mean to complain as if it were a bug. :)
My idea would be to have an optional way to pre-load all models at once (maybe even in parallel, using multiple threads).

Why can this be useful? Imagine you want to expose the service to your users: you would want to pre-load the models at startup time, so that no initial request takes 1000x the expected time per request. A rough sketch of what I mean follows.
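Something like this, assuming the workaround with sample sentences and that the detector can be called from several threads at once (the sample texts are placeholders):

```kotlin
import com.github.pemistahl.lingua.api.LanguageDetector
import com.github.pemistahl.lingua.api.LanguageDetectorBuilder
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit

fun buildWarmDetector(): LanguageDetector {
    val detector = LanguageDetectorBuilder.fromAllLanguages().build()

    // Placeholder warmup sentences; a real service would keep one
    // per language it needs to serve.
    val samples = listOf("This is an example", "Questo è un esempio")

    val pool = Executors.newFixedThreadPool(samples.size)
    // Assuming detectLanguageOf() is safe to call from several threads at once.
    samples.forEach { text ->
        pool.execute { detector.detectLanguageOf(text) }
    }
    pool.shutdown()
    pool.awaitTermination(5, TimeUnit.MINUTES)

    return detector // all models triggered by the samples are now in memory
}
```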

Is there a way I can achieve that without keeping a list of sentences in all languages and manually detecting them at startup?

Thanks!

@pemistahl
Owner

pemistahl commented Nov 24, 2020

You are right. For this use case, preloading all models at once would be beneficial. At the moment this is not possible, but I could implement an optional setting to allow it in the next release. I will put it on my to-do list. :)
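From the caller's side, such an optional setting might look roughly like this; the method name below is purely a placeholder for whatever the release actually ships:

```kotlin
import com.github.pemistahl.lingua.api.LanguageDetectorBuilder

// Hypothetical builder option; the actual name and semantics
// are up to whatever the next release ships.
val detector = LanguageDetectorBuilder
    .fromAllLanguages()
    .withPreloadedLanguageModels() // hypothetical: eagerly load all models during build()
    .build()
```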

Thanks again for your input and for using my library. Very much appreciated.

@davidecaroselli
Author

Thank you @pemistahl !

@pemistahl pemistahl changed the title How to warmup models? Make lazy loading of the language models optional Dec 9, 2020
@pemistahl pemistahl added this to the Lingua 1.1.0 milestone Dec 9, 2020