How DeepSeek uses a Mixture of Experts architecture and other training techniques to outperform more expensive models 👇 https://hubs.la/Q0347b7q0
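For readers unfamiliar with the term: a Mixture of Experts layer routes each token to a small subset of "expert" sub-networks, so only a fraction of the model's parameters is active per token, which is where the efficiency comes from. Below is a minimal, generic top-k routed MoE sketch in PyTorch. It is not DeepSeek's actual architecture (DeepSeek's MoE design has its own routing and expert layout); all class and variable names here are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Toy MoE layer: a router picks the top-k experts per token,
    so only k of n_experts feed-forward blocks run for each token."""
    def __init__(self, dim=64, n_experts=8, k=2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(n_experts)
        )
        self.router = nn.Linear(dim, n_experts)  # gating network
        self.k = k

    def forward(self, x):                        # x: (tokens, dim)
        gate = self.router(x)                    # (tokens, n_experts)
        weights, idx = gate.topk(self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)     # normalize over the chosen k
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e         # tokens routed to expert e in this slot
                if mask.any():
                    w = weights[mask, slot].unsqueeze(-1)
                    out[mask] += w * expert(x[mask])
        return out

x = torch.randn(16, 64)
print(TopKMoE()(x).shape)  # torch.Size([16, 64])
```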
DeepSeek is really pushing the boundaries with Mixture of Experts! Exciting to see how these techniques make models more efficient.
DeepLearning.AI I'm sorry, but there's no way that something like this will replace tool use. Why? Because with tool use, especially with memory management in the loop, you can extend the initial training data significantly. I have no idea how you would deal with the complexity of non-linearity otherwise. This is absolutely relevant in engineering, and I've actually advocated for DeepSeek to have tool use, because it often says it needs to research things and can't do so without access to search, scrape, or extract capabilities. You could literally give this model echolocation-type abilities with something like Firecrawl. This also calls into question your interpretation of open versus closed models. I realize you mean open as in open source, but it might actually be more useful to think of open as something that has tools and can access data beyond its initial training set in real time.
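To make the "extend the training data in the loop" point concrete, here is a minimal agent-loop sketch in Python. Everything in it is hypothetical: `llm` stands in for any chat-model callable, `web_scrape` is a placeholder where a real service like Firecrawl would be wired in (this is not Firecrawl's actual API), and the JSON tool-call format is just one common convention.

```python
import json

def web_scrape(url: str) -> str:
    """Hypothetical scrape tool; in practice this would call a scraping
    service (e.g. Firecrawl) and return the page as clean text."""
    raise NotImplementedError("wire up a real scraper here")

TOOLS = {"web_scrape": web_scrape}

def run_with_tools(llm, user_msg: str, max_steps: int = 5) -> str:
    """Minimal tool-use loop: the model either answers in plain text or
    emits a JSON tool call; tool results are appended to the running
    context (the 'memory'), so the model can see data it was never
    trained on."""
    messages = [{"role": "user", "content": user_msg}]
    for _ in range(max_steps):
        reply = llm(messages)  # assumed: callable returning the model's text
        try:
            # e.g. {"tool": "web_scrape", "args": {"url": "https://..."}}
            call = json.loads(reply)
        except json.JSONDecodeError:
            return reply       # plain text means a final answer
        result = TOOLS[call["tool"]](**call["args"])
        messages.append({"role": "tool", "content": result})
    return "stopped: step limit reached"
```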
Very informative
Love this
But it has copied, or maybe trained on, ChatGPT data. I'm not sure if this is a bad copy/paste by a DeepSeek engineer or just an anomaly from shared training data, but it actually provides references to OpenAI/ChatGPT documentation for something it should just answer itself or refer to its own documentation for. I asked it to provide instructions for downloading the chat conversation, and one of the options literally has ChatGPT written in it and links to OpenAI documentation. So I am a bit skeptical of DeepSeek.