A 17 MB AI Chatbot Is Here!



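GitHub: https://github.com/mcks2000/llm_notebooks/blob/main/notebooks/17megabyteChatBot.ipynb
The Rise of Small Language Models

Small language models (SLMs), meaning AI models with fewer than 3 billion parameters, have become a clear trend. Capable SLMs up to the size of Gemma 2B show remarkable flexibility while demanding far fewer hardware resources: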


  • Gemma-2B
  • Danube-1.8B_chat
  • TinyLlama 1.1B family
  • Cosmo-1B
  • Qwen/Quyen Mini (1.8B)
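Even so, running the models above without quantization is heavy work for a CPU. The practical choice is llama-cpp with the models in GGUF format, which gives promising and satisfying results.

But what if I told you that we can set up a chatbot with an AI model that weighs only 17 MB?

What the AI Can Do

First, let's set expectations. I am talking about an FAQ-style chatbot, where the script and a set of question/answer pairs have already been curated and prepared.

In this specific case, which honestly may cover up to 80% of real-world chatbot usage, an ultra-tiny language model is enough and still delivers a smooth chat experience.

For testing, I used the popular Windscribe FAQ section as a ready-made database of question/answer pairs. We will use the same approach here.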




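ms-marco-TinyBERT-L-2 Is a Cross-Encoder

First, it is important to understand which kinds of encoders are relevant to our scenario. The best technique for determining the semantic similarity between two or more sentences is semantic search.

Semantic search uses an embedding model (an encoder) to process pairs of sentences and then measures their similarity, typically with cosine similarity or a dot product. There are essentially two types of encoder used for this task: bi-encoders and cross-encoders.

A bi-encoder produces one sentence embedding for a given sentence. We pass sentences A and B to BERT independently, which yields the sentence embeddings u and v. These embeddings can then be compared using cosine similarity: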
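To make the comparison step concrete, here is a minimal sketch of cosine similarity in plain Python. The vectors u, v and w are made-up 3-dimensional stand-ins for sentence embeddings (real embeddings have hundreds of dimensions), and the function name cosine_similarity is only illustrative.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Made-up toy "embeddings" (assumption: not produced by any real model)
u = [1.0, 2.0, 3.0]
v = [2.0, 4.0, 6.0]   # same direction as u
w = [3.0, 0.0, -1.0]  # orthogonal to u

sim_uv = cosine_similarity(u, v)  # close to 1: same direction
sim_uw = cosine_similarity(u, w)  # close to 0: unrelated direction
```

Because each sentence is embedded on its own, a bi-encoder can pre-compute the embeddings of a whole corpus once and only embed the query at chat time.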






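By contrast, with a cross-encoder we pass both sentences to the Transformer network at the same time. It then produces an output value between 0 and 1 that indicates how similar the input sentence pair is.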
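As a rough sketch of what "passing both sentences at the same time" means in practice: the input to a cross-encoder is a list of (query, candidate) pairs, and the model returns one score per pair. The raw scores below are made up for illustration, and squashing them with a sigmoid is only one common way such a score ends up between 0 and 1; whether a given model applies it is an assumption here.

```python
import math

def sigmoid(x):
    """Map any real-valued score into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

query = "Where is Windscribe located?"
candidates = ["What is OpenVPN?", "Where is Windscribe located?"]

# One (query, candidate) pair per candidate: each pair is scored jointly
pairs = [(query, candidate) for candidate in candidates]

# Hypothetical raw scores, one per pair (NOT real model output)
raw_scores = [-4.0, 4.0]
similarities = [sigmoid(score) for score in raw_scores]
```

The cost of this joint scoring is that nothing can be pre-computed: every new query has to be paired with every candidate.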

A cross-encoder does not produce sentence embeddings. The great team behind Sentence-Transformers (SBERT) explains it this way:




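What Is the Strategy?

If we spend (I would say invest) a good amount of time curating our question/answer dataset, we can create a chatbot that understands the user's request and returns the answer that best matches the user's intent.

Basically, we are applying semantic search to build a retriever-based chatbot. We do not use any kind of generative AI, yet we reach the same result: answering the question!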
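The retrieval idea can be sketched without any model at all. In the snippet below, token_overlap is a deliberately crude stand-in scorer (word-set overlap), not the cross-encoder, and best_answer is a hypothetical helper name; the point is only the shape of the pipeline: score the user query against every curated question, then return the stored answer of the best match.

```python
def token_overlap(a, b):
    """Toy stand-in scorer: Jaccard overlap between lowercase word sets."""
    set_a = set(a.lower().split())
    set_b = set(b.lower().split())
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)

def best_answer(query, faq, scorer=token_overlap):
    """Return the stored answer whose question scores highest for the query."""
    best_item = max(faq, key=lambda item: scorer(query, item["q"]))
    return best_item["a"]

# Two entries with answers shortened from the Windscribe FAQ used in this post
faq = [
    {"q": "What is OpenVPN?",
     "a": "Our OpenVPN implementation uses the AES-256-GCM cipher with SHA512 auth and a 4096-bit RSA key."},
    {"q": "Where is Windscribe located?",
     "a": "The actual Windscribe headquarters is based in Toronto, Ontario, Canada."},
]

reply = best_answer("where is Windscribe?", faq)
```

Nothing is generated: the reply is always one of the curated answers, so the quality of the chatbot depends on the quality of the dataset and of the scorer plugged in.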




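The Cross-Encoder Chatbot

!pip install sentence_transformers
!pip install rich

These are the only packages we need. In fact, torch and transformers are installed automatically as dependencies of sentence-transformers.

Let's look at the data. For convenience, I kept only the first 7 entries: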

listato = [
    {'q': 'What is WireGuard?',
     'a': 'WireGuard is a connection protocol used in the Windscribe desktop and mobile applications. It is typically faster than OpenVPN (called UDP and TCP in the apps) and more flexible than IKEv2, making it a great option for securing your online activity.'},
    {'q': 'What kind of encryption does Windscribe use?',
     'a': "Windscribe's encryption varies based on the protocol selected, as well as the format of our app you are using: OpenVPN: Our OpenVPN implementation uses the AES-256-GCM cipher with SHA512 auth and a 4096-bit RSA key. Perfect forward secrecy is also supported. IKEv2: Our in-app IKEv2 implementation utilizes AES-256-GCM for encryption, SHA-256 for integrity checks. Desktop and Android apps use ECP384 for Diffie-Hellman key negotiation (DH group 20), and iOS uses ECP521 for Diffie-Hellman key negotiation (DH group 21). WireGuard: WireGuard is an opinionated protocol that uses ChaCha20 for symmetric encryption, authenticated with Poly1305; Curve25519 for ECDH; BLAKE2s for hashing and keyed hashing; SipHash24 for hashtable keys; and HKDF for key derivation."},
    {'q': 'What is OpenVPN?',
     'a': 'Our OpenVPN implementation uses the AES-256-GCM cipher with SHA512 auth and a 4096-bit RSA key. Perfect forward secrecy is also supported.'},
    {'q': 'What is IKEv2?',
     'a': 'Our in-app IKEv2 implementation utilizes AES-256-GCM for encryption, SHA-256 for integrity checks. Desktop and Android apps use ECP384 for Diffie-Hellman key negotiation (DH group 20), and iOS uses ECP521 for Diffie-Hellman key negotiation (DH group 21).'},
    {'q': 'What is WireGuard?',
     'a': 'WireGuard is an opinionated protocol that uses ChaCha20 for symmetric encryption, authenticated with Poly1305; Curve25519 for ECDH; BLAKE2s for hashing and keyed hashing; SipHash24 for hashtable keys; and HKDF for key derivation.'},
    {'q': 'Can I use Windscribe without an internet connection?',
     'a': 'In order to use Windscribe, you must have an existing internet connection, such as mobile data, home internet or public WiFi. Windscribe does not provide an internet connection, we simply encrypt and reroute your existing internet connection through our secure VPN servers. This means that the bandwidth you use with the VPN is also being used with your Internet Service Provider (ISP).'},
    {'q': 'Where is Windscribe located?',
     'a': 'The actual Windscribe headquarters is based in Toronto, Ontario, Canada. However, we are a diverse company with employees working on three continents, in over 7 countries.'},
]
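This is a list of dictionaries that we will split into questions and answers. Our super-micro cross-encoder is essentially a reranker that does not need any pre-computed embeddings.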
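The split itself is not shown in this post, so the following is a minimal sketch of one way to do it. It assumes the 'q'/'a' dictionary layout shown above and uses a shortened two-entry sample (the variable name sample is illustrative) so that it runs on its own. The names questions and answers match the ones used in the code further down.

```python
# Shortened sample with the same shape as listato (answers trimmed for brevity)
sample = [
    {'q': 'What is WireGuard?',
     'a': 'WireGuard is a connection protocol used in the Windscribe desktop and mobile applications.'},
    {'q': 'Where is Windscribe located?',
     'a': 'The actual Windscribe headquarters is based in Toronto, Ontario, Canada.'},
]

# Two parallel lists: questions[i] and answers[i] belong to the same FAQ entry
questions = [item['q'] for item in sample]
answers = [item['a'] for item in sample]
```

Keeping the two lists aligned by index is what later lets a match on a question be turned into its answer with a single lookup.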

In fact, the core of the chatbot lies in these 5 lines of code:

from sentence_transformers import CrossEncoder

# Load the model
model = CrossEncoder("cross-encoder/ms-marco-TinyBERT-L-2")  # 17 MegaByte !!

# questions / answers: the lists of 'q' and 'a' strings taken from listato
# Let's get the scores
query = 'Where is Windscribe located?'
# Classic RAG approach: rank the answers (the documents) against the query
results_a = model.rank(query, answers, return_documents=True, top_k=3)
# Our approach: rank the curated questions against the query
results_q = model.rank(query, questions, return_documents=True, top_k=3)
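The classic RAG/similarity-search approach tries to match the query against the documents (in our scenario, the answers). Because our dataset is carefully curated, we match the user query directly against the questions instead.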
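Matching against questions only helps if the hit can be turned back into an answer. The sketch below shows that lookup step on a hand-written results_q list; it assumes the ranking result is a list of dictionaries sorted best-first, each carrying a 'corpus_id' index into the ranked list and a 'score'. The scores are made up and no model is called here.

```python
questions = ['What is OpenVPN?', 'Where is Windscribe located?']
answers = [
    'Our OpenVPN implementation uses the AES-256-GCM cipher with SHA512 auth and a 4096-bit RSA key.',
    'The actual Windscribe headquarters is based in Toronto, Ontario, Canada.',
]

# Hand-written stand-in for a ranking of `questions` (scores are NOT real model output)
results_q = [
    {'corpus_id': 1, 'score': 0.98},
    {'corpus_id': 0, 'score': 0.02},
]

best = results_q[0]                          # first entry is the best hit
matched_question = questions[best['corpus_id']]
reply = answers[best['corpus_id']]           # same index, parallel list
```

Since questions and answers are parallel lists, the 'corpus_id' of the best-matching question indexes the answer to return.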

We can test this first and look at the results, using the query from the code, "Where is Windscribe located?".






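Given a list of documents (chunks), ms-marco-TinyBERT-L-2 returns a list sorted by how well each hit matches the semantic meaning of the query. That is all we need.

Every user input in the chat goes through the cross-encoder, and we return the matching answer with a streaming (typewriter) effect.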

import sys
import time

from rich.console import Console

console = Console()

# typewriter effect: write one character at a time
def typingPrint(text):
    for character in text:
        sys.stdout.write(character)
        sys.stdout.flush()
        time.sleep(0.05)

console.print("Hello! How can I help you today?")
q = input("User: ")
# rank the curated questions against the user input
results_q = model.rank(q, questions, return_documents=True, top_k=3)
# the ID of the best-matching question selects the answer to print
best_id = results_q[0]['corpus_id']
typingPrint(answers[best_id])
And the output is exactly what we wanted:

Hello! How can I help you today?
User: where is Windscribe?
The actual Windscribe headquarters is based in Toronto, Ontario, Canada. However, we are a diverse company with employees working on three continents, in over 7 countries.
Hello! How can I help you today?
User: what are Windscribe's encryptions?
Windscribe's encryption varies based on the protocol selected, as well as the format of our app you are using: OpenVPN: Our OpenVPN implementation uses the AES-256-GCM cipher with SHA512 auth and a 4096-bit RSA key. Perfect forward secrecy is also supported. IKEv2: Our in-app IKEv2 implementation utilizes ...


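Conclusion

A chatbot for a small website or a small business certainly does not need a huge LLM. An ultra-tiny model (17 MB) is enough!

And the secret is all in careful preparation:


  • Good data
  • Good organization
  • A clear goal
Resources: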


  • GitHub: https://github.com/mcks2000/llm_notebooks/blob/main/notebooks/17megabyteChatBot.ipynb
  • ms-marco-TinyBERT-L-2: https://huggingface.co/cross-encoder/ms-marco-TinyBERT-L-2