{"id":284872,"date":"2024-02-28T18:18:17","date_gmt":"2024-02-28T12:48:17","guid":{"rendered":"https:\/\/forumias.com\/blog\/?p=284872"},"modified":"2024-02-28T18:18:17","modified_gmt":"2024-02-28T12:48:17","slug":"large-language-models","status":"publish","type":"post","link":"https:\/\/forumias.com\/blog\/large-language-models\/","title":{"rendered":"Large Language Models"},"content":{"rendered":"<p><strong>Source<\/strong>&#8211; This post on <strong><span class=\"TextRun Underlined SCXW24252944 BCX8\" lang=\"EN-US\" xml:lang=\"EN-US\" data-contrast=\"none\"><span class=\"NormalTextRun SCXW24252944 BCX8\" data-ccp-charstyle=\"Hyperlink\">Large Language Models<\/span><\/span> <\/strong>is based on the article<strong> \u201cWhat is an LLM, the backbone of AI chatbots like ChatGPT, Gemini?\u201d<\/strong> published in \u201c<b>Indian Express<\/b>\u201d on 27th February 2024.<\/p>\n<h2><strong>Why in the News?<\/strong><\/h2>\n<p>Large language model serve as the backbone of Artificial Intelligence Chat boxes like ChatGPT and Gemini.<\/p>\n<h2><strong>What are Large Language Models<\/strong><\/h2>\n<p><span style=\"color: #333333;\"><strong>1. Description-<\/strong><\/span> Large language Models (LLMs) are <span style=\"color: #ff0000;\">large general-purpose language models<\/span> that can be <span style=\"color: #ff0000;\">pre-trained and fine-tuned<\/span> for specific purposes like text classification, question answering and document summarisation.<\/p>\n<p><strong>2. Use-<\/strong> Large Language Models enable the <span style=\"color: #ff0000;\">Generative AI models like ChatGPT and Gemini <span style=\"color: #333333;\">to \u201cconverse\u201d with humans and predict the next word or sentence.<\/span><\/span><\/p>\n<p><strong>3<\/strong><span style=\"color: #ff0000;\"><strong><span style=\"color: #333333;\">. 
Features-<\/span><\/strong><br \/>\n<strong><span style=\"color: #333333;\">a.<\/span><\/strong><span style=\"color: #000000;\"><span style=\"color: #333333;\"><strong>\u00a0Large Data Sets-<\/strong> Large Language Models are trained on <\/span><span style=\"color: #ff0000;\"><span style=\"color: #333333;\">data sets of extensive size, and they also have a large parameter count.<\/span><\/span><\/span><\/span><\/p>\n<p><span style=\"color: #ff0000;\"><span style=\"color: #000000;\"><strong>Note-<\/strong> Parameters (not to be confused with <span style=\"color: #ff0000;\">hyperparameters,<\/span> which are configuration settings chosen before training begins) in machine learning represent the <span style=\"color: #ff0000;\">memories and knowledge acquired by a machine during model training.<\/span> They determine the proficiency of the model in addressing a particular problem.<br \/>\n<\/span><\/span><\/p>\n<p><span style=\"color: #ff0000;\"><span style=\"color: #000000;\"><strong>b. Use for General Purpose-<\/strong> This means a single model can <span style=\"color: #ff0000;\">solve general problems<\/span> that rest on the commonality of human language, regardless of the specific task and resource restrictions.<\/span><\/span><\/p>\n<p><span style=\"color: #ff0000;\"><span style=\"color: #000000;\"><strong>c. Tool to produce Human Language-<\/strong> An LLM is a tool that helps computers understand and produce human language.<\/span><\/span><\/p>\n<p><span style=\"color: #ff0000;\"><span style=\"color: #000000;\">4. <strong>Types of LLMs:<\/strong><\/span><br \/>\n<span style=\"color: #000000;\">a)<\/span> <strong><span style=\"color: #333333;\">On the basis of architecture- <\/span><\/strong><span style=\"color: #333333;\">These are of three types, as mentioned below-<\/span><br \/>\n<span style=\"color: #000000;\"><span style=\"color: #333333;\"><strong>i) Autoregressive model- <\/strong><\/span>They predict the next word in a sequence based on previous words. 
<strong>For ex-<\/strong> GPT-3<br \/>\n<\/span><\/span><\/p>\n<p><span style=\"color: #ff0000;\"><span style=\"color: #000000;\"><span style=\"color: #ff0000;\"><strong><span style=\"color: #333333;\">ii) Transformer-based model-<\/span><\/strong>\u00a0<span style=\"color: #000000;\">They use a specific type of neural network architecture for language processing. <strong>For ex-<\/strong> LaMDA or Gemini (formerly known as Bard)<br \/>\n<\/span><\/span><\/span><\/span><\/p>\n<p><span style=\"color: #ff0000;\"><span style=\"color: #000000;\"><span style=\"color: #ff0000;\"><span style=\"color: #000000;\"><span style=\"color: #ff0000;\"><strong><span style=\"color: #333333;\">iii) Encoder-decoder model-<\/span><\/strong> <span style=\"color: #000000;\">They encode input text into a representation and then decode it into another language or format. <strong>For ex-<\/strong> T5, BART, Pegasus, ProphetNet, Marge<\/span><\/span><\/span><\/span><br \/>\n<\/span><\/span><\/p>\n<p><span style=\"color: #ff0000;\"><span style=\"color: #000000;\">b)<strong><span style=\"color: #333333;\"> On the basis of training data:<\/span><\/strong> There are three types of LLMs.<br \/>\n<span style=\"color: #ff0000;\"><span style=\"color: #333333;\"><strong>i) Pretrained and fine-tuned-<\/strong> <span style=\"color: #000000;\">These language models are trained on multiple data sets and are fine-tuned to provide accurate results.<\/span><\/span><\/span><\/span><\/span><\/p>\n<p><span style=\"color: #ff0000;\"><span style=\"color: #000000;\"><span style=\"color: #ff0000;\"><span style=\"color: #333333;\"><strong>ii) Multilingual Models-<\/strong><\/span><span style=\"color: #000000;\"> These LLMs can understand and generate text in multiple languages.<\/span><\/span><\/span><\/span><\/p>\n<p><span style=\"color: #ff0000;\"><span style=\"color: #000000;\"><span style=\"color: #333333;\"><strong>iii) Domain-specific Models-<\/strong><\/span> These are trained on data related to specific 
domains such as legal, finance or healthcare.<br \/>\n<\/span><\/span><\/p>\n<p><span style=\"color: #ff0000;\"><span style=\"color: #000000;\"><strong><span style=\"color: #333333;\">c) Based on availability-<\/span><\/strong> They are categorised as <span style=\"color: #ff0000;\">open-source and closed-source.<\/span><\/span><\/span><\/p>\n<p><span style=\"color: #ff0000;\"><span style=\"color: #000000;\"><span style=\"color: #333333;\"><strong>i) Open Source-<\/strong><\/span> These models are openly available and are trained on publicly accessible data from the web. <span style=\"color: #333333;\"><strong>For ex-<\/strong><\/span> LLaMA2, BLOOM, Google BERT, Falcon 180B, OPT-175B.<\/span><\/span><\/p>\n<p><span style=\"color: #ff0000;\"><span style=\"color: #000000;\"><span style=\"color: #333333;\"><strong>ii) Closed Source-<\/strong><\/span> These models are trained on closed, proprietary data sets. <strong><span style=\"color: #333333;\">For ex-<\/span><\/strong> Claude 2, Bard and GPT-4 are some proprietary LLMs.<br \/>\n<\/span><\/span><\/p>\n<p><span style=\"color: #ff0000;\"><span style=\"color: #000000;\">d) LLMs also vary<span style=\"color: #ff0000;\"> based on their sizes.<\/span> <span style=\"color: #333333;\">Larger models require more computational resources but also offer better performance.<\/span><br \/>\n<\/span><\/span><\/p>\n<p><strong><span style=\"color: #333333;\">5) Working Methodology-<\/span><\/strong><\/p>\n<p><span style=\"color: #ff0000;\"><span style=\"color: #000000;\"><span style=\"color: #333333;\">i) Deep learning is a key technique in training Large Language Models (LLMs). 
Deep learning involves using artificial neural networks inspired by the human brain.<\/span><\/span><\/span><\/p>\n<p>ii) For LLMs, this <span style=\"color: #ff0000;\">neural network<\/span> learns to predict the probability of a word or sequence of words by analysing the patterns and relationships between words in the data set used for training.<\/p>\n<p>iii) Once trained, an LLM can predict the most likely next word or sequence of words based on inputs, also known as <span style=\"color: #ff0000;\">prompts.<br \/>\n<\/span><\/p>\n<p><strong><span style=\"color: #333333;\">6)<\/span><\/strong><span style=\"color: #ff0000;\"><strong><span style=\"color: #333333;\"> What can LLMs do?<\/span><\/strong><br \/>\n<\/span><\/p>\n<p><span style=\"color: #ff0000;\"><span style=\"color: #000000;\">a) They <span style=\"color: #ff0000;\">generate text and are capable of producing human-like content<\/span> for purposes ranging from stories and articles to poetry and songs.<\/span><\/span><\/p>\n<p><span style=\"color: #ff0000;\"><span style=\"color: #000000;\">b) They can strike up a conversation or <span style=\"color: #ff0000;\">function as virtual assistants.<\/span><\/span><\/span><\/p>\n<p><span style=\"color: #ff0000;\"><span style=\"color: #000000;\">c) They <span style=\"color: #ff0000;\">show proficiency in language understanding tasks,<\/span> including sentiment analysis, language translation, and summarisation of dense texts.<\/span><\/span><\/p>\n<p><span style=\"color: #ff0000;\"><span style=\"color: #000000;\">d) LLMs <span style=\"color: #ff0000;\">engage with users,<\/span> providing information, answering questions, and maintaining context over multiple exchanges.<\/span><\/span><\/p>\n<p><span style=\"color: #ff0000;\"><span style=\"color: #000000;\">e) They can create <span style=\"color: #ff0000;\">content and personalise it, <\/span>aiding in marketing strategies, offering personalised product recommendations, and tailoring content to specific target 
audiences.<br \/>\n<\/span><\/span><\/p>\n<p><span style=\"color: #ff0000;\"><span style=\"color: #333333;\"><strong>7) Advantages of LLMs:<\/strong><\/span><\/span><\/p>\n<p><span style=\"color: #ff0000;\">a) Versatility: <\/span>LLMs are highly versatile: thanks to their training on extensive datasets, a single model can be applied to a broad range of tasks.<\/p>\n<p><span style=\"color: #ff0000;\">b) Efficiency with Limited Data:<\/span> LLMs can perform tasks effectively even when only small amounts of domain-specific data are available. This is because they utilise the extensive knowledge gained from their general language training.<\/p>\n<p><span style=\"color: #ff0000;\">c) Continuous Improvement:<\/span> The performance of LLMs improves as they are given more data and parameters. This showcases a capacity for ongoing learning and development.<\/p>\n<p><span style=\"color: #333333;\"><strong>UPSC Syllabus- Science and Technology<\/strong><\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Source&#8211; This post on Large Language Models is based on the article \u201cWhat is an LLM, the backbone of AI chatbots like ChatGPT, Gemini?\u201d published in \u201cIndian Express\u201d on 27th February 2024. Why in the News? Large Language Models serve as the backbone of Artificial Intelligence chatbots like ChatGPT and Gemini. 
What are Large&hellip; <a class=\"more-link\" href=\"https:\/\/forumias.com\/blog\/large-language-models\/\">Continue reading <span class=\"screen-reader-text\">Large Language Models<\/span><\/a><\/p>\n","protected":false},"author":10366,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"footnotes":""},"categories":[1566,1738,1],"tags":[11872,10500],"class_list":["post-284872","post","type-post","status-publish","format-standard","hentry","category-daily-factly-articles","category-science-and-technology-daily-factly-articles","category-uncategorized","tag-9pm-daily-factly","tag-indian-express","entry"],"jetpack_featured_media_url":"","views":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/forumias.com\/blog\/wp-json\/wp\/v2\/posts\/284872","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/forumias.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/forumias.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/forumias.com\/blog\/wp-json\/wp\/v2\/users\/10366"}],"replies":[{"embeddable":true,"href":"https:\/\/forumias.com\/blog\/wp-json\/wp\/v2\/comments?post=284872"}],"version-history":[{"count":0,"href":"https:\/\/forumias.com\/blog\/wp-json\/wp\/v2\/posts\/284872\/revisions"}],"wp:attachment":[{"href":"https:\/\/forumias.com\/blog\/wp-json\/wp\/v2\/media?parent=284872"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/forumias.com\/blog\/wp-json\/wp\/v2\/categories?post=284872"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/forumias.com\/blog\/wp-json\/wp\/v2\/tags?post=284872"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}