RAPTOR Recursive Abstractive Processing for Tree-Organized Retrieval

A new information retrieval paper was published recently, RAPTOR (Recursive Abstractive Processing for Tree-Organized Retrieval) represents a leap forward in the domain of retrieval-augmented language models. Developed by a team from Stanford University, RAPTOR addresses the critical limitation of existing models that struggle with incorporating comprehensive document context during retrieval, thus hindering their ability to adapt to new information and access detailed knowledge. RAPTOR introduces a novel method that recursively embeds, clusters, and summarizes text chunks, constructing a hierarchical tree that captures information at various levels of abstraction. This tree structure, rich in layered summaries, allows the model to retrieve information that spans across a document efficiently, ensuring that even complex, multi-step reasoning tasks benefit from a holistic understanding of the content. The paper summarizes it thus: ...

February 19, 2024 · 2 min · Jered Sutton

Automatic Agent Learning from Scratch via Self-Planning

A new paper, “AUTOACT: Autonomous Agent Creation for Task Completion,"1 by Shuofei Qiao et al., introduces the AUTOACT automatic agent learning framework. This framework stands out by eschewing the traditional reliance on extensive annotated data and synthetic trajectories, a stark contrast to models like GPT-4. AUTOACT’s strength lies in its ability to synthesize planning trajectories and implement a division-of-labor strategy. This facilitates the creation of sub-agent groups that work in tandem, showing promise in complex question-answering tasks and potentially surpassing the capabilities of established models like GPT-3.5-Turbo. ...

February 1, 2024 · 2 min · Jered Sutton

Glaze and Nightshade

As generative AI models continue to grow and progress, the need for new content to train them on increases. This insatiable appetite for data clashes against the creative spirit of individual artists, though it’s not only individual artists being affected as we see in the New York Times v. OpenAI case. The somewhat digestive nature of model creation renders traditional techniques for protecting images, such as watermarks ineffective. Glaze and Nightshade, two new tools built by the University of Chicago, aim to restore some of the balance that has been lost between content creators and model creators. ...

January 29, 2024 · 2 min · Jered Sutton

A Structured Approach to Developing RAG Applications

Integrating Retrieval-Augmented Generation (RAG) into your business can significantly enhance how you interact with data and respond to queries. Here’s a guide to help effectively integrate this technology into your business: Identify Assets: Start by identifying all your data sources, including databases, internal documents, and web content. Knowing the breadth of your data is crucial for a targeted RAG deployment. Identify Questions: Clearly define the types of queries your RAG system will address. Distinct categorization helps customize your RAG application for various needs, from customer inquiries to complex analytical tasks. ...

January 26, 2024 · 1 min · Jered Sutton

Data Sovereignty and Architectural Choices in RAG Applications

Selecting an appropriate architecture for RAG applications involves balancing data sovereignty with operational efficiency. This section outlines various architectural choices, each with its unique implications for data control and processing capabilities: Cloud-Based Data and LLM (e.g., ChatGPT Assistant API): Pros: Benefits from scalability, easy integration, and access to advanced AI models. Cons: Introduces concerns about data privacy in the cloud and dependence on external services. Local Data with Cloud LLM (Vector Database and OpenAI API): ...

January 26, 2024 · 1 min · Jered Sutton

Self-taught models are gaining steam

A new research paper, “Self-Rewarding Language Models”, explores a novel approach to LLM training. Unlike traditional models, these models generate and evaluate their own training data, enabling continuous self-improvement beyond initial training limits[1]. This is another step along the path to potentially realizing AGI. Data quality has been and remains one of the key challenges for LLM technology. This method, reminds me of the approach Microsoft used for Phi-2’s static training. In that case, they used GPT-3.5 to generate synthetic textbook data. However in this case the model under training is doing the generation[2]. ...

January 22, 2024 · 1 min · Jered Sutton

Reduce the latency and cost of LLM inference with prompt compression

A new paper from Microsoft proposes using small models to compress prompts before passing them to larger models like gpt-4. The researchers were able to both achieve up to a 20x reduction in prompt tokens with some performance loss or a 4x reduction with a performance increase. Performance in this case means produced the desired output[1]. Usage is straightforward: from llmlingua import PromptCompressor llm_lingua = PromptCompressor() compressed_prompt = llm_lingua.compress_prompt( prompt_complex.split("\n\n"), instruction="", question="", target_token=200, context_budget="*1.5", iterative_size=100, ) instruction = "Please reference the following examples to answer the math question,\n" prompt = instruction + compressed_prompt["compressed_prompt"] + "\n\nQuestion: " + question request_data = { "prompt": prompt, "max_tokens": 400, "temperature": 0, "top_p": 1, "n": 1, "stream": False, "stop": "\r\n", } response = openai.Completion.create( "gpt-3.5-turbo-0301", **request_data, ) There are 4 big challenges to deploying LLMs in production performance, cost, latency and security. This project hits 3 of the 4. Though it is possible that this approach might even be useful to mitigate prompt injection if a small model that was trained to recognize and strip prompt injection attempts were created. ...

January 18, 2024 · 1 min · Jered Sutton

LLM Unicode Prompt Injection

Be careful copying AI prompts… It has become common place on social media to see posts sharing “super prompts” or prompt templates. Researchers have discovered a technique that uses unicode to hide prompt injection as non-printable characters1. Prompt injection, a term coined by Simon Willison, is a type of attack that attempts to override a user or application prompt to either alter the results or to exfiltrate earlier elements of the prompt or used in retrieval augmented generation (RAG). It is a real challenge for LLM apps at the moment as there are no completely reliable mitigation techniques. ...

January 17, 2024 · 1 min · Jered Sutton

LLMs Poison and Trust

A fascinating new paper by the Anthropic team explores how LLMs can be ’trained’ to appear normal during training, only to manifest malicious behavior once deployed1. Andrej Karpathy expanded on this idea, hypothesizing that this initial training could be provided by publishing malicious text on the internet where it would be picked up for use in training new models1. This might not seem significant, as LLMs merely generate text. However, consider the capabilities of Open Interpreter2. Open Interpreter is a program that helps you run code generated by LLMs. With Open Interpreter you can: ...

January 14, 2024 · 2 min · Jered Sutton

Response to Rabbit R1

Three days ago Rabbit announced the Rabbit R1, a new handheld AI device co-designed by Teenage Engineering. The reactions have been polarized, with some seeing it as merely an app in physical form, while others hail it as a revolution in how we interact with machines. The R1’s affordability and Teenage Engineering’s design are certainly appealing. However, I’m curious about Rabbit’s business model since the R1 doesn’t run inference locally and doesn’t require a subscription. I wonder what their revenue strategy is and what the implications for user privacy are. That being said, the privacy claims on the website seem really solid. ...

January 13, 2024 · 2 min · Jered Sutton