How to Draw a Haystack

Pipelines Tutorial


In this tutorial, you will learn how the Pipeline class acts as a connector between all the different building blocks found in Haystack. Whether you are using a Reader, Generator, Summarizer or Retriever (or two), the Pipeline class helps you build a Directed Acyclic Graph (DAG) that determines how the output of one component is routed into the input of another.

Setting Up the Environment

Let's start by making sure we have a GPU running, so that this tutorial runs at a decent speed. In Google Colab, you can change to a GPU runtime in the menu:

  • Runtime -> Change Runtime type -> Hardware accelerator -> GPU
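To confirm that the runtime change took effect, a quick check from Python can help. The sketch below is an illustration only: it assumes an NVIDIA GPU whose driver ships the `nvidia-smi` tool, and the helper name `gpu_visible` is hypothetical.

```python
import shutil
import subprocess


def gpu_visible() -> bool:
    """Return True if `nvidia-smi` is on the PATH and exits successfully."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        # The tool is not installed, so no NVIDIA driver is visible.
        return False
    try:
        result = subprocess.run([exe], capture_output=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


print("GPU visible:", gpu_visible())
```

If this prints `False` in Colab, the runtime is most likely still set to CPU.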

Install Haystack through pip.

If you are running in Colab or in an environment without Docker, you will want to start Elasticsearch from source.

Initialization

Then let's fetch some data (in this case, pages from the Game of Thrones wiki) and prepare it so that it can be indexed into our DocumentStore.

Here we initialize the core components that we will be gluing together using the Pipeline class. We have a DocumentStore, a BM25Retriever and a FARMReader. These can be combined to create a classic Retriever-Reader pipeline that is designed to perform Open Domain Question Answering.

Prebuilt Pipelines

Haystack features many prebuilt pipelines that cover common tasks. Here we have an ExtractiveQAPipeline (the successor to the now deprecated Finder class).

If you only want to do the retrieval step, you can use a DocumentSearchPipeline.

Or if you want to use a Generator instead of a Reader, you can initialize a GenerativeQAPipeline like this:

Haystack features prebuilt pipelines to do:

  • just document search (DocumentSearchPipeline)
  • document search with summarization (SearchSummarizationPipeline)
  • generative QA (GenerativeQAPipeline)
  • FAQ style QA (FAQPipeline)
  • translated search (TranslationWrapperPipeline)

To find out more about these pipelines, have a look at our documentation.

With any Pipeline, whether prebuilt or custom constructed, you can save a diagram showing how all the components are connected.


Custom Pipelines

Now we are going to rebuild the ExtractiveQAPipeline using the generic Pipeline class. We do this by adding the building blocks that we initialized as nodes in the graph.

Pipelines offer a very simple way to ensemble different components. In this example, we are going to combine the power of an EmbeddingRetriever with the keyword-based BM25Retriever. See our documentation to understand why we might want to combine a dense and a sparse retriever.
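To make the idea of "adding nodes to a graph" concrete, here is a minimal sketch in plain Python. It is a toy stand-in, not Haystack's implementation: `ToyPipeline`, `toy_retriever`, `toy_reader` and `DOCS` are hypothetical names, and the `add_node(component, name, inputs)` call shape is modelled on Haystack's Pipeline API, so treat the details as an assumption and check the documentation.

```python
from typing import Any, Callable, Dict, List, Tuple

# A node takes a dict of inputs and returns (output_dict, outgoing_edge_name).
Node = Callable[[Dict[str, Any]], Tuple[Dict[str, Any], str]]


class ToyPipeline:
    """Toy pipeline graph for illustration; not Haystack's Pipeline."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Node] = {}
        self.inputs: Dict[str, List[str]] = {}
        self.order: List[str] = []

    def add_node(self, component: Node, name: str, inputs: List[str]) -> None:
        # Each input must be the root ("Query") or a node added earlier,
        # which keeps the graph acyclic by construction.
        for source in inputs:
            if source != "Query" and source not in self.nodes:
                raise ValueError(f"unknown input node: {source}")
        self.nodes[name] = component
        self.inputs[name] = inputs
        self.order.append(name)

    def run(self, query: str) -> Dict[str, Any]:
        outputs: Dict[str, Dict[str, Any]] = {"Query": {"query": query}}
        for name in self.order:
            merged: Dict[str, Any] = {}
            for source in self.inputs[name]:
                merged.update(outputs[source])
            result, _edge = self.nodes[name](merged)
            # Carry earlier keys forward so later nodes still see the query.
            outputs[name] = {**merged, **result}
        return outputs[self.order[-1]]


DOCS = [
    "Arya Stark is the daughter of Eddard Stark.",
    "Jon Snow joined the Night's Watch.",
]


def toy_retriever(data: Dict[str, Any]) -> Tuple[Dict[str, Any], str]:
    # Keep documents that share at least one word with the query.
    words = set(data["query"].lower().rstrip("?").split())
    hits = [d for d in DOCS if words & set(d.lower().rstrip(".").split())]
    return {"documents": hits}, "output_1"


def toy_reader(data: Dict[str, Any]) -> Tuple[Dict[str, Any], str]:
    # Pretend the first retrieved document is the answer.
    answer = data["documents"][0] if data["documents"] else None
    return {"answer": answer}, "output_1"


pipe = ToyPipeline()
pipe.add_node(toy_retriever, name="Retriever", inputs=["Query"])
pipe.add_node(toy_reader, name="Reader", inputs=["Retriever"])
result = pipe.run("Who is Arya Stark?")
print(result["answer"])
```

The point of the sketch is the wiring: each node names the nodes it takes input from, and the pipeline passes outputs along those edges.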


Here we use a JoinDocuments node so that the predictions from each retriever can be merged together.
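As a rough illustration of what merging predictions can mean, the sketch below combines two ranked result lists and keeps one entry per document. It is a simplified stand-in and not the JoinDocuments implementation: the function name `join_concatenate`, the `id`/`score` dictionary layout, and the "highest score wins" rule are all assumptions made for this example.

```python
from typing import Dict, List


def join_concatenate(*result_lists: List[Dict]) -> List[Dict]:
    """Merge ranked lists, keeping one entry per document id (highest score wins)."""
    best: Dict[str, Dict] = {}
    for results in result_lists:
        for doc in results:
            current = best.get(doc["id"])
            if current is None or doc["score"] > current["score"]:
                best[doc["id"]] = doc
    # Return the merged documents ordered from best to worst score.
    return sorted(best.values(), key=lambda d: d["score"], reverse=True)


# Hypothetical outputs of a sparse and a dense retriever for the same query.
bm25_hits = [{"id": "a", "score": 0.9}, {"id": "b", "score": 0.4}]
dense_hits = [{"id": "b", "score": 0.7}, {"id": "c", "score": 0.6}]

merged = join_concatenate(bm25_hits, dense_hits)
print([d["id"] for d in merged])
```

Keep in mind that raw scores from a keyword retriever and an embedding retriever are usually not on the same scale, so a real join step may need to normalize or weight them before comparing.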

Custom Nodes

Nodes are relatively simple objects, and we encourage our users to design their own if they don't see one that fits their use case.

The only requirements are:

  • Create a class that inherits BaseComponent.
  • Add a method run() to your class. Add the mandatory and optional arguments it needs to process. These arguments must be passed as input to the pipeline, inside params, or output by preceding nodes.
  • Add processing logic inside the run() (e.g. reformatting the query).
  • Return a tuple that contains your output data (for the next node) and the name of the outgoing edge (by default "output_1" for nodes that have one output).
  • Add a class attribute outgoing_edges = 1 that defines the number of output options from your node. You only need a higher number here if you have a decision node (see below).

Here we have a template for a Node:
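A minimal sketch of a node that follows the requirements listed above. To keep it runnable on its own, it defines a stand-in `BaseComponent`; in a real project you would inherit from Haystack's BaseComponent instead. The class name `QueryLowercaser` and the `strip_whitespace` argument are hypothetical.

```python
from typing import Any, Dict, Optional, Tuple


class BaseComponent:
    """Stand-in so this sketch runs without Haystack installed."""

    outgoing_edges = 1


class QueryLowercaser(BaseComponent):
    # A regular (non-decision) node has exactly one outgoing edge.
    outgoing_edges = 1

    def run(
        self, query: str, strip_whitespace: Optional[bool] = True
    ) -> Tuple[Dict[str, Any], str]:
        # Processing logic: reformat the query before passing it on.
        new_query = query.lower()
        if strip_whitespace:
            new_query = new_query.strip()
        output = {"query": new_query}
        # Return the output data and the name of the outgoing edge.
        return output, "output_1"


node = QueryLowercaser()
output, edge = node.run("  Who is Arya Stark?  ")
print(output, edge)
```

Here `query` is a mandatory argument and `strip_whitespace` an optional one; the returned dictionary is what the next node would receive.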

Decision Nodes

Decision Nodes help you route your data so that only certain branches of your Pipeline are run. One popular use case for such query classifiers is routing keyword queries to Elasticsearch and questions to EmbeddingRetriever + Reader. With this approach you keep optimal speed and simplicity for keywords while going deep with transformers when it's most helpful.


Though this looks very similar to the ensembled pipeline shown above, the key difference is that only one of the retrievers is run for each request. By contrast, both retrievers are always run in the ensembled approach.

Below, we define a very naive QueryClassifier and show how to use it:
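The following sketch shows one way such a naive classifier could look, together with a toy router that runs only the selected branch. The question-mark heuristic, the stand-in `BaseComponent`, and the names `NaiveQueryClassifier`, `keyword_branch`, `question_branch` and `route` are assumptions for illustration; in Haystack the branches would be pipeline nodes connected to the classifier's outgoing edges.

```python
from typing import Any, Callable, Dict, List, Tuple


class BaseComponent:
    """Stand-in so this sketch runs without Haystack installed."""

    outgoing_edges = 1


class NaiveQueryClassifier(BaseComponent):
    # A decision node has more than one outgoing edge.
    outgoing_edges = 2

    def run(self, query: str) -> Tuple[Dict[str, Any], str]:
        # Very naive rule: a question mark means "natural language question".
        if "?" in query:
            return {}, "output_2"
        return {}, "output_1"


calls: List[str] = []  # records which branch actually ran


def keyword_branch(query: str) -> str:
    calls.append("keyword")
    return f"keyword search for: {query}"


def question_branch(query: str) -> str:
    calls.append("question")
    return f"dense retrieval + reader for: {query}"


BRANCHES: Dict[str, Callable[[str], str]] = {
    "output_1": keyword_branch,
    "output_2": question_branch,
}


def route(query: str) -> str:
    # Only the branch attached to the chosen edge is executed.
    _output, edge = NaiveQueryClassifier().run(query)
    return BRANCHES[edge](query)


first = route("arya stark father")
second = route("Who is the father of Arya Stark?")
print(first)
print(second)
```

Each call to `route` appends exactly one entry to `calls`, which mirrors the point made above: a decision node selects a single branch per request instead of running all of them.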

Evaluation Nodes

We have also designed a set of nodes that can be used to evaluate the performance of a system. Have a look at our tutorial to get hands on with the code and learn more about Evaluation Nodes!

Debugging Pipelines

You can print out debug information from nodes in your pipelines in a few different ways.

YAML Configs

A full Pipeline can be defined in a YAML file and simply loaded. Having your pipeline available in a YAML file is particularly useful when you move between experimentation and production environments. Just export the YAML from your notebook / IDE and import it into your production environment. It also helps with version control of pipelines, allows you to share your pipeline easily with colleagues, and simplifies the configuration of pipeline parameters in production.

The file consists of two main sections: you define all objects (e.g. a reader) in components and then connect them into a pipeline in pipelines. You can also use one component as multiple nodes of a pipeline, or as a node across multiple pipelines. It is loaded into memory just once and therefore does not consume more resources than necessary.
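The two-section layout and the load-once behaviour can be sketched in plain Python. This is not a Haystack config and not its loader: only the section names `components` and `pipelines` come from the text above, while the inner keys (`name`, `type`, `params`, `nodes`, `inputs`), the component names, and both helper functions are assumptions made for this example.

```python
from typing import Any, Dict, List, Tuple

CONFIG: Dict[str, Any] = {
    # Section 1: every object is defined once, under a custom name.
    "components": [
        {"name": "MyDocumentStore", "type": "ElasticsearchDocumentStore", "params": {}},
        {"name": "MyRetriever", "type": "BM25Retriever",
         "params": {"document_store": "MyDocumentStore"}},
        {"name": "MyReader", "type": "FARMReader", "params": {}},
    ],
    # Section 2: pipelines refer to components by name.
    "pipelines": [
        {"name": "qa", "nodes": [
            {"name": "MyRetriever", "inputs": ["Query"]},
            {"name": "MyReader", "inputs": ["MyRetriever"]},
        ]},
        {"name": "search", "nodes": [
            {"name": "MyRetriever", "inputs": ["Query"]},
        ]},
    ],
}


def build_components(config: Dict[str, Any]) -> Dict[str, object]:
    """Create one placeholder object per component definition."""
    return {c["name"]: object() for c in config["components"]}


def build_pipelines(
    config: Dict[str, Any], components: Dict[str, object]
) -> Dict[str, List[Tuple[str, object, List[str]]]]:
    """Resolve node names to the already-built component objects."""
    pipelines = {}
    for spec in config["pipelines"]:
        nodes = []
        for node in spec["nodes"]:
            if node["name"] not in components:
                raise KeyError(f"undefined component: {node['name']}")
            nodes.append((node["name"], components[node["name"]], node["inputs"]))
        pipelines[spec["name"]] = nodes
    return pipelines


components = build_components(CONFIG)
pipelines = build_pipelines(CONFIG, components)

# The retriever used by both pipelines is the very same object.
shared = pipelines["qa"][0][1] is pipelines["search"][0][1]
print(shared)
```

Because both pipelines look up `MyRetriever` in the same registry, the component is created once and shared, which is the resource-saving behaviour described above.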

The contents of a YAML file should look something like this:

To load, simply call:

Conclusion

The possibilities are endless with the Pipeline class, and we hope that this tutorial will inspire you to build custom pipelines that really work for your use case!

About us

This Haystack notebook was made with love by deepset in Berlin, Germany.

We bring NLP to the industry via open source!
Our focus: Industry specific language models & large scale QA systems.

Some of our other work:

  • German BERT
  • GermanQuAD and GermanDPR
  • FARM


By the way: we're hiring!


Source: https://haystack.deepset.ai/tutorials/pipelines
