Build your own Free "Chat with PDF" system
With the rise of large language models (LLMs), many people enjoy experimenting with customized chatbot systems, and many pay for the OpenAI API to get embeddings or natural language processing. In this article, I give step-by-step instructions for building your own chat-with-PDF system using open source models. That means it is free, and it does not need a high-end machine, because everything runs in Google Colab from your browser. So there is no need for a GPU either.
I have used LlamaIndex to build this. You can find more details on their official website https://www.llamaindex.ai/ or follow their LinkedIn page https://www.linkedin.com/company/llamaindex/
The interesting point about this system is that even if you have never coded before, you will be able to build it. You just need patience and attention to detail to follow the step-by-step instructions. Some basic familiarity with Google Colab is required, which you will pick up as you use it.
1) Sign up for Google Colab and go to the following link
Chat With PDF template Google Colab Link
2) Create your own notebook with a relevant title, and copy and paste all the code blocks from the notebook above into your notebook
3) Every code block has a play button. Run the code blocks in sequence by clicking each play button in turn.
4) I have written a few instructions in the template Google Colab notebook. Follow those instructions as you paste the code blocks, and have the required tokens ready.
5) Finally, when you reach the last line, where print(response) is written, that is where all the magic of chatting with your own PDF happens
Guidelines for file management and tokens:
You will need a few tokens and files to make sure that this system runs smoothly
1) Gradient AI access token and workspace ID
2) Astra DataStax vector DB JSON file and zip file
Let's understand how to get these one by one
Gradient AI:
Log in to https://auth.gradient.ai
Create a workspace and get the workspace ID
You will also see an Access token tab there, where you get the access token as a string. Paste it in the relevant place in the Google Colab file, as mentioned in the text blocks
os.environ['GRADIENT_ACCESS_TOKEN'] = 'Gradient access token here'
os.environ['GRADIENT_WORKSPACE_ID'] = 'Gradient workspace id here'
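If you want to double-check that both values are filled in before running the rest of the notebook, a small helper like the one below can do it. This is only a sketch and not part of the template notebook: the function name missing_gradient_settings is my own, and it assumes the placeholder texts are exactly the ones shown in the two lines above.

```python
import os

# Variable names taken from the two lines above.
REQUIRED_VARS = ("GRADIENT_ACCESS_TOKEN", "GRADIENT_WORKSPACE_ID")
# Placeholder texts from the two lines above; real values must replace them.
PLACEHOLDERS = {"Gradient access token here", "Gradient workspace id here"}


def missing_gradient_settings(env=None):
    """Return the required names that are unset, empty, or still placeholders."""
    env = os.environ if env is None else env
    missing = []
    for name in REQUIRED_VARS:
        value = env.get(name, "").strip()
        if not value or value in PLACEHOLDERS:
            missing.append(name)
    return missing


if __name__ == "__main__":
    problems = missing_gradient_settings()
    if problems:
        print("Please set these before continuing:", ", ".join(problems))
    else:
        print("Gradient settings look filled in.")
```

If you would rather not leave the token visible in the notebook, one option is to read it with Python's built-in getpass.getpass() and assign the result to os.environ instead of typing it into the cell.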
DataStax:
Sign up at astra.datastax.com
You need to create a serverless vector database and generate a token. You will find this option in the overview section of the database you just created
Also, notice the option to download the SCB, which is the secure connect bundle. So here you will get 2 files: one is a JSON file, which holds the token, and the other is a zip file, which is the secure connect bundle
You need to upload both of these files to the current working directory of your Google Colab, which is the content folder by default.
In the left-hand sidebar there are a few icons, and the last icon is a folder symbol. Click that
Then click the topmost folder, marked with ..
You will notice the content folder. Whatever files you need to upload have to be inside this directory. Hover your mouse over the content directory and you will notice three vertical dots
Click them to get the upload option. Upload your zip and JSON files here, or to whatever current working directory you have set.
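After uploading, you can confirm that both files landed in the right place with a quick check like the one below. This is a rough sketch that is not in the template notebook: the function name check_astra_uploads is my own, and it assumes the working directory is /content, which is the Colab default. It does not look inside the token file beyond checking that it is valid JSON.

```python
import json
import zipfile
from pathlib import Path


def check_astra_uploads(folder="/content"):
    """List the readable JSON files and valid zip files found in the folder."""
    folder = Path(folder)
    report = {"token_json": [], "bundle_zip": []}

    # The token file should parse as JSON.
    for path in sorted(folder.glob("*.json")):
        try:
            json.loads(path.read_text(encoding="utf-8"))
        except (ValueError, OSError):
            continue  # unreadable or not valid JSON, so skip it
        report["token_json"].append(path.name)

    # The secure connect bundle should be a real zip archive.
    for path in sorted(folder.glob("*.zip")):
        if zipfile.is_zipfile(path):
            report["bundle_zip"].append(path.name)

    return report


if __name__ == "__main__":
    print(check_astra_uploads())
```

If either list comes back empty, the upload probably went to a different folder, or the file was damaged during download.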
Knowledge Base PDF:
Whatever PDF you want to chat with should be uploaded to a separate folder of its own. As per the code, it should be in the content/Documents folder. If you prefer some other location, change the path in the code accordingly.
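To see which PDFs the notebook will actually pick up, you can list the folder first. The snippet below is a minimal sketch that is not in the template notebook: list_pdfs is a name of my own, and the default path /content/Documents assumes you kept the location described above.

```python
from pathlib import Path


def list_pdfs(folder="/content/Documents"):
    """Return the names of the PDF files in the folder, sorted alphabetically."""
    folder = Path(folder)
    if not folder.is_dir():
        return []  # the folder has not been created yet
    return sorted(
        p.name for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


if __name__ == "__main__":
    found = list_pdfs()
    print(found if found else "No PDFs found. Check the folder name and upload again.")
```

If you changed the folder in the code, pass the same path here, for example list_pdfs("/content/MyDocs").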
All the settings and requirements are now in place.
Once all the required files are uploaded to your Google Colab, you are ready to enjoy chatting with your PDF.
Just click the play button of every code block in sequence until you reach the last code block
This is the fun section of the whole system. Keep changing your questions here.
This is the first level of chat with PDF, where you type your question and get an answer from the PDF. It is not sequential: each question is answered on its own, without memory of the earlier ones. Sequential chatting needs more expertise. I wanted to build the system in such a way that even non-coders can follow along and make their own chat-with-PDF system with their own knowledge base PDF.
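To make the difference concrete, here is a small sketch of what sequential chatting would involve: the earlier questions and answers have to be sent along with each new question. This is not what the template notebook does, and build_prompt and its text layout are purely hypothetical names of my own. It only illustrates the idea and does not call any model.

```python
def build_prompt(question, history=()):
    """Prefix earlier question/answer pairs so that a follow-up has context."""
    lines = []
    for past_question, past_answer in history:
        lines.append(f"Previous question: {past_question}")
        lines.append(f"Previous answer: {past_answer}")
    lines.append(f"Question: {question}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Level 1 (this article): the question travels alone.
    print(build_prompt("What is the refund policy?"))

    # A sequential version would also carry what was said before.
    history = [("What is the refund policy?", "Refunds are possible within 30 days.")]
    print(build_prompt("Does that apply to sale items?", history))
```

In the second call, "that" only makes sense because the earlier exchange is included. This is the extra bookkeeping a sequential chat needs.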
Soon, I will try to bring level 2 of this with some additional features ;-)