Hi everyone! Continuing from my last blog post: I was selected for the C4GT Program and have successfully passed the Mid-Point Evaluation! Today I am sharing my journey through the first month.
PROJECT :- LLMs for Question Answering Part II by iGoT Karmayogi
My C4GT experience so far has been full of working with new technologies, designing an efficient pipeline, and brainstorming on code with my fellow contributor and mentor.
The C4GT Management Team has been super supportive throughout the month and helped us ease into our new roles. The team conducted a very helpful bootcamp where we learned Git & GitHub, time management techniques, and much more. You can find those sessions here.
WEEK 1 :- Charting the Course
We kicked off the first week with a detailed project discussion with our mentor to define the milestones to be achieved before the mid-point evaluation. Our mentor was very helpful and guided us in the right direction on the project requirements. We also had a brief discussion about the project's design documentation. I started deciding which dataset to use for evaluating LLMs and made a list of models that are supported by Hugging Face Transformers.
WEEK 2 :- Scripting Progress
I started working on the evaluation script for finding the best LLMs for the Question Answering task. My mentor guided me on which evaluation metrics to choose and helped define the structure of our code. Using the script, I generated an evaluation results table for more than 60 models, much like this.
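To give a feel for what such an evaluation script computes, here is a minimal sketch in plain Python. It assumes SQuAD-style exact match and token-level F1 as the metrics, which is a common choice for question answering but not necessarily the set we settled on. The function names (`normalize`, `exact_match`, `token_f1`, `evaluate`) are hypothetical and only for illustration.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase, drop punctuation and English articles, split into tokens."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized answers are identical, otherwise 0.0."""
    return float(normalize(prediction) == normalize(reference))


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall between two answers."""
    pred, ref = normalize(prediction), normalize(reference)
    if not pred or not ref:
        # If either side is empty, score 1.0 only when both are empty.
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def evaluate(predictions: list[str], references: list[str]) -> dict[str, float]:
    """Average both metrics over a list of (prediction, reference) pairs."""
    n = len(references)
    em = sum(exact_match(p, r) for p, r in zip(predictions, references)) / n
    f1 = sum(token_f1(p, r) for p, r in zip(predictions, references)) / n
    return {"exact_match": em, "f1": f1}
```

In a setup like this, each candidate model's answers would be passed to `evaluate` together with the dataset's reference answers, and the resulting scores would form one row of the results table.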
WEEK 3 :- Visualization and Demo
I started working on a visualization script in Python to plot the evaluation results for better analysis and documentation. As the mid-point evaluation was approaching, we also started building a demo in LangChain to showcase the first half of the project: the ingestion part and the retrieval of the top-n documents.
WEEK 4 :- Refining and Looking Ahead
In the final week of the month, we refined our demo, ensuring it worked seamlessly. The demo involved document ingestion, coreference resolution, vector database integration, and similarity search. The retrieved documents were used as context for the LLM, which then generated the answers. While the answers were insightful, there’s room for improvement through LLM fine-tuning. I concluded the week by creating my mid-point evaluation presentation, which you can view here. My mentor's feedback at the mid-point evaluation was to work on LLM fine-tuning and to run multiple testing cycles to configure the pipeline.
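The retrieval half of that pipeline can be sketched in a few lines of plain Python. This is not our LangChain demo: the toy `embed` function below uses bag-of-words counts as a stand-in for a real embedding model, an in-memory list stands in for the vector database, and the coreference step is left out. All names (`embed`, `cosine`, `top_n`, `build_prompt`) and the sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: bag-of-words term counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_n(query: str, documents: list[str], n: int = 2) -> list[tuple[float, str]]:
    """Return the n documents most similar to the query, best first."""
    q = embed(query)
    scored = [(cosine(q, embed(doc)), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:n]


def build_prompt(query: str, documents: list[str], n: int = 2) -> str:
    """Place the retrieved documents in front of the question as context."""
    context = "\n".join(doc for _, doc in top_n(query, documents, n))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


if __name__ == "__main__":
    docs = [
        "Leave policy allows twelve casual leaves per year.",
        "The training portal hosts online courses.",
        "Casual leave requests need manager approval.",
    ]
    print(build_prompt("How many casual leaves per year?", docs))
```

The string returned by `build_prompt` is what would be handed to the LLM, so the quality of the answer depends directly on which documents the similarity search ranks highest.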
This journey has introduced me to numerous Large Language Models, the Hugging Face library, TensorFlow, and many Python features. Overall, it has been a valuable experience, and I look forward to what the next month will bring.
Tips for future contributors :-
- Don’t be discouraged by the initial complexity of a project. Keep faith in your skills, and ask fellow contributors, mentors, and the C4GT Team for help if you get stuck anywhere.
- Ask as many questions as you can at the beginning of the project and establish clear objectives. It can be very difficult to correct mistakes later.
- Establish a clear communication channel with your mentor and fellow project contributors, and write meeting notes. Share those notes with them immediately after each meeting to avoid miscommunication, much like this.