Developed Pipelines
A Text Extraction and OCR Pipeline was implemented to handle a variety of academic and research document formats, including PDFs, Word files, and image-based documents. Optical Character Recognition (OCR) was integrated so that scanned documents and image-based academic papers could be converted into searchable text.
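The routing step of such a pipeline can be sketched as follows. This is a minimal illustration, not the project's actual implementation: the function name `choose_route`, the route labels, the extension sets, and the `has_text_layer` flag are all assumptions introduced here. The extraction and OCR back ends themselves are omitted.

```python
from pathlib import Path

# Hypothetical extension sets; the real pipeline may support other formats.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}
WORD_SUFFIXES = {".doc", ".docx"}


def choose_route(filename: str, has_text_layer: bool = True) -> str:
    """Pick an extraction route for a document based on its file type.

    `has_text_layer` is an assumed flag indicating whether a PDF already
    contains selectable text; scanned PDFs without one are sent to OCR.
    """
    suffix = Path(filename).suffix.lower()
    if suffix in IMAGE_SUFFIXES:
        return "ocr"
    if suffix == ".pdf":
        return "pdf_text" if has_text_layer else "ocr"
    if suffix in WORD_SUFFIXES:
        return "word"
    raise ValueError(f"Unsupported document format: {suffix or filename}")
```

For example, `choose_route("scan.pdf", has_text_layer=False)` returns `"ocr"`, while a PDF with a text layer is routed to direct text extraction.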
A Prompt Engineering Pipeline was developed to generate dynamic, context-aware prompts tailored to each research question. The prompts were designed to keep responses accurate, on topic, and consistent with academic citation standards.
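One way such a prompt could be assembled is shown below. This is a sketch under stated assumptions: the function `build_prompt`, the passage dictionary keys (`source`, `text`), and the instruction wording are hypothetical and not taken from the project.

```python
def build_prompt(question: str, passages: list[dict]) -> str:
    """Assemble a prompt that numbers each source so answers can cite them.

    Each passage is assumed to be a dict with "source" and "text" keys.
    """
    sources = "\n".join(
        f"[{i}] ({p['source']}) {p['text']}"
        for i, p in enumerate(passages, start=1)
    )
    return (
        "You are assisting with an academic research question.\n"
        "Answer using only the numbered sources below and cite them inline as [n].\n"
        "If the sources do not cover the question, say so.\n\n"
        f"Sources:\n{sources}\n\n"
        f"Question: {question}\n"
    )
```

Numbering the sources in the prompt gives the model a stable handle for inline citations, which makes it easier to check each claim against the passage it cites.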
Additionally, an On-the-Go RAG (Retrieval-Augmented Generation) Pipeline was built to deliver real-time, context-rich answers. This pipeline supported both indexed retrieval through a Weaviate Vector Database and direct, live parsing of documents without prior indexing. It merged internal document content with relevant web data to provide comprehensive and high-quality responses.
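The live-parsing path, in which an unindexed document is combined with web data, can be sketched roughly as below. This is an illustration only: a simple keyword-overlap score stands in for the vector similarity search that the indexed path would delegate to Weaviate, and every name here (`gather_context`, `chunk`, `overlap_score`, the `"document"` and `"web"` labels) is hypothetical.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def chunk(text: str, size: int = 40) -> list[str]:
    """Split text into consecutive chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def overlap_score(question: str, passage: str) -> float:
    """Fraction of question tokens that also appear in the passage.

    A stand-in for embedding similarity, used here to keep the sketch
    dependency-free.
    """
    q = tokenize(question)
    return len(q & tokenize(passage)) / len(q) if q else 0.0


def gather_context(question, live_document, web_snippets, top_k=3):
    """Rank chunks of an unindexed document together with web snippets.

    Returns up to `top_k` (origin, text) pairs, best match first.
    """
    candidates = [("document", c) for c in chunk(live_document)]
    candidates += [("web", s) for s in web_snippets]
    ranked = sorted(
        candidates,
        key=lambda item: overlap_score(question, item[1]),
        reverse=True,
    )
    return ranked[:top_k]
```

Tagging each passage with its origin keeps internal and web content distinguishable after merging, so the downstream prompt can attribute each citation to the right kind of source.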