Hey everyone!
I'm trying to build an app where I can upload a few PDF files and have the AI go through them and summarize them. The problem is that, after upload, the PDF files are just too big for the AI models.
So far I've tried working in the Workflow area to see if I could implement some JS code to cut the base64 string into chunks, but had no success.
I'm using the upload component and trying to get a response from the AI right after the upload.
Does anyone have an idea how to get around the AI's character limit?
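```javascript
import { PDFDocument } from 'pdf-lib';

// Split one PDF into several smaller, complete PDFs of at most `pagesPerFile` pages each.
// `pdfBytes` is the original file's contents (e.g. a Uint8Array or ArrayBuffer).
async function splitPdf(pdfBytes, pagesPerFile) {
  const pdfDoc = await PDFDocument.load(pdfBytes);
  const totalPages = pdfDoc.getPageCount();
  const numFiles = Math.ceil(totalPages / pagesPerFile);
  const result = [];

  for (let i = 0; i < numFiles; i++) {
    // A brand-new document per chunk, so each one is saved with its own header and trailer.
    const newPdf = await PDFDocument.create();
    const firstPage = i * pagesPerFile;
    const lastPage = Math.min(firstPage + pagesPerFile, totalPages); // exclusive
    for (let j = firstPage; j < lastPage; j++) {
      // copyPages copies page j (and the objects it references) from the original document.
      const [copiedPage] = await newPdf.copyPages(pdfDoc, [j]);
      newPdf.addPage(copiedPage);
    }
    // save() serializes the chunk to a Uint8Array of PDF bytes.
    result.push(await newPdf.save());
  }
  return result;
}
```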
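`splitPdf` returns each chunk as raw bytes (a `Uint8Array` from `save()`). If the AI step expects base64, as with the upload, each chunk has to be re-encoded before it is sent. A minimal sketch, assuming a browser or Node 16+ (where `btoa` is a global); `bytesToBase64` and `summarizeChunks` are hypothetical helpers, and `summarizeChunk` is a placeholder for whatever AI call your app actually makes:

```javascript
// Convert raw PDF bytes (e.g. one Uint8Array returned by splitPdf) to base64.
// btoa is a global in browsers and in Node 16+.
function bytesToBase64(bytes) {
  let binary = '';
  const step = 0x8000; // convert in slices so very large files don't overflow the call stack
  for (let i = 0; i < bytes.length; i += step) {
    binary += String.fromCharCode(...bytes.subarray(i, i + step));
  }
  return btoa(binary);
}

// Send the chunks to the AI one after another and collect the partial summaries.
// `summarizeChunk` is a hypothetical callback standing in for your app's AI call:
// it receives one base64-encoded PDF plus its index and resolves to a summary string.
async function summarizeChunks(pdfChunks, summarizeChunk) {
  const summaries = [];
  for (const [index, chunk] of pdfChunks.entries()) {
    summaries.push(await summarizeChunk(bytesToBase64(chunk), index));
  }
  return summaries;
}
```

For example, with `uploadedBytes` and `askAi` as placeholders for the uploaded file's bytes and your AI query, and 10 pages per file as an arbitrary choice: `const summaries = await summarizeChunks(await splitPdf(uploadedBytes, 10), askAi);`. The partial summaries can then be joined and sent to the AI once more to get one overall summary. pdf-lib also has `saveAsBase64()`, which could replace `save()` inside `splitPdf` if you'd rather skip `bytesToBase64`.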
You will need to use a library (pdf-lib in the code above), though, because just splitting the base64 string won't work: parts of the base64 string hold data that describes the PDF file as a whole, not just the chunk they end up in. You can visualize it like below:
```javascript
const my_base64_str = 'asdfghjklqwertyuiop1234567890kljhgfd';
// if the above is your base64 string, then:
// asdfghj   | klqwertyuiop1234567890k | ljhgfd
// file info | file data               | file closer
// if you were to split it:
// split0: asdfghj | klqwertyuiop
// split1: 1234567890k | ljhgfd
```

You could try to save each split as a .pdf, but a PDF reader expects every file to have both the 'file info' section and the 'file closer' section (in a real PDF, roughly the `%PDF-` header at the start and the trailer ending in `%%EOF`). split0 has no closer and split1 has no file info, so you end up with 2 corrupted files.
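A minimal sketch of the problem described above, runnable in a browser or Node 16+ (where `atob`/`btoa` are globals). `fakePdfText` is a made-up stand-in, not a valid PDF: it only has a `%PDF-` header and a `%%EOF` marker. `naiveSplitBase64` and `looksLikeCompletePdf` are hypothetical helpers, and the latter is a very rough check for just those two markers. The base64 is cut on 4-character boundaries so every piece still decodes, and yet neither piece is a complete file:

```javascript
// Made-up stand-in for a PDF (NOT a valid PDF): only the "%PDF-" header
// at the start and the "%%EOF" marker at the end matter for this demo.
const fakePdfText =
  '%PDF-1.7\n1 0 obj << /Type /Catalog >> endobj\ntrailer << /Root 1 0 R >>\n%%EOF';

// Naive chunking: cut the base64 string into roughly equal pieces.
// The piece length is rounded down to a multiple of 4 so each piece still decodes.
function naiveSplitBase64(b64, parts) {
  const size = Math.max(4, Math.floor(b64.length / parts / 4) * 4);
  const chunks = [];
  for (let i = 0; i < b64.length; i += size) {
    chunks.push(b64.slice(i, i + size));
  }
  return chunks;
}

// Hypothetical, very rough check: a complete PDF starts with "%PDF-" and ends with "%%EOF".
function looksLikeCompletePdf(text) {
  return text.startsWith('%PDF-') && text.trimEnd().endsWith('%%EOF');
}

const pieces = naiveSplitBase64(btoa(fakePdfText), 2);
const decoded = pieces.map((piece) => atob(piece));

console.log(looksLikeCompletePdf(fakePdfText)); // true
console.log(decoded.map(looksLikeCompletePdf).join(', ')); // false, false
console.log(decoded.join('') === fakePdfText); // true
```

The last line shows that joining the decoded pieces back together in order does give the original bytes again, which is why reassembling the pieces on a server would work.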
To make a raw split work, you would need to put the pieces back together on the server side: split the PDF on the client/browser side, send all the splits to the server/backend, and have the backend recombine them into a single file before sending it on to a vector service.
Since we can't manage the backend here, you instead need to create 2 new, complete PDF files from the 1 original and then send both files, all from the client/browser.
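To choose `pagesPerFile` for `splitPdf`, you can work backwards from the AI's character limit. Base64 turns every 3 bytes into 4 characters, so a file of n bytes becomes 4 * ceil(n / 3) characters. The sketch below is only a rough estimate: `base64Length` and `estimatePagesPerFile` (with its `safety` factor) are hypothetical helpers, and it assumes all pages are about the same size, which often isn't true (image-heavy pages are much bigger), so check the real chunk sizes after splitting.

```javascript
// Characters needed to base64-encode `byteCount` bytes: 4 characters per 3 bytes, padded.
function base64Length(byteCount) {
  return 4 * Math.ceil(byteCount / 3);
}

// Hypothetical helper: rough `pagesPerFile` for splitPdf so that each chunk's base64
// stays under `maxChars`. Assumes pages are about equal in size; `safety` leaves headroom
// for uneven pages and for shared resources (e.g. fonts) that can get copied into every chunk.
function estimatePagesPerFile(totalBytes, totalPages, maxChars, safety = 0.8) {
  const charsPerPage = base64Length(totalBytes) / totalPages;
  return Math.max(1, Math.floor((maxChars * safety) / charsPerPage));
}

console.log(base64Length(3_000_000)); // 4000000
console.log(estimatePagesPerFile(3_000_000, 100, 100_000)); // 2
```

So a 3,000,000-byte, 100-page PDF with a 100,000-character limit works out to about 2 pages per file under these assumptions. The function never returns less than 1, so a single page that is already over the limit will still produce an oversized chunk.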