beautifulsoup4
bs4
huggingface-hub
langchain
pdf2image
pdfminer.six==20221105
pytesseract
requests
sentence-transformers
sentencepiece
streamlit
torch
unstructured
faiss-cpu