What's Changed

Adding Renovate config and newer dependencies by @jamesbraza in #398
Fixing mutable BaseModel defaults and removing extra BaseModel by @jamesbraza in #400
Passing mypy paperqa by @jamesbraza in #405
Adding CONTRIBUTING.md by @jamesbraza in #404
Creating LitQAv2TaskDataset for agent training/evaluation by @jamesbraza in #401

Full Changelog: v5.0.2...v5.0.3

Contributors

jamesbraza

13 Sep 18:36

whitead

v5.0.2

30ec9b0

What's Changed

Changed email addresses to something less generic by @whitead in #376
Fix examples in README by @taabishm2 in #379
Validating LiteLLMModel.config structure by @jamesbraza in #383
No google auth, better CI names by @jamesbraza in #384
Fixing Pydantic validation in Python<3.12 by @jamesbraza in #385
LitQA2 downloading and question creation functionality by @jamesbraza in #386
Testing MemoryAgent and timeouts on ldp agents by @jamesbraza in #375
Removed monkeypatch fixture since it's not a dependency by @jamesbraza in #395
Fixing crash in chunk_text for empty file by @jamesbraza in #389

New Contributors

@taabishm2 made their first contribution in #379

Full Changelog: v5.0.1...v5.0.2

Contributors

whitead, jamesbraza, and taabishm2

11 Sep 22:51

whitead

v5.0.1

5ed7974

What's Changed

Removed StrPath in favor of direct type hints by @jamesbraza in #369
Added tool description update to test by @mskarlin in #368
Added explanation of different with paper by @whitead in #371
Updates to retraction status checker by @geemi725 in #370
Retrying if ToolSelector fails to select a tool by @jamesbraza in #373
Reset default settings to use high_quality and remove truncation by @mskarlin in #374

Full Changelog: v5.0.0...v5.0.1

Contributors

whitead, jamesbraza, and 2 other contributors

11 Sep 16:25

whitead

v5.0.0

45b206d

New Features

Automatic population of metadata: PDF metadata is automatically retrieved from a variety of providers, including BibTeX entries, citation counts, journal quality assessments, and retraction notices.
Full-text search: A major difference between our published work and this repo is the ability to search over all of scientific literature. We've brought the OSS version closer by adding full-text keyword search via tantivy. Now you can index and search many papers before embedding, making it feasible to ingest many papers.
Unified settings management: You can now save and load settings, which makes it easier for us to distribute settings reflecting various tasks with PaperQA2. Examples are writing Wikipedia articles, identifying contradictions, and obtaining structured data.
CLI: We've made a CLI that uses persistent parsings/indexes and makes it much easier to just ask questions of a folder of PDFs.
LiteLLM: We've adopted litellm as the LLM wrapper of choice. This means we now support many LLM APIs directly with only the model string changing. It also means we have "routers" now that can do fallbacks, API rate limiting, and retries.
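
To illustrate what "fallbacks" and "retries" mean for a router, here is a minimal, library-free sketch. It is not litellm's API and not PaperQA's implementation: `complete_with_fallbacks`, `fake_call`, and the model names are all hypothetical, and a real router would also handle rate limiting and catch narrower exception types.

```python
from collections.abc import Callable, Sequence


class AllModelsFailedError(RuntimeError):
    """Raised when every model in the fallback chain has failed."""


def complete_with_fallbacks(
    prompt: str,
    models: Sequence[str],
    call_model: Callable[[str, str], str],
    num_retries: int = 2,
) -> tuple[str, str]:
    """Try each model in order, retrying each one before falling back.

    Returns (model_name, completion). `call_model` is a hypothetical
    stand-in for a real LLM client call.
    """
    errors: list[str] = []
    for model in models:
        # One initial attempt plus `num_retries` retries per model.
        for attempt in range(1 + num_retries):
            try:
                return model, call_model(model, prompt)
            except Exception as exc:  # assumption: any error triggers a retry
                errors.append(f"{model} attempt {attempt + 1}: {exc}")
    raise AllModelsFailedError("; ".join(errors))


def fake_call(model: str, prompt: str) -> str:
    """Hypothetical client in which the primary model always times out."""
    if model == "primary-model":
        raise TimeoutError("simulated timeout")
    return f"[{model}] answer to: {prompt}"


used_model, answer = complete_with_fallbacks(
    "What is PaperQA2?", ["primary-model", "backup-model"], fake_call
)
```

In this sketch only the model string differs between the primary and backup calls, which mirrors the idea that switching providers should not require changing the calling code.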

Improvements

More modern agent frameworks
Reduction in dependencies
Removed code duplicated by litellm
Many improvements on code style and best practices

Regressions/Deprecation

We've removed the following features to keep our library focused:

doc_match - We do not have enough data to show that this method actually helps for very large corpora.
LangchainVectorStore - We no longer support more complex vector stores via Langchain like FAISS. Instead, we only support NumPy vector stores. We never found the paradigm of very large vector stores to be better than keyword search -> vector search -> LLM reranking, and thus removed the code.
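
The keyword search -> vector search -> LLM reranking flow mentioned above can be sketched roughly as follows. This is a toy, standard-library-only illustration, not PaperQA's code: the bag-of-words `embed` stands in for a real embedding model, `term_coverage` stands in for an LLM relevance score, and every function name here is hypothetical.

```python
import math
import re
from collections import Counter
from collections.abc import Callable


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text: str) -> Counter:
    """Toy embedding: bag-of-words counts (assumption, not a real model)."""
    return Counter(tokenize(text))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def keyword_search(query: str, docs: list[str]) -> list[str]:
    """Stage 1: keep only documents sharing at least one term with the query."""
    terms = set(tokenize(query))
    return [d for d in docs if terms & set(tokenize(d))]


def vector_search(query: str, docs: list[str], k: int) -> list[str]:
    """Stage 2: keep the k documents most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


def rerank(
    query: str, docs: list[str], score: Callable[[str, str], float]
) -> list[str]:
    """Stage 3: reorder by a relevance score (an LLM call in a real system)."""
    return sorted(docs, key=lambda d: score(query, d), reverse=True)


def term_coverage(query: str, doc: str) -> float:
    """Hypothetical stand-in for an LLM score: fraction of query terms in doc."""
    terms = set(tokenize(query))
    return len(terms & set(tokenize(doc))) / len(terms) if terms else 0.0


docs = [
    "Tantivy provides full-text keyword search",
    "Keyword search narrows the corpus before embedding",
    "Bananas are rich in potassium",
]
query = "keyword search before embedding"
candidates = vector_search(query, keyword_search(query, docs), k=2)
results = rerank(query, candidates, term_coverage)
```

The point of the ordering is that the cheap keyword stage shrinks the corpus first, so only a small candidate set needs to be embedded and scored.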

Detailed Changes:

typo by @oganm in #303
Updated readme and models by @mskarlin in #305
Add Client (external API) Module For Enhanced Metadata by @mskarlin in #306
Agentic workflows, locally indexed search, and CLI by @mskarlin in #309
Add new unpaywall provider by @mskarlin in #310
Rollback search fields to list and dynamically compute md5 hash in tests by @mskarlin in #311
Refactor to breakout config from rest of code by @whitead in #289
Changed to rely on litellm for computing cost by @whitead in #321
Fixing LLMModel.axyz_iter type hints by @jamesbraza in #324
CLI Fixes by @whitead in #322
blackened code to prevent IDE scrolling by @jamesbraza in #330
Optimized import paths by @jamesbraza in #331
Removed pytest-mock plugin by @jamesbraza in #328
Adding pytest-xdist plugin by @jamesbraza in #329
Passing mypy by @jamesbraza in #332
Removing make_chain in favor of run_prompt by @jamesbraza in #325
Readme updates by @mskarlin in #323
Adding refurb tool, and lint CI by @jamesbraza in #333
Fixing arg ordering after #325 by @jamesbraza in #334
Fixing parse_text after #332 by @jamesbraza in #335
Fixing union attr error by @jamesbraza in #338
Check if a journal name starts with the by @geemi725 in #320
Fixing two more tests by @jamesbraza in #340
All Ruff ANN autofixes by @jamesbraza in #341
Adding in .mailmap by @jamesbraza in #342
Remove cassettes which aren't needed by @mskarlin in #339
Add configs for contracrow + wikicrow by @mskarlin in #336
Removed LangchainVectorStore, llms extra, and fixing up README by @jamesbraza in #343
Dropping requests dependency by @jamesbraza in #346
Removed html2text requirement by @jamesbraza in #347
Requiring Python 3.11+ by @jamesbraza in #348
Did one revision at README by @whitead in #344
Renaming fitz to pymupdf by @mskarlin in #350
Better control flow in litellm_get_search_query by @jamesbraza in #351
Recurse into directories; catch empty documents by @sidnarayanan in #352
Move configure_cli_logging such that it's not called twice by @mskarlin in #353
Cleaning up dependencies by @jamesbraza in #354
Fixed code in README by @whitead in #355
Added citation and paper URL by @whitead in #357
aviary and ldp for agents over langchain by @jamesbraza in #358
Adds retraction status by @geemi725 in #314
Adding pylint by @jamesbraza in #349
Added account for cost info by @whitead in #360