Apply for community grant: Academic project (gpu)
I'm Dominik Stammbach, a postdoctoral researcher at Princeton University working on access to justice. Together with Peter Henderson, I've been building a tool that allows semantic search over most caselaw. Users enter a query and get back the most relevant paragraphs from across a broad set of legal texts. We hope this empowers, for example, pro-bono lawyers, public defenders, and legal professionals in public agencies. Moreover, logging user queries and providing a feedback option will help us develop a high-quality, real-world dataset, which will serve as a critical resource for legal information retrieval and can inform future directions. We plan to host this service free of charge as a public good and will advertise it to these groups.
We have finalized a first version of this Space and are currently running final tests. We purchased the enterprise version with access to zero-GPU Spaces, and this works to a degree. However, to host all of our embeddings we have to truncate them to ca. 8% with PCA, and we found that this significantly reduces retrieval performance. We would therefore like to request a Space with considerably more RAM: our embeddings alone are ca. 300 GB, so approx. 500 GB would be ideal. We also need access to a single GPU to embed user queries.
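To make the trade-off above concrete, here is a minimal sketch of PCA truncation on synthetic data. It is not our production pipeline: the array sizes, the helper names `pca_truncate` and `top_k`, and the reading of "ca. 8%" as keeping 8% of the embedding dimensions are all assumptions made for illustration. It compares the top-10 neighbors of one query before and after truncation and works out the memory footprint implied by the 300 GB figure.

```python
import numpy as np


def pca_truncate(embeddings, keep_fraction):
    """Project embeddings onto their top principal components (hypothetical helper)."""
    dim = embeddings.shape[1]
    k = max(1, int(round(dim * keep_fraction)))
    mean = embeddings.mean(axis=0)
    centered = embeddings - mean
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:k]
    return centered @ components.T, mean, components


def top_k(query, matrix, k):
    """Indices of the k rows with the highest cosine similarity to the query."""
    q = query / np.linalg.norm(query)
    m = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    return np.argsort(-(m @ q))[:k]


# Synthetic stand-in for the real embeddings (sizes are assumptions).
rng = np.random.default_rng(0)
full = rng.normal(size=(2000, 100)).astype(np.float32)

# Assumption: "ca. 8%" means keeping 8% of the dimensions.
reduced, mean, components = pca_truncate(full, 0.08)

# A query must go through the same centering and projection as the corpus.
query = rng.normal(size=100).astype(np.float32)
query_reduced = (query - mean) @ components.T

full_hits = set(top_k(query, full, 10).tolist())
reduced_hits = set(top_k(query_reduced, reduced, 10).tolist())
overlap = len(full_hits & reduced_hits)

# Memory arithmetic from the figures in this post.
full_gb = 300
reduced_gb = full_gb * 0.08  # roughly 24 GB after truncation

print(f"top-10 overlap after truncation: {overlap}/10")
print(f"approx. footprint: {full_gb} GB full vs {reduced_gb:.0f} GB truncated")
```

Random Gaussian vectors have no low-dimensional structure, so the overlap printed here says nothing about real legal-text embeddings; the sketch only shows how the comparison can be set up.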
We believe this Space has great potential for public good and for enabling access to justice in the US. Additional compute resources would let us host the service in the best possible way and maximize its impact from the beginning. We are happy to share a draft of a paper describing the system, its intended use cases, and the broader goals of the project, and happy to chat to provide more information.