Kensho

Research

Our Approach

At Kensho, we work on cutting-edge research and develop leading ML and NLP capabilities for real-world impact.

We hire talented people and give them the freedom and resources needed to accomplish our shared goals. Our team conducts pure research with the goal of creating novel contributions and publishing our work in top-tier venues.


Our Projects

Learn more about what we’re researching.

We focus our efforts on problems in the finance and business worlds, choosing projects that push the envelope and drive impact.

Tokenization

What is the role of tokenization in training LLMs, particularly for financial and quantitative use cases? In our research, we dissect and improve the tokenization process, quantifying impact on pre-training and downstream performance.

LLMs

LLMs show strong performance on many challenging tasks, but they still struggle to solve many real-world problems in business and finance. We’re developing better models for the business world while focusing on the entire pipeline: tokenization, high-quality dataset construction, alignment, and evaluation.

Numeric Understanding

LLMs often struggle with numeric understanding. While we can alleviate some of this difficulty with code generation, we still want our models to be able to understand and process numerical data. Our team is investigating how well language models use numbers and identifying the mechanisms they learn for doing so.

Benchmarks

LLMs require rigorous evaluation benchmarks, and targeting the reasoning skills needed in business and finance presents unique challenges. We are developing S&P AI Benchmarks, which evaluate a model’s ability to reason about realistic financial problems.

Multi Document QA

Current state-of-the-art foundational models do not always correctly answer complex questions that require grounding knowledge from multiple sources. We are developing intelligent reading comprehension agents that can process and reason over a range of document collections.

Factuality

As GenAI applications become more prevalent in our daily lives, it’s increasingly important that their outputs are factually accurate. We develop methods to monitor factuality, with the ultimate goal of providing a tool to manage the outputs of GenAI products.

Meet the R&D Team

Chris
Head of R&D

Chris created and leads Kensho’s R&D lab, and he holds a joint faculty appointment at MIT, where he teaches ML and NLP. Previously, he taught and advised graduate students at Harvard. Since 2004, he has conducted ML research within industry, government, and academia, including at MIT Lincoln Laboratory, Spotify, Google, and IBM Watson. He received his PhD from Brown University.

Michael
Research Scientist

Michael leads language model efforts at Kensho, which include training, alignment, and evaluation. Previously, Michael was part of a federally funded R&D center, leading research teams focusing on AI-enhanced decision making and ML security. He holds an M.S. in Computational Science and Engineering from Harvard.

Varshini
Research Engineer

Varshini’s research focuses on developing and evaluating language models for financial applications, with a specific focus on both tokenization and retrieval-based approaches (RAG). Prior to joining Kensho’s R&D team as a Research Engineer, she obtained her Master’s in Data Science from Harvard.

Charlie
Research Scientist

Charlie is interested both in language model alignment for conversational and code-based use cases and in understanding what and how language models learn. Before joining Kensho, he completed his PhD at Brown University as part of the Language Understanding and Representation Lab.

Work with us!

We are looking for world-class researchers to join our growing team.

See available positions

Publications

Learn more about our research through our publications.

Tokenization Is More Than Compression

Craig W. Schmidt, Varshini Reddy, Haoran Zhang, Alec Alameddine, Omri Uzan, Yuval Pinter, Chris Tanner

EMNLP - 2024

An Analysis of Multilingual FActScore

Vu Trong Kim, Michael Krumdick, Varshini Reddy, Franck Dernoncourt, Viet Dac Lai

EMNLP - 2024

BizBench: A Quantitative Reasoning Benchmark for Business and Finance

Michael Krumdick, Rik Koncel-Kedziorski, Viet Dac Lai, Varshini Reddy, Charles Lovering, Chris Tanner

ACL - 2024

DocFinQA: A Long-Context Financial Reasoning Dataset

Varshini Reddy, Rik Koncel-Kedziorski, Viet Dac Lai, Michael Krumdick, Charles Lovering, Chris Tanner

ACL - 2024

Greed is All You Need: An Evaluation of Tokenizer Inference Methods

Omri Uzan, Craig W. Schmidt, Chris Tanner, Yuval Pinter

ACL - 2024

NAACL (Findings) - 2024

LREC-COLING - 2024