NotebookLM is Really Powerful and a Tiny Bit Creepy

I’ve spent a few weeks experimenting with the Node.js flavour of Firebase Genkit* – Google’s framework for AI-powered features and apps. Retrieval Augmented Generation (RAG) was my focus:

“RAG is an AI framework that combines the strengths of traditional information retrieval systems (such as databases) with the capabilities of generative large language models (LLMs).”

RAG gives models the capability to ‘understand’ (if you’ll excuse the anthropomorphism) additional context when responding to a prompt. For example, a PDF of a family tree might be useful if you’re asking a model “what is the relationship between John Smith and Emma Smith?”. A base model could struggle to answer that question for a few reasons:

  • Multiple John Smiths and Emma Smiths in the training data
  • Either of them could be missing from the training data
  • The training data might not include the specific biographical information needed to draw a conclusion about their relationship

With a PDF of a specific family tree in context, you have a better chance of getting a grounded answer.
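To make that concrete, here's a minimal sketch of the retrieve-then-ground step at the heart of RAG. The family-tree "sources", the bag-of-words stand-in for embeddings and all the helper names are my own toy inventions – a real system (Genkit included) would use a proper embedding model and vector store:

```javascript
// Toy RAG sketch: the retrieval is real, the generation step is left out.
// These strings stand in for chunks of an uploaded family-tree PDF.
const sources = [
  "John Smith was born in 1950 and married Mary Jones in 1975.",
  "Emma Smith, daughter of John Smith and Mary Jones, was born in 1980.",
  "The Smith family moved to Bristol in 1985.",
];

// Crude bag-of-words "embedding" – purely for illustration.
function vectorise(text) {
  const counts = {};
  for (const word of text.toLowerCase().match(/[a-z]+/g) ?? []) {
    counts[word] = (counts[word] ?? 0) + 1;
  }
  return counts;
}

function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (const k in a) { normA += a[k] ** 2; if (k in b) dot += a[k] * b[k]; }
  for (const k in b) normB += b[k] ** 2;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB) || 1);
}

// Pick the source most similar to the question...
function retrieve(question, docs) {
  const q = vectorise(question);
  return docs
    .map((doc) => ({ doc, score: cosineSimilarity(q, vectorise(doc)) }))
    .sort((x, y) => y.score - x.score)[0].doc;
}

// ...and ground the prompt in it before it ever reaches the model.
function buildGroundedPrompt(question, docs) {
  return `Context: ${retrieve(question, docs)}\n\nQuestion: ${question}`;
}

const prompt = buildGroundedPrompt(
  "What is the relationship between John Smith and Emma Smith?",
  sources
);
console.log(prompt);
```

The point is the shape of the pipeline, not the scoring: because the "daughter of" sentence is retrieved and prepended, the model answers from your document rather than from whichever John Smith it happened to meet in training.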

RAG, But Absolutely Huge

With RAG on the brain, I was really excited to see the release of Google’s new AI-powered research assistant – NotebookLM. It uses a RAG-based approach where you interact with an LLM (Gemini 1.5 at the time of writing) via a web console and you provide the data sources you’re interested in. This additional context is the grounding for the research you do in NotebookLM. Sources can take the form of:

  • Google Docs
  • Google Slides
  • PDF, Text and Markdown files
  • Web URLs
  • Copy-pasted text
  • YouTube URLs of public videos
  • Audio files

You can upload up to 50 sources, each with a limit of 500,000 words or 200MB for uploaded files. That’s up to 25,000,000 words per notebook – the equivalent of 52 copies of The Lord of The Rings trilogy – of whatever proprietary data you’re researching, or some other multimodal combination (e.g. if you wanted a model to understand a mixture of audio, text and video content).
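A quick back-of-the-envelope check on those numbers (the ~480,000-word figure for the trilogy is a rough estimate of my own, not from NotebookLM’s documentation):

```javascript
// Back-of-the-envelope check on NotebookLM's capacity claims.
const maxSources = 50;
const wordsPerSource = 500_000;                  // per-source word limit
const totalWords = maxSources * wordsPerSource;  // whole-notebook capacity

// The Lord of the Rings trilogy is roughly 480,000 words (estimates vary).
const lotrWords = 480_000;

console.log(totalWords);                          // 25000000
console.log(Math.floor(totalWords / lotrWords));  // 52
```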

It works really well and does everything you’d expect a research assistant to do (like providing accurate references/citations to source material). There are some slightly wacky features, like the audio overviews: a podcast-style summary of whatever sources you’ve uploaded. I find them quite creepy and firmly in the uncanny valley, but, yeah, fine, there’s probably a use case.

I had a really interesting test of NotebookLM where I used it to research the relative safety of nuclear energy production by providing a handful of PDF studies, podcast episodes and public websites. Try as I might, I couldn’t get it to hallucinate. It consistently provided accurate references when answering and was forthcoming about gaps in its knowledge. So far so good. Unfortunately, it failed fairly spectacularly on some crucial parts of the audio summary. The AI hosts maintained a jokey, lighthearted tone whilst discussing the effects of radiation on children – at one point there’s an audible AI chuckle from the female ‘host’. It’s honestly quite disturbing.

On a less sensitive topic, however, the output was decent. When I fed the Counter website into NotebookLM, it generated audio that was both pretty accurate and insightful, with a natural, conversational tone and a cadence weirdly similar to that of a podcaster.

How I’ll be Trying to Use it

Continuing with the positives: crucially, a notebook is shareable. The summaries and analyses you generate can form the knowledge base for a group of people. Team member A can get a useful answer about a family tree, then share the findings in the notebook with team member B, who continues to interrogate and work with the research assistant. This really interests me from an engineering and product delivery point of view: it means that a team with varying levels of technical expertise could communicate in natural language with a multimodal knowledge base. At least that’s my hope.

For technical projects with a good amount of high-quality documentation, this could be useful simply because the semantic search performs so well. For messier projects, NotebookLM could still help by highlighting inconsistencies or gaps in the knowledge base. In either case, being able to use so many different kinds of source data in such vast quantities could streamline some of the more thankless engineering activities.

NotebookLM would probably need API support and the ability to ingest directories (e.g. a code repository) to deliver true software engineering RAG nirvana. I’d love a queryable multimodal knowledge base that includes a project’s source code and always reflects its latest stable version.

I’ll continue playing with NotebookLM over the next couple of weeks and share my findings. If my next post is the announcement of a new podcast series then the experiment has probably gone a little too well.

* Genkit is currently in Beta and there’s a separate write-up coming soon!
