Featured Case Study · Generative AI / Content Engineering · 2025-02-01
Built internally for Edza.ai

AI Notes Generator

Deterministic Content Orchestration

Static PDF Generation for Educational Content

A production-grade pipeline that converts raw curriculum data into structured, visually rich PDF textbooks using multi-layer caching and layout-aware rendering. This system demonstrates how to apply Static Site Generator (SSG) principles to AI-powered content generation.

The Problem: Generating educational content is computationally expensive. LLM token costs multiply when generating the same chapter for thousands of students. Additionally, raw LLM output lacks structure—no proper formatting, page breaks, or academic standards compliance.

The Solution: Cache-First Architecture treating notes as immutable artifacts. Once generated, PDFs are cached and served globally. The first user pays the generation cost; subsequent users get instant delivery from cloud storage at sub-second latency.

Key Innovation - Artifact Caching: Three-stage pipeline with content-addressable hashing. Stage 1 generates curriculum structure (JSON). Stage 2 enriches content with media and formatting. Stage 3 renders to layout-aware HTML and compiles to PDF. Each stage's output is hashed and cached, achieving 85% cache hit rates.
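The content-addressable hashing described above can be sketched as a stable hash over the normalized request parameters. This is a minimal illustration; the function name `compute_artifact_key` and the exact canonicalization rules are assumptions, not the production code.

```python
import hashlib

def compute_artifact_key(subject: str, grade: str, chapter: str, language: str) -> str:
    """Derive a deterministic, content-addressable cache key.

    Identical inputs always map to the same key, so every pipeline
    stage can check the cache before doing any expensive work.
    """
    # Normalize inputs so "Physics" and " physics " hit the same artifact
    canonical = "|".join(
        part.strip().lower() for part in (subject, grade, chapter, language)
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```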

Production Results: 80% cost reduction per PDF, print-quality output compliant with CBSE standards, sub-second delivery for cached content, support for KaTeX/LaTeX mathematical equations.

Core Technologies

  • Python 3.10
  • WeasyPrint (PDF Engine)
  • Jinja2 (Templating)
  • Google Cloud Storage
  • AsyncIO
  • BeautifulSoup4

The Engineering Challenge

LLMs output text. Students need Textbooks.

There is a massive gap between a ChatGPT response and a usable study document.

  1. Structure: LLMs often forget to nest sections correctly.

  2. Formatting: Raw Markdown doesn't handle page breaks, headers, or image alignment suitable for printing.

  3. Cost: Regenerating the same chapter for 1,000 students is a waste of compute resources.

The Solution: A Static Generation Pipeline. We built a system inspired by Static Site Generators (SSG). Instead of generating notes on every request, we treat the notes as artifacts. Once generated, they are immutable, cached, and served globally via CDN logic.

```python
class AINotesGenerator:
    """
    Production-grade static generation pipeline that transforms
    curriculum data into structured, layout-aware PDF textbooks.

    Deterministic. Cached. Immutable.
    """

    async def generate(self, subject: str, grade: str, chapter: str, language: str):
        # Resolve Artifact Identity (content-addressable)
        artifact_key = self._compute_artifact_key(
            subject, grade, chapter, language
        )

        # Multi-Layer Cache Check (Local → Cloud → Generate)
        if await self._artifact_exists_in_cache(artifact_key):
            return await self._serve_cached_artifact(artifact_key)

        # Curriculum Resolution Layer
        curriculum = await self._resolve_or_generate_curriculum(
            subject, grade, chapter
        )

        # Structured Content Synthesis (LLM Orchestration)
        structured_markdown = await self._synthesize_notes(curriculum, language)

        # Layout-Aware Rendering Pipeline
        html = self._compile_to_layout(structured_markdown)
        html = self._inject_visual_assets(html)
        html = self._render_math(html)

        # Deterministic PDF Compilation
        pdf_path = await self._render_pdf_artifact(html, artifact_key)

        # Artifact Persistence (Immutable + CDN-ready)
        await self._persist_artifact(pdf_path, artifact_key)

        return pdf_path
```
  • Cache Hit Rate: 85%
  • PDF Quality: Print-Ready
  • Math Support: KaTeX/LaTeX
  • Cost Saving: ~80%

The 'Idempotent' Caching Strategy

The most critical engineering decision was the Cache-First Architecture. LLM tokens are expensive; storage is cheap.

When a request comes in for *"Physics, Class 10, Electricity"*, the system does not call the AI immediately.
  1. GCS Lookup: It constructs a deterministic path (e.g., notes/physics/10/electricity.pdf) and checks Google Cloud Storage.

  2. Instant Delivery: If the file exists, it returns the signed URL instantly. Zero AI cost.

  3. Generation (Cache Miss): Only if the file is missing does it trigger the expensive generation pipeline.

This turns an O(N) cost model (cost scales with users) into an O(1) cost model (cost scales with subjects).
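The deterministic path construction behind step 1 can be sketched like this; the slug rules are assumptions for illustration, and the production `_build_artifact_path` may normalize differently.

```python
def build_artifact_path(subject: str, grade: str, chapter: str) -> str:
    """Map a (subject, grade, chapter) request to one canonical GCS path.

    Because the mapping is deterministic, 1,000 students asking for the
    same chapter all resolve to the same object: the O(1) cost model.
    """
    def slug(value: str) -> str:
        # Lowercase, trimmed, hyphen-separated path segments
        return value.strip().lower().replace(" ", "-")

    return f"notes/{slug(subject)}/{slug(grade)}/{slug(chapter)}.pdf"
```

For example, `build_artifact_path("Physics", "10", "Electricity")` yields `notes/physics/10/electricity.pdf`, matching the lookup path above.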

services/notes_services.py
```python
class ArtifactCacheResolver:
    """
    Idempotent, cache-first architecture.
    LLM generation is triggered ONLY on cache miss.
    """

    async def resolve_notes(self, subject: str, grade: str, chapter: str) -> str:
        # Deterministic Artifact Path
        artifact_path = self._build_artifact_path(
            subject=subject,
            grade=grade,
            chapter=chapter
        )

        # Cloud Storage Lookup (Cheap Operation)
        if await self._exists_in_gcs(artifact_path):
            return await self._get_signed_url(artifact_path)

        # Cache Miss → Trigger Expensive Pipeline
        pdf_path = await self._generate_notes_artifact(
            subject, grade, chapter
        )

        # Persist Immutable Artifact
        await self._upload_to_gcs(pdf_path, artifact_path)

        return await self._get_signed_url(artifact_path)
```

Structured Intelligence: JSON before Text

To ensure the notes adhere to the CBSE curriculum, we don't ask the AI to "write notes" immediately. We use a Two-Pass Generation Strategy.

Pass 1: The Skeleton (JSON). We force the AI to generate a JSON object representing the curriculum tree (sections, subsections, activity headers). This guarantees the structure is correct before we write a single word of content. It also lets us cache the curriculum structure locally as curriculum.json to speed up future regenerations.

services/notes_services.py
```python
prompt = """
You are a curriculum designer. Output JSON ONLY.
Schema:
{
  "chapter_title": "...",
  "sections": [
    { "title": "...", "difficulty": "Medium", "subsections": [...] }
  ]
}
"""
# We parse this JSON to guide the actual content generation later
```
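Once the model responds, the skeleton can be validated before any prose is generated. A minimal sketch, assuming the schema from the prompt above; the helper name `parse_curriculum` is illustrative.

```python
import json

REQUIRED_SECTION_KEYS = {"title", "difficulty", "subsections"}

def parse_curriculum(raw: str) -> dict:
    """Parse and sanity-check the model's JSON skeleton.

    Failing fast here guarantees the document structure is correct
    before a single word of content is written.
    """
    curriculum = json.loads(raw)
    if "chapter_title" not in curriculum or "sections" not in curriculum:
        raise ValueError("Curriculum skeleton is missing required top-level keys")
    for section in curriculum["sections"]:
        missing = REQUIRED_SECTION_KEYS - section.keys()
        if missing:
            raise ValueError(f"Section missing keys: {missing}")
    return curriculum
```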

Content Expansion & Enrichment

Once we have the curriculum skeleton, we pass it to the Content Engine.

  • Markdown Generation: The AI fleshes out the document in Markdown, strictly enforcing LaTeX formatting for mathematical equations (e.g., $E=mc^2$).
  • Media Injection: The system parses the generated content. If it sees a header like "Electromagnetic Induction," it asynchronously queries Wikipedia's Media API, finds a relevant diagram, and injects the <img> tag into the content stream. This happens automatically without human intervention.
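The media-injection step might look like the sketch below: a lookup against Wikipedia's public REST page-summary endpoint, plus a pure HTML helper that splices the image in. The helper names and the `<h2>` convention are assumptions for illustration, not the production code.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

def fetch_wikipedia_thumbnail(topic: str) -> Optional[str]:
    """Query Wikipedia's REST summary endpoint for a thumbnail URL."""
    url = (
        "https://en.wikipedia.org/api/rest_v1/page/summary/"
        + urllib.parse.quote(topic.replace(" ", "_"))
    )
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            data = json.load(resp)
        return data.get("thumbnail", {}).get("source")
    except OSError:
        return None  # A missing image is better than a broken pipeline

def inject_image_after_header(html: str, header_text: str, img_url: str) -> str:
    """Insert an <img> tag immediately after the matching <h2> header."""
    marker = f"<h2>{header_text}</h2>"
    img_tag = f'<img src="{img_url}" alt="{header_text}" class="auto-diagram">'
    return html.replace(marker, marker + img_tag, 1)
```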

The Rendering Engine (HTML to PDF)

The final step is converting raw text into a beautiful document. We use WeasyPrint, a browser-grade rendering engine.

We treat the notes like a web page.

  1. Jinja2 Templating: We inject the content into an HTML template that defines fonts, margins, and branding.

  2. Math Rendering: We run a pre-processing pass to convert LaTeX equations into SVG/HTML using KaTeX, ensuring math symbols look crisp in print.

  3. PDF Conversion: The HTML is compiled into a binary PDF file. This allows us to control page breaks so headers are not stranded at the bottom of a page.
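Step 2's pre-processing pass can be sketched as a substitution over inline $...$ spans, with the actual KaTeX rendering delegated to a pluggable callable. The callable is an assumption (in practice it could be a Node subprocess or a server-side KaTeX binding); the production renderer may differ.

```python
import re
from typing import Callable

# Inline $...$ math spans (simplified; a real pre-pass also handles
# display math and escaped dollar signs).
INLINE_MATH = re.compile(r"\$(.+?)\$")

def preprocess_math(html: str, render: Callable[[str], str]) -> str:
    """Replace each inline LaTeX span with the renderer's HTML/SVG output."""
    return INLINE_MATH.sub(lambda match: render(match.group(1)), html)
```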

services/notes_services.py
```python
from weasyprint import HTML
from weasyprint.text.fonts import FontConfiguration

def _render_pdf(self, html_content, output_path, base_url):
    # WeasyPrint renders the HTML/CSS to a PDF binary.
    # 'base_url' ensures local images/fonts are resolved correctly.
    font_config = FontConfiguration()
    html = HTML(string=html_content, base_url=base_url)

    html.write_pdf(
        output_path,
        font_config=font_config,
        presentational_hints=True
    )
```

Async Cleanup & Delivery

The moment the PDF is generated, three things happen, coordinated via FastAPI Background Tasks:

  1. Serve to User: The file is streamed to the user's browser immediately.

  2. Hydrate Cache: The file is uploaded to GCS in the background. The next user who asks for this chapter will get the cached version instantly.

  3. Self-Destruct: Local temporary files are wiped to ensure the server remains stateless and storage does not bloat.
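The hydrate-and-cleanup half of this flow can be sketched with plain asyncio. The GCS upload is stubbed here, and the function names are illustrative; in production these steps would run inside FastAPI's BackgroundTasks after the response has been streamed.

```python
import asyncio
import os

async def _upload_to_gcs(local_path: str, artifact_path: str) -> None:
    """Stub for the real Google Cloud Storage upload."""
    await asyncio.sleep(0)

async def hydrate_and_cleanup(local_path: str, artifact_path: str) -> None:
    """Background job: push the PDF to the cache, then wipe the temp file.

    Runs after the response has already reached the user, so the upload
    never adds latency to the first request.
    """
    await _upload_to_gcs(local_path, artifact_path)
    if os.path.exists(local_path):
        os.remove(local_path)  # keep the server stateless
```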