
AI Notes Generator
Deterministic Content Orchestration
A production-grade pipeline that converts raw curriculum data into structured, visually rich PDF textbooks using Multi-Layer Caching and layout-aware rendering.
The Engineering Challenge
LLMs output text. Students need Textbooks.
There is a massive gap between a ChatGPT response and a usable study document.
- Structure: LLMs often forget to nest sections correctly.
- Formatting: Raw Markdown doesn't handle page breaks, headers, or image alignment suitable for printing.
- Cost: Regenerating the same chapter for 1,000 students is a waste of compute resources.
The Solution: A Static Generation Pipeline. We built a system inspired by Static Site Generators (SSG). Instead of generating notes on every request, we treat the notes as artifacts. Once generated, an artifact is immutable: it is cached and then served like any other static file behind a CDN.

class AINotesGenerator:
    """
    Production-grade static generation pipeline that transforms
    curriculum data into structured, layout-aware PDF textbooks.

    Deterministic. Cached. Immutable.
    """

    async def generate(self, subject: str, grade: str, chapter: str, language: str):
        # Resolve Artifact Identity (content-addressable)
        artifact_key = self._compute_artifact_key(
            subject, grade, chapter, language
        )

        # Multi-Layer Cache Check (Local → Cloud → Generate)
        if await self._artifact_exists_in_cache(artifact_key):
            return await self._serve_cached_artifact(artifact_key)

        # Curriculum Resolution Layer
        curriculum = await self._resolve_or_generate_curriculum(
            subject, grade, chapter
        )

        # Structured Content Synthesis (LLM Orchestration)
        structured_markdown = await self._synthesize_notes(curriculum, language)

        # Layout-Aware Rendering Pipeline
        html = self._compile_to_layout(structured_markdown)
        html = self._inject_visual_assets(html)
        html = self._render_math(html)

        # Deterministic PDF Compilation
        pdf_path = await self._render_pdf_artifact(html, artifact_key)

        # Artifact Persistence (Immutable + CDN-ready)
        await self._persist_artifact(pdf_path, artifact_key)

        return pdf_path

The 'Idempotent' Caching Strategy
The most critical engineering decision was the Cache-First Architecture. LLM tokens are expensive; storage is cheap.
When a request comes in for "Physics - Class 10 - Electricity", the system does not call the AI immediately.
- GCS Lookup: It constructs a deterministic path (e.g., notes/physics/10/electricity.pdf) and checks Google Cloud Storage.
- Instant Delivery: If the file exists, it returns the signed URL instantly. Zero AI cost.
- Generation (Cache Miss): Only if the file is missing does it trigger the expensive generation pipeline.
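This turns an O(N) cost model, where generation cost scales with the number of users, into one that is O(1) per chapter: total cost scales with the number of distinct chapters, not with how many students request them.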
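A minimal sketch of how the deterministic path in the lookup step could be built. The helper names `slugify` and `build_artifact_path` are hypothetical and the normalization rules are an assumption; the only detail taken from the text is the example path notes/physics/10/electricity.pdf.

```python
import re


def slugify(value: str) -> str:
    """Lowercase the value and collapse runs of non-alphanumerics into one hyphen."""
    slug = re.sub(r"[^a-z0-9]+", "-", value.strip().lower()).strip("-")
    if not slug:
        # An empty segment would make two different requests share one path.
        raise ValueError(f"cannot build a path segment from {value!r}")
    return slug


def build_artifact_path(subject: str, grade: str, chapter: str) -> str:
    """Map a request to a single storage path (layout assumed from the example)."""
    return f"notes/{slugify(subject)}/{slugify(grade)}/{slugify(chapter)}.pdf"


print(build_artifact_path("Physics", "10", "Electricity"))
```

Normalizing before building the path matters for the cost argument: if "Physics" and " physics " produced different paths, the same chapter would be generated and stored twice.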

class ArtifactCacheResolver:
    """
    Idempotent, cache-first architecture.
    LLM generation is triggered ONLY on cache miss.
    """

    async def resolve_notes(self, subject: str, grade: str, chapter: str) -> str:
        # Deterministic Artifact Path
        artifact_path = self._build_artifact_path(
            subject=subject,
            grade=grade,
            chapter=chapter
        )

        # Cloud Storage Lookup (Cheap Operation)
        if await self._exists_in_gcs(artifact_path):
            return await self._get_signed_url(artifact_path)

        # Cache Miss → Trigger Expensive Pipeline
        pdf_path = await self._generate_notes_artifact(
            subject, grade, chapter
        )

        # Persist Immutable Artifact
        await self._upload_to_gcs(pdf_path, artifact_path)

        return await self._get_signed_url(artifact_path)

Structured Intelligence: JSON before Text
To ensure the notes adhere to the CBSE curriculum, we don't ask the AI to "write notes" immediately. We use a Two-Pass Generation Strategy.
Pass 1: The Skeleton (JSON). We force the AI to generate a JSON object representing the curriculum tree (Sections, Subsections, Activity Headers). Because the structure is explicit, it can be checked before a single word of content is written. It also lets us cache the curriculum structure locally (as curriculum.json) to speed up future regenerations.
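A minimal sketch of checking and caching such a skeleton, assuming the model's reply is a JSON object with a `chapter_title` string and a `sections` list. The function names `parse_skeleton` and `load_or_generate_skeleton` are hypothetical, and `generate` stands in for whatever call actually reaches the model.

```python
import json
from pathlib import Path
from typing import Callable


def parse_skeleton(raw: str) -> dict:
    """Parse the model's reply and check the parts of the schema we rely on."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on non-JSON
    if not isinstance(data, dict):
        raise ValueError("skeleton must be a JSON object")
    title = data.get("chapter_title")
    if not isinstance(title, str) or not title.strip():
        raise ValueError("chapter_title must be a non-empty string")
    sections = data.get("sections")
    if not isinstance(sections, list) or not sections:
        raise ValueError("sections must be a non-empty list")
    for index, section in enumerate(sections):
        if not isinstance(section, dict) or not isinstance(section.get("title"), str):
            raise ValueError(f"sections[{index}] needs a string title")
        # The shape of subsection entries is not specified in the source,
        # so only the container type is checked here.
        if not isinstance(section.get("subsections", []), list):
            raise ValueError(f"sections[{index}].subsections must be a list")
    return data


def load_or_generate_skeleton(cache_file: Path, generate: Callable[[], str]) -> dict:
    """Reuse a cached skeleton if present; otherwise generate, validate, and store it."""
    if cache_file.exists():
        return parse_skeleton(cache_file.read_text(encoding="utf-8"))
    skeleton = parse_skeleton(generate())
    cache_file.write_text(json.dumps(skeleton, indent=2), encoding="utf-8")
    return skeleton
```

Validating before writing the cache file means a malformed reply is never persisted; the caller can retry the model call instead.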

prompt = """
You are a curriculum designer. Output JSON ONLY.
Schema:
{
  "chapter_title": "...",
  "sections": [
    { "title": "...", "difficulty": "Medium", "subsections": [...] }
  ]
}
"""
# We parse this JSON to guide the actual content generation later

Content Expansion & Enrichment
Once we have the curriculum skeleton, we pass it to the Content Engine.
- Markdown Generation: The AI fills in the body of the document in Markdown, with LaTeX formatting strictly enforced for mathematical equations (e.g., $E=mc^2$).
- Media Injection: The system parses the generated content. If it sees a header like "Electromagnetic Induction," it asynchronously queries Wikipedia's Media API, finds a relevant diagram, and injects an <img> tag into the content stream. This happens automatically without human intervention.
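The media injection step could be sketched roughly as follows. This is an illustration under assumptions, not the pipeline's actual code: `inject_images` is a hypothetical name, and the image lookup is passed in as a plain synchronous callable (`find_image`) so the sketch stays self-contained, whereas the text describes an asynchronous query to Wikipedia.

```python
import re
from html import escape
from typing import Callable, Optional

# ATX-style Markdown headers: one to six '#' characters, then the title.
HEADER_RE = re.compile(r"^(#{1,6})[ \t]+(.+?)[ \t]*$", re.MULTILINE)


def inject_images(markdown: str, find_image: Callable[[str], Optional[str]]) -> str:
    """Append an <img> tag after every header for which find_image returns a URL."""

    def add_image(match: re.Match) -> str:
        title = match.group(2)
        url = find_image(title)
        if url is None:
            return match.group(0)  # no diagram found: leave the header untouched
        img = f'<img src="{escape(url, quote=True)}" alt="{escape(title, quote=True)}">'
        return f"{match.group(0)}\n\n{img}"

    return HEADER_RE.sub(add_image, markdown)


# Usage with a stand-in lookup table instead of a network call.
notes = "# Electricity\n\nIntro.\n\n## Electromagnetic Induction\n\nText."
lookup = {"Electromagnetic Induction": "https://example.org/induction.svg"}.get
print(inject_images(notes, lookup))
```

Note that this regex would also match lines starting with `#` inside fenced code blocks, so a real implementation would need to skip those.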
The Rendering Engine (HTML to PDF)
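The final step converts the enriched content into a print-ready document. We use WeasyPrint, a rendering engine that lays out HTML and CSS for print and writes PDF. It is not a browser and does not run JavaScript, so everything must be static markup by the time it reaches the renderer.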
We treat the notes like a web page.
- Jinja2 Templating: We inject the content into an HTML template that defines fonts, margins, and branding.
- Math Rendering: We run a pre-processing pass to convert LaTeX equations into SVG/HTML using KaTeX, ensuring math symbols look crisp in print.
- PDF Conversion: The HTML is compiled into a binary PDF file. Because layout is driven by CSS, we can control page breaks so headers are not stranded at the bottom of a page.

from weasyprint import HTML
# Import path in recent WeasyPrint releases; older ones used weasyprint.fonts.
from weasyprint.text.fonts import FontConfiguration

def _render_pdf(self, html_content, output_path, base_url):
    # WeasyPrint renders the HTML/CSS to a PDF binary
    # 'base_url' ensures local images/fonts are resolved correctly

    font_config = FontConfiguration()
    html = HTML(string=html_content, base_url=base_url)

    html.write_pdf(
        output_path,
        font_config=font_config,
        presentational_hints=True
    )

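As an illustration of the page-break control described above, here is a sketch of an HTML wrapper whose stylesheet keeps headings attached to the content that follows them. It uses the standard library's `string.Template` as a stand-in for Jinja2 so that it runs without extra dependencies; the template, the function name `build_printable_html`, and the specific margins are all assumptions.

```python
from html import escape
from string import Template

# Print stylesheet: page geometry plus rules that stop a heading from being
# the last thing on a page and keep figures and tables in one piece.
PAGE_TEMPLATE = Template("""<!DOCTYPE html>
<html lang="$lang">
<head>
<meta charset="utf-8">
<title>$title</title>
<style>
  @page { size: A4; margin: 2cm; }
  h1, h2, h3 { break-after: avoid; }
  figure, table { break-inside: avoid; }
</style>
</head>
<body>
$body
</body>
</html>
""")


def build_printable_html(title: str, body_html: str, lang: str = "en") -> str:
    """Wrap already-rendered body HTML in a page with print-oriented CSS."""
    return PAGE_TEMPLATE.substitute(
        title=escape(title),
        lang=escape(lang, quote=True),
        body=body_html,  # assumed to be trusted HTML produced by the pipeline
    )


page = build_printable_html("Electricity", "<h2>Ohm's Law</h2><p>V = IR</p>")
print(page)
```

The resulting string is what would be handed to the renderer as `html_content`; the CSS rules, not the PDF step itself, are what decide where pages may break.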
Async Cleanup & Delivery
The moment the PDF is generated, we do three things. Delivery happens immediately; the other two are handed to FastAPI Background Tasks, which run after the response has been sent:
- Serve to User: The file is streamed to the user's browser immediately.
- Hydrate Cache: The file is uploaded to GCS in the background. The next user who asks for this chapter will get the cached version instantly.
- Self-Destruct: Local temporary files are wiped so the server remains stateless and storage does not bloat.
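The two background steps can be sketched as a single function that uploads first and always cleans up afterwards. The name `hydrate_cache_then_cleanup` and the `upload` callable are hypothetical; in a FastAPI handler, a function like this would typically be registered with `BackgroundTasks.add_task` before returning the response.

```python
from pathlib import Path
from typing import Callable


def hydrate_cache_then_cleanup(
    local_pdf: Path,
    artifact_path: str,
    upload: Callable[[Path, str], None],
) -> None:
    """Upload the generated PDF to the cache, then delete the local copy."""
    try:
        upload(local_pdf, artifact_path)
    finally:
        # Runs even if the upload raises, so temporary files never accumulate.
        local_pdf.unlink(missing_ok=True)
```

Putting the delete in a `finally` block is a deliberate choice: a failed upload only costs one future regeneration, while a leaked temporary file would slowly fill the server's disk.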