Want to schedule a meeting? Submit your query first

Contact
AI Notes Generator
Featured Case StudyGenerative AI / Content EngineeringFeb 2025
Genrate Notes @edza.ai

AI Notes Generator

Deterministic Content Orchestration

A production-grade pipeline that converts raw curriculum data into structured, visually rich PDF textbooks using Multi-Layer Caching and layout-aware rendering.

Core Technologies

Python 3.10WeasyPrint (PDF Engine)Jinja2 (Templating)Google Cloud StorageAsyncIOBeautifulSoup4

The Engineering Challenge

LLMs output text. Students need Textbooks.

There is a massive gap between a ChatGPT response and a usable study document.

  1. Structure: LLMs often forget to nest sections correctly.

  2. Formatting: Raw Markdown doesn't handle page breaks, headers, or image alignment suitable for printing.

  3. Cost: Regenerating the same chapter for 1,000 students is a waste of compute resources.

The Solution: A Static Generation Pipeline. We built a system inspired by Static Site Generators (SSG). Instead of generating notes on every request, we treat the notes as artifacts. Once generated, they are immutable, cached, and served globally via CDN logic.

python
1class AINotesGenerator: 2 """ 3 Production-grade static generation pipeline that transforms 4 curriculum data into structured, layout-aware PDF textbooks. 5 6 Deterministic. Cached. Immutable. 7 """ 8 9 async def generate(self, subject: str, grade: str, chapter: str, language: str): 10 # Resolve Artifact Identity (content-addressable) 11 artifact_key = self._compute_artifact_key( 12 subject, grade, chapter, language 13 ) 14 15 # Multi-Layer Cache Check (Local → Cloud → Generate) 16 if await self._artifact_exists_in_cache(artifact_key): 17 return await self._serve_cached_artifact(artifact_key) 18 19 # Curriculum Resolution Layer 20 curriculum = await self._resolve_or_generate_curriculum( 21 subject, grade, chapter 22 ) 23 24 # Structured Content Synthesis (LLM Orchestration) 25 structured_markdown = await self._synthesize_notes(curriculum, language) 26 27 # Layout-Aware Rendering Pipeline 28 html = self._compile_to_layout(structured_markdown) 29 html = self._inject_visual_assets(html) 30 html = self._render_math(html) 31 32 # Deterministic PDF Compilation 33 pdf_path = await self._render_pdf_artifact(html, artifact_key) 34 35 # Artifact Persistence (Immutable + CDN-ready) 36 await self._persist_artifact(pdf_path, artifact_key) 37 38 return pdf_path
85%
Cache Hit Rate
Print-Ready
PDF Quality
KaTeX/LaTeX
Math Support
~80%
Cost Saving

The 'Idempotent' Caching Strategy

The most critical engineering decision was the Cache-First Architecture. LLM tokens are expensive; storage is cheap.

When a request comes in for *"Physics

  • Class 10
  • Electricity"*, the system does not call the AI immediately.
  1. GCS Lookup: It constructs a deterministic path (e.g., notes/physics/10/electricity.pdf) and checks Google Cloud Storage.

  2. Instant Delivery: If the file exists, it returns the signed URL instantly. Zero AI cost.

  3. Generation (Cache Miss): Only if the file is missing does it trigger the expensive generation pipeline.

This turns an O(N) cost model (cost scales with users) into an O(1) cost model (cost scales with subjects).

services/notes_services.py
1class ArtifactCacheResolver: 2 """ 3 Idempotent, cache-first architecture. 4 LLM generation is triggered ONLY on cache miss. 5 """ 6 7 async def resolve_notes(self, subject: str, grade: str, chapter: str) -> str: 8 # Deterministic Artifact Path 9 artifact_path = self._build_artifact_path( 10 subject=subject, 11 grade=grade, 12 chapter=chapter 13 ) 14 15 # Cloud Storage Lookup (Cheap Operation) 16 if await self._exists_in_gcs(artifact_path): 17 return await self._get_signed_url(artifact_path) 18 19 # Cache Miss → Trigger Expensive Pipeline 20 pdf_path = await self._generate_notes_artifact( 21 subject, grade, chapter 22 ) 23 24 # Persist Immutable Artifact 25 await self._upload_to_gcs(pdf_path, artifact_path) 26 27 return await self._get_signed_url(artifact_path)

Structured Intelligence: JSON before Text

To ensure the notes adhere to the CBSE curriculum, we don't ask the AI to "write notes" immediately. We use a Two-Pass Generation Strategy.

Pass 1: The Skeleton (JSON) We force the AI to generate a JSON object representing the curriculum tree (Sections, Subsections, Activity Headers). This guarantees the structure is correct before we write a single word of content. It also allows us to cache the curriculum structure locally "curriculum.json" to speed up future regenerations.

services/notes_services.py
1prompt = """ 2You are a curriculum designer. Output JSON ONLY. 3Schema: 4{ 5 "chapter_title": "...", 6 "sections": [ 7 { "title": "...", "difficulty": "Medium", "subsections": [...] } 8 ] 9} 10""" 11# We parse this JSON to guide the actual content generation later

Content Expansion & Enrichment

Once we have the curriculum skeleton, we pass it to the Content Engine.

  • Markdown Generation: The AI fills in the flesh of the document using Markdown, strictly enforcing LaTeX formatting for mathematical equations (e.g., $E=mc^2).
  • Media Injection: The system parses the generated content. If it sees a header like "Electromagnetic Induction," it asynchronously queries Wikipedia's Media API, finds a relevant diagram, and injects the <img> tag into the content stream. This happens automatically without human intervention.

The Rendering Engine (HTML to PDF)

The final step is converting raw text into a beautiful document. We use WeasyPrint, a browser-grade rendering engine.

We treat the notes like a web page.

  1. Jinja2 Templating: We inject the content into an HTML template that defines fonts, margins, and branding.

  2. Math Rendering: We run a pre-processing pass to convert LaTeX equations into SVG/HTML using KaTeX, ensuring math symbols look crisp in print.

  3. PDF Conversion: The HTML is compiled into a binary PDF file. This allows us to control page breaks so headers are not stranded at the bottom of a page.

services/notes_services.py
1def _render_pdf(self, html_content, output_path, base_url): 2 # WeasyPrint renders the HTML/CSS to a PDF binary 3 # 'base_url' ensures local images/fonts are resolved correctly 4 5 font_config = FontConfiguration() 6 html = HTML(string=html_content, base_url=base_url) 7 8 html.write_pdf( 9 output_path, 10 font_config=font_config, 11 presentational_hints=True 12 )
image

Async Cleanup & Delivery

The moment the PDF is generated, we perform two actions in parallel using FastAPI Background Tasks:

  1. Serve to User: The file is streamed to the user's browser immediately.

  2. Hydrate Cache: The file is uploaded to GCS in the background. The next user who asks for this chapter will get the cached version instantly.

  3. Self-Destruct: Local temporary files are wiped to ensure the server remains stateless and storage does not bloat.