Streamlining Feedback: A Streamlit App That Reads PDFs and Generates Surveys with AI
Overview
A large federal government organization commissioned a Streamlit application (Python) that leverages AI to read PDFs (reports, grant applications, guidance documents) and automatically generate draft survey questions and response templates. The goal was to accelerate stakeholder feedback collection, reduce manual effort, and ensure surveys reflected the specific language and concerns found in source documents.
The challenge
- Manual process: subject-matter experts spent hours reading long PDFs and drafting surveys, slowing stakeholder outreach.
- Inconsistent surveys: variations in question quality and alignment with source documents reduced response relevance.
- Scale: the organization regularly handles a large volume of program documents; a manual workflow couldn’t keep pace.
- Compliance & privacy: extracted content could contain sensitive or personally identifiable information requiring careful handling.
Objectives
- Build a simple, secure Streamlit app to ingest PDFs and produce draft survey questions and response options.
- Use AI to summarize documents, identify key themes, and propose relevant, unbiased survey items.
- Provide an editor workflow so SMEs could review, edit, and publish surveys quickly.
- Ensure data privacy, access controls, and auditability for compliance.
Solution
A lightweight Streamlit application written in Python that combines document processing, AI analysis, and survey generation with a reviewer workflow.
Key features
- PDF ingestion: supports single PDFs and bulk uploads; extracts text with an OCR fallback (Tesseract) for scanned documents (see the extraction sketch after this list).
- Document understanding: AI-based summarization and topic extraction to surface key themes, stakeholders, and action items.
- Survey generation: AI drafts multiple-choice, Likert-scale, and open-ended questions tied to extracted themes, with suggested response choices and validation rules.
- Editor & approval flow: inline editing of questions, versioning, and a reviewer sign-off step before publishing.
- Export & integration: exports surveys as JSON or CSV and can push them to survey platforms (e.g., Qualtrics, SurveyMonkey) via their APIs.
- Security & privacy: role-based access, PII redaction prompts, encrypted storage for uploaded documents, and audit logs.
- Lightweight deployment: Dockerized app deployable to managed platforms or internal hosts; optional authentication via SSO (SAML/OIDC).
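For a concrete sense of the ingestion step, the sketch below extracts text page by page and falls back to OCR when a page looks scanned. Only the Tesseract fallback is specified above; the surrounding libraries (pdfplumber, pdf2image) and the character threshold are assumptions made for illustration.

```python
# Hedged sketch of per-page extraction with an OCR fallback.
# Library choices beyond Tesseract (pdfplumber, pdf2image) are illustrative.
import pdfplumber
import pytesseract
from pdf2image import convert_from_path

def extract_pages(pdf_path: str, min_chars: int = 40) -> list[str]:
    """Return one text blob per page, using OCR for pages with little embedded text."""
    pages: list[str] = []
    with pdfplumber.open(pdf_path) as pdf:
        for i, page in enumerate(pdf.pages):
            text = (page.extract_text() or "").strip()
            if len(text) < min_chars:
                # Likely a scanned page: rasterize just this page and OCR it.
                image = convert_from_path(
                    pdf_path, dpi=300, first_page=i + 1, last_page=i + 1
                )[0]
                text = pytesseract.image_to_string(image)
            pages.append(text)
    return pages
```

Falling back page by page keeps digitally born documents cheap to process while still recovering text from scanned attachments.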
Architecture (high level)
- Streamlit frontend (Python) for upload, review, and editing.
- Backend workers (FastAPI/RQ or Celery) for PDF extraction, OCR, and AI calls (see the handoff sketch after this list).
- AI services: LLMs for summarization, question generation, and topic extraction (hosted or via secure API).
- Storage: encrypted object store for PDFs, Postgres for metadata and versions, and audit logs in append-only storage.
- Integrations: survey platforms and internal CRM for respondent targeting.
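To make the handoff between the Streamlit frontend and the background workers concrete, here is a minimal sketch assuming the RQ-on-Redis option named above. The task name process_document and the uploads directory are hypothetical placeholders for the real worker entry point and the encrypted object store.

```python
# Hedged sketch: Streamlit upload page that enqueues processing on an RQ worker.
import pathlib
import streamlit as st
from redis import Redis
from rq import Queue

from workers.tasks import process_document  # hypothetical task: extract -> summarize -> draft questions

DOCS_DIR = pathlib.Path("uploads")  # stand-in for the encrypted object store
DOCS_DIR.mkdir(exist_ok=True)
queue = Queue("pdf-processing", connection=Redis())

st.title("PDF-to-Survey Draft Generator")
uploads = st.file_uploader("Upload PDF(s)", type="pdf", accept_multiple_files=True)

if uploads and st.button("Generate survey drafts"):
    for upload in uploads:
        dest = DOCS_DIR / upload.name
        dest.write_bytes(upload.getvalue())
        # Long-running extraction, OCR, and LLM calls stay off the UI thread.
        job = queue.enqueue(process_document, str(dest))
        st.write(f"Queued {upload.name} (job {job.id})")
```

Pushing extraction, OCR, and AI calls onto a worker queue keeps the upload page responsive even for long or scanned documents.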
Implementation highlights
- Phase 1 — Requirements & compliance (2 weeks): mapped document types, PII risks, access controls, and integration targets.
- Phase 2 — Prototype (3 weeks): Streamlit UI with single‑PDF upload, AI summarization, and initial survey draft generation (a generation sketch follows this list).
- Phase 3 — Robust extraction & OCR (2 weeks): added Tesseract OCR and layout-aware extraction to preserve headings and tables.
- Phase 4 — Reviewer workflow & export (2 weeks): inline editing, version history, and export connectors to survey platforms.
- Phase 5 — Security hardening & rollout (3 weeks): encryption at rest, RBAC, SSO integration, and pilot with one program office.
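The Phase 2 draft-generation step might look roughly like the sketch below. The provider, model name, and prompt wording are assumptions for illustration; the project only specifies that an LLM drafts themed, neutrally worded questions with suggested response choices tied to source excerpts.

```python
# Hedged sketch of LLM-based question drafting with structured JSON output.
# The OpenAI client and model name are placeholders for whichever hosted or
# secure-API LLM the deployment actually uses.
import json
from openai import OpenAI

client = OpenAI()

PROMPT = """You are drafting stakeholder survey questions.
From the document excerpt below, propose up to {n} questions and return JSON:
{{"questions": [{{"theme": str, "type": "likert|multiple_choice|open_text",
"text": str, "choices": [str], "source_excerpt": str}}]}}
Keep wording neutral and tie every question to a quoted source excerpt.

Excerpt:
{excerpt}
"""

def draft_questions(excerpt: str, n: int = 12) -> list[dict]:
    """Ask the model for candidate questions grounded in the given excerpt."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": PROMPT.format(n=n, excerpt=excerpt)}],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)["questions"]
```

Requesting structured JSON keeps the drafts easy to load into the editor workflow and to export to survey platforms later.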
AI role and guardrails
- AI generated initial summaries, candidate questions, and response templates — accelerating drafts from hours to minutes.
- Guardrails enforced: PII redaction prompts, model output filters that flag biased or leading wording, and mandatory SME review before publishing.
- Explainability: the app surfaced source excerpts tied to each generated question so reviewers could verify provenance (a sketch of these guardrails follows this list).
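A minimal sketch of two of those guardrails follows, assuming simple pattern-based redaction and an exact-match provenance check; the production redaction rules and output filters are not spelled out in this write-up.

```python
# Hedged sketch: redact likely PII before any AI call, and verify that each
# generated question's cited excerpt actually appears in the source document.
# The patterns are illustrative, not the production redaction rules.
import re

PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace likely PII with labeled placeholders before text reaches the model."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[REDACTED-{label.upper()}]", text)
    return text

def has_provenance(question: dict, document_text: str) -> bool:
    """Only surface questions whose cited excerpt exists in the source text."""
    excerpt = question.get("source_excerpt", "").strip()
    return bool(excerpt) and excerpt in document_text
```

Questions that fail the provenance check can be flagged or dropped before they reach the reviewer queue.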
Business results (measured)
- Time to draft a first survey dropped from ~6 hours to under 20 minutes for typical documents.
- SME review time for surveys decreased ~70% because AI produced contextually relevant question candidates.
- Volume capability: the organization could process dozens of PDFs weekly instead of a handful.
- Higher-quality surveys: initial pilot showed a 25% increase in response relevance as judged by program analysts.
- Compliance maintained: no PII leaks in pilot due to enforced redaction and access controls.
Customer impact
- Before: Analysts read 40‑page grant reports and manually composed surveys to collect grantee feedback, taking 4–6 hours per survey.
- After: Uploading the report produced a draft 12‑question survey (a mix of Likert, multiple-choice, and open-text items) in ~15 minutes; after a 30‑minute SME review the survey was published to stakeholders.
Lessons learned
- Keep SMEs in the loop: AI speeds drafting but domain experts must approve and refine questions.
- Tie questions to source excerpts to increase trust and traceability.
- Build strict PII handling flows and surface redaction suggestions proactively.
- Start with a narrow document type set to tune prompts and extraction rules before broadening.
- Monitor survey performance and iterate prompts based on response quality.
Why this mattered
- Faster, better-aligned surveys improved the organization’s ability to collect actionable feedback and make program adjustments sooner.
- Staff time saved could be reallocated to analysis and program improvements rather than drafting.
- Scalable process enabled broader stakeholder engagement across more programs without linear increases in headcount.
Bottom line
A Streamlit-based, AI-enabled PDF-to-survey app helped a federal government organization transform a slow, manual survey creation process into a fast, auditable workflow that produces quality surveys in minutes, preserves compliance, and scales stakeholder feedback across programs.