File Reading & Document Processing

Rawi reads PDFs, Word documents, Excel spreadsheets, and text files and passes the extracted content to the model as context for your question. Use it to summarize, compare, or review documents directly from the command line.

Terminal window
# Process a single file
rawi ask --file document.pdf "Summarize this document"
# Multiple files
rawi ask --files config.json package.json "Compare these configurations"
# Batch processing with patterns
rawi ask --batch "src/**/*.js" "Review all JavaScript files"

Extract text content from PDF documents for analysis and summarization.

Terminal window
# Basic PDF processing
rawi ask --file report.pdf "What are the main conclusions?"
# Detailed analysis
rawi ask --file financial-report.pdf --act data-analyst "Analyze the financial trends"
# Research paper review
rawi ask --file research.pdf --act academic-researcher "Summarize the methodology and findings"

Features:

  • ✅ Text extraction from standard PDFs
  • ✅ Multi-page document support
  • ✅ Automatic content cleaning
  • ⚠️ Limited support for image-heavy PDFs
  • ⚠️ Encrypted PDFs require manual unlock

Process Word documents with full text extraction and formatting awareness.

Terminal window
# Document analysis
rawi ask --file requirements.docx "Extract all functional requirements"
# Content review
rawi ask --file proposal.docx --act tech-writer "Review this proposal for clarity"
# Template generation
rawi ask --file example.docx "Create a similar document template"

Features:

  • ✅ Complete text extraction
  • ✅ Table content preservation
  • ✅ Header and footer inclusion
  • ✅ Embedded formatting context
  • ⚠️ Images and charts not processed

Analyze Excel data with sheet selection and intelligent data formatting.

Terminal window
# Data analysis
rawi ask --file sales-data.xlsx "Identify sales trends and patterns"
# Specific sheet processing
rawi ask --file workbook.xlsx --sheet "Q4 Results" "Analyze Q4 performance"
# Financial modeling
rawi ask --file budget.xlsx --act financial-analyst "Review this budget for accuracy"

Features:

  • ✅ Multiple sheet support
  • ✅ Automatic data formatting
  • ✅ Header row detection
  • ✅ Formula result extraction
  • ✅ Sheet name listing
  • ⚠️ Charts and pivot tables not processed

Process various text-based files with intelligent content type detection.

Terminal window
# Source code review
rawi ask --file app.js "Review this code for best practices"
# Configuration analysis
rawi ask --file docker-compose.yml "Optimize this Docker configuration"
# Log file analysis
rawi ask --file app.log "Identify and categorize errors"
# Documentation review
rawi ask --file README.md "Improve this documentation"

Supported Extensions:

  • .js, .ts, .jsx, .tsx - JavaScript/TypeScript
  • .py, .pyw - Python
  • .java, .kt, .scala - JVM languages
  • .cpp, .c, .h, .hpp - C/C++
  • .rs - Rust
  • .go - Go
  • .rb - Ruby
  • .php - PHP
  • .swift - Swift
  • .dart - Dart

Terminal window
# Basic file analysis
rawi ask --file document.pdf "Summarize this document"
# With expert template
rawi ask --file code.js --act code-reviewer "Review this code"
# Override file type detection
rawi ask --file unknown-file --file-type txt "Process as text"

Terminal window
# Process specific files
rawi ask --files file1.pdf file2.docx "Compare these documents"
# Up to 10 files at once (the shell expands *.json before Rawi runs)
rawi ask --files *.json "Validate all JSON configurations"
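
When a glob matches more than the 10 files that --files accepts, one option is to send the files in groups. The helper below is a sketch, not part of Rawi: ask_in_groups is a hypothetical name, and it assumes each group can be handled independently with the same prompt.

```shell
# Hypothetical helper (not part of Rawi): call `rawi ask --files` once per
# group of at most 10 files, reusing the same prompt for every group.
ask_in_groups() {
  prompt=$1
  shift
  pending=0
  for file in "$@"; do
    # The loop's word list is expanded before the first iteration, so
    # rebuilding "$@" inside the loop does not disturb the iteration.
    if [ "$pending" -eq 0 ]; then set --; fi
    set -- "$@" "$file"
    pending=$#
    if [ "$pending" -eq 10 ]; then
      rawi ask --files "$@" "$prompt" || return
      pending=0
    fi
  done
  # Send whatever is left over (fewer than 10 files).
  if [ "$pending" -gt 0 ]; then
    rawi ask --files "$@" "$prompt"
  fi
}
```

For example, ask_in_groups "Validate all JSON configurations" configs/*.json runs one request per 10 files. Because each group is a separate request, the model cannot compare files across groups.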
Terminal window
# Process all JavaScript files
rawi ask --batch "src/**/*.js" "Find potential bugs"
# Multiple patterns
rawi ask --batch "*.{json,yml,yaml}" "Validate all config files"
# Exclude patterns
rawi ask --batch "src/**/*.js" --exclude "**/node_modules/**" "Review source code only"
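
Before sending a whole tree to the model, it can help to preview which files a pattern like "src/**/*.js" with node_modules excluded would cover. The sketch below approximates that selection with find; Rawi's own glob matching may differ in details, and list_js_sources is a hypothetical helper name.

```shell
# Hypothetical helper: approximate "src/**/*.js" minus "**/node_modules/**"
# with find, so the file list can be checked before a batch run.
list_js_sources() {
  root=${1:-src}
  find "$root" -type f -name '*.js' ! -path '*/node_modules/*' | sort
}
```

Running list_js_sources src | wc -l gives a quick count of how many files a batch would touch.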
Terminal window
# Process files in parallel for speed
rawi ask --batch "docs/**/*.md" --parallel "Create documentation index"
# Control concurrency
rawi ask --batch "**/*.py" --parallel --max-concurrency 3 "Analyze Python code"
# Continue on errors
rawi ask --batch "**/*.json" --parallel --continue-on-error "Validate JSON files"

Terminal window
# 1. Overview
rawi ask --file report.pdf "Provide executive summary"
# 2. Detailed analysis
rawi ask --file report.pdf --act business-analyst "Extract key metrics and trends"
# 3. Action items
rawi ask --file report.pdf "What actions should be taken based on this report?"

Terminal window
# Technical documentation
rawi ask --file api.py --act technical-writer "Generate API documentation"
# Financial analysis
rawi ask --file budget.xlsx --act financial-analyst "Analyze budget variance"
# Research review
rawi ask --file paper.pdf --act academic-researcher "Critique methodology"
# Legal document review
rawi ask --file contract.docx --act legal-expert "Identify key terms and risks"

Terminal window
# Pipe content to Rawi
cat config.json | rawi ask "Optimize this configuration"
# Process command output
ls -la | rawi ask "Explain these file permissions"
# Generate files from analysis
rawi ask --file data.csv "Create Python script to visualize this data" > visualize.py
# Batch process with shell
for file in reports/*.pdf; do
  rawi ask --file "$file" "Extract key metrics" >> summary.md
done
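
A shell loop like the one above appends every answer to one file with nothing to tell the reports apart. The sketch below labels each section and records failures instead of stopping. summarize_reports is a hypothetical helper, and it assumes rawi exits with a non-zero status when a request fails.

```shell
# Hypothetical helper: append one labeled section per report to a summary
# file and note failures instead of stopping at the first one.
# Assumption: rawi returns a non-zero exit status on failure.
summarize_reports() {
  out=$1
  shift
  for file in "$@"; do
    # Skip non-files, e.g. the literal pattern when a glob matches nothing.
    [ -f "$file" ] || continue
    printf '## %s\n\n' "$file" >> "$out"
    if ! rawi ask --file "$file" "Extract key metrics" >> "$out"; then
      printf '(extraction failed)\n' >> "$out"
      echo "rawi failed for $file" >&2
    fi
    printf '\n' >> "$out"
  done
}
```

For example: summarize_reports summary.md reports/*.pdf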

These won't be processed:

  • Images within documents
  • Charts and graphs in Excel
  • Embedded objects
  • Protected/encrypted files
  • Binary files without text content

Terminal window
# For large files, extract specific sections
rawi ask --file large-report.pdf "Focus only on the executive summary section"
# Use batch processing for multiple files
rawi ask --batch "reports/*.pdf" --parallel "Extract key findings from each report"
# Filter before processing
grep ERROR app.log | rawi ask "Categorize these errors"

File not found:

Terminal window
# Use absolute paths
rawi ask --file "$(pwd)/document.pdf" "Analyze this"
# Check file exists
ls -la document.pdf

Permission denied:

Terminal window
# Fix file permissions
chmod 644 document.pdf

Empty content:

Terminal window
# Verify file integrity
file document.pdf
# Check file size
ls -lh document.pdf
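
The three checks above (missing file, unreadable file, empty file) can be folded into one pre-flight step. This is a minimal sketch; check_input is a hypothetical helper, not a Rawi command.

```shell
# Hypothetical pre-flight check covering the three cases above.
# Returns 0 when the file looks usable, otherwise a distinct non-zero code.
check_input() {
  path=$1
  if [ ! -e "$path" ]; then
    echo "not found: $path" >&2
    return 1
  elif [ ! -r "$path" ]; then
    echo "permission denied: $path" >&2
    return 2
  elif [ ! -s "$path" ]; then
    echo "empty file: $path" >&2
    return 3
  fi
}
```

For example: check_input document.pdf && rawi ask --file document.pdf "Analyze this". A non-empty file can still yield no text, such as an image-heavy PDF, so this check only rules out the simplest causes.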

Unsupported format:

Terminal window
# Override file type detection
rawi ask --file data.backup --file-type txt "Process as text"
# Convert to supported format first
pandoc document.rtf -o document.docx
rawi ask --file document.docx "Analyze this"

Timeout errors:

Terminal window
# Process in smaller chunks
split -l 100 large-file.txt chunk_
for chunk in chunk_*; do
  cat "$chunk" | rawi ask "Analyze this section"
done
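
The split-and-loop approach leaves chunk_ files in the working directory. The sketch below wraps it in a function that keeps the chunks in a temporary directory and removes them afterwards. ask_in_chunks is a hypothetical helper, and it relies on rawi reading piped input as in the examples above.

```shell
# Hypothetical helper: split a text file into fixed-size chunks in a
# temporary directory and feed each chunk to rawi on stdin, in order.
ask_in_chunks() {
  file=$1
  lines=$2
  prompt=$3
  tmpdir=$(mktemp -d) || return 1
  split -l "$lines" "$file" "$tmpdir/chunk_"
  for chunk in "$tmpdir"/chunk_*; do
    # An empty input produces no chunks, leaving the pattern unexpanded.
    [ -f "$chunk" ] || continue
    rawi ask "$prompt" < "$chunk"
  done
  rm -rf "$tmpdir"
}
```

For example: ask_in_chunks large-file.txt 100 "Analyze this section". Each chunk is a separate request, so the model sees no context from earlier chunks.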

Memory issues:

Terminal window
# Use head/tail for samples
head -n 500 large-file.txt | rawi ask "Analyze this sample"
# Filter content first
grep -v DEBUG app.log | rawi ask "Analyze non-debug entries"
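
Sampling only the head of a log misses whatever happened most recently. As a sketch, the hypothetical helper below prints the first and last N lines with a marker for the omitted middle, so the sample covers both ends of the file.

```shell
# Hypothetical helper: print the first and last N lines of a file, or the
# whole file when it is short enough that the two ranges would overlap.
sample_ends() {
  file=$1
  n=${2:-250}
  # Arithmetic expansion strips the padding some wc implementations add.
  total=$(( $(wc -l < "$file") ))
  if [ "$total" -le $((n * 2)) ]; then
    cat "$file"
  else
    head -n "$n" "$file"
    echo "[... $((total - n * 2)) lines omitted ...]"
    tail -n "$n" "$file"
  fi
}
```

For example: sample_ends app.log 250 | rawi ask "Analyze this sample"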

Ready to process your documents? Start with a simple rawi ask --file yourfile.pdf "Summarize this" and explore the possibilities!