📄 File Reading & Document Processing
Rawi can read PDFs, Word documents, Excel spreadsheets, and text files, extract their content, and use it as context for your AI conversations.
🚀 Quick Start
# Process a single file
rawi ask --file document.pdf "Summarize this document"

# Multiple files
rawi ask --files config.json package.json "Compare these configurations"

# Batch processing with patterns
rawi ask --batch "src/**/*.js" "Review all JavaScript files"
📋 Supported File Formats
📄 PDF Documents (.pdf)
Extract text content from PDF documents for analysis and summarization.

# Basic PDF processing
rawi ask --file report.pdf "What are the main conclusions?"

# Detailed analysis
rawi ask --file financial-report.pdf --act data-analyst "Analyze the financial trends"

# Research paper review
rawi ask --file research.pdf --act academic-researcher "Summarize the methodology and findings"
Features:
- ✅ Text extraction from standard PDFs
- ✅ Multi-page document support
- ✅ Automatic content cleaning
- ⚠️ Limited support for image-heavy PDFs
- ⚠️ Encrypted PDFs require manual unlock
📝 Microsoft Word Documents (.docx)
Process Word documents with full text extraction and formatting awareness.

# Document analysis
rawi ask --file requirements.docx "Extract all functional requirements"

# Content review
rawi ask --file proposal.docx --act tech-writer "Review this proposal for clarity"

# Template generation
rawi ask --file example.docx "Create a similar document template"
Features:
- ✅ Complete text extraction
- ✅ Table content preservation
- ✅ Header and footer inclusion
- ✅ Embedded formatting context
- ⚠️ Images and charts not processed
📊 Microsoft Excel Spreadsheets (.xlsx)
Analyze Excel data with sheet selection and intelligent data formatting.

# Data analysis
rawi ask --file sales-data.xlsx "Identify sales trends and patterns"

# Specific sheet processing
rawi ask --file workbook.xlsx --sheet "Q4 Results" "Analyze Q4 performance"

# Financial modeling
rawi ask --file budget.xlsx --act financial-analyst "Review this budget for accuracy"
Features:
- ✅ Multiple sheet support
- ✅ Automatic data formatting
- ✅ Header row detection
- ✅ Formula result extraction
- ✅ Sheet name listing
- ⚠️ Charts and pivot tables not processed
📋 Text & Source Files
Process various text-based files with intelligent content type detection.

# Source code review
rawi ask --file app.js "Review this code for best practices"

# Configuration analysis
rawi ask --file docker-compose.yml "Optimize this Docker configuration"

# Log file analysis
rawi ask --file app.log "Identify and categorize errors"

# Documentation review
rawi ask --file README.md "Improve this documentation"
Supported Extensions:
- .js, .ts, .jsx, .tsx - JavaScript/TypeScript
- .py, .pyw - Python
- .java, .kt, .scala - JVM languages
- .cpp, .c, .h, .hpp - C/C++
- .rs - Rust
- .go - Go
- .rb - Ruby
- .php - PHP
- .swift - Swift
- .dart - Dart

- .html, .htm, .xml - Markup languages
- .css, .scss, .sass - Stylesheets
- .json, .yaml, .yml - Data formats
- .toml, .ini, .env - Configuration
- .csv, .tsv - Tabular data
- .sql - Database queries

- .md, .markdown - Markdown
- .txt - Plain text
- .log - Log files
- .dockerfile - Docker
- .gitignore, .gitattributes - Git
- README, CHANGELOG, LICENSE - Project files
🔧 File Processing Options
Single File Processing

# Basic file analysis
rawi ask --file document.pdf "Summarize this document"

# With expert template
rawi ask --file code.js --act code-reviewer "Review this code"

# Override file type detection
rawi ask --file unknown-file --file-type txt "Process as text"
Multiple File Processing
# Process specific files
rawi ask --files file1.pdf file2.docx "Compare these documents"

# Up to 10 files at once
rawi ask --files *.json "Validate all JSON configurations"
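Because the shell expands `*.json` before Rawi sees the arguments, a glob can quietly grow past the 10-file limit mentioned in the comment above. The following is a minimal sketch of a guard, not part of Rawi: `within_file_limit` is a hypothetical helper name, and the limit of 10 is taken only from that comment.

```shell
# Hypothetical helper (not part of Rawi): succeed only when the number of
# arguments is between 1 and 10, the per-call limit mentioned above.
within_file_limit() {
  if [ "$#" -eq 0 ]; then
    echo "no files given" >&2
    return 1
  fi
  if [ "$#" -gt 10 ]; then
    echo "$# files given, limit is 10" >&2
    return 1
  fi
  return 0
}

# Usage: the shell expands the glob before the check runs.
# within_file_limit *.json && rawi ask --files *.json "Validate all JSON configurations"
```

If the check fails, switching to a batch pattern or narrowing the glob is probably the simplest way forward.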
Batch Processing with Patterns
# Process all JavaScript files
rawi ask --batch "src/**/*.js" "Find potential bugs"

# Multiple patterns
rawi ask --batch "*.{json,yml,yaml}" "Validate all config files"

# Exclude patterns
rawi ask --batch "src/**/*.js" --exclude "**/node_modules/**" "Review source code only"
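Before sending a large batch, it can help to preview roughly which files a pattern and exclusion would cover. The sketch below uses `find` for that. It is only an approximation: `find` patterns are not the same syntax as the globs passed to `--batch` and `--exclude`, and `preview_matches` is a hypothetical helper, not a Rawi command.

```shell
# Hypothetical helper (not part of Rawi): list regular files under a directory
# whose name matches one pattern and whose path does not match another.
# Usage: preview_matches DIR NAME_PATTERN EXCLUDE_PATH_PATTERN
preview_matches() {
  find "$1" -type f -name "$2" ! -path "$3" | sort
}

# Example: JavaScript files under src/, skipping anything inside node_modules.
# preview_matches src '*.js' '*/node_modules/*'
```

Piping the result through `wc -l` gives a quick count before committing to the full run.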
Parallel Processing
# Process files in parallel for speed
rawi ask --batch "docs/**/*.md" --parallel "Create documentation index"

# Control concurrency
rawi ask --batch "**/*.py" --parallel --max-concurrency 3 "Analyze Python code"

# Continue on errors
rawi ask --batch "**/*.json" --parallel --continue-on-error "Validate JSON files"
💡 Advanced Usage Patterns
Document Analysis Workflow

# 1. Overview
rawi ask --file report.pdf "Provide executive summary"
# 2. Detailed analysis
rawi ask --file report.pdf --act business-analyst "Extract key metrics and trends"
# 3. Action items
rawi ask --file report.pdf "What actions should be taken based on this report?"

# 1. General review
rawi ask --file src/app.js --act code-reviewer "Review this code"
# 2. Security audit
rawi ask --file src/app.js --act security-expert "Check for vulnerabilities"
# 3. Performance analysis
rawi ask --file src/app.js --act performance-engineer "Identify bottlenecks"

# 1. Data overview
rawi ask --file data.xlsx "Describe the dataset structure"
# 2. Statistical analysis
rawi ask --file data.xlsx --act data-scientist "Perform statistical analysis"
# 3. Business insights
rawi ask --file data.xlsx --act business-analyst "Extract business insights"
Combining Files and Templates
# Technical documentation
rawi ask --file api.py --act technical-writer "Generate API documentation"

# Financial analysis
rawi ask --file budget.xlsx --act financial-analyst "Analyze budget variance"

# Research review
rawi ask --file paper.pdf --act academic-researcher "Critique methodology"

# Legal document review
rawi ask --file contract.docx --act legal-expert "Identify key terms and risks"
Shell Integration
# Pipe content to Rawi
cat config.json | rawi ask "Optimize this configuration"

# Process command output
ls -la | rawi ask "Explain these file permissions"

# Generate files from analysis
rawi ask --file data.csv "Create Python script to visualize this data" > visualize.py

# Batch process with shell
for file in reports/*.pdf; do
  rawi ask --file "$file" "Extract key metrics" >> summary.md
done
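The loop above appends every answer to one file with nothing to show which report it came from, and it runs once on the literal pattern if no PDF matches. A slightly more defensive variant might look like the sketch below. `summarize_reports` and the `RAWI_BIN` override are hypothetical names introduced here for illustration (the override allows a dry run with a stand-in command); only `rawi ask --file` comes from the examples above.

```shell
# Sketch: run the same prompt over every PDF in a directory and label each
# result. RAWI_BIN is a hypothetical override used here for dry runs.
# Usage: summarize_reports DIR OUTPUT_FILE
summarize_reports() (
  dir=$1
  out=$2
  : > "$out"                      # start with an empty output file
  for file in "$dir"/*.pdf; do
    [ -e "$file" ] || continue    # glob matched nothing: skip the literal pattern
    printf '## %s\n\n' "$(basename "$file")" >> "$out"
    "${RAWI_BIN:-rawi}" ask --file "$file" "Extract key metrics" >> "$out" \
      || echo "failed: $file" >&2
    printf '\n' >> "$out"
  done
)

# Usage:
# summarize_reports reports summary.md
```

Failures are reported on stderr and the loop moves on, so one unreadable report does not stop the rest.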
⚠️ Limitations and Considerations
File Size Limits
Unsupported Content
# These won't be processed:
# - Images within documents
# - Charts and graphs in Excel
# - Embedded objects
# - Protected/encrypted files
# - Binary files without text content
Performance Tips
# For large files, extract specific sections
rawi ask --file large-report.pdf "Focus only on the executive summary section"

# Use batch processing for multiple files
rawi ask --batch "reports/*.pdf" --parallel "Extract key findings from each report"

# Filter before processing
grep ERROR app.log | rawi ask "Categorize these errors"
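Filtering can go one step further when a log repeats the same error many times: collapsing duplicates into counts shrinks the input considerably. This is a sketch using standard tools only; `count_errors` is a hypothetical helper name. It assumes identical errors produce identical lines, so logs with per-line timestamps would need those stripped first.

```shell
# Hypothetical helper: collapse identical ERROR lines into "count line" pairs,
# most frequent first.
# Usage: count_errors LOGFILE
count_errors() {
  grep ERROR "$1" | sort | uniq -c | sort -rn
}

# Usage:
# count_errors app.log | rawi ask "Categorize these errors"
```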
🔍 Troubleshooting
Common Issues
File not found:
# Use absolute paths
rawi ask --file "$(pwd)/document.pdf" "Analyze this"

# Check file exists
ls -la document.pdf
Permission denied:
# Fix file permissions
chmod 644 document.pdf
Empty content:
# Verify file integrity
file document.pdf

# Check file size
ls -lh document.pdf
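The three checks above (missing file, unreadable file, empty file) can be combined into one step that runs before Rawi is called. A minimal sketch using only standard shell tests follows; `preflight` is a hypothetical helper name, not a Rawi feature.

```shell
# Hypothetical helper (not part of Rawi): report the first problem found with
# a file, or succeed silently.
# Usage: preflight FILE
preflight() {
  if [ ! -e "$1" ]; then
    echo "not found: $1" >&2; return 1
  elif [ ! -f "$1" ]; then
    echo "not a regular file: $1" >&2; return 1
  elif [ ! -r "$1" ]; then
    echo "not readable: $1" >&2; return 1
  elif [ ! -s "$1" ]; then
    echo "empty file: $1" >&2; return 1
  fi
}

# Usage:
# preflight document.pdf && rawi ask --file document.pdf "Analyze this"
```

A non-empty file can still yield empty extracted content (for example, a scanned PDF with no text layer), so this only rules out the file-system causes.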
Unsupported format:
# Override file type detection
rawi ask --file data.backup --file-type txt "Process as text"

# Convert to supported format first
pandoc document.rtf -o document.docx
rawi ask --file document.docx "Analyze this"
Performance Issues
Timeout errors:
# Process in smaller chunks
split -l 100 large-file.txt chunk_
for chunk in chunk_*; do
  cat "$chunk" | rawi ask "Analyze this section"
done
Memory issues:
# Use head/tail for samples
head -n 500 large-file.txt | rawi ask "Analyze this sample"

# Filter content first
grep -v DEBUG app.log | rawi ask "Analyze non-debug entries"
📚 Related Documentation
- ask Command Reference - Complete command options
- Advanced Usage Patterns - Effective usage strategies
- Shell Integration - Terminal workflow integration
- Workflow Examples - Real-world usage patterns
- Troubleshooting - Common issues and solutions
Ready to process your documents? Start with a simple rawi ask --file yourfile.pdf "Summarize this" and explore the possibilities!