Data Analysis Workflow
Overview
This guide covers AI-assisted data analysis workflows with Rawi: data exploration, statistical analysis, visualization, machine learning, and reporting across various data formats and domains.
Prerequisites
- Rawi configured with a suitable provider
- Data files (CSV, JSON, Excel, databases)
- Basic understanding of data analysis concepts
- Python/R environment (optional for advanced analysis)
Workflow Steps
1. Data Exploration and Profiling
Initial data understanding and quality assessment:
# Explore CSV data structure
rawi ask --file sales-data.csv "Analyze this dataset structure, data types, missing values, and provide initial insights"

# Profile dataset quality
rawi ask --file customer-data.xlsx "Perform comprehensive data quality assessment including duplicates, outliers, and inconsistencies"

# Multi-file data exploration
rawi ask --files sales.csv customers.csv products.csv "Analyze relationships between these datasets and suggest join strategies"
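It can help to run a quick local profile alongside the AI-generated assessment so that the reported counts can be cross-checked. The following is a minimal sketch, not part of Rawi: the function name `profile_rows` and the inline sample (standing in for `sales-data.csv`) are assumptions made for illustration, and only the Python standard library is used.

```python
import csv
import io
from collections import Counter


def profile_rows(rows):
    """Return row count, per-column missing counts, and duplicate-row count."""
    rows = list(rows)
    columns = list(rows[0].keys()) if rows else []
    # A value counts as missing when it is absent or only whitespace.
    missing = {
        col: sum(1 for row in rows if not (row[col] or "").strip())
        for col in columns
    }
    # Every repeat of an already-seen row counts as one duplicate.
    seen = Counter(tuple(row[col] for col in columns) for row in rows)
    duplicates = sum(count - 1 for count in seen.values())
    return {"rows": len(rows), "missing": missing, "duplicates": duplicates}


# Hypothetical sample data standing in for sales-data.csv.
SAMPLE = """order_id,region,amount
1,north,120.50
2,south,
3,,75.00
1,north,120.50
"""

profile = profile_rows(csv.DictReader(io.StringIO(SAMPLE)))
print(profile)
```

To profile a real file, pass `csv.DictReader(open(path, newline=""))` instead of the in-memory sample.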
2. Statistical Analysis
Perform comprehensive statistical analysis:
# Descriptive statistics
rawi ask --file survey-data.csv --act data-analyst "Generate comprehensive descriptive statistics and identify key patterns"

# Correlation analysis
rawi ask --file financial-data.csv --act statistician "Perform correlation analysis and identify significant relationships between variables"

# Hypothesis testing
rawi ask --file experiment-results.csv --act statistician "Perform appropriate statistical tests to validate experimental hypotheses"
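As a sanity check on any correlation the assistant reports, the Pearson coefficient can be recomputed by hand. The sketch below is illustrative only: `pearson_r` and the two small series are hypothetical, and the code assumes paired numeric observations with no missing values.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of cross-products and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical paired observations.
ad_spend = [1, 2, 3, 4, 5]
revenue = [2, 4, 5, 4, 5]
r = pearson_r(ad_spend, revenue)
print(f"Pearson r = {r:.4f}")
```

A coefficient near 1 or -1 indicates a strong linear relationship; it says nothing about causation or about non-linear patterns.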
3. Data Visualization Recommendations
Generate visualization strategies and code:
# Visualization strategy
rawi ask --file time-series-data.csv --act data-visualizer "Recommend optimal visualization strategies for this time series data"

# Python visualization code
rawi ask --file sales-data.csv --act data-scientist "Generate Python code using matplotlib/seaborn for comprehensive data visualization"

# Dashboard specifications
rawi ask --file kpi-data.csv --act dashboard-designer "Design dashboard layout and visualizations for business KPIs"
4. Machine Learning Analysis
Apply machine learning techniques:
# ML model recommendations
rawi ask --file customer-churn.csv --act machine-learning-engineer "Recommend appropriate ML models for customer churn prediction"

# Feature engineering
rawi ask --file raw-features.csv --act data-scientist "Suggest feature engineering techniques to improve model performance"

# Model evaluation
rawi ask --file model-results.csv --act ml-engineer "Analyze model performance metrics and suggest improvements"
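When reviewing model performance, it is useful to know how the headline metrics are derived from predictions. The following sketch computes accuracy, precision, recall, and F1 for a binary classifier. The function name and the label lists are hypothetical, and the layout of `model-results.csv` is not specified in this guide, so the example works on plain Python lists.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for binary labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical churn labels: 1 = churned, 0 = retained.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

For imbalanced problems such as churn, precision and recall are usually more informative than accuracy alone.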
5. Automated Data Analysis Script
Comprehensive data analysis automation:
#!/bin/bash
# data-analysis-pipeline.sh - Comprehensive data analysis workflow
set -e

DATA_FILE=${1:-"data.csv"}
# exploratory, statistical, ml, business (accepted, but not used by the two steps shown here)
ANALYSIS_TYPE=${2:-"exploratory"}
OUTPUT_DIR="analysis_$(date +%Y%m%d_%H%M%S)"

# Fail early with a clear message if the input file is missing
if [ ! -f "$DATA_FILE" ]; then
  echo "Data file not found: $DATA_FILE" >&2
  exit 1
fi

echo "📊 Starting data analysis for $DATA_FILE"

# Create analysis directory structure
mkdir -p "$OUTPUT_DIR"/{reports,visualizations,insights,code,documentation}

# 1. Data profiling and quality assessment
echo "🔍 Performing data profiling..."
rawi ask --file "$DATA_FILE" --act data-analyst "Perform comprehensive data profiling including:
- Dataset structure and dimensions
- Data types and formats
- Missing value analysis
- Duplicate record detection
- Outlier identification
- Data quality score
- Summary statistics
- Initial observations and concerns" > "$OUTPUT_DIR/reports/data-profile.md"

# 2. Exploratory data analysis
echo "📈 Conducting exploratory data analysis..."
rawi ask --file "$DATA_FILE" --act data-scientist "Conduct thorough exploratory data analysis including:
- Distribution analysis for all variables
- Correlation matrix and relationships
- Trend identification and patterns
- Seasonal patterns (if time-based)
- Categorical variable analysis
- Numerical variable distributions
- Potential data transformations needed
- Business insights and opportunities" > "$OUTPUT_DIR/reports/eda-report.md"

echo "✅ Data analysis complete!"
echo "📊 Analysis results saved to: $OUTPUT_DIR"
6. Specialized Data Analysis
Financial Data Analysis
# Financial performance analysis
rawi ask --file financial-statements.xlsx --act financial-analyst "Analyze financial performance including:
- Profitability ratios and trends
- Liquidity and solvency analysis
- Cash flow analysis
- Revenue growth patterns
- Cost structure optimization
- Investment efficiency metrics
- Risk assessment indicators"

# Portfolio analysis
rawi ask --file portfolio-data.csv --act investment-analyst "Perform portfolio analysis including:
- Asset allocation assessment
- Risk-return analysis
- Diversification evaluation
- Performance attribution
- Benchmark comparison
- Optimization recommendations"
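The risk-return item in the portfolio prompt usually comes down to a few summary statistics. As a rough sketch, the code below computes mean return, volatility (sample standard deviation), and a per-period Sharpe ratio. The function name, the sample returns, and the zero risk-free rate are assumptions for illustration, not values from any real portfolio.

```python
import statistics


def risk_return_summary(returns, risk_free_rate=0.0):
    """Mean return, sample volatility, and per-period Sharpe ratio."""
    if len(returns) < 2:
        raise ValueError("need at least two return observations")
    mean_return = statistics.mean(returns)
    volatility = statistics.stdev(returns)  # sample standard deviation
    # The Sharpe ratio is undefined when returns never vary.
    sharpe = (mean_return - risk_free_rate) / volatility if volatility else float("nan")
    return {"mean_return": mean_return, "volatility": volatility, "sharpe": sharpe}


# Hypothetical monthly returns expressed as fractions (0.02 = 2%).
monthly_returns = [0.02, -0.01, 0.03, 0.01, 0.00]
summary = risk_return_summary(monthly_returns)
print(summary)
```

These figures are per period; annualizing them requires a scaling convention that should be stated alongside the result.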
Marketing Data Analysis
# Campaign performance analysis
rawi ask --file campaign-data.csv --act marketing-analyst "Analyze marketing campaign performance including:
- ROI and ROAS calculations
- Channel effectiveness comparison
- Customer acquisition costs
- Conversion funnel analysis
- Audience segmentation insights
- Attribution modeling
- Budget optimization recommendations"

# Customer behavior analysis
rawi ask --file customer-journey.csv --act customer-analyst "Analyze customer behavior patterns including:
- Purchase journey mapping
- Churn prediction indicators
- Lifetime value calculations
- Segmentation analysis
- Behavioral triggers identification
- Retention strategy recommendations"
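The campaign metrics named in the first prompt are simple ratios, and computing them locally makes it easier to verify the assistant's figures. This sketch assumes common definitions: ROAS as revenue divided by ad spend, ROI as (revenue - spend) / spend, and CAC as spend divided by new customers. Some teams also subtract cost of goods when computing ROI, so treat the formula as an assumption. The channel names and numbers are hypothetical.

```python
def campaign_metrics(spend, revenue, new_customers):
    """ROAS, ROI, and customer acquisition cost for one campaign or channel."""
    if spend <= 0:
        raise ValueError("spend must be positive")
    return {
        "roas": revenue / spend,
        "roi": (revenue - spend) / spend,
        # CAC is undefined when the campaign acquired no customers.
        "cac": spend / new_customers if new_customers else None,
    }


# Hypothetical per-channel totals: (spend, revenue, new customers).
channels = {
    "search": (2000.0, 9000.0, 50),
    "social": (1500.0, 3000.0, 60),
}
results = {name: campaign_metrics(*totals) for name, totals in channels.items()}
for name, values in results.items():
    print(name, values)
```

In this made-up data, search has the higher return on spend while social acquires customers more cheaply, which is the kind of trade-off a channel comparison should surface.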
Operational Data Analysis
# Process efficiency analysis
rawi ask --file operations-data.csv --act operations-analyst "Analyze operational efficiency including:
- Process performance metrics
- Bottleneck identification
- Resource utilization analysis
- Quality metrics assessment
- Cost optimization opportunities
- Productivity improvements
- Automation potential"

# Supply chain analysis
rawi ask --file supply-chain.csv --act supply-chain-analyst "Analyze supply chain performance including:
- Inventory optimization
- Supplier performance metrics
- Demand forecasting accuracy
- Lead time analysis
- Cost structure evaluation
- Risk assessment"
7. Advanced Analytics
Time Series Analysis
# Time series forecasting
rawi ask --file time-series.csv --act time-series-analyst "Perform comprehensive time series analysis including:
- Trend and seasonality decomposition
- Stationarity testing
- ARIMA modeling recommendations
- Forecasting accuracy assessment
- Anomaly detection
- Intervention analysis
- Business cycle identification"

# Predictive modeling
rawi ask --file historical-data.csv --act predictive-analyst "Develop predictive models including:
- Feature importance analysis
- Model selection and validation
- Hyperparameter optimization
- Cross-validation strategies
- Performance benchmarking
- Deployment considerations"
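For the anomaly-detection item in the time series prompt, a z-score check is a simple baseline to compare against whatever method the assistant recommends. The sketch below flags points that lie more than a chosen number of standard deviations from the mean. The function name, the threshold of 2.5, and the sample series are assumptions. This baseline ignores trend and seasonality, so it is not a substitute for the decomposition or ARIMA steps.

```python
import statistics


def zscore_anomalies(values, threshold=2.5):
    """Return indices of values whose absolute z-score exceeds the threshold."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    if sd == 0:
        return []  # a constant series has no outliers by this measure
    return [i for i, v in enumerate(values) if abs(v - mean) / sd > threshold]


# Hypothetical daily counts with one obvious spike at index 6.
series = [10, 11, 9, 10, 12, 10, 50, 11, 9, 10]
anomalies = zscore_anomalies(series)
print(anomalies)
```

With short series a single extreme value inflates the standard deviation, so the achievable z-score is capped; a lower threshold or a robust measure such as the median absolute deviation may work better.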
Text Analytics
# Sentiment analysis
rawi ask --file reviews-data.csv --act nlp-analyst "Perform text analytics including:
- Sentiment analysis and scoring
- Topic modeling and themes
- Keyword extraction
- Text classification
- Emotion detection
- Content recommendations
- Brand perception analysis"

# Document analysis
rawi ask --file survey-responses.txt --act text-analyst "Analyze text data including:
- Thematic analysis
- Content categorization
- Frequency analysis
- Readability assessment
- Language pattern identification
- Summary generation"
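The frequency-analysis item in the document analysis prompt can be approximated locally with a word count, which gives a quick reference point for the themes the assistant identifies. This is a minimal sketch: the stopword list is deliberately tiny, the tokenizer handles only lowercase ASCII words, and the sample responses are hypothetical.

```python
import re
from collections import Counter

# A very small, illustrative stopword list; real analyses need a fuller one.
STOPWORDS = {"the", "was", "and", "to", "a", "of", "is", "it"}


def top_terms(texts, n=5):
    """Return the n most frequent non-stopword terms across the texts."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in STOPWORDS and len(w) > 2)
    return counts.most_common(n)


# Hypothetical survey responses.
responses = [
    "The delivery was slow and the support was slow to reply",
    "Great support, fast delivery",
    "Support team was helpful",
]
terms = top_terms(responses, n=3)
print(terms)
```

Raw counts favor long responses and common words, so they are best read as a starting point for thematic analysis rather than a result.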
8. Data Visualization and Reporting
Interactive Dashboard Design
# Dashboard specification
rawi ask --file kpi-data.csv --act dashboard-designer "Design comprehensive dashboard including:
- KPI visualization layout
- Interactive filter components
- Drill-down capabilities
- Real-time data integration
- Mobile responsiveness
- User access controls
- Export functionality"

# Visualization best practices
rawi ask --file complex-data.csv --act data-visualizer "Recommend visualization best practices including:
- Chart type selection
- Color scheme optimization
- Layout and composition
- Accessibility considerations
- Interactive elements
- Performance optimization
- User experience design"
Report Automation
# Automated reporting pipeline
rawi ask --file report-data.csv --act reporting-analyst "Design automated reporting system including:
- Report template design
- Data refresh schedules
- Distribution mechanisms
- Quality validation checks
- Exception handling
- Performance monitoring
- Version control"
9. Data Quality and Governance
Data Quality Assessment
# Comprehensive data quality audit
rawi ask --batch "data/**/*.csv" --act data-quality-analyst "Perform data quality assessment including:
- Completeness analysis
- Accuracy validation
- Consistency checks
- Timeliness evaluation
- Validity assessment
- Uniqueness verification
- Integrity validation"

# Data governance framework
rawi ask --act data-governance-expert "Design data governance framework including:
- Data ownership definitions
- Quality standards and metrics
- Access control policies
- Data lifecycle management
- Compliance requirements
- Audit procedures
- Training programs"
10. Advanced Statistical Modeling
Experimental Design
# A/B testing analysis
rawi ask --file ab-test-results.csv --act experimental-designer "Analyze A/B testing results including:
- Statistical significance testing
- Effect size calculations
- Power analysis
- Confidence intervals
- Multiple testing corrections
- Practical significance assessment
- Recommendation formulation"

# Causal inference
rawi ask --file observational-data.csv --act causal-analyst "Perform causal analysis including:
- Confounding variable identification
- Instrumental variable analysis
- Propensity score matching
- Difference-in-differences
- Regression discontinuity
- Sensitivity analysis"
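For the A/B testing prompt, the significance test, effect size, and confidence interval can be reproduced with a two-proportion z-test when the outcome is a conversion rate. The sketch below is one common approach, not necessarily the test the assistant will choose. The function name and the conversion counts are hypothetical, and the normal approximation assumes reasonably large samples in both groups.

```python
import math


def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates (B minus A)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    diff = p_b - p_a
    # The pooled rate is used for the test statistic under the null hypothesis.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se_pooled == 0:
        raise ValueError("test is undefined when the pooled rate is 0 or 1")
    z = diff / se_pooled
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    # The 95% confidence interval uses the unpooled standard error.
    se_diff = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    ci_95 = (diff - 1.96 * se_diff, diff + 1.96 * se_diff)
    return {"diff": diff, "z": z, "p_value": p_value, "ci_95": ci_95}


# Hypothetical results: 200/2000 conversions for A, 250/2000 for B.
result = two_proportion_ztest(200, 2000, 250, 2000)
print(result)
```

A small p-value indicates statistical significance only; whether a lift of this size matters in practice is the separate "practical significance" question listed in the prompt.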
Data Analysis Best Practices
1. Data Quality
- Validate data sources and collection methods
- Check for missing values and outliers
- Ensure data consistency and accuracy
- Document data transformations
- Implement quality monitoring
2. Statistical Rigor
- Choose appropriate statistical methods
- Validate assumptions
- Use proper significance levels
- Consider multiple testing corrections
- Report confidence intervals
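To make the multiple-testing point concrete, the sketch below compares two standard corrections: Bonferroni, which controls the family-wise error rate, and Benjamini-Hochberg, which controls the false discovery rate. The function names and the p-values are hypothetical examples.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject each hypothesis whose p-value is at most alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Reject the k smallest p-values, where k is the largest rank
    satisfying p_(k) <= (k / m) * alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank
    rejected = [False] * m
    for idx in order[:max_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values from five independent tests.
p_values = [0.005, 0.015, 0.025, 0.035, 0.5]
bonf = bonferroni(p_values)
bh = benjamini_hochberg(p_values)
print("Bonferroni:        ", bonf)
print("Benjamini-Hochberg:", bh)
```

On this made-up input Bonferroni rejects only the smallest p-value, while Benjamini-Hochberg rejects four, which illustrates the trade-off between strict error control and statistical power.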
3. Visualization Excellence
- Choose appropriate chart types
- Use clear and intuitive designs
- Ensure accessibility
- Provide proper context
- Enable interactivity when useful
4. Reproducibility
- Document analysis procedures
- Version control code and data
- Use reproducible environments
- Provide clear instructions
- Share code and methodologies
Integration with Tools
Python Integration
# Python analysis template
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Load and explore data
df = pd.read_csv('data.csv')
df.info()  # prints its summary directly and returns None, so no print() is needed
print(df.describe())

# Generated by Rawi analysis recommendations
# [Insert AI-generated analysis code here]
R Integration
# R analysis template
library(dplyr)
library(ggplot2)
library(tidyr)
library(corrplot)

# Load and explore data
data <- read.csv('data.csv')
summary(data)
str(data)

# Generated by Rawi analysis recommendations
# [Insert AI-generated analysis code here]